AI model development has reached an inflection point, bringing high-performance computing capabilities typically reserved for the cloud out to edge devices. It’s a refreshing perspective compared to the all-consuming nature of large language models (LLMs) and the GPUs needed to run them.
“You’re gonna run out of compute, power, energy and money at some point,” said Zach Shelby, CEO and co-founder of Edge Impulse, a Qualcomm Technologies company. “We want to deploy [generative AI] so broadly. It’s not scalable, right? And then it runs into so many reliability issues. It runs into power issues.”
At the edge, power considerations differ by device. The upshot, though? These devices can run a variety of language models, but LLMs pose a notable challenge.
The AI story is about more than just the big data centers. We need the edge to run applications close to the data that the models process. Round trips to a cloud service in a region across the country get expensive and pose a variety of issues that make real-time applications unusable.
Shelby started Edge Impulse in 2019 with Jan Jongboom, the company’s CTO. Shelby spoke with The New Stack on two occasions following Edge Impulse’s annual Imagine conference at the Computer History Museum in Mountain View, Calif. The company offers an edge AI platform for collecting data, training models and deploying them to edge computing devices.
“We need to find ways to make these probabilistic LLM architectures behave more deterministic, for no human in the loop, or minimum human in the loop applications,” Shelby said.
LLMs have multiple use cases for the back office, but the edge is a bit different in industrial environments.
There are many different types of architectures, such as small language models (SLMs), visual language models (VLMs) and others that are increasingly useful on the edge. But the use case remains unclear when it comes to large language general models typically used in consumer markets.
“Where do companies see real value?” Shelby asked. “That’s been a challenge in the early days of LLMs in industrial” settings.
It’s a matter of what people in the industry really trust, he said: “With industrial, we have to have [a return on investment], right? We have to understand what we’re solving. We have to understand how it works. The bar is much higher.”
VLMs, for example, are maturing fast, Shelby said.
“I do think now, with VLM just maturing fast, we really are finding lots of use cases, because it lets us do complex vision analysis that we couldn’t normally do with discrete models. Super useful, but it requires a lot of testing. You have to have end-to-end testing. You have to parameterize and put these guardrails around it.”
At Imagine, I wore a pair of extended reality (XR) glasses to view a circuit board part. With the glasses, I could detect the part and then choose from a range of questions to ask. I asked the question by voice, which invoked Whisper, a speech recognition model, along with YOLO (You Only Look Once) and OpenVocabulary for object detection.

That was in turn fed into a Retrieval-Augmented Generation (RAG) tool and integrated with Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B), and lightweight, text-only models (1B and 3B). The models, according to Meta, fit onto edge and mobile devices, including pre-trained and instruction-tuned versions.
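As a rough sketch of how such a cascade can be wired together (every function below is a hypothetical stub, not an Edge Impulse or Meta API), the flow from speech to detection to a RAG-augmented answer looks something like this:

// Illustrative stubs; in a real system these would call on-device models.
const transcribe = async (audio) => 'What is this capacitor rated for?';
const detectObjects = async (frame) => [{ label: 'capacitor', confidence: 0.93 }];
const retrieveDocs = async (question, detections) => ['C12: 470uF, 25V electrolytic capacitor'];
const generateAnswer = async ({ question, detections, context }) =>
  `Detected ${detections[0].label}; per the docs: ${context[0]}`;

// Cascade: speech recognition -> object detection -> RAG lookup -> compact language model.
async function answerAboutPart(audioClip, cameraFrame) {
  const question = await transcribe(audioClip);
  const detections = await detectObjects(cameraFrame);
  const context = await retrieveDocs(question, detections);
  return generateAnswer({ question, detections, context });
}

answerAboutPart(null, null).then(console.log);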
The next step, according to Shelby? Apply agents to the physical AI that Edge Impulse enables with cascading models.
The workload might run in the glasses, with one agent interpreting what it sees and what the person is saying. That data may then be cascaded to an AI appliance, where another agent performs the lookup.
“I think that’s really interesting from an edge AI technology [perspective]; we’re starting to be able to distribute these agents on the edge,” Shelby said. “That’s cool. But I do think that agentic and physical AI does make it understandable.”
People can relate to the XR glasses, Shelby said. And they show the connection between agentic AI and physical AI.
Small, discrete models, such as object detection, are feasible with battery-powered, low-cost embedded devices, he said. However, they cannot manage generative AI (GenAI). For that, you need far more powerful devices on the edge.
“A 10-billion-parameter model, think of that as a small VLM,” Shelby said. “Or a small SLM. So you’re able to do something that is focused. We don’t have a worldview of everything, but we can do something very focused, like vehicle or defect analytics, a very focused human language interface, or a simple SLM to interpret it.
“We could run that on one device. The XR glasses are a good example of this. That is kind of the 12 to 100 TOP class of devices that you can produce today.”
TOPS describes an NPU’s processing capability; an NPU is a neural processing unit used to accelerate GenAI workloads. According to Qualcomm, “TOPS quantifies an NPU’s processing capabilities by measuring the number of operations (additions, multiplies, etc.) in trillions executed within a second.”
The XR glasses can run simple, focused applications, Shelby said, such as natural language processing with an SLM for interpretation, on a 12 to 100 TOPS-class device.
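For a rough sense of scale, a back-of-envelope calculation (illustrative numbers only, not vendor benchmarks) shows why this device class maps to small, focused models rather than frontier-scale LLMs:

// Back-of-envelope estimate; real throughput is usually memory-bound and lower.
const params = 10e9;                 // a 10B-parameter "small VLM / SLM"
const opsPerToken = 2 * params;      // roughly 2 operations per parameter per generated token
const npuTops = 50;                  // mid-range of the 12 to 100 TOPS device class
const peakOpsPerSecond = npuTops * 1e12;
const peakTokensPerSecond = peakOpsPerSecond / opsPerToken;
console.log(`~${peakTokensPerSecond.toFixed(0)} tokens/s theoretical peak`); // ~2500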
Beyond the screen, there is a need for agentic applications that specifically reduce latency and improve throughput.
“You need an agentic architecture with several things going on,” Shelby said about using models to analyze the packaging of pharmaceuticals, for instance. “You might need to analyze the defects. Then you might need an LLM with a RAG behind it to do manual lookup. That’s very complex. It might need a lot of data behind it. It might need to be very large. You might need 100 billion parameters.”
The analysis, he noted, may require integration with a backend system to perform another task, necessitating collaboration among several agents. AI appliances are then necessary to manage multiagent workflows and larger models.
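A minimal sketch of that kind of hand-off, with hypothetical agents and stubbed-out calls standing in for the real models and backend, might look like this:

// Hypothetical hand-off: an on-device defect agent escalates to a RAG-backed
// agent on an AI appliance, which files the result in a backend system.
const defectAgent = {
  inspect: async (image) => ({ defect: 'misprinted label', confidence: 0.97 }),
};
const manualLookupAgent = {
  lookup: async (defect) => `Procedure 4.2: quarantine the lot and reprint (${defect})`,
};
const backendAgent = {
  fileReport: async (report) => ({ ticketId: 'QA-1234', ...report }),
};

async function handlePackage(image) {
  const finding = await defectAgent.inspect(image);                 // small on-device model
  if (finding.confidence < 0.9) return { action: 'pass' };
  const procedure = await manualLookupAgent.lookup(finding.defect); // LLM + RAG on the appliance
  return backendAgent.fileReport({ finding, procedure });           // backend integration
}

handlePackage(null).then(console.log);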
The more complex the task, the more general intelligence is required, which necessitates moving to larger AI appliances.
David Aronchick, CEO and founder of Expanso, said three things will never change on the edge, and they will shape how developers build for edge devices.
Agentic architectures are a layer on top of the data and the networks, Aronchick said. “With those three things being true, that means you’ve got to start moving your agents out there, or programs, or whatever they may be. You’ve got to.”
Expanso provides distributed computing for workloads. Instead of moving the data, the compute goes to the data itself, which is increasingly relevant as enterprise customers look beyond the cloud for their computing needs. It offers an open source architecture that enables users to run jobs where data is generated and stored.
What we call the tools of agentic architecture is anyone’s guess, Aronchick said. But like Shelby, Aronchick said latency and throughput are the big issues to resolve. Further, moving data opens security and regulatory issues. With this in mind, it makes sense to keep your applications as close as possible to your servers.
The nature of LLMs, Shelby said, means a person has to tell you whether the LLM’s output is correct, which in turn affects how to judge the relevance of LLMs in edge environments.
It’s not like you can rely on an LLM to provide an answer to a prompt. Consider a camera in the Texas landscape, focusing on an oil pump, Shelby said. “The LLM is like, ‘Oh, there are some campers cooking some food,’ when really there’s a fire” at the oil pump.
So how do you make the process testable in the way engineers expect, Shelby asked. It requires end-to-end guardrails. And that’s why random, cloud-based LLMs do not yet fit industrial environments.
Edge Impulse tests that outputs match the patterns developers expect, while also measuring end-to-end performance and accuracy. The tests run on real data.
It isn’t just the raw camera stream Edge Impulse tests, but also the object detector plus the VLM, and how the output is categorized.
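A simplified sketch of that kind of end-to-end check (the pipeline stub, file names and threshold below are hypothetical) runs real, labeled frames through the full chain and compares the final categorization with what engineers expect:

// Labeled samples drawn from real field data (file names are illustrative).
const labeledSamples = [
  { frame: 'pump_fire_001.jpg', expected: 'fire' },
  { frame: 'pump_normal_014.jpg', expected: 'normal' },
];

// Stub for the deployed detector + VLM chain; in practice this calls the real models.
const runPipeline = async (frame) =>
  frame.includes('fire') ? { category: 'fire', score: 0.91 } : { category: 'normal', score: 0.88 };

async function runEndToEndChecks() {
  for (const { frame, expected } of labeledSamples) {
    const result = await runPipeline(frame);
    const pass = result.category === expected && result.score >= 0.8;
    console.log(`${frame}: got "${result.category}" (${result.score}) -> ${pass ? 'PASS' : 'FAIL'}`);
  }
}

runEndToEndChecks();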
LLMs, Shelby said, need training on relevant base data, such as industrial machinery: “Then you do transfer learning, which is like fine-tuning those models.”
Edge Impulse may then squeeze a lot more neurons into smaller compute, Shelby said, as it controls the architecture for the edge compute environment.
But the LLM use cases still show immaturity, so the company is developing edge constraints for industrial use cases. The base models are essential. The company processes the data as soon as it arrives from the camera using basic preprocessing models.
The company needs to be careful with LLMs, putting up guardrails and testing the developer experience and usability so that an LLM can be deployed in the field.
“We’re careful to do it really step by step, like we haven’t brought in our LLMs yet,” Shelby said. “We’re still getting convinced how these can be safely used in industry.”
A text-based input may work fine for someone out on a wind tower. But the company is also looking at other input methods, such as voice, Shelby said: for example, pairing an SLM with a speech interface like Whisper so a technician can describe a problem or carry out maintenance using natural language.
“We’ll bring in the technology and make it very easy for developers, but you have to do it a little bit more slowly than what the hype is for the cloud,” Shelby said. “It’s interesting. So, that’s the challenge now: How do you expose this stuff?
“With LLMs, what are you going to do — have your maintenance guy chat with the chatbot on an oil pump?”
The post The AI Inflection Point Isn’t in the Cloud, It’s at the Edge appeared first on The New Stack.
The Ember project is excited to announce the release of Ember v6.8. This is a standard minor release as part of the standard Ember Release Train process, but this release isn't just like every other release! We have some exciting new framework features that unlock a new world of experimentation and our build system is now using Vite by default when you generate a new app! 🎉 Keep reading to find out all the details!
Ember.js 6.8 introduces 3 key features: 2 new things for Ember developers to use today and a new way to publish the ember-source package. We have also included one bugfix, and there are no new deprecations.
renderComponent
The new renderComponent API provides a way to render components directly into any DOM element, making it easier to integrate components into other environments like d3, ag-grid, WYSIWYG editors, etc. This feature is particularly useful for micro applications, REPLs, and "islands"-based tools.
renderComponent can be imported from @ember/renderer and accepts a component definition along with configuration options:
import { renderComponent } from '@ember/renderer';
const Greeting = <template>Hello {{@name}}!</template>;
const result = renderComponent(Greeting, {
into: document.querySelector('#my-element'),
args: { name: 'World' }
});
// Clean up when done
result.destroy();
The API supports several configuration options including:
into: The DOM element to render into
args: Arguments to pass to the component; these can be a trackedObject
owner: Optional owner object for service access (or a minimal partial implementation of what your component needs)

You can read more about this in the renderComponent() RFC #1068.
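Since args can be a trackedObject, a small variation on the example above shows how later updates should flow into the rendered output. This is only an illustrative sketch; it assumes the trackedObject export from @ember/reactive/collections described in the next section.

import { renderComponent } from '@ember/renderer';
import { trackedObject } from '@ember/reactive/collections';

const Greeting = <template>Hello {{@name}}!</template>;

// Tracked args, so mutations after the initial render are reflected in the DOM.
const args = trackedObject({ name: 'World' });

const result = renderComponent(Greeting, {
  into: document.querySelector('#my-element'),
  args,
});

// Later: updating the tracked args should re-render the greeting.
args.name = 'Ember';

// Clean up when done
result.destroy();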
@ember/reactive/collections
Ember 6.8 introduces a new package, @ember/reactive/collections, that provides built-in tracking utilities for common collections. This package includes tracked versions of JavaScript's native collection types: trackedArray, trackedObject, trackedMap, trackedSet, trackedWeakMap, and trackedWeakSet.
These utilities offer performance and ergonomics improvements over what has previously been available via public APIs.
import { trackedArray } from '@ember/reactive/collections';
import { on } from '@ember/modifier';
const items = trackedArray(['apple', 'banana']);
// usually you would have the pushed item be dynamic, but this is only a demo
const addItem = () => items.push('cherry');
<template>
{{#each items as |item|}}
<div>{{item}}</div>
{{/each}}
<button type="button" {{on "click" addItem}}>
Add Item
</button>
</template>
You can read more about this in the Built in tracking utilities for common collections RFC #1068.
This feature was inspired by tracked-built-ins and brings these essential reactivity primitives directly into the framework core.
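The non-array collections follow the same pattern. Here is a brief, illustrative sketch with trackedMap, assuming it mirrors the native Map API (get, set) as described above:

import { trackedMap } from '@ember/reactive/collections';
import { on } from '@ember/modifier';

const counts = trackedMap([['apples', 1]]);

// Plain functions can be invoked from templates; reading the tracked map here
// means the rendered count updates whenever the map changes.
const appleCount = () => counts.get('apples');
const addApple = () => counts.set('apples', counts.get('apples') + 1);

<template>
  <p>Apples: {{appleCount}}</p>
  <button type="button" {{on "click" addApple}}>
    Add apple
  </button>
</template>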
v6.8.0 of ember-source is the first minor version of the package to be published to npm with Trusted Publishing. We will be implementing this across all our packages.
At the bottom of the npm package page, you'll find a section labeled 'Provenance' that provides verification that the package contents were published from the source repository.
Ember.js 6.8 introduces 1 bug fix.
Ember CLI 6.8 introduces 2 key features, a brand-new default app blueprint and a new default for generated templates. There are also 3 minor features, 5 bugfixes, and 2 new deprecations introduced.
This is the first release that enables Embroider by default! 🎉 This has been a monumental effort by the whole community over many years and it represents a new era for Ember developers. The improvements to the developer experience and new capabilities are so numerous that they deserve their own blog post, but here are some of the highlights
These are just some of the highlights, but one key theme that has been true throughout the effort to make Vite the default build system for Ember apps is that we now have an opportunity to integrate much more seamlessly with the wider JS ecosystem. Ember is no longer working in a walled garden, forced to re-implement every good idea that the JS community comes up with. If someone comes up with a Vite plugin that does something cool, chances are that adding it to your Vite config in your Ember app will just work!
Anyone generating a new app using ember new after Ember CLI v6.8 will get an app generated with the new @ember/app-blueprint by default. The new app blueprint has a lot of changes, but each change is explained in great detail in the V2 App Format RFC so it's worth taking a look to understand all the changes.
If you have the need to generate a new app with the classic blueprint after Ember CLI v6.8, we have provided a new blueprint @ember-tooling/classic-build-app-blueprint. You can opt into this blueprint with the -b argument:
ember new -b @ember-tooling/classic-build-app-blueprint
This is not intended to be used long term; for most teams, the new default Vite-based blueprint will be the right choice, and it represents the intended future direction of the Ember project. Providing the legacy classic-build blueprint is in keeping with Ember's dedication to backwards compatibility and gives teams that can't yet upgrade to Vite some breathing space to upgrade at their own pace.
This also means that any team relying on ember-cli-update still has an update path without being automatically upgraded to Vite. If you have an existing application and you do want to upgrade to Vite, you should check out the ember-vite-codemod, which will guide you through the upgrade process.
--strict by default
Now that the default blueprint uses Vite, it makes sense for newly generated Components and Route templates to use the template-tag format (a.k.a. GJS). This means that all of the templates in your app will be in "strict mode" and will not look up any Invokables (Components, Helpers, or Modifiers) on the global resolver, but will instead use local scoping to know which Invokable to use in your templates. In practice, for most people, this means importing any components you use at the top of the file where you use them (which is why this feature is sometimes referred to as template-imports). This allows build systems to have a better understanding of where your code is coming from and can significantly improve tree-shaking and developer tooling performance.
With the Vite blueprint it makes sense to enable the strict-mode template generation by default, and to keep the new app blueprint and the classic app blueprint in sync we also decided to make it the default for new apps generated with the classic app blueprint. In practice this only sets the required setting in the .ember-cli settings file in your repo to the new default values.
You can read more about the specifics of this feature in the First-Class Component Templates RFC #779.
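For illustration (the component name and file path here are hypothetical), a strict-mode GJS file imports everything it invokes instead of relying on the global resolver:

// Without this import, <MyButton /> would not resolve in strict mode.
import MyButton from './my-button';

<template>
  <MyButton @label="Save" />
</template>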
ember (generate|destroy) (http-proxy|http-mock|server) is used in a Vite-based project
--ts alias for the addon, init and new commands

Ember CLI 6.8 introduced 5 bug fixes.
package.json for the classic blueprints @ember-tooling/classic-build-addon-blueprint and @ember-tooling/classic-build-app-blueprint
@warp-drive/ember/install to remove deprecation when generating a classic app
import from ember-data breakage/deprecation

Ember CLI 6.8 introduces 2 new deprecations.
ember init with file names or globs
ember init is a little-known (and under-documented) piece of functionality of the ember-cli blueprint system. An even less known capability was the ability to filter the files that get reinitialized by a path or a glob when running ember init. We know that this was a mostly unknown feature because it was never added to the ember init --help documentation, and it has been broken for some time. Instead of trying to fix it for all the new blueprints, we opted to deprecate the functionality. You can read more about the deprecation in the deprecation guide for init-no-file-names.
ember new --embroider
Generating an Ember app with ember new --embroider produced an app using Embroider@v3 with Webpack. Since Embroider@v4 with Vite is now the default for newly generated apps and provides a significantly better developer experience, nobody should be generating new apps with Embroider@v3 any more. To support people who haven't yet upgraded from Embroider@v3 to Embroider@v4, we have opted not to make this argument generate a new Vite app and have instead deprecated it. You can read more about the deprecation in the deprecation guide for dont-use-embroider-option.
As a community-driven open-source project with an ambitious scope, each of these releases serves as a reminder that the Ember project would not have been possible without your continued support. We are extremely grateful to our contributors for their efforts.
In this stream, I work on properties and indexers and explicit implementations.
Dhanji R. Prasanna is the chief technology officer at Block (formerly Square), where he’s managed more than 4,000 engineers over the past two years. Under his leadership, Block has become one of the most AI-native large companies in the world. Before becoming CTO, Dhanji wrote an “AI manifesto” to CEO Jack Dorsey that sparked a company-wide transformation (and his promotion to CTO).
We discuss:
1. How Block’s internal open-source agent, called Goose, is saving employees 8 to 10 hours weekly
2. How the company measures AI productivity gains across technical and non-technical teams
3. Which teams are benefiting most from AI (it’s not engineering)
4. The boring organizational change that boosted productivity even more than AI tools
5. Why code quality has almost nothing to do with product success
6. How to drive AI adoption throughout an organization (hint: leadership needs to use the tools daily)
7. Lessons from building Google Wave, Google+, and other failed products
—
Brought to you by:
Sinch—Build messaging, email, and calling into your product: https://sinch.com/lenny
Figma Make—A prompt-to-code tool for making ideas real: https://www.figma.com/lenny/
Persona—A global leader in digital identity verification: https://withpersona.com/lenny
—
Where to find Dhanji R. Prasanna:
• LinkedIn: https://www.linkedin.com/in/dhanji/
—
Where to find Lenny:
• Newsletter: https://www.lennysnewsletter.com
• X: https://twitter.com/lennysan
• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/
—
In this episode, we cover:
(00:00) Introduction to Dhanji
(05:26) The AI manifesto: convincing Jack Dorsey
(07:33) Transforming into a more AI-native company
(12:05) How engineering teams work differently today
(15:24) Goose: Block’s open-source AI agent
(20:18) Measuring AI productivity gains across teams
(21:38) What Goose is and how it works
(32:15) The future of AI in engineering and productivity
(37:42) The importance of human taste
(40:10) Building vs. buying software
(44:08) How AI is changing hiring and team structure
(53:45) The importance of using AI tools yourself before deploying them
(55:13) How Goose helped solve a personal problem with receipts
(58:01) What makes Goose unique
(59:57) What Dhanji wishes he knew before becoming CTO
(01:01:49) Counterintuitive lessons in product development
(01:04:56) Why controlled chaos can be good for engineering teams
(01:08:07) Core leadership lessons
(01:13:36) Failure corner
(01:15:50) Lightning round and final thoughts
—
Referenced:
• Jack Dorsey on X: https://x.com/jack
• Block: https://block.xyz/
• Square: https://squareup.com/
• Cash App: https://cash.app/
• What is Conway’s Law?: https://www.microsoft.com/en-us/microsoft-365-life-hacks/organization/what-is-conways-law#
• Goose: https://github.com/block/goose
• Gosling: https://github.com/block/goose-mobile
• Salesforce: https://www.salesforce.com/
• Snowflake: https://www.snowflake.com/
• Claude: https://claude.ai/
• Anthropic co-founder on quitting OpenAI, AGI predictions, $100M talent wars, 20% unemployment, and the nightmare scenarios keeping him up at night | Ben Mann: https://www.lennysnewsletter.com/p/anthropic-co-founder-benjamin-mann
• OpenAI: https://openai.com/
• OpenAI’s CPO on how AI changes must-have skills, moats, coding, startup playbooks, more | Kevin Weil (CPO at OpenAI, ex-Instagram, Twitter): https://www.lennysnewsletter.com/p/kevin-weil-open-ai
• Llama: https://www.llama.com/
• Cursor: https://cursor.com/
• The rise of Cursor: The $300M ARR AI tool that engineers can’t stop using | Michael Truell (co-founder and CEO): https://www.lennysnewsletter.com/p/the-rise-of-cursor-michael-truell
• Top Gun: https://www.imdb.com/title/tt0092099/
• Lenny’s vibe-coded Lovable app: https://gdoc-images-grab.lovable.app/
• Afterpay: https://github.com/afterpay
• Bitkey: https://bitkey.world/
• Proto: https://github.com/proto-at-block
• Brad Axen on LinkedIn: https://www.linkedin.com/in/bradleyaxen/
• Databricks: https://www.databricks.com/
• Carl Sagan’s quote: https://www.goodreads.com/quotes/32952-if-you-wish-to-make-an-apple-pie-from-scratch
• Google Wave: https://en.wikipedia.org/wiki/Google_Wave
• Google Video: https://en.wikipedia.org/wiki/Google_Video
• Secret: https://en.wikipedia.org/wiki/Secret_(app)
• Alien Earth on FX: https://www.fxnetworks.com/shows/alien-earth
• Slow Horses on AppleTV+: https://tv.apple.com/us/show/slow-horses/umc.cmc.2szz3fdt71tl1ulnbp8utgq5o
• Fargo TV series on Prime Video: https://www.amazon.com/Fargo-Season-1/dp/B09QGRGH6M
• Steam Deck OLED display: https://www.steamdeck.com/en/oled
• Doc Brown: https://backtothefuture.fandom.com/wiki/Emmett_Brown
—
Recommended books:
• The Master and Margarita: https://www.amazon.com/Master-Margarita-Mikhail-Bulgakov/dp/0802130119
• Tennyson Poems: https://www.amazon.com/Tennyson-Poems-Everymans-Library-Pocket/dp/1400041872/
—
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.
—
Lenny may be an investor in the companies discussed.