
The EV tax credit is gone — now the hard part begins


Hello, and welcome to Decoder! This is Jake Kastrenakes, executive editor at The Verge. I’m filling in for Nilay for one Thursday episode while he’s settling back into full-time podcast hosting duties. 

We’ve got a very good episode for you today. My guest is Verge transportation editor Andy Hawkins, and we’re talking about the federal EV tax credit. The tax credit expired at the end of September, and there are a lot of questions about what happens to the auto industry after its demise.

In its latest form, the tax credit offered a $7,500 discount on eligible, domestically made electric cars. As you’ll hear Andy explain, this was designed to accomplish a lot of different things all at once: prop up the United States’ EV market, fight climate change, and keep pace with China, which has become the global leader in affordable EVs. 

But the second Trump administration has not been kind to the renewable energy movement, and EVs have become a bit of a political football over the last several years. Trump has turned them into a symbol for government overreach and wielded them as weapons to target his enemies. Just a few weeks ago, at the United Nations General Assembly, Trump called climate change a “con job.” So, that’s how he feels about that. 

Now the EV tax credit has expired, and it’s not coming back anytime soon. So where does that leave the auto industry? And what happens to the traditional American carmakers that have been investing heavily in domestic production to electrify their lineups? 

Verge subscribers, don’t forget you get exclusive access to ad-free Decoder wherever you get your podcasts. Head here. Not a subscriber? You can sign up here.

As you’ll hear Andy lay out, there’s a tough road ahead. EVs are expensive to make and expensive to buy. The supply chain they rely on is intertwined with China, and now subject to tariffs and an escalating trade war. And consumers are highly price sensitive when it comes to EVs and new technology that comes with them, much more so than the early adopters who flocked to Tesla years ago.

If the US auto industry wants to win back buyers, it’s going to need to produce cheaper EVs, much like China does. And that’s going to require manufacturing, supply chain, and technology innovations that will take some time to materialize. 

This is a really hard, complicated set of problems, with a lot of moving parts, so I was really excited to have Andy on the show to break down all of these components and give us a clearer picture about what’s coming next.

If you’d like to read more on what we talked about in this episode, check out the links below:

Questions or comments about this episode? Hit us up at decoder@theverge.com. We really do read every email!


Microsoft wants you to talk to your PC and let AI control it


As Microsoft bids farewell to Windows 10 and gets ready to mark the 40-year milestone of its operating system, it’s looking forward to what’s next for Windows. Microsoft might not be ready to announce Windows 12 just yet, but it clearly wants to turn every Windows 11 PC into an AI PC that Copilot controls and users talk to.

“We think we’re on the cusp of the next evolution, where AI happens not just in that chatbot and gets naturally integrated into the hundreds of millions of experiences that people use every day,” says Yusuf Mehdi, executive vice president and consumer chief marketing officer at Microsoft, in a briefing with The Verge. “The vision that we have is: let’s rewrite the entire operating system around AI, and build essentially what becomes truly the AI PC.”

Microsoft is launching a set of capabilities in Windows today that will start to weave AI features into regular Windows 11 PCs, instead of consumers having to buy a special Copilot Plus PC. The biggest change is that Microsoft thinks people will want to talk to their computers and have Copilot take actions on their behalf.

“You should be able to talk to your PC, have it understand you, and then be able to have magic happen from that,” says Mehdi. “With your permission, we want people to be able to share with their AI on Windows what they’re doing and what they’re seeing. The PC should be able to act on your behalf.”

Microsoft is leaning on Copilot’s Voice and Vision capabilities to try to make this a reality, thanks to a new “Hey, Copilot!” wake word that is now rolling out on Windows 11 PCs. “In our minds, voice will now become the third input mechanism to use with your PC,” says Mehdi. “It doesn’t replace the keyboard and mouse necessarily, but it’s an added thing and it will be pretty profound and a new way to do it.”

We’ve been here many times before. Microsoft tried to convince people to use Cortana on Windows 10 PCs a decade ago, and has added various voice features to Windows for accessibility over the past 40 years. Microsoft is now convinced that AI will somehow spark a change in behavior and convince people that talking to a PC isn’t weird.

“All the data that we see is when people use voice, they love it,” says Mehdi. Some of that data is the billions of minutes that people spend talking in Microsoft Teams meetings. “They’re talking through their computers today, and I think this change to ‘talk with and talk to’ will come to reality and we’ll see this thing really take off,” says Mehdi.

I’m not convinced that most people want to talk to their computer, even if it’s great for accessibility and scenarios like getting help with apps. “Doctors are taking transcriptions while they’re performing examinations, people use it for searching, and our work with the accessibility community has taught us a lot about how to make voice access and voice typing really valuable,” says Mehdi.

For AI to control a PC and take actions on behalf of the user, it must first be granted access to see what’s on the screen. Microsoft has been testing Copilot Vision in recent months, a feature that can scan everything on your screen and coach you through using apps or answer questions about photos and documents.

Copilot Vision is now rolling out worldwide in all markets where Copilot is available, and it will let you get help using apps, troubleshoot PC problems, learn new tasks, and even get step-by-step guidance in games. Unlike the Recall feature that automatically takes a snapshot of your PC, Copilot Vision is an opt-in feature where you essentially stream what you’re seeing on your screen much like you would in a Teams call.

The next step beyond Copilot Vision is Copilot Actions, allowing Microsoft’s AI assistant to take actions on a local PC, like making edits to a folder full of photos. Microsoft is starting to test these actions on Windows PCs through a preview program, limited to a narrow set of use cases while Microsoft optimizes the AI model.

“In the beginning you might see the agent make some mistakes, or encounter some challenges when trying to use some really complex applications,” explains Navjot Virk, corporate vice president of Windows Experiences. An AI agent making mistakes using a computer doesn’t fill me with confidence, which is probably why Microsoft is limiting this to Copilot Labs for now. “We’re absolutely committed to learning from how people use it, and we want to continue to improve the experience to make it more capable and streamlined over time,” says Virk.

Copilot Actions launches in a separate Windows desktop “secure and contained environment,” and uses an AI agent to complete the job you’ve asked it to do. You can have it running in the background while you go ahead and do something else, and it will list all the steps it’s taking, so you can sit and watch it complete them.

Microsoft is also integrating Copilot into the Windows taskbar, with one-click access to these new Copilot Vision and Voice features. It also has a new integrated search experience to make it faster to find local files, apps, and settings.

After the Recall fiasco last year, I think Microsoft will have a hard time convincing people to trust its Copilot Vision and Copilot Actions features, and an equally challenging time getting people to talk to their PCs. That’s not stopping Microsoft from trying, though. The company is planning to run television ads that highlight these new AI features in Windows 11, with the tagline “meet the computer you can talk to.”

The ads coincide with the end-of-support phase for Windows 10 earlier this week, and Microsoft is once again promoting Windows 11 PCs that consumers can upgrade to. “We want every person making the move to experience what it means to have a PC that’s not just a tool, but a true partner,” says Mehdi.


Extortion and ransomware drive over half of cyberattacks


In 80% of the cyber incidents Microsoft’s security teams investigated last year, attackers sought to steal data—a trend driven more by financial gain than intelligence gathering. According to the latest Microsoft Digital Defense Report, written with our Chief Information Security Officer Igor Tsyganskiy, over half of cyberattacks with known motives were driven by extortion or ransomware. That’s at least 52% of incidents fueled by financial gain, while attacks focused solely on espionage made up just 4%. Nation-state threats remain a serious and persistent threat, but most of the immediate attacks organizations face today come from opportunistic criminals looking to make a profit.

Every day, Microsoft processes more than 100 trillion signals, blocks approximately 4.5 million new malware attempts, analyzes 38 million identity risk detections, and screens 5 billion emails for malware and phishing. Advances in automation and readily available off-the-shelf tools have enabled cybercriminals—even those with limited technical expertise—to expand their operations significantly. The use of AI has further added to this trend with cybercriminals accelerating malware development and creating more realistic synthetic content, enhancing the efficiency of activities such as phishing and ransomware attacks. As a result, opportunistic malicious actors now target everyone—big or small—making cybercrime a universal, ever-present threat that spills into our daily lives.

In this environment, organizational leaders must treat cybersecurity as a core strategic priority—not just an IT issue—and build resilience into their technology and operations from the ground up. In our sixth annual Microsoft Digital Defense Report, which covers trends from July 2024 through June 2025, we highlight that legacy security measures are no longer enough; we need modern defenses leveraging AI and strong collaboration across industries and governments to keep pace with the threat. For individuals, simple steps like using strong security tools—especially phishing-resistant multifactor authentication (MFA)—make a big difference, as MFA can block over 99% of identity-based attacks. Below are some of the key findings.

Figure: Countries most frequently impacted by cyber threats from January to June 2025. The United States leads at 24.8%, followed by the United Kingdom, Israel, and Germany.

Critical services are prime targets with a real-world impact

Malicious actors remain focused on attacking critical public services—targets that, when compromised, can have a direct and immediate impact on people’s lives. Hospitals and local governments, for example, are prime targets because they store sensitive data or operate on tight cybersecurity budgets with limited incident response capabilities, which often results in outdated software. In the past year, cyberattacks on these sectors had real-world consequences, including delayed emergency medical care, disrupted emergency services, canceled school classes, and halted transportation systems.

Ransomware actors in particular focus on these critical sectors because of the targets’ limited options. For example, a hospital must quickly restore its encrypted systems, or patients could die, potentially leaving no other recourse but to pay. Additionally, governments, hospitals, and research institutions store sensitive data that criminals can steal and monetize through illicit marketplaces on the dark web, fueling downstream criminal activity. Government and industry can collaborate to strengthen cybersecurity in these sectors—particularly for the most vulnerable. These efforts are critical to protecting communities and ensuring continuity of care, education, and emergency response.

Nation-state actors are expanding operations

While cybercriminals are the biggest cyber threat by volume, nation-state actors still target key industries and regions, expanding their focus on espionage and, in some cases, on financial gain. Geopolitical objectives continue to drive a surge in state-sponsored cyber activity, with a notable expansion in targeting communications, research, and academia.

Key insights:

  • China is continuing its broad push across industries to conduct espionage and steal sensitive data. State-affiliated actors are increasingly attacking non-governmental organizations (NGOs) to expand their insights and are using covert networks and vulnerable internet-facing devices to gain entry and avoid detection. They have also become faster at operationalizing newly disclosed vulnerabilities.
  • Iran is going after a wider range of targets than ever before, from the Middle East to North America, as part of broadening espionage operations. Recently, three Iranian state-affiliated actors attacked shipping and logistics firms in Europe and the Persian Gulf to gain ongoing access to sensitive commercial data, raising the possibility that Iran may be pre-positioning to have the ability to interfere with commercial shipping operations.
  • Russia, while still focused on the war in Ukraine, has expanded its targets. For example, Microsoft has observed Russian state-affiliated actors targeting small businesses in countries supporting Ukraine. In fact, outside of Ukraine, the top ten countries most affected by Russian cyber activity all belong to the North Atlantic Treaty Organization (NATO)—a 25% increase compared to last year. Russian actors may view these smaller companies as possibly less resource-intensive pivot points they can use to access larger organizations. These actors are also increasingly leveraging the cybercriminal ecosystem for their attacks.
  • North Korea remains focused on revenue generation and espionage. In a trend that has gained significant attention, thousands of state-affiliated North Korean remote IT workers have applied for jobs with companies around the world, sending their salaries back to the government as remittances. When discovered, some of these workers have turned to extortion as another approach to bringing in money for the regime.

The cyber threats posed by nation-states are becoming more expansive and unpredictable. In addition, the shift by at least some nation-state actors to further leveraging the cybercriminal ecosystem will make attribution even more complicated. This underscores the need for organizations to stay abreast of the threats to their industries and work with both industry peers and governments to confront the threats posed by nation-state actors.

2025 saw an escalation in the use of AI by both attackers and defenders

Over the past year, both attackers and defenders harnessed the power of generative AI. Threat actors are using AI to boost their attacks by automating phishing, scaling social engineering, creating synthetic media, finding vulnerabilities faster, and creating malware that can adapt itself. Nation-state actors, too, have continued to incorporate AI into their cyber influence operations. This activity has picked up in the past six months as actors use the technology to make their efforts more advanced, scalable, and targeted.

For defenders, AI is also proving to be a valuable tool. Microsoft, for example, uses AI to spot threats, close detection gaps, catch phishing attempts, and protect vulnerable users. As both the risks and opportunities of AI rapidly evolve, organizations must prioritize securing their AI tools and training their teams. Everyone—from industry to government—must be proactive to keep pace with increasingly sophisticated attackers and to ensure that defenders keep ahead of adversaries.

Adversaries aren’t breaking in; they’re signing in

Amid the growing sophistication of cyber threats, one statistic stands out: more than 97% of identity attacks are password attacks. In the first half of 2025 alone, identity-based attacks surged by 32%. That means the vast majority of malicious sign-in attempts an organization might receive are via large-scale password guessing attempts. Attackers get usernames and passwords (“credentials”) for these bulk attacks largely from credential leaks.

However, credential leaks aren’t the only place where attackers can obtain credentials. This year, we saw a surge in the use of infostealer malware by cybercriminals. Infostealers can secretly gather credentials and information about your online accounts, like browser session tokens, at scale. Cybercriminals can then buy this stolen information on cybercrime forums, making it easy for anyone to access accounts for purposes such as the delivery of ransomware.

Luckily, the solution to identity compromise is simple. The implementation of phishing-resistant multifactor authentication (MFA) can stop over 99% of this type of attack even if the attacker has the correct username and password combination. To target the malicious supply chain, Microsoft’s Digital Crimes Unit (DCU) is fighting back against the cybercriminal use of infostealers. In May, the DCU disrupted the most popular infostealer—Lumma Stealer—alongside the US Department of Justice and Europol.

Moving forward: Cybersecurity is a shared defensive priority

As threat actors grow more sophisticated, persistent, and opportunistic, organizations must stay vigilant, continually updating their defenses and sharing intelligence. Microsoft remains committed to doing its part to strengthen our products and services via our Secure Future Initiative. We also continue to collaborate with others to track threats, alert targeted customers, and share insights with the broader public when appropriate.

However, security is not only a technical challenge but a governance imperative. Defensive measures alone are not enough to deter nation-state adversaries. Governments must build frameworks that signal credible and proportionate consequences for malicious activity that violates international rules. Encouragingly, governments are increasingly attributing cyberattacks to foreign actors and imposing consequences such as indictments and sanctions. This growing transparency and accountability are important steps toward building collective deterrence. As digital transformation accelerates—amplified by the rise of AI—cyber threats pose risks to economic stability, governance, and personal safety. Addressing these challenges requires not only technical innovation but coordinated societal action.



Generative AI in the Real World: Context Engineering with Drew Breunig


In this episode, Ben Lorica and Drew Breunig, a strategist at the Overture Maps Foundation, talk all things context engineering: what’s working, where things are breaking down, and what comes next. Listen in to hear why huge context windows aren’t solving the problems we hoped they might, why companies shouldn’t discount evals and testing, and why we’re doing the field a disservice by leaning into marketing and buzzwords rather than trying to leverage what the current crop of LLMs is actually capable of.

About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

Check out other episodes of this podcast on the O’Reilly learning platform.

Transcript

This transcript was created with the help of AI and has been lightly edited for clarity.

00.00: All right. So today we have Drew Breunig. He is a strategist at the Overture Maps Foundation. And he’s also in the process of writing a book for O’Reilly called the Context Engineering Handbook. And with that, Drew, welcome to the podcast.

00.23: Thanks, Ben. Thanks for having me on here. 

00.26: So context engineering. . . I remember before ChatGPT was even released, someone was talking to me about prompt engineering. I said, “What’s that?” And then of course, fast-forward to today, now people are talking about context engineering. And I guess the short definition is it’s the delicate art and science of filling the context window with just the right information. What’s broken with how teams think about context today? 

00.56: I think it’s important to talk about why we need a new word or why a new word makes sense. I was just talking with Mike Taylor, who wrote the prompt engineering book for O’Reilly, exactly about this and why we need a new word. Why is prompt engineering not good enough? And I think it has to do with the way the models, and the way they’re being built, are evolving. I think it also has to do with the way that we’re learning how to use these models. 

And so prompt engineering was a natural word to think about when your interaction and how you program the model was maybe one turn of conversation, maybe two, and you might pull in some context to give it examples. You might do some RAG and context augmentation, but you’re working with this one-shot service. And that was really similar to the way people were working in chatbots. And so prompt engineering started to evolve as this thing. 

02.00: But as we started to build agents and as companies started to develop models that were capable of multiturn tool-augmented reasoning usage, suddenly you’re not using that one prompt. You have a context that is sometimes being prompted by you, sometimes being modified by your software harness around the model, sometimes being modified by the model itself. And increasingly the model is starting to manage that context. And that prompt is very user-centric. It is a user giving that prompt. 

But when we start to have this multiturn, systematic editing and preparation of contexts, a new word was needed, which is this idea of context engineering. This is not to belittle prompt engineering. I think it’s an evolution. And it shows how we’re evolving and finding this space in real time. I think context engineering is more suited to agents and applied AI programming, whereas prompt engineering lives in how people use chatbots, which is a different field. It’s not better and not worse. 

And so context engineering is more specific to understanding the failure modes that occur, diagnosing those failure modes and establishing good practices for both preparing your context but also setting up systems that fix and edit your context, if that makes sense. 

03.33: Yeah, and also, it seems like the words themselves are indicative of the scope, right? So “prompt” engineering means it’s the prompt. So you’re fiddling with the prompt. And [with] context engineering, “context” can be a lot of things. It could be the information you retrieve. It might involve RAG, so you retrieve information. You put that in the context window. 

04.02: Yeah. And people were doing that with prompts too. But I think in the beginning we just didn’t have the words. And that word became a big empty bucket that we filled up. You know, the quote I always quote too often, but I find it fitting, is one of my favorite quotes from Stewart Brand, which is, “If you want to know where the future is being made, follow where the lawyers are congregating and the language is being invented,” and the arrival of context engineering as a word came after the field was invented. It just kind of crystallized and demarcated what people were already doing. 

04.36: So the word “context” means you’re providing context. So context could be a tool, right? It could be memory. Whereas the word “prompt” is much more specific. 

04.55: And I think it also is like, it has to be edited by a person. I’m a big advocate for not using anthropomorphizing words around large language models. “Prompt” to me involves agency. And so I think it’s nice—it’s a good delineation. 

05.14: And then I think one of the very immediate lessons that people realize is, just because. . . 

So one of the things these model providers note when they have a model release is the size of the context window. So people started associating the context window with "I stuff as much as I can in there." But the reality is actually that, one, it’s not efficient. And two, it also is not useful to the model. Just because you have a massive context window doesn’t mean that the model treats the entire context window evenly.

05.57: Yeah, it doesn’t treat it evenly. And it’s not a one-size-fits-all solution. So I don’t know if you remember last year, but that was the big dream, which was, “Hey, we’re doing all this work with RAG and augmenting our context. But wait a second, if we can make the context 1 million tokens, 2 million tokens, I don’t have to run RAG on all of my corporate documents. I can just fit it all in there, and I can constantly be asking this. And if we can do this, we essentially have solved all of the hard problems that we were worrying about last year.” And so that was the big hope. 

And you started to see an arms race of everybody trying to make bigger and bigger context windows to the point where, you know, Llama 4 had its spectacular flameout. It was rushed out the door. But the headline feature by far was “We will be releasing a 10 million token context window.” And the thing that everybody realized is. . .  Like, all right, we were really hopeful for that. And then as we started building with these context windows, we started to realize there were some big limitations around them.

07.01: Perhaps the thing that clicked for me was in Google’s Gemini 2.5 paper. Fantastic paper. And one of the reasons I love it is because they dedicate about four pages in the appendix to talking about the kind of methodology and harnesses they built so that they could teach Gemini to play Pokémon: how to connect it to the game, how to actually read out the state of the game, how to make choices about it, what tools they gave it, all of these other things.

And buried in there was a real “warts and all” case study, which are my favorite when you talk about the hard things and especially when you cite the things you can’t overcome. And Gemini 2.5 was a million-token context window with, eventually, 2 million tokens coming. But in this Pokémon thing, they said, “Hey, we actually noticed something, which is once you get to about 200,000 tokens, things start to fall apart, and they fall apart for a host of reasons. They start to hallucinate. One of the things that is really demonstrable is they start to rely more on the context knowledge than the weights knowledge. 

08.22: So inside every model there’s a knowledge base. There’s, you know, all of these other things that get kind of buried into the parameters. But when you reach a certain level of context, it starts to overload the model, and it starts to rely more on the examples in the context. And so this means that you are not taking advantage of the full strength or knowledge of the model. 

08.43: So that’s one way it can fail. We call this “context distraction,” though Kelly Hong at Chroma has written an incredible paper documenting this, which she calls “context rot,” which is a similar way [of] charting when these benchmarks start to fall apart.

Now the cool thing about this is that you can actually use this to your advantage. There’s another paper out of, I believe, the Harvard Interaction Lab, where they look at these inflection points for. . . 

09.13: Are you familiar with the term “in-context learning”? In-context learning is when you teach the model to do something it doesn’t know how to do by providing examples in your context. And those examples illustrate how it should perform. It’s not something that it’s seen before. It’s not in the weights. It’s a completely unique problem. 

Well, sometimes those in-context learning[s] are counter to what the model has learned in the weights. So they end up fighting each other, the weights and the context. And this paper documented that when you get over a certain context length, you can overwhelm the weights and you can force it to listen to your in-context examples.

09.57: And so all of this is just to try to illustrate the complexity of what’s going on here and how I think one of the traps that leads us to this place is that the gift and the curse of LLMs is that we prompt and build contexts that are in the English language or whatever language you speak. And so that leads us to believe that they’re going to react like other people or entities that read the English language.

And the fact of the matter is, they don’t—they’re reading it in a very specific way. And that specific way can vary from model to model. And so you have to systematically approach this to understand these nuances, which is where the context management field comes in. 

10.35: This is interesting because even before those papers came out, there were studies which showed the exact opposite problem, which is the following: You may have a RAG system that actually retrieves the right information, but then somehow the LLMs can still fail because, as you alluded to, they have weights so they have prior beliefs. You saw something [on] the internet, and they will opine against the precise information you retrieve from the context. 

11.08: This is a really big problem. 

11.09: So this is true even if the context window’s small actually. 

11.13: Yeah, and Ben, you touched on something that’s really important. So in my original blog post, I document four ways that context fails. I talk about “context poisoning.” That’s when you hallucinate something in a long-running task and it stays in there, and so it’s continually confusing it. “Context distraction,” which is when you overwhelm that soft limit to the context window and then you start to perform poorly. “Context confusion”: This is when you put things that aren’t relevant to the task inside your context, and suddenly they think the model thinks that it has to pay attention to this stuff and it leads them astray. And then the last thing is “context clash,” which is when there’s information in the context that’s at odds with the task that you are trying to perform. 

A good example of this is, say you’re asking the model to only reply in JSON, but you’re using MCP tools that are defined with XML. And so you’re creating this backwards thing. But I think there’s a fifth piece that I need to write about because it keeps coming up. And it’s exactly what you described.

12.23: Douwe [Kiela] over at Contextual AI refers to this as “context” or “prompt adherence.” But the term that keeps sticking in my mind is this idea of fighting the weights. There’s three situations you get yourself into when you’re interacting with an LLM. The first is when you’re working with the weights. You’re asking it a question that it knows how to answer. It’s seen many examples of that answer. It has it in its knowledge base. It comes back with the weights, and it can give you a phenomenal, detailed answer to that question. That’s what I call “working with the weights.” 

The second is what we referred to earlier, which is that in-context learning, which is you’re doing something that it doesn’t know about and you’re showing an example, and then it does it. And this is great. It’s wonderful. We do it all the time. 

But then there’s a third example which is, you’re providing it examples. But those examples are at odds with some things that it had learned usually during posttraining, during the fine-tuning or RL stage. A really good example is format outputs. 

13.34: Recently a friend of mine was updating his pipeline to try out a new model, Moonshots. A really great model and really great model for tool use. And so he just changed his model and hit run to see what happened. And he kept failing—his thing couldn’t even work. He’s like, “I don’t understand. This is supposed to be the best tool use model there is.” And he asked me to look at his code.

I looked at his code and he was extracting data using Markdown, essentially: “Put the final answer in an ASCII box and I’ll extract it that way.” And I said, “If you change this to XML, see what happens. Ask it to respond in XML, use XML as your formatting, and see what happens.” He did that. That one change passed every test. Like basically crushed it because it was working with the weights. He wasn’t fighting the weights. Everyone’s experienced this if you build with AI: the stubborn things it refuses to do, no matter how many times you ask it, including formatting. 

14.35: [Here’s] my favorite example of this though, Ben: So in ChatGPT’s web interface or their application interface, if you go there and you try to prompt an image, a lot of the images that people prompt—and I’ve talked to user research about this—are really boring prompts. They have a text box that can be anything, and they’ll say something like “a black cat” or “a statue of a man thinking.”

OpenAI realized this was leading to a lot of bad images because the prompt wasn’t detailed; it wasn’t a good prompt. So they built a system that recognizes if your prompt is too short, low detail, bad, and it hands it to another model and says, “Improve this prompt,” and it improves the prompt for you. And if you inspect in Chrome or Safari or Firefox, whatever, you inspect the developer settings, you can see the JSON being passed back and forth, and you can see your original prompt going in. Then you can see the improved prompt. 

15.36: My favorite example of this [is] I asked it to make a statue of a man thinking, and it came back and said something like “A detailed statue of a human figure in a thinking pose similar to Rodin’s ‘The Thinker.’ The statue is made of weathered stone sitting on a pedestal. . .” Blah blah blah blah blah blah. A paragraph. . . But below that prompt there were instructions to the chatbot or to the LLM that said, “Generate this image and after you generate the image, do not reply. Do not ask follow up questions. Do not ask. Do not make any comments describing what you’ve done. Just generate the image.” And in this prompt, then nine times, some of them in all caps, they say, “Please do not reply.” And the reason is because a big chunk of OpenAI’s posttraining is teaching these models how to converse back and forth. They want you to always be asking a follow-up question and they train it. And so now they have to fight the prompts. They have to add in all these statements. And that’s another way that fails. 

16.42: So why I bring this up—and this is why I need to write about it—is as an applied AI developer, you need to recognize when you’re fighting the prompt, understand enough about the posttraining of that model, or make some assumptions about it, so that you can stop doing that and try something different, because you’re just banging your head against a wall and you’re going to get inconsistent, bad applications and the same statement 20 times over. 

17.07: By the way, the other thing that’s interesting about this whole topic is, people actually somehow have underappreciated or forgotten all of the progress we’ve made in information retrieval. There’s a whole. . . I mean, these people have their own conferences, right? Everything from reranking to the actual indexing, even with vector search—the information retrieval community still has a lot to offer, and it’s the kind of thing that people underappreciated. And so by simply loading your context window with massive amounts of garbage, you’re actually leaving so much progress in information retrieval on the field.

18.04: I do think it’s hard. And that’s one of the risks: We’re building all this stuff so fast from the ground up, and there’s a tendency to just throw everything into the biggest model possible and then hope it sorts it out.

I really do think there’s two pools of developers. There’s the “throw everything in the model” pool, and then there’s the “I’m going to take incremental steps and find the most optimal model.” And I often find that latter group, which I called a compound AI group after a paper that was published out of Berkeley, those tend to be people who have run data pipelines, because it’s not just a simple back and forth interaction. It’s gigabytes or even more of data you’re processing with the LLM. The costs are high. Latency is important. So designing efficient systems is actually incredibly key, if not a total requirement. So there’s a lot of innovation that comes out of that space because of that kind of boundary.

19.08: If you were to talk to one of these applied AI teams and you were to give them one or two things that they can do right away to improve, or fix context in general, what are some of the best practices?

19.29: Well you’re going to laugh, Ben, because the answer is dependent on the context, and I mean the context in the team and what have you. 

19.38: But if you were to just go give a keynote to a general audience, if you were to list down one, two, or three things that are the lowest hanging fruit, so to speak. . .

19.50: The first thing I’m gonna do is I’m going to look in the room and I’m going to look at the titles of all the people in there, and I’m going to see if they have any subject-matter experts or if it’s just a bunch of engineers trying to build something for subject-matter experts. And my first bit of advice is you need to get yourself a subject-matter expert who is looking at the data, helping you with the eval data, and telling you what “good” looks like. 

I see a lot of teams that don’t have this, and they end up building fairly brittle prompt systems. And then they can’t iterate well, and so that enterprise AI project fails. I also see them not wanting to open themselves up to subject-matter experts, because they want to hold on to the power themselves. It’s not how they’re used to building. 

20.38: I really do think building in applied AI has changed the power dynamic between builders and subject-matter experts. You know, we were talking earlier about some of like the old Web 2.0 days and I’m sure you remember. . . Remember back at the beginning of the iOS app craze, we’d be at a dinner party and someone would find out that you’re capable of building an app, and you would get cornered by some guy who’s like “I’ve got a great idea for an app,” and he would just talk at you—usually a he. 

21.15: This is back in the Objective-C days. . .

21.17: Yes, way back when. And this is someone who loves Objective-C. So you’d get cornered and you’d try to find a way out of that awkward conversation. Nowadays, that dynamic has shifted. The subject-matter expertise is so important for codifying and designing the spec, which usually gets specced out by the evals that it leads itself to more. And you can even see this. OpenAI is arguably creating and at the forefront of this stuff. And what are they doing? They’re standing up programs to get lawyers to come in, to get doctors to come in, to get these specialists to come in and help them create benchmarks because they can’t do it themselves. And so that’s the first thing. Got to work with the subject-matter expert. 

22.04: The second thing is if they’re just starting out—and this is going to sound backwards, given our topic today—I would encourage them to use a system like DSPy or GEPA, which are essentially frameworks for building with AI. And one of the components of that framework is that they optimize the prompt for you with the help of an LLM and your eval data. 

22.37: Throw in BAML?

22.39: BAML is similar [but it’s] more like the spec for how to describe the entire spec. So it’s similar.

22.52: BAML and TextGrad? 

22.55: TextGrad is more like the prompt optimization I’m talking about. 

22.57: TextGrad plus GEPA plus Regolo?

23.02: Yeah, those things are really important. And the reason I say they’re important is. . .

23.08: I mean, Drew, those are kind of advanced topics. 

23.12: I don’t think they’re that advanced. I think they can appear really intimidating because everybody comes in and says, “Well, it’s so easy. I could just write what I want.” And this is the gift and curse of prompts, in my opinion. There’s a lot of things to like about them.

23.33: DSPy is fine, but I think TextGrad, GEPA, and Regolo. . .

23.41: Well. . . I wouldn’t encourage you to use GEPA directly. I would encourage you to use it through the framework of DSPy. 

23.48: The point here is if it’s a team building, you can go down essentially two paths. You can handwrite your prompt, and I think this creates some issues. One is as you build, you tend to have a lot of hotfix statements like, “Oh, there’s a bug over here. We’ll say it over here. Oh, that didn’t fix it. So let’s say it again.” It will encourage you to have one person who really understands this prompt. And so you end up being reliant on this prompt magician. Even though they’re written in English, there’s kind of no syntax highlighting. They get messier and messier as you build the application because they start to grow and become these growing collections of edge cases.

24.27: And the other thing too, and this is really important, is when you build and you spend so much time honing a prompt, you’re doing it against one model, and then at some point there’s going to be a better, cheaper, more effective model. And you’re going to have to go through the process of tweaking it and fixing all the bugs again, because this model functions differently.

And I used to have to try to convince people that this was a problem, but they all kind of found out when OpenAI deprecated all of their models and tried to move everyone over to GPT-5. And now I hear about it all the time. 

25.03: Although I think right now “agents” is our hot topic, right? So we talk to people about agents and you start really getting into the weeds, you realize, “Oh, okay. So their agents are really just prompts.” 

25.16: In the loop. . .

25.19: So agent optimization in many ways means injecting a bit more software engineering rigor in how you maintain and version. . .

25.30: Because that context is growing. As that loop goes, you’re deciding what gets added to it. And so you have to put guardrails in—ways to rescue from failure and figure out all these things. It’s very difficult. And you have to go at it systematically. 

25.46: And then the problem is that, in many situations, the models are not even models that you control, actually. You’re using them through an API like OpenAI or Claude so you don’t actually have access to the weights. So even if you’re one of the super, super advanced teams that can do gradient descent and backprop, you can’t do that. Right? So then, what are your options for being more rigorous in doing optimization?

Well, it’s precisely these tools that Drew alluded to, which is the TextGrads of the world, the GEPA. You have these compound systems that are nondifferentiable. So then how do you actually do optimization in a world where you have things that are not differentiable? Right. So these are precisely the tools that will allow you to turn it from somewhat of a, I guess, black art to something with a little more discipline. 

26.53: And I think a good example is, even if you aren’t going to use prompt optimization-type tools. . . The prompt optimization is a great solution for what you just described, which is when you can’t control the weights of the models you’re using. But the other thing too, is, even if you aren’t going to adopt that, you need to get evals because that’s going to be step one for anything, which is you need to start working with subject-matter experts to create evals.

27.22: Because what I see. . . And there was just a really dumb argument online of “Are evals worth it or not?” And it was really silly to me because it was positioned as an either-or argument. And there were people arguing against evals, which is just insane to me. And the reason they were arguing against evals is they’re basically arguing in favor of what they called, to your point about dark arts, vibe shipping—which is they’d make changes, push those changes, and then the person who was also making the changes would go in and type in 12 different things and say, “Yep, feels right to me.” And that’s insane to me. 

27.57: And even if you’re doing that—which I think is a good thing and you may not go create coverage and eval, you have some taste. . . And I do think when you’re building more qualitative tools. . . So a good example is like if you’re Character.AI or you’re Portola Labs, who’s building essentially personalized emotional chatbots, it’s going to be harder to create evals and it’s going to require taste as you build them. But having evals is going to ensure that your whole thing didn’t fall apart because you changed one sentence, which sadly is a risk because these are probabilistic software.

28.33: Honestly, evals are super important. Number one, because, basically, leaderboards like LMArena are great for narrowing your options. But at the end of the day, you still need to benchmark all of these against your own application use case and domain. And then secondly, obviously, it’s an ongoing thing. So it ties in with reliability. The more reliable your application is, that means most likely you’re doing evals properly in an ongoing fashion. And I really believe that eval and reliability are a moat, because basically what else is your moat? Prompt? That’s not a moat. 

29.21: So first off, violent agreement there. The only asset teams truly have—unless they’re a model builder, which is only a handful—is their eval data. And I would say the counterpart to that is their spec, whatever defines their program, but mostly the eval data. But to the other point about it, like why are people vibe shipping? I think you can get pretty far with vibe shipping and it fools you into thinking that that’s right.

We saw this pattern in the Web 2.0 and social era, which was, you would have the product genius—everybody wanted to be the Steve Jobs, who didn’t hold focus groups, didn’t ask their customers what they wanted. The Henry Ford quote about “They all say faster horses,” and I’m the genius who comes in and tweaks these things and ships them. And that often takes you very far.

30.13: I also think it’s a bias of success. We only know about the ones that succeed, but the best ones, when they grow up and they start to serve an audience that’s way bigger than what they could hold in their head, they start to grow up with AB testing and ABX testing throughout their organization. And a good example of that is Facebook.

Facebook stopped being just some choices and started having to do testing and ABX testing in every aspect of their business. Compare that to Snap, which again, was kind of the last of the great product geniuses to come out. Evan [Spiegel] was heralded as “He’s the product genius,” but I think they ran that too long, and they kept shipping on vibes rather than shipping on ABX testing and growing and, you know, being more boring.

31.04: But again, that’s how you get the global reach. I think there’s a lot of people who probably are really great vibe shippers. And they’re probably having great success doing that. The question is, as their company grows and starts to hit harder times or the growth starts to slow, can that vibe shipping take them over the hump? And I would argue, no, I think you have to grow up and start to have more accountable metrics that, you know, scale to the size of your audience. 

31.34: So in closing. . . We talked about prompt engineering. And then we talked about context engineering. So putting you on the spot. What’s a buzzword out there that either irks you or you think is undertalked about at this point? So what’s a buzzword out there, Drew? 

31.57: [laughs] I mean, I wish you had given me some time to think about it. 

31.58: We are in a hype cycle here. . .

32.02: We’re always in a hype cycle. I don’t like anthropomorphizing LLMs or AI for a whole host of reasons. One, I think it leads to bad understanding and bad mental models, which means that we don’t have substantive conversations about these things, and we don’t learn how to build really well with them because we think they’re intelligent. We think they’re a PhD in your pocket. We think they’re all of these things and they’re not—they’re fundamentally different. 

I’m not against using the way we think the brain works for inspiration. That’s fine with me. But when you start oversimplifying these and not taking the time to explain to your audience how they actually work—you just say it’s a PhD in your pocket, and here’s the benchmark to prove it—you’re misleading and setting unrealistic expectations. And unfortunately, the market rewards them for that. So they keep going. 

But I also think it just doesn’t help you build sustainable programs because you aren’t actually understanding how it works. You’re just kind of reducing it down to it. AGI is one of those things. And superintelligence, but AGI especially.

33.21: I went to school at UC Santa Cruz, and one of my favorite classes I ever took was a seminar with Donna Haraway. Donna Haraway wrote “A Cyborg Manifesto” in the ’80s. She’s kind of a tech science history feminist lens. You would just sit in that class and your mind would explode, and then at the end, you just have to sit there for like five minutes afterwards, just picking up the pieces. 

She had a great term called “power objects.” A power object is something that we as a society recognize to be incredibly important, believe to be incredibly important, but we don’t know how it works. That lack of understanding allows us to fill this bucket with whatever we want it to be: our hopes, our fears, our dreams. This happened with DNA; this happened with PET scans and brain scans. This happens all throughout science history, down to phrenology and blood types and things that we understand to be, or we believed to be, important, but they’re not. And big data, another one that is very, very relevant. 

34.34: That’s my handle on Twitter. 

34.55: Yeah, there you go. So like it’s, you know, I fill it with Ben Lorica. That’s how I fill that power object. But AI is definitely that. AI is definitely that. And my favorite example of this is when the DeepSeek moment happened, we understood this to be really important, but we didn’t understand why it works and how well it worked.

And so what happened is, if you looked at the news and you looked at people’s reactions to what DeepSeek meant, you could basically find all the hopes and dreams about whatever was important to that person. So to AI boosters, DeepSeek proved that LLM progress is not slowing down. To AI skeptics, DeepSeek proved that AI companies have no moat. To open source advocates, it proved open is superior. To AI doomers, it proved that we aren’t being careful enough. Security researchers worried about the risk of backdoors in the models because it was in China. Privacy advocates worried about DeepSeek’s web services collecting sensitive data. China hawks said, “We need more sanctions.” Doves said, “Sanctions don’t work.” NVIDIA bears said, “We’re not going to need any more data centers if it’s going to be this efficient.” And bulls said, “No, we’re going to need tons of them because it’s going to use everything.”

35.44: And AGI is another term like that, which means everything and nothing. And when the point comes that we’ve supposedly reached it, it isn’t. And compounding that is that it’s in the contract between OpenAI and Microsoft—I forget the exact term, but it’s the statement that Microsoft gets access to OpenAI’s technologies until AGI is achieved.

And so it’s a very loaded definition right now that’s being debated back and forth and trying to figure out how to take [Open]AI into being a for-profit corporation. And Microsoft has a lot of leverage because how do you define AGI? Are we going to go to court to define what AGI is? I almost look forward to that.

36.28: So because it’s going to be that thing, and you’ve seen Sam Altman come out and some days he talks about how LLMs are just software. Some days he talks about how it’s a PhD in your pocket, some days he talks about how we’ve already passed AGI, it’s already over. 

I think Nathan Lambert has some great writing about how AGI is a mistake. We shouldn’t talk about trying to turn LLMs into humans. We should try to leverage what they do now, which is something fundamentally different, and we should keep building and leaning into that rather than trying to make them like us. So AGI is my word for you. 

37.03: The way I think of it is, AGI is great for fundraising, let’s put it that way. 

37.08: That’s basically it. Well, until you need it to have already been achieved, or until you need it to not be achieved because you don’t want any regulation or if you want regulation—it’s kind of a fuzzy word. And that has some really good properties. 

37.23: So I’ll close by throwing in my own term. So prompt engineering, context engineering. . . I will close by saying pay attention to this boring term, which my friend Ion Stoica is now talking more about “systems engineering.” If you look at particularly the agentic applications, you’re talking about systems.

37.55: Can I add one thing to this? Violent agreement. I think that is an underrated. . . 

38.00: Although I think it’s too boring a term, Drew, to take off.

38.03: That’s fine! The reason I like it is because—and you were talking about this when you talk about fine-tuning—is, looking at the way people build and looking at the way I see teams with success build, there’s pretraining, where you’re basically training on unstructured data and you’re just building your base knowledge, your base English capabilities and all that. And then you have posttraining. And in general, posttraining is where you build. I do think of it as a form of interface design, even though you are adding new skills, but you’re teaching reasoning, you’re teaching it validated functions like code and math. You’re teaching it how to chat with you. This is where it learns to converse. You’re teaching it how to use tools and specific sets of tools. And then you’re teaching it alignment, what’s safe, what’s not safe, all these other things. 

But then after it ships, you can still RL that model, you can still fine-tune that model, and you can still prompt engineer that model, and you can still context engineer that model. And back to the systems engineering thing is, I think we’re going to see that posttraining all the way through to a final applied AI product. That’s going to be a real shades-of-gray gradient. It’s going to be. And this is one of the reasons why I think open models have a pretty big advantage in the future is that you’re going to dip down the way throughout that, leverage that. . .

39.32: The only thing that’s keeping us from doing that now is we don’t have the tools and the operating system to align throughout that posttraining to shipping. Once we do, that operating system is going to change how we build, because the distance between posttraining and building is going to look really, really, really blurry. I really like the systems engineering type of approach, but I also think you can also start to see this yesterday [when] Thinking Machines released their first product.

40.04: And so Thinking Machines is Mira [Murati]. Her very hype thing. They launched their first thing, and it’s called Tinker. And it’s essentially, “Hey, you can write a very simple Python code, and then we will do the RL for you or the fine-tuning for you using our cluster of GPU so you don’t have to manage that.” And that is the type of thing that we want to see in a maturing kind of development framework. And you start to see this operating system emerging. 

And it reminds me of the early days of O’Reilly, where it’s like I had to stand up a web server, I had to maintain a web server, I had to do all of these things, and now I don’t have to. I can spin up a Docker image, I can ship to Render, I can ship to Vercel. All of these shared complicated things now have frameworks and tooling, and I think we’re going to see a similar evolution from that. And I’m really excited. And I think you have picked a great underrated term. 

40.56: Now, with that, thank you, Drew. 

40.58: Awesome. Thank you for having me, Ben.




The “10x” Commandments of Highly Effective Go

This is a guest post from John Arundel of Bitfield Consulting, a Go trainer and writer who runs a free newsletter for Go learners. His most recent book is The Deeper Love of Go.

Ever wondered if there’s a software engineer, somewhere, who actually knows what they’re doing? Well, I finally found the one serene, omnicompetent guru who writes perfect code. I can’t disclose the location of her mountain hermitage, but I can share her ten mantras of Go excellence. Let’s meditate on them together.

1. Write packages, not programs

The standard library is great, but the universal library of free, open-source software is Go’s biggest asset. Return the favour by writing not just programs, but packages that others can use too.

Your main function’s only job should be parsing flags and arguments, and handling errors and cleanup, while your imported “domain” package does the real work.

Flexible packages return data instead of printing, and return errors rather than calling panic or os.Exit. Keep your module structure simple: ideally, one package.
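
To make that concrete, here is a minimal sketch of the split, using a hypothetical greet package; all names and the module path are illustrative, not from the post:

// greet/greet.go — the "domain" package does the real work:
// it returns data and errors instead of printing or exiting.
package greet

import "errors"

// Message builds a greeting for the given name.
func Message(name string) (string, error) {
    if name == "" {
        return "", errors.New("greet: empty name")
    }
    return "Hello, " + name + "!", nil
}

// main.go — main only parses flags, handles errors, and exits.
package main

import (
    "flag"
    "fmt"
    "os"

    "example.com/hello/greet" // hypothetical module path
)

func main() {
    name := flag.String("name", "world", "who to greet")
    flag.Parse()
    msg, err := greet.Message(*name)
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println(msg)
}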

Tip: Use Structure view (Cmd-F12) for a high-level picture of your module.

2. Test everything

Writing tests helps you dogfood your packages: awkward names and inconvenient APIs are obvious when you use them yourself.

Test names should be sentences. Focus tests on small units of user-visible behaviour. Add integration tests for end-to-end checks. Test binaries with testscript.
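
As a sketch, a test whose name reads as a sentence might look like this; the inventory package and its ListItems function are hypothetical stand-ins:

package inventory_test

import (
    "slices"
    "testing"

    "example.com/inventory" // hypothetical package under test
)

// The test name is a sentence describing user-visible behaviour.
func TestListItemsReturnsItemsInAlphabeticalOrder(t *testing.T) {
    t.Parallel()
    got := inventory.ListItems()
    want := []string{"apple", "banana", "cherry"}
    if !slices.Equal(got, want) {
        t.Errorf("want %v, got %v", want, got)
    }
}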

Tip: Use GoLand’s “generate tests” feature to add tests for existing code. Run with coverage can identify untested code. Use the debugger to analyse test failures.

3. Write code for reading

Ask a co-worker to read your code line by line and tell you what it does. Their stumbles will show you where your speed-bumps are: flatten them out and reduce cognitive load by refactoring. Read other people’s code and notice where you stumble—why?

Use consistent naming to maximise glanceability: err for errors, data for arbitrary []bytes, buf for buffers, file for *os.File pointers, path for pathnames, i for index values, req for requests, resp for responses, ctx for contexts, and so on.

Good names make code read naturally. Design the architecture, name the components, document the details. Simplify wordy functions by moving low-level “paperwork” into smaller functions with informative names (createRequest, parseResponse).
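
For instance, a wordy function might shrink to a readable recipe once the paperwork moves into well-named helpers; this is an illustrative sketch, not code from the post:

package report

import (
    "fmt"
    "strings"
)

// Summary reads as a high-level recipe; the low-level paperwork
// lives in small, well-named helpers.
func Summary(lines []string) string {
    return formatHeader(len(lines)) + formatBody(lines)
}

func formatHeader(n int) string {
    return fmt.Sprintf("%d entries:\n", n)
}

func formatBody(lines []string) string {
    var sb strings.Builder
    for _, line := range lines {
        sb.WriteString("- " + strings.TrimSpace(line) + "\n")
    }
    return sb.String()
}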

Tip: In GoLand, use the Extract method refactoring to shorten long functions. Use Rename to rename an identifier everywhere.

4. Be safe by default

Use “always valid values” in your programs, and design types so that users can’t accidentally create values that won’t work. Make the zero value useful for literals, or write a validating constructor that guarantees a valid, usable object with default settings. Add configuration using WithX methods:

widget := NewWidget().WithTimeout(time.Second)
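
A fuller sketch of how that constructor-plus-WithX pattern might be implemented (Widget is a stand-in type, and the default timeout is an arbitrary choice):

package widget

import "time"

// Widget is always valid: NewWidget fills in safe defaults.
type Widget struct {
    timeout time.Duration
}

// NewWidget returns a usable Widget with default settings.
func NewWidget() *Widget {
    return &Widget{timeout: 5 * time.Second}
}

// WithTimeout returns a copy with the given timeout, ignoring
// non-positive values so the Widget can never become invalid.
func (w *Widget) WithTimeout(d time.Duration) *Widget {
    w2 := *w
    if d > 0 {
        w2.timeout = d
    }
    return &w2
}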

Use named constants instead of magic values. http.StatusOK is self-explanatory; 200 isn’t. Define your own constants so IDEs like GoLand can auto-complete them, preventing typos. Use iota to auto-assign arbitrary values:

const (
    Planet = iota // 0
    Star          // 1
    Comet         // 2
    // ...
)

Prevent security holes by using os.Root instead of os.Open, eliminating path traversal attacks:

root, err := os.OpenRoot("/var/www/assets")
if err != nil {
    return err
}
defer root.Close()
file, err := root.Open("../../../etc/passwd")
// Error: 'openat ../../../etc/passwd: path escapes from parent'

Don’t require your program to run as root or in setuid mode; let users configure the minimal permissions and capabilities they need.

Tip: Use GoLand’s Generate constructor and Generate getter and setter functions to help you create always-valid struct types.

5. Wrap errors, don’t flatten

Don’t type-assert errors or compare error values directly with ==; instead, define named “sentinel” values that users can match errors against:

var ErrOutOfCheese = errors.New("++?????++ Out of Cheese Error. Redo From Start.")

Don’t inspect the string values of errors to find out what they are; this is fragile. Instead, use errors.Is:

if errors.Is(err, ErrOutOfCheese) {
    // handle the missing cheese
}

To add run-time information or context to an error, don’t flatten it into a string. Use the %w verb with fmt.Errorf to create a wrapped error:

return fmt.Errorf("GNU Terry Pratchett: %w", ErrOutOfCheese)

This way, errors.Is can still match the wrapped error against your sentinel value, even though it contains extra information.
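
Putting those pieces together, a minimal end-to-end sketch might look like this; the refill function and the shortened error message are illustrative:

package main

import (
    "errors"
    "fmt"
)

var ErrOutOfCheese = errors.New("out of cheese")

// refill wraps the sentinel with extra context for the caller.
func refill() error {
    return fmt.Errorf("GNU Terry Pratchett: %w", ErrOutOfCheese)
}

func main() {
    err := refill()
    if errors.Is(err, ErrOutOfCheese) {
        fmt.Println("matched despite wrapping:", err)
    }
}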

Tip: GoLand will warn you against comparing or type-asserting error values.

6. Avoid mutable global state

Package-level variables can cause data races: reading a variable from one goroutine while writing it from another can crash your program. Instead, use a sync.Mutex to prevent concurrent access, or allow access to the data only in a single “guard” goroutine that takes read or write requests via a channel.
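
A minimal sketch of the mutex approach, with Counter as an illustrative type:

package counter

import "sync"

// Counter keeps its state private and guards it with a mutex,
// instead of exposing a package-level variable.
type Counter struct {
    mu sync.Mutex
    n  int
}

func (c *Counter) Inc() {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.n++
}

func (c *Counter) Value() int {
    c.mu.Lock()
    defer c.mu.Unlock()
    return c.n
}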

Don’t use global objects like http.DefaultServeMux or http.DefaultClient; packages you import might invisibly change these objects, maliciously or otherwise. Instead, create a new instance with http.NewServeMux (for example) and configure it how you want.

Tip: Use GoLand’s Run/Debug Configurations settings to enable the Go race detector for testing concurrent code.

7. Use (structured) concurrency sparingly

Concurrent programming is a minefield: it’s easy to trigger crashes or race conditions. Don’t introduce concurrency to a program unless it’s unavoidable. When you do use goroutines and channels, keep them strictly confined: once they escape the scope where they’re created, it’s hard to follow the flow of control. “Global” goroutines, like global variables, can lead to hard-to-find bugs.

Make sure any goroutines you create will terminate before the enclosing function exits, using a context or waitgroup:

var wg sync.WaitGroup
wg.Go(task1)
wg.Go(task2)
wg.Wait()

The Wait call ensures that both tasks have completed before we move on, making control flow easy to understand, and preventing resource leaks.

Use errgroup to catch the first error from a number of parallel tasks (use errgroup.WithContext if you also want to cancel the remaining tasks when one fails):

var eg errgroup.Group
eg.Go(task1)
eg.Go(task2)
err := eg.Wait()
if err != nil {
    fmt.Printf("first error: %v\n", err)
} else {
    fmt.Println("all tasks completed successfully")
}

When you take a channel as the parameter to a function, take either its send or receive aspect, but not both. This prevents a common kind of deadlock where the function tries to send and receive on the same channel concurrently.

func produce(ch chan<- Event) {
	// can send on `ch` but not receive
}

func consume(ch <-chan Event) {
	// can receive on `ch` but not send
}

Tip: Use GoLand’s profiler and debugger to analyse the behaviour of your goroutines, eliminate leaks, and solve deadlocks.

8. Decouple code from environment

Don’t depend on OS or environment-specific details. Don’t use os.Getenv or os.Args deep in your package: only main should access environment variables or command-line arguments. Instead of taking choices away from users of your package, let them configure it however they want.
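
One way that separation might look, sketched with a hypothetical store package; the STORE_DSN variable and all other names are illustrative:

// store/store.go — the package takes configuration as plain values
// and never reads the environment itself.
package store

import "time"

type Config struct {
    DSN     string
    Timeout time.Duration
}

type Store struct{ cfg Config }

// New validates the config and applies defaults.
func New(cfg Config) (*Store, error) {
    if cfg.Timeout <= 0 {
        cfg.Timeout = 5 * time.Second
    }
    return &Store{cfg: cfg}, nil
}

// main.go — only main touches os.Getenv, then passes the result in.
package main

import (
    "fmt"
    "os"

    "example.com/app/store" // hypothetical module path
)

func main() {
    s, err := store.New(store.Config{DSN: os.Getenv("STORE_DSN")})
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    _ = s // use the store...
}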

Single binaries are easier for users to install, update, and manage; don’t distribute config files. If necessary, create your config file at run time using defaults.

Use go:embed to bundle static data, such as images or certificates, into your binary:

import (
    _ "embed"
    "fmt"
)

//go:embed hello.txt
var s string

func main() {
    fmt.Println(s) // `s` now has the contents of 'hello.txt'
}

Use xdg instead of hard-coding paths. Don’t assume $HOME exists. Don’t assume any disk storage exists, or is writable.

Go is popular in constrained environments, so be frugal with memory. Don’t read all your data at once; handle one chunk at a time, re-using the same buffer. This will keep your memory footprint small and reduce garbage collection cycles.
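
As a sketch, here is one way to process a stream one chunk at a time with a single re-used buffer; hashing stands in for whatever per-chunk work your program does:

package checksum

import (
    "crypto/sha256"
    "encoding/hex"
    "io"
)

// Sum hashes r one chunk at a time, re-using buf instead of
// reading the whole stream into memory.
func Sum(r io.Reader) (string, error) {
    h := sha256.New()
    buf := make([]byte, 32*1024) // one re-used 32 KiB buffer
    for {
        n, err := r.Read(buf)
        if n > 0 {
            h.Write(buf[:n]) // hash.Hash writes never fail
        }
        if err == io.EOF {
            return hex.EncodeToString(h.Sum(nil)), nil
        }
        if err != nil {
            return "", err
        }
    }
}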

Tip: Use GoLand’s profiler to optimise your memory usage and eliminate leaks.

9. Design for errors

Always check errors, and handle them if possible, retrying where appropriate. Report run-time errors to the user and exit gracefully, reserving panic for internal program errors. Don’t ignore errors using _: this leads to obscure bugs.

Show usage hints for incorrect arguments, don’t crash. Rather than prompting users interactively, let them customise behaviour with flags or config.
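
Where retrying makes sense, it might look something like the sketch below; the attempt count and fixed backoff are arbitrary choices, not recommendations from the post:

package fetch

import (
    "context"
    "fmt"
    "net/http"
    "time"
)

// Get retries a flaky request a few times before giving up,
// respecting cancellation via the context.
func Get(ctx context.Context, client *http.Client, url string) (*http.Response, error) {
    var lastErr error
    for attempt := 1; attempt <= 3; attempt++ {
        req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
        if err != nil {
            return nil, err // a malformed URL won't improve with retries
        }
        resp, err := client.Do(req)
        if err == nil {
            return resp, nil
        }
        lastErr = err
        select {
        case <-time.After(time.Second): // simple fixed backoff
        case <-ctx.Done():
            return nil, ctx.Err()
        }
    }
    return nil, fmt.Errorf("after 3 attempts: %w", lastErr)
}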

Tip: GoLand will warn you about unchecked or ignored errors, and offer to generate the handling code for you.

10. Log only actionable information

Logorrhea is irritating, so don’t spam the user with trivia. If you log at all, log only actionable errors that someone needs to fix. Don’t use fancy loggers, just print to the console, and let users redirect that output where they need it. Never log secrets or personal data.

Use slog to generate machine-readable JSON:

logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
logger.Error("oh no", "user", os.Getenv("USER"))
// Output:
// {"time":"...","level":"ERROR","msg":"oh no",
// "user":"bitfield"}

Logging is not for request-scoped troubleshooting: use tracing instead. Don’t log performance data or statistics: that’s what metrics are for.

Tip: Instead of logging, use GoLand’s debugger with non-suspending logging breakpoints to gather troubleshooting information.

Guru meditation

My mountain-dwelling guru also says, “Make it work first, then make it right. Draft a quick walking skeleton, using shameless green, and try it out on real users. Solve their problems first, and only then focus on code quality.”

Software takes more time to maintain than it does to write, so invest an extra 10% effort in refactoring, simplifying, and improving code while you still remember how it works. Making your programs better makes you a better programmer.


Debug Docker Builds with Visual Studio Code


Building Docker images is an important component of the software delivery pipeline for modern applications. It’s how we package our apps and services so that they can be distributed to others and deployed to production. While the Dockerfile has long been the standard for defining container images, it can be challenging to change and debug when issues arise. It’s currently a real pain to understand the build-time state during the different stages of the build. What was the value of an ARG? Which files were copied into the image?

Recently, we have been making updates to the Docker Build clients (Buildx) and our VS Code extension (Docker DX) to improve the developer experience when using Docker. Today, we are sharing the next stage of that process with the introduction of Build Debugging in VS Code and Docker Build.

With the new debugging feature in Buildx from Docker, you will be able to reduce the time you spend fixing your Docker builds. In this post, you’ll learn how to configure the Buildx debugger in Visual Studio Code, step through a build and inspect variables and the image’s file system, and open a shell inside the image being built. Finally, you will learn a little about the debugger’s implementation and how it can be integrated into other editors.

Configuring Visual Studio Code

To start debugging Dockerfiles in Visual Studio Code:

  1. Install the latest version of the Docker DX extension.
  2. Update to the latest version of Docker Desktop to ensure you have the latest Docker build tooling.
  3. Run docker buildx version and verify that your Buildx is at least version 0.29.x.

Creating a Launch Configuration

Open up your Dockerfile and open the Run and Debug view in Visual Studio Code. If you do not have any launch configurations, you will see something like the following.


Figure 1: Run and Debug view opened in Visual Studio Code with no launch configurations defined.

Click on the “create a launch.json file” hyperlink. If you have launch configurations, open up your launch.json file by clicking on the cog icon in the top right hand corner of the Run and Debug view.

In your launch.json file, create a new launch configuration for debugging your Docker build. You can use the sample below to get started. For a full description of the various attributes in a launch configuration, see here.

{
  "name": "Docker: Build",
  "type": "dockerfile",
  "request": "launch",
  "dockerfile": "Dockerfile",
  "contextPath": "${workspaceFolder}"
}

Adding a Breakpoint

Now that you have completed setting up your launch configuration, let’s add a breakpoint to our Dockerfile. Place a breakpoint next to one of your RUN instructions by clicking in the editor’s left margin or by pressing F9. A circle should appear to indicate that a breakpoint has been added.

Launching the Debugger

We are now ready to start the debugger. Select the launch configuration you created and then hit F5. The build should pause at the RUN line where you placed the breakpoint.


Figure 2: Docker build suspended by a breakpoint in Visual Studio Code.

Debugging Features

We will now walk you through the three different features that the Buildx Debugger provides.

Inspecting Variables

When a build is in a suspended state, you can look at any variables that may have been defined. In this example, by looking at the executed command’s workdir value on the left-hand side, we can see that the command is not being run in the right folder, since we copied the contents into /app. We can fix this by adding WORKDIR /app before the RUN line. Also note that we can view variables defined by our image and by the base image, such as VAR and NODE_VERSION.


Figure 3: Docker build encounters an error and is suspended by the debugger instead of terminating.

File Explorer

In addition to inspecting variables, you can also look at the structure of the file system to see what is already there and what you have copied in. For text files, you can also view their contents, shown in the file’s data field.


Figure 4: View the file system of the Docker image being built.

Interactive Debugging

Creating the right Dockerfile is often an iterative process. Part of this is usually because the host system you are developing on shares few similarities with the image you are building. Consider running Ubuntu locally while trying to build an Alpine Linux image: the small differences in package names create a lot of back and forth between your editor and your browser as you search for the right name. You add a line here, maybe comment out another line somewhere else, and then run docker build again and hope for the best.

This iterative process can now be streamlined with the help of the debugger. When your build is in a suspended state, open the Debug Console view and then place your cursor in the input field at the bottom. Type in exec and then hit the enter key. The Terminal view should now open with a shell that is attached to the image that is being built.


Figure 5: Use the Debug Console to open a shell into the Docker image being built by running exec.


Figure 6: The Docker image that is being built can now be accessed and inspected with a terminal.

This feature is a game changer, as you can now easily open a shell into the image at any step of the Dockerfile and inspect its contents or run commands for testing. Previously, we would have to comment out everything after the buggy line, build the Docker image, and then manually run it and open a shell into the image. All of that is now condensed into adding a breakpoint in your editor and starting a debug session!

Keep in mind that none of the changes you make in the terminal are persisted, so this is purely for experimentation. In the figure below, we can see that a file was created when the debugger was paused at line 3. When the debugger was advanced to line 4, the file disappeared.


Figure 7: Changes to the Docker image inside the exec terminal will be reset when the debugger steps to another line.

Integrations powered by an Open Specification

Just like our work with the Docker Language Server, which implements the Language Server Protocol, the Buildx debugger is built on open standards: it implements the Debug Adapter Protocol, which means you can debug Dockerfile builds with any editor that supports the protocol. Besides Visual Studio Code, we also provide an official plugin for Neovim. For the JetBrains users out there, we have verified that it integrates well with the LSP4IJ plugin. If your favourite editor supports the Debug Adapter Protocol, there should be a way for the Buildx debugger to integrate with it.

Thank You

We want to take this opportunity to thank Kohei Tokunaga (ktock) for his ideas and initial work around this feature. The contributions he made to Buildx gave us a great foundation to build on and complete this feature. This release would not have been possible without his help. Thank you, Kohei!

