Everyone’s favorite global developer event is back for another year of learning, connection, building, and donuts. We hope you’re excited to join us at Fort Mason Center in San Francisco on October 28–29. Of course, attending isn’t the only way to take part this year: our Call for Sessions is open now through Friday, May 1 at 11:59 p.m. PT. If you’ve been thinking about taking the stage, this is your moment. Submit a proposal that shares what you’ve been building over the past year, what you’ve learned, and what other builders can take away from it. And if you know someone who deserves a mic, you can also nominate a speaker.
Need some inspiration? Below, we have five past Universe sessions that have captured our imaginations (so much so that we’re still talking about them). These sessions perfectly encapsulate what Universe is all about: learning, fun, and just a little magic. We can’t wait to see what they spark for you.

Pillippa Pérez Pons took a problem every frontend team recognizes (messy rebases, ever-growing monorepos, mysteriously vanishing commits, and general Git chaos) and made it delightfully weird by framing the whole thing as a cat’s nine lives. Each “life” unlocked a lesser-known Git feature or optimization—sparse checkouts, partial clones, reflog rescues, and performance boosts—delivered with storytelling, visuals, and just enough humor to make the tricky parts stick.

This full-on fantasy adventure, presented by Matteo Bianchi (GitHub) and Alexandra Aldershaab (Eficode), cast CI/CD as a castle, reframed ancient scripts as lurking monsters, and sent the audience on a quest to modernize automation without inviting supply chain dragons into the build. Under the playful storytelling, the session delivered a genuinely practical payoff: secure GitHub Actions patterns (with Copilot as a trusty sidekick) that helped teams speed up workflows while keeping security front and center.
In Martin Woodward’s (GitHub) hands, “speed” became less of a productivity metric and more of a creative superpower (and yes, occasionally a little Furby-powered). The session zoomed out to the big shift underway in software: ideas moving from sketch to prototype to shipped experience faster than ever, and the new questions that came with that acceleration. Instead of obsessing over velocity, Martin challenged the audience with the idea that the best developers never stop experimenting. They stay curious. “If you can dream it,” he said, “you can build it.”
If you’ve ever wished Kubernetes security training came with a party of adventurers and a ridiculous quest narrative, this one absolutely delivered. Noah Abrahams, Ian Coldwater, Kat Cosgrove, Seth McCombs, and Natali Vlatko walked the audience through a serious cluster of security concepts while roleplaying their way through a chaotic fantasy world, complete with memorable villains and dramatic stakes.
With true cinematic excitement, Nick Liffen (GitHub) and Niroshan Rajadurai turned application security into a near-future mission briefing. During their session, they pulled back the curtain on how GitHub applies AI to streamline remediation and make alerts easier to interpret—then pushed the story forward into what comes next: AI that can fortify defenses and even spot emerging threats before they turn into incidents. They even saved a special surprise for the finale, giving the whole session that “stay until the credits” energy. This blog post will self-destruct in 3…2…1…just kidding.
If you need help polishing your proposal or making your idea stand out, check out our submission guide, which covers this year’s content tracks, provides an in-depth look at session types, and outlines the anatomy of a great submission.
This year’s sessions fall into three categories:
💡 Pro tip: Ship & Tell is a new format this year that’s ideal for startup founders and builders to tell their story. What did you ship? How did you scale it? What broke? What worked? Your takeaways will provide inspiration for the next generation of builders.
Ready to pitch your own Universe-worthy idea? If these sessions have one thing in common, it’s this: they’re grounded in real engineering lessons, then delivered with personality, creativity, and a clear point of view. That’s exactly the kind of proposal we’re excited to see for Universe.
Before you submit, review the submission guide for what makes a strong proposal, plus tips to match your idea to the right format.
And don’t wait: the deadline to submit a session proposal or nominate a speaker is Friday, May 1 at 11:59 p.m. PT.
The post GitHub Universe is back: We want you to take the stage appeared first on The GitHub Blog.
The current conversation about AI in software development is still happening at the wrong layer.
Most of the attention goes to code generation. Can the model write a method, scaffold an API, refactor a service, or generate tests? Those things matter, and they are often useful. But they are not the hard part of enterprise software delivery. In real organizations, teams rarely fail because nobody could produce code quickly enough. They fail because intent is unclear, architectural boundaries are weak, local decisions drift away from platform standards, and verification happens too late.
That becomes even more obvious once AI enters the workflow. AI does not just accelerate implementation. It accelerates whatever conditions already exist around the work. If the team has clear constraints, good context, and strong verification, AI can be a powerful multiplier. If the team has ambiguity, tacit knowledge, and undocumented decisions, AI amplifies those too.
That is why the next phase of AI-infused development will not be defined by prompt cleverness. It will be defined by how well teams can make intent explicit and how effectively they can keep control close to the work.
This shift has become clearer to me through recent work around IBM Bob, an AI-powered development partner I have been working with closely for a couple of months now, and the broader patterns emerging in AI-assisted development.
The real value is not that a model can write code. The real value appears when AI operates inside a system that exposes the right context, limits the action space, and verifies outcomes before bad assumptions spread.
The market likes simple narratives, and “AI helps developers write code faster” is a simple narrative. It demos well. You can measure it in isolated tasks. It produces screenshots and benchmark charts. It also misses the point.
Enterprise development is not primarily a typing problem. It is a coordination problem. It is an architecture problem. It is a constraints problem.
A useful change in a large Java codebase is rarely just a matter of producing syntactically correct code. The change has to fit an existing domain model, respect service boundaries, align with platform rules, use approved libraries, satisfy security requirements, integrate with CI and testing, and avoid creating support headaches for the next team that touches it. The code is only one artifact in a much larger system of intent.
Human developers understand this instinctively, even if they do not always document it well. They know that a “working” solution can still be wrong because it violates conventions, leaks responsibility across modules, introduces fragile coupling, or conflicts with how the organization actually ships software.
AI systems do not infer those boundaries reliably from a vague instruction and a partial code snapshot. If the intent is not explicit, the model fills in the gaps. Sometimes it fills them in well enough to look impressive. Sometimes it fills them in with plausible nonsense. In both cases, the danger is the same. The system appears more certain than the surrounding context justifies.
This is why teams that treat AI as an ungoverned autocomplete layer eventually run into a wall. The first wave feels productive. The second wave exposes drift.
There is a phrase I keep coming back to because it captures the problem cleanly. If intent is missing, the model fills the gap.
That is not a flaw unique to one product or one model. It is a predictable property of probabilistic systems operating in underspecified environments. The model will produce the most likely continuation of the context it sees. If the context is incomplete, contradictory, or detached from the architectural reality of the system, the output may still look polished. It may even compile. But it is working from an invented understanding.
This becomes especially visible in enterprise modernization work. A legacy system is full of patterns shaped by old constraints, partial migrations, local workarounds, and decisions nobody wrote down. A model can inspect the code, but it cannot magically recover the missing intent behind every design choice. Without guidance, it may preserve the wrong things, simplify the wrong abstractions, or generate a modernization path that looks efficient on paper but conflicts with operational reality.
The same pattern shows up in greenfield projects, just faster. A team starts with a few useful AI wins, then gradually notices inconsistency. Different services solve the same problem differently. Similar APIs drift in style. Platform standards are applied unevenly. Security and compliance checks move to the end. Architecture reviews become cleanup exercises instead of design checkpoints.
AI did not create those problems. It accelerated them.
That is why the real question is no longer whether AI can generate code. It can. The more important question is whether the development system around the model can express intent clearly enough to make that generation trustworthy.
For a long time, teams treated intent as something informal. It lived in architecture diagrams, old wiki pages, Slack threads, code reviews, and the heads of senior developers. That has always been fragile, but human teams could compensate for some of it through conversation and shared experience.
AI changes the economics of that informality. A system that acts at machine speed needs machine-readable guidance. If you want AI to operate effectively in a codebase, intent has to move closer to the repository and closer to the task.
That does not mean every project needs a heavy governance framework. It means the important rules can no longer stay implicit.
Intent, in this context, includes architectural boundaries, approved patterns, coding conventions, domain constraints, migration goals, security rules, and expectations about how work should be verified. It also includes task scope. One of the most effective controls in AI-assisted development is simply making the task smaller and sharper. The moment AI is attached to repository-local guidance, scoped instructions, architectural context, and tool-mediated workflows, the quality of the interaction changes. The system is no longer guessing in the dark based on a chat transcript and a few visible files. It is operating inside a shaped environment.
One practical expression of this shift is spec-driven development. Instead of treating requirements, boundaries, and expected behavior as loose background context, teams make them explicit in artifacts that both humans and AI systems can work from. The specification stops being passive documentation and becomes an operational input to development.
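To make the idea concrete, here is a minimal sketch of a machine-readable spec artifact, validated before any generation happens. The field names (`goal`, `scope`, `constraints`, `verification`) are illustrative assumptions, not a standard; the point is that intent becomes an explicit input the tooling can check, rather than background context.

```python
# Sketch: a machine-readable task spec. Field names are illustrative,
# not any particular framework's schema. Intent becomes an artifact
# that both humans and AI systems can work from and validate.

REQUIRED_FIELDS = {"goal", "scope", "constraints", "verification"}

def validate_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec is usable."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - spec.keys()]
    if not spec.get("scope"):
        problems.append("scope must list the files or modules in play")
    return problems

task_spec = {
    "goal": "Migrate payment-service config from XML to YAML",
    "scope": ["payment-service/src/main/resources"],
    "constraints": ["use approved YAML library version", "no new dependencies"],
    "verification": ["mvn test must pass", "config schema check"],
}

print(validate_spec(task_spec))  # [] -- the spec is complete
```

A gate like this is deliberately boring: it rejects underspecified work before the model gets a chance to fill the gaps with plausible invention.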
That is a much more useful model for enterprise development.
The important pattern is not tool-specific. It applies across the category. AI becomes more reliable when intent is externalized into artifacts the system can actually use. That can include local guidance files, architecture notes, workflow definitions, test contracts, tool descriptions, policy checks, specialized modes, and bounded task instructions. The exact format matters less than the principle. The model should not have to reverse engineer your engineering system from scattered hints.
This becomes even clearer when you look at migration work and try to attach cost to it.
One of the recent discussions I had with a colleague was about how to size modernization work in token/cost terms. At first glance, lines of code look like the obvious anchor. They are easy to count, easy to compare, and simple to put into a table. The problem is that they do not explain the work very well.
What we are seeing in migration exercises matches what most experienced engineers would expect. Cost is often less about raw application size and more about how the application is built. A 30,000-line application with old security, XML-heavy configuration, custom build logic, and a messy integration surface can be harder to modernize than a much larger codebase with cleaner boundaries and healthier build and test behavior.
That gap matters because it exposes the same flaw as the code-generation narrative. Superficial output measures are easy to report, but they are weak predictors of real delivery effort.
If AI-infused development is going to be taken seriously in enterprise modernization, it needs better effort signals than repository size alone. Size still matters, but only as one input. More useful indicators include framework and runtime distance, which can be expressed in the number of modules or deployables, the age of the dependencies, or the number of files actually touched.
This is an architectural discussion. Complexity lives in boundaries, dependencies, side effects, and hidden assumptions. Those are exactly the areas where intent and control matter most.
There is another lesson here that applies beyond migrations. Teams often ask AI systems to produce a single comprehensive summary at the end of a workflow. They want the sequential list of changes, the observed results, the effort estimate, the pricing logic, and the business classification all in one polished report. It sounds efficient, but it creates a problem. Measured facts and inferred judgment get mixed together until the output looks more precise than it really is.
A better pattern is to separate workflow telemetry from sizing recommendations. The first artifact should describe what actually happened: how many files were analyzed or modified, how many lines changed and over how much time, how many tokens were actually consumed, and which prerequisites were installed or verified. That is factual telemetry. It is useful because it is grounded.
The second artifact should classify the work. How large and complex was the migration. How broad was the change. How much verification effort is likely required. That is interpretation. It can still be useful, but it should be presented as a recommendation, not as observed truth.
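One way to hold that line in practice is to keep the two artifacts as separate types, so measured facts and inferred judgment can never silently merge into one report. A minimal sketch (field names are my assumptions for illustration):

```python
# Sketch of the two-artifact split: measured telemetry and inferred
# sizing never share one object, so a report cannot quietly present
# estimates as observations.
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkflowTelemetry:
    """What actually happened -- observed, not estimated."""
    files_modified: int
    lines_changed: int
    tokens_consumed: int
    elapsed_seconds: float

@dataclass(frozen=True)
class SizingRecommendation:
    """Interpretation layered on top -- explicitly labeled as such."""
    complexity_band: str      # e.g. "low" / "medium" / "high"
    verification_effort: str  # a judgment, not a measurement
    basis: WorkflowTelemetry  # the facts the judgment rests on

telemetry = WorkflowTelemetry(files_modified=42, lines_changed=1800,
                              tokens_consumed=950_000, elapsed_seconds=310.0)
recommendation = SizingRecommendation(
    complexity_band="medium",
    verification_effort="roughly one reviewer-day",
    basis=telemetry,
)
print(recommendation.basis.files_modified)  # 42
```

The `basis` field is the useful constraint: every recommendation has to point back at the telemetry it was derived from.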
AI is very good at producing complete-sounding narratives, but enterprise teams need systems that are equally good at separating what was measured from what was inferred.
If we want AI-assisted modernization to be economically credible, a one-dimensional sizing model will not be enough. A much more realistic model is at least two-dimensional. The first axis is size, meaning the overall scope of the repository or modernization target. The second axis is complexity. This stands for things like legacy depth, security posture, integration breadth, test quality, and the amount of ambiguity the system must absorb.
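The two-axis model can be sketched in a few lines. The thresholds below are invented for illustration; the point is the shape of the function: complexity can move a small repository into a harder bucket, so size alone never determines the band.

```python
# Minimal sketch of two-dimensional sizing. Thresholds are invented
# for illustration, not calibrated.

def sizing_band(loc: int, complexity: float) -> str:
    """complexity is a 0..1 score covering legacy depth, security posture,
    integration breadth, test quality, and ambiguity."""
    size_score = 0 if loc < 50_000 else 1 if loc < 500_000 else 2
    complexity_score = 0 if complexity < 0.3 else 1 if complexity < 0.7 else 2
    # The harder axis dominates: a small but gnarly app is still a big job.
    return ["small", "medium", "large"][max(size_score, complexity_score)]

# A 30,000-line app with deep legacy issues outranks a cleaner 200,000-line one:
print(sizing_band(30_000, 0.8))   # large
print(sizing_band(200_000, 0.2))  # medium
```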
That model reflects real modernization work far better than a single label driven by lines of code (LOC). It also gives architects and engineering leaders a much more honest explanation for why two similarly sized applications can land in very different token ranges.
And it reinforces the core point: Complexity is where missing intent becomes expensive.
A code assistant can produce output quickly in both projects. But the project with deeper legacy assumptions, more security changes, and more fragile integrations will demand far more control. It will need tighter scope, better architectural guidance, more explicit task framing, and stronger verification. In other words, the economic cost of modernization is directly tied to how much intent must be recovered and how much control must be imposed to keep the system safe. That is a much more useful way to think about AI-infused development than raw generation speed.
Control is what turns AI assistance from an interesting capability into an operationally useful one. In practice, control means the AI does not just have broad access to generate output. It works through constrained surfaces. It sees selected context. It can take actions through known tools. It can be checked against expected outcomes. Its work can be verified continuously instead of inspected only at the end.
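A constrained action surface can be as simple as an allow-list with an audit trail. The sketch below is a toy (tool names and the registry shape are my assumptions, not any framework's API), but it shows the pattern: the agent never acts directly, every action passes a policy check, and every decision is recorded for verification.

```python
# Sketch of a constrained action surface: the agent can only act through
# registered tools, and each call is policy-checked and logged.

ALLOWED_TOOLS = {"read_file", "run_tests"}

def invoke_tool(name: str, args: dict, audit_log: list) -> str:
    if name not in ALLOWED_TOOLS:
        audit_log.append(("denied", name))
        raise PermissionError(f"tool {name!r} is outside the allowed surface")
    audit_log.append(("allowed", name))
    return f"ran {name} with {args}"

log: list = []
invoke_tool("run_tests", {"target": "payment-service"}, log)
try:
    invoke_tool("delete_branch", {"branch": "main"}, log)
except PermissionError:
    pass  # the denied attempt is still in the audit trail
print(log)  # [('allowed', 'run_tests'), ('denied', 'delete_branch')]
```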
A lot of recent excitement around agents misses this point. The ambition is understandable. People want systems that can take higher-level goals and move work forward with less direct supervision. But in software development, open-ended autonomy is usually the least interesting form of automation. Most enterprise teams do not need a model with more freedom. They need a model operating inside better boundaries.
That means scoped tasks, local rules, architecture-aware context, and tool contracts, all with verification built directly into the flow. It also means being careful about what we ask the model to report. In migration work, some data is directly observed, such as files changed, elapsed time, or recorded token use. Other data is inferred, such as migration complexity or likely cost. If a prompt asks the model to present both as one seamless summary, it can create false confidence by making estimates sound like facts. A better workflow requires the model to separate measured results from recommendations and to avoid claiming precision the system did not actually record.
Once you look at it this way, the center of gravity shifts. The hard problem is no longer how to prompt the model better. The hard problem is how to engineer the surrounding system so the model has the right inputs, the right limits, and the right feedback loops. That is a software architecture problem.
Prompt engineering suggests that the main lever is wording. Ask more precisely. Structure the request better. Add examples. Those techniques help at the margins, and they can be useful for isolated tasks. But they are not a durable answer for complex development environments. The more scalable approach is to improve the surrounding system with explicit context (like repository and architecture constraints), constrained actions (via workflow-aware tools and policies), and integrated tests and validation.
This is why intent and control is a more useful framing than better prompting. It moves the conversation from tricks to systems. It treats AI as one component in a broader engineering loop rather than as a magic interface that becomes trustworthy if phrased correctly.
That is also the frame enterprise teams need if they want to move from experimentation to adoption. Most organizations do not need another internal workshop on how to write smarter prompts. They need better ways to encode standards and context, constrain AI actions, and implement verification that separates facts from recommendations.
The pattern I expect to see more often over the next few months is fairly simple. Teams will begin with chat-based assistance and local code generation because it is easy to try and immediately useful. Then they will discover that generic assistance plateaus quickly in larger systems.
In theory, the next step is repository-aware AI, where models can see more of the code and its structure. In practice, we are only starting to approach that stage now. Some leading models only recently moved to 1 million-token context windows, and even that does not mean unlimited codebase understanding. Google describes 1 million tokens as enough for roughly 30,000 lines of code at once, and Anthropic only recently added 1 million-token support to Claude 4.6 models.
That sounds large until you compare it with real enterprise systems. Many legacy Java applications are much larger than that, sometimes by an order of magnitude. One case cited by vFunction describes a 20-year-old Java EE monolith with more than 10,000 classes and roughly 8 million lines of code. Even smaller legacy estates often include multiple modules, generated sources, XML configuration, old test assets, scripts, deployment descriptors, and integration code that all compete for attention.
So repository-aware AI today usually does not mean that the agent fully ingests and truly understands the whole repository. More often, it means the system retrieves and focuses on the parts that look relevant to the current task. That is useful, but it is not the same as holistic awareness. Sourcegraph makes this point directly in its work on coding assistants: Without strong context retrieval, models fall back to generic answers, and the quality of the result depends heavily on finding the right code context for the task. Anthropic describes a similar constraint from the tooling side, where tool definitions alone can consume tens of thousands of tokens before any real work begins, forcing systems to load context selectively and on demand.
That is why I think the industry should be careful with the phrase “repository-aware.” In many real workflows, the model is not aware of the repository in any complete sense. It is aware of a working slice of the repository, shaped by retrieval, summarization, tool selection, and whatever the agent has chosen to inspect so far. That is progress, but it still leaves plenty of room for blind spots, especially in large modernization efforts where the hardest problems often sit outside the files currently in focus.
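The "working slice" idea can be made concrete with a deliberately crude sketch: rank files by relevance to the task and pack them until a token budget runs out. Real systems use embeddings, code graphs, and smarter chunking; the word-overlap scoring and word-count token proxy below are simplifying assumptions, but the shape of the trade-off is the same.

```python
# Sketch of retrieval-shaped context: the model never sees the whole
# repository, only the highest-scoring files that fit the budget.

def build_context(task: str, files: dict[str, str], budget_tokens: int) -> list[str]:
    task_words = set(task.lower().split())

    def score(body: str) -> int:
        # Crude relevance: shared words between task and file contents.
        return len(task_words & set(body.lower().split()))

    ranked = sorted(files, key=lambda path: score(files[path]), reverse=True)
    picked, used = [], 0
    for path in ranked:
        cost = len(files[path].split())  # crude token proxy
        if used + cost <= budget_tokens:
            picked.append(path)
            used += cost
    return picked

repo = {
    "billing/invoice.py": "def render invoice pdf for billing customer",
    "auth/session.py": "def refresh session token for login",
    "billing/tax.py": "def compute tax for invoice line items",
}
print(build_context("fix invoice tax rounding", repo, budget_tokens=12))
# ['billing/tax.py'] -- only the best-scoring file fits the budget
```

Everything outside the returned slice is a potential blind spot, which is exactly why "repository-aware" deserves the scare quotes.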
After that, the important move is making intent explicit through local guidance, architectural rules, workflow definitions, and task shaping. Then comes stronger control, which means policy-aware tools, bounded actions, better telemetry, and built-in verification. Only after those layers are in place does broader agentic behavior start to make operational sense.
This sequence matters because it separates visible capability from durable capability. Many teams are trying to jump directly to autonomous flows without doing the quieter work of exposing intent and engineering control. That will produce impressive demos and uneven outcomes. The teams that get real leverage from AI-infused development will be the ones that treat intent as infrastructure.
For the last year, the question has often been, “What can the model generate?” That was a reasonable place to start because generation was the obvious breakthrough. But it is not the question that will determine whether AI becomes dependable in real delivery environments.
The better question is: “What intent can the system expose, and what control can it enforce?”
That is the level where enterprise value starts to become durable. It is where architecture, platform engineering, developer experience, and governance meet. It is also where the work becomes most interesting, not as a story about an assistant producing code but as part of a larger shift toward intent-rich, controlled, tool-mediated development systems.
AI is making discipline more visible.
Teams that understand this will not just ship code faster. They will build development systems that are more predictable, more scalable, more economically legible, and far better aligned with how enterprise software actually gets delivered.

Even with years of experience in leadership and change management, I couldn’t escape the familiar phases of change, this time with AI adoption. And that’s natural. Whenever something new arises, our first instinct is fear, especially when it could solve years of “if onlys.”
Yes, the tech industry is in a storm, and being an engineering lead has never been harder. Everyone is looking to you for answers, direction, and vision.
At first, I was in denial, thinking AI was for someone with fewer responsibilities. But then I realized the real question was:
What can I do to help my teams adapt to AI?
The answer was simple: push myself through the change and proactively lead my team’s transformation. Needless to say, it wasn’t easy.
Even without an assistant, I am a top performer; I do not need to perform better.
Last year, AI tools arrived in our workspaces, but most of us were still in denial. They made for good morning coffee talk, but using them daily? That felt unreal.
Yeah, right. That will not happen. It’s not that I don’t want to learn something new, but I already have a ton on my plate: five teams to lead, acting as PM for a few products, and coaching a few prospective managers. On top of that, there are parallel technical initiatives I need to oversee. I simply don’t have time to play with shiny new tools; that’s for people not juggling ten parallel topics.
You can see how easy it is to fall into the trap of your own perspective. The painful truth is: even if you keep delivering at your current pace, without adopting AI tools, you won’t be able to keep up in a few months.
How can you tell if you’re stuck in denial?
Start by asking yourself three crucial questions:
You don’t want to stay in this state for too long. As a proactive leader, your next step should be to take the time to investigate.
I don’t want you! You are not my assistant!
And suddenly you realize the change is real. Your doubt becomes reality. Every topic is now an AI topic, and it’s irritating. It should feel new and interesting, but you feel pushed into it, without the chance to choose. Already overstretched, your natural response is another primary feeling: anger.
I could sense that state of mind in many of my peers and engineers over the last year. It’s not easy to force yourself into a positive mindset instantly.
The message I want to share is this: it’s okay to feel negative, but staying in that state too long can undermine your results. Spend as little time as possible in it, learn to let it go, and give it a try.
Ok, let’s see what I can really do.
Keeping an open mind is valuable for anyone, no matter their rank or role. As an engineer, rolling up your sleeves should be natural.
So, aside from trying it out of curiosity, I decided to dive deeper and explore architecture and try the tools. Engineers usually transition quickly, but if they get stuck, you can help by highlighting the positive aspects. I used the opportunity to innovate and learn as a positive hook.
I need to relearn everything again – more work for human me… again.
Reality strikes.
I’ve opened a new Pandora’s box. So much to adapt to, while still maintaining old performance. AI should help me, not add more work. Will I ever escape this loop?
Leading through this phase is about providing support and reminding people of the positive outcomes.
We’re friends now. Sorry I was so mean before.
With the knowledge comes the acceptance.
Yes, that was a big change, but I found a couple of good use cases quickly. Claude Code amazed me, and I even saw a few valid Copilot use cases, even though I despised it at first. I started thinking about all the cases I could explore, and my inner engineer took over.
Now it’s easy to bring others on board and help them through the change. And stay transparent, sharing the doubts you’ve faced and showing the human side.
Remember: ignoring changes around you is risky for any organization. It’s natural to fall into denial, but as a leader, it’s crucial to recognize it and take action.
Being aware of the steps that individuals, teams, or the organization need to push through (and helping them do it) is a key leadership skill.
The post As an Engineering Manager, I couldn’t ignore AI if my teams are to survive appeared first on ShiftMag.
This guest post comes from Jacob Lee, Founding Software Engineer at LangChain, who set out to build a coding agent more aligned with how he actually likes to work. Here, he walks through what he built using Deep Agents and the Agent Client Protocol (ACP), and what he learned along the way.
I’ve come to accept that I will delegate an ever-increasing amount of my work as a software engineer to LLMs. I was an early Claude Code superfan, and though my ego still tells me I can write better code situationally than Anthropic’s proto-geniuses in a data center, these days I’m mostly making point edits and suggestions rather than writing modules by hand.
This shift has made me far more productive, but I’ve become increasingly uncomfortable with blindly turning over such a big part of my job to an opaque third party. While training my own model was out of the question for many obvious reasons (and model interpretability is an unsolved problem anyway), the agent harness and UX on top of it is just software, and software IS something I understand. So when I had some free time during my paternity leave, I took a stab at building some tooling to my own specifications.
I work at a startup called LangChain, where we’ve been developing our own set of open-source agentic building blocks, and I settled on building an adapter between our Deep Agents framework and Agent Client Protocol (ACP). My goal was just to build a bespoke coding agent that fit my workflows, but the results were better than I expected. Over the past few months, it’s completely replaced Claude Code as my daily driver, with the added benefit of full observability into my agent’s actions by running LangSmith on top. In this post, I’ll cover how it works and how to set it up for yourself!
If you’re not familiar with ACP, it’s an open protocol that defines how a client (most often an IDE like WebStorm or Zed) interacts with AI agents. It allows you to do cool things like quickly pass a coding agent the exact context you’re looking at in an IDE.
I’ve gotten quite used to being productive in IDEs over my decade writing software professionally, and I still find them valuable for a few reasons:
- They make it easy to navigate a codebase and grep around.

I previously used Claude Code in a separate terminal pane in an IDE, which worked but always felt like two disconnected tools. In JetBrains IDEs, the agent lives in a native tool window with tight integration. I can @mention the file or block of code I’m currently looking at, and many of my threads are littered with messages like “Take a look at this. Does it look funny? @thisFile”.
Though I could have created the various pieces for my agent from scratch, Deep Agents provided a good, opinionated starting point, providing the following:
- Built-in file system tools (read/write/edit_file, ls, grep, etc.).
- A write_todos tool, which encourages the agent to take a planning step that breaks work into steps and tracks progress.

I also added some custom middleware that appends information about the current project setup to the system prompt, such as the directory currently open in the IDE, whether a Git repo is present, which package manager is in use, and more.
It’s also possible to add skills, tweak the system prompt, add custom tools or MCP servers, and more, directly in Python, rather than having to create a new CLI config option.
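That project-context middleware can be approximated in a few lines. The helper name and the marker-file heuristics below are my own sketch, not the Deep Agents API: inspect the open directory, collect facts, and append them to the system prompt.

```python
# Rough sketch of project-context gathering for a system prompt.
# Function name and heuristics are illustrative, not a framework API.
from pathlib import Path

def project_context(root: str) -> str:
    base = Path(root)
    facts = [f"Working directory: {base}"]
    facts.append(f"Git repo: {'yes' if (base / '.git').exists() else 'no'}")
    # Marker files hint at the package manager / build tool in use.
    managers = {"package.json": "npm/pnpm", "pyproject.toml": "uv/pip",
                "pom.xml": "maven", "Cargo.toml": "cargo"}
    found = [tool for marker, tool in managers.items() if (base / marker).exists()]
    facts.append(f"Package manager hints: {', '.join(found) or 'none detected'}")
    return "\n".join(facts)

print(project_context("."))
```

Appending the returned string to the system prompt gives the model grounded facts about the environment instead of leaving it to guess from visible files.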
After deciding on a basic agent setup, I needed to hook that agent into the client via ACP. I created an adapter that implements the ACP interface and handles the session lifecycle, message routing, model switching, and streaming.
One nice surprise was how cleanly the agent’s capabilities mapped onto ACP concepts.
For example:
- The agent’s planning tool (write_todos) maps naturally to agent plans in ACP.

This meant I didn’t need to invent much glue logic – the protocol already had good primitives for most of what I wanted. The overall agent runner looks roughly like this, minus the tool call and message formatting:
```python
current_state = None
user_decisions = []
while current_state is None or current_state.interrupts:
    # Check for cancellation
    if self._cancelled:
        self._cancelled = False  # Reset for next prompt
        return PromptResponse(stop_reason="cancelled")
    async for stream_chunk in agent.astream(
        Command(resume={"decisions": user_decisions})
        if user_decisions
        else {"messages": [{"role": "user", "content": content_blocks}]},
        config=config,
        stream_mode=["messages", "updates"],
        subgraphs=True,
    ):
        if stream_chunk.__interrupt__:
            # If Deep Agents interrupts, request next actions from
            # the client via ACP's session/request_permission method
            user_decisions = await self._handle_interrupts(
                current_state=current_state,
                session_id=session_id,
            )
            # Break out of the current Deep Agent stream. The while
            # loop above resumes it with the user decisions
            # returned from the session/request_permission method
            break
        # ...translate LangGraph output into ACP.
        # Tools that do not require interrupts are called
        # internally; their results are just streamed back here as well.
    # current_state will be None when the agent has finished
    current_state = await agent.aget_state(config)
return PromptResponse(stop_reason="end_turn")
```
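The todo-to-plan mapping mentioned above is close to mechanical. A sketch of the translation (treat the exact ACP field names here as an assumption for illustration):

```python
# Hypothetical translation of Deep Agents' write_todos output into
# ACP plan entries; the exact schema is assumed, not quoted from ACP.
def todos_to_plan(todos: list[dict]) -> list[dict]:
    return [
        {
            "content": todo["content"],
            "status": todo.get("status", "pending"),
            "priority": "medium",  # Deep Agents todos carry no priority
        }
        for todo in todos
    ]
```

Every time the agent rewrites its todo list, the adapter re-emits the whole plan, so the IDE always renders the latest state.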
The human-in-the-loop flow was where I spent the most time. When the agent wants to run a shell command or make a file edit that requires approval, the adapter intercepts the interrupt from Deep Agents, and depending on what permissions mode the user has selected and what they have previously approved, either resumes immediately or sends a permission request to the IDE with options to approve, reject, or always-allow that command type.
The always-allow is session-scoped – if you approve uv sync once and choose “always allow”, subsequent uv sync calls skip the prompt automatically, but I made efforts to prevent similar commands such as uv run script.py from bypassing the permission check.
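One way to scope approvals like this (my own sketch of the idea, not the adapter’s exact logic) is to key each decision on the executable plus its first subcommand, so flags don’t matter but a different subcommand does:

```python
import shlex

def command_key(command: str) -> str:
    """Derive a session-scoped approval key: the executable plus its
    first non-flag argument (the subcommand), ignoring flags."""
    tokens = shlex.split(command)
    if not tokens:
        return ""
    key = [tokens[0]]
    for token in tokens[1:]:
        if not token.startswith("-"):
            key.append(token)  # first positional arg = subcommand
            break
    return " ".join(key)

class SessionPermissions:
    """Tracks 'always allow' decisions for the current session only."""

    def __init__(self) -> None:
        self._always_allowed: set[str] = set()

    def allow_always(self, command: str) -> None:
        self._always_allowed.add(command_key(command))

    def is_allowed(self, command: str) -> bool:
        return command_key(command) in self._always_allowed
```

With this keying, approving uv sync would also cover uv sync with extra flags, while uv run script.py still triggers a fresh permission request.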
Here’s how the end result looks in WebStorm:

While I haven’t run formal evals, I was pleasantly surprised by how well my agent performed after only a few iterations. I didn’t actually expect to switch away from Claude Code, and it was a great dogfooding exercise as well, since our OSS team was able to upstream some of my feedback back into Deep Agents itself.
My original goal of regaining code-level, rather than config-level, control over my daily workflows has also paid off. When Anthropic had an outage a few weeks ago, I was able to switch over to OpenAI’s gpt-5.4 without skipping a beat, and I even found that it had some interesting quirks. I switch back and forth between models mid-session to gain different perspectives from each model when working on tricky tasks, and have also found that open-source models like GLM-5 are quite capable while offering significant cost savings.
Another boon is observability via LangSmith tracing, which allows me to debug and improve my agent when I run into issues. Being able to see exactly what context was passed to the model, which tools it called, and where it went sideways helped me understand behaviors that were previously hidden inside the harness. Here’s an example of what such a trace looks like:

For example, when I noticed that my agent was starting to take wide, slow sweeps of my filesystem, I used a trace to find a bug in my system prompt that told the agent the project was at the filesystem root rather than the current working directory.
What started as a small late-night project, squeezed in around taking care of a newborn daughter, turned into a huge success, both for my own understanding of agent behavior and for improving my daily workflow.
It proved to me that Claude Code isn’t magic but a bundle of very clever tricks rolled up into a neat package. The harness layer is just software, and software is something any developer can shape to fit how they want to work.
If you’re curious, I’d highly recommend trying an experiment like this yourself. Even a small prototype can teach you a lot about how these systems think and where they break. Clone the repo and follow the setup guide here to get started from source code. I’d love to know what you think. You can reach out to me on X @Hacubu to let me know!
Special thanks to @veryboldbagel and @masondxry for helping productionize the adapter and dealing with my unending questions and feedback!
Until now, Junie CLI has worked like any other standalone agent. It was powerful, but disconnected from the workflows you set up for your specific projects. That changes today.
Junie CLI can now connect to your running JetBrains IDE and use its full code intelligence, including the indexing, semantic analysis, and tooling you already rely on. The agent works with your IDE the same way you do. It sees what you see, knows what you’ve been working on, and uses the same build and test configurations you’ve already set up.
No manual setup is required – Junie CLI detects your running IDE automatically. If you have a JetBrains AI subscription, everything works out of the box.
Most AI coding agents operate in isolation. They read your files, guess at your project structure, and attempt to run builds or tests without full context. This can work for simple projects, but it falls apart in real-world codebases, such as monorepos with complex build configurations, projects with hundreds of modules, or test setups that took your team weeks to get right.
Junie doesn’t guess. It asks your IDE, which gives it the power to:
- See what you’re working on right now – which file is open, what code you’ve selected, and which builds and tests you’ve run recently. Instead of scanning your entire repository to understand what’s relevant, Junie starts with the same context you have.
- Use the IDE’s pre-configured test runners on a monorepo or any project with a non-trivial test setup – no guessing at commands and no broken configurations.
- Rename a symbol using the IDE’s semantic index to find every usage – searching across files, respecting scope, and handling overloads and variables with the same name that appear in different contexts. This is the kind of refactoring that text-based search gets wrong.
- Run builds and tests using your existing IDE configurations. Custom build commands, non-obvious test runners, cross-compilation targets – if your IDE understands them, Junie does too.
- Navigate the project structure from the IDE’s index without reading files line by line. Its synonym-aware search finds “variants” when you search for “options”. Junie navigates code the way you would, not the way grep does.
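To see concretely why text-based renaming falls short, here is a toy illustration (nothing Junie-specific): a regex rename rewrites every same-named variable regardless of scope, whereas a semantic index would touch only the usages bound to the chosen symbol.

```python
import re

SOURCE = (
    "def total(items):\n"
    "    count = len(items)\n"
    "    return count\n"
    "\n"
    "def report(rows):\n"
    "    count = sum(rows)\n"
    "    return count\n"
)

# Renaming `count` in total() with a text-based search also rewrites
# the unrelated `count` in report(), because grep-style matching
# knows nothing about scope.
naive = re.sub(r"\bcount\b", "item_count", SOURCE)
assert naive.count("item_count") == 4  # all four usages, across both scopes
```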
Junie CLI’s IDE integration works in all JetBrains IDEs. Support for Android Studio is coming soon.
Make sure your JetBrains IDE is running, then launch Junie CLI in your project directory. It will automatically detect the IDE and prompt you to install the integration plugin. One click, and you’re connected.
If you’re a JetBrains AI subscriber, authentication is automatic. Bring Your Own Key (for Anthropic, OpenAI, etc.) is also fully supported.
This integration is currently in Beta. We’re actively expanding the capabilities Junie can access through your IDE, and your feedback will directly shape what comes next.
Try it out, and let us know what you think.