Kotlin 2.3.0-RC is out!
Kotlin 2.3.0-RC has landed! Learn about the new features, explore the improvements, and get ready for the upcoming stable release.
See what’s new
I’ve gathered the latest Kotlin highlights for you – from the Kotlin Reddit AMA and documentation updates to learning programs and Google Summer of Code 2025 projects. Whether you’re here to stay up to date or just looking for something interesting to explore, there’s plenty to dive into.
In this special episode, Dawid Dahl introduces Augmented AI Development (AAID)—a disciplined approach where professional developers augment their capabilities with AI while maintaining full architectural control. He explains why starting with software engineering fundamentals and adding AI where appropriate is the opposite of most frameworks, and why this approach produces production-grade software rather than technical debt.
"Two of the fundamental developer principles for AAID are: first, don't abandon your brain. And the second is incremental steps."
Dawid's Augmented AI Development framework stands in stark contrast to "vibecoding"—which he defines strictly as not caring about code at all, only results on screen. AAID is explicitly designed for professional developers who maintain full understanding and control of their systems. The framework is positioned on the furthest end of the spectrum from vibe coding, requiring developers to know their craft deeply. The two core principles—don't abandon your brain, work incrementally—reflect a philosophy that AI is a powerful collaborator, not a replacement for thinking. This approach recognizes that while 96% of Dawid's code is now written by AI, he remains the architect, constantly steering and verifying every step.
In this segment we refer to Marcus Hammarberg's work and his book The Bungsu Story.
"You should start with software engineering wisdom, and then only add AI where it's actually appropriate. I think this is super, super important, and the entire foundation of this framework. This is a hill I will personally die on."
What makes AAID fundamentally different from other AI-assisted development frameworks is its starting point. Most frameworks start with AI capabilities and try to add structure and best practices afterward. Dawid argues this is completely backwards. AAID begins with 50-60 years of proven software engineering wisdom—test-driven development, behavior-driven development, continuous delivery—and only then adds AI where it enhances the process. This isn't a minor philosophical difference; it's the foundation of producing maintainable, production-grade software. Dawid admits he's sometimes "manipulating developers to start using good, normal software engineering practices, but in this shiny AI box that feels very exciting and new." If the AI wrapper helps developers finally adopt TDD and BDD, he's fine with that.
"Every time I prompt an AI and it writes code for me, there is often at least one or two or three mistakes that will cause catastrophic mistakes down the line and make the software impossible to change."
Test-driven development isn't just a nice-to-have in AAID—it's essential. Dawid has observed that AI consistently makes 2-3 mistakes per prompt that could have catastrophic consequences later. Without TDD's red-green-refactor cycle, these errors accumulate, making code increasingly difficult to change. TDD answers the question "Is my code technically correct?" while acceptance tests answer "Is the system releasable?" Both are needed for production-grade software. The refactor step is where 50-60 years of software engineering wisdom gets applied to make code maintainable. This matters because AAID isn't vibe coding—developers care deeply about code quality, not just visible results. Good software, as Dave Farley says, is software that's easy to change. Without TDD, AI-generated code becomes a maintenance nightmare.
"When I hear 'our AI can now code for over 30 hours straight without stopping,' I get very afraid. You fall asleep, and the next morning, the code is done. Maybe the tests are green. But what has it done in there? Imagine everything it does for 30 hours. This system will not work."
Dawid sees two diverging paths for AI-assisted development's future. The first—autonomous agents working for hours or days without supervision—terrifies him. The marketing pitch sounds appealing: prompt the AI, go to sleep, wake up to completed features. But the reality is technical debt accumulation at scale. Imagine all the decisions, all the architectural choices, all the mistakes an AI makes over 30 hours of autonomous work. Dawid advocates for the stark contrast: working in extremely small increments with constant human steering, always aligned to specifications. His vision of the future isn't AI working alone—it's voice-controlled confirmations where he says "Yes, yes, no, yes" as AI proposes each tiny change. This aligns with DORA metrics showing that high-performing teams work in small batches with fast feedback loops.
"Without Dave Farley, this framework would be totally different. I think he does everything right, basically. With this framework, I want to stand on the shoulders of giants and work on top of what has already been done."
AAID explicitly requires product discovery and specification phases before AI-assisted coding begins. This is based on Dave Farley's product journey model, which shows how products move from idea to production. AAID starts at the "executable specifications" stage—it requires input specifications from prior discovery work. This separates specification creation (which Dawid is addressing in a separate "Dream Encoder" framework) from code execution. The prerequisite isn't arbitrary; it acknowledges that AI-assisted implementation works best when the problem is well-defined. This "standing on shoulders of giants" approach means AAID doesn't try to reinvent software engineering—it leverages decades of proven practices from TDD pioneers, BDD creators, and continuous delivery experts.
"When the AI decides to check the box [in task lists], that means this is the definition of done. But how is the AI taking that decision? It's totally ad hoc. It's like going back to the 1980s: 'I wrote the code, I'm done.' But what does that mean? Nobody has any idea."
Dawid is critical of current AI frameworks like SpecKit, pointing out fundamental flaws. They start with AI first and try to add structure later (backwards approach). They use task lists with checkboxes where AI decides when something is "done"—but without clear criteria, this becomes ad hoc decision-making reminiscent of 1980s development practices. These frameworks "vibecode the specs," not realizing there's a structured taxonomy to specifications that BDD already solved. Most concerning, some have removed testing as a "feature," treating it as optional. Dawid sees these frameworks as over-engineered, process-centric rather than developer-centric, often created by people who may not develop software themselves. AAID, in contrast, is built by a practicing developer solving real problems daily.
"The first thing developers should do is learn the fundamentals. They should skip AI altogether and learn about BDD and TDD, just best practices. But when you know that, then you can look into a framework, maybe like mine."
Dawid's advice for developers interested in AI-assisted coding might seem counterintuitive: start by learning fundamentals without AI. Master behavior-driven development, test-driven development, and software engineering best practices first. Only after understanding these foundations should developers explore frameworks like AAID. This isn't gatekeeping—it's recognizing that AI amplifies whatever approach developers bring. If they start with poor practices, AI will help them build unmaintainable systems faster. But if they start with solid fundamentals, AI becomes a powerful multiplier that lets them work at unprecedented speed while maintaining quality. AAID offers both a dense technical article on dev.to and a gentler game-like onboarding in the GitHub repo, meeting developers wherever they are in their journey.
About Dawid Dahl
Dawid is the creator of Augmented AI Development (AAID), a disciplined approach where developers augment their capabilities by integrating with AI, while maintaining full architectural control. Dawid is a software engineer at Umain, a product development agency.
You can link with Dawid Dahl on LinkedIn and find the AAID framework on GitHub.
The pressure is on. Every conference, every tech blog, every corner of the internet is buzzing with AI agents, autonomous workflows, and the promise of a revolution powered by large language models (LLMs). As a developer, it’s easy to feel like you need to integrate AI into every feature and deploy agents for every task.
But what if the smartest move isn’t to use AI, but to know when not to?
This isn’t a contrarian take for the sake of it; it’s a call for a return to engineering pragmatism. The current hype cycle often encourages us to reach for the most complex, exciting tool in the box, even when a simple screwdriver would do the job better. Just as you wouldn’t spin up a Kubernetes cluster to host a static landing page, you shouldn’t use a powerful, probabilistic LLM for a task that is, and should be, deterministic.
This post is a guide to cutting through the noise. We’ll explore why using AI indiscriminately is an anti-pattern, and lay out a practical framework for deciding when and how to use AI and agents effectively. All of this will help ensure you’re building solutions that are robust, efficient, and cost-effective.
Would you use a cloud-based AI service to solve 1 + 1? Of course not. It sounds absurd. Yet, many of the AI implementations I see today are the developer equivalent of that very question. We’re so mesmerized by what AI can do that we forget to ask if it should do it.
Using an LLM for a simple, well-defined task is a classic case of over-engineering. It introduces three significant penalties that deterministic, traditional code avoids.

API calls to powerful models like GPT-5 or Claude 4.5 are not free. Let’s say you need to validate if a user’s input is a valid email address. You could send this string to an LLM with a prompt like, “Is the following a valid email address? Answer with only ‘true’ or ‘false’.”
A simple regex in JavaScript, however, is free and executes locally in microseconds.
function isValidEmail(email) {
  const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return regex.test(email);
}
Now, imagine this check runs on every keystroke in a form for thousands of users. The cost of the LLM approach quickly spirals from negligible to substantial. The regex remains free, forever.

Every API call to an LLM is a network round-trip. It involves sending your prompt, waiting for the model to process it, and receiving the response. For the user, this translates to noticeable lag.
Consider a simple data transformation: converting a string from snake_case to camelCase. A local function is instantaneous. An LLM call could take anywhere from 300 milliseconds to several seconds. In a world where user experience is paramount, introducing such a bottleneck for a trivial task is a step backward.
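For comparison, here is a minimal local sketch of that transformation (adjust the edge-case handling to your own naming conventions):

// Convert snake_case to camelCase locally: no network call, no waiting on a model.
function snakeToCamel(str) {
  return str.replace(/_([a-z])/g, (match, letter) => letter.toUpperCase());
}

console.log(snakeToCamel('user_first_name')); // "userFirstName"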

This is the most critical issue for developers. LLMs are probabilistic; your code should be deterministic. When you run a function, you expect the same input to produce the same output, every single time. This is the bedrock of reliable software.
LLMs don’t offer that guarantee. They can hallucinate or misinterpret context. If your AI-powered email validator suddenly decides a valid email is invalid, you have a bug that is maddeningly difficult to reproduce and debug. For core application logic, you need 100% reliability.
So, if we shouldn’t use AI for simple, deterministic tasks, what is it actually good for?
The answer is simple: use AI for problems that are difficult or impossible to solve with traditional code. LLMs excel where logic is fuzzy, data is unstructured, and the goal is generation or interpretation, not calculation.
The guiding principle should be: Use deterministic code for deterministic problems. Use probabilistic models for probabilistic problems.
This applies to both in-application logic and our own development processes.
There are a few areas where an LLM is the right tool for the job inside your application: anywhere the logic is fuzzy, the data is unstructured, or the goal is generation or interpretation rather than calculation.
Beyond integrating AI into your applications, remember its immense utility as a personal productivity tool. Using AI for auxiliary development tasks boosts your efficiency without introducing its probabilistic nature into your core application logic.
The hype around AI gets even more intense when we talk about AI agents. An agent is often presented as a magical entity that can autonomously use tools, browse the web, and solve complex problems with a single prompt.
This autonomy is both a great strength and a significant risk. Letting an LLM decide which tool to use or determine next steps introduces another layer of non-determinism. What if it chooses the wrong tool? What if it gets stuck in a loop, burning through your budget?
Before jumping to a fully autonomous agent, we should look at a spectrum of patterns that offer more structure and reliability. In their excellent article, “Building effective agents,” the team at Anthropic draws a crucial architectural distinction: workflows, where LLMs and tools are orchestrated through predefined code paths, versus agents, where the LLM dynamically directs its own process and tool usage.
The key takeaway is to start with the simplest solution and only add complexity when necessary. The image below visualizes this spectrum, moving from predictable workflows to inference-driven agents:
Before building complex workflows, we need to understand the fundamental unit: what Anthropic calls the Augmented LLM. This isn’t just a base model; it’s an LLM enhanced with external capabilities. The two most important augmentations are tool use and retrieval-augmented generation (RAG).
These two building blocks—Tool Use and RAG—are the foundation upon which more sophisticated and reliable systems are built.
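To make the shape of this concrete, here is a rough JavaScript sketch of an augmented LLM loop. The callLLM and searchDocs helpers, and the { text, toolCall } response shape, are hypothetical stand-ins for whichever model API and retrieval layer you actually use:

// Minimal sketch of an "augmented LLM": retrieval first, then a loop that lets the
// model call tools until it can answer. All helpers here are hypothetical.
async function augmentedLLM(question, { callLLM, searchDocs, tools }) {
  const context = await searchDocs(question); // RAG: fetch relevant knowledge

  const messages = [
    { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` },
  ];

  let response = await callLLM(messages, tools); // assumed to return { text, toolCall }
  while (response.toolCall) {
    const tool = tools.find((t) => t.name === response.toolCall.name);
    const result = await tool.run(response.toolCall.arguments); // tool use
    messages.push({ role: 'tool', content: JSON.stringify(result) });
    response = await callLLM(messages, tools);
  }
  return response.text;
}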
Now, let’s see how these building blocks can be assembled into the structured, predictable workflows that Anthropic recommends as alternatives to fully autonomous agents.
Prompt chaining is the simplest multi-step pattern. A task is broken down into a fixed sequence of steps, where the output of one LLM call becomes the input for the next. A step within this chain could absolutely involve the LLM using a tool or performing a RAG query.
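As a rough sketch, assuming a hypothetical callLLM(prompt) helper that returns the model’s text:

// Prompt chaining: a fixed sequence of calls, each consuming the previous output.
async function summariseAndTranslate(rawText, callLLM) {
  const summary = await callLLM(`Summarise the following in three bullet points:\n${rawText}`);
  const translated = await callLLM(`Translate these bullet points into French:\n${summary}`);
  return translated;
}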
The routing pattern uses an initial LLM call to classify an input and direct it to one of several specialized, downstream workflows. This is perfect for customer support, where you might route a query to a “refund process” (which uses a tool to access an orders database) or a “technical support” handler (which uses RAG to search documentation).
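A sketch of the same idea, with the hypothetical handlers map standing in for whatever downstream workflows you define:

// Routing: one classification call decides which specialised handler gets the query.
async function routeSupportQuery(query, callLLM, handlers) {
  const category = (await callLLM(
    `Classify this support query as exactly one word: refund, technical, or other.\n${query}`
  )).trim().toLowerCase();
  const handler = handlers[category] || handlers.other; // e.g. { refund, technical, other }
  return handler(query);
}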
Parallelization involves running multiple LLM calls simultaneously and aggregating their outputs. For instance, you could have one LLM call use RAG to find relevant policy documents while another call uses a tool to check the user’s account status.
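Sketched with the same hypothetical helpers, using Promise.all to run the two branches concurrently before an aggregation step:

// Parallelization: independent calls run concurrently, then their outputs are combined.
async function answerWithPolicyAndStatus({ text, userId }, { callLLM, searchPolicies, getAccountStatus }) {
  const [policyNotes, accountStatus] = await Promise.all([
    searchPolicies(text).then((docs) => callLLM(`Which of these policies apply to: "${text}"?\n${docs}`)),
    getAccountStatus(userId), // a plain tool call, no LLM needed here
  ]);
  return callLLM(
    `Draft a reply to "${text}" using these policy notes:\n${policyNotes}\nand this account status: ${JSON.stringify(accountStatus)}`
  );
}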
In the orchestrator-workers pattern, a central “orchestrator” LLM breaks down a complex task and delegates sub-tasks to specialized “worker” LLMs, which in turn use the tools and retrieval capabilities necessary to complete their specific job.
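A compact sketch, assuming the orchestrator returns a JSON plan and workers is a map of specialised functions:

// Orchestrator-workers: a central call plans sub-tasks; specialised workers carry them out.
async function orchestrate(task, callLLM, workers) {
  const plan = JSON.parse(await callLLM(
    `Break this task into steps as a JSON array of {"worker": "<name>", "instruction": "<text>"}:\n${task}`
  ));
  const results = [];
  for (const step of plan) {
    results.push(await workers[step.worker](step.instruction)); // each worker may use its own tools
  }
  return callLLM(`Combine these results into a final answer for "${task}":\n${results.join('\n')}`);
}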
The evaluator-optimizer pattern creates an iterative refinement loop: one LLM call generates a response, while a second “evaluator” LLM provides feedback. The first LLM then uses this feedback to improve its response. The evaluator’s criteria could be informed by information retrieved via RAG.
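A sketch of that loop, again with the hypothetical callLLM helper and capped at a few rounds so it cannot run away:

// Evaluator-optimizer: one call drafts, a second call critiques, and the loop repeats
// until the evaluator approves or we hit a retry limit.
async function generateWithReview(task, callLLM, maxRounds = 3) {
  let draft = await callLLM(`Complete this task:\n${task}`);
  for (let round = 0; round < maxRounds; round++) {
    const feedback = await callLLM(
      `Review this response to "${task}". Reply APPROVED if it is good, otherwise list improvements:\n${draft}`
    );
    if (feedback.trim().startsWith('APPROVED')) break;
    draft = await callLLM(`Improve the response using this feedback.\nResponse:\n${draft}\nFeedback:\n${feedback}`);
  }
  return draft;
}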
Before you write import openai, pause and ask yourself a few questions: Is the task deterministic? What will the API calls cost at scale? How much latency can the user tolerate? What happens when the model gets it wrong? The goal is not to use AI; the goal is to solve a problem efficiently.
Being a great developer isn’t about chasing every new trend. It’s about building robust, efficient, and valuable software. AI and agents are incredibly powerful tools, but they are just that: tools. True innovation comes from knowing your toolbox inside and out and picking the right tool for the job.
Focus on the customer’s pain points. Solve them in the most efficient and reliable way possible. Sometimes that will involve a cutting-edge LLM, but often, it will be a simple, elegant piece of deterministic code. That is the path to building things that last.
TL;DR: Turn static PDFs into interactive documents! Learn how to add and manage annotations in JavaScript with highlights, notes, and export options. Includes API integration steps for creating, editing, and persisting annotations in web apps.
PDF annotations are interactive markups, such as highlights, underlines, strikethroughs, stamps, comments, free text, ink, and shapes, that help clarify content without altering the original document. They make collaboration easier and keep feedback organized.
In modern workflows, reviewing PDFs without annotation tools is a slow and inefficient process. Teams often rely on emails or external chat platforms to share feedback, which leads to delays and increases the risk of miscommunication.
JavaScript PDF annotation enables users to add these markups directly in the browser, preserving the document’s integrity while facilitating faster, clearer, and more contextual reviews. This is especially useful for online PDF workflows where desktop tools aren’t practical.
In this guide, you’ll learn how to implement PDF annotation in JavaScript using Syncfusion’s PDF Viewer component.
When you use JavaScript to add PDF annotations, you can choose from several types to make documents more interactive and easier to review. In PDFs, an annotation is any user-added markup, such as highlights, shapes, ink, free text, stamps, or comments, that helps clarify the document without altering its original content.
Here are the most common types of annotations and how you can use them:
Text markup annotations are used to visually highlight specific parts of a PDF without changing its original content. Common types include highlight, underline, and strikethrough.
Ideal for legal reviews, academic proofreading, and technical documentation to mark important sections or suggest edits.

Shape annotations include Rectangle, Circle, Arrow, Polygon, and Line, allowing users to highlight specific areas in a document. They are particularly useful for marking diagrams, technical drawings, or visually emphasizing sections. Commonly applied in design and engineering documents to highlight critical areas.

Stamp annotations enable users to place predefined or custom stamps, such as “Approved,” “Rejected,” or logos, on a PDF to indicate status or branding. This feature is ideal for workflows that require visual confirmation of approvals or the addition of branding elements.

Sticky notes allow users to add small comment pop-ups to a PDF, providing feedback or extra context without cluttering the main content. They are ideal for document reviews where suggestions, clarifications, or inline comments are needed.

Free Text annotations enable users to type text directly onto the PDF, adding visible notes or instructions without modifying the original content. Used to add comments, labels, or instructions in forms and review documents.

Ink annotations allow freehand drawing on a PDF using a pen tool, making them ideal for sketches, notes, or signatures. They are often used to sign forms, mark diagrams, or add any freehand input.

Measurement annotations allow users to measure distances, radii, areas, or volumes directly on a PDF, which is essential for precision-based tasks. They are widely used in architectural plans and engineering drawings to verify dimensions.

Handwritten signature annotations allow users to sign documents directly in the browser. This feature is ideal for signing contracts or approving forms directly within the web application.

Would you like to try these annotations in your app? Explore Syncfusion JavaScript PDF Viewer and check out the live annotation demo to see it in action.
After exploring all these annotation types, let’s dive into the simple steps to integrate Syncfusion JavaScript PDF Viewer into your application and start adding annotations to your PDFs.
Setting up the Syncfusion JavaScript PDF Viewer is quick and straightforward.
Here’s how to get started:
First, open your terminal and run the following command to install the Syncfusion PDF Viewer package:
npm install @syncfusion/ej2-pdfviewer
Next, navigate to your JavaScript or TypeScript file and import the modules needed:
import {
  PdfViewer,
  Toolbar,
  Magnification,
  Navigation,
  Annotation,
  LinkAnnotation,
  ThumbnailView,
  BookmarkView,
  TextSelection,
  TextSearch,
  FormFields,
  FormDesigner
} from '@syncfusion/ej2-pdfviewer';
These modules offer essential features, including navigation, annotations, text search, and more, to provide a comprehensive PDF viewing experience.
Finally, create an instance of the PDF Viewer and attach it to your page:
// Register the imported feature modules, then create the viewer instance.
PdfViewer.Inject(Toolbar, Magnification, Navigation, Annotation, LinkAnnotation,
  ThumbnailView, BookmarkView, TextSelection, TextSearch, FormFields, FormDesigner);
let viewer = new PdfViewer({
  documentPath: 'https://cdn.syncfusion.com/content/pdf/pdf-succinctly.pdf',
});
viewer.appendTo('#PdfViewer'); // attach to a container element such as <div id="PdfViewer"></div>
And that’s it! Your Syncfusion JavaScript PDF Viewer is now ready to use.
Note: For a detailed walkthrough and advanced configuration options, refer to the Getting Started documentation.
Now that your PDF Viewer is set up, let’s explore how to add, delete, and manage annotations using the built-in toolbar or through APIs.
Adding annotations is simple and fast. You can add annotations either through the user interface or programmatically using Syncfusion’s JavaScript PDF Viewer.
The easiest way to add annotations is through the built-in toolbar. It provides an intuitive interface that allows users to interact directly with PDF pages for seamless review and markup. The toolbar supports several annotation types, including text markup (highlight, underline, strikethrough), shapes, stamps, sticky notes, free text, ink, measurement, and handwritten signature annotations.
With these options, users can quickly and efficiently annotate PDFs, making collaboration and document review more interactive.

For developers who need flexibility and automation, the API provides a powerful way to add annotations dynamically. Instead of relying on the UI, you can use the addAnnotation method to create annotations programmatically:
pdfviewer.annotation.addAnnotation(type, options);
This approach is ideal for custom workflows or data-driven annotation logic, where annotations need to be generated automatically based on user actions or external data.
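For example, adding a highlight to the first page might look something like the snippet below, using the viewer instance created earlier. This is an illustrative sketch only: the exact option names for each annotation type (bounds, pageNumber, and so on) should be checked against the Syncfusion API reference.

// Illustrative sketch: add a highlight annotation programmatically.
// Option names shown here are assumptions; consult the API reference for the
// full set of options per annotation type.
viewer.annotation.addAnnotation('Highlight', {
  pageNumber: 1,
  bounds: [{ x: 97, y: 610, width: 350, height: 14 }]
});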
Note: For advanced configuration and additional options, refer to the documentation.
Removing annotations is just as simple as adding them. You can delete annotations either through the user interface or programmatically using Syncfusion’s JavaScript PDF Viewer.
Users can remove specific annotations directly from the PDF by clicking the Delete icon in the toolbar or by right-clicking the annotation and selecting Delete from the context menu. This provides a quick and intuitive way to clean up unwanted markups during review.

For developers who need more control, annotations can be deleted programmatically using the deleteAnnotationById method. Simply pass the annotation’s unique ID:
viewer.annotationModule.deleteAnnotationById(annotationId);
The annotationId is a unique identifier for the annotation you want to remove. You can retrieve this ID from the annotationCollection array, making it easy to manage annotations dynamically.
Note: For advanced options and detailed examples, refer to the official documentation.
Beyond deleting, Syncfusion’s PDF Viewer allows users to edit, customize, comment, export, and import annotations, creating a seamless review experience.
To add comments, simply double-click an annotation or use the toolbar to open the comment panel. From there, you can add notes, reply to existing comments, and even set statuses like Accepted or Rejected for collaborative reviews.

Make annotations stand out by adjusting properties such as stroke or fill color, opacity, thickness, or font styles for free text annotations. These changes can be applied through the Properties panel or programmatically via the API.

Syncfusion’s JavaScript PDF Viewer supports exporting annotations in XFDF and JSON formats. This feature allows developers to persist, share, and reload annotations efficiently in web applications.
Export example:
var xfdfData = pdfviewer.exportAnnotations();
console.log(xfdfData); // XFDF string
// Persist it so it can be re-imported later, e.g. localStorage.setItem('pdfAnnotations', xfdfData);

To reload annotations from XFDF, use the code below:
var savedData = localStorage.getItem('pdfAnnotations');
if (savedData) {
pdfviewer.importAnnotations(savedData);
}
Managing annotations doesn’t have to be complicated. Users can simply right-click on any annotation to access quick actions such as cut, copy, paste, delete, or add comments. This streamlined approach makes it easy to modify annotations directly within the document, eliminating the need for additional UI components and ensuring a faster, more intuitive review process.

Note: To explore all annotation management features in detail, refer to the official documentation.
Thank you for reading! Implementing annotations with the Syncfusion JavaScript PDF Viewer enhances your web application by enabling users to add highlights, sticky notes, shapes, stamps, ink, and free text directly within PDFs. Whether through an intuitive toolbar or a flexible API, annotations can be added and customized with ease.
Developers can fine-tune properties like color, opacity, and thickness, and ensure persistence by saving annotations to the PDF or exporting/importing them in XFDF or JSON formats. Here’s what developers say about Syncfusion:
“The PDF Viewer component offers all the functionality I could ask for and is fully customizable.” – G2 Reviewer
Ready to implement JavaScript PDF annotation? Try the live demo and explore the full PDF Viewer features in the documentation.
If you’re a Syncfusion user, you can download the setup from the license and downloads page. Otherwise, you can download a free 30-day trial.
You can also contact us through our support forum, support portal, or feedback portal for queries. We are always happy to assist you!
Ever tried to hammer a nail in with a potato?
Nor me, but that’s what I’ve felt like I’ve been attempting to do when trying to really understand agents, as well as to come up with an example agent to build.
As I wrote about previously, citing Simon Willison, an LLM agent runs tools in a loop to achieve a goal. Unlike building ETL/ELT pipelines, these were new concepts that I struggled to fit into even a semi-plausible real-world example.
That’s because I was thinking about it all wrong.
For the last cough 20 cough years I’ve built data processing pipelines, either for real or as examples based on my previous experience. It’s the same pattern, always:
Data comes in
Data gets processed
Data goes out
Maybe we fiddle around with the order of things (ELT vs ETL), maybe a particular example focusses more on one particular point in the pipeline—but all the concepts remain pleasingly familiar. All I need to do is figure out what goes in the boxes:
I’ve even extended this to be able to wing my way through talking about applications and microservices (kind of). We get some input, we make something else happen.
Somewhat stretching beyond my experience, admittedly, but it’s still the same principles. When this thing happens, make a computer do that thing.
Perhaps I’m too literal, perhaps I’m cynical after too many years of vendor hype, or perhaps it’s just how my brain is wired—but I like concrete, tangible, real examples of something.
So when it comes to agents, particularly with where we’re at in the current hype-cycle, I really wanted to have some actual examples on which to build my understanding. In addition, I wanted to build some of my own. But where to start?
Here was my mental model; literally what I sketched out on a piece of paper as I tried to think about what real-world example could go in each box to make something plausible:
But this is where I got stuck, and where I spun my proverbial wheels for several days. Every example I could think of ended up with me uttering, exasperated, “…but why would you do it like that?”
My first mistake was focussing on the LLM bit as needing to do something to the input data.
I had a whole bunch of interesting data sources (like river levels, for example) but my head blocked on "but that’s numbers, what can you get an LLM to do with those?!". The LLM bit of an agent, I mistakenly thought, demanded unstructured input data for it to make any sense. After all, if it’s structured, why aren’t we just processing it with a regular process—no need for magic fairy dust here.
This may also have been an over-fitting of an assumption based on my previous work with an LLM to summarise human-input data in a conference keynote.
The tool bit baffled me just as much. With hindsight, the exact problem turned out to be the solution. Let me explain…
Whilst there are other options, in many cases an agent calling a tool is going to do so using MCP. Thus, grabbing the dog firmly by the tail and proceeding to wag it, I went looking for MCP servers.
Looking down a list of hosted MCP servers that I found, I saw that there were only about half a dozen that were open, including GlobalPing, AlphaVantage, and CoinGecko.
Flummoxed, I cast around for an actual use of one of these, with an unstructured data source. Oh jeez…are we really going to do the 'read a stream of tweets and look up the stock price/crypto-token' thing again?
The mistake I made was this: I’d focussed on the LLM bit of the agent definition:
an LLM agent runs tools in a loop to achieve a goal
Actually, what an agent is about is this:
[…] runs tools
The LLM bit can do fancy LLM stuff—but it’s also there to just invoke the tool(s) and decide when they’ve done what they need to do.
A tool is quite often just a wrapper on an API. So what we’re saying is, with MCP, we have a common interface to APIs. That’s…all.
We can define agents to interact with systems, and the way they interact is through a common protocol: MCP. When we load a web page, we don’t concern ourselves with what Chrome is doing, and unless we stop and think about it we don’t think about the TCP and HTTP protocols being used. It’s just the common way of things talking to each other.
And that’s the idea with MCP, and thus tool calling from agents. (Yes, there are other ways you can call tools from agents, but MCP is the big one, at the moment).
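To make that concrete, here’s a hedged sketch of what a tool usually boils down to: a named, described wrapper around an ordinary API call. The registerTool helper and the forecast endpoint are hypothetical placeholders rather than any specific MCP SDK.

// A "tool" is typically just an API call plus a name, a description, and an input schema,
// exposed over a common protocol (MCP) so any agent can discover and invoke it.
// registerTool and the endpoint below are hypothetical placeholders.
registerTool({
  name: 'get_weather_forecast',
  description: 'Returns the weather forecast for a given location.',
  inputSchema: { location: 'string' },
  run: async ({ location }) => {
    const res = await fetch(`https://api.example.com/forecast?location=${encodeURIComponent(location)}`);
    return res.json();
  },
});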
Given this reframing, it makes sense why there are so few open MCP servers. If an MCP server is there to offer access to an API, who leaves their API open for anyone to use? Well, read-only data providers like CoinGecko and AlphaVantage, perhaps.
In general though, the really useful thing we can do with tools is change the state of systems. That’s why any SaaS platform worth its salt is rushing to provide an MCP server. Not to jump on the AI bandwagon per se, but because if this is going to be the common protocol by which things get to be automated with agents, you don’t want to be there offering Betamax when everyone else has VHS.
SaaS platforms will still provide their APIs for direct integration, but they will also provide MCP servers. There’s also no reason why applications developed within an organisation wouldn’t offer MCP either, in theory.
No, not really. It actually makes a bunch of sense to me. I personally also like it a lot from a SQL-first, not-really-a-real-coder point of view.
Let me explain.
If you want to build a system to respond to something that’s happened by interacting with another external system, you have two choices now:
1. Write custom code to call the external system’s API. Handle failures, retries, monitoring, etc. If you want to interact with a different system, you now need to understand the different API, work out how to call it, and write new code to do so.
2. Write an agent that responds to the thing that happened, and have it call the tool. The agent framework now standardises handling failures, retries, and all the rest of it. If you want to call a different system, the agent stays pretty much the same. The only thing that you change is the MCP server and tool that you call.
You could write custom code—and there are good examples of where you’ll continue to. But you no longer have to.
For Kafka folk, my analogy here would be data integration with Kafka Connect. Kafka Connect provides the framework that handles all of the sticky and messy things about data integration (scale, error handling, types, connectivity, restarts, monitoring, schemas, etc etc etc). You just use the appropriate connector with it and configure it. Different system? Just swap out the connector. You want to re-invent the wheel and re-solve a solved-problem? Go ahead; maybe you’re special. Or maybe NIH is real ;P
So…what does an actual agent look like now, given this different way of looking at it? How about this:
Sure, the LLM could do a bunch of clever stuff with the input. But it can also just take our natural language expression of what we want to happen, and make it so.
Agents can use multiple tools, from multiple MCP servers.
Confluent launched Streaming Agents earlier this year. They’re part of the fully-managed Confluent Cloud platform and provide a way to run agents like I’ve described above, driven by events in a Kafka topic.
Here’s what the above agent would look like as a Streaming Agent:
Is this over-engineered? Do you even need an agent? Why not just do this?
or this?
You can. Maybe you should. But…don’t forget failure conditions. And restarts. And testing. And scaling.
All these things are taken care of for you by Flink.
Reality Check
Although having the runtime considerations taken care of for you is nice, let’s not forget another failure vector which LLMs add into the mix: the non-determinism of the model itself, which may misread the prompt or invoke a tool in a way you didn’t intend. There are mitigating steps we can take, but it’s important to recognise the trade-offs between the approaches.
Permit me to indulge this line of steel-manning, because I think I might even have a valid argument here.
Let’s say we’ve built the above simplistic agent that sends a Slack message when a data point is received. Now we want to enhance it to also include information about the weather forecast.
An agent would conceptually be something like this:
Our streaming agent above changes to just amending the prompt and adding a new tool (just DDL statements, defining the MCP server and its tools):
Whilst the bespoke application might have a seemingly-innocuous small addition:
But consider what this looks like in practice: figuring out the API, new lines of code to handle calling it, failures, and so on. Oh, and whilst you’re at it, don’t introduce any bugs into the bespoke code. And remember to document the change. Not insurmountable, and probably a good challenge if you like that kind of thing. But is it as straightforward as literally changing the prompt in an agent to use an additional tool, and letting it figure the rest out (courtesy of MCP)?
Reality Check
Let’s not gloss over the reality too much here though; whilst adding a new tool call into the agent is definitely easier and less prone to introducing code errors, LLMs are by their nature non-deterministic, meaning that we still need to take care with the prompt and the tool invocation to make sure that the agent is still doing what it’s designed to do. You wouldn’t be wrong to argue that at least the non-agent route (of coding API invocations directly into your application) can actually be tested and proved to work.
There are different types of AI Agent—the one I’ve described is a tools-based one. As I mentioned above, its job is to run tools.
The LLM provides the natural language interface with which to invoke the tools. It can also, optionally, do additional bits of magic:
Process [unstructured] input, such as summarising or extracting key values from it
Decide which tool(s) need calling in order to achieve its aim
But at the heart of it, it’s about the tool that gets called. That’s where I was going wrong with this. That’s the bit I needed to think differently about :)