Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Building AI Agents in Kotlin – Part 2: A Deeper Dive Into Tools


In the previous article, we saw how to build a basic coding agent with list, read, write, and edit capabilities. Today, we’ll dive into how to extend the agent’s capabilities by creating additional tools within the Koog framework. As an example, we’ll build an ExecuteShellCommandTool, teaching our agent to run code and close the feedback loop that real engineering depends on: running the code, observing failures, and improving the code based on real output.

While LLMs tend to be good at avoiding syntax errors, they do struggle with integration issues: they have a tendency to call nonexistent methods, miss imports, and only partially implement interfaces. Compiling and running the code immediately exposes these problems, and with a little additional prompting, we can push the LLM to run small tests that validate these kinds of behavior.

So, how do we build such a tool? Let’s start with the fundamentals.

What is the anatomy of a Koog tool?

First, we start by inheriting from the abstract ai.koog.agents.core.tools.Tool class, which tells us that we need to provide:

  1. Name: In our team, we like to follow the convention of snake_casing, surrounding the name with two underscores, though that’s just a matter of individual preference.
  2. Description: This field serves as the main documentation for the LLM, explaining what the tool does and why it should be called.
  3. Args class: This class describes the parameters that the LLM needs to or can provide when calling the tool.
  4. Result class: This class defines the data that will be formatted into a message for the LLM. It can contain a textForLLM() function that formats the data into a string for the LLM to read. The Result class exists mainly for developer convenience, making it easier to log or render the tool result in a UI; the agent itself only requires the formatted string.
  5. execute() method: This method takes an instance of Args and returns an instance of Result. It defines the logic of what happens when the LLM calls the tool.

These components form the foundation of every Koog tool, whether you’re building a database connector, an API client, or, as we’ll see today, a shell command executor. Let’s walk through building an ExecuteShellCommandTool to see these principles in action.
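To make these five parts concrete, here is a minimal, framework-free sketch that mirrors the shape described above. This is illustrative only and not the actual Koog API: in Koog you inherit from ai.koog.agents.core.tools.Tool, execute() is a suspend function, and Args/Result are serializable framework types. EchoTool and its members are hypothetical names.

```kotlin
// Illustrative sketch of the five parts a Koog tool provides.
// NOT the real ai.koog.agents.core.tools.Tool API — in Koog, execute()
// is a suspend function and Args/Result are framework types.
class EchoTool {
    val name = "__echo__"                                 // 1. Name
    val description = "Echoes its input back to the LLM"  // 2. Description

    data class Args(val text: String)                     // 3. Args class

    data class Result(val echoed: String) {               // 4. Result class
        fun textForLLM(): String = "Echoed: $echoed"
    }

    fun execute(args: Args): Result = Result(args.text)   // 5. execute() method
}
```

Calling execute(Args("hi")) yields a Result whose textForLLM() returns "Echoed: hi" — the string that would ultimately be handed back to the LLM.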

A quick aside on safety

Before we dive into the specifics of the ExecuteShellCommandTool, we need to address a few key safety considerations.

While LLMs are not bad actors intentionally trying to cause us problems, they do occasionally make unexpected mistakes that can lead to issues down the line. And if we’re going to give them the power of command-line execution, their mistakes could have serious consequences.

There are several ways to mitigate this risk, including sandboxing command execution in isolated environments and limiting the permissions you grant the agent. However, these can be quite complicated to implement, and our focus for now is on creating the tool itself.

With this in mind, we’ll provide the two simplest risk mitigation strategies:

  1. Command execution confirmation for every command. This is the safest option as long as we take a minute to review every command the LLM wants to execute, but it does seriously limit the agent’s autonomy.
  2. Brave mode, in which every command is approved automatically. While this mode can pose a risk to the machine you’re working on if it is not properly sandboxed, we’ll only be using it inside our own benchmarks, which run in isolated environments that can be destroyed without consequence.

Implementing an ExecuteShellCommandTool

For our ExecuteShellCommandTool, these components are as follows:

  1. Name: __execute_shell_command__.
  2. Description: Something like “Executes shell commands and returns their output”.
  3. Args class: command, timeoutSeconds, workingDirectory
  4. Result class: output, exitCode, command (including the command might seem surprising, but it is convenient for reporting in logs or a UI which command was run).
  5. execute() method: Request confirmation, then execute the command (we’ll look closely at the implementation details below).

execute() method implementation

Now, let’s look at how we implement our execute() method. We keep the method simple by delegating core logic to helper methods as follows:

override suspend fun execute(args: Args): Result =
    when (val confirmation = confirmationHandler.requestConfirmation(args)) {
        is ShellCommandConfirmation.Approved -> try {
            val result = executor.execute(
                args.command, args.workingDirectory, args.timeoutSeconds
            )
            Result(args.command, result.exitCode, result.output)
        } catch (e: CancellationException) {
            throw e
        } catch (e: Exception) {
            Result(
                args.command, null, "Failed to execute command: ${e.message}"
            )
        }

        is ShellCommandConfirmation.Denied ->
            Result(
                args.command, null,
                "Command execution denied with user response: ${confirmation.userResponse}"
            )
    }

The flow is straightforward: request user confirmation for the command; if approved, run it via the command executor; otherwise, return the denial message to the LLM.

This also allows us to catch exceptions and forward error messages to the LLM, enabling it to adjust its approach or try alternatives.


ConfirmationHandler configuration

ConfirmationHandler becomes configurable when you create the ExecuteShellCommandTool, allowing for various implementations. Currently, we offer two:

  1. PrintShellCommandConfirmationHandler: Prompts the user via the command line.
  2. BraveModeConfirmationHandler: Automatically approves everything.

The second implementation just approves without any conditions, but the first has some interesting nuances:

override suspend fun requestConfirmation(
    args: ExecuteShellCommandTool.Args
): ShellCommandConfirmation {
    println("Agent wants to execute: ${args.command}")
    args.workingDirectory?.let { println("In: $it") }
    println("Timeout: ${args.timeoutSeconds}s")
    print("Confirm (y / n / reason-for-denying): ")

    val userResponse = readln().lowercase()
    return when (userResponse) {
        "y", "yes" -> ShellCommandConfirmation.Approved
        else -> ShellCommandConfirmation.Denied(userResponse)
    }
}

Note that while users perceive three options (approve, deny, deny with reason), the implementation treats both denial types identically: both responses are returned to the LLM, which interprets and handles each appropriately.


CommandExecutor configuration

Like the ConfirmationHandler, the CommandExecutor is also configurable, though we currently only provide a JVM implementation. Theoretically, you could create implementations for Android, iOS, WebAssembly, and other platforms, but without clear demand, we’ll defer these for now.

How do we handle timeouts?

Regardless of the platform used, one aspect of the CommandExecutor – timeout handling – deserves special attention. Our current implementation doesn’t allow the agent to interrupt long-running commands.

When a command runs too long, a human’s instinct is to hit Ctrl+C, but replicating that behavior would require the agent to monitor execution concurrently and to have some underlying notion of impatience, neither of which it has. There is a simpler and more intuitive alternative that fits this context much better.

By requiring the LLM to specify a maximum execution time, we can safely interrupt commands that exceed this limit. This timeout value is shown to users, allowing them to reject execution if the requested duration appears unreasonable or excessive.

Instead of simply terminating the process and returning a generic timeout message, we should aim to preserve as much output as possible. Even incomplete results can help the LLM extract useful information, or at least understand where the timeout occurred. With careful implementation, we can achieve this:

val stdoutJob = launch {
    process.inputStream.bufferedReader().useLines { lines ->
        try {
            lines.forEach { stdoutBuilder.appendLine(it) }
        } catch (_: IOException) {
            // Ignore IO exception if the stream is closed and silently stop stream collection
        }
    }
}

val isCompleted = withTimeoutOrNull(timeoutSeconds * 1000L) {
    process.onExit().await()
} != null

if (!isCompleted) {
    process.destroyForcibly()
}

stdoutJob.join()
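For readers who want to experiment with this pattern outside of a coroutine context, here is a self-contained, thread-based sketch of the same idea: drain output on a separate thread, wait with a timeout, kill the process if it overruns, and still return whatever output was captured. It assumes a POSIX shell is available, and runWithTimeout is our own invented name, not part of Koog.

```kotlin
import java.util.concurrent.TimeUnit

// Run a command, kill it if it exceeds the timeout, but preserve any output
// produced so far. A plain-JVM sketch of the coroutine pattern shown above.
fun runWithTimeout(command: List<String>, timeoutSeconds: Long): Pair<Boolean, String> {
    val process = ProcessBuilder(command).redirectErrorStream(true).start()
    val output = StringBuilder()
    // Reader thread keeps draining stdout even if the process is killed mid-run.
    val reader = Thread {
        process.inputStream.bufferedReader().useLines { lines ->
            try {
                lines.forEach { synchronized(output) { output.appendLine(it) } }
            } catch (_: java.io.IOException) {
                // Stream closed by destroyForcibly(): stop silently.
            }
        }
    }.apply { start() }

    val completed = process.waitFor(timeoutSeconds, TimeUnit.SECONDS)
    if (!completed) process.destroyForcibly()
    reader.join()
    return completed to synchronized(output) { output.toString() }
}
```

For example, runWithTimeout(listOf("sh", "-c", "echo partial; sleep 5"), 1) kills the command after one second but still returns the "partial" line that was printed before the timeout.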

What changed at the agent level?

At the agent level, the modifications are minimal, roughly a dozen lines of code. As the diff shows, the majority of these updates involve extending the system prompt.

We could have made our change even smaller by simply adding the tool, but instead, we also introduced two additional, though still minor, modifications to our agent.

A) BRAVE_MODE toggle

The implementation of this toggle is relatively straightforward; it just checks the BRAVE_MODE environment variable. We even use a lambda function for the Brave mode ConfirmationHandler to show that implementing your own can be very simple, but one could also use the BraveModeConfirmationHandler that was mentioned earlier. 

fun createExecuteShellCommandToolFromEnv(): ExecuteShellCommandTool {
    return if (System.getenv("BRAVE_MODE")?.lowercase() == "true") {
        ExecuteShellCommandTool(JvmShellCommandExecutor()) { _ ->
            ShellCommandConfirmation.Approved
        }
    } else {
        ExecuteShellCommandTool(
            JvmShellCommandExecutor(),
            PrintShellCommandConfirmationHandler(),
        )
    }
}

B) Enhanced “definition of done” in the prompt

To ensure the LLM leverages this new capability to execute code effectively, we extended the prompt with a “definition of done” that strongly encourages writing and running tests:

"""
... // Previous prompt from step 01

Production-ready means verified to work—your changes must be proven correct and not introduce regressions.

You have shell access to execute commands and run tests. Use this to work with executable feedback instead of assumptions. Establish what correct behavior looks like through tests, then iterate your implementation until tests pass. Validate that existing functionality remains intact. Production-ready means proven through green tests—that's your definition of done.
"""

So that’s it. We’ve built a complete ExecuteShellCommandTool component and integrated it into our agent with minimal changes. But the real question is: Does it actually improve performance?

Benchmark testing results

Running both versions against the SWE-bench-verified set confirms that execution capabilities provide a clear benefit: the agent’s performance improved from 249/500 (50%) to 279/500 (56%) successful examples. While leaderboard scores reach around 70%, our results indicate that giving agents the ability to run and validate code is a step in the right direction. However, to understand where our agent is still struggling and what to improve next, we need better visibility into its behavior. That’s where logging becomes essential.

Conclusion: Building tools in Koog

Throughout this article, we’ve seen how Koog’s tool structure works: every tool needs a name, a description, Args and Result classes, and an execute() method. We made our ExecuteShellCommandTool configurable via ConfirmationHandler and CommandExecutor components, showing how to delegate complex logic to swappable implementations.

These same patterns apply to any tool you might want to build: database connectors, API clients, file processors, or custom integrations. The framework provides the structure for LLM communication; you provide the specific capabilities. 

In the next article, we’ll look at how to add logging and tracing to our agent, giving us the visibility we need to understand and improve its behavior. Understanding how your agent makes decisions is crucial for iteration, and proper tooling around observability is just as important as the tools you give the agent itself.

Read the whole story
alvinashcraft
53 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

WebSocket Connection Failed: Quick Troubleshooting Guide


Quick troubleshooting checklist

When a WebSocket connection fails:

  • Confirm the URL and protocol (ws:// or wss://)

  • Check browser console logs

  • Ensure the server is reachable and running

  • Test on another network

  • Verify SSL certificates if using wss://

  • Review reverse proxy config

  • Look for firewall or antivirus interference

  • Validate cross-origin settings

WebSockets enable real-time, bidirectional communication between clients and servers. When a connection fails, chat windows stop updating, dashboards freeze, and any feature that depends on instant updates breaks. This guide explains the most common failure points and provides instructions for quickly diagnosing them.



Understanding connection errors

A WebSocket connection starts with an HTTP handshake that upgrades to the WebSocket protocol. When that handshake fails, browsers typically show errors like:

  • WebSocket connection failed

  • HTTP 400 or 403 during upgrade

  • A connection that drops without explanation

  • A stalled attempt that never transitions to an open state

A successful WebSocket upgrade requires:

  • A valid HTTP upgrade request from the client

  • A server that accepts the upgrade

  • Network infrastructure that allows persistent TCP connections

  • Consistent security requirements (HTTPS with WSS)

If any of these fail, the WebSocket connection cannot be established.
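One concrete check you can make on a failing handshake: per RFC 6455, the Sec-WebSocket-Accept header the server returns must equal the base64-encoded SHA-1 of the client’s Sec-WebSocket-Key concatenated with a fixed GUID. The sketch below (shown in Kotlin, though the same few lines work in any language) computes the expected value so you can compare it against what your server actually sent:

```kotlin
import java.security.MessageDigest
import java.util.Base64

// RFC 6455: the server proves it understood the upgrade by appending a fixed
// GUID to the client's Sec-WebSocket-Key, SHA-1 hashing the result, and
// returning the hash base64-encoded in Sec-WebSocket-Accept.
fun expectedAccept(clientKey: String): String {
    val magic = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
    val sha1 = MessageDigest.getInstance("SHA-1").digest((clientKey + magic).toByteArray())
    return Base64.getEncoder().encodeToString(sha1)
}
```

For the sample key from RFC 6455, dGhlIHNhbXBsZSBub25jZQ==, this yields s3pPLMBiTxaQ9kYGzzhZRbK+xOo=. If the server’s header differs, the upgrade is being mangled somewhere along the path.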

Common causes and solutions

Network and firewall restrictions

Corporate networks, proxies, and firewalls often block WebSocket traffic. WebSockets maintain an open connection, which can look unusual to strict security systems.

How to diagnose:

  • Try the connection on a different network

  • Verify that normal HTTPS traffic works while WebSocket traffic fails

  • Check firewall logs for connection rejections

Solution: If you control the network, allow traffic on ports 80 (ws://) and 443 (wss://). If you’re in a corporate environment, request that the WebSocket traffic be added to the allowlist. This applies to both browser-based applications and backend services that rely on the protocol.

SSL and TLS mismatches

WebSockets that use wss:// require valid certificates, just like HTTPS. Insecure-to-secure mixing can also block connections.

How to diagnose:

  • Look for mixed content warnings in DevTools

  • Check for TLS or certificate errors during the handshake

  • See whether ws:// works while wss:// fails

Solution: Use HTTPS on the page if you are connecting with wss://. Make sure your certificate is valid, not expired, and matches the domain you are calling.

Nginx and reverse proxy configuration

Reverse proxies need explicit configuration to support WebSocket upgrades. Without the correct headers, they treat upgrade requests as standard HTTP traffic, which breaks the connection.

Required Nginx configuration:

location /websocket {
    proxy_pass http://backend_server;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 86400;
}

The important pieces are:

  • proxy_http_version 1.1 so the upgrade request is allowed

  • Upgrade and Connection headers to signal the protocol change

  • proxy_read_timeout so long-lived connections are not terminated early

For local development, replace backend_server with a local address such as localhost:3000 or your Docker container name.

Cross-origin issues

WebSockets follow the browser’s same-origin rules. If your client and server are on different origins, the server must explicitly allow the connection.

Solution: Return the appropriate CORS headers during the initial HTTP upgrade request. The headers must be present during the handshake, not later in the message flow.

JavaScript implementation errors

Client-side code can make debugging harder if error handlers are missing. Without handlers for open, error, and close events, failures become invisible.

At a minimum, register these handlers:

const ws = new WebSocket('wss://api.example.com');

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = (event) => {
  console.log('Connection closed:', event.code, event.reason);
};

ws.onopen = () => {
  console.log('Connected successfully');
};

Adding these handlers makes it much easier to see why a connection failed.

Protocol version mismatches

Modern browsers use WebSocket protocol version 13. Older servers or outdated frameworks may use something different.

Solution: Confirm that your server and libraries support version 13. Update outdated WebSocket libraries if necessary.

Testing and debugging with a WebSocket client

Using a dedicated WebSocket client helps isolate handshake errors, authentication issues, and message flow problems without requiring modifications to your app.

Postman is one option that provides a visual WebSocket client. You can:

  1. Open a new WebSocket request.

  2. Enter the ws:// or wss:// URL.

  3. Add headers or authentication if required.

  4. Connect and inspect the full event stream.

  5. Send messages and view responses in real time.

Postman includes a built-in WebSocket client that shows the full handshake, event stream, and message history. This makes it easier to confirm whether the upgrade succeeded and spot issues early in the debugging process.

Best practices for reliable WebSocket connections

  • Use wss:// in production to avoid security blocks and ensure encrypted transport.

  • Implement reconnection logic with exponential backoff.

  • Set reasonable timeouts on both client and server to avoid idle disconnects.

  • Validate the handshake result and check for HTTP 101, which indicates a successful upgrade.

  • Test across different networks because behavior can change behind proxies or corporate firewalls.

  • Load scripts correctly so your WebSocket code runs only after dependencies are available.
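The exponential-backoff recommendation above boils down to a few lines of arithmetic: double the delay per attempt up to a cap. The logic is language-agnostic; it is shown in Kotlin here as a sketch, and the base and cap values are illustrative defaults, not prescribed numbers. A browser client would apply the same schedule in its onclose handler before reconnecting.

```kotlin
// Exponential backoff with a cap: the delay doubles with each failed
// attempt and never exceeds maxMs. Defaults are illustrative only.
fun backoffMs(attempt: Int, baseMs: Long = 1_000, maxMs: Long = 30_000): Long =
    minOf(maxMs, baseMs shl minOf(attempt, 30))
```

With these defaults, backoffMs(0) is 1 second, backoffMs(1) is 2 seconds, and the delay caps at 30 seconds from the fifth retry onward. Many implementations also add random jitter so that many clients do not reconnect simultaneously after a server restart.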

Most WebSocket failures come from configuration, networking, or TLS issues rather than from the application code itself. A systematic approach usually reveals the root cause quickly.

The post WebSocket Connection Failed: Quick Troubleshooting Guide appeared first on Postman Blog.


Turns Out Even I Can Mix Up Azure Policy and Service Groups. Oops.

a.k.a. Here's Your Full Guide to Understanding Azure Cloud Governance Without Losing Your Mind or Sanity Cloud governance is one of those topics that sounds straightforward until you start hearing people use the same five words to describe completely different things. You walk into a meeting and someone says, “We should solve this with Azure Policy,” and someone else responds, “We need a Blueprint,” and a third person chimes in with, “This belongs in our Service Group model.” At that point...


GitHub - matthewrdev/maude: Maude is a plugin for .NET MAUI to monitor app memory at runtime and view it via live-rendered chart.

submitted by SmartmanApps to dotnetmaui
1 point | 0 comments
https://github.com/matthewrdev/maude


Securing Azure AI Applications: A Deep Dive into Emerging Threats


Why AI Security Can’t Be Ignored

Generative AI is rapidly reshaping how enterprises operate—accelerating decision-making, enhancing customer experiences, and powering intelligent automation across critical workflows.

But as organizations adopt these capabilities at scale, a new challenge emerges: AI introduces security risks that traditional controls cannot fully address.

AI models interpret natural language, rely on vast datasets, and behave dynamically. This flexibility enables innovation—but also creates unpredictable attack surfaces that adversaries are actively exploiting. As AI becomes embedded in business-critical operations, securing these systems is no longer optional—it is essential.

The New Reality of AI Security

The threat landscape surrounding AI is evolving faster than any previous technology wave. Attackers are no longer focused solely on exploiting infrastructure or APIs; they are targeting the intelligence itself—the model, its prompts, and its underlying data.

These AI-specific attack vectors can:

  • Expose sensitive or regulated data
  • Trigger unintended or harmful actions
  • Skew decisions made by AI-driven processes
  • Undermine trust in automated systems

As AI becomes deeply integrated into customer journeys, operations, and analytics, the impact of these attacks grows exponentially.

Why These Threats Matter

Threats such as prompt manipulation and model tampering go beyond technical issues—they strike at the foundational principles of trustworthy AI. They affect:

  • Confidentiality: Preventing accidental or malicious exposure of sensitive data through manipulated prompts.
  • Integrity: Ensuring outputs remain accurate, unbiased, and free from tampering.
  • Reliability: Maintaining consistent model behavior even when adversaries attempt to deceive or mislead the system.

When these pillars are compromised, the consequences extend across the business:

  • Incorrect or harmful AI recommendations
  • Regulatory and compliance violations
  • Damage to customer trust
  • Operational and financial risk

In regulated sectors, these threats can also impact audit readiness, risk posture, and long-term credibility.

Understanding why these risks matter builds the foundation. In the upcoming blogs, we’ll explore how these threats work and practical steps to mitigate them using Azure AI’s security ecosystem.

Why AI Security Remains an Evolving Discipline

Traditional security frameworks—built around identity, network boundaries, and application hardening—do not fully address how AI systems operate. Generative models introduce unique and constantly shifting challenges:

  • Dynamic Model Behavior: Models adapt to context and data, creating a fluid and unpredictable attack surface.
  • Natural Language Interfaces: Prompts are unstructured and expressive, making sanitization inherently difficult.
  • Data-Driven Risks: Training and fine-tuning pipelines can be manipulated, poisoned, or misused.
  • Rapidly Emerging Threats: Attack techniques evolve faster than most defensive mechanisms, requiring continuous learning and adaptation.

Microsoft and other industry leaders are responding with robust tools—Azure AI Content Safety, Prompt Shields, Responsible AI Frameworks, encryption, isolation patterns—but technology alone cannot eliminate risk. True resilience requires a combination of tooling, governance, awareness, and proactive operational practices.

Let's Build a Culture of Vigilance:

AI security is not just a technical requirement—it is a strategic business necessity. Effective protection requires collaboration across:

  • Developers
  • Data and AI engineers
  • Cybersecurity teams
  • Cloud platform teams
  • Leadership and governance functions

Security for AI is a shared responsibility. Organizations must cultivate awareness, adopt secure design patterns, and continuously monitor for evolving attack techniques. Building this culture of vigilance is critical for long-term success. 

Key Takeaways:

AI brings transformative value, but it also introduces risks that evolve as quickly as the technology itself. Strengthening your AI security posture requires more than robust tooling—it demands responsible AI practices, strong governance, and proactive monitoring.

By combining Azure’s built-in security capabilities with disciplined operational practices, organizations can ensure their AI systems remain secure, compliant, and trustworthy, even as new threats emerge.

What’s Next?

In future blogs, we’ll explore two of the most important AI threats—Prompt Injection and Model Manipulation—and share actionable strategies to mitigate them using Azure AI’s security capabilities. Stay tuned for practical guidance, real-world scenarios, and Microsoft-backed best practices to keep your AI applications secure.

Stay tuned!


How to Implement Google OAuth Login in Next.js

Learn how to add Google OAuth login to your Next.js app with a simple step-by-step guide. This article explains setup, configuration, API routes, environment variables, and frontend integration in clear, easy-to-understand language.