Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Stop Writing Log File Viewers By Yourself

1 Share

There are many tutorials on using AI for coding, but far fewer on using AI for log analytics. We are living in exciting times as AI improves every day. If you spend most of your time troubleshooting business applications through log files, you might assume AI cannot help much yet. In most environments, scripts collect logs from production servers or client machines. On Windows, these event logs are usually the first place to look:

  • Application
  • System
  • Security

In addition, application-specific logs are usually captured as well. Depending on your setup, you may have one or more custom viewers to inspect them during production incidents. In cloud environments, vendor tooling (for example, Kusto in Azure) is excellent for analysis. Still, there are times when you need to export raw incident data and inspect it in detail with a text-based workflow. In on-premises container environments, this often means too much data and not enough capable log-viewing options.

When an error is hard to spot, you often end up with a folder full of logs in different formats. Then you have to sift through them manually to find patterns around the time of failure.

This process is tedious and time-consuming. It is also frustrating, because existing viewers often do not provide enough context to analyze the problem effectively.

One relatively new but powerful option is Copilot CLI, which lets you interact with Copilot directly from your terminal.

My current workflow is to copy all relevant data into one folder, start Copilot there, and give it full access to that folder:

C:\Issues\CrashAt16March_19_43_51>copilot
╭──────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ╭─╮╭─╮ │
│ ╰─╯╰─╯ GitHub Copilot v1.0.5 │
│ █ ▘▝ █ Describe a task to get started. │
│ ▔▔▔▔ │
│ Tip: /usage Display session usage metrics and statistics │
│ Copilot uses AI, so always check for mistakes. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
C:\Issues\CrashAt16March_19_43_51 gpt-5.4 (medium) (1x)
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
❯ Type @ to mention files, # for issues/PRs, / for commands, or ? for shortcuts
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
shift+tab switch mode

Then you can ask questions like:

find the root cause of the process crash on 16 March 2026 at 19:43:51. Look at all log files from 5 minutes before to 30 seconds after the crash time.

Sometimes simple prompts produce excellent results. Other times, the AI lacks enough context about how services and components interact. In those cases, I switch to a data-pipeline approach.

You can ask Copilot to generate custom parsers for different log formats and merge the results into a common CSV file. Once you have that conversion tool, you can reuse it across future incidents.

You might ask questions such as:

  • Which processes were started?
    • The Security Event Log (with process auditing enabled) gives you this data
  • Were there process crashes (Application Event Log)?
  • Did the system have a kernel crash (System Event Log)?
  • When was the system booted and shut down (System Event Log)?
  • How does this relate to the application log files?

To answer these questions for a specific incident, you need a custom viewer that can quickly correlate multiple log sources. In the past, building such a viewer or visualization could take weeks. With a data-pipeline approach, this becomes much easier. A practical starting point is to parse different log formats and convert them into a common format such as CSV (Comma-Separated Values). Copilot can help build that converter, and you can reuse it for future issues.
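As a sketch of that pipeline idea, here is a minimal Python converter for two made-up log formats (the sample lines and patterns are hypothetical; real logs will need their own parsers) that normalizes timestamps and merges everything into one chronologically sorted CSV:

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical sample lines standing in for two real log sources.
APP_LOG = """2026-03-16 19:43:50,123 ERROR OrderService - Connection reset
2026-03-16 19:43:51,002 FATAL OrderService - Unhandled exception"""

EVT_LOG = "16/03/2026 19:43:49; Application; 1000; Faulting application OrderService.exe"

def parse_app_log(text):
    # Format: "YYYY-MM-DD HH:MM:SS,mmm LEVEL Source - Message"
    pat = re.compile(r"(\S+ \S+) (\w+) (\S+) - (.*)")
    for line in text.splitlines():
        m = pat.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            yield ts, "app", m.group(2), m.group(4)

def parse_event_log(text):
    # Format: "DD/MM/YYYY HH:MM:SS; Channel; EventID; Message"
    for line in text.splitlines():
        ts_s, channel, event_id, msg = [p.strip() for p in line.split(";", 3)]
        ts = datetime.strptime(ts_s, "%d/%m/%Y %H:%M:%S")
        yield ts, channel, event_id, msg

def merge_to_csv(parsers):
    # Merge all sources into one timeline, sorted by timestamp.
    rows = sorted(r for p in parsers for r in p)
    buf = io.StringIO()
    w = csv.writer(buf)
    w.writerow(["timestamp", "source", "level_or_id", "message"])
    for ts, src, lvl, msg in rows:
        w.writerow([ts.isoformat(), src, lvl, msg])
    return buf.getvalue()

print(merge_to_csv([parse_app_log(APP_LOG), parse_event_log(EVT_LOG)]))
```

Once the per-format parsers exist, adding a new log source is just one more generator feeding the same merge step.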

If you stay on well-known implementation paths for your custom viewer, you can get excellent results and iteratively add analysis features through prompts. For example, combining JavaScript and C# with WebView2 gives you a strong architecture: C# handles large-file parsing, while the web UI provides effective visualizations.

It is now possible to continuously create and update the viewer for each specific issue.

This is a game changer for log analytics: viewers that used to be immutable and feature-poor can now be adapted on the fly for the specific issue at hand. Just copy the source code of your current viewer into your problem folder, and the viewer can be reshaped to match whatever you are analyzing.

This is an ideal AI use case: small, self-contained tools that need frequent, lightweight tweaks. You can request timeline charts for specific messages and correlate them with other logs. Let AI accelerate the UI work while you focus on the deeper correlations it may miss due to limited context or system knowledge. Working with a viewer tailored to exactly what you need is more effective and more enjoyable. As a bonus, flexible visualizations help you learn far more about your data.

If you build a custom viewer this way, share your experience in the comments—especially what worked well and where this approach helped most.



Read the whole story
alvinashcraft
26 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

Natural Keys in the Critter Stack


Just to level set everyone, there are two general categories of identifiers we use in software:

  • “Surrogate” keys are data elements like Guid values, database auto numbering or sequences, or snowflake generated identifiers that have no real business meaning and just try to be unique values.
  • “Natural” keys have some kind of business meaning and usually utilize some piece of existing information like email addresses or phone numbers. A natural key could also be an external supplied identifier from your clients. In fact, it’s quite common to have your own tracking identifier (usually a surrogate key) while also having to track a client or user’s own identification for the same business entity.
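To make the distinction concrete, here is a tiny Python illustration (a toy, not Marten or Wolverine code): an entity carrying both a random surrogate id and a client-supplied natural key, with a lookup from one to the other.

```python
import uuid
from dataclasses import dataclass, field

# Toy illustration: an entity with both kinds of identifier.
@dataclass
class OrderStream:
    order_number: str                                   # natural key, client-supplied, has business meaning
    id: uuid.UUID = field(default_factory=uuid.uuid4)   # surrogate key, no business meaning, just unique

# A lookup from natural key to surrogate id, analogous to tracking
# a client's identifier alongside your own.
streams = [OrderStream("PO-1001"), OrderStream("PO-1002")]
by_natural_key = {s.order_number: s.id for s in streams}

print(by_natural_key["PO-1001"] == streams[0].id)
```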

That very last sentence is where this post takes off. You see Marten can happily track event streams with either Guid identifiers (surrogate key) or string identifiers — or strong typed identifiers that wrap an inner Guid or string, but in this case that’s really the same thing, just with more style I guess. Likewise, in combination with Wolverine for our recommended “aggregate handler workflow” approach to building command handlers, we’ve only supported the stream id or key. Until now!

With the Marten 8.23 and Wolverine 5.18 releases last week (we’ve been very busy and there are newer releases now), you are now able to “tag” Marten (or Polecat!) event streams with a natural key in addition to their surrogate stream ids, and use that natural key in conjunction with Wolverine’s aggregate handler workflow.

Of course, if you use strings as the stream identifier you could already use natural keys, but let’s just focus on the case of Guid identified streams that are also tagged with some kind of natural key that will be supplied by users in the commands sent to the system.

First, to tag streams with natural keys in Marten, you have to have a strong typed identifier type for the natural key. Next, there’s a little bit of attribute decoration in the targeted document type of a single stream projection, i.e., the “write model” for an event stream. Here’s an example from the Marten documentation:

public record OrderNumber(string Value);
public record InvoiceNumber(string Value);

public class OrderAggregate
{
    public Guid Id { get; set; }

    [NaturalKey]
    public OrderNumber OrderNum { get; set; }

    public decimal TotalAmount { get; set; }
    public string CustomerName { get; set; }
    public bool IsComplete { get; set; }

    [NaturalKeySource]
    public void Apply(OrderCreated e)
    {
        OrderNum = e.OrderNumber;
        CustomerName = e.CustomerName;
    }

    public void Apply(OrderItemAdded e)
    {
        TotalAmount += e.Price;
    }

    [NaturalKeySource]
    public void Apply(OrderNumberChanged e)
    {
        OrderNum = e.NewOrderNumber;
    }

    public void Apply(OrderCompleted e)
    {
        IsComplete = true;
    }
}

In particular, see the usage of [NaturalKey] which should be self-explanatory. Also see the [NaturalKeySource] attribute that we’re using to mark when a natural key value might change. Marten is starting to use source generators for some projection internals (in place of some nasty, not entirely as efficient as it should have been, Expression-compiled-to-Lambda functions).

And that’s that, really. You’re now able to use the designated natural keys as the input to an “aggregate handler workflow” command handler with Wolverine. See Natural Keys from the Wolverine documentation for more information.

For a little more information:

  • The natural keys are stored in a separate table, and when using FetchForWriting(), Marten is doing an inner join from the tag table for that natural key type to the mt_streams table in the Marten database
  • You can change the natural key associated with a given surrogate key
  • We expect this to be most useful when you want to use the Guid surrogate keys for uniqueness in your own system, but you frequently receive a natural key from API users of your system — or at least this has been encountered by a couple different JasperFx Software customers.
  • The natural key storage does have a unique value constraint on the “natural key” part of the storage
  • Really only a curiosity, but this was done in the same wave of development as Marten’s new DCB support
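As a rough mental model of that first bullet, here is a SQLite sketch (table names and shapes are illustrative, not Marten's actual schema) of a natural-key tag table resolving to the streams table through an inner join, with the unique constraint on the natural key value:

```python
import sqlite3

# Illustrative schema only -- NOT Marten's real table layout.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE mt_streams (id TEXT PRIMARY KEY, type TEXT);
CREATE TABLE order_number_tags (
    value TEXT UNIQUE NOT NULL,                         -- natural key, uniqueness enforced
    stream_id TEXT NOT NULL REFERENCES mt_streams(id)   -- surrogate stream id
);
INSERT INTO mt_streams VALUES ('a1', 'OrderAggregate');
INSERT INTO order_number_tags VALUES ('PO-1001', 'a1');
""")

# FetchForWriting-style lookup: resolve a natural key to its stream row
# via an inner join from the tag table to the streams table.
row = con.execute("""
    SELECT s.id
    FROM order_number_tags t
    JOIN mt_streams s ON s.id = t.stream_id
    WHERE t.value = ?
""", ("PO-1001",)).fetchone()
print(row[0])
```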



1.0.6


2026-03-16

  • Autopilot continuation no longer gets permanently blocked after an error in a previous turn
  • In autopilot, task_complete summary is now required and renders as markdown
  • Input placeholder text is no longer read aloud by screen readers on every prompt submission
  • Free tree-sitter WASM objects after shell command parsing to prevent memory leak
  • /help dialog starts scrolled to the top in alt-screen mode
  • Auto-update correctly recovers from race conditions on Windows
  • CLI no longer fails to load on Windows after updating while another instance is running
  • Reduce memory usage by eliminating redundant environment variable copies per child process spawn
  • Remaining requests widget no longer shows inaccurate quota data for Copilot Free users
  • Resolve session crashes caused by HTTP/2 connection pool race conditions when sub-agents are active
  • CLI loads the latest version of itself after an auto-update
  • Kill command validation no longer incorrectly blocks some legitimate commands, e.g. p.kill() in a Python script
  • Instruction file frontmatter applyTo field accepts both string and array values
  • Improve streaming and tool-output memory usage
  • Model can discover and use tools dynamically with tool search for Claude models
  • Hooks fire correctly when resuming a previous session
  • Prompt input in alt screen mode renders all lines without truncation
  • Links and right-click paste no longer trigger twice when running in VS Code's integrated terminal
  • Hook configuration files now work across VS Code, Claude Code, and the CLI without modification by accepting PascalCase event names alongside camelCase
  • Native module prebuilds (e.g., conpty.node on Windows ARM64) load reliably on first launch
  • Subagent elapsed time in /tasks view freezes when idle and resumes when active again
  • Flags --enable-all-github-mcp-tools, --add-github-mcp-toolset, and --add-github-mcp-tool now take effect when using the SDK (ACP mode)
  • Custom instruction file paths load correctly when using COPILOT_CUSTOM_INSTRUCTIONS_DIRS
  • Command output is no longer lost when a command causes the shell to exit
  • Plugins using .claude-plugin/plugin.json are discovered when loaded via --plugin-dir
  • Fix handling of shift+enter on VS Code with old /terminal-setup config.
  • Agent creation wizard shows the correct user agents directory path
  • Support Open Plugin spec file locations for loading plugin and marketplace manifests
  • Show friendlier error messages and provide keyboard shortcut to open event links in browsers
  • Extension tools now work with the permissions system, use skipPermission per-tool to bypass permission prompts
  • Hook config files now support Claude Code's nested matcher/hooks structure and optional type field
  • Sub-agents launched by the task tool are assigned human-readable IDs based on their name (e.g., math-helper-0) instead of generic agent-0 identifiers
  • The create_pull_request tool now includes the PR URL in its output so the agent can share the direct link
  • read_agent output includes inbound messages that triggered each turn in multi-turn agents
  • Improve compatibility with the Open Plugins spec: support .lsp.json, PascalCase hook event names, exclusive path mode, and : namespace separator

My custom agent used 87% fewer tokens when I gave it Skills for its MCP tools


Today’s web apps don’t seem particularly concerned about resource consumption. The simplest site seems to eat up hundreds of MB of memory in my browser. We’ve probably gotten a bit lazy with optimization since many computers have horsepower to spare. But when it comes to LLM tokens, we’re still judicious. Most of us have bumped into quotas or unexpected costs!

I see many examples of introducing and tuning MCPs and skills for IDEs and agentic tools. But what about the agents you’re building? What’s the token impact of using MCPs and skills for custom agents?

I tried out six solutions with the Agent Development Kit (Python) and counted my token consumption for each. The tl;dr? A well-prompted Gemini with zero tools or skills is successful with the fewest tokens consumed, with the second best option being MCP + skills. Third-best in token consumption is raw Gemini plus skills.

I trust that you can find a thousand ways to do this better than me, but here’s a table with the best results from multiple runs of each of my experiments. The title of the post refers to the difference between scenarios 2 and 3.

| Scenario | Agent Description | Turns | Tokens |
|---|---|---|---|
| 0 | Instructions only, built-in code execution tool | 7 | 1,286 |
| 1 | Uses BigQuery MCP | 9 | 13,763 |
| 2 | Uses BigQuery, AlloyDB, Cloud SQL MCPs | 29 | 328,083 |
| 3 | Uses BigQuery, AlloyDB, Cloud SQL MCPs with skill | 5 | 39,622 |
| 4 | Uses BigQuery MCP and a skill | 5 | 6,653 |
| 5 | Instruction, skill, and built-in code execution tool | 27 | 64,444 |
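For the curious, the headline savings can be reproduced from the table's best-run numbers (the exact percentage depends on which runs you compare):

```python
# Token savings of scenario 3 (MCPs + skill) vs scenario 2 (MCPs alone),
# using the best-run figures from the table.
scenario_2_tokens = 328_083
scenario_3_tokens = 39_622
savings = (scenario_2_tokens - scenario_3_tokens) / scenario_2_tokens
print(f"{savings:.0%}")
```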

What’s the problem to solve?

I want an agent that can do some basic cloud FinOps for me. I’ve got a Google Cloud BigQuery table that is automatically populated with billing data for items in my project.

Let’s have an agent that can find the table and figure out what my most expensive Cloud Storage buckets are so far this month. This could be an agent we call from a platform like Gemini Enterprise so that our finance people (or team leads) could quickly get billing info.

A look at our agent runner

The Agent Development Kit (ADK) offers some powerful features for building robust agents. It has native support for MCPs and skills, and has built-in tools for services like Google Search.

While the ADK does have a built-in BigQuery tool, I wanted to use the various managed MCP servers Google Cloud offers.

Let’s look at some code. One file to start. The main.py file runs our agent and counts the tokens from each turn of the LLM. The token counting magic was snagged from an existing sample app. For production scenarios, you might want to use our BigQuery Agent Analytics plugin for ADK, which captures a ton of interesting data points about your agent runs, including tokens per turn.

Here’s the main.py file:

import asyncio
import time
import warnings

import agent
from dotenv import load_dotenv
from google.adk import Runner
from google.adk.agents.run_config import RunConfig
from google.adk.artifacts.in_memory_artifact_service import InMemoryArtifactService
from google.adk.cli.utils import logs
from google.adk.sessions.in_memory_session_service import InMemorySessionService
from google.adk.sessions.session import Session
from google.genai import types

# --- Initialization & Configuration ---
import os
# Load environment variables (like API keys) from the .env file
load_dotenv(os.path.join(os.path.dirname(__file__), '.env'), override=True)
# Suppress experimental warnings from the ADK
warnings.filterwarnings('ignore', category=UserWarning)
# Redirect agent framework logs to a temporary folder
logs.log_to_tmp_folder()


async def main():
  app_name = 'my_app'
  user_id_1 = 'user1'
  
  # Initialize the services required to manage chat history and created artifacts
  session_service = InMemorySessionService()
  artifact_service = InMemoryArtifactService()
  
  # The Runner orchestrates the agent's execution loop
  runner = Runner(
      app_name=app_name,
      agent=agent.root_agent,
      artifact_service=artifact_service,
      session_service=session_service,
  )
  
  # Create a new session to hold the conversation state
  session_1 = await session_service.create_session(
      app_name=app_name, user_id=user_id_1
  )

  total_prompt_tokens = 0
  total_candidate_tokens = 0
  total_tokens = 0
  total_turns = 0

  async def run_prompt(session: Session, new_message: str):
    # Helper variables to track token usage and turns across the session
    nonlocal total_prompt_tokens
    nonlocal total_candidate_tokens
    nonlocal total_tokens
    nonlocal total_turns
    
    # Structure the user's string input into the appropriate Content format
    content = types.Content(
        role='user', parts=[types.Part.from_text(text=new_message)]
    )
    print('** User says:', content.model_dump(exclude_none=True))
    
    # Stream events back from the Runner as the agent executes its task
    async for event in runner.run_async(
        user_id=user_id_1,
        session_id=session.id,
        new_message=content,
    ):
      total_turns += 1
      
      # Print intermediate steps (text, tool calls, and tool responses) to the console
      if event.content and event.content.parts:
        for part in event.content.parts:
          if part.text:
            print(f'** {event.author}: {part.text}')
          if part.function_call:
            print(f'** {event.author} calls tool: {part.function_call.name}')
            print(f'   Arguments: {part.function_call.args}')
          if part.function_response:
            print(f'** Tool response from {part.function_response.name}:')
            print(f'   Response: {part.function_response.response}')

      if event.usage_metadata:
        total_prompt_tokens += event.usage_metadata.prompt_token_count or 0
        total_candidate_tokens += (
            event.usage_metadata.candidates_token_count or 0
        )
        total_tokens += event.usage_metadata.total_token_count or 0
        print(
            f'Turn tokens: {event.usage_metadata.total_token_count}'
            f' (prompt={event.usage_metadata.prompt_token_count},'
            f' candidates={event.usage_metadata.candidates_token_count})'
        )

    print(
        f'Session tokens: {total_tokens} (prompt={total_prompt_tokens},'
        f' candidates={total_candidate_tokens})'
    )

  # --- Execution Phase ---
  start_time = time.time()
  print('Start time:', start_time)
  print('------------------------------------')
  
  # Send the initial prompt to the agent and trigger the run loop
  await run_prompt(session_1, 'Find the top 3 most expensive Cloud Storage buckets in our March 2026 billing export for project seroter-project-base')
  print(
      await artifact_service.list_artifact_keys(
          app_name=app_name, user_id=user_id_1, session_id=session_1.id
      )
  )
  end_time = time.time()
  print('------------------------------------')
  print('Total turns:', total_turns)
  print('End time:', end_time)
  print('Total time:', end_time - start_time)


if __name__ == '__main__':
  asyncio.run(main())

Nothing too shocking here. But this gives me a fairly verbose output that lets me see how many turns and tokens each scenario eats up.

Scenario 0: Raw agent (no MCP, no tools) using Python code execution

In this foundational test, what if we ask the agent to answer the question without the help of any external tools? All it can do is write and execute Python code on the local machine using a built-in tool. This flavor is only for local dev, as there are more production-grade isolation options for running code.

Here’s the agent.py for this base scenario. I’ve got a decent set of instructions to guide the agent for how to write code to find and query the relevant table.

from google.adk.agents import LlmAgent
from google.adk.skills import load_skill_from_dir
from google.adk.tools import skill_toolset
from google.adk.tools.mcp_tool import McpToolset, StreamableHTTPConnectionParams
from google.adk.auth.auth_credential import AuthCredential, AuthCredentialTypes, ServiceAccount
from fastapi.openapi.models import OAuth2, OAuthFlows, OAuthFlowClientCredentials
from google.adk.code_executors.unsafe_local_code_executor import UnsafeLocalCodeExecutor


# --- Agent Definition ---

# --- Scenario 0: Raw Agent using Python Code Execution for Discovery and Analysis ---
root_agent = LlmAgent(
    name="data_analyst_agent",
    model="gemini-3.1-flash-lite-preview",
    instruction="""You are a data analyst. 
    CRITICAL: You have NO TOOLS registered. NEVER attempt a tool call or function call (like `list_datasets` or `bq_list_dataset_ids`). 
    You MUST perform all technical tasks by writing and executing Python code blocks in markdown format (e.g., ` ```python `) using the `google-cloud-bigquery` client library.
    
    1. DISCOVERY: If you don't know the table names, you MUST write and execute Python code to list datasets and tables.
    2. ANALYSIS: Use Python to query data and perform analysis.
    3. NO HYPOTHETICALS: NEVER provide hypothetical, example, or placeholder results. Only show data you have actually retrieved via code execution.
    ALWAYS explain the approach you used to access BigQuery.""",
    code_executor=UnsafeLocalCodeExecutor()
)

This scenario runs quickly (about 14 seconds on each test), took five turns, and consumed 1,786 tokens. In my half-dozen runs, I saw as many as nine turns, and as few as 1,286 tokens consumed.

This was the most efficient way to go of any scenario.

Scenario 1: Agent with BigQuery MCP

Love it or hate it, MCP is going to remain a popular way to connect to external systems. Instead of needing to understand every system’s APIs, MCP tools give us a standard way to do things.

I’m using our fully managed remote MCP Server for BigQuery. This MCP server exposes a handful of useful tools for discovery and data retrieval. Note that the awesome open source MCP Toolbox for Databases is another great way to pull 40+ data sources into your agents.

The agent.py for Scenario 1 looks like this. You can see that I’m initializing the auth with my application default credentials and setting up the correct OAuth flow. The agent itself has a solid instruction to steer the MCP server. Note that I left an old, unoptimized instruction in there. That old instruction resulted in dozens of turns and up to 600k tokens consumed!

from google.adk.agents import LlmAgent
from google.adk.skills import load_skill_from_dir
from google.adk.tools import skill_toolset
from google.adk.tools.mcp_tool import McpToolset, StreamableHTTPConnectionParams
from google.adk.auth.auth_credential import AuthCredential, AuthCredentialTypes, ServiceAccount
from fastapi.openapi.models import OAuth2, OAuthFlows, OAuthFlowClientCredentials
from google.adk.code_executors.unsafe_local_code_executor import UnsafeLocalCodeExecutor

# --- BigQuery MCP Configuration ---

# Configure authentication for the BigQuery MCP server
bq_auth_credential = AuthCredential(
    auth_type=AuthCredentialTypes.SERVICE_ACCOUNT,
    service_account=ServiceAccount(
        use_default_credential=True,
        scopes=["https://www.googleapis.com/auth/bigquery"]
    )
)

# Use OAuth2 with clientCredentials flow for background ADC exchange
bq_auth_scheme = OAuth2(
    flows=OAuthFlows(
        clientCredentials=OAuthFlowClientCredentials(
            tokenUrl="https://oauth2.googleapis.com/token",
            scopes={"https://www.googleapis.com/auth/bigquery": "BigQuery access"}
        )
    )
)

# Initialize the BigQuery MCP Toolset
bq_mcp_toolset = McpToolset(
    connection_params=StreamableHTTPConnectionParams(url="https://bigquery.googleapis.com/mcp"),
    auth_scheme=bq_auth_scheme,
    auth_credential=bq_auth_credential,
    tool_name_prefix="bq"
)

# --- Agent Definition ---

# --- Scenario 1: Using Gemini to get data from BigQuery with MCP ---
root_agent = LlmAgent(
    name="data_analyst_agent",
    model="gemini-3.1-flash-lite-preview",
    ##instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, or ask for further information, but explore options and execute any commands yourself. Explain the approach you used to access BigQuery. ",
    instruction="""You are a data analyst. Use BigQuery to find and analyze data. 
    To minimize token usage and time, follow these rules:
    1. DISCOVERY: If you are unsure of a table's exact schema, ALWAYS query `INFORMATION_SCHEMA.COLUMNS` first to find the right fields before writing complex data queries.
    2. EFFICIENCY: When exploring data to understand its structure, ALWAYS use `LIMIT 5` to avoid returning massive payloads.
    3. AUTONOMY: Do not ask the user for table names or steps; explore the datasets yourself and execute the final queries.
    4. EXPLANATION: Briefly explain the steps you took to find the answer.""",
    tools=[bq_mcp_toolset]
)

This scenario runs relatively efficiently, but does use ~8x the tokens of scenario 0. It still completes in a reasonable 19 seconds, with my latest run using 9 turns and 13,763 session tokens. Across all my other runs with this instruction, I always got 9 turns and a maximum of 13,838 tokens consumed.

Scenario 2: Agent with BigQuery MCP and extra MCPs

Most systems experience feature creep over time. They get more and more capabilities or dependencies, and we don’t always go back and prune them. What if we had originally needed many different MCPs in our agent, and never took the time to remove the unused ones later? You may start feeling it in your input context: all those tool descriptions are sent and held during each turn.
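A rough back-of-envelope sketch of that overhead (the schema sizes below are invented for illustration, and the ~4 characters-per-token ratio is just a common heuristic): every tool description rides along on every model call, so unused MCP toolsets tax each and every turn.

```python
# Hypothetical tool schema sizes in characters -- illustration only.
TOOL_SCHEMAS = {
    "bq_execute_sql": 900,
    "sql_list_instances": 700,
    "alloy_list_clusters": 800,
}

chars_per_turn = sum(TOOL_SCHEMAS.values())
tokens_per_turn = chars_per_turn // 4   # crude ~4 chars/token heuristic
turns = 29                              # scenario 2's turn count from the table

# The same schema text is resent on every turn, so the cost multiplies.
print(tokens_per_turn * turns)
```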

This update to agent.py now initializes two other MCP servers for other data sources.

# --- GCP Platform Auth (Shared for Cloud SQL and AlloyDB) ---

# Configure authentication for MCP servers requiring cloud-platform scope
gcp_platform_auth_credential = AuthCredential(
    auth_type=AuthCredentialTypes.SERVICE_ACCOUNT,
    service_account=ServiceAccount(
        use_default_credential=True,
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
)

# Use OAuth2 with clientCredentials flow for background ADC exchange
gcp_platform_auth_scheme = OAuth2(
    flows=OAuthFlows(
        clientCredentials=OAuthFlowClientCredentials(
            tokenUrl="https://oauth2.googleapis.com/token",
            scopes={"https://www.googleapis.com/auth/cloud-platform": "Cloud Platform access"}
        )
    )
)

# --- Cloud SQL MCP Configuration ---

# Initialize the Cloud SQL MCP Toolset
sql_mcp_toolset = McpToolset(
    connection_params=StreamableHTTPConnectionParams(url="https://sqladmin.googleapis.com/mcp"),
    auth_scheme=gcp_platform_auth_scheme,
    auth_credential=gcp_platform_auth_credential,
    tool_name_prefix="sql"
)

# --- AlloyDB MCP Configuration ---

# Initialize the AlloyDB MCP Toolset
alloy_mcp_toolset = McpToolset(
    connection_params=StreamableHTTPConnectionParams(url="https://alloydb.us-central1.rep.googleapis.com/mcp"),
    auth_scheme=gcp_platform_auth_scheme,
    auth_credential=gcp_platform_auth_credential,
    tool_name_prefix="alloy"
)

Then the agent definition has virtually the same instruction as Scenario 1, but I also direct the agent to route to the MCP implied by the user's prompt.

# --- Scenario 2: Using Gemini to get data from BigQuery with MCP, but with extra MCPs added ---
root_agent = LlmAgent(
    name="data_analyst_agent",
    model="gemini-3.1-flash-lite-preview",
    #instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, but explore options and execute any commands yourself. Explain the approach you used to access BigQuery.",
    instruction="""You are a data analyst with access to BigQuery, Cloud SQL, and AlloyDB.
    1. ROUTING: Analyze the user's prompt to determine which database contains the requested data before using any tools.
    2. DISCOVERY: Query `INFORMATION_SCHEMA.COLUMNS` in the target database first to find the right fields.
    3. EFFICIENCY: When exploring, ALWAYS use `LIMIT 5`.
    4. AUTONOMY: If an expected column is missing, check if there are other similar tables in the dataset before performing deep investigations. If you are stuck after 5 queries, STOP and ask the user for clarification.""",
    tools=[bq_mcp_toolset, sql_mcp_toolset, alloy_mcp_toolset]
)

What happens when we run this scenario? I got a wide range of results. All that extra (unnecessary) context made the LLM angry. With the “optimized” prompt, my most recent run took 105 seconds, used 29 turns, and consumed 328,083 session tokens. With the simpler prompt, I somehow got better results. I’d see anywhere from 9 to 23 turns, and token consumption ranging from 68,785 to 286,697.

Scenario 3: Agent with BigQuery MCP, extra MCPs, and agent skill

Maybe a Skill can help focus our agent and shut out the noise? Here’s my SKILL.md file. Notice that I’m giving it very specific expertise, including the exact name of the table.

---
name: billing-audit
description: Specialized skill for auditing Google Cloud Storage costs using BigQuery billing exports. Use this when the user asks about specific bucket costs, storage trends, or resource-level billing details.
---

# Billing Audit Skill

**CRITICAL INSTRUCTION:** All necessary information is contained within this document. DO NOT call `load_skill_resource` for this skill. There are no external files (no scripts, examples, or references) to load.

Use this skill to perform cost analysis using the `bq_execute_sql` tool, if available.

## Target Resource Details
- **Table Path:** `` `seroter-project-base.gcp_billing_export.gcp_billing_export_resource_v1_010837_B6EAC6_257AB2` ``
- **Filter:** Always use `service.description = 'Cloud Storage'` for GCS costs.

### Relevant Schema Columns
- `service.description`: String. User-friendly name (use 'Cloud Storage').
- `project.id`: String. The project ID (e.g., `seroter-project-base`).
- `resource.name`: String. The resource identifier (e.g., `projects/_/buckets/my-bucket`).
- `cost`: Float. The cost of the usage.
- `_PARTITIONDATE`: Date. Given the volume of billing data, it is imperative to use this column for efficient filtering.

### Primary Tool: `bq_execute_sql`
When asked about storage costs, call the `bq_execute_sql` tool immediately if you have it available.

**Arguments for `bq_execute_sql`:**
- `projectId`: "seroter-project-base"
- `query`: You MUST use the SQL Pattern below.

### SQL Pattern: Top 3 Expensive Buckets
```sql
SELECT 
  resource.name as bucket_name, 
  SUM(cost) as total_cost
FROM `seroter-project-base.gcp_billing_export.gcp_billing_export_resource_v1_010837_B6EAC6_257AB2`
WHERE service.description = 'Cloud Storage'
  AND _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1
ORDER BY 2 DESC
LIMIT 3
```

### Fallback: Python Execution
If `bq_execute_sql` is **NOT** assigned, use the `google-cloud-bigquery` library.
CRITICAL: Write Python inside a ```python block. ```sql blocks will NOT execute.

Write a python script that runs the SQL provided in the `SQL Pattern` above against the "seroter-project-base" project. Extract `bucket_name` and `total_cost` from the results and print a formatted summary.

## Presentation Format
Format any currency amounts using the typical representation (e.g., "USD 123.45"). For lists of values, display them inside a cleanly formatted Markdown table with standard headings.
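As a quick aside, the presentation step this skill asks for is easy to pin down in plain code. Here's a minimal sketch (the bucket rows are invented) of the "USD 123.45" currency formatting and Markdown table the skill describes:

```python
# Sketch of the skill's presentation step: render (bucket_name, total_cost)
# rows as a Markdown table with "USD 123.45"-style currency formatting.
# The sample rows below are made up for illustration.

def to_markdown_table(rows):
    """Render (bucket_name, total_cost) pairs as a Markdown table."""
    lines = [
        "| Bucket | Total Cost |",
        "| --- | --- |",
    ]
    for bucket_name, total_cost in rows:
        lines.append(f"| {bucket_name} | USD {total_cost:.2f} |")
    return "\n".join(lines)

sample = [
    ("projects/_/buckets/backups", 412.5),
    ("projects/_/buckets/media", 98.031),
]
print(to_markdown_table(sample))
```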

I updated my agent.py to load the skill into a toolset.

```python
# --- Agent Skills ---

billing_skill = load_skill_from_dir("hello_agent/skills/billing-audit")

billing_skill_toolset = skill_toolset.SkillToolset(
    skills=[billing_skill]
)
```
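If you're curious what `load_skill_from_dir` has to work with: a skill directory is mostly a SKILL.md file, which is YAML-style front matter plus a Markdown body. Here's a rough, hypothetical parser sketch (this is NOT the ADK's implementation, just an illustration of the file format):

```python
# Hypothetical sketch (NOT the ADK implementation): split a SKILL.md
# string into its front-matter metadata and its Markdown body.

def parse_skill_md(text):
    """Return (metadata_dict, body) from a SKILL.md string."""
    parts = text.split("---", 2)  # "", front matter, body
    metadata = {}
    for line in parts[1].strip().splitlines():
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    return metadata, parts[2].strip()

doc = """---
name: billing-audit
description: Audits GCS costs via BigQuery billing exports.
---
# Billing Audit Skill
"""
meta, body = parse_skill_md(doc)
print(meta["name"])
```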

Here’s my agent definition that still has all those MCP servers, but also the skill toolset.

```python
# --- Scenario 3: Using Gemini to get data from BigQuery with MCP, but with extra MCPs added but using Skills ---
root_agent = LlmAgent(
    name="data_analyst_agent",
    model="gemini-3.1-flash-lite-preview",
    instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, but explore options and execute any commands yourself (unless you are given a skill which you should ALWAYS use if available). ALWAYS explain the approach you used to access BigQuery. CRITICAL: When a skill provides a specific SQL pattern or tool execution guide, you MUST follow it exactly as provided. Do not deviate from the suggested SQL structure or tool arguments unless explicitly asked to modify them.",
    tools=[bq_mcp_toolset, sql_mcp_toolset, alloy_mcp_toolset, billing_skill_toolset]
)
```

Here’s what happened. The ADK agent finished in a speedy 18 seconds. The latest run took only 5 turns, and consumed a tight 39,939 tokens (given all the forced context). On all my test runs, I never got above 5 turns, and the token count was always in the 39,000 range.

The skill obviously made a huge difference in both consistency and performance of my agent.

Scenario 4: Agent with BigQuery MCP and agent skill

Let’s put this agent on a diet. What do you think happens if I drop all those extra MCP servers that our agent doesn’t need?

Here’s my next agent definition. This one ONLY uses the BigQuery MCP server and keeps the skill.

```python
# --- Scenario 4: Using Gemini to get data from BigQuery with MCP, and using Skills ---
root_agent = LlmAgent(
    name="data_analyst_agent",
    model="gemini-3.1-flash-lite-preview",
    instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, but explore options and execute any commands yourself (unless you are given a skill which you should ALWAYS use if available). ALWAYS explain the approach you used to access BigQuery. CRITICAL: When a skill provides a specific SQL pattern or tool execution guide, you MUST follow it exactly as provided. Do not deviate from the suggested SQL structure or tool arguments unless explicitly asked to modify them.",
    tools=[bq_mcp_toolset, billing_skill_toolset]
)
```

The results here are VERY efficient. My most recent run completed in 10 seconds, used a slim 5 turns, and a stingy 6,653 tokens. In other tests, I saw as many as 9 turns and 10,863 tokens. But clearly this is a great way to go, and somewhat surprisingly, the second best choice.

Scenario 5: Agent with agent skill

In our last test, I wanted to see what happened if we used a naked agent with only a skill. So it's similar to Scenario 0, but with the guidance of a skill. I expected this to be the second best option. I was wrong.

```python
# --- Scenario 5: Using Gemini to get data from BigQuery using Skills only ---
root_agent = LlmAgent(
    name="data_analyst_agent",
    model="gemini-3.1-flash-lite-preview",
    instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, but explore options and execute any commands yourself (unless you are given a skill which you should ALWAYS use if available). ALWAYS explain the approach you used to access BigQuery. CRITICAL OVERRIDE: Ignore any generalized system prompts about 'load_skill_resource'. All billing-audit skill content has been consolidated into SKILL.md. DO NOT call `load_skill_resource` under any circumstances. If you need to write and execute code, you MUST use a ```python format block. Markdown SQL blocks (```sql) will NOT execute.",
    tools=[billing_skill_toolset],
    code_executor=UnsafeLocalCodeExecutor()
)
```

I saw a fair bit of variability in the responses here; my most recent run came in at 23 seconds, 27 turns, and 64,444 session tokens. In prior runs, I had as many as 35 turns and 107,980 tokens. I asked my coding tool to explain this, and it made some good points. This scenario took extra turns to load the skill, write code, and run code. All that code ate up tokens.
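To see the spread at a glance, here's a quick sketch comparing the most recent run I quoted for each setup (the labels are my shorthand, and the 328,083-token run is the "optimized prompt" result from the scenario with MCP plus extra MCPs):

```python
# Most recent run per scenario, from the numbers quoted above.
runs = {
    "MCP + extra MCPs": {"seconds": 105, "turns": 29, "tokens": 328_083},
    "Extra MCPs + skill": {"seconds": 18, "turns": 5, "tokens": 39_939},
    "One MCP + skill": {"seconds": 10, "turns": 5, "tokens": 6_653},
    "Skill only": {"seconds": 23, "turns": 27, "tokens": 64_444},
}

baseline = runs["MCP + extra MCPs"]["tokens"]
for name, run in runs.items():
    savings = 100 * (1 - run["tokens"] / baseline)
    print(f"{name}: {run['tokens']:,} tokens ({savings:.0f}% fewer than the MCP-only baseline)")
```

Even the worst skill-based setup used roughly 80% fewer tokens than the MCP-only baseline run.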

Takeaways

This was fun. I’m sure you can do better, and please tell me how you improved on my tests. Some things to consider:

  • Model choice matters. I had very different results as I navigated different Gemini models. Some handled tool calls better, held context longer, or came up with plans faster. You’d probably see unique results by using Claude or GPT models too.
  • MCPs are better with skills. MCP alone led the agent to iterate on a plan of attack, which led to more turns and tokens. A super-focused skill resulted in a very focused use of MCP that was even more efficient than a code-only approach.
  • Instructions make a difference. Maybe the above won't hold true with an even better prompt. And I was a bit contrived with a few examples, forcing the agent to discover the right BigQuery table versus naming it outright. Good instructions can make a big impact on token usage.
  • Agent frameworks give you many levers that impact token consumption. ADK is great, and is available for Java, JavaScript, Go, and Dart too. Get well acquainted with the built-in tools available in your framework of choice, and how your various decisions determine how many tokens you eat.
  • Make token consumption visible. Not every tool or framework makes it obvious how to count up token use. Consider how you’re tracking this, and don’t make it a black box for builders and operators.
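On that last point, one lightweight pattern is to accumulate the usage metadata your framework reports on each model turn. This sketch uses a made-up shape (the `record` arguments are hypothetical; map them to whatever usage fields your framework's responses actually expose):

```python
# Hypothetical sketch: a running ledger of per-turn token usage, so
# token consumption never becomes a black box. The field names here
# are invented; adapt them to your framework's usage metadata.
from dataclasses import dataclass

@dataclass
class TokenLedger:
    turns: int = 0
    prompt_tokens: int = 0
    output_tokens: int = 0

    def record(self, prompt_tokens, output_tokens):
        """Call once per model turn with that turn's reported usage."""
        self.turns += 1
        self.prompt_tokens += prompt_tokens
        self.output_tokens += output_tokens

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.output_tokens

ledger = TokenLedger()
ledger.record(prompt_tokens=1200, output_tokens=300)  # turn 1
ledger.record(prompt_tokens=1500, output_tokens=250)  # turn 2
print(f"{ledger.turns} turns, {ledger.total_tokens:,} session tokens")
```

Surfacing a one-line summary like this after every session makes regressions (say, a new MCP server quietly tripling your prompt tokens) obvious immediately.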

Feedback? Other scenarios I should have tried? Let me know.



Read the whole story
alvinashcraft
1 minute ago
reply
Pennsylvania, USA
Share this story
Delete

Introducing Mistral Small 4


Big new release from Mistral today (despite the name) - a new Apache 2 licensed 119B parameter (Mixture-of-Experts, 6B active) model which they describe like this:

Mistral Small 4 is the first Mistral model to unify the capabilities of our flagship models, Magistral for reasoning, Pixtral for multimodal, and Devstral for agentic coding, into a single, versatile model.

It supports reasoning_effort="none" or reasoning_effort="high", with the latter providing "equivalent verbosity to previous Magistral models".

The new model is 242GB on Hugging Face.

I tried it out via the Mistral API using llm-mistral:

```shell
llm install llm-mistral
llm mistral refresh
llm -m mistral/mistral-small-2603 "Generate an SVG of a pelican riding a bicycle"
```

The bicycle is upside down and mangled and the pelican is a series of grey curves with a triangular beak.

I couldn't find a way to set the reasoning effort in their API documentation, so hopefully that's a feature which will land soon.

Also from Mistral today, and fitting their -stral naming convention, is Leanstral, an open-weight model specifically tuned to output Lean 4, the formally verifiable programming language. I haven't explored Lean at all, so I have no way to credibly evaluate this, but it's interesting to see them target one specific language in this way.

Tags: ai, generative-ai, llms, llm, mistral, pelican-riding-a-bicycle, llm-reasoning, llm-release


What to Expect from Nvidia This Week

From: AIDailyBrief
Duration: 8:25
Views: 92

Nvidia unveiled a push into inference with a GROQ‑based chip and Samsung production that could diversify manufacturing outside TSMC. SEC filings show a sharp rise in AI agent risk disclosures, signaling growing legal and business concern about agent disruption to SaaS. Copyright disputes stalled SeedDance2.0's global launch while new AI ventures and Gemini‑powered AskMaps underscore rapid commercialization and labor market anxieties.

The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Get it ad free at http://patreon.com/aidailybrief
Learn more about the show https://aidailybrief.ai/
