
Making (Very) Small LLMs Smarter


Hello, I’m Philippe, and I am a Principal Solutions Architect helping customers with their use of Docker. I started getting seriously interested in generative AI about two years ago. What interests me most is the ability to run language models (LLMs) directly on my laptop (for work, I have a MacBook Pro M2 Max, but on a more personal level, I run LLMs on my MacBook Air M4 and on Raspberry Pis – yes, it’s possible, but I’ll talk about that another time).

Let’s be clear: reproducing Claude Desktop or ChatGPT on a laptop with small language models is not possible, especially since I limit myself to models that have between 0.5 and 7 billion parameters. But I find it an interesting challenge to see how far we can go with these small models. So, can we do really useful things with small LLMs? The answer is yes, but you need to be creative and put in a bit of effort.

I’m going to take a concrete use case, related to development (but in the future I’ll propose “less technical” use cases).

(Specific) Use Case: Code Writing Assistance

I need help writing code

Currently, I’m working in my free time on an open-source project, which is a Golang library for quickly developing small generative AI agents. It’s both to get my hands dirty with Golang and prepare tools for other projects. This project is called Nova; there’s nothing secret about it, you can find it here.

If I use Claude AI and ask it to help me write code with Nova: “I need a code snippet of a Golang Nova Chat agent using a stream completion.”

tiny model fig 1

The response will be quite disappointing, because Claude doesn’t know Nova (which is normal; it’s a recent project). But Claude doesn’t want to disappoint me and will still propose something that has nothing to do with my project.

And it will be the same with Gemini.

tiny model fig 2

So, you’ll tell me: just feed the source code of my repository to Claude AI or Gemini. OK, but imagine the following situation: I don’t have access to these services, for various reasons. One of those reasons could be confidentiality; another, being on a project where we aren’t allowed to use the internet. That already disqualifies Claude AI and Gemini. So how can I get help writing code? As you guessed, with a local LLM, and moreover, a “very small” one.

Choosing a language model

When you develop a solution based on generative AI, the choice of language model(s) is crucial. You’ll have to do a lot of research, monitoring of new model releases, and testing to find the model that best fits your use case, and know that this is non-negligible work.

For this article (and also because I use it), I’m going to use hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m, which you can find here. It’s a 3 billion parameter language model, optimized for code generation. You can install it with Docker Model Runner with the following command:

docker model pull hf.co/Qwen/Qwen2.5-Coder-3B-Instruct-GGUF:Q4_K_M

And to start chatting with the model, you can use the following command:

docker model run hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

Or use Docker Desktop:

tiny model fig 3

So, of course, as you can see in the illustration above, this little “Qwen Coder” doesn’t know my Nova library either. But we’re going to fix that.

Feeding the model with specific information

For my project, I have a markdown file in which I save the code snippets I use to develop examples with Nova. You can find it here. For now, there’s little content, but it will be enough to prove and illustrate my point.

So I could add the entire content of this file to a user prompt that I would give to the model. But that would be ineffective: small models have a relatively small context window. And even if my “Qwen Coder” were capable of ingesting the entire content of my markdown file, it would have trouble focusing on my request and on what it should do with this information. So:

  • 1st essential rule: when you use a very small LLM, the larger the content provided to the model, the less effective the model will be.
  • 2nd essential rule: the longer you keep the conversation history, the more the content provided to the model grows, which in turn decreases its effectiveness.

So, to work around this problem, I’m going to use a technique called RAG (Retrieval Augmented Generation). The principle is simple: instead of providing all the content to the model, we’re going to store this content in a “vector” type database, and when the user makes a request, we’re going to search in this database for the most relevant information based on the user’s request. Then, we’re going to provide only this relevant information to the language model. For this blog post, the data will be kept in memory (which is not optimal, but sufficient for a demonstration).
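To make this concrete before going further, here is a rough sketch of that flow in JavaScript. The helpers embed(), searchSimilar(), and buildPrompt() are hypothetical placeholders; the real pieces are detailed in the rest of this post:

// Hypothetical sketch of the RAG flow described above (not a real library API).

// 1. Indexing (done once, at startup): embed every chunk and keep it in memory.
const vectorDatabase = [];
for (const chunk of chunks) {
  vectorDatabase.push({ text: chunk, vector: await embed(chunk) });
}

// 2. Query time: embed the user's question and keep only the closest chunks.
const questionVector = await embed(userQuestion);
const relevantChunks = searchSimilar(vectorDatabase, questionVector, 3); // top 3 matches

// 3. Prompt construction: system instructions + relevant chunks + the question, nothing more.
const prompt = buildPrompt(systemInstructions, relevantChunks, userQuestion);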

RAG?

There are already many articles on the subject, so I won’t go into detail. But here’s what I’m going to do for this blog post:

  1. My snippets file is composed of sections: a markdown title (## snippet name), possibly a description in free text, and a fenced Golang code block.
  2. I’m going to split this file into chunks of text, one per section (a sketch of this splitting step is shown after the diagram below).
  3. Then, for each section I’m going to create an “embedding” (a vector representation of the text, i.e. a mathematical representation of its semantic meaning) with the ai/embeddinggemma:latest model (a relatively small and efficient embedding model), and store these embeddings (with the associated text) in an in-memory vector database (a simple array of JSON objects).
  4. If you want to learn more about embeddings, please read this article: Run Embedding Models and Unlock Semantic Search with Docker Model Runner

Diagram of the vector database creation process:

tiny model fig 4
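The splitting step itself can stay very simple. Purely as an illustration (my actual chunks.js implementation may differ), a splitMarkdownBySections function could look like this:

// Minimal sketch: split a markdown document into one chunk per "## " section.
// This is an illustration only; the real chunks.js implementation may differ.
export function splitMarkdownBySections(markdown) {
  const chunks = [];
  let current = [];
  for (const line of markdown.split('\n')) {
    if (line.startsWith('## ') && current.length > 0) {
      chunks.push(current.join('\n').trim()); // close the previous section
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) {
    chunks.push(current.join('\n').trim()); // last section
  }
  return chunks.filter((chunk) => chunk.length > 0);
}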

Similarity search and user prompt construction

Once I have this in place, when I make a request to the language model (so hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m), I’m going to:

  1. Create an embedding of the user’s request with the embedding model.
  2. Compare this embedding with the embeddings stored in the vector database to find the most relevant sections (by calculating the distance between the vector representation of my question and the vector representations of the snippets). This is called a similarity search.
  3. From the most relevant sections (the most similar), I’ll be able to construct a user prompt that includes only the relevant information and my initial request.

Diagram of the search and user prompt construction process:

tiny model fig 5

So the final user prompt will contain:

  • The system instructions. For example: “You are a helpful coding assistant specialized in Golang and the Nova library. Use the provided code snippets to help the user with their requests.”
  • The relevant sections extracted from the vector database.
  • The user’s request.

Remarks:

  • I explain the principles and results here, but all the source code (NodeJS with LangchainJS) used to arrive at my conclusions is available in this project
  • To calculate distances between vectors, I used cosine similarity (a cosine similarity score of 1 indicates that the vectors point in the same direction; a score of 0 indicates that the vectors are orthogonal, meaning they have no directional similarity). A minimal sketch of such a function is shown after this list.
  • You can find the JavaScript function I used here
  • And here is the piece of code that I use to split the markdown snippets file
  • Warning: embedding models are limited by the size of the text chunks they can ingest, so you have to be careful not to exceed this size when splitting the source file. In some cases, you’ll have to change the splitting strategy (fixed-size chunks, for example, with or without overlap)
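For reference, such a cosine similarity function can be written in a few lines of JavaScript (a minimal sketch; the function I linked above may differ slightly):

// Cosine similarity between two vectors of the same length:
// dot(a, b) / (|a| * |b|). Returns a value between -1 and 1.
export function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}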

Implementation and results, or creating my Golang expert agent

Now that we have the operating principle, let’s see how to put this into practice with LangchainJS, Docker Model Runner, and Docker Agentic Compose.

Docker Agentic Compose configuration

Let’s start with the Docker Agentic Compose project structure:

services:
  golang-expert:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      TERM: xterm-256color

      HISTORY_MESSAGES: 2
      MAX_SIMILARITIES: 3
      COSINE_LIMIT: 0.45

      OPTION_TEMPERATURE: 0.0
      OPTION_TOP_P: 0.75
      OPTION_PRESENCE_PENALTY: 2.2

      CONTENT_PATH: /app/data

    volumes:
      - ./data:/app/data

    stdin_open: true   # docker run -i
    tty: true          # docker run -t

    configs:
      - source: system.instructions.md
        target: /app/system.instructions.md

    models:
      chat-model:
        endpoint_var: MODEL_RUNNER_BASE_URL
        model_var: MODEL_RUNNER_LLM_CHAT

      embedding-model:
        endpoint_var: MODEL_RUNNER_BASE_URL
        model_var: MODEL_RUNNER_LLM_EMBEDDING


models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

  embedding-model:
    model: ai/embeddinggemma:latest

configs:
  system.instructions.md:
    content: |
      Your name is Bob (the original replicant).
      You are an expert programming assistant in Golang.
      You write clean, efficient, and well-documented code.
      Always:
      - Provide complete, working code
      - Include error handling
      - Add helpful comments
      - Follow best practices for the language
      - Explain your approach briefly

      Use only the information available in the provided data and your KNOWLEDGE BASE.

What’s important here is:

I only keep the last 2 messages in my conversation history, and I select at most the 3 best similarities found (to limit the size of the user prompt):

HISTORY_MESSAGES: 2
MAX_SIMILARITIES: 3
COSINE_LIMIT: 0.45

You can adjust these values according to your use case and your language model’s capabilities.

The models section, where I define the language models I’m going to use:

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

  embedding-model:
    model: ai/embeddinggemma:latest

One of the advantages of this section is that it will allow Docker Compose to download the models if they’re not already present on your machine.

As well as the models section of the golang-expert service, where I map the environment variables to the models defined above:

models:
  chat-model:
    endpoint_var: MODEL_RUNNER_BASE_URL
    model_var: MODEL_RUNNER_LLM_CHAT

  embedding-model:
    endpoint_var: MODEL_RUNNER_BASE_URL
    model_var: MODEL_RUNNER_LLM_EMBEDDING

And finally, the system instructions configuration file:

configs:
  - source: system.instructions.md
    target: /app/system.instructions.md

Which I define a bit further down in the configs section:

configs:
  system.instructions.md:
    content: |
      Your name is Bob (the original replicant).
      You are an expert programming assistant in Golang.
      You write clean, efficient, and well-documented code.
      Always:
      - Provide complete, working code
      - Include error handling
      - Add helpful comments
      - Follow best practices for the language
      - Explain your approach briefly

      Use only the information available in the provided data and your KNOWLEDGE BASE.

You can, of course, adapt these system instructions to your use case. And also persist them in a separate file if you prefer.

Dockerfile

It’s rather simple:

FROM node:22.19.0-trixie

WORKDIR /app
COPY package*.json ./
RUN npm install
COPY *.js .

# Create non-root user
RUN groupadd --gid 1001 nodejs && \
    useradd --uid 1001 --gid nodejs --shell /bin/bash --create-home bob-loves-js

# Change ownership of the app directory
RUN chown -R bob-loves-js:nodejs /app

# Switch to non-root user
USER bob-loves-js

Now that the configuration is in place, let’s move on to the agent’s source code.

Golang expert agent source code, a bit of LangchainJS with RAG

The JavaScript code is rather simple (probably improvable, but functional) and follows these main steps:

1. Initial configuration

  • Connection to both models (chat and embeddings) via LangchainJS
  • Loading parameters from environment variables

2. Vector database creation (at startup)

  • Reading the snippets.md file
  • Splitting into sections (chunks)
  • Generating an embedding for each section
  • Storing in an in-memory vector database

3. Interactive conversation loop

  • The user asks a question
  • Creating an embedding of the question
  • Similarity search in the vector database to find the most relevant snippets
  • Construction of the final prompt with: history + system instructions + relevant snippets + question
  • Sending to the LLM and displaying the response in streaming
  • Updating the history (limited to the last N messages)

Here is the full source of the agent (index.js):

import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";

import { splitMarkdownBySections } from './chunks.js'
import { VectorRecord, MemoryVectorStore } from './rag.js';


import prompts from "prompts";
import fs from 'fs';

// Define [CHAT MODEL] Connection
const chatModel = new ChatOpenAI({
  model: process.env.MODEL_RUNNER_LLM_CHAT || `ai/qwen2.5:latest`,
  apiKey: "",
  configuration: {
    baseURL: process.env.MODEL_RUNNER_BASE_URL || "http://localhost:12434/engines/llama.cpp/v1/",
  },
  temperature: parseFloat(process.env.OPTION_TEMPERATURE) || 0.0,
  topP: parseFloat(process.env.OPTION_TOP_P) || 0.5,
  presencePenalty: parseFloat(process.env.OPTION_PRESENCE_PENALTY) || 2.2,
});


// Define [EMBEDDINGS MODEL] Connection
const embeddingsModel = new OpenAIEmbeddings({
  model: process.env.MODEL_RUNNER_LLM_EMBEDDING || "ai/embeddinggemma:latest",
  configuration: {
    baseURL: process.env.MODEL_RUNNER_BASE_URL || "http://localhost:12434/engines/llama.cpp/v1/",
    apiKey: "",
  },
})

const maxSimilarities = parseInt(process.env.MAX_SIMILARITIES) || 3
const cosineLimit = parseFloat(process.env.COSINE_LIMIT) || 0.45

// ----------------------------------------------------------------
//  Create the embeddings and the vector store from the content file
// ----------------------------------------------------------------

console.log("========================================================")
console.log(" Embeddings model:", embeddingsModel.model)
console.log(" Creating embeddings...")
let contentPath = process.env.CONTENT_PATH || "./data"

const store = new MemoryVectorStore();

let contentFromFile = fs.readFileSync(contentPath+"/snippets.md", 'utf8');
let chunks = splitMarkdownBySections(contentFromFile);
console.log(" Number of documents read from file:", chunks.length);


// -------------------------------------------------
// Create and save the embeddings in the memory vector store
// -------------------------------------------------
console.log(" Creating the embeddings...");

for (const chunk of chunks) {
  try {
    // EMBEDDING COMPLETION:
    const chunkEmbedding = await embeddingsModel.embedQuery(chunk);
    const vectorRecord = new VectorRecord('', chunk, chunkEmbedding);
    store.save(vectorRecord);

  } catch (error) {
    console.error(`Error processing chunk:`, error);
  }
}

console.log(" Embeddings created, total of records", store.records.size);
console.log();


console.log("========================================================")


// Load the system instructions from a file
let systemInstructions = fs.readFileSync('/app/system.instructions.md', 'utf8');

// ----------------------------------------------------------------
// HISTORY: Initialize a Map to store conversations by session
// ----------------------------------------------------------------
const conversationMemory = new Map()

let exit = false;

// CHAT LOOP:
while (!exit) {
  const { userMessage } = await prompts({
    type: "text",
    name: "userMessage",
    message: `Your question (${chatModel.model}): `,
    validate: (value) => (value ? true : "Question cannot be empty"),
  });

  if (userMessage == "/bye") {
    console.log(" See you later!");
    exit = true;
    continue
  }

  // HISTORY: Get the conversation history for this session
  const history = getConversationHistory("default-session-id")

  // ----------------------------------------------------------------
  // SIMILARITY SEARCH:
  // ----------------------------------------------------------------
  // -------------------------------------------------
  // Create embedding from the user question
  // -------------------------------------------------
  const userQuestionEmbedding = await embeddingsModel.embedQuery(userMessage);

  // -------------------------------------------------
  // Use the vector store to find similar chunks
  // -------------------------------------------------
  // Create a vector record from the user embedding
  const embeddingFromUserQuestion = new VectorRecord('', '', userQuestionEmbedding);

  const similarities = store.searchTopNSimilarities(embeddingFromUserQuestion, cosineLimit, maxSimilarities);

  let knowledgeBase = "KNOWLEDGE BASE:\n";

  for (const similarity of similarities) {
    console.log(" CosineSimilarity:", similarity.cosineSimilarity, "Chunk:", similarity.prompt);
    knowledgeBase += `${similarity.prompt}\n`;
  }

  console.log("\n Similarities found, total of records", similarities.length);
  console.log();
  console.log("========================================================")
  console.log()

  // -------------------------------------------------
  // Generate CHAT COMPLETION:
  // -------------------------------------------------

  // MESSAGES== PROMPT CONSTRUCTION:
  let messages = [
      ...history,
      ["system", systemInstructions],
      ["system", knowledgeBase],
      ["user", userMessage]
  ]

  let assistantResponse = ''
  // STREAMING COMPLETION:
  const stream = await chatModel.stream(messages);
  for await (const chunk of stream) {
    assistantResponse += chunk.content
    process.stdout.write(chunk.content);
  }
  console.log("\n");

  // HISTORY: Add both user message and assistant response to history
  addToHistory("default-session-id", "user", userMessage)
  addToHistory("default-session-id", "assistant", assistantResponse)

}

// Helper function to get or create a conversation history
function getConversationHistory(sessionId, maxTurns = parseInt(process.env.HISTORY_MESSAGES)) {
  if (!conversationMemory.has(sessionId)) {
    conversationMemory.set(sessionId, [])
  }
  return conversationMemory.get(sessionId)
}

// Helper function to add a message to the conversation history
function addToHistory(sessionId, role, content) {
  const history = getConversationHistory(sessionId)
  history.push([role, content])

  // Keep only the last maxTurns conversations
  const maxTurns = parseInt(process.env.HISTORY_MESSAGES) // Adjust this value based on your needs
  if (history.length > maxTurns * 2) { // *2 because each turn has user & assistant message
    history.splice(0, 2) // Remove oldest turn (user + assistant messages)
  }
}

You can find it here
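The rag.js helpers (VectorRecord and MemoryVectorStore) are not shown above. As an indication only, here is a minimal in-memory version that matches the way they are used in index.js; the real file may differ:

// Minimal sketch of the rag.js helpers used above (the real implementation may differ).
import crypto from 'node:crypto';
import { cosineSimilarity } from './similarity.js'; // hypothetical path to the cosine function shown earlier

export class VectorRecord {
  constructor(id, prompt, embedding) {
    this.id = id || crypto.randomUUID();
    this.prompt = prompt;       // the original text chunk
    this.embedding = embedding; // its vector representation
  }
}

export class MemoryVectorStore {
  constructor() {
    this.records = new Map(); // id -> VectorRecord
  }

  save(record) {
    this.records.set(record.id, record);
    return record;
  }

  // Return at most `max` records whose cosine similarity with `query`
  // is above `limit`, sorted from most to least similar.
  searchTopNSimilarities(query, limit, max) {
    const scored = [];
    for (const record of this.records.values()) {
      const score = cosineSimilarity(query.embedding, record.embedding);
      if (score >= limit) {
        scored.push({ ...record, cosineSimilarity: score });
      }
    }
    return scored
      .sort((a, b) => b.cosineSimilarity - a.cosineSimilarity)
      .slice(0, max);
  }
}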

All that’s left is to launch it and verify my hypotheses

In the project folder, run the following command:

docker compose up --build --no-log-prefix -d

Then connect to the container and launch the application:

docker compose exec golang-expert /bin/bash
node index.js

Nova Chat Agent with streaming completion

Once the application has started, it’s time to ask our Golang expert agent our question about developing with the Nova library:

========================================================
 Embeddings model: ai/embeddinggemma:latest
 Creating embeddings...
 Number of documents read from file: 4
 Creating the embeddings...
 Embeddings created, total of records 4

========================================================
?  Your question (hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m):  › I need a code snippet of a Golang Nova Chat agent using a stream completion


After a quick similarity search, our agent found two relevant snippets in the vector database. And it was able to provide me with complete, functional, and commented code to answer my request:

Certainly! Below is a complete, working Go code snippet for a Nova Chat agent that uses streaming completion. This example demonstrates how to set up an agent with the necessary configurations and generate a stream completion response.

```go
package main

import (
        "context"
        "fmt"

        "github.com/snipwise/nova/nova-sdk/agents"
        "github.com/snipwise/nova/nova-sdk/agents/chat"
        "github.com/snipwise/nova/nova-sdk/messages"
        "github.com/snipwise/nova/nova-sdk/messages/roles"
        "github.com/snipwise/nova/nova-sdk/models"
)

func main() {

        ctx := context.Background()

        // Create a new chat agent with streaming capabilities
        agent, err := chat.NewAgent(
                ctx,
                agents.Config{
                        EngineURL:          "http://localhost:12434/engines/llama.cpp/v1",
                        SystemInstructions: "You are Bob, a helpful AI assistant.",
                        KeepConversationHistory: true,
                },
                models.Config{
                        Name:        "ai/qwen2.5:1.5B-F16",
                        Temperature: models.Float64(0.8),
                },
        )
        if err != nil {
                panic(err)
        }

        // Generate a stream completion response
        result, err := agent.GenerateStreamCompletion(
                []messages.Message{
                        {Role: roles.User, Content: "Who is James T Kirk?"},
                },
                func(chunk string, finishReason string) error {
                        // Simple callback that receives strings only
                        if chunk != "" {
                                fmt.Print(chunk)
                        }
                        if finishReason == "stop" {
                                fmt.Println()
                        }
                        return nil
                },
        )
        if err != nil {
                panic(err)
        }

        // Print the final response and finish reason
        fmt.Println("Final Response:", result.Response)
        fmt.Println("Finish Reason:", result.FinishReason)
}
```

### Explanation:

1. **Imports**: The code imports necessary packages from the Nova SDK to interact with chat agents.

2. **Context**: A context is created using `context.Background()` to manage the lifecycle of requests.

3. **Agent Creation**:
   - An agent is created using `chat.NewAgent` with specified configurations:
     - `EngineURL`: The URL of the engine to use for generating text.
     - `SystemInstructions`: Instructions for the AI assistant, such as who Bob is.
     - `KeepConversationHistory`: Whether to keep a history of previous messages.

4. **Model Configuration**:
   - A model configuration is provided with:
     - `Name`: The name of the model to use.
     - `Temperature`: Controls the randomness of the output (higher values make responses more random).
     - `MaxTokens`: Limits the maximum number of tokens in the response.

5. **Stream Completion Generation**:
   - The `GenerateStreamCompletion` method is called with a list of messages and a callback function that processes each chunk of text as it arrives.
   - The callback prints chunks to the console until the finish reason is "stop", indicating the completion has been reached.

6. **Output**:
   - The final response and finish reason are printed after the stream completes.

This code snippet demonstrates how to set up a streaming chat agent using Nova, allowing you to receive text responses incrementally as they become available.

I then asked other questions.

Nova Structured Agent

This time, I asked for a structured agent: “I need a code snippet of a Nova Structured Agent.”

And the result was conclusive:

Extract from the response:

Certainly! Below is a complete, working example of a Nova Structured Agent in Go. This agent will generate structured data about a specified country based on the user's input.

```go
package main

import (
        "context"
        "fmt"

        "github.com/snipwise/nova/nova-sdk/agents"
        "github.com/snipwise/nova/nova-sdk/agents/structured"
        "github.com/snipwise/nova/nova-sdk/messages"
        "github.com/snipwise/nova/nova-sdk/messages/roles"
        "github.com/snipwise/nova/nova-sdk/models"
)

type Country struct {
        Name       string   `json:"name"`
        Capital    string   `json:"capital"`
        Population int      `json:"population"`
        Languages  []string `json:"languages"`
}

func main() {
        ctx := context.Background()

        // Create a new Structured Agent
        agent, err := structured.NewAgent[Country](
                ctx,
                agents.Config{
                        EngineURL: "http://localhost:12434/engines/llama.cpp/v1",
                        SystemInstructions: `
                                Your name is Bob.
                                You are an assistant that answers questions about countries around the world.
                        `,
                },
                models.Config{
                        Name:        "hf.co/menlo/jan-nano-gguf:q4_k_m",
                        Temperature: models.Float64(0.0),
                },
        )
        if err != nil {
                panic(err)
        }

        // Generate structured data about Canada
        response, finishReason, err := agent.GenerateStructuredData([]messages.Message{
                {Role: roles.User, Content: "Tell me about Canada."},
        })

        if err != nil {
                panic(err)
        }

        // Print the generated country information
        fmt.Println("Country Name:", response.Name)
        fmt.Println("Capital:", response.Capital)
        fmt.Println("Population:", response.Population)
        fmt.Println("Languages:", strings.Join(response.Languages, ", "))
}
```

### Explanation:

1. **Imports**: The code imports necessary packages from the Nova SDK to interact with agents and models.

2. **Country Struct**: A simple struct `Country` is defined to represent the data structure expected by the agent.

3. **Main Function**:
   - A context is created using `context.Background()`.
   - A new Structured Agent is created for generating structured data about countries.
   - The system instructions are set to provide a friendly greeting and instruction on answering questions about countries.
   - The model configuration specifies the engine URL, temperature, and other parameters.

4. **GenerateStructuredData Method**: This method generates structured data based on the user's input. It takes an array of messages as input, where each message has a role (e.g., `User`) and content (e.g., "Tell me about Canada").

5. **Error Handling**: The code includes error handling to ensure that any issues during agent creation or data generation are properly caught and reported.

6. **Output**: The generated country information is printed, including the name, capital, population, and languages.

This example demonstrates how to use a Nova Structured Agent to generate structured data about countries based on user input. You can modify the system instructions and model configuration as needed for your specific use case.

Then, I moved on to the next question.

Nova RAG Agent with vector store. Be careful with your similarity settings!

This time, I asked for a “RAG” agent: “I need a snippet of a Nova RAG agent with a vector store.”

And once again, I got a relevant response.

However, when I tried with this question (after restarting the agent to start from a clean base without conversation history): “I need a snippet of a Nova RAG agent.”

The similarity search returned no relevant results (because the words “vector store” were no longer present in my question). And the agent responded with generic code that had nothing to do with Nova, or reused code from the Nova Chat Agent snippets.

There may be several possible reasons:

  • The embedding model is not suitable for my use case,
  • The embedding model is not precise enough,
  • The splitting of the code snippets file is not optimal (you can add metadata to chunks to improve similarity search, for example, but don’t forget that chunks must not exceed the maximum size that the embedding model can ingest).

In that case, there’s a simple solution that works quite well: you lower the similarity thresholds and/or increase the number of returned similarities. This allows you to have more results to construct the user prompt, but be careful not to exceed the maximum context size of the language model. And you can also do tests with other “bigger” LLMs (more parameters and/or larger context window).

In the latest version of the snippets file, I added a KEYWORDS: … line below the markdown titles to help with the similarity search, which greatly improved the results.

Conclusion

Using “Small Language Models” (SLM) or “Tiny Language Models” (TLM) requires a bit of energy and thought to work around their limitations. But it’s possible to build effective solutions for very specific problems. And once again, always think about the context size for the chat model and how you’ll structure the information for the embedding model. And by combining several specialized “small agents”, you can achieve very interesting results. This will be the subject of future articles.

Learn more


The Astro Technology Company joins Cloudflare

The Astro Technology Company is joining Cloudflare! Astro remains open-source, MIT-licensed, and platform-agnostic. With Cloudflare's support, we're focusing 100% on building the best framework for content-driven websites. Astro 6 beta is available now.

Code.org: Use AI In an Interview Without Our OK and You're Dead To Us

theodp writes: Code.org, the nonprofit backed by AI giants Microsoft, Google and Amazon and whose Hour of AI and free AI curriculum aim to make world's K-12 schoolchildren AI literate, points job seekers to its AI Use Policy in Hiring, which promises dire consequences for those who use AI during interviews or take home assignments without its OK. Explaining "What's Not Okay," Code.org writes: "While we support thoughtful use of AI, certain uses undermine fairness and honesty in the hiring process. We ask that candidates do not [...] use AI during interviews and take-home assignments without explicit consent from the interview team. Such use goes against our values of integrity and transparency and will result in disqualification from the hiring process." Interestingly, Code.org CEO Partovi last year faced some blowback from educators over his LinkedIn post that painted schools that police AI use by students as dinosaurs. Partovi wrote, "Schools of the past define AI use as 'cheating.' Schools of the future define AI skills as the new literacy. Every desk-job employer is looking to hire workers who are adept at AI. Employers want the students who are best at this new form of 'cheating.'"



21 Lessons from 14 Years at Google


The following article originally appeared on Addy Osmani’s Substack newsletter, Elevate, and is being republished here with his permission.

When I joined Google ~14 years ago, I thought the job was about writing great code. I was partly right. But the longer I’ve stayed, the more I’ve realized that the engineers who thrive aren’t necessarily the best programmers. They’re the ones who’ve figured out how to navigate everything around the code: the people, the politics, the alignment, the ambiguity.

These lessons are what I wish I’d known earlier. Some would have saved me months of frustration. Others took years to fully understand. None of them are about specific technologies—those change too fast to matter. They’re about the patterns that keep showing up, project after project, team after team.

I’m sharing them because I’ve benefited enormously from engineers who did the same for me. Consider this my attempt to pay it forward.

1. The best engineers are obsessed with solving user problems.

It’s seductive to fall in love with a technology and go looking for places to apply it. I’ve done it. Everyone has. But the engineers who create the most value work backwards: They become obsessed with understanding user problems deeply and let solutions emerge from that understanding.

User obsession means spending time in support tickets, talking to users, watching users struggle, asking “why” until you hit bedrock. The engineer who truly understands the problem often finds that the elegant solution is simpler than anyone expected.

The engineer who starts with a solution tends to build complexity in search of a justification.

2. Being right is cheap. Getting to right together is the real work.

You can win every technical argument and lose the project. I’ve watched brilliant engineers accrue silent resentment by always being the smartest person in the room. The cost shows up later as “mysterious execution issues” and “strange resistance.”

The skill isn’t being right. It’s entering discussions to align on the problem, creating space for others, and remaining skeptical of your own certainty.

Strong opinions, weakly held—not because you lack conviction but because decisions made under uncertainty shouldn’t be welded to identity.

3. Bias towards action. Ship. You can edit a bad page, but you can’t edit a blank one.

The quest for perfection is paralyzing. I’ve watched engineers spend weeks debating the ideal architecture for something they’ve never built. The perfect solution rarely emerges from thought alone. It emerges from contact with reality. AI can in many ways help here.

First do it, then do it right, then do it better. Get the ugly prototype in front of users. Write the messy first draft of the design doc. Ship the MVP that embarrasses you slightly. You’ll learn more from one week of real feedback than a month of theoretical debate.

Momentum creates clarity. Analysis paralysis creates nothing.

4. Clarity is seniority. Cleverness is overhead.

The instinct to write clever code is almost universal among engineers. It feels like proof of competence.

But software engineering is what happens when you add time and other programmers. In that environment, clarity isn’t a style preference. It’s operational risk reduction.

Your code is a strategy memo to strangers who will maintain it at 2am during an outage. Optimize for their comprehension, not your elegance. The senior engineers I respect most have learned to trade cleverness for clarity, every time.

5. Novelty is a loan you repay in outages, hiring, and cognitive overhead.

Treat your technology choices like an organization with a small “innovation token” budget. Spend one each time you adopt something materially nonstandard. You can’t afford many.

The punchline isn’t “never innovate.” It’s “innovate only where you’re uniquely paid to innovate.” Everything else should default to boring, because boring has known failure modes.

The “best tool for the job” is often the “least-worst tool across many jobs”—because operating a zoo becomes the real tax.

6. Your code doesn’t advocate for you. People do.

Early in my career, I believed great work would speak for itself. I was wrong. Code sits silently in a repository. Your manager mentions you in a meeting, or they don’t. A peer recommends you for a project, or someone else.

In large organizations, decisions get made in meetings you’re not invited to, using summaries you didn’t write, by people who have five minutes and 12 priorities. If no one can articulate your impact when you’re not in the room, your impact is effectively optional.

This isn’t strictly about self-promotion. It’s about making the value chain legible to everyone—including yourself.

7. The best code is the code you never had to write.

We celebrate creation in engineering culture. Nobody gets promoted for deleting code, even though deletion often improves a system more than addition. Every line of code you don’t write is a line you never have to debug, maintain, or explain.

Before you build, exhaust the question: “What would happen if we just…didn’t?” Sometimes the answer is “nothing bad,” and that’s your solution.

The problem isn’t that engineers can’t write code or use AI to do so. It’s that we’re so good at writing it that we forget to ask whether we should.

8. At scale, even your bugs have users.

With enough users, every observable behavior becomes a dependency—regardless of what you promised. Someone is scraping your API, automating your quirks, caching your bugs.

This creates a career-level insight: You can’t treat compatibility work as “maintenance” and new features as “real work.” Compatibility is product.

Design your deprecations as migrations with time, tooling, and empathy. Most “API design” is actually “API retirement.”

9. Most “slow” teams are actually misaligned teams.

When a project drags, the instinct is to blame execution: People aren’t working hard enough; the technology is wrong; there aren’t enough engineers. Usually none of that is the real problem.

In large companies, teams are your unit of concurrency, but coordination costs grow geometrically as teams multiply. Most slowness is actually alignment failure—people building the wrong things, or the right things in incompatible ways.

Senior engineers spend more time clarifying direction, interfaces, and priorities than “writing code faster” because that’s where the actual bottleneck lives.

10. Focus on what you can control. Ignore what you can’t.

In a large company, countless variables are outside your control: organizational changes, management decisions, market shifts, product pivots. Dwelling on these creates anxiety without agency.

The engineers who stay sane and effective zero in on their sphere of influence. You can’t control whether a reorg happens. You can control the quality of your work, how you respond, and what you learn. When faced with uncertainty, break problems into pieces and identify the specific actions available to you.

This isn’t passive acceptance, but it is strategic focus. Energy spent on what you can’t change is energy stolen from what you can.

11. Abstractions don’t remove complexity. They move it to the day you’re on call.

Every abstraction is a bet that you won’t need to understand what’s underneath. Sometimes you win that bet. But something always leaks, and when it does, you need to know what you’re standing on.

Senior engineers keep learning “lower level” things even as stacks get higher. Not out of nostalgia but out of respect for the moment when the abstraction fails and you’re alone with the system at 3am. Use your stack.

But keep a working model of its underlying failure modes.

12. Writing forces clarity. The fastest way to learn something better is to try teaching it.

Writing forces clarity. When I explain a concept to others—in a doc, a talk, a code review comment, even just chatting with AI—I discover the gaps in my own understanding. The act of making something legible to someone else makes it more legible to me.

This doesn’t mean that you’re going to learn how to be a surgeon by teaching it, but the premise still holds largely true in the software engineering domain.

This isn’t just about being generous with knowledge. It’s a selfish learning hack. If you think you understand something, try to explain it simply. The places where you stumble are the places where your understanding is shallow.

Teaching is debugging your own mental models.

13. The work that makes other work possible is priceless—and invisible.

Glue work—documentation, onboarding, cross-team coordination, process improvement—is vital. But if you do it unconsciously, it can stall your technical trajectory and burn you out. The trap is doing it as “helpfulness” rather than treating it as deliberate, bounded, visible impact.

Timebox it. Rotate it. Turn it into artifacts: docs, templates, automation. And make it legible as impact, not as personality trait.

Priceless and invisible is a dangerous combination for your career.

14. If you win every debate, you’re probably accumulating silent resistance.

I’ve learned to be suspicious of my own certainty. When I “win” too easily, something is usually wrong. People stop fighting you not because you’ve convinced them but because they’ve given up trying—and they’ll express that disagreement in execution, not meetings.

Real alignment takes longer. You have to actually understand other perspectives, incorporate feedback, and sometimes change your mind publicly.

The short-term feeling of being right is worth much less than the long-term reality of building things with willing collaborators.

15. When a measure becomes a target, it stops measuring.

Every metric you expose to management will eventually be gamed. Not through malice but because humans optimize for what’s measured.

If you track lines of code, you’ll get more lines. If you track velocity, you’ll get inflated estimates.

The senior move: Respond to every metric request with a pair: one for speed; one for quality or risk. Then insist on interpreting trends, not worshiping thresholds. The goal is insight, not surveillance.

16. Admitting what you don’t know creates more safety than pretending you do.

Senior engineers who say “I don’t know” aren’t showing weakness. They’re creating permission. When a leader admits uncertainty, it signals that the room is safe for others to do the same. The alternative is a culture where everyone pretends to understand and problems stay hidden until they explode.

I’ve seen teams where the most senior person never admitted confusion, and I’ve seen the damage. Questions don’t get asked. Assumptions don’t get challenged. Junior engineers stay silent because they assume everyone else gets it.

Model curiosity, and you get a team that actually learns.

17. Your network outlasts every job you’ll ever have.

Early in my career, I focused on the work and neglected networking. In hindsight, this was a mistake. Colleagues who invested in relationships—inside and outside the company—reaped benefits for decades.

They heard about opportunities first, could build bridges faster, got recommended for roles, and cofounded ventures with people they’d built trust with over years.

Your job isn’t forever, but your network is. Approach it with curiosity and generosity, not transactional hustle.

When the time comes to move on, it’s often relationships that open the door.

18. Most performance wins come from removing work, not adding cleverness.

When systems get slow, the instinct is to add: caching layers, parallel processing, smarter algorithms. Sometimes that’s right. But I’ve seen more performance wins from asking, “What are we computing that we don’t need?”

Deleting unnecessary work is almost always more impactful than doing necessary work faster. The fastest code is code that never runs.

Before you optimize, question whether the work should exist at all.

19. Process exists to reduce uncertainty, not to create paper trails.

The best process makes coordination easier and failures cheaper. The worst process is bureaucratic theater. It exists not to help but to assign blame when things go wrong.

If you can’t explain how a process reduces risk or increases clarity, it’s probably just overhead. And if people are spending more time documenting their work than doing it, something has gone deeply wrong.

20. Eventually, time becomes worth more than money. Act accordingly.

Early in your career, you trade time for money—and that’s fine. But at some point, the calculus inverts. You start to realize that time is the nonrenewable resource.

I’ve watched senior engineers burn out chasing the next promo level, optimizing for a few more percentage points of compensation. Some of them got it. Most of them wondered, afterward, if it was worth what they gave up.

The answer isn’t “don’t work hard.” It’s “know what you’re trading, and make the trade deliberately.”

21. There are no shortcuts, but there is compounding.

Expertise comes from deliberate practice—pushing slightly beyond your current skill, reflecting, repeating. For years. There’s no condensed version.

But here’s the hopeful part: Learning compounds when it creates new options, not just new trivia. Write—not for engagement but for clarity. Build reusable primitives. Collect scar tissue into playbooks.

The engineer who treats their career as compound interest, not lottery tickets, tends to end up much further ahead.

A final thought

Twenty-one lessons sounds like a lot, but they really come down to a few core ideas: Stay curious, stay humble, and remember that the work is always about people—the users you’re building for and the teammates you’re building with.

Addy Osmani at Google

A career in engineering is long enough to make plenty of mistakes and still come out ahead. The engineers I admire most aren’t the ones who got everything right. They’re the ones who learned from what went wrong, shared what they discovered, and kept showing up.

If you’re early in your journey, know that it gets richer with time. If you’re deep into it, I hope some of these resonate.

Addy will be joining Tim O’Reilly on February 12 for an hour-long deep dive into the lessons he’s learned over his career. They’ll also chat about the progress being made in agentic coding workflows, in a conversation guided by questions from the audience. Save your seat. It’s free. 

Then on March 26, Addy and Tim will be hosting the next event in our AI Codecon series: Software Craftsmanship in the Age of AI. Over four hours, they and a lineup of expert practitioners will explore what it takes to build excellent software in the age of AI that creates value for all participants. It’s also free and open to all. Register here.

If you have a story to share about how you’re using agents to build innovative and effective AI-powered experiences, we want to hear it—and possibly feature it at AI Codecon. Get the details at our call for proposals and send us your proposal by February 17.




How I Built a Full-Stack App in 6 Days with the Help of AI


By the end of 2025, I had read a lot about people building entire apps with AI, going from an idea to a product in just weeks or even days. That made me ask myself: how are they doing this?

I have more than ten years of experience in enterprise backend development. Seeing these fast, AI-built apps sparked my curiosity. I use AI in my day-to-day work, but only for small tasks, nothing close to building a full product.

So I thought, why not try it myself and see if it’s really possible?

I started thinking about what I actually wanted to build, and it quickly became clear that it had to be something I’d use myself. I’m a runner and I work in enterprise, so the idea came pretty naturally: a running app, but with a twist – tracking fitness while mapping it onto a corporate-style career journey.

That’s how RunCorp came to life.

AI as a fast MVP builder and UI scaffold

My main stack is Java and Python, but I deliberately avoided what I know best. I wanted something familiar, yet still a bit uncomfortable. For the backend, I chose Laravel, since I’ve done some freelance work with it. For mobile, I picked Flutter, even though I had never built a mobile app before.

I bought a Claude Code subscription and started building. I quickly learned that you can’t just tell it to do something and expect a good result. That realization led me to throw away the very first thing it generated.

What really works is using plan mode. You let the AI analyze the problem, ask you questions, and create a plan. Only after you feel happy with the plan do you let it generate code. It feels more like collaborating with a teammate than typing commands at a tool.

The downside is that this approach burns through tokens. If you don’t want to pay for extra usage, you have to wait a few hours for the limit to reset. Even so, I managed to build a basic MVP in six days.

I started with just the UI and mocked data. By the end of the first day, I had an app with all the MVP screens in place. You could actually “use” the app – move from screen to screen and see something – even though real functionality didn’t exist yet. Everything ran on mocked data. For this kind of work, AI proved very effective at generating basic Flutter layouts like this:

class HomeScreen extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: Text('RunCorp')),
      body: ListView(
        padding: EdgeInsets.all(16),
        children: [
          Text(
            'Welcome back',
            style: Theme.of(context).textTheme.headlineSmall,
          ),
          SizedBox(height: 16),
          Card(
            child: ListTile(
              title: Text('Current Level'),
              subtitle: Text('Senior Runner'),
            ),
          ),
        ],
      ),
    );
  }
}

Nothing complex, but very fast. This is where AI clearly shines: it scaffolds the UI, wires screens together, and handles repetitive setup.

Outdated libraries and testing challenges

After that, I worked screen by screen, building front-end and back-end functionality together. I followed the same workflow every time: create a plan, answer AI questions, refine the plan, answer again, and generate code only after I felt satisfied with the plan.

Here’s the first big problem for someone without much experience: when you use plan mode, the AI asks questions you might not know how to answer. It asks things like: do you want to use OAuth? How do you want to encrypt passwords? Do you want a separate table for user profiles, or do you want everything in one table? You have to make these decisions early. Otherwise, the AI generates something that either doesn’t work or becomes hard to change later.

One concrete backend issue came up during the Strava integration. When I asked the AI for help with Strava authentication, it suggested an existing Laravel package.

At first glance, the suggestion looked reasonable. But after I checked it more closely, I realized the package hadn’t received updates for several years and didn’t support the current Laravel version. It relied on outdated dependencies and failed in a modern Laravel setup.

This example shows where AI falls short. It suggests known libraries, but it doesn’t judge whether teams still maintain them or whether they fit today’s ecosystem.

Instead of forcing the package into the project, I implemented the Strava OAuth flow directly using Laravel’s HTTP client. The core token exchange looked like this:

$response = Http::asForm()->post('https://www.strava.com/oauth/token', [
    'client_id'     => config('services.strava.client_id'),
    'client_secret' => config('services.strava.client_secret'),
    'code'          => $request->get('code'),
    'grant_type'    => 'authorization_code',
]);

$data = $response->json();

A similar issue appeared on the frontend. The AI generated Flutter UI that looked great on a large simulator. Once I tested it on smaller phones, problems started to show up. Text became barely readable, layouts overflowed, and some screens turned difficult to use. A typical example looked like this:

Text(
  'Weekly Distance',
  style: TextStyle(fontSize: 24),
);

Hardcoded font sizes worked fine on larger screens but failed badly on smaller ones. Fixing this required a shift toward responsive design. I improved the layout by switching to theme-based typography:

Text(
  'Weekly Distance',
  style: Theme.of(context).textTheme.titleMedium,
);

LayoutBuilder(
  builder: (context, constraints) {
    return Text(
      'Weekly Distance',
      style: TextStyle(
        fontSize: constraints.maxWidth < 360 ? 16 : 20,
      ),
    );
  },
);

The UI technically worked before – it just didn’t work well everywhere. Issues like this only show up when you test on real devices.

When AI-generated code meets real users

After six days, I had an app with enough functionality to be tested by someone. I asked people from my running club if they wanted to try it, and they said yes. I sent them a beta build, and 25 of them started using it. That’s when the real problems began to appear.

After just two days, people reported that the home screen took a long time to load and that the stats screen kept crashing. That was the moment I realized this was the end of just talking with AI. I had to do what I’ve been doing for years: actually debug and review all the code.

The AI-generated backend code looked clean. It passed basic tests and worked perfectly with small datasets, so I shipped it. But once real users arrived, reality hit hard. Requests were timing out, the stats and home pages barely loaded on mobile, database CPU spiked, and every request ran over 20 queries. Classic red flags.

These were problems that wouldn’t have made it to production if I had written the code myself from the start. Experience teaches you where bottlenecks usually hide. AI didn’t optimize anything; it just made things work.

Here’s what the AI-generated backend code got wrong all at once:

  • No eager loading, causing N+1 queries
  • Queries running inside loops
  • Loading entire models into memory just to calculate aggregates
  • No caching at all
  • Missing database indexes
  • Recalculating values that were already stored
  • Doing heavy computation in PHP instead of letting the database handle it

None of these were bugs – they were experience problems.

At that point, I had to step in and do what I’ve been doing for years: rewrite the entire statistics pipeline. Some of the key fixes were:

Eager Loading Instead of N+1

$user = $request->user()->load('profile');

Aggressive Response Caching

return Cache::remember("user_stats:{$user->id}", now()->endOfDay(), function () {
    // heavy calculations
});

Query Consolidation (7 Queries → 1)

$stats = DB::table('activities')
    ->where('user_id', $userId)
    ->selectRaw("
        SUM(distance_meters) as total_distance,
        COUNT(DISTINCT DATE(start_at)) as active_days,
        SUM(CASE WHEN start_at >= ? THEN distance_meters ELSE 0 END) as weekly_distance
    ", [$weekStart])
    ->first();

GROUP BY Instead of Query Loops

->groupBy('week_num')

Using Stored Values Instead of Recomputing

$longestStreak = $user->profile->highest_streak ?? 0;

Strategic Indexes

$table->index(['user_id', 'start_at', 'distance_meters']);

The results were immediate

| Metric              | Before         | After     |
|---------------------|----------------|-----------|
| Queries per request | 20+            | 7–9       |
| Response time       | >30s (timeout) | 300–500ms |
| Cached response     | N/A            | 10–50ms   |
| Memory usage        | 200MB+         | 40–60MB   |
| Timeouts            | Constant       | Zero      |

After these changes, the app finally felt usable in real conditions. The home screen loaded instantly, the stats screen didn’t crash, and the database handled multiple users at the same time. Experience made the difference.

I shipped a working app quickly, but I still had to do a lot of work myself. Coding the basics – the tasks any junior developer can handle – is easy, and that’s exactly where AI excels. It can speed up development, but you need to know when to step in, what to change, what to ask, and how to plan everything properly.

Coding isn’t even the hardest part, at least for mobile apps. You can build the app fast, but you must deploy the backend, set up servers, configure domains, and prepare everything for the app stores.

Then comes the part no one talks about: Google Play beta testing, Apple App Store review, waiting for developer accounts and licenses to get approved, fixing small issues reviewers reject, and updating screenshots, descriptions, and privacy policies. None of this is hard, but all of it takes time.

From the moment I started building to the moment the app went public, the process took just over a month, still very fast.

I realized that today AI becomes incredibly powerful in the hands of someone who knows what they’re doing. For experienced engineers, it can provide a huge speed boost, almost like a superpower. AI acts like a team of juniors handling repetitive work while you focus on high-level decisions.

Someone with solid experience can now build projects that used to require a small team. Can a non-technical person build an app with AI? Yes, to some extent. They can create something that looks decent and runs, but most of the time it will start falling apart quickly.

The main takeaway: the basics still matter. Understanding how things work under the hood makes all the difference. AI doesn’t replace experience, but it amplifies it, and that’s where it delivers real value.

The post How I Built a Full-Stack App in 6 Days with the Help of AI appeared first on ShiftMag.


Jonathan Haidt Strikes Again + What You Vibecoded + An Update on the Forkiverse

“If we can’t win on social media, then we definitely can’t win on A.I.,” says Haidt.