Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Writers Are Fleeing the Substack Tax

A growing number of writers are leaving Substack for alternatives most people haven't heard of, like Ghost, Beehiiv, Patreon, and Passport. The reason, writes The Verge's Emma Roth, is the "platform's increased focus on social features as well as a pricing model that puts a chokehold on their business." From the report:

Sean Highkin, the creator of the NBA-focused publication The Rose Garden Report, tells The Verge that he makes "significantly more money" after switching from Substack to Ghost last April. "When I first joined up, [Substack] gave me a big push and featured me and funneled a lot of traffic to me, which led to a good amount of growth," Highkin says. "But once I wasn't one of the 'new recruited talent' they could tout, they stopped featuring me and I saw my growth stagnate." Highkin now pays $2,052 per year using Ghost and an add-on called Outpost, compared to $4,968 per year on Substack. The Rose Garden Report's subscriber base has grown 22 percent since the end of 2024, Highkin says. [...]

Substack launched in 2017 as a platform that lets writers create their own newsletters and manage paying subscribers. Unlike some of its biggest rivals, Substack takes a 10 percent cut of total subscription revenue. That tax may not seem substantial at first, but it adds up quickly as creators gain subscribers and begin charging more for their subscriptions. A calculator on Substack's own website estimates that for a newsletter charging $10 per month with 400 subscribers, the total monthly cost -- including the platform's 10 percent cut and credit card processing fees -- would add up to $636. That cost jumps to $15,900 per month with 10,000 subscribers and skyrockets to $79,500 per month for 50,000 members -- nearly $1 million per year.

Many Substack rivals charge a flat monthly fee rather than a commission. Ghost, an open-source platform for blogs and newsletters, starts at $15 per month with 1,000 members for website creation, email newsletter capabilities, and a custom domain. Beehiiv, a creator platform with tools for launching a newsletter, website, and podcast, is free for up to 2,500 subscribers with limited access to certain features, like a built-in ad network, while its other plans vary in price based on subscriber count. A person with 10,000 subscribers, for example, will pay $96 per month for Beehiiv's "Scale" plan. There's also Kit, a newsletter platform with a tiered pricing model similar to Beehiiv's, costing $116 per month with 10,000 subscribers on its "Creator" plan.

It's not just the 10 percent fee critics are complaining about; they also argue the platform offers limited customization and third-party integrations compared to some of the mentioned alternatives, heavily promotes its own branding and social features, and makes creators more dependent on its ecosystem. Beehiiv founder Tyler Denk argues that creators should be able to build their own brands without the platform taking center stage: "We don't want to take credit for the work of our content creators." While writers can export subscribers, content, and some payment relationships, they cannot take Substack "followers" or Apple-managed iOS billing data with them.
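
The quoted totals are consistent with a simple fee model. Here is a minimal sketch in Python, assuming Stripe-style card processing of 2.9 percent plus 30 cents per charge (the article states only the 10 percent cut; the card rates are an assumption that happens to reproduce the quoted figures):

def monthly_cost(subscribers: int, price: float) -> float:
    # 10% platform cut (stated) plus assumed card fees of 2.9% + $0.30/charge
    revenue = subscribers * price
    platform_cut = 0.10 * revenue
    card_fees = 0.029 * revenue + 0.30 * subscribers
    return platform_cut + card_fees

for subs in (400, 10_000, 50_000):
    print(f"{subs:>6} subscribers -> ${monthly_cost(subs, 10.0):,.0f}/month")
# 400 -> $636, 10,000 -> $15,900, 50,000 -> $79,500 (about $954k/year)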

Read more of this story at Slashdot.


Microsoft starts canceling Claude Code licenses


Microsoft first started opening up access to Claude Code in December, inviting thousands of its own developers to use Anthropic's AI coding tool daily. It was part of an effort to get project managers, designers, and other employees to experiment with coding for the first time, and sources tell me that Claude Code has proved very popular inside Microsoft over the past six months. Perhaps a little too popular, as Microsoft is now preparing to walk back its Claude Code push.

I understand that Microsoft is planning to remove most of its Claude Code licenses and push many of its developers to use Copilot CLI instead. While Claude Code has been …

Read the full story at The Verge.


Microsoft Adds AI Hackathon to VSLive! @ Microsoft HQ

Microsoft announced the VSLive! Microsoft AI Hackathon 2026, an in-person build event at its Redmond campus focused on Microsoft Foundry, Azure OpenAI, GitHub Copilot, and agent-based development.

Defense in depth for autonomous AI agents


Designing Secure Autonomous AI Agents with Defense in Depth

AI agents are moving beyond assistance and into action. Instead of generating content, they invoke tools, modify data, trigger workflows, and operate across systems with increasing autonomy. This shift changes the security problem fundamentally. When an agent can act autonomously, mistakes propagate faster, blast radius increases, and rollback becomes harder.

Security for agentic AI relies on defense in depth. What changes with autonomous agents is where security decisions matter most: as autonomy increases, the center of gravity moves away from the model alone and toward how agents are assembled, constrained, and governed inside real applications. Designing that assembly, constraint, and governance deliberately is what earns you predictable behavior, a controlled blast radius, and the confidence to deploy autonomy in production.

Defense in depth for agentic AI systems

Agentic AI systems are vulnerable to the existing security risks of software systems and introduce new threat classes: agent hijacking, intent breaking, sensitive data leakage, supply chain compromise, and inappropriate reliance. Any weakness in permissions, data protection, or access control that exists today is amplified when an agent is added to the system.

A useful way to reason about agent security is through the following mitigation layers:

  • Model layer: Influences how the agent reasons through training data, fine-tuning, and refusal behaviors.
  • Safety system layer: Provides runtime protections such as content filtering, guardrails, logging, and observability.
  • Application layer: Defines what the agent can do and how it does it through application architecture, permissions, workflows, and escalation paths.
  • Positioning layer: Shapes how the system is presented to users through transparency documentation and UX disclosure.

Each layer reinforces the others, and no single layer is sufficient on its own. The model layer is probabilistic by nature. The safety system layer observes and intervenes at runtime. The positioning layer shapes perception. But for organizations building agentic AI applications, the application layer is the decisive one because it is the only layer builders fully control.  The application layer translates probabilistic model behavior into deterministic system outcomes. This is also where customers turn generic components into differentiated systems: two organizations can start with the same model and tools and end up with very different security outcomes depending on how they constrain agent behavior at this layer.

Why the application layer matters most when building agentic AI applications

Most organizations build agentic AI applications by combining off-the-shelf models, tools, and business data into systems that perform specific tasks. The application layer is where they decide which actions an agent is allowed to take, which tools and data it can access, how permissions are scoped and enforced, how failures are handled, and when humans must be involved.

Getting these decisions right requires thinking through several specific design patterns. Each one addresses a distinct failure mode. Together, they form the practical expression of defense in depth at the application layer.

Here are some recommended design patterns for building a more resilient application layer for your agents.

Pattern 1: Design agents like microservices

The most consequential application layer decision is action scope: how broadly you define an agent’s responsibilities. A common and dangerous failure mode is the “everything agent,” a single agent with broad permissions, many tools, and loosely defined responsibilities. Every additional tool expands the attack surface. Every ambiguous instruction increases the risk of error or task drift. As autonomy and tools increase, these risks compound quickly.

A more resilient approach is to design agents the way distributed systems have been designed for decades: as carefully scoped components with bounded capabilities. Agents should have isolated permissions, clear interfaces, and narrow responsibilities. More complex behaviors emerge from orchestration rather than from granting a single agent broad authority. Building agents like microservices, with constrained responsibilities and scoped permissions by design, is one of the most effective structural controls available at the application layer.
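
As a rough illustration of this microservice-style scoping (a sketch with hypothetical names, not code from this post), each agent declares one narrow responsibility and an explicit tool list, and anything broader has to come from orchestration:

from dataclasses import dataclass

@dataclass(frozen=True)
class AgentSpec:
    # One narrowly scoped agent: a single responsibility, an explicit tool list.
    name: str
    responsibility: str
    allowed_tools: frozenset  # nothing outside this set is callable

# Two small agents composed by an orchestrator, instead of one "everything agent".
triage = AgentSpec("ticket-triage", "classify incoming support tickets",
                   frozenset({"read_ticket", "set_label"}))
refunds = AgentSpec("refund-drafter", "draft (not execute) refund proposals",
                    frozenset({"read_order", "create_draft_refund"}))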

Pattern 2: Least permissions

Bounded scope defines what an agent is responsible for. Progressive permissioning governs what actions are permitted within that scope. As a rule, permissions should always start at zero (“zero trust”).

For safe design, no actions should be permitted by default. Actions are enabled explicitly, based on role and system needs. Least-privilege and zero-access principles apply to agents just as they do to human users.

Permissions granted loosely at design time become exploitable surfaces at runtime.

In practice, this means every tool call, data access, and external integration an agent can invoke should be the result of a deliberate authorization decision, not an implicit one. The question is not “should we restrict this?” but “have we explicitly permitted this?”

The general rule is to scope capabilities to the duration of a specific task. If task-based limits aren’t feasible, implement time-based limits. Task-focused permissions are preferred because they naturally “expire” when the task completes; temporal permissions help limit blast radius.
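
A minimal sketch of what deny-by-default, task-scoped permissioning can look like in application code (hypothetical names; a real permission store and policy engine will differ):

import time

class TaskPermissions:
    # Deny by default: an agent's permission set starts at zero ("zero trust").
    def __init__(self, task_id: str, ttl_seconds: float | None = None):
        self.task_id = task_id
        self._granted: set[str] = set()
        self._expires = time.time() + ttl_seconds if ttl_seconds else None

    def grant(self, action: str) -> None:
        # Every permitted action is a deliberate, explicit authorization decision.
        self._granted.add(action)

    def is_allowed(self, action: str) -> bool:
        # Time-based limit as a fallback when task-scoped expiry isn't feasible.
        if self._expires is not None and time.time() > self._expires:
            return False
        # Not "should we restrict this?" but "have we explicitly permitted this?"
        return action in self._granted

perms = TaskPermissions("task-42", ttl_seconds=300)
perms.grant("read_ticket")
assert perms.is_allowed("read_ticket")
assert not perms.is_allowed("delete_records")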

Pattern 3: Deterministic human-in-the-loop design

Even well-scoped, well-permissioned agents need a governance backstop for high-stakes decisions. Human-in-the-loop (HITL) review is often discussed as a trust mechanism: a way to keep humans informed. In agentic systems, it is better understood as a governance mechanism: a structural control that prevents agents from self-authorizing consequential actions.

The critical design mistake here is letting the model decide when human review is required. If escalation is left to probabilistic reasoning, an adversarial prompt or an ambiguous instruction can bypass review entirely. A model that reasons its way out of escalating is exhibiting exactly the behavior the escalation mechanism was supposed to catch.

In secure agentic systems:

  • HITL review is enforced deterministically by the application layer or orchestrator, not delegated to the model.
  • Escalation triggers are defined in code, and the orchestrator enforces them.
  • Intervention can occur mid-execution — including during tool calls — rather than only before or after an action completes.

This design removes ambiguity about when review is required, supports auditability for oversight and compliance, and ensures that as agents move toward greater autonomy, the separation between reasoning and enforcement remains intact.
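
A minimal sketch of a deterministic gate (illustrative only; the action names and threshold are hypothetical). The check runs in the orchestrator's code path before the tool call, so the model never gets to decide whether review happens:

HIGH_STAKES = {"issue_refund", "delete_records", "send_external_email"}

def run_tool(action: str, args: dict) -> str:
    return f"executed {action}"  # stand-in for real tool dispatch

def execute(action: str, args: dict, approve) -> str:
    # Escalation triggers live in code, not in the prompt, so an adversarial
    # input cannot reason its way past them.
    if action in HIGH_STAKES or args.get("amount", 0) > 500:
        if not approve(action, args):  # blocks mid-execution, before the call
            return "blocked: human review declined"
    return run_tool(action, args)

# A refund over the threshold cannot proceed without an explicit human yes.
print(execute("issue_refund", {"amount": 900}, approve=lambda a, kw: False))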

Pattern 4: Agent identity as a security primitive

It is an unfortunate reality that human users are routinely over-permissioned (“give them access to everything”). To implement Pattern 1 (agents as microservices) and Pattern 2 (least permissions), agents must never share the user’s identity. This sounds obvious, but it requires deliberate design: when an action is taken, you need to know whether it was executed by the user, by the agent acting on its own behalf, or by the agent acting on the user’s behalf. Each agent must be assigned a unique, verifiable identity, which allows explicit and narrowly scoped permissions, lifecycle controls, and accountability.

Agent identity enables least-privilege enforcement, because you cannot scope permissions to a specific agent if you cannot distinguish that agent from other agents or from a human user. It also enables lifecycle governance, because access can be granted, rotated, and revoked per agent rather than in bulk across every agent that shares a credential. Finally, separate agent identity enables meaningful observability, because actions can be traced back to a specific agent rather than being attributed vaguely to “the system.”

 As enterprises manage agent sprawl (with more agents, more deployments, and even more integrations), identity clarity becomes operationally critical. Identity is not a feature you add later. It is a prerequisite for operating autonomous agents responsibly at scale, and it ties together every other application layer pattern: permissioning, escalation, and logging all depend on knowing which agent is acting.
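
A minimal sketch of agent identity as a primitive (hypothetical types): every action is attributed to a distinct principal, which is what makes per-agent permissioning, revocation, and audit trails possible:

import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class Principal:
    # Who acted: a user, an agent on its own behalf, or an agent for a user.
    id: str
    kind: str                       # "user" or "agent"
    on_behalf_of: str | None = None # set when an agent acts for a specific user

def audit_line(p: Principal, action: str) -> str:
    actor = f"{p.kind}:{p.id}"
    if p.on_behalf_of:
        actor += f" (on behalf of user:{p.on_behalf_of})"
    return f"{actor} performed {action}"

agent = Principal(id=str(uuid.uuid4()), kind="agent", on_behalf_of="alice")
print(audit_line(agent, "read_calendar"))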

How the Other Layers Reinforce Application-Layer Design

Focusing on the application layer does not diminish the importance of the other layers. Instead, it clarifies their roles.

  • The model layer – the model chosen to power the application – shapes how an agent reasons, but remains probabilistic. It can be tuned toward safer behavior, but it cannot guarantee it.
  • The safety system layer – platform tools like content filters and groundedness detection – compensates for what models alone cannot prevent: it detects anomalies, filters harmful outputs, and gives teams the observability they need to respond when something goes wrong.
  • The positioning layer – how the UI and UX explain that AI is in use, what it can do, and what it cannot do – shapes user expectations and supports appropriate reliance.

Each layer addresses failure modes the others cannot fully cover. A strong safety system cannot compensate for an agent with unlimited scope. A well-tuned model cannot substitute for deterministic escalation triggers. The application layer is where the load-bearing decisions are made. The other layers make those decisions more resilient.

Designing for Secure Autonomy

The four patterns described here — agents as microservices, least permissions, deterministic human-in-the-loop design, and agent identity — are mutually reinforcing. Scope containment limits blast radius. Permissioning limits what a contained agent can do. Deterministic escalation ensures that neither scope nor permissions can be circumvented by adversarial input. Identity makes all of it auditable.

The application layer is where customers have the most power to shape how their agent behaves. It is where off‑the‑shelf models become real agentic AI applications. It is where security decisions shape both business value and risk. Defense in depth remains the right strategy. As agents take on more responsibility, the application layer becomes the place where that strategy succeeds or fails.

As organizations deploy more agentic AI systems, the question is not whether agents will make mistakes. They already have and will continue to. The question is whether those mistakes are minimized, identified, and contained. Secure autonomous agentic AI systems are achieved by designing systems where autonomy is bounded by architecture, permissions, identity, and deterministic oversight from the start.

To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us on LinkedIn (Microsoft Security) and X (@MSFTSecurity) for the latest news and updates on cybersecurity.

The post Defense in depth for autonomous AI agents appeared first on Microsoft Security Blog.


Conductor: Deterministic orchestration for multi-agent AI workflows


Multi-agent AI systems are becoming the default approach for complex tasks:

  • Code review pipelines.
  • Research-then-synthesize workflows.
  • Plan-then-implement loops.

These aren’t single-prompt problems. They need multiple specialized agents coordinating in sequence, in parallel, and sometimes in cycles.

Most frameworks approach this by making the orchestrator itself an LLM—an agent that dynamically plans which agents to call, in what order, and with what inputs. That works when the task is exploratory. But for workflows with known structure (and in practice, many of the most useful workflows do have known structure), dynamic orchestration adds cost, latency, and unpredictability that can work against you.

Conductor is an open-source CLI (MIT license, Microsoft org) that takes a different approach: you define your multi-agent workflows in YAML, and the routing between agents is deterministic. Jinja2 templates and expression evaluation handle conditions and branching. The orchestration layer consumes zero tokens. The structure is fixed at definition time—and that’s the point.

The problem with multi-agent workflows today

We kept building multi-agent workflows—code review pipelines, design document generation, research assistants—and writing the same glue code every time: Python scripts stitching prompt chains, ad hoc retries, manual state between steps, no good way to version-control the workflow itself.

We looked at other tools, such as Microsoft Agent Framework (MAF), Microsoft’s primary SDK for building agents in code, which covers many of the same primitives. Conductor is a different surface for similar patterns: a YAML-first CLI for teams who want to compose agents and tools without writing SDK code. Declared, diffable, and as readable as a CI/CD pipeline.

We also wanted to separate concerns that keep getting mashed together in multi-agent systems:

  • Orchestration should be deterministic and inspectable. Not an LLM making routing decisions.
  • Execution should support multiple providers and models, so you can put a cheap model on triage and a capable one on reasoning.
  • Context flow between agents should be explicit. No implicit conversation bleeding.
  • Human oversight should be a built-in workflow step, not something you bolt on later.

Conductor is the result: YAML workflows, isolated agents, and a routing graph you can see before anything runs.

Key capabilities of Conductor

YAML-defined workflows

Every Conductor workflow is a YAML file that declares agents, their prompts, models, inputs, outputs, and routing logic. Workflows are version-controlled, diffable, and reviewable, the same way you’d treat infrastructure-as-code or CI/CD pipelines.

workflow: 
  name: design-review 
  entry_point: architect 
 
agents: 
  - name: architect 
    model: claude-opus-4.6-1m 
    prompt: | 
      Create a design document for: {{ workflow.input.purpose }} 
    output: 
      file_path: { type: string } 
    routes: 
      - to: reviewer 
 
  - name: reviewer 
    model: claude-opus-4.7 
    prompt: | 
      Review the design at {{ architect.output.file_path }} 
    output: 
      score: { type: number } 
      approved: { type: boolean } 
    routes: 
      - to: $end 
        when: "{{ output.approved }}" 
      - to: architect 

Deterministic routing, zero token overhead

Routing between agents uses Jinja2 templates and expression evaluation. First matching condition wins. A workflow can loop hundreds of times through an evaluator-optimizer cycle without the routing layer consuming any tokens. This is what separates Conductor from dynamic orchestration: the workflow topology is declared, not discovered at runtime.
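
As a sketch of the first-match semantics (our illustration, not Conductor's internals; assumes the jinja2 package), each route's when expression is rendered against the step's context and the first truthy result wins; a route with no when always matches:

from jinja2 import Template

def is_truthy(rendered: str) -> bool:
    return rendered.strip().lower() not in ("", "false", "none", "0")

def pick_route(routes: list, context: dict) -> str:
    # First matching condition wins; no LLM call, no tokens consumed.
    for route in routes:
        cond = route.get("when")
        if cond is None or is_truthy(Template(cond).render(**context)):
            return route["to"]
    raise ValueError("no route matched")

routes = [{"to": "$end", "when": "{{ output.approved }}"}, {"to": "architect"}]
print(pick_route(routes, {"output": {"approved": False}}))  # -> architect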

Mix providers and models per agent

Conductor supports GitHub Copilot and Anthropic Claude as providers, with per-agent model overrides. You can mix them in a single workflow: run claude-haiku-4.5 for classification, gpt-5.2 for research with MCP tool access, and claude-opus-4.6-1m for complex reasoning. Each agent gets its own session with no shared conversation state.

Parallel execution

Static parallel groups run multiple agents concurrently with configurable failure modes (fail_fast, continue_on_error, all_or_nothing). Dynamic for-each groups process variable-length arrays in parallel with batched concurrency. Results are aggregated and available to downstream agents through template expressions.

parallel:
  - name: researchers
    agents: [academic, web, technical]
    failure_mode: continue_on_error
    routes:
      - to: synthesizer

Script steps

Not every step needs an LLM. Script steps run shell commands directly, capturing stdout, stderr, and exit codes into the workflow context. A code review workflow can run pytest between the “implement” and “review” steps. Routes can branch on exit codes. No model invocation, no token cost.
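
A rough sketch of the script-step idea in plain Python (our illustration, not Conductor's implementation; assumes pytest is on PATH):

import subprocess

# Run a shell command, capture stdout/stderr/exit code into the step context,
# and branch on the exit code -- no model invocation, no token cost.
result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
step_context = {
    "stdout": result.stdout,
    "stderr": result.stderr,
    "exit_code": result.returncode,
}
next_step = "review" if step_context["exit_code"] == 0 else "implement"
print(next_step)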

Human gates

Human gate steps pause execution, present options in a Rich terminal UI or the web dashboard, and route based on the response. Approval workflows, review checkpoints, interactive decision points: they’re part of the workflow graph, defined the same way as any other step.

Web dashboard

Conductor includes a web dashboard that visualizes execution in real time. An interactive DAG shows the workflow topology with animated edges for execution flow. Each node is clickable, showing the agent’s prompt, model, token usage, cost, activity stream, and output. Human gates work directly in the browser. Background mode (--web-bg) starts the dashboard, prints the URL, and returns control to the terminal.

Context control

Three context modes control what each agent sees: accumulate (all prior outputs), last_only (just the previous step), and explicit (only named dependencies). The default is accumulate, but for larger workflows, explicit mode cuts token consumption significantly. Being deliberate about what each agent sees turned out to matter more than we expected.
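
The three modes boil down to a simple selection over prior step outputs; a sketch of the semantics (illustrative, not Conductor's code):

def build_context(mode: str, history: list, deps: list) -> list:
    # history: prior step outputs in order, e.g. {"name": "architect", "output": ...}
    if mode == "accumulate":
        return history                # all prior outputs (the default)
    if mode == "last_only":
        return history[-1:]           # just the previous step
    if mode == "explicit":
        return [h for h in history if h["name"] in deps]  # named deps only
    raise ValueError(f"unknown context mode: {mode}")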

Plugins and workflow registries

Plugins follow the Agent Skills open standard, bundling reusable skills and MCP server configurations that agents can use. Reference them from Git repos or local paths. Workflow registries let teams share and version workflows: configure a registry once, then run workflows by short name.

Safety limits

Max iteration limits and wall-clock timeouts prevent runaway execution. Dry-run mode previews the execution plan without calling any models. conductor validate catches schema errors, missing references, and unreachable agents before anything runs.

Works with your existing tools

Conductor doesn’t replace your editor, CI system, or agent framework. It’s a CLI that reads YAML, calls models, and produces structured output. It plugs into what you already have:

  • MCP servers give agents tool access: web search, documentation lookup, code analysis, anything with a Model Context Protocol server.
  • Shell commands run directly as workflow steps, so your scripts, linters, test suites, and build tools participate without modification.
  • Structured output with JSON schemas means downstream tools and scripts can consume results programmatically.
  • A Claude skill ships in the repository. Point your coding agent at it and it can build workflows for you.

What we learned

1. Determinism is a feature

The most common pushback is “what about dynamic orchestration?” Fair question. If your task needs to restructure itself based on what it discovers, let the LLM decide what comes next. But the workflows we keep reaching for (review loops, research pipelines, plan-then-implement) have known structure. We’d rather have predictability, cost control, and auditability than replanning flexibility. Conditional routing and loop-back patterns cover more ground than you’d expect.

2. Agent isolation pays for itself

Each agent gets its own session, system prompt, model, provider, and temperature. No shared conversation bleeding. This seems like overhead until you’re debugging a workflow where step 4 is mysteriously influenced by step 2’s output. Explicit context flow makes multi-agent systems tractable.

3. Events over logs

The engine uses a pub/sub event system for all output. The terminal renderer, web dashboard, and any future consumers subscribe independently. More work upfront than printing to stdout, but it decoupled the execution engine from the presentation layer in a way that keeps paying off. Adding the web dashboard required zero changes to the workflow engine.

4. YAML is the right level of abstraction

We considered Python APIs, JSON schemas, and visual builders. YAML hit the sweet spot: readable, structured, diffable in pull requests, and familiar to anyone who’s written a GitHub Actions workflow or a Kubernetes manifest.

Open source and ready to use

MIT-licensed, developed in the open from day one.

  • pytest test suite covering the engine, CLI, config validation, providers, and integration scenarios.
  • Ruff for linting and formatting, ty for type checking, both enforced in CI.
  • Runs on macOS, Linux, and Windows.
  • One-line installers (curl | sh and irm | iex) with SHA-256 checksum verification.
  • Self-update via conductor update.

Contributions welcome: provider integrations, workflow examples, plugins, docs, bug reports.

How to start using Conductor today

Install:

# macOS / Linux
curl -sSfL https://aka.ms/conductor/install.sh | sh

# Windows (PowerShell)
irm https://aka.ms/conductor/install.ps1 | iex

Run your first workflow:

conductor run workflow.yaml --input question="What is Python?"

Visualize it:

conductor run workflow.yaml --web --input topic="AI in healthcare"

Conductor requires Python 3.12+ and works with GitHub Copilot or Anthropic Claude. The repository has documentation, example workflows, and a getting-started guide.

Multi-agent workflows are becoming infrastructure: repeatable, versioned, shared across teams. We chose deterministic orchestration because for the workflows we build most often, known structure is the whole point.


Start using Conductor

If you’re stitching together agent pipelines with glue code, give Conductor a look.

Conductor is open source under the MIT license at github.com/microsoft/conductor.

The post Conductor: Deterministic orchestration for multi-agent AI workflows appeared first on Microsoft Open Source Blog.


Generative AI in the Real World: Chang She on Data Infrastructure for AI


As a pandas core contributor and early Parquet adopter who built AI data pipelines at streaming company Tubi TV, Chang She saw firsthand why the traditional data stack breaks down for AI workloads—and founded LanceDB to fix it. Chang joined Ben Lorica to explain why vector databases are too narrow a solution for modern AI data needs, and what a true multimodal data infrastructure actually looks like. Chang and Ben get into why the Lance file format is quickly becoming the open source standard for multimodal data, how the rise of agents is exploding data infrastructure demands, why open-weight models are the enterprise cost shift to watch in the next 12 months, and more. “Trillion is the new billion,” Chang says, and the enterprises that set up their data infrastructure now for that scale will be the ones that succeed.

About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2026, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

Check out other episodes of this podcast on the O’Reilly learning platform or follow us on YouTube, Spotify, Apple, or wherever you get your podcasts.

Transcript

This transcript was created with the help of AI and has been lightly edited for clarity.

00.35
All right, so today we have Chang She, CEO and cofounder of LanceDB, which you can find at lancedb.com. Tagline is “Build better models faster.” So Chang, welcome to the podcast.

00.49
Hey Ben, super excited to be here.

00.52
All right, we’ll jump into the core topics, but a bit of a background there for our listeners who may not be familiar with you. You worked on pandas—you were a core member of the pandas team. You were very early on with Parquet as well. And at some point, you became convinced that for AI workloads, these former tools that you worked on—Parquet, pandas—were not enough. So what was the moment of realization for you that these traditional tools that were foundational for analytics were lacking?

01.33
Absolutely. So I worked at a company called Tubi TV, which was video on-demand and streaming. So movies and TV. And it was there that I ended up dealing with a lot of I guess what I would call AI data. So we had to have embeddings for personalization, video assets, image assets, audio, text for subtitles and all of those things. All of those did not really fit into the traditional data stack—you know, pandas, Spark, Parquet, and even Arrow. So that was sort of the inspiration for me to start LanceDB.

02.15
And Chang, at this point, do you think that more people are aware of this disconnect between those tools and the kinds of tools they’ll need moving forward?

02.30
When I talk to data infrastructure folks who are building and managing that stack for dealing with this kind of data, there’s broad recognition that something has to be done, that the existing stack is just not sufficient to deal with this data. And what’s more interesting is that this data is also becoming a lot more valuable because of AI.

02.52
So obviously, before you came on the scene, there was this wave of vector stores or vector databases which were optimized for retrieval. So let’s say I’m a listener and all I have is text. Do I need anything beyond the vector database?

03.17
Even if you just have text and you just have text embeddings, the creation of those embeddings and then the management of all of those data assets—your metadata, the actual documents, how to serve that—a lot of that falls outside the purview of a vector database. The vector databases tend to be very narrow solutions for a very narrow problem, whereas something like LanceDB takes a broader view of, “When you have AI data, what are all the things you need to do to it throughout that life cycle of application development or model development? And how do we build a tool and a system that allows you to simplify your life by having one system to do all of the major workloads throughout that life cycle?”

04.13
And by the way, for our listeners, there’s LanceDB and then there’s the open Lance file format, and I wanna ask you about this file format in a second, but you mentioned something about vector databases and you were kind of saying that, you know, they’re not great at creating the embeddings. But Chang, the vector database people, they never really positioned themselves as responsible for creating the embeddings, right? So they just assume that you’ll show up with embeddings.

04.47
That’s right. But even if you take that narrow view, what we find in enterprises today is a lot of folks have an offline generation process in the data lake itself, where they chunk up the documents, then they generate the embeddings, then they have what they call an offline store, then they have to copy-paste that data into a vector database for serving. So there’s a lot of data syncing [and] data movement, so it creates expense and there’s a lot of complexity.

And so that’s the. . . Even for just text-based workloads, even just for pure vector search, that tends to be a big pain point. And then two is vector databases, a lot of times, don’t pay as much attention to the overall retrieval stack, right? If you remember, the task for users is I want to find the right data in my dataset, and vector search is just one technique. You have many different kinds of techniques, full-text search, or even just outside of search. You might have SQL queries that you want to run, filters, regexes, all of that goes into a rich and very accurate retrieval process. And vector databases, in general, do not expand beyond just that simple semantic or vector search.

06.10
So I mentioned the Lance open file format, which. . . I guess the shortcut that people use is like Parquet for AI, but it’s actually both a file and table format. So maybe give our listeners, Chang, a high-level description of the Lance format and why it’s become so popular.

06.33
Lance is what we call a lakehouse format. It is quickly becoming the new open source standard for multimodal data. And what I mean by a lakehouse format is that it spans a couple of different layers. So you mentioned in the beginning a file format. So this is the equivalent in the stack to Parquet, where we would talk about “How do we lay out the data in a particular file?” And at this layer, the innovation in Lance is that it is really, really good for random access without sacrificing any speed and scans. And our files are actually smaller than Parquet for many AI datasets.

The next layer is usually what we call a table format that is occupied by projects like Iceberg and Delta and Hudi today. And [the] Lance format comes in at this layer. We have much better designs, more optimizations for machine learning experimentation, so doing backfills easily, doing two-dimensional data evolution, being able to handle really large blob data like videos and images, and then just being able to do a branching strategy that supports true sort of Git for data semantics that takes the best of Parquet and Iceberg. 

And then finally, there’s a third layer, which is about indexing so that you can have fast scans, fast searches, fast queries. So when you put all that together, that’s what we call the Lance lakehouse format.

08.11
I described Lance as open. Can you kind of clarify what that means, because I actually don’t know?

08.19
Number one is Lance format is open source. It’s Apache 2.0 license. You can find it on our GitHub. We have community governance; [we] have PMCs that are from lots of external contributors. And then I think beyond that, there’s open source and there’s open source, right? I think what Lance format is designed for is a true open architecture as well. So not only is it open source; it also plays really well into the rest of the data ecosystem. 

So for example, when people compare us to Parquet and Iceberg, well, we’re not designed as a head-to-head competitor with Parquet and Iceberg. We will slot into the same Polaris data catalog, or you can have one unified view on all of your datasets, but then under the hood it can be Parquet/Iceberg for BI data and Lance for your AI data. And then Lance itself plugs in natively to Spark and pandas and Polars and DuckDB and any sort of open data tooling that you’re already used to.

09.31
So operationally then, Chang, if I’m a data architect, should I think of Lance as, “OK, so I have Parquet and these table formats like Delta and Iceberg for my structured data. And then if it’s nonstructured, which could mean video, audio, and also text, right? So then I have to bring in this other format, Lance.” Is that operationally what happens in practice?

10.07
Yeah, often what the data infra folks and data engineers we talk to interact with is the tooling, right? So they’re looking at their data pipelines, they’re looking at maybe their Spark jobs or their search applications, and then those are the jobs that actually interact with the underlying storage, for example. And so instead of. . . 

And that data transfer process is actually really easy through Apache Arrow. And most of the time, it’s really just one line of code change. It’s the same Spark code, for example. Instead of writing to Parquet, you’re writing to Lance. And it simplifies your overall data pipeline by bringing all of your tabular data and metadata along with your multimodal data all in the same place and also embeddings.

11.05
And then in terms of workload, you alluded to the fact that the previous-generation vector source, they excelled at something very specific, maybe retrieval. So is Lance equally specialized in the sense that, “All right, Lance is great for X, and X might be, I don’t know, analytics, but it doesn’t excel in other things”? Describe the kinds of workloads that teams that are using Lance are using.

11.39
So very high-level, the summary is LanceDB, our enterprise data platform, excels at helping our customers manage really large-scale AI data. So embeddings for search, adding new features, extracting new columns, enriching their dataset, doing data curation and exploration, and then feeding that to GPUs really quickly for distributed training jobs so that they can get as high GPU utilization and as high FLOPs utilization as they can.

12.20
You’ve used the word multimodal a few times, and I’ve always been a proponent of people really making sure that their data infrastructure is positioned for this multimodal world. But sometimes I question this assumption in the following sense, right? Is multimodality a Bay Area bubble thing? In other words, if I go to the East Coast and talk to, I don’t know, Goldman Sachs or an insurance company, are they still grappling with legacy systems that are mostly structured data? What they want to do is be able to do all this fancy AI stuff now with agents, but still using the old-school data that they have.

13.12
I think when we talk about multimodal data, a lot of times what comes to mind first is video generation, image generation, all of those. Self-driving cars. . . So there’s a lot of high-tech, cutting-edge applications that are multimodal. But I think if you look at more traditional enterprises, they already have a lot of multimodal data. 

So you just mentioned insurance: They have millions of documents and PDFs and contracts lying around. Insurance especially will have top-down views of houses and boundaries so that they can figure out and assess risk a little bit better. The way I think about it is before AI, it’s just really hard to get value out of that data. They just really haven’t paid as much attention.

So it’s kind of like when I clean up my house, what I like to do is just like move all the mess into a back room or storage. And so then I don’t have to think about it, right? My wife yells at me all the time. She opens up the storage and everything kind of falls out. And so I feel like with multimodal data, this is kind of what traditional enterprises have done: They didn’t know what to do with it. They stuck it in some directory in SharePoint or something like that and kind of just like leave it there for storage. But there’s actually a tremendous amount of value and AI is helping them unlock all of that. So I think in the next few years, especially, we’re going to see a lot more attention paid to, “If we can get a lot more value out of this data, how do we actually manage it? How do we work with it? And how do we combine it with the rest of our data stack so that it’s governed within a single entity?”

15.06
The hot thing a few years ago in data infrastructure was the lakehouse, right? Great term we introduced. [laughs]

15.18
I wonder who came up with that one. [laughs]

15.22
Yeah. So you folks are starting to use the term multimodal lakehouse. So compare the status of the lakehouse. . . [The term] is I think now widely used, right? And then now you’re introducing the multimodal lakehouse. So where is the multimodal lakehouse now kind of mature, and where does it still need to do some work?

15.50
Just for the audience who’s not as familiar, the really, really simplified way I think about just a lakehouse is you have all your data in one place in the data lake, and then you have a combined data warehousing layer on top that provides structure, tables, and structured ways to run workloads on all of that data. 

Now, the way we think about multimodal lakehouse is in a couple of different ways. One, the data changes so that you go from purely tabular data or maybe like clickstream data to now all sorts of multimodal data. So from embeddings to all of your multimedia types. So that changes a lot about how you can read and write data efficiently, how you manage that, how you synchronize that with metadata.

Number two is the workloads also are multimodal. You’re not just thinking about running SQL and analytics workloads. You’re now thinking about search. Now you’re thinking about training. Now you’re thinking about feature engineering and “How does your lakehouse interact with GPU clusters?” and all of those things that traditional lakehouses are not very good at.

And then I think the third layer, where the meaning “multimodal” comes in, is traditional lakehouses tend to be good only at batch offline processing. And then if you want to do serving, online processing, you probably need to introduce a sort of an OLTP kind of database or some system that’s primarily for serving. Well, with LanceDB, because of the innovations in the format, you can actually do both at the same time. So the online-offline scenario can also become multimodal in this sense.

17.44
So if I understand what you’re saying, you’re multimodal in multiple senses. So multimodal data types, multimodal workloads, and multimodal kinds of operations. So right now, in the Databricks world, they have—I don’t think they used the word multimodal. If anything, they go back to that HTAP kind of thing, so [a] hybrid transactional analytics kind of processing engine. I think through an acquisition, now they are very good at Postgres. I forget what they call this. [Chang: A lakebase.] So they have the transactions, and they have the analytics. So what you’re saying is that your vision of the multimodal lakehouse has that hybrid transactional analytics, multimodal types of data, and then multimodal workloads. Is that a fair summation? Surely, Chang, certain aspects of what you just described are more fleshed out than others, right? So what areas do you anticipate you folks will be working on hard, in terms of multiple notions of multimodality?

19.16
Number one is actually scale. Scale is actually the biggest driving factor late last year and this year. And a lot of that has been the rise of agents. Because of the rise of agents, data volume and scale, query throughput and scale, and performance and latency requirements, all of those things have just kind of exploded. And that’s the thing that we find we’re uniquely suited for. And that’s something that we’re pushing a lot on. Oftentimes when we talk to customers, really what we think about is, like, trillion is the new billion. And we have folks who probably are operating at a thousand times the scale that they were just a year ago or two years ago.

20.22
I guess the hack that people will do for some of these things, Chang, is just let’s put the files in S3 and then use a database somehow. So are you still seeing a lot of people kind of try to do this?

20.39
Yeah, I mean, I think there are a few attempts that [are] doing that. And I think there’s generally a trend because of the data scale, like object storage is kind of the only sort of cost effective and scalable storage backend for a lot of these newer data storage systems. I think where the challenge lies for data infrastructure providers is “How do you actually have scalability and high performance and maintain the cost advantages of S3 and object store?” That is, I think, the difficult challenge. And so we actually have a recent blog article talking about how we do that at 10 billion-vector scale.

At smaller scales, that’s actually really easy. You just slurp up all the data from S3 into some caching system. You can serve it from there in any in-memory system. That’s a really easy problem. There’s tons of open source projects, Lance, for example, that can help you do that pretty effectively. And then the challenge is really at scale. If you have 10 billion vectors, pretty much, your only cost-effective solution is to store that on object storage. Then, you know, imagine the query times if you were just targeting S3 directly. So then indexing challenges and search and caching and all of that, that becomes a big distributed systems problem. So that’s what we solve.

22.16
Like you said, many data engineering and data infrastructure teams are trying to think through, “So what does our infrastructure look like in a world of agents?” right? So imagine—this isn’t happening yet—the equivalent of OpenClaw in enterprise, where a single employee might have 10 of these AI delegates or AI assistants. Some of the things that come up: One, identity management, so access control, identity management. Secondly, maybe some of these AI agents and AI delegates don’t really need anything permanent. They just want something ephemeral. So stand up a LanceDB for a minute and then make it go away. Are these some of the things that you are starting to think of?

23.14
Yeah, so for our cutting-edge customers, that’s already the reality. We specialize a lot in infrastructure for model training, for example. So if you think about features, like a researcher might have, “Hey, I have a feature idea. There’s two input features, each with 10 variants. And then I have some output feature that combines the two.” Well, now I’ve got 100 different variants. So before, there was a limited [number] of variants that I can test as an individual researcher manually. But now I can use agents to run all of that automatically. And I can just go to sleep and it’ll run. Well, now humans can go to sleep, but then the agents are presenting a lot of load on the underlying data infrastructure. This year we’re talking about going from hundreds of queries per second from plain RAG a couple of years ago to a hundred thousand queries per second in this land of agents. 

And then when it comes to security and compliance, there’s a lot of churn in the stack about sandboxing and ephemeral systems. And when we talk about object storage, this is actually a big, even a bigger challenge, right? So if your source of truth is on object store, that’s actually the only way you can make this ephemeral workload work out well so that when you have hot data, you cache it, you serve it for a time, and then that can go away. And then the cache can expire it [to] be replaced by the next hot workload. And you can do that without having to pay for really expensive memory and NVMe for all of your data.

25.04
So the other thing, Chang, that comes up with agents right now, the hot thing that it seems like there’s a gazillion people working on is this notion of memory. So I guess my question to you is, if I have a bunch of agents and then I have a multimodal lakehouse. . . I have a lakehouse and now I have memories. So I have three different systems that I have to maintain. What’s your what’s your guys’ take in terms of agent memory?

25.42
LanceDB open source is actually the main memory plug-in for OpenClaw and a number of other agents like Crew AI, for example. And for a lot of these agent frameworks and harnesses, there’s a couple of different requirements. Number one is just lightweight, super easy to use. LanceDB is the only one where it supports hybrid search; it supports reranking, all these fairly sophisticated retrieval mechanisms, without having to maintain a service.

26.20
Before you continue. . . All right, so this notion of lightweight, right? On the one hand, there’s the notion of multimodal lakehouse and a lakehouse is never lightweight, right? But then, it seems like you folks are positioning yourself also in the DuckDB kind of very lightweight SQLite world. Can you clarify what you mean by lightweight when you are supposedly a lakehouse, right?

26.49
So what I mean by lightweight here is that if you think about it from an agent perspective, it simplifies a lot of things if you don’t have to connect to another service and talk to another system in order to get access to your memory and to retrieve from memory. So that’s what I mean. So the open source, the. . .

27.15
But then you’re large-scale infrastructure. . . So then if I’m a lightweight agent, how can you… This is where I guess I’m a bit confused. Can you clarify, why am I bringing along a big piece of infrastructure if I’m a lightweight agent?

27.37
Right. LanceDB open source is actually very lightweight. So there’s no heavy infrastructure involved. This is why it’s perfect for memory. Because a lot of times, memory is very ephemeral. So you just interact with a session, and then when that session is gone, you don’t need to retain all of that. At most you might want to compress some of it and then retain it for downstream historical processing. But most of the time, it’s just gone. You don’t have to think about it. And so that’s what I mean by lightweight. So there’s a version of that. 

And then for large-scale retrieval, you have a large historical corpus, if you’re working in a corporate environment, if you have an agent that’s searching through patent history or something like that, right? And then that’s where the infrastructure comes in. Well, if I have a petabyte of data out there that I need to search through, the embedded library is not going to do. So you need to have a scalable system out there, but it needs to be easy to use. And from an agent perspective, it’s the same interface. So from the agent perspective, it’s just as easy, but there is a scalable system for that large amount of data that’s kind of hidden beneath the surface there. 

I think for agents, that’s sort of just one of the requirements. The other one is having more sophisticated retrieval so that agents can find what they’re looking for. And different agents will want to look for data in different ways. So being able to support all of that without having like a million different plug-ins to do each modality, I think that’s also something very important for agents as well.

29.28
By the way, I was playing devil’s advocate there because I actually use LanceDB every day on my laptop. It can be something that you can use in your laptop just in-memory.

29.42
Yeah. So I think what we find is that when you make it really easy for agents to actually use it, that’s when scale really takes off. The way we’re looking at it is agents are kind of like an ideal gas: if you make it easy for them to use, no matter how much compute you have, no matter how much data and infrastructure you have, agents will expand to fill all of it, right? So what we’ve seen is. . . We talked about growth in query throughput. And then because of complex agents, there’s pressure on latency. Your agents want a hundred-millisecond or like 20-millisecond latencies now. And then we also see a lot of proliferation of data.

One of the largest users of LanceDB told us they’re now managing something like a billion tables, just because they have so many agents and so much data that they have to manage that number of tables within their system. Any computational and data management dimension you can think of, agents will expand to however much capacity you give them.

30.59
So this is a two-part question. Our listeners may not be aware, but for some reason, LanceDB kind of blew up a little more during the launch of OpenClaw. So I guess my two questions are one: How did this OpenClaw community land on Lance? And have you heard back from them, and have they told you what they liked about Lance?

31.32
Yeah, I mean, a lot of that is what we just talked about: It’s lightweight; it’s easy to use the model.

31.39
But how did it happen? How did they land on Lance? Do you know?

31.43
So my recollection was that originally it was a recommendation from Claude or something like that. And I think [Lance] was the only one out there that met the requirements: embedded, lightweight, sophisticated retrieval. And it can do both in-memory, on local NVMe, and also on object store.

32.11
Interesting. So since then, has this kind of marriage [with OpenClaw] continued?

32.20
Yeah, we continue to see engagement from the open source community. Our open source continues to grow. I think at the latest, we’re at around 14 million downloads a month across our open source projects. And we’re super excited about working and supporting the open source community on that. What we see now is demand for a more filesystem-like interface. It’s easier for agents a lot of times to interact with a filesystem interface.

Now, I’m choosing my words carefully. I don’t mean a filesystem. I just mean an interface. This is something that we’re looking into—trying to see what it would look like to put a filesystem interface over a LanceDB or Lance format. Based on the usage patterns that we see from agents, this is fairly straightforward to do. So I think if you’re listening and this is something interesting, we’d love to have early users come check it out and test it out with us.

33.29
It’s interesting, actually, as you were talking there, it just dawned on me that this notion. . . These various notions of multimodality that you described earlier actually might be another reason why people landed on Lance. Because there are other vector search systems that you can run in-memory or embedded. If you want to build agents that are more capable moving forward, then the various notions of multimodality that Chang described earlier might come in handy, right?

34.06
Yeah, yeah, absolutely. I will say that like, I’m sort of a. . . There are AI maximalists. I’m sort of a multimodal maximalist. So my prediction is that in five years, multimodal won’t even be a word anymore. It’ll just be data, and it’ll just be multimodal by default. People will just say data, and it’ll be inclusive of all the different modalities. And when we think about data engineering, there won’t be multimodal data engineering. It’ll just be multimodal by default when we say data engineering.

34.37
Interesting, which actually. . . As we’re winding down here, I was going to ask you, If I’m a CxO or an architect at an enterprise, what data infrastructure decision do you think I should bear in mind? Or I guess to put it negatively, what are some of the decisions I can make right now that potentially can hurt my team moving forward in the next year?

35.08
Right, right. So I think we’re already. . . For a lot of early adopters, we see big pain points around new AI data silos. So one pattern, I wouldn’t call it an anti-pattern, but one pain point, is if you’re a CIO or CDO or something like that, chances are a lot of your teams within the enterprise have charged forward with their own AI applications and AI stack. And so now the centralized data platform team is faced with maybe like 10 different vector databases that they have to support and maybe five different ways to store the AI data, some in images and some just embeddings and others, many different modalities. So that becomes a big pain point going forward, right? So as companies go from “Let’s try out AI in this particular area” to, I guess, AI transformation, having large swaths of the enterprise be AI-assisted or AI-native, that becomes a big pain point. 


I think if I were a CIO or a CEO or CTO at a larger enterprise, I would be looking forward a little bit to think about how do I set up all of my teams across the enterprise for success so that one, “How do I allow them to charge forward very quickly and iterate very quickly without presenting this crazy, untenable challenge on the central platform team?” So that’s what I would be thinking of. That’s actually. . . At LanceDB, that’s what we’re building for.

37.05
If your thesis is multimodal data matures over the next few years, and so do agents and everything that comes with agents, including memory, what does the data stack look like in a few years?

37.22
In broad strokes, the base layers are not going to change all that much. I think the infrastructure layer stays roughly the same. There’s going to be object storage. There’s going to be a storage layer. And then the compute layer will start to change. 

37.49
Ray. [laughs]

37.52
What I think we’ll see is that the middle layer of data tooling will start to melt away a little bit because of agents.

38.04
Define data tooling.

38.07
I don’t want to name names, but there’s a lot of what I would call developer middleware for data: it’s neither the infrastructure layer nor the layer interfacing with agents and users directly. That middle layer, I think, will melt away a little bit, or at least be very much refactored, so there’s going to be a lot of churn there, and it’ll be interesting to see what shakes out. Agents will continue to push that layer down, and they’ll want to get as close to the base layer as possible.

If you look at this middle layer, there are really two things it provides. One is a precanned data model for how its users think about the problem, built on top of the base infrastructure (on top of LanceDB, for example). The other is user interaction. The combination of the two is how these tools capture user workflows, and that’s the core of their value. What I think happens in the future is that the UI workflow layer largely goes away and is replaced by agents.

But useful data models will still be useful, and they’ll stay. Yes, you can have agents talk directly to random bits on S3, but why waste all that intelligence? It’s not worth the token cost. A well-formed data model is the right base layer for agents to interact with. So that’s what we’ll see: the melting away and reformatting of that middle layer. And when I talk to data builders and AI infrastructure builders today, we’re all seeing this at the same time.
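As a concrete illustration of “a well-formed data model as the base layer,” here is a minimal sketch: a typed LanceDB table plus one narrow tool function for the agent to call, rather than handing the agent raw objects on S3. The docs schema and the search_docs tool are hypothetical, with toy 4-dimensional vectors standing in for real embeddings.

```python
# Sketch of a typed data model as the agent-facing base layer.
# The schema, table name, and tool function are all hypothetical.
import lancedb
import pyarrow as pa

schema = pa.schema([
    pa.field("doc_id", pa.string()),
    pa.field("text", pa.string()),
    pa.field("embedding", pa.list_(pa.float32(), 4)),  # toy 4-dim vectors
])

db = lancedb.connect("./agent-data")
tbl = db.create_table("docs", schema=schema, mode="overwrite")
tbl.add([
    {"doc_id": "a", "text": "Quarterly revenue grew 12%.",
     "embedding": [0.1, 0.2, 0.0, 0.4]},
    {"doc_id": "b", "text": "Churn was flat quarter over quarter.",
     "embedding": [0.9, 0.1, 0.3, 0.0]},
])


def search_docs(query_embedding: list[float], k: int = 2) -> list[dict]:
    """The narrow, typed surface an agent would call as a tool."""
    return tbl.search(query_embedding).limit(k).to_list()


for hit in search_docs([0.1, 0.2, 0.0, 0.4]):
    print(hit["doc_id"], hit["text"])
```

The point of the design is that the agent spends tokens reasoning over well-named fields and a small tool surface, not over parsing random bytes.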

40.22
What I describe to people right now as the forward-looking stack has two main parts. One, you have the multimodal lakehouse built around Lance, LanceDB, and the Lance format. And then you have the AI compute layer, which I call the PARK stack: PyTorch, AI foundation models, Ray, and Kubernetes. I see that quite a bit, actually; I definitely see the PARK stack, and now I’m starting to see more and more people talking about Lance and the Lance format. Do you think of these as complementary?

41.16
Yeah, yeah, absolutely. We have close relationships with Ray and Spark, really native-level integrations, and also with PyTorch. I don’t think those are going away: PyTorch essentially interacts with developers directly, whereas Spark and Ray are very much infrastructure layer. And Kubernetes is definitely still around.
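A minimal sketch of where the two halves meet, assuming the stack Ben describes: PyTorch training code reading rows from a Lance dataset. Ray and Kubernetes would handle scale-out in a real PARK deployment but aren’t invoked here, and the paths and column names are made up for illustration.

```python
# Hedged sketch: a PyTorch Dataset fed from a Lance dataset.
# In practice the Lance data would live on object storage.
import lance
import pyarrow as pa
import torch
from torch.utils.data import DataLoader, Dataset

# Write a toy Lance dataset locally for the demo.
table = pa.table({
    "features": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "label": [0, 1, 0],
})
lance.write_dataset(table, "./toy.lance", mode="overwrite")


class LanceTorchDataset(Dataset):
    """Naive sketch: loads the table eagerly; real pipelines stream batches."""

    def __init__(self, uri: str):
        self.rows = lance.dataset(uri).to_table().to_pylist()

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, i: int):
        row = self.rows[i]
        return torch.tensor(row["features"]), torch.tensor(row["label"])


loader = DataLoader(LanceTorchDataset("./toy.lance"), batch_size=2)
for features, labels in loader:
    print(features.shape, labels)
```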

41.51
Yeah. And so, what big trend are you paying attention to right now that we haven’t yet talked about? This is how we close.

42.08
What’s been really interesting that we didn’t talk about is the rise of open source models. I think that’s going to have a big impact on enterprise AI, maybe starting next year or even in the remainder of this year. [Ben: Open weight.] Open-weight models, that’s correct. Yeah.

42.35
Who’s the source? Because right now the main source for the better ones is China, and I still see a lot of hesitation among enterprise teams about adopting such models. I actually just wrote a short post about this. The perception seems to be that while the open-weight models from China are closing the gap, there’s still a gap, and there are structural reasons for it. One is that the Chinese labs seem to be benchmaxxing: optimizing for the benchmarks rather than real workloads. Second, there’s a compute challenge that makes iteration harder for them: whereas the labs here may update their models every three or four months, the Chinese labs have to wait six. And finally, the investment in data pipelines just isn’t the same as you’d see at, for example, Gemini, Anthropic, and OpenAI, which license data from all over the place. The Chinese labs tend to rely on distillation, and when you’re doing distillation, your ceiling is basically the model you’re distilling from.

And then there’s the flywheel: OpenAI, Anthropic, and Gemini have a lot of users, so they get better as more users interact with them. . .

44.20
That’s right. Don’t forget the open-weight models in China are also. . . [cross-talk] Here’s the way I think about it. As AI adoption grows exponentially within enterprises, they’re going to be extremely motivated to invest in their own inference on open-weight models, just because the token costs are so drastic.

Because of that economic pressure, there’s going to be a lot more incentive for companies to create better open-weight models. And if you look at the open-weight models in China, the fact that they can produce models of this quality on really limited hardware is telling: a team in the US should, in theory, be able to create much better open-weight models.

Number two, I don’t think the distillation argument actually holds. If you look at the report Anthropic put out, the amount of distillation they accused DeepSeek of doing is actually not that much; it’s basically negligible. MiniMax is a legitimately big offender, but DeepSeek basically didn’t do much. So I don’t think distillation is a big factor in the quality of open-weight models anymore.

So there is a remaining gap in quality, maybe a three- to four-month gap between open-weight models and SOTA. But what’s interesting, from the experiments people have done, is that open-weight models are cheaper and much faster. If you have a coding agent task, you can do a one-shot with a SOTA model, or you can do multiple rounds of iteration on an open-weight model, which gets you the same quality at a lower total token cost, finishing around the same time or even faster. So I think a lot of the hesitation is lack of familiarity and a skill gap: if you have to do a few shots, that complexity is more than what people want to think about right now.

The pattern today is that you go into production with SOTA models, then you hit some cost-prohibitive moment where you say, “OK, which areas don’t require really heavy intelligence but still carry a lot of token cost, and can I replace [them] with open models?” I think that will happen more and more across enterprises, and it’s going to be a big trend to watch this year and next.
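The one-shot-versus-iteration trade-off Chang describes is easy to sanity-check with back-of-the-envelope arithmetic. All prices and token counts below are invented for illustration; plug in your own numbers.

```python
# Toy cost comparison: one-shot frontier model vs. iterated open-weight model.
# Every number here is hypothetical.
SOTA_PRICE = 15.00        # $ per million tokens (assumed)
OPEN_WEIGHT_PRICE = 0.60  # $ per million tokens (assumed)

sota_tokens = 200_000            # one-shot with a frontier model
open_tokens_per_round = 150_000  # smaller model, but needs iteration
rounds = 4

sota_cost = sota_tokens / 1e6 * SOTA_PRICE
open_cost = rounds * open_tokens_per_round / 1e6 * OPEN_WEIGHT_PRICE

print(f"SOTA one-shot:      ${sota_cost:.2f}")  # $3.00
print(f"Open-weight x {rounds}:    ${open_cost:.2f}")  # $0.36
```

Even with four rounds of iteration, the hypothetical open-weight run comes in at a fraction of the one-shot frontier cost, which is the economic pressure Chang is pointing to.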

47.18
And actually, as you mentioned, my conversations are a product of the stage of adoption we’re in, which is basically early. I’ll deploy with state-of-the-art models because I’m early, and then as my agent or application gets used, I start paying attention to cost, latency, and all of that, and I can worry about swapping models then. And hopefully we’ll have some Western labs start cranking on open-weight models again, right? It seems like Meta is off the table. The Gemma folks produce models, but they’re meant for on-device, I think. Maybe there’s an opening there for someone to start up something that…

Especially as people become more clever about training and tools like LanceDB make training more affordable somehow. We’ll see what happens. And with that, thank you, Chang.

48.24
That’s right. Thank you, Ben.


