Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Neo Automations: Scheduled Tasks Shipped as Pull Requests

Recurring platform work slips: provider versions fall behind, drift accumulates between checks, and the quarterly audit keeps getting pushed back another month. Pulumi Neo can now run any task on a cadence you set, opening a pull request for each run.

Automations in action

Your platform team runs stacks across staging and production, and the AWS, GCP, and Kubernetes providers keep shipping new versions. Nobody has time to bump them stack by stack.

You write one automation:

Every Monday at 8 AM, check the infra/ project for stacks where the AWS, GCP, or Kubernetes provider is more than two minor versions behind. For each one, bump the out-of-date provider, run pulumi preview, and open a PR if the preview is clean.

Monday morning, Neo runs the prompt. It finds three stacks behind on the AWS provider, edits each program, runs preview, and opens a PR for each clean run. You review the PRs like you would any other dependency bump, merge them, and Neo runs again next Monday.
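
The "more than two minor versions behind" rule in that prompt can be made concrete. Neo works from the natural-language prompt, so the Python sketch below is only an illustration of the comparison, not how Neo implements it. The helper names and version strings are hypothetical, and it assumes plain MAJOR.MINOR.PATCH versions.

```python
def minor_versions_behind(installed: str, latest: str) -> int:
    """Return how many minor versions `installed` trails `latest`.

    Assumes plain MAJOR.MINOR.PATCH strings. A different major version is
    reported as -1 so it can be reviewed by a human instead of auto-bumped.
    """
    inst_major, inst_minor, _ = (int(part) for part in installed.split("."))
    late_major, late_minor, _ = (int(part) for part in latest.split("."))
    if inst_major != late_major:
        return -1
    return max(late_minor - inst_minor, 0)


def needs_bump(installed: str, latest: str, max_lag: int = 2) -> bool:
    """True when the provider is more than `max_lag` minor versions behind."""
    return minor_versions_behind(installed, latest) > max_lag


# Made-up version numbers, for illustration only.
print(needs_bump("7.1.0", "7.4.2"))  # three minor versions behind
print(needs_bump("7.3.0", "7.4.2"))  # one minor version behind
```

Treating a major-version gap as a separate case is a design choice in this sketch: major upgrades often carry breaking changes, so they are a poor fit for an unattended bump.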

What automations are for

The launch includes four built-in templates: a provider freshness check, an encryption audit, a backup audit, and an activity digest. You can also skip the templates and write your own prompt.

Pick from hourly, daily, weekdays, or weekly cadences. Each automation gets its own page in the Automations tab, where you can edit the prompt, change the schedule, run it once on demand, or pause it.

Safe by default

Automations default to two settings that fit recurring work. Approval mode is auto, so a run doesn’t wait for human confirmation between steps. Permission mode is read-only, so a run can read state and propose changes through pull requests but can’t apply changes directly. You can override either default per automation.

How automations fit with the rest of Neo

A scheduled task uses the same context as an interactive Neo task. Custom Instructions at the organization and project level apply, so a scheduled run respects the same naming conventions, tagging policies, and architecture rules your team has written down.

MCP integrations and CLI integrations work in scheduled tasks the same way they work in interactive ones, so a weekly drift check can query AWS through the aws CLI, file Linear issues, and link related PagerDuty incidents. Scheduled tasks also run with the RBAC permissions of the user who scheduled them, checked at run time; if permissions change between scheduling and execution, the new permissions apply.

Try it out

Open Neo in Pulumi Cloud, switch to the Automations tab, and pick a template or write your own prompt. The automations docs cover the form, scheduling options, and per-automation overrides.

Setting up a scheduled task for Pulumi Neo

Today’s launch is part of a bigger story. Read our launch-day piece on the agentic infrastructure era for the broader vision, and the Neo Integrations post for the third-party tools and CLIs your automations can use.

As always, we’d love to hear what you think — and if you have any suggestions for automations that’d make Neo even better, file an issue in pulumi-cloud-requests.

How to Implement RBAC with Terraform & Best Practices

Learn how to implement RBAC with Terraform on AWS, Azure, and Google Cloud, plus best practices for roles, scopes, and access at scale.

Visual Studio Code 1.122

Learn what's new in Visual Studio Code 1.122 (Insiders)


AI Cost Visibility Before the Invoice: How to Trace, Measure and Optimize Token Spend

The Cost Visibility Problem

Most AI teams don’t realize they’ve overspent until the invoice arrives. The math seems simple—you know the per-token rate, you estimate average usage and you multiply. But production agent behavior introduces compounding factors that make those estimates wildly inaccurate: retries, context window growth, multi-step reasoning chains, automated evaluations and framework-level overhead that’s invisible without proper instrumentation.

The real problem isn’t just cost—it’s the lack of visibility into what caused the cost.

The Spreadsheet Model

When evaluating an LLM provider, the cost model looks deterministic:

Monthly cost = (avg_input_tokens × input_rate + avg_output_tokens × output_rate) × monthly_invocations

For a simple weather agent: ~500 input tokens + ~200 output tokens per query, at GPT-4.1-mini rates ($0.40/1M input, $1.60/1M output), running 10,000 queries/month = 5M input tokens ($2.00) plus 2M output tokens ($3.20), or roughly $5.20/month. Budget approved.

The Production Reality

In production, that same agent exhibits behavior the spreadsheet never captured:

  1. Context accumulation - each tool result is appended to the conversation context. A weather agent that calls one tool adds ~300 tokens of tool output to every subsequent LLM call. An agent with 5 tools might accumulate 2,000+ tokens of context before generating the final response.
  2. Retry amplification - most agent frameworks implement automatic retry logic. LangChain’s create_agent will retry on parsing failures. If the LLM returns a malformed tool call 20% of the time, you’re paying 1.2x the expected token cost - and the retry itself includes the full accumulated context.
  3. Multi-step reasoning - ReAct-style agents loop: Think → Act → Observe → Think → Act → Observe → Final Answer. Each iteration sends the entire conversation history. A 3-iteration agent sends roughly 3 × (system_prompt + accumulated_context + new_thought) - not 3× the base cost but more like 5-7× due to context growth.
  4. Framework overhead - LangChain, LlamaIndex and other frameworks inject system prompts, format instructions and intermediate parsing prompts that aren’t visible in your application code. These add 200-500 tokens per call that never appear in your cost estimate.
  5. Invisible evaluations - platforms that run automatic quality evaluations consume tokens for the judge LLM call. If evaluations run on every trace, they effectively double your token spend.

The result: that $5.20/month estimate becomes $25-40/month in practice. At enterprise scale with multiple agents, the gap between estimated and actual spend can reach thousands of dollars monthly.
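
To see how these factors compound, here is a minimal Python sketch. It is a toy model rather than a measurement: the function names are hypothetical, it counts input tokens only, and it treats a retry as one extra full attempt on a share of runs.

```python
def estimated_input_tokens(
    base_prompt_tokens: int,
    context_added_per_step: int,
    reasoning_steps: int,
    framework_overhead: int = 0,
) -> int:
    """Sum input tokens over every LLM call in one agent run.

    Each call resends the base prompt, the framework overhead and all
    context accumulated by earlier steps.
    """
    total = 0
    for step in range(reasoning_steps):
        accumulated_context = context_added_per_step * step
        total += base_prompt_tokens + framework_overhead + accumulated_context
    return total


def amplification(
    base_prompt_tokens: int,
    context_added_per_step: int,
    reasoning_steps: int,
    framework_overhead: int = 0,
    retry_rate: float = 0.0,
    evaluate_every_trace: bool = False,
) -> float:
    """Ratio of modeled input tokens to the single-call spreadsheet figure."""
    tokens = estimated_input_tokens(
        base_prompt_tokens, context_added_per_step, reasoning_steps, framework_overhead
    )
    tokens *= 1 + retry_rate  # one extra attempt on a share of runs
    if evaluate_every_trace:
        tokens *= 2  # simplification: the judge call doubles token spend
    return tokens / base_prompt_tokens


# Inputs taken from the figures above: 500-token prompt, ~300 tokens of tool
# output per step, 3 iterations, 200 tokens of overhead, 20% retry rate.
multiplier = amplification(500, 300, 3, framework_overhead=200, retry_rate=0.2)
print(f"{multiplier:.1f}x the single-call estimate")
```

Turning on evaluate_every_trace doubles the multiplier again, which is why evaluation settings deserve the same scrutiny as prompts.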

Why This Becomes Urgent with Usage-Based Billing

This problem is about to get significantly worse. The AI tooling industry is rapidly shifting from flat-rate to usage-based pricing, making every token a direct billing event.

GitHub recently announced that Copilot is moving to usage-based billing starting June 2026. The reasoning is explicit: “a quick chat question and a multi-hour autonomous coding session can cost the user the same amount” under flat pricing - so they’re replacing premium request counts with “GitHub AI Credits” consumed based on token usage (input, output and cached tokens) at published API rates per model.

This isn’t an isolated decision. It reflects a structural reality: as AI tools become agentic - running multi-step sessions, invoking tools, iterating across codebases - the cost variance between “light” and “heavy” usage becomes too large for flat pricing to absorb. Providers are pushing that variance downstream to users.

The implication for engineering teams is clear: in a token-based billing world, cost optimization requires the same granular visibility as performance optimization. Every retry, long context window, tool call, model switch and evaluation becomes a measurable billing event. You can’t optimize what you can’t measure and you can’t measure token economics with infrastructure metrics.

What Teams Need for AI Cost Visibility and Management

Before choosing any tool or platform, it helps to define what metrics and dimensions actually matter for AI cost control. These are the building blocks of cost visibility regardless of implementation:

Token Metrics

  • Input, output, total and cached tokens - the raw cost drivers behind every LLM call. Cached tokens matter because many providers price them differently (often at a discount) and knowing your cache hit rate affects cost projections.

Cost Dimensions

  • Cost by model and provider - shows where routing decisions affect spending. If 90% of cost comes from one model, that’s where optimization has the highest leverage.
  • Cost by request, trace, span, workflow, service and agent - progressively broader views from a single LLM call up to an entire service, enabling drill-down from aggregate anomalies to root causes.
  • Cost by customer, team, environment, release and experiment - attribution dimensions that answer, “who or what caused the spend?” rather than just “how much did we spend?”

Behavioral Metrics

  • Retry count and agent iteration count - expose hidden cost amplification. A retry rate of 20% means you’re paying 1.2x what you expected, compounded by context size.
  • Tool calls and retrieval steps - each tool invocation may trigger additional LLM calls or add context that inflates subsequent calls.
  • Context growth across a conversation or workflow - the silent cost multiplier. If context grows linearly with conversation turns, cost grows quadratically.
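
The quadratic claim follows from resending the whole history on every turn. A small Python sketch, assuming each turn adds a fixed number of tokens (the function name and the 300-token figure are illustrative):

```python
def cumulative_input_tokens(turns: int, tokens_added_per_turn: int) -> int:
    """Total input tokens sent over a conversation.

    On turn t the whole history (t turns' worth of tokens) is resent, so the
    total is tokens_added_per_turn * (1 + 2 + ... + turns), which equals
    tokens_added_per_turn * turns * (turns + 1) / 2.
    """
    return sum(t * tokens_added_per_turn for t in range(1, turns + 1))


for turns in (5, 10, 20):
    print(turns, cumulative_input_tokens(turns, 300))
```

Doubling the number of turns from 10 to 20 raises the total from 16,500 to 63,000 input tokens, close to a fourfold increase.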

Quality and Efficiency Metrics

  • Evaluation usage and judge-model cost - quality checks that use LLM judges have their own token cost, which can rival or exceed the primary inference cost if unchecked.
  • Latency, error status and failed/partial responses - failed responses still consume tokens. High latency may indicate retries or queuing that affects both cost and user experience.

Meta-Metrics

  • Observability usage and cost - the observability layer itself should be measurable. If you can’t quantify what observability costs you, it may become an uncontrolled expense.

When Observability Itself Becomes a Cost Problem

Teams need observability to manage AI cost but the observability layer itself needs to be transparent, predictable and intentional.

A recent Reddit post from a developer using Azure AI Foundry illustrates the issue:

“I am noticing very high, unexpected charges coming from ‘Observability’. I do not need these logs, metrics or trace data right now and my main goal is to stop these charges completely.”

The root cause: Azure AI Foundry enables playground evaluations by default (which consume LLM tokens and are billed) and automatically configures Application Insights tracing for hosted agents. The developer was being charged for observability features they never consciously enabled.

A Microsoft PM confirmed the fix: navigate to the agents playground, select metrics in the upper right and unselect all evaluators. The developer had to disable all observability to stop the charges—trading cost visibility for cost control.

This creates a fundamental tension: you need observability to control AI costs but if observability itself is an uncontrolled cost with hidden defaults, it becomes part of the problem. The solution requires an observability platform where:

  • Instrumentation is explicitly opt-in (nothing runs unless you add it to your code)
  • Pricing is predictable and based on discrete units, not data volume
  • The overall model is predictable and ROI-positive, helping teams reduce avoidable AI spend without creating a new cost surprise

Why These Metrics Matter

Understanding what each metric tells you - and what decisions it enables - is the difference between collecting data and controlling cost:

  • Token counts show the raw cost driver. If you don’t know how many tokens a workflow consumes, you can’t estimate or optimize its cost.
  • Model and provider data shows where routing decisions affect spend. Switching from a flagship model to a smaller model for classification tasks can reduce cost 10-50× for that step with minimal quality impact.
  • Trace-level cost shows which step in a workflow created the spike. Without this, you know that cost went up but not why.
  • Tags (customer, release, environment, experiment) make cost attributable. They turn “we spent $4,000 this month” into “customer X’s workflow costs 8× more than average because of context length.”
  • Retry and iteration counts expose hidden cost amplification. An agent that retries 3 times on 10% of requests silently makes about 30% more LLM calls than expected, before counting the extra context each retry carries.
  • Evaluation metrics prevent quality checks from becoming invisible spend. If your judge model runs on every trace and costs $0.02 per evaluation, that’s $200/month at 10,000 traces - potentially more than the inference cost you’re trying to optimize.
  • Latency helps teams optimize cost without degrading user experience. A cheaper model that adds 2 seconds of latency may not be an acceptable tradeoff but you need both metrics to make that decision.
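
As an illustration of tag-based attribution, the sketch below sums span cost per customer tag. The span records and field names (cost_usd, tags) are hypothetical; a real export will have its own schema.

```python
from collections import defaultdict


def cost_by_tag_prefix(spans: list[dict], prefix: str) -> dict[str, float]:
    """Sum span cost per tag value for tags that start with `prefix`."""
    totals: dict[str, float] = defaultdict(float)
    for span in spans:
        for tag in span["tags"]:
            if tag.startswith(prefix):
                totals[tag] += span["cost_usd"]
    return dict(totals)


# Hypothetical span records, for illustration only.
spans = [
    {"cost_usd": 0.004, "tags": ["customer:acme", "env:production"]},
    {"cost_usd": 0.001, "tags": ["customer:globex", "env:production"]},
    {"cost_usd": 0.006, "tags": ["customer:acme", "env:staging"]},
]

per_customer = cost_by_tag_prefix(spans, "customer:")
for tag, cost in sorted(per_customer.items(), key=lambda item: item[1], reverse=True):
    print(f"{tag}: ${cost:.4f}")
```

The same function answers the environment or release question by changing the prefix to "env:" or "release:".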

Vendor-Neutral Workflow for Tracing, Observing and Optimizing AI Cost

Before introducing any specific tool, here is the general process teams should follow to move from reactive invoice surprises to proactive cost control:

  1. Instrument - Capture AI requests, model calls, tool calls, retrieval steps and evaluations. Every step that could consume tokens or trigger billing should emit telemetry.
  2. Capture - Record token counts, model metadata, latency, status and cost estimates for each instrumented operation.
  3. Attribute - Add tags for customer, release, environment, team and experiment so cost can be sliced by any business dimension.
  4. Baseline - Establish a cost baseline for normal operations. Without a baseline, you can’t distinguish a spike from expected variance.
  5. Monitor - Watch for spikes or regressions after deployments, prompt changes, model switches or traffic shifts.
  6. Investigate - Drill into expensive traces to identify the root cause: was it a retry loop, context bloat, a model routing error or an evaluation storm?
  7. Optimize - Fix the identified problem: optimize prompts, adjust routing, cap retries, manage context windows, prune unnecessary tool calls or limit evaluations.
  8. Validate - Confirm that cost improved without hurting quality or latency. Cost optimization that degrades user experience isn’t optimization - it’s a tradeoff that needs explicit approval.

This loop takes your cost discovery time from “30 days (when the invoice arrives)” to “same day (when the trace appears).”
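
The Baseline and Monitor steps can be approximated with very little code. The sketch below flags days whose cost exceeds the baseline mean by a chosen number of standard deviations. The threshold, the seven-day window and the daily figures are all assumptions for illustration, not recommendations.

```python
from statistics import mean, stdev


def find_cost_spikes(
    daily_costs: dict[str, float],
    baseline_days: int = 7,
    threshold_sigmas: float = 3.0,
) -> list[str]:
    """Return the days after the baseline window whose cost exceeds
    baseline mean + threshold_sigmas * baseline standard deviation."""
    days = list(daily_costs)  # dicts keep insertion order
    baseline = [daily_costs[day] for day in days[:baseline_days]]
    limit = mean(baseline) + threshold_sigmas * stdev(baseline)
    return [day for day in days[baseline_days:] if daily_costs[day] > limit]


# Made-up daily spend in USD; the last day follows a hypothetical deployment.
daily_costs = {
    "2026-01-01": 4.1, "2026-01-02": 4.3, "2026-01-03": 3.9,
    "2026-01-04": 4.0, "2026-01-05": 4.2, "2026-01-06": 4.4,
    "2026-01-07": 4.1, "2026-01-08": 4.3, "2026-01-09": 6.0,
}
print(find_cost_spikes(daily_costs))
```

A fixed baseline window is the simplest option. A rolling window or per-weekday baselines may fit workloads with strong weekly patterns better.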

Progress Observability as a Practical Example

Here is what this framework looks like when implemented using Progress Observability. The sections below demonstrate each capability as a concrete example of the general principles.

Instrumented Weather Agent - Example of Capturing Telemetry

A real Python agent instrumented with the Progress Observability SDK, using LangChain with OpenAI to answer weather questions:

import os
from dotenv import load_dotenv
from langchain_community.utilities import OpenWeatherMapAPIWrapper
from langchain_community.tools import OpenWeatherMapQueryRun
from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from progress.observability import Observability, ObservabilityInstruments
from progress.observability import agent, workflow, task, tool

load_dotenv()

os.environ.pop("SSL_CERT_FILE", None)

# Initialize observability - explicitly opt-in, called before LLM usage
Observability.instrument(
    app_name=os.getenv("OBSERVABILITY_APP_NAME"),
    api_key=os.getenv("OBSERVABILITY_API_KEY"),
    trace_content=True,
    instruments={
        ObservabilityInstruments.OPENAI,
        ObservabilityInstruments.LANGCHAIN
    },
    additional_tags=["production", "release:2.5.1"]
)

model = ChatOpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-5.4-mini"
)

# Setup Tools
weather_api = OpenWeatherMapAPIWrapper(
    openweathermap_api_key=os.getenv("OPENWEATHERMAP_API_KEY")
)
weather_tool = OpenWeatherMapQueryRun(api_wrapper=weather_api)
tools = [weather_tool]

# Create Agent
lang_agent = create_agent(model, tools=tools)

Manual Instrumentation - Example of Cost Attribution

Auto-instrumentation captures LLM calls and framework operations, but your own business logic (data pipelines, custom tool orchestration) needs explicit decoration to appear in traces with cost attribution:

@tool(name="weather-lookup")
def fetch_weather_data(city: str) -> str:
    """Fetch raw weather data from OpenWeatherMap API."""
    return weather_api.run(city)


@task(name="normalize-weather", attributes={"team": "ml"}, tags=["experiment-a"])
def normalize_weather_data(raw: str) -> dict:
    """Transform raw weather string into a structured dict."""
    return {"raw_report": raw, "source": "openweathermap", "format": "normalized"}


@workflow(name="data-pipeline", version=2)
def retrieve_weather_context(query: str) -> dict:
    """Retrieve and normalize weather data for a given query."""
    raw = fetch_weather_data(query)
    return normalize_weather_data(raw)


@agent(name="weather-agent")
def handle_weather_request(query: str) -> str:
    """Top-level agent handler that orchestrates the full weather request."""
    context = retrieve_weather_context("London")
    result = lang_agent.invoke({
        "messages": [{"role": "user", "content": query}]
    })
    return result["messages"][-1].content


# Run with proper shutdown to flush telemetry
try:
    response = handle_weather_request("What is the weather in London?")
    print(response)
finally:
    Observability.shutdown()

What This Captures - Example of Trace-Level Cost Granularity

Each decorator (@agent, @workflow, @task, @tool) creates a span with a specific kind. The auto-instrumentation adds spans for every LLM call with token counts and model information. Together, the trace tree for this agent looks like:

▼ agent: weather-agent
  ▼ workflow: data-pipeline (v2)
    ▼ tool: weather-lookup
    ▼ task: normalize-weather [tags: experiment-a]
  ▼ llm_call: ChatOpenAI.chat (model: gpt-5.4-mini, 364 tokens, $0.0004)
  ▼ tool: OpenWeatherMapQueryRun
  ▼ llm_call: ChatOpenAI.chat (model: gpt-5.4-mini, 128 tokens, $0.0002)

Every span records: duration, token count (for LLM calls), model name and estimated cost. This is the granularity needed to answer “why did costs spike?” - you can see exactly which step consumed tokens and whether it was expected.
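
To make the roll-up concrete, here is a minimal Python model of that trace tree. The Span class is a hypothetical stand-in, not the SDK's data model. The token and cost figures are the ones shown in the tree above.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    kind: str
    name: str
    tokens: int = 0
    cost_usd: float = 0.0
    children: list["Span"] = field(default_factory=list)

    def total_tokens(self) -> int:
        """Tokens for this span plus all of its descendants."""
        return self.tokens + sum(child.total_tokens() for child in self.children)

    def total_cost(self) -> float:
        """Estimated cost for this span plus all of its descendants."""
        return self.cost_usd + sum(child.total_cost() for child in self.children)


trace = Span("agent", "weather-agent", children=[
    Span("workflow", "data-pipeline", children=[
        Span("tool", "weather-lookup"),
        Span("task", "normalize-weather"),
    ]),
    Span("llm_call", "ChatOpenAI.chat", tokens=364, cost_usd=0.0004),
    Span("tool", "OpenWeatherMapQueryRun"),
    Span("llm_call", "ChatOpenAI.chat", tokens=128, cost_usd=0.0002),
])

print(trace.total_tokens(), f"${trace.total_cost():.4f}")  # 492 $0.0006
```

Because totals are computed per subtree, the same methods show that the data-pipeline workflow contributes no token cost here: all spend sits in the two LLM calls.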

Cost Analytics Dashboard - Example of Aggregate Cost Visibility

The Cost Analytics Dashboard aggregates token usage and costs across your organization, calculated server-side by the Collector (which maps model names to current per-token pricing - meaning costs stay accurate even as providers change rates).

The dashboard surfaces two critical dimensions:

  • Cost by model - immediately see that gpt-5.5 accounts for 90%+ of spend while gpt-5.4-mini is orders of magnitude cheaper. This drives model routing decisions: can you use the cheaper model for initial reasoning and reserve the expensive model for final responses only?
  • Cost by service/application - attribute spend to specific agents and workflows. Your “weather-agent” costs $0.0092/day while your “document-processor” costs $4.30/day. Now you know where to focus optimization efforts.

The Cost Analytics dashboard breaks down spending by model and by service, with totals for cost and token usage over any selected time range.


This is the view you check weekly (or daily during rollouts) to catch cost regressions before they hit the invoice.
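
The server-side pricing step can be pictured as a lookup from model name to per-token rates. The sketch below is an assumption about the general shape of such a mapping, not the Collector's actual implementation. The only rates included are the GPT-4.1-mini figures quoted earlier in this article.

```python
# USD per 1M tokens as (input, output). A real table would cover every model
# in use and be updated when providers change their rates.
PRICE_PER_MILLION = {
    "gpt-4.1-mini": (0.40, 1.60),
}


def call_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one LLM call from its token counts.

    Raises KeyError for unknown models so that gaps in the price table are
    noticed instead of being silently priced at zero.
    """
    input_rate, output_rate = PRICE_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


print(f"${call_cost_usd('gpt-4.1-mini', 1_200, 300):.5f}")
```

Keeping prices in one table, separate from the telemetry, means a rate change needs one update and no re-instrumentation.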

Trace Explorer - Example of Trace-Level Cost Investigation

When the Cost Analytics dashboard shows a spike, the Observations page lets you drill into individual traces to find the root cause.

The Observations page lists all traces with span kind, model, tokens, cost and status. Click any trace to expand the full span tree.

Each trace shows the full span tree with token counts and costs per span. You can filter by:

  • Time range (isolate the spike window)
  • Application name
  • Tags (e.g., customer:acme, env:production)
  • Success/failure status

For cost investigation, the typical workflow is:

  1. Dashboard shows Tuesday had 40% higher cost than Monday
  2. Filter Observations to Tuesday, sort by token count descending
  3. Find the expensive traces - maybe a subset of users triggers a 5-iteration agent loop
  4. Open the trace, see that iterations 4 and 5 add 8,000 tokens with no useful output
  5. Fix the agent’s termination logic, deploy, verify cost drops Wednesday
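
The filter-and-sort step is done in the UI, but the same logic applies to exported trace data. A minimal sketch, assuming a hypothetical export in which each trace is a dict with day, tokens and iterations fields:

```python
def most_expensive_traces(traces: list[dict], day: str, limit: int = 3) -> list[dict]:
    """Return the `limit` traces from `day` with the highest token counts."""
    same_day = [trace for trace in traces if trace["day"] == day]
    return sorted(same_day, key=lambda trace: trace["tokens"], reverse=True)[:limit]


# Hypothetical exported traces, for illustration only.
traces = [
    {"id": "t1", "day": "monday", "tokens": 900, "iterations": 2},
    {"id": "t2", "day": "tuesday", "tokens": 1_100, "iterations": 2},
    {"id": "t3", "day": "tuesday", "tokens": 9_400, "iterations": 5},
    {"id": "t4", "day": "tuesday", "tokens": 8_700, "iterations": 5},
]

for trace in most_expensive_traces(traces, "tuesday", limit=2):
    print(trace["id"], trace["tokens"], trace["iterations"])
```

In this made-up data the two most expensive Tuesday traces are both 5-iteration runs, the pattern the investigation above would then confirm in the span tree.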

LLM Requests - Example of Real-Time Token Monitoring

The LLM Requests view provides a live stream of individual LLM calls. Each entry shows model, provider, token count and cost. This is your real-time cost monitor during deployments - if a new prompt template increases average tokens per call from 400 to 1,200, you’ll see it immediately, not on next month’s invoice.

Tag-Based Attribution - Example of Customer/Release/Environment Cost Breakdown

Tags enable multi-dimensional cost analysis. The SDK supports three levels:

# 1. Global tags - applied to ALL spans
Observability.instrument(
    app_name="weather-agent",
    api_key="...",
    additional_tags=["production", "release:2.5.1"]
)

# 2. Scoped tags - applied within a context block
from progress.observability import propagate_attributes

with propagate_attributes(tags=["customer:acme-corp", "request:req-abc-123"]):
    result = handle_weather_request(query)
    # All spans (including LLM calls) inside this block get these tags

# 3. Decorator tags - applied to a single function's span
@task(tags=["cohort-a", "experiment:new-prompt"])
def my_function():
    ...

With customer-level tags, you can answer: “Which customers trigger the most expensive agent paths?” With experiment tags: “Did the new prompt template reduce or increase token consumption?” With release tags: “Did v2.5.1 introduce a cost regression vs v2.5.0?”

All three levels merge and deduplicate automatically. Each tag is limited to 200 characters.
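
The merge behavior can be pictured with a short sketch. This is not the SDK's code: the merge_tags helper is hypothetical, and rejecting over-length tags with an exception is an assumption, since the text above states only the 200-character limit and not how the SDK enforces it.

```python
MAX_TAG_LENGTH = 200  # limit stated above


def merge_tags(*levels: list[str]) -> list[str]:
    """Merge tag lists in order, dropping duplicates and keeping first-seen order."""
    merged: dict[str, None] = {}
    for tags in levels:
        for tag in tags:
            if len(tag) > MAX_TAG_LENGTH:
                # Assumption: reject the tag. The real SDK may truncate or drop it.
                raise ValueError(f"tag exceeds {MAX_TAG_LENGTH} characters: {tag[:20]}...")
            merged.setdefault(tag, None)
    return list(merged)


global_tags = ["production", "release:2.5.1"]
scoped_tags = ["customer:acme-corp", "production"]
decorator_tags = ["cohort-a", "experiment:new-prompt"]

print(merge_tags(global_tags, scoped_tags, decorator_tags))
```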

SDK Instrumentation - Example of How Teams Capture the Required Telemetry

Every SDK parameter can be overridden via environment variables, enabling the same code to run across dev/staging/prod with different cost configurations:

export OBSERVABILITY_APP_NAME="weather-agent"
export OBSERVABILITY_API_KEY="ac_p_001_..."
export OBSERVABILITY_ENDPOINT="https://collector.observability.progress.com:443"
export OBSERVABILITY_TRACE_CONTENT="true"

This means you can:

  • Use trace_content=True in development (full prompt/completion capture for debugging)
  • Use trace_content=False in production (reduces telemetry size, still captures token counts and costs)
  • Tag environments differently for cost comparison
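
If application code needs to branch on the same setting, for example to decide whether to log prompts locally, it can read the variable itself. The env_flag helper below is hypothetical and the accepted spellings are an assumption; the SDK's own parsing of OBSERVABILITY_TRACE_CONTENT may differ.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment.

    Unset variables fall back to `default`. The accepted truthy spellings
    are a choice made for this sketch.
    """
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Default to not capturing content unless the environment opts in.
trace_content = env_flag("OBSERVABILITY_TRACE_CONTENT", default=False)
print("capture prompts and completions:", trace_content)
```

Defaulting to False keeps prompt and completion content out of telemetry unless an environment explicitly enables it.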

For teams using .NET with IChatClient from Microsoft.Extensions.AI:

using Microsoft.Extensions.AI;
using Progress.Observability.Extensions.AI;

IChatClient chatClient = new OpenAI.Chat.ChatClient("gpt-4.1-mini", openAIApiKey)
    .AsIChatClient();

chatClient = chatClient.AddObservability(options => {
    options.AppName = "Weather Agent";
    options.ApiKey = "ac_p_001_.....";
    options.RecordInputs = true;
    options.RecordOutputs = true;
    options.AdditionalTags = new List<string> { "production", "v2.1.0" };
});

// Tool observability for cost attribution on tool calls
ChatOptions chatOptions = new() { Tools = [..tools] };
chatOptions.AddToolObservability();

The .NET SDK provides the same cost visibility—every chat completion and tool invocation generates a span with token counts, model name and cost calculated server-side by the Collector.

Units and Pricing - Example of Making Observability Usage Predictable

The Progress Observability Platform prices usage in units, which keeps the model straightforward and predictable:

  • 1 telemetry span = 1 unit
  • 1 evaluation = 2 units, since the judge LLM generates internal spans

The free tier includes 20,000 units/month, 1 seat and 7 days of data retention. In the weather agent example above, a single execution produces about 6 spans total (agent + workflow + task + tool + 2 LLM calls), which means each run consumes 6 units. At that rate, the free tier supports roughly 3,333 agent executions per month - enough for development, testing and smaller production workloads.

The key characteristic of this model is that cost is fixed per span, regardless of how much content that span contains. A span carrying a 10,000-token prompt still costs the same 1 unit as a span carrying a 50-token prompt. That removes the perverse incentive common in traditional APM systems, where richer AI traces become disproportionately expensive to observe - exactly the failure mode highlighted by the Azure AI Foundry example.
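
The unit arithmetic is easy to check and to rerun for your own span counts. A minimal sketch using the figures above (the function name is illustrative):

```python
FREE_TIER_UNITS = 20_000   # units per month on the free tier
UNITS_PER_SPAN = 1
UNITS_PER_EVALUATION = 2


def runs_per_month(
    spans_per_run: int,
    evaluations_per_run: int = 0,
    monthly_units: int = FREE_TIER_UNITS,
) -> int:
    """Whole agent runs that fit in a monthly unit budget."""
    units_per_run = (
        spans_per_run * UNITS_PER_SPAN
        + evaluations_per_run * UNITS_PER_EVALUATION
    )
    return monthly_units // units_per_run


print(runs_per_month(6))                         # 3333
print(runs_per_month(6, evaluations_per_run=1))  # 2500
```

Adding one evaluation per run raises the cost from 6 to 8 units, so the same free tier covers 2,500 runs instead of 3,333.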

Shutdown and Telemetry Flushing

Always call shutdown before your process exits so that buffered spans (including cost-critical token count data) are flushed:

try:
    run_agent()
finally:
    Observability.shutdown()

Without this, the last batch of spans may be lost - especially in short-lived processes like serverless functions or CLI tools. Lost spans mean lost cost data, which defeats the purpose of instrumentation.

Closing Thoughts

AI cost surprises are visibility problems. The per-token rates are published and stable. What’s unpredictable is agent behavior - retries, context growth, multi-step reasoning, automated evaluations - and that behavior is invisible to traditional monitoring. Teams need trace-level cost context before the invoice arrives.

The workflow is straightforward: instrument, attribute, baseline, monitor, investigate, optimize, validate. Any platform that gives you token-level cost data per trace, per span and per tag will close the visibility gap. Progress Observability is one way to put that into practice - with explicit opt-in instrumentation, predictable unit-based pricing and trace-level cost visibility from the first call.

If you want to explore this approach, get started free at telerik.com/ai-observability-platform or reach out to discuss how cost visibility fits your team’s workflow.


Why iframe PDF Embedding Fails and How to Fix It with JavaScript PDF Viewer

Why iframe PDF Embedding Falls Short in Modern Web Applications

TL;DR: Iframe PDF embedding works for simple viewing, but breaks down in real web applications. This article explains why browser-based PDF rendering fails for search, navigation, annotations, security, mobile usability, and performance, and how a JavaScript PDF viewer solves these limitations with reliable, interactive, and scalable document experiences.

Embedding a PDF in a website using an <iframe> is quick and convenient. For simple, read‑only use cases like displaying a static document on a desktop page, it can be enough.

But modern web applications rarely use PDFs that way.

In real products, users don’t just view documents. They search through them, jump between sections, annotate pages, fill out forms, and review large files across desktop and mobile devices. As soon as those expectations come into play, iframe-based PDF embedding begins to show serious limitations.

This article breaks down why iframe PDF embedding fails in production-grade applications and explains how Syncfusion® JavaScript PDF Viewer addresses those gaps for document-heavy workflows.

PDFs in real applications are interactive, not passive

In SaaS platforms, enterprise tools, and internal dashboards, PDFs are part of everyday workflows:

  • Support teams search manuals while handling live tickets
  • Legal teams navigate contracts and add review comments
  • Finance teams scan large reports without downloading files
  • Users expect the same responsiveness on mobile as on desktop

These workflows demand more than basic rendering. They require search, structured navigation, interaction tools, and consistent performance, all areas where browser-native PDF handling struggles.

5 Reasons iframe PDF embedding breaks down in practice

1. The browser controls rendering, not your application

When a PDF is embedded using an iframe, rendering is delegated entirely to the browser’s native PDF engine. That means:

  • Behavior varies across Chrome, Firefox, Safari, and Edge
  • Mobile browsers often refuse to render PDFs inline
  • Browser updates can change PDF behavior without warning
  • Developers have no control over loading strategy, UI, or rendering logic

Once rendering lives outside your application, so does consistency.

2. Navigation and search capabilities are inconsistent or missing

Large PDFs are unusable without proper navigation. With iframe embedding:

  • Full‑text search is unreliable or unavailable
  • Bookmarks and table of contents aren’t exposed consistently
  • Users are forced into linear scrolling
  • Visually locating pages is slow without thumbnails

For long, structured documents, iframe embedding turns simple tasks into productivity bottlenecks.

3. Annotations and review workflows are not supported

Native browser viewers treat PDFs as read‑only content.

That means no:

  • Highlights or markup
  • Comments or threaded reviews
  • Drawing or ink annotations
  • Persistent collaboration metadata

As a result, PDFs displayed via iframe can’t participate in review, approval, or collaboration workflows, making them unsuitable for many professional use cases.

4. Security and access controls are weak by design

Embedding a PDF with an iframe does not secure the document.

  • Files remain accessible via direct URL
  • Download and print controls are easily bypassed
  • There’s no viewer‑level authentication or authorization
  • No reliable way to track document access or interaction

For internal tools, regulated industries, or customer portals, these gaps create real risk.

5. Performance and mobile usability degrade at scale

Iframe embedding doesn’t scale well as documents grow larger or usage shifts to mobile:

  • Large PDFs cause slow initial loads or frozen tabs
  • Entire documents often load before any page is visible
  • Memory usage spikes on low‑end devices
  • Touch gestures and scrolling behavior are inconsistent

What works for a five‑page PDF quickly falls apart for a 200‑page report.

What developers actually need from a PDF experience

At this point, the issue becomes clear: iframe embedding solves display, not interaction.

Modern applications need PDF experiences that support:

  • Reliable, document‑wide search
  • Structured navigation (TOC, bookmarks, thumbnails)
  • Annotations, comments, signatures, and form fields
  • Fine‑grained control over downloads and printing
  • Progressive loading and responsive performance
  • Consistent behavior across browsers and devices

These capabilities move PDF handling from a browser feature into an application feature.

The next section shows how Syncfusion JavaScript PDF Viewer addresses each of these gaps directly.

How Syncfusion JavaScript PDF Viewer addresses iframe limitations

The JavaScript PDF Viewer gives you full control over how PDFs are displayed, searched, navigated, and interacted with across desktop and mobile browsers. Here’s how it addresses each iframe limitation, feature by feature.

1. Built-in bookmark navigation, page thumbnails, and toolbar controls

  • Bookmark navigation: A bookmark (TOC) panel that renders document bookmarks as a navigable sidebar, letting you jump directly to any section without linear scrolling.
  • Page thumbnails: The page thumbnail panel gives single-click visual access to any page, which is particularly useful for large, image-heavy documents where page numbers aren’t meaningful.
  • Toolbar navigation: The toolbar includes first/last page, previous/next page, and go-to-page-number controls, covering every navigation pattern you need during review workflows.
Page thumbnails and bookmark navigation in the JavaScript PDF Viewer

You shouldn’t rely on manual scrolling to navigate long PDFs. Explore bookmark, thumbnail, and page navigation in our live demo.

2. Reliable full-text search

  • Advanced full-text search: Offers integrated text search without requiring additional configuration or custom code.
  • Document-wide search: The search scans the entire PDF content, not just visible pages.
  • Inline highlighting: Matching keywords are highlighted directly within pages for quick discovery.
  • Direct navigation to results: Search results jump you to exact page locations, eliminating manual scrolling.
  • Asynchronous search mode: For extremely large documents, asynchronous text search keeps the UI responsive while the search runs in the background.
  • Reliable at scale: Search performs consistently even on large documents where browser‑native viewers fall short.
Full-text search across PDF documents

If you are downloading PDFs just to use Ctrl+F, the problem is with the viewer, not the document. See full-text search in the Syncfusion JavaScript PDF Viewer demo.
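
To make the idea behind asynchronous search concrete, here is a minimal sketch of searching page text in small batches so the caller can pause between batches and keep the UI responsive. It is a generic illustration, not Syncfusion's implementation: the `searchInBatches` function, the `pages` array of extracted page text, and the case-insensitive matching are all assumptions made for the example.

```javascript
// Hypothetical sketch (not Syncfusion's implementation): search page texts in
// batches. Each `yield` hands control back so the caller can resume later,
// for example on the next timer tick or animation frame.
function* searchInBatches(pages, query, batchSize = 2) {
  const needle = query.toLowerCase(); // case-insensitive matching is an assumption
  if (needle === "") return; // an empty query matches nothing
  for (let start = 0; start < pages.length; start += batchSize) {
    const batchMatches = [];
    const end = Math.min(start + batchSize, pages.length);
    for (let pageIndex = start; pageIndex < end; pageIndex++) {
      const text = pages[pageIndex].toLowerCase();
      let offset = text.indexOf(needle);
      while (offset !== -1) {
        batchMatches.push({ page: pageIndex + 1, offset }); // 1-based page number
        offset = text.indexOf(needle, offset + needle.length);
      }
    }
    yield batchMatches; // the caller decides when to continue
  }
}

// Usage: collect every match from three sample pages.
const samplePages = ["Intro to PDF", "No match here", "pdf and more PDF"];
const allMatches = [...searchInBatches(samplePages, "pdf")].flat();
console.log(allMatches.length); // 3
```

In a browser, the caller would typically pull one batch per timer tick or idle callback instead of spreading the generator in one go.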

3. Annotations, comments, signatures, and form fields

  • Annotations: Built-in highlight, underline, strikethrough, freehand ink, and shape annotations, plus other markup, work without custom implementation, which iframe embedding cannot offer.
  • Comments and review: The comments panel supports threaded discussions linked to specific document locations, enabling collaboration during review.
  • Annotation persistence: Annotation data can be exported and imported across sessions, so review cycles don’t break when you close and reopen the viewer. Basic iframe PDF embedding offers no equivalent.
  • Signatures: Our SDK supports drawn, text, and image-based signature annotations to enable approval and sign-off workflows directly within the viewer.
  • Form fields: The built-in form designer toolbar lets you create form fields, including textboxes, checkboxes, radio buttons, list boxes, dropdowns, and signature fields. Validation is also supported.
Annotations, comments, and form field interactions

Your document review cycle doesn’t need a separate annotation or form tool; everything your reviewers need is already in our PDF Viewer SDK. Explore the live annotation and form designer demos!

4. Secure download and print controls

  • No download and no print: You can disable the Download and Print options and hide their toolbar buttons.
  • Toolbar customization: Offers flexible toolbar customization for developers to expose only the controls appropriate for their workflow. Remove anything that doesn’t belong.
  • In-app viewing: The entire process stays inside your application, and files do not require a local download to be accessible.

The image below shows the print and download options disabled in the JavaScript PDF Viewer.

Disabled print and download options in JavaScript PDF Viewer
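
As a rough sketch of how such a restricted configuration might be assembled, the helper below builds an options object with download and print switched off and their toolbar entries filtered out. The helper name `buildRestrictedViewerOptions` is hypothetical, and the option names (`enableDownload`, `enablePrint`, `toolbarSettings.toolbarItems`) and toolbar item strings are assumptions based on Syncfusion's published API; check them against the API reference for your version before relying on them.

```javascript
// Hypothetical helper: builds an options object intended for the PdfViewer
// constructor. Option names and toolbar item strings are assumptions; verify
// them against the Syncfusion API reference for your release.
function buildRestrictedViewerOptions(documentPath, resourceUrl) {
  const candidateItems = [
    "PageNavigationTool",
    "MagnificationTool",
    "PanTool",
    "SelectionTool",
    "SearchOption",
    "PrintOption",
    "DownloadOption",
  ];
  const blocked = new Set(["PrintOption", "DownloadOption"]);
  return {
    documentPath,
    resourceUrl,
    enableDownload: false, // assumed option: turns off the download feature
    enablePrint: false, // assumed option: turns off the print feature
    toolbarSettings: {
      showTooltip: true,
      // Expose only the controls that belong in this workflow.
      toolbarItems: candidateItems.filter((item) => !blocked.has(item)),
    },
  };
}

const restrictedOptions = buildRestrictedViewerOptions(
  "https://cdn.syncfusion.com/content/pdf/pdf-succinctly.pdf",
  "https://cdn.syncfusion.com/ej2/33.1.44/dist/ej2-pdfviewer-lib"
);
// In a browser page that has loaded ej2.min.js, this object would be passed as:
// new ej.pdfviewer.PdfViewer(restrictedOptions).appendTo('#PdfViewer');
```

Keeping the allowed items in one list makes it easy to review exactly which controls a given workflow exposes.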

5. Tile-based rendering and progressive loading

  • Tile-based rendering: Loads only the visible area of a page, so image-heavy PDFs display quickly.
  • Progressive loading: Loads only the pages in the viewport, which significantly reduces time-to-first-page compared to a full-document load.
  • Responsive design: Keeps scrolling smooth and the layout responsive through settings such as zoom optimization, zoom level, and initial page rendering count.
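
The core idea of tile-based rendering can be sketched in a few lines: split a page into fixed-size tiles and render only the ones that intersect the viewport. This is a generic illustration under assumed names (`visibleTiles`, `page`, `viewport`, `tileSize`), not the viewer's actual internals.

```javascript
// Hypothetical sketch: list the tiles of a page that intersect the viewport.
// All coordinates are in page pixels; tileSize is the edge length of a tile.
function visibleTiles(page, viewport, tileSize) {
  // Clip the viewport rectangle to the page bounds.
  const left = Math.max(0, viewport.x);
  const top = Math.max(0, viewport.y);
  const right = Math.min(page.width, viewport.x + viewport.width);
  const bottom = Math.min(page.height, viewport.y + viewport.height);
  const tiles = [];
  if (right <= left || bottom <= top) return tiles; // viewport misses the page

  const firstCol = Math.floor(left / tileSize);
  const lastCol = Math.ceil(right / tileSize) - 1;
  const firstRow = Math.floor(top / tileSize);
  const lastRow = Math.ceil(bottom / tileSize) - 1;
  for (let row = firstRow; row <= lastRow; row++) {
    for (let col = firstCol; col <= lastCol; col++) {
      tiles.push({ row, col });
    }
  }
  return tiles;
}

// Usage: a 1000x2000 page has 4 x 8 = 32 tiles at 256px, but this viewport
// touches only 6 of them.
const needed = visibleTiles(
  { width: 1000, height: 2000 },
  { x: 0, y: 300, width: 600, height: 400 },
  256
);
console.log(needed.length); // 6
```

Everything outside that short list can be skipped until the user scrolls or zooms, which is where the savings on large pages come from.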

6. Cross-browser and mobile compatibility out of the box

  • No browser-specific tuning: The viewer behaves consistently without custom rendering logic across different browsers.
  • Mobile-friendly UI: Toolbars and controls are adapted for touch-first mobile interactions.

Try the live demos of the JavaScript PDF Viewer. Upload a PDF and try freehand drawing, annotations, signatures, navigation, and search to experience the full PDF capability set.

Embed the Syncfusion JavaScript PDF Viewer in your website

Embedding a JavaScript PDF Viewer in your website is straightforward and requires only a few setup steps.

  • Install the PDF Viewer package: Add the Syncfusion JavaScript PDF Viewer npm package and include the required CSS files in your project.
  • Import modules: Import the PDF Viewer and Toolbar modules from our PDF Viewer package into your application.
  • Initialize the PDF Viewer: Create a new PdfViewer instance in your page’s script, setting the documentPath and resourceUrl properties.

Here’s a complete example page that loads the viewer from the CDN:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Essential JS 2</title>
    <!-- Placeholder to add Essential JS 2 material theme -->
    <!-- Essential JS 2 PDF Viewer's script --> 
    <script src="https://cdn.syncfusion.com/ej2/33.2.3/dist/ej2.min.js" type="text/javascript"></script>
    <script src="https://cdn.syncfusion.com/ej2/syncfusion-helper.js" type ="text/javascript"></script>
</head>
<body>
    <!--element which is going to render-->
    <div id="container">
        <div id="PdfViewer" style="height:580px;width:100%;"></div>
    </div>
    <script>                
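        // Initialize the PDF Viewer component.
        // Note: resourceUrl points at release 33.1.44 while the script tag above loads 33.2.3;
        // keeping both on the same release is generally advisable.
        var pdfviewer = new ej.pdfviewer.PdfViewer({
            documentPath: 'https://cdn.syncfusion.com/content/pdf/pdf-succinctly.pdf',
            resourceUrl: 'https://cdn.syncfusion.com/ej2/33.1.44/dist/ej2-pdfviewer-lib'
        });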
        ej.pdfviewer.PdfViewer.Inject(
            ej.pdfviewer.TextSelection,
            ej.pdfviewer.TextSearch,
            ej.pdfviewer.Print,
            ej.pdfviewer.Navigation,
            ej.pdfviewer.Toolbar,
            ej.pdfviewer.Magnification,
            ej.pdfviewer.Annotation,
            ej.pdfviewer.FormDesigner,
            ej.pdfviewer.FormFields,
            ej.pdfviewer.PageOrganizer
        );

        //PDF Viewer control rendering starts
        pdfviewer.appendTo('#PdfViewer');
    </script>       
</body>
</html>
Secure in-app PDF viewing in the JavaScript PDF Viewer

Want the full setup guide? Follow our JavaScript PDF Viewer documentation.

Frequently Asked Questions

Can I upload image-type signatures in approval workflow PDFs?

Yes, our JavaScript PDF Viewer supports drawn, text, and image-based signatures, so users can add a customized signature, including an uploaded image.

Does Syncfusion provide support for rearranging or editing PDF pages in study materials?

Yes, the viewer has built-in options to organize pages, such as rotating, rearranging, inserting, copying, and deleting PDF pages directly within the viewer. This makes Syncfusion a practical option for any workflow that involves page-level document editing.

Does Syncfusion PDF Viewer support hyperlink navigation within PDFs?

Yes, Syncfusion PDF Viewer supports hyperlink navigation with custom options. Developers can control this behavior through configuration, such as enabling or disabling hyperlink navigation and setting where links open (in the current tab or a new one).

Can I search the text in extremely large files without freezing the UI?

Yes. Syncfusion supports asynchronous text search, which runs the search operation in the background and keeps the viewer UI fully interactive while results are being found. This is particularly valuable for large multi-hundred-page documents.

Can sensitive information be redacted before sharing a PDF with a third party?

Yes. Syncfusion JavaScript PDF Viewer includes built-in redaction tools that permanently remove sensitive text, images, and page regions from PDF files. This is an irreversible operation; redacted content cannot be recovered, which is the correct behavior for compliance-driven use cases.

Can I save and restore form fields when the PDF workflow moves between different sessions?

Yes, Syncfusion supports exporting and importing form field data in FDF, XFDF, and JSON formats. Users can fill out a form, save their progress, and resume it in a later session without losing any data, a critical requirement for multi-step document workflows.

Conclusion

Thank you for reading! Iframe PDF embedding is suitable only for basic viewing. Once documents need to be searchable, reviewable, secure, and fast to load, browser-native PDF rendering reaches its limits. Modern web applications require predictable behavior, deeper interaction, and tighter control than iframes can offer.

If PDFs play a meaningful role in your product, the question isn’t how to embed them, but how to integrate them.

The JavaScript PDF Viewer replaces the limitations of iframe-based PDF embedding with a complete document-viewing and interaction experience, built for real-world usage. Search, bookmarks, thumbnails, annotations, form fields, secure access controls, and tile-based rendering all ship as part of one SDK, configurable for your specific workflow.

If you’re a Syncfusion user, you can download the setup from the license and downloads page. Otherwise, you can download a free 30-day trial.

You can also contact us through our support forum, support portal, or feedback portal for queries. We are always happy to assist you!

Seven Rules for Building an AI-Native Software Factory

Ewan Dawson is CTO of Compostable AI, where five engineers run an AI-native software factory: nineteen clients, custom AWS deployments, most of them shipped within a day of contract signing. This article is adapted from his recent Pulumi webinar, and covers the rules in more depth than we had time for on stage.

For the past twenty years, I’ve viewed software development as a craft. The best engineers drew on decades of experience to get every function right.

But two years into the agentic AI revolution, I realised software is going to look more like a factory than a craft. The economics have changed. We can’t treat code as bespoke anymore. To scale, we have to think industrial — use the tools to ship more value with fewer engineers.

I joined Compostable AI soon after it was founded 2.5 years ago, and I built the engineering org AI-native from day one. The technology has come a long way since then, and so has my understanding of what AI-native actually means. Here are seven rules I keep coming back to.

1. Transform, don’t enhance

Going AI-native isn’t an upgrade to your existing process. If you treat AI as a way to hand your developers smarter tools, you leave most of the value on the table. You get the leverage by rebuilding how you write software — and the culture and processes around it.

I know that’s a tall order for a large, mature engineering org. My advice: start small. Pick one team or one business area and run it as a fully AI-native function. Take what you learn and roll it out from there. And do the political work early, especially with your Governance, Risk, and Compliance function. Get GRC on your side early. Otherwise AI becomes a compliance fight instead of a structural advantage.

Don’t bolt AI onto your existing workflow. Redesign the workflow around what agents can do.

Most of the leverage in this technology comes from rebuilding around it. The tool change is the small part.

2. Remove the problem, don’t solve it

Going AI-native flips which problems are hard and which are easy. The right move often isn’t to engineer a solution. It’s to reframe the problem so it goes away.

Here’s an example. Serving multiple clients with agents writing the code, blast radius wasn’t a hypothetical. One bad agent run could trash a customer’s database, or leak one client’s data into another’s. Our instinct was to build a secure multi-tenant sandbox with guardrails, approvals, rollback. But every version we tried still had agents loose in a shared environment, one bug away from making one customer’s data visible to another’s. So we removed the problem: every client gets two dedicated AWS accounts, one for production and one “digital twin” staging account. Agents iterate on staging until the work checks out. Only then does it ship to production. We have nineteen accounts now, one per client.

Managing nineteen AWS accounts with five engineers used to be an administrative nightmare. When code is cheap, infrastructure-as-code tools like AWS Control Tower and Pulumi make it the easier path.
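
To show how little logic the isolation rule needs once it is structural, here is a small sketch that derives a production and staging account for each client and restricts agents to staging. Nothing here calls AWS or Pulumi; the function names, the account naming scheme, and the client IDs are hypothetical.

```javascript
// Hypothetical sketch: derive an isolated pair of accounts per client.
// The naming scheme and client IDs are illustrative assumptions.
function planClientAccounts(clientIds) {
  return clientIds.flatMap((clientId) => [
    { clientId, environment: "production", accountName: `${clientId}-production` },
    { clientId, environment: "staging", accountName: `${clientId}-staging` },
  ]);
}

// Agents iterate only in staging; production receives work that has already
// checked out there.
function agentMayTarget(account) {
  return account.environment === "staging";
}

const plan = planClientAccounts(["acme", "globex"]);
const agentTargets = plan.filter(agentMayTarget).map((account) => account.accountName);
console.log(agentTargets); // [ 'acme-staging', 'globex-staging' ]
```

Because each client's data lives in its own accounts, there is no shared environment for an agent to leak across, and the check above is a convenience, not the only safeguard.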

Remove the problem before you try to solve it.

It’s cheaper to reframe the problem than to engineer your way through it.

3. Pick tools your agents can drive

Removing problems is the process side. The other side is tooling. If you want an automated factory, your tech stack has to be something agents can drive. This overlaps a lot with tools that have great developer experience. If a tool has a robust API plus a clean CLI, agents can drive it. If it’s heavy click-ops around a web UI, agents stop there.

We didn’t get there first try. Our first IaC tool worked fine when we had a couple of clients. As we added more, accounts drifted, deployments slowed, retries got complicated. We needed something built for where we were heading.

I went looking, and Pulumi fit. We express infrastructure as type-safe code — TypeScript, in our case, rather than HCL — and agents are good at writing it. Pair that with Pulumi Neo — pre-loaded with domain-specific Pulumi skills — and we ship infrastructure that follows best practices. One of my colleagues put it: “The scary thing about Neo is it just seems to know everything about what we do.” Pulumi IaC plus Pulumi ESC for configuration beats stitching tools together. And TypeScript lets us build higher-level abstractions that keep the AWS account fleet tractable.

“I don’t actually care if it’s HCL or TypeScript, as long as my software development agents can write it. And they do a better job with TypeScript than HCL.”

Tools have to share your AI-native mindset. If they don’t integrate deeply, the human becomes the glue.

If part of your stack still requires a human to click through a web UI to provision an account, your agents stop there.

4. Don’t let one agent do everything

When I first started with agents, I reached for a god prompt: one massive system prompt meant to guide a single agent through the whole software lifecycle. It didn’t work. Agents struggle when you give them multiple goals. The writer is lenient on its own work — it won’t catch what it just shipped. You don’t want it reviewing the code, checking for security flaws, or hunting bugs.

We get better results from a constellation of specialized agents, each handling one part of the line. Pulumi Neo handles infrastructure. Alongside it sit agents specialized in:

  • Code implementation
  • Code review and testing
  • Security auditing
  • Internal standards compliance
  • Documentation updates

Tasks pass down the line. Clean code comes out the other end, with almost no human involved.
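
A line of specialists can be sketched as an ordered list of narrow stages that each receive the task and pass it on. The stage functions below are placeholders standing in for real agents, and every name in the example is hypothetical.

```javascript
// Hypothetical sketch: every stage is a stand-in for a specialized agent.
// Stage names mirror the list above; the `run` bodies are placeholders.
const stages = [
  { name: "implementation", run: (task) => ({ ...task, code: `// code for ${task.title}` }) },
  { name: "review-and-testing", run: (task) => ({ ...task, reviewed: true }) },
  { name: "security-audit", run: (task) => ({ ...task, securityChecked: true }) },
  { name: "standards-compliance", run: (task) => ({ ...task, standardsChecked: true }) },
  { name: "documentation", run: (task) => ({ ...task, docsUpdated: true }) },
];

// Pass the task down the line, recording which specialist handled each step.
function runLine(task, lineStages) {
  let current = { ...task, handledBy: [] };
  for (const stage of lineStages) {
    const handledBy = [...current.handledBy, stage.name];
    current = { ...stage.run(current), handledBy };
  }
  return current;
}

const finished = runLine({ title: "add health check endpoint" }, stages);
console.log(finished.handledBy.join(" -> "));
```

Keeping the writer and the checkers as separate stages is the point: no stage signs off on its own output.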

Don’t let any agent mark its own homework. Specialize by job.

Treat agents the way you’d treat a team. The one who writes the code shouldn’t be the one signing it off.

5. Measure human hours per unit of value

Once we had agents writing and agents reviewing, throughput went up — but the bottleneck moved past the PR. Engineering hours were still the most expensive thing in the building, so my core metric is human hours per unit of value produced. Minimize that.

That means hunting for every step that still goes through a person — especially the mid-pipeline steps between ideation and production. Automate the human touchpoints along that line, and the factory runs 24/7.
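
As a worked illustration of the metric, the sketch below sums the human hours across pipeline steps, divides by the units of value shipped, and points at the largest remaining touchpoint. The step names, hours, and unit count are made-up sample values, and what counts as a "unit of value" is left to the team to define.

```javascript
// Hypothetical sketch: step names, hours, and units are made-up sample values.
function humanHoursPerUnit(steps, unitsShipped) {
  if (unitsShipped <= 0) throw new RangeError("unitsShipped must be positive");
  const totalHours = steps.reduce((sum, step) => sum + step.humanHours, 0);
  return totalHours / unitsShipped;
}

// The step with the most human time is the next candidate for automation.
function largestTouchpoint(steps) {
  return steps.reduce((worst, step) => (step.humanHours > worst.humanHours ? step : worst));
}

const weeklySteps = [
  { name: "spec review", humanHours: 3 },
  { name: "PR approval", humanHours: 5 },
  { name: "deploy sign-off", humanHours: 2 },
];
console.log(humanHoursPerUnit(weeklySteps, 4)); // 2.5
console.log(largestTouchpoint(weeklySteps).name); // PR approval
```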

Pushing automation this hard also forces good engineering. A chaotic, undocumented process is impossible to automate. Good engineering is still good engineering, AI or not. Agents won’t fix a weak process.

Measure human hours per unit of value. Treat every one as a bottleneck to remove.

You can’t automate what you can’t describe. Every human in the pipeline marks a piece that hasn’t been described yet.

6. Design for convergence, not one-shot correctness

Even with the human touchpoints removed, the agents don’t ship right the first try. Once you embrace the factory pipeline, you stop needing them to. We design for convergence instead — a system that lands on the right answer through automated iteration.

The loop we run looks like this:

  1. Refinement: agents iterate on the Product Requirements Document until the problem is clear.
  2. Planning: agents draft multiple technical approaches, and evaluation agents pick the best one.
  3. Implementation: coding agents write the software.
  4. Review: specialized checking agents look for bugs, API misuse, and security flaws.

If the checkers find a problem, they hand it back to the implementation agent. The loop repeats until the tests pass and the agents agree on a clean PR. Once it converges, we merge and deploy to staging.

Two things have to be true. You need a way to evaluate the output. Without that, you don’t know when to stop. And the loop has to converge — each pass has to get closer. A checker that fails every PR for a different reason isn’t helping — it just keeps the work going in circles. The feedback has to narrow the search, not widen it.
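
The loop and its two conditions can be sketched as follows. The `implement` and `review` callbacks stand in for agent calls, and using a shrinking issue count as the signal that feedback is narrowing the search is an assumption made for this example; a real system would likely use richer signals.

```javascript
// Hypothetical sketch of an implement/review loop. `implement` and `review`
// are stand-ins for agent calls; issue count as a convergence proxy is an
// assumption of this example.
function runUntilConverged(spec, implement, review, maxPasses = 5) {
  let feedback = [];
  let previousIssueCount = Infinity;
  for (let pass = 1; pass <= maxPasses; pass++) {
    const candidate = implement(spec, feedback);
    const issues = review(candidate);
    if (issues.length === 0) {
      return { status: "converged", passes: pass, candidate };
    }
    if (issues.length >= previousIssueCount) {
      // Feedback is not narrowing the search; stop instead of circling.
      return { status: "stalled", passes: pass, issues };
    }
    previousIssueCount = issues.length;
    feedback = issues;
  }
  return { status: "exhausted", passes: maxPasses, issues: feedback };
}

// Usage with toy stand-ins: the implementer fixes one reported issue per pass.
const required = ["tests", "input validation", "docs"];
const completed = new Set();
const toyImplement = (spec, feedback) => {
  if (feedback.length > 0) completed.add(feedback[0]);
  return { spec, done: [...completed] };
};
const toyReview = (candidate) => required.filter((item) => !candidate.done.includes(item));

const outcome = runUntilConverged("add login endpoint", toyImplement, toyReview);
console.log(outcome.status, outcome.passes); // converged 4
```

The "stalled" exit covers the failure case described above: a checker that keeps finding as many problems as before is not moving the work toward a mergeable PR.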

Once it converges, the question moves on. How cheap can we make it? Lower the time to PR, reduce token count, drop the overall cost. The optimization never really ends.

Don’t aim for one-shot correctness. Design for convergence.

It doesn’t matter how many tries it takes, as long as the loop closes without a human in it. Get convergence first. The optimization comes after.

7. Run the factory in the cloud, not on a laptop

Even a converged factory has to live somewhere. Try running a fully automated factory on individual developers’ laptops, and it falls apart. Laptops are highly trusted machines. Put autonomous agents on them and your security posture drops, fast. And the factory has to run 24/7. Events come from elsewhere — PR comments, Slack threads, errors in test environments.

Cloud also kills configuration drift across a dozen developer machines. The same prompts run against different model versions, and env vars sit half-set on half the laptops. The thing you’re trying to optimize lives in different states across the team. Cloud isn’t just where the factory runs; it’s the only place a team can iterate on it together. Keep everything in one place — AWS, Pulumi Cloud, GitHub. The specific stack matters less than the principle of one place.

And the part that matters most: the factory keeps running, testing, and deploying long after we’ve closed our laptops and gone to sleep.

Build the factory somewhere you can work on it — not just somewhere it can run.

A factory scattered across laptops can’t be improved as a system. Cloud keeps it in one shape, 24/7, and lets the team iterate together.

Closing thought

I’ve shipped more code in the last two years than I did in the fifteen before that. Most of it in languages I couldn’t write by hand. And that’s after a stretch in leadership where I wrote almost none.

If you’re where I was two years ago: don’t ask how AI fits into what you already do. The factory is built one rule at a time, and it’s not a template — it’s the practice of finding where you’re taking advantage of the new economics and where you’re not, where your practices still need an update. The leverage is in finding these places and improving them.


Watch the original Pulumi webinar. Learn more about Compostable AI and Pulumi Neo.
