
Playwright vs. Selenium: A 2026 Architecture Review


1. Introduction: Beyond the API Surface

By 2026, the debate between Playwright and Selenium has largely moved beyond "syntax preference" or "language support." For the senior automation architect, the choice is no longer about whether you prefer driver.find_element or page.locator; it is a fundamental decision about infrastructure topology and protocol efficiency.

Historically, browser automation was viewed as "scripting": sending a command to click a button and waiting for a result. Today, it is a critical layer of distributed infrastructure. We run tests in ephemeral containers, scrape data behind aggressive WAFs, and automate complex authentications protected by hardware-bound credentials. In this high-stakes environment, the underlying architecture of your automation tool dictates its reliability, speed, and maintainability.

This review dismantles the internal mechanics of Selenium (including the W3C WebDriver BiDi evolution) and Playwright. We will analyze why architectural decisions made in 2004 still constrain Selenium today, and how Playwright’s "headless-first" event loop aligns with the reality of the modern web.

2. The Protocol Gap: HTTP vs. WebSockets

The single most defining difference between the two frameworks is the communication protocol used to drive the browser. This is not an implementation detail; it is the root cause of nearly every performance and stability difference between the two.

Selenium: The HTTP Request/Response Cycle

Selenium’s architecture is built on the W3C WebDriver standard, which is, at its core, a REST-like protocol over HTTP. Every single action in a Selenium script triggers a discrete HTTP request:

  1. Client: Sends POST /session/{id}/element (Find element)
  2. Driver: Receives request, translates to browser internal command, waits for browser, returns JSON response.
  3. Client: Parses JSON.
  4. Client: Sends POST /session/{id}/element/{id}/click (Click element)
  5. Driver: Receives request, executes click, returns JSON.

This "chatty" architecture introduces Control Loop Latency. Between every command, there is network overhead, serialization, and deserialization. In a local environment, this is negligible (milliseconds). In a distributed grid (e.g., running tests on Sauce Labs or BrowserStack from a CI runner), these round-trip times accumulate, creating a "stop-and-go" execution rhythm that is inherently slower and prone to race conditions.

Playwright: The Persistent WebSocket

Playwright abandons the request/response model for a single, persistent WebSocket connection (the Chrome DevTools Protocol, or CDP, for Chromium, with equivalent protocols for Firefox and WebKit). Once the connection is established, the channel remains open.

  • Bi-Directional: The browser can push events to the script (e.g., "network request failed" or "DOM node added") without the script asking for them.
  • Command Batching: Playwright can send multiple instructions down the pipe without waiting for HTTP handshakes.
  • Low Overhead: Binary data (like screenshots) streams efficiently without the massive Base64 overhead typical of JSON payloads.


3. Execution Models & The "Auto-Wait" Myth

Senior engineers often cite Playwright’s "Auto-Wait" as a key feature. However, understanding how it works architecturally explains why Selenium struggles to replicate it, even with "Explicit Waits."

Selenium: External Polling

When you use WebDriverWait in Selenium, the logic lives in your client script (Python/Java/C#). The script effectively spams the browser driver with HTTP requests:
"Is it visible? No. (Wait 500ms). Is it visible? No. (Wait 500ms). Is it visible? Yes."
This polling happens outside the browser process. It creates network noise and, crucially, a race-condition window: the element might flicker into visibility and back out between poll intervals, causing the test to fail or to interact with an element that is already detaching from the DOM.
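A sketch of that client-side polling with selenium-webdriver (the URL, selector, and 10-second timeout are illustrative): driver.wait evaluates its condition from the script’s side, so every check is another round-trip to the driver.

```typescript
import { Builder, By, until } from "selenium-webdriver";

(async () => {
  const driver = await new Builder().forBrowser("chrome").build();
  try {
    await driver.get("https://example.com/dashboard");
    // Client-side polling: locate the element, then re-check visibility
    // until it passes or 10 seconds elapse. Each check is an HTTP call.
    const widget = await driver.wait(until.elementLocated(By.css("#report")), 10_000);
    await driver.wait(until.elementIsVisible(widget), 10_000);
    await widget.click(); // the element may still detach between the last check and the click
  } finally {
    await driver.quit();
  }
})();
```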

Playwright: Internal Event Listeners

Playwright compiles your locator logic and injects it directly into the browser context via the CDP session. The "waiting" happens inside the browser's own event loop.
Playwright hooks into requestAnimationFrame and the browser’s painting cycle. It checks for element stability (is the bounding box moving?) and actionability (is it covered by a z-index overlay?) in the same render loop as the application itself. The command to "click" is only executed when the browser itself confirms the element is ready. This atomic "Check-and-Act" mechanism eliminates the race conditions inherent to external polling.
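The equivalent interaction in Playwright (TypeScript), with a placeholder URL and selector; no explicit wait is written because the actionability checks run inside the browser before the click is dispatched.

```typescript
import { chromium } from "playwright";

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com/dashboard");
  // click() retries internally until the element is attached, visible,
  // stable (bounding box not moving), and not covered by another element.
  await page.locator("#report").click();
  await browser.close();
})();
```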

4. Network Interception: Proxy vs. Native

In 2026, automation is rarely just about clicking buttons. It requires mocking API responses, injecting headers, and blocking analytics trackers.

Selenium historically required a "Man-in-the-Middle" (MITM) proxy (like BrowserMob) to intercept network traffic. This added a massive point of failure: certificate trust issues, decreased throughput, and complex infrastructure setup. While Selenium 4+ introduced NetworkInterceptor, it is a patchwork implementation on top of the WebDriver protocol, often limited in granularity and prone to compatibility issues across different browser drivers.

Playwright gains network control for free via its architecture. Because it communicates via CDP (or the Firefox/WebKit equivalents), it sits between the browser’s network stack and the rendering engine. It can pause, modify, or abort requests natively without a proxy server. This allows for:

  • Zero-latency mocking: The request never leaves the browser process.
  • Reliability: No SSL certificate installation required on the host machine.
  • Granularity: Routing logic (glob patterns, regex) is evaluated instantaneously.
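A minimal sketch of that native routing, assuming a placeholder app at example.com and a hypothetical /api/users endpoint: one route blocks analytics requests, the other fulfills an API call with mock data, all without a proxy.

```typescript
import { chromium } from "playwright";

(async () => {
  const browser = await chromium.launch();
  const context = await browser.newContext();
  // Handled inside the browser process: no MITM proxy, no certificates.
  await context.route("**/analytics/**", (route) => route.abort());
  await context.route("**/api/users", (route) =>
    route.fulfill({
      status: 200,
      contentType: "application/json",
      body: JSON.stringify([{ id: 1, name: "Test User" }]),
    })
  );
  const page = await context.newPage();
  await page.goto("https://example.com/dashboard");
  await browser.close();
})();
```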

5. Ecosystem & The "Selenium 5" Promise

As of 2026, Selenium 5 has fully embraced WebDriver BiDi, a standardized effort to bring bi-directional communication (WebSockets) to the WebDriver standard. This is Selenium’s answer to Playwright.

The Reality of BiDi:
While BiDi allows Selenium to receive events (like console logs) without polling, it is fundamentally an "add-on" to a legacy architecture. The vast ecosystem of Selenium Grids, cloud providers, and existing test suites relies on the HTTP Request/Response model. Migrating a massive Selenium codebase to utilize BiDi features often requires significant refactoring, bringing the effort close to parity with a full migration to Playwright.

Playwright’s Advantage:
Playwright was designed after the Single Page Application (SPA) revolution. Its "Browser Context" model—which allows spinning up hundreds of isolated, incognito-like profiles within a single browser process—is an architectural leap over Selenium’s "One Driver = One Browser" resource-heavy model. This makes Playwright exponentially cheaper to run at scale in containerized environments (Kubernetes/Docker).

6. Decision Matrix: Choosing in 2026

When should you stick with the incumbent, and when should you adopt the challenger?

Feature          | Selenium (w/ BiDi)           | Playwright
-----------------|------------------------------|------------------------------------
Primary Protocol | HTTP (RESTful)               | WebSocket (event-driven)
Wait Mechanism   | External polling             | Internal event loop (rAF)
Language Support | Java, C#, Python, Ruby, JS   | TS/JS, Python, Java, .NET
Legacy Browsers  | Excellent (IE Mode support)  | Non-existent (modern engines only)
Mobile Support   | Appium (native apps)         | Experimental / web only
Scale Cost       | High (1 process per test)    | Low (contexts per process)

Stick with Selenium if:

  • You require testing on legacy browsers (IE11) or specific older versions of Chrome/Firefox.
  • You are integrated heavily with Appium for native mobile testing.
  • Your team is strictly Java/C# and prefers a synchronous, blocking API style.

Migrate to Playwright if:

  • You are testing modern SPAs (React, Vue, Angular) where component re-rendering causes flakiness in Selenium.
  • You need high-performance scraping or data extraction (Network interception is critical).
  • You want to reduce CI/CD infrastructure costs by 30-50% via browser contexts.

7. Conclusion

In 2026, Playwright is not just a "better Selenium"—it is a different species of tool. By coupling tightly with the browser internals via WebSockets, it removes the layers of abstraction that caused a decade of "flaky tests."

Selenium remains a titan of interoperability and standard compliance. Its W3C WebDriver standard ensures it will run on anything, forever. But for the engineering team tasked with building a reliable, high-speed automation pipeline for a modern web application, Playwright’s architecture offers the path of least resistance. It solves the hard problems of synchronization and latency at the protocol level, allowing you to focus on the test logic, not the sleep statements.


Why Every Developer Should Embrace “Reading Code” as Much as Writing It


When we talk about improving as a developer, most advice focuses on writing code: learning new frameworks, mastering algorithms, or optimizing performance. But one of the most underrated skills in programming isn’t writing—it’s reading code.

The Hidden Superpower

Think about it. Every day, developers spend hours working with code they didn’t write: legacy systems, open-source libraries, or teammate contributions. Being able to quickly understand someone else’s code is like having a superpower. It helps you debug faster, avoid introducing bugs, and even learn new patterns you hadn’t considered.

Yet, many developers skip this practice. We’re wired to solve problems by coding, not by reading. But reading code can teach you how others think, reveal idiomatic uses of a language, and expose you to clever techniques that you can later apply in your own projects.

Start Small, Read Daily

You don’t need to dive into huge codebases right away. Start with something small:

  1. Open-source projects on GitHub.
  2. Code snippets on StackOverflow.
  3. Even a teammate’s pull request.

As you read, ask questions: Why did they structure it this way? Could it be simpler? How does this function interact with the rest of the system?

This approach trains you to think like a developer before you type a single line of code.

The Debugging Bonus

Reading code is also the secret to better debugging. Often, the bug isn’t where you think it is. By systematically reading through the code, you can understand the flow, spot edge cases, and find issues before they turn into hours of frustration.

It’s like being a detective: the more you read, the more clues you pick up, and the faster you solve the mystery.

Learning Beyond Tutorials

Tutorials and courses are great for learning syntax, but real growth comes from reading real-world code. You’ll see patterns, anti-patterns, trade-offs, and compromises that tutorials never teach. Over time, your own code starts to look cleaner, more maintainable, and more efficient because you’ve absorbed best practices by osmosis.

A Habit Worth Building

Try setting aside 20–30 minutes a day to read someone else’s code. Treat it like reading a book: analyze, reflect, and learn. Pair it with your coding time, and you’ll notice subtle improvements in your speed, design sense, and problem-solving skills.


Writing Docs in a World Where LLMs Are the Readers


Writing documentation is no longer just for humans.

Developers still read it, but AI reads it too. Large language models scan, summarize, and even generate code from your docs.

This shift does not make documentation less important; it changes how we write and structure it. In this article, you’ll learn how llms.txt helps your docs work for both humans and machines.

1. How Google Cloud Is Adapting Documentation for AI

The Google Cloud Developer Experience team focuses on one goal: helping developers move from learning to launching as fast as possible.

As Google Cloud services grew, keeping documentation accurate and up to date became harder. Developers expect quick, correct answers. If docs fall behind, adoption suffers.

Google Cloud did not replace technical writers.

They augmented them with AI.

Generative AI is now part of their documentation workflow. It helps with formatting, markup translation, and validation. Some docs are even tested automatically by running the documented steps in real environments.

Documentation is treated like code: generated, tested, and continuously improved.

You may not work at Google Cloud scale, but the same pressures already exist in many teams today.

2. Documentation Is No Longer Read Only by Humans

Developers still read documentation. But very often, AI reads it first.

Today, developers:

  • Ask AI to generate and debug code
  • Let AI research APIs and tools
  • Paste full documentation pages into prompts

Human readers are still important. But LLMs are now a primary consumer of documentation.

Documentation is no longer just read by humans. It is consumed by LLMs.

That reality changes how docs should be structured and published.

3. Tech Writers Do Not Compete With AI. They Enable It.

It is easy to worry that AI will replace tech writers.

In practice, the opposite is happening.

AI can generate text quickly. It cannot decide what matters, what is correct, or how concepts should be structured.

Tech writers provide that structure.

Tech writers do not need to compete with AI. They need to organize knowledge so AI can use it correctly.

This shift moves the role from writing pages to designing knowledge systems.

One common way to do this is by providing AI tools with a structured, machine-readable version of your docs. This is where llms.txt comes in.

4. What llms.txt Is and What It Is Not

llms.txt is a machine-readable version of your documentation. It is usually written in Markdown and designed for AI tools and LLMs.

Think of it as a translation layer:

  • Your main documentation stays human-friendly
  • llms.txt gives AI a clean and structured view of the same content

A good llms.txt file often includes:

  • Core concepts and terminology
  • API overviews and constraints
  • Authentication and environment assumptions
  • Canonical examples
  • Known limitations and edge cases

What it is not is just as important.

This does not replace documentation.

It protects it.

By giving AI its own context file, you avoid turning human docs into prompt-shaped content. Human readers get clarity. AI tools get structure.

5. Make llms.txt Auto-Generated

One key lesson from Google Cloud is automation.

Their documentation is generated, validated, and tested continuously. llms.txt should follow the same idea.

Best practice is to auto-generate it whenever documentation changes.

Practical guidance:

  1. Generate llms.txt as part of your docs build process
  2. Regenerate it on every docs edit
  3. Preserve headings, code blocks, links, and examples in Markdown
  4. Add simple checks to ensure the file is complete
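As a minimal sketch of steps 1–4, assuming your Markdown sources live in a docs/ folder and your site serves static files from public/ (both paths are assumptions), a Node build step might look like this:

```typescript
// build-llms-txt.ts: hypothetical docs-build step that regenerates llms.txt
import { readdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

const DOCS_DIR = "docs";          // assumption: Markdown sources live here
const OUTPUT = "public/llms.txt"; // assumption: published at /llms.txt

// Collect every Markdown page, preserving headings, code blocks, and links.
const sections = readdirSync(DOCS_DIR)
  .filter((file) => file.endsWith(".md"))
  .sort()
  .map((file) => readFileSync(join(DOCS_DIR, file), "utf8").trim());

// Simple completeness check (step 4): fail the build rather than publish an empty file.
if (sections.length === 0) {
  throw new Error("llms.txt generation found no Markdown sources");
}

writeFileSync(OUTPUT, sections.join("\n\n---\n\n") + "\n");
```

Wire it into the docs build so it reruns on every edit (step 2), for example as an npm script invoked by CI.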

This matters because:

  • AI relies on fresh context
  • Manual updates drift over time
  • Automation keeps humans and AI aligned

One source of truth.

Two audiences.

No duplication.

6. Lessons from Google Cloud’s AI Code Systems

Google Cloud also applied AI to code samples.

They faced thousands of APIs, many languages, and constant change. Manual maintenance did not scale.

Their solution used AI systems that:

  • Generate samples from official API definitions
  • Review and refine results automatically
  • Test code before publishing

The lesson is simple.

AI works best when knowledge is structured, grounded, and validated.

That same principle applies to documentation. llms.txt provides that structure for AI tools.

7. How to Use llms.txt in Practice

For AI tools with limited capabilities that cannot fetch docs on their own, llms.txt is especially useful.

A simple workflow:

  1. Open docs.example.com/llms.txt
  2. Download/Copy the Markdown file
  3. Upload it into your AI coding tool
  4. Ask the tool to analyze, debug, or generate code using this context

This keeps AI output aligned with real documentation and real constraints.

8. Make It Easy to Find

For llms.txt to be useful, it must be visible.

Recommended approach:

  • Publish it at docs.example.com/llms.txt
  • Keep it in Markdown format
  • Add a visible button in your docs like “AI Context (llms.txt)”
  • Open it in a new tab

This is not a power-user trick.

It is basic documentation infrastructure.

9. Closing Thoughts

AI is not removing the need for tech writers.

It is raising expectations.

The work shifts from writing more pages to:

  • Structuring knowledge clearly
  • Designing systems that scale
  • Making documentation reliable for humans and machines

llms.txt is a small file, but it represents a real shift.

If you own documentation today, the question is not whether AI will read it.

It already does.

The real question is whether it is reading the right version.


What Are Over-The-Air Updates and Why They Matter for React Native


Imagine shipping a critical bug fix to your mobile app and having it reach users within minutes—not days or weeks waiting for App Store review. That's the power of Over-The-Air (OTA) updates, and it's changing how mobile developers ship software.

In this article, we'll explore what OTA updates are, how they work in React Native, and why they're becoming essential for modern mobile development. We'll also compare pricing between major OTA platforms to help you make an informed decision.

What Exactly Are OTA Updates?

Over-The-Air updates allow you to push new code directly to users' devices without going through the traditional app store submission process. Instead of uploading a new binary to Apple or Google, waiting for review, and hoping users update their apps, you can deploy changes instantly.

Think of it like deploying a web application. When you push changes to your website, users see them immediately on their next visit. OTA updates bring this same workflow to mobile apps.

What you can update OTA:

  • JavaScript code and logic
  • React components and screens
  • Styles and layouts
  • Images and assets bundled with your code

What you cannot update OTA:

  • Native code (Swift, Kotlin, Objective-C, Java)
  • Native dependencies and libraries
  • App permissions and capabilities
  • App Store metadata (name, icon, screenshots)

How OTA Updates Work in React Native

React Native apps consist of two parts: the native shell (compiled code that runs on the device) and the JavaScript bundle (your React components, business logic, and UI code).

When you build a React Native app, your JavaScript code gets bundled into a single file that the native shell loads at runtime. Here's where OTA updates come in:

  1. Initial Install: User downloads your app from the App Store. The app contains a native shell and a JavaScript bundle.

  2. Update Check: When the app launches, it checks a remote server for new JavaScript bundles.

  3. Download: If a new bundle is available and compatible with the native shell, it downloads in the background.

  4. Apply: The new bundle replaces the old one, either immediately or on the next app restart.

  5. Rollback Safety: If the new bundle crashes, most OTA systems automatically rollback to the previous working version.

┌─────────────────────────────────────────────────────────────┐
│                     User's Device                           │
│  ┌─────────────────┐     ┌─────────────────────────────┐    │
│  │   Native Shell  │ ←── │  JavaScript Bundle (OTA)    │    │
│  │   (App Store)   │     │  - React Components         │    │
│  │                 │     │  - Business Logic           │    │
│  └─────────────────┘     │  - Styles & Assets          │    │
│                          └─────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                                    ↑
                                    │ Download new bundle
                                    │
                          ┌─────────────────┐
                          │   OTA Server    │
                          │   (Turbopush)   │
                          └─────────────────┘

This architecture is what makes React Native uniquely suited for OTA updates. Unlike fully native apps where every change requires recompilation, React Native's JavaScript-based approach allows for dynamic code updates.
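To illustrate steps 2–4, here is a hypothetical update check in TypeScript. It is not the API of Turbopush or any other SDK; the manifest URL, field names, and version-matching rule are all assumptions made for the sketch.

```typescript
// Hypothetical OTA client logic; real SDKs wrap this (plus signing,
// caching, and rollback) behind a one-line install.
type Manifest = {
  bundleVersion: string;          // version of the JS bundle on the server
  bundleUrl: string;              // where to download it from
  requiredNativeVersion: string;  // native shell this bundle was built against
};

const NATIVE_VERSION = "1.4.0"; // baked into the binary shipped via the store

export async function checkForUpdate(
  currentBundleVersion: string,
  manifestUrl: string
): Promise<Manifest | null> {
  const response = await fetch(manifestUrl);
  if (!response.ok) return null; // fail open: keep running the current bundle

  const manifest: Manifest = await response.json();
  // Step 3: only accept bundles built for this native shell (real systems
  // typically use semver ranges rather than exact equality).
  const compatible = manifest.requiredNativeVersion === NATIVE_VERSION;
  const isNewer = manifest.bundleVersion !== currentBundleVersion;

  return compatible && isNewer ? manifest : null; // download/apply happens elsewhere
}
```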

Why OTA Updates Matter

1. Speed to Market

App Store review can take anywhere from 24 hours to several days. For critical bug fixes, that delay can mean lost users, revenue, or even legal issues. OTA updates bypass this entirely—deploy a fix in seconds, not days.

2. Reduced User Friction

Users don't always update their apps. Some have automatic updates disabled, others ignore update notifications, and many simply forget. With OTA updates, users get the latest version automatically, ensuring everyone runs your best code.

3. A/B Testing and Feature Flags

Want to test a new feature with 10% of users before rolling it out to everyone? OTA platforms often include rollout controls that let you gradually deploy changes and rollback instantly if something goes wrong.

4. Cost Efficiency

Every app store submission has costs: developer time for testing, QA cycles, and the opportunity cost of waiting. OTA updates streamline this process, letting your team focus on building features instead of managing releases.

5. Emergency Response

When a critical bug hits production, every minute counts. OTA updates let you deploy a hotfix immediately, without waiting for app store approval or hoping users update manually.

What is Turbopush?

Turbopush is a modern Over-The-Air update platform built specifically for React Native applications. Whether you're using bare React Native or Expo, Turbopush integrates seamlessly with your existing workflow.

Key features:

  • Universal React Native support — Works with bare React Native, Expo, and the New Architecture (Fabric + TurboModules)
  • Gradual rollouts — Release to 10%, 50%, or any percentage of users before going full
  • Instant rollbacks — Revert problematic updates with a single command
  • Rich analytics — Track install rates, version adoption, and user metrics in real-time

Pricing Comparison: Turbopush vs Expo Updates

When choosing an OTA platform, pricing is a crucial factor. Let's compare Turbopush and Expo Updates (EAS Update) across different app sizes.

Understanding the Pricing Models

Expo Updates charges based on Monthly Active Users (MAUs)—the number of unique users who download at least one update during a billing period. Multiple downloads by the same user count as a single MAU.

Turbopush charges based on updates delivered—each time a user downloads an update, it counts as one update. This model is often more cost-effective because you only pay for actual usage.

Expo Updates Pricing

Plan       | Monthly Cost | MAUs Included
-----------|--------------|---------------
Free       | $0           | 1,000 MAUs
Starter    | $19          | 3,000 MAUs
Production | $199         | 50,000 MAUs
Enterprise | $1,999       | 1,000,000 MAUs

Overage pricing: If you exceed your plan's MAU limit, Expo charges per additional MAU on a tiered scale starting at $0.005/MAU for the first 197K extra, decreasing to $0.00085/MAU at scale (100M+).

Turbopush Pricing

Plan     | Monthly Cost | Releases  | Updates
---------|--------------|-----------|----------
Free     | $0           | 20        | 2,500
Startup  | $15          | 100       | 50,000
Growth   | $40          | 400       | 1,000,000
Business | $90          | Unlimited | Unlimited
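As a quick sanity check on the scenarios below, the two published tier tables can be turned into a rough cost function. This is a sketch that only models the first Expo overage tier quoted above and ignores storage and bandwidth.

```typescript
// Rough monthly cost per the tiers quoted in this article.
function expoMonthlyCost(maus: number): number {
  if (maus <= 1_000) return 0;  // Free
  if (maus <= 3_000) return 19; // Starter
  // Cheaper of: Starter + overage, or Production + overage ($0.005 per extra MAU)
  const starter = 19 + (maus - 3_000) * 0.005;
  const production = 199 + Math.max(0, maus - 50_000) * 0.005;
  return Math.min(starter, production);
}

function turbopushMonthlyCost(updatesDelivered: number): number {
  if (updatesDelivered <= 2_500) return 0;      // Free
  if (updatesDelivered <= 50_000) return 15;    // Startup
  if (updatesDelivered <= 1_000_000) return 40; // Growth
  return 90;                                    // Business (unlimited)
}

// Scenario 1 below: 10,000 MAUs vs. 30,000 updates delivered.
console.log(expoMonthlyCost(10_000));      // 54
console.log(turbopushMonthlyCost(30_000)); // 15
```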

Real-World Scenarios

Let's analyze three realistic scenarios to see how costs compare:

Scenario 1: Early-Stage Startup

10,000 active users, 3 releases per month

With Expo Updates:

  • 10,000 MAUs exceeds Starter plan (3,000 MAUs)
  • Starter + overage: $19 + (7,000 × $0.005) = $54/month

With Turbopush:

  • 10,000 users × 3 releases = 30,000 updates/month
  • Startup plan includes 50,000 updates
  • Required plan: Startup at $15/month
  • 20,000 updates remaining — room to grow or ship more releases

Platform       | Monthly Cost
---------------|-------------
Expo Updates   | $54
Turbopush      | $15
Annual Savings | $468

Scenario 2: Growing Product

20,000 active users, 3 releases per month

With Expo Updates:

  • 20,000 MAUs exceeds Starter plan (3,000 MAUs)
  • Starter + overage: $19 + (17,000 × $0.005) = $104/month
  • (Production at $199 would be more expensive)

With Turbopush:

  • 20,000 users × 3 releases = 60,000 updates/month
  • Growth plan includes 1,000,000 updates
  • Required plan: Growth at $40/month
  • 940,000 updates remaining — 94% headroom for growth

Platform       | Monthly Cost
---------------|-------------
Expo Updates   | $104
Turbopush      | $40
Annual Savings | $768

Scenario 3: Scaling Application

100,000 active users, 4 releases per month

With Expo Updates:

  • 100,000 MAUs exceeds Production plan (50,000 MAUs)
  • Production + overage: $199 + (50,000 × $0.005) = $449/month
  • (Starter + overage would be $504, so Production is better)

With Turbopush:

  • 100,000 users × 4 releases = 400,000 updates/month
  • Growth plan includes 1,000,000 updates
  • Required plan: Growth at $40/month
  • 600,000 updates remaining — room to 2.5x your user base

Platform       | Monthly Cost
---------------|-------------
Expo Updates   | $449
Turbopush      | $40
Annual Savings | $4,908

Why the Difference?

The key insight is that Turbopush's generous update limits mean you're paying for actual delivery, not user count. Even when accounting for Expo's overage pricing, a scaling app with 100,000 users pushing 4 releases per month pays $40/month with Turbopush vs $449/month with Expo—that's 91% savings.

The difference becomes even more significant at scale. Turbopush's Business plan at $90/month offers unlimited releases and updates, providing completely predictable costs regardless of how much you ship or how many users you have.

No Hidden Costs: Storage & Bandwidth Included

One often-overlooked aspect of OTA pricing is storage and bandwidth. With Expo Updates and other OTA platforms, you pay extra for these:

  • Storage: $0.05 per GB after your plan's limit (20 GB on Starter, 1 TB on Production)
  • Bandwidth: $0.10 per GB after your plan's limit (10 GB on Starter, 100 GB on Production)

These costs can add up quickly, especially if your app bundles assets or you're shipping frequent updates to a large user base. A single 5 MB update to 100,000 users consumes ~500 GB of bandwidth—that's $40 in bandwidth alone on top of your base plan.

Turbopush includes storage and bandwidth in all plans. No surprise charges, no metering your bundle sizes, no worrying about how many assets you're shipping. Your monthly cost is exactly what you see in your plan—nothing more.

Getting Started with Turbopush

Ready to add OTA updates to your React Native app? Getting started with Turbopush takes less than 30 minutes. Our step-by-step guide walks you through everything: creating your account, installing the SDK, configuring your app, and shipping your first update.

Whether you're using bare React Native or Expo, we have dedicated guides for each setup.

The entire setup typically takes less than 30 minutes, and the productivity gains are immediate.

Read the Getting Started Guide →

Conclusion

OTA updates represent a fundamental shift in how mobile apps are developed and deployed. By decoupling your JavaScript code from app store releases, you gain the speed and flexibility that web developers have enjoyed for years.

Whether you're fixing a critical bug, testing a new feature, or simply iterating faster, OTA updates remove friction from your development workflow. And with platforms like Turbopush offering generous pricing tiers, there's never been a better time to adopt this technology.

The question isn't whether you should use OTA updates—it's how quickly you can get started.

Ready to ship faster?

Happy shipping! 🚀


Learning AI


Very excited to say that I have purchased 3-4 books in preparation for the AI-102 certification from Microsoft. I will be creating another series here, distilling down what I learn, even while we keep the API series going.

This is the first certification I’ve ever tried for… no that’s a lie, I was a certified Xamarin developer. But beside that one, this is the first. I’ve never been sure that they are worthwhile (as an autodidact), but AI is so new and changing so fast and so bloody interesting that I couldn’t resist.

I’ll begin with the obvious question: which AI engine? The answer to that was easy; I’ve spent the past three decades working in what is now .NET, and I worked for Microsoft, and I currently work in an all-Microsoft shop. So Azure it is.

The Bits and Pieces

Azure offers a number of major AI services, including:

  • AI Search
  • Document Intelligence
  • Azure OpenAI Service
  • Vision
  • Speech
  • Language

Over the course of these blog posts, I hope to cover Large Language Models (LLMs) and the following AI sub-topics:

  • Retrieval-Augmented Generation (RAG)
  • Prompt Engineering
  • Natural Language Processing (NLP)

Generative AI

Generative AI is what all the excitement is about. Generative AI can create new content that it has never seen before—even content that has never existed before. Show it a bunch of books (including 29 of mine, thank you very much, Anthropic!) and after a while it can be an intelligent CoPilot™ for your work.

Programming with Generative AI, and especially Agentic Generative AI, is an incredible experience. It allows you to articulate what you want, and then it goes off and does it for you. Specifically, you can say something like (from the previous blog post):

Create an ASP.net API with a sqlserver* backend. SQLServer will have two tables: bookList and Authors, with the booklist id as a foreign key in Authors. Unit tests using xUnit and Moq. Set up Azurite to provide a message queue, have the GET endpoint create a durable function to listen to the message queue, and have POST use an Azure function to add records to the database. Create service classes for the logic and Repository classes for interacting with the Database

and it does. Fast. Incredibly, the code is usually good. I’d say it gets you about 80% of the way there, which is pretty fantastic. You can even tell it general design requirements (even coding styles, etc.) with a copilot-instructions.md file that you put into the .github directory.

The more context you can provide the better, where context includes the assumptions you make such as which unit testing framework to use, what you’ll be doing with this code, how it relates to other code CoPilot knows about, etc.

I’ll come back to Generative AI in future blog posts. I’ll also touch on ASI — Artificial SuperIntelligence, and I’ll point you to the book If Anyone Builds It, Everyone Dies. Which is an amazing title and a book that will scare your pants off.

Finally, I should mention that AI is the foundational technology for Vibe Coding. For more on what this is, see this video or this podcast of the video sound track. You may also want to check out Jeff Blankenburg’s series of blog posts, 31 Days of Vibe Coding which starts here.

More to come in the next post in this series.


* So, is it “a SQL” or “an SQL”? Since I pronounce it SeeQuill, I’ll go with “a SQL.”


MRTK3 apps on Apple Vision Pro - fixing the stuck to your head issue (and more)


Ah, Unity. The company that was instrumental in me being able to venture into Mixed Reality, the very embodiment of the Silicon Valley motto “move fast and break things”… with, unfortunately, emphasis on the second part.

In the early HoloLens days, I learned an important lesson quickly: one of the scariest things you can do is upgrade your app to a new Unity version - even just a point release could hose your app. As fellow Mixed Reality developer Oliver Vasvari noted in the comments on my August blog about running MRTK3 apps on Apple Vision Pro, my solution (based on Unity 6000.0.49) was still working on 6000.0.53, but fell apart on 6000.0.62. Specifically, the XR camera view was ‘stuck’ to your head - that is, every virtual object moved along with your head on the Vision Pro.

This kind of thing is unfortunately part of the life of a Unity Mixed Reality developer. So I set out to find if I could revive my solution. Spoiler - I could.

My starting point was my last Vision Pro MRTK port. I went for broke and did not just upgrade to the version where Oliver noticed the problem, but jumped to the newest Unity 6.3 version (which was 6000.3.2f1 at the time of writing).

Upgrade project to 6.3

Straightforward enough: open the solution with 6000.3.2f1. This apparently needs a URP upgrade:

Click OK, and then you will be greeted by another issue:

Open the Package Manager and you will notice my hacked 2.2.4 Vision OS XR Plugin is no longer accepted.

Uninstall it, and Unity will proceed to install the newest Vision Pro packages:

Upgrade MRTK3 (optional)

In my previous Vision Pro post I used MRTK3 4.0.0-pre.1, but in the meantime 4.0.0-pre.2 has been released. It would be a shame not to use all the good work done by Kurtis Eveleigh, the proud MRTK custodian. The MRTK Feature Tool has been abandoned, and never worked on the Mac anyway, so the only way to upgrade I can think of is to download the packages manually, and put them in the Packages/MixedReality folder. The quickest way to effectuate the upgrade is to open the manifest.json in a text editor, do a search & replace on pre.1, and replace it with pre.2. You can then, of course, remove the pre.1 tgz files.

Unity will then pop up this warning:

This you can safely ignore.

Fix nothing visible

The app can now be deployed on the Vision Pro, but when you run it, you will most likely see nothing. That is because the cubes are created 1.6 m above your head. To fix this, you will need to change the Y value of the Camera Offset in the MRTK XR Rig to 0.

This is normally not a problem, but apparently it is now. Whether it’s Unity that does not like this, the Vision Pro packages, or something else, I don’t know.

Fix view being stuck to camera

If you now run the app, the cubes will appear all right, but if you move your head, all the cubes move with your head instead of hanging in space after the initial spawn, as displayed in the GIF at the start of this post. This is one of the most peculiar changes we need to make:

As you can see, I have disabled the original Tracked Pose Driver and added a second one. This one has some hard-coded Action definitions instead of Action References. I got this wisdom by studying the Vision OS template v3.0.2. Why didn’t I use Action References and make a new Input Actions asset, like the default MRTK Default Input Actions, tailored for the Vision Pro? Believe me, I tried, but either I don’t understand those Input Action assets correctly, or it simply doesn’t work in the Unity 6.3/Vision Pro combo. However, this works. But it leads to a new problem.

Fix cubes appearing on floor

The cubes are now no longer stuck to your head, but instead of appearing in the view, they appear on the floor and even partially in the floor, with only one or two rows visible. The rest is invisible due to occlusion being applied to the Spatial Map. This issue, although seemingly very simple, took me quite some time to figure out. My best guess is: whatever layer Unity is putting on top of the Apple stuff - it apparently needs some time to initialize and figure out where the headset actually is before it properly sets the main camera’s transform position.

The simplest way to fix that was to change my own startup code in CubeManager from:

private void Start()
{
    audioSource = GetComponent<AudioSource>();
    CreateGrid();
}

to

private async Task Start()
{
    audioSource = GetComponent<AudioSource>();
    // Give the tracking layer a moment to settle before spawning the grid
    await Task.Delay(1000);
    CreateGrid();
}

Simply wait a second until the Unity Player gets its act together. I don’t know if this is optimal, but for my demo app, it does the trick.

Fix hand menu

Now the only thing missing is the hand menu, which does not appear when you hold up your hand. The root cause is still the same: the Vision Pro apparently does not track your palm, and that’s what the hand menu (and other parts of the MRTK) apparently relies on. So I went back to the trick Guillaume Vauclin from EADS, France, created - adapt the code of the VisionOSHandProvider.cs file so that it no longer reports that the palm cannot be tracked, but returns the wrist Pose instead. There is only one thing - I could not get Unity to accept a hacked version of the Apple Vision OS XR Plugin with a changed file inside it. So I went back to an old trick I used in November 2022 for patching the MRTK2 - replace the file in the Library folder after Unity has unpacked all the library files.

You will find the fixed VisionOSHandProvider.cs in the Patches folder, together with a small shell script I created (actually, I asked Claude to write it) that simply takes all C# files in the current folder, finds the location of each correspondingly named file in the Library folder, and replaces it:

#!/bin/bash

# Get the current directory
CURRENT_DIR="$(pwd)"
LIBRARY_DIR="$(pwd)/../Library"

# Check if Library directory exists
if [ ! -d "$LIBRARY_DIR" ]; then
    echo "Error: $LIBRARY_DIR directory not found"
    exit 1
fi

# Find all C# files in current directory
for cs_file in *.cs; do
    # Skip if no .cs files found
    if [ ! -f "$cs_file" ]; then
        echo "No C# files found in current directory"
        exit 0
    fi
    
    echo "Processing: $cs_file"
    
    # Find all matching files in Library and subdirectories
    while IFS= read -r target_file; do
        echo "  Replacing: $target_file"
        cp "$cs_file" "$target_file"
    done < <(find "$LIBRARY_DIR" -type f -name "$cs_file")
done

echo "Done!"

Make sure the project has been opened by Unity at least once and wait until it is fully done loading and installing all packages. Then open the Patches folder in Terminal and run the following commands:

chmod +x replace_files.sh
./replace_files.sh

You only need to run the first command the first time. Also, remember to run this script in your CI build scripts. In my previous jobs, I wrote CI scripts that did the following steps when a patch like this was needed:

  • Pull code
  • Open in Unity
    • basically do nothing
    • quit
  • Run patch script
  • Open in Unity again
    • run tests
    • perform actual build
    • etc

Concluding words

It’s remarkable how flexible and malleable MRTK3 is - I have got it to run on all Mixed Reality headsets I managed to get my hands on, with minor tweaks. This makes for a very consistent experience and retains as much of your investments in Mixed Reality as possible. Which is an important thing, given the HoloLens 2 deprecation.

Updated QuestBouncer project for Apple Vision Pro can be found here.
