Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Supreme Court won’t hear AI-generated art copyright case

Photo illustration of the Supreme Court building with pixelated sky.

The US Supreme Court has declined to hear a case over whether AI-generated art can obtain a copyright, as reported earlier by Reuters. The Monday decision comes after Stephen Thaler, a computer scientist from Missouri, appealed a court's decision to uphold a ruling that found AI-generated art can't be copyrighted.

In 2019, the US Copyright Office rejected Thaler's request to copyright an image, called A Recent Entrance to Paradise, on behalf of an algorithm he created. The Copyright Office reviewed the decision in 2022 and determined that the image doesn't include "human authorship," disqualifying it from copyright protection.

After Thaler …

Read the full story at The Verge.

Read the whole story
alvinashcraft
34 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

OpenAI Shares How They’re Turning Engineers into AI Team Leads


Six months ago, if someone had told me that engineers would start naming their AI agents and treating them like teammates, I probably would’ve rolled my eyes.

Honestly, even today, it still sounds a little… absurd.

That is, until I heard directly at the Pragmatic Summit in San Francisco what’s happening right now inside OpenAI.

Vijaye Raji and Thibaut Sottiaux from OpenAI say AI is shifting development from manual coding to guiding AI teams (setting goals and guardrails) while speeding up work and keeping core roles essential.

Close the laptop. Join the meeting. Come back to finished code. 

Raji (CTO of Applications at OpenAI) has been at OpenAI for only six months, and already he’s seen Codex go from just a tool, to an extension, to an agent – and now it actually feels like a teammate.

Inside OpenAI, they recently launched something called a Codex Box.

Basically, engineers can grab a dev box on the server, fire off prompts, and let the system run things in parallel while they just work from their laptop. Sounds amazing, right?

Photo by Ivan Brezak Brkan

Some engineers are using hundreds of billions of tokens per week across multiple agents – not for fun, but because that’s just how they build now. Raji said:

Software development inside OpenAI isn’t a single-threaded human loop anymore. It’s parallel. And that is going to become the new normal.

Designers and PMs are writing code. What’s going on?

Sottiaux (Engineering lead for Codex, OpenAI) described how the Codex team works today.

“It changes constantly. Almost week to week,” he said. “We look for bottlenecks, solve them, and then a new one pops up.”

At first, the slowest part was code generation, then it became code review, and now the friction often comes from understanding user needs faster – parsing feedback from Twitter, Reddit, and SDK experiments and turning that into product direction.

Speed up coding, and suddenly reviews become the bottleneck. Fix reviews, and CI/CD slows things down. That rhythm has become normal. Instead of debating every trade-off in design docs and discarding alternatives, teams try multiple implementations in parallel and focus on what actually works.

“Trying things is cheaper,” Sottiaux added. “So we try more things.”

And the roles? They’re blurring. Designers are shipping more code, PMs are writing and testing ideas, and it’s not that roles disappear – everyone’s capabilities are expanding.

Usually the problem is the prompt, not the system

What about long-running, autonomous tasks?

AI coding tools might seem like advanced autocomplete – type a few words, get a few lines back. Helpful, yes, but still reactive. Sottiaux challenged that:

Give the model a meaningful, well-defined objective, and it doesn’t just respond – it runs, for hours.

Inside OpenAI, the model runs on its own for hours, sometimes producing full reports. Engineers review the results, pick what works, and feed it back – this isn’t just suggestions anymore, it’s delegated execution.

There was also an unusually honest anecdote shared during the discussion: a researcher admitted that whenever he thought he was smarter than Codex, it turned out the problem was the prompt, not the system.

The bottleneck isn’t typing speed – it’s defining the goal clearly.

Photo by Ivan Brezak Brkan

AI tools accelerate work and shape AI-native engineers

During weekly analytics reviews, teams don’t assign follow-ups, they just trigger Codex threads. “Twenty minutes later, the answers are ready before the meeting even ends,” one leader said.

In high-severity incidents, Codex gets effectively paged into calls to help figure out what went wrong and suggest the fastest recovery. “It’s like having small consultants working quietly in parallel,” they added.

So what does this mean for junior engineers?

OpenAI is hiring new grads and running a strong internship program, believing the next generation will be AI-native and comfortable with these tools from day one.

At the same time, strong foundations, guardrails, and code reviews remain essential. As they put it, “Foundations will never go out of fashion.”

Engineers will guide AI teams, speeding up code without touching every line

Vijaye has spent more than two decades in the industry. He has lived through the rise of developer tools, the shift to higher-level abstractions, the mobile wave, and the social platform era. In his view, none of those transitions felt quite like this one.

What makes the current moment different isn’t just what the technology can do, it’s how quickly it is evolving. The speed of change, he suggested, is on another level entirely.

And Sottiaux expects that pace to accelerate even further.

In the near term, I anticipate another order-of-magnitude jump in development speed, enabled by networks of agents collaborating toward large, shared goals. Instead of a single assistant responding to prompts, entire clusters could work together on complex builds.

As systems get more complex, engineers stop checking every line of code and start setting constraints, guardrails, and validating outputs. It’s less about manual control and more about guiding the system, and working through a single assistant that coordinates all the agents behind the scenes.

Whether this ends up being the smartest leap in the industry or a step we rushed into too quickly, only time will tell.

The post OpenAI Shares How They’re Turning Engineers into AI Team Leads appeared first on ShiftMag.


What’s new at Stack Overflow: March 2026

All that was new on Stack Overflow last month, including the redesigned Stack Overflow now available in beta, open-ended questions now available to all users, and a shoutout to the community members earning the Populist badge.

Apple introduces the new iPad Air, powered by M4

Apple announced the new iPad Air featuring M4 and more memory, giving users a big jump in performance and making it more versatile than ever.


Apple introduces iPhone 17e

Apple today announced the new iPhone 17e, a powerful and affordable addition to the iPhone 17 lineup.


Why Capacity Planning Is Back


In a previous article, we outlined why GPUs have become the architectural control point for enterprise AI. When accelerator capacity becomes the governing constraint, the cloud’s most comforting assumption—that you can scale on demand without thinking too far ahead—stops being true.

That shift has an immediate operational consequence: capacity planning is back. Not the old “guess next year’s VM count” exercise, but a new form of planning where model choices, inference depth, and workload timing directly determine whether you can meet latency, cost, and reliability targets.

In an AI-shaped infrastructure world, you don’t “scale” as much as you “get capacity.” Autoscaling helps at the margins, but it can’t create GPUs. Power, cooling, and accelerator supply set the limits.

The return of capacity planning

For a decade, cloud adoption trained organizations out of multi-year planning. CPU and storage scaled smoothly, and most stateless services behaved predictably under horizontal scaling. Teams could treat infrastructure as an elastic substrate and focus on software iteration.

AI production systems do not behave that way. They are dominated by accelerators and constrained by physical limits, and that makes capacity a first-order design dependency rather than a procurement detail. If you cannot secure the right accelerator capacity at the right time, your architecture decisions are irrelevant—because the system simply cannot run at the required throughput and latency.

Planning is returning because AI forces forecasting along four dimensions that product teams cannot ignore:

  • Model growth: model count, version churn, and specialization increase accelerator demand even when user traffic is flat.
  • Data growth: retrieval depth, vector store size, and freshness requirements increase the amount of inference work per request.
  • Inference depth: multi-stage pipelines (retrieve, rerank, tool calls, verification, synthesis) multiply GPU time non-linearly.
  • Peak workloads: enterprise usage patterns and batch jobs collide with real-time inference, creating predictable contention windows.
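To make the interaction of these four dimensions concrete, here is a toy sketch of an accelerator-demand forecast. The function name, parameters, and every number below are illustrative assumptions, not a production planning model.

```python
# Hedged sketch: a toy weekly GPU-demand forecast combining the four
# forecasting dimensions (model growth, data growth, inference depth,
# peak workloads). All inputs are illustrative assumptions.

def gpu_seconds_per_week(
    requests_per_week: float,
    stages_per_request: int,       # inference depth: retrieve, rerank, verify, ...
    gpu_seconds_per_stage: float,  # average GPU time per pipeline stage (assumed)
    model_count: int,              # model growth: versions/specializations served
    retrieval_factor: float,       # data growth: extra work from deeper retrieval
    peak_multiplier: float,        # headroom for predictable contention windows
) -> float:
    """Estimate weekly GPU-seconds needed, including peak headroom."""
    per_request = stages_per_request * gpu_seconds_per_stage * retrieval_factor
    steady_state = requests_per_week * per_request * model_count
    return steady_state * peak_multiplier

# Even with flat user traffic, deepening the pipeline and adding models
# multiplies accelerator demand:
baseline = gpu_seconds_per_week(1_000_000, 1, 0.5, 1, 1.0, 1.2)
deeper = gpu_seconds_per_week(1_000_000, 5, 0.5, 2, 1.5, 1.2)
print(deeper / baseline)  # roughly a 15x increase at unchanged traffic
```

The point of the sketch is that three of the four dimensions multiply together, which is why demand can grow by an order of magnitude while the request count stays flat.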

This is not merely “IT planning.” It is strategic planning, because these factors push organizations back toward multi-year thinking: procurement lead times, reserved capacity, workload placement decisions, and platform-level policies all start to matter again.

This shift is increasingly visible operationally: capacity planning is a rising concern for data center operators, as The Register reports.

The cloud’s old promise is breaking

Cloud computing scaled on the premise that capacity could be treated as elastic and interchangeable. Most workloads ran on general-purpose hardware, and when demand rose, the platform could absorb it by spreading load across abundant, standardized resources.

AI workloads violate that premise. Accelerators are scarce, not interchangeable, and tied to power and cooling constraints that do not scale linearly. In other words, the cloud stops behaving like an infinite pool—and starts behaving like an allocation system.

First, the critical path in production AI systems is increasingly accelerator-bound. Second, “a request” is no longer a single call. It is an inference pipeline with multiple dependent stages. Third, those stages tend to be sensitive to hardware availability, scheduling contention, and performance variance that cannot be eliminated by simply adding more generic compute.

This is where the elasticity model starts to fail as a default expectation. In AI systems, elasticity becomes conditional. It depends on capacity access, infrastructure topology, and a willingness to pay for assurance.

AI changes the physics of cloud infrastructure

In modern enterprise AI, the binding constraints are no longer abstract. They are physical.

Accelerators introduce a different scaling regime than CPU-centric enterprise computing. Provisioning is not always immediate. Supply is not always abundant. And the infrastructure required to deploy dense compute has facility-level limits that software cannot bypass.

Power and cooling move from background concerns to first-order constraints. Rack density becomes a planning variable. Deployment feasibility is shaped by what a data center can deliver, not only by what a platform can schedule.

AI-driven density makes power and cooling the gating factors—as Data Center Dynamics explains in its ‘path to power’ overview.

This is why “just scale out” no longer behaves like a universal architectural safety net. Scaling is still possible, but it is increasingly constrained by physical reality. In AI-heavy environments, capacity is something you secure, not something you assume.

From elasticity to allocation

As AI becomes operationally critical, cloud capacity begins to behave less like a utility and more like an allocation system.

Organizations respond by shifting from on-demand assumptions to capacity controls. They introduce quotas to prevent runaway consumption, reservations to ensure availability, and explicit prioritization to protect production workflows from contention. These mechanisms are not optional governance overhead. They are structural responses to scarcity.

In practice, accelerator capacity behaves more like a supply chain than a cloud service. Availability is influenced by lead time, competition, and contractual positioning. The implication is subtle but decisive: enterprise AI platforms begin to look less like “infinite pools” and more like managed inventories.

This changes cloud economics and vendor relationships. Pricing is no longer only about utilization. It becomes about assurance. The questions that matter are not just “how much did we use,” but “can we obtain capacity when it matters,” and “what reliability guarantees do we have under peak demand.”

When elasticity stops being a default

Consider a platform team that deploys an internal AI assistant for operational support. In the pilot phase, demand is modest and the system behaves like a conventional cloud service. Inference runs on on-demand accelerators, latency is stable, and the team assumes capacity will remain a provisioning detail rather than an architectural constraint.

Then the system moves into production. The assistant is upgraded to use retrieval for policy lookups, reranking for relevance, and an additional validation pass before responses are returned. None of these changes appear dramatic in isolation. Each improves quality, and each looks like an incremental feature.

But the request path is no longer a single model call. It becomes a pipeline. Every user request now triggers multiple GPU-backed operations: embedding generation, retrieval-side processing, reranking, inference, and validation. GPU work per request rises, and the variance increases. The system still works—until it meets real peak behavior.
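The amplification in that request path can be sketched in a few lines. The stage names and per-stage costs below are assumptions for illustration, not measurements from any real system.

```python
# Illustrative sketch of the request path described above: one user request
# fanning out into multiple GPU-backed operations. Stage names and
# GPU-second costs are hypothetical.

PIPELINE = [
    ("embedding", 0.05),
    ("retrieval", 0.10),
    ("rerank", 0.15),
    ("inference", 0.80),
    ("validation", 0.30),
]

def gpu_work_per_request(pipeline):
    """Total GPU-seconds one request now consumes end-to-end."""
    return sum(cost for _, cost in pipeline)

single_call = 0.80  # the original "one model call"
pipelined = gpu_work_per_request(PIPELINE)
print(f"amplification: {pipelined / single_call:.2f}x")
```

Each feature looked incremental, but the per-request amplification factor is what meets peak traffic, and that product is what the shared capacity pool actually has to absorb.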

The first failure is not a clean outage. It is contention. Latency becomes unpredictable as jobs queue behind each other. The “long tail” grows. Teams begin to see priority inversion: low-value exploratory usage competes with production workflows because the capacity pool is shared and the scheduler cannot infer business criticality.

The platform team responds the only way it can. It introduces allocation. Quotas are placed on exploratory traffic. Reservations are used for the operational assistant. Priority tiers are defined so production paths cannot be displaced by batch jobs or ad hoc experimentation.

Then the second realization arrives. Allocation alone is insufficient unless the system can degrade gracefully. Under pressure, the assistant must be able to narrow retrieval breadth, reduce reasoning depth, route deterministic checks to smaller models, or temporarily disable secondary passes. Otherwise, peak demand simply converts into queue collapse.

At that point, capacity planning stops being an infrastructure exercise. It becomes an architectural requirement. Product decisions directly determine GPU operations per request, and those operations determine whether the system can meet its service levels under constrained capacity.

How this changes architecture

When capacity becomes constrained, architecture changes—even if the product goal stays the same.

Pipeline depth becomes a capacity decision. In AI systems, throughput is not just a function of traffic volume. It is a function of how many GPU-backed operations each request triggers end-to-end. This amplification factor often explains why systems behave well in prototypes but degrade under sustained load.

Batching becomes an architectural tool, not an optimization detail. It can improve utilization and cost efficiency, but it introduces scheduling complexity and latency trade-offs. In practice, teams must decide where batching is acceptable and where low-latency “fast paths” must remain unbatched to protect user experience.
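A minimal sketch of that trade-off, assuming a simple size-and-deadline batcher with an unbatched fast path. All names and thresholds are hypothetical.

```python
# Hedged sketch: batch requests to raise utilization, bounded by both a
# maximum batch size and a maximum wait, while latency-sensitive traffic
# bypasses batching entirely. Thresholds are illustrative assumptions.
import time
from queue import Queue, Empty

def collect_batch(q: Queue, max_batch: int = 8, max_wait_s: float = 0.02):
    """Block for one request, then add more until size or deadline is hit."""
    batch = [q.get()]
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except Empty:
            break
    return batch  # hand this to one amortized GPU call

def route(request: dict, batched_q: Queue):
    """Keep a low-latency fast path unbatched to protect user experience."""
    if request.get("fast_path"):
        return [request]  # run immediately as a batch of one
    batched_q.put(request)
    return None  # will be picked up by the batch collector
```

The design choice the paragraph describes lives in `max_wait_s`: raising it improves utilization at the cost of added tail latency, which is exactly why some paths must be allowed to skip the queue.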

Model choice becomes a production constraint. As capacity pressure increases, many organizations discover that smaller, more predictable models often win for operational workflows. This does not mean large models are unimportant. It means their use becomes selective. Hybrid strategies emerge: smaller models handle deterministic or governed tasks, while larger models are reserved for exceptional or exploratory scenarios where their overhead is justified.

In short, architecture becomes constrained by power and hardware, not only by code. The core shift is that capacity constraints shape system behavior. They also shape governance outcomes, because predictability and auditability degrade when capacity contention becomes chronic.

What cloud and platform teams must do differently

From an enterprise IT perspective, this shows up as a readiness problem: can infrastructure and operations absorb AI workloads without destabilizing production systems? Answering that requires treating accelerator capacity as a governed resource—metered, budgeted, and allocated deliberately.

Meter and budget accelerator capacity

  • Define consumption in business-relevant units (e.g., GPU-seconds per request and peak concurrency ceilings) and expose it as a platform metric.
  • Turn those metrics into explicit capacity budgets by service and workload class—so growth is a planning decision, not an outage.

Make allocation first-class

  • Implement admission control and priority tiers aligned to business criticality; do not rely on best-effort fairness under contention.
  • Make allocation predictable and early (quotas/reservations) instead of informal and late (brownouts and surprise throttling).
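The bullets above can be sketched as quota-based admission control over a fixed pool. Tier names and quota fractions are invented for illustration; this is a sketch of the mechanism, not any platform's actual scheduler.

```python
# Hedged sketch: explicit, early admission control by priority tier.
# A tier is rejected up front when it would exceed its reserved share,
# instead of being silently throttled under contention.
from dataclasses import dataclass, field

@dataclass
class CapacityPool:
    total_gpu_seconds: float
    quotas: dict = field(default_factory=dict)  # tier -> fraction of pool
    used: dict = field(default_factory=dict)    # tier -> GPU-seconds consumed

    def admit(self, tier: str, cost: float) -> bool:
        """Admit only if the tier stays within its reserved share."""
        budget = self.total_gpu_seconds * self.quotas.get(tier, 0.0)
        if self.used.get(tier, 0.0) + cost > budget:
            return False  # predictable, early rejection -- not a brownout
        self.used[tier] = self.used.get(tier, 0.0) + cost
        return True

pool = CapacityPool(
    total_gpu_seconds=1000,
    quotas={"production": 0.7, "exploratory": 0.2, "batch": 0.1},
)
assert pool.admit("production", 500)       # fits the 700s reservation
assert not pool.admit("exploratory", 300)  # exceeds the 200s quota
```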

Build graceful degradation into the request path

  • Predefine a degradation ladder (e.g., reduce retrieval breadth or route to a smaller model) that preserves bounded cost and latency.
  • Ensure degradations are explicit and measurable, so systems behave deterministically under capacity pressure.
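A degradation ladder along these lines can be made explicit in code, so behavior under pressure is deterministic rather than emergent. The rungs and cost estimates below are hypothetical.

```python
# Hedged sketch: an explicit degradation ladder. Each rung trades quality
# for a bounded, known per-request cost. Rungs and GPU-second estimates
# are illustrative assumptions.

LADDER = [
    # (mode description, estimated GPU-seconds per request)
    ("full pipeline: deep retrieval + large model + validation pass", 1.4),
    ("narrow retrieval breadth", 1.0),
    ("route deterministic checks to a smaller model", 0.6),
    ("disable secondary validation pass", 0.4),
]

def select_mode(available_gpu_s_per_request: float):
    """Pick the highest-quality rung that fits current capacity."""
    for description, cost in LADDER:
        if cost <= available_gpu_s_per_request:
            return description, cost
    # Even the cheapest mode does not fit: shed load explicitly rather
    # than letting the queue collapse.
    raise RuntimeError("shed load: no degradation mode fits")

mode, cost = select_mode(0.7)
print(mode)  # -> route deterministic checks to a smaller model
```

Because each rung is enumerated with a cost, the degradation is measurable: the system can report which mode it served each request in, rather than degrading silently.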

Separate exploratory from operational AI

  • Isolate experimentation from production using distinct quotas/priority classes/reservations, so exploration cannot starve operational workloads.
  • Treat operational AI as an enforceable service with reliability targets; keep exploration elastic without destabilizing the platform.

 In an accelerator-bound world, platform success is no longer maximum utilization—it is predictable behavior under constraint.

What this means for the future of the cloud

AI is not ending the cloud. It is pulling the cloud back toward physical reality.

The likely trajectory is a cloud landscape that becomes more hybrid, more planned, and less elastic by default. Public cloud remains critical, but organizations increasingly seek predictable access to accelerator capacity through reservations, long-term commitments, private clusters, or colocated deployments.

This will reshape pricing, procurement, and platform design. It will also reshape how engineering teams think. In the cloud-native era, architecture often assumed capacity was solvable through autoscaling and on-demand provisioning. In the AI era, capacity becomes a defining constraint that shapes what systems can do and how reliably they can do it.

That is why capacity planning is back—not as a return to old habits, but as a necessary response to a new infrastructure regime. Organizations that succeed will be the ones that design explicitly around capacity constraints, treat amplification as a first-order metric, and align product ambition with the physical and economic limits of modern AI infrastructure.

Author’s Note: This article reflects the author’s personal views, based on independent technical research, and does not reflect the architecture of any specific organization.


