Developers spend much of their time editing, refactoring, and debugging rather than producing entirely new code. Code creation tends to involve non-sequential back-and-forth refinement rather than typing out a complete function in one uninterrupted sequence. You might sketch a part, adjust parameters, skip ahead, then revisit earlier sections to refine them.
Diffusion models, and in particular diffusion large language models (d-LLMs), operate differently from current coding assistants. Unlike autoregressive models, which generate token by token in a strict left-to-right sequence, d-LLMs condition on both past and future context. This enables them to model edits and refinements more directly, reflecting how developers iteratively construct and adjust code. As Gong et al. (2025) put it: “the [d-LLM] model often plans token generation more globally, much like a programmer jumping back and forth through code to refine a code implementation.”
One of the most striking demos of diffusion-based models like DiffuCoder showed exactly this: the model skipped a parameter mid-function, continued writing later parts, then circled back to fill in what was missing.
(The prompt used here is: “Write a Python function to implement binary search together with docstrings and type hints.” The example is generated using the apple/DiffuCoder-7B-Instruct model, configured to produce one token per forward pass with a limit of 256 new tokens. The blue slots illustrate positions that the model later revisits and refines during the diffusion process.)
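If you want to reproduce this behaviour yourself, the following is a minimal sketch of that configuration (one token per forward pass, 256 new tokens). It assumes the Dream-style diffusion_generate interface the model exposes via trust_remote_code and an approximated chat prompt format; the exact argument names should be checked against the apple/DiffuCoder-7B-Instruct model card.

import torch
from transformers import AutoModel, AutoTokenizer

# Sketch only: assumes the Dream-style diffusion_generate entry point loaded
# via trust_remote_code; verify the arguments against the model card.
model_id = "apple/DiffuCoder-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda").eval()

query = "Write a Python function to implement binary search together with docstrings and type hints."
prompt = f"<|im_start|>user\n{query}<|im_end|>\n<|im_start|>assistant\n"  # assumed chat format
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

output = model.diffusion_generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=256,   # generation budget used in the demo
    steps=256,            # 256 steps -> one token unmasked per forward pass
    return_dict_in_generate=True,
)
print(tokenizer.decode(output.sequences[0], skip_special_tokens=True))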
This structure may mirror how many developers think. You may not know every detail upfront, but you can scaffold a function and refine as you go. A model that is not locked into strictly left-to-right generation is better suited to support this process.
Autoregressive models can be prompted to consider bidirectional context by providing both prefix and suffix in the prompt, but this remains a workaround rather than a native capability. Diffusion models, particularly diffusion large language models (d-LLMs), are designed from the ground up to condition on both past and future context during generation.
This design proves valuable for tasks requiring reversal reasoning, where coherence must hold in both directions, and for code generation, where a variable’s usage downstream should inform its earlier definition. As shown by Nie et al. (2025), d-LLMs exhibit “consistent zero-shot performance across both forward and reversal tasks.”
For developers, this translates into improved handling of structured logic, long-range dependencies, and code constraints that depend on order-sensitive relationships.
Because diffusion models gradually mask and unmask tokens at arbitrary positions, they are naturally suited to infilling. If you ask a diffusion model to rewrite a block of code with a different parameter or to refactor a loop into a comprehension, it can operate directly on the masked sections.
The distinction with autoregressive LLMs is subtle here, since both require the relevant code region to appear in the prompt. Where diffusion models add value is in integration with deterministic tooling such as IDEs. An IDE could highlight several problematic or incomplete regions, mask them, and allow the diffusion model to unmask and regenerate all affected parts in a single coherent pass. This distinguishes diffusion models from FIM-enabled autoregressive models, which can handle isolated infilling but struggle to maintain global consistency across multiple edits.
Consider adding a field to a class that must be initialised in the constructor, used in a method, and serialised elsewhere. Rather than orchestrating multiple FIM calls or regenerating entire methods, a diffusion model can mask the relevant locations and generate all necessary updates at once.
This makes diffusion models well-suited to refactoring tasks where changes must satisfy global constraints, such as ensuring a new parameter appears consistently in a function signature, its documentation, call sites, and test cases.
For example, an IDE might flag a type mismatch in a function signature. Instead of regenerating the entire function, a diffusion model could unmask just the problematic parameter declaration and rewrite it to match the expected type, leaving the rest of the code untouched. This localised editing process mirrors how developers typically fix errors and refactor code incrementally.
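To make the class-field scenario above concrete, the snippet below marks the three positions such a model would be asked to fill in a single pass. The class, the new field, and the <MASK_n> placeholders are purely illustrative and not any specific model's mask vocabulary.

# Hypothetical multi-span infilling request: the <MASK_n> placeholders stand in
# for the positions a d-LLM would unmask together when adding a new field.
masked_source = '''
class UserProfile:
    def __init__(self, name: str):
        self.name = name
        <MASK_1>                        # initialise the new field

    def summary(self) -> str:
        return f"{self.name} <MASK_2>"  # use the new field

    def to_dict(self) -> dict:
        return {"name": self.name, <MASK_3>}  # serialise the new field
'''

# An IDE integration would send masked_source to the d-LLM, which fills all
# three spans coherently instead of issuing three separate FIM calls.
print(masked_source)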
Autoregressive models operate sequentially, generating one token per forward pass. Diffusion models, by contrast, can produce multiple tokens in a single forward pass. Benchmarks reveal a current practical shortcoming: increasing the number of tokens per step often reduces quality. The underlying mechanism, however, allows faster inference and is likely to improve in future.
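A toy decoding loop makes this trade-off easier to see. The sketch below is not any real d-LLM's decoder: it substitutes random scores for model logits purely to show how committing the k most confident masked positions per forward pass reduces the number of passes, while raising k increases the risk of locking in low-confidence tokens.

import numpy as np

# Toy sketch of confidence-based parallel unmasking (not a real d-LLM decoder).
# Each "forward pass" scores every masked position; the k most confident
# positions are committed. Larger k means fewer passes but riskier choices.
rng = np.random.default_rng(0)
seq_len, vocab_size, tokens_per_step = 16, 100, 4

tokens = np.full(seq_len, -1)                         # -1 marks a masked slot
passes = 0
while (tokens == -1).any():
    passes += 1
    logits = rng.normal(size=(seq_len, vocab_size))   # stand-in for one model call
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
    confidence = probs.max(-1)
    confidence[tokens != -1] = -np.inf                # never revisit filled slots
    k = min(tokens_per_step, int((tokens == -1).sum()))
    for pos in np.argsort(confidence)[-k:]:           # commit the k most confident
        tokens[pos] = probs[pos].argmax()

print(f"filled {seq_len} positions in {passes} forward passes")

With tokens_per_step = 1 this mimics the one-token-per-pass setting shown earlier; raising it cuts the number of passes at the cost of committing less confident tokens.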
Researchers have proposed semi-autoregressive approaches to bridge the gap between autoregressive and diffusion-based generation, most notably Block Diffusion (Arriola et al., 2025). This method generates blocks of tokens from left to right while allowing diffusion models to unmask flexibly within each block. In principle, this allows reuse of the KV cache, which plays a key role in the efficiency of autoregressive models. In practice, however, unmasking too many tokens in parallel creates a trade-off. Throughput increases, but quality often drops sharply, especially if the KV cache is not reused carefully.
Semi-autoregressive generation represents an intermediate step between autoregressive and truly out-of-order inference. Diffusion-based language models work fundamentally out of sequence, yet current methods still borrow ideas from autoregressive design, such as KV cache reuse, because the optimisation tools for autoregressive generation remain highly developed and effective. Ironically, these mature autoregressive mechanisms improve generation speed even as research moves towards models that can generate fully out of order.
For now, developers should temper expectations. Our internal experiments with the latest open-source models show failure modes such as repeated definitions, truncated output, and malformed syntax:
def factorial(n):
if n == 0:
return 1
else:
return n * factorial(n - 1)
def factorial(n): # repeated
if n == 0:
return 1
else:
return n * factorial(n - 1)
def factorial(n): # repeated again
if n == 0:
return 1
else:
return n * factorial(n - 1)
def factorial(n):
if n == 0:
return 1
else:
return n * factorial( # truncated, no argument
def factorial(n):
if (n == 0:
return 1,
else:
return n ** factorial[n - 1))
Benchmarking current state-of-the-art d-LLMs – open source (DiffuCoder, Seed-Diffusion) and closed-source (Mercury, Gemini-Diffusion) – shows mixed performance when compared against strong autoregressive baselines such as Qwen2.5-Coder. See Gong et al. (2025) and Song et al. (2025).
Despite these issues, diffusion models still introduce valuable new possibilities for code generation and editing. At the same time, their ecosystem is very immature compared to autoregressive LLMs.
Training and inference techniques that help mitigate sequential bottlenecks in LLMs, such as chunked prefill, speculative decoding, or prefix caching, have no direct equivalents yet for diffusion models.
Diffusion also requires defining the output length in advance, which often leads to inefficiency compared with the <eos> termination signal used by autoregressive LLMs.
Finally, the scarcity of open-source diffusion models for code makes it harder for developers to experiment with and refine these methods.
These niches give developers a reason to experiment, even if production-ready tools remain a little way off. Beyond these early applications, the major promise of d-LLMs lies in their potential for much faster generation, since they can produce N tokens per forward pass rather than one.
This capability could eventually reshape performance expectations for coding assistants once the quality–efficiency trade-offs are better understood.
Diffusion models won’t replace autoregressive models overnight. But they represent a new paradigm that better reflects how developers think and work. Their ability to edit flexibly, consider context in both directions, and potentially accelerate inference sets them apart.
For developers, the practical benefit is clear: snappier generation and better support for the unstructured, iterative way you actually write code.
As research continues, diffusion models could become the backbone of coding assistants that feel less like next-token generators and more like principled, code-structure-aware programming collaborators.
Alec welcomes Sakari Nahi, CEO of Zure, for a fun and thoughtful discussion that spans 25 years of tech evolution. Sakari shares how a single C# book jump-started his career, why he left a job he didn’t love to found a cloud-native consultancy, and what it’s like building a people-first engineering culture across multiple countries.
The two dig into real AI use cases that actually work today—vector search, customer service automation, field-tech knowledge retrieval—and explore how spec-driven development and tools like GitHub Copilot are transforming the way teams build software. They also get honest about shadow IT, geopolitics affecting cloud decisions, the future of Power Platform, and why AI feels “magical” even without AGI.
Whether you’re a developer, leader, or just AI-curious, this episode is packed with relatable stories and practical perspectives.
Anthropic forecasts explosive growth and large profits, targeting $70 billion in revenue and $17 billion in positive cash flow by 2028. Market sentiment is volatile as big short positions and bank hedging plans collide with massive debt needs for AI infrastructure. Snap's $400 million Perplexity deal and Amazon's lawsuit over agent scraping foreshadow legal and distribution battles over shopping agents and data access.
The solution consists of three main components: an Azure DevOps pipeline (fabric-ci-deploy.yml), environment configuration files (parameter.yml and lakehouse_id.yml), and a Python deployment script (deploy-to-fabric.py).
Create an Azure Service Principal and configure variable groups in Azure DevOps for both DEV and PROD environments:
# Example from fabric-ci-deploy.yml
variables:
  - group: Fabric-variables
  - name: lakehouse_config_file
    value: 'lakehouse_id.yml'
  - name: parameter_config_file
    value: 'parameter.yml'

The parameter.yml file uses JSONPath to target and replace parameters for Data Pipelines:
key_value_replace:
  - find_key: "properties.parameters.region_cd.defaultValue"
    replace_value:
      dev: "'xxxx','xxxx'"
      prod: "'xxxx'"
    item_type: "DataPipeline"
    item_name: "InitialLoad_NA"
Defines which lakehouse and workspace to target for each environment:
environments:
  dev:
    workspace_id: "xxxxxxx-xxxx-xxxx-xxxx-xxxxxx"
    workspace_name: "fabrictest"
    lakehouses:
      - source_id: "xxxxxxx-xxxx-xxxx-xxxx-xxxxxx"
        target_name: "SilverLakeHouse"
  prod:
    workspace_id: "xxxxxxx-xxxx-xxxx-xxxx-xxxxxx"
    workspace_name: "Enterprise Workspace"
    lakehouses:
      - source_id: "xxxxxxx-xxxx-xxxx-xxxx-xxxxxx"
        target_name: "prod"

The pipeline supports environment selection, triggers, and parameter management:
trigger:
  branches:
    include:
      - develop
      - feature/*
    exclude:
      - main
      - prod
pr: none
parameters:
  - name: items_in_scope
    displayName: Enter Fabric items to be deployed
    type: string
    default: 'Notebook,DataPipeline,SemanticModel,Report'
  - name: deployment_environment
    displayName: 'Deployment Environment'
    type: string
    default: 'dev'
    values:
      - dev
The script automates authentication, deployment, and metadata updates:
import os

from fabric_cicd import FabricWorkspace, publish_all_items
from azure.identity import ClientSecretCredential


def authenticate():
    # Authenticate with the Azure Service Principal configured in the variable group
    credential = ClientSecretCredential(
        tenant_id=os.environ["AZTENANTID"],
        client_id=os.environ["AZCLIENTID"],
        client_secret=os.environ["AZSPSECRET"]
    )
    return credential


def deploy_lakehouse(ws, lakehouse_config, credential):
    # Deploy lakehouse via Fabric REST API
    ...

It also updates notebook metadata to reference the correct lakehouse and workspace IDs, ensuring consistency across environments. The complete Python script (deploy-to-fabric.py) is attached below for reference. You can copy and adapt it for your own deployments.
The fabric-cicd library scans for DataPipeline folders, matches names, and applies environment-specific replacements from parameter.yml.
Step 3: Deploy to Fabric
The script authenticates, creates the FabricWorkspace object, processes lakehouse configurations, and deploys all specified artifacts.
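The heart of that step looks roughly like the sketch below, following the documented fabric-cicd pattern of constructing a FabricWorkspace and calling publish_all_items. The placeholder IDs, the environment value, and the token_credential argument are assumptions drawn from the configuration above; defer to the attached deploy-to-fabric.py for the authoritative version.

import os
from azure.identity import ClientSecretCredential
from fabric_cicd import FabricWorkspace, publish_all_items

# Rough sketch of the publish step; the placeholder values and the
# token_credential argument are assumptions, see deploy-to-fabric.py for
# the full implementation.
credential = ClientSecretCredential(
    tenant_id=os.environ["AZTENANTID"],
    client_id=os.environ["AZCLIENTID"],
    client_secret=os.environ["AZSPSECRET"],
)

workspace = FabricWorkspace(
    workspace_id="xxxxxxx-xxxx-xxxx-xxxx-xxxxxx",   # from lakehouse_id.yml
    environment="dev",                              # matches the pipeline parameter
    repository_directory=".",                       # root of the exported Fabric items
    item_type_in_scope=["Notebook", "DataPipeline", "SemanticModel", "Report"],
    token_credential=credential,                    # assumed parameter name
)

publish_all_items(workspace)                        # deploys all in-scope artifacts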
This automated deployment solution for Microsoft Fabric ensures reliable, repeatable, and secure artifact management across multiple environments. By leveraging Azure DevOps, Python scripting, and robust configuration files, teams can achieve seamless CI/CD for complex Fabric projects.