Shorter list today as I begin my trip to Delhi. I’m in NYC at this very moment, and getting ready for the next segment. Expect regular reading lists next week, just at a more unusual time (for me).
[blog] In defense of not reading the code. This isn’t a glib take by someone outside of tech. It’s a well-reasoned point and I enjoyed the challenge to the conventional wisdom.
[blog] Besieged. Good piece. AI is impacting so many dimensions of work and community. Plenty of it is positive, but there are a lot of unknowns.
[blog] Learn fundamentals, not frameworks. Someone needs to know frameworks in 2026, but that’s not me. I’m definitely trying to invest more in fundamentals.
[blog] Codelab — Gemini for Developers. Gemini covers a lot of territory for us, and this new codelab goes through the full spectrum of products.
Configuration

Rewrite history (rebase -i) with VSCode

An interactive rebase with VSCode does not work, because the editor window is immediately closed and the command ends, unless you add this to your config:
[sequence]
    editor = code --wait --reuse-window

Aliases

Here are some aliases that I want to remember.
Squash commits

I have two commits and think they should be one. The message of the last commit is used for the new commit.
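As a sketch only (the alias name is hypothetical, and this is just one way to match that description): soft-reset the last two commits, then recommit everything while reusing the message of the most recent commit.

[alias]
    # hypothetical name; squashes the last two commits, keeping the newest message
    squash-last-two = "!git reset --soft HEAD~2 && git commit -C ORIG_HEAD"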
In the rapidly evolving landscape of large language models (LLMs), achieving precise control over model behavior while maintaining quality has become a critical challenge. While models like GPT-4 demonstrate impressive capabilities, ensuring their outputs align with human preferences—whether for safety, helpfulness, or style—requires sophisticated fine-tuning techniques. Direct Preference Optimization (DPO) represents a breakthrough approach that simplifies this alignment process while delivering exceptional results.
This comprehensive guide explores DPO fine-tuning, explaining what it is, how it works, when to use it, and how to implement it using Microsoft Foundry SDK. Whether you’re building a customer service chatbot that needs to be consistently helpful, a content generation system that should avoid harmful outputs, or any AI application where response quality matters, understanding DPO will empower you to create better-aligned models.
What is Direct Preference Optimization (DPO)?
Direct Preference Optimization is an innovative technique for training language models to align with human preferences without the complexity of traditional Reinforcement Learning from Human Feedback (RLHF). Introduced in the groundbreaking paper “Direct Preference Optimization: Your Language Model is Secretly a Reward Model” by Rafailov, Sharma, et al., DPO fundamentally reimagines how we teach models to generate preferred outputs.
Unlike traditional supervised fine-tuning where you show a model “what to say,” DPO teaches models by showing them comparative examples: “this response is better than that one.” For each prompt, you provide:
Preferred response: A high-quality, desirable output
Non-preferred response: A lower-quality or undesirable output
The model learns to increase the likelihood of generating preferred responses while decreasing the probability of non-preferred ones, all without requiring explicit reward modeling or complex reinforcement learning pipelines.
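For reference, the objective from the DPO paper can be written as follows, where $\pi_\theta$ is the model being trained, $\pi_{\mathrm{ref}}$ is a frozen reference model, $y_w$ and $y_l$ are the preferred and non-preferred responses for prompt $x$, $\sigma$ is the sigmoid, and $\beta$ controls how far the model may drift from the reference:

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$

Minimizing this loss raises the likelihood of $y_w$ relative to $y_l$ directly, which is why no separate reward model or RL loop is needed.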
import os
from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
# Load environment variables
load_dotenv()
endpoint = os.environ.get("AZURE_AI_PROJECT_ENDPOINT")
model_name = os.environ.get("MODEL_NAME")
# Define dataset file paths
training_file_path = "training.jsonl"
validation_file_path = "validation.jsonl"
credential = DefaultAzureCredential()
project_client = AIProjectClient(endpoint=endpoint, credential=credential)
openai_client = project_client.get_openai_client()
# Upload training and validation files
with open(training_file_path, "rb") as f:
    train_file = openai_client.files.create(file=f, purpose="fine-tune")
with open(validation_file_path, "rb") as f:
    validation_file = openai_client.files.create(file=f, purpose="fine-tune")
openai_client.files.wait_for_processing(train_file.id)
openai_client.files.wait_for_processing(validation_file.id)
# Create DPO Fine Tuning job
fine_tuning_job = openai_client.fine_tuning.jobs.create(
    training_file=train_file.id,
    validation_file=validation_file.id,
    model=model_name,
    method={
        "type": "dpo",
        "dpo": {
            "hyperparameters": {
                "n_epochs": 3,
                "batch_size": 1,
                "learning_rate_multiplier": 1.0
            }
        }
    },
    extra_body={"trainingType": "GlobalStandard"}
)
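The job runs asynchronously. As a minimal sketch (assuming the same openai_client as above and the standard fine-tuning jobs API), you can poll until it reaches a terminal state; deploying the resulting model is a separate step done outside this snippet:

import time

# Poll the fine-tuning job until it finishes
while True:
    job = openai_client.fine_tuning.jobs.retrieve(fine_tuning_job.id)
    print(f"Status: {job.status}")
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)

# On success, job.fine_tuned_model names the resulting model, which is then
# deployed (e.g., from the portal) before it can be tested as shown below.
print(f"Fine-tuned model: {job.fine_tuned_model}")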
DPO Fine-Tuning Results:
print(f"Testing fine-tuned model via deployment: {deployment_name}")
response = openai_client.responses.create(
model=deployment_name,
input=[{"role": "user", "content": "Explain machine learning in simple terms."}]
)
print(f"Model response: {response.output_text}")
Inference result:
Model response: Machine learning is like teaching a computer to learn from experience, similar to how people do. Instead of programming specific instructions for every task, we give the computer a lot of data and it figures out patterns on its own. Then, it can use what it learned to make decisions or predictions. For example, if you show a machine learning system lots of pictures of cats and dogs, it will learn to recognize which is which by itself.
Data format example:
{
  "input": {
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  },
  "preferred_output": [
    {"role": "assistant", "content": "The capital of France is Paris."}
  ],
  "non_preferred_output": [
    {"role": "assistant", "content": "I think it's London."}
  ]
}
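Each record like the one above goes on its own line of the JSONL training file. A minimal sketch for producing such a file (the preference pair below is a placeholder; real pairs would come from your own data collection):

import json

# Placeholder preference pair; in practice these come from human or model ranking
examples = [
    {
        "input": {
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "What is the capital of France?"},
            ]
        },
        "preferred_output": [
            {"role": "assistant", "content": "The capital of France is Paris."}
        ],
        "non_preferred_output": [
            {"role": "assistant", "content": "I think it's London."}
        ],
    },
]

# One JSON object per line, as the fine-tuning API expects
with open("training.jsonl", "w", encoding="utf-8") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")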
Why? Because it seems I’m getting an increased number of client requests for help with this stuff. And by “this stuff” I mean how to target resources (users, devices, mostly) for various things like policies, deployments, inventory, audit logging, sign-in activity, etc.
Effectively organizing users and devices is foundational to managing policies, deployments, and security at scale. Whether you’re targeting Microsoft Intune policies, Conditional Access rules, or software deployments, how you group objects determines your flexibility and maintainability. Here’s a breakdown of the most common methods.
In case you’re wondering, I’m not focused on security boundary aspects in this article. This is about identifying or targeting resources and accounts for other purposes (refer to the first paragraph above).
1. Naming Conventions
Using standardized prefixes, suffixes, or patterns in object names (e.g., WKS-NYC-FIN-001, JSMITH, SRV-PRNT-DFW-001).
Advantages:
Instantly human-readable; easy to identify location, department, or type at a glance
Works across nearly all platforms and tools—no special features required
Enables simple wildcard filtering in scripts and queries
Disadvantages:
Brittle—renaming objects when roles/locations change is tedious and error-prone
Inconsistent adoption leads to gaps; relies heavily on discipline
Limited queryability in modern identity platforms (dynamic groups can’t always parse naming patterns)
Administrative Overhead: Simple to start, but moderate to maintain at scale due to enforcement challenges.
Notes: More commonly used for devices than users. However, user names and properties should ALWAYS follow a consistent pattern. For example, if you use the “description” property, be consistent. It can (and often will) pay off incredibly later on when you need to mine that property for operational needs.
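As a rough sketch of the wildcard-style filtering a naming convention enables (the WKS-NYC- prefix is only an example, and this assumes an identity with permission to read devices through Microsoft Graph):

import requests
from azure.identity import DefaultAzureCredential

# Acquire a Microsoft Graph token for the signed-in identity
credential = DefaultAzureCredential()
token = credential.get_token("https://graph.microsoft.com/.default").token

# Filter devices whose display name starts with the naming-convention prefix
resp = requests.get(
    "https://graph.microsoft.com/v1.0/devices",
    headers={"Authorization": f"Bearer {token}"},
    params={
        "$filter": "startswith(displayName,'WKS-NYC-')",
        "$select": "displayName,operatingSystem",
    },
)
resp.raise_for_status()
for device in resp.json().get("value", []):
    print(device["displayName"], device.get("operatingSystem"))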
2. AD Organizational Units (OUs)
Placing objects in a hierarchical OU structure (e.g., OU=Finance,OU=NYC,DC=contoso,DC=com).
Advantages:
Native GPO targeting—policies link directly to OUs
Clear visual hierarchy in AD tools
Well-understood by most Windows admins, LDAP developers
Disadvantages:
Objects can only exist in one OU—no multi-dimensional grouping
Moving objects between OUs can break GPO inheritance unexpectedly
Irrelevant to cloud-only (Entra-joined) devices and increasingly hybrid environments
Administrative Overhead: Simple for on-prem GPO targeting, but complex when trying to maintain parity with cloud workloads.
Notes: For those still tied to Active Directory, OU assignments should be focused on policy relevance more than visual organization, unless you have a VERY simple environment with very few custom Group Policy Objects. Example image below for Active Directory user accounts.
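For the on-prem side, a rough sketch of querying everything under a specific OU with the ldap3 library (server name, service account, and OU path are placeholders):

from ldap3 import Server, Connection, SUBTREE

# Placeholder domain controller, credentials, and OU path
server = Server("ldap://dc01.contoso.com")
conn = Connection(server, user="CONTOSO\\svc_read", password="<secret>", auto_bind=True)

# Everything under the Finance OU in NYC; an object can live in only one OU
conn.search(
    search_base="OU=Finance,OU=NYC,DC=contoso,DC=com",
    search_filter="(objectClass=user)",
    search_scope=SUBTREE,
    attributes=["cn", "sAMAccountName"],
)
for entry in conn.entries:
    print(entry.sAMAccountName, entry.cn)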
3. Security Groups

Assigning users or devices to groups, then targeting those groups for policies or access, or using them to filter inventory, audit logs, and the like for reporting or deployment targeting.
Advantages:
Objects can belong to multiple groups simultaneously—flexible multi-dimensional targeting
Native integration with Conditional Access, Intune, RBAC, and app assignments
Supports nested groups for hierarchical scenarios
Disadvantages:
Group sprawl is real—without governance, you’ll end up with hundreds of overlapping groups
Static group membership requires manual updates or automation
Troubleshooting “why does this user have access?” becomes detective work, especially with nested groups, multi-domain or forest-trust environments
Administrative Overhead: Moderate—requires naming conventions and lifecycle management for the groups themselves.
Notes: The overhead of maintaining static assignments can be mitigated through automation. For example, when onboarding user accounts, you can assign group memberships based on role-mapping such as department, job title, location, and so on.
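A rough sketch of that kind of onboarding automation with the Microsoft Graph API (the department-to-group mapping and the IDs are placeholders, and the caller needs rights to modify group membership):

import requests
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://graph.microsoft.com/.default").token
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

# Placeholder mapping from department to an existing security group's object ID
role_map = {"Finance": "00000000-0000-0000-0000-000000000001"}

def add_user_to_department_group(user_id: str, department: str) -> None:
    """Add a user to the security group mapped to their department."""
    group_id = role_map[department]
    body = {"@odata.id": f"https://graph.microsoft.com/v1.0/directoryObjects/{user_id}"}
    resp = requests.post(
        f"https://graph.microsoft.com/v1.0/groups/{group_id}/members/$ref",
        headers=headers,
        json=body,
    )
    resp.raise_for_status()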
Azure Management Groups, Azure Subscriptions, Azure Resource Groups, and Entra ID Administrative Units were not included here since they’re intended for security delegation and cost management more than purely for inventory-oriented organization. Azure Policy assignments are kind of a gray area, and I have enough gray as it is.
Examples using PowerShell without real coffee or quality sleep:
4. Dynamic Groups

Groups whose membership is automatically calculated based on attribute queries (e.g., user.department -eq “Finance”).
Not applicable to Active Directory domains. Void where prohibited, taxed or regulated. Batteries not included.
Advantages:
Zero manual membership management—objects automatically join/leave as attributes change
Ideal for large, fluid environments with frequent onboarding/offboarding
Enables attribute-driven policy targeting without touching group memberships
Disadvantages:
Membership updates are not instant—can lag by minutes to hours
Complex rules are hard to debug; subtle syntax errors cause silent failures
Requires clean, consistent attribute data—garbage in, garbage out
Administrative Overhead: Moderate to complex—rule authoring is straightforward, but ensuring attribute hygiene is ongoing work.
Note: The time it takes to update membership is affected by the total number of dynamic groups in the tenant, the number of object changes per update cycle, and rule configurations that are complex and/or rely on operators like Match, Contains, or memberOf. See the link for more.
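For illustration, a sketch of creating such a group through Microsoft Graph (the display name, nickname, and rule are placeholders; dynamic groups also require the appropriate Entra ID license):

import requests
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://graph.microsoft.com/.default").token

# Security group whose membership is calculated from the rule below
group = {
    "displayName": "Finance Users (Dynamic)",
    "mailEnabled": False,
    "mailNickname": "finance-users-dynamic",
    "securityEnabled": True,
    "groupTypes": ["DynamicMembership"],
    "membershipRule": 'user.department -eq "Finance"',
    "membershipRuleProcessingState": "On",
}
resp = requests.post(
    "https://graph.microsoft.com/v1.0/groups",
    headers={"Authorization": f"Bearer {token}"},
    json=group,
)
resp.raise_for_status()
print(resp.json()["id"])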
5. Attribute-Based Filtering (Direct) / Tags
Targeting objects directly by attribute values in policies or queries (e.g., filtering by extensionAttribute1, department, city). This also extends (conceptually at least) to Azure resource tags.
Imagine if you could literally tag every single thing in your home/apartment or vehicle. Everything. And then you had some magical tool that would find anything based on that tag. “Find spatula” or “Find guitar pick” and it lit up and broadcast its precise location. That’s kind of what this is about.
Advantages:
No group management overhead—filter directly on source-of-truth data
Changes to attributes immediately affect targeting (no group sync delay)
Highly granular; combine multiple attributes for precise scoping
Disadvantages:
Not all platforms support direct attribute filtering (Intune filters do; some legacy tools don’t)
Requires well-governed attribute schema and consistent data population
Can become opaque—”why was this device targeted?” requires querying attributes
Administrative Overhead: Simple if attributes are already populated; complex if you need to establish and enforce an attribute schema from scratch.
Notes: This is more of a late-binding approach. Very much like using Azure resource tags: It lays the groundwork for other processes to query and return matching items based on search criteria. Some common examples: job title, department, physicalDeliveryOfficeName, employeeType, description, any of the various “extension” attributes, and so on. For Azure, this could be resource tags such as (just examples): “environment”, or “cost center”.
Lame example of Azure Virtual machine with resource tags. Just add water, makes its own sauce.
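On the directory side, a sketch of that late-binding, attribute-driven querying via Microsoft Graph (the department value is a placeholder; filtering on attributes like this uses Graph's advanced query parameters):

import requests
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://graph.microsoft.com/.default").token

# Filter users directly on the source-of-truth attribute (advanced query)
resp = requests.get(
    "https://graph.microsoft.com/v1.0/users",
    headers={"Authorization": f"Bearer {token}", "ConsistencyLevel": "eventual"},
    params={
        "$filter": "department eq 'Finance'",
        "$count": "true",
        "$select": "displayName,department,jobTitle",
    },
)
resp.raise_for_status()
for user in resp.json().get("value", []):
    print(user["displayName"], user.get("jobTitle"))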
Summary
So, in conclusion: there are many ways you can model an environment to make downstream automation and reporting easier or more flexible. MBA folks might even say “robust” or “synergistic”, or “wholistically synergistic”, which is fine as long as it means they approve your budget request. Just nod and smile, and say thank you.
Method              | Best For                           | Overhead
--------------------|------------------------------------|--------------------
Naming Conventions  | Quick identification, scripting    | Simple → Moderate
AD OUs              | On-prem GPO targeting              | Simple (on-prem)
Security Groups     | Multi-dimensional access & policy  | Moderate
Dynamic Groups      | Automated membership at scale      | Moderate → Complex
Attribute Filtering | Granular, real-time targeting      | Simple → Complex
Final Thoughts
Most mature environments use a combination of these methods. OUs handle legacy GPO needs, dynamic groups automate cloud policy targeting, and attributes or resource tags provide the underlying data that drives additional flexibility. The key is investing in attribute hygiene early—clean data makes every other method more effective.
Start simple, automate where you can, and resist the urge to create a group for every edge case. Establish standards which are published in a central location, and require all staff to comply or have their company photo replaced with an image of dog poo.
Whatever you do, avoid creating secondary inventory data repositories to link things together. You should have ONE and only ONE source of truth for anything. Don’t make a spreadsheet or database to identify who’s in a department, when you can just populate the “department” attribute, group or tag. Make the systems work for you, not the other way around.
If you’re wondering why I chose the bar photo above, it’s because labeling and organizing bottles is common for most bartenders. Also for hospitals, car mechanics, dentists, electricians, and many other careers. It helps you find the right things when you need them. Not paying attention to that early will lead you straight back to the bar, and, well, you know how that ends.
Disclaimer
Blah blah blah… all information provided is for informational purposes only. Use or adaptation, even derivatives of derived derivations, of anything provided herein, in whole, or in part, is without warranty of fitness or purpose of any kind, in any galaxy or spatial realm, time dimensions notwithstanding. You assume any and all risk for damages, interruptions, delays, weird looks, gossip or drink bottles thrown from passing vehicles. The author assumes no risk or liability or blame or backhanded weirdness, even smirks or unusual grunt sounds, even from creatures with more than two legs. And, no, we don’t accept coupons. I’m not an attorney, but my family could form a law firm if they could stop talking about politics and Bruce Springsteen long enough to do the paperwork. I have no idea where this is going, but I’m sure glad you made it this far! Kirk out.
Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We're also joined by Steven Adler, ex-OpenAI safety researcher and author of Clear-Eyed AI on Substack. We cover: 1) The Viral "Something Big Is Happening" essay 2) What the essay got wrong about recursive self-improving AI 3) Where the essay was right about the pace of change 4) Are we ready for the repercussions of fast moving AI? 5) Anthropic's Claude Opus 4.6 model card's risks 6) Do AI models know when they're being tested? 7) An Anthropic researcher leaves and warns "the world is in peril" 8) OpenAI disbands its mission alignment team 9) The risks of AI companionship 10) OpenAI's GPT 4o is mourned on the way out 11) Anthropic raises $30 billion