Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Three words sparked an AI breakthrough in restaurant operations: “Hair. On. Fire.”


That’s how an EcoLab® executive described decision‑making for quick‑service restaurant managers during peak rush—raw instinct, pure adrenaline, with seconds ticking away. That moment sparked a Microsoft Garage Hackathon project that became RushReady™, turning frontline kitchen data into real‑time guidance that helps managers boost sales per hour, speed of service, and profit margin. What started as a hackathon idea now helps....

The post Three words sparked an AI breakthrough in restaurant operations: “Hair. On. Fire.” appeared first on Microsoft Garage.

Read the whole story
alvinashcraft
Pennsylvania, USA

.NET MAUI Community Standup: Rust, SkiaSharp Everywhere, AI/ML Live Processing

From: dotnet

David and Gerald are joined this month by Nick Kovalsky who will show us some amazing .NET MAUI things with Rust, SkiaSharp and drawn UI (also coming to Blazor?!) and AI live processing that he has been working on.

🔗 Links: https://www.theurlist.com/maui-standup-may2026

🎙️ Featuring: David Ortinau (@davidortinau), Gerald Versluis (@jfversluis), Nick Kovalsky (@nickkovalsky)

#dotnetmaui #skiasharp #crossplatform


Java Application Modernization - Series Introduction

From: Microsoft Developer
Duration: 7:46
Views: 63

Still running legacy Java in production? You're not alone — and modernization doesn't have to be a risky, manual grind. In this series introduction, learn how GitHub Copilot transforms Java application modernization from a slow, error-prone process into a structured, AI-assisted engineering workflow.

In this episode, you'll learn:
→ Why Java modernization is critical for AI readiness and cloud adoption
→ The four-phase modernization loop: Assess → Upgrade → Migrate → Test & Deploy
→ How GitHub Copilot agents automate repetitive modernization tasks — frameworks, dependencies, CVEs, containerization
→ How portfolio assessment tools (Azure Migrate, Cast, Doctor Migrate) connect to developer workflows via GitHub Issues
→ What the full enterprise modernization workflow looks like end to end

📺 This is Episode 0 of the Modernize Java Apps with AI series — a 9-part, hands-on guide to upgrading legacy Java applications using GitHub Copilot. Each episode is 5–7 minutes and covers a focused step in the modernization journey.

🔗 Series playlist: https://www.youtube.com/playlist?list=PLlrxD0HtieHhaBJWlcxGd-kTDikSD4xyD
🔗 GitHub Copilot Modernization extension: https://aka.ms/GHCPMod-Java
🔗 Azure Migrate: https://aka.ms/azuremigrate

👤 Presented by Ayan Gupta, Java & AI Advocate, Microsoft

#Java #GitHubCopilot #JavaModernization #LegacyCode #CloudMigration #Azure #SpringBoot #AI #EnterpriseJava #DevOps


Azure Blob Storage for AI

From: MicrosoftAzure
Duration: 12:47
Views: 69

Learn how Azure Blob Storage scales for demanding AI workloads across training and agentic inferencing.

Explore more technical guidance and videos on the Azure Infrastructure as a Service (IaaS) Resource Center: https://msft.it/6055QeJxb

#Microsoft #MicrosoftAzure


144: Beijing Auto Show Reactions w/ Lei Xing - Feels Like The Future Of EVs


In this episode, we talk with China auto expert Lei Xing and our co-host Kyle Conner about being at the Beijing Auto Show, the future of (super) fast charging, testing autonomous driving systems, and lots more.

Our special guest Lei Xing is the co-host of China EVs & More https://www.chinaevsandmore.com/
He can also be found at https://x.com/leixing77 and https://www.youtube.com/@leixing77


Download audio: https://dts.podtrac.com/redirect.mp3/audioboom.com/posts/8899864.mp3?modified=1778069425&sid=5141110&source=rss

Now in Foundry: IBM Granite 4.1, NVIDIA Nemotron Nano Omni, and Qwen3.6-35B-A3B


This week Microsoft Foundry adds two major model families alongside a reasoning powerhouse, spanning the full spectrum from specialized speech and vision to general-purpose coding and long-context analysis. IBM's Granite 4.1 is a family of 10: six LLMs across 3B, 8B, and 30B sizes in both full-precision and FP8 variants, plus a safety model, a vision-language model for document extraction, and a multilingual speech recognition model. NVIDIA's Nemotron-3-Nano-Omni-30B-A3B-Reasoning brings multimodal capability—video, audio, image, and text—to a 31B Mamba2-Transformer Hybrid Mixture-of-Experts (MoE) architecture that activates only 3B parameters per forward pass; three variants are available in Foundry (BF16, FP8, and NVFP4), with the FP8 variant featured here. Qwen3.6-35B-A3B is designed for agentic coding among open models, with thinking preservation across conversation turns and a context window extensible to 1 million tokens.

Models of the week

IBM: granite-4.1-30b

Model Specs

  • Parameters / size: 30B (flagship of the Granite 4.1 family)
  • Context length: 131,072 tokens
  • Primary task: Text generation (multilingual instruction following, RAG, tool calling, code, summarization)

Why it's interesting

  • The Granite 4.1 release brings 10 models to Microsoft Foundry. The LLM lineup covers granite-4.1-3b-instruct, granite-4.1-8b-instruct, and granite-4.1-30b-instruct with FP8 variants for each, plus granite-guardian-4.1-8b for safety, granite-vision-4.1-4b for document and chart understanding, and granite-speech-4.1-2b for multilingual speech recognition. This is a deployment-ready stack where teams can mix and match model sizes and modalities from a single provider. 
  • Strong instruction following and reasoning at the 30B scale: granite-4.1-30b-instruct scores 80.16 on MMLU, 64.09 on MMLU-Pro, 83.74 on Big-Bench Hard (BBH), 77.80 on AGI Eval, 45.76 on GPQA (Graduate-Level Google-Proof Q&A, a graduate-level science reasoning benchmark), and 89.65 average on IFEval (instruction following). These results reflect SFT and reinforcement learning post-training focused specifically on instruction compliance, tool calling accuracy, and long-context retrieval. (View benchmarks here)
  • Enhanced tool calling and 12-language support: Granite 4.1 models are trained for structured function calling and support 12 languages—Arabic, Chinese, Czech, Dutch, English, French, German, Italian, Japanese, Korean, Portuguese, and Spanish—with dialog, extraction, and summarization capabilities.
  • Safety and multimodal coverage within the same family: The inclusion of granite-guardian-4.1-8b (a safety classifier for detecting harmful content and prompt injections), granite-vision-4.1-4b (a Vision Language Model optimized for document extraction from PDFs, charts, and tables), and granite-speech-4.1-2b (a 2B multilingual Automatic Speech Recognition model) means teams can address safety, document parsing, and audio ingestion within the same model family—reducing integration complexity across a full pipeline.
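The multilingual RAG and citation behavior described above can be sketched as a request builder. This is a minimal sketch assuming an OpenAI-style chat-completions payload; the exact request schema for a Foundry deployment may differ, and the passage numbering format is illustrative, not part of the Granite docs:

```python
# Sketch: build an OpenAI-style chat-completions payload for multilingual RAG
# against a granite-4.1-30b-instruct deployment. Passages are numbered so the
# model can cite sources as [n]; adjust to the schema your endpoint exposes.

def build_rag_payload(passages, question, model="granite-4.1-30b-instruct"):
    """Each passage is {'lang': ..., 'text': ...}; any of the 12 supported languages."""
    context = "\n\n".join(
        f"[{i + 1}] ({p['lang']}) {p['text']}" for i, p in enumerate(passages)
    )
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": (
                    "Answer using only the numbered passages below. "
                    "Cite each claim with its passage number, e.g. [1].\n\n" + context
                ),
            },
            {"role": "user", "content": question},
        ],
    }

payload = build_rag_payload(
    [
        {"lang": "en", "text": "Granite 4.1 supports 12 languages."},
        {"lang": "de", "text": "Die Kontextlänge beträgt 131.072 Token."},
    ],
    "What context length does the model support?",
)
```

The same payload shape works for the smaller 3B and 8B instruct variants by swapping the model name.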

Try it

Use case → prompt pattern:

  • Multilingual RAG: Submit retrieved document passages in any of the 12 supported languages; ask the model to synthesize and cite sources
  • Agentic tool calling: Provide function definitions plus a user goal; the model plans and executes tool calls in a structured format
  • Document extraction (granite-vision-4.1-4b): Submit a PDF page image; extract tables, key figures, or form fields as structured JSON
  • Safety classification (granite-guardian-4.1-8b): Pass user input or model output; receive a structured risk assessment before serving the response

Sample prompt for an enterprise document processing deployment:

You are building a multilingual document intelligence pipeline for a global financial institution. Using the granite-4.1-30b-instruct endpoint deployed in Microsoft Foundry, submit each incoming policy or regulatory document with the following system instruction: "You are a compliance analysis assistant. Review the document and extract: (1) all regulatory requirements described, (2) the entities to which each requirement applies, (3) any compliance deadlines mentioned, and (4) any penalties or consequences for non-compliance. Return the output as a structured JSON array with one entry per requirement." For documents that include scanned pages, first route them through granite-vision-4.1-4b to extract text and table content before passing to the 30B model for compliance analysis. Pass all user-facing outputs through granite-guardian-4.1-8b to screen for sensitive information before returning results.
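The routing logic in the sample deployment above can be sketched as a small orchestration plan. The model names match the Granite 4.1 family; the stage planner, the report validator, and its shorthand field names (standing in for the four numbered items in the prompt) are illustrative, not an official pipeline:

```python
# Sketch of the routing described above: scanned documents go through the vision
# model first, every document goes through the 30B model for compliance analysis,
# and all outputs are screened by the guardian model before serving.

REQUIRED_FIELDS = {"requirement", "applies_to", "deadline", "penalty"}

def plan_stages(has_scanned_pages: bool) -> list:
    """Return the ordered list of models a document should flow through."""
    stages = []
    if has_scanned_pages:
        stages.append("granite-vision-4.1-4b")   # text and table extraction
    stages.append("granite-4.1-30b-instruct")    # compliance analysis
    stages.append("granite-guardian-4.1-8b")     # safety screening
    return stages

def validate_report(entries: list) -> bool:
    """Check the 30B model's JSON array has one well-formed entry per requirement."""
    return all(REQUIRED_FIELDS <= set(e) for e in entries)
```

Validating the extracted JSON before it leaves the pipeline keeps a malformed model response from reaching downstream systems.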

NVIDIA: Nemotron-3-Nano-Omni-30B-A3B-Reasoning-FP8

Model Specs

  • Parameters / size: 31B total, ~3B activated per forward pass (Mamba2-Transformer Hybrid Mixture-of-Experts)
  • Context length: 256,000 tokens
  • Primary task: Video-audio-image-text-to-text (Multimodal understanding, reasoning, tool calling)

Why it's interesting

  • Multimodal input from a single efficient endpoint: Nemotron-3-Nano-Omni-30B-A3B-Reasoning supports video (up to 2 minutes), audio (up to 1 hour), images (RGB), and text—all from a single model deployed in Microsoft Foundry. Three variants are available in Foundry: full-precision BF16, FP8, and NVFP4. Paper: Nemotron Nano Omni technical report.
  • Strong results across vision, document, video, and audio benchmarks: With reasoning mode enabled, the model scores 82.8 on MathVista-MINI (visual math reasoning), 67.04 on OCRBenchV2-EN (document OCR), 63.6 on Charxiv Reasoning (chart understanding), 72.2 on Video MME (video Q&A), 74.52 on Daily Omni (video+audio omnimodal understanding), and 89.39 on VoiceBench (speech instruction following). On OSWorld (GUI agent benchmark measuring autonomous computer use), it scores 47.4—a notable result for a model at the 3B active parameter scale. (Please see above model cards for further benchmark data)
  • Mamba2-Transformer Hybrid MoE for efficient long-context inference: The model's layers alternate between Mamba2 state-space blocks (which process sequences with linear rather than quadratic cost) and standard Transformer attention blocks, combined with Mixture-of-Experts feedforward layers. Only ~3B parameters are activated per token despite the 31B total, making the 256K context window practically usable at lower compute cost than a comparably sized dense model.
  • Word-level timestamps, JSON output, and tool calling for structured media workflows: The model produces word-level timestamps from audio, enabling precise transcript-to-timecode alignment for review and indexing workflows. Combined with JSON-structured output, chain-of-thought reasoning, and native tool calling, it can serve as an agentic step that ingests raw media (meeting recordings, M&E assets, training videos) and produces structured outputs for downstream systems without requiring separate transcription or OCR preprocessing stages.
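The transcript-to-timecode alignment mentioned above can be sketched as a phrase lookup over word-level timestamps. The (word, start, end) tuple shape is an assumption for illustration; check the model card for the actual timestamp output schema:

```python
# Sketch: align a phrase to its timecode range using word-level timestamps of
# the kind the model emits. Timestamp format here is an assumed (word, start,
# end) tuple in seconds, not the model's documented schema.

def find_phrase_span(words, phrase):
    """Return (start, end) seconds for the first occurrence of `phrase`, or None."""
    tokens = phrase.lower().split()
    texts = [w[0].lower() for w in words]
    for i in range(len(texts) - len(tokens) + 1):
        if texts[i:i + len(tokens)] == tokens:
            return words[i][1], words[i + len(tokens) - 1][2]
    return None

transcript = [("the", 0.0, 0.2), ("action", 0.2, 0.6),
              ("item", 0.6, 0.9), ("is", 0.9, 1.0), ("due", 1.0, 1.3)]
```

This is the building block for review and indexing workflows: once a phrase resolves to a timecode range, a player or editor can jump straight to it.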

Try it

Use case → prompt pattern:

  • Meeting intelligence: Submit an audio recording (up to 1 hr); extract a transcript with word-level timestamps, action items, and decisions as structured JSON
  • Video content analysis: Submit a video clip (up to 2 min) plus a query; retrieve a timestamped summary of key events or spoken content
  • Document + audio joint analysis: Submit a scanned document image alongside narrated walkthrough audio; extract and reconcile information from both modalities
  • Multimodal tool calling: Provide tool definitions plus combined image/audio input; the model reasons over the content and executes structured tool calls

Sample prompt for a media and compliance deployment:

You are building a broadcast compliance review system for a media company. Using the Nemotron-3-Nano-Omni-30B-A3B-Reasoning-FP8 endpoint deployed in Microsoft Foundry, submit each recorded segment as video input with the following instruction: "Review this video segment and produce a compliance report as a JSON object with the following fields: transcript (full text with word-level timestamps), flagged_segments (array of objects with start_time, end_time, content, and reason for flagging), speaker_count (estimated number of distinct speakers), and compliance_summary (overall assessment). Flag any content that includes unverified factual claims, restricted product categories, or regulatory disclosures that may be incomplete." Use the word-level timestamps from the compliance report to route flagged segments directly to the editorial review queue with precise timecode references.
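The last step of that deployment, routing flagged segments to the editorial review queue with timecode references, can be sketched as a small transform over the report JSON. The report field names follow the prompt above; the queue-entry shape and the timecode helper are illustrative:

```python
# Sketch: turn flagged_segments from the compliance report JSON (field names
# from the prompt above) into review-queue entries with hh:mm:ss timecodes.
# The queue-entry shape is an assumption for illustration.

def format_timecode(seconds: float) -> str:
    m, s = divmod(int(seconds), 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def to_review_queue(report: dict) -> list:
    return [
        {
            "timecode": f"{format_timecode(seg['start_time'])}-{format_timecode(seg['end_time'])}",
            "reason": seg["reason"],
            "content": seg["content"],
        }
        for seg in report.get("flagged_segments", [])
    ]

report = {"flagged_segments": [
    {"start_time": 75.0, "end_time": 92.5,
     "content": "claim about product efficacy", "reason": "unverified factual claim"}]}
queue = to_review_queue(report)
```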

Qwen: Qwen3.6-35B-A3B

Model Specs

  • Parameters / size: 35B total, 3B activated (Mixture-of-Experts)
  • Context length: 262,144 tokens natively, extensible to 1,010,000 tokens
  • Primary task: Image-text-to-text (agentic coding, reasoning, vision)

Why it's interesting

  • Agentic coding improvements over Qwen3.5-35B-A3B: Qwen3.6-35B-A3B scores 73.4 on SWE-bench Verified (vs. 70.0 for Qwen3.5-35B-A3B and 52.0 for Gemma 4 31B), 67.2 on SWE-bench Multilingual (vs. 60.3 and 51.7), and 49.5 on SWE-bench Pro (vs. 44.6 and 35.7). Terminal-Bench 2.0 reaches 51.5 (vs. 40.5 and 42.9). The update targets frontend workflows and repository-level reasoning specifically, areas where earlier Qwen3.5 iterations showed gaps. Blog post: Qwen3.6-35B-A3B.
  • Hybrid architecture: Gated DeltaNet and Mixture-of-Experts: The model's 40 layers alternate between Gated DeltaNet blocks (a form of linear attention that avoids the quadratic cost of standard self-attention), Gated Attention blocks (using Grouped Query Attention with 16 query heads and 2 key-value heads), and Mixture-of-Experts (MoE) feedforward layers with 256 experts (8 routed + 1 shared active per token). Only 3B parameters are activated per forward pass, keeping inference cost comparable to a 3B dense model while retaining the capacity of a 35B model for knowledge and specialization.
  • Thinking preservation across conversation turns: Qwen3.6 introduces an option to retain reasoning context from previous messages in multi-turn conversations. In prior models, chain-of-thought traces were stripped between turns, requiring the model to re-derive context it had already reasoned through. With thinking preservation enabled, iterative coding workflows—such as debugging across multiple exchanges—benefit from accumulated reasoning without repeating earlier analysis.
  • Natively extensible to 1 million token context: The 262K native context is already among the largest in open models at this size, and the architecture supports extension to 1,010,000 tokens. On GPQA Diamond (science reasoning), Qwen3.6-35B-A3B scores 86.0—above both Gemma 4 31B (84.3) and Qwen3.5-27B (85.5)—while matching Gemma 4 31B on MMLU Pro (85.2) and LiveCodeBench v6 (80.4 vs. 80.0).
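The difference thinking preservation makes to a multi-turn history can be sketched as a message builder. The "reasoning" field name here is an assumption for illustration; consult the Qwen3.6 documentation for the actual API shape and how the feature is enabled:

```python
# Sketch: with thinking preservation, the assistant's reasoning trace stays in
# the history and is carried into the next turn; prior models stripped it.
# The "reasoning" field name is assumed, not the documented schema.

def build_history(turns, preserve_thinking: bool):
    """Each turn is (user_msg, reasoning, answer)."""
    history = []
    for user_msg, reasoning, answer in turns:
        history.append({"role": "user", "content": user_msg})
        msg = {"role": "assistant", "content": answer}
        if preserve_thinking:
            msg["reasoning"] = reasoning  # retained for the next turn's context
        history.append(msg)
    return history

turns = [("Why does the test fail?",
          "Traced the null to parse()",
          "parse() returns None on empty input")]
with_thinking = build_history(turns, True)
without_thinking = build_history(turns, False)
```

In the preserved case, a follow-up like "fix it" can build on the earlier trace instead of re-deriving where the null came from.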

Try it

Use case → prompt pattern:

  • Repository-level code change: Provide the repository structure plus a task description; the model plans file edits and outputs a unified diff
  • Multi-turn iterative debugging: Enable thinking preservation; submit a failing test plus code across multiple turns; accumulate reasoning context
  • Frontend code generation: Provide a design spec or screenshot plus existing codebase context; generate the component implementation
  • Long-document reasoning: Submit a technical specification (up to 262K tokens); ask the model to identify ambiguities or implementation gaps

Sample prompt for a software engineering deployment:

You are building an automated code review and implementation assistant for a platform engineering team. Using the Qwen3.6-35B-A3B endpoint deployed in Microsoft Foundry, enable thinking preservation for multi-turn sessions. In the first turn, submit the repository file tree and a GitHub issue describing a required API endpoint change. Prompt the model: "Review the repository structure and describe your implementation plan, including which files need to change and why." In the second turn, submit the relevant source files and prompt: "Based on your earlier plan, implement the changes and produce a unified diff." In the third turn, submit the test suite and prompt: "Write additional unit tests for the new endpoint, covering edge cases identified in your reasoning." The thinking preservation feature ensures the model carries forward its understanding of the codebase across all three turns without re-explaining context.
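The three-turn session above can be sketched as a scripted prompt builder. The wording follows the sample prompt; the function, its arguments, and the placeholder inputs are illustrative, and the actual endpoint call is omitted:

```python
# Sketch: generate the user message for each of the three review turns in order.
# Only the conversation structure is shown; sending each message to the
# Qwen3.6-35B-A3B endpoint (with thinking preservation enabled) is left out.

def build_session(file_tree: str, issue: str, sources: str, tests: str):
    yield (f"Repository structure:\n{file_tree}\n\nIssue:\n{issue}\n\n"
           "Review the repository structure and describe your implementation plan, "
           "including which files need to change and why.")
    yield (f"Source files:\n{sources}\n\n"
           "Based on your earlier plan, implement the changes and produce a unified diff.")
    yield (f"Test suite:\n{tests}\n\n"
           "Write additional unit tests for the new endpoint, covering edge cases "
           "identified in your reasoning.")

turns = list(build_session("src/\n  api.py", "Add a /v2/users endpoint",
                           "# api.py contents here", "# test suite here"))
```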

Getting started

You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub: select any supported model and choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover and deploy models in the Microsoft Foundry documentation.
