Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Now in Foundry: Microsoft Harrier and NVIDIA EGM-8B


This week's Model Mondays edition highlights two models that share a common thread: each achieves results comparable to larger leading models through targeted training strategies rather than scale. Microsoft Research's harrier-oss-v1-0.6b achieves state-of-the-art results on the Multilingual MTEB v2 embedding benchmark at 0.6B parameters through contrastive learning and knowledge distillation. NVIDIA's EGM-8B scores 91.4 average IoU on the RefCOCO visual grounding benchmark by training a small Vision Language Model (VLM) with reinforcement learning to match the output quality of much larger models.

Together they represent a practical argument for efficiency-first model development: the gap between small and large models continues to narrow when training methodology is the focus rather than parameter count alone.

Models of the week

Microsoft Research: harrier-oss-v1-0.6b

Model Specs

  • Parameters / size: 0.6B
  • Context length: 32,768 tokens
  • Primary task: Text embeddings (retrieval, semantic similarity, classification, clustering, reranking)

Why it's interesting

  • State-of-the-art on Multilingual MTEB v2: harrier-oss-v1-0.6b is a new embedding model from Microsoft Research that achieves a 69.0 score on the Multilingual MTEB v2 (Massive Text Embedding Benchmark) leaderboard, placing it at the top of its size class at release. It is part of the harrier-oss family spanning harrier-oss-v1-270m (66.5 MTEB v2), harrier-oss-v1-0.6b (69.0), and harrier-oss-v1-27b (74.3), with the 0.6B variant further trained with knowledge distillation from the larger family members. Benchmarks: Multilingual MTEB v2 Leaderboard.
  • Decoder-only architecture with task-instruction queries: Unlike most embedding models that use encoder-only transformers, harrier-oss-v1-0.6b uses a decoder-only architecture with last-token pooling and L2 normalization. Queries are prefixed with a one-sentence task instruction (e.g., "Instruct: Retrieve relevant passages that answer the query\nQuery: ...") while documents are encoded without instructions—allowing the same deployed model to be specialized for retrieval, classification, or similarity tasks through the prompt alone.
  • Broad task coverage across six embedding scenarios: The model is trained and evaluated on retrieval, clustering, semantic similarity, classification, bitext mining, and reranking—making it suitable as a general embedding backbone for multi-task pipelines rather than a single-use retrieval model. One endpoint, consistent embeddings across the stack.
  • 100+ language support: Trained on a large-scale mixture of multilingual data covering Arabic, Chinese, Japanese, Korean, and 100+ additional languages, with strong cross-lingual transfer for tasks that span language boundaries.
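The query-instruction pattern described above can be sketched in a few lines. In the snippet below, `embed` is a hash-based stand-in for the deployed harrier-oss-v1-0.6b endpoint (the function name and call shape are illustrative, not the Foundry SDK); queries get the task-instruction prefix while documents are encoded bare, and ranking is plain cosine similarity over unit vectors:

```python
import hashlib
import math

INSTRUCT = "Instruct: Retrieve relevant passages that answer the query\nQuery: "

def embed(text: str, dim: int = 8) -> list[float]:
    # Stand-in for the real embedding endpoint: a deterministic
    # pseudo-embedding derived from a hash, L2-normalized like the
    # model's actual output. Swap in the deployed endpoint call.
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b - 128 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def search(query: str, docs: list[str], top_k: int = 3):
    # Queries carry the task instruction; documents do not.
    q = embed(INSTRUCT + query)
    scored = []
    for doc in docs:
        d = embed(doc)
        cos = sum(a * b for a, b in zip(q, d))  # cosine: both unit vectors
        scored.append((cos, doc))
    scored.sort(reverse=True)
    return scored[:top_k]

docs = ["Remote work policy", "Politique de congés", "技術ドキュメント"]
results = search("vacation policy", docs, top_k=2)
for score, doc in results:
    print(f"{score:+.3f}  {doc}")
```

Because the instruction lives in the query string, the same deployed model can be repointed at classification or similarity tasks just by changing the prefix.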

Try it

| Use Case | Prompt Pattern |
| --- | --- |
| Multilingual semantic search | Prepend task instruction to query; encode documents without instruction; rank by cosine similarity |
| Cross-lingual document clustering | Embed documents across languages; apply clustering to group semantically related content |
| Text classification with embeddings | Encode labeled examples + new text; classify by nearest-neighbor similarity in embedding space |
| Bitext mining | Encode parallel corpora in source and target languages; align segments by embedding similarity |

Sample prompt for a global enterprise knowledge base deployment:

You are building a multilingual internal knowledge base for a global professional services firm. Using the harrier-oss-v1-0.6b endpoint deployed in Microsoft Foundry, encode all internal documents—policy guides, project case studies, and technical documentation—across English, French, German, and Japanese. At query time, prepend the task instruction to each employee query: "Instruct: Retrieve relevant internal documents that answer the employee's question\nQuery: {question}". Retrieve the top-5 most similar documents by cosine similarity and pass them to a language model with the instruction: "Using only the provided documents, answer the question and cite the source document title for each claim. If no document addresses the question, say so."

NVIDIA: EGM-8B

Model Specs

  • Parameters / size: ~8.8B
  • Context length: 262,144 tokens
  • Primary task: Image-text-to-text (visual grounding)

Why it's interesting

  • Performs on par with larger models on visual grounding despite its small size: EGM-8B achieves 91.4 average Intersection over Union (IoU) on the RefCOCO benchmark, the standard measure of how accurately a model localizes a described region within an image. Compared to its base model Qwen3-VL-8B-Thinking (87.8 IoU), EGM-8B gains +3.6 IoU through targeted Reinforcement Learning (RL) fine-tuning. Benchmarks: EGM Project Page.
  • 5.9x faster than larger models at inference: EGM-8B achieves 737ms average latency. The research demonstrates that test-time compute can be scaled horizontally across small models—generating many medium-quality responses and selecting the best—rather than relying on a single expensive forward pass through a large model.
  • Two-stage training: EGM-8B is trained first with Supervised Fine-Tuning (SFT) on detailed chain-of-thought reasoning traces generated by a proprietary VLM, then refined with Group Relative Policy Optimization (GRPO) using a reward function combining IoU accuracy and task success. The intermediate SFT checkpoint is available as nvidia/EGM-8B-SFT for developers who want to experiment with the intermediate stage.
  • Addresses a root cause of small model grounding errors: The EGM research identifies that 62.8% of small model errors on visual grounding stem from complex multi-relational descriptions—where a model must reason about spatial relationships, attributes, and context simultaneously. By focusing test-time compute on reasoning through these complex prompts, EGM-8B closes the gap without increasing the underlying model size.
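The IoU metric behind these numbers is straightforward to compute directly. The sketch below implements box IoU plus a hypothetical reward blend in the spirit of the GRPO setup described above; the `reward` weighting is an assumption for illustration, since NVIDIA has not published the exact formula:

```python
def iou(a, b):
    # Intersection over Union for axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def reward(pred, gold, task_success: bool, alpha: float = 0.5):
    # Hypothetical reward combining IoU accuracy and task success,
    # mirroring the GRPO objective described for EGM-8B (the alpha
    # weighting is an illustrative assumption, not the published value).
    return alpha * iou(pred, gold) + (1 - alpha) * float(task_success)

# Best-of-N selection over candidate groundings against a reference box.
gold = (10, 10, 50, 50)
candidates = [(12, 11, 48, 52), (30, 30, 90, 90), (0, 0, 20, 20)]
best = max(candidates, key=lambda c: iou(c, gold))
print(best, round(iou(best, gold), 3))
```

This is the same "many candidates, pick the best" shape the research uses to scale test-time compute horizontally across small models.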

Try it

| Use Case | Prompt Pattern |
| --- | --- |
| Object localization | Submit image + natural language description; receive bounding box coordinates |
| Document region extraction | Provide scanned document image + field description; extract specific regions |
| Visual quality control | Submit product image + defect description; localize defect region for downstream classification |
| Retail shelf analysis | Provide shelf image + product description; return location of specified SKU |

Sample prompt for a retail and logistics deployment:

You are building a visual inspection system for a logistics warehouse. Using the EGM-8B endpoint deployed in Microsoft Foundry, submit each incoming package scan image along with a natural language grounding query describing the region of interest: "Please provide the bounding box coordinate of the region this sentence describes: {description}". For example: "the label on the upper-left side of the box", "the barcode on the bottom face", or "the damaged corner on the right side". Use the returned bounding box coordinates to route each package to the appropriate inspection station based on the identified region.

Getting started

You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub: select any supported model and choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover and deploy models in the Microsoft Foundry documentation:

Read the whole story
alvinashcraft
4 hours ago
reply
Pennsylvania, USA
Share this story
Delete

Skia vs Impeller: A Performance Comparison


Through Google’s official videos, we’ve learned about the rendering advantages of Impeller and its overall rendering pipeline. Next, I conducted an experiment to compare the rendering performance of Skia and Impeller in practice.

I wrote a custom Canvas animation that draws a circle on the screen and generates bubbles at random positions to simulate a complex UI animation scenario. The effect is shown below:

The test device was a Pixel 8 Pro. The rendering results were as follows:

Rendering performance using Skia:

Rendering performance using Impeller:

I also exported profiling data with DevTools and analyzed it using a custom Python script, producing the following table:

As shown, Impeller has a clear advantage in GPU rendering: average GPU raster time per frame was reduced by about 30% (2.81ms vs 4.05ms), and the 95th percentile was reduced by ~1.5ms. This means Impeller places less load on the GPU thread and completes frame rendering faster. On the UI thread, frame build time was almost identical for both renderers at around 2ms, confirming that switching renderers has little impact on layout/build overhead.

In terms of total frame time, Impeller averaged ~6.57ms per frame versus Skia’s ~7.71ms. With a 120Hz refresh budget of 8.33ms, most Impeller frames finished before the next VSync. As shown, 91.6% of Impeller frames stayed within the 8.33ms target, enabling near-120fps smoothness, while Skia only achieved 67.1%. In other words, about one-third of Skia frames missed the 120Hz deadline (falling between 8–12ms), effectively lowering the frame rate to somewhere between 60–120fps. By reducing GPU stalls, Impeller delivered far more frames at the full 120fps. Both renderers rarely exceeded 16.67ms (dropping below 60fps): ~0.09% for Skia and ~0.29% for Impeller. Impeller’s few outliers were linked to pipeline creation overhead, which was smoothed out in the averages.
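The budget analysis quoted above is easy to reproduce from exported frame times. This is a minimal sketch of that kind of script; the sample frame times are illustrative, not the measured traces from this test:

```python
def frame_stats(frame_times_ms, budget_ms=8.33):
    # Summarize frame times the way the comparison above does:
    # average, 95th percentile, and the share of frames inside the
    # refresh budget (8.33 ms for 120 Hz, 16.67 ms for 60 Hz).
    xs = sorted(frame_times_ms)
    avg = sum(xs) / len(xs)
    p95 = xs[min(len(xs) - 1, int(0.95 * len(xs)))]
    within = sum(t <= budget_ms for t in xs) / len(xs)
    return avg, p95, within

# Illustrative numbers only.
impeller = [6.1, 6.4, 6.6, 6.8, 7.0, 7.2, 6.3, 6.5, 8.1, 9.0]
avg, p95, within = frame_stats(impeller)
print(f"avg={avg:.2f}ms p95={p95:.2f}ms within-budget={within:.0%}")
```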

Overall, the statistics and charts show Impeller’s performance advantage: lower average frame times and less frame time jitter. Next, let’s break down the differences by event category.

1. UI Thread Frame Build (UI::Frame)

The UI thread handles build/layout operations, determined by the app’s widget tree and logic, and is generally unaffected by the renderer. The data confirms Impeller does not change UI thread efficiency. I did capture an anomalous Skia frame with ~39.8ms UI build time, likely caused by GC or an edge case unrelated to the renderer. Impeller showed no such spikes. In general, UI thread costs can be considered equivalent.

However, UI performance can be indirectly affected if the GPU lags and blocks UI from producing new frames (PipelineProduce delays). In my data, Skia occasionally showed very short UI build times but overall frame times near two frame intervals, implying UI was waiting for the GPU. This was far less common under Impeller.

2. GPU Thread Rasterization (Rasterizer::Draw)

The GPU thread is where Impeller shows the largest gains. In Skia, GPU raster averaged 4.05ms per frame, while Impeller averaged 2.81ms, about 1.2ms faster per frame. This improvement comes from Impeller’s use of modern graphics APIs, batching to reduce driver overhead, and avoiding runtime shader compilation.

Skia often showed spikes corresponding to JIT shader compilation, with some frames hitting 9–12ms or more. Impeller avoids this by precompiling shaders, though it does incur one-time pipeline creation overhead. In my traces, two such events caused spikes of 17ms and 35ms, producing temporary stalls. These are predictable and can be prewarmed, unlike Skia’s unpredictable shader jitters.

3. PipelineProduce and Frame Synchronization

PipelineProduce is where the UI thread submits scenes to the GPU. If the GPU is behind, UI may block. My analysis found Skia triggered 29 SceneDisplayLag events vs 12 for Impeller, about 2.4× more frequent. This aligns with the higher proportion of Skia frames missing the 8.33ms vsync deadline. In some Skia frames, UI finished quickly but still waited nearly a whole frame for GPU, while Impeller’s faster GPU kept the pipeline flowing smoothly.

4. Shader Compilation and GPU Events

Skia timelines showed multiple ShaderCompile events (about 14 total), matching the frame time spikes. Impeller had none, confirming its precompiled approach. Instead, Impeller timelines showed PipelineVK::Create events, representing pipeline creation on first use. Most were negligible, but a few caused the long-tail spikes noted earlier.

Impeller’s design reduces runtime surprises, providing predictable performance — valuable for animation-heavy apps. It leverages modern GPU APIs for concurrency and finer resource management, though this test’s simple Canvas animation did not fully stress those advantages.

Conclusion

In this custom Canvas animation test, Flutter’s Impeller renderer outperformed Skia. Impeller eliminates runtime shader compilation stalls, delivering lower frame times and more stable performance. For animation-heavy, graphics-rich apps, enabling Impeller significantly reduces jank and provides a smoother user experience.


Skia vs Impeller: A Performance Comparison was originally published in Flutter Community on Medium, where people are continuing the conversation by highlighting and responding to this story.


How to Load Embedding Models into Oracle AI Database in 2026


Key Takeaways

  • Importing ONNX into Oracle AI Database is not “uploading a file” — it registers a model object in your schema.
  • Most “model import failures” are actually privilege or access issues (roles, grants, directory permissions, or wrong schema).
  • You can import models using a directory-based path for fast local iteration, or a database-driven import approach if you handle the model as a BLOB.
  • Once imported, inference becomes a SQL expression, which makes it easy to embed into ETL, pipelines, triggers, or app queries.

Most teams don’t struggle to export a model — they struggle to operationalize it. The hard part starts after training: running the same model reliably across environments in production.

In 2026, many teams already have solid models exported to ONNX, because it’s the most practical handoff format between training and production. It’s portable, widely supported, and makes it easier to reuse models and move them across stacks without re-implementing inference. Yet production inference still ends up scattered across services: a model server here, a vector store there, glue code everywhere in between, and an ever-growing list of credentials and network hops.

Why Choose ONNX

ONNX (Open Neural Network Exchange) is a standard file format for machine learning models. It’s used to package a trained model into a framework-neutral representation, so the model can be moved and executed across different tools and environments more reliably.

Many ML engineers and AI developers choose ONNX over framework-specific formats because it reduces lock-in and friction between training and deployment:

  • Portability: the same model artifact can travel across environments and runtimes with fewer rewrites.
  • Interoperability: training and inference don’t need to share the same framework choices.
  • Reuse: models become shareable artifacts — teams can reuse existing models instead of rebuilding from scratch.
  • Faster prototyping: swapping models becomes simpler, which speeds up experiments and iteration cycles.
  • Cleaner production handoff: ONNX helps separate “model development” from “system integration,” reducing glue code and making deployments more repeatable.

Oracle AI Database changes the story. Instead of treating inference as “something outside the database,” you can import an ONNX model into the data.

How to integrate machine learning into Oracle AI Database

A practical database-to-ML integration usually looks like this:

  1. Start from the data path: decide where inference should run (next to the data vs across services).
  2. Set up ownership and privileges: make schema ownership explicit and keep grants minimal and correct.
  3. Prepare the data in SQL: clean, filter, and validate inside the database.
  4. Package the model artifact: export the model to a portable format (often ONNX) for predictable deployment.
  5. Register the model in the database: import ONNX as a schema-native model object.
  6. Run inference in SQL: generate embeddings or predictions as part of query execution.
  7. Persist what you’ll reuse: store embeddings next to rows, index them, and query with filters.
  8. Operationalize updates: re-embed new/changed data, version models, and keep the pipeline repeatable.

A clean, production-oriented flow looks like this: ADB setup → roles/grants → upload ONNX → inference … and the part that causes most failures in real systems: roles/grants pitfalls.

Before getting into setup, a quick note on the target environment: everything below assumes Oracle AI Database 26ai. The same flow works whether you’re running Autonomous Database (cloud) or Oracle AI Database Free 26ai in a local container — but it’s worth confirming the version up front, since many errors are simply caused by an environment/version mismatch.

SELECT banner_full FROM v$version;

What does it mean to “load ONNX into the database”?

Loading ONNX into the database isn’t “deploying a model server.” It registers a database model object in your schema — something SQL can call directly. A good mental model is a stored procedure, except it’s backed by an ONNX runtime and the model lives as a first-class object in the database.

Here’s the full flow from ONNX import to vector search.

Flow from ONNX import to vector search via Oracle AI Database

ADB setup

This section covers the minimum setup needed to connect, grant the right privileges, and confirm the environment is ready for ONNX import and SQL inference.

What “ready” means for this article

  • You can connect to the database (SQL Developer / SQLcl / app connection string).
  • You have an admin-capable user for grants (usually ADMIN on ADB).
  • You know which schema will own the model (we will use ML_USER).

Roles & grants (the invisible dependency)

The database session that imports the model must be allowed to create a model object and must be allowed to read the model source.

Most “it doesn’t work” tickets come from:

  • missing privileges,
  • directory grants missing,
  • importing under the wrong schema,
  • or relying on a role that isn’t enabled the way you expect.

Minimal schema setup (clean, reusable):

-- Run as ADMIN (or a privileged user)
CREATE USER ml_user IDENTIFIED BY "<strong-password>";
-- A practical baseline role for building DB apps
GRANT DB_DEVELOPER_ROLE TO ml_user;

-- Required for model objects (common requirement when importing ONNX models)
GRANT CREATE MINING MODEL TO ml_user;

Why grant this explicitly? Because ONNX imports land as model objects. If the schema can’t create the object, the import fails, even if the ONNX file is fine.

A small “privilege flow” mental diagram:

Privilege flow diagram

Upload / import ONNX (two production-friendly paths)

You’ll see two patterns in the real world:

  • Directory-based import: fast iteration, simple, great when you have access to a server path.
  • BLOB-based import: model comes as a BLOB (for example, retrieved from object storage via your preferred mechanism), then imported.

This article shows both — cloud-first teams usually prefer the second because it avoids filesystem coupling, but both are valuable.

Option A: Directory-based import (cleanest “first success”)

This approach has one dependency: a database directory object pointing to a server path.

Directory creation and grants

-- Run as ADMIN (or privileged user)
CREATE OR REPLACE DIRECTORY DM_DUMP AS '<work directory path>';
GRANT READ ON DIRECTORY DM_DUMP TO ml_user;
GRANT WRITE ON DIRECTORY DM_DUMP TO ml_user;

Import / load ONNX (with explicit metadata)

Why the JSON metadata matters
The JSON block tells Oracle AI Database how to invoke the ONNX graph for your use case. For embedding models, it removes ambiguity by specifying (1) what kind of function you’re loading, (2) which output should be treated as the embedding vector, and (3) how the model’s input tensor(s) map to the data you’ll pass at inference time. That’s what makes the import self-describing and helps prevent inference mismatches later.

When Oracle defaults are enough
For many standard embedding ONNX models that follow common input/output conventions (single obvious embedding output, expected input naming), a minimal import is often enough to get a clean first success.

When you should use explicit JSON

Use explicit metadata when the model deviates from those conventions — for example:

  • the embedding output tensor has a different name, or there are multiple outputs and you need to choose one
  • the model expects different input tensor names, multiple inputs, or a non-standard signature
  • you’re loading a custom/exported ONNX graph and want the import to stay predictable and future-proof

Here’s a quick way to think about it: some ONNX embedding models are “default-friendly” — for example, a well-known embedding model you grab from Hugging Face and export to ONNX. These typically follow common conventions (one obvious embedding output, expected input naming), so a minimal import is often enough to get a first success. Other models are more custom — maybe a fine-tuned model exported from your own pipeline, or an ONNX graph with different tensor names or multiple outputs. In those cases, explicit JSON metadata removes ambiguity and keeps inference behavior predictable.

Default-friendly (minimal import):

-- Run as ML_USER
-- (EXEC is a one-line SQL*Plus shorthand; use an anonymous block
--  for a multi-line call.)
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    'DM_DUMP',
    'all_minilm_l12_v2.onnx',
    'minilm_embed'
  );
END;
/

Custom model import (explicit JSON metadata):

-- Run as ML_USER (schema owner)
EXEC DBMS_VECTOR.DROP_ONNX_MODEL(model_name => 'doc_model', force => TRUE);

BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    'DM_DUMP',
    'my_embedding_model.onnx',
    'doc_model',
    JSON('{
      "function"        : "embedding",
      "embeddingOutput" : "embedding",
      "input"           : { "input": ["DATA"] }
    }')
  );
END;
/

What happened here (in one paragraph)

  • The ONNX file is read from the directory.
  • A model object named DOC_MODEL is created in ML_USER.
  • The metadata tells the database how to bind SQL input (DATA) to the model input tensor, and which output is the embedding.

Option B: BLOB-based import (best for “cloud-first” pipelines)

In many organizations, models are stored in artifact registries or OCI Object Storage. The principle stays the same:

Get the model into a BLOB → import the BLOB as a model object.
Generic BLOB import
-- Run as ML_USER
BEGIN
  DBMS_DATA_MINING.IMPORT_ONNX_MODEL(
    model_name => 'doc_model',
    model_data => :model_blob,
    metadata   => JSON('{
      "function"        : "embedding",
      "embeddingOutput" : "embedding",
      "input"           : { "input": ["DATA"] }
    }')
  );
END;
/

Import paths visualized

Import path diagram

How do you know the model is really in the database?

Import is only useful if you can confirm two things quickly:

  • the model exists as an object in your schema, and
  • the database understands the model signature well enough to run inference.

That’s why it helps to keep a small “verification corner” in your notebook or SQL script — especially when you’re iterating on metadata JSON or switching between schemas.

Check that the model is registered

-- Run as ML_USER
SELECT model_name, mining_function, algorithm, model_size
FROM user_mining_models
WHERE model_name = 'DOC_MODEL';

Inspect model attributes (helps spot signature issues)

-- Run as ML_USER
SELECT model_name, attribute_name, attribute_type, data_type, vector_info
FROM user_mining_model_attributes
WHERE model_name = 'DOC_MODEL'
ORDER BY attribute_name;

Verification mental model

Verification flowchart

Inference: where Oracle AI Database becomes the runtime

Once the model is imported, Oracle AI Database can invoke it directly from SQL. The key benefit here isn’t “SQL can call a model” as a party trick — it’s that inference becomes a database-native operation, living inside the same security boundary as your data, with the same lifecycle discipline you already apply to database artifacts.

A quick confidence check (smoke test)

This single call verifies the whole chain: model registry, signature mapping, privileges, and runtime.

SELECT VECTOR_EMBEDDING(doc_model USING 'hello' AS data) AS embedding;

If this returns a vector, you’ve confirmed the end-to-end path is correct.

A practical pattern: persist embeddings next to your data

In real systems, you typically don’t want embeddings to exist only at query time. You want them stored alongside content so they can be reused consistently (search, recommendations, ranking, analytics), and refreshed intentionally when the model or data changes.

CREATE TABLE docs (
  doc_id  NUMBER GENERATED BY DEFAULT AS IDENTITY,
  content VARCHAR2(4000),
  embed   VECTOR
);

INSERT INTO docs (content, embed)
VALUES (
  'Oracle AI Database can run ONNX models in-database.',
  VECTOR_EMBEDDING(doc_model USING 'Oracle AI Database can run ONNX models in-database.' AS data)
);

COMMIT;

Embedding lifecycle (what you’re building)

Embedding lifecycle diagram

Persisted vectors unlock a clean database-native workflow:

  • embeddings are created where the data is,
  • stored and governed like any other column,
  • reused across queries and applications,
  • and can be refreshed via jobs/pipelines with predictable cost control.

We’ve also published a runnable companion notebook for this article in the Oracle AI Developer Hub on GitHub. It walks through the same end-to-end workflow, from importing an ONNX embedding model into Oracle AI Database to validating the model and running in-database embedding inference in SQL.

ML model data sync and retraining: what to keep stable

Once embeddings are persisted, the main question becomes lifecycle: how to keep vectors in sync as data and models evolve. A practical pattern is to treat embeddings as derived data: refresh them when rows change, and re-embed the corpus when you promote a new model version. Keep the model version explicit, validate with a small smoke test, and only then roll the change into production workflows.
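As a sketch, the "refresh when rows change" half of that pattern can be a single UPDATE. The `updated_at` and `last_embedded_at` tracking columns here are hypothetical additions to the docs table, not part of the schema created earlier:

```sql
-- Sketch: re-embed only rows touched since the last refresh.
-- Assumes hypothetical updated_at / last_embedded_at columns.
UPDATE docs
SET    embed            = VECTOR_EMBEDDING(doc_model USING content AS data),
       last_embedded_at = SYSTIMESTAMP
WHERE  updated_at > last_embedded_at;

COMMIT;
```

Promoting a new model version is the same statement without the WHERE clause, run once against the whole corpus after the smoke test passes.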

Security and privacy risks in ML–database integration

Most failures here are not “ML problems” — they’re access and boundary problems. Keep a least-privilege mindset: make schema ownership clear, restrict who can import models, and avoid broad grants when a single directory/BLOB permission is enough. The safest operational setup is the one that keeps inference and data access inside the database security boundary, with auditing and predictable privileges.

Common ADB roles/grants pitfalls (and fast fixes)

This section exists because most “ONNX import is broken” reports are actually “permissions and ownership are unclear.” The following pitfalls show up repeatedly in real ADB environments.

Pitfall 1: ORA “insufficient privileges” during model import

Symptom: import/load fails with an insufficient privileges error.
Root cause: the schema can’t create model objects.
Fix: grant explicitly:

GRANT CREATE MINING MODEL TO ml_user;

Tip: When you’re troubleshooting, prefer explicit grants over “it’s in some role,” because role-enabled behavior can vary across tools and execution contexts.

Pitfall 2: The import worked, but the model “is missing”

Symptom: you imported DOC_MODEL, but USER_MINING_MODELS shows nothing.
Root cause: you’re connected as the wrong user (different schema than the importer).
Fast check:

SHOW USER;
SELECT model_name FROM user_mining_models;

Fix: connect as the schema owner that performed the import — or query broader views if you have privileges (ALL/DBA).

Pitfall 3: Directory-based import fails even though the file exists

Symptom: directory/path read errors.
Root cause: missing directory object, missing READ/WRITE, or directory points to a server path that isn’t accessible in that environment.

Fix checklist:

  • directory exists and points to the correct path
  • ML_USER has READ (and often WRITE) on the directory
  • the ONNX file is present at that path (server-side)

Pitfall 4: Import succeeds, but inference fails

Symptom: VECTOR_EMBEDDING errors, returns unexpected output, or can’t bind input.
Root cause: metadata JSON doesn’t match the model signature (wrong input tensor name / wrong output name).

Fix: import with explicit JSON metadata and validate attributes:

SELECT model_name, attribute_name, attribute_type, data_type
FROM user_mining_model_attributes
WHERE model_name = 'DOC_MODEL'
ORDER BY attribute_name;

Pitfall 5: “It works in one tool but not in another”

Symptom: you can import or run inference in one client, but not in another.

Root cause: relying on privileges delivered via roles vs direct grants; differences in execution context can surface as “random” failures.

Fix: for the importing schema, grant the required privileges directly while validating the workflow, then tighten later.

Pitfall map (symptom → root cause → fix)

Map of common pitfalls

Conclusion

Thank you for reading, and we hope you found this useful!

Don’t forget to check out the companion notebook on our Oracle AI Developer Hub on GitHub.

Frequently Asked Questions

What are the steps to connect a database to a machine learning pipeline?

Use a simple flow: prepare data in SQL → export a portable model artifact (often ONNX) → import/register it in the database → run inference in SQL → persist and index results → automate refresh and versioning.

How can I keep my database and ML model data in sync?

Treat embeddings as derived data. Refresh vectors when rows change, and re-embed when you promote a new model version. Keep model/version metadata explicit so the refresh is deterministic.

How do I update my ML model with new data from the database?

In practice, it’s an iteration loop: collect new data → retrain → export to ONNX → import as a new model version → validate quickly → re-embed what needs refreshing.

What challenges arise when integrating large databases with ML?

Data movement and operational complexity. Running inference where the data lives and persisting results reduces network hops, simplifies governance, and improves repeatability.

What security concerns exist when linking databases to ML systems?

Least privilege and clear ownership. Restrict model import privileges, control directory/BLOB access, and keep inference within the database boundary when possible.

Do ONNX models become database objects?

Yes — after import they exist as schema-managed model objects and can be referenced from SQL.

Which privileges matter most?

CREATE MINING MODEL is the first one to confirm. For directory-based import, directory READ/WRITE grants are equally important.

Should I compute embeddings at query time or store them?

For experimentation, computing embeddings at query time is fine. For production reuse and indexing, persisting embeddings next to the source data is usually the practical choice — especially if you want hybrid search.

Hybrid search combines keyword search over the original text (exact matches, filters, structured predicates) with semantic similarity over embeddings (meaning-based matching). Keeping both the text and vectors in the same database makes it easy to blend the two signals in one query, which often yields better relevance than either keyword-only or vector-only retrieval.

A common pattern is: persist embeddings for your corpus, compute the query embedding at runtime, and use both for hybrid ranking. This is a key Oracle AI Database advantage: combining relational predicates, keyword/text search, and vector similarity in a single SQL query — without stitching together separate systems.
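A minimal hybrid query might look like the following sketch. The keyword predicate, column names, and the `:query_vec` bind variable (the runtime query embedding) are illustrative:

```sql
-- Sketch: relational/keyword filter plus semantic ranking in one query.
SELECT doc_id, content
FROM   docs
WHERE  content LIKE '%ONNX%'                          -- keyword predicate
ORDER  BY VECTOR_DISTANCE(embed, :query_vec, COSINE)  -- semantic ranking
FETCH FIRST 5 ROWS ONLY;
```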

What’s the fastest end-to-end validation?

Run a single VECTOR_EMBEDDING smoke test and confirm the model appears in USER_MINING_MODELS.

Who does the “heavy lifting” for embeddings when the ONNX model runs in-database?

The computation happens inside Oracle AI Database, as part of the SQL execution. When you call VECTOR_EMBEDDING(...) (or invoke the imported model via SQL), the database runs the ONNX runtime and produces the embedding vector on the database side—no external model server is required for the inference step.

In practice, this means embeddings are computed where the data lives, and the results can be immediately stored, indexed, filtered, and combined with SQL predicates in the same workflow.


How to Load Embedding Models into Oracle AI Database in 2026 was originally published in Oracle Developers on Medium, where people are continuing the conversation by highlighting and responding to this story.

Read the whole story
alvinashcraft
4 hours ago
reply
Pennsylvania, USA
Share this story
Delete

🖼🚀️ MAI-Image-2 Just Dropped — And .NET Support Is Already Here

1 Share

⚠ This blog post was created with the help of AI tools. Yes, I used a bit of magic from language models to organize my thoughts and automate the boring parts, but the geeky fun and the 🤖 in C# are 100% mine.

Hi!

When Microsoft announced MAI-Image-2, I immediately thought: “I need to add this to ElBruno.Text2Image. Today.”

So I did. 😄

MAI-Image-2 is Microsoft’s new image generation model on Microsoft Foundry — high-quality generation, a synchronous API (no polling!), a 32K character prompt limit, and flexible dimensions. And it’s already supported in ElBruno.Text2Image with the same clean interface you already know.

Let me show you how it works.


☁ Getting Started — MAI-Image-2 on Azure AI Foundry

MAI-Image-2 delivers high-quality image generation with a simpler developer experience than FLUX.2. The API is synchronous — you send a request, you get an image back. No 202 status codes, no polling loops, no waiting callbacks. Just a prompt and a picture.

Here’s all you need:

using ElBruno.Text2Image;
using ElBruno.Text2Image.Foundry;

using var generator = new MaiImage2Generator(
    endpoint: "https://your-resource.services.ai.azure.com",
    apiKey: "your-api-key",
    modelId: "MAI-Image-2");

var result = await generator.GenerateAsync(
    "a futuristic cityscape with neon lights, cyberpunk style");

await result.SaveAsync("mai-image2-output.png");
Console.WriteLine($"Generated in {result.InferenceTimeMs}ms");

Setting up credentials

The library reads from User Secrets, environment variables, or appsettings.json. For local development:

dotnet user-secrets set MAI_IMAGE2_ENDPOINT "https://your-resource.services.ai.azure.com"
dotnet user-secrets set MAI_IMAGE2_API_KEY "your-api-key-here"
dotnet user-secrets set MAI_IMAGE2_MODEL_ID "MAI-Image-2"

💡 Fun fact: MAI-Image-2 uses a dedicated /mai/v1/images/generations endpoint. The library handles this automatically — just provide your .services.ai.azure.com base URL and it builds the correct API path for you.
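The path itself comes from the post; the joining logic below is only a guess at what the library does internally, sketched in Python rather than taken from its C# source:

```python
def build_mai_url(base_endpoint: str) -> str:
    """Append the dedicated MAI-Image-2 path to a Foundry base URL,
    tolerating a trailing slash. A sketch of what the library likely
    does internally -- not its actual implementation."""
    return base_endpoint.rstrip("/") + "/mai/v1/images/generations"

print(build_mai_url("https://your-resource.services.ai.azure.com"))
# https://your-resource.services.ai.azure.com/mai/v1/images/generations
```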


⚡ Key Differences from FLUX.2

If you’re already using FLUX.2 with this library, here’s how MAI-Image-2 compares:

| Feature | MAI-Image-2 | FLUX.2 |
| --- | --- | --- |
| API style | Synchronous (direct response) | Asynchronous (202 + polling) |
| API path | /mai/v1/images/generations | BFL provider path |
| Prompt limit | 32,000 characters | ~1,000 characters |
| Min dimensions | 768px (per side) | 256px |
| Max dimensions | 1M total pixels | Model-dependent |
| Interface | IImageGenerator | IImageGenerator |
| DI support | ✅ Same pattern | ✅ Same pattern |
| Endpoint auto-conversion | ✅ | ✅ |

The synchronous API is a big deal for developer experience. No more writing polling loops or handling intermediate states. Send a prompt, get an image. Done.


🔌 Same Interface, Multiple Backends

🧩 Microsoft.Extensions.AI Compatible

Every generator in the library — including the new MaiImage2Generator — implements the standard Microsoft.Extensions.AI.IImageGenerator interface from the Microsoft.Extensions.AI.Abstractions package. This means you can use ElBruno.Text2Image as a drop-in provider anywhere the MEAI abstraction is expected — dependency injection, middleware pipelines, or any framework that programs against IImageGenerator. Cloud or local, FLUX.2 or MAI-Image-2 or Stable Diffusion — they all plug into the same standard .NET AI contract.

// MAI-Image-2 (cloud)
IImageGenerator generator = new MaiImage2Generator(endpoint, apiKey, modelId: "MAI-Image-2");

// FLUX.2 Pro (cloud)
IImageGenerator generator = new Flux2Generator(endpoint, apiKey, modelId: "FLUX.2-pro");

// Stable Diffusion 1.5 (local)
IImageGenerator generator = new StableDiffusion15();

// Same API for all three
var result = await generator.GenerateAsync("a beautiful landscape");

💉 Dependency Injection

If you’re building with DI, the library has an extension method ready to go:

services.AddMaiImage2Generator(
    endpoint: "https://your-resource.services.ai.azure.com",
    apiKey: "your-api-key",
    modelId: "MAI-Image-2");

Same pattern as the FLUX.2 registration. Inject IImageGenerator and you’re done.


🔗 Links

Happy coding!

Greetings

El Bruno

More posts in my blog ElBruno.com.

More info in https://beacons.ai/elbruno







Build a Movie Watchlist with Node.js, TypeScript, and MongoDB


Almost every modern web application will need a REST API for a client to talk to, and in almost every scenario, that client is going to expect JSON. The best developer experience is a stack where you can stay in JSON-shaped data end to end, without awkward transformations in the middle.

Take MongoDB, Express Framework, and Node.js as an example.

Express receives HTTP requests and sends responses. MongoDB sits in the middle and stores documents. The client can send JSON to your routes, your routes can send documents to MongoDB, and MongoDB can hand BSON back that maps naturally to what you serialize in the response. That works well because MongoDB is a document database. When you also want text search over fields like title and plot, MongoDB Search gives you a $search stage in an aggregation pipeline on the same cluster, so you are not bolting on a separate search system just to power a search box.
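The tutorial itself uses TypeScript, but the $search stage is just a document, so its shape is the same in any driver. Here is a hedged sketch of such a pipeline as plain Python dictionaries; the index name ("default") and the year field are assumptions for illustration:

```python
def watchlist_search_pipeline(query: str, limit: int = 10) -> list[dict]:
    """Aggregation pipeline with a $search stage over title and plot,
    matching the shape of MongoDB Atlas Search's text operator.
    Index and field names are illustrative assumptions."""
    return [
        {"$search": {
            "index": "default",
            "text": {"query": query, "path": ["title", "plot"]},
        }},
        {"$limit": limit},                              # cap the result set
        {"$project": {"title": 1, "plot": 1, "year": 1}},  # trim the payload
    ]
```

Passed to `collection.aggregate(...)`, this runs the full-text match on the same cluster that stores the documents, which is exactly the "no separate search system" point above.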

In this tutorial, we’ll see how to build a small movie watchlist API using TypeScript and MongoDB. We’ll explore a few different schema design opportunities and make use of MongoDB Search for full-text search.

The post Build a Movie Watchlist with Node.js, TypeScript, and MongoDB appeared first on DEV.




HVTools – first Community response is insane!


Last Updated on April 13, 2026 by Michael Morten Sonne

Introduction

The community is key – and I…

The post HVTools – first Community response is insane! first appeared on Blog - Sonne´s Cloud.