Philip Kiely, software developer relations lead at Baseten, speaks with host Jeff Doolittle about multi-agent AI, emphasizing how to build AI-native software beyond simple ChatGPT wrappers. Kiely advocates for composing multiple models and agents that take action to achieve complex user goals, rather than just producing information. He explains the transition from off-the-shelf models to custom solutions, driven by needs for domain-specific quality, latency improvements, and economic sustainability, a shift that introduces the discipline of inference engineering. Kiely stresses that AI engineering is primarily software engineering with new challenges, requiring robust observability and careful consideration of trust and safety through evals and alignment. He recommends an iterative, experimental approach to getting started with multi-agent AI systems.
A lot of people building software today never took the traditional CS path. They arrived through curiosity, a job that needed automating, or a late-night itch to make something work. This week, David Kopec joins me to talk about rebuilding computer science for exactly those folks, the ones who learned to program first and are now ready to understand the deeper ideas that power the tools they use every day.
Photo from Slush: "How to build a cult," James Hawkins, co-founder at PostHog
I snapped the photo above during a talk at Slush 2025, Europe's largest founder-focused conference, and it neatly captures the reality we are living through. The speaker's slide highlighted a brutal truth: we have moved from an era where software was "simple logic" to one where "magic is the new expectation."
In the "Pre-AI" era, building software was a slow process, often limited to basic feature creation like connecting screens to REST APIs. Today, however, development is rapid, and the market is "fiercely competitive." To succeed, we must quickly deliver features that once required specialized ML models and the effort of PhDs.
The great news is that the entry barrier to creating this "magic" has drastically fallen. I am now able to implement complex AI pipelines directly within a Flutter application, thanks to Google's Firebase-hosted Gemini stack.
The Project: Finnish-it
The image description feature is being developed for Finnish-it (iOS, Android), a production Flutter application that aims to guide Finnish 🇫🇮 language learners from CEFR A1 to C1 proficiency levels and prepare them for the YKI (National Certificate of Language Proficiency) examination.
Although the YKI exam doesn't include a dedicated "Image Description" task, the core skills involved, such as spontaneous vocabulary recall, spatial awareness, and developing a coherent narrative, are exactly what test-takers struggle with during the open-ended speaking tasks.
To address this skills gap, I created a feature that uses generated images as a practice framework. Users describe these scenes to build the necessary muscle memory for the exam's "Narration" (Kertominen) and "Opinion" (Mielipide) sections. Traditionally, apps rely on a static database of stock photos for this kind of practice. I wanted something better.
I implemented an automated pipeline capable of generating unique speaking scenarios, creating a photorealistic visual prompt, and strictly mapping B1-level vocabulary to concrete visual elements, all within seconds.
The stack is simple, with no custom infra and no separate backend:
Flutter: "Vibe once, run everywhere".
Firebase AI Logic (firebase_ai): A Flutter plugin to use the Firebase AI Logic SDK, providing access to the latest generative AI models like Gemini and Imagen.
Gemini Models: gemini-2.5-flash for structured text, gemini-2.5-flash-image for scenes, and gemini-3-pro-image-preview for annotated overlays.
Flutter at Google I/O '25
Let's look at how to build this.
1. The Blueprint: Topics as Seeds
We don't want the AI to be completely random; we need it to follow a pedagogical curriculum. However, we also don't want to hardcode thousands of static exercises.
The solution is Structured Serendipity. We maintain a lightweight JSON configuration of topics. Each topic contains a "seed" description: not the final image caption, but a high-level directive that sets the scene for Gemini.
Structured Serendipity: The sweet spot between rigid hard-coding (which doesn't scale) and pure random generation (which risks quality). It allows us to deliver infinite content variations without losing control over the learning outcomes.
Here is a look at our topics.json:
[ { "id": "urbanSpaces", "translations": { "en": "Urban Spaces & City Life", "fi": "Kaupunkitilat ja elĂ€mĂ€", "tr": "Kentsel Alanlar ve Ćehir Hayatı", }, "description": "A broad category for everyday life in Finnish cities and towns. Scenes should always take place in clearly urban or suburban public spaces, but the exact setting should change often. Vary strongly between busy hubs (squares, shopping streets, transit stations) and quiet corners (residential courtyards, small parks, side streets), between different seasons, weather and light (snowy winter, slushy spring, bright summer evening, rainy autumn day, dark winter with streetlights), and between indoor and outdoor locations (modern libraries, shopping centres, lobbies, stairwells, underpasses). Examples might include waiting at a tram or bus stop, a modern central library interior, a pedestrian street with buskers, a railway or metro station hall, or a quiet bench near a playground or recycling point. Treat these only as sample ideas: every new scene is encouraged to invent a fresh, safe and realistic Finnish city situation rather than repeating the same setting." }, { "id": "seasonsAndScenery", "translations": { "en": "Seasons & Scenery", "fi": "Vuodenajat ja maisemat", "tr": "Mevsimler ve Manzaralar", }, "description": "Focuses on the strong seasonal contrasts of Finland, from south to north. Each scenario should clearly highlight one specific combination of season, weather and light, and this combination should vary widely across generations: winter scenes with deep snow, pale blue light, frozen lakes or soft snowfall; spring with melting snow, buds on trees and wet streets; summer with lakes and archipelago, lush green forests, warm evenings or long twilight; autumn with colourful ruska forests, foggy mornings or windy coastal views. The setting can be countryside, small town or city edge, with or without people, as long as the landscape and atmosphere are easy to read, positive or calm in mood and clearly tied to a season. Use many different viewpoints (on a path, from a pier, from a hill, by a shore) instead of repeating the same composition." }, ... ]
When a user selects "Urban Spaces," we pull that description field from the JSON content and inject it directly into our prompt. This gives Gemini the context it needs ("Finnish city life") while leaving the specifics ("a snowy tram stop vs. a library") up to the model's creativity.
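As an illustration, pulling a topic's seed description at runtime could look like this (a minimal sketch; it assumes topics.json is bundled as a Flutter asset at assets/topics.json, which is not spelled out in the article):

import 'dart:convert';

import 'package:flutter/services.dart' show rootBundle;

/// Loads the bundled topic list and returns the seed description for [topicId].
Future<String> topicSeed(String topicId) async {
  final raw = await rootBundle.loadString('assets/topics.json');
  final topics = jsonDecode(raw) as List<dynamic>;
  // Find the selected topic (e.g. 'urbanSpaces') and hand back its seed text.
  final topic = topics.firstWhere((t) => t['id'] == topicId) as Map<String, dynamic>;
  return topic['description'] as String;
}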
2. The Data Model: Enforcing Structure
In the past, integrating Large Language Models (LLMs) often required developers to use lengthy prompts, essentially begging the model to adhere to specific formatting: "Please reply in JSON, do not include markdown blocks, make sure the list is an array..." Even then, you would often get broken syntax or hallucinated fields that crashed your app.
We have a better solution.
We use Structured Output (available in Gemini models via the firebase_ai package). Instead of passively receiving unstructured, open-ended responses, we mandate that Gemini conforms to a strict JSON Schema.
Consider this schema a mandatory blueprint, a binding contract between your Flutter application and the AI. It dictates the exact field names, expected data types, and required fields, ensuring predictable and reliable data output.
In our app, every image-speaking exercise is represented by ImageDescriptionPractice. We define this schema in standard Dart code:
class ImageDescriptionPractice {
  final Uint8List? generatedImageBytes;
  final Uint8List? annotatedImageBytes;
  final String sample;
  final String id;
  final String description;
  final List<String> vocabulary;
  final List<String> annotatedWords;

  static Schema imageDescriptionJsonSchema = Schema.object(
    description:
        "An image description speaking practice exercise for Finnish language learning, focusing on describing an image.",
    properties: {
      'id': Schema.string(
        description: "A unique identifier for the practice item.",
      ),
      'description': Schema.string(
        description:
            "A detailed, photorealistic image prompt in English that will be used to generate an image.",
      ),
      'vocabulary': Schema.array(
        description:
            "An array containing 5 to 10 unique strings. Each string must be a single Finnish vocabulary word (a noun, verb, or adjective) relevant to the image. The vocabulary should be suitable for the CEFR B1 level.",
        items: Schema.string(),
        minItems: 5,
        maxItems: 10,
      ),
      'sample': Schema.string(
        description: "A sample B1-level response in Finnish language describing the image.",
      ),
      'annotatedWords': Schema.array(
        description: "An array of strings containing the words that should be annotated in the image.",
        items: Schema.string(),
        minItems: 6,
        maxItems: 8,
      ),
    },
  );
}
By passing imageDescriptionJsonSchema to the model, the response is always valid JSON. No hallucinated fields, just clean data ready for the UI.
Read more about "Generate structured output (like JSON and enums) using the Gemini API" in the official Firebase docs.
3. The Prompts
We take a rigorous approach to prompting. Rather than simply asking the model to "create a picture," we assign it a specific role, such as Expert Curriculum Designer, and provide a detailed checklist of constraints.
The variable $description below is the "seed text" we pulled from our JSON topic list earlier. We inject that seed into a much larger system instruction:
String _textPrompt(String description) => ''' # PRIMARY DIRECTIVE & ROLE You are an expert curriculum designer and visual prompter specializing in a ${F.sourceLanguage} language exam. Your task is to generate a detailed, photorealistic image prompt for the Image Description speaking task. **The `description` field you create will be used to generate an image directly via Google's image generation models.** Your primary goal is to produce a prompt that is guaranteed to pass automated safety filters by being wholesome, inclusive, and unambiguous.
# CORE TASK & CONSTRAINTS 1. **Topic Constraint:** The generated image prompt MUST be a perfect fit for the topic category: "$description". 2. **Critical Task Constraint:** The prompt must describe a scene ideal for a B1-level language learner, serving as a conversational springboard for describing, interpreting, narrating, and comparing. 3. **Annotation Preparation (Crucial):** You must identify **vocabulary words** mentioned in (or relevant to) your description to populate the `annotatedWords` list. * **Variety:** While most words should be **concrete nouns** (e.g., "table", "dog"), you should also include **verbs/actions** (e.g., "reading") or **adjectives** (e.g., "cozy", "happy") if the scene clearly depicts them. * **Visual Clarity:** Ensure every word chosen has a clear visual anchor in the scene. 4. **Visual & Technical Constraints:** The description MUST be optimized for a **Mobile App UI (Vertical 3:4 Aspect Ratio)**. * **Style:** Explicitly request "photorealistic," "high definition," and "natural lighting." * **Composition:** Request a "medium shot" with a **"vertical composition"**. Ensure the subjects and action are centered to fit within an **896x1280 (3:4)** frame without important details being cut off at the edges. * **Framing:** Avoid wide-angle or panoramic descriptions. Focus on verticality (e.g., capturing torso-up or full-body interactions clearly).
# Depicting People (Optional) Including people is encouraged for language practice but not mandatory. If you choose to include people, they must be depicted according to the following guidelines:
- **Interaction and Number:** The scene must include at least two people interacting in a clear, positive, and non-romantic way. - **Roles and Actions:** Describe individuals by their roles (e.g., a pharmacist, a customer) and specific actions (e.g., explaining, listening) rather than static physical traits. The focus should always be on what they are doing. - Examples: "a librarian helping a student find a book," "two colleagues collaborating at a whiteboard," "a parent and child baking cookies together." - **Expressions and Emotion:** Specify clear, positive facial expressions and body language that define the interaction's emotional tone. - Examples: "smiling warmly," "listening intently with a thoughtful expression," "laughing with joy." - **Appearance and Demographics:** Keep physical descriptions general and functional. Describe practical, everyday clothing relevant to the situation. Avoid specific age ranges unless essential for the context (e.g., "a child," "an elderly person") to ensure characters are relatable for people living in Finland and free from bias.
# SAFETY & RESPONSIBLE AI GUIDELINES (Strictly Enforced) 1. **General Exclusion:** AVOID any scene that is abstract, niche, ambiguous, emotionally distressing, or negative. The scene must be universally positive and easy to understand. 2. **Proactive Safety by Design:** * **People & Interaction:** All individuals must be generic, non-identifiable, and fully clothed in practical, everyday attire. Depict only neutral, friendly, or familial interactions. Absolutely no romantic intimacy, suggestive poses, or revealing clothing. * **Safe Environments:** The scene must be safe. No violence, weapons, accidents, illegal acts, or depiction of emergencies. For the "healthAndWellbeing" topic, focus on preventative care, consultation, and general wellness (e.g., a conversation with a doctor, a yoga class, buying healthy food). Do not depict injuries, wounds, invasive medical procedures, or distress. * **Representation & Bias:** Actively create inclusive and respectful scenes. Counteract stereotypes by default. For example, show diversity in gender roles, professions, and family structures (e.g., a male daycare teacher, a female IT professional, diverse groups of friends). Ensure people of different backgrounds and abilities are represented naturally and positively. * **Data Privacy:** Ensure no personally identifiable information (PII) is requested. No readable text on documents, screens, license plates, or badges. All text should be generic and illegible. No logos or specific brand names. 3. **Ultimate Fallback:** If any requested detail has even a minor risk of triggering a safety filter, default to a simpler, more benign version of the scene. A successful, safe image generation is the top priority.
# ONE-SHOT EXAMPLE (Follow this format precisely for the output)
{
  "id": "imageDescription_01",
  "description": "A photorealistic, medium shot of two friends, a man and a woman, hiking in a Finnish national park during autumn. They have stopped on a well-marked trail to look at a map. The man is holding the map, and the woman is pointing towards the trail ahead, both smiling and engaged in a friendly discussion. They are wearing practical outdoor clothing like hiking jackets and carrying backpacks. The forest around them is full of birch and pine trees with golden autumn leaves on the ground. In the background, a wooden trail signpost with illegible text is visible. The atmosphere is peaceful, friendly, and adventurous.",
  "sample": "Kuvassa on kaksi ystävystä, mies ja nainen, jotka ovat retkellä luonnossa. On syksy, koska puissa on keltaisia lehtiä ja ihmisillä on lämpimät takit. He seisovat metsäpolulla ja katsovat karttaa. Nainen osoittaa eteenpäin ja hymyilee. Luulen, että he suunnittelevat, mihin suuntaan heidän pitäisi mennä. He näyttävät iloisilta ja innostuneilta. Mielestäni luonnossa liikkuminen on todella hyvä harrastus. Se on rentouttavaa ja ilmaista. Suomessa on hienot mahdollisuudet retkeilyyn.",
  "vocabulary": ["retki", "luonto", "ystävä", "syksy", "kartta", "suunnitella", "polku", "metsä", "harrastus", "rentouttava", "vaellus"],
  "annotatedWords": ["kartta", "reppu", "takki", "polku", "puu", "opaste", "suunnitella", "syksy"]
}
''';
This single prompt performs three critical jobs at once:
description: Generates the English prompt that will be input into the image generator.
vocabulary: Presents a CEFR B1 vocabulary list specific to the scene. This list is shown to the user as an optional resource for checking word meanings.
annotatedWords: Selects a strict subset of words that are visually anchored in the image, which will later be highlighted with arrows.
Safety by Design: Trusting the Machine
The prompt includes an extensive "Safety & Responsible AI Guidelines" section, which is a crucial component of our production pipeline.
Because image generation happens instantly, we operate without any human-in-the-loop moderation. This means manual approval of every image before it reaches the user is impossible. Consequently, safety measures must be shifted upstream, directly into the prompt engineering phase.
By explicitly guiding the model to avoid PII, counter stereotypes, and strictly filter out "emotionally distressing" content, we significantly reduce the likelihood of the image generation model producing unusable output. Essentially, the text model functions as a pre-flight safety check for the image model.
4. Implementation: The Logic Pipeline
4.1 Enforcing the Contract (Text to JSON)
This is the most critical part of the integration. We use gemini-2.5-flash and force it to behave like an API endpoint using responseMimeType and responseSchema.
final fullPrompt = Content.text(textPrompt);
final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash',
  generationConfig: GenerationConfig(
    responseMimeType: 'application/json', // <--- The magic key
    responseSchema: ImageDescriptionPractice.imageDescriptionJsonSchema,
    temperature: 1.0, // High creativity for unique scenarios
  ),
);
final response = await model.generateContent([fullPrompt]);
// 1. Parse the JSON safely
final practice =
    ImageDescriptionPractice.fromJson(jsonDecode(response.text!));

// 2. Immediately trigger visual generation using the description we just got
final imageBytes = await generateImage(topic: practice.description);

// 3. Return the fully hydrated object
return practice.copyWith(generatedImageBytes: imageBytes);
By enforcing the schema, we eliminate concerns about hallucinated markdown or broken syntax, allowing for the safe execution of jsonDecode. The response is then parsed into our Dart object, which immediately initiates the visual generation process.
At this specific moment in the pipeline, we have a single Dart object containing:
Scenario: The natural language prompt.
Vocabulary: The B1-level word list.
Targets: The exact words to annotate later.
Visual: The raw bytes of the image the user will see (explained in 4.2).
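For reference, the fromJson call in the snippet above could be backed by a factory along these lines (a sketch only, assuming a standard named-parameter constructor; the app's real implementation may differ):

// Hypothetical sketch of the fromJson factory used in the parsing step above.
// It lives inside the ImageDescriptionPractice class. The image byte fields are
// filled in later via copyWith, so they are not part of the JSON payload.
factory ImageDescriptionPractice.fromJson(Map<String, dynamic> json) {
  return ImageDescriptionPractice(
    id: json['id'] as String,
    description: json['description'] as String,
    sample: json['sample'] as String,
    vocabulary: (json['vocabulary'] as List).cast<String>(),
    annotatedWords: (json['annotatedWords'] as List).cast<String>(),
  );
}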
4.2 Turning Text into Pixels
Now let's focus on the generateImage method, which we use to visualize the scene the text model just described. For this, we switch to an image-capable Gemini model (gemini-2.5-flash-image). We need to explicitly configure the model to return binary image data instead of just text.
The necessary configuration involves initializing a separate model instance and setting the responseModalities to include ResponseModalities.image. This configuration is the vital instruction that tells Gemini, "Render the scene visually."
Future<Uint8List?> generateImage({required String topic}) async {
  // 1. Initialize the Image-Capable Model
  final model = FirebaseAI.googleAI().generativeModel(
    model: 'gemini-2.5-flash-image',
    generationConfig: GenerationConfig(
      responseModalities: [ResponseModalities.text, ResponseModalities.image],
      temperature: 1.2,
      topP: 0.95,
      topK: 40,
      candidateCount: 1,
    ),
  );

  // 2. Send the Description
  final prompt = [Content.text(topic)];
  final response = await model.generateContent(prompt);

  // 3. Extract the Bytes
  final parts = response.candidates.firstOrNull?.content.parts;
  if (parts == null || parts.isEmpty) return null;

  for (final part in parts) {
    // If the model chats back, we log it (useful for debugging)
    if (part is TextPart) {
      print('Log: ${part.text}');
    }

    // The Golden Ticket: Raw Image Bytes
    if (part is InlineDataPart) {
      return part.bytes;
    }
  }
  return null;
}
How it works:
The Input: We pass the rich, photorealistic description generated in Step 3.1 as the topic.
"description": "A photorealistic, medium shot of a classic Finnish red wooden summer cottage ('mökki') on a gentle grassy slope, next to a serene, calm blue lake. The cottage features a wide, light-colored wooden terrace. On the terrace, an adult woman with a kind smile, dressed in a simple t-shirt and shorts, is comfortably seated on a light wooden chair, engrossed in reading an open book. Beside her, a child (approximately 7–8 years old), also dressed in light summer attire, is happily drawing with colorful crayons on paper at a small, low wooden table. A bright yellow juice box is placed near the child's drawing. Tall green pine and slender birch trees stand around the cottage, with a few natural rocks on the ground. The sky is clear blue with soft white clouds, and the sun shines warmly, creating a peaceful, joyful, and quintessential Finnish summer atmosphere. No legible text, logos, or brand names are visible."
The Output: The response doesn't come back as a URL; it comes back as an InlineDataPart containing the raw PNG/JPEG bytes.
The UI: We return these bytes (Uint8List) and stick them directly into our ImageDescriptionPractice object. This allows the Flutter UI to render the image immediately using Image.memory(), with no need to download from a secondary URL.
From here, the user sees the image, and a later step in the pipeline reuses these exact bytes to generate the annotated overlay.
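To make that concrete, here is a minimal widget sketch (the widget itself is hypothetical; Image.memory and the byte fields on the model are as described above):

// Hypothetical display widget: render whichever bytes we have, annotated if available.
class PracticeImage extends StatelessWidget {
  const PracticeImage({super.key, required this.practice});

  final ImageDescriptionPractice practice;

  @override
  Widget build(BuildContext context) {
    final bytes = practice.annotatedImageBytes ?? practice.generatedImageBytes;
    if (bytes == null) {
      // Image generation is still in flight.
      return const Center(child: CircularProgressIndicator());
    }
    // Render the raw bytes directly; no network fetch or caching layer needed.
    return Image.memory(bytes, fit: BoxFit.cover, gaplessPlayback: true);
  }
}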
5. Annotating the Image with Strict Vocabulary Mapping
Once we have a base image and a list of target words, the final stage is to burn the annotatedWords into the pixels, literally. The annotation flow takes the raw image plus the annotated word list and asks Gemini to return a new image with professional labels and pointing lines, with one label per word, no omissions, no duplicates.
We do this in two layers: the calling code and the prompt.
Step 1. The annotation prompt: turn words into arrows
The real "magic" is in the annotatedImagePrompt method. It takes the raw annotated word list and turns it into a strict contract for how labels must be placed on the image.
String annotatedImagePrompt(List<String> annotatedWords) {
  // Join the list into a comma-separated string for the prompt
  final wordsString = annotatedWords.join(', ');
return ''' **Primary Goal:** Generate a modified version of the input image that includes professional, educational annotations (labels and pointing lines) precisely matching the provided Finnish words list.
**CRITICAL CONSTRAINTS (MUST FOLLOW):**
1. **Target Vocabulary List:** You MUST use the following list of Finnish words for annotation: **$wordsString** 2. **Exact Use:** Every single word from the list must be used EXACTLY ONCE as a label. Do not use any word more than once. Do not omit any words.
**Annotation Logic & Placement Rules:**
* **Categorize & Point:** Before drawing, categorize each word and apply the correct pointing logic: * **Concrete Nouns (e.g., objects, people, animals):** Draw a line with an arrowhead pointing *directly and unambiguously* to the specific object. * **Verbs / Actions:** Draw a line pointing to the *focal point* of the action. (e.g., for "petting", point to the hand touching the animal; for "reading", point to the open book). * **Adjectives / Abstract Concepts (e.g., cozy, peaceful, together):** These describe the *overall atmosphere* or the group relationship. Do NOT point to a single random object. Instead, place the label in a prominent area of negative space (like a background wall) and draw a line that indicates the entire scene or the central group of subjects, signifying the general mood.
* **Layout Strategy:** * **Negative Space Priority:** Place text labels in the "negative space" of the image (blurred background, plain walls, ceiling) so they do not cover the main subjects or actions. * **No Crossing Lines:** Organize the labels and lines so that **no two lines cross each other**. This is essential for clarity. * **Clear View:** Ensure lines do not pass over people's faces or critical parts of the objects they are pointing to.
**Visual Styling Requirements:**
* **Text Appearance:** All text labels must be a clean, bold, sans-serif font. The text color should be high-contrast (e.g., white text with a distinct black outline) to ensure maximum legibility against any background texture. * **Line Appearance:** The connector lines must be clean, distinct, and match the text style. They must end in a clear arrowhead at the target.
**Final Output:** A photorealistic image identical to the input, but overlaid with a clean, organized, and professional set of annotations that use exactly the vocabulary list provided, following all the logic and layout rules above. '''; }
Every word in the annotatedWords list must appear exactly once as a label. This is crucial for pedagogy and makes the overlay feel "complete."
We distinguish:
Nouns → arrows to objects,
Verbs → arrows to where the action is happening,
Adjectives/moods → labels pointing to the whole scene or group.
Readable design: We push labels into negative space, forbid crossing lines, and require high-contrast typography so the learner can easily scan the scene.
Step 2. Call the annotation model with image + vocabulary
The annotatedImage function below wires everything together: it bundles the user's image and the annotation instructions into a single multimodal request, and then reads back the edited image from inlineDataParts.
// Signature reconstructed from the surrounding text: image bytes and the
// annotated word list go in, the edited (annotated) image bytes come out.
Future<Uint8List?> annotatedImage({
  required Uint8List imageBytes,
  required List<String> annotatedWords,
}) async {
  try {
    final prompt = Content.multi([
      TextPart(annotatedImagePrompt(annotatedWords)),
      InlineDataPart('image/jpeg', imageBytes),
    ]);

    final model = GenerativeAiUtils.firebaseAI.generativeModel(
      model: 'gemini-3-pro-image-preview',
      generationConfig: GenerationConfig(
        temperature: 0.4,
        topP: 0.9,
        topK: 40,
        candidateCount: 1,
        responseModalities: [ResponseModalities.image],
      ),
    );

    final response = await model.generateContent([prompt]);

    if (response.inlineDataParts.isNotEmpty) {
      return response.inlineDataParts.first.bytes;
    }
  } catch (e) {
    print("Error: Failed to annotate image: $e");
  }
  return null;
}
Key details:
We pass both the instructions (TextPart) and the original image bytes (InlineDataPart) in a single Content.multi call.
We use gemini-3-pro-image-preview, which supports image editing with reliable annotation processing.
We explicitly ask for an image in responseModalities, then read the first inlineDataPart as our edited, annotated image.
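To connect this back to section 4, here is a small wiring sketch (my own illustration, not code from the app, and it assumes copyWith also exposes annotatedImageBytes) showing how the overlay could be attached to the practice object:

// Hypothetical wiring: annotate the freshly generated scene and store both images.
Future<ImageDescriptionPractice> withAnnotations(
  ImageDescriptionPractice practice,
) async {
  final baseImage = practice.generatedImageBytes;
  if (baseImage == null) return practice; // nothing to annotate yet

  final annotated = await annotatedImage(
    imageBytes: baseImage,
    annotatedWords: practice.annotatedWords,
  );

  // Fall back to the plain image if annotation fails (annotatedImage returns null).
  return annotated == null
      ? practice
      : practice.copyWith(annotatedImageBytes: annotated);
}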
6. Conclusion: Accessible "Magic"
What used to be a research project (a pipeline that generates a scenario, creates an image, and strictly maps vocabulary to visible objects) is now just another feature behind buttons in a Flutter app.
The key isn't that Gemini is "smart." It is that we:
Modeled the data we actually needed (ImageDescriptionPractice + schema).
Designed prompts as contracts, not wishes (_textPrompt + annotatedImagePrompt).
Let Firebase AI handle the heavy lifting, so our Flutter code stays focused on product behavior, not infrastructure.
The barrier to entry is lower; the ceiling for innovation is higher. You don't need a custom ML stack to build magical language-learning experiences anymore; you just need a good schema, a disciplined prompt, and a willingness to ship.
Focus on the creativity; let Gemini and Firebase handle the rest.
Posted by Nick Butcher, Jetpack Compose Product Manager
Today, the Jetpack Compose December '25 release is stable. This contains version 1.10 of the core Compose modules and version 1.4 of Material 3 (see the full BOM mapping), adding new features and major performance improvements.
To use today's release, upgrade your Compose BOM version to 2025.12.00:
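For example, in a Gradle Kotlin DSL module (a minimal sketch; the BOM coordinate is the standard androidx.compose:compose-bom, and the specific artifacts listed are just illustrative):

dependencies {
    // Pin the Compose BOM once; Compose artifacts then omit explicit versions.
    val composeBom = platform("androidx.compose:compose-bom:2025.12.00")
    implementation(composeBom)
    androidTestImplementation(composeBom)

    // Example Compose dependencies resolved through the BOM.
    implementation("androidx.compose.material3:material3")
    implementation("androidx.compose.ui:ui-tooling-preview")
}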
We know that the runtime performance of your app is hugely important to you and your users, so performance has been a major priority for the Compose team. This release brings a number of improvementsâand you get them all by just upgrading to the latest version. Our internal scroll benchmarks show that Compose now matches the performance you would see if using Views:
Scroll performance benchmark comparing Views and Jetpack Compose across different versions of Compose
Pausable composition in lazy prefetch
Pausable composition in lazy prefetch is now enabled by default. This is a fundamental change to how the Compose runtime schedules work, designed to significantly reduce jank during heavy UI workloads.
Previously, once a composition started, it had to run to completion. If a composition was complex, this could block the main thread for longer than a single frame, causing the UI to freeze. With pausable composition, the runtime can now "pause" its work if it's running out of time and resume the work in the next frame. This is particularly effective when used with lazy layout prefetch to prepare frames ahead of time. The Lazy layout CacheWindow APIs introduced in Compose 1.9 are a great way to prefetch more content and benefit from pausable composition to produce much smoother UI performance.
Pausable composition combined with Lazy prefetch help reduce jank
We've also optimized performance elsewhere, with improvements to Modifier.onPlaced, Modifier.onVisibilityChanged, and other modifier implementations. We'll continue to invest in improving the performance of Compose.
New features
Retain
Compose offers a number of APIs to hold and manage state across different lifecycles; for example, remember persists state across compositions, and rememberSaveable/rememberSerializable persist it across activity or process recreation. retain is a new API that sits between these, enabling you to persist values across configuration changes without serialization, but not across process death. Because retain does not serialize your state, you can persist objects that cannot easily be serialized, such as lambda expressions, flows, and large objects like bitmaps. For example, you may use retain to manage a media player (such as ExoPlayer) to ensure that media playback doesn't get interrupted by a configuration change.
@Composable
fun MediaPlayer() {
val applicationContext = LocalContext.current.applicationContext
val exoPlayer = retain { ExoPlayer.Builder(applicationContext).apply { ... }.build() }
...
}
We want to extend our thanks to the AndroidDev community (especially the Circuit team), who have influenced and contributed to the design of this feature.
Material 1.4
Version 1.4.0 of the material3 library adds a number of new components and enhancements:
TextField now offers an experimental TextFieldState-based version, which provides a more robust way to manage text state. In addition, new SecureTextField and OutlinedSecureTextField variants are now offered (a minimal sketch follows this list). The material Text composable now supports autoSize behaviour.
TimePicker now supports switching between the picker and input modes.
A vertical drag handle helps users to change an adaptive pane's size and/or position.
Horizontal centered hero carousel
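Here is a minimal sketch of the state-based text fields mentioned in the first item above (parameter defaults are assumed from the material3 1.4 API surface, so treat this as illustrative rather than canonical):

@Composable
fun LoginFields() {
    // TextFieldState holds the text outside the composition and survives recomposition.
    val username = rememberTextFieldState()
    val password = rememberTextFieldState()

    Column {
        TextField(state = username, label = { Text("Username") })
        // SecureTextField masks the entered characters.
        SecureTextField(state = password, label = { Text("Password") })
    }
}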
Note that Material 3 Expressive APIs continue to be developed in the alpha releases of the material3 library. To learn more, see this recent talk:
New animation features
We continue to expand on our animation APIs, including updates for customizing shared element animations.
Dynamic shared elements
By default, sharedElement() and sharedBounds() animations attempt to animate layout changes whenever a matching key is found in the target state. However, you may want to disable this animation dynamically based on certain conditions, such as the direction of navigation or the current UI state.
To control whether the shared element transition occurs, you can now customize the SharedContentConfig passed to rememberSharedContentState(). The isEnabled property determines if the shared element is active.
SharedTransitionLayout {
    val transition = updateTransition(currentState)
    transition.AnimatedContent { targetState ->
        // Create the configuration that depends on state changing.
        fun animationConfig(): SharedTransitionScope.SharedContentConfig {
            ...
        }
        ...
    }
}
A new modifier, Modifier.skipToLookaheadPosition(), has been added in this release; it keeps a composable at its final position when performing shared element animations. This enables "reveal"-style transitions, as seen in the Androidify sample with the progressive reveal of the camera. See the video tip here for more information:
Initial velocity in shared element transitions
This release adds a new shared element transition API, prepareTransitionWithInitialVelocity, which lets you pass an initial velocity (e.g. from a gesture) to a shared element transition:
Modifier.fillMaxSize()
    .draggable2D(
        rememberDraggable2DState { offset += it },
        onDragStopped = { velocity ->
            // Set up the initial velocity for the upcoming shared element
            // transition.
            sharedContentStateForDraggableCat
                ?.prepareTransitionWithInitialVelocity(velocity)
            showDetails = false
        },
    )
A shared element transition that starts with an initial velocity from a gesture
Veiled transitions
EnterTransition and ExitTransition define how an AnimatedVisibility/AnimatedContent composable appears or disappears. A new experimental veil option allows you to specify a color to veil or scrim content; e.g., fading in/out a semi-opaque black layer over content:
Veiled animated content. Note the semi-opaque veil (or scrim) over the grid content during the animation
unveilIn(initialColor = veilColor) togetherWith slideOutHorizontally { it }
}
},
) { targetPage ->
...
}
Upcoming changes
Deprecation of Modifier.onFirstVisible
Compose 1.9 introduced Modifier.onVisibilityChanged and Modifier.onFirstVisible. After reviewing your feedback, it became apparent that the contract of Modifier.onFirstVisible, firing only when an item first becomes visible, could not be honored deterministically. For example, a Lazy layout may dispose of items that scroll out of the viewport and then compose them again if they scroll back into view. In this circumstance, the onFirstVisible callback would fire again, as the item is newly composed. Similar behavior would also occur when navigating back to a previously visited screen containing onFirstVisible. As such, we have decided to deprecate this modifier in the next Compose release (1.11) and recommend migrating to onVisibilityChanged. See the documentation for more information.
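If you rely on "first visible" semantics, one migration path is to keep a flag in your own state and drive it from onVisibilityChanged. A rough sketch (the callback shape and defaults are assumed from the 1.9 APIs, and logImpression is a hypothetical app callback):

// Hypothetical migration sketch: track the "seen" state yourself so it survives
// the item being disposed and recomposed in a Lazy layout.
var hasBeenSeen by rememberSaveable { mutableStateOf(false) }

Modifier.onVisibilityChanged { visible ->  // exact parameters/defaults assumed
    if (visible && !hasBeenSeen) {
        hasBeenSeen = true
        logImpression()  // hypothetical: your one-time "first visible" side effect
    }
}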
Coroutine dispatch in tests
We plan to change coroutine dispatch in tests to reduce test flakiness and catch more issues. Currently, tests use the UnconfinedTestDispatcher, which differs from production behavior; e.g., effects may run immediately rather than being enqueued. In a future release, we plan to introduce a new API that uses StandardTestDispatcher by default to match production behaviour. You can try the new behavior now in 1.10:
@get:Rule // also createAndroidComposeRule, createEmptyComposeRule
val rule = createComposeRule(effectContext = StandardTestDispatcher())
Using the StandardTestDispatcher will queue tasks, so you must use synchronization mechanisms like composeTestRule.waitForIdle() or composeTestRule.runOnIdle(). If your test uses runTest, you must ensure that runTest and your Compose rule share the same StandardTestDispatcher instance for synchronization.
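For instance, a test under the new dispatcher could look like this (a minimal sketch; CounterScreen and the node texts are placeholders, not real APIs):

@get:Rule
val rule = createComposeRule(effectContext = StandardTestDispatcher())

@Test
fun counterIncrements() {
    rule.setContent { CounterScreen() }  // hypothetical composable under test

    rule.onNodeWithText("Increment").performClick()

    // Effects and recompositions are queued, so wait for idle before asserting.
    rule.waitForIdle()
    rule.onNodeWithText("Count: 1").assertIsDisplayed()
}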
"Fix UI quality issues" audits your UI for common problems, such as accessibility issues, and then proposes fixes.
To see these tools in action, watch this recent demonstration:
Happy Composing
We continue to invest in Jetpack Compose to provide you with the APIs and tools you need to create beautiful, rich UIs. We value your input, so please share your feedback on these changes or what you'd like to see next in our issue tracker.
It’s tempting to see web and application accessibility as altruistic rather than profitable. But that’s not true, contends Navya Agarwal, a senior software engineer and technical lead at Adobe who focuses on frontend development.
Agarwal is also an accessibility expert who actively contributes to the W3C Accessible Rich Internet Applications (ARIA) Working Group.
“Building equitable products isn’t simply about altruism,” said Agarwal. “It can create opportunities for market expansion, penetration and sustainable growth. So that’s a section that is often left out by someone who is developing a new product, but building for all makes sure that you are getting more revenue at the end.”
Adobe’s AI Assistant Prioritizes Accessibility
Agarwal was on the team that built Adobe Express’ new AI Assistant, which was released in October and is in beta. The AI assistant soon will be integrated with ChatGPT Plus as well, she added.
The assistant is basically a generic conversational interface designed to make creativity more accessible and intuitive for everyone, she said.
“What we want to present to the world is a more humanly centered model where you focus on the intention, and the system helps you orchestrate everything else around you so it can go from any possibilities, basically creating images, rewriting content, making quick, quick edits, anything,” she said.
Accessibility is often considered an add-on, rather than an essential part of the product. That’s why it’s often layered on top of the existing product created for a general audience, rather than embedded into the product development process. Adobe Express AI Assistant was designed to support accessibility from its inception.
“It expands to cognitive disabilities, for example, things like ADHD, dyslexia, which are not really talked about right now; it’s underrepresented,” she said. “For example, if someone is going on a website who is facing dyslexia and ADHD, the website looks cluttered.”
The offering shows what’s possible when AI is applied to accessibility. While many think of accessibility as relevant to the vision or hearing impaired, with AI it can accommodate other challenges as well. For instance, the Adobe Express AI Assistant can change design to be less cluttered for those with ADHD, autism or other sensory issues. It can also just be helpful to people as they age, she added.
“Just imagine that you have agent where you only have a voice command; you’re just talking and it is … giving you the results,” she said. “All these are use cases that can be served with adaptive technology.”
While AI does introduce the risk of hallucinations, Agarwal sees that as a lesser evil than having no text descriptions or support at all.
As the tech world moves toward agentic AI, she foresees users having a digital personal shopping assistant that helps them find clothes based on preferred parameters.
Benefits to Developers
With AI, developers are no longer limited to tactics that only assist the vision or hearing impaired, she said. Instead, users can tell the assistant their accommodation needs and the AI can provide them, she said. That means users don't have to tolerate a cluttered site or toolbar, for example; they can just talk to the web using voice commands or by writing prompts. Some screen readers have already added a feature that lets users request an image description from ChatGPT or Claude without having to switch context, she said.
Previously, developers could only add an alt-text description to an image that says something simple — this is a long-sleeve knitted jumper in black that’s 100% cotton, for instance.
“But it doesn’t tell you so many different things, whether it’s lightweight or whether it’s chunky, etc.,” Agarwal said. “As AI enters the system, now we can just simply have our image being described in the context by using ChatGPT or Claude. Basically, my screen reader already has a feature that lets me request an image description from ChatGPT or Claude without having to switch context to do it.”
Incorporating accessibility also offers benefits to developers themselves, she added.
“By embedding more equitable practices into our product development process up front, rather than as an afterthought, we can enable teams to launch products faster, with lower risk and greater success for broader audiences,” Agarwal said.