Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

I've Been Busy Updating My Pluralsight Courses


For the past few months, I’ve been a quiet but busy beaver updating my Pluralsight Courses to .NET 10.

If you’re not familiar with my courses, here are the ones that are already updated:

Building a Web App with ASP.NET Core 10, MVC, Entity Framework, TailwindCSS, and Vue: When you’re ready to become a web developer, it can be hard to know where to start. In this course, you will go from your first HTML page all the way through creating single-page applications.

ASP.NET Core 10 Best Practices: Entities, Validation, and Data Models: Are you struggling when dealing with entities, validation, and models inside your .NET applications? This course will teach you pragmatic best practices for using entities, validation, and models in ASP.NET Core projects.

Using gRPC in ASP.NET Core 10: gRPC is a new way to build APIs based on contracts and binary serialization. This course will show you how to write and use these APIs in a variety of different clients.

You can get a complete list of my courses at Pluralsight by looking at my Author’s Page.

I’m also working on two new Entity Framework Core 10 courses, to be announced very soon. Follow me on BlueSky or Twitter for announcements.


If you liked this article, see Shawn's courses on Pluralsight.

.NET Upgrades Done Right: Beyond the "It Builds" Target

With technology evolving at the speed of light, it is hard to manage upgrades and updates. It is easy to get lulled into the "it builds, so it's good enough" mindset. However, a few quick steps can help reduce the burden of doing things the "right" way.

Scalable AI with Azure Cosmos DB: Bringing Generative AI to Enterprise Scale with Super Insight by AVASOFT


Azure Cosmos DB enables scalable AI-driven document processing, addressing one of the biggest barriers to operational scale in today’s enterprise AI landscape. Organizations continue to manage massive volumes of structured and unstructured content—contracts, regulatory filings, operational records, images, and field documentation—yet many workflows remain fragmented, manual, and slow.

Every month, the Scalable AI in Action with Azure Cosmos DB series brings the community together with Microsoft partners who are building real, production AI systems — not slides, not demos built for the occasion, but live walkthroughs of solutions already solving enterprise problems at scale. This month, we were thrilled to welcome AVASOFT, a Microsoft partner with deep expertise across Modern Work, Data & AI, and Digital App Innovation.

The leadership fireside chat was presented by Sairam Srinivasan, CTO at AVASOFT, who addressed our questions with depth and candor. In the Architecture segment, Sarvesh Raghupathy and Balamurugan Subramanian, Architects at AVASOFT, demonstrated their Generative AI solution, AVASOFT Nexus, powered by Azure Cosmos DB.

▶ Watch the session on-demand


Meet the Partner: AVASOFT

AVASOFT — Engineering AI-Powered Enterprise Solutions

AVASOFT is a full-spectrum Microsoft solutions partner spanning Modern Work, Data & AI, Infrastructure, Security, and Digital & App Innovation. With a strong track record of delivering Azure-native solutions, AVASOFT has invested deeply in Azure Cosmos DB as the data backbone for its next-generation Generative AI offerings. Their portfolio includes enterprise-grade applications built on Microsoft Copilot, Azure AI Foundry, and Azure Cosmos DB — making them a natural fit for the Scalable AI in Action series.

What sets AVASOFT apart is their commitment to building solutions that are not only functionally impressive but architecturally sound — designed for global scale, low latency, and the kind of operational resilience that enterprise customers demand. Their April session exemplified this philosophy.


Inside the Session

The April 2026 session followed our signature format: a leadership conversation on real-world AI trends, a technical deep dive and architecture walkthrough, and a live Q&A with pre-submitted and real-time audience questions. Here is how the session unfolded:

Opening — Leadership Perspectives on Enterprise GenAI
The session opened with a candid conversation on where Generative AI stands in enterprise adoption today — the genuine breakthroughs, the persistent challenges, and the architectural decisions that determine whether AI investments translate to measurable business value.

Technical Deep Dive — AVASOFT’s GenAI Solution Walkthrough
AVASOFT’s engineering team walked through their GenAI solution end-to-end — from data ingestion and vector storage in Azure Cosmos DB, through retrieval-augmented generation pipelines, to the final user-facing interface. The session covered code, configuration, and design decisions in detail.

Architecture Review — Reference Architecture & Design Patterns
A dedicated segment focused on the solution’s reference architecture — how Azure Cosmos DB integrates with Azure AI Foundry, Azure AI Search, and the broader Azure AI ecosystem to create a cohesive, scalable platform. Design patterns shared are intended to be reusable across industries.

Feature Showcase — Latest Azure Cosmos DB Capabilities
The Azure Cosmos DB Engineering team presented the latest platform features most relevant to AI workloads — covering vector search enhancements, multi-agent memory support, and performance improvements announced in recent months.

Community Q&A — Live & Pre-Submitted Questions
The session concluded with a rich Q&A addressing both pre-submitted and live queries — covering everything from cost optimisation strategies to multi-region deployment patterns for AI workloads.


Solution Architecture

AVASOFT’s GenAI solution is architected around Azure Cosmos DB as the central operational data store, handling document storage, vector embeddings, and real-time retrieval — all within a single, globally distributed service. The diagram below illustrates the end-to-end data and inference flow presented during the session.

[Figure: AVASOFT Nexus architecture diagram, illustrating the end-to-end flow across Azure Cosmos DB · Azure AI Foundry · Azure AI Search · GenAI Inference Layer.]


Why Azure Cosmos DB for GenAI?

Building a production-grade Generative AI solution is not simply a matter of wiring up a large language model to a database. The data layer must handle vector search at low latency, maintain transactional consistency for state-bearing AI agents, and scale elastically as query loads fluctuate unpredictably. Azure Cosmos DB addresses each of these requirements within a single managed service.

  • Integrated Vector Search — Native vector indexing eliminates the need for a separate vector database, reducing architectural complexity and enabling semantic search directly alongside operational data.
  • Multi-Agent Thread Storage — With thread storage now generally available, AI agents built on Azure AI Foundry can persist conversational context in Cosmos DB — enabling continuity across sessions without custom state management.
  • Global Distribution & Low Latency — Multi-region replication with single-digit millisecond reads ensures consistent response times regardless of where end users are located.
  • Autoscale Throughput — Cosmos DB’s autoscale capability absorbs the burst traffic patterns typical of AI-assisted workflows, removing the need to manually right-size throughput allocations.
  • Flexible Data Modelling — The schema-agnostic document model accommodates the diverse, rapidly evolving data structures that AI pipelines produce — without costly schema migrations.
  • Enterprise Security & Compliance — Built-in role-based access control, private endpoints, customer-managed keys, and comprehensive compliance certifications make Cosmos DB enterprise-ready from day one.

“The database is not just infrastructure for an AI application — it is the memory system. Getting that foundation right determines everything that follows: accuracy, latency, scalability, and the ability to evolve.”

— Sairam Srinivasan, CTO at AVASOFT


Key Takeaways from the Session

Whether you attended live or are watching on-demand, here are the most actionable insights from AVASOFT’s presentation:

  • RAG is a Pattern, not a Product: Retrieval-Augmented Generation requires deliberate design choices at every layer. The quality of retrieval matters as much as the quality of generation.
  • Single-Platform Data Strategy: Consolidating operational data and vector embeddings in Cosmos DB reduces round-trip latency and simplifies data governance across the AI pipeline.
  • Design for Global Scale from Day One: Enterprise AI solutions that start single-region typically face painful re-architecture later. Global distribution should be a first-class design consideration.
  • Agent Memory Unlocks New Use Cases: Persistent agent state stored in Cosmos DB enables conversational AI that genuinely learns from prior interactions — a qualitative leap beyond stateless chatbots.
  • Security as Architecture: Enterprise-grade AI requires security controls embedded in the data layer — not bolted on afterwards. Cosmos DB’s native features simplify this significantly.
  • Measure What Matters: AVASOFT shared practical approaches to evaluating GenAI solution quality — moving beyond user sentiment to quantifiable metrics for retrieval accuracy and response relevance.

About the Series

The Scalable AI in Action with Azure Cosmos DB series runs monthly. Each edition features a Microsoft partner demonstrating a real-world GenAI solution in production — with a leadership conversation, live technical walkthrough, architecture review, and open Q&A.

Past partners in the series: MLAI Digital · MAQ Software · Celebal Technologies · Neudesic (IBM) · Adiom · Datavail · AVASOFT ★ Apr 2026

View past editions →


What to Watch and Where to Go Next

  1. Watch the full session on-demand at aka.ms/scalableai-live-apr26 — the complete recording includes the architecture walkthrough, code review, and Q&A.
  2. Register for upcoming monthly editions at aka.ms/scalable-ai-cosmosdb to attend live and submit questions directly to the engineering team and partners.
  3. Explore AVASOFT’s Azure Cosmos DB practice at avasoft.com/azure-cosmosdb if you are looking for an experienced partner to accelerate your own GenAI build.
  4. Review the Azure Cosmos DB documentation on Microsoft Learn — vector search, multi-agent thread storage, and autoscale are the features highlighted most prominently in this session.
  5. Submit a proposal for a future series edition if your team has built a GenAI solution on Azure Cosmos DB that the community would benefit from seeing.

Don’t Miss the Next Edition

Join us each month to see how Microsoft partners are bringing scalable, production-grade AI to life — powered by Azure Cosmos DB.

▶ Watch the April 2026 Session

About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless NoSQL and vector database for modern app development, including AI applications. With its SLA-backed speed and availability as well as instant dynamic scalability, it is ideal for real-time NoSQL and MongoDB applications that require high performance and distributed computing over massive volumes of NoSQL and vector data.

To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn. Join the discussion with other developers on the #nosql channel on the Microsoft Open Source Discord.

The post Scalable AI with Azure Cosmos DB: Bringing Generative AI to Enterprise Scale with Super Insight by AVASOFT appeared first on Azure Cosmos DB Blog.


GitHub Copilot and me: The evolution of my partnership with a coding assistant


The Best Risk Mitigation Strategy in Data? A Single Source of Truth


Every data leader has a version of this story. A regulatory audit surfaces a metric that doesn’t match across systems. A board member catches conflicting revenue numbers in two reports presented back-to-back. An AI tool generates a recommendation based on data that hasn’t been governed since the analyst who built it left the company two years ago. The specifics change, but the pattern doesn’t: Somewhere in the stack, data risk turned into business risk, and nobody saw it coming.

In my first article, I covered what a semantic layer is and why it matters. In my second, I spoke with early adopters about what happens when you actually build one. This piece tackles a different angle: The semantic layer as a risk mitigation strategy. Not risk in the abstract, compliance-framework sense, but the practical, operational risk that quietly drains organizations every day—bad numbers reaching decision-makers, sensitive data reaching the wrong people, and metric changes that never fully propagate.

Three risks hiding in plain sight

Data risk tends to concentrate in three areas, and most organizations are exposed in all of them simultaneously.

The first is accuracy. Inaccurate data leading to bad decisions is the oldest problem in analytics, and it hasn’t gone away. It’s gotten worse. As organizations add more tools, more dashboards, and more AI-powered applications, the surface area for error expands. A revenue metric defined one way in a Tableau workbook, another way in a Power BI model, and a third way in a Python notebook isn’t just an inconvenience. It’s a liability. When leadership makes a strategic decision based on a number that turns out to be wrong—or, more commonly, based on a number that’s one version of right—the downstream consequences are real: misallocated resources, missed targets, eroded trust in the data team.

The second is governance and access. Most organizations have some framework for controlling who sees what data. In practice, those controls are scattered across warehouses, BI tools, individual dashboards, shared drives, and cloud storage buckets. Each system has its own permissions model, its own admin interface, and its own gaps. The result is a patchwork that’s expensive to maintain and nearly impossible to audit with confidence. Sensitive data finds its way into a dashboard it shouldn’t be in—not because someone acted maliciously, but because the governance surface area is too large to manage consistently.

The third is change management. A CFO decides that ARR should exclude trial customers starting next quarter. In theory, that’s a single metric change. In practice, it’s a scavenger hunt. That ARR calculation lives in a warehouse view, two Tableau workbooks, a Power BI model, an Excel report that someone on the FP&A team maintains manually, and now the new AI analytics tool that pulls directly from the data lake. Some of those get updated. Some don’t. Three months later, someone notices the numbers don’t match and the cycle starts again. The risk isn’t that the change was wrong—it’s that the change was never fully implemented.

These three risks—accuracy, governance, and change management—aren’t independent. They compound. An ungoverned metric that’s defined inconsistently and can’t be updated in one place is a ticking clock. The question isn’t whether it causes a problem, it’s when.

The legacy approach: more people, more tools, more problems

The traditional response to data risk has been to throw structure at it—and structure usually means people and process.

The most common pattern is the BI analyst as gatekeeper. Critical metrics, reports, and dashboards are managed by a centralized team. Need a new report? Submit a request. Need a metric change? Submit a request. Need to understand why two numbers don’t match? Submit a request and wait. This model exists because organizations don’t trust their data enough to let people self-serve, and for good reason—without a governed foundation, self-service creates chaos. But the gatekeeper model has its own costs. It’s slow. It creates bottlenecks. It’s expensive to staff. And performance is inconsistent—the quality of the output depends entirely on which analyst picks up the ticket and which tools they prefer.

Governance gets its own layer of complexity. Organizations deploy access controls across their data warehouse, BI platforms, file storage, and application layer—each with different permission models, administrators, and audit capabilities. Quality reporting, lineage, and business ownership tracking create additional tooling, complexity, and management overhead. Maintaining consistency across all of these systems is resource-intensive, and the more tools you add, the harder it gets. Most organizations know their governance has gaps. They just can’t find them all.

The combination of centralized BI teams and sprawling governance frameworks produces a predictable outcome: large, slow-moving data organizations that spend more time fixing and maintaining the infrastructure than actually delivering data or insight. When everything is managed manually across dozens of tools, problems don’t grow linearly—they grow exponentially. Every new dashboard, data source, or BI tool adds another surface to govern, another place where logic can diverge, another potential point of failure. The legacy approach doesn’t scale. It just gets more expensive.

The semantic approach: govern once, access everywhere

The semantic layer offers a fundamentally different model for managing data risk. Instead of distributing control across every tool in the stack, it consolidates it.

Start with accuracy and change management because the semantic layer addresses both with the same mechanism: A single location for all metric definitions, business logic, and calculations. When ARR is defined once in the semantic layer, it’s defined once everywhere. Tableau, Power BI, Excel, Python, your AI chatbot—they all reference the same governed definition. When the CFO decides to exclude trial customers, that change happens in one place and propagates automatically to every downstream tool. No scavenger hunt. No version that got missed. No analyst discovering three months later that their workbook is still running the old logic. And when that same CFO wants to know how we calculated that same metric several years ago? Semantic layers are driven by version control by default, allowing for seamless versioning across key metrics.

This same centralization transforms governance. Instead of managing access controls across a warehouse, three BI platforms, a shared drive, and an application layer, organizations can align governance around the semantic layer itself. It becomes the single access point for governed data. Users connect to the semantic layer and pull data into the tool of their choice, but the permissions, definitions, and business logic are all managed in one place. The governance surface area shrinks from dozens of systems to one.

But the semantic layer does something else that the legacy approach can’t: it makes data self-documenting. In a traditional environment, the context around data—what a metric means, why certain records are excluded, how a calculation works—lives in the heads of analysts, in scattered documentation, or nowhere at all. The semantic layer captures that context as structured metadata alongside the models, columns, and metrics themselves. Field descriptions, metric definitions, relationship mappings, business rules—all of it is documented where the data lives, not in a wiki that nobody updates. This is what makes genuine self-service possible. When the data carries its own context, users don’t need to submit a ticket to understand what they’re looking at (and AI agents can read it in for contextual understanding at scale).

The practical result is a shift from centralized gatekeeping to federated, hub-and-spoke delivery. The semantic layer is the hub: governed, documented, consistent. The spokes are the teams and tools that consume it. A finance analyst pulls data into Excel. A data scientist queries it in Python. An AI agent accesses it via MCP. They all get the same numbers, definitions, governance—without a centralized BI team manually ensuring consistency across every output.

Risk reduction, not risk elimination

The semantic layer doesn’t eliminate data risk. The underlying data still needs to be clean, well-structured, and maintained—as every practitioner I’ve spoken with has confirmed, garbage in still produces garbage out. And organizational alignment around metric definitions requires leadership commitment that no software can substitute for.

But the semantic layer changes the economics of data risk. Instead of scaling risk management by adding more people and more governance tools, you reduce the surface area that needs to be managed. Fewer places where logic can diverge. Fewer systems to audit. Fewer opportunities for a metric change to get lost in translation. The problems don’t disappear, but they become containable—manageable in one place rather than scattered across the entire stack.

For organizations serious about AI-driven analytics, this matters more than ever. AI tools need governed, contextualized data to produce trusted outputs. The semantic layer provides that foundation—not just as a nice-to-have for consistency, but as critical risk infrastructure for an era where the cost of bad data is accelerating.

One definition. One access point. One place to govern. That’s not just a better architecture. It’s a better risk strategy.




How to Make Code Highlighting-Friendly


This article introduces the notion of highlighting complexity and provides recipes for making your code highlighting-friendly, resulting in faster, more efficient highlighting.

Code style is not just for style – it impacts the physical world! The benefits of highlighting-friendly code include:

  1. Better responsiveness
  2. Optimized CPU usage
  3. Efficient memory usage
  4. Cooler system temperatures
  5. Quieter operation
  6. Longer battery life

While monads are burritos, you shouldn’t be frying eggs on your laptop!

Consider highlighting complexity

Imagine you’ve written this function to compute Fibonacci numbers using naive recursion:

def fib(n: Int): Int =
  if (n <= 1) n
  else fib(n - 1) + fib(n - 2)

It is predictably slow, but you wouldn’t blame Scala for that. The issue is more fundamental and not specific to the programming language. However, this doesn’t mean that the function cannot be made fast. There is a way to adjust the code so it outputs exactly the same sequence much more efficiently.
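For instance, one possible rewrite (a minimal sketch, not the only option) is a tail-recursive accumulator that produces the same values in linear time:

import scala.annotation.tailrec

def fib(n: Int): Int = {
  // Same results as the naive version, but each value is computed once.
  @tailrec
  def loop(i: Int, prev: Int, curr: Int): Int =
    if (i == 0) prev
    else loop(i - 1, curr, prev + curr)

  loop(n, 0, 1)
}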

The same is true for highlighting code. If highlighting is slow, the IDE is not always to blame. Some code is inherently difficult to analyze. However, this doesn’t mean that highlighting cannot be fast. Minor code tweaks can make highlighting significantly more efficient, even if the code stays essentially the same.

So far, so good. However, while algorithmic complexity is “CS 101”, developers rarely think about highlighting complexity. (The two differ: Code might run slow but be easy to highlight, or run fast but be difficult to highlight.) Even if you study compiler construction, it’s primarily not about performance, and parts that are about performance refer to compilers rather than source code. Furthermore, batch-compiling code is not the same as editing code.

Following software engineering best practices may often speed up highlighting. It’s also useful to do in general: keeping your classes and methods small and focused, preferring clarity over cleverness, etc. However, these principles are mostly about cognitive complexity. In contrast to algorithmic complexity, cognitive complexity often correlates with highlighting complexity. Still, they are not the same and sometimes can differ significantly.

When writing code, you should also consider highlighting complexity. If you ignore algorithmic complexity, your code will perform poorly. If you ignore cognitive complexity, your code will be difficult to understand. If you ignore highlighting complexity, your code will take a long time to compile or highlight and will consume excessive resources in the process.

Good code should be good in all respects. Fortunately, the principles for making your code highlighting-friendly are simple and easy to apply in practice. (Most of the recipes are not Scala-specific and can be useful for other languages as well.)

Separate code into modules

Most Scala programmers divide code into packages, but fewer divide code into modules. The reason for doing both is the same.

In contrast to a language like C, Scala supports packages, and most Scala projects naturally use them. Modules, however, are a concept of IDEs and build tools rather than the programming language, so they are used less often. Even the Java Platform Module System is mostly about compiled classes and JARs rather than source code.

Modules limit the scopes of bindings and introduce an explicit graph of dependencies – otherwise, any source file could, in principle, depend on any other source file. This limits the scope of incremental compilation and analysis, which makes compilation faster, reduces peak resource consumption, and allows modules to compile in parallel.

Likewise, modules improve the performance of highlighting – an IDE can search for entities and invalidate caches more efficiently. Moreover, this improves the UX by making autocomplete and auto-import more relevant, reducing clutter. Another benefit is that you can compile (or recompile) only part of a project when running an application or a unit test in one of the modules (even if other modules don’t compile cleanly).

Packages are often natural boundaries for modules. If there’s only a single module in your project, or if some modules are too large, consider extracting one or more packages into a separate module. Since the refactoring doesn’t affect packages as such, this should be backward-compatible. Furthermore, you can still package the classes into a single JAR – the refactoring is for the source code, but not necessarily the bytecode.

Note that you must use true modules – using multiple directories or multiple source roots is not the same thing. (See multi-project builds for sbt.)
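As a rough sketch (the module names here are invented for illustration), an sbt multi-project build declares the modules and their dependency graph explicitly:

// build.sbt
lazy val core = project
  .in(file("core"))

lazy val web = project
  .in(file("web"))
  .dependsOn(core) // web may depend on core, but not the other way around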

Put classes in separate files

The Scala compiler doesn’t limit how many classes you can add to a source file (or how you name that file). This can be useful, but you shouldn’t overuse this capability.

If you modify only one class in a source file, the Scala compiler cannot compile that class separately – it has to compile the entire source file. The same is generally true for IDEs: you open a file rather than a class in an editor tab, and the IDE analyzes the entire file. (However, you can use incremental highlighting to overcome this limitation.)

Furthermore, when each class has a file with a dedicated name, it’s easier to find classes and navigate around the project, even without an IDE. You should put classes into corresponding files the same way you put packages into corresponding directories.

Another reason is import statements. While each class requires its own set of imports, defining multiple classes in a single file merges these imports and makes them common. This can slow down the resolution of references. (If there are many imports and imported entities that, in turn, depend on many imports, then there could be a combinatorial explosion.)

If you notice many relatively large classes in a single file, consider extracting classes into separate source files. It’s easy to do and doesn’t affect backward compatibility. (Obviously, companion classes and sealed class hierarchies should remain in the same file.)
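As a small, invented illustration, after the split each file keeps only the imports that its own class needs:

// Before: Shapes.scala defined both Circle and AuditEntry, so every reference in either
// class had to be resolved against the merged imports of both.

// After: Circle.scala contains only what Circle itself needs.
package example.geometry

class Circle(val radius: Double) {
  def area: Double = math.Pi * radius * radius
}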

Define classes in packages rather than objects

In Scala, packages and objects are similar, and there are even package objects! This makes it possible to put classes in objects rather than packages. However, there are good reasons to avoid that.

First, since each object is contained in a single source file, putting multiple classes in an object means putting multiple classes in a file, which, as we’ve already seen, is not ideal.

Second, this also affects compiled code, not just source files. While every class is compiled to a separate JVM .class file, as if they were defined in a package, there’s only one outline for the object – pickles or TASTy. As a result, both the compiler and IDE have to process multiple classes even if they need to access only one.

Thus, you should normally define classes in packages rather than objects. Leave objects for methods, variables, and types. (And in Scala 3, even top-level definitions can reside in a package.)
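A quick contrast, with purely illustrative names: instead of nesting Invoice and Receipt inside an object Billing (one file, one outline), let the package do the grouping and give each class its own file:

// Invoice.scala
package billing

class Invoice(val total: BigDecimal)

// Receipt.scala would declare `package billing` and `class Receipt` in the same way.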

Favor small classes and methods

Yes, yes, you already know this. But there’s a twist. When you normally think of “small”, you often think of “simple”. For example, if a class contains only a few methods with descriptive names, the class looks simple, and you don’t have to analyze the code of these methods to understand what they do.

This luxury, however, doesn’t apply to compilers or IDEs. If you open the file, the entire contents will be analyzed, and if the methods (and consequently the class) are large, the analysis will consume time and resources.

Consider splitting large classes and methods into smaller ones, even if they are simple. For highlighting, “lines of code” matter; even a single class or method can be too much if it’s very large.

This also applies to generated sources: If a source file is generated and other sources depend on it, you don’t need to look into that code, but IDEs and compilers still do. When generating code, divide the output into smaller parts – files, classes, and methods; don’t mix everything into one blob.

Depend on interfaces rather than classes

It’s good to “program to an interface” in general, and this can also help with highlighting.

Suppose there is a large class with a few methods that comprise its API. Even if you access only the API, reading the source file requires parsing the entire class, including all the implementation details. And even if you specify the types explicitly, resolving the corresponding references requires processing many imports.

Therefore, if a class is very large, consider extracting an interface instead of referencing the class directly.
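A minimal sketch of the idea, with hypothetical names:

// UserRepository.scala – the small trait that client code references
trait UserRepository {
  def find(id: Long): Option[String]
}

// DatabaseUserRepository.scala – the large implementation stays behind the trait, in its own file
class DatabaseUserRepository extends UserRepository {
  def find(id: Long): Option[String] = None // implementation details elided
}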

Avoid wildcard imports

Using named imports rather than wildcard imports is a well-known best practice. It makes code more readable – you can clearly see where symbols come from. It also makes your code more robust. (Otherwise, code might stop compiling after a library adds a class that conflicts with another imported class.) And there’s less clutter – autocomplete will show only relevant symbols that are actually in use.

Furthermore, named imports can speed up code analysis. When resolving identifiers, each wildcard import has to be checked, and import expressions might, in turn, depend on wildcard imports above. There might be imports from objects, which themselves depend on imports elsewhere. All of that is not limited to the file being highlighted. Even if your code depends only on signatures in other files, because paths in the type annotations are not absolute, the analysis still has to process imports in those files.

Wildcard imports are especially problematic for implicits. Because implicits are, well, implicit, and might require other implicits, searching for them can be computationally intensive. And if implicits are imported using a wildcard, then both the usage and the import are implicit. This complicates the task even more – not only does the analysis need to find some vague entity, but it also has to look in a blurry scope.

Therefore, prefer specific imports to wildcard imports. Convert existing wildcards to named imports. In Scala 2, consider importing implicits by name. Although given imports in Scala 3 are an improvement, they are effectively wildcard imports and thus rely on good library design. To be on the safe side, prefer by-type imports to plain given imports. (And if you’re designing a library, define implicits in a separate package or object.)
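For example (a Scala 2 style sketch), naming what you use also makes the required implicit explicit:

// Instead of `import scala.concurrent.duration._`, import the named entries,
// including the DurationInt implicit class that enables the `5.seconds` syntax.
import scala.concurrent.duration.{Duration, DurationInt}

object Timeouts {
  val requestTimeout: Duration = 5.seconds
}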

Prefer imports to mixins

It’s possible to use inheritance instead of imports. We can see this even in Java: Every TestCase is also Assert, so you can access methods such as assertEquals without having to import them. This might seem convenient. However, this is effectively a forced wildcard import, with all the usual drawbacks. It’s better to import Assert.assertEquals selectively (or import Assert.*, as an option).

Furthermore, the approach with subclassing or mixing in traits is slower compared to regular wildcard imports. Analysis has to take inheritance and linearization, as well as overloading and overriding, into account. And if you modify the trait, classes that use it have to be recompiled.

If some definitions are effectively static, put them in an object rather than a trait, so that clients import rather than inherit them.
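A small sketch of the preferred shape (names are made up):

// Assertions.scala – effectively static helpers live in an object...
object Assertions {
  def assertEquals(expected: Any, actual: Any): Unit =
    assert(expected == actual, s"expected $expected but got $actual")
}

// ...and call sites import them by name instead of mixing in a trait:
// import Assertions.assertEquals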

Declare classes and methods private

There are many good reasons to minimize the accessibility of classes and methods: to distinguish between API and implementation, to maintain source and binary compatibility, to prevent clutter in autocomplete, and to reduce cognitive load.

What’s less known is that declaring classes and methods private, whenever possible, improves the performance of compilation and highlighting. Incremental compilers don’t include private members when determining APIs and thus don’t need to store and compare them. In the process of resolving references, IDEs can skip inaccessible elements faster. When you write “Foo”, you already know which Foo is implied. However, you might be surprised by how much computation resolving a reference often involves. Declaring unsuitable Foos inaccessible helps make analysis faster.
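For example, in a hypothetical service class:

class OrderService {
  def placeOrder(id: Long): Unit = validate(id)

  // Not part of the API: marking it private keeps it out of the incremental compiler's
  // signature comparison and lets reference resolution in other files skip it.
  private def validate(id: Long): Unit = require(id > 0)
}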

The Scala plugin can help by automatically detecting declarations that can be private.

Specify types of public or complex definitions

Each non-local definition should either be private or have a type annotation. Definitions that are accessible to clients comprise an API. APIs are boundaries of abstraction and thus must be explicit; clients shouldn’t have to study the implementation – the right-hand side – to understand the signature – the left-hand side. In contrast to implementations, APIs must be stable and must not depend on the contents of the right-hand side. Type annotations make APIs both explicit and stable.

Type annotations greatly help incremental computations. When signatures are stable, fewer classes need to be recompiled after a code modification. Likewise, more caches can be reused when you edit code in an IDE, making highlighting faster and reducing resource consumption.

Thus, it’s best to always specify the types of non-private members explicitly. Note that you should specify the type even if there’s overriding because the inferred type might be more specific, at least in Scala 2. (For example, if a superclass method returns Seq[Int] and the subclass method is just = List(1), the type of the latter would be List[Int], which might affect clients that use the subclass directly.) You should also specify the types of protected members, not just public ones – subclasses are also clients. (As an exception, you may omit types when the right-hand side is both simple and stable, e.g., a literal. That said, having the type spelled out explicitly is often better, both for humans and compilers.)
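The overriding pitfall mentioned above might look like this (a sketch in Scala 2 syntax):

trait Repository {
  def ids: Seq[Int]
}

class InMemoryRepository extends Repository {
  // Without the explicit annotation, the inferred type would be List[Int],
  // a narrower signature that clients of the subclass could start to rely on.
  override def ids: Seq[Int] = List(1)
}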

Furthermore, explicit types can benefit even private and local definitions. While an incremental compiler recompiles the entire file, an IDE can invalidate caches more gradually and within a narrower scope. Thus, add type annotations to private members if they are complex – this can make editing code more efficient. Also, specify the types of complex local variables. (Sometimes you may first need to extract a method or introduce a variable to specify the type.)

Code Style | Type Annotation in the Scala plugin requires type annotations for public and protected members – they are automatically added by refactorings and code generation, and are checked by the corresponding inspection. However, there are exceptions for simple expressions, and they are not required for private or local definitions, regardless of complexity. You can make these settings stricter to be on the safe side.

Favor standard language features over macros

The concept behind macros might seem tempting – you do computations at compile time rather than at runtime. However, “compile time” is also “highlighting time”, which is true regardless of whether you use a compiler or an IDE when editing code… unless you always write everything in one go, without any assistance. So, macros might interfere with writing and editing code, making feedback slower and consuming more resources. Note that this applies not just to defining a macro, which requires a feature flag, but also to using macros, which doesn’t require a feature flag.

Macros are rarely actually needed. Take, for example, Lisp: The syntax is very limited, and the language is dynamic, so no static analysis is performed anyway. Scala, however, is a very expressive language as it is, and it’s statically typed. In Scala, the standard language features are sufficient for most tasks. In such a case, macros only make static analysis, as well as understanding code, more difficult. Thus, when writing code, reach for the standard language features first: type parameters, implicit parameters, etc. Macros are supposed to be the last resort, not a go-to solution.
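As a sketch of the "standard features first" idea, a type class built from a plain trait and an implicit parameter covers many cases where a macro might otherwise be tempting:

trait Show[A] {
  def show(a: A): String
}

object Show {
  implicit val intShow: Show[Int] = (a: Int) => a.toString

  def describe[A](a: A)(implicit s: Show[A]): String = s.show(a)
}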

This can be generalized: Don’t use complex language features just “because you can”, only when they are really needed; prefer the least powerful solution that solves the problem. For more details on this topic, see Lean Scala by Martin Odersky.

Apply these principles to AI-generated code

Even if you use AI to generate 100% of your code, you still read that code. (Right?) Therefore, producing highlighting-friendly code is as relevant as ever – the code is generated in a data center but is highlighted on your machine. This also improves incremental compilation, reducing system load when using agents. Moreover, it prevents context stuffing (when a model loads irrelevant information), which improves accuracy and reduces costs.

The first thing you can do is lead AI by example, because models tend to propagate existing conventions and coding styles. In a new project, you can explicitly add recommendations to AGENTS.md. Last but not least, you can always refactor your code, whether it’s written by a human or AI.

Summary

That said, the performance of your IDE is also important. We’re constantly working on improving the performance of both IntelliJ IDEA and the Scala plugin, and there are tips for improving performance that you can apply in practice. However, just as no amount of compiler optimizations can fix the example with naive recursion, highlighting may sometimes require assistance from your side.

As with everything, highlighting complexity is not the only factor; you need to balance different considerations. But often, there’s no contradiction: Clean code improves highlighting complexity, and improving highlighting complexity results in cleaner code. In any case, it’s useful to always consider highlighting complexity and to keep these recipes at hand.

For more details, see the corresponding ticket in YouTrack. It also lists features that can help you apply the refactorings more easily. If you find them useful, vote for the tickets so we know there is demand.

If you have any questions, feel free to ask us on Discord.

Happy developing!

The Scala team at JetBrains
