
Learning PostgreSQL with Grant: Data Storage


If you’re hosting your databases within a Virtual Machine (VM) or on some big iron, one of the principal bottlenecks you’re likely to see within your PostgreSQL clusters is I/O. With I/O at the center of potential performance problems, a good understanding of how PostgreSQL manages it is very important. I’m going to start with just how things are stored on disk. We’ll get to how writes and reads occur in another article.

Before we get into the details, I do want to address a couple of issues around data storage and PostgreSQL. First up, extensions. Some of the most impactful extensions are directly focused on data storage (looking at you, TimeScale). They can change the fundamentals we’re going to cover here. As such, each of these would need to be addressed individually to understand how they impact PostgreSQL storage.

Also, if you’re running PostgreSQL on any of the cloud vendors’ Platform as a Service (PaaS) offerings, storage is, again, usually very different from the core behaviors of PostgreSQL. They would also have to be addressed individually (but I/O is still a very direct concern in the cloud). Some, but not all, of what’s covered in this article applies to the cloud platforms.

With that out of the way, let’s get started.

How PostgreSQL stores data: understanding page architecture

The whole idea behind a database is the ability to persist the data. You want your inventory of widgets to get stored so you can look at it later. That means writing out to disks. However, what exactly is written to disk, and where? Unlike SQL Server, which has one (or more) big files for all data, PostgreSQL has a large collection of files. There is a methodology and structure to these files that you need to understand in order to later understand how the data gets written to and retrieved from them.

While we’re going to be very focused on files, pages, folders, and so on throughout this article, that’s just part of the physical nature of persisting your data. What is being persisted is still the logical information you’re most interested in – rows and columns. I just wanted to emphasize the distinction between the two here.

What are PostgreSQL pages?

The files themselves consist of a collection of 8KB pages. Since this is PostgreSQL, you’re not limited to that size if you choose to change the page allocation when – or if – you compile your server. Since I’ll probably never be doing custom compilations of PostgreSQL (and most of you won’t either), we can safely say PostgreSQL stores data in 8KB pages within the files that define a table or index.

The pages have a defined structure. At the front, there’s the PageHeaderData. This is a 24-byte collection of data about the page. It includes things like the Write Ahead Log (WAL) Log Sequence Number (LSN) for the latest write to the page, an offset to where the free space within the page starts and other information about the data on the page.

After that, it gets a little weird for me. You get a set of Item ID values. These are long-term pointers to the data and include information like how long the piece of data is and where it’s located, but that’s not the weird part. The weird part is that the actual data is written backwards from the end of the page, so the free space, and the offset needed to find the data, sit between the set of ID pointers and the data itself. I’m no artist, but it looks something like this:

A diagram showing what the ID pointers look like.

The final bit of the page, Special Space, isn’t used by tables. It’s generally used by indexes to define storage, linked lists, and other things like that, depending on the index type.
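To make the layout concrete, here is a small Python sketch that decodes a synthetic page header. It assumes the documented 24-byte PageHeaderData layout (pd_lsn, pd_checksum, pd_flags, pd_lower, pd_upper, pd_special, pd_pagesize_version, pd_prune_xid); it’s an illustration only, and on a real cluster you would inspect pages with the pageinspect extension instead:

```python
import struct

# Assumed PageHeaderData layout, per the documented on-disk format:
# pd_lsn (8 bytes), pd_checksum (2), pd_flags (2), pd_lower (2),
# pd_upper (2), pd_special (2), pd_pagesize_version (2), pd_prune_xid (4)
HEADER_FORMAT = "<QHHHHHHI"  # little-endian, 24 bytes total
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

def describe_page(page: bytes) -> dict:
    """Decode the header and report where the free space sits."""
    (lsn, checksum, flags, lower, upper,
     special, pagesize_version, prune_xid) = struct.unpack_from(HEADER_FORMAT, page)
    return {
        "lsn": lsn,
        "free_space_start": lower,   # end of the item ID array
        "free_space_end": upper,     # start of the backwards-written data
        "free_bytes": upper - lower,
        "special_start": special,
    }

# Build a synthetic empty 8KB heap page: no items yet, so the free space
# runs from the end of the header to the end of the page (no special space).
page = bytearray(8192)
struct.pack_into(HEADER_FORMAT, page, 0,
                 0x1234, 0, 0, HEADER_SIZE, 8192, 8192, 8192 | 4, 0)
info = describe_page(bytes(page))
print(info["free_bytes"])  # 8192 - 24 = 8168 bytes free on an empty page
```

The point of the sketch is the pd_lower/pd_upper pair: item IDs grow the lower bound forward while data grows the upper bound backward, and the gap between them is the page’s free space.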


Explaining TOAST in PostgreSQL

All this assumes that the data will fit on a page. When it doesn’t, TOAST (The Oversized-Attribute Storage Technique) comes into play. Basically, if a row or column just isn’t going to fit within 8KB of storage, something has to be done in order to persist that data. Only variable-length data will be stored as TOAST. Fixed-length data, like an integer, just doesn’t need to overflow beyond a page.

PostgreSQL developers use terms like TOASTable, for data that can be stored on TOAST, or TOASTed, for data stored on TOAST. It’s all a bit cute, but it’s a very interesting technical solution.

As explained earlier, each table has a file defined for its storage and, if columns on the table are TOASTable, there will be a TOAST file associated with the table’s data file. This works by storing a pointer in regular page storage, alongside the item values; that pointer is then used to find the appropriate place in the TOAST file.

TOAST storage is compressed. You can pick and choose from different compression algorithms and set that on a per-column basis (because your data may benefit from a different algorithm, depending on the data in question). Also, TOAST data can be very large so, in addition to compression, it’s broken down into chunks, whose size is set by TOAST_MAX_CHUNK_SIZE. By default, this is just under 2KB, which allows four chunks per page; pages are still used for the actual storage.

Since TOAST is stored separately from the rest of the data, it’s referred to as “out of line” storage (meaning it’s not a standard part of the tuple, or row of data, stored with the table). The default threshold where TOAST kicks in is 2KB. Up to that point, a given column is considered OK to be stored with the row, assuming the rest of the row also fits within that 2KB limit. Otherwise, data will be moved out of line and stored in the TOAST file instead of the table file.

This is considered to be efficient because most of the work of retrieving data will be done on the table files. By moving the large storage off these pages and files, you store more rows/tuples per page, making both disk and memory I/O more efficient overall. This is especially true if the TOASTed data isn’t to be retrieved by a given query.
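The chunking behavior can be sketched in a few lines of Python. The constants here are approximations of PostgreSQL’s compile-time defaults (the real chunk size is computed from the page size), so treat this as an illustration of the mechanism, not the exact numbers:

```python
# Approximate stand-ins for PostgreSQL's compile-time defaults:
TOAST_TUPLE_THRESHOLD = 2000   # bytes; above this, out-of-line storage kicks in
TOAST_MAX_CHUNK_SIZE = 1996    # bytes; just under 2KB, so four chunks fit per 8KB page

def toast_chunks(value: bytes) -> list[bytes]:
    """Split an oversized value into TOAST-style chunks."""
    return [value[i:i + TOAST_MAX_CHUNK_SIZE]
            for i in range(0, len(value), TOAST_MAX_CHUNK_SIZE)]

big_value = b"x" * 10_000      # a value far too large for in-line storage
chunks = toast_chunks(big_value)
print(len(chunks))             # 6 chunks: five full chunks plus one partial
```

Each chunk is then stored as an ordinary row in the TOAST relation, keyed so the original value can be reassembled in order.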

So, that’s the basics on what gets written to the pages within PostgreSQL. Now, let’s talk about where those pages are stored.

Every data type has its own details of exactly how it’s stored and retrieved within the TOAST file, but we can’t cover them all here!

Where does PostgreSQL store files?

By default, there’s no defined location for storing PostgreSQL files – it simply comes down to how your cluster was initialized. Per the documentation, a “popular” place for the files is: /var/lib/pgsql/data – and yes, Windows will be different, but you’re not running PostgreSQL on Windows, right?!

On my container, the location is /home/postgres/pgdata/data. Of course, you can take control and place the files where you want. To see where your files are located, run:

SHOW data_directory;

This will output the path for databases within your cluster. You can also query the settings, which might be handy for some automation:

SELECT current_setting('data_directory') AS data_directory;

With that location in hand, we’re still not to where the databases are stored, but we are to where all data necessary for running PostgreSQL is stored. You’ll have subdirectories for things as varied as the location of the status of transaction commitments (pg_xact), to the storage of cluster-wide information (global). What we’re looking for is the directory “base” which, on my container, looks like this:

What Grant's container currently looks like.

I can hear you now. Where are the databases? They’re right there. I see postgres, template0, template1, bluebox, testdb and perftuning. No? OK. Here’s a query that can help:

SELECT
    db.datname,
    db.oid AS db_oid,
    current_setting('data_directory') AS data_directory
FROM
    pg_database AS db
JOIN pg_tablespace AS ts ON
    db.dattablespace = ts.oid;

Each folder you see corresponds to an OID (Object Identifier) for the object in question. In this case, databases. Within each folder, the database objects are stored in an individual file for each table and index. For example, here’s folder 20967, the bluebox database:

Folder 20967 - the bluebox database

Each file is named after the file node number. In practice, at the start, that corresponds to the OID for the table. However, operations such as REINDEX or ALTER TABLE can result in the Node Number changing, even as the OID stays the same. So, when looking up what a given table or index is, based on these files, use the OID, not simply the Node Number. You can use the Node Number to get the OID.

However, we need to understand a few things right up front. Some of those files are user tables or indexes, and others are system tables or even special objects. When you see a file node number, such as 1247 in the upper-left, it might be followed by one or two other files.


In this case, you’ll see 1247_fsm and 1247_vm. *_fsm designates a Free Space Map – a storage mechanism that tracks the free space within the table, or relation, 1247. You’ll see this with both indexes and tables. The *_vm shows which pages are frozen, meaning every row within the page is always visible (part of the behaviors of Multi-Version Concurrency Control (MVCC), which I introduced here.)

You’ll also have a list of the pages that are known to only contain values visible to all active transactions (also explained more in the MVCC article). Only tables will have a *_vm file. We just have to run a query to identify what a given table or index actually is:

WITH rel AS (
  SELECT pg_filenode_relation(0, 22224) AS regclass
)
SELECT
  ns.nspname AS schema_name,
  c.relname AS rel_name
FROM rel
JOIN pg_class     c ON c.oid = rel.regclass::oid
JOIN pg_namespace ns ON ns.oid = c.relnamespace;

The function pg_filenode_relation maps between the names of the files and the objects within the database. You have to pass it your database ID or, as I did, 0 for the current database, and 22224 for the file node number. It’s then just a question of combining this information with others, such as pg_namespace to get the schema name, and pg_class to get the object name. In this case, it’s the bluebox.film table.

You can also go the other way – if you know the schema and table name, you can easily find where it’s being stored on the file system:

SELECT
	ns.nspname AS schema_name,
	c.relname AS rel_name,
	c.relfilenode AS filenode
FROM
	pg_class AS c
JOIN pg_namespace AS ns ON
	ns.oid = c.relnamespace
WHERE
	ns.nspname = 'bluebox'
	AND c.relname = 'film';

With that, you have a good idea of how PostgreSQL stores the objects that make up your database…although…where are the functions? Where are the views? Do they have their own files or folders somewhere? Actually, no.

Where are functions, views, procedures, etc, stored in PostgreSQL?

In PostgreSQL, functions, views, procedures, triggers and the like, are stored within various system catalogs. For example, a function is stored in pg_proc and a view is stored in pg_class.

So, clearly, changes made to a table or index are dealt with radically differently from changes made to a function. We’ll get into the details of what happens during writes in another article (although the MVCC and Vacuum articles do cover some of this behavior).

One other concept worth addressing here is the idea of a TableSpace. Most of what I’ve described up to now is the behavior of the default PostgreSQL tablespace. However, you can add additional tablespaces to your database. At its core, a tablespace is just another location for storing data (specifically, relations or tables & indexes).

While the location of this new tablespace may be a new disk or a networked disk subsystem, for example, the basic behavior from within PostgreSQL is roughly the same. It’s a new folder for storing data. It’ll have a folder for a database, and files for tables and indexes.

The files will behave the same way already described, with similar page layouts and behaviors. You can control which tablespace is the default for a database, and you can specify a tablespace for individual tables or indexes.

Finally, the Write Ahead Log for transaction management (and more) is a different storage mechanism that I covered in another article.

Final Thoughts

What we have with PostgreSQL is both a very simple structure and a very complicated one. The core behaviors are easy enough to get your head around – tables are files, a database is a collection of tables so it’s a folder, etc – but the devil, as always, is in the details. The interaction between all these various storage mechanisms when combined with reads, writes, transactions…it all introduces complexity. However, you should now have a basic understanding of what gets stored where within PostgreSQL.

FAQs: Data Storage in PostgreSQL

1. How does PostgreSQL store data on disk?

PostgreSQL stores data in 8KB pages within individual files for each table and index. Each database is a folder, and each table or index is stored as its own file inside that folder.

2. What is PostgreSQL page architecture?

PostgreSQL pages contain a PageHeaderData section, Item IDs (row pointers), actual row data stored in reverse order, and optional special space for indexes. This structured layout improves storage efficiency and I/O performance.

3. What is TOAST in PostgreSQL?

TOAST (The Oversized-Attribute Storage Technique) handles large variable-length data that doesn’t fit within an 8KB page. It stores large values “out of line” in separate TOAST files, often using compression for better storage efficiency.

4. Where are PostgreSQL database files located?

Database files are stored in the cluster’s data_directory. Inside it, the base folder contains subfolders named after database OIDs, each holding table and index files.

5. What are *_fsm and *_vm files in PostgreSQL?

  • *_fsm (Free Space Map) tracks available space in tables and indexes.

  • *_vm (Visibility Map) tracks pages whose rows are visible to all transactions.

    These files help optimize performance and support MVCC operations.

6. Do PostgreSQL views and functions have their own files?

No. Objects like views, functions, triggers, and procedures are stored in system catalogs (e.g., pg_class, pg_proc) rather than as separate files.

7. What is a PostgreSQL tablespace?

A tablespace is an additional storage location for database objects. It allows administrators to store specific tables or indexes on different disks to optimize performance and manage I/O workloads.

8. Why is understanding PostgreSQL storage important for performance?

PostgreSQL performance bottlenecks – especially I/O issues – are directly tied to how pages, files, TOAST data, and tablespaces are structured. Understanding storage architecture helps optimize disk usage and database tuning.

The post Learning PostgreSQL with Grant: Data Storage appeared first on Simple Talk.


TDD as induction


A metaphor.

In the mid 2010s I was working with a Danish software development organisation, effectively acting as a lead developer. Because of a shortage of salaried employees, we needed to hire freelancers, and after I had exhausted my local network, I turned to international contacts. One (excellent) addition to the team was Mike Hadlow, who worked out of England.

On his first day, we had him clone the repository and run the tests. About five minutes later, we received a message from him (paraphrasing from memory): "Guys, I have three failing tests. Is this expected?"

No, we didn't expect that. The team had used test-driven development (TDD) for the code. It had hundreds of tests, all of them deterministic. Or so we thought.

It didn't take long to figure out that three tests failed on Mike's computer because it, naturally, was configured with the UK English locale, whereas so far, everyone had been running with the Danish locale. In Danish, like many other languages, comma is the decimal separator and period the thousands separator. As readers of this article will know, in English, it's the other way around.

The three tests failed because they expected Danish formatting rules to be in effect.

I don't remember the specifics, but once we had identified the root cause, fixing it was easy. Be more explicit in the arrange phase, or be less explicit in the assertion phase.

The lesson was that even tests written with TDD make implicit assumptions about the environment.

Horizontal scaling #

A decade earlier, a colleague taught me that the most difficult scale-out was going from one to two. This was in the early noughties, and the challenge of the day was scaling out servers. Already back then, we were running into the problem of stagnating CPU clock speed improvements. For decades, computers had become faster each year, so if you had performance issues, often you could wait a year or two and buy a faster machine.

In the early 2000s, this stopped being the rule, and chip manufacturers instead started to add more processors to a single chip. This solved some problems, but not all. Another attempt to address performance problems was to scale out instead of up. Instead of buying a faster, more expensive computer, you'd buy another computer like the one you already had, and somehow distribute the workload. If you could make that work, that made better economic sense than buying more expensive equipment only to decommission the old machine.

The problem, however, was that at the time, most software was designed with the implicit assumption that it would run on one machine only. Not client software, perhaps, but certainly database servers, and often application servers, too. Going from one to two machines was not a trivial undertaking.

On the other hand, once you had done the hard work of enabling, say, a web site to run on two servers, it would typically be trivial to make it run on three, or four.

Two as many #

The notion that the most difficult scale-out is going from one to two made such a deep impression that it’s been with me ever since. It seems to generalise to other fields, too: going from the singular to the plural is where you find most barriers. Once you’ve enabled having two of something, the actual number seems to be of lesser importance.

It took me a long time to come to terms with the notion that the number two is only a 'representation' for any plural number. One reason, I think, is that my thinking may have been tainted by an innocuous phrase that my mother often uttered: "En, to, mange" or, in English, one, two, many.

As any 'real' software tester will tell you, it's actually nought, one, many. It took me many years of test-driven development (TDD) to finally accept that when testing for plurality, it was often good enough to test with collections of two values. In my early TDD years, I would often insist on adding a test case for the 'three' case, but over the years I learned that this extra step didn't enable me to move forward. In the parlance of the transformation priority premise, adding such a test case led to no transformation.
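A minimal sketch of the nought-one-many idea, with an invented SUT (total) standing in for whatever plural-handling code is under test:

```python
# Invented SUT: any function over a collection would do.
def total(quantities):
    return sum(quantities)

# Nought, one, many -- with two standing in for 'many'.
assert total([]) == 0          # nought: the empty collection
assert total([5]) == 5         # one: the singular case
assert total([5, 7]) == 12     # two is many; a 'three' case would force
                               # no new transformation in the implementation
```

Once the implementation iterates over the collection, a third element exercises no code path that two elements didn’t already cover.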

Once I, grudgingly, accepted that two is many, I started noticing other patterns and connections.

TDD and inductive reasoning #

Much has already been said about TDD, particularly example-driven development, as a sort of inductive reasoning. You start with one example, and implement the simplest thing that could possibly work. You add another example, and the System Under Test becomes slightly more sophisticated. After enough iterations, you have a working solution.

This looks like inductive reasoning, in that you are generalising from the specific to the general.

Such an analogy calls for criticism, because inductive reasoning in general suffers from fundamental epistemological problems. How do we know that we can safely generalise from finite examples?

We can, because TDD is not a process of uncovering some natural law. The problem of induction, typically, is that in natural science, researchers attempt to uncover underlying relationships; cause and effect. Their area of study, however, is the result of natural processes. Or, if a researcher studies economics, perhaps a result of complex social interactions. In scientific settings, the object of study is not man-made, and you can't ask anyone for the correct answer.

With TDD, the situation is different. You can consult the source code. In fact, if TDD is done right and you made no mistakes, the System Under Test (SUT) should be the generalisation of all the examples.

Of course, to err is human, so you could have made mistakes, but with TDD we are on much more solid ground than is usually the case in epistemology.

This seems to suggest that TDD has more in common with formal science than with natural or social science.

Tests as statements #

Consider a test following the Arrange Act Assert pattern. As the last step indicates, a test is an assertion. It's a claim that if things are arranged just so, and a particular action is taken, posterior state will have certain verifiable properties. We might consider such a construction a formal statement. Formal, in the sense that it's expressed in a formal (programming) language, and a statement because its truth value is either true (i.e. passed) or false (i.e. failed, crashed, or hung).
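The three phases can be made explicit in a minimal example; the SUT (add_item) is invented purely to label the phases:

```python
# Invented SUT: a pure function that returns updated inventory state.
def add_item(inventory: dict, name: str, qty: int) -> dict:
    updated = dict(inventory)
    updated[name] = updated.get(name, 0) + qty
    return updated

def test_adding_widget_increases_count():
    # Arrange: things are set up just so
    inventory = {"widget": 2}
    # Act: a particular action is taken
    result = add_item(inventory, "widget", 3)
    # Assert: posterior state has a verifiable property
    assert result["widget"] == 5

test_adding_widget_increases_count()
```

The test as a whole is the statement: given this arrangement and this action, the asserted property holds. Its truth value is exactly its pass/fail status.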

Excluding property-based testing from the discussion, a test is still an example. We shouldn't infer a system's general behaviour from a single example, but when viewed collectively, we may, as discussed above, engage in inductive reasoning. For the rest of this article, however, that is not what I have in mind. Rather, I want to talk about an independent kind of generalisation; a different dimension, if you will.

Coordinate system with behaviour along the x-axis and adaptability along the y-axis.

So far, I have discussed how we may infer a system's behaviour from examples. The more examples you provide, the more you trust the induction.

In the rest of this article, I will discuss how replicating a test to multiple environments tends to demonstrate increased adaptability. In this light, a single test is a statement about one single example, but the statement is now assumed to be universal. It should hold in all circumstances described by it.

What does that mean?

Tests are the first clients #

As I wrote a long time ago, in an otherwise too confrontational article, unit tests are the first clients of the SUT's APIs. Only once tests pass do you put the SUT to use in its intended context. The function/class/module/component that you test-drove now becomes part of the overall solution. The View Model correctly helps render the user interface. The Domain Model makes the right decision. A security component correctly rejects unauthorised users.

When you integrate a test-driven unit in a larger system, any test (even a manual test) of that system is a secondary test. Often, you simply verify that the composition of smaller elements work as intended. Occasionally, an integration test reveals that the unit doesn't work in the new context.

This is expected. It's the reason integration testing is important.

When unit tests succeed, but integration tests fail, the reason is usually that the unit tests are too parochial. Integration test failures reveal that the unit has to handle situations that you hadn't thought of. Sometimes, the problem is that input is more varied than you initially thought. Other times, like the above story about Danish and UK locales, it turns out that the test made implicit assumptions that ought to be explicit.

While this error-discovery process is normal, in my experience, once you've addressed bugs that only manifest in a new context, additional contexts tend to unearth few new problems. You find most defects in the first context, which is the automated test environment. You find a few more once you move the code to a new execution context. After that, however, error discovery tends to dry up.

Bar chart showing execution context on the horizontal axis and errors found on the vertical axis. The execution context labelled '1' has the highest bar; the bar labelled '2' is only a tenth the size, and the labels '3', '4', and '5' have no bars.

In my desire to make a point, I'm deliberately simplifying things. It is not, however, my intention to mislead anyone. In reality, you do sometimes find new errors in the third or fourth context. Some errors, as everyone knows, only manifest in production, and only in certain mysterious circumstances. In other words, the above chart is deceptive in the sense that it seems to claim that the third, fourth, etc. contexts reveal no additional bugs. This is not the case.

That said, in my experience the relationship is clearly non-linear, and for a long time, I wondered about that.

Mathematical induction #

Although the following is, at best, an imperfect metaphor, this reminds me of mathematical induction. You start with the statement that a particular example (implemented as a test) works in a single environment (typically a developer machine). Call this statement P(1).

A box labelled 'Dev box' with an arrow pointing to a box labelled 'System'.

Already when you synchronise your code with coworkers' code, the example or use case now executes on multiple other machines; P(2), P(3), etc.

Multiple, overlapping boxes labelled 'Dev boxes' with an arrow pointing to a box labelled 'System'.

As the initial anecdote about locale-dependent tests shows, you may already find a problem here. In many cases, however, the development machines are sufficiently identical that any single test is effectively running in the same context. In this sense, you may still be establishing that the first statement, P(1), holds.

If so, you may discover problems in execution contexts that differ from developer machines to a larger degree.

Boxes labelled respectively 'Dev boxes', 'CI/CD', and 'Production', each with an arrow pointing to a box labelled 'System'.

Sometimes with mathematical induction, you need to establish more than a single base case. You may, for example, first prove P(1) and P(2). The induction step then assumes P(n-2) and P(n-1) in order to prove P(n).
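Two-base-case induction is the shape you need for recurrences that look back two steps, the Fibonacci sequence being the classic case. A small sketch, checking a property empirically that the induction step (assume P(n-2) and P(n-1), prove P(n)) would establish in general:

```python
# P(n): fib(n) < 2**n. A proof needs two base cases, P(0) and P(1),
# because the recurrence fib(n) = fib(n-2) + fib(n-1) looks back two steps.
def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Empirical spot-check of the base cases and a run of induction steps:
assert all(fib(n) < 2 ** n for n in range(1, 30))
```

In the metaphor, the first two execution contexts play the role of the two base cases; once those hold, each further context is 'just another n'.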

Although the metaphor is flawed in more than one way, the non-linear relationship between environments and defect discovery reminds me of this kind of induction. Experience indicates that if an example works in the first and second context, it typically works in new contexts.

Implicit assumptions #

This induction-like relationship sometimes falls apart, as the opening anecdote illustrates. Sometimes, as in that anecdote, the problem is not with the implementation, but with the test. In mathematics, it may turn out that a proof makes implicit assumptions, and that it doesn't hold as universally as first believed. An example is that Euler believed that the Euler characteristic of all polyhedra was constant, but failed to take non-convex shapes into account.

In the same way, tests may inadvertently assume that some property is universal. Later, you may discover that such an assumption, for example about locale, is not as universal as you thought.

This explains why my DIPPP coauthor Steven van Deursen correctly insisted that Ambient Context should be classified as an anti-pattern. Otherwise, it's too easy to forget essential pre-conditions, and thus make it easier to introduce bugs that only appear in certain contexts.

This is one of many reasons I prefer Haskell over most other programming languages. Haskell APIs don't make implicit assumptions about execution context. Or, rather, they have deterministic behaviour according to 'standards' which are often English; e.g. a decimal number like 12.3 always renders as "12.3", and never as "12,3", as it would in German, Danish, etc.

Even so, as Conal Elliott complains, some APIs are not as deterministic as one might hope.

The bottom line is that when writing tests, one has to carefully and explicitly state all relevant assumptions as part of the test.

Conclusion #

As imperfect a metaphor as it is, I find comfort in comparing defect discovery using automated tests with induction. After decades of test-driven development, I've wondered if there's a deeper reason that if test-driven code works on one machine, it tends to work on most machines, and that the relationship seems to be distinctly non-linear.

An automated test, if it properly describes all relevant context, is effectively a statement that a particular example always behaves the same. We may, then, choose to believe that if it works in one context, and we've seen it work in one additional, arbitrary context, it seems likely that it will work in most other contexts.


This blog is totally free, but if you like it, please consider supporting it.

Assembly Scanning in Needlr: Filtering and Organizing Type Discovery


Learn how to control assembly scanning in Needlr, including filtering assemblies, controlling which types get discovered, and organizing type registration in .NET applications.


Podcast: Software Evolution with Microservices and LLMs: A Conversation with Chris Richardson


In this podcast, Michael Stiefel spoke with Chris Richardson about using microservices to modernize software applications and the use of artificial intelligence in software architecture. We first discussed the problems of monolithic enterprise software and how to use microservices to evolve them to enable fast flow - the ability to achieve rapid software delivery.

By Chris Richardson

Anthropic Study: AI Coding Assistance Reduces Developer Skill Mastery by 17%


Anthropic research shows developers using AI assistance scored 17% lower on comprehension tests when learning new coding libraries, though productivity gains were not statistically significant. Those who used AI for conceptual inquiry scored 65% or higher, while those delegating code generation to AI scored below 40%.

By Steef-Jan Wiggers

WPF — June 2026 Roadmap (v26.1)

1 Share

Thank you for choosing DevExpress and for your on-going support. We value your business.

In this blog post, I’ll highlight WPF-related features/capabilities we expect to ship in our upcoming mid-year release (v26.1, June 2026).

As always, if you have questions or suggestions, feel free to share your thoughts in the survey at the end of this post or submit a ticket via the DevExpress Support Center.

The information contained within this blog post details our current/projected development plans. Please note that this information is being shared for INFORMATIONAL PURPOSES ONLY and does not represent a binding commitment on the part of Developer Express Inc. This roadmap and the features/products listed within it are subject to change. You should not rely on or use this information to help make a purchase decision about Developer Express Inc products.

New WPF Fluent Theme (CTP)

We are working on a new Fluent Theme built on a flexible design token–based architecture. This foundation is designed to unlock a number of key benefits across our WPF product line, including:

  • Enhanced appearance

  • Support for Mica material

  • Flexible resource management and built-in palette customization

  • Multiple UI density modes

  • System high-contrast support

  • Centralized global font settings

Not all of the features listed above will be available in v26.1, and final scope may change as development progresses. In v26.1, we plan to introduce the new Fluent Theme for a subset of WPF controls as part of an initial Community Technology Preview (CTP).

By adopting a design token–driven approach, we can deliver new themes more efficiently in the future and ensure a unified visual language across multiple product lines. For example, in DevExpress-powered Blazor Hybrid applications, Blazor and WPF views will share a "common" look and feel based on the same Fluent theme.


WPF Fluent Icons

With v26.1, we expect to ship a comprehensive set of Fluent icons. To simplify migration to these new icons, you will be able to switch the icon set across all DevExpress controls using a single global setting. Our new collection will include thousands of Fluent icons which can be used in any UI element with SVG support.

WPF Grid Control

Column Drag & Drop Customization Events

We expect to introduce a new event to control which drag-and-drop operations are permitted for WPF Grid Control columns. This capability will address the following usage scenarios:

  • Prevent a column from being dropped at specific positions

  • Disable column reordering while still allowing columns to be dragged out of the Grid or into the Column Chooser

  • Enforce fixed grouping order (for example, ensuring a specific column always has GroupIndex = 0)

You will be able to customize any column drag operation based on comprehensive information about the dragged column and the target area:

private void TableView_ColumnDragOver(object sender, ColumnDragOverEventArgs e) {
    // Cancel the operation when a column is dragged over the column header area.
    if (e.TargetArea == HeaderArea.ColumnHeader) {
        e.Cancel = true;
    }

    // Additional information available on the event arguments:
    // e.TargetArea
    // e.TargetIndex
    // e.SourceColumn
    // e.SourceIndex
    // e.SourceArea
    // ...
}

Expression Editor Support for Custom Format Conditions

With v26.1, users will be able to create advanced conditional formatting rules via custom expressions. This enhancement will allow users to define complex logic, such as [Created Date] > AddDays(LocalDateTimeToday(), -3), as requirements dictate. As you would expect, this capability offers increased flexibility when constructing formatting rules that rely on calculated values, functions, or advanced comparisons.
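A few additional expressions in the same style may help illustrate the idea. The field names below are hypothetical; AddDays and LocalDateTimeToday are taken from the example above:

[Created Date] > AddDays(LocalDateTimeToday(), -7)
[Unit Price] * [Quantity] > 1000 And [Status] == 'Open'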

Dependency Injection Enhancements

With v26.1, we hope to enhance Dependency Injection (DI) support in our MVVM framework. We will introduce a new XAML markup extension that instantiates view models using a globally configured DI container:

// Configure the global DI container at application startup.
var host = RegisterServices().Build();
IocServiceProvider.Default.ConfigureServices(host.Services);

<!-- Resolve the view model type from the container in XAML. -->
<Window DataContext="{dxmvvm:Ioc Type={Binding ViewType}}" />

In addition, DevExpress MVVM services will automatically resolve views from the DI container during navigation. For example, in the code below, HomeView will be created with IDataService and ILogger dependencies if they are registered in the container:

public class MainViewModel {
    private INavigationService NavigationService {
        get { return this.GetService<INavigationService>(); }
    }

    public void OnViewLoaded() {
        NavigationService.Navigate("HomeView", null, this);
    }
}

public class HomeView {
    public HomeView(IDataService dataService, ILogger logger) {
        //...
    }
}

Prompt to Expression in Filter Editor and Expression Editor

We will add AI-powered expression generation to our WPF controls to help users create filter and unbound column expressions for data-aware WPF controls using natural language prompts. Instead of writing complex expressions, users describe the desired logic in plain text. The system will send the prompt to the configured AI service, which will generate an expression for the control.

WPF Charts – DateOnly / TimeOnly Support

The DevExpress WPF Chart Control will support .NET DateOnly and TimeOnly data types. You’ll be able to use these types to specify arguments, values, workday/work time options, strips, constant lines, and scale breaks, as well as to apply data filters, sorting, and summaries.

WPF Pivot Grid

DateOnly / TimeOnly Support

The DevExpress WPF Pivot Grid will support DateOnly and TimeOnly data types (in .NET applications). You'll be able to use these types for all data operations, including filtering, grouping, and summary calculation.

Filter Panel Navigation

We plan to enhance the Pivot Grid’s Filter Panel to support keyboard navigation when filters are applied. Users will be able to navigate to the Filter Panel to remove individual filter nodes or modify the entire filter.

This enhancement will help applications that use DevExpress Pivot Grid meet WCAG Keyboard (Level A) accessibility requirements.

Template Kit Enhancements

We plan to extend DevExpress Template Kit support to include Visual Basic templates.

In addition, we expect to introduce usability updates (such as search) to simplify navigation across our extensive template library.

WPF TreeList – Performance Optimization

The DevExpress WPF TreeList will operate much faster when data is updated with a custom filter applied (via the CustomNodeFilter event). We have identified a performance bottleneck in this usage scenario and found a way to eliminate it. Internal testing demonstrates speed gains of up to ~85%.

WPF AI Chat Control Enhancements

We plan to introduce a set of small but useful APIs to help customize the DevExpress AI Chat Control. For example, new properties will allow you to specify the text displayed in an empty chat view and in an empty text input field.

WPF Spreadsheet

Screen Reader Support

To help meet accessibility standards, we're beginning work on screen reader support for the DevExpress Spreadsheet. As you'd expect, this feature will allow users to access document content via screen readers such as Narrator and NVDA.

New Excel Functions

We will enhance our Spreadsheet calculation engine by adding support for modern, dynamic-array Excel functions, including:

  • XLOOKUP
  • XMATCH
  • SORT
  • FILTER
  • UNIQUE
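These are the standard Excel dynamic-array functions, so formulas like the following should evaluate the same way they do in Excel (the sheet and range names here are hypothetical):

=XLOOKUP(A2, Products!A:A, Products!C:C, "not found")
=SORT(FILTER(Orders!A2:C100, Orders!C2:C100 > 1000))
=UNIQUE(Customers!B2:B500)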

WPF Reporting

Accessibility & Export Compliance

DOCX/HTML Export — AccessibleDescription Support

Organizations subject to WCAG 2.2 and Section 508 requirements will be able to include meaningful alternative text for all embedded images and graphical elements in exported Word documents and HTML pages. With v26.1, DevExpress Reports will apply a report control's AccessibleDescription property value to DOCX and HTML export output. This capability is especially relevant for controls exported as images, such as XRPictureBox, XRBarCode, XRShape, and XRChart, where assistive technologies rely on alternative text to describe visual content.

PDF/UA Export Enhancements

Government agencies and enterprises that must comply with PDF/UA standards will benefit from the following PDF export capabilities in v26.1:

  • XRRichText content will use proper semantic tagging and group continuous text into logical paragraphs instead of word-by-word <P> tags.
  • The AccessibleDescription property will apply to PDF signatures and AcroForm controls so that digitally signed or form-enabled PDF documents meet accessibility requirements.
  • XRPageInfo controls will support paragraph-level semantic markup.

Data Source Enhancements

DateTimeOffset Support

Applications that store time zone-aware timestamps, such as financial transaction logs, audit trails, and event scheduling systems, require DateTimeOffset support to preserve the original offset for accurate data representation. With v26.1, the Report Designer's Field List will display DateTimeOffset fields and allow data binding, filtering, grouping, and sorting operations across SqlDataSource, EFDataSource, and ObjectDataSource.

XRLabel & XRPanel — Custom Border Styles Example

Invoice layouts, financial statements, and reports with visually distinct section separators often require independent border configuration per side. We will extend the How to Create a Custom DevExpress Report Control GitHub repository with a new example that demonstrates custom XRLabel and XRPanel controls. These custom controls will allow you to define distinct border styles and widths for the top, bottom, left, and right sides independently.

Invoice Header — Different border styles for panels

IDE Integration

DevExpress Report Designer for JetBrains Rider — .NET Projects Support

In v25.2, we released the DevExpress Report Designer for JetBrains Rider with .NET Framework support. v26.1 will extend this integration to .NET-based projects. We also expect to deliver enhancements such as property search and reset in the Properties panel, along with light/dark theme support to match the Rider IDE.

AI-powered Enhancements

WPF Report Designer — AI-powered Expressions

The DevExpress WPF Report Designer will support expression generation via natural language queries, and will match the AI-powered expression capabilities already available in WinForms and Web Report Designer components. Report creators can describe the desired logic in plain text, and the AI assistant will generate the appropriate expression syntax. This update achieves AI-powered expression parity across all supported desktop and web platforms.

AI Prompt-to-Report Wizard — Optimization

We will optimize the AI Prompt-to-Report Wizard so that users can input shorter prompts and generate more consistent, production-ready report output. Our updated implementation will use a multi-agent workflow and will also reduce token consumption per report generation request.

AI Prompt-to-Report Wizard — Multi-Agent Workflow

Your Feedback Matters

