If you’re hosting your databases within a Virtual Machine (VM) or on some big iron, one of the principal bottlenecks you’re likely to see within your PostgreSQL clusters is I/O. With I/O at the center of potential performance problems, a good understanding of how PostgreSQL manages it is very important. I’m going to start with just how things are stored on disk. We’ll get to how writes and reads occur in another article.
Before we get into the details, I do want to address a couple of issues around data storage and PostgreSQL. First up, extensions. Some of the most impactful extensions are directly focused on data storage (looking at you, TimeScale). They can change the fundamentals we’re going to cover here. As such, each of these would need to be addressed individually to understand how they impact PostgreSQL storage.
Also, if you’re running PostgreSQL on any of the cloud vendors’ Platform as a Service (PaaS) offerings, storage is, again, usually very different than the core behaviors of PostgreSQL. They would also have to be addressed individually (but I/O is still a very direct concern on the cloud). Some, but not all, of what’s covered in this article apply to the cloud platforms.
With that out of the way, let’s get started.
How PostgreSQL stores data: understanding page architecture
The whole idea behind a database is the ability to persist the data. You want your inventory of widgets to get stored so you can look at it later. That means writing out to disks. However, what is writing to disk and where is it being written? Unlike SQL Server which has one (or more) big file for all data, PostgreSQL has a collection of a large number of files. There is a methodology and structure to these files that you need to understand in order to later understand how the data gets written to and retrieved from these files.
While we’re going to be very focused on file, page, folder, etc., throughout this article, that’s just part of the physical nature of persisting your data. What is being persisted is still the logical information you’re most interested in – rows and columns. I just wanted to emphasize the distinction between the two here.
What are PostgreSQL pages?
The files themselves are stored in a collection of 8kb pages. Since this is PostgreSQL, you’re not limited to that if you choose to change the page allocation size when – or if – you compile your server. Since I’ll probably never be doing custom compilations of PostgreSQL (and most of you won’t either), we can safely say PostgreSQL stores data in 8kb pages within the files that define a table or index.
The pages have a defined structure. At the front, there’s the PageHeaderData. This is a 24-byte collection of data about the page. It includes things like the Write Ahead Log (WAL) Log Sequence Number (LSN) for the latest write to the page, an offset to where the free space within the page starts and other information about the data on the page.
After that, it gets a little weird for me. You get a set of Item ID values. These are long term pointers to the data and include information like how long the piece of data is and where it’s located, but that’s not the weird part. The weird part is that the actual data is written backwards on the page, so the free space and offset needed to find it are opposite the set of ID pointers to the data itself. I’m no artist, but it looks something like this:

The final bit of the page, Special Space, isn’t used by tables. It’s generally used by indexes to define storage, linked lists, and other things like that, depending on the index type.
Get started with PostgreSQL – free book download
Explaining TOAST in PostgreSQL
All this assumes that the data will fit on a page. When it doesn’t, TOAST (The Oversized-Attribute Storage Technique) comes into play. Basically, if a row or column just isn’t going to fit on 8kb of storage, something has to be done in order to persist that data. Only variable length data will be stored as TOAST. Fixed-length data, like an integer, just doesn’t need to overflow beyond a page.
PostgreSQL developers use terms like TOASTable, for data that can be stored on TOAST, or TOASTed, for data stored on TOAST. It’s all a bit cute, but it’s a very interesting technical solution.
As explained earlier, each table has a file defined for its storage and, if columns on the table are TOASTable, there will be a TOAST file associated with the table’s data file. This works by a pointer to the appropriate place on TOAST being stored in regular page storage with the Item values, which is then used to find the appropriate place in the TOAST file.
TOAST storage is compressed. You can pick and choose from different compression algorithms and set that on a per-column basis (because your data may benefit from a different algorithm, depending on the data in question). Also, TOAST data can be very large so, in addition to compression, they break it down into chunks, set by the TOAST_MAX_CHUNK_SIZE. By default, this is around 2kb; 2kb stores four chunks on a page, and pages are still used for the actual storage.
Since TOAST is stored separately from the rest of the data, it’s referred to as “out of line” storage (meaning it’s not a standard part of the tuple or row of data stored with the table.) The default value where TOAST kicks off is 2kb. Up to there, a given column is considered OK to be stored with the row, assuming the rest of the row also fits within that 2kb limit. Otherwise, data will be moved out of line and stored in the TOAST file instead of the table file.
This is considered to be efficient because most of the work of retrieving data will be done on the table files. By moving the large storage off these pages and files, you store more rows/tuples per page, making both disk and memory I/O more efficient overall. This is especially true if the TOASTed data isn’t to be retrieved by a given query.
So, that’s the basics on what gets written to the pages within PostgreSQL. Now, let’s talk about where those pages are stored.
There are details to every different data type in exactly how they’re stored and retrieved within the TOAST file, but we can’t cover them all here!
Where does PostgreSQL store files?
By default, there’s no defined location for storing PostgreSQL files – it simply comes down to how your cluster was initialized. Per the documentation, a “popular” place for the files is: /var/lib/pgsql/data – and yes, Windows will be different, but you’re not running PostgreSQL on Windows, right?!
On my container, the location is /home/postgres/pgdata/data. Of course, you can take control and place the files where you want. To see where your files are located, run:
SHOW data_directory;
This will output the path for databases within your cluster. You can also query the settings, which might be handy for some automation:
SELECT current_setting('data_directory') AS data_directory;
With that location in hand, we’re still not to where the databases are stored, but we are to where all data necessary for running PostgreSQL is stored. You’ll have subdirectories for things as varied as the location of the status of transaction commitments (pg_xact), to the storage of cluster-wide information (global.) What we’re looking for is the directory “base” which, on my container, looks like this:

I can hear you now. Where are the databases? They’re right there. I see postgres, template0, template1, bluebox, testdb and perftuning. No? OK. Here’s a query that can help:
SELECT
db.datname,
db.oid AS db_oid,
current_setting('data_directory') AS data_directory
FROM
pg_database AS db
JOIN pg_tablespace AS ts ON
db.dattablespace = ts.oid;
Each folder you see corresponds to an OID (Object Identifier) for the object in question. In this case, databases. Within each folder, the database objects are stored in an individual file for each table and index. For example, here’s folder 20967, the bluebox database:

Each file is named after the file node number. In practice, at the start, that corresponds to the OID for the table. However, operations such as REINDEX or ALTER TABLE can result in the Node Number changing, even as the OID stays the same. So, when looking up what a given table or index is, based on these files, use the OID, not simply the Node Number. You can use the Node Number to get the OID.
However, we need to understand a few things right up front. Some of those files are user tables or indexes, and others are system tables or even special objects. When you see a file node number, such as 1247 in the upper-left, it might be followed by one or two other files.
Free desktop tool for fast PostgreSQL monitoring and diagnostics
In this case, you’ll see 1247_fsm and 1247_vm. *_fsm designates a Free Space Map – a storage mechanism that tracks the free space within the table, or relation, 1247. You’ll see this with both indexes and tables. The *_vm shows which pages are frozen, meaning a given row within the page is always visible (part of the behaviors of the Multi-Value-Concurrency Control which I introduced here.)
You’ll also have a list of the pages that are known to only contain values visible to all active transactions (also explained more in the MVCC article). Only tables will have a *_vm file. We just have to run a query to identify what a given table or index actually is:
WITH rel AS (
SELECT pg_filenode_relation(0, 22224) AS regclass
)
SELECT
ns.nspname AS schema_name,
c.relname AS rel_name
FROM rel
JOIN pg_class c ON c.oid = rel.regclass::oid
JOIN pg_namespace ns ON ns.oid = c.relnamespace;The function pg_filenode_relation maps between the names of the files and the objects within the database. You have to pass it your database ID or, as I did, 0 for the current database, and 22224 for the file node number. It’s then just a question of combining this information with others, such as pg_namespace to get the schema name, and pg_class to get the object name. In this case, it’s the bluebox.film table.
You can also go the other way – if you know the schema and table name, you can easily find where it’s being stored on the file system:
SELECT
ns.nspname AS schema_name,
c.relname AS rel_name,
c.relfilenode AS filenode
FROM
pg_class AS c
JOIN pg_namespace AS ns ON
ns.oid = c.relnamespace
WHERE
ns.nspname = 'bluebox'
AND c.relname = 'film';With that, you have a good idea of how PostgreSQL stores the objects that make up your database…although…where are the functions? Where are the views? Do they have their own files or folders somewhere? Actually, no.
Where are functions, views, procedures, etc, stored in PostgreSQL?
In PostgreSQL, functions, views, procedures, triggers and the like, are stored within various system catalogs. For example, a function is stored in pg_proc and a view is stored in pg_class.
So, clearly, changes made to a table or index is dealt with pretty radically differently than a change made to a function. We’ll get into the details of what happens during writes in another article (although, the MVCC and Vacuum articles do cover some of this behavior).
One other concept worth addressing here is the idea of a TableSpace. Most of what I’ve described up to now is the behavior of the default PostgreSQL tablespace. However, you can add additional tablespaces to your database. At its core, a tablespace is just another location for storing data (specifically, relations or tables & indexes).
While the location of this new tablespace may be a new disk or a networked disk subsystem, for example, the basic behavior from within PostgreSQL is roughly the same. It’s a new folder for storing data. It’ll have a folder for a database, and files for tables and indexes.
The files will behave the same way already described with similar page layouts and behaviors: you can control which tablespace is the default for a database, and you can individually specify a tablespace for individual tables or indexes.
Finally, the Write Ahead Log for transaction management (and more) is a different storage mechanism that I covered in another article.
Final Thoughts
What we have with PostgreSQL is both a very simple structure and a very complicated one. The core behaviors are easy enough to get your head around – tables are files, a database is a collection of tables so it’s a folder, etc – but the devil, as always, is in the details. The interaction between all these various storage mechanisms when combined with reads, writes, transactions…it all introduces complexity. However, you should now have a basic understanding of what gets stored where within PostgreSQL.
FAQs: Data Storage in PostgreSQL
1. How does PostgreSQL store data on disk?
PostgreSQL stores data in 8KB pages within individual files for each table and index. Each database is a folder, and each table or index is stored as its own file inside that folder.
2. What is PostgreSQL page architecture?
PostgreSQL pages contain a PageHeaderData section, Item IDs (row pointers), actual row data stored in reverse order, and optional special space for indexes. This structured layout improves storage efficiency and I/O performance.
3. What is TOAST in PostgreSQL?
TOAST (The Oversized-Attribute Storage Technique) handles large variable-length data that doesn’t fit within an 8KB page. It stores large values “out of line” in separate TOAST files, often using compression for better storage efficiency.
4. Where are PostgreSQL database files located?
Database files are stored in the cluster’s data_directory. Inside it, the base folder contains subfolders named after database OIDs, each holding table and index files.
5. What are *_fsm and *_vm files in PostgreSQL?
-
_fsm (Free Space Map) tracks available space in tables and indexes.
-
_vm (Visibility Map) tracks pages visible to all transactions.
These files help optimize performance and support MVCC operations.
6. Do PostgreSQL views and functions have their own files?
No. Objects like views, functions, triggers, and procedures are stored in system catalogs (e.g., pg_class, pg_proc) rather than as separate files.
7. What is a PostgreSQL tablespace?
A tablespace is an additional storage location for database objects. It allows administrators to store specific tables or indexes on different disks to optimize performance and manage I/O workloads.
8. Why is understanding PostgreSQL storage important for performance?
PostgreSQL performance bottlenecks – especially I/O issues – are directly tied to how pages, files, TOAST data, and tablespaces are structured. Understanding storage architecture helps optimize disk usage and database tuning.
The post Learning PostgreSQL with Grant: Data Storage appeared first on Simple Talk.








