The Linux Page Cache and PostgreSQL

PostgreSQL performance on Linux is often discussed in terms of SQL tuning, indexes, and query plans. Yet many real-world performance problems originate much lower in the stack: inside the Linux page cache and its writeback policy. PostgreSQL deliberately relies on the kernel for file caching and writeback, which means that kernel misconfiguration can silently undermine otherwise well-tuned database systems.

This blog post explains how the Linux page cache works, how PostgreSQL integrates with it, and – most importantly – how incorrect page cache and dirty page settings can negatively affect PostgreSQL workloads.

The Linux Page Cache: Deferred I/O by Design

On Linux, the page cache is the kernel’s in-memory representation of file-backed data. Any read or write performed through normal system calls interacts with this cache. Pages are typically 4 KiB in size and are tracked globally by the virtual memory subsystem.
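
How much memory the page cache currently occupies is visible with standard tools, for example:

    # Page cache usage appears in the "buff/cache" column
    free -h

    # More detail is available in /proc/meminfo
    grep -E '^(Buffers|Cached):' /proc/meminfo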

When data is read, the page cache acts as a classic cache: if the data is present and valid, the kernel serves it directly from memory. When data is written, however, Linux behaves differently. The kernel copies the data into the page cache, marks the affected pages as dirty, and immediately returns control to the calling process. Persistence is deferred.

This deferred-write model is fundamental to Linux I/O performance. It allows the kernel to coalesce writes, reorder them, and schedule I/O efficiently. The cost is that memory and disk can diverge significantly, and the kernel must actively manage how much dirty data it allows to accumulate.
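
The effect is easy to observe. A rough sketch, assuming a disk-backed filesystem under /var/tmp (the file name and size are arbitrary):

    # Buffered write of 1 GiB: dd returns long before the data is on disk
    dd if=/dev/zero of=/var/tmp/pagecache-demo bs=1M count=1024

    # The pages sit in memory as dirty until writeback picks them up
    grep -E '^(Dirty|Writeback):' /proc/meminfo

    # Force writeback and watch the Dirty counter drop
    sync
    grep -E '^(Dirty|Writeback):' /proc/meminfo

    rm /var/tmp/pagecache-demo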

Dirty Pages and the Two Critical Thresholds

Linux controls dirty memory using two related thresholds.

The first threshold is governed by vm.dirty_background_ratio. Once the amount of dirty memory exceeds this percentage of total RAM, the kernel starts background writeback. Dedicated flusher threads begin writing dirty pages to disk asynchronously. At this stage, applications are not slowed down; the kernel is merely trying to keep dirty memory from growing without bound.

The second threshold is defined by vm.dirty_ratio. This is a hard limit. When the amount of dirty memory exceeds this value, the kernel stops being polite. Processes that generate new dirty pages are actively throttled. Writes may block, and processes may be forced to sleep until enough dirty pages have been written back.

The distinction matters greatly. vm.dirty_background_ratio controls when cleaning starts. vm.dirty_ratio controls when applications are forced to wait. Misconfiguring either can have serious consequences for write-heavy workloads.
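
Both thresholds, and the amount of dirty memory currently outstanding, can be inspected at any time:

    # Percentage-based thresholds (and their byte-based alternatives)
    sysctl vm.dirty_background_ratio vm.dirty_ratio
    sysctl vm.dirty_background_bytes vm.dirty_bytes

    # Dirty memory waiting for writeback right now
    grep -E '^(Dirty|Writeback):' /proc/meminfo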

PostgreSQL Caching: Shared Buffers on Top of the Kernel

PostgreSQL maintains its own cache, known as shared buffers. These buffers store table and index pages and implement database-specific logic such as MVCC visibility, locking, and WAL coordination. Shared buffers are essential for correctness and concurrency, but they are not intended to replace the kernel’s page cache.

When PostgreSQL writes a modified page, it does so using standard system calls. The data flows into the Linux page cache, where it becomes dirty. PostgreSQL relies on carefully placed fsync() calls to ensure durability, but it does not dictate when the kernel writes individual data pages to disk.

As a result, PostgreSQL effectively operates with two layers of caching: shared buffers in userspace and the page cache in the kernel. This is intentional. PostgreSQL delegates readahead, writeback batching, and I/O scheduling to Linux while itself concentrating on transactional correctness.
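
The PostgreSQL side of this arrangement can be checked quickly from psql; the query below only lists a few of the relevant settings:

    psql -c "SELECT name, setting, unit
             FROM pg_settings
             WHERE name IN ('shared_buffers', 'fsync', 'wal_sync_method');"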

The downside is that PostgreSQL is directly exposed to the kernel’s dirty page policy. When the page cache is misconfigured, PostgreSQL backends pay the price.

How Page Cache Misconfiguration Hurts PostgreSQL

Many PostgreSQL performance issues attributed to “slow disks” or “bad checkpoints” are, in reality, symptoms of poor page cache configuration.

If vm.dirty_background_ratio is set too high, background writeback starts very late. During write-heavy operations – such as bulk inserts, VACUUM, or index creation – dirty pages accumulate rapidly in memory. The system may appear fast initially because writes return immediately. Eventually, however, the dirty set grows so large that writeback cannot keep up. When vm.dirty_ratio is finally reached, PostgreSQL backend processes are suddenly throttled by the kernel. Queries stall, latency spikes appear, and throughput collapses in bursts.
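
Whether dirty memory is piling up during such operations is easy to verify while they run, for example:

    # Refresh the dirty and writeback counters once per second
    # while a bulk insert, VACUUM, or index build is in progress
    watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'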

If vm.dirty_ratio itself is set excessively high, the problem becomes even worse. The kernel allows massive amounts of dirty memory to build up, sometimes tens of gigabytes on large systems. When writeback eventually catches up – often during checkpoints or periods of reduced activity – it does so aggressively. This results in long fsync() times, I/O saturation, and unpredictable response times, precisely the opposite of what a database workload needs.
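
PostgreSQL itself makes these writeback storms visible. With log_checkpoints enabled (the default on recent versions; older releases need it switched on explicitly), every checkpoint log line reports its write and sync durations, and unusually long sync times are a strong hint that the kernel was allowed to accumulate far too much dirty data:

    psql -c "ALTER SYSTEM SET log_checkpoints = on;"
    psql -c "SELECT pg_reload_conf();"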

On the other hand, setting dirty limits too low has its own dangers. If vm.dirty_background_ratio or vm.dirty_ratio is overly restrictive, PostgreSQL backends are throttled almost continuously. Write throughput drops, CPU cores go idle while waiting on I/O, and overall system efficiency suffers. The database feels sluggish even though storage bandwidth may not be fully utilized.
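
This situation is easy to confirm with iostat from the sysstat package: if backends stall while device utilization stays low, the limiting factor is the dirty limits, not the disks.

    # Extended device statistics, refreshed every second;
    # watch the %util and write throughput columns
    iostat -x 1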

The most subtle failure mode is instability. Poorly chosen dirty limits can cause PostgreSQL to oscillate between fast and stalled phases, making performance hard to predict and even harder to diagnose.

Best Practices for Kernel Dirty Page Configuration

On modern database servers with large amounts of RAM, percentage-based dirty limits are often a trap. Percentages scale with memory size, not with storage throughput. Adding more RAM should not automatically allow vastly more dirty data.
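
To put numbers on this: with the common kernel defaults of vm.dirty_background_ratio = 10 and vm.dirty_ratio = 20, a server with 256 GiB of RAM is allowed to accumulate roughly 25 GiB of dirty data before background writeback even begins, and around 51 GiB before throttling kicks in, far more than most storage subsystems can drain without noticeable stalls.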

For PostgreSQL systems, it is usually safer to configure absolute limits using vm.dirty_background_bytes and vm.dirty_bytes. This keeps writeback behavior predictable and independent of RAM size. Background writeback should start early enough to run continuously, and the hard limit should be low enough that the kernel can drain dirty pages without long stalls.
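
As a starting point, something along the following lines can work; the numbers are purely illustrative and must be validated against the actual write throughput of the storage (the file name is arbitrary):

    # /etc/sysctl.d/90-postgresql-writeback.conf
    # Start background writeback early (here: 256 MiB of dirty data) ...
    vm.dirty_background_bytes = 268435456
    # ... and cap dirty memory (here: 1 GiB) well below what the storage
    # can drain quickly. Setting the *_bytes parameters automatically
    # zeroes their *_ratio counterparts.
    vm.dirty_bytes = 1073741824

The settings take effect after running sysctl --system (or sysctl -p for /etc/sysctl.conf), no reboot required.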

Equally important is aligning kernel behavior with PostgreSQL configuration. Checkpoint frequency, checkpoint completion targets, and background writer activity must be considered together with dirty page limits. Tuning one layer in isolation often makes things worse, not better.
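
A sketch of the PostgreSQL side of that alignment, again with illustrative values rather than a recipe:

    # postgresql.conf: spread checkpoint I/O over time so the kernel
    # never has to absorb one huge burst of dirty pages
    checkpoint_timeout = 15min
    checkpoint_completion_target = 0.9
    max_wal_size = 8GB

    # Let the background writer trickle dirty buffers out continuously
    bgwriter_delay = 200ms
    bgwriter_lru_maxpages = 1000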

Summary

PostgreSQL’s performance on Linux is inseparable from the kernel’s page cache behavior. Shared buffers and the page cache form a cooperative system, and kernel parameters such as vm.dirty_background_ratio and vm.dirty_ratio define how smoothly that cooperation works.

Misconfigured dirty page limits can turn fast storage into unpredictable latency, stall database backends, and make performance tuning feel like guesswork. Correctly configured, the same mechanisms provide steady writeback, stable latency, and predictable throughput. Understanding and testing these interactions is therefore not optional – it is a core skill for running PostgreSQL reliably on Linux.

Thanks for reading,

-Klaus

2 thoughts on “The Linux Page Cache and PostgreSQL”

    1. Klaus Aschenbrenner

      Hello Holger,

      Yes – everything described in the article is still true for PostgreSQL 18 by default. PostgreSQL still relies on the Linux page cache for buffering reads and writes, so kernel settings like vm.dirty_* remain relevant.

      PostgreSQL 18 introduces a new asynchronous I/O subsystem and experimental Direct I/O support, but Direct I/O is not the default. As long as you run PostgreSQL with the standard I/O path (which most installations do), the interaction with the Linux page cache behaves exactly as described in the article.

      So the architectural principles remain valid – PostgreSQL 18 just adds more control over how I/O is issued, especially for reads.

      Thanks,

      -Klaus
