What is Buffer Pool & Shared Buffers in SQL Deep Dive?

Database Internals (medium)

Buffer Pool & Shared Buffers

The buffer pool (shared_buffers in PostgreSQL) is an in-memory cache of database pages (8KB each by default). Frequently accessed pages stay in memory, avoiding disk I/O. A larger shared_buffers generally means more cache hits and faster queries, up to the point where it starts crowding out the OS page cache.

Memory anchor

Shared buffers is the RAM cache on your desk — pages you worked on recently sit there for quick access. The OS page cache is a larger filing cabinet in the same room. Disk is the warehouse outside — slow to walk to.

Expected depth

PostgreSQL's commonly recommended starting point: shared_buffers = 25% of RAM on a dedicated server (the OS page cache handles the rest). Cache hit rate: (shared buffer hits) / (shared buffer hits + blocks read from outside shared buffers). Target > 99% for OLTP. A "read" in this formula may still be served by the OS page cache rather than the physical disk. Page replacement policy: PostgreSQL uses a clock-sweep algorithm, a cheap approximation of LRU. The OS page cache is a second caching layer: PostgreSQL reads and writes pass through it before reaching disk. Because of this double caching, effective_cache_size (a planner hint, not an allocation) should reflect shared_buffers plus the expected OS cache, typically 50-75% of RAM, so the planner can assume that repeated page reads are likely cached. Random I/O is expensive on spinning disks (HDD) but close to sequential cost on SSDs, which changes index scan vs. sequential scan decisions significantly.
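
The hit-rate formula above can be sketched in a few lines. This is a minimal illustration, not a monitoring tool: the helper names are hypothetical, and the query string assumes the counters come from the `blks_hit` and `blks_read` columns of `pg_stat_database`.

```python
# Query you might run to fetch the raw counters (shown for context only;
# this sketch does not connect to a database).
HIT_RATIO_QUERY = """
SELECT datname, blks_hit, blks_read
FROM pg_stat_database
WHERE datname = current_database();
"""


def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """hits / (hits + reads); returns 0.0 when there has been no traffic."""
    total = blks_hit + blks_read
    if total == 0:
        return 0.0
    return blks_hit / total


def meets_oltp_target(blks_hit: int, blks_read: int, target: float = 0.99) -> bool:
    """True when the hit ratio is strictly above the target (99% by default)."""
    return cache_hit_ratio(blks_hit, blks_read) > target


if __name__ == "__main__":
    # Made-up counters, purely for illustration.
    print(f"{cache_hit_ratio(9_950, 50):.3%}")
    print(meets_oltp_target(9_950, 50))
```

Because the counters are cumulative since the last statistics reset, comparing two snapshots taken some minutes apart usually gives a more honest picture than the lifetime ratio.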

Deep — senior internals

Buffer pool internals: each 8KB buffer has a descriptor with a pin count (how many backend processes are currently using it), a dirty flag, and a usage count (for clock-sweep eviction). A large shared_buffers doesn't always help: set too large, it shrinks the OS page cache available for WAL and temporary sort files. The buffer mapping hash table is split into partitions, each guarded by its own lock, to reduce contention (128 partitions since PostgreSQL 9.5). The pg_buffercache extension reveals which relations occupy buffer pool pages. Explicit huge pages on Linux (the huge_pages setting, not transparent huge pages) reduce TLB pressure for large shared_buffers. With many concurrent connections, PgBouncer connection pooling caps the number of backends competing for buffer locks and pins.
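
To make the descriptor fields concrete, here is a toy clock-sweep simulation. It is a sketch under simplifying assumptions, not PostgreSQL's implementation: the class and method names are hypothetical, page lookup is a linear scan instead of a partitioned hash table, and pins are set by hand rather than taken on every access. The cap of 5 on the usage count mirrors PostgreSQL's limit.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_USAGE_COUNT = 5  # PostgreSQL caps the per-buffer usage count at 5


@dataclass
class BufferDesc:
    page: Optional[str] = None  # which page this buffer currently holds
    pin_count: int = 0          # > 0 means some backend is using it
    dirty: bool = False         # modified since it was last written out
    usage_count: int = 0        # bumped on access, decayed by the sweep


class ClockSweepPool:
    def __init__(self, n_buffers: int) -> None:
        self.buffers: List[BufferDesc] = [BufferDesc() for _ in range(n_buffers)]
        self.hand = 0
        self.hits = 0
        self.misses = 0

    def _find(self, page: str) -> Optional[BufferDesc]:
        for buf in self.buffers:
            if buf.page == page:
                return buf
        return None

    def _victim(self) -> BufferDesc:
        # Any unpinned buffer reaches usage_count 0 within MAX_USAGE_COUNT
        # full turns, so this bound is enough to find it if one exists.
        for _ in range(len(self.buffers) * (MAX_USAGE_COUNT + 1)):
            buf = self.buffers[self.hand]
            self.hand = (self.hand + 1) % len(self.buffers)
            if buf.pin_count > 0:
                continue              # pinned buffers are never evicted
            if buf.usage_count == 0:
                return buf            # cold and unpinned: evict this one
            buf.usage_count -= 1      # give it one more lap to prove itself
        raise RuntimeError("no unpinned buffer available")

    def access(self, page: str) -> BufferDesc:
        buf = self._find(page)
        if buf is not None:
            self.hits += 1
        else:
            self.misses += 1
            buf = self._victim()
            # A real system would write the old page out first if buf.dirty.
            buf.page, buf.dirty, buf.usage_count = page, False, 0
        buf.usage_count = min(buf.usage_count + 1, MAX_USAGE_COUNT)
        return buf


if __name__ == "__main__":
    pool = ClockSweepPool(3)
    for p in ["A", "B", "C", "A", "D"]:
        pool.access(p)
    print(sorted(b.page for b in pool.buffers), pool.hits, pool.misses)
```

In this run, page A survives the eviction triggered by D because its second access raised its usage count, while B, touched only once, is the first buffer the sweep finds at zero.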

🎤Interview-ready answer

I set shared_buffers to 25% of RAM and effective_cache_size to 75% (for planner cost estimates). I monitor cache hit rate via pg_stat_database (blks_hit vs. blks_read), with a target > 99%. For SSD-backed databases, the planner's default I/O cost assumptions are too pessimistic, so I tune random_page_cost down (1.1 for SSDs vs. the 4.0 default) to encourage index scans over sequential scans.
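
A rough sketch of why random_page_cost moves the plan choice. The model below is deliberately crude and is not the planner's real cost formula: it counts only page I/O, assumes every row fetched through the index costs one random heap page, and ignores CPU costs, index pages, and caching. The function names are hypothetical.

```python
SEQ_PAGE_COST = 1.0  # PostgreSQL's default seq_page_cost


def seq_scan_io_cost(table_pages: int) -> float:
    """A sequential scan reads every page of the table once."""
    return table_pages * SEQ_PAGE_COST


def index_scan_io_cost(rows_fetched: int, random_page_cost: float) -> float:
    """Worst case: each fetched row lands on a different heap page."""
    return rows_fetched * random_page_cost


def cheaper_plan(table_pages: int, rows_fetched: int, random_page_cost: float) -> str:
    """Return 'index' or 'seq' depending on which I/O estimate is lower."""
    if index_scan_io_cost(rows_fetched, random_page_cost) < seq_scan_io_cost(table_pages):
        return "index"
    return "seq"


if __name__ == "__main__":
    # Illustrative numbers: a 10,000-page table, query touching 5,000 rows.
    for rpc in (4.0, 1.1):
        print(rpc, cheaper_plan(10_000, 5_000, rpc))
```

With these illustrative numbers the same query flips from a sequential scan at 4.0 to an index scan at 1.1, which is the behavior the tuning aims for on SSDs.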

⚠Common trap

Setting shared_buffers to 80% of RAM. This starves the OS page cache needed for WAL and temp files, causing unexpected disk I/O. Start at about 25% and raise it only when measurements justify it.
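
To make the sizing guideline concrete, here is a small hypothetical helper that turns a machine's RAM into starting values. It assumes a dedicated database server and uses the 25% and 75% figures from this card. Treat the output as a first guess to validate against real workload metrics, not a final configuration.

```python
def suggest_settings(total_ram_mb: int) -> dict:
    """Starting points only: 25% for shared_buffers, 75% for the planner hint."""
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    return {
        "shared_buffers": f"{total_ram_mb * 25 // 100}MB",
        "effective_cache_size": f"{total_ram_mb * 75 // 100}MB",
    }


def render_conf(settings: dict) -> str:
    """Format the settings as postgresql.conf style 'name = value' lines."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


if __name__ == "__main__":
    # Example: a server with 16 GB of RAM.
    print(render_conf(suggest_settings(16_384)))
```

Note that effective_cache_size allocates nothing, so the two values do not need to add up to less than total RAM.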
