What is B-tree Indexes in SQL Deep Dive?

Indexes & Query Planscritical

B-tree Indexes

B-tree (Balanced Tree) is the default and most common index type. It maintains a sorted tree structure enabling O(log n) lookups and efficient range scans. Every column you ORDER BY, filter with =, <, >, BETWEEN, or LIKE 'prefix%' can benefit from a B-tree index.

Memory anchor

A B-tree index is a library card catalog — sorted alphabetically (the tree), each card points to the shelf location (heap pointer). A range scan is flipping through a section of cards sequentially. Sequential scan skips the catalog and just walks every shelf.

Expected depth

B-tree structure: root → internal nodes (contain sorted key copies and child pointers) → leaf nodes (contain actual key values and heap tuple pointers, linked as a doubly-linked list for range scans). Index scan vs sequential scan: the planner chooses based on selectivity. For > ~5% of rows, sequential scan is often faster (sequential I/O is faster than random I/O). Bitmap index scan: fetches many index entries, sorts by heap block address, then scans blocks in order — combines index precision with sequential I/O efficiency.

Deep — senior internals

Index page splits happen when a leaf page is full — the page splits into two, propagating up the tree. This causes write amplification and index bloat. High-frequency sequential inserts (auto-increment IDs) cause fewer splits than random UUIDs, which cause ~50% page fills and more bloat. VACUUM FULL or pg_repack rebuilds bloated indexes. Index correlation: pg_stats.correlation measures how well the physical row order matches the index order (1.0 = perfect, 0 = random). Low correlation means index scans require random I/O; the planner may prefer sequential scan. Partial indexes on predicates reduce index size: CREATE INDEX ON orders (created_at) WHERE status = 'pending' — only indexes pending orders.

🎤Interview-ready answer

B-tree is the right index for equality, range, and prefix searches. The planner uses it when selectivity is high enough — low-selectivity columns (boolean, status) often aren't worth indexing. I create partial indexes on filtered subsets (WHERE status = 'active') to reduce index size and keep it hot in cache. UUID primary keys cause index bloat from page splits — I use ULID or sequential IDs for high-insert tables.

⚠Common trap

Creating an index on a low-cardinality column like status (3 values) and expecting it to be used for range queries. The planner will choose a sequential scan for > ~5% row access. Indexes help when they eliminate most of the table.

Related concepts

JavaScript

Tree Shaking

DSA

Tree Traversals (Inorder, Preorder, Postorder)

DSA

Trie (Prefix Tree)

System Design

Index Types: B-tree vs LSM-tree