Indexes & Query Planshigh

Covering Indexes & Index-Only Scans

A covering index includes all columns needed to satisfy a query entirely from the index, without accessing the heap (main table). This 'index-only scan' is significantly faster because it eliminates random I/O to fetch row data.

Memory anchor

A covering index is a self-contained answer sheet — all the answers (columns) are on the sheet itself (index leaf), no need to look up the textbook (heap). INCLUDE columns are bonus answers appended to the sheet that don't affect how the sheet is sorted.

Expected depth

A query is index-only if every column referenced (SELECT, WHERE, ORDER BY, GROUP BY) is stored in the index. In PostgreSQL, INCLUDE clause adds non-key columns to the leaf pages without affecting index sort order: CREATE INDEX ON orders (user_id, created_at) INCLUDE (status, amount). This enables index-only scans for SELECT status, amount WHERE user_id = ? ORDER BY created_at without adding status and amount to the sort key. Visibility map: PostgreSQL must consult the visibility map to confirm rows are visible to the current transaction — heap access may still occur if the visibility map indicates uncertainty.

Deep — senior internals

The INCLUDE clause (PostgreSQL 11+, SQL Server since 2005) is a game changer — it stores extra columns in the leaf pages without bloating the internal tree nodes (which would slow down index traversal). INCLUDEd columns can't be used for filtering or ordering, only for covering. In MySQL, secondary indexes always include the primary key, making them implicitly covering for queries selecting PK columns. Covering indexes can significantly reduce I/O for OLAP-style queries on OLTP tables — a well-designed covering index on a reporting query can turn a multi-second full-table scan into a millisecond index-only scan.

🎤Interview-ready answer

Covering indexes are my primary tuning tool for high-frequency read queries. I add columns to the INCLUDE clause to enable index-only scans. For a query like 'get user's order count and total amount by status', an index on (user_id, status) INCLUDE (amount) turns a heap scan into an index-only scan. I check EXPLAIN output for 'Index Only Scan' to confirm.

Common trap

Over-including columns in indexes, making them huge. Wide covering indexes consume more memory (buffer pool), slow down writes, and may not fit hot data in cache. Include only the columns needed for the specific high-frequency query pattern.

Related concepts