Indexing Strategies
Indexes are data structures that speed up queries by avoiding full table scans. A B-tree index (the default in PostgreSQL and MySQL) maintains sorted data and supports equality, range queries, and ORDER BY efficiently. Without proper indexes, a query on a 100M row table scans every row. With the right index, it finds the answer in microseconds by traversing a tree of typically 3-4 levels deep.
Indexes are like the index at the back of a textbook: B-tree is alphabetical (great for 'find chapter on Zebras'), GIN is the keyword index (find every page mentioning 'evolution'), BRIN is the table of contents (Chapter 1 is pages 1-50, Chapter 2 is 51-100). Each new index makes the book thicker and slower to update.
Index types in PostgreSQL: B-tree for equality and range queries (WHERE status = 'active', WHERE created_at > '2024-01-01'). GIN (Generalized Inverted Index) for full-text search, JSONB containment (@>), and array overlap (&&). GiST (Generalized Search Tree) for geospatial queries (PostGIS), range types, and nearest-neighbor searches. BRIN (Block Range INdex) for naturally ordered data like timestamps — stores min/max per block range, resulting in tiny indexes (MBs vs GBs for B-tree) on large time-series tables. Hash indexes for equality-only lookups (rarely better than B-tree in practice). Partial indexes (WHERE clause on the index) are incredibly powerful: CREATE INDEX idx_active_users ON users(email) WHERE deleted_at IS NULL — indexes only active users, making the index smaller and faster. Composite indexes follow the leftmost prefix rule: an index on (a, b, c) supports queries on (a), (a, b), and (a, b, c) but not (b) or (c) alone.
Index maintenance is as important as index creation. pg_stat_user_indexes shows index usage — drop unused indexes because they slow writes and waste space. Index bloat occurs when autovacuum cannot reclaim dead index entries; use pg_stat_user_tables to monitor and REINDEX CONCURRENTLY to rebuild without locking. Covering indexes (INCLUDE clause) store additional columns in the index leaf pages, enabling index-only scans that never touch the heap. Expression indexes (CREATE INDEX ON users(LOWER(email))) support queries on computed values. For write-heavy tables, each additional index slows INSERT/UPDATE/DELETE — benchmark the trade-off. In MySQL, covering indexes work through the clustered index: if all columns are in a secondary index, InnoDB can answer the query from the index alone (index-only scan). The query planner may choose not to use an index if it estimates a sequential scan is faster — this happens when the query returns a large fraction of the table (typically > 5-10%).
Indexing strategy starts with understanding your query patterns. I use EXPLAIN ANALYZE to identify slow queries, then choose the right index type: B-tree for equality and range, GIN for JSONB and full-text search, BRIN for time-series data (tiny index, huge table). Partial indexes are one of PostgreSQL most powerful features — indexing only the rows that matter (e.g., only active users) keeps the index small and fast. Composite indexes follow the leftmost prefix rule, so I order columns by selectivity. Equally important is index maintenance: I monitor pg_stat_user_indexes for unused indexes, watch for index bloat, and REINDEX CONCURRENTLY when needed. Every index slows writes, so I do not add indexes speculatively — I add them when pg_stat_statements shows a query needs one.
Adding indexes for every possible query without measuring. Each index slows every write operation and consumes storage. Use pg_stat_statements to find actual slow queries, and pg_stat_user_indexes to find indexes that are never used. The optimal number of indexes is the minimum needed to keep your important queries fast.