Table Partitioning
Partitioning splits a large table into smaller physical pieces (partitions) based on a key column. Queries that filter on the partition key only scan relevant partitions (partition pruning), dramatically reducing I/O.
Partitioning is a filing cabinet with labeled drawers (2024-Jan, 2024-Feb). Searching for January files? Open only the January drawer (partition pruning). Archiving 2022? Pull out the 2022 drawer and discard it (DROP PARTITION).
PostgreSQL partition types: RANGE (date ranges — monthly partitions for time-series), LIST (discrete values — by country or status), HASH (even distribution by hash — for tables without a natural range key). Partition pruning: the planner eliminates irrelevant partitions when the filter uses the partition key. Local indexes: each partition has its own index — smaller, more cache-friendly than a monolithic index. Constraint exclusion and partition pruning must be enabled. Operations on a partition (VACUUM, ANALYZE, index rebuild) run independently, improving maintenance parallelism.
Partitioning enables dropping old data instantly: DROP TABLE orders_2022 is O(1) vs DELETE FROM orders WHERE year = 2022 which is O(n) with MVCC overhead. Partition-wise join (PostgreSQL 11+): when joining two partitioned tables on the partition key, the planner can join matching partitions independently in parallel. Sub-partitioning: partition by year, then sub-partition each year's partition by month. Partition overhead: the planner must visit each partition's metadata on every query — large numbers of partitions (>1000) can slow planning time significantly. pg_partman extension automates partition creation and maintenance.
Partitioning is essential for time-series tables (logs, events, orders by date). I use RANGE partitioning by month with automatic partition creation via pg_partman. Old partition dropping (DROP TABLE monthly_partition) is instantaneous vs DELETE. Partition pruning eliminates irrelevant month scans. I keep partition counts reasonable (<500) to avoid slow planning time.
Partitioning without filtering on the partition key in queries. Without a WHERE clause on the partition key, PostgreSQL scans ALL partitions — worse than a single table with an index because of the per-partition overhead.