Data Modeling & Design

Partitioning Strategies

Partitioning splits a large table into smaller physical pieces while maintaining a single logical table. This improves query performance (partition pruning skips irrelevant partitions), simplifies maintenance (vacuum, back up, or drop individual partitions), and can distribute data across storage. PostgreSQL has supported declarative range, list, and hash partitioning since v10. DynamoDB partitions automatically by hashing the partition key.
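A minimal sketch of declarative range partitioning in PostgreSQL, using a hypothetical `events` table partitioned by month on `created_at` (table and column names are illustrative, not from the source):

```sql
-- Parent table holds no rows itself; it routes inserts to partitions
CREATE TABLE events (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    tenant_id  int         NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- One partition per month; the upper bound is exclusive
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
```

A query such as `SELECT ... FROM events WHERE created_at >= '2024-02-01'` touches only `events_2024_02` and later partitions; the planner prunes the rest.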

Memory anchor

Partitioning is splitting a massive bookshelf into labeled sections: 'January 2024,' 'February 2024.' Need data from March? Skip straight to that section (partition pruning). Need to delete 2022? Pull out the whole shelf section in one move instead of removing books one by one.

Expected depth

Range partitioning: split by date range (monthly partitions for time-series data). This is the most common strategy — queries with date filters prune irrelevant partitions instantly, and old data can be dropped by detaching the partition (instant, no DELETE needed). List partitioning: split by discrete values (partition per region, per tenant). Hash partitioning: distribute evenly by hashing a column — ensures roughly equal partition sizes but cannot prune range queries on the partition key. Sub-partitioning (multi-level) combines strategies: partition by date range, then sub-partition by hash of tenant_id. Partition count matters: too few partitions means each is too large, too many means partition metadata overhead and query planner slowdown (PostgreSQL can struggle with 10,000+ partitions). Aim for partitions in the 1-50GB range.
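The list, hash, and sub-partitioning strategies above can be sketched in PostgreSQL DDL; all table names here are hypothetical:

```sql
-- List partitioning: one partition per discrete set of values
CREATE TABLE orders (
    id     bigint NOT NULL,
    region text   NOT NULL,
    total  numeric
) PARTITION BY LIST (region);

CREATE TABLE orders_emea PARTITION OF orders FOR VALUES IN ('eu', 'uk');
CREATE TABLE orders_amer PARTITION OF orders FOR VALUES IN ('us', 'ca');

-- Hash partitioning: four roughly equal buckets by customer_id
CREATE TABLE sessions (
    customer_id bigint NOT NULL,
    started_at  timestamptz
) PARTITION BY HASH (customer_id);

CREATE TABLE sessions_p0 PARTITION OF sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ...repeat for REMAINDER 1 through 3

-- Sub-partitioning: range by month, then hash by tenant within the month
CREATE TABLE metrics (
    tenant_id   int         NOT NULL,
    recorded_at timestamptz NOT NULL,
    value       double precision
) PARTITION BY RANGE (recorded_at);

CREATE TABLE metrics_2024_01 PARTITION OF metrics
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
    PARTITION BY HASH (tenant_id);

CREATE TABLE metrics_2024_01_h0 PARTITION OF metrics_2024_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
```

Note that a hash-partitioned table needs all MODULUS buckets created before it can accept arbitrary inserts, since every row must route somewhere.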

Deep — senior internals

PostgreSQL partitioning internals: partition pruning happens at plan time (static) and at execution time (dynamic, for parameterized queries). With partition-wise joins, PostgreSQL can join individual matching partitions rather than the whole table, dramatically speeding up joins between co-partitioned tables. Partition-wise aggregation similarly pushes aggregations down into individual partitions. When migrating a large table to a partitioned layout: create the partitioned table, attach the existing table as the default partition, then incrementally move data into proper partitions. Alternatively, use logical replication to migrate with zero downtime. In DynamoDB, partitioning is automatic but understanding it is critical: each partition handles at most 3000 RCU / 1000 WCU. Burst capacity is limited, and hot partitions throttle. Kinesis and Kafka use partition-based ordering guarantees — messages within a partition are ordered; across partitions they are not.
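The default-partition migration described above might look like this in PostgreSQL — a sketch with hypothetical names, assuming an unpartitioned `events` table keyed by `created_at`:

```sql
-- Step 1: swap the old table in as the default partition (no data copy)
ALTER TABLE events RENAME TO events_legacy;

CREATE TABLE events (LIKE events_legacy INCLUDING ALL)
    PARTITION BY RANGE (created_at);

ALTER TABLE events ATTACH PARTITION events_legacy DEFAULT;

-- Step 2: carve out proper partitions incrementally. A new partition cannot
-- overlap rows still sitting in the default, so move those rows out first:
BEGIN;
CREATE TABLE events_2024_01 (LIKE events INCLUDING ALL);
INSERT INTO events_2024_01
    SELECT * FROM events_legacy
    WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
DELETE FROM events_legacy
    WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
ALTER TABLE events ATTACH PARTITION events_2024_01
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
COMMIT;
```

Queries against `events` keep working throughout, since the default partition catches everything not yet migrated; the per-month transaction keeps lock windows and WAL volume bounded.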

🎤 Interview-ready answer

I partition tables when they exceed 100M rows or when I need efficient data lifecycle management. My default strategy for time-series data is range partitioning by month — it enables partition pruning for date-filtered queries and instant data archival by detaching old partitions. For multi-tenant systems, I use list partitioning by tenant_id, which also allows per-tenant backups and future tenant isolation if needed. I aim for 1-50GB per partition and avoid going beyond a few hundred partitions to keep the query planner efficient. The key benefit beyond performance is operations: dropping a partition is instant (vs a DELETE on millions of rows, which is slow and generates dead tuples), and I can attach pre-loaded partitions for bulk imports without affecting online queries.
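The two operational moves mentioned in the answer — instant archival and attach-based bulk import — as a PostgreSQL sketch (hypothetical table names; the COPY path is a placeholder):

```sql
-- Instant archival: detach, then drop or dump the old partition.
-- DETACH ... CONCURRENTLY (PostgreSQL 14+) avoids blocking queries.
ALTER TABLE events DETACH PARTITION events_2023_01;
DROP TABLE events_2023_01;  -- immediate; no dead tuples, no vacuum debt

-- Bulk import: load into a standalone table, index it, then attach.
CREATE TABLE events_2024_03 (LIKE events INCLUDING ALL);
COPY events_2024_03 FROM '/tmp/march.csv' CSV;  -- hypothetical file
ALTER TABLE events ATTACH PARTITION events_2024_03
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
```

Adding a `CHECK (created_at >= '2024-03-01' AND created_at < '2024-04-01')` constraint before the ATTACH lets PostgreSQL skip the validation scan, making the attach near-instant.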

Common trap

Creating too many partitions. PostgreSQL query planning slows down significantly with thousands of partitions. If you partition daily and retain 10 years of data, that is 3,650 partitions — consider monthly partitions (120) or weekly (520) instead. Also, queries without a partition key filter scan all partitions, which is worse than a single unpartitioned table.
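A quick way to check whether a query actually prunes — shown here against a hypothetical `events` table range-partitioned on `created_at`:

```sql
-- The plan should list only the partitions matching the filter
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM events
WHERE created_at >= '2024-02-01' AND created_at < '2024-03-01';

-- Without a created_at predicate, every partition appears in the plan:
-- that is the "scans all partitions" trap described above.
```

If the EXPLAIN output enumerates all partitions despite a date filter, the filter is likely wrapped in a function or cast that defeats pruning (e.g. `date_trunc('month', created_at) = ...`).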

Related concepts