PostgreSQL
PostgreSQL is the most capable open-source relational database available. It is the correct default for nearly any application. It supports ACID transactions, complex queries, JSONB for semi-structured data, full-text search, geospatial queries (PostGIS), and a rich set of index types. If your data fits on one machine (up to several TB with proper tuning), PostgreSQL handles it.
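The features above can all be exercised from plain SQL; a small sketch using a hypothetical docs table (table and column names are illustrative):

```sql
-- Hypothetical table mixing relational and semi-structured data
CREATE TABLE docs (
    id    serial PRIMARY KEY,
    meta  jsonb,
    body  text
);

INSERT INTO docs (meta, body)
VALUES ('{"tags": ["db", "intro"]}', 'PostgreSQL supports ACID transactions.');

-- JSONB containment query (can be served by a GIN index)
SELECT id FROM docs WHERE meta @> '{"tags": ["db"]}';

-- Built-in full-text search, no extra service required
SELECT id FROM docs
WHERE to_tsvector('english', body) @@ plainto_tsquery('english', 'transactions');
```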
PostgreSQL is the Swiss Army knife you already own: JSONB, full-text search, geospatial, and a rich set of index types in one engine. It is the Honda Civic of databases: boring, reliable, and sufficient for 95% of use cases; you only need a Ferrari (ClickHouse) when you are actually racing, i.e. sub-second analytics over billions of rows.
PostgreSQL uses MVCC (Multi-Version Concurrency Control): every transaction sees a snapshot of the data, and readers never block writers. Dead tuples accumulate from updates and deletes and are cleaned up by autovacuum (tune autovacuum_vacuum_cost_delay and autovacuum_vacuum_scale_factor for large tables). The WAL (Write-Ahead Log) ensures durability: every change is written to the WAL before the data file, which also enables point-in-time recovery. TOAST (The Oversized-Attribute Storage Technique) automatically compresses and stores large values out-of-line.

Index types:
- B-tree: the default, for equality and range queries
- Hash: equality only, rarely used
- GIN: full-text search, JSONB, arrays
- GiST: geospatial, range types
- BRIN: block-range index for naturally ordered data like timestamps; a tiny index for huge tables

Use pg_stat_statements to identify slow queries, and PgBouncer for connection pooling.
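These index types map directly onto CREATE INDEX syntax; a sketch on a hypothetical events table (names are illustrative, not from any real schema):

```sql
-- Hypothetical table for illustration
CREATE TABLE events (
    id         bigserial PRIMARY KEY,
    payload    jsonb,
    body       text,
    created_at timestamptz NOT NULL
);

-- B-tree (the default): equality and range lookups
CREATE INDEX events_created_btree ON events (created_at);

-- GIN: containment queries on JSONB, e.g. payload @> '{"type": "click"}'
CREATE INDEX events_payload_gin ON events USING gin (payload);

-- GIN over a tsvector expression: full-text search
CREATE INDEX events_body_fts ON events USING gin (to_tsvector('english', body));

-- BRIN: tiny index over naturally ordered data such as timestamps
CREATE INDEX events_created_brin ON events USING brin (created_at);
```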
PostgreSQL internals that matter at scale:
- shared_buffers (typically 25% of RAM): the buffer pool that caches frequently accessed pages
- effective_cache_size (typically 75% of RAM): tells the planner how much OS page cache to expect
- work_mem: per-operation sort/hash memory; set too low and sorts spill to disk, set too high and concurrent queries exhaust RAM
- max_parallel_workers_per_gather: enables scanning large tables with multiple parallel workers

Table partitioning (declarative since v10) is essential for tables over 100M rows: partition by date range for time-series data, by hash for even distribution. Logical replication enables zero-downtime schema migrations by replicating to a new table with the updated schema. pg_stat_user_tables shows dead tuple counts and last autovacuum time; if dead tuples keep growing, autovacuum cannot keep up. For very high connection counts (1000+), PgBouncer in transaction mode is mandatory.
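Declarative range partitioning for time-series data can be sketched like this (table name, column, and date ranges are hypothetical):

```sql
-- Declarative range partitioning by date, available since PostgreSQL 10;
-- queries filtering on recorded_at only scan the matching partitions
CREATE TABLE metrics (
    recorded_at timestamptz NOT NULL,
    value       double precision
) PARTITION BY RANGE (recorded_at);

-- One partition per month; create new ones ahead of time (often via cron)
CREATE TABLE metrics_2024_01 PARTITION OF metrics
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE metrics_2024_02 PARTITION OF metrics
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
```

Dropping an old partition (DROP TABLE metrics_2024_01) is instant, which is why partitioning beats DELETE for time-series retention.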
PostgreSQL is my default database choice for almost every application. It handles OLTP workloads excellently with MVCC for concurrency, WAL for durability, and a rich set of index types (B-tree, GIN for JSONB/full-text, GiST for geospatial, BRIN for time-series). For performance at scale, the key levers are: proper indexing (use pg_stat_statements to find slow queries, EXPLAIN ANALYZE to understand plans), connection pooling with PgBouncer (each connection costs ~10MB RAM), autovacuum tuning for tables with high update rates, and declarative partitioning for tables over 100M rows. PostgreSQL comfortably handles single-digit TB with proper tuning. I only reach for other databases when I have a workload it cannot serve: sub-second analytics on billions of rows (ClickHouse), document flexibility without any joins (MongoDB), or key-value at massive scale (DynamoDB).
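The slow-query workflow above can be sketched in SQL; the sample query against a hypothetical orders table is illustrative:

```sql
-- Requires the extension to be preloaded:
-- shared_preload_libraries = 'pg_stat_statements' in postgresql.conf
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top queries by cumulative execution time
-- (column names as of PostgreSQL 13+; older versions use total_time/mean_time)
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Then inspect the plan and actual timings for a suspect query
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM orders
WHERE created_at > now() - interval '1 day';
```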
Not using connection pooling. PostgreSQL forks a new process per connection, each consuming ~10MB of RAM. Without PgBouncer, a spike to 500 connections means 5GB of RAM just for connections, and you hit max_connections limits. Always put PgBouncer in front of PostgreSQL in production.
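A minimal pgbouncer.ini sketch for transaction-mode pooling; database name, host, and limits are illustrative, not prescriptive:

```ini
; Minimal PgBouncer config sketch (values are assumptions, tune for your setup)
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; server connection returned to pool after each transaction
max_client_conn = 1000       ; clients PgBouncer will accept
default_pool_size = 20       ; actual PostgreSQL connections per database/user pair
```

With this shape, a spike of 1000 client connections still maps to only 20 backend processes, so PostgreSQL's per-connection memory cost stays bounded.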