Database Internalshigh

WAL (Write-Ahead Log)

WAL (Write-Ahead Log) is PostgreSQL's mechanism for durability. All changes are written to the WAL (append-only log on disk) before being applied to data files. On crash recovery, PostgreSQL replays the WAL to restore the committed state.

Memory anchor

WAL is a flight recorder (black box) — before the plane moves (data changes), the recorder logs the action. If the plane crashes (server failure), investigators (recovery) replay the recording to reconstruct exactly what happened.

Expected depth

WAL guarantees durability (the D in ACID): before reporting a COMMIT to the client, PostgreSQL flushes the WAL record to disk (fsync). The actual data file changes can be deferred — the WAL is the source of truth for recovery. Streaming replication: WAL segments are streamed to standby servers that replay them, maintaining a hot standby. Logical replication: WAL decoding interprets WAL records as row-level changes (INSERT/UPDATE/DELETE) for replication to heterogeneous systems (Debezium, Kafka CDC). checkpoint_completion_target controls how aggressively PostgreSQL writes dirty buffers to data files.

Deep — senior internals

WAL LSN (Log Sequence Number): monotonically increasing pointer into the WAL stream. pg_current_wal_lsn() returns the current write position. Replication lag: replica_lag = primary_lsn - replica_received_lsn. WAL archives: pg_basebackup + WAL archiving provides PITR (Point-in-Time Recovery) — replay WAL from a base backup to any point in time. wal_level: minimal (no replication), replica (streaming replication), logical (logical decoding for CDC). synchronous_commit: on (flush WAL to replica), remote_apply (wait for replica to apply), off (async — faster but risks data loss on primary failure). WAL writer and checkpointer are background processes that balance durability and I/O pressure.

🎤Interview-ready answer

WAL is what makes PostgreSQL durable — all commits are written to the sequential WAL before acknowledging success. On crash, replay from last checkpoint restores committed state. I use WAL-based streaming replication for standby databases and logical replication (Debezium) to stream changes to Kafka. For PITR, I archive WAL segments to S3 and take base backups — recoverable to any point in time.

Common trap

Setting synchronous_commit = off for performance thinking it's safe. It risks losing the last few milliseconds of commits on primary failure (the WAL isn't flushed to disk before acknowledging). For financial data, always use synchronous_commit = on.

Related concepts