OLAP Fundamentals

Columnar Storage Internals

Columnar databases store data by column rather than by row. When an analytical query asks for 'average revenue by region,' the database reads only the revenue and region columns, skipping everything else. This reduces I/O by 10-100x compared to row stores that would read entire rows. Columnar data also compresses extremely well because adjacent values share a type and are often similar or repeated.
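A toy sketch of the difference (illustrative only, not a real engine): the same table laid out row-wise and column-wise, and a 'average revenue by region' aggregate that touches only two of the four columns in the columnar layout.

```python
from collections import defaultdict

# Row layout: every query walks every field of every row.
rows = [
    {"order_id": 1, "region": "EU", "revenue": 120.0, "notes": "…"},
    {"order_id": 2, "region": "US", "revenue": 80.0,  "notes": "…"},
    {"order_id": 3, "region": "EU", "revenue": 200.0, "notes": "…"},
]

# Column layout: the same table as one contiguous list per column.
columns = {
    "order_id": [1, 2, 3],
    "region":   ["EU", "US", "EU"],
    "revenue":  [120.0, 80.0, 200.0],
    "notes":    ["…", "…", "…"],
}

# 'Average revenue by region' reads only the region and revenue columns;
# order_id and notes are never touched.
totals, counts = defaultdict(float), defaultdict(int)
for region, revenue in zip(columns["region"], columns["revenue"]):
    totals[region] += revenue
    counts[region] += 1
avg_by_region = {r: totals[r] / counts[r] for r in totals}
print(avg_by_region)  # {'EU': 160.0, 'US': 80.0}
```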

Memory anchor

Columnar storage is like sorting your closet into attribute drawers: all colors in one drawer, all sizes in another. Need 'all blue items'? You only open the color drawer. Dictionary encoding is labeling hangers 1-5 instead of writing 'blue' a thousand times.

Expected depth

Key columnar techniques: dictionary encoding replaces repeated strings with integer keys (a country column with 200 unique values becomes 1 byte per row instead of variable-length strings). Run-length encoding collapses consecutive identical values. Delta encoding stores differences between sorted values (timestamps become small integers). Bit-packing uses the minimum number of bits per value. These techniques stack — a 1TB raw dataset might compress to 50-100GB in a columnar format. Predicate pushdown and partition pruning further reduce data scanned: if a query filters on date, the engine skips all partitions outside that date range without reading them.
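The three encodings above can be sketched in a few lines each (a minimal illustration; real formats like Parquet implement these at the byte level with far more care):

```python
def dictionary_encode(values):
    # Repeated strings -> small integer keys plus a lookup table.
    table, keys = {}, []
    for v in values:
        keys.append(table.setdefault(v, len(table)))
    return list(table), keys              # (dictionary, encoded column)

def run_length_encode(values):
    # Consecutive identical values collapse to [value, run_length] pairs.
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def delta_encode(sorted_values):
    # Sorted numbers (e.g. timestamps) become a base plus small deltas,
    # which then bit-pack into very few bits per value.
    return sorted_values[0], [b - a for a, b in zip(sorted_values, sorted_values[1:])]

countries = ["US", "US", "DE", "DE", "DE", "US"]
print(dictionary_encode(countries))   # (['US', 'DE'], [0, 0, 1, 1, 1, 0])
print(run_length_encode(countries))   # [['US', 2], ['DE', 3], ['US', 1]]
print(delta_encode([1000, 1003, 1004, 1010]))  # (1000, [3, 1, 6])
```

Note how the techniques stack: a dictionary-encoded column of integer keys is itself a good candidate for run-length encoding or bit-packing.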

Deep — senior internals

The Parquet file format (readable by Snowflake, BigQuery, Spark, and many data lake tools) is the de facto standard for open columnar storage. It organizes data into row groups (typically 128MB), and within each row group, columns are stored separately with per-column statistics (min, max, count, null count). These statistics enable min/max pruning — if a row group has max(date) = 2024-01-15 and your query filters date > 2024-02-01, the entire row group is skipped. ORC (Optimized Row Columnar) is an alternative format popular in the Hive ecosystem with similar principles but different implementation choices (stripe-level indexes, bloom filters). Zone maps in ClickHouse and Snowflake provide similar skip-scanning capabilities.
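The pruning decision itself is simple. Here is a sketch of min/max (zone-map) pruning over per-row-group statistics, loosely modeled on what a Parquet footer exposes (the dict structure and field names are hypothetical, not the real metadata layout):

```python
# Hypothetical per-row-group statistics, as a query planner might hold them
# after reading a Parquet footer. ISO-8601 date strings compare correctly
# as plain strings.
row_groups = [
    {"id": 0, "date_min": "2024-01-01", "date_max": "2024-01-15"},
    {"id": 1, "date_min": "2024-01-16", "date_max": "2024-01-31"},
    {"id": 2, "date_min": "2024-02-01", "date_max": "2024-02-14"},
]

def groups_to_scan(row_groups, date_gt):
    # For a filter `date > date_gt`, a row group can only match if its
    # maximum date exceeds the bound; everything else is skipped unread.
    return [g["id"] for g in row_groups if g["date_max"] > date_gt]

print(groups_to_scan(row_groups, "2024-02-01"))  # [2] — groups 0 and 1 skipped
```

The key property: the decision uses only footer metadata, so skipped row groups cost zero data I/O.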

🎤 Interview-ready answer

Columnar storage is the foundation of every modern OLAP engine. It works by storing values of each column contiguously, which enables three key optimizations: massive compression through type-specific encoding (dictionary, run-length, delta), reduced I/O by reading only queried columns, and vectorized CPU execution on batches of same-type values. In practice, a 1TB dataset in row format might be 50-100GB in columnar format. Combined with partition pruning and zone maps, analytical queries scan only a fraction of the data. This is why ClickHouse can query billions of rows in milliseconds — it is not magic, it is columnar storage plus aggressive compression plus vectorized execution.

Common trap

Assuming columnar compression ratios are guaranteed. Compression depends on data characteristics: high-cardinality columns (UUIDs, free-text) compress poorly compared to low-cardinality columns (country, status). Sorting data by low-cardinality columns before storage dramatically improves compression.
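The sorting effect is easy to demonstrate. A small sketch (illustrative, counting run-length runs as a proxy for compressed size): the same low-cardinality column yields thousands of runs when interleaved but only one run per distinct value once sorted.

```python
def rle_run_count(values):
    # Number of (value, run_length) pairs run-length encoding would emit —
    # a rough proxy for compressed size.
    runs, prev = 0, object()
    for v in values:
        if v != prev:
            runs += 1
            prev = v
    return runs

statuses = ["ok", "err"] * 3000          # low cardinality, badly interleaved
print(rle_run_count(statuses))           # 6000 runs — no adjacent repeats
print(rle_run_count(sorted(statuses)))   # 2 runs — one per distinct value
```

A high-cardinality column such as UUIDs gains nothing from sorting here: every value is distinct, so the run count equals the row count either way.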

Related concepts