ClickHouse
ClickHouse is an open-source columnar OLAP database built for real-time analytics on massive datasets. It can query billions of rows in milliseconds on modest hardware. It was originally built at Yandex for web analytics: Yandex.Metrica processes 20+ billion events per day on ClickHouse. If you need sub-second analytics on 100M+ rows — dashboards, observability, event analytics — ClickHouse is likely your best option.
ClickHouse is a Formula 1 car for analytics: insanely fast on the track (queries billions of rows in milliseconds), but you must fuel it in bulk (batch inserts). Dripping fuel one drop at a time (single-row inserts) clogs the engine with 'too many parts.' The MergeTree is its turbo engine that keeps merging and sorting data parts in the background.
The MergeTree engine family is the core: MergeTree (base engine, sorted by primary key, supports TTL for automatic data expiration), ReplacingMergeTree (deduplicates by primary key during merges — eventual, not immediate), AggregatingMergeTree (pre-aggregates data during merges using aggregate function states). The primary key in ClickHouse is a sparse index — it stores one entry per granule (8192 rows by default), not per row. This means the primary key index fits in memory even for trillion-row tables. Materialized views in ClickHouse are incremental — they process only new inserts, not the entire table, making them ideal for pre-aggregating high-volume event streams. Data is inserted in batches (not single rows) — ClickHouse creates a new data part per insert, and too many small inserts cause the 'too many parts' error. Buffer tables or async insert mode solve this.
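The sparse-index claim above is easy to sanity-check with arithmetic: one index entry per granule of 8192 rows means even a trillion-row table needs only ~122 million entries. A minimal sketch, assuming the default index_granularity of 8192 and a rough (hypothetical) ~100 bytes per index entry, which in reality depends on the key columns:

```python
import math

GRANULARITY = 8192       # ClickHouse default index_granularity
BYTES_PER_ENTRY = 100    # rough assumption; real size depends on key width

def index_entries(rows: int, granularity: int = GRANULARITY) -> int:
    """One sparse-index entry per granule, not per row."""
    return math.ceil(rows / granularity)

def index_size_bytes(rows: int) -> int:
    return index_entries(rows) * BYTES_PER_ENTRY

trillion = 10**12
entries = index_entries(trillion)
size_gb = index_size_bytes(trillion) / 1e9
print(f"{entries:,} entries, ~{size_gb:.1f} GB")
# -> 122,070,313 entries, ~12.2 GB
```

A dozen gigabytes of index for a trillion rows is why the primary index stays memory-resident where a per-row B-tree could not.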
ClickHouse achieves its speed through: columnar storage with LZ4/ZSTD compression (10-30x compression ratios), vectorized query execution (processing data in batches using SIMD), primary-key-based granule skipping (the sparse index), and aggressive parallel execution across cores. Merges run in the background: data parts are periodically merged into larger sorted parts, applying deduplication (ReplacingMergeTree) or aggregation (AggregatingMergeTree) during the merge. This means queries on ReplacingMergeTree may see duplicates until the next merge — use the FINAL keyword to force deduplication at query time (slower). Projections (since v21.6) are like automatic materialized views stored within the table — the query optimizer transparently uses a projection if it matches the query better than the primary sort order. Distributed tables span multiple shards, with each shard being a ReplicatedMergeTree for HA. ClickHouse Keeper (a ZooKeeper replacement) coordinates replication. For real-time dashboards, the typical pattern is: ingest events via the Kafka engine -> a materialized view aggregates into an AggregatingMergeTree -> dashboards query the aggregate table.
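The Kafka -> materialized view -> AggregatingMergeTree pattern can be sketched as three DDL statements, here held as Python strings to run through any ClickHouse client. All names (events_queue, events_agg, events_mv, the broker and topic) are hypothetical; the engine settings and aggregate-state functions are standard ClickHouse:

```python
PIPELINE_DDL = [
    # 1. Kafka engine table: a streaming consumer, not real storage.
    """
    CREATE TABLE events_queue (ts DateTime, user_id UInt64, event String)
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'broker:9092',
             kafka_topic_list = 'events',
             kafka_group_name = 'ch_consumer',
             kafka_format = 'JSONEachRow'
    """,
    # 2. Aggregate target: stores partial aggregate states per hour.
    """
    CREATE TABLE events_agg (
        hour DateTime,
        event String,
        users AggregateFunction(uniq, UInt64)
    ) ENGINE = AggregatingMergeTree
    ORDER BY (hour, event)
    """,
    # 3. Incremental materialized view: fires on each insert batch only.
    """
    CREATE MATERIALIZED VIEW events_mv TO events_agg AS
    SELECT toStartOfHour(ts) AS hour, event, uniqState(user_id) AS users
    FROM events_queue
    GROUP BY hour, event
    """,
]
```

Dashboards then query the rollup with uniqMerge(users) to finalize the partial states, never touching the raw event stream.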
ClickHouse is my choice for real-time analytics on massive datasets — observability platforms, event analytics, ad-tech dashboards — anywhere I need sub-second queries on billions of rows. The MergeTree engine with sparse indexing is the key: it stores one index entry per 8192 rows, keeping the index small enough to fit in memory even at trillion-row scale. Materialized views are incremental (they process only new inserts), which lets me pre-aggregate high-volume event streams in real time. I use AggregatingMergeTree for pre-computed rollups and ReplacingMergeTree when I need eventual deduplication. The critical operational requirement is batch inserts — ClickHouse creates a data part per insert, so single-row inserts cause 'too many parts' errors. I batch inserts (at least 1,000 rows) or use the Kafka engine for streaming ingestion. Compared to Snowflake/BigQuery, ClickHouse gives me 10-100x better latency but requires more operational work.
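Eventual deduplication with ReplacingMergeTree, and the FINAL modifier mentioned earlier, look like this in DDL form (table and column names are hypothetical; the version-column argument and FINAL semantics are standard):

```python
REPLACING_DDL = """
CREATE TABLE user_profiles (
    user_id UInt64,
    email String,
    updated_at DateTime
) ENGINE = ReplacingMergeTree(updated_at)  -- keep the row with max updated_at
ORDER BY user_id                           -- dedup happens on the sort key
"""

# Until a background merge runs, duplicate user_id rows remain visible.
# FINAL forces merge-time semantics at query time (slower, use sparingly):
QUERY = "SELECT * FROM user_profiles FINAL WHERE user_id = 42"
```

The design trade-off: writes stay cheap (no read-before-write), and correctness is deferred to merges or to FINAL at read time.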
A common pitfall is inserting data one row at a time into ClickHouse. ClickHouse creates a new data part for each INSERT, and too many parts cause degraded performance and eventually errors. Always batch inserts (ideally 10,000+ rows per insert), or use Buffer tables or async insert mode for high-frequency writes.
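The client-side batching this implies can be sketched as a small accumulator that buffers rows and emits one bulk insert per batch. The flush callback here is a stand-in for a real client call (for example, an INSERT through your ClickHouse driver of choice); the class and names are illustrative, not a library API:

```python
from typing import Callable, List

class BatchingInserter:
    """Buffer rows and flush one bulk insert per batch_size rows."""

    def __init__(self, flush: Callable[[List[tuple]], None],
                 batch_size: int = 10_000):
        self.flush_fn = flush
        self.batch_size = batch_size
        self.buffer: List[tuple] = []

    def add(self, row: tuple) -> None:
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)  # one data part per flush, not per row
            self.buffer = []

# Usage: 7 high-frequency writes become 3 inserts with batch_size=3.
batches: List[List[tuple]] = []
ins = BatchingInserter(flush=batches.append, batch_size=3)
for i in range(7):
    ins.add((i, "click"))
ins.flush()  # flush the remainder on shutdown
print([len(b) for b in batches])  # -> [3, 3, 1]
```

In production you would also flush on a timer so a quiet stream does not hold rows indefinitely; ClickHouse's async_insert setting implements the same idea server-side.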