OLTP Fundamentals

OLTP vs OLAP

OLTP (Online Transaction Processing) handles individual transactions — inserts, updates, deletes on specific rows. Think user signups, order placements, payment processing. OLAP (Online Analytical Processing) handles analytical queries — aggregations, scans, and joins across millions or billions of rows. Think revenue dashboards, cohort analysis, funnel reports. OLTP databases store data in rows (great for fetching a complete record). OLAP databases store data in columns (great for scanning a single field across all records).
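The row-vs-column tradeoff can be sketched in a few lines of Python. This is a toy illustration with made-up data, not how a real engine lays out pages: the same "orders" table is stored once as rows and once as columns, and each layout makes one access pattern cheap.

```python
# Row-oriented: each record kept together -- cheap to fetch one complete order.
rows = [
    {"id": 1, "customer": "ana", "amount": 30},
    {"id": 2, "customer": "bo",  "amount": 45},
    {"id": 3, "customer": "cy",  "amount": 25},
]

# Column-oriented: each field kept together -- cheap to scan one field.
columns = {
    "id":       [1, 2, 3],
    "customer": ["ana", "bo", "cy"],
    "amount":   [30, 45, 25],
}

# OLTP-style point lookup: fetch order 2 as a complete record.
order = next(r for r in rows if r["id"] == 2)

# OLAP-style aggregation: sum one column without touching the others.
total = sum(columns["amount"])
```

In the columnar layout, the aggregation never reads the `id` or `customer` data at all; that is the property real OLAP engines exploit at the scale of billions of rows.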

Memory anchor

OLTP is a cashier ringing up one customer at a time (single-row transactions). OLAP is the manager in the back office reviewing all receipts from the entire year at once (columnar scans across millions of rows).

Expected depth

OLTP databases are write-optimized with row-oriented storage (PostgreSQL, MySQL). They use B-tree indexes for point lookups, support ACID transactions, and are normalized (3NF) to avoid write anomalies. OLAP databases are read-optimized with columnar storage (Snowflake, BigQuery, ClickHouse). They use techniques like dictionary encoding, run-length encoding, and bitmap indexes for compression. OLAP schemas are denormalized (star/snowflake schema) to avoid expensive joins at query time. A critical architectural principle: never run OLAP queries against your OLTP database. Use CDC (Change Data Capture) or ETL pipelines to replicate data from OLTP to OLAP systems.
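The two encodings mentioned above can be sketched with small helpers. These are simplified, hypothetical implementations (the function names and sample data are illustrative, not from any particular engine): dictionary encoding replaces low-cardinality strings with small integer codes, and run-length encoding collapses runs of repeated values.

```python
from itertools import groupby

def dictionary_encode(values):
    """Map each distinct value to a small integer code."""
    mapping = {}
    codes = []
    for v in values:
        codes.append(mapping.setdefault(v, len(mapping)))
    return mapping, codes

def run_length_encode(values):
    """Collapse consecutive runs of a value into (value, count) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

# A low-cardinality column, as you'd see for an order-status field.
statuses = ["shipped", "shipped", "shipped", "pending", "pending", "shipped"]
mapping, codes = dictionary_encode(statuses)  # strings -> compact int codes
rle = run_length_encode(codes)                # repeated codes -> short runs
```

Real OLAP engines apply these per column block and stack them (dictionary first, then run-length or bit-packing), which is where the large compression ratios come from.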

Deep — senior internals

The row-vs-column distinction has deep implications for hardware utilization. Row stores load entire rows into CPU cache even when you only need one column — wasting memory bandwidth. Column stores load only the columns you query, achieving 10-100x compression ratios through type-specific encoding (delta encoding for timestamps, dictionary encoding for low-cardinality strings). Modern OLAP engines (ClickHouse, DuckDB) also leverage SIMD instructions to process columnar data in vectorized batches. The boundary between OLTP and OLAP is blurring: PostgreSQL with columnar extensions (Citus, Hydra) can handle moderate analytics, and HTAP databases (TiDB, CockroachDB, AlloyDB) attempt to serve both workloads. However, for serious scale (1B+ rows), dedicated OLAP engines still outperform HTAP systems by 10-100x.
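Delta encoding for timestamps, mentioned above, is simple enough to show directly. A minimal sketch (hypothetical helper names, stdlib only): store the first value plus successive differences, so a sorted timestamp column becomes a run of small integers that downstream codecs compress well.

```python
def delta_encode(timestamps):
    """Store the first value, then successive differences."""
    return [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]

def delta_decode(deltas):
    """Rebuild the original values by a running sum."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

# Unix timestamps a few seconds apart, typical of an event-log column.
ts = [1_700_000_000, 1_700_000_003, 1_700_000_004, 1_700_000_010]
encoded = delta_encode(ts)  # -> [1700000000, 3, 1, 6]
```

Because the deltas are tiny compared to the raw 10-digit timestamps, they fit in far fewer bits, and their regularity is exactly what SIMD-vectorized decoders churn through in batches.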

🎤 Interview-ready answer

OLTP and OLAP serve fundamentally different workloads. OLTP is for transactional operations — fetching or modifying individual rows with low latency and ACID guarantees. It uses row-oriented storage and B-tree indexes. OLAP is for analytical queries — scanning and aggregating across millions of rows. It uses columnar storage with aggressive compression. The key architectural principle is to separate these workloads: run your application against an OLTP database (PostgreSQL), and replicate data via CDC or ETL to an OLAP engine (ClickHouse, Snowflake, BigQuery) for analytics. Running analytical queries against your production OLTP database is a common mistake that degrades application performance.

Common trap

Thinking OLTP and OLAP are just about query types rather than storage engine fundamentals. The real difference is row-oriented vs columnar storage and the compression, vectorization, and scan optimizations that follow from that architectural choice.

Related concepts