OLAP Fundamentalshigh

Star & Snowflake Schema

Star schema is the standard data warehouse design pattern. A central fact table (e.g., sales transactions) contains foreign keys to dimension tables (date, product, customer, store). The fact table holds metrics (revenue, quantity), and dimensions hold descriptive attributes. It is called a star because the diagram looks like a star with the fact table in the center. Snowflake schema normalizes the dimensions further (e.g., product -> category -> department), reducing redundancy but adding joins.

Memory anchor

Star schema is a sun with planets: the fat fact table is the sun (billions of sales rows), and dimension tables orbit around it (date, product, store). Snowflake schema is when those planets have their own moons (product -> category -> department) — more normalized, more joins, more headaches.

Expected depth

Star schemas are deliberately denormalized — dimensions contain redundant data to minimize joins. This is the opposite of OLTP normalization, and it is intentional. In OLAP, read performance matters more than storage efficiency. Fact tables are typically very large (billions of rows) and narrow (foreign keys + metrics). Dimension tables are small (thousands to millions of rows) and wide (many descriptive columns). Slowly Changing Dimensions (SCD) handle how dimension attributes change over time: Type 1 (overwrite), Type 2 (new row with version), Type 3 (add a column for previous value). Most data warehouses use Type 2 for critical dimensions like customer.

Deep — senior internals

Modern cloud data warehouses have somewhat blurred the star/snowflake distinction. Snowflake (the product) and BigQuery can handle joins efficiently enough that strict star schema design is less critical than in legacy systems (Teradata, Oracle DW). However, the principles still matter: pre-joining data into wide denormalized tables (one-big-table or OBT approach) can significantly reduce query complexity and cost in pay-per-scan engines like BigQuery. The data vault modeling approach (hubs, links, satellites) is gaining popularity for raw data layer modeling because it handles schema evolution better than star schemas, with star schemas built as a semantic/presentation layer on top.

🎤Interview-ready answer

Star schema is the standard OLAP modeling pattern: a central fact table with foreign keys to dimension tables. Facts hold metrics, dimensions hold descriptive attributes. It is deliberately denormalized to minimize joins and optimize read performance. I use star schemas for BI-facing data marts where query simplicity matters. For the raw/staging layer, I prefer a data vault approach because it handles schema evolution and source system changes more gracefully. The fact/dimension model is important because it determines how your BI tools (Looker, Tableau, Metabase) generate queries — a well-designed star schema means simpler, faster queries.

Common trap

Over-normalizing your data warehouse into a snowflake schema to save storage. Storage is cheap; query performance and simplicity are expensive. In most cases, a star schema with denormalized dimensions is the better choice for OLAP workloads.

Related concepts