Snowflake
Snowflake is a cloud-native data warehouse that separates storage and compute. Data is stored in cloud object storage (S3/GCS/Azure Blob), and compute is provided by virtual warehouses (clusters of nodes) that can be independently scaled, paused, and resumed. You pay for storage (cheap) and compute (expensive, per-second billing). This separation means you can scale compute up for a heavy ETL job and then scale it back down without touching your data.
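The per-second billing model can be made concrete with a small sketch. Warehouse credit rates double with each size step (XS = 1 credit/hour up to 4XL = 128); the dollar price per credit varies by edition and region, so the $3.00 used here is an illustrative assumption, not a quoted price.

```python
# Sketch: estimating Snowflake compute cost under per-second billing.
# Credit rates per size are real; the $/credit figure is an assumption.

CREDITS_PER_HOUR = {
    "XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16,
    "2XL": 32, "3XL": 64, "4XL": 128,
}

def compute_cost(size: str, runtime_seconds: int, price_per_credit: float = 3.0) -> float:
    """Cost of one warehouse run, billed per second
    (Snowflake applies a 60-second minimum each time a warehouse resumes)."""
    billed = max(runtime_seconds, 60)  # 60-second minimum per resume
    credits = CREDITS_PER_HOUR[size] * billed / 3600
    return round(credits * price_per_credit, 2)

# A 10-minute ETL burst on a 4XL vs. an XS left running all day:
print(compute_cost("4XL", 600))       # → 64.0
print(compute_cost("XS", 24 * 3600))  # → 72.0
```

The numbers illustrate the point in the text: a short burst on a huge warehouse can cost less than a small warehouse idling all day, which is why scale-up-then-suspend is the standard pattern.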
Snowflake is a rental car company: storage is the parking lot (cheap), compute is the car (pay per second of driving), and you can rent a bigger car (4XL warehouse) for the weekend and return it Monday. Time Travel is the dashcam that lets you rewind to any moment in the last 90 days.
Virtual warehouses come in T-shirt sizes (XS to 6XL, each doubling the compute of the previous). Multi-cluster warehouses auto-scale by adding clusters to absorb concurrent queries. Micro-partitions are the physical storage unit — each holds 50-500MB of uncompressed data and is created automatically by Snowflake (you do not partition manually as in PostgreSQL). Clustering keys (similar to a sort key) improve micro-partition pruning on frequently filtered columns. Time Travel (up to 90 days on Enterprise edition) lets you query data as of any past point in time or undo accidental deletes — incredibly useful for recovering from mistakes. Zero-copy cloning creates instant copies of databases, schemas, or tables for testing without duplicating storage. The VARIANT data type stores semi-structured data (JSON, Avro, Parquet) natively with automatic schema detection and columnar optimization.
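Micro-partition pruning is easy to model: Snowflake keeps min/max metadata per column for every micro-partition and skips any partition whose range cannot contain the filter value. A minimal sketch, with an invented partition layout for illustration:

```python
# Sketch of clustering-key pruning via per-partition min/max metadata.
# The partition names and date ranges are hypothetical examples.
from dataclasses import dataclass

@dataclass
class MicroPartition:
    name: str
    min_date: str  # min/max metadata for the clustering-key column
    max_date: str

partitions = [
    MicroPartition("mp_001", "2024-01-01", "2024-01-15"),
    MicroPartition("mp_002", "2024-01-16", "2024-02-02"),
    MicroPartition("mp_003", "2024-02-03", "2024-02-20"),
]

def prune(parts, predicate_date):
    """Keep only partitions that could contain rows matching
    WHERE event_date = predicate_date; skip the rest unread."""
    return [p for p in parts if p.min_date <= predicate_date <= p.max_date]

print([p.name for p in prune(partitions, "2024-02-10")])  # → ['mp_003']
```

Well-clustered data keeps these ranges narrow and non-overlapping, so most partitions are skipped; a poorly chosen clustering key produces overlapping ranges and little pruning.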
Cost optimization is the most important operational skill with Snowflake. Key strategies: auto-suspend warehouses after 1-5 minutes of inactivity, use multi-cluster warehouses with economy scaling for non-urgent workloads, avoid queuing by right-sizing warehouses for their concurrency, and use materialized views for repeated expensive queries. The query profile shows which stages consume the most time — remote disk reads indicate data is not in the local SSD cache (the warehouse needs to warm up). The result cache returns results instantly for identical queries within 24 hours. Snowpipe enables continuous micro-batch ingestion from cloud storage. Data sharing exposes live data to other Snowflake accounts without copying — the basis of the data marketplace and cross-organization analytics. Snowflake is not ideal for sub-second interactive analytics (warehouse startup and query compilation add latency) — use ClickHouse or a pre-computed OLAP cube for that use case.
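The result cache behaves roughly like a query-text-keyed store with a 24-hour TTL. A simplified sketch — the real cache-hit rules are stricter (syntactically matching query, same role, no non-deterministic functions, underlying data unchanged), and the whitespace normalization below is a simplification for illustration:

```python
# Sketch of a 24-hour result cache keyed on (normalized) query text.
# Cache hits return instantly without touching a warehouse.
import time

CACHE_TTL = 24 * 3600
_cache: dict = {}  # key -> (stored_at, result)

def run_query(sql: str, execute, now=time.time):
    key = " ".join(sql.split()).lower()  # simplification: normalize text
    hit = _cache.get(key)
    if hit and now() - hit[0] < CACHE_TTL:
        return hit[1], "cache hit"      # no warehouse needed
    result = execute(sql)               # would dispatch to a warehouse
    _cache[key] = (now(), result)
    return result, "cache miss"

calls = []
def fake_execute(sql):
    """Stand-in for real warehouse execution; records each dispatch."""
    calls.append(sql)
    return 42

print(run_query("SELECT COUNT(*) FROM orders", fake_execute))   # → (42, 'cache miss')
print(run_query("select count(*)  from orders", fake_execute))  # → (42, 'cache hit')
```

This is why dashboards that re-issue the same query are nearly free for 24 hours, while any change to the query text or the underlying data sends the query back to a warehouse.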
Snowflake is my choice for enterprise data warehousing when the team needs managed infrastructure, BI tool integration, and governance features. Its separation of storage and compute is the key architecture — I can scale a warehouse to 4XL for a heavy ETL job and pause it when done, paying only for the seconds of compute used. Micro-partitions with clustering keys enable efficient pruning, and Time Travel provides a safety net for accidental data changes. I optimize costs by auto-suspending warehouses, using result caching, and right-sizing warehouses for their workload. The main limitation is latency — Snowflake queries have a minimum latency floor (200ms-1s for warehouse startup and query compilation), so it is not suitable for sub-second interactive dashboards. For that, I use ClickHouse or pre-aggregated materialized views.
Using Snowflake for real-time interactive dashboards that need sub-second response times. Snowflake is optimized for batch-oriented analytical queries with seconds-level latency. The warehouse startup time, query compilation, and remote storage access create a latency floor that makes it unsuitable for truly interactive real-time analytics. Use ClickHouse or pre-computed aggregations for that use case.