Database Cost Optimization
Database costs are often the largest line item in cloud bills. The main cost drivers are compute (instance size, virtual warehouse time), storage (data volume, retention), I/O (reads/writes, data transfer), and licensing. Cost optimization starts with right-sizing: most databases are over-provisioned because engineers fear performance issues.
Database cost optimization is like a utility bill audit: the biggest savings come from finding the one appliance left running 24/7 (bad query doing full-table scans), not from switching lightbulbs (instance sizing). Snowflake warehouses left on are space heaters running in summer — auto-suspend them.
Snowflake costs: auto-suspend warehouses after 1-5 minutes of idle time (a 4XL warehouse burns 128 credits per hour, so 8 idle hours is over a thousand credits, i.e. thousands of dollars), use multi-cluster warehouses with the economy scaling policy for non-urgent queries, and leverage result caching to avoid recomputation.
BigQuery costs: partition and cluster tables (less data scanned means a directly lower on-demand bill), avoid SELECT * (you pay per byte scanned), and use capacity-based (flat-rate) pricing for predictable workloads.
DynamoDB costs: use on-demand mode for unpredictable workloads and provisioned mode with auto-scaling for predictable ones, minimize GSI count (each GSI stores and bills for its own copy of the projected attributes), and avoid Scan operations.
PostgreSQL costs: right-size RDS instances (monitor CPU, memory, and I/O), offload read-heavy traffic to read replicas, consider Aurora for auto-scaling storage, and use reserved instances for 1-3 year commitments (roughly 40-60% savings).
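The BigQuery point is worth making concrete: on-demand cost is a linear function of bytes scanned, so partition pruning and column selection multiply together. A minimal sketch, where the per-TiB rate is an illustrative assumption (check current pricing for your region), and the 30-partition / 3-of-30-columns figures are hypothetical:

```python
# Rough cost model for BigQuery on-demand pricing: you pay per byte
# scanned, so partitioning and selecting fewer columns cut cost directly.
# PRICE_PER_TIB is an assumed illustrative rate, not a quoted price.

TIB = 1024 ** 4
PRICE_PER_TIB = 6.25  # assumed on-demand rate, USD per TiB scanned

def scan_cost(bytes_scanned: int) -> float:
    """Cost in USD for a single on-demand query."""
    return bytes_scanned / TIB * PRICE_PER_TIB

# A SELECT * over a hypothetical 10 TiB table, vs. the same query
# restricted to one of 30 daily partitions and 3 of 30 columns:
full_scan = scan_cost(10 * TIB)                              # all bytes
pruned = scan_cost(int(10 * TIB * (1 / 30) * (3 / 30)))      # ~0.3% of bytes
print(f"full scan: ${full_scan:.2f}, pruned: ${pruned:.2f}")
```

The pruned query touches roughly 0.3% of the bytes and therefore costs roughly 0.3% as much, which is why "partition, cluster, and never SELECT *" is the first BigQuery advice anyone gives.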
Advanced cost strategies: data tiering (move cold data to cheaper storage — Snowflake storage is much cheaper than compute, S3 Glacier for archive), TTL-based data expiration (ClickHouse TTL, DynamoDB TTL, PostgreSQL pg_partman with drop_partition), and compute scheduling (pause Snowflake warehouses during off-hours, scale down RDS instances on weekends). For ClickHouse, self-hosted on reserved instances is dramatically cheaper than cloud-managed alternatives for sustained workloads. Monitoring: track Snowflake credits per warehouse per query, BigQuery bytes scanned per query (INFORMATION_SCHEMA.JOBS), DynamoDB consumed capacity units, PostgreSQL query stats via pg_stat_statements. The biggest savings usually come from fixing inefficient queries (a single bad query scanning full tables can cost more than all other queries combined) rather than optimizing infrastructure.
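The monitoring step usually reduces to the same analysis regardless of database: sort queries by total cost and see how few of them account for most of the spend. A sketch of that ranking over hypothetical pg_stat_statements-style rows of (queryid, calls, total execution time in ms):

```python
# Rank queries by share of total execution time, the way you'd read
# pg_stat_statements output to find the "one bad query". Rows here are
# hypothetical (queryid, calls, total_exec_time_ms) tuples.

def top_offenders(rows, share_threshold=0.5):
    """Return the smallest set of queries covering share_threshold of total time."""
    total = sum(t for _, _, t in rows)
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)
    out, acc = [], 0.0
    for qid, calls, t in ranked:
        out.append((qid, t / total))
        acc += t / total
        if acc >= share_threshold:
            break
    return out

rows = [
    (101, 5, 900_000.0),      # rare query doing full-table scans
    (102, 50_000, 60_000.0),  # hot but cheap per call
    (103, 1_000, 40_000.0),
]
print(top_offenders(rows))  # [(101, 0.9)] — one query is 90% of the time
```

Here a query called only 5 times dominates total cost, which matches the observation above: fixing one bad query beats tuning everything else.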
Database cost optimization follows a clear priority: fix inefficient queries first (a single full-table scan in BigQuery can cost more than a month of well-optimized queries), then right-size compute, then optimize storage. For Snowflake, I auto-suspend warehouses and use multi-cluster economy scaling. For BigQuery, I partition tables, cluster by common filter columns, and select only needed columns. For DynamoDB, I use Query instead of Scan, minimize GSIs, and enable TTL for expired data. For PostgreSQL on RDS, I use reserved instances and read replicas. The meta-principle is monitoring: you cannot optimize what you do not measure. I track per-query costs (BigQuery INFORMATION_SCHEMA, Snowflake query history, pg_stat_statements) and set alerts for anomalous spending.
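The "alerts for anomalous spending" piece can be as simple as a trailing mean-and-deviation check on daily cost. A minimal sketch with illustrative thresholds (the 3-sigma cutoff and the variance floor are assumptions, not a recommendation from any vendor):

```python
# Minimal anomaly check for daily spend: flag a day whose cost exceeds
# the trailing mean by k standard deviations. Thresholds are illustrative.
from statistics import mean, stdev

def is_anomalous(history: list[float], today: float, k: float = 3.0) -> bool:
    if len(history) < 2:
        return False  # not enough data to estimate variance
    mu, sigma = mean(history), stdev(history)
    # Floor sigma at 1% of the mean so near-constant history
    # doesn't alert on trivial fluctuations.
    return today > mu + k * max(sigma, 0.01 * mu)

history = [100.0, 105.0, 98.0, 102.0, 101.0]  # hypothetical daily spend, USD
print(is_anomalous(history, 340.0))  # True: ~3x the usual daily spend
print(is_anomalous(history, 104.0))  # False: within normal variation
```

In practice you would feed this from the same sources listed above (BigQuery INFORMATION_SCHEMA, Snowflake query history, pg_stat_statements) aggregated per day or per warehouse.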
A common mistake: over-provisioning DynamoDB in provisioned mode with high WCU/RCU to avoid throttling, when on-demand mode would be cheaper for your actual usage pattern. Conversely, using on-demand mode for a steady, predictable workload when provisioned mode with auto-scaling would be 5-7x cheaper. Always model both pricing modes against your actual traffic patterns.
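Modeling the two modes is simple arithmetic. A back-of-envelope sketch for a steady write workload, where both unit prices and the 30% headroom factor are illustrative assumptions rather than current AWS rates:

```python
# Back-of-envelope DynamoDB pricing comparison for a steady workload.
# Both prices are assumed illustrative values, not quoted AWS rates;
# always model with your region's actual pricing.

ON_DEMAND_PER_M_WRITES = 1.25   # assumed USD per million write request units
PROVISIONED_WCU_HOUR = 0.00065  # assumed USD per WCU-hour

def on_demand_monthly(writes_per_sec: float) -> float:
    """On-demand: pay per request, regardless of shape."""
    writes = writes_per_sec * 86_400 * 30  # a 30-day month
    return writes / 1e6 * ON_DEMAND_PER_M_WRITES

def provisioned_monthly(writes_per_sec: float, headroom: float = 1.3) -> float:
    """Provisioned: pay per WCU-hour, sized above the steady rate."""
    wcu = writes_per_sec * headroom  # 30% headroom over the steady rate
    return wcu * 24 * 30 * PROVISIONED_WCU_HOUR

steady = 500  # writes/sec, flat all day — provisioned's best case
print(f"on-demand:   ${on_demand_monthly(steady):,.0f}/mo")
print(f"provisioned: ${provisioned_monthly(steady):,.0f}/mo")
```

Under these assumed rates the flat 500 writes/sec workload costs about $1,620/month on-demand versus about $304/month provisioned, a roughly 5x gap, consistent with the 5-7x figure above. The gap shrinks or reverses as traffic becomes spikier, because provisioned capacity sized for the peak sits idle the rest of the day.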