Choosing the Right Database

Multi-Database Architecture

Most production systems use multiple databases, each serving a specific purpose. A common pattern: PostgreSQL as the source of truth for transactional data, Redis for caching and sessions, Elasticsearch for search, and ClickHouse or BigQuery for analytics. The key is keeping these systems in sync — typically via Change Data Capture (CDC), event streams (Kafka), or application-level dual writes.

Memory anchor

Multi-database architecture is like a newsroom: PostgreSQL is the editor's master copy (source of truth), CDC (Debezium) is the wire service that broadcasts every edit, and downstream databases are different newspaper editions (Elasticsearch for search, ClickHouse for charts, Redis for the ticker). If any edition gets corrupted, reprint it from the master.

Expected depth

Synchronization approaches: CDC (Change Data Capture) via Debezium captures row-level changes from PostgreSQL WAL and streams them to Kafka, which feeds downstream systems (Elasticsearch, ClickHouse, Redis). This is the most reliable approach because it captures all changes including those made outside the application (migrations, manual fixes). Application-level dual writes (write to PostgreSQL and Elasticsearch in the same code path) are simpler but fragile — if the Elasticsearch write fails, data is inconsistent. Event sourcing (store events in Kafka, materialize views in various databases) is the most flexible but most complex. The outbox pattern (write to a database outbox table within the transaction, then asynchronously publish events) provides reliable event publishing without distributed transactions.
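The outbox pattern described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: it uses an in-memory SQLite database to stand in for PostgreSQL, a plain callback to stand in for a Kafka producer, and hypothetical table and column names. The point it demonstrates is that the business write and the event record commit in one transaction, and a separate relay publishes events with at-least-once delivery.

```python
import json
import sqlite3
import uuid

# In-memory SQLite stands in for PostgreSQL; schema names are illustrative.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE products (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE outbox (
        id TEXT PRIMARY KEY,
        topic TEXT,
        payload TEXT,
        published INTEGER DEFAULT 0
    );
""")

def create_product(product_id: str, name: str) -> None:
    # Business row and outbox event commit atomically in one transaction,
    # so an event exists if and only if the product row exists.
    with db:
        db.execute("INSERT INTO products VALUES (?, ?)", (product_id, name))
        event = json.dumps({"type": "product_created", "id": product_id, "name": name})
        db.execute(
            "INSERT INTO outbox (id, topic, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "products", event),
        )

def relay_outbox(publish) -> int:
    # Poll unpublished events, hand them to the broker, then mark them
    # published. A crash between publish and UPDATE causes a re-send on the
    # next poll: at-least-once delivery, so consumers must be idempotent.
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for event_id, topic, payload in rows:
        publish(topic, payload)
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    db.commit()
    return len(rows)

create_product("p1", "Widget")
sent = []
relay_outbox(lambda topic, payload: sent.append((topic, payload)))
print(len(sent))  # 1
```

In practice the relay is either a polling worker or Debezium's outbox event router reading the outbox table via CDC, which avoids polling entirely.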

Deep — senior internals

Eventual consistency between databases is the fundamental trade-off. When a user creates a product in PostgreSQL, the Elasticsearch search index is updated asynchronously — there is a window (typically milliseconds to seconds) where the product exists in the database but is not yet searchable. Design your UX around this: after creating a product, route the user to the product page (from PostgreSQL) rather than the search results page (from Elasticsearch). For critical consistency requirements (financial data), avoid multi-database architectures and keep everything in PostgreSQL. The CQRS (Command Query Responsibility Segregation) pattern formalizes this: commands (writes) go to the primary database, queries (reads) go to optimized read stores. This is powerful but adds complexity — only adopt CQRS when you have proven that a single database cannot serve both your write and read patterns.
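The read-your-writes routing described above can be shown with a toy sketch. The dictionaries standing in for PostgreSQL and Elasticsearch, and the function names, are hypothetical; the sketch only illustrates why the post-write page should read from the primary, not the search index.

```python
# Stand-ins: `primary` for PostgreSQL (synchronously consistent),
# `search_index` for Elasticsearch (populated later by CDC).
primary: dict = {}
search_index: dict = {}

def create_product(product_id: str, doc: dict) -> None:
    primary[product_id] = doc
    # CDC will index the document asynchronously; it is NOT searchable yet.

def product_page(product_id: str) -> dict:
    # Read-your-writes: always serve the detail page from the source of truth.
    return primary[product_id]

def search(term: str) -> list:
    # May lag behind the primary during the replication window.
    return [d for d in search_index.values() if term in d["name"]]

create_product("p1", {"name": "Widget"})
print(product_page("p1"))  # found immediately
print(search("Widget"))    # [] until CDC catches up
```

Routing the user to `product_page` after a create makes the consistency window invisible, whereas routing to `search` would make the product appear to be missing.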

🎤 Interview-ready answer

In multi-database architectures, I use CDC (Change Data Capture) via Debezium as the synchronization backbone. PostgreSQL WAL changes flow through Kafka to downstream systems — Elasticsearch for search, ClickHouse for analytics, Redis for caching. CDC is more reliable than application-level dual writes because it captures all changes regardless of origin. The key design principle is that PostgreSQL is the single source of truth, and all other databases are derived views that can be rebuilt from scratch. I design the UX to accommodate the eventual consistency window between databases — for example, after a write, I read from the primary database rather than from the search index. I only adopt CQRS when I have measured evidence that the read and write patterns cannot be served by a single database with read replicas.

Common trap

Using application-level dual writes (write to both PostgreSQL and Elasticsearch in the same code path) instead of CDC. Dual writes are fragile: if the second write fails, data is inconsistent, and there is no automatic recovery. CDC via Debezium captures changes from the WAL and guarantees at-least-once delivery to downstream systems.
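The failure mode above can be made concrete with a small sketch. The stand-in dictionaries and the simulated failure flag are hypothetical; the point is that once the first write has committed, a failure of the second write leaves the stores silently divergent, with no durable record from which to retry.

```python
# Stand-ins for PostgreSQL and Elasticsearch.
pg: dict = {}
es: dict = {}

def dual_write(key: str, doc: dict, es_fails: bool = False) -> None:
    pg[key] = doc                    # first write commits to the source of truth
    if es_fails:                     # simulated network or indexing failure
        raise RuntimeError("es write failed")
    es[key] = doc                    # second write in the same code path

try:
    dual_write("p1", {"name": "Widget"}, es_fails=True)
except RuntimeError:
    pass  # typical handling: log and move on, leaving the stores inconsistent

print("p1" in pg, "p1" in es)  # True False
```

With CDC or the outbox pattern, the event is derived from the committed primary write, so the downstream store eventually converges instead of diverging silently.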

Related concepts