Multi-Region Architecture
Deploying services across multiple geographic regions to reduce latency for global users, provide disaster recovery, and meet data residency compliance requirements.
Multi-region = opening franchise restaurants in different cities. Each kitchen cooks locally for speed, but the recipe book (data) must stay in sync across all locations.
Multi-region introduces a fundamental tension between consistency and latency: a write in us-east-1 must be replicated to eu-west-1 before a user in Europe can read it there, and that replication takes time. Strategies: (1) single-region writes (a primary region handles all writes and other regions serve as read replicas; simplest, but users far from the primary pay cross-region latency on every write); (2) active-active (all regions accept writes; lowest write latency globally, but concurrent writes to the same record in different regions require conflict resolution); (3) data sharding by geography (EU users' data stays in the EU region; a common way to satisfy GDPR restrictions on transferring personal data out of the EU). DNS-based routing (Route 53 latency-based routing, Cloudflare) directs users to the region with the lowest latency for them, which is usually the nearest one.
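The three strategies differ mainly in which region a request is allowed to land in. A minimal sketch of that routing decision follows. It assumes a two-region deployment, a `residency` field on the user, and client latency measurements passed in as a dict; all names here are hypothetical and not taken from any AWS or Cloudflare SDK.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical deployment: region names and the residency map are illustrative.
PRIMARY_REGION = "us-east-1"
REGIONS = ("us-east-1", "eu-west-1")
RESIDENCY_REGION = {"EU": "eu-west-1"}  # jurisdictions pinned to a single region


@dataclass(frozen=True)
class User:
    user_id: str
    residency: Optional[str]  # e.g. "EU", or None when no residency rule applies


def lowest_latency_region(latency_ms: dict[str, float]) -> str:
    """Pick the region with the smallest measured latency for this client."""
    return min(REGIONS, key=lambda region: latency_ms[region])


def route(user: User, is_write: bool, latency_ms: dict[str, float],
          active_active: bool = False) -> str:
    """Return the region that should serve this request."""
    # Strategy 3: residency-pinned data is read and written only in its home region.
    pinned = RESIDENCY_REGION.get(user.residency)
    if pinned is not None:
        return pinned
    # Strategy 1: with single-region writes, every write goes to the primary.
    if is_write and not active_active:
        return PRIMARY_REGION
    # Reads, and writes under strategy 2 (active-active), go to the fastest region.
    return lowest_latency_region(latency_ms)
```

For a client in Europe with no residency rule, a read is routed to eu-west-1 while a write still goes to us-east-1 unless `active_active` is enabled, which is exactly the write-latency cost of strategy 1.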
Cross-region replication lag is a first-class concern. If your SLA requires reads to reflect writes within 500ms, you must measure replication lag and alert on it. Aurora Global Database typically keeps cross-region replication lag under 1 second. For the user session problem (a user logs in from the US, then sends a request from the EU before their session has replicated): use global session storage (DynamoDB Global Tables, Redis Enterprise Active-Active) rather than regional session stores. Data sovereignty rules (GDPR Chapter V, Articles 44-50) may restrict or prohibit replicating certain user data outside a jurisdiction, so your data layer must support per-user or per-tenant data residency configuration.
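Measuring and alerting on lag can be sketched as a check of a high percentile of recent lag samples against the SLA budget. The example below assumes the samples are already collected in milliseconds from whatever metric your database exposes; the 80% warning fraction and the status names are illustrative choices, not part of any monitoring product.

```python
import math

SLA_LAG_MS = 500.0    # read-after-write bound from the example SLA
WARN_FRACTION = 0.8   # assumed: warn once 80% of the budget is used


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of the samples; pct is in (0, 100]."""
    if not samples:
        raise ValueError("no lag samples to evaluate")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def lag_status(samples_ms: list[float]) -> str:
    """Classify recent replication lag as 'ok', 'warn', or 'page'."""
    p99 = percentile(samples_ms, 99)
    if p99 > SLA_LAG_MS:
        return "page"   # the SLA is already being violated
    if p99 > SLA_LAG_MS * WARN_FRACTION:
        return "warn"   # close to the budget, investigate before it breaches
    return "ok"
```

Alerting on a high percentile rather than the average matters here: a mean of 150ms can hide a tail of requests that read stale data well past the 500ms bound.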
For a B2C platform with global users, I use a primary region (us-east-1) for all writes, with Aurora Global Database providing cross-region read replicas at typically under 1s of lag. I route reads to the lowest-latency region using Route 53 latency-based routing. For EU data residency compliance, I shard EU user data to a separate eu-west-1 cluster that takes its own writes and never replicates to other regions. Disaster recovery uses Route 53 health checks to fail traffic over to a secondary region automatically, with a target of under 60 seconds.
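A 60-second failover target is easier to defend with the arithmetic behind it. The sketch below uses a deliberately simplified model: time to detect the failure (check interval times consecutive failures required), plus the DNS TTL clients may keep serving from cache, plus any time to promote the secondary database. The parameter values are assumptions chosen for illustration, and real failovers also depend on resolver behavior and connection draining.

```python
def failover_budget_s(check_interval_s: float, failure_threshold: int,
                      dns_ttl_s: float, db_promotion_s: float = 0.0) -> float:
    """Rough worst-case seconds from region failure to traffic served elsewhere."""
    # Health checks must fail this many times in a row before the region is unhealthy.
    detection_s = check_interval_s * failure_threshold
    # Clients may keep using the old DNS answer until its TTL expires.
    return detection_s + dns_ttl_s + db_promotion_s


# Assumed settings: 10s checks, 3 consecutive failures, 30s DNS TTL.
budget = failover_budget_s(check_interval_s=10, failure_threshold=3, dns_ttl_s=30)
```

With these assumed settings the budget comes to exactly 60 seconds before any database promotion time is counted, so the target only holds for writes if the secondary can be promoted almost immediately.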
Declaring a system 'multi-region' when the database is still single-region. Multi-region application servers in front of a single-region database just move your single point of failure (SPOF) from the application to the database: users far from the database still pay cross-region latency on every query, and a database region outage still takes the whole system down.