Data Modeling & Designmedium

Multi-Tenant Database Design

Multi-tenant database design determines how to isolate customer data in a SaaS application. Three approaches: shared database with tenant_id column (cheapest, lowest isolation), shared database with schema-per-tenant (moderate isolation), and database-per-tenant (highest isolation, most expensive). Most SaaS applications start with the shared database approach and migrate to higher isolation as they grow.

Memory anchor

Multi-tenant design is an apartment building: shared database + RLS is everyone in one building with locked doors (cheapest, Row Level Security is the lock). Schema-per-tenant is separate floors with their own keys. Database-per-tenant is separate houses — maximum privacy, but you need a separate plumber, electrician, and security system for each one.

Expected depth

Shared database with tenant_id: add tenant_id to every table, enforce it in every query via application middleware or Row Level Security (RLS). PostgreSQL RLS (CREATE POLICY) enforces tenant isolation at the database level — even if application code forgets the WHERE clause, the database filters by tenant. This is the most operationally simple approach. Schema-per-tenant: each tenant gets their own schema within the same database. Moderate isolation, schema migrations must run on every schema. Works well up to ~1000 tenants. Database-per-tenant: maximum isolation, simplest per-tenant management (backup, restore, migrate), but operationally expensive at scale (connection pooling across thousands of databases). Used for enterprise customers with compliance requirements (data residency, dedicated encryption keys).

Deep — senior internals

Hybrid approach: shared database for small tenants (free, starter plans) with RLS, dedicated schemas for medium tenants, and dedicated databases for enterprise tenants requiring compliance isolation. This maps tenant tier to isolation level. Noisy neighbor problem: one tenant running expensive queries degrades performance for all tenants in a shared database. Mitigations: per-tenant query timeouts (statement_timeout per role), resource queues, or moving heavy tenants to dedicated resources. Connection pooling with multi-tenant: PgBouncer can pool connections across tenants in the shared-database model. For schema-per-tenant, ensure the pooler routes to the correct schema (SET search_path). For database-per-tenant, each database needs its own pool — this scales poorly beyond ~100 databases. Citus (PostgreSQL extension) provides transparent multi-tenant sharding: it distributes tables by tenant_id across nodes while maintaining the single-database SQL interface.

🎤Interview-ready answer

I design multi-tenant databases starting with shared-database-with-RLS for most tenants and offering dedicated isolation for enterprise customers. PostgreSQL Row Level Security enforces tenant isolation at the database level, which is safer than relying on application code to add WHERE tenant_id = X to every query. For the noisy neighbor problem, I set per-tenant statement timeouts and monitor per-tenant query performance. As we grow, I use a hybrid model: small tenants share the database with RLS, medium tenants get dedicated schemas, and enterprise tenants (with compliance requirements) get dedicated databases. Citus extends this by distributing the shared database across nodes by tenant_id, providing horizontal scaling while maintaining the SQL interface. The key principle is starting simple (shared database) and adding isolation only when specific tenants require it.

Common trap

Starting with database-per-tenant from day one when you have 5 customers. This approach is operationally expensive: each database needs monitoring, backups, connection pooling, and schema migration management. At 5 customers it seems manageable; at 5,000 it is a nightmare. Start with shared database plus RLS and migrate individual tenants to higher isolation when they need it.

Related concepts