Data Flow & Messaging (high)

Outbox Pattern

Write domain events to an outbox table in the same DB transaction as the state change. A separate relay process reads the outbox and publishes events to the message broker, guaranteeing at-least-once delivery.

Memory anchor

Outbox pattern = writing a sticky note and dropping it in an 'outgoing mail' tray in the same motion as filing your paperwork. A mail carrier picks it up later — you never walk to the post office yourself.

Expected depth

The fundamental problem: without a distributed transaction, you cannot atomically write to a database AND publish to a message broker, because they are two separate systems. If you write to the DB and then publish, you can crash between the two steps and lose the event. If you publish and then write, you can crash after publishing and emit an event for a change that was never persisted. The outbox pattern solves this by making the DB write the single atomic operation: the outbox insert is part of the same ACID transaction as the state change, so either both commit or neither does. The relay (Debezium for CDC, or a polling loop) then reads committed outbox rows and publishes them. A polling relay marks each row as published; a CDC relay tracks its position in the transaction log instead.
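The write path and a polling relay pass can be sketched as follows. This is a minimal illustration, not a production relay: SQLite from the Python standard library stands in for Postgres, the `orders` and `outbox` tables and the functions `create_schema`, `place_order`, and `relay_once` are hypothetical names, and `publish` is any callable standing in for a broker client.

```python
import json
import sqlite3
import uuid


def create_schema(conn):
    # Hypothetical schema: one domain table plus the outbox table.
    conn.executescript("""
        CREATE TABLE orders (
            id TEXT PRIMARY KEY,
            status TEXT NOT NULL,
            version INTEGER NOT NULL
        );
        CREATE TABLE outbox (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id TEXT NOT NULL UNIQUE,
            aggregate_id TEXT NOT NULL,
            aggregate_version INTEGER NOT NULL,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            published_at TEXT
        );
    """)


def place_order(conn, order_id, fail_before_outbox=False):
    # "with conn" wraps both inserts in one transaction:
    # commit if the block succeeds, rollback if it raises.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status, version) VALUES (?, 'PLACED', 1)",
            (order_id,),
        )
        if fail_before_outbox:
            raise RuntimeError("simulated failure between the two writes")
        conn.execute(
            "INSERT INTO outbox (event_id, aggregate_id, aggregate_version,"
            " event_type, payload) VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), order_id, 1, "OrderPlaced",
             json.dumps({"order_id": order_id, "status": "PLACED"})),
        )


def relay_once(conn, publish):
    # Read committed, unpublished rows in insertion order.
    rows = conn.execute(
        "SELECT seq, event_id, aggregate_id, aggregate_version, event_type,"
        " payload FROM outbox WHERE published_at IS NULL ORDER BY seq"
    ).fetchall()
    for seq, event_id, aggregate_id, version, event_type, payload in rows:
        publish({
            "event_id": event_id,
            "aggregate_id": aggregate_id,
            "aggregate_version": version,
            "event_type": event_type,
            "payload": json.loads(payload),
        })
        # Mark the row only after publishing. A crash between the two steps
        # means the row is published again later (at-least-once).
        with conn:
            conn.execute(
                "UPDATE outbox SET published_at = datetime('now') WHERE seq = ?",
                (seq,),
            )
    return len(rows)


conn = sqlite3.connect(":memory:")
create_schema(conn)
place_order(conn, "order-1")
try:
    place_order(conn, "order-2", fail_before_outbox=True)
except RuntimeError:
    pass  # the orders insert for order-2 was rolled back with the transaction

published = []
relay_once(conn, published.append)
```

In this sketch the failed `order-2` write leaves neither a domain row nor an outbox row behind, so the relay only ever sees events for state changes that actually committed.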

Deep — senior internals

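Debezium is a widely used choice for the relay: it tails the Postgres WAL (write-ahead log) and publishes changes to Kafka with low latency and no polling overhead. This is 'transactional outbox with CDC'. The published event should include the aggregate version so that consumers can detect out-of-order or duplicate delivery. Delivery is at-least-once, so duplicates are possible: a polling relay marks outbox rows with a published timestamp, a CDC relay records its position in the log, and either can crash after publishing but before recording progress, then publish the same event again on restart. Kafka's idempotent producer removes duplicates caused by producer retries within one producer session, but it does not cover relay restarts and is not end-to-end exactly-once, so consumers should still deduplicate by event ID or aggregate version. A subtle issue: outbox table growth. You need a cleanup job that deletes published outbox rows older than a retention window, or the table becomes a disk pressure point.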
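Two of the points above, the consumer-side version check and the outbox cleanup job, can be sketched like this. It is an illustrative sketch under assumptions: `OrderProjection`, `purge_published`, the event dictionary shape, and the `outbox` columns are hypothetical, versions are assumed to start at 1 and increase by one per aggregate, and SQLite stands in for Postgres.

```python
import sqlite3


class OrderProjection:
    """Consumer-side guard: apply an event only if it is the next version."""

    def __init__(self):
        self.last_version = {}  # aggregate_id -> last applied version
        self.status = {}        # aggregate_id -> current status

    def handle(self, event):
        aggregate_id = event["aggregate_id"]
        version = event["aggregate_version"]
        last = self.last_version.get(aggregate_id, 0)
        if version <= last:
            return "duplicate"  # already applied, e.g. redelivered by the relay
        if version > last + 1:
            return "gap"        # an earlier event is missing; caller may buffer or retry
        self.status[aggregate_id] = event["status"]
        self.last_version[aggregate_id] = version
        return "applied"


def purge_published(conn, retention_days):
    # Delete rows that were published more than retention_days ago.
    # Unpublished rows (published_at IS NULL) are never touched.
    with conn:
        cur = conn.execute(
            "DELETE FROM outbox WHERE published_at IS NOT NULL"
            " AND published_at < datetime('now', ?)",
            (f"-{retention_days} days",),
        )
    return cur.rowcount


projection = OrderProjection()
results = [
    projection.handle(event)
    for event in [
        {"aggregate_id": "order-1", "aggregate_version": 1, "status": "PLACED"},
        {"aggregate_id": "order-1", "aggregate_version": 3, "status": "SHIPPED"},
        {"aggregate_id": "order-1", "aggregate_version": 2, "status": "PAID"},
        {"aggregate_id": "order-1", "aggregate_version": 2, "status": "PAID"},
    ]
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox (seq INTEGER PRIMARY KEY, payload TEXT NOT NULL,"
    " published_at TEXT)"
)
with conn:
    conn.execute("INSERT INTO outbox (payload, published_at)"
                 " VALUES ('old', datetime('now', '-30 days'))")
    conn.execute("INSERT INTO outbox (payload, published_at)"
                 " VALUES ('recent', datetime('now'))")
    conn.execute("INSERT INTO outbox (payload, published_at)"
                 " VALUES ('pending', NULL)")
deleted = purge_published(conn, retention_days=7)
```

Returning "gap" rather than applying a version that arrives early is one possible policy; whether to buffer, retry, or fetch current state instead depends on the consumer.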

🎤 Interview-ready answer

I implement the outbox pattern with Debezium and Kafka for any service that needs to publish events as part of a state-changing operation. The application writes to its domain tables and an outbox table in one transaction. Debezium watches the WAL and publishes outbox rows to Kafka. The application never calls the Kafka producer directly in the request path — this eliminates the dual-write problem and decouples event publishing latency from request latency.

Common trap

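Polling the outbox table with a background thread inside every application instance. This works, but when several instances poll concurrently they can publish events out of order, or publish the same row twice, unless rows are claimed under a lock or a single poller is elected. A JVM crash between publishing and marking the row does not lose the event, but the row is published again on restart, so consumers see a duplicate. CDC via Debezium avoids this polling coordination and is usually the better choice when ordering and throughput matter, though it is still at-least-once.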
