Serverless & Event-Driven (critical)

SQS & SNS

SQS (Simple Queue Service) is a fully managed message queue for decoupling services. SNS (Simple Notification Service) is a pub/sub messaging service that fans out messages to multiple subscribers simultaneously.

Memory anchor

SQS is a deli number system: each customer (message) takes a number and waits in line until one server (consumer) handles it, and no ticket is discarded before it has been served. SNS is a PA announcement: one message broadcasts to everyone in the building simultaneously.

Expected depth

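SQS: Standard queues (at-least-once delivery, best-effort ordering, so consumers should be idempotent) vs FIFO queues (exactly-once processing via deduplication within a 5-minute window, strict ordering within a message group; 300 API calls per second per action by default, up to 3,000 messages per second with batches of 10, and higher again in high-throughput FIFO mode, where the quota varies by region). Visibility timeout (default 30 seconds): once a consumer receives a message, it is hidden from other consumers until the consumer deletes it or the timeout expires, at which point it becomes visible again. Dead Letter Queue (DLQ): a message that has been received more than maxReceiveCount times without being deleted is moved here for investigation. Long polling (WaitTimeSeconds up to 20) holds the receive call open until a message arrives, which reduces empty responses and cost. SNS: a single publish fans out to SQS queues, Lambda functions, HTTP/S endpoints, email, SMS, and mobile push. SNS+SQS fan-out pattern: one SNS topic delivers a copy of each message to several SQS queues, so each consumer processes it independently and in parallel.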
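The interplay between visibility timeout and maxReceiveCount is easier to see in a small simulation. The sketch below is a toy in-memory model, not the AWS API: `ToyQueue`, `Message`, and the `now` parameter (a simulated clock in seconds) are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Message:
    body: str
    receive_count: int = 0
    visible_at: float = 0.0  # simulated time at which the message can be received again


@dataclass
class ToyQueue:
    """In-memory stand-in for a queue with a DLQ; hypothetical, not the AWS API."""
    visibility_timeout: float = 30.0
    max_receive_count: int = 3
    messages: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)

    def send(self, body):
        self.messages.append(Message(body))

    def receive(self, now):
        """Return the first visible message, or None if nothing is visible."""
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue  # still hidden: another consumer is working on it
            if msg.receive_count >= self.max_receive_count:
                # Received max_receive_count times without a delete: move to the DLQ.
                self.messages.remove(msg)
                self.dead_letters.append(msg)
                continue
            msg.receive_count += 1
            msg.visible_at = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg):
        """Acknowledge successful processing so the message is never redelivered."""
        self.messages.remove(msg)


# A consumer that keeps failing (never calls delete) on a single message.
queue = ToyQueue(visibility_timeout=30, max_receive_count=2)
queue.send("order-1")
queue.receive(now=0)    # first delivery; hidden until t=30
queue.receive(now=10)   # returns None: the message is still in flight
queue.receive(now=30)   # timeout expired, second delivery
queue.receive(now=60)   # limit exceeded: moved to dead_letters, returns None
```

A consumer that succeeds calls `delete` before the timeout expires, which is the only way a message leaves the queue without being redelivered or dead-lettered.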

Deep — senior internals

SQS message retention: default 4 days, configurable from 60 seconds up to 14 days. Max message size: 256 KB; for larger payloads, the SQS Extended Client Library stores the message body in S3 and sends a reference pointer in the SQS message. FIFO queues with message groups enable parallel FIFO processing: messages in different groups are processed concurrently, while messages in the same group are delivered in order. Consumer scaling: with an SQS event source mapping, Lambda polls the queue and scales its concurrent invocations with the backlog; for FIFO queues, concurrency is capped by the number of active message groups. DLQ alarms are critical, because an unmonitored DLQ silently accumulates failed messages. An SQS resource policy (queue policy) enables cross-account access without sharing credentials.
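Message-group behaviour can be sketched the same way. The toy model below is an assumption-laden illustration rather than the SQS API (`ToyFifoQueue` and its methods are hypothetical): a group with an in-flight message is blocked, while other groups keep flowing.

```python
from collections import OrderedDict, deque


class ToyFifoQueue:
    """Toy model of FIFO message groups; hypothetical, not the AWS API."""

    def __init__(self):
        self._groups = OrderedDict()  # group id -> deque of pending message bodies
        self._in_flight = set()       # groups that currently have a message being processed

    def send(self, group_id, body):
        self._groups.setdefault(group_id, deque()).append(body)

    def receive(self):
        """Return (group_id, body) from the first group with no in-flight message."""
        for group_id, pending in self._groups.items():
            if pending and group_id not in self._in_flight:
                self._in_flight.add(group_id)
                return group_id, pending[0]
        return None

    def delete(self, group_id):
        """Acknowledge the in-flight message, unblocking the next one in its group."""
        self._groups[group_id].popleft()
        self._in_flight.discard(group_id)


fifo = ToyFifoQueue()
fifo.send("acct-A", "debit 10")
fifo.send("acct-A", "credit 5")
fifo.send("acct-B", "debit 7")

first = fifo.receive()    # ("acct-A", "debit 10")
second = fifo.receive()   # ("acct-B", "debit 7"): a different group proceeds in parallel
blocked = fifo.receive()  # None: "credit 5" waits behind the in-flight acct-A message
fifo.delete("acct-A")
third = fifo.receive()    # ("acct-A", "credit 5")
```

Choosing the group key (an account ID in this sketch) is the main design decision: a finer-grained key allows more parallelism, but ordering is only guaranteed within a key.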

🎤Interview-ready answer

SQS decouples producers from consumers and buffers traffic spikes. I use FIFO when order matters (financial transactions, command processing) and Standard for high-throughput workloads where consumers are idempotent. A DLQ is non-negotiable, because failed messages need investigation. The fan-out pattern (SNS → multiple SQS queues) lets multiple services process the same event independently. For Lambda consumers, the SQS event source mapping scales invocations with queue depth; SQS itself does not invoke anything.
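In practice, the fan-out pattern only works if each queue's resource policy allows the topic to send to it. A minimal sketch of that policy document, assuming hypothetical ARNs and a made-up helper name (`sns_to_sqs_policy`):

```python
import json


def sns_to_sqs_policy(queue_arn: str, topic_arn: str) -> str:
    """Build a queue policy that lets one SNS topic send messages to one queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowTopicToSend",
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Restrict the grant to this topic rather than all of SNS.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }
    return json.dumps(policy)


# Hypothetical ARNs: one topic fanning out to two independent consumer queues.
TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:order-events"
QUEUE_ARNS = [
    "arn:aws:sqs:us-east-1:111122223333:billing",
    "arn:aws:sqs:us-east-1:111122223333:shipping",
]
policies = {arn: sns_to_sqs_policy(arn, TOPIC_ARN) for arn in QUEUE_ARNS}
```

Each JSON string would be set as the corresponding queue's `Policy` attribute, after which the queue is subscribed to the topic.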

Common trap

Not setting a DLQ on SQS queues. Messages that fail processing reappear after each visibility timeout and are retried until the retention period expires, at which point SQS deletes them silently. Without a DLQ, you have no visibility into failed messages and lose data.
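Avoiding this trap comes down to one queue attribute. The sketch below builds the attribute map for a source queue; the helper name, the default values, and the DLQ ARN are assumptions chosen for illustration, while `RedrivePolicy`, `VisibilityTimeout`, and `ReceiveMessageWaitTimeSeconds` are the SQS attribute names.

```python
import json


def source_queue_attributes(dlq_arn, max_receive_count=5,
                            visibility_timeout=30, wait_seconds=20):
    """Return SQS queue attributes that attach a DLQ and enable long polling.

    SQS expects every attribute value as a string; RedrivePolicy is itself
    a JSON document serialized into a string.
    """
    return {
        "VisibilityTimeout": str(visibility_timeout),
        "ReceiveMessageWaitTimeSeconds": str(wait_seconds),
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        }),
    }


# Hypothetical DLQ ARN; the DLQ must already exist before it is referenced.
attrs = source_queue_attributes("arn:aws:sqs:us-east-1:111122223333:orders-dlq")

# With boto3 installed and credentials configured (an assumption, not run here):
#   boto3.client("sqs").create_queue(QueueName="orders", Attributes=attrs)
```

Pairing this with an alarm on the DLQ's visible-message count closes the loop, since a DLQ that nobody watches only moves the silent failure somewhere else.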