Skip to content

Retry & Dead Letter Queue (DLQ)

Overview

The Retry and Dead Letter Queue (DLQ) mechanism is a critical resilience pattern that prevents message loss when a consumer fails to process a message. It provides an automated, configurable way to handle transient (temporary) and permanent processing failures gracefully.

  • Retry: When a consumer handler fails with a transient error, the message is automatically republished to the original topic to be processed again. This is handled by the DLQHandler.
  • Dead Letter Queue (DLQ): If a message fails to be processed after a configured number of retry attempts, it is considered a "poison pill" and is moved to a separate, dedicated topic called the Dead Letter Queue. This removes the failing message from the main processing queue, preventing it from blocking other messages.

The Failure Handling Flow

When a decorated consumer handler (@subscribe_to) raises an exception, the following automated flow is triggered:

  1. Failure Detection: The BaseMessageProcessor catches the exception.
  2. DLQ Handler Invocation: It packages all information about the failure (the original message, headers, exception details) into a FailureContext object and passes it to the DLQHandler.
  3. Retry Check: The DLQHandler inspects the message's headers to check the current retry attempt count (using the x-retry-attempts header).
  4. Decision:
    • If the attempt count is less than max_attempts (from your configuration), the handler republishes the message back to its original topic with an incremented retry header.
    • If the attempt count has reached max_attempts, the handler constructs a detailed "DLQ envelope" (a JSON object with the original message and error details) and publishes it to the configured DLQ topic.

The DLQ Processor

Once a message is in the DLQ topic, it needs to be handled. Athomic provides a dedicated background service, the DeadLetterProcessorService, which automatically subscribes to your DLQ topic.

This service consumes the DLQ envelopes and passes them to a configured Dead Letter Strategy for final disposition.

Available Strategies

  • logging_only (Default): This strategy simply logs the full details of the failed message envelope as a structured ERROR log. This is the safest default, ensuring that failing messages are recorded for manual inspection without taking further automated action.
  • republish_to_parking_lot: This strategy republishes the failed message to yet another topic, a "parking lot", which can be used for long-term storage or offline analysis.

Configuration

The retry and DLQ mechanism is configured under the [integration.messaging.dlq] section in your settings.toml.

[default.integration.messaging.dlq]
# A master switch to enable or disable the entire retry/DLQ flow.
enabled = true

# The maximum number of times to retry processing a message before sending it to the DLQ.
max_attempts = 3

# The name of the Dead Letter Queue topic where failed messages will be sent.
topic = "platform.dead-letter-queue.v1"

# The strategy for handling messages that arrive in the DLQ.
# Can be "logging_only" or "republish_to_parking_lot".
strategy = "logging_only"

# (Optional) The topic to use if the 'republish_to_parking_lot' strategy is chosen.
# parking_lot_topic = "platform.parking-lot.v1"

API Reference

nala.athomic.integration.messaging.retry.handler.DLQHandler

Orchestrates the handling of consumption failures, implementing the retry mechanism (republishing) and the final dead-letter queue (DLQ) publishing.

This class serves as the decision point: - If attempts < max_attempts and it's a single message: Republish for retry. - If attempts >= max_attempts or it's a batch failure: Send to DLQ.

__init__(producer, retry_policy, serializer, topic, max_attempts)

Initializes the DLQ handler with all necessary dependencies.

Parameters:

Name Type Description Default
producer ProducerProtocol

The producer used to send messages (for retry or DLQ).

required
retry_policy MaxAttemptsPolicy

The policy defining how retry headers and counts are managed.

required
serializer SerializerProtocol

The serializer used for general data conversion.

required
topic str

The destination topic name for the Dead Letter Queue.

required
max_attempts int

The maximum number of times to retry a message before sending to DLQ.

required

handle_failure(context) async

Orchestrates failure handling using an explicit context object. - Single messages will be retried up to max_attempts. - Batch messages will be sent directly to the DLQ without retry attempts.

Parameters:

Name Type Description Default
context FailureContext

The FailureContext object containing details of the consumption failure.

required

Returns:

Name Type Description
bool bool

True if the message was handled (retried or sent to DLQ), False otherwise.

nala.athomic.integration.messaging.retry.processor_service.DeadLetterProcessorService

Bases: BaseService

A dedicated background service that manages the lifecycle of a consumer subscribed to the Dead Letter Queue (DLQ).

This service is backend-agnostic as it receives a pre-configured consumer instance via dependency injection (DIP). Its purpose is purely to ensure the DLQ consumer is started and stopped gracefully with the application.

__init__(consumer)

Initializes the service with a pre-configured DLQ consumer.

Parameters:

Name Type Description Default
consumer BaseConsumer

A consumer instance that is already configured to subscribe to the DLQ topic and process its messages. This consumer is typically created by a Factory and injected here.

required

nala.athomic.integration.messaging.retry.strategies.protocol.DeadLetterStrategyProtocol

Bases: Protocol

Defines the contract for strategies that manage and handle messages that have permanently failed processing (Dead Letter Queue - DLQ).

Any class implementing this protocol must provide a concrete implementation for the asynchronous 'handle' method.

handle(failure_details) async

Processes a message that has permanently failed and requires a final disposition (e.g., logging, manual queue, or permanent storage).

Parameters:

Name Type Description Default
failure_details Dict[str, Any]

A dictionary containing all relevant information about the failed message, including payload, exception details, and retry history.

required

Returns:

Name Type Description
ProcessingOutcome ProcessingOutcome

An enum indicating the result of the DLQ handling process. - ACK: Indicates the handling strategy was successful and the message can be removed from the DLQ. - REQUEUE: Indicates a temporary failure in the handling process, and the message should be requeued to the DLQ for another attempt.