Exponential Backoff
Overview
Exponential Backoff is a standard resilience strategy used to gradually increase the delay between consecutive actions. This is useful in two primary scenarios:
- Polling Idle Resources: When a background service is polling for work (like the OutboxPublisher checking for new events) and finds none, it's inefficient to poll again immediately. Exponential backoff increases the wait time between polls, reducing CPU and network usage during idle periods.
- Retrying Failed Operations: When retrying a failed call to a downstream service, applying an increasing delay between attempts gives the struggling service time to recover.
The Athomic implementation provides a stateful BackoffHandler that manages this logic based on configurable, named policies.
Key Features
- Policy-Based: Define multiple named backoff policies with different timings (min_delay, max_delay, factor) for various use cases.
- Stateful Handler: The BackoffHandler automatically manages the current delay state, increasing it after each wait and resetting it when work is found.
- Live Configuration: Backoff policies can be tuned in real time without an application restart.
How It Works
The system is composed of three main components:
- BackoffPolicy: A simple data object that holds the rules for a backoff strategy:
  - min_delay: The initial and minimum wait time.
  - max_delay: The maximum time to wait, which caps the exponential growth.
  - factor: The multiplier used to increase the delay after each wait (e.g., a factor of 1.5 will increase the wait time by 50% on each step).
- BackoffHandler: The stateful object that orchestrates the backoff logic. Its key methods are:
  - wait(): Asynchronously sleeps for the current_delay period and then calculates the next, longer delay.
  - reset(): Resets the current_delay back to the policy's min_delay.
- BackoffFactory: A factory used to create configured BackoffHandler instances based on named policies from your settings.toml.
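The interplay between the policy and the handler can be sketched in a few lines. This is a simplified, self-contained illustration and not the Athomic source: SketchPolicy and SketchHandler are hypothetical stand-ins, and the real BackoffHandler may differ in details such as logging or error hooks.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchPolicy:
    """Hypothetical stand-in for BackoffPolicy (all delays in seconds)."""

    min_delay: float
    max_delay: float
    factor: float


class SketchHandler:
    """Hypothetical stand-in for BackoffHandler; not the real implementation."""

    def __init__(self, policy: SketchPolicy) -> None:
        self.policy = policy
        self.current_delay = policy.min_delay

    async def wait(self) -> None:
        # Sleep for the current delay, then grow it for the next cycle,
        # never exceeding the policy's max_delay.
        await asyncio.sleep(self.current_delay)
        self.current_delay = min(
            self.current_delay * self.policy.factor, self.policy.max_delay
        )

    def reset(self) -> None:
        # Work was found: return to the minimum delay.
        self.current_delay = self.policy.min_delay
```

Keeping the policy immutable and the mutable delay inside the handler means one policy can safely back several independent handlers.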
Use Case: The OutboxPublisher Polling Loop
The OutboxPublisher is a perfect example of this pattern in action:
- In its main loop, it polls the database for new events.
- If no events are found, it calls backoff_handler.wait(). The first time, it might wait 1 second. The next, 1.5 seconds, then 2.25, and so on, up to a configured maximum. This makes the service highly efficient when idle.
- As soon as it finds and processes an event, it immediately calls backoff_handler.reset(). This resets the delay to the minimum, ensuring the service becomes highly responsive as soon as there is work to do.
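The growth described above can be worked out explicitly. The following sketch assumes the delay is multiplied by the factor after every wait and then capped at the maximum; backoff_schedule is a hypothetical helper written for this illustration, not an Athomic function.

```python
from typing import List


def backoff_schedule(
    min_delay: float, max_delay: float, factor: float, steps: int
) -> List[float]:
    """Return the delays (in seconds) for `steps` consecutive idle polls."""
    delays = []
    delay = min_delay
    for _ in range(steps):
        delays.append(delay)
        # Grow the delay for the next poll, but never beyond the cap.
        delay = min(delay * factor, max_delay)
    return delays


# Values taken from the example above: start at 1 second, grow by 50% per step.
schedule = backoff_schedule(min_delay=1.0, max_delay=30.0, factor=1.5, steps=10)
print(schedule)
```

With these values the first three waits are 1, 1.5, and 2.25 seconds, and the tenth wait is the first one held at the 30-second cap.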
Usage Example
You can use the BackoffHandler in any custom polling loop.
import asyncio

from nala.athomic.resilience.backoff import BackoffFactory

# Get a handler configured with the "my_worker_policy" from settings.toml
backoff_factory = BackoffFactory()
backoff_handler = backoff_factory.create_handler(policy_name="my_worker_policy")


async def my_polling_worker():
    while True:
        # poll_for_work() is your own coroutine; it should return a truthy
        # value when it found and processed work.
        work_done = await poll_for_work()
        if work_done:
            # We found work, so reset the delay to be responsive
            backoff_handler.reset()
        else:
            # No work found, wait with an increasing delay
            print("No work found, backing off...")
            await backoff_handler.wait()
Configuration
You define backoff policies in your settings.toml under the [resilience.backoff] section (written as [default.resilience.backoff] in the example below).
[default.resilience.backoff]
enabled = true
# A default policy if no specific one is requested.
[default.resilience.backoff.default_policy]
min_delay_seconds = 1.0
max_delay_seconds = 30.0
factor = 1.5
# A dictionary of named, reusable policies.
[default.resilience.backoff.policies]
# A policy for an aggressive, fast-polling worker.
[default.resilience.backoff.policies.outbox_publisher_polling]
min_delay_seconds = 0.1
max_delay_seconds = 5.0
factor = 1.2
# A policy for a slow, infrequent background job.
[default.resilience.backoff.policies.daily_cleanup_job]
min_delay_seconds = 60.0
max_delay_seconds = 3600.0 # 1 hour
factor = 2.0
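When tuning a policy, it can help to know how many consecutive idle waits occur before the delay reaches max_delay_seconds. The helper below is a hypothetical sketch (waits_before_cap is not part of Athomic) and assumes the delay is multiplied by factor after every wait and then capped.

```python
def waits_before_cap(min_delay: float, max_delay: float, factor: float) -> int:
    """Count the consecutive waits that are shorter than max_delay."""
    if factor <= 1.0 or min_delay <= 0:
        raise ValueError("expects factor > 1 and min_delay > 0")
    count = 0
    delay = min_delay
    while delay < max_delay:
        count += 1
        delay *= factor
    return count


# Policy values copied from the settings.toml example above.
for name, (lo, hi, factor) in {
    "default_policy": (1.0, 30.0, 1.5),
    "outbox_publisher_polling": (0.1, 5.0, 1.2),
    "daily_cleanup_job": (60.0, 3600.0, 2.0),
}.items():
    print(name, waits_before_cap(lo, hi, factor))
```

Under that assumption, default_policy reaches its cap after 9 waits, outbox_publisher_polling after 22, and daily_cleanup_job after 6.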
Live Configuration
Because BackoffSettings is a LiveConfigModel, you can change any of these policy values in your live configuration source (e.g., Consul), and the changes will be reflected in the BackoffHandler instances without requiring a restart.
API Reference
nala.athomic.resilience.backoff.handler.BackoffHandler
Manages the state and logic for an exponential backoff strategy.
This handler keeps track of the current delay, dynamically increasing it based on the configured policy parameters (min_delay, max_delay, factor) and providing hooks for error handling.
__init__(policy, operation_name='unknown', on_error=None)
Initializes the BackoffHandler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| policy | BackoffPolicy | The immutable policy defining the backoff rules. | required |
| operation_name | str | A descriptive name for the operation being managed, used for dedicated logging. Defaults to "unknown". | 'unknown' |
| on_error | Optional[ErrorCallback] | An asynchronous callback executed immediately after an error occurs but before sleeping. | None |
reset()
Resets the internal delay counter back to the minimum policy value.
This should be called after a successful operation to immediately resume high-frequency polling or operation attempts.
wait() (async)
Waits for the current delay period (typical usage for an idle/polling state) and then increases the delay for the next cycle.
wait_after_error(exc) (async)
Executes the on_error callback (if configured) and then waits for the current delay period, increasing it for the next cycle.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| exc | Exception | The exception that triggered the delay. | required |
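For the retry scenario, wait_after_error() and reset() can be combined in a small wrapper. The sketch below is an illustration under stated assumptions: call_with_backoff and its max_attempts parameter are hypothetical and not part of Athomic, and the handler is only assumed to expose the reset() and wait_after_error(exc) methods documented above.

```python
import asyncio
from typing import Any, Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_backoff(
    operation: Callable[[], Awaitable[T]],
    backoff_handler: Any,  # any object with reset() and async wait_after_error(exc)
    max_attempts: int = 5,  # hypothetical limit, not part of the documented API
) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            result = await operation()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Runs on_error (if configured), then sleeps with a growing delay.
            await backoff_handler.wait_after_error(exc)
        else:
            # Success: return to the minimum delay for the next call.
            backoff_handler.reset()
            return result
    raise RuntimeError("max_attempts must be at least 1")
```

In practice the except clause would usually be narrowed to the transient errors of the downstream service rather than catching every Exception.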
nala.athomic.resilience.backoff.policy.BackoffPolicy (dataclass)
A Value Object holding the configuration for an exponential backoff strategy.
This policy defines the parameters used by the BackoffHandler to control
the delay between consecutive polling cycles or retry attempts, ensuring
the system waits increasingly longer after failures or during idle periods.
Attributes:
| Name | Type | Description |
|---|---|---|
| min_delay | float | The initial and minimum delay in seconds. |
| max_delay | float | The maximum delay in seconds, capping the exponential increase. |
| factor | float | The multiplier used to increase the delay after each step. |
nala.athomic.resilience.backoff.factory.BackoffFactory
Factory class responsible for creating configured instances of BackoffHandler.
It resolves configuration policies (default or named override) and constructs a runnable handler ready for use in polling loops or retry mechanisms.
__init__(settings=None)
Initializes the factory with resolved application settings for backoff.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| settings | Optional[BackoffSettings] | Explicit settings instance. If None, loads from global application settings. | None |
create_handler(policy_name=None, operation_name='default', on_error=None)
Creates a new BackoffHandler instance configured with the specified policy.
The method resolves the correct configuration: prioritizing a named policy if provided and found, or falling back to the default policy otherwise.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| policy_name | Optional[str] | The name of a policy defined in settings (e.g., 'high_contention'). | None |
| operation_name | str | A descriptive name for the operation using this handler, used for logging and metrics. Defaults to "default". | 'default' |
| on_error | Optional[ErrorCallback] | An asynchronous callable hook executed before sleeping after a failure. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| BackoffHandler | BackoffHandler | A fully configured handler instance. |
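The policy resolution that create_handler describes (use the named policy if it is provided and found, otherwise fall back to the default) can be illustrated as follows. This is a sketch of the described behavior only: PolicySketch and resolve_policy are hypothetical names, and the actual factory code may be structured differently.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PolicySketch:
    """Hypothetical stand-in for BackoffPolicy (delays in seconds)."""

    min_delay: float
    max_delay: float
    factor: float


def resolve_policy(
    policies: Mapping[str, PolicySketch],
    default_policy: PolicySketch,
    policy_name: Optional[str] = None,
) -> PolicySketch:
    # Prefer the named policy when one is requested and actually defined.
    if policy_name is not None and policy_name in policies:
        return policies[policy_name]
    # Otherwise (no name given, or name unknown) use the default policy.
    return default_policy


# Values mirror the settings.toml example in the Configuration section.
default = PolicySketch(min_delay=1.0, max_delay=30.0, factor=1.5)
named = {"outbox_publisher_polling": PolicySketch(0.1, 5.0, 1.2)}

print(resolve_policy(named, default, "outbox_publisher_polling"))
print(resolve_policy(named, default, "missing_policy"))
```

A consequence of this fallback is that a misspelled policy name does not fail loudly; the handler is simply built from the default policy.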