# Adaptive Throttling

## Overview
Adaptive Throttling is a sophisticated, closed-loop resilience pattern that dynamically adjusts rate limits based on the real-time health of downstream services. While a standard rate limiter uses static, pre-configured limits, an adaptive throttler reacts to changing conditions to proactively prevent cascading failures.
For example, if the P99 latency of a downstream service suddenly spikes, or its error rate increases, the adaptive throttling engine will automatically reduce the rate limit of calls to that service, giving it a chance to recover. Once the service's health metrics return to normal, the throttler will gradually relax the limit back to its configured maximum.
This creates a self-regulating system that is far more resilient to partial outages and performance degradation than static rate limiting alone.
## How It Works: The Feedback Loop

The system is orchestrated by the `AdaptiveThrottlingService`, a background service that runs a continuous feedback loop:

1. **Monitor:** A `MetricsFetcher` periodically queries a monitoring system (such as Prometheus) for key health indicators of downstream services. These indicators are defined by you as PromQL queries in the configuration.
2. **Decide:** The fetched metrics (e.g., `latency_p99`, `error_rate_percent`) are passed to a `DecisionAlgorithm`, which compares the real-time values against healthy thresholds defined in your configuration.
3. **Adjust:**
    - If a threshold is breached, the algorithm calculates a new, more restrictive rate limit (e.g., reducing the current limit by 20%).
    - If the system is healthy, the algorithm gradually increases the rate limit back towards the statically configured maximum.
4. **Store:** The newly calculated dynamic limit is stored in a distributed `AdaptiveStateStore` (e.g., Redis) with a time-to-live (TTL).
5. **Enforce:** The `AdaptiveRateLimiterProvider` is configured to wrap the standard rate limiter. When a request is made, it first checks the `AdaptiveStateStore` for a dynamic limit. If one exists, it is enforced; otherwise, the static limit from the configuration is used.
This cycle repeats continuously, allowing the system to autonomously adapt to the real-time health of its dependencies.
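One iteration of the loop can be sketched in Python. This is a minimal illustration, not the actual `AdaptiveThrottlingService` implementation: the callables and the in-memory `store` dictionary are hypothetical stand-ins for the `MetricsFetcher`, `DecisionAlgorithm`, and `AdaptiveStateStore` components.

```python
def feedback_loop_tick(fetch_metrics, decide, store, policies):
    """One pass of the monitor -> decide -> store cycle."""
    for policy in policies:
        metrics = fetch_metrics(policy)       # 1. Monitor
        new_limit = decide(policy, metrics)   # 2-3. Decide / Adjust
        if new_limit is not None:
            store[policy] = new_limit         # 4. Store (the real engine adds a TTL)
        # 5. Enforce happens elsewhere, in the AdaptiveRateLimiterProvider.

# Stand-in wiring: latency is unhealthy, so a reduced limit gets stored.
store = {}
fetch = lambda policy: {"latency_p99": 2.5, "error_rate_percent": 3.0}
decide = lambda policy, m: "75/minute" if m["latency_p99"] > 2.0 else None
feedback_loop_tick(fetch, decide, store, ["external_api"])
print(store)  # {'external_api': '75/minute'}
```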
## Configuration

Adaptive Throttling is a powerful feature that requires careful configuration of its two main parts: the rate limiter itself, and the adaptive engine that controls it.

### 1. Enable the Adaptive Rate Limiter Provider

First, in your `[resilience.rate_limiter]` section, set the backend to `"adaptive"`. This tells the `RateLimiterFactory` to create the `AdaptiveRateLimiterProvider`, which wraps your primary enforcement provider (such as `limits`).
```toml
[default.resilience.rate_limiter]
# Enable the adaptive provider as the main backend
backend = "adaptive"

# The adaptive provider wraps another provider. Configure the base provider here.
[default.resilience.rate_limiter.provider]
backend = "limits"
storage_backend = "redis"
redis_storage_uri = "redis://localhost:6379/4"
strategy = "moving-window"

# Your static policies still act as the MAXIMUM ceiling for the adaptive limits.
[default.resilience.rate_limiter.policies]
external_api = "100/minute"
```
### 2. Configure the Adaptive Throttling Engine

Next, configure the feedback loop engine in the `[resilience.adaptive_throttling]` section.
```toml
[default.resilience.adaptive_throttling]
enabled = true
check_interval_seconds = 15  # Run the feedback loop every 15 seconds.

# Tell the engine which rate limit policies it should dynamically adapt.
policies_to_adapt = ["external_api"]

# --- State Store (where dynamic limits are stored) ---
state_store_backend = "redis"
state_store_uri = "redis://localhost:6379/5"
state_store_ttl_seconds = 300  # Dynamic limits expire after 5 minutes.

# --- Metrics Fetcher (where to get health data from) ---
metrics_fetcher_type = "prometheus"
metrics_fetcher_url = "http://prometheus:9090"

# --- Decision Algorithm ---
decision_algorithm = "threshold"

# Map internal metric names to your actual PromQL queries.
[default.resilience.adaptive_throttling.prometheus_queries]
latency_p99 = "histogram_quantile(0.99, sum(rate(http_client_request_duration_seconds_bucket{service_name='http_external_api'}[1m])) by (le))"
error_rate_percent = "(sum(rate(http_client_requests_total{service_name='http_external_api', status='failure'}[1m])) / sum(rate(http_client_requests_total{service_name='http_external_api'}[1m]))) * 100"

[default.resilience.adaptive_throttling.algorithm_params]
# Reduce the limit if P99 latency exceeds 2000 ms.
latency_threshold_ms = 2000.0
# Reduce the limit if the error rate exceeds 10%.
error_threshold_percent = 10.0
# When a threshold is breached, multiply the current limit by 0.75 (a 25% reduction).
reduction_factor = 0.75
# When healthy, multiply the current limit by 1.1 (a 10% increase).
increase_factor = 1.1
```
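With these parameters, the limit adjustments work out as follows. This is a sketch assuming limits are simple `"count/period"` strings; the `adjust` helper is hypothetical, not part of the library's API.

```python
def adjust(limit: str, factor: float, ceiling: str) -> str:
    """Scale a 'count/period' limit string by factor, capped at the static ceiling."""
    count, period = limit.split("/")
    max_count = int(ceiling.split("/")[0])
    new_count = min(int(int(count) * factor), max_count)
    return f"{max(new_count, 1)}/{period}"

# Threshold breached: apply reduction_factor (0.75, i.e. a 25% cut).
print(adjust("100/minute", 0.75, "100/minute"))  # 75/minute
# Healthy again: apply increase_factor (1.1), never exceeding the static ceiling.
print(adjust("75/minute", 1.1, "100/minute"))    # 82/minute
print(adjust("95/minute", 1.1, "100/minute"))    # 100/minute
```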
## API Reference

### `nala.athomic.resilience.adaptive_throttling.service.AdaptiveThrottlingService`

Bases: `BaseService`

Manages the lifecycle and core logic of the adaptive throttling engine.

This background service periodically fetches system metrics, calculates optimal rate limits based on predefined algorithms and thresholds, and stores the dynamic limits for enforcement by the `AdaptiveRateLimiterProvider`. It inherits lifecycle management from `BaseService`.

Attributes:

| Name | Type | Description |
|---|---|---|
| `adaptive_settings` | `AdaptiveThrottlingSettings` | Specific configuration for the engine. |
| `rate_limit_settings` | `RateLimiterSettings` | Global rate limiter settings (used as reference). |
| `state_store` | `AdaptiveStateStore` | Component for storing and fetching dynamic limits. |
| `metrics_fetcher` | `MetricsFetcher` | Component for gathering system metrics (e.g., from Prometheus). |
| `decision_algorithm` | `DecisionAlgorithm` | Algorithm for calculating new limits. |
#### `__init__(settings=None)`

Initializes the AdaptiveThrottlingService.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `settings` | `Optional[AdaptiveThrottlingSettings]` | Configuration settings. If `None`, loads from global settings. | `None` |

#### `after_stop()` *(async)*

Hook called after the run loop is stopped to close background components.
### `nala.athomic.resilience.adaptive_throttling.protocols.DecisionAlgorithm`

Bases: `Protocol`

Interface for algorithms that decide the new adaptive rate limit based on current conditions and metrics.

#### `calculate_new_limit(policy_name, current_configured_limit, current_dynamic_limit, metrics)`

Calculates the new adaptive limit based on the provided context and metrics.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `policy_name` | `str` | The name of the policy being adjusted (e.g., `"default"`, `"premium"`). | required |
| `current_configured_limit` | `str` | The rate limit string defined in the static configuration (`RateLimiterSettings`) for this policy (or the default). Acts as a ceiling/reference. | required |
| `current_dynamic_limit` | `Optional[str]` | The currently active dynamic limit string retrieved from the `AdaptiveStateStore`, if any. | required |
| `metrics` | `Dict[str, Any]` | The dictionary of metrics retrieved by the `MetricsFetcher`. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `AdaptiveDecision` | `AdaptiveDecision` | An object detailing the calculated decision (action and new limit). |
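A minimal conforming implementation might look like the sketch below. It mirrors the `threshold` algorithm described in the configuration section, but it is an illustration, not the library's actual code: `AdaptiveDecision` is simplified to a plain dataclass, limits are assumed to be `"count/period"` strings, and `latency_p99` is assumed to arrive in seconds (as the example PromQL query would return).

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class AdaptiveDecision:
    action: str                 # "reduce", "increase", or "hold"
    new_limit: Optional[str]    # None means "use the static limit"

class SimpleThresholdAlgorithm:
    """Sketch of a DecisionAlgorithm: reduce on breach, recover when healthy."""

    def __init__(self, latency_threshold_ms=2000.0, error_threshold_percent=10.0,
                 reduction_factor=0.75, increase_factor=1.1):
        self.latency_threshold_ms = latency_threshold_ms
        self.error_threshold_percent = error_threshold_percent
        self.reduction_factor = reduction_factor
        self.increase_factor = increase_factor

    def calculate_new_limit(self, policy_name: str, current_configured_limit: str,
                            current_dynamic_limit: Optional[str],
                            metrics: Dict[str, Any]) -> AdaptiveDecision:
        effective = current_dynamic_limit or current_configured_limit
        count, period = effective.split("/")
        ceiling = int(current_configured_limit.split("/")[0])

        # latency_p99 assumed in seconds, hence the * 1000 to compare in ms.
        breached = (metrics.get("latency_p99", 0) * 1000 > self.latency_threshold_ms
                    or metrics.get("error_rate_percent", 0) > self.error_threshold_percent)
        if breached:
            new_count = max(1, int(int(count) * self.reduction_factor))
            return AdaptiveDecision("reduce", f"{new_count}/{period}")
        if current_dynamic_limit is None:
            return AdaptiveDecision("hold", None)   # already at the configured maximum
        new_count = min(ceiling, int(int(count) * self.increase_factor))
        if new_count >= ceiling:
            return AdaptiveDecision("hold", None)   # fully recovered; drop the dynamic limit
        return AdaptiveDecision("increase", f"{new_count}/{period}")

algo = SimpleThresholdAlgorithm()
decision = algo.calculate_new_limit("external_api", "100/minute", None,
                                    {"latency_p99": 3.0, "error_rate_percent": 1.0})
print(decision)  # AdaptiveDecision(action='reduce', new_limit='75/minute')
```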
### `nala.athomic.resilience.adaptive_throttling.providers.adaptive_provider.AdaptiveRateLimiterProvider`

Bases: `RateLimiterProtocol`

A rate limiter provider that dynamically adjusts limits based on system health.

This provider acts as a decorator, wrapping a base rate limiter implementation. Before enforcing a limit, it queries an `AdaptiveStateStore` for a potentially more restrictive dynamic limit calculated by a separate Decision Engine. It then applies the effective limit using the base provider.

Attributes:

| Name | Type | Description |
|---|---|---|
| `base_provider` | `RateLimiterProtocol` | The underlying implementation for limit enforcement. |
| `config` | `RateLimiterSettings` | The application's rate limiter configuration. |
| `state_store` | `AdaptiveStateStore` | The store used to fetch current dynamic limits. |
#### `__init__(config, base_provider, state_store)`

Initializes the AdaptiveRateLimiterProvider.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `RateLimiterSettings` | The application's `RateLimiterSettings` object. | required |
| `base_provider` | `RateLimiterProtocol` | The underlying implementation responsible for actual limit enforcement. | required |
| `state_store` | `AdaptiveStateStore` | The store used to fetch current dynamic limits. | required |
#### `allow(key, rate, policy=None)` *(async)*

Checks whether the request is allowed based on the dynamically adjusted limit.

The method determines the effective limit (dynamic or configured) and delegates the final enforcement check to the base provider.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `key` | `str` | The identifier being rate limited. | required |
| `rate` | `str` | The configured rate limit string (default/maximum limit). | required |
| `policy` | `Optional[str]` | The policy name used to look up the dynamic limit. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `bool` | `bool` | `True` if allowed, `False` otherwise. |
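The effective-limit selection described above can be illustrated with stubbed components. The stubs and the `get_dynamic_limit` method below are hypothetical stand-ins for the real `AdaptiveStateStore` and base provider, used only to show the lookup-then-delegate flow.

```python
import asyncio

class StubStateStore:
    """Stand-in for AdaptiveStateStore: dynamic limits held in memory."""
    def __init__(self, limits):
        self._limits = limits

    async def get_dynamic_limit(self, policy):
        return self._limits.get(policy)

class StubBaseProvider:
    """Stand-in base provider that records which limit it was asked to enforce."""
    def __init__(self):
        self.enforced = []

    async def allow(self, key, rate, policy=None):
        self.enforced.append(rate)
        return True

async def adaptive_allow(state_store, base_provider, key, rate, policy=None):
    # Prefer a dynamic limit stored by the Decision Engine; otherwise fall
    # back to the statically configured rate.
    dynamic = await state_store.get_dynamic_limit(policy) if policy else None
    return await base_provider.allow(key, dynamic or rate, policy=policy)

base = StubBaseProvider()
store = StubStateStore({"external_api": "75/minute"})
allowed = asyncio.run(adaptive_allow(store, base, "user:42", "100/minute",
                                     policy="external_api"))
print(allowed, base.enforced)  # True ['75/minute']
```

The dynamic `75/minute` limit wins over the static `100/minute` ceiling, which is exactly the behaviour enforced by the real provider.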
#### `clear(key, rate)` *(async)*

Clears rate limit counters in the base provider for the specific key and rate.

Note: This operation only clears the counter state managed by the base provider's storage. Clearing the dynamic limit itself is the responsibility of the Decision Engine during the recovery cycle.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `key` | `str` | The identifier whose counters should be cleared. | required |
| `rate` | `str` | The rate limit rule associated with the key. | required |
#### `get_current_usage(key, rate)` *(async)*

Gets the current usage count from the base provider for the provided rate string.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `key` | `str` | The identifier being checked. | required |
| `rate` | `str` | The rate limit string rule to check usage against. | required |

Returns:

| Type | Description |
|---|---|
| `Optional[int]` | The current usage count, or `None` if the operation fails. |
#### `reset()` *(async)*

Resets ALL rate limit counters in the base provider's storage.

WARNING: This operation is potentially global and does NOT automatically clear dynamic limits in the `AdaptiveStateStore`.

Raises:

| Type | Description |
|---|---|
| `Exception` | Propagates any error from the base provider's reset attempt. |