# Adaptive Throttling

## Overview
Adaptive Throttling is a sophisticated, closed-loop resilience pattern that dynamically adjusts rate limits based on the real-time health of downstream services. While a standard rate limiter uses static, pre-configured limits, an adaptive throttler reacts to changing conditions to proactively prevent cascading failures.
For example, if the P99 latency of a downstream service suddenly spikes, or its error rate increases, the adaptive throttling engine will automatically reduce the rate limit of calls to that service, giving it a chance to recover. Once the service's health metrics return to normal, the throttler will gradually relax the limit back to its configured maximum.
This creates a self-regulating system that is far more resilient to partial outages and performance degradation than static rate limiting alone.
## How It Works: The Feedback Loop

The system is orchestrated by the `AdaptiveThrottlingService`, a background service that runs a continuous feedback loop:

1. **Monitor:** A `MetricsFetcher` periodically queries a monitoring system (such as Prometheus) for key health indicators of downstream services. These indicators are defined by you as PromQL queries in the configuration.
2. **Decide:** The fetched metrics (e.g., `latency_p99`, `error_rate_percent`) are passed to a `DecisionAlgorithm`, which compares the real-time values against healthy thresholds defined in your configuration.
3. **Adjust:**
    - If a threshold is breached, the algorithm calculates a new, more restrictive rate limit (e.g., reducing the current limit by 20%).
    - If the system is healthy, the algorithm gradually increases the rate limit back towards the statically configured maximum.
4. **Store:** The newly calculated dynamic limit is stored in a distributed `AdaptiveStateStore` (e.g., Redis) with a time-to-live (TTL).
5. **Enforce:** The `AdaptiveRateLimiterProvider` is configured to wrap the standard rate limiter. When a request is made, it first checks the `AdaptiveStateStore` for a dynamic limit. If one exists, it is enforced; otherwise, the static limit from the configuration is used.
This cycle repeats continuously, allowing the system to autonomously adapt to the real-time health of its dependencies.
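One iteration of the loop can be sketched in Python. This is a minimal illustration, not the actual `AdaptiveThrottlingService` implementation: the callables and the in-memory `store` dictionary are hypothetical stand-ins for the `MetricsFetcher`, `DecisionAlgorithm`, and `AdaptiveStateStore` components.

```python
def feedback_loop_tick(fetch_metrics, decide, store, policies):
    """One pass of the monitor -> decide -> store cycle."""
    for policy in policies:
        metrics = fetch_metrics(policy)       # 1. Monitor
        new_limit = decide(policy, metrics)   # 2-3. Decide / Adjust
        if new_limit is not None:
            store[policy] = new_limit         # 4. Store (the real engine adds a TTL)
        # 5. Enforce happens elsewhere, in the AdaptiveRateLimiterProvider.

# Stand-in wiring: latency is unhealthy, so a reduced limit gets stored.
store = {}
fetch = lambda policy: {"latency_p99": 2.5, "error_rate_percent": 3.0}
decide = lambda policy, m: "75/minute" if m["latency_p99"] > 2.0 else None
feedback_loop_tick(fetch, decide, store, ["external_api"])
print(store)  # {'external_api': '75/minute'}
```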
## Configuration

Adaptive Throttling is a powerful feature that requires careful configuration of its two main parts: the rate limiter itself, and the adaptive engine that controls it.

### 1. Enable the Adaptive Rate Limiter Provider

First, in your `[resilience.rate_limiter]` section, set the backend to `"adaptive"`. This tells the `RateLimiterFactory` to create the `AdaptiveRateLimiterProvider`, which wraps your primary enforcement provider (such as `limits`).
```toml
[default.resilience.rate_limiter]
# Enable the adaptive provider as the main backend
backend = "adaptive"

# The adaptive provider wraps another provider. Configure the base provider here.
[default.resilience.rate_limiter.provider]
backend = "limits"
storage_backend = "redis"
redis_storage_uri = "redis://localhost:6379/4"
strategy = "moving-window"

# Your static policies still act as the MAXIMUM ceiling for the adaptive limits.
[default.resilience.rate_limiter.policies]
external_api = "100/minute"
```
### 2. Configure the Adaptive Throttling Engine

Next, configure the feedback loop engine in the `[resilience.adaptive_throttling]` section.
```toml
[default.resilience.adaptive_throttling]
enabled = true
check_interval_seconds = 15  # Run the feedback loop every 15 seconds.

# Tell the engine which rate limit policies it should dynamically adapt.
policies_to_adapt = ["external_api"]

# --- State Store (where dynamic limits are stored) ---
state_store_backend = "redis"
state_store_uri = "redis://localhost:6379/5"
state_store_ttl_seconds = 300  # Dynamic limits expire after 5 minutes.

# --- Metrics Fetcher (where to get health data from) ---
metrics_fetcher_type = "prometheus"
metrics_fetcher_url = "http://prometheus:9090"

# --- Decision Algorithm ---
decision_algorithm = "threshold"

# Map internal metric names to your actual PromQL queries.
[default.resilience.adaptive_throttling.prometheus_queries]
latency_p99 = "histogram_quantile(0.99, sum(rate(http_client_request_duration_seconds_bucket{service_name='http_external_api'}[1m])) by (le))"
error_rate_percent = "(sum(rate(http_client_requests_total{service_name='http_external_api', status='failure'}[1m])) / sum(rate(http_client_requests_total{service_name='http_external_api'}[1m]))) * 100"

[default.resilience.adaptive_throttling.algorithm_params]
# Reduce the limit if P99 latency exceeds 2000 ms.
latency_threshold_ms = 2000.0
# Reduce the limit if the error rate exceeds 10%.
error_threshold_percent = 10.0
# When a threshold is breached, multiply the current limit by 0.75 (a 25% reduction).
reduction_factor = 0.75
# When healthy, multiply the current limit by 1.1 (a 10% increase).
increase_factor = 1.1
```
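With these parameters, the limit adjustments work out as follows. This is a sketch assuming limits are simple `"count/period"` strings; the `adjust` helper is hypothetical, not part of the library's API.

```python
def adjust(limit: str, factor: float, ceiling: str) -> str:
    """Scale a 'count/period' limit string by factor, capped at the static ceiling."""
    count, period = limit.split("/")
    max_count = int(ceiling.split("/")[0])
    new_count = min(int(int(count) * factor), max_count)
    return f"{max(new_count, 1)}/{period}"

# Threshold breached: apply reduction_factor (0.75, i.e. a 25% cut).
print(adjust("100/minute", 0.75, "100/minute"))  # 75/minute
# Healthy again: apply increase_factor (1.1), never exceeding the static ceiling.
print(adjust("75/minute", 1.1, "100/minute"))    # 82/minute
print(adjust("95/minute", 1.1, "100/minute"))    # 100/minute
```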
## API Reference

### `nala.athomic.resilience.adaptive_throttling.service.AdaptiveThrottlingService`

Bases: `BaseService`

Manages the lifecycle and core logic of the adaptive throttling engine.

This background service periodically fetches system metrics, calculates optimal rate limits based on predefined algorithms and thresholds, and stores the dynamic limits for enforcement by the `AdaptiveRateLimiterProvider`. It inherits lifecycle management from `BaseService`.

Attributes:

| Name | Type | Description |
|---|---|---|
| `adaptive_settings` | `AdaptiveThrottlingSettings` | Specific configuration for the engine. |
| `rate_limit_settings` | `RateLimiterSettings` | Global rate limiter settings (used as reference). |
| `state_store` | `AdaptiveStateStore` | Component for storing and fetching dynamic limits. |
| `metrics_fetcher` | `MetricsFetcher` | Component for gathering system metrics (e.g., from Prometheus). |
| `decision_algorithm` | `DecisionAlgorithm` | Algorithm for calculating new limits. |
#### `__init__(settings=None)`

Initializes the AdaptiveThrottlingService.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `settings` | `Optional[AdaptiveThrottlingSettings]` | Configuration settings. If `None`, loads from global settings. | `None` |

#### `after_stop()` *(async)*

Hook called after the run loop is stopped to close background components.
### `nala.athomic.resilience.adaptive_throttling.protocols.DecisionAlgorithm`

Bases: `Protocol`

Interface for algorithms that decide the new adaptive rate limit based on current conditions and metrics.

#### `calculate_new_limit(policy_name, current_configured_limit, current_dynamic_limit, metrics)`

Calculates the new adaptive limit based on the provided context and metrics.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `policy_name` | `str` | The name of the policy being adjusted (e.g., `"default"`, `"premium"`). | required |
| `current_configured_limit` | `str` | The rate limit string defined in the static configuration (`RateLimiterSettings`) for this policy (or the default). Acts as a ceiling/reference. | required |
| `current_dynamic_limit` | `Optional[str]` | The currently active dynamic limit string retrieved from the `AdaptiveStateStore`, if any. | required |
| `metrics` | `Dict[str, Any]` | The dictionary of metrics retrieved by the `MetricsFetcher`. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `AdaptiveDecision` | `AdaptiveDecision` | An object detailing the calculated decision (action and new limit). |
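A minimal conforming implementation might look like the sketch below. It mirrors the `threshold` algorithm described in the configuration section, but it is an illustration, not the library's actual code: `AdaptiveDecision` is simplified to a plain dataclass, limits are assumed to be `"count/period"` strings, and `latency_p99` is assumed to arrive in seconds (as the example PromQL query would return).

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class AdaptiveDecision:
    action: str                 # "reduce", "increase", or "hold"
    new_limit: Optional[str]    # None means "use the static limit"

class SimpleThresholdAlgorithm:
    """Sketch of a DecisionAlgorithm: reduce on breach, recover when healthy."""

    def __init__(self, latency_threshold_ms=2000.0, error_threshold_percent=10.0,
                 reduction_factor=0.75, increase_factor=1.1):
        self.latency_threshold_ms = latency_threshold_ms
        self.error_threshold_percent = error_threshold_percent
        self.reduction_factor = reduction_factor
        self.increase_factor = increase_factor

    def calculate_new_limit(self, policy_name: str, current_configured_limit: str,
                            current_dynamic_limit: Optional[str],
                            metrics: Dict[str, Any]) -> AdaptiveDecision:
        effective = current_dynamic_limit or current_configured_limit
        count, period = effective.split("/")
        ceiling = int(current_configured_limit.split("/")[0])

        # latency_p99 assumed in seconds, hence the * 1000 to compare in ms.
        breached = (metrics.get("latency_p99", 0) * 1000 > self.latency_threshold_ms
                    or metrics.get("error_rate_percent", 0) > self.error_threshold_percent)
        if breached:
            new_count = max(1, int(int(count) * self.reduction_factor))
            return AdaptiveDecision("reduce", f"{new_count}/{period}")
        if current_dynamic_limit is None:
            return AdaptiveDecision("hold", None)   # already at the configured maximum
        new_count = min(ceiling, int(int(count) * self.increase_factor))
        if new_count >= ceiling:
            return AdaptiveDecision("hold", None)   # fully recovered; drop the dynamic limit
        return AdaptiveDecision("increase", f"{new_count}/{period}")

algo = SimpleThresholdAlgorithm()
decision = algo.calculate_new_limit("external_api", "100/minute", None,
                                    {"latency_p99": 3.0, "error_rate_percent": 1.0})
print(decision)  # AdaptiveDecision(action='reduce', new_limit='75/minute')
```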
### `nala.athomic.resilience.adaptive_throttling.providers.adaptive_provider.AdaptiveRateLimiterProvider`

Bases: `RateLimiterProtocol`

A rate limiter provider that dynamically adjusts limits based on system health.

This provider acts as a decorator, wrapping a base rate limiter implementation. Before enforcing a limit, it queries an `AdaptiveStateStore` for a potentially more restrictive dynamic limit calculated by a separate Decision Engine. It then applies the effective limit using the base provider.

Attributes:

| Name | Type | Description |
|---|---|---|
| `base_provider` | `RateLimiterProtocol` | The underlying implementation for limit enforcement. |
| `config` | `RateLimiterSettings` | The application's rate limiter configuration. |
| `state_store` | `AdaptiveStateStore` | The store used to fetch current dynamic limits. |
#### `__init__(config, base_provider, state_store)`

Initializes the AdaptiveRateLimiterProvider.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `RateLimiterSettings` | The application's `RateLimiterSettings` object. | required |
| `base_provider` | `RateLimiterProtocol` | The underlying implementation responsible for actual limit enforcement. | required |
| `state_store` | `AdaptiveStateStore` | The store used to fetch current dynamic limits. | required |
#### `allow(key, rate, policy=None)` *(async)*

Checks whether the request is allowed based on the dynamically adjusted limit.

The method determines the effective limit (dynamic or configured) and delegates the final enforcement check to the base provider.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `key` | `str` | The identifier being rate limited. | required |
| `rate` | `str` | The configured rate limit string (default/maximum limit). | required |
| `policy` | `Optional[str]` | The policy name used to look up the dynamic limit. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `bool` | `bool` | `True` if allowed, `False` otherwise. |
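The effective-limit selection described above can be illustrated with stubbed components. The stubs and the `get_dynamic_limit` method below are hypothetical stand-ins for the real `AdaptiveStateStore` and base provider, used only to show the lookup-then-delegate flow.

```python
import asyncio

class StubStateStore:
    """Stand-in for AdaptiveStateStore: dynamic limits held in memory."""
    def __init__(self, limits):
        self._limits = limits

    async def get_dynamic_limit(self, policy):
        return self._limits.get(policy)

class StubBaseProvider:
    """Stand-in base provider that records which limit it was asked to enforce."""
    def __init__(self):
        self.enforced = []

    async def allow(self, key, rate, policy=None):
        self.enforced.append(rate)
        return True

async def adaptive_allow(state_store, base_provider, key, rate, policy=None):
    # Prefer a dynamic limit stored by the Decision Engine; otherwise fall
    # back to the statically configured rate.
    dynamic = await state_store.get_dynamic_limit(policy) if policy else None
    return await base_provider.allow(key, dynamic or rate, policy=policy)

base = StubBaseProvider()
store = StubStateStore({"external_api": "75/minute"})
allowed = asyncio.run(adaptive_allow(store, base, "user:42", "100/minute",
                                     policy="external_api"))
print(allowed, base.enforced)  # True ['75/minute']
```

The dynamic `75/minute` limit wins over the static `100/minute` ceiling, which is exactly the behaviour enforced by the real provider.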
#### `clear(key, rate)` *(async)*

Clears rate limit counters in the base provider for the specific key and rate.

Note: This operation only clears the counter state managed by the base provider's storage. Clearing the dynamic limit itself is the responsibility of the Decision Engine during the recovery cycle.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `key` | `str` | The identifier whose counters should be cleared. | required |
| `rate` | `str` | The rate limit rule associated with the key. | required |
#### `get_current_usage(key, rate)` *(async)*

Gets the current usage count from the base provider for the provided rate string.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `key` | `str` | The identifier being checked. | required |
| `rate` | `str` | The rate limit string rule to check usage against. | required |

Returns:

| Type | Description |
|---|---|
| `Optional[int]` | The current usage count, or `None` if the operation fails. |
#### `reset()` *(async)*

Resets ALL rate limit counters in the base provider's storage.

WARNING: This operation is potentially global and does NOT automatically clear dynamic limits in the `AdaptiveStateStore`.

Raises:

| Type | Description |
|---|---|
| `Exception` | Propagates any error from the base provider's reset attempt. |