
Adaptive Throttling

Overview

Adaptive Throttling is a closed-loop resilience pattern that dynamically adjusts rate limits based on the real-time health of downstream services. While a standard rate limiter enforces static, pre-configured limits, an adaptive throttler reacts to changing conditions, shedding load before failures can cascade.

For example, if the P99 latency of a downstream service suddenly spikes, or its error rate increases, the adaptive throttling engine will automatically reduce the rate limit of calls to that service, giving it a chance to recover. Once the service's health metrics return to normal, the throttler will gradually relax the limit back to its configured maximum.

This creates a self-regulating system that is far more resilient to partial outages and performance degradation than static rate limiting alone.


How It Works: The Feedback Loop

The system is orchestrated by the AdaptiveThrottlingService, a background service that runs a continuous feedback loop:

  1. Monitor: A MetricsFetcher periodically queries a monitoring system (like Prometheus) for key health indicators of downstream services. These are defined by you as PromQL queries in the configuration.

  2. Decide: The fetched metrics (e.g., latency_p99, error_rate_percent) are passed to a DecisionAlgorithm. The algorithm compares these real-time values against healthy thresholds defined in your configuration.

  3. Adjust:

    • If a threshold is breached, the algorithm calculates a new, more restrictive rate limit (e.g., reducing the current limit by 20%).
    • If the system is healthy, the algorithm gradually increases the rate limit back towards the statically configured maximum.
  4. Store: The newly calculated dynamic limit is stored in a distributed AdaptiveStateStore (e.g., Redis) with a Time-To-Live (TTL).

  5. Enforce: The AdaptiveRateLimiterProvider is configured to wrap the standard rate limiter. When a request is made, it first checks the AdaptiveStateStore for a dynamic limit. If one exists, it is enforced. Otherwise, the static limit from the configuration is used.

This cycle repeats continuously, allowing the system to autonomously adapt to the real-time health of its dependencies.
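The five steps above can be sketched as one loop iteration. This is a minimal, self-contained illustration: the in-memory store, the stub metrics, and the lambda decision function are assumptions standing in for the real Redis-backed AdaptiveStateStore, MetricsFetcher, and DecisionAlgorithm components.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Decision:
    action: str                # "reduce", "increase", or "hold" (illustrative)
    new_limit: Optional[str]   # e.g. "75/minute"; None keeps the static limit

class InMemoryStateStore:
    """Stand-in for the Redis-backed AdaptiveStateStore (TTL omitted)."""
    def __init__(self) -> None:
        self._limits: Dict[str, str] = {}
    def get(self, policy: str) -> Optional[str]:
        return self._limits.get(policy)
    def set(self, policy: str, limit: str, ttl: int) -> None:
        self._limits[policy] = limit

def run_once(fetch_metrics, decide, store: InMemoryStateStore,
             policies: Dict[str, str], ttl: int = 300) -> None:
    """One Monitor -> Decide -> Adjust -> Store iteration of the loop."""
    metrics = fetch_metrics()                                  # 1. Monitor
    for policy, configured in policies.items():
        current = store.get(policy)
        decision = decide(policy, configured, current, metrics)  # 2. Decide
        if decision.new_limit is not None:
            store.set(policy, decision.new_limit, ttl)           # 3-4. Adjust + Store

# Example: a latency spike on the downstream service triggers a reduction.
store = InMemoryStateStore()
run_once(
    fetch_metrics=lambda: {"latency_p99": 3.5, "error_rate_percent": 2.0},
    decide=lambda p, conf, cur, m: (
        Decision("reduce", "75/minute") if m["latency_p99"] > 2.0
        else Decision("hold", None)
    ),
    store=store,
    policies={"external_api": "100/minute"},
)
print(store.get("external_api"))  # -> 75/minute
```

Step 5 (Enforce) happens on the request path, not in this loop: the AdaptiveRateLimiterProvider reads whatever limit the loop last stored.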


Configuration

Adaptive Throttling is a powerful feature that requires careful configuration of its two main parts: the rate limiter itself, and the adaptive engine that controls it.

1. Enable the Adaptive Rate Limiter Provider

First, in your [resilience.rate_limiter] section, you must set the backend to "adaptive". This tells the RateLimiterFactory to create the AdaptiveRateLimiterProvider, which wraps your primary enforcement provider (like limits).

[default.resilience.rate_limiter]
# Enable the adaptive provider as the main backend
backend = "adaptive"

  # The adaptive provider wraps another provider. Configure the base provider here.
  [default.resilience.rate_limiter.provider]
  backend = "limits"
  storage_backend = "redis"
  redis_storage_uri = "redis://localhost:6379/4"
  strategy = "moving-window"

  # Your static policies still act as the MAXIMUM ceiling for the adaptive limits.
  [default.resilience.rate_limiter.policies]
  external_api = "100/minute"

2. Configure the Adaptive Throttling Engine

Next, configure the feedback loop engine in the [resilience.adaptive_throttling] section.

[default.resilience.adaptive_throttling]
enabled = true
check_interval_seconds = 15 # Run the feedback loop every 15 seconds.

# Tell the engine which rate limit policies it should dynamically adapt.
policies_to_adapt = ["external_api"]

  # --- State Store (where dynamic limits are stored) ---
  state_store_backend = "redis"
  state_store_uri = "redis://localhost:6379/5"
  state_store_ttl_seconds = 300 # Dynamic limits expire after 5 minutes.

  # --- Metrics Fetcher (where to get health data from) ---
  metrics_fetcher_type = "prometheus"
  metrics_fetcher_url = "http://prometheus:9090"

    # Map internal metric names to your actual PromQL queries.
    [default.resilience.adaptive_throttling.prometheus_queries]
    latency_p99 = "histogram_quantile(0.99, sum(rate(http_client_request_duration_seconds_bucket{service_name='http_external_api'}[1m])) by (le))"
    error_rate_percent = "(sum(rate(http_client_requests_total{service_name='http_external_api', status='failure'}[1m])) / sum(rate(http_client_requests_total{service_name='http_external_api'}[1m]))) * 100"

  # --- Decision Algorithm Parameters ---
  decision_algorithm = "threshold"
    [default.resilience.adaptive_throttling.algorithm_params]
    # Reduce limit if latency is over 2000ms
    latency_threshold_ms = 2000.0
    # Reduce limit if error rate is over 10%
    error_threshold_percent = 10.0
    # When breached, reduce the current limit by 25%
    reduction_factor = 0.75
    # When healthy, increase the current limit by 10%
    increase_factor = 1.1
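With these parameters, the arithmetic of the threshold algorithm works out as in the sketch below. The function name and the use of plain integer limits (rather than limit strings like "100/minute") are simplifications for illustration, not the library's actual API.

```python
def adapt_limit(current: int, maximum: int, latency_ms: float, error_pct: float,
                latency_threshold_ms: float = 2000.0,
                error_threshold_percent: float = 10.0,
                reduction_factor: float = 0.75,
                increase_factor: float = 1.1) -> int:
    """Threshold decision: shrink on a breach, grow gently when healthy."""
    if latency_ms > latency_threshold_ms or error_pct > error_threshold_percent:
        # Breach: multiply by the reduction factor (a 25% cut with 0.75).
        return max(1, int(current * reduction_factor))
    # Healthy: grow by 10%, never past the statically configured ceiling.
    return min(maximum, int(current * increase_factor))

# Unhealthy: a P99 of 2500 ms breaches the 2000 ms threshold.
print(adapt_limit(100, 100, latency_ms=2500, error_pct=1.0))  # -> 75
# Healthy: the limit recovers toward the ceiling...
print(adapt_limit(75, 100, latency_ms=300, error_pct=0.5))    # -> 82
# ...and is capped at the configured maximum.
print(adapt_limit(98, 100, latency_ms=300, error_pct=0.5))    # -> 100
```

Note the asymmetry: the cut (25%) is larger than the recovery step (10%), so the limit backs off quickly under stress and relaxes cautiously, which avoids oscillating against a still-degraded service.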

API Reference

nala.athomic.resilience.adaptive_throttling.service.AdaptiveThrottlingService

Bases: BaseService

Manages the lifecycle and core logic of the adaptive throttling engine.

This background service periodically fetches system metrics, calculates optimal rate limits based on predefined algorithms and thresholds, and stores the dynamic limits for enforcement by the AdaptiveRateLimiterProvider. It inherits lifecycle management from BaseService.

Attributes:

  • adaptive_settings (AdaptiveThrottlingSettings): Specific configuration for the engine.
  • rate_limit_settings (RateLimiterSettings): Global rate limiter settings (used as reference).
  • state_store (AdaptiveStateStore): Component for storing and fetching dynamic limits.
  • metrics_fetcher (MetricsFetcher): Component for gathering system metrics (e.g., from Prometheus).
  • decision_algorithm (DecisionAlgorithm): Algorithm for calculating new limits.

__init__(settings=None)

Initializes the AdaptiveThrottlingService.

Parameters:

  • settings (Optional[AdaptiveThrottlingSettings], default None): Configuration settings. If None, loads from global settings.

after_stop() async

Hook called after the run loop is stopped to close background components.

nala.athomic.resilience.adaptive_throttling.protocols.DecisionAlgorithm

Bases: Protocol

Interface for algorithms that decide the new adaptive rate limit based on current conditions and metrics.

calculate_new_limit(policy_name, current_configured_limit, current_dynamic_limit, metrics)

Calculates the new adaptive limit based on provided context and metrics.

Parameters:

  • policy_name (str, required): The name of the policy being adjusted (e.g., "default", "premium").
  • current_configured_limit (str, required): The rate limit string defined in the static configuration (RateLimiterSettings) for this policy (or the default). Acts as a ceiling/reference.
  • current_dynamic_limit (Optional[str], required): The currently active dynamic limit string retrieved from the AdaptiveStateStore, if any.
  • metrics (Dict[str, Any], required): The dictionary of metrics retrieved by the MetricsFetcher.

Returns:

  • AdaptiveDecision: An object detailing the calculated decision (action and new limit).

nala.athomic.resilience.adaptive_throttling.providers.adaptive_provider.AdaptiveRateLimiterProvider

Bases: RateLimiterProtocol

A Rate Limiter Provider that dynamically adjusts limits based on system health.

This provider acts as a decorator, wrapping a base rate limiter implementation. Before enforcing a limit, it queries an AdaptiveStateStore for a potentially more restrictive dynamic limit calculated by a separate Decision Engine. It then applies the effective limit using the base provider.
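The decorator behavior can be sketched as follows. The stub store and recording base provider are assumptions for illustration; only the "dynamic limit if present, else static limit" lookup mirrors the documented behavior.

```python
import asyncio
from typing import Dict, Optional

class StubStateStore:
    """Stand-in for AdaptiveStateStore: returns a fixed dynamic limit."""
    def __init__(self, limits: Dict[str, str]) -> None:
        self._limits = limits
    async def get(self, policy: str) -> Optional[str]:
        return self._limits.get(policy)

class RecordingBaseProvider:
    """Records which limit string it was asked to enforce."""
    def __init__(self) -> None:
        self.enforced: Optional[str] = None
    async def allow(self, key: str, rate: str) -> bool:
        self.enforced = rate
        return True

class AdaptiveProviderSketch:
    """Decorator sketch: prefer the dynamic limit, else the static one."""
    def __init__(self, base_provider, state_store) -> None:
        self.base_provider = base_provider
        self.state_store = state_store

    async def allow(self, key: str, rate: str,
                    policy: Optional[str] = None) -> bool:
        # Check the state store for a dynamic limit set by the engine.
        dynamic = await self.state_store.get(policy) if policy else None
        effective = dynamic or rate  # fall back to the configured static limit
        # Delegate actual enforcement to the wrapped base provider.
        return await self.base_provider.allow(key, effective)

base = RecordingBaseProvider()
provider = AdaptiveProviderSketch(
    base, StubStateStore({"external_api": "75/minute"})
)
asyncio.run(provider.allow("client-1", "100/minute", policy="external_api"))
print(base.enforced)  # -> 75/minute
```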

Attributes:

  • base_provider (RateLimiterProtocol): The underlying implementation for limit enforcement.
  • config (RateLimiterSettings): The application's rate limiter configuration.
  • state_store (AdaptiveStateStore): The store used to fetch current dynamic limits.

__init__(config, base_provider, state_store)

Initializes the AdaptiveRateLimiterProvider.

Parameters:

  • config (RateLimiterSettings, required): The application's RateLimiterSettings object.
  • base_provider (RateLimiterProtocol, required): The underlying implementation responsible for actual limit enforcement.
  • state_store (AdaptiveStateStore, required): The store used to fetch current dynamic limits.

allow(key, rate, policy=None) async

Checks if the request is allowed based on the dynamically adjusted limit.

The method determines the effective limit (dynamic or configured) and delegates the final enforcement check to the base provider.

Parameters:

  • key (str, required): The identifier being rate limited.
  • rate (str, required): The configured rate limit string (default/maximum limit).
  • policy (Optional[str], default None): The policy name used to look up the dynamic limit.

Returns:

  • bool: True if allowed, False otherwise.

clear(key, rate) async

Clears rate limit counters in the base provider for the specific key and rate.

Note: This operation only clears the counter state managed by the base provider's storage. Clearing the dynamic limit itself is the responsibility of the Decision Engine during the recovery cycle.

Parameters:

  • key (str, required): The identifier whose counters should be cleared.
  • rate (str, required): The rate limit rule associated with the key.

get_current_usage(key, rate) async

Gets the current usage count from the base provider based on the provided rate string.

Parameters:

  • key (str, required): The identifier being checked.
  • rate (str, required): The rate limit string rule to check usage against.

Returns:

  • Optional[int]: The current usage count, or None if the operation fails.

reset() async

Resets ALL rate limit counters in the base provider's storage.

WARNING: This operation is potentially global and does NOT automatically clear dynamic limits in the AdaptiveStateStore.

Raises:

  • Exception: Propagates any error from the base provider's reset attempt.