
Health & Readiness Checks

Overview

The Health & Readiness module provides a standardized, extensible way to determine whether the application is healthy and ready to handle traffic. This matters in orchestrated environments such as Kubernetes, which rely on readiness probes to decide when a service instance can be added to the load balancer.

The framework exposes an HTTP endpoint, typically /readyz, which runs a series of checks against all critical dependencies (databases, message brokers, external APIs) and reports their status.


How It Works

The system is built around a few core components that promote decoupling and extensibility:

  1. ReadinessCheck Protocol: A simple contract that any readiness check must follow. It requires a unique name, an enabled() method, and an asynchronous check() method that returns True for healthy or False for unhealthy.

  2. ReadinessRegistry: A singleton registry where all readiness check instances are registered during application startup.

  3. ServiceReadinessCheck: A generic implementation that can check the status of any Athomic BaseService. It delegates to the service's own lifecycle state, so a readiness check for the Kafka consumer, for example, simply queries kafka_consumer.is_ready().

  4. /readyz Endpoint: An internal API route that, when called, executes the run_all() method on the ReadinessRegistry. This runs all registered checks concurrently and aggregates their results into a single JSON response. The overall HTTP status will be 200 OK only if all enabled checks pass.
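The flow described above can be sketched in plain asyncio. This is a minimal, self-contained illustration and not the library's source: SketchRegistry, StaticCheck, is_ready, and demo are hypothetical names, and using asyncio.gather for the concurrent run is an assumption about how such a registry might be implemented.

```python
import asyncio
from typing import Dict, Protocol


class ReadinessCheck(Protocol):
    """Local stand-in for the contract described above."""

    name: str

    def enabled(self) -> bool: ...

    async def check(self) -> bool: ...


class SketchRegistry:
    """Hypothetical registry: stores checks by name and runs them concurrently."""

    def __init__(self) -> None:
        self._checks: Dict[str, ReadinessCheck] = {}

    def register(self, check: ReadinessCheck) -> None:
        self._checks[check.name] = check

    async def run_all(self) -> Dict[str, str]:
        # Disabled checks are reported as 'skipped' and never awaited.
        results = {name: "skipped" for name in self._checks}
        active = [c for c in self._checks.values() if c.enabled()]
        # return_exceptions=True turns a raising check into a result value
        # instead of aborting the whole batch.
        outcomes = await asyncio.gather(
            *(c.check() for c in active), return_exceptions=True
        )
        for check, outcome in zip(active, outcomes):
            results[check.name] = "ok" if outcome is True else "fail"
        return results


def is_ready(results: Dict[str, str]) -> bool:
    """Ready only if no enabled check failed; skipped checks do not count."""
    return "fail" not in results.values()


class StaticCheck:
    """Hypothetical check with a fixed outcome, for demonstration only."""

    def __init__(self, name: str, healthy: bool, is_enabled: bool = True) -> None:
        self.name = name
        self._healthy = healthy
        self._is_enabled = is_enabled

    def enabled(self) -> bool:
        return self._is_enabled

    async def check(self) -> bool:
        return self._healthy


async def demo() -> Dict[str, str]:
    registry = SketchRegistry()
    registry.register(StaticCheck("database", healthy=True))
    registry.register(StaticCheck("broker", healthy=False))
    registry.register(StaticCheck("cache", healthy=True, is_enabled=False))
    return await registry.run_all()
```

Calling asyncio.run(demo()) yields one entry per registered check. In this sketch, a single failing check is enough for is_ready to return False, while a disabled check does not affect the verdict.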


How to Add a Custom Readiness Check

You can easily add your own application-specific readiness checks. For example, you might want to check the status of a critical third-party API that your service depends on.

1. Create the Check Class

Create a class that implements the ReadinessCheck protocol.

# In your_app/health_checks.py
from nala.athomic.http import HttpClientFactory
from nala.athomic.observability.health import ReadinessCheck

class ExternalApiServiceCheck(ReadinessCheck):
    name = "external_api_status"

    def __init__(self):
        # Get a pre-configured HTTP client from the factory
        self.http_client = HttpClientFactory.create("my_external_api_client")

    def enabled(self) -> bool:
        # The check is enabled if the client itself is enabled in the config
        return self.http_client.is_enabled()

    async def check(self) -> bool:
        try:
            # Perform a lightweight request against the API's health endpoint
            response = await self.http_client.get("/_health")
            return response.status_code == 200
        except Exception:
            return False

2. Register the Check

In your application's startup sequence (e.g., domain_initializers.py), instantiate your check and register it.

# In your_app/startup/domain_initializers.py
from nala.athomic.observability.health import readiness_registry
from your_app.health_checks import ExternalApiServiceCheck

def register_domain_services():
    # ... other registrations ...

    # Register your custom health check
    readiness_registry.register(ExternalApiServiceCheck())

Your custom check will now be automatically executed and reported by the /readyz endpoint.
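To exercise the check's logic without a real HTTP client, one option is to inject a stub. The sketch below is an assumption-laden variant, not the framework's API: it redefines ExternalApiServiceCheck so the client is passed to the constructor instead of being created by HttpClientFactory, and FakeHttpClient and FakeResponse are hypothetical test doubles that expose only the members the check uses.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResponse:
    """Hypothetical response object carrying only a status code."""

    status_code: int


class FakeHttpClient:
    """Hypothetical stub exposing only what the check calls."""

    def __init__(
        self,
        status_code: int = 200,
        error: Optional[Exception] = None,
        enabled: bool = True,
    ) -> None:
        self._status_code = status_code
        self._error = error
        self._enabled = enabled

    def is_enabled(self) -> bool:
        return self._enabled

    async def get(self, path: str) -> FakeResponse:
        # Simulate a transport failure when an error was configured.
        if self._error is not None:
            raise self._error
        return FakeResponse(self._status_code)


class ExternalApiServiceCheck:
    """Same logic as the check above, with the client injected (an assumption)."""

    name = "external_api_status"

    def __init__(self, http_client) -> None:
        self.http_client = http_client

    def enabled(self) -> bool:
        return self.http_client.is_enabled()

    async def check(self) -> bool:
        try:
            response = await self.http_client.get("/_health")
            return response.status_code == 200
        except Exception:
            # Any failure to reach the dependency counts as "not ready".
            return False
```

With this shape, a test can cover the healthy, unhealthy, and unreachable cases by constructing FakeHttpClient with a different status code or error.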


Example Response

A call to the /readyz endpoint returns a JSON response detailing the status of each check. In the example below, one check has failed, so the overall status is reported as unhealthy.

{
  "status": "unhealthy",
  "checks": {
    "consul_client": "ok",
    "database_connection_manager": "ok",
    "kafka_consumer_my_app.events.v1": "ok",
    "external_api_status": "fail"
  }
}
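A consumer of this response, such as a deployment script or a dashboard, might want to list the checks that failed. The following is a small sketch using only the standard library. The failing_checks helper is a hypothetical name, and the payload shape is taken from the example above.

```python
import json
from typing import List

# Sample body copied from the example response above.
SAMPLE_RESPONSE = """
{
  "status": "unhealthy",
  "checks": {
    "consul_client": "ok",
    "database_connection_manager": "ok",
    "kafka_consumer_my_app.events.v1": "ok",
    "external_api_status": "fail"
  }
}
"""


def failing_checks(body: str) -> List[str]:
    """Return the names of all checks reported as 'fail'."""
    payload = json.loads(body)
    return [name for name, status in payload["checks"].items() if status == "fail"]
```

For the sample body, failing_checks(SAMPLE_RESPONSE) returns a list containing only external_api_status.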

API Reference

nala.athomic.observability.health.protocol.ReadinessCheck

Bases: Protocol

Defines the contract for an individual readiness check implementation.

Any class that implements this protocol can be registered with the ReadinessRegistry to contribute to the overall application readiness state.

name instance-attribute

A unique, descriptive name for the check (e.g., 'database_connection').

check() async

Performs the asynchronous check of the dependency or resource.

This method must be lightweight and fast to avoid delaying the readiness probe.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| bool | bool | True if the resource is healthy (ready), False otherwise. |
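One way to keep check() fast is to bound the underlying probe with a timeout, so a hanging dependency is reported as not ready instead of stalling the probe. The sketch below uses asyncio.wait_for from the standard library. The check_with_timeout helper, the two sample probes, and the one-second default are illustrative assumptions, not part of the framework.

```python
import asyncio
from typing import Awaitable, Callable


async def check_with_timeout(
    probe: Callable[[], Awaitable[bool]], timeout_s: float = 1.0
) -> bool:
    """Run a probe, treating a timeout as 'not ready'."""
    try:
        return await asyncio.wait_for(probe(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return False


async def fast_probe() -> bool:
    # Stands in for a dependency that answers immediately.
    return True


async def slow_probe() -> bool:
    # Stands in for a dependency that takes too long to answer.
    await asyncio.sleep(0.5)
    return True
```

Inside a custom check, the body of check() could then delegate to check_with_timeout with the real probe coroutine.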

enabled()

Determines if the check should be executed based on configuration or runtime environment.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| bool | bool | True if the check should run, False otherwise. |

nala.athomic.observability.health.registry.ReadinessRegistry

A registry responsible for collecting and orchestrating all application readiness checks.

This acts as a centralized source of truth for determining if the application and its core dependencies (databases, message brokers, external services) are fully initialized and ready to handle live traffic.

__init__()

Initializes the internal dictionary to store readiness checks, mapped by name.

register(check)

Registers a new ReadinessCheck implementation with the registry.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| check | ReadinessCheck | An instance of a ReadinessCheck protocol implementation. | required |

run_all() async

Executes all registered readiness checks asynchronously.

It respects the enabled() status of each check and handles exceptions during execution by marking the check as failed.

Returns:

| Type | Description |
| --- | --- |
| Dict[str, str] | A dictionary mapping the name of each check to its resulting status: 'ok', 'fail', or 'skipped'. |

nala.athomic.observability.health.checks.service_check.ServiceReadinessCheck

Bases: ReadinessCheck

A generic readiness check implementation that verifies the health and readiness state of any core Athomic service implementing the BaseServiceProtocol.

This check applies the Dependency Inversion Principle: the health system queries service status through the service protocol, without knowing the service's internal implementation details.

__init__(service)

Initializes the check by injecting the service instance to be monitored.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| service | BaseServiceProtocol | The service instance (e.g., OutboxPublisher, HttpClient) whose readiness state will be checked. | required |

check() async

Checks if the service is ready to operate (e.g., connected to its dependencies and initialized).

Delegates the call directly to the service's is_ready() method.

enabled()

Checks if the underlying service is enabled based on its configuration.

Delegates the call directly to the service's is_enabled() method.
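The delegation described above can be illustrated with a small stand-in. This is a sketch under stated assumptions and not the library's implementation: ServiceLike is a hypothetical subset of BaseServiceProtocol, the service_name attribute used to derive the check name is invented for illustration, is_ready() is assumed to be synchronous, and FakeConsumer is a test double.

```python
import asyncio
from typing import Protocol


class ServiceLike(Protocol):
    """Hypothetical subset of BaseServiceProtocol used by this sketch."""

    service_name: str  # assumed attribute, not confirmed by the reference

    def is_enabled(self) -> bool: ...

    def is_ready(self) -> bool: ...


class SketchServiceCheck:
    """Readiness check that only delegates to the injected service."""

    def __init__(self, service: ServiceLike) -> None:
        self._service = service
        self.name = service.service_name

    def enabled(self) -> bool:
        return self._service.is_enabled()

    async def check(self) -> bool:
        return self._service.is_ready()


class FakeConsumer:
    """Hypothetical service that becomes ready once connected."""

    service_name = "kafka_consumer"

    def __init__(self) -> None:
        self._connected = False

    def connect(self) -> None:
        self._connected = True

    def is_enabled(self) -> bool:
        return True

    def is_ready(self) -> bool:
        return self._connected
```

The check holds no state of its own. Its result changes only because the underlying service's readiness changes, which is what lets the health system stay unaware of service internals.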