Health & Readiness Checks

Overview

The Health & Readiness module provides a standardized, extensible way to determine whether the application is healthy and ready to handle traffic. This matters in orchestrated environments such as Kubernetes, which use readiness probes to decide when a service instance should start (and stop) receiving traffic from the load balancer.

The framework exposes an HTTP endpoint, typically /readyz, which runs a series of checks against all critical dependencies (databases, message brokers, external APIs) and reports their status.

How It Works

The system is built around a few core components that promote decoupling and extensibility:

ReadinessCheck Protocol: A simple contract that any readiness check must follow. It requires a unique name, an enabled() method, and an asynchronous check() method that returns True for healthy or False for unhealthy.
ReadinessRegistry: A singleton registry where all readiness check instances are registered during application startup.
ServiceReadinessCheck: A generic implementation that can check the status of any Athomic BaseService. It delegates to the service's own lifecycle methods (enabled() calls is_enabled(), check() calls is_ready()), so a readiness check for the Kafka consumer, for example, simply queries kafka_consumer.is_ready().
/readyz Endpoint: An internal API route that calls run_all() on the ReadinessRegistry, which runs all registered checks concurrently, and aggregates their results into a single JSON response. Disabled checks are reported as skipped, and the overall HTTP status is 200 OK only if every enabled check passes.
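The aggregation described above can be sketched in plain asyncio. This is a minimal stand-in, not Athomic's implementation: `SimpleReadinessRegistry`, `StaticCheck` and `BrokenCheck` are hypothetical, and the sketch assumes that `run_all()` runs enabled checks concurrently with `asyncio.gather`, maps a raised exception to 'fail', and reports disabled checks as 'skipped', as the API reference describes.

```python
import asyncio
from typing import Dict, Protocol


class ReadinessCheck(Protocol):
    """Hypothetical stand-in mirroring the documented contract."""

    name: str

    def enabled(self) -> bool: ...

    async def check(self) -> bool: ...


class SimpleReadinessRegistry:
    """Hypothetical registry sketch; not the real ReadinessRegistry."""

    def __init__(self) -> None:
        # Checks are stored by name, as described for ReadinessRegistry.__init__().
        self._checks: Dict[str, ReadinessCheck] = {}

    def register(self, check: ReadinessCheck) -> None:
        self._checks[check.name] = check

    async def run_all(self) -> Dict[str, str]:
        results: Dict[str, str] = {}
        to_run = []
        for name, check in self._checks.items():
            if check.enabled():
                to_run.append(check)
            else:
                results[name] = "skipped"

        # Run every enabled check concurrently; exceptions are returned, not raised.
        outcomes = await asyncio.gather(
            *(c.check() for c in to_run), return_exceptions=True
        )
        for check, outcome in zip(to_run, outcomes):
            # Only a literal True counts as healthy; False or an exception is a failure.
            results[check.name] = "ok" if outcome is True else "fail"
        return results


class StaticCheck:
    """Hypothetical check with a fixed result, for demonstration only."""

    def __init__(self, name: str, healthy: bool, is_enabled: bool = True) -> None:
        self.name = name
        self._healthy = healthy
        self._enabled = is_enabled

    def enabled(self) -> bool:
        return self._enabled

    async def check(self) -> bool:
        return self._healthy


class BrokenCheck:
    """Hypothetical check whose dependency raises instead of answering."""

    name = "broken_dependency"

    def enabled(self) -> bool:
        return True

    async def check(self) -> bool:
        raise ConnectionError("dependency unreachable")


registry = SimpleReadinessRegistry()
registry.register(StaticCheck("database", healthy=True))
registry.register(StaticCheck("cache", healthy=True, is_enabled=False))
registry.register(BrokenCheck())
print(asyncio.run(registry.run_all()))
# {'cache': 'skipped', 'database': 'ok', 'broken_dependency': 'fail'}
```

Using `return_exceptions=True` keeps one failing dependency from aborting the whole probe: every check still gets its own entry in the result map.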

How to Add a Custom Readiness Check

You can easily add your own application-specific readiness checks. For example, you might want to check the status of a critical third-party API that your service depends on.

1. Create the Check Class

Create a class that implements the ReadinessCheck protocol.

```python
# In your_app/health_checks.py
from nala.athomic.http import HttpClientFactory
from nala.athomic.observability.health import ReadinessCheck

class ExternalApiServiceCheck(ReadinessCheck):
    name = "external_api_status"

    def __init__(self):
        # Get a pre-configured HTTP client from the factory
        self.http_client = HttpClientFactory.create("my_external_api_client")

    def enabled(self) -> bool:
        # The check is enabled if the client itself is enabled in the config
        return self.http_client.is_enabled()

    async def check(self) -> bool:
        try:
            # Perform a lightweight check, like a HEAD request or a health endpoint call
            response = await self.http_client.get("/_health")
            return response.status_code == 200
        except Exception:
            return False
```
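Because the check only relies on the client's `is_enabled()` and `get()` methods, its logic can be exercised without a network. The sketch below is hypothetical: it redefines a self-contained `ExternalApiCheck` that receives its client through the constructor (instead of calling `HttpClientFactory.create`) so that a stub can be injected, and `StubResponse` and `StubHttpClient` are invented test doubles, not Athomic classes.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubResponse:
    status_code: int


class StubHttpClient:
    """Hypothetical test double exposing only what the check uses."""

    def __init__(self, status_code: int = 200, error: Optional[Exception] = None,
                 enabled: bool = True) -> None:
        self.status_code = status_code
        self.error = error
        self.enabled = enabled
        self.requested_paths: list[str] = []

    def is_enabled(self) -> bool:
        return self.enabled

    async def get(self, path: str) -> StubResponse:
        self.requested_paths.append(path)
        if self.error is not None:
            raise self.error
        return StubResponse(self.status_code)


class ExternalApiCheck:
    """Same logic as ExternalApiServiceCheck, but with the client injected."""

    name = "external_api_status"

    def __init__(self, http_client) -> None:
        self.http_client = http_client

    def enabled(self) -> bool:
        return self.http_client.is_enabled()

    async def check(self) -> bool:
        try:
            response = await self.http_client.get("/_health")
            return response.status_code == 200
        except Exception:
            return False


async def main() -> None:
    assert await ExternalApiCheck(StubHttpClient(200)).check() is True
    assert await ExternalApiCheck(StubHttpClient(503)).check() is False
    # A transport error is reported as "not ready" rather than propagated.
    failing = StubHttpClient(error=TimeoutError("no response"))
    assert await ExternalApiCheck(failing).check() is False
    assert ExternalApiCheck(StubHttpClient(enabled=False)).enabled() is False


asyncio.run(main())
```

Injecting the client is a design choice made here for testability; the example above obtains it from `HttpClientFactory` instead.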

2. Register the Check

In your application's startup sequence (e.g., domain_initializers.py), instantiate your check and register it.

```python
# In your_app/startup/domain_initializers.py
from nala.athomic.observability.health import readiness_registry
from your_app.health_checks import ExternalApiServiceCheck

def register_domain_services():
    # ... other registrations ...

    # Register your custom health check
    readiness_registry.register(ExternalApiServiceCheck())
```

Your custom check will now be automatically executed and reported by the /readyz endpoint.

Example Response

A call to the /readyz endpoint will return a JSON response detailing the status of each check.

```json
{
  "status": "unhealthy",
  "checks": {
    "consul_client": "ok",
    "database_connection_manager": "ok",
    "kafka_consumer_my_app.events.v1": "ok",
    "external_api_status": "fail"
  }
}
```
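The overall status and HTTP code can be derived from the per-check map. This is a hypothetical sketch: `build_readiness_response` is not an Athomic function, it treats 'skipped' checks as neutral (consistent with "200 OK only if every enabled check passes"), and both the 503 code for the failing case and the "healthy" label are assumptions, since this page only specifies 200 OK and shows "unhealthy".

```python
import json
from typing import Dict, Tuple


def build_readiness_response(checks: Dict[str, str]) -> Tuple[int, str]:
    """Return (HTTP status code, JSON body) for a run_all() result map.

    Hypothetical helper: 'ok' and 'skipped' are acceptable, anything else
    (including unexpected values) makes the service unready.
    """
    ready = all(status in ("ok", "skipped") for status in checks.values())
    body = {"status": "healthy" if ready else "unhealthy", "checks": checks}
    # Assumed codes: 200 when ready, 503 Service Unavailable otherwise.
    return (200 if ready else 503), json.dumps(body, indent=2)


status_code, body = build_readiness_response({
    "consul_client": "ok",
    "database_connection_manager": "ok",
    "kafka_consumer_my_app.events.v1": "ok",
    "external_api_status": "fail",
})
print(status_code)  # 503
```

Kubernetes treats any HTTP probe response outside the 200 to 399 range as a failure, so 503 is a conventional choice for an unready instance.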

API Reference

`nala.athomic.observability.health.protocol.ReadinessCheck`

Bases: Protocol

Defines the contract for an individual readiness check implementation.

Any class that implements this protocol can be registered with the ReadinessRegistry to contribute to the overall application readiness state.
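Because ReadinessCheck is a `typing.Protocol`, conformance is structural: a class needs the right members, not an explicit base class. The sketch redeclares a local copy of the protocol with `@runtime_checkable` so that `isinstance()` can be demonstrated; whether Athomic's own protocol is runtime-checkable is not stated on this page, and `DiskSpaceCheck`, its threshold and the `READINESS_DISK_CHECK` environment variable are hypothetical.

```python
import os
import shutil
from typing import Protocol, runtime_checkable


@runtime_checkable
class ReadinessCheck(Protocol):
    """Local stand-in for the documented protocol (assumed runtime-checkable)."""

    name: str

    def enabled(self) -> bool: ...

    async def check(self) -> bool: ...


class DiskSpaceCheck:  # note: no base class
    """Hypothetical check: ready while at least min_free_bytes are free."""

    name = "disk_space"

    def __init__(self, path: str = ".", min_free_bytes: int = 100 * 1024 * 1024) -> None:
        self.path = path
        self.min_free_bytes = min_free_bytes

    def enabled(self) -> bool:
        # Hypothetical switch: disabled only when the variable is set to "false".
        return os.environ.get("READINESS_DISK_CHECK", "true").lower() != "false"

    async def check(self) -> bool:
        # shutil.disk_usage is a fast, local call, so the check stays lightweight.
        return shutil.disk_usage(self.path).free >= self.min_free_bytes


print(isinstance(DiskSpaceCheck(), ReadinessCheck))  # True
```

A runtime `isinstance()` on a protocol only verifies that the members exist, not their signatures, so static type checking remains the stronger guarantee.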

`name` `instance-attribute`

A unique, descriptive name for the check (e.g., 'database_connection').

`check()` `async`

Performs the asynchronous check of the dependency or resource.

This method must be lightweight and fast to avoid delaying the readiness probe.
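One way to honor this requirement is to bound a check with a timeout, so that a hung dependency is reported as failing instead of stalling the probe. The `TimeoutCheck` wrapper, its 2-second default and `SlowCheck` below are hypothetical, not part of Athomic, and this reference does not say whether `run_all()` already applies a timeout of its own.

```python
import asyncio


class TimeoutCheck:
    """Hypothetical wrapper that bounds any ReadinessCheck-like object."""

    def __init__(self, inner, timeout_seconds: float = 2.0) -> None:
        self.inner = inner
        self.timeout_seconds = timeout_seconds
        self.name = inner.name  # keep the wrapped check's name for reporting

    def enabled(self) -> bool:
        return self.inner.enabled()

    async def check(self) -> bool:
        try:
            return await asyncio.wait_for(self.inner.check(), self.timeout_seconds)
        except asyncio.TimeoutError:
            # wait_for cancels the inner coroutine before raising.
            return False


class SlowCheck:
    """Hypothetical check for a dependency that never answers in time."""

    name = "slow_dependency"

    def enabled(self) -> bool:
        return True

    async def check(self) -> bool:
        await asyncio.sleep(10)
        return True


print(asyncio.run(TimeoutCheck(SlowCheck(), timeout_seconds=0.1).check()))  # False
```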

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if the resource is healthy (ready), False otherwise. |

`enabled()`

Determines if the check should be executed based on configuration or runtime environment.

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if the check should run, False otherwise. |

`nala.athomic.observability.health.registry.ReadinessRegistry`

A registry responsible for collecting and orchestrating all application readiness checks.

This acts as a centralized source of truth for determining if the application and its core dependencies (databases, message brokers, external services) are fully initialized and ready to handle live traffic.

`__init__()`

Initializes the internal dictionary to store readiness checks, mapped by name.

`register(check)`

Registers a new ReadinessCheck implementation with the registry.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `check` | `ReadinessCheck` | An instance of a ReadinessCheck protocol implementation. | required |

`run_all()` `async`

Executes all registered readiness checks asynchronously.

Checks whose enabled() returns False are skipped, and any exception raised while a check runs marks that check as failed.

Returns:

| Type | Description |
| --- | --- |
| `Dict[str, str]` | A dictionary mapping each check's name to its resulting status: 'ok', 'fail', or 'skipped'. |

`nala.athomic.observability.health.checks.service_check.ServiceReadinessCheck`

Bases: ReadinessCheck

A generic readiness check implementation that verifies the health and readiness state of any core Athomic service implementing the BaseServiceProtocol.

This check applies the Dependency Inversion Principle: the health system queries service status through the protocol, without knowing the service's internal implementation details.

`__init__(service)`

Initializes the check by injecting the service instance to be monitored.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `service` | `BaseServiceProtocol` | The service instance (e.g., OutboxPublisher, HttpClient) whose readiness state will be checked. | required |

`check()` `async`

Checks if the service is ready to operate (e.g., connected to its dependencies and initialized).

Delegates the call directly to the service's is_ready() method.

`enabled()`

Checks if the underlying service is enabled based on its configuration.

Delegates the call directly to the service's is_enabled() method.
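The delegation can be sketched as follows. `GenericServiceCheck`, `FakeConsumer` and the `service_name` attribute are hypothetical; the sketch assumes the service exposes `is_enabled()` and `is_ready()` as described above, and because this page does not say whether `is_ready()` is a coroutine, it accepts either a plain bool or an awaitable. How the real check derives its `name` is also not documented here, so the naming scheme is an assumption.

```python
import asyncio
import inspect
from typing import Awaitable, Protocol, Union


class BaseServiceProtocol(Protocol):
    """Stand-in with only the two members the check relies on."""

    def is_enabled(self) -> bool: ...

    def is_ready(self) -> Union[bool, Awaitable[bool]]: ...


class GenericServiceCheck:
    """Hypothetical sketch of ServiceReadinessCheck's delegation."""

    def __init__(self, service: BaseServiceProtocol) -> None:
        self.service = service
        # Assumed naming scheme; the real check may derive its name differently.
        self.name = getattr(service, "service_name", type(service).__name__)

    def enabled(self) -> bool:
        return self.service.is_enabled()

    async def check(self) -> bool:
        result = self.service.is_ready()
        if inspect.isawaitable(result):  # support async and sync is_ready()
            result = await result
        return bool(result)


class FakeConsumer:
    """Hypothetical service whose readiness flips once start() completes."""

    service_name = "kafka_consumer_demo"

    def __init__(self) -> None:
        self._connected = False

    async def start(self) -> None:
        await asyncio.sleep(0)  # stands in for connecting to a broker
        self._connected = True

    def is_enabled(self) -> bool:
        return True

    async def is_ready(self) -> bool:
        return self._connected


async def demo() -> None:
    consumer = FakeConsumer()
    check = GenericServiceCheck(consumer)
    print(check.name, await check.check())  # kafka_consumer_demo False
    await consumer.start()
    print(check.name, await check.check())  # kafka_consumer_demo True


asyncio.run(demo())
```

Since the check holds a reference to the live service rather than a snapshot, it reflects lifecycle changes (such as a completed start-up) on the next probe without re-registration.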
