Health & Readiness Checks
Overview
The Health & Readiness module provides a standardized and extensible system for determining whether the application is healthy and ready to handle traffic. This is critical when running in orchestrated environments like Kubernetes, which rely on readiness probes to know when to add a service instance to the load balancer.
The framework exposes an HTTP endpoint, typically /readyz, which runs a series of checks against all critical dependencies (databases, message brokers, external APIs) and reports their status.
How It Works
The system is built around a few core components that promote decoupling and extensibility:
- ReadinessCheck Protocol: A simple contract that any readiness check must follow. It requires a unique name, an enabled() method, and an asynchronous check() method that returns True for healthy or False for unhealthy.
- ReadinessRegistry: A singleton registry where all readiness check instances are registered during application startup.
- ServiceReadinessCheck: A generic and powerful implementation that can check the status of any Athomic BaseService. It automatically integrates with the service lifecycle, so a readiness check for the Kafka consumer, for example, simply queries kafka_consumer.is_ready().
- /readyz Endpoint: An internal API route that, when called, executes the run_all() method on the ReadinessRegistry. This runs all registered checks concurrently and aggregates their results into a single JSON response. The overall HTTP status will be 200 OK only if all enabled checks pass.
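The contract described in the first item can be sketched with Python's typing.Protocol. This is a minimal illustration, not the framework's source: the ReadinessCheck class below is a hypothetical stand-in that mirrors only the three members named above, and AlwaysReadyCheck is an invented example check.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class ReadinessCheck(Protocol):
    """Hypothetical stand-in mirroring the contract described above."""

    name: str

    def enabled(self) -> bool: ...

    async def check(self) -> bool: ...


class AlwaysReadyCheck:
    """Invented example: satisfies the protocol structurally, without inheritance."""

    name = "always_ready"

    def enabled(self) -> bool:
        return True

    async def check(self) -> bool:
        return True


probe = AlwaysReadyCheck()
# isinstance() only verifies that the required members exist, not their signatures.
is_valid_check = isinstance(probe, ReadinessCheck)
check_result = asyncio.run(probe.check())
```

Because the contract is a protocol, a check does not strictly need to inherit from it; subclassing, as the example later in this page does, simply makes the intent explicit.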
How to Add a Custom Readiness Check
You can easily add your own application-specific readiness checks. For example, you might want to check the status of a critical third-party API that your service depends on.
1. Create the Check Class
Create a class that implements the ReadinessCheck protocol.
# In your_app/health_checks.py
from nala.athomic.http import HttpClientFactory
from nala.athomic.observability.health import ReadinessCheck


class ExternalApiServiceCheck(ReadinessCheck):
    name = "external_api_status"

    def __init__(self):
        # Get a pre-configured HTTP client from the factory
        self.http_client = HttpClientFactory.create("my_external_api_client")

    def enabled(self) -> bool:
        # The check is enabled if the client itself is enabled in the config
        return self.http_client.is_enabled()

    async def check(self) -> bool:
        try:
            # Perform a lightweight check, like a HEAD request or a health endpoint call
            response = await self.http_client.get("/_health")
            return response.status_code == 200
        except Exception:
            return False
2. Register the Check
In your application's startup sequence (e.g., domain_initializers.py), instantiate your check and register it.
# In your_app/startup/domain_initializers.py
from nala.athomic.observability.health import readiness_registry
from your_app.health_checks import ExternalApiServiceCheck


def register_domain_services():
    # ... other registrations ...

    # Register your custom health check
    readiness_registry.register(ExternalApiServiceCheck())
Your custom check will now be automatically executed and reported by the /readyz endpoint.
Example Response
A call to the /readyz endpoint returns a JSON response detailing the status of each check. In the example below, one check fails, so the overall status is reported as unhealthy.
{
  "status": "unhealthy",
  "checks": {
    "consul_client": "ok",
    "database_connection_manager": "ok",
    "kafka_consumer_my_app.events.v1": "ok",
    "external_api_status": "fail"
  }
}
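The following sketch shows one way the per-check results could be folded into a response like the one above. It is an illustration under stated assumptions, not the endpoint's actual code: the function name build_readyz_response is hypothetical, the "healthy" label for the passing case is assumed (the source only shows "unhealthy"), and 503 is an assumed status code for the failing case, since the documentation only specifies 200 OK when all enabled checks pass.

```python
from typing import Dict, Tuple


def build_readyz_response(results: Dict[str, str]) -> Tuple[int, dict]:
    """Turn per-check results into an (HTTP status code, JSON body) pair."""
    # Only a 'fail' counts against readiness; other statuses do not.
    all_passed = all(status != "fail" for status in results.values())
    body = {
        # "healthy" is an assumed label; only "unhealthy" appears in the docs.
        "status": "healthy" if all_passed else "unhealthy",
        "checks": results,
    }
    # 503 is an assumption; the docs only state 200 OK for the passing case.
    return (200 if all_passed else 503), body


status_code, body = build_readyz_response(
    {"consul_client": "ok", "external_api_status": "fail"}
)
```

With the inputs shown, a single failing check is enough to make the overall result unhealthy.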
API Reference
nala.athomic.observability.health.protocol.ReadinessCheck
Bases: Protocol
Defines the contract for an individual readiness check implementation.
Any class that implements this protocol can be registered with the ReadinessRegistry to contribute to the overall application readiness state.
name (instance attribute)
A unique, descriptive name for the check (e.g., 'database_connection').
check() (async)
Performs the asynchronous check of the dependency or resource.
This method must be lightweight and fast to avoid delaying the readiness probe.
Returns:
| Type | Description |
|---|---|
| bool | True if the resource is healthy (ready), False otherwise. |
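One way to keep check() fast is to bound it with a timeout so that a hanging dependency is reported as not ready instead of stalling the probe. The sketch below uses asyncio.wait_for from the standard library; the helper check_with_timeout and the two sample coroutines are hypothetical and are not part of the framework.

```python
import asyncio
from typing import Awaitable, Callable


async def check_with_timeout(
    check: Callable[[], Awaitable[bool]], timeout_seconds: float
) -> bool:
    """Run a check coroutine, treating a timeout as 'not ready'."""
    try:
        return await asyncio.wait_for(check(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return False


async def slow_dependency() -> bool:
    await asyncio.sleep(1.0)  # simulates a dependency that hangs
    return True


async def fast_dependency() -> bool:
    return True


slow_result = asyncio.run(check_with_timeout(slow_dependency, 0.05))
fast_result = asyncio.run(check_with_timeout(fast_dependency, 0.05))
```

A custom check could apply the same pattern inside its own check() method, choosing a timeout shorter than the probe timeout configured in the orchestrator.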
enabled()
Determines if the check should be executed based on configuration or runtime environment.
Returns:
| Type | Description |
|---|---|
| bool | True if the check should run, False otherwise. |
nala.athomic.observability.health.registry.ReadinessRegistry
A registry responsible for collecting and orchestrating all application readiness checks.
This acts as a centralized source of truth for determining if the application and its core dependencies (databases, message brokers, external services) are fully initialized and ready to handle live traffic.
__init__()
Initializes the internal dictionary to store readiness checks, mapped by name.
register(check)
Registers a new ReadinessCheck implementation with the registry.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| check | ReadinessCheck | An instance of a ReadinessCheck protocol implementation. | required |
run_all() (async)
Executes all registered readiness checks asynchronously.
It respects the enabled() status of each check and handles exceptions during execution by marking the check as failed.
Returns:
| Type | Description |
|---|---|
| Dict[str, str] | A dictionary containing the name of each check and its resulting status: 'ok', 'fail', or 'skipped'. |
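The behavior described for run_all() can be approximated as follows. This is a simplified sketch, not the registry's real implementation: SketchRegistry and StubCheck are hypothetical names, concurrency is shown with asyncio.gather, and mapping a disabled check to 'skipped' is an assumption inferred from the listed status values.

```python
import asyncio
from typing import Dict


class SketchRegistry:
    """Hypothetical, simplified registry; not the framework's implementation."""

    def __init__(self) -> None:
        self._checks = {}  # checks keyed by their unique name

    def register(self, check) -> None:
        self._checks[check.name] = check

    async def _run_one(self, check) -> str:
        if not check.enabled():
            return "skipped"  # assumed meaning of the 'skipped' status
        try:
            return "ok" if await check.check() else "fail"
        except Exception:
            return "fail"  # an exception marks the check as failed

    async def run_all(self) -> Dict[str, str]:
        checks = list(self._checks.values())
        # gather() runs the checks concurrently and preserves input order.
        statuses = await asyncio.gather(*(self._run_one(c) for c in checks))
        return {c.name: s for c, s in zip(checks, statuses)}


class StubCheck:
    """Invented check whose outcome is fixed at construction time."""

    def __init__(self, name, is_enabled=True, outcome=True):
        self.name = name
        self._is_enabled = is_enabled
        self._outcome = outcome

    def enabled(self) -> bool:
        return self._is_enabled

    async def check(self) -> bool:
        if isinstance(self._outcome, Exception):
            raise self._outcome
        return self._outcome


registry = SketchRegistry()
registry.register(StubCheck("db"))
registry.register(StubCheck("cache", is_enabled=False))
registry.register(StubCheck("broker", outcome=False))
registry.register(StubCheck("api", outcome=RuntimeError("boom")))
results = asyncio.run(registry.run_all())
```

Catching exceptions per check, instead of around the whole gather call, keeps one misbehaving dependency from hiding the status of the others.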
nala.athomic.observability.health.checks.service_check.ServiceReadinessCheck
Bases: ReadinessCheck
A generic readiness check implementation that verifies the health and readiness state of any core Athomic service implementing the BaseServiceProtocol.
This check applies the Dependency Inversion Principle: the health system queries a service's status through its protocol, without knowing the service's internal implementation details.
__init__(service)
Initializes the check by injecting the service instance to be monitored.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| service | BaseServiceProtocol | The service instance (e.g., OutboxPublisher, HttpClient) whose readiness state will be checked. | required |
check() (async)
Checks if the service is ready to operate (e.g., connected to its dependencies and initialized).
Delegates the call directly to the service's is_ready() method.
enabled()
Checks if the underlying service is enabled based on its configuration.
Delegates the call directly to the service's is_enabled() method.
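The delegation described above can be illustrated with a small stand-in. This sketch makes several assumptions that the documentation does not confirm: FakeService and SketchServiceReadinessCheck are hypothetical classes, is_ready() is treated as a synchronous method, and the check's name is taken from an assumed service_name attribute.

```python
import asyncio


class FakeService:
    """Hypothetical service exposing the two methods the check delegates to."""

    def __init__(self, service_name: str, enabled: bool, ready: bool) -> None:
        self.service_name = service_name  # attribute name is an assumption
        self._enabled = enabled
        self._ready = ready

    def is_enabled(self) -> bool:
        return self._enabled

    def is_ready(self) -> bool:  # assumed synchronous for this sketch
        return self._ready


class SketchServiceReadinessCheck:
    """Simplified illustration of a check that wraps an injected service."""

    def __init__(self, service: FakeService) -> None:
        self.service = service
        self.name = service.service_name

    def enabled(self) -> bool:
        return self.service.is_enabled()

    async def check(self) -> bool:
        return self.service.is_ready()


consumer_check = SketchServiceReadinessCheck(
    FakeService("kafka_consumer", enabled=True, ready=False)
)
consumer_enabled = consumer_check.enabled()
consumer_ready = asyncio.run(consumer_check.check())
```

Because the check only depends on is_enabled() and is_ready(), any service that exposes those two methods can be monitored without a dedicated check class.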