Large Language Models (LLMs)
Overview
The athomic.ai.llm module provides a unified, provider-agnostic interface for interacting with Large Language Models (LLMs) such as OpenAI (GPT-4) and Google Vertex AI (Gemini).
It is designed to solve common challenges in production AI applications:
- Vendor Lock-in: Switch providers via configuration without changing code.
- Real-Time Streaming: Native support for asynchronous token streaming with standardized chunks.
- Structured Data: Reliably extract Pydantic models from LLM outputs using a standardized API (generate_structured).
- Observability: Automatic tracing (OpenTelemetry) and metrics (Prometheus) for every call (unary or streaming).
- Embedded Governance: Enforce rate limits and safety checks directly within the provider pipeline.
Core Concepts
LLMProviderProtocol
The contract that all LLM providers must implement. It defines three primary operations:
- generate_content(prompt, ...): Generates unstructured text (blocking).
- stream_content(prompt, ...): Generates unstructured text incrementally (async iterator).
- generate_structured(prompt, response_model, ...): Generates a structured object adhering to a specific Pydantic schema.
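As a quick mental model, the protocol surface can be sketched roughly as follows. The method signatures mirror the API reference further down; the return annotation of generate_content is simplified to Any here (the usage examples only rely on its .content attribute), and the class name is a placeholder rather than the real definition.

```python
from typing import Any, AsyncIterator, Optional, Protocol, Type, TypeVar

from pydantic import BaseModel

from nala.athomic.ai.schemas.llms import LLMResponseChunk

T = TypeVar("T", bound=BaseModel)


class LLMProviderSketch(Protocol):
    """Simplified view of LLMProviderProtocol; see the API reference below."""

    async def generate_content(
        self,
        prompt: Optional[str] = None,
        system_message: Optional[str] = None,
        **kwargs: Any,
    ) -> Any:  # the real return type exposes at least `.content`
        ...

    def stream_content(
        self,
        prompt: Optional[str] = None,
        system_message: Optional[str] = None,
        **kwargs: Any,
    ) -> AsyncIterator[LLMResponseChunk]:  # implemented as an async generator
        ...

    async def generate_structured(
        self,
        prompt: str,
        response_model: Type[T],
        system_message: Optional[str] = None,
        max_retries: int = 3,
        **kwargs: Any,
    ) -> T:  # an instance of `response_model`
        ...
```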
BaseLLM & Governance
The abstract base class now handles AI Governance automatically:
1. Input Guards: Executed before the request is sent to the provider. These are blocking (e.g., Rate Limiting, Prompt Injection check).
2. Output Guards: Executed after the response is received. For streaming, these currently act in audit mode.
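The ordering can be illustrated with the pseudocode below. It is purely conceptual: the guard attribute and method names are hypothetical, and the pipeline itself may live in the base class or in the LLMManager/GuardPipeline mentioned in the API reference, but the before/after semantics are the same.

```python
# Illustrative pseudocode only -- guard class, attribute, and hook names are
# hypothetical. It shows the ordering enforced around a provider call.
async def _governed_generate(provider, prompt: str, **kwargs):
    # 1. Input guards run first and are blocking: a rate-limit or
    #    prompt-injection violation aborts the call before the provider is hit.
    for guard in provider.input_guards:          # hypothetical attribute
        await guard.check(prompt)                # raises on violation

    response = await provider.generate_content(prompt=prompt, **kwargs)

    # 2. Output guards run on the completed response. For streaming they act
    #    in audit mode: chunks are not blocked, findings are only recorded.
    for guard in provider.output_guards:         # hypothetical attribute
        await guard.check(response.content)

    return response
```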
Usage Examples
Basic Text Generation (Blocking)
```python
from nala.athomic.ai.llm.factory import LLMFactory
from nala.athomic.config import get_settings


async def chat_with_ai(user_input: str):
    # 1. Create the provider from global settings
    settings = get_settings().ai.llm.connections["default"]
    llm = LLMFactory.create(settings)

    # 2. Generate content (waits for full completion)
    response = await llm.generate_content(
        prompt=user_input,
        system_message="You are a helpful assistant."
    )
    return response.content
```
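Since chat_with_ai is a coroutine, it has to be awaited; from a synchronous entry point you can drive it with asyncio.run (the prompt string here is just a placeholder):

```python
import asyncio

if __name__ == "__main__":
    answer = asyncio.run(chat_with_ai("Summarize our refund policy in one sentence."))
    print(answer)
```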
Real-Time Streaming
```python
import sys

from nala.athomic.ai.llm.factory import LLMFactory


async def stream_chat(user_input: str):
    llm = LLMFactory.create_default()

    # stream_content yields LLMResponseChunk objects
    async for chunk in llm.stream_content(prompt=user_input):
        # 1. Process incremental text (Token Delta)
        if chunk.content_delta:
            sys.stdout.write(chunk.content_delta)
            sys.stdout.flush()

        # 2. Handle metadata on finish (Usage, Stop Reason)
        if chunk.is_final:
            print(f"\n[Meta] Finished. Tokens: {chunk.usage.total_tokens}")
```
Structured Output (JSON Extraction)
```python
from pydantic import BaseModel, Field

from nala.athomic.ai.llm.factory import LLMFactory


class UserIntent(BaseModel):
    intent: str = Field(..., description="The user's intention (buy, sell, support)")
    confidence: float


async def analyze_intent(text: str):
    llm = LLMFactory.create_default()

    # Guaranteed to return a UserIntent instance or raise StructuredOutputError
    intent: UserIntent = await llm.generate_structured(
        prompt=f"Analyze this text: {text}",
        response_model=UserIntent
    )
    return intent
```
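If the model output cannot be coerced into the schema after the configured retries, generate_structured raises StructuredOutputError. The import path used below is an assumption for illustration (adjust it to wherever your installation exposes the exception):

```python
# NOTE: the import path for StructuredOutputError is assumed, not confirmed by these docs.
from nala.athomic.ai.llm.exceptions import StructuredOutputError
from nala.athomic.ai.llm.factory import LLMFactory


async def analyze_intent_safe(text: str):
    # UserIntent is the Pydantic model defined in the previous example.
    llm = LLMFactory.create_default()
    try:
        return await llm.generate_structured(
            prompt=f"Analyze this text: {text}",
            response_model=UserIntent,
            max_retries=3,  # retry budget before the error is raised
        )
    except StructuredOutputError:
        # Degrade gracefully instead of propagating the extraction failure.
        return None
```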
Configuration
LLM connections are configured in settings.toml.
```toml
[default.ai.llm]
# The default connection to use
default_connection_name = "gpt4_main"

[default.ai.llm.connections.gpt4_main]
backend = "openai"
default_model = "gpt-4-turbo"
timeout = 30.0

[default.ai.llm.connections.gpt4_main.provider]
# OpenAI specific settings
api_key = { path = "ai/openai", key = "api_key" }
organization_id = "org-123"
```
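In application code, the connection name maps directly to the key under connections; the snippet below resolves the gpt4_main entry defined above (create_default, as used in the streaming example, presumably falls back to default_connection_name):

```python
from nala.athomic.ai.llm.factory import LLMFactory
from nala.athomic.config import get_settings

# Pick the named connection from the TOML above and build a provider for it.
gpt4_settings = get_settings().ai.llm.connections["gpt4_main"]
llm = LLMFactory.create(gpt4_settings)
```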
API Reference
nala.athomic.ai.llm.protocol.LLMProviderProtocol
Bases: Protocol
Protocol defining the contract for Large Language Model (LLM) interactions. Focused on text generation and structured data extraction.
generate_content(prompt=None, system_message=None, **kwargs)
async
Generates unstructured text content based on a given prompt.
generate_structured(prompt, response_model, system_message=None, max_retries=3, **kwargs)
async
Generates structured data conforming to a specific Pydantic schema.
stream_content(prompt=None, system_message=None, **kwargs)
async
Generates unstructured text content incrementally as an asynchronous stream.
This method is critical for low-latency user experiences (UX).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| prompt | Optional[str] | The user input text. | None |
| system_message | Optional[str] | Optional system instruction. | None |
| **kwargs | Any | Additional generation parameters. | {} |
Returns:
| Type | Description |
|---|---|
| AsyncIterator[LLMResponseChunk] | An asynchronous iterator yielding LLMResponseChunk objects. |
nala.athomic.ai.llm.base.BaseLLM
Bases: BaseService, LLMProviderProtocol, Generic[S], ABC
Abstract base class for all LLM providers (Vertex, OpenAI, etc.).
This class implements the Template Method pattern to enforce:
1. Lifecycle Management (Startup/Shutdown via BaseService).
2. Unified Observability (Tracing, Metrics, Logging).
3. Error Handling boundaries.
Note: Governance (Guards) is handled by the LLMManager/GuardPipeline before calling this provider.
generate(prompt=None, system_message=None, tools=None, **kwargs)
async
Public method for generation (Text or Tool Calls) with observability.
generate_structured(prompt, response_model, system_message=None, max_retries=3, **kwargs)
async
Public method for structured generation with observability.
record_token_usage(prompt_tokens, completion_tokens)
Helper method to record token usage metrics to Prometheus.
stream_content(prompt=None, system_message=None, tools=None, **kwargs)
async
Public method for streamed generation (Text or Tool Calls) with observability. The LLM's response is yielded incrementally.
nala.athomic.ai.schemas.llms.LLMResponseChunk
Bases: BaseModel
Represents a single, incremental chunk of the response during streaming.
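The fields exercised in this guide are content_delta, is_final, and usage (with total_tokens); the full schema may define more. A minimal helper restricted to those fields:

```python
from nala.athomic.ai.schemas.llms import LLMResponseChunk


def describe_chunk(chunk: LLMResponseChunk) -> str:
    """Summarize a chunk using only the fields referenced in this guide."""
    if chunk.is_final:
        # The final chunk carries aggregate metadata such as token usage.
        return f"<final: {chunk.usage.total_tokens} tokens>"
    return chunk.content_delta or ""
```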
nala.athomic.ai.llm.factory.LLMFactory
Factory responsible for creating instances of the configured LLM provider.
create(settings)
classmethod
Creates and returns an instance of the LLM provider based on the settings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| settings | LLMConnectionSettings | The specific connection settings containing the backend type and provider-specific configurations. | required |
Returns:
| Type | Description |
|---|---|
| BaseLLM | An initialized instance adhering to BaseLLM. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the backend is not specified or not registered. |
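Because create raises ValueError when the backend key is missing or no provider is registered for it, configuration-driven startup code may want to surface that failure explicitly; a small sketch:

```python
from nala.athomic.ai.llm.factory import LLMFactory
from nala.athomic.config import get_settings

settings = get_settings().ai.llm.connections["default"]
try:
    llm = LLMFactory.create(settings)
except ValueError as exc:
    # Raised when `backend` is missing or not registered with the factory.
    raise SystemExit(f"LLM provider misconfigured: {exc}") from exc
```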