Large Language Models (LLM)

Overview

The athomic.ai.llm module provides a unified, provider-agnostic interface for interacting with Large Language Models (LLMs) such as OpenAI (GPT-4) and Google Vertex AI (Gemini). It is designed to solve common challenges in production AI applications:

  • Vendor Lock-in: Switch providers via configuration without changing code.
  • Real-Time Streaming: Native support for asynchronous token streaming with standardized chunks.
  • Structured Data: Reliably extract Pydantic models from LLM outputs using a standardized API (generate_structured).
  • Observability: Automatic tracing (OpenTelemetry) and metrics (Prometheus) for every call (unary or streaming).
  • Embedded Governance: Enforce rate limits and safety checks directly within the provider pipeline.

Core Concepts

LLMProviderProtocol

The contract that all LLM providers must implement. It defines three primary operations, illustrated in the sketch after this list:

  • generate_content(prompt, ...): Generates unstructured text (blocking).
  • stream_content(prompt, ...): Generates unstructured text incrementally (async iterator).
  • generate_structured(prompt, response_model, ...): Generates a structured object adhering to a specific Pydantic schema.
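As a rough illustration of this contract, the signatures might look like the sketch below. The class name is deliberately different from the real protocol, parameter lists are abbreviated, and the return annotations are inferred from the API Reference further down rather than copied from the source.

from typing import Any, AsyncIterator, Optional, Protocol, Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)

class ProviderContractSketch(Protocol):
    # Blocking generation: resolves once the full completion is available.
    async def generate_content(
        self,
        prompt: Optional[str] = None,
        system_message: Optional[str] = None,
        **kwargs: Any,
    ) -> Any: ...

    # Streaming generation: implemented as an async generator and consumed
    # with `async for chunk in provider.stream_content(...)`.
    def stream_content(
        self,
        prompt: Optional[str] = None,
        system_message: Optional[str] = None,
        **kwargs: Any,
    ) -> AsyncIterator[Any]: ...

    # Structured extraction: returns an instance of the given Pydantic model.
    async def generate_structured(
        self,
        prompt: str,
        response_model: Type[T],
        system_message: Optional[str] = None,
        max_retries: int = 3,
        **kwargs: Any,
    ) -> T: ...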

BaseLLM & Governance

The abstract base class now handles AI Governance automatically, in the order sketched after this list:

  1. Input Guards: Executed before the request is sent to the provider. These are blocking (e.g., Rate Limiting, Prompt Injection check).
  2. Output Guards: Executed after the response is received. For streaming, these currently act in audit mode.
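Conceptually, the ordering looks like the sketch below. The function and guard method names (run_with_guards, check, audit) are hypothetical illustrations of the flow, not the module's actual API.

# Hypothetical illustration of the guard ordering; not the real athomic API.
async def run_with_guards(provider_call, prompt, input_guards, output_guards):
    # 1. Input guards run first and are blocking (e.g. rate limiting,
    #    prompt-injection checks); they may raise before the provider is called.
    for guard in input_guards:
        guard.check(prompt)

    # 2. The actual provider request.
    response = await provider_call(prompt)

    # 3. Output guards run on the result; for streaming they currently
    #    operate in audit mode (observe and record, do not block).
    for guard in output_guards:
        guard.audit(response)

    return response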


Usage Example

Basic Text Generation (Blocking)

from nala.athomic.ai.llm.factory import LLMFactory
from nala.athomic.config import get_settings

async def chat_with_ai(user_input: str):
    # 1. Create the provider from global settings
    settings = get_settings().ai.llm.connections["default"]
    llm = LLMFactory.create(settings)

    # 2. Generate content (waits for full completion)
    response = await llm.generate_content(
        prompt=user_input,
        system_message="You are a helpful assistant."
    )
    return response.content
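From a synchronous entry point, the coroutine can be driven with asyncio.run. The prompt below is only illustrative, and the snippet assumes the "default" connection used above exists in settings.toml.

import asyncio

if __name__ == "__main__":
    # Assumes the "default" connection referenced above is configured.
    answer = asyncio.run(chat_with_ai("Summarize the last release notes."))
    print(answer)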

Real-Time Streaming

import sys
from nala.athomic.ai.llm.factory import LLMFactory

async def stream_chat(user_input: str):
    llm = LLMFactory.create_default()

    # stream_content yields LLMResponseChunk objects
    async for chunk in llm.stream_content(prompt=user_input):

        # 1. Process incremental text (Token Delta)
        if chunk.content_delta:
            sys.stdout.write(chunk.content_delta)
            sys.stdout.flush()

        # 2. Handle metadata on finish (Usage, Stop Reason)
        if chunk.is_final:
            print(f"\n[Meta] Finished. Tokens: {chunk.usage.total_tokens}")
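If the full text is also needed once the stream ends, the deltas can simply be accumulated while streaming. A small sketch, reusing the chunk fields shown above:

from nala.athomic.ai.llm.factory import LLMFactory

async def stream_and_collect(user_input: str) -> str:
    llm = LLMFactory.create_default()
    parts: list[str] = []

    async for chunk in llm.stream_content(prompt=user_input):
        if chunk.content_delta:
            parts.append(chunk.content_delta)   # keep every token delta

    return "".join(parts)                       # full response text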

Structured Output (JSON Extraction)

from pydantic import BaseModel, Field
from nala.athomic.ai.llm.factory import LLMFactory

class UserIntent(BaseModel):
    intent: str = Field(..., description="The user's intention (buy, sell, support)")
    confidence: float

async def analyze_intent(text: str):
    llm = LLMFactory.create_default()

    # Guaranteed to return a UserIntent instance or raise StructuredOutputError
    intent: UserIntent = await llm.generate_structured(
        prompt=f"Analyze this text: {text}",
        response_model=UserIntent
    )

    return intent
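Callers will typically guard against extraction failures. The sketch below assumes StructuredOutputError can be imported from the module's exceptions; the import path shown is an assumption, not confirmed by this page.

from typing import Optional

# Assumed import path for StructuredOutputError; verify against the codebase.
from nala.athomic.ai.llm.exceptions import StructuredOutputError

async def analyze_intent_safe(text: str) -> Optional[UserIntent]:
    llm = LLMFactory.create_default()
    try:
        return await llm.generate_structured(
            prompt=f"Analyze this text: {text}",
            response_model=UserIntent,
            max_retries=3,  # retry budget exposed by generate_structured
        )
    except StructuredOutputError:
        # The model could not produce a valid UserIntent within the retry budget.
        return None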

Configuration

LLM connections are configured in settings.toml.

[default.ai.llm]
# The default connection to use
default_connection_name = "gpt4_main"

  [default.ai.llm.connections.gpt4_main]
  backend = "openai"
  default_model = "gpt-4-turbo"
  timeout = 30.0

    [default.ai.llm.connections.gpt4_main.provider]
    # OpenAI specific settings
    api_key = { path = "ai/openai", key = "api_key" }
    organization_id = "org-123"
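With this configuration in place, a specific connection can be selected by name at runtime, mirroring the factory call from the first usage example (the connection name matches the sample above):

from nala.athomic.ai.llm.factory import LLMFactory
from nala.athomic.config import get_settings

# Resolve the connection declared as [default.ai.llm.connections.gpt4_main].
settings = get_settings().ai.llm.connections["gpt4_main"]
llm = LLMFactory.create(settings)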

API Reference

nala.athomic.ai.llm.protocol.LLMProviderProtocol

Bases: Protocol

Protocol defining the contract for Large Language Model (LLM) interactions. Focused on text generation and structured data extraction.

generate_content(prompt=None, system_message=None, **kwargs) async

Generates unstructured text content based on a given prompt.

generate_structured(prompt, response_model, system_message=None, max_retries=3, **kwargs) async

Generates structured data conforming to a specific Pydantic schema.

stream_content(prompt=None, system_message=None, **kwargs) async

Generates unstructured text content incrementally as an asynchronous stream.

This method is critical for low-latency user experiences (UX).

Parameters:

  • prompt (Optional[str], default None): The user input text.
  • system_message (Optional[str], default None): Optional system instruction.
  • **kwargs (Any, default {}): Additional generation parameters.

Returns:

  • AsyncIterator[LLMResponseChunk]: An asynchronous iterator yielding LLMResponseChunk objects.

nala.athomic.ai.llm.base.BaseLLM

Bases: BaseService, LLMProviderProtocol, Generic[S], ABC

Abstract base class for all LLM providers (Vertex, OpenAI, etc.).

This class implements the Template Method pattern to enforce:

  1. Lifecycle Management (Startup/Shutdown via BaseService).
  2. Unified Observability (Tracing, Metrics, Logging).
  3. Error Handling boundaries.

Note: Governance (Guards) is handled by the LLMManager/GuardPipeline before calling this provider.

generate(prompt=None, system_message=None, tools=None, **kwargs) async

Public method for generation (Text or Tool Calls) with observability.

generate_structured(prompt, response_model, system_message=None, max_retries=3, **kwargs) async

Public method for structured generation with observability.

record_token_usage(prompt_tokens, completion_tokens)

Helper method to record token usage metrics to Prometheus.

stream_content(prompt=None, system_message=None, tools=None, **kwargs) async

Public method for streamed generation (Text or Tool Calls) with observability. The LLM's response is yielded incrementally.

nala.athomic.ai.schemas.llms.LLMResponseChunk

Bases: BaseModel

Represents a single, incremental chunk of the response during streaming.
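The fields exercised in the streaming example above (content_delta, is_final, usage.total_tokens) suggest a shape roughly like the sketch below; the field defaults and the TokenUsage name are assumptions, not the schema's actual definition.

from typing import Optional

from pydantic import BaseModel

class TokenUsage(BaseModel):                # hypothetical name for the usage sub-model
    total_tokens: int

class LLMResponseChunkSketch(BaseModel):
    content_delta: Optional[str] = None     # incremental text carried by this chunk
    is_final: bool = False                  # True on the last chunk of the stream
    usage: Optional[TokenUsage] = None      # token accounting, populated when final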

nala.athomic.ai.llm.factory.LLMFactory

Factory responsible for creating instances of the configured LLM provider.

create(settings) classmethod

Creates and returns an instance of the LLM provider based on the settings.

Parameters:

  • settings (LLMConnectionSettings, required): The specific connection settings containing the backend type and provider-specific configurations.

Returns:

  • BaseLLM: An initialized instance adhering to BaseLLM.

Raises:

  • ValueError: If the backend is not specified or not registered.