LLMRing Python Library

llms.txt

GitHub: https://github.com/juanre/llmring

Python library to talk to OpenAI, Anthropic, Google, and Ollama through a unified interface. Configuration is stored in a version-controlled llmring.lock file (local to each codebase). Models are accessed via aliases that resolve to ordered lists of models (automatic fallbacks plus reasoning-aware budgets).

Modes of Operation

  1. Lockfile-Only: Works completely standalone with just your llmring.lock. No backend required, no logging, no MCP persistence.
  2. With Server: Connect to self-hosted llmring-server for receipts, usage tracking, and MCP persistence.
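
The split is driven entirely by the lockfile and environment variables; a minimal sketch using the variables documented in the Environment section below (values are placeholders):

# Lockfile-only: provider API keys plus a committed llmring.lock are enough
export ANTHROPIC_API_KEY=...

# With server: additionally point the client at a self-hosted llmring-server
export LLMRING_SERVER_URL=http://localhost:8000
export LLMRING_API_KEY=proj_test_key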

Installation

uv add llmring

Quick Start

llmring lock init
llmring lock chat  # For intelligent conversational configuration
llmring bind summarizer anthropic:claude-3-5-haiku-20241022
llmring aliases

from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()
request = LLMRequest(
  messages=[Message(role="user", content="Summarize this text")],
  model="summarizer"
)
response = await ring.chat(request)

Use comma-separated model IDs (e.g., anthropic:claude-3-5-haiku-20241022,openai:gpt-4o-mini) to add fallbacks in priority order.
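
For example, binding an alias with a fallback chain from the CLI (syntax per the CLI section below):

llmring bind summarizer anthropic:claude-3-5-haiku-20241022,openai:gpt-4o-mini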

Lockfile

  • Authoritative config; commit to VCS
  • Optional profiles for different environments: dev, staging, prod
  • Pinned registry versions per provider
  • Aliases map to ordered models arrays (the first item is primary, the rest are fallbacks)

# Registry version pinning (optional)
# (IDs change as providers publish updates)
[registry_versions]
openai = 186
anthropic = 102

# Default bindings (arrays support automatic fallback)
[[bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-5-haiku-20241022"]

[[bindings]]
alias = "advisor"
models = [
  "openai:gpt-5-2025-08-07",
  "google:gemini-2.5-pro",
  "anthropic:claude-opus-4-1-20250805"
]

[[bindings]]
alias = "balanced"
models = ["anthropic:claude-3-5-sonnet-20250219", "openai:gpt-4o"]

LLMRing resolves aliases to the first available model in the list; subsequent entries are tried automatically if the primary model is unavailable or rate-limited.
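
A minimal sketch of fallback in practice: call the advisor alias from the lockfile above and check which model actually answered (LLMResponse.model reports the resolved provider:model).

from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()
request = LLMRequest(
    model="advisor",   # tries gpt-5 first, then gemini-2.5-pro, then opus-4-1
    messages=[Message(role="user", content="Draft a migration plan")],
)
response = await ring.chat(request)
print(response.model)  # e.g. "openai:gpt-5-2025-08-07" if the primary was available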

Reasoning Models

Version 1.1 introduces reasoning-aware orchestration. The registry tags reasoning-capable models (OpenAI o1/o3/gpt-5*, Anthropic Opus 4.1, etc.) with is_reasoning_model plus a min_recommended_reasoning_tokens hint. The client uses that metadata to choose the right API parameters automatically.

from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()

request = LLMRequest(
    model="advisor",                # Alias backed by reasoning models
    messages=[Message(role="user", content="Draft a migration plan")],
    reasoning_tokens=3500,          # Optional override (otherwise uses registry hint)
    max_tokens=800                  # Applied as max_completion_tokens for OpenAI reasoning models
)

response = await ring.chat(request)
print(response.usage)

  • Skip reasoning_tokens to accept the registry recommendation for each model.
  • OpenAI reasoning requests automatically switch to max_completion_tokens; non-reasoning models continue using max_tokens.
  • Usage and receipts include the additional reasoning budget so cost reports stay accurate.

Cost Breakdown & Prompt Caching

Version 1.2 upgrades cost tracking with per-feature breakdowns:

  • Cache reads via cache_read_input_tokens (or cached_tokens) are billed with provider-specific discounted rates.
  • Cache writes capture both 5-minute and 1-hour TTL buckets (cache_creation_5m_tokens, cache_creation_1h_tokens) plus any generic totals (cache_creation_input_tokens).
  • Long-context tiers automatically kick in once prompt_tokens - cache_read_input_tokens exceeds the registry’s long_context_threshold_tokens.
  • Thinking tokens use supports_thinking and dollars_per_million_tokens_output_thinking to cost reasoning output separately.

CostCalculator.add_cost_to_response now injects a cost_breakdown dict into every response so usage logs, receipts, and dashboards can explain spend at a glance.
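
A short sketch of inspecting the breakdown after a call; field names follow the LLMResponse schema later in this document, and dict-style access to usage is an assumption:

response = await ring.chat(request)   # any request built as in the examples above

usage = response.usage                # assumption: dict-like, as in the LLMResponse schema
print(usage["cost"])                  # total cost including caching and reasoning add-ons
for feature, dollars in usage["cost_breakdown"].items():
    print(f"  {feature}: ${dollars:.5f}")   # e.g. input, cache_read, cache_write_5m, output, reasoning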

CLI

llmring lock init [--force]
llmring lock chat                      # conversational configuration
llmring bind <alias> <provider:model[,provider:model...]> [--profile <name>]
llmring aliases [--profile <name>]
llmring lock validate
llmring lock bump-registry
llmring list [--provider <name>]
llmring info <provider:model> [--json]
llmring stats|export                   # requires server
llmring mcp chat [--server URL]        # MCP interactive chat
llmring mcp servers list               # list MCP servers
llmring mcp tools                      # list MCP tools

CLI Output

llmring --help
usage: cli.py [-h] {lock,bind,aliases,list,chat,info,providers,stats,export,cache,receipts,register} ...

LLMRing - Unified LLM Service CLI with Profile Support

Profiles allow environment-specific configurations (dev, prod, test).
Use --profile flag or set LLMRING_PROFILE environment variable.

positional arguments:
  {lock,bind,aliases,list,chat,info,providers,stats,export,cache,receipts,register}
                        Commands
    lock                Lockfile management
    bind                Bind an alias to model(s) with fallback support
    aliases             List aliases from lockfile
    list                List available models
    chat                Send a chat message
    info                Show model information
    providers           List configured providers
    stats               Show usage statistics
    export              Export receipts to file
    cache               Registry cache management
    receipts            On-demand receipt generation and management
    register            Register with LLMRing server (for SaaS features)

options:
  -h, --help            show this help message and exit
llmring providers
Configured Providers:
----------------------------------------
  ✓ openai      OPENAI_API_KEY
  ✓ anthropic   ANTHROPIC_API_KEY
  ✓ google      GOOGLE_API_KEY or GEMINI_API_KEY
  ✓ ollama      (not required)
llmring list
Available Models:
----------------------------------------

ANTHROPIC:
  - claude-opus-4-1-20250805
  - claude-opus-4-20250514
  - claude-sonnet-4-20250514
  - claude-3-7-sonnet-20250219
  - claude-3-5-haiku-20241022
  - claude-3-haiku-20240307
  - claude-sonnet-4-5-20250929
  - claude-haiku-4-5-20251001

OPENAI:
  - gpt-4.1
  - gpt-5-nano
  - o3-deep-research
  - o4-mini-deep-research
  - gpt-5-mini-2025-08-07
  - gpt-5-2025-08-07
  - gpt-5-chat-latest
  - gpt-4.1-mini
  - gpt-4.1-nano
  - gpt-4o
  - gpt-4o-2024-05-13
  - gpt-4o-mini
  - gpt-realtime
  - gpt-4o-realtime-preview
  - gpt-4o-mini-realtime-preview
  - gpt-audio
  - gpt-4o-audio-preview
  - gpt-4o-mini-audio-preview
  - o1
  - o1-pro
  - o3-pro
  - o3
  - o4-mini
  - o3-mini
  - o1-mini
  - codex-mini-latest
  - gpt-4o-mini-search-preview
  - gpt-4o-search-preview
  - computer-use-preview
  - gpt-5
  - gpt-5-mini
  - gpt-5-pro
  - GPT-4.1
  - GPT-4o
  - gpt-5-codex
  - gpt-5-nano-2025-08-07
  - gpt-5-pro-2025-10-06
  - o3-deep-research-2025-06-26
  - o4-mini-deep-research-2025-06-26
  - o3-pro-2025-06-10
  - o3-2025-04-16
  - o4-mini-2025-04-16
  - gpt-4.1-2025-04-14
  - gpt-4.1-mini-2025-04-14
  - gpt-4.1-nano-2025-04-14
  - o1-pro-2025-03-19
  - o3-mini-2025-01-31
  - o1-2024-12-17
  - gpt-4o-2024-08-06
  - gpt-4o-mini-2024-07-18
  - gpt-4-turbo-2024-04-09

GOOGLE:
  - gemini-2.0-flash-lite
  - gemini-2.0-flash-live-001
  - gemini-2.0-flash
  - gemini-2.5-flash-lite
  - gemini-2.5-flash
  - gemini-2.5-pro
  - gemini-live-2.5-flash-preview
  - gemini-2.5-flash-preview-native-audio-dialog
  - gemini-2.5-flash-exp-native-audio-thinking-dialog
  - gemini-2.5-flash-image-preview
  - gemini-2.5-flash-preview-tts
  - gemini-2.5-pro-preview-tts
  - gemini-2.0-flash-preview-image-generation

OLLAMA:
  (No models available)

Lockfile workflow

llmring lock init
Creating lockfile with registry-based defaults...
✅ Created lockfile with registry data
✅ Created lockfile: /Users/juanre/prj/llmring-all/llmring.ai/dist/docs-run/llmring.lock

Default aliases:
  long_context → openai:gpt-4.1-2025-04-14
  low_cost → openai:gpt-5-nano
  fast → openai:gpt-5-nano
  json_mode → openai:gpt-4.1
  mcp_agent → anthropic:claude-3-haiku-20240307
  deep → anthropic:claude-3-haiku-20240307
  balanced → anthropic:claude-sonnet-4-20250514
  pdf_reader → anthropic:claude-3-haiku-20240307

💡 Use 'llmring lock chat' for conversational lockfile management
llmring lock chat

Starts an interactive conversational configuration session for intelligent lockfile management. The embedded advisor alias uses reasoning models with fallbacks so recommendations stay reliable.

llmring aliases
Aliases in profile 'default':
  long_context → openai:gpt-4.1-2025-04-14
  low_cost → openai:gpt-5-nano
  fast → openai:gpt-5-nano
  json_mode → openai:gpt-4.1
  mcp_agent → anthropic:claude-3-haiku-20240307
  deep → anthropic:claude-3-haiku-20240307
  balanced → anthropic:claude-sonnet-4-20250514
  pdf_reader → anthropic:claude-3-haiku-20240307
llmring bind summarizer anthropic:claude-3-5-haiku-20241022
✅ Bound 'summarizer' → 'anthropic:claude-3-5-haiku-20241022' in profile 'default'
llmring aliases
Aliases in profile 'default':
  long_context → openai:gpt-4.1-2025-04-14
  low_cost → openai:gpt-5-nano
  fast → openai:gpt-5-nano
  json_mode → openai:gpt-4.1
  mcp_agent → anthropic:claude-3-haiku-20240307
  deep → anthropic:claude-3-haiku-20240307
  balanced → anthropic:claude-sonnet-4-20250514
  pdf_reader → anthropic:claude-3-haiku-20240307
  summarizer → anthropic:claude-3-5-haiku-20241022

Registry Integration

  • Fetches model capabilities/pricing from https://llmring.github.io/registry/
  • Models keyed as provider:model
  • Fields include max_input_tokens, max_output_tokens, dollars_per_million_tokens_*, capability flags (e.g., supports_parallel_tool_calls), reasoning metadata (is_reasoning_model, min_recommended_reasoning_tokens, supports_thinking), and cost extras for caching/long-context tiers (dollars_per_million_tokens_cached_input, dollars_per_million_tokens_cache_write_*, long_context_threshold_tokens, etc.).
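
A sketch of reading these fields straight from the registry with RegistryClient (documented in the Registry Client section below; the import path is an assumption, and which optional fields are populated varies by model):

from llmring.registry import RegistryClient   # import path is an assumption

client = RegistryClient()
models = await client.fetch_current_models("anthropic")
for m in models:
    print(m.model_name, m.max_input_tokens, m.dollars_per_million_tokens_input)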

Receipts

  • Local: library can calculate costs and create unsigned receipt objects
  • Canonical: signed by server using Ed25519 over RFC 8785 JCS

Environment

export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export GEMINI_API_KEY=...         # or GOOGLE_API_KEY=... or GOOGLE_GEMINI_API_KEY=...
export LLMRING_PROFILE=prod
export LLMRING_SERVER_URL=http://localhost:8000   # Optional server (defaults to SaaS when unset)
export LLMRING_API_KEY=proj_test_key              # Required for stats/export/receipts

Security

  • Lockfile contains no secrets
  • API keys via environment only

API Reference

Schemas

Message

{
  "role": "system | user | assistant | tool",
  "content": "string or structured content",
  "tool_calls": [ { "id": "...", "type": "...", "function": { "name": "...", "arguments": { } } } ],
  "tool_call_id": "optional",
  "timestamp": "ISO-8601 optional"
}

LLMRequest

{
  "messages": [ Message ],
  "model": "provider:model or alias",
  "temperature": 0.0,
  "max_tokens": 1024,
  "reasoning_tokens": 3000,       // Optional override; defaults to registry-recommended budget
  "response_format": { },
  "tools": [ { } ],
  "tool_choice": "auto | none | any | { function: name }",
  "cache": { },
  "metadata": { },
  "json_response": true,
  "stream": false,
  "extra_params": { }
}
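
A sketch of a tool-enabled request; the tool definition below assumes OpenAI-style function schemas, matching the tool_calls shape in the Message schema above, and get_weather is a hypothetical tool.

request = LLMRequest(
    model="balanced",
    messages=[Message(role="user", content="What's the weather in Madrid?")],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",   # hypothetical tool
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    tool_choice="auto",
)
response = await ring.chat(request)
print(response.tool_calls)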

LLMResponse

{
  "content": "string",
  "model": "provider:model",
  "parsed": { },
  "usage": {
    "prompt_tokens": 5200,
    "completion_tokens": 800,
    "reasoning_tokens": 4000,           // Present when provider reports thinking budgets
    "cache_read_input_tokens": 3000,    // Discounted cached reads
    "cache_creation_5m_tokens": 1200,   // Prompt cache write (5 minute TTL)
    "cache_creation_1h_tokens": 0,
    "total_tokens": 10000,
    "cost": 0.00483,                    // Full cost with all add-ons
    "cost_breakdown": {
      "input": 0.00033,
      "cache_read": 0.00006,
      "cache_write_5m": 0.00018,
      "cache_write_1h": 0.0,
      "long_context_input": 0.00026,    // Only present when hitting long-context tiers
      "output": 0.00026,
      "reasoning": 0.00400
    }
  },
  "finish_reason": "stop | length | tool_calls | ...",
  "tool_calls": [ { } ]
}

StreamChunk

{
  "delta": "partial text",
  "model": "provider:model",
  "finish_reason": null,
  "usage": null,
  "tool_calls": [ ]
}
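
A streaming sketch, assuming that stream=True makes ring.chat return an async iterator of StreamChunk objects (the exact return type is an assumption; only the chunk shape above is documented):

request = LLMRequest(
    model="fast",
    messages=[Message(role="user", content="Stream a haiku")],
    stream=True,
)
stream = await ring.chat(request)   # assumption: AsyncIterator[StreamChunk] when stream=True
async for chunk in stream:
    if chunk.delta:
        print(chunk.delta, end="", flush=True)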

Class: LLMRing

Constructor:

LLMRing(origin: str = "llmring", registry_url: str | None = None, lockfile_path: str | None = None)

Methods:

  • async chat(request: LLMRequest, profile: str | None = None) -> LLMResponse

    • Resolves aliases via the lockfile, routes the call to the provider, enriches usage.cost when registry pricing is available, and records a local unsigned receipt if a lockfile is present.
  • async chat_with_alias(alias_or_model: str, messages: list, temperature: float | None = None, max_tokens: int | None = None, profile: str | None = None, **kwargs) -> LLMResponse

  • resolve_alias(alias_or_model: str, profile: str | None = None) -> str

  • bind_alias(alias: str, model: str, profile: str | None = None) -> None

  • unbind_alias(alias: str, profile: str | None = None) -> None

  • list_aliases(profile: str | None = None) -> dict[str, str]

  • init_lockfile(force: bool = False) -> None

  • get_available_models() -> dict[str, list[str]]

  • get_model_info(model: str) -> dict

  • async get_enhanced_model_info(model: str) -> dict

  • async validate_context_limit(request: LLMRequest) -> str | None

  • async calculate_cost(response: LLMResponse) -> dict | None

  • async close() -> None
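
For quick calls, chat_with_alias avoids building an LLMRequest by hand; a minimal sketch:

from llmring import LLMRing, Message

ring = LLMRing()
response = await ring.chat_with_alias(
    "summarizer",
    messages=[Message(role="user", content="Summarize this text")],
    max_tokens=200,
)
print(response.content)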

Lockfile API

Classes:

  • AliasBinding { alias, provider, model, constraints? } with model_ref property.
  • ProfileConfig { name, bindings[], registry_versions{} }
    • set_binding(alias, model_ref, constraints?)
    • remove_binding(alias) -> bool
    • get_binding(alias) -> AliasBinding | None
  • Lockfile { version, created_at, updated_at, default_profile, profiles{} }
    • @classmethod create_default() -> Lockfile
    • save(path: Path | None = None) -> None
    • @classmethod load(path: Path | None = None) -> Lockfile
    • @classmethod find_lockfile(start_path: Path | None = None) -> Path | None
    • calculate_digest() -> str
    • get_profile(name: str | None = None) -> ProfileConfig
    • set_binding(alias, model_ref, profile: str | None = None, constraints: dict | None = None)
    • resolve_alias(alias, profile: str | None = None) -> str | None
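
A sketch of editing the lockfile programmatically with the classes above (the import path is an assumption):

from llmring.lockfile import Lockfile   # import path is an assumption

path = Lockfile.find_lockfile()         # locate llmring.lock (search behavior is an assumption)
lockfile = Lockfile.load(path)
lockfile.set_binding("summarizer", "anthropic:claude-3-5-haiku-20241022", profile="dev")
print(lockfile.resolve_alias("summarizer", profile="dev"))
lockfile.save(path)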

Registry Client

  • RegistryModel fields: provider, model_name, display_name, description?, max_input_tokens?, max_output_tokens?, dollars_per_million_tokens_input?, dollars_per_million_tokens_output?, supports_vision, supports_function_calling, supports_json_mode, supports_parallel_tool_calls, is_active, added_date?, deprecated_date?.

  • RegistryClient(registry_url: str | None = None, cache_dir: Path | None = None)

    • async fetch_current_models(provider: str) -> list[RegistryModel]
    • async fetch_version(provider: str, version: int) -> RegistryVersion
    • async get_current_version(provider: str) -> int
    • async check_drift(provider: str, pinned_version: int) -> dict
    • async validate_model(provider: str, model_name: str) -> bool
    • clear_cache() -> None
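
A sketch of checking a pinned registry version for drift, matching the [registry_versions] pins in the lockfile example (the shape of the returned dict is not documented, so it is printed as-is):

from llmring.registry import RegistryClient   # import path is an assumption

client = RegistryClient()
current = await client.get_current_version("openai")
drift = await client.check_drift("openai", pinned_version=186)
print(f"current registry version: {current}", drift)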

Receipts

  • Receipt fields: receipt_id, timestamp, alias, profile, lock_digest, provider, model, prompt_tokens, completion_tokens, total_tokens, input_cost, output_cost, total_cost, signature?.
  • ReceiptSigner: generate_keypair(), load_private_key(), load_public_key(), sign_receipt(receipt) -> str, verify_receipt(receipt, public_key) -> bool, export_private_key(), export_public_key().
  • ReceiptGenerator: generate_receipt(...) -> Receipt, calculate_costs(provider, model, prompt_tokens, completion_tokens, model_info?) -> dict.
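
A heavily hedged sketch of the signing flow; whether generate_keypair() returns the keys or stores them on the signer is not documented, so the tuple unpacking below is an assumption:

from llmring.receipts import ReceiptSigner   # import path is an assumption

signer = ReceiptSigner()
private_key, public_key = signer.generate_keypair()   # assumption: returns (private, public)
# receipt: a Receipt produced by ReceiptGenerator.generate_receipt(...)
receipt.signature = signer.sign_receipt(receipt)       # Ed25519 over RFC 8785 JCS
assert signer.verify_receipt(receipt, public_key)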

Providers

All providers implement BaseLLMProvider:

async def chat(messages, model, temperature=None, max_tokens=None, response_format=None, tools=None, tool_choice=None, json_response=None, cache=None, stream=False, extra_params=None) -> LLMResponse | AsyncIterator[StreamChunk]
async def validate_model(model: str) -> bool
async def get_supported_models() -> list[str]
def get_default_model() -> str

Structured Output (Unified)

LLMRing provides a single interface for JSON Schema across providers:

request = LLMRequest(
  model="balanced",
  messages=[Message(role="user", content="Generate a person")],
  response_format={
    "type": "json_schema",
    "json_schema": {"name": "person", "schema": {"type": "object", "properties": {"name": {"type": "string"}}}},
    "strict": True
  }
)
response = await ring.chat(request)
print(response.parsed)

OpenAI uses native JSON Schema; Anthropic/Gemini use native tools/functions under the hood; Ollama uses best‑effort JSON with one repair attempt.