LLMRing Python Library

llms.txt

GitHub: https://github.com/juanre/llmring

Python library to talk to OpenAI, Anthropic, Google, and Ollama through a unified interface. Configuration is stored in a version-controlled llmring.lock file (local to each codebase). Models are accessed via aliases that resolve to ordered lists of models (automatic fallbacks plus reasoning-aware budgets).

Modes of Operation

  1. Lockfile-Only: Works completely standalone with just your llmring.lock. No backend required, no logging, no MCP persistence.
  2. With Server: Connect to self-hosted llmring-server for receipts, usage tracking, and MCP persistence.
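
The split is driven entirely by the lockfile and environment variables; a minimal sketch using the variables documented in the Environment section below (values are placeholders):

# Lockfile-only: provider API keys plus a committed llmring.lock are enough
export ANTHROPIC_API_KEY=...

# With server: additionally point the client at a self-hosted llmring-server
export LLMRING_SERVER_URL=http://localhost:8000
export LLMRING_API_KEY=proj_test_key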

Installation

uv add llmring

Quick Start

llmring lock init
llmring lock chat  # For intelligent conversational configuration
llmring bind summarizer anthropic:claude-3-5-haiku-20241022
llmring aliases

from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()
request = LLMRequest(
  messages=[Message(role="user", content="Summarize this text")],
  model="summarizer"
)
response = await ring.chat(request)

Use comma-separated model IDs (e.g., anthropic:claude-3-5-haiku-20241022,openai:gpt-4o-mini) to add fallbacks in priority order.
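
For example, binding an alias with a fallback chain from the CLI (syntax per the CLI section below):

llmring bind summarizer anthropic:claude-3-5-haiku-20241022,openai:gpt-4o-mini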

Lockfile

  • Authoritative config; commit to VCS
  • Optional profiles for different environments: dev, staging, prod
  • Pinned registry versions per provider
  • Aliases map to ordered models arrays (the first item is primary, the rest are fallbacks)

# Registry version pinning (optional)
# (IDs change as providers publish updates)
[registry_versions]
openai = 186
anthropic = 102

# Default bindings (arrays support automatic fallback)
[[bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-5-haiku-20241022"]

[[bindings]]
alias = "advisor"
models = [
  "openai:gpt-5-2025-08-07",
  "google:gemini-2.5-pro",
  "anthropic:claude-opus-4-1-20250805"
]

[[bindings]]
alias = "balanced"
models = ["anthropic:claude-3-5-sonnet-20250219", "openai:gpt-4o"]

LLMRing resolves aliases to the first available model in the list; subsequent entries are tried automatically if the primary model is unavailable or rate-limited.
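
A minimal sketch of fallback in practice: call the advisor alias from the lockfile above and check which model actually answered (LLMResponse.model reports the resolved provider:model).

from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()
request = LLMRequest(
    model="advisor",   # tries gpt-5 first, then gemini-2.5-pro, then opus-4-1
    messages=[Message(role="user", content="Draft a migration plan")],
)
response = await ring.chat(request)
print(response.model)  # e.g. "openai:gpt-5-2025-08-07" if the primary was available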

Reasoning Models

Version 1.1 introduces reasoning-aware orchestration. The registry tags reasoning-capable models (OpenAI o1/o3/gpt-5*, Anthropic Opus 4.1, etc.) with is_reasoning_model plus a min_recommended_reasoning_tokens hint. The client uses that metadata to choose the right API parameters automatically.

from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()

request = LLMRequest(
    model="advisor",                # Alias backed by reasoning models
    messages=[Message(role="user", content="Draft a migration plan")],
    reasoning_tokens=3500,          # Optional override (otherwise uses registry hint)
    max_tokens=800                  # Applied as max_completion_tokens for OpenAI reasoning models
)

response = await ring.chat(request)
print(response.usage)

  • Skip reasoning_tokens to accept the registry recommendation for each model.
  • OpenAI reasoning requests automatically switch to max_completion_tokens; non-reasoning models continue using max_tokens.
  • Usage and receipts include the additional reasoning budget so cost reports stay accurate.

Cost Breakdown & Prompt Caching

Version 1.2 upgrades cost tracking with per-feature breakdowns:

  • Cache reads via cache_read_input_tokens (or cached_tokens) are billed with provider-specific discounted rates.
  • Cache writes capture both 5-minute and 1-hour TTL buckets (cache_creation_5m_tokens, cache_creation_1h_tokens) plus any generic totals (cache_creation_input_tokens).
  • Long-context tiers automatically kick in once prompt_tokens - cache_read_input_tokens exceeds the registry’s long_context_threshold_tokens.
  • Thinking tokens use supports_thinking and dollars_per_million_tokens_output_thinking to cost reasoning output separately.

CostCalculator.add_cost_to_response now injects a cost_breakdown dict into every response so usage logs, receipts, and dashboards can explain spend at a glance.
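
A short sketch of inspecting the breakdown after a call; field names follow the LLMResponse schema later in this document, and dict-style access to usage is an assumption:

response = await ring.chat(request)   # any request built as in the examples above

usage = response.usage                # assumption: dict-like, as in the LLMResponse schema
print(usage["cost"])                  # total cost including caching and reasoning add-ons
for feature, dollars in usage["cost_breakdown"].items():
    print(f"  {feature}: ${dollars:.5f}")   # e.g. input, cache_read, cache_write_5m, output, reasoning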

CLI

llmring lock init [--force]
llmring lock chat                      # conversational configuration
llmring bind <alias> <provider:model[,provider:model...]> [--profile <name>]
llmring aliases [--profile <name>]
llmring lock validate
llmring lock bump-registry
llmring list [--provider <name>]
llmring info <provider:model> [--json]
llmring stats|export                   # requires server
llmring mcp chat [--server URL]        # MCP interactive chat
llmring mcp servers list               # list MCP servers
llmring mcp tools                      # list MCP tools

CLI Output

llmring --help
usage: cli.py [-h] {lock,bind,aliases,list,chat,info,providers,stats,export,cache,receipts,register} ...

LLMRing - Unified LLM Service CLI with Profile Support

Profiles allow environment-specific configurations (dev, prod, test).
Use --profile flag or set LLMRING_PROFILE environment variable.

positional arguments:
  {lock,bind,aliases,list,chat,info,providers,stats,export,cache,receipts,register}
                        Commands
    lock                Lockfile management
    bind                Bind an alias to model(s) with fallback support
    aliases             List aliases from lockfile
    list                List available models
    chat                Send a chat message
    info                Show model information
    providers           List configured providers
    stats               Show usage statistics
    export              Export receipts to file
    cache               Registry cache management
    receipts            On-demand receipt generation and management
    register            Register with LLMRing server (for SaaS features)

options:
  -h, --help            show this help message and exit
llmring providers
Configured Providers:
----------------------------------------
  ✓ openai      OPENAI_API_KEY
  ✓ anthropic   ANTHROPIC_API_KEY
  ✓ google      GOOGLE_API_KEY or GEMINI_API_KEY
  ✓ ollama      (not required)
llmring list
Available Models:
----------------------------------------

ANTHROPIC:
  - claude-opus-4-1-20250805
  - claude-opus-4-20250514
  - claude-sonnet-4-20250514
  - claude-3-7-sonnet-20250219
  - claude-3-5-haiku-20241022
  - claude-3-haiku-20240307
  - claude-sonnet-4-5-20250929
  - claude-haiku-4-5-20251001

OPENAI:
  - gpt-4.1
  - gpt-5-nano
  - o3-deep-research
  - o4-mini-deep-research
  - gpt-5-mini-2025-08-07
  - gpt-5-2025-08-07
  - gpt-5-chat-latest
  - gpt-4.1-mini
  - gpt-4.1-nano
  - gpt-4o
  - gpt-4o-2024-05-13
  - gpt-4o-mini
  - gpt-realtime
  - gpt-4o-realtime-preview
  - gpt-4o-mini-realtime-preview
  - gpt-audio
  - gpt-4o-audio-preview
  - gpt-4o-mini-audio-preview
  - o1
  - o1-pro
  - o3-pro
  - o3
  - o4-mini
  - o3-mini
  - o1-mini
  - codex-mini-latest
  - gpt-4o-mini-search-preview
  - gpt-4o-search-preview
  - computer-use-preview
  - gpt-5
  - gpt-5-mini
  - gpt-5-pro
  - GPT-4.1
  - GPT-4o
  - gpt-5-codex
  - gpt-5-nano-2025-08-07
  - gpt-5-pro-2025-10-06
  - o3-deep-research-2025-06-26
  - o4-mini-deep-research-2025-06-26
  - o3-pro-2025-06-10
  - o3-2025-04-16
  - o4-mini-2025-04-16
  - gpt-4.1-2025-04-14
  - gpt-4.1-mini-2025-04-14
  - gpt-4.1-nano-2025-04-14
  - o1-pro-2025-03-19
  - o3-mini-2025-01-31
  - o1-2024-12-17
  - gpt-4o-2024-08-06
  - gpt-4o-mini-2024-07-18
  - gpt-4-turbo-2024-04-09

GOOGLE:
  - gemini-2.0-flash-lite
  - gemini-2.0-flash-live-001
  - gemini-2.0-flash
  - gemini-2.5-flash-lite
  - gemini-2.5-flash
  - gemini-2.5-pro
  - gemini-live-2.5-flash-preview
  - gemini-2.5-flash-preview-native-audio-dialog
  - gemini-2.5-flash-exp-native-audio-thinking-dialog
  - gemini-2.5-flash-image-preview
  - gemini-2.5-flash-preview-tts
  - gemini-2.5-pro-preview-tts
  - gemini-2.0-flash-preview-image-generation

OLLAMA:
  (No models available)

Lockfile workflow

llmring lock init
Creating lockfile with registry-based defaults...
✅ Created lockfile with registry data
✅ Created lockfile: /Users/juanre/prj/llmring-all/llmring.ai/dist/docs-run/llmring.lock

Default aliases:
  long_context → openai:gpt-4.1-2025-04-14
  low_cost → openai:gpt-5-nano
  fast → openai:gpt-5-nano
  json_mode → openai:gpt-4.1
  mcp_agent → anthropic:claude-3-haiku-20240307
  deep → anthropic:claude-3-haiku-20240307
  balanced → anthropic:claude-sonnet-4-20250514
  pdf_reader → anthropic:claude-3-haiku-20240307

💡 Use 'llmring lock chat' for conversational lockfile management
llmring lock chat

Starts an interactive conversational configuration session for intelligent lockfile management. The embedded advisor alias uses reasoning models with fallbacks so recommendations stay reliable.

llmring aliases
Aliases in profile 'default':
  long_context → openai:gpt-4.1-2025-04-14
  low_cost → openai:gpt-5-nano
  fast → openai:gpt-5-nano
  json_mode → openai:gpt-4.1
  mcp_agent → anthropic:claude-3-haiku-20240307
  deep → anthropic:claude-3-haiku-20240307
  balanced → anthropic:claude-sonnet-4-20250514
  pdf_reader → anthropic:claude-3-haiku-20240307
llmring bind summarizer anthropic:claude-3-5-haiku-20241022
✅ Bound 'summarizer' → 'anthropic:claude-3-5-haiku-20241022' in profile 'default'
llmring aliases
Aliases in profile 'default':
  long_context → openai:gpt-4.1-2025-04-14
  low_cost → openai:gpt-5-nano
  fast → openai:gpt-5-nano
  json_mode → openai:gpt-4.1
  mcp_agent → anthropic:claude-3-haiku-20240307
  deep → anthropic:claude-3-haiku-20240307
  balanced → anthropic:claude-sonnet-4-20250514
  pdf_reader → anthropic:claude-3-haiku-20240307
  summarizer → anthropic:claude-3-5-haiku-20241022

Registry Integration

  • Fetches model capabilities/pricing from https://llmring.github.io/registry/
  • Models keyed as provider:model
  • Fields include max_input_tokens, max_output_tokens, dollars_per_million_tokens_*, capability flags (e.g., supports_parallel_tool_calls), reasoning metadata (is_reasoning_model, min_recommended_reasoning_tokens, supports_thinking), and cost extras for caching/long-context tiers (dollars_per_million_tokens_cached_input, dollars_per_million_tokens_cache_write_*, long_context_threshold_tokens, etc.).
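
A sketch of reading these fields straight from the registry with RegistryClient (documented in the Registry Client section below; the import path is an assumption, and which optional fields are populated varies by model):

from llmring.registry import RegistryClient   # import path is an assumption

client = RegistryClient()
models = await client.fetch_current_models("anthropic")
for m in models:
    print(m.model_name, m.max_input_tokens, m.dollars_per_million_tokens_input)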

Receipts

  • Local: library can calculate costs and create unsigned receipt objects
  • Canonical: signed by server using Ed25519 over RFC 8785 JCS

Environment

export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export GEMINI_API_KEY=...         # or GOOGLE_API_KEY=... or GOOGLE_GEMINI_API_KEY=...
export LLMRING_PROFILE=prod
export LLMRING_SERVER_URL=http://localhost:8000   # Optional server (defaults to SaaS when unset)
export LLMRING_API_KEY=proj_test_key              # Required for stats/export/receipts

Security

  • Lockfile contains no secrets
  • API keys via environment only

API Reference

Schemas

Message

{
  "role": "system | user | assistant | tool",
  "content": "string or structured content",
  "tool_calls": [ { "id": "...", "type": "...", "function": { "name": "...", "arguments": { } } } ],
  "tool_call_id": "optional",
  "timestamp": "ISO-8601 optional"
}

LLMRequest

{
  "messages": [ Message ],
  "model": "provider:model or alias",
  "temperature": 0.0,
  "max_tokens": 1024,
  "reasoning_tokens": 3000,       // Optional override; defaults to registry-recommended budget
  "response_format": { },
  "tools": [ { } ],
  "tool_choice": "auto | none | any | { function: name }",
  "cache": { },
  "metadata": { },
  "json_response": true,
  "stream": false,
  "extra_params": { }
}
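
A sketch of a tool-enabled request; the tool definition below assumes OpenAI-style function schemas, matching the tool_calls shape in the Message schema above, and get_weather is a hypothetical tool.

request = LLMRequest(
    model="balanced",
    messages=[Message(role="user", content="What's the weather in Madrid?")],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",   # hypothetical tool
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    tool_choice="auto",
)
response = await ring.chat(request)
print(response.tool_calls)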

LLMResponse

{
  "content": "string",
  "model": "provider:model",
  "parsed": { },
  "usage": {
    "prompt_tokens": 5200,
    "completion_tokens": 800,
    "reasoning_tokens": 4000,           // Present when provider reports thinking budgets
    "cache_read_input_tokens": 3000,    // Discounted cached reads
    "cache_creation_5m_tokens": 1200,   // Prompt cache write (5 minute TTL)
    "cache_creation_1h_tokens": 0,
    "total_tokens": 10000,
    "cost": 0.00483,                    // Full cost with all add-ons
    "cost_breakdown": {
      "input": 0.00033,
      "cache_read": 0.00006,
      "cache_write_5m": 0.00018,
      "cache_write_1h": 0.0,
      "long_context_input": 0.00026,    // Only present when hitting long-context tiers
      "output": 0.00026,
      "reasoning": 0.00400
    }
  },
  "finish_reason": "stop | length | tool_calls | ...",
  "tool_calls": [ { } ]
}

StreamChunk

{
  "delta": "partial text",
  "model": "provider:model",
  "finish_reason": null,
  "usage": null,
  "tool_calls": [ ]
}
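
A streaming sketch, assuming that stream=True makes ring.chat return an async iterator of StreamChunk objects (the exact return type is an assumption; only the chunk shape above is documented):

request = LLMRequest(
    model="fast",
    messages=[Message(role="user", content="Stream a haiku")],
    stream=True,
)
stream = await ring.chat(request)   # assumption: AsyncIterator[StreamChunk] when stream=True
async for chunk in stream:
    if chunk.delta:
        print(chunk.delta, end="", flush=True)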

Class: LLMRing

Constructor:

LLMRing(origin: str = "llmring", registry_url: str | None = None, lockfile_path: str | None = None)

Methods:

  • async chat(request: LLMRequest, profile: str | None = None) -> LLMResponse

    • Resolves aliases via the lockfile, routes the call to the provider, enriches usage.cost when registry pricing is available, and records a local unsigned receipt if a lockfile is present.
  • async chat_with_alias(alias_or_model: str, messages: list, temperature: float | None = None, max_tokens: int | None = None, profile: str | None = None, **kwargs) -> LLMResponse

  • resolve_alias(alias_or_model: str, profile: str | None = None) -> str

  • bind_alias(alias: str, model: str, profile: str | None = None) -> None

  • unbind_alias(alias: str, profile: str | None = None) -> None

  • list_aliases(profile: str | None = None) -> dict[str, str]

  • init_lockfile(force: bool = False) -> None

  • get_available_models() -> dict[str, list[str]]

  • get_model_info(model: str) -> dict

  • async get_enhanced_model_info(model: str) -> dict

  • async validate_context_limit(request: LLMRequest) -> str | None

  • async calculate_cost(response: LLMResponse) -> dict | None

  • async close() -> None
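
For quick calls, chat_with_alias avoids building an LLMRequest by hand; a minimal sketch:

from llmring import LLMRing, Message

ring = LLMRing()
response = await ring.chat_with_alias(
    "summarizer",
    messages=[Message(role="user", content="Summarize this text")],
    max_tokens=200,
)
print(response.content)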

Lockfile API

Classes:

  • AliasBinding { alias, provider, model, constraints? } with model_ref property.
  • ProfileConfig { name, bindings[], registry_versions{} }
    • set_binding(alias, model_ref, constraints?)
    • remove_binding(alias) -> bool
    • get_binding(alias) -> AliasBinding | None
  • Lockfile { version, created_at, updated_at, default_profile, profiles{} }
    • @classmethod create_default() -> Lockfile
    • save(path: Path | None = None) -> None
    • @classmethod load(path: Path | None = None) -> Lockfile
    • @classmethod find_lockfile(start_path: Path | None = None) -> Path | None
    • calculate_digest() -> str
    • get_profile(name: str | None = None) -> ProfileConfig
    • set_binding(alias, model_ref, profile: str | None = None, constraints: dict | None = None)
    • resolve_alias(alias, profile: str | None = None) -> str | None
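
A sketch of editing the lockfile programmatically with the classes above (the import path is an assumption):

from llmring.lockfile import Lockfile   # import path is an assumption

path = Lockfile.find_lockfile()         # locate llmring.lock (search behavior is an assumption)
lockfile = Lockfile.load(path)
lockfile.set_binding("summarizer", "anthropic:claude-3-5-haiku-20241022", profile="dev")
print(lockfile.resolve_alias("summarizer", profile="dev"))
lockfile.save(path)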

Registry Client

  • RegistryModel fields: provider, model_name, display_name, description?, max_input_tokens?, max_output_tokens?, dollars_per_million_tokens_input?, dollars_per_million_tokens_output?, supports_vision, supports_function_calling, supports_json_mode, supports_parallel_tool_calls, is_active, added_date?, deprecated_date?.

  • RegistryClient(registry_url: str | None = None, cache_dir: Path | None = None)

    • async fetch_current_models(provider: str) -> list[RegistryModel]
    • async fetch_version(provider: str, version: int) -> RegistryVersion
    • async get_current_version(provider: str) -> int
    • async check_drift(provider: str, pinned_version: int) -> dict
    • async validate_model(provider: str, model_name: str) -> bool
    • clear_cache() -> None
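
A sketch of checking a pinned registry version for drift, matching the [registry_versions] pins in the lockfile example (the shape of the returned dict is not documented, so it is printed as-is):

from llmring.registry import RegistryClient   # import path is an assumption

client = RegistryClient()
current = await client.get_current_version("openai")
drift = await client.check_drift("openai", pinned_version=186)
print(f"current registry version: {current}", drift)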

Receipts

  • Receipt fields: receipt_id, timestamp, alias, profile, lock_digest, provider, model, prompt_tokens, completion_tokens, total_tokens, input_cost, output_cost, total_cost, signature?.
  • ReceiptSigner: generate_keypair(), load_private_key(), load_public_key(), sign_receipt(receipt) -> str, verify_receipt(receipt, public_key) -> bool, export_private_key(), export_public_key().
  • ReceiptGenerator: generate_receipt(...) -> Receipt, calculate_costs(provider, model, prompt_tokens, completion_tokens, model_info?) -> dict.
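
A heavily hedged sketch of the signing flow; whether generate_keypair() returns the keys or stores them on the signer is not documented, so the tuple unpacking below is an assumption:

from llmring.receipts import ReceiptSigner   # import path is an assumption

signer = ReceiptSigner()
private_key, public_key = signer.generate_keypair()   # assumption: returns (private, public)
# receipt: a Receipt produced by ReceiptGenerator.generate_receipt(...)
receipt.signature = signer.sign_receipt(receipt)       # Ed25519 over RFC 8785 JCS
assert signer.verify_receipt(receipt, public_key)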

Providers

All providers implement BaseLLMProvider:

async def chat(messages, model, temperature=None, max_tokens=None, response_format=None, tools=None, tool_choice=None, json_response=None, cache=None, stream=False, extra_params=None) -> LLMResponse | AsyncIterator[StreamChunk]
async def validate_model(model: str) -> bool
async def get_supported_models() -> list[str]
def get_default_model() -> str

Structured Output (Unified)

LLMRing provides a single interface for JSON Schema across providers:

request = LLMRequest(
  model="balanced",
  messages=[Message(role="user", content="Generate a person")],
  response_format={
    "type": "json_schema",
    "json_schema": {"name": "person", "schema": {"type": "object", "properties": {"name": {"type": "string"}}}},
    "strict": True
  }
)
response = await ring.chat(request)
print(response.parsed)

OpenAI uses native JSON Schema; Anthropic/Gemini use native tools/functions under the hood; Ollama uses best‑effort JSON with one repair attempt.