LLMRing
One interface to run them all …
LLMRing is an Open Source provider-agnostic Python library for talking to LLMs. It lets you map tasks to aliases backed by ordered model lists (with automatic fallbacks), use a single interface for all providers, and track usage and cost via an optional server. Your aliases live in a version-controlled llmring.lock file, keeping model choices explicit, reviewable, and easy to share.
Your API calls go directly to OpenAI, Anthropic, Google, or Ollama.
The call’s metadata can optionally be logged to a server you manage.
Components
- Library (llmring) - Python package for unified LLM access with built-in MCP support
- Server (llmring-server) - Optional backend for usage tracking, receipts, and MCP persistence
- Registry - Versioned, human-validated database of model capabilities and pricing
Quick Start
Install and create a basic lockfile:
uv add llmring
llmring lock init
This creates llmring.lock with sensible defaults, pinned registry versions, and starter aliases such as fast, balanced, deep, and advisor (each populated with fallbacks when data is available). For intelligent, conversational configuration that analyzes the live registry and recommends optimal aliases, use:
llmring lock chat
Lockfile + Aliases
Your configuration lives in llmring.lock, a version-controlled file that makes your AI stack reproducible:
# llmring.lock (excerpt)
# Registry version pinning (optional)
# (IDs change as providers publish updates)
[registry_versions]
openai = 186
anthropic = 102
google = 41
# Default bindings (always an array, even for a single model)
[[bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-5-haiku-20241022"]
[[bindings]]
alias = "pdf_converter"
models = ["openai:gpt-4o-mini"]
[[bindings]]
alias = "advisor"
models = [
"openai:gpt-5-2025-08-07",
"google:gemini-2.5-pro",
"anthropic:claude-opus-4-1-20250805"
]
[[bindings]]
alias = "balanced"
models = ["anthropic:claude-3-5-sonnet-20250219", "openai:gpt-4o"] # Automatic fallback
Aliases resolve to the first available entry in models; the remaining entries act as automatic failovers in priority order (perfect for rate limits or regional outages).
Model identifiers include provider release dates when available—refresh them anytime with llmring lock bump-registry.
Use aliases in your code:
from llmring import LLMRing, Message
ring = LLMRing() # Loads from llmring.lock
response = await ring.chat("summarizer", messages=[
Message(role="user", content="Summarize this document...")
])
Unified Structured Output
LLMRing provides one interface for structured output across all providers. Use a JSON Schema with response_format, and LLMRing adapts it per provider:
from llmring import LLMRing
from llmring.schemas import LLMRequest, Message
ring = LLMRing()
request = LLMRequest(
model="balanced",
messages=[Message(role="user", content="Generate a person")],
response_format={
"type": "json_schema",
"json_schema": {
"name": "person",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
},
"required": ["name", "age"]
}
},
"strict": True
}
)
response = await ring.chat(request)
print(response.content) # valid JSON
print(response.parsed) # dict
How it works per provider:
- OpenAI: Native JSON Schema strict mode
- Anthropic: Tool-based extraction with validation
- Google Gemini: FunctionDeclaration with schema mapping
- Ollama: Best-effort JSON with automatic repair
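Because response.parsed is already a plain dict, downstream code can validate or type it however it prefers. A minimal sketch using a dataclass (the Person class is illustrative, not part of LLMRing):
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

# `response` comes from ring.chat(request) in the example above
person = Person(**response.parsed)
print(person.name, person.age)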
Reasoning Models & Token Budgets
LLMRing 1.1 adds first-class support for OpenAI’s reasoning family (o1, o3, gpt-5*). Reasoning token budgets are handled automatically based on registry metadata, and you can override them when you need more headroom:
from llmring import LLMRing, LLMRequest, Message
ring = LLMRing()
request = LLMRequest(
model="advisor", # Alias backed by reasoning models
messages=[Message(role="user", content="Plan the migration path")],
reasoning_tokens=4000, # Optional override
max_tokens=800 # Completion budget (max_completion_tokens under the hood)
)
response = await ring.chat(request)
print(response.usage["cost"])
- Registry-aware: Each reasoning-capable entry is tagged with is_reasoning_model plus min_recommended_reasoning_tokens.
- Automatic defaults: Skip reasoning_tokens to let LLMRing apply the recommended split between reasoning and completion tokens (see the sketch below).
- Provider smartness: OpenAI calls automatically swap to max_completion_tokens so you get the right API semantics without new code.
- Cost tracking: Usage includes the extra reasoning budget so receipts and CSV exports reflect actual spend.
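To rely on the automatic defaults, simply omit reasoning_tokens; a minimal sketch, assuming the alias resolves to a reasoning-capable model in the registry:
from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()

# No reasoning_tokens override: LLMRing applies the registry's
# min_recommended_reasoning_tokens for the resolved model.
request = LLMRequest(
    model="advisor",
    messages=[Message(role="user", content="Outline a rollout plan")],
    max_tokens=800,  # completion budget; the reasoning budget is added automatically
)
response = await ring.chat(request)
print(response.usage.get("reasoning_tokens"))  # reported when the provider returns it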
Cost Breakdown & Prompt Caching
LLMRing 1.2 extends cost analysis with provider-specific breakdowns for prompt caching, long-context tiers, and dedicated thinking tokens.
response.usage == {
"prompt_tokens": 5200,
"completion_tokens": 800,
"reasoning_tokens": 4000,
"cache_read_input_tokens": 3000,
"cache_creation_5m_tokens": 1200,
"total_tokens": 10000,
"cost": 0.00483,
"cost_breakdown": {
"input": 0.00033,
"cache_read": 0.00006,
"cache_write_5m": 0.00018,
"output": 0.00026,
"reasoning": 0.00400
}
}
- Cache-aware: Bills cached reads (cache_read_input_tokens) and cache writes (cache_creation_5m_tokens, cache_creation_1h_tokens, cache_creation_input_tokens) with the correct provider rates.
- Long-context savvy: Automatically switches to long-context pricing once you cross the provider thresholds.
- Thinking tokens: Uses registry metadata (supports_thinking, dollars_per_million_tokens_output_thinking) to price dedicated reasoning budgets separately.
- Full transparency: The cost breakdown flows into usage logs, receipts, and dashboards so finance sees exactly where spend goes (see the sketch below).
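As an illustration of consuming this structure, a minimal sketch that prints a per-category spend summary from response.usage (field names follow the example above; missing keys are treated as zero):
# `response` comes from ring.chat(...) as in the earlier examples.
usage = response.usage
total = usage.get("cost", 0.0)
print(f"total: ${total:.5f}")
for category, dollars in sorted(usage.get("cost_breakdown", {}).items(), key=lambda kv: kv[1], reverse=True):
    share = dollars / total if total else 0.0
    print(f"  {category:<12} ${dollars:.5f} ({share:.0%})")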
CLI Commands
All configuration lives in your lockfile:
# Create basic lockfile with defaults
llmring lock init
# Intelligent conversational configuration (recommended)
llmring lock chat
# Bind aliases locally (escape hatch)
llmring bind pdf_converter openai:gpt-4o-mini
# Validate against registry
llmring lock validate
# Update registry versions
llmring lock bump-registry
Two Modes of Operation
1. Lockfile-Only (No Backend)
Works completely standalone with just your llmring.lock file. Safe, explicit configuration per codebase. No cost tracking, no logging, no MCP persistence.
2. With Server (Self-Hosted)
Add receipts, usage tracking, and MCP tool/resource persistence by connecting to your own llmring-server instance.
See Server Docs for endpoints, headers, and deployment.
The Open Registry
Model information comes from versioned, per-provider registries:
- Current snapshot: https://llmring.github.io/registry/openai/models.json
- Historical snapshot: https://llmring.github.io/registry/openai/v/<version>/models.json
Each provider’s registry is versioned independently. Your lockfile records these versions to track drift:
[registry_versions]
openai = 186 # Example: registry snapshot ID when you last updated
anthropic = 102 # Each provider advances independently
Note: These versions track what the registry knew at that point, not the actual model behavior. Providers can change prices and limits anytime - the registry helps you detect when things have drifted from your expectations.
Example model entry:
{
"openai:gpt-5-2025-08-07": {
"provider": "openai",
"model_name": "gpt-5-2025-08-07",
"display_name": "GPT-5",
"max_input_tokens": 272000,
"max_output_tokens": 128000,
"dollars_per_million_tokens_input": 1.25,
"dollars_per_million_tokens_output": 10.0,
"supports_vision": true,
"supports_function_calling": true,
"supports_json_mode": true,
"supports_parallel_tool_calls": false,
"is_reasoning_model": true,
"min_recommended_reasoning_tokens": 2000,
"is_active": true
}
}
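As an illustration, you can fetch the current snapshot and read a model's pricing directly; a minimal sketch using only the standard library, assuming the file is a flat mapping from model id to entries shaped like the one above:
import json
import urllib.request

REGISTRY_URL = "https://llmring.github.io/registry/openai/models.json"

# Download the current snapshot and look up one model's pricing.
with urllib.request.urlopen(REGISTRY_URL) as resp:
    models = json.load(resp)

entry = models["openai:gpt-5-2025-08-07"]
print(entry["display_name"], "-", entry["dollars_per_million_tokens_input"], "USD per 1M input tokens")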
See Registry Docs for schema and curation workflow.
Profiles for Different Environments
Support multiple configurations in one lockfile:
# llmring.lock (profiles excerpt)
# Production: High quality with fallbacks
[profiles.prod]
[[profiles.prod.bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-5-haiku-20241022", "openai:gpt-4o-mini"]
[[profiles.prod.bindings]]
alias = "analyzer"
models = ["openai:gpt-5-2025-08-07", "anthropic:claude-opus-4-1-20250805"]
# Development: Cheaper models
[profiles.dev]
[[profiles.dev.bindings]]
alias = "summarizer"
models = ["openai:gpt-4o-mini"]
[[profiles.dev.bindings]]
alias = "analyzer"
models = ["openai:gpt-4o-mini", "anthropic:claude-3-5-haiku-20241022"]
Switch profiles via environment:
export LLMRING_PROFILE=prod
python app.py
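If you need to choose a profile from inside the program instead, setting the variable before constructing the client should behave the same way; a minimal sketch, assuming LLMRing consults LLMRING_PROFILE when it loads the lockfile:
import os
from llmring import LLMRing

os.environ["LLMRING_PROFILE"] = "dev"  # assumption: read when LLMRing() loads the lockfile
ring = LLMRing()  # aliases now resolve against the dev profile bindings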
CLI Workflow
Core lockfile management:
# Create basic lockfile with defaults
llmring lock init
# Intelligent conversational configuration (recommended)
llmring lock chat
# Bind aliases (updates lockfile)
llmring bind summarizer anthropic:claude-3-5-haiku-20241022
# List aliases from lockfile
llmring aliases
# Validate against registry
llmring lock validate
# Update registry versions
llmring lock bump-registry
MCP operations (requires backend):
# Connect to any MCP server for interactive chat
llmring mcp chat --server "stdio://python -m your_mcp_server"
# List registered MCP servers
llmring mcp servers list
# Register new MCP server
llmring mcp register calculator http://calculator-mcp:8080
# List available tools
llmring mcp tools
# Execute a tool
llmring mcp execute calculator.add "{\"a\": 5, \"b\": 3}"
With a server connected:
# View usage stats (requires server)
llmring stats
# Export receipts (requires server)
llmring export
Environment Variables
# LLM provider keys (required)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
# Gemini supports any of these
export GEMINI_API_KEY="..." # or
export GOOGLE_API_KEY="..." # or
export GOOGLE_GEMINI_API_KEY="..."
# Optional profile selection
export LLMRING_PROFILE="prod"
# Optional server connection (self-hosted or SaaS)
export LLMRING_SERVER_URL="http://localhost:8000" # Base URL for CLI + MCP telemetry
export LLMRING_API_KEY="proj_test_key" # Required for stats/export/receipts
# SaaS defaults to https://api.llmring.ai when unset
Why LLMRing
- Lockfile: Version control your AI configuration with reproducible deployments
- Task-oriented: Think in terms of tasks, not model IDs
- Reasoning-aware: Automatically splits reasoning vs completion budgets for o1/o3/gpt-5 class models
- Cost transparent: Breaks costs down across cache reads/writes, long-context tiers, and thinking tokens
- Fallback ready: Ordered model lists provide instant failover when providers rate-limit or degrade
- Drift detection: Track when models change from your expectations
- MCP Integration: Full Model Context Protocol support for tool orchestration
Source Code
Everything is open source on GitHub:
- llmring - Python package and CLI
- llmring-server - Optional API server
- registry - Model registry source
License
MIT License. Use it however you want.
One registry to find them
One API to track them all
And with aliases bind them