# Home - LLM Documentation

URL: https://llmring.ai/

This is the LLM-readable version of the Home page.

---

# LLMRing

One interface to run them all ...

LLMRing is an Open Source provider-agnostic Python library for talking to LLMs. It lets you map tasks to aliases backed by ordered model lists (with automatic fallbacks), use a single interface for all providers, and track usage and cost via an optional server.

Your aliases live in a version-controlled `llmring.lock` file, keeping model choices explicit, reviewable, and easy to share. Your API calls go directly to OpenAI, Anthropic, Google, or Ollama. The call's metadata can optionally be logged to a [server managed by you](/docs/server/).

## Components

- **[Library (llmring)](/docs/llmring/)** - Python package for unified LLM access with built-in MCP support
- **[Server (llmring-server)](/docs/server/)** - Optional backend for usage tracking, receipts, and MCP persistence
- **[Registry](/docs/registry/)** - Versioned, human-validated database of model capabilities and pricing

## Quick Start

Install and create a basic lockfile:

```bash
uv add llmring
```

```bash
llmring lock init
```

This creates `llmring.lock` with sensible defaults, pinned registry versions, and starter aliases such as `fast`, `balanced`, `deep`, and `advisor` (each populated with fallbacks when data is available).

For intelligent, conversational configuration that analyzes the live registry and recommends optimal aliases, use:

```bash
llmring lock chat
```

## Lockfile + Aliases

Your configuration lives in `llmring.lock`, a version-controlled file that makes your AI stack reproducible:

```toml
# llmring.lock (excerpt)

# Registry version pinning (optional)
# (IDs change as providers publish updates)
[registry_versions]
openai = 186
anthropic = 102
google = 41

# Default bindings (always an array, even for a single model)
[[bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-5-haiku-20241022"]

[[bindings]]
alias = "pdf_converter"
models = ["openai:gpt-4o-mini"]

[[bindings]]
alias = "advisor"
models = [
  "openai:gpt-5-2025-08-07",
  "google:gemini-2.5-pro",
  "anthropic:claude-opus-4-1-20250805"
]

[[bindings]]
alias = "balanced"
models = ["anthropic:claude-3-5-sonnet-20250219", "openai:gpt-4o"]  # Automatic fallback
```

Aliases always resolve to the first available entry in `models`; the remaining entries act as automatic failovers in priority order (perfect for rate limits or regional outages). Model identifiers include provider release dates when available; refresh them anytime with `llmring lock bump-registry`.

Use aliases in your code:

```python
from llmring import LLMRing, Message

ring = LLMRing()  # Loads from llmring.lock

response = await ring.chat("summarizer", messages=[
    Message(role="user", content="Summarize this document...")
])
```
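To make the fallback semantics concrete, here is a minimal conceptual sketch of resolving an alias against an ordered `models` list: try the first entry, move to the next on a rate limit or outage. This is illustrative only, not LLMRing's internals; `call_provider` and `ProviderUnavailable` are made-up placeholder names.

```python
# Conceptual sketch of ordered-fallback resolution (NOT LLMRing internals).
# `call_provider` and `ProviderUnavailable` are hypothetical placeholders.

bindings = {
    "balanced": ["anthropic:claude-3-5-sonnet-20250219", "openai:gpt-4o"],
}


class ProviderUnavailable(Exception):
    """Stand-in for a rate limit or regional outage."""


def call_provider(model_ref: str, prompt: str) -> str:
    raise NotImplementedError  # would dispatch to the provider's API


def chat_with_fallback(alias: str, prompt: str) -> str:
    last_error = None
    for model_ref in bindings[alias]:        # priority order from the lockfile
        try:
            return call_provider(model_ref, prompt)
        except ProviderUnavailable as exc:   # rate limit, outage, etc.
            last_error = exc                 # remember and try the next model
    raise RuntimeError(f"All models for alias {alias!r} failed") from last_error
```

LLMRing does this failover for you; the sketch only shows why an alias binds to a list rather than a single model.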
## Unified Structured Output

LLMRing provides one interface for structured output across all providers. Use a JSON Schema with `response_format`, and LLMRing adapts it per provider:

```python
from llmring import LLMRing
from llmring.schemas import LLMRequest, Message

ring = LLMRing()

request = LLMRequest(
    model="balanced",
    messages=[Message(role="user", content="Generate a person")],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"]
            }
        },
        "strict": True
    }
)

response = await ring.chat(request)
print(response.content)  # valid JSON
print(response.parsed)   # dict
```

**How it works per provider:**

- **OpenAI**: Native JSON Schema strict mode
- **Anthropic**: Tool-based extraction with validation
- **Google Gemini**: FunctionDeclaration with schema mapping
- **Ollama**: Best-effort JSON with automatic repair

## Reasoning Models & Token Budgets

LLMRing 1.1 adds first-class support for OpenAI's reasoning family (o1, o3, gpt-5*). Reasoning token budgets are handled automatically based on registry metadata, and you can override them when you need more headroom:

```python
from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()

request = LLMRequest(
    model="advisor",  # Alias backed by reasoning models
    messages=[Message(role="user", content="Plan the migration path")],
    reasoning_tokens=4000,  # Optional override
    max_tokens=800          # Completion budget (max_completion_tokens under the hood)
)

response = await ring.chat(request)
print(response.usage["cost"])
```

- **Registry-aware**: Each reasoning-capable entry is tagged with `is_reasoning_model` plus `min_recommended_reasoning_tokens`.
- **Automatic defaults**: Skip `reasoning_tokens` to let LLMRing apply the recommended split between reasoning and completion tokens.
- **Provider smartness**: OpenAI calls automatically swap to `max_completion_tokens`, so you get the right API semantics without new code.
- **Cost tracking**: Usage includes the extra reasoning budget, so receipts and CSV exports reflect actual spend.

## Cost Breakdown & Prompt Caching

LLMRing 1.2 extends cost analysis with provider-specific breakdowns for prompt caching, long-context tiers, and dedicated thinking tokens.

```python
response.usage == {
    "prompt_tokens": 5200,
    "completion_tokens": 800,
    "reasoning_tokens": 4000,
    "cache_read_input_tokens": 3000,
    "cache_creation_5m_tokens": 1200,
    "total_tokens": 10000,
    "cost": 0.00483,
    "cost_breakdown": {
        "input": 0.00033,
        "cache_read": 0.00006,
        "cache_write_5m": 0.00018,
        "output": 0.00026,
        "reasoning": 0.00400
    }
}
```

- **Cache-aware:** Bills cached reads (`cache_read_input_tokens`) and cache writes (`cache_creation_5m_tokens`, `cache_creation_1h_tokens`, `cache_creation_input_tokens`) with the correct provider rates.
- **Long-context savvy:** Automatically switches to long-context pricing once you cross the provider thresholds.
- **Thinking tokens:** Uses registry metadata (`supports_thinking`, `dollars_per_million_tokens_output_thinking`) to price dedicated reasoning budgets separately.
- **Full transparency:** Cost breakdown flows into usage logs, receipts, and dashboards so finance sees exactly where spend goes.
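If you want a quick per-request view of where spend goes, a small helper like the one below can walk the `cost_breakdown` dict shown in the excerpt above. This is a sketch, not part of LLMRing's API; `report_cost` is a made-up name, and it assumes the usage payload has the shape illustrated above with dollar values per category.

```python
# Illustrative helper (not part of LLMRing): summarize a usage payload
# shaped like the excerpt above. Assumes cost_breakdown values are dollars.

def report_cost(usage: dict) -> str:
    lines = [f"total: ${usage['cost']:.5f} for {usage['total_tokens']} tokens"]
    for category, dollars in usage.get("cost_breakdown", {}).items():
        share = dollars / usage["cost"] * 100 if usage["cost"] else 0.0
        lines.append(f"  {category:<12} ${dollars:.5f} ({share:.0f}%)")
    return "\n".join(lines)

# Example, using the payload shown above:
# print(report_cost(response.usage))
```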
## CLI Commands

All of this configuration lives in your lockfile:

```bash
# Create basic lockfile with defaults
llmring lock init

# Intelligent conversational configuration (recommended)
llmring lock chat

# Bind aliases locally (escape hatch)
llmring bind pdf_converter openai:gpt-4o-mini

# Validate against registry
llmring lock validate

# Update registry versions
llmring lock bump-registry
```

## Two Modes of Operation

### 1. Lockfile-Only (No Backend)

Works completely standalone with just your `llmring.lock` file. Safe, explicit configuration per codebase. No cost tracking, no logging, no MCP persistence.

### 2. With Server (Self-Hosted)

Add receipts, usage tracking, and MCP tool/resource persistence by connecting to your own `llmring-server` instance. See [Server Docs](/docs/server/) for endpoints, headers, and deployment.

## The Open Registry

Model information comes from versioned, per-provider registries:

- Current snapshot: [https://llmring.github.io/registry/openai/models.json](https://llmring.github.io/registry/openai/models.json)
- Historical snapshot: `https://llmring.github.io/registry/openai/v//models.json`

Each provider's registry is versioned independently. Your lockfile records these versions to track drift:

```toml
[registry_versions]
openai = 186     # Example: registry snapshot ID when you last updated
anthropic = 102  # Each provider advances independently
```

Note: These versions track what the registry knew at that point, not the actual model behavior. Providers can change prices and limits at any time; the registry helps you detect when things have drifted from your expectations.

Example model entry:

```json
{
  "openai:gpt-5-2025-08-07": {
    "provider": "openai",
    "model_name": "gpt-5-2025-08-07",
    "display_name": "GPT-5",
    "max_input_tokens": 272000,
    "max_output_tokens": 128000,
    "dollars_per_million_tokens_input": 1.25,
    "dollars_per_million_tokens_output": 10.0,
    "supports_vision": true,
    "supports_function_calling": true,
    "supports_json_mode": true,
    "supports_parallel_tool_calls": false,
    "is_reasoning_model": true,
    "min_recommended_reasoning_tokens": 2000,
    "is_active": true
  }
}
```

See [Registry Docs](/docs/registry/) for schema and curation workflow.
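For ad-hoc inspection of a registry snapshot outside the CLI, something like the following sketch can fetch the published JSON and read an entry's pricing. It assumes the snapshot contains entries shaped like the example above; the exact top-level wrapping of `models.json` is an assumption here, and `llmring lock validate` / `llmring lock bump-registry` remain the supported ways to check drift.

```python
# Ad-hoc registry inspection (a sketch; the supported path is `llmring lock validate`).
# Assumption: the snapshot holds entries shaped like the example model entry above.
import json
from urllib.request import urlopen

URL = "https://llmring.github.io/registry/openai/models.json"

with urlopen(URL) as resp:
    data = json.load(resp)

# Tolerate either a flat map of entries or a "models" wrapper (shape is assumed).
entries = data.get("models", data)
entry = entries.get("openai:gpt-5-2025-08-07")
if entry:
    print(entry["display_name"],
          entry["dollars_per_million_tokens_input"],
          entry["dollars_per_million_tokens_output"])
```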
## Profiles for Different Environments

Support multiple configurations in one lockfile:

```toml
# llmring.lock (profiles excerpt)

# Production: High quality with fallbacks
[profiles.prod]

[[profiles.prod.bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-5-haiku-20241022", "openai:gpt-4o-mini"]

[[profiles.prod.bindings]]
alias = "analyzer"
models = ["openai:gpt-5-2025-08-07", "anthropic:claude-opus-4-1-20250805"]

# Development: Cheaper models
[profiles.dev]

[[profiles.dev.bindings]]
alias = "summarizer"
models = ["openai:gpt-4o-mini"]

[[profiles.dev.bindings]]
alias = "analyzer"
models = ["openai:gpt-4o-mini", "anthropic:claude-3-5-haiku-20241022"]
```

Switch profiles via environment:

```bash
export LLMRING_PROFILE=prod
python app.py
```

## CLI Workflow

Core lockfile management:

```bash
# Create basic lockfile with defaults
llmring lock init

# Intelligent conversational configuration (recommended)
llmring lock chat

# Bind aliases (updates lockfile)
llmring bind summarizer anthropic:claude-3-5-haiku-20241022

# List aliases from lockfile
llmring aliases

# Validate against registry
llmring lock validate

# Update registry versions
llmring lock bump-registry
```

MCP operations (requires backend):

```bash
# Connect to any MCP server for interactive chat
llmring mcp chat --server "stdio://python -m your_mcp_server"

# List registered MCP servers
llmring mcp servers list

# Register new MCP server
llmring mcp register calculator http://calculator-mcp:8080

# List available tools
llmring mcp tools

# Execute a tool
llmring mcp execute calculator.add "{\"a\": 5, \"b\": 3}"
```

With a server connected:

```bash
# View usage stats (requires server)
llmring stats

# Export receipts (requires server)
llmring export
```

## Environment Variables

```bash
# LLM provider keys (required)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Gemini accepts any of these
export GEMINI_API_KEY="..."
# or
export GOOGLE_API_KEY="..."
# or
export GOOGLE_GEMINI_API_KEY="..."

# Optional profile selection
export LLMRING_PROFILE="prod"

# Optional server connection (self-hosted or SaaS)
export LLMRING_SERVER_URL="http://localhost:8000"  # Base URL for CLI + MCP telemetry
export LLMRING_API_KEY="proj_test_key"             # Required for stats/export/receipts
# SaaS defaults to https://api.llmring.ai when unset
```

## Why LLMRing

- **Lockfile**: Version control your AI configuration with reproducible deployments
- **Task-oriented**: Think in terms of tasks, not model IDs
- **Reasoning-aware**: Automatically splits reasoning vs completion budgets for o1/o3/gpt-5 class models
- **Cost transparent**: Breaks costs down across cache reads/writes, long-context tiers, and thinking tokens
- **Fallback ready**: Ordered model lists provide instant failover when providers rate-limit or degrade
- **Drift detection**: Track when models change from your expectations
- **MCP Integration**: Full Model Context Protocol support for tool orchestration

## Source Code

Everything is open source on GitHub:

- [llmring](https://github.com/juanre/llmring) - Python package and CLI
- [llmring-server](https://github.com/juanre/llmring-server) - Optional API server
- [registry](https://github.com/llmring/registry) - Model registry source

## License

MIT License. Use it however you want.

---

One interface to run them all
One registry to find them
One API to track them all
And with aliases bind them