# LLMRing Library (Python) - LLM Documentation

URL: https://llmring.ai/docs/llmring/

This is the LLM-readable version of the LLMRing Library (Python) page.

---

# LLMRing Python Library

**GitHub**: [https://github.com/juanre/llmring](https://github.com/juanre/llmring)

Python library to talk to OpenAI, Anthropic, Google, and Ollama with a unified interface. Configuration is stored in a version-controlled `llmring.lock` file (local to each codebase). Models are accessed via aliases that resolve to ordered model lists (automatic fallbacks + reasoning-aware budgets).

## Modes of Operation

1. **Lockfile-Only**: Works completely standalone with just your `llmring.lock`. No backend required, no logging, no MCP persistence.
2. **With Server**: Connect to a self-hosted `llmring-server` for receipts, usage tracking, and MCP persistence.

## Installation

```bash
uv add llmring
```

## Quick Start

```bash
llmring lock init
```

```bash
llmring lock chat  # For intelligent conversational configuration
```

```bash
llmring bind summarizer anthropic:claude-3-5-haiku-20241022
```

```bash
llmring aliases
```

```python
from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()
request = LLMRequest(
    messages=[Message(role="user", content="Summarize this text")],
    model="summarizer"
)
response = await ring.chat(request)
```

Use comma-separated model IDs (e.g., `anthropic:claude-3-5-haiku-20241022,openai:gpt-4o-mini`) to add fallbacks in priority order.

## Lockfile

- Authoritative config; commit to VCS
- Optional profiles for different environments: `dev`, `staging`, `prod`
- Pinned registry versions per provider
- Aliases map to ordered `models` arrays (first item is primary, the rest are fallbacks)

```toml
# Registry version pinning (optional)
# (IDs change as providers publish updates)
[registry_versions]
openai = 186
anthropic = 102

# Default bindings (arrays support automatic fallback)
[[bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-5-haiku-20241022"]

[[bindings]]
alias = "advisor"
models = [
    "openai:gpt-5-2025-08-07",
    "google:gemini-2.5-pro",
    "anthropic:claude-opus-4-1-20250805"
]

[[bindings]]
alias = "balanced"
models = ["anthropic:claude-3-7-sonnet-20250219", "openai:gpt-4o"]
```

LLMRing resolves aliases to the first available model in the list; subsequent entries are tried automatically if the primary model is unavailable or rate-limited.

## Reasoning Models

Version 1.1 introduces reasoning-aware orchestration. The registry tags reasoning-capable models (OpenAI o1/o3/gpt-5*, Anthropic Opus 4.1, etc.) with `is_reasoning_model` plus a `min_recommended_reasoning_tokens` hint. The client uses that metadata to choose the right API parameters automatically.

```python
from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()
request = LLMRequest(
    model="advisor",  # Alias backed by reasoning models
    messages=[Message(role="user", content="Draft a migration plan")],
    reasoning_tokens=3500,  # Optional override (otherwise uses registry hint)
    max_tokens=800  # Applied as max_completion_tokens for OpenAI reasoning models
)
response = await ring.chat(request)
print(response.usage)
```

- Skip `reasoning_tokens` to accept the registry recommendation for each model.
- OpenAI reasoning requests automatically switch to `max_completion_tokens`; non-reasoning models continue using `max_tokens`.
- Usage and receipts include the additional reasoning budget so cost reports stay accurate.
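To check which budget an alias will receive before sending a request, the registry metadata is reachable through the client. A minimal sketch, assuming `get_enhanced_model_info` (listed in the API reference below) returns a dict keyed by the registry field names used above:

```python
from llmring import LLMRing

ring = LLMRing()

# Resolve the alias to its primary provider:model reference.
model_ref = ring.resolve_alias("advisor")

# Assumption: the returned dict uses the registry field names documented here.
info = await ring.get_enhanced_model_info(model_ref)
if info.get("is_reasoning_model"):
    print("Registry-recommended budget:", info.get("min_recommended_reasoning_tokens"))
```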
## Cost Breakdown & Prompt Caching

Version 1.2 upgrades cost tracking with per-feature breakdowns:

- **Cache reads** via `cache_read_input_tokens` (or `cached_tokens`) are billed at provider-specific discounted rates.
- **Cache writes** capture both 5-minute and 1-hour TTL buckets (`cache_creation_5m_tokens`, `cache_creation_1h_tokens`) plus any generic totals (`cache_creation_input_tokens`).
- **Long-context tiers** automatically kick in once `prompt_tokens - cache_read_input_tokens` exceeds the registry's `long_context_threshold_tokens`.
- **Thinking tokens** use `supports_thinking` and `dollars_per_million_tokens_output_thinking` to cost reasoning output separately.

`CostCalculator.add_cost_to_response` now injects a `cost_breakdown` dict into every response so usage logs, receipts, and dashboards can explain spend at a glance.
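A minimal sketch of pulling the breakdown off a response; the keys mirror the `cost_breakdown` object shown in the `LLMResponse` schema later on this page, and treating `usage` as a plain dict is an assumption:

```python
from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()
response = await ring.chat(LLMRequest(
    model="summarizer",
    messages=[Message(role="user", content="Summarize this text")],
))

# Assumption: `usage` behaves like the dict shown in the LLMResponse schema below.
usage = response.usage or {}
for feature, dollars in usage.get("cost_breakdown", {}).items():
    print(f"{feature}: ${dollars:.5f}")
print("total:", usage.get("cost"))
```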
## CLI

```bash
llmring lock init [--force]
```

```bash
llmring lock chat  # conversational configuration
```

```bash
llmring bind <alias> <model> [--profile <profile>]
```

```bash
llmring aliases [--profile <profile>]
```

```bash
llmring lock validate
```

```bash
llmring lock bump-registry
```

```bash
llmring list [--provider <provider>]
```

```bash
llmring info <model> [--json]
```

```bash
llmring stats|export  # requires server
```

```bash
llmring mcp chat [--server URL]  # MCP interactive chat
```

```bash
llmring mcp servers list  # list MCP servers
```

```bash
llmring mcp tools  # list MCP tools
```

### CLI Output

```bash
llmring --help
```

```
usage: cli.py [-h]
              {lock,bind,aliases,list,chat,info,providers,stats,export,cache,receipts,register}
              ...

LLMRing - Unified LLM Service CLI with Profile Support

Profiles allow environment-specific configurations (dev, prod, test).
Use --profile flag or set LLMRING_PROFILE environment variable.

positional arguments:
  {lock,bind,aliases,list,chat,info,providers,stats,export,cache,receipts,register}
                        Commands
    lock                Lockfile management
    bind                Bind an alias to model(s) with fallback support
    aliases             List aliases from lockfile
    list                List available models
    chat                Send a chat message
    info                Show model information
    providers           List configured providers
    stats               Show usage statistics
    export              Export receipts to file
    cache               Registry cache management
    receipts            On-demand receipt generation and management
    register            Register with LLMRing server (for SaaS features)

options:
  -h, --help            show this help message and exit
```

```bash
llmring providers
```

```
Configured Providers:
----------------------------------------
✓ openai      OPENAI_API_KEY
✓ anthropic   ANTHROPIC_API_KEY
✓ google      GOOGLE_API_KEY or GEMINI_API_KEY
✓ ollama      (not required)
```

```bash
llmring list
```

```
Available Models:
----------------------------------------

ANTHROPIC:
  - claude-opus-4-1-20250805
  - claude-opus-4-20250514
  - claude-sonnet-4-20250514
  - claude-3-7-sonnet-20250219
  - claude-3-5-haiku-20241022
  - claude-3-haiku-20240307
  - claude-sonnet-4-5-20250929
  - claude-haiku-4-5-20251001

OPENAI:
  - gpt-4.1
  - gpt-5-nano
  - o3-deep-research
  - o4-mini-deep-research
  - gpt-5-mini-2025-08-07
  - gpt-5-2025-08-07
  - gpt-5-chat-latest
  - gpt-4.1-mini
  - gpt-4.1-nano
  - gpt-4o
  - gpt-4o-2024-05-13
  - gpt-4o-mini
  - gpt-realtime
  - gpt-4o-realtime-preview
  - gpt-4o-mini-realtime-preview
  - gpt-audio
  - gpt-4o-audio-preview
  - gpt-4o-mini-audio-preview
  - o1
  - o1-pro
  - o3-pro
  - o3
  - o4-mini
  - o3-mini
  - o1-mini
  - codex-mini-latest
  - gpt-4o-mini-search-preview
  - gpt-4o-search-preview
  - computer-use-preview
  - gpt-5
  - gpt-5-mini
  - gpt-5-pro
  - GPT-4.1
  - GPT-4o
  - gpt-5-codex
  - gpt-5-nano-2025-08-07
  - gpt-5-pro-2025-10-06
  - o3-deep-research-2025-06-26
  - o4-mini-deep-research-2025-06-26
  - o3-pro-2025-06-10
  - o3-2025-04-16
  - o4-mini-2025-04-16
  - gpt-4.1-2025-04-14
  - gpt-4.1-mini-2025-04-14
  - gpt-4.1-nano-2025-04-14
  - o1-pro-2025-03-19
  - o3-mini-2025-01-31
  - o1-2024-12-17
  - gpt-4o-2024-08-06
  - gpt-4o-mini-2024-07-18
  - gpt-4-turbo-2024-04-09

GOOGLE:
  - gemini-2.0-flash-lite
  - gemini-2.0-flash-live-001
  - gemini-2.0-flash
  - gemini-2.5-flash-lite
  - gemini-2.5-flash
  - gemini-2.5-pro
  - gemini-live-2.5-flash-preview
  - gemini-2.5-flash-preview-native-audio-dialog
  - gemini-2.5-flash-exp-native-audio-thinking-dialog
  - gemini-2.5-flash-image-preview
  - gemini-2.5-flash-preview-tts
  - gemini-2.5-pro-preview-tts
  - gemini-2.0-flash-preview-image-generation

OLLAMA:
  (No models available)
```

### Lockfile workflow

```bash
llmring lock init
```

```
Creating lockfile with registry-based defaults...
✅ Created lockfile with registry data
✅ Created lockfile: /Users/juanre/prj/llmring-all/llmring.ai/dist/docs-run/llmring.lock

Default aliases:
  long_context → openai:gpt-4.1-2025-04-14
  low_cost → openai:gpt-5-nano
  fast → openai:gpt-5-nano
  json_mode → openai:gpt-4.1
  mcp_agent → anthropic:claude-3-haiku-20240307
  deep → anthropic:claude-3-haiku-20240307
  balanced → anthropic:claude-sonnet-4-20250514
  pdf_reader → anthropic:claude-3-haiku-20240307

💡 Use 'llmring lock chat' for conversational lockfile management
```

```bash
llmring lock chat
```

Starts an interactive conversational configuration session for intelligent lockfile management. The embedded `advisor` alias uses reasoning models with fallbacks so recommendations stay reliable.

```bash
llmring aliases
```

```
Aliases in profile 'default':
  long_context → openai:gpt-4.1-2025-04-14
  low_cost → openai:gpt-5-nano
  fast → openai:gpt-5-nano
  json_mode → openai:gpt-4.1
  mcp_agent → anthropic:claude-3-haiku-20240307
  deep → anthropic:claude-3-haiku-20240307
  balanced → anthropic:claude-sonnet-4-20250514
  pdf_reader → anthropic:claude-3-haiku-20240307
```

```bash
llmring bind summarizer anthropic:claude-3-5-haiku-20241022
```

```
✅ Bound 'summarizer' → 'anthropic:claude-3-5-haiku-20241022' in profile 'default'
```

```bash
llmring aliases
```

```
Aliases in profile 'default':
  long_context → openai:gpt-4.1-2025-04-14
  low_cost → openai:gpt-5-nano
  fast → openai:gpt-5-nano
  json_mode → openai:gpt-4.1
  mcp_agent → anthropic:claude-3-haiku-20240307
  deep → anthropic:claude-3-haiku-20240307
  balanced → anthropic:claude-sonnet-4-20250514
  pdf_reader → anthropic:claude-3-haiku-20240307
  summarizer → anthropic:claude-3-5-haiku-20241022
```

## Registry Integration

- Fetches model capabilities/pricing from [https://llmring.github.io/registry/](https://llmring.github.io/registry/)
- Models keyed as `provider:model`
- Fields include `max_input_tokens`, `max_output_tokens`, `dollars_per_million_tokens_*`, capability flags (e.g., `supports_parallel_tool_calls`), reasoning metadata (`is_reasoning_model`, `min_recommended_reasoning_tokens`, `supports_thinking`), and cost extras for caching/long-context tiers (`dollars_per_million_tokens_cached_input`, `dollars_per_million_tokens_cache_write_*`, `long_context_threshold_tokens`, etc.)
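A minimal sketch of reading those fields through the `RegistryClient` documented in the API reference below; the `llmring.registry` import path is an assumption:

```python
# Import path is an assumption; RegistryClient itself is documented below.
from llmring.registry import RegistryClient

client = RegistryClient()

# Fetch current model entries for one provider and inspect limits and pricing.
models = await client.fetch_current_models("anthropic")
for m in models:
    print(
        f"anthropic:{m.model_name}",
        m.max_input_tokens,
        m.dollars_per_million_tokens_input,
        m.dollars_per_million_tokens_output,
    )
```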
## Receipts

- Local: library can calculate costs and create unsigned receipt objects
- Canonical: signed by server using Ed25519 over RFC 8785 JCS

## Environment

```bash
export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export GEMINI_API_KEY=...  # or GOOGLE_API_KEY=... or GOOGLE_GEMINI_API_KEY=...
export LLMRING_PROFILE=prod
export LLMRING_SERVER_URL=http://localhost:8000  # Optional server (defaults to SaaS when unset)
export LLMRING_API_KEY=proj_test_key  # Required for stats/export/receipts
```

## Security

- Lockfile contains no secrets
- API keys via environment only

## Links

- GitHub: https://github.com/juanre/llmring
- PyPI: https://pypi.org/project/llmring

---

## API Reference

### Schemas

`Message`

```json
{
  "role": "system | user | assistant | tool",
  "content": "string or structured content",
  "tool_calls": [
    { "id": "...", "type": "...", "function": { "name": "...", "arguments": { } } }
  ],
  "tool_call_id": "optional",
  "timestamp": "ISO-8601 optional"
}
```

`LLMRequest`

```json
{
  "messages": [ Message ],
  "model": "provider:model or alias",
  "temperature": 0.0,
  "max_tokens": 1024,
  "reasoning_tokens": 3000,  // Optional override; defaults to registry-recommended budget
  "response_format": { },
  "tools": [ { } ],
  "tool_choice": "auto | none | any | { function: name }",
  "cache": { },
  "metadata": { },
  "json_response": true,
  "stream": false,
  "extra_params": { }
}
```

`LLMResponse`

```json
{
  "content": "string",
  "model": "provider:model",
  "parsed": { },
  "usage": {
    "prompt_tokens": 5200,
    "completion_tokens": 800,
    "reasoning_tokens": 4000,          // Present when provider reports thinking budgets
    "cache_read_input_tokens": 3000,   // Discounted cached reads
    "cache_creation_5m_tokens": 1200,  // Prompt cache write (5 minute TTL)
    "cache_creation_1h_tokens": 0,
    "total_tokens": 10000,
    "cost": 0.00483,                   // Full cost with all add-ons
    "cost_breakdown": {
      "input": 0.00033,
      "cache_read": 0.00006,
      "cache_write_5m": 0.00018,
      "cache_write_1h": 0.0,
      "long_context_input": 0.00026,   // Only present when hitting long-context tiers
      "output": 0.00026,
      "reasoning": 0.00400
    }
  },
  "finish_reason": "stop | length | tool_calls | ...",
  "tool_calls": [ { } ]
}
```

`StreamChunk`

```json
{
  "delta": "partial text",
  "model": "provider:model",
  "finish_reason": null,
  "usage": null,
  "tool_calls": [ ]
}
```

### Class: LLMRing

Constructor:

```python
LLMRing(origin: str = "llmring", registry_url: str | None = None, lockfile_path: str | None = None)
```

Methods:

- `async chat(request: LLMRequest, profile: str | None = None) -> LLMResponse`
  - Resolves aliases via lockfile, routes the call to the provider, enriches `usage.cost` when registry pricing is available, and records a local unsigned receipt if a lockfile is present.
- `async chat_with_alias(alias_or_model: str, messages: list, temperature: float | None = None, max_tokens: int | None = None, profile: str | None = None, **kwargs) -> LLMResponse`
- `resolve_alias(alias_or_model: str, profile: str | None = None) -> str`
- `bind_alias(alias: str, model: str, profile: str | None = None) -> None`
- `unbind_alias(alias: str, profile: str | None = None) -> None`
- `list_aliases(profile: str | None = None) -> dict[str, str]`
- `init_lockfile(force: bool = False) -> None`
- `get_available_models() -> dict[str, list[str]]`
- `get_model_info(model: str) -> dict`
- `async get_enhanced_model_info(model: str) -> dict`
- `async validate_context_limit(request: LLMRequest) -> str | None`
- `async calculate_cost(response: LLMResponse) -> dict | None`
- `async close() -> None`
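A short sketch combining a few of these methods (bind, resolve, list, then chat); it assumes `chat_with_alias` accepts the same `Message` objects used elsewhere on this page:

```python
from llmring import LLMRing, Message

ring = LLMRing()

# Bind an alias programmatically, then use it like any model reference.
ring.bind_alias("tagger", "anthropic:claude-3-5-haiku-20241022")
print(ring.resolve_alias("tagger"))  # -> "anthropic:claude-3-5-haiku-20241022"
print(ring.list_aliases())

response = await ring.chat_with_alias(
    "tagger",
    messages=[Message(role="user", content="Tag this sentence.")],
    max_tokens=100,
)
print(response.content)

await ring.close()
```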
### Lockfile API

Classes:

- `AliasBinding { alias, provider, model, constraints? }` with a `model_ref` property
- `ProfileConfig { name, bindings[], registry_versions{} }`
  - `set_binding(alias, model_ref, constraints?)`
  - `remove_binding(alias) -> bool`
  - `get_binding(alias) -> AliasBinding | None`
- `Lockfile { version, created_at, updated_at, default_profile, profiles{} }`
  - `@classmethod create_default() -> Lockfile`
  - `save(path: Path | None = None) -> None`
  - `@classmethod load(path: Path | None = None) -> Lockfile`
  - `@classmethod find_lockfile(start_path: Path | None = None) -> Path | None`
  - `calculate_digest() -> str`
  - `get_profile(name: str | None = None) -> ProfileConfig`
  - `set_binding(alias, model_ref, profile: str | None = None, constraints: dict | None = None)`
  - `resolve_alias(alias, profile: str | None = None) -> str | None`

### Registry Client

- `RegistryModel` fields: `provider`, `model_name`, `display_name`, `description?`, `max_input_tokens?`, `max_output_tokens?`, `dollars_per_million_tokens_input?`, `dollars_per_million_tokens_output?`, `supports_vision`, `supports_function_calling`, `supports_json_mode`, `supports_parallel_tool_calls`, `is_active`, `added_date?`, `deprecated_date?`
- `RegistryClient(registry_url: str | None = None, cache_dir: Path | None = None)`
  - `async fetch_current_models(provider: str) -> list[RegistryModel]`
  - `async fetch_version(provider: str, version: int) -> RegistryVersion`
  - `async get_current_version(provider: str) -> int`
  - `async check_drift(provider: str, pinned_version: int) -> dict`
  - `async validate_model(provider: str, model_name: str) -> bool`
  - `clear_cache() -> None`

### Receipts

- `Receipt` fields: `receipt_id`, `timestamp`, `alias`, `profile`, `lock_digest`, `provider`, `model`, `prompt_tokens`, `completion_tokens`, `total_tokens`, `input_cost`, `output_cost`, `total_cost`, `signature?`
- `ReceiptSigner`: `generate_keypair()`, `load_private_key()`, `load_public_key()`, `sign_receipt(receipt) -> str`, `verify_receipt(receipt, public_key) -> bool`, `export_private_key()`, `export_public_key()`
- `ReceiptGenerator`: `generate_receipt(...) -> Receipt`, `calculate_costs(provider, model, prompt_tokens, completion_tokens, model_info?) -> dict`

### Providers

All providers implement `BaseLLMProvider`:

```python
async def chat(messages, model, temperature=None, max_tokens=None, response_format=None,
               tools=None, tool_choice=None, json_response=None, cache=None,
               stream=False, extra_params=None) -> LLMResponse | AsyncIterator[StreamChunk]
async def validate_model(model: str) -> bool
async def get_supported_models() -> list[str]
def get_default_model() -> str
```

### Structured Output (Unified)

LLMRing provides a single interface for JSON Schema across providers:

```python
request = LLMRequest(
    model="balanced",
    messages=[Message(role="user", content="Generate a person")],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {"type": "object", "properties": {"name": {"type": "string"}}}
        },
        "strict": True
    }
)
response = await ring.chat(request)
print(response.parsed)
```

OpenAI uses native JSON Schema; Anthropic/Gemini use native tools/functions under the hood; Ollama uses best-effort JSON with one repair attempt.
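Finally, a streaming sketch: the `LLMRequest` schema above includes `stream`, and the provider interface returns `AsyncIterator[StreamChunk]`, so this assumes `LLMRing.chat` passes that iterator through when `stream=True`:

```python
from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()
request = LLMRequest(
    model="fast",
    messages=[Message(role="user", content="Write a haiku about lockfiles")],
    stream=True,  # From the LLMRequest schema; iterator passthrough is an assumption
)

# Assumption: with stream=True, chat() resolves to an async iterator of StreamChunk.
stream = await ring.chat(request)
async for chunk in stream:
    if chunk.delta:
        print(chunk.delta, end="", flush=True)
print()
```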