LLMRing Python Library
llms.txt
GitHub: https://github.com/juanre/llmring
Python library to talk to OpenAI, Anthropic, Google, and Ollama with a unified interface. Configuration is stored in a version-controlled llmring.lock file (local to each codebase). Models are accessed via aliases that resolve to ordered model lists (automatic fallbacks + reasoning-aware budgets).
Modes of Operation
- Lockfile-Only: Works completely standalone with just your llmring.lock. No backend required, no logging, no MCP persistence.
- With Server: Connect to a self-hosted llmring-server for receipts, usage tracking, and MCP persistence.
Installation
uv add llmring
Quick Start
llmring lock init
llmring lock chat # For intelligent conversational configuration
llmring bind summarizer anthropic:claude-3-5-haiku-20241022
llmring aliases
from llmring import LLMRing, LLMRequest, Message
ring = LLMRing()
request = LLMRequest(
    messages=[Message(role="user", content="Summarize this text")],
    model="summarizer"
)
response = await ring.chat(request)
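The snippets in this document use top-level await for brevity. In a plain script, wrap the call in an async function and run it with asyncio, for example:
import asyncio
from llmring import LLMRing, LLMRequest, Message

async def main():
    ring = LLMRing()
    request = LLMRequest(
        messages=[Message(role="user", content="Summarize this text")],
        model="summarizer",  # alias defined in llmring.lock
    )
    response = await ring.chat(request)
    print(response.content)
    await ring.close()

asyncio.run(main())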
Use comma-separated model IDs (e.g., anthropic:claude-3-5-haiku-20241022,openai:gpt-4o-mini) to add fallbacks in priority order.
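For example, to bind an alias with a fallback and confirm it (model IDs as used elsewhere in this document):
llmring bind summarizer anthropic:claude-3-5-haiku-20241022,openai:gpt-4o-mini
llmring aliases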
Lockfile
- Authoritative config; commit to VCS
- Optional profiles for different environments: dev, staging, prod
- Pinned registry versions per provider
- Aliases map to ordered models arrays (the first item is primary, the rest are fallbacks)
# Registry version pinning (optional)
# (IDs change as providers publish updates)
[registry_versions]
openai = 186
anthropic = 102
# Default bindings (arrays support automatic fallback)
[[bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-5-haiku-20241022"]
[[bindings]]
alias = "advisor"
models = [
"openai:gpt-5-2025-08-07",
"google:gemini-2.5-pro",
"anthropic:claude-opus-4-1-20250805"
]
[[bindings]]
alias = "balanced"
models = ["anthropic:claude-3-5-sonnet-20250219", "openai:gpt-4o"]
LLMRing resolves aliases to the first available model in the list; subsequent entries are tried automatically if the primary model is unavailable or rate-limited.
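To see what an alias resolves to without sending a request, you can use resolve_alias from the API reference below (a minimal sketch; the returned string is a provider:model reference):
from llmring import LLMRing

ring = LLMRing()
print(ring.resolve_alias("advisor"))                     # e.g. "openai:gpt-5-2025-08-07"
print(ring.resolve_alias("summarizer", profile="prod"))  # profile-specific resolution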
Reasoning Models
Version 1.1 introduces reasoning-aware orchestration. The registry tags reasoning-capable models (OpenAI o1/o3/gpt-5*, Anthropic Opus 4.1, etc.) with is_reasoning_model plus a min_recommended_reasoning_tokens hint. The client uses that metadata to choose the right API parameters automatically.
from llmring import LLMRing, LLMRequest, Message
ring = LLMRing()
request = LLMRequest(
    model="advisor",  # Alias backed by reasoning models
    messages=[Message(role="user", content="Draft a migration plan")],
    reasoning_tokens=3500,  # Optional override (otherwise uses registry hint)
    max_tokens=800  # Applied as max_completion_tokens for OpenAI reasoning models
)
response = await ring.chat(request)
print(response.usage)
- Skip reasoning_tokens to accept the registry recommendation for each model.
- OpenAI reasoning requests automatically switch to max_completion_tokens; non-reasoning models continue using max_tokens.
- Usage and receipts include the additional reasoning budget so cost reports stay accurate.
Cost Breakdown & Prompt Caching
Version 1.2 upgrades cost tracking with per-feature breakdowns:
- Cache reads via cache_read_input_tokens (or cached_tokens) are billed with provider-specific discounted rates.
- Cache writes capture both 5-minute and 1-hour TTL buckets (cache_creation_5m_tokens, cache_creation_1h_tokens) plus any generic totals (cache_creation_input_tokens).
- Long-context tiers automatically kick in once prompt_tokens - cache_read_input_tokens exceeds the registry's long_context_threshold_tokens.
- Thinking tokens use supports_thinking and dollars_per_million_tokens_output_thinking to cost reasoning output separately.
CostCalculator.add_cost_to_response now injects a cost_breakdown dict into every response so usage logs, receipts, and dashboards can explain spend at a glance.
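Continuing the earlier examples, the breakdown can be read straight off the usage payload (a sketch assuming usage is exposed as a dict, matching the LLMResponse schema below; use attribute access if your version returns an object):
response = await ring.chat(request)
usage = response.usage
print(f"total: ${usage['cost']:.5f}")
for item, dollars in usage.get("cost_breakdown", {}).items():
    print(f"  {item}: ${dollars:.5f}")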
CLI
llmring lock init [--force]
llmring lock chat # conversational configuration
llmring bind <alias> <provider:model[,provider:model...]> [--profile <name>]
llmring aliases [--profile <name>]
llmring lock validate
llmring lock bump-registry
llmring list [--provider <name>]
llmring info <provider:model> [--json]
llmring stats|export # requires server
llmring mcp chat [--server URL] # MCP interactive chat
llmring mcp servers list # list MCP servers
llmring mcp tools # list MCP tools
CLI Output
llmring --help
llmring providers
llmring list
Lockfile workflow
llmring lock init
llmring lock chat
Starts an interactive conversational configuration session for intelligent lockfile management.
The embedded advisor alias uses reasoning models with fallbacks so recommendations stay reliable.
llmring aliases
llmring bind summarizer anthropic:claude-3-5-haiku-20241022
llmring aliases
Registry Integration
- Fetches model capabilities/pricing from https://llmring.github.io/registry/
- Models keyed as provider:model
- Fields include max_input_tokens, max_output_tokens, dollars_per_million_tokens_*, capability flags (e.g., supports_parallel_tool_calls), reasoning metadata (is_reasoning_model, min_recommended_reasoning_tokens, supports_thinking), and cost extras for caching/long-context tiers (dollars_per_million_tokens_cached_input, dollars_per_million_tokens_cache_write_*, long_context_threshold_tokens, etc.).
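Registry data is also reachable from the client via the methods listed in the API reference below (a minimal sketch; the exact keys in the returned dicts depend on the registry entry):
from llmring import LLMRing

ring = LLMRing()
info = ring.get_model_info("anthropic:claude-3-5-haiku-20241022")
print(info)

enhanced = await ring.get_enhanced_model_info("openai:gpt-4o-mini")
print(enhanced.get("max_input_tokens"), enhanced.get("dollars_per_million_tokens_input"))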
Receipts
- Local: library can calculate costs and create unsigned receipt objects
- Canonical: signed by server using Ed25519 over RFC 8785 JCS
Environment
export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export GEMINI_API_KEY=... # or GOOGLE_API_KEY=... or GOOGLE_GEMINI_API_KEY=...
export LLMRING_PROFILE=prod
export LLMRING_SERVER_URL=http://localhost:8000 # Optional server (defaults to SaaS when unset)
export LLMRING_API_KEY=proj_test_key # Required for stats/export/receipts
Security
- Lockfile contains no secrets
- API keys via environment only
API Reference
Schemas
Message
{
"role": "system | user | assistant | tool",
"content": "string or structured content",
"tool_calls": [ { "id": "...", "type": "...", "function": { "name": "...", "arguments": { } } } ],
"tool_call_id": "optional",
"timestamp": "ISO-8601 optional"
}
LLMRequest
{
"messages": [ Message ],
"model": "provider:model or alias",
"temperature": 0.0,
"max_tokens": 1024,
"reasoning_tokens": 3000, // Optional override; defaults to registry-recommended budget
"response_format": { },
"tools": [ { } ],
"tool_choice": "auto | none | any | { function: name }",
"cache": { },
"metadata": { },
"json_response": true,
"stream": false,
"extra_params": { }
}
LLMResponse
{
"content": "string",
"model": "provider:model",
"parsed": { },
"usage": {
"prompt_tokens": 5200,
"completion_tokens": 800,
"reasoning_tokens": 4000, // Present when provider reports thinking budgets
"cache_read_input_tokens": 3000, // Discounted cached reads
"cache_creation_5m_tokens": 1200, // Prompt cache write (5 minute TTL)
"cache_creation_1h_tokens": 0,
"total_tokens": 10000,
"cost": 0.00483, // Full cost with all add-ons
"cost_breakdown": {
"input": 0.00033,
"cache_read": 0.00006,
"cache_write_5m": 0.00018,
"cache_write_1h": 0.0,
"long_context_input": 0.00026, // Only present when hitting long-context tiers
"output": 0.00026,
"reasoning": 0.00400
}
},
"finish_reason": "stop | length | tool_calls | ...",
"tool_calls": [ { } ]
}
StreamChunk
{
"delta": "partial text",
"model": "provider:model",
"finish_reason": null,
"usage": null,
"tool_calls": [ ]
}
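A minimal consumption sketch, assuming you hold an AsyncIterator[StreamChunk] (the provider chat signature below returns one when stream=True; how the high-level client exposes streaming is not spelled out here):
async def print_stream(chunks):
    # chunks: AsyncIterator[StreamChunk]
    async for chunk in chunks:
        if chunk.delta:
            print(chunk.delta, end="", flush=True)
        if chunk.finish_reason:
            print()  # the final chunk typically carries finish_reason and usage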
Class: LLMRing
Constructor:
LLMRing(origin: str = "llmring", registry_url: str | None = None, lockfile_path: str | None = None)
Methods:
- async chat(request: LLMRequest, profile: str | None = None) -> LLMResponse
  Resolves aliases via lockfile, routes the call to the provider, enriches usage.cost when registry pricing is available, records a local unsigned receipt if a lockfile is present.
- async chat_with_alias(alias_or_model: str, messages: list, temperature: float | None = None, max_tokens: int | None = None, profile: str | None = None, **kwargs) -> LLMResponse
- resolve_alias(alias_or_model: str, profile: str | None = None) -> str
- bind_alias(alias: str, model: str, profile: str | None = None) -> None
- unbind_alias(alias: str, profile: str | None = None) -> None
- list_aliases(profile: str | None = None) -> dict[str, str]
- init_lockfile(force: bool = False) -> None
- get_available_models() -> dict[str, list[str]]
- get_model_info(model: str) -> dict
- async get_enhanced_model_info(model: str) -> dict
- async validate_context_limit(request: LLMRequest) -> str | None
- async calculate_cost(response: LLMResponse) -> dict | None
- async close() -> None
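A short sketch of programmatic alias management with the methods above (model IDs are illustrative):
from llmring import LLMRing

ring = LLMRing()
ring.bind_alias("summarizer", "anthropic:claude-3-5-haiku-20241022")
print(ring.list_aliases())            # {"summarizer": "anthropic:claude-3-5-haiku-20241022", ...}
print(ring.resolve_alias("summarizer"))
ring.unbind_alias("summarizer")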
Lockfile API
Classes:
- AliasBinding { alias, provider, model, constraints? } with a model_ref property.
- ProfileConfig { name, bindings[], registry_versions{} }
  - set_binding(alias, model_ref, constraints?)
  - remove_binding(alias) -> bool
  - get_binding(alias) -> AliasBinding | None
- Lockfile { version, created_at, updated_at, default_profile, profiles{} }
  - @classmethod create_default() -> Lockfile
  - save(path: Path | None = None) -> None
  - @classmethod load(path: Path | None = None) -> Lockfile
  - @classmethod find_lockfile(start_path: Path | None = None) -> Path | None
  - calculate_digest() -> str
  - get_profile(name: str | None = None) -> ProfileConfig
  - set_binding(alias, model_ref, profile: str | None = None, constraints: dict | None = None)
  - resolve_alias(alias, profile: str | None = None) -> str | None
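A minimal sketch of working with the lockfile directly (the import path llmring.lockfile is an assumption; the methods are the ones listed above):
from llmring.lockfile import Lockfile  # assumed module path

lockfile = Lockfile.load()             # or Lockfile.create_default()
lockfile.set_binding("summarizer", "anthropic:claude-3-5-haiku-20241022", profile="dev")
print(lockfile.resolve_alias("summarizer", profile="dev"))
print(lockfile.calculate_digest())
lockfile.save()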
Registry Client
- RegistryModel fields: provider, model_name, display_name, description?, max_input_tokens?, max_output_tokens?, dollars_per_million_tokens_input?, dollars_per_million_tokens_output?, supports_vision, supports_function_calling, supports_json_mode, supports_parallel_tool_calls, is_active, added_date?, deprecated_date?.
- RegistryClient(registry_url: str | None = None, cache_dir: Path | None = None)
  - async fetch_current_models(provider: str) -> list[RegistryModel]
  - async fetch_version(provider: str, version: int) -> RegistryVersion
  - async get_current_version(provider: str) -> int
  - async check_drift(provider: str, pinned_version: int) -> dict
  - async validate_model(provider: str, model_name: str) -> bool
  - clear_cache() -> None
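A minimal sketch of querying the registry directly (the import path llmring.registry is an assumption; the signatures are the ones listed above):
from llmring.registry import RegistryClient  # assumed module path

client = RegistryClient()
models = await client.fetch_current_models("anthropic")
for m in models:
    print(m.provider, m.model_name, m.max_input_tokens)

print(await client.get_current_version("openai"))
print(await client.check_drift("openai", pinned_version=186))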
Receipts
- Receipt fields: receipt_id, timestamp, alias, profile, lock_digest, provider, model, prompt_tokens, completion_tokens, total_tokens, input_cost, output_cost, total_cost, signature?.
- ReceiptSigner: generate_keypair(), load_private_key(), load_public_key(), sign_receipt(receipt) -> str, verify_receipt(receipt, public_key) -> bool, export_private_key(), export_public_key().
- ReceiptGenerator: generate_receipt(...) -> Receipt, calculate_costs(provider, model, prompt_tokens, completion_tokens, model_info?) -> dict.
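A heavily hedged sketch of the signing flow (the import path, key handling, and receipt construction are assumptions; canonical receipts are signed by the server, as noted in the Receipts section above):
from llmring.receipts import ReceiptGenerator, ReceiptSigner  # assumed module path

generator = ReceiptGenerator()
receipt = generator.generate_receipt(...)   # arguments elided; see generate_receipt(...) above

signer = ReceiptSigner()
signer.generate_keypair()                   # key storage/return shape is an assumption
receipt.signature = signer.sign_receipt(receipt)   # Ed25519 over RFC 8785 JCS
assert signer.verify_receipt(receipt, signer.load_public_key())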
Providers
All providers implement BaseLLMProvider:
async def chat(messages, model, temperature=None, max_tokens=None, response_format=None, tools=None, tool_choice=None, json_response=None, cache=None, stream=False, extra_params=None) -> LLMResponse | AsyncIterator[StreamChunk]
async def validate_model(model: str) -> bool
async def get_supported_models() -> list[str]
def get_default_model() -> str
Structured Output (Unified)
LLMRing provides a single interface for JSON Schema across providers:
request = LLMRequest(
    model="balanced",
    messages=[Message(role="user", content="Generate a person")],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "person", "schema": {"type": "object", "properties": {"name": {"type": "string"}}}},
        "strict": True
    }
)
response = await ring.chat(request)
print(response.parsed)
OpenAI uses native JSON Schema; Anthropic/Gemini use native tools/functions under the hood; Ollama uses best‑effort JSON with one repair attempt.