# Home - LLM Documentation

URL: https://llmring.ai/

This is the LLM-readable version of the Home page.

---

# LLMRing

One interface to run them all ...

LLMRing is an open-source, provider-agnostic Python library for talking to LLMs. It lets you easily manage which model you use for any task with aliases, use a single interface for all providers, and track usage and cost via an optional server.

Your aliases are stored in a version-controlled `llmring.lock` file, making your model choices explicit, easy to change, and easy to share. Your API calls go directly to OpenAI, Anthropic, Google, or Ollama. The call's metadata can optionally be logged to a [server managed by you](/docs/server/).

## Components

- **[Library (llmring)](/docs/llmring/)** - Python package for unified LLM access with built-in MCP support
- **[Server (llmring-server)](/docs/server/)** - Optional backend for usage tracking, receipts, and MCP persistence
- **[Registry](/docs/registry/)** - Versioned, human-validated database of model capabilities and pricing

## Quick Start

Install and create a basic lockfile:

```bash
uv add llmring
```

```bash
llmring lock init
```

This creates `llmring.lock` with sensible defaults and pinned registry versions.

For intelligent, conversational configuration that analyzes the live registry and recommends optimal aliases (e.g., `fast`, `balanced`, `deep`), use:

```bash
llmring lock chat
```

## Lockfile + Aliases

Your configuration lives in `llmring.lock`, a version-controlled file that makes your AI stack reproducible:

```toml
# llmring.lock (excerpt)

# Registry version pinning (optional)
[registry_versions]
openai = 142
anthropic = 89
google = 27

# Default bindings
[[bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-haiku"]

[[bindings]]
alias = "pdf_converter"
models = ["openai:gpt-4o-mini"]

[[bindings]]
alias = "balanced"
models = ["anthropic:claude-3-5-sonnet", "openai:gpt-4o"]  # With fallback
```

Use aliases in your code:

```python
from llmring import LLMRing, Message

ring = LLMRing()  # Loads from llmring.lock
response = await ring.chat("summarizer", messages=[
    Message(role="user", content="Summarize this document...")
])
```

## Unified Structured Output

LLMRing provides one interface for structured output across all providers.
Use a JSON Schema with `response_format`, and LLMRing adapts it per provider:

```python
from llmring import LLMRing
from llmring.schemas import LLMRequest, Message

ring = LLMRing()
request = LLMRequest(
    model="balanced",
    messages=[Message(role="user", content="Generate a person")],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"]
            }
        },
        "strict": True
    }
)

response = await ring.chat(request)
print(response.content)  # valid JSON
print(response.parsed)   # dict
```

**How it works per provider:**

- **OpenAI**: Native JSON Schema strict mode
- **Anthropic**: Tool-based extraction with validation
- **Google Gemini**: FunctionDeclaration with schema mapping
- **Ollama**: Best-effort JSON with automatic repair

## CLI Commands

The configuration lives in your lockfile:

```bash
# Create basic lockfile with defaults
llmring lock init

# Intelligent conversational configuration (recommended)
llmring lock chat

# Bind aliases locally (escape hatch)
llmring bind pdf_converter openai:gpt-4o-mini

# Validate against registry
llmring lock validate

# Update registry versions
llmring lock bump-registry
```

## Two Modes of Operation

### 1. Lockfile-Only (No Backend)

Works completely standalone with just your `llmring.lock` file. Safe, explicit configuration per codebase. No cost tracking, no logging, no MCP persistence.

### 2. With Server (Self-Hosted)

Add receipts, usage tracking, and MCP tool/resource persistence by connecting to your own `llmring-server` instance. See [Server Docs](/docs/server/) for endpoints, headers, and deployment.

## The Open Registry

Model information comes from versioned, per-provider registries:

- Current: [https://llmring.github.io/registry/openai/models.json](https://llmring.github.io/registry/openai/models.json)
- Versioned: [https://llmring.github.io/registry/openai/v/142/models.json](https://llmring.github.io/registry/openai/v/142/models.json)

Each provider's registry is versioned independently. Your lockfile records these versions to track drift:

```toml
[registry_versions]
openai = 142     # Registry snapshot when you last updated
anthropic = 89   # What the registry knew at version 89
```

Note: These versions track what the registry knew at that point, not the actual model behavior. Providers can change prices and limits anytime - the registry helps you detect when things have drifted from your expectations.

See [Registry Docs](/docs/registry/) for schema and curation workflow.
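If you want to inspect drift yourself outside the CLI, the published registry files are plain JSON. The sketch below is a minimal illustration, assuming the top-level `[registry_versions]` lockfile layout shown above and Python 3.11+ for `tomllib`; `llmring lock validate` and `llmring lock bump-registry` remain the supported workflow.

```python
import json
import tomllib  # Python 3.11+
import urllib.request

REGISTRY_BASE = "https://llmring.github.io/registry"

# Read pinned registry versions from the lockfile (layout as in the excerpt above).
with open("llmring.lock", "rb") as f:
    pinned = tomllib.load(f).get("registry_versions", {})

# Compare each pinned version with the currently published registry version.
for provider, pinned_version in pinned.items():
    url = f"{REGISTRY_BASE}/{provider}/models.json"
    with urllib.request.urlopen(url) as resp:
        current_version = json.load(resp)["version"]
    status = "up to date" if current_version == pinned_version else "drifted"
    print(f"{provider}: pinned v{pinned_version}, registry v{current_version} ({status})")
```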
## Profiles for Different Environments

Support multiple configurations in one lockfile:

```toml
# llmring.lock (profiles excerpt)

# Production: High quality with fallbacks
[profiles.prod]

[[profiles.prod.bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-haiku"]

[[profiles.prod.bindings]]
alias = "analyzer"
models = ["openai:gpt-4", "anthropic:claude-3-5-sonnet"]

# Development: Cheaper models
[profiles.dev]

[[profiles.dev.bindings]]
alias = "summarizer"
models = ["openai:gpt-4o-mini"]

[[profiles.dev.bindings]]
alias = "analyzer"
models = ["openai:gpt-4o-mini"]
```

Switch profiles via environment:

```bash
export LLMRING_PROFILE=prod
python app.py
```

## CLI Workflow

Core lockfile management:

```bash
# Create basic lockfile with defaults
llmring lock init

# Intelligent conversational configuration (recommended)
llmring lock chat

# Bind aliases (updates lockfile)
llmring bind summarizer anthropic:claude-3-haiku

# List aliases from lockfile
llmring aliases

# Validate against registry
llmring lock validate

# Update registry versions
llmring lock bump-registry
```

MCP operations (requires backend):

```bash
# Connect to any MCP server for interactive chat
llmring mcp chat --server "stdio://python -m your_mcp_server"

# List registered MCP servers
llmring mcp servers list

# Register new MCP server
llmring mcp register calculator http://calculator-mcp:8080

# List available tools
llmring mcp tools

# Execute a tool
llmring mcp execute calculator.add '{"a": 5, "b": 3}'
```

With a server connected:

```bash
# View usage stats (requires server)
llmring stats

# Export receipts (requires server)
llmring export
```

## Environment Variables

```bash
# LLM provider keys (required)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Gemini supports any of these
export GEMINI_API_KEY="..."
# or
export GOOGLE_API_KEY="..."
# or
export GOOGLE_GEMINI_API_KEY="..."

# Optional profile selection
export LLMRING_PROFILE="prod"

# Optional server connection
export LLMRING_API_URL="http://localhost:8000"
```

## Why LLMRing

- **Lockfile**: Version control your AI configuration with reproducible deployments
- **Task-oriented**: Think in terms of tasks, not model IDs
- **No vendor lock-in**: Works completely without any backend
- **Drift detection**: Track when models change from your expectations
- **MCP Integration**: Full Model Context Protocol support for tool orchestration
- **Flexible**: Use standalone or with an optional self-hosted server for receipts and tracking

## Source Code

Everything is open source on GitHub:

- [llmring](https://github.com/juanre/llmring) - Python package and CLI
- [llmring-server](https://github.com/juanre/llmring-server) - Optional API server
- [registry](https://github.com/llmring/registry) - Model registry source

## License

MIT License. Use it however you want.

---

One interface to run them all
One registry to find them
One API to track them all
And with aliases bind them

---

# Documentation - LLM Documentation

URL: https://llmring.ai/docs/

This is the LLM-readable version of the Documentation page.
---

# LLMRing Documentation

## Getting Started

- **[Common Recipes](/docs/recipes/)** - Practical patterns and examples

## Core Components

- **[Library (llmring)](/docs/llmring/)** - Python library reference and API with built-in MCP support
- **[Server (llmring-server)](/docs/server/)** - Self-hostable backend for signed receipts, usage tracking, and MCP persistence
- **[Registry](/docs/registry/)** - Human-validated model capabilities and pricing database

## Advanced Features

- **[MCP Integration](/docs/mcp/)** - Model Context Protocol for tool orchestration and conversational lockfile management

## Resources

- **[GitHub Repository](https://github.com/juanre/llmring)** - Source code and issues
- **[PyPI Package](https://pypi.org/project/llmring/)** - Python package

---

# LLMRing Library (Python) - LLM Documentation

URL: https://llmring.ai/docs/llmring/

This is the LLM-readable version of the LLMRing Library (Python) page.

---

# LLMRing Python Library

**GitHub**: [https://github.com/juanre/llmring](https://github.com/juanre/llmring)

Python library to talk to OpenAI, Anthropic, Google, and Ollama with a unified interface. Configuration is stored in a version-controlled `llmring.lock` file (local to each codebase). Models are accessed via aliases.

## Modes of Operation

1. **Lockfile-Only**: Works completely standalone with just your `llmring.lock`. No backend required, no logging, no MCP persistence.
2. **With Server**: Connect to a self-hosted `llmring-server` for receipts, usage tracking, and MCP persistence.

## Installation

```bash
uv add llmring
```

## Quick Start

```bash
llmring lock init
```

```bash
llmring lock chat  # For intelligent conversational configuration
```

```bash
llmring bind summarizer anthropic:claude-3-haiku
```

```bash
llmring aliases
```

```python
from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()
request = LLMRequest(
    messages=[Message(role="user", content="Summarize this text")],
    model="summarizer"
)
response = await ring.chat(request)
```

## Lockfile

- Authoritative config; commit to VCS
- Optional profiles for different environments: `dev`, `staging`, `prod`
- Pinned registry versions per provider

```toml
# Registry version pinning (optional)
[registry_versions]
openai = 142
anthropic = 89

# Default bindings
[[bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-haiku-20240307"]

[[bindings]]
alias = "balanced"
models = ["anthropic:claude-3-5-sonnet", "openai:gpt-4o"]
```

## CLI

```bash
llmring lock init [--force]
```

```bash
llmring lock chat  # conversational configuration
```

```bash
llmring bind <alias> <provider:model> [--profile <name>]
```

```bash
llmring aliases [--profile <name>]
```

```bash
llmring lock validate
```

```bash
llmring lock bump-registry
```

```bash
llmring list [--provider <provider>]
```

```bash
llmring info <provider:model> [--json]
```

```bash
llmring stats|export  # requires server
```

```bash
llmring mcp chat [--server URL]  # MCP interactive chat
```

```bash
llmring mcp servers list  # list MCP servers
```

```bash
llmring mcp tools  # list MCP tools
```

### CLI Output

```bash
llmring --help
```

```
usage: cli.py [-h] {lock,bind,aliases,list,chat,info,providers,push,pull,stats,export,register} ...
LLMRing - Unified LLM Service CLI

positional arguments:
  {lock,bind,aliases,list,chat,info,providers,push,pull,stats,export,register}
                        Commands
    lock                Lockfile management
    bind                Bind an alias to a model
    aliases             List aliases from lockfile
    list                List available models
    chat                Send a chat message
    info                Show model information
    providers           List configured providers
    push                Push lockfile aliases to server (X-Project-Key required)
    pull                Pull aliases from server into lockfile (X-Project-Key required)
    stats               Show usage statistics
    export              Export receipts to file
    register            Register with LLMRing server (for SaaS features)

options:
  -h, --help            show this help message and exit
```

```bash
llmring providers
```

```
Configured Providers:
----------------------------------------
  ✓ openai       OPENAI_API_KEY
  ✓ anthropic    ANTHROPIC_API_KEY
  ✗ google       GOOGLE_API_KEY or GEMINI_API_KEY
  ✓ ollama       (not required)
```

```bash
llmring list
```

```
Available Models:
----------------------------------------

ANTHROPIC:
  - claude-3-7-sonnet-20250219
  - claude-3-7-sonnet
  - claude-3-5-sonnet-20241022-v2
  - claude-3-5-sonnet-20241022
  - claude-3-5-sonnet-20240620
  - claude-3-5-sonnet
  - claude-3-5-haiku-20241022
  - claude-3-5-haiku
  - claude-3-opus-20240229
  - claude-3-sonnet-20240229
  - claude-3-haiku-20240307

OPENAI:
  - gpt-4o
  - gpt-4o-mini
  - gpt-4o-2024-08-06
  - gpt-4-turbo
  - gpt-4
  - gpt-3.5-turbo
  - o1
  - o1-mini

OLLAMA:
  - llama3.3:latest
  - llama3.3
  - llama3.2
  - llama3.1
  - llama3
  - mistral
  - mixtral
  - codellama
  - phi3
  - gemma2
  - gemma
  - qwen2.5
  - qwen
```

### Lockfile workflow

```bash
llmring lock init
```

```
✅ Created lockfile: /Users/juanre/prj/llmring-all/llmring.ai/dist/docs-run/llmring.lock

Default bindings:
  long_context → openai:gpt-4-turbo-preview
  low_cost → openai:gpt-3.5-turbo
  json_mode → openai:gpt-4-turbo-preview
  fast → openai:gpt-3.5-turbo
  deep → anthropic:claude-3-opus-20240229
  balanced → anthropic:claude-3-sonnet-20240229
  pdf_reader → anthropic:claude-3-sonnet-20240229
  local → ollama:llama3.3:latest
```

```bash
llmring lock chat
```

Starts an interactive conversational configuration session for intelligent lockfile management.

```bash
llmring aliases
```

```
Aliases in profile 'default':
----------------------------------------
  long_context → openai:gpt-4-turbo-preview
  low_cost → openai:gpt-3.5-turbo
  json_mode → openai:gpt-4-turbo-preview
  fast → openai:gpt-3.5-turbo
  deep → anthropic:claude-3-opus-20240229
  balanced → anthropic:claude-3-sonnet-20240229
  pdf_reader → anthropic:claude-3-sonnet-20240229
  local → ollama:llama3.3:latest
```

```bash
llmring bind summarizer anthropic:claude-3-haiku
```

```
✅ Bound 'summarizer' → 'anthropic:claude-3-haiku' in profile 'default'
```

```bash
llmring aliases
```

```
Aliases in profile 'default':
----------------------------------------
  long_context → openai:gpt-4-turbo-preview
  low_cost → openai:gpt-3.5-turbo
  json_mode → openai:gpt-4-turbo-preview
  fast → openai:gpt-3.5-turbo
  deep → anthropic:claude-3-opus-20240229
  balanced → anthropic:claude-3-sonnet-20240229
  pdf_reader → anthropic:claude-3-sonnet-20240229
  local → ollama:llama3.3:latest
  summarizer → anthropic:claude-3-haiku
```

## Registry Integration

- Fetches model capabilities/pricing from [https://llmring.github.io/registry/](https://llmring.github.io/registry/)
- Models keyed as `provider:model`
- Fields include `max_input_tokens`, `max_output_tokens`, `dollars_per_million_tokens_*`, and capability flags including `supports_parallel_tool_calls` (see the cost sketch below)
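When registry pricing is available, the library fills `usage.cost` on responses. The arithmetic behind that number is simply tokens multiplied by the per-million-token price; the following is a minimal illustration using the registry field names above, not the library's internal code:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int, model_info: dict) -> float:
    """Estimate dollar cost from a registry entry's pricing fields."""
    input_cost = prompt_tokens * model_info["dollars_per_million_tokens_input"] / 1_000_000
    output_cost = completion_tokens * model_info["dollars_per_million_tokens_output"] / 1_000_000
    return input_cost + output_cost

# Example with the published pricing for openai:gpt-4o-mini
# ($0.15 input / $0.60 output per million tokens):
cost = estimate_cost(12_000, 800, {
    "dollars_per_million_tokens_input": 0.15,
    "dollars_per_million_tokens_output": 0.60,
})
print(f"${cost:.4f}")  # $0.0023
```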
## Receipts

- Local: library can calculate costs and create unsigned receipt objects
- Canonical: signed by server using Ed25519 over RFC 8785 JCS

## Environment

```bash
export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export GEMINI_API_KEY=...  # or GOOGLE_API_KEY=... or GOOGLE_GEMINI_API_KEY=...
export LLMRING_PROFILE=prod
```

## Security

- Lockfile contains no secrets
- API keys via environment only

## Links

- GitHub: https://github.com/juanre/llmring
- PyPI: https://pypi.org/project/llmring

---

## API Reference

### Schemas

`Message`

```json
{
  "role": "system | user | assistant | tool",
  "content": "string or structured content",
  "tool_calls": [
    { "id": "...", "type": "...", "function": { "name": "...", "arguments": { } } }
  ],
  "tool_call_id": "optional",
  "timestamp": "ISO-8601 optional"
}
```

`LLMRequest`

```json
{
  "messages": [ Message ],
  "model": "provider:model or alias",
  "temperature": 0.0,
  "max_tokens": 1024,
  "response_format": { },
  "tools": [ { } ],
  "tool_choice": "auto | none | any | { function: name }",
  "cache": { },
  "metadata": { },
  "json_response": true,
  "stream": false,
  "extra_params": { }
}
```

`LLMResponse`

```json
{
  "content": "string",
  "model": "provider:model",
  "parsed": { },
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0,
    "cost": 0.000123,  // if registry pricing available
    "cost_breakdown": { "input": 0.0, "output": 0.0 }
  },
  "finish_reason": "stop | length | tool_calls | ...",
  "tool_calls": [ { } ]
}
```

`StreamChunk`

```json
{
  "delta": "partial text",
  "model": "provider:model",
  "finish_reason": null,
  "usage": null,
  "tool_calls": [ ]
}
```

### Class: LLMRing

Constructor:

```python
LLMRing(origin: str = "llmring", registry_url: str | None = None, lockfile_path: str | None = None)
```

Methods:

- `async chat(request: LLMRequest, profile: str | None = None) -> LLMResponse`
  - Resolves aliases via lockfile, routes the call to the provider, enriches `usage.cost` when registry pricing is available, records a local unsigned receipt if a lockfile is present.
- `async chat_with_alias(alias_or_model: str, messages: list, temperature: float | None = None, max_tokens: int | None = None, profile: str | None = None, **kwargs) -> LLMResponse`
- `resolve_alias(alias_or_model: str, profile: str | None = None) -> str`
- `bind_alias(alias: str, model: str, profile: str | None = None) -> None`
- `unbind_alias(alias: str, profile: str | None = None) -> None`
- `list_aliases(profile: str | None = None) -> dict[str, str]`
- `init_lockfile(force: bool = False) -> None`
- `get_available_models() -> dict[str, list[str]]`
- `get_model_info(model: str) -> dict`
- `async get_enhanced_model_info(model: str) -> dict`
- `async validate_context_limit(request: LLMRequest) -> str | None`
- `async calculate_cost(response: LLMResponse) -> dict | None`
- `async close() -> None`

### Lockfile API

Classes:

- `AliasBinding { alias, provider, model, constraints? }` with `model_ref` property.
- `ProfileConfig { name, bindings[], registry_versions{} }`
  - `set_binding(alias, model_ref, constraints?)`
  - `remove_binding(alias) -> bool`
  - `get_binding(alias) -> AliasBinding | None`
- `Lockfile { version, created_at, updated_at, default_profile, profiles{} }`
  - `@classmethod create_default() -> Lockfile`
  - `save(path: Path | None = None) -> None`
  - `@classmethod load(path: Path | None = None) -> Lockfile`
  - `@classmethod find_lockfile(start_path: Path | None = None) -> Path | None`
  - `calculate_digest() -> str`
  - `get_profile(name: str | None = None) -> ProfileConfig`
  - `set_binding(alias, model_ref, profile: str | None = None, constraints: dict | None = None)`
  - `resolve_alias(alias, profile: str | None = None) -> str | None`

### Registry Client

- `RegistryModel` fields: `provider`, `model_name`, `display_name`, `description?`, `max_input_tokens?`, `max_output_tokens?`, `dollars_per_million_tokens_input?`, `dollars_per_million_tokens_output?`, `supports_vision`, `supports_function_calling`, `supports_json_mode`, `supports_parallel_tool_calls`, `is_active`, `added_date?`, `deprecated_date?`.
- `RegistryClient(registry_url: str | None = None, cache_dir: Path | None = None)`
  - `async fetch_current_models(provider: str) -> list[RegistryModel]`
  - `async fetch_version(provider: str, version: int) -> RegistryVersion`
  - `async get_current_version(provider: str) -> int`
  - `async check_drift(provider: str, pinned_version: int) -> dict`
  - `async validate_model(provider: str, model_name: str) -> bool`
  - `clear_cache() -> None`

### Receipts

- `Receipt` fields: `receipt_id`, `timestamp`, `alias`, `profile`, `lock_digest`, `provider`, `model`, `prompt_tokens`, `completion_tokens`, `total_tokens`, `input_cost`, `output_cost`, `total_cost`, `signature?`.
- `ReceiptSigner`: `generate_keypair()`, `load_private_key()`, `load_public_key()`, `sign_receipt(receipt) -> str`, `verify_receipt(receipt, public_key) -> bool`, `export_private_key()`, `export_public_key()`.
- `ReceiptGenerator`: `generate_receipt(...) -> Receipt`, `calculate_costs(provider, model, prompt_tokens, completion_tokens, model_info?) -> dict`.

### Providers

All providers implement `BaseLLMProvider`:

```python
async def chat(messages, model, temperature=None, max_tokens=None, response_format=None,
               tools=None, tool_choice=None, json_response=None, cache=None,
               stream=False, extra_params=None) -> LLMResponse | AsyncIterator[StreamChunk]
async def validate_model(model: str) -> bool
async def get_supported_models() -> list[str]
def get_default_model() -> str
```

### Structured Output (Unified)

LLMRing provides a single interface for JSON Schema across providers:

```python
request = LLMRequest(
    model="balanced",
    messages=[Message(role="user", content="Generate a person")],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {"type": "object", "properties": {"name": {"type": "string"}}}
        },
        "strict": True
    }
)
response = await ring.chat(request)
print(response.parsed)
```

OpenAI uses native JSON Schema; Anthropic/Gemini use native tools/functions under the hood; Ollama uses best-effort JSON with one repair attempt.

---

# LLMRing Server - LLM Documentation

URL: https://llmring.ai/docs/server/

This is the LLM-readable version of the LLMRing Server page.
---

## LLMRing Server

**GitHub**: [https://github.com/juanre/llmring-server](https://github.com/juanre/llmring-server)

Self-hostable backend that adds optional capabilities: signed receipts, usage logging/stats, conversation persistence, MCP tool/resource/prompt management, and a read-only proxy to the public registry. No alias storage or synchronization - aliases remain local to each codebase's lockfile.

Dual-mode: standalone service or embedded as a library (used by llmring-api).

## Quick Start

```bash
uv run llmring-server --reload
```

Default: http://localhost:8000 with Swagger at `/docs`.

## Authentication

- Project-scoped via `X-API-Key` header (api_key_id as VARCHAR)
- No user management in this service - aliases are local to each codebase's lockfile

## Endpoints (selected)

Public:

- GET `/` – service info
- GET `/health` – DB health
- GET `/registry` or `/registry.json` – aggregated registry
- GET `/receipts/public-key.pem` – current public key
- GET `/receipts/public-keys.json` – list of active/rotated public keys

### Examples

```bash
curl http://localhost:8000/
```

```
{
  "service": "llmring-server",
  "version": "0.1.0",
  "status": "operational",
  "timestamp": "2024-01-15T10:30:45.123Z",
  "endpoints": {
    "health": "/health",
    "registry": "/registry",
    "api": "/api/v1",
    "docs": "/docs"
  }
}
```

```bash
curl http://localhost:8000/registry.json
```

```
{
  "version": "1.0",
  "generated": "2024-01-15T10:30:45.123Z",
  "providers": {
    "openai": {
      "version": 142,
      "models": {
        "gpt-4": {
          "name": "gpt-4",
          "max_input_tokens": 8192,
          "max_output_tokens": 4096,
          "dollars_per_million_input_tokens": 30.0,
          "dollars_per_million_output_tokens": 60.0
        },
        "gpt-4o-mini": {
          "name": "gpt-4o-mini",
          "max_input_tokens": 128000,
          "max_output_tokens": 16384,
          "dollars_per_million_input_tokens": 0.15,
          "dollars_per_million_output_tokens": 0.6
        }
      }
    },
    "anthropic": {
      "version": 89,
      "models": {
        "claude-3-haiku": {
          "name": "claude-3-haiku-20240307",
          "max_input_tokens": 200000,
          "max_output_tokens": 4096,
          "dollars_per_million_input_tokens": 0.25,
          "dollars_per_million_output_tokens": 1.25
        },
        "claude-3-opus": {
          "name": "claude-3-opus-20240229",
          "max_input_tokens": 200000,
          "max_output_tokens": 4096,
          "dollars_per_million_input_tokens": 15.0,
          "dollars_per_million_output_tokens": 75.0
        }
      }
    }
  }
}
```

Project-scoped (require `X-API-Key`):

- Usage: `POST /api/v1/log`, `GET /api/v1/stats`
- Receipts: `POST /api/v1/receipts` (store signed), `GET /api/v1/receipts/{id}`, `POST /api/v1/receipts/issue`
- Conversations: `POST /conversations`, `GET /conversations`, `GET /conversations/{id}`, `POST /conversations/{id}/messages/batch`
- MCP Servers: `POST /api/v1/mcp/servers`, `GET /api/v1/mcp/servers`, `POST /api/v1/mcp/servers/{id}/refresh`
- MCP Tools: `GET /api/v1/mcp/tools`, `POST /api/v1/mcp/tools/{id}/execute`, `GET /api/v1/mcp/tools/{id}/history`
- MCP Resources: `GET /api/v1/mcp/resources`, `GET /api/v1/mcp/resources/{id}/content`
- MCP Prompts: `GET /api/v1/mcp/prompts`, `POST /api/v1/mcp/prompts/{id}/render`
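As a concrete illustration of a project-scoped call, the sketch below posts one usage record to the Usage API described in the next section. The base URL and the `LLMRING_SERVER_KEY` variable name are assumptions for this example; only the endpoint path, the `X-API-Key` header, and the body fields come from the docs.

```python
import json
import os
import urllib.request

# Hedged sketch: log one LLM call against a self-hosted llmring-server.
req = urllib.request.Request(
    "http://localhost:8000/api/v1/log",  # assumed local deployment
    data=json.dumps({
        "provider": "anthropic",
        "model": "claude-3-haiku",
        "input_tokens": 1200,
        "output_tokens": 300,
        "alias": "summarizer",
    }).encode(),
    headers={
        "Content-Type": "application/json",
        "X-API-Key": os.environ["LLMRING_SERVER_KEY"],  # project key; variable name assumed
    },
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # expected shape: { "log_id": ..., "cost": ..., "timestamp": ... }
```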
### Usage API

- `POST /api/v1/log` body `{ provider, model, input_tokens, output_tokens, cached_input_tokens?, alias?, profile?, cost?, latency_ms?, origin?, id_at_origin?, metadata? }` → `{ log_id, cost, timestamp }`
- `GET /api/v1/stats?start_date=&end_date=&group_by=day` → `{ summary, by_day[], by_model{}, by_origin{} }`

### Receipts API

- `POST /api/v1/receipts` body `{ receipt: {...} }` → `{ receipt_id, status: "verified" }`
- `GET /api/v1/receipts/{id}` → full receipt object
- `POST /api/v1/receipts/issue` body is an unsigned receipt → signed receipt (requires server signing key)

## Configuration (env)

- `LLMRING_DATABASE_URL` (required)
- `LLMRING_DATABASE_SCHEMA` (default: llmring)
- `LLMRING_REDIS_URL` (optional, caching)
- `LLMRING_REGISTRY_BASE_URL` (default: https://llmring.github.io/registry/)
- `LLMRING_RECEIPTS_PRIVATE_KEY_B64`, `LLMRING_RECEIPTS_PUBLIC_KEY_B64`, `LLMRING_RECEIPTS_KEY_ID`

## Dual-mode

- Standalone: manages its own DB connections and migrations
- Library: use `create_app(db_manager=..., standalone=False, run_migrations=...)` with an external pool

App factory:

```python
create_app(
    db_manager: AsyncDatabaseManager | None = None,
    run_migrations: bool = True,
    schema: str | None = None,
    settings: Settings | None = None,
    standalone: bool = True,
    include_meta_routes: bool = True,
) -> FastAPI
```

## Receipts

- Ed25519 signature over RFC 8785 JCS
- Canonical receipts are stored/verified by the server

Receipt fields (subset): `id`, `timestamp`, `model`, `alias`, `profile`, `lock_digest`, `key_id`, `tokens { input, output, cached_input }`, `cost { amount, calculation }`, `signature`.

## MCP Integration

The server provides full MCP (Model Context Protocol) persistence:

### MCP Database Schema

- `servers` - MCP server registry (name, URL, transport, capabilities)
- `tools` - Available tools with schemas
- `resources` - Accessible resources (files, URLs, etc.)
- `prompts` - Reusable prompt templates
- `tool_executions` - Execution history with inputs/outputs

All MCP operations are project-scoped via the `X-API-Key` header.

## Security Checklist

- Set explicit CORS origins in production
- Serve behind TLS
- Treat `X-API-Key` as a secret (api_key_id)
- Configure receipts keys to enable verification/issuance
- MCP resources are isolated per project

## Links

- GitHub: https://github.com/juanre/llmring-server

---

# LLMRing Registry - LLM Documentation

URL: https://llmring.ai/docs/registry/

This is the LLM-readable version of the LLMRing Registry page.

---

# LLMRing Open Registry

Public, versioned, **human-validated** registry of model capabilities and pricing, hosted on GitHub Pages. Models are keyed as `provider:model`.

Base URL: [https://llmring.github.io/registry/](https://llmring.github.io/registry/)

**Curation Philosophy**: All published registry files are reviewed and validated by humans. Automation is used only to generate draft candidates; nothing is auto-published, which keeps the data accurate and trustworthy.
## Files

- Current per provider: `/[provider]/models.json`
- Archived versions: `/[provider]/v/[n]/models.json`
- Manifest: `/manifest.json`

## Schema (per provider)

```json
{
  "version": 2,
  "updated_at": "2025-08-20T00:00:00Z",
  "models": {
    "openai:gpt-4o-mini": {
      "provider": "openai",
      "model_name": "gpt-4o-mini",
      "display_name": "GPT-4o Mini",
      "max_input_tokens": 128000,
      "max_output_tokens": 16384,
      "dollars_per_million_tokens_input": 0.15,
      "dollars_per_million_tokens_output": 0.60,
      "supports_vision": true,
      "supports_function_calling": true,
      "supports_json_mode": true,
      "supports_parallel_tool_calls": true,
      "tool_call_format": "json_schema",
      "is_active": true
    }
  }
}
```

## Curation Workflow (Human-Validated, Canonical)

LLMRing's registry prioritizes accuracy through manual review:

1. **Gather sources** (recommended): Collect pricing/docs HTML and PDFs from each provider for an audit trail
2. **Generate draft**: Use automation to create a best-effort draft from sources (automation allowed for drafts only)
3. **Review changes**: Compare draft vs. current published file, field by field; manually adjust as needed
4. **Promote**: Bump the per-provider `version`, set `updated_at`, archive the previous file under `v/[n]/models.json`, and replace the current `models.json`

**Critical**: Published `models.json` files are always human-reviewed. Automation generates candidates only; humans make final decisions to ensure accuracy.

### CLI (from `registry` package)

```bash
# Install browser for PDF fetching (first time only)
uv run playwright install chromium

# Fetch documentation from all providers
uv run llmring-registry fetch --provider all

# Extract model information to create drafts
uv run llmring-registry extract --provider all --timeout 120

# Review draft changes for each provider
uv run llmring-registry review-draft --provider openai
uv run llmring-registry review-draft --provider anthropic
uv run llmring-registry review-draft --provider google

# Accept all changes (after review)
uv run llmring-registry review-draft --provider openai --accept-all

# Promote reviewed file to production and archive
uv run llmring-registry promote --provider openai
```

**Single provider update example:**

```bash
uv run llmring-registry fetch --provider openai
uv run llmring-registry extract --provider openai
uv run llmring-registry review-draft --provider openai --accept-all
uv run llmring-registry promote --provider openai
```

## Clients

- The `llmring` library fetches current models and uses the registry for cost calculation and limit validation.
- The server proxies the registry and may cache responses.

### Client-side Lookup Rules

- The models map is a dictionary keyed by `provider:model`. Clients should prefer O(1) lookups by that key.
- When only `model` is available, clients may attempt fallback keys: `models[model]` or `models[f"{provider}/{model}"]` for legacy data.

## Links

- Live data: [https://llmring.github.io/registry/](https://llmring.github.io/registry/)
- Source: [https://github.com/llmring/registry](https://github.com/llmring/registry)

---

# Model Context Protocol (MCP) - LLM Documentation

URL: https://llmring.ai/docs/mcp/

This is the LLM-readable version of the Model Context Protocol (MCP) page.

---

# Model Context Protocol (MCP)

LLMRing provides comprehensive MCP (Model Context Protocol) support for standardized tool orchestration and resource management. MCP enables LLMs to interact with external tools, resources, and data sources through a unified protocol.
## Key Features

- **MCP Chat Client**: Interactive terminal application with persistent conversation history
- **Enhanced LLM**: Automatic tool discovery and execution integrated into chat flows
- **Lockfile Management**: Conversational configuration via MCP for intelligent alias setup
- **Multiple Transports**: HTTP, WebSocket, and stdio connections to MCP servers
- **Streaming Support**: Tool calls work seamlessly with streaming responses
- **Custom Servers**: Build your own MCP servers to expose tools to LLMs
- **Persistence**: Optional server-backed storage (requires llmring-server)

## Choosing Your MCP Interface

LLMRing provides two ways to work with MCP:

| Feature | MCP Chat Client | Enhanced LLM |
|---------|----------------|--------------|
| **Best For** | Interactive terminal sessions, configuration | Programmatic integration, applications |
| **Interface** | Command-line chat application | Python API |
| **History** | Automatic persistent history in `~/.llmring/mcp_chat/` | Custom management needed |
| **Session Management** | Built-in session saving/loading | Manual implementation |
| **Tool Discovery** | Automatic with `/tools` command | Automatic via API |
| **Streaming** | Real-time terminal output | AsyncIterator for custom handling |
| **Use Cases** | Lockfile configuration, interactive exploration, CLI tools, quick testing | Production applications, automated workflows, custom integrations, batch processing |

**Quick Decision Guide:**

- Use **Chat Client** for interactive configuration, exploration, or testing
- Use **Enhanced LLM** for MCP capabilities in your Python applications

## Quick Start

### MCP Chat Client (Interactive)

```bash
# Conversational lockfile configuration (built-in)
llmring lock chat

# Connect to custom MCP server
llmring mcp chat --server "stdio://python -m your_mcp_server"

# HTTP server
llmring mcp chat --server "http://localhost:8080"

# WebSocket server
llmring mcp chat --server "ws://localhost:8080"
```

**Command Line Options:**

```bash
llmring mcp chat [OPTIONS]

Options:
  --server TEXT    MCP server URL (stdio://, http://, ws://)
  --model TEXT     LLM model alias to use (default: advisor)
  --no-telemetry   Disable telemetry
  --debug          Enable debug logging
```

### Enhanced LLM (Programmatic)

```python
from llmring.mcp.client.enhanced_llm import create_enhanced_llm

# Create enhanced LLM with MCP tools
llm = await create_enhanced_llm(
    model="balanced",
    mcp_server_path="stdio://python -m my_mcp_server"
)

# Chat with automatic tool execution
messages = [{"role": "user", "content": "Help me with my files"}]
response = await llm.chat(messages)

print(response.content)
if response.tool_calls:
    print(f"Used tools: {[call['function']['name'] for call in response.tool_calls]}")
```

## Built-in Chat Commands

| Command | Description |
|---------|-------------|
| `/help` | Display all available commands |
| `/history` | Show current conversation history |
| `/sessions` | List all saved chat sessions |
| `/load <session-id>` | Load and resume a previous session |
| `/clear` | Clear the current conversation |
| `/model <alias>` | Switch to a different model |
| `/tools` | List available MCP tools from the server |
| `/exit` or `/quit` | Exit the chat client |

## Conversational Lockfile Configuration

The most powerful feature of MCP in LLMRing is conversational lockfile management:

```bash
llmring lock chat
```

This starts an interactive session where you can:

- Describe requirements in natural language
- Get recommendations based on the current registry
- Understand cost implications and tradeoffs
- Configure aliases with fallback models
- Set up environment-specific profiles

**Example conversation:**

```
You: I need a configuration for a coding assistant that prioritizes accuracy

Assistant: I'll help you configure an accurate coding assistant. Based on the
registry, I recommend using Claude 3.5 Sonnet as the primary model with GPT-4o
as fallback.

[Calling tool: add_alias]
Added alias 'coder' with models: anthropic:claude-3-5-sonnet, openai:gpt-4o

This configuration prioritizes accuracy while providing fallback for
availability. Monthly cost estimate: ~$50-100 for moderate usage.

You: Add a cheaper option for simple tasks

Assistant: I'll add a cost-effective alias for simpler coding tasks.

[Calling tool: add_alias]
Added alias 'coder-fast' with model: openai:gpt-4o-mini

This model is 10x cheaper and perfect for simple completions, syntax fixes,
and basic code generation.
```

## Persistent History

All conversations are automatically saved in `~/.llmring/mcp_chat/`:

```
~/.llmring/mcp_chat/
├── command_history.txt       # Terminal command history
├── conversation_<id>.json    # Individual conversations
└── sessions.json             # Session metadata and index
```

Each session includes:

- Unique session ID and timestamp
- Complete message history
- Tool calls and their results
- Model used for each response

## Connecting to MCP Servers

### Stdio Servers (Local Processes)

Most common for development and local tools:

```bash
# Python MCP server
llmring mcp chat --server "stdio://python -m mypackage.mcp_server"

# Node.js MCP server
llmring mcp chat --server "stdio://node my-mcp-server.js"

# Any executable
llmring mcp chat --server "stdio:///usr/local/bin/my-mcp-tool"
```

### HTTP Servers

For REST API-based MCP servers:

```bash
# Local development
llmring mcp chat --server "http://localhost:8080"

# Remote server
llmring mcp chat --server "https://api.example.com/mcp"
```

### WebSocket Servers

For real-time, bidirectional communication:

```bash
# WebSocket connection
llmring mcp chat --server "ws://localhost:8080"

# Secure WebSocket
llmring mcp chat --server "wss://mcp.example.com"
```

## Creating Custom MCP Servers

Build your own MCP servers to expose tools to LLMs:

### Simple Python Example

```python
#!/usr/bin/env python3
"""my_mcp_server.py - Custom MCP server example"""

import asyncio

from llmring.mcp.server import MCPServer
from llmring.mcp.server.transport.stdio import StdioTransport

# Create server
server = MCPServer(
    name="My Custom Tools",
    version="1.0.0"
)

# Register tools
@server.function_registry.register(
    name="get_weather",
    description="Get weather for a location"
)
def get_weather(location: str) -> dict:
    return {
        "location": location,
        "temperature": 72,
        "conditions": "sunny"
    }

@server.function_registry.register(
    name="calculate",
    description="Perform calculations"
)
def calculate(expression: str) -> dict:
    try:
        # eval is unsafe on untrusted input; use ast.literal_eval or a real
        # expression parser in production
        result = eval(expression)
        return {"result": result}
    except Exception as e:
        return {"error": str(e)}

# Run server
async def main():
    transport = StdioTransport()
    await server.run(transport)

if __name__ == "__main__":
    asyncio.run(main())
```

Connect to your server:

```bash
llmring mcp chat --server "stdio://python my_mcp_server.py"
```

## Enhanced LLM with MCP

For programmatic usage with automatic tool execution:

### Basic Usage

```python
from llmring.mcp.client.enhanced_llm import create_enhanced_llm

# Create enhanced LLM
llm = await create_enhanced_llm(
    model="balanced",
    mcp_server_path="stdio://python -m my_mcp_server"
)

# Chat with automatic tool execution
messages = [{"role": "user", "content": "What's the weather in NYC?"}] response = await llm.chat(messages) # Tools are called automatically and results integrated print(response.content) # "The weather in NYC is 72°F and sunny." ``` ### Streaming with Tools ```python # Streaming works seamlessly with tool calls messages = [{"role": "user", "content": "Analyze this file and summarize it"}] async for chunk in await llm.chat_stream(messages): if chunk.type == "content": print(chunk.content, end="", flush=True) elif chunk.type == "tool_call": print(f"\n[Calling tool: {chunk.tool_call.name}]") elif chunk.type == "tool_result": print(f"\n[Tool result received]") ``` ### Direct MCP Client Usage ```python from llmring.mcp.client.mcp_client import MCPClient # Connect to MCP server client = MCPClient("http://localhost:8000") await client.initialize() # List available tools tools = await client.list_tools() for tool in tools: print(f"- {tool['name']}: {tool['description']}") # Execute a tool directly result = await client.call_tool( "read_file", {"path": "/path/to/file.txt"} ) print(result) # Clean up await client.close() ``` ## Best Practices ### Server Design Philosophy When creating MCP servers: 1. **Data-Focused Tools**: Design tools to provide data and perform actions, not make decisions 2. **LLM in Driver's Seat**: Let the LLM decide how to use tools based on user intent 3. **Clear Tool Names**: Use descriptive, action-oriented names 4. **Comprehensive Schemas**: Provide detailed parameter descriptions 5. **Error Handling**: Return informative error messages ### Security Considerations 1. **Validate Input**: Always validate tool parameters 2. **Limit Scope**: Tools should have minimal necessary permissions 3. **Secure Transport**: Use HTTPS/WSS in production 4. **Authentication**: Implement proper auth for production servers 5. **Audit Logging**: Log tool usage for security monitoring ### Performance Tips 1. **Choose Appropriate Models**: Match model capability to task complexity 2. **Cache Results**: Implement caching in MCP servers for expensive operations 3. **Streaming Responses**: Use streaming for long-running operations 4. **Batch Operations**: Design tools to handle batch requests when possible ## Troubleshooting **Server won't start:** - Check the server path is correct - Ensure proper permissions - Verify Python/Node.js environment is activated **Tools not appearing:** - Run `/tools` to refresh tool list - Check server logs for registration errors - Verify tool schemas are valid **History not saving:** - Check permissions on `~/.llmring/mcp_chat/` - Ensure enough disk space - Look for errors in debug mode (`--debug`) **Connection errors:** - Verify server is running - Check firewall/network settings - Ensure correct protocol (stdio/http/ws) ## Further Resources - Example MCP servers: See `examples/mcp/` in the [llmring repository](https://github.com/juanre/llmring) - MCP specification: [Model Context Protocol](https://github.com/anthropics/model-context-protocol) - LLMRing MCP source: `src/llmring/mcp/` in the repository