# Home - LLM Documentation

URL: https://llmring.ai/

This is the LLM-readable version of the Home page.

---

# LLMRing

One interface to run them all ...

LLMRing is an open-source, provider-agnostic Python library for talking to LLMs. It lets you easily manage which model you use for any task with aliases, use a single interface for all providers, and track usage and cost via an optional server.

Your aliases are stored in a version-controlled `llmring.lock` file, making your model choices explicit, easy to change, and easy to share. Your API calls go directly to OpenAI, Anthropic, Google, or Ollama. The call's metadata can optionally be logged to a [server managed by you](/docs/server/).

## Components

- **[Library (llmring)](/docs/llmring/)** - Python package for unified LLM access with built-in MCP support
- **[Server (llmring-server)](/docs/server/)** - Optional backend for usage tracking, receipts, and MCP persistence
- **[Registry](/docs/registry/)** - Versioned, human-validated database of model capabilities and pricing

## Quick Start

Install and create a basic lockfile:

```bash
uv add llmring
```

```bash
llmring lock init
```

This creates `llmring.lock` with sensible defaults and pinned registry versions.

For intelligent, conversational configuration that analyzes the live registry and recommends optimal aliases (e.g., `fast`, `balanced`, `deep`), use:

```bash
llmring lock chat
```

## Lockfile + Aliases

Your configuration lives in `llmring.lock`, a version-controlled file that makes your AI stack reproducible:

```toml
# llmring.lock (excerpt)

# Registry version pinning (optional)
[registry_versions]
openai = 142
anthropic = 89
google = 27

# Default bindings
[[bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-haiku"]

[[bindings]]
alias = "pdf_converter"
models = ["openai:gpt-4o-mini"]

[[bindings]]
alias = "balanced"
models = ["anthropic:claude-3-5-sonnet", "openai:gpt-4o"]  # With fallback
```

Use aliases in your code:

```python
from llmring import LLMRing, Message

ring = LLMRing()  # Loads from llmring.lock
response = await ring.chat("summarizer", messages=[
    Message(role="user", content="Summarize this document...")
])
```

## Unified Structured Output

LLMRing provides one interface for structured output across all providers.
Use a JSON Schema with `response_format`, and LLMRing adapts it per provider:

```python
from llmring import LLMRing
from llmring.schemas import LLMRequest, Message

ring = LLMRing()
request = LLMRequest(
    model="balanced",
    messages=[Message(role="user", content="Generate a person")],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"]
            }
        },
        "strict": True
    }
)

response = await ring.chat(request)
print(response.content)  # valid JSON
print(response.parsed)   # dict
```

**How it works per provider:**

- **OpenAI**: Native JSON Schema strict mode
- **Anthropic**: Tool-based extraction with validation
- **Google Gemini**: FunctionDeclaration with schema mapping
- **Ollama**: Best-effort JSON with automatic repair

## CLI Commands

The configuration lives in your lockfile:

```bash
# Create basic lockfile with defaults
llmring lock init

# Intelligent conversational configuration (recommended)
llmring lock chat

# Bind aliases locally (escape hatch)
llmring bind pdf_converter openai:gpt-4o-mini

# Validate against registry
llmring lock validate

# Update registry versions
llmring lock bump-registry
```

## Two Modes of Operation

### 1. Lockfile-Only (No Backend)

Works completely standalone with just your `llmring.lock` file. Safe, explicit configuration per codebase. No cost tracking, no logging, no MCP persistence.

### 2. With Server (Self-Hosted)

Add receipts, usage tracking, and MCP tool/resource persistence by connecting to your own `llmring-server` instance. See [Server Docs](/docs/server/) for endpoints, headers, and deployment.

## The Open Registry

Model information comes from versioned, per-provider registries:

- Current: [https://llmring.github.io/registry/openai/models.json](https://llmring.github.io/registry/openai/models.json)
- Versioned: [https://llmring.github.io/registry/openai/v/142/models.json](https://llmring.github.io/registry/openai/v/142/models.json)

Each provider's registry is versioned independently. Your lockfile records these versions to track drift:

```toml
[registry_versions]
openai = 142     # Registry snapshot when you last updated
anthropic = 89   # What the registry knew at version 89
```

Note: These versions track what the registry knew at that point, not the actual model behavior. Providers can change prices and limits anytime - the registry helps you detect when things have drifted from your expectations.

See [Registry Docs](/docs/registry/) for schema and curation workflow.
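If you want to inspect drift yourself outside the CLI, the published registry files are plain JSON. The sketch below is a minimal illustration, assuming the top-level `[registry_versions]` lockfile layout shown above and Python 3.11+ for `tomllib`; `llmring lock validate` and `llmring lock bump-registry` remain the supported workflow.

```python
import json
import tomllib  # Python 3.11+
import urllib.request

REGISTRY_BASE = "https://llmring.github.io/registry"

# Read pinned registry versions from the lockfile (layout as in the excerpt above).
with open("llmring.lock", "rb") as f:
    pinned = tomllib.load(f).get("registry_versions", {})

# Compare each pinned version with the currently published registry version.
for provider, pinned_version in pinned.items():
    url = f"{REGISTRY_BASE}/{provider}/models.json"
    with urllib.request.urlopen(url) as resp:
        current_version = json.load(resp)["version"]
    status = "up to date" if current_version == pinned_version else "drifted"
    print(f"{provider}: pinned v{pinned_version}, registry v{current_version} ({status})")
```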
## Profiles for Different Environments

Support multiple configurations in one lockfile:

```toml
# llmring.lock (profiles excerpt)

# Production: High quality with fallbacks
[profiles.prod]

[[profiles.prod.bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-haiku"]

[[profiles.prod.bindings]]
alias = "analyzer"
models = ["openai:gpt-4", "anthropic:claude-3-5-sonnet"]

# Development: Cheaper models
[profiles.dev]

[[profiles.dev.bindings]]
alias = "summarizer"
models = ["openai:gpt-4o-mini"]

[[profiles.dev.bindings]]
alias = "analyzer"
models = ["openai:gpt-4o-mini"]
```

Switch profiles via environment:

```bash
export LLMRING_PROFILE=prod
python app.py
```

## CLI Workflow

Core lockfile management:

```bash
# Create basic lockfile with defaults
llmring lock init

# Intelligent conversational configuration (recommended)
llmring lock chat

# Bind aliases (updates lockfile)
llmring bind summarizer anthropic:claude-3-haiku

# List aliases from lockfile
llmring aliases

# Validate against registry
llmring lock validate

# Update registry versions
llmring lock bump-registry
```

MCP operations (requires backend):

```bash
# Connect to any MCP server for interactive chat
llmring mcp chat --server "stdio://python -m your_mcp_server"

# List registered MCP servers
llmring mcp servers list

# Register new MCP server
llmring mcp register calculator http://calculator-mcp:8080

# List available tools
llmring mcp tools

# Execute a tool
llmring mcp execute calculator.add '{"a": 5, "b": 3}'
```

With a server connected:

```bash
# View usage stats (requires server)
llmring stats

# Export receipts (requires server)
llmring export
```

## Environment Variables

```bash
# LLM provider keys (required)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Gemini supports any of these
export GEMINI_API_KEY="..."
# or
export GOOGLE_API_KEY="..."
# or
export GOOGLE_GEMINI_API_KEY="..."

# Optional profile selection
export LLMRING_PROFILE="prod"

# Optional server connection
export LLMRING_API_URL="http://localhost:8000"
```

## Why LLMRing

- **Lockfile**: Version control your AI configuration with reproducible deployments
- **Task-oriented**: Think in terms of tasks, not model IDs
- **No vendor lock-in**: Works completely without any backend
- **Drift detection**: Track when models change from your expectations
- **MCP Integration**: Full Model Context Protocol support for tool orchestration
- **Flexible**: Use standalone or with an optional self-hosted server for receipts and tracking

## Source Code

Everything is open source on GitHub:

- [llmring](https://github.com/juanre/llmring) - Python package and CLI
- [llmring-server](https://github.com/juanre/llmring-server) - Optional API server
- [registry](https://github.com/llmring/registry) - Model registry source

## License

MIT License. Use it however you want.

---

One interface to run them all
One registry to find them
One API to track them all
And with aliases bind them

---

# Documentation - LLM Documentation

URL: https://llmring.ai/docs/

This is the LLM-readable version of the Documentation page.
---

# LLMRing Documentation

## Getting Started

- **[Common Recipes](/docs/recipes/)** - Practical patterns and examples

## Core Components

- **[Library (llmring)](/docs/llmring/)** - Python library reference and API with built-in MCP support
- **[Server (llmring-server)](/docs/server/)** - Self-hostable backend for signed receipts, usage tracking, and MCP persistence
- **[Registry](/docs/registry/)** - Human-validated model capabilities and pricing database

## Advanced Features

- **[MCP Integration](/docs/mcp/)** - Model Context Protocol for tool orchestration and conversational lockfile management

## Resources

- **[GitHub Repository](https://github.com/juanre/llmring)** - Source code and issues
- **[PyPI Package](https://pypi.org/project/llmring/)** - Python package

---

# LLMRing Library (Python) - LLM Documentation

URL: https://llmring.ai/docs/llmring/

This is the LLM-readable version of the LLMRing Library (Python) page.

---

# LLMRing Python Library

**GitHub**: [https://github.com/juanre/llmring](https://github.com/juanre/llmring)

Python library to talk to OpenAI, Anthropic, Google, and Ollama with a unified interface. Configuration is stored in a version-controlled `llmring.lock` file (local to each codebase). Models are accessed via aliases.

## Modes of Operation

1. **Lockfile-Only**: Works completely standalone with just your `llmring.lock`. No backend required, no logging, no MCP persistence.
2. **With Server**: Connect to a self-hosted `llmring-server` for receipts, usage tracking, and MCP persistence.

## Installation

```bash
uv add llmring
```

## Quick Start

```bash
llmring lock init
```

```bash
llmring lock chat  # For intelligent conversational configuration
```

```bash
llmring bind summarizer anthropic:claude-3-haiku
```

```bash
llmring aliases
```

```python
from llmring import LLMRing, LLMRequest, Message

ring = LLMRing()
request = LLMRequest(
    messages=[Message(role="user", content="Summarize this text")],
    model="summarizer"
)
response = await ring.chat(request)
```

## Lockfile

- Authoritative config; commit to VCS
- Optional profiles for different environments: `dev`, `staging`, `prod`
- Pinned registry versions per provider

```toml
# Registry version pinning (optional)
[registry_versions]
openai = 142
anthropic = 89

# Default bindings
[[bindings]]
alias = "summarizer"
models = ["anthropic:claude-3-haiku-20240307"]

[[bindings]]
alias = "balanced"
models = ["anthropic:claude-3-5-sonnet", "openai:gpt-4o"]
```

## CLI

```bash
llmring lock init [--force]
```

```bash
llmring lock chat  # conversational configuration
```

```bash
llmring bind <alias> <provider:model> [--profile <name>]
```

```bash
llmring aliases [--profile <name>]
```

```bash
llmring lock validate
```

```bash
llmring lock bump-registry
```

```bash
llmring list [--provider <provider>]
```

```bash
llmring info <provider:model> [--json]
```

```bash
llmring stats|export  # requires server
```

```bash
llmring mcp chat [--server URL]  # MCP interactive chat
```

```bash
llmring mcp servers list  # list MCP servers
```

```bash
llmring mcp tools  # list MCP tools
```

### CLI Output

```bash
llmring --help
```

```
usage: cli.py [-h] {lock,bind,aliases,list,chat,info,providers,push,pull,stats,export,register} ...
LLMRing - Unified LLM Service CLI

positional arguments:
  {lock,bind,aliases,list,chat,info,providers,push,pull,stats,export,register}
                        Commands
    lock                Lockfile management
    bind                Bind an alias to a model
    aliases             List aliases from lockfile
    list                List available models
    chat                Send a chat message
    info                Show model information
    providers           List configured providers
    push                Push lockfile aliases to server (X-Project-Key required)
    pull                Pull aliases from server into lockfile (X-Project-Key required)
    stats               Show usage statistics
    export              Export receipts to file
    register            Register with LLMRing server (for SaaS features)

options:
  -h, --help            show this help message and exit
```

```bash
llmring providers
```

```
Configured Providers:
----------------------------------------
  ✓ openai       OPENAI_API_KEY
  ✓ anthropic    ANTHROPIC_API_KEY
  ✗ google       GOOGLE_API_KEY or GEMINI_API_KEY
  ✓ ollama       (not required)
```

```bash
llmring list
```

```
Available Models:
----------------------------------------

ANTHROPIC:
  - claude-3-7-sonnet-20250219
  - claude-3-7-sonnet
  - claude-3-5-sonnet-20241022-v2
  - claude-3-5-sonnet-20241022
  - claude-3-5-sonnet-20240620
  - claude-3-5-sonnet
  - claude-3-5-haiku-20241022
  - claude-3-5-haiku
  - claude-3-opus-20240229
  - claude-3-sonnet-20240229
  - claude-3-haiku-20240307

OPENAI:
  - gpt-4o
  - gpt-4o-mini
  - gpt-4o-2024-08-06
  - gpt-4-turbo
  - gpt-4
  - gpt-3.5-turbo
  - o1
  - o1-mini

OLLAMA:
  - llama3.3:latest
  - llama3.3
  - llama3.2
  - llama3.1
  - llama3
  - mistral
  - mixtral
  - codellama
  - phi3
  - gemma2
  - gemma
  - qwen2.5
  - qwen
```

### Lockfile workflow

```bash
llmring lock init
```

```
✅ Created lockfile: /Users/juanre/prj/llmring-all/llmring.ai/dist/docs-run/llmring.lock

Default bindings:
  long_context → openai:gpt-4-turbo-preview
  low_cost → openai:gpt-3.5-turbo
  json_mode → openai:gpt-4-turbo-preview
  fast → openai:gpt-3.5-turbo
  deep → anthropic:claude-3-opus-20240229
  balanced → anthropic:claude-3-sonnet-20240229
  pdf_reader → anthropic:claude-3-sonnet-20240229
  local → ollama:llama3.3:latest
```

```bash
llmring lock chat
```

Starts an interactive conversational configuration session for intelligent lockfile management.

```bash
llmring aliases
```

```
Aliases in profile 'default':
----------------------------------------
  long_context → openai:gpt-4-turbo-preview
  low_cost → openai:gpt-3.5-turbo
  json_mode → openai:gpt-4-turbo-preview
  fast → openai:gpt-3.5-turbo
  deep → anthropic:claude-3-opus-20240229
  balanced → anthropic:claude-3-sonnet-20240229
  pdf_reader → anthropic:claude-3-sonnet-20240229
  local → ollama:llama3.3:latest
```

```bash
llmring bind summarizer anthropic:claude-3-haiku
```

```
✅ Bound 'summarizer' → 'anthropic:claude-3-haiku' in profile 'default'
```

```bash
llmring aliases
```

```
Aliases in profile 'default':
----------------------------------------
  long_context → openai:gpt-4-turbo-preview
  low_cost → openai:gpt-3.5-turbo
  json_mode → openai:gpt-4-turbo-preview
  fast → openai:gpt-3.5-turbo
  deep → anthropic:claude-3-opus-20240229
  balanced → anthropic:claude-3-sonnet-20240229
  pdf_reader → anthropic:claude-3-sonnet-20240229
  local → ollama:llama3.3:latest
  summarizer → anthropic:claude-3-haiku
```

## Registry Integration

- Fetches model capabilities/pricing from [https://llmring.github.io/registry/](https://llmring.github.io/registry/)
- Models keyed as `provider:model`
- Fields include `max_input_tokens`, `max_output_tokens`, `dollars_per_million_tokens_*`, and capability flags including `supports_parallel_tool_calls` (see the cost sketch below)
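When registry pricing is available, the library fills `usage.cost` on responses. The arithmetic behind that number is simply tokens multiplied by the per-million-token price; the following is a minimal illustration using the registry field names above, not the library's internal code:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int, model_info: dict) -> float:
    """Estimate dollar cost from a registry entry's pricing fields."""
    input_cost = prompt_tokens * model_info["dollars_per_million_tokens_input"] / 1_000_000
    output_cost = completion_tokens * model_info["dollars_per_million_tokens_output"] / 1_000_000
    return input_cost + output_cost

# Example with the published pricing for openai:gpt-4o-mini
# ($0.15 input / $0.60 output per million tokens):
cost = estimate_cost(12_000, 800, {
    "dollars_per_million_tokens_input": 0.15,
    "dollars_per_million_tokens_output": 0.60,
})
print(f"${cost:.4f}")  # $0.0023
```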
## Receipts

- Local: library can calculate costs and create unsigned receipt objects
- Canonical: signed by server using Ed25519 over RFC 8785 JCS

## Environment

```bash
export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export GEMINI_API_KEY=...  # or GOOGLE_API_KEY=... or GOOGLE_GEMINI_API_KEY=...
export LLMRING_PROFILE=prod
```

## Security

- Lockfile contains no secrets
- API keys via environment only

## Links

- GitHub: https://github.com/juanre/llmring
- PyPI: https://pypi.org/project/llmring

---

## API Reference

### Schemas

`Message`

```json
{
  "role": "system | user | assistant | tool",
  "content": "string or structured content",
  "tool_calls": [
    { "id": "...", "type": "...", "function": { "name": "...", "arguments": { } } }
  ],
  "tool_call_id": "optional",
  "timestamp": "ISO-8601 optional"
}
```

`LLMRequest`

```json
{
  "messages": [ Message ],
  "model": "provider:model or alias",
  "temperature": 0.0,
  "max_tokens": 1024,
  "response_format": { },
  "tools": [ { } ],
  "tool_choice": "auto | none | any | { function: name }",
  "cache": { },
  "metadata": { },
  "json_response": true,
  "stream": false,
  "extra_params": { }
}
```

`LLMResponse`

```json
{
  "content": "string",
  "model": "provider:model",
  "parsed": { },
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0,
    "cost": 0.000123,  // if registry pricing available
    "cost_breakdown": { "input": 0.0, "output": 0.0 }
  },
  "finish_reason": "stop | length | tool_calls | ...",
  "tool_calls": [ { } ]
}
```

`StreamChunk`

```json
{
  "delta": "partial text",
  "model": "provider:model",
  "finish_reason": null,
  "usage": null,
  "tool_calls": [ ]
}
```

### Class: LLMRing

Constructor:

```python
LLMRing(origin: str = "llmring", registry_url: str | None = None, lockfile_path: str | None = None)
```

Methods:

- `async chat(request: LLMRequest, profile: str | None = None) -> LLMResponse`
  - Resolves aliases via lockfile, routes the call to the provider, enriches `usage.cost` when registry pricing is available, records a local unsigned receipt if a lockfile is present.
- `async chat_with_alias(alias_or_model: str, messages: list, temperature: float | None = None, max_tokens: int | None = None, profile: str | None = None, **kwargs) -> LLMResponse`
- `resolve_alias(alias_or_model: str, profile: str | None = None) -> str`
- `bind_alias(alias: str, model: str, profile: str | None = None) -> None`
- `unbind_alias(alias: str, profile: str | None = None) -> None`
- `list_aliases(profile: str | None = None) -> dict[str, str]`
- `init_lockfile(force: bool = False) -> None`
- `get_available_models() -> dict[str, list[str]]`
- `get_model_info(model: str) -> dict`
- `async get_enhanced_model_info(model: str) -> dict`
- `async validate_context_limit(request: LLMRequest) -> str | None`
- `async calculate_cost(response: LLMResponse) -> dict | None`
- `async close() -> None`

### Lockfile API

Classes:

- `AliasBinding { alias, provider, model, constraints? }` with `model_ref` property.
- `ProfileConfig { name, bindings[], registry_versions{} }`
  - `set_binding(alias, model_ref, constraints?)`
  - `remove_binding(alias) -> bool`
  - `get_binding(alias) -> AliasBinding | None`
- `Lockfile { version, created_at, updated_at, default_profile, profiles{} }`
  - `@classmethod create_default() -> Lockfile`
  - `save(path: Path | None = None) -> None`
  - `@classmethod load(path: Path | None = None) -> Lockfile`
  - `@classmethod find_lockfile(start_path: Path | None = None) -> Path | None`
  - `calculate_digest() -> str`
  - `get_profile(name: str | None = None) -> ProfileConfig`
  - `set_binding(alias, model_ref, profile: str | None = None, constraints: dict | None = None)`
  - `resolve_alias(alias, profile: str | None = None) -> str | None`

### Registry Client

- `RegistryModel` fields: `provider`, `model_name`, `display_name`, `description?`, `max_input_tokens?`, `max_output_tokens?`, `dollars_per_million_tokens_input?`, `dollars_per_million_tokens_output?`, `supports_vision`, `supports_function_calling`, `supports_json_mode`, `supports_parallel_tool_calls`, `is_active`, `added_date?`, `deprecated_date?`.
- `RegistryClient(registry_url: str | None = None, cache_dir: Path | None = None)`
  - `async fetch_current_models(provider: str) -> list[RegistryModel]`
  - `async fetch_version(provider: str, version: int) -> RegistryVersion`
  - `async get_current_version(provider: str) -> int`
  - `async check_drift(provider: str, pinned_version: int) -> dict`
  - `async validate_model(provider: str, model_name: str) -> bool`
  - `clear_cache() -> None`

### Receipts

- `Receipt` fields: `receipt_id`, `timestamp`, `alias`, `profile`, `lock_digest`, `provider`, `model`, `prompt_tokens`, `completion_tokens`, `total_tokens`, `input_cost`, `output_cost`, `total_cost`, `signature?`.
- `ReceiptSigner`: `generate_keypair()`, `load_private_key()`, `load_public_key()`, `sign_receipt(receipt) -> str`, `verify_receipt(receipt, public_key) -> bool`, `export_private_key()`, `export_public_key()`.
- `ReceiptGenerator`: `generate_receipt(...) -> Receipt`, `calculate_costs(provider, model, prompt_tokens, completion_tokens, model_info?) -> dict`.

### Providers

All providers implement `BaseLLMProvider`:

```python
async def chat(messages, model, temperature=None, max_tokens=None, response_format=None,
               tools=None, tool_choice=None, json_response=None, cache=None,
               stream=False, extra_params=None) -> LLMResponse | AsyncIterator[StreamChunk]
async def validate_model(model: str) -> bool
async def get_supported_models() -> list[str]
def get_default_model() -> str
```

### Structured Output (Unified)

LLMRing provides a single interface for JSON Schema across providers:

```python
request = LLMRequest(
    model="balanced",
    messages=[Message(role="user", content="Generate a person")],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {"type": "object", "properties": {"name": {"type": "string"}}}
        },
        "strict": True
    }
)
response = await ring.chat(request)
print(response.parsed)
```

OpenAI uses native JSON Schema; Anthropic/Gemini use native tools/functions under the hood; Ollama uses best-effort JSON with one repair attempt.

---

# LLMRing Server - LLM Documentation

URL: https://llmring.ai/docs/server/

This is the LLM-readable version of the LLMRing Server page.
---

## LLMRing Server

**GitHub**: [https://github.com/juanre/llmring-server](https://github.com/juanre/llmring-server)

Self-hostable backend that adds optional capabilities: signed receipts, usage logging/stats, conversation persistence, MCP tool/resource/prompt management, and a read-only proxy to the public registry. No alias storage or synchronization - aliases remain local to each codebase's lockfile.

Dual-mode: standalone service or embedded as a library (used by llmring-api).

## Quick Start

```bash
uv run llmring-server --reload
```

Default: http://localhost:8000 with Swagger at `/docs`.

## Authentication

- Project-scoped via `X-API-Key` header (api_key_id as VARCHAR)
- No user management in this service - aliases are local to each codebase's lockfile

## Endpoints (selected)

Public:

- GET `/` – service info
- GET `/health` – DB health
- GET `/registry` or `/registry.json` – aggregated registry
- GET `/receipts/public-key.pem` – current public key
- GET `/receipts/public-keys.json` – list of active/rotated public keys

### Examples

```bash
curl http://localhost:8000/
```

```
{
  "service": "llmring-server",
  "version": "0.1.0",
  "status": "operational",
  "timestamp": "2024-01-15T10:30:45.123Z",
  "endpoints": {
    "health": "/health",
    "registry": "/registry",
    "api": "/api/v1",
    "docs": "/docs"
  }
}
```

```bash
curl http://localhost:8000/registry.json
```

```
{
  "version": "1.0",
  "generated": "2024-01-15T10:30:45.123Z",
  "providers": {
    "openai": {
      "version": 142,
      "models": {
        "gpt-4": {
          "name": "gpt-4",
          "max_input_tokens": 8192,
          "max_output_tokens": 4096,
          "dollars_per_million_input_tokens": 30.0,
          "dollars_per_million_output_tokens": 60.0
        },
        "gpt-4o-mini": {
          "name": "gpt-4o-mini",
          "max_input_tokens": 128000,
          "max_output_tokens": 16384,
          "dollars_per_million_input_tokens": 0.15,
          "dollars_per_million_output_tokens": 0.6
        }
      }
    },
    "anthropic": {
      "version": 89,
      "models": {
        "claude-3-haiku": {
          "name": "claude-3-haiku-20240307",
          "max_input_tokens": 200000,
          "max_output_tokens": 4096,
          "dollars_per_million_input_tokens": 0.25,
          "dollars_per_million_output_tokens": 1.25
        },
        "claude-3-opus": {
          "name": "claude-3-opus-20240229",
          "max_input_tokens": 200000,
          "max_output_tokens": 4096,
          "dollars_per_million_input_tokens": 15.0,
          "dollars_per_million_output_tokens": 75.0
        }
      }
    }
  }
}
```

Project-scoped (require `X-API-Key`):

- Usage: `POST /api/v1/log`, `GET /api/v1/stats`
- Receipts: `POST /api/v1/receipts` (store signed), `GET /api/v1/receipts/{id}`, `POST /api/v1/receipts/issue`
- Conversations: `POST /conversations`, `GET /conversations`, `GET /conversations/{id}`, `POST /conversations/{id}/messages/batch`
- MCP Servers: `POST /api/v1/mcp/servers`, `GET /api/v1/mcp/servers`, `POST /api/v1/mcp/servers/{id}/refresh`
- MCP Tools: `GET /api/v1/mcp/tools`, `POST /api/v1/mcp/tools/{id}/execute`, `GET /api/v1/mcp/tools/{id}/history`
- MCP Resources: `GET /api/v1/mcp/resources`, `GET /api/v1/mcp/resources/{id}/content`
- MCP Prompts: `GET /api/v1/mcp/prompts`, `POST /api/v1/mcp/prompts/{id}/render`
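As a concrete illustration of a project-scoped call, the sketch below posts one usage record to the Usage API described in the next section. The base URL and the `LLMRING_SERVER_KEY` variable name are assumptions for this example; only the endpoint path, the `X-API-Key` header, and the body fields come from the docs.

```python
import json
import os
import urllib.request

# Hedged sketch: log one LLM call against a self-hosted llmring-server.
req = urllib.request.Request(
    "http://localhost:8000/api/v1/log",  # assumed local deployment
    data=json.dumps({
        "provider": "anthropic",
        "model": "claude-3-haiku",
        "input_tokens": 1200,
        "output_tokens": 300,
        "alias": "summarizer",
    }).encode(),
    headers={
        "Content-Type": "application/json",
        "X-API-Key": os.environ["LLMRING_SERVER_KEY"],  # project key; variable name assumed
    },
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # expected shape: { "log_id": ..., "cost": ..., "timestamp": ... }
```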
### Usage API

- `POST /api/v1/log` body `{ provider, model, input_tokens, output_tokens, cached_input_tokens?, alias?, profile?, cost?, latency_ms?, origin?, id_at_origin?, metadata? }` → `{ log_id, cost, timestamp }`
- `GET /api/v1/stats?start_date=&end_date=&group_by=day` → `{ summary, by_day[], by_model{}, by_origin{} }`

### Receipts API

- `POST /api/v1/receipts` body `{ receipt: {...} }` → `{ receipt_id, status: "verified" }`
- `GET /api/v1/receipts/{id}` → full receipt object
- `POST /api/v1/receipts/issue` body is an unsigned receipt → signed receipt (requires server signing key)

## Configuration (env)

- `LLMRING_DATABASE_URL` (required)
- `LLMRING_DATABASE_SCHEMA` (default: llmring)
- `LLMRING_REDIS_URL` (optional, caching)
- `LLMRING_REGISTRY_BASE_URL` (default: https://llmring.github.io/registry/)
- `LLMRING_RECEIPTS_PRIVATE_KEY_B64`, `LLMRING_RECEIPTS_PUBLIC_KEY_B64`, `LLMRING_RECEIPTS_KEY_ID`

## Dual-mode

- Standalone: manages its own DB connections and migrations
- Library: use `create_app(db_manager=..., standalone=False, run_migrations=...)` with an external pool

App factory:

```python
create_app(
    db_manager: AsyncDatabaseManager | None = None,
    run_migrations: bool = True,
    schema: str | None = None,
    settings: Settings | None = None,
    standalone: bool = True,
    include_meta_routes: bool = True,
) -> FastAPI
```

## Receipts

- Ed25519 signature over RFC 8785 JCS
- Canonical receipts are stored/verified by the server

Receipt fields (subset): `id`, `timestamp`, `model`, `alias`, `profile`, `lock_digest`, `key_id`, `tokens { input, output, cached_input }`, `cost { amount, calculation }`, `signature`.

## MCP Integration

The server provides full MCP (Model Context Protocol) persistence:

### MCP Database Schema

- `servers` - MCP server registry (name, URL, transport, capabilities)
- `tools` - Available tools with schemas
- `resources` - Accessible resources (files, URLs, etc.)
- `prompts` - Reusable prompt templates
- `tool_executions` - Execution history with inputs/outputs

All MCP operations are project-scoped via the `X-API-Key` header.

## Security Checklist

- Set explicit CORS origins in production
- Serve behind TLS
- Treat `X-API-Key` as a secret (api_key_id)
- Configure receipts keys to enable verification/issuance
- MCP resources are isolated per project

## Links

- GitHub: https://github.com/juanre/llmring-server

---

# LLMRing Registry - LLM Documentation

URL: https://llmring.ai/docs/registry/

This is the LLM-readable version of the LLMRing Registry page.

---

# LLMRing Open Registry

Public, versioned, **human-validated** registry of model capabilities and pricing, hosted on GitHub Pages. Models are keyed as `provider:model`.

Base URL: [https://llmring.github.io/registry/](https://llmring.github.io/registry/)

**Curation Philosophy**: All published registry files are reviewed and validated by humans. Automation is used only to generate draft candidates; nothing is auto-published, which keeps the data accurate and trustworthy.
## Files

- Current per provider: `/[provider]/models.json`
- Archived versions: `/[provider]/v/[n]/models.json`
- Manifest: `/manifest.json`

## Schema (per provider)

```json
{
  "version": 2,
  "updated_at": "2025-08-20T00:00:00Z",
  "models": {
    "openai:gpt-4o-mini": {
      "provider": "openai",
      "model_name": "gpt-4o-mini",
      "display_name": "GPT-4o Mini",
      "max_input_tokens": 128000,
      "max_output_tokens": 16384,
      "dollars_per_million_tokens_input": 0.15,
      "dollars_per_million_tokens_output": 0.60,
      "supports_vision": true,
      "supports_function_calling": true,
      "supports_json_mode": true,
      "supports_parallel_tool_calls": true,
      "tool_call_format": "json_schema",
      "is_active": true
    }
  }
}
```

## Curation Workflow (Human-Validated, Canonical)

LLMRing's registry prioritizes accuracy through manual review:

1. **Gather sources** (recommended): Collect pricing/docs HTML and PDFs from each provider for an audit trail
2. **Generate draft**: Use automation to create a best-effort draft from sources (automation allowed for drafts only)
3. **Review changes**: Compare draft vs. current published file, field by field; manually adjust as needed
4. **Promote**: Bump the per-provider `version`, set `updated_at`, archive the previous file under `v/[n]/models.json`, and replace the current `models.json`

**Critical**: Published `models.json` files are always human-reviewed. Automation generates candidates only; humans make final decisions to ensure accuracy.

### CLI (from `registry` package)

```bash
# Install browser for PDF fetching (first time only)
uv run playwright install chromium

# Fetch documentation from all providers
uv run llmring-registry fetch --provider all

# Extract model information to create drafts
uv run llmring-registry extract --provider all --timeout 120

# Review draft changes for each provider
uv run llmring-registry review-draft --provider openai
uv run llmring-registry review-draft --provider anthropic
uv run llmring-registry review-draft --provider google

# Accept all changes (after review)
uv run llmring-registry review-draft --provider openai --accept-all

# Promote reviewed file to production and archive
uv run llmring-registry promote --provider openai
```

**Single provider update example:**

```bash
uv run llmring-registry fetch --provider openai
uv run llmring-registry extract --provider openai
uv run llmring-registry review-draft --provider openai --accept-all
uv run llmring-registry promote --provider openai
```

## Clients

- The `llmring` library fetches current models and uses the registry for cost calculation and limit validation.
- The server proxies the registry and may cache responses.

### Client-side Lookup Rules

- The models map is a dictionary keyed by `provider:model`. Clients should prefer O(1) lookups by that key.
- When only `model` is available, clients may attempt fallback keys: `models[model]` or `models[f"{provider}/{model}"]` for legacy data.

## Links

- Live data: [https://llmring.github.io/registry/](https://llmring.github.io/registry/)
- Source: [https://github.com/llmring/registry](https://github.com/llmring/registry)

---

# Model Context Protocol (MCP) - LLM Documentation

URL: https://llmring.ai/docs/mcp/

This is the LLM-readable version of the Model Context Protocol (MCP) page.

---

# Model Context Protocol (MCP)

LLMRing provides comprehensive MCP (Model Context Protocol) support for standardized tool orchestration and resource management. MCP enables LLMs to interact with external tools, resources, and data sources through a unified protocol.
## Key Features

- **MCP Chat Client**: Interactive terminal application with persistent conversation history
- **Enhanced LLM**: Automatic tool discovery and execution integrated into chat flows
- **Lockfile Management**: Conversational configuration via MCP for intelligent alias setup
- **Multiple Transports**: HTTP, WebSocket, and stdio connections to MCP servers
- **Streaming Support**: Tool calls work seamlessly with streaming responses
- **Custom Servers**: Build your own MCP servers to expose tools to LLMs
- **Persistence**: Optional server-backed storage (requires llmring-server)

## Choosing Your MCP Interface

LLMRing provides two ways to work with MCP:

| Feature | MCP Chat Client | Enhanced LLM |
|---------|----------------|--------------|
| **Best For** | Interactive terminal sessions, configuration | Programmatic integration, applications |
| **Interface** | Command-line chat application | Python API |
| **History** | Automatic persistent history in `~/.llmring/mcp_chat/` | Custom management needed |
| **Session Management** | Built-in session saving/loading | Manual implementation |
| **Tool Discovery** | Automatic with `/tools` command | Automatic via API |
| **Streaming** | Real-time terminal output | AsyncIterator for custom handling |
| **Use Cases** | Lockfile configuration, interactive exploration, CLI tools, quick testing | Production applications, automated workflows, custom integrations, batch processing |

**Quick Decision Guide:**

- Use **Chat Client** for interactive configuration, exploration, or testing
- Use **Enhanced LLM** for MCP capabilities in your Python applications

## Quick Start

### MCP Chat Client (Interactive)

```bash
# Conversational lockfile configuration (built-in)
llmring lock chat

# Connect to custom MCP server
llmring mcp chat --server "stdio://python -m your_mcp_server"

# HTTP server
llmring mcp chat --server "http://localhost:8080"

# WebSocket server
llmring mcp chat --server "ws://localhost:8080"
```

**Command Line Options:**

```bash
llmring mcp chat [OPTIONS]

Options:
  --server TEXT    MCP server URL (stdio://, http://, ws://)
  --model TEXT     LLM model alias to use (default: advisor)
  --no-telemetry   Disable telemetry
  --debug          Enable debug logging
```

### Enhanced LLM (Programmatic)

```python
from llmring.mcp.client.enhanced_llm import create_enhanced_llm

# Create enhanced LLM with MCP tools
llm = await create_enhanced_llm(
    model="balanced",
    mcp_server_path="stdio://python -m my_mcp_server"
)

# Chat with automatic tool execution
messages = [{"role": "user", "content": "Help me with my files"}]
response = await llm.chat(messages)

print(response.content)
if response.tool_calls:
    print(f"Used tools: {[call['function']['name'] for call in response.tool_calls]}")
```

## Built-in Chat Commands

| Command | Description |
|---------|-------------|
| `/help` | Display all available commands |
| `/history` | Show current conversation history |
| `/sessions` | List all saved chat sessions |
| `/load <session-id>` | Load and resume a previous session |
| `/clear` | Clear the current conversation |
| `/model <alias>` | Switch to a different model |
| `/tools` | List available MCP tools from the server |
| `/exit` or `/quit` | Exit the chat client |

## Conversational Lockfile Configuration

The most powerful feature of MCP in LLMRing is conversational lockfile management:

```bash
llmring lock chat
```

This starts an interactive session where you can:

- Describe requirements in natural language
- Get recommendations based on the current registry
- Understand cost implications and tradeoffs
- Configure aliases with fallback models
- Set up environment-specific profiles

**Example conversation:**

```
You: I need a configuration for a coding assistant that prioritizes accuracy

Assistant: I'll help you configure an accurate coding assistant. Based on the
registry, I recommend using Claude 3.5 Sonnet as the primary model with GPT-4o
as fallback.

[Calling tool: add_alias]
Added alias 'coder' with models: anthropic:claude-3-5-sonnet, openai:gpt-4o

This configuration prioritizes accuracy while providing fallback for
availability. Monthly cost estimate: ~$50-100 for moderate usage.

You: Add a cheaper option for simple tasks

Assistant: I'll add a cost-effective alias for simpler coding tasks.

[Calling tool: add_alias]
Added alias 'coder-fast' with model: openai:gpt-4o-mini

This model is 10x cheaper and perfect for simple completions, syntax fixes,
and basic code generation.
```

## Persistent History

All conversations are automatically saved in `~/.llmring/mcp_chat/`:

```
~/.llmring/mcp_chat/
├── command_history.txt       # Terminal command history
├── conversation_<id>.json    # Individual conversations
└── sessions.json             # Session metadata and index
```

Each session includes:

- Unique session ID and timestamp
- Complete message history
- Tool calls and their results
- Model used for each response

## Connecting to MCP Servers

### Stdio Servers (Local Processes)

Most common for development and local tools:

```bash
# Python MCP server
llmring mcp chat --server "stdio://python -m mypackage.mcp_server"

# Node.js MCP server
llmring mcp chat --server "stdio://node my-mcp-server.js"

# Any executable
llmring mcp chat --server "stdio:///usr/local/bin/my-mcp-tool"
```

### HTTP Servers

For REST API-based MCP servers:

```bash
# Local development
llmring mcp chat --server "http://localhost:8080"

# Remote server
llmring mcp chat --server "https://api.example.com/mcp"
```

### WebSocket Servers

For real-time, bidirectional communication:

```bash
# WebSocket connection
llmring mcp chat --server "ws://localhost:8080"

# Secure WebSocket
llmring mcp chat --server "wss://mcp.example.com"
```

## Creating Custom MCP Servers

Build your own MCP servers to expose tools to LLMs:

### Simple Python Example

```python
#!/usr/bin/env python3
"""my_mcp_server.py - Custom MCP server example"""

import asyncio

from llmring.mcp.server import MCPServer
from llmring.mcp.server.transport.stdio import StdioTransport

# Create server
server = MCPServer(
    name="My Custom Tools",
    version="1.0.0"
)

# Register tools
@server.function_registry.register(
    name="get_weather",
    description="Get weather for a location"
)
def get_weather(location: str) -> dict:
    return {
        "location": location,
        "temperature": 72,
        "conditions": "sunny"
    }

@server.function_registry.register(
    name="calculate",
    description="Perform calculations"
)
def calculate(expression: str) -> dict:
    try:
        # eval is unsafe on untrusted input; use ast.literal_eval or a real
        # expression parser in production
        result = eval(expression)
        return {"result": result}
    except Exception as e:
        return {"error": str(e)}

# Run server
async def main():
    transport = StdioTransport()
    await server.run(transport)

if __name__ == "__main__":
    asyncio.run(main())
```

Connect to your server:

```bash
llmring mcp chat --server "stdio://python my_mcp_server.py"
```

## Enhanced LLM with MCP

For programmatic usage with automatic tool execution:

### Basic Usage

```python
from llmring.mcp.client.enhanced_llm import create_enhanced_llm

# Create enhanced LLM
llm = await create_enhanced_llm(
    model="balanced",
    mcp_server_path="stdio://python -m my_mcp_server"
)

# Chat with automatic tool execution
messages = [{"role": "user", "content": "What's the weather in NYC?"}] response = await llm.chat(messages) # Tools are called automatically and results integrated print(response.content) # "The weather in NYC is 72°F and sunny." ``` ### Streaming with Tools ```python # Streaming works seamlessly with tool calls messages = [{"role": "user", "content": "Analyze this file and summarize it"}] async for chunk in await llm.chat_stream(messages): if chunk.type == "content": print(chunk.content, end="", flush=True) elif chunk.type == "tool_call": print(f"\n[Calling tool: {chunk.tool_call.name}]") elif chunk.type == "tool_result": print(f"\n[Tool result received]") ``` ### Direct MCP Client Usage ```python from llmring.mcp.client.mcp_client import MCPClient # Connect to MCP server client = MCPClient("http://localhost:8000") await client.initialize() # List available tools tools = await client.list_tools() for tool in tools: print(f"- {tool['name']}: {tool['description']}") # Execute a tool directly result = await client.call_tool( "read_file", {"path": "/path/to/file.txt"} ) print(result) # Clean up await client.close() ``` ## Best Practices ### Server Design Philosophy When creating MCP servers: 1. **Data-Focused Tools**: Design tools to provide data and perform actions, not make decisions 2. **LLM in Driver's Seat**: Let the LLM decide how to use tools based on user intent 3. **Clear Tool Names**: Use descriptive, action-oriented names 4. **Comprehensive Schemas**: Provide detailed parameter descriptions 5. **Error Handling**: Return informative error messages ### Security Considerations 1. **Validate Input**: Always validate tool parameters 2. **Limit Scope**: Tools should have minimal necessary permissions 3. **Secure Transport**: Use HTTPS/WSS in production 4. **Authentication**: Implement proper auth for production servers 5. **Audit Logging**: Log tool usage for security monitoring ### Performance Tips 1. **Choose Appropriate Models**: Match model capability to task complexity 2. **Cache Results**: Implement caching in MCP servers for expensive operations 3. **Streaming Responses**: Use streaming for long-running operations 4. **Batch Operations**: Design tools to handle batch requests when possible ## Troubleshooting **Server won't start:** - Check the server path is correct - Ensure proper permissions - Verify Python/Node.js environment is activated **Tools not appearing:** - Run `/tools` to refresh tool list - Check server logs for registration errors - Verify tool schemas are valid **History not saving:** - Check permissions on `~/.llmring/mcp_chat/` - Ensure enough disk space - Look for errors in debug mode (`--debug`) **Connection errors:** - Verify server is running - Check firewall/network settings - Ensure correct protocol (stdio/http/ws) ## Further Resources - Example MCP servers: See `examples/mcp/` in the [llmring repository](https://github.com/juanre/llmring) - MCP specification: [Model Context Protocol](https://github.com/anthropics/model-context-protocol) - LLMRing MCP source: `src/llmring/mcp/` in the repository