# LLM Routing

## Overview
The OMEGA router classifies prompts by intent and routes them to the optimal LLM based on priority mode, context affinity, and token budget. The classifier runs locally via ONNX prototypes in under 2 milliseconds --- no external API calls for classification.
Install: `pip install omega-memory[router]`
The router supports 5 intents, 5 providers, and 4 priority modes. It tracks which model you are currently using (context affinity) to avoid unnecessary switches, and automatically routes to large-context models (Gemini) when prompts exceed 100K tokens.
## Quick Example

```python
# Classify a prompt
omega_classify_intent(prompt="Write a recursive tree traversal in Python")
# Returns: intent="coding", confidence=0.92

# Route to the best model
omega_route_prompt(prompt="Write a recursive tree traversal in Python", priority="quality")
# Returns: recommended model, provider, and reasoning

# Switch priority mode for the session
omega_set_priority_mode(mode="speed")
```
## Intents

| Intent | Description | Example Prompts |
|---|---|---|
| `coding` | Writing, debugging, or reviewing code | "Fix the null pointer in auth.py", "Write a REST endpoint" |
| `creative` | Writing prose, brainstorming, content generation | "Draft a blog post about microservices", "Name this feature" |
| `logic` | Reasoning, math, analysis, architecture decisions | "Compare PostgreSQL vs DynamoDB for this use case" |
| `exploration` | Research, open-ended questions, broad investigation | "How does Kubernetes handle pod scheduling?" |
| `simple_edit` | Small edits, renames, formatting, typo fixes | "Rename this variable", "Add a docstring to this function" |
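Under the hood, prototype-based classification of this kind reduces to a nearest-neighbor search in embedding space. The sketch below is illustrative only: the toy 3-dimensional vectors are invented stand-ins, whereas the real classifier scores prompts with ONNX-generated embeddings.

```python
import math

# Invented toy "embeddings" -- the real prototypes are ONNX model outputs.
PROTOTYPES = {
    "coding":      [0.9, 0.1, 0.0],
    "creative":    [0.1, 0.9, 0.0],
    "logic":       [0.5, 0.2, 0.8],
    "exploration": [0.2, 0.3, 0.9],
    "simple_edit": [0.7, 0.0, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(embedding):
    """Return (intent, score) for the nearest prototype."""
    scores = {intent: cosine(embedding, p) for intent, p in PROTOTYPES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# A toy "coding-like" prompt embedding lands on the coding prototype.
intent, conf = classify([0.85, 0.15, 0.05])
```

Because the comparison is a handful of dot products, classification stays local and fast regardless of which providers are configured.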
## Providers and Models
| Provider | Models | Strengths |
|---|---|---|
| Anthropic | Claude Opus 4.6, Claude Sonnet | Coding, reasoning |
| OpenAI | GPT-4o, GPT-4o-mini | General purpose, creative |
| Google | Gemini 2.5 Pro/Flash, Gemini 3 Pro/Flash | Large context (1M+ tokens) |
| Groq | Llama 3.1 8B Instant | Speed, simple edits |
| xAI | Grok-4, Grok-4.1-fast | Exploration, long context (2M) |
## Priority Modes

| Mode | Behavior |
|---|---|
| `cost` | Route to the cheapest model that can handle the intent |
| `speed` | Route to the fastest model (Groq for simple edits, smaller models elsewhere) |
| `quality` | Route to the most capable model for the intent |
| `balanced` | Default; weighs cost, speed, and quality equally |
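One way to picture these modes is as weights in a linear scoring function over candidate models. This is a hypothetical sketch, not OMEGA's actual configuration; the weights and per-model attribute values are invented for illustration.

```python
# Hypothetical per-mode weights over (cost, speed, quality).
MODE_WEIGHTS = {
    "cost":     (1.0, 0.0, 0.0),
    "speed":    (0.0, 1.0, 0.0),
    "quality":  (0.0, 0.0, 1.0),
    "balanced": (1 / 3, 1 / 3, 1 / 3),
}

# Invented candidate attributes, each normalized to 0..1 where higher is
# better (cheapness, speed, capability).
CANDIDATES = {
    "groq/llama-3.1-8b-instant": (0.95, 0.98, 0.40),
    "anthropic/claude-opus-4.6": (0.20, 0.50, 0.98),
    "openai/gpt-4o-mini":        (0.85, 0.80, 0.60),
}

def pick_model(mode):
    """Pick the candidate with the highest weighted score for this mode."""
    wc, ws, wq = MODE_WEIGHTS[mode]
    def score(attrs):
        return wc * attrs[0] + ws * attrs[1] + wq * attrs[2]
    return max(CANDIDATES, key=lambda m: score(CANDIDATES[m]))
```

With these invented numbers, `cost` routes to the cheapest small model while `quality` routes to the most capable one.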
## Tools Reference

| Tool | Purpose |
|---|---|
| `omega_route_prompt` | Route a prompt to the optimal LLM. Accepts priority override and estimated token count. |
| `omega_classify_intent` | Classify a prompt's intent without routing. Returns intent, confidence, and all scores. |
| `omega_router_status` | Show provider availability (which have API keys), routing stats, and current priority mode. |
| `omega_set_priority_mode` | Set the routing priority mode for all subsequent routing decisions. |
| `omega_get_model_config` | View routing configuration for a specific intent or all intents. |
| `omega_switch_model` | Switch to a different LLM with OMEGA memory preservation. Retrieves relevant context for the target model. |
| `omega_get_current_model` | Get the current model for a session (from context affinity tracking). |
| `omega_router_context` | Get session context: current model, provider, token count, conversation depth. |
| `omega_warm_router` | Pre-load the intent classifier prototypes and model config. Reduces first-route latency. |
| `omega_router_benchmark` | Run a quick accuracy test: 6 sample prompts, verifying intent classification and routing. |
## Setup

Store API keys in `~/.omega/secrets.json` (file is `chmod 600`):

```json
{
  "anthropic_api_key": "sk-ant-...",
  "openai_api_key": "sk-...",
  "google_api_key": "...",
  "groq_api_key": "gsk_...",
  "xai_api_key": "xai-..."
}
```

You can also set keys via environment variables in `~/.zshrc`:

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
```

Verify setup:

```python
omega_router_status()
# Shows which providers have valid keys
```
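For reference, a client-side key lookup might prefer the secrets file and fall back to the environment. This is a minimal sketch; the precedence order and function name are assumptions for illustration, not documented OMEGA behavior.

```python
import json
import os
from pathlib import Path

def load_api_key(provider, secrets_path=Path.home() / ".omega" / "secrets.json"):
    """Return the API key for `provider` (e.g. "anthropic"), or None.

    Assumed precedence: secrets file first, then the conventional
    PROVIDER_API_KEY environment variable.
    """
    key_name = f"{provider}_api_key"
    if secrets_path.exists():
        secrets = json.loads(secrets_path.read_text())
        if secrets.get(key_name):
            return secrets[key_name]
    return os.environ.get(f"{provider.upper()}_API_KEY")
```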
## Common Workflows

### Basic Routing
Set a priority mode once, then route prompts:
```python
omega_set_priority_mode(mode="balanced")

omega_route_prompt(prompt="Implement a rate limiter with sliding window")
# Returns: Anthropic/Claude Opus 4.6 (coding intent, quality match)

omega_route_prompt(prompt="Fix the typo in the README")
# Returns: Groq/Llama 3.1 8B (simple_edit intent, speed match)
```
Override priority per-prompt:
```python
omega_route_prompt(prompt="Analyze this architecture", priority="quality")
```
### Intent Classification
Classify without routing --- useful for understanding how the router sees your prompts:
```python
omega_classify_intent(prompt="Should we use a message queue or direct HTTP calls?", detailed=True)
# Returns:
#   intent: logic
#   confidence: 0.87
#   scores: {logic: 0.87, coding: 0.45, exploration: 0.38, creative: 0.12, simple_edit: 0.03}
```
### Large Context Override
When a prompt involves more than 100K tokens (e.g., analyzing a large codebase), the router automatically overrides to a large-context model:
```python
omega_route_prompt(prompt="Analyze this codebase for security issues", estimated_tokens=150000)
# Returns: Google/Gemini or xAI/Grok-4.1-fast (large context capable)
```
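Conceptually, the override acts as a pre-filter before normal scoring: once the estimate crosses the 100K-token threshold, models whose context windows are too small are dropped. In the sketch below, the model names and the 200K/128K window figures are illustrative assumptions; only the 100K threshold and the 1M+/2M windows come from this document.

```python
LARGE_CONTEXT_THRESHOLD = 100_000  # tokens, per the docs above

# Context windows in tokens. The Gemini (1M+) and Grok (2M) figures come
# from the providers table; the other two are assumed for illustration.
CONTEXT_WINDOWS = {
    "google/gemini-2.5-pro":     1_000_000,
    "xai/grok-4.1-fast":         2_000_000,
    "anthropic/claude-opus-4.6":   200_000,  # assumed
    "groq/llama-3.1-8b-instant":   128_000,  # assumed
}

def eligible_models(estimated_tokens):
    """Below the threshold, all models are candidates; above it, only
    models whose window can actually hold the prompt survive."""
    if estimated_tokens <= LARGE_CONTEXT_THRESHOLD:
        return list(CONTEXT_WINDOWS)
    return [m for m, window in CONTEXT_WINDOWS.items()
            if window >= estimated_tokens]
```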
### Context Affinity
The router tracks which model you are currently using and penalizes unnecessary switches. This prevents thrashing between providers mid-conversation:
```python
omega_router_context(session_id="agent-1")
# Returns: current model, provider, token count, conversation depth
```
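Affinity can be modeled as a flat penalty charged to every candidate except the current model, so a challenger must beat the incumbent by a clear margin. The penalty value below is invented for illustration; the real penalty is internal to the router.

```python
SWITCH_PENALTY = 0.15  # invented value for illustration

def pick_with_affinity(scores, current_model):
    """Pick a model from {model: base_score}, charging every candidate
    except the current model a flat switch penalty."""
    adjusted = {
        model: score - (0.0 if model == current_model else SWITCH_PENALTY)
        for model, score in scores.items()
    }
    return max(adjusted, key=adjusted.get)

# A slightly better challenger loses; a much better one still wins.
stay = pick_with_affinity({"claude-opus": 0.80, "gpt-4o": 0.85}, "claude-opus")
switch = pick_with_affinity({"claude-opus": 0.80, "gpt-4o": 0.99}, "claude-opus")
```

This is what "significantly better" means in practice: the gap has to exceed the switch penalty.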
### Model Switching

When you need to switch providers (e.g., moving from coding to creative work), use `omega_switch_model` to preserve OMEGA memory context:
```python
omega_switch_model(
    session_id="agent-1",
    target_provider="openai",
    target_model="gpt-4o",
    retrieve_context=True
)
# Retrieves relevant OMEGA context and adapts it for the target model
```
### Warm-Up
Pre-load the classifier at session start to eliminate first-route latency:
```python
omega_warm_router()
```
### Benchmarking
Verify the router is working correctly:
```python
omega_router_benchmark()
# Runs 6 sample prompts, reports intent classification accuracy and routing decisions
```
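Conceptually, a benchmark like this just runs labeled prompts through the classifier and reports accuracy. The sketch below substitutes a keyword stub for the real ONNX classifier; the sample prompts are drawn from the intents table above, and the stub's rules are invented.

```python
# Labeled sample prompts in the spirit of the built-in benchmark.
SAMPLES = [
    ("Fix the null pointer in auth.py", "coding"),
    ("Draft a blog post about microservices", "creative"),
    ("Compare PostgreSQL vs DynamoDB for this use case", "logic"),
    ("How does Kubernetes handle pod scheduling?", "exploration"),
    ("Rename this variable", "simple_edit"),
    ("Write a REST endpoint", "coding"),
]

def stub_classify(prompt):
    """Keyword stand-in for the real classifier -- illustration only."""
    p = prompt.lower()
    if any(w in p for w in ("fix", "write", "endpoint", ".py")):
        return "coding"
    if any(w in p for w in ("draft", "blog")):
        return "creative"
    if any(w in p for w in ("compare", " vs ")):
        return "logic"
    if any(w in p for w in ("how", "why")):
        return "exploration"
    return "simple_edit"

def run_benchmark(samples, classify):
    """Return classification accuracy over a labeled sample set."""
    correct = sum(classify(prompt) == intent for prompt, intent in samples)
    return correct / len(samples)
```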
## Cross-Model Consultation

OMEGA includes two consultation tools that let you get a second opinion from a different LLM provider. The system is provider-aware: if you are running on Claude, you get `omega_consult_gpt`; if you are running on OpenAI or another non-Anthropic provider, you get `omega_consult_claude`.
When to use consultation:
- Stuck on a problem for 10+ minutes or after 3+ failed approaches
- Facing an irreversible architecture decision
- Debugging a dead end
- Cross-validating a fragile solution
Example (from a Claude agent):
```python
omega_consult_gpt(
    prompt="Should I use a message queue or direct HTTP calls for this microservice?",
    context="We have 3 services, ~100 req/sec, need exactly-once delivery for payment events",
    temperature=0.5
)
# Returns: GPT's analysis with a different perspective
```
Example (from an OpenAI agent):
```python
omega_consult_claude(
    prompt="Review this SQL migration for edge cases",
    context="ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT FALSE; ...",
    temperature=0.3
)
```
Consultation requires the target provider's API key in `~/.omega/secrets.json`. If the key is missing, the tool is unavailable: there is no error, it simply does not appear in the tool list.
## Tips

- Warm the router at session start. Calling `omega_warm_router` loads the ONNX prototypes into memory, reducing first-route latency from ~50ms to <2ms.
- Classification is free. `omega_classify_intent` runs entirely locally via ONNX --- no API calls, no cost, sub-2ms latency. Use it liberally.
- Priority modes are session-wide. `omega_set_priority_mode` affects all subsequent routing in the session. Override per-prompt with the `priority` parameter on `omega_route_prompt`.
- Context affinity reduces churn. The router prefers to stay on the current model and only switches when a different model is significantly better for the intent.
- Estimated tokens matter. If you know a prompt involves a large context, pass `estimated_tokens` to trigger the large-context override; without it, the router cannot know the full context size.
- Secrets file permissions. `~/.omega/secrets.json` should be `chmod 600`; the `omega setup` CLI sets this automatically.
- Not all providers are required. The router works with whatever providers have valid keys. Missing providers are simply skipped during routing.
- Force an intent. If you disagree with the classifier, use `force_intent` on `omega_route_prompt` to override classification.