# LLM Routing

## Overview
The OMEGA router classifies prompts by intent and routes them to the optimal LLM based on priority mode, context affinity, and token budget. The classifier runs locally via ONNX prototypes in under 2 milliseconds --- no external API calls for classification.
Install: `pip install omega-memory[router]`
The router supports 5 intents, 5 providers, and 4 priority modes. It tracks which model you are currently using (context affinity) to avoid unnecessary switches, and automatically routes to large-context models (Gemini) when prompts exceed 100K tokens.
## Quick Example

```python
# Classify a prompt
omega_classify_intent(prompt="Write a recursive tree traversal in Python")
# Returns: intent="coding", confidence=0.92

# Route to the best model
omega_route_prompt(prompt="Write a recursive tree traversal in Python", priority="quality")
# Returns: recommended model, provider, and reasoning

# Switch priority mode for the session
omega_set_priority_mode(mode="speed")
```
## Intents

| Intent | Description | Example Prompts |
|---|---|---|
| `coding` | Writing, debugging, or reviewing code | "Fix the null pointer in auth.py", "Write a REST endpoint" |
| `creative` | Writing prose, brainstorming, content generation | "Draft a blog post about microservices", "Name this feature" |
| `logic` | Reasoning, math, analysis, architecture decisions | "Compare PostgreSQL vs DynamoDB for this use case" |
| `exploration` | Research, open-ended questions, broad investigation | "How does Kubernetes handle pod scheduling?" |
| `simple_edit` | Small edits, renames, formatting, typo fixes | "Rename this variable", "Add a docstring to this function" |
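Under the hood, prototype-based classification of this kind reduces to a nearest-neighbor search in embedding space. The sketch below is illustrative only: the toy 3-dimensional vectors are invented stand-ins, whereas the real classifier scores prompts with ONNX-generated embeddings.

```python
import math

# Invented toy "embeddings" -- the real prototypes are ONNX model outputs.
PROTOTYPES = {
    "coding":      [0.9, 0.1, 0.0],
    "creative":    [0.1, 0.9, 0.0],
    "logic":       [0.5, 0.2, 0.8],
    "exploration": [0.2, 0.3, 0.9],
    "simple_edit": [0.7, 0.0, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(embedding):
    """Return (intent, score) for the nearest prototype."""
    scores = {intent: cosine(embedding, p) for intent, p in PROTOTYPES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# A toy "coding-like" prompt embedding lands on the coding prototype.
intent, conf = classify([0.85, 0.15, 0.05])
```

Because the comparison is a handful of dot products, classification stays local and fast regardless of which providers are configured.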
## Providers and Models
| Provider | Models | Strengths |
|---|---|---|
| Anthropic | Claude Opus 4.6, Claude Sonnet | Coding, reasoning |
| OpenAI | GPT-4o, GPT-4o-mini | General purpose, creative |
| Google | Gemini 2.5 Pro/Flash, Gemini 3 Pro/Flash | Large context (1M+ tokens) |
| Groq | Llama 3.1 8B Instant | Speed, simple edits |
| xAI | Grok-4, Grok-4.1-fast | Exploration, long context (2M) |
## Priority Modes

| Mode | Behavior |
|---|---|
| `cost` | Route to the cheapest model that can handle the intent |
| `speed` | Route to the fastest model (Groq for simple edits, smaller models elsewhere) |
| `quality` | Route to the most capable model for the intent |
| `balanced` | Default; weighs cost, speed, and quality equally |
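One way to picture these modes is as weights in a linear scoring function over candidate models. This is a hypothetical sketch, not OMEGA's actual configuration; the weights and per-model attribute values are invented for illustration.

```python
# Hypothetical per-mode weights over (cost, speed, quality).
MODE_WEIGHTS = {
    "cost":     (1.0, 0.0, 0.0),
    "speed":    (0.0, 1.0, 0.0),
    "quality":  (0.0, 0.0, 1.0),
    "balanced": (1 / 3, 1 / 3, 1 / 3),
}

# Invented candidate attributes, each normalized to 0..1 where higher is
# better (cheapness, speed, capability).
CANDIDATES = {
    "groq/llama-3.1-8b-instant": (0.95, 0.98, 0.40),
    "anthropic/claude-opus-4.6": (0.20, 0.50, 0.98),
    "openai/gpt-4o-mini":        (0.85, 0.80, 0.60),
}

def pick_model(mode):
    """Pick the candidate with the highest weighted score for this mode."""
    wc, ws, wq = MODE_WEIGHTS[mode]
    def score(attrs):
        return wc * attrs[0] + ws * attrs[1] + wq * attrs[2]
    return max(CANDIDATES, key=lambda m: score(CANDIDATES[m]))
```

With these invented numbers, `cost` routes to the cheapest small model while `quality` routes to the most capable one.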
## Tools Reference

| Tool | Purpose |
|---|---|
| `omega_route_prompt` | Route a prompt to the optimal LLM. Accepts priority override and estimated token count. |
| `omega_classify_intent` | Classify a prompt's intent without routing. Returns intent, confidence, and all scores. |
| `omega_router_status` | Show provider availability (which have API keys), routing stats, and current priority mode. |
| `omega_set_priority_mode` | Set the routing priority mode for all subsequent routing decisions. |
| `omega_get_model_config` | View routing configuration for a specific intent or all intents. |
| `omega_switch_model` | Switch to a different LLM with OMEGA memory preservation. Retrieves relevant context for the target model. |
| `omega_get_current_model` | Get the current model for a session (from context affinity tracking). |
| `omega_router_context` | Get session context: current model, provider, token count, conversation depth. |
| `omega_warm_router` | Pre-load the intent classifier prototypes and model config. Reduces first-route latency. |
| `omega_router_benchmark` | Run a quick accuracy test: 6 sample prompts, verifying intent classification and routing. |
## Setup

Store API keys in `~/.omega/secrets.json` (file is `chmod 600`):

```json
{
  "anthropic_api_key": "sk-ant-...",
  "openai_api_key": "sk-...",
  "google_api_key": "...",
  "groq_api_key": "gsk_...",
  "xai_api_key": "xai-..."
}
```

You can also set keys via environment variables in `~/.zshrc`:

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
```

Verify setup:

```python
omega_router_status()
# Shows which providers have valid keys
```
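For reference, a client-side key lookup might prefer the secrets file and fall back to the environment. This is a minimal sketch; the precedence order and function name are assumptions for illustration, not documented OMEGA behavior.

```python
import json
import os
from pathlib import Path

def load_api_key(provider, secrets_path=Path.home() / ".omega" / "secrets.json"):
    """Return the API key for `provider` (e.g. "anthropic"), or None.

    Assumed precedence: secrets file first, then the conventional
    PROVIDER_API_KEY environment variable.
    """
    key_name = f"{provider}_api_key"
    if secrets_path.exists():
        secrets = json.loads(secrets_path.read_text())
        if secrets.get(key_name):
            return secrets[key_name]
    return os.environ.get(f"{provider.upper()}_API_KEY")
```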
## Common Workflows

### Basic Routing
Set a priority mode once, then route prompts:
```python
omega_set_priority_mode(mode="balanced")

omega_route_prompt(prompt="Implement a rate limiter with sliding window")
# Returns: Anthropic/Claude Opus 4.6 (coding intent, quality match)

omega_route_prompt(prompt="Fix the typo in the README")
# Returns: Groq/Llama 3.1 8B (simple_edit intent, speed match)
```
Override priority per-prompt:
```python
omega_route_prompt(prompt="Analyze this architecture", priority="quality")
```
### Intent Classification
Classify without routing --- useful for understanding how the router sees your prompts:
```python
omega_classify_intent(prompt="Should we use a message queue or direct HTTP calls?", detailed=True)
# Returns:
#   intent: logic
#   confidence: 0.87
#   scores: {logic: 0.87, coding: 0.45, exploration: 0.38, creative: 0.12, simple_edit: 0.03}
```
### Large Context Override
When a prompt involves more than 100K tokens (e.g., analyzing a large codebase), the router automatically overrides to a large-context model:
```python
omega_route_prompt(prompt="Analyze this codebase for security issues", estimated_tokens=150000)
# Returns: Google/Gemini or xAI/Grok-4.1-fast (large context capable)
```
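Conceptually, the override acts as a pre-filter before normal scoring: once the estimate crosses the 100K-token threshold, models whose context windows are too small are dropped. In the sketch below, the model names and the 200K/128K window figures are illustrative assumptions; only the 100K threshold and the 1M+/2M windows come from this document.

```python
LARGE_CONTEXT_THRESHOLD = 100_000  # tokens, per the docs above

# Context windows in tokens. The Gemini (1M+) and Grok (2M) figures come
# from the providers table; the other two are assumed for illustration.
CONTEXT_WINDOWS = {
    "google/gemini-2.5-pro":     1_000_000,
    "xai/grok-4.1-fast":         2_000_000,
    "anthropic/claude-opus-4.6":   200_000,  # assumed
    "groq/llama-3.1-8b-instant":   128_000,  # assumed
}

def eligible_models(estimated_tokens):
    """Below the threshold, all models are candidates; above it, only
    models whose window can actually hold the prompt survive."""
    if estimated_tokens <= LARGE_CONTEXT_THRESHOLD:
        return list(CONTEXT_WINDOWS)
    return [m for m, window in CONTEXT_WINDOWS.items()
            if window >= estimated_tokens]
```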
### Context Affinity
The router tracks which model you are currently using and penalizes unnecessary switches. This prevents thrashing between providers mid-conversation:
```python
omega_router_context(session_id="agent-1")
# Returns: current model, provider, token count, conversation depth
```
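Affinity can be modeled as a flat penalty charged to every candidate except the current model, so a challenger must beat the incumbent by a clear margin. The penalty value below is invented for illustration; the real penalty is internal to the router.

```python
SWITCH_PENALTY = 0.15  # invented value for illustration

def pick_with_affinity(scores, current_model):
    """Pick a model from {model: base_score}, charging every candidate
    except the current model a flat switch penalty."""
    adjusted = {
        model: score - (0.0 if model == current_model else SWITCH_PENALTY)
        for model, score in scores.items()
    }
    return max(adjusted, key=adjusted.get)

# A slightly better challenger loses; a much better one still wins.
stay = pick_with_affinity({"claude-opus": 0.80, "gpt-4o": 0.85}, "claude-opus")
switch = pick_with_affinity({"claude-opus": 0.80, "gpt-4o": 0.99}, "claude-opus")
```

This is what "significantly better" means in practice: the gap has to exceed the switch penalty.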
### Model Switching

When you need to switch providers (e.g., moving from coding to creative work), use `omega_switch_model` to preserve OMEGA memory context:
```python
omega_switch_model(
    session_id="agent-1",
    target_provider="openai",
    target_model="gpt-4o",
    retrieve_context=True
)
# Retrieves relevant OMEGA context and adapts it for the target model
```
### Warm-Up
Pre-load the classifier at session start to eliminate first-route latency:
```python
omega_warm_router()
```
### Benchmarking
Verify the router is working correctly:
```python
omega_router_benchmark()
# Runs 6 sample prompts, reports intent classification accuracy and routing decisions
```
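Conceptually, a benchmark like this just runs labeled prompts through the classifier and reports accuracy. The sketch below substitutes a keyword stub for the real ONNX classifier; the sample prompts are drawn from the intents table above, and the stub's rules are invented.

```python
# Labeled sample prompts in the spirit of the built-in benchmark.
SAMPLES = [
    ("Fix the null pointer in auth.py", "coding"),
    ("Draft a blog post about microservices", "creative"),
    ("Compare PostgreSQL vs DynamoDB for this use case", "logic"),
    ("How does Kubernetes handle pod scheduling?", "exploration"),
    ("Rename this variable", "simple_edit"),
    ("Write a REST endpoint", "coding"),
]

def stub_classify(prompt):
    """Keyword stand-in for the real classifier -- illustration only."""
    p = prompt.lower()
    if any(w in p for w in ("fix", "write", "endpoint", ".py")):
        return "coding"
    if any(w in p for w in ("draft", "blog")):
        return "creative"
    if any(w in p for w in ("compare", " vs ")):
        return "logic"
    if any(w in p for w in ("how", "why")):
        return "exploration"
    return "simple_edit"

def run_benchmark(samples, classify):
    """Return classification accuracy over a labeled sample set."""
    correct = sum(classify(prompt) == intent for prompt, intent in samples)
    return correct / len(samples)
```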
## Cross-Model Consultation

OMEGA includes two consultation tools that let you get a second opinion from a different LLM provider. The system is provider-aware: if you are running on Claude, you get `omega_consult_gpt`; if you are running on OpenAI or another non-Anthropic provider, you get `omega_consult_claude`.
When to use consultation:
- Stuck on a problem for 10+ minutes or after 3+ failed approaches
- Facing an irreversible architecture decision
- Debugging a dead end
- Cross-validating a fragile solution
Example (from a Claude agent):
```python
omega_consult_gpt(
    prompt="Should I use a message queue or direct HTTP calls for this microservice?",
    context="We have 3 services, ~100 req/sec, need exactly-once delivery for payment events",
    temperature=0.5
)
# Returns: GPT's analysis with a different perspective
```
Example (from an OpenAI agent):
```python
omega_consult_claude(
    prompt="Review this SQL migration for edge cases",
    context="ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT FALSE; ...",
    temperature=0.3
)
```
Consultation requires the target provider's API key in `~/.omega/secrets.json`. If the key is missing, the tool is unavailable: there is no error, it simply does not appear in the tool list.
## Tips

- Warm the router at session start. Calling `omega_warm_router` loads the ONNX prototypes into memory, reducing first-route latency from ~50ms to <2ms.
- Classification is free. `omega_classify_intent` runs entirely locally via ONNX --- no API calls, no cost, sub-2ms latency. Use it liberally.
- Priority modes are session-wide. `omega_set_priority_mode` affects all subsequent routing in the session. Override per-prompt with the `priority` parameter on `omega_route_prompt`.
- Context affinity reduces churn. The router prefers to stay on the current model and only switches when a different model is significantly better for the intent.
- Estimated tokens matter. If you know a prompt involves a large context, pass `estimated_tokens` to trigger the large-context override; without it, the router cannot know the full context size.
- Secrets file permissions. `~/.omega/secrets.json` should be `chmod 600`; the `omega setup` CLI sets this automatically.
- Not all providers are required. The router works with whatever providers have valid keys. Missing providers are simply skipped during routing.
- Force an intent. If you disagree with the classifier, use `force_intent` on `omega_route_prompt` to override classification.