
LLM Routing

Overview

The OMEGA router classifies prompts by intent and routes them to the optimal LLM based on priority mode, context affinity, and token budget. The classifier runs locally via ONNX prototypes in under 2 milliseconds, with no external API calls for classification.

Install: pip install omega-memory[router]

The router supports 5 intents, 5 providers, and 4 priority modes. It tracks which model you are currently using (context affinity) to avoid unnecessary switches, and automatically routes to large-context models (e.g., Gemini or Grok) when prompts exceed 100K tokens.

Quick Example

# Classify a prompt
omega_classify_intent(prompt="Write a recursive tree traversal in Python")
# Returns: intent="coding", confidence=0.92

# Route to the best model
omega_route_prompt(prompt="Write a recursive tree traversal in Python", priority="quality")
# Returns: recommended model, provider, and reasoning

# Switch priority mode for the session
omega_set_priority_mode(mode="speed")

Intents

| Intent | Description | Example Prompts |
|---|---|---|
| coding | Writing, debugging, or reviewing code | "Fix the null pointer in auth.py", "Write a REST endpoint" |
| creative | Writing prose, brainstorming, content generation | "Draft a blog post about microservices", "Name this feature" |
| logic | Reasoning, math, analysis, architecture decisions | "Compare PostgreSQL vs DynamoDB for this use case" |
| exploration | Research, open-ended questions, broad investigation | "How does Kubernetes handle pod scheduling?" |
| simple_edit | Small edits, renames, formatting, typo fixes | "Rename this variable", "Add a docstring to this function" |
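Under the hood, the classifier scores a prompt embedding against one ONNX prototype per intent. The real prototypes are learned embeddings; the tiny 3-dimensional vectors below are made-up stand-ins, so this is only a sketch of the prototype-similarity idea, not OMEGA's actual classifier:

```python
import math

# Hypothetical prototype vectors, one per intent (real ones come from ONNX).
PROTOTYPES = {
    "coding": [0.9, 0.1, 0.0],
    "creative": [0.1, 0.9, 0.0],
    "logic": [0.5, 0.2, 0.8],
    "exploration": [0.2, 0.4, 0.7],
    "simple_edit": [0.7, 0.0, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(embedding):
    """Score the prompt embedding against every intent prototype."""
    scores = {intent: cosine(embedding, proto) for intent, proto in PROTOTYPES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], scores

# An embedding close to the "coding" prototype classifies as coding.
intent, confidence, scores = classify([0.85, 0.15, 0.05])
```

Because all scores are returned, a caller can inspect the full distribution, which is what omega_classify_intent exposes with detailed=True.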

Providers and Models

| Provider | Models | Strengths |
|---|---|---|
| Anthropic | Claude Opus 4.6, Claude Sonnet | Coding, reasoning |
| OpenAI | GPT-4o, GPT-4o-mini | General purpose, creative |
| Google | Gemini 2.5 Pro/Flash, Gemini 3 Pro/Flash | Large context (1M+ tokens) |
| Groq | Llama 3.1 8B Instant | Speed, simple edits |
| xAI | Grok-4, Grok-4.1-fast | Exploration, long context (2M) |

Priority Modes

| Mode | Behavior |
|---|---|
| cost | Route to the cheapest model that can handle the intent |
| speed | Route to the fastest model (Groq for simple edits, smaller models elsewhere) |
| quality | Route to the most capable model for the intent |
| balanced | Default. Weighs cost, speed, and quality equally |
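A priority mode can be read as a weighting over per-model cost, speed, and quality scores. OMEGA's actual scoring tables are not documented here; the model names and per-axis numbers below are invented for illustration, so this is a minimal sketch of mode-weighted selection rather than the real routing logic:

```python
# Illustrative per-model scores on a 0-1 scale (NOT OMEGA's real numbers).
# Higher is better on every axis; "cost" here means cheapness.
MODELS = {
    "claude-opus-4.6": {"cost": 0.2, "speed": 0.4, "quality": 1.0},
    "gpt-4o-mini": {"cost": 0.9, "speed": 0.8, "quality": 0.5},
    "llama-3.1-8b-instant": {"cost": 1.0, "speed": 1.0, "quality": 0.3},
}

# Each priority mode is a weighting over the three axes.
WEIGHTS = {
    "cost": {"cost": 1.0, "speed": 0.0, "quality": 0.0},
    "speed": {"cost": 0.0, "speed": 1.0, "quality": 0.0},
    "quality": {"cost": 0.0, "speed": 0.0, "quality": 1.0},
    "balanced": {"cost": 1 / 3, "speed": 1 / 3, "quality": 1 / 3},
}

def pick_model(mode):
    """Return the model with the highest weighted score for this mode."""
    w = WEIGHTS[mode]
    def score(model):
        axes = MODELS[model]
        return sum(w[axis] * axes[axis] for axis in w)
    return max(MODELS, key=score)
```

With these made-up numbers, quality mode picks the most capable model while speed and cost modes pick the small fast one, mirroring the behavior described in the table.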

Tools Reference

| Tool | Purpose |
|---|---|
| omega_route_prompt | Route a prompt to the optimal LLM. Accepts priority override and estimated token count. |
| omega_classify_intent | Classify a prompt's intent without routing. Returns intent, confidence, and all scores. |
| omega_router_status | Show provider availability (which providers have API keys), routing stats, and current priority mode. |
| omega_set_priority_mode | Set the routing priority mode for all subsequent routing decisions. |
| omega_get_model_config | View routing configuration for a specific intent or all intents. |
| omega_switch_model | Switch to a different LLM with OMEGA memory preservation. Retrieves relevant context for the target model. |
| omega_get_current_model | Get the current model for a session (from context affinity tracking). |
| omega_router_context | Get session context: current model, provider, token count, conversation depth. |
| omega_warm_router | Pre-load the intent classifier prototypes and model config. Reduces first-route latency. |
| omega_router_benchmark | Run a quick accuracy test: 6 sample prompts, verify intent classification and routing. |

Setup

Store API keys in ~/.omega/secrets.json (file is chmod 600):

{
  "anthropic_api_key": "sk-ant-...",
  "openai_api_key": "sk-...",
  "google_api_key": "...",
  "groq_api_key": "gsk_...",
  "xai_api_key": "xai-..."
}

You can also set keys via environment variables in ~/.zshrc:

export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
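The precedence between the secrets file and environment variables is not specified here. One plausible loader, shown purely as a sketch (not OMEGA's actual code), checks the file first and falls back to the environment:

```python
import json
import os
from pathlib import Path

SECRETS_PATH = Path.home() / ".omega" / "secrets.json"

def load_api_key(provider, path=SECRETS_PATH):
    """Return the API key for a provider: prefer ~/.omega/secrets.json,
    then fall back to an environment variable like ANTHROPIC_API_KEY."""
    if path.exists():
        secrets = json.loads(path.read_text())
        key = secrets.get(f"{provider}_api_key")
        if key:
            return key
    return os.environ.get(f"{provider.upper()}_API_KEY")
```

A provider with neither a file entry nor an environment variable simply yields None, which matches the router's behavior of skipping providers without keys.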

Verify setup:

omega_router_status
# Shows which providers have valid keys

Common Workflows

Basic Routing

Set a priority mode once, then route prompts:

omega_set_priority_mode(mode="balanced")

omega_route_prompt(prompt="Implement a rate limiter with sliding window")
# Returns: Anthropic/Claude Opus 4.6 (coding intent, quality match)

omega_route_prompt(prompt="Fix the typo in the README")
# Returns: Groq/Llama 3.1 8B (simple_edit intent, speed match)

Override priority per-prompt:

omega_route_prompt(prompt="Analyze this architecture", priority="quality")

Intent Classification

Classify a prompt without routing it; this is useful for understanding how the router sees your prompts:

omega_classify_intent(prompt="Should we use a message queue or direct HTTP calls?", detailed=True)
# Returns:
#   intent: logic
#   confidence: 0.87
#   scores: {logic: 0.87, coding: 0.45, exploration: 0.38, creative: 0.12, simple_edit: 0.03}

Large Context Override

When a prompt involves more than 100K tokens (e.g., analyzing a large codebase), the router automatically overrides to a large-context model:

omega_route_prompt(prompt="Analyze this codebase for security issues", estimated_tokens=150000)
# Returns: Google/Gemini or xAI/Grok-4.1-fast (large context capable)
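The override check itself is simple to picture. The context-window table and the headroom margin below are illustrative assumptions, not OMEGA's authoritative configuration; a minimal sketch of the 100K-token override:

```python
LARGE_CONTEXT_THRESHOLD = 100_000

# Illustrative context windows in tokens (assumed values, not OMEGA's table).
CONTEXT_WINDOWS = {
    "gemini-2.5-pro": 1_000_000,
    "grok-4.1-fast": 2_000_000,
    "claude-opus-4.6": 200_000,
    "llama-3.1-8b-instant": 128_000,
}

def large_context_candidates(estimated_tokens, margin=2):
    """Above the threshold, keep only models whose window leaves
    headroom (margin x the estimate) for the response and retrieval."""
    if estimated_tokens <= LARGE_CONTEXT_THRESHOLD:
        return list(CONTEXT_WINDOWS)  # no override; all models eligible
    return [model for model, window in CONTEXT_WINDOWS.items()
            if window >= estimated_tokens * margin]
```

With a 150K-token estimate and a 2x margin, only the 1M+ windows survive, which is why the example above lands on Gemini or Grok.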

Context Affinity

The router tracks which model you are currently using and penalizes unnecessary switches. This prevents thrashing between providers mid-conversation:

omega_router_context(session_id="agent-1")
# Returns: current model, provider, token count, conversation depth
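OMEGA's exact affinity scoring is not documented here. A common way to implement it is a flat switch penalty subtracted from every candidate other than the current model; the penalty size below is a made-up illustration:

```python
SWITCH_PENALTY = 0.15  # illustrative; tuned so only clearly better models win

def apply_affinity(scores, current_model):
    """Penalize every candidate except the model the session is already on."""
    return {
        model: score - (SWITCH_PENALTY if model != current_model else 0.0)
        for model, score in scores.items()
    }

# gpt-4o's raw edge (0.05) is smaller than the penalty, so the session stays put.
adjusted = apply_affinity({"claude-opus-4.6": 0.80, "gpt-4o": 0.85}, "claude-opus-4.6")
```

This is what "significantly better" means in practice: a competing model must beat the current one by more than the switch penalty before the router moves.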

Model Switching

When you need to switch providers (e.g., moving from coding to creative work), use omega_switch_model to preserve OMEGA memory context:

omega_switch_model(
    session_id="agent-1",
    target_provider="openai",
    target_model="gpt-4o",
    retrieve_context=True
)
# Retrieves relevant OMEGA context and adapts it for the target model

Warm-Up

Pre-load the classifier at session start to reduce first-route latency:

omega_warm_router()

Benchmarking

Verify the router is working correctly:

omega_router_benchmark()
# Runs 6 sample prompts, reports intent classification accuracy and routing decisions

Cross-Model Consultation

OMEGA includes two consultation tools that let you get a second opinion from a different LLM provider. The system is provider-aware: if you are running on Claude, you get omega_consult_gpt. If you are running on OpenAI or another non-Anthropic provider, you get omega_consult_claude.

When to use consultation:

  • Stuck on a problem for 10+ minutes or after 3+ failed approaches
  • Facing an irreversible architecture decision
  • Debugging a dead end
  • Cross-validating a fragile solution

Example (from a Claude agent):

omega_consult_gpt(
    prompt="Should I use a message queue or direct HTTP calls for this microservice?",
    context="We have 3 services, ~100 req/sec, need exactly-once delivery for payment events",
    temperature=0.5
)
# Returns: GPT's analysis with a different perspective

Example (from an OpenAI agent):

omega_consult_claude(
    prompt="Review this SQL migration for edge cases",
    context="ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT FALSE; ...",
    temperature=0.3
)

Consultation requires the target provider's API key in ~/.omega/secrets.json. If the key is missing, the tool is unavailable (no error, just not shown in the tool list).
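The provider-aware gating can be sketched as follows; the function name and the key-dict shape are assumptions for illustration, not OMEGA's actual internals:

```python
def available_consult_tools(host_provider, keys):
    """Expose the cross-provider consult tool only when its key is present.

    host_provider: the provider the agent itself runs on (e.g. "anthropic").
    keys: the parsed contents of ~/.omega/secrets.json.
    """
    tools = []
    if host_provider == "anthropic":
        # Claude agents get a GPT consult tool, if an OpenAI key exists.
        if keys.get("openai_api_key"):
            tools.append("omega_consult_gpt")
    else:
        # Non-Anthropic agents get a Claude consult tool, if a key exists.
        if keys.get("anthropic_api_key"):
            tools.append("omega_consult_claude")
    return tools
```

Note the silent behavior described above: a missing key produces an empty tool list rather than an error.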

Tips

  • Warm the router at session start. Calling omega_warm_router loads ONNX prototypes into memory, reducing first-route latency from ~50ms to <2ms.
  • Classification is free. omega_classify_intent runs entirely locally via ONNX: no API calls, no cost, sub-2ms latency. Use it liberally.
  • Priority modes are session-wide. omega_set_priority_mode affects all subsequent routing in the session. Override per-prompt with the priority parameter on omega_route_prompt.
  • Context affinity reduces churn. The router prefers to stay on the current model. It will only switch when a different model is significantly better for the intent.
  • Estimated tokens matter. If you know a prompt involves a large context, pass estimated_tokens to trigger the large-context override. Without it, the router cannot know the full context size.
  • Secrets file permissions. ~/.omega/secrets.json should be chmod 600. The CLI omega setup sets this automatically.
  • Not all providers are required. The router works with whatever providers have valid keys. Missing providers are simply skipped during routing.
  • Force an intent. If you disagree with the classifier, use force_intent on omega_route_prompt to override classification.