Zep quietly rebranded from “AI memory” to “context engineering” a few months back. I noticed it and sat with it for a while, because the framing is genuinely sharper. Memory implies storage. Context engineering implies architecture. The shift matters.
The insight behind it: agents don't fail because they have bad retrieval. They fail because the right information never gets assembled into context at the right moment. That's an engineering problem across multiple layers, not a database problem.
Zep is right about the framing. Where most of the ecosystem goes wrong is treating the stack as a single layer and calling it solved when you nail Layer 1.
Layer 1: Memory
Every competitor in this space plays here. Mem0 with 47K GitHub stars. Zep with its temporal knowledge graph. XTrace, ContextStream, and a dozen others shipping in 2025. Memory is the baseline: persist decisions, lessons, preferences, and context across sessions so agents don't start from zero.
OMEGA covers this layer with a full pipeline: semantic search via ONNX embeddings on your local machine, FTS5 full-text search, type weighting (decisions rank higher than session summaries), reranking, deduplication via SHA-256. Retrieval runs in under 50ms. On LongMemEval's 500-question benchmark, this pipeline scores 95.4%, which puts it at the top of the published leaderboard.
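The ranking stage of a pipeline like this can be sketched in a few lines. This is a minimal illustration of type weighting plus SHA-256 deduplication, not OMEGA's actual implementation; the weight values and memory-type names are assumptions.

```python
import hashlib

# Illustrative type weights: decisions rank above session summaries,
# mirroring the type-weighting idea described above.
TYPE_WEIGHTS = {"decision": 1.5, "lesson": 1.2, "preference": 1.1, "session_summary": 0.8}

def dedup_key(text: str) -> str:
    """Content hash used to drop exact-duplicate memories."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def rank(candidates):
    """candidates: dicts with 'text', 'type', and a raw similarity 'score'.
    Applies type weighting, then deduplicates by SHA-256 content hash."""
    seen, ranked = set(), []
    weighted = sorted(candidates,
                      key=lambda c: c["score"] * TYPE_WEIGHTS.get(c["type"], 1.0),
                      reverse=True)
    for c in weighted:
        key = dedup_key(c["text"])
        if key not in seen:
            seen.add(key)
            ranked.append(c)
    return ranked

results = rank([
    {"text": "Use JWT refresh tokens", "type": "decision", "score": 0.80},
    {"text": "Use JWT refresh tokens", "type": "decision", "score": 0.80},
    {"text": "Session wrap-up notes", "type": "session_summary", "score": 0.95},
])
# The decision outranks the higher-raw-score summary (0.80 * 1.5 > 0.95 * 0.8),
# and the exact duplicate is dropped.
```

The point of the weighting step is that raw embedding similarity alone would have surfaced the session summary first.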
But memory alone is not enough. The tools that stop at Layer 1 handle the easy case: one agent, one session, straightforward queries. The hard case is everything else.
Layer 2: Coordination
Picture a typical day with Claude Code running two parallel agents: one refactoring the API layer, one updating the test suite. Without coordination, Agent A writes to handlers.py while Agent B reads a stale version of the same file. The conflict doesn't surface until merge time. Or worse, Agent B silently overwrites Agent A's changes without knowing they happened.
Layer 2 is what prevents this. File claims register which agent is editing which file, and block conflicting writes before they happen. Branch guards prevent simultaneous commits to the same branch. Deadlock detection catches circular dependencies before agents stall indefinitely. Task queues prevent duplicate work when two agents pick up the same job from the backlog.
Almost nobody offers this. Letta has partial state management within a single agent framework, but nothing that coordinates across independent agent sessions. Mem0 doesn't touch it. Zep doesn't touch it. This is the layer where multi-agent work either holds together or falls apart.
OMEGA's coordination layer has been running in production for months now. Five core primitives: omega_coord_status, file claims, branch guards, task queues, and peer messaging between agents.
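To make the file-claim idea concrete, here is a minimal sketch of a claim registry that rejects conflicting claims instead of letting writes race. The class and method names are illustrative, not OMEGA's actual API.

```python
import threading

class FileClaims:
    """Sketch of a file-claim registry: an agent registers intent to edit
    a file; a conflicting claim from another agent is blocked up front
    rather than surfacing as a silent overwrite at merge time."""
    def __init__(self):
        self._claims = {}          # path -> agent_id holding the claim
        self._lock = threading.Lock()

    def claim(self, path: str, agent_id: str) -> bool:
        with self._lock:
            holder = self._claims.get(path)
            if holder is not None and holder != agent_id:
                return False       # blocked: another agent holds this file
            self._claims[path] = agent_id
            return True

    def release(self, path: str, agent_id: str) -> None:
        with self._lock:
            if self._claims.get(path) == agent_id:
                del self._claims[path]

claims = FileClaims()
a = claims.claim("handlers.py", "agent-a")   # granted
b = claims.claim("handlers.py", "agent-b")   # blocked while agent-a holds it
claims.release("handlers.py", "agent-a")
c = claims.claim("handlers.py", "agent-b")   # granted after release
```

Branch guards and task queues follow the same shape: a shared registry consulted before the action, not a conflict detector run after it.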
Layer 3: Routing
Not every request should go to GPT-4 or Claude Opus. A typo fix doesn't need a frontier model that costs $15 per million output tokens. An architecture review probably does. The gap between those two tasks is real, and most agent setups ignore it entirely. Everything goes to the same model, at the same cost, regardless of what the task actually needs.
Layer 3 is intent classification and request routing: a sub-2ms classifier that reads the incoming query, determines its complexity and domain, and routes it to the appropriate provider. Simple edits go to Haiku or Flash. Architecture decisions go to Opus or GPT-4o. Local tasks that don't need cloud inference go to a local model. Providers span Anthropic, OpenAI, Google, xAI, and self-hosted options.
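The routing shape can be sketched with a toy classifier. A real sub-2ms classifier would be a trained model, not keyword matching; the tier names and model identifiers below are illustrative assumptions.

```python
def classify(query: str) -> str:
    """Toy intent classifier: crude keyword heuristics stand in for a
    trained model, just to show the routing shape described above."""
    q = query.lower()
    if any(w in q for w in ("architecture", "design", "tradeoff", "migrate")):
        return "frontier"    # complex reasoning: Opus / GPT-4o class
    if any(w in q for w in ("typo", "rename", "format", "lint")):
        return "fast"        # simple edits: Haiku / Flash class
    return "local"           # no cloud inference needed

# Illustrative tier -> provider mapping.
ROUTES = {
    "frontier": "claude-opus",
    "fast": "claude-haiku",
    "local": "local-llm",
}

def route(query: str) -> str:
    return ROUTES[classify(query)]

cheap = route("fix a typo in the README")
strong = route("review the architecture of the auth service")
```

The interesting design question is the fallback: anything the classifier can't confidently bucket should default to the stronger tier, since misrouting a hard task costs more than overpaying for an easy one.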
The cost savings are real. Depending on your mix of task types, intelligent routing cuts API spend by $50 to $200 per month for active development workflows. More importantly, it improves quality: routing architecture questions to a stronger model instead of the cheapest one available changes the output in ways that matter.
Memory tools don't touch routing. This makes sense from their perspective: they sell persistence, not execution. But from a context engineering perspective, routing is inseparable from context assembly. The model you choose changes what context is useful and how much of it fits.
Layer 4: Knowledge
Memory stores what happened in sessions. Knowledge stores what you know about your domain. These are different things, and treating them the same way degrades both.
Layer 4 is persistent knowledge: ingesting PDFs, markdown files, web pages, and documentation into a local vector store with ONNX embeddings running entirely on your machine. When an agent opens a file related to your authentication system, the relevant sections of your architecture decision records, migration history, and internal docs get injected into context automatically. No manual lookup. No stale copy-paste.
The difference between this and session memory is durability and scope. A session memory might record “we decided to use JWT with 15-minute refresh tokens.” The knowledge layer holds the full migration docs that explain why, the three alternatives that were rejected, and the edge cases that informed the decision. That full context is available to any agent session without being explicitly referenced.
And because all of this runs locally via ONNX embeddings, there are no API calls to an embeddings service, no cloud dependency, and no data leaving your machine.
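The retrieval step behind that injection can be sketched as nearest-neighbor search over local embeddings. To keep the sketch dependency-free, a bag-of-words vector stands in for a real embedding model (OMEGA uses ONNX embeddings); the document names and trigger text are made up for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in for a local embedding model: a bag-of-words vector
    keeps this runnable without any model dependency."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical knowledge store: ingested docs keyed by name.
DOCS = {
    "adr-007-jwt.md": "decision record jwt refresh tokens auth rotation",
    "migrations.md": "database migration history postgres schema changes",
}

def inject_context(trigger: str, top_k: int = 1) -> list[str]:
    """Return the top-k knowledge docs most relevant to a trigger,
    e.g. the contents of a file the agent just opened."""
    q = embed(trigger)
    scored = sorted(DOCS.items(),
                    key=lambda kv: cosine(q, embed(kv[1])),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]

hits = inject_context("agent opened auth handler using jwt tokens")
```

With a real embedding model the mechanics are identical; only the `embed` function changes.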
Why the Stack Compounds
The reason these four layers matter together is that each one makes the others better. This is the part that single-layer tools structurally cannot replicate.
Memory informs routing
If OMEGA knows from prior sessions that you prefer Claude for architecture decisions and GPT-4o for code review, the router uses that preference automatically. The routing layer gets smarter as the memory layer accumulates. A routing system without memory treats every session as the first.
Coordination uses memory
When Agent A claims a file and logs its reasoning, Agent B can query that context before attempting conflicting work. “Agent A is mid-refactor on this module and decided to defer the interface change to next session” is information that prevents redundant work. Coordination without persistent memory is just traffic management. Coordination with memory is shared situational awareness.
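The difference between traffic management and shared situational awareness is that a claim carries the claiming agent's reasoning, not just ownership. A minimal sketch, with an assumed record structure that is not OMEGA's actual schema:

```python
# Hypothetical claim store: each claim carries the agent's logged reasoning,
# so a peer can read the context before duplicating or undoing work.
claims = {
    "api/handlers.py": {
        "agent": "agent-a",
        "reason": "mid-refactor; interface change deferred to next session",
    },
}

def should_defer(path: str, my_agent: str):
    """Before touching a file, check whether a peer holds it and why.
    Returns (defer?, peer's logged reasoning or None)."""
    claim = claims.get(path)
    if claim and claim["agent"] != my_agent:
        return True, claim["reason"]
    return False, None

defer, why = should_defer("api/handlers.py", "agent-b")
# agent-b now knows not just that the file is held, but that the
# interface change it was about to attempt has already been deferred.
```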
Knowledge feeds memory
When an agent reads a decision from the knowledge layer, that interaction gets recorded in memory: what was accessed, when, and in what context. Over time, the memory layer learns which knowledge documents are actually consulted and which are noise. Retrieval quality improves because the system observes its own usage patterns.
None of this compounding is possible if you bolt together four separate tools from four different vendors. The context that flows between layers requires a unified store. When your memory system and your routing system and your knowledge system have separate data models, the information that makes each layer smarter gets lost at the boundary.
The Landscape, Mapped
Where each tool currently sits:
This is not a hit piece on any of these tools. Mem0 built real community. Zep's temporal knowledge graph is genuinely interesting and their rebrand to context engineering shows they understand the problem space. Letta's agent framework has real production deployments.
The landscape point is simpler: single-layer tools are making a choice to stay in their lane. For some use cases, that's the right call. For developers running multi-agent workflows who want compound intelligence that gets better over time, the single-layer approach hits a ceiling.
Where This Goes
Context engineering is a better frame than “memory tools.” Zep deserves credit for surfacing it. The field is moving from “save some text and retrieve it” toward something closer to what an operating system does for a process: structured context assembly from multiple sources, with coherent state across parallel workers.
The full four-layer stack is what OMEGA Pro ships. Layer 1 through Layer 4 in a single install, no cloud dependencies, running on your machine. The agent's intelligence compounds because the system is designed for compounding, not just for retrieval.
The core layer is free and open source under Apache-2.0. Layers 2 through 4 are in Pro. If you're running multi-agent workflows or want the full stack, the Pro page is here.