Observational Memory Is Not Enough
Mastra just shipped observational memory for coding agents. It's clever, well-designed, and genuinely useful within a session. But when the session ends, every observation vanishes.

Mastra launched observational memory as part of Mastra Code, their AI coding agent. Two background LLM agents, an Observer and a Reflector, compress your conversation into timestamped observations that stay inside the context window. No database. No embeddings. No external storage. It's elegant, and it works.
I built OMEGA, which takes the opposite approach: external storage in SQLite with semantic search and entity graphs. I'm biased. But the architectural differences between these two approaches are real, and they matter more than any benchmark score.
What Mastra Gets Right
Credit where it's due: Mastra's observational memory is a genuinely clever design. If your use case is “make this single coding session smarter,” it delivers. The 94.87% LongMemEval score proves it works within a single evaluation run.
The Problem LongMemEval Doesn't Test
LongMemEval is a single-run benchmark. It loads ~40 sessions of conversation history and asks 500 questions. It tests whether a memory system can recall facts, handle updates, track preferences, reason temporally, and connect information across sessions.
What it does not test is whether anything survives after that single run ends.
This is the fundamental architectural difference. In-context memory is session-scoped. External memory is permanent. For a quick coding session, session-scoped might be enough. For an agent that works with you across weeks and months, it isn't.
The Benchmark Numbers
Both systems score well on LongMemEval. But the more interesting signal isn't the raw scores. It's what happens when you change the actor model:
| System | Score | Note |
|---|---|---|
| OMEGA | 95.4% | #1 overall. External SQLite + ONNX embeddings. Category-tuned answer prompts. |
| Mastra OM (gpt-5-mini) | 94.87% | Observer + Reflector with gemini-2.5-flash. In-context only. |
| Mastra OM (gpt-4o) | 84.23% | Same architecture, weaker actor model. 10+ point drop. |
| Zep / Graphiti | 71.2% | Graph-based approach. Self-reported. |
The key insight from these numbers: in-context memory is inherently coupled to the actor model. The model must parse the observation block, find relevant facts, and reason about them. A stronger model does this better. That's why Mastra drops 10+ points when switching from gpt-5-mini to gpt-4o. Same memory architecture, same observations, dramatically different results.
External retrieval decouples the memory system from the reasoning model. OMEGA's retrieval pipeline (BM25 + vector search + reranking) works the same regardless of which LLM answers the questions. Your memory quality doesn't degrade when you switch to a cheaper or faster model.
Transparency note: OMEGA's 95.4% uses category-tuned answer prompts (different prompts per question type), making a direct score-to-score comparison misleading. The architectural advantages above hold regardless of benchmark methodology.
How They Actually Work
The architectural difference explains everything else. Here's the step-by-step:
OMEGA: External storage
1. The agent learns something new or completes a task.
2. OMEGA extracts the memory and stores it in SQLite with ONNX embeddings (local, no API).
3. The consolidation engine merges duplicates, flags contradictions, and decays stale memories.
4. On the next query, hybrid BM25 + vector search retrieves the top 5-10 relevant memories.
5. ~1,500 tokens are injected into context; the rest stays in storage.
6. Memories persist indefinitely across sessions, agents, projects, and tools.
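The storage side of those steps can be sketched in a few lines. This is an illustrative sketch, not OMEGA's actual schema or API: the table layout is assumed, and the `embed` function is a deterministic hash-based stand-in for OMEGA's local ONNX embeddings.

```python
import sqlite3, json, math, hashlib

def embed(text: str, dim: int = 8) -> list[float]:
    # Stand-in for local ONNX embeddings: a deterministic, normalized
    # hash-based vector, enough to illustrate the storage flow.
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS memories (
        id INTEGER PRIMARY KEY,
        text TEXT NOT NULL,
        embedding TEXT NOT NULL,              -- JSON-encoded vector
        created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    return db

def remember(db: sqlite3.Connection, text: str) -> int:
    cur = db.execute("INSERT INTO memories (text, embedding) VALUES (?, ?)",
                     (text, json.dumps(embed(text))))
    db.commit()
    return cur.lastrowid

db = open_store()
remember(db, "User prefers TypeScript for new services")
remember(db, "The deploy script lives in scripts/deploy.sh")
print(db.execute("SELECT COUNT(*) FROM memories").fetchone()[0])  # → 2
```

The point of the sketch is the shape of the system: memories land in a durable file, so a new session (or a different agent) can reopen the same store and query it.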
Mastra: In-context compression
1. The conversation accumulates messages normally.
2. At ~30K tokens, the Observer agent (gemini-2.5-flash) summarizes recent messages into timestamped observations.
3. Observations are appended to a context block (append-only, prompt-cache-friendly).
4. At ~40K tokens, the Reflector agent rewrites the observations into a shorter summary.
5. Rewriting is lossy: detail is permanently discarded during reflection.
6. When the session ends, all observations are gone. There is no external persistence.
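The observe/reflect thresholds above can be sketched as a simple loop. This is not Mastra's implementation: the real Observer and Reflector are LLM agents, while here they are stub functions, and token counts are crudely approximated by word counts.

```python
OBSERVE_AT, REFLECT_AT = 30_000, 40_000  # thresholds from the steps above

def tokens(texts: list[str]) -> int:
    # Crude stand-in for a real tokenizer.
    return sum(len(t.split()) for t in texts)

def observe(messages: list[str]) -> str:
    # Stand-in for the Observer LLM agent.
    return f"[obs] summarized {len(messages)} messages"

def reflect(observations: list[str]) -> list[str]:
    # Stand-in for the Reflector: a lossy rewrite of all observations.
    return [f"[reflection] compressed {len(observations)} observations"]

messages: list[str] = []
observations: list[str] = []

def on_message(msg: str) -> None:
    messages.append(msg)
    if tokens(messages) > OBSERVE_AT:
        observations.append(observe(messages))  # append-only: cache-friendly
        messages.clear()
    if tokens(observations) > REFLECT_AT:
        # Lossy step: earlier observations are rewritten and discarded.
        observations[:] = reflect(observations)

for _ in range(100):
    on_message("word " * 700)   # ~700 "tokens" per message
print(len(observations))  # → 2
```

Even in this toy version the two properties discussed below are visible: the observation block only ever grows or gets lossily rewritten, and nothing outlives the `observations` list when the process ends.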
The key tradeoff: Mastra's approach means every observation is always visible to the model. No retrieval can miss anything, because there is no retrieval. But it also means the context window carries an ever-growing block of compressed text, and the Reflector's rewriting is lossy. Details that seem unimportant during rewriting are gone forever.
OMEGA's approach means retrieval can theoretically miss a relevant memory. But memories are never lost, only harder to find. Hybrid BM25 + vector search with semantic reranking minimizes retrieval failures, and the consolidation engine actively manages memory quality over time.
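One common way to fuse lexical and vector rankings is reciprocal rank fusion (RRF). Whether OMEGA uses RRF specifically is an assumption on my part; the point it illustrates is why hybrid retrieval reduces misses: a memory missed by one retriever can still surface through the other.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal rank fusion: each retriever contributes 1/(k + rank + 1)
    # per document; documents ranked well by multiple retrievers win.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["m7", "m2", "m9"]  # keyword search ranked m7 first
vector_hits = ["m2", "m5", "m7"]  # semantic search ranked m2 first
print(rrf([bm25_hits, vector_hits])[:2])  # → ['m2', 'm7']
```

Note that m2 and m7, the only memories both retrievers found, rank above m9 and m5, which each appeared in only one list.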
The Token Economics
Here is where the architectural difference hits your budget. In-context memory means every API call carries the full observation block. External memory means you pay for retrieval results only.
| Dimension | OMEGA | Mastra |
|---|---|---|
| Memory storage cost | $0 (local SQLite file) | LLM API cost for Observer + Reflector (gemini-2.5-flash per invocation) |
| Tokens per memory query | ~1,500 (top-k retrieval) | ~30,000-70,000 (full context block) |
| At 10K queries/month | ~15M tokens in, near-zero cost | ~300M-700M tokens in, significant cost at scale |
| Embedding cost | $0 (local ONNX, no API) | N/A (no embeddings) |
| Infrastructure | None (single SQLite file) | None (in-context) |
At small scale (a few sessions a day), the cost difference is negligible. At production scale (thousands of agent sessions per month), the difference between ~1,500 and ~50,000 tokens per memory query compounds into significant spend.
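The compounding is simple arithmetic, using the per-query figures from the table above (with 50K as the midpoint of Mastra's 30K-70K range):

```python
# Back-of-envelope monthly input-token volume at 10K memory-carrying queries.
queries_per_month = 10_000
omega_tokens_per_query = 1_500    # top-k retrieval injection
mastra_tokens_per_query = 50_000  # midpoint of the 30K-70K context block

omega_monthly = queries_per_month * omega_tokens_per_query
mastra_monthly = queries_per_month * mastra_tokens_per_query
print(omega_monthly)                    # → 15000000  (~15M tokens)
print(mastra_monthly)                   # → 500000000 (~500M tokens)
print(mastra_monthly // omega_monthly)  # → 33 (roughly 33x the input volume)
```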
Mastra partially offsets this with prompt caching. Because observations are append-only, the cached prefix stays valid across turns within a session. This is a real advantage for providers that support prompt caching. But it doesn't help across sessions, because there are no cross-session observations.
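Why append-only observations are cache-friendly: provider-side prompt caching reuses work for the longest shared prefix between consecutive prompts, and appending never invalidates that prefix. A minimal illustration, with a plain string-prefix check standing in for the provider's cache:

```python
def cached_prefix_len(prev_prompt: str, new_prompt: str) -> int:
    # Length of the shared prefix: the part a prompt cache could reuse.
    n = 0
    while n < min(len(prev_prompt), len(new_prompt)) and prev_prompt[n] == new_prompt[n]:
        n += 1
    return n

block = "obs1: user prefers pnpm\n"
turn1 = block + "Q: which package manager?"
block += "obs2: repo uses turborepo\n"   # append-only update between turns
turn2 = block + "Q: which build tool?"

# The entire first observation block is a shared prefix of both prompts,
# so it stays cacheable across turns within the session.
print(cached_prefix_len(turn1, turn2) == len("obs1: user prefers pnpm\n"))  # → True
```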
What In-Context Memory Can't Do
These aren't feature gaps that Mastra could add later. They're architectural constraints of in-context memory:
Cross-session memory
When the context window closes, observations are gone. Starting a new session means starting from zero. OMEGA's SQLite persists across sessions, tools, projects, and reboots.
Selective retrieval
You can't retrieve specific memories from an observation block. The entire block is injected into every turn. With 50K tokens of observations, every API call pays for all of them, even when asking about one specific fact.
Memory lifecycle management
There is no way to decay, archive, or selectively forget observations. The Reflector compresses everything uniformly. OMEGA's consolidation engine decays stale memories, merges duplicates, and flags contradictions with full audit trails.
Multi-agent memory sharing
Each Mastra agent has its own context window. Two agents working on the same project can't share observations. OMEGA's shared SQLite store with entity graphs enables genuine multi-agent coordination.
Unbounded growth
After thousands of interactions, OMEGA's retrieval stays O(log n). Mastra's observation block grows until the Reflector starts discarding, and you have no control over what gets discarded.
When Mastra Is the Right Choice
I'm not going to pretend OMEGA is better in every scenario. Mastra's approach genuinely wins for short-lived, single-session work. If your agent does one task per session and doesn't need to remember anything from yesterday, Mastra's zero-infrastructure approach is arguably simpler than running any external memory system.
The Real Question: What Does “Memory” Mean?
Here's what this comparison really comes down to: what do you mean when you say you want your AI agent to have “memory”?
If you mean “remember more within this conversation,” Mastra's observational memory is a good solution. It extends effective context length by compressing older messages into observations.
If you mean “remember across conversations, learn over time, and build up knowledge,” you need external persistent memory. That's what OMEGA does. Memories survive after sessions end. They accumulate. The consolidation engine manages them over time. Your agent on day 100 is meaningfully different from your agent on day 1.
The first is context extension. The second is memory. Both are valuable. They're just solving different problems.
If persistent, cross-session memory is what you need, OMEGA takes about 30 seconds to set up.
- Jason Sosa, builder of OMEGA
Related reading
See the full feature matrix with Mastra, Mem0, Zep, and Letta.