ADR-001: Compounding Knowledge Engine (CKE)

Date: 2026-04-08
Status: Accepted (Internal R&D)
Deciders: Jason

Context

OMEGA has 1322 memories with sophisticated retrieval (RRF, Thompson sampling, decay, feedback). But knowledge doesn't actively compound: storing a new memory that confirms an existing thesis doesn't strengthen the thesis. Contradictions are detected but don't trigger re-evaluation of related beliefs. There's no scheduled audit that identifies knowledge gaps.

Karpathy's autoresearch (modify-eval-keep/discard) and LLM Wiki (ingest-ripple-lint) provide the pattern. OMEGA already has the primitives. The gap is the orchestration layer that connects them into a compounding loop.

Decision

Build a lightweight CompoundingEngine class in src/omega_platform/compounding.py that orchestrates existing primitives into a feedback loop. No new database tables. No schema changes. Uses existing metadata fields, edges, and feedback mechanisms.

Three Operations

1. RIPPLE (post-store enrichment) When a new memory is stored, automatically:

  • Find top-5 semantically similar memories (already done by _auto_relate)
  • If similarity > 0.80 AND same entity/project: increment evidence_count in related memory metadata
  • If contradiction detected: create contradicts edge, mark old as superseded
  • If same thesis: strengthen (increment access_count on related memories)
  • Log ripple effects to a ripple_log metadata field on the new memory

Trigger: Hook into _schedule_auto_relate() or post-store pipeline.
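The ripple rules above can be sketched over plain dicts. This is a minimal illustration, not OMEGA's actual API: the candidate list stands in for _auto_relate's top-5 output, detect_contradiction stands in for the existing contradiction detector, and field names (entity, access_count) are assumptions.

```python
SIM_THRESHOLD = 0.80  # similarity cutoff from the rule above

def ripple(new_mem, candidates, detect_contradiction):
    """candidates: [(memory_dict, similarity)] from the post-store pipeline.
    Mutates related memories and records a ripple_log on the new memory."""
    log = []
    new_thesis = new_mem.get("metadata", {}).get("thesis_id")
    for mem, sim in candidates:
        if sim > SIM_THRESHOLD and mem.get("entity") == new_mem.get("entity"):
            meta = mem.setdefault("metadata", {})
            meta["evidence_count"] = meta.get("evidence_count", 0) + 1
            log.append(("evidence", mem["id"]))
        if detect_contradiction(new_mem, mem):
            # edge creation via the existing edges API is elided here
            mem.setdefault("metadata", {})["superseded"] = True
            log.append(("contradicts", mem["id"]))
        elif new_thesis and mem.get("metadata", {}).get("thesis_id") == new_thesis:
            mem["access_count"] = mem.get("access_count", 0) + 1  # strengthen
            log.append(("strengthen", mem["id"]))
    new_mem.setdefault("metadata", {})["ripple_log"] = log
    return log
```

Because the function only mutates metadata and counters, it can run on a background thread after the store commits.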

2. LINT (periodic knowledge audit) Scheduled pass (daily or on-demand) that scores the knowledge base:

  • Orphan memories: 0 edges, 0 access, >30 days old. Flag for review.
  • Stale theses: Decisions/lessons not accessed in 60 days. Flag as potentially outdated.
  • Contradiction clusters: Groups of memories with contradicts edges. Surface for resolution.
  • Coverage gaps: Entity IDs with <3 memories. Projects with no recent decisions.
  • Prediction accuracy: Oracle calibration rollup. Which domains are we miscalibrated on?
  • Strength distribution: How many memories sit at strength <0.1? Is knowledge decaying faster than it compounds?

Output: A lint_report memory (event_type: advisor_insight) summarizing findings.
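A few of the audit rules above, sketched over an in-memory list of memory dicts. The date thresholds mirror the rules; the field names (edges, access_count, created, last_accessed, event_type, strength) are assumptions about the memory shape, not OMEGA's confirmed schema.

```python
from datetime import datetime, timedelta

def lint(memories, now=None):
    """One audit pass; the result is stored afterwards as an advisor_insight."""
    now = now or datetime.now()
    report = {"orphans": [], "stale_theses": [], "weak": []}
    for m in memories:
        age = now - m["created"]
        # Orphans: no edges, never accessed, older than 30 days
        if not m.get("edges") and m.get("access_count", 0) == 0 and age > timedelta(days=30):
            report["orphans"].append(m["id"])
        # Stale theses: decisions/lessons untouched for 60 days
        if m.get("event_type") in ("decision", "lesson") and \
           now - m.get("last_accessed", m["created"]) > timedelta(days=60):
            report["stale_theses"].append(m["id"])
        # Strength distribution: flag memories that have decayed below 0.1
        if m.get("strength", 1.0) < 0.1:
            report["weak"].append(m["id"])
    return report
```

Contradiction clusters, coverage gaps, and calibration rollups would be additional passes in the same loop; they need the edge list and oracle data rather than per-memory fields.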

3. THESIS TRACKING (evolving beliefs) New metadata pattern (not a new type, uses existing decision type):

{
  "event_type": "decision",
  "metadata": {
    "thesis": true,
    "thesis_id": "thesis-ai-healthcare-2026",
    "confidence": 0.7,
    "evidence_count": 5,
    "evidence_for": ["mem-abc", "mem-def"],
    "evidence_against": ["mem-ghi"],
    "last_evaluated": "2026-04-08",
    "domain": "ai/healthcare"
  }
}

The ripple operation updates confidence from the evidence ratio, treating evidence_for and evidence_against as counts (the lengths of the metadata lists): confidence = evidence_for / (evidence_for + evidence_against + 1)
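The ratio as code. The +1 in the denominator keeps confidence strictly below 1.0 and damps swings while evidence is sparse (a deliberate simplification, as noted under Consequences, rather than a Bayesian update):

```python
def thesis_confidence(evidence_for, evidence_against):
    """Evidence-ratio confidence for a thesis memory.

    evidence_for / evidence_against are the memory-ID lists from the
    thesis metadata; their lengths are the evidence counts."""
    n_for, n_against = len(evidence_for), len(evidence_against)
    return n_for / (n_for + n_against + 1)
```

With the example metadata above (two supporting memories, one against), this yields 2 / 4 = 0.5; with no evidence at all it yields 0.0 rather than dividing by zero.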

Architecture

Store Memory
    |
    v
[Existing] _auto_relate() -> creates edges
    |
    v
[NEW] ripple() -> strengthen related, detect contradictions,
                   update thesis confidence, log effects
    |
    v
[Existing] Strength computed on next query (automatic)

              [Scheduled]
                  |
                  v
             lint() -> scan for orphans, stale, gaps,
                       contradictions, calibration
                  |
                  v
             Store lint_report as advisor_insight
                  |
                  v
             [Optional] Generate research tasks from gaps
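A minimal skeleton of how the diagram's two paths might hang together in CompoundingEngine. The store interface (find_similar, all, save) is duck-typed and hypothetical, and ripple/lint are reduced to one rule each so the orchestration stays visible; the real class would carry the full rule sets above.

```python
class CompoundingEngine:
    """Orchestrates ripple (post-store) and lint (scheduled) over an
    existing memory store. No new tables, no schema changes."""

    def __init__(self, store):
        self.store = store

    def on_store(self, memory):
        """Post-store hook, called after _auto_relate has created edges."""
        log = []
        for related, sim in self.store.find_similar(memory, top_k=5):
            if sim > 0.80:
                related["access_count"] = related.get("access_count", 0) + 1
                log.append(related["id"])
        memory.setdefault("metadata", {})["ripple_log"] = log

    def run_lint(self):
        """Scheduled pass; result is stored as an advisor_insight memory."""
        orphans = [m["id"] for m in self.store.all()
                   if not m.get("edges") and m.get("access_count", 0) == 0]
        self.store.save({"event_type": "advisor_insight",
                         "metadata": {"lint_report": {"orphans": orphans}}})
```

Keeping the engine a thin orchestrator over store primitives is what lets the "no new tables" constraint hold: everything it writes is ordinary memory metadata.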

What We DON'T Build

  • No new database tables
  • No new MCP tools (yet)
  • No admin UI (yet)
  • No changes to query pipeline
  • No scheduled cron (manual or simple timer for now)

Consequences

Positive

  • Knowledge compounds automatically after every store
  • Lint catches decay before it becomes invisible
  • Thesis tracking gives explicit confidence levels on beliefs
  • Uses existing primitives, minimal new code (~200 lines)
  • VC case study demonstrates value concretely

Negative

  • Ripple adds ~100ms to store operations (background thread, acceptable)
  • Lint on 1322 memories takes ~5-10 seconds
  • Thesis confidence is simplistic (evidence ratio, not Bayesian)

Neutral

  • Internal R&D only, can iterate freely without API stability concerns
  • Seed data for VC case study exercises all three operations

Alternatives Considered

A: Instruction-only (program.md approach)

No code, just agent instructions. Rejected because it doesn't compound across sessions and requires the agent to remember to lint.

C: Full engine with new schema

New theses table, lint_results table, scheduled cron. Rejected as premature for R&D. Can upgrade later if the pattern proves valuable.