Knowledge Base

Overview

The OMEGA knowledge base lets you ingest documents (PDF, markdown, web pages, plain text, CSV, JSON) into a searchable store with semantic chunking and vector embeddings. Once ingested, documents are searchable alongside your memories, enabling RAG-style retrieval over personal and project documentation.

Install PDF support:

  • Full: pip install "omega-memory[knowledge-pdf]" (Docling primary extractor + pdfplumber fallback)
  • Lite: pip install "omega-memory[knowledge-pdf-lite]" (pdfplumber only)

Markdown, HTML, plain text, CSV, and JSON are supported out of the box. The PDF extras are needed only for PDF files.
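To check which PDF extractor an environment can actually use, a small sketch like the one below mirrors the documented preference order (Docling first, pdfplumber as fallback). It only asks Python whether the packages are importable via `importlib.util.find_spec`. The helper name `available_pdf_extractor` is hypothetical and is not part of OMEGA; OMEGA's own selection logic may differ.

```python
import importlib.util
from typing import Optional


def available_pdf_extractor() -> Optional[str]:
    """Return the preferred installed PDF extractor, or None if neither is present.

    Hypothetical helper: it follows the documented order (Docling primary,
    pdfplumber fallback) but is not OMEGA's own selection code.
    """
    # Docling is installed by the full knowledge-pdf extra.
    if importlib.util.find_spec("docling") is not None:
        return "docling"
    # pdfplumber is installed by both extras and serves as the fallback.
    if importlib.util.find_spec("pdfplumber") is not None:
        return "pdfplumber"
    # No PDF extra installed: only the non-PDF formats are available.
    return None


if __name__ == "__main__":
    print(available_pdf_extractor() or "no PDF extractor installed")
```

`find_spec` checks for the top-level package without importing it, so the check is cheap even when Docling is installed.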

Documents can be scoped to entities (e.g., entity_id="acme") to keep different organizations' knowledge separate.

Quick Example

# Ingest a PDF
omega_ingest_document(path_or_url="/path/to/architecture.pdf", title="System Architecture")

# Ingest a webpage
omega_ingest_document(path_or_url="https://docs.example.com/api-reference")

# Search across all ingested documents
omega_search_documents(query="authentication flow", limit=5)

# List everything in the knowledge base
omega_list_documents()

Tools Reference

| Tool | Purpose |
| --- | --- |
| omega_ingest_document | Ingest a document from a file path or URL. Optional parameters: title, source_type (override), and entity_id (scoping). |
| omega_search_documents | Semantic search across all ingested chunks. Filter by entity_id and source_type; cap the number of results with limit. |
| omega_list_documents | List all documents in the knowledge base with chunk counts and metadata. |
| omega_remove_document | Remove a document and all its chunks and embeddings by source path. |
| omega_scan_documents | Scan a directory (default ~/.omega/documents/) for new or changed files and auto-ingest them. Checksum-based: only modified files are re-ingested. |
| omega_sync_kb | Sync pending files from the cloud knowledge base queue (Supabase uploads via the web app) into the local knowledge base. |

Supported Formats

| Format | Source Type | Notes |
| --- | --- | --- |
| PDF | pdf | Docling (primary, native markdown output) with pdfplumber fallback |
| Markdown | markdown | .md files |
| HTML | webpage | Web URLs and .html files |
| Plain text | text | .txt files |
| CSV | text | Ingested as text |
| JSON | text | Ingested as text |

Source type is auto-detected from the file extension or URL. Override with the source_type parameter if needed.
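Extension-based detection with an explicit override can be sketched as follows. The mapping comes from the Supported Formats table; two points are assumptions, not documented OMEGA behavior: URLs are classified by the extension of their path, and URLs with no recognized extension are treated as web pages. `detect_source_type` is a hypothetical name.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extension -> source type, taken from the Supported Formats table.
EXTENSION_SOURCE_TYPES = {
    ".pdf": "pdf",
    ".md": "markdown",
    ".html": "webpage",
    ".txt": "text",
    ".csv": "text",
    ".json": "text",
}


def detect_source_type(path_or_url: str, source_type: Optional[str] = None) -> str:
    """Hypothetical sketch of source-type detection; not OMEGA's implementation."""
    # An explicit override always wins, like the source_type parameter.
    if source_type is not None:
        return source_type
    parsed = urlparse(path_or_url)
    is_url = parsed.scheme in ("http", "https")
    # For URLs inspect the URL path; for local files inspect the path itself.
    path = parsed.path if is_url else path_or_url
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in EXTENSION_SOURCE_TYPES:
        return EXTENSION_SOURCE_TYPES[suffix]
    if is_url:
        # Assumption: URLs without a known extension are web pages.
        return "webpage"
    raise ValueError(f"cannot infer source type for {path_or_url!r}")
```

For example, `detect_source_type("https://docs.example.com/api-reference")` returns `"webpage"`, while `detect_source_type("/path/to/architecture.pdf")` returns `"pdf"`.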

Common Workflows

Ingest a Document

From a local file:

omega_ingest_document(path_or_url="/Users/me/docs/api-spec.pdf", title="API Specification v2")

From a URL:

omega_ingest_document(path_or_url="https://docs.example.com/getting-started")

With entity scoping:

omega_ingest_document(path_or_url="/Users/me/docs/acme-contract.pdf", entity_id="acme", title="Acme Service Agreement")

Search Documents

Basic search:

omega_search_documents(query="rate limiting configuration")

Filtered by entity:

omega_search_documents(query="billing terms", entity_id="acme", limit=3)

Filtered by source type:

omega_search_documents(query="deployment steps", source_type="pdf")
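The filtered searches above combine two ideas: metadata filters (entity_id, source_type) narrow the candidate chunks, and vector similarity ranks what remains. The sketch below shows that pattern with plain cosine similarity over in-memory records. The `Chunk` record, `search_chunks`, and its default `limit` are hypothetical; OMEGA's storage schema, similarity metric, and defaults are not described on this page.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Chunk:
    # Hypothetical chunk record; OMEGA's real schema may differ.
    text: str
    title: str
    source_type: str
    entity_id: Optional[str]
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_chunks(
    query_embedding: Sequence[float],
    chunks: List[Chunk],
    entity_id: Optional[str] = None,
    source_type: Optional[str] = None,
    limit: int = 5,  # arbitrary default for this sketch
) -> List[Chunk]:
    # Filter first, so an entity-scoped search never ranks another entity's chunks.
    candidates = [
        c for c in chunks
        if (entity_id is None or c.entity_id == entity_id)
        and (source_type is None or c.source_type == source_type)
    ]
    # Rank the remaining chunks by similarity to the query, best first.
    candidates.sort(key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return candidates[:limit]
```

Filtering before ranking is what makes entity scoping a hard boundary: a highly similar chunk from another organization is excluded outright instead of just being ranked lower.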

Auto-Scan a Directory

Place files in ~/.omega/documents/ (or any directory), then scan:

omega_scan_documents()

Or scan a custom directory:

omega_scan_documents(directory="/Users/me/Projects/myapp/docs")

The scanner is checksum-based: it only re-ingests files whose content has changed since the last scan. New files are ingested automatically.
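Checksum-based change detection can be sketched like this: hash each file's contents, compare the result with the checksum recorded at the previous scan, and return only new or changed files. SHA-256, recursive traversal of subdirectories, and an in-memory `seen` dictionary are assumptions made for illustration. This page does not say which hash OMEGA uses, whether it recurses, or where it stores checksums. The helper names are hypothetical, and deleted files are not handled.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_checksum(path: Path) -> str:
    # Hash the contents in 64 KiB blocks so large PDFs are not read into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def files_to_ingest(directory: Path, seen: Dict[str, str]) -> List[Path]:
    """Return new or modified files and record their checksums in `seen` (path -> checksum)."""
    changed = []
    for path in sorted(p for p in directory.rglob("*") if p.is_file()):
        checksum = file_checksum(path)
        key = str(path)
        if seen.get(key) != checksum:
            changed.append(path)  # new file, or contents differ from the last scan
            seen[key] = checksum
    return changed
```

A second scan over unchanged files returns an empty list, which is why repeated scans avoid duplicate ingestion.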

Remove a Document

omega_remove_document(source_path="/Users/me/docs/old-spec.pdf")

This removes the document and all its chunks and embeddings from the knowledge base.
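One way to make "remove the document and everything derived from it" a single operation is to key chunk records by source path. The sketch below assumes a hypothetical in-memory layout where each source path owns its `(chunk_text, embedding)` pairs. It illustrates the remove-then-ingest update pattern, not how OMEGA actually stores data.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each source path owns its (chunk_text, embedding) pairs,
# so one lookup finds everything derived from a document.
ChunkRecord = Tuple[str, List[float]]
Store = Dict[str, List[ChunkRecord]]


def remove_document(store: Store, source_path: str) -> int:
    """Drop every chunk and embedding for source_path; return how many were removed."""
    return len(store.pop(source_path, []))


def reingest(store: Store, source_path: str, new_chunks: List[ChunkRecord]) -> None:
    # Update = remove, then ingest, so chunks from the old version never linger.
    remove_document(store, source_path)
    store[source_path] = list(new_chunks)
```

Removing a path that is not in the store returns 0 instead of raising, so a repeated removal is harmless.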

CLI Commands

omega knowledge scan                          # Scan default directory
omega knowledge scan --directory /path/to/docs  # Scan custom directory
omega knowledge list                          # List all documents
omega knowledge search "authentication flow"  # Search documents

Tips

  • Docling produces better PDF output. The full knowledge-pdf extra uses Docling, which extracts native markdown with proper heading structure. The lite extra (pdfplumber only) works but produces flatter text.
  • Chunking is semantic. Documents are split into chunks at natural boundaries (headings, paragraphs) rather than fixed token counts. This improves retrieval relevance.
  • Use entity scoping for multi-org work. If you manage documents for multiple companies, scope each with entity_id so searches return only relevant results.
  • Auto-scan for hands-free ingestion. Drop files into ~/.omega/documents/ and let hooks or the CLI handle ingestion. The checksum check prevents duplicate work.
  • Search returns chunks, not whole documents. Each result is a specific chunk with source attribution (document title, page number if applicable). This keeps results focused and relevant.
  • Remove and re-ingest to update. If a document changes, remove the old version and ingest the new one. The scan command handles this automatically via checksum comparison.
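The semantic-chunking tip describes splitting at natural boundaries instead of fixed token counts. The sketch below shows that idea for Markdown input only. It starts a new section at each heading, and when a section exceeds a size cap it packs whole paragraphs up to that cap. `chunk_markdown`, the heading regex, and the `max_chars` cap are assumptions for illustration and are not OMEGA's chunker. It also ignores edge cases such as `#` lines inside code fences.

```python
import re
from typing import List

HEADING = re.compile(r"^#{1,6}\s")


def chunk_markdown(text: str, max_chars: int = 1200) -> List[str]:
    """Split Markdown at headings, then at paragraph breaks when a section is too long.

    max_chars is an arbitrary cap for this sketch, not an OMEGA setting.
    A single paragraph longer than max_chars is kept whole.
    """
    # 1. Start a new section at every heading line.
    sections: List[List[str]] = [[]]
    for line in text.splitlines():
        if HEADING.match(line) and sections[-1]:
            sections.append([])
        sections[-1].append(line)

    chunks: List[str] = []
    for lines in sections:
        section = "\n".join(lines).strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # 2. Oversized section: pack whole blank-line-separated paragraphs up to the cap.
        current = ""
        for para in re.split(r"\n\s*\n", section):
            para = para.strip()
            if not para:
                continue
            if current and len(current) + 2 + len(para) > max_chars:
                chunks.append(current)
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append(current)
    return chunks
```

Because splits never fall mid-paragraph, each chunk stays readable on its own, which is what keeps search results focused.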