OpenClaw Mem Engine (slot backend) — Design + Roadmap¶

Working name: openclaw-mem-engine

Scope: an optional OpenClaw memory slot backend plugin shipped alongside openclaw-mem.

TL;DR (big picture)¶

openclaw-mem stays the sidecar (capture, governance, observability, grading, receipts).

openclaw-mem-engine becomes an optional slot owner (replaces memory-lancedb when enabled) so we can:

do hybrid recall (FTS + vector) with scopes & metadata filters,
make recall behavior auditable + tunable (receipts, knobs, policies),
add safe M1 automation: conservative autoRecall + strict autoCapture (configurable),
and actually exploit LanceDB features that the official memory-lancedb backend doesn’t surface.

Rollback remains trivial: switch plugins.slots.memory back to memory-lancedb or memory-core.

Why we’re doing this¶

Observed gap:

The official memory-lancedb plugin uses LanceDB mostly as a basic vector store (vectorSearch) and does not expose:
full-text search (FTS; BM25 scoring),
hybrid fusion/rerank,
scope-aware metadata filtering + scalar indexes,
index lifecycle/optimize,
version tags / easy rollback.

Operator goal (what CK cares about):

“Concept/goal → the right decisions/preferences/projects” should be reliably retrievable.
Retrieval must be controllable (scope, importance, recency) and explainable (why these results).

Product shape (two planes)¶

Plane A — Mem Ops (sidecar, always-on)¶

Owned by openclaw-mem:

capture (tool outcomes, optional message events)
ingestion into SQLite ledger
importance grading + drift detection
provenance / trust tiers
receipts & dashboards
(future) context packing / graph view

Plane B — Mem Engine (slot backend, optional)¶

Owned by openclaw-mem-engine:

implements OpenClaw canonical memory tools for the active slot:
memory_store
memory_recall
memory_forget
exposes operator/admin tools for parity workflows:
memory_list
memory_stats
memory_export
memory_import
CLI compatibility layer:
openclaw memory list|stats|export|import (when plugin CLI wiring is available)
fallback: openclaw ltm list|stats|export|import
uses LanceDB as the online store for fast retrieval & hybrid search.

Key stance: sidecar governs; engine serves.

M1 automation (what is safe to ship now)¶

autoRecall (conservative)¶

Hook: before_agent_start
Default: on (but gated by heuristics)
Behavior:
skip trivial prompts (greetings/ack/emojis/HEARTBEAT/slash commands)
- robust to trailing emoji/punctuation (e.g. 好的👌, ok👍, hi～, 收到!!)
- punctuation-only prompts also skip (e.g. ？, …)
recall tiers: must_remember → nice_to_have → (optional) unknown fallback
cap: <=6 memories
escapes memory text to reduce prompt-injection risk
emits bounded lifecycle receipt (openclaw-mem-engine.recall.receipt.v1) with skip reason / tier counts / top IDs
in receipts.verbosity=high, injects a compact autoRecall wrapper comment (IDs only; no memory text in receipt)
- default low keeps receipts in logs only (no prompt-side comment)

autoCapture (strict)¶

Hook: agent_end
Default: on (but strict allowlist)
Behavior:
capture only a small number of items (1–4 per turn)
default categories: preference, decision
skip tool outputs; prefer user text; skip secrets-like strings
dedupe near-identical items
emits bounded lifecycle receipt (openclaw-mem-engine.autoCapture.receipt.v1) with extracted/filtered/stored counts

Rollback¶

One line: - set plugins.slots.memory back to memory-lancedb (or memory-core) and restart gateway.

Receipt controls (P0-2)¶

receipts.enabled (default true)
receipts.verbosity (low default, high optional)
receipts.maxItems (default 3, hard cap 10)

Design constraints: - receipts are bounded/deterministic by default - no memory text is emitted in receipt payloads by default (IDs + scores only) - operator-legible explainability is first-class (whySummary, whyTheseIds, fallback suppression reason)

Scope policy + context budget knobs (Rollout Step 1/2)¶

openclaw-mem-engine now exposes two rollbackable control planes:

1) Scope policy (default-on namespace isolation) - scopePolicy.enabled (default true; kill-switch) - scopePolicy.defaultScope (default "global") - scopePolicy.fallbackScopes (default []; ordered allowlist, only consulted if primary scope is insufficient) - scopePolicy.fallbackMarker (default true; emit observable fallback marker in logs/receipts) - scopePolicy.skipFallbackOnInvalidScope (default true; invalid strict scope tags suppress fallback scopes) - scopePolicy.validationMode (strict default; normalize / none optional) - scopePolicy.maxScopeLength (default 64)

Write-path hardening: - scope values are validated/normalized at write-time (memory_store, autoCapture, memory_import) - invalid strict scopes fall back to defaultScope and emit a scope-validation warning receipt/log marker

2) Final prepend context budget (hard ceiling at packing tail) - budget.enabled (default true; kill-switch) - budget.maxChars (default 1800) - budget.minRecentSlots (default 1) - budget.overflowAction (truncate_oldest default; truncate_tail optional)

Packing semantics: - budget is enforced at the very end of autoRecall packing (prependContext) independent of maxItems - if overflow occurs: - truncate_oldest: drop oldest-by-createdAt slots first, while protecting the minRecentSlots most-recent slots - truncate_tail: drop from the tail of the selected/relevance-ordered list first (least relevant), still protecting minRecentSlots - if still above cap (e.g. receiptComment itself is large), final tail slicing is applied to guarantee a deterministic hard ceiling - when truncation happens and budget.enabled=true, emit openclaw-mem-engine:contextBudget marker (before/after chars + dropped ids/count)

Example config snippet:

{
  "plugins": {
    "entries": {
      "openclaw-mem-engine": {
        "enabled": true,
        "config": {
          "scopePolicy": {
            "enabled": true,
            "defaultScope": "global",
            "fallbackScopes": ["openclaw-mem", "personal"],
            "fallbackMarker": true,
            "skipFallbackOnInvalidScope": true,
            "validationMode": "strict",
            "maxScopeLength": 64
          },
          "budget": {
            "enabled": true,
            "maxChars": 1800,
            "minRecentSlots": 1,
            "overflowAction": "truncate_oldest"
          }
        }
      }
    }
  }
}

Rollback (single-line posture): - disable either feature without code changes: - scopePolicy.enabled = false - budget.enabled = false

Rollout Step 3 — guarded TODO capture (default off)¶

Step 3 keeps the default behavior unchanged (autoCapture.captureTodo = false) and adds explicit TODO guardrails that can be enabled/rolled back via config only.

New knobs under autoCapture: - captureTodo (default false) - maxTodoPerTurn (default 1, min 0, max 3) - todoDedupeWindowHours (default 24, min 1, max 168) - todoStaleTtlDays (default 7, min 1, max 90)

Guardrail behavior when captureTodo=true: - TODO capture is capped by maxTodoPerTurn per agent turn. - TODO dedupe is scope-scoped + time-bounded: only same-scope TODOs within todoDedupeWindowHours are considered duplicates. - TODO injection obeys a deterministic recall-time TTL: TODO memories older than todoStaleTtlDays are dropped from autoRecall injection. - Drops emit bounded markers/receipts (openclaw-mem-engine:todoGuardrail, plus autoCapture receipt counters).

Operational note (Telegram / injected metadata): - Some deployments include autoRecall receipts (e.g. <relevant-memories>…</relevant-memories>) and code-fenced metadata blocks in the same inbound message. - autoCapture strips these injected artifacts before candidate extraction, and filters tool-like content per candidate line, so a real user TODO line (e.g. TODO: …) can still be captured. - Scope tags can be on the same line or on the previous line: - TODO: ... preceded by [SCOPE: openclaw-mem] on its own line is still captured into that scope. - Keep TODO lines outside code fences for best results.

Recommended enable snippet (Step 3):

{
  "plugins": {
    "entries": {
      "openclaw-mem-engine": {
        "enabled": true,
        "config": {
          "autoCapture": {
            "enabled": true,
            "captureTodo": true,
            "maxTodoPerTurn": 1,
            "todoDedupeWindowHours": 24,
            "todoStaleTtlDays": 7
          },
          "budget": {
            "enabled": true,
            "maxChars": 1800,
            "minRecentSlots": 1,
            "overflowAction": "truncate_oldest"
          }
        }
      }
    }
  }
}

Rollback: - immediate kill switch: autoCapture.captureTodo = false

Rollout Step 4 — deterministic Working Set (canary-gated)¶

Step 4 is now wired in before_agent_start behind a rollbackable config gate: - workingSet.enabled (default false) - workingSet.persist (default true) - workingSet.maxChars / maxItemsPerSection / maxGoalChars / maxItemChars

Behavior when enabled: - synthesize a per-scope working state blob from recent scoped memories + current prompt - goal / constraints / decisions / next_actions / open_questions - pin Working Set as the first injected slot before normal recall results - optional upsert persistence with deterministic ID working_set:<scope>

Receipts: - recall lifecycle now includes optional workingSet summary (generated, id, chars, section counts, persisted)

Rollback: - workingSet.enabled = false

Architecture overview¶

                     +------------------------------+
                     |  OpenClaw Agent / Sessions   |
                     +---------------+--------------+
                                     |
                                     | memory_store / memory_recall
                                     v
                     +------------------------------+
                     |   openclaw-mem-engine slot   |
                     |   (LanceDB hybrid backend)   |
                     +---------------+--------------+
                                     |
                                     | reads/writes
                                     v
                     +------------------------------+
                     | LanceDB table: memories      |
                     |  - vector + FTS + metadata   |
                     +---------------+--------------+
                                     ^
                                     |
                writeback annotations|  (importance/trust/provenance)
                                     |
                     +---------------+--------------+
                     | openclaw-mem (sidecar ops)   |
                     |  - capture JSONL             |
                     |  - ingest SQLite             |
                     |  - grade importance          |
                     |  - receipts & drift checks   |
                     +------------------------------+

Decision (confirmed): LanceDB is the source of truth for the engine path. Sidecar may maintain its SQLite ledger for auditability and long-horizon analysis.

Data model (what we store)¶

Minimum viable schema for LanceDB memories table:

id (uuid)
text (string)
vector (float[]) — embedding of text (or of a normalized form)
createdAt (ms)

Governance metadata (needed for “concept→right stuff”):

category (preference|decision|fact|entity|other)
importance (0..1)
importance_label (must_remember|nice_to_have|ignore|unknown)
scope (string; e.g. openclaw-mem, finlife, personal, …)
trust_tier (trusted|untrusted|quarantined)
source_kind (operator|tool|web|import|system)
source_ref (optional; tool name, URL, transcript id, etc.)
lang (optional; zh|en|…)

Notes:

“importance unknown” stays fail-open: we don’t auto-filter unknown unless an explicit query policy requests it.
Keep schema additive: new columns should be optional and backfillable.

Retrieval design (the part you’ll feel)¶

Query pipeline¶

1) Parse intent - detect optional scope hints (explicit param, or inferred from tags) - choose policy defaults (limit, min score, recency window)

2) Hybrid retrieval - FTS (BM25-scored) path: exact-ish keyword matching for names/paths/ids - Vector path: semantic similarity - Fuse results with a deterministic strategy (e.g., RRF), optionally rerank.

3) Metadata filtering (before or during query) - scope filter (most important) - importance_label filter (default policy): prefer must_remember + nice_to_have, then fallback to unknown, then ignore if still empty. - recency bias (soft) - trust gating (prefer trusted; allow untrusted when recall would be empty)

4) Return - bounded list (K) - include lightweight explanations: score components + why it passed filters - return structured JSON objects with stable ids + governance metadata so a packer can wrap results into an injection-ready ContextPack (see docs/context-pack.md)

Why hybrid is MVP-critical¶

Pure vector fails on:
paths, error codes, commit hashes
short identifiers
partial quotes
Pure FTS fails on:
paraphrases / concept-level asks

Hybrid is the minimum that makes “concept→decisions/preferences” feel reliable.

Docs memory (decisions/roadmaps/specs) as a cold lane¶

Operators feel recall failure most painfully on: “we already decided this.”

So the engine path should be able to optionally search an operator-authored docs corpus (DECISIONS / roadmaps / specs) using the same hybrid recipe (FTS + embeddings) and return bounded citations.

Status (current): - shipped as docsColdLane in openclaw-mem-engine config - installable tools: memory_docs_ingest, memory_docs_search - memory_recall + autoRecall can consult cold lane only after hot lane is insufficient (minHotItems) - results are marked source_kind=operator, trust_tier=trusted - embeddings are optional/fail-open (FTS-only still works)

Spec: Docs memory (hybrid search v0) →
Ops guide: mem-engine-admin-ops.md
Operational hardening note: Scoped docs search starvation — root cause, first fix, and verifier (2026-03-21)
Next design slice: Docs cold lane — scope pushdown v1
Stance: no local LLM; rerank (if needed) is remote + bounded.

Embeddings hardening (clamp + fail-open)¶

Embeddings providers enforce an input limit (often expressed as a max token count). If an agent passes a very long string into memory_store/memory_recall, the provider can return a 400 ("input too long").

openclaw-mem-engine hardens this by: - clamping embedding inputs deterministically (head+tail) before calling the provider - failing open when embeddings are unavailable/over-limit (tools keep working; results may degrade)

Config knobs (OpenClaw config paths): - plugins.entries.openclaw-mem-engine.config.embedding.maxChars (default: 6000) - plugins.entries.openclaw-mem-engine.config.embedding.headChars (default: 500) - plugins.entries.openclaw-mem-engine.config.embedding.maxBytes (optional; UTF-8 cap)

Example (pin defaults explicitly):

openclaw config set plugins.entries.openclaw-mem-engine.config.embedding.maxChars 6000
openclaw config set plugins.entries.openclaw-mem-engine.config.embedding.headChars 500
# optional extra safety
openclaw config set plugins.entries.openclaw-mem-engine.config.embedding.maxBytes 24000

Receipt visibility: - memory_store receipts include: embeddingSkipped + embeddingSkipReason - memory_recall receipts include skip reasons (e.g. embedding_input_too_long) when vector embedding is skipped

Fail-open UX: - If embeddings are unavailable/over-limit, memory_recall will still return lexical-only (FTS) results and prepend a ⚠️ warning. - If embedding is skipped during memory_store, the tool still stores the record (zero vector fallback) and returns a ⚠️ warning.

Write path + governance writeback¶

`memory_store`¶

Engine writes the primary record to LanceDB.
Sidecar may later write back computed governance fields:
importance / importance_label
scope
trust_tier
category
additional provenance
Command used for writeback: openclaw-mem writeback-lancedb --db <sqlite> --lancedb <path> --table <name> [--limit N] [--batch N] [--dry-run] [--force] [--force-fields <fields>]
--force/--overwrite (default off): allow overwriting existing values when incoming values are present.
--force-fields (comma-separated): restrict overwrite to selected fields (default safe subset: importance,importance_label,scope,category; trust_tier must be explicit).
Writeback receipts should report: forceOverwrite, forceFields, updated, skipped, overwritten, overwrittenFields, and missing IDs.

Wei Ji memory preflight (optional, now hookable)¶

openclaw-mem-engine can now call Wei Ji automatically before memory_store writes.

Purpose: - move governance to the dangerous moment right before memory becomes system truth - reduce operator memory burden (the flow asks Wei Ji first) - keep the lane rollbackable and bounded

Config gate: - plugins.entries.openclaw-mem-engine.config.weijiMemoryPreflight

Recommended host posture: - enabled: true - failMode: "open" - failOnQueued: false - failOnRejected: false

That gives an advisory-first live lane: - Wei Ji runs automatically - runtime failure does not brick memory writes - receipts still expose what Wei Ji said

Receipts surface: - memory_store.details.receipt.weiJiMemoryPreflight

Blocking modes: - failMode: "closed" blocks on runtime/subprocess failure - failOnQueued: true blocks if Wei Ji queues review - failOnRejected: true blocks if Wei Ji rejects

Rollback: - disable the config gate, then restart the gateway

Importance grading integration¶

The heuristic-v1 scorer (and later scorers) live in the sidecar.
memory_recall defaults to must+nice (must_remember + nice_to_have), then fail-open includes unknown, then ignore.
Engine uses graded fields for filtering defaults and includes policyTier in recall receipts.

Sunrise rollout (slow-cook lane + cron “日出條款”)¶

We roll this out in three stages to keep the system safe and rollbackable.

Stage A (background, 0-risk): keep plugins.slots.memory on memory-lancedb.
Run periodic writeback from sidecar SQLite → LanceDB (metadata governance).
Silent-on-success; only notify on anomalies (errors/missing IDs).
Stage B (canary, short window): off-peak temporary slot switch.
Backup config → switch slot to openclaw-mem-engine → run a small golden-set recall check → rollback.
Goal: prove lifecycle receipts (policyTier + ftsTop/vecTop/fusedTop + tier counts) behave as expected under real traffic.
Stage C (live): switch default slot to openclaw-mem-engine.
Keep an auto-downgrade path (switch back to memory-lancedb) if recall error-rate/latency spikes.

Recommended cron alignment (Asia/Taipei): - Importance grading slow-cook runs 0 */4 * * *. - Stage A writeback runs at 20 */4 * * * (20 minutes after) to avoid overlap.

Indexing & performance (making LanceDB “actually fast”)¶

We should treat index lifecycle as an operator concern with receipts.

Build indices:
vector index (createIndex("vector", …))
FTS index on text
scalar indices on scope, importance_label, category, maybe createdAt
Maintenance:
run optimize() periodically (or on thresholds)
expose “index status” in JSON receipts

Admin ops (P0-1 shipped)¶

Admin surfaces are now implemented in the engine path with receipts:

list: filter by scope / category, bounded by limit
stats: counts by scope/category + size/age summaries
export: deterministic ordering + default text redaction
import: append mode + dedupe (none|id|id_text) + dry-run validation

Each operation emits receipt/debug fields including:

applied filters
returned/imported counts
backend context (dbPath, tableName, latency)

See Engine admin ops (P0-1) for examples.

Roadmap (detailed early milestones, coarse later)¶

M0 — Engine skeleton (slot switch + basic tools)¶

Goal: we can replace the slot without breaking workflows.

Deliverables:

New OpenClaw extension/plugin: openclaw-mem-engine
Config switch:
plugins.slots.memory = "openclaw-mem-engine"
rollback: switch back to memory-lancedb / memory-core
Tools implemented:
memory_store(text, importance?, category?, scope?, …)
memory_recall(query, limit?, scope?, …)
memory_forget(id)
JSON receipts for every tool call (at least: counts, filters applied, latency ms)
memory_recall now exposes bounded lifecycle receipts (details.receipt.lifecycle) with skip reason, per-tier candidate/selected counts, and compact ftsTop / vecTop / fusedTop IDs.

Definition of done:

Smoke test:
store → recall → forget works end-to-end
empty recall returns empty (no errors)
Fail-open behavior:
embedding provider errors do not crash the agent loop (tool returns a compact error + suggestion)

M1 — Hybrid + scopes MVP (the “you will feel it” milestone)¶

Goal: make concept/goal queries consistently pull the right decisions/preferences.

Deliverables:

FTS index on text + vector search on vector
Hybrid fusion (deterministic; RRF baseline)
Metadata filters:
scope (hard)
importance_label (soft default: prefer must/nice)
trust_tier (soft default: prefer trusted)
Basic scalar indexes for filter columns
Eval harness (offline):
a small golden set of queries → expected memory ids/texts
measures: hit@k, qualitative “wrong-scope” rate, latency

Definition of done:

For a golden set of “concept→decision/preference” queries, hybrid beats vector-only baseline.
Query results show scope & importance in the returned payload (auditable).

M1.5 — Sidecar writeback + policy-tiered recall defaults¶

Goal: close the loop between graded SQLite ledger artifacts and runtime recall behavior.

Deliverables:

Sidecar command for bounded, dry-run safe metadata writeback into LanceDB:
openclaw-mem writeback-lancedb --db <sqlite> --lancedb <path> --table <name> [--limit N] [--batch N] [--dry-run] [--force] [--force-fields <fields>]
fields: importance, importance_label, scope, trust_tier (if available), category
memory_recall default policy sequence:
must_remember + nice_to_have
fallback unknown
fallback ignore
Recall receipts include which policy tier was selected (e.g., must+nice, must+nice+unknown, must+nice+unknown+ignore).

M2 — Ops hardening (index lifecycle + optimize + drift)¶

Deliverables (coarse):

scheduled optimize with receipts
index existence + health checks
drift dashboard: label distribution + “unknown” %

M3 — Versioning safety net (tags + rollback)¶

Deliverables (coarse):

tag versions before large auto-capture or regrading changes
ability to revert/checkout a tagged version

Spec (v0): docs/specs/mem-engine-versioning-safety-net-v0.md

M4 — Multimodal (optional)¶

Deliverables (coarse):

add media_refs + media_kind metadata
define embedding strategy per modality

Risks & mitigations¶

Complexity creep: keep engine MVP small; push governance logic into sidecar.
Recall noise: default to scope-aware retrieval; only broaden when empty.
Migration risk: slot switch rollback is the safety valve; keep it one-line.
Native deps: LanceDB has native bindings; pin versions and test on target OS/arch.

Open questions (to settle while implementing M0/M1)¶

1) Scope source-of-truth: - explicit tool params vs inferred from chat tags vs both 2) How we expose explanations without bloating context 3) Default trust policy when everything is untrusted 4) Storage path + backup policy for LanceDB data

OpenClaw Mem Engine (slot backend) — Design + Roadmap¶

TL;DR (big picture)¶

Why we’re doing this¶

Product shape (two planes)¶

Plane A — Mem Ops (sidecar, always-on)¶

Plane B — Mem Engine (slot backend, optional)¶

M1 automation (what is safe to ship now)¶

autoRecall (conservative)¶

autoCapture (strict)¶

Rollback¶

Receipt controls (P0-2)¶

Scope policy + context budget knobs (Rollout Step 1/2)¶

Rollout Step 3 — guarded TODO capture (default off)¶

Rollout Step 4 — deterministic Working Set (canary-gated)¶

Architecture overview¶

Data model (what we store)¶

Retrieval design (the part you’ll feel)¶

Query pipeline¶

Why hybrid is MVP-critical¶

Docs memory (decisions/roadmaps/specs) as a cold lane¶

Embeddings hardening (clamp + fail-open)¶

Write path + governance writeback¶

memory_store¶

Wei Ji memory preflight (optional, now hookable)¶

Importance grading integration¶

Sunrise rollout (slow-cook lane + cron “日出條款”)¶

Indexing & performance (making LanceDB “actually fast”)¶

Admin ops (P0-1 shipped)¶

Roadmap (detailed early milestones, coarse later)¶

M0 — Engine skeleton (slot switch + basic tools)¶

M1 — Hybrid + scopes MVP (the “you will feel it” milestone)¶

M1.5 — Sidecar writeback + policy-tiered recall defaults¶

M2 — Ops hardening (index lifecycle + optimize + drift)¶

M3 — Versioning safety net (tags + rollback)¶

M4 — Multimodal (optional)¶

Risks & mitigations¶

Open questions (to settle while implementing M0/M1)¶

`memory_store`¶