Completeness roadmap (vs memory-lancedb-pro)

Goal: keep openclaw-mem comparable in completeness to win4r/memory-lancedb-pro at the level of operator-facing capabilities (not necessarily identical UI).

Reference project: https://github.com/win4r/memory-lancedb-pro

Current baseline (shipped)

  • ✅ Hybrid recall (FTS + vector) in openclaw-mem-engine
  • ✅ Scope-aware filtering + policy tiers (must → nice → optional unknown)
  • ✅ M1 automation:
      - ✅ Proactive Pack / autoRecall (conservative, skip trivial prompts, capped, escaped)
      - ✅ autoCapture (strict allowlist, secret-skip, dedupe, capped)
  • ✅ Deterministic, rollbackable ops posture (slot switch + per-feature disable)

Gap backlog (fill-in plan)

P0 — Operator parity (must be comparable)

1) ✅ Admin surfaces (comparable)
   - Shipped in openclaw-mem-engine:
       - list memories (scope/category/limit filters)
       - stats (counts by scope/category + size/age summaries)
       - export (sanitized deterministic JSONL/JSON)
       - import (append + dedupe + dry-run)
   - Surfaces:
       - tool API: memory_list, memory_stats, memory_export, memory_import
       - CLI: openclaw memory <list|stats|export|import> when plugin CLI is loaded
       - fallback namespace: openclaw ltm <list|stats|export|import>
   - Acceptance met: operator can audit counts by scope/category and export a sanitized snapshot with receipts.
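The import semantics named above (append + dedupe + dry-run) and the deterministic export can be sketched roughly as follows. This is a minimal illustration, not the shipped memory_import / memory_export code: the record fields (scope, category, text), the hash-based dedupe key, and the receipt keys are all assumptions, and sanitization is not shown.

```python
import hashlib
import json


def dedupe_key(record: dict) -> str:
    """Stable content hash over the fields assumed to define identity."""
    # Assumption: scope + category + text identify a memory; the real key may differ.
    basis = json.dumps(
        [record.get("scope"), record.get("category"), record.get("text")],
        ensure_ascii=False,
    )
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()


def import_records(store: list, incoming: list, dry_run: bool = True) -> dict:
    """Append records not already present and return a small receipt."""
    seen = {dedupe_key(r) for r in store}
    appended, skipped = 0, 0
    for record in incoming:
        key = dedupe_key(record)
        if key in seen:
            skipped += 1
            continue
        seen.add(key)  # also dedupes repeats inside the incoming batch
        appended += 1
        if not dry_run:
            store.append(record)  # a dry run only counts, it never mutates
    return {"dry_run": dry_run, "appended": appended, "skipped_duplicates": skipped}


def export_jsonl(store: list) -> str:
    """Deterministic export: stable row order and sorted keys."""
    rows = sorted(store, key=dedupe_key)
    return "\n".join(json.dumps(r, sort_keys=True, ensure_ascii=False) for r in rows)
```

Running the import with dry_run=True first yields the same counts a real import would produce, which is what makes the receipt useful for an operator review before any write.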

2) ✅ Receipts/debug transparency for recall lifecycle (P0-2)
   - Shipped bounded lifecycle receipt (openclaw-mem-engine.recall.receipt.v1) for:
       - manual memory_recall tool results (details.receipt.lifecycle)
       - autoRecall hook logs + optional injection wrapper comment (receipts.verbosity=high)
   - Includes: skip status/reason, tiers searched, tier counts (candidates/selected), ftsTop / vecTop / fusedTop (IDs + scores only), final injected count
   - Explicit rejection reasons now emitted: trivial_prompt, no_query, no_results_must, no_results_nice, provider_unavailable, budget_cap
   - Config knobs: receipts.enabled, receipts.verbosity, receipts.maxItems (default: enabled + low + 3)
   - Acceptance met: recall path is auditable without exposing memory text in receipts by default.
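A rough sketch of the "IDs + scores only" rule, assuming hits arrive as dicts with id, score, and text. Only the schema string, ftsTop / vecTop / fusedTop, the rejection reason names, and the maxItems default of 3 come from the notes above; the other receipt keys (schema, skipped, skipReason, injectedCount) are hypothetical names chosen for illustration.

```python
RECEIPT_SCHEMA = "openclaw-mem-engine.recall.receipt.v1"


def top_ids(hits, max_items):
    """Keep only id + score for the best hits; memory text is never copied."""
    ranked = sorted(hits, key=lambda h: (-h["score"], h["id"]))
    return [{"id": h["id"], "score": h["score"]} for h in ranked[:max_items]]


def build_receipt(fts, vec, fused, injected_count, skip_reason=None, max_items=3):
    """Assemble a bounded receipt; key names other than *Top are assumptions."""
    return {
        "schema": RECEIPT_SCHEMA,
        "skipped": skip_reason is not None,
        "skipReason": skip_reason,  # e.g. "trivial_prompt" or "budget_cap"
        "ftsTop": top_ids(fts, max_items),
        "vecTop": top_ids(vec, max_items),
        "fusedTop": top_ids(fused, max_items),
        "injectedCount": injected_count,
    }
```

Because the receipt is built from a whitelist of fields rather than by deleting text from a copy of the hit, a new field added to the hit shape cannot leak into receipts by accident.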

3) ✅ Namespace & scope hygiene
   - Shipped hardening:
       - line-anchored scope tag parsing ([ISO] / [SCOPE]) that ignores code fences + injected <relevant-memories> blocks
       - scopePolicy.skipFallbackOnInvalidScope=true (default) to suppress fallback on invalid strict scopes
       - explicit scopeFallbackSuppressed marker for operator debugging
   - Acceptance met: same user runs 2 projects; recall doesn’t cross unless explicitly allowed
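Line-anchored parsing that skips code fences and injected blocks might look like the sketch below. The tag grammar is an assumption (a whole line of the form "[SCOPE] name"); the real parser, including its [ISO] handling, may differ.

```python
import re
from typing import Optional

# Assumption: a scope tag occupies a whole line, e.g. "[SCOPE] project-a".
SCOPE_LINE = re.compile(r"^\[SCOPE\][ \t]+(\S+)[ \t]*$")
FENCE_MARKERS = ("`" * 3, "~" * 3)  # markdown code-fence openers/closers


def parse_scope(prompt: str) -> Optional[str]:
    """Return the first scope tag found outside fences and injected blocks."""
    in_fence = False
    in_injected = False
    for line in prompt.splitlines():
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if "<relevant-memories>" in line:
            in_injected = True
        if in_injected:
            if "</relevant-memories>" in line:
                in_injected = False
            continue
        match = SCOPE_LINE.match(line)  # anchored: inline mentions do not count
        if match:
            return match.group(1)
    return None
```

Ignoring the injected block matters because recalled memories can themselves contain scope tags; honoring them would let an old memory redirect the scope of the current prompt.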

4) ✅ Step4 rollout wiring: deterministic Working Set + operator receipts
   - Added config-gated Working Set (workingSet.enabled, default off for canary)
   - Deterministic synthesis from recent per-scope preference/decision/todo rows + prompt questions
   - Pinned injection before normal recall slots; optional upsert persistence (working_set:<scope>)
   - Recall receipts now include workingSet summary + whySummary / whyTheseIds

P1 — Quality parity (makes it feel “pro”)

4) Fusion/ranking improvements (still deterministic)
   - Calibrate hybrid fusion weights; add optional recency boost
   - Acceptance: recall quality improves on benchmark + no large regressions
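One possible shape for weighted fusion with an opt-in recency boost is sketched below. It uses weighted reciprocal rank fusion as a stand-in; this roadmap does not say which fusion formula the engine uses, and the weights, k, and half-life parameters are illustrative assumptions only.

```python
import math


def fuse(fts_ids, vec_ids, w_fts=0.5, w_vec=0.5, k=60,
         age_days=None, half_life_days=None):
    """Weighted reciprocal rank fusion with an optional recency multiplier."""
    scores = {}
    for weight, ranked in ((w_fts, fts_ids), (w_vec, vec_ids)):
        for rank, mem_id in enumerate(ranked, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + weight / (k + rank)
    if age_days is not None and half_life_days:
        for mem_id in scores:
            # Unknown age is treated as infinitely old, so it gets no boost.
            age = age_days.get(mem_id, math.inf)
            scores[mem_id] *= 1.0 + 0.5 ** (age / half_life_days)
    # Ties break on id so the ordering stays deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))
```

Leaving age_days unset reproduces the unboosted ranking exactly, which keeps the boost opt-in and makes before/after benchmark comparisons straightforward.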

5) Retention/TTL policy (opt-in)
   - Optional TTL/decay for low-importance captures
   - Acceptance: DB growth bounded without losing must_remember

6) Safety hardening
   - Stronger secret detector + PII heuristics; capture redaction rules
   - Shipped (2026-04-27): deterministic high-risk token coverage widened for sk-proj, github_pat, AWS secret-access-key assignments, and long Bearer auth values across mem-engine autoCapture + episodic helper guards.
   - Shipped (2026-04-27, follow-up): shared synthetic golden corpus tests/data/SECRET_DETECTOR_GOLDEN.v1.json now drives mem-engine source-contract checks, episodic guard tests, and sidecar/plugin redaction coverage from one source of truth.
   - Shipped (2026-04-27, follow-up): sidecar/plugin tool-result summary runtime behavior is now covered by a bounded black-box Node test (extensions/openclaw-mem/toolResultSummary.test.mjs) wired through tests/test_plugin_episodic_summary_runtime.py.
   - Shipped (2026-04-27, follow-up): tool_result_persist now has an end-to-end fake-API plugin harness runtime check (extensions/openclaw-mem/toolResultPersistE2E.test.mjs) asserting emitted episodic tool.result JSONL lines stay redacted/non-leaking while preserving benign text.
   - Shipped (2026-04-27, stdout/stderr follow-up): the same e2e harness now includes a stdout/stderr-style payload case that must collapse to "result captured (output redacted)"-style summaries, with explicit no-leak checks against secret-like needles across the full JSONL row.
   - Shipped (2026-04-27, structured-json follow-up): tool-result summary/runtime coverage now includes structured JSON payloads with stdout/stderr fields (must collapse to redacted-output posture without leaks) plus benign structured JSON/docs snippets (must remain informative).
   - Shipped (2026-04-27, malformed-json fallback boundary): tool-result summary/runtime coverage now also asserts the fallback boundary for malformed JSON-like payloads in top-level ({), nested object/array, and root array-first ([) contexts: quoted "stdout"/"stderr" mentions inside prose string content stay informative, while true malformed key-like output fields, with full OUTPUT_FIELD_KEYS parity (stdout, stderr, raw_stdout, raw_stderr, tool_output, command_output), still collapse.
   - Verification surface:
       - tests/test_episodes_extract_sessions.py
       - tests/test_episodic_secret_detection.py
       - tests/test_mem_engine_auto_capture_tool_output.py
       - tests/test_plugin_episodic_spool.py
       - extensions/openclaw-mem-engine/secretDetectorGolden.test.mjs
       - extensions/openclaw-mem/toolResultSummary.test.mjs
       - extensions/openclaw-mem/toolResultPersistE2E.test.mjs
       - tests/test_plugin_episodic_summary_runtime.py
   - Suggested focused verification commands:
       - uv run --group dev python -m pytest tests/test_episodes_extract_sessions.py tests/test_episodic_secret_detection.py tests/test_mem_engine_auto_capture_tool_output.py tests/test_plugin_episodic_spool.py tests/test_plugin_episodic_summary_runtime.py -q
       - node --test extensions/openclaw-mem-engine/secretDetectorGolden.test.mjs
       - node --test extensions/openclaw-mem/toolResultSummary.test.mjs
       - node --experimental-strip-types --test extensions/openclaw-mem/toolResultPersistE2E.test.mjs (includes plain + structured JSON stdout/stderr collapse, malformed-JSON fallback boundary assertions for top-level + nested + array-first contexts with full OUTPUT_FIELD_KEYS parity (stdout, stderr, raw_stdout, raw_stderr, tool_output, command_output), no-leak checks, and benign JSON/docs non-overblock assertions)
   - Acceptance: no obvious secrets captured in test corpus
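The collapse behavior described above can be illustrated with a small sketch. The regexes are illustrative guesses at the token families named in this section, not the patterns in the golden corpus or the shipped detector. The function names are hypothetical, and the malformed-JSON fallback is deliberately left out. OUTPUT_FIELD_KEYS and the summary string are taken from the notes above.

```python
import json
import re

OUTPUT_FIELD_KEYS = {"stdout", "stderr", "raw_stdout", "raw_stderr",
                     "tool_output", "command_output"}
REDACTED_SUMMARY = "result captured (output redacted)"

# Assumption: simplified patterns for illustration only.
SECRET_PATTERNS = [
    re.compile(r"sk-proj-[A-Za-z0-9_-]{20,}"),
    re.compile(r"github_pat_[A-Za-z0-9_]{20,}"),
    re.compile(r"aws_secret_access_key\s*[=:]\s*\S{16,}", re.IGNORECASE),
    re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]{32,}", re.IGNORECASE),
]


def looks_secret(text: str) -> bool:
    return any(p.search(text) for p in SECRET_PATTERNS)


def has_output_field(value) -> bool:
    """True if any dict key, at any depth, is a raw-output field."""
    if isinstance(value, dict):
        return any(k in OUTPUT_FIELD_KEYS or has_output_field(v)
                   for k, v in value.items())
    if isinstance(value, list):
        return any(has_output_field(v) for v in value)
    return False


def summarize_tool_result(payload: str, limit: int = 200) -> str:
    """Collapse risky payloads; keep benign text informative (truncated)."""
    if looks_secret(payload):
        return REDACTED_SUMMARY
    try:
        parsed = json.loads(payload)
    except ValueError:
        parsed = None  # malformed JSON would need a separate fallback scan
    if has_output_field(parsed):
        return REDACTED_SUMMARY
    return payload[:limit]
```

Checking keys rather than substrings is what lets a docs snippet that merely mentions "stdout" inside a string value stay informative while a real stdout field collapses.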

P2 — UX/Website completeness (nice, but helps adoption)

7) Docs polish (README/About/website)
   - One killer demo flow, before/after, architecture diagram

8) Operator runbooks
   - Upgrade, rollback, incident playbook, troubleshooting

Execution protocol

  • We fill this backlog via single-agent hacking mode runs (one worker); each run:
      - updates docs (what changed + how to verify)
      - ships code + tests
      - logs Decision/Tech Note if it changes ops posture

P1-5 (fusion/ranking improvements) is the active next slice, then lifecycle MVP archive-first.

Current first cut:

  • Add a deterministic golden fixture for quota-based recall selection.
  • Keep base rank-fusion behavior unchanged; use the record timestamp only as a deterministic tie-break inside fallback overflow selection.
  • Add opt-in lifecycle writeback for records selected into the final pack: refresh detail_json.lifecycle.last_used_at / used_count, preserve archived_at, and never hard-delete.
  • Verify with focused engine and lifecycle tests before broader rollout or default changes.
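A minimal sketch of the two mechanics in this cut, under stated assumptions: candidates are dicts with id, score, and a numeric ts; newer records win ties (the direction of the tie-break is a guess); and records carry a nested detail_json.lifecycle dict as named above. The function names are hypothetical.

```python
import copy
from datetime import datetime, timezone


def pick_overflow(candidates, slots):
    """Fill leftover slots by score; timestamp is used only to break ties."""
    # Assumption: newer (larger ts) wins a tie; id is the final tie-break.
    ranked = sorted(candidates, key=lambda c: (-c["score"], -c["ts"], c["id"]))
    return ranked[:slots]


def write_back_usage(record: dict, now: datetime) -> dict:
    """Return a copy with refreshed usage fields; archived_at is untouched."""
    updated = copy.deepcopy(record)
    lifecycle = updated.setdefault("detail_json", {}).setdefault("lifecycle", {})
    lifecycle["last_used_at"] = now.astimezone(timezone.utc).isoformat()
    lifecycle["used_count"] = int(lifecycle.get("used_count", 0)) + 1
    return updated
```

The writeback only ever adds or refreshes two keys, so an existing archived_at survives and no code path removes a row.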

Latest lifecycle MVP archive-first progress (shipped, governed lane):

  • optimize review emits bounded signals.soft_archive_candidates proposals for stale low-importance rows with explicit must_remember, recent-use, and already-archived protections.
  • optimize evolution-review emits bounded set_soft_archive_candidate proposals with safe_for_auto_apply=false by default.
  • optimize governor-review requires explicit --approve-soft-archive before any soft-archive candidate can become approved_for_apply.
  • optimize assist-apply now supports governed soft-archive mutation with reversible lifecycle writes only (soft_archive_candidate, archived_at, archive_reason_code) and apply-time protection rechecks; no hard-delete path is introduced.
  • canary-readiness hardening now extends verifier/posture accounting so soft-archive action counts are visible in verifier-bundle/posture-review, and verifier receipts assert no-hard-delete row-count invariants alongside rollback replay.
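The protection checks in the first two bullets can be sketched as below. This is an illustration rather than the optimize review implementation: the row fields (importance, created_at, last_used_at, archived_at), the thresholds, and the function name are assumptions. Only set_soft_archive_candidate and safe_for_auto_apply are names taken from the notes above.

```python
from datetime import datetime, timedelta, timezone


def soft_archive_candidates(rows, now, stale_days=90, recent_use_days=30, limit=10):
    """Propose (never apply) soft-archive candidates, bounded by `limit`."""
    stale_cutoff = now - timedelta(days=stale_days)
    recent_cutoff = now - timedelta(days=recent_use_days)
    proposals = []
    for row in sorted(rows, key=lambda r: r["id"]):  # stable order, deterministic output
        if row.get("importance") == "must_remember":
            continue  # protection: never propose must_remember rows
        if row.get("archived_at") is not None:
            continue  # protection: already archived
        last_used = row.get("last_used_at")
        if last_used is not None and last_used >= recent_cutoff:
            continue  # protection: recently used
        if row.get("importance") != "low" or row["created_at"] > stale_cutoff:
            continue  # only stale, low-importance rows qualify
        proposals.append({
            "id": row["id"],
            "action": "set_soft_archive_candidate",
            "safe_for_auto_apply": False,  # approval stays a separate, explicit step
        })
        if len(proposals) == limit:
            break
    return proposals
```

Keeping proposal generation separate from mutation mirrors the governed lane above: nothing here writes to a row, so the same checks can be rerun at apply time.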

Focused verifier command for this slice:

  • uv run --group dev python -m pytest tests/test_optimize_review.py tests/test_optimize_evolution_review.py tests/test_optimize_governor_review.py tests/test_optimize_assist_apply.py tests/test_optimize_effect_followup.py tests/test_optimize_verifier_bundle.py tests/test_optimize_posture_review.py tests/test_optimize_assist_runner.py -q