Skip to content

OpenClaw user improvement roadmap (product-facing)

This page is the product-facing roadmap for openclaw-mem.

It is written for OpenClaw operators (people who run agents in real work), not for contributors.

If you want the engineering-only backlog, see: - docs/roadmap.md - docs/completeness-roadmap.md


Problem statement (operator view)

OpenClaw agents become dramatically more useful when they can: - remember durable constraints (timezone, safety rules, “don’t touch cron payloads”), - retrieve the right decisions quickly (without prompt gymnastics), - avoid cross-project contamination, - and explain why a memory was injected.

Today, openclaw-mem is already useful, but the UX still has friction: - recall can be “relevant but not first”, - scope isolation is present but not “hard enough” to eliminate bleed, - lifecycle is still mostly manual (DB grows; signal decays), - explainability exists (receipts) but needs to be more operator-legible.


What we optimize for (product principles)

1) Local-first, fail-open - If a quality layer fails (embeddings, rerank, provider), the agent loop must continue.

2) Rollbackable posture - All “takes ownership of the memory slot” features are opt-in and one-line rollback.

3) Governance over vibes - Retrieval and capture must be auditable: receipts, trace, provenance.

4) Namespace hygiene - A user should be able to run multiple projects without memory bleed unless explicitly allowed.


One killer demo flow (showcase-ready)

Goal: a reproducible 5-minute demo that shows before/after impact.

Recommended flow: 1) Run the synthetic “durable self” demo: - docs/showcase/inside-out-demo.md - ./scripts/inside_out_demo.sh 2) Show that the packed bundle consistently enforces: - timezone preference, - privacy stance (synthetic demo), - output style (index-first / bounded reveal). 3) Turn on --trace (or receipt verbosity) and show why memories were selected. 4) (Optional) switch slot backend to openclaw-mem-engine and demonstrate hybrid recall receipts.


Improvement list (ranked by OpenClaw user value)

Progress update (current cycle)

  • ✅ Scope hardening pass shipped (skipFallbackOnInvalidScope default-on + hardened scope-tag extraction).
  • ✅ Receipt UX pass shipped (whySummary + whyTheseIds + explicit fallback suppression markers).
  • ✅ Step4 wiring shipped behind canary flag (workingSet.enabled, deterministic synthesis + pinned injection + optional working_set:<scope> persistence).

P0 — Immediate wins (days)

These should be high-impact and low-risk.

  • Harder scope isolation by default
  • Make cross-project bleed difficult to happen accidentally.
  • Default: strict scope validation + explicit fallback markers.
  • Acceptance: two projects in parallel; recall stays in-scope unless the operator opts into fallback scopes.

  • Recall ranking that matches operator expectations

  • Improve deterministic ranking so “the obvious memory” is returned first.
  • Candidate approach (still deterministic): split retention from activation; use Working Set as backbone lane, then quota-mix hot recall (must cap + nice floor + wildcard) instead of letting must_remember fill the whole budget.
  • Spec: docs/specs/auto-recall-activation-vs-retention-v1.md
  • Acceptance: on a small golden set, top-1/top-3 improves without increasing noise, and large must_remember pools no longer wash out turn-relevant memories.

  • Explainability that answers the operator’s question

  • Receipts already exist; make them more legible for humans.
  • Show: tier searched, why skipped, why included/excluded, which scope(s) were consulted.
  • Acceptance: when an operator asks “why did it recall that?”, the receipt answers it in one screen.

  • Make Roadmap + this page discoverable in the docs nav

  • Operators should not need to grep the repo to find the product view.

P1 — Make it feel “pro” (1–2 weeks)

  • Importance grading: operator workflow + drift checks
  • Baseline exists (heuristic-v1); add lightweight operator review + drift receipts.
  • Acceptance: operators can spot-check must/ignore precision and correct mistakes quickly.

  • Lifecycle MVP (archive-first, reversible)

  • Use-based decay: track last_used_at based on actual inclusion (e.g., pack trace), not “retrieved”.
  • Soft-archive low-value records; reversible.
  • Acceptance: DB growth is bounded; must_remember remains stable.

  • Docs memory as a first-class recall surface

  • Operators want: “we already decided this” retrieval.
  • Acceptance: decisions/specs are retrievable by keyword even with no embeddings.

P2 — Showcase-grade experience (2–4 weeks)

  • Optional: wire pack as the default context feeder (guarded)
  • Default OFF; canary first.
  • Acceptance: consistent, smaller prompts; fewer “context bloat” failures; receipts prove what was injected.

  • “Inside-Out Memory” demo → real operator template

  • Provide a ready-to-run template that users can adapt:
    • a sample DECISIONS file,
    • a sample memory namespace scheme,
    • a “one-command demo” script.

High-ROI modules that can be developed independently (single-worker friendly)

These are intentionally scoped so one subagent can ship them end-to-end (docs + tests + receipts), without requiring coordination across many modules.

1) Receipt UX pass (operator legibility) - Improve receipt fields/names; add clear skip reason taxonomy; add “why these IDs” summary. - Minimal risk; strong adoption impact.

2) Deterministic recency boost / tie-break policy - Keep the system deterministic; add a simple, auditable ranking tweak. - Add golden-set regression tests.

3) Scope strictness + fallback marker hardening - Normalize/validate scope values; better failure modes; better default isolation.

4) Lifecycle MVP (soft archive) with receipts - Add last_used_at updates from pack --trace inclusion. - Add a daily job that archives only ignore/low-priority first.

5) Docs nav + docs landing polish - Add “Start here” + “Killer demo” + “Product roadmap” in nav. - Ensure the docs site tells a coherent story.

6) Golden set for “operator invariants” - A small test corpus (synthetic) that asserts: - timezone preference always recalled, - scope isolation works, - receipts remain bounded.

7) AutoCapture hardening pass (false positives / secrets) - Tighten secret filters and dedupe. - Add unit tests against known bad captures.


Evidence / provenance

This roadmap is grounded in: - existing receipts from openclaw-mem-engine recall/capture lifecycle, - real operator friction observed in day-to-day OpenClaw usage, - and the reproducible Inside-Out demo contract.

Where we lack evidence (e.g., ranking changes), we will gate rollout behind: - a small golden set, - regression tests, - and a canary window.