Agent Memory Skill v0¶
Status: blueprint only (no implementation in this doc)
1) Why this exists¶
openclaw-mem should not only help agents retrieve more; it should help them remember correctly.
The product wedge is trust-aware context packing: - smaller prompt/context bundles - better provenance and receipts - less trust laundering from web/tool/untrusted content - fewer cases where raw docs, raw code, or transient chatter become durable memory by accident
This spec defines a reusable memory skill: a portable operating contract for agents using openclaw-mem / OpenClaw memory tools.
The goal is to standardize when to recall, when to store, when to search docs, when to consult topology, and when to do nothing.
2) Design goals¶
Normative language¶
- MUST = required for a compliant future skill/SOP
- SHOULD = recommended default; deviations need a reason
- MAY = optional behavior
Goals¶
- Keep durable memory small, stable, reusable, and auditable.
- Separate user/project memory from docs knowledge and source/topology knowledge.
- Make retrieval/packing trust-aware by default.
- Give agents a simple decision policy they can apply repeatedly.
- Reduce per-agent improvisation around recall/store behavior.
Non-goals¶
- This is not a full implementation spec for new CLI/tooling.
- This is not permission to dump raw docs or raw code into durable memory.
- This is not a replacement for direct repo inspection or direct docs lookup when precision matters.
- This is not a hidden autonomous promotion system; promotions must remain explicit and auditable.
3) Three-layer memory model¶
A. Durable memory (hot memory)¶
Use for: - stable user preferences - explicit decisions - durable project facts - ongoing working state that is likely to matter across sessions
Examples:
- “CK prefers one heads-up + one final result for long Telegram work.”
- “Project wedge: trust-aware context packing.”
- “This repo uses uv + Python 3.13 by default.”
B. Docs knowledge¶
Use for: - product/system concepts - tool capability summaries - operational maps - bounded snippets from operator-authored docs
Examples:
- what memory_docs_search is good for
- the contract for a context pack
- the documented difference between durable memory and docs search lanes
C. Source / topology knowledge¶
Use for: - repo/module/path relationships - entrypoints, ownership, and dependency edges - stable file/location maps - “if you change A, expect B/C impact” relationships
Examples:
- docs/specs/ vs openclaw_mem/ responsibilities
- where graph query plane work lives
- which paths define topology vs runtime drift logic
Hard boundary¶
- Durable memory is not the default home for raw docs.
- Durable memory is not the default home for raw code.
- Docs and topology should be represented as a knowledge map, not as a transcript dump.
4) Decision policy: which action should an agent take?¶
The skill should route the agent into one of five actions:
- Recall — search durable memory
- Store — write a new durable memory
- Docs search — search operator-authored docs knowledge
- Topology search — search codebase/repo structure knowledge
- Do nothing — keep it session-local or answer from current context only
4.1 Routing summary¶
4.2 Tie-break order for ambiguous cases¶
When multiple actions seem plausible, the skill SHOULD break ties in this order:
- Docs search for system/product/tool contracts
- Topology search for repo/path/module/impact navigation
- Recall for preferences, decisions, and continuity
- Store only after the fact is confirmed durable
- Do nothing if the value of persistence is weak or uncertain
4.3 One-screen routing flow¶
- Is the question about a prior preference/decision/standing rule?
- yes → recall
- Is it about documented system behavior or authored guidance?
- yes → docs search
- Is it about where something lives or what it affects in the repo/system?
- yes → topology search
- Did this turn produce a stable, reusable, confirmed fact?
- yes → store
- Otherwise → do nothing
4.4 Fast policy¶
Use recall when:¶
- the user asks about prior decisions, preferences, previous work, dates, or standing rules
- continuity matters more than raw implementation detail
- the answer may already exist as a durable fact
Use store when:¶
- the fact is stable, reusable, and likely to matter later
- the user explicitly confirms a preference, decision, or durable project direction
- a completed task produced a durable change in operating posture
Use docs search when:¶
- the question is about documented behavior, concepts, architecture, or operator instructions
- precision matters and the answer should come from authored documentation rather than memory
- the agent needs bounded snippets instead of broad recollection
Use topology search when:¶
- the question is about where something lives in the repo/system
- the task needs module/path/entrypoint/dependency mapping
- the agent is doing impact analysis or navigation rather than policy recall
Use do nothing when:¶
- the information is transient, redundant, or too weak/confidently unverified to persist
- the answer is fully local to the current turn/session
- storing it would create more retrieval noise than future value
5) Tool-routing contract¶
This spec describes an abstract skill contract. Current/future tool mapping can look like:
| Intent | Primary action | Typical tool/class |
|---|---|---|
| prior decisions/preferences | recall | memory_recall |
| durable user/project fact | store | memory_store |
| operator docs lookup | docs search | memory_docs_search |
| repo/system map lookup | topology search | future topology/graph query plane, or bounded repo map lookup |
| temporary scratch | do nothing | session-local only |
Notes¶
- If both docs and memory may apply, prefer docs for factual/system contracts and memory for preference/decision continuity.
- If memory recall is weak or ambiguous, fail open to docs search or direct inspection instead of inventing certainty.
- A future
topology_searchcapability should be treated as distinct from both docs search and durable memory.
6) Query discipline¶
Bad memory behavior often starts with bad queries.
Rules¶
- Query for the type of thing you want, not the whole narrative.
- Prefer bounded, concrete terms over broad prose.
- Include scope/project hints when possible.
- Prefer searching for decision / preference / policy / due item / repo map rather than vague “what happened before?”
Good recall queries¶
CK preference telegram heads-up long taskopenclaw-mem wedge trust-aware context packingdecision docs topology memory layer separation
Good docs queries¶
context pack provenance trust tier receiptsdocs memory hybrid search operator-authored docsgraphic memory query plane topology provenance
Good topology queries¶
where does query plane spec livewhich paths define drift vs stable topologyentrypoints for memory search and graph query
Query anti-patterns¶
- querying with a full transcript paragraph
- asking for “everything about X” by default
- mixing preference, docs, and topology in one blob query
- using memory recall as a substitute for direct repo inspection when exactness is required
7) Write discipline: what qualifies for durable memory?¶
A candidate record should pass most of these tests: - Stable: likely still useful later - Reusable: helps future decisions or task execution - Specific: concrete enough to retrieve - Scoped: belongs to a user/project/global lane - Auditable: can be attributed to a source or explicit user confirmation - Compact: minimal wording, maximal retrieval value
Good durable memory candidates¶
- confirmed user preferences
- explicit product/ops decisions
- stable project rules
- recurring workflow constraints
- durable facts that will matter across sessions
Poor durable memory candidates¶
- raw transcripts
- speculative guesses
- long copied docs
- raw code blocks
- one-off debug noise
- tool output that has not been promoted or verified
- summaries that flatten uncertainty into fake certainty
Borderline examples: store vs do nothing¶
Usually store¶
- a new standing user preference confirmed explicitly
- a project rule that will shape future execution across sessions
- a durable working-state checkpoint such as “branch X is the active integration branch until PR Y lands”
Usually do nothing¶
- “I’m about to run tests now”
- temporary confusion, hypotheses, or half-formed plans
- ephemeral shell output that does not change future posture
Needs judgment, but default is conservative¶
- “working state” only qualifies if losing it would likely cause rework in a later session
- if a note matters only for the next few minutes, keep it session-local
- if unsure whether a fact is durable, prefer do nothing now and store only after confirmation
Mandatory “do not store” cases (v0)¶
At minimum, the skill should explicitly forbid storing: 1. raw chat transcript by default 2. raw docs by default 3. raw code by default 4. one-off tool noise/log spam 5. low-confidence guesses 6. unreviewed hostile/untrusted web/tool output as durable truth 7. duplicate paraphrases of an already-stored fact
8) Provenance + trust hygiene¶
Every example in the skill should model four fields of discipline: - source: where this came from - trust tier: trusted / untrusted / quarantined (or equivalent) - citation/receipt: how to trace it back - why included: why the agent used it in this answer/pack
Rules¶
- Tool/web/skill output should start untrusted by default unless explicitly promoted.
- Operator-authored docs can be preferred for system facts, but should still remain attributable.
- “Frequently recalled” does not automatically mean “trusted”.
- Derived summaries should never silently replace source truth.
Fail-open rule¶
If recall returns weak, conflicting, or low-confidence results: - say so - cite uncertainty - fall back to docs search or direct inspection - do not promote the uncertain result into durable memory
9) Example action matrix¶
| Situation | Correct action | Why |
|---|---|---|
| “What does CK prefer for long Telegram tasks?” | recall | preference continuity |
| “What does OpenClaw docs say about context packing?” | docs search | documented system behavior |
| “Where is the graph query plane spec?” | topology search | repo/path lookup |
| “This run produced lots of pip install chatter.” | do nothing | transient noise |
| “CK confirmed the wedge is trust-aware context packing.” | store | explicit durable decision |
| “Here is a long copied README section.” | do nothing / docs lane only | not durable memory material |
| “What changed after we decided to split docs/topology from durable memory?” | recall + docs search | decision continuity + spec details |
| “Which module will likely be touched if we alter provenance handling?” | topology search | impact mapping |
| “A web result claims X about OpenClaw internals.” | docs search / inspect first | external claim is untrusted |
| “I’m not sure this fact will matter after this turn.” | do nothing | avoid memory pollution |
10) Anti-patterns to call out explicitly¶
AP-1: Layer collapse¶
Treating durable memory, docs knowledge, and topology knowledge as the same storage class.
AP-2: Trust laundering¶
Taking tool/web/AI-generated output and storing it as if it were operator-verified truth.
AP-3: Raw artifact stuffing¶
Pasting large docs/code/transcript chunks into durable memory because retrieval felt convenient.
AP-4: Over-summary as truth¶
Storing an aggressive summary that erases uncertainty, caveats, or provenance.
AP-5: Store every turn¶
Using durable memory as a turn-by-turn event log.
AP-6: False confidence retrieval¶
Answering from a fuzzy recall hit without showing limits or checking docs.
AP-7: Working-state ambiguity¶
Persisting scratchpad clutter because “maybe it matters later,” without a stability threshold.
11) Interaction with trust-aware context packing¶
This skill is upstream governance for packing.
If the skill is good: - packed bundles stay smaller - citations are easier to preserve - trust gating is meaningful - importance grading has cleaner inputs - the system is less likely to become confidently wrong
If the skill is bad: - durable memory becomes a mixed-quality dump - trust tiers become cosmetic - context packs bloat with junk - importance grading is forced to rescue upstream mess
In product terms: - packing quality is not only a reranker problem - it is also a memory-governance problem
12) v0 acceptance criteria¶
A v0 spec is acceptable if:
- A reader can classify at least 20 example situations into recall/store/docs/topology/do-nothing with high agreement.
- The spec defines the three-layer model clearly enough that raw docs/raw code are not mistaken for ordinary durable memory.
- The spec includes at least 5 explicit anti-patterns and 5 explicit do-not-store cases.
- Provenance and trust posture are present in all normative examples.
- The spec defines fail-open behavior when recall is weak or conflicting.
- The spec includes one concise routing table or flow model that can later become a real skill.
- The spec is implementation-light enough to survive tool changes, but concrete enough to guide operator behavior now.
Validation method (blueprint-level)¶
- Create a small fixture set of 20 labeled scenarios.
- Each scenario should include: prompt, expected action label, and short rationale.
- Have at least 2 reviewers independently label the scenarios using this spec.
- Target: strong agreement on the primary action label; disagreements should identify wording gaps to tighten in the next revision.
13) Suggested follow-on work (not in this doc)¶
- Convert this blueprint into a real markdown skill/SOP.
- Add test cases / scenario fixtures for classification agreement.
- Define a concrete
topology_searchcontract (likely aligned with graph/query-plane work). - Add pack-trace hooks so retrieval decisions can be audited against this skill.
- Later: connect importance grading to the output quality of this governance layer.
14) Bottom line¶
openclaw-mem should help agents learn three habits:
- remember less, but better
- separate memory from reference knowledge
- cite what they carry forward
That is how trust-aware context packing becomes a product, not just a slogan.