Agent Memory Skill v0¶

Status: blueprint only (no implementation in this doc)

1) Why this exists¶

openclaw-mem should not only help agents retrieve more; it should help them remember correctly.

The product wedge is trust-aware context packing: - smaller prompt/context bundles - better provenance and receipts - less trust laundering from web/tool/untrusted content - fewer cases where raw docs, raw code, or transient chatter become durable memory by accident

This spec defines a reusable memory skill: a portable operating contract for agents using openclaw-mem / OpenClaw memory tools.

The goal is to standardize when to recall, when to store, when to search docs, when to consult topology, and when to do nothing.

2) Design goals¶

Normative language¶

MUST = required for a compliant future skill/SOP
SHOULD = recommended default; deviations need a reason
MAY = optional behavior

Goals¶

Keep durable memory small, stable, reusable, and auditable.
Separate user/project memory from docs knowledge and source/topology knowledge.
Make retrieval/packing trust-aware by default.
Give agents a simple decision policy they can apply repeatedly.
Reduce per-agent improvisation around recall/store behavior.

Non-goals¶

This is not a full implementation spec for new CLI/tooling.
This is not permission to dump raw docs or raw code into durable memory.
This is not a replacement for direct repo inspection or direct docs lookup when precision matters.
This is not a hidden autonomous promotion system; promotions must remain explicit and auditable.

3) Three-layer memory model¶

A. Durable memory (hot memory)¶

Use for: - stable user preferences - explicit decisions - durable project facts - ongoing working state that is likely to matter across sessions

Examples: - “CK prefers one heads-up + one final result for long Telegram work.” - “Project wedge: trust-aware context packing.” - “This repo uses uv + Python 3.13 by default.”

B. Docs knowledge¶

Use for: - product/system concepts - tool capability summaries - operational maps - bounded snippets from operator-authored docs

Examples: - what memory_docs_search is good for - the contract for a context pack - the documented difference between durable memory and docs search lanes

C. Source / topology knowledge¶

Use for: - repo/module/path relationships - entrypoints, ownership, and dependency edges - stable file/location maps - “if you change A, expect B/C impact” relationships

Examples: - docs/specs/ vs openclaw_mem/ responsibilities - where graph query plane work lives - which paths define topology vs runtime drift logic

Hard boundary¶

Durable memory is not the default home for raw docs.
Durable memory is not the default home for raw code.
Docs and topology should be represented as a knowledge map, not as a transcript dump.

4) Decision policy: which action should an agent take?¶

The skill should route the agent into one of five actions:

Recall — search durable memory
Store — write a new durable memory
Docs search — search operator-authored docs knowledge
Topology search — search codebase/repo structure knowledge
Do nothing — keep it session-local or answer from current context only

4.1 Routing summary¶

4.2 Tie-break order for ambiguous cases¶

When multiple actions seem plausible, the skill SHOULD break ties in this order:

Docs search for system/product/tool contracts
Topology search for repo/path/module/impact navigation
Recall for preferences, decisions, and continuity
Store only after the fact is confirmed durable
Do nothing if the value of persistence is weak or uncertain

4.3 One-screen routing flow¶

Is the question about a prior preference/decision/standing rule?
yes → recall
Is it about documented system behavior or authored guidance?
yes → docs search
Is it about where something lives or what it affects in the repo/system?
yes → topology search
Did this turn produce a stable, reusable, confirmed fact?
yes → store
Otherwise → do nothing

4.4 Fast policy¶

Use `recall` when:¶

the user asks about prior decisions, preferences, previous work, dates, or standing rules
continuity matters more than raw implementation detail
the answer may already exist as a durable fact

Use `store` when:¶

the fact is stable, reusable, and likely to matter later
the user explicitly confirms a preference, decision, or durable project direction
a completed task produced a durable change in operating posture

Use `docs search` when:¶

the question is about documented behavior, concepts, architecture, or operator instructions
precision matters and the answer should come from authored documentation rather than memory
the agent needs bounded snippets instead of broad recollection

Use `topology search` when:¶

the question is about where something lives in the repo/system
the task needs module/path/entrypoint/dependency mapping
the agent is doing impact analysis or navigation rather than policy recall

Use `do nothing` when:¶

the information is transient, redundant, or too weak/confidently unverified to persist
the answer is fully local to the current turn/session
storing it would create more retrieval noise than future value

5) Tool-routing contract¶

This spec describes an abstract skill contract. Current/future tool mapping can look like:

Intent	Primary action	Typical tool/class
prior decisions/preferences	recall	`memory_recall`
durable user/project fact	store	`memory_store`
operator docs lookup	docs search	`memory_docs_search`
repo/system map lookup	topology search	future topology/graph query plane, or bounded repo map lookup
temporary scratch	do nothing	session-local only

Notes¶

If both docs and memory may apply, prefer docs for factual/system contracts and memory for preference/decision continuity.
If memory recall is weak or ambiguous, fail open to docs search or direct inspection instead of inventing certainty.
A future topology_search capability should be treated as distinct from both docs search and durable memory.

6) Query discipline¶

Bad memory behavior often starts with bad queries.

Rules¶

Query for the type of thing you want, not the whole narrative.
Prefer bounded, concrete terms over broad prose.
Include scope/project hints when possible.
Prefer searching for decision / preference / policy / due item / repo map rather than vague “what happened before?”

Good recall queries¶

CK preference telegram heads-up long task
openclaw-mem wedge trust-aware context packing
decision docs topology memory layer separation

Good docs queries¶

context pack provenance trust tier receipts
docs memory hybrid search operator-authored docs
graphic memory query plane topology provenance

Good topology queries¶

where does query plane spec live
which paths define drift vs stable topology
entrypoints for memory search and graph query

Query anti-patterns¶

querying with a full transcript paragraph
asking for “everything about X” by default
mixing preference, docs, and topology in one blob query
using memory recall as a substitute for direct repo inspection when exactness is required

7) Write discipline: what qualifies for durable memory?¶

A candidate record should pass most of these tests: - Stable: likely still useful later - Reusable: helps future decisions or task execution - Specific: concrete enough to retrieve - Scoped: belongs to a user/project/global lane - Auditable: can be attributed to a source or explicit user confirmation - Compact: minimal wording, maximal retrieval value

Good durable memory candidates¶

confirmed user preferences
explicit product/ops decisions
stable project rules
recurring workflow constraints
durable facts that will matter across sessions

Poor durable memory candidates¶

raw transcripts
speculative guesses
long copied docs
raw code blocks
one-off debug noise
tool output that has not been promoted or verified
summaries that flatten uncertainty into fake certainty

Borderline examples: store vs do nothing¶

Usually store¶

a new standing user preference confirmed explicitly
a project rule that will shape future execution across sessions
a durable working-state checkpoint such as “branch X is the active integration branch until PR Y lands”

Usually do nothing¶

“I’m about to run tests now”
temporary confusion, hypotheses, or half-formed plans
ephemeral shell output that does not change future posture

Needs judgment, but default is conservative¶

“working state” only qualifies if losing it would likely cause rework in a later session
if a note matters only for the next few minutes, keep it session-local
if unsure whether a fact is durable, prefer do nothing now and store only after confirmation

Mandatory “do not store” cases (v0)¶

At minimum, the skill should explicitly forbid storing: 1. raw chat transcript by default 2. raw docs by default 3. raw code by default 4. one-off tool noise/log spam 5. low-confidence guesses 6. unreviewed hostile/untrusted web/tool output as durable truth 7. duplicate paraphrases of an already-stored fact

8) Provenance + trust hygiene¶

Every example in the skill should model four fields of discipline: - source: where this came from - trust tier: trusted / untrusted / quarantined (or equivalent) - citation/receipt: how to trace it back - why included: why the agent used it in this answer/pack

Rules¶

Tool/web/skill output should start untrusted by default unless explicitly promoted.
Operator-authored docs can be preferred for system facts, but should still remain attributable.
“Frequently recalled” does not automatically mean “trusted”.
Derived summaries should never silently replace source truth.

Fail-open rule¶

If recall returns weak, conflicting, or low-confidence results: - say so - cite uncertainty - fall back to docs search or direct inspection - do not promote the uncertain result into durable memory

9) Example action matrix¶

Situation	Correct action	Why
“What does CK prefer for long Telegram tasks?”	recall	preference continuity
“What does OpenClaw docs say about context packing?”	docs search	documented system behavior
“Where is the graph query plane spec?”	topology search	repo/path lookup
“This run produced lots of pip install chatter.”	do nothing	transient noise
“CK confirmed the wedge is trust-aware context packing.”	store	explicit durable decision
“Here is a long copied README section.”	do nothing / docs lane only	not durable memory material
“What changed after we decided to split docs/topology from durable memory?”	recall + docs search	decision continuity + spec details
“Which module will likely be touched if we alter provenance handling?”	topology search	impact mapping
“A web result claims X about OpenClaw internals.”	docs search / inspect first	external claim is untrusted
“I’m not sure this fact will matter after this turn.”	do nothing	avoid memory pollution

10) Anti-patterns to call out explicitly¶

AP-1: Layer collapse¶

Treating durable memory, docs knowledge, and topology knowledge as the same storage class.

AP-2: Trust laundering¶

Taking tool/web/AI-generated output and storing it as if it were operator-verified truth.

AP-3: Raw artifact stuffing¶

Pasting large docs/code/transcript chunks into durable memory because retrieval felt convenient.

AP-4: Over-summary as truth¶

Storing an aggressive summary that erases uncertainty, caveats, or provenance.

AP-5: Store every turn¶

Using durable memory as a turn-by-turn event log.

AP-6: False confidence retrieval¶

Answering from a fuzzy recall hit without showing limits or checking docs.

AP-7: Working-state ambiguity¶

Persisting scratchpad clutter because “maybe it matters later,” without a stability threshold.

11) Interaction with trust-aware context packing¶

This skill is upstream governance for packing.

If the skill is good: - packed bundles stay smaller - citations are easier to preserve - trust gating is meaningful - importance grading has cleaner inputs - the system is less likely to become confidently wrong

If the skill is bad: - durable memory becomes a mixed-quality dump - trust tiers become cosmetic - context packs bloat with junk - importance grading is forced to rescue upstream mess

In product terms: - packing quality is not only a reranker problem - it is also a memory-governance problem

12) v0 acceptance criteria¶

A v0 spec is acceptable if:

A reader can classify at least 20 example situations into recall/store/docs/topology/do-nothing with high agreement.
The spec defines the three-layer model clearly enough that raw docs/raw code are not mistaken for ordinary durable memory.
The spec includes at least 5 explicit anti-patterns and 5 explicit do-not-store cases.
Provenance and trust posture are present in all normative examples.
The spec defines fail-open behavior when recall is weak or conflicting.
The spec includes one concise routing table or flow model that can later become a real skill.
The spec is implementation-light enough to survive tool changes, but concrete enough to guide operator behavior now.

Validation method (blueprint-level)¶

Create a small fixture set of 20 labeled scenarios.
Each scenario should include: prompt, expected action label, and short rationale.
Have at least 2 reviewers independently label the scenarios using this spec.
Target: strong agreement on the primary action label; disagreements should identify wording gaps to tighten in the next revision.

13) Suggested follow-on work (not in this doc)¶

Convert this blueprint into a real markdown skill/SOP.
Add test cases / scenario fixtures for classification agreement.
Define a concrete topology_search contract (likely aligned with graph/query-plane work).
Add pack-trace hooks so retrieval decisions can be audited against this skill.
Later: connect importance grading to the output quality of this governance layer.

14) Bottom line¶

openclaw-mem should help agents learn three habits: - remember less, but better - separate memory from reference knowledge - cite what they carry forward

That is how trust-aware context packing becomes a product, not just a slogan.

Agent Memory Skill v0¶

1) Why this exists¶

2) Design goals¶

Normative language¶

Goals¶

Non-goals¶

3) Three-layer memory model¶

A. Durable memory (hot memory)¶

B. Docs knowledge¶

C. Source / topology knowledge¶

Hard boundary¶

4) Decision policy: which action should an agent take?¶

4.1 Routing summary¶

4.2 Tie-break order for ambiguous cases¶

4.3 One-screen routing flow¶

4.4 Fast policy¶

Use recall when:¶

Use store when:¶

Use docs search when:¶

Use topology search when:¶

Use do nothing when:¶

5) Tool-routing contract¶

Notes¶

6) Query discipline¶

Rules¶

Good recall queries¶

Good docs queries¶

Good topology queries¶

Query anti-patterns¶

7) Write discipline: what qualifies for durable memory?¶

Good durable memory candidates¶

Poor durable memory candidates¶

Borderline examples: store vs do nothing¶

Usually store¶

Usually do nothing¶

Needs judgment, but default is conservative¶

Mandatory “do not store” cases (v0)¶

8) Provenance + trust hygiene¶

Rules¶

Fail-open rule¶

9) Example action matrix¶

10) Anti-patterns to call out explicitly¶

AP-1: Layer collapse¶

AP-2: Trust laundering¶

AP-3: Raw artifact stuffing¶

AP-4: Over-summary as truth¶

AP-5: Store every turn¶

AP-6: False confidence retrieval¶

AP-7: Working-state ambiguity¶

11) Interaction with trust-aware context packing¶

12) v0 acceptance criteria¶

Validation method (blueprint-level)¶

13) Suggested follow-on work (not in this doc)¶

14) Bottom line¶

Use `recall` when:¶

Use `store` when:¶

Use `docs search` when:¶

Use `topology search` when:¶

Use `do nothing` when:¶