Importance grading (MVP v1)

Status: DONE (MVP v1 closure packet landed on 2026-03-31; deterministic scorer + benchmarked receipt).

What this is

openclaw-mem supports importance grading so downstream workflows can:

  • prioritize what gets recalled,
  • filter what gets triaged,
  • keep long-term memory useful as volume grows.

This is intentionally designed to be:

  • local-first (deterministic baseline)
  • auditable (stored in detail_json)
  • backwards compatible (legacy numeric importance still accepted)

Canonical storage shape

We store a canonical object at:

  • detail_json.importance

Canonical object fields:

{
  "score": 0.86,
  "label": "must_remember",
  "rationale": "Why this matters.",
  "method": "manual-via-cli",
  "version": 1,
  "graded_at": "2026-02-11T00:00:00Z"
}

Score → label mapping (MVP v1)

  • score >= 0.80 → must_remember
  • score >= 0.50 → nice_to_have
  • else → ignore
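
The mapping above can be written as a small pure function. This is a minimal sketch of the thresholds as documented here; the function name score_to_label is illustrative and is not taken from the openclaw-mem codebase.

```python
def score_to_label(score: float) -> str:
    """Map a numeric importance score to an MVP v1 label."""
    # Thresholds are inclusive: exactly 0.80 is must_remember,
    # exactly 0.50 is nice_to_have.
    if score >= 0.80:
        return "must_remember"
    if score >= 0.50:
        return "nice_to_have"
    return "ignore"


if __name__ == "__main__":
    for value in (0.86, 0.80, 0.50, 0.49):
        print(value, score_to_label(value))
```

Checking the higher threshold first keeps the branches mutually exclusive without needing range comparisons.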

Label semantics (operator intent)

These labels are meant to be used as triage / recall priorities, not as “truth.”

  • must_remember: durable, high-signal items you would regret losing (decisions, stable preferences, key constraints, critical incidents).
  • nice_to_have: useful context, but not mission-critical (supporting notes, transient-but-helpful facts).
  • ignore: low-signal noise (routine logs, duplicate status, ephemeral chatter).

Where importance labels get used

  • Context packing: openclaw-mem pack uses the label tier (for example, the niceCap budget) to keep “nice-to-have” from dominating the bundle.
  • Agent routing: the Agent memory skill (SOP) assumes stores are sparse and intentional; importance labels are how we keep that intent visible downstream.

Ungraded items

If detail_json.importance is missing, treat it as unknown. By default, do not drop/filter ungraded items unless a caller explicitly requests filtering.
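
As an illustration of that default, the sketch below keeps ungraded items unless the caller opts into dropping them. The observation shape (a dict with an already-parsed detail_json mapping) and the names label_of, select, and drop_ungraded are assumptions made for this example, not the project's API.

```python
from typing import Any, Iterable, List, Mapping, Set


def label_of(observation: Mapping[str, Any]) -> str:
    """Return the importance label, or "unknown" when the item is ungraded."""
    detail = observation.get("detail_json") or {}
    importance = detail.get("importance")
    if isinstance(importance, Mapping):
        # An object without a label is treated as unknown in this sketch.
        return importance.get("label") or "unknown"
    if isinstance(importance, (int, float)) and not isinstance(importance, bool):
        # Legacy numeric form: derive the label from the MVP v1 thresholds.
        if importance >= 0.80:
            return "must_remember"
        return "nice_to_have" if importance >= 0.50 else "ignore"
    return "unknown"


def select(
    observations: Iterable[Mapping[str, Any]],
    keep_labels: Set[str],
    drop_ungraded: bool = False,
) -> List[Mapping[str, Any]]:
    """Filter by label; ungraded items survive unless drop_ungraded is set."""
    selected = []
    for obs in observations:
        label = label_of(obs)
        if label == "unknown":
            if not drop_ungraded:
                selected.append(obs)
        elif label in keep_labels:
            selected.append(obs)
    return selected
```

Making the drop an explicit keyword argument means a caller has to ask for ungraded items to be filtered out.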

Compatibility rules

openclaw-mem consumers should accept both:

  • legacy numeric: importance: 0.86
  • canonical object: importance: {"score": 0.86, ...}

Parsers also accept, on a best-effort basis, numeric-string scores ({"score": "0.86"}), including full-width digits that NFKC normalization folds to ASCII ({"score": "０.８６"}).

See openclaw_mem.importance.parse_importance_score().
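
For consumers that cannot import that helper, the compatibility rules can be approximated as below. This is a sketch of the rules as described on this page, not the actual parse_importance_score() implementation; the name coerce_importance_score and the choice to return None for unparseable or non-finite input are assumptions.

```python
import math
import unicodedata
from typing import Any, Optional


def coerce_importance_score(value: Any) -> Optional[float]:
    """Return a float score from a legacy number, canonical object, or numeric string."""
    # Canonical object: pull the score out of the mapping.
    if isinstance(value, dict):
        value = value.get("score")
    # bool is a subclass of int in Python; do not treat it as a score.
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        score = float(value)
    elif isinstance(value, str):
        # NFKC folds full-width digits and the full-width full stop to ASCII.
        text = unicodedata.normalize("NFKC", value).strip()
        try:
            score = float(text)
        except ValueError:
            return None
    else:
        return None
    # Reject NaN and infinities (assumption: they are not valid scores).
    return score if math.isfinite(score) else None
```

Returning None rather than raising keeps the parser best-effort, so a malformed value degrades to "ungraded" instead of failing the read.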

Writing importance

Manual (via CLI)

openclaw-mem store writes the canonical importance object.

Example:

uv run python -m openclaw_mem store "Prefer tabs over spaces" \
  --category preference \
  --importance 0.9 \
  --json

Deterministic heuristic (heuristic-v1)

A deterministic scorer lives at openclaw_mem/heuristic_v1.py.

It produces the canonical object (method=heuristic-v1).

Regression + benchmark corpus

The operator-curated MVP benchmark corpus is stored at benchmarks/importance_grading_set.v1.jsonl.

The scorer regression test reads the same benchmark corpus so the benchmark and test lane stay aligned.

Run the scorer regression tests:

# Option A (recommended): run inside the project env
uv run --python 3.13 -- python -m unittest -q tests/test_heuristic_v1.py

# Option B: system python (no uv)
python3 -m unittest -q tests/test_heuristic_v1.py

Run the benchmark receipt generator:

PYTHONPATH=. python3 tools/importance_benchmark.py \
  --input benchmarks/importance_grading_set.v1.jsonl \
  --output handoffs/receipts/2026-03-31_importance-benchmark/importance-benchmark.v1.json

Latest closure packet:

  • benchmark source: benchmarks/importance_grading_set.v1.jsonl
  • benchmark receipt: handoffs/receipts/2026-03-31_importance-benchmark/importance-benchmark.v1.json
  • closure note: docs/2026-03-31_importance-grading-benchmark-closure.md

Autograde on ingest/harvest (feature-flagged)

You can optionally have ingest / harvest run heuristic-v1 and write detail_json.importance during import.

  • Enable via env var: OPENCLAW_MEM_IMPORTANCE_SCORER=heuristic-v1 (alias heuristic_v1)
  • Or override per run: --importance-scorer {heuristic-v1|heuristic_v1|off}

The CLI override takes precedence over the env var for the same run. For one-off dry-run behavior, use --importance-scorer off.

Notes:

  • This is designed to be safe and reversible: set the env var or flag to off to stop grading.
  • Existing detail_json.importance values are preserved unless a caller explicitly opts into re-grading.
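
The precedence and preservation rules can be sketched as follows. The function names resolve_scorer and should_grade, the regrade parameter, and the choices to treat an unset value as disabled and to raise on an unrecognized value are assumptions for illustration. Only the env var name, the flag values, and the precedence order come from this page.

```python
import os
from typing import Any, Mapping, Optional

ENV_VAR = "OPENCLAW_MEM_IMPORTANCE_SCORER"
# Both spellings select the same scorer; "off" disables grading.
_SCORERS = {"heuristic-v1": "heuristic-v1", "heuristic_v1": "heuristic-v1", "off": None}


def resolve_scorer(
    cli_value: Optional[str], env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the scorer name for this run, or None when grading is disabled."""
    env = os.environ if env is None else env
    # The CLI flag, when given, wins over the environment variable.
    raw = cli_value if cli_value is not None else env.get(ENV_VAR)
    if raw is None or not raw.strip():
        return None  # Assumption: unset means autograde is off.
    key = raw.strip().lower()
    if key not in _SCORERS:
        raise ValueError(f"unknown importance scorer: {raw!r}")
    return _SCORERS[key]


def should_grade(
    detail_json: Mapping[str, Any], scorer: Optional[str], regrade: bool = False
) -> bool:
    """Grade only when a scorer is active and no importance exists (unless regrade)."""
    if scorer is None:
        return False
    return regrade or detail_json.get("importance") is None
```

For example, resolve_scorer("off", {ENV_VAR: "heuristic-v1"}) returns None, which matches the one-off override described above.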

Run summary output (ops receipt)

When --json is enabled, ingest and harvest also emit a small run summary so cron/ops flows can trend label distribution over time.

Fields:

  • total_seen: number of observations processed in this run
  • graded_filled: number of observations where autograde populated a missing detail_json.importance
  • skipped_existing: observations that already had detail_json.importance (left untouched)
  • skipped_disabled: observations with missing importance when autograde is disabled
  • scorer_errors: autograde failures (ingest still succeeds; fail-open)
  • label_counts: aggregate label distribution for observations that had importance (existing + newly graded). Receipts always include the canonical keys (must_remember, nice_to_have, ignore, unknown) with zero defaults for deterministic dashboards.

Example:

uv run --python 3.13 -- python -m openclaw_mem ingest \
  --file observations.jsonl \
  --importance-scorer heuristic-v1 \
  --json

Example JSON output:

{
  "inserted": 3,
  "ids": [101, 102, 103],
  "total_seen": 3,
  "graded_filled": 3,
  "skipped_existing": 0,
  "skipped_disabled": 0,
  "scorer_errors": 0,
  "label_counts": {
    "must_remember": 1,
    "nice_to_have": 2,
    "ignore": 0,
    "unknown": 0
  }
}

Minimal run summary contract (v0)

To keep scheduled receipts deterministic and redaction-safe, treat ingest/harvest JSON output as an aggregate-only contract.

Recommended text form (for logs/channels):

harvest-receipt: total_seen=<int>, graded_filled=<int>, skipped_existing=<int>, skipped_disabled=<int>, scorer_errors=<int>, labels=<json>, optional_embedded=<int>

Recommended JSON skeleton (subset that should remain stable):

{
  "total_seen": 0,
  "graded_filled": 0,
  "skipped_existing": 0,
  "skipped_disabled": 0,
  "scorer_errors": 0,
  "label_counts": {
    "must_remember": 0,
    "nice_to_have": 0,
    "ignore": 0,
    "unknown": 0
  }
}
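
A cron or ops wrapper could reduce the raw JSON output to this skeleton and render the text form with something like the sketch below. The function names normalize_receipt and format_receipt_line are hypothetical, and the sketch leaves out the optional_embedded field from the text form.

```python
import json
from typing import Any, Dict, Mapping

CANONICAL_LABELS = ("must_remember", "nice_to_have", "ignore", "unknown")
COUNTERS = (
    "total_seen",
    "graded_filled",
    "skipped_existing",
    "skipped_disabled",
    "scorer_errors",
)


def normalize_receipt(summary: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only the aggregate counters and zero-fill missing label keys."""
    counts = summary.get("label_counts") or {}
    receipt: Dict[str, Any] = {key: int(summary.get(key, 0)) for key in COUNTERS}
    receipt["label_counts"] = {
        label: int(counts.get(label, 0)) for label in CANONICAL_LABELS
    }
    return receipt


def format_receipt_line(receipt: Mapping[str, Any], prefix: str = "harvest-receipt") -> str:
    """Render a normalized receipt as a single log-friendly line."""
    fields = ", ".join(f"{key}={receipt[key]}" for key in COUNTERS)
    labels = json.dumps(receipt["label_counts"], separators=(",", ":"))
    return f"{prefix}: {fields}, labels={labels}"


if __name__ == "__main__":
    raw = {"inserted": 3, "ids": [101, 102, 103], "total_seen": 3,
           "graded_filled": 3, "label_counts": {"nice_to_have": 2, "must_remember": 1}}
    print(format_receipt_line(normalize_receipt(raw)))
```

Because the receipt is built from an allowlist, per-item fields such as ids never reach the shared line.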

Keep receipts to counts/ratios only:

  • no raw observation content
  • no full file paths (prefer labels like source=harvest-dir when sharing)
  • no raw payload snippets or user traces