openclaw-auto-recall

openclaw-auto-recall automatically searches the agent's memory before every prompt and injects relevant context without the use of a second LLM, a separate database, or any external service. The agent doesn't miss important information just because it didn't think to call memory_search. The primary benefits are:

Consistency — relevant context is always considered, not just when the agent remembers to search
Speed — ~100ms per turn via direct search, no LLM round-trip
Token efficiency — only the injected context tokens are added (capped at 768), not a full second model call for extraction
Simplicity — no new infrastructure, no configuration beyond installing the plugin
Transparency — injected context is tagged with [auto-recall] so it's visible in the prompt
Safety — if search fails or times out, the model proceeds normally with no interruption
Non-destructive — purely additive; install or remove without changing any underlying configuration, and disabling returns the system to its exact baseline state

Why

AI agents sometimes forget to search their own memory. When they don't call memory_search before providing an answer, they can miss context that would have improved the accuracy or relevance of their response. Every successful memory system at scale injects context before the prompt — this closes that gap for OpenClaw by leveraging existing memory search functionality in a highly efficient way, with no additional infrastructure, no second LLM call, and no changes to the agent's behavior.

How it works

Hooks into before_prompt_build — fires on every user message, before the model sees it
Calls the memory search engine directly via getActiveMemorySearchManager — no subagent, no model round-trip. This searches the same local vector index that the memory_search tool uses: all indexed files (MEMORY.md, daily notes, wiki pages, session transcripts, and any other paths configured in memorySearch.paths).
Formats relevant results and returns them as prependContext, wrapped in [auto-recall]...[/auto-recall] tags
If search fails, times out, or finds nothing relevant, the model proceeds exactly as it would without the plugin

User message → before_prompt_build → memory_search(query) → prependContext → model sees enriched prompt

Latency: ~100ms per turn. Token cost: only the injected context (capped at 768 tokens).

User message → before_prompt_build → shouldSkip? → yes → normal prompt
                                           ↓ no
                                   search memory with user message
                                           ↓
                                   penalize draft paths
                                           ↓
                                   format within token budget
                                           ↓
                                   prependContext → enriched prompt

If search fails, times out, or finds nothing above the minimum score, the model receives the original prompt unchanged — openclaw-auto-recall never blocks a turn.

Architecture decisions

Why not spawn a subagent?

Spawning subagents results in increased latency, tokens, and cost. As an example, OpenClaw's built-in active-memory plugin spawns a full subagent model call for each turn and adds a complete LLM round-trip. But openclaw-auto-recall calls manager.search() directly, which is a local embedding similarity search against the SQLite index and gets the same results in ~100ms instead of seconds, with lower tokens and cost.

Why not `registerMemoryPromptSupplement`?

That API exists for adding static content to the memory section of the system prompt (like a permanent note). openclaw-auto-recall needs dynamic, per-turn context based on the current message. prependContext from before_prompt_build is the right tool — it's injected per-turn, not baked into the system prompt.

Why the skip conditions?

Short messages, heartbeats, slash commands, and NO_REPLY signals don't benefit from memory search. Running search on them wastes ~100ms and potentially injects noise. The skip list is conservative and easy to extend.

Why the draft path penalty?

Session drafts (memory/drafts/) contain raw tool output — shell commands, API responses, debug logs. These rank well on keyword match but are noisy and unhelpful as injected context. openclaw-auto-recall fetches twice the requested results, applies a 0.15 relevance penalty to draft paths, then takes the top results by adjusted score. This lets curated content (wiki, daily notes, MEMORY.md) rank above raw transcripts without completely excluding drafts when they're genuinely the best match.

Why `[auto-recall]` tags?

Two reasons: visibility and feedback loop prevention. The tags make it obvious which context was injected vs. what the user actually said. In the future, they could also be used to strip injected context before re-indexing, preventing the echo chamber effect.

Installation

openclaw plugins install --link /path/to/openclaw-auto-recall
openclaw gateway restart

Or from ClawHub (when published):

openclaw plugins install clawhub:openclaw-auto-recall
openclaw gateway restart

Configuration

{
  "plugins": {
    "entries": {
      "openclaw-auto-recall": {
        "enabled": true,
        "config": {
          "maxResults": 5,
          "minScore": 0.5,
          "maxTokens": 768
        }
      }
    }
  }
}

Setting	Default	Description
`maxResults`	5	Maximum search results to inject
`minScore`	0.5	Minimum relevance score (0-1) to include. Lower values return more results with more noise; higher values return fewer but more relevant results.
`maxTokens`	768	Maximum estimated tokens for injected context

Skip conditions

openclaw-auto-recall skips injection for:

Messages shorter than 10 characters
HEARTBEAT_OK responses
NO_REPLY signals
Slash commands (/status, /model, /reset, /new, /approve, /stop)

Comparison

Dimension	Honcho	Hindsight/Vectorize	active-memory (built-in)	openclaw-auto-recall
Infrastructure	Separate service	PostgreSQL + PyTorch + CUDA	None (built-in)	None (plugin)
Second LLM per turn	Yes	Yes	Yes (subagent)	No
Replaces memory-core	Yes	Yes	No (extends)	No (extends)
Data location	Cloud or separate DB	Separate PostgreSQL	Local SQLite	Local SQLite
Latency per turn	Extra API call	Extra LLM call	Seconds (subagent)	~100ms (direct search)
Token cost	LLM extraction	LLM extraction	Subagent call	Injected context only
Setup complexity	Deploy service	Deploy stack	Enabled by default	Install plugin

Search quality depends on your content

openclaw-auto-recall is only as good as what's in your memory index. If you maintain a knowledge base (wiki, Obsidian vault, daily notes, project documentation), it will surface rich, relevant context. If your memory is sparse, there's less to find — start by building good notes and the plugin gets more useful over time.

Customizing for your deployment

openclaw-auto-recall searches whatever OpenClaw's memory_search indexes. If your instance indexes wiki pages, project docs, or custom paths, those are automatically included — no configuration needed beyond your existing memorySearch.paths setup.

The draft path penalty (DRAFT_PATH_PENALTY and DRAFT_PATH_PATTERNS in src/index.js) downranks raw session transcripts and noisy content. If you don't have drafts, it does nothing. If you have your own noisy content directories, edit the patterns to match your structure:

const DRAFT_PATH_PATTERNS = [
  /memory\/drafts\//,
  /memory\/draft-.*\.md$/,
  // Add your own: /logs\//, /raw-output\//, etc.
];

The skip patterns filter out messages that don't benefit from memory search. Add your own patterns to SKIP_PATTERNS for any bot commands or signals in your deployment.

Removing

# Disable without removing
openclaw config set plugins.entries.openclaw-auto-recall.enabled false
openclaw gateway restart

# Or fully remove
openclaw plugins uninstall openclaw-auto-recall
openclaw gateway restart

No data is created or modified. Disabling returns you to exactly the baseline state.

Requirements

memory_search configured and working
before_prompt_build hook support
getActiveMemorySearchManager SDK export
sources: ["memory", "sessions"] recommended for session transcript search

(Developed and tested on OpenClaw 2026.6.6)

License

MIT

Auto-Recall

openclaw-auto-recall

Why

How it works

Architecture decisions

Why not spawn a subagent?

Why not `registerMemoryPromptSupplement`?

Why the skip conditions?

Why the draft path penalty?

Why `[auto-recall]` tags?

Installation

Configuration

Skip conditions

Comparison

Search quality depends on your content

Customizing for your deployment

Removing

Requirements

License

Source and release

Source repository

Source commit

Install command

Metadata

Compatibility

Auto-Recall

openclaw-auto-recall

Why

How it works

Architecture decisions

Why not spawn a subagent?

Why not registerMemoryPromptSupplement?

Why the skip conditions?

Why the draft path penalty?

Why [auto-recall] tags?

Installation

Configuration

Skip conditions

Comparison

Search quality depends on your content

Customizing for your deployment

Removing

Requirements

License

Source and release

Source repository

Source commit

Install command

Metadata

Compatibility

Why not `registerMemoryPromptSupplement`?

Why `[auto-recall]` tags?