openclaw-auto-recall
openclaw-auto-recall automatically searches the agent's memory before every prompt and injects relevant context without the use of a second LLM, a separate database, or any external service. The agent doesn't miss important information just because it didn't think to call memory_search. The primary benefits are:
- Consistency: relevant context is always considered, not just when the agent remembers to search
- Speed: ~100ms per turn via direct search, no LLM round-trip
- Token efficiency: only the injected context tokens are added (capped at 768), not a full second model call for extraction
- Simplicity: no new infrastructure, no configuration beyond installing the plugin
- Transparency: injected context is tagged with [auto-recall] so it's visible in the prompt
- Safety: if search fails or times out, the model proceeds normally with no interruption
- Non-destructive: purely additive; install or remove without changing any underlying configuration, and disabling returns the system to its exact baseline state
Why
AI agents sometimes forget to search their own memory. When they don't call memory_search before answering, they can miss context that would have improved the accuracy or relevance of the response. Memory systems that work at scale typically inject context before the prompt; openclaw-auto-recall closes that gap for OpenClaw by reusing the existing memory search functionality, with no additional infrastructure, no second LLM call, and no changes to the agent's behavior.
How it works
- Hooks into before_prompt_build, which fires on every user message before the model sees it
- Calls the memory search engine directly via getActiveMemorySearchManager, with no subagent and no model round-trip. This searches the same local vector index that the memory_search tool uses: all indexed files (MEMORY.md, daily notes, wiki pages, session transcripts, and any other paths configured in memorySearch.paths).
- Formats relevant results and returns them as prependContext, wrapped in [auto-recall]...[/auto-recall] tags
- If search fails, times out, or finds nothing relevant, the model proceeds exactly as it would without the plugin
User message → before_prompt_build → memory_search(query) → prependContext → model sees enriched prompt
Latency: ~100ms per turn. Token cost: only the injected context (capped at 768 tokens).
User message → before_prompt_build → shouldSkip? → yes → normal prompt
↓ no
search memory with user message
↓
penalize draft paths
↓
format within token budget
↓
prependContext → enriched prompt
If search fails, times out, or finds nothing above the minimum score, the model receives the original prompt unchanged — openclaw-auto-recall never blocks a turn.
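The fail-open behavior described above can be sketched as follows. This is a minimal illustration, not the plugin's actual source: the helper names (withTimeout, recall), the 500ms timeout, the search(query, options) signature, and the { path, text, score } result shape are all assumptions made for the example. Only prependContext and the [auto-recall] tags come from this README, and the wiring into before_prompt_build is omitted.

```javascript
// Hypothetical sketch: only prependContext and the tag format come from the README.
const SEARCH_TIMEOUT_MS = 500; // assumed budget, not taken from the plugin

// Reject after `ms` milliseconds so a slow search cannot stall the turn.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("search timed out")), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// `manager` is any object with an async search(query, options) method
// (assumed signature) that resolves to [{ path, text, score }, ...].
async function recall(manager, userMessage, { maxResults = 5, minScore = 0.5 } = {}) {
  try {
    const results = await withTimeout(
      manager.search(userMessage, { maxResults }),
      SEARCH_TIMEOUT_MS
    );
    const relevant = results.filter((r) => r.score >= minScore);
    if (relevant.length === 0) return {}; // nothing relevant: prompt unchanged
    const body = relevant.map((r) => `${r.path}: ${r.text}`).join("\n");
    return { prependContext: `[auto-recall]\n${body}\n[/auto-recall]` };
  } catch {
    return {}; // search failed or timed out: prompt unchanged
  }
}
```

Returning an empty object on every failure path is what keeps the turn from being blocked: the caller only ever sees either extra context or nothing.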
Architecture decisions
Why not spawn a subagent?
Spawning a subagent adds latency, tokens, and cost. For example, OpenClaw's built-in active-memory plugin spawns a full subagent model call on each turn, which adds a complete LLM round-trip. openclaw-auto-recall instead calls manager.search() directly: a local embedding similarity search against the SQLite index that returns the same results in ~100ms instead of seconds, at lower token cost.
Why not registerMemoryPromptSupplement?
That API exists for adding static content to the memory section of the system prompt (like a permanent note). openclaw-auto-recall needs dynamic, per-turn context based on the current message. prependContext from before_prompt_build is the right tool — it's injected per-turn, not baked into the system prompt.
Why the skip conditions?
Short messages, heartbeats, slash commands, and NO_REPLY signals don't benefit from memory search. Running search on them wastes ~100ms and potentially injects noise. The skip list is conservative and easy to extend.
Why the draft path penalty?
Session drafts (memory/drafts/) contain raw tool output — shell commands, API responses, debug logs. These rank well on keyword match but are noisy and unhelpful as injected context. openclaw-auto-recall fetches twice the requested results, applies a 0.15 relevance penalty to draft paths, then takes the top results by adjusted score. This lets curated content (wiki, daily notes, MEMORY.md) rank above raw transcripts without completely excluding drafts when they're genuinely the best match.
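The over-fetch and rerank step can be sketched like this. DRAFT_PATH_PENALTY, DRAFT_PATH_PATTERNS, and the 0.15 value are named in this README; the rerank and isDraft helpers, the { path, score } result shape, and the choice to subtract the penalty from the score are assumptions for illustration.

```javascript
const DRAFT_PATH_PENALTY = 0.15;
const DRAFT_PATH_PATTERNS = [/memory\/drafts\//, /memory\/draft-.*\.md$/];

// True when the path matches any draft pattern.
function isDraft(path) {
  return DRAFT_PATH_PATTERNS.some((re) => re.test(path));
}

// `candidates` is expected to hold about 2 * maxResults raw hits, so that
// curated results can move up once drafts are penalized.
// Assumption: the penalty is subtracted from the raw score.
function rerank(candidates, maxResults) {
  return candidates
    .map((r) => ({
      ...r,
      adjusted: isDraft(r.path) ? r.score - DRAFT_PATH_PENALTY : r.score,
    }))
    .sort((a, b) => b.adjusted - a.adjusted)
    .slice(0, maxResults);
}

// Usage: the draft has the best raw score but drops below the wiki page.
const hits = [
  { path: "memory/drafts/session-1.md", score: 0.8 },
  { path: "wiki/deploy.md", score: 0.72 },
  { path: "MEMORY.md", score: 0.6 },
];
const top = rerank(hits, 2).map((r) => r.path);
```

In this example the draft still makes the top two, which matches the stated intent: drafts are downranked, not excluded.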
Why [auto-recall] tags?
Two reasons: visibility and feedback loop prevention. The tags make it obvious which context was injected and which text the user actually wrote. In the future, they could also be used to strip injected context before re-indexing, so that recalled text is not indexed and recalled again (the echo chamber effect).
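As a rough illustration of how the tags could support that future stripping step, here is a hypothetical pair of helpers. Neither wrapRecall nor stripRecall is part of the plugin as described; only the tag strings come from this README.

```javascript
const OPEN_TAG = "[auto-recall]";
const CLOSE_TAG = "[/auto-recall]";

// Wrap injected context so it is distinguishable from user text.
function wrapRecall(body) {
  return `${OPEN_TAG}\n${body}\n${CLOSE_TAG}`;
}

// Remove every tagged block (and one trailing newline, if present) so that
// injected context would not be re-indexed. Hypothetical helper.
function stripRecall(text) {
  return text.replace(/\[auto-recall\][\s\S]*?\[\/auto-recall\]\n?/g, "");
}
```

The non-greedy match keeps two separate injected blocks from swallowing the user text between them.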
Installation
openclaw plugins install --link /path/to/openclaw-auto-recall
openclaw gateway restart
Or from ClawHub (when published):
openclaw plugins install clawhub:openclaw-auto-recall
openclaw gateway restart
Configuration
{
"plugins": {
"entries": {
"openclaw-auto-recall": {
"enabled": true,
"config": {
"maxResults": 5,
"minScore": 0.5,
"maxTokens": 768
}
}
}
}
}
| Setting | Default | Description |
|---|---|---|
| maxResults | 5 | Maximum search results to inject |
| minScore | 0.5 | Minimum relevance score (0-1) to include. Lower values return more results with more noise; higher values return fewer but more relevant results. |
| maxTokens | 768 | Maximum estimated tokens for injected context |
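To show how the three settings interact, here is a minimal sketch of formatting results within a token budget. The formatWithinBudget and estimateTokens names, the { path, text, score } result shape, and the 4-characters-per-token estimate are assumptions; the plugin's own estimator and output format may differ.

```javascript
// Assumption: roughly 4 characters per token. The real estimator may differ.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Takes results already sorted by relevance and keeps as many as fit.
function formatWithinBudget(
  results,
  { maxResults = 5, minScore = 0.5, maxTokens = 768 } = {}
) {
  const lines = [];
  let used = 0;
  for (const r of results) {
    if (lines.length >= maxResults) break; // result cap reached
    if (r.score < minScore) continue; // too weak to inject
    const line = `- ${r.path}: ${r.text}`;
    const cost = estimateTokens(line);
    if (used + cost > maxTokens) break; // budget exhausted
    lines.push(line);
    used += cost;
  }
  return lines.join("\n");
}
```

Stopping at the first result that does not fit, instead of skipping it and trying smaller ones, keeps the output in strict relevance order; that is a design choice of this sketch.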
Skip conditions
openclaw-auto-recall skips injection for:
- Messages shorter than 10 characters
- HEARTBEAT_OK responses
- NO_REPLY signals
- Slash commands (/status, /model, /reset, /new, /approve, /stop)
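A skip check along these lines could look like the sketch below. SKIP_PATTERNS is the name used elsewhere in this README; the exact regular expressions, the MIN_MESSAGE_LENGTH constant, and the shouldSkip signature are assumptions for illustration.

```javascript
const MIN_MESSAGE_LENGTH = 10;

// Assumed patterns: exact-match signals plus the listed slash commands.
const SKIP_PATTERNS = [
  /^HEARTBEAT_OK$/,
  /^NO_REPLY$/,
  /^\/(status|model|reset|new|approve|stop)\b/,
];

// Returns true when the message should bypass memory search entirely.
function shouldSkip(message) {
  const text = message.trim();
  if (text.length < MIN_MESSAGE_LENGTH) return true;
  return SKIP_PATTERNS.some((re) => re.test(text));
}
```

The word boundary after the command name means a message such as "/newsletter draft please" is not mistaken for /new.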
Comparison
| Dimension | Honcho | Hindsight/Vectorize | active-memory (built-in) | openclaw-auto-recall |
|---|---|---|---|---|
| Infrastructure | Separate service | PostgreSQL + PyTorch + CUDA | None (built-in) | None (plugin) |
| Second LLM per turn | Yes | Yes | Yes (subagent) | No |
| Replaces memory-core | Yes | Yes | No (extends) | No (extends) |
| Data location | Cloud or separate DB | Separate PostgreSQL | Local SQLite | Local SQLite |
| Latency per turn | Extra API call | Extra LLM call | Seconds (subagent) | ~100ms (direct search) |
| Token cost | LLM extraction | LLM extraction | Subagent call | Injected context only |
| Setup complexity | Deploy service | Deploy stack | Enabled by default | Install plugin |
Search quality depends on your content
openclaw-auto-recall is only as good as what's in your memory index. If you maintain a knowledge base (wiki, Obsidian vault, daily notes, project documentation), it will surface rich, relevant context. If your memory is sparse, there's less to find — start by building good notes and the plugin gets more useful over time.
Customizing for your deployment
openclaw-auto-recall searches whatever OpenClaw's memory_search indexes. If your instance indexes wiki pages, project docs, or custom paths, those are automatically included — no configuration needed beyond your existing memorySearch.paths setup.
The draft path penalty (DRAFT_PATH_PENALTY and DRAFT_PATH_PATTERNS in src/index.js) downranks raw session transcripts and noisy content. If you don't have drafts, it does nothing. If you have your own noisy content directories, edit the patterns to match your structure:
const DRAFT_PATH_PATTERNS = [
/memory\/drafts\//,
/memory\/draft-.*\.md$/,
// Add your own: /logs\//, /raw-output\//, etc.
];
The skip patterns filter out messages that don't benefit from memory search. Add your own patterns to SKIP_PATTERNS for any bot commands or signals in your deployment.
Removing
# Disable without removing
openclaw config set plugins.entries.openclaw-auto-recall.enabled false
openclaw gateway restart
# Or fully remove
openclaw plugins uninstall openclaw-auto-recall
openclaw gateway restart
No data is created or modified. Disabling returns you to exactly the baseline state.
Requirements
- memory_search configured and working
- before_prompt_build hook support
- getActiveMemorySearchManager SDK export
- sources: ["memory", "sessions"] recommended for session transcript search
(Developed and tested on OpenClaw 2026.6.6)
License
MIT