@elimaine

Delta-Mem MLX

OpenClaw helper plugin for installing and connecting the delta-mem MLX sidecar on Apple Silicon.

Current version: v0.1.3

Delta-Mem MLX for OpenClaw

License: MIT · OpenClaw Plugin · Apple Silicon

Give OpenClaw agents a local Apple Silicon memory sidecar.

This plugin connects OpenClaw to a local δ-mem MLX sidecar: an OpenAI-compatible chat-completions service that runs MLX models and can preload retrieved memory into a compact neural state before generation. It is for people running local agents on macOS who want memory behavior that is more interesting than simply pasting old context into the prompt.

δ-mem is exciting because it changes how attention is shaped in a frozen model. Instead of fine-tuning a new model or appending a long transcript, the sidecar keeps a small per-session memory state and applies it through the model's attention path. In local tests, the Qwen3 4B backbone improved consistently when the δ-mem adapter was attached; OpenClaw-specific memory preload experiments are still being tuned and benchmarked.

Stack

Layer              | What it uses
Agent runtime      | OpenClaw model provider config
Sidecar API        | FastAPI, OpenAI-compatible /v1/chat/completions
Inference          | MLX / mlx-lm on Apple Silicon
Default backbone   | mlx-community/Qwen3-4B-Instruct-2507-4bit
δ-mem adapter      | ofthetrees/delta-mem-qwen3-4b-instruct-mlx-adapter
Optional retrieval | QMD vsearch snippets passed as attention_state

Sidecar and benchmark repo:

https://github.com/elimaine/delta-mem-mlx-sidecar-w-openclaw

δ-mem paper:

https://arxiv.org/abs/2605.12357

Requirements

Component    | Minimum                    | Notes
Host         | macOS on Apple Silicon     | MLX is the intended local runtime
Python       | 3.11+                      | Used by the sidecar
Node         | Current LTS                | Used by plugin helper scripts
OpenClaw     | 2026.3.24-beta.2 or newer  | Required for plugin install/use
Storage      | Several GB                 | Model downloads are gated and explicit
Hugging Face | Network access             | Needed for model/adapter downloads

The plugin can still emit provider config for an already-running sidecar. Managed install is optional.

Install

From ClawHub:

openclaw plugins install clawhub:@elimaine/openclaw-delta-mem-mlx

From GitHub:

git clone https://github.com/elimaine/openclaw-delta-mem-mlx-plugin.git
openclaw plugins install -l ./openclaw-delta-mem-mlx-plugin

Quick Start

Check whether the sidecar route is reachable:

curl -fsS http://127.0.0.1:8765/health

Then use the plugin inside OpenClaw:

  • Human path: ask OpenClaw to run delta_mem_mlx_status, then delta_mem_mlx_provider_config.
  • Agent path: call delta_mem_mlx_provider_config with sidecarBaseUrl: "http://127.0.0.1:8765" and modelId: "delta-mem-qwen3-4b-mlx", then add the returned provider block to the model configuration.
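For orientation, the provider block returned by delta_mem_mlx_provider_config might look roughly like the object below. This is an illustrative sketch: the field names are assumptions based on the transport details described later in this README, and the authoritative shape always comes from the tool itself.

```javascript
// Illustrative provider-block shape (field names are assumptions; the
// real output comes from the delta_mem_mlx_provider_config tool).
function exampleProviderConfig(sidecarBaseUrl, modelId) {
  return {
    baseUrl: `${sidecarBaseUrl}/v1`,
    transportProtocol: "openai-chat-completions",
    endpoint: "/v1/chat/completions",
    models: [{ id: modelId }]
  };
}

const block = exampleProviderConfig(
  "http://127.0.0.1:8765",
  "delta-mem-qwen3-4b-mlx"
);
```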

Sidecar Setup

Managed install:

npm run install-sidecar

The installer checks platform assumptions first. It asks before downloading model artifacts and starts the sidecar only after a completed managed install.

Install without starting:

npm run install-sidecar-only

Start an existing sidecar checkout:

npm run start-sidecar -- --root /path/to/delta-mem-mlx-sidecar-w-openclaw

Use the QMD fallback when OpenClaw is not yet passing attention_state itself:

npm run start-sidecar -- \
  --root /path/to/delta-mem-mlx-sidecar-w-openclaw \
  --attention-state-source qmd \
  --qmd-mode vsearch \
  --qmd-limit 6

If QMD is missing, times out, or exits nonzero, the sidecar still answers normally and reports X-Delta-Attention-State-Count: 0.
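One way to notice that this fallback path was taken is to read the count header off the response. A minimal sketch (the helper name is mine; the header name is the one documented above):

```javascript
// Number of memory snippets the sidecar preloaded for a response;
// 0 means it answered without attention-state input (e.g. QMD missing,
// timed out, or exited nonzero).
function attentionStateCount(headers) {
  const raw = headers.get
    ? headers.get("x-delta-attention-state-count") // fetch Headers / Map
    : headers["x-delta-attention-state-count"];    // plain header object
  const count = Number.parseInt(raw ?? "0", 10);
  return Number.isFinite(count) ? count : 0;
}

// Usage (illustrative):
// const res = await fetch("http://127.0.0.1:8765/v1/chat/completions", opts);
// if (attentionStateCount(res.headers) === 0) console.warn("memory preload skipped");
```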

Model Presets

Preset      | Purpose
qwen3-delta | Qwen3 4B MLX backbone plus converted δ-mem adapter
smoke       | Small Qwen2.5 MLX model for cheap setup checks
custom      | User-supplied MLX model and optional adapter directory

The Qwen3 δ preset validates that delta_mem_config.json and delta_mem_adapter_mlx.npz exist before startup, so partial adapter downloads fail early with a clear error.

How Memory Reaches The Model

OpenClaw calls the sidecar like a normal OpenAI-compatible provider:

  • baseUrl: sidecar /v1 root
  • transportProtocol: openai-chat-completions
  • endpoint: /v1/chat/completions
  • stable session header: X-OpenClaw-Session-Key

Retrieved memory should not be appended to the visible chat unless the operator explicitly wants that behavior. The sidecar accepts hidden memory-shaping input as:

  • request body: attention_state, attentionState, or delta_attention_state
  • request header: X-Delta-Attention-State

The sidecar then preloads those snippets into the per-session δ-state before the real assistant response. This is attention shaping, not guaranteed verbatim recall.
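Concretely, a request that keeps memory out-of-band might be assembled like this. The field and header names are the ones listed above; the helper, the model id default, and the example values are illustrative:

```javascript
// Assemble a chat-completions request whose memory travels out-of-band:
// the visible messages stay clean, and snippets ride in attention_state.
function buildMemoryRequest({ sessionKey, messages, snippets }) {
  return {
    headers: {
      "Content-Type": "application/json",
      "X-OpenClaw-Session-Key": sessionKey // stable per logical session
    },
    body: {
      model: "delta-mem-qwen3-4b-mlx",
      messages, // no retrieved memory appended here
      attention_state: snippets // hidden memory-shaping input
    }
  };
}

const req = buildMemoryRequest({
  sessionKey: "agent-42",
  messages: [{ role: "user", content: "What did we decide yesterday?" }],
  snippets: [{ text: "Decision: ship v0.1.3 Friday", source: "qmd:vsearch" }]
});
```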

Agent Integration Notes

Agents configuring this plugin should preserve these invariants:

  • Use one stable X-OpenClaw-Session-Key per logical agent/session.
  • Route to /v1/chat/completions, not legacy completions.
  • Keep QMD snippets in attention_state; do not copy them into user-visible prompt text unless explicitly requested.
  • Verify integration with X-Delta-Attention-State-Count and X-Delta-Attention-State-Source.
  • Treat install/download actions as operator-approved; do not silently fetch model weights.

Minimal provider hook:

// Build the normal chat-completions body first.
const body = {
  model,
  messages,
  temperature,
  max_tokens: maxTokens
};

// Retrieve snippets for the latest user turn and attach them as hidden
// memory-shaping input instead of appending them to the visible prompt.
const query = latestUserText(messages);
const snippets = await qmd.vsearch(query, { limit: 6 });
body.attention_state = snippets
  .map((item) => ({
    text: item.text || item.content || item.snippet,
    source: item.source || "qmd:vsearch",
    score: item.score
  }))
  .filter((item) => item.text);

Benchmarks

Current findings live in the sidecar wiki:

https://github.com/elimaine/delta-mem-mlx-sidecar-w-openclaw/blob/main/wiki/Benchmark-Findings.md

High-level summary:

Test shape                                  | Result
Qwen3 4B + δ-mem adapter                    | Consistent response-quality lift over the plain backbone
Strongest OpenClaw-shaped QMD preload tests | 0.5625 plain vs 0.7292 δ-mem, about 1.30x
Stricter OpenClaw-16 replay                 | Small exact-recall recovery, +0.0391 absolute
Latency                                     | Usually slower with memory enabled; exact ratio depends on path

These are early local benchmarks, not a claim that the plugin perfectly recalls session history. The practical goal is to measure when δ-state improves agent behavior enough to justify the latency and setup cost.
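As a quick sanity check on the headline number, the reported scores do produce the stated lift:

```javascript
// Scores from the strongest OpenClaw-shaped QMD preload tests above.
const plain = 0.5625;
const deltaMem = 0.7292;
const lift = deltaMem / plain; // ≈ 1.2964, i.e. "about 1.30x"
```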

When Not To Use This

  • You are not on Apple Silicon and need an efficient local runtime today.
  • You want guaranteed transcript recall rather than attention shaping.
  • You need a zero-setup hosted model provider.
  • You cannot allocate disk space for local model artifacts.


License

MIT

Source and release

Source repository

elimaine/openclaw-delta-mem-mlx-plugin


Source commit

2c25fbf08caf9b5952cdf424b3b0a028766b5e25


Install command

openclaw plugins install clawhub:@elimaine/openclaw-delta-mem-mlx

Metadata

  • Package: @elimaine/openclaw-delta-mem-mlx
  • Created: 2026/05/17
  • Updated: 2026/05/17
  • Executes code: Yes
  • Source tag: main

Compatibility

  • Built with OpenClaw: 2026.3.24-beta.2
  • Plugin API range: >=2026.3.24-beta.2
  • Tags: latest
  • Files: 7