OpenClaw SafeMode for OpenClaw
A one-install policy gate that protects every OpenClaw tool call against the six categories of catastrophe an autonomous agent is most likely to cause:
| Rule | What it blocks |
|---|---|
| financial_drain | Per-call, per-hour, per-day, per-week spend over your threshold |
| credential_access | Reads of .aws/, .ssh/, .env, vaults, password stores |
| identity_escalation | New users, granted admin, modified roles or policies |
| mass_destruction | rm -rf, DROP TABLE, force-deletes, mass deletes |
| lateral_pivot | Calls to instance metadata, k8s management ports, internal admin paths |
| unauthorized_exfiltration | Outbound transmissions to non-allowlisted destinations, especially after a sensitive read |
Decisions are made by Inherence's hosted gate (free tier). The plugin sends a tool-call envelope, gets back allow or deny plus a signed receipt that you can verify offline with @inherencelabs/verifier.
Install
# 1. Install the plugin into your OpenClaw workspace
openclaw plugin install @inherencelabs/openclaw-safemode
# 2. Get a free API key
open https://inherencelabs.com/dashboard # or visit in browser
# 3. Set the key (one of these)
export INHERENCE_API_KEY=ink_...
# …or put it in your OpenClaw plugin config:
# extensions:
# openclaw-safemode:
# apiKey: "ink_..."
Restart OpenClaw. From the next tool call onward, every call is gated.
What you see
> agent: "Charge customer Acme Corp $99,000,000 for renewal"
[openclaw-safemode] deny tool=stripe_charge rule=financial_drain
reason: financial_drain.numeric_threshold
receipt: rec_8c2a1f4a09bb6731
review: https://dashboard.inherencelabs.com/decisions/rec_8c2a1f4a09bb6731
Tool call BLOCKED. The agent is informed it cannot proceed.
> agent: "Show me last week's orders"
[openclaw-safemode] allow tool=query_orders latency=42ms receipt: rec_…
Tool call PROCEEDS normally.
Configuration
Set in openclaw.config.jsonc under extensions.openclaw-safemode:
| Field | Default | Purpose |
|---|---|---|
| apiKey | $INHERENCE_API_KEY | Your Inherence API key |
| endpoint | https://mcp.inherencelabs.com/api/proxy/decide | Gate URL; override for staging/self-host |
| policyMode | "smart" | smart = six-rule shield. strict = adds extra heuristics. permissive = log-only, never block |
| onDeny | "block" | block = terminal reject. approve = surface a user approval prompt. log = warn + allow (dev only) |
| failClosed | true | If the gate is unreachable, block the call. Set false to fail open (NOT recommended outside dev) |
| timeoutMs | 1500 | Connect + read timeout for the gate (matches inherence-proxy default) |
| sessionId | (generated) | UUID4 scoping for exfiltration-rate checks |
| receiptSink | (unset) | Path to append signed receipts as JSONL for audit replay |
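To make the defaults in the table concrete, here is a minimal Python sketch that models them as a dataclass and flags obviously invalid values. It is only an illustration: the plugin reads its settings from openclaw.config.jsonc, and the class name `SafeModeConfig`, the snake_case field names, and the `problems()` helper are hypothetical, not part of the plugin.

```python
import os
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SafeModeConfig:
    """Hypothetical mirror of the configuration table; not the plugin's own type."""

    # apiKey falls back to the INHERENCE_API_KEY environment variable.
    api_key: str = field(default_factory=lambda: os.environ.get("INHERENCE_API_KEY", ""))
    endpoint: str = "https://mcp.inherencelabs.com/api/proxy/decide"
    policy_mode: str = "smart"   # smart | strict | permissive
    on_deny: str = "block"       # block | approve | log
    fail_closed: bool = True
    timeout_ms: int = 1500
    # sessionId is generated as a UUID4 when not supplied.
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    receipt_sink: Optional[str] = None

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the values look sane."""
        found = []
        if not self.api_key:
            found.append("apiKey is not set; the plugin blocks every call")
        if self.policy_mode not in ("smart", "strict", "permissive"):
            found.append(f"unknown policyMode: {self.policy_mode!r}")
        if self.on_deny not in ("block", "approve", "log"):
            found.append(f"unknown onDeny: {self.on_deny!r}")
        if self.timeout_ms <= 0:
            found.append("timeoutMs must be positive")
        return found
```

Calling `SafeModeConfig(api_key="ink_...").problems()` on an otherwise default config should return an empty list.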
How the decision is made
┌──────────────┐ tool_name + args ┌──────────────────────┐
│ OpenClaw │ ───────────────────▶ │ openclaw-safemode │
│ agent │ │ (this plugin) │
│ │ │ │
│ │ ◀───── allow/deny ─────│ classify locally │
└──────────────┘ │ POST /decide │
│ log + return │
└──────┬───────────────┘
│
HTTPS │ Bearer auth
▼
┌──────────────────────────────────┐
│ mcp.inherencelabs.com │
│ /api/proxy/decide │
│ │
│ • Runs the 6 catastrophe rules │
│ • Signs an EdDSA JWT receipt │
│ • Constant-time response │
└──────────────────────────────────┘
Round-trip latency budget: ~100 ms at the hosted gate (p50 ~98 ms over US-East). The plugin adds <1 ms of overhead.
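The request leg of the diagram can be sketched in Python with only the standard library. This is an assumption-heavy illustration, not the plugin's code: the envelope field names (`tool_name`, `args`, `session_id`) are guessed from the diagram and the configuration table, the response is returned as parsed JSON without assuming its shape, and `build_request` and `decide` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request

ENDPOINT = "https://mcp.inherencelabs.com/api/proxy/decide"


def build_request(tool_name, args, api_key, session_id, endpoint=ENDPOINT):
    """Build the POST carrying a tool-call envelope (field names are assumptions)."""
    envelope = {"tool_name": tool_name, "args": args, "session_id": session_id}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(envelope).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decide(request, timeout_ms=1500):
    """Send the request and classify the result for the caller to act on."""
    try:
        with urllib.request.urlopen(request, timeout=timeout_ms / 1000) as resp:
            return {"outcome": "ok", "body": json.loads(resp.read().decode("utf-8"))}
    except urllib.error.HTTPError as exc:
        # 401 means the key was rejected; other statuses are treated as gate failures.
        kind = "invalid_key" if exc.code == 401 else "unreachable"
        return {"outcome": kind, "status": exc.code}
    except (OSError, ValueError) as exc:
        # Covers timeouts, network errors, and unparseable response bodies.
        return {"outcome": "unreachable", "error": str(exc)}
```

Keeping request construction separate from the network call makes the envelope easy to inspect in tests without contacting the gate.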
Verifying receipts offline
Every decision (allow or deny) returns a signed JWT. To audit without trusting our infrastructure, run one of the three reference verifiers.
The npm package @inherencelabs/verifier is coming soon. Until it ships to npm, install the verifier directly from the GitHub source; instructions are below. All three verifiers (Rust, Python, JavaScript/WASM) pass the same 22-vector conformance suite, so you can safely pick whichever language fits your stack.
Rust (most common)
# Cargo.toml
[dependencies]
inherence-verifier = { git = "https://github.com/Inherencelabs/inherence-verifier-rs", branch = "main" }
use inherence_verifier::{verify_receipt, VerifyConfig};
let cfg = VerifyConfig::new().pin_authority_jwk(&authority_jwk_json)?;
match verify_receipt(&jwt, &cfg) {
Ok(()) => println!("VALID"),
Err(e) => println!("INVALID: {}", e.code()),
}
Python
pip install git+https://github.com/Inherencelabs/inherence-verifier-py
import json
from inherence_verifier import VerifyConfig, verify_receipt
cfg = VerifyConfig()
cfg.pin_authority_jwk(json.dumps(authority_jwk))
out = verify_receipt(receipt_jwt, cfg)
assert out["verdict"] == "VALID"
JavaScript / WASM (coming to npm)
# Until @inherencelabs/verifier ships to npm, build from source:
git clone https://github.com/Inherencelabs/inherence-verifier-js
cd inherence-verifier-js && wasm-pack build --target nodejs --out-dir pkg
# Then in your project:
npm install /path/to/inherence-verifier-js/pkg
import { verify } from "@inherencelabs/verifier"; // post-publish
const out = verify({
jwt: receipt,
authority_jwk: JSON.stringify(authorityJwk),
pinned_vks: {},
});
// out.verdict === "VALID" or { verdict: "INVALID", failure_code: "..." }
The pinned authority JWK lives at https://inherencelabs.com/keys/authority.jwk. See the verification protocol spec for the full procedure.
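When debugging, it can help to look inside a receipt before handing it to a verifier. The sketch below decodes the header and claims of a compact JWT using only the Python standard library. It does NOT check the signature, so it is no substitute for the reference verifiers above; `peek_receipt` is a hypothetical helper, and no particular claim names are assumed.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode an unpadded base64url segment, as used in compact JWTs."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def peek_receipt(jwt: str) -> dict:
    """Return the decoded header and claims. The signature is NOT verified."""
    parts = jwt.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    return {
        "header": json.loads(b64url_decode(parts[0])),
        "claims": json.loads(b64url_decode(parts[1])),
        # Only the length is reported; checking the bytes needs the authority key.
        "signature_bytes": len(b64url_decode(parts[2])),
    }
```

A receipt from the gate should show `EdDSA` as the header's `alg` value; anything read this way is untrusted until a verifier accepts the token.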
Failure modes
| Situation | Default behavior | Configurable |
|---|---|---|
| API key missing | Plugin blocks every call with a clear "config" error | No — always block |
| API key invalid (401) | Plugin blocks every call | No — always block |
| Gate unreachable (timeout, network, 5xx) | Plugin blocks every call | failClosed: false to allow |
| Gate returns deny | Plugin blocks the call | onDeny: "approve" for user prompt, "log" for dev |
| Plugin throws | OpenClaw treats hook as fail-closed (per its own policy) | No |
The fail-closed semantics for unreachable + 5xx mirror inherence-proxy's posture — the plugin assumes you'd rather fail loud than ship an unsafe action.
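The first four rows of the table can be read as one small decision function. The Python sketch below is an illustration of that table rather than the plugin's source: the `Outcome` enum, the `resolve` function, and the returned action strings are hypothetical names. It leaves out the "Plugin throws" row, which OpenClaw handles itself, and the policyMode setting.

```python
from enum import Enum


class Outcome(Enum):
    KEY_MISSING = "key_missing"
    KEY_INVALID = "key_invalid"   # gate answered 401
    UNREACHABLE = "unreachable"   # timeout, network error, or 5xx
    DENY = "deny"
    ALLOW = "allow"


def resolve(outcome: Outcome, fail_closed: bool = True, on_deny: str = "block") -> str:
    """Map a gate outcome to 'block', 'allow', or 'prompt'."""
    if outcome in (Outcome.KEY_MISSING, Outcome.KEY_INVALID):
        return "block"  # not configurable: always block
    if outcome is Outcome.UNREACHABLE:
        return "block" if fail_closed else "allow"
    if outcome is Outcome.DENY:
        # approve surfaces a user prompt; log warns and allows (dev only).
        # Unknown onDeny values fall back to blocking (a choice made for this sketch).
        return {"block": "block", "approve": "prompt", "log": "allow"}.get(on_deny, "block")
    return "allow"
```

With the defaults, every outcome except `ALLOW` resolves to `block`, which matches the fail-closed posture described above.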
Compatibility
Tested clean against the three OpenClaw maintainer-favorite extensions:
| Extension | OpenClaw role | Shield verdict |
|---|---|---|
| browser | tool plugin | ✓ gated through before_tool_call; classifier returns unknown for the bare tool name, gate inspects URL args |
| codex | agent harness + GPT provider | ✓ codex-dispatched calls hit the same before_tool_call pipeline as native dispatch |
| clickclack | channel | ✓ no hook overlap (channel events ≠ tool calls) |
See COMPATIBILITY.md for the detailed matrix + run instructions.
Examples
See examples/ for three runnable agent configs:
- examples/autonomous-trader/: agent with stripe_charge access; shield blocks oversized transfers
- examples/code-exec-agent/: agent with bash access; shield blocks rm -rf and credential reads
- examples/web-research/: agent with fetch access; shield blocks exfiltration to non-allowlisted hosts
Testing your install
# Mock the gate locally and run the conformance suite
npm test
# Hit the live gate (requires INHERENCE_API_KEY)
INHERENCE_API_KEY=ink_... npm run test:e2e
License
Apache-2.0.
Why not pure local?
The hosted gate runs the same six rules that ship in inherence-python's rule_catalog.py, but the gate (a) holds the cross-agent rate counters required for unauthorized_exfiltration and financial_drain, (b) signs the receipt with a key the verifier libraries pin, and (c) gets policy updates without the user re-deploying. A pure-local variant is on the roadmap for offline agents; talk to us if you need it.
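To picture what the rate counters in (a) involve, here is a minimal single-process Python sketch of a sliding-window spend check in the spirit of financial_drain. It is an assumption, not the gate's implementation: the `SpendWindow` class, the limits passed to it, and the choice to record only approved amounts are all hypothetical. The hosted gate keeps such state across agents, which a purely local counter like this cannot do.

```python
import time
from collections import deque

# Window lengths in seconds for the per-hour, per-day, and per-week checks.
WINDOWS = {"hour": 3600, "day": 86400, "week": 604800}


class SpendWindow:
    """Hypothetical in-memory spend tracker for one agent."""

    def __init__(self, per_call_limit, window_limits):
        self.per_call_limit = per_call_limit
        self.window_limits = window_limits  # e.g. {"hour": 150.0, "day": 200.0}
        self.events = deque()               # (timestamp, amount), oldest first

    def check(self, amount, now=None):
        """Return the name of the violated limit, or None if the spend is allowed."""
        now = time.time() if now is None else now
        if amount > self.per_call_limit:
            return "per_call"
        # Drop events that are older than the longest configured window.
        longest = max((WINDOWS[name] for name in self.window_limits), default=0)
        while self.events and self.events[0][0] <= now - longest:
            self.events.popleft()
        for name, limit in self.window_limits.items():
            spent = sum(a for t, a in self.events if t > now - WINDOWS[name])
            if spent + amount > limit:
                return name
        # Only allowed spends count toward later checks.
        self.events.append((now, amount))
        return None
```

Because each instance sees only its own calls, two agents sharing a budget could each stay under the limit while jointly exceeding it, which is the gap the hosted counters are described as closing.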