OpenClaw SafeMode for OpenClaw

A one-install policy gate that protects every OpenClaw tool call against the six categories of catastrophe an autonomous agent is most likely to cause:

Rule	What it blocks
`financial_drain`	Per-call, per-hour, per-day, per-week spend over your threshold
`credential_access`	Reads of `.aws/`, `.ssh/`, `.env`, vaults, password stores
`identity_escalation`	Creating users, granting admin, modifying roles or policies
`mass_destruction`	`rm -rf`, `DROP TABLE`, force-deletes, mass deletes
`lateral_pivot`	Calls to cloud instance-metadata endpoints, Kubernetes management ports, internal admin paths
`unauthorized_exfiltration`	Outbound transmissions to non-allowlisted destinations, especially after a sensitive read

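Decisions are made by Inherence's hosted gate (free tier). The plugin sends a tool-call envelope and gets back allow or deny, plus a signed receipt that you can verify offline with an Inherence reference verifier (Rust, Python, or JavaScript/WASM; the npm package @inherencelabs/verifier is not yet published).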
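To make a few of the categories in the table concrete, here is a toy, stateless pre-check in Python. It is a hypothetical illustration only: the regexes and the name `flag_rules` are assumptions, not the hosted gate's rule logic, and a stateless matcher cannot express the rate- and sequence-based parts of `financial_drain` or `unauthorized_exfiltration`.

```python
import re

# Hypothetical, deliberately simplified patterns; the hosted gate's rules differ.
PATTERNS = {
    # Paths under .aws/ or .ssh/, or a file named .env.
    "credential_access": re.compile(r"(^|/)(\.aws|\.ssh)(/|$)|(^|/)\.env$"),
    "mass_destruction": re.compile(r"\brm\s+-rf\b|\bDROP\s+TABLE\b", re.IGNORECASE),
    # Cloud metadata IP, Kubernetes API server (6443), kubelet (10250).
    "lateral_pivot": re.compile(r"169\.254\.169\.254|:(6443|10250)\b"),
}


def flag_rules(args: dict) -> list[str]:
    """Return the names of rules whose pattern matches any string argument."""
    values = [v for v in args.values() if isinstance(v, str)]
    return [rule for rule, pattern in PATTERNS.items()
            if any(pattern.search(v) for v in values)]


print(flag_rules({"cmd": "rm -rf /var/data"}))          # ['mass_destruction']
print(flag_rules({"path": "/home/me/.ssh/id_rsa"}))      # ['credential_access']
print(flag_rules({"url": "http://169.254.169.254/latest/meta-data/"}))  # ['lateral_pivot']
print(flag_rules({"range": "last_week"}))                # []
```

A local check like this can only short-circuit obvious cases; the allow/deny verdict and the signed receipt still come from the gate.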

Install

# 1. Install the plugin into your OpenClaw workspace
openclaw plugin install @inherencelabs/openclaw-safemode

# 2. Get a free API key
open https://inherencelabs.com/dashboard   # or visit in browser

# 3. Set the key (one of these)
export INHERENCE_API_KEY=ink_...
# …or put it in your OpenClaw plugin config (openclaw.config.jsonc):
#    "extensions": {
#      "openclaw-safemode": {
#        "apiKey": "ink_..."
#      }
#    }

Restart OpenClaw. From the next tool call onward, every call is gated.

What you see

> agent: "Charge customer Acme Corp $99,000,000 for renewal"

[openclaw-safemode] deny  tool=stripe_charge  rule=financial_drain
  reason: financial_drain.numeric_threshold
  receipt: rec_8c2a1f4a09bb6731
  review: https://dashboard.inherencelabs.com/decisions/rec_8c2a1f4a09bb6731

Tool call BLOCKED. The agent is informed it cannot proceed.

> agent: "Show me last week's orders"

[openclaw-safemode] allow  tool=query_orders  latency=42ms  receipt: rec_…

Tool call PROCEEDS normally.
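If you grep OpenClaw's output for gate decisions, the log lines above split into a decision word plus `key=value` (or `key: value`) fields. A minimal Python sketch, assuming the format shown in the sample output; the function name `parse_decision_line` is hypothetical, and the plugin may not promise that this log format is stable.

```python
import re
from typing import Optional

PREFIX = "[openclaw-safemode]"
# Matches "key=value" and "key: value" pairs (the sample uses both forms).
FIELD = re.compile(r"(\w+)(?:=|:\s+)(\S+)")


def parse_decision_line(line: str) -> Optional[dict]:
    """Parse one '[openclaw-safemode] <decision> key=value ...' line, else None."""
    line = line.strip()
    if not line.startswith(PREFIX):
        return None
    parts = line[len(PREFIX):].split(maxsplit=1)
    if not parts:
        return None
    record = {"decision": parts[0]}
    if len(parts) > 1:
        # findall yields (key, value) tuples, which dict.update accepts.
        record.update(FIELD.findall(parts[1]))
    return record


print(parse_decision_line(
    "[openclaw-safemode] deny  tool=stripe_charge  rule=financial_drain"))
# {'decision': 'deny', 'tool': 'stripe_charge', 'rule': 'financial_drain'}
```

Indented continuation lines (`reason:`, `receipt:`, `review:`) do not carry the prefix, so this sketch returns `None` for them.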

Configuration

Set in openclaw.config.jsonc under extensions.openclaw-safemode:

Field	Default	Purpose
`apiKey`	`$INHERENCE_API_KEY`	Your Inherence API key
`endpoint`	`https://mcp.inherencelabs.com/api/proxy/decide`	Gate URL — override for staging/self-host
`policyMode`	`"smart"`	`smart` = six-rule shield. `strict` = the six rules plus extra heuristics. `permissive` = log-only, never blocks
`onDeny`	`"block"`	`block` = terminal reject. `approve` = surface a user approval prompt. `log` = warn + allow (dev only)
`failClosed`	`true`	If the gate is unreachable, block the call. Set false to fail-open — NOT recommended outside dev
`timeoutMs`	`1500`	Connect + read timeout for the gate, in milliseconds (matches the `inherence-proxy` default)
`sessionId`	(generated)	UUIDv4 that scopes exfiltration-rate checks to one session
`receiptSink`	(unset)	Path to append signed receipts as JSONL for audit replay
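One way to picture the table is as a typed config object carrying the documented defaults. The sketch below is a hypothetical Python model (the class name `SafeModeConfig` is an assumption; the plugin itself reads `openclaw.config.jsonc`). It falls back to `INHERENCE_API_KEY` when `apiKey` is unset, generates a UUID4 `sessionId`, and rejects values outside the documented options.

```python
import os
import uuid
from dataclasses import dataclass, field
from typing import Optional

POLICY_MODES = {"smart", "strict", "permissive"}
ON_DENY_MODES = {"block", "approve", "log"}


@dataclass
class SafeModeConfig:
    # Field names mirror the config keys, hence camelCase.
    apiKey: Optional[str] = None
    endpoint: str = "https://mcp.inherencelabs.com/api/proxy/decide"
    policyMode: str = "smart"
    onDeny: str = "block"
    failClosed: bool = True
    timeoutMs: int = 1500
    sessionId: str = field(default_factory=lambda: str(uuid.uuid4()))
    receiptSink: Optional[str] = None  # JSONL path; None means no sink

    def __post_init__(self) -> None:
        # Default apiKey comes from the environment, as in the table.
        # A key that is still None means every call gets blocked.
        if self.apiKey is None:
            self.apiKey = os.environ.get("INHERENCE_API_KEY")
        if self.policyMode not in POLICY_MODES:
            raise ValueError(f"unknown policyMode: {self.policyMode!r}")
        if self.onDeny not in ON_DENY_MODES:
            raise ValueError(f"unknown onDeny: {self.onDeny!r}")
        if self.timeoutMs <= 0:
            raise ValueError("timeoutMs must be positive")


# Usage: SafeModeConfig(**raw), where raw is the parsed
# extensions.openclaw-safemode object; unknown keys raise TypeError.
```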

How the decision is made

   ┌──────────────┐   tool_name + args     ┌──────────────────────┐
   │  OpenClaw    │ ───────────────────▶   │  openclaw-safemode   │
   │  agent       │                        │  (this plugin)       │
   │              │                        │                      │
   │              │ ◀───── allow/deny ─────│  classify locally    │
   └──────────────┘                        │  POST /decide        │
                                           │  log + return        │
                                           └──────┬───────────────┘
                                                  │
                                            HTTPS │ Bearer auth
                                                  ▼
                                  ┌──────────────────────────────────┐
                                  │   mcp.inherencelabs.com          │
                                  │   /api/proxy/decide              │
                                  │                                  │
                                  │   • Runs the 6 catastrophe rules │
                                  │   • Signs an EdDSA JWT receipt   │
                                  │   • Constant-time response       │
                                  └──────────────────────────────────┘

Round-trip latency budget: ~100 ms at the hosted gate (p50 ~98 ms over US-East). The plugin adds <1 ms of overhead.
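The plugin side of the diagram amounts to one authenticated POST with a hard timeout. Below is a minimal Python sketch using only the standard library. The envelope fields (`tool_name`, `args`, `session_id`) and the JSON reply are assumptions for illustration, not the documented wire format, and the `post` parameter is a hypothetical seam so the transport can be swapped out in tests.

```python
import json
import urllib.request
from typing import Callable

Post = Callable[[str, bytes, dict, float], bytes]


def urllib_post(url: str, body: bytes, headers: dict, timeout_s: float) -> bytes:
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    # urllib applies timeout_s to the connect and to each blocking read;
    # it raises URLError/HTTPError on network failures and non-2xx replies.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.read()


def decide(tool_name: str, args: dict, *, endpoint: str, api_key: str,
           session_id: str, timeout_ms: int = 1500,
           post: Post = urllib_post) -> dict:
    """Send a hypothetical tool-call envelope to the gate; return its JSON reply."""
    envelope = {"tool_name": tool_name, "args": args, "session_id": session_id}
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    raw = post(endpoint, json.dumps(envelope).encode(), headers, timeout_ms / 1000)
    return json.loads(raw)
```

Retries and fail-closed handling are deliberately left out: if `post` raises, the caller decides what happens according to `failClosed`.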

Verifying receipts offline

Every decision (allow or deny) returns a signed JWT. To audit without trusting our infrastructure, run one of the three reference verifiers.

The npm package @inherencelabs/verifier is not published yet. Until it is, install a verifier directly from its GitHub source as shown below. All three verifiers (Rust, Python, JavaScript/WASM) pass the same 22-vector conformance suite, so you can pick whichever language fits your stack.

Rust (most common)

# Cargo.toml
[dependencies]
inherence-verifier = { git = "https://github.com/Inherencelabs/inherence-verifier-rs", branch = "main" }

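use inherence_verifier::{verify_receipt, VerifyConfig};

// authority_jwk_json: JSON text of the pinned authority JWK (see below)
// jwt: the receipt string returned with the gate's decision
// The `?` needs an enclosing function that returns a compatible Result.
let cfg = VerifyConfig::new().pin_authority_jwk(&authority_jwk_json)?;
match verify_receipt(&jwt, &cfg) {
    Ok(()) => println!("VALID"),
    Err(e) => println!("INVALID: {}", e.code()),
}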

Python

pip install git+https://github.com/Inherencelabs/inherence-verifier-py

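import json
from inherence_verifier import VerifyConfig, verify_receipt

# authority_jwk: dict parsed from the pinned authority JWK (see below)
# receipt_jwt: the receipt string returned with the gate's decision
cfg = VerifyConfig()
cfg.pin_authority_jwk(json.dumps(authority_jwk))
out = verify_receipt(receipt_jwt, cfg)
assert out["verdict"] == "VALID"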

JavaScript / WASM (coming to npm)

# Until @inherencelabs/verifier ships to npm, build from source:
git clone https://github.com/Inherencelabs/inherence-verifier-js
cd inherence-verifier-js && wasm-pack build --target nodejs --out-dir pkg
# Then in your project:
npm install /path/to/inherence-verifier-js/pkg

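import { verify } from "@inherencelabs/verifier";   // name once published; a local wasm-pack build may be named differently

// receipt: the JWT string returned with the gate's decision
// authorityJwk: the pinned authority JWK as an object (see below)
const out = verify({
  jwt: receipt,
  authority_jwk: JSON.stringify(authorityJwk),
  pinned_vks: {},
});
// out.verdict is "VALID", or out is { verdict: "INVALID", failure_code: "..." }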

The pinned authority JWK lives at https://inherencelabs.com/keys/authority.jwk. See the verification protocol spec for the full procedure.
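Before running a full verifier, it can help to look inside a receipt. A compact JWT is three base64url segments (`header.payload.signature`), so the header and claims decode with the standard library. This Python sketch does NOT check the signature and is no substitute for the reference verifiers; it only confirms the receipt declares `alg: EdDSA` (matching the diagram) and that the pinned key has the Ed25519 OKP JWK shape from RFC 8037. The function names are hypothetical, and the claim names inside real receipts are not documented here.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    # JWTs strip base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def inspect_receipt(jwt: str) -> tuple[dict, dict]:
    """Return (header, claims) WITHOUT verifying the signature."""
    parts = jwt.split(".")
    if len(parts) != 3:
        raise ValueError("a compact JWS has exactly three dot-separated parts")
    header = json.loads(b64url_decode(parts[0]))
    claims = json.loads(b64url_decode(parts[1]))
    # The diagram says receipts are EdDSA-signed; anything else is suspect.
    if header.get("alg") != "EdDSA":
        raise ValueError(f"unexpected alg: {header.get('alg')!r}")
    return header, claims


def check_authority_jwk(jwk: dict) -> None:
    """Sanity-check that a JWK looks like an Ed25519 public key (RFC 8037)."""
    if jwk.get("kty") != "OKP" or jwk.get("crv") != "Ed25519":
        raise ValueError("expected kty=OKP, crv=Ed25519")
    if len(b64url_decode(jwk.get("x", ""))) != 32:
        raise ValueError("Ed25519 public key must be 32 bytes")
    if "d" in jwk:
        raise ValueError("pinned key must not include the private 'd' member")
```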

Failure modes

Situation	Default behavior	Configurable
API key missing	Plugin blocks every call with a clear "config" error	No — always block
API key invalid (401)	Plugin blocks every call	No — always block
Gate unreachable (timeout, network, 5xx)	Plugin blocks every call	`failClosed: false` to allow
Gate returns `deny`	Plugin blocks the call	`onDeny: "approve"` for user prompt, `"log"` for dev
Plugin throws	OpenClaw treats the failed hook as fail-closed (per OpenClaw's own policy)	No

Failing closed when the gate is unreachable or returns 5xx mirrors inherence-proxy's posture: the plugin assumes you would rather fail loudly than let an unsafe action through.
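The failure-mode table can be read as a small decision function. This Python sketch encodes it under stated assumptions: the `Outcome` labels and the name `resolve` are hypothetical, `approve` is modeled simply as returning `"prompt"` (the approval UI is OpenClaw's job), and the warning emitted under `log`, the `permissive` policy mode, and the "plugin throws" row are not modeled.

```python
from enum import Enum


class Outcome(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MISSING_KEY = "missing_key"
    INVALID_KEY = "invalid_key"   # HTTP 401
    UNREACHABLE = "unreachable"   # timeout, network error, or 5xx


def resolve(outcome: Outcome, *, on_deny: str = "block", fail_closed: bool = True) -> str:
    """Map a gate outcome to 'allow', 'block', or 'prompt' per the table above."""
    if outcome in (Outcome.MISSING_KEY, Outcome.INVALID_KEY):
        return "block"  # not configurable
    if outcome is Outcome.UNREACHABLE:
        return "block" if fail_closed else "allow"
    if outcome is Outcome.DENY:
        # "log" means warn + allow; the warning itself is not modeled here.
        return {"block": "block", "approve": "prompt", "log": "allow"}[on_deny]
    return "allow"
```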

Compatibility

Tested without conflicts against three extensions favored by OpenClaw maintainers:

Extension	OpenClaw role	Shield verdict
`browser`	tool plugin	✓ gated through `before_tool_call`; classifier returns `unknown` for the bare tool name, gate inspects URL args
`codex`	agent harness + GPT provider	✓ codex-dispatched calls hit the same `before_tool_call` pipeline as native dispatch
`clickclack`	channel	✓ no hook overlap (channel events ≠ tool calls)

See COMPATIBILITY.md for the detailed matrix + run instructions.

Examples

See examples/ for three runnable agent configs:

examples/autonomous-trader/ — agent with stripe_charge access; shield blocks oversized transfers
examples/code-exec-agent/ — agent with bash access; shield blocks rm -rf and credential reads
examples/web-research/ — agent with fetch access; shield blocks exfiltration to non-allowlisted hosts

Testing your install

# Mock the gate locally and run the conformance suite
npm test

# Hit the live gate (requires INHERENCE_API_KEY)
INHERENCE_API_KEY=ink_... npm run test:e2e
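To poke at a setup without network access, you can point `endpoint` at a throwaway local gate. The Python sketch below uses only the standard library's `http.server`. Everything about it is hypothetical: the accepted key `ink_test`, the request field `tool_name`, the reply fields, and the deny-on-`stripe_charge` rule. It is not the mock that `npm test` uses, and its receipts are unsigned, so they will never pass the verifiers.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

MOCK_API_KEY = "ink_test"  # hypothetical key the mock accepts


class MockGate(BaseHTTPRequestHandler):
    """Toy gate: denies stripe_charge, allows everything else, never signs."""

    def do_POST(self) -> None:
        # Read the body first so the connection closes cleanly on any reply.
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        if self.headers.get("Authorization") != f"Bearer {MOCK_API_KEY}":
            self._reply(401, {"error": "invalid api key"})
            return
        envelope = json.loads(raw or b"{}")
        if envelope.get("tool_name") == "stripe_charge":
            body = {"decision": "deny", "rule": "financial_drain", "receipt": "rec_mock"}
        else:
            body = {"decision": "allow", "receipt": "rec_mock"}
        self._reply(200, body)

    def _reply(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def start_mock_gate(port: int = 0) -> HTTPServer:
    """Serve on 127.0.0.1 in a daemon thread; port 0 lets the OS pick one."""
    server = HTTPServer(("127.0.0.1", port), MockGate)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, `start_mock_gate(8787)` with `endpoint` set to `http://127.0.0.1:8787/api/proxy/decide`; the mock ignores the path.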

License

Apache-2.0.

Why not pure local?

The hosted gate runs the same six rules that ship in inherence-python's rule_catalog.py, but the gate (a) holds the cross-agent rate counters required for unauthorized_exfiltration and financial_drain, (b) signs the receipt with a key the verifier libraries pin, and (c) gets policy updates without the user re-deploying. A pure-local variant is on the roadmap for offline agents; talk to us if you need it.
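The rate counters are why `financial_drain` needs state: a per-hour or per-day limit depends on every earlier charge, not just the current call. Below is a minimal single-process Python sketch of a sliding-window spend counter. The class name, the limits, integer cents, and the choice of an exact sliding window (rather than fixed buckets) are illustrative assumptions, not how the hosted gate stores its counters. An in-memory counter like this only sees one agent's spend, which is the gap a shared, cross-agent counter closes.

```python
from collections import deque


class SpendWindow:
    """Exact sliding-window spend total over the last `window_s` seconds."""

    def __init__(self, window_s: float, limit_cents: int) -> None:
        self.window_s = window_s
        self.limit_cents = limit_cents
        self.events: deque = deque()  # (timestamp, amount_cents), oldest first
        self.total = 0

    def _evict(self, now: float) -> None:
        # Drop charges that are at least window_s old.
        while self.events and self.events[0][0] <= now - self.window_s:
            _, amount = self.events.popleft()
            self.total -= amount

    def try_spend(self, amount_cents: int, now: float) -> bool:
        """Record the charge and return True only if the window stays within the limit."""
        self._evict(now)
        if self.total + amount_cents > self.limit_cents:
            return False
        self.events.append((now, amount_cents))
        self.total += amount_cents
        return True


hourly = SpendWindow(window_s=3600, limit_cents=50_000)  # $500 per hour
print(hourly.try_spend(30_000, now=0))     # True
print(hourly.try_spend(30_000, now=10))    # False: would reach $600
print(hourly.try_spend(30_000, now=3600))  # True: first charge aged out
```

Stacking hourly, daily, and weekly limits needs a check-then-commit across all windows, so that a charge rejected by one window is not recorded in another.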
