@inherencelabs

OpenClaw SafeMode

OpenClaw SafeMode — gates every OpenClaw tool call through Inherence's six-rule catastrophe shield (financial_drain, credential_access, identity_escalation, mass_destruction, lateral_pivot, unauthorized_exfiltration).

Current version
v0.1.0
code-plugin · Community · source-linked

OpenClaw SafeMode for OpenClaw

A one-install policy gate that protects every OpenClaw tool call against the six categories of catastrophe an autonomous agent is most likely to cause:

| Rule | What it blocks |
| --- | --- |
| financial_drain | Per-call, per-hour, per-day, per-week spend over your threshold |
| credential_access | Reads of .aws/, .ssh/, .env, vaults, password stores |
| identity_escalation | New users, granted admin, modified roles or policies |
| mass_destruction | rm -rf, DROP TABLE, force-deletes, mass deletes |
| lateral_pivot | Calls to instance metadata, k8s management ports, internal admin paths |
| unauthorized_exfiltration | Outbound transmissions to non-allowlisted destinations, especially after a sensitive read |
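
To make two of these categories concrete, here is a minimal Python sketch of matching tool-call arguments against rule patterns. The patterns and the `classify` function are illustrative assumptions; the actual rule catalog runs at the hosted gate and is not reproduced here.

```python
import re
from typing import Any, Mapping, Optional

# Hypothetical patterns for illustration only. They cover a small part of
# credential_access and mass_destruction as described in the table above.
CREDENTIAL_PATTERNS = [
    re.compile(r"(^|/)\.aws(/|$)"),
    re.compile(r"(^|/)\.ssh(/|$)"),
    re.compile(r"(^|/)\.env$"),
]
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-(rf|fr)\b"),
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
]


def classify(args: Mapping[str, Any]) -> Optional[str]:
    """Return the first matching rule name, or None when nothing matches."""
    for value in args.values():
        text = str(value)
        if any(p.search(text) for p in CREDENTIAL_PATTERNS):
            return "credential_access"
        if any(p.search(text) for p in DESTRUCTIVE_PATTERNS):
            return "mass_destruction"
    return None
```

Pattern matching like this is easy to evade on its own, so treat it as a reading aid for the table rather than a substitute for the gate.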

Decisions are made by Inherence's hosted gate (free tier). The plugin sends a tool-call envelope and gets back allow or deny plus a signed receipt that you can verify offline with one of the reference verifiers described under "Verifying receipts offline".

Install

# 1. Install the plugin into your OpenClaw workspace
openclaw plugins install clawhub:@inherencelabs/openclaw-safemode

# 2. Get a free API key
open https://inherencelabs.com/dashboard   # or visit in browser

# 3. Set the key (one of these)
export INHERENCE_API_KEY=ink_...
# …or put it in openclaw.config.jsonc:
#    "extensions": {
#      "openclaw-safemode": { "apiKey": "ink_..." }
#    }

Restart OpenClaw. From the next tool call onward, every call is gated.

What you see

> agent: "Charge customer Acme Corp $99,000,000 for renewal"

[openclaw-safemode] deny  tool=stripe_charge  rule=financial_drain
  reason: financial_drain.numeric_threshold
  receipt: rec_8c2a1f4a09bb6731
  review: https://dashboard.inherencelabs.com/decisions/rec_8c2a1f4a09bb6731

Tool call BLOCKED. The agent is informed it cannot proceed.

> agent: "Show me last week's orders"

[openclaw-safemode] allow  tool=query_orders  latency=42ms  receipt: rec_…

Tool call PROCEEDS normally.

Configuration

Set in openclaw.config.jsonc under extensions.openclaw-safemode:

| Field | Default | Purpose |
| --- | --- | --- |
| apiKey | $INHERENCE_API_KEY | Your Inherence API key |
| endpoint | https://mcp.inherencelabs.com/api/proxy/decide | Gate URL; override for staging/self-host |
| policyMode | "smart" | smart = six-rule shield. strict = adds extra heuristics. permissive = log-only, never block |
| onDeny | "block" | block = terminal reject. approve = surface a user approval prompt. log = warn + allow (dev only) |
| failClosed | true | If the gate is unreachable, block the call. Set false to fail open (NOT recommended outside dev) |
| timeoutMs | 1500 | Connect + read timeout for the gate, in milliseconds (matches inherence-proxy default) |
| sessionId | (generated) | UUID4 scoping for exfiltration-rate checks |
| receiptSink | (unset) | Path to append signed receipts as JSONL for audit replay |
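
As a rough illustration of how the defaults in this table combine with user settings, the following Python sketch merges a user-supplied mapping over the documented defaults and rejects unknown enum values. The `resolve_config` function is hypothetical and is not the plugin's own loader.

```python
import os
import uuid
from typing import Any, Mapping

# Defaults copied from the table above.
DEFAULTS = {
    "endpoint": "https://mcp.inherencelabs.com/api/proxy/decide",
    "policyMode": "smart",
    "onDeny": "block",
    "failClosed": True,
    "timeoutMs": 1500,
    "receiptSink": None,
}
ALLOWED = {
    "policyMode": {"smart", "strict", "permissive"},
    "onDeny": {"block", "approve", "log"},
}


def resolve_config(user: Mapping[str, Any],
                   env: Mapping[str, str] = os.environ) -> dict:
    """Merge user settings over the defaults and validate enum fields."""
    cfg = {**DEFAULTS, **user}
    # apiKey falls back to the environment; sessionId is generated if absent.
    cfg.setdefault("apiKey", env.get("INHERENCE_API_KEY"))
    cfg.setdefault("sessionId", str(uuid.uuid4()))
    for field, allowed in ALLOWED.items():
        if cfg[field] not in allowed:
            raise ValueError(
                f"{field} must be one of {sorted(allowed)}, got {cfg[field]!r}"
            )
    return cfg
```

Calling `resolve_config({"onDeny": "approve"})` would keep every other default and pick up the key from `INHERENCE_API_KEY` if it is set.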

How the decision is made

   ┌──────────────┐   tool_name + args     ┌──────────────────────┐
   │  OpenClaw    │ ───────────────────▶   │  openclaw-safemode    │
   │  agent       │                        │  (this plugin)       │
   │              │                        │                      │
   │              │ ◀───── allow/deny ─────│  classify locally    │
   └──────────────┘                        │  POST /decide        │
                                           │  log + return        │
                                           └──────┬───────────────┘
                                                  │
                                            HTTPS │ Bearer auth
                                                  ▼
                                  ┌──────────────────────────────────┐
                                  │   mcp.inherencelabs.com          │
                                  │   /api/proxy/decide              │
                                  │                                  │
                                  │   • Runs the 6 catastrophe rules │
                                  │   • Signs an EdDSA JWT receipt   │
                                  │   • Constant-time response       │
                                  └──────────────────────────────────┘

Round-trip latency budget: ~100 ms at the hosted gate (p50 ~98 ms over US-East). The plugin adds <1 ms of overhead.
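
The request path in the diagram can be sketched in Python as follows. The envelope field names (`tool_name`, `args`, `session_id`) are assumptions based on the diagram and the configuration fields, and the response is returned as parsed JSON without assuming its shape; consult the gate's API documentation for the real schema.

```python
import json
import urllib.request

ENDPOINT = "https://mcp.inherencelabs.com/api/proxy/decide"


def build_request(api_key, tool_name, args, session_id, endpoint=ENDPOINT):
    """Build the POST request carrying the tool-call envelope."""
    # Field names below are illustrative assumptions, not a documented schema.
    body = json.dumps(
        {"tool_name": tool_name, "args": args, "session_id": session_id}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def decide(request, timeout_ms=1500, opener=urllib.request.urlopen):
    """Send the request and return the gate's JSON response as a dict.

    Raises urllib.error.URLError or a timeout error if the gate is
    unreachable; the caller decides whether that means block or allow.
    """
    with opener(request, timeout=timeout_ms / 1000) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The `opener` parameter exists only so the network call can be swapped out in tests.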

Verifying receipts offline

Every decision (allow or deny) returns a signed JWT. To audit without trusting our infrastructure, run one of the three reference verifiers.

The npm package @inherencelabs/verifier is not yet published. Until it ships, install a verifier directly from its GitHub source using the instructions below. All three verifiers (Rust, Python, JavaScript/WASM) pass the same 22-vector conformance suite, so you can pick whichever language fits your stack.

Rust (most common)

# Cargo.toml
[dependencies]
inherence-verifier = { git = "https://github.com/Inherencelabs/inherence-verifier-rs", branch = "main" }

// Rust usage, inside a function that returns a Result.
// `jwt` is the receipt string; `authority_jwk_json` is the pinned authority JWK as JSON text.
use inherence_verifier::{verify_receipt, VerifyConfig};

let cfg = VerifyConfig::new().pin_authority_jwk(&authority_jwk_json)?;
match verify_receipt(&jwt, &cfg) {
    Ok(())  => println!("VALID"),
    Err(e)  => println!("INVALID: {}", e.code()),
}

Python

# Shell: install from GitHub
pip install git+https://github.com/Inherencelabs/inherence-verifier-py

# Python: authority_jwk is the pinned authority JWK (dict); receipt_jwt is the receipt string
import json
from inherence_verifier import VerifyConfig, verify_receipt

cfg = VerifyConfig()
cfg.pin_authority_jwk(json.dumps(authority_jwk))
out = verify_receipt(receipt_jwt, cfg)
assert out["verdict"] == "VALID"

JavaScript / WASM (coming to npm)

# Until @inherencelabs/verifier ships to npm, build from source:
git clone https://github.com/Inherencelabs/inherence-verifier-js
cd inherence-verifier-js && wasm-pack build --target nodejs --out-dir pkg
# Then in your project:
npm install /path/to/inherence-verifier-js/pkg

// JavaScript usage
import { verify } from "@inherencelabs/verifier";   // post-publish

const out = verify({
  jwt: receipt,
  authority_jwk: JSON.stringify(authorityJwk),
  pinned_vks: {},
});
// out.verdict === "VALID" or { verdict: "INVALID", failure_code: "..." }

The pinned authority JWK lives at https://inherencelabs.com/keys/authority.jwk. See the verification protocol spec for the full procedure.
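
Before running a verifier, it can help to look at what a receipt contains. The Python sketch below decodes the header and payload of a compact JWT using only the standard library. It does not check the signature, so it proves nothing about authenticity; `peek_receipt` is a hypothetical helper name, and no claim names are assumed beyond the standard `alg` header.

```python
import base64
import json
from typing import Tuple


def _b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def peek_receipt(jwt: str) -> Tuple[dict, dict]:
    """Return (header, payload) of a compact JWT WITHOUT verifying it."""
    header_b64, payload_b64, _signature_b64 = jwt.split(".")
    header = json.loads(_b64url_decode(header_b64))
    payload = json.loads(_b64url_decode(payload_b64))
    return header, payload
```

For an EdDSA-signed receipt the header's `alg` field should read `EdDSA`. Use one of the reference verifiers for any decision that depends on the receipt being genuine.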

Failure modes

| Situation | Default behavior | Configurable |
| --- | --- | --- |
| API key missing | Plugin blocks every call with a clear "config" error | No, always block |
| API key invalid (401) | Plugin blocks every call | No, always block |
| Gate unreachable (timeout, network, 5xx) | Plugin blocks every call | failClosed: false to allow |
| Gate returns deny | Plugin blocks the call | onDeny: "approve" for user prompt, "log" for dev |
| Plugin throws | OpenClaw treats hook as fail-closed (per its own policy) | No |

The fail-closed semantics for unreachable + 5xx mirror inherence-proxy's posture — the plugin assumes you'd rather fail loud than ship an unsafe action.
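
The table above reduces to a small decision function. This Python sketch encodes it directly; the `Outcome` enum and `resolve_action` are hypothetical names for illustration and do not come from the plugin's source.

```python
from enum import Enum


class Outcome(Enum):
    KEY_MISSING = "key_missing"
    KEY_INVALID = "key_invalid"   # gate answered 401
    UNREACHABLE = "unreachable"   # timeout, network error, or 5xx
    DENY = "deny"
    ALLOW = "allow"


def resolve_action(outcome: Outcome, fail_closed: bool = True,
                   on_deny: str = "block") -> str:
    """Map a gate outcome to "allow", "block", or "prompt"."""
    if outcome in (Outcome.KEY_MISSING, Outcome.KEY_INVALID):
        return "block"  # not configurable
    if outcome is Outcome.UNREACHABLE:
        return "block" if fail_closed else "allow"
    if outcome is Outcome.DENY:
        # onDeny: block = reject, approve = ask the user, log = warn and allow
        return {"block": "block", "approve": "prompt", "log": "allow"}[on_deny]
    return "allow"
```

Note that the only paths to "allow" without a gate approval are the two explicit opt-outs, `failClosed: false` and `onDeny: "log"`.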

Compatibility

Tested clean against the three OpenClaw maintainer-favorite extensions:

| Extension | OpenClaw role | Shield verdict |
| --- | --- | --- |
| browser | tool plugin | ✓ gated through before_tool_call; classifier returns unknown for the bare tool name, gate inspects URL args |
| codex | agent harness + GPT provider | ✓ codex-dispatched calls hit the same before_tool_call pipeline as native dispatch |
| clickclack | channel | ✓ no hook overlap (channel events ≠ tool calls) |

See COMPATIBILITY.md for the detailed matrix + run instructions.

Examples

See examples/ for three runnable agent configs:

  • examples/autonomous-trader/ — agent with stripe_charge access; shield blocks oversized transfers
  • examples/code-exec-agent/ — agent with bash access; shield blocks rm -rf and credential reads
  • examples/web-research/ — agent with fetch access; shield blocks exfiltration to non-allowlisted hosts

Testing your install

# Mock the gate locally and run the conformance suite
npm test

# Hit the live gate (requires INHERENCE_API_KEY)
INHERENCE_API_KEY=ink_... npm run test:e2e

License

Apache-2.0.

Why not pure local?

The hosted gate runs the same six rules that ship in inherence-python's rule_catalog.py, but the gate (a) holds the cross-agent rate counters required for unauthorized_exfiltration and financial_drain, (b) signs the receipt with a key the verifier libraries pin, and (c) gets policy updates without the user re-deploying. A pure-local variant is on the roadmap for offline agents; talk to us if you need it.
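
To illustrate point (a), here is a minimal Python sketch of a sliding-window spend counter of the kind a per-hour financial_drain threshold needs. The `SpendWindow` class is an assumption made for illustration, not the gate's implementation. A counter like this held inside one plugin process would only see that process's calls, which is why shared counters are kept at the gate.

```python
from collections import deque


class SpendWindow:
    """Track spend inside a rolling time window (hypothetical helper)."""

    def __init__(self, window_seconds: float, limit: float):
        self.window_seconds = window_seconds
        self.limit = limit
        self._events = deque()  # (timestamp, amount), oldest first
        self._total = 0.0

    def _evict(self, now: float) -> None:
        # Drop events that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_amount = self._events.popleft()
            self._total -= old_amount

    def would_exceed(self, amount: float, now: float) -> bool:
        """True if adding `amount` at time `now` would pass the limit."""
        self._evict(now)
        return self._total + amount > self.limit

    def record(self, amount: float, now: float) -> None:
        """Record an approved spend at time `now`."""
        self._evict(now)
        self._events.append((now, amount))
        self._total += amount
```

A per-day or per-week limit would be a second and third instance with longer windows, checked alongside the hourly one.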

Source and release

Source repository

Inherencelabs/openclaw-safemode

Source commit

aa044b7304f0e39080a42f42ec5c90b6f40a423e

Install command

openclaw plugins install clawhub:@inherencelabs/openclaw-safemode

Metadata

  • Package: @inherencelabs/openclaw-safemode
  • Created: 2026/05/21
  • Updated: 2026/05/21
  • Executes code: Yes
  • Source tag: main

Compatibility

  • Built with OpenClaw: 2026.5.19
  • Plugin API range: >=2026.5.19
  • Tags: latest
  • Files: 41