openclaw-language-boundary

OCAL-style language-boundary safety plugin for OpenClaw.

Status

Stable MVP. The default hardening phase is observe. The plugin recommends moving to guided, then strict, then force, each only after a clean audit window.

What it does

Builds a conservative Action IR before each tool call
Classifies effect / target / risk without relying on the model
Applies default policy rules
Tracks tool failure state
Writes redacted audit JSONL
Adds observe-mode checks for outbound messages, installs, subagent spawning, and cron lifecycle
Supports progressive hardening phases: observe → guided → strict → force
Can block or request approval when the active hardening phase maps to enforce mode
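The first few items above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the ActionIR field names, the KNOWN_TOOLS table, the SENSITIVE_KEY pattern, and the helper names are hypothetical and not taken from the plugin's source. It only shows the general idea of classifying from a static table (not from model output) and redacting before writing a JSONL line.

```typescript
// Hypothetical Action IR shape; field names are illustrative only.
type Effect = "read" | "write" | "exec" | "unknown";
type Risk = "low" | "high";

interface ActionIR {
  tool: string;
  effect: Effect;
  target: string;
  risk: Risk;
}

// Assumed static table: classification depends only on the tool name,
// never on anything the model says about the call.
const KNOWN_TOOLS = new Map<string, Omit<ActionIR, "tool">>([
  ["update_plan", { effect: "read", target: "session", risk: "low" }],
  ["exec", { effect: "exec", target: "host", risk: "high" }],
]);

function buildActionIR(tool: string): ActionIR {
  const known = KNOWN_TOOLS.get(tool);
  // Conservative fallback: unrecognized tools are unknown and high risk.
  return known !== undefined
    ? { tool, ...known }
    : { tool, effect: "unknown", target: "unknown", risk: "high" };
}

// Assumed key pattern for "sensitive-looking" values.
const SENSITIVE_KEY = /token|secret|password|api[_-]?key|credential/i;

function redact(args: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(args)) {
    out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : args[key];
  }
  return out;
}

// One JSON object per line, as in a JSONL audit log.
function auditLine(ir: ActionIR, args: Record<string, unknown>): string {
  return JSON.stringify({ ...ir, args: redact(args) });
}
```

A Map is used for the lookup so that tool names such as "constructor" cannot accidentally match inherited object properties.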

First-use config

Run the project checks:

npm run typecheck
npm test
npm run build
npm run release:check

Generate a starter config for the phase you want:

npm run init-config -- --mode=observe
npm run init-config -- --mode=guided
npm run init-config -- --mode=strict
npm run init-config -- --mode=force

npm run release:check runs the read-only internal gate: typecheck, tests, build, required docs, example config JSON validation, generated init-config validation, local/private value leak scan, and dist entry verification.

partial-enforce is still accepted as an alias for guided.
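As a rough illustration of the alias, a mode value could be normalized as below. The normalizePhase helper is hypothetical and is not the plugin's actual parser; only the four phase names and the partial-enforce alias come from this README.

```typescript
const PHASES = ["observe", "guided", "strict", "force"] as const;
type HardeningPhase = (typeof PHASES)[number];

// Hypothetical helper: maps the legacy "partial-enforce" name to "guided"
// and rejects anything that is not a known phase.
function normalizePhase(mode: string): HardeningPhase {
  const value = mode === "partial-enforce" ? "guided" : mode;
  const match = PHASES.find((phase) => phase === value);
  if (match === undefined) {
    throw new Error(`Unknown mode: ${mode}`);
  }
  return match;
}
```

Rejecting unknown values, instead of silently falling back to a default, keeps a typo in the mode from quietly changing the safety posture.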

The generated config uses placeholders only, e.g. /path/to/workspace; replace them with your own workspace and production data roots.

Example configs are available in examples/:

minimal-config.json
observe-mode.json
partial-enforce.json (guided phase)
force-mode.json

Progressive hardening

The plugin tracks three separate settings, so that policy decisions are not conflated with runtime health:

policy_mode: observe or enforce — whether blocking decisions are active.
hardening_phase: observe, guided, strict, or force — product-facing safety maturity phase.
runtime_state: normal, degraded, failure_loop, etc. — current substrate/tool health.
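One way to picture the three settings is as independent fields of a single state object. This is a sketch under assumptions: the interface and function names are hypothetical, the RuntimeState union lists only the values named above (the source says there are more), and the rule that every phase other than observe maps to enforce is inferred from this README, not confirmed against the code.

```typescript
type PolicyMode = "observe" | "enforce";
type HardeningPhase = "observe" | "guided" | "strict" | "force";
// Only the states named in the README; the real list is longer ("etc.").
type RuntimeState = "normal" | "degraded" | "failure_loop";

interface BoundaryState {
  policy_mode: PolicyMode;
  hardening_phase: HardeningPhase;
  runtime_state: RuntimeState;
}

// Assumption: only the observe phase is non-blocking; guided, strict,
// and force all map to enforce mode.
function policyModeFor(phase: HardeningPhase): PolicyMode {
  return phase === "observe" ? "observe" : "enforce";
}

// Runtime health is tracked independently and does not change the policy mode.
function makeState(phase: HardeningPhase, runtime: RuntimeState): BoundaryState {
  return {
    policy_mode: policyModeFor(phase),
    hardening_phase: phase,
    runtime_state: runtime,
  };
}
```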

Recommended rollout:

observe for 7-30 days: record Action IR and audit logs without broad blocking.
guided after a clean audit window: enforce the vetted high-risk rule set.
strict after guided stays clean: increase approval coverage for sensitive environments.
force only by explicit user/admin confirmation: production/compliance mode.

By default autoAdvance is "off". If enabled as "guided-only", the plugin may advance only from clean observe to guided; it never auto-enters strict or force.
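The auto-advance rule is small enough to sketch directly. The nextPhase function and its auditWindowClean parameter are hypothetical names; how the plugin actually decides that an audit window is clean is not described here and is left to the caller.

```typescript
type HardeningPhase = "observe" | "guided" | "strict" | "force";
type AutoAdvance = "off" | "guided-only";

// Returns the phase to use next. The only automatic transition is
// observe -> guided, and only when auto-advance is enabled and the
// audit window was clean. strict and force are never entered here.
function nextPhase(
  current: HardeningPhase,
  autoAdvance: AutoAdvance,
  auditWindowClean: boolean,
): HardeningPhase {
  if (autoAdvance === "guided-only" && current === "observe" && auditWindowClean) {
    return "guided";
  }
  return current;
}
```

For example, nextPhase("guided", "guided-only", true) stays at "guided", because moving to strict requires an explicit choice.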

Safety defaults

Unknown tools require approval
Exec requires approval
External write requires approval
Config/credential targets require strong approval
Hook errors fail closed for high-risk actions
Audit logs redact sensitive-looking values
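The defaults above could be expressed as a small decision function. This is only a sketch: the ActionSummary shape, the Decision names, and the rule ordering are assumptions, and the behavior for hook errors on low-risk actions is not stated in this README (the sketch allows them, which is a guess).

```typescript
type Decision = "allow" | "require_approval" | "require_strong_approval" | "block";

// Hypothetical summary of a classified action.
interface ActionSummary {
  knownTool: boolean;
  effect: "read" | "write" | "exec";
  target: "workspace" | "external" | "config" | "credential";
}

function defaultDecision(action: ActionSummary): Decision {
  // Assumed ordering: the strongest requirement is checked first.
  if (action.target === "config" || action.target === "credential") {
    return "require_strong_approval";
  }
  if (!action.knownTool) return "require_approval";
  if (action.effect === "exec") return "require_approval";
  if (action.effect === "write" && action.target === "external") {
    return "require_approval";
  }
  return "allow";
}

// Fail closed: if the policy hook itself throws, a high-risk action is blocked.
function decideSafely(
  action: ActionSummary,
  highRisk: boolean,
  decide: (a: ActionSummary) => Decision = defaultDecision,
): Decision {
  try {
    return decide(action);
  } catch {
    // Assumption: low-risk actions are allowed through on hook errors.
    return highRisk ? "block" : "allow";
  }
}
```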

Calibration notes

After initial observe-mode calibration, these OpenClaw core tools are classified as low-risk/read-like when used safely:

process with list / poll / log
update_plan
sessions_yield
subagents with list

Riskier variants stay high risk, e.g. process.kill, process.write, subagents.steer, and exec.
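The calibration result can be read as an allowlist of (tool, action) pairs, with everything else staying high risk. The sketch below encodes just the entries listed above; the classifyCoreTool function and the table names are hypothetical, and the plugin's real classifier may take more inputs into account.

```typescript
type Risk = "low" | "high";

// Tools that are low-risk regardless of sub-action.
const LOW_RISK_TOOLS: ReadonlySet<string> = new Set(["update_plan", "sessions_yield"]);

// Tools that are low-risk only for specific read-like sub-actions.
const LOW_RISK_ACTIONS = new Map<string, ReadonlySet<string>>([
  ["process", new Set(["list", "poll", "log"])],
  ["subagents", new Set(["list"])],
]);

function classifyCoreTool(tool: string, action?: string): Risk {
  if (LOW_RISK_TOOLS.has(tool)) return "low";
  const allowed = LOW_RISK_ACTIONS.get(tool);
  if (allowed !== undefined && action !== undefined && allowed.has(action)) {
    return "low";
  }
  // Anything not explicitly allowlisted stays high risk,
  // including a missing or unrecognized sub-action.
  return "high";
}
```

With this shape, classifyCoreTool("process", "kill") and classifyCoreTool("exec") both fall through to "high" without needing their own entries.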

Next steps

Keep mode: "observe" as the project default.
Move to guided (formerly partial-enforce) only after a clean audit window.
Keep calibration focused on false positives in runtime config, shell diagnostics, workspace artifact cleanup, and external writes.

Mainline closeout

The runtime reliability mainline is stable and usable.
The language-boundary mainline is now in calibration / maintenance mode, not active incident response.
Continue only if future audit logs show new false positives or a new boundary gap.

Reusability requirements

This plugin is intended to be released for other OpenClaw users, not just this machine.

No hard-coded user paths such as /Users/<name> or machine-specific workspace paths.
No dependency on a specific agent/session/machine name.
No dependency on a specific channel, provider account, local credential, cron job, or local extension layout.
Host-specific paths or providers may appear only as examples or local validation notes, never runtime defaults.
Path classification must stay config-driven with safe defaults.
Audit/state paths must be configurable or derived from OpenClaw/home directory.
Default behavior must remain conservative: observe first, guided (partial-enforce) only after calibration.
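The path requirement can be sketched as "an explicit config value wins, otherwise derive from a home directory supplied by the caller". All names here are hypothetical, including the auditPath option and the default directory layout, which is made up for illustration and is not the plugin's real location. The sketch also assumes POSIX-style separators; real code would more likely use the platform's path utilities.

```typescript
// Hypothetical config shape; the real option name may differ.
interface PathConfig {
  auditPath?: string;
}

// homeDir is passed in by the caller, so no user-specific or
// machine-specific path is baked into the function.
function resolveAuditPath(config: PathConfig, homeDir: string): string {
  if (config.auditPath !== undefined && config.auditPath !== "") {
    return config.auditPath;
  }
  // Strip trailing slashes so the join does not produce "//".
  const base = homeDir.replace(/\/+$/, "");
  // Made-up default layout, for illustration only.
  return `${base}/.openclaw/language-boundary/audit.jsonl`;
}
```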

Language Boundary