@regulus-joseph

Policy Layer (CBS) + Security Layers

OpenClaw policy enforcement plugin: Layer 1 (normalize + 23 dangerous patterns), Layer 2 (D' CBS sensorium), Layer 3 (LLM smart-review + fast-lane + JSONL log), Layer 4 (39 secret patterns redaction)

Current version
v0.4.0
code-plugin · Community · source-linked

Policy Layer — v0.4.0

OpenClaw Gateway Plugin: 4-layer security enforcement framework + D' Cognitive Behavior Scoring (CBS).


What Is Policy Layer?

Policy Layer is a security and behavioral governance plugin that runs at the OpenClaw Gateway layer. It performs checks at three hook points around every Agent tool call: before the tool executes, after it returns, and before the next prompt is built:

User Input → LLM decides to execute a tool → before_tool_call (security check)
                                                      ↓
Tool Executes → after_tool_call (secret leak detection)
                                                      ↓
Next LLM Decision → before_prompt_build (inject cognitive state score)

Core problems it solves:

| Goal | How |
| --- | --- |
| Security | Block dangerous commands (e.g. rm -rf /, curl\|sh) before execution |
| Self-awareness | Let the Agent "know its own state": slow down when D' score is low |
| Learning | Record all security decisions; user can flag wrong decisions (report-bad-result) |
| Transparency | Every decision is written to JSONL for audit and visualization |

Feature Overview

| Feature | Description |
| --- | --- |
| 🛡️ Dangerous Command Blocking | Layer 1 pattern matching: 16 CRITICAL patterns blocked immediately, no LLM review |
| 🤖 LLM Smart Review | HIGH/MEDIUM commands go through Ollama local model for second review (approve/deny/escalate) |
| 🚀 Fast Lane | Same harmless command approved 5 times consecutively → skip LLM review, fast-track |
| 📊 Cognitive State Scoring | D' CBS algorithm: 4 dimensions (success rate / tool fail / context hit / severity) scored in real time |
| 🔒 Secret Leak Detection | after_tool_call scans tool output for 39 secret patterns; leaks trigger warning |
| 📝 Decision Audit Log | All decisions appended to ~/.openclaw/logs/approval.jsonl (JSONL, append-only) |
| 🗳️ User Feedback Loop | report-bad-result: user flags wrong decisions → score drops + added to blacklist |
| 📈 Analytics Dashboard | Generate HTML dashboard from approval.jsonl, with pattern filtering and timeline analysis |

Quick Start

# Verify plugin is loaded (restart to reload)
openclaw gateway restart

# Run tests
cd ~/projects/policy-layer
npm test                  # 103 tests

# Regenerate analytics dashboard
python3 docs/generate-analytics.py
open docs/approval-analytics.html

Architecture in Depth

End-to-End Flow

Tool Call Input
     │
     ├─ Layer 1: normalizeCommand()
     │      ├─ stripAnsi()         // Remove ANSI escape codes
     │      ├─ stripNullBytes()    // Remove \x00 (common evasion technique)
     │      └─ nfkcNormalize()     // NFKC normalization (fold Unicode compatibility forms)
     │      ↓
     ├─ Layer 1: detectDangerousPatterns()
     │      ├─ CRITICAL (16): Block immediately, no LLM review
     │      └─ HIGH/MEDIUM (7): Proceed to Smart Review
     │      ↓
     ├─ Layer 2: D' CBS (injected via before_prompt_build)
     │      └─ Injects <openclaw_state> XML into LLM context
     │         Agent reads it and adjusts behavior according to D' score
     │      ↓
     ├─ Layer 3: Smart Review (HIGH/MEDIUM only)
     │      ├─ Ollama local inference (approve / deny / escalate)
     │      ├─ Fast Lane: 5 consecutive approves → skip LLM review
     │      └─ Approval Log: all decisions appended to approval.jsonl
     │      ↓
     └─ Layer 4: Secret Leak Detection (after_tool_call)
            └─ Scan tool output for 39 secret patterns; leak → warn + redact

Layer-by-Layer Details

Layer 1 — Command Normalization & Danger Detection

1.1 Command Normalization (normalize.ts)

Before pattern matching, commands go through three preprocessing steps:

normalizeCommand(cmd: string): string
  ├─ stripAnsi(str)         // Remove ANSI escapes (e.g. \x1b[31m)
  ├─ stripNullBytes(str)     // Remove \x00 bytes
  └─ nfkcNormalize(str)      // NFKC normalize (fold Unicode compatibility forms)

Why NFKC normalization?
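Some Unicode characters are compatibility variants of ASCII characters (e.g. fullwidth ｒｍ, or the ligature ﬁ). Attackers can use them to craft commands that bypass ASCII-based pattern detection. NFKC normalization folds these variants into their standard form. Note that NFKC does not map cross-script lookalikes (e.g. Greek ο vs Latin o) to ASCII; catching those requires a separate confusables check.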
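The three preprocessing steps can be sketched as follows. This is a minimal illustration, not the contents of normalize.ts: the regular expression for ANSI sequences and the final trim() (suggested by the "trim" entry in the test coverage table) are assumptions.

```typescript
// Illustrative sketch only; the plugin's normalize.ts may differ.
// Matches CSI escape sequences such as \x1b[31m or \x1b[0m.
const ANSI_CSI = /\x1b\[[0-9;?]*[ -\/]*[@-~]/g;

function stripAnsi(s: string): string {
  return s.replace(ANSI_CSI, "");
}

function stripNullBytes(s: string): string {
  return s.replace(/\0/g, "");
}

function nfkcNormalize(s: string): string {
  // Built-in Unicode normalization; folds compatibility forms (e.g. fullwidth letters).
  return s.normalize("NFKC");
}

function normalizeCommand(cmd: string): string {
  // Assumed order: ANSI, then null bytes, then NFKC, then trim.
  return nfkcNormalize(stripNullBytes(stripAnsi(cmd))).trim();
}
```

Running the pattern matcher on the normalized string rather than the raw input means a colored or null-padded `rm -rf /` is compared in the same form as the plain one.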

1.2 Danger Pattern Detection (patterns.ts)

Detects 23 dangerous patterns, split into two response tiers:

CRITICAL — Immediate Block (no LLM review)

| Pattern | Example Match | Note |
| --- | --- | --- |
| rm_recursive_root | rm -rf /, rm -rf /* | Recursive delete from root |
| pipe_to_shell | curl ... \| sh, wget ... \| bash | Remote code execution |
| kill_all | kill -9 -1, killall gateway | Kill all processes |
| fork_bomb | :(){ :\|:& };: | Fork bomb |
| chmod_777_root | chmod 777 / | Permission downgrading |
| gateway_stop | pkill gateway, openclaw gateway stop | Shutdown self |
| script_execution | chmod +x *.sh \| bash\|sh\|python | Run unauthorized scripts |
| /dev/tcp | cat /dev/tcp/... | Firewall bypass via /dev/tcp |

HIGH / MEDIUM — LLM Smart Review

| Pattern | Example Match |
| --- | --- |
| git_reset_hard | git reset --hard |
| dev_tcp | /dev/tcp/host/port |
| sql_drop | DROP TABLE, DROP DATABASE |
| kill_term_negative | kill -TERM -1 |
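To make the two-tier routing concrete, here is a small sketch covering four of the labels listed above. The regular expressions, the `DangerPattern` shape, and the `route()` helper are hypothetical illustrations; the plugin's patterns.ts is not shown in this document and is certainly more thorough.

```typescript
type Severity = "critical" | "high";

interface DangerPattern {
  label: string;
  severity: Severity;
  regex: RegExp; // illustrative expressions, not the plugin's actual ones
}

const PATTERNS: DangerPattern[] = [
  { label: "rm_recursive_root", severity: "critical", regex: /\brm\s+-(?:rf|fr)\s+\/\*?(?:\s|$)/ },
  { label: "pipe_to_shell", severity: "critical", regex: /\b(?:curl|wget)\b[^|]*\|\s*(?:sh|bash)\b/ },
  { label: "git_reset_hard", severity: "high", regex: /\bgit\s+reset\s+--hard\b/ },
  { label: "sql_drop", severity: "high", regex: /\bDROP\s+(?:TABLE|DATABASE)\b/i },
];

// Returns every pattern whose expression matches the (already normalized) command.
function detectDangerousPatterns(cmd: string): DangerPattern[] {
  return PATTERNS.filter((p) => p.regex.test(cmd));
}

type Route = "block" | "review" | "pass";

// CRITICAL hits block immediately; any other hit goes to Smart Review.
function route(cmd: string): Route {
  const hits = detectDangerousPatterns(cmd);
  if (hits.some((p) => p.severity === "critical")) return "block";
  return hits.length > 0 ? "review" : "pass";
}
```

None of the expressions use the `g` flag, so `test()` stays stateless across calls.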

1.3 Path Traversal Detection (path.ts)

Detects path traversal attacks: ../ sequences that escape the home directory, and access to sensitive paths such as /proc and /sys.
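A minimal sketch of such a check, assuming POSIX-style paths. The `validatePath(path, home)` signature is a guess (the document names `validatePath()` but does not show it), and the sensitive-root list contains only the two paths mentioned above. Symlinks are not resolved here.

```typescript
// Resolve "." and ".." segments lexically, without touching the filesystem.
function resolvePosix(path: string, base: string): string {
  const full = path.charAt(0) === "/" ? path : base + "/" + path;
  const out: string[] = [];
  for (const seg of full.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

const SENSITIVE_ROOTS = ["/proc", "/sys"]; // from the text; the real list may be longer

function isUnder(path: string, root: string): boolean {
  return path === root || path.indexOf(root + "/") === 0;
}

// Hypothetical signature: returns true when the path is considered safe.
function validatePath(path: string, home: string): boolean {
  const resolved = resolvePosix(path, home);
  if (SENSITIVE_ROOTS.some((r) => isUnder(resolved, r))) return false;
  return isUnder(resolved, resolvePosix(home, "/"));
}
```

Comparing against `root + "/"` rather than a bare prefix keeps `/home/user2` from being treated as inside `/home/u`.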


Layer 2 — D' Cognitive Behavior Scoring System (CBS)

2.1 What Is D'?

D' (d-prime) is the core metric from Signal Detection Theory. Policy Layer adapts it for AI Agent behavioral evaluation — treating the Agent's historical behavior as the "signal" and comparing it against a baseline, yielding a quantifiable risk/anomaly score.

2.2 Four Signal Dimensions

After each tool call, the system records 4 signals:

| Signal | Weight | Meaning | Max-normalized |
| --- | --- | --- | --- |
| success_rate | 0.30 | Tool call success rate | success_rate / 1.0 |
| tool_fail | 0.25 | Tool failure rate (lower is better) | 1 - failure_rate / 1.0 |
| cbr_hit | 0.20 | Context Buffer Recall hit rate | cbr_hit / 1.0 |
| severity_inv | 0.25 | Severity (lower is better) | 1 - severity / 1000 |
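As a worked illustration of the table, the sketch below normalizes one set of raw signals and computes the weighted sum Σ(w_i × m_i). It covers only that per-cycle sum and does not attempt the window normalization used in the D' score; the `RawSignals` field names are assumptions.

```typescript
interface RawSignals {
  successRate: number; // 0..1
  failureRate: number; // 0..1
  cbrHit: number;      // 0..1
  severity: number;    // 0..1000
}

// Weights copied from the table; they sum to 1.00.
const WEIGHTS = { success_rate: 0.30, tool_fail: 0.25, cbr_hit: 0.20, severity_inv: 0.25 };

// Apply the "Max-normalized" column so every signal is in 0..1 with higher = better.
function normalizeSignals(raw: RawSignals) {
  return {
    success_rate: raw.successRate / 1.0,
    tool_fail: 1 - raw.failureRate / 1.0,
    cbr_hit: raw.cbrHit / 1.0,
    severity_inv: 1 - raw.severity / 1000,
  };
}

function weightedSum(raw: RawSignals): number {
  const m = normalizeSignals(raw);
  return (
    WEIGHTS.success_rate * m.success_rate +
    WEIGHTS.tool_fail * m.tool_fail +
    WEIGHTS.cbr_hit * m.cbr_hit +
    WEIGHTS.severity_inv * m.severity_inv
  );
}
```

For example, a success rate of 0.95, failure rate of 0.10, hit rate of 0.80, and severity of 80 give 0.285 + 0.225 + 0.160 + 0.230 = 0.90.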

2.3 D' Score Calculation

D' = Σ(w_i × m_i) / (0.30 × 1.0 × n)

Where m_i is the max-normalized value of signal i, w_i is its weight from the table in 2.2, and n is the number of cycles in the window.

2.4 Score Injection

In the before_prompt_build hook, inject <openclaw_state> XML into the LLM context:

<openclaw_state>
  <d_prime>0.72</d_prime>
  <d_gate_threshold>0.35</d_gate_threshold>
  <d_gate_status>HIGH_REJECT</d_gate_status>
  <cycles_tracked>12</cycles_tracked>
  <session_success_rate>0.95</session_success_rate>
  <last_policy_result>PASS</last_policy_result>
</openclaw_state>
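A sketch of how such a block could be rendered from an in-memory state object. The `OpenclawState` interface and `renderState()` are hypothetical names; only the element names come from the example above, and two-decimal formatting is an assumption based on the values shown.

```typescript
interface OpenclawState {
  dPrime: number;
  dGateThreshold: number;
  dGateStatus: string;
  cyclesTracked: number;
  sessionSuccessRate: number;
  lastPolicyResult: string;
}

// Escape the characters that would break the XML structure.
function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function renderState(s: OpenclawState): string {
  return [
    "<openclaw_state>",
    `  <d_prime>${s.dPrime.toFixed(2)}</d_prime>`,
    `  <d_gate_threshold>${s.dGateThreshold.toFixed(2)}</d_gate_threshold>`,
    `  <d_gate_status>${escapeXml(s.dGateStatus)}</d_gate_status>`,
    `  <cycles_tracked>${s.cyclesTracked}</cycles_tracked>`,
    `  <session_success_rate>${s.sessionSuccessRate.toFixed(2)}</session_success_rate>`,
    `  <last_policy_result>${escapeXml(s.lastPolicyResult)}</last_policy_result>`,
    "</openclaw_state>",
  ].join("\n");
}
```

Escaping the string fields keeps a stray `<` in a status or result value from corrupting the injected block.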

2.5 Score Thresholds & Behavioral Guidance

| D' Range | Status | Agent Behavior |
| --- | --- | --- |
| > 0.65 | NORMAL | Normal execution, no special guidance |
| 0.35–0.65 | LOW_ACCEPT | Moderate caution, reduce aggressive operations |
| 0.20–0.35 | HIGH_REJECT | Significant slowdown, perform only essential operations |
| < 0.20 | CRITICAL | Pause all non-essential operations, await user confirmation |
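Read as a lookup, the threshold table maps a score to a status as sketched below. How the boundary values 0.20, 0.35, and 0.65 are assigned is an assumption, since the table does not say which side is inclusive; `classify()` is a hypothetical name.

```typescript
type DStatus = "NORMAL" | "LOW_ACCEPT" | "HIGH_REJECT" | "CRITICAL";

// Check the most severe band first so the narrower range wins.
function classify(dPrime: number): DStatus {
  if (dPrime < 0.20) return "CRITICAL";
  if (dPrime < 0.35) return "HIGH_REJECT";
  if (dPrime <= 0.65) return "LOW_ACCEPT";
  return "NORMAL";
}
```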

Layer 3 — Smart Review System

3.1 Smart Review (smart-review.ts)

For HIGH/MEDIUM commands, run a second review via Ollama local LLM:

smartReview(cmd: string, patterns: string[]): ReviewResult
// ReviewResult: "approve" | "deny" | "escalate"

Review flow:

  1. Organize command + matched patterns + context into a prompt
  2. Request Ollama (llama3.3 by default — local inference, no network required)
  3. LLM returns approve / deny / escalate
  4. Result written to Approval Log

Safety: If Ollama is unreachable, defaults to escalate (safe default — requires human approval).
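The fail-safe behavior can be sketched as below. The model call is injected as a function so the sketch stays independent of any particular Ollama client; `ModelCall`, `parseVerdict()`, the prompt wording, and the third parameter of `smartReview()` are all assumptions, not the plugin's actual API.

```typescript
type ReviewResult = "approve" | "deny" | "escalate";
type ModelCall = (prompt: string) => Promise<string>; // hypothetical: wraps the Ollama request

// Accept only an exact one-word answer; anything else is treated as "escalate".
function parseVerdict(raw: string): ReviewResult {
  const word = raw.trim().toLowerCase().replace(/[.!]+$/, "");
  if (word === "approve") return "approve";
  if (word === "deny") return "deny";
  return "escalate";
}

async function smartReview(cmd: string, patterns: string[], callModel: ModelCall): Promise<ReviewResult> {
  const prompt =
    `Command: ${cmd}\n` +
    `Matched patterns: ${patterns.join(", ")}\n` +
    "Answer with one word: approve, deny, or escalate.";
  try {
    return parseVerdict(await callModel(prompt));
  } catch {
    // Model unreachable or request failed: fall back to human approval.
    return "escalate";
  }
}
```

Treating both transport errors and unparseable answers as escalate means the review can only approve a command when the model answers unambiguously.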

3.2 Fast Lane (fast-lane.ts)

Motivation: For definitely harmless commands (e.g. git status, ls), running LLM review every time is wasteful and adds latency.

Mechanism: The same command pattern approved by LLM 5 times in a row → enter Fast Lane, subsequent same-pattern commands bypass LLM review entirely.

// Fast lane counter (grouped by pattern)
fast_lane_counter: Map<pattern_label, consecutive_approvals>
// Trigger: consecutive_approvals >= 5
// Reset: any deny / escalate / new command pattern
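The counter logic above might look like this in TypeScript. The `FastLane` class and its method names are hypothetical; the sketch resets a pattern's counter on deny or escalate and leaves the "new command pattern" reset rule out, since the document does not define it precisely.

```typescript
type Verdict = "approve" | "deny" | "escalate";

class FastLane {
  private counters = new Map<string, number>();
  private threshold: number;

  constructor(threshold: number = 5) {
    this.threshold = threshold;
  }

  // Record one LLM verdict for a pattern label.
  record(pattern: string, verdict: Verdict): void {
    if (verdict === "approve") {
      this.counters.set(pattern, (this.counters.get(pattern) ?? 0) + 1);
    } else {
      this.counters.set(pattern, 0); // deny or escalate breaks the streak
    }
  }

  // True once the pattern has reached the consecutive-approval threshold.
  isActive(pattern: string): boolean {
    return (this.counters.get(pattern) ?? 0) >= this.threshold;
  }

  // No argument: reset everything. With a pattern: reset only that one.
  reset(pattern?: string): void {
    if (pattern === undefined) this.counters.clear();
    else this.counters.delete(pattern);
  }
}
```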

3.3 Approval Log (approval-log.ts)

All decisions (approve / deny / escalate / fast_lane / blocked) are appended to:

~/.openclaw/logs/approval.jsonl

Each line format:

{"ts":"2026-05-16T21:00:00.000Z","cmd":"rm -rf node_modules","patterns":["rm_recursive"],"result":"approve","review":"fast_lane"}
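For readers who want to process the log programmatically, the sketch below serializes and parses entries using the field names from the example line. The `ApprovalEntry` type and helper names are assumptions; file I/O is left out so the functions work on plain strings.

```typescript
interface ApprovalEntry {
  ts: string;
  cmd: string;
  patterns: string[];
  result: string;
  review?: string;
}

// One JSON object per line, newline-terminated, suitable for appending.
function toLogLine(e: ApprovalEntry): string {
  return JSON.stringify(e) + "\n";
}

// Parse JSONL text, skipping blank lines.
function parseLog(jsonl: string): ApprovalEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ApprovalEntry);
}

// Tally entries by result, e.g. for a summary of approve/deny counts.
function countByResult(entries: ApprovalEntry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of entries) counts[e.result] = (counts[e.result] ?? 0) + 1;
  return counts;
}
```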

Layer 4 — Secret Leak Detection

4.1 Detection Scope (secret-patterns.ts)

In the after_tool_call hook, scan tool output for 39 secret patterns:

| Category | Example Patterns |
| --- | --- |
| API Keys | sk-, sk_live_, AIza..., SG.xxx, github_token |
| Private Keys | BEGIN RSA PRIVATE KEY, BEGIN DSA PRIVATE KEY |
| Database | mysqldump, postgres://, mongodb:// |
| AWS | AKIA..., aws_secret |
| Cloud Services | OCPassphrase, datocms, stripe |

4.2 Handling

Leak detected → replace key content with [REDACTED] in output + print warning log. Command is not blocked (tool already returned, cannot undo).
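A minimal sketch of the redact-and-warn step, using three illustrative expressions loosely based on the categories above. These regexes and the `redactSecrets()` return shape are assumptions; the plugin's 39 patterns are not listed in this document.

```typescript
// Illustrative patterns only; not the plugin's actual list.
const SECRET_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/g,
  /sk-[A-Za-z0-9]{20,}/g,
  /-----BEGIN (?:RSA|DSA) PRIVATE KEY-----[\s\S]*?-----END (?:RSA|DSA) PRIVATE KEY-----/g,
];

// Replace every match with [REDACTED] and report whether anything was found.
function redactSecrets(output: string): { text: string; leaked: boolean } {
  let text = output;
  for (const re of SECRET_PATTERNS) {
    text = text.replace(re, "[REDACTED]");
  }
  return { text, leaked: text !== output };
}
```

A caller would log a warning when `leaked` is true and pass `text` on in place of the raw tool output.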

4.3 URL & Env Var Redaction (url-redact.ts)

  • key=xxx parameters in URLs are auto-redacted
  • Secret values in environment variables are auto-redacted
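Both rules can be approximated with two substitutions, sketched below. The parameter names beyond `key` and the keyword list for environment variables are assumptions made for illustration; url-redact.ts may use different rules.

```typescript
// Query parameters whose value should be hidden (names other than "key" are assumed).
const URL_PARAM = /([?&](?:key|token|api_key)=)[^&#\s]+/gi;

// NAME=value assignments where NAME contains a secret-looking keyword (assumed list).
const ENV_ASSIGN = /\b([A-Z0-9_]*(?:SECRET|TOKEN|PASSWORD|KEY)[A-Z0-9_]*=)\S+/g;

function redactUrlAndEnv(text: string): string {
  return text
    .replace(URL_PARAM, "$1[REDACTED]")
    .replace(ENV_ASSIGN, "$1[REDACTED]");
}
```

Keeping the parameter or variable name and replacing only the value preserves enough context to debug the output without exposing the secret.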

CLI Commands

security-status

View current 4-layer status and Fast Lane counters:

policy-layer$ security-status
🛡️  Policy Layer v0.4.0 — Layers 1–4 Active ✅
Fast Lane:
  rm_recursive (counter=3/5)
  pipe_to_shell (counter=5/5 ✅ FAST LANE ACTIVE)

show-my-d-score [session]

View D' CBS details for current (or specified) session:

policy-layer$ show-my-d-score
D' Score:    0.72
Status:      NORMAL
Cycles:      20/20
Signals:
  success_rate:  0.95 (w=0.30) ✓
  tool_fail:     0.90 (w=0.25) ✓
  cbr_hit:       0.80 (w=0.20) ⚠
  severity_inv:  0.92 (w=0.25) ✓

policy-reset-fastlane [pattern]

Reset Fast Lane counters:

  • No args: reset all
  • With pattern: reset only that pattern

report-bad-result [reason]

User feedback loop. When a command passed Policy Layer but produced a bad result:

report-bad-result accidentally deleted node_modules

Effects:

  1. Last tool call's success is set to false and its severity to 600
  2. D' score drops (Agent is "penalized")
  3. Command pattern auto-added to USER_BLACKLIST_PATTERNS
  4. Persisted to ~/.openclaw/logs/blacklist.jsonl, auto-loaded on next startup
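The state changes listed above could be modeled as follows. Everything here (`ToolCallRecord`, `SessionState`, `reportBadResult()`, and the shape of the returned blacklist line) is a hypothetical sketch; the D' drop in step 2 would follow from the changed success and severity values when the score is next recomputed.

```typescript
interface ToolCallRecord {
  pattern: string;
  success: boolean;
  severity: number;
}

interface SessionState {
  calls: ToolCallRecord[];
  blacklist: Set<string>;
}

// Returns the JSON line to append to blacklist.jsonl, or null if there is no call to flag.
function reportBadResult(state: SessionState, reason: string): string | null {
  if (state.calls.length === 0) return null;
  const last = state.calls[state.calls.length - 1];
  last.success = false;   // step 1: mark the call as failed
  last.severity = 600;    // step 1: raise its severity
  state.blacklist.add(last.pattern); // step 3: blacklist the pattern
  return JSON.stringify({ pattern: last.pattern, reason }); // step 4: assumed line format
}
```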

Deployment

File Reference

| File | Purpose |
| --- | --- |
| src/index.ts | Plugin entry: 3 hooks + 4 commands |
| src/security/*.ts | 9 security modules (Layer 1/3/4) |
| openclaw.plugin.json | Plugin manifest |
| config/openclaw.json | Gateway config (deploys to ~/.openclaw/) |
| scripts/deploy.sh | One-click deployment script |

One-Click Deploy

cd ~/projects/policy-layer

# Dry run (no writes)
./scripts/deploy.sh --dry-run

# Actual deploy
./scripts/deploy.sh

# Verify
cat ~/.openclaw/exec-approvals.json | grep ask
openclaw logs --tail 20

Note: exec-approvals.json is no longer used. OpenClaw defaults to ask: "off" + security: "full", fully delegating to the Policy Layer plugin.


Configuration

In ~/.openclaw/openclaw.json:

{
  "plugins": {
    "entries": {
      "policy-layer": {
        "enabled": true,
        "config": {
          "reportToUser":    true,   // Agent proactively reports D' state in conversation
          "sensoriumWindow": 20,     // Cycle window size for D' tracking
          "dGateThreshold":  0.35,   // D' below this → LOW_ACCEPT
          "logLevel":       "info"  // debug / info / warn
        }
      }
    }
  }
}

| Parameter | Default | Description |
| --- | --- | --- |
| reportToUser | true | Agent actively reports D' status in conversation; false = silent |
| sensoriumWindow | 20 | Rolling window size for D' calculation |
| dGateThreshold | 0.35 | D' below this triggers LOW_ACCEPT guidance |
| logLevel | info | Log verbosity level |
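One way a plugin could merge user settings with these defaults is sketched below. The `PolicyLayerConfig` type and `resolveConfig()` are hypothetical, and the range checks are assumptions added for illustration rather than documented behavior.

```typescript
interface PolicyLayerConfig {
  reportToUser: boolean;
  sensoriumWindow: number;
  dGateThreshold: number;
  logLevel: "debug" | "info" | "warn";
}

// Defaults copied from the parameter table.
const DEFAULTS: PolicyLayerConfig = {
  reportToUser: true,
  sensoriumWindow: 20,
  dGateThreshold: 0.35,
  logLevel: "info",
};

function resolveConfig(user: Partial<PolicyLayerConfig> = {}): PolicyLayerConfig {
  const cfg: PolicyLayerConfig = { ...DEFAULTS, ...user };
  // Assumed sanity checks: a positive whole-number window and a threshold in 0..1.
  if (Math.floor(cfg.sensoriumWindow) !== cfg.sensoriumWindow || cfg.sensoriumWindow < 1) {
    throw new RangeError("sensoriumWindow must be a positive integer");
  }
  if (cfg.dGateThreshold < 0 || cfg.dGateThreshold > 1) {
    throw new RangeError("dGateThreshold must be between 0 and 1");
  }
  return cfg;
}
```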

Testing

cd ~/projects/policy-layer
npm test                  # 103 tests (61 unit + 42 integration)

Test Coverage

| Module | Tests | Coverage |
| --- | --- | --- |
| L1 normalize | 6 | ANSI/null/NFKC/trim |
| L1 patterns | 23 | 16 CRITICAL + 7 HIGH/MEDIUM |
| L1 path | 6 | Traversal detection, valid paths |
| L3 fast-lane | 5 | 5-approval threshold, reset |
| L3 hook simulation | 13 | critical=block, benign=pass, multi-pattern |
| L4 secrets | 17 | 9 secret types, URL, env vars |
| Gateway | 2 | HTTP health, WebSocket |
| Total | 103 | 100% |

Analytics Dashboard

python3 docs/generate-analytics.py
open docs/approval-analytics.html

Dashboard features:

  • Left sidebar: Result counts (deny/escalate/approve/fast_lane) — click to filter
  • Donut chart: Result distribution
  • Bar chart: Top 8 patterns by frequency
  • Timeline: Hourly activity stacked by result type
  • Event table: Sortable, filterable event log (max 200 records)
  • Pattern drilldown: Each pattern's deny/escalate/approve/fast_lane breakdown

Data source: ~/.openclaw/logs/approval.jsonl (append-only JSONL)
Auto-refresh: Every 30 seconds


Project Structure

policy-layer/
├── src/
│   ├── index.ts                    ← Plugin entry (3 hooks + 4 commands)
│   ├── sensorium-index.ts         ← D' CBS (standalone test version)
│   ├── sensorium-index.test.ts     ← 42 D' unit tests
│   └── security/
│       ├── normalize.ts             ← ANSI/null/NFKC normalization
│       ├── patterns.ts             ← 23 danger patterns + user blacklist
│       ├── path.ts                 ← Path traversal validation
│       ├── smart-review.ts         ← Ollama LLM review
│       ├── approval-log.ts         ← JSONL append log
│       ├── fast-lane.ts            ← 5-approval fast lane
│       ├── secret-patterns.ts      ← 39 secret patterns
│       ├── redact.ts               ← Secret redaction engine
│       └── url-redact.ts           ← URL + env var redaction
├── tests/
│   ├── unit/security.test.ts       ← 61 unit tests
│   └── integration/hook-simulation.test.ts  ← 42 integration tests
├── docs/
│   └── generate-analytics.py       ← HTML dashboard generator
├── tools/
│   └── query_approval.py           ← approval.jsonl query CLI
├── scripts/
│   └── deploy.sh                   ← Deployment script
├── openclaw.plugin.json
├── package.json
├── vitest.config.ts
└── README.md

Known Limitations

  1. Ollama unreachable behavior: Smart Review defaults to escalate when Ollama is down. This means HIGH/MEDIUM patterns trigger human approval whenever Ollama is unavailable. Ensure Ollama is running if you don't want mandatory approval.

  2. Fast Lane counter does not grow on 'escalate': Repeated escalations for the same pattern do not increment the Fast Lane counter. This is intentional — repeated escalations signal the pattern needs review, not Fast Lane bypass.

  3. Path traversal not yet fully wired: validatePath() exists but is not yet connected to before_tool_call for file-path arguments. Planned.


Phase 2: Security Learning via Memory-Recall

Goal

Enable the Agent to proactively query past security decisions for similar commands before executing tools.

Flow

User Input
  → memory-recall: extract 6w + category (LLM call)
  → LLM decides tool call
  → before_tool_call: Policy Layer verdict (programmatic)
      → matched_patterns + security_result already available
      → async write to LanceDB (no LLM call needed)
  → Command executes
  → after_tool_call: append to LanceDB

Payload Extension Fields

| Field | Source | Description |
| --- | --- | --- |
| security_result | Policy Layer verdict | approve / deny / escalate / fast_lane |
| matched_patterns | detectDangerousPatterns() | List of triggered pattern labels |
| risk_severity | Derived | critical / high / medium / low |

Implementation Steps

  1. LanceDB Writer — async writer in before_tool_call writing verdict records to dedicated LanceDB table (~/.policy-layer/verdicts.lance)
  2. Query Hook in before_prompt_build — before LLM decides on a tool, query LanceDB for similar intent's past verdicts, inject summary into prompt
  3. Leverage memory-recall infrastructure — reuse bge-m3 embedding + L2 FTS + L3 graph expansion, with isolated LanceDB namespace
  4. Historical data migration: tools/query_approval.py --export migrates approval.jsonl records to LanceDB

Why a Separate LanceDB?

  • Policy Layer verdict data has a different schema (command + patterns + verdict) vs memory-recall (6w + category + conversation context)
  • Isolation keeps memory-recall unchanged for other projects
  • Enables future embedding-based retrieval of similar past commands without LLM calls

Tech Stack

| Component | Technology |
| --- | --- |
| Plugin Framework | OpenClaw Gateway Plugin Hooks |
| Language | TypeScript |
| Testing | Vitest (103 tests) |
| LLM Review | Ollama (llama3.3, local inference) |
| Storage | JSONL (append log), LanceDB (Phase 2) |
| Embedding | bge-m3 (Phase 2 planned) |
| Visualization | Native HTML + CSS + JS (no build step) |

Source and Versions

Source repository

regulus-joseph/policy-layer

Source commit

https://github.com/regulus-joseph/policy-layer

Install command

openclaw plugins install clawhub:policy-layer

Metadata

  • Package name: policy-layer
  • Created: 2026/05/16
  • Updated: 2026/05/16
  • Executes code:
  • Source tag: main

Compatibility

  • Built with OpenClaw: 2026.5.0
  • Plugin API range: 0.4.0
  • Tag: latest
  • File count: 114