@zmlgit

Task Watchdog

Watchdog plugin: notifies the parent agent of subagent failures and abnormal exec exits via next-turn injection

Current version
v1.4.0
code-plugin · Community · source-linked

openclaw-task-watchdog

OpenClaw Task Watchdog Plugin — Auto-notify on subagent failures, exec errors, and stale tasks.

Chinese README


Why This Plugin?

OpenClaw excels at dispatching subagents and running long tasks via exec. But there's a gap:

| Pain Point | What Happens |
|---|---|
| Silent failures | A subagent crashes or times out, but the parent session never finds out |
| Forgotten tasks | An exec command exits with error code 137 (OOM) and nobody notices |
| Stale jobs | A background task has been "running" for 45 minutes with no progress |
| Manual checking | Users repeatedly ask "is it done yet?" instead of getting proactive alerts |

Task Watchdog bridges this gap by monitoring task lifecycle events and injecting timely notifications into the parent session — so you always know when something needs attention.

Architecture

┌─────────────────────────────────────────────────────────┐
│                    OpenClaw Gateway                      │
│                                                         │
│  ┌──────────────┐    ┌──────────────────────────────┐   │
│  │  Subagent A  │    │       Task Watchdog          │   │
│  │  (running)   │    │                              │   │
│  └──────┬───────┘    │  Hooks:                      │   │
│         │            │  ├─ subagent_ended ──────────►│───┼──► notify parent
│  ┌──────▼───────┐    │  ├─ after_tool_call (exec) ─►│───┼──► notify session
│  │  Subagent B  │    │  ├─ heartbeat_prompt ───────►│───┼──► stale check
│  │  (failed!)   │    │  └─ gateway_start ──────────►│───┼──► timer patrol
│  └──────────────┘    │                              │   │
│                      │  Features:                   │   │
│  ┌──────────────┐    │  • Idempotency guard         │   │
│  │  exec cmd    │    │  • Safe truncation           │   │
│  │  (OOM kill!) │    │  • Circular-ref safe JSON    │   │
│  └──────────────┘    │  • Timer cleanup on stop     │   │
│                      └──────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
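
The diagram lists "Safe truncation" and "Circular-ref safe JSON" among the features. The following is a minimal sketch of what such helpers could look like. The function names `safeStringify` and `truncate`, the `[Circular]` marker, and the `...` suffix are assumptions for illustration, not the plugin's actual code.

```typescript
// Hypothetical helpers: names and markers are illustrative assumptions.

/**
 * Serialize a value to JSON without throwing on circular references.
 * Any object that was already visited (circular or merely repeated)
 * is replaced with the string "[Circular]".
 */
function safeStringify(value: unknown): string {
  const seen = new WeakSet<object>();
  const result = JSON.stringify(value, (_key, val) => {
    if (typeof val === "object" && val !== null) {
      if (seen.has(val)) return "[Circular]";
      seen.add(val);
    }
    return val;
  });
  // JSON.stringify returns undefined for inputs such as undefined or functions.
  return result ?? "null";
}

/**
 * Cut text to at most maxChars code points and append "..." when shortened.
 * Array.from splits by code point, so surrogate pairs (emoji) stay whole.
 */
function truncate(text: string, maxChars: number): string {
  const chars = Array.from(text);
  if (chars.length <= maxChars) return text;
  return chars.slice(0, maxChars).join("") + "...";
}
```

Trading precision for simplicity, the `WeakSet` approach also marks shared (non-circular) references, which is usually acceptable for notification payloads.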

Features

| Hook | What it does |
|---|---|
| subagent_ended | Detects abnormal subagent outcomes (error, timeout, killed, reset, deleted) and notifies the parent session. Sends continuation reminders on normal completions. |
| after_tool_call (exec) | Watches for abnormal exec exits: non-zero exit codes, OOM kills, signals, permission denied, command not found. |
| heartbeat_prompt_contribution | When timer patrol is off, injects patrol instructions into heartbeat cycles to check for stale running tasks. |
| gateway_start | Starts a timer-based patrol that periodically requests heartbeats to trigger stale-task checks. |
| message_received | Records user message timestamps for silence detection. Resets consecutive tool call counter. |
| before_agent_reply | Resets consecutive tool call counter and clears silence timer when agent replies. |
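
To illustrate the kind of check the after_tool_call (exec) hook performs, here is a rough sketch that classifies an exit status using common shell conventions (126 for "not executable", 127 for "command not found", 128+N for termination by signal N). The function name `classifyExecExit` and its return shape are hypothetical; the plugin's real detection logic may differ.

```typescript
// Hypothetical classifier based on POSIX shell exit-status conventions.
type ExecVerdict =
  | { abnormal: false }
  | { abnormal: true; reason: string };

function classifyExecExit(exitCode: number | null, signal: string | null): ExecVerdict {
  // A reported signal name always means the process did not exit on its own.
  if (signal !== null) {
    return { abnormal: true, reason: `terminated by signal ${signal}` };
  }
  // Assumption: no code and no signal is treated as "nothing to report".
  if (exitCode === null || exitCode === 0) return { abnormal: false };
  if (exitCode === 126) {
    return { abnormal: true, reason: "permission denied or not executable" };
  }
  if (exitCode === 127) {
    return { abnormal: true, reason: "command not found" };
  }
  if (exitCode === 137) {
    // 128 + 9 (SIGKILL); on Linux this is frequently the OOM killer.
    return { abnormal: true, reason: "killed by SIGKILL (often the OOM killer)" };
  }
  if (exitCode > 128) {
    return { abnormal: true, reason: `terminated by signal number ${exitCode - 128}` };
  }
  return { abnormal: true, reason: `exited with code ${exitCode}` };
}
```

Note that exit code 137 only indicates SIGKILL; confirming an actual OOM kill would need extra evidence such as kernel logs.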

Design Principles

  • Deadlock-safe: Uses in-process API calls instead of spawning CLI commands
  • Idempotent: Each notification uses an idempotencyKey to prevent duplicates
  • Zero-config: Works out of the box with sensible defaults
  • Memory-safe: Idempotency map capped at 10,000 entries with TTL-based eviction
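
As an illustration of the idempotency and memory-safety principles above, the following sketch shows one way a capped, TTL-evicting key map could work. The class name `IdempotencyGuard`, the method `tryAcquire`, and the eviction order are assumptions; only the 10,000-entry cap and the use of TTL-based eviction come from this README.

```typescript
// Hypothetical sketch: names and eviction details are assumptions.
class IdempotencyGuard {
  // key -> expiry timestamp in milliseconds
  private readonly seen = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly maxEntries: number;

  constructor(ttlMs: number, maxEntries: number = 10000) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
  }

  /** Returns true the first time a key is seen within its TTL, false for duplicates. */
  tryAcquire(key: string, now: number = Date.now()): boolean {
    const expiry = this.seen.get(key);
    if (expiry !== undefined && expiry > now) return false;

    this.evictExpired(now);
    // Delete first so a re-inserted key moves to the end of the insertion order.
    this.seen.delete(key);
    this.seen.set(key, now + this.ttlMs);

    // Still over the cap: drop the oldest entries (Map iterates in insertion order).
    while (this.seen.size > this.maxEntries) {
      const oldest = this.seen.keys().next().value;
      if (oldest === undefined) break;
      this.seen.delete(oldest);
    }
    return true;
  }

  get size(): number {
    return this.seen.size;
  }

  private evictExpired(now: number): void {
    for (const [key, expiry] of this.seen) {
      if (expiry <= now) this.seen.delete(key);
    }
  }
}
```

A caller would wrap each notification in `if (guard.tryAcquire(idempotencyKey)) { ... }` so that a repeated event within the TTL is silently skipped.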

Silence Detection

The plugin detects two types of agent silence:

  1. Consecutive tool calls without reply: If the agent calls more than consecutiveToolCallThreshold tools in a row without replying to the user, a nudge is injected (once per minute per session).
  2. User message timeout: If a user sends a message but doesn't receive a reply within silenceThresholdMs, a silence nudge is triggered during the next timer patrol cycle.
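
The two silence checks above can be sketched as a small per-session tracker. This is an illustrative model only: `SilenceTracker`, its method names, and the exact comparison boundaries are assumptions, while the thresholds mirror the documented defaults for `consecutiveToolCallThreshold` and `silenceThresholdMs`.

```typescript
// Hypothetical per-session tracker. All names here are assumptions.
interface SilenceState {
  consecutiveToolCalls: number;
  lastUserMessageAt: number | null; // null when no user message awaits a reply
  lastNudgeAt: number | null;
}

const NUDGE_COOLDOWN_MS = 60000; // at most one tool-call nudge per minute per session

class SilenceTracker {
  private readonly sessions = new Map<string, SilenceState>();
  private readonly toolCallThreshold: number;
  private readonly silenceThresholdMs: number;

  constructor(toolCallThreshold: number = 5, silenceThresholdMs: number = 180000) {
    this.toolCallThreshold = toolCallThreshold;
    this.silenceThresholdMs = silenceThresholdMs;
  }

  private stateFor(sessionId: string): SilenceState {
    let state = this.sessions.get(sessionId);
    if (state === undefined) {
      state = { consecutiveToolCalls: 0, lastUserMessageAt: null, lastNudgeAt: null };
      this.sessions.set(sessionId, state);
    }
    return state;
  }

  /** message_received: remember when the user spoke and reset the counter. */
  onUserMessage(sessionId: string, now: number): void {
    const state = this.stateFor(sessionId);
    state.lastUserMessageAt = now;
    state.consecutiveToolCalls = 0;
  }

  /** before_agent_reply: the agent is answering, so clear both signals. */
  onAgentReply(sessionId: string): void {
    const state = this.stateFor(sessionId);
    state.lastUserMessageAt = null;
    state.consecutiveToolCalls = 0;
  }

  /** Returns true when a consecutive-tool-call nudge should be injected now. */
  onToolCall(sessionId: string, now: number): boolean {
    const state = this.stateFor(sessionId);
    state.consecutiveToolCalls += 1;
    if (state.consecutiveToolCalls <= this.toolCallThreshold) return false;
    if (state.lastNudgeAt !== null && now - state.lastNudgeAt < NUDGE_COOLDOWN_MS) return false;
    state.lastNudgeAt = now;
    return true;
  }

  /** Called from a patrol cycle: has a user message gone unanswered too long? */
  isUserWaiting(sessionId: string, now: number): boolean {
    const last = this.stateFor(sessionId).lastUserMessageAt;
    return last !== null && now - last >= this.silenceThresholdMs;
  }
}
```

Timestamps are passed in explicitly rather than read from `Date.now()` inside the class, which keeps the logic easy to test.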

Installation

# Via OpenClaw plugin install
openclaw plugins install openclaw-task-watchdog

# Via npm
npm install openclaw-task-watchdog

Configuration

All settings are optional. Configure them via the openclaw.plugin.json config:

| Field | Type | Default | Description |
|---|---|---|---|
| subagentNotifyOn | string[] | ["error", "timeout", "killed"] | Subagent outcomes that trigger notifications. Options: error, timeout, killed, reset, deleted |
| execNotifyOnAbnormal | boolean | true | Enable notifications on abnormal exec exits |
| injectionTtlMs | integer | 300000 (5 min) | TTL for next-turn injection messages (5000–600000 ms) |
| timerPatrol | boolean | true | Enable timer-based patrol on gateway start |
| heartbeatPatrol | boolean | false | Enable heartbeat-based patrol (only when timerPatrol is disabled) |
| timerPatrolIntervalMs | integer | 120000 (2 min) | Timer patrol interval (30000–600000 ms) |
| staleThresholdMs | integer | 1800000 (30 min) | How long before a task is considered stale (60000–7200000 ms) |
| consecutiveToolCallThreshold | integer | 5 | Number of consecutive tool calls without a reply before triggering a nudge (2–20) |
| subagentConsecutiveThreshold | integer | 15 | Consecutive tool call threshold for subagent sessions. Defaults to consecutiveToolCallThreshold * 3 if not set |
| silenceThresholdMs | integer | 180000 (3 min) | How long after a user message without reply before triggering a silence nudge (60000–1800000 ms) |

Example:

{
  "task-watchdog": {
    "subagentNotifyOn": ["error", "timeout", "killed", "reset"],
    "timerPatrolIntervalMs": 180000,
    "staleThresholdMs": 900000
  }
}
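
To make the defaults and ranges concrete, here is a sketch of how a partial user config might be merged with the documented defaults. The names `WatchdogConfig`, `clampInt`, and `resolveConfig` are hypothetical, and clamping out-of-range values (rather than rejecting them) is an assumption; the plugin may validate differently.

```typescript
// Hypothetical resolver built from the documented defaults and ranges.
interface WatchdogConfig {
  subagentNotifyOn: string[];
  execNotifyOnAbnormal: boolean;
  injectionTtlMs: number;
  timerPatrol: boolean;
  heartbeatPatrol: boolean;
  timerPatrolIntervalMs: number;
  staleThresholdMs: number;
  consecutiveToolCallThreshold: number;
  subagentConsecutiveThreshold: number;
  silenceThresholdMs: number;
}

/** Use the fallback when unset or non-finite, otherwise clamp into [min, max]. */
function clampInt(value: number | undefined, fallback: number, min: number, max: number): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.trunc(value)));
}

function resolveConfig(user: Partial<WatchdogConfig> = {}): WatchdogConfig {
  const consecutiveToolCallThreshold = clampInt(user.consecutiveToolCallThreshold, 5, 2, 20);
  return {
    subagentNotifyOn: user.subagentNotifyOn ?? ["error", "timeout", "killed"],
    execNotifyOnAbnormal: user.execNotifyOnAbnormal ?? true,
    injectionTtlMs: clampInt(user.injectionTtlMs, 300000, 5000, 600000),
    timerPatrol: user.timerPatrol ?? true,
    heartbeatPatrol: user.heartbeatPatrol ?? false,
    timerPatrolIntervalMs: clampInt(user.timerPatrolIntervalMs, 120000, 30000, 600000),
    staleThresholdMs: clampInt(user.staleThresholdMs, 1800000, 60000, 7200000),
    consecutiveToolCallThreshold,
    // Documented fallback: three times the main threshold when not set.
    subagentConsecutiveThreshold:
      user.subagentConsecutiveThreshold ?? consecutiveToolCallThreshold * 3,
    silenceThresholdMs: clampInt(user.silenceThresholdMs, 180000, 60000, 1800000),
  };
}
```

With the example config above, `resolveConfig` would keep the 3-minute patrol interval and 15-minute stale threshold and fill every other field from its default.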

Development

npm install
npx tsc          # build
npx tsc --watch  # dev mode

Changelog

See CHANGELOG.md for version history.

License

MIT © zml


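Chinese README (translated)

What problem does it solve?

When OpenClaw runs long tasks through subagents or exec, failures can go unnoticed:

| Pain Point | Symptom |
|---|---|
| A subagent crashes or times out | The parent session is unaware and keeps waiting |
| An exec command is OOM-killed | Nobody notices and the task stalls |
| A background task stalls | It has been running for 45 minutes with no progress |
| Manual checking | Users repeatedly ask "is it done yet?" |

Task Watchdog monitors the task lifecycle and injects notifications automatically.

Installation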

openclaw plugins install openclaw-task-watchdog

Development

npm install && npx tsc

Source and release

Source repository

zmlgit/openclaw-task-watchdog

Source commit

cc072c17036f89a3a2f97e0640947908b978f2a9

Install command

openclaw plugins install clawhub:openclaw-task-watchdog

Metadata

  • Package: openclaw-task-watchdog
  • Created: 2026/05/13
  • Updated: 2026/05/13
  • Executes code: Yes
  • Source tag: cc072c17036f89a3a2f97e0640947908b978f2a9

Compatibility

  • Built with OpenClaw: 2026.5.7
  • Plugin API range: >=2026.3.24-beta.2
  • Tags: latest
  • Files: 14