Nutrient PDF Plugin for OpenClaw
Explicit, on-demand Nutrient PDF extraction for OpenClaw — structured Markdown output with tables, headings, and reading order preserved.

What this plugin does
It adds an explicit Nutrient extraction surface you can call on demand:
nutrient_pdf_extract— an agent tool to extract a specific PDF to structured Markdownopenclaw nutrient-pdf extract <file.pdf>— a CLI command for direct extraction from your terminalopenclaw nutrient-pdf status— check CLI availability and version
Use it when you want Nutrient's table and heading fidelity on a particular document, requested explicitly by the agent or from the command line.
What it does not do
It does not change OpenClaw's built-in pdf tool. As of OpenClaw 2026.6, the built-in tool does its own extraction through the bundled document-extract plugin (the clawpdf engine), and OpenClaw does not currently expose a hook for an external plugin to substitute its own extractor there. So this plugin is a supplement for explicit extraction, not a drop-in replacement for the default engine.
Note for users on OpenClaw 2026.4 – 2026.5: earlier versions had an
agents.defaults.pdfExtraction.enginesetting that routed the built-in tool through Nutrient. That configuration was removed in 2026.6 when extraction moved into the bundleddocument-extractplugin. This plugin no longer references it.
Why Nutrient
Plain-text PDF extractors produce word soup: they score 0.000 on table structure and 0.000 on heading preservation across 200 real documents (measured against the historical pdfjs default).
When an agent asks "what's in row 3, column 4?" it needs structure, not a flat text dump. Nutrient produces Markdown with proper table rows and columns that agents can look up directly.

Benchmark (200 documents, opendataloader-bench)
| Metric | pdfjs | Nutrient | Change |
|---|---|---|---|
| Overall accuracy | 0.578 | 0.880 | +52% |
| Table structure | 0.000 | 0.662 | -- |
| Heading fidelity | 0.000 | 0.811 | -- |
| Reading order | 0.871 | 0.924 | +6% |
Scored with NID (reading order), TEDS (table structure), and MHS (heading fidelity), versus the historical pdfjs default extractor.
Install
openclaw plugins install @nutrient-sdk/openclaw-nutrient-pdf
Verify the bundled pdf-to-markdown CLI is reachable:
openclaw nutrient-pdf status
Then use the tool from an agent, or extract directly:
openclaw nutrient-pdf extract ./report.pdf
Configuration
Optional settings under plugins.entries.nutrient-pdf.config. These affect only this plugin's tool and CLI:
{
plugins: {
entries: {
"nutrient-pdf": {
config: {
command: "pdf-to-markdown", // path to the CLI binary (auto-resolves by default)
timeoutMs: 30000, // extraction timeout per document
}
}
}
}
}
All processing runs locally. No cloud uploads, no API keys.
Free tier
The pdf-to-markdown CLI includes 1,000 free documents per month. See nutrient.io for higher-volume licensing.
Links
License
MIT -- see LICENSE for details and third-party dependency notice.