@nutrient-sdk

Nutrient PDF Plugin for OpenClaw

Explicit Nutrient-powered PDF extraction tool and CLI for OpenClaw. Adds an on-demand nutrient_pdf_extract tool and an openclaw nutrient-pdf CLI for high-fidelity table, heading, and reading-order extraction.

Current version
v2026.6.3
code-plugin · Community · source-linked

Explicit, on-demand Nutrient PDF extraction for OpenClaw — structured Markdown output with tables, headings, and reading order preserved.

[Figure: table comparison, pdfjs word soup vs Nutrient structured Markdown]

What this plugin does

It adds an explicit Nutrient extraction surface you can call on demand:

  • nutrient_pdf_extract — an agent tool to extract a specific PDF to structured Markdown
  • openclaw nutrient-pdf extract <file.pdf> — a CLI command for direct extraction from your terminal
  • openclaw nutrient-pdf status — check CLI availability and version

Use it when you want Nutrient's table and heading fidelity on a particular document, requested explicitly by the agent or from the command line.

What it does not do

It does not change OpenClaw's built-in pdf tool. As of OpenClaw 2026.6, the built-in tool does its own extraction through the bundled document-extract plugin (the clawpdf engine), and OpenClaw does not currently expose a hook for an external plugin to substitute its own extractor there. So this plugin is a supplement for explicit extraction, not a drop-in replacement for the default engine.

Note for users on OpenClaw 2026.4 – 2026.5: earlier versions had an agents.defaults.pdfExtraction.engine setting that routed the built-in tool through Nutrient. That configuration was removed in 2026.6 when extraction moved into the bundled document-extract plugin. This plugin no longer references it.

Why Nutrient

Plain-text PDF extractors produce word soup. Across 200 real documents, the historical pdfjs default extractor scored 0.000 on table structure and 0.000 on heading preservation.

When an agent asks "what's in row 3, column 4?" it needs structure, not a flat text dump. Nutrient produces Markdown with proper table rows and columns that agents can look up directly.
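
To make the "row 3, column 4" lookup concrete, here is a minimal Python sketch of how a consumer might index into Markdown table output. It assumes the table arrives as a GitHub-style pipe table, which this page does not confirm. The sample data and the helper names (parse_markdown_table, cell) are hypothetical and are not part of the plugin.

```python
def parse_markdown_table(markdown: str) -> list[list[str]]:
    """Parse the first pipe table in `markdown` into rows (header row first)."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            if rows:
                break  # the first table has ended
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the separator row, e.g. |---|:---:|
        if all(c and set(c) <= set("-:") for c in cells):
            continue
        rows.append(cells)
    return rows


def cell(rows, row, column):
    """Return a cell by 1-based data row (header excluded) and 1-based column."""
    return rows[row][column - 1]


# Hypothetical sample, assumed to resemble structured Markdown output.
sample = """
| Region | Q1 | Q2 | Q3 |
|--------|----|----|----|
| North  | 10 | 12 | 14 |
| South  | 20 | 22 | 24 |
| East   | 30 | 32 | 34 |
"""

rows = parse_markdown_table(sample)
print(cell(rows, 3, 4))  # row 3 is "East", column 4 is "Q3"
```

The sketch ignores escaped pipes inside cells, so real documents may need a proper Markdown parser. The same lookup is not possible on a flat text dump, because the row and column boundaries are gone.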

[Figure: benchmark scores, pdfjs vs Nutrient across 200 documents]

Benchmark (200 documents, opendataloader-bench)

Metric            pdfjs   Nutrient  Change
Overall accuracy  0.578   0.880     +52%
Table structure   0.000   0.662     --
Heading fidelity  0.000   0.811     --
Reading order     0.871   0.924     +6%

Scored with NID (reading order), TEDS (table structure), and MHS (heading fidelity), versus the historical pdfjs default extractor.
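
The Change column appears to be the relative improvement over the pdfjs score; that reading is an assumption, though it matches the published numbers. The short sketch below reproduces the figures and shows why the table and heading rows carry "--": a relative change is undefined when the baseline is 0.000. The function name relative_change is illustrative only.

```python
def relative_change(baseline, new):
    """Return the percent change from baseline to new, or None if baseline is 0."""
    if baseline == 0:
        return None  # undefined: cannot divide by a zero baseline
    return (new - baseline) / baseline * 100


# Scores copied from the benchmark table: (pdfjs, Nutrient).
scores = {
    "Overall accuracy": (0.578, 0.880),
    "Table structure": (0.000, 0.662),
    "Heading fidelity": (0.000, 0.811),
    "Reading order": (0.871, 0.924),
}

for metric, (pdfjs, nutrient) in scores.items():
    change = relative_change(pdfjs, nutrient)
    label = "--" if change is None else f"{change:+.0f}%"
    print(f"{metric}: {label}")
```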

Install

openclaw plugins install @nutrient-sdk/openclaw-nutrient-pdf

Verify the bundled pdf-to-markdown CLI is reachable:

openclaw nutrient-pdf status

Then use the tool from an agent, or extract directly:

openclaw nutrient-pdf extract ./report.pdf
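
For scripted use, the extract command can be wrapped in a small helper. The Python sketch below makes several assumptions: openclaw is on PATH, the plugin is installed, and the command writes the extracted Markdown to stdout, which this page does not state. The 30-second timeout is enforced by the script, not by the plugin, and the helper names are hypothetical.

```python
import subprocess


def build_extract_command(pdf_path, openclaw_bin="openclaw"):
    """Build the argv for the documented `openclaw nutrient-pdf extract <file.pdf>`."""
    return [openclaw_bin, "nutrient-pdf", "extract", pdf_path]


def run_command(argv, timeout_s=30.0):
    """Run argv and return its stdout as text.

    Raises subprocess.TimeoutExpired if the process exceeds timeout_s, and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    completed = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout_s, check=True
    )
    return completed.stdout


# Usage (needs openclaw and the plugin installed; stdout output is an assumption):
# markdown = run_command(build_extract_command("./report.pdf"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.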

Configuration

Optional settings under plugins.entries.nutrient-pdf.config. These affect only this plugin's tool and CLI:

{
  plugins: {
    entries: {
      "nutrient-pdf": {
        config: {
          command: "pdf-to-markdown",  // path to the CLI binary (auto-resolves by default)
          timeoutMs: 30000,            // extraction timeout per document
        }
      }
    }
  }
}

All processing runs locally. No cloud uploads, no API keys.

Free tier

The pdf-to-markdown CLI includes 1,000 free documents per month. See nutrient.io for higher-volume licensing.

License

MIT -- see LICENSE for details and third-party dependency notice.

Source and release

Source repository

pspdfkit-labs/openclaw-nutrient-pdf

Source commit

b5489e8ac74c9db45b248bdf7bcea58e4655965a

Install command

openclaw plugins install clawhub:@nutrient-sdk/openclaw-nutrient-pdf

Metadata

  • Package: @nutrient-sdk/openclaw-nutrient-pdf
  • Created: 2026/04/16
  • Updated: 2026/06/10
  • Executes code: Yes
  • Source tag: master

Compatibility

  • Built with OpenClaw: 2026.6.2
  • Plugin API range: >=2026.4.1
  • Tags: latest
  • Files: 23