Nutrient PDF MCP Server

A powerful Model Context Protocol server for LLM-driven PDF document analysis and exploration

A Model Context Protocol (MCP) server for investigating PDF object trees with lazy loading support. This tool allows LLMs to efficiently explore PDF document structure without overwhelming token limits.

Features

Lazy Loading: Explore PDF structure without loading entire object trees
Path Navigation: Navigate through PDF objects using dot notation (e.g., Pages.Kids.0)
Selective Resolution: Resolve specific indirect objects on demand
Token Efficient: Massive reduction in response sizes compared to full tree dumps
Type Safe: Comprehensive type hints and error handling

Installation

Optional `asdf` setup

You'll need python and nodejs installed on your machine. You can optionally use asdf.

Finally install required tools with:

git clone https://github.com/PSPDFKit/nutrient-pdf-mcp-server.git
cd nutrient-pdf-mcp-server
asdf install

# Install pipx for Python
python -m pip install --user pipx

Proceed with the rest of the installation after that.

Quick Start

git clone https://github.com/PSPDFKit/nutrient-pdf-mcp-server.git
cd nutrient-pdf-mcp-server
make install-dev  # Sets up development environment

For Claude Code CLI

Recommended: Build and Install

pip install build
make build
pipx install dist/nutrient_pdf_mcp-1.0.0-py3-none-any.whl
claude mcp add nutrient-pdf-mcp nutrient-pdf-mcp

If using asdf, you might need to configure pipx with the following before running:

export PIPX_DEFAULT_PYTHON=$(asdf which python)
pipx install dist/nutrient_pdf_mcp-1.0.0-py3-none-any.whl

Development Mode

make install-dev
claude mcp add nutrient-pdf-mcp "$(pwd)/venv/bin/python" -m pdf_mcp.server

Manual Configuration

{
  "mcpServers": {
    "nutrient-pdf-mcp": {
      "command": "python",
      "args": ["-m", "pdf_mcp.server"]
    }
  }
}

Available Tools

`get_pdf_object_tree`

Nutrient PDF MCP Server - Get JSON representation of PDF object tree with lazy loading.

Parameters:

pdf_path (required): Path to the PDF file
object_id (optional): Specific object ID to retrieve (e.g., '1 0')
path (optional): Object path to navigate (e.g., 'Pages.Kids.0')
mode (optional): Parsing mode - 'lazy' (default) or 'full'

Examples:

{
  "pdf_path": "document.pdf",
  "mode": "lazy"
}

{
  "pdf_path": "document.pdf",
  "path": "Pages.Kids.0",
  "mode": "lazy"
}

`resolve_indirect_object`

Nutrient PDF MCP Server - Resolve a specific indirect object by its object and generation numbers.

Parameters:

pdf_path (required): Path to the PDF file
objnum (required): PDF object number (e.g., 3)
gennum (optional): PDF generation number (defaults to 0)
depth (optional): Resolution depth - 'shallow' (default) or 'deep'

Examples:

{
  "pdf_path": "document.pdf",
  "objnum": 3,
  "gennum": 0,
  "depth": "shallow"
}

Command Line Usage

# Run the server
make serve

# Or run with debug logging
make serve-debug

Architecture

Core Components

parser.py: Main PDF parsing logic with lazy loading support
server.py: MCP server implementation
types.py: Type definitions for PDF objects and responses
exceptions.py: Custom exception classes

Response Types

All PDF objects are serialized into a consistent JSON format:

{
  "type": "dict",
  "value": {
    "/Type": { "type": "name", "value": "/Pages" },
    "/Kids": {
      "type": "array",
      "value": [{ "type": "indirect_ref", "objnum": 2, "gennum": 0 }]
    }
  }
}

Token Efficiency

The lazy loading system provides massive token savings:

Lazy mode: ~5-50 lines (minimal tokens)
Shallow resolution: ~50-100 lines (reasonable tokens)
Deep resolution: 500+ lines (use sparingly)

Examples

Exploring PDF Structure

Get overview: get_pdf_object_tree(path="document.pdf", mode="lazy")
Navigate to pages: get_pdf_object_tree(path="document.pdf", path="Pages", mode="lazy")
Resolve specific page: resolve_indirect_object(objnum=3, gennum=0, depth="shallow")
Deep dive when needed: resolve_indirect_object(objnum=3, gennum=0, depth="deep")

Path Navigation Examples

"Pages" - Navigate to Pages object
"Pages.Kids" - Get Kids array from Pages
"Pages.Kids.0" - Get first page
"Pages.Kids.0.MediaBox.2" - Get width from MediaBox array

Development

Quick Start

# Set up development environment
make install-dev

# Run all quality checks (format, lint, typecheck, test)
make quality

# Or run individual commands
make test          # Run tests
make format        # Format code
make lint          # Run linter
make typecheck     # Type checking

Project Structure

nutrient-pdf-mcp-server/
├── pdf_mcp/
│   ├── __init__.py
│   ├── server.py          # MCP server
│   ├── parser.py          # PDF parsing logic
│   ├── types.py           # Type definitions
│   └── exceptions.py      # Custom exceptions
├── tests/                 # Test suite
├── res/                   # Sample PDFs
├── pyproject.toml         # Project configuration
└── README.md

Publishing to PyPI

# Build the package
make build

# Upload to test PyPI first
twine upload --repository testpypi dist/*

# Upload to production PyPI
twine upload dist/*

After publishing, users can install with:

pipx install nutrient-pdf-mcp
# or
pip install --user nutrient-pdf-mcp

Contributing

Fork the repository
Create a feature branch
Make your changes with tests
Ensure code quality checks pass
Submit a pull request

License

MIT License - see LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.vscode		.vscode
pdf_mcp		pdf_mcp
tests		tests
.gitignore		.gitignore
.tool-versions		.tool-versions
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Nutrient PDF MCP Server

Features

Installation

Optional `asdf` setup

Quick Start

For Claude Code CLI

Manual Configuration

Available Tools

`get_pdf_object_tree`

`resolve_indirect_object`

Command Line Usage

Architecture

Core Components

Response Types

Token Efficiency

Examples

Exploring PDF Structure

Path Navigation Examples

Development

Quick Start

Project Structure

Publishing to PyPI

Contributing

License

Related Projects

About

Uh oh!

Contributors 2

Uh oh!

Languages

License

PSPDFKit/nutrient-pdf-mcp-server

Folders and files

Latest commit

History

Repository files navigation

Nutrient PDF MCP Server

Features

Installation

Optional asdf setup

Quick Start

For Claude Code CLI

Manual Configuration

Available Tools

get_pdf_object_tree

resolve_indirect_object

Command Line Usage

Architecture

Core Components

Response Types

Token Efficiency

Examples

Exploring PDF Structure

Path Navigation Examples

Development

Quick Start

Project Structure

Publishing to PyPI

Contributing

License

Related Projects

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors 2

Uh oh!

Languages

Optional `asdf` setup

`get_pdf_object_tree`

`resolve_indirect_object`