A powerful Model Context Protocol server for LLM-driven PDF document analysis and exploration
A Model Context Protocol (MCP) server for investigating PDF object trees with lazy loading support. This tool allows LLMs to efficiently explore PDF document structure without overwhelming token limits.
- Lazy Loading: Explore PDF structure without loading entire object trees
- Path Navigation: Navigate through PDF objects using dot notation (e.g., Pages.Kids.0)
- Selective Resolution: Resolve specific indirect objects on demand
- Token Efficient: Massive reduction in response sizes compared to full tree dumps
- Type Safe: Comprehensive type hints and error handling
You'll need python and nodejs installed on your machine. You can optionally use asdf.
Finally install required tools with:
git clone https://github.com/PSPDFKit/nutrient-pdf-mcp-server.git
cd nutrient-pdf-mcp-server
asdf install
# Install pipx for Python
python -m pip install --user pipx
Proceed with the rest of the installation after that.
git clone https://github.com/PSPDFKit/nutrient-pdf-mcp-server.git
cd nutrient-pdf-mcp-server
make install-dev # Sets up development environment
Recommended: Build and Install
pip install build
make build
pipx install dist/nutrient_pdf_mcp-1.0.0-py3-none-any.whl
claude mcp add nutrient-pdf-mcp nutrient-pdf-mcp
If using asdf, you might need to configure pipx with the following before running:
export PIPX_DEFAULT_PYTHON=$(asdf which python)
pipx install dist/nutrient_pdf_mcp-1.0.0-py3-none-any.whl
Development Mode
make install-dev
claude mcp add nutrient-pdf-mcp "$(pwd)/venv/bin/python" -m pdf_mcp.server
{
  "mcpServers": {
    "nutrient-pdf-mcp": {
      "command": "python",
      "args": ["-m", "pdf_mcp.server"]
    }
  }
}
Nutrient PDF MCP Server - Get JSON representation of PDF object tree with lazy loading.
Parameters:
- pdf_path (required): Path to the PDF file
- object_id (optional): Specific object ID to retrieve (e.g., '1 0')
- path (optional): Object path to navigate (e.g., 'Pages.Kids.0')
- mode (optional): Parsing mode - 'lazy' (default) or 'full'
Examples:
{
  "pdf_path": "document.pdf",
  "mode": "lazy"
}

{
  "pdf_path": "document.pdf",
  "path": "Pages.Kids.0",
  "mode": "lazy"
}
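The object_id example above ('1 0') looks like an object number followed by a generation number. Assuming that format holds, a client could split such a string into the objnum and gennum values before making a follow-up call. The helper name parse_object_id is hypothetical and is not part of the server.

```python
from typing import Tuple


def parse_object_id(object_id: str) -> Tuple[int, int]:
    """Split an ID such as '1 0' into (object number, generation number).

    Assumption: IDs are two whitespace-separated non-negative integers,
    as suggested by the '1 0' example in the parameter list.
    """
    parts = object_id.split()
    if len(parts) != 2:
        raise ValueError(f"expected '<objnum> <gennum>', got {object_id!r}")
    objnum, gennum = (int(part) for part in parts)
    if objnum < 0 or gennum < 0:
        raise ValueError("object and generation numbers must be non-negative")
    return objnum, gennum


objnum, gennum = parse_object_id("1 0")
print(objnum, gennum)
```

Rejecting malformed IDs on the client side avoids spending a tool call on a request that cannot succeed.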
Nutrient PDF MCP Server - Resolve a specific indirect object by its object and generation numbers.
Parameters:
- pdf_path (required): Path to the PDF file
- objnum (required): PDF object number (e.g., 3)
- gennum (optional): PDF generation number (defaults to 0)
- depth (optional): Resolution depth - 'shallow' (default) or 'deep'
Examples:
{
  "pdf_path": "document.pdf",
  "objnum": 3,
  "gennum": 0,
  "depth": "shallow"
}
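For readers driving the server without an MCP client library, the example arguments above would typically travel inside a JSON-RPC tools/call request, as defined by the MCP specification. The sketch below only builds the request text. The helper name tool_call is hypothetical, and the tool names are taken from the usage examples elsewhere in this README.

```python
import json
from itertools import count

# Monotonic request IDs, as JSON-RPC expects a unique id per request.
_request_ids = count(1)


def tool_call(name: str, **arguments) -> str:
    """Serialize one MCP tools/call request as a JSON string."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


steps = [
    tool_call("get_pdf_object_tree", pdf_path="document.pdf", mode="lazy"),
    tool_call("resolve_indirect_object", pdf_path="document.pdf",
              objnum=3, gennum=0, depth="shallow"),
]
for step in steps:
    print(step)
```

Transport (stdio framing, initialization handshake) is left out here; an MCP client such as Claude handles those parts.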
# Run the server
make serve
# Or run with debug logging
make serve-debug
- parser.py: Main PDF parsing logic with lazy loading support
- server.py: MCP server implementation
- types.py: Type definitions for PDF objects and responses
- exceptions.py: Custom exception classes
All PDF objects are serialized into a consistent JSON format:
{
  "type": "dict",
  "value": {
    "/Type": { "type": "name", "value": "/Pages" },
    "/Kids": {
      "type": "array",
      "value": [{ "type": "indirect_ref", "objnum": 2, "gennum": 0 }]
    }
  }
}
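Because every node carries a "type" field, a client can walk a response generically. As a minimal sketch, the function below collects the indirect references still left unresolved in a response, which are the candidates for a follow-up resolve call. The function name collect_indirect_refs is hypothetical; only the node shapes shown above are assumed.

```python
from typing import Any, List, Tuple


def collect_indirect_refs(node: Any) -> List[Tuple[int, int]]:
    """Return (objnum, gennum) for every indirect_ref node, in document order."""
    refs: List[Tuple[int, int]] = []
    if isinstance(node, dict):
        if node.get("type") == "indirect_ref":
            refs.append((node["objnum"], node["gennum"]))
        else:
            # Descend into both the wrapper fields and the PDF dictionary entries.
            for child in node.values():
                refs.extend(collect_indirect_refs(child))
    elif isinstance(node, list):
        for child in node:
            refs.extend(collect_indirect_refs(child))
    return refs


pages = {
    "type": "dict",
    "value": {
        "/Type": {"type": "name", "value": "/Pages"},
        "/Kids": {
            "type": "array",
            "value": [{"type": "indirect_ref", "objnum": 2, "gennum": 0}],
        },
    },
}
print(collect_indirect_refs(pages))
```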
The lazy loading system provides massive token savings:
- Lazy mode: ~5-50 lines (minimal tokens)
- Shallow resolution: ~50-100 lines (reasonable tokens)
- Deep resolution: 500+ lines (use sparingly)
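To illustrate why the sizes differ so much, here is a toy model of shallow versus deep resolution. It is not the server's implementation: the OBJECTS table and the function names are invented for the example, and the sketch assumes a deep resolution expands references recursively while leaving cycles (such as a page's /Parent link) as references.

```python
from typing import Any, Dict, FrozenSet, Tuple

Ref = Tuple[int, int]

# Hypothetical object table: (objnum, gennum) -> serialized object.
OBJECTS: Dict[Ref, Any] = {
    (1, 0): {"type": "dict", "value": {
        "/Type": {"type": "name", "value": "/Pages"},
        "/Kids": {"type": "array", "value": [
            {"type": "indirect_ref", "objnum": 2, "gennum": 0}]},
    }},
    (2, 0): {"type": "dict", "value": {
        "/Type": {"type": "name", "value": "/Page"},
        "/Parent": {"type": "indirect_ref", "objnum": 1, "gennum": 0},
    }},
}


def resolve(ref: Ref, depth: str = "shallow") -> Any:
    """Shallow: return the object with nested references left intact.
    Deep: recursively replace references with the objects they point to."""
    obj = OBJECTS[ref]
    if depth == "shallow":
        return obj
    return _expand(obj, frozenset({ref}))


def _expand(node: Any, seen: FrozenSet[Ref]) -> Any:
    if isinstance(node, list):
        return [_expand(child, seen) for child in node]
    if not isinstance(node, dict):
        return node
    if node.get("type") == "indirect_ref":
        ref = (node["objnum"], node["gennum"])
        if ref in seen or ref not in OBJECTS:
            return node  # keep cycles and unknown objects as references
        return _expand(OBJECTS[ref], seen | {ref})
    return {key: _expand(value, seen) for key, value in node.items()}


shallow = resolve((1, 0))
deep = resolve((1, 0), depth="deep")
print(shallow["value"]["/Kids"]["value"][0]["type"])
print(deep["value"]["/Kids"]["value"][0]["type"])
```

In this model the deep result grows with every object reachable from the starting point, which matches the advice to use deep resolution sparingly.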
- Get overview: get_pdf_object_tree(pdf_path="document.pdf", mode="lazy")
- Navigate to pages: get_pdf_object_tree(pdf_path="document.pdf", path="Pages", mode="lazy")
- Resolve specific page: resolve_indirect_object(pdf_path="document.pdf", objnum=3, gennum=0, depth="shallow")
- Deep dive when needed: resolve_indirect_object(pdf_path="document.pdf", objnum=3, gennum=0, depth="deep")

- "Pages" - Navigate to Pages object
- "Pages.Kids" - Get Kids array from Pages
- "Pages.Kids.0" - Get first page
- "Pages.Kids.0.MediaBox.2" - Get the upper-right x coordinate from the MediaBox array (the page width when the box starts at x = 0)
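The dot-notation paths above can be read as a sequence of dictionary keys and array indices. As a rough sketch of that idea, the function below walks an already serialized tree one segment at a time. The name navigate is hypothetical, the sample tree has its objects inlined, and the real server presumably also resolves indirect references along the way, which this sketch does not do.

```python
from typing import Any


def navigate(node: Any, path: str) -> Any:
    """Follow a dot-notation path such as 'Pages.Kids.0' through serialized nodes."""
    if not path:
        return node
    for segment in path.split("."):
        if node["type"] == "dict":
            # PDF dictionary keys are names, serialized with a leading slash.
            node = node["value"]["/" + segment]
        elif node["type"] == "array":
            node = node["value"][int(segment)]
        else:
            raise KeyError(f"cannot descend into {node['type']!r} at {segment!r}")
    return node


# Hypothetical catalog with the page tree inlined for illustration.
catalog = {"type": "dict", "value": {
    "/Pages": {"type": "dict", "value": {
        "/Type": {"type": "name", "value": "/Pages"},
        "/Kids": {"type": "array", "value": [
            {"type": "dict", "value": {
                "/Type": {"type": "name", "value": "/Page"}}},
        ]},
    }},
}}

first_page = navigate(catalog, "Pages.Kids.0")
print(first_page["value"]["/Type"]["value"])
```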
# Set up development environment
make install-dev
# Run all quality checks (format, lint, typecheck, test)
make quality
# Or run individual commands
make test # Run tests
make format # Format code
make lint # Run linter
make typecheck # Type checking
nutrient-pdf-mcp-server/
├── pdf_mcp/
│ ├── __init__.py
│ ├── server.py # MCP server
│ ├── parser.py # PDF parsing logic
│ ├── types.py # Type definitions
│ └── exceptions.py # Custom exceptions
├── tests/ # Test suite
├── res/ # Sample PDFs
├── pyproject.toml # Project configuration
└── README.md
# Build the package
make build
# Upload to test PyPI first
twine upload --repository testpypi dist/*
# Upload to production PyPI
twine upload dist/*
After publishing, users can install with:
pipx install nutrient-pdf-mcp
# or
pip install --user nutrient-pdf-mcp
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Ensure code quality checks pass
- Submit a pull request
MIT License - see LICENSE file for details.