New Innovation from Artifex

PyMuPDF Layout

10× faster PDF parsing with layout analysis. Trained on structure, not images. CPU-only.

Reading the Document's DNA, Not Just Its Picture

We read PDF structure (fonts, spacing, positions, etc.) directly, then use a Graph Neural Network to understand the patterns.
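To make "structure" concrete, here is a minimal sketch of how font, size, and position data can be read from a page with the open-source PyMuPDF library. It only illustrates the kind of raw signals a PDF exposes. It is not the PyMuPDF Layout implementation, and the helper names `iter_spans` and `read_page_spans` are hypothetical. It assumes a recent PyMuPDF release where `import pymupdf` is available (older releases use `import fitz`).

```python
from typing import Any, Dict, Iterator, List, Tuple

# (text, font name, font size, bounding box) for one run of text
Span = Tuple[str, str, float, Tuple[float, ...]]


def iter_spans(page_dict: Dict[str, Any]) -> Iterator[Span]:
    """Yield (text, font, size, bbox) for every text span in a page dict.

    `page_dict` is expected to have the shape returned by PyMuPDF's
    page.get_text("dict"): blocks -> lines -> spans.
    """
    for block in page_dict.get("blocks", []):
        # Image blocks carry no "lines" key, so they are skipped here.
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                yield (span["text"], span["font"], span["size"], tuple(span["bbox"]))


def read_page_spans(pdf_path: str, page_number: int = 0) -> List[Span]:
    """Open a PDF and list the text spans of one page (hypothetical helper)."""
    import pymupdf  # imported lazily; requires `pip install pymupdf`

    with pymupdf.open(pdf_path) as doc:
        return list(iter_spans(doc[page_number].get_text("dict")))
```

Calling `read_page_spans("example.pdf")` would return one tuple per text span. A large bold font near the top of the page, for instance, is the sort of signal that hints at a title without rendering the page as an image.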

CPU Processing, GPU-Level Accuracy

While VLMs burn through expensive compute just to recognize titles and tables, our CPU-based approach handles all the structure extraction.

Built on Decades of PDF Knowledge

Decades of PDF expertise, engineered for modern workflows and real-world AI demands.

AI Solutions for Modern Document Processing

We empower your AI solutions with fast, efficient & precise document processing and data extraction.

  • Reduce hallucinations

  • High-fidelity data extraction

  • Perfect for RAG & LLM environments

Empowering Innovators with AI-Driven Solutions

PyMuPDF4LLM

The SDK Developers Need for AI Pipelines

Our PyMuPDF4LLM SDK integrates seamlessly with Hugging Face, LangChain, and LlamaIndex, simplifying document processing. With powerful tools, focus on building AI apps without the hassle of data extraction.

Our Extraction Features

  • Support for multi-column pages & tables

  • Support for image and vector graphics extraction (with references to the extracted files included in the Markdown text)

  • Support for page chunking output

  • Direct support for output as LlamaIndex Documents
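As a rough illustration of the features listed above, the sketch below calls PyMuPDF4LLM with page chunking and image extraction turned on, and separately loads a PDF as LlamaIndex Documents. The wrapper names (`pdf_to_page_chunks`, `pdf_to_llama_documents`, `chunk_texts`) and the `images` folder are assumptions made for this example. The LlamaIndex path also assumes that `llama-index` is installed alongside `pymupdf4llm`.

```python
from typing import Any, Dict, List


def pdf_to_page_chunks(pdf_path: str, image_dir: str = "images") -> List[Dict[str, Any]]:
    """Return one dict per page, each holding that page's Markdown text."""
    import pymupdf4llm  # imported lazily; requires `pip install pymupdf4llm`

    return pymupdf4llm.to_markdown(
        pdf_path,
        page_chunks=True,      # one dict per page instead of a single string
        write_images=True,     # save images and reference them in the Markdown
        image_path=image_dir,  # folder for the saved image files (assumed name)
    )


def pdf_to_llama_documents(pdf_path: str) -> list:
    """Load a PDF as a list of LlamaIndex Document objects."""
    import pymupdf4llm  # also needs llama-index installed for this reader

    return pymupdf4llm.LlamaMarkdownReader().load_data(pdf_path)


def chunk_texts(chunks: List[Dict[str, Any]]) -> List[str]:
    """Pull the Markdown string out of each page chunk."""
    return [chunk["text"] for chunk in chunks]
```

With these helpers, `chunk_texts(pdf_to_page_chunks("example.pdf"))` would give a list of per-page Markdown strings ready to embed or index.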

Fast, Chunked, and Reliable Data Extraction

Our data extraction is designed for speed and efficiency, delivering results in chunks with dependable per-page markdown. Get structured, accurate data without delays.

from langchain_community.document_loaders import PyMuPDFLoader

# Load the PDF file
pdf_path = "example.pdf"  # Replace with your actual PDF file
loader = PyMuPDFLoader(pdf_path)

# Load and extract document data
documents = loader.load()

# Print extracted text from each page
for i, doc in enumerate(documents):
    print(f"Page {i+1}:\n")
    print(doc.page_content[:1000])  # Print first 1000 characters

Learn How Our AI Solutions Will Help You

Unleash More With a License

Enjoy the freedom to customize, distribute, and scale without limits. Upgrade to a commercial license and make our product truly yours.

Unlimited distribution without requirements

No need to disclose your code

Technical support available

Not Just SDKs, Artifex Also Provides AI-Powered Solutions for Everyone

AI-Powered Invoice Parsing for Effortless PDF Automation

Our AI Invoice Parser API streamlines document processing for non-developers. Easily connect with Make, Zapier, and 7,000+ other tools. No coding required.