Optimizing PDF File Size with PyMuPDF: Three Essential Techniques

Harald Lieder·June 18, 2025

In this article

1. Dead-Weight Removal
2. Font Subsetting
3. Advanced Image Compression
4. Advanced Save Options

In today’s fast-paced workflows, bulky PDFs can become a bottleneck: they slow down email attachments, consume valuable storage, and frustrate mobile viewers. When high-resolution images, fully embedded fonts, and hidden metadata bloat your files, a targeted optimization strategy is essential. This article walks you through three core techniques (stripping metadata and other dead weight, font subsetting, and image compression), then covers the save options that make those savings take effect on disk. Along the way, it shows how PyMuPDF’s straightforward API makes it easy to turn oversized documents into leaner and faster PDFs.

1. Dead-Weight Removal

Why it’s popular
PDFs tend to accumulate “dead weight” in the form of hidden metadata (author names, timestamps, revision histories), page thumbnails, embedded files, long annotation chains — and even stale form-field values. All that extra baggage not only inflates file size but can leak sensitive information.

Typical use cases

Publishing whitepapers or specifications publicly
Embedding PDFs in websites or apps
Stripping private data before distribution

How PyMuPDF helps
With one call to Document.scrub(), you can clean out everything you don’t need:

import pymupdf

doc = pymupdf.open("input.pdf")
doc.scrub(
    metadata=True,         # clear the standard metadata (author, title, dates, ...)
    xml_metadata=True,     # remove XML (XMP) metadata
    attached_files=True,   # remove files held by file-attachment annotations
    embedded_files=True,   # remove document-level embedded files
    thumbnails=True,       # strip page thumbnails
    reset_fields=True,     # revert form fields to their default values
    reset_responses=True,  # remove replies attached to annotations
)
doc.ez_save("lean.pdf")

Here, scrub() wipes out unwanted objects, and ez_save() (pronounced “easy save”) guarantees that logically deleted content is physically purged from the output. The result is a smaller, privacy-safe PDF.

For optimum results, call scrub() only once, immediately before saving the file.

For details on file saving, see section “4. Advanced Save Options”.

2. Font Subsetting

Why it’s popular
Embedding full font files — often tens or hundreds of kilobytes each — turns a simple PDF into a heavy download, especially when the document only uses a handful of characters.

Typical use cases

Creating multilingual manuals with large character sets
Creating rich-text annotations
Creating or updating rich-text widgets (form fields)

How PyMuPDF helps
PyMuPDF can automatically subset embedded fonts, keeping only the glyphs actually used for each font:

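import pymupdf
doc = pymupdf.open("input.pdf")
doc.subset_fonts()  # keep only the glyphs each embedded font actually uses
doc.ez_save("output.pdf")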

This process slashes font-related overhead without sacrificing visual fidelity.

Important

Call subset_fonts() only once, immediately before saving the file.
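One way to check whether subsetting took effect is to look at the font names. By PDF convention, a subsetted font carries a tag of six uppercase letters followed by a plus sign (for example, ABCDEF+Helvetica). The sketch below is an illustration, not a PyMuPDF feature: the helper names is_subset_font and font_report are hypothetical, and it assumes that Document.get_page_fonts() returns tuples whose fourth item is the base font name.

```python
import re

# PDF convention: subset fonts are named "<6 uppercase letters>+<font name>".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def is_subset_font(basefont):
    """Return True if the font name carries a subset tag."""
    return bool(SUBSET_TAG.match(basefont))


def font_report(doc):
    """Map each font name used in `doc` to whether it looks subsetted.

    `doc` is expected to be an open pymupdf.Document.
    """
    report = {}
    for pno in range(doc.page_count):
        for font in doc.get_page_fonts(pno):
            basefont = font[3]  # (xref, ext, type, basefont, name, encoding)
            report[basefont] = is_subset_font(basefont)
    return report
```

Calling font_report(doc) before and after doc.subset_fonts() gives a rough picture of which fonts were rewritten. Fonts that are referenced but not embedded have no font file to shrink, so they would be expected to stay untagged.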

3. Advanced Image Compression

Why it’s popular
High-resolution images are often the single biggest contributor to PDF bloat. A few 300 DPI photos can add tens of megabytes — killing upload speeds, clogging inboxes, and frustrating mobile users.

Typical use cases

Emailing slide decks, product catalogs, or brochures
Publishing lightweight PDFs for mobile apps
Archiving scanned documents on space-restricted drives

How PyMuPDF helps
PyMuPDF’s Document.rewrite_images() gives you fine-grained control over how images are downsampled, recompressed, or converted to gray-scale:

import pymupdf
doc = pymupdf.open("input.pdf")
doc.rewrite_images(
    dpi_threshold=100,   # only process images above 100 DPI
    dpi_target=72,       # downsample to 72 DPI
    quality=60,          # JPEG quality level
    lossy=True,          # include / exclude lossy images
    lossless=True,       # include / exclude lossless images
    bitonal=True,        # include / exclude monochrome images
    color=True,          # include / exclude colored images
    gray=True,           # include / exclude gray-scale images
    set_to_gray=True,    # convert images to gray-scale before recompressing
)
doc.ez_save("compressed_images.pdf")

In this example, every image above 100 DPI is downsampled to 72 DPI, converted to gray-scale, and recompressed as JPEG at quality level 60, which often cuts image size by 70–90%.
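To see why the DPI settings matter, it helps to work out an image’s effective resolution: its width in pixels divided by the width it occupies on the page in inches (PDF coordinates are in points, 72 per inch). The helper names below are hypothetical and the calculation is plain arithmetic rather than a PyMuPDF API. It is a sketch for estimating which images would exceed a threshold and how many pixels downsampling removes.

```python
POINTS_PER_INCH = 72  # PDF page coordinates are measured in points


def effective_dpi(pixel_width, display_width_pt):
    """Resolution of an image as placed on the page, in dots per inch."""
    return pixel_width / (display_width_pt / POINTS_PER_INCH)


def remaining_pixel_fraction(source_dpi, target_dpi):
    """Fraction of pixels left after downsampling (both dimensions shrink)."""
    if target_dpi >= source_dpi:
        return 1.0  # nothing to downsample
    return (target_dpi / source_dpi) ** 2


# A 2550-pixel-wide scan shown across a full US Letter page (612 pt wide).
dpi = effective_dpi(2550, 612)
print(dpi)                                # 300.0
print(remaining_pixel_fraction(dpi, 72))  # about 0.058
```

Going from 300 DPI to 72 DPI keeps under 6% of the pixels. The saving in bytes will differ from that figure, because it also depends on the compression method and quality setting.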

Absolute Minimum Size
If you truly don’t need images, you can remove them entirely via redaction annotations:

for page in doc:
    page.add_redact_annot(page.rect)  # cover the whole page
    page.apply_redactions(
        images=pymupdf.PDF_REDACT_IMAGE_REMOVE,     # remove images
        graphics=pymupdf.PDF_REDACT_LINE_ART_NONE,  # don't touch graphics
        text=pymupdf.PDF_REDACT_TEXT_NONE,          # don't touch text
    )
doc.ez_save("images_stripped.pdf")

Here, redaction annotations purge all page images, leaving only text and vector graphics behind.

4. Advanced Save Options

All the scrubbing, image downsampling, and font subsetting you have done so far happened only in memory. Unless the save step physically purges unreferenced objects (“ghosts”) and compresses the PDF’s internal streams, the file on disk stays just as bulky.

PyMuPDF’s Document.save() parameters can trigger garbage collection and compression at write-time:

garbage=3: de-duplicates objects and removes all objects that are no longer referenced.
deflate=True: applies zlib compression to any uncompressed streams (images, fonts, etc.).
use_objstms=True: packs eligible PDF object definitions into object streams that can be compressed, often cutting a further 25% or more off the file size.

doc.save(
    "output.pdf",
    garbage=3,       # de-duplicate and drop unreferenced objects
    deflate=True,    # zlib-compress any loose streams
    use_objstms=True # pack object definitions into compressible object streams
)
# Or simply:
doc.ez_save("output.pdf")

Method ez_save() applies those options under the hood, ensuring your on-disk PDF truly reflects the optimizations you’ve made.

Conclusion

Combining dead-weight removal, image optimization, and font subsetting turns oversized PDFs into sleek, lean documents — ideal for email, mobile apps, and web publishing. PyMuPDF’s straightforward API puts these powerful techniques at your fingertips, so you can focus on your content rather than delivery constraints. Ready to supercharge your PDF workflow? Dive into the PyMuPDF documentation and start trimming!

Learn more

Share your projects and connect with others on the PyMuPDF Forum.