Automating PDF Form Filling and Flattening with PyMuPDF

Harald Lieder·June 2, 2025

PyMuPDF · Conversion · Forms

In this article

Introduction: Why This Matters
Why Automate PDF Form Processing?
How PyMuPDF Helps
Implementation: Filling and Flattening a PDF Using CSV Input
- Step 1: Preparing a CSV File
  - Example CSV (form_data.csv)
- Step 2: Automating PDF Processing
Choosing the Right Flattening Method
Conclusion

Introduction: Why This Matters

If you regularly work with forms, whether tax documents, contracts, or HR paperwork, you know how tedious manual entry can be. Automating the process saves time, reduces transcription errors, and keeps documents consistent.

But once a form is filled, should it remain editable? In many workflows, locking down form fields is crucial to prevent accidental edits, preserve data integrity, and secure signed content. That’s where flattening comes into play.

PyMuPDF lets you fill PDF forms programmatically and then lock them down in one of three ways, depending on your needs. One keeps the form fields in place so their values can still be extracted, one turns the fields into ordinary, searchable page content, and one converts whole pages into images. Let’s explore all three.

Why Automate PDF Form Processing?

PDF forms are everywhere, and automating their completion can simplify complex workflows. Some common use cases include:

✅ Workflow Automation – Quickly populating large volumes of documents with structured data.

✅ E-Signature & Compliance – Preventing unauthorized edits after signing.

✅ Government & Tax Forms – Standardizing form completion while maintaining exportable records.

Once filled, forms often shouldn’t remain interactive, but the flattening methods differ in how much access to the field data they preserve.

How PyMuPDF Helps

PyMuPDF makes it easy to process and finalize PDF forms, offering several methods of flattening:

Setting Fields to Read-Only

✅ Prevents user edits but still allows programmatic extraction of field data (e.g., exporting to CSV).

✅ Ideal for workflows that require maintaining form metadata for record-keeping.

Using bake() to Embed Form Data into Page Content

✅ Converts fields into static page text while keeping the PDF fully searchable.

✅ Keeps the former field content as regular page content rather than pixels, so it renders sharply at any zoom level and the layout stays stable.

Rendering the PDF as Images

✅ Completely eliminates interactivity, locking down the content.

❌ Removes searchability: the pages contain only pixels, so text can no longer be extracted without OCR.

Each method serves different purposes—some prioritize data accessibility, while others focus on permanent document structure.

Implementation: Filling and Flattening a PDF Using CSV Input

Step 1: Preparing a CSV File

Using a CSV file simplifies automation, ensuring structured field-value mapping.

Example CSV (form_data.csv)

field_name,value
Name,John Doe
Date,01/06/2025
Address,123 Elm Street
Consent,Yes
Choice,Option A

Step 2: Automating PDF Processing

This script iterates over all pages, fills fields from the CSV data, and then shows each of the three flattening options.

import pymupdf 
import csv

# Load the PDF
doc = pymupdf.open("fillable_form.pdf")

# Read data from CSV into a Python dictionary
data = {}
with open("form_data.csv", newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        data[row["field_name"]] = row["value"]

# Iterate over all pages and process fields
for page in doc:
    for field in page.widgets():
        if field.field_name in data:
            value = data[field.field_name]

            # Handling checkboxes: tick only if the CSV value matches the on-state
            if field.field_type == pymupdf.PDF_WIDGET_TYPE_CHECKBOX:
                if value == field.on_state():
                    field.field_value = value
                else:
                    field.field_value = False

            # Handling radio buttons: select only the button whose on-state matches
            elif field.field_type == pymupdf.PDF_WIDGET_TYPE_RADIOBUTTON:
                if value != field.on_state():
                    continue  # not the chosen button: leave it untouched
                field.field_value = value

            # Handling other field types
            else:
                field.field_value = value

            # Write the change back to the PDF (required after modifying a widget)
            field.update()

# Choose ONE of the flattening methods below and remove the others:

# Option 1: Set fields to read-only
for page in doc:
    for field in page.widgets():
        field.field_flags |= pymupdf.PDF_FIELD_IS_READ_ONLY
        field.update()  # write the changed flags back to the PDF

# Option 2: Flatten using bake(): convert fields to searchable text
doc.bake()

# Save the modified PDF (Options 1 and 2)
doc.save("flattened_form.pdf")

# Option 3: Flatten by converting all pages to images

# Open a new empty PDF to receive the page images
imaged = pymupdf.open()

# Render each page into an image 
for page in doc:  
    width, height = page.rect.width, page.rect.height

    # Make output page with same size
    img_page = imaged.new_page(width=width, height=height)

    # Render input page with desired resolution
    pix = page.get_pixmap(dpi=300)

    # Insert image into output page
    img_page.insert_image(page.rect, pixmap=pix)

# Save imaged PDF making sure to compress the images
imaged.save("imaged_form.pdf", deflate=True)

Choosing the Right Flattening Method

So, which method should you use?

✔ Need to export field data later? → Use read-only flags so fields remain accessible.

✔ Want to remove interactivity while keeping sharp, searchable text? → Use bake() to embed the field content into the page.

✔ Need to rule out even programmatic changes to the field content? → Convert the document into static images. This also removes the ability to search or extract text.

Conclusion

With PyMuPDF, automating form filling and flattening is fast, efficient, and flexible. You can:

✅ Populate forms dynamically using a CSV file.

✅ Choose the right flattening method for accessibility, security, or precision.

✅ Maintain searchability and document stability with bake().

Whether you are securing legal forms, processing tax records, or automating HR paperwork, this approach streamlines document management while preserving control over how data is stored and accessed.

Want to try it? Just plug in your own PDF and CSV, and start automating!