How to Search and Replace Text in PDFs Using PyMuPDF

Jamie Lemon·July 29, 2025

PDF manipulation has always been a challenging task for developers, but PyMuPDF makes it surprisingly straightforward. Whether you need to update company names, fix typos, or replace outdated information across multiple documents, PyMuPDF provides powerful tools for searching and replacing text in PDF files.

What is PyMuPDF?

PyMuPDF is a Python binding for MuPDF, a lightweight PDF toolkit. It's fast, memory-efficient, and offers comprehensive PDF manipulation capabilities including text extraction, rendering, and modification. Unlike some PDF libraries that create new documents, PyMuPDF can modify existing PDFs while preserving their structure and formatting.

Installation

First, install PyMuPDF using pip:

pip install PyMuPDF

Basic Text Search and Replace

Here's a simple example that demonstrates the core functionality:

import pymupdf

def search_and_replace_text(pdf_path, search_text, replace_text, output_path):
    # Open the PDF document
    doc = pymupdf.open(pdf_path)
    
    # Iterate through each page
    for page_num in range(len(doc)):
        page = doc[page_num]
        
        # Search for the text
        text_instances = page.search_for(search_text)
        
        # Replace each instance
        for rect in text_instances:
            # Add a white rectangle to cover the old text
            page.draw_rect(rect, color=(1, 1, 1), fill=(1, 1, 1))

            # Insert the new text. insert_text expects the baseline start of
            # the first character, so anchor at the bottom-left corner of the
            # rectangle rather than the top-left (an approximation)
            page.insert_text(rect.bl, replace_text, fontsize=12, color=(0, 0, 0))
    
    # Save the modified document
    doc.save(output_path)
    doc.close()

# Usage example
search_and_replace_text(
    "input.pdf", 
    "Hello World",
    "Goodbye!",
    "output.pdf"
)

Advanced Search and Replace with Better Formatting

The basic approach above works but doesn't preserve the original font formatting. Here's an improved version that attempts to match the original text properties:

import pymupdf

def advanced_search_replace(pdf_path, search_text, replace_text, output_path):
    doc = pymupdf.open(pdf_path)

    for page_num in range(len(doc)):
        page = doc[page_num]

        # Get text blocks with formatting information
        blocks = page.get_text("dict")

        for block in blocks["blocks"]:
            if "lines" in block:
                for line in block["lines"]:
                    for span in line["spans"]:
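                        # Match with the same case rules that str.replace uses below
                        if search_text in span["text"]:
                            # Extract font information (font and flags are
                            # shown for reference and are not used below)
                            font = span["font"]
                            size = span["size"]
                            flags = span["flags"]
                            # span["color"] is an sRGB integer; insert_text
                            # expects RGB floats in the range 0 to 1
                            color = pymupdf.sRGB_to_pdf(span["color"])

                            # Get the bounding box
                            rect = pymupdf.Rect(span["bbox"])

                            # Replace the text
                            updated_text = span["text"].replace(search_text, replace_text)

                            # Cover old text with white rectangle
                            page.draw_rect(rect, color=(1, 1, 1), fill=(1, 1, 1))

                            # Insert new text at the span's baseline origin,
                            # keeping the original size and color
                            page.insert_text(
                                span["origin"],
                                updated_text,
                                fontsize=size,
                                color=color
                            )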


    doc.save(output_path)
    doc.close()

# Usage example
advanced_search_replace(
    "input.pdf",
    "Hello World",
    "Goodbye!",
    "output.pdf"
)
Note

Font name matching has not been attempted here, as it is a little more involved and requires matching the extracted font name against its reference ID. See the details in the documentation for fontname.
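
As a rough stand-in for exact font matching, the span's flags can be used to pick one of PyMuPDF's built-in Base 14 fonts. The sketch below is an approximation only: the helper names are hypothetical, it assumes the documented span flag bits (2 for italic, 4 for serifed, 8 for monospaced, 16 for bold), and the result will not reproduce embedded or custom fonts. It also converts the span's sRGB color integer into the RGB floats that insert_text expects.

```python
def srgb_to_rgb_floats(srgb):
    """Convert an sRGB integer such as 0xFF0000 to (r, g, b) floats in 0..1."""
    red = (srgb >> 16) & 0xFF
    green = (srgb >> 8) & 0xFF
    blue = srgb & 0xFF
    return (red / 255, green / 255, blue / 255)


def guess_base14_fontname(flags):
    """Pick a built-in font code from span flags (an approximation)."""
    italic = bool(flags & 2)       # bit 1: italic
    serifed = bool(flags & 4)      # bit 2: serifed
    monospaced = bool(flags & 8)   # bit 3: monospaced
    bold = bool(flags & 16)        # bit 4: bold

    if monospaced:
        family = "co"   # Courier
    elif serifed:
        family = "ti"   # Times
    else:
        family = "he"   # Helvetica

    if bold and italic:
        return family + "bi"
    if bold:
        return family + "bo"
    if italic:
        return family + "it"
    # The regular weights use their own codes
    return {"co": "cour", "ti": "tiro", "he": "helv"}[family]


def span_style(span):
    """Collect keyword arguments for insert_text from a "dict" span."""
    return {
        "fontname": guess_base14_fontname(span["flags"]),
        "fontsize": span["size"],
        "color": srgb_to_rgb_floats(span["color"]),
    }
```

With these helpers, the insertion call in the example above could become page.insert_text(span["origin"], updated_text, **span_style(span)).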

Handling Multiple Replacements

For bulk replacements, you can create a more flexible function that accepts a dictionary of search-replace pairs:

import pymupdf

def bulk_search_replace(pdf_path, replacements, output_path):
    """
    Replace multiple text strings in a PDF.

    Args:
        pdf_path: Path to input PDF
        replacements: Dictionary with search terms as keys and replacements as values
        output_path: Path for output PDF
    """
    doc = pymupdf.open(pdf_path)

    for page_num in range(len(doc)):
        page = doc[page_num]

        for search_text, replace_text in replacements.items():
            text_instances = page.search_for(search_text)

            for rect in text_instances:
                page.draw_rect(rect, color=(1, 1, 1), fill=(1, 1, 1))
                # insert_text takes the baseline start point, so use the
                # bottom-left corner rather than the top-left
                page.insert_text(rect.bl, replace_text, fontsize=12)

    doc.save(output_path)
    doc.close()

# Usage example
replacements = {
    "Acme Corp": "Super Corp",
    "2023": "2024",
    "john@acme.com": "john@supercorp.com"
}

bulk_search_replace("input.pdf", replacements, "output.pdf")
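
One thing to watch with a dictionary of pairs is overlap: if one replacement value is also a search key (say "2023" becomes "2024" and "2024" becomes "2025"), applying the pairs one after another may change the same text twice. A minimal sketch of a single-pass alternative for plain strings, such as the span text used in the span-based examples, is shown here. The function name replace_many is hypothetical and not part of PyMuPDF.

```python
import re


def replace_many(text, replacements):
    """Apply all search/replace pairs to text in a single pass."""
    if not replacements:
        return text
    # Try longer keys first so a long key wins over a shorter overlapping one
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # A function replacement keeps any backslashes in the values literal
    return pattern.sub(lambda match: replacements[match.group(0)], text)


print(replace_many("Report 2023, draft for 2024", {"2023": "2024", "2024": "2025"}))
# Report 2024, draft for 2025
```

Because every match is taken from the original string, text that was just inserted is never matched again.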

Case-Insensitive Search

Page.search_for already ignores case for ASCII characters. To control case handling yourself, for example when rewriting whole spans, you'll need to handle the matching manually:

import re

import pymupdf

def case_insensitive_replace(pdf_path, search_text, replace_text, output_path):
    doc = pymupdf.open(pdf_path)

    # Compile once; re.escape makes the search text match literally
    pattern = re.compile(re.escape(search_text), re.IGNORECASE)

    for page_num in range(len(doc)):
        page = doc[page_num]

        # Get all text on the page
        text_dict = page.get_text("dict")

        for block in text_dict["blocks"]:
            if "lines" in block:
                for line in block["lines"]:
                    for span in line["spans"]:
                        original_text = span["text"]

                        # Replace all occurrences, ignoring case. The lambda
                        # keeps backslashes in replace_text literal
                        new_text = pattern.sub(lambda _: replace_text, original_text)

                        if new_text != original_text:
                            rect = pymupdf.Rect(span["bbox"])

                            # Cover the old span, then write the new text at
                            # the span's baseline origin
                            page.draw_rect(rect, color=(1, 1, 1), fill=(1, 1, 1))
                            page.insert_text(
                                span["origin"],
                                new_text,
                                fontsize=span["size"],
                                # Convert the sRGB integer to RGB floats
                                color=pymupdf.sRGB_to_pdf(span["color"])
                            )

    doc.save(output_path)
    doc.close()

# Usage example
case_insensitive_replace(
    "input.pdf",
    "HeLlo WoRlD",
    "Goodbye!",
    "output.pdf"
)

Regular Expression Support

For more complex pattern matching, you can use regular expressions:

import pymupdf
import re

def regex_replace(pdf_path, pattern, replacement, output_path):
    r"""
    Replace text using regular expressions.

    Args:
        pattern: Regular expression pattern to search for
        replacement: Replacement string (can include group references like \1, \2)
    """
    doc = pymupdf.open(pdf_path)
    compiled_pattern = re.compile(pattern)

    for page_num in range(len(doc)):
        page = doc[page_num]
        text_dict = page.get_text("dict")

        for block in text_dict["blocks"]:
            if "lines" in block:
                for line in block["lines"]:
                    for span in line["spans"]:
                        original_text = span["text"]
                        new_text = compiled_pattern.sub(replacement, original_text)

                        if new_text != original_text:
                            rect = pymupdf.Rect(span["bbox"])

                            # Cover the old span, then write the new text at
                            # the span's baseline origin
                            page.draw_rect(rect, color=(1, 1, 1), fill=(1, 1, 1))
                            page.insert_text(
                                span["origin"],
                                new_text,
                                fontsize=span["size"]
                            )

    doc.save(output_path)
    doc.close()

# Example: Replace all email addresses
regex_replace(
    "input.pdf",
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
    "email@hidden.com",
    "output.pdf"
)
Note

The above example works one span at a time, so it will not replace emails that are split across multiple lines. Double-check your output for these edge cases!
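
The replacement string is handed to Python's re module, so group references such as \1 and \2 can reorder the captured parts of a match. The sketch below shows this on a plain string before any PDF is involved. The date pattern and the to_iso_dates helper are hypothetical examples, not part of PyMuPDF.

```python
import re

# Hypothetical pattern: dates written as MM/DD/YYYY, such as 07/29/2025
date_pattern = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")


def to_iso_dates(text):
    """Rewrite MM/DD/YYYY dates as YYYY-MM-DD."""
    # \3 is the year, \1 the month, \2 the day
    return date_pattern.sub(r"\3-\1-\2", text)


print(to_iso_dates("Published 07/29/2025"))
# Published 2025-07-29
```

The same pattern and replacement strings could then be passed to regex_replace as its pattern and replacement arguments. Use raw strings (the r prefix) so the backslashes reach the re module unchanged.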

Error Handling and Best Practices

Always include proper error handling in production code. The code below checks whether the PDF is password protected, and it catches any exception raised during processing, reporting the error and returning False instead of crashing.

import pymupdf

def safe_search_replace(pdf_path, search_text, replace_text, output_path):
    try:
        # The with block closes the document on every exit path
        with pymupdf.open(pdf_path) as doc:
            if doc.is_encrypted:
                print("PDF is password protected")
                return False

            changes_made = False

            for page in doc:
                for rect in page.search_for(search_text):
                    changes_made = True
                    page.draw_rect(rect, color=(1, 1, 1), fill=(1, 1, 1))
                    # insert_text takes the baseline start point, so use
                    # the bottom-left corner rather than the top-left
                    page.insert_text(rect.bl, replace_text, fontsize=12)
                    print("Text replacement made")

            if changes_made:
                doc.save(output_path)
                print(f"Successfully saved modified PDF to {output_path}")
            else:
                print(f"No instances of '{search_text}' found")

        return True

    except Exception as e:
        print(f"Error processing PDF: {e}")
        return False


# Usage example
safe_search_replace(
    "input.pdf",
    "Hello World",
    "Goodbye!",
    "output.pdf"
)

But Wait? What About the Replaced Text?

So far, the examples locate the rectangle around the matched text and cover it with a filled rectangle (white, on the assumption that the page background is white). This only obscures the text visually: anyone who runs text extraction on the PDF can still read the original. To remove the text completely before replacing it, use redactions.

Redacting before Replacing

The following example removes the existing text and replaces it with our new data:

import pymupdf

def search_redact_and_replace_text(
    pdf_path,
    search_text,
    replace_text,
    output_path,
    fill_color=(1, 1, 1),
    text_color=(0, 0, 0),
    fontname="tiro",
    fontsize=14,
):
    # Open the PDF document
    doc = pymupdf.open(pdf_path)

    # Iterate through each page
    for page_num in range(len(doc)):
        page = doc[page_num]

        # Search for text instances
        text_instances = page.search_for(search_text)

        # Replace each instance
        for rect in text_instances:

            # Create a redaction annotation that carries the replacement text
            redact_area = page.add_redact_annot(
                rect,
                text=replace_text,
                fill=fill_color,
                text_color=text_color,
                fontname=fontname,
                fontsize=fontsize,
            )

            # Set additional properties
            redact_area.set_info(content="Redacted sensitive information")
            redact_area.update()

        page.apply_redactions()

    # Save the modified document
    doc.save(output_path)
    doc.close()

# Usage example
search_redact_and_replace_text(
    "input.pdf",
    "Hello World",
    "Goodbye!",
    "output.pdf"
)

This uses PyMuPDF's add_redact_annot method with some default text options. It is up to you to choose the font properties and background color that best suit the look and feel of your PDF; the earlier examples hint at ways to do that.

Using redactions is more secure and probably the method you will want to employ for your search and replace.

Note

Once the document is saved, any original text marked for redaction is completely removed, so make a copy of the original file first if you may need it.

Limitations and Considerations

While PyMuPDF is powerful, there are some important limitations to keep in mind:

Font Matching: The library may not always perfectly match the original font, especially with embedded or custom fonts. Test your results carefully.

Layout Preservation: Complex layouts with overlapping elements or precise positioning might be affected by text replacement. If the replacement text is longer than the original, it can easily overlap the next word in the sentence. PDFs are not like Word documents: the layout does not reflow as you insert or remove characters, and new lines or pages are not created automatically if you insert large blocks of text.

Text Recognition: PyMuPDF works with the actual text content in PDFs. It cannot replace text that's embedded as images or in scanned documents.

Performance: For large PDFs or batch processing, consider processing pages in chunks or using multiprocessing for better performance.
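
For the layout point above, one possible mitigation is to shrink the font size until the replacement fits the width of the original rectangle. The sketch below is only an illustration: fitted_fontsize, its min_size default, and the rough_measure stand-in are all hypothetical, and it assumes that text width scales linearly with font size.

```python
def fitted_fontsize(text, box_width, fontsize, measure, min_size=4.0):
    """Return the largest size up to fontsize at which text fits box_width.

    measure(text, size) must return the text width at that size.
    Returns None when the text would need a size below min_size.
    """
    width = measure(text, fontsize)
    if width <= box_width:
        return fontsize
    # Width is assumed to scale linearly with the font size
    scaled = fontsize * box_width / width
    return scaled if scaled >= min_size else None


def rough_measure(text, size):
    """Stand-in measurement: assume each character is half the font size wide."""
    return 0.5 * size * len(text)


print(fitted_fontsize("Goodbye!", 100, 12, rough_measure))
print(fitted_fontsize("A much longer replacement", 60, 12, rough_measure))
```

In real use, the stand-in would be swapped for an actual measurement, for example a small wrapper around pymupdf.get_text_length for the chosen font, with rect.width as the box width.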

Conclusion

PyMuPDF provides a robust solution for text search and replacement in PDF documents. While the basic functionality is straightforward to implement, achieving perfect formatting preservation requires more careful handling of font properties and text positioning. The examples provided here should give you a solid foundation for building PDF text manipulation tools tailored to your specific needs.

Whichever method you use, test your replacements on sample documents first. This matters most for the redaction method, which permanently removes the original text, so consider backing up important PDFs before performing bulk modifications.