Working With PDF Annotations in Python

Harald Lieder·July 5, 2023


What Are Annotations?

PDF files have long been the de facto standard for sharing information across different systems while preserving formatting, graphics, and other design elements. One notable feature of PDF documents is annotations: an umbrella term for a variety of interactive objects that can be placed on top of the PDF content.

Annotations are versatile: they let users add interactive elements such as textual notes, highlights and file attachments, to name just a few. They enrich a document’s content, facilitate reader interaction, and provide feedback or clarification where necessary.

In the real world, the use of PDF annotations is varied. Educators often use them to provide feedback on students’ work. In business, they facilitate collaborative document review, enabling team members to add comments, suggestions, or approvals. Legal professionals use annotations to reference cases and append critical notes. Architects and engineers also use them to review blueprints or technical drawings, adding key details or changes without altering the original document.

Understanding annotations is important when working with PDFs because they add much of a document’s functionality and interactivity. They support collaboration, as multiple users can contribute without changing the original content, and they add context and explanations that make the document easier to understand. This is where PyMuPDF comes into play: it lets you automate the handling of PDF annotations. This post shows how you can use PyMuPDF to create, modify, and manage different types of PDF annotations.

The following are some types of PDF annotations, along with code snippets for working with them programmatically. All of them are fully supported by PyMuPDF.

Marking Text

Highlight, Underline, Squiggly (zigzag), and StrikeOut are used to increase the visibility of selected portions of text and pretty much do what their names suggest. They look like this:

an example of text marking annotations

Drawing Graphical Elements

To create simple graphics, use any of the annotations Line, Square, Circle, Polygon, PolyLine, or Ink (freehand drawing simulation). An example for a Circle (ellipse) annotation will be shown further down.

Providing Meta-Information

A Stamp annotation looks like a rubber stamp and lets you visibly mark a page as being “Draft”, “Confidential”, “Approved” or similar. Example:

an example of a stamp annotation

A Caret annotation indicates that text on this page has been modified. This can help focus attention on selected pages, for example while negotiating the wording of a contract. This is what a Caret looks like:

an example of a caret annotation

Commenting, Adding Information and Redacting

A Text annotation is represented by an icon that displays its text when clicked. It works much like a sticky note and is used to comment on some part of the page. A visual example is shown further down.

A FreeText annotation has a similar purpose, but its text is directly visible and not hidden “inside” an icon.

FileAttachment annotations are similar to Text but allow storing complete files “inside” an icon. The standard icon for a file attachment looks like this:

an example of a file attachment annotation icon

A Redact annotation is used to mark some part of the page as a candidate for removal. Such an area will be marked like this:

an example of a redact annotation

Working With Annotations in PyMuPDF

To work with annotations in PyMuPDF, you can use the Page class and its methods. For example, to add a Text annotation, you can use the following code:

import pymupdf

doc = pymupdf.open("input.pdf")  # open the input file
page = doc[0]  # load the desired page

point = pymupdf.Point(50, 50)  # top-left coordinates of the icon
annot = page.add_text_annot(point, "This is a sticky note.")

doc.save("input-annotated.pdf")  # save changes in a new file
doc.close()

A full-featured PDF reader like Adobe Reader shows such an annotation as a clickable icon. Clicking the icon opens a pop-up with the note’s text, as shown below.

an example of a pdf sticky note annotation

PyMuPDF adds new annotations using default properties for each annotation type. For instance, Circle annotations receive a red, straight-line border and no interior color.

By using methods of the Annot class, you can modify these properties as desired, either right after creating the annotation or at any later time.

Let us try another example and add a Circle annotation. This time we will choose a cloudy border with a dashed blue line, a yellow fill color and 30% opacity. We take the same page as before (while the document is still open, that is, before doc.close() is called) and do the following:

# Assumes "page" from the first snippet is still available,
# i.e. the document has not been closed yet.

# We need a rectangle into which the circle (ellipse) is drawn
rect = pymupdf.Rect(100, 100, 300, 250)

# Define colors for border (stroke) and interior (fill)
blue = pymupdf.pdfcolor["blue"]
yellow = pymupdf.pdfcolor["yellow"]

annot = page.add_circle_annot(rect)

# Colorize the new annotation
annot.set_colors(stroke=blue, fill=yellow)

# Give it a dashed and cloudy border
annot.set_border(dashes=(2, 2), clouds=2)

# Set an opacity of 30% (0 = invisible, 1 = fully opaque)
annot.set_opacity(0.3)

# Apply the changes, replacing the default appearance
annot.update()

This is the resulting page appearance:

an example of a circle annotation

Of course, PyMuPDF also offers elegant ways to access existing annotations of a page, display their properties, and delete or modify them. Here is a code snippet that shows the two annotations that we inserted earlier:

# Iterate over the page's annotations and print basic properties
for annot in page.annots():
    print(f"{annot!r}, '{annot.info['content']}'")

# Output:
# 'Text' annotation on page 0 of input.pdf, 'This is a sticky note.'
# 'Circle' annotation on page 0 of input.pdf, ''

Conclusion

In this blog post, we’ve explored a valuable feature of PDF: Annotations.

Annotations provide a way to add extra information and context to a PDF document without modifying the main content. They can be used for collaboration, review, or simply to provide helpful notes for the reader.

PyMuPDF supports a variety of annotation types, including text, highlight, underline, strikeout, squiggly, and more. You can also modify or delete existing annotations, making it easy to manage and update annotations as needed.

To learn more about PyMuPDF and its extensive range of features, be sure to check out the official documentation: https://pymupdf.readthedocs.io/en/latest/.

If you want to interactively play around with things you have seen in this article, go ahead and try out this Jupyter notebook.

If you have questions about PyMuPDF, you can reach the devs on the #pymupdf Discord channel.