How to Save a PDF Document with PyMuPDF: Encryption and Much More!
Harald Lieder·April 25, 2023

Handling PDF files is a common task in the world of software development, whether it’s reading, writing, or editing PDF documents. Python, a versatile and powerful programming language, has numerous libraries to aid in this process. One of these libraries is PyMuPDF, a wrapper around the popular MuPDF library. PyMuPDF provides a simple and efficient interface to manipulate PDF files with ease.
In this blog post, we’ll walk through the process of saving a PDF document using Python and PyMuPDF. We will cover the following topics:
- Installing PyMuPDF
- Opening a PDF document
- Saving a PDF document
- Closing a PDF document
1. Installing PyMuPDF
Like with every Python package, before we can use PyMuPDF we must first install it. You can install PyMuPDF using pip, the standard Python package installer. Open a terminal or command prompt and enter the following command:
pip install pymupdf
2. Opening a PDF Document
To open a PDF document, we'll use the pymupdf.Document()
function from the PyMuPDF library. First, we need to import the library:
import pymupdf
Now, we can open a PDF document by specifying its file path:
input_pdf_path = "example.pdf"
doc = pymupdf.open(input_pdf_path)
3. Saving a PDF Document
If you updated your PDF document, you would certainly want to save these changes. The Document
method save()
will do this job for you.
But even if you did not change the document at all, you may nevertheless need to create a new version. For example, you may want to encrypt the document’s content, or you want to store it on a web site and you therefore need a file format that supports fast viewing in a browser.
Yet another situation exists if you only made minor changes and your document file is large. In that case, you will be interested in a solution that does not write the full file new version but somehow lets you “append” your updates to the file.
The parameters of method Document.save()
let you adapt to a wide range of requirements.
3.1 Encrypting and Decrypting a PDF
This allows you to protect the content of your document against unauthorized access by password-protecting it. The PDF concept allows you to define two different passwords for a PDF document:
- Owner password: provides full access to the document, including decryption and defining new password versions.
- User password: defines a subset of permissions when dealing with the document. For example, all access to the document may be password protected. Or viewing only is allowed but not any changes, or only changes of a certain category.
Here’s how to create a protected document version with “full-access” as the owner password and “restricted-access” as the user password. We’ll also define a collection of permissions for those using the user password. Keep in mind, both passwords must be UTF-8 strings of length forty or less.
output_pdf_path = "encrypted.pdf"
owner_pw = ”full-access”
user_pw = ”resticted-access”
encr_method = pymupdf.PDF_ENCRYPT_AES_256 # strongest encryption
# define a bit field of user permissions:
perm = (pymupdf.PDF_PERM_ACCESSIBILITY # always use this
| pymupdf.PDF_PERM_PRINT # permit printing
| pymupdf.PDF_PERM_COPY # permit copying
| pymupdf.PDF_PERM_ANNOTATE # permit adding annotations
)
doc.save(output_pdf_path,
owner_pw=owner_pw, # set the owner password
user_pw=user_pw, # set the user password
encryption=encr_method, # set the encryption method
permissions=perm, # set the user permissions
)
Users attempting to open “encrypted.pdf” will be met with a prompt to enter a password. Permissions will be granted depending on whether the owner or user password is provided. This protection works the same way for apps trying to access the document.
Now, let’s see how PyMuPDF can open a password-protected PDF document and create a decrypted version:
doc = pymupdf.open(“encrypted.pdf”)
# check if password protection exists and let the user enter a password
if doc.needs_pass: # we need the owner password!
owner_pw = input(“enter owner password:”)
rc = doc.authenticate(owner_pw) # authenticate with the password string
if rc not in (1, 4, 6): # authorization levels including ownership
sys.exit(“Bad owner password.”)
# you now have full (owner) document access
# save a decrypted version of the document
encr_method = pymupdf.PDF_ENCRYPT_NONE # option for removing encryption
doc.save(“decrypted.pdf”,
encryption=encr_method,
)
3.2 Creating a PDF for Fast Web Access
PDF documents are often large. If stored online, having to wait for a multi-megabyte download to see the first page is less than ideal. That’s why the PDF specification supports a “fast web access” or “linearized” file format that lets you see the first pages while the document is still downloading.
Here’s how to save a linearized version of your PDF:
input_file = “input.pdf”
output_file = “linearized.pdf”
doc = pymupdf.open(input_file)
# we can also check if the document already is linear:
If doc.is_fast_webaccess:
sys.exit(“Document already linear!”)
doc.save(output_file, linear=True)
# upload the file “linearized.pdf” to some internet site as you normally would
3.3 Saving Incrementally
Appending changes to a PDF document without creating a new file is known as “incremental” save. Choose this option if you’ve made minor changes to a large document — if the file size is large, maybe dozens of megabytes, just appending a few hundred bytes will be multiple order magnitude faster than rewriting the whole file — or if your PDF is digitally signed and you can’t or don’t want to break this state. An advanced PDF viewer will still report the document as signed and note any subsequent changes.
Because incremental updates modify the PDF in place, the output path name must equal the input. Using the Document object’s property name
, do the following:
doc.save(doc.name, incremental=True, encryption=pymupdf.PDF_ENCRYPT_KEEP)
Or use the shorter alias: doc.saveIncr()
.
The Document property version_count
will increase by one after this.
Incremental saves aren’t always possible. Check Document.can_save_incrementally()
to prevent exceptions in cases like damaged documents, newly created PDFs, or structural changes like encryption or compression.
3.4 Controlling File Size: Compression and Garbage Collection
The PDF file format is based on ASCII text — formatted in a unique way such that its objects can be located rapidly. To save space, some objects may be stored in compressed format or, conversely, decompressed to help with debugging or manual tinkering.
Earlier document updates may have generated outdated versions of objects which are still physically present in the file, occupying space, but are no longer accessible — like ghosts. The save method offers multiple ways to bust these ghosts and compress objects within the PDF. Here’s a quick overview of the most important parameters:
garbage=n
: With an integer n, taking values from 0 to 4. Zero means no garbage collection. With increasing thoroughness, unused objects can be removed, and duplicate objects or even binary content can be joined. A typical, recommended value isgarbage=3
.expand=True
: Decompresses binary content. Mainly for debugging purposes, but some apps may be incapable of dealing with compressed object types.deflate=True
: Compresses binary content (where possible and beneficial). For finer control, trydeflate_images=True
anddeflate_fonts=True
.
If you want to go beyond these possibilities offered by the save method, consider the following two Document
methods:
Document.subset_fonts()
: This can have a large file size effect if you inserted text using your own font files. It shrinks font file sizes by keeping only the characters actually used in the document. Fonts supporting Asian characters benefit a lot here.Document.scrub()
: With over a dozen options, this method removes information and objects. Primarily for data protection but removing thumbnail images can also slim down your PDF.
3.5 How to Handle Documents in Memory
PyMuPDF can read documents from a memory area instead of a local disk. Downloads from popular cloud services (Google, AWS, Microsoft Azure) usually deliver Python bytes
objects representing PDF documents.
Instead of storing them on your computer, you can directly open these objects via doc = pymupdf.open(stream=bytes_object)
. Nothing else in your app needs to be changed to work with a document created like this. This has obvious advantages.
Of special interest in the context of this article is the method Document.tobytes()
.
This is the version of Document.save()
that saves to memory — not to disk. It supports exactly the same parameters as save — so everything you’ve learned so far still applies. You can use the produced bytes
object to upload it to your cloud storage, for example.
Here’s a code snippet that downloads a PDF from AWS, converts it for fast web access, and uploads it again:
import pymupdf
import boto3
s3 = boto3.client("s3")
# fill in your credentials to access the cloud
response = s3.get_object(
Bucket="string", # choose your appropriate value
Key="string" # choose your appropriate value
)
body = response["Body"]
# define Document with these data
doc = pymupdf.open(stream=body.read())
# make a linear version
linear_version = doc.tobytes(linear=True)
# upload to cloud service again
rc = s3.write_get_object_response(
Body=linear_version,
RequestRoute=request_route, # choose your appropriate value
RequestToken=request_token, # choose your appropriate value
)
doc.close()
Note
Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python.
4. Closing the Document
When you’re done working with the PDF document, it’s good practice to properly close it with doc.close()
. This releases associated resources and relinquishes control of the file to the operating system, thus making it available to other processes before your app ends.
Python experts will appreciate that a Document
in PyMuPDF is a Python context manager, enabling this construct:
with pymupdf.open(input_file) as doc:
# do some work with the document, then
doc.subset_fonts() # optionally compress fonts
doc.save(…) # save when finished
# the document will automatically have been closed at this point
Conclusion
In this blog post, we’ve learned how to save a PDF document using Python and the PyMuPDF library. By following these simple steps, you can easily create, modify, and save PDF files in your Python projects.
Remember, PyMuPDF offers a cornucopia of other features to work with PDF documents, like extracting text, images, annotations, and more. Don’t forget to explore the official PyMuPDF documentation to learn even more about its capabilities: https://pymupdf.readthedocs.io/en/latest/.
Another knowledge source is the utilities repository. Whatever you plan to do when dealing with PDFs: you will probably find some example script there that gives you a start.
If you have questions about PyMuPDF, you can reach the devs on the #pymupdf Discord channel.