Multi-threaded Use of MuPDF in Java

Sebastian Rasmussen / Fred Ross-Perry · January 2, 2023


This article explains how to use MuPDF safely from Java when multiple threads make concurrent calls to the library.

Single-threaded rendering of a PDF page to a PNG

Let’s start easy and render a PDF page to a PNG in Java in a single-threaded environment.


import com.artifex.mupdf.fitz.*;
//...
Document doc = Document.openDocument("document.pdf");
Page page = doc.loadPage(42);
// No scaling/rotation, output RGB samples, no alpha, include annotations/form fields
Pixmap pixmap = page.toPixmap(Matrix.Identity(), ColorSpace.DeviceRGB, false, true);
pixmap.saveAsPNG("out.png");

The sample code opens the document from the given file, loads one page, and renders it into a Pixmap before finally saving the Pixmap to a PNG output file.

The code above blocks while opening the document, loading the page, rendering, and saving the output file. Each of these operations may take a noticeable amount of time. That is usually not a problem for command-line tools, but GUI applications must avoid blocking UI updates for long periods. They usually perform long-running tasks in separate threads that pass their results back to the UI thread, which then updates the GUI.

A full single-threaded example, complete with error handling, is available in the MuPDF git repository.
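
The hand-off between a worker thread and a UI thread mentioned above might look like the following sketch. It uses only the Java standard library and makes two assumptions: a single-threaded executor named `ui` stands in for a toolkit's UI thread (with Swing you would use `SwingUtilities.invokeLater` instead), and `slowRender` is a hypothetical placeholder for the blocking MuPDF calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/* Hypothetical sketch: do slow work off the UI thread, publish the result on it. */
final class BackgroundRender {
    /* Placeholder for blocking work such as opening, loading and rendering. */
    static String slowRender(int pageNumber) {
        return "image of page " + pageNumber;
    }

    /* Runs slowRender on "worker", then hands the result to "ui".
       Returns a string naming the thread that received the result. */
    static String renderAndPublish(int pageNumber) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> new Thread(r, "worker"));
        ExecutorService ui = Executors.newSingleThreadExecutor(r -> new Thread(r, "ui"));
        try {
            return CompletableFuture
                .supplyAsync(() -> slowRender(pageNumber), worker)
                .thenApplyAsync(image -> Thread.currentThread().getName() + " shows " + image, ui)
                /* join() is used only so this sketch can return a value;
                   a real UI thread would not wait here. */
                .join();
        } finally {
            worker.shutdown();
            ui.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(renderAndPublish(42));
    }
}
```

The rendering step never runs on the `ui` thread, so that thread stays free to process other events while the page is being prepared.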

Single-threaded MuPDF usage in multi-threaded applications

Multi-threaded applications that have several threads to handle UI requests etc., but only call the MuPDF library from one single thread, have no threading issues with MuPDF. Such applications use MuPDF in a single-threaded way.

An example would be an application with a document-server thread that receives requests from other threads for specific pages from documents, then uses MuPDF to render the desired pages into images before passing the resulting images back to the requesting thread. This MuPDF background thread will be blocked while opening documents, loading pages, and rendering the same way as the single-threaded code above.
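
The document-server arrangement described above can be sketched with a single-threaded executor. This is only an illustration: `PageServer`, `request` and `renderPage` are hypothetical names, and the string result stands in for the image a real server would produce by calling MuPDF. The point is that everything inside `renderPage` runs on the one server thread, whichever thread makes the request.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/* Hypothetical sketch: all "MuPDF" work is confined to one server thread. */
final class PageServer {
    /* A single-threaded executor guarantees that submitted tasks never overlap. */
    private final ExecutorService server =
        Executors.newSingleThreadExecutor(r -> new Thread(r, "mupdf-server"));

    /* Placeholder for the real work: open document, load page, render. */
    private String renderPage(String path, int pageNumber) {
        return path + "#" + pageNumber + " rendered on " + Thread.currentThread().getName();
    }

    /* May be called from any thread; the request is queued for the server thread. */
    Future<String> request(String path, int pageNumber) {
        return server.submit(() -> renderPage(path, pageNumber));
    }

    void shutdown() {
        server.shutdown();
    }

    /* Convenience for callers that want to block until the result is ready. */
    static String await(Future<String> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        PageServer pages = new PageServer();
        Future<String> first = pages.request("document.pdf", 0);
        Future<String> second = pages.request("document.pdf", 1);
        System.out.println(await(first));
        System.out.println(await(second));
        pages.shutdown();
    }
}
```

Requests are served strictly one after another, which is exactly the blocking behaviour the paragraph above describes.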

Rules for multi-threaded usage of MuPDF

Multi-threaded applications can safely use MuPDF and avoid blocking during page rendering like the single-threaded code above by following a few simple rules:

  • Different threads may not use the same Document or Page objects simultaneously.
  • Different threads may not use the same Device simultaneously.
  • Once created from Pages, DisplayLists may be used by different threads simultaneously.

Below we will describe and illustrate two approaches to abiding by these rules.

How to use MuPDF in multi-threaded applications

One approach to abiding by those rules is for a multi-threaded application to limit accessing the document and its pages to a single thread and only farm out rendering of pages to one or more worker threads.

Modifications of the document, such as adding, changing, or removing annotations, or setting or clearing form field values, must also happen in the thread that opens the Document and loads its Pages. That thread must also load the document Outline or page Links if the application needs them. It is easy to tell which operations are affected, because they all require a reference to a Document or Page object (or to a PDFDocument or PDFPage for PDF-specific operations).

In the following example, the main thread is used for loading documents and pages, while separate worker threads do the rendering, one per page. The input to each worker thread is a display list, the area of the page to draw, and a destination Pixmap. The output from each worker thread is a Pixmap with rendered page contents.

The main thread must wait for each worker to finish rendering into its Pixmap object before saving the Pixmap to a PNG file. Therefore the main thread calls join() to wait for each worker to finish executing its run(), where rendering takes place.

import com.artifex.mupdf.fitz.*;
//...
final Document doc = Document.openDocument("document.pdf");
final int pageCount = doc.countPages();
final Thread[] threads = new Thread[pageCount];
final Pixmap[] pixmaps = new Pixmap[pageCount];

for (int i = 0; i < pageCount; ++i) {
    Page page = doc.loadPage(i);
    /* Get page size and create display list from page on the main thread. */
    final DisplayList displayList = page.toDisplayList();
    final Rect bounds = displayList.getBounds();
    final int pageNumber = i;

    threads[i] = new Thread() {
        /* Creating the pixmap and the rendering runs inside the worker threads. */
        public void run() {
            pixmaps[pageNumber] = new Pixmap(ColorSpace.DeviceRGB, bounds);
            pixmaps[pageNumber].clear(0xff);
            DrawDevice dev = new DrawDevice(pixmaps[pageNumber]);
            displayList.run(dev, Matrix.Identity(), bounds, null);
            dev.close();
        }
    };
    threads[i].start();
}

/* Wait until each rendered image is available before writing it as a PNG. */
for (int i = pageCount - 1; i >= 0; i--) {
    threads[i].join();
    pixmaps[i].saveAsPNG(String.format("out%04d.png", i));
}

A variant of this example, including error handling, is available in the MuPDF git repository.
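
One detail the sample leaves out is that `Thread.join()` throws the checked `InterruptedException`. The sketch below shows one way to handle it, under the assumption that giving up is acceptable when the waiting thread is interrupted. It keeps the one-thread-per-task shape of the sample but uses integer results in place of pixmaps so that it runs without MuPDF; `JoinExample` and `squareAll` are hypothetical names.

```java
import java.util.Arrays;

/* Hypothetical sketch: one thread per task, results collected after join(). */
final class JoinExample {
    static int[] squareAll(int count) {
        final int[] results = new int[count];
        final Thread[] threads = new Thread[count];

        for (int i = 0; i < count; ++i) {
            final int index = i;
            /* Each worker writes only to its own slot, so no locking is needed. */
            threads[i] = new Thread(() -> results[index] = index * index);
            threads[i].start();
        }

        for (int i = 0; i < count; ++i) {
            try {
                /* After join() returns, the worker's writes are visible here. */
                threads[i].join();
            } catch (InterruptedException e) {
                /* Restore the interrupt flag and stop waiting. */
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for task " + i, e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(squareAll(4)));
    }
}
```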

An issue with creating one thread per page is that it does not scale. For short documents this is not a problem, but for documents with thousands of pages the rendering threads are likely to contend with each other for CPU time.

The sample code below also farms out page rendering tasks to worker threads. The difference is that it submits rendering tasks to a thread pool with only a few worker threads. The application specifies a reasonably sized thread pool to limit thread contention.

Each rendering task is wrapped in a Callable that returns a Pixmap. When a Callable is submitted, the thread pool returns a Future that represents the pending Pixmap result. A task may run at any time after it is submitted; it does not have to have finished when the main thread asks for its Pixmap, because Future.get() blocks until the result is available.

import com.artifex.mupdf.fitz.*;
import java.util.concurrent.*;
import java.util.*;
//...
Document doc = Document.openDocument("document.pdf");
int pages = doc.countPages();

ExecutorService executor = Executors.newFixedThreadPool(4);
List<Future<Pixmap>> renderingFutures = new ArrayList<>();

for (int i = 0; i < pages; ++i) {
        final int pageNumber = i;
        Page page =  doc.loadPage(pageNumber);
        /* Get page size and create display list from page on the main thread. */
        final Rect bounds = page.getBounds();
        final DisplayList displayList = page.toDisplayList();

        renderingFutures.add(executor.submit(new Callable<Pixmap>() {
                /* Create the pixmap and do render in a thread in the pool. */
                public Pixmap call() {
                        Pixmap pixmap = new Pixmap(ColorSpace.DeviceRGB, bounds);
                        pixmap.clear(0xff);
                        DrawDevice dev = new DrawDevice(pixmap);
                        displayList.run(dev, Matrix.Identity(), bounds, null);
                        dev.close();
                        return pixmap;
                }
        }));
}

/* Wait until each rendered image is available before writing it as a PNG. */
for (int i = pages - 1; i >= 0; --i) {
        Pixmap pixmap = renderingFutures.get(i).get();
        pixmap.saveAsPNG(String.format("out-%04d.png", i));
}

executor.shutdown();

A variant of this example, with error handling, is available in the MuPDF git repository.
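
Two details the thread-pool sample glosses over are how to choose the pool size and what happens when a task fails. The sketch below is one possible arrangement, not the repository's variant: it sizes the pool from `Runtime.availableProcessors()` and unwraps the `ExecutionException` that `Future.get()` throws when a task has failed. `PooledRender` and `renderPage` are hypothetical; `renderPage` stands in for display-list rendering and fails on one page on purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/* Hypothetical sketch: bounded pool, per-page error handling. */
final class PooledRender {
    /* Placeholder for rendering; page 2 fails to exercise the error path. */
    static String renderPage(int pageNumber) {
        if (pageNumber == 2)
            throw new IllegalStateException("cannot render page " + pageNumber);
        return "page " + pageNumber;
    }

    static List<String> renderAll(int pageCount) {
        /* One worker per core is a common starting point for CPU-bound work. */
        int workers = Math.max(1, Runtime.getRuntime().availableProcessors());
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        List<Future<String>> futures = new ArrayList<>();
        List<String> results = new ArrayList<>();
        try {
            for (int i = 0; i < pageCount; ++i) {
                final int pageNumber = i;
                futures.add(executor.submit(() -> renderPage(pageNumber)));
            }
            for (int i = 0; i < pageCount; ++i) {
                try {
                    results.add(futures.get(i).get());
                } catch (ExecutionException e) {
                    /* The task threw; its exception is available as the cause. */
                    results.add("failed: " + e.getCause().getMessage());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    results.add("interrupted");
                    break;
                }
            }
        } finally {
            executor.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(renderAll(4));
    }
}
```

A failure on one page is recorded and the remaining pages are still collected, which is usually preferable to abandoning the whole document.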

Implementation details and comparison of Java vs. C

Understanding this section is not required to use the MuPDF Java API in multi-threaded applications. Instead, this section is for those wanting to peek under the hood of the MuPDF Java API to understand the implementation details of how multi-threaded usage in Java differs from C.

A context provides MuPDF’s parsing/rendering process with access to the global state (such as caches for decoded images and rendered versions of font glyphs) and per-thread state (such as exception stack and data to rate-limit warnings). The first argument to every MuPDF API in C is a context, which means C programmers must first manually obtain a fz_context. Further details of contexts at the C level are available in chapter 5 of MuPDF Explored.

When creating a fz_context, you may pass a fz_locks_context with callbacks to lock/unlock a set of mutexes used by MuPDF during parsing and rendering. The MuPDF C library is deliberately kept free of knowledge of any particular threading library, so that it can be used with any of them. Single-threaded applications do not need locking, so their lock/unlock callbacks are empty and there are no mutexes. Multi-threaded applications require locking and must provide a fz_locks_context with suitable callbacks for their chosen threading library.

The Java bindings handle both context creation and locking automatically, which makes MuPDF easier to use.
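
To picture what the lock/unlock callbacks described above do, here is an analogy written in Java. It is not MuPDF API and not how the bindings are written: `LockCallbacks`, `NoLocks` and `MutexLocks` are hypothetical names, and the lock count of 4 is an arbitrary choice for the demonstration. The idea it illustrates is that the library only asks for "lock number n" and the application decides what a lock is.

```java
import java.util.concurrent.locks.ReentrantLock;

/* Java analogy only: these names are hypothetical, not MuPDF API. */
final class LockDemo {
    interface LockCallbacks {
        void lock(int index);
        void unlock(int index);
    }

    /* Single-threaded use: both callbacks do nothing. */
    static final class NoLocks implements LockCallbacks {
        public void lock(int index) { }
        public void unlock(int index) { }
    }

    /* Multi-threaded use: each index maps to one mutex of the chosen library. */
    static final class MutexLocks implements LockCallbacks {
        private final ReentrantLock[] mutexes;

        MutexLocks(int count) {
            mutexes = new ReentrantLock[count];
            for (int i = 0; i < count; ++i)
                mutexes[i] = new ReentrantLock();
        }

        public void lock(int index) { mutexes[index].lock(); }
        public void unlock(int index) { mutexes[index].unlock(); }
    }

    static int counter = 0;

    /* Increments a shared counter from several threads, guarded by lock 0. */
    static int countWith(LockCallbacks locks, int threadCount, int perThread) {
        counter = 0;
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; ++t) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; ++i) {
                    locks.lock(0);
                    try { counter++; } finally { locks.unlock(0); }
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try { thread.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(countWith(new MutexLocks(4), 4, 10000));
    }
}
```

With `MutexLocks` no increments are lost. Passing `NoLocks` is only safe when a single thread does all the work, which mirrors the single-threaded case in C.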

How MuPDF automatically handles contexts in Java

In addition to the set of rules for Java outlined above, multi-threaded applications in C have to adhere to one more rule:

  • Different threads may not use the same context at the same time.

See the section on multi-threading in MuPDF Explored and the multi-threading section on the MuPDF website for full details of all rules when using MuPDF in C.

The approach the MuPDF API in Java takes to handle this additional rule is to create an application global base context and make clones of the base context for every thread that uses the MuPDF Java API.

As a Java thread is about to call any MuPDF class for the first time, a static initializer in the class gets executed. If not already done, these static initializers will load the MuPDF JNI library and create a global internal base fz_context.

When the Java thread’s call reaches the MuPDF JNI bindings, they query thread local storage for a cached context clone of the base context. If there is no such context clone, the bindings clone the base context and cache the clone in thread local storage. The MuPDF JNI bindings pass these cloned contexts to every MuPDF C API they use.

Thus MuPDF transparently creates and clones fz_contexts whenever needed.
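
The clone-per-thread scheme can be pictured with Java's own `ThreadLocal`. The sketch below is an analogy, not the JNI code: `Context`, `BASE` and `current()` are hypothetical names, and the real bindings do the equivalent in native code with `fz_context`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/* Hypothetical sketch: a base context cloned lazily, once per thread. */
final class ContextDemo {
    /* Stand-in for a context: shared state plus a per-clone identity. */
    static final class Context {
        private static final AtomicInteger NEXT_ID = new AtomicInteger();
        final Object sharedCaches;   /* same object in every clone */
        final int id;                /* distinct for every clone */

        Context(Object sharedCaches) {
            this.sharedCaches = sharedCaches;
            this.id = NEXT_ID.getAndIncrement();
        }

        Context cloneContext() {
            return new Context(sharedCaches);
        }
    }

    /* Created once, when the class is first used (compare the static initializer). */
    static final Context BASE = new Context(new Object());

    /* Each thread gets its own clone on first access and reuses it afterwards. */
    private static final ThreadLocal<Context> PER_THREAD =
        ThreadLocal.withInitial(BASE::cloneContext);

    static Context current() {
        return PER_THREAD.get();
    }

    /* Returns the context seen by a freshly started thread. */
    static Context contextOfNewThread() {
        final Context[] seen = new Context[1];
        Thread thread = new Thread(() -> seen[0] = current());
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        Context mine = current();
        Context other = contextOfNewThread();
        System.out.println("cached per thread:   " + (mine == current()));
        System.out.println("distinct per thread: " + (mine != other));
        System.out.println("shared global state: " + (mine.sharedCaches == other.sharedCaches));
    }
}
```

Because every thread only ever touches its own clone, the rule that no two threads may use the same context at the same time is satisfied without the caller doing anything.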

MuPDF’s predefined locking/threading mechanisms in Java

The Java VM provides access to Windows threads or UNIX pthreads, depending on the platform it is running on. When creating the global internal base context, MuPDF’s Java API passes a suitable fz_locks_context built on whichever of the two threading libraries is in use. All threading choices are therefore handled for you, and Java offers no provision for users to supply their own mutexes or lock/unlock functions.