MuPDF.NET: The Sharp Addition to the MuPDF Family

Harald Lieder / Jamie Lemon · June 3, 2024

The MuPDF clan is expanding! After C, Java, JavaScript (MuPDF.js) and Python (PyMuPDF), you can now also use the elegant language C# to manipulate PDF and other documents the “sharp” way!

MuPDF.NET, currently in alpha (released in May 2024), is like PyMuPDF’s twin sibling: PyMuPDF classes (Document, Page, Annot, Story, Font, Widget and so on) have counterparts in MuPDF.NET (also called Document, Page, Annot etc.), with methods and properties that behave as they do in PyMuPDF.

Key Features

MuPDF.NET offers the same features as PyMuPDF – the only exception, as of today, being table recognition.

Let us nevertheless enumerate the highlights here again:

  • Merge, join, encrypt, decrypt and compress PDF documents.
  • Support automatic repair, linearized PDFs and incremental updates.
  • Support for PDF Optional Content and page labels.
  • Extract, search for and highlight text.
  • Extract images and vector graphics.
  • Convert (render) pages to popular image formats.
  • Add and maintain annotations and fields.
  • Create, delete and reorder pages.
  • Insert text, images or vector graphics.
  • Remove text, images or vector graphics via redactions.
  • API for seamless, dynamic OCR via the integrated Tesseract engine.
  • Create font subsets in PDFs.

Availability

MuPDF.NET is available for Microsoft Windows and registered in NuGet. Use Microsoft Visual Studio (versions 2019 and up) to build and maintain your applications.

Example Session

Our example application will open a PDF whose pathname is passed as a parameter in a terminal window, load the first page, search for some text and highlight all its occurrences. It will then save the modified document under a new name.

  • Start Visual Studio and select “Create a new project”.
  • Make sure to install the MuPDF.NET package from NuGet.
  • Choose “Console App”:
[Screenshot: Console App]
  • Choose name “highlighter” for your application and enter the following code in the editor:
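using MuPDF.NET;
internal class Program
{
    private static void Main(string[] args)
    {
        // Open the PDF whose path is the first command-line argument.
        Document doc = new(args[0]);
        // Load the first page.
        Page page = doc[0];
        // Find all occurrences of the search text on that page.
        var matches = page.SearchFor("pixmap");
        if (matches.Count > 0)
        {
            // Highlight every match and save the result under a new name.
            page.AddHighlightAnnot(matches);
            doc.Save("highlighted.pdf");
        }
    }
}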
  • In the Visual Studio menu, choose “Release” in the configuration field and click option Build | Build Solution to build your program.
  • After less than a second, you will find your application’s binaries in some subfolder of your project: “…\highlighter\bin\Release\net…”.
[Screenshot: Release]
  • To test your application, step into that folder to locate the executable file “highlighter.exe”. Open a terminal window here and execute highlighter.exe "input.pdf". Then check out the generated result document “highlighted.pdf”.
  • If your original page looks like this:
[Screenshot: Original Page]

… the result will be this:

[Screenshot: Result]
  • When finished testing your application, use Visual Studio’s “Publish” feature to build a portable, self-contained executable that can be used independently from Visual Studio and the project’s folders.
    • Select your project and click on Build | Publish Selection.
    • From the choices offered as publication targets, select the option Folder and choose some folder.
    • Finally click on “Show all Settings” and make the following adjustments where necessary.
[Screenshot: Show all settings]

Here is what you should select or specify:

[Screenshot: Profile settings]

After you press Publish, the file “D:\highlighter.exe” is generated. It can be passed around to all supported platforms.
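The highlighter above assumes that a valid path is always supplied and always writes to “highlighted.pdf”. As a minimal sketch of a slightly more defensive variant, the code below checks its arguments, accepts an optional search text and derives the output name from the input file. It uses only the MuPDF.NET calls already shown in this session (the Document constructor, the page indexer, SearchFor, AddHighlightAnnot and Save). The helper OutputPathFor, the optional second argument and the usage message are our own illustrative additions, not part of MuPDF.NET.

```csharp
using System;
using System.IO;
using MuPDF.NET;

internal static class Program
{
    // Hypothetical helper (not part of MuPDF.NET): derives the output name
    // from the input path, e.g. "report.pdf" -> "report-highlighted.pdf".
    internal static string OutputPathFor(string inputPath)
    {
        string dir = Path.GetDirectoryName(inputPath) ?? "";
        string stem = Path.GetFileNameWithoutExtension(inputPath);
        return Path.Combine(dir, stem + "-highlighted.pdf");
    }

    private static int Main(string[] args)
    {
        // Refuse to continue without an existing input file.
        if (args.Length < 1 || !File.Exists(args[0]))
        {
            Console.Error.WriteLine("Usage: highlighter <input.pdf> [search text]");
            return 1;
        }

        // Assumption: an optional second argument replaces the fixed search text.
        string needle = args.Length > 1 ? args[1] : "pixmap";

        // Same MuPDF.NET calls as in the example session above.
        Document doc = new(args[0]);
        Page page = doc[0];
        var matches = page.SearchFor(needle);
        if (matches.Count == 0)
        {
            Console.WriteLine($"No occurrences of \"{needle}\" on the first page.");
            return 0;
        }

        page.AddHighlightAnnot(matches);
        string output = OutputPathFor(args[0]);
        doc.Save(output);
        Console.WriteLine($"Highlighted {matches.Count} occurrence(s), saved as {output}.");
        return 0;
    }
}
```

Called as highlighter.exe "input.pdf" "font", this variant would search the first page for “font” and, if anything is found, write “input-highlighted.pdf” next to the input file. Returning a non-zero exit code on bad input makes the tool easier to use from scripts.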

Not Everything Sharp is a Knife: Beyond C#

As seasoned .NET enthusiasts know, the magic of .NET packages transcends language boundaries. Once you've crafted a .NET package, it becomes accessible to any language supported by the framework.

This is where MuPDF.NET comes in: it gives F#, Visual Basic (VB.NET) and all other .NET languages access to the full power of MuPDF.

Here is a simple C# program that prints the text of a PDF page:

using MuPDF.NET;
internal class Program
{
    private static void Main()
    {
        // Open the document and load its first page.
        Document doc = new("test.pdf");
        Page page = doc[0];
        // Build the page's text representation and print it.
        TextPage tpage = page.GetTextPage();
        Console.WriteLine(tpage.ExtractText());
    }
}

And now the F# version:

#r "MuPDF.NET.dll"
let doc = MuPDF.NET.Document("test.pdf")
let page = doc.LoadPage(0)
let tpage = page.GetTextPage()
printfn $"{tpage.ExtractText()}"

If you feel more at home with Visual Basic, then this is for you:

Imports System
Imports MuPDF.NET

Module Program
    Sub Main()
        Dim doc As Document = New Document("test.pdf")
        Dim page As Page = doc.LoadPage(0)
        Dim tpage As TextPage = page.GetTextPage()
        Console.WriteLine(tpage.ExtractText())
    End Sub
End Module

Conclusion

MuPDF.NET emerges as a powerful addition to the MuPDF suite, offering C#, F# and Visual Basic developers the same robust features as PyMuPDF, with the syntax of their preferred .NET language. It promises seamless document manipulation, from merging and encrypting PDFs to dynamic OCR capabilities.

With its availability on Windows and through NuGet, MuPDF.NET is poised to become a go-to library for C# and all other .NET programmers looking to handle PDFs and other documents efficiently. The example session above showcases how easily MuPDF.NET can be integrated into a project, highlighting its potential to streamline document processing tasks.

As the library matures, we can expect even more features and improvements, solidifying its place in the developer's toolkit.
