@mlissner
Created April 12, 2026 04:28
Implementation plan for pdf_select_version API

Plan: PDF Version Selection API

Background

PDFs support incremental updates — each time a PDF is saved, a new cross-reference (xref) section is appended, creating a new "version" while preserving all prior versions. MuPDF already tracks these versions internally and has a public pdf_count_versions() function, but there's no public way to select a version so that all subsequent API calls operate on that historical snapshot.

Internally, MuPDF already has the mechanism: doc->xref_base controls which version object lookups resolve against. Signature validation functions already use it temporarily (save the old value, set a new one, do the work, restore). We're exposing this as a persistent, public API.
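To make the role of xref_base concrete, here is a toy model of version-relative lookup. It is a sketch, not MuPDF's actual data structures: toy_xref_section, toy_doc, and toy_lookup are hypothetical names, and the model assumes sections are stored newest first, which matches the plan's numbering of version 0 as the latest.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_OBJECTS 8

/* One xref section: records which objects a given save wrote.
 * Hypothetical stand-in, not MuPDF's real xref structure. */
typedef struct {
	int present[TOY_MAX_OBJECTS]; /* 1 if this section defines the object */
	int value[TOY_MAX_OBJECTS];   /* toy payload for the object */
} toy_xref_section;

/* Assumption: sections are ordered newest first (index 0 = latest save). */
typedef struct {
	const toy_xref_section *sections;
	int num_sections;
	int xref_base; /* lookups ignore sections newer than this index */
} toy_doc;

/* Resolve object 'num' as seen from doc->xref_base.
 * Returns 1 and stores the payload in *out if found, 0 otherwise. */
int toy_lookup(const toy_doc *doc, int num, int *out)
{
	int i;

	assert(doc != NULL && out != NULL);
	if (num < 0 || num >= TOY_MAX_OBJECTS)
		return 0;
	/* Start at the selected version and walk toward older sections. */
	for (i = doc->xref_base; i < doc->num_sections; i++)
	{
		if (doc->sections[i].present[num])
		{
			*out = doc->sections[i].value[num];
			return 1;
		}
	}
	return 0;
}
```

In this model, raising xref_base by one hides everything the newest save wrote, so an object rewritten in that save resolves to its earlier definition.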

Commit 1: Core feature — C API + JS API docs

Step 1: C header declarations

File: include/mupdf/pdf/xref.h (after line 253, near pdf_count_versions)

Add two function declarations with doc comments:

  • pdf_select_version(ctx, doc, version) — selects a version for all subsequent operations. Version 0 = latest (default), 1 = previous save, etc. Throws FZ_ERROR_ARGUMENT if version is out of range.
  • pdf_selected_version(ctx, doc) — returns the currently selected version.

Follow the existing comment style (multi-line /* ... */ blocks) matching pdf_count_versions at lines 241-250.

Step 2: C implementation

File: source/pdf/pdf-xref.c (after pdf_doc_was_linearized at line 3906)

Implement both functions:

pdf_select_version:

  1. Validate: version >= 0 && version < pdf_count_versions(ctx, doc). Throw FZ_ERROR_ARGUMENT on failure.
  2. Set doc->xref_base = version.
  3. Call pdf_drop_page_tree_internal(ctx, doc) to invalidate the page tree cache — different versions may have different page counts/ordering. The page tree is lazily rebuilt on next page access, so this is safe.

pdf_selected_version:

  1. Return doc->xref_base.

Both functions follow the style of the surrounding code (no braces on single-statement bodies, fz_throw for errors).
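The control flow of the two functions can be sketched without MuPDF itself. In the sketch below, toy_doc and both toy_ functions are hypothetical stand-ins: the real implementation would take an fz_context, throw with fz_throw(ctx, FZ_ERROR_ARGUMENT, ...) rather than return an error code, and call pdf_drop_page_tree_internal instead of clearing a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the pdf_document fields that matter here. */
typedef struct {
	int num_versions;     /* what pdf_count_versions would report */
	int xref_base;        /* 0 = latest version */
	int page_tree_loaded; /* 1 while a cached page tree exists */
} toy_doc;

/* Returns 0 on success, -1 if 'version' is out of range.
 * Assumption: the real function throws instead of returning -1. */
int toy_select_version(toy_doc *doc, int version)
{
	assert(doc != NULL);
	if (version < 0 || version >= doc->num_versions)
		return -1;
	doc->xref_base = version;
	/* Stands in for pdf_drop_page_tree_internal: forget the cached tree
	 * so the next page access rebuilds it for the selected version. */
	doc->page_tree_loaded = 0;
	return 0;
}

/* Report the version currently selected. */
int toy_selected_version(const toy_doc *doc)
{
	assert(doc != NULL);
	return doc->xref_base;
}
```

Validation comes first so that a rejected call leaves both the selected version and the page tree cache untouched.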

Step 3: JavaScript API documentation

File: docs/reference/javascript/types/PDFDocument.rst (after countVersions block ending at line 1140)

Add RST documentation for both methods following the exact pattern of countVersions at lines 1131-1140:

  • PDFDocument.prototype.selectVersion(version) — with :param: tag and description of version numbering (0 = latest).
  • PDFDocument.prototype.selectedVersion() — with :returns: number.

Include code examples in .. code-block:: blocks.


Commit 2: Tests — multi-version PDF fixture + C and JS test programs

Step 4: Create a multi-version test PDF

We need a PDF with known version history to test against. Create a script or use mutool to build one:

  1. Create a 1-page PDF (first save)
  2. Add a second page and save incrementally (second save)
  3. Add a third page and save incrementally (third save)

This gives us a 3-version PDF where:

  • Version 0 (latest): 3 pages
  • Version 1: 2 pages
  • Version 2: 1 page
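The creation steps count saves from the oldest, while the API counts versions from the latest, so the two orderings run in opposite directions. The arithmetic is small enough to sketch; the helper names below are hypothetical and exist only to illustrate the mapping for a fixture that gains one page per save.

```c
#include <assert.h>

/* Map the Nth save (1 = oldest) to the API version index (0 = latest).
 * Returns -1 if save_ordinal is outside 1..total_saves. */
int api_version_for_save(int total_saves, int save_ordinal)
{
	if (save_ordinal < 1 || save_ordinal > total_saves)
		return -1;
	return total_saves - save_ordinal;
}

/* Page count expected at a given API version, assuming the fixture
 * gains exactly one page per save. Returns -1 if out of range. */
int expected_pages(int total_saves, int api_version)
{
	if (api_version < 0 || api_version >= total_saves)
		return -1;
	return total_saves - api_version;
}

/* Self-check against the three-save fixture described above. */
void check_fixture_mapping(void)
{
	assert(api_version_for_save(3, 1) == 2); /* oldest save is version 2 */
	assert(api_version_for_save(3, 3) == 0); /* newest save is version 0 */
	assert(expected_pages(3, 0) == 3);
	assert(expected_pages(3, 1) == 2);
	assert(expected_pages(3, 2) == 1);
}
```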

The test fixture should be small and committed to the repo so tests are self-contained. Investigate whether mutool run with JavaScript can create the fixture, or whether we need a small Python/C script. Place the fixture and its generator alongside the test examples.

Step 5: C example / test program

File: docs/examples/pdf-version-select.c (new file)

A self-contained C program that:

  1. Opens the multi-version test PDF (path from argv)
  2. Prints the number of versions (pdf_count_versions)
  3. For each version (0 through N-1):
    • Selects it with pdf_select_version
    • Confirms with pdf_selected_version
    • Prints the page count at that version (pdf_count_pages)
  4. Resets to version 0

Follow the style of docs/examples/example.c: includes <mupdf/fitz.h> and <mupdf/pdf.h>, uses fz_try/fz_catch for error handling, cleans up resources on exit. Keep it simple — no rendering, just version enumeration and page counting.

This serves as both documentation (how to use the API) and a functional test (verifiable against the multi-version PDF).
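The shape of the enumeration loop can be sketched against a stand-in document, since the real program depends on the functions this plan adds. Everything prefixed toy_ is hypothetical; the real example would use pdf_count_versions, pdf_select_version, pdf_selected_version, and pdf_count_pages inside fz_try/fz_catch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_MAX_VERSIONS 8

/* Hypothetical stand-in for a multi-version document. */
typedef struct {
	int num_versions;
	int pages_at[TOY_MAX_VERSIONS]; /* page count visible at each version */
	int xref_base;                  /* currently selected version */
} toy_doc;

int toy_select_version(toy_doc *doc, int version)
{
	if (version < 0 || version >= doc->num_versions || version >= TOY_MAX_VERSIONS)
		return -1;
	doc->xref_base = version;
	return 0;
}

int toy_selected_version(const toy_doc *doc)
{
	return doc->xref_base;
}

int toy_count_pages(const toy_doc *doc)
{
	return doc->pages_at[doc->xref_base];
}

/* Visit every version, record its page count in counts[], then reset
 * to the latest version. Returns the number of versions visited,
 * or -1 on failure. */
int toy_enumerate_versions(toy_doc *doc, int *counts, int max_counts)
{
	int i, n;

	assert(doc != NULL && counts != NULL);
	n = doc->num_versions;
	if (n > max_counts)
		return -1;
	for (i = 0; i < n; i++)
	{
		/* Select, then confirm the selection took effect. */
		if (toy_select_version(doc, i) != 0 || toy_selected_version(doc) != i)
		{
			toy_select_version(doc, 0);
			return -1;
		}
		counts[i] = toy_count_pages(doc);
		printf("version %d: %d page(s)\n", i, counts[i]);
	}
	/* Always leave the document on the latest version. */
	toy_select_version(doc, 0);
	return n;
}
```

Resetting to version 0 on both the success and failure paths mirrors step 4 of the list, so a caller never inherits a historical selection by accident.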

Step 6: JavaScript example / test script

File: docs/examples/pdf-version-select.js (new file)

A mutool-run script that does the same thing as the C example:

  1. Opens the multi-version test PDF
  2. Prints version count
  3. Iterates versions, selecting each and printing the page count
  4. Resets

Follow the style of existing .js examples in docs/examples/. Runnable with mutool run pdf-version-select.js test.pdf.


Commit 3: Language bindings — mutool JS, WASM, Java

Step 7: JavaScript binding in mutool

File: source/tools/murun.c

Two changes:

a) Add wrapper functions (after ffi_PDFDocument_countVersions at line 8205):

Follow the exact pattern of the existing version functions. selectVersion takes one argument via js_tonumber(J, 1), calls pdf_select_version. selectedVersion takes no arguments, calls pdf_selected_version, pushes result with js_pushnumber.

b) Register the methods (after line 12694, near other version methods):

jsB_propfun(J, "PDFDocument.selectVersion", ffi_PDFDocument_selectVersion, 1);
jsB_propfun(J, "PDFDocument.selectedVersion", ffi_PDFDocument_selectedVersion, 0);

Step 8: WASM C binding

File: platform/wasm/lib/mupdf.c (after wasm_pdf_count_versions at line 1318)

Add two WASM exports following the existing patterns:

  • wasm_pdf_select_version(pdf_document *doc, int version) — void function using VOID(pdf_select_version, doc, version) macro, matching the pattern at line 1335 (wasm_pdf_enable_journal).
  • wasm_pdf_selected_version(pdf_document *doc) — int function using INTEGER(pdf_selected_version, doc) macro, matching the pattern at line 1317 (wasm_pdf_count_versions).

Step 9: WASM TypeScript binding

File: platform/wasm/lib/mupdf.ts (after countVersions at line 2801)

Add two methods to the PDFDocument class:

  • selectVersion(version: number) — calls libmupdf._wasm_pdf_select_version(this.pointer, version), matching the style of deleteObject at line 2540.
  • selectedVersion(): number — calls libmupdf._wasm_pdf_selected_version(this.pointer), matching the style of countVersions at line 2800.

Step 10: Java JNI binding

Two files:

a) JNI C wrapper: platform/java/jni/pdfdocument.c (after FUN(PDFDocument_countVersions) ending at line 997)

Add two JNI functions:

  • FUN(PDFDocument_selectVersion) — takes jint version, calls pdf_select_version(ctx, pdf, version), returns void. Follow the pattern of FUN(PDFDocument_deleteObject) at line 458 (void function with int param).
  • FUN(PDFDocument_selectedVersion) — returns jint, calls pdf_selected_version(ctx, pdf). Follow the pattern of FUN(PDFDocument_countVersions) at line 982.

b) Java declaration: platform/java/src/com/artifex/mupdf/fitz/PDFDocument.java (after countVersions at line 199)

public native void selectVersion(int version);
public native int selectedVersion();

Commit 4: Read-only enforcement for historical versions

This is a separate commit so that it can be reverted independently if the maintainers prefer not to have it.

Step 11: Guard mutations when viewing a historical version

File: source/pdf/pdf-xref.c, in ensure_incremental_xref() at line 537

All PDF mutations flow through this single function:

  • pdf_get_incremental_xref_entry() calls it (line 600)
  • pdf_get_or_create_incremental_entry() calls it (line 716)
  • These in turn are called by pdf_create_object, pdf_update_object, pdf_delete_object, etc.

Add a guard at the top of ensure_incremental_xref:

if (doc->xref_base != 0)
    fz_throw(ctx, FZ_ERROR_ARGUMENT,
        "cannot modify document while viewing historical version %d",
        doc->xref_base);

The guard is a single if statement in one function. To modify the document, the caller must first call pdf_select_version(ctx, doc, 0) to return to the latest version.
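The effect of the guard on callers can be sketched with a stand-in document. All toy_ names are hypothetical: toy_ensure_incremental_xref plays the role of the single choke point, toy_create_object plays the role of any mutation routed through it, and error codes replace fz_throw.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a document with a mutable object count. */
typedef struct {
	int num_versions;
	int xref_base;   /* 0 = latest version */
	int num_objects; /* toy mutation target */
} toy_doc;

int toy_select_version(toy_doc *doc, int version)
{
	if (version < 0 || version >= doc->num_versions)
		return -1;
	doc->xref_base = version;
	return 0;
}

/* Stand-in for ensure_incremental_xref: every mutation passes through
 * here first. Returns -1 while a historical version is selected.
 * Assumption: the real guard throws instead of returning -1. */
int toy_ensure_incremental_xref(const toy_doc *doc)
{
	assert(doc != NULL);
	if (doc->xref_base != 0)
		return -1;
	return 0;
}

/* Toy mutation. Returns the new object count, or -1 if the document
 * is read-only because a historical version is selected. */
int toy_create_object(toy_doc *doc)
{
	if (toy_ensure_incremental_xref(doc) != 0)
		return -1;
	doc->num_objects++;
	return doc->num_objects;
}
```

Because the check sits in the shared choke point rather than in each mutation function, new mutation paths that go through it are covered without further changes.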


Notes

  • C++/Python/C# bindings are auto-generated from the C headers by scripts/mupdfwrap.py and should pick up the new functions without manual work.

File change summary

| Commit | Step | File | Change | ~Lines | Done |
|--------|------|------|--------|--------|------|
| 1 | 1 | include/mupdf/pdf/xref.h | Edit: declarations + doc comments | ~15 | x |
| 1 | 2 | source/pdf/pdf-xref.c | Edit: implementations | ~25 | x |
| 1 | 3 | docs/reference/javascript/types/PDFDocument.rst | Edit: API docs | ~25 | x |
| 2 | 4 | TBD | New: multi-version test PDF + generator | ~30 | x |
| 2 | 5 | docs/examples/pdf-version-select.c | New: C example/test | ~80 | x |
| 2 | 6 | docs/examples/pdf-version-select.js | New: JS example/test | ~20 | x |
| 3 | 7 | source/tools/murun.c | Edit: JS wrappers + registration | ~20 | x |
| 3 | 8 | platform/wasm/lib/mupdf.c | Edit: WASM C exports | ~10 | x |
| 3 | 9 | platform/wasm/lib/mupdf.ts | Edit: TypeScript methods | ~10 | x |
| 3 | 10 | platform/java/jni/pdfdocument.c | Edit: JNI wrappers | ~25 | x |
| 3 | 10 | platform/java/src/.../PDFDocument.java | Edit: native declarations | ~3 | x |
| 4 | 11 | source/pdf/pdf-xref.c | Edit: read-only guard | ~3 | x |

Total: ~265 lines across 11 files, in 4 commits
