1.6 release notes

Before attempting to upgrade to Document Engine 1.6, first upgrade to Document Engine 1.5 if you haven’t already, and make sure your application still runs as expected. Read our general advice before doing so.

Highlights

This release introduces compatibility with the new versioning of Nutrient Web SDK (1.0.0 follows 2024.8.x). Make sure to reread the client integration documentation, because Nutrient Web SDK renamed pspdfkit.js to nutrient-viewer.js and introduced NutrientViewer as the replacement for PSPDFKit (PSPDFKit is still kept as an alias).

Additionally, Document Engine now responds with the Server header set to Document Engine/x.y.z, where x.y.z is the version of Document Engine. This change is subtle, but it helps you identify problems more quickly (for example, by determining whether a response is coming from Document Engine or from a proxy like NGINX).
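As a rough illustration of how the Server header could be used when debugging, the sketch below inspects a response's headers and extracts the version if the response came from Document Engine. The function name document_engine_version and the regular expression are assumptions made for this example, based only on the Document Engine/x.y.z format described above.

```python
import re
from typing import Mapping, Optional, Tuple

# Assumed pattern, derived from the "Document Engine/x.y.z" format described above.
_SERVER_RE = re.compile(r"^Document Engine/(\d+)\.(\d+)\.(\d+)")


def document_engine_version(headers: Mapping[str, str]) -> Optional[Tuple[int, int, int]]:
    """Return (x, y, z) if the Server header names Document Engine, else None.

    Hypothetical helper for illustration; it isn't part of any Nutrient SDK.
    """
    # HTTP header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    match = _SERVER_RE.match(normalized.get("server", "").strip())
    if match is None:
        # The response was likely produced by something else, such as a proxy.
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)
```

A None result for a response you expected from Document Engine suggests that an intermediary, such as a reverse proxy, generated or rewrote the response.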

The HTML conversion engine no longer triggers unwanted outgoing network activity. There should be no traffic to optimizationguide-pa.googleapis.com (port 443, UDP) or 239.255.255.250 (port 1900, UDP). We fixed the conversion engine configuration to make sure this doesn’t happen again.

TIFF conversion now supports 1-bit TIFF files with min-is-white photometric interpretation and 0 DPI (e.g. barcodes). Version 1.6 also improves memory allocation and CPU load when handling large TIFF files.

Version 1.6 adds two new upstream API endpoints, allowing you to fetch the text of all pages in a document:

  • /api/documents/:document_id/pages/text

  • /api/documents/:document_id/layers/:layer_name/pages/text
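The following sketch shows one way to build the paths for the two endpoints above from a document ID and an optional layer name. The helper name pages_text_path is hypothetical. The HTTP method, authentication, and response format aren't covered in these notes, so check the API reference before calling the endpoints.

```python
from typing import Optional
from urllib.parse import quote


def pages_text_path(document_id: str, layer_name: Optional[str] = None) -> str:
    """Build the path of the page text endpoint for a document or one of its layers.

    Hypothetical helper for illustration; only the path templates come from
    the release notes.
    """
    # Percent-encode each value so reserved characters stay within one path segment.
    encoded_id = quote(document_id, safe="")
    if layer_name is None:
        return f"/api/documents/{encoded_id}/pages/text"
    encoded_layer = quote(layer_name, safe="")
    return f"/api/documents/{encoded_id}/layers/{encoded_layer}/pages/text"
```

For example, pages_text_path("abc") yields /api/documents/abc/pages/text, and passing a layer name selects the layer-specific variant.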

Breaking changes

JSON Web Token (JWT) authorization now requires you to list the document identifiers the user is allowed to access in the allowed_document_ids claim of the JWT. If the claim isn’t specified, the user isn’t allowed to access any documents. Examples:

  • Claim not set, set to null, or set to an empty array ({ "allowed_document_ids": [] }): no access allowed

  • { "allowed_document_ids": "any" }: access to any document is allowed

  • { "allowed_document_ids": ["foo", "bar"] }: access to documents with the identifiers foo or bar is allowed
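The access rules in the examples above can be modeled with a small function, which may help when writing tests for the code that issues your JWTs. This is an illustrative model of the documented behavior, not Document Engine’s actual implementation. The function name may_access is hypothetical, and treating any other claim value as "no access" is an assumption.

```python
from typing import Any, Mapping


def may_access(claims: Mapping[str, Any], document_id: str) -> bool:
    """Model the allowed_document_ids rules listed above for decoded JWT claims."""
    allowed = claims.get("allowed_document_ids")
    if allowed == "any":
        # The special string "any" grants access to every document.
        return True
    if isinstance(allowed, (list, tuple)):
        # An explicit list grants access only to the listed identifiers;
        # an empty list therefore grants access to nothing.
        return document_id in allowed
    # A missing claim or null denies access. Any other value is also denied here,
    # which is an assumption of this sketch.
    return False
```

With this model, may_access({"allowed_document_ids": ["foo", "bar"]}, "foo") is True, while a missing claim, null, or an empty array always yields False.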

Database migrations

Document Engine creates two sets of assets whenever it has to convert a document from a different format into a PDF on upload. These assets are:

  • The PDF result of the conversion, referred to as source_pdf in the documents table.

  • The original file of the conversion, referred to as original_file in the documents table.

Historically, both source_pdf and original_file referred to assets in the pdfs table using an encoded form of the assets’ sha256 hash. This release ships a change that lazily starts using uuid instead of sha256 to refer to original_file assets in the documents table.

Changelog

A full list of changes, along with the issue numbers, is available here.