What is a linearized PDF file? A complete guide
Linearized PDFs, also known as “Fast Web View” PDFs, are optimized for efficient online viewing. Unlike traditional PDFs that require downloading an entire document before displaying any content, linearized PDFs allow for incremental loading. This means the first page can be accessed almost immediately, significantly improving the user experience, especially when dealing with large documents.
In this complete guide, we’ll explore the technical details, benefits, and methods for creating linearized PDFs. We’ll look at open source solutions like Ghostscript, along with Nutrient DWS API and Document Engine. With these tools, you’ll learn how to optimize your PDFs for web viewing, improve document accessibility, and streamline your PDF workflows.
What is a linearized PDF?
A linearized PDF, also known as a “web-optimized” or “Fast Web View” PDF, is a PDF whose internal structure is arranged for faster web viewing. Unlike a standard PDF, which typically has to be downloaded in full before it can be viewed, a linearized PDF lets a viewer display the first page while the rest of the file is still arriving. Users can start reading right away, which significantly improves the experience in web-based environments.
How linearization works
In a standard PDF, the objects that make up a page can be scattered throughout the file, and the cross-reference table that locates them sits at the end, so a viewer typically needs the entire document before it can render anything. In a linearized PDF, this data is reorganized to facilitate fast, on-demand web viewing. Everything needed for the first page is stored up front for immediate access, and additional pages are loaded progressively.
To achieve this, linearization involves adding a linearization dictionary and “hint tables” at the beginning of a document. These elements enable random access to specific pages, allowing a viewer designed for linearized PDFs to request content from the server in sequential “chunks,” ensuring efficient and smooth loading.
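To make the linearization dictionary more concrete, here is a minimal Python sketch that pulls its entries out of the first bytes of a file. The sample bytes mirror the kind of dictionary Ghostscript writes. The function name and the regular expressions are illustrative assumptions, and this is not a real PDF parser. The key descriptions follow the PDF specification's linearization parameter dictionary.

```python
import re

# Meaning of each key in the linearization parameter dictionary.
KEY_MEANINGS = {
    "L": "total file length in bytes",
    "H": "offset and length of the primary hint stream",
    "O": "object number of the first page",
    "E": "offset of the end of the first page",
    "N": "number of pages",
    "T": "offset of the first entry in the main cross-reference table",
}

def parse_linearization_dict(head: bytes) -> dict:
    """Extract linearization parameters from the start of a PDF.

    Hypothetical helper: assumes a flat dictionary containing only
    integers and simple integer arrays.
    """
    match = re.search(rb"<<\s*/Linearized\s+[\d.]+(.*?)>>", head, re.DOTALL)
    if match is None:
        return {}
    body = match.group(1)
    params = {}
    # Scalar entries such as "/L 4107"
    for key, value in re.findall(rb"/([A-Z])\s+(\d+)", body):
        params[key.decode()] = int(value)
    # Array entries such as "/H[ 805 135]"
    for key, values in re.findall(rb"/([A-Z])\s*\[([\d\s]*)\]", body):
        params[key.decode()] = [int(v) for v in values.split()]
    return params

sample = b"4 0 obj <</Linearized 1/L 4107/H[ 805 135]/O 6/E 805/N 1/T 3986>>"
params = parse_linearization_dict(sample)
for key, value in params.items():
    print(f"/{key} = {value}  ({KEY_MEANINGS.get(key, 'unknown')})")
```

A viewer uses values like /E and /H to decide how many bytes it needs for the first page and where to find the hint tables for the remaining pages.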
Understanding the importance of linearized PDFs
The primary benefit of linearizing a PDF is its ability to optimize web viewing by enabling the document to be streamed directly to the user. This is similar to how media platforms like YouTube stream videos by loading the initial part first while buffering the rest. This capability offers several advantages:
- Faster document loading — Linearized PDFs load the first page instantly, reducing waiting times significantly.
- Improved user experience — Especially crucial when accessing large files over networks with limited bandwidth.
- Resilient to network interruptions — Since pages are streamed individually, interruptions in the network don’t require restarting the document download.
- Optimized for mobile devices — Reduces memory usage and improves the viewing experience on mobile devices with limited storage or processing power.
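The streaming behavior described above relies on the viewer asking the server for byte ranges rather than the whole file. The sketch below is a simplified Python illustration of how such requests might be planned: the first page first, then the rest in fixed-size chunks. The function name, the chunk size, and the strictly sequential order are assumptions. Real viewers consult the hint tables and follow the pages the user actually opens.

```python
def plan_range_requests(file_length: int, first_page_end: int,
                        chunk_size: int = 65536) -> list[str]:
    """Return HTTP Range header values: first page, then the remainder in chunks."""
    if not 0 < first_page_end <= file_length:
        raise ValueError("first_page_end must lie within the file")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # HTTP byte ranges are inclusive, so the first page is bytes 0..end-1.
    ranges = [f"bytes=0-{first_page_end - 1}"]
    start = first_page_end
    while start < file_length:
        end = min(start + chunk_size, file_length) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges

# Example: a 4107-byte file whose first page ends at offset 805.
for header in plan_range_requests(4107, 805, chunk_size=2000):
    print("Range:", header)
```

Because each range is an independent request, a dropped connection only costs the chunk that was in flight.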
Using Ghostscript to linearize PDFs
Ghostscript is a powerful open source interpreter for PDF and PostScript files. Licensed under the GNU Affero General Public License (AGPL), it allows developers to freely use, modify, and distribute the software, provided they comply with the license terms. It enables PDF linearization through the -dFastWebView option, allowing the first page of a PDF to load before the entire file is downloaded — a feature especially useful for web applications.
Step 1 — Installation of Ghostscript
For macOS (using Homebrew):
brew install ghostscript
For Ubuntu/Debian:
sudo apt update
sudo apt install ghostscript
For Windows:
- Download the Ghostscript installer from Artifex’s official website.
- Run the installer and follow the installation instructions.
- Add Ghostscript to the system’s PATH to allow access from the command line.
Once installed, you can verify the installation by running the following command in the terminal:
gs --version
Step 2 — Linearizing a PDF with Ghostscript
To linearize a PDF, use the following Ghostscript command:
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dFastWebView=true -sOutputFile=output.pdf input.pdf
- -dNOPAUSE — This option tells Ghostscript to process all pages without pausing between them (no user interaction).
- -dBATCH — Ensures the process runs in batch mode, meaning Ghostscript exits after the job is complete.
- -sDEVICE=pdfwrite — Specifies that the output format should be PDF.
- -dFastWebView=true — Enables linearization, which is required for Fast Web View.
- -sOutputFile=output.pdf — Specifies the name of the output file (in this case, output.pdf).
- input.pdf — Your original PDF file.
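If you want to run the same command from a script, one option is a thin Python wrapper around the Ghostscript executable. This is a minimal sketch, not an official integration. The function names are hypothetical, and it assumes the executable is called gs and is on PATH (on Windows the executable usually has a different name, so pass it explicitly).

```python
import shutil
import subprocess

def build_gs_command(input_path: str, output_path: str,
                     gs_binary: str = "gs") -> list[str]:
    """Assemble the argument list for the linearization command shown above."""
    return [
        gs_binary,
        "-dNOPAUSE",           # no pause between pages
        "-dBATCH",             # exit when the job is done
        "-sDEVICE=pdfwrite",   # write PDF output
        "-dFastWebView=true",  # enable linearization
        f"-sOutputFile={output_path}",
        input_path,
    ]

def linearize_with_ghostscript(input_path: str, output_path: str,
                               gs_binary: str = "gs") -> None:
    """Run Ghostscript; raises if the binary is missing or the job fails."""
    if shutil.which(gs_binary) is None:
        raise FileNotFoundError(f"{gs_binary} was not found on PATH")
    subprocess.run(build_gs_command(input_path, output_path, gs_binary), check=True)

# Show the command without executing it.
print(" ".join(build_gs_command("input.pdf", "output.pdf")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.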
Step 3 — Checking the linearized PDF
Open the linearized PDF in a text editor (like Notepad, TextEdit, or VS Code) and check the header:
%PDF-1.7
%«Ïè¢
%%Invocation: gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dFastWebView=true -sOutputFile=? ?
4 0 obj
<</Linearized 1/L 4107/H[ 805 135]/O 6/E 805/N 1/T 3986>>
The key part to look for is /Linearized 1, which indicates the PDF has been linearized.
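The same check can be automated. The Python sketch below reads the first 1024 bytes of a file in binary mode and looks for the /Linearized entry. The PDF specification requires the linearization dictionary to fit within that window. The function names are illustrative, and the check is a heuristic: it confirms the marker is present, not that the whole file is structurally valid.

```python
import re

def looks_linearized(data: bytes) -> bool:
    """Return True if a /Linearized entry appears in the first 1024 bytes."""
    return re.search(rb"/Linearized\s+\d", data[:1024]) is not None

def file_looks_linearized(path: str) -> bool:
    """Read only the start of the file; binary mode avoids decoding errors."""
    with open(path, "rb") as fh:
        return looks_linearized(fh.read(1024))

header = b"%PDF-1.7\n4 0 obj\n<</Linearized 1/L 4107/H[ 805 135]/O 6/E 805/N 1/T 3986>>"
print(looks_linearized(header))
```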
Limitations of Ghostscript for linearization
- Partial support for PDF specification
  - Warnings like “We don’t support XRefStm with FastWebView (Linearized PDF)” and “We don’t support ObjStms with FastWebView (Linearized PDF)” indicate that Ghostscript doesn’t fully support handling cross-reference streams (XRefStm) and object streams (ObjStms) as per the PDF specification for linearization.
  - While Ghostscript still produces a linearized PDF, these limitations may affect compatibility with certain systems.
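- Compatibility level
  - Depending on the version and settings, Ghostscript may write output at an older default such as -dCompatibilityLevel=1.4, which ensures compatibility with older PDF readers but might not meet requirements for modern or advanced workflows. Set the option explicitly if you need a specific PDF version.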
- Testing required
  - While no critical errors typically occur during linearization, testing the output file is essential to ensure it behaves as expected in real-world scenarios, especially with tools that rely on strict adherence to the PDF specification.
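One quick check to include when testing output is the PDF version declared in the file header, since it reflects the compatibility level the file was written at. The Python sketch below is a small illustration with a hypothetical function name. It only reads the header comment, and a document's catalog can declare a newer version than the header, so treat the result as a first indication.

```python
import re
from typing import Optional

def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version from a '%PDF-x.y' header, or None if it is missing."""
    match = re.match(rb"%PDF-(\d\.\d)", data)
    return match.group(1).decode("ascii") if match else None

print(pdf_header_version(b"%PDF-1.7\n4 0 obj\n<</Linearized 1/L 4107>>"))
```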
Additional considerations
- Performance — Large PDFs may take longer to process with Ghostscript compared to some commercial tools.
- Licensing — Ghostscript is free to use under the AGPL as long as you comply with its terms, which include source-sharing obligations. Projects that can’t meet those terms need a commercial license from Artifex.
Despite these drawbacks, Ghostscript is a free and viable option for projects relying on open source tools. For more robust linearization, consider using Nutrient DWS API or Document Engine, both of which are designed for advanced workflows.
Creating linearized PDFs with Nutrient DWS API
Nutrient DWS API simplifies the process of linearizing PDFs. By following the steps outlined below, you can easily integrate this functionality into your application.
Step 1 — Preparing the prerequisites
Before making an API call, make sure you have the following:
- API key — You’ll need an API key for authentication. Sign up for a free account to receive 100 credits, but note that a free trial entails a watermark on the output documents. These credits are used for API requests, and different operations consume varying amounts of credits.
- PDF document — Choose the PDF document you wish to linearize. Ensure the file is accessible on your local machine or server.
- API testing tool — Use tools like cURL or Postman for testing the API requests.
Step 2 — Making the API request
Once you have your API key and PDF document ready, make a simple API request to linearize the PDF. Here’s the cURL command:
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer your_api_key_here" \
  -o result.pdf \
  -F document=@example-document.pdf \
  -F instructions='{
      "parts": [
        {
          "file": "document"
        }
      ],
      "output": {
        "type": "pdf",
        "optimize": {
          "linearize": true
        }
      }
    }'
- Authorization — Replace your_api_key_here with your actual Nutrient DWS API key to authenticate the request.
- Document — Replace example-document.pdf with the file path of the PDF you wish to linearize. Ensure the file exists at the specified path.
- Linearize — Setting the linearize parameter to true ensures the output will be a linearized PDF, optimized for faster web viewing and streaming.
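For applications that call the API from code rather than the command line, the request above could be sketched in Python as follows. This is an illustration under assumptions, not official client code. It uses the third-party requests library, the function names are hypothetical, and the endpoint, header, and form fields are copied from the cURL command.

```python
import json

API_URL = "https://api.nutrient.io/build"  # endpoint from the cURL example

def build_linearize_instructions() -> str:
    """Serialize the same instructions JSON the cURL command sends."""
    return json.dumps({
        "parts": [{"file": "document"}],
        "output": {"type": "pdf", "optimize": {"linearize": True}},
    })

def linearize_via_api(pdf_path: str, api_key: str,
                      output_path: str = "result.pdf") -> None:
    """Upload a PDF and save the linearized response (hypothetical helper)."""
    import requests  # third-party; imported here so the builder works without it

    with open(pdf_path, "rb") as document:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"document": document},  # multipart field named "document"
            data={"instructions": build_linearize_instructions()},
        )
    response.raise_for_status()  # surface authentication or quota errors
    with open(output_path, "wb") as out:
        out.write(response.content)

print(build_linearize_instructions())
```

Calling linearize_via_api("example-document.pdf", "your_api_key_here") would then write result.pdf, assuming the key is valid.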
Step 3 — Executing the request
- Run the cURL command or use Postman to send the request.
- The API will process the document and return a linearized PDF, saved as result.pdf in your specified directory.
After the API call completes, open result.pdf in any PDF viewer, including web browsers. The benefit is most visible when the file is served over a network: the first page appears almost instantly, even if the document is large.
Why choose Nutrient DWS API?
Nutrient DWS API offers a comprehensive and scalable solution for document processing. Here are some reasons to consider it for your linearization needs:
- Ease of use — Detailed documentation and clear instructions make it easy to integrate the API into your workflows quickly.
- Scalability — Whether you’re working with a few PDFs or handling high-volume document processing, Nutrient’s API is built to scale with your needs.
- Flexibility — The API supports a wide range of document manipulation tools, including PDF conversion, file optimization, and watermarking.
- Transparent pricing — The credit-based pricing model ensures you only pay for what you use, allowing for cost-effective and predictable billing.
Extending your workflows with Nutrient DWS API
Nutrient’s API offers a variety of tools to enhance and streamline your document workflows, providing more than just PDF linearization:
- Digital signatures — Create secure digital signatures with a trusted certificate, all within a single interface.
- PDF generator — Convert HTML documents into fully formatted PDF files with our generation API.
- PDF editor — Merge, split, delete, flatten, and duplicate PDF documents effortlessly with our editing API.
- Converter API — Easily convert popular file formats such as DOCX, PPTX, and images into PDFs, or into image formats like JPG, PNG, and TIFF.
- Watermark — Add custom text or image watermarks to your PDF documents for branding or security.
- Optical character recognition (OCR) — Convert scanned documents into searchable, editable PDFs with our OCR API.
- Data extraction — Extract valuable data from your documents, including text, key values, tables, and more.
- PDF annotations — Add comments, highlights, and other annotations to PDFs to facilitate collaboration and markup.
- PDF form filling — Automatically fill PDF forms with data from your application workflows.
These additional tools, alongside PDF linearization, enable you to create more efficient and customized document processing workflows tailored to your business needs.
Nutrient Document Engine
Nutrient Document Engine is powerful software designed for processing documents and enabling automation workflows. As a backend service, it seamlessly integrates with your infrastructure or can be provided by Nutrient as a managed solution. It complements web and mobile frontend SDKs, allowing developers to manage the entire document lifecycle.
How to linearize a PDF using Document Engine
Document Engine makes linearizing PDFs straightforward with the following API request. Use the curl command below to linearize a PDF file:
curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/example-document.pdf \
  -F instructions='{
      "parts": [
        {
          "file": "document"
        }
      ],
      "output": {
        "type": "pdf",
        "optimize": {
          "linearize": true
        }
      }
    }' \
  -o result.pdf
- Endpoint — Replace http://localhost:5000 with the URL where your Document Engine is hosted, if necessary.
- Authorization — Replace <API token> with your actual API token to authenticate the request.
- document — Replace /path/to/example-document.pdf with the file path to the PDF you want to linearize.
- instructions — This field specifies the processing instructions. The key part here is "linearize": true, which indicates the PDF should be linearized.
- output — The final output will be saved as result.pdf, a linearized PDF.
Licensing considerations for PDF linearization
To use the linearization feature in Document Engine, you must have it included in your license. If this feature isn’t already part of your license, contact Nutrient’s Sales team to add it. After adding the feature, be sure to update your license or activation keys in your configuration.
Combining linearization with PDF compression
You can also combine linearization with other PDF optimizations like compression. If your license includes both linearization and compression features, you can apply them simultaneously in a single API request:
{
  "parts": [
    {
      "file": "document"
    }
  ],
  "output": {
    "type": "pdf",
    "optimize": {
      "grayscaleText": true,
      "grayscaleGraphics": true,
      "grayscaleFormFields": true,
      "grayscaleAnnotations": true,
      "disableImages": true,
      "mrcCompression": true,
      "imageOptimizationQuality": 2,
      "linearize": true
    }
  }
}
In the instructions above, multiple optimization features are specified, such as:
- Grayscale optimization — Reduces the size of the PDF by converting text, graphics, form fields, and annotations to grayscale.
- Image optimization — Compresses images to improve load times.
- MRC compression — Applies mixed raster content (MRC) compression for further file size reduction.
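When the set of optimizations varies per document, it can help to assemble the instructions in code instead of hand-editing JSON. The Python sketch below is one way to do that. The helper name is hypothetical, and the list of accepted option names is taken only from the example in this article, so the actual API may accept others.

```python
import json

# Option names copied from the example instructions; treat this set as an assumption.
KNOWN_OPTIMIZE_OPTIONS = {
    "grayscaleText", "grayscaleGraphics", "grayscaleFormFields",
    "grayscaleAnnotations", "disableImages", "mrcCompression",
    "imageOptimizationQuality", "linearize",
}

def build_instructions(**optimize_options) -> dict:
    """Combine linearization with any extra optimize options in one request body."""
    unknown = set(optimize_options) - KNOWN_OPTIMIZE_OPTIONS
    if unknown:
        raise ValueError(f"Unrecognized optimize options: {sorted(unknown)}")
    # Linearization is on by default; callers can still override it.
    optimize = {"linearize": True, **optimize_options}
    return {
        "parts": [{"file": "document"}],
        "output": {"type": "pdf", "optimize": optimize},
    }

instructions = build_instructions(mrcCompression=True, imageOptimizationQuality=2)
print(json.dumps(instructions, indent=2))
```

Rejecting unknown keys early catches typos such as a misspelled option name before the request is sent.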
To learn more about the various compression methods supported by Document Engine, refer to our PDF compression guide.
Best practices for working with linearized PDFs
- Use reliable tools — Choose trusted PDF creation and editing tools that support linearization to ensure reliable performance.
- Optimize content — Before linearizing, optimize your PDF content (e.g. reducing image size or simplifying design) for better efficiency.
- Test performance — After linearizing, test your PDF across different environments (web browsers, mobile devices) to ensure fast and smooth performance.
- Monitor file size — Linearization can increase file size slightly because of the added hint tables. Keep the overall size manageable to prevent slow load times.
- Avoid incremental saving — Incremental updates appended after linearization invalidate it, and viewers then treat the file as a regular PDF. Save your PDFs in full, and linearize again after editing, to keep Fast Web View working.
- Compress PDF files — Use compression techniques to reduce file size and improve loading times, especially for larger documents.
- Use PDF optimization tools — Leverage PDF tools to optimize and compress large PDFs for more efficient web viewing.
- Avoid embedded fonts — Embedded fonts can increase file size and slow down loading times, so consider using standard fonts when possible.
By following these best practices, you can ensure your linearized PDFs are optimized for web viewing, providing a smoother, faster experience for your users.
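The advice about incremental saving can be checked mechanically. In a linearized file, the /L entry of the linearization dictionary records the total file length, so a mismatch with the actual size suggests the file was modified after linearization. The Python sketch below illustrates the idea. The function name and the regular expression are assumptions, and it is a heuristic rather than a full validator.

```python
import re

def linearization_is_intact(data: bytes) -> bool:
    """Return True if /L in the linearization dictionary equals the byte length."""
    match = re.search(rb"/Linearized\s+[\d.]+.*?/L\s+(\d+)", data[:1024], re.DOTALL)
    return match is not None and int(match.group(1)) == len(data)

# Build a 100-byte stand-in for a linearized file whose /L entry says 100.
original = b"%PDF-1.7\n1 0 obj\n<</Linearized 1/L 100/O 3>>\nendobj\n".ljust(100, b" ")
updated = original + b"\n% bytes appended by an incremental save"

print(linearization_is_intact(original))  # /L matches the length
print(linearization_is_intact(updated))   # length changed after linearization
```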
Conclusion
We explored tools like Ghostscript, Nutrient DWS API, and Nutrient Document Engine for creating and optimizing linearized PDFs. These solutions provide easy ways to implement Fast Web View, enhancing the accessibility and speed of PDFs in web applications.
Sign up for a free trial today to explore DWS API and, for advanced features like Document Engine, contact our Sales team to discuss licensing options.
FAQ
Here are a few frequently asked questions about linearized PDFs.
How does a linearized PDF differ from a standard PDF?
A linearized PDF allows the first page to load immediately, while a standard PDF requires the entire document to download before displaying any content.
How can I linearize PDFs using Nutrient Document Engine?
You can linearize PDFs with Nutrient Document Engine by making an API request that includes the linearize parameter set to true, which optimizes the document for faster web viewing.
How do I check if my PDF is linearized?
Open the PDF in a text editor and look for /Linearized 1 in the header, which indicates the PDF has been linearized.
What are some limitations of using Ghostscript for PDF linearization?
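Ghostscript only partially supports the PDF specification for linearization: it warns that cross-reference streams and object streams aren’t supported with Fast Web View, and it may write output at an older compatibility level. Large files can also be slow to process, and its AGPL license means proprietary projects may need a commercial license.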
Is DWS API suitable for large-scale document processing?
Yes, DWS API is designed to scale efficiently, handling high-volume document processing for enterprise-level workflows.