Complete guide to PDF.js: The leading JavaScript library for PDF rendering
This article was first published in June 2023 and was updated in August 2024.
In this post, we’ll provide you with a complete overview of PDF.js, including a quick technology explainer and key features. You’ll also get a step-by-step integration guide that covers opening a PDF with PDF.js, manipulating pages, handling annotations, customizing your viewer, and more.
If you’re exploring more advanced solutions, Nutrient Web SDK is another robust option to consider for your PDF needs.
Introduction to PDF.js
What is PDF.js?
PDF.js is an open source JavaScript library that allows you to render PDF files in a web browser without the need for any plugins or external software. It was released by Mozilla on 2 July 2011 and is now maintained by a team of developers from across the world. PDF.js is built on top of the Canvas API and other web technologies, making it easy to integrate into web applications.
How PDF.js works: Core concepts and architecture
PDF.js is designed to be modular, with each module focusing on a specific task. This layered design lets you work at the level you need. For example, you can use only the display layer and build your own interface instead of shipping the full viewer. PDF.js has three layers, each with its own purpose:
- Core layer — This is the lowest-level layer of PDF.js, responsible for parsing the binary format of a PDF file and converting it into an internal representation that can be used by higher-level layers. The core layer is typically used directly only by advanced users who need fine-grained control of the parsing process.
- Display layer — The display layer builds upon the core layer and provides a more user-friendly API for rendering PDF files. With the display layer, you can render a PDF page into a <canvas> element using just a few lines of JavaScript code. This layer is suitable for most day-to-day use cases.
- Viewer layer — The viewer layer is a ready-to-use user interface (UI) that comes with PDF.js. It includes features like search, rotation, a thumbnail sidebar, and more. The viewer layer is built on top of the display layer and provides a complete PDF viewing experience out of the box.
PDF.js key features
PDF.js provides a set of features for viewing, annotating, and manipulating PDF documents:
- Render PDF documents in the browser using the HTML5 <canvas> element
- Search for text within a document
- View page thumbnails
- Zoom in and out of pages
- Rotate pages
- Add text and highlight annotations to a document
- Fill out PDF form fields
- View and navigate through bookmarks and document outlines
Some PDF.js limitations
PDF.js is a powerful library for rendering PDF documents in the browser, but like any software, it has its limitations and drawbacks:
- PDF.js rendering can be slow and resource-intensive, particularly for large or complex PDF documents.
- Text selection and searching can be slow or inaccurate, particularly for documents with complex formatting or embedded images.
- The accuracy of PDF.js rendering can vary depending on the browser and platform being used.
- Certain PDF features, such as interactive forms and multimedia elements, may not be fully supported or may not work as expected in PDF.js.
- PDF.js doesn’t support all of the features available in the latest PDF specification, so some documents may not render correctly or may not be compatible with PDF.js at all.
- PDF.js may not be the best choice for applications that require advanced PDF functionality or performance, such as document editing or printing. In these cases, a more specialized PDF library may be a better option.
Step-by-step integration guide
Getting started with PDF.js
- Download or clone PDF.js — You can download the prebuilt library as a ZIP file or clone the repository using Git. A cloned repository contains only the source code, so you’d need to build it first (the project’s README describes the build steps) before the files below exist.
- Prepare the files — Extract the ZIP file and copy the pdf.mjs and pdf.worker.mjs files from the build/ folder to your project directory.
- Include PDF.js in your HTML — Add the following script tag to your HTML file to load the PDF.js library:
<script src="./pdf.mjs" type="module"></script>
After including the script tag, you can start using PDF.js to render PDF files on your webpage.
Rendering a PDF file with PDF.js
To render a PDF file using PDF.js, follow these steps.
- Add a <canvas> element to your HTML where the PDF will be displayed:
<canvas id="pdf-canvas"></canvas>
- Create a file named index.js. In it, use document.getElementById to select the <canvas> element in your HTML where the PDF will be displayed:
const canvas = document.getElementById('pdf-canvas');
Define the URL of the PDF file you want to render. Ensure this file is accessible from your project directory:
const pdfUrl = 'pspdfkit-web-demo.pdf';
PDF.js does its parsing in a separate web worker, so set the path to the worker file first. Then use pdfjsLib.getDocument(pdfUrl).promise to load the PDF file. The promise resolves to a PDFDocumentProxy object:
pdfjsLib.GlobalWorkerOptions.workerSrc = './pdf.worker.mjs';

pdfjsLib.getDocument(pdfUrl).promise.then(function (pdfDoc) {
  // Continue with further steps.
});
Retrieve the desired page from the loaded PDF document using pdfDoc.getPage(1) (page numbers start at 1). This returns a Promise resolving to a PDFPageProxy object representing the page:
pdfDoc.getPage(1).then(function (page) {
  // Continue with further steps.
});
Calculate and set the dimensions of the canvas to match the size of the PDF page to ensure correct display:
const viewport = page.getViewport({ scale: 1 });
canvas.width = viewport.width;
canvas.height = viewport.height;
Get the 2D rendering context from the canvas. Create a renderContext object with the canvas context and viewport settings. Call the page.render() method to render the PDF page onto the canvas:
const ctx = canvas.getContext('2d');
const renderContext = {
  canvasContext: ctx,
  viewport: viewport,
};
page.render(renderContext);
Implement error handling to manage cases where the PDF file might be missing or corrupted. Use the catch() method to log errors to the console or display an appropriate message:
pdfjsLib
  .getDocument(pdfUrl)
  .promise.then(function (pdfDoc) {
    // Handling and rendering logic.
  })
  .catch(function (error) {
    console.log('Error loading PDF file:', error);
  });
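- Include the index.js file in your HTML file, after the pdf.mjs script tag:
<script src="./index.js" type="module"></script>
The type="module" attribute lets the script use ES module syntax such as import and export. Module scripts are deferred, so they run in document order once the HTML has been parsed, and each one gets its own scope, so its top-level variables don’t leak into the global scope.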
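The catch() handler above logs every failure the same way. If you’d rather show a specific message, one option is to branch on the error’s name. The sketch below uses a hypothetical helper, describeLoadError, and the exception names it checks (InvalidPDFException, MissingPDFException, PasswordException, UnexpectedResponseException) are the ones used by PDF.js 4.x; newer releases may rename or merge them, so check against the version you ship.

```javascript
// Hypothetical helper: map a PDF.js loading error to a user-facing message.
// The names below match the exception classes in PDF.js 4.x; verify them
// against the PDF.js version you use.
function describeLoadError(error) {
  switch (error && error.name) {
    case 'InvalidPDFException':
      return 'The file is not a valid PDF or is corrupted.';
    case 'MissingPDFException':
      return 'The PDF file could not be found.';
    case 'PasswordException':
      return 'This PDF is password-protected.';
    case 'UnexpectedResponseException':
      return 'The server returned an unexpected response.';
    default:
      return 'The PDF could not be loaded.';
  }
}

// Usage inside the catch() handler from the previous step:
// .catch(function (error) {
//   console.log(describeLoadError(error), error);
// });
```

Anything that doesn’t match a known name falls through to a generic message, so the helper is safe to call with any rejection value.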
Make sure to add your PDF file to the same directory as the HTML file. You can use the demo PDF file as an example.
This is a simple example of how to render a PDF file using PDF.js. PDF.js provides many more options and features that you’ll explore in the next sections.
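The example above sizes the canvas in CSS pixels, which can look blurry on high-density displays. A common refinement, used in the official PDF.js examples, is to enlarge the canvas backing store by window.devicePixelRatio and pass a matching transform to page.render(). The helper below (hiDpiCanvasSize is a hypothetical name) only does the arithmetic, so it can run outside the browser; the commented lines sketch how it might plug into the rendering step.

```javascript
// Compute canvas sizes for sharp rendering on high-density screens.
// viewportWidth/viewportHeight are in CSS pixels (from page.getViewport());
// outputScale is typically window.devicePixelRatio in the browser.
function hiDpiCanvasSize(viewportWidth, viewportHeight, outputScale = 1) {
  return {
    // Backing-store size: the number of device pixels the canvas holds.
    width: Math.floor(viewportWidth * outputScale),
    height: Math.floor(viewportHeight * outputScale),
    // Displayed size: keeps the page at its CSS-pixel dimensions.
    cssWidth: Math.floor(viewportWidth) + 'px',
    cssHeight: Math.floor(viewportHeight) + 'px',
    // Passed to page.render() so drawing is scaled up to the larger canvas.
    transform: outputScale !== 1 ? [outputScale, 0, 0, outputScale, 0, 0] : null,
  };
}

// Browser usage, assuming `page`, `canvas`, and `ctx` from the steps above:
// const viewport = page.getViewport({ scale: 1 });
// const size = hiDpiCanvasSize(viewport.width, viewport.height, window.devicePixelRatio || 1);
// canvas.width = size.width;
// canvas.height = size.height;
// canvas.style.width = size.cssWidth;
// canvas.style.height = size.cssHeight;
// page.render({ canvasContext: ctx, viewport, transform: size.transform });
```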
Running the project
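To run the project, follow the steps in this section. Browsers generally block ES module scripts and fetch requests on pages opened directly from the file system (file:// URLs), so you need to serve the project over HTTP with a local web server.
- Install the serve package:
npm install --global serve
- Serve the contents of the current directory:
serve -l 8080 .
- Navigate to http://localhost:8080 to view the project.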
User interface enhancements
To improve the user experience when working with PDFs, you can add various UI enhancements such as navigation controls and zoom functionalities. The next sections explain how you can integrate these features.
Adding navigation controls
- Include buttons for navigation in your HTML file:
<button id="prev-page">Previous Page</button>
<button id="next-page">Next Page</button>
- Update your index.js to handle page navigation. Add event listeners to the buttons and manage page navigation:
// Assumes `canvas` and `pdfUrl` are defined as in the previous section.
const initialState = {
  pdfDoc: null,
  currentPage: 1,
  zoom: 1,
};

document
  .getElementById('prev-page')
  .addEventListener('click', function () {
    if (initialState.pdfDoc && initialState.currentPage > 1) {
      initialState.currentPage--;
      renderPage(initialState.currentPage);
    }
  });

document
  .getElementById('next-page')
  .addEventListener('click', function () {
    if (
      initialState.pdfDoc &&
      initialState.currentPage < initialState.pdfDoc.numPages
    ) {
      initialState.currentPage++;
      renderPage(initialState.currentPage);
    }
  });

function renderPage(pageNumber) {
  if (!initialState.pdfDoc) return;

  initialState.pdfDoc
    .getPage(pageNumber)
    .then((page) => {
      const viewport = page.getViewport({
        scale: initialState.zoom,
      });
      canvas.width = viewport.width;
      canvas.height = viewport.height;

      const ctx = canvas.getContext('2d');
      const renderContext = {
        canvasContext: ctx,
        viewport: viewport,
      };

      page
        .render(renderContext)
        .promise.then(() => {
          console.log('Rendering complete');
        })
        .catch((error) => {
          console.log('Error rendering page:', error);
        });
    })
    .catch((error) => {
      console.log('Error loading page:', error);
    });
}

// Store the loaded document in the shared state so the handlers above can use it,
// then render the first page. This replaces the getDocument() call from the previous section.
pdfjsLib
  .getDocument(pdfUrl)
  .promise.then(function (pdfDoc) {
    initialState.pdfDoc = pdfDoc;
    renderPage(initialState.currentPage);
  })
  .catch(function (error) {
    console.log('Error loading PDF file:', error);
  });
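Both click handlers repeat the same bounds check. A minimal sketch of pulling that check into a pure function follows, along with a validator for a page-number box. clampPage, parsePageInput, and the page-number input are hypothetical additions for illustration, not part of PDF.js.

```javascript
// Hypothetical helper: move `delta` pages from `current`, clamped to 1..numPages.
function clampPage(current, delta, numPages) {
  return Math.min(Math.max(current + delta, 1), numPages);
}

// Hypothetical helper: validate a page number typed by the user (for example,
// into an <input id="page-number"> you add yourself). Returns null if invalid.
function parsePageInput(text, numPages) {
  const n = Number.parseInt(String(text).trim(), 10);
  if (!Number.isInteger(n) || n < 1 || n > numPages) return null;
  return n;
}

// Browser usage with the initialState object and renderPage() from above:
// const next = clampPage(initialState.currentPage, 1, initialState.pdfDoc.numPages);
// if (next !== initialState.currentPage) {
//   initialState.currentPage = next;
//   renderPage(next);
// }
```

Keeping the arithmetic separate from the DOM code makes it easy to reuse the same check for keyboard shortcuts or a page-number field.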
Adding zoom controls
- Include buttons for zoom controls in your HTML file:
<button id="zoom-in">Zoom In</button>
<button id="zoom-out">Zoom Out</button>
- Update index.js to manage zoom functionality. Add event listeners to the zoom buttons and adjust the scale of the viewport:
document
  .getElementById('zoom-in')
  .addEventListener('click', function () {
    if (initialState.pdfDoc) {
      initialState.zoom *= 4 / 3; // Increase zoom scale.
      renderPage(initialState.currentPage);
    }
  });

document
  .getElementById('zoom-out')
  .addEventListener('click', function () {
    if (initialState.pdfDoc) {
      initialState.zoom *= 3 / 4; // Decrease zoom scale (inverse of zoom in).
      renderPage(initialState.currentPage);
    }
  });
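Repeated clicks can also push the scale to extreme values. One way to bound it, sketched here with hypothetical names and assumed limits (ZOOM_STEP, MIN_ZOOM, and MAX_ZOOM are illustrative choices, not PDF.js settings), is to compute the next level in a small helper that clamps the result.

```javascript
// Assumed values; tune them for your viewer.
const ZOOM_STEP = 1.25;
const MIN_ZOOM = 0.25;
const MAX_ZOOM = 4;

// Return the next zoom level: direction > 0 zooms in, anything else zooms out.
// Multiplying and dividing by the same step keeps zoom in/out symmetric.
function nextZoom(current, direction) {
  const next = direction > 0 ? current * ZOOM_STEP : current / ZOOM_STEP;
  return Math.min(Math.max(next, MIN_ZOOM), MAX_ZOOM);
}

// Browser usage with the state and renderPage() from above:
// initialState.zoom = nextZoom(initialState.zoom, 1);
// renderPage(initialState.currentPage);
```

Using one factor in both directions means zooming in and then out returns to the starting scale.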
Advanced features and customizations
This section explores more advanced features and customizations you can add to your PDF viewer. These include handling annotations, enabling text selection, controlling rendering options, navigating documents, and manipulating PDFs. Each feature enhances the functionality and user experience of your PDF viewer, making it more interactive and efficient.
Handling PDF annotations
PDF.js is primarily designed as a viewer for rendering PDF files. Its API doesn’t offer general-purpose editing of PDF content, although the prebuilt viewer includes a basic annotation editor. It does, however, provide a way to access existing PDF annotations such as links, highlights, and comments through the getAnnotations() method on a PDF page.
Here’s an example of how to render PDF annotations:
// Assumes `page` (a PDFPageProxy) and `canvas` from the earlier steps. The overlays
// are positioned inside the canvas's parent, so give that parent position: relative
// and keep the canvas at its top-left corner.
// Use the same scale here that you used to render the page.
const viewport = page.getViewport({ scale: 1 });

function addOverlay(annotation, color, text) {
  // PDF rectangles use a bottom-left origin; convert them to viewport (top-left)
  // coordinates. The converted corners may come back in any order, so normalize.
  const [x1, y1, x2, y2] = viewport.convertToViewportRectangle(annotation.rect);
  const overlay = document.createElement('div');
  overlay.style.position = 'absolute';
  overlay.style.left = Math.min(x1, x2) + 'px';
  overlay.style.top = Math.min(y1, y2) + 'px';
  overlay.style.width = Math.abs(x2 - x1) + 'px';
  overlay.style.height = Math.abs(y2 - y1) + 'px';
  overlay.style.backgroundColor = color;
  overlay.style.opacity = '0.5';
  if (text) {
    overlay.innerText = text;
  }
  canvas.parentNode.appendChild(overlay);
}

// Load annotation data.
page
  .getAnnotations()
  .then(function (annotations) {
    annotations.forEach(function (annotation) {
      if (annotation.subtype === 'Text') {
        // Render a text annotation. Recent PDF.js versions expose the
        // annotation's text as contentsObj.str.
        const contents = annotation.contentsObj ? annotation.contentsObj.str : '';
        addOverlay(annotation, 'green', contents);
      } else if (annotation.subtype === 'Highlight') {
        // Render a highlight annotation.
        addOverlay(annotation, 'yellow');
      }
    });
  })
  .catch(function (error) {
    console.log('Error loading annotations:', error);
  });
Make sure the PDF file you’re using actually contains annotations.
This code snippet retrieves all the annotations for the current page using the page.getAnnotations()
method. It then loops through each annotation and checks its subtype to determine what type of annotation it is.
For text annotations, it creates a div
element, sets its position and dimensions using the annotation’s rectangle coordinates, and adds it to the container element with a green background color and opacity of 0.5. Similarly, for highlight annotations, it creates a div
element, sets its position and dimensions using the annotation’s rectangle coordinates, and adds it to the container element with a yellow background color and opacity of 0.5.
This code renders text annotations as green semi-transparent rectangles containing the annotation’s text, and highlight annotations as yellow rectangles with 50 percent opacity, each placed over the PDF page at the position given by the annotation’s rectangle coordinates.
Check out the source code for this example on GitHub.
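PDF annotation rectangles are in PDF user space, where the origin is the bottom-left corner of the page and y grows upward, while CSS positions start at the top-left. The arithmetic below is a sketch of that conversion for an unrotated page with no viewport offset, assuming `view` is the page’s [xMin, yMin, xMax, yMax] box (available as page.view in PDF.js) and `scale` is the viewport scale used for rendering. pdfRectToCss is a hypothetical name; for rotated pages or custom offsets, PDF.js’s viewport.convertToViewportRectangle() handles the general case.

```javascript
// Convert an annotation rect [x1, y1, x2, y2] from PDF user space
// (origin bottom-left, y up) to CSS pixels (origin top-left, y down).
// Assumes an unrotated page with no viewport offset.
function pdfRectToCss(rect, view, scale) {
  const [x1, y1, x2, y2] = rect;
  const xMin = view[0];
  const yMax = view[3];
  return {
    left: (Math.min(x1, x2) - xMin) * scale,
    top: (yMax - Math.max(y1, y2)) * scale, // flip the y axis
    width: Math.abs(x2 - x1) * scale,
    height: Math.abs(y2 - y1) * scale,
  };
}

// Browser usage inside the getAnnotations() callback:
// const box = pdfRectToCss(annotation.rect, page.view, viewport.scale);
// overlay.style.left = box.left + 'px';
// overlay.style.top = box.top + 'px';
```

For a 612 × 792 point page, a rectangle whose top edge sits at y = 750 ends up 42 CSS pixels from the top at scale 1, which is why using rect[1] directly as the CSS top places overlays in the wrong spot.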
Handling PDF text selection
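PDF.js supports selecting and copying text from PDF files. In the prebuilt viewer, selection works through a transparent text layer positioned over the canvas, built from the page’s text content. You can access that text content directly using the getTextContent() method on a PDF page.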
Here’s an example of how to extract text from a PDF page:
page.getTextContent().then(function (textContent) {
  let text = '';
  for (let i = 0; i < textContent.items.length; i++) {
    const item = textContent.items[i];
    text += item.str;
  }
  console.log(text);
});
This code will extract all the text from the PDF page and log it to the console.
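Concatenating item.str, as above, runs the lines of the page together. Text items returned by getTextContent() also carry a hasEOL flag marking the end of a line, so a slightly more readable variant inserts line breaks there. The function below is a sketch under that assumption; textContentToString is a hypothetical name.

```javascript
// Join PDF.js text-content items into plain text, adding a newline wherever
// an item's hasEOL flag is set. Entries without a string `str` (marked-content
// markers, which appear only when includeMarkedContent is requested) are skipped.
function textContentToString(textContent) {
  let text = '';
  for (const item of textContent.items) {
    if (typeof item.str !== 'string') continue;
    text += item.str;
    if (item.hasEOL) text += '\n';
  }
  return text;
}

// Browser usage:
// page.getTextContent().then((textContent) => {
//   console.log(textContentToString(textContent));
// });
```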
Controlling PDF rendering
PDF.js provides many options for controlling how PDF files are rendered. These options can be passed as parameters to the page.render() method.
Here are some of the most common options:
- canvasContext — Specifies the rendering context to use for rendering the PDF page. This is typically a 2D canvas context.
- viewport — Specifies the viewport to use for rendering the PDF page. The viewport, created with page.getViewport(), defines how the page is mapped onto the canvas. You customize it by passing options such as scale, rotation, offsetX, and offsetY to getViewport().
- background — Specifies the color or pattern to use for the background of the canvas. This can be set to a CSS color value, a canvas gradient, or a canvas pattern object.
Here’s an example of how to use some of these options:
page.render({
  canvasContext: ctx,
  viewport: page.getViewport({ scale: 1.5 }),
  background: 'rgb(255, 0, 0)',
});
This will render the PDF page with a 1.5x zoom and with a red background.
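Rotation is one of the viewport options, and it changes the viewport’s dimensions: at 90 or 270 degrees, width and height swap. The sketch below reproduces that arithmetic so you can, for example, size a container before rendering. normalizeRotation and rotatedViewportSize are hypothetical helpers. In PDF.js, the rotation you pass to page.getViewport() replaces the page’s own rotation (page.rotate) rather than adding to it, which is why the usage comment adds the two.

```javascript
// Normalize a rotation to 0, 90, 180, or 270 degrees.
// This sketch rejects angles that aren't multiples of 90.
function normalizeRotation(degrees) {
  if (degrees % 90 !== 0) {
    throw new RangeError('Rotation must be a multiple of 90 degrees');
  }
  return ((degrees % 360) + 360) % 360;
}

// Viewport size in CSS pixels for a page that is pageWidth x pageHeight
// points when unrotated: 90 and 270 degrees swap width and height.
function rotatedViewportSize(pageWidth, pageHeight, scale, rotation) {
  const r = normalizeRotation(rotation);
  const swap = r === 90 || r === 270;
  return {
    width: (swap ? pageHeight : pageWidth) * scale,
    height: (swap ? pageWidth : pageHeight) * scale,
  };
}

// Browser usage with a hypothetical userRotation value from a "Rotate" button:
// const rotation = normalizeRotation(page.rotate + userRotation);
// const viewport = page.getViewport({ scale: 1.5, rotation });
```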
Navigating a PDF document with PDF.js
The prebuilt PDF.js viewer supports several ways to navigate a document, including scrolling, zooming, and searching. With the display-layer setup from this guide, you implement these yourself, as with the navigation and zoom controls above. Here’s an overview of the most commonly used navigation methods:
- Scrolling — PDF.js allows you to scroll through a document using the mouse or touchpad. You can also use the scrollbar to navigate through the document.
- Zooming — You can zoom in and out of a PDF document using the mouse or touchpad. You can also use the zoom buttons on the toolbar to zoom in and out.
- Searching — PDF.js provides a search bar that allows you to search for specific words or phrases in a PDF document. You can also open it with Ctrl+F (Cmd+F on macOS) to search for text within the document.
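The search bar belongs to the prebuilt viewer. In a custom viewer like the one built in this guide, one simple approach is to extract each page’s text with getTextContent() and search those strings. The sketch below assumes the text has already been collected into an array with one string per page; findInPages is a hypothetical helper, and it’s far simpler than the viewer’s own find logic, which also deals with details such as diacritics and highlighting matches on the page.

```javascript
// Find case-insensitive matches of `query` in per-page text
// (pageTexts[0] is page 1). Returns [{ pageNumber, index }] in reading order.
// Indexes refer to the lowercased text, which matches the original for most scripts.
function findInPages(pageTexts, query) {
  const needle = query.toLowerCase();
  const results = [];
  if (!needle) return results;
  pageTexts.forEach((text, i) => {
    const haystack = text.toLowerCase();
    let index = haystack.indexOf(needle);
    while (index !== -1) {
      results.push({ pageNumber: i + 1, index });
      index = haystack.indexOf(needle, index + needle.length);
    }
  });
  return results;
}

// Browser usage: collect each page's text first, then search it.
// const pages = await Promise.all(
//   Array.from({ length: pdfDoc.numPages }, (_, i) =>
//     pdfDoc.getPage(i + 1)
//       .then((page) => page.getTextContent())
//       .then((tc) => tc.items.map((item) => item.str).join(''))
//   )
// );
// console.log(findInPages(pages, 'invoice'));
```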
Manipulating a PDF document with PDF.js
While PDF.js is primarily a PDF viewer, it offers some manipulation capabilities, such as filling out forms:
- Users can complete and submit PDF forms.
- Filled-out forms can be saved as new PDF documents.
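In code, saving works through pdfDoc.saveDocument(), which resolves to the document bytes (a Uint8Array) with edits applied, including form values that PDF.js has recorded in pdfDoc.annotationStorage. That storage is filled when form fields are rendered through PDF.js’s annotation layer, as in the prebuilt viewer. The sketch below, using the hypothetical name exportFilledPdf, wraps those bytes in a Blob that can be offered as a download; the Blob and download parts are standard web APIs rather than PDF.js.

```javascript
// Serialize the document, including recorded form values, into a PDF Blob.
// `pdfDoc` is a PDFDocumentProxy returned by pdfjsLib.getDocument().promise.
async function exportFilledPdf(pdfDoc) {
  const bytes = await pdfDoc.saveDocument(); // Uint8Array with edits applied
  return new Blob([bytes], { type: 'application/pdf' });
}

// Browser usage: trigger a download of the filled-out form.
// const blob = await exportFilledPdf(initialState.pdfDoc);
// const link = document.createElement('a');
// link.href = URL.createObjectURL(blob);
// link.download = 'filled-form.pdf'; // hypothetical file name
// link.click();
// // Call URL.revokeObjectURL(link.href) once the download has started.
```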
For more advanced PDF document manipulation, consider using a dedicated PDF library like Nutrient Web SDK.
Alternative to PDF.js: Nutrient
Nutrient is a commercial PDF rendering and processing library that offers several advantages over PDF.js:
- Performance — Nutrient is optimized for performance and can handle large PDF files more efficiently than PDF.js.
- Functionality — Nutrient offers features that aren’t available in PDF.js, such as digital signatures, along with more extensive annotation and form-filling tools. It also has a more comprehensive API, making it easier to integrate into existing workflows.
- Support — Nutrient is a commercial product that comes with dedicated technical support, ensuring that any issues or problems can be quickly resolved.
- Compatibility — Nutrient works on all major platforms, including web, desktop, and mobile. It also supports a wide range of file formats, including PDF, Office, and image files.
- Customization — Nutrient offers a high degree of customization, allowing developers to tailor the user interface and functionality to meet their specific needs.
Overall, Nutrient is a better solution for businesses and organizations that require advanced PDF functionality and performance, as well as dedicated technical support. While PDF.js is a free and open source library that’s suitable for basic PDF rendering, it may not be suitable for more complex use cases.
Check out our Nutrient Web SDK product page and demo to learn more about the features and benefits of Nutrient.
Conclusion
PDF.js is a powerful and flexible library that empowers developers to render PDF documents directly in the browser without relying on plugins or third-party software. This article has provided you with a comprehensive overview of the essential steps to get started with PDF.js, from basic rendering to advanced features like annotations, text extraction, and custom user interfaces. By leveraging these capabilities, you can build robust PDF viewers and PDF-related applications tailored to your specific needs, enhancing both functionality and user experience.
If you’re looking for a more advanced solution with additional features and support, consider using Nutrient Web SDK. Nutrient offers a comprehensive set of tools for working with PDF files, including rendering, editing, and annotating PDF documents. With Nutrient, you can build powerful PDF applications that meet the needs of your users and your business.
We created similar how-to blog posts using different web frameworks and libraries. Explore these posts to further enhance your PDF handling skills:
- How to Build a JavaScript PDF Viewer with PDF.js
- How to Build an Angular PDF Viewer with PDF.js
- How to Build a Vue.js PDF Viewer with PDF.js
- How to Build a React PDF Viewer with PDF.js
- How to Build a Bootstrap 5 PDF Viewer with PDF.js
- How to Build an Electron PDF Viewer with PDF.js
- How to Build a jQuery PDF Viewer with PDF.js
FAQ
Here are a few frequently asked questions about PDF.js.
What is PDF.js and how does it work?
PDF.js is an open source JavaScript library that allows you to render PDF files in a web browser without the need for any plugins or external software. It uses the Canvas API and other web technologies to display PDF documents.
How do I integrate PDF.js into my web application?
To integrate PDF.js, download the library, include the necessary scripts in your HTML, and use JavaScript to load and render PDF files onto a canvas element.
What are the key features of PDF.js?
PDF.js supports rendering PDF documents, searching text within PDFs, viewing thumbnails, zooming in and out, rotating pages, adding annotations, and filling form fields.
What are the limitations of PDF.js?
Limitations include potential performance issues with large or complex PDFs, varying rendering accuracy across browsers, and limited support for interactive forms and multimedia elements.
Can PDF.js handle annotations and form fields?
Yes, PDF.js can render annotations and form fields present in a PDF. However, interactive features like form filling might require additional handling or integration with other libraries.