This article was first published in June 2021 and was updated in November 2024.
This guide explains how to generate PDFs from HTML using Node.js with Puppeteer and Nutrient Document Engine. It includes steps for setting up Document Engine, preparing HTML content, and converting the content to PDF.
Why use Node.js for HTML-to-PDF conversion?
Node.js offers various libraries for PDF generation, with Puppeteer and Nutrient being two powerful options. Puppeteer makes it easy to automate HTML-to-PDF conversion in a development environment, while Document Engine provides robust features suited for high-quality PDF production in enterprise applications. By combining these tools, you can create dynamic, customized PDFs that retain HTML styling and are fully optimized for production.
In this tutorial, we’ll guide you through the HTML rendering process with Puppeteer and discuss how Nutrient can enhance your PDF generation in more advanced use cases.
Introduction to HTML-to-PDF conversion
HTML-to-PDF conversion is a common requirement in many web applications, allowing users to download webpages or documents as PDF files for easy sharing or printing. This process involves converting HTML content into a PDF document, which can be achieved using various Node.js libraries. By converting HTML to PDF, you can ensure your webpages are easily accessible offline, maintain their formatting, and are ready for printing or archiving.
Using Node.js for PDF conversion offers several advantages. Node.js libraries are typically lightweight, easy to integrate, and provide robust performance. They allow developers to automate the process of generating PDF files from HTML content, making it possible to create PDF documents dynamically based on user input or other data sources. This flexibility is particularly useful for generating reports, invoices, and other documents that require a consistent format.
Using Puppeteer for HTML-to-PDF conversion
Step 1 — Installing dependencies
Start by initializing your project and installing Puppeteer for PDF generation:
mkdir html-to-pdf && cd html-to-pdf
npm init -y
npm install puppeteer
Step 2 — Creating the HTML template
Write an HTML template, template.html, which will be rendered as a PDF:
<!DOCTYPE html>
<html>
  <head>
    <title>Sample PDF</title>
    <style>
      body {
        font-family: Arial, sans-serif;
      }
      h1 {
        color: #4caf50;
      }
    </style>
  </head>
  <body>
    <h1>Hello, World!</h1>
    <p>This PDF was generated from HTML.</p>
  </body>
</html>
Step 3 — Writing the Puppeteer script
Create a new file, generatePdf.js, to render the HTML to PDF:
const fs = require('fs');
const puppeteer = require('puppeteer');

async function generatePdf() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const html = fs.readFileSync('template.html', 'utf8');
  await page.setContent(html, { waitUntil: 'networkidle0' });

  await page.pdf({
    path: 'output.pdf',
    format: 'A4',
    printBackground: true,
  });

  await browser.close();
  console.log('PDF generated successfully');
}

generatePdf();
Step 4 — Running the script
Generate the PDF by running the script:
node generatePdf.js
After running, a file named output.pdf will appear in the project folder.
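If page.pdf() throws in the script above, browser.close() is never reached and the Chromium process keeps running. The following is a minimal sketch of a more defensive variant, not a definitive implementation. The helper names buildPdfOptions and renderHtmlFile are hypothetical, the margin values are illustrative assumptions, and puppeteer is loaded inside the function so the options helper can be used on its own.

```javascript
const fs = require('fs');

// Hypothetical helper: merges illustrative defaults with caller overrides.
function buildPdfOptions(outputPath, overrides = {}) {
  return {
    path: outputPath,
    format: 'A4',
    printBackground: true,
    // Margin values are assumptions for illustration; adjust as needed.
    margin: { top: '20mm', right: '15mm', bottom: '20mm', left: '15mm' },
    ...overrides,
  };
}

// Hypothetical helper: renders an HTML file to PDF and always closes the browser.
async function renderHtmlFile(htmlPath, outputPath, overrides = {}) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when rendering
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const html = fs.readFileSync(htmlPath, 'utf8');
    await page.setContent(html, { waitUntil: 'networkidle0' });
    await page.pdf(buildPdfOptions(outputPath, overrides));
  } finally {
    // Runs on success and on failure, so no browser process is left behind.
    await browser.close();
  }
}

// Example call (not executed here):
// renderHtmlFile('template.html', 'output.pdf', { landscape: true });
```

Keeping the options in a separate function makes it easy to reuse the same page setup for several documents while overriding only what differs, such as orientation.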
Getting started with Nutrient Document Engine
While Puppeteer offers a simple solution for rendering PDFs from HTML, Nutrient provides a comprehensive PDF management tool that can help you unlock advanced features like editing, annotating, and digitally signing PDFs.
If you’re looking to take your PDF workflows even further with capabilities beyond basic generation, Nutrient’s Document Engine will give you the flexibility you need. Let’s walk through how to get started with Nutrient and set it up for your Node.js project.
Requirements
Before we begin, ensure your system meets the following requirements:
- Operating systems:
  - macOS Ventura, Monterey, Mojave, Catalina, or Big Sur.
  - Ubuntu, Fedora, Debian, or CentOS (64-bit Intel and ARM processors supported).
- Memory: At least 4 GB of RAM.
Installing Docker
Document Engine is distributed via Docker. Install Docker by following the appropriate instructions for your OS:
- macOS: Install Docker Desktop for Mac.
- Windows/Linux: Follow the guides on Docker’s website.
Setting up Document Engine
To start Document Engine, you’ll need to configure Docker. Save the following docker-compose.yml file:
version: '3.8'
services:
  document_engine:
    image: pspdfkit/document-engine:1.5.0
    environment:
      PGUSER: de-user
      PGPASSWORD: password
      PGDATABASE: document-engine
      PGHOST: db
      PGPORT: 5432
      API_AUTH_TOKEN: secret
      SECRET_KEY_BASE: secret-key-base
      JWT_PUBLIC_KEY: |
        -----BEGIN PUBLIC KEY-----
        MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA2gzhmJ9TDanEzWdP1WG+
        0Ecwbe7f3bv6e5UUpvcT5q68IQJKP47AQdBAnSlFVi4X9SaurbWoXdS6jpmPpk24
        QvitzLNFphHdwjFBelTAOa6taZrSusoFvrtK9x5xsW4zzt/bkpUraNx82Z8MwLwr
        t6HlY7dgO9+xBAabj4t1d2t+0HS8O/ed3CB6T2lj6S8AbLDSEFc9ScO6Uc1XJlSo
        rgyJJSPCpNhSq3AubEZ1wMS1iEtgAzTPRDsQv50qWIbn634HLWxTP/UH6YNJBwzt
        3O6q29kTtjXlMGXCvin37PyX4Jy1IiPFwJm45aWJGKSfVGMDojTJbuUtM+8P9Rrn
        AwIDAQAB
        -----END PUBLIC KEY-----
      JWT_ALGORITHM: RS256
      DASHBOARD_USERNAME: dashboard
      DASHBOARD_PASSWORD: secret
    ports:
      - 5000:5000
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: de-user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: document-engine
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
Starting Document Engine
Open a terminal, navigate to the directory containing the docker-compose.yml file, and run:
docker-compose up
Wait until you see the message:
document_engine_1 | Access the web dashboard at http://localhost:5000/dashboard
Visit http://localhost:5000/dashboard and authenticate using the following credentials:
- Username: dashboard
- Password: secret
PDF generation from HTML with Document Engine
Document Engine simplifies the process of generating PDFs directly from HTML. Here’s how to do it step by step.
Step 1 — Preparing your HTML template
First, create an HTML template for your content. Use Mustache to dynamically inject data into the template. Save this template as template.mustache:
<!DOCTYPE html>
<html>
  <body>
    <div class="address">
      John Smith <br />
      123 Smith Street <br />
      90568 TA <br />
      <br />
      {{date}}
    </div>
    <div class="subject">Subject: PDF Generation FTW!</div>
    <div>
      <p>PDF is great!</p>
    </div>
    <div>
      {{name}} <br />
    </div>
  </body>
</html>
Step 2 — Providing the dynamic data
Create a data.json file with the data that will replace the placeholders in your HTML template:
{
  "name": "John Smith Jr.",
  "date": "29 February, 2020"
}
Step 3 — Rendering HTML using Mustache
Next, install the mustache package (npm install mustache) and render the HTML:
const mustache = require('mustache');
const fs = require('fs');

const template = fs.readFileSync('template.mustache').toString();
const data = JSON.parse(fs.readFileSync('data.json').toString());

const outputHtml = mustache.render(template, data);

// Save the rendered HTML file
fs.writeFileSync('output.html', outputHtml);
console.log('HTML generated successfully.');
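Mustache renders a placeholder with no matching key as an empty string, so a typo in data.json can silently produce a PDF with a blank name or date. A small pre-flight check can catch this before rendering. The sketch below is an illustration only: findPlaceholders and findMissingKeys are hypothetical helpers, and the pattern handles only simple top-level names, not sections, partials, or dotted paths.

```javascript
// Hypothetical helper: lists simple {{name}} and {{{name}}} placeholders.
function findPlaceholders(template) {
  const names = new Set();
  // Sections ({{#a}}, {{/a}}), partials ({{> a}}), and comments ({{! a}})
  // do not start with a letter or underscore, so they are skipped.
  const pattern = /\{\{\{?\s*([A-Za-z_]\w*)\s*\}?\}\}/g;
  for (const match of template.matchAll(pattern)) {
    names.add(match[1]);
  }
  return [...names];
}

// Hypothetical helper: returns placeholder names that the data does not define.
function findMissingKeys(template, data) {
  return findPlaceholders(template).filter(
    (name) => !Object.prototype.hasOwnProperty.call(data, name)
  );
}

const sampleTemplate = '<div>{{date}}</div><div>{{name}}</div>';
console.log(findMissingKeys(sampleTemplate, { name: 'John Smith Jr.' })); // prints [ 'date' ]
```

Running the check right before mustache.render and failing when the returned list is non-empty keeps incomplete documents from reaching the PDF step.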
Step 4 — Sending HTML to the Document Engine
Instead of converting the HTML yourself, you can send it to the Document Engine API to generate the PDF. Here’s an example that uses axios (install it with npm install axios) to send the rendered HTML:
const axios = require('axios');
const fs = require('fs');

// Read the generated HTML
const htmlContent = fs.readFileSync('output.html', 'utf8');

// Define the PDF generation schema
const pdfGenerationSchema = {
  html: htmlContent,
  layout: {
    orientation: 'portrait', // Optional: 'landscape' or 'portrait'
    size: 'A4', // Optional: 'A4', 'Letter', or custom dimensions
    margin: {
      left: 10, // Optional: margin sizes in mm
      top: 10,
      right: 10,
      bottom: 10,
    },
  },
};

// Send the HTML to the Document Engine API
axios
  .post('http://localhost:5000/api/documents', pdfGenerationSchema, {
    headers: {
      Authorization: 'Token token=YOUR_API_TOKEN',
      'Content-Type': 'application/json',
    },
    // Receive the body as raw bytes so binary PDF data is not corrupted
    // by string decoding before it is written to disk
    responseType: 'arraybuffer',
  })
  .then((response) => {
    // Handle the PDF response (e.g., save the PDF file)
    fs.writeFileSync('output.pdf', response.data);
    console.log('PDF generated successfully.');
  })
  .catch((error) => {
    console.error('Error generating PDF:', error);
  });
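Before writing a response to disk, it can help to confirm that the body really is a PDF, since an error payload saved as output.pdf is hard to diagnose later. Every PDF file starts with the bytes %PDF-, so a short signature check is enough. This is a sketch under the assumption that the response body is available as raw bytes (for example, a Buffer obtained with the axios option responseType: 'arraybuffer'); looksLikePdf and savePdf are hypothetical helper names.

```javascript
const fs = require('fs');

// Hypothetical helper: true when the bytes start with the PDF header "%PDF-".
function looksLikePdf(bytes) {
  const buffer = Buffer.from(bytes);
  return buffer.length >= 5 && buffer.subarray(0, 5).toString('latin1') === '%PDF-';
}

// Hypothetical helper: writes the file only when the payload looks like a PDF.
function savePdf(outputPath, bytes) {
  if (!looksLikePdf(bytes)) {
    throw new Error('Response is not a PDF; check the status code and response body.');
  }
  fs.writeFileSync(outputPath, Buffer.from(bytes));
}

// Example use inside a .then() handler (not executed here):
// savePdf('output.pdf', response.data);
```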
Step 5 — Adding watermarks and cover pages
The Document Engine can add extra features like watermarks and cover pages via its API. To add a watermark, include an additional HTML block like this:
<div
style="position: fixed;
top: 50%;
left: 50%;
font-size: 72px;
color: red;
opacity: 0.5;
transform: translate(-50%, -50%) rotate(-45deg);
text-align: center;
z-index: -1;"
>
My Watermark
</div>
This will place a semi-transparent watermark in the center of the PDF.
For a cover page, you can add an additional HTML block with a page break:
<div style="page-break-after: always;">
<h1>Cover Page</h1>
<p>This is the cover page of the PDF.</p>
</div>
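Both blocks can also be added to the rendered HTML in code rather than by editing the template. The following sketch shows one way to do that with plain string handling. The helper names insertAfterBodyStart and insertBeforeBodyEnd are hypothetical, the fragments are shortened versions of the blocks above, and the approach assumes the HTML contains a single body element.

```javascript
// Hypothetical helper: inserts a fragment right after the opening <body> tag.
function insertAfterBodyStart(html, fragment) {
  const match = /<body(\s[^>]*)?>/i.exec(html);
  // Fall back to prepending when no opening body tag is found.
  if (!match) return fragment + html;
  const end = match.index + match[0].length;
  return html.slice(0, end) + fragment + html.slice(end);
}

// Hypothetical helper: inserts a fragment right before the last </body> tag.
function insertBeforeBodyEnd(html, fragment) {
  const matches = [...html.matchAll(/<\/body\s*>/gi)];
  // Fall back to appending when no closing body tag is found.
  if (matches.length === 0) return html + fragment;
  const index = matches[matches.length - 1].index;
  return html.slice(0, index) + fragment + html.slice(index);
}

const page = '<html><body><p>PDF is great!</p></body></html>';
const cover = '<div style="page-break-after: always;"><h1>Cover Page</h1></div>';
const watermark = '<div style="position: fixed; opacity: 0.5;">My Watermark</div>';

// The cover goes first in the body; the watermark goes last.
const result = insertBeforeBodyEnd(insertAfterBodyStart(page, cover), watermark);
console.log(result);
```

If the watermark text comes from user input, it should be HTML-escaped before being placed in the fragment.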
Alternatively, you can upload an existing PDF as the cover page through the Document Engine API:
curl -X POST http://localhost:5000/api/documents \
  -H "Authorization: Token token=<API token>" \
  -F page.html=@/path/to/page.html \
  -F cover.pdf=@/path/to/cover.pdf \
  -F generation='{ "html": "page.html" }' \
  -F operations='{
    "operations": [
      {
        "type": "importDocument",
        "beforePageIndex": 0,
        "document": "cover.pdf"
      }
    ]
  }'
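In a Node.js project, the curl request above could be expressed roughly as follows. This is a sketch that assumes Node.js 18 or later, where fetch, FormData, and Blob are available as globals. The helper names buildCoverPageForm and uploadWithCover are hypothetical, and the field names and endpoint simply mirror the curl command.

```javascript
// Hypothetical helper: builds the same multipart fields as the curl command.
function buildCoverPageForm(pageHtml, coverPdfBytes) {
  const form = new FormData();
  form.append('page.html', new Blob([pageHtml], { type: 'text/html' }), 'page.html');
  form.append(
    'cover.pdf',
    new Blob([coverPdfBytes], { type: 'application/pdf' }),
    'cover.pdf'
  );
  form.append('generation', JSON.stringify({ html: 'page.html' }));
  form.append(
    'operations',
    JSON.stringify({
      operations: [
        { type: 'importDocument', beforePageIndex: 0, document: 'cover.pdf' },
      ],
    })
  );
  return form;
}

// Hypothetical helper: sends the form. The Content-Type header is left unset
// so that fetch can add the multipart boundary itself.
async function uploadWithCover(form, apiToken) {
  const response = await fetch('http://localhost:5000/api/documents', {
    method: 'POST',
    headers: { Authorization: `Token token=${apiToken}` },
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response;
}

// Example call (not executed here):
// const fs = require('fs');
// const form = buildCoverPageForm(
//   fs.readFileSync('/path/to/page.html', 'utf8'),
//   fs.readFileSync('/path/to/cover.pdf')
// );
// uploadWithCover(form, '<API token>');
```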
Choosing the right Node.js library
When it comes to generating PDFs with Node.js, selecting the right library is essential for both functionality and efficiency. Here are some key factors to consider:
- Ease of use: Look for libraries with straightforward and intuitive APIs that simplify PDF generation. User-friendly libraries reduce the learning curve, making things faster to implement.
- Performance: Consider libraries that are optimized to handle large data sets and generate PDFs quickly. High performance is particularly valuable when PDFs are created in real time or with large amounts of data.
- Features: Identify the specific features you need, such as support for images, tables, custom fonts, or interactivity. Some libraries offer advanced capabilities that improve the quality and usability of your PDFs.
- Compatibility: Ensure the library aligns with your Node.js version and project dependencies to avoid compatibility issues that might require additional troubleshooting.
- Community support: Choose libraries with active communities and good documentation. Libraries with strong support can be easier to troubleshoot and often have resources to guide you through advanced features.
Popular PDF libraries for Node.js
A few popular options for Node.js PDF generation include Puppeteer, jsPDF, PDFKit, and Document Engine. Each has unique strengths, so it’s important to select based on your project requirements:
- Puppeteer: Known for rendering complex HTML and CSS layouts, Puppeteer is ideal for capturing highly stylized HTML in PDF form. It leverages a headless browser to accurately replicate a webpage as a PDF, making it great for visual consistency.
- jsPDF: A lightweight, easy-to-use library, jsPDF works well for straightforward PDF generation tasks, especially when working with basic content.
- PDFKit: This library offers extensive customization options and is well-suited for projects that require detailed, element-by-element control over PDF layouts.
- Document Engine: Ideal for complex, enterprise-level applications, Nutrient provides robust support for PDF generation and interactivity, with features like annotations, forms, and digital signatures. It’s compatible with Node.js and is a powerful choice when advanced functionality and precision are key.
Evaluating your specific project needs and testing a few libraries can help you find the right tool for effective and reliable PDF generation in Node.js.
Handling complex scenarios with Node.js and PDF libraries
Generating PDFs from dynamic webpages and processing large datasets can present unique challenges. Here are tools and techniques to help:
- Use a headless browser: Libraries like Puppeteer and Playwright enable you to automate browser interactions, making it possible to generate PDFs from dynamic webpages. These tools can render JavaScript-heavy content, ensuring a PDF matches the appearance of the page.
- Optimize performance: For handling large datasets, techniques like caching, parallel processing, and using efficient data structures can significantly improve performance. Optimizing these aspects is essential for applications that need to generate PDFs in real time or process high volumes of data.
- Use templates: Template engines like Handlebars and EJS are great for generating PDFs with dynamic content. Templates allow you to separate content and layout, making it easier to manage and customize a PDF’s structure.
- Customize layouts: Libraries such as PDFKit and jsPDF provide advanced styling and layout options, allowing for tailored PDF designs that meet specific branding or formatting standards. Both libraries support custom fonts, images, tables, and more.
- Integrate with Document Engine: For more robust PDF processing, Nutrient Document Engine is a powerful option. It operates as a PDF server, ideal for automating PDF workflows and managing the full document lifecycle. Document Engine runs as a headless service within your infrastructure or hosted via Nutrient. It integrates with Nutrient’s web and mobile SDKs, enabling seamless management of document automation, including features like merging, form filling, annotation, and advanced security, which suits enterprise applications with high performance and security requirements.
- Combine with other tools: Integrate Node.js libraries with external services, such as APITemplate.io or Nutrient Document Engine, to expand capabilities for template-based PDF generation, server-side rendering, and API-based PDF automation. These integrations streamline workflows for applications that need to produce consistent, high-quality PDFs from existing templates or data sources.
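The parallel processing mentioned in the list above can be as simple as rendering documents in fixed-size batches, so that a large job never opens an unbounded number of pages or requests at once. The sketch below is one possible approach, not a tuned implementation. chunk and processInBatches are hypothetical helpers, and the worker function stands in for whatever produces a single PDF.

```javascript
// Hypothetical helper: splits an array into consecutive batches of a given size.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: runs an async worker over all items, one batch at a time.
async function processInBatches(items, size, worker) {
  const results = [];
  for (const batch of chunk(items, size)) {
    // Items within a batch run concurrently; batches run one after another.
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}

// Example call (not executed here); renderInvoice is a placeholder name:
// processInBatches(invoices, 4, (invoice) => renderInvoice(invoice));
```

A suitable batch size depends on available memory and CPU, so it is worth measuring rather than guessing.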
Examples of handling complex scenarios
Here’s how to tackle specific scenarios using these tools:
- Generating PDFs from dynamic webpages: Use Puppeteer or Playwright to capture JavaScript-heavy content for PDFs that mirror webpage content.
- Handling large datasets: Implement caching, parallel processing, and optimized data handling to efficiently generate PDFs from large datasets.
- Creating custom layouts: Leverage PDFKit and jsPDF for custom layouts, or use Nutrient Document Engine for advanced layouts and document lifecycle management.
- Automating with Document Engine: Document Engine offers enterprise-grade PDF automation, with capabilities for processing, annotating, and securing PDFs, making it ideal for applications requiring extensive document handling features.
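As an illustration of the caching technique, identical HTML rendered with identical options always yields the same document, so the result can be reused instead of regenerated. The following sketch keys an in-memory cache on a SHA-256 hash of the input. cacheKey and getOrGeneratePdf are hypothetical helpers, the generate callback stands in for any of the rendering approaches above, and a production cache would also need size limits and expiry.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derives a stable key from the HTML and the options.
// Note: JSON.stringify is sensitive to property order in the options object.
function cacheKey(html, options = {}) {
  return crypto
    .createHash('sha256')
    .update(html)
    .update('\0') // separator so the HTML and options cannot run together
    .update(JSON.stringify(options))
    .digest('hex');
}

const pdfCache = new Map();

// Hypothetical helper: returns a cached result, or generates and caches it.
function getOrGeneratePdf(html, options, generate) {
  const key = cacheKey(html, options);
  if (!pdfCache.has(key)) {
    // Cache the promise so concurrent callers share one generation run;
    // drop the entry again if generation fails.
    const pending = Promise.resolve(generate(html, options)).catch((error) => {
      pdfCache.delete(key);
      throw error;
    });
    pdfCache.set(key, pending);
  }
  return pdfCache.get(key);
}
```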
By applying these techniques and utilizing powerful tools like Document Engine, you can handle even the most complex PDF generation tasks in Node.js applications. This approach ensures your application delivers high-quality, customizable PDFs while meeting specific functionality and performance requirements.
Conclusion
This post covered how to generate PDFs from HTML in Node.js using Puppeteer for basic HTML-to-PDF tasks and Nutrient Document Engine for more advanced features like annotations and digital signatures. With Nutrient, you can set up Document Engine, render HTML with dynamic data, and create production-grade PDFs through API requests — ideal for generating documents such as reports and invoices. To get an API token, contact our Sales team.
FAQ
Here are a few frequently asked questions about Document Engine and working with PDFs.
What are the system requirements for running Document Engine?
Document Engine requires at least 4 GB of RAM and can be run on macOS (Ventura, Monterey, Mojave, Catalina, or Big Sur) and Linux distributions (Ubuntu, Fedora, Debian, CentOS). Docker is required to run the engine.
Do I need Puppeteer or any other tools to generate PDFs with Document Engine?
No, Document Engine handles the entire process of converting HTML to PDF, so you don’t need Puppeteer or any other tool to generate PDFs.
How do I add dynamic content like names and dates to my PDFs?
You can use Mustache templates to inject dynamic data into your HTML before sending it to Document Engine. This allows you to create personalized PDFs by simply updating the data file.
Can I add watermarks or cover pages to my PDFs?
Yes, Document Engine lets you add watermarks and cover pages through its API. You can either add HTML for a watermark or upload a separate PDF file as a cover page.
How do I interact with Document Engine using Node.js?
You can send an API request using Node.js and a library like axios to pass your HTML content to Document Engine, which will return the generated PDF. The tutorial includes a sample script for making this request.