Blog post

How to convert HTML to PDF using Node.js

This article was first published in June 2021 and was updated in November 2024.

This guide explains how to generate PDFs from HTML using Node.js with Puppeteer and Nutrient Document Engine. It includes steps for setting up Document Engine, preparing HTML content, and converting the content to PDF.

Why use Node.js for HTML-to-PDF conversion?

Node.js offers various libraries for PDF generation, with Puppeteer and Nutrient being two powerful options. Puppeteer makes it easy to automate HTML-to-PDF conversion in a development environment, while Document Engine provides robust features suited for high-quality PDF production in enterprise applications. By combining these tools, you can create dynamic, customized PDFs that retain HTML styling and are fully optimized for production.

In this tutorial, we’ll guide you through the HTML rendering process with Puppeteer and discuss how Nutrient can enhance your PDF generation in more advanced use cases.

Introduction to HTML-to-PDF conversion

HTML-to-PDF conversion is a common requirement in many web applications, allowing users to download webpages or documents as PDF files for easy sharing or printing. This process involves converting HTML content into a PDF document, which can be achieved using various Node.js libraries. By converting HTML to PDF, you can ensure your webpages are easily accessible offline, maintain their formatting, and are ready for printing or archiving.

Using Node.js for PDF conversion offers several advantages. PDF generation runs in the same JavaScript runtime as the rest of your server code, and the libraries install from npm, so they are straightforward to integrate. They let you automate the generation of PDF files from HTML content, making it possible to create PDF documents dynamically based on user input or other data sources. This flexibility is particularly useful for reports, invoices, and other documents that require a consistent format.

Using Puppeteer for HTML-to-PDF conversion

Step 1 — Installing dependencies

Start by initializing your project and installing Puppeteer for PDF generation:

mkdir html-to-pdf && cd html-to-pdf
npm init -y
npm install puppeteer

Step 2 — Creating the HTML template

Write an HTML template, template.html, which will be rendered as a PDF:

<!DOCTYPE html>
<html>
	<head>
		<title>Sample PDF</title>
		<style>
			body {
				font-family: Arial, sans-serif;
			}
			h1 {
				color: #4caf50;
			}
		</style>
	</head>
	<body>
		<h1>Hello, World!</h1>
		<p>This PDF was generated from HTML.</p>
	</body>
</html>

Step 3 — Writing the Puppeteer script

Create a new file, generatePdf.js, to render the HTML to PDF:

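const fs = require('fs');
const puppeteer = require('puppeteer');

async function generatePdf() {
	// Launch a headless browser.
	const browser = await puppeteer.launch();
	try {
		const page = await browser.newPage();

		// Load the template and wait until network activity has settled.
		const html = fs.readFileSync('template.html', 'utf8');
		await page.setContent(html, { waitUntil: 'networkidle0' });

		// Print the page to an A4 PDF, including CSS backgrounds.
		await page.pdf({
			path: 'output.pdf',
			format: 'A4',
			printBackground: true,
		});
		console.log('PDF generated successfully');
	} finally {
		// Close the browser even if rendering fails.
		await browser.close();
	}
}

generatePdf().catch((error) => {
	console.error('PDF generation failed:', error);
	process.exitCode = 1;
});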

Step 4 — Running the script

Generate the PDF by running the script:

node generatePdf.js

After running, a file named output.pdf will appear in the project folder.

Screenshot of the generated PDF output from HTML content using Puppeteer in Node.js.
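
The script above passes only three options to page.pdf(). If you also need margins, a landscape layout, or page numbers, one possible approach is to build the options object in a small helper. This is a minimal sketch: buildPdfOptions and its parameters are hypothetical names, while margin, landscape, displayHeaderFooter, headerTemplate, and footerTemplate are Puppeteer's own page.pdf() options. The title is inserted as-is, so pass trusted text only.

```javascript
// Sketch: build the options object passed to Puppeteer's page.pdf().
// buildPdfOptions and its parameter names are hypothetical helpers.
function buildPdfOptions({ outputPath = 'output.pdf', landscape = false, title = '' } = {}) {
	return {
		path: outputPath,
		format: 'A4',
		landscape,
		printBackground: true,
		// Margins leave room for the header and footer templates.
		margin: { top: '20mm', right: '15mm', bottom: '20mm', left: '15mm' },
		displayHeaderFooter: true,
		// Header and footer templates are HTML snippets. Puppeteer fills elements
		// with the classes "pageNumber" and "totalPages" at print time.
		headerTemplate: `<div style="font-size:8px; width:100%; text-align:center;">${title}</div>`,
		footerTemplate:
			'<div style="font-size:8px; width:100%; text-align:center;">' +
			'Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
	};
}

// Usage inside generatePdf(), replacing the inline options:
//   await page.pdf(buildPdfOptions({ title: 'Sample PDF' }));
```

Keeping the options in one function makes it easier to reuse the same layout across several templates.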

Getting started with Nutrient Document Engine

While Puppeteer offers a simple solution for rendering PDFs from HTML, Nutrient provides a comprehensive PDF management tool that can help you unlock advanced features like editing, annotating, and digitally signing PDFs.

If you’re looking to take your PDF workflows even further with capabilities beyond basic generation, Nutrient’s Document Engine will give you the flexibility you need. Let’s walk through how to get started with Nutrient and set it up for your Node.js project.

Requirements

Before we begin, ensure your system meets the following requirements:

  • Operating systems:

    • macOS Ventura, Monterey, Mojave, Catalina, or Big Sur.

    • Ubuntu, Fedora, Debian, or CentOS (64-bit Intel and ARM processors supported).

  • Memory: At least 4 GB of RAM.

Installing Docker

Document Engine is distributed via Docker. Install Docker by following the official installation instructions for your operating system.

Setting up Document Engine

To start Document Engine, you’ll need to configure Docker. Save the following docker-compose.yml file:

version: '3.8'
services:
   document_engine:
      image: pspdfkit/document-engine:1.5.0
      environment:
         PGUSER: de-user
         PGPASSWORD: password
         PGDATABASE: document-engine
         PGHOST: db
         PGPORT: 5432
         API_AUTH_TOKEN: secret
         SECRET_KEY_BASE: secret-key-base
         JWT_PUBLIC_KEY: |
            -----BEGIN PUBLIC KEY-----
            MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA2gzhmJ9TDanEzWdP1WG+
            0Ecwbe7f3bv6e5UUpvcT5q68IQJKP47AQdBAnSlFVi4X9SaurbWoXdS6jpmPpk24
            QvitzLNFphHdwjFBelTAOa6taZrSusoFvrtK9x5xsW4zzt/bkpUraNx82Z8MwLwr
            t6HlY7dgO9+xBAabj4t1d2t+0HS8O/ed3CB6T2lj6S8AbLDSEFc9ScO6Uc1XJlSo
            rgyJJSPCpNhSq3AubEZ1wMS1iEtgAzTPRDsQv50qWIbn634HLWxTP/UH6YNJBwzt
            3O6q29kTtjXlMGXCvin37PyX4Jy1IiPFwJm45aWJGKSfVGMDojTJbuUtM+8P9Rrn
            AwIDAQAB
            -----END PUBLIC KEY-----
         JWT_ALGORITHM: RS256
         DASHBOARD_USERNAME: dashboard
         DASHBOARD_PASSWORD: secret
      ports:
         - 5000:5000
      depends_on:
         - db
   db:
      image: postgres:16
      environment:
         POSTGRES_USER: de-user
         POSTGRES_PASSWORD: password
         POSTGRES_DB: document-engine
      volumes:
         - pgdata:/var/lib/postgresql/data

volumes:
   pgdata:

Starting Document Engine

Open a terminal, navigate to the directory containing the docker-compose.yml file, and run:

docker-compose up

Wait until you see the message:

document_engine_1  | Access the web dashboard at http://localhost:5000/dashboard

Visit http://localhost:5000/dashboard and authenticate using the following credentials:

  • Username: dashboard

  • Password: secret

Document engine dashboard
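
If you start Document Engine from a script rather than watching the terminal, you can poll the dashboard URL until the server answers. The following is a rough sketch, not part of Document Engine itself: waitForService and its options are hypothetical names, it assumes Node.js 18 or later for the global fetch, and it treats any HTTP response (including an authentication error) as a sign that the server is listening.

```javascript
// Sketch: poll a URL until the server answers, then resolve.
// waitForService and its options are hypothetical; fetchImpl defaults to the
// global fetch available in Node.js 18 and later.
async function waitForService(url, { retries = 30, delayMs = 2000, fetchImpl = fetch } = {}) {
	for (let attempt = 1; attempt <= retries; attempt++) {
		try {
			// Any HTTP response (even 401) means the process is listening.
			const response = await fetchImpl(url);
			return { attempt, status: response.status };
		} catch (error) {
			// A network error usually means the container is still starting.
			if (attempt === retries) {
				throw new Error(`Service at ${url} did not respond after ${retries} attempts`);
			}
			await new Promise((resolve) => setTimeout(resolve, delayMs));
		}
	}
}

// Usage:
//   waitForService('http://localhost:5000/dashboard').then(() => console.log('Ready'));
```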

PDF generation from HTML with Document Engine

Document Engine simplifies the process of generating PDFs directly from HTML. Here’s how to do it step by step.

Step 1 — Preparing your HTML template

First, create an HTML template for your content. Use Mustache to dynamically inject data into the template. Save this template as template.mustache:

<!DOCTYPE html>
<html>
	<body>
		<div class="address">
			John Smith
			<br />
			123 Smith Street
			<br />
			90568 TA
			<br />
			<br />
			{{date}}
		</div>
		<div class="subject">Subject: PDF Generation FTW!</div>
		<div>
			<p>PDF is great!</p>
		</div>
		<div>
			{{name}}
			<br />
		</div>
	</body>
</html>

Step 2 — Providing the dynamic data

Create a data.json file with the data that will replace the placeholders in your HTML template:

{
	"name": "John Smith Jr.",
	"date": "29 February, 2020"
}

Step 3 — Rendering HTML using Mustache

Next, install Mustache with npm install mustache and render the template with your data:

const mustache = require('mustache');
const fs = require('fs');

const template = fs.readFileSync('template.mustache').toString();
const data = JSON.parse(fs.readFileSync('data.json').toString());

const outputHtml = mustache.render(template, data);

// Save the rendered HTML file
fs.writeFileSync('output.html', outputHtml);
console.log('HTML generated successfully.');
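
Mustache renders a placeholder with no matching value as an empty string, so a typo in data.json can go unnoticed. A quick check before rendering can catch that. This is a minimal sketch under stated assumptions: findMissingKeys is a hypothetical helper, and it only recognizes plain variable tags such as {{name}} and {{{name}}}, ignoring sections, partials, comments, and dotted names.

```javascript
// Sketch: list template placeholders that have no value in the data object.
// Only plain {{name}} and {{{name}}} variable tags are recognized.
function findMissingKeys(template, data) {
	const tagPattern = /\{\{\{?\s*([A-Za-z_]\w*)\s*\}?\}\}/g;
	const missing = new Set();
	for (const match of template.matchAll(tagPattern)) {
		const key = match[1];
		// Check own properties only, so inherited names like "toString" do not count.
		if (!Object.prototype.hasOwnProperty.call(data, key)) {
			missing.add(key);
		}
	}
	return [...missing];
}

// Usage with the variables from the script above:
//   const missing = findMissingKeys(template, data);
//   if (missing.length > 0) throw new Error(`Missing template data: ${missing.join(', ')}`);
```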

Step 4 — Sending HTML to the Document Engine

Instead of converting the HTML yourself, you can send it to Document Engine’s API to generate the PDF. Here’s an example that uses axios (install it with npm install axios) to send the rendered HTML:

const axios = require('axios');
const fs = require('fs');

// Read the generated HTML
const htmlContent = fs.readFileSync('output.html', 'utf8');

// Define the PDF generation schema
const pdfGenerationSchema = {
	html: htmlContent,
	layout: {
		orientation: 'portrait', // Optional: 'landscape' or 'portrait'
		size: 'A4', // Optional: 'A4', 'Letter', or custom dimensions
		margin: {
			left: 10, // Optional: margin sizes in mm
			top: 10,
			right: 10,
			bottom: 10,
		},
	},
};

// Send the HTML to the Document Engine API
axios
	.post('http://localhost:5000/api/documents', pdfGenerationSchema, {
		headers: {
			// Replace YOUR_API_TOKEN with your token (API_AUTH_TOKEN in docker-compose.yml)
			Authorization: 'Token token=YOUR_API_TOKEN',
			'Content-Type': 'application/json',
		},
		// Receive raw bytes; by default axios decodes the body as JSON or text,
		// which would corrupt a binary PDF
		responseType: 'arraybuffer',
	})
	.then((response) => {
		// Save the response body as the PDF file
		fs.writeFileSync('output.pdf', response.data);
		console.log('PDF generated successfully.');
	})
	.catch((error) => {
		console.error('Error generating PDF:', error.message);
	});
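
Hard-coding the token in the script is fine for a local test, but you may prefer to read it from the environment. The sketch below shows one way to build the axios request config. DOCUMENT_ENGINE_API_TOKEN is a hypothetical variable name chosen for this example, and buildRequestConfig and describeAxiosError are illustrative helpers, not part of axios or Document Engine.

```javascript
// Sketch: build the axios request config without hard-coding the token.
// DOCUMENT_ENGINE_API_TOKEN is a hypothetical variable name chosen for this example.
function buildRequestConfig(env = process.env) {
	const token = env.DOCUMENT_ENGINE_API_TOKEN;
	if (!token) {
		throw new Error('Set DOCUMENT_ENGINE_API_TOKEN before running this script');
	}
	return {
		headers: {
			// Same header format as the request above.
			Authorization: `Token token=${token}`,
			'Content-Type': 'application/json',
		},
		// Ask axios for raw bytes so a binary response is not decoded as text.
		responseType: 'arraybuffer',
	};
}

// Summarize a failed request. axios sets error.response when the server
// answered with an error status, and leaves it unset when no response arrived.
function describeAxiosError(error) {
	if (error.response) {
		return `HTTP ${error.response.status}`;
	}
	return `No response: ${error.message}`;
}

// Usage:
//   axios.post(url, pdfGenerationSchema, buildRequestConfig())
//     .catch((error) => console.error(describeAxiosError(error)));
```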

Step 5 — Adding watermarks and cover pages

Document Engine can also produce PDFs with watermarks and cover pages. To add a watermark, include an additional HTML block in your template, like this:

<div
	style="position: fixed;
  top: 50%;
  left: 50%;
  font-size: 72px;
  color: red;
  opacity: 0.5;
  transform: translate(-50%, -50%) rotate(-45deg);
  text-align: center;
  z-index: -1;"
>
	My Watermark
</div>

This places a semi-transparent, rotated watermark in the center of the page. The translate(-50%, -50%) step shifts the element so that its midpoint, rather than its top-left corner, sits at the 50% mark.

For a cover page, you can add an additional HTML block with a page break:

<div style="page-break-after: always;">
	<h1>Cover Page</h1>
	<p>This is the cover page of the PDF.</p>
</div>
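
If the watermark text or cover title changes per document, you can inject both blocks into the rendered HTML before sending it. The following is only a sketch: addCoverAndWatermark and escapeHtml are hypothetical helpers, and the function assumes the HTML contains a single opening body tag.

```javascript
// Escape text so it can be placed safely inside HTML.
function escapeHtml(text) {
	return String(text)
		.replace(/&/g, '&amp;')
		.replace(/</g, '&lt;')
		.replace(/>/g, '&gt;')
		.replace(/"/g, '&quot;');
}

// Sketch: insert a watermark and a cover page right after the opening <body> tag.
function addCoverAndWatermark(html, { coverTitle, watermarkText }) {
	// The cover block ends with a page break, so the content starts on page 2.
	const cover = `<div style="page-break-after: always;"><h1>${escapeHtml(coverTitle)}</h1></div>`;
	// translate(-50%, -50%) centers the element on the 50%/50% point before rotating it.
	const watermark =
		'<div style="position: fixed; top: 50%; left: 50%; font-size: 72px; color: red; ' +
		'opacity: 0.5; transform: translate(-50%, -50%) rotate(-45deg); z-index: -1;">' +
		`${escapeHtml(watermarkText)}</div>`;

	const bodyTag = /<body[^>]*>/i;
	if (!bodyTag.test(html)) {
		throw new Error('No <body> tag found');
	}
	// A replacer function avoids special handling of "$" in the inserted text.
	return html.replace(bodyTag, (tag) => `${tag}\n${watermark}\n${cover}`);
}

// Usage with the Mustache output from step 3:
//   const decorated = addCoverAndWatermark(outputHtml, { coverTitle: 'Report', watermarkText: 'Draft' });
```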

Alternatively, you can upload an existing PDF as the cover page through the Document Engine API:

curl -X POST http://localhost:5000/api/documents \
  -H "Authorization: Token token=<API token>" \
  -F page.html=@/path/to/page.html \
  -F cover.pdf=@/path/to/cover.pdf \
  -F generation='{
  "html": "page.html"
}' \
  -F operations='{
  "operations": [
    {
      "type": "importDocument",
      "beforePageIndex": 0,
      "document": "cover.pdf"
    }
  ]
}'
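
The same multipart request can be assembled in Node.js. This is a sketch that mirrors the part names and JSON from the curl command above, assuming Node.js 18 or later for the global FormData, Blob, and fetch. buildCoverPageForm and sendForm are hypothetical helpers, and because the response format isn't covered here, sendForm simply returns the raw response.

```javascript
// Sketch: build the multipart form from the curl command with Node.js globals.
function buildCoverPageForm(htmlBytes, coverPdfBytes) {
	const form = new FormData();
	// File parts, named as in the curl command.
	form.append('page.html', new Blob([htmlBytes], { type: 'text/html' }), 'page.html');
	form.append('cover.pdf', new Blob([coverPdfBytes], { type: 'application/pdf' }), 'cover.pdf');
	// JSON parts are sent as plain string fields.
	form.append('generation', JSON.stringify({ html: 'page.html' }));
	form.append(
		'operations',
		JSON.stringify({
			operations: [{ type: 'importDocument', beforePageIndex: 0, document: 'cover.pdf' }],
		})
	);
	return form;
}

// sendForm is a hypothetical wrapper that returns the raw Response object.
async function sendForm(form, apiToken, baseUrl = 'http://localhost:5000') {
	return fetch(`${baseUrl}/api/documents`, {
		method: 'POST',
		headers: { Authorization: `Token token=${apiToken}` },
		// fetch sets the multipart Content-Type header and boundary itself.
		body: form,
	});
}

// Usage:
//   const fs = require('fs');
//   const form = buildCoverPageForm(fs.readFileSync('page.html'), fs.readFileSync('cover.pdf'));
//   sendForm(form, 'YOUR_API_TOKEN').then((response) => console.log(response.status));
```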

Choosing the right Node.js library

When it comes to generating PDFs with Node.js, selecting the right library is essential for both functionality and efficiency. Here are some key factors to consider:

  • Ease of use — Look for libraries with straightforward and intuitive APIs that simplify PDF generation. User-friendly libraries reduce the learning curve, making things faster to implement.

  • Performance — Consider libraries that are optimized to handle large data sets and generate PDFs quickly. High performance is particularly valuable when PDFs are created in real time or with large amounts of data.

  • Features — Identify the specific features you need, such as support for images, tables, custom fonts, or interactivity. Some libraries offer advanced capabilities that improve the quality and usability of your PDFs.

  • Compatibility — Ensure the library aligns with your Node.js version and project dependencies to avoid compatibility issues that might require additional troubleshooting.

  • Community support — Choose libraries with active communities and good documentation. Libraries with strong support can be easier to troubleshoot and often have resources to guide you through advanced features.

A few popular options for Node.js PDF generation include Puppeteer, jsPDF, PDFKit, and Document Engine. Each has unique strengths, so it’s important to select based on your project requirements:

  • Puppeteer — Known for rendering complex HTML and CSS layouts, Puppeteer is ideal for capturing highly stylized HTML in PDF form. It leverages a headless browser to accurately replicate a webpage as a PDF, making it great for visual consistency.

  • jsPDF — A lightweight, easy-to-use library, jsPDF works well for straightforward PDF generation tasks, especially when working with basic content.

  • PDFKit — This library offers extensive customization options and is well-suited for projects that require detailed, element-by-element control over PDF layouts.

  • Document Engine — Ideal for complex, enterprise-level applications, Nutrient provides robust support for PDF generation and interactivity, with features like annotations, forms, and digital signatures. It’s compatible with Node.js and is a powerful choice when advanced functionality and precision are key.

Evaluating your specific project needs and testing a few libraries can help you find the right tool for effective and reliable PDF generation in Node.js.

Handling complex scenarios with Node.js and PDF libraries

Generating PDFs from dynamic webpages and processing large datasets can present unique challenges. Here are tools and techniques to help:

  1. Use a headless browser — Libraries like Puppeteer and Playwright enable you to automate browser interactions, making it possible to generate PDFs from dynamic webpages. These tools can render JavaScript-heavy content, ensuring a PDF matches the appearance of the page.

  2. Optimize performance — For handling large datasets, techniques like caching, parallel processing, and using efficient data structures can significantly improve performance. Optimizing these aspects is essential for applications that need to generate PDFs in real time or process high volumes of data.

  3. Use templates — Template engines like Handlebars and EJS are great for generating PDFs with dynamic content. Templates allow you to separate content and layout, making it easier to manage and customize a PDF’s structure.

  4. Customize layouts — Libraries such as PDFKit and jsPDF provide advanced styling and layout options, allowing for tailored PDF designs that meet specific branding or formatting standards. Both libraries support custom fonts, images, tables, and more.

  5. Integrate with Document Engine — For more robust PDF processing, Nutrient Document Engine is a powerful option. It operates as a PDF server, ideal for automating PDF workflows and managing the full document lifecycle. Document Engine runs as a headless service within your infrastructure or hosted via Nutrient. It integrates with Nutrient’s web and mobile SDKs, enabling seamless management of document automation, including features like merging, form filling, annotation, and advanced security — perfect for enterprise applications with high performance and security requirements.

  6. Combine with other tools — Integrate Node.js libraries with external services, such as APITemplate.io or Nutrient Document Engine, to expand capabilities for template-based PDF generation, server-side rendering, and API-based PDF automation. These integrations streamline workflows for applications that need to produce consistent, high-quality PDFs from existing templates or data sources.
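
As an illustration of the headless browser approach in item 1, here's a minimal sketch that prints a live, JavaScript-rendered page with Puppeteer. renderUrlToPdf and the readySelector option are hypothetical names. The browser argument is expected to be a Puppeteer browser instance, for example the result of puppeteer.launch().

```javascript
// Sketch: print a live, JavaScript-rendered page to PDF.
// `browser` is a Puppeteer Browser; renderUrlToPdf and readySelector are hypothetical.
async function renderUrlToPdf(browser, url, { outputPath = 'page.pdf', readySelector } = {}) {
	const page = await browser.newPage();
	try {
		// Wait until the network has gone idle so client-side rendering can finish.
		await page.goto(url, { waitUntil: 'networkidle0' });
		// Optionally wait for an element that only exists once the app has rendered.
		if (readySelector) {
			await page.waitForSelector(readySelector);
		}
		await page.pdf({ path: outputPath, format: 'A4', printBackground: true });
		return outputPath;
	} finally {
		// Release the tab even if navigation or printing fails.
		await page.close();
	}
}

// Usage:
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   await renderUrlToPdf(browser, 'https://example.com', { readySelector: '#chart' });
//   await browser.close();
```

Waiting for a specific selector tends to be more reliable than a network-idle wait alone when a page keeps polling in the background.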

Examples of handling complex scenarios

Here’s how to tackle specific scenarios using these tools:

  • Generating PDFs from dynamic webpages — Use Puppeteer or Playwright to capture JavaScript-heavy content for PDFs that mirror webpage content.

  • Handling large datasets — Implement caching, parallel processing, and optimized data handling to efficiently generate PDFs from large datasets.

  • Creating custom layouts — Leverage PDFKit and jsPDF for custom layouts, or use Nutrient Document Engine for advanced layouts and document lifecycle management.

  • Automating with Document Engine — Document Engine offers enterprise-grade PDF automation, with capabilities for processing, annotating, and securing PDFs, making it ideal for applications requiring extensive document handling features.
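
For the large-dataset scenario, parallel processing usually needs a cap, since each headless browser page or API request consumes memory. Here's one possible sketch of a concurrency-limited map. mapWithConcurrency is a hypothetical helper, and the worker can be any async function, such as one that renders a single invoice.

```javascript
// Sketch: run an async worker over items with at most `limit` tasks in flight.
// Results keep the same order as the input array.
async function mapWithConcurrency(items, limit, worker) {
	const results = new Array(items.length);
	let nextIndex = 0;

	// Each runner pulls the next unclaimed index until none are left.
	async function runWorker() {
		while (nextIndex < items.length) {
			const index = nextIndex++;
			results[index] = await worker(items[index], index);
		}
	}

	const workerCount = Math.max(1, Math.min(limit, items.length));
	await Promise.all(Array.from({ length: workerCount }, () => runWorker()));
	return results;
}

// Usage (renderInvoice is a hypothetical async function that produces one PDF):
//   const files = await mapWithConcurrency(invoices, 4, (invoice) => renderInvoice(invoice));
```

If one worker call rejects, the returned promise rejects too, so wrap the worker in a try/catch if you want the remaining items to continue.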

By applying these techniques and utilizing powerful tools like Document Engine, you can handle even the most complex PDF generation tasks in Node.js applications. This approach ensures your application delivers high-quality, customizable PDFs while meeting specific functionality and performance requirements.

Conclusion

This post covered how to generate PDFs from HTML in Node.js using Puppeteer for basic HTML-to-PDF tasks and Nutrient Document Engine for more advanced features like annotations and digital signatures. With Nutrient, you can set up Document Engine, render HTML with dynamic data, and create production-grade PDFs through API requests — ideal for generating documents such as reports and invoices. To get an API token, contact our Sales team.

FAQ

Here are a few frequently asked questions about Document Engine and working with PDFs.

What are the system requirements for running Document Engine?

Document Engine requires at least 4 GB of RAM and can be run on macOS (Ventura, Monterey, Mojave, Catalina, or Big Sur) and Linux distributions (Ubuntu, Fedora, Debian, CentOS). Docker is required to run the engine.

Do I need Puppeteer or any other tools to generate PDFs with Document Engine?

No, Document Engine handles the entire process of converting HTML to PDF, so you don’t need Puppeteer or any other tool to generate PDFs.

How do I add dynamic content like names and dates to my PDFs?

You can use Mustache templates to inject dynamic data into your HTML before sending it to Document Engine. This allows you to create personalized PDFs by simply updating the data file.

Can I add watermarks or cover pages to my PDFs?

Yes, Document Engine lets you add watermarks and cover pages. You can either add HTML for a watermark or cover page to your template, or upload a separate PDF file as a cover page through the API.

How do I interact with Document Engine using Node.js?

You can send an API request using Node.js and a library like axios to pass your HTML content to Document Engine, which will return the generated PDF. The tutorial includes a sample script for making this request.

Author
Hulya Masharipov Technical Writer

Hulya is a frontend web developer and technical writer at Nutrient who enjoys creating responsive, scalable, and maintainable web experiences. She’s passionate about open source, web accessibility, cybersecurity privacy, and blockchain.
