Python HTML to PDF: Convert HTML to PDF using wkhtmltopdf

Q: How Can I Convert HTML to PDF Using Python with wkhtmltopdf?

To convert HTML to PDF using Python with wkhtmltopdf, install the wkhtmltopdf binary on your system, then install `pdfkit` with `pip install pdfkit`. After that, call `pdfkit.from_file('input.html', 'output.pdf')` to generate a PDF.

Q: What Are the Steps to Install and Configure wkhtmltopdf for Python HTML-to-PDF Conversion?

First, install `wkhtmltopdf` on your system and ensure it’s in your `PATH`. Then, install `pdfkit` using `pip install pdfkit`. If the binary isn’t in your `PATH`, pass its location to `pdfkit.configuration(wkhtmltopdf=...)` in your Python script.

Q: How Can I Troubleshoot Errors in Python HTML-to-PDF Conversion Using wkhtmltopdf?

Review the error messages, ensure `wkhtmltopdf` is correctly installed, verify that HTML resources are accessible, and adjust `pdfkit` options to address rendering issues.

Hulya Masharipov

Illustration: Python HTML to PDF: Convert HTML to PDF using wkhtmltopdf

TL;DR

This tutorial demonstrates HTML-to-PDF conversion in Python using wkhtmltopdf and python-pdfkit. Install wkhtmltopdf on your system, and then use pdfkit to generate PDFs from URLs, HTML strings, or files with from_url(), from_string(), or from_file(). Customize output with options for page size, orientation, and margins. This is ideal for generating reports, invoices, tickets, and product catalogs with clean, consistent formatting across platforms.

If you’re looking for a way to convert HTML to PDF using Python, this post will show you how to do it efficiently using wkhtmltopdf.

wkhtmltopdf is an open source command-line tool that converts HTML to PDF using the Qt WebKit rendering engine. It’s available for macOS, Linux, and Windows.

Common use cases for converting HTML to PDF include generating invoices or receipts for sales, printing shipping labels, converting resumes to PDF, and much more.

This tutorial uses python-pdfkit (installed from PyPI as pdfkit), a simple Python wrapper around the wkhtmltopdf utility, to convert HTML to PDF.

Introduction to HTML-to-PDF conversion

HTML-to-PDF conversion is a crucial process for transforming webpages or HTML documents into Portable Document Format (PDF) files. This conversion is particularly useful for creating printable versions of webpages, generating detailed reports, and sharing documents in a universally accepted format. The ability to convert HTML content into PDF files ensures that the layout and design of the original HTML are preserved, making it ideal for professional documentation.

Several libraries and tools are available for HTML-to-PDF conversion, each offering unique features and functionalities. Popular options include python-pdfkit and WeasyPrint. These libraries provide developers with the tools needed to generate high-quality PDFs from HTML content, whether it’s a simple webpage or a complex document with intricate styling and dynamic elements.

Installing wkhtmltopdf for Python HTML-to-PDF conversion

Before you can use wkhtmltopdf, you need to install it on your operating system.

On macOS

Install wkhtmltopdf using Homebrew:

brew install --cask wkhtmltopdf

On Debian/Ubuntu

Install wkhtmltopdf using APT:

sudo apt-get install wkhtmltopdf

On Windows

Download the latest version of wkhtmltopdf from the wkhtmltopdf website.

After you’ve run the installer, add the directory that contains the wkhtmltopdf binary to your PATH environment variable.
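To check the installation from Python, the sketch below looks for the binary on the PATH with shutil.which and, if it lives elsewhere, passes its location to pdfkit.configuration. The helper names and the explicit_path parameter are assumptions made for illustration; pdfkit is imported lazily so the lookup also works before pdfkit is installed.

```python
import shutil


def find_wkhtmltopdf(explicit_path=None):
    """Return the path to the wkhtmltopdf binary, or None if it is not found.

    explicit_path is a hypothetical override for installs outside the PATH.
    """
    return explicit_path or shutil.which("wkhtmltopdf")


def build_configuration(binary_path):
    """Create a pdfkit configuration that points at a specific binary."""
    import pdfkit  # imported here so find_wkhtmltopdf works without pdfkit

    return pdfkit.configuration(wkhtmltopdf=binary_path)
```

If find_wkhtmltopdf() returns None, pass the install location explicitly and hand the result of build_configuration to a conversion function through its configuration argument, for example pdfkit.from_url(url, 'example.pdf', configuration=config).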

Installing python-pdfkit

Install python-pdfkit using pip:

pip install pdfkit
# or
pip3 install pdfkit # for Python 3

python-pdfkit provides several APIs to create a PDF document:

From a URL using from_url
From a string using from_string
From a file using from_file

Creating a PDF from a URL

The from_url function takes two required arguments: the URL and the output path. The following code snippet shows how to convert the Google home page to PDF using pdfkit:

import pdfkit

pdfkit.from_url('https://google.com', 'example.pdf')

Running the code

This section outlines two options for running the code.

Option 1: Direct execution

Save the code snippet in a file named url.py.
Run the script:

python url.py

If you’re using Python 3 and the default Python command points to Python 2, use:

python3 url.py

Option 2: Using a virtual environment

Create a virtual environment:

python3 -m venv venv

Activate the virtual environment (on Windows, run venv\Scripts\activate instead):

source venv/bin/activate

Install pdfkit:

pip install pdfkit

Save the code snippet in a file named url.py and run the script:

python url.py

The output PDF will be saved in the current directory as example.pdf.

Illustration: Python HTML to PDF conversion using wkhtmltopdf
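If the conversion fails, pdfkit reports the problem as an IOError (an alias of OSError in Python 3), for example when the wkhtmltopdf binary can't be found or when wkhtmltopdf exits with an error. The following is a minimal sketch of catching that error; the convert_url helper and its converter parameter are hypothetical names used only for this illustration.

```python
def convert_url(url, output_path, converter=None):
    """Convert a URL to PDF and report failure instead of raising.

    converter is a hypothetical hook (useful for testing); when omitted,
    pdfkit.from_url is used.
    """
    if converter is None:
        import pdfkit

        converter = pdfkit.from_url
    try:
        converter(url, output_path)
    except OSError as exc:
        # Raised when the binary is missing or wkhtmltopdf reports an error.
        return False, str(exc)
    return True, ""
```

Calling convert_url('https://google.com', 'example.pdf') returns a (success, message) pair, so a script can log the wkhtmltopdf error text instead of stopping with a traceback.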

Creating a PDF from a string

The from_string function takes two required arguments: the HTML string and the output path. The following code snippet shows how to do this:

import pdfkit

pdfkit.from_string('<h1>Hello World!</h1>', 'out.pdf')

Illustration: Creating a PDF from a string
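When the HTML string contains user-supplied values, or when the PDF should be returned from a web handler rather than written to disk, a variation like the one below may help. It escapes the value with html.escape and passes False as the output path, which makes pdfkit return the PDF data instead of saving a file. The two function names are assumptions for this sketch.

```python
from html import escape


def build_greeting_html(name):
    """Build a small HTML fragment, escaping the user-supplied name."""
    return f"<h1>Hello {escape(name)}!</h1>"


def html_to_pdf_bytes(html):
    """Render HTML to PDF in memory and return the resulting bytes."""
    import pdfkit

    # Passing False as the output path makes pdfkit return the PDF data.
    return pdfkit.from_string(html, False)
```

For example, html_to_pdf_bytes(build_greeting_html('World')) produces the same document as the snippet above, but as a bytes object.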

Creating a PDF from a file

The from_file function takes two required arguments: the path to the HTML file and the output path. The following code snippet shows how to do this:

import pdfkit

pdfkit.from_file('index.html', 'index.pdf')

This example uses an invoice template as the HTML file (index.html). You can download the template from here. The following image shows the invoice template.

Illustration: Invoice HTML to PDF example

You can also pass additional parameters, such as the page size, orientation, and margins. Add the options parameter to do this:

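import pdfkit

# Keys are wkhtmltopdf command-line flags without the leading "--".
options = {
    'page-size': 'Letter',
    'orientation': 'Landscape',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    # Repeatable flags take a list of tuples.
    'custom-header': [
        ('Accept-Encoding', 'gzip')
    ],
    # Flags that take no value use None.
    'no-outline': None
}

pdfkit.from_file('index.html', 'index.pdf', options=options)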
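Each key in the options dictionary corresponds to a wkhtmltopdf command-line flag. To make that mapping concrete, here is a rough sketch of how such a dictionary could be flattened into arguments. It is an illustration written for this tutorial, not pdfkit's actual implementation, and options_to_args is a hypothetical name.

```python
def options_to_args(options):
    """Flatten a pdfkit-style options dict into command-line arguments."""
    args = []
    for name, value in options.items():
        flag = name if name.startswith("--") else f"--{name}"
        if value is None or value is False or value == "":
            # Flags without a value, such as --no-outline.
            args.append(flag)
        elif isinstance(value, (list, tuple)):
            # Repeatable flags: emit the flag once per item.
            for item in value:
                args.append(flag)
                if isinstance(item, (list, tuple)):
                    args.extend(str(part) for part in item)
                else:
                    args.append(str(item))
        else:
            args.extend([flag, str(value)])
    return args


example = {
    'page-size': 'Letter',
    'custom-header': [('Accept-Encoding', 'gzip')],
    'no-outline': None,
}
print(options_to_args(example))
# ['--page-size', 'Letter', '--custom-header', 'Accept-Encoding', 'gzip', '--no-outline']
```

Reading the options this way makes it easier to look up any key in the wkhtmltopdf command-line help.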

Customizing PDF output

Customizing PDF output is a vital aspect of HTML-to-PDF conversion, allowing developers to tailor the generated PDF files to meet specific requirements. With the right tools, you can control various aspects of the PDF layout and design, ensuring the final document aligns with your needs.

For instance, you can adjust the page size and margins through wkhtmltopdf options, and the font size and font family through CSS. Headers and footers can be added with options, while a watermark is typically implemented in the HTML and CSS itself. Because wkhtmltopdf applies the page's CSS and runs its JavaScript before rendering, you can use both to control how the generated PDF looks.

Here’s an example of how to customize the PDF output using python-pdfkit:

import pdfkit

options = {
    'page-size': 'A4',
    'margin-top': '1in',
    'margin-right': '1in',
    'margin-bottom': '1in',
    'margin-left': '1in',
    'encoding': "UTF-8",
    'custom-header': [('Accept-Encoding', 'gzip')],
    'no-outline': None
}

pdfkit.from_file('index.html', 'customized_output.pdf', options=options)
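Font settings are usually controlled through CSS rather than through options. One possible approach, sketched below, is to write a small stylesheet and pass it through the css argument that pdfkit accepts for file and string sources. The helper names, the print.css filename, and the font values are assumptions chosen for illustration.

```python
from pathlib import Path


def build_print_css(font_family="Helvetica, Arial, sans-serif", font_size="12pt"):
    """Return a stylesheet that sets the body font for the generated PDF."""
    return f"body {{ font-family: {font_family}; font-size: {font_size}; }}\n"


def convert_with_css(html_path, pdf_path, css_path="print.css"):
    """Write the stylesheet to disk and apply it during conversion."""
    import pdfkit

    Path(css_path).write_text(build_print_css(), encoding="utf-8")
    # The css argument works with from_file and from_string, not from_url.
    pdfkit.from_file(html_path, pdf_path, css=css_path)
```

Calling convert_with_css('index.html', 'customized_output.pdf') applies the stylesheet without editing index.html itself.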

Advanced features of python-pdfkit

python-pdfkit exposes wkhtmltopdf's more advanced features for HTML-to-PDF conversion. Because wkhtmltopdf renders pages with Qt WebKit, CSS is applied and JavaScript runs before the PDF is produced, so styled and dynamic pages can be captured. The engine is older than current browsers, however, so some modern CSS and JavaScript features may not render the same way.

Additionally, python-pdfkit enables the generation of PDFs from website URLs, making it easy to convert entire webpages into PDF documents. This feature is particularly useful for archiving web content or creating offline versions of webpages.

python-pdfkit has no template system of its own, but it works well with HTML produced by a template engine, which can streamline the process of generating consistent and professional-looking PDFs. You can also add custom headers and footers to the generated PDF files through the header-html and footer-html options, providing additional information or branding elements.
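One way to work with templates is to render the HTML yourself and hand the result to from_string. The sketch below uses string.Template from the standard library; the invoice fields, the template text, and the function names are hypothetical and stand in for whatever template engine your project already uses.

```python
from html import escape
from string import Template

# Hypothetical invoice template; $name placeholders are filled in at render time.
INVOICE_TEMPLATE = Template(
    "<html><body>"
    "<h1>Invoice $number</h1>"
    "<p>Customer: $customer</p>"
    "<p>Total: $total</p>"
    "</body></html>"
)


def render_invoice(number, customer, total):
    """Fill in the template, escaping each value for safe use in HTML."""
    return INVOICE_TEMPLATE.substitute(
        number=escape(str(number)),
        customer=escape(customer),
        total=escape(total),
    )


def invoice_to_pdf(number, customer, total, output_path):
    """Render the invoice and convert it to a PDF file."""
    import pdfkit

    pdfkit.from_string(render_invoice(number, customer, total), output_path)
```

Keeping rendering and conversion in separate functions lets you inspect the HTML in a browser before generating the PDF.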

Here’s an example of using python-pdfkit to generate a PDF from a URL with custom headers and footers:

import pdfkit

options = {
    'header-html': 'header.html',
    'footer-html': 'footer.html',
    'page-size': 'A4',
    'margin-top': '1in',
    'margin-right': '1in',
    'margin-bottom': '1in',
    'margin-left': '1in',
    'encoding': "UTF-8"
}

pdfkit.from_url('https://example.com', 'output_with_headers_footers.pdf', options=options)
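The header-html and footer-html options expect HTML files that already exist at the given paths. When a plain-text header or footer is enough, wkhtmltopdf also offers text flags such as header-left and footer-center, with placeholders like [page] and [topage] for page numbers. The helper below is a hypothetical sketch that builds such an options dictionary.

```python
def text_header_footer_options(title):
    """Build options for a plain-text header and a page-number footer."""
    return {
        'header-left': title,
        'header-line': None,  # draw a line under the header
        'footer-center': 'Page [page] of [topage]',
        'footer-font-size': '9',
        'margin-top': '1in',
        'margin-bottom': '1in',
    }


def convert_with_text_footer(url, output_path, title):
    """Convert a URL using the text header and footer options."""
    import pdfkit

    pdfkit.from_url(url, output_path, options=text_header_footer_options(title))
```

The top and bottom margins are set explicitly so the header and footer have room to render.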

Additional use cases for HTML-to-PDF conversion

Generating reports

Example: If you need to generate reports from web data, you can convert HTML tables or dashboards to PDF for easy sharing and archiving.

import pdfkit

html_report = """
<html>
  <head><title>Report</title></head>
  <body>
    <h1>Monthly Sales Report</h1>
    <table border="1">
      <tr><th>Product</th><th>Quantity</th><th>Price</th></tr>
      <tr><td>Product A</td><td>10</td><td>$100</td></tr>
      <tr><td>Product B</td><td>5</td><td>$200</td></tr>
    </table>
  </body>
</html>
"""

pdfkit.from_string(html_report, 'report.pdf')
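In practice the rows usually come from a database or an API rather than from hardcoded HTML. The following sketch builds the same table from a list of rows and escapes each cell; the function names and the three-column layout are assumptions carried over from the example above.

```python
from html import escape


def build_report_html(title, rows):
    """Build a report page with one table row per (product, quantity, price)."""
    header = "<tr><th>Product</th><th>Quantity</th><th>Price</th></tr>"
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return (
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>'
        f'<body><h1>{escape(title)}</h1>'
        f'<table border="1">{header}{body}</table></body></html>'
    )


def save_report(title, rows, output_path):
    """Render the report HTML and convert it to a PDF file."""
    import pdfkit

    pdfkit.from_string(build_report_html(title, rows), output_path)
```

For example, save_report('Monthly Sales Report', [('Product A', 10, '$100'), ('Product B', 5, '$200')], 'report.pdf') generates a report equivalent to the hardcoded version.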