Blog post

Python HTML to PDF: Convert HTML to PDF using wkhtmltopdf

Illustration: Python HTML to PDF: Convert HTML to PDF using wkhtmltopdf
Information

This article was first published in December 2022 and was updated in August 2024.

If you’re looking for a way to convert HTML to PDF using Python, this post will show you how to do it efficiently using wkhtmltopdf.

wkhtmltopdf is an open source command-line tool that converts HTML to PDF using the Qt WebKit rendering engine. It’s available for macOS, Linux, and Windows.

Common use cases for converting HTML to PDF include generating invoices or receipts for sales, printing shipping labels, converting resumes to PDF, and much more.

This tutorial will use python-pdfkit to convert HTML to PDF, and pdfkit, a simple Python wrapper that allows you to convert HTML to PDF using the wkhtmltopdf utility.

Introduction to HTML-to-PDF conversion

HTML-to-PDF conversion is a crucial process for transforming webpages or HTML documents into Portable Document Format (PDF) files. This conversion is particularly useful for creating printable versions of webpages, generating detailed reports, and sharing documents in a universally accepted format. The ability to convert HTML content into PDF files ensures that the layout and design of the original HTML are preserved, making it ideal for professional documentation.

Several libraries and tools are available for HTML-to-PDF conversion, each offering unique features and functionalities. Popular options include python-pdfkit and WeasyPrint. These libraries provide developers with the tools needed to generate high-quality PDFs from HTML content, whether it’s a simple webpage or a complex document with intricate styling and dynamic elements.

Installing wkhtmltopdf for Python HTML-to-PDF conversion

Before you can use wkhtmltopdf, you need to install it on your operating system.

On macOS

Install wkhtmltopdf using Homebrew:

brew install --cask wkhtmltopdf

On Debian/Ubuntu

Install wkhtmltopdf using APT:

sudo apt-get install wkhtmltopdf

On Windows

Download the latest version of wkhtmltopdf from the wkhtmltopdf website.

After you’ve downloaded the installer, set the path to the wkhtmltopdf binary to your PATH environment variable.

Installing python-pdfkit

Install python-pdfkit using Pip:

pip install pdfkit
# or
pip3 install pdfkit # for Python 3

python-pdfkit provides several APIs to create a PDF document:

  • From a URL using from_url

  • From a string using from_string

  • From a file using from_file

Creating a PDF from a URL

The from_url method takes two arguments: the URL, and the output path. The following code snippet shows how to convert the Google home page to PDF using pdfkit:

import pdfkit

pdfkit.from_url('https://google.com', 'example.pdf')

Running the code

This section outlines two options for running the code.

Option 1: Direct execution

  1. Save the code snippet in a file named url.py.

  2. Run the script:

python url.py

If you’re using Python 3 and the default Python command points to Python 2, use:

python3 url.py

Option 2: Using a virtual environment

  1. Create a virtual environment:

python3 -m venv venv
  1. Activate the virtual environment:

source venv/bin/activate
  1. Install pdfkit:

pip install pdfkit
  1. Save the code snippet in a file named url.py and run the script:

python url.py

The output PDF will be saved in the current directory as example.pdf.

Python HTML to PDF conversion using wkhtmltopdf

Creating a PDF from a string

The from_string method takes two arguments: the HTML string, and the output path. The following code snippet shows how to do this:

import pdfkit

pdfkit.from_string('<h1>Hello World!</h1>', 'out.pdf')

Creating a PDF from a String - Python HTML to PDF conversion

Creating a PDF from a file

The from_file method takes two arguments: the path to the HTML file, and the output path. The following code snippet shows how to do this:

import pdfkit

pdfkit.from_file('index.html', 'index.pdf')

You’ll use an invoice template for the HTML file. You can download the template from here. The following image shows the invoice template.

Invoice HTML to PDF example - Python HTML to PDF conversion

It’s also possible to pass some additional parameters — like the page size, orientation, and margins. Add the options parameter to do this:

options = {
    'page-size': 'Letter',
	 'orientation': 'Landscape',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'custom-header': [
        ('Accept-Encoding', 'gzip')
    ],
    'no-outline': None
}

pdfkit.from_file('index.html', 'index.pdf', options=options)

Customizing PDF output

Customizing PDF output is a vital aspect of HTML-to-PDF conversion, allowing developers to tailor the generated PDF files to meet specific requirements. With the right tools, you can control various aspects of the PDF layout and design, ensuring the final document aligns with your needs.

For instance, you can adjust the page size, margins, font size, and font family to match your desired specifications. Adding headers, footers, and watermarks is also possible, providing additional context or branding to the PDF files. Moreover, by leveraging CSS and JavaScript, you can further enhance the appearance and behavior of generated PDFs, making them more interactive and visually appealing.

Here’s an example of how to customize the PDF output using python-pdfkit:

import pdfkit

options = {
    'page-size': 'A4',
    'margin-top': '1in',
    'margin-right': '1in',
    'margin-bottom': '1in',
    'margin-left': '1in',
    'encoding': "UTF-8",
    'custom-header': [('Accept-Encoding', 'gzip')],
    'no-outline': None
}

pdfkit.from_file('index.html', 'customized_output.pdf', options=options)

Advanced features of python-pdfkit

python-pdfkit is a robust library that offers several advanced features for HTML-to-PDF conversion. One of its standout capabilities is the support for JavaScript and CSS, allowing you to create dynamic and styled PDFs that closely resemble the original HTML content.

Additionally, python-pdfkit enables the generation of PDFs from website URLs, making it easy to convert entire webpages into PDF documents. This feature is particularly useful for archiving web content or creating offline versions of webpages.

The library also supports the use of templates, which can streamline the process of generating consistent and professional-looking PDFs. You can add custom headers and footers to the generated PDF files, providing additional information or branding elements.

Here’s an example of using python-pdfkit to generate a PDF from a URL with custom headers and footers:

import pdfkit

options = {
    'header-html': 'header.html',
    'footer-html': 'footer.html',
    'page-size': 'A4',
    'margin-top': '1in',
    'margin-right': '1in',
    'margin-bottom': '1in',
    'margin-left': '1in',
    'encoding': "UTF-8"
}

pdfkit.from_url('https://example.com', 'output_with_headers_footers.pdf', options=options)

Additional use cases for HTML-to-PDF conversion

  1. Generating reports

Example: If you need to generate reports from web data, you can convert HTML tables or dashboards to PDF for easy sharing and archiving.

import pdfkit

html_report = """
<html>
  <head><title>Report</title></head>
  <body>
    <h1>Monthly Sales Report</h1>
    <table border="1">
      <tr><th>Product</th><th>Quantity</th><th>Price</th></tr>
      <tr><td>Product A</td><td>10</td><td>$100</td></tr>
      <tr><td>Product B</td><td>5</td><td>$200</td></tr>
    </table>
  </body>
</html>
"""

pdfkit.from_string(html_report, 'report.pdf')

Report HTML to PDF example - Python HTML to PDF conversion

  1. Creating printable event tickets

Example: Convert dynamically generated HTML event tickets into PDFs that users can print or store digitally.

import pdfkit

event_ticket = """
<html>
  <head><title>Event Ticket</title></head>
  <body>
    <h1>Concert Ticket</h1>
    <p>Date: 2024-09-15</p>
    <p>Venue: Music Hall</p>
    <p>Seat: A12</p>
    <barcode>1234567890</barcode>
  </body>
</html>
"""

pdfkit.from_string(event_ticket, 'ticket.pdf')

Report HTML to PDF example - Python HTML to PDF conversion

  1. Exporting blog posts to PDF

Example: If you’re running a blog and want to offer readers the option to download posts as PDFs, you can convert the HTML content of a post into a PDF.

import pdfkit

blog_post_html = """
<html>
  <head><title>My Blog Post</title></head>
  <body>
    <h1>How to Use Python for Web Development</h1>
    <p>Python is a versatile language...</p>
    <h2>Getting Started</h2>
    <p>To start developing web applications with Python...</p>
  </body>
</html>
"""

pdfkit.from_string(blog_post_html, 'blog_post.pdf')

Report HTML to PDF example - Python HTML to PDF conversion

  1. Invoice generation with data integration

Example: Combine Python’s data processing capabilities with HTML-to-PDF conversion to generate invoices automatically.

import pdfkit

customer_name = "John Doe"
items = [
    {"name": "Product 1", "quantity": 2, "price": 50},
    {"name": "Product 2", "quantity": 1, "price": 150},
]

total = sum(item["quantity"] * item["price"] for item in items)

invoice_html = f"""
<html>
  <head><title>Invoice</title></head>
  <body>
    <h1>Invoice for {customer_name}</h1>
    <table border="1">
      <tr><th>Item</th><th>Quantity</th><th>Price</th></tr>
      {''.join(f"<tr><td>{item['name']}</td><td>{item['quantity']}</td><td>${item['price']}</td></tr>" for item in items)}
      <tr><td colspan="2">Total</td><td>${total}</td></tr>
    </table>
  </body>
</html>
"""

pdfkit.from_string(invoice_html, 'invoice.pdf')

Report HTML to PDF example - Python HTML to PDF conversion

  1. PDFs for e-learning materials

Example: Create downloadable PDFs for e-learning courses, where the content is initially written in HTML.

import pdfkit

course_material = """
<html>
  <head><title>Python Programming Course</title></head>
  <body>
    <h1>Introduction to Python</h1>
    <p>Python is a powerful, easy-to-learn programming language...</p>
    <h2>Lesson 1: Variables</h2>
    <p>Variables in Python are...</p>
  </body>
</html>
"""

pdfkit.from_string(course_material, 'python_course.pdf')

Report HTML to PDF example - Python HTML to PDF conversion

  1. PDFs for product catalogs

Example: Convert product catalogs stored as HTML into PDFs for distribution to clients or for offline viewing.

import pdfkit

catalog_html = """
<html>
  <head><title>Product Catalog</title></head>
  <body>
    <h1>Our Products</h1>
    <div><h2>Product A</h2><p>Description of Product A...</p></div>
    <div><h2>Product B</h2><p>Description of Product B...</p></div>
  </body>
</html>
"""

pdfkit.from_string(catalog_html, 'catalog.pdf')

Report HTML to PDF example - Python HTML to PDF conversion

  1. Multipage PDF creation

Example: Demonstrate how to create a multipage PDF by concatenating several HTML strings.

import pdfkit

pages = [
    "<html><body><h1>Page 1</h1><p>Content for page 1...</p></body></html>",
    "<html><body><h1>Page 2</h1><p>Content for page 2...</p></body></html>",
    "<html><body><h1>Page 3</h1><p>Content for page 3...</p></body></html>"
]

pdfkit.from_string("".join(pages), 'multi_page.pdf')

Report HTML to PDF example - Python HTML to PDF conversion

Best practices for HTML-to-PDF conversion

When converting HTML to PDF, following best practices ensures the generated PDFs are accurate and reliable and meet the required standards. Here are some key practices to consider:

  1. Ensure well-structured HTML content — Valid and well-structured HTML content is crucial for successful PDF conversion. This helps prevent errors and ensures that the layout and design are accurately preserved in the generated PDF.

  2. Use reliable libraries — Choose a reliable and efficient library for HTML-to-PDF conversion. Libraries like python-pdfkit and WeasyPrint are popular choices that offer robust features and functionalities.

  3. Customize PDF output — Tailor the PDF output to meet the specific needs of your application. Adjust page size, margins, fonts, and other layout elements to ensure the final document aligns with your requirements.

  4. Thorough testing — Test the generated PDF files thoroughly to ensure they meet the desired standards. Check for any layout issues, missing content, or other discrepancies that may affect the quality of the PDF.

By following these best practices, you can ensure your HTML-to-PDF conversion process is smooth, and the resulting PDFs are of high quality.

Troubleshooting common issues

Converting HTML to PDF can sometimes present challenges. Here are some common issues and their solutions:

  1. Errors in HTML content — If PDF generation fails, it may be due to errors in the HTML content. Ensure the HTML is well-structured and valid. Use tools like HTML validators to check for any issues.

  2. Incorrect layout or design — If the generated PDF doesn’t match the expected layout or design, customize the PDF output using the available options. Adjust page size, margins, and other settings to achieve the desired result.

  3. Library configuration issues — Ensure the HTML-to-PDF conversion library is properly installed and configured. Check the documentation for any specific setup requirements, and verify that all dependencies are met.

  4. Resource accessibility — Ensure that all resources (e.g. images, CSS files) referenced in the HTML are accessible. Broken links or missing resources can affect the appearance of the generated PDF.

If you encounter persistent issues, consult the documentation and support resources provided by the library. Many libraries have active communities and support channels where you can seek assistance.

By addressing these common issues, you can ensure a smooth and successful HTML-to-PDF conversion process.

Conclusion

In this tutorial, you saw how to generate PDFs from HTML using wkhtmltopdf. If you’re looking to add more robust PDF capabilities, Nutrient offers a commercial JavaScript PDF library that can easily be integrated into your web application. It comes with 30+ features that let you view, annotate, edit, and sign documents directly in your browser. Out of the box, it has a polished and flexible UI that you can extend or simplify based on your unique use case.

You can also deploy our vanilla JavaScript PDF viewer or use one of our many web framework deployment options like React.js, Angular, and Vue.js. To see a list of all web frameworks, start your free trial. Or, launch our demo to see our viewer in action.

FAQ

Here are a few frequently asked questions about converting HTML to PDF using Python.

How can I convert HTML to PDF using Python with wkhtmltopdf? To convert HTML to PDF using Python with wkhtmltopdf, install pdfkit with pip install pdfkit. Then use pdfkit.from_file('input.html', 'output.pdf') to generate a PDF.
What are the steps to install and configure wkhtmltopdf for Python HTML-to-PDF conversion? First, install wkhtmltopdf on your system and ensure it’s in your PATH. Then, install pdfkit using pip install pdfkit and configure it in your Python script if needed.
Can I customize the PDF output when converting HTML to PDF with Python? Yes, you can customize the PDF output by passing options like page size and margins to the pdfkit function when converting HTML to PDF with Python.
How can I troubleshoot errors in Python HTML-to-PDF conversion using wkhtmltopdf? Review the error messages, ensure wkhtmltopdf is correctly installed, verify that HTML resources are accessible, and adjust pdfkit options to address rendering issues.
Author
Hulya Masharipov
Hulya Masharipov Technical Writer

Hulya is a frontend web developer and technical writer at Nutrient who enjoys creating responsive, scalable, and maintainable web experiences. She’s passionate about open source, web accessibility, cybersecurity privacy, and blockchain.

Free trial Ready to get started?
Free trial