This article was first published in December 2022 and was updated in August 2024.
If you’re looking for a way to convert HTML to PDF using Python, this post will show you how to do it efficiently using wkhtmltopdf
.
wkhtmltopdf
is an open source command-line tool that converts HTML to PDF using the Qt WebKit rendering engine. It’s available for macOS, Linux, and Windows.
Common use cases for converting HTML to PDF include generating invoices or receipts for sales, printing shipping labels, converting resumes to PDF, and much more.
This tutorial will use python-pdfkit to convert HTML to PDF, and pdfkit, a simple Python wrapper that allows you to convert HTML to PDF using the wkhtmltopdf utility.
Introduction to HTML-to-PDF conversion
HTML-to-PDF conversion is a crucial process for transforming webpages or HTML documents into Portable Document Format (PDF) files. This conversion is particularly useful for creating printable versions of webpages, generating detailed reports, and sharing documents in a universally accepted format. The ability to convert HTML content into PDF files ensures that the layout and design of the original HTML are preserved, making it ideal for professional documentation.
Several libraries and tools are available for HTML-to-PDF conversion, each offering unique features and functionalities. Popular options include python-pdfkit
and WeasyPrint
. These libraries provide developers with the tools needed to generate high-quality PDFs from HTML content, whether it’s a simple webpage or a complex document with intricate styling and dynamic elements.
Installing wkhtmltopdf for Python HTML-to-PDF conversion
Before you can use wkhtmltopdf, you need to install it on your operating system.
On macOS
Install wkhtmltopdf using Homebrew:
brew install --cask wkhtmltopdf
On Debian/Ubuntu
Install wkhtmltopdf using APT:
sudo apt-get install wkhtmltopdf
On Windows
Download the latest version of wkhtmltopdf from the wkhtmltopdf website.
After you’ve downloaded the installer, set the path to the wkhtmltopdf binary to your PATH
environment variable.
Installing python-pdfkit
Install python-pdfkit using Pip:
pip install pdfkit # or pip3 install pdfkit # for Python 3
python-pdfkit provides several APIs to create a PDF document:
-
From a URL using
from_url
-
From a string using
from_string
-
From a file using
from_file
Creating a PDF from a URL
The from_url
method takes two arguments: the URL, and the output path. The following code snippet shows how to convert the Google home page to PDF using pdfkit:
import pdfkit pdfkit.from_url('https://google.com', 'example.pdf')
Running the code
This section outlines two options for running the code.
Option 1: Direct execution
-
Save the code snippet in a file named
url.py
. -
Run the script:
python url.py
If you’re using Python 3 and the default Python command points to Python 2, use:
python3 url.py
Option 2: Using a virtual environment
-
Create a virtual environment:
python3 -m venv venv
-
Activate the virtual environment:
source venv/bin/activate
-
Install pdfkit:
pip install pdfkit
-
Save the code snippet in a file named
url.py
and run the script:
python url.py
The output PDF will be saved in the current directory as example.pdf
.
Creating a PDF from a string
The from_string
method takes two arguments: the HTML string, and the output path. The following code snippet shows how to do this:
import pdfkit pdfkit.from_string('<h1>Hello World!</h1>', 'out.pdf')
Creating a PDF from a file
The from_file
method takes two arguments: the path to the HTML file, and the output path. The following code snippet shows how to do this:
import pdfkit pdfkit.from_file('index.html', 'index.pdf')
You’ll use an invoice template for the HTML file. You can download the template from here. The following image shows the invoice template.
It’s also possible to pass some additional parameters — like the page size, orientation, and margins. Add the options
parameter to do this:
options = { 'page-size': 'Letter', 'orientation': 'Landscape', 'margin-top': '0.75in', 'margin-right': '0.75in', 'margin-bottom': '0.75in', 'margin-left': '0.75in', 'encoding': "UTF-8", 'custom-header': [ ('Accept-Encoding', 'gzip') ], 'no-outline': None } pdfkit.from_file('index.html', 'index.pdf', options=options)
Customizing PDF output
Customizing PDF output is a vital aspect of HTML-to-PDF conversion, allowing developers to tailor the generated PDF files to meet specific requirements. With the right tools, you can control various aspects of the PDF layout and design, ensuring the final document aligns with your needs.
For instance, you can adjust the page size, margins, font size, and font family to match your desired specifications. Adding headers, footers, and watermarks is also possible, providing additional context or branding to the PDF files. Moreover, by leveraging CSS and JavaScript, you can further enhance the appearance and behavior of generated PDFs, making them more interactive and visually appealing.
Here’s an example of how to customize the PDF output using python-pdfkit:
import pdfkit options = { 'page-size': 'A4', 'margin-top': '1in', 'margin-right': '1in', 'margin-bottom': '1in', 'margin-left': '1in', 'encoding': "UTF-8", 'custom-header': [('Accept-Encoding', 'gzip')], 'no-outline': None } pdfkit.from_file('index.html', 'customized_output.pdf', options=options)
Advanced features of python-pdfkit
python-pdfkit is a robust library that offers several advanced features for HTML-to-PDF conversion. One of its standout capabilities is the support for JavaScript and CSS, allowing you to create dynamic and styled PDFs that closely resemble the original HTML content.
Additionally, python-pdfkit enables the generation of PDFs from website URLs, making it easy to convert entire webpages into PDF documents. This feature is particularly useful for archiving web content or creating offline versions of webpages.
The library also supports the use of templates, which can streamline the process of generating consistent and professional-looking PDFs. You can add custom headers and footers to the generated PDF files, providing additional information or branding elements.
Here’s an example of using python-pdfkit to generate a PDF from a URL with custom headers and footers:
import pdfkit options = { 'header-html': 'header.html', 'footer-html': 'footer.html', 'page-size': 'A4', 'margin-top': '1in', 'margin-right': '1in', 'margin-bottom': '1in', 'margin-left': '1in', 'encoding': "UTF-8" } pdfkit.from_url('https://example.com', 'output_with_headers_footers.pdf', options=options)
Additional use cases for HTML-to-PDF conversion
-
Generating reports
Example: If you need to generate reports from web data, you can convert HTML tables or dashboards to PDF for easy sharing and archiving.
import pdfkit html_report = """ <html> <head><title>Report</title></head> <body> <h1>Monthly Sales Report</h1> <table border="1"> <tr><th>Product</th><th>Quantity</th><th>Price</th></tr> <tr><td>Product A</td><td>10</td><td>$100</td></tr> <tr><td>Product B</td><td>5</td><td>$200</td></tr> </table> </body> </html> """ pdfkit.from_string(html_report, 'report.pdf')
-
Creating printable event tickets
Example: Convert dynamically generated HTML event tickets into PDFs that users can print or store digitally.
import pdfkit event_ticket = """ <html> <head><title>Event Ticket</title></head> <body> <h1>Concert Ticket</h1> <p>Date: 2024-09-15</p> <p>Venue: Music Hall</p> <p>Seat: A12</p> <barcode>1234567890</barcode> </body> </html> """ pdfkit.from_string(event_ticket, 'ticket.pdf')
-
Exporting blog posts to PDF
Example: If you’re running a blog and want to offer readers the option to download posts as PDFs, you can convert the HTML content of a post into a PDF.
import pdfkit blog_post_html = """ <html> <head><title>My Blog Post</title></head> <body> <h1>How to Use Python for Web Development</h1> <p>Python is a versatile language...</p> <h2>Getting Started</h2> <p>To start developing web applications with Python...</p> </body> </html> """ pdfkit.from_string(blog_post_html, 'blog_post.pdf')
-
Invoice generation with data integration
Example: Combine Python’s data processing capabilities with HTML-to-PDF conversion to generate invoices automatically.
import pdfkit customer_name = "John Doe" items = [ {"name": "Product 1", "quantity": 2, "price": 50}, {"name": "Product 2", "quantity": 1, "price": 150}, ] total = sum(item["quantity"] * item["price"] for item in items) invoice_html = f""" <html> <head><title>Invoice</title></head> <body> <h1>Invoice for {customer_name}</h1> <table border="1"> <tr><th>Item</th><th>Quantity</th><th>Price</th></tr> {''.join(f"<tr><td>{item['name']}</td><td>{item['quantity']}</td><td>${item['price']}</td></tr>" for item in items)} <tr><td colspan="2">Total</td><td>${total}</td></tr> </table> </body> </html> """ pdfkit.from_string(invoice_html, 'invoice.pdf')
-
PDFs for e-learning materials
Example: Create downloadable PDFs for e-learning courses, where the content is initially written in HTML.
import pdfkit course_material = """ <html> <head><title>Python Programming Course</title></head> <body> <h1>Introduction to Python</h1> <p>Python is a powerful, easy-to-learn programming language...</p> <h2>Lesson 1: Variables</h2> <p>Variables in Python are...</p> </body> </html> """ pdfkit.from_string(course_material, 'python_course.pdf')
-
PDFs for product catalogs
Example: Convert product catalogs stored as HTML into PDFs for distribution to clients or for offline viewing.
import pdfkit catalog_html = """ <html> <head><title>Product Catalog</title></head> <body> <h1>Our Products</h1> <div><h2>Product A</h2><p>Description of Product A...</p></div> <div><h2>Product B</h2><p>Description of Product B...</p></div> </body> </html> """ pdfkit.from_string(catalog_html, 'catalog.pdf')
-
Multipage PDF creation
Example: Demonstrate how to create a multipage PDF by concatenating several HTML strings.
import pdfkit pages = [ "<html><body><h1>Page 1</h1><p>Content for page 1...</p></body></html>", "<html><body><h1>Page 2</h1><p>Content for page 2...</p></body></html>", "<html><body><h1>Page 3</h1><p>Content for page 3...</p></body></html>" ] pdfkit.from_string("".join(pages), 'multi_page.pdf')
Best practices for HTML-to-PDF conversion
When converting HTML to PDF, following best practices ensures the generated PDFs are accurate and reliable and meet the required standards. Here are some key practices to consider:
-
Ensure well-structured HTML content — Valid and well-structured HTML content is crucial for successful PDF conversion. This helps prevent errors and ensures that the layout and design are accurately preserved in the generated PDF.
-
Use reliable libraries — Choose a reliable and efficient library for HTML-to-PDF conversion. Libraries like python-pdfkit and WeasyPrint are popular choices that offer robust features and functionalities.
-
Customize PDF output — Tailor the PDF output to meet the specific needs of your application. Adjust page size, margins, fonts, and other layout elements to ensure the final document aligns with your requirements.
-
Thorough testing — Test the generated PDF files thoroughly to ensure they meet the desired standards. Check for any layout issues, missing content, or other discrepancies that may affect the quality of the PDF.
By following these best practices, you can ensure your HTML-to-PDF conversion process is smooth, and the resulting PDFs are of high quality.
Troubleshooting common issues
Converting HTML to PDF can sometimes present challenges. Here are some common issues and their solutions:
-
Errors in HTML content — If PDF generation fails, it may be due to errors in the HTML content. Ensure the HTML is well-structured and valid. Use tools like HTML validators to check for any issues.
-
Incorrect layout or design — If the generated PDF doesn’t match the expected layout or design, customize the PDF output using the available options. Adjust page size, margins, and other settings to achieve the desired result.
-
Library configuration issues — Ensure the HTML-to-PDF conversion library is properly installed and configured. Check the documentation for any specific setup requirements, and verify that all dependencies are met.
-
Resource accessibility — Ensure that all resources (e.g. images, CSS files) referenced in the HTML are accessible. Broken links or missing resources can affect the appearance of the generated PDF.
If you encounter persistent issues, consult the documentation and support resources provided by the library. Many libraries have active communities and support channels where you can seek assistance.
By addressing these common issues, you can ensure a smooth and successful HTML-to-PDF conversion process.
Conclusion
In this tutorial, you saw how to generate PDFs from HTML using wkhtmltopdf. If you’re looking to add more robust PDF capabilities, Nutrient offers a commercial JavaScript PDF library that can easily be integrated into your web application. It comes with 30+ features that let you view, annotate, edit, and sign documents directly in your browser. Out of the box, it has a polished and flexible UI that you can extend or simplify based on your unique use case.
You can also deploy our vanilla JavaScript PDF viewer or use one of our many web framework deployment options like React.js, Angular, and Vue.js. To see a list of all web frameworks, start your free trial. Or, launch our demo to see our viewer in action.
FAQ
Here are a few frequently asked questions about converting HTML to PDF using Python.
How can I convert HTML to PDF using Python with wkhtmltopdf
?
To convert HTML to PDF using Python with wkhtmltopdf
, install pdfkit
with pip install pdfkit
. Then use pdfkit.from_file('input.html', 'output.pdf')
to generate a PDF.
What are the steps to install and configure wkhtmltopdf
for Python HTML-to-PDF conversion?
First, install wkhtmltopdf
on your system and ensure it’s in your PATH
. Then, install pdfkit
using pip install pdfkit
and configure it in your Python script if needed.
Can I customize the PDF output when converting HTML to PDF with Python?
Yes, you can customize the PDF output by passing options like page size and margins to thepdfkit
function when converting HTML to PDF with Python.
How can I troubleshoot errors in Python HTML-to-PDF conversion using wkhtmltopdf
?
Review the error messages, ensure wkhtmltopdf
is correctly installed, verify that HTML resources are accessible, and adjust pdfkit
options to address rendering issues.