Solving formatting issues when converting HTML to PDF
The Muhimbi Document Converter comes with the ability to convert HTML to PDF. However, as HTML is not really a language intended for output to a printer (or PDF), some pages may not look as expected.
Note that as of version 8.3 the Document Converter comes with a brand new HTML to Document Converter. This converter solves some, but not all, of the issues described below.
To troubleshoot authentication and connectivity related errors, or HTML to PDF Conversions that return an empty PDF, see this Knowledge Base Article.
Listed below are a number of possible workarounds that may improve the formatting of the generated PDFs.
-
When using the Internet Explorer based HTML to PDF option (the default in pre-8.3 releases; from 8.3 onwards the Print CSS media type is enabled by default), the Document Converter does not go through Internet Explorer’s print processing engine, so any print-specific CSS entries are not used. If you have control over the page that is being converted, you can add some logic inside the page that looks at a query string parameter (e.g. ?pdfconversion=true). When this parameter is present, you can emit different CSS / HTML that improves the formatting, e.g. a different page width. In SharePoint this can be achieved by modifying the master page or inserting a hidden ‘content editor web part’.
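The query string check described above could be sketched roughly as follows. This is only an illustration: the parameter name comes from the example above, while the function names, the CSS selectors and the 700px width are made-up placeholders that need to be adapted to the page being converted.

```typescript
// Parameter name taken from the example in the text (?pdfconversion=true).
const PDF_PARAM = "pdfconversion";

// Returns true when the query string asks for PDF-friendly output.
// Hypothetical helper, not part of the Document Converter.
function isPdfConversion(search: string): boolean {
  const value = new URLSearchParams(search).get(PDF_PARAM);
  return value !== null && value.toLowerCase() === "true";
}

// Returns extra CSS for the converter, or an empty string for normal visitors.
// The selectors and the width are placeholders for illustration only.
function pdfOverrideCss(search: string): string {
  if (!isPdfConversion(search)) {
    return "";
  }
  return [
    "body { width: 700px; margin: 0; }",
    "#navigation, #footer { display: none; }",
  ].join("\n");
}
```

In a browser, the page would pass `window.location.search` to `pdfOverrideCss` and place the result in a `<style>` element. A server-side page can apply the same check before writing its stylesheet links.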
-
PDF Conversion starts the moment a page ‘finishes loading’. This generally works well, but some modern web pages rely on JavaScript to render part (or all) of the page. There is no way for our software to detect when the JavaScript has finished executing, so the converted PDF may show only partial information. As a workaround, consider specifying a ‘ConversionDelay’ in our configuration file. Start with a high value (e.g. 30000 = 30 seconds) and, if that works, lower it to a more reasonable figure. Often a value of 1000 (1 second) is sufficient.
<!-- Optional delay (in milliseconds) between loading the web page and
converting to PDF. This allows asynchronous events such as JavaScript to
complete -->
When setting the ConversionDelay to a value larger than 15000 (15 seconds), please make sure the following configuration value is set to a value larger than the ConversionDelay. Please note that this particular setting is specified in seconds, not milliseconds.
<!-- Max number of cycles (1sec apart) before a converter is considered ‘hanging’ and will be terminated -->
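Because the delay is expressed in milliseconds and the timeout in seconds, it is easy to compare the wrong numbers. The following sketch shows the arithmetic only. The function and parameter names are hypothetical and do not correspond to actual configuration keys.

```typescript
// Hypothetical helper names; only the units (milliseconds vs. seconds)
// come from the text above.

// True when the timeout (seconds) is strictly larger than the delay (milliseconds).
function isTimeoutSufficient(conversionDelayMs: number, timeoutSeconds: number): boolean {
  return timeoutSeconds * 1000 > conversionDelayMs;
}

// Smallest whole number of seconds that is strictly larger than the delay.
function minimumTimeoutSeconds(conversionDelayMs: number): number {
  return Math.floor(conversionDelayMs / 1000) + 1;
}

console.log(isTimeoutSufficient(30000, 30)); // false: 30 s equals the delay, it is not larger
console.log(minimumTimeoutSeconds(30000));   // 31
```

For a ConversionDelay of 30000, a timeout of 31 or higher satisfies the requirement. In practice a more generous margin is probably sensible, as the conversion itself also takes time.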
-
Alternatively you may want to consider making a change to one or more of the following settings in the Conversion Service’s config file.
-
HTMLConverterFullFidelity.PaperSize: Specifies the paper size to use for the PDF when converting HTML pages, for example A4, Letter or a custom page size. For full details see the Conversion Service’s config file.
-
HTMLConverterFullFidelity.PageOrientation: Specifies the page orientation, either ‘Portrait’ or ‘Landscape’.
-
HTMLConverterFullFidelity.PageMargin: Specifies the margin (border) around the generated PDF file.
-
HTMLConverterFullFidelity.ScaleMode: Determines how the HTML is scaled to the PDF page size, either FitWidth, FitWidthScaleImagesOnly or NoScale.
-
HTMLConverterFullFidelity.SplitTextLines: Should text be broken up or wrapped to a new page?
-
HTMLConverterFullFidelity.SplitImages: Should images be broken up or wrapped to a new page?
-
When using the Internet Explorer based HTML to PDF option (the default in pre-8.3 releases), converting a URL sometimes results in ‘bitmap’ output, where the generated PDF appears to contain a screenshot of the page content rather than ‘real text’. This happens when the server running the Conversion Service has Internet Explorer 9 or later installed. Microsoft introduced a change in that version that causes all HTML5 content to be rendered as a bitmap; non-HTML5 content is rendered just fine. To solve this problem, roll back to Internet Explorer 8 or, if you have control over the content of the web page, skip output of the HTML5 doctype when a query string parameter is passed in that indicates PDF Conversion. Alternatively, update to the latest release and use the default ‘WebKit’ based conversion engine.
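Skipping the HTML5 doctype has to happen where the page is generated, as the doctype cannot be changed once the browser has loaded the page. A minimal server-side sketch is shown below. It assumes the same ?pdfconversion=true parameter used as an example earlier, and the `renderPage` function is hypothetical.

```typescript
const HTML5_DOCTYPE = "<!DOCTYPE html>";

// Prepends the HTML5 doctype for normal visitors and omits it when the
// query string indicates a PDF conversion request.
// Hypothetical helper; the parameter name is an assumption.
function renderPage(search: string, html: string): string {
  const forPdf = new URLSearchParams(search).get("pdfconversion") === "true";
  return forPdf ? html : `${HTML5_DOCTYPE}\n${html}`;
}
```

Bear in mind that a page without a doctype is rendered in the browser's quirks mode, so the layout may differ slightly from what regular visitors see.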
Although there is no ‘one size fits all’ answer, using the workarounds recommended above should make it possible to generate acceptable-looking PDF files in most situations. As always, if you require assistance, feel free to reach out to our friendly support desk. We are here to help.