SharePoint Search does not return converted HTML Content
When converting HTML to PDF, please make sure you are running version 8.3 (or later) of the Muhimbi Document Converter. That version comes with a much improved HTML Converter that not only generates better-looking output, but also ensures that SharePoint Search can find all content.
Muhimbi’s Document Converter comes with a facility to convert HTML content to PDF. Although the generated PDF can be searched in standard PDF viewers, iFilter-based search indexers may struggle to index the content.
This can be solved by setting the following two entries in the config file to ‘True’:
<add key="HTMLConverterFullFidelity.SplitTextLines" value="True"/>
<add key="HTMLConverterFullFidelity.SplitImages" value="True"/>
For details about how to edit the Conversion Service’s configuration file see this article.
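If the change has to be applied on several servers, it may be convenient to script it. The following is a minimal Python sketch, not part of the product: the function name `enable_split_settings` is hypothetical, and it assumes the two keys already exist as `<add key="..." value="..."/>` elements, as shown above. It works on the XML text only; locating, backing up and writing back the actual configuration file is left out because the file's location is not covered here.

```python
import xml.etree.ElementTree as ET

# Key names are taken from the article; everything else is illustrative.
SPLIT_KEYS = (
    "HTMLConverterFullFidelity.SplitTextLines",
    "HTMLConverterFullFidelity.SplitImages",
)


def enable_split_settings(config_xml: str) -> str:
    """Return config_xml with both split settings set to "True".

    Assumes each setting is an <add key="..." value="..."/> element.
    Raises KeyError if a setting is missing instead of guessing
    where it should be added.
    """
    # Keep XML comments so the result stays close to the original text.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(config_xml, parser=parser)

    found = set()
    for element in root.iter("add"):
        key = element.get("key")
        if key in SPLIT_KEYS:
            element.set("value", "True")
            found.add(key)

    missing = [key for key in SPLIT_KEYS if key not in found]
    if missing:
        raise KeyError(f"settings not found: {missing}")

    # Serialise back to text; the caller decides where to save it.
    return ET.tostring(root, encoding="unicode")
```

Other settings in the file are left untouched. Note that the serialised result may differ from the original in whitespace and will not include an XML declaration, so compare the output with the original before replacing the file.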
Please note that enabling the splitting of text and images has the following side effects:
- The generated PDF file will be larger (in file size).
- Content at the end of a page is not wrapped to the next page, but may end up being split.