Overriding default Conversion Settings when converting documents in SharePoint
The Muhimbi PDF Converter for SharePoint is based on an extremely flexible central conversion engine. This engine supports many more options than we can realistically display in the limited space available in a workflow activity. As a result, we default to the most common options.
It has always been possible to override these settings (See the Administration Guide, section 2.3.2), but these overrides were global and affected all operations across the SharePoint Farm.
Although many of our customers are happy with this arrangement, it just didn’t suit everyone, particularly our power users. To improve this situation, version 6.1 comes with a new facility that allows the default settings to be overridden on a request-by-request basis, hurrah!
The Convert Document workflow activity has been extended and the previously reserved Optional Parameters field can now be used to specify ‘override settings’. Overriding of parameters is done using an XML based syntax. Although very powerful, the XML syntax and possible values for each XML Element may not immediately be obvious to all users. For full information see the class diagrams and details of the OpenOptions and ConversionSettings classes in the Developer Guide.
The values that can be overridden are as follows.
<Override>
  <OpenOptions>
    <AllowExternalConnections>false</AllowExternalConnections>
    <AllowMacros>None</AllowMacros>
    <FileExtension></FileExtension>
    <OriginalFileName></OriginalFileName>
    <UserName></UserName>
    <Password></Password>
    <RefreshContent>true</RefreshContent>
  </OpenOptions>
  <ConversionSettings>
    <ConverterSpecificSettings type="ConverterSpecificSettings_WordProcessing">
      <ProcessDocumentTemplate>false</ProcessDocumentTemplate>
      <RevisionsAndCommentsMarkupMode>InLine</RevisionsAndCommentsMarkupMode>
      <RevisionsAndCommentsDisplayMode>Original</RevisionsAndCommentsDisplayMode>
    </ConverterSpecificSettings>
    <OutputFormatSpecificSettings type="OutputFormatSpecificSettings_PDF">
      <FastWebView>false</FastWebView>
      <EmbedAllFonts>false</EmbedAllFonts>
      <SubsetFonts>false</SubsetFonts>
      <PostProcessFile>false</PostProcessFile>
      <ViewerPreferences>
        <CenterWindow>true</CenterWindow>
        <NavigationPane>Bookmarks</NavigationPane>
      </ViewerPreferences>
    </OutputFormatSpecificSettings>
    <OCRSettings>
      <Language>English</Language>
      <Performance>Slow</Performance>
      <Paginate>true</Paginate>
      <WhiteList></WhiteList>
      <BlackList></BlackList>
      <Regions>
        <OCRRegion>
          <X>100</X><Y>100</Y><Width>200</Width><Height>50</Height>
          <StartPage>0</StartPage><EndPage>0</EndPage>
          <PageInterval>1</PageInterval><PageRange></PageRange>
        </OCRRegion>
      </Regions>
    </OCRSettings>
    <StartPage>1</StartPage>
    <EndPage>1</EndPage>
    <Fidelity>High</Fidelity>
    <Format>PDF</Format>
    <GenerateBookmarks>Disabled</GenerateBookmarks>
    <OpenPassword></OpenPassword>
    <OwnerPassword></OwnerPassword>
    <SecurityOptions>DisableContentAccessibility</SecurityOptions>
    <PageOrientation>Default</PageOrientation>
    <PDFProfile>PDF_A1B</PDFProfile>
    <Quality>OptimizeForPrint</Quality>
    <Range>VisibleDocuments</Range>
    <TOCSettings>...</TOCSettings>
  </ConversionSettings>
</Override>
Please note:
-
When entering values, only specify the fields you want to override and leave all other fields out completely. Don’t just provide empty values; delete the entire line (see the examples below).
-
All values are case sensitive.
-
Boolean values (true / false) need to be in all lowercase.
-
There is no need to specify the name of the enumeration; enter the value only, e.g. InLine in the RevisionsAndCommentsMarkupMode element.
-
In the ConverterSpecificSettings element you must use the ‘type’ attribute to specify the exact type.
-
When specifying multiple values, e.g. in the SecurityOptions element, separate them with a space.
-
The OutputFormatSpecificSettings property requires version 7.0.0.72 of the PDF Converter or newer.
-
The TOCSettings property requires version 7.3 of the PDF Converter or newer.
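The rules above can be captured in a small helper that generates the override XML instead of typing it by hand. Below is a minimal Python sketch. `build_override` and `_add_children` are hypothetical helpers, not part of the Muhimbi product. The sketch also assumes that the Optional Parameters field accepts compact, well-formed XML without line breaks. Fields set to `None` are left out entirely, booleans become lowercase, lists are joined with a single space, and a key starting with `@` (such as `@type`) becomes an attribute.

```python
import xml.etree.ElementTree as ET
from typing import Any, Mapping


def _add_children(parent: ET.Element, values: Mapping[str, Any]) -> None:
    """Recursively turn a nested dict into child elements of `parent`."""
    for name, value in values.items():
        if name.startswith("@"):
            # Keys such as "@type" become attributes on the parent element.
            parent.set(name[1:], str(value))
        elif value is None:
            # Fields you do not want to override are left out entirely,
            # rather than being written as empty elements.
            continue
        else:
            child = ET.SubElement(parent, name)
            if isinstance(value, Mapping):
                _add_children(child, value)
            elif isinstance(value, bool):
                # Booleans must be written in all lowercase.
                child.text = "true" if value else "false"
            elif isinstance(value, (list, tuple)):
                # Multiple values are separated by a single space.
                child.text = " ".join(str(v) for v in value)
            else:
                # Enumeration values are passed through unchanged (case sensitive).
                child.text = str(value)


def build_override(settings: Mapping[str, Any]) -> str:
    """Return an <Override> document for the Optional Parameters field."""
    root = ET.Element("Override")
    _add_children(root, settings)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_override({
        "OpenOptions": {"RefreshContent": False},
        "ConversionSettings": {
            "ConverterSpecificSettings": {
                "@type": "ConverterSpecificSettings_WordProcessing",
                "RevisionsAndCommentsMarkupMode": "InLine",
            },
            "StartPage": 1,
            "EndPage": None,  # omitted from the output
        },
    }))
```

Generating the XML this way also escapes special characters automatically. The nested dict mirrors the element hierarchy shown in the full list of overridable values.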
Enough with the theory, let’s work through some samples.
Make output format dynamic
If your workflow must be able to convert to different file formats depending on a workflow parameter or column value, then you cannot use the normal drop-down menu to pre-select the output format. Instead, specify the following XML, position the cursor between <Format> and </Format>, and insert the workflow variable or column lookup that holds the output format.
<Override>
  <ConversionSettings>
    <Format></Format>
  </ConversionSettings>
</Override>
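When the format value comes from outside the workflow designer, for example from a script that prepares the Optional Parameters text, the same override can be generated in code. The Python sketch below uses `format_override`, a hypothetical helper. It trims the incoming value but keeps its case, because values are case sensitive. It also rejects an empty value, since empty elements must not be sent.

```python
import xml.etree.ElementTree as ET


def format_override(output_format: str) -> str:
    """Build the override that selects the output format at run time.

    `output_format` is whatever the workflow variable or column holds,
    e.g. "PDF". Surrounding whitespace is trimmed; the case is kept.
    """
    value = output_format.strip()
    if not value:
        # An empty <Format></Format> element must not be sent.
        raise ValueError("output format must not be empty")
    root = ET.Element("Override")
    settings = ET.SubElement(root, "ConversionSettings")
    ET.SubElement(settings, "Format").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `format_override("PDF")` returns the compact form of the XML above with `PDF` filled in.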
Generate versions of MS-Word files with different revision tracking options
MS-Word files support revision tracking, which is ideal for visualising what has changed in a document. By default we display the Final version of the document and don’t show the individual revisions. However, some people may want to display these revisions and control how revisions are displayed (in-line or in balloons).
The XML to override these settings is as follows
<Override>
  <ConversionSettings>
    <ConverterSpecificSettings type="ConverterSpecificSettings_WordProcessing">
      <ProcessDocumentTemplate>true</ProcessDocumentTemplate>
      <RevisionsAndCommentsMarkupMode>InLine</RevisionsAndCommentsMarkupMode>
      <RevisionsAndCommentsDisplayMode>OriginalShowingMarkup</RevisionsAndCommentsDisplayMode>
    </ConverterSpecificSettings>
  </ConversionSettings>
</Override>
The possible values for RevisionsAndCommentsMarkupMode are:
-
InLine: Show all revisions inline.
-
Balloon: Show all revisions in balloons.
-
Mixed: Show only comments and formatting in balloons.
The possible values for RevisionsAndCommentsDisplayMode are:
-
FinalShowingMarkup: Show the document with all proposed changes highlighted.
-
Final: Show the document with all proposed changes included.
-
OriginalShowingMarkup: Show the original document with all proposed changes highlighted.
-
Original: Show the document before any changes were made.
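Because both settings accept only the values listed above, and those values are case sensitive, it can help to validate them before building the XML. The Python sketch below is hypothetical and not a Muhimbi API. `revisions_override` checks both values against the lists above and then builds the WordProcessing override in the same shape as the example.

```python
import xml.etree.ElementTree as ET

# Values taken from the two lists above (case sensitive).
MARKUP_MODES = {"InLine", "Balloon", "Mixed"}
DISPLAY_MODES = {"FinalShowingMarkup", "Final", "OriginalShowingMarkup", "Original"}


def revisions_override(markup_mode: str, display_mode: str,
                       process_template: bool = True) -> str:
    """Build a WordProcessing override for revision tracking output."""
    # Exact comparison on purpose: "inline" is not the same as "InLine".
    if markup_mode not in MARKUP_MODES:
        raise ValueError(f"unknown RevisionsAndCommentsMarkupMode: {markup_mode!r}")
    if display_mode not in DISPLAY_MODES:
        raise ValueError(f"unknown RevisionsAndCommentsDisplayMode: {display_mode!r}")
    root = ET.Element("Override")
    settings = ET.SubElement(root, "ConversionSettings")
    word = ET.SubElement(settings, "ConverterSpecificSettings",
                         type="ConverterSpecificSettings_WordProcessing")
    ET.SubElement(word, "ProcessDocumentTemplate").text = "true" if process_template else "false"
    ET.SubElement(word, "RevisionsAndCommentsMarkupMode").text = markup_mode
    ET.SubElement(word, "RevisionsAndCommentsDisplayMode").text = display_mode
    return ET.tostring(root, encoding="unicode")
```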
Trim page numbers when converting to PDF
Sometimes you may not be interested in all pages of the converted file, e.g. you only want to convert the cover page (page 1). To achieve this, set both StartPage and EndPage to 1 as follows:
<Override>
  <ConversionSettings>
    <StartPage>1</StartPage>
    <EndPage>1</EndPage>
  </ConversionSettings>
</Override>
When the start or end page should not be limited, specify 0 inside the element or remove the element altogether. Do not include empty elements. For example, to convert everything from page 2 onwards, specify only StartPage:
<Override>
  <ConversionSettings>
    <StartPage>2</StartPage>
  </ConversionSettings>
</Override>
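The page-range rules above (0 or a missing element means "not limited", and empty elements are never sent) can be sketched in Python. `page_range_override` is a hypothetical helper. It omits any unlimited bound instead of writing it as 0, and it rejects ranges that make no sense before they reach the converter.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def page_range_override(start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Build a StartPage/EndPage override.

    None or 0 means 'not limited'; such elements are left out instead of
    being sent empty.
    """
    pages = {"StartPage": start or 0, "EndPage": end or 0}
    if any(page < 0 for page in pages.values()):
        raise ValueError("page numbers cannot be negative")
    if pages["StartPage"] and pages["EndPage"] and pages["EndPage"] < pages["StartPage"]:
        raise ValueError("EndPage must not be smaller than StartPage")
    if not any(pages.values()):
        raise ValueError("nothing to override: leave the override out instead")
    root = ET.Element("Override")
    settings = ET.SubElement(root, "ConversionSettings")
    for tag, page in pages.items():
        if page:
            ET.SubElement(settings, tag).text = str(page)
    return ET.tostring(root, encoding="unicode")
```

`page_range_override(1, 1)` corresponds to the cover-page example, and `page_range_override(2)` corresponds to the "page 2 onwards" example.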
Specify ‘open’ passwords for secured documents
If your MS-Word files have been saved with a password, then under normal circumstances they cannot be converted. However, if you know the password, you can specify it inside the Password element as follows.
<Override>
  <OpenOptions>
    <Password></Password>
  </OpenOptions>
</Override>
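Passwords often contain characters such as `&` or `<`, which would break the XML if they were pasted in unescaped. The Python sketch below uses `password_override`, a hypothetical helper. It lets ElementTree escape those characters. It assumes the converter reads the Optional Parameters field with a standard XML parser, so `&amp;` and `&lt;` are turned back into the original characters.

```python
import xml.etree.ElementTree as ET


def password_override(password: str) -> str:
    """Build an OpenOptions override carrying the document's open password."""
    root = ET.Element("Override")
    options = ET.SubElement(root, "OpenOptions")
    # ElementTree escapes &, < and > in element text, so the result stays
    # well-formed XML whatever characters the password contains.
    ET.SubElement(options, "Password").text = password
    return ET.tostring(root, encoding="unicode")
```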
Disable refreshing of content
By default the PDF Converter refreshes all content in an MS-Word file, e.g. embedded fields, table of contents, smart parts, etc. If this is not desired then specify the following XML:
<Override>
  <OpenOptions>
    <RefreshContent>false</RefreshContent>
  </OpenOptions>
</Override>
Specify PDF profile, e.g. PDF/A or a specific PDF Version
We currently provide some brilliant PDF/A support, but it is an all-or-nothing approach controlled by a global flag in the configuration file. To control the PDF version on a request-by-request basis, either use our Web Services Interface or specify the following XML in the Optional Parameters field of the Convert Document workflow activity.
<Override>
  <ConversionSettings>
    <!-- Set the output profile -->
    <PDFProfile>PDF_A1B</PDFProfile>
    <!-- Force post processing -->
    <OutputFormatSpecificSettings type="OutputFormatSpecificSettings_PDF">
      <FastWebView>false</FastWebView>
      <EmbedAllFonts>true</EmbedAllFonts>
      <SubsetFonts>false</SubsetFonts>
      <PostProcessFile>true</PostProcessFile>
    </OutputFormatSpecificSettings>
  </ConversionSettings>
</Override>
This will make sure that all converted files conform to the PDF/A standard. However, if the source file is already in PDF format, and needs converting to PDF/A, then the SkipPDFFiles setting will need to be disabled first.
The PDFProfile element supports the following values. Please note that using this functionality requires a Muhimbi OCR and PDF/A Archiving for SharePoint add-on license.
-
PDF_A1B: Use the PDF/A1b standard for long term archiving.
-
PDF_A2B: Use the PDF/A2b standard for long term archiving.
-
PDF_A3B: Use the PDF/A3b standard for long term archiving. (As of version 8.4)
-
PDF_1_1: PDF 1.1 output (Compatible with Acrobat 2.0 (1994) and later).
-
PDF_1_2: PDF 1.2 output (Compatible with Acrobat 3.0 (1996) and later).
-
PDF_1_3: PDF 1.3 output (Compatible with Acrobat 4.0 (2000) and later).
-
PDF_1_4: PDF 1.4 output (Compatible with Acrobat 5.0 (2001) and later).
-
PDF_1_5: PDF 1.5 output (Compatible with Acrobat 6.0 (2003) and later).
-
PDF_1_6: PDF 1.6 output (Compatible with Acrobat 7.0 (2005) and later).
-
PDF_1_7: PDF 1.7 output (Compatible with Acrobat 8.0 (2006) and later).
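The PDF/A example above pairs a PDFProfile value with forced post processing. Below is a minimal Python sketch of the same pattern. `pdf_profile_override` is a hypothetical helper. It accepts only the profile values listed above, keeping in mind that PDF_A3B needs version 8.4 or newer. It always sets PostProcessFile to true and copies the other flags from the example.

```python
import xml.etree.ElementTree as ET

# Values from the list above: three PDF/A profiles plus PDF 1.1 to 1.7.
PDF_PROFILES = {"PDF_A1B", "PDF_A2B", "PDF_A3B"} | {f"PDF_1_{n}" for n in range(1, 8)}


def pdf_profile_override(profile: str, embed_all_fonts: bool = True) -> str:
    """Build a PDFProfile override with post processing forced on."""
    if profile not in PDF_PROFILES:
        raise ValueError(f"unsupported PDFProfile value: {profile!r}")
    root = ET.Element("Override")
    settings = ET.SubElement(root, "ConversionSettings")
    ET.SubElement(settings, "PDFProfile").text = profile
    pdf = ET.SubElement(settings, "OutputFormatSpecificSettings",
                        type="OutputFormatSpecificSettings_PDF")
    flags = (("FastWebView", False), ("EmbedAllFonts", embed_all_fonts),
             ("SubsetFonts", False), ("PostProcessFile", True))  # force post processing
    for tag, value in flags:
        ET.SubElement(pdf, tag).text = "true" if value else "false"
    return ET.tostring(root, encoding="unicode")
```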
Change conversion range
Some document formats such as Excel and PowerPoint allow sheets or slides to be hidden. By default we convert all visible sheets / slides, but perhaps you are only interested in the active (selected) sheet. This can be controlled using the following XML:
<Override>
  <ConversionSettings>
    <Range></Range>
  </ConversionSettings>
</Override>
The possible values for Range are:
-
VisibleDocuments: In the case of Excel and PowerPoint, skips any hidden tabs or slides.
-
AllDocuments: Exports all tabs or slides, including hidden ones.
-
ActiveDocuments: In the case of Excel, exports only the selected tabs.
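Range takes exactly one of the three values above. The short Python sketch below uses a hypothetical `range_override` helper. It rejects anything else, including values with the wrong case, before building the override.

```python
import xml.etree.ElementTree as ET

RANGE_VALUES = ("VisibleDocuments", "AllDocuments", "ActiveDocuments")


def range_override(value: str) -> str:
    """Build a Range override, accepting only the documented values."""
    if value not in RANGE_VALUES:
        raise ValueError(f"Range must be one of {RANGE_VALUES}, got {value!r}")
    root = ET.Element("Override")
    settings = ET.SubElement(root, "ConversionSettings")
    ET.SubElement(settings, "Range").text = value
    return ET.tostring(root, encoding="unicode")
```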
Specify Converter Specific Settings
Quite a few of our converters support settings that are specific to that particular file format. For an example see the Revision Tracking example above. Although enhancements are made all the time, at the time of writing the following Converter Specific Settings are available.
-
ConverterSpecificSettings_Cad
-
ConverterSpecificSettings_CommandLineConverter
-
ConverterSpecificSettings_HTML
-
ConverterSpecificSettings_InfoPath
-
ConverterSpecificSettings_MSG
-
ConverterSpecificSettings_Presentations
-
ConverterSpecificSettings_WordProcessing
For more detail see the Class Diagrams in the Developer Guide.
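Because every ConverterSpecificSettings element needs a `type` attribute that names one of these classes, a quick check of a hand-written override can catch typos early. Below is a hypothetical Python sketch. `check_converter_types` compares against the list above, which reflects the time of writing, so newer versions may accept types it does not know about.

```python
import xml.etree.ElementTree as ET
from typing import List

# Types listed above at the time of writing; newer versions may add more.
KNOWN_CONVERTER_TYPES = {
    "ConverterSpecificSettings_Cad",
    "ConverterSpecificSettings_CommandLineConverter",
    "ConverterSpecificSettings_HTML",
    "ConverterSpecificSettings_InfoPath",
    "ConverterSpecificSettings_MSG",
    "ConverterSpecificSettings_Presentations",
    "ConverterSpecificSettings_WordProcessing",
}


def check_converter_types(override_xml: str) -> List[str]:
    """Report ConverterSpecificSettings elements with a missing or unknown 'type'."""
    problems = []
    for element in ET.fromstring(override_xml).iter("ConverterSpecificSettings"):
        type_name = element.get("type")
        if type_name is None:
            problems.append("ConverterSpecificSettings: missing 'type' attribute")
        elif type_name not in KNOWN_CONVERTER_TYPES:
            problems.append(f"ConverterSpecificSettings: unknown type {type_name!r}")
    return problems
```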
Specify Viewer Preferences
As of version 7.0 the PDF Converter allows PDF Viewer Preferences to be specified, e.g. centre the PDF reader window, hide the menu and toolbars, display the bookmarks pane, etc. For full details see this post.
An example that shows how to set these preferences is provided below. Naturally it can be combined with the other ‘overrides’ discussed previously.
<Override>
  <ConversionSettings>
    <OutputFormatSpecificSettings type="OutputFormatSpecificSettings_PDF">
      <FastWebView>false</FastWebView>
      <EmbedAllFonts>false</EmbedAllFonts>
      <SubsetFonts>false</SubsetFonts>
      <PostProcessFile>false</PostProcessFile>
      <ViewerPreferences>
        <CenterWindow>true</CenterWindow>
        <NavigationPane>Bookmarks</NavigationPane>
        <DisplayTitle>false</DisplayTitle>
        <FitWindow>false</FitWindow>
        <HideMenubar>false</HideMenubar>
        <HideToolbar>false</HideToolbar>
        <HideWindowUI>false</HideWindowUI>
        <PageLayout>SinglePage</PageLayout>
        <HideEmptyNavigationPane>false</HideEmptyNavigationPane>
        <PageScalingMode>Default</PageScalingMode>
        <FullScreen>false</FullScreen>
      </ViewerPreferences>
    </OutputFormatSpecificSettings>
  </ConversionSettings>
</Override>
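If you only need a few of these preferences, generating the block can be less error-prone than editing the full example. The Python sketch below uses `viewer_preferences_override`, a hypothetical helper. It uses keyword names directly as element names and turns Python booleans into lowercase text. It keeps the four post-processing flags at false, as in the example above, and writes only the preferences you pass in.

```python
import xml.etree.ElementTree as ET
from typing import Union


def viewer_preferences_override(**preferences: Union[bool, str]) -> str:
    """Build an override that sets PDF Viewer Preferences.

    Keyword names are used as element names (e.g. CenterWindow=True,
    NavigationPane="Bookmarks"); Python booleans become lowercase text.
    """
    root = ET.Element("Override")
    settings = ET.SubElement(root, "ConversionSettings")
    pdf = ET.SubElement(settings, "OutputFormatSpecificSettings",
                        type="OutputFormatSpecificSettings_PDF")
    # Same post-processing flags as the example above.
    for tag in ("FastWebView", "EmbedAllFonts", "SubsetFonts", "PostProcessFile"):
        ET.SubElement(pdf, tag).text = "false"
    viewer = ET.SubElement(pdf, "ViewerPreferences")
    for name, value in preferences.items():
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        ET.SubElement(viewer, name).text = text
    return ET.tostring(root, encoding="unicode")
```

For example, `viewer_preferences_override(CenterWindow=True, NavigationPane="Bookmarks")` produces the two preferences shown in the full list of overridable values.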
Control Font Embedding, PDF Version, PDF Fast Web View
As of version 7.0 a number of facilities have been added to the OCR and PDF/A Archiving for SharePoint add-on license. These features allow fonts to be embedded / stripped, Fast Web View (Linearisation) to be enabled and the PDF Version to be changed (anything between PDF 1.1 and 1.7, including PDF/A1b and A2b). For details see this post.
An example that shows how to specify these new features using the XML based override syntax can be found below. Please make sure that PostProcessFile is set to true in order to pick up these settings.
<Override>
  <ConversionSettings>
    <OutputFormatSpecificSettings type="OutputFormatSpecificSettings_PDF">
      <FastWebView>true</FastWebView>
      <EmbedAllFonts>true</EmbedAllFonts>
      <SubsetFonts>false</SubsetFonts>
      <PostProcessFile>true</PostProcessFile>
    </OutputFormatSpecificSettings>
  </ConversionSettings>
</Override>
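Forgetting PostProcessFile, or writing it as `True` instead of `true`, means the font and Fast Web View settings are not picked up. Below is a minimal Python sketch of a pre-flight check. `post_processing_enabled` is a hypothetical helper, not part of the product. It reports whether any PDF output settings block in an override has PostProcessFile set to exactly `true`.

```python
import xml.etree.ElementTree as ET


def post_processing_enabled(override_xml: str) -> bool:
    """Return True if a PDF output settings block has PostProcessFile set to true.

    The comparison is exact because values are case sensitive: "True" or
    "TRUE" are not treated as true here.
    """
    root = ET.fromstring(override_xml)
    for pdf in root.iter("OutputFormatSpecificSettings"):
        flag = pdf.find("PostProcessFile")
        if flag is not None and (flag.text or "").strip() == "true":
            return True
    return False
```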
Specifying which InfoPath views to convert
As mentioned in the MS-Word revision tracking example above, it is possible to specify Converter Specific Settings. In this example we’ll show how to use this facility to specify which InfoPath views to convert. The same can be achieved at design time or by setting workflow variables, but the example provided below allows view names to be specified at run-time and only requires a single workflow step.
Let’s have a look at the code needed to do the same via our Web Services based object model (the sample code is C#; other languages follow the same structure).
ConverterSpecificSettings_InfoPath csc = new ConverterSpecificSettings_InfoPath();
csc.ConversionViews = new InfoPathView[2];
csc.ConversionViews[0] = new InfoPathView();
csc.ConversionViews[0].Name = "NAME-OF-VIEW1";
csc.ConversionViews[1] = new InfoPathView();
csc.ConversionViews[1].Name = "NAME-OF-VIEW2";
// ** As we are overriding settings, we need to override ALL of them in this object
csc.ConvertAttachments = true;
csc.AutoTrustForms = false;
csc.ProcessFullTrustForms = true;
csc.StripDataObjects = true;
csc.StripDotNETCode = true;
// conversionSettings is your existing ConversionSettings instance
conversionSettings.ConverterSpecificSettings = csc;
What is important to realise is that when specifying ConverterSpecificSettings, ALL values must be specified, as fields that are not specified will be initialised to their default value. The default value for a boolean field is ‘false’, which in this example would mean that attachments are not converted and InfoPath Data Objects are not stripped. Change these values in line with your needs, but unless you are 100% sure what each value means, keep them as specified in this example.
When serialising the code into XML we get the following:
<Override>
  <ConversionSettings>
    <ConverterSpecificSettings type="ConverterSpecificSettings_InfoPath">
      <ConversionViews>
        <InfoPathView>
          <Name>NAME-OF-VIEW1</Name>
        </InfoPathView>
        <InfoPathView>
          <Name>NAME-OF-VIEW2</Name>
        </InfoPathView>
      </ConversionViews>
      <!-- ** As we are overriding settings, we need to override ALL of them in this object -->
      <ConvertAttachments>true</ConvertAttachments>
      <AutoTrustForms>false</AutoTrustForms>
      <ProcessFullTrustForms>true</ProcessFullTrustForms>
      <StripDataObjects>true</StripDataObjects>
      <StripDotNETCode>true</StripDotNETCode>
    </ConverterSpecificSettings>
  </ConversionSettings>
</Override>
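Because any field left out of ConverterSpecificSettings silently falls back to its default (`false` for booleans), it is worth checking that an InfoPath override sets all of them. Below is a hypothetical Python sketch. `missing_infopath_fields` checks only the five boolean fields used in this example. The real class may have more fields, so treat the list as an assumption and extend it from the Developer Guide.

```python
import xml.etree.ElementTree as ET
from typing import List

# The five boolean fields set in the example above (assumed complete for this sketch).
REQUIRED_INFOPATH_FIELDS = ("ConvertAttachments", "AutoTrustForms",
                            "ProcessFullTrustForms", "StripDataObjects",
                            "StripDotNETCode")


def missing_infopath_fields(override_xml: str) -> List[str]:
    """List the InfoPath fields that would silently fall back to their defaults."""
    root = ET.fromstring(override_xml)
    infopath = root.find("ConversionSettings/ConverterSpecificSettings"
                         "[@type='ConverterSpecificSettings_InfoPath']")
    if infopath is None:
        return list(REQUIRED_INFOPATH_FIELDS)
    return [name for name in REQUIRED_INFOPATH_FIELDS if infopath.find(name) is None]
```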
Carrying out OCR processing during conversion {% #OCR %}
As of version 7.1 of the Muhimbi PDF Converter it is possible to carry out OCR (Optical Character Recognition) during the conversion process. This can be used to convert an image into a searchable PDF or make bitmap based PDFs (scanned PDFs) searchable and indexable.
Starting with version 7.2, the PDF Converter for SharePoint includes dedicated OCR workflow actions (Nintex Workflow, SharePoint Designer). However, if you are running an older version of the software, or you have a good reason to carry out OCR using the Convert Document workflow activity, then specify the OCR settings using the XML syntax described below.
Please note that in order to use this facility you need to run at least version 7.1 of our software. Image files can be converted into searchable PDFs using the syntax below. When converting image-based PDFs into searchable PDFs the syntax is the same, but the SkipPDFFiles setting needs to be disabled and the PDF Pass-through converter needs to be enabled in our Central Admin screen. For use in a production environment, the OCR facilities require an OCR and PDF/A Archiving for SharePoint add-on.
<Override>
  <ConversionSettings>
    <OCRSettings>
      <Language>English</Language>
      <Performance>Slow</Performance>
      <Paginate>false</Paginate>
      <WhiteList></WhiteList>
      <BlackList></BlackList>
      <Regions>
        <OCRRegion>
          <X>100</X><Y>100</Y><Width>200</Width><Height>50</Height>
          <StartPage>0</StartPage><EndPage>0</EndPage>
          <PageInterval>1</PageInterval><PageRange></PageRange>
        </OCRRegion>
      </Regions>
    </OCRSettings>
  </ConversionSettings>
</Override>
The Regions element can be removed unless you wish to OCR only a certain part of the page or certain pages. When specifying OCRRegions the StartPage, EndPage, PageInterval and PageRange elements are optional.
The unit of measure for all coordinates is pt (Points, 1/72 inch).
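Region sizes are often measured on paper in inches or millimetres, so they need converting to points (1 pt = 1/72 inch, and 1 inch = 25.4 mm) before they go into an OCRRegion. The Python sketch below does that conversion. `inches_to_points`, `mm_to_points` and `ocr_region` are hypothetical helpers. Rounding to whole points is an assumption based on the integer values in the example, and the sketch makes no assumption about where the coordinate origin lies.

```python
import xml.etree.ElementTree as ET

POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def inches_to_points(inches: float) -> float:
    return inches * POINTS_PER_INCH


def mm_to_points(mm: float) -> float:
    return mm / MM_PER_INCH * POINTS_PER_INCH


def ocr_region(x: float, y: float, width: float, height: float) -> ET.Element:
    """Build an OCRRegion element from coordinates already expressed in points.

    Values are rounded to whole points, matching the integer values used
    in the example above (an assumption; fractional values may also work).
    """
    region = ET.Element("OCRRegion")
    for tag, value in (("X", x), ("Y", y), ("Width", width), ("Height", height)):
        ET.SubElement(region, tag).text = str(round(value))
    return region
```

For instance, a region 1 inch from the left and top, 2.5 inches wide and 10 mm high becomes X=72, Y=72, Width=180, Height=28.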
Generate a table of contents
As of version 7.3 the Muhimbi PDF Converter comes with a brilliant new facility to generate tables of contents for individual conversions as well as merge operations. Although it is a little ‘technical’, please read this blog post; it shows how the underlying system works and contains a useful XSL-based sample template.
Let’s have a look at a typical example.
<Override>
  <ConversionSettings>
    <TOCSettings>
      <MinimumEntries>0</MinimumEntries>
      <Bookmark>Table Of Contents</Bookmark>
      <Location>Front</Location>
      <Properties>
        <NameValuePair>
          <Name>title</Name>
          <Value>Custom Title</Value>
        </NameValuePair>
      </Properties>
      <Template>https://YourServer/SomeSiteCollection/SomeDocLib/TOC.xsl</Template>
    </TOCSettings>
  </ConversionSettings>
</Override>
This XML is similar to the C# sample code in the aforementioned blog post, which also provides details about the various properties such as MinimumEntries and Location. The recommended approach is to upload the template included in that post to a SharePoint document library and specify its location in the Template element.
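The TOCSettings XML can also be generated when template properties such as `title` vary per document. Below is a minimal Python sketch. `toc_override` is a hypothetical helper that reproduces the element order of the example. It writes each property as a NameValuePair and leaves out the Properties element when there are none. The default values are copied from the example and are not necessarily the product's defaults.

```python
import xml.etree.ElementTree as ET
from typing import Mapping


def toc_override(template_url: str, properties: Mapping[str, str],
                 bookmark: str = "Table Of Contents", location: str = "Front",
                 minimum_entries: int = 0) -> str:
    """Build a TOCSettings override in the same shape as the example above."""
    root = ET.Element("Override")
    settings = ET.SubElement(root, "ConversionSettings")
    toc = ET.SubElement(settings, "TOCSettings")
    ET.SubElement(toc, "MinimumEntries").text = str(minimum_entries)
    ET.SubElement(toc, "Bookmark").text = bookmark
    ET.SubElement(toc, "Location").text = location
    if properties:
        # Each template property becomes a NameValuePair element.
        props = ET.SubElement(toc, "Properties")
        for name, value in properties.items():
            pair = ET.SubElement(props, "NameValuePair")
            ET.SubElement(pair, "Name").text = name
            ET.SubElement(pair, "Value").text = value
    ET.SubElement(toc, "Template").text = template_url
    return ET.tostring(root, encoding="unicode")
```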
If you wish to merge a number of documents AND add a table of contents for the entire - merged - PDF then this can be achieved as follows:
-
(Convert) and merge multiple files into a single PDF using the Merge Documents into PDF workflow activity.
-
Process the generated PDF by passing it into the Convert Document workflow activity and specifying TOCSettings using the XML override syntax.
Override Email Conversion settings
Muhimbi’s email converter is a very flexible module that allows a lot of settings to be tweaked. Don’t be intimidated by the sheer number and names of the parameters; you can find them all in the Developer Guide (search for ConverterSpecificSettings_MSG).
The default settings are provided below. Feel free to change them.
<Override>
  <ConversionSettings>
    <ConverterSpecificSettings type="ConverterSpecificSettings_MSG">
      <ConvertAttachments>true</ConvertAttachments>
      <PaperSize>Letter</PaperSize>
      <HTMLScaleMode>FitWidthScaleImagesOnly</HTMLScaleMode>
      <PlainTextLineBreaks>RemoveExtra</PlainTextLineBreaks>
      <BestBodyMode>Default</BestBodyMode>
      <EmailAddressDisplayMode>Name</EmailAddressDisplayMode>
      <FromEmailAddressDisplayMode>NameAndAddress</FromEmailAddressDisplayMode>
      <AttachmentMergeMode>Merge</AttachmentMergeMode>
      <DisplayAttachmentSummary>true</DisplayAttachmentSummary>
      <BreakOnUnsupportedAttachment>false</BreakOnUnsupportedAttachment>
      <BreakOnUnsupportedEmbeddedObject>false</BreakOnUnsupportedEmbeddedObject>
      <EmbeddedObjectDisplayMode>InlineFitWidth</EmbeddedObjectDisplayMode>
      <EmbeddedObjectIconDisplayMode>IconOnly</EmbeddedObjectIconDisplayMode>
      <EmbeddedObjectScalePercentage>3.33</EmbeddedObjectScalePercentage>
      <SentDateMissingDisplayMode></SentDateMissingDisplayMode>
    </ConverterSpecificSettings>
  </ConversionSettings>
</Override>
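Since this is a ConverterSpecificSettings block, the safest approach is usually to start from the complete set of defaults above and change only what you need. That way no field silently falls back to `false`. The Python sketch below uses `change_msg_setting`, a hypothetical helper. It takes such a defaults string, changes or adds one setting, and returns the updated XML.

```python
import xml.etree.ElementTree as ET


def change_msg_setting(override_xml: str, name: str, value: str) -> str:
    """Return a copy of an MSG override with one setting changed or added."""
    root = ET.fromstring(override_xml)
    msg = root.find("ConversionSettings/ConverterSpecificSettings"
                    "[@type='ConverterSpecificSettings_MSG']")
    if msg is None:
        raise ValueError("no ConverterSpecificSettings_MSG element found")
    element = msg.find(name)
    if element is None:
        element = ET.SubElement(msg, name)
    # Values are case sensitive; booleans must be passed as "true" / "false".
    element.text = value
    return ET.tostring(root, encoding="unicode")
```

For example, passing the defaults above with `("ConvertAttachments", "false")` keeps every other default intact.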
The ‘overriding’ facility described in this post works with SharePoint Designer Workflows, Nintex Workflows, Microsoft Flow (Power Automate) and K2 Workflows.
Hyper-compression
As of version 10.3, it is possible to convert documents into PDF and apply compression at the same time. When any of the supported document types is converted into PDF, you can use the override XML below to produce compressed output. If your document is already in PDF format, simply carry out a PDF to PDF conversion and apply the compression override at the same time.
Sample override XML is below:
<Override>
  <ConversionSettings>
    <CompressionSettings>
      <RemoveAnnotations>Default</RemoveAnnotations>
      <RemoveBlankPages>Default</RemoveBlankPages>
      <RemoveBookmarks>Default</RemoveBookmarks>
      <RemoveEmbeddedFiles>Default</RemoveEmbeddedFiles>
      <RemoveFormFields>Default</RemoveFormFields>
      <RemoveJavaScript>Default</RemoveJavaScript>
      <RemoveMetadata>Default</RemoveMetadata>
      <RemovePageThumbnails>Default</RemovePageThumbnails>
      <PackFonts>Default</PackFonts>
      <PackDocument>Default</PackDocument>
      <RecompressImages>Default</RecompressImages>
      <EnableMRC>Default</EnableMRC>
      <PreserveSmoothing>Default</PreserveSmoothing>
      <DownscaleImages>Default</DownscaleImages>
      <EnableColorDetection>Default</EnableColorDetection>
      <EnableCharRepair>Default</EnableCharRepair>
      <EnableJPEG2000>Default</EnableJPEG2000>
      <EnableJBIG2>Default</EnableJBIG2>
      <JBIG2PMSThreshold>85</JBIG2PMSThreshold>
      <DownscaleResolution>150</DownscaleResolution>
      <DownscaleResolutionMRC>150</DownscaleResolutionMRC>
      <ImageQuality>ImageQualityDefault</ImageQuality>
    </CompressionSettings>
  </ConversionSettings>
</Override>
Feel free to change any of the values. Use Default, True or False (a tri-state boolean; values are case sensitive) for the following properties: RemoveAnnotations, RemoveBlankPages, RemoveBookmarks, RemoveEmbeddedFiles, RemoveFormFields, RemoveJavaScript, RemoveMetadata, RemovePageThumbnails, PackFonts, PackDocument, RecompressImages, EnableMRC, PreserveSmoothing, DownscaleImages, EnableColorDetection, EnableCharRepair, EnableJPEG2000, EnableJBIG2.
Use a value in the range 0 to 100 for JBIG2PMSThreshold. Use sensible DPI values for DownscaleResolution and DownscaleResolutionMRC.
For ImageQuality use one of the following values:
-
ImageQualityDefault
-
ImageQualityVeryLow
-
ImageQualityLow
-
ImageQualityMedium
-
ImageQualityHigh
-
ImageQualityVeryHigh
-
ImageQualityVeryVeryHigh
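Note that the tri-state values here are capitalised (True / False), unlike the lowercase booleans used elsewhere. That makes the compression block an easy place for typos. Below is a minimal Python sketch of a validator. `check_compression` is a hypothetical helper that applies the rules above: tri-state values, a threshold from 0 to 100, and one of the listed ImageQuality values. "Sensible DPI" is reduced here to "a positive whole number", which is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List

TRI_STATE_FIELDS = {
    "RemoveAnnotations", "RemoveBlankPages", "RemoveBookmarks", "RemoveEmbeddedFiles",
    "RemoveFormFields", "RemoveJavaScript", "RemoveMetadata", "RemovePageThumbnails",
    "PackFonts", "PackDocument", "RecompressImages", "EnableMRC", "PreserveSmoothing",
    "DownscaleImages", "EnableColorDetection", "EnableCharRepair", "EnableJPEG2000",
    "EnableJBIG2",
}
TRI_STATE_VALUES = {"Default", "True", "False"}  # case sensitive
IMAGE_QUALITIES = {
    "ImageQualityDefault", "ImageQualityVeryLow", "ImageQualityLow", "ImageQualityMedium",
    "ImageQualityHigh", "ImageQualityVeryHigh", "ImageQualityVeryVeryHigh",
}
DPI_FIELDS = {"DownscaleResolution", "DownscaleResolutionMRC"}


def check_compression(override_xml: str) -> List[str]:
    """Return a list of problems found in a CompressionSettings override."""
    settings = ET.fromstring(override_xml).find("ConversionSettings/CompressionSettings")
    if settings is None:
        return ["no CompressionSettings element found"]
    problems = []
    for child in settings:
        text = (child.text or "").strip()
        if child.tag in TRI_STATE_FIELDS and text not in TRI_STATE_VALUES:
            problems.append(f"{child.tag}: expected Default, True or False, got {text!r}")
        elif child.tag == "JBIG2PMSThreshold" and not (text.isdigit() and int(text) <= 100):
            problems.append(f"{child.tag}: expected 0 to 100, got {text!r}")
        elif child.tag in DPI_FIELDS and not (text.isdigit() and int(text) > 0):
            problems.append(f"{child.tag}: expected a positive DPI value, got {text!r}")
        elif child.tag == "ImageQuality" and text not in IMAGE_QUALITIES:
            problems.append(f"{child.tag}: unknown value {text!r}")
    return problems
```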
Please note that you need a PDF Converter Professional add-on license in addition to a valid PDF Converter Services or PDF Converter for SharePoint License to use this functionality.
Clavin is a Microsoft Business Applications MVP who supports 1,000+ high-level enterprise customers with challenges related to PDF conversion in combination with SharePoint on-premises, Office 365, Azure, Nintex, K2 and Power Platform, mostly with no-code solutions.