Converting PDF documents to PDF/A1b using the Muhimbi PDF Converter Web Service
Muhimbi’s range of PDF Conversion products, including the PDF Converter for SharePoint and the PDF Converter Services, can now post process any converted document for output in PDF/A format. One obvious use for this brilliant new functionality is to convert regular PDF files to PDF/A format.
In this post we’ll provide a simple .NET sample that invokes our Web Services interface to carry out the conversion from PDF to PDF/A1b. You can apply the same changes to the Java sample to make it do the same in that language. The code is nearly identical to the sample that converts and watermarks a simple MS-Word file, with the following exceptions:
- openOptions.FileExtension is set to pdf.
- conversionSettings.PDFProfile is set to PDFProfile.PDF_A1B.
- conversionSettings.OutputFormatSpecificSettings is set to an instance of OutputFormatSpecificSettings_PDF with the PostProcessFile property set to True.
- The client.ProcessChanges() method is invoked rather than client.Convert().
- All references to watermarks have been removed as they are not part of this sample.
Some minor clean-up has been carried out as well to make the code even shorter. After running the example the resulting file validates perfectly according to Acrobat X Pro.
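If Acrobat is not at hand, a quick sanity check can at least confirm that the output identifies itself as PDF/A1b. The sketch below is not part of the Muhimbi sample, and the class and method names are hypothetical. It relies on two requirements of the PDF/A-1 standard: the file must start with the %PDF- header, and it must carry XMP metadata containing the pdfaid:part and pdfaid:conformance entries. It assumes the XMP packet is stored as UTF-8 or ASCII text, and it only looks for these markers, so it is no substitute for a real validator.

```csharp
using System;
using System.Text;

public static class PdfAQuickCheck
{
    // Returns true when the bytes start with a PDF header and contain the
    // XMP PDF/A identification entries for part 1, conformance level B.
    // Sanity check only: it does not validate fonts, color spaces, etc.
    public static bool LooksLikePdfA1b(byte[] pdf)
    {
        if (pdf == null || pdf.Length < 5)
            return false;

        // Latin-1 maps every byte to exactly one char, so the binary
        // content can be searched as text without decoding errors.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdf);

        if (!text.StartsWith("%PDF-", StringComparison.Ordinal))
            return false;

        return HasXmpValue(text, "pdfaid:part", "1")
            && HasXmpValue(text, "pdfaid:conformance", "B");
    }

    // XMP allows a property to be written as an element or as an attribute.
    private static bool HasXmpValue(string text, string name, string value)
    {
        string asElement = "<" + name + ">" + value + "</" + name + ">";
        string asAttributeDouble = name + "=\"" + value + "\"";
        string asAttributeSingle = name + "='" + value + "'";

        return text.Contains(asElement)
            || text.Contains(asAttributeDouble)
            || text.Contains(asAttributeSingle);
    }
}
```

To use it, pass the converted file's bytes, for example PdfAQuickCheck.LooksLikePdfA1b(File.ReadAllBytes(path)), and treat a false result as a prompt to inspect the file in a full validator.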
Sample Code
Listed below is sample code to convert PDF to PDF/A. You can either copy the code from this blog post, download the Visual Studio Project or open the project from the Sample Code folder in the Windows Start Menu.
The sample code expects the path of the PDF file on the command line. If the path is omitted then the first PDF file found in the current directory will be used.
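The file selection rule described above can be isolated in a small helper, which makes it easy to test on its own. This is a sketch with a hypothetical class name, not code from the sample. It differs from the sample in one respect: it sorts the candidates, because Directory.GetFiles does not guarantee the order of its results, so "the first PDF file" could otherwise vary between runs.

```csharp
using System;
using System.IO;

public static class SourceFileResolver
{
    // Returns args[0] when a path was given on the command line; otherwise
    // the alphabetically first *.pdf file in 'directory', or null when the
    // directory contains no PDF files.
    public static string Resolve(string[] args, string directory)
    {
        if (args != null && args.Length > 0)
            return args[0];

        string[] candidates = Directory.GetFiles(directory, "*.pdf");
        if (candidates.Length == 0)
            return null;

        // Sort so that the pick is repeatable from run to run.
        Array.Sort(candidates, StringComparer.OrdinalIgnoreCase);
        return candidates[0];
    }
}
```

A caller would pass the command line arguments and Directory.GetCurrentDirectory(), then show the usage message when the result is null.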
- Download and install the Muhimbi PDF Converter Services or PDF Converter for SharePoint.
- If you need help with installation, check out the Administration guide for PDF Converter Services or the Administration guide for PDF Converter for SharePoint On-Premises.
- Create a new Visual Studio C# Console application named PDFA_Conversion.
- Add a Service Reference to the following URL and specify ConversionService as the namespace:
  https://localhost:41734/Muhimbi.DocumentConverter.WebService/?wsdl
- Paste the following code into Program.cs.
    using System;
    using System.Diagnostics;
    using System.IO;
    using System.ServiceModel;
    // ** Namespace of the Service Reference: <project namespace>.ConversionService
    using PDFA_Conversion.ConversionService;

    namespace PDFA_Conversion
    {
        class Program
        {
            // ** The URL where the Web Service is located. Amend host name if needed.
            static string SERVICE_URL = "https://localhost:41734/Muhimbi.DocumentConverter.WebService/";

            static void Main(string[] args)
            {
                DocumentConverterServiceClient client = null;

                try
                {
                    // ** Determine the source file and read it into a byte array.
                    string sourceFileName = null;
                    if (args.Length == 0)
                    {
                        // ** If nothing is specified then read the first PDF file from the folder.
                        string[] sourceFiles = Directory.GetFiles(Directory.GetCurrentDirectory(), "*.pdf");
                        if (sourceFiles.Length > 0)
                            sourceFileName = sourceFiles[0];
                        else
                        {
                            Console.WriteLine("Please specify a document to convert to PDF/A.");
                            Console.ReadKey();
                            return;
                        }
                    }
                    else
                        sourceFileName = args[0];

                    byte[] sourceFile = File.ReadAllBytes(sourceFileName);

                    // ** Open the service and configure the bindings
                    client = OpenService(SERVICE_URL);

                    // ** Set the absolute minimum open options
                    OpenOptions openOptions = new OpenOptions();
                    openOptions.OriginalFileName = Path.GetFileName(sourceFileName);
                    openOptions.FileExtension = "pdf";

                    // ** Set the absolute minimum conversion settings.
                    ConversionSettings conversionSettings = new ConversionSettings();
                    conversionSettings.PDFProfile = PDFProfile.PDF_A1B;

                    // ** Specify output settings as we want to force post processing of files.
                    OutputFormatSpecificSettings_PDF osf = new OutputFormatSpecificSettings_PDF();
                    osf.PostProcessFile = true;
                    // ** We need to specify ALL values of an object, so use these for PDF/A
                    osf.FastWebView = false;
                    osf.EmbedAllFonts = true;
                    osf.SubsetFonts = false;
                    conversionSettings.OutputFormatSpecificSettings = osf;

                    // ** Carry out the conversion.
                    Console.WriteLine("Converting file " + sourceFileName + " to PDF/A.");
                    byte[] convFile = client.ProcessChanges(sourceFile, openOptions, conversionSettings);

                    // ** Write the converted file back to the file system using the same name.
                    // ** Note: this writes to the current directory, so a source file located
                    // ** there is overwritten.
                    string destinationFileName = Path.GetFileName(sourceFileName);
                    using (FileStream fs = File.Create(destinationFileName))
                    {
                        fs.Write(convFile, 0, convFile.Length);
                        fs.Close();
                    }

                    Console.WriteLine("File converted to " + destinationFileName);

                    // ** Open the generated PDF/A file in a PDF Reader
                    Console.WriteLine("Launching file in PDF Reader");
                    Process.Start(destinationFileName);
                }
                catch (FaultException<WebServiceFaultException> ex)
                {
                    Console.WriteLine("FaultException occurred: ExceptionType: " +
                                      ex.Detail.ExceptionType.ToString());
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.ToString());
                }
                finally
                {
                    CloseService(client);
                }

                Console.ReadKey();
            }

            /// <summary>
            /// Configure the Bindings, endpoints and open the service using the specified address.
            /// </summary>
            /// <returns>An instance of the Web Service.</returns>
            public static DocumentConverterServiceClient OpenService(string address)
            {
                DocumentConverterServiceClient client = null;

                try
                {
                    Uri serviceUri = new Uri(address);
                    BasicHttpBinding binding = new BasicHttpBinding();
                    // ** Use standard Windows Security. TransportCredentialOnly only accepts
                    // ** http addresses, so use Transport security when the address is https.
                    binding.Security.Mode = serviceUri.Scheme == Uri.UriSchemeHttps
                        ? BasicHttpSecurityMode.Transport
                        : BasicHttpSecurityMode.TransportCredentialOnly;
                    binding.Security.Transport.ClientCredentialType = HttpClientCredentialType.Windows;
                    // ** Increase the client Timeout to deal with (very) long running requests.
                    binding.SendTimeout = TimeSpan.FromMinutes(30);
                    binding.ReceiveTimeout = TimeSpan.FromMinutes(30);
                    // ** Set the maximum document size to 50MB
                    binding.MaxReceivedMessageSize = 50 * 1024 * 1024;
                    binding.ReaderQuotas.MaxArrayLength = 50 * 1024 * 1024;
                    binding.ReaderQuotas.MaxStringContentLength = 50 * 1024 * 1024;

                    // ** Specify an identity (any identity) in order to get it past .net3.5 sp1
                    EndpointIdentity epi = EndpointIdentity.CreateUpnIdentity("unknown");
                    EndpointAddress epa = new EndpointAddress(serviceUri, epi);

                    client = new DocumentConverterServiceClient(binding, epa);
                    client.Open();

                    return client;
                }
                catch (Exception)
                {
                    CloseService(client);
                    throw;
                }
            }

            /// <summary>
            /// Check if the client is open and then close it.
            /// </summary>
            /// <param name="client">The client to close</param>
            public static void CloseService(DocumentConverterServiceClient client)
            {
                if (client != null && client.State == CommunicationState.Opened)
                    client.Close();
            }
        }
    }
- Make sure the output folder contains a PDF file.
- Compile and execute the application. The converted PDF/A file will automatically be opened in your system’s PDF reader.
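Note that the sample writes the result to the current directory under the source file's name, so a source file that lives in that directory is replaced by its PDF/A version. If you prefer to keep the original, one option is to derive a separate output name. The helper below is a sketch only: the class name and the "_pdfa" suffix are arbitrary choices for illustration, and unlike the sample it places the output next to the source file rather than in the current directory.

```csharp
using System;
using System.IO;

public static class OutputNaming
{
    // Builds "<name>_pdfa.pdf" in the same folder as the source file,
    // so the original document is left untouched.
    public static string GetPdfAFileName(string sourceFileName)
    {
        string directory = Path.GetDirectoryName(sourceFileName) ?? string.Empty;
        string baseName = Path.GetFileNameWithoutExtension(sourceFileName);

        return Path.Combine(directory, baseName + "_pdfa.pdf");
    }
}
```

In the sample this would replace the assignment to destinationFileName, for example destinationFileName = OutputNaming.GetPdfAFileName(sourceFileName).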
As all this functionality is exposed via a Web Services interface, it works equally well from Java and other Web Services-enabled environments. Please note that you need the OCR and PDF/A Archiving for SharePoint add-on license in addition to a valid PDF Converter for SharePoint or PDF Converter Services license in order to use this functionality.
This code is merely an example of what is possible. Feel free to adapt it to your own needs. The possibilities are endless.
Clavin is a Microsoft Business Applications MVP who supports 1,000+ high-level enterprise customers with challenges related to PDF conversion in combination with SharePoint on-premises, Office 365, Azure, Nintex, K2, and Power Platform, mostly in no-code solutions.