To support the new PDF Splitting facility in our PDF Converter for SharePoint, we have added the ability to split a single file into multiple files to our core PDF Conversion engine, which our SharePoint product shares with our generic Java / .NET oriented PDF Converter Services.
In this post we’ll describe in detail how to invoke this new splitting facility from your own code. This demo uses C# and .NET, but the web services based interface is identical when used from Java (see this generic PDF Conversion sample).
This post is part of the following series related to manipulating PDF files using web services.
-
Converting Office files to PDF Format using a Web Services based interface (C# / .NET).
-
Converting Office files to PDF Format using a Web Services based interface (Java).
-
Invoking the PDF Converter Web Service from Visual Studio 2005 using VB.net
-
Using the awesome new watermarking features of the Muhimbi PDF Converter Services (C# / .NET).
-
Using the PDF Watermarking features from Java based environments.
-
Converting and merging multiple files using the PDF Converter Services and .NET / C#.
Key Features
The key features of the new splitting facility are as follows:
-
Split a single PDF file into one or more individual PDF files.
-
Split based on number of pages or bookmarks.
-
Automatically generate numbered file names using .NET’s formatting syntax, e.g. ‘split-{0:D3}.pdf’ will use 3 digits for the sequential numbers, starting at ‘split-001.pdf’. When splitting by bookmark, an optional {1} parameter can be inserted in the file name to include the name of the bookmark as well.
-
Can be combined with other actions, e.g. convert & merge.
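The numbering in the file name template follows .NET’s composite formatting rules, so the effect of a template can be previewed locally with string.Format. The sketch below assumes the conversion service expands the template in the same way. The class name TemplatePreview, the helper PreviewName and the sample bookmark name are made up for illustration and are not part of the product.

```csharp
using System;

static class TemplatePreview
{
    // Hypothetical helper: expands a template the way string.Format does.
    // {0} is the sequence number, {1} the bookmark name. A template that
    // does not reference {1} simply ignores the second argument.
    public static string PreviewName(string template, int sequence, string bookmark)
    {
        return string.Format(template, sequence, bookmark);
    }

    static void Main()
    {
        // D3 pads the sequence number with zeros to three digits.
        Console.WriteLine(PreviewName("split-{0:D3}.pdf", 1, null));            // split-001.pdf
        Console.WriteLine(PreviewName("split-{0:D3}-{1}.pdf", 2, "Chapter 2")); // split-002-Chapter 2.pdf
    }
}
```

Note that the format specifier is D3 (letter first, then the digit count); writing it the other way round does not produce zero padded numbers.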
A note about splitting based on bookmark levels: PDFs store bookmarks at the page level, so it is not clear on what part of the page a heading starts or ends. As a result an extra page will always be exported for each file split based on bookmark levels.
For example let’s assume the following document:
-
Page 1: Contains Chapter 1 and sections 1.1 and 1.2.
-
Page 2: Contains the last paragraph of 1.2 and all of chapter 2.
-
Page 3: Contains Chapter 3.
When splitting this document based on bookmarks using ‘1’ as the batch size then the following files will be created:
-
File 1: Contains pages 1 and 2 as expected.
-
File 2: Contains pages 2 and 3 even though Chapter 2 is only really part of page 2. This is because there is no way to know if Chapter 2 runs over into page 3 or not.
-
File 3: Contains Chapter 3.
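The page ranges in this example can be reproduced with a few lines of code. This is only an illustration of the rule described above and not the engine’s actual implementation: it assumes each file runs from the page on which its bookmark starts up to and including the page on which the next bookmark starts, and that the last file runs to the end of the document. The names BookmarkSplitSketch and PageRanges are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class BookmarkSplitSketch
{
    // Returns inclusive [first, last] page ranges, one per bookmark.
    // Each range ends on the page where the next bookmark starts, because
    // the position of that heading within the page is unknown.
    public static List<int[]> PageRanges(int[] bookmarkStartPages, int pageCount)
    {
        var ranges = new List<int[]>();
        for (int i = 0; i < bookmarkStartPages.Length; i++)
        {
            int first = bookmarkStartPages[i];
            int last = i + 1 < bookmarkStartPages.Length
                ? bookmarkStartPages[i + 1]   // include the next bookmark's page
                : pageCount;                  // last bookmark runs to the end
            ranges.Add(new[] { first, last });
        }
        return ranges;
    }

    static void Main()
    {
        // Chapters 1, 2 and 3 start on pages 1, 2 and 3 of a 3 page document.
        foreach (int[] range in PageRanges(new[] { 1, 2, 3 }, 3))
            Console.WriteLine("pages {0}-{1}", range[0], range[1]);
        // Prints: pages 1-2, pages 2-3, pages 3-3 (one per line)
    }
}
```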
Object Model
The object model is relatively straightforward. The classes related to PDF Splitting are displayed below. The various classes also use a number of enumerations, which can be found in our original post about Converting files using the Web Services interface.
The Web Service method that controls splitting (as well as merging) of files is called ProcessBatch. It accepts a ProcessingOptions object that holds all information about the files to process and the operations to apply. It returns a BatchResults object that, when splitting files, contains one or more results, each holding the contents of a file as well as a suggested output file name, which you may use to save the file locally.
As the ProcessingOptions class accepts both MergeSettings and SplitOptions it is possible to convert and merge a set of input files and then split up the results, all in a single web service call. Just populate the various properties and the system will take care of the rest.
Example code
The following sample describes the steps needed to split up a single PDF file based on the number of pages. We are using Visual Studio and C#, but any environment that can invoke web services should be able to access this functionality. Note that the WSDL can be found at http://localhost:41734/Muhimbi.DocumentConverter.WebService/?wsdl.
A generic PDF Conversion Java based example is installed alongside the product and discussed in the User & Developer Guide. The source code for this example can be found in the folder the Muhimbi Conversion service has been installed to.
-
Start a new Visual Studio project and create the project type of your choice. In this example we are using a standard .NET 3.0 project of type Console Application. Name it ‘Split PDF’.
-
In the Solution Explorer window, right-click References and select Add Service Reference. (Do not use web references!)
-
In the Address box enter the WSDL address listed in the introduction of this section. If the Conversion Service is located on a different machine then substitute localhost with the server’s name.
-
Accept the default Namespace of ServiceReference1 and click the OK button to generate the proxy classes.
-
Optionally add a PDF file to the solution, set the Build Action to None and Copy to Output Directory to Copy if newer. By doing this there will always be a valid test file in the same directory as the compiled executable.
-
Replace the contents of Program.cs with the following code.
```csharp
using System;
using System.IO;
using System.ServiceModel;
using Split_PDF.ServiceReference1;

namespace Split_PDF
{
    class Program
    {
        // ** The URL where the Web Service is located. Amend host name if needed.
        // ** The TransportCredentialOnly security mode used in OpenService requires
        // ** an http (not https) address.
        static string SERVICE_URL = "http://localhost:41734/Muhimbi.DocumentConverter.WebService/";

        static void Main(string[] args)
        {
            DocumentConverterServiceClient client = null;

            try
            {
                // ** Determine the source file and read it into a byte array.
                string sourceFileName = null;
                if (args.Length == 0)
                {
                    // ** Delete any split files from a previous test run.
                    foreach (string file in Directory.GetFiles(Directory.GetCurrentDirectory(), "spf-*.pdf"))
                    {
                        File.Delete(file);
                    }

                    // ** If nothing is specified then read the first PDF file.
                    string[] sourceFiles = Directory.GetFiles(Directory.GetCurrentDirectory(), "*.pdf");
                    if (sourceFiles.Length > 0)
                        sourceFileName = sourceFiles[0];
                    else
                    {
                        Console.WriteLine("Please specify a document to split.");
                        Console.ReadKey();
                        return;
                    }
                }
                else
                    sourceFileName = args[0];

                byte[] sourceFile = File.ReadAllBytes(sourceFileName);

                // ** Open the service and configure the bindings
                client = OpenService(SERVICE_URL);

                // ** Set the absolute minimum open options
                OpenOptions openOptions = new OpenOptions();
                openOptions.OriginalFileName = Path.GetFileName(sourceFileName);
                openOptions.FileExtension = "pdf";

                // ** Set the absolute minimum conversion settings.
                ConversionSettings conversionSettings = new ConversionSettings();

                // ** Create the ProcessingOptions for the splitting task.
                ProcessingOptions processingOptions = new ProcessingOptions()
                {
                    MergeSettings = null,
                    SplitOptions = new FileSplitOptions()
                    {
                        FileNameTemplate = "spf-{0:D3}",
                        FileSplitType = FileSplitType.ByNumberOfPages,
                        BatchSize = 5,
                        BookmarkLevel = 0
                    },
                    SourceFiles = new SourceFile[1]
                    {
                        new SourceFile()
                        {
                            MergeSettings = null,
                            OpenOptions = openOptions,
                            ConversionSettings = conversionSettings,
                            File = sourceFile
                        }
                    }
                };

                // ** Carry out the splitting.
                Console.WriteLine("Splitting file " + sourceFileName);
                BatchResults batchResults = client.ProcessBatch(processingOptions);

                // ** Process the returned files
                foreach (BatchResult result in batchResults.Results)
                {
                    Console.WriteLine("Writing split file " + result.FileName);
                    File.WriteAllBytes(result.FileName, result.File);
                }

                Console.WriteLine("Finished.");
            }
            catch (FaultException<WebServiceFaultException> ex)
            {
                Console.WriteLine("FaultException occurred: ExceptionType: " +
                                  ex.Detail.ExceptionType.ToString());
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
            finally
            {
                CloseService(client);
            }

            Console.ReadKey();
        }

        /// <summary>
        /// Configure the Bindings, endpoints and open the service using the specified address.
        /// </summary>
        /// <returns>An instance of the Web Service.</returns>
        public static DocumentConverterServiceClient OpenService(string address)
        {
            DocumentConverterServiceClient client = null;

            try
            {
                BasicHttpBinding binding = new BasicHttpBinding();

                // ** Use standard Windows Security.
                binding.Security.Mode = BasicHttpSecurityMode.TransportCredentialOnly;
                binding.Security.Transport.ClientCredentialType = HttpClientCredentialType.Windows;

                // ** Increase the client Timeout to deal with (very) long running requests.
                binding.SendTimeout = TimeSpan.FromMinutes(30);
                binding.ReceiveTimeout = TimeSpan.FromMinutes(30);

                // ** Set the maximum document size to 50MB
                binding.MaxReceivedMessageSize = 50 * 1024 * 1024;
                binding.ReaderQuotas.MaxArrayLength = 50 * 1024 * 1024;
                binding.ReaderQuotas.MaxStringContentLength = 50 * 1024 * 1024;

                // ** Specify an identity (any identity) in order to get it past .net3.5 sp1
                EndpointIdentity epi = EndpointIdentity.CreateUpnIdentity("unknown");
                EndpointAddress epa = new EndpointAddress(new Uri(address), epi);

                client = new DocumentConverterServiceClient(binding, epa);
                client.Open();

                return client;
            }
            catch (Exception)
            {
                CloseService(client);
                throw;
            }
        }

        /// <summary>
        /// Check if the client is open and then close it.
        /// </summary>
        /// <param name="client">The client to close</param>
        public static void CloseService(DocumentConverterServiceClient client)
        {
            if (client != null && client.State == CommunicationState.Opened)
                client.Close();
        }
    }
}
```
Compile the application and run it either from the command prompt, with a path to the PDF file to split on the command line, or – if a PDF file is present in the executable’s folder – just run it.
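To get a feel for what the sample should produce, the following sketch estimates the output for a given page count. It assumes that splitting by number of pages fills each file with up to BatchSize consecutive pages and that numbering starts at 1. This matches the settings used above, but the exact behaviour of the engine is not confirmed here. The 12 page document and the names PageBatchSketch and FileCount are hypothetical.

```csharp
using System;

static class PageBatchSketch
{
    // Number of output files when each file holds at most batchSize pages.
    public static int FileCount(int pageCount, int batchSize)
    {
        return (pageCount + batchSize - 1) / batchSize; // integer ceiling
    }

    static void Main()
    {
        int pageCount = 12;  // hypothetical source document
        int batchSize = 5;   // same value as BatchSize in the sample

        for (int i = 0; i < FileCount(pageCount, batchSize); i++)
        {
            int first = i * batchSize + 1;
            int last = Math.Min(first + batchSize - 1, pageCount);
            // Same template as FileNameTemplate in the sample.
            Console.WriteLine(string.Format("spf-{0:D3}", i + 1) + ": pages " + first + "-" + last);
        }
        // Prints:
        // spf-001: pages 1-5
        // spf-002: pages 6-10
        // spf-003: pages 11-12
    }
}
```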
Note that in this example we are programmatically configuring the WCF Bindings and End Points. If you wish, you can use the declarative approach using the config file instead.
This new functionality is available as of version 5.2 of our software.
Clavin is a Microsoft Business Applications MVP who supports 1,000+ high-level enterprise customers with challenges related to PDF conversion in combination with SharePoint on-premises, Office 365, Azure, Nintex, K2, and Power Platform, mostly using no-code solutions.