How to highlight and redact documents using pattern detection
This guide explains how to use the pattern redaction and highlighting features of the Document Converter API. These features enable users to redact or highlight specific patterns (e.g., sensitive information such as account numbers) in PDF documents using regular expressions.
With this guide, you’ll:
- Learn the configuration options for redaction and highlighting (e.g., color, transparency, and case sensitivity).
- See how to implement pattern redaction with sample C# code.
Pattern redaction and highlighting with the API
The following table outlines the properties used in pattern redaction and highlighting operations:
Property | Description |
---|---|
Debug | Enables debug mode, which produces additional logging information. |
Pattern | Regular expression pattern for the text to be redacted or highlighted. |
CaseSensitive | Whether the regular expression match is case sensitive. |
Red | Red component of the highlight/redaction color. Range 0–255. |
Green | Green component of the highlight/redaction color. Range 0–255. |
Blue | Blue component of the highlight/redaction color. Range 0–255. |
Alpha | Alpha (opacity) component of the color, used only in the pattern highlighting operation. Range 0–255; fixed at 255 (fully opaque) for redaction. |
Example: Implementing pattern redaction with the API
Pattern redaction removes text that matches a specified regular expression pattern from PDF documents. The following example demonstrates how to perform pattern redaction using the API:
```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.ServiceModel;
// The service proxy types (DocumentConverterServiceClient, OpenOptions,
// PatternRedactionSettings, WebServiceFaultException) come from the
// Document Converter service reference; OpenService and CloseService are
// helper methods defined elsewhere in the sample project.

/// <summary>
/// Perform pattern redaction on the supplied file, writing the result into the target folder.
/// </summary>
/// <param name="ServiceURL">URL endpoint for the PDF Converter service.</param>
/// <param name="sourceFileName">Source filename.</param>
/// <param name="targetFolder">Target folder to receive the output file.</param>
static void PatternRedaction(string ServiceURL, string sourceFileName, string targetFolder)
{
    DocumentConverterServiceClient client = null;
    try
    {
        // Create a minimal OpenOptions object.
        OpenOptions openOptions = new OpenOptions();
        openOptions.OriginalFileName = Path.GetFileName(sourceFileName);

        // Create minimal PatternRedactionSettings: the redaction color (blue) and the pattern.
        // Alpha is not set because it is fixed at 255 for redaction.
        PatternRedactionSettings patternRedactionSettings = new PatternRedactionSettings();
        patternRedactionSettings.Red = 0;
        patternRedactionSettings.Green = 0;
        patternRedactionSettings.Blue = 255;
        // The escaped quotes make the double-quote characters part of the pattern.
        patternRedactionSettings.Pattern = "\"374245455400126\"";

        // Create the target folder if required (no-op if it already exists).
        Directory.CreateDirectory(targetFolder);

        // Read the source file into a byte array.
        byte[] sourceFile = File.ReadAllBytes(sourceFileName);

        // Open the service and configure the bindings.
        client = OpenService(ServiceURL);

        // Carry out the redaction.
        byte[] result = client.PatternRedaction(sourceFile, openOptions, patternRedactionSettings);

        // Save the results.
        if (result != null)
        {
            string filename = Path.GetFileNameWithoutExtension(sourceFileName);
            string destinationFileName = Path.GetFullPath(Path.Combine(targetFolder, filename + "-redacted.pdf"));
            File.WriteAllBytes(destinationFileName, result);
            Console.WriteLine("File converted to " + destinationFileName);

            // Open the destination file in its associated application.
            ProcessStartInfo psi = new ProcessStartInfo();
            psi.FileName = destinationFileName;
            psi.UseShellExecute = true;
            Process.Start(psi);
        }
        else
        {
            Console.WriteLine("Nothing returned");
        }
    }
    catch (FaultException<WebServiceFaultException> ex)
    {
        Console.WriteLine($"FaultException occurred: ExceptionType: {ex.Detail.ExceptionType}");
        Console.WriteLine();
        Console.WriteLine($"Error Detail: {string.Join(Environment.NewLine, ex.Detail.ExceptionDetails)}");
        Console.WriteLine($"Error message: {ex.Message}");
        Console.WriteLine();
        Console.WriteLine($"Error reason: {ex.Reason}");
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
        Console.WriteLine(ex.StackTrace);
    }
    finally
    {
        // Always close the service connection, even after an error.
        if (client != null)
        {
            CloseService(client);
        }
    }
}
```
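The example hard-codes one account number. Because of the escaped quotes (`\"`), the C# string passed as `Pattern` is `"374245455400126"` with the double-quote characters included, so under ordinary regular-expression rules it matches only where the number appears between double quotes. If the goal is to redact any number of the same shape, a more general pattern may be more useful. The sketch below, which again assumes .NET-compatible regex semantics on the service side, shows two ways to build a pattern string: `Regex.Escape` for an exact literal value, and a verbatim string for a 15-digit number that may be grouped 4-6-5 with spaces or hyphens. `AccountPatterns` and its members are illustrative names, not part of the API.

```csharp
using System.Text.RegularExpressions;

// Illustrative pattern builders (hypothetical names, not part of the API).
// Assumes the service evaluates Pattern with .NET-compatible regex semantics.
public static class AccountPatterns
{
    // Matches one exact value; Regex.Escape neutralizes any regex metacharacters in it.
    public static string Literal(string value) => Regex.Escape(value);

    // Matches a 15-digit number written as one run (374245455400126) or grouped
    // 4-6-5 with optional single spaces or hyphens (3742-454554-00126).
    // The \b anchors stop it from matching inside a longer run of digits.
    public const string FifteenDigitGrouped = @"\b\d{4}[ -]?\d{6}[ -]?\d{5}\b";
}
```

For instance, assigning `AccountPatterns.FifteenDigitGrouped` to `patternRedactionSettings.Pattern` in the example above would target any 15-digit number in those formats rather than a single value. Test broad patterns against representative documents, since an overly general expression can redact more than intended.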
For a practical demonstration, refer to the sample code for pattern redaction and highlighting.
Conclusion
Pattern redaction and highlighting let you locate sensitive information in PDF documents with regular expressions and then either remove it or mark it for review. Because both operations are exposed through the API, you can integrate them into custom applications. For more advanced configurations or troubleshooting, refer to the documentation or contact Support.