How to highlight and redact documents using pattern detection

This guide explains how to use the pattern redaction and highlighting features of the Document Converter API. These features enable users to redact or highlight specific patterns (e.g., sensitive information such as account numbers) in PDF documents using regular expressions.

With this guide, you’ll:

  • Learn the configuration options for redaction and highlighting (e.g., color, transparency, and case sensitivity).

  • See how to implement these features with sample C# code.

Pattern redaction and highlighting with the API

The following table outlines the properties used in pattern redaction and highlighting operations:

Property        Description
Debug           Enables debug mode, which writes additional logging information.
Pattern         Regular expression pattern for the text to be redacted or highlighted.
CaseSensitive   Whether the regular expression is matched case-sensitively.
Red             Red component of the highlight/redaction color. Range 0–255.
Green           Green component of the highlight/redaction color. Range 0–255.
Blue            Blue component of the highlight/redaction color. Range 0–255.
Alpha           Alpha (opacity) value. Range 0–255. Used only by pattern highlighting; fixed at 255 for redaction.
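
The ranges and flags in the table can be checked locally before a request is sent to the service. The following is a minimal sketch, not part of the API: `SettingsCheck` and its methods are hypothetical helpers, and the pattern check assumes the service accepts a regular expression syntax close to .NET's, which this guide does not state.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the Document Converter API.
public static class SettingsCheck
{
    // True when a color or alpha component lies in the documented 0-255 range.
    public static bool IsValidComponent(int value)
    {
        return value >= 0 && value <= 255;
    }

    // True when the pattern is non-empty and compiles as a .NET regular expression.
    // Assumption: the service's regex dialect is similar to .NET's.
    public static bool IsValidPattern(string pattern)
    {
        if (string.IsNullOrEmpty(pattern))
        {
            return false;
        }
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // Thrown by the Regex constructor for malformed patterns.
            return false;
        }
    }

    // Local preview of what a CaseSensitive flag means for matching.
    public static bool Matches(string text, string pattern, bool caseSensitive)
    {
        RegexOptions options = caseSensitive ? RegexOptions.None : RegexOptions.IgnoreCase;
        return Regex.IsMatch(text, pattern, options);
    }
}
```

For instance, `SettingsCheck.Matches("IBAN", "iban", false)` returns true, while the same call with `caseSensitive` set to true returns false.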

Example: Implementing pattern redaction with the API

Pattern redaction removes sensitive information from PDF documents based on a specified regular expression pattern. The following example demonstrates how to perform pattern redaction using the API:

        /// <summary>
        /// Perform pattern redaction on the supplied file, writing the result into the target folder.
        /// </summary>
        /// <param name="ServiceURL">URL endpoint for the PDF Converter service.</param>
        /// <param name="sourceFileName">Source filename.</param>
        /// <param name="targetFolder">Target folder to receive the output file.</param>
        static void PatternRedaction(string ServiceURL, string sourceFileName, string targetFolder)
        {
            DocumentConverterServiceClient client = null;
            try
            {
                // Create a minimal `OpenOptions` object.
                OpenOptions openOptions = new OpenOptions();
                openOptions.OriginalFileName = Path.GetFileName(sourceFileName);

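                // Create a minimal `PatternRedactionSettings` object.
                PatternRedactionSettings patternRedactionSettings = new PatternRedactionSettings();
                // Set the redaction color (RGB 0, 0, 255 is blue).
                patternRedactionSettings.Red = 0;
                patternRedactionSettings.Green = 0;
                patternRedactionSettings.Blue = 255;
                // Set what needs to be redacted. The escaped quotes are part of the regular
                // expression, so this matches the number only where it appears inside double
                // quotes in the document; remove them to match the bare number.
                patternRedactionSettings.Pattern = "\"374245455400126\"";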

                // Create target folder if required.
                if (!Directory.Exists(targetFolder))
                {
                    Directory.CreateDirectory(targetFolder);
                }
                // ** Read the source file into a byte array.
                byte[] sourceFile = File.ReadAllBytes(sourceFileName);

                // ** Open the service and configure the bindings.
                client = OpenService(ServiceURL);

                // ** Carry out the redaction.
                byte[] result = client.PatternRedaction(sourceFile, openOptions, patternRedactionSettings);

                // ** Save the results.
                if (result != null)
                {
                    string filename = Path.GetFileNameWithoutExtension(sourceFileName);
                    string destinationFileName = Path.GetFullPath(Path.Combine(targetFolder, filename + "-redacted.pdf"));
                    // The target folder was already created above, so write the redacted PDF directly.
                    File.WriteAllBytes(destinationFileName, result);
                    Console.WriteLine("Redacted file saved to " + destinationFileName);
                    // Open the destination file.
                    ProcessStartInfo psi = new ProcessStartInfo();
                    psi.FileName = destinationFileName;
                    psi.UseShellExecute = true;
                    Process.Start(psi);
                }
                else
                {
                    Console.WriteLine("Nothing returned");
                }
            }
            catch (FaultException<WebServiceFaultException> ex)
            {
                Console.WriteLine($"FaultException occurred: ExceptionType: {ex.Detail.ExceptionType.ToString()}");
                Console.WriteLine();
                Console.WriteLine($"Error Detail: {string.Join(Environment.NewLine, ex.Detail.ExceptionDetails)}");
                Console.WriteLine($"Error message: {ex.Message}");
                Console.WriteLine();
                Console.WriteLine($"Error reason: {ex.Reason}");
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                Console.WriteLine(ex.StackTrace);
            }
            finally
            {
                if (client != null)
                {
                    CloseService(client);
                }
            }

        }
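
The sample above redacts a single literal value, and its pattern includes literal double quotes. Before sending a pattern to the service, it can help to preview locally what it would match. The sketch below is an illustration under assumptions: `PatternPreview` is a hypothetical helper (not part of the API), it works on plain text rather than on a PDF, and it assumes the service's regular expression syntax behaves like .NET's for simple patterns.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the Document Converter API.
public static class PatternPreview
{
    // Escape regex metacharacters so a literal value is matched character for character.
    public static string FromLiteral(string literal)
    {
        return Regex.Escape(literal);
    }

    // Return every substring of the text that the pattern matches.
    public static List<string> FindMatches(string text, string pattern)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

With the text `Card 374245455400126 on file`, the pattern produced by `PatternPreview.FromLiteral("374245455400126")` finds one match, whereas the quoted pattern `"\"374245455400126\""` finds none because the text contains no double quotes around the number.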

For a practical demonstration, refer to the sample code for pattern redaction and highlighting.

Conclusion

Pattern redaction and highlighting are powerful tools for handling sensitive information in PDF documents. By leveraging the API’s flexibility, you can integrate these features into custom applications. For more advanced configurations or troubleshooting, refer to the documentation or contact Support.