Search and redact PDFs in Java
In Nutrient Java SDK, searching and redacting are handled by the RedactionProcessor.
Simple text search
The RedactionProcessor lets you create redactions using simple text search rules: any piece of text matching a provided query is covered by redaction annotations. For example, the following redacts every occurrence of the phrase “ACME Bank”:
final PdfDocument document = PdfDocument.open(new FileDataProvider(new File("DocumentWithText.pdf")));

// Create a redaction processor, search for and redact all instances of "ACME Bank",
// and overwrite the original document.
RedactionProcessor.create()
    .addRedactionTemplates(new RedactionRegEx.Builder("ACME Bank").build())
    .redact(document);
The example above adds the redactions and applies them to the original document using the basic redact method, which takes the original document as an argument. To add the redactions and save the result to a different document, use the redact method that takes a WritableDataProvider:
// Create a redaction processor, search for and redact all instances of "ACME Bank",
// and save to a new location.
RedactionProcessor.create()
    .addRedactionTemplates(new RedactionRegEx.Builder("ACME Bank").build())
    .redact(document, new FileDataProvider(new File("OutputDocument.pdf")));
To add the redaction annotations without applying them, use identifyAndAddRedactionAnnotations. You can also customize the redaction annotation color:
// Create a redaction processor, and search for and add redaction annotations
// for all instances of "ACME Bank".
RedactionProcessor.create()
    .addRedactionTemplates(new RedactionRegEx.Builder("ACME Bank").setFillColor(Color.RED).build())
    .identifyAndAddRedactionAnnotations(document);

// You can then save and apply redactions separately:
document.save(new DocumentSaveOptions.Builder().applyRedactionAnnotations(true).build());
Regular expression search
RedactionRegEx can also take regular expressions. The following example searches for any four-digit code that’s followed by three letters:
RedactionProcessor.create()
    .addRedactionTemplates(new RedactionRegEx.Builder("\\d{4}[A-Za-z]{3}").build())
    .redact(document);
ℹ️ Note: To use a regular expression escape sequence in a Java string literal, you must escape the backslash itself so that the literal is valid. For example, ‘\\d’ in Java source is equivalent to ‘\d’ in the regular expression.
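The escaping rule can be checked locally before a pattern is handed to the SDK. The following is a minimal sketch that uses only java.util.regex from the Java standard library as a stand-in; the class name RegexEscapeDemo and its members are hypothetical, and because Nutrient uses ICU regular expressions, behavior may differ for constructs beyond the simple ones shown here.

```java
import java.util.regex.Pattern;

public class RegexEscapeDemo {
    // In Java source, each "\\" becomes a single backslash in the runtime string,
    // so the regex engine receives \d{4}[A-Za-z]{3}.
    public static final String FOUR_DIGITS_THREE_LETTERS = "\\d{4}[A-Za-z]{3}";

    // Returns true if the text contains four digits followed by three letters.
    // Uses java.util.regex as a local stand-in; this is NOT the SDK's ICU engine.
    public static boolean containsCode(String text) {
        return Pattern.compile(FOUR_DIGITS_THREE_LETTERS).matcher(text).find();
    }

    public static void main(String[] args) {
        // The pattern as the regex engine sees it (one backslash).
        System.out.println(FOUR_DIGITS_THREE_LETTERS);          // \d{4}[A-Za-z]{3}
        System.out.println(FOUR_DIGITS_THREE_LETTERS.length()); // 16
        System.out.println(containsCode("Ref 1234abc"));        // true
        System.out.println(containsCode("Ref 123abc"));         // false
    }
}
```

Printing the string is a quick way to confirm that the doubled backslash in the source collapsed to the single backslash the regular expression needs.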
ℹ️ Note: Nutrient uses ICU regular expressions, which are a derivative of Perl regular expressions. See the ICU Regular Expressions guide for details.
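Since RedactionRegEx accepts regular expressions, a search phrase that contains metacharacters such as parentheses may not match literally unless it is quoted. The sketch below assumes the SDK interprets the query as a regular expression, and it relies on the \Q...\E quoting syntax, which the ICU guide documents. It uses java.util.regex from the standard library for local testing only; the class LiteralPhraseDemo and its methods are hypothetical helpers, not part of the Nutrient API.

```java
import java.util.regex.Pattern;

public class LiteralPhraseDemo {
    // Wraps the phrase in \Q...\E so that every character is matched literally.
    public static String toLiteralPattern(String phrase) {
        return Pattern.quote(phrase);
    }

    // Local check with java.util.regex; this is NOT the SDK's ICU engine.
    public static boolean found(String pattern, String text) {
        return Pattern.compile(pattern).matcher(text).find();
    }

    public static void main(String[] args) {
        String phrase = "ACME Bank (UK)";
        String text = "Account held at ACME Bank (UK) Ltd.";

        // Unquoted, the parentheses form a capture group, so the pattern
        // looks for "ACME Bank UK" and misses the text.
        System.out.println(found(phrase, text));                   // false

        // Quoted, the parentheses are matched as ordinary characters.
        System.out.println(found(toLiteralPattern(phrase), text)); // true
        System.out.println(toLiteralPattern(phrase));              // \QACME Bank (UK)\E
    }
}
```

Under these assumptions, the quoted string could then be passed to RedactionRegEx.Builder in place of the raw phrase.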