Levels of Redaction Automation
In a previous blog post, we discussed how a credit card company can batch process files and automate applying redactions. In this post, we’ll go a step further and discuss the different ways we can add and apply redactions: manually, semi-automatically, and fully automatically. The three differ in how much of the work of identifying the text to remove is automated.
In the aforementioned blog post, we wanted to remove sensitive information like a person’s name, phone number, and email address. Continuing with that example, we might want to remove the social security number present in a document without actually knowing in advance what its value is.
But before we can apply redactions, we need to add redaction annotations to a document to mark the area that has to be redacted. There are different ways in which we can do this, and they vary based on the amount of automation we want.
Manually
With this method, you can manually choose the area you want to remove from a document.
Via the UI
You can use the following options that ship by default in the PSPDFKit for Web UI:
- Text Redactions: When you want to redact text, you can select the Text Redaction option from the toolbar. Then you have to select the text you want to redact. This will add a redaction annotation to that area; this is a mark to show that the area in question will be removed once you apply redactions. An alternate way is to select the text and then click on the text highlighter icon in the inline tooltip.
- Area Redactions: When you want to remove an area, you can select the Area Redaction option from the toolbar and then create a rectangle that marks the area that needs to be removed.
Via the API
You can also use our programmatic API to add redaction annotations in case you want to provide a custom UI in your application for adding them. Here’s an example:
const boundingBox = new PSPDFKit.Geometry.Rect({
  left: 25,
  top: 25,
  width: 175,
  height: 30,
});

await instance.create(
  new PSPDFKit.Annotations.RedactionAnnotation({
    pageIndex: 0,
    boundingBox,
    rects: new PSPDFKit.Immutable.List([boundingBox]),
  }),
);
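In a custom UI, a drag gesture usually yields two corner points rather than a left/top/width/height box. The following is a minimal sketch of that conversion, assuming the points are already expressed in page coordinates. The rectFromDrag helper is hypothetical and not part of PSPDFKit.

```javascript
// Hypothetical helper (not part of PSPDFKit): normalizes two drag points
// into the left/top/width/height shape passed to PSPDFKit.Geometry.Rect
// in the example above. Works regardless of drag direction.
function rectFromDrag(start, end) {
  const left = Math.min(start.x, end.x);
  const top = Math.min(start.y, end.y);
  return {
    left,
    top,
    width: Math.abs(end.x - start.x),
    height: Math.abs(end.y - start.y),
  };
}

// The user dragged from the bottom-right corner to the top-left corner.
const box = rectFromDrag({ x: 200, y: 55 }, { x: 25, y: 25 });
console.log(box); // { left: 25, top: 25, width: 175, height: 30 }
```

The resulting object could then be handed to new PSPDFKit.Geometry.Rect(...) and used as the boundingBox of the redaction annotation.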
Semi-Automatically
There might be instances where you want to redact every occurrence of a particular piece of text in a PDF document. This method works when you already know the text you want to remove. We used this approach in the last blog post, since we already knew the exact text we wanted to remove.
To achieve this, you can search for text in a PDF and then add redaction annotations to the bounding boxes of those search results. Rather than doing this in steps, you can use the createRedactionsBySearch API to search and add redaction annotations to those areas automatically:
const instance = await PSPDFKit.load(options);

const annotations = await instance.createRedactionsBySearch(
  'Ritesh Kumar',
);

console.log(
  'List of newly created Redaction Annotation IDs',
  annotations,
);
The above code will add a redaction annotation to all the occurrences of “Ritesh Kumar” in the document. Using this in the credit card example allows you to redact all occurrences of a customer’s name on a credit card application form. Once you’ve verified that the annotations have been added correctly, you can apply the redactions:
await instance.applyRedactions();
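When several known values have to go (a name, a phone number, and an email address, as in the credit card example), the same call can be repeated once per term. The following is a minimal sketch, assuming createRedactionsBySearch resolves with the IDs of the created annotations, as in the snippet above. The redactKnownTerms helper is hypothetical and not part of the SDK, and the stub instance only stands in for a loaded document so the sketch runs on its own.

```javascript
// Hypothetical helper: runs one search-and-redact call per known term and
// collects whatever each call resolved with, keyed by term.
async function redactKnownTerms(instance, terms) {
  const results = new Map();
  for (const term of terms) {
    // Sequential on purpose, so each term's result is easy to review.
    results.set(term, await instance.createRedactionsBySearch(term));
  }
  return results;
}

// Stand-in for a loaded PSPDFKit instance (assumption, illustration only).
const stubInstance = {
  async createRedactionsBySearch(term) {
    return [`redaction-for-${term}`];
  },
};

redactKnownTerms(stubInstance, ['Ritesh Kumar', 'ritesh@example.com']).then(
  (results) => console.log('Terms processed:', results.size),
);
```

With a real instance, you would still review the created annotations before calling instance.applyRedactions().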
Fully Automatically
There might be situations where you want to remove the social security number from an application just by matching its pattern instead of searching for the exact number. This can be a more secure way of handling data, since you don’t have to map names to their social security numbers while you’re redacting a PDF. For these situations, we allow you to search by pattern:
const instance = await PSPDFKit.load(options);

const annotations = await instance.createRedactionsBySearch(
  PSPDFKit.SearchPattern.SOCIAL_SECURITY_NUMBER,
  { searchType: PSPDFKit.SearchType.PRESET },
);

console.log(
  'List of newly created Redaction Annotation IDs',
  annotations,
);
The code above will search for all the occurrences of social security numbers in a PDF and add redaction annotations to them. You can read more about all the search patterns we support by exploring our SearchPattern API.
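To make the idea of searching by pattern concrete: a pattern describes the shape of a value rather than the value itself. The sketch below uses a plain JavaScript regular expression for the common ###-##-#### layout. It is an illustrative assumption, not the pattern behind PSPDFKit.SearchPattern.SOCIAL_SECURITY_NUMBER, and findSsnLikeValues is a hypothetical name.

```javascript
// Illustrative only: three digits, two digits, and four digits separated by
// hyphens. Real social security number validation has more rules than this.
const SSN_LIKE = /\b\d{3}-\d{2}-\d{4}\b/g;

// Returns every substring of `text` that has the SSN-like shape.
function findSsnLikeValues(text) {
  return text.match(SSN_LIKE) || [];
}

const sample = 'Applicant SSN: 123-45-6789, phone: 555-0100.';
console.log(findSsnLikeValues(sample)); // [ '123-45-6789' ]
```

Note that the function never needs to know which number belongs to which applicant, which is the property that makes pattern-based redaction attractive here.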
Conclusion
In this blog post, we took the example of a credit card company and discussed how we can use our API to batch process PDFs and remove sensitive information from them manually, semi-automatically, and fully automatically. The same approach can be used anywhere else that involves redacting multiple PDF files.
If you want to play with a demo, head over to our Web Catalog. Please keep in mind that Redaction is a component in our SDK that has to be purchased separately. If you’re interested, please get in touch with our sales team.