Automate document redaction with predefined patterns
Nutrient lets you search the document for text matching predefined patterns and then create redactions on top of the matching text. After the redaction is applied, the text will be permanently and irreversibly removed.
Note that, by design, some of the preset patterns may overmatch (that is, return false positives). The patterns favor catching every genuine match over precision, so that no sensitive data is missed. Always review the matches found before applying the redactions.
Pattern name | Description |
---|---|
CREDIT_CARD_NUMBER | Matches credit card numbers that begin with a digit from 1 to 6 and are 13 to 19 digits long. Spaces and - are allowed anywhere in the number. |
DATE | Matches date formats such as mm/dd/yyyy, mm/dd/yy, dd/mm/yyyy, and dd/mm/yy. It rejects any day or month greater than 31 and matches whether or not a single-digit day or month has a leading zero. The delimiter can be - , . , or / . |
TIME | Matches time formats such as 00:00:00, 00:00, and 00:00 PM. Both 12- and 24-hour formats are allowed. Seconds and the 12-hour AM/PM marker are optional. |
EMAIL_ADDRESS | Matches an email address in the format xyz@xyz.xyz, where xyz can be any alphanumeric character or a dot. Find out more about the email pattern. |
INTERNATIONAL_PHONE_NUMBER | Matches international-style phone numbers with a prefix of + or 00, containing between 7 and 15 digits, with spaces or - allowed anywhere within the number. |
IP_V4 | Matches an IPv4 address with each number limited to the range 0-255, with an optional mask. |
IP_V6 | Matches full and compressed IPv6 addresses as defined in RFC 2373. |
MAC_ADDRESS | Matches a MAC address with delimiters of either - or : |
NORTH_AMERICAN_PHONE_NUMBER | Matches a NANP-style phone number. In general, this matches numbers from the US, Canada, and various Caribbean countries. The pattern also matches an optional international prefix of +1. |
SOCIAL_SECURITY_NUMBER | Matches a US Social Security number (SSN). The format should be either XXX-XX-XXXX or XXXXXXXXX, with X denoting [0-9]. The number must have word boundaries on either side or be at the start or end of the string. |
URL | Matches a URL with a prefix of http |
US_ZIP_CODE | Matches a US-style ZIP code. The expected format is 00000 or 00000-0000, where the delimiter can be either - or / . |
VIN | Matches US and ISO 3779 standard VINs. The format expects 17 characters, with the last 5 characters being numeric. The characters I , O , Q , and _ are not allowed, in either upper or lower case. |
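To illustrate why preset matches need review, the sketch below approximates the CREDIT_CARD_NUMBER description from the table as a regular expression and pairs it with a Luhn checksum for triaging candidates. The regular expression is an assumption written for illustration, not the pattern the SDK actually uses, and `findCandidates` and `passesLuhn` are hypothetical helper names.

```javascript
// Approximation of the CREDIT_CARD_NUMBER description; NOT the SDK's real pattern.
// First digit 1-6, followed by 12 to 18 more digits (13 to 19 in total),
// each optionally preceded by a single space or hyphen.
const approxCreditCard = /\b[1-6](?:[ -]?\d){12,18}\b/g;

// Returns every substring that looks like a card number under the approximation.
function findCandidates(text) {
  return text.match(approxCreditCard) ?? [];
}

// Luhn checksum: a candidate that fails it is not a valid card number,
// so it is a likely false positive worth a closer look during review.
function passesLuhn(candidate) {
  const digits = candidate.replace(/\D/g, "");
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk the digits from right to left, doubling every second one.
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return digits.length > 0 && sum % 10 === 0;
}
```

For the text "Card 4111 1111 1111 1111, order 1234567890123", this sketch reports both numbers as candidates, but only the first passes the checksum. A 13-digit order number is exactly the kind of match that a broad pattern can pick up and that a reviewer should reject.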
To create and apply redactions that match a preset pattern, use a createRedactionsBySearch operation with a preset strategy, like so:

    instance
      .createRedactionsBySearch(PSPDFKit.SearchPattern.CREDIT_CARD_NUMBER, {
        searchType: PSPDFKit.SearchType.PRESET,
        searchInAnnotations: true,
        annotationPreset: {
          overlayText: "Redacted"
        }
      })
      .then(function (ids) {
        console.log("The following annotations have been added:", ids);
        return instance.applyRedactions();
      });

    // You can add an "annotations.create" event listener and add custom logic based on the
    // information for each of the newly created redaction annotations.
    const { RedactionAnnotation } = PSPDFKit.Annotations;
    instance.addEventListener("annotations.create", (annotations) => {
      const redactions = annotations.filter(
        (annot) => annot instanceof RedactionAnnotation
      );
      if (redactions.size > 0) {
        console.log("Redactions: ", redactions.toJS());
      }
    });
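Because matches should be reviewed before they are applied, it can help to separate creating the redactions from applying them. The following is a minimal sketch of that idea for several presets at once. `redactWithPresets` and its `confirm` callback are hypothetical names, not part of the SDK; the sketch relies only on the `createRedactionsBySearch` and `applyRedactions` calls shown above and assumes the returned IDs are iterable.

```javascript
// Hypothetical helper, not part of the SDK. `instance` is a loaded viewer
// instance and `sdk` is the PSPDFKit namespace used in the example above.
async function redactWithPresets(instance, sdk, patternNames, confirm) {
  const createdIds = [];
  for (const name of patternNames) {
    const pattern = sdk.SearchPattern[name];
    if (pattern === undefined) {
      throw new Error(`Unknown preset pattern: ${name}`);
    }
    // Create redaction annotations for this preset; nothing is removed yet.
    const ids = await instance.createRedactionsBySearch(pattern, {
      searchType: sdk.SearchType.PRESET,
      searchInAnnotations: true,
    });
    createdIds.push(...ids); // assumes `ids` is iterable
  }
  // Apply only if something was found and the reviewer approves.
  const applied =
    createdIds.length > 0 && Boolean(await confirm(createdIds));
  if (applied) {
    await instance.applyRedactions();
  }
  return { createdIds, applied };
}
```

A caller might pass `["EMAIL_ADDRESS", "SOCIAL_SECURITY_NUMBER"]` as `patternNames` and a `confirm` callback that shows the created annotations to a reviewer and resolves to `true` or `false`. If the reviewer declines, the redaction annotations stay in the document unapplied, so they can still be inspected or deleted.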