Enhance document tagging with regular expressions

In Tagging, there are several places where you can specify patterns or regular expressions to constrain the metadata that is extracted or tagged. Regular expressions let you apply formatting rules, check lengths, and so on, to make sure text matches a specific pattern. In essence, a regular expression validates the metadata before it is extracted from the document or tagged in SharePoint.

Here are some basic examples:

Regular Expression      Example matches                                         Description
abc$                    abc, 123abc                                             Any text ending with abc
^abc                    abc, abc123                                             Any text that starts with abc
^[0-9]{5}$              11111, 12345, 99999                                     Any 5-digit number
\d{1,4}                 1, 24, 445, 3333                                        Any number that is 1 to 4 digits
[A-Za-z]{4}-\d{4}       ABCD-1234, GYDL-8450                                    4 letters followed by a dash, then 4 numbers
[A-Za-z]{4}(-|_)\d{4}   ABCD-1234, ABCD_1234                                    4 letters followed by a dash or an underscore, then 4 numbers
[A-Za-z]{4}[\W_]\d{4}   ABCD-1234, ABCD_1234, ABCD 1234, ABCD+1234, ABCD#1234   4 letters followed by any non-word separator, then 4 numbers
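
To try the basic patterns outside the product, a minimal Python sketch such as the following may help. It assumes search semantics (a pattern may match anywhere in the text unless it is anchored with ^ or $), which is what the examples above imply. The helper name `matches` and the sample values are illustrative only, and the regular expression engine used by Tagging may differ from Python's `re` module in some details.

```python
import re

# Patterns taken from the basic examples above, keyed by a short description.
BASIC_PATTERNS = {
    "ends with abc": r"abc$",
    "starts with abc": r"^abc",
    "exactly 5 digits": r"^[0-9]{5}$",
    "4 letters, dash, 4 digits": r"[A-Za-z]{4}-\d{4}",
    "4 letters, any separator, 4 digits": r"[A-Za-z]{4}[\W_]\d{4}",
}


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    # Illustrative sample values, not taken from any real document.
    samples = ["123abc", "abc123", "12345", "123456", "GYDL-8450", "ABCD#1234"]
    for label, pattern in BASIC_PATTERNS.items():
        hits = [s for s in samples if matches(pattern, s)]
        print(f"{label}: {hits}")
```

Because an unanchored pattern such as \d{1,4} is also found inside longer text (for example inside 12345), add ^ and $ when the whole value must match.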

Below are a few useful resources to get you started with regular expressions:

Some useful regular expressions taken from the resources above:

Field: Social Security Number
Regular expression: ^\d{3}-\d{2}-\d{4}$
Example matches: 111-11-1111
Description: Validates the format, type, and length of the supplied input field. The input must consist of 3 numeric characters followed by a dash, then 2 numeric characters followed by a dash, and then 4 numeric characters.

Field: Phone Number
Regular expression: ^[01]?[- .]?(\([2-9]\d{2}\)|[2-9]\d{2})[- .]?\d{3}[- .]?\d{4}$
Example matches: (425) 555-0123, 425-555-0123, 425 555 0123, 1-425-555-0123
Description: Validates a U.S. phone number. It must consist of 3 numeric characters, optionally enclosed in parentheses, followed by a set of 3 numeric characters and then a set of 4 numeric characters. A leading 0 or 1 is optional.

Field: E-mail
Regular expression: ^(?("")("".+?""@)|(([0-9a-zA-Z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}|~\w])*)(?<=[0-9a-zA-Z])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])

Field: ZIP Code
Regular expression: ^(\d{5}-\d{4}|\d{5}|\d{9})$

Field: Currency (non-negative)
Regular expression: ^\d+(\.\d\d)?$
Example matches: 1.00
Description: Validates a non-negative currency amount. If there is a decimal point, it requires 2 numeric characters after the decimal point. For example, 3.00 is valid but 3.1 is not.

Field: Currency (positive or negative)
Regular expression: ^(-)?\d+(\.\d\d)?$
Example matches: 1.20
Description: Validates a positive or negative currency amount. If there is a decimal point, it requires 2 numeric characters after the decimal point.
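
As a rough illustration of how the field patterns above can validate metadata before it is tagged, here is a minimal Python sketch. The names `FIELD_PATTERNS`, `is_valid`, and `filter_metadata` are hypothetical and are not part of Tagging or SharePoint. The phone pattern is written on the assumption that its two area-code forms are alternatives separated by `|`. The e-mail pattern is left out because it relies on conditional syntax that Python's `re` module does not accept in that form.

```python
import re

# Field patterns from the list above. The phone pattern assumes that the two
# area-code forms, (425) and 425, are alternatives separated by "|".
FIELD_PATTERNS = {
    "ssn": r"^\d{3}-\d{2}-\d{4}$",
    "phone": r"^[01]?[- .]?(\([2-9]\d{2}\)|[2-9]\d{2})[- .]?\d{3}[- .]?\d{4}$",
    "currency": r"^\d+(\.\d\d)?$",
    "signed_currency": r"^(-)?\d+(\.\d\d)?$",
}


def is_valid(field: str, value: str) -> bool:
    """Check a candidate metadata value against the pattern for its field."""
    return re.search(FIELD_PATTERNS[field], value) is not None


def filter_metadata(candidates: dict[str, str]) -> dict[str, str]:
    """Keep only values that pass validation (a hypothetical pre-tagging step)."""
    return {field: value for field, value in candidates.items() if is_valid(field, value)}


if __name__ == "__main__":
    # Illustrative values only; "3.1" fails because two decimals are required.
    extracted = {"ssn": "111-11-1111", "phone": "425 555 0123", "currency": "3.1"}
    print(filter_metadata(extracted))
    # prints {'ssn': '111-11-1111', 'phone': '425 555 0123'}
```

Keeping the patterns in one dictionary makes it easy to see which rule applies to which field; in the product itself, the equivalent pattern is entered wherever Tagging accepts a regular expression.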