Redact PDFs Using RegEx

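Document Engine can create redaction annotations over all text that matches a regular expression you provide, using the createRedactions action with the regex strategy. This is the most versatile strategy for creating redactions. For example, the following action matches every run of one or more digits: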

{
  "type": "createRedactions",
  "strategy": "regex",
  "strategyOptions": {
    "regex": "\\d+"
  }
}

See the API Reference for details regarding the regular expression syntax and escaping rules.
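The doubled backslash in "\\d+" above is JSON string escaping: the regular expression itself is \d+. The following minimal Python sketch shows how a JSON serializer produces that escaped form. The helper name create_redactions_action is hypothetical and not part of Document Engine. It only builds the payload; whether a given pattern is valid depends on the regular expression syntax described in the API Reference.

```python
import json


def create_redactions_action(pattern: str) -> dict:
    """Build a createRedactions action that uses the regex strategy.

    Hypothetical helper for illustration; it only assembles the JSON payload.
    """
    return {
        "type": "createRedactions",
        "strategy": "regex",
        "strategyOptions": {"regex": pattern},
    }


# The raw string holds the pattern with a single backslash: \d+
action = create_redactions_action(r"\d+")

# json.dumps escapes the backslash, so the serialized text contains "\\d+".
serialized = json.dumps(action, indent=2)
print(serialized)
```

Building the payload from a raw string and letting the serializer handle escaping avoids having to count backslashes by hand.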

Applying Redactions

Creating redaction annotations only marks content for removal. To permanently remove the covered content, the annotations must be applied to the document. To do this, add the applyRedactions action to the /api/build instructions after the createRedactions action.

Before you get started, make sure Document Engine is up and running.

You’ll be sending multipart POST requests with instructions to Document Engine’s /api/build endpoint. To learn more about multipart requests, refer to our blog post on the topic, A Brief Tour of Multipart Requests.

Check out the API Reference to learn more about the /api/build endpoint and all the actions you can perform on PDFs with Document Engine.
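As a rough sketch of how the two actions fit together, the following Python snippet assembles instructions with createRedactions followed by applyRedactions. The function name build_instructions and its apply flag are assumptions made for illustration. The JSON shape is copied from the examples in this guide, and the snippet does not contact Document Engine.

```python
import json


def build_instructions(file_ref, pattern, apply=True):
    """Return /api/build instructions that redact text matching `pattern`.

    Hypothetical helper. `file_ref` is used as the part's "file" value:
    either a string or a {"url": ...} dictionary, as in this guide.
    """
    actions = [
        {
            "type": "createRedactions",
            "strategy": "regex",
            "strategyOptions": {"regex": pattern},
        }
    ]
    if apply:
        # Placed after createRedactions, mirroring the order used in this guide.
        actions.append({"type": "applyRedactions"})
    return {"parts": [{"file": file_ref, "actions": actions}]}


instructions = build_instructions("document", r"\d+")
print(json.dumps(instructions, indent=2))
```

Passing apply=False leaves out the applyRedactions action, so the instructions only create the redaction annotations.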

Creating and Applying Redactions in a File on Disk

Send a multipart request to the /api/build endpoint, attaching the input file and the instructions JSON:

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/example-document.pdf \
  -F instructions='{
  "parts": [
    {
      "file": "document",
      "actions": [
        {
          "type": "createRedactions",
          "strategy": "regex",
          "strategyOptions": {
            "regex": "\\d+"
          }
        },
        {
          "type": "applyRedactions"
        }
      ]
    }
  ]
}' \
  -o result.pdf

The equivalent raw HTTP request:

POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="document"; filename="example-document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    {
      "file": "document",
      "actions": [
        {
          "type": "createRedactions",
          "strategy": "regex",
          "strategyOptions": {
            "regex": "\\d+"
          }
        },
        {
          "type": "applyRedactions"
        }
      ]
    }
  ]
}
--customboundary--

This creates redaction annotations and applies them to the file, removing the content beneath them.
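The same request can be assembled in code. The sketch below uses only the Python standard library to build the multipart body and an unsent request object. The function names encode_multipart and build_redaction_request are hypothetical, and the sketch assumes that the "file" value in the instructions names the multipart field that carries the PDF, so it uses one field name for both.

```python
import json
import urllib.request
import uuid


def encode_multipart(pdf_bytes, filename, instructions, field_name="document"):
    """Encode a PDF and its instructions as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    instructions_part = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="instructions"\r\n'
        "Content-Type: application/json\r\n\r\n"
        f"{json.dumps(instructions)}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return file_part + pdf_bytes + instructions_part, boundary


def build_redaction_request(base_url, api_token, pdf_bytes, filename, pattern):
    """Return an unsent POST request that creates and applies regex redactions."""
    field_name = "document"  # assumption: must match the part's "file" value
    instructions = {
        "parts": [
            {
                "file": field_name,
                "actions": [
                    {
                        "type": "createRedactions",
                        "strategy": "regex",
                        "strategyOptions": {"regex": pattern},
                    },
                    {"type": "applyRedactions"},
                ],
            }
        ]
    }
    body, boundary = encode_multipart(pdf_bytes, filename, instructions, field_name)
    return urllib.request.Request(
        f"{base_url}/api/build",
        data=body,
        headers={
            "Authorization": f"Token token={api_token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


# Placeholder bytes stand in for a real PDF; nothing is sent here.
request = build_redaction_request(
    "http://localhost:5000", "<API token>", b"%PDF placeholder",
    "example-document.pdf", r"\d+",
)
print(request.get_method(), request.full_url)
```

To send it, pass the request to urllib.request.urlopen and write the response body to a file such as result.pdf, as the curl example does with -o.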

Creating and Applying Redactions in a File from a URL

Send a multipart request to the /api/build endpoint whose instructions include a URL pointing to the file you want to redact. No file is attached to the request:

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F instructions='{
  "parts": [
    {
      "file": {
        "url": "https://pspdfkit.com/downloads/examples/credit-card-application.pdf"
      },
      "actions": [
        {
          "type": "createRedactions",
          "strategy": "regex",
          "strategyOptions": {
            "regex": "\\d+"
          }
        },
        {
          "type": "applyRedactions"
        }
      ]
    }
  ]
}' \
  -o result.pdf

The equivalent raw HTTP request:

POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    {
      "file": {
        "url": "https://pspdfkit.com/downloads/examples/credit-card-application.pdf"
      },
      "actions": [
        {
          "type": "createRedactions",
          "strategy": "regex",
          "strategyOptions": {
            "regex": "\\d+"
          }
        },
        {
          "type": "applyRedactions"
        }
      ]
    }
  ]
}
--customboundary--

This creates redaction annotations and applies them to the file, removing the content beneath them.
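For the URL variant, the request body carries a single form field. The following Python sketch encodes such a body. The names url_instructions and encode_instructions_only are hypothetical, and the fixed boundary is reused from the HTTP example above purely for readability.

```python
import json


def url_instructions(pdf_url, pattern):
    """Build instructions that redact a remote PDF referenced by URL."""
    return {
        "parts": [
            {
                "file": {"url": pdf_url},
                "actions": [
                    {
                        "type": "createRedactions",
                        "strategy": "regex",
                        "strategyOptions": {"regex": pattern},
                    },
                    {"type": "applyRedactions"},
                ],
            }
        ]
    }


def encode_instructions_only(instructions, boundary="customboundary"):
    """Encode a multipart/form-data body whose only field is the instructions."""
    return (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="instructions"\r\n'
        "Content-Type: application/json\r\n\r\n"
        f"{json.dumps(instructions)}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")


body = encode_instructions_only(
    url_instructions(
        "https://pspdfkit.com/downloads/examples/credit-card-application.pdf",
        r"\d+",
    )
)
print(body.decode("utf-8"))
```

In real code, a randomly generated boundary is a safer choice than a fixed one, because the boundary string must not appear inside any part of the body.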