Redacting PDFs in Linux

Information

PSPDFKit Processor has been deprecated and replaced by Document Engine. To start using Document Engine, refer to the migration guide. With Document Engine, you’ll have access to robust new capabilities (read the blog for more information).

Redaction is the process of removing image, text, and vector content from a PDF page. It doesn’t just obscure the content; it also removes the underlying data within the specified region from the document.

Redaction is generally used to remove personally identifiable or otherwise sensitive information from a document, both to ensure confidentiality and to comply with regulations and privacy laws such as GDPR or HIPAA. Once redactions have been applied with the Redaction component, the original content can’t be restored from the PDF, thereby guaranteeing privacy.

Redaction is a two-step process:

  • First, redaction annotations are created in the areas that are to be redacted. This step won’t remove any content from the document yet; it just marks regions for redaction.

  • Second, to actually remove the content, the redaction annotations need to be applied. In this step, the page content within the region of the redaction annotations is irreversibly removed.

Content is removed only when the redaction annotations are applied to the document. Until then, redaction annotations can be edited and removed like any other annotation.
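The two steps can be illustrated with a small, self-contained model. This is only a conceptual sketch, not the PSPDFKit Processor or Document Engine API: the `Rect` and `Page` classes and the `mark`, `unmark`, and `apply` methods are hypothetical names invented for this example, and the model drops a whole content item when it intersects a marked region, which is a simplification of how real page content is handled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page coordinates (hypothetical helper)."""
    left: float
    top: float
    width: float
    height: float

    def intersects(self, other: "Rect") -> bool:
        return (
            self.left < other.left + other.width
            and other.left < self.left + self.width
            and self.top < other.top + other.height
            and other.top < self.top + self.height
        )


@dataclass
class Page:
    """Toy page: a list of (bounding box, text) items plus redaction marks."""
    content: list[tuple[Rect, str]]
    redactions: list[Rect] = field(default_factory=list)

    def mark(self, region: Rect) -> None:
        """Step 1: create a redaction annotation. Content is left untouched."""
        self.redactions.append(region)

    def unmark(self, region: Rect) -> None:
        """Before applying, a mark can be removed like any other annotation."""
        self.redactions.remove(region)

    def apply(self) -> None:
        """Step 2: irreversibly drop content that intersects a marked region."""
        self.content = [
            (box, text)
            for box, text in self.content
            if not any(box.intersects(region) for region in self.redactions)
        ]
        self.redactions.clear()


page = Page(content=[
    (Rect(10, 10, 100, 12), "Name: Jane Doe"),
    (Rect(10, 30, 100, 12), "Invoice total: 42"),
])

page.mark(Rect(0, 5, 200, 20))   # Step 1: only marks the region.
print(len(page.content))         # 2, nothing has been removed yet

page.apply()                     # Step 2: removes the marked content.
print([text for _, text in page.content])  # ['Invoice total: 42']
```

The point of the separation is that marking is cheap and reversible, so regions can be reviewed and adjusted, while applying is the single irreversible operation.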

For more information, see Adobe’s PDF Redaction: Addendum for the PDF Reference.

Security of Redaction

Redaction permanently removes the following information from a file:

  • Visible text and the graphical content under the areas marked for redaction

  • Annotations, comments, and markup intersecting the areas marked for redaction

The information isn’t simply obscured or masked, but is completely missing from the file.

Redaction doesn’t remove the following information:

  • Metadata such as PDF title and author (including XMP metadata)

  • Embedded content and attached files

  • Hidden layers

  • Hidden text

Redacting Graphic Objects Shared across Pages

Graphic objects, including images and vector graphics, can be reused across pages in a PDF. If a graphic object that’s reused is redacted, every instance of that graphic object is redacted as well.

This means that, for example, when you redact part of an image, the same part of the same image on another page will also be redacted. This matches the behavior of Adobe Acrobat. A common example is redacting a logo that appears on every page.

Warning

Even if images look exactly the same, they could be stored as separate images, in which case redacting one won’t redact the others. Always review the redacted document carefully when you’re done.
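The difference between a reused graphic object and a look-alike copy can be sketched with Python object identity standing in for a shared PDF resource. This is a conceptual model only: `ImageObject` and `redact_row` are hypothetical names for this example and don’t correspond to any PSPDFKit API.

```python
from dataclasses import dataclass


@dataclass
class ImageObject:
    """Hypothetical stand-in for an image resource stored once in a PDF."""
    rows: list[str]


def redact_row(image: ImageObject, row: int) -> None:
    """Overwrite one row of the image, modeling a partial redaction."""
    image.rows[row] = "#" * len(image.rows[row])


logo = ImageObject(["ACME", "CORP"])
lookalike = ImageObject(["ACME", "CORP"])  # same pixels, separate object

page1 = [logo]
page2 = [logo]        # reuses the same object as page 1
page3 = [lookalike]   # looks identical, but is stored separately

redact_row(page1[0], 0)  # redact part of the logo on page 1 only

print(page2[0].rows)  # ['####', 'CORP'], the shared object is redacted too
print(page3[0].rows)  # ['ACME', 'CORP'], the separate copy is untouched
```

Because the two cases can’t be told apart by looking at the rendered pages, a review of every page after redaction is the only reliable check.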

Licensing

The following redaction features are part of the Redaction component, which must be licensed separately. The list below describes the expected behavior if Redaction isn’t part of your license: