Convert to PDF/A

PDF/A is a document format intended for long-term preservation. Document Engine supports converting source files into all PDF/A versions and conformance levels:

  • PDF/A-1a, PDF/A-1b

  • PDF/A-2a, PDF/A-2u, PDF/A-2b

  • PDF/A-3a, PDF/A-3u, PDF/A-3b

  • PDF/A-4, PDF/A-4e, PDF/A-4f
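The Build API example in the next section selects a level with the conformance value "pdfa-2a". The Python sketch below maps each level listed above to a value of that form. It assumes, without confirmation from this guide, that the other values follow the same "pdfa-<level>" pattern. SUPPORTED_LEVELS and conformance_value are hypothetical helper names, not part of Document Engine.

```python
# Conformance levels listed above, written as in the PDF/A standards.
SUPPORTED_LEVELS = (
    "PDF/A-1a", "PDF/A-1b",
    "PDF/A-2a", "PDF/A-2u", "PDF/A-2b",
    "PDF/A-3a", "PDF/A-3u", "PDF/A-3b",
    "PDF/A-4", "PDF/A-4e", "PDF/A-4f",
)


def conformance_value(level: str) -> str:
    """Map a level such as "PDF/A-2a" to a conformance string such as "pdfa-2a".

    ASSUMPTION: only "pdfa-2a" is shown in this guide; the other values are
    assumed to follow the same "pdfa-<level>" pattern.
    """
    if level not in SUPPORTED_LEVELS:
        raise ValueError(f"unsupported PDF/A level: {level!r}")
    # "PDF/A-2a" -> ["PDF/A", "2a"] -> "pdfa-2a"
    suffix = level.split("-", 1)[1]
    return f"pdfa-{suffix}"


if __name__ == "__main__":
    for level in SUPPORTED_LEVELS:
        print(level, "->", conformance_value(level))
```

Rejecting unknown levels up front keeps a typo such as "PDF/A-1u" (a level that doesn't exist) from reaching the server.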

For more information on the long-term preservation of documents, have a look at our complete guide to PDF/A.

Converting Documents to PDF/A

To generate a PDF/A document using the Build API, specify the relevant options in the output section of the build instructions, as shown in the following example:

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/example-document.pdf \
  -F instructions='{
  "parts": [
    {
      "file": "document"
    }
  ],
  "output": {
    "type": "pdfa",
    "conformance": "pdfa-2a",
    "vectorization": true,
    "rasterization": true
  }
}' \
  -o result.pdf

The same request as raw HTTP:

POST /api/build HTTP/1.1
Host: localhost:5000
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="document"; filename="example-document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    {
      "file": "document"
    }
  ],
  "output": {
    "type": "pdfa",
    "conformance": "pdfa-2a",
    "vectorization": true,
    "rasterization": true
  }
}
--customboundary--
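The same request can be sketched in Python using only the standard library, for clients that don't call curl. This is a minimal sketch, not an official client. encode_multipart and convert_to_pdfa are hypothetical helper names, and the instructions are copied from the example above.

```python
import json
import urllib.request
import uuid

# Same instructions as in the example above.
INSTRUCTIONS = {
    "parts": [{"file": "document"}],
    "output": {
        "type": "pdfa",
        "conformance": "pdfa-2a",
        "vectorization": True,
        "rasterization": True,
    },
}


def encode_multipart(pdf_bytes: bytes, filename: str, instructions: dict,
                     boundary: str) -> bytes:
    """Encode the "document" and "instructions" fields as multipart/form-data,
    laid out like the raw HTTP example."""
    return b"\r\n".join([
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="document"; filename="{filename}"'.encode(),
        b"Content-Type: application/pdf",
        b"",
        pdf_bytes,
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="instructions"',
        b"Content-Type: application/json",
        b"",
        json.dumps(instructions).encode(),
        f"--{boundary}--".encode(),
        b"",  # trailing CRLF after the closing boundary
    ])


def convert_to_pdfa(base_url: str, api_token: str, pdf_bytes: bytes,
                    filename: str, instructions: dict) -> bytes:
    """POST the document to /api/build and return the response body."""
    # A random boundary is very unlikely to occur inside the PDF data.
    boundary = uuid.uuid4().hex
    request = urllib.request.Request(
        f"{base_url}/api/build",
        data=encode_multipart(pdf_bytes, filename, instructions, boundary),
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authorization": f"Token token={api_token}",
        },
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request) as response:
        return response.read()
```

To run it against a local Document Engine, read the source file into pdf_bytes and call convert_to_pdfa("http://localhost:5000", "<API token>", pdf_bytes, "example-document.pdf", INSTRUCTIONS), then write the returned bytes to result.pdf.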

Configuring PDF/A Conversion

PDF/A documents are intended for long-term preservation, and their structure differs from that of regular PDF documents. To ensure compliance with your chosen conformance level, the conversion process may change the document’s content or appearance, for example by adding, editing, or removing document structure elements or by embedding fonts.

In some cases, direct conversion isn’t possible. Document Engine then uses other techniques such as vectorization and rasterization:

  • Vectorization means that if some document elements cannot be used directly in the PDF/A output, they’re embedded in the output document as vector-based graphic elements. This technique is typically used for fonts and paths.

  • Rasterization means that if some document content cannot be used directly in the PDF/A output, it’s embedded in the output document as raster images.

Both approaches discard font and text information for the affected content, because the text is converted into shapes or raster images. The text can later be recovered using optical character recognition (OCR).

The vectorization and rasterization output options control whether Document Engine may use these techniques when necessary. Set an option to true to allow Document Engine to use that technique, or to false to disallow it.
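As a sketch of how these two options fit into the build instructions, the Python helper below builds the instructions in two variants: one with both techniques allowed, as in the conversion example, and one with both disallowed. pdfa_instructions is a hypothetical helper, not part of Document Engine. This guide doesn't describe what happens when a document can't be converted without these techniques, so test the second variant against your own files.

```python
import json


def pdfa_instructions(conformance: str, *, vectorization: bool,
                      rasterization: bool) -> dict:
    """Build instructions that convert the "document" part to PDF/A.

    vectorization/rasterization control whether Document Engine may fall back
    to those techniques when content can't be converted directly.
    """
    for name, value in (("vectorization", vectorization),
                        ("rasterization", rasterization)):
        # Reject strings or ints such as "true" or 1 so the JSON carries real booleans.
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a bool, got {type(value).__name__}")
    return {
        "parts": [{"file": "document"}],
        "output": {
            "type": "pdfa",
            "conformance": conformance,
            "vectorization": vectorization,
            "rasterization": rasterization,
        },
    }


# Both fallbacks allowed, as in the conversion example.
ALLOW_FALLBACKS = pdfa_instructions("pdfa-2a", vectorization=True, rasterization=True)

# Both fallbacks disallowed, so text isn't turned into shapes or images.
NO_FALLBACKS = pdfa_instructions("pdfa-2a", vectorization=False, rasterization=False)

if __name__ == "__main__":
    print(json.dumps(NO_FALLBACKS, indent=2))
```

The type check matters because json.dumps would serialize a string such as "true" as a JSON string rather than a boolean.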

Licensing

To convert documents to PDF/A with Document Engine, your Document Engine license must include the PDF/A API. Contact Sales to add the PDF/A API to your license. After it’s added, update the license or activation keys in your configuration.

PDF/A Validation

Document Engine also supports validating the conformance level of existing PDF/A documents. To learn more about how to validate PDF/A conformance, refer to the PDF/A validation guide.