Build API Reference for PDF Processor
PSPDFKit Processor has been deprecated and replaced by Document Engine. To start using Document Engine, refer to the migration guide. With Document Engine, you’ll have access to robust new capabilities (read the blog for more information).
Processor’s /build API allows you to assemble a PDF from multiple parts, such as an existing PDF, a blank page, or an HTML page. You can apply one or more actions, such as watermarking, rotating pages, or importing annotations. Once the entire PDF is generated from its parts, you can also apply additional actions, such as optical character recognition (OCR), to the assembled PDF itself.
Authentication
The /build API supports multiple types of authentication, which you can read about in our Authentication API guide.
Multipart Request
The basic use case for the /build API is to upload all inputs together with the build instructions in a single multipart/form-data request. In the following example, the request to the /build API consists of multiple parts and actions:
curl -X POST http://localhost:5000/build \
  -H "Authorization: Token token=secret" \
  -o result.pdf \
  -F document=@document.pdf \
  -F index=@index.html \
  -F logo=@logo.png \
  -F instructions='{
    "parts": [
      {
        "file": "document",
        "pages": { "start": 0, "end": 0 },
        "actions": [{ "type": "flatten" }]
      },
      { "html": "index" },
      { "page": "new", "pageCount": 3, "backgroundColor": "green" }
    ],
    "actions": [
      { "type": "watermark", "image": "logo", "width": "25%" }
    ],
    "output": { "type": "pdf" }
  }'
In the request, the output file is set to result.pdf. If the request succeeds, the output is written to this file.
The request requires three dependencies: document.pdf, index.html, and logo.png. The presence of any unused dependencies results in the request failing.
The instructions JSON consists of three fields: parts, actions, and output. In this example, the output file contains the following elements:
- The first page from the document.pdf file. This page is flattened, which embeds all annotations.
- The index.html file converted to a PDF document.
- Three newly created pages with a green background.
A watermark is then applied across all pages, and the output is converted to a PDF document.
For more information about all possible options for parts, actions, and output, refer to the relevant specification below.
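Because unused dependencies cause the request to fail, it can help to compare the attached part names with the names the instructions reference before sending anything. The following is a minimal client-side sketch of such a check. The helper names (collectReferencedNames, checkAttachments) and the simplified types are hypothetical and aren't part of the /build API; the sketch only looks at the file, html, image, and assets fields described in this reference.

```typescript
// Hypothetical client-side helper, not part of the /build API.
type Json = Record<string, unknown>;

interface SketchInstructions {
  parts: Json[];
  actions?: Json[];
  output?: Json;
}

// Collects every input that is referenced by part name (a plain string).
// URL inputs ({ url: ... }) are skipped because they aren't form parts.
function collectReferencedNames(instructions: SketchInstructions): string[] {
  const names: string[] = [];
  const add = (ref: unknown): void => {
    if (typeof ref === "string" && names.indexOf(ref) === -1) names.push(ref);
  };
  const addFromActions = (actions: unknown): void => {
    if (!Array.isArray(actions)) return;
    for (const action of actions as Json[]) {
      add(action["file"]); // applyInstantJson, applyXfdf
      add(action["image"]); // image watermark
    }
  };
  for (const part of instructions.parts) {
    add(part["file"]);
    add(part["html"]);
    const assets = part["assets"];
    if (Array.isArray(assets)) assets.forEach((asset) => add(asset));
    addFromActions(part["actions"]);
  }
  addFromActions(instructions.actions);
  return names;
}

// Reports attached parts that nothing references and references with no part.
function checkAttachments(
  instructions: SketchInstructions,
  attached: string[]
): { unused: string[]; missing: string[] } {
  const referenced = collectReferencedNames(instructions);
  return {
    unused: attached.filter((name) => referenced.indexOf(name) === -1),
    missing: referenced.filter((name) => attached.indexOf(name) === -1),
  };
}

// The instructions from the multipart example above.
const exampleInstructions: SketchInstructions = {
  parts: [
    { file: "document", pages: { start: 0, end: 0 }, actions: [{ type: "flatten" }] },
    { html: "index" },
    { page: "new", pageCount: 3, backgroundColor: "green" },
  ],
  actions: [{ type: "watermark", image: "logo", width: "25%" }],
  output: { type: "pdf" },
};

const attachmentCheck = checkAttachments(exampleInstructions, ["document", "index", "logo"]);
```

For the example request, both lists in attachmentCheck are empty. Attaching an extra part that the instructions never mention would show up under unused.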
Simple Request
The /build API supports inputs provided from remote URLs. If all inputs are provided as remote URLs, the multipart request isn’t necessary and can be simplified to a non-multipart request with the application/json content type.
For example, to provide all inputs from the previous example via URL, the processing request can be simplified to:
curl -X POST http://localhost:5000/build \
  -H "Authorization: Token token=secret" \
  -H "Content-Type: application/json" \
  -o result.pdf \
  --data '{
    "parts": [
      {
        "file": { "url": "https://remote-files-storage/document.pdf" },
        "pages": { "start": 0, "end": 0 },
        "actions": [{ "type": "flatten" }]
      },
      { "html": { "url": "https://remote-files-storage/index.html" } },
      { "page": "new", "pageCount": 3, "backgroundColor": "green" }
    ],
    "actions": [
      {
        "type": "watermark",
        "image": { "url": "https://remote-files-storage/logo.png" },
        "width": "25%"
      }
    ],
    "output": { "type": "pdf" }
  }'
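The step from the multipart form of the instructions to the URL-based form is mechanical: each part name is swapped for an object with a url field. The sketch below illustrates that transformation under a few assumptions. The function name withRemoteInputs is hypothetical, and only values stored under the file, html, and image keys are rewritten; the URLs are the ones used in the example above.

```typescript
// Hypothetical helper, not part of the /build API.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Keys whose string values are treated as multipart part names.
const INPUT_KEYS = ["file", "html", "image"];

// Returns a copy of `value` in which every part name that has an entry in
// `urls` is replaced with { url: ... }. Everything else is copied unchanged.
function withRemoteInputs(
  value: JsonValue,
  urls: { [name: string]: string }
): JsonValue {
  if (Array.isArray(value)) {
    return value.map((item) => withRemoteInputs(item, urls));
  }
  if (value === null || typeof value !== "object") {
    return value;
  }
  const result: { [key: string]: JsonValue } = {};
  for (const key of Object.keys(value)) {
    const child = value[key];
    const mapped =
      typeof child === "string" &&
      INPUT_KEYS.indexOf(key) !== -1 &&
      Object.prototype.hasOwnProperty.call(urls, child)
        ? urls[child]
        : undefined;
    result[key] = mapped !== undefined ? { url: mapped } : withRemoteInputs(child, urls);
  }
  return result;
}

const multipartInstructions: JsonValue = {
  parts: [
    { file: "document", pages: { start: 0, end: 0 }, actions: [{ type: "flatten" }] },
    { html: "index" },
    { page: "new", pageCount: 3, backgroundColor: "green" },
  ],
  actions: [{ type: "watermark", image: "logo", width: "25%" }],
  output: { type: "pdf" },
};

const remoteInstructions = withRemoteInputs(multipartInstructions, {
  document: "https://remote-files-storage/document.pdf",
  index: "https://remote-files-storage/index.html",
  logo: "https://remote-files-storage/logo.png",
});

// Body for a request sent with the application/json content type.
const simpleRequestBody = JSON.stringify(remoteInstructions);
```

Once no part names remain in the instructions, the serialized object can be sent as the body of the non-multipart request.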
API Specification
This section provides a full reference of the /build API.
Inputs
Input files and assets used during processing can be provided either as a part in the multipart/form-data request or as a URL.
- Part name: Parts in the multipart/form-data request are referenced by their names:
type InputName = string;
- Remote URL: Inputs for processing can be provided with a URL:
type InputUrl = {
  // Valid absolute URL that's accessible by the Processor service.
  url: string;
  // An optional SHA256 hash of the file content. Used to check downloaded file integrity if provided.
  sha256?: string;
};
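If you want to supply the optional sha256 field, the hash has to be computed from the file content before the request is built. The sketch below does this with the Node.js crypto module. The helper name toInputUrl is hypothetical, and encoding the digest as a lowercase hex string is an assumption: this reference doesn't state which encoding the service expects.

```typescript
import { createHash } from "crypto";

// Hypothetical helper: builds an InputUrl-shaped object with an integrity hash.
// Assumption: the digest is sent as a lowercase hex string.
function toInputUrl(
  url: string,
  content: Uint8Array | string
): { url: string; sha256: string } {
  const sha256 = createHash("sha256").update(content).digest("hex");
  return { url, sha256 };
}

// In practice, `content` would be the bytes of the remote file,
// for example the result of fs.readFileSync on a local copy.
const hashedInput = toInputUrl("https://remote-files-storage/document.pdf", "abc");
```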
Instructions
When making requests to the API, the instructions object needs to follow the given specification. The parts parameter is required, and the actions and output parameters are optional:
type Instructions = {
parts: Part[];
actions?: Action[];
output?: Output;
};
Parts
Each entry in the parts array can be one of the types outlined below.
File Part
This is used to import an already existing PDF file:
type Part = {
  // The input file that will be used as a source for the part.
  file: InputName | InputUrl;
  // Password of the file, if it's password protected.
  password?: string;
  // The page range to include in the part.
  pages?: {
    // The page index can be negative, indicating an offset from the last page, with -1
    // being the last page itself.
    start?: number | 'first' | 'last';
    end?: number | 'first' | 'last';
  };
  // The following actions will only be applied to the given part, and
  // not the entire document.
  actions?: Action[];
};
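To make the page range semantics concrete, the sketch below resolves a pages object to absolute indexes for a document with a known page count. It's an illustration only, and it rests on several assumptions: indexes are 0-based and the end index is inclusive (as the earlier example with start 0 and end 0 selecting the first page suggests), an omitted start means the first page, and an omitted end means the last page. The function names and the error handling are hypothetical.

```typescript
type PageIndex = number | "first" | "last";

// Converts a single index to an absolute 0-based index.
// Negative numbers count back from the last page, so -1 is the last page.
function resolveIndex(index: PageIndex, pageCount: number): number {
  if (index === "first") return 0;
  if (index === "last") return pageCount - 1;
  return index < 0 ? pageCount + index : index;
}

// Assumption: a missing `start` defaults to the first page and a missing
// `end` defaults to the last page. The range is treated as inclusive.
function resolvePageRange(
  pages: { start?: PageIndex; end?: PageIndex },
  pageCount: number
): { start: number; end: number } {
  const start = resolveIndex(pages.start ?? "first", pageCount);
  const end = resolveIndex(pages.end ?? "last", pageCount);
  if (start < 0 || end >= pageCount || start > end) {
    throw new RangeError(`Invalid page range ${start}..${end} for ${pageCount} pages`);
  }
  return { start, end };
}

// The last two pages of a five-page document.
const lastTwoPages = resolvePageRange({ start: -2 }, 5);
```

Under these assumptions, lastTwoPages covers indexes 3 and 4.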
HTML Part
This is used to generate a PDF from HTML. For detailed information on generating a PDF from HTML, check out our PDF Generation guide:
type PageSize =
  | 'A0'
  | 'A1'
  | 'A2'
  | 'A3'
  | 'A4'
  | 'A5'
  | 'A6'
  | 'A7'
  | 'A8'
  | 'Letter'
  | 'Legal';

type Part = {
  // The HTML file that will be used as a source for the part.
  html: InputName | InputUrl;
  // All assets imported in the HTML. Reference the name passed in the multipart request.
  assets?: Array<string>;
  layout?: {
    orientation?: 'landscape' | 'portrait';
    // {width, height} in mm or page size preset.
    size?:
      | {
          width: number;
          height: number;
        }
      | PageSize;
    margin?: {
      // Margin sizes in mm.
      left: number;
      top: number;
      right: number;
      bottom: number;
    };
  };
  actions?: Action[];
};
New Page Part
This is used to create a new blank page:
type Part = {
  page: 'new';
  // The number of blank pages to add.
  pageCount?: number;
  // Background color of the blank pages.
  // Specified with predefined color names,
  // or with RGB, HEX, HSL, RGBA, or HSLA values.
  backgroundColor?: string;
  layout?: {
    orientation?: 'landscape' | 'portrait';
    size?: {
      width: number;
      height: number;
    };
    margin?: {
      left?: number;
      right?: number;
      top?: number;
      bottom?: number;
    };
  };
  // The following actions will only be applied to the given part, and
  // not the entire document.
  actions?: Action[];
};
Actions
An action can be one of the types outlined below.
Apply JSON
This applies the given Instant JSON file to the document, and it’s used to import annotations to a document. It can also be used to fill forms:
type Action = {
  type: 'applyInstantJson';
  // The input file in Instant JSON format.
  file: InputName | InputUrl;
};
Apply XFDF
This applies the given XFDF file to the document, and it’s used to import annotations to a document. It can also be used to fill forms:
type Action = {
  type: 'applyXfdf';
  // The input file in XFDF format.
  file: InputName | InputUrl;
};
Flatten
This flattens the annotations in the given part or document:
type Action = {
  type: 'flatten';
  // Optional list of annotation IDs to flatten. These can be annotation IDs or `pdfObjectId`s.
  // If provided, only the annotations with the given IDs will be flattened.
  annotationIds?: string[];
};
Insert Page
This inserts a blank page in the middle of a document:
type Action = {
  type: 'insertPage';
  // Either one of `afterPageIndex` or `beforePageIndex` is required.
  afterPageIndex?: integer;
  beforePageIndex?: integer;
  pageHeight: number;
  pageWidth: number;
  rotateBy?: 90 | 180 | 270;
  // Background color of the blank pages.
  // Specified with predefined color names,
  // or with RGB, HEX, HSL, RGBA, or HSLA values.
  backgroundColor?: string;
};
OCR
This performs optical character recognition (OCR) in the given document. The list of supported languages can be found in the supported languages guide:
type Action = {
  type: 'ocr';
  language: string;
};
Rotate Page
This rotates all the pages of the given document by the angle specified:
type Action = {
  type: 'rotate';
  rotateBy: 90 | 180 | 270;
};
Watermark
This adds an image or text watermark on all the pages of the given document:
type Action = TextWatermarkAction | ImageWatermarkAction;

type TextWatermarkAction = {
  type: 'watermark';
  text: string;
  // Both width and height are required.
  width: Dimension;
  height: Dimension;
  // Only one out of top or bottom is allowed.
  top?: Position;
  bottom?: Position;
  // Only one out of right or left is allowed.
  right?: Position;
  left?: Position;
  rotation?: integer;
  opacity?: number;
  fontSize?: integer;
  fontColor?: string;
  // `italic` and `bold` are supported styles.
  // They can be used together.
  fontStyle?: string[];
  fontFamily?: string;
};

type ImageWatermarkAction = {
  type: 'watermark';
  // The input image file that will be used as a source for the image watermark.
  image: InputName | InputUrl;
  // Both width and height are required.
  width: Dimension;
  height: Dimension;
  // Only one out of top or bottom is allowed.
  top?: Position;
  bottom?: Position;
  // Only one out of right or left is allowed.
  right?: Position;
  left?: Position;
  rotation?: integer;
  opacity?: number;
};

type Dimension = {
  value: integer;
  unit: 'pt' | '%';
};

type Position = {
  value: integer;
  unit: 'pt' | '%';
};
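The constraints stated in the comments of the watermark types (width and height are both required, and top/bottom as well as left/right are mutually exclusive) can be checked on the client before sending a request. The sketch below is one way to do that. The WatermarkSketch type and the validateWatermark function are hypothetical, the sketch follows the type definitions above rather than any server-side validation logic, and it uses number in place of integer.

```typescript
type Measure = { value: number; unit: "pt" | "%" };

// Simplified, hypothetical shape covering both text and image watermarks.
interface WatermarkSketch {
  type: "watermark";
  text?: string;
  image?: string | { url: string };
  width?: Measure;
  height?: Measure;
  top?: Measure;
  bottom?: Measure;
  left?: Measure;
  right?: Measure;
}

// Returns a list of violated constraints. An empty list means the action
// satisfies the rules stated in the type comments.
function validateWatermark(action: WatermarkSketch): string[] {
  const errors: string[] = [];
  if (action.width === undefined || action.height === undefined) {
    errors.push("both width and height are required");
  }
  if (action.top !== undefined && action.bottom !== undefined) {
    errors.push("only one of top or bottom is allowed");
  }
  if (action.left !== undefined && action.right !== undefined) {
    errors.push("only one of left or right is allowed");
  }
  return errors;
}

// A text watermark anchored to the top-left corner.
const draftWatermark: WatermarkSketch = {
  type: "watermark",
  text: "DRAFT",
  width: { value: 50, unit: "%" },
  height: { value: 50, unit: "%" },
  top: { value: 10, unit: "pt" },
  left: { value: 10, unit: "pt" },
};

const draftWatermarkErrors = validateWatermark(draftWatermark);
```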
Create Redactions
This creates redaction annotations according to the given strategy. Once redactions are created, they need to be applied using the applyRedactions action:
type Content = {
  fillColor?: string; // default is "#000000"
  overlayText?: string; // default is null
  repeatOverlayText?: boolean; // default is false
  color?: string; // default is "#F82400"
  outlineColor?: string; // default is "#F82400"
  creatorName?: string; // default is null
  customData?: object;
};

type Action = {
  type: 'createRedactions';
  strategy: 'preset' | 'regex' | 'text';
  strategyOptions: PresetOption | RegexOption | TextOption;
  content?: Content;
};

type PresetOption = {
  preset:
    | 'credit-card-number'
    | 'date'
    | 'email-address'
    | 'international-phone-number'
    | 'ipv4'
    | 'ipv6'
    | 'mac-address'
    | 'north-american-phone-number'
    | 'social-security-number'
    | 'time'
    | 'url'
    | 'us-zip-code'
    | 'vin';
  include_annotations?: boolean;
  case_sensitive?: boolean;
};

type RegexOption = {
  regex: string;
  include_annotations?: boolean;
  case_sensitive?: boolean;
};

type TextOption = {
  text: string;
  include_annotations?: boolean;
  case_sensitive?: boolean;
};
Apply Redactions
This applies the redactions created by an earlier createRedactions action:
type Action = {
  type: 'applyRedactions';
};
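Since created redactions only take effect once applyRedactions runs, the two actions are typically paired. The sketch below builds such a pair for a preset strategy and checks an action list for created redactions that are never applied. The function names are hypothetical, and the sketch assumes that actions are processed in the order in which they appear in the array, which this reference doesn't state explicitly.

```typescript
type RedactionPreset = "credit-card-number" | "email-address" | "url";

type RedactionAction =
  | {
      type: "createRedactions";
      strategy: "preset";
      strategyOptions: { preset: RedactionPreset };
    }
  | { type: "applyRedactions" };

// Builds a create/apply pair for one preset (only a few presets are listed
// in RedactionPreset to keep the sketch short).
function redactByPreset(preset: RedactionPreset): RedactionAction[] {
  return [
    { type: "createRedactions", strategy: "preset", strategyOptions: { preset } },
    { type: "applyRedactions" },
  ];
}

// Returns true if some createRedactions action has no applyRedactions after it.
// Assumption: actions run in array order.
function hasUnappliedRedactions(actions: Array<{ type: string }>): boolean {
  let pending = false;
  for (const action of actions) {
    if (action.type === "createRedactions") pending = true;
    if (action.type === "applyRedactions") pending = false;
  }
  return pending;
}

const emailRedactionActions = redactByPreset("email-address");
```

The resulting array can be used as the actions field of the instructions, or as the actions of a single part.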
Output
The output object needs to follow this specification:
type Output = PDFOutput | PDFAOutput | ImageOutput | JsonContentOutput;

type BasePDFOutput = {
  owner_password?: string;
  user_password?: string;
  user_permissions?: UserPermissions[];
  metadata?: Metadata;
  labels?: Label[];
  optimize?: Optimize;
};

type PDFOutput = BasePDFOutput & { type: 'pdf' };

type PDFAOutput = BasePDFOutput & {
  type: 'pdfa';
  // Default is "pdfa-1b".
  conformance?:
    | 'pdfa-1a'
    | 'pdfa-1b'
    | 'pdfa-2a'
    | 'pdfa-2u'
    | 'pdfa-2b'
    | 'pdfa-3a'
    | 'pdfa-3u'; // The currently supported conformance levels.
  vectorization?: boolean; // `true` by default.
  rasterization?: boolean; // `true` by default.
};

type ImageOutput = {
  type: 'image';
  format: 'jpg' | 'jpeg' | 'png' | 'webp';
  // The default is to render the first page.
  pages?: {
    // The page index can be negative, indicating an offset from the last page, with -1
    // being the last page itself.
    start?: number | 'first' | 'last';
    end?: number | 'first' | 'last';
  };
  // One of width, height, or DPI needs to be specified.
  width?: number;
  height?: number;
  dpi?: number;
};

// Extracts document contents and returns them as a JSON.
type JsonContentOutput = {
  type: 'json-content';
  // When set to true, extracts document text. Text is extracted via OCR process.
  plainText: boolean;
  // When set to true, extracts structured document text. This includes text words, characters, lines, and paragraphs.
  structuredText: boolean;
  // When set to true, extracts key-value pairs detected within the document contents.
  // Example of detected values are phone numbers, email addresses, currencies, numbers, dates.
  keyValuePairs: boolean;
  // When set to true, extracts table data from the document.
  tables: boolean;
  // Specifies the language to be used for OCR text extraction. Supports the same set of languages as OCR action.
  language: string;
};

type UserPermissions =
  | 'printing'
  | 'modification'
  | 'extract'
  | 'annotations_and_forms'
  | 'fill_forms'
  | 'extract_accessibility'
  | 'assemble'
  | 'print_high_quality';

// Represents metadata for a PDF.
type Metadata = {
  title?: string;
  author?: string;
};

// Represents a label and the pages associated with it.
type Label = {
  pages: integer[]; // 0-based page index.
  label: string;
};

// Represents the available compression options for the PDF output.
type Optimize = {
  grayscaleText?: boolean; // `false` by default.
  grayscaleGraphics?: boolean; // `false` by default.
  grayscaleFormFields?: boolean; // `false` by default.
  grayscaleAnnotations?: boolean; // `false` by default.
  disableImages?: boolean; // `false` by default.
  mrcCompression?: boolean; // `false` by default.
  linearize?: boolean; // `false` by default.
  // The range is between 1 and 4, where 1 is low and 4 is very high. The default value is 2.
  imageOptimizationQuality?: integer;
};
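Two of the constraints in the output specification are only stated in comments: an image output needs a width, height, or DPI, and imageOptimizationQuality must be an integer between 1 and 4. The sketch below checks both on the client. The OutputSketch type and the validateOutput function are hypothetical and cover only the fields needed for these two checks. The sketch treats "one of" as "at least one", since this reference doesn't say whether combining the fields is accepted.

```typescript
// Simplified, hypothetical subset of the Output types above.
type OutputSketch =
  | { type: "pdf" | "pdfa"; optimize?: { imageOptimizationQuality?: number } }
  | {
      type: "image";
      format: "jpg" | "jpeg" | "png" | "webp";
      width?: number;
      height?: number;
      dpi?: number;
    };

// Returns a list of violated constraints; an empty list means none were found.
function validateOutput(output: OutputSketch): string[] {
  const errors: string[] = [];
  if (output.type === "image") {
    // Assumption: at least one of the three sizing fields must be present.
    if (
      output.width === undefined &&
      output.height === undefined &&
      output.dpi === undefined
    ) {
      errors.push("image output needs width, height, or dpi");
    }
  } else {
    const quality = output.optimize?.imageOptimizationQuality;
    if (
      quality !== undefined &&
      (Math.floor(quality) !== quality || quality < 1 || quality > 4)
    ) {
      errors.push("imageOptimizationQuality must be an integer from 1 to 4");
    }
  }
  return errors;
}

// A PNG rendering at 300 DPI.
const pngOutputErrors = validateOutput({ type: "image", format: "png", dpi: 300 });
```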