PDF Editing and Document Operations API for Linux
PSPDFKit Processor has been deprecated and replaced by Document Engine. To start using Document Engine, refer to the migration guide. With Document Engine, you’ll have access to robust new capabilities (read the blog for more information).
The POST /process API has been deprecated, and it may be removed in a future version of PSPDFKit. To perform document operations on a PDF, please use the build API instead.
Processing a Document
To process a document, submit a multipart/form-data request to the POST /process API endpoint.
Available headers for POST /process are outlined below.
- Authorization (optional): The JSON Web Token (JWT).
- pspdfkit-pdf-password (optional): The password required for the PDF document to be processed.
- X-Request-Id (optional): If this is set, the log statements associated with the HTTP request are marked with a request_id label. Logs correlated with the same request have the same request ID. This helps you determine which request triggered a specific response and what errors or warnings were emitted during the request processing. The request ID needs to be between 20 and 200 characters long.
Available parameters for POST /process are outlined below.
- "file", "url", or "generation":
  - "file": The document to be processed.
  - "url": The URL of the document to be processed.
  - "generation": A JSON object describing how the document should be generated. See the PDF Generation schema guide for more information.
- "operations" (optional): The JSON object describing the operations to be performed on the supplied document. For all available operations, see the available operations guide.
- Attachment data for the operations (optional): For example, the XFDF to be imported when using the applyXfdf document operation.
Request
POST /process
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token="JWT Token"
pspdfkit-pdf-password: "PDF Password"

--customboundary
Content-Disposition: form-data; name="file"; filename="Example Document.pdf"
Content-Type: application/pdf

<Document data>
--customboundary
Content-Disposition: form-data; name="operations"
Content-Type: application/json

<Operations JSON>
--customboundary--
curl -H "Authorization: Token token=JWT_TOKEN" \
  -F file=@/path/to/document.pdf \
  -F operations="{\"operations\":[{\"type\": \"flattenAnnotations\"}]}" \
  http://localhost:5000/process \
  --output result.pdf
Response
HTTP/1.1 200 OK
Content-Type: application/pdf

<PDF data>
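The same request can be assembled programmatically. The following is a minimal sketch in Python using only the standard library; the helper name build_process_request, the document.pdf filename, and the generated boundary are illustrative assumptions, not part of the API. It builds the request without sending it.

```python
import json
import urllib.request
import uuid


def build_process_request(base_url, pdf_bytes, operations, jwt=None):
    """Build (but do not send) a multipart/form-data request for POST /process.

    Hypothetical helper: the part names "file" and "operations" come from the
    parameter list above; everything else is an illustrative choice.
    """
    boundary = uuid.uuid4().hex
    # The "file" part carries the PDF to be processed.
    file_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="document.pdf"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8") + pdf_bytes + b"\r\n"
    # The "operations" part carries the operations JSON.
    operations_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="operations"\r\n'
        "Content-Type: application/json\r\n\r\n"
        f"{json.dumps(operations)}\r\n"
    ).encode("utf-8")
    closing = f"--{boundary}--\r\n".encode("utf-8")

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if jwt is not None:
        # Same header shape as the curl example above.
        headers["Authorization"] = f"Token token={jwt}"
    return urllib.request.Request(
        f"{base_url}/process",
        data=file_part + operations_part + closing,
        headers=headers,
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; on success the response body is the processed PDF, which can be written to a file such as result.pdf.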
Available Operations
The following operations can be used in the POST /process API to modify documents:
type Rotation = 0 | 90 | 180 | 270;

type AddPageConfiguration = {
  backgroundColor: string, // #RRGGBB or rgb(number, number, number).
  pageWidth: number,
  pageHeight: number,
  rotateBy: Rotation,
  insets?: [number, number, number, number]
};

type Annotation = ...; // See watermark documentation for more information.

type Range = [min, max]; // 'min' and 'max' are inclusive.

type ImportPageIndex = Array<number | Range>;

type DocumentOperation =
  | {| type: "addPage", afterPageIndex: number, ...AddPageConfiguration |}
  | {| type: "addPage", beforePageIndex: number, ...AddPageConfiguration |}
  | {| type: "duplicatePages", pageIndexes: Array |}
  | {| type: "movePages", pageIndexes: Array, afterPageIndex: number |}
  | {| type: "movePages", pageIndexes: Array, beforePageIndex: number |}
  | {| type: "rotatePages", pageIndexes: Array, rotateBy: Rotation |}
  | {| type: "keepPages", pageIndexes: Array |}
  | {| type: "removePages", pageIndexes: Array |}
  | {| type: "setPageLabel", pageIndexes: Array, pageLabel: string |}
  | {|
      type: "importDocument",
      afterPageIndex: number,
      importedPageIndexes?: ImportPageIndex,
      treatImportedDocumentAsOnePage: boolean,
      document: string
    |}
  | {|
      type: "importDocument",
      beforePageIndex: number,
      importedPageIndexes?: ImportPageIndex,
      treatImportedDocumentAsOnePage: boolean,
      document: string
    |}
  | {| type: "applyXfdf", dataFilePath: string |}
  | {| type: "applyInstantJson", dataFilePath: string |}
  | {| type: "performOcr", pageIndexes: Array, language: string |}
  | {|
      type: "flattenAnnotations",
      annotationIds?: Array,
      pageIndexes?: Array,
      noteAnnotationBackgroundColor?: string, // #RRGGBB
      noteAnnotationOpacity?: number // 0.0 - 1.0
    |}
  | {|
      type: "updateMetadata",
      metadata: {
        title?: string,
        author?: string,
      }
    |}
  | {| type: "watermark", pageIndexes: Array, annotation: Annotation |}
  | {|
      type: "createRedactions",
      strategy: "regex" | "preset" | "text",
      strategyOptions: object,
      content: ?{
        fillColor: ?string, // default is "#000000"
        overlayText: ?string, // default is null
        repeatOverlayText: ?boolean, // default is false
        color: ?string, // default is "#F82400"
        outlineColor: ?string, // default is "#F82400"
        creatorName: ?string, // default is null
        customData: ?object
      }
    |}
  | {| type: "applyRedactions" |};
addPage
The addPage operation allows you to add a single empty page to the document.
duplicatePages
The duplicatePages operation duplicates all pages at the given page indexes. Each duplicated page is placed directly after its original page.
movePages
The movePages operation moves the pages at the specified page indexes to a place before or after the specified target page index.
rotatePages
The rotatePages operation rotates the specified pages by the desired amount. If a page is already rotated, the specified rotation is added to it: if a page is already rotated 90 degrees and you apply a 90-degree rotation, the page ends up rotated 180 degrees.
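The additive behavior can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and wrapping around at 360 degrees is an assumption based on the Rotation type allowing only 0, 90, 180, and 270.

```python
VALID_ROTATIONS = (0, 90, 180, 270)


def resulting_rotation(current, rotate_by):
    """Return the page rotation after adding rotate_by to the current rotation.

    Assumption: the result wraps around at 360 degrees so that it stays
    within the values allowed by the Rotation type.
    """
    if current not in VALID_ROTATIONS or rotate_by not in VALID_ROTATIONS:
        raise ValueError("rotations must be one of 0, 90, 180, 270")
    return (current + rotate_by) % 360
```

With this sketch, resulting_rotation(90, 90) gives 180, matching the example in the text.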
keepPages
The keepPages operation will remove all pages except the ones specified to be kept. So if you specify [0], only the first page of the document will be kept, and all others will be removed.
removePages
The removePages operation will remove all specified pages.
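To make the difference between keepPages and removePages concrete, here is a small Python model that treats a document as a list of pages. It only illustrates the described semantics; the function names are hypothetical, and preserving the original page order is an assumption.

```python
def keep_pages(pages, page_indexes):
    """Model keepPages: drop every page whose index is not listed."""
    wanted = set(page_indexes)
    return [page for index, page in enumerate(pages) if index in wanted]


def remove_pages(pages, page_indexes):
    """Model removePages: drop every page whose index is listed."""
    unwanted = set(page_indexes)
    return [page for index, page in enumerate(pages) if index not in unwanted]
```

For a three-page document ["A", "B", "C"], keep_pages(..., [0]) leaves only the first page, while remove_pages(..., [0]) leaves the other two.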
setPageLabel
The setPageLabel operation will set the label for all specified pages. This label is, for example, shown in PSPDFKit for Android and PSPDFKit for iOS when scrolling pages.
importDocument
The importDocument operation allows you to add an existing PDF into your document. It is added either before or after the specified page index, depending on whether afterPageIndex or beforePageIndex is used. With the treatImportedDocumentAsOnePage option, all follow-up operations treat the imported document as a single page, which makes specifying indices easier.
importedPageIndexes may be used to import specific pages or a range of pages. If this parameter is omitted, the entire document will be imported.
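Because importedPageIndexes mixes single indexes with inclusive [min, max] ranges, it can help to see which pages a given value selects. The Python sketch below expands such a value into a flat list; the helper name and the "appendix" document value are hypothetical placeholders.

```python
def expand_import_page_indexes(imported_page_indexes):
    """Expand a mix of single indexes and inclusive [min, max] ranges."""
    expanded = []
    for entry in imported_page_indexes:
        if isinstance(entry, int):
            expanded.append(entry)
        else:
            low, high = entry
            # Both ends of a range are inclusive, per the Range type.
            expanded.extend(range(low, high + 1))
    return expanded


# Illustrative operation; "appendix" is a placeholder for however the
# imported document is referenced in your request.
import_operation = {
    "type": "importDocument",
    "afterPageIndex": 0,
    "importedPageIndexes": [0, [2, 4]],
    "treatImportedDocumentAsOnePage": False,
    "document": "appendix",
}
```

Here expand_import_page_indexes([0, [2, 4]]) selects pages 0, 2, 3, and 4 of the imported document.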
applyXfdf
The applyXfdf operation allows you to apply an existing XFDF file to the document. This will import all annotations found in the XFDF file and add them to the document.
applyInstantJson
The applyInstantJson operation allows you to apply an existing Instant JSON file to the document. This will import all annotations and fill the form fields with the values found in the Instant JSON.
performOcr
The performOcr operation allows you to run OCR on your document.
For a list of all languages supported by the performOcr operation, see here.
flattenAnnotations
The flattenAnnotations operation will flatten all annotations and form fields in the document, meaning they can no longer be modified. The note annotation options specify how flattened note annotations are rendered. Currently, you can change the background color and the opacity. The background color is only used if the note annotation doesn't have a color set. To flatten a subset of annotations, you can specify annotation IDs (either the IDs returned by the GET /annotations request or the pdfObjectIds of the annotations) or page indexes.
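As a rough illustration of how the optional fields fit together, the following Python sketch builds a flattenAnnotations operation and checks the two value formats noted in the type definition (#RRGGBB and 0.0 to 1.0). The helper is hypothetical, and the client-side validation is an illustrative choice, not documented server behavior.

```python
import re

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def flatten_annotations_operation(annotation_ids=None, page_indexes=None,
                                  note_background_color=None, note_opacity=None):
    """Build a flattenAnnotations operation, omitting unset optional fields."""
    operation = {"type": "flattenAnnotations"}
    if annotation_ids is not None:
        operation["annotationIds"] = list(annotation_ids)
    if page_indexes is not None:
        operation["pageIndexes"] = list(page_indexes)
    if note_background_color is not None:
        if not HEX_COLOR.fullmatch(note_background_color):
            raise ValueError("noteAnnotationBackgroundColor must look like #RRGGBB")
        operation["noteAnnotationBackgroundColor"] = note_background_color
    if note_opacity is not None:
        if not 0.0 <= note_opacity <= 1.0:
            raise ValueError("noteAnnotationOpacity must be between 0.0 and 1.0")
        operation["noteAnnotationOpacity"] = note_opacity
    return operation
```

Calling the helper with no arguments yields the same operation used in the curl example, which flattens everything in the document.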
updateMetadata
The updateMetadata operation allows you to change the title and author metadata stored in a PDF.
watermark
The watermark operation allows you to add a specified annotation to all specified pages.
annotation is an Instant JSON annotation, as described here.
Example Operations Object:
{
  "operations": [
    {
      "type": "watermark",
      "pageIndexes": "all",
      "annotation": {
        "horizontalAlign": "left",
        "bbox": [
          510.794701986755,
          145.13907284768214,
          101.03311258278146,
          20.344370860927157
        ],
        "font": "Helvetica",
        "rotation": 0,
        "pageIndex": 0,
        "updatedAt": "2019-07-09T06:55:33.426Z",
        "verticalAlign": "top",
        "type": "pspdfkit/text",
        "opacity": 0.5,
        "text": "Text annotation",
        "fontColor": "#000000",
        "fontSize": 72,
        "isFitting": true,
        "createdAt": "2019-07-09T06:55:24.320Z",
        "v": 1,
        "name": "1a287131-0473-402e-8094-097cb49083e2"
      }
    }
  ]
}
This adds a simple free text annotation on all pages.
createRedactions
The createRedactions operation allows you to batch create redaction annotations based on specified search criteria:
- strategy determines how PSPDFKit Processor finds the places to redact, and the shape of strategyOptions.
- content is optional and allows you to override the default values we use for the created redaction annotations.
Usage of this feature requires you to license the “Redaction” component.
preset Strategy
The preset strategy creates redaction annotations on top of both text and annotations that match one of the predefined patterns:
{
  type: "createRedactions",
  strategy: "preset",
  strategyOptions: {
    preset:
      | "credit-card-number"
      | "date"
      | "email-address"
      | "international-phone-number"
      | "ipv4"
      | "ipv6"
      | "mac-address"
      | "north-american-phone-number"
      | "social-security-number"
      | "time"
      | "url"
      | "us-zip-code"
      | "vin",
    includeAnnotations: ?boolean // default is true
  }
}
includeAnnotations determines whether redactions should also be created on top of annotations that include the matching text.
Note that the provided presets are intentionally broad, so a preset might also match other types of data. When you're not sure about the results, review the redaction annotations visually before applying them.
A created redaction annotation covers the matching text exactly or, in the case of annotations, the entire annotation's bounding box.
regex Strategy
The regex strategy creates redaction annotations on top of text and annotations that match the provided regular expression:
{
  type: "createRedactions",
  strategy: "regex",
  strategyOptions: {
    regex: string,
    includeAnnotations: ?boolean, // default is true
    caseSensitive: ?boolean
  }
}
includeAnnotations determines whether redactions should also be created on top of annotations that include the matching text.
The regular expression follows the ICU standard, which is described in detail here. To escape regex control characters (e.g. "." or "?"), you need to put a double backslash ("\\") in front of them. By default, the regular expression is case sensitive, but you can change that by setting the caseSensitive parameter to false.
If the regular expression is invalid, no redaction annotations are created.
A created redaction annotation covers the matching text exactly or, in the case of annotations, the entire annotation's bounding box.
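One plausible reading of the double-backslash rule is that it comes from JSON string escaping: a single backslash in the pattern must be written as two backslashes inside a JSON string. That explanation is an assumption, not something the guide states. The Python sketch below shows a JSON encoder producing the doubled form automatically; the pattern itself is a made-up example.

```python
import json

# A pattern with escaped control characters, written with single
# backslashes as the regex engine should see it.
pattern = r"\d{3}\.\d{2}"

regex_operation = {
    "type": "createRedactions",
    "strategy": "regex",
    "strategyOptions": {
        "regex": pattern,
        "caseSensitive": False,
    },
}

# json.dumps escapes each backslash, so the serialized payload contains
# the double-backslash form.
encoded = json.dumps({"operations": [regex_operation]})

# Decoding restores the original single-backslash pattern.
decoded_pattern = json.loads(encoded)["operations"][0]["strategyOptions"]["regex"]
```

If you write the operations JSON by hand instead, as in the curl example, you would type the doubled backslashes yourself.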
text Strategy
The text strategy creates redaction annotations on top of text and annotations that match a literal search pattern:
{
  type: "createRedactions",
  strategy: "text",
  strategyOptions: {
    text: string,
    includeAnnotations: ?boolean, // default is true
    caseSensitive: ?boolean
  }
}
The text property inside strategyOptions is a search query. includeAnnotations determines whether redactions should also be created on top of annotations that include the matching text.
Note that, by default, the search query is case insensitive, but you can change this by setting caseSensitive to true.
A created redaction annotation covers the matching text exactly or, in the case of annotations, the entire annotation's bounding box.
applyRedactions
The applyRedactions operation applies all redaction annotations that exist in the document. It is always run as the last step, no matter where it is placed in the list of operations. After redactions are applied, the redaction annotations are removed from the document. Only the content configured in each redaction annotation is left in their place, and all other content beneath them is permanently removed.
Usage of this feature requires you to license the “Redaction” component.
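A typical pairing is to create redactions and then apply them in the same request. The Python sketch below builds such an operations object and models the ordering rule described above, where applyRedactions runs last regardless of its position. The effective_order helper is a hypothetical illustration of that rule, not the server's implementation.

```python
def effective_order(operations):
    """Model the rule that applyRedactions always runs as the last step."""
    others = [op for op in operations if op["type"] != "applyRedactions"]
    applies = [op for op in operations if op["type"] == "applyRedactions"]
    return others + applies


redaction_request = {
    "operations": [
        # Listed first here, but described as running last.
        {"type": "applyRedactions"},
        {
            "type": "createRedactions",
            "strategy": "preset",
            "strategyOptions": {"preset": "email-address"},
        },
    ]
}

ordered_types = [op["type"] for op in effective_order(redaction_request["operations"])]
```

In this model, ordered_types is ["createRedactions", "applyRedactions"], so the redactions created in the same request are the ones that get applied.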