PDF Editing and Document Operations API for Linux

Information

PSPDFKit Processor has been deprecated and replaced by Document Engine. To start using Document Engine, refer to the migration guide. With Document Engine, you’ll have access to robust new capabilities (read the blog for more information).

Warning

The POST /process API has been deprecated, and it may be removed in a future version of PSPDFKit. To perform document operations on a PDF, please use the build API instead.

Processing a Document

To process a document, submit a multipart/form-data request to the POST /process API endpoint.

Available headers for POST /process are outlined below.

  • Optional: Authorization — The JSON Web Token (JWT).

  • Optional: pspdfkit-pdf-password — The password required for the PDF document to be processed.

  • Optional: X-Request-Id — If this is set, the log statements associated with the HTTP request are marked with a request_id label. Logs correlated with the same request have the same request ID. This helps you determine which request triggered a specific response and what errors or warnings were emitted during the request processing. The request ID needs to be between 20 and 200 characters long.

Available parameters for POST /process are outlined below.

  • "file", "url", or "generation":

    • "file" — The document to be processed.

    • "url" — The URL of the document to be processed.

    • "generation" — A JSON object describing how the document should be generated. See the PDF Generation schema guide for more information.

  • Optional: "operations" — The JSON object describing the operations to be performed on the supplied document. For all available operations, see the available operations guide.

  • Optional: Attachment data for the operations — For example, the XFDF to be imported when using the applyXfdf document operation.

Request

POST /process
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token="JWT Token"
pspdfkit-pdf-password: "PDF Password"

--customboundary
Content-Disposition: form-data; name="file"; filename="Example Document.pdf"
Content-Type: application/pdf

<Document data>
--customboundary
Content-Disposition: form-data; name="operations"
Content-Type: application/json

<Operations JSON>
--customboundary--

curl -H "Authorization: Token token=JWT_TOKEN" \
  -F "file=@Example Document.pdf" \
  -F operations="{\"operations\":[{\"type\": \"flattenAnnotations\"}]}" \
  http://localhost:5000/process \
  --output result.pdf

Response

HTTP/1.1 200 OK
Content-Type: application/pdf

<PDF data>
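
The request layout shown above can also be assembled programmatically. The following is a minimal sketch that uses only the Python standard library; the helper name build_process_request and the placeholder PDF bytes are illustrative assumptions, not part of PSPDFKit Processor.

```python
import json
import uuid


def build_process_request(pdf_bytes, operations, filename="document.pdf", boundary=None):
    """Return (headers, body) for a multipart/form-data POST /process request.

    Hypothetical helper: it only mirrors the request layout shown above.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    parts = [
        # Part 1: the document, sent under the "file" parameter.
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/pdf",
        b"",
        pdf_bytes,
        # Part 2: the operations JSON, sent under the "operations" parameter.
        delimiter,
        b'Content-Disposition: form-data; name="operations"',
        b"Content-Type: application/json",
        b"",
        json.dumps(operations).encode(),
        # Closing boundary.
        delimiter + b"--",
        b"",
    ]
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return headers, b"\r\n".join(parts)


headers, body = build_process_request(
    b"%PDF-1.7 placeholder bytes",  # stand-in for real PDF data
    {"operations": [{"type": "flattenAnnotations"}]},
    boundary="customboundary",
)
```

The returned headers and body could then be passed to an HTTP client, for example urllib.request.Request(url, data=body, headers=headers, method="POST"), with the Authorization header added as in the request above.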

Available Operations

The following operations can be used in the POST /process API to modify documents:

type Rotation = 0 | 90 | 180 | 270;

type AddPageConfiguration = {
  backgroundColor: string, // #RRGGBB or rgb(number, number, number).
  pageWidth: number,
  pageHeight: number,
  rotateBy: Rotation,
  insets?: [number, number, number, number]
};

type Annotation = ...; // See watermark documentation for more information.

type Range = [number, number]; // [min, max]; both bounds are inclusive.
type ImportPageIndex = Array<number | Range>;

type DocumentOperation =
  | {| type: "addPage", afterPageIndex: number, ...AddPageConfiguration |}
  | {| type: "addPage", beforePageIndex: number, ...AddPageConfiguration |}
  | {| type: "duplicatePages", pageIndexes: Array<number> |}
  | {| type: "movePages", pageIndexes: Array<number>, afterPageIndex: number |}
  | {| type: "movePages", pageIndexes: Array<number>, beforePageIndex: number |}
  | {| type: "rotatePages", pageIndexes: Array<number>, rotateBy: Rotation |}
  | {| type: "keepPages", pageIndexes: Array<number> |}
  | {| type: "removePages", pageIndexes: Array<number> |}
  | {| type: "setPageLabel", pageIndexes: Array<number>, pageLabel: string |}
  | {|
      type: "importDocument",
      afterPageIndex: number,
      importedPageIndexes?: ImportPageIndex,
      treatImportedDocumentAsOnePage: boolean,
      document: string
    |}
  | {|
      type: "importDocument",
      beforePageIndex: number,
      importedPageIndexes?: ImportPageIndex,
      treatImportedDocumentAsOnePage: boolean,
      document: string
    |}
  | {|
      type: "applyXfdf",
      dataFilePath: string
    |}
  | {|
      type: "applyInstantJson",
      dataFilePath: string
    |}
  | {|
      type: "performOcr",
      pageIndexes: Array<number>,
      language: string
    |}
  | {|
      type: "flattenAnnotations",
      annotationIds?: Array,
      pageIndexes?: Array<number>,
      noteAnnotationBackgroundColor?: string, // #RRGGBB
      noteAnnotationOpacity?: number // 0.0 - 1.0
    |}
  | {|
      type: "updateMetadata",
      metadata: {
        title?: string,
        author?: string
      }
    |}
  | {|
      type: "watermark",
      pageIndexes: Array<number>,
      annotation: Annotation
    |}
  | {|
      type: "createRedactions",
      strategy: "regex" | "preset" | "text",
      strategyOptions: object,
      content: ?{
        fillColor: ?string, // default is "#000000"
        overlayText: ?string, // default is null
        repeatOverlayText: ?boolean, // default is false
        color: ?string, // default is "#F82400"
        outlineColor: ?string, // default is "#F82400"
        creatorName: ?string,   // default is null
        customData: ?object
      }
    |}
  | {|
      type: "applyRedactions"
    |};

addPage

The addPage operation allows you to add a single empty page to the document.

duplicatePages

The duplicatePages operation duplicates all pages at the given page indexes. Each duplicate is placed directly after its original page.
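
The placement rule can be modeled on a plain list of page names. This is only a sketch of the behavior described above; the function duplicate_pages is a hypothetical illustration and not the Processor's implementation.

```python
def duplicate_pages(pages, page_indexes):
    """Model of duplicatePages: each duplicate directly follows its original."""
    targets = set(page_indexes)
    result = []
    for index, page in enumerate(pages):
        result.append(page)
        if index in targets:
            # The copy lands immediately after the page it was made from.
            result.append(page)
    return result


duplicated = duplicate_pages(["A", "B", "C"], [0, 2])
print(duplicated)  # ['A', 'A', 'B', 'C', 'C']
```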

movePages

The movePages operation moves the pages at the specified page indexes to a position before or after a target page index, depending on whether beforePageIndex or afterPageIndex is used.

rotatePages

The rotatePages operation rotates the specified pages by the desired amount. If a page is already rotated, the specified rotation is added to it, so applying a 90-degree rotation to a page that's already rotated 90 degrees results in the page being rotated 180 degrees.
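
The additive behavior amounts to simple arithmetic. In the sketch below, the function name resulting_rotation is hypothetical, and wrapping the result at 360 degrees is an assumption that follows from the Rotation type only allowing 0, 90, 180, and 270.

```python
ALLOWED_ROTATIONS = (0, 90, 180, 270)


def resulting_rotation(current_rotation, rotate_by):
    """Return the page rotation after rotateBy is added to the current one."""
    if rotate_by not in ALLOWED_ROTATIONS:
        raise ValueError(f"rotateBy must be one of {ALLOWED_ROTATIONS}")
    # Assumption: the result wraps around at a full turn.
    return (current_rotation + rotate_by) % 360


print(resulting_rotation(90, 90))  # 180
```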

keepPages

The keepPages operation will remove all pages except the ones specified to be kept. So if you specify [0], only the first page of the document will be kept, and all others will be removed.

removePages

The removePages operation will remove all specified pages.
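
keepPages and removePages are complementary, which a small model on a list of page names makes visible. Both functions below are hypothetical illustrations, and preserving the original page order in the result is an assumption.

```python
def keep_pages(pages, page_indexes):
    """Model of keepPages: drop every page whose index is not listed."""
    keep = set(page_indexes)
    return [page for index, page in enumerate(pages) if index in keep]


def remove_pages(pages, page_indexes):
    """Model of removePages: drop every page whose index is listed."""
    remove = set(page_indexes)
    return [page for index, page in enumerate(pages) if index not in remove]


pages = ["A", "B", "C", "D"]
print(keep_pages(pages, [0]))       # ['A']
print(remove_pages(pages, [1, 2]))  # ['A', 'D']
```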

setPageLabel

The setPageLabel operation will set the label for all specified pages. This label is, for example, shown in PSPDFKit for Android and PSPDFKit for iOS when scrolling pages.

importDocument

The importDocument operation allows you to add an existing PDF to your document. It's added either before or after the specified page index, depending on whether beforePageIndex or afterPageIndex is used. With the treatImportedDocumentAsOnePage option, all follow-up operations treat the imported document as a single page, which makes specifying indices easier.

importedPageIndexes may be used to import specific pages or a range of pages. If this parameter is omitted, the entire document is imported.
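
Because importedPageIndexes mixes single indexes with inclusive [min, max] ranges, it can help to expand a value and check which pages it selects. The sketch below assumes that shape; the helper expand_import_page_indexes and the "appendix" value for document are hypothetical.

```python
def expand_import_page_indexes(imported_page_indexes):
    """Expand a mix of page indexes and inclusive [min, max] ranges."""
    pages = []
    for entry in imported_page_indexes:
        if isinstance(entry, int):
            pages.append(entry)
        else:
            start, end = entry
            # Both bounds of a range are inclusive.
            pages.extend(range(start, end + 1))
    return pages


operation = {
    "type": "importDocument",
    "afterPageIndex": 0,
    "importedPageIndexes": [0, [2, 4]],
    "treatImportedDocumentAsOnePage": False,
    "document": "appendix",  # hypothetical reference to the attached PDF
}

selected = expand_import_page_indexes(operation["importedPageIndexes"])
print(selected)  # [0, 2, 3, 4]
```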

applyXfdf

The applyXfdf operation allows you to apply an existing XFDF file to the document. This will import all annotations found in the XFDF file and add them to the document.

applyInstantJson

The applyInstantJson operation allows you to apply an existing Instant JSON file to the document. This will import all annotations and fill the form fields with the values found in the Instant JSON.

performOcr

The performOcr operation allows you to run OCR on your document.

For a list of all languages supported by the performOcr operation, see here.

flattenAnnotations

The flattenAnnotations operation will flatten all annotations and form fields in the document, meaning they can no longer be modified. The note annotation options are used to specify how flattened note annotations are rendered. Currently, you can change the background color and the opacity. The background color is only used if the note annotation doesn’t have a color set. To flatten a subset of annotations, you can specify the annotation IDs (these can be annotation IDs you get from the GET /annotations request, or they can be pdfObjectIds of the annotations) or page indexes.
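
A flattenAnnotations operation with its optional fields can be assembled and checked before sending. This is a sketch only: the builder flatten_annotations is a hypothetical helper, and its validation simply mirrors the #RRGGBB and 0.0 - 1.0 comments in the type definition.

```python
import re


def flatten_annotations(annotation_ids=None, page_indexes=None,
                        note_background_color=None, note_opacity=None):
    """Build a flattenAnnotations operation, including only the options given."""
    operation = {"type": "flattenAnnotations"}
    if annotation_ids is not None:
        operation["annotationIds"] = list(annotation_ids)
    if page_indexes is not None:
        operation["pageIndexes"] = list(page_indexes)
    if note_background_color is not None:
        if not re.fullmatch(r"#[0-9A-Fa-f]{6}", note_background_color):
            raise ValueError("noteAnnotationBackgroundColor must be #RRGGBB")
        operation["noteAnnotationBackgroundColor"] = note_background_color
    if note_opacity is not None:
        if not 0.0 <= note_opacity <= 1.0:
            raise ValueError("noteAnnotationOpacity must be between 0.0 and 1.0")
        operation["noteAnnotationOpacity"] = note_opacity
    return operation


flatten_first_page = flatten_annotations(
    page_indexes=[0], note_background_color="#FFD83F", note_opacity=0.8
)
```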

updateMetadata

The updateMetadata operation allows you to change the title and author metadata stored in a PDF.

watermark

The watermark operation allows you to add a specified annotation to all specified pages.

annotation is an Instant JSON annotation, as described here.

Example Operations Object:

{
	"operations": [
		{
			"type": "watermark",
			"pageIndexes": "all",
			"annotation": {
				"horizontalAlign": "left",
				"bbox": [
					510.794701986755,
					145.13907284768214,
					101.03311258278146,
					20.344370860927157
				],
				"font": "Helvetica",
				"rotation": 0,
				"pageIndex": 0,
				"updatedAt": "2019-07-09T06:55:33.426Z",
				"verticalAlign": "top",
				"type": "pspdfkit/text",
				"opacity": 0.5,
				"text": "Text annotation",
				"fontColor": "#000000",
				"fontSize": 72,
				"isFitting": true,
				"createdAt": "2019-07-09T06:55:24.320Z",
				"v": 1,
				"name": "1a287131-0473-402e-8094-097cb49083e2"
			}
		}
	]
}

This adds a simple free text annotation on all pages.

createRedactions

The createRedactions operation allows you to create redaction annotations in batch, based on specified search criteria:

  • strategy determines how PSPDFKit Processor finds the places to redact, and the shape of strategyOptions.

  • content is optional and allows you to override the default values we use for the created redaction annotations.

Usage of this feature requires you to license the “Redaction” component.

preset Strategy

The preset strategy creates redaction annotations on top of both text and annotations that match one of the predefined patterns:

{
  type: "createRedactions",
  strategy: "preset",
  strategyOptions: {
    preset: "credit-card-number"
          | "date"
          | "email-address"
          | "international-phone-number"
          | "ipv4"
          | "ipv6"
          | "mac-address"
          | "north-american-phone-number"
          | "social-security-number"
          | "time"
          | "url"
          | "us-zip-code"
          | "vin",
    includeAnnotations: ?boolean // default is true
  }
}

includeAnnotations determines whether redactions should also be created on top of annotations that include the matching text.

Note that the provided presets are designed in such a way that they might find matches across different types of data. When you’re not sure about the results, review the redaction annotations visually before applying them.

A created redaction annotation covers the matching text exactly, or in case of annotations, the whole annotation’s bounding box.

regex Strategy

The regex strategy creates redaction annotations on top of text and annotations that match the provided regular expression:

{
  type: "createRedactions",
  strategy: "regex",
  strategyOptions: {
    regex: string,
    includeAnnotations: ?boolean, // default is true
    caseSensitive: ?boolean
  }
}

includeAnnotations determines whether redactions should also be created on top of annotations that include the matching text.

The regular expression follows the ICU standard, which is described in detail here. To escape regex control characters (e.g. “.” or “?”), you need to put a double backslash (“\\”) in front of them. By default, the regular expression is case sensitive, but you can change that by setting the caseSensitive parameter to false.

If the regular expression is invalid, no redaction annotations are created.

A created redaction annotation covers the matching text exactly, or in case of annotations, the entire annotation’s bounding box.
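
The double backslash mentioned above is what a single regex escape looks like once it's written as JSON. The sketch below shows this with Python's json module, which adds the second backslash during serialization; the helper regex_redaction and the example pattern are hypothetical.

```python
import json


def regex_redaction(pattern, case_sensitive=True, include_annotations=True):
    """Build a createRedactions operation that uses the regex strategy."""
    return {
        "type": "createRedactions",
        "strategy": "regex",
        "strategyOptions": {
            "regex": pattern,
            "includeAnnotations": include_annotations,
            "caseSensitive": case_sensitive,
        },
    }


# The raw string holds one backslash, escaping the "." for the regex engine.
operation = regex_redaction(r"example\.com")

# Serializing to JSON doubles the backslash, producing "example\\.com".
payload = json.dumps({"operations": [operation]})
print(payload)
```

When writing the operations JSON by hand, as in the curl example, the doubled backslash has to be typed explicitly.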

text Strategy

The text strategy creates redaction annotations on top of text and annotations that match a literal search pattern:

{
  type: "createRedactions",
  strategy: "text",
  strategyOptions: {
    text: string,
    includeAnnotations: ?boolean, // default is true
    caseSensitive: ?boolean
  }
}

The text property inside strategyOptions is a search query. includeAnnotations determines whether redactions should also be created on top of annotations that include the matching text.

Note that, by default, the search query is case insensitive, but you can change this by setting caseSensitive to true.

A created redaction annotation covers the matching text exactly, or in case of annotations, the entire annotation’s bounding box.

applyRedactions

The applyRedactions operation applies all redaction annotations that exist in the document. It always runs as the last step, no matter where it's placed in the list of operations. After the redactions are applied, the redaction annotations are removed from the document, and only the content configured in each redaction annotation is left in their place. All other content in the redacted areas is permanently removed.

Usage of this feature requires you to license the “Redaction” component.
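
Creating and applying redactions are typically combined in one operations object. The following sketch builds one for a preset search; the helper redact_with_preset is hypothetical, and the preset names are copied from the preset strategy definition above.

```python
import json

PRESETS = {
    "credit-card-number", "date", "email-address",
    "international-phone-number", "ipv4", "ipv6", "mac-address",
    "north-american-phone-number", "social-security-number",
    "time", "url", "us-zip-code", "vin",
}


def redact_with_preset(preset):
    """Build an operations object that creates and then applies redactions."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return {
        "operations": [
            {
                "type": "createRedactions",
                "strategy": "preset",
                "strategyOptions": {"preset": preset},
            },
            # Listed last for readability; it runs last regardless of position.
            {"type": "applyRedactions"},
        ]
    }


operations_json = json.dumps(redact_with_preset("email-address"))
```

Because applying redactions permanently removes content, it may be preferable to send only the createRedactions operation first and review the resulting annotations visually.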