How to Use the OCR API
PSPDFKit Server has been deprecated and replaced by Document Engine. To start using Document Engine, refer to the migration guide. With Document Engine, you’ll have access to robust new capabilities (read the blog for more information).
This guide provides an overview of the OCR API and how to use it. For information on what OCR can do, please look here.
API Overview
PSPDFKit Server allows you to perform OCR using the performOcr document operation. The operation can be applied either directly on upload or to existing documents.
Running OCR on Upload
You can run OCR when uploading a document by providing performOcr inside the operations parameter (see the guide on running operations on document upload for more information). The following request runs OCR on the first page of the uploaded document:
    POST /api/documents
    Content-Type: multipart/form-data; boundary=customboundary
    Authorization: Token token="<secret token>"

    --customboundary
    Content-Disposition: form-data; name="operations"

    {
      "operations": [
        {
          "type": "performOcr",
          "language": "english",
          "pageIndexes": [0]
        }
      ]
    }
    --customboundary
    Content-Disposition: form-data; name="file"; filename="Example Document.pdf"
    Content-Type: application/pdf

    <PDF data>
    --customboundary--
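The same upload can be assembled from a script. The following is a minimal sketch using only the Python standard library; the function names `build_ocr_upload` and `upload_with_ocr`, the `base_url` argument, and the generated boundary are illustrative assumptions and not part of PSPDFKit Server. The endpoint, part names, and Authorization format are taken from the request above.

```python
import json
import urllib.request
import uuid


def build_ocr_upload(pdf_bytes, filename, language="english", page_indexes=(0,)):
    """Return (body, content_type) for a multipart upload that requests performOcr.

    Hypothetical helper: it mirrors the two parts of the request shown above.
    """
    boundary = "boundary-" + uuid.uuid4().hex
    operations = json.dumps(
        {
            "operations": [
                {
                    "type": "performOcr",
                    "language": language,
                    "pageIndexes": list(page_indexes),
                }
            ]
        }
    )
    # First part: the operations JSON. Second part: headers for the PDF file.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="operations"\r\n'
        "\r\n"
        f"{operations}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + pdf_bytes + tail, f"multipart/form-data; boundary={boundary}"


def upload_with_ocr(base_url, token, pdf_bytes, filename):
    """POST the document to /api/documents and return the raw response body.

    base_url is an assumption, for example the address of your own server.
    """
    body, content_type = build_ocr_upload(pdf_bytes, filename)
    request = urllib.request.Request(
        f"{base_url}/api/documents",
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            "Authorization": f'Token token="{token}"',
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

The sketch does not escape quotes in `filename`, so it is only suitable for simple file names. Building the body separately from sending it makes the request easy to inspect before it reaches the server.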
Applying OCR to Existing Documents
You can also run OCR on documents you have already uploaded by using the apply_operations endpoint:
    POST /api/documents/:document_id/apply_operations
    Content-Type: application/json
    Authorization: Token token="<secret token>"

    {
      "operations": [
        {
          "type": "performOcr",
          "language": "english",
          "pageIndexes": [0]
        }
      ]
    }
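As a rough illustration, the request above could be built in Python as follows. This is a sketch under stated assumptions: `build_apply_ocr_request` is a hypothetical helper name, and `base_url` stands for the address of your own server. Only the path, headers, and JSON payload come from the example above.

```python
import json
import urllib.parse
import urllib.request


def build_apply_ocr_request(base_url, token, document_id,
                            language="english", page_indexes=(0,)):
    """Build (but do not send) a POST request to the apply_operations endpoint.

    Hypothetical helper; base_url is an assumption about your deployment.
    """
    payload = {
        "operations": [
            {
                "type": "performOcr",
                "language": language,
                "pageIndexes": list(page_indexes),
            }
        ]
    }
    # Percent-encode the document ID so it is safe to place in the URL path.
    doc_path = urllib.parse.quote(str(document_id), safe="")
    return urllib.request.Request(
        f"{base_url}/api/documents/{doc_path}/apply_operations",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f'Token token="{token}"',
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`. Serializing the payload with `json.dumps` guarantees that the body is valid JSON, with quoted keys and no trailing commas.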
Performance Considerations
Running OCR is a CPU-bound, single-threaded operation. This means that performing many OCR operations in parallel on a single PSPDFKit Server instance can cause high load for extended periods of time. We ran performance tests on our development hardware (2.4 GHz 8-core Intel Core i9-9980HK, 32 GB RAM, running a single OCR operation at a time), which should give you an idea of the speeds you can expect on your server infrastructure:
- Running OCR on a 6-page document: ~35–40 seconds to run OCR on the entire document, ~6–11 seconds to run OCR on a single page.
- Running OCR on a 1-page document: ~3–4 seconds to run OCR on the page.
The following factors affect how fast OCR is performed:
- The number of pages in the document.
- The number of pages OCR is performed on.
- The content of the pages OCR is performed on.
- The single-threaded performance of your server hardware.
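Because each OCR operation keeps a CPU core busy, one possible way to avoid sustained high load is to cap how many OCR requests a client has in flight at once. The following is a minimal sketch of such a client-side throttle, not a feature of PSPDFKit Server: `run_ocr_jobs` is a hypothetical helper, and `run_ocr` is assumed to be a callable you supply that sends the OCR request for one document ID and returns when it finishes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_ocr_jobs(document_ids, run_ocr, max_parallel=1):
    """Call run_ocr(document_id) for every ID, at most max_parallel at a time.

    Results are returned in the same order as document_ids. An exception
    raised by run_ocr is re-raised when its result is collected.
    """
    # The pool size is the concurrency cap: no more than max_parallel
    # calls to run_ocr are ever active at the same moment.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_ocr, document_ids))
```

With `max_parallel=1`, the requests run one after another, which matches the conditions of the measurements above. A higher value trades longer individual OCR times for more overall throughput, depending on how many cores the server has available.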