How to Use the OCR Server
This guide provides an overview of the OCR API and how to use it. For information on what OCR can do, refer to the OCR overview guide.
API Overview
Document Engine allows you to perform OCR using the ocr action in the Build API. The action can be applied either directly on upload or to existing documents.
Running OCR on Upload
You can run OCR when uploading your document by providing the OCR action inside the instructions parameter.
curl -X POST http://localhost:5000/api/documents \
  -H "Authorization: Token token=<API token>" \
  -F instructions='{
    "parts": [ { "file": "document" } ],
    "actions": [ { "type": "ocr", "language": "english" } ]
  }' \
  -F document=@"/path/to/Example Document.pdf" \
  -o result.pdf
POST /api/documents HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [ { "file": "document" } ],
  "actions": [ { "type": "ocr", "language": "english" } ]
}
--customboundary
Content-Disposition: form-data; name="document"; filename="Example Document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary--
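The instructions payload can also be generated programmatically rather than written by hand. The following Python sketch is illustrative only: the helper name ocr_instructions is hypothetical, and it assumes that the "file" value must match the name of the multipart field that carries the PDF.

```python
import json


def ocr_instructions(part_name: str, language: str = "english") -> str:
    """Return the JSON for an `instructions` field that runs OCR on one uploaded file.

    Assumption: `part_name` is the name of the multipart form field holding the PDF.
    """
    instructions = {
        "parts": [{"file": part_name}],
        "actions": [{"type": "ocr", "language": language}],
    }
    return json.dumps(instructions)


if __name__ == "__main__":
    # Prints a JSON string suitable for the `instructions` form field.
    print(ocr_instructions("document"))
```

Building the JSON from a dictionary avoids quoting mistakes that are easy to make when embedding JSON in a shell command.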
Applying OCR to Existing Documents and Persisting the Result
You can also run OCR on documents you’ve already uploaded by using the apply_instructions endpoint.
curl -X POST http://localhost:5000/api/documents/:document_id/apply_instructions \
  -H "Authorization: Token token=<API token>" \
  -H "Content-Type: application/json" \
  -d '{
    "parts": [ { "document": { "id": "#self" } } ],
    "actions": [ { "type": "ocr", "language": "english" } ]
  }' \
  -o result.pdf
POST /api/documents/:document_id/apply_instructions HTTP/1.1
Content-Type: application/json
Authorization: Token token=<API token>

{
  "parts": [ { "document": { "id": "#self" } } ],
  "actions": [ { "type": "ocr", "language": "english" } ]
}
Note that the current document can be referred to by using the #self anchor:
{
"document": { "id": "#self" }
}
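As a rough illustration, the same request could be prepared from Python using only the standard library. The function name build_apply_ocr_request and the example document ID are hypothetical. The sketch only constructs the request, and sending it requires a running Document Engine instance.

```python
import json
import urllib.request


def build_apply_ocr_request(base_url: str, document_id: str, api_token: str,
                            language: str = "english") -> urllib.request.Request:
    """Prepare (but do not send) a POST to apply_instructions for an existing document."""
    instructions = {
        # "#self" refers to the document identified in the URL.
        "parts": [{"document": {"id": "#self"}}],
        "actions": [{"type": "ocr", "language": language}],
    }
    return urllib.request.Request(
        url=f"{base_url}/api/documents/{document_id}/apply_instructions",
        data=json.dumps(instructions).encode("utf-8"),
        headers={
            "Authorization": f"Token token={api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_apply_ocr_request("http://localhost:5000", "my-document-id", "<API token>")
    print(request.get_method(), request.full_url)
    # To actually send it against a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
```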
Performing OCR and Downloading the Result
You can also run OCR as part of a Build API request. This method allows you to upload input documents and retrieve the result without storing anything in Document Engine’s persistent storage.
curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F instructions='{
    "parts": [ { "file": "document" } ],
    "actions": [ { "type": "ocr", "language": "english" } ]
  }' \
  -F document=@"/path/to/Example Document.pdf" \
  -o result.pdf
POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [ { "file": "document" } ],
  "actions": [ { "type": "ocr", "language": "english" } ]
}
--customboundary
Content-Disposition: form-data; name="document"; filename="Example Document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary--
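To make the structure of the multipart body more concrete, here is a minimal Python sketch that assembles an equivalent payload by hand. The function name encode_build_request is hypothetical, the boundary is randomly generated instead of customboundary, and the sketch assumes the "file" value in the instructions names the multipart field ("document") that carries the PDF.

```python
import json
import uuid


def encode_build_request(pdf_bytes: bytes, filename: str, language: str = "english"):
    """Return (content_type, body) for a multipart Build API request that runs OCR."""
    boundary = uuid.uuid4().hex
    instructions = json.dumps({
        "parts": [{"file": "document"}],
        "actions": [{"type": "ocr", "language": language}],
    })
    # First part: the JSON instructions. Second part: headers for the PDF itself.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="instructions"\r\n'
        "Content-Type: application/json\r\n\r\n"
        f"{instructions}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    # Closing delimiter that terminates the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", head + pdf_bytes + tail
```

The returned content type would be sent as the Content-Type header and the bytes as the request body of a POST to /api/build, together with the Authorization header shown above.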
Performance Considerations
Running OCR is a CPU-bound, single-threaded operation. This means that running many OCR operations in parallel on a single Document Engine instance can cause high load for extended periods of time. We ran performance tests on our development hardware (2.4 GHz 8-core Intel Core i9-9980HK, 32 GB RAM, running a single OCR operation at a time), which should give you an idea of the speed you can expect on your own server infrastructure:
- Running OCR on a 6-page document: ~35–40 seconds to run OCR on the entire document, ~6–11 seconds to run OCR on a single page.
- Running OCR on a 1-page document: ~3–4 seconds to run OCR on the page.
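For rough capacity planning, these figures can be turned into a back-of-the-envelope estimate. The sketch below is only an approximation under a stated assumption: it treats OCR time as growing linearly with page count, which real documents will not follow exactly because page content also matters. The function name estimate_ocr_seconds is hypothetical.

```python
def estimate_ocr_seconds(pages: int, per_page_low: float, per_page_high: float) -> tuple[float, float]:
    """Rough (low, high) duration estimate, assuming time grows linearly with page count."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * per_page_low, pages * per_page_high


if __name__ == "__main__":
    # Per-page averages implied by the 6-page measurement (35-40 s for 6 pages).
    low, high = 35 / 6, 40 / 6
    print(estimate_ocr_seconds(100, low, high))
```

Measuring a representative sample of your own documents on your own hardware will give far more reliable per-page figures than the ones used here.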
The following factors affect how fast OCR is performed:
- The number of pages in the document.
- The number of pages OCR is performed on.
- The content of the pages OCR is performed on.
- The single-threaded performance of your server hardware.
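Because each OCR operation occupies a CPU core for its whole duration, one way to keep load predictable is to cap how many OCR requests a client sends at once. The following Python sketch shows one possible approach using a bounded thread pool. The names run_ocr_jobs and run_ocr are hypothetical, where run_ocr stands in for whatever function sends a single OCR request, and the default limit of 2 is an arbitrary placeholder to tune for your hardware.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_ocr_jobs(document_ids: Iterable[str],
                 run_ocr: Callable[[str], str],
                 max_parallel: int = 2) -> List[str]:
    """Call `run_ocr` for each document with at most `max_parallel` calls in flight."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() yields results in the same order as the input IDs.
        return list(pool.map(run_ocr, document_ids))


if __name__ == "__main__":
    # Stand-in for a real request function; it only labels the ID.
    print(run_ocr_jobs(["doc-1", "doc-2", "doc-3"], lambda doc_id: f"{doc_id}: done"))
```

Remaining documents wait in the pool's queue until a worker is free, so the server never sees more than max_parallel OCR requests from this client at a time.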