How to Use the OCR Server

This guide provides an overview of the OCR API and how to use it. For information on what OCR can do, refer to the OCR overview guide.

API Overview

Document Engine allows you to perform OCR using the ocr action in the Build API. This can be either applied directly on upload or used with existing documents.

Running OCR on Upload

You can run OCR when uploading your document by providing the OCR action inside the instructions parameter.

curl -X POST http://localhost:5000/api/documents \
  -H "Authorization: Token token=<API token>" \
  -F instructions='{
  "parts": [
    {
      "file": "file-part"
    }
  ],
  "actions": [
    {
      "type": "ocr",
      "language": "english"
    }
  ]
}' \
  -F document=@/path/to/Example Document.pdf \
  -o result.pdf
POST /api/documents HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    {
      "file": "file-part"
    }
  ],
  "actions": [
    {
      "type": "ocr",
      "language": "english"
    }
  ]
}
--customboundary
Content-Disposition: form-data; name="document"; filename="Example Document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary--

Applying OCR to Existing Documents and Persisting the Result

You can also run OCR on documents you’ve already uploaded by using the apply_instructions endpoint.

curl -X POST http://localhost:5000/api/documents/:document_id/apply_instructions \
  -H 'Authorization: Token token=<API token>' \
  -H "Content-Type: application/json" \
  -d '{
  "parts": [
    {
      "document": {
        "id": "#self"
      }
    }
  ],
  "actions": [
    {
      "type": "ocr",
      "language": "english"
    }
  ]
}' \
  -o result.pdf
POST /api/documents/:document_id/apply_instructions HTTP/1.1
Content-Type: application/json
Authorization: Token token=<API token>

{
  "parts": [
    {
      "document": {
        "id": "#self"
      }
    }
  ],
  "actions": [
    {
      "type": "ocr",
      "language": "english"
    }
  ]
}

Note that the current document can be referred to by using the #self anchor:

{
  "document": { "id": "#self" }
}

Performing OCR and Downloading the Result

You can also run OCR as part of the Build API request. This method allows you to upload input documents and retrieve the result without storing anything in Document Engine’s persistent storage.

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F instructions='{
  "parts": [
    {
      "file": "file-part"
    }
  ],
  "actions": [
    {
      "type": "ocr",
      "language": "english"
    }
  ]
}' \
  -F document=@/path/to/Example Document.pdf \
  -o result.pdf
POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    {
      "file": "file-part"
    }
  ],
  "actions": [
    {
      "type": "ocr",
      "language": "english"
    }
  ]
}
--customboundary
Content-Disposition: form-data; name="document"; filename="Example Document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary--

Performance Considerations

Running OCR is a CPU-bound single-threaded operation. This means performing many parallel OCR operations on a single Document Engine instance can cause a high load for extended periods of time. We did some performance testing using our development hardware (2.4 GHz 8-core Intel Core i9 9980HK, 32 GB RAM, running a single OCR operation at a time), which should give you an idea of what kinds of speed you can expect given your server infrastructure:

  • Running OCR on a 6-page document: ~35–40 seconds to run OCR on the entire document, ~6–11 seconds to run OCR on a single page.

  • Running OCR on a 1-page document: ~3–4 seconds to run OCR on the page.

Things that affect how fast OCR will be performed:

  • The amount of pages in the document.

  • The amount of pages OCR will be performed on.

  • The content of the pages OCR will be performed on.

  • The single-threaded performance of your server hardware.