OCR
You can perform OCR (optical character recognition) on any document with PSPDFKit for Web.
OCR is available when using the Web SDK with Document Engine in server-backed operational mode.
To do so, open the document from Document Engine and apply the performOcr document operation with Instance.applyOperations:
await instance.applyOperations([ { type: "performOcr", language: "english", pageIndexes: "all" } ]);
This will detect all English text in the document and make it available for searching and manual text selection.
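Because OCR runs on the server, the call can fail for reasons outside the page (for example, a lost connection). The following is a minimal sketch of a wrapper that reports failure instead of throwing. The name runOcr is hypothetical, instance is assumed to be an already loaded PSPDFKit for Web instance, and the sketch assumes that a failed operation rejects the promise returned by Instance.applyOperations.

```javascript
// Hypothetical helper, not part of the PSPDFKit API.
// Assumes `instance` is an already loaded PSPDFKit for Web instance and that
// a failed operation rejects the promise returned by applyOperations.
async function runOcr(instance, language = "english") {
  try {
    await instance.applyOperations([
      { type: "performOcr", language, pageIndexes: "all" }
    ]);
    return true; // OCR finished; text should now be searchable and selectable
  } catch (error) {
    console.error("OCR failed:", error);
    return false;
  }
}
```

A caller could then branch on the result, for example by showing a notice when runOcr(instance) resolves to false.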
Other Languages
If your document is written in a language other than English, you can extract its text by modifying the language parameter. For example, to perform OCR in Spanish, run:
await instance.applyOperations([ { type: "performOcr", language: "spanish", pageIndexes: "all" } ]);
PSPDFKit for Web can perform OCR in the following languages:
- Croatian
- Czech
- Danish
- Dutch
- English
- Finnish
- French
- German
- Indonesian
- Italian
- Malay
- Norwegian
- Polish
- Portuguese
- Serbian
- Slovak
- Slovenian
- Spanish
- Swedish
- Turkish
- Welsh
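When the language comes from user input or configuration, it can help to check it against the list above before sending the operation. The sketch below shows one way to do that. The names SUPPORTED_OCR_LANGUAGES and buildOcrOperation are hypothetical, and the lowercase spelling of each language is an assumption extrapolated from the "english" and "spanish" examples.

```javascript
// Languages from the list above. The lowercase form is an assumption based on
// the "english" and "spanish" values used in the examples.
const SUPPORTED_OCR_LANGUAGES = new Set([
  "croatian", "czech", "danish", "dutch", "english", "finnish", "french",
  "german", "indonesian", "italian", "malay", "norwegian", "polish",
  "portuguese", "serbian", "slovak", "slovenian", "spanish", "swedish",
  "turkish", "welsh"
]);

// Hypothetical helper, not part of the PSPDFKit API: returns a performOcr
// operation for a supported language, or throws for anything else.
function buildOcrOperation(language) {
  const normalized = String(language).trim().toLowerCase();
  if (!SUPPORTED_OCR_LANGUAGES.has(normalized)) {
    throw new RangeError(`Unsupported OCR language: ${language}`);
  }
  return { type: "performOcr", language: normalized, pageIndexes: "all" };
}

// Usage, given an already loaded `instance`:
// await instance.applyOperations([buildOcrOperation("German")]);
```

Validating on the client avoids a round trip to Document Engine for a request that cannot succeed.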