How to OCR a PDF on Android

With OCR, you can enhance raster and vector PDFs to unlock previously inaccessible text and make it available for text selection, annotation, search, accessibility, and more. OCR builds on top of the PdfProcessor APIs, which offer a range of input and output sources to work with.

Performing OCR

To perform OCR on a document, create a new PdfProcessorTask and use its performOcrOnPages() method to specify the indexes of pages that OCR should be performed on, along with the language that should be used:

// Set up a set of all pages that should be processed.
val allPages: Set<Int> = (0 until document.pageCount).toSet()
// Create a task and configure it for OCR processing. Here, we'll detect English text.
val task = PdfProcessorTask.fromDocument(document)
    .performOcrOnPages(allPages, OcrLanguage.ENGLISH)

// Set up a set of all pages that should be processed.
final Set<Integer> allPages = new HashSet<>();
for (int pageIndex = 0; pageIndex < document.getPageCount(); pageIndex++) {
    allPages.add(pageIndex);
}
// Create a task and configure it for OCR processing. Here, we'll detect English text.
final PdfProcessorTask task = PdfProcessorTask.fromDocument(document)
    .performOcrOnPages(allPages, OcrLanguage.ENGLISH);
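The snippets above build a set containing every zero-based page index. Because OCR time grows with the number of pages, you may only want to process part of a document, such as a scanned appendix. The following is a minimal sketch in plain Java. The class and helper names (PageIndexes, pageRange, allPages) are hypothetical and not part of the Nutrient API. The helper builds the Set<Integer> that performOcrOnPages() accepts, clamps it to the document's page count, and uses IntStream in place of the manual loop:

```java
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class; not part of the Nutrient SDK.
final class PageIndexes {
    private PageIndexes() {}

    /**
     * Returns the zero-based page indexes from startInclusive up to
     * endExclusive, clamped to the valid range [0, pageCount).
     */
    static Set<Integer> pageRange(int pageCount, int startInclusive, int endExclusive) {
        int start = Math.max(0, startInclusive);
        int end = Math.min(pageCount, endExclusive);
        return IntStream.range(start, end) // empty when start >= end
                .boxed()
                .collect(Collectors.toSet());
    }

    /** Returns the indexes of every page in a document with pageCount pages. */
    static Set<Integer> allPages(int pageCount) {
        return pageRange(pageCount, 0, pageCount);
    }
}
```

For example, PageIndexes.pageRange(document.getPageCount(), 0, 3) selects at most the first three pages, and the result can be passed to performOcrOnPages() in place of allPages.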

After setting up the processor task, start processing the document by passing the task to one of PdfProcessor's document processing methods.

OCR processing time depends on factors such as the size of the document, the number of processed pages, and the device performing the processing. Always run processing off the main thread, either by using processDocumentAsync() or by calling one of the blocking processor methods on a background thread:

val outputFile = context.filesDir.resolve("processed-document.pdf")
// Keep the returned Disposable so you can dispose of it when you no longer need the result.
val disposable = PdfProcessor.processDocumentAsync(task, outputFile).subscribe()

final File outputFile = new File(context.getFilesDir(), "processed-document.pdf");
// Keep the returned Disposable so you can dispose of it when you no longer need the result.
final Disposable disposable = PdfProcessor.processDocumentAsync(task, outputFile).subscribe();
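If you choose one of the blocking processor methods instead, your code is responsible for moving the work off the main thread. Below is a minimal, SDK-independent sketch that uses a single-thread ExecutorService. The class name BackgroundOcrRunner is hypothetical, and the blocking work is represented by a generic Callable; in an app, that Callable is where you would invoke your chosen blocking PdfProcessor method (check the API reference for its exact signature). Delivering the result back to the UI thread is left out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper; the Callable stands in for a blocking PdfProcessor call.
final class BackgroundOcrRunner implements AutoCloseable {
    // A single worker thread, so queued OCR jobs run one after another
    // instead of competing with each other for CPU time.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "ocr-worker");
        thread.setDaemon(true); // don't keep the process alive just for this worker
        return thread;
    });

    /** Queues a blocking job on the worker thread; the caller returns immediately. */
    <T> Future<T> submit(Callable<T> blockingJob) {
        return executor.submit(blockingJob);
    }

    /** Lets already-queued jobs finish but accepts no new ones. */
    @Override
    public void close() {
        executor.shutdown();
    }
}
```

A single worker is a deliberate choice here: OCR is CPU-heavy, so running several documents in parallel on a phone rarely finishes the batch sooner.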

💡 Tip: The OCR processor can also handle pages that already contain some text. When a page has text streams for only part of its visible text, the OCR processor detects and embeds text for the missing areas while leaving the existing text streams untouched.

Language Selection

The OCR processor supports 21 different languages, each of which comes as a separate downloadable language pack. When calling performOcrOnPages(), pass the OcrLanguage value that corresponds to the language of the text you want to recognize.
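Because performOcrOnPages() takes a single OcrLanguage value, an app that processes user-supplied scans may want to choose a default based on the device locale. The sketch below uses a hypothetical local enum, OcrLanguageChoice, as a stand-in for the SDK's OcrLanguage. Only ENGLISH is confirmed by the example earlier in this guide, so check the OcrLanguage reference for the actual constant names before mapping to them. Unsupported locales fall back to English.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; OcrLanguageChoice stands in for the SDK's OcrLanguage enum.
final class OcrLanguageChooser {
    enum OcrLanguageChoice { ENGLISH, GERMAN, FRENCH, SPANISH }

    // Keys are ISO 639 language codes, as returned by Locale.getLanguage().
    private static final Map<String, OcrLanguageChoice> BY_ISO_CODE = Map.of(
            "en", OcrLanguageChoice.ENGLISH,
            "de", OcrLanguageChoice.GERMAN,
            "fr", OcrLanguageChoice.FRENCH,
            "es", OcrLanguageChoice.SPANISH);

    private OcrLanguageChooser() {}

    /** Maps a locale to an OCR language, falling back to English when unmapped. */
    static OcrLanguageChoice forLocale(Locale locale) {
        return BY_ISO_CODE.getOrDefault(locale.getLanguage(), OcrLanguageChoice.ENGLISH);
    }
}
```

A locale-based default is only a guess about the document's language, so letting users override it is usually worthwhile.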

ℹ️ Note: The first time you perform OCR for a particular language, Nutrient extracts the language pack data for that language from the app’s assets and copies it into the app’s private directory. This happens on the fly and only once per language.
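The "only once per language" behavior follows a common lazy-initialization pattern. The sketch below illustrates the idea only; it is not Nutrient's implementation, and the class name LanguagePackCache and the injected extractor function are hypothetical. ConcurrentHashMap.computeIfAbsent applies the extraction function at most once per key, even when several threads request the same language at the same time:

```java
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative only: an in-memory, once-per-key cache of extracted language packs.
final class LanguagePackCache {
    private final ConcurrentHashMap<String, Path> extracted = new ConcurrentHashMap<>();
    // Hypothetical extraction step: copies a pack out of the assets and returns its location.
    private final Function<String, Path> extractor;

    LanguagePackCache(Function<String, Path> extractor) {
        this.extractor = extractor;
    }

    /** Returns the pack location, running the extractor only on the first request for a language. */
    Path ensureExtracted(String language) {
        // computeIfAbsent is atomic per key; if the extractor throws, nothing is cached
        // and the next call retries.
        return extracted.computeIfAbsent(language, extractor);
    }
}
```

This cache lives only as long as the process. To make the "once" survive app restarts, a real implementation would also check whether the extracted files already exist on disk before copying.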