How to Unlock PDF Functionality with OCR in iOS

Stefan Kieleithner

January 6, 2021

With PSPDFKit 9.5 for iOS, we introduced functionality for running optical character recognition (OCR) on a PDF. This feature can be used to make inaccessible text in a PDF — whether due to the PDF being scanned or the text consisting of vector graphics — interactive, and it allows working with the text in a computer-readable format.

In this blog post, we’ll cover examples of how to add text-related functionality to a document that previously had inaccessible text by combining OCR and PDF functionality to work with the text.

Use Cases for OCR

We can use OCR to turn previously inaccessible text in a scanned or photographed image that has been converted to a PDF into a computer-readable format. This enables more functionality than when the text is only available as a visual representation. With access to the underlying text, we can perform various operations on the PDF that weren’t possible before. These include extracting words, selecting text, highlighting passages, and searching for phrases.

Integrating OCR Functionality

To take advantage of OCR functionality in an app, we need to integrate the PSPDFKitOCR framework and add the appropriate language bundles for the languages the app should be able to perform OCR in. For more details and complete instructions, head over to our OCR integration guide.

Performing OCR

Once PSPDFKitOCR has been integrated, our API can be used to perform OCR on a document. With the following snippet, OCR will be performed on the first page of a document. It detects English text and saves the resulting document to a new location before the document is shown onscreen:

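// Create a processor configuration for the document that needs OCR.
guard let processorConfiguration = Processor.Configuration(document: document) else { return }
// Mark the first page (index 0) for OCR, recognizing English text.
processorConfiguration.performOCROnPages(at: IndexSet(integer: 0), options: ProcessorOCROptions(language: .english))
let processor = Processor(configuration: processorConfiguration, securityOptions: nil)
let ocrURL: URL = ... // File URL for OCRed document to be saved at.
// OCR can take a few seconds, so write the file off the main thread.
DispatchQueue.global(qos: .userInitiated).async {
    do {
        try processor.write(toFileURL: ocrURL)
    } catch {
        print("Performing OCR failed: \(error)")
        return
    }
    // Switch back to the main thread to update the UI.
    DispatchQueue.main.async {
        let ocrDocument = Document(url: ocrURL)
        pdfController.document = ocrDocument
    }
}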

This creates a processor configuration from the document that should have OCR performed on it. Then we call the performOCROnPages(at:options:) method to mark pages for OCR. Here we need to provide an index set containing the pages that should be included in OCR, as well as the language the text should be recognized in.
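As a minimal sketch of how such an index set can be built, the following uses only Foundation. The `pageCount` value is a hypothetical stand-in; in an app it would come from the document itself.

```swift
import Foundation

// Hypothetical page count; in an app this would come from the document.
let pageCount = 5

// Every page of the document: indices 0 through pageCount - 1.
let allPages = IndexSet(integersIn: 0..<pageCount)

// Only selected pages, for example the first and the third.
let selectedPages = IndexSet([0, 2])

print(allPages.count)            // number of pages marked for OCR
print(selectedPages.contains(1)) // whether the second page is included
```

An index set like `allPages` could then be passed as the first argument of performOCROnPages(at:options:) in place of IndexSet(integer: 0). Keep in mind that running OCR on more pages takes correspondingly longer.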

Next, we create the processor with the configuration and set a URL where the output PDF should be saved. We then call the write method on a background queue: Writing the document to disk can take a few seconds, and we don’t want to block the main thread.

When the PDF has been created, we create the document from the URL the new file has been saved at, and we show the document in the PDFViewController.

You can find more details on and examples of how to use the API in our OCR Usage guide.

Working with Text on the Processed Document

Once OCR has been performed on a document, we can start enhancing it with text-related functionality — something which wasn’t possible before.

Extraction

Text can be extracted from a document after performing OCR. There are various APIs for getting text from a document’s pages, the most important of which is the text parser.

The text parser has the ability to get a page’s text in ways that are easy to work with. One way is by providing access to all the page’s words. Here’s an example of how to extract the first word of the first page:

let textParser = document.textParserForPage(at: 0)!
let word = textParser.words.first!

We’re using the document’s text parser to first get the textual representation of the page, and then we query the first object of its words property to get the first word on the page.
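The same approach extends to all words on a page. The following is a sketch, not a definitive implementation: `recognizedWords(in:onPageAt:)` is a hypothetical helper name, and it assumes that each word exposes its text through a `stringValue` property, as PSPDFKit's `Word` class does. It uses optional binding instead of force unwrapping, so a page without a text parser yields an empty array rather than a crash.

```swift
import PSPDFKit

/// Hypothetical helper: returns the text of every word recognized on a page.
func recognizedWords(in document: Document, onPageAt pageIndex: PageIndex) -> [String] {
    // The text parser can be nil, for example for an invalid page index.
    guard let textParser = document.textParserForPage(at: pageIndex) else { return [] }
    // Map each word object to its string (assumes `Word.stringValue`).
    return textParser.words.map { $0.stringValue }
}
```

Calling `recognizedWords(in: ocrDocument, onPageAt: 0)` on the document created earlier would return the words of the first page in the order the text parser reports them.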

Selection

Once the text has been recognized, we can make use of text selection. This can be done either with PSPDFKit’s UI or programmatically, and all recognized text is selectable. This code snippet shows how to select the first word of the first page using our API:

let textParser = document.textParserForPage(at: 0)!
let word = textParser.words.first!

let pageView = pdfController.pageViewForPage(at: 0)!
let selectionView = pageView.selectionView

selectionView.selectedGlyphs = textParser.glyphs(in: word.range)

We’re extracting the text of the first page to get the first word from it. Then we’re setting the selected glyphs property on the text selection view on the page view to mark a word as selected.

Highlight

Not only can we select text on the new document; we can also add highlights. Highlights are text markup annotations that will add a background to the selected text.

The code below shows how to highlight a selected phrase:

let highlightAnnotation = HighlightAnnotation.textOverlayAnnotation(with: selectionView.selectedGlyphs)!
document.add(annotations: [highlightAnnotation])

This will take the currently selected text, create a highlight annotation with the default style, and add it to the document.

Search

Using SearchViewController allows users of an app to search all text across a document. By default, the search UI can be accessed via the search button item that’s shown in the navigation bar of the PDF controller. This also works with text that has been recognized in a document that had OCR performed on it.
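Search can also be started from code. The sketch below assumes that `PDFViewController` offers a `search(for:options:sender:animated:)` method, as described in PSPDFKit's search guides; `showSearchResults(for:in:)` is a hypothetical wrapper name used only for illustration.

```swift
import PSPDFKit

/// Hypothetical helper: presents the search UI with results for `term`.
func showSearchResults(for term: String, in pdfController: PDFViewController) {
    // Assumed API: starts a document-wide search and shows the search UI.
    pdfController.search(for: term, options: nil, sender: nil, animated: true)
}
```

Because this triggers UI presentation, it should be called on the main thread, after the OCRed document has been set on the PDF controller.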

Conclusion

In this post, we went over how to combine OCR with text-related PDF functionality, on iOS and using Swift, to work with documents whose text wasn’t accessible before OCR was performed.

OCR is available across our product line on various platforms, so be sure to check out our product page covering OCR to find the product that fits your needs.