The How and Why of OCR / Providing Document Access to the Visually Impaired
While Optical Character Recognition (OCR) may seem like a newer technology, it’s been around for more than 50 years. In fact, OCR has become embedded in our daily life without much fanfare. For example, if you’ve ever inserted a check directly into an ATM and the ATM displayed the amount– that was OCR working for you. Of course, OCR functionality goes well beyond depositing Grandma’s birthday check.
Due to an overwhelming amount of user requests, OCR has been an important part of Muhimbi’s range of server-side PDF Conversion products ( SharePoint, SDK for Java / PHP / C#, SharePoint Online / Office 365. Implementing software to recognize images and convert them to alpha-numeric characters was no trivial task, but thankfully it’s much easier to explain than it was to actually implement!
When an image is entered into a system it is reviewed for recognizable text. That text is then deciphered by the system with its best guess for each individual character. The system then creates a hidden data layer that contains the deciphered content, synced to the appropriate space on the image.
There are a lot of ways this can be useful for a business and we have included a few examples below. Perhaps more than one will ring familiar to your organization’s needs-set.
If an organization needs to digitize old orders and invoices, doing so manually would involve discrete steps for scanning in the paper copies, renaming them, and storing them in the correct place. However, with OCR technology it’s possible to scan the images in, set rules to look for key information, rename files, and create settings to automatically store them appropriately. SharePoint workflows become super helpful with tasks like these!
Another example involves InfoPath, always a popular topic in our PDF Converter for SharePoint’s use case. It’s not uncommon for InfoPath forms to allow (or require) the attaching of relevant documents. Those docs are most often attached as images, or non-OCR PDFs. By having OCR scan and digitize the content of those files their later usability is significantly increased.
OCR also offers advantages that deal with search-ability. The content in the hidden data layer attached to the file is searchable using a PDF reader or web browser. This allows for “search by content” functionality. Additionally, this text layer can also be set to be crawl-able or index-able allowing search engines to display the OCR document as results. Naturally, this makes said documents much more convenient to work with.
Scanned Document with OCRed text selected
Perhaps most meaningfully, OCR can empower visually impaired users to access content that would be otherwise impossible; the data layer can be used by a text-to-speech system to ‘read’ content to a user. Of course, even though expanding available content to the visually impaired has obvious business value, the impact goes well beyond office work.
For a bit of history in how OCR became a dominant technology in providing content access to the visually impaired, we should start by mentioning that many governments around the world have implemented standards based off Web Content Accessibility Guidelines (WCAG), which has helped formalize how web content should be created and accessed by any machine. Some examples of governmental implementation include US section 508 and UK Equality Act of 2010, meaning that all US and UK government websites must adhere to the standards set in the WCAG.
The WCAG is a lasting legacy of the Web Accessibility Initiative, which spun out of the personal computer boom of the 1990s. As recently as a few years ago, only about 1% of published books became available in braille, so the WCAG and Web Accessibility Initiative have played an important role in setting up useful guidelines to make sure that online content was held to a higher standard.
The wide adoption of these standards means more electronic content has become available to the visually impaired, both through electronic braille readers (which can cost upwards of $3,000), and the less expensive combination of OCR technology and a screen reader. A screen reader, either as a desktop application or a browser extension, allows text-to-speech capabilities for both rich-text content as well as OCR saved content.
Furthermore, while personal, organizational, or corporate sites aren’t required to comply with these standards, most do because they’ve become widely accepted best-practices. This increases the prevalence and frequency with which OCR technology is used.
There are plenty of solid business reasons for including OCR capabilities into Muhimbi’s range of server side PDF Conversion products. However, we can’t help but think that bringing new content, and more options to those with a visual impairment is perhaps the most notable.
An overview of the various OCR facilities provided by our product range can be found in this Knowledge Base Article.
What do you think, is this something that could be useful for your organisation? Leave a comment below or contact our friendly support desk for more information. We love to help.