Optimize PDFs with advanced OCR features
The Muhimbi Document Converter supports a number of OCR (Optical Character Recognition) facilities, including the ability to make image-based PDFs (scans, faxes) fully searchable and indexable. It can also extract the recognised text, so that invoice numbers, purchase order numbers, or other identifying information can be used as part of a larger software or workflow process.
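As a rough illustration of the downstream step, the sketch below pulls an invoice number and a purchase order number out of text that OCR has already produced. It does not call the Muhimbi API: it assumes the extracted text is available as a plain string, and the function name `extract_identifiers` and both patterns are hypothetical and would need adjusting to the document layouts in use.

```python
import re

# Hypothetical patterns: real invoice and PO formats vary by organisation.
INVOICE_RE = re.compile(
    r"\bInvoice\s+(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)",
    re.IGNORECASE,
)
PO_RE = re.compile(
    r"\b(?:Purchase\s+Order|PO)\s+(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)",
    re.IGNORECASE,
)


def extract_identifiers(ocr_text):
    """Return the first invoice and PO number found in OCR output, or None."""
    invoice = INVOICE_RE.search(ocr_text)
    po = PO_RE.search(ocr_text)
    return {
        "invoice_number": invoice.group(1) if invoice else None,
        "po_number": po.group(1) if po else None,
    }
```

A workflow could pass the returned values on as document metadata or use them to route the file. OCR output is rarely perfect, so a missing value (`None`) is best treated as a cue for manual review.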
For more details and examples, see the following articles:
- The How and Why of OCR / Providing document access to the visually impaired
- OCR Facilities provided by Muhimbi’s server based PDF Conversion products
- Converting scans and images to searchable PDFs using Java and server side OCR
- Converting scans and images to searchable PDFs using C# and server side OCR
- Converting scans and images to searchable PDFs using SharePoint Designer Workflows
- Converting scans and images to searchable PDFs using OCR & Nintex Workflow
- Extract text from scanned content using OCR and SharePoint Designer Workflows
- Extract text from scanned content using OCR and Nintex Workflow
- Utilise 3rd party OCR Engines in Muhimbi’s range of Server Side PDF Products
Please note that in order to use OCR in a production environment, a valid add-on license for the OCR and PDF/A Archiving Add-on must be installed alongside a regular license.