OCR and data extraction

Give new life to scanned documents

Enable the seamless capture, processing, and integration of physical documents into your users’ digital workflows with robust scanning features.

Use cases

Create searchable PDFs from scanned documents

Capture and convert physical documents into digital formats using TWAIN and WIA scanning protocols, enabling easy integration into electronic workflows.

Extract data from PDFs

Use table extraction and data parsing to read semi-structured and unstructured data in PDFs and convert it into structured formats such as Excel.

Recognize text in multiple languages

Use OCR to accurately recognize and extract text in multiple languages, ensuring global compatibility and ease of processing diverse documents.

Automate data entry

Populate data fields in a database or other system by extracting structured tabular data from documents.
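
As a rough illustration of that last step, here is a minimal sketch in Python that loads already-extracted table rows into a database. It does not use the Nutrient API: the `extracted_rows` data, the `invoices` table, and the `load_rows` helper are all hypothetical, and SQLite stands in for whatever system you populate.

```python
import sqlite3

# Hypothetical output of a table-extraction step: a header row plus data rows.
# Real extraction results will have a different shape.
extracted_rows = [
    ["invoice_id", "customer", "total"],
    ["INV-001", "Acme Corp", "1250.00"],
    ["INV-002", "Globex", "349.90"],
]


def load_rows(conn, rows):
    """Insert extracted rows into a hypothetical invoices table.

    Returns the number of data rows inserted.
    """
    _header, *data = rows  # skip the header row
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "invoice_id TEXT PRIMARY KEY, customer TEXT, total REAL)"
    )
    # Parameterized inserts keep extracted text from being treated as SQL.
    conn.executemany(
        "INSERT INTO invoices (invoice_id, customer, total) VALUES (?, ?, ?)",
        [(row[0], row[1], float(row[2])) for row in data],
    )
    conn.commit()
    return len(data)


conn = sqlite3.connect(":memory:")
inserted = load_rows(conn, extracted_rows)
total = conn.execute("SELECT SUM(total) FROM invoices").fetchone()[0]
```

In practice you would validate each row (types, required fields) before inserting, since extracted text can contain recognition errors.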

Detect barcodes, OMR, MICR, and MRZ data

Leverage powerful, built-in functions for scanning, decoding, and improving the quality of data across a variety of document types.

AI Document Processing

Attain human-level precision in data classification and extraction from text and image documents without set rules or coding.

Components

How we help

OCR

Convert scanned documents to searchable text

Integrate optical character recognition (OCR) in your app with high accuracy in text extraction and support for multiple languages.

Data Extraction

Extract key-value pairs with an easy-to-use API

Apply heuristics, mathematics, and machine learning capabilities to automatically and accurately extract key-value pairs, tables, and structured data from documents.

Table Extraction

Turn complex tables into rich structured data

Transform tables inside your documents and scanned images, including bordered, semi-bordered, and borderless tables, into structured data or editable Excel (XLSX) files.

Image Processing

Edit, print, and preprocess raster and vector images

Improve OCR, OMR, and barcode detection with advanced character and symbol recognition. Use 500+ powerful low-level functions to clean up, manipulate, and edit more than 100 document and image formats.

Explore other use cases

Signing

Streamline contract execution, digital workflows, and approval processes within your apps with electronic and digital signatures.

Markup

Enhance the review and feedback process with document editing, highlighting, and annotating.

Intelligent Document Processing

Automate the extraction of data from semi-structured and unstructured documents.

Frequently asked questions

What features are available for OCR out of the box?

The following features and capabilities are available for OCR:

  • Full Unicode support
  • Multithread support
  • Character recognition confidence
  • Retrieving a character’s location
  • Retrieving font information (e.g. style, family)
  • Retrieving paragraph information (e.g. justification, alignment, bounding box)
  • Direct conversion from an image or image-based PDF to PDF
  • Extracting OCR results as text
  • Getting the OCR result based on internal GdPicture structures serialized as a JSON string
  • Recognizing only digits, only alphabetic characters, or results based on allowed or disallowed characters
  • OCR context support (defines if the engine is processing a document, a single word, a single character, a text block, vertical text, etc.)
  • Orientation detection
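
To show how features like character confidence, allowed characters, and JSON output can work together, here is a minimal Python sketch. It is not the Nutrient or GdPicture API: the JSON schema, the field names, and the `filter_characters` helper are hypothetical, and the filtering happens after recognition rather than inside the OCR engine.

```python
import json
import string

# Hypothetical OCR result serialized as JSON. The real structure returned by
# an OCR engine will differ; this schema is an assumption for illustration.
ocr_json = """
{"characters": [
  {"text": "4", "confidence": 99.1},
  {"text": "O", "confidence": 62.0},
  {"text": "7", "confidence": 91.5},
  {"text": "a", "confidence": 97.3},
  {"text": "2", "confidence": 48.0}
]}
"""


def filter_characters(result_json, allowed, min_confidence):
    """Keep characters that are in `allowed` and meet the confidence threshold."""
    characters = json.loads(result_json)["characters"]
    return "".join(
        c["text"]
        for c in characters
        if c["text"] in allowed and c["confidence"] >= min_confidence
    )


# Digits only, dropping anything recognized with less than 80% confidence.
digits_only = filter_characters(ocr_json, string.digits, 80.0)
```

Restricting the character set inside the engine, as the feature list describes, is generally preferable because it influences recognition itself. Post-filtering like this only discards results.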

How easy is the OCR setup process?

Setup is straightforward: our code snippets, samples, and complete documentation guide you through the process.

Does Nutrient handle advanced PDF OCR features?

Yes. To learn more, visit our OCR guide.

Which languages are supported by OCR?

Nutrient Web SDK can perform OCR in the following languages: Croatian, Czech, Danish, Dutch, English, Finnish, French, German, Indonesian, Italian, Malay, Norwegian, Polish, Portuguese, Serbian, Slovak, Slovenian, Spanish, Swedish, Turkish, and Welsh. To learn more, visit our OCR guide.

Is there a trial version of Nutrient OCR?

Yes. You can begin a free trial by selecting the free trial option at the top of each page.