Effortlessly extract data from PDFs and images

Information

PSPDFKit Processor has been deprecated and replaced by Document Engine. To start using Document Engine, refer to the migration guide. With Document Engine, you’ll have access to robust new capabilities (read the blog for more information).

Extract content and data from PDF documents and images. With our data extraction library, you can integrate a wide range of extraction capabilities into your application or workflow: extract text, images, key values, table and form data, and more. Leverage AI, ML, and adaptive layout understanding to accurately extract information from unstructured or semi-structured documents. Explore our guides and code samples to learn how to quickly integrate data extraction into your application.

Nutrient SDKs are deployed in some of the world’s most popular applications, such as those made by Autodesk, Disney, UBS, Dropbox, IBM, and Lufthansa.

Key Capabilities

  • Powered by AI and ML — 15+ years of continuous improvements in accuracy

  • Key-value pairs — Extract key values like phone numbers, IBANs, credit card numbers, and more

  • PDF tables — Extract structured table data from financial reports

  • Text and images — Extract from unstructured and semi-structured documents and images
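To give a rough sense of what recognizing typed key values such as IBANs and credit card numbers can involve, here is a minimal sketch in Python. It is not the Nutrient engine or its API: the function names (`luhn_valid`, `iban_valid`, `classify_value`) and the returned labels are hypothetical, and the approach shown (pattern matching plus the standard Luhn and IBAN mod-97 checksums) is only one common way to validate such values.

```python
import re

# Hypothetical helpers for illustration only; not part of any Nutrient SDK.


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 12:
        return False
    total = 0
    # Double every second digit, counting from the rightmost digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """Return True if `iban` passes the IBAN mod-97 check."""
    s = re.sub(r"\s+", "", iban).upper()
    # Country code, two check digits, then the account part.
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", s):
        return False
    rearranged = s[4:] + s[:4]
    # Letters map to numbers (A=10 ... Z=35); digits stay as they are.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def classify_value(text: str) -> str:
    """Guess a data type label for an extracted value string."""
    if iban_valid(text):
        return "iban"
    compact = re.sub(r"[\s-]", "", text)
    if compact.isdigit() and 12 <= len(compact) <= 19 and luhn_valid(compact):
        return "credit_card"
    return "text"
```

For example, `classify_value("GB82 WEST 1234 5698 7654 32")` yields `"iban"` and `classify_value("4111-1111-1111-1111")` yields `"credit_card"`. A production extraction engine would also need layout analysis to pair each value with its key, which this sketch does not attempt.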

Guides for Key-Value Pair Extraction

Key-Value Pair Overview
Learn about our key-value pair technology

How Key-Value Pair Extraction Works
Learn how to use our key-value pair engine

Data Model
Learn about the data model behind the extraction technology

Confidence Score
Learn how confidence scores are determined

Data Types
Learn about the automatically detected data types

Using the Data Extraction API
Learn how to extract data using the API

Extract Data from Bank Statements
Learn how to extract data from bank statements

Extract Data from Tables
Learn how to extract data from tables

Extract Text
Learn how to extract text from documents or images