Effortlessly extract data from PDFs and images
PSPDFKit Processor has been deprecated and replaced by Document Engine. To start using Document Engine, refer to the migration guide. With Document Engine, you’ll have access to robust new capabilities (read the blog for more information).
Extract content and data from PDF documents and images. With our data extraction library, you can integrate a wide range of extraction capabilities into your application or workflow: extract text, images, key-value pairs, table and form data, and more. Leverage AI, ML, and adaptive layout understanding to accurately extract information from unstructured or semi-structured documents. Explore our guides and code samples to learn how to integrate data extraction into your application.
Nutrient SDKs are deployed in some of the world’s most popular applications, such as those made by Autodesk, Disney, UBS, Dropbox, IBM, and Lufthansa.
Key Capabilities
- Powered by AI and ML: 15+ years of continuous improvements in accuracy
- Key-value pairs: Extract key values like phone numbers, IBANs, credit card numbers, and more
- PDF tables: Extract structured table data from financial reports
- Text and images: Extract from unstructured and semi-structured documents and images
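To make the idea of typed key values more concrete, here is a minimal Python sketch of how an application might validate extracted values such as IBANs and credit card numbers. It is an illustration only and does not use or reflect the product's API or detection logic. The function names (`is_valid_iban`, `is_valid_card_number`, `classify`) and the type labels are hypothetical. The checks themselves are the standard mod-97 IBAN checksum and the Luhn checksum.

```python
def is_valid_iban(value: str) -> bool:
    """Check the IBAN shape and its mod-97 checksum (hypothetical helper)."""
    s = value.replace(" ", "").upper()
    if not (s.isascii() and s.isalnum() and 15 <= len(s) <= 34):
        return False
    # Country code (two letters) followed by two check digits.
    if not (s[:2].isalpha() and s[2:4].isdigit()):
        return False
    # Move the first four characters to the end, then map A-Z to 10-35.
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def is_valid_card_number(value: str) -> bool:
    """Check a card number with the Luhn checksum (hypothetical helper)."""
    cleaned = value.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit() and 12 <= len(cleaned) <= 19):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(cleaned)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(value: str) -> str:
    """Assign an illustrative type label to an extracted value."""
    if is_valid_iban(value):
        return "iban"
    if is_valid_card_number(value):
        return "credit_card"
    return "text"
```

Running checksum validation like this on extracted values is one way an application could catch misread characters before acting on the data.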
Guides for Key-Value Pair Extraction
Key-Value Pair Overview
Learn about our key-value pair technology
How Key-Value Pair Extraction Works
Learn how to use our key-value pair engine
Data Model
Learn about the data model behind the extraction technology
Confidence Score
Learn how confidence scores are determined
Data Types
Learn about the automatically detected data types
Using the Data Extraction API
Learn how to extract data using the API
Extract Data from Bank Statements
Learn how to extract data from bank statements
Extract Data from Tables
Learn how to extract data from tables
Extract Text
Learn how to extract text from documents or images