AI DOCUMENT AND DATA EXTRACTION

Extract structured data from any document

Pull tables, key-value pairs, and handwritten text from PDFs, scanned documents, and images. Convert well-structured PDFs to Markdown in milliseconds, use Vision API (Python, Java) for layout-aware AI extraction with structured JSON output, or use AI Document Processing for LLM-driven classification and template-based validation.



Why teams build extraction with Nutrient

On-premises data processing

Every engine — including VLM — runs on your infrastructure. Connect a cloud provider only if you choose to. No data leaves your network unless you decide it should.

Automated field extraction

Describe the fields you need in plain language. The SDK extracts them from invoices, ID documents, forms, and receipts, and flags values that fail validation.

Structured JSON output

Vision API returns JSON with element coordinates, type classification, reading order, and confidence scores. Every element traces back to its source page and position.

100+ supported file types

PDFs, scanned images, Office files, CAD drawings, email, and 40+ camera RAW formats. Handles multicolumn layouts, mixed languages, handwriting, and skewed scans.

PDF TO MARKDOWN

Fast extraction from digitally born PDFs

For PDFs that already have embedded text and structure — reports, documentation, contracts — extract content as clean Markdown without AI models. The SDK reads the existing text layer; detects headings, paragraphs, tables, and lists; and outputs structured Markdown. No model downloads, no network access, no processing overhead. The fastest extraction option when your documents are already well-structured.
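As a minimal sketch of this path in Python (the import and function name below are placeholders for illustration, not the SDK's actual entry points), the whole workflow reduces to one local call:

```python
# Hypothetical names for illustration only; the real Python SDK API may differ.
# The point: a digitally born PDF converts to Markdown locally, with no model
# downloads and no network access.
from nutrient import pdf_to_markdown  # assumed helper

markdown = pdf_to_markdown("quarterly-report.pdf")
with open("quarterly-report.md", "w", encoding="utf-8") as f:
    f.write(markdown)
```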


VISION API

Layout-aware extraction with on-premises AI

Three extraction modes, one API — from fast character recognition to full layout analysis with vision language models. All modes can run on-premises. Output is structured JSON with element coordinates, reading order, and confidence scores.
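As a rough sketch of the "one API, three modes" idea (the module, class, and argument names here are illustrative assumptions, not the SDK's published interface):

```python
# Hypothetical names throughout: this sketches the shape of a single entry point
# with a mode switch, not the SDK's actual Python API.
from nutrient_vision import VisionClient  # assumed import for illustration

client = VisionClient()
for mode in ("ocr", "icr", "vlm"):
    result = client.extract("scan.pdf", mode=mode)   # same call, different engine
    print(mode, len(result["elements"]), "elements")
```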

OCR extraction

Fast character recognition with word-level bounding boxes. Converts pixels to text without structural context. Best for searchable, selectable text from clean documents.


ICR extraction

On-premises AI models for layout segmentation, table detection, handwriting, and equation recognition. Extracts tables with cell-level coordinates and resolves reading order. No network access required.


VLM-enhanced extraction

All three engines — OCR, ICR, and a vision language model — run in parallel on presegmented document regions. Handles degraded scans, complex handwriting, and unusual layouts. Run locally with Qwen, or connect Claude or OpenAI.


Merged output

OCR contributes exact characters, ICR contributes spatial precision and bounding boxes, and the VLM contributes document understanding. The merged result is more accurate than any engine alone — no hallucinated text, no missing fields, no lost coordinates.
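To show what consuming that merged output can look like, here is a sketch in plain Python. The element shape is illustrative, built from the fields described above (type, page, bounding box, reading order, confidence); the production schema's exact field names may differ.

```python
# Illustrative element records based on the fields described above; the real
# JSON schema may use different field names.
elements = [
    {"type": "heading",    "text": "Invoice #4821", "page": 1,
     "bbox": [72, 60, 420, 92],    "reading_order": 0, "confidence": 0.98},
    {"type": "paragraph",  "text": "Payment due within 30 days.", "page": 1,
     "bbox": [72, 120, 540, 160],  "reading_order": 1, "confidence": 0.95},
    {"type": "table_cell", "text": "1,250.00", "page": 1,
     "bbox": [380, 410, 460, 428], "reading_order": 7, "confidence": 0.88},
]

# Reassemble content in reading order and flag low-confidence elements for review.
for el in sorted(elements, key=lambda e: e["reading_order"]):
    flag = "" if el["confidence"] >= 0.9 else "  <- review"
    print(f"p{el['page']}  {el['type']:<10} {el['text']}{flag}")
```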

AI DOCUMENT PROCESSING

Classify documents and extract data with templates

Define extraction fields in plain language — no regex, no rule engines. An LLM identifies document types, extracts key-value pairs, and runs each value through built-in validators (IBAN, VAT ID, postal address, and eight more). Ships with 10 preconfigured templates.
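As a rough sketch of what describing fields in plain language can look like (the template structure, field names, and validator identifiers below are assumptions for illustration, not the SDK's actual template format):

```python
# Illustrative template definition; the structure and names are assumptions,
# not the SDK's actual format.
custom_template = {
    "name": "utility_bill",
    "fields": [
        {"name": "account_number",
         "description": "The customer account number printed near the top of the bill"},
        {"name": "amount_due",
         "description": "The total amount due, including currency",
         "validators": ["currency"]},
        {"name": "supplier_vat_id",
         "description": "The supplier's VAT identification number",
         "validators": ["vat_id"]},
    ],
}
```

The LLM maps each plain-language description to a value in the document; any attached validators then run against the extracted value.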

AI Document Processing capabilities

Document classification

The LLM identifies document types — invoices, purchase orders, resumes, payroll statements — and routes them to the correct extraction template. No rule authoring required.


Template-based extraction

Invoice, passport, ID card, receipt, and six more templates ship ready to use. Build custom templates by describing fields in plain language — the LLM handles the rest.


Built-in data validators

Eleven validators check extracted values against known formats: IBAN, VAT ID, credit card number, postal address, email, phone number, and more. Failed checks set the field to VerificationNeeded; an illustrative IBAN check appears below.


Choose your LLM provider

Connect OpenAI or Azure OpenAI today, with more providers coming. Your API key, your infrastructure rules — documents are processed through the provider you configure.
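To make the built-in validators above concrete, this is roughly what an IBAN integrity check amounts to. It is the standard ISO 13616 mod-97 test, shown here in plain Python as an illustration rather than as the SDK's own implementation.

```python
def iban_checksum_ok(iban: str) -> bool:
    """Standard ISO 13616 mod-97 integrity check (not the SDK's implementation)."""
    s = iban.replace(" ", "").upper()
    rearranged = s[4:] + s[:4]                              # move country code and check digits to the end
    digits = "".join(str(int(c, 36)) for c in rearranged)   # map A..Z to 10..35, keep digits as-is
    return int(digits) % 97 == 1

print(iban_checksum_ok("GB82 WEST 1234 5698 7654 32"))  # True: the commonly cited example IBAN
```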

DEMO

See AI Document Processing in action

Available on your platform

Vision API is available today in Python and Java SDKs, with .NET and DWS Processor API support coming soon. AI Document Processing is available today, with more platforms coming soon.



FREE TRIAL

Start extracting data from documents

Try Vision API in Python or Java, or get started with AI Document Processing — no payment information required.


Frequently asked questions

What are the different extraction approaches and when should I use each?

Nutrient offers three approaches. PDF to Markdown converts well-structured, digitally born PDFs into clean Markdown — no AI models, no network access, and the fastest option. Vision API extracts structured JSON with element coordinates, reading order, and classification using OCR, ICR, or VLM-enhanced mode — use it when you need spatial precision, or when you're handling scanned documents, handwriting, or complex layouts. AI Document Processing sits a level above — it uses an LLM to classify documents, extract key-value pairs via templates, and validate results. Use it for end-to-end workflows like invoice processing or ID verification.

Can Vision API process scanned PDFs and image-based documents?

Yes. Vision API supports more than 100 file types, including scanned PDFs, 50+ image formats (TIFF, JPEG, PNG, DICOM, HEIF, and raw camera formats), Microsoft Office documents, and CAD files. ICR mode uses on-premises AI models for layout detection and handwriting recognition without sending data to any external service.

What programming languages and platforms are supported?

Vision API is available today in the Python SDK (3.8+) and Java SDK (8+), with .NET and DWS Processor API support coming soon. AI Document Processing is available today, with cross-platform support coming soon.

Does the extraction run on-premises or require cloud connectivity?

PDF to Markdown, OCR, and ICR run fully on-premises — no network access required. VLM-enhanced mode can also run on-premises with a self-hosted model like Qwen, or you can connect a cloud provider like Claude or OpenAI. AI Document Processing requires an LLM provider (OpenAI or Azure OpenAI). In every configuration, you decide what leaves your infrastructure.

How does Vision API handle tables and complex document layouts?

ICR mode detects layout regions (headers, paragraphs, tables, and figures) and extracts tables with cell-level bounding-box coordinates. It preserves reading order across multicolumn layouts and identifies hierarchical relationships between document elements. In VLM-enhanced mode, all three engines run in parallel — the merged output catches merged cells, footnotes, and edge cases that ICR alone can miss.
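As a sketch of turning cell-level output into rows (the cell shape here is an assumption for illustration: row and column indices plus text and a bounding box; the actual ICR table schema may differ):

```python
# Illustrative cell records; the real ICR table output may be shaped differently.
cells = [
    {"row": 0, "col": 0, "text": "Item",     "bbox": [72, 400, 200, 418]},
    {"row": 0, "col": 1, "text": "Amount",   "bbox": [380, 400, 460, 418]},
    {"row": 1, "col": 0, "text": "Licenses", "bbox": [72, 420, 200, 438]},
    {"row": 1, "col": 1, "text": "1,250.00", "bbox": [380, 420, 460, 438]},
]

# Rebuild the grid from row/column indices, then print it row by row.
n_rows = max(c["row"] for c in cells) + 1
n_cols = max(c["col"] for c in cells) + 1
table = [["" for _ in range(n_cols)] for _ in range(n_rows)]
for c in cells:
    table[c["row"]][c["col"]] = c["text"]

for row in table:
    print(" | ".join(row))
```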

What preconfigured templates does AI Document Processing include?

Ten templates ship out of the box: bank details, driver’s license, ID card, invoice (15 fields), passport, payment card, payroll statement, purchase order, receipt, and resume. Each template defines expected fields with data types. You can extend any template with custom fields or build new templates from scratch using natural language field descriptions.

What output format does Vision API produce?

Structured JSON. Each document element (text block, table, figure, and heading) includes bounding-box coordinates, reading order index, element type, and confidence scores. The format is designed for direct database ingestion, visual review UIs, and audit trails.

Can I extract data from documents in multiple languages?

Yes. ICR handles non-Latin scripts and mixed-language layouts. VLM-enhanced mode is strongest here — the vision language model adds context that pure layout analysis misses. AI Document Processing inherits the language capabilities of your LLM provider (OpenAI, Azure OpenAI).

How does data validation work in AI Document Processing?

AI Document Processing includes 11 built-in validators: IBAN integrity, credit card number, VAT ID, postal address, email, URI, phone number, vehicle identification number, currency, date, and number formats. When a validator flags a value, the field’s validation state is set to VerificationNeeded, indicating it requires human review. You can combine multiple validators per field.
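A sketch of what acting on that state can look like downstream (the result shape is illustrative; only the VerificationNeeded state name comes from the product description above):

```python
# Illustrative extraction results; the SDK's actual result objects may differ.
extracted = [
    {"field": "iban",   "value": "GB82WEST12345698765432", "state": "Valid"},
    {"field": "vat_id", "value": "DE12345678X",            "state": "VerificationNeeded"},
]

# Route flagged values to a human-review queue; accept the rest automatically.
for item in extracted:
    if item["state"] == "VerificationNeeded":
        print(f"Queue for review: {item['field']} = {item['value']}")
    else:
        print(f"Accepted: {item['field']}")
```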