Next-level document management integration solutions

Jonathan D. Rhyne

March 31, 2023

Summary

Discover how Nutrient’s professional PDF redaction library combines with GdPicture.NET to deliver intelligent document processing and smart redaction capabilities. This guide explores how our redaction library uses natural language processing and computer vision to automatically identify and protect sensitive information. Learn how this integration enables scalable, automated redaction workflows while reducing manual effort and security risks.

In 2022, Nutrient (formerly PSPDFKit) acquired ORPALIS, a leading document imaging and OCR software company. This opened a new chapter in offering comprehensive PDF solutions: ORPALIS’s flagship product, GdPicture.NET, can now be integrated with Nutrient Web SDK projects.

GdPicture is a .NET library with advanced document imaging, intelligent processing, and versatile PDF capabilities. Integrating it with Nutrient Web SDK combines client-side features and server-side power. The result is a customizable, end-to-end PDF solution.

Let’s dive into the benefits of adding GdPicture technology to your Nutrient Web SDK projects.

Advanced capabilities of GdPicture.NET you can integrate with your Nutrient Web SDK applications

Lightning-fast PDF generation capabilities

By integrating GdPicture.NET with your existing Nutrient Web SDK project, you can generate structured, secure, and compliant PDF files. Your generated PDFs will also be ready for long-term archiving with PDF/A compliance and optimized for web viewing using linearization.

Convert any content to a PDF file

GdPicture.NET supports high-fidelity conversion of Office files, HTML content, images, email formats, and much more. In fact, it supports more than 100 document formats, without any drop in conversion quality.

OCR complex documents at volume, quickly and accurately

In the context of PDFs, optical character recognition (OCR) is a technology that enables the conversion of scanned documents, images, and printed text within a PDF into machine-readable and editable text.

The GdPicture.NET OCR engine is highly accurate and versatile, optimized to quickly and efficiently process large quantities of complex documents. It can handle more than 100 languages and document formats, overcoming challenges faced by other OCR solutions in recognizing complex character sets and multi-language text.

By integrating the GdPicture.NET OCR engine with Nutrient Web SDK, you can unlock several new server-side functionalities:

Create searchable PDF/A files — Convert scanned documents, images, and existing PDFs into searchable, dynamic text, making it easier to store, index, search, and manipulate millions of documents.
OCR at scale — GdPicture.NET offers high-speed multithreading to automate workflows at scale.
Automatic document recognition — The GdPicture.NET ADR engine can automatically categorize and classify documents within your document management systems. It allows your applications to recognize various structured documents, such as invoices, checks, forms, orders, delivery notes, and page separators. This enables seamless organization and management within a variety of use case scenarios, such as scanning, archiving, indexing, sorting, classification, search, and document and information management.
Machine readable zones — The MRZ recognition engine enables you to develop applications that efficiently extract and decode MRZ characters found on passports, visas, and various ID cards, in turn streamlining document processing.
Optical mark recognition — The OMR engine can detect checkbox content, fill-in areas, multiple choice exam forms, and any highlighted choice areas. It also offers an anchoring mechanism (template recognition) to define the area for processing.
Automatic form processing — With OCR, you can recognize and extract text from scanned or image-based forms, enabling automation in data processing. This can help streamline workflows by reducing manual data entry and improving the accuracy of collected information.
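To make the MRZ item above more concrete: MRZ fields carry check digits defined in ICAO Doc 9303, and a decoder has to verify them before trusting the extracted characters. The Python sketch below shows only that checksum step. It is independent of GdPicture.NET and does not use its API; the function names are hypothetical and chosen for illustration.

```python
def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value (ICAO Doc 9303 scheme)."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"not a valid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Compute the check digit using the repeating weights 7, 3, 1."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def mrz_field_is_valid(field: str, check_digit: str) -> bool:
    """Return True when the printed check digit matches the computed one."""
    return check_digit.isdigit() and mrz_check_digit(field) == int(check_digit)


if __name__ == "__main__":
    # Illustrative values only: a document number and a date in YYMMDD form.
    print(mrz_field_is_valid("L898902C3", "6"))
    print(mrz_field_is_valid("740812", "2"))
```

A check like this is a cheap way to catch OCR misreads (for example, `0` versus `O`) before the decoded data enters a workflow.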

If you want to test this solution, you can check out our OCR demo.

AI-based intelligent document processing

Documents containing unstructured data can cause difficulties when trying to automate PDF workflows. This is due to their lack of format and structure, making them more challenging to analyze, process, and manage compared to structured data.

Additionally, they’re difficult to search, index, and integrate into automated workflows, leading to inefficiencies and storage issues.

The GdPicture.NET intelligent document processing (IDP) capabilities are designed to solve these challenges and help streamline your PDF workflows.

Industry-leading key-value pair extraction

Nutrient Web SDK customers can now benefit from the advanced key-value pair (KVP) extraction capabilities of the GdPicture.NET OCR engine. This powerful feature simplifies the extraction of relevant data from your documents.

The KVP engine offers intelligent document understanding and processing for unstructured and semi-structured documents. It rapidly identifies information with labels and values, extracts them, and qualifies the value, significantly reducing manual data entry efforts.

The engine also adapts to the document and optimizes its approach, addressing the usual weaknesses of traditional OCR and pure machine learning (ML) engines. This includes text recognition in noisy documents, dotted lines, touching and broken characters, text on colored backgrounds, underlined text, skewed text, and text in graphics and tables.
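For readers new to the concept, the following Python sketch illustrates what a key-value pair looks like once text has been recognized: a label, a value, and a qualified value type. It is a deliberately naive, rule-based toy that assumes clean text lines with a colon or a dotted leader as the separator. It does not reflect how the GdPicture.NET KVP engine works internally, and every name in it is hypothetical.

```python
import re

# A label, then a colon or a dotted leader (three or more dots), then a value.
PAIR = re.compile(r"^\s*(?P<key>.+?)\s*(?::|\.{3,})\s*(?P<value>\S.*?)\s*$")

DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
NUMBER = re.compile(r"[-+]?\d{1,3}(?:,\d{3})*(?:\.\d+)?|[-+]?\d+(?:\.\d+)?")


def qualify(value: str) -> str:
    """Assign a coarse type to an extracted value."""
    if DATE.fullmatch(value):
        return "date"
    if NUMBER.fullmatch(value):
        return "number"
    return "text"


def extract_pairs(lines):
    """Return (label, value, type) tuples for lines that look like pairs."""
    pairs = []
    for line in lines:
        match = PAIR.match(line)
        if match:
            value = match.group("value")
            pairs.append((match.group("key"), value, qualify(value)))
    return pairs


if __name__ == "__main__":
    sample = [
        "Invoice No.: INV-0042",
        "Date: 2023-03-31",
        "Total ........ 1,250.00",
        "Thank you for your business",
    ]
    for pair in extract_pairs(sample):
        print(pair)
```

Fixed rules like these break down on noisy scans, skewed text, and free-form layouts, which are the cases the paragraph above says an adaptive engine is meant to handle.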

If you want to learn more, you can read this KVP extraction article from ORPALIS.

Smart redaction

Many organizations still rely on manual redaction, which is time-consuming, error-prone, and unscalable. To overcome these challenges, Nutrient Web SDK users can implement GdPicture.NET smart redaction technology, which uses natural language processing and computer vision to simplify and expedite the redaction process.

Smart redaction identifies information types rather than specific text — handling data in various orientations — and enables automatic redaction of information like credit card numbers, email addresses, IBANs, phone numbers, URIs, VAT IDs, VINs, and SSNs, without manual region selection. This powerful technology can be integrated with Nutrient Web SDK, helping users save time and reduce potentially costly errors when managing sensitive data in their documents.
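The idea of matching information types rather than specific text can be sketched in a few lines of Python. The example below detects three of the types listed above (email addresses, credit card numbers, and IBANs) in plain text, using a pattern plus a checksum where one exists (Luhn for card numbers, mod 97 for IBANs). It is a simplified illustration that assumes already-extracted text; it does not use the GdPicture.NET API or reproduce its natural language processing and computer vision pipeline, and all names are hypothetical.

```python
import re

EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Country code, two check digits, then the account part (no spaces, for brevity).
IBAN = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


def luhn_ok(digits: str) -> bool:
    """Validate a digit string with the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0


def iban_ok(iban: str) -> bool:
    """Validate an IBAN with the mod-97 check (letters map to 10..35)."""
    rearranged = iban[4:] + iban[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def redact(text: str) -> str:
    """Replace detected values with a type label; leave failed checks alone."""
    def card_sub(match):
        digits = re.sub(r"[ -]", "", match.group())
        return "[CARD]" if luhn_ok(digits) else match.group()

    def iban_sub(match):
        return "[IBAN]" if iban_ok(match.group()) else match.group()

    text = EMAIL.sub("[EMAIL]", text)
    text = IBAN.sub(iban_sub, text)
    return CARD.sub(card_sub, text)


if __name__ == "__main__":
    sample = ("Contact jane.doe@example.com, card 4111 1111 1111 1111, "
              "IBAN GB82WEST12345698765432, order 1234567890123.")
    print(redact(sample))
```

The checksum step is what keeps the order number in the sample from being redacted as a card number, which hints at why type-aware detection produces fewer false positives than plain text search.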

Table extraction

Traditional OCR table extraction solutions that aren’t supported by AI have severe limitations when trying to batch-process documents of various types and layouts. They may get particularly poor results with:

Low-quality documents with many speckles
Skewed pages or uneven text alignment
Colored backgrounds or light-colored text in cells
Characters touching borders or cell edges

GdPicture.NET table extraction technology relies on an AI-powered engine that employs the latest machine vision and artificial intelligence techniques to automatically detect and extract table data from all types of documents, regardless of image quality, scan distortions, skewed pages, page breaks, or colored cells. It also converts the extracted tables to XLSX.

If you want to learn more about how ORPALIS handles table extraction and how the engine functions, you can read the ORPALIS blog series on the topic.

Handle large files with the MRC hyper-compression engine

GdPicture.NET’s mixed raster content (MRC) hyper-compression engine is a game changer for managing large document files.

With this technology, you can quickly reduce any 300-DPI color document to a mere 20–60 KB without compromising on image quality, which makes your documents easier to store, share, and load.
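To put the 20–60 KB figure in perspective, the short Python calculation below estimates the uncompressed size of a scanned page and the compression ratio that figure implies. The page size (US Letter), the color depth (24-bit RGB), the reading of the figure as per page, and 1 KB = 1,024 bytes are all assumptions made for this example; the source states none of them.

```python
def raw_size_bytes(width_in: float, height_in: float,
                   dpi: int = 300, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size of a scanned page: pixels times bytes per pixel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def compression_ratio(raw_bytes: int, compressed_kb: float) -> float:
    """Ratio of raw size to compressed size, taking 1 KB as 1,024 bytes."""
    return raw_bytes / (compressed_kb * 1024)


if __name__ == "__main__":
    # Assumed page: US Letter, 8.5 x 11 inches, 24-bit color at 300 DPI.
    letter = raw_size_bytes(8.5, 11)  # 2550 x 3300 x 3 = 25,245,000 bytes
    print(f"Raw page size: {letter / 1024 / 1024:.1f} MiB")
    for target_kb in (20, 60):
        ratio = compression_ratio(letter, target_kb)
        print(f"{target_kb} KB target -> about {ratio:.0f}:1")
```

Under these assumptions, a raw page of roughly 24 MiB reduced to 20–60 KB corresponds to a ratio of roughly 400:1 to 1,200:1.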

GdPicture.NET barcode scanning

Nutrient Web SDK customers can benefit from GdPicture.NET barcode scanning technology for enhanced document processing. The technology supports numerous 1D and 2D barcode symbologies, allowing for accurate decoding and extraction of data from various barcodes. With its high-speed processing and advanced recognition algorithms, the SDK can be seamlessly integrated into existing workflows, improving overall efficiency.

Additionally, it offers barcode generation capabilities, making it possible to create custom barcodes for documents.

GdPicture.NET seamless integration

We’ve worked hard so that GdPicture.NET technology can be efficiently and quickly deployed to Nutrient Web SDK projects. To ensure this, we offer comprehensive documentation, getting started guides, and top-notch technical support.

This level of support is helpful in reducing the time and costs associated with integrating new solutions, and it’ll enable our customers to quickly realize the benefits of a comprehensive document management solution without incurring additional expenses.

What to do next

If you’re a Nutrient Web SDK customer and you want to integrate GdPicture.NET with a new or existing project, you can contact us, and we’ll talk about extending your license.

Additionally, you can test and explore all the new and improved licensable features of the Nutrient .NET SDK by heading over to our guides and getting started.