PSPDFKit API OCR and Office Conversion Improvements

Tomáš Šurín
Kelly Benitez

PSPDFKit API now ships with brand-new OCR and Office conversion engines. Earlier this year, PSPDFKit merged with ORPALIS, and over the last few weeks, we’ve been working to bring GdPicture.NET technology into PSPDFKit API, delivering significant improvements in performance and accuracy.

What Is GdPicture.NET?

GdPicture.NET is an all-in-one toolkit providing complete PDF support, along with support for a number of other file formats, including Office, CAD, and images. It also ships with rich image processing and industry-leading OCR and document-understanding capabilities that use state-of-the-art artificial intelligence and machine learning algorithms. Over the coming months, we’ll be incorporating much of this technology into PSPDFKit API.

Why We Replaced the Previous Engines

Our previous OCR engine was based on the Tesseract open source project, and we used LibreOffice as the core of our Office conversion tools. Both produced good-quality results, but each of these underlying components limited us in certain respects.

The main issue with our OCR engine was performance, which was acceptable at best. With Office conversion, our main pain point was that we were unable to effectively improve the conversion quality itself.

Performance and Accuracy

Both the new OCR and Office conversion engines process documents more quickly and more accurately. The OCR performance gain is especially considerable: We measured speedups of up to 7× compared to the previous engine, with the same or sometimes even better accuracy.
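To make an “up to N×” figure concrete, here is a minimal sketch of how a per-document speedup comparison could be computed. It is not our actual benchmark harness: `time_engine`, `speedups`, and the engine callables are hypothetical names, and the timings are made-up numbers used only for illustration.

```python
import time
from typing import Callable, Iterable


def time_engine(engine: Callable[[bytes], object], documents: Iterable[bytes]) -> list[float]:
    """Return the wall-clock seconds a (hypothetical) engine takes per document."""
    timings = []
    for doc in documents:
        start = time.perf_counter()
        engine(doc)  # stand-in for an OCR or conversion call
        timings.append(time.perf_counter() - start)
    return timings


def speedups(old_timings: list[float], new_timings: list[float]) -> list[float]:
    """Per-document speedup: old time divided by new time."""
    if len(old_timings) != len(new_timings):
        raise ValueError("timing lists must cover the same documents")
    return [old / new for old, new in zip(old_timings, new_timings)]


# Made-up timings in seconds, purely for illustration.
old = [12.0, 6.0, 9.0]
new = [3.0, 3.0, 3.0]
ratios = speedups(old, new)
print(f"best speedup: {max(ratios):.1f}x")
```

Reporting the maximum ratio corresponds to an “up to” claim. A median or mean over the same test set would describe the typical gain instead.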

With Office conversion, we achieved better conversion results for many documents from our test set. We didn’t find any regressions in quality on the same set of documents.

Conclusion

We’re excited to bring you these huge improvements to our API tools. You can already try the tools for free.

We also invite you to read our blog posts with detailed explanations of how to use our Office conversion tools.

Note that this is only a small glimpse into what’s possible with the combined powers of PSPDFKit API and GdPicture.NET. Stay tuned for the new capabilities and improvements that we’re planning to introduce soon.

Authors
Kelly Benitez Marketing Operations Manager

Kelly joined Nutrient in 2017 as an intern and now handles marketing operations. She loves diving into data and a good spreadsheet. Outside of work, she enjoys bouldering, reading, crocheting, and going for walks.

Tomáš Šurín Server and Services Engineer

Tomáš has a deep interest in building (and breaking) stuff both in the digital and physical world. In his spare time, you’ll find him relaxing off the grid, cooking good food, playing board games, and discussing science and philosophy.
