OCR PDF in C#
This guide shows how to convert a PDF file into a searchable PDF using the OCR library of Nutrient .NET SDK for C#. The optical character recognition (OCR) engine recognizes the text in a PDF file and saves the result in a separate PDF, which makes the document searchable and lets users copy and paste its text.
To convert a PDF file into a searchable PDF using Nutrient .NET SDK’s OCR library, follow these steps:
- Create a GdPicturePDF object.
- Load the source document by passing its path to the LoadFromFile method of the GdPicturePDF object.
- Determine the number of pages with the GetPageCount method of the GdPicturePDF object.
- Iterate through the pages of the source document.
- For each page, select it with the SelectPage method and run the OCR process with the OcrPage method of the GdPicturePDF object. Configure the OCR process by passing the following parameters to the OcrPage method:
  - Language settings: Set the code of the language that Nutrient .NET SDK uses to recognize text in the source document. To specify several languages, separate the language codes with the + character, for example, eng+fra.
  - OCR resource folder path: Set the path to the OCR resource folder. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.
  - Character allowlist: Set the character allowlist. When scanning the document, the OCR engine only recognizes the characters included in the allowlist. When you set "" (an empty string), all characters are recognized.
  - DPI resolution: Set the dots-per-inch (DPI) resolution the OCR engine uses. It’s recommended to use 300 for the best combination of speed and accuracy.
- Save the result in a new PDF document.
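The language parameter described in the steps above is a plain string, so it can be assembled from a list of codes before calling OcrPage. The following is a minimal sketch: JoinLanguages is a hypothetical helper written for illustration, not part of Nutrient .NET SDK, and it only applies the + separator rule stated above.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of the SDK): joins language codes with "+",
// the separator the steps above describe for multi-language OCR.
static string JoinLanguages(params string[] codes)
{
    if (codes == null || codes.Length == 0)
        throw new ArgumentException("At least one language code is required.", nameof(codes));

    // Trim whitespace so that " fra " and "fra" produce the same result.
    string[] cleaned = codes.Select(c => (c ?? "").Trim()).ToArray();

    // Reject empty codes and codes that already contain the separator.
    if (cleaned.Any(c => c.Length == 0 || c.Contains('+')))
        throw new ArgumentException("Language codes must be non-empty and must not contain '+'.", nameof(codes));

    return string.Join("+", cleaned);
}

string languages = JoinLanguages("eng", "fra");
Console.WriteLine(languages); // prints "eng+fra"

// The result would then be passed as the first argument of OcrPage, for example:
// gdpicturePDF.OcrPage(languages, @"C:\GdPicture.NET 14\Redist\OCR", "", 300);
```

Validating the codes up front keeps a malformed language string from reaching the OCR engine. Each code still needs a matching language resource in the OCR resource folder.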
This approach lets you add OCR-based conversion of PDFs to searchable PDFs to .NET applications, improving text accessibility and usability.
The example below converts a PDF file to a searchable PDF:
    using GdPicture14;

    using GdPicturePDF gdpicturePDF = new GdPicturePDF();
    // Load the source document.
    gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
    // Determine the number of pages.
    int pageCount = gdpicturePDF.GetPageCount();
    // Loop through the pages of the source document (page numbers start at 1).
    for (int i = 1; i <= pageCount; i++)
    {
        // Select a page and run the OCR process on it.
        gdpicturePDF.SelectPage(i);
        gdpicturePDF.OcrPage("eng", @"C:\GdPicture.NET 14\Redist\OCR", "", 300);
    }
    // Save the result in a new PDF document.
    gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
    gdpicturePDF.CloseDocument();
The same example in VB.NET:

    Imports GdPicture14

    Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
        ' Load the source document.
        gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
        ' Determine the number of pages.
        Dim pageCount As Integer = gdpicturePDF.GetPageCount()
        ' Loop through the pages of the source document (page numbers start at 1).
        For i = 1 To pageCount
            ' Select a page and run the OCR process on it.
            gdpicturePDF.SelectPage(i)
            gdpicturePDF.OcrPage("eng", "C:\GdPicture.NET 14\Redist\OCR", "", 300)
        Next
        ' Save the result in a new PDF document.
        gdpicturePDF.SaveToFile("C:\temp\output.pdf")
        gdpicturePDF.CloseDocument()
    End Using
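The example above ignores the return values of the SDK calls. The sketch below is a variant that stops at the first failure. It assumes that LoadFromFile, SelectPage, OcrPage, and SaveToFile each return a GdPictureStatus value and that success is reported as GdPictureStatus.OK. It also assumes the SDK is referenced and licensed in the project. The file paths are the same placeholders used above.

```csharp
using System;
using GdPicture14;

using GdPicturePDF pdf = new GdPicturePDF();

// Assumption: each call below returns a GdPictureStatus, with OK meaning success.
GdPictureStatus status = pdf.LoadFromFile(@"C:\temp\source.pdf");
if (status != GdPictureStatus.OK)
{
    Console.Error.WriteLine($"Could not load the source document: {status}");
    return;
}

int pageCount = pdf.GetPageCount();
for (int i = 1; i <= pageCount; i++)
{
    // Select the page first, then run OCR only if the selection succeeded.
    status = pdf.SelectPage(i);
    if (status == GdPictureStatus.OK)
    {
        status = pdf.OcrPage("eng", @"C:\GdPicture.NET 14\Redist\OCR", "", 300);
    }

    if (status != GdPictureStatus.OK)
    {
        Console.Error.WriteLine($"OCR failed on page {i}: {status}");
        return;
    }
}

// Save only after every page has been processed successfully.
status = pdf.SaveToFile(@"C:\temp\output.pdf");
if (status != GdPictureStatus.OK)
{
    Console.Error.WriteLine($"Could not save the output document: {status}");
    return;
}

pdf.CloseDocument();
Console.WriteLine("Searchable PDF saved.");
```

Stopping at the first failed page avoids saving a partially recognized document. Whether to abort or skip the failing page and continue depends on the application.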