OCR PDF in C#

This guide demonstrates how to convert a PDF file into a searchable PDF using the OCR library in Nutrient .NET SDK for C#. The optical character recognition (OCR) engine recognizes the text in a PDF file and saves the result in a separate PDF, so users can search the document and copy and paste its text. This improves document accessibility and supports text recognition in .NET applications.

OCR all pages in one call

In addition to the page-by-page OcrPage workflow below, you can OCR an entire document in a single operation using OcrPages.

using GdPicturePDF pdf = new GdPicturePDF();
pdf.LoadFromFile(@"input_image_based.pdf");
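// Arguments, in order: page range ("*" selects all pages), thread count,
// language code, OCR resource folder path, character allowlist, DPI resolution.
pdf.OcrPages("*", 0, "eng", "", "", 200);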
pdf.SaveToFile(@"output.pdf");

This approach is useful for streamlined server workflows where you want to process all pages with one call.
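
The snippet above ignores the values returned by each call. As a minimal sketch, assuming the GdPicture14 namespace and that LoadFromFile, OcrPages, and SaveToFile each return a GdPictureStatus value, the same workflow with basic error checks might look like the following. The file names are placeholders, and the OcrPages arguments are copied unchanged from the snippet above.

```csharp
using System;
using GdPicture14; // Assumed namespace of the GdPicturePDF class.

using GdPicturePDF pdf = new GdPicturePDF();

// Stop early if the source document cannot be opened.
GdPictureStatus status = pdf.LoadFromFile(@"input_image_based.pdf");
if (status != GdPictureStatus.OK)
{
    Console.Error.WriteLine($"Loading failed: {status}");
    return;
}

// Run OCR on all pages with the same arguments as the snippet above.
status = pdf.OcrPages("*", 0, "eng", "", "", 200);
if (status != GdPictureStatus.OK)
{
    Console.Error.WriteLine($"OCR failed: {status}");
    return;
}

// Only save when recognition succeeded.
status = pdf.SaveToFile(@"output.pdf");
if (status != GdPictureStatus.OK)
{
    Console.Error.WriteLine($"Saving failed: {status}");
}
```

Checking the status after each step avoids writing an output file that silently lacks a text layer when loading or recognition fails.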

Converting PDF to searchable PDF

To convert a PDF file into a searchable PDF using Nutrient .NET SDK’s OCR library, follow the steps below:

Create a GdPicturePDF object.
Load the source document by passing its path to the LoadFromFile method of the GdPicturePDF object.
Determine the number of pages with the GetPageCount method of the GdPicturePDF object.
Iterate through pages of the source document.
For each page, run the OCR process with the OcrPage method of the GdPicturePDF object. Configure the OCR process by passing the following parameters to the OcrPage method:
1. Language settings: Set the code of the language that Nutrient .NET SDK uses to recognize text in the source document. To specify several languages, separate the language codes with the + character. For example, eng+fra.
2. OCR resource folder path: Set the path to the OCR resource folder. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.
3. Character allowlist: Set the characters the OCR engine is allowed to recognize. When scanning the document, the engine only recognizes the characters included in the allowlist. Pass an empty string ("") to recognize all characters.
4. DPI resolution: Set the dots-per-inch (DPI) resolution the OCR engine uses. A value of 300 is recommended for the best combination of speed and accuracy.
Save the result in a new PDF document.
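
The language and allowlist parameters described above are plain strings, so they can be prepared before calling OcrPage. The following is a small sketch of that preparation. The OcrSettings class, its JoinLanguages helper, and the DigitsAllowlist constant are hypothetical names introduced for illustration and are not part of the SDK.

```csharp
using System;
using System.Linq;

// Build the language argument for two languages.
string languages = OcrSettings.JoinLanguages("eng", "fra");
Console.WriteLine(languages); // prints "eng+fra"

// Example allowlist that restricts recognition to digits.
string digitsOnly = OcrSettings.DigitsAllowlist;
Console.WriteLine(digitsOnly); // prints "0123456789"

// Hypothetical helper type, not part of Nutrient .NET SDK.
public static class OcrSettings
{
    // An allowlist containing only the ten decimal digits.
    public const string DigitsAllowlist = "0123456789";

    // Joins language codes with '+', the separator the guide describes.
    public static string JoinLanguages(params string[] codes)
    {
        if (codes == null || codes.Length == 0)
            throw new ArgumentException("At least one language code is required.", nameof(codes));
        if (codes.Any(string.IsNullOrWhiteSpace))
            throw new ArgumentException("Language codes must not be empty.", nameof(codes));
        return string.Join("+", codes.Select(c => c.Trim()));
    }
}
```

The resulting strings would then be passed as the first and third arguments of OcrPage, for example gdpicturePDF.OcrPage(languages, resourcePath, digitsOnly, 300), where resourcePath is a placeholder for your OCR resource folder. This assumes the resources for every listed language are present in that folder.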

This approach enables you to effectively integrate OCR-powered PDF-to-searchable-PDF functionality in .NET applications, improving text accessibility and usability.

The example below converts a PDF file to a searchable PDF:

C#

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Load the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
// Determine the number of pages.
int pageCount = gdpicturePDF.GetPageCount();
// Loop through the pages of the source document.
for (int i = 1; i <= pageCount; i++)
{
    // Select a page and run the OCR process on it.
    gdpicturePDF.SelectPage(i);
    gdpicturePDF.OcrPage("eng", @"C:\GdPicture.NET 14\Redist\OCR", "", 300);
}
// Save the result in a new PDF document.
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
gdpicturePDF.CloseDocument();

Visual Basic

Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    ' Load the source document.
    gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
    ' Determine the number of pages.
    Dim pageCount As Integer = gdpicturePDF.GetPageCount()
    ' Loop through the pages of the source document.
    For i = 1 To pageCount
        ' Select a page and run the OCR process on it.
        gdpicturePDF.SelectPage(i)
        gdpicturePDF.OcrPage("eng", "C:\GdPicture.NET 14\Redist\OCR", "", 300)
    Next
    ' Save the result in a new PDF document.
    gdpicturePDF.SaveToFile("C:\temp\output.pdf")
    gdpicturePDF.CloseDocument()
End Using

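Used methods and properties

The examples in this guide use the following methods of the GdPicturePDF class:

CloseDocument
GetPageCount
LoadFromFile
OcrPage
OcrPages
SaveToFile
SelectPage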
