This guide demonstrates how to convert a PDF file into a searchable PDF using Nutrient .NET SDK’s powerful OCR library for C#. With our advanced optical character recognition (OCR) engine, you can extract text from PDF files and save it in a separate PDF, facilitating searchability and enabling users to copy and paste text seamlessly. This process enhances document accessibility and supports efficient text recognition in .NET applications.
OCR all pages in one call
In addition to the page-by-page OcrPage workflow below, you can OCR an entire document in a single operation using OcrPages.
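    using GdPicturePDF pdf = new GdPicturePDF();
    // Load the source document.
    pdf.LoadFromFile(@"input_image_based.pdf");
    // Run the OCR process on the whole document in one call.
    pdf.OcrPages("*", 0, "eng", "", "", 200);
    // Save the result in a new PDF document.
    pdf.SaveToFile(@"output.pdf");

This approach is useful for streamlined server workflows where you want to process all pages with one call.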
Converting PDF to searchable PDF
To convert a PDF file into a searchable PDF using Nutrient .NET SDK’s OCR library, follow the steps below:
- Create a GdPicturePDF object.
- Load the source document by passing its path to the LoadFromFile method of the GdPicturePDF object.
- Determine the number of pages with the GetPageCount method of the GdPicturePDF object.
- Iterate through the pages of the source document, selecting each one with the SelectPage method.
- For each page, run the OCR process with the OcrPage method of the GdPicturePDF object. Configure the OCR process by passing the following parameters to the OcrPage method:
  - Language settings: Set the code of the language that Nutrient .NET SDK uses to recognize text in the source document. To specify several languages, separate the language codes with the + character. For example, eng+fra.
  - OCR resource folder path: Set the path to the OCR resource folder. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.
  - Character allowlist: Set the character allowlist. When scanning the document, the OCR engine only recognizes the characters included in the allowlist. When you set "", all characters are recognized.
  - DPI resolution: Set the dots-per-inch (DPI) resolution the OCR engine uses. It’s recommended to use 300 for the best combination of speed and accuracy.
- Save the result in a new PDF document.
Following this approach, you can add PDF-to-searchable-PDF conversion to .NET applications so that users can search and copy the recognized text.
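The language setting described in the steps above is a single string, so several languages have to be joined with the + character before they are passed to the OCR call. As a minimal sketch, the helper below builds that string from a list of codes. The OcrLanguages class and its Combine method are hypothetical names introduced here for illustration; they are not part of Nutrient .NET SDK, and the helper does not check whether the matching language resources are installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the SDK): builds a language string
// such as "eng+fra" from individual language codes.
public static class OcrLanguages
{
    public static string Combine(IEnumerable<string> codes)
    {
        if (codes == null)
        {
            throw new ArgumentNullException(nameof(codes));
        }

        // Trim whitespace and drop null or empty entries.
        List<string> cleaned = codes
            .Select(code => code?.Trim() ?? string.Empty)
            .Where(code => code.Length > 0)
            .ToList();

        if (cleaned.Count == 0)
        {
            throw new ArgumentException("At least one language code is required.", nameof(codes));
        }

        // Separate the codes with the + character.
        return string.Join("+", cleaned);
    }
}
```

For example, OcrLanguages.Combine(new[] { "eng", "fra" }) produces eng+fra, which can then be passed as the language argument.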
The example below converts a PDF file to a searchable PDF:
    using GdPicturePDF gdpicturePDF = new GdPicturePDF();
    // Load the source document.
    gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
    // Determine the number of pages.
    int pageCount = gdpicturePDF.GetPageCount();
    // Loop through the pages of the source document.
    for (int i = 1; i <= pageCount; i++)
    {
        // Select a page and run the OCR process on it.
        gdpicturePDF.SelectPage(i);
        gdpicturePDF.OcrPage("eng", @"C:\GdPicture.NET 14\Redist\OCR", "", 300);
    }
    // Save the result in a new PDF document.
    gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
    gdpicturePDF.CloseDocument();

    Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
        ' Load the source document.
        gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
        ' Determine the number of pages.
        Dim pageCount As Integer = gdpicturePDF.GetPageCount()
        ' Loop through the pages of the source document.
        For i = 1 To pageCount
            ' Select a page and run the OCR process on it.
            gdpicturePDF.SelectPage(i)
            gdpicturePDF.OcrPage("eng", "C:\GdPicture.NET 14\Redist\OCR", "", 300)
        Next
        ' Save the result in a new PDF document.
        gdpicturePDF.SaveToFile("C:\temp\output.pdf")
        gdpicturePDF.CloseDocument()
    End Using
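The example above does not check whether each step succeeded. The following is a minimal sketch of the same loop with status checks, not a definitive implementation. It assumes that the classes live in the GdPicture14 namespace, that the GetStat method of the GdPicturePDF object reports the status of the last operation, and that the SDK is already referenced and licensed in the project. The class name and the exit codes are illustrative only.

```csharp
using System;
using GdPicture14; // Assumed namespace for GdPicture.NET 14.

class SearchablePdfWithChecks
{
    static int Main()
    {
        using GdPicturePDF gdpicturePDF = new GdPicturePDF();

        // Load the source document and stop if loading failed.
        gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
        GdPictureStatus status = gdpicturePDF.GetStat();
        if (status != GdPictureStatus.OK)
        {
            Console.Error.WriteLine($"Loading failed: {status}");
            return 1;
        }

        int pageCount = gdpicturePDF.GetPageCount();
        for (int i = 1; i <= pageCount; i++)
        {
            // Select a page and run the OCR process on it.
            gdpicturePDF.SelectPage(i);
            gdpicturePDF.OcrPage("eng", @"C:\GdPicture.NET 14\Redist\OCR", "", 300);

            // Assumption: GetStat reflects the result of the OcrPage call.
            status = gdpicturePDF.GetStat();
            if (status != GdPictureStatus.OK)
            {
                Console.Error.WriteLine($"OCR failed on page {i}: {status}");
                return 1;
            }
        }

        // Save the result and report a failure if the file was not written.
        gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
        status = gdpicturePDF.GetStat();
        gdpicturePDF.CloseDocument();
        if (status != GdPictureStatus.OK)
        {
            Console.Error.WriteLine($"Saving failed: {status}");
            return 1;
        }
        return 0;
    }
}
```

Stopping at the first failed page is one possible design choice; a batch job might instead log the page number and continue with the remaining pages.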