Scan and OCR PDFs in C#
This guide explains how to scan a physical document with a scanner and then save the scanned image in a searchable PDF. The optical character recognition (OCR) engine of Nutrient .NET SDK (formerly GdPicture.NET) recognizes text in an image so that you can save the text in a PDF. This guide uses the TWAIN protocol.
Printing and scanning aren’t supported in the cross-platform .NET 6.0 assembly. For more information, see the system compatibility guide.
To get an image from a scanner and then save it in a searchable PDF, follow the steps below:
- Create a GdPictureImaging object and a GdPicturePDF object.
- Store the handle of the active window in a variable. In this example, the variable is set to IntPtr.Zero.
- Select the scanner by passing the handle to the TwainSelectSource and TwainOpenDefaultSource methods of the GdPictureImaging object.
- Optional: Hide the scanning user interface with the TwainSetHideUI method of the GdPictureImaging object. Use this setting when your application cannot communicate with the scanner.
- Create a new PDF document with the NewPDF method of the GdPicturePDF object. The parameter of this method sets the conformance level of the PDF document and is a member of the PdfConformance enumeration. For example, use PDF to create a common PDF document.
- Get the image from the scanner by passing the handle to the TwainAcquireToGdPictureImage method of the GdPictureImaging object. This method returns the ID of the scanned image.
- Add the scanned image to a new page in the destination document with the AddImageFromGdPictureImage method of the GdPicturePDF object.
- Run the OCR process with the OcrPage method of the GdPicturePDF object. Pass the following parameters:
  - The code of the language that Nutrient .NET SDK uses to recognize text in the source document. To specify several languages, separate the language codes with the + character. For example, eng+fra.
  - The path to the OCR resource folder. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.
  - The character allowlist. The OCR engine only recognizes the characters included in the allowlist. If you pass an empty string (""), all characters are recognized.
  - The dots-per-inch (DPI) resolution the OCR engine uses. A value of 300 is recommended for the best balance of speed and accuracy.
- Save the result in a PDF document with the SaveToFile method of the GdPicturePDF object.
- Release the scanned image with the ReleaseGdPictureImage method and close the TWAIN source with the TwainCloseSource method of the GdPictureImaging object.
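Your OCR settings might come from configuration or user input. If so, it can help to collect the four OcrPage arguments in one place and validate them before the scan starts. The following is a minimal sketch under stated assumptions. OcrOptions and LanguageParameter are hypothetical names and are not part of the Nutrient .NET SDK. The defaults mirror the values recommended in the steps above: an empty allowlist and 300 DPI. The sketch only builds the argument values and does not call the SDK.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of the Nutrient .NET SDK) that groups the four
// arguments this guide passes to GdPicturePDF.OcrPage.
public sealed record OcrOptions(
    string[] LanguageCodes,      // for example { "eng", "fra" }
    string ResourceFolder,       // path to the OCR resource folder
    string CharAllowlist = "",   // empty string: recognize all characters
    int Dpi = 300)               // 300 DPI: recommended speed/accuracy balance
{
    // Joins the language codes with '+', for example "eng+fra".
    // Throws ArgumentException if no code is given, a code is blank,
    // or a code already contains '+'.
    public string LanguageParameter
    {
        get
        {
            if (LanguageCodes is null || LanguageCodes.Length == 0)
                throw new ArgumentException("Specify at least one language code.");
            if (LanguageCodes.Any(code => string.IsNullOrWhiteSpace(code) || code.Contains('+')))
                throw new ArgumentException("Each language code must be non-blank and must not contain '+'.");
            return string.Join("+", LanguageCodes);
        }
    }
}
```

You could then pass the values to the OcrPage call in the example, for instance gdpicturePDF.OcrPage(options.LanguageParameter, options.ResourceFolder, options.CharAllowlist, options.Dpi). If you read LanguageParameter before acquiring the image, a malformed language code fails early. Otherwise it would only fail after the scanner has already produced a page.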
The example below gets an image from a scanner and then saves it in a searchable PDF:
```csharp
using System;
using GdPicture14;

using GdPictureImaging gdpictureImaging = new GdPictureImaging();
using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Store the handle of the active window in a variable.
IntPtr WINDOW_HANDLE = IntPtr.Zero;
// Select the scanner.
gdpictureImaging.TwainSelectSource(WINDOW_HANDLE);
gdpictureImaging.TwainOpenDefaultSource(WINDOW_HANDLE);
// (Optional) Hide the scanning user interface.
gdpictureImaging.TwainSetHideUI(true);
// Create the destination PDF document.
gdpicturePDF.NewPDF(PdfConformance.PDF);
// Get the image from the scanner.
int imageID = gdpictureImaging.TwainAcquireToGdPictureImage(WINDOW_HANDLE);
// Add the scanned image to a new page in the destination document.
gdpicturePDF.AddImageFromGdPictureImage(imageID, false, true);
// Run the OCR process: language, resource folder, allowlist ("" = all characters), DPI.
gdpicturePDF.OcrPage("eng", @"C:\GdPicture.NET 14\Redist\OCR", "", 300);
// Save the result in a PDF document.
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
// Release unnecessary resources.
gdpictureImaging.ReleaseGdPictureImage(imageID);
gdpictureImaging.TwainCloseSource();
```
```vb
Imports System
Imports GdPicture14

Module Program
    Sub Main()
        Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
            Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
                ' Store the handle of the active window in a variable.
                Dim WINDOW_HANDLE As IntPtr = IntPtr.Zero
                ' Select the scanner.
                gdpictureImaging.TwainSelectSource(WINDOW_HANDLE)
                gdpictureImaging.TwainOpenDefaultSource(WINDOW_HANDLE)
                ' (Optional) Hide the scanning user interface.
                gdpictureImaging.TwainSetHideUI(True)
                ' Create the destination PDF document.
                gdpicturePDF.NewPDF(PdfConformance.PDF)
                ' Get the image from the scanner.
                Dim imageID As Integer = gdpictureImaging.TwainAcquireToGdPictureImage(WINDOW_HANDLE)
                ' Add the scanned image to a new page in the destination document.
                gdpicturePDF.AddImageFromGdPictureImage(imageID, False, True)
                ' Run the OCR process: language, resource folder, allowlist ("" = all characters), DPI.
                gdpicturePDF.OcrPage("eng", "C:\GdPicture.NET 14\Redist\OCR", "", 300)
                ' Save the result in a PDF document.
                gdpicturePDF.SaveToFile("C:\temp\output.pdf")
                ' Release unnecessary resources.
                gdpictureImaging.ReleaseGdPictureImage(imageID)
                gdpictureImaging.TwainCloseSource()
            End Using
        End Using
    End Sub
End Module
```
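Used methods
- GdPictureImaging.TwainSelectSource
- GdPictureImaging.TwainOpenDefaultSource
- GdPictureImaging.TwainSetHideUI
- GdPictureImaging.TwainAcquireToGdPictureImage
- GdPictureImaging.ReleaseGdPictureImage
- GdPictureImaging.TwainCloseSource
- GdPicturePDF.NewPDF
- GdPicturePDF.AddImageFromGdPictureImage
- GdPicturePDF.OcrPage
- GdPicturePDF.SaveToFile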