Enhance Characters in PDFs and Images in C#
This guide explains how to enhance characters in PDFs and images.
Thick and Oversampled Characters
Sometimes characters in documents appear thick and their features are unclear, for example, if too much ink was used to print a page, or if a document was scanned and printed many times. A process called erosion can fix this issue by removing pixels from the edges of the objects in an image, such as character strokes, which makes each stroke thinner.
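To make the idea of erosion concrete, here is a minimal conceptual sketch on a tiny black-and-white grid. It is not the GdPicture.NET implementation: the class name ErosionSketch, the bool[,] representation (true means black), and the choice to treat pixels outside the grid as white are all assumptions made for illustration. The rule shown is the common 8-neighbor form of erosion, where a pixel stays black only if it and all eight of its neighbors are black.

```csharp
using System;

// Hypothetical illustration of 8-neighbor erosion; not part of GdPicture.NET.
public static class ErosionSketch
{
    // Returns a new grid in which a pixel stays black (true) only if it and
    // all eight of its neighbors are black. Pixels outside the grid are
    // treated as white (an assumption of this sketch).
    public static bool[,] Erode8(bool[,] source)
    {
        int rows = source.GetLength(0);
        int cols = source.GetLength(1);
        var result = new bool[rows, cols];

        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                bool keep = true;
                // Scan the 3x3 window centered on (r, c); stop at the first white pixel.
                for (int dr = -1; dr <= 1 && keep; dr++)
                {
                    for (int dc = -1; dc <= 1 && keep; dc++)
                    {
                        int nr = r + dr;
                        int nc = c + dc;
                        bool inside = nr >= 0 && nr < rows && nc >= 0 && nc < cols;
                        keep = inside && source[nr, nc];
                    }
                }
                result[r, c] = keep;
            }
        }
        return result;
    }

    // Counts the black pixels in a grid.
    public static int CountBlack(bool[,] grid)
    {
        int count = 0;
        foreach (bool pixel in grid)
        {
            if (pixel) count++;
        }
        return count;
    }

    public static void Main()
    {
        // A "thick" 3x3 black blob in the middle of a 5x5 white grid.
        var thick = new bool[5, 5];
        for (int r = 1; r <= 3; r++)
            for (int c = 1; c <= 3; c++)
                thick[r, c] = true;

        bool[,] thinned = Erode8(thick);

        // Prints "Black pixels before: 9, after: 1"
        Console.WriteLine($"Black pixels before: {CountBlack(thick)}, after: {CountBlack(thinned)}");
    }
}
```

Only the center pixel survives because every other black pixel touches at least one white neighbor. On a real scan, the same rule strips roughly one pixel from every stroke edge, which is why thick characters come out thinner.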
The images below show what a document looks like before and after erosion.
Don’t preprocess documents before recognizing text with OCR. The GdPicture.NET OCR engine preprocesses documents automatically with better results than manual preprocessing.
To fix thick characters, follow these steps:
- Create a GdPictureImaging object.
- Select the image by passing its path to the CreateGdPictureImageFromFile method of the GdPictureImaging object.
- Fix thick characters by passing the image ID to the FxBitonalErode8 method of the GdPictureImaging object.
- Save the output in a new image with the SaveAsPNG method of the GdPictureImaging object.
- Release the image resource with the ReleaseGdPictureImage method of the GdPictureImaging object.
The example below fixes thick characters:
C#

    using GdPictureImaging gdpictureImaging = new GdPictureImaging();
    // Load the image from a file.
    int imageId = gdpictureImaging.CreateGdPictureImageFromFile(@"C:/temp/source.png");
    // Fix thick characters.
    gdpictureImaging.FxBitonalErode8(imageId);
    // Save the output in a new image.
    gdpictureImaging.SaveAsPNG(imageId, @"C:/temp/output.png");
    // Release the image resource.
    gdpictureImaging.ReleaseGdPictureImage(imageId);

VB.NET

    Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
        ' Load the image from a file.
        Dim imageId As Integer = gdpictureImaging.CreateGdPictureImageFromFile("C:/temp/source.png")
        ' Fix thick characters.
        gdpictureImaging.FxBitonalErode8(imageId)
        ' Save the output in a new image.
        gdpictureImaging.SaveAsPNG(imageId, "C:/temp/output.png")
        ' Release the image resource.
        gdpictureImaging.ReleaseGdPictureImage(imageId)
    End Using
Faint and Low-Sampled Characters
Sometimes characters in documents appear faint and low-sampled, and their features are unclear, for example, if the brightness used to scan a page was too high, or if a document was converted to a binary (black-and-white) image with a poorly suited algorithm. A process called black dilation can fix this issue by adding black pixels around objects, which makes each character stroke thicker.
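As a conceptual illustration of black dilation, the sketch below applies the common 8-neighbor rule to a tiny grid: a pixel becomes black if it or any of its eight neighbors is black. This is not the GdPicture.NET implementation. The class name DilationSketch and the bool[,] representation (true means black) are assumptions made for illustration, and pixels outside the grid are simply ignored.

```csharp
using System;

// Hypothetical illustration of 8-neighbor black dilation; not part of GdPicture.NET.
public static class DilationSketch
{
    // Returns a new grid in which a pixel is black (true) if it or any of
    // its eight neighbors is black in the source. Positions outside the
    // grid are ignored (an assumption of this sketch).
    public static bool[,] Dilate8(bool[,] source)
    {
        int rows = source.GetLength(0);
        int cols = source.GetLength(1);
        var result = new bool[rows, cols];

        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                bool anyBlack = false;
                // Scan the 3x3 window centered on (r, c); stop at the first black pixel.
                for (int dr = -1; dr <= 1 && !anyBlack; dr++)
                {
                    for (int dc = -1; dc <= 1 && !anyBlack; dc++)
                    {
                        int nr = r + dr;
                        int nc = c + dc;
                        bool inside = nr >= 0 && nr < rows && nc >= 0 && nc < cols;
                        anyBlack = inside && source[nr, nc];
                    }
                }
                result[r, c] = anyBlack;
            }
        }
        return result;
    }

    // Counts the black pixels in a grid.
    public static int CountBlack(bool[,] grid)
    {
        int count = 0;
        foreach (bool pixel in grid)
        {
            if (pixel) count++;
        }
        return count;
    }

    public static void Main()
    {
        // A "faint" mark: a single black pixel in the middle of a 5x5 white grid.
        var faint = new bool[5, 5];
        faint[2, 2] = true;

        bool[,] thickened = Dilate8(faint);

        // Prints "Black pixels before: 1, after: 9"
        Console.WriteLine($"Black pixels before: {CountBlack(faint)}, after: {CountBlack(thickened)}");
    }
}
```

The single pixel grows into a 3x3 block. On a real scan, the same rule adds roughly one pixel to every stroke edge, which can close small gaps in faint characters.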
To fix faint characters, follow these steps:
- Create a GdPictureImaging object.
- Select the image by passing its path to the CreateGdPictureImageFromFile method of the GdPictureImaging object.
- Fix faint characters by passing the image ID to the FxBitonalDilate8 method of the GdPictureImaging object.
- Save the output in a new image with the SaveAsPNG method of the GdPictureImaging object.
- Release the image resource with the ReleaseGdPictureImage method of the GdPictureImaging object.
The example below fixes faint characters:
C#

    using GdPictureImaging gdpictureImaging = new GdPictureImaging();
    // Load the image from a file.
    int imageId = gdpictureImaging.CreateGdPictureImageFromFile(@"C:/temp/source.png");
    // Fix faint characters.
    gdpictureImaging.FxBitonalDilate8(imageId);
    // Save the output in a new image.
    gdpictureImaging.SaveAsPNG(imageId, @"C:/temp/output.png");
    // Release the image resource.
    gdpictureImaging.ReleaseGdPictureImage(imageId);

VB.NET

    Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
        ' Load the image from a file.
        Dim imageId As Integer = gdpictureImaging.CreateGdPictureImageFromFile("C:/temp/source.png")
        ' Fix faint characters.
        gdpictureImaging.FxBitonalDilate8(imageId)
        ' Save the output in a new image.
        gdpictureImaging.SaveAsPNG(imageId, "C:/temp/output.png")
        ' Release the image resource.
        gdpictureImaging.ReleaseGdPictureImage(imageId)
    End Using