Resolve the Aquaforest OCR file not found error

Neil Pitman

When using the Aquaforest OCR SDK, your application may intermittently receive the following message:

System.IO.FileNotFoundException was caught
FileName=C:\WINDOWS\TEMP\AquaforestOcrxxxx_xxx_x.hocr
Message=Could not find file 'C:\WINDOWS\TEMP\AquaforestOcrxxxx_xxx_x.hocr'

This exception is raised because the source file was not OCR’d, although the message itself does not say so. To handle this case, subscribe to the StatusUpdate event, which passes a StatusUpdateEventArgs object to your handler for each page processed. This object describes the processing outcome for the page, so you can check whether text was extracted before trying to read it.

Properties

Below are the properties of this class.

  • int PageNumber The number of the page to which this object relates.

  • int Rotation A value from 0 to 3 which indicates the rotation used for the output in terms of the number of 90° steps away from the orientation in which the input page was provided. If AutoRotation is set to false this will always be 0.

  • double ConfidenceScore Generally, a value of 1 or greater indicates reasonable OCR of the page, but this threshold should be confirmed using “typical” source files.

  • bool TextAvailable This property indicates whether text was extracted for the page.

  • bool ImageAvailable This property indicates whether an image (after all appropriate pre-processing) was successfully extracted.

  • bool BlankPage This property indicates whether the page was detected as blank.
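
As a rough illustration of how these properties might be read together, the sketch below summarises a page outcome in one line. To stay self-contained it uses a hypothetical stand-in class, PageStatus, that only mirrors the properties listed above; it is not the SDK's StatusUpdateEventArgs. The PageStatusReport helper and the use of 1 as the confidence cut-off (taken from the guidance above) are illustrative assumptions.

```csharp
using System;

// Hypothetical stand-in mirroring the StatusUpdateEventArgs properties listed above.
// It is not part of the Aquaforest SDK; it only keeps this sketch self-contained.
public class PageStatus
{
    public int PageNumber { get; set; }
    public int Rotation { get; set; }            // 0 to 3, in 90-degree steps
    public double ConfidenceScore { get; set; }
    public bool TextAvailable { get; set; }
    public bool ImageAvailable { get; set; }
    public bool BlankPage { get; set; }
}

public static class PageStatusReport
{
    // Converts the Rotation step count into degrees.
    public static int RotationDegrees(PageStatus status)
    {
        return status.Rotation * 90;
    }

    // Builds an illustrative one-line summary of a page's outcome.
    public static string Describe(PageStatus status)
    {
        if (status.BlankPage)
            return "Page " + status.PageNumber + ": blank";
        if (!status.TextAvailable)
            return "Page " + status.PageNumber + ": no text extracted";

        // Assumed cut-off: the text above suggests 1 or greater is reasonable.
        string quality = status.ConfidenceScore >= 1.0 ? "reasonable" : "low";
        return "Page " + status.PageNumber + ": text available, " + quality
            + " confidence, rotated " + RotationDegrees(status) + " degrees";
    }
}

public static class ReportDemo
{
    public static void Main()
    {
        var page = new PageStatus
        {
            PageNumber = 2,
            Rotation = 1,
            ConfidenceScore = 1.4,
            TextAvailable = true,
            ImageAvailable = true
        };
        Console.WriteLine(PageStatusReport.Describe(page));
    }
}
```

In a real handler, the same checks would be applied directly to the StatusUpdateEventArgs instance, and the cut-off tuned against your own documents.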

Example

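Below is a C# example that uses StatusUpdateEventArgs to work around this issue. The relevant parts are the StatusUpdate subscription, the OcrStatusUpdate handler, and the textAvailable check before ReadPageString: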

using System;
// Also import the Aquaforest OCR SDK namespace for your SDK version.

class Program
{
    // Updated by the StatusUpdate handler with the most recently reported page status.
    static bool textAvailable = false;

    static void Main(string[] args)
    {
        try
        {
            Ocr _ocr = new Ocr();
            _ocr.License = "";
            PreProcessor _preProcessor = new PreProcessor();
            _ocr.EnableConsoleOutput = true;

            // Make the OCR binaries discoverable via PATH and the resource folder.
            string OCRFiles = System.IO.Path.GetFullPath(@"..\..\..\..\..\..\bin");
            System.Environment.SetEnvironmentVariable("PATH",
                System.Environment.GetEnvironmentVariable("PATH") + ";" + OCRFiles);
            _ocr.ResourceFolder = OCRFiles;

            _preProcessor.Deskew = true;
            _preProcessor.Autorotate = false;
            _ocr.Language = SupportedLanguages.English;
            _ocr.EnablePdfOutput = true;

            // Subscribe before recognition so the handler receives the page status.
            _ocr.StatusUpdate += OcrStatusUpdate;

            _ocr.ReadTIFFSource(System.IO.Path.GetFullPath(@"..\..\..\..\..\..\docs\tiffs\sample.tif"));
            if (_ocr.Recognize(_preProcessor))
            {
                string words = null;
                for (int j = 1; j <= _ocr.NumberPages; j++)
                {
                    try
                    {
                        // Only read page text if the handler reported that text was extracted.
                        if (textAvailable)
                            words += _ocr.ReadPageString(j);
                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine("ERROR: " + ex.Message);
                    }
                }
                _ocr.SavePDFOutput(System.IO.Path.GetFullPath(@"..\..\..\..\..\..\docs\tiffs\sample.pdf"), true);
            }
            _ocr.DeleteTemporaryFiles();
        }
        catch (Exception e)
        {
            Console.WriteLine("Error in OCR Processing: " + e.Message);
        }
    }

    // Records whether text was extracted for the page that has just been processed.
    private static void OcrStatusUpdate(object sender, StatusUpdateEventArgs statusUpdateEventArgs)
    {
        textAvailable = statusUpdateEventArgs.TextAvailable;
    }
}
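
The example keeps a single flag, so if the event fires once per page, as described above, the flag ends up holding the status of whichever page reported last. For multi-page documents, one option is to record the outcome per page instead. The sketch below shows only that bookkeeping, in isolation from the SDK: PageResult and PageTextTracker are hypothetical names, and it assumes PageNumber uses the same 1-based numbering as the loop over NumberPages, which should be confirmed against the SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the two StatusUpdateEventArgs properties used here.
public class PageResult
{
    public int PageNumber { get; set; }
    public bool TextAvailable { get; set; }
}

// Hypothetical helper that remembers the TextAvailable flag for each page.
public class PageTextTracker
{
    private readonly Dictionary<int, bool> _textByPage = new Dictionary<int, bool>();

    // Intended to be called from the status handler, once per processed page.
    public void Record(PageResult result)
    {
        _textByPage[result.PageNumber] = result.TextAvailable;
    }

    // Pages that never reported a status are treated as having no text.
    public bool HasText(int pageNumber)
    {
        bool available;
        return _textByPage.TryGetValue(pageNumber, out available) && available;
    }
}

public static class TrackerDemo
{
    public static void Main()
    {
        var tracker = new PageTextTracker();
        tracker.Record(new PageResult { PageNumber = 1, TextAvailable = true });
        tracker.Record(new PageResult { PageNumber = 2, TextAvailable = false });

        // Page 3 never reported, so it is treated as unreadable.
        for (int page = 1; page <= 3; page++)
        {
            Console.WriteLine("Page " + page + " readable: " + tracker.HasText(page));
        }
    }
}
```

Applied to the example above, the handler would call Record with the page number and TextAvailable value it receives, and the loop would test HasText(j) before calling ReadPageString(j).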

Author
Neil Pitman, Head of IT Business Solutions

Neil established Aquaforest (later acquired by Nutrient) in 2001 to provide high-performance PDF, OCR, and SharePoint products to a worldwide market.
