Extract Data from Bank Statements Using C#
GdPicture.NET’s key-value pair (KVP) extraction engine enables you to recognize related data items in a document and export them to an external destination like a spreadsheet.
To extract data items from a bank statement, follow these steps:
- Create a `GdPictureOCR` object and a `GdPictureImaging` object.
- Select the bank statement by passing its path to the `CreateGdPictureImageFromFile` method of the `GdPictureImaging` object.
- Configure the OCR process with the `GdPictureOCR` object in the following way:
  - Set the bank statement as the source image with the `SetImage` method.
  - Set the path to the OCR resource folder with the `ResourceFolder` property. The default language resources are located in `GdPicture.NET 14\Redist\OCR`. For more information on adding language resources, see the language support guide.
  - With the `AddLanguage` method, add the language resources that GdPicture.NET uses to recognize text in the image. This method takes a member of the `OCRLanguage` enumeration.
- Run the OCR process with the `RunOCR` method of the `GdPictureOCR` object.
- Get the number of key-value pairs detected during the OCR process with the `GetKeyValuePairCount` method of the `GdPictureOCR` object, and loop through them.
- Get the key-value pairs, the data types, and the confidence scores with the following methods:
  - `GetKeyValuePairKeyString`
  - `GetKeyValuePairValueString`
  - `GetKeyValuePairDataType`
  - `GetKeyValuePairConfidence`
- Write the output to the console.
- Release unnecessary resources.
The example below retrieves key-value pairs from the following bank statement.
Download the sample bank statement and run the code below, or check out our demo.
```csharp
using GdPictureOCR gdpictureOCR = new GdPictureOCR();
using GdPictureImaging gdpictureImaging = new GdPictureImaging();
// Load the source document.
int imageId = gdpictureImaging.CreateGdPictureImageFromFile(@"C:\temp\source.png");
// Configure the OCR process.
gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
gdpictureOCR.AddLanguage(OCRLanguage.English);
gdpictureOCR.SetImage(imageId);
// Run the OCR process.
string ocrResultId = gdpictureOCR.RunOCR();
string keyValuePairsData = "";
for (int pairIndex = 0; pairIndex < gdpictureOCR.GetKeyValuePairCount(ocrResultId); pairIndex++)
{
    keyValuePairsData += $"| Key: {gdpictureOCR.GetKeyValuePairKeyString(ocrResultId, pairIndex)} | " +
        $"Value: {gdpictureOCR.GetKeyValuePairValueString(ocrResultId, pairIndex)} | " +
        $"Document Type: {gdpictureOCR.GetKeyValuePairDataType(ocrResultId, pairIndex).ToString()} | " +
        $"Confidence Level: {Math.Round(gdpictureOCR.GetKeyValuePairConfidence(ocrResultId, pairIndex), 1).ToString()}% |\n";
}
// Write the output to the console.
Console.WriteLine(keyValuePairsData);
// Release unnecessary resources.
gdpictureImaging.ReleaseGdPictureImage(imageId);
gdpictureOCR.ReleaseOCRResults();
```
```vb
Using gdpictureOCR As GdPictureOCR = New GdPictureOCR()
    Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
        ' Load the source document.
        Dim imageId As Integer = gdpictureImaging.CreateGdPictureImageFromFile("C:\temp\source.png")
        ' Configure the OCR process.
        gdpictureOCR.ResourceFolder = "C:\GdPicture.NET 14\Redist\OCR"
        gdpictureOCR.AddLanguage(OCRLanguage.English)
        gdpictureOCR.SetImage(imageId)
        ' Run the OCR process.
        Dim ocrResultId As String = gdpictureOCR.RunOCR()
        Dim keyValuePairsData = ""
        For pairIndex As Integer = 0 To gdpictureOCR.GetKeyValuePairCount(ocrResultId) - 1
            keyValuePairsData += $"| Key: {gdpictureOCR.GetKeyValuePairKeyString(ocrResultId, pairIndex)} | " &
                $"Value: {gdpictureOCR.GetKeyValuePairValueString(ocrResultId, pairIndex)} | " &
                $"Document Type: {gdpictureOCR.GetKeyValuePairDataType(ocrResultId, pairIndex).ToString()} | " &
                $"Confidence Level: {Math.Round(gdpictureOCR.GetKeyValuePairConfidence(ocrResultId, pairIndex), 1).ToString()}% |" & vbLf
        Next
        ' Write the output to the console.
        Console.WriteLine(keyValuePairsData)
        ' Release unnecessary resources.
        gdpictureImaging.ReleaseGdPictureImage(imageId)
        gdpictureOCR.ReleaseOCRResults()
    End Using
End Using
```
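The sample code builds a single console string, but extracted pairs are often destined for a spreadsheet. The following minimal sketch shows one way to serialize the pairs as CSV text instead. It assumes you collect each pair into a small container inside the loop; `ExtractedPair` and `KeyValueCsv` are hypothetical names introduced for illustration and are not part of GdPicture.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical container for one extracted pair; not a GdPicture.NET type.
public record ExtractedPair(string Key, string Value, string DataType, double Confidence);

public static class KeyValueCsv
{
    // Quote a field only when it contains a comma, a quote, or a line break.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Build CSV text: one header row, then one row per pair.
    public static string ToCsv(IEnumerable<ExtractedPair> pairs)
    {
        var lines = new List<string> { "Key,Value,Data Type,Confidence" };
        foreach (ExtractedPair pair in pairs)
        {
            lines.Add(string.Join(",",
                Escape(pair.Key),
                Escape(pair.Value),
                Escape(pair.DataType),
                // Invariant culture keeps "." as the decimal separator on every machine.
                pair.Confidence.ToString("0.0", CultureInfo.InvariantCulture)));
        }
        return string.Join("\n", lines);
    }
}
```

To use it, fill a `List<ExtractedPair>` from the key, value, data type, and confidence calls in the loop, then pass the result of `KeyValueCsv.ToCsv(pairs)` to `System.IO.File.WriteAllText` with a path of your choice. Quoting only the fields that need it keeps the file readable while still handling values that contain commas.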
Format the output to obtain the following table:
Key | Value | Document Type | Confidence Level |
---|---|---|---|
IBAN | FR7611808009101234567890147 | IBAN | 100% |
Phone | 786-315-0313 | PhoneNumber | 100% |
BIC | 12345678901 | Number | 66.4% |
Bank Code | 11808 | Number | 99.4% |
Counter Code | 00914 | Number | 100% |
Number Account | 12345678901 | Number | 99.3% |
Bank Key | 47 | Number | 74.2% |
River Bank | 100 | Number | 74% |
Account Owner | David Bricklane | String | 100% |
Domiciliation | East Bank Summerfield | String | 97.5% |
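As a rough illustration of the formatting step, the sketch below renders a list of pairs in the same pipe-delimited layout as the table above. It is one possible approach under stated assumptions, not the method used to produce that table: `ExtractedPair` and `PairTable` are hypothetical names, and the pairs are assumed to have already been read from the OCR result.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical container for one extracted pair; not a GdPicture.NET type.
public record ExtractedPair(string Key, string Value, string DataType, double Confidence);

public static class PairTable
{
    // Render the pairs as a pipe-delimited table with a header and separator row.
    public static string ToTable(IEnumerable<ExtractedPair> pairs)
    {
        var builder = new StringBuilder();
        builder.Append("Key | Value | Document Type | Confidence Level |\n");
        builder.Append("---|---|---|---|\n");
        foreach (ExtractedPair pair in pairs)
        {
            // Round to one decimal and drop a trailing ".0", so 100.0 prints as "100".
            string confidence = Math.Round(pair.Confidence, 1)
                .ToString("0.#", CultureInfo.InvariantCulture);
            builder.Append($"{pair.Key} | {pair.Value} | {pair.DataType} | {confidence}% |\n");
        }
        return builder.ToString();
    }
}
```

Calling `PairTable.ToTable(pairs)` and writing the result to the console or a file gives one row per pair. The `"0.#"` format is a design choice that keeps whole-number scores compact while preserving one decimal place for the rest.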