Extract Data from Invoices Using C#
GdPicture.NET’s key-value pair (KVP) extraction engine enables you to recognize related data items in a document and export them to an external destination like a spreadsheet.
To extract data items from an invoice, follow these steps:
- Create a GdPictureOCR object and a GdPictureImaging object.
- Select the invoice by passing its path to the CreateGdPictureImageFromFile method of the GdPictureImaging object.
- Configure the OCR process with the GdPictureOCR object in the following way:
  - Set the invoice with the SetImage method.
  - Set the path to the OCR resource folder with the ResourceFolder property. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.
  - With the AddLanguage method, add the language resources that GdPicture.NET uses to recognize text in the image. This method takes a member of the OCRLanguage enumeration.
- Run the OCR process with the RunOCR method of the GdPictureOCR object. This method returns the ID of the OCR result, which the methods in the following steps take as their first parameter.
- Get the number of key-value pairs detected during the OCR process with the GetKeyValuePairCount method of the GdPictureOCR object, and loop through them.
- Get the key-value pairs, the data types, and the confidence scores with the following methods:
  - GetKeyValuePairKeyString
  - GetKeyValuePairValueString
  - GetKeyValuePairDataType
  - GetKeyValuePairConfidence
- Write the output to the console.
- Release unnecessary resources: the image with ReleaseGdPictureImage and the OCR results with ReleaseOCRResults.
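The confidence score returned by GetKeyValuePairConfidence can help you decide which extracted values to trust. The following is a minimal sketch, not part of GdPicture.NET, that assumes the pairs have already been read into a hypothetical ExtractedPair record and that scores use the 0 to 100 scale suggested by the sample output. It routes pairs below a cut-off to manual review. The threshold is an arbitrary assumption that you would tune for your documents.

```csharp
using System.Collections.Generic;

// Hypothetical container for one extracted pair (not a GdPicture.NET type).
public readonly record struct ExtractedPair(string Key, string Value, double Confidence);

public static class ConfidenceReview
{
    // Splits pairs into those at or above the threshold and those below it.
    // The threshold is an assumption to tune per document layout; it is not a library setting.
    public static (List<ExtractedPair> Accepted, List<ExtractedPair> NeedsReview) Split(
        IEnumerable<ExtractedPair> pairs, double threshold)
    {
        var accepted = new List<ExtractedPair>();
        var needsReview = new List<ExtractedPair>();
        foreach (ExtractedPair pair in pairs)
        {
            if (pair.Confidence >= threshold)
                accepted.Add(pair);
            else
                needsReview.Add(pair);
        }
        return (accepted, needsReview);
    }
}
```

With a threshold of 80 and three rows from the sample output, TOTAL (100%) and Price per unit (excl. VAT) (80%) are accepted, while Quantity Total (excl. VAT) (59%) is flagged for review.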
The example below, shown in C# and VB.NET, retrieves key-value pairs from a sample invoice.
Download the sample invoice and run the code below, or check out our demo.
```csharp
using System;
using System.Text;
using GdPicture14;

using GdPictureOCR gdpictureOCR = new GdPictureOCR();
using GdPictureImaging gdpictureImaging = new GdPictureImaging();

// Load the source document.
int imageId = gdpictureImaging.CreateGdPictureImageFromFile(@"C:\temp\source.png");

// Configure the OCR process.
gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
gdpictureOCR.AddLanguage(OCRLanguage.English);
gdpictureOCR.SetImage(imageId);

// Run the OCR process. The returned ID identifies the OCR result in the calls below.
string ocrResultId = gdpictureOCR.RunOCR();

// Read the key, value, data type, and confidence score of each detected pair.
var keyValuePairsData = new StringBuilder();
int pairCount = gdpictureOCR.GetKeyValuePairCount(ocrResultId);
for (int pairIndex = 0; pairIndex < pairCount; pairIndex++)
{
    keyValuePairsData
        .Append($"| Key: {gdpictureOCR.GetKeyValuePairKeyString(ocrResultId, pairIndex)} | ")
        .Append($"Value: {gdpictureOCR.GetKeyValuePairValueString(ocrResultId, pairIndex)} | ")
        .Append($"Document Type: {gdpictureOCR.GetKeyValuePairDataType(ocrResultId, pairIndex)} | ")
        .Append($"Confidence Level: {Math.Round(gdpictureOCR.GetKeyValuePairConfidence(ocrResultId, pairIndex), 1)}% |\n");
}

// Write the output to the console.
Console.WriteLine(keyValuePairsData.ToString());

// Release unnecessary resources.
gdpictureImaging.ReleaseGdPictureImage(imageId);
gdpictureOCR.ReleaseOCRResults();
```
```vb
Imports System
Imports GdPicture14

Module Program
    Sub Main()
        Using gdpictureOCR As GdPictureOCR = New GdPictureOCR()
            Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
                ' Load the source document.
                Dim imageId As Integer = gdpictureImaging.CreateGdPictureImageFromFile("C:\temp\source.png")

                ' Configure the OCR process.
                gdpictureOCR.ResourceFolder = "C:\GdPicture.NET 14\Redist\OCR"
                gdpictureOCR.AddLanguage(OCRLanguage.English)
                gdpictureOCR.SetImage(imageId)

                ' Run the OCR process. The returned ID identifies the OCR result in the calls below.
                Dim ocrResultId As String = gdpictureOCR.RunOCR()

                ' Read the key, value, data type, and confidence score of each detected pair.
                Dim keyValuePairsData As String = ""
                For pairIndex As Integer = 0 To gdpictureOCR.GetKeyValuePairCount(ocrResultId) - 1
                    keyValuePairsData &= $"| Key: {gdpictureOCR.GetKeyValuePairKeyString(ocrResultId, pairIndex)} | " &
                        $"Value: {gdpictureOCR.GetKeyValuePairValueString(ocrResultId, pairIndex)} | " &
                        $"Document Type: {gdpictureOCR.GetKeyValuePairDataType(ocrResultId, pairIndex)} | " &
                        $"Confidence Level: {Math.Round(gdpictureOCR.GetKeyValuePairConfidence(ocrResultId, pairIndex), 1)}% |" & vbLf
                Next

                ' Write the output to the console.
                Console.WriteLine(keyValuePairsData)

                ' Release unnecessary resources.
                gdpictureImaging.ReleaseGdPictureImage(imageId)
                gdpictureOCR.ReleaseOCRResults()
            End Using
        End Using
    End Sub
End Module
```
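The sample loop prints one "| Key: … | Value: … |" line per pair. To get a header row and plain cells instead, you can format the collected values as a pipe table. The following is a minimal sketch, assuming the four values of each pair have already been read into a tuple. KvpTable is a hypothetical helper, not a GdPicture.NET type. Pipe characters inside values are escaped so they cannot shift the columns.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class KvpTable
{
    // Escapes pipe characters so a cell value cannot break the column layout.
    private static string Cell(string text) => text.Replace("|", "\\|");

    // Writes a header row, a separator row, and one row per pair.
    public static string Format(
        IEnumerable<(string Key, string Value, string DataType, double Confidence)> pairs)
    {
        var sb = new StringBuilder();
        sb.Append("| Key | Value | Document Type | Confidence Level |\n");
        sb.Append("|---|---|---|---|\n");
        foreach (var (key, value, dataType, confidence) in pairs)
        {
            // Round to one decimal place, as the sample code does with Math.Round.
            string percent = Math.Round(confidence, 1).ToString(CultureInfo.InvariantCulture);
            sb.Append($"| {Cell(key)} | {Cell(value)} | {Cell(dataType)} | {percent}% |\n");
        }
        return sb.ToString();
    }
}
```

Using CultureInfo.InvariantCulture keeps the decimal separator a period regardless of the machine's regional settings.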
Format the output to obtain the following table:
Key | Value | Document Type | Confidence Level |
---|---|---|---|
Billing date | 20/09/2022 | DateTime | 100% |
Order date | 20/09/2022 | DateTime | 100% |
Republic of PDF | +100 847 738 227 | PhoneNumber | 77.2% |
IBAN | AT13 2060 4236 6111 5994 | IBAN | 100% |
Customer | Vandelay Industries Around the Corner 13 NBC City | String | 69.8% |
Delivery address | Vandelay Industries Around the Corner 13 NBC City | String | 69.9% |
Invoice number | No 00162 | String | 70.9% |
Ref. number | 34751 | Number | 92.9% |
No | 00162 | Number | 100% |
Reference | P00201 | UID | 100% |
Quantity Total (excl. VAT) | 320.00€ | Currency | 59% |
Subtotal | 1,220.00€ | Currency | 100% |
Discount (10%) | -122.00€ | Currency | 90.6% |
VAT (5.5%) | +6710€ | Currency | 66.9% |
Shipping cost | 0.00€ | Currency | 75% |
TOTAL | 1,165.10€ | Currency | 100% |
Description | Lake Mirror | String | 99.6% |
VAT | 5.5% | Percentage | 66.6% |
Price per unit (excl. VAT) | 320.00€ | Currency | 80% |
Tax No. | AT98765321 | UID | 73.8% |
# | [email protected] | EmailAddress | 65.6% |
# | www.bruuuk.com | URL | 65.6% |
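To send the pairs to a spreadsheet rather than the console, one simple option is CSV. The following is a minimal sketch, assuming the pairs have been collected into a hypothetical KeyValueItem record (not a GdPicture.NET type). Fields are quoted following RFC 4180, so a value such as 1,220.00€ stays in a single cell.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical container for one extracted pair (not a GdPicture.NET type).
public record KeyValueItem(string Key, string Value, string DataType, double Confidence);

public static class KvpCsv
{
    // Quotes a field when it contains a comma, a double quote, or a line break,
    // doubling any embedded quotes (RFC 4180).
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
            return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Builds CSV text with a header row and CRLF line endings.
    public static string ToCsv(IEnumerable<KeyValueItem> items)
    {
        var sb = new StringBuilder();
        sb.Append("Key,Value,Data Type,Confidence\r\n");
        foreach (KeyValueItem item in items)
        {
            sb.Append(Escape(item.Key)).Append(',')
              .Append(Escape(item.Value)).Append(',')
              .Append(Escape(item.DataType)).Append(',')
              // Invariant culture keeps "." as the decimal separator.
              .Append(item.Confidence.ToString(CultureInfo.InvariantCulture))
              .Append("\r\n");
        }
        return sb.ToString();
    }
}
```

To save the result, you could call File.WriteAllText("kvp.csv", KvpCsv.ToCsv(items), new UTF8Encoding(true)). The byte order mark that this encoding writes helps spreadsheet applications detect UTF-8 and display the € sign correctly.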