Extract Data from Bank Statements
This guide explains how to extract key-value pairs (KVPs) from bank statements using Document Engine. For example, this enables you to extract IBANs or account numbers. For more information, refer to the guide on how key-value pair extraction works.
Sending the Request to Extract Data
To extract key-value pairs from a bank statement, post a multipart request to the /api/build endpoint. In the instructions, specify the following output parameters:
- type specifies the output type. Set this to json-content.
- keyValuePairs is a Boolean value that determines whether to extract key-value pairs.
- language specifies the language used for recognizing text with optical character recognition (OCR). Text in a PDF or an image is sometimes stored in a way that prevents you from searching or copying it. PSPDFKit’s OCR engine recognizes this text and saves it in a separate file where you can search, copy, and paste it. For more information, refer to the list of supported languages.
curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/example-document.pdf \
  -F instructions='{
      "parts": [
        {
          "file": "document"
        }
      ],
      "output": {
        "type": "json-content",
        "keyValuePairs": true,
        "language": "english"
      }
    }' \
  -o result.json
POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="document"; filename="example-document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    {
      "file": "document"
    }
  ],
  "output": {
    "type": "json-content",
    "keyValuePairs": true,
    "language": "english"
  }
}
--customboundary--
For more information on the Build instructions, refer to the API Reference.
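The same request can also be sent from a script. The following is a minimal Python sketch, not part of the original guide, that mirrors the raw HTTP request above using only the standard library. The function names encode_multipart and extract_key_value_pairs are hypothetical, and the default base URL is simply the one used in the curl example; adjust both to your setup.

```python
import json
import os
import urllib.request
import uuid


def encode_multipart(pdf_bytes: bytes, filename: str, instructions: dict) -> tuple[bytes, str]:
    """Build a multipart/form-data body with a "document" part and an "instructions" part.

    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    lines = [
        delimiter,
        f'Content-Disposition: form-data; name="document"; filename="{filename}"'.encode(),
        b"Content-Type: application/pdf",
        b"",
        pdf_bytes,
        delimiter,
        b'Content-Disposition: form-data; name="instructions"',
        b"Content-Type: application/json",
        b"",
        json.dumps(instructions).encode(),
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def extract_key_value_pairs(pdf_path: str, api_token: str,
                            base_url: str = "http://localhost:5000") -> dict:
    """POST a PDF to /api/build and return the parsed JSON response.

    Hypothetical helper: base_url is an assumption taken from the curl example.
    """
    instructions = {
        "parts": [{"file": "document"}],  # must match the multipart field name
        "output": {
            "type": "json-content",
            "keyValuePairs": True,
            "language": "english",
        },
    }
    with open(pdf_path, "rb") as pdf_file:
        body, content_type = encode_multipart(
            pdf_file.read(), os.path.basename(pdf_path), instructions
        )
    request = urllib.request.Request(
        f"{base_url}/api/build",
        data=body,
        headers={
            "Content-Type": content_type,
            "Authorization": f"Token token={api_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling extract_key_value_pairs("/path/to/example-document.pdf", "<API token>") against a running Document Engine instance should return the parsed response as a dictionary. If an HTTP client library is already available in your project, its multipart support can replace the hand-built body.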
Example Data Extraction Response
{
  "pages": [
    {
      "pageIndex": 0,
      "keyValuePairs": [
        {
          "confidence": 95.4,
          "key": {
            "bbox": {
              "left": 0,
              "top": 0,
              "width": 100,
              "height": 100
            },
            "content": "IBAN"
          },
          "value": {
            "bbox": {
              "left": 0,
              "top": 0,
              "width": 100,
              "height": 100
            },
            "content": "FR7611808009101234567890147",
            "dataType": "String"
          }
        }
      ]
    }
  ]
}
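To show how a response of this shape might be consumed, the following Python sketch walks the pages and yields each extracted pair. It isn't part of the original guide. The function name iter_key_value_pairs and the min_confidence threshold are hypothetical, and the sketch assumes the confidence value is a percentage, as the 95.4 in the example suggests. Bounding boxes are omitted from the sample for brevity.

```python
import json


def iter_key_value_pairs(response: dict, min_confidence: float = 0.0):
    """Yield (page_index, key, value, confidence) for each extracted pair.

    Pairs below min_confidence are skipped. The threshold is an illustrative
    assumption, not a Document Engine parameter.
    """
    for page in response.get("pages", []):
        for pair in page.get("keyValuePairs", []):
            confidence = pair.get("confidence", 0.0)
            if confidence >= min_confidence:
                yield (
                    page["pageIndex"],
                    pair["key"]["content"],
                    pair["value"]["content"],
                    confidence,
                )


# Trimmed copy of the example response above (bbox fields left out).
sample_response = json.loads("""
{
  "pages": [
    {
      "pageIndex": 0,
      "keyValuePairs": [
        {
          "confidence": 95.4,
          "key": {"content": "IBAN"},
          "value": {"content": "FR7611808009101234567890147", "dataType": "String"}
        }
      ]
    }
  ]
}
""")

for page_index, key, value, confidence in iter_key_value_pairs(sample_response, min_confidence=90):
    print(f"page {page_index}: {key} = {value} ({confidence}%)")
# prints: page 0: IBAN = FR7611808009101234567890147 (95.4%)
```

Filtering on confidence is one way to route uncertain extractions to manual review; the right threshold depends on the quality of your scanned statements.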