PDF Recognition to JSON Job Step

The new PDF Recognition to JSON step automatically extracts data from searchable PDF files as key/value pairs. The output is a JSON file that contains each expected key together with its extracted value.

A UI program is provided for testing PDF files and previewing which key/value pairs will be extracted from them. This program can be found at:
<Autobahn DX Installation directory>\distribution\recognition\AquaforestDataExtractorUI.exe

You must supply an ‘Expected Key’ file to tell Autobahn which keys to extract from the input files. You can also specify synonyms for each key, so that a value paired with any synonym is extracted under that key. This is useful when processing files whose formats vary and which label the same data differently. The example ‘Expected Key’ file below shows how synonyms can cover multiple names for the same key.

For more information about each step property, see Section 5.2.7.29.

Example of an “Expected Key” file:

{
  "expectedKeys": [
    {
      "expectedKey": "Invoice No",
      "synonyms": [
        "Invoice Number",
        "Invoice No.",
        "Invoice Num"
      ]
    },
    {
      "expectedKey": "Inv Date",
      "synonyms": [
        "Invoice Date",
        "Inv. Date",
        "Inv date"
      ]
    },
    {
      "expectedKey": "Reference",
      "synonyms": [ ]
    },
    {
      "expectedKey": "City/State/Zip",
      "synonyms": [ "Postcode" ]
    }
  ]
}
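
To illustrate how synonyms relate to expected keys, the following Python sketch loads a trimmed copy of the example file and builds a lookup from every label (expected key or synonym) to the expected key it belongs to. It is only an illustration of the file's structure, not a description of how Autobahn DX performs matching internally. The function name `build_synonym_lookup` is hypothetical, and exact, case-sensitive label matching is an assumption made for this sketch. It can be useful for checking a file before use, for example to catch a synonym that has been listed under two different keys.

```python
import json

# Trimmed copy of the example "Expected Key" file.
EXPECTED_KEYS_JSON = """
{
  "expectedKeys": [
    {"expectedKey": "Invoice No",
     "synonyms": ["Invoice Number", "Invoice No.", "Invoice Num"]},
    {"expectedKey": "Reference", "synonyms": []},
    {"expectedKey": "City/State/Zip", "synonyms": ["Postcode"]}
  ]
}
"""


def build_synonym_lookup(config_text):
    """Map each expected key and each of its synonyms to the expected key.

    Hypothetical helper: labels are compared exactly (case-sensitive),
    which is an assumption of this sketch, not documented product behaviour.
    """
    config = json.loads(config_text)
    lookup = {}
    for entry in config["expectedKeys"]:
        expected_key = entry["expectedKey"]
        for label in [expected_key, *entry.get("synonyms", [])]:
            # Reject a label that is claimed by two different expected keys.
            if lookup.get(label, expected_key) != expected_key:
                raise ValueError(f"{label!r} is listed under more than one key")
            lookup[label] = expected_key
    return lookup


lookup = build_synonym_lookup(EXPECTED_KEYS_JSON)
print(lookup["Invoice Number"])  # prints: Invoice No
print(lookup["Postcode"])        # prints: City/State/Zip
```

An entry with an empty synonym list, such as "Reference", still appears in the lookup under its own name.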