Extract and export data from PDF files easily

Advanced Export exports selected areas of a PDF page to CSV or XLSX files.

For an Advanced Export job, the Select Variables tab looks like this.

Select Variables tab

The displayed file is either the one selected in the Location Settings tab or one you choose by clicking the Open File button.

If there is no Document Automation Server (DAS) Content Extraction variable, click the Add Item button.

Select the area of the displayed file that contains the required information.

Select Area on Displayed File

Click the Camera icon to capture the text.

Captured Text

The extracted text contains more than the invoice number. This is because the area is sized to cover invoices with slightly different formats.

Here is the same area on the invoice that makes up the second page of the example document.

Extra Captured Area

Click Done.

The text needs to be refined.

For this example file, the invoice number matches the regular expression [A-Z][0-9\-]+.

The first part matches a single alphabetic character from A to Z. The second part matches one or more characters, each of which is a digit or a hyphen.
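To see what this pattern keeps and discards, it can be tried outside the product. The following is a minimal sketch using Python's standard re module, assuming the pattern is [A-Z][0-9\-]+ as described above and that the column setting returns the first match in the zone text. The function name and sample text are hypothetical, and the product's own matching rules (for example, case sensitivity) may differ.

```python
import re
from typing import Optional

# Assumed invoice-number pattern: one letter A-Z, then one or more digits or hyphens.
INVOICE_NUMBER = re.compile(r"[A-Z][0-9\-]+")


def extract_invoice_number(zone_text: str) -> Optional[str]:
    """Return the first substring of the captured zone text that matches, or None."""
    match = INVOICE_NUMBER.search(zone_text)
    return match.group(0) if match else None


# Hypothetical captured text: the zone is wider than the invoice number itself.
print(extract_invoice_number("Invoice Number: B20231-07"))  # prints B20231-07
```

The label text is skipped because neither "I" nor "N" is followed by a digit or hyphen, so the first match starts at "B".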

Add this pattern to the column settings by selecting text in zone where text matches pattern and entering the pattern in the textbox. Click the ? button to view tips on regular expressions.

Column Settings

Check the extracted text using the Camera icon in the column settings.

Add another item.

Select an area that covers the Grand Total on both pages.

Selecting Area Displaying the Grand Total

Click the Camera icon on Column 3 to capture the text in the selected area.

Check that the text contains the value you need.

Click Done.

The pattern for this selection is more complicated. The literal text Grand Total marks the beginning of the selection. It is followed by one or more separator characters (spaces or punctuation), then one or more digits, a decimal point, and two digits.

Pattern for Grand Total
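The pattern itself is only shown in the screenshot, so the following Python sketch is an assumed reconstruction of the description above, not the exact pattern used in the product. It uses \W+ for "one or more spaces or punctuation characters" and \d+\.\d{2} for the amount; the sample text is hypothetical.

```python
import re
from typing import Optional

# Assumed reconstruction: literal "Grand Total", one or more non-word
# characters (spaces, colons, currency symbols), digits, a decimal point, two digits.
GRAND_TOTAL = re.compile(r"Grand Total\W+\d+\.\d{2}")


def find_grand_total_line(zone_text: str) -> Optional[str]:
    """Return the 'Grand Total ... 0.00' part of the captured text, or None."""
    match = GRAND_TOTAL.search(zone_text)
    return match.group(0) if match else None


sample = "Subtotal 100.00 Tax 20.00 Grand Total: $120.00 Thank you"
print(find_grand_total_line(sample))  # prints Grand Total: $120.00
```

As described, the pattern does not allow thousands separators, so an amount written as 1,234.50 would not match.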

Click the Camera icon to see the extracted text.

Captured Text

The next task is to refine this text further by removing the “Grand Total” label so that only the amount that follows it remains.

Select all text in paragraph after value, set Value to Grand Total, and select where text matches pattern with a pattern that matches one or more digits, optionally followed by a decimal point and two digits.

Refining the Text
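The refinement step can be approximated in Python as "take the text after the value, then keep the first match of the amount pattern". This is a sketch under assumptions: the regex \d+(?:\.\d{2})? is one reading of "one or more digits optionally followed by a decimal point and two digits", the function name is hypothetical, and the product's "paragraph" scoping is not modelled.

```python
import re
from typing import Optional

# Assumed amount pattern: digits, optionally followed by "." and exactly two digits.
AMOUNT = re.compile(r"\d+(?:\.\d{2})?")


def refine(captured: str, value: str = "Grand Total") -> Optional[str]:
    """Keep only the text after `value`, then return the first amount found in it."""
    _, found, after = captured.partition(value)
    if not found:  # the value does not occur in the captured text
        return None
    match = AMOUNT.search(after)
    return match.group(0) if match else None


print(refine("Grand Total: $120.00"))  # prints 120.00
```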

Click the Camera icon to see the extracted text.

Captured Text

Save the job and run it.

The output file will contain the following:

Output File
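Taken together, the job applies one pattern per column to each page and writes one row per invoice. The sketch below mimics that flow in Python with the standard csv module, writing CSV only. The column headers, the per-page input format, and the sample values are hypothetical and are not taken from the product's actual output; the regexes are the same assumed reconstructions described in the steps above.

```python
import csv
import io
import re
from typing import Iterable, Optional, Pattern, Tuple

# Assumed reconstructions of the patterns described in this article.
INVOICE_NUMBER = re.compile(r"[A-Z][0-9\-]+")
GRAND_TOTAL = re.compile(r"Grand Total\W+\d+\.\d{2}")
AMOUNT = re.compile(r"\d+(?:\.\d{2})?")


def first_match(pattern: Pattern[str], text: str) -> Optional[str]:
    match = pattern.search(text)
    return match.group(0) if match else None


def grand_total(zone_text: str) -> Optional[str]:
    """Find the 'Grand Total' text, then keep only the amount after the label."""
    label = first_match(GRAND_TOTAL, zone_text)
    if label is None:
        return None
    _, _, after = label.partition("Grand Total")
    return first_match(AMOUNT, after)


def export_csv(pages: Iterable[Tuple[str, str]]) -> str:
    """pages: (invoice-number zone text, grand-total zone text) for each page."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["InvoiceNumber", "GrandTotal"])  # hypothetical headers
    for number_zone, total_zone in pages:
        writer.writerow([first_match(INVOICE_NUMBER, number_zone), grand_total(total_zone)])
    return buffer.getvalue()


# Hypothetical captured text for a two-page document.
pages = [
    ("Invoice Number: B20231-07", "Grand Total: $120.00"),
    ("Invoice Number: C88-1", "Grand Total 45.50"),
]
print(export_csv(pages))
```

A value that is not found is written as an empty cell, because csv.writer writes None as an empty string.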