Extract data from PDF forms using Power Automate

In this guide, you’ll learn how to extract data from a PDF form using Power Automate. With Muhimbi Document Converter, you can export the data held in a PDF form in the FDF, XFDF, and XML formats.

There are two ways to extract PDF form data:

  • Converting the XML string to a JSON object and parsing it.

  • Parsing the XML using XPath.

Both methods have advantages and disadvantages. JSON represents data as key-value pairs that map directly onto data types, while XML is a markup language that uses tags (<>) to represent data items. JSON carries less overhead than XML, so it’s generally easier and quicker to parse. However, converting XML to JSON can become unwieldy, in which case you may prefer to use XPath to extract only the values you need.

In this guide, you’ll learn how to extract data from PDF forms using both methods. You’ll work with an example where a form filled with relevant data is uploaded to an MS SharePoint document library. The flow will pick up the uploaded form automatically, extract the data contained within it, and then add the data to an MS SharePoint list.

Prerequisites

Before beginning, ensure the following prerequisites are in place:

  • A Power Automate subscription.

  • A full or free trial subscription to Muhimbi [Document Converter for SharePoint][].

  • Appropriate privileges to create Power Automate flows.

  • Working knowledge of Power Automate.

Converting an XML String to a JSON Object and Parsing It

Create a new flow using the Automated cloud flow option.

Give your flow a meaningful name and select the When a file is created in a folder SharePoint trigger. Then click Create.

![create file](/images/guides/muhimbi/pdf-converter/power-automate/extract-pdf-form-data-2.jpg)

In the trigger, you can specify the path to the SharePoint Online library to monitor for new files.

Add the SharePoint Get file content action to the flow canvas and configure it with the details below:

  • Site Address — Specify the path to the SharePoint Online site collection that holds the file.

  • File Identifier — Select x-ms-file-id, which is an output of the When a file is created in a folder trigger.

Add the Muhimbi Convert document action to the flow canvas and configure it with the details below:

  • Source file name — Select x-ms-file-name-encoded, which is an output of the When a file is created in a folder trigger.

  • Source file content — Specify File Content, which is the output of the Get file content action.

  • Output Format — Choose XML.

Add the Compose action to the flow canvas and name it Compose - base64ToString. Add the Processed file content output from the Convert document action, and convert it to a string using the base64ToString() expression.

Add another Compose action to the flow canvas. Then, take the output of the Compose - base64ToString action and convert it to JSON using the expression json(xml(outputs('Compose_-_base64ToString'))).

Save and perform a manual test on the workflow. Upload a supported file type to the folder that’s configured by the trigger. After a few seconds, you’ll see the XML output in the Compose - base64ToString action and the JSON output in the Compose - XML to JSON action.

XML output:

    <?xml version="1.0" encoding="utf-8"?>
    <fields xmlns:xfdf="http://ns.adobe.com/xfdf-transition/">
      <GivenNameTextBox xfdf:original="Given Name Text Box">GName1</GivenNameTextBox>
      <FamilyNameTextBox xfdf:original="Family Name Text Box">FName1</FamilyNameTextBox>
      <Address1TextBox xfdf:original="Address 1 Text Box">Address11</Address1TextBox>
      <HousenrTextBox xfdf:original="House nr Text Box">HouseNr1</HousenrTextBox>
      <Address2TextBox xfdf:original="Address 2 Text Box">Address21</Address2TextBox>
      <PostcodeTextBox xfdf:original="Postcode Text Box">Postcode1</PostcodeTextBox>
      <CityTextBox xfdf:original="City Text Box">City1</CityTextBox>
      <CountryComboBox xfdf:original="Country Combo Box">Bulgaria</CountryComboBox>
      <GenderListBox xfdf:original="Gender List Box">Woman</GenderListBox>
      <HeightFormattedField xfdf:original="Height Formatted Field">120</HeightFormattedField>
      <DrivingLicenseCheckBox xfdf:original="Driving License Check Box">Yes</DrivingLicenseCheckBox>
      <FavouriteColourListBox xfdf:original="Favourite Colour List Box">Red</FavouriteColourListBox>
    </fields>

JSON output:

    {
      "?xml": {
        "@version": "1.0",
        "@encoding": "utf-8"
      },
      "fields": {
        "@xmlns:xfdf": "http://ns.adobe.com/xfdf-transition/",
        "GivenNameTextBox": {
          "@xfdf:original": "Given Name Text Box",
          "#text": "GName1"
        },
        "FamilyNameTextBox": {
          "@xfdf:original": "Family Name Text Box",
          "#text": "FName1"
        },
        "Address1TextBox": {
          "@xfdf:original": "Address 1 Text Box",
          "#text": "Address11"
        },
        "HousenrTextBox": {
          "@xfdf:original": "House nr Text Box",
          "#text": "HouseNr1"
        },
        "Address2TextBox": {
          "@xfdf:original": "Address 2 Text Box",
          "#text": "Address21"
        },
        "PostcodeTextBox": {
          "@xfdf:original": "Postcode Text Box",
          "#text": "Postcode1"
        },
        "CityTextBox": {
          "@xfdf:original": "City Text Box",
          "#text": "City1"
        },
        "CountryComboBox": {
          "@xfdf:original": "Country Combo Box",
          "#text": "Bulgaria"
        },
        "GenderListBox": {
          "@xfdf:original": "Gender List Box",
          "#text": "Woman"
        },
        "HeightFormattedField": {
          "@xfdf:original": "Height Formatted Field",
          "#text": "120"
        },
        "DrivingLicenseCheckBox": {
          "@xfdf:original": "Driving License Check Box",
          "#text": "Yes"
        },
        "FavouriteColourListBox": {
          "@xfdf:original": "Favourite Colour List Box",
          "#text": "Red"
        }
      }
    }
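To make the shape of this JSON easier to reason about, the following Python sketch mimics, outside Power Automate, what the base64ToString() and json(xml()) steps do to the converter output. It is an illustration under assumptions, not the Power Automate implementation: the function names are hypothetical, the "?xml" declaration entry is omitted, and repeated sibling elements are not handled.

```python
import base64
import json
from xml.dom import minidom

# Shortened copy of the XML output, used only for demonstration.
SAMPLE_XML = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<fields xmlns:xfdf="http://ns.adobe.com/xfdf-transition/">'
    b'<GivenNameTextBox xfdf:original="Given Name Text Box">GName1</GivenNameTextBox>'
    b'<CityTextBox xfdf:original="City Text Box">City1</CityTextBox>'
    b'</fields>'
)


def element_to_dict(element):
    """Map attributes to '@name' keys and text content to a '#text' key."""
    result = {}
    for name, value in element.attributes.items():
        result["@" + name] = value
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            # Assumption: each child name occurs once (true for this form).
            result[child.tagName] = element_to_dict(child)
    text = "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    ).strip()
    if text:
        result["#text"] = text
    return result


def base64_xml_to_dict(encoded):
    """Decode base64 content and convert the XML root element to a dict."""
    xml_bytes = base64.b64decode(encoded)
    root = minidom.parseString(xml_bytes).documentElement
    return {root.tagName: element_to_dict(root)}


if __name__ == "__main__":
    encoded = base64.b64encode(SAMPLE_XML)  # stands in for the converter output
    print(json.dumps(base64_xml_to_dict(encoded), indent=2))
```

The point to take away is the naming convention: attributes become keys prefixed with "@", and an element’s text becomes "#text", which is why each form value sits one level below its field name.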

Add the Parse JSON action to the flow canvas and configure it with the details below:

  • Content — Add the output of the XML to JSON action.

  • Schema — Click the Generate from sample button and paste the JSON output from the step above.

Add the SharePoint Create item action, and pass the following expressions:

  • Site Address — Select the location of the SharePoint list where the list item should be added.

  • List Name — Select the SharePoint list where the list item should be added.

Note: In this step, you’ll add the expressions directly and map them to the SharePoint list.
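As a rough illustration of this mapping, the Python sketch below pulls each field’s "#text" value out of the parsed JSON and assigns it to a list column. The column names in COLUMN_MAP are hypothetical and should be replaced with the columns of your own list. In the flow itself, the equivalent lookup would typically be an expression along the lines of body('Parse_JSON')?['fields']?['GivenNameTextBox']?['#text'], assuming the Parse JSON action keeps its default name.

```python
# Hypothetical mapping from PDF form field names to SharePoint column names.
COLUMN_MAP = {
    "GivenNameTextBox": "GivenName",
    "FamilyNameTextBox": "FamilyName",
    "CityTextBox": "City",
}


def form_values(parsed):
    """Return {field name: text value} for every field under 'fields'."""
    fields = parsed.get("fields", {})
    return {
        name: node.get("#text", "")
        for name, node in fields.items()
        if isinstance(node, dict)  # skips attributes such as '@xmlns:xfdf'
    }


def to_list_item(parsed, column_map):
    """Build a {column name: value} dict; missing fields become ''."""
    values = form_values(parsed)
    return {column: values.get(field, "") for field, column in column_map.items()}


if __name__ == "__main__":
    sample = {
        "fields": {
            "@xmlns:xfdf": "http://ns.adobe.com/xfdf-transition/",
            "GivenNameTextBox": {"@xfdf:original": "Given Name Text Box", "#text": "GName1"},
            "FamilyNameTextBox": {"@xfdf:original": "Family Name Text Box", "#text": "FName1"},
        }
    }
    print(to_list_item(sample, COLUMN_MAP))
```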

Publish your flow and upload a supported file type to the folder monitored by the specified SharePoint trigger. After a short wait, a new list item within the target SharePoint list will be created.

Parsing XML Using XPath

The process of parsing XML using XPath is similar. Follow the steps in the previous section up to and including the Compose action that produces the XML output. Then add the following actions to the flow.

Extract the text() value from the XML using the xpath('<xml>', '<xpath>') expression.

This returns an array of the XML nodes or values that match the specified XPath expression. In this scenario, the array contains only one element, so wrap the call in first() to take that value: first(xpath(xml(outputs('Base64_to_XML(String)')),'/fields/GivenNameTextBox/text()')). Here, Base64_to_XML(String) refers to the Compose action that holds the decoded XML string; use the name that action has in your flow.
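The same lookup can be sketched in Python to show the "match, then take the first result" pattern. This is an approximation under assumptions: the standard library’s ElementTree supports only a subset of XPath (no absolute paths and no text() step), so the sketch finds the element relative to the root and reads its text instead. The function name first_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML output, used only for demonstration.
SAMPLE_XML = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<fields xmlns:xfdf="http://ns.adobe.com/xfdf-transition/">'
    b'<GivenNameTextBox xfdf:original="Given Name Text Box">GName1</GivenNameTextBox>'
    b'<CityTextBox xfdf:original="City Text Box">City1</CityTextBox>'
    b'</fields>'
)


def first_text(xml_bytes, element_name):
    """Return the text of the first matching child of the root, or None."""
    root = ET.fromstring(xml_bytes)  # root is the <fields> element
    # Rough counterpart of xpath(..., '/fields/<element_name>/text()').
    matches = [element.text for element in root.findall("./" + element_name)]
    # Rough counterpart of first(...), with an explicit empty-result case.
    return matches[0] if matches else None


if __name__ == "__main__":
    print(first_text(SAMPLE_XML, "GivenNameTextBox"))
```

Handling the empty-result case explicitly is worth copying into the flow’s design: a form field that is absent from the XML produces no match, so the value mapped to the list column may be empty.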

Add the SharePoint Create item action, and pass the expressions directly, like so:

  • Site Address — Select the location of the SharePoint list where the list item should be added.

  • List Name — Select the SharePoint list where the list item should be added.

Note: In this step, you’ll add the expressions directly and map them to the SharePoint list.

Publish your flow and upload a supported file type to the folder monitored by the specified SharePoint trigger. After a short wait, a new list item within the target SharePoint list will be created.