Extract Key-Value Pairs from Documents Using Power Automate
Muhimbi’s Document Converter for Power Automate recently added a new action, KVP extraction, which extracts key-value pairs (KVPs) from unstructured documents or images. Using AI, ML, and adaptive layout understanding, it can automatically label and extract information such as phone numbers, IBANs, credit card numbers, names, and email addresses.
The engine handles difficult inputs such as noisy documents, dotted lines, touching and broken characters, text on colored backgrounds, underlined text, skewed text, and text in graphics and tables. Follow this guide to learn how to extract data from PDFs using intelligent document processing.
Steps to extract key-value pair data from PDF documents using Power Automate:
1. Add the “For a selected file” trigger.
From the list of triggers, choose to trigger the flow manually, then add SharePoint’s “For a selected file” trigger to the Power Automate canvas and configure it as follows:
- Site Address: Specify the path to the SharePoint Online site collection that holds the file.
- Library Name: Specify the name of the library where the file is located.
2. Add the “Get file properties” action.
From the list of actions, find SharePoint’s “Get file properties” action and configure it as follows:
- Site Address: Specify the path to the SharePoint Online site collection that holds the file.
- Library Name: Specify the name of the library where the file is located.
- Id: Add “ID” from the dynamic list.
3. Add the “Get file content” action.
The third action to add is “Get file content”, where you specify the site address and file identifier as follows:
- Site Address: Specify the path to the SharePoint Online site collection that holds the file.
- File Identifier: Add “Identifier” from the dynamic list.
4. Initialize variable (Invoice Number).
In the actions search bar, type “Initialize variable” and click the action to add it to the flow.
- Name: Type “Invoice number” in this field, or the name of whatever value you need to extract.
- Type: Choose “String” from the drop-down list.
- Value: You can leave this field empty.
5. Initialize variable (Invoice Date).
In the actions search bar, type “Initialize variable” and click the action to add it to the flow.
- Name: Type “Invoice date” in this field, or the name of whatever value you need to extract.
- Type: Choose “String” from the drop-down list.
- Value: Enter a string value.
6. Add the Muhimbi action “Extract Key Value Pairs”.
Add the Muhimbi “Extract Key Value Pairs” action to the flow and configure it as follows:
- Source file name: Choose “File name with extension”.
- Source file content: Choose the output of the “Get file content” action.
- OCR language: eng (we support many other languages).
- DPI: In this example, we specified a value of 300.
- KVP output format: Three formats are supported: “json”, “csv”, and “xml”.
- Page range: This field is optional. Use it to specify the page range of the document from which you want to extract data.
- Confidence threshold: Set this to 50.
- Other fields: Leave these at their default values.
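The settings above can be summarized as a small configuration record. The Python sketch below is only an illustration: the class and field names are hypothetical and are not part of Muhimbi’s API. It captures the values used in this example and checks them against the constraints named in the list, assuming the confidence threshold is a percentage between 0 and 100.

```python
from dataclasses import dataclass
from typing import Optional

# The three output formats named in this guide.
SUPPORTED_FORMATS = ("json", "csv", "xml")


@dataclass
class KvpExtractionSettings:
    """Hypothetical record mirroring the action's fields (not a Muhimbi API)."""
    source_file_name: str
    ocr_language: str = "eng"
    dpi: int = 300
    output_format: str = "json"
    page_range: Optional[str] = None  # optional, as in the action
    confidence_threshold: int = 50

    def validate(self) -> None:
        if self.output_format not in SUPPORTED_FORMATS:
            raise ValueError(f"Unsupported output format: {self.output_format!r}")
        # Assumption: the threshold is a percentage between 0 and 100.
        if not 0 <= self.confidence_threshold <= 100:
            raise ValueError("Confidence threshold must be between 0 and 100")
        if self.dpi <= 0:
            raise ValueError("DPI must be a positive number")


# Values used in this example; the file name is a placeholder.
settings = KvpExtractionSettings(source_file_name="invoice.pdf")
settings.validate()
```

Writing the values down this way makes it easy to see which fields this example changes (DPI, output format, confidence threshold) and which stay at their defaults.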
7. Compose Just Log.
For this action, use the function Convert to Base64string.
8. Parse JSON.
- Content: Choose “Outputs” from the dynamic list.
- Schema: The schema code will be generated.
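Steps 7 and 8 together turn the action’s output into structured data. The Python sketch below shows the same idea under two assumptions that this guide does not confirm: that the extracted result travels as Base64-encoded text, and that the JSON is an array of objects with `key`, `value`, and `confidence` fields. The sample payload and the function name are invented for illustration.

```python
import base64
import json

# Invented sample payload; the real output shape may differ.
sample = [
    {"key": "Invoice Number", "value": "INV-0042", "confidence": 93},
    {"key": "Invoice Date", "value": "2023-05-17", "confidence": 88},
]

# Simulate the Base64 text that a previous action might hand over.
encoded = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")


def decode_kvp_output(encoded_text: str) -> list:
    """Decode Base64 text to a string, then parse that string as JSON."""
    raw = base64.b64decode(encoded_text).decode("utf-8")
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("Expected a JSON array of key-value pairs")
    return data


pairs = decode_kvp_output(encoded)
for pair in pairs:
    print(pair["key"], "=", pair["value"])
```

In the flow itself, the Compose action plays the role of the decoding step and the Parse JSON action plays the role of `json.loads`.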
9. Apply to each.
In the first field, add the “Body” content from the dynamic list. Inside the loop, use a Switch statement to set the values of the invoice number and invoice date variables.
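The loop-and-switch logic of this step can be sketched in Python. This is an illustration rather than the flow itself: the `key` and `value` field names and the key labels are assumptions, and a real document may label these fields differently.

```python
def pick_invoice_fields(pairs):
    """Walk the extracted pairs and keep the two values of interest."""
    # Both variables start empty, like the string variables from steps 4 and 5.
    invoice_number = ""
    invoice_date = ""
    for pair in pairs:  # corresponds to "Apply to each"
        key = pair.get("key", "").strip().lower()
        if key == "invoice number":    # first Switch case
            invoice_number = pair.get("value", "")
        elif key == "invoice date":    # second Switch case
            invoice_date = pair.get("value", "")
        # Default case: every other key is ignored.
    return invoice_number, invoice_date


# Invented sample data for illustration.
pairs = [
    {"key": "Invoice Number", "value": "INV-0042"},
    {"key": "Total", "value": "199.00"},
    {"key": "Invoice Date", "value": "2023-05-17"},
]
number, date = pick_invoice_fields(pairs)
print(number, date)
```

Normalizing the key before comparing it (trimming whitespace and lowercasing) is a design choice of this sketch; in the flow, the Switch cases must match whatever key text the extraction actually returns.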
10. Create item.
- Site Address: Specify the path to the SharePoint Online site collection that holds the list.
- List Name: Enter the name of your list. In this example, we used the name “Invoice Details”.
- Title: Choose “Invoice Number” from the dynamic list.
- Invoice Date: Choose “Invoice Date” from the dynamic list.
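The field mapping in this step amounts to building one record per processed document. A minimal Python sketch follows, using the column names from this example; the helper name is hypothetical, and treating an empty invoice number as an error is an assumption of the sketch, not a rule from the flow.

```python
def build_list_item(invoice_number: str, invoice_date: str) -> dict:
    """Map the two flow variables onto the columns of the example list."""
    if not invoice_number:
        # Assumption: an empty Title means nothing useful was extracted.
        raise ValueError("Invoice number is empty; nothing to store in Title")
    return {"Title": invoice_number, "Invoice Date": invoice_date}


# Placeholder values for illustration.
item = build_list_item("INV-0042", "2023-05-17")
print(item)
```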
That is it. Publish your flow and run it manually. You can find the output in the list specified in the “Create item” step. Any key-value pairs in the output are based on the input file used.