Key-Value Pair Extraction
Laboratory Tool
This is a Laboratory tool and isn't for use in production. It might have documented known issues, might not be feature complete, and is subject to change.
A key-value pair links 2 data elements. The key is a unique identifier that defines the dataset (for example, person, place, thing) and the value is the identified data. Examples of key-value pairs:
- Person: John
- Place: Bank
- Thing: Check
The Key-Value Pair Extraction tool identifies key-value pair structures in your documents. The tool leverages the Google Tesseract library and fuzzy matching to find key-value pairs. The Key-Value Pair Extraction tool isn’t intended for tabular data. For tabular data, use the Image Template tool.
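To illustrate what fuzzy matching of keys can look like, here is a minimal Python sketch. It is not the tool's implementation: it assumes the text has already been extracted from the image, uses the standard-library difflib module as a stand-in matcher, and the function names, the 0.8 similarity threshold, and the sample text are all hypothetical.

```python
from difflib import SequenceMatcher


def best_key_match(candidate, keys, threshold=0.8):
    """Return the key most similar to candidate, or None if nothing is close enough.

    The 0.8 threshold is an assumed value chosen for this illustration.
    """
    best_key, best_score = None, 0.0
    for key in keys:
        # Compare case-insensitively so "PERSON" and "person" score as identical.
        score = SequenceMatcher(None, candidate.lower(), key.lower()).ratio()
        if score > best_score:
            best_key, best_score = key, score
    return best_key if best_score >= threshold else None


def extract_pairs(text, keys):
    """Collect (key, value) pairs from lines shaped like '<label>: <value>'."""
    pairs = []
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # no delimiter on this line, so it is not a candidate pair
        key = best_key_match(label.strip(), keys)
        if key is not None:
            pairs.append((key, value.strip()))
    return pairs


# Made-up text with one misread label ("Persan"), as OCR output might contain.
sample_text = "Persan: John\nPlace: Bank\nTotal due 40"
pairs = extract_pairs(sample_text, ["Person", "Place", "Thing"])
print(pairs)
```

In this sketch the misread label "Persan" is still close enough to "Person" to be matched, while the line without a delimiter is skipped.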
If you pass noisy documents to the Key-Value Pair Extraction tool, pre-process the images with the OCR Optimization feature in the Image Processing tool to improve results. The OCR Optimization feature cleans up documents that have non-white backgrounds, watermarks, and other noise.
Language Support
The Key-Value Pair Extraction tool supports English, Chinese (Simplified), French, German, Italian, Portuguese, and Spanish as inputs.
Tool Components
The Key-Value Pair Extraction tool has 3 anchors:
- D anchor: Use the D anchor to pass the image data you want to analyze.
- K anchor: Use the K anchor to pass the keys you want to identify.
- Output anchor: Use the output anchor to pass the key-value pairs downstream.
Configure the Tool
- Add a Key-Value Pair Extraction tool to the canvas.
- Use the anchors to connect the Key-Value Pair Extraction tool to the image data and keys you want to use in the workflow.
- Select the column that contains the Image data.
- Select the Language of the text within the image data.
- Select the column that contains the Keys.
Tip
You can use the Text Input Tool to enter your keys within the workflow.
- Run the workflow.
Output
The Key-Value Pair Extraction tool outputs the incoming columns in addition to columns named after each identified key. The column for each key contains the associated values in a single cell. If there is more than 1 value per key, the tool separates the values with a space (for example, value1 value2 value3). If a key appears at more than 1 location, the tool creates a column for each instance (for example, key1, key2, key3).
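The output layout described above can be sketched in Python as follows. This is an illustration only: the function name and sample data are hypothetical, and the exact column names the tool produces for repeated keys are an assumption here (a numeric suffix per instance).

```python
def build_output_row(found):
    """Build one output row from (key, values) entries, one entry per location.

    Assumption: a key found at several locations gets numbered columns
    (Name1, Name2, ...); a key found once keeps its plain name.
    """
    # Count how many locations each key was found at.
    counts = {}
    for key, _ in found:
        counts[key] = counts.get(key, 0) + 1

    seen = {}
    row = {}
    for key, values in found:
        seen[key] = seen.get(key, 0) + 1
        column = f"{key}{seen[key]}" if counts[key] > 1 else key
        # Multiple values for one key share a single cell, separated by spaces.
        row[column] = " ".join(values)
    return row


# Made-up extraction results for illustration.
row = build_output_row([
    ("Name", ["Libby"]),
    ("Phone", ["555-0100", "555-0101"]),
    ("Name", ["Sam"]),
])
print(row)
```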
FAQ
Do the keys I choose need to match the document exactly?
For best results, we recommend that keys match the document as closely as possible. However, the Key-Value Pair Extraction tool can find keys with different cases or key-value pairs with different delimiters (for example, [KEY: value] and [key, value]).
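As a rough picture of case-insensitive, delimiter-tolerant matching, the following Python sketch uses a regular expression. It is not how the tool works internally: the find_value function and the assumed delimiter set (colon and comma) are hypothetical.

```python
import re


def find_value(text, key, delimiters=":,"):
    """Return the value that follows key, ignoring case and delimiter choice.

    Assumption: a pair looks like '<key><delimiter><value>' and the value
    ends at a closing bracket or at the end of the line.
    """
    pattern = re.compile(
        rf"{re.escape(key)}\s*[{re.escape(delimiters)}]\s*(?P<value>[^\]\n]+)",
        re.IGNORECASE,
    )
    match = pattern.search(text)
    return match.group("value").strip() if match else None


print(find_value("[KEY: value]", "key"))
print(find_value("[key, value]", "Key"))
```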
Can my document have a colored background?
The tool works best with images that have black text on a white background. If your documents have a non-white background, the OCR Optimization feature in the Image Processing tool can correct this.
If I use the grayscale option in the Image Processing tool first, will that improve the key-value pair results?
We recommend using the OCR Optimization feature in the Image Processing tool first. It automatically converts images to grayscale in the background, so manual grayscale adjustments aren’t needed.
Can you use the Key-Value Pair Extraction tool with the Image Template tool?
You can’t connect the Key-Value Pair Extraction tool to the Image Template tool. Note that the Key-Value Pair Extraction tool identifies all instances of your specified keys and returns their corresponding values, regardless of their positions in a document. This removes the need to create bounding boxes and annotations.
I am getting the following error in the Results: "This key isn’t supported: None". What should I do?
Delete any empty rows in your list of keys, then run the workflow again.
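If you prepare your list of keys outside the workflow, a small cleanup step can remove empty rows before you load it. The Python sketch below is only an illustration: the clean_keys helper is hypothetical, and it assumes that empty rows arrive as None or as blank strings.

```python
def clean_keys(raw_keys):
    """Drop empty entries from a list of keys and trim surrounding whitespace.

    Assumption: an empty row shows up as None or as a blank string.
    """
    cleaned = []
    for key in raw_keys:
        if key is None:
            continue  # skip missing rows
        text = str(key).strip()
        if text:
            cleaned.append(text)
    return cleaned


keys = clean_keys(["Company", None, "", "  ", "Name "])
print(keys)
```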
Does the Key-Value Pair Extraction tool work on handwriting? Can I use it on forms with handwritten inputs?
The Key-Value Pair Extraction tool is not optimized for handwriting.
What does the tool recognize as a key-value pair? How should I format the key-value pairs?
Ideally, structure the key-value pairs like this:
Structure
<Key>: <Value>
Example 1
Company: Alteryx
Example 2
Name: Libby
The tool can also recognize keys with multi-line values as long as there are no lines, such as cells from a table, separating the values:
Structure
<Key>: <Value Line 1>
<Value Line 2>
<Value Line 3>
Example 1
Shipping Address: ABC Company
123 Main Street
Some City, New York 12345
Example 2
Billing Address: XYZ Vendor
456 Pleasant Street
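The multi-line structure above can be pictured with a short Python sketch. It is not the tool's algorithm: it assumes the text is already extracted, that a line containing a colon starts a new pair, and that lines without a colon continue the previous value. The function name is hypothetical, and joining the lines with a space is a choice made for this example.

```python
def parse_multiline_pairs(text):
    """Group continuation lines under the most recent '<Key>: <Value>' line.

    Assumptions: a colon marks the start of a new pair, and a line without
    a colon belongs to the pair above it. Repeated keys overwrite each other
    in this simplified version.
    """
    pairs = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        label, sep, value = stripped.partition(":")
        if sep:
            pairs.append([label.strip(), [value.strip()]])
        elif pairs:
            # No delimiter: treat the line as a continuation of the last value.
            pairs[-1][1].append(stripped)
    return {key: " ".join(lines) for key, lines in pairs}


sample = (
    "Shipping Address: ABC Company\n"
    "123 Main Street\n"
    "Some City, New York 12345\n"
    "Name: Libby"
)
result = parse_multiline_pairs(sample)
print(result)
```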