Key-Value Pair Extraction
This is a Laboratory tool and isn't for use in production. It might have documented known issues, might not be feature complete, and is subject to change.
A key-value pair links 2 data elements. The key is a unique identifier that defines the dataset (for example, person, place, thing) and the value is the identified data. Examples of key-value pairs:
- Person: John
- Place: Bank
- Thing: Check
The Key-Value Pair Extraction tool identifies key-value pair structures in your documents. The tool leverages the Google Tesseract library and fuzzy matching to find key-value pairs. The Key-Value Pair Extraction tool isn’t intended for tabular data. For tabular data, use the Image Template tool.
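To illustrate the idea of fuzzy matching a key against OCR output, here is a minimal Python sketch. It is not the tool's internal implementation: it uses the standard-library difflib module as a stand-in matcher, and the function name best_key_match, the 0.8 threshold, and the sample labels are assumptions made for illustration. It also assumes OCR has already produced a list of candidate label strings.

```python
from difflib import SequenceMatcher


def best_key_match(key, candidates, threshold=0.8):
    """Return the candidate most similar to key, or None if none is close enough.

    The threshold is a hypothetical value chosen for this example.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        # Compare case-insensitively so "TOTAL" and "Total" score as identical.
        score = SequenceMatcher(None, key.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# Hypothetical OCR output in which "m" was misread as "rn".
labels = ["Invoice Nurnber", "Date", "Total"]
match = best_key_match("Invoice Number", labels)
print(match)  # Invoice Nurnber
```

A similarity threshold trades missed keys against false matches, which is one reason keys that closely match the document tend to give better results.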
If you are passing noisy documents to the Key-Value Pair Extraction tool, try to pre-process images with the OCR Optimization feature in the Image Processing tool to improve results. The OCR Optimization feature cleans up documents that have non-white backgrounds, watermarks, and other noise.
This tool is part of Alteryx Intelligence Suite. Intelligence Suite requires a separate license and add-on installer to Designer. After you install Designer, install Intelligence Suite and start your free trial.
The Key-Value Pair Extraction tool supports English, Chinese (Simplified), French, German, Italian, Japanese, Portuguese, and Spanish as inputs. We recommend that your key and value are in the same language.
The Key-Value Pair Extraction tool has 3 anchors:
- D anchor: Use the D anchor to pass the image data you want to analyze.
- K anchor: Use the K anchor to pass the keys you want to identify.
- Output anchor: Use the output anchor to pass the key-value pairs downstream.
Configure the Tool
- Add a Key-Value Pair Extraction tool to the canvas.
- Use the anchors to connect the Key-Value Pair Extraction tool to the image data and keys you want to use in the workflow.
- Select the column containing the Image data.
- Select the Language of the text within the image data.
- Select the column containing the Keys. Tip: You can use the Text Input tool to enter your keys within the workflow.
- Run the workflow.
The Key-Value Pair Extraction tool outputs the incoming columns in addition to columns named after each identified key. The column for each key contains the associated values in a single cell. If there is more than 1 value per key, the tool separates the values with a space (for example, value1 value2 value3). If a key appears at more than 1 location, the tool creates a column for each instance (for example, key1, key2, key3).
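The output layout described above can be sketched in Python as follows. This is an illustration only, not the tool's code: the function build_output_row, the sample data, and the numbered-suffix column naming (one reading of the key1, key2, key3 example) are assumptions.

```python
from collections import Counter


def build_output_row(incoming, found):
    """Combine incoming columns with one column per identified key instance.

    incoming: dict of the original columns.
    found: list of (key, [values]) tuples in the order they were located.
    """
    row = dict(incoming)
    totals = Counter(key for key, _ in found)
    seen = Counter()
    for key, values in found:
        seen[key] += 1
        # Assumption: a key found at several locations gets a numbered column.
        column = f"{key}{seen[key]}" if totals[key] > 1 else key
        # Multiple values for one key share a single cell, separated by spaces.
        row[column] = " ".join(values)
    return row


row = build_output_row(
    {"file": "invoice.png"},
    [
        ("Total", ["42.00", "USD"]),
        ("Date", ["2024-01-05"]),
        ("Date", ["2024-02-01"]),
    ],
)
print(row)
```

Because values are joined with a space, a downstream step cannot reliably split them again if an individual value itself contains spaces.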
For best results, we recommend that your keys match the text in the document as closely as possible. However, the Key-Value Pair Extraction tool can find keys with different cases or key-value pairs with different delimiters (for example, [KEY: value] and [key, value]).
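As a rough illustration of tolerating different cases and delimiters, the Python sketch below matches a key case-insensitively and accepts either a colon or a comma between key and value. The function find_value and the delimiter set are hypothetical, and the tool's actual matching rules might differ.

```python
import re


def find_value(key, line, delimiters=":,"):
    """Return the value that follows key on line, or None if the key is absent.

    Matching ignores case and accepts any single character in delimiters
    between the key and the value.
    """
    pattern = re.compile(
        r"^\s*" + re.escape(key) + r"\s*[" + re.escape(delimiters) + r"]\s*(.+?)\s*$",
        re.IGNORECASE,
    )
    found = pattern.match(line)
    return found.group(1) if found else None


print(find_value("Key", "KEY: value"))    # value
print(find_value("Key", "key, value"))    # value
print(find_value("Key", "Other: value"))  # None
```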
In general, you can use the tool with images that have black text on white backgrounds. However, if you are dealing with documents that have a non-white background, the OCR Optimization feature in the Image Processing tool can correct this.
We recommend that you run the OCR Optimization feature in the Image Processing tool first. It automatically converts images to grayscale in the background, so you don't need to make manual grayscale adjustments.
You can’t connect the Key-Value Pair Extraction tool with the Image Template tool. Note that the Key-Value Pair Extraction tool identifies all instances of your specified keys and returns their corresponding values, regardless of their positions in a document. This removes the need to create bounding boxes and annotations.
Delete any empty rows in your list of keys, then run the workflow again.
The Key-Value Pair Extraction tool is not optimized for handwriting.
Ideally, structure the key-value pairs like this:
The tool can also recognize keys with multi-line values as long as there are no lines, such as cells from a table, separating the values:
<Key>: <Value Line 1>
<Value Line 2>
<Value Line 3>
Shipping Address: ABC Company
123 Main Street
Some City, New York 12345
Billing Address: XYZ Vendor
456 Pleasant Street
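The multi-line layout shown above can be read with logic along these lines. This Python sketch is an assumption about one reasonable approach, not the tool's implementation: the function collect_multiline_values is hypothetical, and it expects clean text lines with an exact key followed by a colon.

```python
def collect_multiline_values(lines, keys):
    """Group each known key with its value lines.

    A line that starts with a known key and a colon opens a new pair.
    Following non-empty lines are treated as continuation lines of that value.
    """
    pairs = {}
    current = None
    for line in lines:
        text = line.strip()
        head, sep, rest = text.partition(":")
        if sep and head.strip() in keys:
            current = head.strip()
            pairs[current] = [rest.strip()]
        elif current is not None and text:
            pairs[current].append(text)
    return pairs


lines = [
    "Shipping Address: ABC Company",
    "123 Main Street",
    "Some City, New York 12345",
    "Billing Address: XYZ Vendor",
    "456 Pleasant Street",
]
pairs = collect_multiline_values(lines, {"Shipping Address", "Billing Address"})
print(pairs["Shipping Address"])
```

In this sketch a continuation line ends only when the next known key appears, which is why separators such as table cell borders between value lines would break the grouping.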