
Key-Value Pair Extraction

Version: 2022.1
Last modified: September 01, 2022

Laboratory Tool

This is a Laboratory tool and isn't for use in production. It might have documented known issues, might not be feature complete, and is subject to change.

A key-value pair links 2 data elements. The key is a unique identifier that names the type of data (for example, person, place, or thing), and the value is the data identified by that key. Examples of key-value pairs:

  • Person: John
  • Place: Bank
  • Thing: Check
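If it helps to think of this in code, the examples above map directly onto a Python dictionary, which also enforces unique keys. This is only an illustration of the concept. It isn't how the tool stores its results.

```python
# Each key names a type of data; each value is the data identified by that key.
# A Python dict keeps keys unique, matching the definition of a key above.
pairs = {"Person": "John", "Place": "Bank", "Thing": "Check"}

for key, value in pairs.items():
    print(f"{key}: {value}")  # prints "Person: John", then "Place: Bank", then "Thing: Check"
```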

The Key-Value Pair Extraction tool identifies key-value pair structures in your documents. The tool leverages the Google Tesseract library and fuzzy matching to find key-value pairs. The Key-Value Pair Extraction tool isn’t intended for tabular data. For tabular data, use the Image Template tool.
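The tool's own fuzzy-matching rules aren't documented here. As a rough illustration of why fuzzy matching helps with OCR errors (for example, a misread "Invoce Number"), here is a minimal sketch that uses Python's standard `difflib` module as a stand-in. The `match_key` helper and the 0.8 similarity cutoff are hypothetical choices, not settings of the tool.

```python
import difflib

def match_key(ocr_label, keys, cutoff=0.8):
    """Return the user-supplied key closest to an OCR'd label, or None if none is close enough."""
    # Compare case-insensitively, but return the key exactly as the user typed it.
    lookup = {k.lower(): k for k in keys}
    hits = difflib.get_close_matches(ocr_label.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

print(match_key("Invoce Number", ["Invoice Number", "Date"]))  # Invoice Number
print(match_key("Total", ["Invoice Number", "Date"]))          # None
```

A higher cutoff rejects more OCR noise but also misses more legitimately misread keys. The right balance depends on your documents.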

If you pass noisy documents to the Key-Value Pair Extraction tool, pre-process the images with the OCR Optimization feature in the Image Processing tool first to improve results. The OCR Optimization feature cleans up documents that have non-white backgrounds, watermarks, and other noise.

This tool is part of Alteryx Intelligence Suite. Intelligence Suite requires a separate license and add-on installer to Designer. After you install Designer, install Intelligence Suite and start your free trial.

Language support

The Key-Value Pair Extraction tool supports English, Chinese (Simplified), French, German, Italian, Japanese, Portuguese, and Spanish as inputs. We recommend that your keys and values be in the same language.

Tool Components

The Key-Value Pair Extraction tool has 3 anchors:

  • D anchor: Use the D anchor to pass the image data you want to analyze.
  • K anchor: Use the K anchor to pass the keys you want to identify.
  • Output anchor: Use the output anchor to pass the key-value pairs downstream.

Configure the Tool

  1. Add a Key-Value Pair Extraction tool to the canvas.
  2. Use the anchors to connect the Key-Value Pair Extraction tool to the image data and keys you want to use in the workflow.
  3. Select the column containing the Image data.
  4. Select the Language of the text within the image data.
  5. Select the column containing the Keys. Tip: You can use the Text Input tool to enter your keys within the workflow.
  6. Run the workflow.

Output

The Key-Value Pair Extraction tool outputs the incoming columns in addition to columns named after each identified key. The column for each key contains the associated values in a single cell. If there is more than 1 value per key, the tool separates the values with a space (for example, value1 value2 value3). If a key appears at more than 1 location, the tool creates a column for each instance (for example, key1, key2, key3).
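The column layout described above can be sketched in Python. This is a minimal sketch, not the tool's implementation: `build_output_row`, its input format, and the sample file name and values are hypothetical. It also assumes that only a key found at more than 1 location gets a numeric suffix, which the description above doesn't spell out.

```python
from collections import Counter

def build_output_row(incoming, found):
    """Return the incoming columns plus one column per key instance found.

    incoming: dict of the record's incoming columns.
    found: list of (key, [values]) pairs in the order the keys appear in the document.
    """
    row = dict(incoming)  # keep the incoming columns unchanged
    totals = Counter(key for key, _ in found)
    seen = Counter()
    for key, values in found:
        seen[key] += 1
        # Assumed naming: a key that appears more than once becomes key1, key2, ...
        column = f"{key}{seen[key]}" if totals[key] > 1 else key
        # Several values for one key share a single cell, separated by a space.
        row[column] = " ".join(values)
    return row

row = build_output_row(
    {"File": "invoice.png"},
    [("Date", ["09/01/2022"]), ("Total", ["$10"]), ("Total", ["$12", "USD"])],
)
print(row)
# {'File': 'invoice.png', 'Date': '09/01/2022', 'Total1': '$10', 'Total2': '$12 USD'}
```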

FAQ

Do the keys I choose need to match the document exactly?

For best results, we recommend that your keys match the document as closely as possible. However, the Key-Value Pair Extraction tool can find keys with different cases or key-value pairs with different delimiters (for example, [KEY: value] and [key, value]).
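The exact matching rules the tool applies aren't documented. This minimal sketch shows one way case- and delimiter-tolerant matching can work, using a regular expression. The `find_value` helper and its default delimiter set (colon or comma) are hypothetical.

```python
import re

def find_value(line, key, delimiters=":,"):
    """Return the text after `key` and a delimiter on one line, ignoring case; else None."""
    # re.escape keeps keys such as "Total ($)" from being read as regex syntax.
    pattern = rf"^\s*{re.escape(key)}\s*[{re.escape(delimiters)}]\s*(.*?)\s*$"
    match = re.match(pattern, line, flags=re.IGNORECASE)
    return match.group(1) if match else None

print(find_value("KEY: value", "key"))   # value
print(find_value("key, value", "Key"))   # value
print(find_value("Keyboard: x", "key"))  # None
```

Requiring a delimiter right after the key keeps a short key such as "key" from matching a longer word such as "Keyboard".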

Can my document have a colored background?

Yes, with pre-processing. In general, the tool works with images that have black text on a white background. If your documents have a non-white background, use the OCR Optimization feature in the Image Processing tool to correct this before you extract key-value pairs.

If I use the grayscale option in the Image Processing tool first, will that improve the key-value pair results?

We recommend using the OCR Optimization feature in the Image Processing tool first. It automatically converts images to grayscale in the background, so you don’t need to make manual grayscale adjustments.

Can you use the Key-Value Pair Extraction tool with the Image Template tool?

You can’t connect the Key-Value Pair Extraction tool to the Image Template tool, and you don’t need to. The Key-Value Pair Extraction tool identifies all instances of your specified keys and returns their corresponding values regardless of where they appear in a document, so you don’t need to create bounding boxes or annotations.

I am getting the following error in the Results: “This key isn’t supported: None” – what should I do?

Delete any empty rows in your list of keys, then run the workflow again.
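If you build your list of keys in Python before passing it to the K anchor, a filter like this minimal sketch drops the empty entries that trigger this error. The `clean_keys` helper is hypothetical; trimming surrounding spaces from the remaining keys is an optional choice of this sketch, not a documented requirement of the tool.

```python
def clean_keys(keys):
    """Drop None, empty, and whitespace-only entries and trim the remaining keys."""
    return [k.strip() for k in keys if k is not None and k.strip()]

print(clean_keys(["Name", "", None, "  ", " Date "]))  # ['Name', 'Date']
```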

What does the tool recognize as a key-value pair? How should I format the key-value pairs?

Ideally, structure the key-value pairs like this:

Structure

<Key>: <Value>

Example 1

Company: Alteryx

Example 2

Name: Libby
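A line in this structure can be split at its first colon, so a value that itself contains a colon (such as a time) stays intact. This minimal sketch uses a hypothetical `parse_pair` helper to illustrate the structure; it isn't how the tool parses documents.

```python
def parse_pair(line):
    """Split a '<Key>: <Value>' line at its first colon; return None if there is no colon."""
    key, sep, value = line.partition(":")
    if not sep:
        return None
    # Splitting only at the first colon keeps values such as "10:30" intact.
    return key.strip(), value.strip()

print(parse_pair("Company: Alteryx"))  # ('Company', 'Alteryx')
print(parse_pair("Name: Libby"))       # ('Name', 'Libby')
```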

The tool can also recognize keys with multi-line values, as long as no lines (such as the borders of table cells) separate the values:

Structure

<Key>: <Value Line 1>
<Value Line 2>
<Value Line 3>

Example 1

Shipping Address: ABC Company
123 Main Street
Some City, New York 12345

Example 2

Billing Address: XYZ Vendor
456 Pleasant Street
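The multi-line behavior can be sketched as: a line that starts with a known key opens a new value, and each following line that doesn't start with a known key continues it. This minimal sketch uses a hypothetical `extract_multiline` helper and assumes continuation lines are joined with a space, mirroring how the tool separates multiple values. It works on plain text, so it can't see ruled lines such as table borders.

```python
def extract_multiline(lines, keys):
    """Group each key line with the continuation lines that follow it."""
    lowered = {k.lower(): k for k in keys}
    result = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines neither start nor end a value in this sketch
        label, sep, rest = line.partition(":")
        if sep and label.strip().lower() in lowered:
            # A known key starts a new value (a repeated key overwrites the earlier one here).
            current = lowered[label.strip().lower()]
            result[current] = [rest.strip()] if rest.strip() else []
        elif current is not None:
            result[current].append(line)  # continuation of the previous key's value
    return {k: " ".join(v) for k, v in result.items()}

doc = [
    "Shipping Address: ABC Company",
    "123 Main Street",
    "Some City, New York 12345",
    "Billing Address: XYZ Vendor",
    "456 Pleasant Street",
]
print(extract_multiline(doc, ["Shipping Address", "Billing Address"]))
```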
