Zero-shot Text Classification
The Zero-shot Text Classification tool assigns scored categories to bodies of text based on a list of categories you define. For example, if you feed in newspaper articles and define the category labels "Politics" and "Technology", the tool returns a probability that each label applies to each article. The Zero-shot Text Classification tool doesn’t require training data. It runs a Hugging Face transformer model with ONNX Runtime.
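This help page doesn't name the model or say how its probabilities are computed. Many zero-shot text classifiers, including the Hugging Face `zero-shot-classification` pipeline, use natural language inference (NLI). Each label becomes a hypothesis such as "This example is about Politics.", and the model scores whether the text entails that hypothesis. The sketch below assumes that approach. It starts from made-up per-label logits, not real model output, and shows how they could become label probabilities in single-label mode and in multi-label mode. The hypothesis template and all function names are hypothetical.

```python
import math

def hypothesis(label: str, template: str = "This example is about {}.") -> str:
    """Build the NLI hypothesis for one candidate label (the template is an assumption)."""
    return template.format(label)

def softmax(values: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def single_label_scores(entail: dict[str, float]) -> dict[str, float]:
    """Labels compete: softmax over every label's entailment logit, so scores sum to 1."""
    labels = list(entail)
    return dict(zip(labels, softmax([entail[label] for label in labels])))

def multi_label_scores(entail: dict[str, float], contra: dict[str, float]) -> dict[str, float]:
    """Labels are independent: per label, softmax over (contradiction, entailment)."""
    return {label: softmax([contra[label], entail[label]])[1] for label in entail}

# Hypothetical logits for an article about technology regulation (illustrative only).
entail = {"Politics": 2.0, "Technology": 2.0}
contra = {"Politics": -1.0, "Technology": -1.0}

print(hypothesis("Politics"))        # This example is about Politics.
print(single_label_scores(entail))   # {'Politics': 0.5, 'Technology': 0.5}
print(multi_label_scores(entail, contra))  # both labels ≈ 0.95
```

Under these assumptions, single-label mode splits the probability between the two labels. Multi-label mode can score both labels highly, because it scores each category independently.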
Important
This tool is part of Alteryx Intelligence Suite. Intelligence Suite requires a separate license and add-on installer to Designer. After you install Designer, install Intelligence Suite and start your free trial.
Language Support
The Zero-shot Text Classification tool only supports English at this time.
Tool Components
The Zero-shot Text Classification tool has 3 anchors (2 inputs and 1 output):
- D input anchor: Use the D input anchor to connect the text data you want to categorize.
- L input anchor: Use the L input anchor to pass category labels to the tool.
- Output anchor: Use the output anchor to pass the scored categories for each body of text downstream.
Configure the Tool
- Add a Zero-shot Text Classification tool to the canvas.
- Connect the text data you want to use in the workflow to the D input anchor of the Zero-shot Text Classification tool.
- If you have large bodies of text, split the text into smaller sections, or pre-process it with the Text Pre-processing tool or the Text Summary tool.
- Connect your category labels to the L input anchor of the Zero-shot Text Classification tool. You can use the Text Input tool to create your list of category labels.
- For Column with Text, select the column that contains the text you want to analyze.
- For Column with Labels, select the column that contains the category labels you want to score.
- (Optional) Select Multi-label Classification to score each category independently of the others. Use this option when a text can belong to more than 1 category.
- Run the workflow.
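The configuration steps recommend splitting large bodies of text into smaller sections. The help page doesn't state a length limit. Transformer models do accept a limited number of tokens per input, though. The sketch below is one way to pre-split text before it reaches the D input anchor. It breaks text at sentence ends where possible and caps each chunk at a word count. The function name, the naive sentence regex, and the `max_words=200` default are all assumptions. Word count is only a rough stand-in for model tokens.

```python
import re

def split_into_chunks(text: str, max_words: int = 200) -> list[str]:
    """Split text into chunks of at most max_words words, breaking at sentence ends where possible.

    Hypothetical helper: max_words=200 is an arbitrary assumption, not a documented tool limit.
    """
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        # A single sentence longer than max_words is hard-split by word count.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if this sentence would overflow the current one.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks

print(split_into_chunks("One two three. Four five. Six seven eight nine.", max_words=5))
# ['One two three. Four five.', 'Six seven eight nine.']
```

Each chunk would then go into its own row of the text column. How the per-chunk scores are combined back into one result per document is left to the workflow.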
Output
The output includes 2 sets of columns:
- A column for each category label. The value in each column indicates how strongly the text in that row is associated with that category. A higher value means a greater probability that the text belongs to the category.
- A column that contains the category label with the highest probability value, if you use more than 1 category label.
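The two sets of output columns can be illustrated with a small sketch. It builds one output row from a set of per-label scores, adds one column per label, and picks the highest-scoring label when there is more than 1 label. The column names `Text` and `Top_Label`, the function name, and the scores are hypothetical. The tool's actual output column names may differ.

```python
def build_output_row(text: str, scores: dict[str, float]) -> dict[str, object]:
    """Hypothetical output row: the input text, one column per label, and the top label."""
    row: dict[str, object] = {"Text": text}
    row.update(scores)  # one column per category label
    if len(scores) > 1:
        # max() keeps the first label it encounters if two labels tie.
        row["Top_Label"] = max(scores, key=scores.get)
    return row

# Made-up scores for illustration only.
row = build_output_row("Parliament debates new AI rules.", {"Politics": 0.7, "Technology": 0.3})
print(row)
# {'Text': 'Parliament debates new AI rules.', 'Politics': 0.7, 'Technology': 0.3, 'Top_Label': 'Politics'}
```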