Zero-shot Text Classification
The Zero-shot Text Classification tool assigns scored categories to bodies of text based on a category list you define. For example, you can feed in newspaper articles and define the category labels "Politics" and "Technology", and the tool returns a probability that each label is relevant to each article. The Zero-shot Text Classification tool doesn’t require training data. It runs a Hugging Face transformer model on ONNX Runtime.
This tool is part of Alteryx Intelligence Suite. Intelligence Suite requires a separate license and an add-on installer for Designer. After you install Designer, install Intelligence Suite and start your free trial.
The Zero-shot Text Classification tool only supports English at this time.
The Zero-shot Text Classification tool has 3 anchors (2 inputs and 1 output):
- D input anchor: Use the D input anchor to connect the text data you want to categorize.
- L input anchor: Use the L input anchor to pass category labels to the tool.
- Output anchor: Use the output anchor to pass the scored categories for each body of text downstream.
Configure the Tool
- Add a Zero-shot Text Classification tool to the canvas.
- Use the D input anchor to connect the Zero-shot Text Classification tool to the text data you want to use in the workflow.
- If you have large bodies of text, split the text into smaller sections or pre-process your text with the Text Pre-processing or Text Summary tools.
- Use the L input anchor to pass the category labels to the Zero-shot Text Classification tool. You can use the Text Input tool to create your list of category labels.
- Select the Column with Text you want to analyze.
- Select the Column with Labels for the categories you want to score.
- (Optional) Select Multi-label Classification to treat categories independently from each other. Use this option to determine if your text belongs to more than 1 category.
- Run the workflow.
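To illustrate what the Multi-label Classification option changes, here is a minimal Python sketch of how zero-shot classifiers built on natural language inference models commonly turn raw model outputs into scores. It is an assumption that the tool's scores behave this way. The function names and the logit values are hypothetical and are not part of Designer.

```python
import math

def single_label_scores(entailment_logits):
    """Softmax across labels: categories compete, so scores sum to 1."""
    peak = max(entailment_logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - peak) for label, v in entailment_logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def multi_label_scores(logit_pairs):
    """Score each label on its own from its (entailment, contradiction) logits."""
    scores = {}
    for label, (entail, contra) in logit_pairs.items():
        # Two-way softmax over entailment vs. contradiction for this label only.
        scores[label] = 1.0 / (1.0 + math.exp(contra - entail))
    return scores

# Made-up logits for one body of text (illustrative values only).
logit_pairs = {"Politics": (2.0, -1.0), "Technology": (1.5, -0.5)}
entail_only = {label: pair[0] for label, pair in logit_pairs.items()}

single = single_label_scores(entail_only)
multi = multi_label_scores(logit_pairs)

print("single-label:", single)
print("multi-label: ", multi)
```

In the single-label case the scores always sum to 1, so a high score for one category lowers the others. In the multi-label case each score falls between 0 and 1 independently, so several categories can score high for the same text.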
The output includes 2 sets of columns:
- A column for each category label. Each value represents the degree to which the text in that row is associated with that category. A higher value indicates a greater probability that the text belongs to that category.
- A column that contains the category label with the highest probability value. This column applies if you use more than 1 category label.
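The output layout described above can be sketched in Python as one record per body of text. This is only an illustration of the shape of the data. The column names "Text" and "Top_Label", the function name, and the scores are hypothetical, and the tool's actual column names may differ.

```python
def build_output_row(text, scores, top_label_column="Top_Label"):
    """Return one output record: the text, a score column per label, and the top label."""
    row = {"Text": text}
    row.update(scores)  # one column per category label
    if len(scores) > 1:
        # Only meaningful when there is more than 1 category label to compare.
        row[top_label_column] = max(scores, key=scores.get)
    return row

# Illustrative scores, not real tool output.
row = build_output_row(
    "Lawmakers debated the new budget proposal.",
    {"Politics": 0.91, "Technology": 0.09},
)
print(row)

single_label_row = build_output_row("Short note.", {"Politics": 0.5})
print(single_label_row)
```

Downstream tools can then filter or sort on the top-label column, or apply their own threshold to the individual score columns.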