Use Topic Modeling to identify and categorize topics in a body of text. Consider using the Text Pre-processing tool upstream before passing data into the Topic Modeling tool.
This tool is part of Alteryx Intelligence Suite. Intelligence Suite requires a separate license and an add-on installer for Designer. After you install Designer, install Intelligence Suite and start your free trial.
The Topic Modeling tool supports English, French, German, Italian, Portuguese, and Spanish.
The Topic Modeling tool has 4 anchors (1 input anchor and 3 output anchors):
- Input anchor: Use the input anchor to connect the text data you want to analyze.
- D anchor: Use the D anchor to pass the data you've analyzed downstream.
- R anchor: Use the R anchor to view a report of the analysis.
- M anchor: Use the M anchor to pass the model object downstream for use with new data. The model object is compatible with the Predict tool.
Configure the Tool
- Add a Topic Modeling tool to the canvas.
- Use the input anchor to connect the Topic Modeling tool to the text data you want to use in the workflow.
- Select the Text Field you want to analyze.
- Specify the Number of Topics you want to model.
- In the Output Options section, select the kind of output you want in the R anchor:
  - The Interactive Chart option generates an interactive report that includes two charts: Top-30 Most Salient Terms and Intertopic Distance Map.
  - The Word Relevance Summary option generates a static report with measures of each term's salience to the model and relevance to each topic.
- Leave the Dictionary Options and LDA Options at their default values, or adjust them as needed. For more information about these options, see the Advanced Options section below.
- Run the workflow.
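The Word Relevance Summary reports each term's relevance to each topic. The source does not give the formula the tool uses. The sketch below assumes the relevance definition popularized by LDAvis, which blends a term's probability within a topic with its lift over the whole corpus. The function name, the `lam` weight, and the probabilities are hypothetical and only illustrate the idea.

```python
import math


def relevance(p_word_given_topic, p_word, lam=0.6):
    """LDAvis-style relevance of a term to a topic (assumed definition).

    lam=1 ranks terms by their probability within the topic.
    lam=0 ranks terms by lift: how much more likely the term is in the
    topic than in the corpus as a whole.
    """
    lift = p_word_given_topic / p_word
    return lam * math.log(p_word_given_topic) + (1 - lam) * math.log(lift)


# Made-up probabilities: "the" is common everywhere, "refund" is rarer
# overall but concentrated in this topic.
common = relevance(p_word_given_topic=0.05, p_word=0.05)
specific = relevance(p_word_given_topic=0.03, p_word=0.005)

print(f"the:    {common:.3f}")
print(f"refund: {specific:.3f}")
```

With a blended weight, the topic-specific term outranks the common one even though its raw probability in the topic is lower, which is the behavior a relevance measure is meant to capture.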
Advanced Options
The Topic Modeling tool has the following advanced options.
|Option|Description|Type|Constraint|
|---|---|---|---|
|Min Frequency|The minimum frequency a word must have to be included. The tool ignores words that appear less frequently. Frequency is the number of documents containing the word divided by the total number of documents in the body of text.|||
|Max Frequency|The maximum frequency a word can have and still be included. The tool ignores words that appear more frequently. Frequency is the number of documents containing the word divided by the total number of documents in the body of text.|||
|Max Words|The number of words you want the Topic Modeling tool's algorithm to consider, based on how frequently the words appear across all the documents.|||
|Alpha|The density of topics the algorithm should expect in each document. Increasing Alpha allows the algorithm to recognize a greater number of distinct topics in a document. Decreasing Alpha limits the number of topics the algorithm recognizes in each document.|Number|None|
|Eta|The density of words needed to make up a topic. Increasing Eta increases the number of words needed to identify a topic. Decreasing Eta reduces the number of words needed to identify a topic.|Number|>= 0|
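The options above can be illustrated with a minimal Python sketch. It is not the tool's implementation. `filter_vocabulary` applies the frequency definition given for Min Frequency, Max Frequency, and Max Words, and treats both bounds as inclusive, which is an assumption. `sample_topic_mixture` assumes Alpha acts as a symmetric Dirichlet prior, as in standard LDA, to show why a larger Alpha spreads a document across more topics. All function names and sample data are hypothetical.

```python
import random
from collections import Counter


def filter_vocabulary(documents, min_frequency=0.0, max_frequency=1.0, max_words=None):
    """Return the words kept after document-frequency filtering.

    documents: list of token lists, one list per document.
    Frequency = documents containing the word / total documents.
    """
    total = len(documents)
    # Count each word once per document, however often it repeats there.
    doc_counts = Counter(word for doc in documents for word in set(doc))
    kept = [
        (word, count)
        for word, count in doc_counts.items()
        if min_frequency <= count / total <= max_frequency  # inclusive bounds (assumption)
    ]
    # Most widespread words first; ties broken alphabetically for stable output.
    kept.sort(key=lambda item: (-item[1], item[0]))
    if max_words is not None:
        kept = kept[:max_words]
    return [word for word, _ in kept]


def sample_topic_mixture(alpha, num_topics, rng):
    """Draw one document's topic mixture from a symmetric Dirichlet(alpha)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]
    total = sum(draws)
    return [d / total for d in draws]


docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
    ["the", "bird", "flew"],
]
# "the" appears in every document (frequency 1.0) and is dropped by the
# upper bound; words found in only one document (0.25) fall below the lower bound.
vocab = filter_vocabulary(docs, min_frequency=0.5, max_frequency=0.9)
print(vocab)

rng = random.Random(0)
for alpha in (0.1, 10.0):
    mixture = sample_topic_mixture(alpha, num_topics=4, rng=rng)
    print(alpha, [round(p, 2) for p in mixture])
```

In the sampled mixtures, a small Alpha tends to put most of the weight on one topic, while a large Alpha tends to spread it more evenly. In standard LDA, Eta plays the same role for the words within each topic.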
Each output anchor produces different output:
- D anchor: Outputs a new column for each topic. The columns represent the degree to which each topic is present in the text associated with each row. A higher value in a topic column indicates a greater probability that the text is associated with that topic.
- R anchor: Outputs one of two reports based on your selection: either an Interactive Chart with the Top-30 Most Salient Terms and Intertopic Distance Map, or a Word Relevance Summary with measures of each term's salience to the model and relevance to each topic.
- M anchor: Outputs a model object for use with new data. The model object is compatible with the Predict tool.
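A common follow-up to the per-topic columns is to label each row with its highest-scoring topic. The sketch below shows one way to do that in Python, for example after exporting the D anchor data. The column names (`topic_1`, `topic_2`) and the values are hypothetical and do not reflect the tool's actual column naming.

```python
def dominant_topic(row, topic_columns):
    """Return (column_name, score) for the highest-scoring topic in a row."""
    best = max(topic_columns, key=lambda col: row[col])
    return best, row[best]


# Hypothetical D anchor output: names and values are made up for illustration.
rows = [
    {"text": "shipping was slow", "topic_1": 0.82, "topic_2": 0.18},
    {"text": "great price and quality", "topic_1": 0.25, "topic_2": 0.75},
]
topic_columns = ["topic_1", "topic_2"]

labels = [dominant_topic(row, topic_columns) for row in rows]
for row, (topic, score) in zip(rows, labels):
    print(f"{row['text']!r} -> {topic} ({score:.2f})")
```

Keeping the score alongside the label makes it easy to flag rows where no topic clearly dominates.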