Use Topic Modeling to identify and categorize topics in a body of text. Consider running the Text Pre-processing tool upstream of the Topic Modeling tool to clean the text first.
The Topic Modeling tool has 4 anchors:
- Input anchor: Use the input anchor to connect the text data you want to analyze.
- D anchor: Use the D anchor to pass the data you've analyzed downstream.
- R anchor: Use the R anchor to view a report of the analysis.
- M anchor: Use the M anchor to pass the model object downstream for use with new data. The model object is compatible with the Predict tool.
Configure the Tool
- Add a Topic Modeling tool to the canvas.
- Use the input anchor to connect the Topic Modeling tool to the text data you want to use in the workflow.
- Select the Text Field you want to analyze.
- Specify the Number of Topics you want to model.
- In the Output Options section, select the kind of output you want in the R anchor:
  - The Interactive Chart option generates an interactive report that includes two charts: Top-30 Most Salient Terms and Intertopic Distance Map.
  - The Word Relevance Summary option generates a static report with measures of each term's salience to the model and relevance to each topic.
- Leave the Dictionary Options and LDA Options at their default values, or adjust them as needed. For more information about these options, see the Advanced Options section below.
- Run the workflow.
Advanced Options
The Topic Modeling tool has the following advanced options.
| Min Frequency | Min Frequency is the lowest frequency a word can have in a body of text and still be kept. The Topic Modeling tool ignores words that appear less often than this. Frequency is the number of documents containing a word divided by the total number of documents in the body of text. | | |
| Max Frequency | Max Frequency is the highest frequency a word can have in a body of text and still be kept. The Topic Modeling tool ignores words that appear more often than this. Frequency is the number of documents containing a word divided by the total number of documents in the body of text. | | |
| Max Words | Max Words specifies how many words you want the Topic Modeling tool's algorithm to consider, based on how frequently the words appear across all the documents. | | |
| Alpha | Alpha represents the density of topics the algorithm should expect in each document. Increasing Alpha allows the algorithm to recognize a greater number of distinct topics in a document. Decreasing Alpha limits the number of topics the algorithm recognizes in each document. | Number | None |
| Eta | Eta represents the density of words needed to make up a topic. Increasing Eta increases the number of words needed to identify a topic. Decreasing Eta reduces the number of words needed to identify a topic. | Number | >= 0 |
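The options above can be illustrated outside the tool with a minimal Python sketch. It is not the tool's implementation. It assumes that the frequency limits are inclusive, that Max Words keeps the most frequent words first, and that Alpha plays its usual LDA role as the concentration of a symmetric Dirichlet prior over each document's topic proportions. The function names `filter_vocabulary` and `sample_topic_mixture` are hypothetical.

```python
import random
from collections import Counter


def filter_vocabulary(docs, min_frequency=0.0, max_frequency=1.0, max_words=None):
    """Return the words kept after document-frequency filtering.

    docs is a list of token lists. A word's frequency is the number of
    documents containing it divided by the total number of documents.
    Assumption: both limits are inclusive.
    """
    n_docs = len(docs)
    # Count each word once per document, however often it repeats there.
    doc_counts = Counter(word for doc in docs for word in set(doc))
    kept = {
        word: count / n_docs
        for word, count in doc_counts.items()
        if min_frequency <= count / n_docs <= max_frequency
    }
    # Assumption: rank by frequency (ties alphabetical), then cap the vocabulary.
    ranked = sorted(kept, key=lambda w: (-kept[w], w))
    return ranked if max_words is None else ranked[:max_words]


def sample_topic_mixture(alpha, n_topics, rng):
    """Draw one document's topic proportions from a symmetric Dirichlet(alpha)."""
    # Normalized independent Gamma(alpha, 1) draws follow a Dirichlet distribution.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_topics)]
    total = sum(draws)
    return [d / total for d in draws]


docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
    ["the", "bird", "flew"],
]
# "the" is in every document (frequency 1.0) and is dropped by max_frequency;
# words found in a single document (frequency 0.25) are dropped by min_frequency.
vocab = filter_vocabulary(docs, min_frequency=0.5, max_frequency=0.75, max_words=2)
print(vocab)

rng = random.Random(0)
print(sample_topic_mixture(0.1, 5, rng))   # small Alpha: weight tends to pile onto few topics
print(sample_topic_mixture(10.0, 5, rng))  # large Alpha: weight tends to spread across topics
```

In this sketch, a small Alpha tends to produce documents dominated by one or two topics, while a large Alpha tends to spread each document across many topics, which matches the description of Alpha above. Eta has the analogous role for the words within each topic.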
The D anchor outputs a new column for each topic. The columns represent the degree to which each topic is present in the text associated with each row. A higher value in a topic column indicates a greater probability that the text is associated with that topic.
The R anchor outputs one of two reports based on your selection: either an Interactive Chart with the Top-30 Most Salient Terms and Intertopic Distance Map, or a Word Relevance Summary with measures of each term's salience to the model and relevance to each topic.
The M anchor outputs a model object downstream for use with new data. The model object is compatible with the Predict tool.
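One common way to use the per-topic columns from the D anchor is to label each row with its highest-scoring topic. The sketch below shows that idea in plain Python. The column names `topic_1` and `topic_2`, the sample rows, and the helper `dominant_topic` are hypothetical, since the actual column names depend on the tool's output.

```python
def dominant_topic(row, topic_columns):
    """Return the topic column with the highest score in this row, and that score."""
    best = max(topic_columns, key=lambda col: row[col])
    return best, row[best]


# Hypothetical rows shaped like D anchor output: the original text plus one score per topic.
rows = [
    {"text": "shipping was slow", "topic_1": 0.82, "topic_2": 0.18},
    {"text": "great battery life", "topic_1": 0.10, "topic_2": 0.90},
]
topic_columns = ["topic_1", "topic_2"]

labels = [dominant_topic(row, topic_columns) for row in rows]
for row, (topic, score) in zip(rows, labels):
    print(f"{row['text']!r} -> {topic} ({score:.2f})")
```

Taking only the top topic discards the remaining scores, so rows with several similar scores may be better handled by keeping every topic above a chosen threshold.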