Topic Modeling Tool
Use Topic Modeling to identify and categorize topics in a body of text.
Tool Components
The Topic Modeling tool has three anchors:
- Input anchor: Use the input anchor to connect the text data you want to analyze.
- "D" anchor: Use the "D" output anchor to pass the analyzed data downstream.
- "R" anchor: Use the "R" anchor to view a report of the analysis.
Configure the Tool
- Add a Topic Modeling tool to the canvas.
- Use the anchors to connect the Topic Modeling tool to the text data you want to use in the workflow.
- Select the Text Field you want to analyze.
- Specify the Number of Topics you want to model.
- In the Output Options section, select the kind of output you want:
  - The Interactive Output option generates an interactive report that includes two charts: Top-30 Most Salient Terms and Intertopic Distance Map.
  - The Word Relevance Output option generates a static report with measures of each term's salience to the model and relevance to each topic.
- Run the workflow.
Here are some resources about the concepts of saliency and relevance as they relate to topic modeling.
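As a rough orientation, the sketch below computes the two measures using the definitions commonly found in the topic-modeling literature: saliency weights a term's overall probability by how strongly it distinguishes between topics, and relevance blends a term's probability within a topic with its lift over the term's overall probability. Whether the tool's reports use exactly these formulas is an assumption, and the function names, the probabilities, and the default weight lam=0.6 are illustrative only.

```python
import math

def relevance(p_w_given_t, p_w, lam=0.6):
    """Relevance of a term to a topic.

    lam * log p(w|t) + (1 - lam) * log(p(w|t) / p(w))
    lam=1 ranks terms by in-topic probability, lam=0 ranks them by lift.
    """
    return lam * math.log(p_w_given_t) + (1 - lam) * math.log(p_w_given_t / p_w)

def saliency(p_w, p_t_given_w, p_t):
    """Saliency of a term: p(w) * sum_t p(t|w) * log(p(t|w) / p(t))."""
    distinctiveness = sum(
        ptw * math.log(ptw / pt)
        for ptw, pt in zip(p_t_given_w, p_t)
        if ptw > 0  # a zero-probability topic contributes nothing
    )
    return p_w * distinctiveness

# Hypothetical two-topic model in which both topics are equally common.
p_t = [0.5, 0.5]

# A common word spread evenly across topics says nothing about the topic.
print(saliency(p_w=0.10, p_t_given_w=[0.5, 0.5], p_t=p_t))

# A rarer word concentrated in one topic is more salient.
print(saliency(p_w=0.02, p_t_given_w=[0.9, 0.1], p_t=p_t))

# Relevance of a term with p(w|t)=0.1 and overall p(w)=0.05.
print(relevance(p_w_given_t=0.1, p_w=0.05))
```

Lowering lam favors terms that are distinctive to a topic even if they are rare; raising it favors terms that are simply frequent within the topic.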
Advanced Options
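The Topic Modeling tool has advanced options that control which words are considered (Dictionary Options) and how the LDA algorithm is tuned (LDA Options).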
Dictionary Options
Name | Description | Options | Recommended Option |
---|---|---|---|
Min Frequency | The minimum frequency a word must have for the LDA tool to consider it. The tool ignores words that appear less often than this, where frequency is the number of documents containing the word divided by the total number of documents in the body of text. | | 0.01 |
Max Frequency | The maximum frequency a word can have for the LDA tool to consider it. The tool ignores words that appear more often than this, where frequency is the number of documents containing the word divided by the total number of documents in the body of text. | | 0.8 |
Max Words | The number of words you want the LDA algorithm to consider, based on how frequently the words appear across all the documents. | | 0 |
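The dictionary options above amount to a document-frequency filter. The following is a minimal sketch of that idea, not the tool's actual implementation: the function name `build_dictionary` is hypothetical, the bounds are treated as inclusive, and a Max Words value of 0 is treated as "no limit", all of which are assumptions of this example.

```python
from collections import Counter

def build_dictionary(docs, min_frequency=0.01, max_frequency=0.8, max_words=0):
    """Return the words kept after document-frequency filtering, as a list.

    docs is a list of token lists. A word's frequency is the share of
    documents that contain it. max_words=0 means "no limit" (an assumption).
    """
    n_docs = len(docs)
    # Count each word once per document, however often it repeats inside one.
    doc_counts = Counter(word for doc in docs for word in set(doc))
    kept = {
        word: count
        for word, count in doc_counts.items()
        if min_frequency <= count / n_docs <= max_frequency
    }
    # Most widespread words first; ties broken alphabetically for stable output.
    ranked = sorted(kept, key=lambda w: (-kept[w], w))
    if max_words > 0:
        ranked = ranked[:max_words]
    return ranked

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
    ["the", "bird", "sang"],
]

# "the" is in every document (frequency 1.0) and exceeds the maximum;
# "sat", "dog", "bird", and "sang" (frequency 0.25) fall below the minimum.
print(build_dictionary(docs, min_frequency=0.5, max_frequency=0.8))
# → ['cat', 'ran']
```

Raising Min Frequency removes rare words such as typos or names, while lowering Max Frequency removes words so common that they carry little information about any single topic.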
LDA Options
Name | Description | Options | Recommended Option |
---|---|---|---|
Alpha | Alpha represents the density of topics the algorithm should expect in each document. Increasing Alpha allows the algorithm to recognize a greater number of distinct topics in a document. Decreasing Alpha limits the number of topics the algorithm recognizes in each document. | Number | None |
Eta | Eta represents the density of words needed to make up a topic. Increasing Eta increases the number of words needed to identify a topic. Decreasing Eta reduces the number of words needed to identify a topic. | Number | >= 0 |
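In standard LDA, Alpha and Eta are concentration parameters of Dirichlet priors: Alpha over the topic mixture of each document, Eta over the word mixture of each topic. The sketch below illustrates the effect described in the table by sampling from a symmetric Dirichlet at a low and a high concentration. It is an illustration of the prior only, not of the tool's internals; the helper names, the five-topic setup, and the values 0.1 and 10.0 are assumptions chosen for the example.

```python
import random

def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def mean_largest_share(concentration, size=5, trials=2000, seed=0):
    """Average share taken by the single largest component across many samples."""
    rng = random.Random(seed)
    total = sum(max(sample_dirichlet(concentration, size, rng)) for _ in range(trials))
    return total / trials

# Low concentration: one topic tends to dominate each document.
low = mean_largest_share(0.1)
# High concentration: documents tend to be spread across many topics.
high = mean_largest_share(10.0)

print(f"mean largest topic share at alpha=0.1:  {low:.2f}")
print(f"mean largest topic share at alpha=10.0: {high:.2f}")
```

The same reasoning applies to Eta with words in place of topics: a small value favors topics built from a few dominant words, and a large value favors topics spread over many words.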
Output
The Topic Modeling tool outputs a new column for each topic. The columns represent the degree to which each topic is present in the text associated with each row.