
Topic Modeling
Last modified: January 11, 2021
Use Topic Modeling to identify and categorize topics in a body of text.
Tool Components
The Topic Modeling tool has three anchors:
- Input anchor: Use the input anchor to connect the text data you want to analyze.
- "D" anchor: Use the "D" output anchor to pass the analyzed data downstream.
- "R" anchor: Use the "R" anchor to view a report of the analysis.
Configure the Tool
- Add a Topic Modeling tool to the canvas.
- Use the input anchor to connect the Topic Modeling tool to the text data you want to use in the workflow.
- Select the Text Field you want to analyze.
- Specify the Number of Topics you want to model.
- In the Output Options section, select the kind of output you want:
  - The Interactive Chart option generates an interactive report that includes two charts: Top-30 Most Salient Terms and Intertopic Distance Map.
  - The Word Relevance Summary option generates a static report with measures of each term's salience to the model and relevance to each topic.
- Run the workflow.
Advanced Options
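The Topic Modeling tool has two groups of advanced options: Dictionary Options, which control which words the model considers, and LDA Options, which tune the LDA algorithm.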
Dictionary Options
Name | Description | Options | Recommended Option |
---|---|---|---|
Min Frequency | The minimum frequency a word must reach to be kept. The LDA tool ignores words that appear less often than this. Frequency is the number of documents containing the word divided by the total number of documents in the body of text. | | 0.01 |
Max Frequency | The maximum frequency a word can reach and still be kept. The LDA tool ignores words that appear more often than this. Frequency is the number of documents containing the word divided by the total number of documents in the body of text. | | 0.8 |
Max Words | The number of words you want the LDA algorithm to consider, based on how frequently the words appear across all the documents. | | 0 |
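
The frequency measure used by the dictionary options can be illustrated outside the tool. The following is a minimal Python sketch, not the tool's implementation: the helper name `filter_vocabulary`, the inclusive bounds, and the use of `None` to mean "no cap" for `max_words` are all assumptions made for illustration.

```python
from collections import Counter


def filter_vocabulary(documents, min_frequency=0.01, max_frequency=0.8, max_words=None):
    """Return the words kept after document-frequency filtering.

    documents: list of token lists, one list per document.
    max_words: keep only this many of the most widespread words
               (None means no cap; this convention is an assumption of the sketch).
    """
    n_docs = len(documents)

    # Count how many documents contain each word (not total occurrences).
    doc_counts = Counter()
    for tokens in documents:
        doc_counts.update(set(tokens))

    # Frequency = documents containing the word / total documents.
    # Inclusive bounds are an assumption of this sketch.
    kept = {
        word: count / n_docs
        for word, count in doc_counts.items()
        if min_frequency <= count / n_docs <= max_frequency
    }

    # Rank by frequency (ties broken alphabetically), then apply the cap.
    ranked = sorted(kept, key=lambda w: (-kept[w], w))
    return ranked if max_words is None else ranked[:max_words]


# Made-up example documents, already split into words.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
    ["the", "bird", "flew"],
]
vocab = filter_vocabulary(docs, min_frequency=0.5, max_frequency=0.8)
print(vocab)
```

In this toy data, "the" appears in every document (frequency 1.0) and is dropped by the upper bound, while words that appear in only one document (frequency 0.25) are dropped by the lower bound.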
LDA Options
Name | Description | Options | Recommended Option |
---|---|---|---|
Alpha | Alpha represents the density of topics the algorithm should expect in each document. Increasing Alpha allows the algorithm to recognize a greater number of distinct topics in a document. Decreasing Alpha limits the number of topics the algorithm recognizes in each document. | Number | None |
Eta | Eta represents the density of words needed to make up a topic. Increasing Eta increases the number of words needed to identify a topic. Decreasing Eta reduces the number of words needed to identify a topic. | Number | >= 0 |
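
To build intuition for these density parameters, the sketch below assumes that Alpha and Eta act like the Dirichlet concentration parameters of standard LDA (Alpha for topics per document, Eta for words per topic). That mapping is an assumption about the tool, and the helper names are hypothetical. The code uses only the Python standard library to compare how concentrated the sampled mixtures are at a low and a high setting.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw one probability vector from a symmetric Dirichlet distribution."""
    # A Dirichlet draw is a set of independent Gamma draws normalized to sum to 1.
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def average_dominant_share(concentration, size=5, trials=2000, seed=0):
    """Average weight of the largest component across many sampled mixtures."""
    rng = random.Random(seed)
    shares = (max(sample_dirichlet(concentration, size, rng)) for _ in range(trials))
    return sum(shares) / trials


# Low concentration: each mixture is dominated by one or two components.
low_alpha = average_dominant_share(0.1)
# High concentration: weight is spread more evenly across all components.
high_alpha = average_dominant_share(10.0)

print(f"average dominant share at 0.1:  {low_alpha:.2f}")
print(f"average dominant share at 10.0: {high_alpha:.2f}")
```

Under this assumption, a smaller value yields documents dominated by few topics (or topics dominated by few words), and a larger value spreads the weight more evenly, which matches the behavior described in the table.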
Output
The Topic Modeling tool outputs a new column for each topic. The columns represent the degree to which each topic is present in the text associated with each row.
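
One common downstream step is to label each row with its strongest topic. The following is a minimal Python sketch of that idea. The column names (`topic_1`, `topic_2`, `topic_3`), the scores, and the helper `dominant_topic` are made up for illustration and are not the tool's actual output names or values.

```python
def dominant_topic(row, topic_columns):
    """Return the name of the topic column with the highest score in this row."""
    return max(topic_columns, key=lambda column: row[column])


# Hypothetical topic columns and illustrative rows.
topic_columns = ["topic_1", "topic_2", "topic_3"]
rows = [
    {"text": "Shipping was slow and the box arrived damaged.",
     "topic_1": 0.81, "topic_2": 0.12, "topic_3": 0.07},
    {"text": "Great battery life, but the screen scratches easily.",
     "topic_1": 0.10, "topic_2": 0.55, "topic_3": 0.35},
]

labels = [dominant_topic(row, topic_columns) for row in rows]
print(labels)
```

A row whose scores are close together, like the second one here, may be better treated as a mix of topics than forced into a single label.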