Use Topic Modeling to identify and categorize topics in a body of text.
The Topic Modeling tool has three anchors:
- Input anchor: Use the input anchor to connect the text data you want to analyze.
- "D" anchor: Use the "D" anchor to pass the data you've analyzed downstream.
- "R" anchor: Use the "R" anchor to view a report of the analysis.
Configure the Tool
- Add a Topic Modeling tool to the canvas.
- Use the input anchor to connect the Topic Modeling tool to the text data you want to use in the workflow.
- Select the Text Field you want to analyze.
- Specify the Number of Topics you want to model.
- In the Output Options section, select the kind of output you want:
  - The Interactive Chart option generates an interactive report that includes two charts: Top-30 Most Salient Terms and Intertopic Distance Map.
  - The Word Relevance Summary option generates a static report with measures of each term's salience to the model and relevance to each topic.
- Run the workflow.
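The reports above refer to a term's salience and relevance. This page does not define those measures, so the following is only a sketch. It assumes the definitions commonly used by LDAvis-style reports, which may differ from what the tool computes. The function names, argument names, and the default weight `lam=0.6` are illustrative assumptions.

```python
import math

def relevance(p_word_given_topic, p_word, lam=0.6):
    """Relevance of a term to one topic (assumed LDAvis-style definition).

    lam = 1 ranks terms by their probability within the topic;
    lam = 0 ranks terms by how much more likely they are in the topic
    than in the text as a whole.
    """
    return (lam * math.log(p_word_given_topic)
            + (1 - lam) * math.log(p_word_given_topic / p_word))

def saliency(p_word, p_topic_given_word, p_topic):
    """Salience of a term to the whole model (assumed definition).

    A frequent term scores high only if seeing it changes which topics
    are likely, compared with the overall topic proportions.
    """
    distinctiveness = sum(
        ptw * math.log(ptw / pt)
        for ptw, pt in zip(p_topic_given_word, p_topic)
        if ptw > 0
    )
    return p_word * distinctiveness
```

Under these assumptions, a term whose topic probabilities match the overall topic proportions has a salience of zero, however often it appears.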
The Topic Modeling tool also has the following advanced options:
|Option|Description|Type|Range|
|---|---|---|---|
|Min Frequency|The minimum frequency a word must have in a body of text. The tool ignores words that appear less frequently than this. Frequency is the number of documents containing the word divided by the total number of documents in the body of text.|||
|Max Frequency|The maximum frequency a word can have in a body of text. The tool ignores words that appear more frequently than this. Frequency is the number of documents containing the word divided by the total number of documents in the body of text.|||
|Max Words|The number of words you want the LDA (latent Dirichlet allocation) algorithm to consider, based on how frequently the words appear across all the documents.|||
|Alpha|The density of topics the algorithm should expect in each document. Increasing Alpha allows the algorithm to recognize a greater number of distinct topics in a document. Decreasing Alpha limits the number of topics the algorithm recognizes in each document.|Number|None|
|Eta|The density of words needed to make up a topic. Increasing Eta increases the number of words needed to identify a topic. Decreasing Eta reduces the number of words needed to identify a topic.|Number|>= 0|
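To make the Min Frequency, Max Frequency, and Max Words options more concrete, here is a minimal sketch of the filtering they describe. It is not the tool's implementation. The function name `select_vocabulary`, the inclusive comparison at both limits, and the ranking of Max Words by total word count are all assumptions made for illustration.

```python
from collections import Counter

def select_vocabulary(docs, min_frequency=0.0, max_frequency=1.0, max_words=None):
    """Return the words kept for modeling, most frequent first.

    docs is a list of documents, each a list of words.
    """
    n_docs = len(docs)
    doc_counts = Counter()   # number of documents containing each word
    term_counts = Counter()  # total occurrences of each word
    for doc in docs:
        doc_counts.update(set(doc))
        term_counts.update(doc)

    # Frequency = documents containing the word / total documents.
    # Assumption: words exactly at either limit are kept.
    kept = [
        word for word, count in doc_counts.items()
        if min_frequency <= count / n_docs <= max_frequency
    ]

    # Assumption: Max Words keeps the words with the highest total count.
    kept.sort(key=lambda word: (-term_counts[word], word))
    return kept[:max_words]

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
    ["the", "bird", "flew"],
]
vocabulary = select_vocabulary(docs, min_frequency=0.5, max_frequency=0.9)
```

In this example, "the" appears in every document (frequency 1.0) and is dropped by the maximum, while "dog", "ran", "bird", and "flew" each appear in one of four documents (frequency 0.25) and are dropped by the minimum.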
The Topic Modeling tool outputs a new column for each topic. The columns represent the degree to which each topic is present in the text associated with each row.
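The following sketch shows how per-topic columns like these can be produced. It is a generic LDA example that uses collapsed Gibbs sampling, a standard inference method, and it is not necessarily the method the tool uses. The function `fit_lda`, its default values, and the `topic_1`, `topic_2` column names are hypothetical.

```python
import random

def fit_lda(docs, n_topics, alpha=0.1, eta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns one list of topic proportions per document.
    Assumes alpha > 0 and eta > 0.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]   # count of word v in topic k
    topic_total = [0] * n_topics                            # all tokens in topic k
    assignments = []

    # Start with a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Weight of each topic for this token, given all other tokens.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + eta) / (topic_total[t] + eta * n_words)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic proportions; each document's values sum to 1.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + alpha * n_topics)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]

rows = [
    "cats purr and kittens purr",
    "kittens and cats nap",
    "stocks fell and bonds fell",
    "bonds and stocks rallied",
]
docs = [row.split() for row in rows]
scores = fit_lda(docs, n_topics=2)

# One new column per topic, added to each input row.
table = [
    {"text": row, **{f"topic_{t + 1}": round(s, 3) for t, s in enumerate(row_scores)}}
    for row, row_scores in zip(rows, scores)
]
```

Each row of `table` holds the original text plus one value per topic. In this sketch the values for a row sum to 1, so a larger value means the topic accounts for more of that row's text.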