Topic Modeling Tool

Use Topic Modeling to identify and categorize topics in a body of text.

Tool Components

The Topic Modeling tool has three anchors:

  • Input anchor: Use the input anchor to connect the text data you want to analyze.
  • "D" anchor: Use the "D" anchor to pass the data you've analyzed downstream.
  • "R" anchor: Use the "R" anchor to view a report of the analysis.

Configure the Tool

  1. Add a Topic Modeling tool to the canvas.
  2. Use the anchors to connect the Topic Modeling tool to the text data you want to use in the workflow.
  3. Select the Text Field you want to analyze.
  4. Specify the Number of Topics you want to model.
  5. In the Output Options section, select the kind of output you want:
       • Interactive Output generates an interactive report that includes two charts: Top-30 Most Salient Terms and Intertopic Distance Map.
       • Word Relevance Output generates a static report with measures of each term's salience to the model and relevance to each topic.
  6. Run the workflow.

Here are some resources about the concepts of salience and relevance as they relate to topic modeling.
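
As a rough illustration of what these two measures capture, the sketch below computes them for a toy two-topic model. It assumes the definitions commonly used by LDAvis-style reports (saliency as defined by Chuang et al., relevance as defined by Sievert and Shirley); this page does not confirm which formulas the tool applies. The topic names, terms, probabilities, and the weight `lam` are all invented for illustration.

```python
import math

# Toy model: two topics, three terms. All numbers are invented for illustration.
topic_prob = {"t1": 0.5, "t2": 0.5}  # p(t)
term_given_topic = {  # p(w | t)
    "t1": {"price": 0.6, "ship": 0.1, "the": 0.3},
    "t2": {"price": 0.1, "ship": 0.6, "the": 0.3},
}


def term_prob(term):
    """Marginal probability p(w) = sum over topics of p(w | t) * p(t)."""
    return sum(term_given_topic[t][term] * topic_prob[t] for t in topic_prob)


def relevance(term, topic, lam=0.6):
    """Assumed definition: lam * log p(w|t) + (1 - lam) * log(p(w|t) / p(w))."""
    p_wt = term_given_topic[topic][term]
    return lam * math.log(p_wt) + (1 - lam) * math.log(p_wt / term_prob(term))


def saliency(term):
    """Assumed definition: p(w) * sum_t p(t|w) * log(p(t|w) / p(t))."""
    p_w = term_prob(term)
    total = 0.0
    for t in topic_prob:
        p_tw = term_given_topic[t][term] * topic_prob[t] / p_w  # Bayes' rule
        total += p_tw * math.log(p_tw / topic_prob[t])
    return p_w * total


for term in ("price", "ship", "the"):
    print(term, round(saliency(term), 4), round(relevance(term, "t1"), 4))
```

In this toy model, "the" is equally likely under both topics, so it tells you nothing about which topic is present and its saliency is zero. "price" is concentrated in topic t1, so it scores higher on both saliency and relevance to t1.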

Advanced Options

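The Topic Modeling tool uses the latent Dirichlet allocation (LDA) algorithm. The advanced options control how the tool builds its dictionary of words and how it configures LDA.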

Dictionary Options

Min Frequency
  • Description: The minimum frequency a word must have in a body of text for the LDA tool to keep it. The tool ignores words that appear less frequently. Frequency is the number of documents containing the word divided by the total number of documents in the body of text.
  • Options: >= 0 and <= 1
  • Recommended Option: 0.01

Max Frequency
  • Description: The maximum frequency a word can have in a body of text before the LDA tool ignores it. Frequency is the number of documents containing the word divided by the total number of documents in the body of text.
  • Options: >= 0 and <= 1
  • Recommended Option: 0.8

Max Words
  • Description: The number of words you want the LDA algorithm to consider, based on how frequently the words appear across all the documents.
  • Options: >= 0
  • Recommended Option: 0
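
To make the dictionary options concrete, here is a minimal sketch of document-frequency filtering in plain Python. It is not the tool's implementation. The function name `build_vocabulary`, the inclusive bounds, and the use of `max_words=None` to mean "no cap" are assumptions of this sketch.

```python
from collections import Counter


def build_vocabulary(documents, min_frequency=0.01, max_frequency=0.8, max_words=None):
    """Return the words kept after document-frequency filtering.

    documents: list of token lists.
    max_words: hypothetical cap; None means no cap (an assumption of this sketch).
    """
    total = len(documents)
    # Count the number of documents containing each word (not total occurrences).
    doc_counts = Counter()
    for tokens in documents:
        doc_counts.update(set(tokens))

    # Keep words whose document frequency lies within the (inclusive) bounds.
    kept = [
        (word, count)
        for word, count in doc_counts.items()
        if min_frequency <= count / total <= max_frequency
    ]
    # Most widespread words first; ties broken alphabetically for determinism.
    kept.sort(key=lambda item: (-item[1], item[0]))
    if max_words is not None:
        kept = kept[:max_words]
    return [word for word, _ in kept]


docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
    ["the", "bird", "sang"],
]
vocab = build_vocabulary(docs, min_frequency=0.5, max_frequency=0.8)
print(vocab)  # ['cat', 'ran']
```

Here "the" appears in every document (frequency 1.0), so it exceeds the maximum and is dropped. Words that appear in only one of the four documents (frequency 0.25) fall below the minimum and are dropped as well.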

LDA Options

Alpha
  • Description: Alpha represents the density of topics the algorithm should expect in each document. Increasing Alpha allows the algorithm to recognize a greater number of distinct topics in a document. Decreasing Alpha limits the number of topics the algorithm recognizes in each document.
  • Options: Number
  • Recommended Option: None

Eta
  • Description: Eta represents the density of words needed to make up a topic. Increasing Eta increases the number of words needed to identify a topic. Decreasing Eta reduces the number of words needed to identify a topic.
  • Options: Number
  • Recommended Option: >= 0
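
The effect of Alpha can be illustrated with a short simulation. In standard LDA, Alpha is the concentration parameter of a Dirichlet prior over each document's topic mixture. The sketch below assumes a symmetric prior and uses only the Python standard library. The function names, the threshold of 0.05, and the number of trials are choices made for this illustration, not settings of the tool.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet by normalizing gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def average_active_topics(alpha, num_topics=10, trials=2000, threshold=0.05, seed=0):
    """Average number of topics whose weight in a sampled document exceeds threshold."""
    rng = random.Random(seed)
    active = 0
    for _ in range(trials):
        mixture = sample_dirichlet(alpha, num_topics, rng)
        active += sum(1 for weight in mixture if weight > threshold)
    return active / trials


sparse = average_active_topics(alpha=0.1)
dense = average_active_topics(alpha=5.0)
print(f"alpha=0.1: {sparse:.2f} topics per document above threshold")
print(f"alpha=5.0: {dense:.2f} topics per document above threshold")
```

With a small Alpha, most of a sampled document's weight falls on a few topics. With a large Alpha, the weight spreads across many topics. Eta plays the same role for the distribution of words within each topic.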

Output

The Topic Modeling tool outputs a new column for each topic. The columns represent the degree to which each topic is present in the text associated with each row.
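
As a small sketch of how the per-topic columns might be used downstream, the code below picks the strongest topic for each row. The column names (`topic_1`, `topic_2`), the text, and the values are hypothetical and invented for illustration; the names the tool actually produces may differ.

```python
# Hypothetical output rows: the column names and values are invented for illustration.
rows = [
    {"text": "shipping was slow and the box arrived damaged", "topic_1": 0.82, "topic_2": 0.18},
    {"text": "great price and friendly support staff", "topic_1": 0.25, "topic_2": 0.75},
]


def dominant_topic(row, prefix="topic_"):
    """Return the name of the topic column with the largest weight in a row."""
    topic_columns = {name: value for name, value in row.items() if name.startswith(prefix)}
    return max(topic_columns, key=topic_columns.get)


labels = [dominant_topic(row) for row in rows]
print(labels)  # ['topic_1', 'topic_2']
```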