Topic Modeling

Version: 2021.3
Last modified: September 07, 2021

Use Topic Modeling to identify and categorize topics in a body of text. Consider using the Text Pre-processing tool upstream before passing data into the Topic Modeling tool.

Tool Components

The Topic Modeling tool has 3 anchors:

  • Input anchor: Use the input anchor to connect the text data you want to analyze.
  • D anchor: Use the D anchor to pass the data you've analyzed downstream.
  • R anchor: Use the R anchor to view a report of the analysis.

Configure the Tool

  1. Add a Topic Modeling tool to the canvas.
  2. Use the input anchor to connect the Topic Modeling tool to the text data you want to use in the workflow.
  3. Select the Text Field you want to analyze.
  4. Specify the Number of Topics you want to model.
  5. In the Output Options section, select the kind of output you want:
    • The Interactive Chart option generates an interactive report that includes two charts: Top-30 Most Salient Terms and Intertopic Distance Map.
    • The Word Relevance Summary option generates a static report with measures of each term's salience to the model and relevance to each topic.
  6. Run the workflow.

Resources

This tool uses latent Dirichlet allocation (LDA) to identify topics. Here are some resources about the LDA algorithm and the concepts of saliency and relevance.

Advanced Options

The Topic Modeling tool has advanced options for the dictionary and for the LDA algorithm.

Dictionary Options

Min Frequency
  • Description: Min Frequency is the lowest frequency a word can have in the body of text and still be considered. The Topic Modeling tool ignores words that appear less frequently. Frequency is the number of documents containing the word divided by the total number of documents in the body of text.
  • Options: >= 0 and <= 1
  • Recommended Option: 0.01

Max Frequency
  • Description: Max Frequency is the highest frequency a word can have in the body of text and still be considered. The Topic Modeling tool ignores words that appear more frequently. Frequency is the number of documents containing the word divided by the total number of documents in the body of text.
  • Options: >= 0 and <= 1
  • Recommended Option: 0.8

Max Words
  • Description: Max Words specifies how many words you want the Topic Modeling tool's algorithm to consider, based on how frequently the words appear across all the documents.
  • Options: >= 0
  • Recommended Option: 0
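
To make the dictionary options more concrete, here is a minimal Python sketch of document-frequency filtering. It is an illustration only, not the tool's implementation. The function name build_dictionary, the inclusive bounds, and the choice to treat a Max Words value of 0 as "no limit" are all assumptions made for this example.

```python
from collections import Counter


def build_dictionary(documents, min_frequency=0.01, max_frequency=0.8, max_words=0):
    """Return the set of words kept after document-frequency filtering.

    documents is a list of token lists, one list per document.
    Hypothetical helper: bounds are treated as inclusive, and a
    max_words of 0 (or None) is assumed to mean "no limit".
    """
    n_docs = len(documents)

    # Count the number of documents each word appears in (not total occurrences).
    doc_counts = Counter()
    for tokens in documents:
        doc_counts.update(set(tokens))

    # Keep words whose document frequency lies within the two bounds.
    kept = {
        word: count
        for word, count in doc_counts.items()
        if min_frequency <= count / n_docs <= max_frequency
    }

    # Optionally keep only the max_words most widespread words
    # (ties are broken alphabetically so the result is deterministic).
    if max_words:
        ranked = sorted(kept.items(), key=lambda item: (-item[1], item[0]))
        kept = dict(ranked[:max_words])
    return set(kept)


docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
    ["the", "bird", "flew"],
]
vocabulary = build_dictionary(docs, min_frequency=0.3, max_frequency=0.8)
```

In this toy data, "the" appears in every document (frequency 1.0) and is dropped by the Max Frequency bound, while words that appear in only one of the four documents (frequency 0.25) are dropped by the Min Frequency bound.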

LDA Options

Alpha
  • Description: Alpha represents the density of topics the algorithm should expect in each document. Increasing Alpha allows the algorithm to recognize a greater number of distinct topics in a document. Decreasing Alpha limits the number of topics the algorithm recognizes in each document.
  • Options: Number
  • Recommended Option: None

Eta
  • Description: Eta represents the density of words needed to make up a topic. Increasing Eta increases the number of words needed to identify a topic. Decreasing Eta reduces the number of words needed to identify a topic.
  • Options: Number >= 0
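
The effect of Alpha can be illustrated with a small simulation. In standard LDA, Alpha is the parameter of a Dirichlet prior over each document's topic mixture, and Eta plays the same role for each topic's word mixture. The sketch below assumes a symmetric prior and uses only the Python standard library. The function names are hypothetical, and the code does not reflect how the tool sets or uses these values internally.

```python
import random


def sample_dirichlet(alpha, n_topics, rng):
    """Draw one topic mixture from a symmetric Dirichlet(alpha) prior."""
    # Normalizing independent Gamma(alpha, 1) draws yields a Dirichlet sample.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_topics)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_dominant_share(alpha, n_topics=5, n_samples=2000, seed=0):
    """Average share of the largest topic across many sampled mixtures."""
    rng = random.Random(seed)
    shares = [max(sample_dirichlet(alpha, n_topics, rng)) for _ in range(n_samples)]
    return sum(shares) / n_samples


# A low Alpha concentrates each document on a few topics;
# a high Alpha spreads each document across many topics.
low_alpha_share = mean_dominant_share(0.1)
high_alpha_share = mean_dominant_share(10.0)
```

With a low Alpha, the dominant topic takes most of each sampled mixture. With a high Alpha, the mixture is closer to an even split across topics, which matches the description of Alpha allowing more distinct topics per document.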

Output

The Topic Modeling tool outputs a new column for each topic. The columns represent the degree to which each topic is present in the text associated with each row.
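
As a rough picture of the output shape, the following Python sketch appends one column per topic to each input row. The column names (Topic_1, Topic_2), the helper name append_topic_columns, and the weights are all made up for illustration. The tool's actual column names and value ranges may differ.

```python
def append_topic_columns(rows, topic_weights):
    """Return copies of rows with one added column per topic.

    rows is a list of dicts (one per input record) and topic_weights
    is a list of per-row weight lists, one weight per topic.
    """
    if len(rows) != len(topic_weights):
        raise ValueError("need one weight list per row")
    output = []
    for row, weights in zip(rows, topic_weights):
        new_row = dict(row)  # copy so the input rows stay unchanged
        for index, weight in enumerate(weights, start=1):
            new_row[f"Topic_{index}"] = weight  # hypothetical column name
        output.append(new_row)
    return output


rows = [{"Text": "cats and dogs"}, {"Text": "stocks and bonds"}]
weights = [[0.9, 0.1], [0.2, 0.8]]  # made-up values, not real tool output
result = append_topic_columns(rows, weights)
```

Each output row keeps its original fields and gains one value per topic, so a row with a high value in one topic column is a row where that topic is strongly present.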
