Part-of-Speech Tagger

Version: 2022.1
Last modified: June 24, 2022

Use the Part-of-Speech Tagger tool to identify parts of speech, such as nouns, verbs, and adjectives, in text. Part-of-speech tagging is a common processing step for cleansing, preparing, and enriching data for Natural Language Processing applications. The tool uses the part-of-speech capabilities of the spaCy package. Tagging accuracy for English is about 97% and varies slightly for the other supported languages.

Language support

The Part-of-Speech Tagger tool supports English, French, German, Italian, Portuguese, and Spanish. Regardless of the input language, the part-of-speech output tags are available only in English.

Tool Components

The Part-of-Speech Tagger tool has 2 anchors:

  • Input anchor: Use the input anchor to connect the text data you want to analyze.
  • Output anchor: Use the output anchor to pass the tagged text data downstream.

Configure the Tool

  1. Add a Part-of-Speech Tagger tool to the canvas.
  2. Use the anchors to connect the Part-of-Speech Tagger tool to the text data you want to use in the workflow.
  3. Select the Language of the text data.
  4. Select the Column with Text you want to analyze.
  5. Run the workflow.

Output

The Part-of-Speech Tagger tool outputs the incoming columns in addition to 2 columns:

  • part_of_speech_tags: This column contains JSON output with a list of part-of-speech tags and descriptions. Each row in the input text column is treated as one corpus, and each token (word) in that corpus has the values listed below in the JSON output.
    • text: The tagged word.
    • part_of_speech: The coarse-grained part-of-speech tag.
    • part_of_speech_description: The description of the coarse-grained part-of-speech tag.
    • fine_grained_tag: The fine-grained part-of-speech tag.
    • fine_grained_tag_description: The description of the fine-grained part-of-speech tag.
    • dependency: The syntactic dependency of the word.
    • dependency_description: The description of the syntactic dependency.
    • character_index: The index of the first character of the word in the corpus. The index starts at 0.
    • word_index: The index of the word in the corpus. The index starts at 0.
    • text_length: The length of the word.
  • dependency_diagram: This column contains an HTML object of the displaCy tagger dependency diagram that is viewable via the Browse tool.
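
To make the field list above more concrete, the following Python sketch loads one hypothetical token record for the sentence "This is a sentence." and checks how character_index, word_index, and text_length relate to the original text. The field names come from the list above. The flat record layout, the sample values (including the fine-grained tag), and the assumption that text_length counts characters are illustrative only; the tool's actual JSON may wrap or order these fields differently.

```python
import json

corpus = "This is a sentence."

# Hypothetical record for the token "sentence". Field names follow the list
# above; the values and the flat layout are assumptions for illustration.
record_json = """
{
  "text": "sentence",
  "part_of_speech": "NOUN",
  "part_of_speech_description": "Noun",
  "fine_grained_tag": "NN",
  "fine_grained_tag_description": "noun, singular or mass",
  "dependency": "attr",
  "dependency_description": "Attribute",
  "character_index": 10,
  "word_index": 3,
  "text_length": 8
}
"""
token = json.loads(record_json)


def token_span(corpus, token):
    """Slice the token back out of the corpus using its 0-based character index.

    Assumes text_length is measured in characters.
    """
    start = token["character_index"]
    return corpus[start:start + token["text_length"]]


recovered = token_span(corpus, token)
print(recovered)  # prints: sentence
```

A check like this can be useful when you need to highlight or replace tagged words in the original text, because it confirms that the indexes line up with the text you passed in.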

How to Parse JSON Output

To transform the JSON output into tabular data, use a combination of the JSON Parse, Text To Columns, and Cross Tab tools, as in this example flow:

  1. Pass the Part-of-Speech Tagger tool output to the JSON Parse tool input.
  2. Select the part_of_speech_tags column under JSON Field.
  3. Select Output values into single string field.
  4. Pass the JSON Parse tool output to the Text To Columns input.
  5. Select the JSON name column under Column to split and set Delimiters to a period (.).
  6. Select Split to columns and set Number of columns to 3.
  7. Pass the Text To Columns tool output to the Cross Tab tool input.
  8. Configure the Cross Tab tool:
    1. Group data by these values: Select the column name containing your original text data and the second split JSON name column (by default this is JSON_Name2).
    2. Change Column Headers: Select the third split JSON name column (by default this is JSON_Name3).
    3. Values for New Columns: Select the JSON_ValueString.
    4. Method for Aggregating Values: Select Concatenate.
  9. Run your workflow. The Cross Tab tool output now contains the tabular form of the Part-of-Speech Tagger tool output.
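
For readers who want to see the logic of this flow outside the canvas, the steps above can be sketched in Python: flatten the JSON into dotted names and string values (the JSON Parse step), split each name on periods into at most 3 parts (the Text To Columns step), and pivot on the original text plus the second part, using the third part as the column header (the Cross Tab step). This is a rough analogue, not the tools' implementation. The top-level key "tokens" in the sample, the function names, and the comma used when concatenating are all assumptions.

```python
import json
from collections import defaultdict


def flatten(value, prefix=""):
    """Yield (dotted_name, string_value) pairs for every leaf in the JSON."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from flatten(child, f"{prefix}.{i}" if prefix else str(i))
    else:
        yield prefix, str(value)


def to_table(original_text, tags_json):
    """Pivot flattened name/value pairs into one row per token."""
    rows = defaultdict(dict)
    for name, val in flatten(json.loads(tags_json)):
        parts = name.split(".", 2)  # split on periods into at most 3 parts
        if len(parts) != 3:
            continue  # skip names that do not match the assumed layout
        _, token_no, field = parts
        cells = rows[(original_text, token_no)]  # group by text and 2nd part
        # Concatenate if the same header appears twice within one group.
        cells[field] = f"{cells[field]},{val}" if field in cells else val
    return [
        {"original_text": text, "token": token_no, **fields}
        for (text, token_no), fields in rows.items()
    ]


# Hypothetical, shortened sample: "tokens" is an assumed top-level key.
sample = json.dumps({"tokens": [
    {"text": "This", "word_index": 0},
    {"text": "is", "word_index": 1},
]})
table = to_table("This is", sample)
for row in table:
    print(row)
```

Each resulting row corresponds to one token, with one column per JSON field, which mirrors the tabular shape the Cross Tab tool produces.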

Dependency Diagram

Below is a sample dependency diagram for the sentence "This is a sentence." The coarse-grained part-of-speech tag appears below each word. The description for each coarse-grained part-of-speech tag is in the JSON output under "part_of_speech_description". Each arrow indicates the syntactic dependency between two words. The description for each dependency is in the JSON output under "dependency_description".

Dependency Diagram Example

Coarse-grained part-of-speech tag descriptions for the dependency diagram above:

  • AUX: Auxiliary
  • DET: Determiner
  • NOUN: Noun

Dependency descriptions for the dependency diagram above:

  • nsubj: Nominal Subject
  • attr: Attribute
  • det: Determiner

FAQ

How should I use the dependency diagram?

The diagram is a visual aid that shows the part-of-speech tag for each word and how the words relate to one another. At this stage, the dependencies are only part of the visual and not included in the output.

Does the dependency diagram work with the Reporting tools?

At this time, the Part-of-Speech Tagger doesn’t work with the Reporting tools. For example, you can’t save the dependency diagram as an image.

Why does the Part-of-Speech Tagger tool take several seconds to run?

The model is cached on the first run, so the first run is slower. Subsequent runs on the same text are faster. Note that the cache expires, after which the next run is slower again while the model is re-cached.
