Named Entity Recognition Tool
Use Named Entity Recognition to identify entities, like people, places, and things, in text. You can use our predefined set of entities or your own custom entities.
Tool Components
The Named Entity Recognition tool has 4 anchors.
- The D input anchor: Connect the text data you want to identify entities in.
- The E input anchor (optional): Connect the data with the custom entities you want to identify. This data has to contain the custom entity names and the labels you want to use to train the model to identify the custom entities.
- The D output anchor: Output new columns of data that display information about the entities in your data.
- The M output anchor: Output the model object so you can reuse it later.
Configure the Tool
To use this tool...
- Drag the tool onto the canvas.
- Connect the D anchor to text data with entities you want to identify.
- Specify the Language of the text data.
- Select the Column with Text.
- Run the workflow.
Advanced Configuration
If you want to use your own entities to train the model, select Train with New Entities.
Important
To train with new entities, provide them in data connected to the E anchor.
Match Entities
- Select the Column with Entities. This column contains the custom entities you want to identify in your data.
- Select the Column with Labels. The tool uses these labels while training the model to identify your custom entities.
- Check the box if you want your model to be Case Sensitive.
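To make the roles of the entity column, the label column, and the Case Sensitive option more concrete, here is a minimal Python sketch. It is not the tool's implementation. It only illustrates, under stated assumptions, how entity and label pairs could be matched against text to produce labeled spans. The function name `find_custom_entities`, the sample rows, and the labels `ORG` and `PRODUCT` are all hypothetical.

```python
import re

def find_custom_entities(text, entities, case_sensitive=False):
    """Return sorted (start, end, matched_text, label) tuples for each entity found."""
    flags = 0 if case_sensitive else re.IGNORECASE
    matches = []
    for name, label in entities:
        # Word boundaries keep a short entity from matching inside a longer word.
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags):
            matches.append((m.start(), m.end(), m.group(0), label))
    return sorted(matches)

# Hypothetical rows: one column of entity names, one column of labels.
custom_entities = [("Acme", "ORG"), ("Rocket Skates", "PRODUCT")]
text = "We ordered rocket skates from Acme."

insensitive = find_custom_entities(text, custom_entities, case_sensitive=False)
sensitive = find_custom_entities(text, custom_entities, case_sensitive=True)

print([m[3] for m in insensitive])  # labels found when case is ignored
print([m[3] for m in sensitive])    # labels found when case must match
```

In this sketch, the lowercase "rocket skates" is matched only when case is ignored, which is the kind of difference the Case Sensitive option is meant to control.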
Train Model
Epochs
An epoch is a single pass (forward and backward) of all data in a training set through a neural network. Epochs are related to iterations, but they are not the same thing. An iteration is a single pass of all data in one batch of a training set, so one epoch consists of as many iterations as there are batches.
Increasing the number of epochs allows the model to learn from the training set for a longer time. But doing that also increases the computational expense.
You can increase the number of epochs to help reduce error in the model. But at some point, the amount of error reduction might not be worth the added computational expense. Also, using too many epochs can cause overfitting, while using too few can cause underfitting.
By default, the tool uses 10 epochs.
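The relationship between epochs, iterations, and batches can be worked through with a little arithmetic. The sketch below assumes a hypothetical training set of 1,000 examples and uses the tool's default values of 10 epochs and a batch size of 32. The helper name `iterations_per_epoch` is made up for illustration.

```python
import math

def iterations_per_epoch(num_examples, batch_size):
    """One iteration processes one batch; the last batch may be smaller."""
    return math.ceil(num_examples / batch_size)

num_examples = 1000  # hypothetical training-set size
batch_size = 32      # the tool's default batch size
epochs = 10          # the tool's default number of epochs

per_epoch = iterations_per_epoch(num_examples, batch_size)
total_iterations = per_epoch * epochs

print(per_epoch)         # 32 iterations per epoch (31 full batches plus 1 partial)
print(total_iterations)  # 320 iterations across 10 epochs
```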
Early Stopping
Early stopping is a method that tells an iterative machine learning method, like the convolutional neural network used in the Named Entity Recognition tool, when to stop learning. Named Entity Recognition uses F1 as the metric for early stopping.
Early stopping is helpful when your model has problems of overfitting. Overfitting occurs when your model learns by memorizing the answers, rather than identifying the underlying patterns in your data. You can also use early stopping to prevent the algorithm from running through unnecessary epochs.
Use early stopping if you're concerned that your model might overfit your data or that additional epochs won't improve your model.
By default, the tool uses early stopping.
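As a rough illustration of the idea, the sketch below stops training once the F1 score has failed to improve for a set number of consecutive epochs. The `patience` parameter, the function name, and the F1 values are assumptions made for this example. The source does not describe the exact stopping rule the tool applies.

```python
def run_with_early_stopping(f1_per_epoch, max_epochs=10, patience=2):
    """Return (epochs_run, best_f1) for a sequence of per-epoch F1 scores.

    Stops when F1 has not improved for `patience` consecutive epochs
    (an assumed rule), or when max_epochs is reached.
    """
    best_f1 = float("-inf")
    epochs_without_improvement = 0
    epochs_run = 0
    for f1 in f1_per_epoch[:max_epochs]:
        epochs_run += 1
        if f1 > best_f1:
            best_f1 = f1
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # further epochs are unlikely to help
    return epochs_run, best_f1

# Hypothetical validation F1 scores, one per epoch.
scores = [0.61, 0.70, 0.74, 0.73, 0.74, 0.72, 0.71]
result = run_with_early_stopping(scores)
print(result)  # (5, 0.74): training stops after epoch 5 instead of running all 7
```

A larger `patience` tolerates more noise in the metric at the cost of extra epochs, while a smaller one stops sooner.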
Batch Size
A batch is a subset of the entire training dataset.
Decreasing the batch size allows you to stagger how much data passes through a neural network at any given time. Doing that allows you to train models without taking up as much memory as you would if passing all data through a neural network at once. Sometimes batching can speed up training. But breaking your data into batches might also increase error in the model.
Separate your data into batches when your machine is unable to process all the data at once, or if you want to reduce training time.
By default, the tool uses a batch size of 32.
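The following minimal sketch shows what splitting a training set into batches looks like. It assumes a hypothetical dataset of 100 text rows and uses the default batch size of 32. The `make_batches` helper is illustrative only and is not part of the tool.

```python
def make_batches(rows, batch_size=32):
    """Yield consecutive slices of rows, each with at most batch_size items."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

# Hypothetical training rows.
rows = [f"document {i}" for i in range(100)]

batches = list(make_batches(rows))
sizes = [len(batch) for batch in batches]
print(sizes)  # [32, 32, 32, 4]
```

Only one batch needs to be held in the network at a time, which is why a smaller batch size lowers peak memory use.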