Use Data Health to assess the health of your data. To score it, the tool analyzes missing values, outliers, and sparsity. You can pair Data Health with feature-engineering tools, such as Build Features and Feature Types, to improve the health of your data.
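This page does not say exactly how each of the three factors is measured. As a rough illustration only, here is a minimal sketch that computes per-column rates under assumed definitions: a value is missing if it is `None` or NaN, outliers follow Tukey's 1.5 × IQR rule, and sparsity is the fraction of zeros among the present values. The function name `column_health_metrics` is hypothetical, and this is not the Data Health tool's own implementation.

```python
import math
from statistics import quantiles


def column_health_metrics(values):
    """Hypothetical per-column metrics; the definitions are assumptions, not Alteryx's."""
    n = len(values)
    if n == 0:
        raise ValueError("column is empty")

    # Assumed: a value is missing if it is None or a float NaN.
    present = [v for v in values
               if v is not None and not (isinstance(v, float) and math.isnan(v))]
    missing_rate = (n - len(present)) / n

    # Assumed: sparsity is the share of present values that are exactly zero.
    sparsity = sum(1 for v in present if v == 0) / len(present) if present else 0.0

    # Assumed: outliers lie more than 1.5 * IQR outside the first/third quartiles.
    outliers = []
    if len(present) >= 4:
        q1, _, q3 = quantiles(present, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        outliers = [v for v in present if v < low or v > high]
    outlier_rate = len(outliers) / len(present) if present else 0.0

    return {
        "missing_rate": missing_rate,
        "outlier_rate": outlier_rate,
        "sparsity": sparsity,
        "outliers": outliers,
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100, None, 0]
print(column_health_metrics(sample))
```

Returning the outlier values themselves, and not only their rate, loosely mirrors the tool's separate O output anchor.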
The Data Health tool has four anchors:
- Input anchor: The input anchor connects to the data whose health you want to check.
- S output anchor: The S output anchor passes the data-health scores for each column downstream.
- R output anchor: The R output anchor passes a comprehensive report about the data's health downstream. You can view the report using a Browse tool.
- O output anchor: The O output anchor passes the outliers from the data downstream.
Configure the Tool
To configure the Data Health tool, choose the scale used to score the health of the data and whether the output includes recommendations for improving it.
The upstream data must have at least 30 rows.
1. Score Scale
From the dropdown, choose whether to output scores on a percentage scale (0–100%) or a normalized scale (0–1). The tool generates the score by assessing missing values, outliers, and sparsity.
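The two dropdown options appear to differ only in range, so a normalized score of 0.9 would correspond to 90%. The sketch below illustrates that relationship and the 30-row minimum. It assumes a hypothetical combination rule (one minus the mean of the missing-value, outlier, and sparsity rates, each between 0 and 1). The tool's actual weighting is not documented here, and `health_score`, `check_row_count`, and `MIN_ROWS` are illustrative names.

```python
MIN_ROWS = 30  # minimum upstream row count stated for the Data Health tool


def check_row_count(n_rows: int) -> None:
    """Raise if there are too few rows to score (mirrors the 30-row requirement)."""
    if n_rows < MIN_ROWS:
        raise ValueError(f"need at least {MIN_ROWS} rows, got {n_rows}")


def health_score(missing_rate: float, outlier_rate: float, sparsity: float,
                 scale: str = "normalized") -> float:
    """Hypothetical score: 1 minus the mean of three rates, each in [0, 1]."""
    penalty = (missing_rate + outlier_rate + sparsity) / 3
    score = 1.0 - penalty  # normalized scale: 0 (unhealthy) to 1 (healthy)
    if scale == "normalized":
        return score
    if scale == "percentage":
        return score * 100.0  # the same score expressed on a 0-100 range
    raise ValueError(f"unknown scale: {scale!r}")


check_row_count(120)
print(health_score(0.3, 0.0, 0.0))
print(health_score(0.3, 0.0, 0.0, scale="percentage"))
```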
2. Output Recommendations Based On Score
Select this checkbox to have the tool recommend how to improve the health of the data, based on the score it receives. The recommendations appear in an additional column in the dataset.