Databricks Tools
To start building a workflow, drag a tool from the Tool Palette onto the workflow canvas. To connect tools, select an output anchor and drag the connector arrow to the next tool's input anchor.
Use this page to view all Live Query for Databricks tools. Tools are grouped by category.
In/Out
Provide inputs and outputs for workflows.
| Item | Description |
|---|---|
| DateTimeNow | Use DateTimeNow to input the current date and time at workflow runtime in the format you choose. |
| Input Data | Use Input Data to connect to a table to pull data into your workflow. |
| Output Data | Use Output Data to write results of a workflow to supported file types or data sources. |
| Text Input | Use Text Input to manually enter text to create small data files for input. This can be useful in testing and creating Lookup tables while you build your workflow. |
Preparation
Prepare data for downstream analysis.
| Item | Description |
|---|---|
| Data Cleansing | Use Data Cleansing to fix common data quality issues. You can replace null values, remove punctuation, modify capitalization, and more. |
| Filter | Use Filter to select data using a condition. |
| Formula | Use Formula to create new columns, update columns, and use 1 or more expressions to perform a variety of calculations and operations. |
| Multi-Column Formula | Use Multi-Column Formula to create or update multiple columns using a single expression. |
| Random % Sample | Use Random % Sample to return a random sample of the incoming data stream with an expected number of rows. |
| Row ID | Use Row ID to create a new column in the data and assign a unique identifier, which increments sequentially for each row in the data. |
| Sample | Use Sample to limit the data stream to a specified number, percentage, or random set of rows. In addition, the Sample tool applies the selected configuration to the columns you want to group by. |
| Select | Use Select to include, exclude, and reorder the columns of data that pass through your workflow. |
| Select Rows | Use Select Rows to return the rows and ranges of rows that you specify, including discontinuous ranges. This tool is useful for troubleshooting and sampling. |
| Sort | Use Sort to arrange the rows in a table in alphanumeric order based on the values of the specified data fields. |
| Tile | Use Tile to assign a value (tile) based on ranges in the data. You choose 1 of 3 methods for assigning tiles. |
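To make the Row ID and Select Rows descriptions concrete, here is a minimal plain-Python sketch of the two operations. It only illustrates the semantics and is not how the tools are implemented. The function names `add_row_id` and `select_rows`, the `RowID` column name, and the representation of ranges as inclusive 1-based `(first, last)` pairs are all assumptions made for this example.

```python
def add_row_id(rows, start=1, column="RowID"):
    """Prepend a sequential identifier column to each row (hypothetical helper)."""
    return [{column: i, **row} for i, row in enumerate(rows, start=start)]


def select_rows(rows, ranges):
    """Keep rows whose 1-based position falls in any inclusive (first, last) range.

    The ranges may be discontinuous, for example [(1, 2), (5, 5)].
    """
    keep = set()
    for first, last in ranges:
        keep.update(range(first, last + 1))
    return [row for pos, row in enumerate(rows, start=1) if pos in keep]


if __name__ == "__main__":
    rows = [{"v": c} for c in "abcdef"]
    print(add_row_id(rows)[:2])
    print(select_rows(rows, [(1, 2), (5, 5)]))
```

Storing the selected positions in a set keeps the lookup simple and means overlapping ranges do not return a row twice.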
Join
Join 2 or more streams of data by appending data to a wide or long schema.
| Item | Description |
|---|---|
| Append Columns | Use Append Columns to append every row from a source dataset to every row of a target dataset. This operation is also known as a cross join. |
| Join | Use Join to combine 2 inputs based on common columns between the 2 tables. You can also join 2 data streams based on row position. |
| Join Multiple | Use Join Multiple to combine 2 or more inputs based on a commonality between the input tables. By default, the tool outputs a full outer join. |
| Union | Use Union to combine 2 or more datasets on column names or positions. |
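The difference between a cross join (Append Columns) and a full outer join (the Join Multiple default) can be illustrated outside the product. The following is a minimal plain-Python sketch of the two join semantics, not the tools' implementation. The function names `cross_join` and `full_outer_join` and the single `key` column are hypothetical.

```python
from itertools import product


def cross_join(target, source):
    """Pair every target row with every source row."""
    return [{**t, **s} for t, s in product(target, source)]


def full_outer_join(left, right, key):
    """Keep matched rows plus unmatched rows from both sides.

    Columns missing on one side are filled with None.
    """
    all_cols = {c for row in left for c in row} | {c for row in right for c in row}
    rows = []
    matched_right = set()
    for left_row in left:
        hits = [i for i, r in enumerate(right) if r[key] == left_row[key]]
        matched_right.update(hits)
        if hits:
            rows.extend({**left_row, **right[i]} for i in hits)
        else:
            rows.append({**dict.fromkeys(all_cols), **left_row})
    for i, right_row in enumerate(right):
        if i not in matched_right:
            rows.append({**dict.fromkeys(all_cols), **right_row})
    return rows


if __name__ == "__main__":
    left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
    right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}]
    print(len(cross_join(left, right)))
    print(full_outer_join(left, right, "id"))
```

A cross join of 2 inputs with m and n rows always returns m × n rows, so it is usually applied when the source dataset is small.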
Parse
Separate data values into a standard table schema.
| Item | Description |
|---|---|
| DateTime | Use DateTime to transform date-time data to and from a variety of formats, including both expression-friendly and human-readable formats. |
| Text To Columns | Use Text To Columns to take the text in 1 column and split the string value into multiple separate columns or rows, based on 1 or more delimiters. |
Transform
Summarize or rearrange data.
| Item | Description |
|---|---|
| Arrange | Use Arrange to manually transpose and rearrange your columns for presentation purposes. Data is transformed so that each row is turned into multiple rows, and you can create new columns using column description data. |
| Count Rows | Use Count Rows to return a count of the number of rows passing through the tool. Use this tool when you want to report on the resulting row count of a process. Unlike the Summarize tool, it returns a count of zero when no rows pass through. |
| Cross Tab | Use Cross Tab to pivot the orientation of data in a table by moving vertical columns onto a horizontal axis and summarizing data where specified. |
| | Enables dynamic, sliding-window aggregations, which are ideal for time-series analysis, performance monitoring, and any analysis that requires context over sequential rows of data. |
| Running Total | Use Running Total to calculate a cumulative sum per row in a dataset. |
| Summarize | Use Summarize to perform various actions (functions and calculations) on your data. |
| Weighted Average | Use Weighted Average to calculate the weighted average of an incoming data column. |
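The running total, weighted average, and sliding-window aggregations described above follow standard definitions, which the plain-Python sketch below illustrates. It is not the tools' implementation. The function names are hypothetical, and the window used here (the current row plus the preceding rows, up to `window` rows in total) is an assumption, because the page does not state how the window is configured.

```python
from itertools import accumulate


def running_total(values):
    """Cumulative sum: each output row is the sum of all rows up to and including it."""
    return list(accumulate(values))


def weighted_average(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def sliding_mean(values, window):
    """Mean over the current row and up to window - 1 preceding rows."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


if __name__ == "__main__":
    print(running_total([1, 2, 3, 4]))
    print(weighted_average([10, 20], [1, 3]))
    print(sliding_mean([1, 2, 3, 4], 2))
```

All 3 operations depend on row order or on a pairing between columns, so in practice the data is usually sorted before they are applied.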
Developer
Create custom functions for your workflows.
| Item | Description |
|---|---|
| Dynamic Rename | Use Dynamic Rename to rename columns in upstream data. Use this tool to rename a pattern in the column headers, like removing a prefix or replacing underscores with spaces. |
| SQL Transform | The SQL Transform tool enables you to write and execute custom SQL queries directly within your workflow. |
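The 2 renaming patterns mentioned for Dynamic Rename, removing a prefix and replacing underscores with spaces, can be sketched in plain Python as follows. This illustrates the pattern only and is not the tool's implementation. The function name `rename_columns` and its parameters are hypothetical.

```python
def rename_columns(columns, prefix="", replace_underscores=False):
    """Return new column names with an optional prefix removed and underscores replaced."""
    renamed = []
    for name in columns:
        # Strip the prefix only where it is actually present.
        if prefix and name.startswith(prefix):
            name = name[len(prefix):]
        if replace_underscores:
            name = name.replace("_", " ")
        renamed.append(name)
    return renamed


if __name__ == "__main__":
    headers = ["src_first_name", "src_id", "total"]
    print(rename_columns(headers, prefix="src_", replace_underscores=True))
```

Applying one rule to every header avoids renaming columns one at a time, which is useful when the incoming column set changes between runs.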