Azure Databricks Admin Setup
After you've completed the initial Databricks workspace configuration, follow this setup guide to provision Azure Databricks workspaces for Alteryx Analytics Cloud (AAC) users.
Important
You must first configure ADLS as your base storage environment and disable ADS before setting up Databricks workspaces. Go to ADLS as Private Data Storage to learn more.
Workspace Details
Enter a unique Workspace Name under Workspace Details. The Service URL automatically populates.
Cluster for Spark Jobs
AAC uses this cluster configuration to schedule Spark jobs that import from or publish to Databricks. AAC creates a new cluster based on these details:
Select a Cluster Policy. This defines the limits on the attributes available during cluster creation. For an unrestricted policy (default), leave this option blank. Refer to the later Cluster Policy Requirements section for details on these options.
Select a Driver Node Type.
Select a Worker Node Type. These are the Azure instance types to use for launching cluster nodes. You can also select a pool if you have one available rather than standalone instances. To reduce workflow job run latency, use pools with a reasonable number of warm, idle instances.
Enter the Minimum Workers and Maximum Workers for the Databricks job cluster. Every cluster starts with the minimum number of workers provisioned. Workers are added to the cluster dynamically as the workload requires, up to the maximum.
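The fields above correspond roughly to attributes of a Databricks cluster specification. The following is a minimal sketch in Python, assuming the Databricks Clusters API field names (policy_id, driver_node_type_id, node_type_id, autoscale). The helper build_job_cluster_spec and the example node types are hypothetical and only illustrate how the values relate to one another. This is not AAC's actual implementation.

```python
from typing import Optional


def build_job_cluster_spec(
    driver_node_type: str,
    worker_node_type: str,
    min_workers: int,
    max_workers: int,
    policy_id: Optional[str] = None,
) -> dict:
    """Assemble a cluster spec dict (hypothetical helper, for illustration only)."""
    if min_workers < 0:
        raise ValueError("min_workers must be non-negative")
    if max_workers < min_workers:
        raise ValueError("max_workers must be >= min_workers")

    spec = {
        "driver_node_type_id": driver_node_type,
        "node_type_id": worker_node_type,
        # The cluster starts with min_workers and can grow up to max_workers.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }
    # Leaving the policy blank corresponds to the unrestricted (default) policy.
    if policy_id:
        spec["policy_id"] = policy_id
    return spec


# Example values only; use the node types available in your Azure region.
spec = build_job_cluster_spec("Standard_DS3_v2", "Standard_DS3_v2", 2, 8)
print(spec["autoscale"])
```

Rejecting a maximum that is smaller than the minimum before any request is sent surfaces the mistake at configuration time rather than when a job runs.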
Cluster Policy Requirements
With Databricks, you can create policies with a specific set of restrictions on cluster configurations. Select one of these policies from the Cluster Policy dropdown. You can choose between Unrestricted Policy and Other Policy.
Unrestricted Policy (Recommended): The unrestricted policy lets you define any cluster configuration without limitations. This is the default policy.
Other Policy: If you select a policy from the dropdown, ensure that the selected cluster policy permits the chosen cluster configuration. AAC also applies some default configurations when it creates a job cluster.
Note
The default configuration that AAC provides might change in future releases. Therefore, we recommend that you don't define any default configuration in the cluster policy.
During Databricks workspace creation, AAC performs basic cluster policy validation, but the actual validation takes place during job execution. If the configuration doesn’t match the cluster policy, the Databricks job will fail with a validation error indicating a configuration mismatch.
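To illustrate what a configuration mismatch looks like, the sketch below checks a cluster configuration against a policy definition before any job runs. It assumes the Databricks cluster policy definition format (attribute paths mapped to rules of type fixed, allowlist, or range) and handles only those three rule types with simple dotted paths. The function names and sample values are hypothetical, and this is neither AAC's nor Databricks' actual validator.

```python
def get_path(config: dict, path: str):
    """Look up a dotted attribute path such as 'autoscale.max_workers'."""
    value = config
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def find_policy_violations(config: dict, policy: dict) -> list:
    """Return readable mismatches between a cluster config and a policy."""
    violations = []
    for path, rule in policy.items():
        value = get_path(config, path)
        kind = rule.get("type")
        if kind == "fixed" and value != rule["value"]:
            violations.append(f"{path}: expected {rule['value']!r}, got {value!r}")
        elif kind == "allowlist" and value not in rule["values"]:
            violations.append(f"{path}: {value!r} is not in the allowlist")
        elif kind == "range":
            if value is None:
                violations.append(f"{path}: a value is required")
            elif "minValue" in rule and value < rule["minValue"]:
                violations.append(f"{path}: {value} is below {rule['minValue']}")
            elif "maxValue" in rule and value > rule["maxValue"]:
                violations.append(f"{path}: {value} is above {rule['maxValue']}")
    return violations


# Sample policy and configuration (illustrative values only).
policy = {
    "node_type_id": {"type": "allowlist", "values": ["Standard_DS3_v2", "Standard_DS4_v2"]},
    "autoscale.max_workers": {"type": "range", "maxValue": 10},
}
config = {
    "node_type_id": "Standard_DS5_v2",
    "autoscale": {"min_workers": 2, "max_workers": 20},
}
for problem in find_policy_violations(config, policy):
    print(problem)
```

With these sample values, both the worker node type and the maximum worker count fall outside the policy, so two mismatches are reported.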
Cluster for Photon Jobs
This is a long-running cluster required to browse, preview, and import Databricks tables as datasets in AAC. The cluster must meet these requirements to show up as an option:
Run in shared-access mode.
Use Databricks runtime version 12.2 LTS.
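The two requirements above can be expressed as a filter over cluster descriptions. This sketch assumes the shape of cluster objects returned by the Databricks Clusters API, where shared access mode is assumed to be reported as data_security_mode set to USER_ISOLATION and 12.2 LTS runtimes are assumed to have a spark_version beginning with 12.2.x. The helper name and the sample clusters are hypothetical. Verify the field values against your own workspace before relying on them.

```python
def is_eligible_photon_cluster(cluster: dict) -> bool:
    """Check the two requirements listed above (assumed field values)."""
    # Assumption: shared access mode appears as USER_ISOLATION.
    shared = cluster.get("data_security_mode") == "USER_ISOLATION"
    # Assumption: 12.2 LTS runtime keys start with "12.2.x".
    runtime_ok = cluster.get("spark_version", "").startswith("12.2.x")
    return shared and runtime_ok


# Sample cluster descriptions (illustrative only).
clusters = [
    {"cluster_name": "shared-12-2", "data_security_mode": "USER_ISOLATION",
     "spark_version": "12.2.x-photon-scala2.12"},
    {"cluster_name": "single-user", "data_security_mode": "SINGLE_USER",
     "spark_version": "12.2.x-scala2.12"},
    {"cluster_name": "shared-13-3", "data_security_mode": "USER_ISOLATION",
     "spark_version": "13.3.x-scala2.12"},
]

eligible = [c["cluster_name"] for c in clusters if is_eligible_photon_cluster(c)]
print(eligible)  # prints ['shared-12-2']
```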
Once you’ve determined your Photon cluster, select Save.
You've now configured your Databricks workspace for use in AAC.
To Edit or Delete your Databricks workspace, select the 3-dot menu next to your workspace.
Use Databricks for Workflow Execution
After you’ve configured at least 1 Databricks workspace, you can enable the Databricks runtime for workflows in Admin Console > Settings > Job execution > Spark Engine. This switches the scalable runtime used for executing workflows from EMR Spark to Databricks.
Once you’ve switched the engine, Databricks becomes available as a workflow job run option for users who’ve registered a personal access token against at least 1 Databricks workspace.
When you run a full workflow, AAC launches a dedicated job cluster using the Databricks configuration defined by the admin (for example, the driver/worker node type and auto-scaling configuration). Every workflow job run gets a dedicated cluster. These clusters last only for the duration of the run and then terminate automatically. AAC never shares these clusters between users or different workflow runs.