Microsoft Azure Data Lake Store
Type of Connection: Alteryx tool. The latest version is available from the Alteryx Community.
Depending on the chosen login method, an administrator might need to configure access to Azure Data Lake and Azure Active Directory before a connection can be made using the Alteryx Azure Data Lake tools.
Type of Support: Read & Write
Alteryx Tools Used to Connect
The Azure Data Lake Tools allow you to connect to an Azure Data Lake Store resource and read/write data.
Use the Azure Data Lake (ADL) File Input tool to read data from files located in an Azure Data Lake Store (ADLS) to your Alteryx workflow.
To write data from your Alteryx workflow to a file located in an ADLS, use the ADL File Output tool.
The supported file formats are CSV, XLSX, JSON, and Avro. For the Output tool, the Append action is supported only for the CSV format.
All authentication methods except Shared Key authenticate against an Azure Active Directory endpoint.
Data Connection Manager
The Azure Data Lake Store tools version 2.3.0 and later support Data Connection Manager for easier and safer storage of your credentials. Create a connection to Azure Data Lake Store and easily reuse the stored credentials in workflows that include the Azure Data Lake Store tools. Alteryx Server requires stored credentials to successfully run published workflows.
Enable AMP Engine
Make sure you have the AMP engine enabled for workflows that contain the Azure Data Lake Store tools version 2.3.0 and later.
Authentication and Authorization
The Azure Data Lake endpoints for Gen1 and Gen2 storage differ, so during authentication you need to specify which kind of storage you want to connect to. If you are not certain which type of storage you are using, ask your Azure administrator or check the Microsoft Azure Portal.
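As a rough illustration of why the storage type matters, the two generations use different endpoint host patterns (hosts as documented by Microsoft; the helper below is only a sketch, not part of the Alteryx tools):

```python
def adls_endpoint(account_name: str, gen2: bool = True) -> str:
    """Return the storage endpoint for an ADLS account name.

    Gen1 accounts live under azuredatalakestore.net, while Gen2 accounts
    use the DFS endpoint under dfs.core.windows.net.
    """
    if gen2:
        return f"https://{account_name}.dfs.core.windows.net"
    return f"https://{account_name}.azuredatalakestore.net"

print(adls_endpoint("mystore", gen2=False))
# https://mystore.azuredatalakestore.net
```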
- For publishing workflows to Server or AAH, use the Service-to-Service or Shared Key authentication types, so you do not have to re-upload your workflow once your Refresh Token expires.
- Since loading the metadata can take a long time, you can disable metadata loading by selecting Disable Auto Configure in the Advanced User Settings (Options > User Settings > Edit User Settings > Advanced).
You must be granted permissions to read and write data within an Azure Data Lake Store account. For more information about how these permissions are assigned and applied, see the official Microsoft documentation.
Single vs. Multi-Tenancy
Single-tenant applications are available only in the tenant they were registered in, also known as their home tenant. You or your Azure Administrator can create single-tenant Azure applications and storage under your account which you will use during authentication in Designer. Multi-tenant apps are available to users in both their home tenant and other tenants.
The basic End-User authentication is the most convenient way of accessing your ADLS data in Designer. Contact your Azure Administrator to allow the public Alteryx applications in your organization’s Azure tenant. See the Microsoft documentation describing the steps.
ADLS Client ID for the Gen1 Alteryx application: 7fa1a397-27aa-40ad-b47c-a47fa9e600bd
ADLS Client ID for the Gen2 Alteryx application: 2584cace-63ff-47cb-96d2-d153704f4d75
After this setup, you and your colleagues can use your normal Microsoft credentials to access the ADLS data.
The advanced End-User authentication supports single- and multi-tenant authentication, and can be used with both public and private applications.
For the Credential setup, see the instructions on Microsoft documentation.
- Tenant ID: You can obtain the tenant ID from the Azure Portal, or rely on the auto-discovery mechanism in Azure by typing “common” in the Tenant ID field. If you have access to multiple tenants, specify the tenant ID explicitly. For more information on multi-tenancy, see the Single vs. Multi-Tenancy section.
- Client ID: The unique identifier of an Azure application. The Client ID field is mandatory.
- Client Secret: If your application is private, a client secret is mandatory. If you are using a public application, leave the field empty.
The Service-to-Service authentication is suitable for publishing workflows on Server and Hub.
For the Credential setup, see the instructions on Microsoft documentation.
- Shared Key authentication can be used only with Gen2 storages.
- Publishing to Server works only with Designer and Server 2020.4 and newer, since this authentication method was introduced in the 2020.4 releases.
With an Azure storage account, Microsoft generates two access keys that can be used to authorize access to your Azure Data Lake via Shared Key authorization. You can find more information about Shared Key and its usage in the Microsoft documentation.
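Under the hood, Shared Key authorization signs each request with HMAC-SHA256 using the base64-decoded account key. A minimal sketch of the signing step (the exact string-to-sign canonicalization is defined in the Azure Storage REST documentation and omitted here):

```python
import base64
import hashlib
import hmac

def sign_request(account_key_b64: str, string_to_sign: str) -> str:
    """Compute a Shared Key signature for an already-canonicalized
    request string, using the base64-decoded storage account key."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")
```

The resulting signature is carried in the request's Authorization header in the form `SharedKey <account>:<signature>`.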
Azure National Clouds and Custom Endpoints
Starting with the v2.0 release, the ADLS connectors support access to custom endpoints. The URLs for the US and China national clouds can be selected on the authentication screen of the connectors in the Authentication Authority Endpoint field.
The file storages are accessed via registered applications. The application registration is necessary for all authentication types with the exception of End-User (Basic) and Shared Key. To register the application on Azure Portal, see instructions on Microsoft Documentation Portal.
Use Microsoft Azure Applications in Alteryx Designer
- Add Azure Data Lake Input or Azure Data Lake Output on the Designer canvas.
- Select the tool to see the Configuration panel on the right.
- Fill in the authentication data with the values available on http://portal.azure.com/. To navigate the Azure Portal, refer to the Microsoft documentation.
- Copy the Directory ID (tenant) and Application ID (client) into Designer.
- (Optional) Select Use Gen1 if you want to connect to Azure Data Lake Gen1 storage.
- Paste the Client Secret if you are connecting in Service-to-Service mode.
- Select Connect.
Data Selection and Configuration Options
In the Data tab, you can specify the data you would like to use:
- Specify the Storage Account Name. This storage must be of the same type (Gen1 or Gen2) as selected on the Authentication page.
- For Gen2 storages, specify File System Name.
- Once the storage and file system for Gen2 have been selected, you can configure the path of the file you would like to read or write. You can specify the path either by direct input in the File Path field or using the file browser. For the Azure Data Lake File Output tool, you can use the same mechanism to create a new file.
- For Excel files, the Sheet name can be specified in the Sheet field located under the file browser. If left empty, the first sheet will be automatically selected. In case of new files, the sheet will be given the default name “Sheet”.
File Formats and Configuration
The ADLS tools support the following data formats: .csv, .avro, .json and .xlsx.
- CSV files
- Write: You can overwrite or append to an existing CSV file.
For compatibility with the Input and Output Data Tools, the encoding should be UTF-8 SIG.
- JSON files
- Read: To correctly read JSON files, they must be using UTF-8 encoding without BOM.
- Write: The datatype conversion when writing to JSON files has the following limitations: Decimal, Datetime, and Time cells are output as Strings.
- Avro files
- Excel files
- Read: All data is read as V_WString.
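The encoding requirements above can be checked outside Designer. For example, in Python (a sketch for files you control), the `utf-8-sig` codec writes the BOM that the CSV tools expect, while decoding JSON with `utf-8-sig` safely strips a BOM if one is present:

```python
import csv
import io
import json

# CSV: write with a UTF-8 byte-order mark ("UTF-8 SIG").
buf = io.StringIO()
csv.writer(buf).writerow(["id", "name"])
csv_bytes = buf.getvalue().encode("utf-8-sig")
print(csv_bytes[:3])  # b'\xef\xbb\xbf' -- the UTF-8 BOM

# JSON: the tools require UTF-8 without a BOM; decoding with "utf-8-sig"
# removes a BOM if one sneaked in, so json.loads succeeds either way.
raw = b'\xef\xbb\xbf{"value": 42}'
data = json.loads(raw.decode("utf-8-sig"))
print(data["value"])  # 42
```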
- If you cannot read or write a folder created by another account, the cause is missing permissions on that folder.
- If you encounter an error that states the token may have been revoked, you must log out and then back in to the configuration panel to reauthenticate.
Token lifetime properties are configurable by the System Administrator.
Permissions to read and write data within an Azure Data Lake Store account are granted through Azure Data Lake Explorer. For more information about how these permissions are assigned and applied, see the official Microsoft documentation.
JSON and Avro are UTF-8 only.
For JSON, there is a silent conversion error if you try to store numbers that are too large for their datatype.
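Because the error is silent, it can help to validate numeric ranges before writing. A sketch of such a guard, assuming a signed 64-bit integer target (the tool's actual internal datatype may differ):

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def fits_int64(value: int) -> bool:
    """Return True if value survives storage in a signed 64-bit field."""
    return INT64_MIN <= value <= INT64_MAX

print(fits_int64(2**63 - 1))  # True
print(fits_int64(2**63))      # False
```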
Writing to Excel files is currently limited to a full file overwrite.
Avro files with fields of type bytes are not supported and will fail upon import.
Alteryx workflow float field values are converted to double in the destination Avro file.
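The widening itself is lossless, but the double in the Avro file still carries the rounding of the original 32-bit float. A quick, illustrative way to see what a float field's values look like after conversion:

```python
import struct

def as_float32(x: float) -> float:
    """Round-trip a Python float (a double) through IEEE-754 binary32,
    mimicking a value that was stored in a 32-bit float field."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(as_float32(1.1))  # not exactly 1.1: the float32 rounding survives
```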
The Microsoft Azure Data Lake, OneDrive, and Dynamics CRM connectors support authentication via Microsoft user credentials, such as email and password. In interactive workflows, it is not currently possible to authenticate with different Microsoft user accounts across these connectors. This limitation does not impact scheduled workflows. If you are authenticated with a Microsoft user account in one of these connectors and try to authenticate to another connector with a different Microsoft user account, you will see an error message. To resolve this issue, follow one of these recommendations:
- The Azure Active Directory administrator can grant the necessary permissions to a single user account, so that the user building the workflow has one account with access to all the services needed in that workflow.
- Log out of any connectors that are authenticated to a different Microsoft user account before trying to log in.
- Avoid using end-user authentication when possible. Use service-to-service authentication in the Azure Data Lake connectors and Application login authentication in the Dynamics CRM connectors.