
Microsoft Azure Data Lake Store

Connection Type

Alteryx Tool. The latest version is available from the Alteryx Marketplace.

Driver Details

Depending on the chosen login method, an administrator might need to configure access to Azure Data Lake and Azure Active Directory before a connection can be made using the Alteryx Azure Data Lake tools.

Type of Support

Read & Write

Version

Description

v2.5.0

  • Compatible with Alteryx Designer and Server 2021.4.2 Patch 6, 2022.1 Patch 4 and later.

  • Requires AMP engine.

  • Alteryx Server requires stored credentials in DCM to run workflows.

  • Added an option to download files locally for later processing.

  • Removed support for Gen1 authentication.

  • Fixed issues related to DCM authentication. (TPM-2462, TPM-2176)

  • Fixed issues where ADLS Input couldn’t properly read Null values. (TPM-3098, TPM-2659, TPM-2229)

  • Fixed issues where the workflow couldn’t run if the ADLS tools were set to start processing data on a line other than 1. (TPM-2609, TPM-2277)

  • Fixed issue where ADLS Input ignored the remaining data after processing a single chunk. (TPM-2781)

  • Fixed issue where ADLS Input returned a Malformed CSV file error. (TPM-2175)

  • Fixed issue where ADLS Output changed date values. (TPM-1928)

v2.4.3

  • Compatible with Alteryx Designer and Server 2021.4.2 Patch 4, 2022.1 Patch 2 and later.

  • Compatible with AMP engine only.

  • Added support for DCM.

    • DCM is required for running this connector on Alteryx Server.

  • FIPS capable.

  • SSL/TLS validation is now performed against Windows Certificate Store.

  • Improved write performance and stability.

  • Added option to tweak memory consumption and performance by modifying variable upload chunk size.

  • Fixed issue where XLSX data were truncated when cells contained more than 255 characters.

  • Fixed issue with non-Unicode character in SSL certificates.

  • Fixed issue with ADLS Output tool adding double quote characters despite the Quote Character field being set to None. (TPM-1964)

  • Security improvements.

v2.2.0

  • Allows users to specify a custom CSV delimiter in Output.

  • Fixed potential XXE vulnerability.

v2.1.0

  • New and improved user interface.

  • Fixed minor issues.

  • Compatible with Alteryx Designer and Server version 2021.2 and later.

v2.0

  • UI upgrade and improved error handling

  • Added support for Gen2 storages

  • Added support for Azure Government, China Cloud and custom endpoints

  • Shared Key authentication support

  • Public application support (own and Alteryx)

  • Multi-tenancy support

  • Excel input and output support

  • Added the ability to use custom delimiters for reading and writing .csv files

  • Compatible with Alteryx Designer and Server version 2019.3 and later.

v1.1.0

  • Fixed end user authentication errors

  • Allowed users to specify redirect URI for end user authentication

v1.0.2

  • Updated Code Page options.

  • Distinguished between encodings with same language (e.g., ‘Language’ -> ‘Language (specific code)’) and ordered encodings alphabetically.

  • Allowed users to specify the encoding for CSV files in the Output tool.

  • Improved error message to indicate when an invalid store name is provided.

  • Improved data conversion handling to throw a warning instead of an error when a field is missing a value.

  • Fixed error where the displayed files/folders were not refreshed after the user changed the store name.

  • Fixed issue where default value settings were occasionally not respected.

  • Disabled production logging to prevent permission issues across different Designer installations and configurations, and to support scheduled workflows.

v1.0.1

  • Fixed issue preventing packages from being installed successfully

v1.0.0

  • Initial release for Azure Data Lake File Input and Azure Data Lake File Output

Alteryx Tools Used to Connect

Data Connection Manager

The Azure Data Lake Store tools version 2.3.0 and later support Data Connection Manager (DCM) for easier and safer storage of your credentials. Create a connection to Azure Data Lake Store and easily reuse the stored credentials in workflows that include the Azure Data Lake Store tools. Alteryx Server requires stored credentials to successfully run published workflows.

Enable AMP Engine

Make sure you have the AMP engine enabled for the workflows which contain the Azure Data Lake Store tools version 2.3.0 and later.

Authentication and Authorization

The Azure Data Lake endpoints for Gen1 and Gen2 storages differ, so during authentication you need to specify which kind of storage you want to connect to. If you are not certain which type of storage you are using, ask your Azure administrator or check the Microsoft Azure Portal.

TIPS

  • For publishing workflows to Server or AAH, use the Service-to-Service or Shared Key authentication types, so you do not have to re-upload your workflow once your Refresh Token expires.

  • Since loading the metadata can take a long time, you can disable metadata loading by selecting 'Disable Auto Configure' in the Advanced User Settings (Options > User Settings > Edit User Settings > Advanced).

You need to be granted permissions to read and write data within an Azure Data Lake Store account. For more information about how these permissions are assigned and applied, see the official Microsoft documentation.

Single vs. Multi-Tenancy

Single-tenant applications are available only in the tenant they were registered in, also known as their home tenant. You or your Azure Administrator can create single-tenant Azure applications and storage under your account which you will use during authentication in Designer. Multi-tenant apps are available to users in both their home tenant and other tenants.

End-User (Basic)

The basic End-User authentication is the most convenient way of accessing your ADLS data in Designer. Contact your Azure Administrator to allow the public Alteryx applications in your organization’s Azure tenant. See the Microsoft documentation describing the steps.

Tenant: Common

ADLS Client ID for the Gen2 Alteryx application: 2584cace-63ff-47cb-96d2-d153704f4d75

After this setup, you and your colleagues can use your normal Microsoft credentials to access the ADLS data.

End-User (Advanced)

The advanced End-User authentication supports single- and multi-tenant authentication, and can be used with both public and private applications.

For the Credential setup, see the instructions on Microsoft documentation.

Authentication Configuration

  • Tenant ID: You can obtain the tenant ID from your Azure Portal, or rely on the auto-discovery mechanism in Azure by typing “common” in the Tenant ID field. If you have access to multiple tenants, you can specify the tenant ID. For more information on multi-tenancy, see the Single vs. Multi-Tenancy section.

  • Client ID: The unique identifier of an Azure application. The Client ID field is mandatory.

  • Client secret: If your application is private, then it is mandatory to provide a client secret. If you are using a public application, please leave the field empty.
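To illustrate how the three fields above relate, here is a minimal sketch of the standard OAuth 2.0 authorization-code sign-in URL that an End-User flow against the Microsoft identity platform typically starts with. It is not the connector's code. The public-cloud host, the requested scopes, and the redirect URI are assumptions; the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: public-cloud Microsoft identity platform host (not a national cloud).
AUTHORITY_HOST = "https://login.microsoftonline.com"

# Assumption: delegated Azure Storage access plus a refresh token.
DEFAULT_SCOPE = "https://storage.azure.com/user_impersonation offline_access"


def build_authorize_url(tenant_id: str, client_id: str, redirect_uri: str,
                        scope: str = DEFAULT_SCOPE) -> str:
    """Build a hypothetical End-User sign-in URL (authorization-code flow).

    tenant_id: "common" for auto-discovery, or a specific tenant ID.
    client_id: mandatory, like the Client ID field.
    The client secret is not part of this URL; a private application sends it
    later, when the returned code is exchanged for tokens.
    """
    if not client_id:
        raise ValueError("Client ID is mandatory")
    tenant = tenant_id.strip() or "common"  # empty input falls back to auto-discovery
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "scope": scope,
    })
    return f"{AUTHORITY_HOST}/{tenant}/oauth2/v2.0/authorize?{query}"
```

For example, build_authorize_url("common", "2584cace-63ff-47cb-96d2-d153704f4d75", "http://localhost") uses the public Alteryx Gen2 client ID listed in the End-User (Basic) section with a placeholder redirect URI.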

Service-to-Service

The Service-to-Service authentication is suitable for publishing workflows on Server and Hub.

For the Credential setup, see the instructions on Microsoft documentation.
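Service-to-Service authentication generally relies on the OAuth 2.0 client credentials grant, in which the application authenticates as itself with its client ID and secret. The sketch below only assembles such a token request so you can see which values are involved; it makes no network call. The function name is hypothetical, and the storage scope and the requirement for a concrete tenant (rather than "common") are assumptions to verify against the Microsoft documentation.

```python
from urllib.parse import urlencode


def build_client_credentials_request(tenant_id: str, client_id: str, client_secret: str,
                                     authority_host: str = "https://login.microsoftonline.com"):
    """Return (token_url, form_body) for a client credentials token request.

    The body is meant to be POSTed with
    Content-Type: application/x-www-form-urlencoded.
    """
    if tenant_id.strip().lower() in ("", "common"):
        # Assumption: app-only tokens need a specific tenant, not "common".
        raise ValueError("Service-to-Service authentication needs a specific tenant ID")
    token_url = f"{authority_host}/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" asks for the permissions already granted to the app registration.
        "scope": "https://storage.azure.com/.default",
    })
    return token_url, body
```

Because no user signs in, there is no Refresh Token tied to a person, which is why this method suits workflows published to Server and Hub.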

Shared Key

Note

  • Shared Key authentication can be used only with Gen2 storages.

  • Publishing to Server will only work for Designer and Server 2020.4 and newer versions because this authentication method was introduced starting with the 2020.4 releases.

With an Azure storage account, Microsoft generates two access keys that can be used to authorize access to your Azure Data Lake via Shared Key authorization. You can find more information about the Shared Key and its usage in the Microsoft documentation.
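The connector signs requests for you; you only supply the account name and one of the access keys. To show what the key is used for, here is a minimal sketch of Shared Key signing as we read Microsoft's "Authorize with Shared Key" reference for current service versions: an HMAC-SHA256 over a canonical "string-to-sign", keyed with the Base64-decoded account key. Treat the exact string-to-sign layout as an assumption to check against that reference. The account name, key, URL, and headers used with it are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qsl, urlsplit

# Standard headers in the order the Shared Key string-to-sign lists them.
STANDARD_HEADERS = [
    "content-encoding", "content-language", "content-length", "content-md5",
    "content-type", "date", "if-modified-since", "if-match", "if-none-match",
    "if-unmodified-since", "range",
]


def shared_key_authorization(account: str, account_key_b64: str, method: str,
                             url: str, headers: dict) -> str:
    """Return a "SharedKey account:signature" Authorization header value."""
    lower = {name.lower(): str(value).strip() for name, value in headers.items()}
    lines = [method.upper()]
    for name in STANDARD_HEADERS:
        value = lower.get(name, "")
        if name == "content-length" and value == "0":
            value = ""  # recent service versions sign an empty length for 0
        lines.append(value)
    # Canonicalized headers: every x-ms-* header, lowercased and sorted by name.
    canonical_headers = "".join(
        f"{name}:{value}\n" for name, value in sorted(lower.items()) if name.startswith("x-ms-"))
    # Canonicalized resource: /account/path, then sorted, decoded query parameters.
    parts = urlsplit(url)
    resource = f"/{account}{parts.path or '/'}"
    params = {}
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        params.setdefault(name.lower(), []).append(value)
    for name in sorted(params):
        resource += f"\n{name}:{','.join(sorted(params[name]))}"
    string_to_sign = "\n".join(lines) + "\n" + canonical_headers + resource
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return f"SharedKey {account}:{base64.b64encode(digest).decode('ascii')}"
```

Anyone holding either key can sign arbitrary requests for the whole account, so store the key in DCM rather than in the workflow itself.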

Azure National Clouds and Custom Endpoints

Starting with the v2.0 release, the ADLS connectors support access to custom endpoints. The URLs for the US and China national clouds can be selected on the authentication screen of the connectors in the Authentication Authority Endpoint field.
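The national clouds differ mainly in two host names: the sign-in (authority) host and the Data Lake Gen2 (DFS) endpoint suffix. The sketch below records the well-known public, US Government, and China values and lets custom endpoints override them. The dictionary keys and the CloudEndpoints type are hypothetical labels for illustration, not values the connector's Authentication Authority Endpoint field accepts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CloudEndpoints:
    authority_host: str  # Microsoft identity platform sign-in host
    dfs_suffix: str      # ADLS Gen2 (DFS) endpoint suffix


# Well-known endpoints; the short keys are hypothetical labels.
NATIONAL_CLOUDS = {
    "public": CloudEndpoints("https://login.microsoftonline.com", "dfs.core.windows.net"),
    "us-government": CloudEndpoints("https://login.microsoftonline.us", "dfs.core.usgovcloudapi.net"),
    "china": CloudEndpoints("https://login.chinacloudapi.cn", "dfs.core.chinacloudapi.cn"),
}


def resolve_endpoints(cloud: str, custom: Optional[CloudEndpoints] = None) -> CloudEndpoints:
    """Pick a national cloud's endpoints, or use custom ones when given."""
    if custom is not None:
        return custom
    try:
        return NATIONAL_CLOUDS[cloud]
    except KeyError:
        raise ValueError(f"unknown cloud {cloud!r}; supply custom endpoints") from None
```

Both hosts must belong to the same cloud: a token issued by one authority is not accepted by another cloud's storage endpoints.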

Custom Microsoft Azure API Application Setup

To set up a custom API application for this tool, see our guide: Set Up Microsoft Azure API Application.

Data Selection and Configuration Options

In the Data tab, you can specify the data you would like to use:

  1. Specify the Storage Account Name. This storage needs to be the same type (Gen1 or Gen2) as the one selected on the Authentication page.

  2. For Gen2 storages, specify File System Name.

  3. Once the storage and file system for Gen2 have been selected, you can configure the path of the file you would like to read or write. You can specify the path either by direct input in the File Path field or using the file browser. For the Azure Data Lake File Output tool, you can use the same mechanism to create a new file.

  4. For Excel files, the Sheet name can be specified in the Sheet field located under the file browser. If left empty, the first sheet will be automatically selected. In case of new files, the sheet will be given the default name “Sheet”.
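For a Gen2 storage, the Storage Account Name, File System Name, and File Path together identify one file. The sketch below shows how they combine into the public-cloud DFS URL and checks the Azure naming rule for storage accounts (3 to 24 lowercase letters and digits). The function name is hypothetical, and whether the connector builds exactly this URL internally is an assumption.

```python
import re
from urllib.parse import quote

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def gen2_file_url(account: str, file_system: str, file_path: str,
                  dfs_suffix: str = "dfs.core.windows.net") -> str:
    """Build the HTTPS URL of a file in an ADLS Gen2 file system."""
    if not ACCOUNT_NAME.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if not file_system:
        raise ValueError("File System Name is required for Gen2 storages")
    # Percent-encode the path (e.g. spaces) but keep the "/" separators.
    path = quote(file_path.strip("/"), safe="/")
    return f"https://{account}.{dfs_suffix}/{file_system}/{path}"
```

For instance, account "mystorage", file system "raw", and path "/sales/2024 Q1.csv" give https://mystorage.dfs.core.windows.net/raw/sales/2024%20Q1.csv.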

File Formats and Configuration

The ADLS tools support the following data formats: .csv, .avro, .json and .xlsx.

  • CSV files

    • Read

    • Write: You can overwrite or append to an existing CSV file.

Tip

For compatibility with the Input and Output Data Tools, the encoding should be UTF-8 SIG.
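"UTF-8 SIG" means UTF-8 with a leading byte order mark (the bytes EF BB BF). If you prepare CSV files outside Alteryx before uploading them, a sketch like the one below produces that encoding with Python's standard "utf-8-sig" codec; the function name and the sample rows are illustrative only.

```python
import csv
import io


def to_csv_utf8_sig(rows, delimiter: str = ",") -> bytes:
    """Serialize rows as CSV bytes encoded as UTF-8 with a byte order mark."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, delimiter=delimiter).writerows(rows)
    # The "utf-8-sig" codec prepends the BOM bytes EF BB BF.
    return buffer.getvalue().encode("utf-8-sig")
```

The delimiter parameter mirrors the custom delimiter option; for example, to_csv_utf8_sig(rows, delimiter=";") writes semicolon-separated values.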

  • JSON files

    • Read: To correctly read JSON files, they must use UTF-8 encoding without a BOM.

    • Write: The datatype conversion when writing to JSON files has the following limitation: Decimal, Datetime, and Time values are output as Strings.

  • Avro files

    • Read

    • Write

  • Excel files

    • Read: All data is read as V_WString.

    • Write
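The two JSON notes in the list above can be illustrated together. The sketch removes a UTF-8 BOM from a file's bytes before upload, and serializes records so that Decimal, Datetime, and Time values become strings, mimicking the documented write limitation. The exact string formats the connector produces are not documented here; Python's str() formats are an assumption, and the function names are hypothetical.

```python
import codecs
import datetime as dt
import decimal
import json


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM so the bytes are plain UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def to_json_like_output(records) -> bytes:
    """Mimic the documented JSON write limitation: Decimal, Datetime and Time
    values become strings (formatted with str(), an assumption)."""
    def as_string(value):
        if isinstance(value, (decimal.Decimal, dt.datetime, dt.time)):
            return str(value)
        raise TypeError(f"not JSON serializable: {type(value).__name__}")
    # ensure_ascii=False keeps non-ASCII text; encode as UTF-8 without a BOM.
    return json.dumps(records, default=as_string, ensure_ascii=False).encode("utf-8")
```

A downstream consumer that needs numeric or date types back must parse these strings itself.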

Additional Details

  • If you cannot read from or write to a folder created by another account, this is due to the folder's permissions.

  • If you encounter an error that states the token may have been revoked, you must log out and then back in to the configuration panel to reauthenticate.

Note

Token lifetime properties are configurable by the System Administrator.

The Azure Data Lake Explorer must grant permissions to read and write data within an Azure Data Lake Store account. For more information about how these permissions are assigned and applied, please visit the official Microsoft documentation.

Limitations

As of the Azure Data Lake Store tools version 2.5.0, the Gen1 authentication isn’t supported.

JSON and Avro are UTF-8 only.

For JSON, there is a silent conversion error if you try to store numbers that are too large for their datatype.
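Because this error is silent, it can help to check values before writing. The sketch below flags integers outside the signed 64-bit range and integers that a 64-bit double cannot represent exactly. Which limits actually trigger the silent conversion in the connector is not documented here; treating Int64 and double as the relevant limits is an assumption, and the function name is hypothetical.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def risky_numbers(record: dict) -> list:
    """List (field, reason) pairs for integers that may not survive a JSON write."""
    issues = []
    for field, value in record.items():
        # bool is a subclass of int in Python; skip it and non-integers.
        if isinstance(value, bool) or not isinstance(value, int):
            continue
        if not INT64_MIN <= value <= INT64_MAX:
            issues.append((field, "outside the Int64 range"))
        elif float(value) != value:
            # Python compares int and float exactly, so this detects rounding.
            issues.append((field, "not exactly representable as a double"))
    return issues
```

For example, 2**53 + 1 is flagged because the nearest double is 2**53, while 2**53 + 2 is exactly representable and passes.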

Writing to Excel files is currently limited to only a full file overwrite.

Avro files with fields of type bytes are not supported and will fail upon import.
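To find out in advance whether an Avro file will hit this limitation, you can inspect its schema (a JSON document). The sketch below walks a schema and reports every place the "bytes" type appears, including unions, arrays, maps, nested records, and logical types such as decimal that are stored as bytes. Treating bytes-backed logical types as unsupported is an assumption, and the function names are hypothetical.

```python
import json


def bytes_fields(schema, path: str = "$") -> list:
    """Return the paths in a parsed Avro schema where the "bytes" type appears."""
    if schema == "bytes":
        return [path]
    if isinstance(schema, list):  # a union, e.g. ["null", "bytes"]
        return [p for branch in schema for p in bytes_fields(branch, path)]
    if isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            return [p for field in schema.get("fields", [])
                    for p in bytes_fields(field["type"], f"{path}.{field['name']}")]
        if kind == "array":
            return bytes_fields(schema.get("items"), f"{path}[]")
        if kind == "map":
            return bytes_fields(schema.get("values"), f"{path}{{}}")
        # Covers {"type": "bytes", "logicalType": "decimal"} and similar wrappers.
        return bytes_fields(kind, path)
    return []


def unsupported_avro_fields(schema_json: str) -> list:
    """Parse an Avro schema given as JSON text and report fields that use "bytes"."""
    return bytes_fields(json.loads(schema_json))
```

Paths use "." for record fields, "[]" for array items, and "{}" for map values.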

Alteryx workflow float field values are converted to double in the destination Avro file.
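Widening a 32-bit float to a 64-bit double is exact, but the double exposes the rounding that already happened when the value was stored as a float. The sketch below reproduces that effect with Python's standard struct module; the function name is hypothetical.

```python
import struct


def float32_as_double(value: float) -> float:
    """Round value to 32-bit float precision, then return it as a 64-bit double."""
    (rounded,) = struct.unpack("<f", struct.pack("<f", value))
    return rounded
```

For example, float32_as_double(0.1) returns 0.10000000149011612, so a Float field holding 0.1 can appear with extra digits in the destination Avro file, while values such as 0.5 are unchanged.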

The Microsoft Azure Data Lake, OneDrive, and Dynamics CRM connectors support authentication via Microsoft user credentials, such as email and password. In interactive workflows, it is currently not possible to authenticate with different Microsoft user accounts across these connectors. This limitation does not impact scheduled workflows. If you are authenticated with a Microsoft user account in one of these connectors and try to authenticate to another connector with a different Microsoft user account, an error message pops up. To resolve this issue, follow one of these recommendations:

  • The Azure Active Directory Administrator can grant the necessary permissions to one user account and ensure the user building the workflow has one user account that has access to the services needed in that workflow.

  • Log out of any connectors that are authenticated to a different Microsoft user account before trying to log in.

  • Avoid using end-user authentication when possible. Use service-to-service authentication in the Azure Data Lake connectors and Application login authentication in the Dynamics CRM connectors.

Desktop Automation (Scheduler) is not supported by this connector.