Microsoft Azure Data Lake Store

Version:
2021.3
Last modified: September 06, 2021
Connection Type

This tool is not automatically installed with Designer. The latest version is available from the Alteryx Analytics Gallery.

Driver Details

Depending on the chosen login method, an administrator might need to configure access to Azure Data Lake and Azure Active Directory before a connection can be made using the Alteryx Azure Data Lake tools.

Type of Support

Read & Write

Release Notes
Version Description
v2.1.0
  • New and improved user interface.
  • Fixed minor issues.
  • Compatible with Alteryx Designer version 2021.2 and later.
v2.0
  • UI upgrade and improved error handling
  • Added support for Gen2 storages
  • Added support for Azure Government, China Cloud and custom endpoints
  • Shared Key authentication support
  • Public application support (own and Alteryx)
  • Multi-tenancy support
  • Excel input and output support
  • Added the ability to use custom delimiters for reading and writing .csv files
  • Compatible with Alteryx Designer version 2019.3 and later.
v1.1.0
  • Fixed end user authentication errors
  • Allowed users to specify redirect URI for end user authentication
v1.0.2
  • Updated Code Page options.
  • Distinguished between encodings with same language (e.g., ‘Language’ -> ‘Language (specific code)’) and ordered encodings alphabetically.
  • Allowed user to specify encoding for CSV files on output tool.
  • Improved error message to indicate when an invalid store name is provided.
  • Improved data conversion handling to throw a warning instead of an error when a field is missing a value.
  • Fixed error where the files/folders displayed are not refreshed after user changes store name.
  • Fixed issue where default value settings were occasionally not respected.
  • Disabled production logging to prevent permissions issues for different installations/configurations of Designer and to support scheduled workflow functionality.
v1.0.1
  • Fixed issue preventing packages from being installed successfully
v1.0.0
  • Initial release for Azure Data Lake File Input and Azure Data Lake File Output

 

Alteryx Tools Used to Connect

Microsoft Azure Data Lake File Input Tool

Microsoft Azure Data Lake File Output Tool

The Azure Data Lake Tools allow you to connect to an Azure Data Lake Store resource and read/write data.
Use the Azure Data Lake (ADL) File Input tool to read data from files located in an Azure Data Lake Store (ADLS) to your Alteryx workflow.
To write data from your Alteryx workflow to a file located in an ADLS, use the ADL File Output tool.
The supported file formats are CSV, XLSX, JSON, and Avro. For the Output tool, the Append action is supported only for the CSV format.
All authentication methods except Shared Key authenticate against an Azure Active Directory endpoint.

Authentication and Authorization

The Azure Data Lake endpoints for Gen1 and Gen2 storages differ, so during authentication you need to specify which kind of storage you want to connect to. If you are not certain which type of storage you are using, ask your Azure administrator or check in your Microsoft Azure Portal.
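
As a rough illustration of why the storage generation matters, the sketch below maps each generation to the public-cloud host pattern that Microsoft uses for it. The function name `adls_host` is hypothetical and is not part of the Alteryx tools; national clouds and custom endpoints use different domain suffixes.

```python
def adls_host(account: str, gen: int) -> str:
    """Return the public-cloud host of an ADLS account (illustrative only)."""
    if gen == 1:
        # Gen1 accounts are served from the azuredatalakestore.net domain.
        return f"{account}.azuredatalakestore.net"
    if gen == 2:
        # Gen2 accounts use the Data Lake (dfs) endpoint of a storage account.
        return f"{account}.dfs.core.windows.net"
    raise ValueError("gen must be 1 or 2")
```

For example, `adls_host("mystore", 2)` yields a `dfs.core.windows.net` host, whereas the same account name with `gen=1` points to a different service altogether.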

TIPS

  • For publishing workflows to Server or AAH, use the Service-to-Service or Shared Key authentication types, so you do not have to re-upload your workflow once your Refresh Token expires.
  • Since loading the metadata can take a long time, you can disable metadata loading by selecting 'Disable Auto Configure' in the Advanced User Settings (Options > User Settings > Edit User Settings > Advanced).

You must be granted permissions to read and write data within an Azure Data Lake Store account. For more information about how these permissions are assigned and applied, see the official Microsoft documentation.

Single vs. Multi-Tenancy

Single-tenant applications are available only in the tenant they were registered in, also known as their home tenant. You or your Azure Administrator can create single-tenant Azure applications and storage under your account which you will use during authentication in Designer. Multi-tenant apps are available to users in both their home tenant and other tenants.

    End-User (Basic)

    The basic End-User authentication is the most convenient way of accessing your ADLS data in Designer. Contact your Azure Administrator to allow the public Alteryx applications in your organization’s Azure tenant. The required steps are described in the Microsoft documentation.

    Tenant: common
    ADLS Client ID for the Gen1 Alteryx application: 7fa1a397-27aa-40ad-b47c-a47fa9e600bd
    ADLS Client ID for the Gen2 Alteryx application: 2584cace-63ff-47cb-96d2-d153704f4d75


    After this setup, you and your colleagues can use your normal Microsoft credentials to access the ADLS data.

    End-User (Advanced)

    The advanced End-User authentication supports single- and multi-tenant authentication, and can be used with both public and private applications.
    For the credential setup, see the instructions in the Microsoft documentation.

    Authentication Configuration

    • Tenant ID: You can obtain the tenant ID from your Azure Portal, or rely on the auto-discovery mechanism in Azure by typing “common” in the Tenant ID field. If you have access to multiple tenants, you can specify the tenant ID explicitly. For more information on multi-tenancy, see the Single vs. Multi-Tenancy section.
    • Client ID: The unique identifier of an Azure application. The Client ID field is mandatory.
    • Client secret: If your application is private, then it is mandatory to provide a client secret. If you are using a public application, please leave the field empty.
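
The rules in the list above can be summarized as a small validation sketch. The class `EndUserAuthConfig` and the flag `is_private_app` are hypothetical names used only for illustration; they do not correspond to fields in the tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EndUserAuthConfig:
    client_id: str
    tenant_id: str = "common"     # "common" relies on Azure auto-discovery
    client_secret: str = ""       # leave empty for a public application
    is_private_app: bool = False  # hypothetical flag, not a field in the tool

    def problems(self) -> List[str]:
        """Return a list of human-readable configuration problems."""
        found = []
        if not self.client_id.strip():
            found.append("Client ID is mandatory.")
        if not self.tenant_id.strip():
            found.append("Tenant ID is empty; use 'common' for auto-discovery.")
        if self.is_private_app and not self.client_secret:
            found.append("A private application requires a client secret.")
        if not self.is_private_app and self.client_secret:
            found.append("Leave the client secret empty for a public application.")
        return found
```

A public application with only a client ID and the default tenant passes the check, while a private application without a secret is reported as a problem.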

    Service-to-Service

    The Service-to-Service authentication is suitable for publishing workflows to Server and Hub.
    For the credential setup, see the instructions in the Microsoft documentation.
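
Service-to-service access in Azure Active Directory is generally based on the OAuth 2.0 client credentials grant. The sketch below only assembles the token request (URL and form fields) without sending it. It assumes the Microsoft identity platform v2.0 token endpoint and the usual resource scopes for Gen1 and Gen2; how the Alteryx tools build their requests internally is not described in this document, and the function name is hypothetical.

```python
from typing import Dict, Tuple


def client_credentials_request(
    tenant_id: str,
    client_id: str,
    client_secret: str,
    gen1: bool = False,
    authority: str = "https://login.microsoftonline.com",
) -> Tuple[str, Dict[str, str]]:
    """Build (url, form) for a client credentials token request (not sent)."""
    # Assumed resource identifiers: Data Lake Gen1 vs. Azure Storage (Gen2).
    resource = "https://datalake.azure.net" if gen1 else "https://storage.azure.com"
    url = f"{authority}/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": f"{resource}/.default",
    }
    return url, form
```

Because no user interaction and no refresh token are involved, this kind of login is the one that fits scheduled or published workflows.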

    Shared Key

    • Shared Key authentication can be used only with Gen2 storages.
    • Publishing to Server will only work for Designer and Server 2020.4 and newer versions because this authentication method was introduced starting with the 2020.4 releases.

    With an Azure storage account, Microsoft generates two access keys that can be used to authorize access to your Azure Data Lake via Shared Key authorization. You can find more information about the Shared Key and its usage in the Microsoft documentation.
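
To give an idea of what Shared Key authorization involves, the sketch below shows the signing step only: the Base64-encoded account key is decoded and used as the HMAC-SHA256 key over a request-specific string-to-sign. Building that string (canonicalized headers and resource) is omitted here and is defined by the Azure Storage REST documentation. The function names are hypothetical, and the tools perform this step for you.

```python
import base64
import hashlib
import hmac


def shared_key_signature(account_key_b64: str, string_to_sign: str) -> str:
    """Sign string_to_sign with the Base64-encoded storage account key."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(account: str, account_key_b64: str, string_to_sign: str) -> str:
    """Format the Authorization header value used by Shared Key requests."""
    signature = shared_key_signature(account_key_b64, string_to_sign)
    return f"SharedKey {account}:{signature}"
```

Either of the two access keys produces a valid signature, which is what allows one key to be rotated while the other stays in use.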

    Azure National Clouds and Custom Endpoints

    Starting with the v2.0 release, the ADLS connectors support access to custom endpoints. The URLs for the US and China national clouds can be selected on the authentication screen of the connectors in the Authentication Authority Endpoint field.
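
As an illustration of what the Authentication Authority Endpoint field selects, the sketch below lists commonly documented Azure Active Directory authority hosts for the public, US Government, and China clouds. The dictionary keys and the function name are hypothetical, and the exact values offered by the connector may differ; verify them against the Microsoft national cloud documentation.

```python
from typing import Optional

# Assumed authority hosts; confirm against Microsoft's national cloud docs.
AUTHORITY_HOSTS = {
    "public": "https://login.microsoftonline.com",
    "us_government": "https://login.microsoftonline.us",
    "china": "https://login.chinacloudapi.cn",
}


def authority_url(
    tenant_id: str = "common",
    cloud: str = "public",
    custom_endpoint: Optional[str] = None,
) -> str:
    """Return the authority URL for a tenant in the chosen cloud."""
    # A custom endpoint, if given, takes precedence over the named cloud.
    base = custom_endpoint or AUTHORITY_HOSTS[cloud]
    return f"{base.rstrip('/')}/{tenant_id}"
```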

    Application Setup

    The file storages are accessed via registered applications. The application registration is necessary for all authentication types with the exception of End-User (Basic) and Shared Key. To register the application on Azure Portal, see instructions on Microsoft Documentation Portal.

    Use Microsoft Azure Applications in Alteryx Designer

    1. Add the Azure Data Lake File Input or Azure Data Lake File Output tool to the Designer canvas.
    2. Select the tool to see the Configuration panel on the right.
    3. Fill in the authentication fields with the values available at http://portal.azure.com/. For help navigating the Azure Portal, refer to the Microsoft documentation.
    4. Copy the Directory (tenant) ID and Application (client) ID to Designer.
    5. (Optional) Select Use Gen1 if you want to connect to Azure Data Lake Gen1 storage.
    6. Paste Client Secret if connecting in Service-to-Service mode.
    7. Select Connect.

    Data Selection and Configuration Options

    In the Data tab, you can specify the data you would like to use:

    1. Specify the Storage Account Name. This storage must be of the same type (Gen1 or Gen2) as selected on the Authentication page.
    2. For Gen2 storages, specify File System Name.
    3. Once the storage and file system for Gen2 have been selected, you can configure the path of the file you would like to read or write. You can specify the path either by direct input in the File Path field or using the file browser. For the Azure Data Lake File Output tool, you can use the same mechanism to create a new file. 
    4. For Excel files, the Sheet name can be specified in the Sheet field located under the file browser. If left empty, the first sheet will be automatically selected. In case of new files, the sheet will be given the default name “Sheet”.
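
The way these settings combine can be sketched as follows. The first helper shows how the storage account, file system, and file path form the public-cloud URL of a file in a Gen2 storage; the second mirrors the sheet-name rule for Excel files. Both function names are hypothetical, and the tools resolve these values for you.

```python
from typing import Optional
from urllib.parse import quote


def gen2_file_url(account: str, file_system: str, file_path: str) -> str:
    """Build the public-cloud URL of a file in a Gen2 storage account."""
    if not file_system:
        raise ValueError("Gen2 storages require a File System Name")
    # quote() keeps "/" separators and percent-encodes spaces and similar characters.
    path = quote(file_path.strip("/"))
    return f"https://{account}.dfs.core.windows.net/{file_system}/{path}"


def effective_sheet(sheet: str, is_new_file: bool) -> Optional[str]:
    """Return the sheet name to use; None stands for 'first sheet in the file'."""
    if sheet:
        return sheet
    return "Sheet" if is_new_file else None
```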

    File Formats and Configuration

    The ADLS tools support the following data formats: .csv, .avro, .json and .xlsx.

    • CSV files
      • Read
      • Write: You can overwrite or append to an existing CSV file. 

    Tip

    For compatibility with the Input and Output Data Tools, the encoding should be UTF-8 SIG.

    • JSON files
      • Read: To correctly read JSON files, they must be using UTF-8 encoding without BOM. 
      • Write: The datatype conversion when writing to JSON files has the following limitations: Decimal, Datetime, and Time cells are output as Strings.
    • Avro files
      • Read
      • Write
    • Excel files
      • Read: All data is read as V_Wstrings. 
      • Write
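
The encoding and type notes above can be tried outside Designer with a short sketch. It writes CSV as UTF-8 with a signature (Python's `utf-8-sig` codec), rejects JSON input that starts with a BOM, and serializes Decimal, datetime, and time values as strings. The function names are hypothetical, and `str()` is used only for illustration; the exact string format the tool emits is not specified here.

```python
import codecs
import csv
import io
import json


def csv_bytes_with_bom(rows, delimiter=","):
    """Encode rows as CSV in UTF-8 with a signature (BOM), i.e. UTF-8 SIG."""
    buf = io.StringIO(newline="")
    csv.writer(buf, delimiter=delimiter).writerows(rows)
    return buf.getvalue().encode("utf-8-sig")


def load_json_without_bom(raw: bytes):
    """Parse JSON bytes, rejecting input that starts with a UTF-8 BOM."""
    if raw.startswith(codecs.BOM_UTF8):
        raise ValueError("JSON input must be UTF-8 without BOM")
    return json.loads(raw.decode("utf-8"))


def dump_json_record(record: dict) -> str:
    """Serialize a record; values json cannot encode natively become strings."""
    return json.dumps(record, default=str)
```

Note the asymmetry: the CSV helper adds the three-byte signature that the Input and Output Data tools expect, while the JSON helper refuses it.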

    Additional Details

    • If you cannot read from or write to a folder created by another account, the cause is missing permissions on that folder.
    • If you encounter an error that states the token may have been revoked, you must log out and then back in to the configuration panel to reauthenticate.

    Token lifetime properties are configurable by the System Administrator.

    The Azure Data Lake Explorer must grant permissions to read and write data within an Azure Data Lake Store account. For more information about how these permissions are assigned and applied, please visit the official Microsoft documentation.

    Limitations

    JSON and Avro: UTF-8 Only

    JSON and Avro are UTF-8 only.
     

    JSON: Silent Conversion Error

    For JSON, there is a silent conversion error if you try to store numbers that are too large for their datatype.
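
Because the conversion error is silent, it can help to check integer values against their field type before writing. The sketch below assumes the usual value ranges of the Alteryx integer types (Byte as unsigned 8-bit, Int16, Int32, Int64 as signed); the names `INT_RANGES` and `fits` are hypothetical.

```python
# Assumed ranges of the Alteryx integer field types.
INT_RANGES = {
    "Byte": (0, 2**8 - 1),
    "Int16": (-2**15, 2**15 - 1),
    "Int32": (-2**31, 2**31 - 1),
    "Int64": (-2**63, 2**63 - 1),
}


def fits(value: int, type_name: str) -> bool:
    """Return True if value can be stored in the given integer type."""
    low, high = INT_RANGES[type_name]
    return low <= value <= high
```

A value such as 40000 fits Int32 but not Int16, so a check like this flags the rows that would otherwise be converted without any message.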

    Writing to Excel Files Limited

    Writing to Excel files is currently limited to only a full file overwrite.

    Avro Byte Field Type

    Avro files with fields of type bytes are not supported and will fail upon import.
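
An Avro schema is a JSON document, so fields of type bytes can be spotted before import. The sketch below walks a record schema, including unions, arrays, maps, and nested records, and returns the names of top-level fields that use bytes. The function names are hypothetical; whether the tool also rejects bytes that carry a logical type (such as decimal) is an assumption made here.

```python
import json
from typing import List


def _uses_bytes(avro_type) -> bool:
    """Return True if an Avro type definition contains the bytes type."""
    if isinstance(avro_type, str):
        return avro_type == "bytes"
    if isinstance(avro_type, list):  # union, e.g. ["null", "bytes"]
        return any(_uses_bytes(t) for t in avro_type)
    if isinstance(avro_type, dict):
        kind = avro_type.get("type")
        if kind == "array":
            return _uses_bytes(avro_type["items"])
        if kind == "map":
            return _uses_bytes(avro_type["values"])
        if kind == "record":
            return any(_uses_bytes(f["type"]) for f in avro_type["fields"])
        return _uses_bytes(kind)
    return False


def bytes_fields(schema_json: str) -> List[str]:
    """List top-level field names of a record schema that use bytes."""
    schema = json.loads(schema_json)
    return [f["name"] for f in schema.get("fields", []) if _uses_bytes(f["type"])]
```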

    Output: Alteryx Float Field to Avro Type Conversion

    Alteryx workflow float field values are converted to double in the destination Avro file.
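
One visible effect of this conversion is that a single-precision value widened to double shows its rounding error. The sketch below reproduces this with the standard library, assuming the Alteryx Float type is a 32-bit float; the function name is hypothetical.

```python
import struct


def widen_float32(value: float) -> float:
    """Round value to 32-bit float precision and return it as a 64-bit double."""
    return struct.unpack("<f", struct.pack("<f", value))[0]
```

Values that are exactly representable in 32 bits, such as 0.5, are unchanged, while a value such as 0.1 comes back slightly different from the double 0.1. Downstream comparisons on the Avro data should therefore allow for a small tolerance.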

    Multiple Connectors with Different Azure Active Directory User Accounts

    The Microsoft Azure Data Lake, OneDrive, and Dynamics CRM connectors support authentication via Microsoft user credentials, like email and password. In interactive workflows, it is not currently possible to authenticate with different Microsoft user accounts across these connectors. This limitation does not impact scheduled workflows. In the case that you are authenticated with a Microsoft user account in one of these connectors and try to authenticate to another connector with a different Microsoft user account, you will see an error message pop-up. To resolve this issue, follow one of these recommendations:

    • The Azure Active Directory Administrator can grant the necessary permissions to one user account and ensure the user building the workflow has one user account that has access to the services needed in that workflow.
    • Log out of any connectors that are authenticated to a different Microsoft user account before trying to log in.
    • Avoid using end-user authentication when possible. Use service-to-service authentication in the Azure Data Lake connectors and Application login authentication in the Dynamics CRM connectors.