Apache Spark ODBC

Version: 2021.3
Last modified: August 11, 2021

Connection Type

ODBC (64-bit)

Driver Configuration Requirements

For optimal performance, enable the Fast SQLPrepare option in the driver's Advanced Options. This lets Alteryx retrieve metadata without running a query.

Driver Details

In-Database processing requires 64-bit database drivers.

Type of Support

Read & Write, In-Database

Validated On

Database Version: 2.3.1.3.0.1.0-187
ODBC Client Version: 2.06.16.1019

For more information about the Simba Spark ODBC driver, see the Installation and Configuration Guide on the Simba portal.

Alteryx Tools Used to Connect

    Standard Workflow Processing

    Input Data Tool

    Output Data Tool

    In-database Workflow Processing

    Connect In-DB Tool

    Data Stream In Tool

    To use the Apache Spark ODBC driver, you must have Apache Spark SQL enabled. Not all Hadoop distributions support Apache Spark. If you cannot connect using Apache Spark ODBC, contact your Hadoop vendor for instructions on how to set up the Apache Spark server correctly.

    If you have issues reading or writing Unicode® characters, open the Simba Spark ODBC driver configuration and, under Advanced Options, select the “Use SQL Unicode Types” option.

    Read Support

    Install and configure the Apache Spark ODBC driver:

    • Spark Server Type: Select the server type that matches the version of Apache Spark you are running. If you are running Apache Spark 1.1 or later, select Apache SparkThriftServer.
    • Authentication Mechanism: See the installation guide downloaded with the Simba Apache Spark driver to configure this setting based on your setup.

    To set up the driver Advanced Options, see the installation guide downloaded with the Simba Apache Spark driver.
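
The settings above are normally chosen in the driver's DSN setup dialog. As a rough illustration only, the same choices can be written as a DSN-less ODBC connection string. The sketch below assumes the key names `SparkServerType`, `AuthMech`, `FastSQLPrepare`, and `UseUnicodeSqlCharacterTypes` (believed to match the Simba guide, but confirm them against the guide for your driver version). The driver name, host, and port 10000 are placeholders, not values from this page.

```python
def build_spark_odbc_connection_string(
    host,
    port=10000,                        # assumption: a common Thrift Server port
    driver="Simba Spark ODBC Driver",  # assumption: name as registered locally
    auth_mech=0,                       # assumption: 0 means no authentication
    extra=None,
):
    """Return a DSN-less ODBC connection string for a Spark Thrift Server."""
    settings = {
        "Driver": driver,
        "Host": host,
        "Port": port,
        "SparkServerType": 3,              # assumed value for Spark Thrift Server
        "AuthMech": auth_mech,
        "FastSQLPrepare": 1,               # Fast SQLPrepare enabled
        "UseUnicodeSqlCharacterTypes": 1,  # "Use SQL Unicode Types" enabled
    }
    settings.update(extra or {})
    # ODBC connection strings are semicolon-separated key=value pairs.
    return ";".join(f"{key}={value}" for key, value in settings.items()) + ";"


if __name__ == "__main__":
    # "spark.example.com" is a hypothetical host name.
    print(build_spark_odbc_connection_string("spark.example.com"))
```

The resulting string could be passed to any ODBC client. Set the authentication mechanism according to the installation guide for your setup.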

    Write Support

      • For both standard and in-database workflows, use the Data Stream In tool to write to Apache Spark. Write support is via HDFS.
      • If you are writing with HDFS Avro, you must select the Default WebHDFS (50070) port option in the HDFS Avro Connection properties window.
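
Before configuring the HDFS connection, it can help to confirm that WebHDFS answers on port 50070. The sketch below only builds the URL for a WebHDFS `LISTSTATUS` request, which can then be opened in a browser or with any HTTP client. The host name and path are hypothetical, and the cluster is assumed to allow unauthenticated (or `user.name`) access over plain HTTP.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op="LISTSTATUS", port=50070, user=None):
    """Build a WebHDFS REST URL for an absolute HDFS path."""
    if not path.startswith("/"):
        raise ValueError("path must be an absolute HDFS path")
    params = {"op": op}
    if user:
        # Simple (non-Kerberos) authentication passes the user as a parameter.
        params["user.name"] = user
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"


if __name__ == "__main__":
    # "namenode.example.com" is a hypothetical NameNode host.
    print(webhdfs_url("namenode.example.com", "/tmp"))
```

A JSON directory listing in response suggests that the default WebHDFS port is reachable from the machine running the workflow.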

      To write a table with field names that total more than 4000 characters, use CSV instead of Avro.
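
The 4000-character rule can be checked before a table is written. This is a minimal sketch that assumes the limit applies to the plain sum of the field-name lengths. How Alteryx counts the total (for example, whether delimiters are included) is not stated here, so treat the result as a guide only. The function name is illustrative.

```python
AVRO_FIELD_NAME_LIMIT = 4000  # total characters across all field names


def choose_write_format(field_names):
    """Suggest CSV when the combined field-name length exceeds the limit."""
    total = sum(len(name) for name in field_names)
    return "CSV" if total > AVRO_FIELD_NAME_LIMIT else "Avro"


if __name__ == "__main__":
    print(choose_write_format(["customer_id", "order_date", "amount"]))
```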

      Limitations

      At this time, Alteryx supports native Spark in Cloudera Data Platform (CDP) but not Cloudera Distributed Hadoop (CDH).
