MongoDB Connections

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

You can create connections to MongoDB and MongoDB Atlas through the Trifacta Application. These connections enable you to read data from MongoDB.

If you are connecting the Dataprep by Trifacta platform to any relational source of data, you must add the Alteryx Service to your whitelist for those resources. See Whitelist Platform Service.

Supported Versions:

  • Database versions: 2.6 - 6.0.1

Supported Environments:

  • Read: Supported

  • Write: Not supported

Prerequisites

  • MongoDB supports basic (username/password) authentication.

Limitations

Note

During normal selection or import of an entire table, you may encounter an error indicating a problem with a specific column. Since some tables require filtering based on a particular column, data from them can only be ingested using custom SQL statements. In this case, the problematic column can be used as a filter in the WHERE clause of a custom SQL statement to ingest the table.

  • For more information, please consult the CData driver documentation for the specific table.

  • For more information on using custom SQL, see Create Dataset with SQL.
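The filtering workaround described in the note above can be sketched as a small helper that builds the custom SQL text. This is a minimal sketch, not part of the product: `build_filtered_query`, the table name `my_table`, and the column `account_id` are hypothetical names chosen for illustration. The square-bracket column quoting follows the custom SQL examples elsewhere on this page.

```python
def quote_literal(value: str) -> str:
    """Wrap a string in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_filtered_query(table: str, column: str, value: str) -> str:
    """Build a custom SQL statement that filters on the required column.

    The table and column names are assumed to come from a trusted source;
    only the literal value is escaped here.
    """
    return f"SELECT * FROM {table} WHERE [{column}] = {quote_literal(value)};"


# Hypothetical table and column names, for illustration only.
query = build_filtered_query("my_table", "account_id", "A-100")
print(query)
```

The resulting text can be pasted into the custom SQL dialog when a whole-table import fails on a column that requires a filter.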

Note

For filtering date columns, this connection type supports a set of literal functions on dates. You can use these to reduce the volume of data extracted from the database using a custom SQL query. For more information, see the pg_dateliteralfunctions.htm page in the driver documentation for this connection type.

  • This connection is read-only.

Create Connection

MongoDB

To create a MongoDB connection, please specify the following properties:

Property

Description

Host

Name of the host.

Port

Set this value to the port number through which to access MongoDB. By default, this value is 27017.

Database

The database that you want to read

Auth Database

Name of the MongoDB database used for authentication

Replica Set

(Optional) Comma-separated list of secondary servers in the replica set, specified by address and port.

A replica set is a group of MongoDB processes that maintain the same data set. Replica sets provide redundancy and high availability and are the basis for all production deployments. For more information, see https://docs.mongodb.com/manual/replication/.

Secondary Reads

Enable this checkbox if you want to read from secondary (slave) servers.

Use SSL

Enable this checkbox if you want to connect using SSL.

Connect String Options

(Optional) You can specify additional options used to connect as a string value.

The following option sets the connection timeout in seconds:

Timeout=0;

The default value is 0, which disables connection timeouts. See below for more information.

Test Connection

After you have defined the connection credentials type, credentials, and connection string, you can verify that the Trifacta Application can use them to connect to the database.

Advanced options: Default Column Data Type Inference

Set to disabled to prevent the product from applying its own type inference to each column on import. The default value is enabled.

Advanced options: Enable SSH Tunneling

If available, the SSH tunneling options allow you to configure SSH tunneling authentication between the Trifacta Application and your database.

Note

SSH tunneling is available on a per-connection basis. It may not be available for all connections.

For more information, see Configure SSH Tunnel Connectivity.

Connection Name

Display name of the connection

Connection Description

(Optional) Description of the connection, which appears in the application.

MongoDB Atlas

To create a MongoDB Atlas connection, please specify the following properties:

Property

Description

Host

Name of the host.

Port

Set this value to the port number through which to access MongoDB. By default, this value is 27017.

Database

The database that you want to read

Replica Set

(Optional) Comma-separated list of secondary servers in the replica set, specified by address and port.

A replica set is a group of MongoDB processes that maintain the same data set. Replica sets provide redundancy and high availability and are the basis for all production deployments. For more information, see https://docs.mongodb.com/manual/replication/.

Secondary Reads

Enable this checkbox if you want to read from secondary (slave) servers.

Connect String Options

(Optional) You can specify additional options used to connect as a string value. The following option sets the connection timeout in seconds:

Timeout=0;

The default value is 0, which disables connection timeouts. See below for more information.

Test Connection

After you have defined the connection credentials type, credentials, and connection string, you can verify that the Trifacta Application can use them to connect to the database.

Advanced options: Default Column Data Type Inference

Set to disabled to prevent the product from applying its own type inference to each column on import. The default value is enabled.

Note

SSH tunneling is not supported for MongoDB Atlas.

Connection Name

Display name of the connection

Connection Description

(Optional) Description of the connection, which appears in the application.

For more information on these settings, see https://cdn.cdata.com/help/DGH/jdbc/.

Create connection via API

Depending on your product edition, you can create connections of this type through the API. Use the following property values for each connection type.

MongoDB:

"vendor": "mongodb",
"vendorName": "MongoDB",
"type": "jdbc"

MongoDB Atlas:

"vendor": "mongodb_atlas",
"vendorName": "MongoDB Atlas",
"type": "jdbc"

Dataprep by Trifacta: API Reference docs
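As a rough illustration, the property values above could be assembled into a request body as follows. Only `vendor`, `vendorName`, and `type` come from this page. The `name`, `host`, and `port` keys and their placement in the body are assumptions made for this sketch; consult the API Reference docs for the actual request schema and endpoint.

```python
import json

# Values taken from this page for each connection type.
VENDOR_FIELDS = {
    "mongodb": {"vendor": "mongodb", "vendorName": "MongoDB", "type": "jdbc"},
    "mongodb_atlas": {
        "vendor": "mongodb_atlas",
        "vendorName": "MongoDB Atlas",
        "type": "jdbc",
    },
}


def connection_payload(kind: str, name: str, host: str, port: int = 27017) -> dict:
    """Return a dict combining the documented vendor fields with example settings."""
    payload = dict(VENDOR_FIELDS[kind])
    # ASSUMPTION: these key names and their top-level placement are illustrative
    # only and are not confirmed by this page.
    payload.update({"name": name, "host": host, "port": port})
    return payload


# Hypothetical connection name and host.
body = connection_payload("mongodb_atlas", "my_atlas_connection", "cluster0.example.net")
print(json.dumps(body, indent=2))
```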

Connect string options

Connection timeout

By default, the supported driver applies a connection timeout of 0 seconds to MongoDB, which disables connection timeouts. As needed, you can modify the connection timeout through connect string options:

Timeout=<value_in_seconds>;

where:

<value_in_seconds> is the length of the timeout in seconds.
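When several options are needed, the value for the Connect String Options field can be composed programmatically. The following is a minimal sketch, assuming that multiple options are written as `Name=value;` pairs one after another, in the same form as the `Timeout` example above. The helper name `connect_string_options` is hypothetical.

```python
def connect_string_options(options: dict) -> str:
    """Join option names and values into a 'Name=value;' string.

    ASSUMPTION: multiple options are simply concatenated, each terminated
    by a semicolon, as in the single-option example on this page.
    """
    parts = []
    for key, value in options.items():
        # Render Python booleans in lowercase (for example, false).
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}={value};")
    return "".join(parts)


# A 30-second timeout, combined with the FlattenArrays option from this page.
print(connect_string_options({"Timeout": 30, "FlattenArrays": -1}))
```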

Flattening Documents

Documents can contain other documents, which enables the storage of nested data. You can control how the CData driver flattens nested objects and arrays by setting Connect String Options.

Note

Columns that have been flattened can be accessed or referenced using custom SQL queries. Additional information is below.

Flatten Objects:

By default, the CData driver flattens nested Objects. As needed, you can set FlattenObjects to false to disable this behavior.

For more information, see http://cdn.cdata.com/help/DGG/jdbc/RSBMongodb_p_FlattenObjects.htm.

Flatten Arrays:

By default, the CData driver does not flatten Arrays.

  • As needed, you can configure the number of elements that you want to have returned in your flattened arrays.

  • To flatten all elements of all arrays, set FlattenArrays to -1.

For more information, see http://cdn.cdata.com/help/DGG/jdbc/RSBMongodb_p_FlattenArrays.htm.

Referencing flattened columns:

If you have flattened Objects or Arrays, you can reference these columns using square brackets in your custom SQL queries.

Example of flattened Object:

SELECT [address.city] FROM my_table;

Example of flattened Array:

SELECT * FROM my_table WHERE [hobbies.0]='cricket';
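To see how nested fields map to column names such as `address.city` and `hobbies.0`, the following sketch mimics the naming convention in plain Python. It is an illustration only, not the driver's implementation: the `flatten` function and the sample document are hypothetical, elements nested inside arrays are not flattened further, and values that are not flattened are kept as-is.

```python
def flatten(doc: dict, flatten_objects: bool = True,
            flatten_arrays: int = 0, prefix: str = "") -> dict:
    """Map a nested document to dotted column names.

    flatten_arrays = 0 leaves arrays unflattened, N > 0 keeps the first N
    elements as columns, and -1 flattens all elements.
    """
    columns = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict) and flatten_objects:
            # Nested object: recurse and extend the dotted prefix.
            columns.update(flatten(value, flatten_objects, flatten_arrays, f"{name}."))
        elif isinstance(value, list) and flatten_arrays != 0:
            # Array: emit one column per element, suffixed with its index.
            limit = len(value) if flatten_arrays < 0 else flatten_arrays
            for index, item in enumerate(value[:limit]):
                columns[f"{name}.{index}"] = item
        else:
            columns[name] = value
    return columns


# Hypothetical sample document.
doc = {"name": "Ann", "address": {"city": "Pune"}, "hobbies": ["cricket", "chess"]}
print(flatten(doc, flatten_arrays=-1))
```

With these inputs, the flattened names match the bracketed references used in the SQL examples above, such as `[address.city]` and `[hobbies.0]`.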

Driver Information

For more information on CData JDBC drivers, see https://cdn.cdata.com/help/DGH/jdbc/.

Using MongoDB

MongoDB is a NoSQL document database that provides high performance, availability, and scalability.

MongoDB Data Organization Hierarchy

MongoDB has a two-level data hierarchy:

+ Schema1
  + Collection1
  + Collection2
+ Schema2
  + Collection3
  + Collection4

  • Schema roughly corresponds to a database.

  • Collection roughly corresponds to a table.

    • A collection is composed of documents. A document is a binary JSON (BSON) representation of the fields and values of a row.
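The hierarchy can be pictured as nested mappings: schemas contain collections, and each collection holds a list of documents. The sketch below uses plain Python dictionaries to model this. The sample document and the helper names `list_collections` and `read_rows` are hypothetical and do not correspond to any product or driver API.

```python
# Schema -> collection -> list of documents (rows).
databases = {
    "Schema1": {
        "Collection1": [{"_id": 1, "name": "Ann"}],
        "Collection2": [],
    },
    "Schema2": {
        "Collection3": [],
        "Collection4": [],
    },
}


def list_collections(schema: str) -> list:
    """Return the collection (table) names in a schema (database)."""
    return sorted(databases[schema])


def read_rows(schema: str, collection: str) -> list:
    """Return the documents (rows) stored in a collection."""
    return databases[schema][collection]


print(list_collections("Schema1"))
print(read_rows("Schema1", "Collection1"))
```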

Database Uses

For more information on interacting with databases, see Using Databases.

Read Data

You can import datasets from MongoDB through the Import Data page. See Import Data Page.

Data Type Mappings

Note

The Alteryx data types listed in this section reflect the raw data type of the converted column. Depending on the contents of the column, the application may re-infer a different data type, when a dataset using this type of source is loaded.

Access/Read

When data is imported from MongoDB, the supported data types from the source are converted to corresponding data types supported by the Trifacta Application.

Source Data Type | Supported | Alteryx data type
ObjectId | Y | String
RegEx | Y | String
String | Y | String
Binary | Y | String
Integer | Y | Integer
Timestamp | Y | Datetime
Double | Y | Float
Array | Y | String
Bool | Y | Bool
Null | Y | String
Date | Y | Datetime
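For scripts that need to anticipate the raw type of an imported column, the mappings above can be captured in a lookup table. This is a minimal sketch: the dictionary restates the table on this page, and the `alteryx_type` helper is a hypothetical name. As noted above, the application may re-infer a different type after the dataset is loaded.

```python
# Source MongoDB type -> raw Alteryx data type, as listed on this page.
TYPE_MAP = {
    "ObjectId": "String",
    "RegEx": "String",
    "String": "String",
    "Binary": "String",
    "Integer": "Integer",
    "Timestamp": "Datetime",
    "Double": "Float",
    "Array": "String",
    "Bool": "Bool",
    "Null": "String",
    "Date": "Datetime",
}


def alteryx_type(source_type: str) -> str:
    """Look up the raw converted type; raises KeyError for unlisted types."""
    return TYPE_MAP[source_type]


print(alteryx_type("Timestamp"))
```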

Write/Publish

Not supported.