MongoDB Connections
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
You can create connections to MongoDB and MongoDB Atlas through the Trifacta Application. These connections enable you to read data from the MongoDB workspace.
If you are connecting the Dataprep by Trifacta platform to any relational source of data, you must add the Alteryx Service to your whitelist for those resources. See Whitelist Platform Service.
Supported Versions:
Database versions: 2.6 - 6.0.1
Supported Environments:
Read: Supported
Write: Not supported
Prerequisites
MongoDB supports basic (username/password) authentication.
Limitations
Note
During normal selection or import of an entire table, you may encounter an error indicating a problem with a specific column. Since some tables require filtering based on a particular column, data from them can only be ingested using custom SQL statements. In this case, the problematic column can be used as a filter in the WHERE clause of a custom SQL statement to ingest the table.
For more information, please consult the CData driver documentation for the specific table.
For more information on using custom SQL, see Create Dataset with SQL.
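As a hedged illustration of the note above, the following Python sketch composes a custom SQL statement that filters on a single column. The helper `build_filtered_query`, the table `my_table`, the column `status`, and the value `active` are hypothetical names used only for illustration; they are not part of the product or the driver.

```python
def build_filtered_query(table: str, column: str, value: str) -> str:
    """Compose a SELECT statement that filters on one column (illustrative only)."""
    # Double any single quotes so the value remains a valid SQL string literal.
    escaped = value.replace("'", "''")
    # Square brackets quote the column name, matching the custom SQL examples on this page.
    return f"SELECT * FROM {table} WHERE [{column}] = '{escaped}'"


# Hypothetical table, column, and value.
query = build_filtered_query("my_table", "status", "active")
print(query)
```

The resulting string is the kind of statement you would paste into the custom SQL dialog; which column must be used as the filter depends on the table, as described in the driver documentation.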
Note
For filtering date columns, this connection type supports a set of literal functions on dates. You can use these in a custom SQL query to reduce the volume of data extracted from the database. For more information, see the pg_dateliteralfunctions.htm page in the driver documentation for this connection type.
This connection is read-only.
Create Connection
MongoDB
To create a MongoDB connection, please specify the following properties:
Property | Description |
---|---|
Host | Name of the host. |
Port | Set this value to the port number through which to access MongoDB. By default, this value is 27017. |
Database | The database that you want to read |
Auth Database | Name of the MongoDB database used for authentication |
Replica Set | (Optional) Comma-separated list of secondary servers in the replica set, specified by address and port. A replica set is a group of MongoDB processes that maintain the same data set. Replica sets provide redundancy and high availability and are the basis for all production deployments. For more information, see https://docs.mongodb.com/manual/replication/. |
Secondary Reads | Enable this checkbox if you want to read from secondary (slave) servers. |
Use SSL | Enable this checkbox if you want to connect using SSL. |
Connect String Options | (Optional) You can specify additional options used to connect as a string value. The following option sets the connection timeout in seconds: Timeout=0; The default value is |
Test Connection | After you have defined the connection credentials type, credentials, and connection string, you can verify that the Trifacta Application can use them to connect to the database. |
Advanced options: Default Column Data Type Inference | Set to |
Advanced options: Enable SSH Tunneling | If available, the SSH tunneling options allow you to configure SSH tunneling authentication between the Trifacta Application and your database. Note: SSH tunneling is available on a per-connection basis. It may not be available for all connections. For more information, see Configure SSH Tunnel Connectivity. |
Connection Name | Display name of the connection |
Connection Description | (Optional) Description of the connection, which appears in the application. |
MongoDB Atlas
To create a MongoDB Atlas connection, please specify the following properties:
Property | Description |
---|---|
Host | Name of the host. |
Port | Set this value to the port number through which to access MongoDB. By default, this value is |
Database | The database that you want to read |
Replica Set | (Optional) Comma-separated list of secondary servers in the replica set, specified by address and port. A replica set is a group of MongoDB processes that maintain the same data set. Replica sets provide redundancy and high availability and are the basis for all production deployments. For more information, see https://docs.mongodb.com/manual/replication/. |
Secondary Reads | Enable this checkbox if you want to read from secondary (slave) servers. |
Connect String Options | (Optional) The following option sets the connection timeout in seconds: Timeout=0; The default value is |
Test Connection | After you have defined the connection credentials type, credentials, and connection string, you can verify that the Trifacta Application can use them to connect to the database. |
Advanced options: Default Column Data Type Inference | Set to Note: SSH tunneling is not supported for MongoDB Atlas. |
Connection Name | Display name of the connection |
Connection Description | (Optional) Description of the connection, which appears in the application. |
For more information on these settings, see https://cdn.cdata.com/help/DGH/jdbc/.
Create connection via API
Depending on your product edition, you can create connections of this type.
MongoDB:
"vendor": "mongodb", "vendorName": "MongoDB", "type": "jdbc"
MongoDB Atlas:
"vendor": "mongodb_atlas", "vendorName": "MongoDB Atlas", "type": "jdbc"
Dataprep by Trifacta: API Reference docs
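The vendor fields above can be assembled programmatically. The following Python sketch builds only the three fields listed on this page for each connection type. The helper name `connection_fields` is hypothetical, and a real request body needs additional fields (host, credentials, and so on) that are not documented here; consult the API reference for the full schema.

```python
import json


def connection_fields(vendor: str, vendor_name: str) -> dict:
    """Return the vendor-identifying fields listed on this page (illustrative only)."""
    # Only "vendor", "vendorName", and "type" come from this page.
    # Any other fields required by the API are intentionally omitted.
    return {"vendor": vendor, "vendorName": vendor_name, "type": "jdbc"}


mongodb = connection_fields("mongodb", "MongoDB")
atlas = connection_fields("mongodb_atlas", "MongoDB Atlas")

# Render the fragment as JSON for inspection.
print(json.dumps(mongodb, indent=2))
print(json.dumps(atlas, indent=2))
```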
Connect string options
Connection timeout
By default, the supported driver applies a connection timeout to MongoDB of 0 seconds. As needed, you can modify the connection timeout through connect string options:
Timeout=<value_in_seconds>;
where <value_in_seconds> is the timeout in seconds.
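Connect string options take the form of `Key=value;` pairs, as in the Timeout example above. The following minimal Python sketch renders such a string from a dictionary. The helper name `build_connect_string_options` and the value 30 are assumptions for illustration, not product defaults.

```python
def build_connect_string_options(options: dict) -> str:
    """Render options as 'Key=value;' pairs in insertion order (illustrative only)."""
    return "".join(f"{key}={value};" for key, value in options.items())


# Hypothetical value: a 30-second connection timeout.
timeout_option = build_connect_string_options({"Timeout": 30})
print(timeout_option)
```

The printed string is what you would enter in the Connect String Options field.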
Flattening Documents
Documents can contain other documents, which enables the storage of nested data. You can control how the CData driver flattens nested objects and arrays through Connect String Options.
Note
Columns that have been flattened can be accessed or referenced using custom SQL queries. Additional information is below.
Flatten Objects:
By default, the CData driver flattens nested Objects. As needed, you can set FlattenObjects to false to disable this behavior.
For more information, see http://cdn.cdata.com/help/DGG/jdbc/RSBMongodb_p_FlattenObjects.htm.
Flatten Arrays:
By default, the CData driver does not flatten Arrays.
As needed, you can configure the number of elements that you want to have returned in your flattened arrays.
To flatten all elements of all arrays, set FlattenArrays to -1.
For more information, see http://cdn.cdata.com/help/DGG/jdbc/RSBMongodb_p_FlattenArrays.htm.
Referencing flattened columns:
If you have flattened Objects or Arrays, you can reference these columns using square brackets in your custom SQL queries.
Example of flattened Object:
SELECT [address.city] FROM my_table;
Example of flattened Array:
SELECT * FROM my_table WHERE [hobbies.0]='cricket';
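To make the column names in these examples concrete, the following Python sketch models how a nested document could map to dotted column names such as `address.city` and `hobbies.0`. This is a conceptual model only, not the CData driver's implementation. The function `flatten`, its parameters, and the sample document are hypothetical, and the handling of array elements beyond the configured count (dropped here) and of unflattened values (serialized as JSON strings here) are assumptions.

```python
import json


def flatten(doc, flatten_objects=True, flatten_arrays=0, prefix=""):
    """Map a nested document to dotted column names (conceptual sketch)."""
    columns = {}
    for key, value in doc.items():
        name = prefix + str(key)
        if isinstance(value, dict) and flatten_objects:
            # Nested object: recurse and join names with a dot.
            columns.update(flatten(value, flatten_objects, flatten_arrays, name + "."))
        elif isinstance(value, list) and flatten_arrays != 0:
            # -1 means every element; otherwise only the first N elements.
            limit = len(value) if flatten_arrays == -1 else flatten_arrays
            indexed = {str(i): item for i, item in enumerate(value[:limit])}
            # Treat the array as an object keyed by element index.
            columns.update(flatten(indexed, flatten_objects, flatten_arrays, name + "."))
        elif isinstance(value, (dict, list)):
            # Assumption: unflattened nested values are kept as JSON strings.
            columns[name] = json.dumps(value)
        else:
            columns[name] = value
    return columns


# Hypothetical sample document.
doc = {"name": "Ann", "address": {"city": "Pune"}, "hobbies": ["cricket", "chess"]}
default_columns = flatten(doc)
all_columns = flatten(doc, flatten_arrays=-1)
print(sorted(all_columns))
```

With `flatten_arrays=-1`, the sketch yields the `hobbies.0` column used in the WHERE clause above; with the default of 0, `hobbies` stays a single string column.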
Driver Information
For more information on CData JDBC drivers, see https://cdn.cdata.com/help/DGH/jdbc/.
Using MongoDB
MongoDB is a NoSQL document database that provides high performance, availability, and scalability.
MongoDB Data Organization Hierarchy
MongoDB has a two-level data hierarchy:
+ Schema1
  + Collection1
  + Collection2
+ Schema2
  + Collection3
  + Collection4
Schema roughly corresponds to a database.
Collection roughly corresponds to a table.
A collection is composed of documents. A document is a binary JSON (BSON) representation of the fields and values of a row.
Database Uses
For more information on interacting with databases, see Using Databases.
Read Data
You can import datasets from MongoDB through the Import Data page. See Import Data Page.
Data Type Mappings
Note
The Alteryx data types listed in this section reflect the raw data type of the converted column. Depending on the contents of the column, the application may re-infer a different data type, when a dataset using this type of source is loaded.
Access/Read
When data is imported from MongoDB, the supported data types from the source are converted to corresponding data types supported by the Trifacta Application.
Source Data Type | Supported | Alteryx data type |
---|---|---|
ObjectId | Y | String |
RegEx | Y | String |
String | Y | String |
Binary | Y | String |
Integer | Y | Integer |
Timestamp | Y | Datetime |
Double | Y | Float |
Array | Y | String |
Bool | Y | Bool |
Null | Y | String |
Date | Y | Datetime |
Write/Publish
Not supported.