Access Cross-Project Cloud Storage Buckets
By default, Dataprep by Trifacta can access data within the Google Cloud Platform project from which the product is run. To enable access for your project to a Cloud Storage bucket owned by a different project, you must make the bucket accessible to the service accounts in your Dataprep by Trifacta project. Then, you must enter that storage location in the Trifacta Application.
Note
If you grant Dataprep by Trifacta access to a bucket in another project, disabling Dataprep by Trifacta does not remove these permissions. The permissions must be manually removed to fully revoke product access to buckets in other projects. For more information, see https://cloud.google.com/dataprep/docs/concepts/gcs-buckets#removing_service_account_access_to_a_bucket.
To visit your current project on the Google Cloud console, see https://console.cloud.google.com/dataprep/.
Project Service Accounts
In the Google Cloud console, select IAM & Admin > Service Accounts. The product uses the following service accounts:
Service Account | Owner | Email Address |
---|---|---|
Compute Engine | Google, Inc. | <project-number>-compute@developer.gserviceaccount.com |
Dataprep Service Agent | Alteryx | service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com |
where:
<project-number> is the numeric project identifier.
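As a minimal sketch, the two addresses in the table can be derived from the project number with a small Bash function. The name `dataprep_service_accounts` is hypothetical; the function only builds the strings shown in the table and does not check that the accounts exist. If you know only the project ID, `gcloud projects describe <project-id> --format="value(projectNumber)"` prints the numeric project number.

```bash
# Hypothetical helper: print the Compute Engine and Dataprep Service Agent
# email addresses for a numeric project number, one per line.
dataprep_service_accounts() {
  local project_number="$1"
  # Project numbers are purely numeric; reject project IDs and other text.
  if [[ ! "$project_number" =~ ^[0-9]+$ ]]; then
    echo "error: project number must be numeric: '$project_number'" >&2
    return 1
  fi
  printf '%s\n' \
    "${project_number}-compute@developer.gserviceaccount.com" \
    "service-${project_number}@trifacta-gcloud-prod.iam.gserviceaccount.com"
}
```

For example, `dataprep_service_accounts 123456789012` prints the Compute Engine address on the first line and the Dataprep Service Agent address on the second.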
Methods for Granting Access
You can provide access to Cloud Storage buckets in other projects through one of the following methods:
Note
When using a named service account to access data or run jobs in other projects, each user requesting access must be granted the roles/iam.serviceAccountUser role on the service account.
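For example, a project administrator might grant this role with `gcloud iam service-accounts add-iam-policy-binding`. The sketch below, with the hypothetical function name `grant_sa_user_cmds`, only prints one command per user so that the commands can be reviewed before they are run; pipe its output to `bash` to execute them.

```bash
# Hypothetical helper: print one gcloud command per user that grants the
# Service Account User role on the given named service account.
# Usage: grant_sa_user_cmds <service-account-email> <user-email>...
grant_sa_user_cmds() {
  local sa_email="$1"
  shift
  local user
  for user in "$@"; do
    printf 'gcloud iam service-accounts add-iam-policy-binding %s --member=user:%s --role=roles/iam.serviceAccountUser\n' \
      "$sa_email" "$user"
  done
}
```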
Note
OAuth users of the product also require the roles and permissions described below.
Grant access through IAM role
Grant the IAM role that is used to access the Cloud Storage datasets to the following service accounts:
Dataprep Service Agent (owned by Alteryx) for the Dataprep by Trifacta project. This service account is required for reading the data.
Compute Engine service account for the Dataprep by Trifacta project. This service account is required for running your Dataprep by Trifacta jobs on Dataflow using the Cloud Storage datasets.
This method of access enables all users of the Dataprep by Trifacta project to access all datasets governed by the IAM role. For more information, see https://console.cloud.google.com/iam-admin/roles.
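A minimal sketch of this method, assuming the role is granted on the bucket itself with `gcloud storage buckets add-iam-policy-binding`. This page does not name a specific role; `roles/storage.objectAdmin` is used as an assumed default because Dataflow jobs may also write to the bucket, and a read-only role such as `roles/storage.objectViewer` can be passed instead. The function name `iam_binding_cmds` is hypothetical, and it prints the commands rather than running them; run them as a user who can change the IAM policy of the bucket.

```bash
# Hypothetical helper: print one IAM binding command per service account.
# Assumptions: the binding is made at the bucket level, and the default role
# (roles/storage.objectAdmin) is only a placeholder; pass a third argument
# to use a different role.
iam_binding_cmds() {
  local project_number="$1" bucket="$2" role="${3:-roles/storage.objectAdmin}"
  local sa
  for sa in \
    "${project_number}-compute@developer.gserviceaccount.com" \
    "service-${project_number}@trifacta-gcloud-prod.iam.gserviceaccount.com"; do
    printf 'gcloud storage buckets add-iam-policy-binding gs://%s --member=serviceAccount:%s --role=%s\n' \
      "$bucket" "$sa" "$role"
  done
}
```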
Grant service account access to a bucket
Use Google Cloud SDK gsutil commands to grant your project's service accounts ownership (read/write permission) of both the bucket and its contents. For more information on gsutil, see https://cloud.google.com/storage/docs/gsutil_install#sdk-install.
To grant your project's service accounts access to both current and new objects in a Cloud Storage bucket in another project, run both sets of the following commands.
To grant your project's service accounts access to new objects created in a Cloud Storage bucket in another project, use the following gsutil defacl commands in your shell or terminal window:
$ gsutil defacl ch -u \
    <project-number>-compute@developer.gserviceaccount.com:OWNER \
    gs://<bucket-name>
$ gsutil defacl ch -u \
    service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com:OWNER \
    gs://<bucket-name>
where:
<project-number> is the numeric identifier for your project.
<bucket-name> is the name of the bucket to which you wish to grant access.
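To confirm that the default object ACL now includes both service accounts, you can inspect the output of `gsutil defacl get gs://<bucket-name>`. The hypothetical helper `check_acl_members` below reads that output on standard input and reports each address as present or missing. It is a plain text search, so it does not verify which role each entry holds.

```bash
# Hypothetical helper: read ACL text (for example, the JSON printed by
# `gsutil defacl get gs://<bucket-name>`) from stdin and report whether each
# given email address appears in it. Returns 1 if any address is missing.
check_acl_members() {
  local acl sa missing=0
  acl="$(cat)"
  for sa in "$@"; do
    # -F: match the address literally; -q: only the exit status matters.
    if grep -qF -- "$sa" <<<"$acl"; then
      echo "present: $sa"
    else
      echo "missing: $sa"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `gsutil defacl get gs://<bucket-name> | check_acl_members <project-number>-compute@developer.gserviceaccount.com service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com`.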
To grant your project's service accounts access to a Cloud Storage bucket and its current contents in another project, use the following gsutil acl commands in your shell or terminal window:
$ gsutil acl ch -u \
    <project-number>-compute@developer.gserviceaccount.com:OWNER \
    gs://<bucket-name>
$ gsutil -m acl ch -r -u \
    <project-number>-compute@developer.gserviceaccount.com:OWNER \
    gs://<bucket-name>
$ gsutil acl ch -u \
    service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com:OWNER \
    gs://<bucket-name>
$ gsutil -m acl ch -r -u \
    service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com:OWNER \
    gs://<bucket-name>
where:
<project-number> is the numeric identifier for your project.
<bucket-name> is the name of the bucket to which you wish to grant access.
Tip
The -m option runs the command in parallel for quicker processing. The -r option applies the command recursively to the objects within the bucket.
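Because both sets of commands must be run to cover current and new objects, they can be wrapped in a single function. The sketch below is hypothetical (`grant_bucket_access` and the `DRY_RUN` variable are not part of gsutil) and assumes the bucket uses fine-grained access control, since ACL commands are rejected on buckets that have uniform bucket-level access enabled.

```bash
# Hypothetical wrapper: for each service account, grant OWNER on new objects
# (defacl), on the bucket (acl), and recursively on existing objects (acl -r).
# Stops at the first failing command. Set DRY_RUN=1 to print the gsutil
# commands instead of running them.
grant_bucket_access() {
  local project_number="$1" bucket="$2" sa
  local run=()
  [[ "${DRY_RUN:-0}" == 1 ]] && run=(echo)
  for sa in \
    "${project_number}-compute@developer.gserviceaccount.com" \
    "service-${project_number}@trifacta-gcloud-prod.iam.gserviceaccount.com"; do
    "${run[@]}" gsutil defacl ch -u "${sa}:OWNER" "gs://${bucket}" || return
    "${run[@]}" gsutil acl ch -u "${sa}:OWNER" "gs://${bucket}" || return
    "${run[@]}" gsutil -m acl ch -r -u "${sa}:OWNER" "gs://${bucket}" || return
  done
}
```

For example, `DRY_RUN=1 grant_bucket_access 123456789012 my-bucket` prints the six commands; running it without `DRY_RUN=1` executes them.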
Use bucket in Dataprep by Trifacta application
For import
Steps:
Log in to the Trifacta Application.
In the left nav bar, click the Library for Data icon.
Click Import Data.
Click the GCS icon in the left nav bar.
Under Choose a file or folder, click the Pencil icon.
Enter the URL of the bucket:
gs://<bucket>
Navigate to select the datasets to import.
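Before entering the URL, you can check that it has the expected `gs://<bucket>` form. The helper `is_gcs_url` below is a hypothetical, simplified check of Cloud Storage bucket naming rules. Running `gsutil ls gs://<bucket>` afterward confirms that the bucket is reachable, but it uses your own credentials rather than the project's service accounts.

```bash
# Hypothetical helper: return 0 if the argument looks like gs://<bucket> or
# gs://<bucket>/<path>. Simplified bucket-name rules: 3-63 characters of
# lowercase letters, digits, '-', '_' or '.', starting and ending with a
# letter or digit. Dotted names longer than 63 characters are not handled.
is_gcs_url() {
  local re='^gs://[[:lower:][:digit:]][[:lower:][:digit:]._-]{1,61}[[:lower:][:digit:]](/.*)?$'
  [[ "$1" =~ $re ]]
}
```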
For publishing
Steps:
In Flow View, create or select the output object that you wish to use for publishing to the GCS bucket.
In the right context panel, click Edit for either manual or scheduled destinations.
Add or edit a publishing action.
Under Choose a file or folder, click the Pencil icon.
Enter the URL of the bucket:
gs://<bucket>
Navigate to specify the location in the bucket where you wish to publish the output.
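To check that an output location is writable before a job publishes to it, you can upload and then delete a small probe object. In this hypothetical sketch, the optional second argument is passed to gsutil's top-level `-i` option to impersonate a service account, which requires that your account be allowed to create tokens for it (for example, through the Service Account Token Creator role); without it, the check uses your own credentials. The function name, the probe object name, and `DRY_RUN` are assumptions.

```bash
# Hypothetical helper: upload a tiny probe object to a gs:// location, then
# delete it. Optional second argument: a service account to impersonate via
# gsutil's -i option. Set DRY_RUN=1 to print the commands instead.
probe_write_access() {
  local target="${1%/}" impersonate="${2:-}"
  local probe="${target}/dataprep-write-probe-$$.txt"
  local gs=(gsutil)
  [[ -n "$impersonate" ]] && gs+=(-i "$impersonate")
  [[ "${DRY_RUN:-0}" == 1 ]] && gs=(echo "${gs[@]}")
  # "cp -" uploads from stdin; the here-string supplies the probe content.
  "${gs[@]}" cp - "$probe" <<<'probe' || return
  "${gs[@]}" rm "$probe"
}
```

For example: `probe_write_access gs://<bucket>/output <project-number>-compute@developer.gserviceaccount.com`.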
Remove service account access to a bucket
If you have granted service account access to a bucket, you can run the following Google Cloud SDK gsutil defacl and acl commands to remove your project's service accounts' ownership (read/write permission) of the bucket and its contents.
$ gsutil defacl ch -d \
    <project-number>-compute@developer.gserviceaccount.com \
    gs://<bucket-name>
$ gsutil defacl ch -d \
    service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com \
    gs://<bucket-name>
$ gsutil acl ch -d \
    <project-number>-compute@developer.gserviceaccount.com \
    gs://<bucket-name>
$ gsutil -m acl ch -r -d \
    <project-number>-compute@developer.gserviceaccount.com \
    gs://<bucket-name>
$ gsutil acl ch -d \
    service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com \
    gs://<bucket-name>
$ gsutil -m acl ch -r -d \
    service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com \
    gs://<bucket-name>
where:
<project-number> is the numeric identifier for your project.
<bucket-name> is the name of the bucket from which you wish to remove access.
Tip
The -m option runs the command in parallel for quicker processing. The -r option applies the command recursively to the objects within the bucket.
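The removal commands can likewise be wrapped in one function. In this hypothetical sketch (`revoke_bucket_access` and `DRY_RUN` are assumptions, not gsutil features), every step is attempted even if an earlier one returns an error, and the function returns a non-zero status if any step failed.

```bash
# Hypothetical wrapper: remove both service accounts from the default object
# ACL, the bucket ACL, and the ACLs of existing objects. All steps run even
# if one fails; returns 1 if any step failed. Set DRY_RUN=1 to print the
# gsutil commands instead of running them.
revoke_bucket_access() {
  local project_number="$1" bucket="$2" sa status=0
  local run=()
  [[ "${DRY_RUN:-0}" == 1 ]] && run=(echo)
  for sa in \
    "${project_number}-compute@developer.gserviceaccount.com" \
    "service-${project_number}@trifacta-gcloud-prod.iam.gserviceaccount.com"; do
    "${run[@]}" gsutil defacl ch -d "$sa" "gs://${bucket}" || status=1
    "${run[@]}" gsutil acl ch -d "$sa" "gs://${bucket}" || status=1
    "${run[@]}" gsutil -m acl ch -r -d "$sa" "gs://${bucket}" || status=1
  done
  return "$status"
}
```

For example, `DRY_RUN=1 revoke_bucket_access 123456789012 my-bucket` prints the six removal commands without running them.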