Google Service Account Management
Dataprep by Trifacta uses service accounts to execute its jobs on Dataflow. Access to Google Cloud Platform resources is governed by service accounts. This section describes the default service accounts available in the product and your options for managing them.
Overview
Definition of a service account
A service account is used by the Trifacta Application to access services and resources in the Google Cloud Platform. In general, a service account is an identity that an application or service uses to make requests to resources or services in the Google Cloud Platform on your behalf. A service account can be assigned to multiple user accounts, enabling a set of users to access all resources that the service account is permitted to access.
A service account is authenticated using a private/public key pair.
A service account cannot be used from within the browser.
A service account is structured as follows:
Service Account
  Role 1
    Permission 1
    Permission 2
  Role 2
    Permission 3
    Permission 4
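The hierarchy above can be modeled as a small data structure: a service account holds roles, and each role is a bundle of permissions. The following is a minimal sketch of that model; the Role and ServiceAccount classes and the example email are hypothetical and are not part of any Google Cloud or Dataprep API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    """A named bundle of permissions (hypothetical model, not a Google API type)."""

    name: str
    permissions: frozenset[str]


@dataclass
class ServiceAccount:
    """An identity that holds one or more roles (hypothetical model)."""

    email: str
    roles: list[Role] = field(default_factory=list)

    def effective_permissions(self) -> set[str]:
        # The account may do anything granted by any one of its roles.
        granted: set[str] = set()
        for role in self.roles:
            granted |= role.permissions
        return granted

    def has_permission(self, permission: str) -> bool:
        return permission in self.effective_permissions()


# Mirror the structure shown above with a hypothetical account.
sa = ServiceAccount(
    email="example-sa@example-project.iam.gserviceaccount.com",
    roles=[
        Role("Role 1", frozenset({"Permission 1", "Permission 2"})),
        Role("Role 2", frozenset({"Permission 3", "Permission 4"})),
    ],
)
print(sorted(sa.effective_permissions()))
# ['Permission 1', 'Permission 2', 'Permission 3', 'Permission 4']
```

The key point the model captures is that permissions are never attached to the account directly; they arrive through its roles.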
Service account types
Dataprep by Trifacta requires the following types of service accounts:
Account usage | Description | Default service account | Other options |
---|---|---|---|
Design time | The design time service account is used by the Trifacta Application to make Google Cloud Platform requests as you are working within the application. | Dataprep Service Agent | Fine-grained access controls. Note: When fine-grained access is enabled, access to platform resources is based on the user's defined identity within the platform. Dataprep by Trifacta has no governance over these permissions. |
Runtime | The runtime service account is used to execute jobs on Dataflow. | Compute Engine service account | Companion Service Accounts. Note: Companion Service Accounts is the implementation of custom service accounts for Dataprep by Trifacta. Other custom service accounts are not supported. |
A service account can be used by one or more users who access the platform. For more information on service accounts, see https://cloud.google.com/compute/docs/access/service-accounts.
For more information on the service accounts used by Dataflow to manage security and permissions while running Dataprep by Trifacta jobs, see https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#security_and_permissions_for_pipelines_on_google_cloud_platform.
Dataflow is a cloud-based data processing service for both batch and real-time data streaming applications. You can use it to set up processing pipelines for integrating, preparing, and analyzing large data sets, such as those found in Web analytics or big data analytics applications. For more information, see https://cloud.google.com/dataflow.
Default service accounts
In the Google Cloud Console, select IAM > Service Accounts. The following service accounts are used by the product:
Service account | Owner | Email address |
---|---|---|
Compute Engine | Google, Inc. | <project-number>-compute@developer.gserviceaccount.com |
Dataprep Service Agent | Alteryx | service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com |
where <project-number> is the numeric project identifier.
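Because both default account names are derived from the project number, you can compute them before looking them up in the console. This is a minimal sketch that only applies the two naming patterns from the table; the function name is hypothetical, and it does not call any Google Cloud API or confirm that the accounts exist.

```python
def default_service_account_emails(project_number: str) -> dict[str, str]:
    """Build the two default service account emails for a project.

    project_number is the numeric project identifier (not the project ID string).
    This only formats names; it does not check that the accounts exist.
    """
    if not project_number.isdigit():
        raise ValueError(f"expected a numeric project number, got {project_number!r}")
    return {
        "Compute Engine": f"{project_number}-compute@developer.gserviceaccount.com",
        "Dataprep Service Agent": (
            f"service-{project_number}@trifacta-gcloud-prod.iam.gserviceaccount.com"
        ),
    }


emails = default_service_account_emails("123456789012")
print(emails["Compute Engine"])  # 123456789012-compute@developer.gserviceaccount.com
```

Rejecting non-numeric input guards against a common mistake: passing the project ID (for example, my-project) where the project number is required.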
When Dataprep by Trifacta is enabled in your project, the Dataprep service agent is automatically created. This service account is used on behalf of all project users for application-based interactions with Google Cloud Platform.
Key areas of permissions include:
BigQuery
Compute Engine
Dataflow
Cloud Storage
Note
The required list of permissions is subject to change with each release. Any changes to the list of required permissions are captured in an updated definition of the role.
For more information on the list of permissions in the Dataprep service agent, see https://console.cloud.google.com/iam-admin/roles/details/roles%3Cdataprep.serviceAgent.
The Compute Engine service account is the default account for all project users to run jobs on Dataflow. A Compute Engine service account enables the Trifacta Application to launch and manage Compute Engine instances where jobs are executed. When the product is enabled for your project, the appropriate Compute Engine service account is assigned at the project level. By default, all users of the project are automatically assigned this account.
This service account has the following name:
<project-number>-compute@developer.gserviceaccount.com
where <project-number> is the identifier of the project that uses the Compute Engine service account.
Tip
For all project users to use the default Compute Engine service account, no additional configuration is required.
Review organization policies
Google Cloud Platform supports a wide range of organization policies, which can determine default behaviors across projects and services. Before you begin, review your organization policies. In particular, review the following organization policy:
Disable Automatic IAM Grants for Default Service Accounts
This policy prevents products and services that are enabled in a project from automatically gaining access to the default service accounts that they need to work within Google Cloud Platform. Instead, these service accounts must be managed using finer-grained controls within the enterprise.
When disabled:
Dataprep by Trifacta can use the two required service agents in any project where the product is enabled.
These service agents are:
Dataprep Service Agent: manages design time interactions between Dataprep by Trifacta and Google Cloud Platform
Compute Engine Service Account: manages runtime product interactions with Dataflow
When enabled:
You must manually assign these service agents or create them from scratch.
Note
For effective use of Dataprep by Trifacta, it is recommended that you disable the above organization policy, at least while you are creating projects for use with the product.
Access models
The following are the primary access models based on service account types.
Access Model | Design time SA | Runtime SA | Notes |
---|---|---|---|
Default | Dataprep Service Agent | Compute Engine SA | When a project is first enabled with Dataprep by Trifacta, these service accounts are automatically provisioned to each user of the project. This default provisioning cannot be overridden at initial startup, but it can be changed afterward. |
Companion Service Accounts | Dataprep Service Agent | Companion Service Account | This access model allows you to manage access to Dataflow resources when running jobs (runtime) using your existing IAM roles and permission structures. See "Companion Service Accounts" below. |
Fine-grained access | Fine-grained access | Companion Service Account | In this access model, access to Google Cloud Platform is governed entirely by your fine-grained permissions scheme for users. Note: When fine-grained access is enabled, access to platform resources is based on the user's defined identity within the platform. Dataprep by Trifacta has no governance over these permissions. |
Using Service Accounts
During normal operations, end users do not have to interact with service accounts.
Design time: The assigned service account performs requests in the background on behalf of the user.
Runtime: The assigned service account is used to make requests to Dataflow and other resources to perform the actions required by the job definition.
As needed, overrides to a user's assigned service account can be applied.
Overrides
The service account that is used for a job is determined based on the following priority level, highest to lowest:
Priority | Description |
---|---|
1 | Job-level overrides: Individual users can override the default service account or their companion service account when executing individual jobs. Scheduled jobs use job-level overrides. For more information, see Runtime Dataflow Execution Settings. |
2 | User preferences: Service accounts defined in a user's preferences apply to that user's jobs. Note: If preferred, project owners can require the use of individual service accounts for each user of the project. Companion service accounts are described below. |
3 | Compute Engine service account: If no other service account is specified, the project's default service account is used. Note: When the product is enabled, the default Compute Engine service account is provisioned for each user of the project. |
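The priority table reduces to a first-match rule: check each level from highest to lowest and use the first one that is set. This is a minimal sketch of that rule under the assumption that each level is either an email string or unset; the function name and example emails are hypothetical and do not reflect how the product stores these settings.

```python
from typing import Optional


def resolve_job_service_account(
    job_override: Optional[str],
    user_preference: Optional[str],
    compute_engine_default: str,
) -> str:
    """Return the service account for a job, checking priorities 1 to 3 in order."""
    if job_override:  # 1. Job-level override set when running the job
        return job_override
    if user_preference:  # 2. Account set in the user's preferences
        return user_preference
    return compute_engine_default  # 3. Project's default Compute Engine account


DEFAULT = "123456789012-compute@developer.gserviceaccount.com"
print(resolve_job_service_account(
    None, "analyst-sa@example-project.iam.gserviceaccount.com", DEFAULT
))  # analyst-sa@example-project.iam.gserviceaccount.com
```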
Users can specify the service account to use for all of their jobs. For example, if a user is invited to multiple projects, that user may be required to submit jobs in all projects using the same service account.
Note
A service account assigned to a user's preferences takes precedence over the project-level service account.
For more information, see User Execution Settings Page.
A user's service account can be assigned by the user or, if companion service accounts are enabled, by the project owner, or both. See "Companion Service Accounts" below.
For individual jobs, a user can select the service account to use. This value overrides user preferences and project owner selections. For more information, see Run Job Page.
Using service accounts in your VPC
If you are running Dataprep by Trifacta in your enterprise Virtual Private Cloud (VPC), service accounts are used differently than described above.
Design time:
To execute sampling or other jobs that are initiated from the Transformer Page, the project's Compute Engine service account or the user's credentials are used.
Runtime:
For any job that is batched and delivered to an external running environment for execution within your VPC, service accounts are used as follows:
If a Companion Service Account is available, it is used.
If a Companion Service Account is not available:
If a user-specified service account is available, it is used.
Otherwise, an auto-provisioned service account for the project is used for batch job execution.
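The runtime fallback order for in-VPC batch jobs can be sketched the same way, returning which rule matched so the choice is easy to audit. This is a minimal sketch assuming each candidate is an email string or unset; the function name, the reason labels, and the example email are hypothetical.

```python
from typing import Optional, Tuple


def resolve_vpc_batch_account(
    companion: Optional[str],
    user_specified: Optional[str],
    auto_provisioned: str,
) -> Tuple[str, str]:
    """Return (service account, reason) for a batch job run inside your VPC."""
    if companion:
        # A Companion Service Account always wins when one is available.
        return companion, "companion service account"
    if user_specified:
        return user_specified, "user-specified service account"
    # Last resort: the project's auto-provisioned account.
    return auto_provisioned, "auto-provisioned project service account"


account, reason = resolve_vpc_batch_account(
    companion=None,
    user_specified=None,
    auto_provisioned="auto-sa@example-project.iam.gserviceaccount.com",
)
print(account, "->", reason)
# auto-sa@example-project.iam.gserviceaccount.com -> auto-provisioned project service account
```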
Note
To run jobs in your VPC using a service account, you must enable the use of Workload Identity in the Google Cloud Platform and enable it for use in Dataprep by Trifacta. See Dataprep In-VPC Execution.
Companion Service Accounts
Note
This feature may not be available in all product editions. For more information on available features, see Compare Editions.
A companion service account replaces the single Compute Engine service account for submitting jobs to Dataflow on behalf of the user. For example, separate companion service accounts can be assigned to different users to give them access to different BigQuery tables. In this manner, a project owner can provide finer-grained access controls to individual users.
Note
A companion service account is applied only to the execution of Dataflow transformation jobs. Other job types, such as ingestion, publishing, Trifacta Photon transformation, or pushdown, use the service agent account or the submitting user's permissions.
This service account must be specified in the Google IAM console and contain all of the permissions required to access a user's data and run jobs on Dataflow. For more information on permissions, see "Service Account Permissions" below.
When companion service accounts are enabled:
Companion service accounts must be specified for individual users of the project, instead of all users relying on the default Compute Engine service account in the project.
Project owners can apply them for each user.
Individual users can apply their own.
The default Compute Engine service account is no longer available for use.
Companion service accounts can be overridden for individual jobs when defining the job to execute.
Previously created scheduled jobs automatically inherit and use the companion service account specified for the user.
Tip
Before enabling the feature, you should create and specify the companion service accounts. Then, when the feature is enabled, there is no service disruption.
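Because the default Compute Engine service account is no longer available once the feature is enabled, a pre-flight check for users without an assignment helps avoid disruption. This is a minimal sketch assuming you have exported the project's users and their companion assignments into plain Python collections; the function names, the user list, and the emails are hypothetical, and the sketch does not read from Dataprep or IAM.

```python
from typing import Dict, Iterable, List


def users_missing_companion(
    project_users: Iterable[str], companion_by_user: Dict[str, str]
) -> List[str]:
    """List users who have no companion service account assigned yet."""
    return sorted(user for user in project_users if not companion_by_user.get(user))


def companion_account_for(user: str, companion_by_user: Dict[str, str]) -> str:
    """Look up a user's companion account once the feature is enabled.

    There is deliberately no fallback to the default Compute Engine service
    account, because it is not available while the feature is enabled.
    """
    account = companion_by_user.get(user)
    if not account:
        raise LookupError(f"no companion service account assigned to {user}")
    return account


users = ["ana@example.com", "ben@example.com", "chen@example.com"]
assignments = {
    "ana@example.com": "ana-sa@example-project.iam.gserviceaccount.com",
    "chen@example.com": "chen-sa@example-project.iam.gserviceaccount.com",
}
print(users_missing_companion(users, assignments))  # ['ben@example.com']
```

An empty result from the check is the signal that the feature can be enabled without leaving any user unable to run jobs.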
Note
Changes to a user's permissions must be reflected in Dataprep by Trifacta and in the related companion service account.
Create companion service accounts
Like any custom service account, companion service accounts must be created in the IAM console and applied to the project.
These accounts must have the required permissions for the user or users that will use them.
For more information on creating IAM roles, see https://console.cloud.google.com/iam-admin/iam.
For more information on creating service accounts, see https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-console.
Requirements
Any companion service account must meet the following requirements:
Service account must be defined in IAM console.
The minimum set of permissions to access Dataprep by Trifacta and any related datastores must be included in the custom service account. See "Service Account Permissions" below.
Permissions in each user's IAM role must be reflected in any custom service account applied to the user's account. Changes in one must be reflected in the other.
Service account must be applied to the project in IAM console.
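The requirement that a user's IAM role permissions be reflected in the custom service account is a set-containment check: nothing in the role may be missing from the account. This is a minimal sketch assuming you have already collected both permission lists (for example, from the IAM console) into Python sets; the function name is hypothetical, and the permission names in the example are illustrative only, not a required list.

```python
from typing import Set


def missing_permissions(
    user_role_permissions: Set[str], service_account_permissions: Set[str]
) -> Set[str]:
    """Return permissions in the user's IAM role that the service account lacks.

    An empty result means the service account reflects the user's role.
    """
    return set(user_role_permissions) - set(service_account_permissions)


user_role = {"storage.objects.get", "bigquery.tables.getData"}
companion = {"storage.objects.get", "dataflow.jobs.create"}
print(missing_permissions(user_role, companion))  # {'bigquery.tables.getData'}
```

Extra permissions on the service account (such as dataflow.jobs.create here) do not fail the check, since the account may need permissions that the user's role does not carry.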
Tip
After custom service accounts are specified in the IAM console and assigned to the project, they can be used in the product. Custom service accounts can be applied at the project level, user level, or job execution level.
Manage companion service accounts
After these service accounts have been created, you can assign a companion service account to each user of the project. For more information, see Service Accounts Page.
Tip
Individual users can also specify the companion service account through their user preferences. Selections in user preferences override any selections made by the project owner. See User Execution Settings Page.
Enable companion service accounts
A project owner must enable the use of companion service accounts.
Note
If the use of companion service accounts is later disabled, all project users revert to using the Compute Engine service account.
For more information, see Dataprep Project Settings Page.
Service Account Permissions
Note
A user may be able to cancel a job from the Trifacta Application even though the user is not permitted to cancel the job in the running environment. This occurs when the service account associated with the user's Alteryx account has the appropriate permissions but the user's personal account does not.
General permissions reference
For more information on the Google Cloud Platform permissions required for Dataprep by Trifacta, please see Required Dataprep User Permissions.
Dataflow permissions
A service account that interacts with Dataflow must have the following permissions.
For the list of minimum required permissions to access Dataflow, see https://cloud.google.com/dataflow/docs/concepts/access-control#roles.
The following sections describe additional permissions that Dataprep by Trifacta requires to interact with Dataflow.
Tip
These permissions must be included as part of any custom service account.
Note
The ability to cancel a job from within the Trifacta Application is temporarily disabled. When it is re-enabled, this permission will be required. You should leave this permission enabled, if possible.
To enable users to cancel Dataflow jobs, the service account must have the following permission:
dataflow.jobs.cancel
To run Dataprep by Trifacta jobs on Dataflow, the actAs permission must be provisioned based on the applicable scenario:
When not using companion service accounts: The user must have the iam.serviceAccounts.actAs permission at the project level or on the default Compute Engine service account.
When using companion service accounts: The user must have the iam.serviceAccounts.actAs permission on the companion service account, or the permission must be granted explicitly to the user.
IAM disabled: If you are not using IAM roles and have enabled companion service accounts, the Dataprep Service account role, which is assigned to the default service account, has the actAs permission for the project.
Project owners require no additional permissions on the projects that they own.
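The actAs scenarios above, together with the project-owner exception, can be expressed as one decision function. This is a minimal sketch, not the product's own check: the Scenario labels, the function name, and the act_as_on input (the project IDs and service account emails on which the user holds iam.serviceAccounts.actAs) are all hypothetical, and an explicit grant to the user is simplified to appear in act_as_on as a grant on the companion account.

```python
from enum import Enum
from typing import Set


class Scenario(Enum):
    """The actAs scenarios described above (labels are illustrative)."""

    NO_COMPANION = "not using companion service accounts"
    COMPANION = "using companion service accounts"
    IAM_DISABLED = "IAM roles not used, companion service accounts enabled"


def has_required_act_as(
    scenario: Scenario,
    act_as_on: Set[str],
    project: str,
    runtime_account: str,
    is_project_owner: bool = False,
) -> bool:
    """Return True if the user meets the iam.serviceAccounts.actAs requirement.

    act_as_on is a hypothetical input: the project IDs and service account
    emails on which the user holds iam.serviceAccounts.actAs.
    """
    if is_project_owner:
        # Project owners need no additional permissions on projects they own.
        return True
    if scenario is Scenario.IAM_DISABLED:
        # The Dataprep Service account role on the default service account
        # already carries actAs for the project.
        return True
    if scenario is Scenario.NO_COMPANION:
        # A project-level grant or a grant on the default Compute Engine
        # service account is enough.
        return project in act_as_on or runtime_account in act_as_on
    # Companion case: the grant must cover the companion service account.
    return runtime_account in act_as_on


compute_sa = "123456789012-compute@developer.gserviceaccount.com"
print(has_required_act_as(
    Scenario.NO_COMPANION, {"example-project"}, "example-project", compute_sa
))  # True
```

Note the asymmetry the sketch makes visible: a project-level grant satisfies the default case but not the companion case, where the grant has to reach the companion account.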
For more information, see https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#security_and_permissions_for_pipelines_on_google_cloud_platform.
Note
Any service account that is used to run jobs must have at least the same permissions that are available through the IAM role to connect to data through the Trifacta Application. For example, to run a job sourced from Cloud Storage datasets, the service account must be able to read the datasets that the user accesses through their IAM role. The same applies to publishing datasets.
For more information on Cloud Storage and BigQuery permissions, see Required Dataprep User Permissions.
Optional permissions
If users are permitted to access data in Cloud Storage or BigQuery that is owned by another project, additional permissions are required.
For more information, see Access Cross-Project Cloud Storage Buckets.
For more information, see Access Cross-Project BigQuery Datasets.
If you have deployed a VPC Service Controls perimeter, additional configuration is required to ensure that Dataprep by Trifacta can operate within the perimeter. For more information, see Configure VPC-SC Perimeter.