Set Up GCP Project and VPC for Private Data
Google Cloud Platform (GCP) private data processing runs an Alteryx Analytics Cloud (AAC) data processing cluster inside your GCP project and VPC. This combination of your infrastructure with Alteryx-managed GCP resources and software is referred to as private data processing.
This page describes how to set up your GCP project and VPC for private data processing on AAC.
Note
The GCP project and VPC setup require access and permissions to the GCP console. If you don’t have this access, please contact your IT team to complete this step.
Caution
Never delete resources provisioned for Private Data Processing.
Setup Steps
Important
To continue with these steps, you must have the GCP Owner RBAC role assigned to you.
Step 1: Select the GCP Project
Select the project where you’d like to run your private data processing.
To improve performance and reduce egress costs, your Google Cloud Storage and your private data handling GKE cluster should be in the same region that you selected for private data storage. The same applies to any data sources that you want to connect to AAC.
The VPC created in the GCP project should be dedicated to AAC. You can set up connectivity to private data sources using VPC Network Peering, Cloud VPN, Private Service Connect, or a similar mechanism.
Important
You should only set up 1 private data handling instance per GCP project.
Step 2: Enable Google APIs
To create cloud resources for Private Data Handling, you must enable APIs in the project.
From the GCP console, select APIs & Services.
Select ENABLE APIS AND SERVICES.
Enable these APIs:
Cloud Logging API
Cloud Monitoring API
Compute Engine API
Secret Manager API
Service Networking API
Cloud Asset API
Kubernetes Engine API
Google Cloud Memorystore for Redis API
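If you prefer the command line to the console, the same APIs can be enabled with a single gcloud call. The following is a minimal Python sketch that only builds and prints that command. The service names are the standard googleapis.com identifiers for the APIs listed above. The function name and the sample project ID are illustrative and not part of the AAC documentation.

```python
import shlex

# Service identifiers for the APIs listed above.
REQUIRED_APIS = {
    "Cloud Logging API": "logging.googleapis.com",
    "Cloud Monitoring API": "monitoring.googleapis.com",
    "Compute Engine API": "compute.googleapis.com",
    "Secret Manager API": "secretmanager.googleapis.com",
    "Service Networking API": "servicenetworking.googleapis.com",
    "Cloud Asset API": "cloudasset.googleapis.com",
    "Kubernetes Engine API": "container.googleapis.com",
    "Google Cloud Memorystore for Redis API": "redis.googleapis.com",
}


def enable_apis_command(project_id: str) -> list[str]:
    """Build (but do not run) a gcloud command that enables every required API."""
    return [
        "gcloud", "services", "enable",
        *REQUIRED_APIS.values(),
        f"--project={project_id}",
    ]


if __name__ == "__main__":
    # "my-aac-project" is a placeholder project ID.
    print(shlex.join(enable_apis_command("my-aac-project")))
```

Review the printed command before you run it in a shell that is authenticated against the target project.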
Step 3: Configure IAM
With your GCP project in place, set up the service account and its key.
Step 3a: Create a Service Account
Create a service account with the name aac-automation-sa.
Generate keys with the key type as JSON.
Store the JSON Blob file.
Note
You'll need the service key JSON Blob file to provision the cloud resources in a later step.
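As a rough command-line equivalent of Step 3a, the sketch below builds the two gcloud commands that create the service account and a JSON key for it. It assumes the usual service account email format (name@PROJECT_ID.iam.gserviceaccount.com), which does not hold for domain-scoped projects. The key file name and display name are placeholders chosen for this example.

```python
import shlex

SA_NAME = "aac-automation-sa"  # name required by the setup steps


def service_account_email(project_id: str) -> str:
    """Assumed email format for a service account in a standard project."""
    return f"{SA_NAME}@{project_id}.iam.gserviceaccount.com"


def create_service_account_commands(
    project_id: str, key_path: str = "aac-automation-sa-key.json"
) -> list[list[str]]:
    """Build (but do not run) the commands that create the account and its key."""
    return [
        ["gcloud", "iam", "service-accounts", "create", SA_NAME,
         "--display-name=AAC automation",  # placeholder display name
         f"--project={project_id}"],
        # gcloud writes a JSON key by default.
        ["gcloud", "iam", "service-accounts", "keys", "create", key_path,
         f"--iam-account={service_account_email(project_id)}"],
    ]


if __name__ == "__main__":
    for command in create_service_account_commands("my-aac-project"):
        print(shlex.join(command))
```

The key file contains a private key, so store it somewhere access-controlled rather than in source control.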
Step 3b: IAM Binding to the Service Account
Assign these roles to the aac-automation-sa service account:
Secret Manager Admin: roles/secretmanager.admin
Service Account Admin: roles/iam.serviceAccountAdmin
Service Account User: roles/iam.serviceAccountUser
Project IAM Admin: roles/resourcemanager.projectIamAdmin
Service Account Key Admin: roles/iam.serviceAccountKeyAdmin
Compute Network Viewer: roles/compute.networkViewer
Cloud KMS Viewer: roles/cloudkms.viewer
Important
GCP doesn't allow a wildcard (*) in the policy document. GCP also limits the number of individual permissions assigned to a custom role. Therefore, you must assign the service account a set of GCP-managed predefined roles.
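The role assignments in Step 3b can be scripted as one project-level IAM binding per role. This sketch builds those gcloud commands without running them. The role IDs come from the list above. The function name and the sample email are illustrative.

```python
import shlex

# Predefined roles listed in Step 3b.
ROLES = [
    "roles/secretmanager.admin",
    "roles/iam.serviceAccountAdmin",
    "roles/iam.serviceAccountUser",
    "roles/resourcemanager.projectIamAdmin",
    "roles/iam.serviceAccountKeyAdmin",
    "roles/compute.networkViewer",
    "roles/cloudkms.viewer",
]


def binding_commands(project_id: str, sa_email: str) -> list[list[str]]:
    """Build one add-iam-policy-binding command per required role."""
    member = f"serviceAccount:{sa_email}"
    return [
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member={member}", f"--role={role}"]
        for role in ROLES
    ]


if __name__ == "__main__":
    email = "aac-automation-sa@my-aac-project.iam.gserviceaccount.com"  # placeholder
    for command in binding_commands("my-aac-project", email):
        print(shlex.join(command))
```

Each role is bound separately because a binding names exactly one role. There is no wildcard form that covers several roles at once.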
Step 4: Configure Virtual Private Cloud
Step 4a: Create a VPC Network
Create a virtual network.
Select Subnet creation mode = Custom.
Disable or delete the default firewall rules.
Select Dynamic routing mode = Global.
The VPC requires 1 subnet. Configure the subnet as shown in this table:
Subnet Name | Subnet Size | Secondary Subnet Name | Secondary Subnet Size |
---|---|---|---|
aac-private | 10.10.10.0/24 | N/A | N/A |
Important
The subnet IP addresses and sizes in the table are shown as an example.
Modify the values as needed to fit your network architecture. The subnet region must be the region where private data handling is to be provisioned.
The subnet name must match the name shown in the table.
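The VPC and subnet from Step 4a can also be created from the command line. The sketch below validates the subnet range with Python's ipaddress module and then builds the two gcloud commands. The VPC name aac-vpc is an assumption for this example, and the region and project ID are placeholders. Only the subnet name aac-private is taken from the table.

```python
import ipaddress
import shlex

SUBNET_NAME = "aac-private"  # must match the name in the table


def vpc_commands(
    project_id: str,
    region: str,
    vpc_name: str = "aac-vpc",           # assumed VPC name
    subnet_range: str = "10.10.10.0/24",  # example range from the table
) -> list[list[str]]:
    """Build (but do not run) the commands that create the VPC and its subnet."""
    # Raises ValueError if the range is not a valid network address.
    ipaddress.ip_network(subnet_range)
    return [
        ["gcloud", "compute", "networks", "create", vpc_name,
         "--subnet-mode=custom", "--bgp-routing-mode=global",
         f"--project={project_id}"],
        ["gcloud", "compute", "networks", "subnets", "create", SUBNET_NAME,
         f"--network={vpc_name}", f"--range={subnet_range}",
         f"--region={region}", f"--project={project_id}"],
    ]


if __name__ == "__main__":
    for command in vpc_commands("my-aac-project", "us-central1"):
        print(shlex.join(command))
```

The flags --subnet-mode=custom and --bgp-routing-mode=global correspond to the Subnet creation mode and Dynamic routing mode settings in the console.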
Step 4b: Subnet Route Table
Important
You must configure the VPC with a network connection to the internet in your project.
Note
The <gateway_ID> could be either a NAT gateway or an internet gateway, depending on your network architecture.
This is an example subnet route table:
Address Prefix | Next Hop |
---|---|
/24 CIDR Block (aac-private) | aac-vpc |
0.0.0.0/0 | <gateway_ID> |
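To see how the two routes in the example table interact, the sketch below applies longest-prefix matching, the usual rule for choosing among overlapping routes. It uses the example 10.10.10.0/24 range for aac-private and keeps <gateway_ID> as an unresolved placeholder. The function name is illustrative.

```python
import ipaddress

# Example route table from above, with the sample aac-private range.
ROUTES = [
    ("10.10.10.0/24", "aac-vpc"),
    ("0.0.0.0/0", "<gateway_ID>"),
]


def next_hop(destination: str, routes=ROUTES) -> str:
    """Return the next hop of the most specific route that contains destination."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if address in ipaddress.ip_network(prefix)
    ]
    if not matches:
        raise LookupError(f"no route to {destination}")
    # The longest prefix (largest prefixlen) is the most specific match.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


if __name__ == "__main__":
    print(next_hop("10.10.10.25"))  # stays inside the VPC
    print(next_hop("8.8.8.8"))      # leaves through the gateway
```

Traffic to addresses inside the subnet stays in the VPC, and everything else falls through to the default route and leaves through the gateway.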
Step 5: Trigger Private Data Handling Provisioning
You trigger private data processing provisioning from the Admin Console inside AAC. You need Workspace Admin privileges in a workspace to see it.
From the AAC landing page, select the Profile menu and then select Workspace Admin.
From the left navigation panel, select Private Data Handling and then select Processing.
Caution
If you modify or remove any of the AAC-provisioned public cloud resources after private data handling is provisioned, the setup enters an inconsistent state. This inconsistency triggers errors during job execution or during deprovisioning of the private data handling setup.
Make sure that Private Data Storage shows Successfully Configured before you proceed. If the status is Not Configured, go to GCS as Private Data Storage first, then return to this step.
Under the Processing section, enter the required Environment Details from the GCP Project and VPC setup steps you just completed:
Enter the GCP Project ID.
Select the Region of the GCP project you want to use for private data processing.
Enter the VPC Name.
Enter the GKE Control Plane Address Range.
Copy and paste the contents of the service account key JSON Blob file you created in Step 3a.
Select Create.
Selecting Create triggers the deployment of the cluster and resources in the GCP project. The deployment first runs a set of validation checks to verify that the GCP project is configured correctly. If permissions are configured incorrectly, or the VPC resources were created or tagged incorrectly, you receive an error message that describes how to fix the error.
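Some input mistakes can be caught locally before you select Create. The sketch below is not the validation that AAC runs. It is a hypothetical pre-check that assumes the standard layout of a service account key file (type, project_id, private_key, client_email) and assumes that the GKE control plane address range should not overlap the aac-private subnet, as GKE generally requires.

```python
import ipaddress
import json

# Fields present in a standard service account key file.
REQUIRED_KEY_FIELDS = {"type", "project_id", "private_key", "client_email"}


def precheck(project_id: str, subnet_range: str,
             control_plane_range: str, key_json: str) -> list[str]:
    """Return a list of problems found in the Environment Details inputs."""
    try:
        key = json.loads(key_json)
    except json.JSONDecodeError as exc:
        return [f"key is not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["key JSON must be an object"]

    problems = []
    missing = REQUIRED_KEY_FIELDS - key.keys()
    if missing:
        problems.append(f"key is missing fields: {sorted(missing)}")
    if key.get("type") != "service_account":
        problems.append("key type is not 'service_account'")
    if key.get("project_id") != project_id:
        problems.append("key belongs to a different project")

    try:
        subnet = ipaddress.ip_network(subnet_range)
        control_plane = ipaddress.ip_network(control_plane_range)
    except ValueError as exc:
        problems.append(f"invalid CIDR range: {exc}")
    else:
        if subnet.overlaps(control_plane):
            problems.append("control plane range overlaps the subnet")
    return problems
```

An empty list means that none of these local checks failed. It does not guarantee that the AAC validation checks will pass.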
Once the initial validation checks complete, provisioning commences. A message box on the screen periodically refreshes with status updates.
Note
The provisioning process takes approximately 35–40 minutes to complete.
After the provisioning completes, you can view the created resources (for example, VM instances and node pools) through the GCP console. Don't modify these resources manually. Manual changes might cause private data processing to malfunction.