Private Data Handling
Private data handling is a capability in Alteryx Analytics Cloud (AAC) that allows you to store your data and run data processing jobs in your own cloud infrastructure. Private data handling provides greater security and control for those with sensitive data. It can also improve performance and reduce egress costs, because AAC processing runs next to your data.
Warning
Never delete resources provisioned for private data processing.
Overview
At the highest level, AAC differentiates between customer data and application metadata.
Customer data belongs to you. This is the data that you want to join and merge, prep and blend, analyze, train models on, and report on.
Application metadata is the data that AAC needs to do the jobs you ask it to do.
AAC uses a split-plane architecture and has divided responsibility for these 2 kinds of data into different planes to provide more flexibility to customers. These 2 planes are the control plane and the data plane.
Plane | Description |
---|---|
Control Plane | The control plane powers the user's design time experience, acts as the command and control center, and stores application metadata. |
Data Plane | The data plane is responsible for these aspects of processing and storage of customer data:
At design-time, samples of customer data leave the data plane and display to the user in the browser. Apart from this, no customer data ever leaves the data plane. Note Third-party datastores and execution engines such as data warehouses are also part of the data plane. |
Use private data handling to run a data plane inside your own cloud infrastructure, giving you control over where you store and process your data. It consists of 2 components:
Private data storage: Use AAC to leverage your existing cloud infrastructure for the at-rest storage of platform metadata and other assets.
Private data processing: Use AAC to run your own data processing resources for the execution of data processing activities including connecting to data sources, processing data, converting data from one format to another, and publishing job outputs. This arrangement ensures that no data leaves your cloud infrastructure.
Private data handling includes defense-in-depth security controls to protect your data assets and meet compliance requirements. If desired, you can further increase security by adding firewall or IP restrictions that allow ingress only from the AAC control plane.
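As a rough illustration of the ingress restriction described above, the following sketch checks whether a source address falls inside an allowlist of CIDR blocks. The CIDR ranges and the `is_allowed_source` helper are hypothetical placeholders (the ranges are reserved documentation addresses, not real AAC control plane addresses). In practice you would enforce this rule in your cloud provider's firewall or security group settings rather than in application code.

```python
import ipaddress

# Hypothetical allowlist using reserved documentation ranges (RFC 5737).
# Substitute the control plane ranges that apply to your own deployment.
CONTROL_PLANE_CIDRS = ["203.0.113.0/24", "198.51.100.16/28"]


def is_allowed_source(source_ip: str, allowed_cidrs=CONTROL_PLANE_CIDRS) -> bool:
    """Return True if source_ip falls inside any allowed CIDR block."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


if __name__ == "__main__":
    for candidate in ("203.0.113.7", "192.0.2.1"):
        verdict = "allow" if is_allowed_source(candidate) else "deny"
        print(f"{candidate}: {verdict}")
```

The same allow-only-listed-sources logic is what a security group or firewall rule expresses declaratively.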
Feature availability:
Feature | Availability |
---|---|
Private Data Storage | |
Private Data Processing | |
Architecture
When you configure private data processing for your workspace, the AAC control plane initiates interactions with your workspace storage (private data storage), a Kubernetes cluster managed by Alteryx, and a Spark processor (where available). All of these run inside your VPC. Some of these interactions are for command and control. Others are data paths to move your data from one place to another as directed.
Customer data is never stored or cached within the control plane, though at design-time sample data might be inspected and formatted there (for delimiter and header detection, column name and type inference, and the transform by example capability). If you choose to download generated PDF reports, this also happens in the control plane.
Data Security
Alteryx offers a downloadable whitepaper that covers private data handling privacy and security in depth. You can find a link to this document at alteryx.com/trust in the Private Data Handling section.
For convenience, these are a few highlights for encryption of data in transit and at rest:
Data is TLS 1.3 encrypted when in transit between browser <=> control plane and control plane <=> data plane.
Alteryx uses mTLS encryption for intra-cluster communications.
File storage and database credentials are stored in a control plane database that is encrypted with 256-bit AES block ciphers.
Alteryx applies envelope encryption to these credentials before they pass from the control plane to the data plane. In the data plane, these credentials become available to job pods as Kubernetes secrets.
The cloud provider's secret manager stores the private key used to decrypt the encrypted credentials. The secret manager resides in the data plane and is mounted into the AYX cluster using the external secrets operator.
Workloads access secrets in the secret manager through a Kubernetes ServiceAccount.
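To make the TLS 1.3 point above concrete, here is a minimal sketch of how a client in your own tooling could refuse any connection older than TLS 1.3, using Python's standard `ssl` module. The `make_tls13_client_context` helper is a hypothetical name, and this is not a description of how AAC itself configures its connections.

```python
import ssl


def make_tls13_client_context() -> ssl.SSLContext:
    """Build a client context that refuses anything older than TLS 1.3."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


context = make_tls13_client_context()
print(context.minimum_version.name)
```

Pass the resulting context to a client such as `http.client.HTTPSConnection(host, context=context)`, and the handshake fails if the server cannot negotiate TLS 1.3.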
Upgrades
Not having to worry about upgrades is a benefit of software-as-a-service. AAC manages upgrades for you.
AAC manages software upgrades for long-running services. When new versions of the software are available, Alteryx pushes new container images to its image repositories. AAC retrieves these new image versions and begins using them within the cluster without disrupting any running jobs.
Alteryx also manages infrastructure upgrades on your behalf.
Metrics Collection
AAC uses Datadog to collect application monitoring usage data to monitor and maintain operational stability. The Datadog agent collects these metrics:
Telemetry metrics from the Kubernetes cluster, storage bucket, Spark processor (when enabled), and compute nodes.
Custom logs from the services in the processing cluster.
Cloud provider logs (for example, AWS CloudWatch and Azure Monitor) for the public cloud-managed services used.
Configure Private Data Handling
Private data handling consists of 2 capabilities: private data storage and private data processing. You’ll need to configure private data storage first, then private data processing afterward.
Private Data Storage
Alteryx Data Storage (ADS) is available in all workspaces by default for the storage of uploaded files, job outputs, sample data, and some temporary processing files.
Private data storage replaces ADS with your own file store. Any data saved in ADS becomes inaccessible after you set up private data storage, so if possible, set it up before users upload any assets.
Private data storage supports AWS S3, Azure ADLS, and Google Cloud Storage (GCS) as storage providers.
Once you've configured private data storage in the cloud provider of your choice, you can proceed to configure private data processing in that same cloud provider.
For more information on how to set up private data storage with your cloud storage provider, go to Private Data Storage.
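The ordering and same-provider rules above can be expressed as a small pre-flight check. This is only a sketch: the `HandlingPlan` class, the `validate_plan` function, and the provider labels (`s3`, `adls`, `gcs`, `aws`, `azure`, `gcp`) are hypothetical names chosen for illustration and do not correspond to any AAC API or setting.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative labels only: storage provider -> cloud that hosts it.
STORAGE_TO_CLOUD = {"s3": "aws", "adls": "azure", "gcs": "gcp"}


@dataclass
class HandlingPlan:
    storage_provider: Optional[str] = None  # None means ADS is still in use
    processing_cloud: Optional[str] = None  # None means processing is not requested


def validate_plan(plan: HandlingPlan) -> List[str]:
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    if plan.storage_provider is not None and plan.storage_provider not in STORAGE_TO_CLOUD:
        problems.append(f"unsupported storage provider: {plan.storage_provider}")
    if plan.processing_cloud is not None:
        if plan.storage_provider is None:
            problems.append("configure private data storage before private data processing")
        elif STORAGE_TO_CLOUD.get(plan.storage_provider) != plan.processing_cloud:
            problems.append("private data processing must use the same cloud provider as storage")
    return problems


if __name__ == "__main__":
    print(validate_plan(HandlingPlan("s3", "aws")))   # consistent plan
    print(validate_plan(HandlingPlan(None, "aws")))   # storage missing
    print(validate_plan(HandlingPlan("gcs", "aws")))  # provider mismatch
```

A check like this can run in your own provisioning scripts before you begin the setup steps, so that a mismatch surfaces early.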
Private Data Processing
Use private data processing to run Alteryx data processing within your own VPC. To configure this capability, you must first complete the setup steps to prepare your VPC to run Alteryx data processing. Each AAC product has its own steps. You can run multiple products in the same VPC, but you’ll need to complete the setup instructions for each product.
After completing the setup steps, you’ll sign in to AAC and turn on private data processing for any solutions you want to use in your workspace.
After you enable private data processing, there might be additional setup steps needed depending on your solution. For example, after you’ve turned on private data processing for Designer Cloud, you’ll need to update your private data storage permissions to allow the data processing cluster to access your data store.
Alteryx recommends using a dedicated account and VPC for the best security and stability, although other configurations are possible.
In this step, select a region and an account. Then create a VPC, subnets, route tables, and the permissions that allow AAC to create and manage the data processing infrastructure and software. You'll also grant AAC permission to spin up the cluster, kick off the provisioning process, and update a trust relationship between your new processing cluster and your private data storage bucket.
For more information on private data processing, including the shared responsibility model, required cloud resources for different apps, regional availability, and more, go to Private Data Processing.
Follow these guides to set up private data processing based on your cloud provider...
Known Limitations
These are some known limitations to private data processing...
Some applications such as App Builder and Location Intelligence are not yet compatible and show as disabled in a workspace where you've enabled private data processing.
Using SSH Tunneling with connectors is not yet supported in a workspace with private data processing.
You can only attach 1 workspace to 1 data processing environment. If you provide your own data processing environment, only the AAC applications you run in that data processing environment will work.
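If you track workspaces and data processing environments in your own inventory tooling, the attachment limit above can be modeled as a simple registry. The sketch below assumes the limit is strictly one-to-one (one workspace per environment and one environment per workspace), which is an interpretation rather than something the limitation spells out. `AttachmentRegistry` and the identifiers are hypothetical.

```python
from typing import Dict, Optional


class AttachmentRegistry:
    """Tracks workspace/environment attachments, assuming a strict one-to-one limit."""

    def __init__(self) -> None:
        self._env_by_workspace: Dict[str, str] = {}
        self._workspace_by_env: Dict[str, str] = {}

    def attach(self, workspace_id: str, environment_id: str) -> None:
        """Record an attachment, rejecting any reuse of either side."""
        if workspace_id in self._env_by_workspace:
            raise ValueError(f"workspace {workspace_id} is already attached")
        if environment_id in self._workspace_by_env:
            raise ValueError(f"environment {environment_id} is already in use")
        self._env_by_workspace[workspace_id] = environment_id
        self._workspace_by_env[environment_id] = workspace_id

    def environment_for(self, workspace_id: str) -> Optional[str]:
        """Return the attached environment, or None if the workspace is unattached."""
        return self._env_by_workspace.get(workspace_id)


registry = AttachmentRegistry()
registry.attach("ws-1", "env-a")
```

Keeping both lookup directions makes either kind of conflict cheap to detect before you start provisioning.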