Storage Maintenance
This page provides tips and guidelines for maintaining your backend storage.
Note
For safety reasons, Designer Cloud Powered by Trifacta Enterprise Edition does not remove files from the backend storage, except for temporary files that it creates as part of normal operations and storage used as part of feature execution. Unless resources have been provided to you by Alteryx, management of the backend datastore is the responsibility of the customer.
Note
Designer Cloud Powered by Trifacta Enterprise Edition does not store data on the Trifacta node where the software is installed.
Note
Designer Cloud Powered by Trifacta Enterprise Edition does not modify source data.
Alteryx Storage
Log files are stored by default in the following location on the Trifacta node:
/opt/trifacta/logs
Service logs
Service log files are automatically rotated at 50 MB. For more information on configuring log rotation, see Configure Logging for Services.
Job logs
Logs related to job execution are not automatically rotated.
Note
Job log files can accumulate over time. As a rule of thumb, set up a recurring job through an external scheduler to purge job logs that are older than six months.
Job log files are stored in the following directories:
/opt/trifacta/logs/jobs
/opt/trifacta/logs/jobgroups
They are organized by job identifier in sub-directories.
For more information on job logs, see Diagnose Failed Jobs.
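The purge of old job logs could be sketched as a small Python script that an external scheduler such as cron runs on the Trifacta node. This is a minimal sketch, not a supported tool: the function names, the 183-day approximation of six months, and the dry-run default are assumptions made for illustration. It judges age by the modification time of each per-job subdirectory, which changes when files are added to or removed from the directory, not when an existing file is edited.

```python
import shutil
import time
from pathlib import Path

# Job log directories named on this page; adjust if your logs live elsewhere.
JOB_LOG_ROOTS = [
    Path("/opt/trifacta/logs/jobs"),
    Path("/opt/trifacta/logs/jobgroups"),
]
MAX_AGE_DAYS = 183  # assumption: roughly six months


def find_stale_job_logs(roots, max_age_days, now=None):
    """Return per-job subdirectories whose modification time is past the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # skip roots that do not exist on this node
        for entry in sorted(root.iterdir()):
            if entry.is_dir() and entry.stat().st_mtime < cutoff:
                stale.append(entry)
    return stale


def purge_job_logs(roots, max_age_days, dry_run=True, now=None):
    """Delete stale job log directories; with dry_run=True, only report them."""
    stale = find_stale_job_logs(roots, max_age_days, now=now)
    for path in stale:
        if not dry_run:
            shutil.rmtree(path)
    return stale


if __name__ == "__main__":
    # Dry run by default: print what would be removed without deleting anything.
    for path in purge_job_logs(JOB_LOG_ROOTS, MAX_AGE_DAYS, dry_run=True):
        print(f"would remove {path}")
```

Running the script as written only lists candidates. Switching to dry_run=False deletes them, so it is worth reviewing the dry-run output first and confirming that no retained job logs are still needed for diagnosing failures.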
Base Storage Layer
Temp files
Job temp files
Temporary files may be written to the following temporary directory on the backend datastore, particularly during job execution:
/tmp
Note
These files may be purged during restarts of the platform.
Spark temp files
During execution of jobs, Spark may use the following directories on backend storage for storage of temporary files:
/user/<UserID>
/trifacta/tempfiles
Samples and profile statistics
The Designer Cloud Powered by Trifacta platform generates your samples and profiling statistics in one of the following directories for each user:
The default directory:
/trifacta/queryResults/.trifacta
The user-defined output directory
Note
These files should be removed on a periodic basis.
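Because these files live on the backend datastore, the removal step depends on the storage client you use. The sketch below covers only the selection logic: given a listing of (path, last-modified) pairs, it returns the paths under the default directory that are older than a retention period. The function name, the listing format, and any retention period are assumptions for illustration; this page does not prescribe one.

```python
from datetime import datetime, timedelta, timezone

# Default samples and profiling directory named on this page.
DEFAULT_SAMPLE_DIR = "/trifacta/queryResults/.trifacta"


def select_stale_files(listing, max_age, prefix=DEFAULT_SAMPLE_DIR, now=None):
    """Return paths under `prefix` whose last-modified time is older than `max_age`.

    `listing` is a hypothetical iterable of (path, modified) pairs, where
    `modified` is a timezone-aware datetime taken from your storage client.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    base = prefix.rstrip("/") + "/"  # match only entries inside the directory
    return [
        path
        for path, modified in listing
        if path.startswith(base) and modified < cutoff
    ]
```

The returned paths can then be passed to whichever delete operation your backend datastore provides. For a user-defined output directory, pass that directory as prefix.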
Datasets
While samples and job results may be retained on backend storage, the Designer Cloud Powered by Trifacta platform does not store your source data.
Note
When a dataset is removed from the Library, only the product's reference to it is removed. The underlying data is not deleted.
Storage for features
The following features do store data on the base storage layer.
File conversion
Data sources that are stored in a binary format, such as PDF or Excel, or that require additional processing, such as JSON, must be converted to a file format that can be natively ingested by the Designer Cloud Powered by Trifacta platform. Typically, these files are stored in the base storage layer in CSV format.
This feature is enabled by default.
JDBC ingestion
When JDBC ingestion is enabled, some objects used in sampling that are sourced from JDBC sources may be stored in the base storage layer for faster retrieval. After job execution, these objects are deleted or, if datasource caching is enabled, moved to the appropriate datasource cache.
For more information, see Configure JDBC Ingestion.
Datasource caching
If datasource caching has been enabled, cached objects can be stored in either a global or user-specific cache. For more information, see Configure Data Source Caching.