Skip to main content

Import Google Sheets Data

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

Dataprep by Trifacta can import Google® Sheets® spreadsheets.

The sheets of a spreadsheets can be imported as:

  • Individual datasets

  • A single dataset

Limitations:

Note

This integration provides access to all Google Sheets in the connecting user's account. Access includes spreadsheets with disabled options for download, print, or copy as well as hidden sheets within spreadsheets.

Note

If you have enabled in-VPC processing in Dataprep by Trifacta, you must specify a private key and public key for the Trifacta Application to use. For more information, see Dataprep In-VPC Execution.

Note

Before you begin, you should review information on file formats supported for import, which can cause your files to fail to import or to be properly ingested and formatted. For more information, see Supported File Formats.

  • Import-only support

Note

After you import a Google Sheet into Dataprep by Trifacta, renaming the source Google Sheet or a tab in it can break your datasets and flows. Details are below.

  • Creation of a dataset with parameters from Google Sheets is not supported.

  • Connected sheets or embedded external datasources in your Google Sheets are not supported.

    Note

    If your connected sheet is linked to a table-based source, you may import that source directly into the product.

  • If you have enabled Google Advanced Protection, this connection type does not work.

  • A Google Sheet can contain up to 5,000,000 cells. Each cell can contain up to 50,000 characters.

    • Dataprep by Trifacta supports a maximum of 25,000 characters in a cell.

  • Filepath and source row number information is not available from original Sheets. These references return values from the CSV files that have been converted on the backend. For more information, seeSource Metadata References.

  • Source Sheets files with cells bracketed by single double quotes may not be properly ingested if any terminating quotes are missing.

    Tip

    You can check the data quality bars for mismatched values or, for strings, the data histogram bars for anomalous values to see if the above issue is present. If so, deselect Detect Structure on import. Then, use a Split rows transformation applied to the affected column to break up the column as needed.

  • If loading your Sheets-based dataset in the Transformer page results in a blank screen, please take a new sample. The file requires conversion again with each generated sampling.

  • Latest state of the spreadsheet may not be reflected in the Transformer page due to caching. When you run a job, the platform collects the latest version of the data and converts it to CSV for execution.

  • IMPORTRANGE function in Google Sheets is not supported for importing data from another sheet.

Process:

  1. A spreadsheet can be read directly from your Google Drive.

    Note

    When you first use the Google Sheets connector, you must enable Dataprep by Trifacta to read all of your Google Drive data. When the connector is used, it locates only the Google Sheets data, including any Sheets that have been shared with you. All other data in Google Drive, including any Microsoft® Workbooks®, is ignored. You can then select the Sheet or Sheets you wish to import.

  2. Sheets in a worksheet are ingested and written to Cloud Storage in CSV format.

  3. CSV files are available for selection.

  4. These CSV files are the source from which the imported datasets are created.

Steps:

  1. In the menu bar, click Library for Data .

  2. In the Library for Data page, click Import Data. Select the Google Sheets connection.

    Tip

    You can paste links that you gather from Google to select spreadsheets. To access a Google Sheet, edit the path and paste the link. Use this method for publicly available Google Sheets, too.

    ImportGoogleSheetsData-GEE.png

    Figure: Import Google Sheets spreadsheet

  3. After you select the spreadsheet, it is uploaded and converted to CSV format and stored. Depending on the size of the spreadsheet, this process may take a while.

  4. By default, all sheets in the spreadsheet are imported as individual datasets. To change how the data is imported, click Edit in the right panel.

    ImportGoogleSheetsData-ImportSettings-GEE.png

    Figure: Import settings for Google Sheets datasets

  5. Dataset creation:

    1. 1 dataset per sheet: (Default) Each selected sheet in the spreadsheet is imported as a separate dataset.

      Specify the base name of the datasets that you are creating. If you are creating a single dataset, the name of the spreadsheet is used.

    2. Selected sheets into 1 dataset: All selected sheets in the spreadsheet are combined and imported as a single dataset.

      Note

      The schemas of each dataset must match. Columns must be listed in the same order in each dataset. The column headers are taken from the first selected dataset.

    3. All and future sheets into 1 dataset: If the spreadsheet is updated periodically with new sheets that you would like to add in the future, select this option. After initial selection of sheets, all sheets that are added to the spreadsheet in the future are automatically added as part of the imported dataset.

      Note

      When an imported dataset based on this option is first loaded into the Transformer page, the data grid displays an initial sample taken from rows in the first sheet only. When you take another sample from the Samples panel, data is collected from other sheets.

  6. Selected sheets:

    1. You can select the sheets to import.

      Note

      Special characters in sheet names are filtered out.

    2. To preview the data of an individual sheet, mouse over a dataset and click Jump to.

  7. Remove special characters from column names: Select this option to remove any special characters from the inferred column headers during import.

  8. You can apply the column headers to your datasets during import. Select the required option from the drop-down list:

    • Infer header: (default) When selected, the Trifacta Application infers the header based on the data in the import.

    • Use first row as header: When selected, the first row is used as the column headers.

    • No header: When selected, the inference is ignored and column headers are defined using generic names with no headers.

  9. To save changes, click Save.

  10. After your datasets have been added, you can edit the name and description information for each in the right navigation panel.

  11. Optionally, you can assign the new dataset(s) to an existing flow or create a new one to contain them.

For more information, see Import Data Page.

After import:

After you have imported the Google Sheet, you should avoid renaming the Google Sheet or any tab in it that is part of the imported datasets. If you rename a datasource, you can see one or more of the following issues in Dataprep by Trifacta:

  • When you open a recipe using the dataset in the Transformer page, you may receive an error that the Cloud Storage path cannot be loaded.

  • Collecting samples in the Transformer page returns a generic error message.