Document Extract Tool (Live Query for Google BigQuery)

Use the Document Extract tool to extract structured data from PDF files using AI, so you can retrieve specific fields without manually reviewing each document. For every input row, the tool extracts the requested fields from the referenced PDF and writes the results to the output column you specify.

Use this tool when working with documents such as invoices, forms, or reports that require structured data extraction.

Tool Availability

The Document Extract tool is available in the Google BigQuery tool palette.

Before using this tool...

  1. From a Cloud Native workflow, go to Options.

  2. Verify that Enable LiveQuery is turned on.

  3. Verify that the Selected Connection displays a BigQuery connection.

Tool Configuration

Select Mode

Choose the extraction method:

  • Pre-trained Extraction Model: Use an existing model.

  • Gemini AI Model: Define fields to extract using AI.

Column Name

Use the Column Name dropdown to select the column that contains the full path to your document files. The selected column must contain string data that references PDF files accessible through your BigQuery connection.
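Before pointing the tool at a column, it can help to confirm that every value looks like a PDF path. The following is a minimal Python sketch of such a check, not part of the tool itself. The function name non_pdf_paths and the sample gs:// paths are hypothetical, and the check only inspects the file extension; it does not verify that the files exist or are reachable through your connection.

```python
def non_pdf_paths(paths):
    """Return the values that are not non-empty strings ending in .pdf."""
    return [
        p for p in paths
        if not (isinstance(p, str) and p.strip().lower().endswith(".pdf"))
    ]


# Hypothetical sample values, standing in for the contents of the path column.
sample = [
    "gs://example-bucket/invoices/inv_001.pdf",
    "gs://example-bucket/invoices/inv_002.PDF",
    "gs://example-bucket/invoices/notes.txt",
    None,
]

bad = non_pdf_paths(sample)
print(bad)
```

Any values the check returns are worth fixing or filtering out upstream before you run the extraction.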

Pre-trained Extraction Model

Database Schema

Use the Database Schema dropdown to select the schema where the model is hosted.

Model

Use the Model dropdown to select the extraction model.

Gemini AI Model

Fields to Extract

Enter a list of fields that you want the model to extract from the document. Type a field name and press Enter to add it to the list. You can enter multiple fields.

For best results, use clear and specific field names. For example:

  • Invoice Number, Invoice Date, Total Amount

  • Customer Name, Address, Account Number

Execute

Select Execute to preview how the model extracts data from your documents. The preview processes only 2 rows and displays the results in the Results grid.

Output Column Name

Enter a name for the output column in Output Column Name. If you don't specify a name, the tool appends _extract to the input column name.
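The default naming rule can be sketched in Python as follows. This is only an illustration of the behavior described above, not the tool's implementation. The function name resolve_output_column and the column name document_path are hypothetical, and treating a whitespace-only name as "not specified" is an assumption.

```python
def resolve_output_column(input_column, output_column_name=""):
    """Return the output column name, falling back to <input>_extract."""
    # Assumption: a blank or whitespace-only name counts as "not specified".
    name = (output_column_name or "").strip()
    return name if name else f"{input_column}_extract"


print(resolve_output_column("document_path"))
print(resolve_output_column("document_path", "invoice_fields"))
```

With an input column named document_path and no output name, the sketch returns document_path_extract.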

Output

The tool outputs a new column that contains the extracted data for each row.

Permissions Required

To use the Gemini AI model, grant the following IAM roles and permission to the BigQuery credentials (service account key or OAuth):

  • Vertex AI User (role)

  • BigQuery Connection User (role)

  • BigQuery Job User (role)

  • bigquery.connections.delegate (permission)
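If you grant access to a service account from the command line, the three roles above can be turned into gcloud commands. The Python sketch below only builds and prints the command strings; it does not run them. The role IDs are the predefined IAM roles we believe correspond to the display names above (an assumption to verify in your project), and the project ID and service account email are hypothetical placeholders. The bigquery.connections.delegate permission is a single permission rather than a role, so it is not covered here and must be granted through a role that includes it.

```python
# Display names from the list above mapped to assumed predefined IAM role IDs.
ROLE_IDS = {
    "Vertex AI User": "roles/aiplatform.user",
    "BigQuery Connection User": "roles/bigquery.connectionUser",
    "BigQuery Job User": "roles/bigquery.jobUser",
}


def grant_commands(project_id, service_account_email):
    """Build one gcloud IAM binding command per role (strings only)."""
    member = f"serviceAccount:{service_account_email}"
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f'--member="{member}" --role="{role}"'
        for role in ROLE_IDS.values()
    ]


# Hypothetical project and service account, for illustration only.
commands = grant_commands(
    "example-project",
    "extractor@example-project.iam.gserviceaccount.com",
)
for command in commands:
    print(command)
```

Review the printed commands before running them, and adjust the member prefix if the credentials belong to a user account rather than a service account.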