Document Extract Tool (Live Query for Google BigQuery)
Use the Document Extract tool to extract structured data from PDF files using AI. This helps you quickly retrieve specific fields without manually reviewing documents. The tool extracts the requested fields from each PDF and writes the results to the specified output column for each row.
Use this tool when working with documents such as invoices, forms, or reports that require structured data extraction.
Tool Availability
The Document Extract tool is available in the Google BigQuery tool palette.
Before using this tool...
From a Cloud Native workflow, go to Options.
Verify that Enable LiveQuery is enabled.
Verify that the Selected Connection displays a BigQuery connection.
Tool Configuration
Select Mode
Choose the extraction method:
Pre-trained Extraction Model: Use an existing model.
Gemini AI Model: Define fields to extract using AI.
Column Name
Use the Column Name dropdown to select the column that contains the full path to your document files. The selected column must contain string data that references PDF files accessible through your BigQuery connection.
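As a rough pre-check before you run the tool, you could confirm that each value in the selected column looks like a PDF path. The sketch below is illustrative only and is not part of the tool: the function name looks_like_pdf_path is hypothetical, and it assumes that documents are referenced by Cloud Storage URIs that start with gs://, which this page does not state.

```python
def looks_like_pdf_path(value):
    """Return True if value is a string that looks like a PDF path.

    Assumption: documents are referenced by Cloud Storage URIs (gs://...).
    Adjust the prefix check if your connection uses a different scheme.
    """
    if not isinstance(value, str):
        return False
    path = value.strip()
    return path.startswith("gs://") and path.lower().endswith(".pdf")


# Hypothetical sample values from a document-path column.
sample_paths = [
    "gs://example-bucket/invoices/inv-001.pdf",
    "gs://example-bucket/invoices/notes.txt",
    None,
]

# Collect the values that would likely fail extraction.
invalid = [p for p in sample_paths if not looks_like_pdf_path(p)]
```

Here, invalid holds the text file path and the null value, which are the rows you would want to fix or filter out before extraction.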
Pre-trained Extraction Model
Database Schema
Use the Database Schema dropdown to select the schema where the model is hosted.
Model
Use the Model dropdown to select the extraction model.
Gemini AI Model
Fields to Extract
Enter a list of fields that you want the model to extract from the document. Type a field name and press Enter to add it to the list. You can enter multiple fields.
For best results, use clear and specific field names. For example:
Invoice Number, Invoice Date, Total Amount
Customer Name, Address, Account Number
Execute
Select Execute to preview how the model extracts data from your documents. The preview processes only 2 rows and shows the results in the Results grid.
Output Column Name
Enter a name for the output column in Output Column Name. If you don't specify a name, the tool appends _extract to the input column name.
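The naming rule above can be sketched as a small function. This is only an illustration of the documented default, not the tool's implementation: the function name default_output_column is hypothetical, and treating a whitespace-only name as "not specified" is an assumption.

```python
def default_output_column(input_column, output_column=None):
    """Return the output column name.

    Uses the explicit name when one is given; otherwise appends
    "_extract" to the input column name, as described above.
    Assumption: a blank or whitespace-only name counts as unspecified.
    """
    if output_column and output_column.strip():
        return output_column.strip()
    return f"{input_column}_extract"


# No name specified: falls back to the input column plus the suffix.
fallback_name = default_output_column("doc_path")

# Explicit name specified: used as entered.
explicit_name = default_output_column("doc_path", "invoice_fields")
```

With an input column called doc_path, the fallback name is doc_path_extract.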
Output
The tool outputs a new column that contains the extracted data for each row.
Permissions Required
To use the Gemini AI model, assign the following permissions to the BigQuery credentials (service account key or OAuth):
Vertex AI user
BigQuery connection user
BigQuery job user
bigquery.connections.delegate
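If you manage access from the command line, the sketch below builds gcloud commands that grant the three roles to a principal. It rests on several assumptions: the role IDs are the standard IAM identifiers that appear to match the role names listed here, the grants are made at the project level, and the project ID and service account shown are placeholders. The helper name grant_commands is hypothetical. The script only prints the commands; it does not run them.

```python
# Role IDs assumed to correspond to the role names listed above.
REQUIRED_ROLES = [
    "roles/aiplatform.user",          # Vertex AI user
    "roles/bigquery.connectionUser",  # BigQuery connection user
    "roles/bigquery.jobUser",         # BigQuery job user
]


def grant_commands(project_id, member):
    """Build one project-level IAM binding command per required role."""
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f"--member={member} --role={role}"
        for role in REQUIRED_ROLES
    ]


# Placeholder project and service account; replace with your own values.
commands = grant_commands(
    "my-project",
    "serviceAccount:etl@my-project.iam.gserviceaccount.com",
)
for command in commands:
    print(command)
```

Note that bigquery.connections.delegate is an individual permission rather than a role, so the commands above do not grant it. It has to come from a predefined or custom role that includes it; check with your Google Cloud administrator for the appropriate choice.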