Autenticare
Google Tools · 8 min

AI agent over structured data: text-to-SQL with Gemini + BigQuery in practice

RAG handles documents. But what about questions over structured data in BigQuery, Snowflake, or PostgreSQL? The text-to-SQL pattern with Gemini that actually works — and what still requires a human.

Fabiano Brito

CEO & Founder

Text-to-SQL with Gemini and BigQuery is a production pattern that translates natural-language analytical questions into validated database queries using a semantic layer. This architecture is critical for enterprises because it allows AI agents to deliver accurate, auditable analytical answers and charts directly to executives instead of redirecting them to BI tools.

  • 50% of executive queries are analytical
  • 95%+ accuracy with a semantic layer
  • 10x faster than manual BI requests
  • 0 direct DB access required by users

TL;DR: A question like "how much did we sell in the Southeast in March vs. February?" is not RAG; it's text-to-SQL. In Gemini Enterprise + BigQuery, the pattern works in production with a semantic layer, operation whitelist, and dry-run validation. Without those, you get a broken SQL generator and wrong reports.

Half of the questions executives ask the "company ChatGPT" are analytical: comparisons, totals, trends, segmentation. Without text-to-SQL, the agent replies "check the BI tool." With a well-built text-to-SQL pipeline, the agent returns the right number with a chart.

Aspect | Naive Text-to-SQL | Gemini + Semantic Layer
Schema Knowledge | Guesses table and column names | Uses curated, versioned catalog
Query Validation | Executes raw generated SQL directly | Dry-runs, parses, and whitelists operations
Security & ACL | Shared service account (high risk) | User's own IAM identity and credentials

Standard architecture (7 steps)

1. Natural-language question
Raw user input, with session context.

2. Relevant schema retrieval
Agent searches the catalog for tables that cover the topic (semantic layer).

3. SQL generation
Gemini 2.5 Pro produces a parameterized query in the warehouse dialect.

4. Validation
SQL parser + operation whitelist + ACL + dry-run.

5. Execution
BigQuery/Snowflake/Postgres with the user's own identity, not a service account.

6. Post-processing
Agent formats the result and suggests a visualization.

7. Auditable answer
Number + table + optional chart + the query used (for auditing).
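
Of the seven steps, validation (step 4) is the easiest to illustrate in a few lines. The sketch below is a deliberately naive Python version of an operation whitelist, a table check, and a LIMIT check. The names `validate_sql`, `ALLOWED_FIRST_KEYWORDS`, `FORBIDDEN_KEYWORDS`, and `MAX_LIMIT` are assumptions made for illustration and do not come from any library. It works on tokens rather than a real SQL parser, so it would misread constructs such as `EXTRACT(MONTH FROM col)`, CTE names, or keywords inside string literals; a production gate would rely on a proper parser followed by a warehouse dry-run.

```python
import re

# Hypothetical policy values, loosely based on the constraints named in this article.
ALLOWED_FIRST_KEYWORDS = {"SELECT", "WITH"}
FORBIDDEN_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER",
    "CREATE", "TRUNCATE", "MERGE", "GRANT",
}
MAX_LIMIT = 1000


def validate_sql(sql: str, allowed_tables: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the query may go on to dry-run."""
    problems = []
    statement = sql.strip().rstrip(";").strip()

    # Reject anything that looks like more than one statement.
    if ";" in statement:
        problems.append("multiple statements are not allowed")

    # Naive tokenization: identifiers and keywords only (not a real SQL parser).
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_.]*", statement)
    upper = [t.upper() for t in tokens]

    if not upper or upper[0] not in ALLOWED_FIRST_KEYWORDS:
        problems.append("only SELECT queries are allowed")
    for keyword in sorted(FORBIDDEN_KEYWORDS.intersection(upper)):
        problems.append(f"forbidden operation: {keyword}")

    # Treat the identifier right after FROM or JOIN as a table reference.
    referenced = {
        tokens[i + 1] for i, t in enumerate(upper[:-1]) if t in {"FROM", "JOIN"}
    }
    for table in sorted(referenced - allowed_tables):
        problems.append(f"table not in catalog or not permitted: {table}")

    # Require a trailing LIMIT that does not exceed the cap.
    limit = re.search(r"\bLIMIT\s+(\d+)\s*$", statement, flags=re.IGNORECASE)
    if limit is None or int(limit.group(1)) > MAX_LIMIT:
        problems.append(f"query must end with LIMIT <= {MAX_LIMIT}")
    return problems
```

Returning a list of problems rather than raising on the first one lets the agent feed every violation back to the model in a single retry.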


The key component: semantic layer

The model doesn't memorize your data warehouse schema. Without a semantic layer, it guesses table and column names.

A semantic layer is a curated catalog:

  • Tables and columns with descriptions in PT/EN.
  • Synonyms ("receita" = "revenue" = "faturamento").
  • Relationships between tables (explicit foreign keys).
  • Pre-defined metrics ("average ticket = SUM(value)/COUNT(order)").
  • Default filters ("confirmed orders only").
  • Temporal and geographic granularity.
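
To make the catalog idea concrete, here is a minimal sketch of how descriptions and synonyms might be held in memory once the definition file has been loaded. The classes `Column` and `Table`, the function `resolve_term`, and the `sales.orders` table are all hypothetical names chosen for illustration; a real catalog would carry relationships, metrics, and granularity as well.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    description: str
    synonyms: list[str] = field(default_factory=list)


@dataclass
class Table:
    name: str
    description: str
    columns: list[Column]
    default_filter: str = ""


# Hypothetical catalog entry; table and column names are illustrative only.
ORDERS = Table(
    name="sales.orders",
    description="One row per customer order",
    columns=[
        Column("value", "Order value", synonyms=["receita", "revenue", "faturamento"]),
        Column("region", "Sales region", synonyms=["região", "regiao"]),
    ],
    default_filter="status = 'confirmed'",
)


def resolve_term(term: str, tables: list[Table]) -> list[tuple[str, str]]:
    """Return every (table, column) whose name or synonyms match a business term."""
    wanted = term.strip().lower()
    matches = []
    for table in tables:
        for column in table.columns:
            names = [column.name, *column.synonyms]
            if wanted in (n.lower() for n in names):
                matches.append((table.name, column.name))
    return matches
```

An empty result is useful in itself: it tells the agent the term is not covered by the catalog, so it can ask for clarification instead of guessing a column.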

Eliminates Hallucinations

By mapping business terms like "revenue" or "faturamento" to exact database columns, the model never has to guess schema structures.

Enforces Business Logic

Pre-defined metrics (e.g., average ticket calculations) and default filters ensure the AI uses the exact same formulas as your BI tools.
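
One way to picture this: the formula and the default filter live in the catalog, and the query is assembled from them rather than written freehand by the model. The sketch below assumes a hypothetical `Metric` record and a `render_metric_query` helper; the `sales.orders` table, the `order_id` column, and the `status = 'confirmed'` filter are illustrative names, not taken from a real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str      # SQL expression, defined once in the catalog
    table: str
    default_filter: str  # applied to every query that uses the metric


# Hypothetical definition mirroring the "average ticket" example.
AVERAGE_TICKET = Metric(
    name="average_ticket",
    expression="SUM(value) / COUNT(order_id)",
    table="sales.orders",
    default_filter="status = 'confirmed'",
)


def render_metric_query(
    metric: Metric, group_by: str, extra_filter: str = "", limit: int = 1000
) -> str:
    """Build a SELECT whose formula and default filter come from the catalog.

    group_by and extra_filter must come from catalog-validated values,
    never from raw user text, because they are interpolated into the SQL.
    """
    conditions = [c for c in (metric.default_filter, extra_filter) if c]
    where = f" WHERE {' AND '.join(conditions)}" if conditions else ""
    return (
        f"SELECT {group_by}, {metric.expression} AS {metric.name} "
        f"FROM {metric.table}{where} GROUP BY {group_by} LIMIT {limit}"
    )
```

Because the default filter is always prepended, a question about average ticket by region cannot silently include unconfirmed orders.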

Tools: dbt + Looker semantic layer, Cube.js, or your own YAML definition. In Autenticare projects, we standardize on versioned YAML.

💡 Key Insight: Version Control is Mandatory

Treat your semantic layer as code. Storing your YAML definitions in Git allows you to track changes, run CI/CD tests, and prevent breaking changes from reaching your production AI agents.
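
As an illustration of what a CI test over the semantic layer could check, the sketch below lints a catalog that has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`). The dict layout, with `tables`, `columns`, `synonyms`, and `metrics` keys, and the function name `lint_catalog` are assumptions for this example, not a standard format.

```python
def lint_catalog(catalog: dict) -> list[str]:
    """Return human-readable errors; a CI job would fail if the list is non-empty."""
    errors = []
    seen_synonyms = {}     # lowercased synonym -> first qualified column that used it
    known_columns = set()  # "table.column" strings declared in the catalog

    for table, spec in catalog.get("tables", {}).items():
        for column, meta in spec.get("columns", {}).items():
            qualified = f"{table}.{column}"
            known_columns.add(qualified)
            if not meta.get("description"):
                errors.append(f"{qualified}: missing description")
            for synonym in meta.get("synonyms", []):
                key = synonym.lower()
                if key in seen_synonyms and seen_synonyms[key] != qualified:
                    errors.append(
                        f"synonym '{synonym}' maps to both "
                        f"{seen_synonyms[key]} and {qualified}"
                    )
                seen_synonyms.setdefault(key, qualified)

    # Every column a metric depends on must be declared in the catalog.
    for metric, spec in catalog.get("metrics", {}).items():
        for column in spec.get("columns", []):
            if column not in known_columns:
                errors.append(f"metric {metric}: unknown column {column}")
    return errors
```

Run on every pull request, a check like this catches an ambiguous synonym or a renamed column before the agent starts generating queries against it.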


Prompt patterns for text-to-SQL

Always include in the prompt:

  • Schema of the relevant tables (full DDL).
  • 3–5 examples of question → well-formed SQL.
  • Explicit dialect ("PostgreSQL 15", "BigQuery Standard SQL").
  • Constraints: "always use LIMIT 1000", "never DELETE/UPDATE/DROP", "use named parameters".
  • Output format: raw SQL inside a code fence, no extra comments.
  • Uncertainty rule: "if there is no data to answer, return null + exp
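
The checklist above can be turned into a small prompt builder so that every request carries the same sections in the same order. This is a sketch under assumptions: `build_prompt` and its parameter names are hypothetical, the section labels are one possible wording, and the uncertainty rule is left as a caller-supplied string rather than hard-coded.

```python
def build_prompt(
    question: str,
    ddl: str,
    examples: list[tuple[str, str]],
    dialect: str,
    constraints: list[str],
    uncertainty_rule: str,
) -> str:
    """Assemble the prompt sections in a fixed order for every request."""
    # Enforce the 3-5 few-shot examples recommended in the checklist.
    if not 3 <= len(examples) <= 5:
        raise ValueError("provide between 3 and 5 question/SQL examples")

    shots = "\n\n".join(f"Question: {q}\nSQL:\n{sql}" for q, sql in examples)
    rules = "\n".join(f"- {c}" for c in constraints)
    return "\n\n".join([
        f"You write {dialect} queries.",
        f"Schema (DDL):\n{ddl}",
        f"Examples:\n{shots}",
        f"Constraints:\n{rules}",
        "Output format: raw SQL inside a code fence, no extra comments.",
        f"Uncertainty rule: {uncertainty_rule}",
        f"Question: {question}",
    ])
```

Keeping the user's question last means the schema, examples, and constraints form a stable prefix that stays identical across requests for the same tables.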

Frequently Asked Questions

What is the main advantage of using text-to-SQL with Gemini and BigQuery?

With well-executed text-to-SQL, the agent returns the correct number with a chart, instead of just telling the user to consult the BI tool.

What are the steps of the standard architecture for text-to-SQL with Gemini and BigQuery?

The standard architecture involves 7 steps, from the natural-language question to the auditable answer, including schema retrieval, SQL generation, validation, execution, and post-processing.

Why is the 'semantic layer' a key component in the text-to-SQL architecture?

Without a 'semantic layer', the model may guess table and column names, compromising the accuracy of the results.

What is a 'semantic layer' in the context of text-to-SQL?

The 'semantic layer' is a curated catalog that contains tables and columns with descriptions, synonyms, relationships between tables, pre-defined metrics, default filters, and temporal and geographic granularity.

Ready to build your enterprise AI Agent?

Talk to our specialists about implementing a secure, validated text-to-SQL architecture over BigQuery or Snowflake.
