Autenticare
Google Tools · 8 min

AI agent over structured data: text-to-SQL with Gemini + BigQuery in practice

RAG handles documents. But what about questions over structured data in BigQuery, Snowflake, or PostgreSQL? The text-to-SQL pattern with Gemini that actually works — and what still requires a human.

Fabiano Brito

CEO & Founder

TL;DR: A question like "how much did we sell in the Southeast in March vs. February?" is not RAG — it's text-to-SQL. In Gemini Enterprise + BigQuery, the pattern works in production with a semantic layer, operation whitelist, and dry-run validation. Without those, you get a broken SQL generator and wrong reports.

Half of the questions executives ask the "company ChatGPT" are analytical: comparisons, totals, trends, segmentation. Without text-to-SQL, the agent replies "check the BI tool." With a well-built text-to-SQL, the agent returns the right number with a chart.


Standard architecture (7 steps)

1
Natural-language question

Raw user input, with session context.

2
Relevant schema retrieval

Agent searches the catalog for tables that cover the topic (semantic layer).

3
SQL generation

Gemini 2.5 Pro produces a parameterized query in the warehouse dialect.

4
Validation

SQL parser + operation whitelist + ACL + dry-run.

5
Execution

BigQuery/Snowflake/Postgres with the user's own identity, not a service account.

6
Post-processing

Agent formats the result and suggests a visualization.

7
Auditable answer

Number + table + optional chart + the query used (for auditing).
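The seven steps above can be sketched as a thin orchestration function. Everything here is a stub: `Catalog`, `StubLLM`, and `StubWarehouse` are hypothetical stand-ins for your semantic layer, the Gemini client, and BigQuery — the names and interfaces are illustrative, not a real API.

```python
# Minimal sketch of the 7-step pipeline. Every component is a stub —
# in production, Catalog wraps the semantic layer, StubLLM wraps Gemini,
# and StubWarehouse wraps BigQuery/Snowflake/Postgres.

class Catalog:
    def __init__(self, tables):
        self.tables = tables  # {table_name: description}

    def search(self, question):
        # step 2: naive keyword match against table descriptions
        q = question.lower()
        return [t for t, desc in self.tables.items()
                if any(word in q for word in desc.lower().split())]

class StubLLM:
    def generate_sql(self, question, tables):
        # step 3: a real system calls Gemini here; we return a canned query
        return f"SELECT SUM(amount) FROM {tables[0]} LIMIT 1000"

def validate(sql, allowed_tables):
    # step 4: simplified — production adds a parser, ACL check, and dry-run
    assert sql.lstrip().upper().startswith("SELECT"), "SELECT only"
    assert any(t in sql for t in allowed_tables), "table not whitelisted"

class StubWarehouse:
    def run(self, sql, identity):
        # step 5: execute with the user's own identity; fake a result here
        return [{"total": 1_234_567.89}]

def answer_question(question, user, catalog, llm, warehouse):
    tables = catalog.search(question)          # 2. schema retrieval
    sql = llm.generate_sql(question, tables)   # 3. SQL generation
    validate(sql, tables)                      # 4. validation
    rows = warehouse.run(sql, identity=user)   # 5. execution
    return {"rows": rows, "sql": sql}          # 6+7. auditable answer

result = answer_question(
    "how much did we sell in March?",
    user="ana@example.com",
    catalog=Catalog({"orders": "sell sales orders revenue March"}),
    llm=StubLLM(),
    warehouse=StubWarehouse(),
)
```

The point of the shape: every step is a separate, testable function, and the generated SQL travels with the answer so step 7 stays auditable.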


The key component: semantic layer

The model doesn't memorize your data warehouse schema. Without a semantic layer, it guesses table and column names.

A semantic layer is a curated catalog:

  • Tables and columns with descriptions in PT/EN.
  • Synonyms ("receita" = "revenue" = "faturamento").
  • Relationships between tables (explicit foreign keys).
  • Pre-defined metrics ("average ticket = SUM(value)/COUNT(order)").
  • Default filters ("confirmed orders only").
  • Temporal and geographic granularity.

Tools: dbt + Looker semantic layer, Cube.js, or your own YAML definition. In Autenticare projects, we standardize on versioned YAML.
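A versioned-YAML entry might look like the sketch below. The field names (`synonyms`, `default_filters`, `joins`, `metrics`) are illustrative, not a standard schema — adapt them to whatever your retrieval step consumes.

```yaml
# Hypothetical semantic-layer entry — field names are illustrative.
tables:
  fact_orders:
    description: "One row per order line; grain: order_item"
    synonyms: [pedidos, vendas, orders]
    default_filters:
      - "status = 'confirmed'"
    joins:
      - { to: dim_store, on: "fact_orders.store_id = dim_store.id" }
    columns:
      amount:
        description: "Gross item value in BRL"
        synonyms: [valor, receita, revenue, faturamento]
metrics:
  average_ticket:
    sql: "SUM(amount) / COUNT(DISTINCT order_id)"
    description: "Average ticket over confirmed orders"
```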


Prompt patterns for text-to-SQL

Always include in the prompt:

  • Schema of the relevant tables (full DDL).
  • 3–5 examples of question → well-formed SQL.
  • Explicit dialect ("PostgreSQL 15", "BigQuery Standard SQL").
  • Constraints: "always use LIMIT 1000", "never DELETE/UPDATE/DROP", "use named parameters".
  • Output format: raw SQL inside a code fence, no extra comments.
  • Uncertainty rule: "if there is no data to answer, return null + explanation".

See broader patterns in corporate prompt engineering.
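The checklist above translates into a prompt builder along these lines. This is a hedged sketch — the exact constraint wording should be tuned against your own evals, and the table/column names are made up for illustration.

```python
from datetime import date

# Sketch of a prompt builder covering the checklist above: DDL, few-shot
# examples, explicit dialect, constraints, output format, uncertainty rule,
# and the current date for temporal grounding.

def build_prompt(question, ddl, examples, dialect="BigQuery Standard SQL"):
    shots = "\n\n".join(f"Q: {q}\nSQL:\n{sql}" for q, sql in examples)
    return f"""You write {dialect} queries.
Today is {date.today().isoformat()}; if no year is given, use the current year.

Schema (full DDL of the relevant tables):
{ddl}

Examples:
{shots}

Rules:
- SELECT only; never DELETE/UPDATE/DROP.
- Always add LIMIT 1000 and use named parameters.
- If the data cannot answer the question, return null plus a one-line reason.
- Output raw SQL inside a code fence, no extra comments.

Q: {question}
SQL:"""

prompt = build_prompt(
    "revenue by region in March?",
    ddl="CREATE TABLE fact_orders (region STRING, amount NUMERIC, day DATE)",
    examples=[("total revenue?",
               "SELECT SUM(amount) FROM fact_orders LIMIT 1000")],
)
```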


Mandatory validation before execution

  1. SQL parser: ensures it is valid SQL (not a comment, not loose text).
  2. Operation whitelist: SELECT only.
  3. Table whitelist: agent may only touch tables marked "agent-accessible".
  4. ACL check: does the user have permission on the table?
  5. Dry run in BigQuery (zero cost, validates syntax and estimates bytes).
  6. Bytes-processed cap (cost ceiling).
  7. Execution timeout.
⚠️ Without this chain… One day the agent runs SELECT * FROM events on a 50 TB table and the DBA becomes famous. Dry-run, table whitelist, and byte cap are not optional — they are the difference between text-to-SQL in production and a BigQuery billing incident.
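A compressed version of the chain looks like this. The regex checks are a deliberate simplification — in production you want a real SQL parser (e.g. sqlglot) for steps 1–3 and BigQuery's dry run for the byte estimate; only the order and the fail-closed behavior carry over.

```python
import re

MAX_BYTES = 10 * 1024**3                        # cost ceiling — tune per team
ALLOWED_TABLES = {"fact_orders", "dim_store"}   # tables marked "agent-accessible"

def validate_sql(sql, user_tables, dry_run_bytes):
    """Simplified validation chain. Production should replace the regexes
    with a real SQL parser and get dry_run_bytes from a BigQuery dry run."""
    stmt = sql.strip().rstrip(";")
    # 1-2. single valid statement, SELECT only
    if ";" in stmt or not re.match(r"(?is)^\s*SELECT\b", stmt):
        raise ValueError("only a single SELECT is allowed")
    if re.search(r"(?i)\b(DELETE|UPDATE|DROP|INSERT|MERGE|TRUNCATE)\b", stmt):
        raise ValueError("write operation blocked")
    # 3-4. table whitelist + per-user ACL
    referenced = set(re.findall(r"(?i)\b(?:FROM|JOIN)\s+([\w.]+)", stmt))
    if not referenced <= ALLOWED_TABLES:
        raise ValueError(f"table not whitelisted: {referenced - ALLOWED_TABLES}")
    if not referenced <= user_tables:
        raise ValueError("user lacks permission on a referenced table")
    # 5-6. dry-run byte estimate vs. cost ceiling
    if dry_run_bytes > MAX_BYTES:
        raise ValueError(f"query would scan {dry_run_bytes} bytes, over cap")
    return True

ok = validate_sql(
    "SELECT region, SUM(amount) FROM fact_orders GROUP BY region LIMIT 1000",
    user_tables={"fact_orders"},
    dry_run_bytes=200_000_000,
)
```

Note the chain fails closed: any check that cannot pass raises before execution, which is exactly what keeps the 50 TB `SELECT *` from ever reaching the warehouse.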

Hybrid "analytical agent" pattern

In real cases, it is better to combine:

  • Pre-defined metrics (semantic layer): for common questions, runs a ready query with parameters. Faster and more reliable.
  • Free text-to-SQL: for ad-hoc questions outside the catalog. Extra validation required.
  • RAG over a dictionary: agent consults business documentation (policies, definitions) before generating SQL.
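The routing between the first two modes can be sketched as below. The substring match is deliberately naive — a real router would use embeddings or a cheap classification call — and the metric catalog is hypothetical.

```python
# Sketch of the hybrid router: predefined metrics first, free text-to-SQL
# as the fallback for ad-hoc questions outside the catalog.

METRICS = {
    "average ticket": "SELECT SUM(amount)/COUNT(DISTINCT order_id) "
                      "FROM fact_orders",
    "monthly revenue": "SELECT DATE_TRUNC(day, MONTH) AS m, SUM(amount) "
                       "FROM fact_orders GROUP BY m",
}

def route(question):
    q = question.lower()
    for name, sql in METRICS.items():
        if name in q:
            return ("predefined", sql)   # fast, reliable, pre-validated
    return ("text_to_sql", None)         # ad-hoc: generate + full validation

kind, sql = route("what is our average ticket this quarter?")
```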

What goes wrong with naive text-to-SQL

Forgotten joins

"How much did we sell by region?" — agent forgets the JOIN with the store table and returns the wrong number. Solution: recommended joins in the semantic layer.

Undeclared filters

"Monthly revenue" — agent doesn't filter status="confirmed". Always include a default filter in the semantic layer.

Temporal granularity

"March sales" — March of which year? Without context the agent assumes wrongly. Include the current date in the prompt + rule: "if the year is not specified, use the current year".

Currency conversion

Multinational with BRL, USD, ARS in the same table: agent sums everything without converting. A derived metric in the semantic layer solves this.

Confusion between similar metrics

"Net revenue" vs. "gross revenue" vs. "GMV" — without a clear definition the agent chooses badly. A glossary with canonical definitions solves this.


Specific evaluation

A gold set for text-to-SQL differs from RAG. Structure:

  • Natural-language question.
  • Expected SQL (canonical) or a set of equivalent SQLs.
  • Expected result (numbers).
  • Primary metric: execution match (result matches, even if SQL is written differently).
  • Secondary metric: logical form match (correct SQL structure).

An execution-match rate of 90%+ is a realistic target for a corpus of 100–300 questions. Details in agent evaluation in production.
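Execution match can be scored with a small harness like this. It compares the rows the generated SQL returned against the gold rows, not the SQL text; the tolerance handles float noise, and the gold-set shape is illustrative.

```python
# Execution-match sketch: the primary metric compares query *results*,
# so differently written but equivalent SQL still counts as correct.

def execution_match(got_rows, expected_rows, tol=1e-6):
    if len(got_rows) != len(expected_rows):
        return False
    for got, exp in zip(sorted(got_rows), sorted(expected_rows)):
        for g, e in zip(got, exp):
            if isinstance(e, float):
                if abs(g - e) > tol:
                    return False
            elif g != e:
                return False
    return True

gold_set = [
    {"question": "revenue in March?", "expected": [("2025-03", 1200.50)]},
]
# pretend the agent's generated SQL returned this from the warehouse:
agent_result = [("2025-03", 1200.5000000001)]

score = sum(execution_match(agent_result, case["expected"]) for case in gold_set)
match_rate = score / len(gold_set)
```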


Costs to control

  • BigQuery bytes: limit per query and per user/day.
  • Gemini tokens: a large schema in the prompt consumes tokens. Select only relevant tables via a first router call.
  • Query cache: identical question within a short window = result from cache, no re-execution.
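The cache bullet above is a few lines of code: normalize the question, hash it, and serve from memory inside a short TTL window. The 15-minute TTL and the whitespace-only normalization are assumptions — freshness requirements vary per metric.

```python
import hashlib
import time

# Sketch of a short-window query cache: identical normalized question
# within the TTL = cached result, no re-execution in the warehouse.

CACHE = {}
TTL_SECONDS = 15 * 60   # illustrative window

def cached_answer(question, run_query, now=None):
    now = time.time() if now is None else now
    key = hashlib.sha256(" ".join(question.lower().split()).encode()).hexdigest()
    hit = CACHE.get(key)
    if hit and now - hit["at"] < TTL_SECONDS:
        return hit["result"], True            # served from cache
    result = run_query(question)              # real execution (BigQuery etc.)
    CACHE[key] = {"result": result, "at": now}
    return result, False

calls = []
def fake_run(q):
    calls.append(q)
    return [("total", 42)]

r1, hit1 = cached_answer("Revenue  in march?", fake_run, now=1000.0)
r2, hit2 = cached_answer("revenue in March?", fake_run, now=1100.0)
```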

Governance

  • Audit log: question + generated SQL + bytes processed + result + user.
  • Real identity: query runs with the user's own permission, not a service account.
  • PII handling: never expose CPF/e-mail in an aggregated answer — aggregation is mandatory on sensitive columns.
  • DPIA for analysis over personal data — see DPIA for Gemini Enterprise projects.
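The audit-log bullet maps to one structured record per answered question. The field names below are illustrative; emit whatever your logging pipeline expects, but keep the generated SQL and bytes processed in every entry.

```python
from dataclasses import dataclass, asdict
import json
import time

# Sketch of the per-question audit record: question + generated SQL +
# bytes processed + result size + user, serialized as one JSON log line.

@dataclass
class AuditEntry:
    user: str
    question: str
    generated_sql: str
    bytes_processed: int
    row_count: int
    ts: float

entry = AuditEntry(
    user="ana@example.com",
    question="revenue by region in March?",
    generated_sql="SELECT region, SUM(amount) FROM fact_orders GROUP BY region",
    bytes_processed=200_000_000,
    row_count=5,
    ts=time.time(),
)
log_line = json.dumps(asdict(entry))
```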

When NOT to use text-to-SQL

  • Questions requiring judgment ("what should we do?"): combine with complementary analysis.
  • Critical decisions (price, customer offer): agent suggests, human decides.
  • Complex multi-table analysis with deep business logic: better to use traditional BI + agent to summarize.

Text-to-SQL works when the agent knows what it cannot touch. A curated semantic layer + whitelist + dry-run beats a giant prompt with 200 tables dumped into it.
Text-to-SQL POC

Text-to-SQL in production over your BigQuery/Snowflake/Postgres?

Autenticare delivers a POC in 4–6 weeks: semantic layer + agent + dashboard + evaluation gold set. You leave with a measured execution match and a rollout plan.