AI agent over structured data: text-to-SQL with Gemini + BigQuery in practice
RAG handles documents. But what about questions over structured data in BigQuery, Snowflake, or PostgreSQL? The text-to-SQL pattern with Gemini that actually works — and what still requires a human.
Fabiano Brito
CEO & Founder
Half of the questions executives ask the "company ChatGPT" are analytical: comparisons, totals, trends, segmentation. Without text-to-SQL, the agent replies "check the BI tool." With a well-built text-to-SQL, the agent returns the right number with a chart.
Standard architecture (7 steps)
1. Input: raw user question, with session context.
2. Retrieval: the agent searches the catalog (semantic layer) for tables that cover the topic.
3. Generation: Gemini 2.5 Pro produces a parameterized query in the warehouse dialect.
4. Validation: SQL parser + operation whitelist + ACL + dry run.
5. Execution: BigQuery/Snowflake/Postgres with the user's own identity, not a service account.
6. Formatting: the agent formats the result and suggests a visualization.
7. Response: number + table + optional chart + the query used (for auditing).
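Concretely, the flow looks like the sketch below: a minimal skeleton of the seven steps. Every helper here is a placeholder we are assuming for illustration, not a real SDK.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Answer:
    value: Any                 # the number or table the user asked for
    sql: str                   # the exact query executed, kept for auditing
    chart_hint: str | None = None

def find_tables(question: str) -> list[str]:
    # 2. Semantic-layer lookup (placeholder).
    return ["sales.orders"]

def generate_sql(question: str, tables: list[str]) -> str:
    # 3. Gemini 2.5 Pro call (placeholder).
    return "SELECT SUM(value) FROM sales.orders WHERE status = 'confirmed' LIMIT 1000"

def validate(sql: str, tables: list[str], user: str) -> None:
    # 4. Parser + whitelists + ACL + dry run (see the guardrail sketch further down).
    pass

def execute_as(user: str, sql: str) -> Any:
    # 5. Runs with the user's own identity (placeholder result).
    return 1_234_567.89

def answer(question: str, user: str) -> Answer:
    # 1. Raw input with session context arrives here.
    tables = find_tables(question)
    sql = generate_sql(question, tables)
    validate(sql, tables, user)
    rows = execute_as(user, sql)
    # 6-7. Format: number + table + optional chart + the query used.
    return Answer(value=rows, sql=sql, chart_hint="bar")
```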
The key component: semantic layer
The model doesn't memorize your data warehouse schema. Without a semantic layer, it guesses table and column names.
A semantic layer is a curated catalog:
- Tables and columns with descriptions in PT/EN.
- Synonyms ("receita" = "revenue" = "faturamento").
- Relationships between tables (explicit foreign keys).
- Pre-defined metrics ("average ticket = SUM(value)/COUNT(order)").
- Default filters ("confirmed orders only").
- Temporal and geographic granularity.
Tools: dbt + Looker semantic layer, Cube.js, or your own YAML definition. In Autenticare projects, we standardize on versioned YAML.
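As a rough sketch, one table entry in that YAML might look like the snippet below, loaded in Python. Field names such as synonyms, metrics, and default_filters are an illustrative convention, not a fixed schema.

```python
import yaml  # pip install pyyaml

ORDERS_YAML = """
table: sales.orders
description: Customer orders, one row per order, values in BRL.
synonyms: [pedidos, vendas, orders]
columns:
  value: {description: Order total, synonyms: [receita, faturamento, revenue]}
  status: {description: Order lifecycle state}
  order_date: {description: Date the order was placed, grain: day}
joins:
  - {to: sales.stores, condition: orders.store_id = stores.id}
metrics:
  average_ticket: SUM(value) / COUNT(order_id)
default_filters:
  - status = 'confirmed'
"""

orders = yaml.safe_load(ORDERS_YAML)
print(orders["metrics"]["average_ticket"])  # SUM(value) / COUNT(order_id)
```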
Prompt patterns for text-to-SQL
Always include in the prompt (assembled in the sketch after this list):
- Schema of the relevant tables (full DDL).
- 3–5 examples of question → well-formed SQL.
- Explicit dialect ("PostgreSQL 15", "BigQuery Standard SQL").
- Constraints: "always use LIMIT 1000", "never DELETE/UPDATE/DROP", "use named parameters".
- Output format: raw SQL inside a code fence, no extra comments.
- Uncertainty rule: "if there is no data to answer, return null + explanation".
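A hedged sketch of how those ingredients assemble into one prompt; the wording and rules below are illustrative, not a tested production prompt.

```python
from datetime import date

def build_prompt(question: str, ddl: str, examples: list[tuple[str, str]]) -> str:
    # Few-shot pairs: (question, well-formed SQL).
    shots = "\n\n".join(f"Q: {q}\nSQL: {sql}" for q, sql in examples)
    today = date.today().isoformat()
    return (
        "You write BigQuery Standard SQL only.\n\n"
        f"Schema (DDL):\n{ddl}\n\n"
        f"Examples:\n{shots}\n\n"
        "Rules:\n"
        "- SELECT only; never DELETE/UPDATE/DROP.\n"
        "- Always end with LIMIT 1000.\n"
        "- Use named parameters (@param) for user-supplied literals.\n"
        f"- Today is {today}; if no year is given, assume the current year.\n"
        "- If the schema cannot answer the question, return null plus a one-line explanation.\n"
        "- Output raw SQL only, no commentary.\n\n"
        f"Q: {question}\nSQL:"
    )
```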
See broader patterns in corporate prompt engineering.
Mandatory validation before execution
- SQL parser: ensures it is valid SQL (not a comment, not loose text).
- Operation whitelist: SELECT only.
- Table whitelist: agent may only touch tables marked "agent-accessible".
- ACL check: does the user have permission on the table?
- Dry run in BigQuery (zero cost, validates syntax and estimates bytes).
- Bytes-processed cap (cost ceiling).
- Execution timeout.
One SELECT * FROM events on a 50 TB table and the DBA becomes famous for the wrong reasons. Dry run, table whitelist, and byte cap are not optional: they are the difference between text-to-SQL in production and a BigQuery billing incident.
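Here is a minimal sketch of those guardrails, assuming sqlglot for parsing and the official google-cloud-bigquery client for the dry run. The whitelist contents and the 10 GB cap are placeholders.

```python
import sqlglot
from sqlglot import exp
from google.cloud import bigquery

ALLOWED_TABLES = {"sales.orders", "sales.stores"}  # tables marked agent-accessible
MAX_BYTES = 10 * 1024**3                           # per-query cost ceiling (placeholder)

def validate(sql: str, client: bigquery.Client) -> int:
    # 1. Parse: raises sqlglot.errors.ParseError on invalid SQL.
    tree = sqlglot.parse_one(sql, read="bigquery")
    # 2. Operation whitelist: SELECT only.
    if not isinstance(tree, exp.Select):
        raise ValueError("operation whitelist: SELECT only")
    # 3. Table whitelist (the per-user ACL check would go here too).
    for table in tree.find_all(exp.Table):
        name = ".".join(p for p in (table.db, table.name) if p)
        if name not in ALLOWED_TABLES:
            raise PermissionError(f"table not agent-accessible: {name}")
    # 4. Dry run: free, validates against the live schema, estimates bytes.
    job = client.query(
        sql,
        job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False),
    )
    if job.total_bytes_processed > MAX_BYTES:
        raise ValueError(f"query would scan {job.total_bytes_processed} bytes")
    # For the real execution, also set maximum_bytes_billed and a job timeout.
    return job.total_bytes_processed
```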
Hybrid "analytical agent" pattern
In real deployments, it is better to combine three modes (routed as in the sketch after this list):
- Pre-defined metrics (semantic layer): for common questions, runs a ready query with parameters. Faster and more reliable.
- Free text-to-SQL: for ad-hoc questions outside the catalog. Extra validation required.
- RAG over a dictionary: agent consults business documentation (policies, definitions) before generating SQL.
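A sketch of that routing, assuming a small catalog of parameterized metric queries. Matching by metric name is a simplification; in practice the router is itself a cheap classification call.

```python
METRIC_QUERIES = {
    # Curated, already validated, runs with parameters only.
    "average_ticket": (
        "SELECT SUM(value) / COUNT(order_id) FROM sales.orders "
        "WHERE status = 'confirmed' AND order_date BETWEEN @start AND @end"
    ),
}

def lookup_dictionary(question: str) -> str:
    return ""  # placeholder for the RAG step over business definitions

def generate_sql(question: str, context: str) -> str:
    return "SELECT 1"  # placeholder for the Gemini call

def route(question: str, metric: str | None) -> str:
    if metric in METRIC_QUERIES:
        return METRIC_QUERIES[metric]       # fast path: pre-defined metric
    context = lookup_dictionary(question)   # consult the dictionary first
    return generate_sql(question, context)  # slow path: full validation applies
```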
What goes wrong with naive text-to-SQL
Forgotten joins
"How much did we sell by region?" — agent forgets the JOIN with the store table and returns the wrong number. Solution: recommended joins in the semantic layer.
Undeclared filters
"Monthly revenue" — agent doesn't filter status="confirmed". Always include a default filter in the semantic layer.
Temporal granularity
"March sales" — March of which year? Without context the agent assumes wrongly. Include the current date in the prompt + rule: "if the year is not specified, use the current year".
Currency conversion
Multinational with BRL, USD, ARS in the same table: agent sums everything without converting. A derived metric in the semantic layer solves this.
Confusion between similar metrics
"Net revenue" vs. "gross revenue" vs. "GMV" — without a clear definition the agent chooses badly. A glossary with canonical definitions solves this.
Specific evaluation
A gold set for text-to-SQL differs from one for RAG. Each entry contains:
- Natural-language question.
- Expected SQL (canonical) or a set of equivalent SQLs.
- Expected result (numbers).
- Primary metric: execution match (result matches, even if SQL is written differently).
- Secondary metric: logical form match (correct SQL structure).
An execution match of 90%+ is a realistic target for a corpus of 100–300 questions, computed as in the sketch below. Details in agent evaluation in production.
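A sketch of the execution-match computation; the float rounding and order-insensitive comparison are assumptions to adapt to your result shapes.

```python
def execution_match(expected: list[tuple], got: list[tuple]) -> bool:
    # Order-insensitive comparison with float rounding.
    def norm(rows):
        return sorted(
            tuple(round(v, 6) if isinstance(v, float) else v for v in row)
            for row in rows
        )
    return norm(expected) == norm(got)

def score(gold: list[dict], agent, run) -> float:
    # `agent` maps question -> SQL; `run` executes SQL and returns rows.
    hits = 0
    for item in gold:
        generated = agent(item["question"])
        hits += execution_match(run(item["expected_sql"]), run(generated))
    return hits / len(gold)
```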
Costs to control
- BigQuery bytes: limit per query and per user/day.
- Gemini tokens: a large schema in the prompt consumes tokens. Select only relevant tables via a first router call.
- Query cache: identical question within a short window = result from cache, no re-execution.
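A sketch of that cache. Keying on user identity matters: two users with different ACLs must never share a cached result. The 5-minute window is an assumption.

```python
import hashlib
import time

_CACHE: dict[str, tuple[float, object]] = {}
TTL_SECONDS = 300  # "short window" (placeholder)

def cached_run(user: str, question: str, run) -> object:
    # Cache key: user identity + normalized question.
    key = hashlib.sha256(f"{user}:{question.strip().lower()}".encode()).hexdigest()
    hit = _CACHE.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]  # cache hit: no re-execution, no bytes billed
    result = run(question)
    _CACHE[key] = (time.time(), result)
    return result
```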
Governance
- Audit log: question + generated SQL + bytes processed + result + user (one record sketched after this list).
- Real identity: query runs with the user's own permission, not a service account.
- PII handling: never expose CPF (the Brazilian taxpayer ID) or e-mail addresses in answers; aggregation is mandatory on sensitive columns.
- DPIA for analysis over personal data — see DPIA for Gemini Enterprise projects.
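As a sketch, one audit record per query, appended as a JSON line. Field names are an assumption mirroring the list above; note the row count is logged, not the raw rows, to avoid persisting PII.

```python
import json
import time

def audit(user: str, question: str, sql: str, bytes_processed: int,
          row_count: int, path: str = "audit.jsonl") -> None:
    record = {
        "ts": time.time(),
        "user": user,                       # real identity, not a service account
        "question": question,
        "sql": sql,                         # the exact query executed
        "bytes_processed": bytes_processed,
        "row_count": row_count,             # size only, never the raw rows
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```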
When NOT to use text-to-SQL
- Questions requiring judgment ("what should we do?"): the agent supplies the numbers; the recommendation needs human analysis.
- Critical decisions (price, customer offer): agent suggests, human decides.
- Complex multi-table analysis with deep business logic: better to use traditional BI + agent to summarize.
Text-to-SQL works when the agent knows what it cannot touch. Curated semantic layer + whitelist + dry-run >> giant prompt with 200 tables dumped inside.
Text-to-SQL in production over your BigQuery/Snowflake/Postgres?
Autenticare delivers a POC in 4–6 weeks: semantic layer + agent + dashboard + evaluation gold set. You leave with a measured execution match and a rollout plan.
