AI agent over structured data: text-to-SQL with Gemini + BigQuery in practice
RAG handles documents. But what about questions over structured data in BigQuery, Snowflake, or PostgreSQL? Here is the text-to-SQL pattern with Gemini that actually works, and what still requires a human.
Fabiano Brito
CEO & Founder
Half of the questions executives ask the "company ChatGPT" are analytical: comparisons, totals, trends, segmentation. Without text-to-SQL, the agent replies "check the BI tool." With a well-built text-to-SQL pipeline, the agent returns the right number, with a chart.
| Aspect | Naive Text-to-SQL | Gemini + Semantic Layer |
|---|---|---|
| Schema Knowledge | Guesses table and column names | Uses curated, versioned catalog |
| Query Validation | Executes raw generated SQL directly | Dry-runs, parses, and whitelists operations |
| Security & ACL | Shared service account (high risk) | User's own IAM identity and credentials |
Standard architecture (7 steps)
1. Question: the raw user input, with session context.
2. Schema retrieval: the agent searches the catalog (semantic layer) for tables that cover the topic.
3. SQL generation: Gemini 2.5 Pro produces a parameterized query in the warehouse dialect.
4. Validation: SQL parser + operation whitelist + ACL check + dry-run.
5. Execution: the query runs on BigQuery/Snowflake/Postgres under the user's own identity, not a shared service account.
6. Post-processing: the agent formats the result and suggests a visualization.
7. Answer: number + table + optional chart + the query used (for auditing).
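Most of the safety lives in the validation step (SQL parser + operation whitelist + ACL + dry-run). Below is a minimal sketch of the whitelist part only, assuming a single read-only SELECT is the only allowed operation and using just the Python standard library. The names `validate_sql`, `SQLRejected`, `ALLOWED_TABLES` and the `sales.*` tables are hypothetical. Regex checks are a cheap, conservative first filter; a production guard would use a real SQL parser (for example sqlglot) and then the warehouse's own dry-run (BigQuery query jobs accept `dry_run=True`), with ACLs enforced by running under the user's identity.

```python
import re

# Hypothetical whitelist: only tables the semantic layer exposes to the agent.
ALLOWED_TABLES = {"sales.orders", "sales.customers"}
MAX_ROWS = 1000

# Operations the agent must never run, whatever the model generated.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|MERGE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+`?([\w.-]+)`?", re.IGNORECASE)
CTE_NAME = re.compile(r"\b(\w+)\s+AS\s*\(", re.IGNORECASE)
EXTRACT_CALL = re.compile(r"\bEXTRACT\s*\([^)]*\)", re.IGNORECASE)
TRAILING_LIMIT = re.compile(r"\bLIMIT\s+(\d+)\s*$", re.IGNORECASE)


class SQLRejected(ValueError):
    """Raised when generated SQL fails the guard."""


def validate_sql(sql: str) -> str:
    """Return a row-capped SELECT or raise SQLRejected.

    Conservative by design: it may reject valid SQL, never the reverse on purpose.
    """
    query = sql.strip().rstrip(";").strip()
    if ";" in query:
        raise SQLRejected("multiple statements are not allowed")
    if not re.match(r"(SELECT|WITH)\b", query, re.IGNORECASE):
        raise SQLRejected("only SELECT queries are allowed")
    if FORBIDDEN.search(query):
        raise SQLRejected("forbidden operation in query")

    # EXTRACT(YEAR FROM col) contains FROM but is not a table reference.
    scan = EXTRACT_CALL.sub("", query)
    ctes = {name.lower() for name in CTE_NAME.findall(scan)}
    tables = {t.lower() for t in TABLE_REF.findall(scan)} - ctes
    unknown = tables - ALLOWED_TABLES
    if unknown:
        raise SQLRejected(f"tables outside the catalog: {sorted(unknown)}")

    # Naive row cap: only a trailing "LIMIT n" is recognized.
    match = TRAILING_LIMIT.search(query)
    if match is None:
        return f"{query}\nLIMIT {MAX_ROWS}"
    if int(match.group(1)) > MAX_ROWS:
        return query[: match.start()] + f"LIMIT {MAX_ROWS}"
    return query
```

For example, `validate_sql("DELETE FROM sales.orders")` raises `SQLRejected`, while a plain aggregate over `sales.orders` comes back with `LIMIT 1000` appended. Only queries that pass this filter would move on to the dry-run and execution steps.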
The key component: semantic layer
The model doesn't memorize your data warehouse schema. Without a semantic layer, it guesses table and column names.
A semantic layer is a curated catalog:
- Tables and columns with descriptions in Portuguese and English (PT/EN).
- Synonyms ("receita" = "revenue" = "faturamento").
- Relationships between tables (explicit foreign keys).
- Pre-defined metrics ("average ticket = SUM(value)/COUNT(order)").
- Default filters ("confirmed orders only").
- Temporal and geographic granularity.
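A minimal in-memory sketch of such a catalog, assuming a simple dataclass structure rather than the format of any specific tool; the class names, fields, and the `sales.orders` table are hypothetical. It shows how synonyms resolve a business term to one canonical metric, with its formula, default filter, and time column, before any SQL is generated. Relationships between tables are omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metric:
    """Hypothetical metric entry in the semantic layer."""
    name: str
    expression: str                        # governed SQL formula
    table: str                             # table the metric is computed on
    synonyms: list[str] = field(default_factory=list)
    default_filter: str | None = None      # applied unless the user overrides it
    time_column: str | None = None         # column used for temporal granularity


@dataclass
class SemanticLayer:
    metrics: dict[str, Metric] = field(default_factory=dict)

    def add(self, metric: Metric) -> None:
        self.metrics[metric.name] = metric

    def resolve(self, term: str) -> Metric | None:
        """Map a business term, in any language, to a canonical metric."""
        needle = term.strip().lower()
        for metric in self.metrics.values():
            if needle == metric.name or needle in (s.lower() for s in metric.synonyms):
                return metric
        return None


layer = SemanticLayer()
layer.add(Metric(
    name="revenue",
    expression="SUM(value)",
    table="sales.orders",
    synonyms=["receita", "faturamento"],
    default_filter="status = 'confirmed'",
    time_column="order_date",
))
layer.add(Metric(
    name="average_ticket",
    expression="SUM(value) / COUNT(DISTINCT order_id)",
    table="sales.orders",
    synonyms=["average ticket", "ticket médio"],
    default_filter="status = 'confirmed'",
    time_column="order_date",
))
```

With this in place, `layer.resolve("Faturamento")` and `layer.resolve("receita")` both return the `revenue` entry, and an unknown term returns `None`, which is the cue for the agent to ask a clarifying question instead of guessing.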
Eliminates Schema Hallucinations
By mapping business terms like "revenue" or "faturamento" to exact database columns, the semantic layer means the model no longer has to guess table or column names.
Enforces Business Logic
Pre-defined metrics (e.g., average ticket calculations) and default filters ensure the AI uses the exact same formulas as your BI tools.
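One way to make sure the agent uses the same formulas as the BI tools is to let the model choose a metric and its dimensions while the SQL for the metric itself is compiled from the governed definition. The sketch below assumes a hypothetical dict layout, a hypothetical `compile_metric` helper, and hypothetical table and column names; the extra filter uses a BigQuery-style named parameter (`@start_date`) so user values are bound at execution time rather than pasted into the SQL.

```python
from __future__ import annotations

# Hypothetical metric definitions, as they might be loaded from the semantic layer.
METRICS = {
    "average_ticket": {
        "expression": "SUM(value) / COUNT(DISTINCT order_id)",
        "table": "sales.orders",
        "default_filter": "status = 'confirmed'",
    },
}


def compile_metric(name: str, group_by: list[str] | None = None,
                   extra_filter: str | None = None) -> str:
    """Build SQL from a governed metric definition instead of a model-written formula."""
    metric = METRICS[name]
    filters = [metric["default_filter"]]       # default filter is always applied
    if extra_filter:
        filters.append(extra_filter)
    dims = group_by or []
    select = ", ".join(dims + [f"{metric['expression']} AS {name}"])
    sql = (f"SELECT {select}\nFROM {metric['table']}\nWHERE "
           + " AND ".join(f"({f})" for f in filters))
    if dims:
        sql += "\nGROUP BY " + ", ".join(dims)
    return sql


print(compile_metric("average_ticket", ["region"], "order_date >= @start_date"))
```

The design choice here is that the model never writes `SUM(...) / COUNT(...)` itself, so a question phrased ten different ways still lands on one formula and one default filter.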
Tools: the dbt and Looker semantic layers, Cube.js, or your own YAML definitions. In Autenticare projects, we standardize on versioned YAML.
💡 Key Insight: Version Control is Mandatory
Treat your semantic layer as code. Storing your YAML definitions in Git allows you to track changes, run CI/CD tests, and prevent breaking changes from reaching your production AI agents.
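A sketch of the kind of CI test this enables, assuming the YAML has already been loaded into plain dicts (for example with PyYAML's `yaml.safe_load`) and that each metric lists the fully qualified columns it uses. The keys `tables`, `metrics`, `columns`, `synonyms` and the function `check_semantic_layer` are hypothetical. The check fails when a metric references a column that no longer exists or when two metrics claim the same synonym, two changes that would otherwise silently change the agent's answers.

```python
def check_semantic_layer(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec passes."""
    errors: list[str] = []
    known_columns = {
        f"{table}.{column}"
        for table, cfg in spec.get("tables", {}).items()
        for column in cfg.get("columns", [])
    }
    seen_synonyms: dict[str, str] = {}   # lowercased term -> owning metric
    for name, metric in spec.get("metrics", {}).items():
        for column in metric.get("columns", []):
            if column not in known_columns:
                errors.append(f"metric {name!r} uses unknown column {column!r}")
        for synonym in [name, *metric.get("synonyms", [])]:
            owner = seen_synonyms.setdefault(synonym.lower(), name)
            if owner != name:
                errors.append(f"synonym {synonym!r} used by both {owner!r} and {name!r}")
    return errors


# Hypothetical spec, shaped like a loaded YAML file.
spec = {
    "tables": {"sales.orders": {"columns": ["order_id", "value", "status", "order_date"]}},
    "metrics": {
        "revenue": {"columns": ["sales.orders.value"], "synonyms": ["receita", "faturamento"]},
        "average_ticket": {"columns": ["sales.orders.value", "sales.orders.order_id"],
                           "synonyms": ["ticket"]},
    },
}
```

In a CI job this would typically run as a test such as `assert not check_semantic_layer(spec)`, so a pull request that breaks the catalog never reaches the production agent.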
Prompt patterns for text-to-SQL
Always include in the prompt:
- Schema of the relevant tables (full DDL).
- 3–5 examples of question → well-formed SQL.
- Explicit dialect ("PostgreSQL 15", "BigQuery Standard SQL").
- Constraints: "always use LIMIT 1000", "never DELETE/UPDATE/DROP", "use named parameters".
- Output format: raw SQL inside a code fence, no extra comments.
- Uncertainty rule: "if there is no data to answer, return null + explanation".
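The checklist above can be turned into a small prompt builder. This is a sketch under stated assumptions: `build_sql_prompt`, `RULES`, and the wording of each rule are hypothetical, the example schema and queries are illustrative, and the call that sends the string to Gemini is left out because it depends on the SDK you use. The code fence is built at runtime only so this example's own markdown stays intact.

```python
FENCE = "`" * 3  # three backticks, built at runtime

# Hypothetical wording of the constraints, output format and uncertainty rule.
RULES = [
    "Always end the query with LIMIT 1000.",
    "Never use DELETE, UPDATE, DROP or any other statement that modifies data.",
    "Use named parameters (for example @start_date) instead of literal user values.",
    f"Return only the SQL inside a single {FENCE}sql code fence, with no extra comments.",
    "If the schema cannot answer the question, return null and a one-line explanation.",
]


def build_sql_prompt(question: str, ddl: list[str],
                     examples: list[tuple[str, str]], dialect: str) -> str:
    """Assemble schema, few-shot examples, dialect, constraints and output rules."""
    if not 3 <= len(examples) <= 5:
        raise ValueError("use 3-5 question -> SQL examples")
    shots = "\n\n".join(f"Question: {q}\n{FENCE}sql\n{sql}\n{FENCE}" for q, sql in examples)
    rules = "\n".join(f"- {rule}" for rule in RULES)
    schema = "\n\n".join(ddl)
    return "\n\n".join([
        f"You write {dialect} queries for an analytics assistant.",
        f"Schema (DDL):\n{schema}",
        f"Examples:\n{shots}",
        f"Rules:\n{rules}",
        f"Question: {question}",
    ])


ddl = ["CREATE TABLE sales.orders (order_id STRING, value NUMERIC, status STRING, order_date DATE)"]
examples = [
    ("Total confirmed revenue?",
     "SELECT SUM(value) AS revenue FROM sales.orders WHERE status = 'confirmed' LIMIT 1000"),
    ("Orders per status?",
     "SELECT status, COUNT(*) AS orders FROM sales.orders GROUP BY status LIMIT 1000"),
    ("Revenue since a date?",
     "SELECT SUM(value) AS revenue FROM sales.orders "
     "WHERE status = 'confirmed' AND order_date >= @start_date LIMIT 1000"),
]
prompt = build_sql_prompt("What was the average ticket last month?", ddl, examples,
                          "BigQuery Standard SQL")
```

Keeping the question last and the rules just before it is a common choice, so the constraints are the most recent instructions the model reads before it answers.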
Frequently Asked Questions
What is the main advantage of using text-to-SQL with Gemini and BigQuery?
With well-executed text-to-SQL, the agent returns the correct number with a chart, instead of just telling the user to check the BI tool.
What are the steps of the standard architecture for text-to-SQL with Gemini and BigQuery?
The standard architecture involves 7 steps, from the question in natural language to the auditable answer, including schema retrieval, SQL generation, validation, execution, and post-processing.
Why is the 'semantic layer' a key component in the text-to-SQL architecture?
Without a 'semantic layer', the model may guess table and column names, compromising the accuracy of the results.
What is a 'semantic layer' in the context of text-to-SQL?
The 'semantic layer' is a curated catalog that contains tables and columns with descriptions, synonyms, relationships between tables, pre-defined metrics, default filters, and temporal and geographic granularity.
Ready to build your enterprise AI Agent?
Talk to our specialists about implementing a secure, validated text-to-SQL architecture over BigQuery or Snowflake.