Corporate RAG with Vertex AI Search: architecture that works at scale
RAG is no longer demo code. In real projects, the difference between a proof of concept and production lies in chunking, reranking, citations, and governance. A technical guide with Vertex AI Search.
Fabiano Brito
CEO & Founder
RAG (Retrieval-Augmented Generation) has become a commodity in demos. But when the project leaves the notebook and enters operations, problems appear that never show up with 10 documents: inconsistent search latency, hallucinated answers over proprietary bases, painful incremental updates, and auditing that is effectively impossible.
This post is the playbook we use in Autenticare projects with Vertex AI Search — the RAG engine of Gemini Enterprise.
1. Chunking: the most expensive mistake
The default approach, breaking documents into fixed 512-token blocks, works for Wikipedia. It fails on corporate PDFs with tables, contracts with cross-referenced clauses, and technical bases with captioned diagrams.
Strategy that works:
- Semantic chunking: respect headings (H1-H4), complete paragraphs, and whole tables. Variable size, from 200 to 1,500 tokens.
- 15% overlap between adjacent chunks to preserve context at the edges.
- Rich metadata: document, section, date, author, jurisdiction, sensitivity classification. Vertex AI Search indexes all of it natively.
- Pre-summarization of each chunk to boost recall (the summary goes in as a separate, retrievable field).
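A minimal sketch of the strategy above, assuming markdown-style headings and approximating token count by whitespace-separated words (a real pipeline would use the model tokenizer and a structure-aware parser for PDFs and tables):

```python
import re

def semantic_chunks(markdown_text, min_tokens=200, max_tokens=1500, overlap=0.15):
    """Split a document on H1-H4 headings, then pack each section into
    chunks of 200-1500 "tokens" (approximated by words) with 15% overlap."""
    # Split into sections at heading boundaries, keeping each heading with its body.
    sections = re.split(r"(?m)^(?=#{1,4} )", markdown_text)
    chunks = []
    for section in filter(None, sections):
        heading = section.splitlines()[0].lstrip("# ").strip()
        words = section.split()
        step = max(1, int(max_tokens * (1 - overlap)))  # stride leaves ~15% overlap
        start = 0
        while start < len(words):
            piece = words[start:start + max_tokens]
            # Merge a too-small trailing piece into the previous chunk
            # instead of emitting a fragment below min_tokens.
            if chunks and len(piece) < min_tokens and start > 0:
                chunks[-1]["text"] += " " + " ".join(piece)
                break
            chunks.append({"section": heading, "text": " ".join(piece)})
            start += step
    return chunks
```

Each chunk carries its section heading, which becomes part of the rich metadata indexed alongside the text.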
2. Reranking: the second retrieval nobody talks about
Embedding search returns the top-50 relevant candidates, but order matters. Without a reranker, the LLM receives contaminated context and answers poorly.
Vertex AI Search has a native reranker (a cross-encoder) that takes the top-50 and returns the top-5 ordered by contextual relevance. It should be on by default, but many teams forget to enable it.
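The two-stage pattern can be sketched in plain Python; `overlap_score` below is a toy stand-in for a real cross-encoder, which Vertex AI Search applies natively when the reranker is enabled:

```python
def rerank(query, candidates, score_fn, top_k=5):
    """Second-stage retrieval: score each (query, passage) pair and keep
    only the best top_k, in relevance order. `score_fn` stands in for a
    cross-encoder that jointly encodes the pair."""
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]

def overlap_score(query, passage):
    # Toy scorer: fraction of query terms present in the passage.
    # A real cross-encoder outputs a learned relevance logit instead.
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(1, len(q))
```

The point of the second stage is that scoring the query and passage *together* is far more precise than comparing two independently computed embeddings, which is why the final top-5 beats the raw top-50 order.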
3. Mandatory citations
RAG without citation is disguised hallucination. Every response must include where it came from: document, page, paragraph. In compliance and legal contexts, without this the output has no evidentiary value.
In Vertex AI Search, this is a configuration parameter: include_citations: true. In the prompt, instruct explicitly: "if the answer is not in the retrieved documents, say 'I did not find it in the base'; do not invent". In well-configured RAG, this brings hallucination below 1%.
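One way to enforce the citation rule mechanically is a post-generation gate that runs before the answer reaches the user; the function and variable names here are illustrative, not part of any Vertex API:

```python
FALLBACK = "I did not find it in the base"

def validate_answer(answer, citations, retrieved_ids):
    """Reject any answer that neither cites a retrieved document nor
    admits the base does not contain the information."""
    if FALLBACK.lower() in answer.lower():
        return True  # an honest refusal is a valid outcome
    # Every cited document must actually come from this retrieval pass,
    # otherwise the citation itself may be hallucinated.
    return bool(citations) and all(c in retrieved_ids for c in citations)
```

Answers that fail the gate can be replaced with the fallback string, preserving evidentiary value in compliance and legal contexts.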
4. Incremental updates
Re-indexing the entire base weekly is expensive and slow. Vertex AI Search accepts per-document upsert via API: you update only what changed. In projects with 500k+ documents, this reduces operating cost by 90%.
Autenticare recommendation: a Cloud Run pipeline that listens for changes in Drive/SharePoint/Confluence and triggers upsert on the index. Typical latency: 2–5 minutes between edit and availability in the agent.
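The skip-unchanged logic at the heart of such a pipeline can be sketched with a content hash; `upsert_fn` stands in for the actual Vertex AI Search per-document upsert call, and `index_state` for whatever store (e.g. Firestore) tracks what is already indexed:

```python
import hashlib

def sync_document(doc_id, content, index_state, upsert_fn):
    """Idempotent incremental sync: upsert only when the content hash
    changed since the last run, so unchanged documents cost nothing."""
    digest = hashlib.sha256(content.encode()).hexdigest()
    if index_state.get(doc_id) == digest:
        return False  # unchanged: skip the API call entirely
    upsert_fn(doc_id, content)   # re-index just this one document
    index_state[doc_id] = digest
    return True
```

Because the function is idempotent, the Cloud Run handler can safely receive duplicate change events from Drive/SharePoint/Confluence webhooks without triggering redundant indexing work.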
5. LGPD governance
RAG is the point where personal data appears most. Three non-negotiable rules:
- DLP at ingest: mask CPF, email, phone, and other sensitive data before indexing.
- ACL at retrieval: the agent retrieves only documents the requesting user is allowed to see. Vertex AI Search supports filtering by Workspace group or native IAM.
- Complete audit log: who asked what, which documents were retrieved, and what response was generated. Mandatory for ANPD inspection.
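To make the DLP-at-ingest rule concrete, here is a regex-only sketch for Brazilian CPF, email, and phone formats; production should use the Cloud DLP API instead, which covers many more infotypes and validates checksums:

```python
import re

# Illustrative patterns only. Cloud DLP's built-in infotypes are far more
# robust (checksum validation, context clues, international formats).
CPF_RE = re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE_RE = re.compile(r"\(\d{2}\)\s?\d{4,5}-\d{4}")

def mask_pii(text):
    """Mask personal data BEFORE the chunk reaches the index, so no
    embedding or search result ever contains the raw value."""
    text = CPF_RE.sub("[CPF]", text)
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text
```

Masking at ingest, rather than at response time, is what guarantees personal data never enters embeddings, logs, or caches downstream.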
We detail the opt-out and the LGPD-compatible setup in our post on Gemini Enterprise training opt-out.
Reference architecture: 90 days
1. Map sources (Drive, SharePoint, Confluence, databases), classify sensitivity, and decide the ACL model by group.
2. Semantic chunking + DLP (Cloud DLP API) + Vertex AI Search indexing with rich metadata.
3. Agent consuming the index, with reranker and mandatory citations configured.
4. Gold set of 50–100 questions with validated answers, prompt tuning, confidence threshold.
5. Gradual release to real users, quality and cost dashboards, weekly human review.
Real cost: 100k-document base / 500 users
| Component | Monthly cost |
|---|---|
| Vertex AI Search (storage + queries) | ~US$ 1,500 |
| Gemini Enterprise Standard (500 × US$ 30) | US$ 15,000 |
| Cloud Run + DLP + logs | ~US$ 200 |
| Total | ~US$ 16,700 |
Estimate your ROI with our calculator.
Is your knowledge base ready to become an agent?
In 90 days we design, index and deliver a corporate RAG agent with LGPD governance, mandatory citation and active gold set. Google Cloud Premier Partner.
