
Corporate RAG with Vertex AI Search: architecture that works at scale

RAG is no longer demo code. In real projects, the difference between proof-of-concept and production lies in chunking, reranking, citations and governance. Technical guide with Vertex AI Search.

Fabiano Brito


CEO & Founder

TL;DR: 80% of RAG projects fail in production for one of three reasons: naive chunking, absence of reranking, or lack of mandatory citations. Vertex AI Search (part of Gemini Enterprise) addresses all three by default — as long as you design the indexing correctly.

RAG (Retrieval-Augmented Generation) has become a commodity in demos. But when the project leaves the notebook and enters operations, problems arise that never show up with 10 documents: inconsistent search latency, hallucinated responses on proprietary bases, painful incremental updates, and impossible auditing.

This post is the playbook we use in Autenticare projects with Vertex AI Search — the RAG engine of Gemini Enterprise.


1. Chunking: the most expensive mistake

The default — breaking into 512-token blocks — works for Wikipedia. It fails on corporate PDFs with tables, contracts with referenced clauses, and technical bases with captioned diagrams.

Strategy that works:

  • Semantic chunking: respect headings (H1-H4), complete paragraphs, full tables. Variable size from 200 to 1500 tokens.
  • 15% overlap to preserve edge context.
  • Rich metadata: document, section, date, author, jurisdiction, sensitivity classification. Vertex AI Search indexes everything natively.
  • Pre-summarization of each chunk for recall boost (the summary goes in as a separate, retrievable field).
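The bullets above can be sketched as a paragraph-aware splitter with overlap and attached metadata. This is a minimal illustration in plain Python, not the Vertex AI Search ingestion API; the function name, the whitespace word count used as a token proxy, and the metadata shape are all assumptions for the sketch.

```python
import re

def semantic_chunks(text, metadata, min_tokens=200, max_tokens=1500, overlap=0.15):
    """Split text into variable-size chunks that respect paragraph
    boundaries, carry ~15% overlap between chunks, and attach metadata.

    Token counts are approximated by whitespace words; a real pipeline
    would use the embedding model's tokenizer and also keep headings
    and tables intact.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current, size = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        # Close the current chunk when adding this paragraph would
        # exceed max_tokens and we already have at least min_tokens.
        if current and size + n > max_tokens and size >= min_tokens:
            body = "\n\n".join(current)
            chunks.append({"text": body, **metadata})
            # Carry roughly `overlap` of the previous chunk forward
            # to preserve context at the boundary.
            keep = max(1, int(size * overlap))
            tail = body.split()[-keep:]
            current, size = [" ".join(tail)], keep
        current.append(para)
        size += n
    if current:
        chunks.append({"text": "\n\n".join(current), **metadata})
    return chunks
```

In practice the metadata dict is where the document, section, date, author, jurisdiction, and sensitivity fields from the bullet list go, so they are indexed alongside each chunk.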

2. Reranking: the second retrieval nobody talks about

Embedding search returns the top-50 candidates by vector similarity — but order matters. Without a reranker, the LLM receives contaminated context and answers poorly.

  • +25–40% relevance@1 with the reranker on
  • <1% measured hallucination with RAG + mandatory citation
  • −90% reindex cost with incremental upsert vs. full re-index

Vertex AI Search has a native reranker (cross-encoder) that takes the top-50 and returns the top-5 ordered by contextual relevance. The default should be on — but many people forget.
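The two-stage pattern (cheap vector recall over the whole corpus, expensive cross-encoder scoring over the candidates only) can be sketched generically. The scoring functions below are stand-ins supplied by the caller; in Vertex AI Search the reranking stage is a managed service, not code you write.

```python
def two_stage_retrieve(query, corpus, embed_score, rerank_score, k1=50, k2=5):
    """Stage 1: rank the whole corpus with a cheap similarity function
    and keep the top-k1 candidates.
    Stage 2: re-rank only those candidates with an expensive scorer
    (a cross-encoder in real systems) and return the top-k2."""
    candidates = sorted(corpus, key=lambda d: embed_score(query, d), reverse=True)[:k1]
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:k2]
```

The design point is cost: the cross-encoder sees only k1 documents per query instead of the full corpus, which is what makes an expensive, accurate scorer affordable at serving time.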


3. Mandatory citations

RAG without citation is disguised hallucination. Every response must include where it came from: document, page, paragraph. In compliance and legal contexts, without this the output has no evidentiary value.

In Vertex AI Search, this is a configuration parameter — include_citations: true. In the prompt, simply instruct: "if the answer is not in the retrieved documents, say 'I did not find it in the base' — do not invent". This reduces hallucination in well-configured RAG to less than 1%.
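Beyond the prompt instruction, a post-hoc guardrail can reject any answer whose citations do not all resolve to documents actually retrieved for that query. A minimal sketch, assuming a hypothetical `[doc:ID]` citation marker; Vertex AI Search returns structured citation metadata rather than inline markers, so the format here is purely illustrative.

```python
import re

REFUSAL = "I did not find it in the base."

def enforce_citations(answer, retrieved_ids):
    """Return the answer only if it cites at least one document and
    every cited ID belongs to the retrieved set; otherwise refuse."""
    cited = set(re.findall(r"\[doc:([\w-]+)\]", answer))
    if not cited or not cited <= set(retrieved_ids):
        return REFUSAL
    return answer
```

This is the programmatic counterpart of the prompt rule: an uncited or mis-cited answer never reaches the user, which is what keeps measured hallucination low in compliance contexts.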


4. Incremental updates

Re-indexing the entire base weekly is expensive and slow. Vertex AI Search accepts upsert per document via API — you only update what changed. In projects with 500k+ documents, this reduces operating cost by 90%.

Autenticare recommendation: a Cloud Run pipeline that listens for changes in Drive/SharePoint/Confluence and triggers upsert on the index. Typical latency: 2–5 minutes between edit and availability in the agent.
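The change-detection step of such a pipeline can be sketched with content hashes: only documents whose hash differs from the last indexed state are queued for upsert, and documents that disappeared from the source are queued for deletion. The function name and the state shape are illustrative, not the Discovery Engine API.

```python
import hashlib

def plan_upserts(index_state, source_docs):
    """Compare content hashes against the last indexed state
    (doc_id -> sha256) and return the doc IDs that need an upsert
    (new or changed) and the ones that need a delete."""
    upserts, deletes = [], []
    for doc_id, content in source_docs.items():
        digest = hashlib.sha256(content.encode()).hexdigest()
        if index_state.get(doc_id) != digest:
            upserts.append(doc_id)
            index_state[doc_id] = digest
    # Anything indexed but no longer present at the source gets removed.
    for doc_id in list(index_state):
        if doc_id not in source_docs:
            deletes.append(doc_id)
            del index_state[doc_id]
    return upserts, deletes
```

In the Cloud Run pipeline described above, this plan is what turns a Drive/SharePoint/Confluence change event into a handful of per-document API calls instead of a full re-index.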


5. LGPD governance

RAG is where personal data is most likely to surface. Three non-negotiable rules:

⚠️ LGPD trap in RAG: indexing the base without DLP or ACL compromises the entire layer. An agent that retrieves CPFs or documents outside the user's scope is a leak waiting to happen.
  1. DLP at ingest: mask CPF, email, phone, sensitive data before indexing.
  2. ACL at retrieval: the agent only retrieves documents that the real user has permission to see. Vertex AI Search supports filtering by Workspace group or native IAM.
  3. Complete audit log: who asked what, which documents were retrieved, what response was generated. Mandatory for ANPD inspection.
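Rules 1 and 2 can be sketched in a few lines. The regexes below stand in for the Cloud DLP API and the group-intersection check stands in for Vertex AI Search's native ACL filtering; the patterns and field names are assumptions for the sketch.

```python
import re

# Brazilian CPF in the standard 000.000.000-00 format.
CPF = re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def mask_pii(text):
    """DLP-at-ingest stand-in: mask CPFs and emails before indexing.
    Production would call the Cloud DLP API, which covers far more
    infoTypes than these two regexes."""
    return EMAIL.sub("[EMAIL]", CPF.sub("[CPF]", text))

def retrieve_with_acl(docs, user_groups):
    """ACL-at-retrieval stand-in: return only the docs whose allowed
    groups intersect the requesting user's groups."""
    return [d for d in docs if set(d["acl"]) & set(user_groups)]
```

Rule 3 (the audit log) is then just a structured record of the query, the IDs returned by `retrieve_with_acl`, and the generated answer, written before the response leaves the service.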

We detail the opt-out and the LGPD-compatible setup in our post on the Gemini Enterprise training opt-out.


Reference architecture — 90 days

1. Weeks 1–2 — Inventory and classification. Map sources (Drive, SharePoint, Confluence, database), classify sensitivity, decide the ACL model by group.

2. Weeks 3–4 — Ingest pipeline. Semantic chunking + DLP (Cloud DLP API) + Vertex AI Search indexing with rich metadata.

3. Weeks 5–6 — Agent in Gemini Enterprise. Agent consuming the index, with reranker and mandatory citation configured.

4. Weeks 7–8 — Evaluation against gold set. 50–100 questions with validated answers, prompt tuning, confidence threshold.

5. Weeks 9–12 — Monitored rollout. Gradual release to real users, quality and cost dashboards, weekly human review.
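The gold-set evaluation of weeks 7–8 can be sketched as a small harness: run the agent over the validated questions and fail the rollout gate when accuracy drops below the threshold. The matching rule used here (the expected fragment must appear in the answer) is a deliberate simplification; real evaluations usually also use graded or LLM-judged scoring.

```python
def evaluate_gold_set(agent, gold_set, threshold=0.8):
    """Run `agent` (a callable: question -> answer) over a gold set of
    {"question", "expected"} items. An item passes when the expected
    fragment appears in the answer, case-insensitively. Returns
    (accuracy, gate_passed, failed_questions)."""
    failures = []
    for item in gold_set:
        answer = agent(item["question"])
        if item["expected"].lower() not in answer.lower():
            failures.append(item["question"])
    accuracy = 1 - len(failures) / len(gold_set)
    return accuracy, accuracy >= threshold, failures
```

Re-running this harness weekly during the monitored rollout is what turns "quality dashboards" into a concrete, regression-catching number.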


Real cost — base 100k docs / 500 users

Component                                     Monthly cost
Vertex AI Search (storage + queries)          ~US$ 1,500
Gemini Enterprise Standard (500 × US$ 30)     US$ 15,000
Cloud Run + DLP + logs                        ~US$ 200
Total                                         ~US$ 16,700
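The table reduces to simple arithmetic, which is worth encoding so the numbers update when headcount changes. The defaults below are the estimates from the table, not official list prices.

```python
def monthly_cost(users, seat_price=30.0, search_cost=1500.0, infra_cost=200.0):
    """Monthly cost model for the reference deployment, in US$.
    Licensing (per-seat) dominates; search and infra are comparatively
    small and roughly flat at this scale."""
    licensing = users * seat_price
    return {
        "licensing": licensing,
        "search": search_cost,
        "infra": infra_cost,
        "total": licensing + search_cost + infra_cost,
    }
```

The takeaway from the model: at 500 users, per-seat licensing is ~90% of the bill, so ROI discussions should center on what those seats produce, not on infrastructure tuning.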


Premier RAG Architecture

Is your knowledge base ready to become an agent?

In 90 days we design, index and deliver a corporate RAG agent with LGPD governance, mandatory citation and active gold set. Google Cloud Premier Partner.

