
Embeddings and Enterprise Semantic Search: the Invisible Engine Behind RAG

Embeddings seem like a technical detail — but the wrong choice degrades any enterprise AI agent. Objective guide for CTOs on models, dimensions, cost and what really matters in production.

Fabiano Brito

CEO & Founder

TL;DR Embedding is the numeric representation that makes semantic search work. In Gemini Enterprise, the default (text-embedding-005 via Vertex AI) covers 90% of cases. The other 10% (heavy multilingual, code, legal) deserve a conscious choice. More important than the model: chunking, metadata, reranker and hybrid search.

Embedding is the most underrated RAG component. Teams debate prompts for weeks but ship the default embedding without ever comparing alternatives. The result: a mediocre semantic search layer and an agent that seems to be "hallucinating" when it is actually receiving bad context.

This post is the guide we'd give a CTO before approving the architecture.


What is an embedding (in 2 paragraphs)

An embedding model takes a text ("services contract") and returns a vector of N numbers (e.g., 768). Texts with similar meaning map to nearby vectors in that space; unrelated texts end up far apart.

Semantic search works like this: index all documents as vectors. When a question arrives, vectorize it and find the K nearest vectors: those are the top-K relevant documents. This is what sits behind "search" in any modern RAG.
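The mechanics above can be sketched in a few lines. The vectors below are tiny made-up examples, not real model outputs (a real embedding would have 768 dimensions):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: ~1.0 for near-identical direction, ~0 for unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "index": document -> vector.
index = {
    "services contract": np.array([0.9, 0.1, 0.0]),
    "vendor agreement":  np.array([0.8, 0.2, 0.1]),
    "cafeteria menu":    np.array([0.0, 0.1, 0.9]),
}

def top_k(query_vec: np.ndarray, k: int = 2) -> list[str]:
    """Return the k documents whose vectors are nearest to the query."""
    ranked = sorted(index, key=lambda d: cosine_sim(query_vec, index[d]), reverse=True)
    return ranked[:k]

query = np.array([0.85, 0.15, 0.05])  # pretend this is the vectorized question
print(top_k(query))  # the two contract-like documents rank first
```

Production vector stores use approximate nearest-neighbor indexes instead of this brute-force scan, but the scoring idea is the same.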


Models available in Vertex AI

| Model | Dim | Cost (US$/1M tokens) | Recommended use |
| --- | --- | --- | --- |
| text-embedding-005 | 768 | 0.025 | General default, light multilingual |
| text-embedding-large-005 | 3072 | 0.10 | High precision, higher latency |
| text-multilingual-embedding-002 | 768 | 0.025 | Specific multilingual (20+ languages) |
| OSS (E5, BGE) | variable | own infra | Ultra-sensitive on-prem data |

In Gemini Enterprise + Vertex AI Search, this choice comes preconfigured. To customize it, create a dedicated Data Store.


Practical decision criteria

1. Languages involved

PT-BR plus occasional English only → text-embedding-005. Significant ES/ZH/AR/JA volume → text-multilingual-embedding-002.

2. Domain

General business → default. Dense legal/medical text → test the large version. Source code → a dedicated code-embedding model (never a general text embedding).

3. Acceptable latency

A larger vector means slower search and more storage. For conversational chat, stay at 768 unless recall is critical.

4. Volume and cost

At 100k+ documents, the difference between 768 and 3072 dimensions becomes a real storage and periodic-reindexing cost.
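To make criterion 4 concrete, here is a back-of-envelope sketch. The chunk count and float32 size are illustrative assumptions, not billing figures:

```python
# Rough vector-storage estimate for 100k documents, ~10 chunks each,
# assuming float32 vectors (4 bytes per dimension). Illustrative only.
DOCS = 100_000
CHUNKS_PER_DOC = 10
BYTES_PER_DIM = 4  # float32

def storage_gb(dims: int) -> float:
    """Raw vector storage in GB for the corpus at a given dimensionality."""
    return DOCS * CHUNKS_PER_DOC * dims * BYTES_PER_DIM / 1e9

print(f"768 dims:  {storage_gb(768):.1f} GB")
print(f"3072 dims: {storage_gb(3072):.1f} GB")  # 4x the footprint, on every reindex
```

Index overhead and replicas multiply these numbers further, which is why the 3072-dimension model should be a deliberate choice, not a default.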


What matters MORE than model choice

A mediocre embedding of a good chunk beats a great embedding of a bad chunk. The absolute priority is semantic chunking, not the model.
  • Semantic chunking: respect the document's structure. See enterprise RAG.
  • Pre-summarization: indexing (chunk + summary) doubles recall in technical corpora.
  • Metadata as filter: category/date/author shrink the candidate universe before semantic search runs.
  • Reranking: top-50 from the embedding → top-5 from a cross-encoder. Vertex AI Search has this natively.
  • Hybrid search: combine semantic and lexical retrieval. It catches synonyms AND exact terms (contract numbers, proper names).
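One common way to fuse the lexical and semantic result lists is reciprocal rank fusion (RRF). This toy sketch assumes the two rankings already exist; the exact fusion used inside Vertex AI Search may differ:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list contributes 1/(k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_a", "doc_b", "doc_c"]  # nearest-vector order
lexical  = ["doc_c", "doc_a", "doc_d"]  # exact-term order (e.g. a contract number)
print(rrf([semantic, lexical]))
```

Note how a document that appears in both lists (doc_a, doc_c) outranks one that appears in only one: that is why exact-term hits survive the merge.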

How to evaluate your search quality

| Metric | What it measures | Target |
| --- | --- | --- |
| Recall@10 | Is the right document in the top 10? | > 90% |
| MRR | Average position of the best result | > 0.6 |
| nDCG@10 | Quality of result ordering | > 0.7 |

Build a gold set of 100–300 (question, expected document) pairs. Use Vertex AI Evaluation to run on every change. More in production agent evaluation.
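The two simplest metrics from the table take only a few lines to compute. The gold pairs and rankings below are made up for illustration:

```python
def recall_at_k(gold: dict[str, str], results: dict[str, list[str]], k: int = 10) -> float:
    """Fraction of questions whose expected document appears in the top k."""
    hits = sum(1 for q, doc in gold.items() if doc in results[q][:k])
    return hits / len(gold)

def mrr(gold: dict[str, str], results: dict[str, list[str]]) -> float:
    """Mean reciprocal rank of the expected document (0 if it is absent)."""
    total = 0.0
    for q, doc in gold.items():
        ranking = results[q]
        total += 1.0 / (ranking.index(doc) + 1) if doc in ranking else 0.0
    return total / len(gold)

gold = {"q1": "contract.pdf", "q2": "policy.pdf"}
results = {
    "q1": ["contract.pdf", "memo.pdf"],
    "q2": ["memo.pdf", "other.pdf", "policy.pdf"],
}
print(recall_at_k(gold, results, k=10))  # 1.0: both expected docs are in the top 10
print(mrr(gold, results))                # (1/1 + 1/3) / 2 ≈ 0.67
```

Rerun the same gold set after every change (model swap, chunking tweak, reranker on/off) so regressions show up as numbers, not anecdotes.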


Common mistakes

⚠️ 5 frequent pitfalls: reindexing the entire base every week (use upsert instead), mixing languages in a monolingual embedding, 2,000-token chunks that "blur" semantics, ignoring structured metadata, and never testing a model swap on a sample.

When to build your own embedding

Very rarely. Consider it only if:

  • Extremely specific domain (industrial chemistry, pharma patents).
  • Volume justifies fine-tuning + serving cost (millions of queries/month).
  • Team has a dedicated ML engineer.

In 95% of enterprise projects, default Vertex AI model + careful chunking beats amateur fine-tuning.

Embeddings audit

Is your RAG performing poorly, and you don't know whether the bottleneck is the embedding?

In 2 weeks: a gold set, a benchmark of 3 models, reranker enabled, and a report with an action plan. A decision based on data, not hype.

