AI Model Governance: Model Cards, Versioning and What the ANPD May Ask
Models change silently. Without versioning, model cards, and a baseline, your team won't detect drift and has nothing to show auditors. A practical framework for model governance in Gemini Enterprise.
Fabiano Brito
CEO & Founder
Companies running Gemini, GPT, Claude or Llama in enterprise production have a "model fleet" — each with its own version, behavior, bias and cost. Without governance, nobody knows which version runs where, and audit becomes a nightmare.
What is a model card (and why it matters)
A model card is the model's "spec sheet". Proposed by Google researchers in 2019 ("Model Cards for Model Reporting"), it has become the de facto standard. For each production model, document:
- Identification: name, exact version, provider, snapshot date.
- Intended use: use case, user profile, supported decision.
- Out-of-scope use: what is not an accepted use case.
- Training data: what's known about origin (for proprietary models, what the provider publishes).
- Evaluation metrics: internal gold set, benchmarks, baseline.
- Known limitations: languages, domains, identified biases.
- Mitigation: prompt, guardrails, human hand-off.
- Technical owner: who maintains it.
- Business owner: who is accountable for decisions.
- Review date: reassessment cycle.
In Gemini Enterprise, keep one model card per agent plus one per underlying model — a Markdown file in the project repository is enough.
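The fields listed above can be captured as a small structure that renders to the Markdown file kept in the repository. This is a minimal sketch; the class and field names are assumptions, not a standard schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class ModelCard:
    """Sketch of the model card fields described above (names are assumptions)."""
    name: str
    version: str            # exact snapshot, never just the family name
    provider: str
    snapshot_date: str
    intended_use: str
    out_of_scope: str
    training_data: str      # for proprietary models: what the provider publishes
    known_limitations: str
    mitigations: str        # prompt, guardrails, human hand-off
    technical_owner: str
    business_owner: str
    review_date: str

    def to_markdown(self) -> str:
        """Render the card as the Markdown file stored in the repository."""
        lines = [f"# Model Card: {self.name} ({self.version})"]
        for field_name, value in asdict(self).items():
            if field_name not in ("name", "version"):
                lines.append(f"- **{field_name.replace('_', ' ').title()}**: {value}")
        return "\n".join(lines)
```

Keeping the card as code means a missing field fails at construction time instead of being silently omitted from the document.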
Explicit versioning
Version pinning is mandatory. "Gemini Pro" is not a version — it's a family. "Gemini 2.5 Pro snapshot 2026-04" is a version.
Practices:
- API call always with explicit model version.
- Version change = PR + reassessment against gold set.
- Documented rollback.
- Notification to business owner before promoting new version.
Without this, Google updates a snapshot, behavior shifts, metrics regress — and nobody understands why.
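One way to enforce the practices above is an allowlist of pinned versions that only changes via PR. A minimal sketch — the version strings are illustrative assumptions, not real snapshot identifiers:

```python
# Allowlist of pinned model versions, changed only via PR after
# gold-set reassessment. Entries below are assumed examples.
APPROVED_VERSIONS = {
    "gemini-2.5-pro-snapshot-2026-04",
}

def resolve_model(requested: str) -> str:
    """Fail fast if the requested model is a family name, not a pinned version."""
    if requested not in APPROVED_VERSIONS:
        raise ValueError(
            f"'{requested}' is not an approved pinned version; "
            "open a PR to add it after reassessment against the gold set"
        )
    return requested
```

Every API call goes through `resolve_model`, so an unpinned family name like "gemini-pro" can never reach production unnoticed.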
Baseline and drift
Every new version is compared against the baseline (the current production version). Metrics:
- Faithfulness, relevance, completeness, safety (see agent evaluation in production).
- p50/p95 latency.
- Cost per execution.
- Human hand-off rate.
- Tool call distribution.
Regression of any metric by more than 5% = production block until investigated.
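The 5% gate can be a few lines in the promotion pipeline. A sketch, assuming illustrative baseline numbers and metric names; the direction map marks which metrics improve by going up versus down:

```python
# Baseline values and directions are illustrative assumptions.
BASELINE = {"faithfulness": 0.92, "p95_latency_ms": 1800, "cost_usd": 0.004}
HIGHER_IS_BETTER = {"faithfulness": True, "p95_latency_ms": False, "cost_usd": False}
THRESHOLD = 0.05  # 5% regression blocks promotion

def regressions(candidate: dict) -> list:
    """Return the metrics where the candidate regresses >5% vs. baseline."""
    failed = []
    for metric, base in BASELINE.items():
        delta = (candidate[metric] - base) / base
        if not HIGHER_IS_BETTER[metric]:
            delta = -delta  # for latency/cost, an increase is the regression
        if delta < -THRESHOLD:
            failed.append(metric)
    return failed
```

An empty result means the candidate may be promoted; anything else blocks production until investigated.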
What ANPD and auditors are asking (2026)
An emerging pattern is visible in sector audits: BACEN, SUSEP, ANS and others have published converging guidance, and the topics below keep recurring.
Bias assessment: how to do it without theater
Bias in LLMs is real and measurable. How to audit:
- Define sensitive segments relevant to the case (e.g., in credit: region, age, declared gender).
- Build a balanced sample of cases per segment.
- Run agent over sample, compare outcome and tone across segments.
- Metric: statistical parity difference, equal opportunity difference.
- Report quarterly to board + risk committee.
- Corrective action when difference exceeds threshold (typical: 10%).
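The two metrics named above reduce to simple gap calculations over per-segment rates. A sketch with made-up numbers; the segment names and rates are illustrative only:

```python
def statistical_parity_difference(rates: dict) -> float:
    """Max gap in favorable-outcome rate (e.g. approval rate) across segments."""
    return max(rates.values()) - min(rates.values())

def equal_opportunity_difference(tpr: dict) -> float:
    """Max gap in true-positive rate, computed only over truly qualified cases."""
    return max(tpr.values()) - min(tpr.values())

# Illustrative approval rates per region segment (made-up numbers):
# a gap above the 10% threshold triggers corrective action.
rates = {"north": 0.61, "southeast": 0.74}
gap = statistical_parity_difference(rates)  # 0.13 > 0.10 threshold
```

Statistical parity asks "do segments get the favorable outcome at the same rate?"; equal opportunity asks the stricter question "among those who actually qualify, do segments get approved at the same rate?".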
The automated decision case (LGPD Art. 20)
If the agent makes decisions with legal or significant effects (credit denied, contract refused, service denied), data subjects have the right to:
- Know the decision was automated.
- Receive an explanation of the criteria.
- Request review by a natural person.
Operationally:
- UX makes it clear: "this initial analysis is automated".
- Justification delivered with the decision (not just "denied").
- Explicit review channel with a defined SLA.
- Training for human reviewers.
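The operational requirements above can be anchored in a decision record that carries the disclosure, the justification, and the review channel together. A sketch; the field names and the 5-day SLA default are assumptions, not anything LGPD prescribes:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class AutomatedDecision:
    """Sketch of an Art. 20-aware decision record (field names are assumptions)."""
    subject_id: str
    outcome: str                 # e.g. "denied"
    criteria: list               # human-readable justification, never just "denied"
    automated: bool = True       # surfaced in the UX as "this analysis is automated"
    review_requested: bool = False
    review_deadline: Optional[datetime] = None

    def request_human_review(self, sla_days: int = 5) -> None:
        """Open the human-review channel with a defined SLA (assumed default)."""
        self.review_requested = True
        self.review_deadline = datetime.now(timezone.utc) + timedelta(days=sla_days)
```

Storing `criteria` alongside the outcome forces the justification to exist at decision time, rather than being reconstructed when the data subject asks.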
Internal AI committee
In medium/large organizations, a committee is recommended with:
- DPO.
- Legal (sector regulatory compliance).
- Technical owner of each agent.
- HR representative (workforce impact).
- Business representative.
- Monthly meetings to review inventory, bias metrics, incidents.
Without a forum, AI decisions float between IT and business — when something blows up, nobody is accountable.
Decommissioning plan
The most ignored item of all, yet essential:
- How to turn off the agent without operational disruption?
- How fast can manual operation scale?
- Who decides to shut it down?
- How long to retain logs after shutdown?
A 2-page document. Costs nothing. Saves you in an incident.
Minimum governance stack
- Inventory: living spreadsheet or Notion/Confluence with each agent, model, owner, status.
- Model cards: MD file per agent in the repository.
- Versioning: version pinned in code, PR for any change.
- Continuous evaluation: gold set pipeline + metrics dashboard.
- Audit log: BigQuery/Cloud Logging with a retention period compatible with regulatory requirements.
- DPIA updated annually or on material change.
- Committee with public minutes.
Model governance is not legal theater. It's what separates "we were in control" from "we found out alongside the complaining data subject" when something goes wrong.
3 weeks to be audit-ready
Inventory + model cards + DPIA + structured committee + bias metrics pipeline. Delivered in 3 weeks, in parallel with technical implementation, including DPO and committee training.
