Governance & Compliance · 8 min

AI Model Governance: Model Cards, Versioning and What the ANPD May Ask

Models change silently. Without versioning, model cards and a baseline, your team won't detect drift and has nothing to show auditors. Practical framework for model governance in Gemini Enterprise.

Fabiano Brito

CEO & Founder

TL;DR: In 2026, the ANPD and sector auditors (BACEN, SUSEP, ANS) are starting to require formal documentation of the AI models used in automated decisions. A model card, versioning, an evaluation baseline and a review plan are no longer just "best practice"; they are becoming mandatory. Here is a practical recipe for Gemini Enterprise projects.

Companies running Gemini, GPT, Claude or Llama in enterprise production operate a "model fleet", each model with its own version, behavior, biases and cost. Without governance, nobody knows which version runs where, and every audit becomes a nightmare.


What is a model card (and why it matters)

A model card is the model's "spec sheet". Proposed by Google researchers in 2019, it has become the de facto industry standard. For each production model, document:

  • Identification: name, exact version, provider, snapshot date.
  • Intended use: use case, user profile, supported decision.
  • Out-of-scope use: what is not an accepted use case.
  • Training data: what's known about origin (for proprietary models, what the provider publishes).
  • Evaluation metrics: internal gold set, benchmarks, baseline.
  • Known limitations: languages, domains, identified biases.
  • Mitigation: prompt, guardrails, human hand-off.
  • Technical owner: who maintains it.
  • Business owner: who is accountable for decisions.
  • Review date: reassessment cycle.

In Gemini Enterprise, keep one model card per agent and underlying model; it can be a simple Markdown file in the project repository.
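As one possible shape for that repository file, a card can live as a small dataclass that renders the Markdown. Field names mirror the checklist above; the layout itself is an assumption, not a fixed standard:

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    # Identification
    name: str
    version: str          # exact snapshot, never a family name
    provider: str
    snapshot_date: str
    # Ownership and review
    technical_owner: str
    business_owner: str
    review_date: str
    # Scope and limitations
    intended_use: str
    out_of_scope: list[str] = field(default_factory=list)
    known_limitations: list[str] = field(default_factory=list)
    mitigations: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the card as a Markdown file for the project repository."""
        lines = [
            f"# Model Card: {self.name}",
            f"- **Version:** {self.version} ({self.provider}, snapshot {self.snapshot_date})",
            f"- **Technical owner:** {self.technical_owner}",
            f"- **Business owner:** {self.business_owner}",
            f"- **Next review:** {self.review_date}",
            f"- **Intended use:** {self.intended_use}",
            "## Out-of-scope use",
            *[f"- {item}" for item in self.out_of_scope],
            "## Known limitations",
            *[f"- {item}" for item in self.known_limitations],
            "## Mitigations",
            *[f"- {item}" for item in self.mitigations],
        ]
        return "\n".join(lines)
```

Because the card is code, a CI check can fail the build when a required field is empty or the review date has passed.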


Explicit versioning

Version pinning is mandatory. "Gemini Pro" is not a version — it's a family. "Gemini 2.5 Pro snapshot 2026-04" is a version.

Practices:

  • API call always with explicit model version.
  • Version change = PR + reassessment against gold set.
  • Documented rollback.
  • Notification to business owner before promoting new version.

Without this, Google updates a snapshot, behavior shifts, metrics regress — and nobody understands why.
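The pinning practice can be enforced mechanically, for example with a guard that runs in CI before deployment. The snapshot id and the alias list below are illustrative, not an official provider naming scheme:

```python
# Illustrative snapshot id and alias list; adapt to the provider's actual naming.
PINNED_MODEL = "gemini-2.5-pro-2026-04"
FLOATING_ALIASES = {"gemini-pro", "gemini-2.5-pro", "gpt-4", "claude-3"}

def require_pinned(model_id: str) -> str:
    """Fail fast (e.g. in CI) if code ships a model family instead of a pinned version."""
    lowered = model_id.lower()
    if lowered in FLOATING_ALIASES or "latest" in lowered:
        raise ValueError(f"'{model_id}' is a family/alias, not a pinned version")
    return model_id
```

Every API call then goes through `require_pinned(PINNED_MODEL)`, so a floating alias can never reach production unnoticed.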


Baseline and drift

Every new version is compared against the baseline (the current production version). Metrics:

  • Faithfulness, relevance, completeness, safety (see agent evaluation in production).
  • p50/p95 latency.
  • Cost per execution.
  • Human hand-off rate.
  • Tool call distribution.

A regression of more than 5% on any metric blocks promotion to production until it is investigated.
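The 5% gate can be sketched as a small comparison function. The metric names and the higher-is-better/lower-is-better split below are illustrative:

```python
# Which direction counts as "better" differs per metric.
HIGHER_IS_BETTER = {"faithfulness", "relevance", "completeness", "safety"}
LOWER_IS_BETTER = {"latency_p95", "cost_per_run", "handoff_rate"}
THRESHOLD = 0.05  # a >5% regression on any metric blocks promotion

def regressions(baseline: dict, candidate: dict) -> list[str]:
    """Return the metrics on which the candidate regresses past the threshold."""
    failed = []
    for metric, base in baseline.items():
        cand = candidate[metric]
        if metric in HIGHER_IS_BETTER:
            change = (base - cand) / base  # a drop is a regression
        else:
            change = (cand - base) / base  # a rise is a regression
        if change > THRESHOLD:
            failed.append(metric)
    return failed
```

A non-empty return value fails the promotion pipeline and pages the technical owner named on the model card.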


What ANPD and auditors are asking (2026)

Emerging pattern in sector audits (BACEN, SUSEP, ANS and others have published converging guidance):

  1. Inventory of AI systems used in automated decisions — a living list with owner, status, criticality.
  2. Model card per system — spec sheet with version, limitations, mitigations.
  3. DPIA with LLM-specific risk matrix (see DPIA for Gemini Enterprise projects).
  4. Quarterly bias assessment by sensitive segment (gender, race, region, age).
  5. Audit log capable of reconstructing individual decisions — input, context, response, tools called, decision made.
  6. Right to human review in operation — a channel with SLA, not just a contract clause.
  7. Decommissioning plan: how to turn off the model without operational disruption.
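For item 5, a record that can reconstruct an individual decision might look like the sketch below. The field names are illustrative; adapt them to whatever BigQuery/Cloud Logging schema you ingest into:

```python
import datetime
import uuid

def audit_record(agent, model_version, user_input, context_docs,
                 response, tools_called, decision):
    """One append-only record per automated decision: enough to replay it
    for an auditor. Field names are illustrative, not a fixed schema."""
    return {
        "record_id": str(uuid.uuid4()),
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "agent": agent,
        "model_version": model_version,  # pinned snapshot, never a family name
        "input": user_input,
        "context": context_docs,         # retrieved documents / grounding used
        "response": response,
        "tools_called": tools_called,    # ordered list of tool invocations
        "decision": decision,            # the business outcome, e.g. "denied"
    }
```

The test of adequacy is simple: given only this record, can a reviewer explain to the data subject why the decision came out the way it did?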

Bias assessment: how to do it without theater

Bias in LLMs is real and measurable. How to audit:

  • Define sensitive segments relevant to the case (e.g., in credit: region, age, declared gender).
  • Build a balanced sample of cases per segment.
  • Run the agent over the sample and compare outcome and tone across segments.
  • Metrics: statistical parity difference, equal opportunity difference.
  • Report quarterly to board + risk committee.
  • Corrective action when difference exceeds threshold (typical: 10%).
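The two metrics named above are standard fairness measures. A minimal sketch, assuming binary outcomes encoded as 0/1 (e.g. 1 = approved) and one group label per case:

```python
def statistical_parity_diff(y_pred, group, a, b):
    """P(positive outcome | group=a) - P(positive outcome | group=b).
    y_pred encodes the outcome as 1 (e.g. approved) or 0 (e.g. denied)."""
    def rate(g):
        preds = [p for p, s in zip(y_pred, group) if s == g]
        return sum(preds) / len(preds)
    return rate(a) - rate(b)

def equal_opportunity_diff(y_true, y_pred, group, a, b):
    """TPR(group=a) - TPR(group=b), computed only over cases whose
    true label is positive (i.e. should have been approved)."""
    def tpr(g):
        preds = [p for t, p, s in zip(y_true, y_pred, group) if s == g and t == 1]
        return sum(preds) / len(preds)
    return tpr(a) - tpr(b)
```

A corrective action triggers when the absolute value of either difference exceeds the threshold agreed with the risk committee.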

The automated decision case (LGPD Art. 20)

If the agent makes decisions with legal or significant effects (credit denied, contract refused, service denied), data subjects have the right to:

  • Know the decision was automated.
  • Receive an explanation of the criteria.
  • Request review by a natural person.

Operationally:

  • UX makes it clear: "this initial analysis is automated".
  • Justification delivered with the decision (not just "denied").
  • Explicit review channel with a defined SLA.
  • Training for human reviewers.

Internal AI committee

In medium/large organizations, a committee is recommended with:

  • DPO.
  • Legal (sector regulatory compliance).
  • Technical owner of each agent.
  • HR representative (work impact).
  • Business representative.
  • Monthly meetings to review inventory, bias metrics, incidents.

Without a forum, AI decisions float between IT and business — when something blows up, nobody is accountable.


Decommissioning plan

The most ignored of all, but essential:

  • How to turn off the agent without operational disruption?
  • How fast can manual operation scale?
  • Who decides to shut it down?
  • How long to retain logs after shutdown?

A 2-page document. Costs nothing. Saves you in an incident.


Minimum governance stack

  • Inventory: living spreadsheet or Notion/Confluence with each agent, model, owner, status.
  • Model cards: MD file per agent in the repository.
  • Versioning: version pinned in code, PR for any change.
  • Continuous evaluation: gold set pipeline + metrics dashboard.
  • Audit log: BigQuery/Cloud Logging with retention compatible with regulatory requirements.
  • DPIA updated annually or on material change.
  • Committee with public minutes.

⚠️ LGPD Art. 20 is not optional. If the agent makes decisions with legal or significant effects (credit, contract, service denied), data subjects have an explicit right to human review, to an explanation of the criteria, and to know the decision was automated. "Denied" without justification won't pass. Train human reviewers: they need to understand the agent in order to challenge it.
Model governance is not legal theater. It's what separates "we were in control" from "we found out alongside the complaining data subject" when something goes wrong.
AI Governance Kit

3 weeks to be audit-ready

Inventory + model cards + DPIA + structured committee + bias metrics pipeline. Delivered in 3 weeks, in parallel with technical implementation, including DPO and committee training.

