Autenticare
Agentic Engineering · 7 min

Memory Bank in Production: 3 Patterns to Prevent PII Leakage Between Client Sessions

Persistent memory in the Gemini Enterprise Agent Platform is a powerful accelerator — and a silent data-leak vector if not configured with tenant isolation. Three production patterns that prevent one client's data from contaminating another's session.

Fabiano Brito


CTO, Autenticare

TL;DR: The Memory Bank in the Gemini Enterprise Agent Platform is powerful and dangerous in equal measure. Without tenant ID isolation, a customer-service agent can answer client B's session with client A's data. This post documents the three patterns we use in production to prevent exactly that.

The Gemini Enterprise Agent Platform introduced Memory Bank as a first-class feature: agents persist context between sessions, learn user preferences, and resume interrupted conversations without losing thread. For customer-facing use cases — insurance, banking, healthcare — this has real value.

The problem: persistent memory without tenant isolation is a scheduled data leak.

In a multi-client deployment, a shared agent stores and retrieves memory. If the partition key is missing or incorrect, the Memory Bank’s vector search returns fragments from previous sessions — belonging to other clients. The model completes its response using those fragments without flagging the contamination. The user sees data that isn’t theirs. The engineer receives no alert.

This scenario isn’t hypothetical. It is the platform’s default behavior without additional configuration.


Why Memory Bank Exposes PII by Default

Memory Bank is built on a vector index. Each entry is a memory fragment with optional metadata. When an agent retrieves context for a new session, it queries the index using semantic embeddings of the current conversation — not identity filters, unless you configure them explicitly.

Semantic search finds what is relevant, not what is yours. Without partitioning, “what is my account balance?” can return memory from a previous session of another client who asked the same question.

Well-configured persistent memory is what separates an agent that learns from an agent that leaks. The difference is a single metadata field.

Pattern 1 — Tenant-scoped memory profiles

The most direct pattern: each client (tenant) has its own Memory Profile — an isolated namespace within the Memory Bank. The agent only reads and writes within the active tenant’s profile.

In the Gemini Enterprise Agent Platform, this means passing tenant_id as a mandatory filter in all memory operations:

# Wrong — no tenant scope
memory_bank.search(query=user_message, top_k=5)

# Correct — with tenant filter
memory_bank.search(
    query=user_message,
    top_k=5,
    filter={"tenant_id": session.tenant_id}
)

The metadata filter excludes from the results any fragment that does not belong to the current tenant. This is not an optional feature; it is the line that separates real isolation from apparent isolation.

When to use: any multi-client deployment. Always. No exceptions.


Pattern 2 — TTL differentiated by data sensitivity

Not all memory has the same useful lifespan — or the same risk. Preference data (communication tone, preferred report format) is low-risk and benefits from longevity. Transactional data (balance queried, operation authorized) has a short relevance window and high risk.

Configure TTL (time-to-live) differentiated by data category:

MEMORY_TTL = {
    "preference":    60 * 60 * 24 * 90,  # 90 days — low risk
    "interaction":   60 * 60 * 24 * 30,  # 30 days — medium risk
    "transactional": 60 * 60 * 24,       # 1 day   — high risk
    "pii_explicit":  60 * 60 * 4,        # 4 hours — critical
}

memory_bank.write(
    content=fragment,
    metadata={
        "tenant_id": session.tenant_id,
        "category": "transactional",
    },
    ttl=MEMORY_TTL["transactional"]
)

Short TTLs for sensitive data aren’t just a GDPR/LGPD compliance measure — they minimize the exposure window if tenant filtering is violated by a bug.

When to use: any system handling financial, healthcare, or personal identifier data. Short TTL is a defense-in-depth layer.


Pattern 3 — Memory audit log with origin traceability

Patterns 1 and 2 prevent leaks under normal conditions. Pattern 3 detects when something went wrong.

Every fragment written to Memory Bank should carry a minimal audit trail: who wrote it, in which session, with which tenant, and when. In production:

category = "transactional"            # data category from Pattern 2
pii_level = classify_pii(fragment)    # enum: none / low / high

memory_bank.write(
    content=fragment,
    metadata={
        "tenant_id":  session.tenant_id,
        "session_id": session.id,
        "agent_id":   agent.id,
        "written_at": datetime.utcnow().isoformat(),
        "category":   category,
        "pii_level":  pii_level,
    },
    ttl=MEMORY_TTL[category]          # keyed by the Pattern 2 categories
)

With this log, every memory read can be audited: which session and which tenant did this fragment come from? When a client reports seeing someone else's data, you have full traceability to investigate and to demonstrate compliance.

Combine with an automatic alert when classify_pii(fragment) == "high" appears in a cross-session read. False positives will exist, but the cost of a false negative is incomparably higher.
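A sketch of that check, run on retrieved fragments before they reach the model. It assumes each fragment carries the metadata written in the example above (field names illustrative); `on_alert` is whatever feeds your incident pipeline:

```python
def flag_cross_session_pii(fragments, session_id, on_alert):
    """Flag high-PII fragments read outside the session that wrote them."""
    suspicious = [
        f for f in fragments
        if f.get("metadata", {}).get("session_id") != session_id
        # pii_level: the classify_pii result stored at write time
        and f.get("metadata", {}).get("pii_level") == "high"
    ]
    if suspicious:
        on_alert({
            "kind": "cross_session_pii_read",
            "session_id": session_id,
            "fragment_ids": [f.get("id") for f in suspicious],
        })
    return suspicious
```

Cross-session reads are often legitimate (that is the point of persistent memory), so the alert fires only on the intersection with high PII, which keeps the false-positive volume manageable.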

When to use: any regulated system (healthcare, finance, education) where auditability is a legal requirement.


The Three Patterns Together

Each pattern solves a different layer of the problem:

Pattern 1 — Tenant Isolation
Mandatory tenant_id filter on all Memory Bank operations. Prevents cross-client access.
Role: prevents the leak.

Pattern 2 — Sensitivity-based TTL
Transactional data expires in hours, preferences in months. Minimizes the exposure window if isolation fails.
Role: limits the damage.

Pattern 3 — Audit Log
Full traceability of each fragment's origin. Detects cross-session anomalies and proves compliance.
Role: detects what slipped through.

Applied together, they form a defense-in-depth: Pattern 1 prevents the normal case, Pattern 2 limits damage if Pattern 1 fails, and Pattern 3 provides traceability for regulatory audits.
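Composed at a single write path, the three patterns can look like this. The helper, the `bank`/`session` objects, and the field names are all illustrative; `bank.write` follows the signature used throughout this post:

```python
from datetime import datetime, timezone

# Pattern 2 values from above, in seconds.
MEMORY_TTL = {"preference": 7776000, "interaction": 2592000,
              "transactional": 86400, "pii_explicit": 14400}

def safe_write(bank, session, agent_id, fragment, category, pii_level):
    """Single choke point applying all three patterns to every memory write."""
    if category not in MEMORY_TTL:
        raise ValueError(f"unknown category: {category}")  # fail closed
    return bank.write(
        content=fragment,
        metadata={
            "tenant_id":  session.tenant_id,   # Pattern 1: isolation
            "session_id": session.id,          # Pattern 3: traceability
            "agent_id":   agent_id,
            "written_at": datetime.now(timezone.utc).isoformat(),
            "category":   category,
            "pii_level":  pii_level,
        },
        ttl=MEMORY_TTL[category],              # Pattern 2: bounded lifetime
    )
```

Routing every write through one function like this also gives Pattern 3's audit fields a single place to evolve.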


What the Official Documentation Doesn’t Say

Google’s documentation for Memory Bank describes the API clearly. What it does not do is warn that the default behavior is insecure for multi-client deployments. The tenant filter is not the default — it is a configuration that must be explicit.

This is the kind of trap that only surfaces in production, when the first client reports seeing another client’s data. At that point, the rework of retroactively isolating existing memories is costly — and potentially involves regulatory notification.

The cost of implementing these three patterns from the start is a few hours. The cost of not implementing them can be a GDPR violation with a 72-hour notification obligation.


Building agents that handle sensitive data?

Autenticare designs agent architectures with tenant isolation, auditability, and LGPD compliance from the first sprint. Talk to our team before going to production.



Primary source: Google Cloud Blog — Introducing Gemini Enterprise Agent Platform