Autenticare
Agentic Engineering · 7 min

The agent orchestra: why 3 specialists beat 1 generalist in production

The 'one agent does everything' model has hit its ceiling. The next generation coordinates teams of specialized agents — with isolated contexts, worktrees, a shared task list, and quality gates. Inspired by Addy Osmani, adapted to the Gemini Enterprise Agent Platform stack we run at Autenticare.

Fabiano Brito

CEO & Founder

TL;DR: Pairing with a single agent is what most people practice — and it's where most people get stuck. Productive teams today coordinate 3 to 5 specialized agents, each with its own context, running in isolated worktrees, syncing through a shared task list, with automated quality gates. The bottleneck is no longer code generation — it's verification. And the leverage isn't the prompt anymore: it's the spec.

The single-agent ceiling

Anyone who’s been working with agents for more than six months has felt it: there’s a plateau where adding more prompts stops accelerating anything. Three constraints explain the ceiling:

Constraint 1

📚 Context overload

Real customer codebases exceed any context window. When the agent has to "remember" everything at once, it forgets what matters most.

Constraint 2

🎯 No specialization

An agent that handles database, API, UI, and tests becomes a jack of all trades — master of none. An agent that only knows the data layer writes better SQL.

Constraint 3

🚦 No coordination

Multiple agents without coordination primitives (file locks, task dependencies) become chaos. Merge conflicts eat your parallelism gains.

Conductor vs orchestrator

The metaphor that best captures the shift (we owe it to Addy Osmani): you go from conductor — guiding a single musician in real time — to orchestrator — coordinating an entire ensemble asynchronously. The chat thread stops being your environment; the repo (and task board) takes its place.

| Dimension | Conductor (1 agent) | Orchestrator (team) |
| --- | --- | --- |
| Mode | Synchronous — you wait for each reply | Asynchronous — you plan and review |
| Context | Your window is the ceiling | N independent windows, one per specialist |
| Throughput | 1× — sequential | ~3× with 3 agents in parallel |
| Workspace | Chat thread | Repo + worktrees + task list |
| Your role | Prompt typist | Process engineer: spec, gates, retro |

Three orchestration patterns

In production, we use three patterns — chosen by task scope, not by trend:

Pattern 1

🌿 Subagents

Parent decomposes work, delegates to focused children, manages the dependency graph manually. Zero setup — start today. Token-cost neutral.

Setup: None · Sweet spot: 2–4 children
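The parent's job in this pattern can be sketched in a few lines of Python — a hypothetical `Subtask` type plus a manual dependency check. Task names, prompts, and the `dispatch` idea are illustrative, not a real agent API:

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str
    prompt: str                       # tightly scoped brief for one child agent
    depends_on: list = field(default_factory=list)

def runnable(tasks, done):
    """Children whose dependencies have all completed."""
    return [t for t in tasks
            if t.name not in done and all(d in done for d in t.depends_on)]

# Hypothetical decomposition of one feature into 3 focused children.
tasks = [
    Subtask("schema", "Design the SQLite schema for invoices."),
    Subtask("api",    "Write the Express routes for /invoices.", depends_on=["schema"]),
    Subtask("tests",  "Write supertest integration tests.",      depends_on=["api"]),
]

order, done = [], set()
while len(done) < len(tasks):
    for task in runnable(tasks, done):
        # dispatch(task.prompt) would hand the brief to a child agent;
        # here we only record completion to show the resulting order.
        order.append(task.name)
        done.add(task.name)
print(order)  # ['schema', 'api', 'tests']
```

The parent never writes code itself; it only decomposes, dispatches, and unblocks the next child when a dependency clears.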
Pattern 2

👥 Agent Teams

Team Lead + shared task list with file locking + peer-to-peer messaging. Auto-unblocking when dependencies clear.

Sweet spot: 3–5 agents · Coordination: Task list + locks
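The two coordination primitives — file locking and auto-unblocking — fit in a short sketch. This is a minimal illustration, assuming a lock directory and a task-board dict whose layout is hypothetical:

```python
import os
from pathlib import Path

LOCKS = Path("locks")                 # hypothetical shared lock directory
LOCKS.mkdir(exist_ok=True)

def claim(task_id: str, agent: str) -> bool:
    """Atomically claim a task: O_CREAT|O_EXCL fails if the lock exists."""
    try:
        fd = os.open(LOCKS / f"{task_id}.lock",
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False                  # another teammate got there first
    os.write(fd, agent.encode())
    os.close(fd)
    return True

def unblocked(board: dict) -> list:
    """Auto-unblocking: todo tasks whose dependencies are all done."""
    done = {t for t, s in board.items() if s["status"] == "done"}
    return [t for t, s in board.items()
            if s["status"] == "todo" and set(s["deps"]) <= done]

board = {"schema": {"status": "done", "deps": []},
         "api":    {"status": "todo", "deps": ["schema"]},
         "ui":     {"status": "todo", "deps": ["api"]}}
print(unblocked(board))               # ['api'] — 'ui' still waits on it
print(claim("api", "agent-2"))
print(claim("api", "agent-3"))        # second claim fails: lock already held
```

The exclusive-create lock is what keeps two agents from grabbing the same file; the dependency check is what lets a waiting agent start the moment its blocker lands.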
Pattern 3

☁️ Cloud Async

Assign a task, close the laptop, come back to a PR. Runs on managed VMs — in our case, on top of Agent Runtime from Gemini Enterprise Agent Platform.

Mode: Fire-and-forget · Session: Days at a time

The bottleneck is no longer generation. It's verification. Human review isn't optional overhead — it's the safety system.

The numbers that matter

  • ~3× throughput — 3 agents in parallel
  • 3–5 agents — the team sweet spot; above that, dispersion
  • −3% performance and +20% cost when AI writes its own AGENTS.md (ETH Zurich)

The right-hand number matters: research by Gloaguen et al. (ETH Zurich) shows that letting agents write their own AGENTS.md degrades performance by ~3% and increases cost by 20%+. The AGENTS.md must be human-curated — it’s what encodes the team’s institutional knowledge.

AGENTS.md: the shared brain

Four sections are enough:

## STYLE
- Functional components with hooks; named exports
- Errors always typed; never `throw "string"`

## GOTCHAS
- SQLite needs WAL for concurrent reads
- Express middleware order matters for auth

## ARCH_DECISIONS
- State in SQLite, no in-memory cache
- One Express router per feature module

## TEST_STRATEGY
- Integration > unit for HTTP routes
- supertest for request assertions

Every session reads it. No agent writes to it directly — the lead approves every line that goes in.

5 practices to start tomorrow

1
Start with subagents

Decompose the task into 2–3 surgically scoped children. No setup. The cheapest way to prove the thesis internally.

2
Isolate with worktrees

Give each agent its own git worktree. Zero merge conflicts — and a per-feature diff that's trivial to review.
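A minimal sketch of that setup, driving the git CLI from Python. The agent names and the `agent/` branch prefix are illustrative conventions, not part of git:

```python
import subprocess

def spawn_worktree(repo: str, agent: str, base: str = "main") -> str:
    """Create an isolated checkout for one agent on its own branch.
    Equivalent to running, from inside the repo:
        git worktree add ../<agent> -b agent/<agent> <base>"""
    path = f"../{agent}"              # sibling directory of the main checkout
    subprocess.run(["git", "-C", repo, "worktree", "add",
                    path, "-b", f"agent/{agent}", base], check=True)
    return path

# One tree per specialist; each branch yields a small per-feature diff:
# for agent in ("db-agent", "api-agent", "ui-agent"):
#     spawn_worktree(".", agent)
```

Because each worktree is a separate directory on a separate branch, agents never touch each other's files until the lead merges each reviewed diff.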

3
Plan approval before coding

Teammate writes the plan; lead approves or rejects. Fixing architecture in the planning phase costs 1/10th of fixing it in code.

4
Hooks running lint + test

On TaskCompleted, validate automatically. If it fails, the agent keeps working. Lead only sees green code — like built-in CI.
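A sketch of such a hook, assuming a hypothetical TaskCompleted callback and illustrative gate commands (swap in whatever lint and test runners your stack uses):

```python
import subprocess

# Hypothetical quality gates; the commands are examples, not requirements.
GATES = [["ruff", "check", "."], ["pytest", "-q"]]

def on_task_completed(worktree: str, gates=GATES) -> bool:
    """TaskCompleted hook: run every gate inside the agent's worktree.
    Any failure sends the task back to the agent, not to the lead."""
    for cmd in gates:
        if subprocess.run(cmd, cwd=worktree).returncode != 0:
            return False              # red: the agent keeps working
    return True                       # green: ready for the lead's review
```

The point is the contract, not the commands: a task is only "done" when every gate exits 0, so the review queue contains nothing but green code.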

5
Compound learning via AGENTS.md

Every session reads, lead updates. Patterns and gotchas become the team's long-term memory — never rediscovered each sprint.

⚠️ Classic trap: Spinning up 10 agents in parallel just because you can. WIP limits are a virtue: run only what you can actually review. Above 5 agents, you're usually "generating debt that another human will pay for".

What changes at Autenticare

This model only holds up in production when the infrastructure can sustain it. That's exactly what the Gemini Enterprise Agent Platform (announced 04/22) delivers:

  • Agent Runtime gives you the “managed VMs” of the Cloud Async pattern — sub-second cold start, day-long sessions.
  • Memory Bank + Memory Profiles is AGENTS.md elevated to a platform primitive — long-term memory across sessions.
  • Agent Sandbox is the isolated worktree in production — generated code runs without risk to the real system.

Not a coincidence. The industry path is the same: from one-shot prompt execution to the agent factory.

You're no longer writing software. You're building the factory that writes the software.

Agentic engineering in production

Want to move from a single agent to an orchestra?

We structure 3-to-5 agent teams on Gemini Enterprise Agent Platform — with worktrees, curated AGENTS.md, quality hooks and plan approval. Auditable, repeatable and measurable.


Inspired by "The Code Agent Orchestra" by Addy Osmani (Google Chrome). Adapted to the stack we operate at Autenticare on top of Gemini Enterprise Agent Platform.
