The agent orchestra: why 3 specialists beat 1 generalist in production
The 'one agent does everything' model has hit its ceiling. The next generation coordinates teams of specialized agents — with isolated contexts, worktrees, a shared task list, and quality gates. Inspired by Addy Osmani, adapted to the Gemini Enterprise Agent Platform stack we run at Autenticare.
Fabiano Brito
CEO & Founder
The single-agent ceiling
Anyone who’s been working with agents for more than six months has felt it: there’s a plateau where adding more prompts stops accelerating anything. Three constraints explain the ceiling:
📚 Context overload
Real customer codebases exceed any window. When the agent has to "remember" everything at once, it forgets the essentials.
🎯 No specialization
An agent that does database, API, UI and tests becomes a "jack of all trades" — master of none. The agent that only knows the data layer writes better SQL.
🚦 No coordination
Multiple agents without coordination primitives (file locks, task dependencies) become chaos. Merge conflicts eat your parallelism gains.
Conductor vs orchestrator
The metaphor that best captures the shift (we owe it to Addy Osmani): you go from conductor — guiding a single musician in real time — to orchestrator — coordinating an entire ensemble asynchronously. The chat thread stops being your environment; the repo (and task board) takes its place.
| Dimension | Conductor (1 agent) | Orchestrator (team) |
|---|---|---|
| Mode | Synchronous — you wait for each reply | Asynchronous — you plan and review |
| Context | Your window is the ceiling | N independent windows, one per specialist |
| Throughput | 1× — sequential | ~3× with 3 agents in parallel |
| Workspace | Chat thread | Repo + worktrees + task list |
| Your role | Prompt typist | Process engineer: spec, gates, retro |
Three orchestration patterns
In production, we use three patterns — chosen by task scope, not by trend:
🌿 Subagents
Parent decomposes work, delegates to focused children, manages the dependency graph manually. Zero setup — start today. Token-cost neutral.
- Setup: None
- Sweet spot: 2–4 children
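To make the "manages the dependency graph" part concrete, here is a minimal sketch of how a parent could group child tasks into waves that are safe to run in parallel. The task names and the `plan_waves` helper are illustrative assumptions, not part of any agent framework; the ordering itself uses Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
tasks = {
    "schema": set(),
    "api": {"schema"},
    "ui": {"api"},
    "tests": {"api"},
}

def plan_waves(graph):
    """Group tasks into waves; tasks inside one wave have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unblocking dependents
    return waves

for number, wave in enumerate(plan_waves(tasks), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

With this graph the parent would delegate `schema` first, then `api`, then `tests` and `ui` together, which is where the parallelism comes from.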
👥 Agent Teams
Team Lead + shared task list with file locking + peer-to-peer messaging. Auto-unblocking when dependencies clear.
- Sweet spot: 3–5 agents
- Coordination: Task list + locks
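The two coordination primitives named here, file locks and task dependencies, can be sketched in a few lines. This is an in-memory illustration under our own assumptions (the `Task` and `TaskBoard` names, the status values, and the file paths are hypothetical), not the API of any specific agent product.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    files: frozenset              # files this task needs exclusive access to
    deps: frozenset = frozenset() # names of tasks that must finish first
    status: str = "pending"       # pending -> running -> done

class TaskBoard:
    def __init__(self, tasks):
        self.tasks = {t.name: t for t in tasks}
        self.locks = {}  # file path -> name of the task holding the lock

    def claimable(self):
        """Pending tasks whose dependencies are done and whose files are free."""
        return [
            t.name for t in self.tasks.values()
            if t.status == "pending"
            and all(self.tasks[d].status == "done" for d in t.deps)
            and not any(f in self.locks for f in t.files)
        ]

    def claim(self, name):
        """Lock the task's files and start it; return False if it is blocked."""
        if name not in self.claimable():
            return False
        task = self.tasks[name]
        for f in task.files:
            self.locks[f] = name
        task.status = "running"
        return True

    def complete(self, name):
        """Release the task's locks; dependents become claimable automatically."""
        task = self.tasks[name]
        if task.status != "running":
            raise ValueError(f"{name} is not running")
        for f in task.files:
            self.locks.pop(f, None)
        task.status = "done"

board = TaskBoard([
    Task("schema", frozenset({"db/schema.sql"})),
    Task("api", frozenset({"src/api.ts"}), deps=frozenset({"schema"})),
    Task("seed", frozenset({"db/schema.sql"})),
])
board.claim("schema")
print(board.claim("seed"))   # blocked: schema holds db/schema.sql
board.complete("schema")
print(board.claimable())     # api and seed are now unblocked
```

Nothing in the sketch polls or notifies: "auto-unblocking" falls out of recomputing `claimable()` after every completion. A real shared task list would also need persistence and atomic claims across processes.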
☁️ Cloud Async
Assign a task, close the laptop, come back to a PR. Runs on managed VMs — in our case, on top of Agent Runtime from Gemini Enterprise Agent Platform.
- Mode: Fire-and-forget
- Session: Days at a time
The bottleneck is no longer generation. It's verification. Human review isn't optional overhead — it's the safety system.
The numbers that matter
- 3 agents in parallel; above that, dispersion
- +20% cost (ETH Zurich)
The cost figure matters: research by Gloaguen et al. (ETH Zurich) shows that letting agents write their own AGENTS.md degrades performance by ~3% and increases cost by 20%+. The AGENTS.md must be human-curated, because it is what encodes the team’s institutional knowledge.
AGENTS.md: the shared brain
Four sections are enough:
## STYLE
- Functional components with hooks; named exports
- Errors always typed; never `throw "string"`
## GOTCHAS
- SQLite needs WAL for concurrent reads
- Express middleware order matters for auth
## ARCH_DECISIONS
- State in SQLite, no in-memory cache
- One Express router per feature module
## TEST_STRATEGY
- Integration > unit for HTTP routes
- supertest for request assertions
Every session reads it. No agent writes to it directly — the lead approves every line that goes in.
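One way to keep the file honest is a small check that runs before a session starts and confirms all four sections exist and are non-empty. The sketch below assumes the exact heading names shown above and "- " bullets; `parse_sections` and `missing_sections` are hypothetical helper names.

```python
import re

# Section names taken from the sample above.
REQUIRED = ("STYLE", "GOTCHAS", "ARCH_DECISIONS", "TEST_STRATEGY")

def parse_sections(text):
    """Map each '## HEADING' to the list of '- ' bullets that follow it."""
    sections = {}
    current = None
    for line in text.splitlines():
        heading = re.match(r"^##\s+(\S+)\s*$", line)
        if heading:
            current = heading.group(1)
            sections[current] = []
        elif current and line.startswith("- "):
            sections[current].append(line[2:].strip())
    return sections

def missing_sections(text):
    """Return required sections that are absent or have no bullets."""
    found = parse_sections(text)
    return [name for name in REQUIRED if not found.get(name)]

SAMPLE = """\
## STYLE
- Functional components with hooks; named exports
## GOTCHAS
- SQLite needs WAL for concurrent reads
## ARCH_DECISIONS
- State in SQLite, no in-memory cache
## TEST_STRATEGY
- Integration > unit for HTTP routes
"""

print(missing_sections(SAMPLE))               # nothing missing
print(missing_sections("## STYLE\n- x\n"))    # three sections missing
```

The check only validates structure. Deciding what goes into each section remains the lead's job.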
5 practices to start tomorrow
Decompose the task into 2–3 surgically scoped children. No setup. The cheapest way to prove the thesis internally.
Give each agent its own git worktree. Agents never overwrite each other's working files, and each feature gets its own diff that is trivial to review.
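As a rough sketch of the worktree setup, the helper below builds one `git worktree add -b <branch> <path> <base>` command per agent. The `agent/<name>` branch naming, the `../worktrees` directory, and the `main` base branch are assumptions for illustration; the helper defaults to a dry run so nothing touches a repository unless you ask it to.

```python
import subprocess

def worktree_command(agent, root="../worktrees", base="main"):
    """Build the git command that gives one agent its own branch and directory."""
    branch = f"agent/{agent}"   # hypothetical naming convention
    path = f"{root}/{agent}"
    return ["git", "worktree", "add", "-b", branch, path, base]

def create_worktrees(agents, dry_run=True):
    """Return the commands; execute them only when dry_run is False."""
    commands = [worktree_command(agent) for agent in agents]
    if not dry_run:
        for command in commands:
            # Must be run from inside the repository that owns the worktrees.
            subprocess.run(command, check=True)
    return commands

for command in create_worktrees(["data", "api", "ui"]):
    print(" ".join(command))
```

Each agent then works in its own directory on its own branch, and the lead reviews one branch diff per feature.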
Teammate writes the plan; lead approves or rejects. Fixing architecture in the planning phase costs 1/10th of fixing it in code.
On TaskCompleted, validate automatically. If it fails, the agent keeps working. Lead only sees green code — like built-in CI.
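The gate can be pictured as a loop: the agent works, the checks run, and any failures are fed back until everything passes or an attempt budget runs out. This is a minimal sketch under our own assumptions; `run_gate`, `work_until_green`, and the fake agent are hypothetical names, and wiring the loop to a real TaskCompleted hook depends on the tool you use.

```python
def run_gate(checks):
    """Run every check; return (passed, list of failure messages)."""
    failures = []
    for check in checks:
        ok, message = check()
        if not ok:
            failures.append(message)
    return (not failures, failures)

def work_until_green(agent_step, checks, max_attempts=3):
    """Call agent_step with the last failures until the gate passes.

    Returns the attempt number that went green, or None if the budget ran out.
    """
    feedback = []
    for attempt in range(1, max_attempts + 1):
        agent_step(feedback)                 # agent sees the previous failures
        passed, feedback = run_gate(checks)  # validate automatically
        if passed:
            return attempt                   # only now does the lead see it
    return None

# Toy stand-ins: the "agent" fixes the tests once it receives feedback.
state = {"tests_fixed": False}

def fake_agent(feedback):
    if feedback:
        state["tests_fixed"] = True

def tests_pass():
    return (state["tests_fixed"], "tests failing")

print(work_until_green(fake_agent, [tests_pass]))
```

In a real setup each check would shell out to the test runner, linter, or type checker. The attempt budget keeps an agent that cannot reach green from looping indefinitely.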
AGENTS.md is the team's long-term memory: every session reads it and only the lead updates it, so patterns and gotchas are not rediscovered each sprint.
What changes at Autenticare
This model only holds up in production when you have the infrastructure to sustain it. That’s exactly what the Gemini Enterprise Agent Platform (announced 04/22) delivered:
- Agent Runtime gives you the “managed VMs” of the Cloud Async pattern — sub-second cold start, day-long sessions.
- Memory Bank + Memory Profiles is AGENTS.md elevated to a platform primitive: long-term memory across sessions.
- Agent Sandbox is the isolated worktree in production: generated code runs without risk to the real system.
Not a coincidence. The industry path is the same: from one-shot prompt execution to the agent factory.
You're no longer writing software. You're building the factory that writes the software.
Want to move from a single agent to an orchestra?
We structure 3-to-5 agent teams on Gemini Enterprise Agent Platform — with worktrees, curated AGENTS.md, quality hooks and plan approval. Auditable, repeatable and measurable.
Inspired by "The Code Agent Orchestra" by Addy Osmani (Google Chrome). Adapted to the stack we operate at Autenticare on top of Gemini Enterprise Agent Platform.
