Agent Team Philosophy

Two Principles. One Quality System.

The team is built on two compounding ideas — each powerful alone, decisive together.

Philosophy 01

Quality emerges from adversarial loops

A critic agent actively tries to break what a builder just produced. The builder fixes only what failed. The critic tries again. This loop runs unattended until nothing breaks or the iteration cap is reached, compressing what would be days of real-team code review into minutes, with no human intervention required.

critic → builder → critic · unattended · no human needed

Philosophy 02

Specialization beats generalism

Twelve agents each own exactly one part of the problem. A storage agent that only sees schema files reasons better about schema than a session that has seen everything. Specialization removes context pollution, role confusion, and the quality cliff that degrades single-session output past ~200 lines.

12 specialists · clean context · role clarity

Philosophy 01 · The Adversarial Loop

How It Works

The Critic–Builder Feedback Cycle

The critic is not a reviewer — it actively constructs failure scenarios, adversarial inputs, and edge cases the builder didn't consider. When it finds something, it returns a specific ISSUES list. The builder fixes only those items. Then the critic runs again from scratch.

Step 1 · Builder · Implements

Step 2 · Critic · Tries to break it

Step 3 · Builder · Fixes issues

Step 4 · Critic · Reviews again

Result · PASS · Done

Repeats up to N iterations · No human intervention required
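
The cycle above can be sketched as a small control loop. This is a minimal illustration rather than the team's actual orchestration code: the `Review` type, the `Builder` and `Critic` call signatures, the default `max_iterations` value, and the toy builder and critic are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Review:
    """Critic verdict. An empty ISSUES list means PASS."""
    issues: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.issues


# Hypothetical signatures: the builder receives the task, its previous
# artifact, and the issues to fix; the critic inspects an artifact from scratch.
Builder = Callable[[str, str, List[str]], str]
Critic = Callable[[str], Review]


def adversarial_loop(task: str, build: Builder, critique: Critic,
                     max_iterations: int = 5) -> Tuple[str, Review, int]:
    """Alternate builder and critic until PASS or the iteration cap."""
    artifact = build(task, "", [])           # Step 1: builder implements
    review = critique(artifact)              # Step 2: critic tries to break it
    rounds = 1
    while not review.passed and rounds < max_iterations:
        artifact = build(task, artifact, review.issues)  # Step 3: fix listed issues only
        review = critique(artifact)                      # Step 4: critic reviews again
        rounds += 1
    return artifact, review, rounds


# Toy stand-ins so the loop can be run without any model calls.
REQUIRED_CASES = ["empty input", "timeout"]


def toy_builder(task: str, previous: str, issues: List[str]) -> str:
    if not previous:
        return f"impl({task})"
    return previous + "".join(f" +handles {issue}" for issue in issues)


def toy_critic(artifact: str) -> Review:
    # Reports one missing case per round to force several iterations.
    missing = [case for case in REQUIRED_CASES if case not in artifact]
    return Review(issues=missing[:1])


artifact, review, rounds = adversarial_loop("parse", toy_builder, toy_critic)
```

With the toy pair, the loop passes on the third critic run. If the cap is hit first, the caller still receives the last ISSUES list, so an unfinished loop can be handed to a human rather than silently reported as done.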

Philosophy 01 · Why It's Different

What Makes It Different

The adversarial loop isn't human review and it isn't automated testing — it's a third thing that combines properties of both.

Unlike human review

No calendar dependency. No reviewer fatigue. No social pressure to approve work from a colleague. The critic has no relationship with the builder — its only job is to find failures, and it runs the moment the builder finishes.

Unlike automated tests

Tests verify what you thought to test. The critic actively constructs scenarios you didn't think of — adversarial inputs, race conditions, implicit assumptions, edge cases outside the spec. It looks for what you missed, not what you covered.

Philosophy 01 · Evidence

What Breaks Without It

Skip critic

Subtle logic errors, edge case gaps, and security issues reach production review. The critic actively tries to construct failure scenarios — it is harder to satisfy than a reviewer because its job is adversarial, not approving. Without it, the builder's blind spots become the reviewer's blind spots too.

Philosophy 02 · The Problem

The Single-Agent Trap

What Goes Wrong When One Agent Does Everything

These are not hypothetical failure modes — they are the structural problems that drove the team's design.

Context Pollution

An agent that has seen schema, frontend, tests, and business requirements all in one session reasons poorly about any of them. Each new piece of context crowds out earlier reasoning. The signal-to-noise ratio drops with every tool call.

Role Confusion

A general-purpose agent asked to both design and implement makes compromises — cutting corners on design to ship faster, or over-engineering implementation to prove capability. Specialization removes this tension completely.

Quality Cliff

Single-session quality degrades predictably as tasks grow. The first 200 lines are good. By 500 lines, the agent is fighting its own earlier decisions. By 1000 lines, it contradicts the architecture it designed 20 minutes ago.

Philosophy 02 · The Solution

12 Agents, Each With One Job

Every agent has a single domain, a single model tier, and a single place in the sequence. No agent makes decisions outside its scope.

orchestrator · Routes and sequences, never writes code

architect · Design decisions, pre-implementation

ideator · Lateral thinking, output to human only

critic · Adversarial review, tries to break the code · 🌐 Playwright

frontend · UI, React, TypeScript, Tailwind · 🌐 Playwright

backend · API, DB queries, Auth, Supabase

storage · All storage, sole RLS owner

researcher · Web research, docs, library investigation

tester · Tests, write and verify · 🌐 Playwright

reviewer · Code review, read only, structured output · 🌐 Playwright

explorer · Codebase navigation, read only, cheap

author · Docs and changelog, last step only

opus — expensive / rare

sonnet — workhorse

haiku — cheap / constant
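
One way to make "a single domain, a single model tier, and a single place in the sequence" checkable is to encode the roster as data and validate it. The sketch below is illustrative only: `AgentSpec`, `validate_roster`, and in particular the tier and position values given to the three sample agents are assumptions, since the per-agent tier assignment is not stated here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Tier(Enum):
    OPUS = "opus"      # expensive / rare
    SONNET = "sonnet"  # workhorse
    HAIKU = "haiku"    # cheap / constant


@dataclass(frozen=True)
class AgentSpec:
    name: str
    domain: str
    tier: Tier       # exactly one tier per agent, enforced by the type
    position: int    # exactly one place in the sequence


def validate_roster(roster: List[AgentSpec]) -> List[AgentSpec]:
    """Reject duplicate names or shared positions, then return run order."""
    names = [agent.name for agent in roster]
    positions = [agent.position for agent in roster]
    if len(set(names)) != len(names):
        raise ValueError("duplicate agent name")
    if len(set(positions)) != len(positions):
        raise ValueError("two agents share a sequence position")
    return sorted(roster, key=lambda agent: agent.position)


# Hypothetical tiers and positions, chosen only to exercise the check.
sample_roster = [
    AgentSpec("author", "Docs and changelog", Tier.HAIKU, 3),
    AgentSpec("explorer", "Codebase navigation", Tier.HAIKU, 1),
    AgentSpec("architect", "Design decisions", Tier.OPUS, 2),
]
run_order = [agent.name for agent in validate_roster(sample_roster)]
```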

Philosophy 02 · How It Runs

Sequential. Foreground. One at a Time.

Parallel agents sound faster. They're not better. Sequential execution keeps context clean, makes dependencies explicit, and means each agent's output is available to feed directly into the next agent's brief.

Focused context beats broad context

Each agent starts with a clean context window containing only what its role requires. This is not about model capability — it's about signal quality. A storage agent that only sees schema files reasons better about schema than any agent that has seen everything.

Delegate early, not late

The cost of fixing a wrong design after implementation is far higher than the token cost of running architect before builder. When in doubt, add the pre-implementation step. An architect brief is cheap. A builder rewrite is not.

The constraint is clarity, not speed

AI generates code faster than humans can review it. The bottleneck is always intent quality — a vague brief produces vague code regardless of which agent runs it. Sequential handoffs force each brief to be explicit and complete.
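
A rough sketch of the sequential handoff described in this section: each agent starts from a fresh brief that contains the task plus only the previous agent's output. The `Agent` signature, the brief layout, the stub agents, and the explorer → architect → frontend order are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical agent interface: a brief goes in, the agent's output comes out.
Agent = Callable[[str], str]


def run_sequence(task: str, steps: List[Tuple[str, Agent]]) -> Dict[str, str]:
    """Run agents one at a time, feeding each output into the next brief."""
    outputs: Dict[str, str] = {}
    previous = ""
    for name, agent in steps:
        # Clean context: the brief holds the task and the prior step's output only.
        if previous:
            brief = f"{task}\n\nInput from previous step:\n{previous}"
        else:
            brief = task
        previous = agent(brief)
        outputs[name] = previous
    return outputs


# Stub agents that record the brief they received, for inspection.
seen_briefs: Dict[str, str] = {}


def stub(label: str) -> Agent:
    def agent(brief: str) -> str:
        seen_briefs[label] = brief
        return f"{label} output"
    return agent


outputs = run_sequence(
    "add login page",
    [("explorer", stub("explorer")),
     ("architect", stub("architect")),
     ("frontend", stub("frontend"))],
)
```

In this sketch the frontend stub sees the architect's output but not the explorer's. A real orchestrator would likely decide per role which earlier outputs belong in a brief; the point is that the decision is explicit rather than an accident of shared context.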

Philosophy 02 · Evidence

What Breaks Without It

Skip explorer

Builders create files or patterns inconsistent with the codebase — wrong directory, wrong naming convention, wrong abstraction level. Explorer's "existing patterns to follow" section is what keeps builders consistent on multi-file tasks. Without it, every builder starts from assumptions, not facts.

Skip architect

Builders solve the wrong problem elegantly. Correct implementation of a wrong design is the most expensive kind of rework — the code works, passes tests, clears review, and still needs to be thrown away. Architect prevents this by producing a written design decision before any code is written.

Cost & Quality

When the Overhead Is Worth It

Subagents multiply token usage by 4–7× versus a single session. The multiplier is justified when focused context produces better output than one bloated session. It is not justified for simple, single-file tasks.
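
As a back-of-envelope check, the 4–7× multiplier can be applied to a single-session estimate. The function name and the 50,000-token baseline below are hypothetical; only the multiplier range comes from the text.

```python
from typing import Tuple


def subagent_token_range(single_session_tokens: int,
                         low: int = 4, high: int = 7) -> Tuple[int, int]:
    """Return the (low, high) token estimate for the multi-agent run."""
    return single_session_tokens * low, single_session_tokens * high


# Hypothetical baseline: a task that takes 50,000 tokens in one session.
low_estimate, high_estimate = subagent_token_range(50_000)
# low_estimate == 200_000, high_estimate == 350_000
```

For a simple single-file task, that extra 150,000 to 300,000 tokens buys little. For a multi-file task that would otherwise hit the quality cliff, it may be the cheaper path once rework is counted.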