Coding is now AI Agent Management

March 2026 • Nathan Cavaglione

Slawk - a Slack clone built with AI agents

github.com/ncvgl/slawk

I posted about building a Slack clone using multiple AI agents. Everybody asked: “How did the agents talk to each other?”

I initially treated Claude Code like a smart assistant — write prompt, get code, review, repeat. But after 30 exchanges, context bloat kicked in: slower responses, confusing outputs, compaction takes a few minutes…

So I tried something different: What if I treated AI like a dev team instead of a chatbot?

One persistent “CTO” session that plans, delegates, and reviews. It never writes code. It just thinks.

Ephemeral “developer” sessions that get spawned for single tasks, build the feature, commit, and get deleted. No conversation history. No memory of previous tasks.

Disposable “QA” sessions that test the app like a human user, report bugs, and disappear.

Agent orchestration flow diagram

QA sessions use ~10-20 tool calls, developer sessions average 30-50K tokens per feature, the CTO session sits around 80K after 3 days. Total: maybe $20-30. The velocity gains are worth 100x that.

How much the CTO controls the handoff depends on how much you trust the system. Early on I had it write each developer prompt to a local .md file — I’d review it, then manually start the subagent session. Once I stopped changing anything, I let the CTO spawn subagents directly. Same outcome with less friction.

Here’s the prompt I gave the CTO for a real task:

Implement thread replies as the next feature.
Use TDD: write Playwright test first, then implement.
The feature should work end-to-end (backend + frontend).
Commit when done and report back.

That’s it. The CTO plans the task then spawns a developer agent with this prompt:

You are implementing thread replies for a Slack clone.

Backend requirements:
- Add thread_id foreign key to messages table
- Add GET /api/messages/:id/replies endpoint
- Add real-time events for new replies

Frontend requirements:
- Add "Reply in thread" button to messages
- Add right sidebar that shows thread when opened
- Show reply count on parent message
- Real-time updates when new replies arrive

Process:
1. Write Playwright test that verifies the full flow
2. Implement backend changes
3. Implement frontend changes
4. Iterate until test passes
5. Run ALL existing tests to ensure nothing broke
6. Commit with message: "feat: Add thread replies"

The agent has no context about previous features, so it explores the codebase first:

grep -r "messages" backend/src --include="*.ts"
cat backend/src/routes/messages.ts
cat frontend/src/components/MessageList.tsx

5-6% of its context window. It finds what it needs and gets to work.

Fresh eyes catch mistakes. Without context, the agent can’t make assumptions. It reads the actual code, which often reveals inconsistencies a loaded agent would miss.

TDD is the safety net. If the agent misses something, the test fails. It iterates:

Test failed: Thread sidebar doesn't open when clicking reply count

Debugging... found the issue: missing event listener on reply count button
Fixed and re-running tests...

All tests passing ✅

Developer agent running Playwright tests, iterating until all pass

Speed and parallelization. A fresh agent responds in 5 seconds vs 15-20 for a bloated one. And since agents don’t share context, they can work simultaneously using git worktrees.

After building 4-5 features, I tell the CTO:

Spawn a QA agent to test the app using the MCP browser extension.
Have it test the main features and report bugs it finds.

The CTO spawns a QA agent with this brief below. Make sure you have it enabled by typing /chrome in Claude Code. See this chrome extension

You are a QA engineer testing a Slack clone.

Open the app at http://localhost:5173 using Chrome MCP.
Test these features like a real user:
- Register a new account
- Create a channel
- Send messages
- Try replying in a thread
- Upload a file
- Use search

For each bug:
- Take a screenshot
- Note the steps to reproduce
- Rate severity (Critical/High/Medium/Low)

Write your findings to qa-report.md

To validate UI accuracy, I gave it two URLs: my local app and an actual Slack workspace I’m part of. Its job was to test the same features in both and flag differences. It clicks buttons, fills forms, pulls JavaScript from the page for advanced debugging, takes screenshots when things break.

QA agent navigating the reaction feature and confirming the thumbs up works

The result in Slawk — thumbs up reaction on a message

Something I didn’t ask for: halfway through, it created a slack-design-system.md file — colors, spacing, component patterns, all mapped from the real Slack UI. It just decided having a design anchor would help it stay consistent. I thought that was smart.

After 10-15 minutes, I get a markdown file:

# QA Report - Thread Feature Testing

## Bug #1: Thread sidebar doesn't show typing indicators
Severity: Medium
Steps: Open thread, have another user type a reply
Expected: See "User is typing..." in thread
Actual: No typing indicator appears

## Bug #2: Thread reply count doesn't update in real-time
Severity: High
Steps: User A opens channel, User B replies in thread
Expected: Reply count updates immediately
Actual: Need to refresh to see new count

QA agent feature checklist — 15+ features tested, all marked PASS

The CTO reads this report and spawns fix agents for each bug. Just ask it for subagents and you will see it spawn them.

The last 20% takes 80% of the time. After the QA agent found 14 bugs, I asked the CTO to fix them all in parallel. It ran them 3 at a time using subagents. Three hours total.

Half that time went to a single bug: removing a star icon stuck in the bottom-right corner of the screen. The agent tried everything — searched the codebase, modified CSS, rebuilt components. Nothing worked. Because the icon wasn’t part of Slawk. It was the Claude MCP extension overlaid on top of the browser.

The agent had no way to know that. It needed me to step in and say: skip this one, it’s not an issue.

14 bugs tracked in a table — the last row reads 'Not a bug, it's Claude Code itself'

That’s the pattern. Parallel AI work is fast until it hits something it can’t resolve alone — and then it waits for you. The moment you step in, 10 agents working simultaneously collapse to one, because your attention can only be in one place. You are the bottleneck.

The key to speed isn’t better prompts. It’s removing yourself from the feedback loop. And occasionally checking if it needs you to step back in.