She Wants to Talk to the Manager
A multi-agent system that spawns autonomous AI agents, each in its own terminal, working together like a software team. No servers. No databases. Just shell scripts and markdown.
From one command to a team of AI agents building your software
You type one sentence: "Build me an invoicing SaaS." Within seconds, an AI product manager starts asking you clarifying questions. A minute later, a dev lead is breaking tasks into tickets. Developers start writing code in parallel. A QA agent tests everything.
That's agent-karen. It's an orchestration system that turns a single Claude Code session into a full software team.
karen init ~/my-project
Sets up the scaffolding in your project: creates folders for inboxes, memory, and context. Writes permissions so agents can work without asking for approval on every command.
karen start
Boots up a manager agent in your terminal. This is the agent you talk to directly. It reads its role instructions and waits for your command.
"Spawn a PM and let's brainstorm"
You tell the manager what you want. It spawns agents in new terminal tabs, delegates work, monitors progress, and reports back to you.
Let's trace the journey after you tell the manager to build something.
Understanding this system means you can direct AI agents more effectively. Instead of one monolithic AI doing everything, you get specialized agents that focus on what they're good at — just like a real team. When you know how the pieces connect, you can tell the manager exactly what to do, debug communication breakdowns, and add new roles tailored to your project.
From zero to a working multi-agent team in under 2 minutes
karen runs on top of two tools you'll need installed first:
Claude Code: Anthropic's CLI for Claude. This is the AI engine that powers each agent. Install with npm install -g @anthropic-ai/claude-code
cmux: A terminal multiplexer for macOS that gives each agent its own tab. Download from cmux.com. tmux also works.
One command. That's it.
npm install -g agent-karen
Downloads karen from the npm registry and installs it globally so you can use the karen command from anywhere.
karen init ~/projects/my-app
Creates the .agent/ directory with inboxes, memory, and state folders. Writes permissions to .claude/settings.json so agents can work autonomously. Sets up hooks for message passing.
cd ~/projects/my-app
karen start
Launches a cmux workspace with the manager agent. This is the agent you'll talk to directly — it orchestrates everything else.
"I want to build a task management app
with drag-and-drop. Spawn a PM and
let's brainstorm."
The manager spawns a PM agent in a new tab. The PM asks you clarifying questions and writes a brief. The manager then spawns a dev lead, who breaks the brief into tasks and assigns developers, all automatically.
If you have existing documentation you want agents to reference (API docs, design specs, etc.), link them during init:
karen init ~/projects/my-app \
--knowledge ~/docs/api-reference \
--knowledge ~/docs/design-system
Creates symlinks in .agent/knowledge/ pointing to your docs. Every agent reads these on boot, so they have full context about your project without you having to explain it each time.
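To make the linking step concrete, here is a minimal sketch of what it could look like. It is not karen's actual code: link_knowledge is a hypothetical helper, and only the .agent/knowledge/ destination comes from the description above. The demo runs in a throwaway directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: link documentation folders into .agent/knowledge/.
# link_knowledge is an illustrative name, not part of karen.

link_knowledge() {
  local project_dir="$1"; shift
  local knowledge_dir="$project_dir/.agent/knowledge"
  mkdir -p "$knowledge_dir"

  local src
  for src in "$@"; do
    if [[ ! -e "$src" ]]; then
      echo "skipping missing path: $src" >&2
      continue
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: do not descend into an existing link that points at a directory.
    ln -sfn "$src" "$knowledge_dir/$(basename "$src")"
  done
}

# Demo in a temporary directory so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/docs/api-reference" "$demo/my-app"
link_knowledge "$demo/my-app" "$demo/docs/api-reference" "$demo/docs/not-there"
ls -l "$demo/my-app/.agent/knowledge"
```

Because these are symlinks rather than copies, edits to the original docs are visible to agents the next time they boot.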
Want agents with specialized behaviors? Create a .agent-roles/ directory in your project and add markdown files like copywriter.md or data-engineer.md. Karen auto-discovers them — no config needed.
Every agent has a role, a personality, and a strict job description
Here's a secret that makes the whole system click: every agent is the same AI model (Claude). What makes each agent different is its role file — a markdown document that tells the agent who it is, what tools it has, and how to behave.
Think of it like an acting troupe. Same actors, different scripts. The PM reads pm.md and becomes a product manager. The QA reads qa.md and becomes a testing specialist.
Manager: The boss. Talks to you, spawns agents, monitors health, never writes code. The only agent that stays alive for the whole session.
Product Manager (PM): Asks questions, writes the product brief, defines scope. First to spawn, first to finish.
Dev Lead: Breaks the brief into tasks, spawns developers, coordinates QA. The technical brain.
Developer: Writes code, runs tests, reports back. Multiple devs can work in parallel on different tasks.
QA: Reviews code, runs test suites, writes bug reports. The last line of defense before shipping.
Security: Audits for vulnerabilities. Classifies findings by severity. P0 findings block the entire release.
Every role file follows the same structure. Here's the key section from the PM's role:
# ROLE: Product Manager
## Workflow
1. Read your inbox for the product goal.
2. Ask 3-5 clarifying questions via msg.sh.
3. Write the brief to .agent/context/brief.md.
4. Notify manager: "Brief complete."
## Sending messages
.agent/scripts/msg.sh manager "<msg>" question
.agent/scripts/msg.sh lead "<msg>" result
This file defines the agent's identity — it's a product manager.
Step-by-step instructions the agent follows, in order.
First, check the inbox to understand what you're building.
Ask the manager specific questions to fill gaps.
Write a structured document with the product spec.
Signal that the work is done so the next phase can start.
These are the exact commands the agent runs to send messages to other agents.
Want a "copywriter" agent? Create .agent-roles/copywriter.md in your project with the same structure. The spawn system automatically finds it. You're programming AI behavior with plain English — no code required.
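As a rough illustration, a custom role could be created like this. The role text is made up for the example and simply mirrors the PM structure shown earlier. The helper name write_copywriter_role and the output path .agent/context/copy.md are assumptions, not something karen defines.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: create a custom role file in .agent-roles/.
# The role content below is illustrative; adapt the workflow to your project.

write_copywriter_role() {
  local project_dir="$1"
  mkdir -p "$project_dir/.agent-roles"
  # Quoted 'EOF' keeps the text literal (no variable expansion).
  cat > "$project_dir/.agent-roles/copywriter.md" <<'EOF'
# ROLE: Copywriter
## Workflow
1. Read your inbox for the writing task.
2. Ask clarifying questions via msg.sh.
3. Write the copy to .agent/context/copy.md.
4. Notify manager: "Copy complete."
## Sending messages
.agent/scripts/msg.sh manager "<msg>" question
.agent/scripts/msg.sh manager "<msg>" result
EOF
}

# Demo in a temporary directory standing in for your project.
project=$(mktemp -d)
write_copywriter_role "$project"
cat "$project/.agent-roles/copywriter.md"
```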
JSONL inboxes, message passing, and a shared audit trail
Every agent has a physical mailbox on disk: a file called .agent/inbox/{name}.jsonl. When one agent wants to talk to another, it doesn't call it directly — it drops a letter in the mailbox.
The recipient checks their mailbox automatically before every response (via a hook). If there's new mail, it appears right in their conversation context.
Each message is a single line of JSONL (shown here across several lines for readability):
{
"from": "pm",
"type": "question",
"ts": "2026-03-22T10:30:00Z",
"body": "What's the target user persona?"
}
This message is from the PM agent.
It's a question (as opposed to a result, escalation, or regular message).
Timestamped so you can trace the conversation history.
The actual content of the message — what the PM is asking.
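Here is one way a message like this could be appended to an inbox. Treat it as a sketch under assumptions: send_message, json_escape, and INBOX_DIR are hypothetical names, and the real msg.sh may work differently. Only the field names, the AGENT_ROLE variable, and the recipient/message/type argument order come from this page.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: append one message to a recipient's JSONL inbox.
# send_message, json_escape, and INBOX_DIR are illustrative names.

json_escape() {
  local s="$1"
  s=${s//\\/\\\\}      # backslash    -> \\
  s=${s//\"/\\\"}      # double quote -> \"
  s=${s//$'\n'/\\n}    # newline      -> \n, so the message stays on one line
  printf '%s' "$s"     # other control characters are not handled in this sketch
}

send_message() {
  local to="$1" body="$2" type="${3:-message}"
  local inbox_dir="${INBOX_DIR:-.agent/inbox}"
  local from="${AGENT_ROLE:-unknown}"
  local ts
  ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  mkdir -p "$inbox_dir"
  # One printf with a trailing newline, appended with >>, gives one JSONL line.
  printf '{"from":"%s","type":"%s","ts":"%s","body":"%s"}\n' \
    "$(json_escape "$from")" "$(json_escape "$type")" "$ts" "$(json_escape "$body")" \
    >> "$inbox_dir/$to.jsonl"
}

# Demo against a temporary inbox directory.
INBOX_DIR=$(mktemp -d)
AGENT_ROLE=pm
send_message manager 'What is the "target" user persona?' question
cat "$INBOX_DIR/manager.jsonl"
```

Appending rather than rewriting means the sender never has to read or lock the recipient's file, which is part of why a plain file can stand in for a queue here.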
The type field in each message tells the recipient how to react:
message
Regular communication. "Here's what I found." No urgency.
question
Needs a response before the sender can continue.
result
Work is done. Signals completion — triggers auto-shutdown of the sender's workspace.
escalation
Something is wrong. Goes straight to the manager. Priority handling.
unblock
Answering a blocker. "Go ahead and use bcrypt for password hashing."
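The five types above map naturally onto a case statement. The sketch below is illustrative only: message_type and describe_reaction are hypothetical helpers, and the sed extraction assumes the flat message shape shown earlier (a tool like jq would be more robust for anything richer).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pick a reaction based on a message's "type" field.

message_type() {
  # Pull the value of "type" out of one JSONL line.
  # Shortcut that assumes a flat object with a simple string value.
  sed -n 's/.*"type"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' <<<"$1"
}

describe_reaction() {
  case "$1" in
    message)    echo "read when convenient" ;;
    question)   echo "reply so the sender can continue" ;;
    result)     echo "sender is done; its workspace will shut down" ;;
    escalation) echo "route to the manager with priority" ;;
    unblock)    echo "blocker answered; resume work" ;;
    *)          echo "unknown type: $1" ;;
  esac
}

line='{"from":"pm","type":"question","ts":"2026-03-22T10:30:00Z","body":"What is the target user persona?"}'
describe_reaction "$(message_type "$line")"
```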
There's no server, no message queue, no database. Everything is flat files on disk. Inboxes are JSONL files. The audit trail is a markdown file. Using the filesystem as your state store is surprisingly powerful for local tools: it's human-readable, version-controllable, and has no server process that can crash.
How spawn.sh boots an agent and notify-done.sh retires it
When the manager runs .agent/scripts/spawn.sh pm "Build X", a precise sequence unfolds — like a factory assembly line producing a new worker:
The script searches three directories in order: your project's .agent-roles/, the scaffold's custom-roles/, and the default roles/. First match wins.
Drops a JSONL message into .agent/inbox/pm.jsonl with the context you provided. This is the agent's first instruction.
Records the event in .agent/communications.md so there's a paper trail.
Opens a new tab in cmux (or tmux), copies the role file as CLAUDE.md, and launches Claude Code with boot instructions.
Claude reads its role file, checks shared memory, reads its personal memory from prior sessions, scans the knowledge base, then reads the inbox. Work begins immediately.
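The "first match wins" lookup in the first step can be sketched in a few lines. This is an assumption about how such a search might be written, not spawn.sh itself: find_role_file is a hypothetical name, and the directory locations are passed in as arguments because the scaffold's real paths aren't shown here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: search role directories in priority order.

find_role_file() {
  local role="$1"; shift
  local dir
  for dir in "$@"; do
    if [[ -f "$dir/$role.md" ]]; then
      printf '%s\n' "$dir/$role.md"
      return 0          # first match wins; later directories are ignored
    fi
  done
  echo "no role file found for: $role" >&2
  return 1
}

# Demo: a project override shadows the default role of the same name.
demo=$(mktemp -d)
mkdir -p "$demo/.agent-roles" "$demo/custom-roles" "$demo/roles"
echo "# ROLE: Product Manager" > "$demo/roles/pm.md"
echo "# ROLE: Product Manager (project override)" > "$demo/.agent-roles/pm.md"

find_role_file pm "$demo/.agent-roles" "$demo/custom-roles" "$demo/roles"
```

Ordering the directories from most specific to most general is what lets a project replace a built-in role without touching the scaffold.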
Here's the actual command that boots the agent in a new terminal:
BOOTSTRAP=$(cat <<EOF
cd "$WORKDIR" && \
export AGENT_ROLE="$ROLE" && \
cp "$ROLE_FILE" CLAUDE.md && \
claude "You have been activated
as $ROLE. Orient yourself..."
EOF
)
mux_spawn "$ROLE" "$BOOTSTRAP"
Build a startup command as a text string...
Navigate to the project directory.
Set an environment variable so hooks know which agent this is.
Copy the role definition as the agent's instructions file.
Launch Claude with a first-run prompt telling it to read its role, memory, and inbox, then start working.
Send this command to a brand new terminal workspace.
When an agent sends a result message, the system knows it's done. The notify-done.sh Stop hook detects this and closes the workspace after a 2-second delay.
Without this, finished agents would sit forever showing "Needs Input" in the terminal — a zombie worker taking up space.
When an agent shuts down, its inbox, memory file, and context artifacts all stay on disk. Respawning the same role picks up where it left off — like an employee coming back from vacation and reading their notes.
Clever patterns that make the system work without a server
How does an agent know it has new mail without constantly checking? The answer is hooks — shell scripts that Claude Code runs automatically at specific moments.
Inbox check
When: Before every Claude response.
Does: Reads the inbox file, finds unread messages using a cursor, and outputs them for Claude to see.
notify-done.sh
When: After every Claude response.
Does: Checks whether the agent sent a "result" message. If so, closes the workspace.
Idle reaper (optional)
When: After every Claude response.
Does: If enabled, checks how long other agents have been idle and reaps them after a timeout.
Instead of marking messages as "read" (which would require modifying the inbox file), the system tracks a cursor — a number stored in .agent/state/{role}_inbox_cursor.
TOTAL_LINES=$(wc -l < "$INBOX" | tr -d ' ')
CURSOR=0
if [[ -f "$CURSOR_FILE" ]]; then
CURSOR=$(cat "$CURSOR_FILE")
fi
if [[ "$TOTAL_LINES" -le "$CURSOR" ]]; then
exit 0 # No new messages
fi
Count how many total messages are in the inbox file.
Start with cursor at zero (haven't read anything).
If there's a saved cursor from last time, use that instead.
If the total message count is less than or equal to the cursor, everything has been read. Nothing new, so exit quietly.
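The snippet above stops at the early exit. A hook along these lines could then print the unread lines and advance the cursor. This is a sketch, not the actual hook: read_new_messages is a hypothetical name, and the choice of sed for slicing is mine.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print unread inbox lines, then advance the cursor.

read_new_messages() {
  local inbox="$1" cursor_file="$2"
  [[ -f "$inbox" ]] || return 0

  local total cursor=0
  total=$(wc -l < "$inbox" | tr -d ' ')
  if [[ -f "$cursor_file" ]]; then
    cursor=$(cat "$cursor_file")
  fi
  (( total > cursor )) || return 0   # nothing new

  # Print lines cursor+1..total. Bounding the range by $total means a line
  # appended mid-read is left for the next run instead of being skipped.
  sed -n "$((cursor + 1)),${total}p" "$inbox"

  mkdir -p "$(dirname "$cursor_file")"
  printf '%s\n' "$total" > "$cursor_file"
}

# Demo with a temporary inbox.
demo=$(mktemp -d)
inbox="$demo/inbox/pm.jsonl"
cursor_file="$demo/state/pm_inbox_cursor"
mkdir -p "$demo/inbox"
echo '{"from":"manager","type":"message","body":"first"}' >> "$inbox"
read_new_messages "$inbox" "$cursor_file"   # prints the first message
echo '{"from":"manager","type":"message","body":"second"}' >> "$inbox"
read_new_messages "$inbox" "$cursor_file"   # prints only the second message
```

The inbox stays append-only, so the only file the reader ever writes is its own cursor.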
Karen needs to open terminal tabs, send text to them, and close them. But different users have different terminal tools. The solution: lib/mux.sh — a single API that works across three backends.
cmux: Visual macOS app with tabs, status bar, and notifications. The premium experience.
tmux: Classic terminal multiplexer. Works everywhere. Hidden windows, keyboard-driven.
macOS fallback: Opens native tabs. No push messaging, so agents poll instead.
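To show the shape of such a layer, here is a dry-run sketch. It borrows the mux_spawn name from the bootstrap snippet, but the body is an assumption: detect_backend, the spawn_* helpers, the MUX_BACKEND override, the detection order, and the existence of a cmux command on PATH are all hypothetical. The helpers only print what they would do, since the real backend commands aren't shown on this page.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one spawn function, three interchangeable backends.
# Everything is a dry run; the real lib/mux.sh may look quite different.

detect_backend() {
  # Assumed precedence: cmux, then tmux, then the native fallback.
  if command -v cmux >/dev/null 2>&1; then
    echo cmux
  elif command -v tmux >/dev/null 2>&1; then
    echo tmux
  else
    echo native
  fi
}

spawn_cmux()   { echo "[cmux] would open tab '$1' running: $2"; }
spawn_tmux()   { echo "[tmux] would run: tmux new-window -n '$1' '$2'"; }
spawn_native() { echo "[native] would open a terminal tab '$1' running: $2"; }

mux_spawn() {
  local name="$1" cmd="$2"
  local backend="${MUX_BACKEND:-$(detect_backend)}"
  case "$backend" in
    cmux)   spawn_cmux "$name" "$cmd" ;;
    tmux)   spawn_tmux "$name" "$cmd" ;;
    native) spawn_native "$name" "$cmd" ;;
    *)      echo "unknown backend: $backend" >&2; return 1 ;;
  esac
}

# Callers never mention a backend; they just ask for a workspace.
MUX_BACKEND=tmux mux_spawn pm "echo hello"
```

Callers such as a spawn script only ever see mux_spawn, so supporting another terminal tool would mean adding one helper and one case branch.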
This "one interface, multiple implementations" pattern appears in almost every piece of software. Your phone's camera app works whether you have an iPhone or Android — the app is the abstraction layer over different hardware. When you encounter an unfamiliar codebase, look for these layers — they're the seams that show you how the system is organized.
Debugging agent failures, communication breakdowns, and zombie workspaces
Agent systems have unique failure modes. Here are the big three and how to diagnose them:
Dead agent: An agent crashed or its terminal closed. Messages pile up in the inbox with nobody reading them. Fix: Run health.sh to find dead agents, then respawn them.
Lost message: An agent sent a message but the recipient never saw it. Usually the inbox hook didn't fire. Fix: Check communications.md to verify the message was logged, then check the recipient's cursor file.
Zombie workspace: A workspace file exists but the actual terminal is gone. Agents try to send to a nonexistent tab. Fix: Run shutdown.sh --all to clean up, then respawn what you need.
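When chasing a lost message by hand, comparing each inbox's line count with its cursor shows who has unread mail. The sketch below assumes the file locations described on this page (.agent/inbox/{name}.jsonl and .agent/state/{role}_inbox_cursor). The unread_count and report_unread helpers are hypothetical and are not part of health.sh or status.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: report unread message counts per agent.

unread_count() {
  local agent_dir="$1" role="$2"
  local inbox="$agent_dir/inbox/$role.jsonl"
  local cursor_file="$agent_dir/state/${role}_inbox_cursor"
  local total=0 cursor=0
  if [[ -f "$inbox" ]]; then total=$(wc -l < "$inbox" | tr -d ' '); fi
  if [[ -f "$cursor_file" ]]; then cursor=$(cat "$cursor_file"); fi
  echo $(( total - cursor ))
}

report_unread() {
  local agent_dir="${1:-.agent}" inbox role
  for inbox in "$agent_dir"/inbox/*.jsonl; do
    [[ -e "$inbox" ]] || continue      # glob matched nothing
    role=$(basename "$inbox" .jsonl)
    printf '%-12s unread=%s\n' "$role" "$(unread_count "$agent_dir" "$role")"
  done
}

# Demo: three messages delivered, cursor shows only one was read.
demo=$(mktemp -d)
mkdir -p "$demo/inbox" "$demo/state"
printf '%s\n' '{"body":"a"}' '{"body":"b"}' '{"body":"c"}' > "$demo/inbox/lead.jsonl"
echo 1 > "$demo/state/lead_inbox_cursor"
report_unread "$demo"
```

A count that keeps growing while the agent's terminal is still open suggests the inbox hook is not firing. A count that grows for a role with no open terminal points to a dead agent.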
health.sh
Shows every agent's status (UP/DOWN), inbox size, and last activity.
communications.md
The audit trail. Every spawn, message, and shutdown is logged with timestamps.
bd list
Shows all open tasks across agents. Tells you what's stuck and what's done.
status.sh
Quick snapshot of active workspaces, inbox counts, and task state.
How all the pieces fit together — and why there's no server
Every piece of state in karen lives in plain files under .agent/. They are human-readable and git-friendly, there is no server process to crash, and you can debug with cat and grep.
Instead of building a message broker, karen uses the terminal itself. cmux/tmux already solve visibility, switching, and notifications.
Agents write specs and reports in markdown (for you to read). They communicate via JSONL (for hooks to parse). Best format for each audience.
Agent behavior is defined entirely by a text file. No code to change, no settings to configure. Edit the markdown, change the agent.
You now understand how a team of AI agents coordinate using nothing but shell scripts, markdown files, and JSONL inboxes. No servers. No databases. Just clever use of the filesystem and terminal multiplexers.
Create custom roles by writing markdown files. Debug communication issues by tracing messages through inboxes and communications.md. Understand why agents behave the way they do — it's all in the role file. Extend the system by adding new hooks, backends, or integrations. The whole thing is ~500 lines of bash.