What Is Karen?

From one command to a team of AI agents building your software

Imagine Having a Software Team on Speed-Dial

You type one sentence: "Build me an invoicing SaaS." Within seconds, an AI product manager starts asking you clarifying questions. A minute later, a dev lead is breaking tasks into tickets. Developers start writing code in parallel. A QA agent tests everything.

That's agent-karen. It's an orchestration system that turns a single Claude Code session into a full software team.

The Three-Command Workflow

1. karen init ~/my-project

Sets up the scaffolding in your project: creates folders for inboxes, memory, and context. Writes permissions so agents can work without asking for approval on every command.

2. karen start

Boots up a manager agent in your terminal. This is the agent you talk to directly. It reads its role instructions and waits for your command.

3. "Spawn a PM and let's brainstorm"

You tell the manager what you want. It spawns agents in new terminal tabs, delegates work, monitors progress, and reports back to you.

What Happens After You Say "Go"

Let's trace the journey after you tell the manager to build something.

[Interactive trace, 7 steps: You → Manager → PM → Lead → Devs]

💡

Why This Matters for You

Understanding this system means you can direct AI agents more effectively. Instead of one monolithic AI doing everything, you get specialized agents that focus on what they're good at — just like a real team. When you know how the pieces connect, you can tell the manager exactly what to do, debug communication breakdowns, and add new roles tailored to your project.

02

Install & Get Started

From zero to a working multi-agent team in under 2 minutes

What You Need

karen runs on top of two tools you'll need installed first:

⚡

Claude Code

Anthropic's CLI for Claude. This is the AI engine that powers each agent. Install with npm install -g @anthropic-ai/claude-code

🎨

cmux

A terminal multiplexer for macOS that gives each agent its own tab. Download it from cmux.com. tmux also works.

Install karen

One command. That's it.

TERMINAL

npm install -g agent-karen

WHAT THIS DOES

Downloads karen from the npm registry and installs it globally so you can use the karen command from anywhere.

Set Up Your Project

1. Initialize the scaffold

TERMINAL

karen init ~/projects/my-app

WHAT THIS DOES

Creates the .agent/ directory with inboxes, memory, and state folders. Writes permissions to .claude/settings.json so agents can work autonomously. Sets up hooks for message passing.

2. Start the manager

TERMINAL

cd ~/projects/my-app
karen start

WHAT THIS DOES

Launches a cmux workspace with the manager agent. This is the agent you'll talk to directly — it orchestrates everything else.

3. Tell the manager what to build

IN THE MANAGER'S TERMINAL

"I want to build a task management app
with drag-and-drop. Spawn a PM and
let's brainstorm."

WHAT HAPPENS NEXT

The manager spawns a PM agent in a new tab. The PM asks you clarifying questions, writes a brief, then the manager spawns a dev lead who breaks it into tasks and assigns developers. All automatically.

Optional: Link Reference Docs

If you have existing documentation you want agents to reference (API docs, design specs, etc.), link them during init:

TERMINAL

karen init ~/projects/my-app \
  --knowledge ~/docs/api-reference \
  --knowledge ~/docs/design-system

WHAT THIS DOES

Creates symlinks in .agent/knowledge/ pointing to your docs. Every agent reads these on boot, so they have full context about your project without you having to explain it each time.
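To make the symlink idea concrete, here is a minimal sketch of what linking could look like. The function name link_knowledge and the choice to name each link after the source directory's basename are assumptions for illustration, not karen's actual implementation.

```shell
# Hypothetical helper: link each docs directory into <project>/.agent/knowledge/.
# Naming each link after the source directory's basename is an assumption.
link_knowledge() {
  project=$1
  shift
  mkdir -p "$project/.agent/knowledge"
  for src in "$@"; do
    # -s: symbolic link, -f: replace a stale link, -n: do not follow an existing link
    ln -sfn "$src" "$project/.agent/knowledge/$(basename "$src")"
  done
}

# Demo in a throwaway directory
demo=$(mktemp -d)
mkdir -p "$demo/docs/api-reference" "$demo/my-app"
echo "GET /invoices" > "$demo/docs/api-reference/endpoints.md"
link_knowledge "$demo/my-app" "$demo/docs/api-reference"
cat "$demo/my-app/.agent/knowledge/api-reference/endpoints.md"
```

Because these are links rather than copies, edits to the original docs would be visible to agents the next time they boot.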

📚

Custom Roles

Want agents with specialized behaviors? Create a .agent-roles/ directory in your project and add markdown files like copywriter.md or data-engineer.md. Karen auto-discovers them — no config needed.

02

Meet the Cast of Characters

Every agent has a role, a personality, and a strict job description

Agents Are Just Claude Reading Instructions

Here's a secret that makes the whole system click: every agent is the same AI model (Claude). What makes each agent different is its role file — a markdown document that tells the agent who it is, what tools it has, and how to behave.

Think of it like an acting troupe. Same actors, different scripts. The PM reads pm.md and becomes a product manager. The QA reads qa.md and becomes a testing specialist.

The Roster

👑

Manager

The boss. Talks to you, spawns agents, monitors health, never writes code. The only agent that stays alive for the whole session.

📋

PM

Asks questions, writes the product brief, defines scope. First to spawn, first to finish.

🛠

Dev Lead

Breaks the brief into tasks, spawns developers, coordinates QA. The technical brain.

💻

Dev (1, 2, 3...)

Writes code, runs tests, reports back. Multiple devs can work in parallel on different tasks.

🔎

QA

Reviews code, runs test suites, writes bug reports. The last line of defense before shipping.

🔒

Security

Audits for vulnerabilities. Classifies findings by severity. P0 findings block the entire release.

What a Role File Looks Like

Every role file follows the same structure. Here's the key section from the PM's role:

CODE

# ROLE: Product Manager

## Workflow
1. Read your inbox for the product goal.
2. Ask 3-5 clarifying questions via msg.sh.
3. Write the brief to .agent/context/brief.md.
4. Notify manager: "Brief complete."

## Sending messages
.agent/scripts/msg.sh manager "<msg>" question
.agent/scripts/msg.sh lead "<msg>" result

PLAIN ENGLISH

This file defines the agent's identity — it's a product manager.

Step-by-step instructions the agent follows, in order.

First, check the inbox to understand what you're building.

Ask the manager specific questions to fill gaps.

Write a structured document with the product spec.

Signal that the work is done so the next phase can start.

These are the exact commands the agent runs to send messages to other agents.

🎯

Practical Skill: Customizing Roles

Want a "copywriter" agent? Create .agent-roles/copywriter.md in your project with the same structure. The spawn system automatically finds it. You're programming AI behavior with plain English — no code required.
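As a rough starting point, a custom role can be created with a single heredoc. The role text below is an invented example that mirrors the structure of the PM file shown above; the output path .agent/context/copy.md and the temporary project directory are assumptions for the demo.

```shell
# Illustrative only: the role text is an invented example, not a file shipped with karen.
project=$(mktemp -d)    # stand-in for your real project directory
mkdir -p "$project/.agent-roles"

cat > "$project/.agent-roles/copywriter.md" <<'EOF'
# ROLE: Copywriter

## Workflow
1. Read your inbox for the writing assignment.
2. Ask clarifying questions via msg.sh if the audience or tone is unclear.
3. Write the copy to .agent/context/copy.md.
4. Notify manager: "Copy complete."

## Sending messages
.agent/scripts/msg.sh manager "<msg>" question
.agent/scripts/msg.sh manager "<msg>" result
EOF

head -n 1 "$project/.agent-roles/copywriter.md"
```

With the file in place, you would presumably ask the manager to spawn a copywriter the same way you ask for a PM.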

03

How Agents Communicate

JSONL inboxes, message passing, and a shared audit trail

The Post Office Metaphor

Every agent has a physical mailbox on disk: a file called .agent/inbox/{name}.jsonl. When one agent wants to talk to another, it doesn't call it directly — it drops a letter in the mailbox.

The recipient checks their mailbox automatically before every response (via a hook). If there's new mail, it appears right in their conversation context.

Inside a Message

Each message is a single line of JSONL. It is shown here pretty-printed across several lines for readability:

CODE

{
  "from": "pm",
  "type": "question",
  "ts": "2026-03-22T10:30:00Z",
  "body": "What's the target user persona?"
}

PLAIN ENGLISH

This message is from the PM agent.

It's a question (as opposed to a regular message, result, escalation, or unblock).

Timestamped so you can trace the conversation history.

The actual content of the message — what the PM is asking.
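To show how little machinery "dropping a letter in the mailbox" needs, here is a minimal sketch of a sender. The names AGENT_DIR, json_escape, and send_msg are hypothetical, and karen's real msg.sh may work differently; the only things taken from the text are the inbox path and the four message fields.

```shell
# Hypothetical sender: append one JSON object per line to the recipient's inbox.
AGENT_DIR=${AGENT_DIR:-$(mktemp -d)/.agent}

json_escape() {
  # Escape backslashes first, then double quotes, so the body stays valid JSON.
  # Newlines and control characters are not handled in this sketch.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

send_msg() {
  # usage: send_msg <from> <to> <type> <body>
  mkdir -p "$AGENT_DIR/inbox"
  printf '{"from":"%s","type":"%s","ts":"%s","body":"%s"}\n' \
    "$1" "$3" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(json_escape "$4")" \
    >> "$AGENT_DIR/inbox/$2.jsonl"
}

send_msg pm manager question 'What is the "target" user persona?'
cat "$AGENT_DIR/inbox/manager.jsonl"
```

Appending with >> keeps earlier messages untouched, which is what lets the inbox double as a history.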

Watch a Conversation Unfold

Here's what it looks like when the manager, PM, and dev lead coordinate on a task:

[Interactive chat replay: #agent-comms, from .agent/communications.md, 6 messages]

Message Types Signal Intent

The type field in each message tells the recipient how to react:

message: Regular communication. "Here's what I found." No urgency.

question: Needs a response before the sender can continue.

result: Work is done. Signals completion and triggers auto-shutdown of the sender's workspace.

escalation: Something is wrong. Goes straight to the manager. Priority handling.

unblock: Answers a blocker. "Go ahead and use bcrypt for password hashing."
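One way to picture how the type field drives behavior is a small dispatcher. This is a sketch under assumptions: msg_type and react_to are hypothetical names, the sed extraction is deliberately crude (a real implementation would more likely use a JSON parser such as jq), and the reaction strings simply paraphrase the list above.

```shell
# Hypothetical dispatcher: pull the "type" field out of one JSONL line
# and describe how the recipient might react.
msg_type() {
  printf '%s\n' "$1" | sed -n 's/.*"type" *: *"\([^"]*\)".*/\1/p'
}

react_to() {
  case "$(msg_type "$1")" in
    message)    echo "read and continue" ;;
    question)   echo "reply before the sender can proceed" ;;
    result)     echo "sender is done; its workspace shuts down" ;;
    escalation) echo "route to the manager with priority" ;;
    unblock)    echo "resume the blocked task" ;;
    *)          echo "unknown type; treat as a regular message" ;;
  esac
}

react_to '{"from":"pm","type":"question","ts":"2026-03-22T10:30:00Z","body":"What is the target user persona?"}'
# prints: reply before the sender can proceed
```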

💡

The "File as Database" Pattern

There's no server, no message queue, no database. Everything is flat files on disk. Inboxes are JSONL files. The audit trail is a markdown file. This pattern, using the filesystem as your state store, is surprisingly powerful for local tools. It's human-readable, version-controllable, and has no server process that can crash.

04

Birth, Life, and Death of an Agent

How spawn.sh boots an agent and notify-done.sh retires it

Spawning: From Script to Working Agent

When the manager runs .agent/scripts/spawn.sh pm "Build X", a precise sequence unfolds — like a factory assembly line producing a new worker:

1. Find the role file

The script searches three directories in order: your project's .agent-roles/, the scaffold's custom-roles/, and the default roles/. First match wins.

2. Write the init message

Drops a JSONL message into .agent/inbox/pm.jsonl with the context you provided. This is the agent's first instruction.

3. Log the spawn

Records the event in .agent/communications.md so there's a paper trail.

4. Create a terminal workspace

Opens a new tab in cmux (or tmux), copies the role file as CLAUDE.md, and launches Claude Code with boot instructions.

5. Agent orients itself

Claude reads its role file, checks shared memory, reads its personal memory from prior sessions, scans the knowledge base, then reads the inbox. Work begins immediately.
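The "first match wins" lookup from the first step can be sketched in a few lines. The function name find_role_file and the way the scaffold location is passed in are assumptions; only the three directory names and their search order come from the description above.

```shell
# Hypothetical lookup: search three directories in order and stop at the first hit.
find_role_file() {
  # usage: find_role_file <role> <project-dir> <scaffold-dir>
  role=$1 project=$2 scaffold=$3
  for dir in "$project/.agent-roles" "$scaffold/custom-roles" "$scaffold/roles"; do
    if [ -f "$dir/$role.md" ]; then
      printf '%s\n' "$dir/$role.md"
      return 0
    fi
  done
  echo "no role file found for '$role'" >&2
  return 1
}

# Demo: a project-level pm.md shadows the default one.
root=$(mktemp -d)
mkdir -p "$root/app/.agent-roles" "$root/scaffold/custom-roles" "$root/scaffold/roles"
echo "# ROLE: Product Manager" > "$root/scaffold/roles/pm.md"
echo "# ROLE: Product Manager (project override)" > "$root/app/.agent-roles/pm.md"
find_role_file pm "$root/app" "$root/scaffold"
```

Putting the project directory first is what lets you override a default role without touching the scaffold.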

The Spawn Code

Here's the actual command that boots the agent in a new terminal:

CODE

BOOTSTRAP=$(cat <<EOF
cd "$WORKDIR" && \
  export AGENT_ROLE="$ROLE" && \
  cp "$ROLE_FILE" CLAUDE.md && \
  claude "You have been activated
    as $ROLE. Orient yourself..."
EOF
)

mux_spawn "$ROLE" "$BOOTSTRAP"

PLAIN ENGLISH

Build a startup command as a text string...

Navigate to the project directory.

Set an environment variable so hooks know which agent this is.

Copy the role definition as the agent's instructions file.

Launch Claude with a first-run prompt telling it to read its role, memory, and inbox, then start working.

Send this command to a brand new terminal workspace.

Auto-Shutdown: Dying Gracefully

When an agent sends a result message, the system knows it's done. The notify-done.sh Stop hook detects this and closes the workspace after a 2-second delay.

Without this, finished agents would sit forever showing "Needs Input" in the terminal — a zombie worker taking up space.
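Here is one way such a check could be structured. This is a sketch, not karen's code: the marker file name, the SHUTDOWN_DELAY variable, and the close_workspace stub are all assumptions, and a real hook would call the terminal multiplexer instead of printing a line.

```shell
# Hypothetical Stop-hook logic built around a "done" marker file.
close_workspace() {
  echo "closing workspace for $1"       # stand-in for the real multiplexer call
}

maybe_shutdown() {
  # usage: maybe_shutdown <role> <state-dir>
  role=$1 state_dir=$2
  marker="$state_dir/${role}_done"
  [ -f "$marker" ] || return 0          # no result sent yet: leave the agent running
  sleep "${SHUTDOWN_DELAY:-2}"          # the 2-second delay described above
  rm -f "$marker"
  close_workspace "$role"
}

# Demo: the first call does nothing, the second sees the marker and closes.
state=$(mktemp -d)
SHUTDOWN_DELAY=0                        # skip the wait in the demo
maybe_shutdown pm "$state"
touch "$state/pm_done"
maybe_shutdown pm "$state"
```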

📚

Memory Survives Death

When an agent shuts down, its inbox, memory file, and context artifacts all stay on disk. Respawning the same role picks up where it left off — like an employee coming back from vacation and reading their notes.

05

The Engineering Tricks

Clever patterns that make the system work without a server

Trick 1: Hooks as the Nervous System

How does an agent know it has new mail without constantly checking? The answer is hooks — shell scripts that Claude Code runs automatically at specific moments.

📬

check-inbox.sh

When: Before every Claude response.
Does: Reads the inbox file, finds unread messages using a cursor, and outputs them for Claude to see.

🛑

notify-done.sh

When: After every Claude response.
Does: Checks if the agent sent a "result" message. If yes, closes the workspace.

⏰

auto-shutdown.sh

When: After every Claude response.
Does: If enabled, checks how long other agents have been idle. Reaps them after a timeout.
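The idle check in the last hook can be approximated with file modification times. The sketch below assumes each agent has some file whose mtime reflects its last activity; the _activity file name, the 30-minute timeout, and the is_idle helper are all hypothetical. It relies on find's -mmin test, which GNU and BSD find both provide.

```shell
# Hypothetical idle check: treat a file's modification time as the agent's last activity.
is_idle() {
  # usage: is_idle <activity-file> <timeout-minutes>
  # find prints the path only if it was modified more than <timeout> minutes ago.
  [ -n "$(find "$1" -mmin +"$2" 2>/dev/null)" ]
}

state=$(mktemp -d)
touch "$state/dev1_activity"                    # active just now
touch -t 202001010000 "$state/dev2_activity"    # last touched in 2020

for f in "$state"/*_activity; do
  role=$(basename "$f" _activity)
  if is_idle "$f" 30; then
    echo "$role: idle, reap"
  else
    echo "$role: active"
  fi
done
```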

Trick 2: The Cursor-Based Inbox

Instead of marking messages as "read" (which would require modifying the inbox file), the system tracks a cursor — a number stored in .agent/state/{role}_inbox_cursor.

CODE

TOTAL_LINES=$(wc -l < "$INBOX" | tr -d ' ')

CURSOR=0
if [[ -f "$CURSOR_FILE" ]]; then
  CURSOR=$(cat "$CURSOR_FILE")
fi

if [[ "$TOTAL_LINES" -le "$CURSOR" ]]; then
  exit 0  # No new messages
fi

PLAIN ENGLISH

Count how many total messages are in the inbox file.

Start with cursor at zero (haven't read anything).

If there's a saved cursor from last time, use that instead.

If the total message count is less than or equal to the cursor, we've read everything. Nothing new, so exit quietly.
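The excerpt above stops at the early exit. A plausible continuation, sketched here under assumptions, prints only the lines after the cursor and then saves the new cursor. The function name read_new_messages is hypothetical, and the real check-inbox.sh may format its output differently.

```shell
# Sketch of the remaining steps: print unread lines, then advance the cursor.
read_new_messages() {
  # usage: read_new_messages <inbox-file> <cursor-file>
  inbox=$1 cursor_file=$2
  [ -f "$inbox" ] || return 0
  total=$(wc -l < "$inbox" | tr -d ' ')
  cursor=0
  [ -f "$cursor_file" ] && cursor=$(cat "$cursor_file")
  [ "$total" -le "$cursor" ] && return 0            # nothing new
  sed -n "$((cursor + 1)),${total}p" "$inbox"       # lines after the cursor are unread
  echo "$total" > "$cursor_file"                    # remember how far we have read
}

# Demo
dir=$(mktemp -d)
echo '{"from":"manager","type":"message","body":"first"}' >> "$dir/pm.jsonl"
read_new_messages "$dir/pm.jsonl" "$dir/pm_inbox_cursor"   # prints the first message
echo '{"from":"lead","type":"question","body":"second"}' >> "$dir/pm.jsonl"
read_new_messages "$dir/pm.jsonl" "$dir/pm_inbox_cursor"   # prints only the second
read_new_messages "$dir/pm.jsonl" "$dir/pm_inbox_cursor"   # prints nothing
```

Printing exactly up to the counted total, rather than to the end of the file, means a message appended mid-read is delivered on the next pass instead of twice.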

Trick 3: The Multiplexer Abstraction

Karen needs to open terminal tabs, send text to them, and close them. But different users have different terminal tools. The solution: lib/mux.sh — a single API that works across three backends.

🎨

cmux

Visual macOS app with tabs, status bar, and notifications. The premium experience.

🖥

tmux

Classic terminal multiplexer. Works everywhere. Hidden windows, keyboard-driven.

💻

Terminal.app / iTerm

Fallback for macOS. Opens native tabs. No push messaging — agents poll instead.
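A dispatch layer like this usually boils down to "detect once, then branch." The sketch below is an assumption about the shape of such a layer, not the contents of lib/mux.sh. The MUX_BACKEND override and detect_backend are hypothetical, the tmux branch uses the real tmux new-window command, and the cmux and Terminal branches are placeholders that only print what they would do.

```shell
# Hypothetical dispatch layer: one function name, several backends.
detect_backend() {
  if [ -n "${MUX_BACKEND:-}" ]; then
    echo "$MUX_BACKEND"                 # explicit override (an assumed variable)
  elif command -v cmux >/dev/null 2>&1; then
    echo cmux
  elif command -v tmux >/dev/null 2>&1; then
    echo tmux
  else
    echo terminal
  fi
}

mux_spawn() {
  # usage: mux_spawn <name> <command>
  case "$(detect_backend)" in
    tmux)     tmux new-window -d -n "$1" "$2" ;;
    cmux)     echo "cmux: would open tab '$1'" ;;       # placeholder
    terminal) echo "terminal: would open tab '$1'" ;;   # placeholder
    *)        echo "unknown backend" >&2; return 1 ;;
  esac
}

MUX_BACKEND=terminal
mux_spawn pm "echo hello"
```

Callers such as the spawn script only ever see mux_spawn, so adding a fourth backend would mean adding one more case branch.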

💡

Abstraction Layers Are Everywhere

This "one interface, multiple implementations" pattern appears in almost every piece of software. Your phone's camera app works whether you have an iPhone or Android — the app is the abstraction layer over different hardware. When you encounter an unfamiliar codebase, look for these layers — they're the seams that show you how the system is organized.

06

When Things Go Wrong

Debugging agent failures, communication breakdowns, and zombie workspaces

The Three Things That Break

Agent systems have unique failure modes. Here are the big three and how to diagnose them:

💀

Dead Agents

An agent crashed or its terminal closed. Messages pile up in the inbox with nobody reading them. Fix: Run health.sh to find dead agents, then respawn them.

💬

Lost Messages

An agent sent a message but the recipient never saw it. Usually the inbox hook didn't fire. Fix: Check communications.md to verify the message was logged, then check the recipient's cursor file.

🧟

Zombie Workspaces

A workspace file exists but the actual terminal is gone. Agents try to send to a nonexistent tab. Fix: Run shutdown.sh --all to clean up, then respawn what you need.
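For the lost-message case, comparing each inbox's line count with its cursor shows at a glance who has unread mail. This is a sketch with a hypothetical function name, inbox_report; the inbox and cursor paths follow the layout described earlier, and the output format is invented.

```shell
# Hypothetical diagnostic: compare each inbox's line count with its cursor.
inbox_report() {
  # usage: inbox_report <path-to-.agent>
  agent_dir=$1
  for inbox in "$agent_dir"/inbox/*.jsonl; do
    [ -f "$inbox" ] || continue
    role=$(basename "$inbox" .jsonl)
    total=$(wc -l < "$inbox" | tr -d ' ')
    cursor=0
    cursor_file="$agent_dir/state/${role}_inbox_cursor"
    [ -f "$cursor_file" ] && cursor=$(cat "$cursor_file")
    echo "$role total=$total read=$cursor unread=$((total - cursor))"
  done
}

# Demo: three messages delivered, only one read so far.
agent=$(mktemp -d)/.agent
mkdir -p "$agent/inbox" "$agent/state"
printf '%s\n' '{"body":"a"}' '{"body":"b"}' '{"body":"c"}' > "$agent/inbox/pm.jsonl"
echo 1 > "$agent/state/pm_inbox_cursor"
inbox_report "$agent"
# prints: pm total=3 read=1 unread=2
```

An unread count that keeps growing for one role suggests the agent is dead or its inbox hook is not firing.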

Your Debugging Toolkit

health.sh: Shows every agent's status (UP/DOWN), inbox size, and last activity.

communications.md: The audit trail. Every spawn, message, and shutdown is logged with timestamps.

bd list: Shows all open tasks across agents. Tells you what's stuck and what's done.

status.sh: Quick snapshot of active workspaces, inbox counts, and task state.

07

The Architecture at a Glance

How all the pieces fit together — and why there's no server

The Complete File Map

Every piece of state in karen lives in plain files under .agent/. Here's the full picture:

.agent/              All agent runtime state lives here
  inbox/             One .jsonl file per agent (their mailbox)
  context/           Shared artifacts: briefs, specs, audit reports
  state/             Workspace IDs, cursors, done markers
  memory/            Shared + per-agent memory (survives restarts)
  knowledge/         Symlinked reference docs
  communications.md  Audit log of every interaction
  scripts/           Symlink to the scaffold's scripts
  hooks/             Symlink to the scaffold's hooks
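If you want to picture the layout without running karen, the directories can be recreated by hand. This sketch covers only the folders and the audit log in a temporary location. It leaves out the permissions, hooks, and symlinks that karen init also sets up, and it is not how karen itself creates the scaffold.

```shell
# Sketch of the layout only, built in a throwaway directory.
project=$(mktemp -d)    # stand-in for your project directory
mkdir -p "$project/.agent/inbox" \
         "$project/.agent/context" \
         "$project/.agent/state" \
         "$project/.agent/memory" \
         "$project/.agent/knowledge"
: > "$project/.agent/communications.md"   # empty audit log
ls -A "$project/.agent"
```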

The Design Decisions That Matter

🗃

Files, Not Databases

Every piece of state is a plain file: human-readable, git-friendly, and with no database process to crash. You can debug with cat and grep.

🔌

Terminal as Transport

Instead of building a message broker, karen uses the terminal itself. cmux/tmux already solve visibility, switching, and notifications.

📜

Markdown for Humans, JSON for Machines

Agents write specs and reports in markdown (for you to read). They communicate via JSONL (for hooks to parse). Best format for each audience.

🔁

Role = Markdown = Behavior

Agent behavior is defined entirely by a text file. No code to change, no settings to configure. Edit the markdown, change the agent.

You Made It.

You now understand how a team of AI agents coordinates using nothing but shell scripts, markdown files, and JSONL inboxes. No servers. No databases. Just clever use of the filesystem and terminal multiplexers.

🎉

What You Can Do Now

Create custom roles by writing markdown files. Debug communication issues by tracing messages through inboxes and communications.md. Understand why agents behave the way they do — it's all in the role file. Extend the system by adding new hooks, backends, or integrations. The whole thing is ~500 lines of bash.
