Introduction
There’s a lot of talk about AI agents that work autonomously. Much less is said about the harder problem: how do you design a workflow where an engineer and an AI collaborate fluidly — where the engineer stays in control, the AI does the heavy lifting, and the whole thing is auditable and extensible?
That’s the problem this post works through.
The Core Idea
The system is a 6-stage pipeline for software development work. Ideas flow through stages from conception to deployment. Claude Code acts as a queue worker — it picks up items, processes them, and hands off to the engineer for review at key decision points.
ideas → tasks → specification → implementation → review → deploy
Each item in the pipeline is a markdown file with YAML frontmatter. Simple, portable, human-readable. No database, no proprietary format — just files on disk that both engineers and AI can read and write.
A typical item looks like:
```markdown
---
id: fix-reconnect-loop
title: Fix FIX Reconnect Loop
stream: java
stage: tasks
priority: high
complexity: medium
claude_note: Depends on persistent-log. Do after fix-persistent-log.
---
FIX session drops and fails to recover cleanly. The reconnect logic
either loops without terminating or doesn't retry at all. Needs a
reliable reconnect sequence with backoff and clean session state reset.
```
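Part of the appeal of this format is how little machinery it needs. A flat key/value frontmatter like the one above can be parsed in a few lines — this is a hypothetical helper, not the project's actual parser, and a real implementation would likely reach for a YAML library instead:

```javascript
// Minimal frontmatter parser for the flat `key: value` form shown above.
// Splits the YAML header from the markdown body; no nesting, no lists.
function parseItem(raw) {
  const match = raw.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/)
  if (!match) return { meta: {}, body: raw }
  const meta = {}
  for (const line of match[1].split('\n')) {
    const idx = line.indexOf(':')
    if (idx === -1) continue
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim()
  }
  return { meta, body: match[2].trim() }
}
```

Both the dashboard and Claude can round-trip this: read the file, patch a field in `meta`, write it back.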
How the Engineer-AI Loop Works
The workflow is designed around a key insight: Claude should do the cognitive work, but the engineer should control the gates.
Here’s what that looks like in practice:
Ideas stage — the engineer drops rough ideas as markdown files. Claude reviews the backlog, annotates each item with priority, complexity, and a brief assessment, and proposes up to 3 new items per stream (prerequisites, logical companions, or backlog items ready to surface). The engineer decides what to advance.
Spec queue — the engineer promotes an idea to “tasks” by moving the file. Claude picks it up, reads the description, and generates a full specification: requirements, acceptance criteria, technical notes, edge cases. The spec lands in workflow/specs/ for engineer approval before anything gets built.
Implementation queue — once the engineer approves the spec, Claude implements it. It reads the spec in full, creates a feature branch, writes code and tests, opens a GitHub PR with test output attached, and moves the item to “review.” The engineer reviews the PR.
Deploy — engineer confirms, Claude merges and deploys.
At every stage boundary, the engineer decides whether to advance. Claude never skips ahead. The result is a loop where Claude does the drafting, speccing, coding, and testing — and the engineer does the deciding.
The Dashboard
To make this pipeline visible and manageable, there’s a React + Vite dashboard that sits on top of the filesystem.
It reads the markdown files directly via a lightweight Vite middleware (no separate backend — the dev server is the API). Items are displayed as a kanban board across all six stages, filterable by stream, with color-coded priority and complexity badges.
Key features:
- Kanban view — all items across all stages at a glance, with stage colors (gray → blue → purple → amber → red → green as items progress)
- Stream filter — focus on one project stream at a time
- Detail panel — click any item to see its full description, spec, and Claude’s assessment notes
- Claude queue panels — dedicated views showing what’s currently queued for Claude to process
- Add item modal — create new pipeline items directly from the UI
- Stats — counts by stage and stream

All mutations are optimistic: the UI updates immediately, then the file is written to disk in the background.
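Stripped of React, the optimistic pattern looks like this (a framework-free sketch; the real dashboard presumably holds this state in React):

```javascript
// Apply the change to local state immediately, then persist in the
// background; roll back to the previous value if the write fails.
async function optimisticUpdate(state, itemId, patch, persist) {
  const prev = state.items.find((i) => i.id === itemId)
  const next = { ...prev, ...patch }
  state.items = state.items.map((i) => (i.id === itemId ? next : i)) // UI sees this at once
  try {
    await persist(next) // e.g. write the markdown file to disk
  } catch (err) {
    state.items = state.items.map((i) => (i.id === itemId ? prev : i)) // roll back
    throw err
  }
}
```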
The interesting design choice here is that the dashboard and Claude share the same data source — the markdown files. The dashboard is a lens onto the filesystem. Claude writes to the same files the dashboard reads. There’s no sync layer, no event bus, no API contract to maintain. The file is the record.
Claude Actions in the Dashboard
Since the first version of this post, the dashboard has grown a set of Claude-powered actions that run directly from the UI — with streaming output, live feedback, and automatic board refresh on completion.
Stream Commentary
Each stream has a commentary panel that Claude fills in on demand. The prompt asks for at most 3 sentences: what to prioritise, sizing concerns, and suggested next steps. Clicking “Generate” streams tokens directly into the panel as they arrive — a blinking cursor shows progress, the panel border glows, and other Generate buttons dim to signal that only one stream can run at a time. Commentary is persisted to workflow/commentary/{stream}.md and reloaded on the next page visit.
Review Ideas
A “Review Ideas” button in the header triggers the full idea annotation and proposal generation flow. While Claude runs, a slim log bar drops in just below the header showing streaming output live. The board refreshes automatically when the run completes — cards gain priority badges, complexity labels, and claude_note annotations, and up to 3 new proposals per stream appear in the ideas column, each with a one-click “Add” button to promote them into the pipeline.
Queue Processing
The Claude Work Queues panel shows the current depth of the spec and implementation queues. Clicking “Process Next” streams Claude’s output inline below the queue cards and refreshes the board on completion — so the item’s new stage, generated spec content, or PR link appears without a manual reload.
Spawning Claude from a Server
All three Claude actions follow the same pattern: the Vite dev server spawns the claude CLI as a child process, streams stdout back to the browser as a chunked HTTP response, and signals completion with a sentinel token (__EXIT__:0). The browser reads the stream with the ReadableStream API and updates state incrementally.
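On the browser side, consuming that stream might look like the following sketch (the function name and cumulative-text callback are my choices; the sentinel format is the one described above):

```javascript
// Consume a chunked response body, stopping at the __EXIT__ sentinel.
// onChunk receives the full text so far (convenient for a setState call);
// resolves with the exit code, or null if the stream ends without one.
async function readClaudeStream(stream, onChunk) {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  let text = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    text += decoder.decode(value, { stream: true })
    const m = text.match(/__EXIT__:(\d+)/)
    if (m) {
      onChunk(text.slice(0, m.index)) // drop the sentinel from the display
      return Number(m[1])
    }
    onChunk(text)
  }
  return null
}
```

Against a fetch response this would be called as `readClaudeStream(res.body, setOutput)`.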
One non-obvious requirement came up during development: stdin must be explicitly closed. Without stdio: ['ignore', 'pipe', 'pipe'] in the Node.js spawn options, Claude hangs indefinitely waiting for input that never comes. It’s easy to miss — --version works fine without it, but -p (non-interactive prompt mode) blocks on stdin.
```javascript
const { spawn } = require('node:child_process')

// CLAUDE* vars are stripped so a nested Claude Code session can't interfere
const filteredEnv = Object.fromEntries(
  Object.entries(process.env).filter(([k]) => !k.startsWith('CLAUDE'))
)
const proc = spawn(claudeBin, ['-p', prompt, '--dangerously-skip-permissions'], {
  cwd: projectRoot,
  env: filteredEnv,
  stdio: ['ignore', 'pipe', 'pipe'], // critical — close stdin
})
```
The environment is also filtered to strip any CLAUDE* variables, which avoids conflicts when the server happens to be running inside a Claude Code session.
Scaling This Up: From One Claude to Many Agents
The current setup is one engineer, one Claude instance, one pipeline. But the design generalizes.
Multiple Streams in Parallel
Right now items have a stream field that’s mostly a filter. But streams could become independent pipelines processed by separate Claude instances running concurrently. Each stream gets its own queue worker. The dashboard aggregates across all of them.
Specialized Agents per Stage
The spec queue and implementation queue currently use the same Claude Code instance. These could be split:
- A spec agent focused on requirements gathering and system design
- An implementation agent focused on code, tests, and PRs
- A review agent that reads PRs and flags issues before the engineer sees them
- A prioritization agent that continuously re-ranks the backlog based on dependencies and external signals
Each agent would have a narrowly scoped CLAUDE.md instruction set tuned to its stage. The filesystem protocol (move file + update frontmatter) remains the coordination mechanism between agents — no message broker required.
Engineer Approval as a First-Class Primitive
As you add more agents, the approval gates become more important, not less. The current design already enforces this: items can’t skip stages; the engineer must explicitly move files to advance them. In a multi-agent system, you might want:
- Approval queues with timeouts — if an engineer doesn’t review within N hours, escalate or pause
- Confidence scores — agents annotate their own output with uncertainty; low-confidence items get flagged for closer engineer review
- Audit logs — every agent action written to a structured log alongside the item file, so you can reconstruct who did what and when
Dashboard as a Control Plane
With multiple agents running, the dashboard becomes a control plane rather than just a view. The streaming output pattern we’ve already built is the foundation — each Claude action surfaces its work in real time. Extending that to a persistent agent status panel, queue depth metrics, and pause/cancel controls is a natural next step.
What Makes This Different
Most “AI agent” workflows are either fully autonomous (the AI does everything, which is fragile) or engineer-in-the-loop in a shallow way (the engineer just presses “approve” on everything). This design tries to find the productive middle ground:
- Claude does the work that benefits most from AI — pattern recognition, spec generation, boilerplate code, test writing, backlog assessment
- Engineers control the decisions that matter — what to build, whether a spec is right, whether code is ready to ship
- The protocol is simple and inspectable — markdown files, YAML frontmatter, git. No black boxes
- The system grows with you — one agent today, ten tomorrow, same coordination model
The pipeline isn’t just a productivity tool. It’s an experiment in how engineers and AI can build software together — with clarity about who’s responsible for what.
What’s Next
A few directions we’re actively thinking about:
- Persistent session state — right now Claude’s context starts fresh each session. Durable session logs would let agents resume mid-task and maintain continuity across restarts.
- Web UI as a conversation surface — the dashboard as a place where the engineer can talk to Claude about specific pipeline items, not just through the CLI.
- Multi-agent orchestration — promoting streams to independent workers, with the dashboard coordinating handoffs between them.
- Metrics and velocity tracking — how long do items spend in each stage? Where are the bottlenecks? What’s the throughput by stream?
- Diff views in the review stage — showing spec and code changes inline in the detail panel, so engineer review doesn’t require leaving the dashboard.
The foundation is working. The interesting questions are about what happens when you scale it.
Built with Claude Code, React + Vite, and a lot of markdown files.