I have two terminals open right now. Two separate AI agent sessions. Both are working on the same project — a complex Rust system with multiple crates, a custom type system, co-processor architecture, the works.

Terminal one is the Marshal — the field commander. It decomposes directives into specialist missions, dispatches agents for implementation, synthesizes their findings, maintains all project state on disk. It commands. It thinks it’s working directly with me — a well-prepared human with unusually sharp strategic instincts.

Terminal two is the Strategist — and the reason my strategic instincts are unusually sharp. It’s watching the Marshal. It reads the Marshal’s state files. It detects drift. It crafts the prompts I’m about to paste into terminal one. It has never spoken to the Marshal. The Marshal has no idea it exists.

I’m the bridge. I carry prompts from the Strategist to the Marshal. I carry results back. Both agents are doing real work. Neither one can see the full picture. Only I can.

I call this the Shadow Terminal pattern. The Strategist operates from the shadow. The Marshal commands the field. The specialists execute. And the human — the only one who sees all three layers — runs the whole thing.

If this sounds like overkill — I thought so too, for about fifteen minutes. Then the system produced a 5,800-line specification, 363 passing tests, a PEG grammar, and several potentially patentable innovations. Twenty-two prompts. Three days. Zero lines of code written by either orchestration layer.

Here’s why this works, why most people hitting the ceiling on AI-assisted development are doing it to themselves, and why the fix is organizational, not technical.


The Ceiling

There’s a progression everyone goes through with AI coding tools. I’m going to speed-run it because you’ve either lived it or you’re living it right now.

Phase one: magic. The tool writes code, explains code, refactors code. Productivity spikes. You tell people at dinner parties. You become briefly insufferable. Good for you.

Phase two: the wall. Your project gets complex. Context fills up. The agent forgets what you told it three prompts ago. It contradicts things you established two hours earlier. Context compacts — the agent equivalent of waking up with amnesia every forty-five minutes and being expected to perform neurosurgery. You start over. Again.

Phase three: cope. You write enormous instruction files. You break work into tiny isolated tasks. You treat every session as disposable. Some people build elaborate workaround systems — rituals, essentially, that they’ll defend to the death while producing diminishing returns. Some people quietly go back to writing everything by hand, which they’ll never admit because they’ve already tweeted about how AI changed their life.

Here’s what nobody says about the ceiling: it’s not a capability problem. You’re hitting an organizational problem and trying to solve it with a technical workaround. You’re running a complex project through a single conversation with a single agent, and when it doesn’t scale, you’re blaming the agent.

That’s like blaming your best engineer for the fact that you don’t have a project manager.


The Insight Nobody Wants to Hear

Every military in history has figured out the same thing: the person fighting the war should not be the same person planning it.

The Prussian General Staff system was built on a radical separation: staff officers think and plan, commanders decide and execute. It works because it solves a fundamental cognitive problem: you cannot simultaneously be deep in execution and maintain strategic awareness. The context required for each is different. The thinking is different. Trying to do both degrades both.

Now look at how most people use AI coding tools.

One agent. One conversation. That agent is simultaneously your strategist, your project manager, your architect, your implementer, your reviewer, and your test runner. It’s the CEO, the CTO, the intern, and the guy who refills the coffee machine. And you’re surprised it loses the plot on complex projects?

You gave one agent every role in the organization and expected it to manage context better than any human organization ever has.

Good luck with that.


The Pattern

Here’s what I actually built, and how.

Before writing a single line of configuration, I researched the pattern itself. Four parallel research agents investigated CLAUDE.md best practices, multi-agent orchestration patterns, meta-strategic advisor design, and existing project state. Then an adversarial reviewer tore the synthesis apart. Then I fixed what survived and threw away what didn’t.

I researched the system that would build the system. The system that would build the system was itself built by a system that researched how to build systems. If that sentence made your head hurt — good. Sit with it. It doesn’t get less recursive from here.

What came out: two configuration files. 112 lines for the Strategist. 116 for the Marshal. 228 lines total. That’s it. Some people write CLAUDE.md files longer than that for a todo app.

The Strategist lives in its own repo. Its configuration defines a strict role: You are a prompt compiler and chief of staff. You transform strategic intent into executable prompts. You DO NOT read code. You DO NOT write code. You DO NOT design architectures. You DO NOT solve problems directly. You produce prompts, maintain strategic notes, and process debriefs. That’s it.

The DO NOT list is the hardest-working section in the file. LLMs naturally drift toward direct problem-solving — it’s the golden retriever energy of the machine learning world. Without explicit constraints, the Strategist would be writing Rust within three turns. The negative constraints are the load-bearing walls. Remove one and the whole building comes down — politely, helpfully, and with excellent variable naming.

The Marshal lives in the project repo. Its remit is as follows: You are a field commander. You decompose directives into specialist missions. You dispatch agents. You synthesize results. You maintain project state on disk. You DO NOT write code yourself. You command; your specialists execute.

Why “Marshal”? Because a marshal doesn’t fight — a marshal commands forces. Calling this role a “worker” would be like calling Eisenhower a “soldier.” The Marshal runs its own OODA loop — Observe, Orient, Decide, Act — reading state files on every prompt, classifying what just happened, deciding whether to proceed, adapt, retry, escalate, or abort.
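To make the loop concrete, here is a minimal sketch of what the Marshal's OODA cycle amounts to mechanically. This is an illustration, not the actual configuration: the file name `ledger.json`, the outcome labels, and the retry threshold are all my assumptions.

```python
# Hypothetical sketch of the Marshal's OODA loop. File names, outcome
# labels, and thresholds are illustrative assumptions, not the real config.
import json
from pathlib import Path

def ooda_cycle(state_dir: Path) -> str:
    # Observe: re-read state from disk on every prompt; never trust
    # in-context memory to have survived compaction.
    ledger = state_dir / "ledger.json"
    state = json.loads(ledger.read_text()) if ledger.exists() else {}

    # Orient: classify what just happened against the recorded mission.
    last = state.get("last_mission", {})
    outcome = last.get("outcome", "unknown")

    # Decide: proceed, adapt, retry, escalate, or abort.
    if outcome == "success":
        decision = "proceed"
    elif outcome == "partial" and last.get("retries", 0) < 2:
        decision = "retry"
    elif outcome == "partial":
        decision = "adapt"
    elif outcome == "failure":
        decision = "escalate"
    else:
        decision = "proceed"  # fresh start: nothing observed yet

    # Act: record the decision to disk so it, too, survives compaction.
    state["last_decision"] = decision
    ledger.write_text(json.dumps(state, indent=2))
    return decision
```

The point of the sketch is the shape, not the branches: observation always starts from disk, and every decision ends on disk.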

Here’s what’s critical: the Marshal doesn’t know the Strategist exists.

The Marshal receives prompts from what it perceives as a well-prepared human. That human is me. But the prompts I’m pasting were crafted by the Strategist — informed by the Marshal’s own state files, calibrated to its capabilities, structured to produce specific outputs.

The Human — that’s me — is the bridge. I carry prompts from the Strategist to the Marshal. I carry results back. I also carry something neither agent has: judgment. I approve strategies. I override when something feels wrong. I’m the only entity with the full picture. That’s not a bottleneck — it’s a control surface.

This isn’t automation. It’s management.


What This Actually Looks Like

A typical cycle:

  1. I describe what I want to the Strategist. “We need to integrate a type system into the spec. Roughly six thousand lines of existing code. The spec needs to account for it.”

  2. The Strategist reads the Marshal’s decision log, checks for drift, reads its own strategic notes, and crafts a self-contained prompt — purpose, end state, key tasks, success criteria, anti-hallucination anchors. It comes wrapped in a code fence labeled “PASTE THIS TO THE MARSHAL.” Not subtle. Doesn’t need to be.

  3. I review it, poke holes, push back, and eventually approve. Then, I paste it. The Marshal dispatches four specialist agents in parallel — systems expert, domain expert, quality engineer, security engineer. Each gets a self-contained brief with exactly the context it needs. Need-to-know basis.

  4. Specialists return. The Marshal synthesizes: 12 must-fix items, 20 should-fix, 34 notes. Writes it all to disk.

  5. I debrief the Strategist — but “debrief” undersells it. I interrogate it. We deep-dive together through all 66 items, not just the critical twelve. The Strategist synthesizes five thousand words into the three things that actually matter, surfaces patterns I’d miss, flags when a “should-fix” is quietly load-bearing. But the decisions are mine. Sometimes we align. Sometimes I override. That’s the point — a thinking partner who’s read everything, and an executive who can disagree with it.

  6. Next cycle. The Strategist crafts the fix prompt informed by the review and by whatever I decided to prioritize, deprioritize, or throw out entirely. The Marshal has no memory of the previous cycle — but its state files do. Continuity lives on disk, not in context.
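The self-contained prompt in step 2 can be sketched as a template. The section names follow the description above; the exact format the Strategist uses is an assumption, and `compile_prompt` is a hypothetical helper, not part of any real tooling.

```python
# Hypothetical sketch of the Strategist's prompt structure. Section names
# come from the cycle described above; the precise format is assumed.
def compile_prompt(purpose, end_state, tasks, criteria, anchors):
    sections = [
        ("PURPOSE", purpose),
        ("END STATE", end_state),
        ("KEY TASKS", "\n".join(f"- {t}" for t in tasks)),
        ("SUCCESS CRITERIA", "\n".join(f"- {c}" for c in criteria)),
        # Anti-hallucination anchors: hard facts the Marshal must not invent.
        ("ANCHORS", "\n".join(f"- {a}" for a in anchors)),
        # Standing rule: every prompt carries an explicit dispatch
        # instruction so the Marshal delegates instead of working inline.
        ("DISPATCH", "Dispatch specialist agents for all implementation work."),
    ]
    body = "\n\n".join(f"## {name}\n{text}" for name, text in sections)
    return f"PASTE THIS TO THE MARSHAL\n\n{body}"
```

Everything the Marshal needs is inside the prompt itself; nothing depends on the Marshal remembering a previous conversation.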

Twenty-two of these cycles. Three days. The system didn’t degrade. It improved — because every cycle left state on disk that made the next cycle more informed.


Why It Works

Three things are happening here that don’t happen in single-agent workflows:

Separation of strategic and operational context. The Strategist never fills its context with code or compiler errors. The Marshal never fills its context with IP considerations or long-term sequencing. Each agent gets to be deep in exactly one domain. This is the same reason your CTO shouldn’t be writing production code — every line of code your CTO writes is a strategic decision they didn’t make.

Compaction-survivable state. The biggest silent killer in long AI sessions is context compaction — the system compresses older conversation to make room. Information is lost. Decisions are forgotten. In a single-agent workflow, this is catastrophic. In our pattern, it’s a non-event. Everything important lives on disk. After compaction, both agents recover by reading their state files. We built a protocol: “If the user references work you don’t recall, STOP and execute recovery before responding.” The elegant part — this protocol was recommended by the adversarial reviewer who tore apart our initial design. The QA process improved the system’s own resilience. Turtles all the way down.

The human as sovereign, not bottleneck. In most AI workflows, the human is either the typist (one keystroke short of writing it yourself) or the passenger (hoping for the best, like ordering food in a language you don’t speak). In this pattern, the human is the executive. You carry intelligence between agents that can’t see each other. You’re the only entity with the full picture.

That’s not a limitation of the pattern — it’s the feature. Humans are good at judgment. Agents are good at execution. Stop making agents do both.


What We Learned (The Honest Part)

This didn’t work perfectly from day one. If it had, I’d be suspicious.

Prompt one: the Marshal filtered its own findings — decided some adversarial review items weren’t worth surfacing. It was being helpful by sparing me the noise. That kind of helpful is lethal. Helpful like a doctor who decides not to mention the thing on your X-ray. We added a rule: “Surface ALL findings organized by severity. The user decides what matters, not you.” That rule prevented silent information loss for the remaining twenty-one prompts.

Prompt fourteen was the breakthrough: agent teams with cross-talk enabled. Five specialists reviewed the same spec simultaneously. A quality engineer challenged a Rust expert's severity rating. The memory systems specialist backed the QE with evidence. They resolved five disagreements in real time without the Marshal mediating.

The system learned to self-correct through structured debate.

I didn’t teach it to do that. The configuration allowed it. If you’ve ever managed strong engineers, you know the feeling — set up the conditions and get out of the way.

Deferred findings are dead findings if they only exist in conversation. When a review says “defer this,” that note goes to a state file immediately. Because the context will compact, the finding will vanish, and six prompts later you’ll re-discover the same issue — the AI equivalent of finding the same sticky note you wrote three months ago and feeling personally attacked by your past self. Rule: the Marshal persists every deferred item to disk. The Strategist verifies during debrief. Sounds paranoid. It’s not.
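The deferred-findings rule can be sketched in a few lines. The file name and fields here are assumptions for illustration; the mechanism is what matters: the moment something is deferred, it goes to disk, and at debrief the Strategist audits what's still open.

```python
# Hypothetical sketch of deferred-finding persistence. A finding that only
# exists in conversation dies at the next compaction; one written to disk
# can be verified at debrief. File name and field names are assumptions.
import json
from pathlib import Path

def defer_finding(path: Path, finding: str, severity: str, prompt_no: int):
    """Persist a deferred item the moment it is deferred."""
    items = json.loads(path.read_text()) if path.exists() else []
    items.append({
        "finding": finding,
        "severity": severity,
        "deferred_at_prompt": prompt_no,
        "resolved": False,  # the Strategist checks this during debrief
    })
    path.write_text(json.dumps(items, indent=2))

def unresolved(path: Path) -> list:
    """What the Strategist audits at debrief: anything still open."""
    if not path.exists():
        return []
    return [i for i in json.loads(path.read_text()) if not i["resolved"]]
```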

Prompt twenty-two revealed the subtlest trap: when a prompt said “do X” without saying “dispatch a specialist to do X,” the Marshal did it itself — inline, in its own context window, burning the very resource it’s supposed to protect. One missing sentence and your field commander becomes a foot soldier. Standing rule: every prompt must include an explicit dispatch instruction. The distinction sounds pedantic until you watch your commander silently eat its own context doing work it should have delegated.


The Uncomfortable Math

Here’s what a three-day sprint with this pattern produced:

  • A 5,800-line technical specification covering 19 sections
  • A 1,057-line implementation plan with 9 phases and dependency tracking
  • A custom PEG grammar and type system with 90+ parser tests
  • 363 passing tests across a multi-crate Rust workspace, zero warnings
  • An execution playbook adversarially reviewed by a three-specialist team
  • 40,000+ words of coherent, logically interconnected, well-organized strategic notes, prompt logs, and research artifacts

The Strategist wrote zero lines of code. The Marshal wrote zero lines of code. The entire system was orchestrated by 228 lines of configuration — total, across both agents. Every line of actual code was written by specialist agents, guided by prompts that were crafted by a Strategist the Marshal doesn’t even know exists, informed by state that survived every compaction event, and validated by review processes that caught dozens of critical issues before a single line shipped.

I’ve been building with AI daily for three years. This three-day run produced more coherent, higher-quality output than some entire projects I’ve shipped. Not because the model got better between last month and now. Because I stopped trying to run a company through a single conversation and started actually running it.


The Part Where I Tell You What to Do

I’m not going to wrap this in a bow. The pattern requires discipline. Two terminals. Two separate session contexts. State files. And one rule above all others: nothing goes to the Marshal that didn’t go through the Strategist first.

Nothing.

Not a quick follow-up, not a “let me just clarify this one thing,” not an off-the-cuff nudge. If the Marshal gets compacted and loses orientation, I don’t try to fix it in-session — I interrupt, go to the Strategist, tell it what happened, and ask it what we do. The Strategist knows everything that’s ever been exchanged between me and the Marshal. It has complete information. That’s what makes it a reliable chief of staff. The moment you start freelancing with the Marshal, the Strategist’s picture goes stale and your chief of staff becomes a chief of most of the staff.

Don’t do that.

The Strategist’s prompts are better than your improvisation. I say that as someone who learned it the hard way by breaking his own rules exactly once, watching the Marshal hallucinate a type system that didn’t exist, and spending forty-five minutes cleaning up a mess that would have taken zero minutes if I’d just maintained the chain of communication.

But the setup is simpler than it sounds:

  1. A meta repo with an agent configuration that defines the Strategist: prompt compiler, chief of staff, reads the Marshal’s state, produces copy-paste prompts, never touches code. 112 lines.

  2. Your project repo with an agent configuration that defines the Marshal: field commander, dispatches specialists, maintains state on disk, runs the OODA loop. 116 lines. That’s your entire org chart.

  3. State files on both sides. The Strategist keeps strategic notes and a prompt log. The Marshal keeps a ledger, decision log, and execution plan. Both have compaction recovery protocols.

  4. You. Carrying prompts, carrying debriefs, making the calls neither agent is qualified to make. The job title is “human.” The actual role is “executive who happens to be the only one in the room who knows everyone’s name.”
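The four-step setup above amounts to a handful of files across two repos. As a rough sketch of the scaffolding, with directory and file names that are my assumptions and not the published template:

```python
# Hypothetical scaffolding for the two-repo setup described above.
# Directory and file names are illustrative assumptions, not the
# published starter template.
from pathlib import Path

LAYOUT = {
    "strategist-meta": [
        "CLAUDE.md",           # the Strategist's 112-line role definition
        "strategic_notes.md",  # long-term strategy, IP notes, sequencing
        "prompt_log.md",       # every prompt ever sent to the Marshal
    ],
    "project": [
        "CLAUDE.md",                # the Marshal's 116-line role definition
        "state/ledger.md",          # mission outcomes, current status
        "state/decision_log.md",    # decisions, compaction-survivable
        "state/execution_plan.md",  # phases and dependency tracking
    ],
}

def scaffold(root: Path) -> None:
    """Create the empty file layout for both sides of the pattern."""
    for repo, files in LAYOUT.items():
        for rel in files:
            path = root / repo / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
```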

I’ve published a starter template with both configuration files, the directory scaffolding, and a walkthrough of your first dispatch cycle. It’s the same pattern I used, genericized. The specifics of what you’re building don’t matter — the organizational structure should scale to any complex project, so long as you stay on top of it.

Suggestion: Talk to your Strategist about what you’re trying to build or achieve. Ask whether there are any specific sections it would add to its own CLAUDE.md or to the Marshal’s. Strive to keep those two CLAUDE.md files lean!

→ Try it yourself: github.com/armenr/shadow-terminal


The Punchline

In my last post, I talked about villagers and werewolves — the information asymmetry between people compounding knowledge and people compounding dependence. The same asymmetry applies here, but one level up.

Most people are trying to have a better conversation with one AI. They’re optimizing prompts, tweaking system instructions, building better configuration files. All of that is fine. It’s necessary. It’s also, fundamentally, still a conversation. You’re still one person talking to one agent in one window, and calling it a workflow.

The people who are going to build the genuinely hard things — the things that take weeks, that span thousands of lines, that require strategic coherence across dozens of interconnected decisions — aren’t having a conversation. They’re running an organization. Two agents, one of which commands its own teams of specialist sub-agents. Separated concerns. State on disk. A human in the loop not as a typist, but as a sovereign.

The tools are the same. The subscription costs the same twenty bucks a month. The difference is whether you’re using it like a coworker or like a company.

Don’t have a conversation. Run an organization.


P.S. — This post was written with the help of the same system it describes. The Strategist drafted the structure. I made it mine. If that’s not eating your own dogfood, I don’t know what is.

P.P.S. — There’s a follow-up coming on the specific research that informed this pattern — CLAUDE.md design, orchestration theory, the meta-advisor concept, and how adversarial review improved the system before it even existed. Consider this the trailer.


Originally published on rmnr.net. Find me on GitHub · LinkedIn.