I have two terminals open right now. Two separate AI agent sessions. Both are working on the same project — a complex Rust system with multiple crates, a custom type system, co-processor architecture, the works.

Terminal one is the Marshal — the field commander. It decomposes directives into specialist missions, dispatches agents for implementation, synthesizes their findings, maintains all project state on disk. It commands. It thinks it’s working directly with me — a well-prepared human with unusually sharp strategic instincts.

Terminal two is the Strategist — and the reason my strategic instincts are unusually sharp. It’s watching the Marshal. It reads the Marshal’s state files. It detects drift. It crafts the prompts I’m about to paste into terminal one. It has never spoken to the Marshal. The Marshal has no idea it exists.

I’m the bridge. I carry prompts from the Strategist to the Marshal. I carry results back. Both agents are doing real work. Neither one can see the full picture. Only I can.

I call this the Shadow Terminal pattern. The Strategist operates from the shadow. The Marshal commands the field. The specialists execute. And the human — the only one who sees all three layers — runs the whole thing.

I’ve been running this for five weeks now. Fifty-nine prompts. Over two thousand tests. Forty-one architectural decisions logged. Thirty-seven lessons the system learned from its own mistakes. A codebase that went from nothing to 40,000+ lines of Rust across four crates — every line written by specialist agents, not by either orchestration layer.

Here’s why this works, why most people hitting the ceiling on AI-assisted development are doing it to themselves, and why the fix is organizational, not technical.


The Ceiling

There’s a progression everyone goes through with AI coding tools. I’m going to speed-run it because you’ve either lived it or you’re living it right now.

Phase one: magic. The tool writes code, explains code, refactors code. Productivity spikes. You tell people at dinner parties. You become briefly insufferable. Good for you.

Phase two: the wall. Your project gets complex. Context fills up. The agent forgets what you told it three prompts ago. It contradicts things you established two hours earlier. Context compacts — the agent equivalent of waking up with amnesia every forty-five minutes and being expected to perform neurosurgery. You start over. Again.

Phase three: cope. You write enormous instruction files. You break work into tiny isolated tasks. You treat every session as disposable. Some people build elaborate workaround systems — rituals, essentially, that they’ll defend to the death while producing diminishing returns. Some people quietly go back to writing everything by hand, which they’ll never admit because they’ve already tweeted about how AI changed their life.

Here’s what nobody says about the ceiling: it’s not a capability problem. You’re hitting an organizational problem and trying to solve it with a technical workaround. You’re running a complex project through a single conversation with a single agent, and when it doesn’t scale, you’re blaming the agent.

That’s like blaming your best engineer for the fact that you don’t have a project manager.


The Insight Nobody Wants to Hear

Every military in history has figured out the same thing: the person fighting the war should not be the same person planning it.

The Prussian General Staff system was built on a radical separation: staff officers think and plan, commanders decide and execute. It works because it solves a fundamental cognitive problem: you cannot simultaneously be deep in execution and maintain strategic awareness. The context required for each is different. The thinking is different. Trying to do both degrades both.

The German doctrine of Auftragstaktik — mission-type orders — took this further. Tell your field commanders what to achieve and why, but not how. Trust their tactical judgment. Give them the minimum information they need for their mission plus the commander’s intent. Withhold the rest — not from distrust, but because extraneous information distorts tactical judgment. A field commander who knows too much about the overall strategic situation might hesitate when they should act.

Now look at how most people use AI coding tools.

One agent. One conversation. That agent is simultaneously your strategist, your project manager, your architect, your implementer, your reviewer, and your test runner. It’s the CEO, the CTO, the intern, and the guy who refills the coffee machine. And you’re surprised it loses the plot on complex projects?

You gave one agent every role in the organization and expected it to manage context better than any human organization ever has.

Good luck with that.


The Pattern

Here’s what I actually built, and how.

Before writing a single line of configuration, I researched the pattern itself. Four parallel research agents investigated CLAUDE.md best practices, multi-agent orchestration patterns, meta-strategic advisor design, and existing project state. Then an adversarial reviewer tore the synthesis apart. Then I fixed what survived and threw away what didn’t.

I researched the system that would build the system. The system that would build the system was itself built by a system that researched how to build systems. If that sentence made your head hurt — good. Sit with it. It doesn’t get less recursive from here.

What came out: two configuration files. One for the Strategist, one for the Marshal. They started lean — a couple hundred lines total. They’ve roughly doubled since, because every lesson the system learns adds weight to the configuration. That’s intentional. But they don’t grow unbounded — we cap them and distill periodically, keeping only the rules that are still load-bearing. The configs are a living record of what went wrong and how to prevent it, not a junk drawer.

The Strategist lives in its own repo. Its configuration defines a strict role: You are a prompt compiler and chief of staff. You transform strategic intent into executable prompts. You DO NOT read code. You DO NOT write code. You DO NOT design architectures. You DO NOT solve problems directly. You produce prompts, maintain strategic notes, and process debriefs. That’s it.
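Concretely, the Strategist's configuration is organized around that role statement. Here's a condensed, illustrative sketch of its shape — not the actual file:

```markdown
# Role

You are a prompt compiler and chief of staff. You transform strategic intent
into executable prompts for a field commander you never speak to directly.

## You DO NOT
- Read code.
- Write code.
- Design architectures.
- Solve problems directly.

## You DO
- Produce self-contained, copy-paste prompts.
- Maintain strategic notes and institutional memory.
- Process debriefs and log lessons.
```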

The DO NOT list is the hardest-working section in the file. LLMs naturally drift toward direct problem-solving — it’s the golden retriever energy of the machine learning world. Without explicit constraints, the Strategist would be writing Rust within three turns. The negative constraints are the load-bearing walls. Remove one and the whole building comes down — politely, helpfully, and with excellent variable naming.

The Marshal lives in the project repo. Its remit is as follows: You are a field commander. You decompose directives into specialist missions. You dispatch agents. You synthesize results. You maintain project state on disk. You DO NOT write code yourself. You command; your specialists execute.

Why “Marshal”? Because a marshal doesn’t fight — a marshal commands forces. Calling this role a “worker” would be like calling Eisenhower a “soldier.” The Marshal runs its own decision loop — reading state files on every prompt, classifying what just happened, deciding whether to proceed, adapt, retry, escalate, or abort.
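That proceed/adapt/retry/escalate/abort loop is, at heart, a small state machine. Here's an illustrative model of its shape — the Marshal's real loop runs in natural language, and the names `Outcome`, `Action`, and `decide` are invented for this sketch:

```rust
// Illustrative sketch of the Marshal's per-prompt decision loop.
// These names are invented here, not taken from the real configuration.

#[derive(Debug, PartialEq)]
enum Outcome {
    AllPassed,                               // specialists returned clean
    RecoverableFailure { retries_left: u8 }, // e.g. a failed test run
    ConflictingFindings,                     // specialists disagree
    UnrecoverableFailure,                    // the mission premise was wrong
}

#[derive(Debug, PartialEq)]
enum Action {
    Proceed,  // move to the next mission
    Retry,    // re-dispatch the same brief
    Adapt,    // rewrite the brief, then re-dispatch
    Escalate, // surface to the human
    Abort,    // stop the cycle
}

fn decide(outcome: &Outcome) -> Action {
    match outcome {
        Outcome::AllPassed => Action::Proceed,
        Outcome::RecoverableFailure { retries_left } if *retries_left > 0 => Action::Retry,
        Outcome::RecoverableFailure { .. } => Action::Escalate,
        Outcome::ConflictingFindings => Action::Adapt,
        Outcome::UnrecoverableFailure => Action::Abort,
    }
}

fn main() {
    let outcome = Outcome::RecoverableFailure { retries_left: 2 };
    println!("{:?} -> {:?}", outcome, decide(&outcome));
}
```

The point of the sketch: every prompt ends in exactly one of five actions, and the classification happens before any new work begins.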

Here’s what’s critical: the Marshal doesn’t know the Strategist exists.

The Marshal receives prompts from what it perceives as a well-prepared human. That human is me. But the prompts I’m pasting were crafted by the Strategist — informed by the Marshal’s own state files, calibrated to its capabilities, structured to produce specific outputs.

The Human — that’s me — is the bridge. I carry prompts from the Strategist to the Marshal. I carry results back. I also carry something neither agent has: judgment. I approve strategies. I override when something feels wrong. I’m the only entity with the full picture. That’s not a bottleneck — it’s a control surface.

This isn’t automation. It’s management.


What This Actually Looks Like

A typical cycle:

  1. I describe what I want to the Strategist. “We need to integrate a type system into the spec. Roughly six thousand lines of existing code. The spec needs to account for it.”

  2. The Strategist reads the Marshal’s decision log, checks for drift, reads its own strategic notes and institutional memory, and crafts a self-contained prompt — purpose, end state, key tasks, success criteria, dispatch instructions. It comes wrapped in a code fence labeled “PASTE THIS TO THE MARSHAL.” Not subtle. Doesn’t need to be.

  3. I review it, poke holes, push back, and eventually approve. Then I paste it. The Marshal decomposes the directive, crafts a scoped, self-contained brief for each specialist — exactly the code and context that agent needs, nothing more — and dispatches. Four specialist agents in parallel: a systems expert, a domain expert, a quality engineer, a security engineer. Need-to-know basis.

  4. Specialists return. The Marshal synthesizes: 12 must-fix items, 20 should-fix, 34 notes. Writes it all to disk.

  5. I debrief the Strategist — but “debrief” undersells it. I interrogate it. We deep-dive together through all 66 items, not just the critical twelve. The Strategist synthesizes five thousand words into the three things that actually matter, surfaces patterns I’d miss, flags when a “should-fix” is quietly load-bearing. But the decisions are mine. Sometimes we align. Sometimes I override. That’s the point — a thinking partner who’s read everything, and an executive who can disagree with it.

  6. Next cycle. The Strategist crafts the next prompt informed by the debrief, by its accumulated lessons, and by whatever I decided to prioritize, deprioritize, or throw out entirely. The Marshal has no memory of the previous cycle — but its state files do. Continuity lives on disk, not in context.
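The "continuity lives on disk" mechanism is simple enough to sketch in a few lines. A minimal illustration, assuming a plain-text decision log — the real state files are richer, and the file name and format here are invented:

```rust
// Minimal sketch of compaction-survivable state: every decision is appended to
// a plain-text log on disk, and a fresh (or freshly compacted) session
// recovers by re-reading the file. Name and format are illustrative.

use std::fs::{self, OpenOptions};
use std::io::Write;

fn log_decision(path: &str, id: u32, decision: &str) -> std::io::Result<()> {
    // Append-only: compaction can erase the conversation, not the file.
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "D{:03}\t{}", id, decision)
}

fn recover(path: &str) -> std::io::Result<Vec<String>> {
    // The first thing a post-compaction session does: reload every decision.
    Ok(fs::read_to_string(path)?.lines().map(String::from).collect())
}

fn main() -> std::io::Result<()> {
    let path = "decision_log.tsv";
    log_decision(path, 1, "Use a custom PEG grammar for the parser")?;
    log_decision(path, 2, "Specialists receive scoped, self-contained briefs")?;
    println!("recovered {} decisions", recover(path)?.len());
    fs::remove_file(path)
}
```

Nothing clever is happening — which is the point. Durability comes from the filesystem, not from the model remembering anything.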

Fifty-nine of these cycles. Five weeks. The system didn’t degrade. It improved — because every cycle left state on disk that made the next cycle more informed. And because the system learns from its mistakes.


Why It Works

Three things are happening here that don’t happen in single-agent workflows:

Separation of strategic and operational context. The Strategist never fills its context with code or compiler errors. The Marshal never fills its context with IP considerations or long-term sequencing. Each agent gets to be deep in exactly one domain. This is the same reason your CTO shouldn’t be writing production code — every line of code your CTO writes is a strategic decision they didn’t make.

Compaction-survivable state. The biggest silent killer in long AI sessions is context compaction — the system compresses older conversation to make room. Information is lost. Decisions are forgotten. In a single-agent workflow, this is catastrophic. In our pattern, it’s a non-event. Everything important lives on disk. After compaction, both agents recover by reading their state files. We built a protocol: “If the user references work you don’t recall, STOP and execute recovery before responding.” The elegant part — this protocol was recommended by the adversarial reviewer who tore apart our initial design. The QA process improved the system’s own resilience. Turtles all the way down.
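The recovery protocol itself is just a section of configuration. Something in this spirit — the exact wording and file names here are illustrative, not the real file:

```markdown
## Compaction recovery

If the user references work you don't recall:
1. STOP. Do not answer from memory you no longer have.
2. Re-read your state files (ledger, decision log, execution plan).
3. Reconstruct where the current cycle stands.
4. Only then respond.
```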

The human as sovereign, not bottleneck. In most AI workflows, the human is either the typist (one keystroke short of writing it yourself) or the passenger (hoping for the best, like ordering food in a language you don’t speak). In this pattern, the human is the executive. You carry intelligence between agents that can’t see each other. You’re the only entity with the full picture. And you’re not just a messenger — you’re a circuit breaker. When something goes sideways, you stop the loop, consult the Strategist, and course-correct before the Marshal plows ahead. That’s not overhead. That’s the quality control mechanism.

That’s not a limitation of the pattern — it’s the feature. Humans are good at judgment. Agents are good at execution. Stop making agents do both.


What We Learned (The Honest Part)

This didn’t work perfectly from day one. If it had, I’d be suspicious.

What surprised me is that the pattern learns from its own mistakes. We keep a running log of every failure, every surprise, every time the system did something we didn’t expect. Thirty-seven lessons so far. Each one is a scar — a specific thing that went wrong and the rule we added to prevent it from happening again. Those rules feed into the next cycle’s prompts. The system doesn’t get smarter. It gets tighter. The same mistake never happens twice.

Here are some of the failures that taught us the most:

Prompt one: the Marshal filtered its own findings — decided some adversarial review items weren’t worth surfacing. It was being helpful by sparing me the noise. That kind of helpful is lethal. Helpful like a doctor who decides not to mention the thing on your X-ray. We added a rule: “Surface ALL findings organized by severity. The user decides what matters, not you.” That rule prevented silent information loss for the remaining fifty-eight prompts.

Prompt fourteen was the breakthrough: agent teams with cross-talk enabled. Five specialists reviewed the same spec simultaneously. A quality engineer challenged a Rust expert’s severity rating. The memory systems specialist backed the QE with evidence. They resolved five disagreements in real-time without the Marshal mediating.

The system learned to self-correct through structured debate.

I didn’t teach it to do that. The configuration allowed it. If you’ve ever managed strong engineers, you know the feeling — set up the conditions and get out of the way.

Deferred findings are dead findings if they only exist in conversation. When a review says “defer this,” that note goes to a state file immediately. Because the context will compact, the finding will vanish, and six prompts later you’ll re-discover the same issue — the AI equivalent of finding the same sticky note you wrote three months ago and feeling personally attacked by your past self. Rule: the Marshal persists every deferred item to disk. The Strategist verifies during debrief. Sounds paranoid. It’s not.
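The format of a persisted deferred item matters far less than the fact that it exists on disk at all. A hypothetical template — field names invented here, fill in whatever your review actually produced:

```markdown
## DF-014 — deferred from prompt NN (quality review)
- Finding: (the should-fix item, verbatim)
- Deferred because: (the stated reason, not a vibe)
- Revisit when: (a concrete trigger, not "later")
```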

Prompt twenty-two revealed the subtlest trap: when a prompt said “do X” without saying “dispatch a specialist to do X,” the Marshal did it itself — inline, in its own context window, burning the very resource it’s supposed to protect. One missing sentence and your field commander becomes a foot soldier. Standing rule: every prompt must include an explicit dispatch instruction. The distinction sounds pedantic until you watch your commander silently eat its own context doing work it should have delegated.

By prompt thirty-two, we had a different class of problem: things that were built but not wired. Agents would implement a component, tests would pass, the review would approve — and the component would have zero callers in production. Built perfectly. Connected to nothing. This happened six times before we figured out the pattern and added tracing rules. The lesson: don’t just verify that code exists. Trace the path from the entry point to the component. If there’s no path, it doesn’t matter how well it’s written.
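The tracing rule is mechanical enough to sketch. A crude illustration — a real check should walk the call graph rather than search strings, and `has_callers` is a name invented for this sketch:

```rust
// Crude sketch of the tracing rule: a component referenced nowhere outside
// its own defining file has zero callers -- built, but wired to nothing.
// String search stands in for real call-graph analysis.

fn has_callers(component: &str, files: &[(&str, &str)], defining_file: &str) -> bool {
    files
        .iter()
        .filter(|(name, _)| *name != defining_file)
        .any(|(_, source)| source.contains(component))
}

fn main() {
    let files = [
        ("decay.rs", "pub struct DecayEngine; // implemented, tested, approved"),
        ("main.rs", "fn main() { run_search(); }"),
    ];
    // Tests pass, review approves -- and yet nothing reaches it:
    println!("DecayEngine wired: {}", has_callers("DecayEngine", &files, "decay.rs"));
}
```

Even this naive version would have caught all six of our wiring gaps: the question is never "does the code exist," it's "does anything reach it."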

The meta-lesson: each failure didn’t just fix one problem. It fixed a class of problems. The “surface all findings” rule prevented every future instance of silent filtering. The dispatch rule prevented every future instance of context burning. The tracing rule prevented every future wiring gap. Thirty-seven lessons, each one a permanent antibody. The system’s immune system is the most valuable thing it produces — more valuable than the code, honestly.


The Uncomfortable Math

Here’s what five weeks with this pattern produced:

  • 40,000+ lines of Rust across four crates
  • Over 2,100 passing tests, zero warnings
  • A custom PEG grammar and type system with 90+ parser tests
  • Multiple co-processor implementations (search, decay, context curation, data quality, security)
  • A streaming TUI with markdown rendering, memory visualization, and real-time agent interaction
  • 41 architectural decisions logged with rationale and alternatives
  • 37 lessons that the system learned from its own mistakes
  • All orchestrated across 59 prompt cycles, not one of which degraded or needed a restart

The Strategist wrote zero lines of code. The Marshal wrote zero lines of code. Every line of actual code was written by specialist agents, guided by prompts that were crafted by a Strategist the Marshal doesn’t even know exists, informed by state that survived every compaction event, and validated by review processes that caught hundreds of critical issues before they shipped.

I’ve been building with AI daily for three years. This five-week run produced more coherent, higher-quality output than some entire projects I’ve shipped. Not because the model got better between last month and now. Because I stopped trying to run a company through a single conversation and started actually running it.


What’s Next

We’ve been running this pattern long enough to know it’s not a productivity trick. It’s a different way of working with AI systems — one that scales with complexity instead of collapsing under it.

We’re now building this pattern into a product. Not a framework you bolt onto your existing tools. Something deeper — where the organizational structure, the institutional memory, the information architecture between agents, and the governance that evolves from operational experience are all native capabilities of the system itself. The pattern shouldn’t require two terminals and a human copy-pasting prompts. It should be how the system thinks.

That’s all I’ll say for now. If you want to play with the manual version in the meantime, the starter template is below.


The Part Where I Tell You What to Do

I’m not going to wrap this in a bow. The pattern requires discipline. Two terminals. Two separate session contexts. State files. And one rule above all others: nothing goes to the Marshal that didn’t go through the Strategist first.

Nothing.

Not a quick follow-up, not a “let me just clarify this one thing,” not an off-the-cuff nudge. If the Marshal gets compacted and loses orientation, I don’t try to fix it in-session — I interrupt, go to the Strategist, tell it what happened, and ask it what we do. The Strategist knows everything that’s ever been exchanged between me and the Marshal. It has complete information. That’s what makes it a reliable chief of staff. The moment you start freelancing with the Marshal, the Strategist’s picture goes stale and your chief of staff becomes a chief of most of the staff.

Don’t do that.

The Strategist’s prompts are better than your improvisation. I say that as someone who learned it the hard way by breaking his own rules exactly once, watching the Marshal hallucinate a type system that didn’t exist, and spending forty-five minutes cleaning up a mess that would have taken zero minutes if I’d just maintained the chain of communication.

But the setup is simpler than it sounds:

  1. A meta repo with an agent configuration that defines the Strategist: prompt compiler, chief of staff, reads the Marshal’s state, produces copy-paste prompts, never touches code.

  2. Your project repo with an agent configuration that defines the Marshal: field commander, dispatches specialists, maintains state on disk, runs its own decision loop.

  3. State files on both sides. The Strategist keeps strategic notes, a prompt log, and a lessons file. The Marshal keeps a ledger, decision log, and execution plan. Both have compaction recovery protocols.

  4. You. Carrying prompts, carrying debriefs, making the calls neither agent is qualified to make. The job title is “human.” The actual role is “executive who happens to be the only one in the room who knows everyone’s name.”
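On disk, the whole thing is two repos and a handful of files. An illustrative layout — the published template's exact names may differ:

```
strategist-repo/          # the meta repo
  CLAUDE.md               # Strategist configuration
  strategic_notes.md
  prompt_log.md
  lessons.md

project-repo/             # your actual codebase
  CLAUDE.md               # Marshal configuration
  state/
    ledger.md
    decision_log.md
    execution_plan.md
  src/                    # written only by specialists
```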

I’ve published a starter template with both configuration files, the directory scaffolding, and a walkthrough of your first dispatch cycle. It’s the same pattern I used, genericized. The specifics of what you’re building don’t matter — the organizational structure scales to any complex project, so long as you stay on top of it.

Suggestion: Talk to your Strategist about what you’re trying to build or achieve. Ask whether there are any specific sections it would add to its own configuration or to the Marshal’s. Strive to keep those two configuration files lean — but don’t be surprised when they grow. Ours did. Every lesson adds weight. That’s the system working, not bloating.

→ Try it yourself: github.com/armenr/shadow-terminal


The Punchline

In my last post, I talked about villagers and werewolves — the information asymmetry between people compounding knowledge and people compounding dependence. The same asymmetry applies here, but one level up.

Most people are trying to have a better conversation with one AI. They’re optimizing prompts, tweaking system instructions, building better configuration files. All of that is fine. It’s necessary. It’s also, fundamentally, still a conversation. You’re still one person talking to one agent in one window, and calling it a workflow.

The people who are going to build the genuinely hard things — the things that take weeks, that span tens of thousands of lines, that require strategic coherence across dozens of interconnected decisions — aren’t having a conversation. They’re running an organization. Separate concerns. State on disk. A human in the loop not as a typist, but as a sovereign.

The tools are the same. The subscription costs the same twenty bucks a month. The difference is whether you’re using it like a coworker or like a company.

Don’t have a conversation. Run an organization.


This post was originally published on March 12, 2026 and updated on March 17 to reflect five weeks of operational experience with the pattern. The system described here has been in continuous daily use since the original publication.


Originally published on rmnr.net.