The Semantic CPU
A CPU is inert.
Take a processor — billions of transistors, thousands of instructions in its repertoire, decades of engineering in every square millimeter of silicon — strip it of everything except its instruction set, and drop it on a table. No memory, no storage, no peripherals.
It’s a marvel of engineering that can do exactly nothing.
This is not a criticism. The instruction set is extraordinary. An x86 chip can add, multiply, branch, compare, shift, load, store, and execute speculative operations across pipelined stages at billions of cycles per second. But without RAM, there’s nowhere to put the data. Without a bus, there’s no way to move it. Without I/O, there’s no way to receive input or produce output. Without an operating system, there’s no way to coordinate any of it.
The CPU is the most important component in a computer. It is also the most useless one in isolation.
Hold that thought.
The Chip on the Table
Now look at a large language model.
An LLM has an extraordinary built-in instruction set: the statistical weight of its training data, the patterns it’s learned, a constitution that shapes its behavior, and the ability to operate natively in human language. That instruction set is remarkable — arguably the most sophisticated information-processing capability anyone has engineered to date.
But strip away the context window — its only volatile working memory — and it retains nothing between invocations. No persistent state. No file system. No knowledge of what it did five minutes ago, let alone five months ago. Every session starts from zero. Every conversation is a cold boot.
The context window is the LLM’s equivalent of RAM — fast, variable-sized, capable of holding the entire working set. And like RAM, it’s volatile. Kill the session, lose the state. There is no disk. No persistent storage you can write to. Just the chip and whatever fits in working memory.
Now, here’s where the analogy bends, and I want to name that honestly. An LLM does have a kind of persistent storage: its weights. Hundreds of billions of parameters, trained over months, encoding the statistical structure of human knowledge. That’s real. You could think of the weights as ROM — read-only memory baked in at training time.
But you can’t write to it. You can’t update your own weights at inference time. You can’t say “remember this for next session” and have it stick. The weights are the instruction set and the reference library, not the notebook. An agent that can read the encyclopedia but can’t jot down a Post-it note is still an agent without functional memory.
The LLM is a stateless executor. The most capable one ever built — and the most unfinished.
Classical computing has two canonical architectures for connecting a processor to the world. Von Neumann: one memory space for both instructions and data, one bus. Harvard: separate memories, separate buses. Both assume the processor has persistent memory it can read from and write to. Both assume that state survives between operations.
An LLM’s architecture is closer to Harvard than Von Neumann — weights and context are separate, non-unified memory spaces. But it’s a Harvard machine with a critical disability: one memory is read-only, the other is volatile. You have ROM and you have RAM, but you have no disk. No way to persist what you’ve learned. No way to carry state across sessions. It’s something we don’t have a great name for yet — a processor with storage it can’t update and working memory it can’t keep.
But the Instruction Set Is Semantic
Here’s where the analogy gets interesting, because the LLM isn’t just a different shape of CPU. It’s a fundamentally different kind of computing element.
A CPU is deterministic. Same input, same output. It operates on bits — ones and zeros, voltage levels, logic gates. It follows procedures. It executes finite-state logic. This is why we can prove programs correct. This is why caches work, why pipelines work, why branch prediction works. Determinism is the foundation that every optimization in classical computing is built on.
An LLM, in its default operating mode, is probabilistic. Same input, different output. Yes — you can set the temperature to zero, clamp the sampling, and get deterministic behavior. But that’s not the interesting mode. That’s using a jazz musician as a metronome. The interesting mode is the one where the machine explores — where it navigates a high-dimensional space of meaning and produces outputs that are, at their best, not retrievals but something that, if a human did it, we’d call reasoning.
Whether that constitutes “real” reasoning is a debate I’ll leave to the philosophers and the ML researchers who are still arguing about it. What I can tell you from three years of building with these systems: whatever you call it, the outputs are useful in ways that deterministic pattern matching has never been. The machine does something in the space of language that no prior computing element has done. That’s the observation that matters for what follows.
A CPU’s fundamental parameter is clock speed — how many deterministic operations per second. An LLM’s closest equivalent is temperature — and temperature doesn’t control speed. It controls how surprising the outputs should be. How much the machine should explore the space of possible responses versus exploiting the most likely one. There is no analog for that in classical computing. We have never had a computing primitive whose fundamental tuning parameter is the degree to which it should surprise you.
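The shape of that knob is easy to sketch. Here is a minimal, illustrative sampler in Python — not any particular model’s implementation — showing how temperature divides the logits before the softmax, so low values sharpen the distribution toward the single most likely token and high values flatten it toward surprise:

```python
import math
import random

def sample(logits, temperature, rng=random.Random(0)):
    """Pick a token index from logits scaled by temperature.

    temperature -> 0 approaches argmax (deterministic, the metronome);
    higher temperature flattens the distribution (more surprising picks).
    """
    if temperature == 0:
        # Degenerate case: always the single most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the scaled distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]
print(sample(logits, 0))  # temperature 0: always index 0
```

Nothing in a CPU’s design surface looks like this: the tuning parameter is not how fast to compute, but how far to stray from the most probable answer.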
A CPU computes. An LLM does something that, for lack of a better word, looks like comprehension. We have never had a computing primitive that does that.
A CPU doesn’t “understand” addition. It flips gates. The number 7 has no semantic content inside a processor — it’s a voltage pattern. An LLM doesn’t retrieve the answer to your question from a lookup table. It navigates meaning, shaped by the structure of everything it’s been trained on, and produces an output that is — at its best — not a recollection, but a thought.
The interesting thing about this machine is not that it always thinks. It’s that it sometimes does, and it also does a lot of things that look nothing like thinking — pattern completion, statistical correlation, confident recitation. The fact that the same machine does both is the defining strangeness. No prior computing element operated in the space of meaning at all, let alone unreliably.
And here’s the problem.
The Mismatch
Every tool, framework, API surface, orchestration layer, storage system, and infrastructure component in modern computing was built for the old machine. The deterministic one. The one that operates on bits, follows procedures, and produces the same output given the same input.
None of it was designed for a computing element that operates in language and behaves probabilistically.
Databases. Built for structured queries, typed schemas, ACID transactions. The whole paradigm assumes you know what you’re looking for and can express it in a formal query language. LLMs are actually decent at generating SQL — that’s not the problem. The problem is that SQL can’t express the queries a semantic system actually needs. “What do I know that’s relevant to this situation, weighted by how recently I learned it and how many times it’s been validated?” That’s not a SELECT statement. The query language itself is the wrong abstraction.
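To make the contrast concrete, here is a hypothetical scoring function for exactly that query. Every name and constant in it is an assumption for illustration; the point is the shape of the computation — similarity, discounted by age, boosted by validation — which no SELECT statement expresses:

```python
import math
import time

def relevance(similarity, learned_at, validations, now=None,
              half_life_days=30.0):
    """Score a piece of knowledge: semantic similarity to the situation,
    discounted by how long ago it was learned, boosted by how many
    times it has been validated. An illustrative rule, not a real system's.
    """
    now = now or time.time()
    age_days = (now - learned_at) / 86_400
    recency = 0.5 ** (age_days / half_life_days)    # exponential decay
    validation_boost = 1 + math.log1p(validations)  # diminishing returns
    return similarity * recency * validation_boost

now = time.time()
fresh = relevance(0.8, now - 1 * 86_400, validations=0, now=now)
stale = relevance(0.9, now - 180 * 86_400, validations=0, now=now)
print(fresh > stale)  # True: recency outweighs raw similarity here
```

A six-month-old fact loses to a slightly less similar one learned yesterday. Expressing that tradeoff is the query; SQL has no vocabulary for it.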
Orchestrators. Built for predictable state machines — DAGs, finite-state transitions, predefined branches. An LLM doesn’t follow a DAG. It reasons about what to do next based on context, history, and judgment. Forcing it into a predefined execution graph is forcing fluid thought into rigid plumbing. You end up writing a thousand-line system prompt to describe every edge case the DAG can’t handle, and the whole point of the DAG was to avoid that kind of complexity.
Memory. This is the one that gets me. RAM is volatile but fast. Disk is persistent but slow. Both are byte-addressable. Neither has any concept of relevance, decay, or reinforcement. They don’t know what’s important. They don’t forget what isn’t. They don’t strengthen knowledge that gets used or weaken knowledge that doesn’t. They store bytes at addresses and retrieve bytes from addresses. That’s it.
A memory system for a semantic computing element should behave semantically. A decision your agent made three months ago should persist differently than a casual observation it made yesterday. A lesson learned from a failure should reinforce every time the same context reappears. Knowledge that hasn’t been accessed in six months should fade — not because storage is expensive, but because stale knowledge actively degrades reasoning when it surfaces at the wrong moment.
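A sketch of what “persists differently” could mean, with hypothetical memory kinds and half-lives chosen purely for illustration:

```python
from dataclasses import dataclass

# Hypothetical half-lives per kind of knowledge, in days (illustrative only).
HALF_LIFE = {"decision": 365.0, "lesson": 180.0, "observation": 7.0}

@dataclass
class Memory:
    kind: str
    strength: float = 1.0

    def decay(self, days: float) -> None:
        """Fade with time; the rate depends on what kind of knowledge this is."""
        self.strength *= 0.5 ** (days / HALF_LIFE[self.kind])

    def reinforce(self) -> None:
        """Strengthen when the same context reappears, capped at full strength."""
        self.strength = min(1.0, self.strength * 1.5)

decision = Memory("decision")
observation = Memory("observation")
for m in (decision, observation):
    m.decay(days=90)
print(decision.strength > observation.strength)  # True: decisions persist longer
```

After ninety days the decision is still most of its original strength; the casual observation is effectively gone — unless something reinforced it along the way.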
No vector store does this. No relational database does this. They weren’t designed to. They were built for a machine that doesn’t need to remember — it just needs to store and retrieve.
We handed a system that operates in meaning a filing cabinet and a Rolodex and wondered why it couldn’t think across time.
The Fool’s Errand
So what’s the industry doing about this?
Making the CPU faster.
More parameters. Longer context windows. Faster inference. Better benchmarks. Every AI lab on the planet is locked in an arms race to build a better model — more capable, more aligned, more efficient. Anthropic, OpenAI, Google, DeepSeek, Meta. Billions in funding. Thousands of researchers. Custom silicon.
This is real work. Important work. I’m not dismissing it. The models are getting meaningfully better, and I benefit from that every day.
But here’s what I keep coming back to: for the kinds of agent work I do — coding, architecture, orchestrating multi-crate Rust projects — GPT-4-class models were sufficient two years ago. Not perfect. Sufficient. What wasn’t sufficient was everything around the model. The memory that doesn’t exist. The orchestration that’s either a single conversation or a rigid DAG. The tool interfaces that are either “here’s a function signature, good luck” or a thousand lines of system prompt guardrails. The governance that’s either nothing or a hard-coded rule set that can’t adapt to context.
Let me say this clearly: there are entire categories of agent work where models are NOT yet sufficient — long-horizon autonomous planning, scientific reasoning, medical diagnosis, mathematical proof. The “models are good enough” claim is domain-dependent, and anyone who tells you otherwise is selling something.
The model companies are playing a game with table stakes of $10 billion and 10,000 PhDs. They’re the Intel and AMD of this era — building better processors, faster, with more transistors. That’s their job. They’re excellent at it.
That’s also not your game. Or mine.
The fool’s errand isn’t building with LLMs. It’s trying to compete on building better LLMs when the chip is sitting on a table with no memory, no bus, and no operating system.
The Hypothesis
I could be wrong about this. I want to say that up front, because what follows is a hypothesis — informed by three years of building with these systems daily, but a hypothesis nonetheless.
Full disclosure: I’m building in this space. I have a stake in the conclusion that the bottleneck is infrastructure rather than model quality. You should weigh that when you evaluate what follows.
The hypothesis: for the domains where current models are already capable — coding, analysis, multi-step tool use, conversational problem-solving — the bottleneck isn’t the model. It’s everything around it.
My evidence comes from one large project. A complex Rust system — 110,000+ lines across multiple crates, thousands of tests, architectural decisions compounding across dozens of prompt cycles. I believe what I’ve seen generalizes. You should be skeptical of that belief. Here’s what I’ve seen:
The limiting factor in every single session, every single time, is not the model’s intelligence. It’s context management. It’s memory. It’s the fact that the agent wakes up with amnesia every session and has to rediscover what it knew yesterday. It’s the orchestration collapsing when a single conversation can’t hold the complexity of the problem. It’s tool interfaces that require more prompt engineering to describe than they took to build.
I wrote about context compaction — the silent killer of long AI sessions, where the system compresses older conversation to make room and critical decisions simply vanish. That’s not a model problem. That’s an infrastructure problem. The model didn’t forget. The infrastructure around it threw the memory away.
I watched an agent with institutional memory outperform a specialist investigation agent that was arguably smarter in the moment — because the memory-aware agent already knew things the specialist had to rediscover from scratch. Same model. Different infrastructure. The result wasn’t close.
Maybe Claude 5 will be so powerful that infrastructure doesn’t matter. If the next generation of models drops agent failure rates by 80% without any infrastructure changes, I was wrong. Full stop.
But I think the opposite is more likely: better models will make good infrastructure more valuable, not less. A more capable model benefits more from rich context, not less. A smarter agent with no memory is still an agent with no memory — it just forgets more impressive things. The better the chip, the more it needs the rest of the computer.
And that’s never not been true. The Intel 8080 didn’t make personal computer operating systems unnecessary. It made them inevitable.
The Opportunity
If the bottleneck is infrastructure, then the opportunity is infrastructure. But not “AI-powered” versions of existing tools. Not a SQL database with an LLM adapter bolted on. Not a finite-state orchestrator with natural language transitions. Not a vector store wearing a memory costume.
The opportunity is infrastructure designed from first principles for a semantic, probabilistic computing element. Tools that think in the same substrate the machine thinks in.
I’m building something called h00.sh to test this hypothesis. It’s an embeddable memory substrate for AI agents, written in Rust. Here’s the bet I’m making — not as abstract principles, but as concrete architectural decisions I’m living with:
Memory that behaves like memory. Different kinds of knowledge should persist differently. A decision your agent made three months ago is not the same kind of thing as an observation it made ten minutes ago — they should decay at different rates, reinforce under different conditions, and surface in different contexts. The system shouldn’t just store — it should curate. Not because someone wrote a cron job to clean up stale records, but because the memory substrate itself understands that not all knowledge ages the same way.
Search that understands need, not just similarity. The question isn’t “find documents similar to this query.” It’s “what does the agent need right now, given what it’s doing, what it’s done before, and what’s gone stale since last time?” The consumer of search results is a reasoning engine, not a human scanning a list. That changes what search should optimize for.
Knowledge as structure, not text. h00.sh includes a code intelligence layer built on a knowledge graph — symbols, relationships, reachability analysis. “Is this function actually reachable from a production entry point?” is a question no traditional type system can answer, but it’s exactly the kind of structural knowledge an agent needs to make sound decisions about code. Graphs, not tables. Reachability, not joins.
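The reachability question is itself a small graph computation. A minimal sketch over a hypothetical call graph — the function names are made up, and a real code-intelligence layer would carry far richer edges than a dict:

```python
from collections import deque

def reachable(call_graph, entry_points):
    """All symbols reachable from any entry point via call edges (BFS)."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

# Hypothetical call graph: main -> parse -> validate; old_migration is orphaned.
graph = {
    "main": ["parse"],
    "parse": ["validate"],
    "old_migration": ["validate"],
}
live = reachable(graph, ["main"])
print("old_migration" in live)  # False: defined, typed, tested -- and dead
```

`old_migration` compiles, type-checks, and even calls live code — and no type system will tell you it is unreachable from production. The graph will.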
These are my specific bets. Other people will make different ones. The general principle is what matters: the infrastructure layer for semantic computing doesn’t exist yet, and the pieces that do exist were built for a different machine.
The best patterns from classical computing aren’t wrong. They’re aimed at the wrong substrate. Take garbage collection — McCarthy invented it in 1959 for Lisp, and the core insight (automatically reclaim resources the program no longer needs) is exactly right. But what does garbage collection mean when the garbage is stale knowledge? What does cache eviction mean when “least recently used” should be “least relevant given current context, weighted by how many times this knowledge has been validated”? What does a type system mean when types describe temporal behavior and semantic categories, not byte layouts and column constraints?
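The cache-eviction translation can be made concrete by putting the two policies side by side. The semantic policy below is an illustrative stand-in, not any shipping system’s formula:

```python
def lru_victim(entries):
    """Classical cache eviction: evict whatever was used least recently."""
    return min(entries, key=lambda e: e["last_used"])

def semantic_victim(entries, context_relevance):
    """Evict the entry least relevant to the current context, weighted by
    how many times it has been validated. (A hypothetical policy.)"""
    return min(entries,
               key=lambda e: context_relevance(e) * (1 + e["validations"]))

entries = [
    {"id": "style_guide", "last_used": 1, "validations": 9, "topic": "code"},
    {"id": "lunch_order", "last_used": 100, "validations": 0, "topic": "food"},
]
# Toy relevance signal: the agent is currently working on code.
context_relevance = lambda e: 1.0 if e["topic"] == "code" else 0.1

print(lru_victim(entries)["id"])                        # style_guide
print(semantic_victim(entries, context_relevance)["id"])  # lunch_order
```

LRU throws away the nine-times-validated style guide because it was touched long ago; the semantic policy keeps it and drops the irrelevant entry instead. Same eviction concept, different substrate.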
Same concepts. Genuinely different implementations. The challenge isn’t invention — it’s translation. Taking fifty years of hard-won systems engineering wisdom and re-grounding it in a substrate that operates on meaning instead of bits.
The old ideas are good. The old substrate is wrong. Bring the patterns. Rethink what they mean when the machine thinks in language.
It’s 1975
Here’s the parallel I keep arriving at.
In 1974, Intel shipped the 8080. Eight-bit processor. 2 MHz clock speed. It was, by any reasonable standard, a sufficient computing element for useful work. The Altair 8800 was built around it. CP/M — the first operating system that regular people could use on a microcomputer — was written for it. BASIC ran on it.
The 8080 was good enough. What it lacked was everything around it. No standard bus. No standard I/O. No storage worth mentioning. No operating system that most people would recognize as one. The Apple I was two years away. The IBM PC was seven years away. The Macintosh was a decade out.
And here’s what I think people forget about the next twenty years of computing: the CPU improved significantly. 8-bit to 16-bit to 32-bit. Protected mode. Virtual memory. Each step was a real architectural leap. But what transformed computing — what turned the microprocessor from a hobbyist curiosity into the foundation of modern civilization — was everything else. DOS. The BIOS standard. The ISA bus. Hard drive controllers. Programming languages. Compilers. Networking stacks. Window managers. The entire ecosystem that made the processor useful.
The companies that defined the next era of computing weren’t only the chip makers. Intel shaped everything — but so did Microsoft, Apple, Oracle, Cisco. The platform builders. The infrastructure builders. The ones who figured out that a good processor was necessary but not sufficient, and the rest of the computer was where the leverage lived.
Now look at 2026.
The LLM is powerful. More powerful than the 8080 ever was relative to its era. But there’s no standard memory architecture. No standard agent protocol. MCP is maybe the beginning of a bus standard — an early attempt at letting the processor talk to peripherals. But the rest of it — the memory, the storage, the governance, the type system, the orchestration — is all still being invented. Or, worse, being duct-taped together from components designed for a completely different kind of machine.
We are at 1975. The CPU exists. It’s remarkable. We’re still building the computer around it.
I’m not going to end this with a call to action, because that would miss the point. This isn’t an announcement. It’s a bet — one I’m making with my own time, my own code, and my own conviction about where the leverage is.
The LLM is a given. A remarkable, extraordinary, world-changing given. The people improving it are doing essential work and I genuinely hope they never stop.
But the opportunity — the one that’s actually available to people who build — isn’t a better chip. It’s the memory, the storage, the bus, the operating system, the programming languages, the governance. All the things that turn a processor into a computer.
We have the chip. We need the computer.
This post was written with the help of Claude. The system that helped me write it forgets everything about this conversation the moment the session ends. That forgetting is also why writing it took longer than it should have. That’s the problem.