Chat Is Not Where It's At
It's Q1 2026, and the information landscape around AI is genuinely hard to read. Across the interwebs, on reddit and twitter, in private discord servers, and in the mainstream news, frontline researchers, bedroom practitioners, and everyone in between weigh in on the status of the AI revolution, its trajectory, and the road ahead. They cite lived experience, benchmark results, and deployment postmortems, without any clear consensus emerging. Contradictory things are simultaneously true, and there doesn't seem to be a clean explanation for why.
Developers report massive productivity gains with AI coding assistance. Leaders of frontier labs warn of massive workforce disruption on timescales of months. But where's the traction? The products exist, at least on the business side, but where's the evidence they're actually sticking? Where are the revolutionary consumer products? The tooling ecosystem is exploding, yet reporting has it that enterprise AI deployments fail at catastrophic rates: somewhere north of 90% of corporate pilots stall or get quietly killed. Agents are everywhere, yet nobody can point to an agent workflow that has unquestionably replaced a human process end-to-end, at scale, in the wild. Anthropic and OpenAI ship something that genuinely impresses serious people every few months, sometimes every few weeks or days. But what's actually different about what average people are doing now versus eighteen months ago?
None of this adds up cleanly. The signals point in too many directions to chalk up to mere hype or pure cynicism. How can this incongruity exist and persist?
The Easing
One way to read the AI industry over the last three years is as a walk back from an initial overestimation, recalibrating sequentially through distinct phases. Each step is a correction, and each correction reveals what the previous assumption couldn't support. The direction of travel is consistent: toward an equilibrium that hasn't yet been reached. Early success made it seem like the model itself could function as both reasoning engine and control system, that intelligence inside the model could replace structure in the surrounding software. The last three years have largely been the slow rediscovery that these are different roles. LLMs are powerful inference engines, but they are weak control systems. Much of the turbulence comes from trying to build reliable software around that mismatch:
Phase 0 (Late 2022, early 2023): The capability shock
Initial, fundamental overestimation. GPT-3.5 and then GPT-4 land, and something genuinely unprecedented happens: general instruction-following works. The industry draws an immediate inference: if a model can follow instructions, software (and cost) can collapse into prompting. Natural language is the new programming language. Two overinterpretations form almost instantly: that a model made large enough can understand tasks, and that the model can plan processes. Neither turns out to be fully true, but the belief is enough to launch everything that follows.
Phase 1 (2023): Just prompt it, bro.
Frameworks like LangChain emerge to formalize the dream. The architecture is simple: prompt goes in, result comes out, everything lives inside the context window. Engineering effort pours into the dark arts of prompt crafting, few-shot examples, output parsing, chain-of-thought incantations. The LLM is treated as both reasoning engine and control logic simultaneously. It works well enough to be exciting and badly enough to reveal the first real limit: the model doesn't know your data.
Phase 2 (2024): We need RAG
The answer is RAG. Vector infrastructure explodes — Pinecone, Weaviate, FAISS, LlamaIndex — built on a simple theory: if the model can be provided the right information, it will produce the right result. Better, much better. But vector RAG only solves knowledge access, and not without the further dark arts of chunking and indexing. It remains limited to injecting ambiguous, semantically similar text into context; it says nothing about intent or process execution. The model is still regarded as the brain responsible for making sense of the input.
Phase 3 (2024-2025): The agent hypothesis
Models gain tool APIs and function calling. For the first time the model can decide when to call external software, which makes it appear capable of operating computational systems rather than merely generating answers. The assumption is immediate: autonomous workflows are now possible. AutoGPT, BabyAGI, then a wave of coordination frameworks — CrewAI, LangGraph, AutoGen — built on the belief that intelligence emerges if you let the model plan and call tools. The architecture grows elaborate: goal decomposition, tool calls, iteration loops, the LLM as orchestrator. This is where the paradigm hits its structural ceiling. LLMs are probabilistic; as control systems they are unstable. The more autonomous the workflow, the more the nondeterminism compounds.
Phase 4 (2025-2026): The production wall
Organizations discover what the labs didn't advertise: LLM systems fail for software reasons, not intelligence reasons. Nondeterministic outputs, retrieval drift, tool misuse, infinite loops, untraceable reasoning. You can't hinge your business on this. Hence the stalled pilot problem: a roughly 95% failure rate on enterprise deployments. The dominant explanation blames implementation: bad prompts, wrong framework, insufficient fine-tuning. But the failures are too consistent and too structural for that to be the whole story.
Into this moment drops OpenClaw. Personal computing orchestration: an agent that lives on your machine, sees your screen, operates your applications and accounts, acts on your behalf. A genuine phenomenon, and the most ambitious expression of the phase 3 paradigm yet. But look at who's actually using it: enthusiasts, hobbyists, developers with the technical (and financial) tolerance to babysit an autonomous system through its configuration and failure modes. Whether it delivers durable, reliable value even to advanced users remains genuinely unclear; the evidence is anecdotal.
Phase 5 (Emerging Now): The architectural reversal, quietly underway
The most sophisticated practitioners have stopped trying to make the LLM the controller and are building software systems that use the LLM in constrained, scoped steps. Deterministic workflows, typed interfaces, structured outputs. Smaller, bounded LLM decisions inside a larger computational architecture. The LLM becomes a component, not the controller. This is the equilibrium the easing has been pointing toward all along, and it looks a lot like what good software engineering looked like before anyone decided structure was optional.
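A minimal sketch of that pattern, assuming nothing about any particular framework: one scoped LLM decision with a validated, typed output, embedded in deterministic control flow that owns the routing. `llm_call` is a hypothetical stand-in for a real model API and just returns a canned response here.

```python
import json
from dataclasses import dataclass

# Hypothetical stand-in for a real model call; in practice this would
# hit an LLM API. Here it returns a canned response for illustration.
def llm_call(prompt: str) -> str:
    return '{"category": "refund", "confidence": 0.92}'

ALLOWED_CATEGORIES = {"refund", "shipping", "other"}

@dataclass
class RoutingDecision:
    category: str
    confidence: float

def classify_ticket(text: str) -> RoutingDecision:
    """One bounded LLM decision: classify, validate, never free-run."""
    raw = llm_call(f"Classify this support ticket as one of "
                   f"{sorted(ALLOWED_CATEGORIES)}. JSON only.\n\n{text}")
    try:
        data = json.loads(raw)
        category = data["category"]
        confidence = float(data["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return RoutingDecision("other", 0.0)  # deterministic fallback
    if category not in ALLOWED_CATEGORIES or not 0.0 <= confidence <= 1.0:
        return RoutingDecision("other", 0.0)
    return RoutingDecision(category, confidence)

# Deterministic control flow owns the routing; the model only votes.
decision = classify_ticket("I was charged twice for my order.")
route = decision.category if decision.confidence >= 0.8 else "human_review"
```

The point of the shape, not the toy: the model's output is parsed into a typed value, validated against an explicit schema, and a low-confidence or malformed answer falls through to a deterministic path instead of propagating.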
Running parallel to the mainstream track are dissenting ones, getting louder at each phase. In 2023, saying "prompting isn't enough" made you a killjoy. In 2024, saying "RAG isn't enough" made you difficult. By now, Yann LeCun's public argument that the architecture itself is insufficient is gaining traction: we need world models, grounded representations of reality, something fundamentally different from next-token prediction. Fei-Fei Li's World Labs is massively funded on the premise that spatial intelligence, the ability to perceive, reason about, and interact with persistent 3D structure, is what current architectures fundamentally lack. Foundation Capital makes waves citing 'context models' as a trillion-dollar opportunity. They're not saying the same thing, but they are all saying it from a step ahead.
The Nature of the Bubble
When most people say "AI bubble," they mean financial valuation. Inflated expectations, companies worth more than their revenue justifies, a correction coming. That might be true. But the bubble I'm describing is conceptual. It's the distance between where most people's mental models currently are and the equilibrium the easing is pointing toward. And at its core is a single belief that almost nobody states directly but that organizes nearly everything being built right now:
Structured representation is over.
Latent representations can replace explicit ones. The weights hold the knowledge. The context window holds the state. When we scale big enough the model can synthesize, reason, and retrieve on demand. Why would you ever maintain a schema again? Why model your domain formally when you can just describe it in natural language and let the LLM figure it out? The era of databases, ontologies, knowledge graphs, formal semantics — all that painstaking, expensive, brittle infrastructure — has been superseded by something that just... understands.
This belief, this aspirational assertion, is the bubble. Everything else flows downstream from it.
We've seen this pattern before. The lesson of the dot-com era was also about the hazards of confusing an interface layer for the thing itself. Having a website was not the same as having a business; the website was just the customer-facing layer. The business still required logistics, fulfillment, inventory management, customer operations: the boring, expensive infrastructure underneath. Pets.com had a great website and no supply chain. The companies that survived the crash, Amazon most obviously, used the technology to build the infrastructure while the losers just polished the interface.
The AI version is the assumption that a chat interface, or a polished agent loop, can constitute a reliable computational system. Chat is merely the interface, the system requires everything the bubble declared unnecessary: persistent state, structured knowledge, formal orchestration, clear separation between the user layer and the processing layer. Most of what's been built in the last three years is "the website". The infrastructure is largely missing, because the prevailing belief was that we didn't need it anymore.
Failure Modes
Abandoning structured representation comes with predictable and legible consequences. You can trace common recurring issues directly back to that persistent foundational bet.
LLMs are stateless
Every invocation starts from scratch. Everything that looks like memory, conversation history, retrieved context, injected summaries, is scaffolding bolted on from outside. The model retains nothing between invocations. This is a property of the architecture, not a bug to patch. Building systems that require persistent state on top of a stateless component means solving state management outside the LLM architecture entirely: often ad hoc, in the scaffolding, differently with each design. Maintaining explicit structured state outside the model is exactly what the "structured is over" belief was supposed to make unnecessary.
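A minimal sketch of that scaffolding, with all names illustrative: state lives in an explicit, persisted structure outside the model, and each (stateless) call sees only what is deliberately serialized into it.

```python
import json
import os
import tempfile

# Minimal sketch: state lives outside the model, in a plain file,
# and is explicitly serialized into each (stateless) model call.
class SessionState:
    def __init__(self, path: str):
        self.path = path
        self.data = {"facts": [], "turn": 0}
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def record(self, fact: str):
        self.data["facts"].append(fact)
        self.data["turn"] += 1
        with open(self.path, "w") as f:   # persist every update
            json.dump(self.data, f)

    def as_context(self) -> str:
        # The model sees only what we explicitly inject.
        return "Known facts:\n" + "\n".join(self.data["facts"])

path = os.path.join(tempfile.mkdtemp(), "session.json")
state = SessionState(path)
state.record("user prefers metric units")
state.record("project deadline is Friday")

# A fresh object reloads the same state: nothing lived "in the model".
reloaded = SessionState(path)
print(reloaded.as_context())
```

Every production system ends up with some version of this; the variation (files, databases, vector stores, bespoke memory layers) is exactly the "differently with each design" problem.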
Current orchestration frameworks are computationally underpowered
At the simplest level, the dominant pattern is a linear chain of LLM calls. More sophisticated systems use DAGs where the LLM makes routing decisions between nodes, sometimes constrained by templated outputs. At the upper end of complexity, something like a finite state machine, though typically flat rather than nested. There's a hierarchy of computational power, reaching back to Chomsky if you want the formal grounding, and finite state machines sit near the bottom of it. Real cognitive processes require context-sensitive computation: nested structures, scoped state, accumulating context that changes how subsequent steps are interpreted. Most agent frameworks can't express this; the control flow remains leaky, unable to constrain behavior based on accumulated computational context. Organizing LLM calls into flat graphs is what seems natural when you're thinking about the problem as chaining prompts, when you believe the capabilities of the model's internal representations can substitute for explicit computational structure.
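To make the gap concrete, here is a toy sketch of one step up from a flat state machine: a stack of scoped frames, where a nested subtask inherits accumulated context from its parent and returns its result to the enclosing scope. All names are illustrative, not any real framework's API.

```python
# Sketch of scoped orchestration: a stack of frames gives nested,
# accumulating context that a flat state machine cannot express.
class Orchestrator:
    def __init__(self):
        self.stack = []                  # each frame is a scoped context

    def push_task(self, name: str):
        parent = self.stack[-1]["notes"] if self.stack else []
        # Child scope inherits accumulated context from its parent.
        self.stack.append({"task": name, "notes": list(parent)})

    def note(self, text: str):
        self.stack[-1]["notes"].append(text)

    def pop_task(self) -> dict:
        frame = self.stack.pop()
        if self.stack:
            # Results flow back into the enclosing scope.
            self.stack[-1]["notes"].append(f"done: {frame['task']}")
        return frame

orch = Orchestrator()
orch.push_task("write report")
orch.note("audience: executives")
orch.push_task("gather figures")         # nested subtask sees the note
sub = orch.pop_task()
top = orch.pop_task()
```

The stack is the whole point: a flat graph has no way to say "this constraint applies inside this subtask and its children, then goes out of scope."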
Context is not structure
Andrej Karpathy has a useful framing: the context window is like RAM. You load what you need, process it, output something. Hence "context engineering." But dumping your entire hard drive into RAM isn't efficient computing. A program, at the lowest level, is about selectively moving information from long-term indexed structured storage into working memory for data processing, then persisting the results back. The key word is structured. The long-term storage has a shape, relationships, queryable organization specifically designed to assist the computation. Vector RAG approximates this: retrieve relevant fragments, inject them into context, hope the model synthesizes them correctly. What it can't do is model the relationships between those fragments. The system has no representation it can navigate or easily update. It has a pile of text that might, best case, contain the relevant pieces. The belief that model weights can serve as the organization layer, that the knowledge is "in there" and just needs to be prompted out, is precisely what makes this feel like enough even though we know it isn't.
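The difference is easy to show with a toy structured store. This sketch, a purely illustrative in-memory triple store, answers a multi-hop relationship query that a pile of semantically similar text fragments cannot answer reliably:

```python
from collections import defaultdict

# Toy triple store: subject -> [(predicate, object)]. Illustrative only.
class TripleStore:
    def __init__(self):
        self.out = defaultdict(list)

    def add(self, s: str, p: str, o: str):
        self.out[s].append((p, o))

    def query(self, s: str, p: str) -> list:
        return [o for (pred, o) in self.out[s] if pred == p]

    def path(self, s: str, preds: list) -> list:
        """Navigate relationships hop by hop: the part vector RAG can't do."""
        frontier = [s]
        for p in preds:
            frontier = [o for node in frontier for o in self.query(node, p)]
        return frontier

kg = TripleStore()
kg.add("invoice-17", "billed_to", "acme")
kg.add("acme", "account_owner", "dana")
kg.add("dana", "manager", "lee")

# "Who manages the owner of the account this invoice bills?"
print(kg.path("invoice-17", ["billed_to", "account_owner", "manager"]))
```

A vector retriever given that question returns chunks that mention invoices, owners, or managers; the join across the three relationships has to happen somewhere, and in a structured store it's a query rather than a hope.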
Chat collapses what needs to be separated
A conversation turn tends to collapse at least three distinct things: user intent, system state, and computational process. For question-answering, that's fine. For anything stateful, multi-step, or collaborative, any task where the system needs to maintain context across sessions, coordinate multiple processes, or integrate feedback over time, it's fatal. You can't express complex system state with precision through conversational turns. You can't provide meaningful oversight to a process you can only observe through chat messages. And you can't separate interface concerns from computational concerns when they're collapsed into the same stream.
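As a sketch of the alternative, here is what keeping those three things as separate typed objects might look like; names and fields are illustrative, not a real framework:

```python
from dataclasses import dataclass, field

@dataclass
class UserIntent:                 # what the user wants, this turn
    goal: str

@dataclass
class SystemState:                # persists across sessions
    documents: list = field(default_factory=list)

@dataclass
class ProcessStep:                # the computation being run
    name: str
    status: str = "pending"

def handle(intent: UserIntent, state: SystemState) -> ProcessStep:
    # State mutation is explicit and outlives the turn; the step is
    # inspectable on its own; the intent stays a read-only input.
    step = ProcessStep(name=intent.goal)
    state.documents.append(f"artifact for: {intent.goal}")
    step.status = "complete"
    return step

state = SystemState()
step = handle(UserIntent(goal="draft Q3 summary"), state)
```

In a chat interface all three of these live implicitly in one transcript; here each can be observed, persisted, and overseen on its own terms.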
Inevitable Equilibrium
The end-state that the easing is pointing toward isn't mysterious or hard to describe. The industry already knows the components and the experience has already largely been envisioned.
LLMs as cognitive components inside larger “conventional” computational architectures.
Transformers didn’t replace the architecture of computing; they introduced a new primitive inside it: probabilistic semantic inference. Everything else in the system still needs the determinism and structure software has always required. Explicit state management, persistent, structured, and queryable, living outside the model where it can actually be reasoned over. Orchestration with real computational power seated in the architecture itself, capable of handling nested contexts and accumulating records without depending on the model's internal representations to hold things together. A clear separation between the interface layer and the computational substrate. And human participation designed in as a first-class structural element, not just a validator on work that's already essentially complete but an integrated part of how the system operates.
State machines, formal semantics, structured data, computational hierarchy: these are solved problems. The industry knew how to build this before LLMs existed. The bubble is the temporary belief that transformers made it unnecessary. That natural language cognition was powerful enough to absorb everything, that the boring infrastructure could be replaced with a big enough model, a large enough context window, a clever enough prompt. The one domain where LLMs immediately delivered unquestionable durable value, coding assistance, implicitly relies on having this structure in place. Compilers, syntax, repositories, and tests provide a deterministic environment that constrains the model’s probabilistic reasoning.
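That constraint loop can be sketched in a few lines: the model proposes, the deterministic harness disposes. `propose_fix` is a hypothetical stand-in for a model call, returning canned candidates for illustration.

```python
# Hypothetical model outputs: the first guess is wrong, the second passes.
def propose_fix(attempt: int) -> str:
    return ["def add(a, b): return a - b",
            "def add(a, b): return a + b"][attempt]

def passes_tests(source: str) -> bool:
    ns = {}
    try:
        exec(source, ns)                 # the "compiler" gate
        return ns["add"](2, 3) == 5      # the test-suite gate
    except Exception:
        return False

accepted = None
for attempt in range(2):
    candidate = propose_fix(attempt)
    if passes_tests(candidate):          # deterministic constraint
        accepted = candidate
        break
```

The probabilistic component is free to be wrong; the deterministic environment filters its output before anything is committed. Coding got this environment for free. Most other domains have to build it.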
Transformers added something genuinely powerful: natural language cognition at a level that wasn't previously possible. But a new capability isn't a replacement architecture, nor should it be. Structured representation isn't over. It never was. The 95% failure rate is the expensive forcing function that will make the industry rediscover that the hard way. What that equilibrium looks like as an interaction paradigm is Part II.