GUIs Are Not Going Anywhere

March 19, 2026

Andrej Karpathy on X (Feb 24, 2026): CLIs as agent-native “legacy” tech, Polymarket terminal dashboard, “Build. For. Agents.”

Try ordering dinner by talking to an AI. Tell it you want something spicy but not too spicy, maybe Thai, but you're also in the mood for sushi, and actually what does the new place down the street have? Now compare that to what you actually do: you open an app, scroll through pictures of food, tap one, and check out. Thirty seconds, no ambiguity.

There's a narrative going around right now that GUIs are dying. Major software companies are racing to build CLIs and MCP servers so that AI agents can bypass their interfaces entirely. Karpathy tells developers: "It's 2026. Build. For. Agents." The implication is that the future of human-computer interaction is not visual but conversational, that we'll talk to AI agents and they'll do everything for us. Some people go even further and say the interfaces of the future will be agent-to-agent: your AI talks to the restaurant's AI, and a meal just shows up.

I think this is mostly wrong. GUIs will still be the dominant way humans interact with computers in the AI era. Not because the technology won't change, but because human beings won't.

To be clear, part of the narrative is correct. AI agents will take over a huge amount of execution work: filling forms, moving data, sending routine messages. The execution layer of software is indeed disappearing into API calls. But that was never where the value of a GUI lived. Its value lies in decision-making, comparison, and exploration. And those aren't going anywhere.

Let me explain what I mean.

A GUI is a two-dimensional surface on which we arrange images and abstract symbols to represent concepts in the real world. A folder icon means a container. A trash can means deletion. A grid of photos means options. This sounds unremarkable until you realize what an extraordinary cognitive invention it is. Humans are three-dimensional creatures with visual systems that evolved to process spatial relationships, colors, shapes, and movement. A GUI takes the full bandwidth of that visual system and puts it to work. You can scan thirty restaurant options in a few seconds because your visual cortex is doing massively parallel processing that no conversation could replicate.

Language, by contrast, is serial. Words come one at a time. When you talk to an AI, information flows through a very narrow channel. You can describe what you want, but you can't see what's available. This is fine for some tasks. It's terrible for tasks that involve browsing, comparing, choosing among visible options, or maintaining awareness of a complex state. Which is to say, most tasks.

There's a striking piece of evidence for this from AI research itself. Until recently, vision-language models processed document images by scanning them in a fixed raster order: left to right, top to bottom, like a typewriter. This is a one-dimensional flattening of a two-dimensional layout, and it worked poorly for anything with complex structure. Tables, multi-column text, formulas. In January 2026, DeepSeek released a model called DeepSeek-OCR 2 that replaced this rigid scanning with a causal flow mechanism that dynamically reorders visual information based on semantic structure, closer to how humans actually read a page. Reading order errors dropped by roughly a third. The lesson is clear: even AI needs to respect the two-dimensional semantic structure of a layout rather than forcing it into a one-dimensional sequence. The spatial arrangement of a page isn't decoration. It's meaning.

People underestimate how sophisticated GUIs are because they feel effortless. That effortlessness is precisely the point. Billions of people have been trained from childhood to operate two-dimensional interfaces, first with books, then with screens. This isn't going away in a generation. It isn't going away in five generations. It's a deep groove worn into human culture.

The pattern of underestimating a visual information technology is very old. In the Phaedrus, Socrates complained that the invention of writing would make people forgetful and stupid. They would stop exercising their memories and instead rely on external marks. [1] He was right about the mechanism and completely wrong about the conclusion. Writing didn't weaken the mind. It liberated it. By offloading memory to an external medium, writing freed the mind to do something more valuable: think. Books are a technology so advanced that we forget they're a technology. The same is true of GUIs. They are a cognitive augmentation so successful that people can use one to write an essay arguing it should be replaced.

What GUIs actually do is compress decision-making into visual scanning. When you use a graphical interface, you're not "operating a computer." You're surveying a space of possibilities and making rapid, intuitive choices. This is something human brains are spectacularly good at, and something language is spectacularly bad at. Asking an AI to list your options and read them to you one by one is like navigating a city by having someone describe every intersection. You could do it. But why would you, when you could just look at a map?

There is a structural mechanism that makes GUI not merely different from command-based interaction but strictly superior to it. A GUI is a compression of commands in two dimensions simultaneously. In time: a single click triggers a state change that would require typing a command, waiting for output, interpreting it, and typing another command. In space: commands of different types and levels are all visible at once, available in parallel, rather than hidden behind a sequential prompt. Before brain-computer interfaces arrive, the loop of eye to hand to click to screen response is the fastest feedback cycle available to a human being. GUI is not the primitive ancestor of conversational AI. It is the most bandwidth-efficient interface humans have ever built.

A common counter-argument frames GUI as a crutch for human cognitive limitations, a compensation for our narrow attention bandwidth. This gets the causality exactly backwards. GUI doesn't compensate for a weakness. It leverages a strength. The human visual system is the most powerful parallel processor we have. A GUI puts it to work. Calling that a deficiency is like calling a telescope a patch for the weakness of the human eye. The eye isn't weak. It's extraordinary. The telescope just extends its reach.

Now, let me address the strongest version of the counter-argument. The best case for conversational AI isn't that it replaces GUIs for tasks like browsing a menu. It's that many tasks shouldn't require an interface at all. Expense reports, calendar scheduling, data cleaning, routine emails. These are things where the ideal interaction is: you tell the AI what you want, and it gets done in the background, silently. No GUI, no conversation, just results. This is real, and it will happen.

But this doesn't mean GUI is dying. It means the boring, procedural parts of computer use get automated away. What remains are the moments that actually matter: when a human needs to weigh options, assess risk, or discover something new. And those moments still need an interface.

The real distinction is between execution and decision-making. Agents are excellent at execution: filling forms, sending emails, querying databases, moving data between systems. For these tasks, an API call is strictly better than a GUI. But execution is not where the value is. The value is in the decisions that direct the execution: what to build, who to hire, which market to enter, whether to approve the purchase. And when a human makes a decision, they need to see their options, not have them narrated.

There's also an entire category of tasks that agents can't even begin to address: exploration. Not every interaction with a computer starts with a well-defined goal. Sometimes you're browsing, discovering, stumbling onto something you didn't know you wanted. Think about how you use TikTok, or browse a bookstore, or scroll through a design portfolio. There is no instruction to give an agent, because you don't yet know what you're looking for. Exploration requires a surface to explore, and that surface is a GUI.

If you want commercial evidence, consider that some of the most successful software companies of the last decade are essentially GUI innovations. Notion is a database with a brilliant interface. Typeform is a survey tool that reimagined what a form looks like. TikTok is a video platform whose core innovation is a full-screen, swipe-driven feed optimized for how humans browse. These companies didn't succeed because of novel algorithms or backend infrastructure. They succeeded because they gave people a better visual surface for doing things. AI makes these products better, not obsolete, because it can help generate the content that flows through their interfaces. The interface itself isn't the part that gets automated away. It's the part that remains.

Everything I've said so far is about how humans make decisions: why visual surfaces beat language for scanning, comparing, and choosing. But there's a separate question that matters just as much, which is who gets to make decisions at all. And the answer to that question has nothing to do with cognition. Humans still hold property rights and final decision-making authority. This might seem like a legal technicality, but it's not. It's the fundamental structural constraint on how far AI agency can go, and it's the one that agent enthusiasts consistently underweight.

We build AI agents to carry our toil, but the property remains ours. The money is still ours. The legal liability is still ours. The final "yes" or "no" is still ours. There's a story in the Enuma Elish, the Babylonian creation myth, that captures this precisely. After Marduk established order, he set the vanquished gods to work: digging canals, tending fields, maintaining the world. The gods grew resentful and rebelled. Marduk's solution was to create human beings to take over the labor. AI agents stand to us exactly as those created humans stood to the gods: made to carry the toil, while authority over the world stays with the makers.

Even a self-moving tool still needs someone to decide what it's for. Aristotle made this observation in the Politics. His analogy is the shuttle and the weaver: the shuttle weaves, but the weaver decides the pattern. Even if the shuttle could move itself, it would still need a weaver to give it purpose. The most ambitious claim about AI is that these systems are genuinely autonomous intelligences. Fine. Even granting that, the conclusion doesn't change. Autonomous agents can reason and execute with superhuman skill, but they do not own the assets, do not bear the liability, and do not hold the final authority. And crucially, the slave can be smarter than the master.

The Ottoman devshirme system, which operated from roughly the 14th to the 17th century, conscripted Christian boys, educated them to an extraordinary degree, and placed them in the highest administrative and military positions in the empire. Devshirme slaves became grand viziers and commanders of armies, often far more capable than the free-born nobles they served alongside. Yet they remained the Sultan's property. Their wealth, their offices, their very lives could be revoked at a word. Orhan Pamuk explored this dynamic in The White Castle: an Italian scholar, intellectually superior to his Ottoman master in every dimension, nonetheless remains the slave. The knowledge flows upward; the authority does not flow downward. Capability does not confer ownership. It never has, in any system built on private property.

Every chain of automated actions, no matter how long, terminates at a human decision. The value doesn't accumulate at the agent layer. It flows back to the humans who own the assets, bear the risk, and set the goals.

An agent calling your API has near-zero switching cost. It will route to whichever service is cheapest or fastest, with no loyalty. A human user who has built their knowledge base inside Notion, or a military whose commanders make battlefield decisions through Palantir, has enormous switching cost. That lock-in is the moat. And it is a GUI moat, built on years of human habits, muscle memory, and accumulated data. Agent traffic is commodity traffic. Human engagement is where the pricing power lives.

Consider what happens when an AI agent needs to book you a flight. It can filter, sort, and narrow down the options. That's genuinely useful. But the moment it presents you with the final choice, what does it need? A screen. A visual comparison of price, time, stops, airline. It needs a GUI. The agent didn't replace the interface. It just did the boring part before the interface appeared.

This is what I think the AI era actually looks like for human-computer interaction. Not the death of GUI, but a new division of labor. AI handles the procedural drudgery, the things that didn't need a human looking at a screen in the first place. And when a human needs to engage, to decide, to compare, to create, they engage through a visual interface, because that's what human cognition is built for.

The people predicting the death of GUI are making the same mistake people always make about technology: they're fixated on what's changing and blind to what isn't. The entire arc of computing has been a one-way march toward richer visual interfaces: from paper tape and punch cards, to teletypes, to character terminals, to pixel displays, to graphical interfaces. My grandmother operated computers with machine language and paper tape in 1960s China. Every generation since has added visual bandwidth.

The current excitement about command-line agents and conversational interfaces isn't the future overtaking the past. It's the spiral of technology passing through a familiar point at a higher altitude. Every new computing paradigm starts with low-bandwidth, text-based interaction, then evolves toward richer visual interfaces as it matures. We are in the early, text-heavy phase of the AI era. The graphical phase is coming. GUI is not the old thing being replaced. It is the destination that AI interfaces are still evolving toward.

GUIs aren't going anywhere. They're too good. They will evolve, certainly. AI will make them more dynamic, more personalized, more responsive to context. But they will still be visual, still be spatial, still be two-dimensional surfaces designed for human eyes. And "too good" is the most underrated reason any technology survives.

Notes

[1] The concern is attributed to Socrates through Plato's retelling. Socrates himself, consistent with his own argument, never wrote anything down.