How Does GPT-5 Work?

Edward Zitron

Welcome to another premium edition of Where's Your Ed At! Please subscribe to it so I can continue to drink 80 Diet Cokes a day. Email me at ez@betteroffline.com with the subject "premium" if you ever want to chat. I realize this is before the paywall, so if you email me without paying, no promises I don't respond with the lyrics to Cheeseburger In Paradise.

Also: this is an open call — if you've tried prompt caching with GPT-5 on OpenAI's API, please reach out!


You've probably heard a lot about GPT-5 this week, with takes ranging from "it's just good at stuff" to SemiAnalysis' wild statement that "GPT-5 [is setting] the stage for Ad Monetization and the SuperApp," a piece that makes several assertions about how the "router" that underpins GPT-5 is somehow the secret way that OpenAI will inject ads.

Here's a quote:

Before the router, there was no way for a query to be distinguished, and after the router, the first low-value query could be routed to a GPT5 mini model that can answer with 0 tool calls and no reasoning. This likely means serving this user is approaching the cost of a search query.

This...did not make a ton of sense to me. Why would this be the case? The article also makes a lot of claims about the "value" of a question and how ChatGPT could — I am serious — "agentically reach out to lawyers" based on a query.

In fact, I'm not sure this piece reflects how GPT-5 works at all.

Another quote from the piece:

The router serves multiple purposes on both the cost and performance side. On the cost side, routing users to mini versions of each model allows OpenAI to service users with lower costs.

To be fair on SemiAnalysis, it's not as if OpenAI gave them much help.

Here's what OpenAI's announcement says:

GPT‑5 is a unified system with a smart, efficient model that answers most questions, a deeper reasoning model (GPT‑5 thinking) for harder problems, and a real‑time router that quickly decides which to use based on conversation type, complexity, tool needs, and your explicit intent (for example, if you say “think hard about this” in the prompt). The router is continuously trained on real signals, including when users switch models, preference rates for responses, and measured correctness, improving over time. Once usage limits are reached, a mini version of each model handles remaining queries. In the near future, we plan to integrate these capabilities into a single model.

There is a really, really important distinction to make here: the GPT-5 described above is GPT-5 as part of ChatGPT. OpenAI's API-based access to the GPT-5 models involves no routing at all, and OpenAI does not offer API access to its router or any of the other associated models.

How do I know this? Because I went and found out how ChatGPT-5 actually works.

Sidebar: From here on out, I will define two things: GPT-5, referring to the model (and its associated mini and nano models), and ChatGPT-5, referring to the current state of ChatGPT, which features "Auto," "Fast," "Thinking," and "Thinking Mini" options.

Based on discussions with a source at an infrastructure provider familiar with the architecture, it appears that ChatGPT-5 is, in fact, potentially more expensive to run than previous models, and that, due to the complex and chaotic nature of its architecture, it can at times burn upwards of double the tokens per query.

ChatGPT-5 is also significantly more convoluted, plagued by latency issues, and more compute-intensive, all thanks to OpenAI's new "smart, efficient" model.

In simple terms, every user prompt on ChatGPT — whether it's on the "Auto," "Fast," "Thinking Mini" or "Thinking" setting — starts by putting the user's prompt before the "static prompt," a hidden prompt containing instructions like "You are ChatGPT, you are a Large Language Model, You Are A Helpful Chatbot" and so on. These static prompts are different for each model you use: a reasoning model will have a different instruction set than a more chat-focused one, with instructions like "think hard about a particular problem before giving an answer."

This becomes an issue when you use multiple different models in the same conversation, because the router — the thing that selects the right model for the request — has to look at the user prompt before it knows which model, and therefore which static prompt, the request needs. It can't consider the static instructions first, because it doesn't yet know which set of static instructions applies. The order has to be flipped for the whole thing to work.

Put more simply: previous versions of ChatGPT would take the static prompt, and then (invisibly) append the user prompt onto it. ChatGPT-5 can't do that.
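To make that concrete, here's a toy sketch of the ordering change. None of this is OpenAI's actual code: the model names, the static prompts, and the pick_model() router below are all made up, and the real router is likely an LLM rather than a keyword check.

```python
# A toy sketch of the ordering change, not OpenAI's actual code. The model
# names, static prompts, and pick_model() router are all hypothetical.

STATIC_PROMPTS = {
    "chat": "You are ChatGPT, you are a Large Language Model, You Are A Helpful Chatbot.",
    "reasoning": "You are ChatGPT. Think hard about a particular problem before giving an answer.",
}

def pick_model(user_prompt: str) -> str:
    # Stand-in for the router, which is reportedly another model. Either way,
    # it can only decide after it has read the user's prompt.
    return "reasoning" if "be detailed" in user_prompt.lower() else "chat"

def old_chatgpt(user_prompt: str) -> str:
    # Pre-GPT-5: one model, so the static prompt always leads, and every
    # request starts with the exact same text.
    return STATIC_PROMPTS["chat"] + "\n\n" + user_prompt

def chatgpt_5(user_prompt: str) -> str:
    # ChatGPT-5 (per this piece): the user's prompt has to come first so the
    # router can read it; the static prompt for whichever model got picked
    # follows, and it's different depending on the routing decision.
    model = pick_model(user_prompt)
    return user_prompt + "\n\n" + STATIC_PROMPTS[model]
```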

Every time you use ChatGPT-5, every single thing you say or do can cause it to do something different. Attach a file? Might need a different model. Ask it to "look into something and be detailed?" Might trigger a reasoning model. Ask a question in a weird way? Sorry, the router's gonna need to send you to a different model. 

Every single thing that can happen when you ask ChatGPT to do something may trigger the "router" to change model or request a new tool, and each time it does, it requires a completely fresh static prompt, regardless of whether you select Auto, Thinking, Fast or any other option. This, in turn, requires it to expend more compute, with queries consuming more tokens than they did in previous versions.
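Continuing the toy sketch above (same hypothetical pick_model() and STATIC_PROMPTS), the per-turn cost looks something like this. This is a guess at the shape of the system, not its implementation:

```python
# Continuing the toy sketch: if the router runs on every turn, the static
# prompt gets re-sent whenever the chosen model changes, and none of it can
# be reused from the previous turn.

def run_conversation(turns: list[str]) -> int:
    tokens_spent = 0
    history: list[str] = []
    for user_prompt in turns:
        model = pick_model(user_prompt)       # the router runs on every turn
        static = STATIC_PROMPTS[model]        # a fresh static prompt each time
        # The model sees the conversation so far, the new prompt, and then the
        # static prompt for whichever model the router picked.
        context = "\n\n".join(history + [user_prompt, static])
        tokens_spent += len(context.split())  # word count as a crude token proxy
        history.append(user_prompt)
    return tokens_spent

print(run_conversation(["What is a router?", "Look into this and be detailed."]))
```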

As a result, ChatGPT-5 may be "smart," but it sure doesn't seem "efficient." To play Devil's Advocate, OpenAI likely added the routing model as a means of creating more sophisticated outputs for users, and, I imagine, with the intention of cost-saving. Then again, this may just be the thing that it had ready to ship — after all, GPT-5 was meant to be "the next great leap in AI," and the pressure was on to get it out the door.

By creating a system that depends on an external routing model — likely another LLM — OpenAI has removed the ability to cache the hidden instructions that dictate how the models generate answers in ChatGPT, creating massive infrastructural overhead. Prompt caching works by reusing the identical opening stretch of a prompt across requests; put a different user message first every time, and there's no identical opening stretch left to reuse.
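To show why, here's a toy prefix cache. Real prompt caching (including the flavor OpenAI offers on its API) matches on the leading tokens of a request; this one matches on leading characters, which is enough to illustrate the problem:

```python
# A toy prefix cache. Real prompt caches match on leading tokens; this one
# matches on leading characters, which is enough to show why putting the
# user's prompt first defeats caching.

cache: set[str] = set()

def cache_hit(prompt: str, prefix_len: int = 40) -> bool:
    # Cached work is only reusable if the opening chunk of this request is
    # identical to the opening chunk of an earlier one.
    key = prompt[:prefix_len]
    hit = key in cache
    cache.add(key)
    return hit

static = "You are ChatGPT, you are a Large Language Model, You Are A Helpful Chatbot."

# Old ordering: static prompt first, so every request shares a prefix.
cache_hit(static + "\n\nWhat is a router?")       # miss: first request
print(cache_hit(static + "\n\nWrite me a poem"))  # True: same static prefix

# New ordering: user prompt first, so no two requests share a prefix.
cache.clear()
cache_hit("What is a router?\n\n" + static)       # miss
print(cache_hit("Write me a poem\n\n" + static))  # False: prefixes differ
```

Flip the order and every request becomes a cache miss, which means paying full freight to reprocess the static prompt every single time.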

Worse still, this happens with every single "turn" (i.e., message) on ChatGPT-5, regardless of the model you choose, creating endless infrastructural baggage with no real way out, overhead that only compounds as a user's queries get more complex.

Could OpenAI make a better router? Sure! Does it have a good router today? I don't think so! Every time you message ChatGPT, it has the potential to change model or tooling based on its own whims, each time requiring a fresh static prompt.

A user doesn't even need to ask ChatGPT-5 to "think" for this to happen: based on my tests with GPT-5, sometimes just asking it a four-word question can trigger it to "think longer" for no apparent reason.

OpenAI has created a product with latency issues and an overwhelmingly convoluted routing system that's already straining capacity, to the point that this announcement feels like OpenAI is walking away from its API entirely. Unlike the GPT-4o announcement, which mentions the API in the first paragraph, the GPT-5 announcement has no reference to it, and a single reference to developers when talking about coding. Sam Altman has already hinted that he intends to deprecate any "new API demand" — though I imagine he'll let anyone in who will pay for priority processing.

ChatGPT-5 feels like the ultimate comeuppance for a company that was never forced to build a product, choosing instead to bolt increasingly complex "tools" onto the sides of its models in the hopes that one would magically appear.

Now each and every "feature" of ChatGPT burns even more money than it did before. 

ChatGPT-5 feels like a product that was rushed to market by a desperate company that had to get something out the door.

In simpler terms, OpenAI gave ChatGPT a middle manager. 
