GPT-5 Takeoff Encounters Turbulence: OpenAI's new model draws complaints over cost, performance, and its API

OpenAI launched GPT-5, the highly anticipated successor to its groundbreaking series of large language models, but glitches in the rollout left many early users disappointed and frustrated.

Bar charts comparing GPT-5’s speed, price, and intelligence vs. top AI models like Claude, Gemini, and Grok in 2025 benchmarks.

What’s new: Rather than a family of models, GPT-5 is a family of systems — GPT-5, GPT-5 Mini, GPT-5 Nano, and GPT-5 Pro — that include non-reasoning and variable-reasoning models along with a router that switches between them automatically depending on the input. OpenAI made GPT-5 the only option in the ChatGPT user interface without prior notice, but the router failed right out of the gate, causing the company to reinstate ChatGPT access to earlier models for paid users.

  • Input/output: Text and images in (up to 272,000 tokens), text out (up to 128,000 tokens including reasoning and response, 122 tokens per second, 72 seconds to first token)
  • Performance: Outperforms previous OpenAI models on most benchmarks reported; tops competing models on some benchmarks of math, coding, and multimodal abilities as well as health knowledge; reduced hallucinations
  • Features: Developer options include four levels of reasoning, three levels of verbosity (output length), tool calling via JSON or natural language, selectable non-reasoning and reasoning models, summaries of reasoning tokens
  • Availability/price: Via API, GPT-5 $1.25/$0.13/$10 per million input/cached/output tokens, GPT-5 Mini $0.25/$0.025/$2 per million input/cached/output tokens, GPT-5 Nano $0.05/$0.005/$0.40 per million input/cached/output tokens; via ChatGPT, free limited access; via ChatGPT Pro, $200/month for unlimited access to GPT-5 and GPT-5 Pro
  • Knowledge cutoff: September 30, 2024 (GPT-5), May 30, 2024 (GPT-5 Mini, GPT-5 Nano)
  • Undisclosed: Model, router, and system architectures; training methods and data
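The developer options and prices above can be sketched in code. This is an illustrative sketch only: the parameter names (`reasoning_effort`, `verbosity`) follow OpenAI's announced options and the prices come from the bullets above, but treat the exact field names and cost arithmetic as assumptions to confirm against OpenAI's current API reference.

```python
# Illustrative sketch only: parameter names and prices are taken from the
# bullets above; confirm against OpenAI's current API documentation.

def build_request(prompt: str, effort: str = "medium",
                  verbosity: str = "medium") -> dict:
    """Assemble a Chat Completions-style payload for GPT-5."""
    assert effort in {"minimal", "low", "medium", "high"}   # four reasoning levels
    assert verbosity in {"low", "medium", "high"}           # three verbosity levels
    return {
        "model": "gpt-5",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
        "verbosity": verbosity,
    }

def estimate_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                  prices: tuple = (1.25, 0.13, 10.00)) -> float:
    """Dollar cost at GPT-5's listed per-million-token prices."""
    p_in, p_cached, p_out = prices
    return (input_tokens * p_in + cached_tokens * p_cached
            + output_tokens * p_out) / 1_000_000

req = build_request("Summarize this bug report.", effort="high", verbosity="low")
print(req["model"], req["reasoning_effort"])        # gpt-5 high
print(round(estimate_cost(100_000, 0, 10_000), 3))  # 0.225
```

At these prices, a 100,000-token input with a 10,000-token output costs about 22.5 cents on GPT-5 but only 4.5 cents on GPT-5 Mini, which is why the router's fallback to mini models matters for heavy users.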

How it works: OpenAI revealed few details about GPT-5’s architecture and training except “safe completions” fine-tuning to balance safety and helpfulness, which is documented in a paper.

  • The router selects between non-reasoning and reasoning models based on input “type,” “complexity,” tool requirements, and explicit user intent (such as a prompt to “think hard”). The router learns from user behavior. When ChatGPT users reach usage limits, the router directs queries to mini versions of each model.
  • The team trained the models on web content, licensed data, and human- and machine-generated data. They fine-tuned them to reason via reinforcement learning.
  • In addition, they fine-tuned the models to prefer helpful but “safe” answers over refusals to answer, an approach the team calls safe completions. Given a potentially problematic input, a model aims to respond usefully while staying within safety guidelines, explains when it must refuse, and suggests related outputs that don’t touch on topics it has been trained to avoid.
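The routing behavior described above can be summarized in a short sketch. OpenAI has not disclosed the router's architecture, so the signals, thresholds, and model names below (`gpt-5-main`, `gpt-5-thinking`, and their mini fallbacks) are hypothetical illustrations of the published description, not the actual implementation.

```python
# Hypothetical sketch of the described routing logic; OpenAI has not
# disclosed the real router, and these names and rules are illustrative.

def route(query: str, explicit_intent: bool = False, needs_tools: bool = False,
          complexity: float = 0.0, over_usage_limit: bool = False) -> str:
    """Pick a GPT-5 family model from the routing signals described above."""
    # Explicit user intent (e.g. a prompt to "think hard"), tool
    # requirements, or high estimated complexity route the query to
    # the reasoning model; everything else goes to the fast model.
    if explicit_intent or needs_tools or complexity > 0.5:
        model = "gpt-5-thinking"
    else:
        model = "gpt-5-main"
    # When a user hits usage limits, fall back to the mini version.
    if over_usage_limit:
        model += "-mini"
    return model

print(route("What's the capital of France?"))          # gpt-5-main
print(route("Prove this lemma.", explicit_intent=True))  # gpt-5-thinking
```

In the real system, the complexity signal would itself be learned from user behavior rather than hand-set as a threshold, per OpenAI's statement that the router learns from usage.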

Results: GPT-5 topped some benchmarks according to OpenAI's evaluations. However, it fell short of competing models on some measures of abstract reasoning in independent tests.

  • On SWE-bench (software engineering tasks), GPT-5 (74.9 percent accuracy) outperformed Claude Opus 4.1 (74.5 percent accuracy).
  • On AIME 2025 (competition math problems), GPT-5 set to high reasoning without tools (94.6 percent accuracy) surpassed o3 set to high reasoning (88.9 percent).
  • On the EQ-Bench Creative Writing v3 benchmark, GPT-5 with an unspecified reasoning level (90.30) outperformed o3 (87.65), Gemini-2.5-pro (86.00), and Claude Opus 4 (83.75).
  • On Artificial Analysis’s Intelligence Index, a weighted average of 10 benchmarks, GPT-5 set to either high or medium reasoning exceeded all other models tested, followed by xAI Grok 4 and OpenAI o3. However, it fared worse on benchmarks of abstract reasoning without tool use. For instance, on ARC-AGI-1 and ARC-AGI-2 (visual puzzles), GPT-5 with high reasoning (65.7 percent and 9.9 percent respectively) underperformed Grok 4 Thinking (66.7 percent and 16 percent respectively).

Behind the news: Launched in March 2023, GPT-4 raised the bar for vision-language performance, and anticipation of the next version grew steadily over the two years since. In December 2024, The Wall Street Journal reported that GPT-5 was delayed as the scale of the project stretched OpenAI’s computational limits. In a mid-February 2025 post on the X social network, OpenAI CEO Sam Altman offered GPT-4.5 as a stopgap and outlined the improvements expected with GPT-5. But in April, he said GPT-5 would be delayed further and launched o3 and o4-mini, whose performance once again topped leaderboards. GPT-5’s August 7 debut brought an end to the long wait, but misleading performance graphs, rate limits, and the malfunctioning router marred the event, while the unexpected deprecation of earlier models in ChatGPT hamstrung many users.

Why it matters: OpenAI models have consistently topped language benchmarks. With GPT-5, the company has launched a system architecture that integrates its best models and takes advantage of the strengths of each: rapid output, slower output with adjustable computation devoted to reasoning, and graceful degradation to smaller versions.

We’re thinking: Novices may find that the GPT-5 router’s ability to choose a model for any given input simplifies things, but it remains to be seen whether expert users, who may be better at selecting the appropriate model for their tasks, will be happy to give up this control.