Hacker News Viewer

GPT-5.5

by rd on 4/23/2026, 6:01:39 PM

https://openai.com/index/introducing-gpt-5-5/

Comments

by: tedsanders

Just as a heads up, even though GPT-5.5 is releasing today, the rollout in ChatGPT and Codex will be gradual over many hours so that we can make sure service remains stable for everyone (same as our previous launches). You may not see it right away, and if you don't, try again later in the day. We usually start with Pro/Enterprise accounts and then work our way down to Plus. We know it's slightly annoying to have to wait a random amount of time, but we do it this way to keep service maximally stable.

(I work at OpenAI.)

4/23/2026, 6:13:31 PM


by: _alternator_

> One engineer at NVIDIA who had early access to the model went as far as to say: "Losing access to GPT-5.5 feels like I've had a limb amputated."

This quote is more sinister than I think was intended; it likely applies to all frontier coding models. As they get better, we quickly come to rely on them for coding. It's like playing a game on God Mode. Engineers become dependent; it's truly addictive.

This matches my own experience and unease with these tools. I don't really have the patience to write code anymore because I can one-shot it with frontier models 10x faster. My role has shifted, and while it's awesome to get so much working so quickly, the fact is, when the tokens run out, I'm basically done working.

It's literally higher leverage for me to go for a walk if Claude goes down than to write code: if I come back refreshed and Claude is working an hour later, I'll make more progress than I would by mentally wearing myself out reading a bunch of LLM-generated code trying to figure out how to solve the problem manually.

Anyway, it continues to make me uneasy, is all I'm saying.

4/23/2026, 8:23:32 PM


by: simonw

This doesn't have API access yet, but OpenAI seem to approve of the Codex API backdoor used by OpenClaw these days... https://twitter.com/steipete/status/2046775849769148838 and https://twitter.com/romainhuet/status/2038699202834841962

And that backdoor API has GPT-5.5.

So here's a pelican: https://simonwillison.net/2026/Apr/23/gpt-5-5/#and-some-pelicans

I used this new plugin for LLM: https://github.com/simonw/llm-openai-via-codex

UPDATE: I got a *much better* pelican by setting the reasoning effort to xhigh: https://gist.github.com/simonw/a6168e4165a258e4d664aeae8e602cc5?permalink_comment_id=6115759#gistcomment-6115759

4/23/2026, 7:24:57 PM


by: Someone1234

I'd like to draw people's attention to this section of this page:

https://developers.openai.com/codex/pricing?codex-usage-limits=plus#what-are-the-usage-limits-for-my-plan

Note the Local Messages between 5.3, 5.4, and 5.5. And, yes, I did read the linked article and know they're claiming that 5.5's new efficiency should make it break even with 5.4, but the point stands: tighter limits / higher prices.

4/23/2026, 6:30:27 PM


by: jfkimmes

Everyone talked about the marketing stunt that was Anthropic's gated Mythos model with an 83% result on CyberGym. OpenAI just dropped GPT-5.5, which scores 82% and is open for anybody to use.

I recommend that anybody in offensive/defensive cybersecurity experiment with this. This is the real data point we needed - without the hype!

Never thought I'd say this, but OpenAI is the 'open' option again.

4/23/2026, 6:55:44 PM


by: astlouis44

*A playable 3D dungeon arena prototype built with Codex and GPT models. Codex handled the game architecture, TypeScript/Three.js implementation, combat systems, enemy encounters, HUD feedback, and GPT-generated environment textures. Character models, character textures, and animations were created with third-party asset-generation tools*

The game that this prompt generated looks pretty decent visually. A big part of this is likely due to the fact that the meshes were created using a separate tool (probably Meshy, Tripo.ai, or similar) and not generated by 5.5 itself.

It really seems like we could be at the dawn of a new era similar to Flash, where any gamer or hobbyist can generate game concepts quickly and instantly publish them to the web. Three.js in particular is really picking up as the primary way to design games with AI, in spite of the fact that it's not even a game engine, just a web rendering library.

4/23/2026, 6:10:17 PM


by: silvertaza

Still a huge hallucination rate, unfortunately: 86%. To compare, Opus sits at 36%.

Source: https://artificialanalysis.ai/models?omniscience=omniscience-hallucination-rate#aa-omniscience-hallucination-rate

4/23/2026, 7:57:41 PM


by: minimaxir

The more interesting part of the announcement than "it's better at benchmarks":

> To better utilize GPUs, Codex analyzed weeks' worth of production traffic patterns and wrote custom heuristic algorithms to optimally partition and balance work. The effort had an outsized impact, increasing token generation speeds by over 20%.

The ability of agentic LLMs to improve computational efficiency/speed is a highly impactful domain I wish were tested more thoroughly than just with benchmarks. From my experience Opus is still much better than GPT/Codex in this aspect, but given that OpenAI is getting material gains out of this type of performancemaxxing, and has an increasing incentive to keep doing so given cost/capacity issues, I wonder if OpenAI will continue optimizing for it.

4/23/2026, 6:08:19 PM


by: 6thbit

                          Mythos    5.5
    SWE-bench Pro         77.8%*    58.6%
    Terminal-bench-2.0    82.0%     82.7%*
    GPQA Diamond          94.6%*    93.6%
    H. Last Exam          56.8%*    41.4%
    H. Last Exam (tools)  64.7%*    52.2%
    BrowseComp            86.9%     84.4% (90.1% Pro)*
    OSWorld-Verified      79.6%*    78.7%

Still far from Mythos on SWE-bench but quite comparable otherwise. Source for Mythos values: https://www.anthropic.com/glasswing

4/23/2026, 7:19:31 PM


by: M4R5H4LL

I am a heavy Claude Code user. I just tried using Codex with 5.4 (as a Plus user I don't have access to 5.5 yet), and it was quite underwhelming. The app stopped regularly, much earlier than I wanted. It also claimed to have fixed issues when it did not; this is not unique to GPT, and Opus has similar issues, but Claude will not make the same mistake three times in a row. It is unusable at the moment, while Claude allows me to get real work done on a daily basis. Until then...

4/23/2026, 8:17:47 PM


by: applfanboysbgon

If there's a bingo card for model releases, "our [superlative] and [superlative] model yet" is surely the free space.

4/23/2026, 6:07:17 PM


by: vthallam

This model is great at long-horizon tasks, and Codex now has heartbeats, so it can keep checking on things. Give it your hardest problem that would take hours, with verifiable constraints, and you will see how good this is. :)

*I work at OAI.

4/23/2026, 7:03:02 PM


by: aliljet

I've found myself so deeply embedded in the Claude Max subscription that I'm worried about potentially making a switch. How are people making sure they stay nimble enough not to get trapped in one company's ecosystem over another? For what it's worth, Opus 4.7 has not been a step up, and it's come with enormously higher usage of the subscription Anthropic offers, making the entire offering doubly worse.

4/23/2026, 7:12:42 PM


by: BrokenCogs

I'm here for the pelicans and I'm not leaving until I see one!

4/23/2026, 6:19:27 PM


by: mudkipdev

This is 3x the price of GPT-5.1, released just 6 months ago. Is no one else alarmed by the trend? What happens when the cheaper models are deprecated/removed over time?

4/23/2026, 7:11:58 PM


by: CompleteSkeptic

Is this the first time OpenAI has published comparisons to other labs?

Seems so to me - see the GPT-5.4 [1] and 5.2 [2] announcements.

Might be a tacit admission of being behind.

[1] https://openai.com/index/introducing-gpt-5-4/
[2] https://openai.com/index/introducing-gpt-5-2/

4/23/2026, 7:23:21 PM


by: h14h

This seems huge for subscription customers. Looking at the Artificial Analysis numbers, 5.5 at medium effort yields roughly the same intelligence as 5.4 (xhigh) while using less than a fifth of the tokens.

As long as tokens count roughly equally towards subscription plan usage between 5.5 & 5.4, you can look at this as effectively a 5x increase in usage limits.
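The arithmetic behind that "5x" claim, as a quick sketch - assuming "less than a fifth the tokens" is treated as exactly one fifth, and that plan usage is metered per token (both assumptions, not figures from the announcement):

```python
# Normalize 5.4 (xhigh) token usage to 1.0 for comparison.
tokens_54_xhigh = 1.0
# "Less than a fifth the tokens" -- assumed here to be exactly 1/5.
tokens_55_medium = tokens_54_xhigh / 5

# If a subscription meters usage per token, the same budget that covered
# one 5.4 (xhigh) task now covers this many 5.5 (medium) tasks:
usage_multiplier = tokens_54_xhigh / tokens_55_medium
print(usage_multiplier)  # → 5.0
```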

4/23/2026, 6:47:40 PM


by: kburman

What a time. I am back here genuinely wishing for OpenAI to release a great model, because without stiff competition, it feels like Anthropic has completely lost its mind.

4/23/2026, 8:15:26 PM


by: baxuz

Ah yes, the next "trust me bro"

4/23/2026, 8:28:10 PM


by: gallerdude

If GPT-5.5 Pro really was Spud, and two years of pretraining culminated in one release, WOW, you cannot feel it at all from this announcement. If OpenAI wants to know why it feels like they've fallen behind the vibes of Anthropic, they need to look no further than their marketing department. This makes everything feel like a completely linear upgrade in every way.

4/23/2026, 6:41:45 PM


by: jryio

Their 'Preparedness Framework' [1] is 20 pages and looks ChatGPT-generated; I don't feel prepared reading it.

[1] https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf

4/23/2026, 6:12:11 PM


by: ativzzz

I like that they waited for Opus 4.7 to come out first so they had a few days to find the benchmarks that GPT-5.5 is better at

4/23/2026, 6:08:49 PM


by: NitpickLawyer

> Across all three evals, GPT-5.5 improves on GPT-5.4's scores while using fewer tokens.

Yeah, this was the next step. Have RLVR make the model good. Next iteration, start penalising long + correct and rewarding short + correct.

> CyberGym 81.8%

Mythos was self-reported at 83.1%... so not far. Also, it seems they're going the same route with verification. We're entering the era where SotA will only be available after KYC, it seems.

4/23/2026, 6:53:40 PM


by: 2001zhaozhao

Pricing: $5/1M input, $30/1M output

(same input price as, and 20% higher output price than, Opus 4.7)

4/23/2026, 6:23:07 PM


by: Rapzid

In Copilot, where it's easy to switch models, Opus 4.6 was still providing, IMHO, better stock results than GPT-5.4.

Particularly in areas outside straight coding tasks: analysis, planning, etc. Better and more thorough output. Better use of formatting options (tables, diagrams, etc.).

I'm hoping to see improvements in this area with 5.5.

4/23/2026, 7:52:53 PM


by: losvedir

> It excels at ... researching online

How does this work exactly? Is there a "search online" tool that the harness is expected to provide? Or does the OpenAI infra do that as part of serving the response?

I've been working on building my own agent, just for fun, and I conceptually get using a command line, listing files, reading them, etc., but I'm sort of stumped on how I'm supposed to do the web search piece of it.

Given that they're calling out that this model is great at online research - to what extent is that a property of the model itself? I would have thought that was a harness concern.
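For context on the pattern the comment is asking about: typically the harness advertises a search tool in the request and executes it when the model emits a tool call, feeding the result back as the next message. A minimal sketch with a stubbed search backend - the names `web_search`, `run_search`, and `handle_tool_call` are illustrative, not any real OpenAI API:

```python
import json

# The tool is advertised to the model as a JSON schema alongside the prompt.
SEARCH_TOOL = {
    "type": "function",
    "name": "web_search",
    "description": "Search the web and return result URLs.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def run_search(query: str) -> str:
    """Harness-side implementation; a real one would call a search API."""
    # Stubbed index for illustration only.
    fake_index = {"gpt-5.5": ["openai.com/index/introducing-gpt-5-5"]}
    return json.dumps(fake_index.get(query.lower(), []))

def handle_tool_call(call: dict) -> dict:
    """When the model emits a tool call, the harness executes it and
    returns the output as a 'tool' message for the next turn of the loop."""
    args = json.loads(call["arguments"])
    return {"role": "tool", "name": call["name"], "content": run_search(args["query"])}

# Simulate the model asking for a search:
msg = handle_tool_call({"name": "web_search", "arguments": '{"query": "GPT-5.5"}'})
print(msg["content"])  # → ["openai.com/index/introducing-gpt-5-5"]
```

Under this design the search itself is entirely a harness concern; what "the model is great at online research" can mean is that it was trained to call such tools effectively - choosing good queries, iterating, and synthesizing results.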

4/23/2026, 6:33:23 PM


by: pants2

Labs still aren't publishing ARC-AGI-3 scores, even though it's been out for some time. Is it because the numbers are too embarrassing?

4/23/2026, 8:06:47 PM


by: sosodev

I hope the industry starts competing more on highest scores with lowest tokens like this. It's a win for everybody. It means the model is more intelligent, is more efficient to inference, and costs less for the end user.

So much bench-maxxing is just giving the model a ton of tokens so it can inefficiently explore the solution space.

4/23/2026, 6:30:33 PM


by: nickvec

I'm conflicted whether I should keep my Claude Max 5x subscription at this point and switch back to GPT/Codex... anyone else in a similar position? I'd rather not be paying for two AI providers and context switching between the two, though I'm having a hard time gauging if Claude Code is still the "cream of the crop" for SWE work. I haven't played around with Codex much.

4/23/2026, 7:41:40 PM


by: louiereederson

For a 56.7 score on the Artificial Analysis Intelligence Index, GPT-5.5 used 22M output tokens. For a score of 57, Opus 4.7 used 111M output tokens.

The efficiency gap is enormous. Maybe it's the difference between a GB200 NVL72 and an Amazon Trainium chip?

4/23/2026, 6:16:59 PM


by: baalimago

Worth the 100% price increase over GPT-5.4?

4/23/2026, 6:15:54 PM


by: meetpateltech

GPT-5.5 System Card:

https://deploymentsafety.openai.com/gpt-5-5

4/23/2026, 6:06:46 PM


by: ZeroCool2u

Benchmarks are favorable enough that they're comparing to non-OpenAI models again. Interesting that tokens/second is similar to 5.4. Maybe there's some genuine innovation beyond "bigger model better" this time?

4/23/2026, 6:07:38 PM


by: zerotosixty

Those who are using GPT-5.5: how does it compare to Opus 4.6 / 4.7 in terms of code generation?

4/23/2026, 8:09:59 PM


by: jdw64

GPT is really great, but I wish the GPT desktop app supported MCP as well.

You can kind of use connectors like MCP, but having to use ngrok every time just to expose a local filesystem for file editing is more cumbersome than expected.

4/23/2026, 6:10:54 PM


by: ace2pace

I hear it's as good as Opus 4.7.

The battle has just begun

4/23/2026, 8:18:19 PM


by: cscheid

I know this is irrelevant in the grand scheme of things, but that WebGL animation is really quite wrong. That is extra funny given the "ensure it has realistic orbital mechanics" phrase in the prompt.

I prescribe 20 hours of KSP to everyone involved; that'll set them right.

4/23/2026, 7:44:45 PM


by: vessenes

Yay. 5.4 was a frustrating model - moments of extreme intelligence (I liked it very much for code review) - but also a sort of idiocy/literalism that made it very unsuited to vague prompting. I also found its OpenClaw engagement wooden and frustrating. Which didn't matter until Anthropic started charging $150 a day for Opus for OpenClaw.

Anyway - these benchmarks look really good; I'm hopeful on the qualitative stuff.

4/23/2026, 6:24:28 PM


by: thimabi

Will we also see a GPT-5.5-Codex version of this model? Or will the same version of it be served both in the web app and in Codex?

4/23/2026, 6:26:14 PM


by: jumploops

> GPT-5.5 improves on GPT-5.4's scores while using fewer tokens.

This might be great if it translates to agentic engineering and not just benchmarks.

It seems some of the gains from Opus 4.6 to 4.7 required more tokens, not fewer.

Maybe more interesting is that they've used Codex to improve model inference latency. IIRC this is a new (expectedly larger) pretrain, so it's presumably slower to serve.

4/23/2026, 6:17:15 PM


by: benjx88

Good job on the release notice. I appreciate that it isn't just marketing fluff, but actually includes the technical specs for those of us who care, and isn't concentrated on coding agents only.

I hope GPT-5.5 Pro is not cutting corners and neutered from the start; you've got the compute for it not to be.

4/23/2026, 7:18:25 PM


by: GenerWork

Looking at the space/game/earthquake-tracker examples makes me hopeful that OpenAI is going to focus a bit more on interface visual development/integration from tools like Figma. This is one area where Anthropic definitely reigns supreme.

4/23/2026, 7:19:06 PM


by: AbuAssar

This is the first time OpenAI has included competing models in its benchmarks; previously it included only OpenAI models.

4/23/2026, 7:46:29 PM


by: nickandbro

Very impressive! Interesting how it seems to surpass Opus 4.7 on all other benchmarks except SWE-Bench Pro (Public). You would think that, doing so well at Cyber, it would naturally possess more ability there. Wonder what makes up the actual difference.

4/23/2026, 7:10:02 PM


by: extr

Seems like a continuation of the current meta where GPT models are better in GPT-like ways and Claude models are better in Claude-like ways, with the differences between each slightly narrowing with each generation. 5.5 is noticeably better to talk to, 4.7 is noticeably more precise. Etc etc.

4/23/2026, 6:50:23 PM


by: cchrist

Which is better GPT-5.5 or Opus 4.7? And for what tasks?

4/23/2026, 7:50:35 PM


by: impulser_

What is the reason behind OpenAI being able to release new models so fast?

Since February, when we got Gemini 3.1, Opus 4.6, and GPT-5.3-Codex, we have seen GPT-5.4 and GPT-5.5, but only Opus 4.7 and no new Gemini model.

Both of these are pretty decent improvements.

4/23/2026, 6:15:00 PM


by: nullbyte

82.7% on Terminal Bench is crazy

4/23/2026, 6:09:09 PM


by: I_am_tiberius

I'd really like to see improvements like these:
- Some technical proof that data is never read by OpenAI.
- Proof that no logs of my data or derived data are saved.
- etc.

4/23/2026, 6:24:51 PM


by: YmiYugy

So, according to the benchmarks, somewhere in between Opus 4.7 and Mythos

4/23/2026, 6:14:20 PM


by: k2xl

Surprised to see SWE-Bench Pro only a slight improvement (57.7% -> 58.6%) while Opus 4.7 hit 64.3%. I wonder what Anthropic is doing to achieve higher scores here - and also what makes this test particularly hard to do well on compared to Terminal Bench (where 5.5 seemed to have a big jump)

4/23/2026, 6:23:31 PM


by: jawiggins

What is the major and minor semver meaning for these models? Is each minor release a new fine-tuning with a new subset of example data while the major releases are made from scratch? Or do they even mean anything at this point?

4/23/2026, 7:58:39 PM


by: egorfine

> We are releasing GPT-5.5 with our strongest set of safeguards to date

...

> we're deploying stricter classifiers for potential cyber risk which some users may find annoying initially

So we should expect not to be able to check our own code for vulnerabilities, because the model inherently cannot know whether I'm feeding it my code or someone else's.

4/23/2026, 7:47:57 PM


by: bradley13

"our strongest set of safeguards to date"

How much capability is lost by hobbling models with a zillion protections against idiots?

Every prompt gets evaluated to ensure you are not a hacker, you are not suicidal, you are not a racist, you are not...

Maybe just... leave all that off? I know, I know, individual responsibility no longer exists, but I can dream.

4/23/2026, 7:46:24 PM


by: faxmeyourcode

How does it compare to Mythos?

4/23/2026, 6:34:18 PM


by: woeirua

Nice to see them openly compare to Opus 4.7... but they don't compare it against Mythos, which says everything you need to know.

The LinkedIn/X influencers who hyped this as a Mythos-class model should be ashamed of themselves, but they'll be too busy posting slop content about how "GPT-5.5 changes everything".

4/23/2026, 6:48:03 PM


by: tantalor

> A playable 3D dungeon arena

Where's the demo link?

4/23/2026, 6:58:33 PM


by: throwaway2027

Good timing - I had just renewed my subscription.

4/23/2026, 6:43:28 PM


by: ionwake

Is there anywhere I can try it? (I just stopped my Pro sub.) I was wondering if there is a playground or third party where I can just test it briefly.

4/23/2026, 6:30:56 PM


by: elAhmo

Is Codex receiving the 5.4 or 5.5 release?

I am still using Codex 5.3 and haven't switched to GPT-5.4, as I don't like the 'it's automatic bro, trust us', so I'm wondering whether Codex is going to get these specific releases at all in the future.

4/23/2026, 7:40:16 PM


by: senko

I might just be following too many AI-related people on X, but omg the media blitz around 5.5 is aggressive.

Soo many unconvincing "I've had access for three weeks and omg it's amazing" takes, it actually primes me for it to be a "meh".

I prefer to see for myself, but the gradual rollout, combined with full-on marketing campaign, is annoying.

4/23/2026, 7:21:27 PM


by: debba

Cannot see it in Codex CLI

4/23/2026, 6:51:38 PM


by: XCSme

2x the price for a 1-5% performance gain

4/23/2026, 7:37:36 PM


by: phillipcarter

... *sigh*. I realize there's little that can be done about this, but I *just* got through a real-world session determining whether Opus 4.7 is meaningfully better than Opus 4.6 or GPT-5.4, and now there's another one to try things with. These benchmark results generally mean little to me in practice.

Anyway, still exciting to see more improvements.

4/23/2026, 6:49:43 PM


by: cynicalpeace

It's possible that "smarter" AI won't lead to more productivity in the economy. Why?

Because software and "information technology" generally didn't increase productivity over the past 30 years.

This has long been known as Solow's productivity paradox. There are lots of theories as to why this is observed, one of them being "mismeasurement" of productivity data.

But my favorite theory is that information technology is mostly entertainment, and rather than making you more productive, it distracts you and makes you lazier.

AI's main application has been information space so far. If that continues, I doubt you will get more productivity from it.

If you give AI a body... well, maybe that changes.

4/23/2026, 6:13:12 PM


by: objektif

Are there faster mini&#x2F;nano versions as well?

4/23/2026, 6:10:49 PM


by: varispeed

I am sceptical. The generations after the 4o models have become crappier and crappier. I hope this one changes the trend. 5.4 is unusable for complex coding work.

4/23/2026, 7:19:58 PM


by: numbers

I've stopped trusting these "trust me bro" benchmarks and just started going to LM Arena and looking for the actual benchmark comparisons.

https://arena.ai/leaderboard/code

4/23/2026, 6:31:25 PM


by: mondojesus

I'm still using 5.3 in Codex. Are 5.4 and 5.5 better than 5.3 in concrete ways?

4/23/2026, 6:48:25 PM


by: enraged_camel

Is this the first time OpenAI has compared a new release to Anthropic models? Previously they were comparing only to GPT's own previous versions.

4/23/2026, 6:35:22 PM


by: k2xl

ARC-AGI-3 is missing from this list - given that SOTA before 5.5 was <1% if I recall, I wonder if it didn't make meaningful progress.

4/23/2026, 6:24:09 PM


by: yuvrajmalgat

finally

4/23/2026, 7:21:09 PM


by: cmrdporcupine

Not rolled out to my Codex CLI yet, but some users on Reddit are claiming it's on theirs.

4/23/2026, 6:13:00 PM


by: throwaw12

If anyone has tried it already, how do you feel?

Numbers look too good; wondering if it is benchmaxxed or not

4/23/2026, 6:47:26 PM


by: xnx

Next up: Google I/O on May 19?

I have to imagine they'll go to Gemini 3.5, if only for marketing reasons.

4/23/2026, 6:21:50 PM


by: luqtas

they are using ethical training weights this time!!! /j

4/23/2026, 6:05:10 PM


by: MagicMoonlight

Two hundred pages of shilling and it's a 1% improvement in the benchmarks. They're dead in the water.

Imagine spending $100M on some of these AI "geniuses" and this is the best they can do.

4/23/2026, 6:35:36 PM


by: justonepost2

the attenuation of man nears

< 5 years until humans are buffered out of existence tbh

may the light of potentia spread forth beyond us

4/23/2026, 6:39:12 PM


by: coderssh

Great model, I have been using Codex and it's awesome. Let's see what GPT-5.5 does to it

4/23/2026, 6:32:45 PM


by: vardump

I just can't bear to use services from this company after what they did to the global DRAM markets.

I'm not trying to make any kind of moral statement, but the company just feels toxic to me.

4/23/2026, 7:01:56 PM