Qwen3-Coder-Next
by danielhanchen on 2/3/2026, 4:01:50 PM
https://qwen.ai/blog?id=qwen3-coder-next
Comments
by: cedws
I kind of lost interest in local models. Then Anthropic started saying I’m not allowed to use my Claude Code subscription with my preferred tools, and it reminded me why we need to support open tools and models. I’ve cancelled my CC subscription; I’m not paying to support anticompetitive behaviour.
2/3/2026, 5:05:08 PM
by: simonw
This GGUF is 48.4GB - https://huggingface.co/Qwen/Qwen3-Coder-Next-GGUF/tree/main/Qwen3-Coder-Next-Q4_K_M - which should be usable on higher-end laptops.

I still haven't experienced a local model that fits on my 64GB MacBook Pro and can run a coding agent like Codex CLI or Claude Code well enough to be useful.

Maybe this will be the one? This Unsloth guide from a sibling comment suggests it might be: https://unsloth.ai/docs/models/qwen3-coder-next
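
For anyone wanting to try that: llama.cpp's llama-server exposes an OpenAI-compatible endpoint, so an agent or script that speaks that protocol can be pointed at a local model. A minimal sketch using the openai Python client; the port and model name below are placeholders for whatever your local server actually uses:

    from openai import OpenAI

    # Placeholders: adjust the port to wherever llama-server listens,
    # and the model name to whatever the server reports.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="qwen3-coder-next",
        messages=[{"role": "user", "content": "Write a function that reverses a string."}],
    )
    print(resp.choices[0].message.content)
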
2/3/2026, 4:15:21 PM
by: danielhanchen
For those interested, made some Dynamic Unsloth GGUFs for local deployment at https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF and made a guide on using Claude Code / Codex locally: https://unsloth.ai/docs/models/qwen3-coder-next
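
If you'd rather script against the GGUF directly than stand up a server, the llama-cpp-python bindings can pull a quant straight from a Hugging Face repo. A sketch; the filename glob and the context/offload settings are assumptions to adjust for your hardware:

    from llama_cpp import Llama

    # Downloads the GGUF from the repo above; the filename glob is an
    # assumption - point it at the quant you actually want.
    llm = Llama.from_pretrained(
        repo_id="unsloth/Qwen3-Coder-Next-GGUF",
        filename="*Q4_K_M*.gguf",
        n_ctx=32768,       # context window; raise if you have the memory
        n_gpu_layers=-1,   # offload all layers to GPU/Metal if possible
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain what a GGUF file is."}]
    )
    print(out["choices"][0]["message"]["content"])
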
2/3/2026, 4:06:12 PM
by: skhameneh
It’s hard to overstate just how wild this model might be if it performs as claimed. The claim is that it can perform close to Sonnet 4.5 for assisted coding (SWE-bench) while using only 3B active parameters. That is obscenely small for the claimed performance.
2/3/2026, 4:38:51 PM
by: ionwake
Will this run on an Apple M4 Air with 32GB RAM?

I'm currently using Qwen 2.5 16B, and it works really well.
2/3/2026, 5:38:47 PM
by: vessenes
3B active parameters, and only slightly worse than GLM 4.7. On benchmarks. That's pretty amazing! With better orchestration tools being deployed, I've been wondering if faster, dumber coding agents paired with wise orchestrators might be overall faster than using, say, Opus 4.5 at the bottom for coding. At the very least we might want to hand simple tasks to these guys.
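
A rough, purely illustrative sketch of that split (both model names and the local endpoint are placeholders): a frontier model writes the plan, and the small fast model executes each step:

    from openai import OpenAI

    # Placeholder clients: a hosted frontier model and a local worker.
    orchestrator = OpenAI()
    worker = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

    def ask(client: OpenAI, model: str, prompt: str) -> str:
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

    task = "Add input validation to the /signup endpoint."

    # 1. The "wise" model decomposes the task into small concrete steps.
    plan = ask(orchestrator, "frontier-model",
               f"Break this into numbered, self-contained coding steps:\n{task}")

    # 2. The fast local model grinds through each step individually.
    for step in plan.splitlines():
        if step.strip():
            print(ask(worker, "qwen3-coder-next",
                      f"Do exactly this step, output code only:\n{step}"))
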
2/3/2026, 4:15:38 PM
by: storus
Does Qwen3 allow adjusting the context during an LLM call, or does the housekeeping need to be done before/after each call, but not while a single LLM call with multiple tool calls is in progress?
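
With the usual chat-completions style of serving (not Qwen3-specific), the context is frozen once a generation starts, so compaction has to happen between calls. A sketch of that pattern, with trim_history standing in for whatever housekeeping you actually do, and the endpoint/model as placeholders:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")  # placeholder

    def trim_history(messages, max_messages=40):
        # Stand-in compaction: keep the system prompt plus the recent tail.
        return messages[:1] + messages[max(1, len(messages) - (max_messages - 1)):]

    messages = [{"role": "system", "content": "You are a coding agent."},
                {"role": "user", "content": "Refactor utils.py"}]

    for _ in range(10):  # agent turns
        messages = trim_history(messages)  # housekeeping BETWEEN calls is fine
        resp = client.chat.completions.create(model="qwen3-coder-next", messages=messages)
        msg = resp.choices[0].message
        # Within this one call, even if it emitted several tool calls, the
        # context it saw was fixed; we only edit the history afterwards.
        messages.append({"role": "assistant", "content": msg.content or ""})
        if not msg.tool_calls:
            break
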
2/3/2026, 5:32:42 PM
by: Alifatisk
As always, the Qwen team is pushing out fantastic content
2/3/2026, 5:35:33 PM
by: valcron1000
Still nothing to compete with GPT-OSS-20B for local use with 16GB of VRAM.
2/3/2026, 5:22:56 PM
by: alexellisuk
Is this going to need 1x or 2x of those RTX PRO 6000s to allow for a decent KV cache at an active context length of 64-100k?

It's one thing running the model without any context, but coding agents build it up close to the max, and that slows down generation massively in my experience.
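
For a back-of-envelope answer: full-attention KV cache scales as 2 (K and V) x layers x KV heads x head dim x tokens x bytes per value. All the architecture numbers in this sketch are placeholder assumptions (check the model's config.json for the real ones), and Qwen3-Next reportedly uses a hybrid linear-attention design, so only a subset of layers keep a full KV cache at all:

    # Back-of-envelope KV cache estimator. ALL numbers below are
    # placeholders - substitute the values from config.json.
    def kv_cache_gib(full_attn_layers, kv_heads, head_dim, ctx_len, bytes_per_val=2):
        # 2x for K and V, times layers, heads, head_dim, and tokens.
        return 2 * full_attn_layers * kv_heads * head_dim * ctx_len * bytes_per_val / 1024**3

    # e.g. 12 full-attention layers, 8 KV heads, head_dim 128, fp16, 100k context:
    print(f"{kv_cache_gib(12, 8, 128, 100_000):.1f} GiB")  # ~4.6 GiB under these assumptions
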
2/3/2026, 4:39:12 PM
by: zamadatix
Can anyone help me understand the "Number of Agent Turns" vs "SWE-Bench Pro (%)" figure? I.e., what does the spread of Qwen3-Coder-Next from ~50 to ~280 agent turns represent for a fixed score of 44.3%? That it sometimes takes anywhere in that range of agent turns to achieve the same score for the given model?
2/3/2026, 4:22:56 PM
by: orliesaurus
how can anyone keep up with all these releases... what's next? Sonnet 5?
2/3/2026, 5:01:53 PM
by: ossicones
What browser-use agent are they using here?
2/3/2026, 5:06:36 PM
by: throwaw12
We are getting there. As a next step, please release something that outperforms Opus 4.5 and GPT 5.2 on coding tasks.
2/3/2026, 4:33:48 PM
by: endymion-light
Looks great - I'll try to check it out on my gaming PC.

On a misc note: what's being used to create the screen recordings? It looks so smooth!
2/3/2026, 4:22:05 PM
by: syntaxing
Is the Qwen3-Next architecture ironed out in llama.cpp?
2/3/2026, 4:53:15 PM
by: Soerensen
The agent orchestration point from vessenes is interesting - using faster, smaller models for routine tasks while reserving frontier models for complex reasoning.

In practice, I've found the economics work like this:

1. Code generation (boilerplate, tests, migrations) - smaller models are fine, and latency matters more than peak capability
2. Architecture decisions, debugging subtle issues - worth the cost of frontier models
3. Refactoring existing code - the model needs to "understand" before changing, so context and reasoning matter more

The 3B active parameters claim is the key unlock here. If this actually runs well on consumer hardware with reasonable context windows, it becomes the obvious choice for category 1 tasks. The question is whether the SWE-Bench numbers hold up for real-world "agent turn" scenarios where you're doing hundreds of small operations.
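
A sketch of that tiering in code; the category keys mirror the list above, and every model name is a placeholder:

    # Illustrative tier routing; all model names are placeholders.
    TIERS = {
        "codegen":   "qwen3-coder-next-local",  # category 1: latency-sensitive
        "architect": "frontier-model",          # category 2: peak capability
        "refactor":  "frontier-model",          # category 3: needs deep context
    }

    def route(task_category: str) -> str:
        # Fall back to the frontier model when unsure - a wrong-tier retry
        # usually costs more than just asking the big model once.
        return TIERS.get(task_category, "frontier-model")

    print(route("codegen"))    # -> qwen3-coder-next-local
    print(route("debugging"))  # -> frontier-model
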
2/3/2026, 4:40:16 PM