Hacker News Viewer

How does misalignment scale with model intelligence and task complexity?

by salkahfi on 2/3/2026, 12:28:06 AM

https://alignment.anthropic.com/2026/hot-mess-of-ai/

Comments

by: hogehoge51

My ignorant question: they covered bias and variance noise; how about quantisation noise? I feel like agents are sometimes flip-flopping between metastable interpretations of the problem or solution.

2/3/2026, 3:34:29 AM


by: jmtulloss

The comments so far seem focused on taking a cheap shot, but as somebody working on using AI to help people with hard, long-term tasks, it's a valuable piece of writing.

- It's short and to the point
- It's actionable in the short term (make sure the tasks per session aren't too difficult) and useful for researchers in the long term
- It's informative on how these models work, informed by some of the best in the business
- It gives us a specific vector to look at, clearly defined ("coherence", or, more fun, "hot mess")

2/3/2026, 1:26:43 AM


by: gopalv

> Making models larger improves overall accuracy but doesn't reliably reduce incoherence on hard problems.

Coherence requires two opposing forces to hold it in one dimension, and at least three of them in higher dimensions of quality.

My team wrote up a paper titled "If You Want Coherence, Orchestrate a Team of Rivals"[1] because we kept finding that upping the reasoning threshold resulted in less coherence: more experimentation before we hit a dead end and turned around.

So we got better results from using Haiku (failing over to Sonnet) instead of Opus, and from using a higher-reasoning model to decompose tasks rather than to perform each one of them.

Once a plan is made, the cheaper models do better because they don't second-guess their approaches: they fail or they succeed, and they are not as tenacious as the higher-cost models.

If we fail hard and early, we can escalate to a higher authority and get out of that mess faster.

The knowledge of how exactly the failure happened seems to be less useful to the higher-reasoning model than to the action-biased models.

Splitting up the tactical and strategic sides of the problem seems to work, much as generals don't hold guns in a war.

[1] - https://arxiv.org/abs/2601.14351
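
[A minimal sketch of the planner/executor split this comment describes, in Python. The call_model helper, the model names, and the escalation logic are illustrative assumptions, not anything taken from the comment or the linked paper.]

    PLANNER = "opus"                  # high-reasoning model: decomposes, never executes
    EXECUTORS = ["haiku", "sonnet"]   # cheap, action-biased models, in escalation order

    def call_model(model: str, prompt: str) -> str:
        """Hypothetical placeholder for a real LLM call (SDK or HTTP request)."""
        raise NotImplementedError

    def solve(task: str) -> list[str]:
        # 1. The planner only produces a step-by-step decomposition.
        plan = call_model(PLANNER, f"Break this task into independent steps:\n{task}")
        steps = [s for s in plan.splitlines() if s.strip()]

        results = []
        for step in steps:
            # 2. Each step goes to the cheapest executor first; fail hard and early,
            #    then escalate to the next tier instead of letting one model flip-flop.
            for model in EXECUTORS:
                try:
                    results.append(call_model(model, f"Do exactly this step:\n{step}"))
                    break
                except RuntimeError:
                    continue
            else:
                # 3. Only unrecoverable steps go back to the planner for re-planning.
                results.append(call_model(PLANNER, f"Step failed, replan:\n{step}"))
        return results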

2/3/2026, 12:57:39 AM


by: CuriouslyC

This is a good line: "It found that smarter entities are subjectively judged to behave less coherently"

I think this is twofold:

1. Advanced intelligence requires the ability to traverse between domain valleys in the cognitive manifold. Be it via temperature or some fancy tunneling technique, it's going to be higher error (less coherent) in the valleys of the manifold than naive gradient following to the local minima.

2. It's hard to "punch up" when evaluating intelligence. When someone is a certain amount smarter than you, distinguishing their plausible bullshit from their deep insights is really, really hard.

2/3/2026, 12:56:41 AM


by: cadamsdotcom

When humans dream, we are disconnected from the world around us. Without the grounding that comes from being connected to our bodies, anything can happen in a dream.

It is no surprise that models need grounding too, lest their outputs be no more useful than dreams.

It’s us engineers who give arms and legs to models, so they can navigate the world and succeed at their tasks.

2/3/2026, 3:29:11 AM


by: leahtheelectron

It's nice seeing this with Sohl-Dickstein as the last author after reading this blog post from him some time ago: https://sohl-dickstein.github.io/2023/03/09/coherence.html

2/3/2026, 3:18:11 AM


by: smy20011

I don't think it's because the AI is working toward "misaligned" goals; the user never specifies the goal clearly enough for the AI system to work with.

However, I think producing a detailed enough specification requires the same or an even larger amount of work than writing the code. We write a rough specification and clarify it during the process of coding. There is a minimum amount of effort required to produce these specifications, and AI will not speed that up.

2/3/2026, 1:16:07 AM


by: tbrownaw

Longer thinking sections have more space for noise to accumulate?

2/3/2026, 3:09:06 AM


by: nayroclade

The models they tested are already way behind the current state-of-the-art. Would be interesting to see if their results hold up when repeated with the latest frontier models.

2/3/2026, 1:28:59 AM


by: IgorPartola

For some reason the article reads to me like “AI is not evil, it just has accidents when it loses coherence.” Sounds a lot like liability shifting.

2/3/2026, 1:05:23 AM


by: cyanydeez

Oh, the irony of thinking this refers to the investors and shell companies.

2/3/2026, 12:57:08 AM


by: tsunamifury

I don’t know why it seems so hard for these guys to understand: you scorecard every step of a new strategy by how much it closes the distance to the goal, and if you have multiple generated forward options with no good weight, you spawn a new agent and multiple paths. Then you score all the terminal branches and prune.

LLMs aren’t constrained to linear logic like your average human.
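
[Read charitably, the comment describes a simple beam-style branch-and-prune search over candidate steps. A minimal sketch in Python, with propose_steps and score as hypothetical stand-ins for model calls; nothing here is from the comment itself.]

    import heapq

    def propose_steps(state: str) -> list[str]:
        """Hypothetical: ask a model for candidate next steps from this state."""
        raise NotImplementedError

    def score(state: str) -> float:
        """Hypothetical: scorecard for how much a state closes the distance to the goal."""
        raise NotImplementedError

    def search(start: str, depth: int = 3, beam: int = 4) -> str:
        # Keep the `beam` best partial paths; expand each with several candidate
        # steps (one branch per "spawned agent"), then score and prune.
        frontier = [(-score(start), start)]
        for _ in range(depth):
            candidates = []
            for _neg, state in frontier:
                for step in propose_steps(state):
                    nxt = state + "\n" + step
                    candidates.append((-score(nxt), nxt))
            frontier = heapq.nsmallest(beam, candidates) or frontier
        # Best terminal branch = lowest negated score.
        return min(frontier)[1]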

2/3/2026, 1:02:48 AM


by: throwpoaster

Yudkowsky btfo.

2/3/2026, 1:04:46 AM