Hacker News Viewer

Agent Reading Test

by kaycebasques on 4/6/2026, 6:56:57 PM

https://dacharycarey.com/2026/04/06/designing-agent-reading-test/

https://agentreadingtest.com

Comments

by: theyCallMeSwift

I love this idea, but I have a hypothesis that 90% of the agents people actually use today would fail this test inadvertently (a false negative).

Industry best practice and the standard implementation for most agents right now is to do web browsing / fetching via subagents. Their output is summarized using a cheaper model and then passed back to the parent. Unless the actual content the subagents see is preserved, it's very unlikely that the `CANARY-` strings would survive into the output.

Any thoughts on how you'd change the test structure with this in mind?

4/6/2026, 8:48:15 PM
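
To make the concern in the comment above concrete, here is a minimal sketch of how a summarizing subagent can drop canary strings, plus one possible mitigation. Everything here is an assumption for illustration: `naive_summarize` is a hypothetical stand-in for the cheaper summarizer model (it just truncates), the page text is invented, and the canary pattern is inferred from the `CANARY-` strings quoted in this thread rather than taken from the test's own specification.

```python
import re

# Assumed canary shape, inferred from strings like "CANARY-SPA-JSONLY-prism".
CANARY_RE = re.compile(r"CANARY-[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def find_canaries(text: str) -> set[str]:
    """Return every canary-looking token found in the text."""
    return set(CANARY_RE.findall(text))


def naive_summarize(page_text: str, max_words: int = 12) -> str:
    """Hypothetical stand-in for a cheaper summarizer: keep the first few words."""
    return " ".join(page_text.split()[:max_words])


def summarize_preserving_canaries(page_text: str, max_words: int = 12) -> str:
    """Summarize, then append any canary tokens the summary dropped, verbatim."""
    summary = naive_summarize(page_text, max_words)
    missing = find_canaries(page_text) - find_canaries(summary)
    if missing:
        summary += "\nVerbatim tokens: " + " ".join(sorted(missing))
    return summary


# Invented page content; the canary sits past the point the summarizer keeps.
page = (
    "This page documents the widget API in detail, covering setup, "
    "authentication, and pagination rules. "
    "Hidden far down the page: CANARY-SPA-JSONLY-prism"
)

print(find_canaries(naive_summarize(page)))
print(find_canaries(summarize_preserving_canaries(page)))
```

In this sketch the parent agent only sees the canary when the subagent passes verbatim tokens through alongside the summary, which is the failure mode the comment describes: the page was read, but the evidence was summarized away.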


by: dostick

The tests should have negative weights based on how often each issue is encountered and how much impact it has. Test 2 (SPA) should carry something like 8 negative points out of 10, as it is the most common blocker. And the whole test should use an inverse score.

4/6/2026, 8:04:47 PM
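
One way to read the suggestion above is as penalty-based scoring, where each failed test adds its weight and a lower total is better. The sketch below assumes that reading. Only the SPA weight of 8 out of 10 comes from the comment; the other check names and weights, and the `CheckResult` and `penalty_score` names, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    penalty: int  # 0-10, meant to reflect how common and how damaging a failure is
    passed: bool


def penalty_score(results: list[CheckResult]) -> int:
    """Sum the penalties of failed checks; 0 is a perfect (inverse) score."""
    return sum(r.penalty for r in results if not r.passed)


results = [
    CheckResult("spa", penalty=8, passed=False),  # weight from the comment
    CheckResult("content-negotiation", penalty=3, passed=True),  # hypothetical
    CheckResult("truncation", penalty=5, passed=False),  # hypothetical
]

print(penalty_score(results))  # 13
```

With weights like these, failing one common blocker costs more than failing several rare edge cases, which a flat points-out-of-20 total cannot express.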


by: massimoto

Would love to see some results for different providers. The tests look super logically thought out, but could use a TL;DR (too lazy; didn't run) output.

Claude Web Opus 4.6 Extended: 14 / 20 points

x:CANARY-SPA-JSONLY-prism x:CANARY-CONNEG-MD-sigma

4/6/2026, 8:36:03 PM


by: kaycebasques

See also https://dacharycarey.com/2026/04/06/designing-agent-reading-test/

4/6/2026, 6:57:10 PM