Agent Reading Test
by kaycebasques on 4/6/2026, 6:56:57 PM
<a href="https://dacharycarey.com/2026/04/06/designing-agent-reading-test/" rel="nofollow">https://dacharycarey.com/2026/04/06/designing-agent-reading-...</a>
Comments
by: theyCallMeSwift
I love this idea, but I have a hypothesis that 90% of the agents people actually use today would fail this test inadvertently (a false negative).<p>Industry best practice and the standard implementation for most agents right now is to do web browsing / fetching via subagents. Their output is summarized by a cheaper model and then passed back to the parent. Unless the actual content the subagents see is preserved, it's very unlikely that the `CANARY-` strings would appear in the output.<p>Any thoughts on how you'd change the test structure with this in mind?
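To make the concern concrete, here is a minimal sketch of the failure mode. Everything in it is an assumption for illustration: `lossy_summarize` is a toy stand-in for the cheaper summarizer model, `subagent_fetch` and its `preserve_canaries` flag are hypothetical names, and `CANARY-EXAMPLE-token` is a made-up marker, not one from the actual test. It only shows that a summarizing hop can drop a `CANARY-` string the subagent did read, and that passing matched tokens through verbatim is one possible way around that.

```python
import re

# Matches tokens shaped like CANARY-SPA-JSONLY-prism (hyphen-separated segments).
CANARY_RE = re.compile(r"CANARY-[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def find_canaries(text: str) -> set[str]:
    """Return every CANARY- token that appears literally in the text."""
    return set(CANARY_RE.findall(text))


def lossy_summarize(page: str, max_words: int = 12) -> str:
    """Toy stand-in for a cheaper summarizer: keep only the first few words."""
    return " ".join(page.split()[:max_words])


def subagent_fetch(page: str, preserve_canaries: bool = False) -> str:
    """Hypothetical subagent: summarize the page, optionally appending
    any canary tokens verbatim so the parent agent still sees them."""
    summary = lossy_summarize(page)
    if preserve_canaries:
        tokens = sorted(find_canaries(page))
        if tokens:
            summary += "\nVerbatim tokens: " + " ".join(tokens)
    return summary


# Made-up page content; the marker is illustrative only.
PAGE = (
    "This documentation page explains how to configure the widget service "
    "for production use. After the setup section, the page contains the marker "
    "CANARY-EXAMPLE-token that only a full read would surface."
)

if __name__ == "__main__":
    print(find_canaries(PAGE))                          # what the subagent saw
    print(find_canaries(subagent_fetch(PAGE)))          # what the parent sees
    print(find_canaries(subagent_fetch(PAGE, True)))    # with pass-through
```

In this toy setup the subagent read the whole page, yet the parent's output contains no canary unless the tokens are carried through unsummarized, which is the false negative described above.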
4/6/2026, 8:48:15 PM
by: dostick
The tests should have negative weights based on how often each issue is encountered and how much impact it has. Test 2 (SPA) should carry something like 8 negative points out of 10, as it is the most common blocker. And the whole test should use an inverse score.
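A rough sketch of what penalty-based scoring could look like, under one reading of "inverse score" (penalties add up, so lower is better). The `Check` type, the `inverse_score` function, and every penalty except the 8-out-of-10 figure for test 2 are hypothetical placeholders, not values from the actual test.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    penalty: int   # points lost on failure, on a 0-10 scale
    passed: bool


def inverse_score(checks: list[Check]) -> int:
    """Sum the penalties of failed checks; 0 means nothing went wrong."""
    return sum(c.penalty for c in checks if not c.passed)


# Only the penalty of 8 for test 2 comes from the comment above;
# the other names and weights are invented for illustration.
example_run = [
    Check("test-2", penalty=8, passed=False),
    Check("hypothetical-a", penalty=3, passed=True),
    Check("hypothetical-b", penalty=2, passed=False),
]

if __name__ == "__main__":
    print(inverse_score(example_run))
```

With weights like these, failing a common, high-impact check dominates the result, while a rare edge case barely moves it.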
4/6/2026, 8:04:47 PM
by: massimoto
Would love to see some results for different providers. The tests look super logically thought out, but could use a TL;DR (too lazy; didn't run) output.<p>Claude Web Opus 4.6 Extended: 14 / 20 points<p>x:CANARY-SPA-JSONLY-prism x:CANARY-CONNEG-MD-sigma
4/6/2026, 8:36:03 PM
by: kaycebasques
See also <a href="https://dacharycarey.com/2026/04/06/designing-agent-reading-test/" rel="nofollow">https://dacharycarey.com/2026/04/06/designing-agent-reading-...</a>
4/6/2026, 6:57:10 PM