Compressed Agents.md > Agent Skills

by maximedupre on 1/29/2026, 1:08:11 PM

https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals

Comments

by: tottenhm

> In 56% of eval cases, the skill was never invoked. The agent had access to the documentation but didn't use it.

The agent passes the Turing test...

1/29/2026, 9:23:46 PM


by: jgbuddy

Am I missing something here?

Obviously, directly including context in something like a system prompt will put it in context 100% of the time. You could just as easily take all of an agent's skills, feed them to the agent (in a system prompt, or similar), and it would follow the instructions more reliably.

However, at a certain point you have to use skills, because including everything in the context every time is wasteful or not possible. This is the same reason Anthropic is doing advanced tool use (ref: https://www.anthropic.com/engineering/advanced-tool-use): there's not enough context to straight up include everything.

It's all a context / price trade-off. Obviously, if you have the context budget, just include what you can directly (in this case, by compressing into an AGENTS.md).

1/29/2026, 10:07:59 PM
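
The trade-off jgbuddy describes can be sketched as a budgeting step: inline whatever fits in the context budget and leave the rest to be loaded on demand. This is only a rough sketch, not how any particular agent does it. The names `estimate_tokens` and `split_by_budget`, the 4-characters-per-token estimate, and the smallest-first ordering are all assumptions made for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def split_by_budget(docs: dict[str, str], budget_tokens: int) -> tuple[list[str], list[str]]:
    """Return (inlined, deferred) doc names.

    Greedily inlines the smallest docs first until the budget is spent;
    everything else is left to be loaded on demand (for example, as a skill).
    """
    inlined, deferred = [], []
    remaining = budget_tokens
    ordered = sorted(docs.items(), key=lambda kv: (estimate_tokens(kv[1]), kv[0]))
    for name, text in ordered:
        cost = estimate_tokens(text)
        if cost <= remaining:
            inlined.append(name)
            remaining -= cost
        else:
            deferred.append(name)
    return inlined, deferred
```

Smallest-first maximizes the number of docs that land in context; a real setup would more likely rank by how often each doc is needed.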


by: meatcar

What if, instead of needing to run a codemod to cache per-lib docs locally, documentation could be distributed alongside a given lib as a dev dependency, version-locked, and accessible locally as plaintext? All docs could be linked in node_modules/.docs (like binaries are in .bin). It would be a sort of collection of manuals.

What a wonderful world that would be.

1/29/2026, 11:23:08 PM
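
If something like meatcar's proposal existed, an agent harness could build its doc index with a plain directory walk. The sketch below assumes a hypothetical layout of `node_modules/.docs/<lib>/*.md`; that layout is the commenter's idea, not an existing package-manager convention, and `list_local_docs` is an illustrative name.

```python
from pathlib import Path


def list_local_docs(project_root: str) -> dict[str, list[str]]:
    """Map each library name to its plaintext doc files.

    Assumes the hypothetical layout node_modules/.docs/<lib>/*.md
    proposed in the comment above.
    """
    docs_dir = Path(project_root) / "node_modules" / ".docs"
    if not docs_dir.is_dir():
        return {}
    index = {}
    for lib_dir in sorted(p for p in docs_dir.iterdir() if p.is_dir()):
        files = sorted(f.name for f in lib_dir.glob("*.md"))
        if files:
            index[lib_dir.name] = files
    return index
```

Because the docs would be installed with the dependency, the index would always match the locked version of each library.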


by: verdverm

This largely mirrors my experience building my custom agent:

1. Start from the Claude Code extracted instructions; they have many things like this in there. Their knowledge sharing in docs and blog posts on this aspect is second to none.
2. Use AGENTS.md as a table of contents and SparkNotes; put them everywhere and load them automatically.
3. Have topical markdown files / skills.
4. Make great tools. This is still opaque in my mind to explain; there is lots of overlap with MCP and skills, and conceptually they are the same to me.
5. Iterate, experiment, do weird things, and have fun!

I changed read/write_file to put contents in the state, presented in the system prompt, and did the same for the AGENTS.md files. I'm now working on evals to show how much better this is, because anecdotally, it kicks ass.

1/29/2026, 11:11:45 PM


by: thorum

The article presents AGENTS.md as something distinct from Skills, but it is actually a simplified instance of the same concept. Their AGENTS.md approach tells the AI where to find instructions for performing a task. That’s a Skill.

I expect the benefit is from better Skill design, specifically, minimizing the number of steps and decisions between the AI’s starting state and the correct information. Fewer transitions -> fewer chances for error to compound.

1/29/2026, 10:10:46 PM
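
thorum's point about compounding error can be made concrete with a little arithmetic. The sketch below assumes each step succeeds independently, which is a simplification; the 0.9 per-step figure is purely illustrative and not a measurement from the article.

```python
from math import prod


def chain_success(step_probs: list[float]) -> float:
    """Probability that every step in a chain succeeds, assuming independence."""
    return prod(step_probs)


# Illustrative numbers only, not measurements from the article.
one_step = chain_success([0.9])         # information already in context
three_steps = chain_success([0.9] * 3)  # notice the skill, invoke it, apply it
print(f"{one_step:.3f} vs {three_steps:.3f}")  # prints: 0.900 vs 0.729
```

Under these assumptions, adding two extra decisions drops the end-to-end success rate from 90% to about 73%.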


by: BenoitEssiambre

Wouldn't this have been more readable with a \n newline instead of a pipe operator as a separator? This wouldn't have made the prompt longer.

1/29/2026, 10:55:20 PM


by: jryan49

Something that I always wonder with each blog post comparing different types of prompt engineering is whether they ran it once or multiple times. LLMs are not consistent for the same task. I imagine they realize this, of course, but I never get enough details about the testing methodology.

1/29/2026, 9:48:35 PM
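
One way to address jryan49's concern is to repeat each eval case and report the pass rate with an interval rather than a single number. The sketch below uses the Wilson score interval; nothing in the thread says the article's authors did this, and the function name and the default `z` are choices made here for illustration.

```python
from math import sqrt


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 is roughly 95% confidence)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = passes / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return center - half, center + half
```

For example, 8 passes out of 10 runs gives an interval of roughly 0.49 to 0.94, which is too wide to separate two prompt variants that differ by a few points.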


by: smcleod

Sounds like they've been using skills incorrectly if they're finding their agents don't invoke the skills. I have Claude Code agents calling my skills frequently, almost every session. You need to make sure your skill descriptions are well defined and describe when to use them and that your tasks / goals clearly set out requirements that align with the available skills.

1/29/2026, 10:04:32 PM


by: pietz

Isn't it obvious that an agent will do better if it internalizes the knowledge on something instead of having the option to request it?

Skills are new. Models haven't been trained on them yet. Give it 2 months.

1/29/2026, 9:39:48 PM


by: newzino

The compressed agents.md approach is interesting, but the comparison misses a key variable: what happens when the agent needs to do something outside the scope of its instructions?

With explicit skills, you can add new capabilities modularly - drop in a new skill file and the agent can use it. With a compressed blob, every extension requires regenerating the entire instruction set, which creates a versioning problem.

The real question is about failure modes. A skill-based system fails gracefully when a skill is missing - the agent knows it can't do X. A compressed system might hallucinate capabilities it doesn't actually have because the boundary between "things I can do" and "things I can't" is implicit in the training rather than explicit in the architecture.

Both approaches optimize for different things. Compressed optimizes for coherent behavior within a narrow scope. Skills optimize for extensibility and explicit capability boundaries. The right choice depends on whether you're building a specialist or a platform.

1/29/2026, 10:41:41 PM


by: sheepscreek

It seems their tests rely on Claude alone. It’s not safe to assume that Codex or Gemini will behave the same way as Claude. I use all three and each has its own idiosyncrasies.

1/29/2026, 10:29:59 PM


by: rao-v

In a month or three we’ll have the sensible approach, which is smaller, cheaper, faster models optimized for looking at a query and identifying which skills / context to provide in full to the main model.

It’s really silly to waste big-model tokens on throat-clearing steps.

1/29/2026, 9:43:53 PM


by: ChrisArchitect

Title is: AGENTS.md outperforms skills in our agent evals

1/29/2026, 10:59:20 PM


by: sothatsit

This seems like an issue that will be fixed in newer model releases that are better trained to use skills.

1/29/2026, 9:55:47 PM


by: EnPissant

This is confusing.

TFA says they added an index to Agents.md that told the agent where to find all documentation and that was a big improvement.

The part I don't understand is that this is exactly how I thought skills work. The short descriptions are given to the model up-front and then it can request the full documentation as it wants. With skills this is called "Progressive disclosure".

Maybe they used more effective short descriptions in the AGENTS.md than they did in their skills?

1/29/2026, 9:08:23 PM
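
The progressive disclosure pattern EnPissant describes, short descriptions up front and full documentation on request, can be sketched as a small registry. This is a hedged illustration of the idea only; `Skill` and `SkillRegistry` are hypothetical names and do not reflect how Claude Code or the article's setup is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    description: str  # short, always shown to the model
    body: str         # full instructions, loaded only on request


class SkillRegistry:
    """Hypothetical registry: a compact index up front, full bodies on demand."""

    def __init__(self, skills: list[Skill]):
        self._skills = {s.name: s for s in skills}

    def index(self) -> str:
        """Compact listing meant for the system prompt or an AGENTS.md file."""
        return "\n".join(f"- {s.name}: {s.description}" for s in self._skills.values())

    def load(self, name: str) -> str:
        """Full body, fetched when the model asks for a specific skill."""
        return self._skills[name].body
```

Seen this way, an AGENTS.md index and a skill list differ mainly in where `index()` ends up and how well its descriptions are written.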


by: ares623

2 months later: "Anthropic introduces 'Claude Instincts'"

1/29/2026, 9:07:54 PM


by: CjHuber

That feels like a stupid article. Well, of course, if you have one single thing you want to optimize, putting it into AGENTS.md is better. But the advantage of skills is exactly that you don't cram them all into the AGENTS file. Let's say you had 3 different elaborate things you want the agent to do. Good luck putting them all in your AGENTS.md and later hoping that the agent remembers any of it. After all, the key advantage of skills is that they get loaded at the end of the context when needed.

1/29/2026, 10:11:20 PM


by: heliumtera

You are telling me that a markdown file saying:

*You are the Super Duper Database Master Administrator of the Galaxy*

does not improve the model's ability to reason about databases?

1/29/2026, 11:04:47 PM


by: delduca

Ah nice… vercel is vibecoded

1/29/2026, 10:48:38 PM


by: thom

You need the model to interpret documentation as policy you care about (in which case it will pay attention) rather than as something it can look up if it doesn’t know something (which it will never admit). It helps to really internalise the personality of LLMs as wildly overconfident but utterly obsequious.

1/29/2026, 10:04:10 PM

