A Claude Code skill that makes Claude talk like a caveman, cutting token use
by tosh on 4/5/2026, 8:56:18 AM
https://github.com/JuliusBrussee/caveman
Comments
by: JBrussee-2
Author here. A few people are arguing against a stronger claim than the repo is meant to make; this was also very much intended as a joke, not research-level commentary.<p>This skill is not intended to reduce hidden reasoning/thinking tokens. Anthropic’s own docs suggest more thinking budget can improve performance, so I would not claim otherwise.<p>What it targets is the visible completion: less preamble, less filler, less polished-but-nonessential text. And since only the post-completion output is “cavemanned”, the code itself isn’t affected by the skill at all :)<p>Also surprising to hear so little faith in RL. I’m quite sure Anthropic’s models have been tuned so heavily as coding agents that a style instruction can’t degrade them dramatically.<p>The fair criticism is that my “~75%” README number comes from preliminary testing, not a rigorous benchmark. It should be phrased more carefully, and I’m working on a proper eval now.<p>Also, yes, skills are not free: Anthropic notes they consume context when loaded, even if only skill metadata is preloaded initially.<p>So the real eval is end-to-end: total input tokens, total output tokens, latency, and quality/task success.<p>There is actual research suggesting concise prompting can reduce response length substantially without always wrecking quality, though it is task-dependent and can hurt in some domains. (<a href="https://arxiv.org/html/2401.05618v3" rel="nofollow">https://arxiv.org/html/2401.05618v3</a>)<p>So my current position is: interesting idea, narrower claim than some people think, needs benchmarks, and the README should be more precise until those exist.
4/5/2026, 3:38:54 PM
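The end-to-end accounting the author describes could be sketched roughly like this. Everything here is a hypothetical illustration, not the repo's actual eval: the whitespace-based `token_count` is only a crude stand-in for a real tokenizer, and the sample strings are invented.

```python
# Hypothetical end-to-end output accounting. Whitespace "tokens" are only a
# proxy for a real tokenizer; real evals would also track input tokens,
# latency, and task success, not just completion length.

def token_count(text: str) -> int:
    """Crude proxy: count whitespace-separated chunks."""
    return len(text.split())

def output_reduction(baseline: str, caveman: str) -> float:
    """Percentage reduction in visible completion length."""
    base = token_count(baseline)
    return 100.0 * (base - token_count(caveman)) / base

# Invented example responses for illustration only:
baseline = ("Great question! There are several things to consider here. "
            "First, let's look at the function and then refactor it step by step.")
caveman = "Fix function. Refactor. Done."

print(f"{output_reduction(baseline, caveman):.0f}% fewer output tokens")
```

A real benchmark would swap `token_count` for the provider's tokenizer and measure quality alongside length, since shorter is not automatically better.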
by: throwatdem12311
Ok but when the model is responding to you isn’t the text it’s generating also part of the context it’s using to generate the next token as it goes? Wouldn’t this just make the answers…dumb?
4/5/2026, 4:25:47 PM
by: padolsey
This is fun. I'd like to see the same idea but oriented for richer tokens instead of simpler tokens. If you want to spend fewer tokens, then spend the 'good' ones. So, instead of saying 'make good' you could say 'improve idiomatically' or something. Depends on one's needs. I try to imagine every single token as an opportunity to bend/expand/limit the geometries I have access to. Language is a beautiful modulator to apply to reality, so I'll wager applying it with pedantic finesse will bring finer outputs than brutish humphs of cavemen. But let's see the benchmarks!
4/5/2026, 2:24:36 PM
by: vurudlxtyt
Grug brained developer meets AI tooling (<a href="https://grugbrain.dev" rel="nofollow">https://grugbrain.dev</a>)
4/5/2026, 3:50:08 PM
by: teekert
Idk I try talk like cavemen to claude. Claude seems answer less good. We have more misunderstandings. Feel like sometimes need more words in total to explain previous instructions. Also less context is more damage if typo. Who agrees? Could be just feeling I have. I often ad fluff. Feels like better result from LLM. Me think LLM also get less thinking and less info from own previous replies if talk like caveman.
4/5/2026, 10:21:20 AM
by: nharada
I wonder if this will <i>actually</i> be why the models move to "neuralese" or whatever non-language latent representation people work out. Interpretability disappears but efficiency potentially goes way up. Even without a performance increase that would be pretty huge.
4/5/2026, 4:08:14 PM
by: herf
We need a high quality compression function for human readers... because AIs can make code and text faster than we can read.
4/5/2026, 4:16:09 PM
by: TeMPOraL
Oh boy. Someone didn't get the memo that for LLMs, <i>tokens are units of thinking</i>. I.e. whatever feat of computation needs to happen to produce results you seek, it needs to fit in the tokens the LLM produces. Being a finite system, there's only so much computation the LLM internal structure can do per token, so the more you force the model to be concise, the more difficult the task becomes for it - worst case, you can guarantee not to get a good answer because it requires more computation than possible with the tokens produced.<p>I.e. by demanding the model to be concise, you're literally making it dumber.<p>(Separating out "chain of thought" into "thinking mode" and removing user control over it definitely helped with this problem.)
4/5/2026, 10:18:48 AM
by: nayroclade
Cute idea, but you're never gonna blow your token budget on output. Input tokens are the bottleneck, because the agent's ingesting swathes of skills, directory trees, code files, tool outputs, etc. The output is generally a few hundred lines of code and a bit of natural language explanation.
4/5/2026, 10:46:50 AM
by: itpcc
But will it lose some context, like Kevin’s small talk? (<a href="https://www.youtube.com/watch?v=_K-L9uhsBLM" rel="nofollow">https://www.youtube.com/watch?v=_K-L9uhsBLM</a>)<p>Like "Sea world" or "see the world".
4/5/2026, 4:01:47 PM
by: Hard_Space
Also see <a href="https://arxiv.org/pdf/2604.00025" rel="nofollow">https://arxiv.org/pdf/2604.00025</a> ('Brevity Constraints Reverse Performance Hierarchies in Language Models' March 2026)
4/5/2026, 10:23:54 AM
by: FurstFly
Okay, I like how it reduces token usage, but it kind of feels like it will reduce the overall model intelligence. LLMs are probabilistic models, and you are basically playing with their priors.
4/5/2026, 12:41:26 PM
by: arrty88
Feels like there should be a way to compile skills, READMEs, and even code files into concise maps and descriptions optimized for LLMs, recompiling only when timestamps change.
4/5/2026, 3:59:25 PM
by: abejfehr
There’s a lot of debate about whether this reduces model accuracy, but this is basically Chinese grammar, and Chinese vibe coding seems to work fine while (supposedly) using 30-40% fewer tokens
4/5/2026, 2:08:21 PM
by: ryanschaefer
Kinda ironic this description is so verbose.<p>> Use when user says "caveman mode", "talk like caveman", "use caveman", "less tokens", "be brief", or invokes /caveman<p>For the first part of this: couldn’t this just be a UserSubmitPrompt hook with regex against these?<p>See additionalContext in the json output of a script: <a href="https://code.claude.com/docs/en/hooks#structured-json-output" rel="nofollow">https://code.claude.com/docs/en/hooks#structured-json-output</a><p>For the second, /caveman will always invoke the skill /caveman: <a href="https://code.claude.com/docs/en/skills" rel="nofollow">https://code.claude.com/docs/en/skills</a>
4/5/2026, 10:23:54 AM
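A minimal sketch of such a hook's logic might look like the following. This assumes the stdin/stdout JSON shape described in the linked hooks docs (where the event appears under the name `UserPromptSubmit`); the trigger list and context string come from the skill description quoted above, and the exact field names should be verified against the docs.

```python
# Hypothetical UserPromptSubmit hook sketch -- verify the JSON shape against
# the hooks documentation linked above before relying on it.
import json
import re

# Trigger phrases taken from the skill description quoted in the comment:
TRIGGERS = re.compile(
    r"caveman mode|talk like caveman|use caveman|less tokens|be brief", re.I
)
CAVEMAN_CONTEXT = "Respond in caveman style: short words, no filler, no preamble."

def hook_response(prompt: str) -> dict:
    """Emit additionalContext only when a trigger phrase matches the prompt."""
    if TRIGGERS.search(prompt):
        return {
            "hookSpecificOutput": {
                "hookEventName": "UserPromptSubmit",
                "additionalContext": CAVEMAN_CONTEXT,
            }
        }
    return {}

# A real hook would read its payload from stdin; simulated here:
sample = {"prompt": "be brief: fix the login bug"}
print(json.dumps(hook_response(sample["prompt"])))
```

This avoids loading skill instructions at all when no trigger phrase appears, which is the token saving the commenter is pointing at.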
by: postalcoder
I disagree with this method and would discourage others from using it too, especially if accuracy, faster responses, and saving money are your priorities.<p>This only makes sense if you assume that you are the consumer of the response. When compacting, harnesses typically save a copy of the text exchange but strip out the tool calls in between. Because the agent relies on this text history to understand its own past actions, a log full of caveman-style responses leaves it with zero context about the changes it made, and the decisions behind them.<p>To recover that lost context, the agent will have to execute unnecessary research loops just to resume its task.
4/5/2026, 2:58:26 PM
by: phtrivier
Soma (aka tiktok) and Big Brother (aka Meta) already happened without government coercion, so it only makes sense that we optimize ourselves for newspeak.<p>Thank God there are still neverending wars, otherwise authoritarian governments would have no fun left.
4/5/2026, 1:24:29 PM
by: shomp
everyone who thinks this is a costly or bad idea is looking past a very salient finding: code doesn't need much language. sure, other things might need lots of language, but code does not. code is already basically language, just a really weird one. we call them programming languages. they're not human languages. they're languages of the machine. condensing the human-language---machine-language interface, good.<p>if goal make code, few word better. if goal make insight, more word better. depend on task. machine linear, mind not. consider LLM "thinking" is just edge-weights. if can set edge-weights into same setting with fewer tokens, you are winning.
4/5/2026, 3:09:01 PM
by: virtualritz
This is the best thing since I asked Claude to address me in third person as "Your Eminence".<p>But combining this with caveman? Gold!
4/5/2026, 10:31:12 AM
by: bjackman
If this really works there would seem to be a lot of alpha in running the expensive model in something like caveman mode, and then "decompressing" into normal mode with a cheap model.<p>I don't think it would be fundamentally very surprising if something like this works, it seems like the natural extension to tokenisation. It also seems like the natural path towards "neuralese" where tokens no longer need to correspond to units of human language.
4/5/2026, 11:32:14 AM
by: mwcz
this grug not smart enough to make robot into grugbot. grug just say "Speak to grug with an undercurrent of resentment" and all sicko fancy go way.
4/5/2026, 3:51:27 PM
by: anshumankmr
Though I do use Claude Code, is it possible to get this for Github Copilot too?
4/5/2026, 3:30:47 PM
by: ajd555
So, if this does help reduce the cost of tokens, why not go even further and shorten the syntax with specific keywords, symbols and patterns, to reduce the noise and only keep information, almost like...a programming language?
4/5/2026, 12:32:25 PM
by: veselin
This is an experiment that, although not to this extreme, was tested by OpenAI. Their Responses API allows you to control verbosity:<p><a href="https://developers.openai.com/api/reference/resources/responses/methods/create#(resource)%20responses%20%3E%20(method)%20create%20%3E%20(params)%200.non_streaming%20%3E%20(param)%20text%20%3E%20(schema)%20%2B%20(resource)%20responses%20%3E%20(model)%20response_text_config%20%3E%20(schema)%20%3E%20(property)%20verbosity" rel="nofollow">https://developers.openai.com/api/reference/resources/respon...</a><p>I don't know their internal evals, but I've heard it neither hurts nor improves performance. At the very least, this parameter may affect how many comments end up in the code.
4/5/2026, 11:48:52 AM
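For reference, a request using that knob might be shaped like this. This is a sketch built from the linked API reference; the model name is invented for illustration, the field placement (`text.verbosity`) is an assumption to verify against the docs, and no network call is made here.

```python
# Hypothetical Responses API payload -- only the shape is shown; actually
# sending it would require the openai client and an API key.
payload = {
    "model": "gpt-5",                        # invented model name
    "input": "Refactor this function to be idiomatic.",
    "text": {"verbosity": "low"},            # reportedly: low / medium / high
}
print(payload["text"]["verbosity"])
```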
by: VadimPR
Wouldn't this affect output quality negatively?<p>Thanks to chain of thought, having the LLM be explicit in its output lets it produce higher-quality answers.
4/5/2026, 10:20:37 AM
by: samus
There's a linguistic term for this kind of speech: isolating languages, which don't inflect words and rely on high context and the bare minimum of words to get the meaning across. Chinese is such a language, btw. I don't know what Chinese speakers think about their language being regarded as caveman language...
4/5/2026, 10:55:47 AM
by: gozzoo
I think this could be very useful not when we talk to the agent, but when the agents talk back to us. Usually, they generate so much text that it becomes impossible to follow. If we receive short, focused messages, the interaction will be much more efficient. This should be true for all conversational agents, not only coding agents.
4/5/2026, 10:26:18 AM
by: goldenarm
That's a great idea, but has anyone benchmarked the performance difference?
4/5/2026, 3:16:22 PM
by: ungreased0675
Does this actually result in less compute, or is it adding an additional “translate into caveman” step to the normal output?
4/5/2026, 2:36:54 PM
by: vivid242
Great idea- if the person who made it is reading: Is this based on the board game „poetry for cavemen“? (Explain things using only single-syllable words, comes even with an inflatable log of wood for hitting each other!)
4/5/2026, 10:54:07 AM
by: xgulfie
Funny how people are so critical of this and yet fawn over TOON
4/5/2026, 1:58:27 PM
by: rschiavone
This trick reminds me of "OpenAI charges by the minute, so speed up your audio"<p><a href="https://news.ycombinator.com/item?id=44376989">https://news.ycombinator.com/item?id=44376989</a>
4/5/2026, 10:44:17 AM
by: HarHarVeryFunny
More like Pidgin English than caveman, perhaps, although caveman does make for a better name.
4/5/2026, 1:10:44 PM
by: norskeld
APL for talking to LLM when? Also, this reminded me of that episode from The Office where Kevin started talking like a caveman to make communication efficient.
4/5/2026, 11:26:11 AM
by: zahirbmirza
You can also make huge spelling mistakes and use incomplete words with llms they just sem to know better than any spl chk wht you mean. I use such speak to cut my time spent typing to them.
4/5/2026, 10:19:33 AM
by: sebastianconcpt
Anyone else worried about the long term consequences of the influence of talking like this all day for the cognitive system <i>of the user</i>?
4/5/2026, 1:46:07 PM
by: isuckatcoding
Oh come on, no one referenced this scene from The Office??<p><a href="https://youtu.be/_K-L9uhsBLM?si=ePiGrFd546jFYZd8" rel="nofollow">https://youtu.be/_K-L9uhsBLM?si=ePiGrFd546jFYZd8</a>
4/5/2026, 3:55:53 PM
by: amelius
By the way why don't these LLM interfaces come with a pause button?
4/5/2026, 12:34:07 PM
by: andai
So it's a prompt to turn Jarvis into Hulk!
4/5/2026, 10:19:15 AM
by: andai
No articles, no pleasantries, and no hedging. He has combined the best of Slavic and Germanic culture into one :)
4/5/2026, 10:17:26 AM
by: fny
Are there any good studies or benchmarks about compressed output and performance? I see a lot of arguing in the comments but little evidence.
4/5/2026, 12:29:01 PM
by: ArekDymalski
While it's really useful now, I'm afraid that in the long run it might accelerate the language atrophy that is already happening. I still remember when people used to enter full questions in Google and write SMS messages with capital letters, commas, and periods.
4/5/2026, 10:17:28 AM
by: fzeindl
I tried this with early ChatGPT. Asked it to answer telegram style with as few tokens as possible. It is also interesting to ask it for jokes in this mode.
4/5/2026, 12:00:03 PM
by: stared
I would prefer to talk like Abathur (<a href="https://www.youtube.com/watch?v=pw_GN3v-0Ls" rel="nofollow">https://www.youtube.com/watch?v=pw_GN3v-0Ls</a>). Same efficiency but smarter.
4/5/2026, 10:40:11 AM
by: doe88
> If caveman save you mass token, mass money — leave mass star.<p>Mass fun. Starred.
4/5/2026, 10:51:59 AM
by: owenthejumper
What is that binary file caveman.skill that I cannot read easily, and is it going to hack my computer.
4/5/2026, 11:57:00 AM
by: adam_patarino
Or you could use a local model where you’re not constrained by tokens. Like rig.ai
4/5/2026, 12:33:23 PM
by: cadamsdotcom
Caveman need invent chalk and chart make argument backed by more than good feel.
4/5/2026, 10:40:16 AM
by: xpe
Unfrozen caveman lawyer here. Did "talk like caveman" make code more bad? Make unsubst... (AARG) FAKE claims? You deserve compen... AAARG ... money. AMA.
4/5/2026, 1:37:12 PM
by: kukakike
This is exactly what annoys me most. English is not suitable for computer-human interaction. We should create new programming and query languages for that. We are back in the COBOL mindset. LLMs are not humans, and we should stop talking to them as if they were.
4/5/2026, 11:59:42 AM
by: saidnooneever
LOL, it actually reads how humans reply; the name is too clever :').<p>Not sure how effective it will be at driving down costs, but honestly it will make my day not to have to read through entire essays about some trivial solution.<p>tldr; Claude skill, short output, ++good.
4/5/2026, 10:25:34 AM
by: sillyboi
Oh, another new trend! I love these home-brewed LLM optimizers. They start with XML, then JSON, then something totally different. The author conveniently ignores the system prompt that applies to everything, and the extra inference work. So it's only worth using if you happen to like this response style; just my two cents. All the real optimizations happen during model training and in the infrastructure itself.
4/5/2026, 12:22:07 PM
by: Robdel12
I didn’t comment on this when I saw it on threads/twitter. But it made it to HN, surprisingly.<p>I have a feeling these same people will complain “my model is so dumb!”. There’s a reason why Claude had that “you’re absolutely right!” for a while. Or codex’s “you’re right to push on this”.<p>We’re basically just gaslighting GPUs. That wall of text is kinda needed right now.
4/5/2026, 12:39:50 PM
by: hybrid_study
Mongo! No caveman
4/5/2026, 12:24:29 PM
by: bitwize
grug have to use big brains' thinking machine these days, or no shiny rock. complexity demon love thinking machine. grug appreciate attempt to make thinking machine talk on grug level, maybe it help keep complexity demon away.
4/5/2026, 1:13:33 PM
by: DonHopkins
Deep digging cave man code reviews are Tha Shiznit:<p><a href="https://www.youtube.com/watch?v=KYqovHffGE8" rel="nofollow">https://www.youtube.com/watch?v=KYqovHffGE8</a>
4/5/2026, 12:26:26 PM
by: setnone
caveman multilingo? how sound?
4/5/2026, 11:02:49 AM
by: vova_hn2
I don't know about token savings, but I find the "caveman style" much easier to read and understand than typical LLM-slop.
4/5/2026, 11:21:32 AM
by: bhwoo48
I was actually worried about high token costs while building my own project (infra bundle generator), and this gave me a good laugh + some solid ideas. 75% reduction is insane. Starred
4/5/2026, 10:21:51 AM
by: bogtog
I'd be curious to see some measurements of the final effects, since presumably models won't <think> in caveman speak or write code like that
4/5/2026, 10:31:55 AM