Hacker News Viewer

April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini

by greenstevester on 4/3/2026, 9:35:16 AM

https://gist.github.com/greenstevester/fc49b4e60a4fef9effc79066c1033ae5

Comments

by: redrove

There is virtually no reason to use Ollama over LM Studio or the myriad of other alternatives.

Ollama is slower, and they started out as a shameless llama.cpp ripoff without giving credit. Now they've “ported” it to Go, which means they're just vibe-code translating llama.cpp, bugs included.

4/3/2026, 10:25:23 AM


by: robotswantdata

Why are you using Ollama? Just use llama.cpp:

brew install llama.cpp

Use the built-in CLI, server, or chat interface, and hook it up to any other app.
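For reference, a minimal sketch of the workflow that comment describes (the Hugging Face repo name below is a placeholder, not from the gist; Homebrew's llama.cpp formula installs `llama-cli` and `llama-server`):

```shell
# Install llama.cpp via Homebrew (provides llama-cli and llama-server)
brew install llama.cpp

# One-shot interactive chat in the terminal; -hf pulls a GGUF model
# straight from Hugging Face (repo name is a placeholder -- substitute
# whichever model you actually want)
llama-cli -hf ggml-org/gemma-3-4b-it-GGUF

# Or run the built-in OpenAI-compatible server with its web chat UI,
# then point any OpenAI-style client at http://localhost:8080/v1
llama-server -hf ggml-org/gemma-3-4b-it-GGUF --port 8080
```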

4/3/2026, 10:57:08 AM


by: easygenes

Why is ollama so many people's go-to? Genuinely curious. I've tried it, but it feels overly stripped down / dumbed down vs nearly everything else I've used.

Lately I've been playing with Unsloth Studio and think that's probably a much better "give it to a beginner" default.

4/3/2026, 10:40:31 AM


by: boutell

Last night I had to install the v0.20 pre-release of ollama to use this model, so I'm wondering if these instructions are accurate.

4/3/2026, 11:38:18 AM


by: greenstevester

Right. So Google released Gemma 4, a 26B mixture-of-experts model that only activates 4B parameters per token.

It's essentially a model that's learned to do the absolute minimum amount of work while still getting paid. I respect that enormously.

It scores 1441 on Arena Elo, roughly the same as Qwen 3.5 at 397B and Kimi k2.5 at 1100B.

Ollama v0.19 switched to Apple's MLX framework on Apple Silicon, making decode 93% faster.

They've also improved caching so your coding agents don't have to re-read the entire prompt every time. About time, I'd say.

The gist covers the full setup: install, auto-start on boot, and keeping the model warm in memory.

It runs on a 24GB Mac mini, which means the most expensive part of your local AI setup is still the desk you put it on.
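A rough sketch of what that install / auto-start / keep-warm setup looks like on macOS with Homebrew (the `gemma4:26b` tag is an assumption here; check the gist for the exact model name):

```shell
# Install Ollama and register the server to auto-start on boot/login
brew install ollama
brew services start ollama

# Pull the model (tag is an assumption -- see the gist for the real one)
ollama pull gemma4:26b

# Keep the model warm: by default Ollama unloads a model after ~5 minutes
# of inactivity. Either export this before starting the server yourself...
OLLAMA_KEEP_ALIVE=-1 ollama serve

# ...or set keep_alive per request via the HTTP API (-1 = never unload)
curl http://localhost:11434/api/generate \
  -d '{"model": "gemma4:26b", "keep_alive": -1}'
```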

4/3/2026, 9:35:16 AM


by: logicallee

In case someone would like to know what these are like on this hardware: I tested Gemma 4 32b (the ~20 GB model, the largest Gemma model Google published) and gemma4:e4b (the ~10 GB model) on this exact setup (Mac Mini M4 with 24 GB of RAM using Ollama), and I livestreamed it:

https://www.youtube.com/live/G5OVcKO70ns

The ~10 GB model is super speedy, loading in a few seconds and giving responses almost instantly. If you just want to see its performance, it says hello around the 2 minute mark in the video (and fast!), and the ~20 GB model says hello around 5 minutes 45 seconds in. You can see the substantial difference in their loading times and speed. I also had each of them complete a difficult coding task; both got it correct, but the 20 GB model was much slower. It's a bit too slow to use on this setup day to day, plus it would take almost all the memory. The 10 GB model fits comfortably on a Mac Mini with 24 GB, with plenty of RAM left for everything else, and it seems usable for small-size coding tasks.
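The fit reasoning in that comment can be made concrete with a rough memory budget. The model sizes are the approximate file sizes from the comment; the OS and KV-cache figures are assumptions, not measurements:

```python
# Rough check of whether a model leaves usable headroom on a 24 GB Mac mini.
# Weights ~= GGUF file size; overhead numbers are ballpark assumptions.

RAM_GB = 24
OS_OVERHEAD_GB = 4   # macOS + background apps (assumption)
KV_CACHE_GB = 2      # context cache at a modest context length (assumption)

def headroom(model_gb: float) -> float:
    """Unified memory left over after loading the model."""
    return RAM_GB - OS_OVERHEAD_GB - KV_CACHE_GB - model_gb

print(headroom(10))  # ~10 GB model: prints 8, comfortable
print(headroom(20))  # ~20 GB model: prints -2, over budget
```

Under these assumptions the ~10 GB model leaves real headroom for everything else, while the ~20 GB model is already past the limit before you grow the context window, which matches the experience described above.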

4/3/2026, 12:06:46 PM