Hacker News Viewer

April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini

by greenstevester on 4/3/2026, 9:35:16 AM

https://gist.github.com/greenstevester/fc49b4e60a4fef9effc79066c1033ae5

Comments

by: redrove

There is virtually no reason to use Ollama over LM Studio or the myriad of other alternatives.

Ollama is slower, and they started out as a shameless llama.cpp ripoff without giving credit. Now they've “ported” it to Go, which means they're just vibe-code translating llama.cpp, bugs included.

4/3/2026, 10:25:23 AM


by: robotswantdata

Why are you using Ollama? Just use llama.cpp:

brew install llama.cpp

Use the built-in CLI, server, or chat interface, and hook it up to any other app.
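For reference, a minimal sketch of the workflow that comment describes (the Hugging Face repo name below is a placeholder, not from the gist; Homebrew's llama.cpp formula installs `llama-cli` and `llama-server`):

```shell
# Install llama.cpp via Homebrew (provides llama-cli and llama-server)
brew install llama.cpp

# One-shot interactive chat in the terminal; -hf pulls a GGUF model
# straight from Hugging Face (repo name is a placeholder -- substitute
# whichever model you actually want)
llama-cli -hf ggml-org/gemma-3-4b-it-GGUF

# Or run the built-in OpenAI-compatible server with its web chat UI,
# then point any OpenAI-style client at http://localhost:8080/v1
llama-server -hf ggml-org/gemma-3-4b-it-GGUF --port 8080
```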

4/3/2026, 10:57:08 AM


by: easygenes

Why is ollama so many people's go-to? Genuinely curious. I've tried it, but it feels overly stripped down / dumbed down vs nearly everything else I've used.

Lately I've been playing with Unsloth Studio and think that's probably a much better "give it to a beginner" default.

4/3/2026, 10:40:31 AM


by: boutell

Last night I had to install the v0.20 pre-release of ollama to use this model, so I'm wondering if these instructions are accurate.

4/3/2026, 11:38:18 AM


by: greenstevester

Right. So Google released Gemma 4, a 26B mixture-of-experts model that only activates 4B parameters per token.

It's essentially a model that's learned to do the absolute minimum amount of work while still getting paid. I respect that enormously.

It scores 1441 on Arena Elo, roughly the same as Qwen 3.5 at 397B and Kimi k2.5 at 1100B.

Ollama v0.19 switched to Apple's MLX framework on Apple Silicon, making decode 93% faster.

They've also improved caching so your coding agents don't have to re-read the entire prompt every time. About time, I'd say.

The gist covers the full setup: install, auto-start on boot, and keeping the model warm in memory.

It runs on a 24GB Mac mini, which means the most expensive part of your local AI setup is still the desk you put it on.
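A rough sketch of what that install / auto-start / keep-warm setup looks like on macOS with Homebrew (the `gemma4:26b` tag is an assumption here; check the gist for the exact model name):

```shell
# Install Ollama and register the server to auto-start on boot/login
brew install ollama
brew services start ollama

# Pull the model (tag is an assumption -- see the gist for the real one)
ollama pull gemma4:26b

# Keep the model warm: by default Ollama unloads a model after ~5 minutes
# of inactivity. Either export this before starting the server yourself...
OLLAMA_KEEP_ALIVE=-1 ollama serve

# ...or set keep_alive per request via the HTTP API (-1 = never unload)
curl http://localhost:11434/api/generate \
  -d '{"model": "gemma4:26b", "keep_alive": -1}'
```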

4/3/2026, 9:35:16 AM


by: logicallee

In case someone would like to know what these are like on this hardware: I tested Gemma 4 32b (the ~20 GB model, the largest Gemma model Google published) and gemma4:e4b (the ~10 GB model) on this exact setup (Mac Mini M4 with 24 GB of RAM using Ollama), and I livestreamed it:

https://www.youtube.com/live/G5OVcKO70ns

The ~10 GB model is super speedy, loading in a few seconds and giving responses almost instantly. If you just want to see its performance, it says hello around the 2 minute mark in the video (and fast!), and the ~20 GB model says hello around 5 minutes 45 seconds in. You can see the substantial difference in their loading times and speed. I also had each of them complete a difficult coding task; both got it correct, but the 20 GB model was much slower. It's a bit too slow to use on this setup day to day, plus it would take almost all the memory. The 10 GB model fits comfortably on a Mac Mini with 24 GB, with plenty of RAM left for everything else, and it seems usable for small-size coding tasks.
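The fit reasoning in that comment can be made concrete with a rough memory budget. The model sizes are the approximate file sizes from the comment; the OS and KV-cache figures are assumptions, not measurements:

```python
# Rough check of whether a model leaves usable headroom on a 24 GB Mac mini.
# Weights ~= GGUF file size; overhead numbers are ballpark assumptions.

RAM_GB = 24
OS_OVERHEAD_GB = 4   # macOS + background apps (assumption)
KV_CACHE_GB = 2      # context cache at a modest context length (assumption)

def headroom(model_gb: float) -> float:
    """Unified memory left over after loading the model."""
    return RAM_GB - OS_OVERHEAD_GB - KV_CACHE_GB - model_gb

print(headroom(10))  # ~10 GB model: prints 8, comfortable
print(headroom(20))  # ~20 GB model: prints -2, over budget
```

Under these assumptions the ~10 GB model leaves real headroom for everything else, while the ~20 GB model is already past the limit before you grow the context window, which matches the experience described above.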

4/3/2026, 12:06:46 PM