Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS

by MattHart88 on 4/6/2026, 7:50:16 PM

I built this because I wanted to see how far I could get with a voice-to-text app that uses 100% local models, so no data leaves my computer. I've been using it a ton for coding and emails, and I'm experimenting with using it as a voice interface for my other agents too. It's 100% open source under the MIT license; I'd love feedback, PRs, and ideas on where to take it.

https://github.com/matthartman/ghost-pepper
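The hold-to-talk flow described above can be sketched as a small state machine. This is an illustrative sketch only, not Ghost Pepper's actual implementation; the class, method names, and callback are all hypothetical:

```python
from typing import Callable, List

class HoldToTalk:
    """Buffer audio chunks while the hotkey is held; transcribe on release.
    Illustrative sketch only -- not Ghost Pepper's actual API."""

    def __init__(self, transcribe: Callable[[bytes], str]):
        self.transcribe = transcribe  # e.g. a wrapper around a local STT model
        self.recording = False
        self.chunks: List[bytes] = []

    def key_down(self) -> None:
        """Hotkey pressed: start a fresh recording."""
        self.recording = True
        self.chunks = []

    def feed(self, chunk: bytes) -> None:
        """Audio callback: keep chunks only while the key is held."""
        if self.recording:
            self.chunks.append(chunk)

    def key_up(self) -> str:
        """Hotkey released: stop recording and transcribe the buffer."""
        self.recording = False
        return self.transcribe(b"".join(self.chunks))

# Usage with a stand-in transcriber (reports buffer size instead of text):
htt = HoldToTalk(lambda audio: f"<{len(audio)} bytes transcribed>")
htt.key_down()
htt.feed(b"\x00" * 4)
htt.feed(b"\x01" * 4)
print(htt.key_up())  # → <8 bytes transcribed>
```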

Comments

by: goodroot

Nice one! For Linux folks, I developed https://github.com/goodroot/hyprwhspr.

On Linux, there's access to the latest Cohere Transcribe model and it works very, very well. Requires a GPU though. Larger local models generally shouldn't require a subordinate model for cleanup.

Have you compared WhisperKit to faster-whisper or similar? You might be able to run turbov3 successfully and negate the need for cleanup.

Incidentally, waiting for Apple to blow this all up with native STT any day now. :)

4/6/2026, 8:08:38 PM


by: parhamn

I see a lot of Whisper stuff out there. Are these models the same old OpenAI Whisper releases, or have they been updated heavily?

I've been using Parakeet v3, which is fantastic (and tiny). Confused to still be seeing Whisper out there.

4/6/2026, 9:03:22 PM


by: __mharrison__

Cool, I've been doing a lot of "coding" (and other typing tasks) recently by tapping a button on my Stream Deck. It starts recording me until I tap it again, at which point it transcribes the recording and plops it into the paste buffer.

The button next to it pastes when I press it. If I press it again, it hits the enter key.

You can get a lot done with two buttons.

4/6/2026, 9:34:04 PM


by: charlietran

Thank you for sharing; I appreciate the emphasis on local speed and privacy. As a current user of Hex (https://github.com/kitlangton/Hex), which has similar goals, what are your thoughts on how they compare?

4/6/2026, 8:00:54 PM


by: lostathome

If anyone's interested, I built Hitoku Draft. It is a context-aware voice assistant, local models only.

Here is an example: https://www.youtube.com/watch?v=Dw_q6l3Cwp4

I was mainly motivated by papers like this one: https://arxiv.org/pdf/2602.16800. But I found myself using it during vacation when I did not have an internet connection.

https://hitoku.me/draft/

I set up a code (HITOKUHN2026) for people to download it, in case you want to compare, or just give feedback!

4/6/2026, 8:59:02 PM


by: douglaswlance

Does it input the text as soon as it hears it, or does it wait until the end?

4/6/2026, 9:49:33 PM


by: konaraddi

That’s awesome! Do you know how it compares to Handy? Handy is open source and local-only too. It's been around a while and is what I've been using.

https://github.com/cjpais/handy

4/6/2026, 8:13:48 PM


by: ericmcer

I see quite a few of these; the killer feature to me will be one that fine-tunes the model based on your own voice.

E.g., if your name is `Donold` (pronounced like Donald), there is not a transcription model in existence that will transcribe your name correctly. That means forget inputting your name or email ever; it will never output them correctly.

Combine that with any subtleties of speech you have, or industry jargon you frequently use, and you will have a much more useful tool.

We have a ton of options for "predict the most common word that matches this audio data," but I haven't found any "predict MY most common word" setups.
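Short of true fine-tuning, one lightweight stopgap for the problem described above is a personal post-processing pass that rewrites the model's likely mis-hearings of your own vocabulary. A minimal sketch, where the dictionary entries are purely illustrative examples (not output from any real model):

```python
import re

# Hypothetical personal vocabulary: maps a transcriber's usual
# mis-hearing to the spelling you actually want.
PERSONAL_VOCAB = {
    "donald dot com": "donold.com",  # spoken email domain
    "donald": "Donold",              # a name no generic model spells right
    "cube config": "kubeconfig",     # industry-jargon example
}

def personalize(transcript: str) -> str:
    """Apply whole-word, case-insensitive corrections after STT."""
    out = transcript
    # Longest keys first, so multi-word phrases win over their substrings.
    for wrong in sorted(PERSONAL_VOCAB, key=len, reverse=True):
        out = re.sub(rf"\b{re.escape(wrong)}\b", PERSONAL_VOCAB[wrong],
                     out, flags=re.IGNORECASE)
    return out

print(personalize("email donald at donald dot com about the cube config"))
# → email Donold at donold.com about the kubeconfig
```

A static replacement table obviously isn't adaptation of the acoustic model itself, but it catches exactly the "name and jargon" cases the comment raises.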

4/6/2026, 9:17:55 PM


by: romeroej

Always Mac. When Windows? Why can't you just make things multi-OS?

4/6/2026, 9:51:19 PM


by: ipsum2

Parakeet is significantly more accurate and faster than Whisper if it supports your language.

4/6/2026, 8:06:45 PM


by: purplehat_

Hi Matt, there are lots of speech-to-text programs out there with varying levels of quality. 100% local is admirable, but it's always a tradeoff and users have to decide for themselves what's worth it.

Would you consider making available a video showing someone using the app?

4/6/2026, 9:35:41 PM


by: Supercompressor

I've been looking for the opposite: wanting to dump in text and have it read back to me, coherently. Anyone have good recommendations?

4/6/2026, 9:16:40 PM


by: mathis

If you don't feel like downloading a large model, you can also use `yap dictate`. Yap leverages the built-in models exposed through Speech.framework on macOS 26 (Tahoe).

Project repo: https://github.com/finnvoor/yap

4/6/2026, 8:40:58 PM


by: hyperhello

Feature request (or beg): let me play a speech video and have it transcribed for me.

4/6/2026, 8:55:01 PM


by: gegtik

How does this compare to macOS's built-in Siri dictation, in quality and in privacy?

4/6/2026, 9:15:25 PM


by: guzik

Sadly the app doesn't work: there is no popup asking for microphone permission.

EDIT: I see there is an open issue for that on GitHub.

4/6/2026, 9:00:17 PM


by: aristech

Great job. What about supported languages? Does it recognise the system language?

4/6/2026, 8:57:10 PM