Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS

by MattHart88 on 4/6/2026, 7:50:16 PM

I built this because I wanted to see how far I could get with a voice-to-text app that uses 100% local models, so no data leaves my computer. I've been using it a ton for coding and emails, and I'm experimenting with using it as a voice interface for my other agents too. It's 100% open source under the MIT license; I'd love feedback, PRs, and ideas on where to take it.

https://github.com/matthartman/ghost-pepper
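The hold-to-talk flow described above can be sketched as a small state machine. This is an illustrative sketch only, not Ghost Pepper's actual implementation; the class, method names, and callback are all hypothetical:

```python
from typing import Callable, List

class HoldToTalk:
    """Buffer audio chunks while the hotkey is held; transcribe on release.
    Illustrative sketch only -- not Ghost Pepper's actual API."""

    def __init__(self, transcribe: Callable[[bytes], str]):
        self.transcribe = transcribe  # e.g. a wrapper around a local STT model
        self.recording = False
        self.chunks: List[bytes] = []

    def key_down(self) -> None:
        """Hotkey pressed: start a fresh recording."""
        self.recording = True
        self.chunks = []

    def feed(self, chunk: bytes) -> None:
        """Audio callback: keep chunks only while the key is held."""
        if self.recording:
            self.chunks.append(chunk)

    def key_up(self) -> str:
        """Hotkey released: stop recording and transcribe the buffer."""
        self.recording = False
        return self.transcribe(b"".join(self.chunks))

# Usage with a stand-in transcriber (reports buffer size instead of text):
htt = HoldToTalk(lambda audio: f"<{len(audio)} bytes transcribed>")
htt.key_down()
htt.feed(b"\x00" * 4)
htt.feed(b"\x01" * 4)
print(htt.key_up())  # → <8 bytes transcribed>
```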

Comments

by: goodroot

Nice one! For Linux folks, I developed https://github.com/goodroot/hyprwhspr.

On Linux, there's access to the latest Cohere Transcribe model and it works very, very well. Requires a GPU though. Larger local models generally shouldn't require a subordinate model for cleanup.

Have you compared WhisperKit to faster-whisper or similar? You might be able to run turbov3 successfully and negate the need for cleanup.

Incidentally, waiting for Apple to blow this all up with native STT any day now. :)

4/6/2026, 8:08:38 PM


by: parhamn

I see a lot of Whisper stuff out there. Are these models the same old OpenAI Whisper releases, or have they been updated heavily?

I've been using Parakeet v3, which is fantastic (and tiny). Confused to still be seeing Whisper out there.

4/6/2026, 9:03:22 PM


by: __mharrison__

Cool, I've been doing a lot of "coding" (and other typing tasks) recently by tapping a button on my Stream Deck. It starts recording me until I tap it again, at which point it transcribes the recording and plops it into the paste buffer.

The button next to it pastes when I press it. If I press it again, it hits the enter key.

You can get a lot done with two buttons.

4/6/2026, 9:34:04 PM


by: charlietran

Thank you for sharing; I appreciate the emphasis on local speed and privacy. As a current user of Hex (https://github.com/kitlangton/Hex), which has similar goals, what are your thoughts on how they compare?

4/6/2026, 8:00:54 PM


by: lostathome

If anyone's interested, I built Hitoku Draft. It is a context-aware voice assistant, local models only.

Here is an example: https://www.youtube.com/watch?v=Dw_q6l3Cwp4

I was mainly motivated by papers like this one: https://arxiv.org/pdf/2602.16800. But I found myself using it during vacation when I did not have an internet connection.

https://hitoku.me/draft/

I set up a code (HITOKUHN2026) for people to download it, in case you want to compare, or just give feedback!

4/6/2026, 8:59:02 PM


by: douglaswlance

Does it input the text as soon as it hears it, or does it wait until the end?

4/6/2026, 9:49:33 PM


by: konaraddi

That’s awesome! Do you know how it compares to Handy? Handy is open source and local-only too. It's been around a while and is what I've been using.

https://github.com/cjpais/handy

4/6/2026, 8:13:48 PM


by: ericmcer

I see quite a few of these; the killer feature to me will be one that fine-tunes the model based on your own voice.

E.g., if your name is `Donold` (pronounced like Donald), there is not a transcription model in existence that will transcribe your name correctly. That means forget inputting your name or email ever; it will never output them correctly.

Combine that with any subtleties of speech you have, or industry jargon you frequently use, and you will have a much more useful tool.

We have a ton of options for "predict the most common word that matches this audio data," but I haven't found any "predict MY most common word" setups.
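Short of true fine-tuning, one lightweight stopgap for the problem described above is a personal post-processing pass that rewrites the model's likely mis-hearings of your own vocabulary. A minimal sketch, where the dictionary entries are purely illustrative examples (not output from any real model):

```python
import re

# Hypothetical personal vocabulary: maps a transcriber's usual
# mis-hearing to the spelling you actually want.
PERSONAL_VOCAB = {
    "donald dot com": "donold.com",  # spoken email domain
    "donald": "Donold",              # a name no generic model spells right
    "cube config": "kubeconfig",     # industry-jargon example
}

def personalize(transcript: str) -> str:
    """Apply whole-word, case-insensitive corrections after STT."""
    out = transcript
    # Longest keys first, so multi-word phrases win over their substrings.
    for wrong in sorted(PERSONAL_VOCAB, key=len, reverse=True):
        out = re.sub(rf"\b{re.escape(wrong)}\b", PERSONAL_VOCAB[wrong],
                     out, flags=re.IGNORECASE)
    return out

print(personalize("email donald at donald dot com about the cube config"))
# → email Donold at donold.com about the kubeconfig
```

A static replacement table obviously isn't adaptation of the acoustic model itself, but it catches exactly the "name and jargon" cases the comment raises.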

4/6/2026, 9:17:55 PM


by: romeroej

Always Mac. When Windows? Why can't you just make things multi-OS?

4/6/2026, 9:51:19 PM


by: ipsum2

Parakeet is significantly more accurate and faster than Whisper if it supports your language.

4/6/2026, 8:06:45 PM


by: purplehat_

Hi Matt, there are lots of speech-to-text programs out there with varying levels of quality. 100% local is admirable, but it's always a tradeoff and users have to decide for themselves what's worth it.

Would you consider making available a video showing someone using the app?

4/6/2026, 9:35:41 PM


by: Supercompressor

I've been looking for the opposite: wanting to dump in text and have it read back to me, coherently. Anyone have good recommendations?

4/6/2026, 9:16:40 PM


by: mathis

If you don't feel like downloading a large model, you can also use `yap dictate`. Yap leverages the built-in models exposed through Speech.framework on macOS 26 (Tahoe).

Project repo: https://github.com/finnvoor/yap

4/6/2026, 8:40:58 PM


by: hyperhello

Feature request (or beg): let me play a speech video and have it transcribed for me.

4/6/2026, 8:55:01 PM


by: gegtik

How does this compare to macOS's built-in Siri dictation, in quality and in privacy?

4/6/2026, 9:15:25 PM


by: guzik

Sadly the app doesn't work: there is no popup asking for microphone permission.

EDIT: I see there is an open issue for that on GitHub.

4/6/2026, 9:00:17 PM


by: aristech

Great job. What about supported languages? Does it recognise the system language?

4/6/2026, 8:57:10 PM