Hacker News Viewer

Show HN: I trained a 9M speech model to fix my Mandarin tones

by simedw on 1/31/2026, 12:51:27 AM

Built this because tones are killing my spoken Mandarin and I can&#x27;t reliably hear my own mistakes.<p>It&#x27;s a 9M Conformer-CTC model trained on ~300h (AISHELL + Primewords), quantized to INT8 (11 MB), runs 100% in-browser via ONNX Runtime Web.<p>Grades per-syllable pronunciation + tones with Viterbi forced alignment.<p>Try it here: <a href="https:&#x2F;&#x2F;simedw.com&#x2F;projects&#x2F;ear&#x2F;" rel="nofollow">https:&#x2F;&#x2F;simedw.com&#x2F;projects&#x2F;ear&#x2F;</a>

https://simedw.com/2026/01/31/ear-pronunication-via-ctc/
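
For readers unfamiliar with the "Viterbi forced alignment" step mentioned in the post, here is a minimal sketch of how CTC forced alignment generally works. It is a generic textbook-style illustration, not the author's code. The function name `ctc_forced_align`, the toy three-symbol vocabulary, and the convention that index 0 is the CTC blank are all assumptions made for the example. Given per-frame log-probabilities and the expected token sequence, it finds the single best path through the blank-interleaved targets and reports which frames each token covers.

```python
import math

NEG_INF = float("-inf")

def ctc_forced_align(log_probs, targets, blank=0):
    """Return an inclusive (start_frame, end_frame) span for each target token.

    log_probs: list of T frames, each a list of per-symbol log-probabilities.
    targets:   list of token ids the speaker was expected to say.
    """
    if not targets:
        return []
    # Interleave blanks: [blank, t1, blank, t2, ..., blank]
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    T, S = len(log_probs), len(ext)
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A path may start on the leading blank or on the first token
    score[0][0] = log_probs[0][blank]
    score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay, advance by one, or skip a blank
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            prev = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][prev] == NEG_INF:
                continue  # state not reachable yet
            score[t][s] = score[t - 1][prev] + log_probs[t][ext[s]]
            back[t][s] = prev
    # A valid path ends on the last token or on the trailing blank
    s = S - 1 if score[T - 1][S - 1] >= score[T - 1][S - 2] else S - 2
    if score[T - 1][s] == NEG_INF:
        raise ValueError("too few frames for the target sequence")
    # Walk the backpointers and collect the frames spent on each token
    spans = [None] * len(targets)
    for t in range(T - 1, -1, -1):
        if s % 2 == 1:  # odd positions in ext are real tokens
            i = s // 2
            end = spans[i][1] if spans[i] else t
            spans[i] = (t, end)
        s = back[t][s]
    return spans

# Toy input: 5 frames over the vocabulary {0: blank, 1: "A", 2: "B"}
toy = [[math.log(p) for p in row] for row in [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]]
print(ctc_forced_align(toy, [1, 2]))  # [(1, 2), (3, 3)]
```

Once each syllable has a frame span, a per-syllable score could plausibly be derived from the model's log-probabilities inside that span; how the app actually scores syllables is not stated in this thread.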

Comments

by: dapangzi

Longtime lurker, made an account specifically to give feedback here as an intermediate speaker. :)<p>This is a great initiative and I hope to see more come out of this; I am not criticizing, but just want to provide my user experience here so you have data points.<p>In short, my experience lines up with your native speakers.<p>I found that it loses track of the phonemes when speaking quickly, and tones don&#x27;t seem to line up when speaking at normal conversational speed.<p>For example, if I say 他是我的朋友 at normal conversational speed, it will assign `de` to 我, and sometimes it decides that I didn&#x27;t have the retroflex in `shi` and renders it `si`. I listened back to make sure I said everything; the phonemes are there in the recording, but the UI displays the wrong phonemes and tones.<p>By contrast, if I speak slowly and really push each tone, the phonemes and tones all register correctly.<p>Also, is this taking into account tone transformation? For example, third tones (bottom out tone) tend to smoosh into a second tone (rising) when multiple third tones are spoken in a row. Sometimes the first tone influences the next tone slightly, etc.<p>Again, great initiative, but I think it needs a way to deal with speech that is conversationally spoken and maybe even slurred a bit due to the nature of conversational level speech.

1/31/2026, 2:17:57 AM
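
The tone transformation raised in the comment above is usually called third-tone sandhi: a third tone directly followed by another third tone is pronounced as a second tone, so 你好 (ni3 hao3) is spoken as ni2 hao3. A grader that compares against dictionary tones would flag such a syllable as wrong unless the expected tones are adjusted first. The sketch below is a deliberately simplified illustration of that adjustment. The function name `apply_third_tone_sandhi` and the integer tone encoding (1 to 4, with 5 for neutral) are assumptions for the example, and real speech groups longer runs of third tones by phrasing, which this ignores. Whether the app does anything like this is not stated in the thread.

```python
def apply_third_tone_sandhi(tones):
    """Return the expected spoken tones for a list of dictionary tones.

    Simplified rule: every third tone that is immediately followed by
    another (dictionary) third tone becomes a second tone. Prosodic
    grouping of longer runs is ignored in this sketch.
    """
    spoken = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 3 and tones[i + 1] == 3:
            spoken[i] = 2
    return spoken

print(apply_third_tone_sandhi([3, 3]))     # 你好: [2, 3]
print(apply_third_tone_sandhi([3, 4, 3]))  # no adjacent third tones: [3, 4, 3]
```

Because the rule reads from the original list and writes to a copy, a run such as `[3, 3, 3]` comes out as `[2, 2, 3]`, which is only one of the readings a native speaker might produce.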


by: olalonde

Feedback: it might be a mic issue but my wife, who is a native speaker, seems to get most characters wrong according to the app. I will try again later in a quieter environment to see if that helps.

1/31/2026, 6:06:26 AM


by: ecshafer

For anyone who is a native speaker of a European language and hasn&#x27;t tried to learn Chinese or some other tonal language, it&#x27;s really hard to understand how hard it is. The tones can be very subtle, and your ear is not fine-tuned to them. So you <i>think</i> you are saying it right, but native speakers have no idea what you are saying.

1/31/2026, 2:21:27 AM


by: bunderbunder

This is very cool, but from one Mandarin learner to another I’d caution against relying too heavily on any external feedback mechanism for improving your pronunciation.<p>If you can’t easily hear your pronunciation mistakes so clearly it hurts, consider putting more energy into training your ear. Adult language learners usually have brains that have become resistant to, but not incapable of, changing the parts of the brain responsible for phoneme recognition. The neuroplasticity is still there but it needs some nudging with focused exercises that make it clear to your brain exactly what the problem is. Minimal pair recognition drills, for example, are a great place to start.<p>It’s not the most fun task, but it’s worth it. You will tighten the pronunciation practice feedback loop much more than is possible with external feedback, so a better accent is the most obvious benefit. But beyond that, it will make a night and day difference for your listening comprehension. And that will get you access to more interesting learning materials sooner. Which hopefully increases your enjoyment and hence your time on task. Plus, more accurate and automatic phoneme recognition leaves more neurological resources free for processing other aspects of your input materials. So it may even help speed things like vocabulary and grammar acquisition.

1/31/2026, 4:37:27 AM


by: vunderba

When I was living in Taiwan, one of the ways I forced myself to remember to pronounce the tones distinctly was by waving my hand in front of me, tracing the arc of each character’s tone.<p>It helped a lot even if I did look like an insane expat conducting an invisible orchestra.<p>One more thing: there&#x27;s quite a bit of variation in how regional accents in the mainland can affect tonal pronunciation. It might be worth reaching out to some native speakers to give you some baseline figures.

1/31/2026, 1:32:18 AM


by: frozennothing

This is really cool. Thank you for sharing. Before now I had not sought to understand how this technology works under the hood, but seeing it done at this scale made me curious to see if I could do something similar.

1/31/2026, 5:00:27 AM


by: tifan

Well, it would work only when I speak word by word, not as a sentence or at a normal speed for daily conversations. The model thinks I was making mistakes when I speak casually (as a native Chinese speaker, I had Mandarin 2A certification, which is required for teachers or other occupations that require a very high degree of Mandarin accuracy). You wouldn’t really notice it but pronunciation is very different between casual and formal speech…

1/31/2026, 4:02:53 AM


by: memalign

I wish this had a pinyin mode…! I am learning to speak Mandarin but I am not learning to read&#x2F;write.<p>( I’m learning using a flashcards web app I made and continue to update with vocab I encounter or need: <a href="https:&#x2F;&#x2F;memalign.github.io&#x2F;m&#x2F;mandarin&#x2F;cards&#x2F;index.html" rel="nofollow">https:&#x2F;&#x2F;memalign.github.io&#x2F;m&#x2F;mandarin&#x2F;cards&#x2F;index.html</a> )

1/31/2026, 4:11:28 AM


by: rahimnathwani

This is incredible. When I was first learning Chinese (casually, ~20 years ago), my teacher used some Windows software that drew a diagram of the shape of my pronunciation, so she could illustrate what I was getting wrong in some objective way.<p>The thing you&#x27;ve built is so good, and I would have loved to have it when I was learning Mandarin.<p>I tried it with a couple of sentences and it did a good job of identifying which tones were off.

1/31/2026, 1:36:45 AM


by: cocoa19

Have you tried the Azure Speech Studio? I wonder how your custom model compares to this solution.<p>I played around with python scripts for the same purpose. The AI gives feedback that can be transformed to a percentage of correctness. One annoyance is that for Mandarin, the percentage is calculated at the character level, whereas with English, it gives you a more granular score at the phoneme level.

1/31/2026, 4:52:23 AM


by: rablackburn

&gt; And if there’s one thing we’ve learned over the last decade, it’s the bitter lesson: when you have enough data and compute, learned representations usually beat carefully hand-tuned systems.<p>There are still holdouts!<p>Come back to me in a couple of decades when the trove of humanity&#x27;s data has been pored over and drifted further out of sync with (verifiable) reality.<p>Hand-tuning is the only way to make progress when you&#x27;ve hit a domain&#x27;s limits. Go deep and have fun.

1/31/2026, 4:03:29 AM


by: stuxnet79

How difficult would it be to adapt this to Cantonese? It is a surprisingly difficult language to learn. It has more tones than Mandarin plus comparatively less access to learning resources (in my experience)

1/31/2026, 3:22:25 AM


by: ChadNauseam

This is amazing. I&#x27;m also working on free language learning tech. (I have some SOTA NLP models on huggingface and a free app.) My most recent research is a list of every phrase [0].<p>Pronunciation correction is an insanely underdeveloped field. Hit me up via email&#x2F;twitter&#x2F;discord (my bio) if you&#x27;re interested in collabing.<p>[0]: <a href="https:&#x2F;&#x2F;gist.github.com&#x2F;anchpop&#x2F;acbfb6599ce8c273cc89c7d1bb363e93" rel="nofollow">https:&#x2F;&#x2F;gist.github.com&#x2F;anchpop&#x2F;acbfb6599ce8c273cc89c7d1bb36...</a>

1/31/2026, 3:13:44 AM


by: affogarty

This is extremely cool, although I asked my wife (who is Chinese) to try it out and it said she made some mistakes.

1/31/2026, 2:02:37 AM


by: SequoiaHope

Amazingly I just did the same thing! Only with AISHELL. It needs work. I used the encoder from the Meta MMS model.<p><a href="https:&#x2F;&#x2F;github.com&#x2F;sequoia-hope&#x2F;mandarin-practice" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;sequoia-hope&#x2F;mandarin-practice</a>

1/31/2026, 3:09:43 AM


by: baby

For people trying to say the &quot;j&quot; sound correctly, as in &quot;jiu&quot; (old), just say &quot;dz&quot;, so in that example &quot;dziu&quot;

1/31/2026, 3:35:40 AM


by: byb

Neat. A personal tone trainer. Seriously, shut up and take my money now. Of course, it needs a vocabulary trainer, and zhuyin&#x2F;traditional character support.

1/31/2026, 3:09:08 AM


by: jrockway

Interesting application! A friend of mine built a model like this to help her make her voice more feminine, and it is neat to see a similar use case here.

1/31/2026, 2:58:14 AM


by: bytesandbits

great work! I am going to try it out. Currently about to learn some Mandarin to be able to talk with hawker stand owners for a trip I am doing soon. I am trilingual and can speak a few languages on top of that, but none of them tonal. I am new to tonal languages and I find myself struggling with this... a lot!

1/31/2026, 2:25:51 AM


by: nirvanatikku

talk about 30 seconds to wow. great app, UX and demo. would love to use this. kudos.

1/31/2026, 2:39:56 AM


by: jellojello

This is amazing, if you feel like opening an entire language to being learned more easily.. Farsi is a VERY overlooked language, my wife&#x2F;her family speak it but it&#x27;s so difficult finding great language lessons (it&#x27;s also called Persian&#x2F;Dari)

1/31/2026, 1:25:53 AM


by: cmuguythrow

Awesome idea!

1/31/2026, 2:39:31 AM


by: dionian

it heard wu2 but i heard wo2 from you fine. and it should sound like wo2 not wo3 if spoken quickly. not a native speaker though so i could be wrong

1/31/2026, 3:00:30 AM


by: btrlsnqtn

The article mentions the bitter lesson. I&#x27;m confused about the status of Sutton&#x27;s opinion of the bitter lesson. On the one hand, he invented the concept. On the other hand, he appears to be saying that LLMs are not the correct approach to artificial intelligence, which to a naive outsider looks like a contradiction. What gives?

1/31/2026, 2:00:33 AM


by: drekipus

instantly awesome.<p>I suck at Chinese but I want to get better and I&#x27;m too embarrassed to try and talk with real people and practise.<p>This is a great compromise. even just practising for a few minutes I already feel way more confident based on its feedback, and I feel like I know more about the details of pronunciation.<p>I&#x27;m worried this might get too big and start sucking like everything else.

1/31/2026, 1:48:07 AM


by: iamanllm

holy crap, I was literally imagining how I wanted something exactly like this yesterday! you are a hero!

1/31/2026, 3:53:36 AM


by: funkyfiddler369

[flagged]

1/31/2026, 1:55:03 AM