MiMo-V2.5 Voice
Bilingual ASR for dialects, code-switching, and songs
Our Take
Xiaomi just dropped an 8B open-source speech model that actually competes with Whisper on accuracy. MiMo-V2.5-ASR handles eight Chinese dialects, code-switched Chinese-English speech, and song lyrics, with no language-tagging post-processing required. The numbers back it up: 5.73% WER on English versus Whisper's 7.44%, 19.55% on Wu dialect versus FunASR's 29.08%, and 3.95% on lyrics. It's MIT licensed, free, and self-hostable. It also addresses what the benchmark tables won't tell you: most ASR models look amazing on clean studio data and then quietly fail in production, where audio is noisy, speakers overlap, and people switch languages mid-sentence. This is the move for voice product teams building bilingual or Chinese-language pipelines who need accuracy that actually holds up outside the lab.
MiMo-V2.5-ASR is an 8B open-source speech recognition model from Xiaomi that transcribes Mandarin, English, eight Chinese dialects, code-switched speech, and song lyrics. It is aimed at ML engineers, researchers, and developers building real-world voice applications.
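For anyone who wants to sanity-check the error rates quoted above on their own audio, here is a minimal sketch of how word error rate (WER) is usually computed: word-level edit distance divided by the number of reference words. The function names are ours, the whitespace tokenization is a simplifying assumption, and we don't know which text normalization the published benchmarks used (Chinese is often scored per character rather than per word), so treat this as an illustration of the metric and not a reproduction of those numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are split on whitespace, with no case or
    punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction of the baseline error rate that the candidate removes."""
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") plus one deletion ("the"): 2 errors in 6 words.
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.3f}")
    # Relative improvement implied by the figures quoted in the take above.
    print(f"English vs Whisper: {relative_reduction(7.44, 5.73):.1%}")
    print(f"Wu dialect vs FunASR: {relative_reduction(29.08, 19.55):.1%}")
```

Run against the quoted figures, the second helper puts the gap in relative terms: roughly 23% fewer errors than Whisper on English and roughly 33% fewer than FunASR on Wu dialect.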