Our Take
A team of four Gen Z builders said "we're going to make the best voice AI on the planet" and then just... did it. Shijia Liao is the CEO, and co-founder Jiahua Liu, who's at Stanford, by the way, brings the academic firepower. Fish Audio's S2 model just dropped, and it's their most advanced text-to-speech system yet. They hit $5 million in ARR and 420,000 monthly active users with a four-person team. FOUR PEOPLE. That's not a startup. That's a cheat code.
They came out of HF0, which is basically the incubator for people who are already cracked at AI before they even start a company. Fish Audio does voice cloning, text-to-speech, and speech-to-speech that sounds so human it's kind of unsettling in the best way possible. Their API serves developers building everything from audiobook platforms to AI companions, plus game studios that need dynamic voice acting without hiring a thousand voice actors.
The voice AI space is getting crowded, but Fish Audio is swimming past everyone else because they built their own models from scratch instead of fine-tuning someone else's work. When a Gen Z team is outpacing companies with 10x their headcount and 100x their funding, you pay attention. Fish Audio is the real deal.