Two undergrads. Zero funding. And yet Korean startup Nari Labs has just released Dia, an open-source text-to-speech (TTS) model that, in side-by-side comparisons, is already outperforming commercial offerings from ElevenLabs and Sesame.
Dia is a 1.6B-parameter model that supports:
– Emotional tones (happy, sad, angry, etc.)
– Multiple speaker tags
– Nonverbal cues like laughter, coughs, and even screams
The project was inspired by Google's NotebookLM, and the team trained the model on compute from Google's TPU Research Cloud, which gave them free access to the hardware they needed.
In side-by-side comparisons, Dia outperformed ElevenLabs Studio and Sesame's CSM-1B on timing accuracy, expressiveness, and handling of nonverbal cues in scripts.
According to founder Toby Kim, Nari Labs plans to build a consumer-facing app for social content creation and remixing using the Dia model.
Dia isn’t just a technical breakthrough—it’s a cultural moment.
It proves Sam Altman’s idea that “you can just do things” is more real than ever.
With zero VC backing and no formal research pedigree, two students built a TTS model that rivals industry leaders.
AI is democratizing innovation. If you've ever thought about building something... this is your sign.