Nari Labs Dia Outperforms ElevenLabs, Sesame CSM-1B

Sesame and ElevenLabs offer some realistic-sounding voice models. The Dia-1.6B model can give them a run for their money. It generates realistic dialogue from a transcript and comes with emotion and tone control, so it can produce laughter, coughing, and other nonverbal sounds.

Here is what a prompt looks like when using this model, with each speaker turn marked by a tag such as [S1] or [S2]:

[S1] Oh fire! Oh my goodness! What’s the procedure? What do we do people? The smoke could be coming through an air duct!

[S2] Oh my god! Okay.. it’s happening. Everybody stay calm!

[S1] What’s the procedure…

[S2] Everybody stay fucking calm!!!… Everybody fucking calm down!!!!! [S1] No! No! If you touch the handle, if it’s hot there might be a fire down the hallway!
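A transcript in this shape is easy to assemble programmatically. The sketch below is only an illustration of the tagging convention seen in the prompt above: `build_dialogue_prompt` is a hypothetical helper, not part of any Dia library, and it assumes that turns can be joined into a single string separated by spaces, as the last line of the example does. It does not call the model itself.

```python
from typing import Iterable, Tuple


def build_dialogue_prompt(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_number, text) pairs into one tagged transcript string.

    Hypothetical helper: it only mirrors the [S1]/[S2] tagging shown in the
    example prompt and makes no claim about how the model parses its input.
    """
    parts = []
    for speaker, text in turns:
        if speaker < 1:
            raise ValueError("speaker numbers start at 1")
        # Prefix each turn with its speaker tag, e.g. "[S1] ..."
        parts.append(f"[S{speaker}] {text.strip()}")
    # Assumption: turns may share one line, separated by a space
    return " ".join(parts)


turns = [
    (1, "What's the procedure?"),
    (2, "Everybody stay calm!"),
    (1, "What's the procedure..."),
]
prompt = build_dialogue_prompt(turns)
print(prompt)
```

Keeping the turns as structured pairs until the last moment makes it simple to reorder speakers or insert extra lines before the text is handed to whatever interface runs the model.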

You can test this tool on Hugging Face.
