Nari Labs Dia Outperforms ElevenLabs, Sesame CSM-1B

Sesame and ElevanLabs have some realistic sources. The Dia-1.6B model can give them a run for their money. It generates realistic dialog from a transcript. It comes with emotion and tone control, so it can produce laughter, coughing, and other nonverbal communications.

Here is how a prompt would look like when using this model:

[S1] Oh fire! Oh my goodness! What’s the procedure? What to we do people? The smoke could be coming through an air duct!

[S2] Oh my god! Okay.. it’s happening. Everybody stay calm!

[S1] What’s the procedure…

[S2] Everybody stay fucking calm!!!… Everybody fucking calm down!!!!! [S1] No! No! If you touch the handle, if its hot there might be a fire down the hallway!

You can test this tool on Hugging Face.

What's Hot

How to Generate Animal Olympics Videos with AI

MeiGen-MultiTalk: Open Source Audio Driven Multi-Person Videos

Hunyuan-A13B Open Source LLM Debuts, Competes with o1, DeepSeek

ElevenLabs Voice Design v3 Announced

OpenAI Introduces New Stunning AI Audio Models

Mureka Now Gets MCP Integration

Sidekick Spatial AI Companion for Vision Pro

ChatGPT Search Now Available without a Sign Up

ElevenLabs Voice Design v3 Announced

OpenAI Introduces New Stunning AI Audio Models

Most Popular

Prompt Cannon: Run Prompts Across Multiple Models

GPTARS: GPT Powered TARS Robot

Simple Grok 2 Jailbreak

Our Picks

How to Generate Animal Olympics Videos with AI

MeiGen-MultiTalk: Open Source Audio Driven Multi-Person Videos

Hunyuan-A13B Open Source LLM Debuts, Competes with o1, DeepSeek

What's Hot

Nari Labs Dia Outperforms ElevenLabs, Sesame CSM-1B

Related Posts