OpenAI Introduces New Stunning AI Audio Models

AI audio has become quite realistic in recent years. ElevenLabs, Hume, Sesame, and others already offer many ways to generate realistic audio for your projects. OpenAI has now launched new speech-to-text and text-to-speech audio models in its API, so you can build customizable voice agents. You can now instruct the new text-to-speech model to speak in specific ways.

As OpenAI explains on its blog, you can ask for calm, professional, and various other voice styles. The speech-to-text models, gpt-4o-transcribe and gpt-4o-mini-transcribe, offer improved language recognition and accuracy. An interactive demo is available for developers who want a better sense of how these models work.
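
To give a rough idea of what steering the voice looks like in practice, here is a minimal sketch that builds a request for OpenAI's text-to-speech endpoint using only the Python standard library. It is not taken from OpenAI's announcement: the model name gpt-4o-mini-tts, the voice "coral", and the helper names build_speech_request and synthesize_to_file are assumptions chosen for illustration, so check the current API reference before relying on them.

```python
import json
import urllib.request

# Text-to-speech endpoint of the OpenAI API.
API_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, instructions, api_key,
                         model="gpt-4o-mini-tts", voice="coral"):
    """Build (but do not send) a text-to-speech request.

    The model and voice defaults are assumptions for this sketch.
    `instructions` is the free-text field that describes how to speak.
    """
    payload = {
        "model": model,
        "voice": voice,
        "input": text,                 # the words to be spoken
        "instructions": instructions,  # the speaking style, e.g. calm
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request and write the returned audio bytes to `path`.

    Hypothetical helper: it needs a valid API key and network access.
    """
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

With a real key, calling synthesize_to_file(build_speech_request("Your order has shipped.", "Speak in a calm, professional tone.", api_key), "speech.mp3") would save the generated audio. Keeping request construction separate from sending makes it easy to inspect or log the payload before spending API credits.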

