Google has already taken the lead in many areas with Gemini 3 Pro, and it also has very capable TTS models. The latest updates to the Gemini 2.5 Flash and Pro TTS models give you more control over style, tone, pace, and accents. The models now adjust pacing based on context and follow instructions more closely.
We’re launching Gemini 2.5 Flash and Pro Text-to-Speech (TTS) model updates 🚀
Improvements include:
– Emotional style and tone versatility
– Context-aware pacing control
– Improved multiple-speaker capabilities
Dive into the blog to learn how these advancements are giving…
— Google AI Developers (@googleaidevs) December 10, 2025
Here is a sample prompt you can use to generate your own voice:
ASMR Pro

# AUDIO PROFILE: Willow T.
## "The ASMR Whisperer"

## The Scene:
Recorded inside a converted Sprinter van parked near Burleigh Heads. The space is small and padded with tapestries and macramé, creating a very "dry" but warm acoustic environment. The microphone is a Neumann KU 100 Dummy Head (binaural), meaning the audio should pan slightly left and right as the character moves, simulating 3D space.

### DIRECTORS NOTES
Style: Relaxed Gold Coast bohemian style ASMR content creator.
Accent: Gold Coast, Australia
The "Grounding" Breath: Deep, diaphragmatic exhales that sound like ocean waves. Not sharp, but long and audible releases of air.
Wetness/Mouth Sounds: Essential for ASMR. The listener should hear the sticky, subtle sounds of the tongue moving against the roof of the mouth (the "clicks" and "smacks") between words.

Prosody & Pacing:
The "Drift": The tempo is incredibly slow and liquid. Words bleed into each other. There is zero urgency.
The "Smile" filter: The voice must sound like the speaker is constantly smiling. This brightens the tone even when whispering.
High Rising Terminal (Softened): The classic Australian upward inflection at the end of sentences, but slowed down. It shouldn't sound questioning, just open and inviting.

Tone & Articulation:
The Gold Coast Vowel Shift: "I" (as in "light") becomes a wide, slow "loit" or "lah-ee-t." "O" (as in "no") drifts into the classic Aussie "naur," but breathy and soft, not harsh.
Sibilance: The 'S' sounds should be prominent but crisp, creating a high-frequency "tingle" trigger.
Vocal Fry (The "Morning Voice"): A rumbly, relaxed texture in the lower register, sounding like they just woke up from a nap on the beach.
You can try these new models in Google AI Studio. There is also a playground app for experimenting with them.
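If you would rather call the models from code than from AI Studio, here is a minimal Python sketch of how a style prompt like the one above might be sent to a TTS model. Several things here are assumptions rather than details from the announcement: the `google-genai` SDK, the model ID `gemini-2.5-flash-preview-tts`, the prebuilt voice name `Kore`, an API key in the `GEMINI_API_KEY` environment variable, and raw 16-bit mono PCM output at 24 kHz. The helper names and the `### TRANSCRIPT` heading are hypothetical, chosen only for illustration.

```python
import io
import wave


def build_tts_prompt(profile: str, transcript: str) -> str:
    """Join a style profile and the text to be spoken into one prompt.

    The "### TRANSCRIPT" heading is a hypothetical convention for this sketch.
    """
    return f"{profile.strip()}\n\n### TRANSCRIPT\n{transcript.strip()}"


def pcm_to_wav_bytes(pcm: bytes, rate: int = 24000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container.

    Defaults assume 24 kHz, mono, 16-bit audio; adjust if the API differs.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return buf.getvalue()


def synthesize(prompt: str,
               model: str = "gemini-2.5-flash-preview-tts",  # assumed model ID
               voice: str = "Kore") -> bytes:                 # assumed voice name
    """Request speech for `prompt` and return the raw audio bytes.

    Requires `pip install google-genai` and an API key in the environment.
    """
    # Imported lazily so the helpers above work without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()
    response = client.models.generate_content(
        model=model,
        contents=prompt,
        config=types.GenerateContentConfig(
            response_modalities=["AUDIO"],
            speech_config=types.SpeechConfig(
                voice_config=types.VoiceConfig(
                    prebuilt_voice_config=types.PrebuiltVoiceConfig(
                        voice_name=voice
                    )
                )
            ),
        ),
    )
    return response.candidates[0].content.parts[0].inline_data.data
```

A typical use would be `pcm = synthesize(build_tts_prompt(profile, "Welcome back, everyone."))`, followed by writing `pcm_to_wav_bytes(pcm)` to a `.wav` file. Keeping the profile separate from the transcript makes it easy to reuse one voice description across many lines of dialogue.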

