ElevenLabs Eleven v3: Expressive AI Voice Generator in 70 Languages


ElevenLabs, known for its AI voice technology, has released an alpha version of its latest text-to-speech model, "Eleven v3."

The company calls it its most expressive model to date. It cites gains in audio realism, a wider range of human emotion, and support for more than 70 languages.

Features of Eleven v3

Eleven v3 goes well beyond plain text narration.

It can add performance elements such as laughter and whispering to speech. It can also convey emotions like anger, sadness, or excitement.

Users control these effects by inserting short bracketed cues, such as [whispers] or [laughs], directly into the text.

According to the company, the model can shift tone mid-sentence and can even sing or switch dialects when instructed to.


Another notable feature is a new Dialogue Mode. It generates natural-sounding conversations between multiple speakers, complete with spontaneous interruptions and realistic shifts in emotion.

ElevenLabs has also expanded language support in the new model from 33 to more than 70 languages. The company says this covers roughly 90% of the world's population.

Full Control Over Emotions, Intonation, and Cues

Commenting on this development, Mati Staniszewski, co-founder and CEO of ElevenLabs, stated, "This release is the result of the vision and leadership of my co-founder, Piotr [Dabkowski], and the incredible research team he has built."

"Eleven v3 is the most expressive text-to-speech model ever, offering full control over emotions, intonation, and non-verbal cues," Staniszewski added.

He added that building a good product is hard, but creating an entirely new paradigm is nearly impossible. He said the team was excited to push those boundaries once again.

Eleven v3 is designed for creators, developers, and businesses that want to produce highly expressive audio content.

Target applications include audiobooks, character-driven storytelling, video game dialogue, educational materials, and professional voiceovers.

Access and Current Limitations

The alpha version of "Eleven v3" is currently available through the company's website (elevenlabs.io).

To mark the launch, the company is offering an 80% discount on v3 usage through its user interface until the end of June.

ElevenLabs cautions that the model is still experimental, so the current version may require more prompt engineering. Its latency is also not yet well suited to real-time applications or live conversation.

For those use cases, the company recommends sticking with its other models, such as "v2.5 Turbo" and "Flash."

A low-latency version of v3 is, however, in development.

A public API is also expected soon, which will let developers integrate the model into their own applications.