Hume EVI 3: New Generation of AI Voices Rivals GPT-4o and Gemini Live

AI research company Hume AI has unveiled the third generation of its Empathic Voice Interface, known as EVI 3.

The company noted that the new model doesn’t just understand speech; it also reads human emotion and lets users create custom AI voices that are realistic and expressive.

EVI 3’s launch comes as tech giants race to develop AI models capable of more natural and personal communication.

Hume reported that EVI 3 can interact with users through a wide range of human-like voices.

Instead of a fixed list of voice attributes, users can now describe desired voice characteristics in natural language, and the model then creates that voice.

This feature opens the door to unique voice experiences, whether the desired voice is an “old-time comedian,” a “seasoned life coach,” or even the philosopher David Hume, after whom the company is named.

EVI 3 interface, showing the model’s ability to analyze emotions such as calmness and enthusiasm in voice conversations. Photo: Hume AI

Performance Against Competitors

Hume, as stated on its website, aims to “ensure AI is built to serve human goals and emotional well-being.”

While this mission is reminiscent of those stated by major AI companies such as OpenAI, Hume distinguishes itself by focusing on the “authenticity” of its models.

The goal here is for conversations to sound genuine, including spontaneous pauses or slight hesitations, rather than just robotic voice simulation.

In this context, internal tests conducted by Hume showed that EVI 3 performed strongly against other leading voice models.

According to the company’s blog, EVI 3 excelled in areas such as “emotion/style modulation” during conversation and in its ability to “understand emotions” in users’ voices, surpassing models such as OpenAI’s GPT-4o on these specific metrics.

The company also noted that preliminary tests showed EVI 3 responding faster than GPT-4o and Gemini Live under certain conditions.

Accessing EVI 3

Currently, interested users can experience EVI 3 through a live demo on the “hume.ai” website and via the company’s iOS app.

An API for developers is scheduled to be available in the coming weeks, opening the door to integrating this technology into various applications and services, from customer support systems to interactive stories and games.

The company has not yet announced pricing for the EVI 3 API, but because EVI 2 is priced per minute of use ($0.072 per minute), EVI 3 pricing is expected to be usage-based as well.
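To make the per-minute figure concrete, here is a minimal sketch of how a usage-based bill could be estimated. It assumes the EVI 2 rate of $0.072 per minute purely as a placeholder, since EVI 3 pricing has not been announced; the function name `estimate_cost` and the constant name are illustrative and not part of any Hume API.

```python
from decimal import Decimal, ROUND_HALF_UP

# EVI 2 per-minute rate quoted in the article. EVI 3 pricing is unannounced,
# so applying this rate to EVI 3 is an assumption for illustration only.
EVI2_RATE_PER_MINUTE = Decimal("0.072")


def estimate_cost(minutes, rate_per_minute=EVI2_RATE_PER_MINUTE):
    """Return the estimated usage-based cost in dollars, rounded to cents."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # Convert via str() so float inputs do not carry binary rounding noise.
    cost = Decimal(str(minutes)) * rate_per_minute
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # 1,000 minutes of conversation at the EVI 2 rate
    print(estimate_cost(1000))  # 72.00
```

At that rate, a support line handling 1,000 minutes of calls a month would cost about $72; passing a different `rate_per_minute` lets the estimate be redone once actual EVI 3 pricing is published.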

The model currently focuses on English, with future plans to support other major languages like French, Spanish, German, and Italian, following further training and general release.

It’s worth noting that voice cloning, a feature offered by other companies, is not currently available in EVI 3.

Hume focuses instead on flexible voice customization and emphasizes the importance of safeguards and ethical considerations before releasing such features widely. There are indications, however, that this capability might be added to its Octave text-to-speech model in the future.
