OpenAI Unveils Groundbreaking Audio Tool with Human-like Voices

tech360.tv
Apr 2, 2024

OpenAI unveils an audio tool that can read text and mimic human voices. Limited-scale preview of the text-to-speech model, Voice Engine, shared with developers. OpenAI decides against wider release due to concerns raised by stakeholders.

OpenAI, the renowned artificial intelligence company, has offered a first look at its latest innovation: an audio tool capable of reading text aloud and mimicking human voices. The technology showcases the potential of AI while also raising concerns about the risks of deepfake content.

The company recently shared early demos and use cases from a limited-scale preview of its text-to-speech model, Voice Engine. OpenAI has granted access to approximately 10 developers so far but has decided against a wider release of the feature, as it revealed in a recent briefing with reporters.

OpenAI made this decision after receiving feedback from various stakeholders, including policymakers, industry experts, educators, and creatives. Initially, the company had planned to offer the tool to up to 100 developers through an application process. However, it recognised the serious risks of generating speech that closely resembles real people's voices, particularly in an election year.

In a blog post, OpenAI stated, "We are engaging with US and international partners from across government, media, entertainment, education, civil society, and beyond to ensure we are incorporating their feedback as we build." The company is actively seeking input and collaboration to address the potential challenges and implications of this technology.

AI has been used to create fake voices before, but OpenAI's Voice Engine goes a step further by generating speech that sounds like specific individuals, complete with their unique cadence and intonation. The software requires only a 15-second audio recording of a person speaking to recreate their voice convincingly.

During a demonstration, Bloomberg listened to a clip of OpenAI's CEO, Sam Altman, explaining the technology in a voice that was indistinguishable from his own, despite being entirely AI-generated. Jeff Harris, a product lead at OpenAI, described the voice as "human-caliber" when the right audio setup is in place. However, he stressed the need for caution, given how sensitive it is to mimic human speech accurately.

One of OpenAI's current partners, the Norman Prince Neurosciences Institute at Lifespan, a not-for-profit health system, is using the tool to help patients regain their voices. In one case, the technology was used to restore the speech of a young patient who had difficulty speaking clearly because of a brain tumour. Working from earlier recordings of her voice, the tool enabled her to communicate effectively for a school project.

OpenAI's custom speech model can also translate the generated audio into other languages, which makes it valuable for companies in the audio business such as Spotify Technology SA. Spotify has already incorporated the technology into a pilot programme to translate podcasts by popular hosts such as Lex Fridman. OpenAI also highlighted its potential for creating a wider range of voices for educational content aimed at children.

To ensure responsible use, OpenAI has implemented strict policies for its testing programme. Partners must obtain consent from the original speaker before using their voice and must disclose to listeners that the voices they hear are AI-generated. Additionally, OpenAI is embedding inaudible watermarks in the audio so that content created by its tool can be distinguished from audio from other sources.

Before considering a broader release, OpenAI is seeking feedback from external experts. The company aims to foster a global understanding of where the technology is heading, regardless of whether it ultimately deploys it at scale. OpenAI also hopes that this preview will spur efforts to strengthen society's resilience against the challenges posed by advanced AI technologies.

In its blog post, OpenAI called on banks to phase out voice-based authentication as a security measure and advocated public education about deceptive AI content. It also emphasised the importance of developing techniques to detect whether audio content is real or AI-generated.

Source: BLOOMBERG