Artificial intelligence (AI) that can imitate human voices is already a reality. Unlike the robotic speech of virtual assistants such as Siri, Alexa or Cortana, this new technology can reproduce real speech patterns and tone, and can even give speech an emotional charge. Although it represents a major technological advance, one that could also help with the inclusion of people with disabilities, the technology is surrounded by controversy, including copyright issues, potential job losses for voice actors and its use in scams. Below, learn how this technology works, its potential uses, and its risks.
What is voice-cloning AI and how does it work?
Like already popular chatbots such as Bard and ChatGPT, voice-cloning artificial intelligence uses deep learning techniques to study human speech patterns and reproduce them. It is a major step forward from the synthetic voices already in use, such as those of Google's or Apple's virtual assistants, which can also convert text into speech but sound robotic, with no intonation or emotion.
The new technique combines machine learning strategies with an artificial neural network, a method used to train computers to process data in a way inspired by the human brain. On top of this, the systems are fed enormous amounts of data, such as a variety of speech patterns, vocal characteristics, languages, and regional accents. All this information is processed to build what is known as a "speech synthesis" system. As a result, these AIs can emulate human speech, read text aloud, and mimic emotions in a very realistic way.
Some programs of this kind even let you "clone" any human voice simply by uploading a short audio clip, after which the system can read any text in that person's voice. For example, Microsoft's artificial intelligence, VALL-E, can mimic someone's speech from just three seconds of audio. The tool was trained on over 60,000 hours of human speech and can convert text to speech, simulate speech patterns, and preserve the ambient sound of the original audio. Despite being based on a very small sample, the results are quite convincing.
LOVO is another text-to-speech platform that produces natural-sounding, machine-generated results. It promises to give the text an emotional charge, and also lets the user edit the audio, adjust the pace, add pauses and emphasize particular points of the speech. With over 200 human-like voices in its database, LOVO also lets users create more personalized content by cloning their own voice. However, unlike VALL-E, LOVO requires the user to read a specific script for 15 minutes before it can "clone" the voice.
What are the possible uses of voice cloning AI?
As voice synthesis artificial intelligence becomes more popular, it is hard not to think of the countless possibilities it can bring to everyday life. The first concerns accessibility: people who have lost the ability to speak will be able to use AI to communicate, turning written text into speech in their own voice. Similarly, people with visual impairments can use this technology to listen to lessons narrated by personalized, natural-sounding voices.
This technique can also be used to "communicate" with relatives who have died. With a small sample of the person's speech, it is possible to generate spoken dialogue from text and thus preserve this part of the loved one. In the same way, it would also be possible to "revive" artists who have died. There are already some examples of artificial intelligence "reviving" artists on the Internet.
Along the same lines, it is already easy to find practical examples of voice cloning spreading across social networks, such as singer Rihanna covering Beyoncé's "Cuff It" or Ariana Grande singing Anitta's "Envolver". In these cases, however, both the copyright of the songs and the use of a public figure's voice are under discussion. Since there are no specific laws for these productions, the debate still generates a lot of controversy, and everything indicates that this type of use will need to be regulated very soon.
Furthermore, one of the most controversial uses of voice-cloning artificial intelligence is the possibility of dubbing a film into different languages using the original actor's voice, or even creating animations with entirely synthetic voices. This option, which has excited studios worldwide, is a major concern for professional voice actors and has led to uncertainty about the effects this technology will have on the audiovisual industry.
What are the risks of voice cloning AI?
AI that can perform speech synthesis can bring many benefits, but the technology also carries risks that need to be pointed out. The first is that the tool can be used to spread disinformation, since it can make a public figure, such as a politician or scientist, appear to "say" fake news and other dangerous statements.
Furthermore, this technology is already being used by criminals to carry out scams. The well-known "fake kidnapping scam" has become more realistic with voice-cloning artificial intelligence. Instead of imitating the voice of the alleged victim themselves, criminals only need to play speech generated by AI, which can mimic the emotion of a person in a stressful situation. To do this, they simply need to obtain a sample of the person's voice from social media, YouTube or WhatsApp messages.
How can you tell whether a voice was generated by AI?
As speech synthesis systems become more realistic, it is becoming increasingly difficult to tell whether a voice was generated by artificial intelligence or by a human. However, there are still some ways to recognize AI-generated speech. First of all, try to spot flaws in the speech. Humans normally make some "mistakes" when speaking, such as minor stutters, lapses in fluency or irregular pauses. These markers of natural speech are not usually present in AI-generated speech.
Although these systems are capable of simulating emotions, the result is not completely believable when compared with a real person. After all, humans are complex creatures who can feel a variety of emotions at the same time. Therefore, it is worth listening for changes in tone during speech: if the tone remains very constant, the voice is likely to have been generated by a machine.
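For readers who want to experiment, the two cues described above (irregular pauses and shifts in tone) can be turned into a rough numerical check. The sketch below is only an illustration, not how any real detector works: it assumes you have already measured the pitch of each voiced segment and the length of each pause with an audio tool of your choice, and the function names and thresholds are hypothetical values chosen for the example.

```python
from statistics import mean, pstdev


def variation(values):
    """Coefficient of variation: standard deviation divided by the mean.

    Expects a non-empty list of numbers.
    """
    avg = mean(values)
    return pstdev(values) / avg if avg else 0.0


def looks_synthetic(pitches_hz, pauses_s, pitch_floor=0.05, pause_floor=0.2):
    """Flag a clip whose pitch AND pause lengths both vary very little.

    pitches_hz: pitch estimate per voiced segment, in hertz (assumed input).
    pauses_s:   duration of each pause, in seconds (assumed input).
    The two thresholds are illustrative guesses, not calibrated values.
    """
    flat_tone = variation(pitches_hz) < pitch_floor
    regular_pauses = variation(pauses_s) < pause_floor
    return flat_tone and regular_pauses


# A very even clip versus one with natural ups and downs.
even_clip = looks_synthetic([200, 201, 199, 200], [0.30, 0.31, 0.30, 0.29])
lively_clip = looks_synthetic([180, 240, 150, 210], [0.1, 0.8, 0.3, 1.2])
print(even_clip, lively_clip)
```

A check like this can only raise suspicion. Modern voice cloning can imitate hesitations and changes in tone, so a clip that passes is not necessarily human.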
In addition, as these technologies advance, it has become necessary to create dedicated tools to identify whether something was generated by artificial intelligence. Just as there are platforms that specialize in identifying whether a text was created by ChatGPT or Bard, there are also specialized tools for detecting speech created by voice-cloning AI, such as AI Voice Detector. To use it, simply access the website (aivoicedetector.com) and upload an audio file. Within moments, the tool will indicate whether the voice is real or created by artificial intelligence.
With input from Forbes, BotTalk, Microsoft, LOVO, TechSpot, Voquant, Make Use Of and MV20.