Perfect fake videos will be recognized by artificial intelligence

A year ago, Stanford's Maneesh Agrawala helped develop lip-sync technology that let video editors alter a speaker's words almost imperceptibly. The tool could seamlessly insert words that a person never spoke, even in the middle of a sentence, or delete words that they did say. The result looks realistic to the naked eye and even to many computer systems.

The tool made it much easier to fix mistakes without re-shooting entire scenes and to adapt TV shows or movies for different audiences in different places. But the technology also created disturbing new opportunities for hard-to-detect fake videos made with the clear intent of distorting the truth. A recent Republican video, for example, used a cruder technique to doctor an interview with Joe Biden.

This summer, Agrawala and colleagues at Stanford and UC Berkeley presented an artificial intelligence-based approach to detecting lip-sync fakes. The new program accurately spots more than 80 percent of fakes by recognizing minute mismatches between the sounds people make and the shapes of their mouths.

But Agrawala, director of Stanford's Brown Institute for Media Innovation and the Forest Baskett Professor of Computer Science, who is also affiliated with the Stanford Institute for Human-Centered Artificial Intelligence, warns that there is no long-term technical solution to deepfakes.

How fakes work

There are legitimate reasons for video manipulation. For example, anyone filming a fictional TV show, movie, or commercial can save time and money by using digital tools to correct errors or customize scripts.

The problem arises when these tools are deliberately used to spread false information. And many of the techniques are invisible to the average viewer.

Many deepfake videos rely on face swaps, literally superimposing one person's face onto video of another person. But while face-swap tools can be convincing, they are relatively crude and usually leave digital or visual artifacts that a computer can detect.

Lip-sync technologies, on the other hand, are subtler and therefore harder to detect. They manipulate a much smaller portion of the image and then synthesize lip movements that closely match how a person's mouth would actually move if they spoke particular words. According to Agrawala, given enough samples of a person's image and voice, a deepfake producer can make that person “say” anything.

Counterfeit detection

Concerned about unethical uses of such technology, Agrawala teamed up on a detection tool with Ohad Fried, a doctoral student at Stanford; Hany Farid, a professor at the University of California, Berkeley School of Information; and Shruti Agarwal, a doctoral student at Berkeley.

At first, the researchers experimented with a purely manual technique in which human observers studied frames of video. It worked well, but in practice it was labor-intensive and time-consuming.

The researchers then tested an artificial intelligence-based neural network that could do the same analysis much faster, after training it on videos of former President Barack Obama. The neural network spotted more than 90 percent of lip-sync fakes involving Obama himself, although accuracy for other speakers dropped to about 81 percent.
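
To make the idea of a sound-versus-mouth-shape mismatch concrete, here is a minimal sketch in Python. It is not the researchers' program: the article does not describe their implementation. The sketch assumes that each video frame has already been labeled with the sound being spoken and a mouth-openness score, and it assumes that the sounds M, B and P should be spoken with closed lips. The `Frame` type, the thresholds and the function names are all hypothetical.

```python
from dataclasses import dataclass

# Assumption for this sketch: these sounds require the lips to close.
CLOSED_MOUTH_PHONEMES = {"M", "B", "P"}


@dataclass
class Frame:
    phoneme: str           # sound spoken in this frame (hypothetical label)
    mouth_openness: float  # 0.0 = lips closed, 1.0 = wide open (hypothetical score)


def mismatch_rate(frames, open_threshold=0.3):
    """Fraction of closed-lip sounds that appear with a visibly open mouth."""
    relevant = [f for f in frames if f.phoneme in CLOSED_MOUTH_PHONEMES]
    if not relevant:
        return 0.0  # nothing to check, so report no mismatches
    mismatches = sum(1 for f in relevant if f.mouth_openness > open_threshold)
    return mismatches / len(relevant)


def looks_fake(frames, max_rate=0.2):
    """Flag a clip when too many closed-lip sounds show an open mouth."""
    return mismatch_rate(frames) > max_rate


# Toy data: in the second clip, an "M" is spoken with the mouth open.
genuine_clip = [Frame("M", 0.05), Frame("AA", 0.8), Frame("B", 0.1)]
doctored_clip = [Frame("M", 0.6), Frame("AA", 0.8), Frame("P", 0.1)]
```

In this toy version, `looks_fake(genuine_clip)` is false and `looks_fake(doctored_clip)` is true. A real detector would have to extract the sound labels and mouth measurements from the audio and video itself, which is where a trained neural network comes in.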

A real test of the truth

The researchers say their approach is merely part of a cat-and-mouse game. As deepfake techniques improve, they will leave even fewer clues behind.

Ultimately, says Agrawala, the real problem is not so much fighting deepfake videos as fighting disinformation. In fact, he notes, much of the misinformation arises from distorting the meaning of things people actually said.

“To reduce misinformation, we need to improve media literacy and develop accountability systems,” he says. “That could mean laws against deliberately producing misinformation, with consequences for breaking them, as well as mechanisms to repair the resulting harm.”
