Deepfakes are finally here, and India just became a pioneer in using deepfake technology for election campaigning. In the recently held Delhi elections, a BJP leader delivered a speech in Hindi. This was then converted into a speech in Haryanvi – not by simply dubbing it, but by replacing the original audio and using software to modify the leader's lip movements to match the sounds of Haryanvi. The result is a convincing lip sync that makes the misleading video seem quite real.
Such feats barely scratch the surface of what deepfake technologies are capable of. The powerful idea behind such a video is that the original actor's face can be replaced throughout the clip with that of another. The replacement is 'perfect' in that the ratio of the size of the face to the body looks realistic throughout the video, even across different angles and lighting conditions, and all movements look authentic. A suitably talented engineer can also alter the tone and pitch of the audio to match the new actor. The result is a seemingly 'natural' video whose true colours become visible only under closer scrutiny.
Deepfake software works using machine learning. Faces are converted to data (0s and 1s) that preserve a record of various facial features: the size of the nose, the curve of the eyes, the distance between the lips and the nose, and so on. These details capture the different expressions a particular face makes in the video. Using this information, purpose-built programmes can 'learn' how the face will look under different circumstances, while also keeping track of posture, motion, dimensions and perspective. The 'deep' of 'deepfake' stands for deep learning, the technique the machine employs to teach itself about the video, and thus produce the 'fake'.
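To make the first step concrete, here is a minimal sketch of how a face is reduced to the kind of numbers described above. It assumes the open-source dlib and OpenCV libraries and dlib's standard 68-point facial landmark model (downloaded separately); a real deepfake pipeline would feed such features into a trained neural network, which is far beyond this illustration.

```python
# Minimal sketch: extract facial landmarks, the raw "facial feature"
# numbers the article describes. Assumes dlib and OpenCV are installed
# and dlib's standard 68-point model file is available locally.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def face_landmarks(frame):
    """Return a (68, 2) array of landmark coordinates, or None if no face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float32)

# Features such as "the distance between the lips and the nose" fall
# straight out of these points (33 is the nose tip, 51 the upper lip).
landmarks = face_landmarks(cv2.imread("speaker.jpg"))  # hypothetical image
if landmarks is not None:
    lip_nose_gap = np.linalg.norm(landmarks[33] - landmarks[51])
```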
This technology is useful for creating satirical videos and spoofs, or for generating footage when an actor is unavailable (or dead). However, the nature of the technology is such that it is primed for misuse.
One of the first prominent examples of a deepfake appeared in 2018: a video of former US President Barack Obama using foul language. The comedian Jordan Peele had produced it and voiced Obama himself. As with the Delhi election video, the face is Obama's and the voice is Peele's, and the two were stitched seamlessly together by a deepfake tool, creating the spectacle of a foul-mouthed Obama. This is probably the simplest way to abuse the technology: get a politician to "say" things she never said. And if you can't locate a good mimic, software could fix that too, through simple audio morphing.
So one could, for example, grab historical footage and reprocess it to produce sensational videos of leaders saying things completely opposed to what they originally said. Imagine Jawaharlal Nehru appreciating superstitious beliefs, Adolf Hitler declaring his love for the Jews, or Winston Churchill delivering a pro-Nazi rant. History thus becomes easily falsifiable. Perhaps the present as well: imagine a leader of a global power declaring war on video against another power today, or a "seditious, anti-national" speech being planted on a political leader's person or in her office to have her arrested.
[Video: an example of a full deepfake, with faces swapped and modified audio.]
At a time when fake news flowing through social media platforms and messaging apps has already poisoned millions of minds, deepfakes will only add to the havoc. Within these apps' echo chambers, a deepfake will become the ultimate proof of users' beliefs, while among sceptics it will sow doubts strong enough to discredit the truth.
So far, the most widespread use of deepfakes has been in creating non-consensual celebrity pornography. A celebrity's face is swapped onto the body of a pornographic actor in a video, without the celebrity's consent, to create a clip that looks as if the celebrity herself had performed in it. (Most of these videos 'feature' female celebrities.) A natural extension of this is revenge porn. All that's needed is a few, often easily available, videos and images to help the software 'learn'.
The technology has also been improving by leaps and bounds. Today, deepfakes can be created using consumer-level apps. Samsung recently claimed to have built an algorithm that can generate moving faces from a single image. An impending 'upgrade' touts 'full-body faking': entire fictitious persons shown walking, sitting and changing posture. One may be able to 'generate' people as required to fill up clips of political rallies to demonstrate a leader's popularity – or an entire rally could be faked.
Detecting deepfakes is hard. One could examine individual frames for telltale signs of modification, but this is bound to be tedious. Companies interested in detection have therefore been tending towards machines built to detect machine-generated visuals, i.e. automated detection using AI. Facebook, Twitter and YouTube have just banned some deepfake videos from their platforms by 'catching' them the moment they were uploaded, using techniques that examine blinking patterns, unnatural facial distortions, bad lip-sync and so on. Tech companies have also publicised several detection challenges to crowd-source solutions to this problem, even though these detection algorithms could, in turn, be used by deepfake creators to train fakes that evade them. This is going to be a great cat-and-mouse game – or rather, with AI on both sides, a "cat and cat" game.
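For a flavour of what such detection looks at, here is a minimal sketch of the blinking-pattern cue, assuming the same dlib and OpenCV setup as before. It uses the standard eye-aspect-ratio (EAR) idea from blink-detection research: real faces blink every few seconds, briefly driving the EAR towards zero, whereas early deepfakes often barely blinked. This is an illustration, not a production detector.

```python
# Minimal sketch of one detection cue: counting blinks via the eye
# aspect ratio (EAR). Assumes dlib, OpenCV and the 68-point model.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def eye_aspect_ratio(eye):
    """EAR = (vertical openings) / (2 * horizontal width); near 0 when shut."""
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)

def blink_count(video_path, ear_threshold=0.21):
    """Count blinks over a clip; suspiciously few blinks is a red flag."""
    cap = cv2.VideoCapture(video_path)
    blinks, closed = 0, False
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector(gray)
        if not faces:
            continue
        pts = predictor(gray, faces[0])
        coords = np.array([(p.x, p.y) for p in pts.parts()], dtype=np.float32)
        # In the 68-point model, points 36-41 and 42-47 are the two eyes.
        ear = (eye_aspect_ratio(coords[36:42]) +
               eye_aspect_ratio(coords[42:48])) / 2.0
        if ear < ear_threshold and not closed:
            blinks, closed = blinks + 1, True
        elif ear >= ear_threshold:
            closed = False
    cap.release()
    return blinks
```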
Some countries have written laws to regulate the use of deepfake technology. That said, deepfakes should be banned completely in political campaigns, without exceptions. The talk of using this technology for "positive messaging" is merely an excuse to bring it into the political arena. Even in the personal context, the law has an important role to play: non-consensual deepfake pornography should attract penalties analogous to those for child pornography or rape videos. For example, the American state of Virginia now covers non-consensual deepfakes under a law made in the context of revenge pornography (defined as pornography intended to harass or coerce).
This is a step in the right direction, and similar laws should be enacted quickly everywhere, especially in India. Additionally, all use of deepfakes – for education, satire, spoofs or (legally acceptable) creative expression – must include disclaimers stating that the visuals are computer-generated.
If not regulated, deepfakes have the potential not just to rewrite history but to drown us in a toxic politics that thrives on our inability to distinguish the real from the unreal.
Anurag Mehra teaches engineering and policy at IIT Bombay. His policy focus is the interface between technology, culture and politics.