Overview Of How To Create Deepfakes – It’s Scarily Simple


Deepfakes are simple to make. Scarily simple if you are concerned about abuse. Early deepfakes were – perhaps unsurprisingly – focused on pornography. More terrifying use cases include fake alibis in courtrooms, extortion, and terrorism. In Deepfakes – the danger of Artificial Intelligence that we will learn to manage better, I outline how transparency, regulation, and education will improve the detection of deepfakes – which could help us combat their misuse. This article will focus more on the technical side of how deepfakes work.

Today, almost anyone can manipulate videos, audio, and images to make them look like something else. You don’t need programming skills to create a deepfake. You can create one for free in less than 30 seconds using sites like MyHeritage, D-ID, or any of the many free deepfake applications. Please use these tools in an ethical and morally acceptable way.

Is it that easy?

Hold on – AI and Deep Learning are that easy? Of course, it’s not that easy: There is a big difference between using a model and training a model. Before we can reach the point where we have such a self-serve tool, we must first build a model that enables it. Underlying all deepfake tools are Artificial Intelligence (AI) models. These models need a lot of training data – and creating them is not simple.

These models are based on neural networks. They mimic an architecture inspired by the information processing of nodes in our brains. Unlike our brains, however, artificial neural networks tend to be static and binary, while our brain works dynamically and analog. Looking under the hood of a neural network model, you will see it’s a layered stack of regression functions.
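To make that "stack of regression functions" concrete, here is a minimal numpy sketch (purely illustrative – the weights are random, not from any trained model): each layer is just a regression applied to the previous layer's output.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
# Each "layer" is a (logistic) regression stacked on the one before it.
W1, b1 = rng.normal(0, 1, (3, 4)), np.zeros(4)   # layer 1: 3 inputs -> 4 hidden units
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)   # layer 2: 4 hidden -> 1 output

x = np.array([0.5, -1.0, 2.0])   # some input vector
h = sigmoid(x @ W1 + b1)         # first regression stage
y = sigmoid(h @ W2 + b2)         # second regression stage, stacked on the first
print(float(y))                  # a single score between 0 and 1
```

Real deepfake models stack dozens of such layers, but the principle is the same: each layer regresses on the output of the one below it.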

AI architecture for deepfakes

An academic study by Goodfellow et al. in 2014 reinvigorated the interest in deepfakes through a new deep learning architecture called Generative Adversarial Networks (GANs). A GAN sets up two neural networks to compete against each other (hence the “adversarial”). The first neural network is a so-called generative neural network. Based on a random seed, it creates a realistic image through a process called decoding. More on this below. It works like a reverse pixelate effect on images:

Since the initial input is random, the created image is entirely fake. Check out, for example, the images at thispersondoesnotexist.com. All of them are fake. All of them were generated based on a random seed.

Now you’re thinking: What about the other neural network set up by the GAN? I’m glad you asked – this is where the “adversarial” component comes in. The second neural network – called the discriminative classifier – checks the first, generative neural network. Essentially, the second network is checking whether the image produced by the first network is real or fake. This way, the two neural networks train each other, and the generated images become more and more realistic.
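The adversarial loop can be sketched in a toy numpy example (an illustrative sketch with made-up hyperparameters, nothing like a real image GAN): the "generator" is a simple linear function of noise, the "discriminator" a logistic regression, and the real data is just a 1-D Gaussian the generator must learn to imitate.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" data the generator must fake: samples around 4.0.
def sample_real(n):
    return rng.normal(4.0, 0.5, n)

a, b = 1.0, 0.0    # generator g(z) = a*z + b, turns random noise into a sample
w, c = 0.1, 0.0    # discriminator d(x) = sigmoid(w*x + c), scores how "real" x looks

lr, steps, batch = 0.05, 3000, 64
for _ in range(steps):
    z = rng.normal(0, 1, batch)
    x_fake = a * z + b
    x_real = sample_real(batch)

    # Discriminator step: push d(real) toward 1 and d(fake) toward 0.
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    w -= lr * (np.mean((d_real - 1) * x_real) + np.mean(d_fake * x_fake))
    c -= lr * (np.mean(d_real - 1) + np.mean(d_fake))

    # Generator step: push d(fake) toward 1, i.e., fool the discriminator.
    d_fake = sigmoid(w * x_fake + c)
    grad_x = -(1 - d_fake) * w          # gradient of -log d(fake) w.r.t. the sample
    a -= lr * np.mean(grad_x * z)
    b -= lr * np.mean(grad_x)

fake_mean = float(np.mean(a * rng.normal(0, 1, 10000) + b))
print(f"generated mean ~ {fake_mean:.2f} (real mean is 4.0)")
```

The generator starts out producing samples around 0; by the end of the adversarial back-and-forth, its output distribution has shifted toward the real data around 4 – the same dynamic, at miniature scale, that pushes GAN-generated faces toward photorealism.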

This is pretty cool! Today, though, most deepfake tools do not use GANs. Rather, they leverage either encoder-decoder pairs or first-order motion models. Let’s explain what both of those are.

Encoder-decoder pairs

Deep Learning models have different layers. Each layer represents a mathematical abstraction of the prior layer. We call this a latent representation. Or said differently, what you see in those layers (if you were to open up the Deep Neural Network black box) is no longer the reality one can observe, but rather an inferred state based on the mathematical models of the layers before. Going from an original image to a latent image is called encoding. The process is similar for humans. We see a cat, and we use the word (i.e., the representation) “CAT” for it. How do we know that a given object is a cat? Well, we have seen many cats and have learned to identify them. Our brain has created a connection between the image and the encoded word “cat.”

Let’s turn this process around. Close your eyes and picture a cat. Can you? Sure you can. How? By using the knowledge stored in your brain. Computers do this the same way. Once the computer has encoded many similar images (like images of cats), it can reverse this process and go from “cat” to an image. This process is called decoding. Take a look at DALL-E 2 to see how powerful this process can be.

Below is the process of encoding and decoding an image of a “2”. After the encoding stage, the computer stores the latent representation – for a number, this is essentially just the information “this is a two”. Next, the decoder reverts this information back into an image of a “2” based on how the computer imagines it.
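The encode/decode round trip can be sketched with a tiny linear autoencoder in numpy (an illustrative toy, not an image model): 2-D points that secretly live on a 1-D line are compressed to a single latent number by the encoder, and the decoder reconstructs the 2-D point from that number alone.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "images": 2-D points that really lie on a 1-D line (x2 ~ 2*x1).
t = rng.normal(0, 1, (200, 1))
X = np.hstack([t, 2 * t]) + rng.normal(0, 0.05, (200, 2))

We = rng.normal(0, 0.1, (2, 1))   # encoder: 2-D input  -> 1-D latent code
Wd = rng.normal(0, 0.1, (1, 2))   # decoder: 1-D latent -> 2-D reconstruction

lr = 0.05
for _ in range(2000):
    H = X @ We            # encode: compress each point to one latent number
    Xhat = H @ Wd         # decode: expand the code back into 2-D
    err = Xhat - X
    # Gradient steps on the mean-squared reconstruction error.
    gWd = H.T @ err / len(X)
    gWe = X.T @ (err @ Wd.T) / len(X)
    Wd -= lr * gWd
    We -= lr * gWe

mse = float(np.mean((X @ We @ Wd - X) ** 2))
print(f"reconstruction MSE: {mse:.4f}")
```

After training, the single latent number is enough to redraw each point – just as the latent “this is a two” is enough for the decoder to redraw the digit.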

To create a deepfake generator, we need such an encoder-decoder pair. The encoder extracts latent features from face images, and the decoder uses this information to reconstruct the face images from those latent features. The encoder’s job is to draw a latent version of the original image that captures its emotions and expressions. The decoder’s job is then to “re-draw” the original image from the latent version. The encoder and the decoder are neural networks that improve by training on thousands of source/target images.

To generate a deepfake, the decoder for the target draws the target image with the source’s latent features (expressions), and voila! We have a deepfake image. The image below, from the paper by Nguyen et al., shows the process.
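The face-swap trick – one shared encoder, one decoder per identity – can be mimicked in the same toy numpy setting (a loose analogy, not the real convolutional models). Each “person” is a 1-D line in 2-D; the position along the line plays the role of the expression. Encoding person A’s data and decoding it with person B’s decoder transfers A’s “expression” onto B’s “identity”:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two toy "faces": person A's data lies on the line [1, 2]*t, person B's on [1, -1]*s.
tA = rng.normal(0, 1, (200, 1))
tB = rng.normal(0, 1, (200, 1))
XA = tA @ np.array([[1.0, 2.0]])
XB = tB @ np.array([[1.0, -1.0]])

We = rng.normal(0, 0.1, (2, 1))    # SHARED encoder for both identities
WdA = rng.normal(0, 0.1, (1, 2))   # decoder that redraws person A
WdB = rng.normal(0, 0.1, (1, 2))   # decoder that redraws person B

lr = 0.05
for _ in range(4000):
    for X, Wd in ((XA, WdA), (XB, WdB)):
        H = X @ We
        err = (H @ Wd) - X
        Wd -= lr * (H.T @ err / len(X))           # in-place update of WdA / WdB
        We -= lr * (X.T @ (err @ Wd.T) / len(X))  # shared encoder learns from both

mseA = float(np.mean(((XA @ We) @ WdA - XA) ** 2))
mseB = float(np.mean(((XB @ We) @ WdB - XB) ** 2))
print(f"recon MSE A={mseA:.3f}, B={mseB:.3f}")

# The "deepfake": encode A's samples, but decode with B's decoder.
swapped = (XA @ We) @ WdB   # B's identity (line direction), driven by A's positions
```

Because the encoder is shared, the latent code carries only the expression-like information, and whichever decoder you plug in supplies the identity – exactly the swap the figure illustrates.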

First-Order Motion Model

A slightly different approach to deepfakes is to replace the encoder with a motion model. Anyone who has used Snapchat will have used such an approach. The underlying AI detects facial expressions, eye movements, and head position – which are then superimposed on a unicorn, a potato, or whatever Snap filter is en vogue.

The initial neural network is trained on many hours of real video footage of people to help the AI recognize the important features of a person’s face – the eyes, upper and lower lips, teeth, ears, eyebrows, the outline of the face, etc. The computer then automatically maps the right parts of one’s source images onto the destination image. Since a video is a collection of images (frames) placed one after the other, this neural network allows us to photoshop each frame in a video.
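The core idea can be sketched in a few lines of numpy (a drastically simplified toy: the real first-order motion model also estimates local affine transforms and occlusion, while this version uses pure translations). Each frame is summarized by a handful of keypoints; the driving video’s keypoint motion relative to its first frame is replayed on the target image’s keypoints:

```python
import numpy as np

def transfer_motion(target_kps, driving_frames):
    """target_kps: (K, 2) keypoints of the still target image.
    driving_frames: (T, K, 2) keypoints per frame of the driving video.
    Returns (T, K, 2): the target's keypoints animated by the driving motion."""
    offsets = driving_frames - driving_frames[0]   # motion relative to frame 0
    return target_kps[None, :, :] + offsets        # replay that motion on the target

# Target "face": two eyes and a mouth corner.
target = np.array([[2.0, 5.0], [4.0, 5.0], [3.0, 3.0]])
# Driving video: 3 frames in which the mouth keypoint drops (mouth opening).
driving = np.array([
    [[0.0, 2.0], [1.0, 2.0], [0.5, 1.0]],
    [[0.0, 2.0], [1.0, 2.0], [0.5, 0.8]],
    [[0.0, 2.0], [1.0, 2.0], [0.5, 0.6]],
])

animated = transfer_motion(target, driving)
print(animated[2, 2])  # target's mouth keypoint after the full motion: [3.0, 2.6]
```

Frame 0 leaves the target untouched; each later frame moves the target’s mouth exactly as the driving person moved theirs – which is why your grimace ends up on the unicorn.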

To explain how deepfakes work, we created a simple Google Colab notebook for you that builds on the initial code from Aliaksandr Siarohin – who (no surprise here) works for Snap. In our Colab, all you need to do is upload a folder with the target image, upload your own video (source) to your Google Drive, and run the notebook block by block. With a bit of post-processing, you can make your own short deepfake video. For example, I use this little video to welcome my students to class each term.

Do you want to learn more about how it works? Aliaksandr posted a very good video explaining that here.

It’s just the beginning

Tools to create deepfakes are constantly improving. As you see below, the amount of research done in this space is also rising exponentially. There’s more to come, and we will be here to update you.


This article was written with Prithvi Sriram, who has not only been a student of the course but has also helped create tool sets that future students can use to get hands-on with Deep Learning. He currently works at Infinitus Systems, a late-series-B healthcare startup, where he was a founding member of the analytics team.
