Generative Adversarial Networks

Something about creation feels decidedly human. Many traditional machine learning tasks, such as classification or prediction, are impersonal and calculating – which is to say they more or less fall within the domain where we expect a machine to be useful. The task of creation, however, feels different. It does not feel like the type of task where a machine should be able to succeed. And yet despite this, or perhaps because of it, there is a rich body of work focused on teaching machines to generate novel data with impressive results.

The goal for this type of work is straightforward – we want to get a computer to be able to create, on demand, some made-up but convincing data that is difficult to distinguish from real data. These real data can be anything we want to recreate. Perhaps we want the computer to generate synthetic images of human faces. Our task, then, is to feed it as many images of faces as we can and ask it to generate its own “fake” faces.

What’s so hard about generating synthetic data?

Let’s take a step back and appreciate the difficulty of this ask. In order for a computer to create believable images of human faces from scratch, it needs to understand what a face is. Suddenly our task feels less quantitative and more philosophical. After all, how would you define a face to someone with no prior knowledge? You could list some required elements – eyes, nose, teeth. Each of these elements would then have to be defined. Almost certainly each would be defined with respect to some other object, which itself would need defining, and so on in an infinite regress.

Our difficulties do not end there. After all, what about faces which are not smiling? Perhaps we cannot see the teeth, and must therefore concede that, as far as images are concerned, teeth are not a necessity. And what about the placement of these elements? They must be aligned properly, of course, with eyes a certain distance apart and so on. And how do we connect these elements? Merely stitching together eyes, ears, and mouth in roughly the correct order is unlikely to give a convincing image, since a real photograph also carries traces of the three-dimensional world in which it was captured. Indeed, it seems amazing that a computer could ever make progress in this arena, given the multitude of difficulties it will inevitably face.

Learning through adversity

So how do we make progress? While there are various computational methods that attempt to create synthetic data, we will focus here on a technique called generative adversarial networks, which is one of the most exciting recent developments in machine learning. The basic idea is to pit two competing models against one another, with each model improving as it competes with the other.

The specific process goes as follows: we first wire together a generic neural network which will create a random image on request (for a primer on neural networks see Jon’s excellent post here). This network will, of course, start out very uninteresting, yielding images with each pixel taking on a random color. For the remainder of this post we will refer to this network as the generator, as it is used to generate our images.
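To make the idea of an untrained generator concrete, here is a minimal sketch in Python with NumPy. It is not the network from any particular paper: the layer sizes, the 8 x 8 image size, and the names NOISE_DIM, HIDDEN_DIM, IMAGE_SIZE and generate_image are all assumptions chosen for illustration. It only shows that a freshly wired network turns random noise into an image of essentially random colors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
NOISE_DIM = 16      # length of the random input vector
HIDDEN_DIM = 32     # width of the single hidden layer
IMAGE_SIZE = 8      # images are IMAGE_SIZE x IMAGE_SIZE with 3 color channels

# Randomly initialised weights: the network has not been trained yet.
W1 = rng.normal(0.0, 1.0, size=(NOISE_DIM, HIDDEN_DIM))
W2 = rng.normal(0.0, 1.0, size=(HIDDEN_DIM, IMAGE_SIZE * IMAGE_SIZE * 3))


def generate_image():
    """Map a fresh random noise vector to one image with values between 0 and 1."""
    z = rng.standard_normal(NOISE_DIM)
    hidden = np.tanh(z @ W1)
    pixels = 1.0 / (1.0 + np.exp(-(hidden @ W2)))  # sigmoid squashes each pixel
    return pixels.reshape(IMAGE_SIZE, IMAGE_SIZE, 3)


image = generate_image()
print(image.shape)  # (8, 8, 3)
```

Each call draws new noise, so each call returns a different image. Training changes only the weights W1 and W2, never the way noise is fed in.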

Our task, now, is to train this generator to yield images which look like faces. Here comes the clever step – to do so, we introduce a second network, independent of the generator. This second network has a very different task. Its job is to identify fake images and distinguish them from real ones. For this reason, we call this network the discriminator.
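A discriminator can be sketched in the same spirit. The version below is deliberately tiny (a single layer, which real discriminators are not) and every name and size in it is an assumption for illustration. The point is only the interface: an image goes in, and a single number between 0 and 1 comes out, which we read as the network's estimated probability that the image is real.

```python
import numpy as np

rng = np.random.default_rng(1)

IMAGE_SIZE = 8                          # hypothetical 8 x 8 images, 3 color channels
N_PIXELS = IMAGE_SIZE * IMAGE_SIZE * 3

# Untrained weights; training would adjust these.
weights = rng.normal(0.0, 0.01, size=N_PIXELS)
bias = 0.0


def discriminate(image):
    """Return the network's estimated probability that `image` is real."""
    score = image.reshape(-1) @ weights + bias
    return 1.0 / (1.0 + np.exp(-score))


candidate = rng.random((IMAGE_SIZE, IMAGE_SIZE, 3))  # stand-in for any image
p_real = discriminate(candidate)
print(f"estimated probability of being real: {p_real:.3f}")
```

Before any training the answer hovers near 0.5, which is another way of saying the discriminator is guessing.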

Using these two networks, each with competing interests, we leverage the power of competition to improve our generated data. We do so by combining some real images from our data with some fake images from our generator and feeding this batch to the discriminator, which will try to distinguish between the two. With each batch, we tweak the wiring on the discriminator and generator a tiny amount in order to improve each network’s performance. Because these networks work in opposition to each other, we have a zero-sum game where a given network’s improvement comes at the cost of the other network’s performance. Both generator and discriminator will continue to find new and more complex ways to best the other as we iterate this process, and after many iterations we end up with two powerful networks predicated upon the premise that iron sharpens iron.
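The back-and-forth described above can be sketched on a toy problem. To keep the arithmetic visible, the "images" here are single numbers clustered around 4, the generator is just a scale and a shift applied to noise, and the discriminator is a one-variable logistic model. All of these choices, along with the learning rate and batch size, are assumptions made for illustration. The gradients are written out by hand for the zero-sum objective, in which the generator tries to raise exactly the loss that the discriminator tries to lower.

```python
import numpy as np

rng = np.random.default_rng(42)


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def sample_real(n):
    """Toy 'real data': numbers clustered around 4 (an illustrative assumption)."""
    return rng.normal(4.0, 0.5, size=n)


a, b = 1.0, 0.0      # generator parameters: fake = a * noise + b
w, c = 0.0, 0.0      # discriminator parameters: D(x) = sigmoid(w * x + c)
LEARNING_RATE = 0.05
BATCH = 64

for step in range(500):
    noise = rng.standard_normal(BATCH)
    real = sample_real(BATCH)
    fake = a * noise + b

    d_real = sigmoid(w * real + c)   # belief that the real samples are real
    d_fake = sigmoid(w * fake + c)   # belief that the fake samples are real

    # Discriminator step: lower the loss -log D(real) - log(1 - D(fake)).
    grad_w = np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake)
    grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
    w -= LEARNING_RATE * grad_w
    c -= LEARNING_RATE * grad_c

    # Generator step: raise that same loss, i.e. lower log(1 - D(fake)).
    d_fake = sigmoid(w * fake + c)
    grad_a = np.mean(-d_fake * w * noise)
    grad_b = np.mean(-d_fake * w)
    a -= LEARNING_RATE * grad_a
    b -= LEARNING_RATE * grad_b

samples = a * rng.standard_normal(1000) + b
print(f"mean of generated samples: {samples.mean():.2f}")
```

Each pass through the loop is one "tweak" to both networks. Toy setups like this one can oscillate rather than settle, so the printed mean should be read as a rough indication of where the generator has drifted, not as a guaranteed result.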

We see here a clear advantage in our use of machine learning to solve this problem. Whereas many traditional computational approaches focus on designing complex and task-specific algorithms, machine learning allows us to side-step the difficulties in face generation discussed above. Rather than having to explicitly define what a face is, we allow the computer to determine its own definition based on a multitude of examples. We can use the generator, trained through competition with the discriminator, to create synthetic data which is often difficult to identify as fake. Some results from a recent paper using this approach, in which the networks were trained on images of celebrity faces, are shown below. All of these images are fake.

How can this fake data be useful?

While such results are certainly impressive, what are some possible applications of this synthetic data? First, it is important to remember what we have accomplished here. To be able to generate fake data, our neural network must, in a sense, learn its own definition of the data. A prerequisite to creation is understanding. Successful networks, therefore, will necessarily have gained some useful knowledge about the data in order to recreate it. This learning can be leveraged in broader, more complex data solutions to improve accuracy.

To be more specific, many machine learning techniques require large data sets in order to be effective. In practice, such data sets are often difficult, if not impossible, to obtain. If, however, a generator can be successfully trained, it could be used to expand the data set and thereby improve the accuracy of the techniques that rely on it. With biomedical data, for example, it is often just not clinically viable to gather enough data to train traditional neural networks. It is here that synthetic data can be effectively leveraged to allow these powerful techniques to operate on small samples of data.