Generative AI Model: GANs (Part 3)

31 May 2024

Note: Above is an image generated by Stable Diffusion. A classic example of Generative AI!

Welcome back to the thrilling conclusion of the Generative AI series. In part 1, we discovered the different components of GANs, and in part 2 we explored the very important concept of cross-entropy. Before you go through this blog, please read part 1 and part 2 as well. In this final part, we are going to pay a visit to the advanced versions of GANs.

Previous blogs described what a basic GAN looks like and which basic loss functions it uses. Over time, the field has advanced, and GANs have become able to generate better and better content, with additional functionality depending on the application. Let’s explore these:

Advanced Techniques in GANs:

Conditional GANs (cGANs)

Imagine a scenario where, instead of just generating images that resemble a set of training images, you can describe what is to be generated. cGANs can generate images based on a condition such as a class label or a description. For instance, “Generate an image of a cat with black fur and wearing glasses”.

But how are cGANs able to do that, whereas GANs are not?

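Reason: GANs generally create images from random noise alone (refer to part 1), whereas in cGANs the generator receives a condition (for example, a label or an encoded description) together with the noise, and the discriminator is shown the same condition, so the output is targeted and controlled.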
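
To make the idea concrete, here is a minimal sketch of how a condition can be fed to a generator. It is only an illustration: the label set, the sizes, and the one-layer "generator" with random weights are all assumptions made up for this example, and a real cGAN would use a trained deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES = 3   # hypothetical labels, e.g. 0 = cat, 1 = dog, 2 = bird
NOISE_DIM = 8     # size of the random noise vector
OUTPUT_DIM = 16   # size of the toy "image" (a flat vector here)

def one_hot(label, num_classes):
    """Encode an integer label as a one-hot vector."""
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

# Toy generator weights: random and untrained, for illustration only.
W = rng.normal(size=(NOISE_DIM + NUM_CLASSES, OUTPUT_DIM))

def conditional_generator(noise, label):
    """Generate an output from the noise AND the condition."""
    condition = one_hot(label, NUM_CLASSES)
    generator_input = np.concatenate([noise, condition])  # noise + condition
    return np.tanh(generator_input @ W)

noise = rng.normal(size=NOISE_DIM)
image_cat = conditional_generator(noise, label=0)
image_dog = conditional_generator(noise, label=1)
```

With the same noise but a different label, the generator produces a different output, which is what makes the result controllable.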

Applications: Though traditional GANs are capable of generating realistic images and improving the clarity of existing images, cGANs can handle more complex jobs thanks to their targeted and controlled output. Examples include text-to-image and image-to-image generation.

CycleGANs

CycleGANs are experts in generating a new image from an input image. For instance, if you have a summer image of a beach and you would like to see what it would look like in the winter, CycleGANs can do it. They can do it without needing matching before-and-after images of the same beach (paired summer and winter images, as per the example).

To simplify, think of a CycleGAN as two artists at work:

Artist 1: Converts summer images to winter images
Artist 2: Converts winter images to summer images

These artists keep on doing these exercises till they are able to generate realistic images, and till converting an image to the other season and back again returns something close to the original. This round trip (the “cycle”) is what lets them learn without paired examples.

Instead of one generator-discriminator pair, CycleGANs have two pairs, one for each task.

Generator A: Generates summer image from winter image
Generator B: Generates winter image from summer image
Discriminator A: Evaluates if the image generated by Generator A looks like a real summer photo
Discriminator B: Evaluates if the image generated by Generator B looks like a real winter photo
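
The round trip between the two generators is usually enforced with a cycle-consistency loss. Below is a minimal sketch of that loss, assuming toy "generators" that are just matrix multiplications (an assumption for illustration; real generators are trained networks) and unpaired batches of random vectors standing in for images.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # toy "image" size

def cycle_consistency_loss(gen_a, gen_b, summer_batch, winter_batch):
    """Mean absolute error after a full round trip in both directions.

    gen_a maps winter -> summer and gen_b maps summer -> winter,
    matching Generator A and Generator B above.
    """
    summer_round_trip = gen_a(gen_b(summer_batch))  # summer -> winter -> summer
    winter_round_trip = gen_b(gen_a(winter_batch))  # winter -> summer -> winter
    return float(np.mean(np.abs(summer_round_trip - summer_batch))
                 + np.mean(np.abs(winter_round_trip - winter_batch)))

# Toy linear generators: gen_b exactly undoes gen_a.
M = 2.0 * np.eye(DIM) + 0.1 * rng.normal(size=(DIM, DIM))
M_inv = np.linalg.inv(M)
gen_a = lambda x: x @ M
gen_b = lambda x: x @ M_inv
gen_b_bad = lambda x: x  # does NOT undo gen_a

# Unpaired data: the batches are independent and even differ in size.
summer_batch = rng.normal(size=(5, DIM))
winter_batch = rng.normal(size=(7, DIM))

loss_consistent = cycle_consistency_loss(gen_a, gen_b, summer_batch, winter_batch)
loss_inconsistent = cycle_consistency_loss(gen_a, gen_b_bad, summer_batch, winter_batch)
```

The loss is close to zero when the two generators undo each other and clearly larger when they do not. Note that no summer image is ever matched with a specific winter image.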

StyleGANs

StyleGANs are a different type of advancement of GANs, which focus on the realism of the images. They are incredible at generating highly realistic images of people who do not exist. A key concept behind them is “style mixing”. In the faces example, the generator can take features from the faces of different individuals and mix them to create a completely different persona.

StyleGANs also introduce hierarchical style transfer, where the generator’s layers are explicitly designed to control different levels of detail. Early layers influence broad, high-level features like face shape, while later layers adjust finer details like wrinkles or hair strands. This hierarchy enables more sophisticated and precise image manipulation.
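
Style mixing and the layer hierarchy can be pictured with a small sketch. Here each row of an array stands in for the style fed to one generator layer; the number of layers, the style size, and the `mix_styles` helper are hypothetical names chosen for this illustration, not the actual StyleGAN code.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_LAYERS = 8   # hypothetical number of generator layers
STYLE_DIM = 4    # hypothetical size of each style vector

def mix_styles(styles_a, styles_b, crossover):
    """Take A's styles for layers before `crossover` and B's for the rest."""
    mixed = styles_b.copy()
    mixed[:crossover] = styles_a[:crossover]
    return mixed

# One style vector per layer for two different (imaginary) faces.
styles_a = rng.normal(size=(NUM_LAYERS, STYLE_DIM))
styles_b = rng.normal(size=(NUM_LAYERS, STYLE_DIM))

# Early layers (broad features such as face shape) come from face A,
# later layers (fine details such as hair strands) come from face B.
mixed = mix_styles(styles_a, styles_b, crossover=3)
```

Moving the crossover point earlier or later changes how much of the coarse structure comes from the first face versus the second.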

Challenges in Training GANs

After looking at all these types of GANs and their specific use cases, let’s look at the challenges faced while training these models. Since the model is complex to start with, and even ordinary neural networks are hard to train, it’s no surprise that GANs come with challenges of their own.

Mode Collapse: A scenario where the generator produces a very limited variety of outputs, effectively “collapsing” to one or a few types of data (“modes”) instead of covering the full variety of the training data. One way to remediate this is “Minibatch Discrimination,” which lets the discriminator compare samples across a batch rather than judging each sample individually, and so encourages the generator to produce more varied samples.
Training Instability: Because training is adversarial, GANs depend heavily on the balance between the generator and the discriminator. If one of them overpowers the other, training can become unstable and the results poor. One way to reduce this is a “Gradient Penalty,” which regularises the discriminator’s gradients so that they remain within a controlled range.
Non-convergence: This happens when a GAN cannot reach a stable equilibrium between the generator and the discriminator. One simple way to mitigate this is to adjust the learning rates during training.
Evaluation Metrics: Traditional metrics like accuracy or ROC curves do not work well for GANs, since there is no single correct output to compare against. Therefore, different metrics are used for evaluation, such as the Inception Score (measures the quality and diversity of generated images using a pre-trained Inception network) or the Fréchet Inception Distance (compares the statistical similarity between generated images and real images, providing a more nuanced evaluation of quality).
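
As an illustration of the last point, here is a minimal sketch of the Fréchet distance used by the Fréchet Inception Distance. In practice the feature vectors come from a pre-trained Inception network; here random vectors stand in for those features, which is purely an assumption to keep the example self-contained.

```python
import numpy as np

def frechet_distance(features_real, features_fake):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input has shape (num_samples, feature_dim).
    """
    mu_r = features_real.mean(axis=0)
    mu_f = features_fake.mean(axis=0)
    cov_r = np.cov(features_real, rowvar=False)
    cov_f = np.cov(features_fake, rowvar=False)

    # Tr((cov_r cov_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_f, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))

    mean_term = np.sum((mu_r - mu_f) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real_features = rng.normal(size=(500, 4))   # stand-in for Inception features
shifted_features = real_features + 2.0      # same spread, different mean

dist_same = frechet_distance(real_features, real_features)
dist_shifted = frechet_distance(real_features, shifted_features)
```

Identical feature sets give a distance close to zero, while shifting every feature by 2 across 4 dimensions should give a distance close to 16 (4 times 2 squared), because only the mean term changes. Lower values mean the generated images are statistically closer to the real ones.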

Above are some of the distinguishing challenges that GANs face. Mitigations are available, but they sometimes require additional structures or training to be put in place.

This is the end of the series on GANs. But keep an eye on my profiles, as other topics are coming soon!

Also, if you would like me to write about something specific, please let me know!
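
As one example of such an extra piece of training machinery, here is a rough sketch of a gradient penalty in the style popularised by WGAN-GP. The one-layer tanh critic (discriminator), its hand-written gradient, and the target norm of 1 are assumptions chosen so the example runs without a deep learning framework; a real implementation would rely on automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # toy input size

def critic(x, w, b):
    """Toy critic: one linear layer followed by tanh (an assumption)."""
    return np.tanh(x @ w + b)

def critic_input_gradient(x, w, b):
    """Analytic gradient of the toy critic with respect to each input row."""
    slope = 1.0 - np.tanh(x @ w + b) ** 2   # derivative of tanh, shape (batch,)
    return slope[:, None] * w[None, :]      # shape (batch, DIM)

def gradient_penalty(real, fake, w, b, rng, target_norm=1.0):
    """Penalise gradient norms that drift away from target_norm."""
    # Evaluate the gradient at random points between real and fake samples.
    mix = rng.uniform(size=(real.shape[0], 1))
    interpolated = mix * real + (1.0 - mix) * fake
    norms = np.linalg.norm(critic_input_gradient(interpolated, w, b), axis=1)
    return float(np.mean((norms - target_norm) ** 2))

w = rng.normal(size=DIM)
b = 0.1
real = rng.normal(size=(6, DIM))
fake = rng.normal(size=(6, DIM))
penalty = gradient_penalty(real, fake, w, b, rng)
```

The penalty is added to the discriminator’s loss, so gradients that grow too large or shrink too much are discouraged during training.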