Generative Adversarial Networks (GANs), Explained
Abhay
Imagine a forger and a detective locked in a room together, with a single rule: neither leaves until one of them is unbeatable. The forger paints fakes. The detective spots them. Every time the detective catches a fake, the forger learns a little more about what gives the game away. Every time the forger slips one past, the detective sharpens up. They make each other better, relentlessly, until the forger’s paintings are so good the detective is reduced to flipping a coin.
That, in one slightly noir-ish scene, is a Generative Adversarial Network. Ian Goodfellow and colleagues proposed the idea in 2014, and for the better part of a decade GANs were the go-to way for machines to dream up photorealistic images. Let’s open the room and watch the two of them work.
Two networks, one grudge match
A GAN is really two neural networks pointed at each other:
- The generator is the forger. It takes a vector of random noise and tries to turn it into something that looks like real data — a face, a landscape, a sneaker that doesn’t exist.
- The discriminator is the detective. It’s handed a mix of real samples and the generator’s fakes, and its only job is to call each one “real” or “fake.”
The clever bit is the scoreboard. They share a single objective, but they want opposite outcomes — what game theorists call a minimax game. The discriminator wants to maximise how often it’s right. The generator wants to minimise that exact number by fooling it. Neither has a fixed target to chase; each is chasing a moving opponent. That’s the “adversarial” in the name, and it’s both the genius and the headache of the whole approach.
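Written out, the scoreboard is the value function from Goodfellow's 2014 paper. Here D(x) is the discriminator's estimated probability that x is real, G(z) is the generator's output for a noise vector z, and p_data and p_z are the data and noise distributions:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator pushes both terms up by scoring real samples near 1 and fakes near 0. The generator can only touch the second term, and it pushes that term down by making D(G(z)) as large as it can.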
The training loop, in actual code
The loop itself is surprisingly tidy. Each step you train the detective a bit, then train the forger to beat the slightly-smarter detective. In Keras-flavoured pseudocode:
import numpy as np

# Label vectors: 1 = "real", 0 = "fake"
ones = np.ones((batch_size, 1))
zeros = np.zeros((batch_size, 1))

for real_batch in dataset:
    # 1. Train the discriminator (the detective)
    noise = np.random.normal(size=(batch_size, latent_dim))
    fakes = generator.predict(noise, verbose=0)
    d_loss_real = discriminator.train_on_batch(real_batch, ones)   # "these are real"
    d_loss_fake = discriminator.train_on_batch(fakes, zeros)       # "these are fake"

    # 2. Train the generator (the forger) via the frozen detective
    #    (combined_model = generator -> discriminator, discriminator not trainable)
    noise = np.random.normal(size=(batch_size, latent_dim))
    # The generator WANTS the discriminator to label its fakes as real (1)
    g_loss = combined_model.train_on_batch(noise, ones)
Notice the sleight of hand on the last line: the generator never sees real data directly. The discriminator’s weights are frozen for that step, so the only thing that can change is the generator, and it only ever learns from the gradient that flows back through the discriminator. Its entire education is “here’s why you got caught, so do better.” When the system reaches a happy equilibrium, the discriminator’s accuracy hovers around 50%: it’s guessing, because the fakes have become indistinguishable from the real thing.
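To watch the same two-step loop run without installing a deep learning library, here is a minimal pure-Python sketch. It is a toy under loud assumptions, not the Keras setup above: the "data" are single numbers drawn from a Gaussian, the generator is just G(z) = z + b, the discriminator is a one-feature logistic regression, and the gradients are written out by hand. The name `train_toy_gan` and all its parameters are made up for illustration.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def train_toy_gan(steps=200, batch_size=64, lr=0.05, real_mean=3.0, seed=0):
    """Alternate discriminator and generator updates on 1-D data."""
    rng = random.Random(seed)
    w, c = 0.0, 0.0   # discriminator: D(x) = sigmoid(w * x + c)
    b = 0.0           # generator: G(z) = z + b, so it only learns a shift
    history = []

    for _ in range(steps):
        # 1. Discriminator step: real samples labelled 1, fakes labelled 0.
        reals = [rng.gauss(real_mean, 1.0) for _ in range(batch_size)]
        fakes = [rng.gauss(0.0, 1.0) + b for _ in range(batch_size)]
        grad_w = grad_c = 0.0
        for x in reals:               # loss = -log D(x)
            err = sigmoid(w * x + c) - 1.0
            grad_w += err * x
            grad_c += err
        for x in fakes:               # loss = -log(1 - D(x))
            err = sigmoid(w * x + c)
            grad_w += err * x
            grad_c += err
        w -= lr * grad_w / batch_size
        c -= lr * grad_c / batch_size

        # 2. Generator step: D is held fixed, fakes are labelled 1.
        fakes = [rng.gauss(0.0, 1.0) + b for _ in range(batch_size)]
        grad_b = 0.0
        for x in fakes:               # loss = -log D(G(z))
            grad_b += (sigmoid(w * x + c) - 1.0) * w   # chain rule through D
        b -= lr * grad_b / batch_size
        history.append(b)             # generator shift after each step

    return w, c, b, history


w, c, b, history = train_toy_gan()
print(f"discriminator: w={w:.3f}, c={c:.3f}; generator shift b={b:.3f}")
```

The generator's update uses only the discriminator's output and weight `w`; the real samples never appear in step 2. Because a linear discriminator can only compare averages, the shift `b` is the only thing worth learning here, and the pair may circle the real mean rather than settle on it, which is a small taste of the instability discussed later.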
What GANs are genuinely brilliant at
When GANs work, they sing. They were the first models to produce convincingly photorealistic human faces — NVIDIA’s StyleGAN (Karras, Laine, and Aila, 2019) became the poster child, famously powering thispersondoesnotexist.com and giving artists fine-grained control over attributes like pose, lighting, and the spray of freckles on a cheek. GANs are also blisteringly fast at generation: once trained, producing a sample is a single forward pass through the generator. No iteration, no waiting. That speed keeps them alive today in real-time applications, super-resolution, style transfer, and generating synthetic training data where you need volume in a hurry.
The pains nobody warns you about
Here’s the catch: that adversarial dance is famously temperamental. Two failure modes dominate.
Mode collapse. The generator discovers one fake that reliably fools the detective and then… just makes that, forever. Ask it for handwritten digits and it produces beautiful 8s and nothing else. It found a loophole and stopped exploring — the artistic equivalent of a one-hit wonder playing the same song on loop.
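One rough way to put a number on mode collapse, instead of only eyeballing samples, is to label a batch of generated images with a separate classifier and count which classes show up. The sketch below assumes those labels already exist as a list of integers; the classifier, the `mode_coverage` helper, and the example data are all hypothetical.

```python
from collections import Counter


def mode_coverage(labels, num_modes):
    """Return (fraction of modes that appear, share taken by the most common mode)."""
    counts = Counter(labels)
    covered = sum(1 for m in range(num_modes) if counts[m] > 0)
    top_share = max(counts.values()) / len(labels) if labels else 0.0
    return covered / num_modes, top_share


healthy = [i % 10 for i in range(1000)]      # every digit equally often
collapsed = [8] * 990 + [3] * 10             # almost nothing but 8s

print(mode_coverage(healthy, 10))    # (1.0, 0.1)
print(mode_coverage(collapsed, 10))  # (0.2, 0.99)
```

A low first number or a high second number is a hint, not proof: it only catches collapse across the classes the classifier knows about, and a generator can still produce near-identical samples within a class.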
Training instability. Because both networks are chasing a moving target, the losses can oscillate or diverge instead of settling. The discriminator gets too good too fast, the generator’s gradients vanish, and learning stalls. There’s no tidy “loss went down, we’re done” signal — you often have to eyeball the samples to know if it’s working. Researchers have thrown years of fixes at this (Wasserstein loss, gradient penalties, spectral normalisation, and 2024-era tricks like SoftGAN’s “borderline softening”), but stability has never been free.
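The vanishing-gradient problem is easy to see with a little arithmetic. If the generator minimises log(1 - D(G(z))) directly, the gradient with respect to the discriminator's pre-sigmoid score (its logit) is -D(G(z)), which shrinks toward zero as the discriminator grows confident that a sample is fake. The "label the fakes as real" trick from the training loop minimises -log D(G(z)) instead, whose gradient is D(G(z)) - 1 and stays close to -1 in the same situation. The function names below are illustrative, and the sketch looks only at this one derivative, not at a full network.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def saturating_grad(logit):
    """d/d(logit) of log(1 - D), the generator's side of the minimax objective."""
    return -sigmoid(logit)


def non_saturating_grad(logit):
    """d/d(logit) of -log(D), the 'label fakes as real' loss."""
    return sigmoid(logit) - 1.0


# logit = discriminator's pre-sigmoid score for a fake; very negative = "obviously fake"
for logit in (0.0, -2.0, -5.0, -10.0):
    print(f"logit={logit:6.1f}  saturating={saturating_grad(logit):.5f}  "
          f"non-saturating={non_saturating_grad(logit):.5f}")
```

Swapping the loss keeps a learning signal alive when the detective is far ahead, but it does not make the duel stable on its own, which is why the heavier fixes listed above exist.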
Why diffusion stole the spotlight
Around 2021, diffusion models began eating GANs’ lunch for high-end image generation. Instead of staging an adversarial duel, a diffusion model learns to remove noise from an image one small step at a time. That is a far more stable training objective, one that largely sidesteps mode collapse and produces stunning detail and diversity. It’s the engine behind many of today’s big text-to-image tools. The trade-off is speed: diffusion’s many-step sampling is slow and compute-hungry, which is exactly where GANs still have an edge. (Diffusion deserves, and gets, its own separate post.)
The takeaway
Think of a GAN as the forger-versus-detective pattern: two networks improving by trying to beat each other. Reach for one when you need fast, single-shot generation and you’re willing to babysit a finicky training run — and budget time for mode collapse and instability, not just the happy path. When you need maximum fidelity and diversity and can afford slower sampling, reach for diffusion instead. The adversarial idea didn’t lose; it just found its lane. Knowing which tool fits which job is the actual skill — the rest is letting the forger and the detective fight it out.