Reading Learning Curves: Diagnosing Model Training

Your model finished training. Accuracy: 94%. You lean back, satisfied. Then it meets real data and falls apart like a sandcastle at high tide. What went wrong? The answer was almost certainly sitting in your learning curves the whole time — you just had to know how to read them.

A learning curve is a humble little plot: model performance on the y-axis, experience on the x-axis. “Experience” usually means training epochs (how long you’ve trained) or training-set size (how much data you’ve fed it). The trick is to draw two lines — one for the training set, one for a held-out validation set — and watch the gap between them. That gap is the most honest tell in all of machine learning.

The four shapes you’ll actually see

Forget memorising theory. Almost every curve you encounter falls into one of four shapes, and each one barks a clear instruction at you.

1. High training error, high validation error (the two lines hug, both stuck high). Your model can’t even fit the data it was given. This is underfitting — the model is too simple, or you stopped training too early. The instruction: add capacity (more layers, more features, a less aggressive regulariser) or just keep training. Throwing more data at an underfit model is like buying a bigger bookshelf for someone who can’t read.

2. Low training error, high validation error (a yawning gap between the lines). The model has memorised the training set instead of learning patterns that transfer. This is the classic overfit, and it’s the one that ambushed your 94% accuracy. The instruction: add regularisation (dropout, weight decay, early stopping), simplify the model, or get more training data, which is the fix that genuinely pays off here (unlike in the underfitting case).

3. Both errors low, lines converging. Congratulations, this is the good fit: training and validation loss both descend to a low plateau with only a sliver of daylight between them. The instruction: stop fiddling and ship it.

4. Validation loss dips, then turns back up into a “U”. This is overfitting caught in the act, epoch by epoch. The model improved, hit its sweet spot, then started memorising noise. The instruction is delightfully simple: early stopping. Roll back to the epoch where validation loss bottomed out. A U that turns upward after only a few epochs can also hint that your learning rate is too high.
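The U-shape is exactly what patience-based early stopping watches for. Below is a minimal, framework-free sketch, assuming you have already logged one validation loss per epoch. The function name and its `patience`/`min_delta` parameters are illustrative choices, not a specific library’s API (Keras’s `EarlyStopping` callback happens to expose similarly named `patience`, `min_delta` and `restore_best_weights` arguments).

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a patience-based early-stopping rule.

    Epochs are 0-indexed positions in val_losses. Training "stops" once the
    validation loss has failed to improve by more than min_delta for `patience`
    consecutive epochs; best_epoch is the checkpoint you would roll back to.
    (Illustrative helper, not a library function.)
    """
    if not val_losses:
        raise ValueError("need at least one validation loss")
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New bottom of the U: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Patience never ran out: training went to the final epoch.
    return best_epoch, len(val_losses) - 1


# A made-up U-shaped validation curve: it bottoms out at epoch 3.
print(early_stopping_epoch([0.9, 0.7, 0.55, 0.5, 0.52, 0.56, 0.61, 0.67]))  # (3, 6)
```

Patience exists because validation loss is noisy: stopping at the very first uptick would often throw away a model that was about to keep improving.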

Plotting them yourself

For epoch-based curves (think neural nets), you log loss each epoch and plot both series:

import matplotlib.pyplot as plt

# history.history comes from a Keras model.fit() call; "val_loss" is only
# recorded if you passed validation_data= or validation_split= to fit()
plt.plot(history.history["loss"], label="train")
plt.plot(history.history["val_loss"], label="validation")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.title("Learning curve")
plt.show()
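Outside Keras, “log loss each epoch” just means evaluating both sets after every pass over the training data. Here is a minimal NumPy-only sketch, assuming a toy logistic-regression model trained by full-batch gradient descent on synthetic data; the data, learning rate, epoch count and helper names are all made up for illustration. The dictionary it returns has the same `"loss"`/`"val_loss"` shape as `history.history`, so the plotting code above works on it unchanged.

```python
import numpy as np


def log_loss(y, p, eps=1e-12):
    # Mean binary cross-entropy; clip probabilities to avoid log(0).
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_with_history(X_train, y_train, X_val, y_val, epochs=50, lr=0.1):
    """Full-batch gradient descent for logistic regression, logging both losses per epoch."""
    w = np.zeros(X_train.shape[1])
    b = 0.0
    history = {"loss": [], "val_loss": []}
    for _ in range(epochs):
        p = sigmoid(X_train @ w + b)
        # Gradient of the mean cross-entropy with respect to w and b.
        grad_w = X_train.T @ (p - y_train) / len(y_train)
        grad_b = float(np.mean(p - y_train))
        w -= lr * grad_w
        b -= lr * grad_b
        # Evaluate on BOTH sets after the update: one point per curve per epoch.
        history["loss"].append(log_loss(y_train, sigmoid(X_train @ w + b)))
        history["val_loss"].append(log_loss(y_val, sigmoid(X_val @ w + b)))
    return history


# Synthetic data (illustrative only): 300 training rows, 100 held out.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = (X @ true_w + rng.normal(scale=0.5, size=400) > 0).astype(float)
history = train_with_history(X[:300], y[:300], X[300:], y[300:])
```

The key design choice is evaluating the validation set at the same cadence as the training set; if you only check it occasionally, the two curves stop being comparable point for point.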

If you’re in scikit-learn land and want the data-size view — “would more examples even help?” — there’s a built-in for that:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve
from sklearn.ensemble import RandomForestClassifier

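# X, y are your features and labels. By default each score is the
# estimator's .score(): accuracy for a classifier, so higher is better here.
sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
)

# Each scores array has shape (n_sizes, n_folds); average over the CV folds.
plt.plot(sizes, train_scores.mean(axis=1), label="train")
plt.plot(sizes, val_scores.mean(axis=1), label="validation")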
plt.xlabel("training examples")
plt.ylabel("score")
plt.legend()
plt.show()

That second plot answers a question people waste weeks on: should I collect more data? If the validation curve is still climbing steeply at the right edge, yes, more examples will probably pay off. If it has flattened into a plateau, more data is unlikely to save you; you need a different model or better features. The clearest case is a plateau where the training and validation curves have met at a mediocre score: that’s underfitting, and extra examples won’t lift it.
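“Still climbing steeply at the right edge” can be made a little more concrete by measuring the slope of the last segment of the mean validation curve. This is a rough heuristic sketch, not a scikit-learn feature: the function name and the `slope_tol` threshold (score gained per extra training example) are hypothetical, and it assumes higher scores are better, as with the default `learning_curve` scoring.

```python
import numpy as np


def more_data_likely_helps(sizes, val_scores, slope_tol=1e-5):
    """Heuristic: is the mean validation score still rising at the largest training size?

    sizes: 1-D training-set sizes; val_scores: (n_sizes, n_folds) array, as returned
    by sklearn's learning_curve, with higher = better. slope_tol (score gained per
    extra example) is a hypothetical threshold; pick one that suits your metric.
    """
    sizes = np.asarray(sizes, dtype=float)
    val_mean = np.asarray(val_scores, dtype=float).mean(axis=1)
    # Slope of the final segment: roughly how much each extra example bought.
    tail_slope = (val_mean[-1] - val_mean[-2]) / (sizes[-1] - sizes[-2])
    return bool(tail_slope > slope_tol)
```

With the arrays from the scikit-learn snippet you would call `more_data_likely_helps(sizes, val_scores)`. A single final segment is noisy with only five sizes, so treat the answer as a nudge to look at the plot rather than a verdict.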

A couple of things that trip people up

The two x-axes, epochs and training-set size, tell different stories, so don’t confuse them. The epoch view diagnoses how you trained (too long? too short?). The data-size view diagnoses whether more data would help. You often want both.

Also, a noisy, jittery validation curve isn’t always overfitting; it’s frequently just a validation set that’s too small or a learning rate that’s too high. Smooth it before you panic. And always plot loss, not just accuracy: accuracy only changes when a prediction crosses the decision threshold, so it moves in coarse steps and can sit perfectly flat while validation loss is already creeping up as the model grows overconfident on the examples it gets wrong.
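Two tiny plain-Python sketches for the advice above. The first is an exponential moving average, one common way to smooth a jittery curve before judging its trend (the `alpha` value is an arbitrary choice; lower means smoother). The second uses invented predictions for four examples to show accuracy staying put while log loss more than doubles. All function names here are illustrative.

```python
import math


def ema_smooth(values, alpha=0.3):
    """Exponential moving average: blend each new point with the running average."""
    smoothed = []
    running = None
    for v in values:
        running = v if running is None else alpha * v + (1 - alpha) * running
        smoothed.append(running)
    return smoothed


def accuracy(y_true, probs, threshold=0.5):
    # Fraction of examples whose thresholded prediction matches the label.
    return sum((p >= threshold) == bool(t) for t, p in zip(y_true, probs)) / len(y_true)


def mean_log_loss(y_true, probs):
    # Mean binary cross-entropy over the examples.
    return -sum(math.log(p) if t else math.log(1 - p) for t, p in zip(y_true, probs)) / len(y_true)


# Same labels, two hypothetical checkpoints. The later one is far more confident
# about the example it gets wrong (the last one), so accuracy doesn't move.
y_true = [1, 1, 0, 0]
early = [0.8, 0.7, 0.3, 0.6]
late = [0.8, 0.7, 0.3, 0.95]

print(accuracy(y_true, early), accuracy(y_true, late))            # 0.75 0.75
print(mean_log_loss(y_true, early), mean_log_loss(y_true, late))  # ~0.46, ~0.98
```

That rising loss with flat accuracy is the “slow, telling drift” an accuracy-only plot would hide.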

The takeaway

Next time training finishes, resist the urge to celebrate a single number. Plot train and validation together, then run the checklist:

Both high? → underfitting → more capacity, train longer.
Train low, val high? → overfitting → regularise, simplify, get more data.
Both low, converged? → ship it.
Val curve makes a U? → early stopping at the dip.
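The checklist is mechanical enough to write down as code. A minimal sketch, assuming per-epoch loss lists like `history.history["loss"]` and `history.history["val_loss"]` (lower is better, epochs 0-indexed). The function name and all three thresholds are hypothetical and depend entirely on your loss scale, so treat the output as a prompt to look at the plot, not a replacement for it.

```python
def diagnose(train_loss, val_loss, high=0.5, gap_tol=0.1, rise_tol=0.02):
    """Map per-epoch train/val losses (Python lists) onto the four-way checklist.

    Hypothetical thresholds: `high` is what counts as a bad loss for your problem,
    `gap_tol` the acceptable final train/val gap, and `rise_tol` how far validation
    loss must climb above its minimum before we call the curve a U.
    """
    final_train, final_val = train_loss[-1], val_loss[-1]
    best_val = min(val_loss)
    best_epoch = val_loss.index(best_val)
    # Check for the U first: validation bottomed out and then climbed back up.
    if final_val - best_val > rise_tol:
        return f"U-shaped validation curve: early stopping at epoch {best_epoch}"
    if final_train > high and final_val > high:
        return "both high: underfitting, add capacity or train longer"
    if final_val - final_train > gap_tol:
        return "train low, val high: overfitting, regularise, simplify, or get more data"
    return "both low and converged: ship it"


print(diagnose([0.6, 0.3, 0.1, 0.05], [0.65, 0.4, 0.5, 0.7]))
# U-shaped validation curve: early stopping at epoch 1
```

The U check runs first because a U is a special case of overfitting: catching it separately tells you which epoch to roll back to.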

The shape of the gap is the diagnosis. Learn to read it, and your models will surprise you a lot less — which, in this line of work, is exactly the goal.
