What Is Deep Learning (and When You Don't Need It)

Somewhere along the way, “AI” and “deep learning” became the same word in most people’s heads. So now every problem gets the same prescription: throw a neural network at it. Got a spreadsheet of 4,000 customers and want to predict churn? Neural network. Tiny dataset, three features, a deadline? Neural network, obviously.

This is a bit like buying a forklift to carry your groceries. Impressive machine. Wrong job. Let’s untangle what deep learning actually is, why it’s so hungry, and the unglamorous truth about when a humble decision tree will quietly beat it.

So what makes it “deep”?

Deep learning is a flavour of machine learning built on neural networks — layers of simple math units that pass signals forward, each layer transforming the output of the last. The “deep” just refers to having many of those layers stacked up. (Technically, more than one hidden layer earns you the word; modern networks have dozens to hundreds.)

The point of all those layers is hierarchical features. Early layers learn crude patterns — an edge, a blob of colour. Later layers combine those into eyes, then faces, then “that’s a golden retriever.” You don’t tell it what an edge is; it works that out from the data. That’s the magic trick, and also the catch: nobody hand-coded those features, which means the network has to discover them, and discovery takes examples. Lots of them.

I won’t go layer-by-layer into the internals here — that’s its own rabbit hole. The thing to hold onto is: depth buys you automatic feature-learning, and you pay for it in data and compute.
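To make “layers stacked up” concrete, here is a minimal sketch of a forward pass in plain Python. The layer sizes, the random weights and the tanh nonlinearity are all arbitrary choices for illustration, not a recipe; a real network would learn its weights from data rather than draw them at random.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer, randomly initialised."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layers, x):
    """Pass the input vector x through every layer in order."""
    for weights, biases in layers:
        # Each unit takes a weighted sum of the previous layer's output,
        # adds a bias, then applies a nonlinearity (tanh here).
        x = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

rng = random.Random(0)
sizes = [4, 8, 8, 2]  # 4 inputs, two hidden layers of 8 units, 2 outputs (arbitrary)
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
output = forward(layers, [0.5, -1.2, 3.0, 0.1])
print(len(layers), "layers ->", output)
```

“Deeper” just means a longer `sizes` list: every extra entry adds another transformation between input and output.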

Why it’s so hungry

A deep network can have millions or billions of parameters, the knobs it tunes during training. Tune millions of knobs with a few hundred examples and you don’t get intelligence, you get a very confident memoriser. To actually generalise, you need data roughly in proportion to the model’s appetite, and the bill grows on both fronts: training compute scales roughly with the number of parameters times the number of examples the network sees, so a bigger model fed a bigger dataset costs more twice over.
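A quick way to feel the mismatch is to count the knobs. The sketch below counts weights and biases for a plain fully connected network; the layer widths are a hypothetical small image classifier, and 300 stands in for “a few hundred examples”.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

layer_sizes = [784, 512, 512, 10]  # hypothetical: 28x28 pixel input, two hidden layers, 10 classes
n_params = count_parameters(layer_sizes)
n_examples = 300                   # "a few hundred examples"

print(f"{n_params:,} parameters")  # 669,706 parameters
print(f"{n_params / n_examples:,.0f} parameters per training example")
```

Even this toy network has thousands of parameters for every example, which is plenty of room to memorise the training set instead of learning from it.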

That’s why deep learning’s golden age arrived alongside two things: enormous labelled datasets and GPUs. The GPU isn’t a luxury: training is mostly enormous matrix multiplications, which GPUs run in parallel far faster than CPUs, and the weights, gradients and intermediate activations kept for backpropagation eat memory on top of that. No big data, no big silicon, no deep learning party.
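For a rough sense of the memory side, here is a back-of-the-envelope estimate. It assumes 32-bit floats and an Adam-style optimiser that keeps two extra values per parameter; both are assumptions for illustration, and it deliberately leaves out activations, which depend on batch size and architecture.

```python
def training_memory_gb(n_params, bytes_per_value=4, optimizer_states=2):
    """Rough memory for weights, gradients and optimiser state, in GB (10^9 bytes).

    Ignores activations, which depend on batch size and architecture.
    """
    copies = 1 + 1 + optimizer_states  # weights + gradients + optimiser state
    return n_params * bytes_per_value * copies / 1e9

for n_params in (1_000_000, 100_000_000, 1_000_000_000):
    print(f"{n_params:>13,} params -> ~{training_memory_gb(n_params):.2f} GB before activations")
```

Under these assumptions a billion-parameter model needs about 16 GB before it has stored a single activation.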

Where it genuinely shines

Deep learning earns its keep on unstructured, high-dimensional data — the stuff where features are tangled and you’d never write them by hand:

Images — classification, detection, segmentation, generation.
Audio — speech recognition, music, sound events.
Language — translation, summarisation, and yes, the chatbots that ate the internet.

These domains share a trait: a single raw input (a pixel, a sample, a token) is meaningless alone, and meaning lives in patterns across thousands of them. That’s exactly the problem hierarchical layers were born to solve.

Where boring old models still win

Now the part nobody puts in the keynote. For tabular data (rows and columns, the bread and butter of actual businesses), gradient-boosted trees like XGBoost and LightGBM still routinely beat deep learning. This isn’t folklore; the landmark Grinsztajn et al. (2022) benchmark, “Why do tree-based models still outperform deep learning on tabular data?”, found that tree-based models remain state of the art on medium-sized datasets (around 10,000 training rows).

Why? A few reasons:

Neural nets prefer smooth functions. Tabular targets are often jagged and irregular — trees handle the cliffs; networks try to sand them down.
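Trees shrug off junk features. Tables are full of them: in the benchmark, dropping the least informative half of the columns barely dented a boosted tree’s accuracy, while padding a table with uninformative columns hurt neural nets more than trees.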
They’re data-efficient. Trees do well on thousands of rows. Deep nets are still warming up at that scale.
They’re cheaper and more interpretable. Better results, less tuning, smaller compute budget — and you can actually explain a prediction.
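The “cliffs” point is easy to see on a toy problem. The sketch below fits a one-split tree (a decision stump) and a straight line to a step function. The straight line is only a stand-in for a smooth model and is far simpler than any neural network, so treat this as an illustration of piecewise-constant versus smooth fits, not a benchmark.

```python
def fit_stump(xs, ys):
    """One-split regression tree: pick the threshold that minimises squared error."""
    def mean(values):
        return sum(values) / len(values)

    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; also returns squared error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    error = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return error, slope, intercept

xs = [i / 10 for i in range(20)]             # 0.0, 0.1, ..., 1.9
ys = [0.0 if x < 1.0 else 5.0 for x in xs]   # a cliff at x = 1.0

stump_error, threshold, left_value, right_value = fit_stump(xs, ys)
line_error, slope, intercept = fit_line(xs, ys)
print(f"stump: split near {threshold:.2f}, squared error {stump_error:.2f}")
print(f"line:  squared error {line_error:.2f}")
```

One split lands on the cliff and fits it exactly; the line has to smear the jump across the whole range.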

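# The "deep learning, obviously" reflex...
# model = build_a_giant_neural_net()  # hypothetical helper: GPU, hours, much hyperparameter angst

# ...vs the answer that probably wins on your CSV:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)  # synthetic stand-in for your CSV
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier()
model.fit(X_train, y_train)         # minutes, CPU, embarrassingly good
print(model.score(X_test, y_test))  # accuracy on held-out rows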

Do you actually need it? A quick gut-check

Run your problem through this before reaching for the forklift:

Is your data unstructured (images, audio, free text)? → Deep learning is likely the right tool.
Is it tabular (spreadsheet, database export)? → Start with gradient-boosted trees. Seriously, start there.
Do you have lots of labelled examples (tens of thousands plus) and the compute to match? → Deep learning becomes viable.
Small data, tight deadline, or need to explain every decision? → Classical ML, every time.
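If you prefer your gut-checks executable, the checklist can be sketched as a small function. The function name, the argument names and the 10,000-example cutoff are all made up for illustration; the output is a first guess to argue with, not a verdict.

```python
UNSTRUCTURED = {"image", "audio", "text"}

def suggest_starting_point(data_kind, n_labelled, has_compute,
                           must_explain=False, tight_deadline=False,
                           min_examples_for_deep=10_000):
    """Turn the gut-check into a rough first suggestion.

    The default cutoff of 10,000 labelled examples is an arbitrary
    illustrative threshold, not a rule from any benchmark.
    """
    # Small data, a tight deadline or a need to explain decisions: stay classical.
    if must_explain or tight_deadline or n_labelled < min_examples_for_deep:
        return "classical ML"
    # Tabular data: start with boosted trees.
    if data_kind == "tabular":
        return "gradient-boosted trees"
    # Unstructured data, plenty of labels and the compute to match.
    if data_kind in UNSTRUCTURED and has_compute:
        return "deep learning"
    return "classical ML"

print(suggest_starting_point("tabular", 4_000, has_compute=False))
print(suggest_starting_point("tabular", 50_000, has_compute=True))
print(suggest_starting_point("image", 200_000, has_compute=True))
```

The order of the checks matters: the constraints that rule deep learning out are tested before the ones that rule it in.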

The honest rule of thumb: start simple, and let the simple model fail before you escalate. A logistic regression or a boosted tree you can train in two minutes is the best baseline you’ll ever have — sometimes it’s also the last model you’ll ever need. Deep learning is spectacular at the problems it was built for. The skill isn’t knowing how to use it; it’s knowing when not to.
