How Neural Networks Actually Work

Neural networks have a branding problem. The name conjures up glowing brains and science-fiction sentience, when the reality is closer to a very enthusiastic spreadsheet that learned to multiply. Strip away the mystique and a neural network is just a pile of numbers that get nudged, over and over, until they stop being wrong so often. That’s it. That’s the magic trick. Let me show you the gears.

The neuron: a weighted vote

The basic unit is the neuron, and it does something almost insultingly simple. It takes some inputs, multiplies each by a weight, adds them all up, tacks on a bias, and squishes the result through an activation function.

Weights are how much the neuron cares about each input. A big weight means “this input matters a lot”; a near-zero weight means “ignore this.” The bias is a constant nudge that shifts the whole result up or down — think of it as the neuron’s baseline mood before it even looks at the data. Together, weights and biases are the network’s learnable parameters: the only things that change when it learns.
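To make the weighted vote concrete, here is a minimal sketch of the sum a single neuron computes before any activation is applied. The input values, the weights, and the helper name weighted_sum are all made up for illustration.

```python
import numpy as np

def weighted_sum(inputs, weights, bias):
    """Multiply each input by its weight, add them up, then add the bias."""
    return float(np.dot(inputs, weights) + bias)

inputs = np.array([0.7, 0.2])      # illustrative values, not real data

# The neuron cares about both inputs
z_both = weighted_sum(inputs, np.array([0.4, 0.3]), bias=0.1)

# A zero second weight means the second input is ignored entirely
z_ignore = weighted_sum(inputs, np.array([0.4, 0.0]), bias=0.1)

print(z_both)    # 0.7*0.4 + 0.2*0.3 + 0.1, about 0.44
print(z_ignore)  # 0.7*0.4 + 0.1, about 0.38
```

Raising or lowering the bias shifts both results by the same amount, whatever the inputs are.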

The activation function is the secret sauce. Without it, stacking neurons would just give you a fancy linear equation, and no amount of stacking turns a straight line into something that can recognise a cat. Activation functions bend the line. The two you’ll meet constantly are ReLU (max(0, x) — brutally simple, keeps the positives, zeroes the negatives) and sigmoid (squashes anything into the range 0 to 1, handy for probabilities).
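The claim that stacking without an activation buys you nothing can be checked directly. In this sketch (the matrices are arbitrary illustrative values, and biases are left out for brevity), two linear layers give exactly the same answer as one layer whose weights are the product of the two. Putting a ReLU between them breaks that equivalence.

```python
import numpy as np

x = np.array([1.0, -2.0])          # illustrative input
W1 = np.array([[1.0, -1.0],
               [0.5,  2.0]])       # first layer, 2 -> 2
W2 = np.array([[1.0],
               [1.0]])             # second layer, 2 -> 1

# Two linear layers back to back, no activation in between
stacked = (x @ W1) @ W2

# One linear layer using the pre-multiplied weights
merged = x @ (W1 @ W2)

# Same two layers, but with ReLU applied after the first
with_relu = np.maximum(0, x @ W1) @ W2

print(stacked, merged)   # both are [-5.]
print(with_relu)         # [0.], no single linear layer was doing this
```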

Layers: input, hidden, output

One neuron is a party trick. The power comes from arranging them in layers.

The input layer is just your data: pixel values, word counts, sensor readings.
The hidden layers in the middle do the real thinking. In an image classifier, early layers tend to learn crude features (edges, blobs), and later layers combine them into something meaningful (whiskers, ears, “yep, that’s a cat”). “Deep learning” just means using a network with many hidden layers.
The output layer delivers the verdict — one number for “spam or not,” or ten numbers for “which digit is this?”

Data flows left to right, layer to layer. This is the forward pass, and it’s mostly just matrix multiplication. Here’s a tiny one in NumPy, two inputs into a small hidden layer and out to a single prediction:

import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# inputs
x = np.array([0.7, 0.2])

# layer 1: 2 inputs -> 3 hidden neurons
W1 = np.array([[0.4, -0.5, 0.1],
               [0.3,  0.8, -0.2]])
b1 = np.array([0.1, -0.3, 0.05])

# layer 2: 3 hidden -> 1 output
W2 = np.array([[0.6], [-0.9], [0.4]])
b2 = np.array([0.2])

h = relu(x @ W1 + b1)        # forward pass, layer 1
y = sigmoid(h @ W2 + b2)     # forward pass, layer 2
print(y)                     # -> a single probability, roughly 0.62

That’s a complete neural network. No frameworks, no GPUs, no hype — just multiply, add, activate, repeat.
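Because the forward pass is matrix multiplication, the same code handles many examples at once if you stack the inputs as rows. Here is a small sketch reusing the weights from the snippet above. The forward helper and the two extra input rows are assumptions added for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Same weights and biases as the single-example snippet
W1 = np.array([[0.4, -0.5, 0.1],
               [0.3,  0.8, -0.2]])
b1 = np.array([0.1, -0.3, 0.05])
W2 = np.array([[0.6], [-0.9], [0.4]])
b2 = np.array([0.2])

def forward(X):
    """Run a batch of inputs (one example per row) through both layers."""
    h = relu(X @ W1 + b1)         # shape: (batch, 3)
    return sigmoid(h @ W2 + b2)   # shape: (batch, 1)

X = np.array([[0.7, 0.2],         # the original example
              [0.0, 1.0],         # two made-up extra examples
              [1.0, 0.0]])

probs = forward(X)
print(probs.shape)   # (3, 1): one prediction per row
print(probs)
```

The biases are added to every row automatically through NumPy broadcasting, which is why the layer code does not change.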

How it actually learns

Here’s the part everyone wants to understand. When you first build a network, those weights are random. Feed it a cat and it’ll confidently announce “37% banana.” It’s not stupid; it just hasn’t been corrected yet.

So you correct it. You show it an example with a known answer and measure how wrong the guess was using a loss function — a single number where bigger means worse. The entire goal of training is to make that number small.
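The post does not commit to a particular loss function, so the sketch below shows two common choices as assumptions: squared error and binary cross-entropy. Both turn "how wrong was the guess" into a single number that grows as the guess gets worse.

```python
import numpy as np

def squared_error(prediction, target):
    """Squared distance between the guess and the known answer."""
    return (prediction - target) ** 2

def binary_cross_entropy(prediction, target, eps=1e-12):
    """Penalises confident wrong probabilities especially heavily."""
    p = np.clip(prediction, eps, 1 - eps)   # avoid log(0)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)))

target = 1.0   # the known answer, e.g. "yes, this is spam"

# Three guesses, from nearly right to badly wrong
for guess in (0.9, 0.5, 0.1):
    print(guess, squared_error(guess, target), binary_cross_entropy(guess, target))
```

In both columns the loss climbs as the guess moves away from the target, which is all the training procedure needs.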

To do that, the network needs to know which weights to blame for the error. Enter backpropagation. Using the chain rule from calculus, it works backwards from the output, calculating how much each individual weight and bias contributed to the mistake. This gives a gradient: for every parameter, a number that says how the loss changes as that parameter increases, and therefore which way to nudge it to be less wrong.
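Here is a hand-written sketch of that backwards walk for the tiny two-layer network from earlier. It assumes a squared-error loss and a target of 1.0, neither of which comes from the original snippet. It also compares one gradient entry against a brute-force estimate (wiggle the weight, see how the loss moves).

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.7, 0.2])
target = 1.0                      # assumed label for this example

W1 = np.array([[0.4, -0.5, 0.1],
               [0.3,  0.8, -0.2]])
b1 = np.array([0.1, -0.3, 0.05])
W2 = np.array([[0.6], [-0.9], [0.4]])
b2 = np.array([0.2])

# Forward pass, keeping the intermediate values for the backward pass
h_pre = x @ W1 + b1
h = relu(h_pre)
y = sigmoid(h @ W2 + b2)

# Backward pass: apply the chain rule one step at a time, output first
dL_dy = 2 * (y - target)              # squared-error loss
dL_dz = dL_dy * y * (1 - y)           # through the sigmoid
dL_dW2 = np.outer(h, dL_dz)           # blame for layer-2 weights
dL_db2 = dL_dz
dL_dh = W2 @ dL_dz                    # pass the blame back to the hidden layer
dL_dhpre = dL_dh * (h_pre > 0)        # through ReLU: zeroed neurons get no blame
dL_dW1 = np.outer(x, dL_dhpre)        # blame for layer-1 weights
dL_db1 = dL_dhpre

# Sanity check: estimate one gradient entry by nudging that weight directly
def loss_with(W1_try):
    h_try = relu(x @ W1_try + b1)
    y_try = sigmoid(h_try @ W2 + b2)
    return ((y_try - target) ** 2).item()

eps = 1e-6
W1_up, W1_down = W1.copy(), W1.copy()
W1_up[0, 0] += eps
W1_down[0, 0] -= eps
numeric = (loss_with(W1_up) - loss_with(W1_down)) / (2 * eps)

print(dL_dW1[0, 0], numeric)   # the two estimates should agree closely
```

The entry is negative here, meaning that increasing that weight would lower the loss. The middle column of dL_dW1 is all zeros because ReLU switched that hidden neuron off for this input.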

Then gradient descent takes a small step in the direction that lowers the loss, adjusting every weight a tiny bit. Run a cat through, measure the loss, backpropagate, nudge. Run a dog through, measure, nudge. Do this a few million times across thousands of examples, and the random numbers slowly self-organise into a configuration that gets things right. When training goes well, the network isn’t just memorising answers; it’s discovering the patterns that generate the answers. That’s what “learning” means here: error going down, one nudge at a time.
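The measure-and-nudge loop can be sketched in a few lines. To keep the gradient short enough to write by hand, this example makes several simplifying assumptions: it trains a single sigmoid neuron rather than a deep network, it uses a toy dataset (the logical OR of two inputs), it uses cross-entropy loss, it updates on all four examples at once, and it starts from zero weights instead of random ones so the run is repeatable.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy dataset (assumed for illustration): output is 1 if either input is 1
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([0.0, 1.0, 1.0, 1.0])

w = np.zeros(2)        # zero start is fine for one neuron; real networks start random
b = 0.0
learning_rate = 1.0    # size of each nudge

def mean_loss(w, b):
    """Average cross-entropy over the four examples."""
    p = np.clip(sigmoid(X @ w + b), 1e-12, 1 - 1e-12)
    return float(np.mean(-(t * np.log(p) + (1 - t) * np.log(1 - p))))

initial_loss = mean_loss(w, b)

for step in range(2000):
    p = sigmoid(X @ w + b)            # forward pass
    error = p - t                     # gradient of the loss w.r.t. the pre-sigmoid sum
    grad_w = X.T @ error / len(X)     # average blame for each weight
    grad_b = error.mean()             # average blame for the bias
    w -= learning_rate * grad_w       # step against the gradient
    b -= learning_rate * grad_b

final_loss = mean_loss(w, b)
predictions = (sigmoid(X @ w + b) > 0.5).astype(int)

print(initial_loss, final_loss)
print(predictions)   # [0 1 1 1]
```

A deep network runs the same loop. The only difference is that backpropagation supplies the gradients, since they can no longer be written in one line.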

The takeaway

A neural network is four ideas wearing a trench coat: weighted sums, activation functions for non-linearity, a loss function to measure wrongness, and gradient descent to fix it. If you want to make this concrete, do this next: open a notebook, paste the NumPy snippet above, and change the weights by hand. Watch the output move. Then imagine an algorithm doing that adjustment billions of times, automatically, chasing a smaller loss. That’s not a brain — it’s arithmetic with excellent aim. And honestly, once you see the gears, that’s somehow more impressive than the magic ever was.
