Logistic Regression: The Workhorse of Classification
Abhay
Let’s clear up the worst-named algorithm in machine learning right away: logistic regression does classification, not regression. It answers yes-or-no questions — spam or not spam, fraud or legit, will-churn or won’t — not “how much.” The name is a historical accident that has confused students for decades, and at this point we’re all just stuck with it, like the QWERTY keyboard or the appendix.
Despite the misleading label, logistic regression is the quiet workhorse of the field. It’s fast, it can be useful even with a modest amount of data, and, crucially, it can tell you why it made a decision. In an era of billion-parameter black boxes, that last part is worth a lot.
From a straight line to a probability
A plain linear model computes a weighted sum of your features: w₁x₁ + w₂x₂ + … + b. That sum can be anything from minus-infinity to plus-infinity, which is awkward when you want a probability between 0 and 1. You can’t very well tell someone there’s a 240% chance of rain.
Enter the sigmoid function (also called the logistic function), the bit of mathematical origami that makes the whole thing work:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Feed it any number and it gently squashes the result into the open interval between 0 and 1. A big positive sum gets pushed toward 1, a big negative one toward 0, and a sum of zero lands exactly at 0.5. So logistic regression is really just a linear model wearing an S-shaped curve as a hat: compute the linear combination, then pass it through the sigmoid to get a genuine probability.
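To make the squashing concrete, here is a minimal sketch of the two-step recipe: a weighted sum, then the sigmoid. The weights, bias, and feature values are made up purely for illustration, and the branch on the sign of z is one common way to avoid overflow, not the only one.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number toward the range between 0 and 1."""
    # Branch on the sign so exp() is only ever called on a non-positive number
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical weights, bias, and feature values, chosen only for illustration
weights = [0.8, -1.2]
bias = 0.5
features = [2.0, 1.0]

# Step 1: the plain linear model (can be any real number)
z = sum(w * x for w, x in zip(weights, features)) + bias
# Step 2: squash it into a probability
p = sigmoid(z)

for value in (-6.0, 0.0, 6.0):
    print(f"sigmoid({value}) = {sigmoid(value):.4f}")
print(f"z = {z:.2f}, p = {p:.4f}")
```

Large negative inputs land near 0, large positive ones near 1, and an input of exactly zero gives 0.5.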
The decision threshold
A probability isn’t a decision. To turn 0.73 into an actual answer, you pick a threshold — by default 0.5. Above it, you call it the positive class; below it, the negative.
That 0.5 isn’t sacred, though. If you’re screening for a serious disease, you’d rather catch every possible case and tolerate some false alarms, so you’d drop the threshold to, say, 0.2. If you’re auto-blocking accounts and a wrong block infuriates real users, you’d raise it. Tuning the threshold is the cheapest way to trade false positives against false negatives without retraining anything.
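The trade-off is easy to see on a toy example. The sketch below uses ten invented probabilities and labels (not output from any real model) and counts both kinds of mistake at three thresholds. The helper name `confusion_counts` is hypothetical.

```python
# Hypothetical predicted probabilities and true labels (1 = positive), for illustration only
probs  = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.80, 0.90, 0.95]
labels = [0,    0,    1,    0,    1,    0,    1,    1,    1,    1]

def confusion_counts(probs, labels, threshold):
    """Return (false_positives, false_negatives) at a given threshold."""
    # False positive: predicted positive (p > threshold) but actually negative
    fp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 0)
    # False negative: predicted negative (p <= threshold) but actually positive
    fn = sum(1 for p, y in zip(probs, labels) if p <= threshold and y == 1)
    return fp, fn

for threshold in (0.2, 0.5, 0.7):
    fp, fn = confusion_counts(probs, labels, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

On this toy data, lowering the threshold to 0.2 removes every missed positive at the price of two false alarms, while raising it to 0.7 does the reverse. The model itself never changes; only the cut-off does.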
Why people actually love it: interpretability
Here’s where logistic regression earns its keep. Each feature gets a coefficient, and those coefficients are readable. A coefficient operates on the log-odds of the positive class: increase a feature by one unit, and you nudge the log-odds by exactly that coefficient, holding everything else constant.
Exponentiate a coefficient and you get an odds ratio — the genuinely useful number. An odds ratio of 1.5 for “number of late payments” means each additional late payment multiplies the odds of default by 1.5. You can put that in a slide and a regulator will nod. Try explaining a gradient-boosted ensemble’s reasoning to an auditor and watch the room go quiet.
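The arithmetic behind that claim can be checked in a few lines. This is a sketch of a one-feature model with an assumed intercept of -2.0 and a coefficient picked so the odds ratio comes out at the 1.5 used above; neither number comes from a real fit.

```python
import math

# Hypothetical fitted values, chosen so the odds ratio matches the 1.5 in the text
coef_late_payments = math.log(1.5)   # coefficient on "number of late payments"
intercept = -2.0                     # assumed intercept, for illustration only

def odds_of_default(late_payments: int) -> float:
    """Odds = p / (1 - p) = exp(log-odds) for this one-feature model."""
    log_odds = intercept + coef_late_payments * late_payments
    return math.exp(log_odds)

def probability(odds: float) -> float:
    """Convert odds back into a probability."""
    return odds / (1.0 + odds)

odds_ratio = math.exp(coef_late_payments)
print(f"odds ratio = {odds_ratio:.2f}")
for n in range(4):
    odds = odds_of_default(n)
    print(f"late payments={n}: odds={odds:.4f}, probability={probability(odds):.4f}")
```

Each extra late payment multiplies the odds by the same factor, but the probability does not grow by a fixed factor or a fixed amount. That is why the odds ratio, not the probability change, is the number to quote.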
Logistic regression in code
Modern scikit-learn (the current default solver is lbfgs, with L2 regularization and C=1.0) makes this almost embarrassingly short:
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Scaling matters: it speeds convergence and keeps coefficients comparable
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
model = LogisticRegression(max_iter=1000) # C=1.0, penalty='l2' by default
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))
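# Probabilities, not just labels.
# In this dataset class 0 is malignant and class 1 is benign,
# so column 0 of predict_proba is P(malignant).
probs = model.predict_proba(X_test[:3])[:, 0]
print("P(malignant):", probs.round(3))
# Apply a custom threshold (cautious screening):
# call it malignant (0) as soon as P(malignant) exceeds 0.3, otherwise benign (1)
threshold = 0.3
predictions = (model.predict_proba(X_test)[:, 0] <= threshold).astype(int)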
Notice predict_proba hands back actual probabilities, and model.coef_ exposes those interpretable weights whenever you want to know what drove a call.
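As a rough sketch of what reading those weights can look like, the snippet below refits on the full, standardized dataset (a shortcut for brevity, not a recommended evaluation setup) and lists the five features with the largest absolute coefficients. Because the inputs are standardized, "one unit" here means one standard deviation of the feature.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)   # one unit = one standard deviation
model = LogisticRegression(max_iter=1000).fit(X, data.target)

coefs = model.coef_[0]                 # one row of weights for a binary problem
order = np.argsort(-np.abs(coefs))     # largest absolute effect first

# In this dataset class 1 is benign, so a negative coefficient pushes toward malignant
for i in order[:5]:
    name = data.feature_names[i]
    print(f"{name:<25} coef={coefs[i]:+.3f}  odds ratio={np.exp(coefs[i]):.3f}")
```

The sign tells you the direction, and the exponentiated value is the odds ratio for class 1 (benign in this dataset) per standard deviation of that feature.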
Where it runs out of road
Logistic regression draws a straight decision boundary (a flat hyperplane, technically). If your classes are tangled in a way no straight line can separate (concentric circles, an XOR-shaped mess), it’ll underfit and shrug. You can rescue it with engineered features or polynomial terms, but at some point a tree ensemble or neural net is the honest answer. It also struggles with highly correlated features, which make the individual coefficients unstable and harder to interpret, and it likes its inputs scaled.
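Here is a small sketch of both the failure and the rescue, on synthetic XOR-style data generated for the occasion. The sample size and seed are arbitrary, and the scores are training accuracy, which is enough to show the shape of the boundary but says nothing about generalization.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic XOR-style data: points in the unit square around the origin
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
# Class 1 when both coordinates share a sign, class 0 otherwise
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Raw features: no single straight line separates the quadrants
linear = LogisticRegression().fit(X, y)
linear_acc = linear.score(X, y)

# Engineered feature: the product x1 * x2 makes the classes linearly separable
X_aug = np.column_stack([X, X[:, 0] * X[:, 1]])
augmented = LogisticRegression().fit(X_aug, y)
augmented_acc = augmented.score(X_aug, y)

print("raw features:       ", linear_acc)
print("with x1 * x2 added: ", augmented_acc)
```

The model is identical in both fits. The only difference is that the second one has been handed a feature in which the boundary really is a straight line.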
The takeaway
Reach for logistic regression first, before anything fancier. On small and medium datasets it trains almost instantly, its probabilities are usually reasonably well calibrated, and it explains itself, so it doubles as a baseline and a sanity check. The rule of thumb: scale your features, fit the model, read the coefficients to understand your problem, then tune the threshold to match the real-world cost of being wrong. If a more complex model can’t beat this humble baseline by a meaningful margin, you’ve just saved yourself a lot of compute and a great deal of explaining.