Hyperparameter Tuning: Grid, Random, and Bayesian Search

You trained a model, it scored 0.83, and now you’re staring at a wall of settings wondering which dials to turn. Welcome to hyperparameter tuning — the part of machine learning that’s equal parts science, patience, and resisting the urge to brute-force everything overnight. The good news: there’s a clear progression of techniques, from “try everything” to “be clever about it,” and knowing which to reach for will save you days of wasted compute.

Parameters vs hyperparameters

First, a distinction people fumble constantly. Parameters are what the model learns from data — the weights in a neural network, the split points in a tree. You never set those by hand; training does.

Hyperparameters are the knobs you set before training begins: the learning rate, the number of trees, the regularization strength, how deep a tree can grow. The model can’t learn these from the data directly, so finding good values is a separate search problem layered on top of training. That search is what we’re tuning.

Grid search: exhaustive and expensive

The most obvious approach: list the values you want to try for each hyperparameter, then test every combination. This is grid search, and scikit-learn’s GridSearchCV handles it in a few lines.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [4, 8, 16],
    "min_samples_leaf": [1, 2, 4],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                 # 5-fold cross-validation
    scoring="f1",         # binary F1; pick e.g. "f1_macro" for multiclass targets
    n_jobs=-1,            # use all cores
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)

Grid search is thorough and reproducible, but it has a fatal flaw: combinatorial explosion. That tidy little grid above is already 3 × 3 × 3 = 27 combinations, and with 5-fold cross-validation that’s 135 model fits, plus one final refit on the full training set. Add a fourth hyperparameter with three values and you’re at 81 combinations and 405 fits. This is the curse of dimensionality wearing a lab coat: every new dial multiplies your runtime. Grid search is fine for two or three hyperparameters with a handful of values each. Beyond that, it becomes a way to keep your CPU warm and your patience cold.
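To make the arithmetic concrete, here is a small sketch that counts the fits for the grid above using only the standard library. The fourth hyperparameter (max_features with three candidate values) is a hypothetical addition for illustration, and the count ignores the single final refit.

```python
import math
from itertools import product

# Same grid as the example above.
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [4, 8, 16],
    "min_samples_leaf": [1, 2, 4],
}
cv_folds = 5


def count_fits(grid, folds):
    """Fits performed by a full grid search, ignoring the final refit."""
    n_combinations = math.prod(len(values) for values in grid.values())
    return n_combinations * folds


three_dials = count_fits(param_grid, cv_folds)

# Hypothetical fourth dial with three candidate values (an assumption).
bigger_grid = {**param_grid, "max_features": ["sqrt", "log2", None]}
four_dials = count_fits(bigger_grid, cv_folds)

# Enumerating the combinations explicitly gives the same count of 27.
all_combinations = list(product(*param_grid.values()))

print(len(all_combinations), three_dials, four_dials)  # 27 135 405
```

Each extra dial multiplies the total by its number of candidate values, which is why the cost grows so quickly.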

Random search: often better per dollar

Here’s the counterintuitive result that changed how people tune: picking combinations at random often matches or beats grid search for the same compute budget. Bergstra and Bengio showed this in 2012, and the result has held up in practice. The intuition is that in most problems only a few hyperparameters actually matter much. A grid wastes its budget testing the same value of an important dial over and over while it methodically varies an irrelevant one. Random search, by sampling freely, tries far more distinct values of the parameters that count.
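A toy illustration of that intuition, assuming two dials on a [0, 1) range and a budget of nine trials (both choices are arbitrary and not from any real tuning run): the grid spends nine trials on only three distinct values of each dial, while random sampling tries a fresh value every time.

```python
import random
from itertools import product

random.seed(0)
budget = 9  # same number of trials for both strategies

# Grid: 3 values per dial, so 9 trials cover only 3 distinct values of each dial.
grid_values = [0.1, 0.5, 0.9]
grid_trials = list(product(grid_values, grid_values))

# Random: each trial draws both dials independently from [0, 1).
random_trials = [(random.random(), random.random()) for _ in range(budget)]


def distinct_values(trials, dial_index):
    """How many different values of one dial were actually tested."""
    return len({trial[dial_index] for trial in trials})


grid_distinct = distinct_values(grid_trials, 0)
random_distinct = distinct_values(random_trials, 0)
print(grid_distinct, random_distinct)
```

If the first dial is the one that matters, the random run has explored it three times as finely for the same cost.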

from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    "n_estimators": randint(100, 800),   # upper bound is exclusive: 100..799
    "max_depth": randint(3, 20),
    "min_samples_leaf": randint(1, 10),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_dist,
    n_iter=40,            # you control the budget directly
    cv=5,
    scoring="f1",
    n_jobs=-1,
    random_state=42,
)
search.fit(X_train, y_train)

The killer feature is n_iter: you decide exactly how many configurations you can afford, instead of letting the grid dictate it. Forty random draws cost a tenth of what a 400-cell grid does, and when only a few dials matter they often land close to the grid’s best result.

Bayesian search: learning as it goes

Both methods above are memoryless: they never use what they’ve already learned to decide what to try next. Bayesian optimization fixes that. It builds a probabilistic model of “settings → score,” then uses it to pick the next combination most likely to improve, balancing exploration (untried regions) against exploitation (near known-good spots). Optuna is a popular library for this; its default sampler is a Tree-structured Parzen Estimator (TPE).

import optuna
from sklearn.model_selection import cross_val_score

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 800),
        "max_depth": trial.suggest_int("max_depth", 3, 20),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 10),
    }
    model = RandomForestClassifier(**params, random_state=42)
    return cross_val_score(model, X_train, y_train, cv=5, scoring="f1").mean()

study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(seed=42),  # seeded so runs are repeatable
)
study.optimize(objective, n_trials=40)
print(study.best_params, study.best_value)

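Optuna can also prune hopeless trials early, but only if the objective reports intermediate scores with trial.report and checks trial.should_prune; the objective above returns a single final score, so nothing is pruned there. Given the same 40 trials, Optuna often lands on better settings than random search because each new guess is informed by the previous ones, though with the default sampler the first 10 trials are drawn at random and the advantage is not guaranteed on every problem.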

Rules that keep you honest

Whichever method you pick, two rules are non-negotiable:

Always tune with cross-validation. A single train/validation split can flatter a lucky configuration. Notice every example above uses cv=5, a common default; more folds or repeated cross-validation give steadier estimates on small datasets, at a higher compute cost.
Never touch the test set. Tuning is fitting to your validation data; if you peek at the test set while tuning, your final score becomes fiction. Lock it away and evaluate on it exactly once, at the end.
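Put together, the two rules amount to a three-step workflow. The sketch below uses synthetic data and a deliberately tiny search space so it runs quickly; the dataset, the ranges, and the budget are placeholders, not recommendations.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=600, n_features=15, random_state=0)

# 1. Split off the test set first and do not look at it while tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2. Tune with cross-validation on the training portion only.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": randint(20, 100), "max_depth": randint(3, 10)},
    n_iter=5,
    cv=5,
    scoring="f1",
    random_state=0,
)
search.fit(X_train, y_train)

# 3. Score the refit best model on the test set exactly once.
test_f1 = f1_score(y_test, search.predict(X_test))
print(search.best_params_, round(search.best_score_, 3), round(test_f1, 3))
```

The cross-validated best_score_ is what you compare configurations on; the test F1 is the number you report, and it is computed only after the search is finished.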

A few practical tips: pick sensible ranges (a learning rate of 50 helps no one); use log-uniform scales for things like learning rate and regularization; and turn on early stopping for boosting and neural nets so each trial bails the moment it stops improving.
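To see why the log-uniform tip matters, compare the two ways of sampling a learning rate. The range of 1e-5 to 1e-1 is an illustrative assumption. In the libraries used above, scipy.stats.loguniform and Optuna’s trial.suggest_float(..., log=True) do the log-scale version for you; this sketch does it by hand with the standard library.

```python
import random

random.seed(0)
low, high = 1e-5, 1e-1  # illustrative learning-rate range (an assumption)
n = 10_000

# Uniform sampling: almost every draw lands in the top decade of the range.
uniform_draws = [random.uniform(low, high) for _ in range(n)]

# Log-uniform sampling: draw the exponent uniformly, then exponentiate.
log_draws = [10 ** random.uniform(-5, -1) for _ in range(n)]


def share_below(draws, threshold=1e-3):
    """Fraction of draws smaller than the threshold."""
    return sum(d < threshold for d in draws) / len(draws)


uniform_share = share_below(uniform_draws)
log_share = share_below(log_draws)
print(f"uniform: {uniform_share:.3f}, log-uniform: {log_share:.3f}")
```

Uniform sampling puts only about 1% of its draws below 0.001, while log-uniform sampling puts about half of them there, so every order of magnitude gets a fair share of the budget.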

The takeaway

Match the tool to the problem. Two or three dials with a few values each? Grid search is honest and fine. A messier space and a fixed compute budget? Reach for random search — it’s the best value per dollar. Tuning something expensive where every trial counts? Let Optuna’s Bayesian search think for you. And whatever you do, tune on cross-validated folds and keep the test set sealed until the very end. The model you ship is only as trustworthy as the score you never cheated to get.