Recommender Systems, Explained

You open Netflix to watch one specific thing, and forty minutes later you’re three episodes into a Korean baking competition you didn’t know existed. Amazon sees you buy a tent and cheerfully suggests a sleeping bag, a headlamp, and — somewhat ominously — bear spray. This is not magic, and it is not (entirely) surveillance. It’s a recommender system: software whose entire job is to guess what you’ll want next from a catalogue too big for any human to browse. Let’s pull back the curtain.

Two ways to guess: content vs. collaborative

There are two classic philosophies for “what should I show this person?”

Content-based filtering looks at the items. It describes each movie or product with features — genre, director, price, “contains explosions” — and recommends things similar to what you already liked. Loved a gritty space thriller? Here’s another gritty space thriller. It’s intuitive and works fine for new items, but it traps you in a bubble: it will never suggest the rom-com that would’ve surprised you, because nothing in your history says “rom-com.”
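To make that concrete, here is a minimal sketch of content-based filtering. The titles and the four hand-picked features are invented for illustration, and cosine similarity is just one common way to measure "similar"; a real system would use far richer features.

```python
import numpy as np

# Hypothetical catalogue. Each row is [sci-fi, gritty, romance, comedy].
titles = ["Space Thriller A", "Space Thriller B", "Rom-Com C", "Space Comedy D"]
features = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
])

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend_similar(liked_index, top_n=2):
    """Rank every other title by feature similarity to the one the user liked."""
    liked = features[liked_index]
    scored = [
        (cosine_similarity(liked, features[i]), titles[i])
        for i in range(len(titles))
        if i != liked_index
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_n]]

print(recommend_similar(0))
```

With this toy data the rom-com scores exactly zero against the space thriller, which is the bubble in miniature: nothing in the liked item's features points toward it.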

Collaborative filtering ignores the item’s description entirely and looks at behaviour. The slogan: people who agreed in the past will probably agree again. It comes in two flavours:

User-based: find people whose taste resembles yours, then recommend what they liked that you haven’t seen.
Item-based: find items that tend to be liked by the same people (“customers who bought this also bought…”). Amazon famously leaned on this, in part because item-to-item similarities shift more slowly than the tastes of fickle humans and so can be computed ahead of time.
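The item-based flavour can be sketched in a few lines. The purchase matrix below is made up, and comparing item columns with cosine similarity is an assumption for the sake of the example, not a description of any particular retailer's system.

```python
import numpy as np

# Hypothetical purchase history: rows = users, columns = items, 1 = bought.
items = ["tent", "sleeping bag", "headlamp", "blender"]
purchases = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 1],
])

def item_similarity(matrix):
    """Cosine similarity between every pair of item columns."""
    norms = np.linalg.norm(matrix, axis=0)
    sim = (matrix.T @ matrix) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)  # an item should not recommend itself
    return sim

def also_bought(item_name, top_n=2):
    """Items most often bought by the same people who bought item_name."""
    sim = item_similarity(purchases)
    idx = items.index(item_name)
    ranked = np.argsort(-sim[idx])[:top_n]
    return [items[i] for i in ranked]

print(also_bought("tent"))
```

Note that the similarity matrix depends only on the purchase history, so it can be computed once and reused for every shopper.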

The beauty of collaborative filtering is that it can recommend things it knows nothing about. It doesn’t need to understand why fans of obscure jazz also love a particular sci-fi novel — it just notices that they do.
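Here is a rough sketch of the user-based flavour that shows this blindness in action. The ratings are invented, the columns are just anonymous item numbers, and treating an unrated cell as 0 inside the similarity calculation is a simplification a real system would handle more carefully.

```python
import numpy as np

# Hypothetical ratings (1-5); 0 means "not rated". Columns are opaque item IDs.
ratings = np.array([
    [5, 4, 0, 0],   # user 0: the person we are recommending for
    [5, 5, 4, 0],   # user 1: similar taste to user 0
    [1, 0, 0, 5],   # user 2: rather different taste
], dtype=float)

def cosine(a, b):
    """Cosine similarity between two users' rating rows."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend_for(user, top_n=1):
    """Score unseen items by the ratings of other users, weighted by similarity."""
    weights = np.array([
        cosine(ratings[user], ratings[other]) if other != user else 0.0
        for other in range(len(ratings))
    ])
    scores = weights @ ratings
    scores[ratings[user] > 0] = -np.inf  # hide items the user already rated
    return [int(i) for i in np.argsort(-scores)[:top_n]]

print(recommend_for(0))
```

At no point does the code ask what item 2 actually is. It surfaces because the most similar user rated it highly, and that is all the algorithm knows.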

Matrix factorization, in plain terms

Here’s the technique that rose to fame during the Netflix Prize competition. Imagine a giant spreadsheet: users as rows, movies as columns, ratings in the cells. It’s mostly empty, because nobody has watched everything. Our job is to fill in the blanks.

Matrix factorization says: every user and every movie can be described by a short list of hidden traits — call them latent factors. Maybe factor 1 is “how much romance,” factor 2 is “how arty,” factor 3 is “how much it secretly wants to be a musical.” We don’t name these; the algorithm discovers them. A user gets a vector of preferences, a movie gets a vector of attributes, and a predicted rating is just the dot product: how well your tastes line up with the movie’s traits.

import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the output is repeatable

# 4 users, 3 latent factors each (random stand-ins for learned values)
users = rng.random((4, 3))
# 5 movies, the same 3 factors
movies = rng.random((5, 3))

# Predicted rating for user 0 on movie 2 = alignment of their vectors
prediction = users[0] @ movies[2]
print(round(float(prediction), 2))

Training nudges those vectors until the dot products match the ratings we do have. The clever bit: once trained, the model predicts ratings for cells that were always empty. Two tall, thin matrices of short, dense vectors (one row per user, one row per movie) stand in for one enormous, sparse spreadsheet.
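What "nudging" looks like in practice can be sketched with plain stochastic gradient descent. Everything here is an illustrative assumption: the handful of ratings, the two latent factors, the learning rate, and the small regularization term are chosen only to keep the example tiny, and production systems often use other optimizers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed ratings as (user, movie, rating) triples.
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
            (2, 1, 2.0), (2, 2, 5.0), (3, 0, 5.0), (3, 2, 2.0)]

n_users, n_movies, n_factors = 4, 3, 2
users = rng.normal(scale=0.1, size=(n_users, n_factors))
movies = rng.normal(scale=0.1, size=(n_movies, n_factors))

def rmse():
    """Root-mean-square error over the ratings we actually have."""
    errors = [r - users[u] @ movies[m] for u, m, r in observed]
    return float(np.sqrt(np.mean(np.square(errors))))

learning_rate, regularization = 0.05, 0.02
before = rmse()
for epoch in range(500):
    for u, m, r in observed:
        error = r - users[u] @ movies[m]
        user_vec = users[u].copy()  # keep the old value for the movie update
        users[u] += learning_rate * (error * movies[m] - regularization * users[u])
        movies[m] += learning_rate * (error * user_vec - regularization * movies[m])
after = rmse()

print(f"RMSE before: {before:.2f}, after: {after:.2f}")
# User 0 never rated movie 2, but the model still produces a guess.
print(f"Predicted rating for user 0 on movie 2: {users[0] @ movies[2]:.2f}")
```

Each update moves one user vector and one movie vector a small step in the direction that shrinks the error on a single known rating. The regularization term keeps the vectors from growing just to memorise the training data.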

The cold-start problem

All of this falls apart at the worst possible moment: when someone new arrives. A brand-new user has rated nothing, so collaborative filtering has no behaviour to lean on. A brand-new item has been rated by no one, so it’s invisible. This is the cold-start problem, and it’s the reason every app nags you to “pick 3 things you like” on signup — it’s desperately trying to warm you up.

Common fixes: fall back to content-based features for new items, recommend popular items to new users, or ask a few quick questions. It’s the recommender equivalent of small talk before the real conversation.
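The "recommend popular items to new users" fix is simple enough to sketch. The interaction log and names below are hypothetical, and the branch for returning users is only a placeholder where a real collaborative model would go.

```python
from collections import Counter

# Hypothetical interaction log of (user_id, item_id) pairs.
interactions = [("ann", "tent"), ("bob", "tent"), ("bob", "headlamp"),
                ("cat", "tent"), ("cat", "sleeping bag"), ("dan", "sleeping bag")]

def most_popular(top_n):
    """Items ordered by how many interactions they have received."""
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common(top_n)]

def recommend(user_id, top_n=2, min_history=1):
    """Fall back to popularity when the user has too little history."""
    history = {item for user, item in interactions if user == user_id}
    if len(history) < min_history:
        # Cold start: no behaviour to lean on, so show what most people like.
        return most_popular(top_n)
    # Warm user: placeholder for a real collaborative model. Here we simply
    # return popular items the user has not interacted with yet.
    unseen = [item for item in most_popular(len(interactions)) if item not in history]
    return unseen[:top_n]

print(recommend("newcomer"))
print(recommend("bob"))
```

The `min_history` threshold is the knob worth tuning: set it too low and the personalised model guesses from almost nothing, too high and regular users keep seeing the same bestsellers.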

Modern systems: deep, hybrid, and shameless

Today’s production systems rarely pick one approach — they cheat by combining everything. Hybrid recommenders blend collaborative signals with content features. Deep learning models (neural collaborative filtering, two-tower architectures, transformers) learn richer, non-linear patterns and can fold in images, text, audio, and your last 200 clicks. Recent 2025 research even mixes review text and item descriptions into neural matrix factorization specifically to soften cold start. The latent factors are still there in spirit — they’ve just grown into learned embeddings.
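The simplest possible hybrid is a weighted blend of two scores. The sketch below is an assumption made for illustration, not how any particular production system works: it trusts the collaborative score more as an item gathers interactions, and `halfway` is a made-up parameter marking the point where both signals count equally.

```python
def hybrid_score(collab_score, content_score, n_interactions, halfway=10):
    """Blend two scores, leaning on content features while an item is still new."""
    # Weight climbs from 0 (no interactions) toward 1 (many interactions).
    weight = n_interactions / (n_interactions + halfway)
    return weight * collab_score + (1 - weight) * content_score

# A brand-new item is scored purely on its content features.
print(hybrid_score(collab_score=0.9, content_score=0.3, n_interactions=0))
# After `halfway` interactions, the two signals are weighted equally.
print(hybrid_score(collab_score=0.9, content_score=0.3, n_interactions=10))
```

Deep hybrid models learn this kind of trade-off from data rather than hard-coding it, but the intuition is the same.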

How do you know it’s any good? Precision@k

You can’t show a user 10,000 ranked items; you show maybe ten. So evaluation focuses on the top of the list. Precision@k asks: of the k items I recommended, what fraction did the user actually like?

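def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that the user actually liked."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k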

recs = ["a", "b", "c", "d", "e"]
liked = {"b", "e", "z"}
print(precision_at_k(recs, liked, 5))   # 0.4 → 2 of 5 hit

Pair it with recall@k (did we find the good stuff at all?) and ranking metrics like NDCG, which reward putting the best item first, not fifth.
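Both companions fit in a few lines. This is a minimal sketch that assumes binary relevance (an item is either liked or not); graded versions of NDCG exist, and the function names here are just illustrative.

```python
import math

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: hits near the top of the list count for more."""
    # Each hit is discounted by log2 of its 1-based position plus one.
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    # Ideal ordering: every relevant item packed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

recs = ["a", "b", "c", "d", "e"]
liked = {"b", "e", "z"}
print(recall_at_k(recs, liked, 5))   # 2 of the 3 liked items were found
print(ndcg_at_k(recs, liked, 5))     # below 1.0 because the hits sit at ranks 2 and 5
```

Moving "b" and "e" to the front of `recs` would leave precision and recall unchanged but raise NDCG, which is exactly the ordering sensitivity the other two metrics lack.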

The takeaway

Next time you build one, start simple and earn the complexity: item-based collaborative filtering is a shockingly strong baseline you can ship in an afternoon. Reach for matrix factorization when your data is large and sparse, keep a content-based fallback ready for the cold-start moment, and judge everything with precision@k on a held-out set before you trust your own cleverness. Fancy deep models are worth it only once the simple ones stop improving, and a plateau in those held-out numbers is how you’ll know. The bear spray, however, is on you.