How Machines Actually Learn — Gradient Descent in Plain English

From linear regression to GPT, most models learn the same way: guess, measure how wrong you are, adjust, repeat. That loop is the whole game.

The 4-step learning loop

Predict: the model makes a guess using its current internal numbers (weights).
Measure error: a loss function scores how far the guess is from the truth. Big error = big loss.
Find the direction: calculus gives the gradient, which says, for each weight, which way the loss rises fastest. Downhill is the opposite way.
Update: nudge each weight a small step against its gradient, with the step size scaled by the learning rate. Repeat, thousands to millions of times.
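
The four steps above can be sketched in plain Python for a straight-line model y = w * x + b trained with mean squared error. This is a minimal illustration under assumptions: the function name `train`, the toy data, the learning rate, and the step count are made up for the example.

```python
def train(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by plain (full-batch) gradient descent on mean squared error."""
    w, b = 0.0, 0.0          # initial guess for the weights
    n = len(xs)
    loss = float("inf")
    for _ in range(steps):
        # 1. Predict: guess an output for every input with the current weights
        preds = [w * x + b for x in xs]
        # 2. Measure error: mean squared error between guesses and true answers
        errors = [p - y for p, y in zip(preds, ys)]
        loss = sum(e * e for e in errors) / n
        # 3. Find the direction: partial derivatives of the loss w.r.t. w and b
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # 4. Update: move each weight a small step against its gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss  # loss is the value measured on the final step


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b, loss = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, loss={loss:.6f}")  # w and b approach 2 and 1
```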

The mountain analogy

Imagine standing blindfolded on a hill, trying to reach the lowest valley (minimum loss). You feel the slope under your feet (the gradient) and step downhill. Step too big → you overshoot the valley and can bounce ever higher. Step too small → it takes forever. That step size is the learning rate, often the single most important knob to tune in ML.
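
To see the step-size trade-off in numbers, here is a hedged sketch on a toy one-weight loss, (w - 3)**2, whose valley sits at w = 3. The function `descend` and the three learning rates are illustrative choices, not tuned values.

```python
def descend(learning_rate, steps=20, w=0.0):
    """Gradient descent on the toy loss (w - 3)**2, whose minimum (the valley) is at w = 3."""
    for _ in range(steps):
        gradient = 2 * (w - 3)              # slope of the loss at the current w
        w = w - learning_rate * gradient    # step downhill
    return w


for lr in (0.01, 0.1, 1.1):
    print(f"learning rate {lr}: w after 20 steps = {descend(lr):.3f}")
```

After 20 steps, 0.01 is still far from 3, 0.1 ends close to it, and 1.1 overshoots further on every step until it blows up. For this particular loss, each step multiplies the distance to the valley by |1 - 2 * learning_rate|, so any rate above 1.0 diverges.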

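# the essence of training, as a tiny runnable Python example:
# learn the single weight w in the model prediction = w * input
x, true_answer = 2.0, 6.0        # one training example; the ideal weight is 3.0
w, learning_rate = 0.0, 0.05     # start from a bad guess
for step in range(100):
    prediction = w * x                               # predict
    loss = (prediction - true_answer) ** 2           # how wrong? (squared error)
    gradient = 2 * (prediction - true_answer) * x    # slope of the loss w.r.t. w (points uphill)
    w = w - learning_rate * gradient                 # step downhill
print(round(w, 4))  # close to 3.0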
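
The gradient step hides the calculus. For a one-weight model, prediction = w * x, with squared-error loss (w * x - y)**2, the chain rule gives the slope 2 * (w * x - y) * x. A quick sanity check, sketched here under the assumption of a single example x = 2, y = 6, compares that formula with a finite-difference estimate: nudge w slightly in both directions and see how much the loss changes. The helper names are hypothetical.

```python
def loss(w, x=2.0, y=6.0):
    """Squared error of the one-weight model prediction = w * x."""
    return (w * x - y) ** 2


def analytic_gradient(w, x=2.0, y=6.0):
    """Derivative of the loss with respect to w, from the chain rule."""
    return 2 * (w * x - y) * x


def numeric_gradient(w, eps=1e-6):
    """Central finite difference: nudge w both ways and measure the slope."""
    return (loss(w + eps) - loss(w - eps)) / (2 * eps)


for w in (0.0, 1.5, 3.0, 4.0):
    print(w, analytic_gradient(w), round(numeric_gradient(w), 4))
```

A negative gradient (w = 0) means increasing w lowers the loss, a positive one (w = 4) means decreasing it does, and at w = 3 the gradient is zero: you are standing at the bottom of the valley.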

Why this matters

"Training a model for 3 days on 8 GPUs" just means running this loop an enormous number of times over a huge dataset. GPT models were trained with this same idea, scaled massively, using Adam (a refined variant of gradient descent) for the update step. Once you own this loop, model training stops being a black box.
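
In practice, a huge dataset is rarely fed to every step at once. One common approach, assumed in this sketch rather than taken from any specific model's training code, is mini-batch stochastic gradient descent: shuffle the data, then run the same predict, measure, find-direction, update loop on one small batch at a time. The function name, toy data, batch size, and learning rate are illustrative.

```python
import random


def sgd_minibatch(xs, ys, learning_rate=0.02, epochs=1000, batch_size=2, seed=0):
    """Fit y = w * x + b with mini-batch stochastic gradient descent."""
    rng = random.Random(seed)         # seeded so runs are reproducible
    data = list(zip(xs, ys))
    w, b = 0.0, 0.0
    for _ in range(epochs):           # one epoch = one pass over the whole dataset
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]          # small slice of data for this step
            n = len(batch)
            errors = [(w * x + b) - y for x, y in batch]
            grad_w = 2 * sum(e * x for e, (x, _y) in zip(errors, batch)) / n
            grad_b = 2 * sum(errors) / n
            w -= learning_rate * grad_w             # same update rule as before,
            b -= learning_rate * grad_b             # but from this batch only
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]   # noise-free data from y = 2x + 1
w, b = sgd_minibatch(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

With 5 examples and a batch size of 2, each epoch makes 3 update steps. Large-scale training works in the same spirit, with far bigger batches and datasets.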

Next: the three types of machine learning.