Most models you will meet, from linear regression to GPT, learn the same way: guess, measure how wrong you are, adjust, repeat. That loop is the whole game.
The 4-step learning loop
- Predict: the model makes a guess using its current internal numbers (weights).
- Measure error: a loss function scores how far the guess is from the truth. Big error = big loss.
- Find the direction: calculus gives the gradient, which points in the direction that increases the loss fastest. Moving each weight the opposite way reduces the loss.
- Update: nudge the weights a small step against the gradient; the learning rate sets how big that step is. Repeat, often thousands or millions of times.
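To make the four steps concrete, here is a minimal, runnable sketch in plain Python. It assumes a toy model with two weights (`w` and `b`, so a prediction is `w * x + b`), mean squared error as the loss, and hand-picked values for the learning rate and number of steps. None of these choices come from the text above; they are just one small instance of the loop.

```python
# A minimal sketch of the 4-step loop: fit y = w*x + b to toy data.
# Assumptions: the toy data, mean squared error as the loss, and the
# learning rate / step count are all illustrative choices.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

w, b = 0.0, 0.0          # the model's internal numbers (weights), start at zero
learning_rate = 0.05
n = len(xs)

for step in range(2000):
    # 1. Predict: use the current weights to guess every y
    preds = [w * x + b for x in xs]
    # 2. Measure error: mean squared error between guesses and truth
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / n
    # 3. Find the direction: partial derivatives of the loss w.r.t. w and b
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2 * sum(errors) / n
    # 4. Update: step against the gradient (downhill), scaled by the learning rate
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w={w:.3f}, b={b:.3f}, loss={loss:.6f}")
```

Because the toy data lies exactly on the line y = 2x + 1, the loop should end with `w` close to 2 and `b` close to 1. Real models typically replace the hand-derived gradient formulas with automatic differentiation, but the four steps are the same.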
The mountain analogy
Imagine standing blindfolded on a hill, trying to reach the lowest valley (minimum loss). You feel the slope under your feet (the gradient) and step downhill. Step too big → you overshoot the valley. Step too small → it takes forever. That step size is the learning rate, often the single most important setting to tune in ML.
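The analogy can be checked numerically. This sketch assumes a one-weight "hill" with loss (w - 3)², whose lowest point is at w = 3 and whose slope is 2(w - 3), and compares three illustrative learning rates over 20 steps.

```python
# A minimal sketch of how the learning rate changes the walk downhill.
# Assumption: a toy one-weight "valley" with loss(w) = (w - 3)**2, whose
# lowest point is w = 3. Its slope (gradient) is 2 * (w - 3).

def gradient(w):
    return 2 * (w - 3)

def descend(learning_rate, steps=20, start=0.0):
    w = start
    for _ in range(steps):
        w = w - learning_rate * gradient(w)   # step downhill
    return w

for lr in (0.01, 0.4, 1.1):   # too small, reasonable, too big
    w = descend(lr)
    print(f"learning_rate={lr}: w={w:.4f}, distance from valley={abs(w - 3):.4f}")
```

The smallest rate is still about 2 units from the valley after 20 steps, the middle one lands essentially on w = 3, and the largest overshoots further on every step. For this particular loss, each step multiplies the distance to the minimum by (1 - 2 × learning_rate), so any learning rate above 1.0 diverges.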
```python
# the essence of training, in pseudocode
for step in range(num_steps):
    prediction = model(input)
    loss = how_wrong(prediction, true_answer)
    gradient = slope_of_loss(loss, weights)       # slope w.r.t. each weight (points uphill)
    weights = weights - learning_rate * gradient  # minus sign = step downhill
```
Why this matters
"Training a model for 3 days on 8 GPUs" just means running this loop billions of times over huge data. GPT was trained with this exact idea, scaled massively. Once you own this loop, nothing in ML is a black box again.
Next: the three types of machine learning.