Supervised learning splits into two main tasks: regression and classification. Picking the right one, and the right metric for it, is step one of every project.
Regression — predict a number
from sklearn.linear_model import LinearRegression

# predict house price from size
model = LinearRegression()
model.fit(X_train, y_train)   # y is a continuous number (price)
model.predict([[1200]])       # -> e.g. [4_850_000]
# it literally learns: price = w * size + b
Use for: prices, temperatures, sales, age, CGPA, or anything else on a continuous scale. Judge it with MAE and RMSE (typical error size in the same units as the target; RMSE penalizes large errors more heavily) and R² (fraction of the target's variance explained).
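To make the regression metrics concrete, here is a minimal sketch that computes MAE, RMSE, and R² by hand in plain Python. The `y_true` and `y_pred` values are made up for illustration and do not come from any real model; in practice, `sklearn.metrics` offers `mean_absolute_error`, `mean_squared_error`, and `r2_score` for the same job.

```python
import math

# Hypothetical true and predicted prices (made-up numbers, same units).
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 210.0, 220.0]

# Per-sample errors: prediction minus truth.
errors = [p - t for p, t in zip(y_pred, y_true)]

# MAE: mean of the absolute errors.
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE: square root of the mean squared error (large errors weigh more).
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

# R^2: 1 - (residual sum of squares / total sum of squares around the mean).
mean_true = sum(y_true) / len(y_true)
ss_res = sum(e * e for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.3f}")
# prints: MAE=15.00 RMSE=17.32 R2=0.904
```

RMSE comes out larger than MAE here because the single 30-unit miss is squared before averaging, which is why the two are often reported together.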
Classification — predict a category
from sklearn.linear_model import LogisticRegression

# predict pass/fail from marks
model = LogisticRegression()
model.fit(X_train, y_train)     # y is a class (0 or 1)
model.predict([[42]])           # -> e.g. [0] (fail)
model.predict_proba([[42]])     # -> e.g. [[0.78, 0.22]], i.e. [P(fail), P(pass)]: probabilities!
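Use for: spam/not spam, disease/healthy, which digit, sentiment. Judge with accuracy, precision, recall, and F1 (next lesson). Accuracy alone misleads on imbalanced data: a model that always predicts the majority class can score high while never detecting the minority class.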
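The imbalance problem is easy to see with a small, made-up example. The sketch below assumes a hypothetical dataset of 100 samples with only 5 positives and a "model" that always predicts the negative class; the metrics are computed by hand in plain Python (`sklearn.metrics` has `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` for real use).

```python
# Hypothetical imbalanced labels: 5 positives (1), 95 negatives (0).
y_true = [1] * 5 + [0] * 95
# A lazy "model" that always predicts the majority class.
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(y_true)
# Convention assumed here: report 0.0 when a denominator is zero.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if (precision + recall) else 0.0)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# prints: accuracy=0.95 precision=0.00 recall=0.00 f1=0.00
```

The 95% accuracy looks strong, yet recall is zero: not one positive case was found.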
Quick test: Is the answer a number on a scale (regression) or a bucket/label (classification)? "How many?" → regression. "Which one?" → classification. Despite the name, Logistic Regression is a classification model.
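One way to see why Logistic Regression counts as classification: it computes a linear score, just as linear regression does, but then squashes that score into a probability with the sigmoid function and thresholds it into a class. The sketch below is a simplified illustration in plain Python, not scikit-learn's implementation; the weights `w` and `b` and the helper names are hypothetical, hand-picked values rather than fitted parameters.

```python
import math

def sigmoid(z):
    """Map any real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, hand-picked parameters (NOT learned from data).
# With these values the decision boundary sits at marks = 50.
w, b = 0.2, -10.0

def predict_proba_pass(marks):
    """Probability of class 1 (pass): sigmoid of the linear score."""
    return sigmoid(w * marks + b)

def predict(marks, threshold=0.5):
    """Turn the probability into a class label: 1 = pass, 0 = fail."""
    return 1 if predict_proba_pass(marks) >= threshold else 0

print(predict(42))  # 0 (fail): the score is negative, so P(pass) < 0.5
print(predict(60))  # 1 (pass): the score is positive, so P(pass) > 0.5
```

The "regression" in the name refers to the linear score inside; the output a user sees is still a label, which is what makes the task classification.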