Enough theory: here is a complete, working model in about 15 lines. The same template is a solid starting point for a large share of real tabular classification problems.
The whole thing
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# 1. Data: features X, labels y
X, y = load_iris(return_X_y=True)
# 2. Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
# 3. Choose & train a model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train) # <- learning happens here
# 4. Predict & evaluate on UNSEEN data
preds = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, preds)) # typically 0.9-1.0 on Iris; depends on the split
# 5. Use it on something new
model.predict([[5.1, 3.5, 1.4, 0.2]]) # -> predicted flower class
The pattern that never changes
Notice the shape: model.fit(X, y), then model.predict(X). Every supervised scikit-learn estimator (linear regression, SVM, gradient boosting) exposes this same interface. Swap RandomForestClassifier for LogisticRegression, update the import to match, and the rest of the script is unchanged. That consistency is a large part of why sklearn is such a good place to learn.
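To make the shared interface concrete, here is a minimal sketch that runs the same fit/predict loop over several classifiers. The choice of models, the dictionary keys, and the hyperparameters (such as max_iter=1000) are illustrative assumptions, not tuned recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same data and split as the main example
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Candidate models; hyperparameters are illustrative, not tuned
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)      # identical call for every estimator
    preds = model.predict(X_test)    # identical call for every estimator
    scores[name] = accuracy_score(y_test, preds)
    print(f"{name}: {scores[name]:.2f}")
```

Only the constructor line differs between models; the training, prediction, and scoring code never changes.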
Try it free right now
Open Google Colab, paste the code, and press Shift+Enter. You just trained a classifier that scores well above 90% on held-out data, with nothing to install. Next: regression vs classification in depth.