๐Ÿ Python & Math for AI

NumPy & Pandas Essentials for AI

Beginner โฑ 6 min read ๐Ÿ“˜ Lesson 8 of 33

Most data in AI work lives in one of two structures: NumPy arrays (raw numbers/tensors) and Pandas DataFrames (labelled tables). Master these and half the job is done.

NumPy: fast math on arrays

import numpy as np

a = np.array([1, 2, 3, 4])
a * 2                 # [2 4 6 8], applied to the WHOLE array, no loop
a.mean(), a.std()     # stats built in: 2.5 and ~1.118 (std defaults to ddof=0, the population std)

# matrices (2D): rows are samples, columns are features, the usual shape of tabular model input
m = np.array([[1, 2], [3, 4]])
m.shape               # (2, 2), always check shapes!

Vectorisation is the key idea: a * 2 runs in optimised C over the whole array, which on large arrays is typically tens to hundreds of times faster than an equivalent Python loop. Models perform billions of these operations, so never write a loop when NumPy can vectorise.
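To make the loop-versus-vectorised difference concrete, here is a minimal sketch that computes a linear-model style prediction X @ w + b both ways. The names X, w, b, the random data, and the sizes are made up for illustration; actual timings depend on your machine, so the sketch prints them rather than claiming any particular speed-up. It also shows why shapes matter: a (10000, 3) matrix times a (3,) vector gives a (10000,) result.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10_000, 3))      # hypothetical data: 10,000 samples x 3 features
w = np.array([0.5, -1.0, 2.0])   # hypothetical weights: one per feature, shape (3,)
b = 0.1                          # hypothetical bias term

def predict_loop(X, w, b):
    # Pure Python loops: one multiply-add at a time
    out = []
    for row in X:
        total = b
        for x_j, w_j in zip(row, w):
            total += x_j * w_j
        out.append(total)
    return np.array(out)

def predict_vectorised(X, w, b):
    # (10000, 3) @ (3,) -> (10000,); the scalar b is broadcast to every element
    return X @ w + b

start = time.perf_counter()
y_loop = predict_loop(X, w, b)
t_loop = time.perf_counter() - start

start = time.perf_counter()
y_vec = predict_vectorised(X, w, b)
t_vec = time.perf_counter() - start

print(y_vec.shape)                 # (10000,)
print(np.allclose(y_loop, y_vec))  # True: same numbers, very different speed
print(f"loop: {t_loop:.4f}s  vectorised: {t_vec:.4f}s")
```

Both functions return the same values; the vectorised one simply hands the whole computation to NumPy's compiled code in a single call.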

Pandas: spreadsheets in code

import pandas as pd

df = pd.read_csv("students.csv")
df.head()                       # first 5 rows
df.info()                       # columns, dtypes, non-null counts (reveals missing values)
df["cgpa"].mean()               # column stats
df[df["cgpa"] >= 8]             # filter rows
df["passed"] = df["cgpa"] >= 5  # new column
df.groupby("dept")["cgpa"].mean()   # aggregate
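Since students.csv is not included with the lesson, here is a minimal self-contained sketch of the same operations on a small DataFrame built in code. The rows are hypothetical stand-ins; only the column names dept and cgpa come from the snippet above.

```python
import pandas as pd

# Hypothetical stand-in for students.csv (made-up rows, same column names as above)
df = pd.DataFrame({
    "dept": ["CS", "CS", "Math", "Math", "Physics"],
    "cgpa": [8.5, 6.0, 9.1, 4.2, 7.3],
})

print(df.head())                         # first 5 rows (here: all of them)
print(df["cgpa"].mean())                 # mean of one column

top = df[df["cgpa"] >= 8]                # boolean mask keeps rows where the condition is True
df["passed"] = df["cgpa"] >= 5           # new boolean column, computed for every row at once
by_dept = df.groupby("dept")["cgpa"].mean()  # one mean per department, keys sorted

print(top)
print(by_dept)
```

The filter and the new column are both vectorised comparisons, the same idea as a * 2 in NumPy, just applied to a labelled column.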

Cleaning: the unglamorous 60% of AI

df.isnull().sum()                                  # how many missing per column?
df["age"] = df["age"].fillna(df["age"].median())   # fill gaps; assign back (inplace=True on a column is unreliable)
df = df.drop_duplicates()                          # remove duplicate rows
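Here is a minimal sketch of these cleaning steps on a tiny, made-up DataFrame with one missing age and one exact duplicate row. It assigns the fillna result back to the column because recent pandas releases warn that calling fillna(..., inplace=True) on df["age"] is chained assignment, which stops updating df under Copy-on-Write.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: one missing age and one exact duplicate row
df = pd.DataFrame({
    "dept": ["CS", "Math", "CS", "Math", "Math"],
    "age":  [20, np.nan, 22, 19, 19],
    "cgpa": [8.5, 9.1, 6.0, 7.0, 7.0],
})

print(df.isnull().sum())   # age: 1, the other columns: 0

# median() skips NaN: median of 20, 22, 19, 19 is 19.5
df["age"] = df["age"].fillna(df["age"].median())

# Keeps the first of the two identical ("Math", 19.0, 7.0) rows
df = df.drop_duplicates()
print(df)
```

The median is used here, as in the lesson, because it is less sensitive to extreme values than the mean.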

Real datasets are messy. "Data cleaning" and "feature engineering" are where models are won or lost, often more than the choice of a fancy algorithm.