Embeddings are the quiet superpower behind semantic search, recommendations and retrieval-augmented generation (RAG). The idea is beautiful: turn meaning into coordinates.
The core idea
An embedding model converts a piece of text into a fixed-length list of numbers (a vector, e.g. 1536 of them). The magic: texts with similar meanings get similar vectors, which point in similar directions in space.
"king" → [0.21, -0.44, 0.87, ...] "queen" → [0.19, -0.41, 0.85, ...] # close to king "banana" → [-0.66, 0.12, -0.30, ...] # far away # distance in this space = difference in meaning
Semantic search beats keyword search
Search for "how to reset my password" and keyword search misses a doc titled "recovering account access" because the two share no words. Embeddings match them because their meanings are close: you embed the query, then find the nearest document vectors.
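from sklearn.metrics.pairwise import cosine_similarity

# cosine_similarity expects 2D inputs (one row per vector), so wrap each vector in a list
# 1 = same direction (closest meaning), 0 = unrelated, -1 = opposite
score = cosine_similarity([query_vec], [doc_vec])[0, 0]  # rank docs by this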
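The "embed the query, then find the nearest document vectors" step can be sketched as a small ranking function. The vectors below are made-up 3-dimensional stand-ins for what an embedding model would return, and `rank_documents` is a hypothetical helper, not part of scikit-learn.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs, top_k=2):
    """Return the top_k (title, score) pairs, most similar first."""
    scored = [(title, cosine(query_vec, vec)) for title, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Made-up vectors: in practice these would come from an embedding model
doc_vecs = {
    "recovering account access": [0.80, 0.55, 0.10],
    "quarterly sales report": [-0.20, 0.10, 0.95],
    "changing your login email": [0.60, 0.70, 0.25],
}
query_vec = [0.82, 0.50, 0.12]  # stand-in for "how to reset my password"

results = rank_documents(query_vec, doc_vecs)
for title, score in results:
    print(f"{score:.3f}  {title}")
```

With these toy numbers the account-recovery doc ranks first even though its title shares no words with the query, which is the behaviour keyword search cannot reproduce.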
Vector databases
To search millions of vectors fast, you use a vector database (Pinecone, Chroma, or Postgres with the pgvector extension). It stores embeddings and returns the nearest neighbours to a query in milliseconds. This is the retrieval engine behind many RAG systems.
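To show the shape of what a vector database does (store vectors, return nearest neighbours), here is a toy in-memory version. It is a brute-force sketch that compares the query against every stored vector; production systems typically rely on approximate nearest-neighbour indexes to stay fast at scale. `TinyVectorStore` and its methods are hypothetical names, not any product's API.

```python
import heapq
import math

class TinyVectorStore:
    """Toy store: keeps unit-length vectors in a list and scans all of them per query."""

    def __init__(self):
        self._items = []  # list of (doc_id, unit-length vector)

    @staticmethod
    def _normalize(vec):
        length = math.sqrt(sum(x * x for x in vec))
        if length == 0:
            raise ValueError("cannot store or query a zero vector")
        return [x / length for x in vec]

    def add(self, doc_id, vec):
        self._items.append((doc_id, self._normalize(vec)))

    def query(self, vec, k=3):
        """Return up to k (doc_id, score) pairs, highest cosine similarity first."""
        q = self._normalize(vec)
        # For unit-length vectors the dot product equals cosine similarity
        scored = (
            (sum(a * b for a, b in zip(q, v)), doc_id)
            for doc_id, v in self._items
        )
        return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]

# Made-up 3-dimensional vectors standing in for real embeddings
store = TinyVectorStore()
store.add("doc-password", [0.9, 0.1, 0.0])
store.add("doc-billing", [0.0, 1.0, 0.1])
store.add("doc-login", [0.8, 0.3, 0.1])

hits = store.query([1.0, 0.2, 0.0], k=2)
print(hits)
```

Normalizing once at insert time keeps each query down to a dot product per stored vector, but the scan is still linear in the number of vectors, which is the cost a real index is designed to avoid.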
Where you'll use them
- Semantic search over docs
- "Related articles" / product recommendations
- Duplicate/plagiarism detection
- Feeding relevant context to an LLM (RAG)
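As one illustration from the list above, duplicate detection can be sketched as "flag every pair of texts whose vectors are nearly parallel". The vectors are made up, `find_near_duplicates` is a hypothetical helper, and the 0.95 threshold is an arbitrary assumption that would need tuning for a real model and dataset.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def find_near_duplicates(vectors, threshold=0.95):
    """Return (id_a, id_b, score) for every pair at or above the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(vectors.items(), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs

# Made-up vectors standing in for embeddings of three essays
vectors = {
    "essay-1": [0.70, 0.20, 0.68],
    "essay-2": [0.69, 0.22, 0.69],  # near-copy of essay-1
    "essay-3": [-0.10, 0.95, 0.05],
}

duplicates = find_near_duplicates(vectors)
for id_a, id_b, score in duplicates:
    print(f"{id_a} ~ {id_b}: {score:.3f}")
```

Comparing every pair grows quadratically with the number of texts, so for large collections the pairwise loop would usually be replaced by nearest-neighbour queries against a vector index.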