Vectors & Norms
Vector operations, distance metrics, and similarity measures in Linear Algebra
Interactive Vector Calculator
Adjust the components of vectors A and B to see their L1 and L2 norms, dot product, cosine similarity, and angle (in degrees) update in real time, together with a 2D visualization of both vectors.
Vector Basics
What is a Vector?
A vector is an ordered list of numbers representing a magnitude and a direction in space. It is a fundamental building block of linear algebra and ML!
Vector Notation
$$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$
Example in 2D: v = [3, 4]
Example in 3D: v = [1, 2, 3]
Components:
• v₁ - first component (x-coordinate)
• v₂ - second component (y-coordinate)
• v₃ - third component (z-coordinate), etc.
Vector Operations
Addition:
$$\mathbf{a} + \mathbf{b} = \begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_n + b_n \end{bmatrix}$$
Scalar Multiplication:
$$c \cdot \mathbf{v} = \begin{bmatrix} c \cdot v_1 \\ c \cdot v_2 \\ \vdots \\ c \cdot v_n \end{bmatrix}$$
Properties:
• Commutative: a + b = b + a
• Associative: (a + b) + c = a + (b + c)
• Distributive: c(a + b) = ca + cb
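A minimal NumPy sketch of these operations; the vectors a, b and the scalar c below are illustrative values, not taken from the text above.

```python
import numpy as np

a = np.array([3, 4])
b = np.array([2, 1])
c = 2.0

# Element-wise addition: [3 + 2, 4 + 1] = [5, 5]
print(a + b)        # [5 5]

# Scalar multiplication: [2 * 3, 2 * 4] = [6, 8]
print(c * a)        # [6. 8.]

# Properties: commutativity and distributivity hold element-wise
print(np.array_equal(a + b, b + a))              # True
print(np.allclose(c * (a + b), c * a + c * b))   # True
```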
Norms (Vector Length)
What is a Norm?
A norm measures the "size" or "length" of a vector. Different norms emphasize different aspects of distance. Essential for ML (regularization, distance metrics).
L2 Norm (Euclidean Distance)
$$||\mathbf{v}||_2 = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}$$
Most common norm! Straight-line distance from origin
Example: v = [3, 4]
||v||₂ = √(3² + 4²) = √(9 + 16) = √25 = 5
Properties:
• Also called Euclidean norm or magnitude
• Used in Ridge regression (L2 regularization)
• Sensitive to outliers (squares large values)
• Differentiable everywhere (good for optimization)
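A quick check of the worked example above, assuming NumPy is available:

```python
import numpy as np

v = np.array([3, 4])

# L2 norm via the definition: sqrt(3^2 + 4^2) = sqrt(25) = 5
l2_manual = np.sqrt(np.sum(v ** 2))

# Same result with the built-in helper (ord=2 is the default)
l2_builtin = np.linalg.norm(v)

print(l2_manual, l2_builtin)  # 5.0 5.0
```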
L1 Norm (Manhattan Distance)
$$||\mathbf{v}||_1 = |v_1| + |v_2| + \cdots + |v_n| = \sum_{i=1}^{n} |v_i|$$
Sum of absolute values! Distance traveled along the axes (like city blocks)
Example: v = [3, 4]
||v||₁ = |3| + |4| = 3 + 4 = 7
Properties:
• Also called taxicab norm or L1 distance
• Used in Lasso regression (L1 regularization)
• Promotes sparsity (drives coefficients to exactly 0)
• More robust to outliers than L2
• Not differentiable at 0 (minor issue in optimization)
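The same kind of check for the L1 norm:

```python
import numpy as np

v = np.array([3, 4])

# L1 norm via the definition: |3| + |4| = 7
l1_manual = np.sum(np.abs(v))

# Same result with ord=1
l1_builtin = np.linalg.norm(v, ord=1)

print(l1_manual, l1_builtin)  # 7 7.0
```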
L∞ Norm (Max Norm / Chebyshev Distance)
$$||\mathbf{v}||_\infty = \max(|v_1|, |v_2|, \ldots, |v_n|)$$
Maximum absolute component!
Example: v = [3, 4, -5]
||v||∞ = max(|3|, |4|, |-5|) = max(3, 4, 5) = 5
Properties:
• Represents worst-case distance
• Used in game theory, optimization
• Computationally cheapest (just find max)
• Very robust to outliers (only cares about largest)
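And the corresponding check for the L∞ norm:

```python
import numpy as np

v = np.array([3, 4, -5])

# L-infinity norm via the definition: max(|3|, |4|, |-5|) = 5
linf_manual = np.max(np.abs(v))

# Same result with ord=np.inf
linf_builtin = np.linalg.norm(v, ord=np.inf)

print(linf_manual, linf_builtin)  # 5 5.0
```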
General Lp Norm
$$||\mathbf{v}||_p = \left(\sum_{i=1}^{n} |v_i|^p\right)^{1/p}$$
Generalization of all norms!
• p = 1: L1 norm
• p = 2: L2 norm
• p = ∞: L∞ norm
• p < 1: Not a true norm (doesn't satisfy triangle inequality)
As p increases, the Lp norm approaches the L∞ norm
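A short sketch of the general formula, illustrating how the Lp norm approaches the L∞ norm as p grows; the helper lp_norm is defined just for this example.

```python
import numpy as np

def lp_norm(v, p):
    """General Lp norm: (sum of |v_i|^p)^(1/p)."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

v = np.array([3, 4, -5])

for p in [1, 2, 4, 10, 100]:
    print(f"p={p:>3}: {lp_norm(v, p):.4f}")   # values shrink toward 5

# For comparison, the L-infinity norm (the limit as p -> infinity)
print("inf:", np.linalg.norm(v, ord=np.inf))  # 5.0
```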
Dot Product (Inner Product)
What is Dot Product?
The dot product measures how much two vectors point in the same direction. It returns a scalar (a number), not a vector!
Dot Product Formula
$$\mathbf{a} \cdot \mathbf{b} = a_1b_1 + a_2b_2 + \cdots + a_nb_n = \sum_{i=1}^{n} a_i b_i$$
$$\mathbf{a} \cdot \mathbf{b} = ||\mathbf{a}|| \cdot ||\mathbf{b}|| \cdot \cos(\theta)$$
Example: a = [3, 4], b = [2, 1]
a·b = (3)(2) + (4)(1) = 6 + 4 = 10
Geometric interpretation:
• θ = angle between vectors
• Measures projection of one vector onto another
Properties:
• Commutative: a·b = b·a
• Distributive: a·(b + c) = a·b + a·c
• a·a = ||a||²
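A small NumPy sketch verifying the worked example and recovering the angle from a·b = ||a||·||b||·cos(θ):

```python
import numpy as np

a = np.array([3, 4])
b = np.array([2, 1])

# Dot product via the definition: 3*2 + 4*1 = 10
dot = np.dot(a, b)

# a·a equals the squared L2 norm
print(np.dot(a, a), np.linalg.norm(a) ** 2)   # 25 25.0

# Recover the angle between a and b
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
theta_deg = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
print(dot, round(theta_deg, 2))               # 10 26.57
```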
Dot Product Interpretation
Positive (> 0): Vectors point in similar direction (angle < 90°)
Zero (= 0): Vectors are orthogonal/perpendicular (angle = 90°)
Negative (< 0): Vectors point in opposite directions (angle > 90°)
Applications:
• Finding angle between vectors
• Checking orthogonality
• Matrix multiplication
• Neural network forward pass
• Calculating work done (physics)
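A minimal sketch of using the sign of the dot product to classify two vectors as described above; the helper relationship and its sample vectors are illustrative.

```python
import numpy as np

def relationship(a, b, tol=1e-9):
    """Classify two vectors by the sign of their dot product."""
    d = np.dot(a, b)
    if abs(d) < tol:
        return "orthogonal (angle = 90 degrees)"
    return "similar direction (angle < 90)" if d > 0 else "opposite directions (angle > 90)"

print(relationship(np.array([3, 4]), np.array([2, 1])))    # similar direction
print(relationship(np.array([1, 0]), np.array([0, 5])))    # orthogonal
print(relationship(np.array([1, 1]), np.array([-2, -1])))  # opposite directions
```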
Cosine Similarity
What is Cosine Similarity?
Cosine similarity measures the cosine of the angle between two vectors, ignoring their magnitudes. Range: [-1, 1]. Widely used in NLP, recommendation systems, and ML!
Cosine Similarity Formula
$$\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{||\mathbf{a}|| \cdot ||\mathbf{b}||} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \cdot \sqrt{\sum_{i=1}^{n} b_i^2}}$$
Example: a = [3, 4], b = [6, 8]
a·b = 18 + 32 = 50
||a|| = 5, ||b|| = 10
cos(θ) = 50 / (5 × 10) = 50/50 = 1.0
(Perfectly aligned! b = 2a)
Range and Interpretation:
• +1: Same direction (angle = 0°)
• 0: Orthogonal/perpendicular (angle = 90°)
• -1: Opposite direction (angle = 180°)
Key advantage: Scale-invariant! [1,2] and [100,200] have cosine = 1
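A short sketch verifying the worked example and the scale-invariance claim, assuming NumPy:

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (a·b) / (||a|| * ||b||)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([3, 4])
b = np.array([6, 8])   # b = 2a, so the vectors are perfectly aligned

print(cosine_similarity(a, b))                                    # 1.0
print(cosine_similarity(np.array([1, 2]), np.array([100, 200])))  # also ~1.0: scale-invariant
```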
Applications of Cosine Similarity
1. Text Similarity (NLP):
Compare document vectors (TF-IDF, word embeddings)
Measures semantic similarity regardless of document length
2. Recommendation Systems:
Find similar users or items based on preferences
User A rated [5,3,0,4], User B rated [4,2,0,5] → similar taste!
3. Image Similarity:
Compare feature vectors from CNN embeddings
4. Anomaly Detection:
Find vectors that don't align with typical patterns
Why use cosine over Euclidean?
• Robust to magnitude differences
• Better for high-dimensional sparse data
• Focuses on direction/orientation, not scale
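A toy sketch of the recommendation example above, comparing rating vectors with cosine similarity; user_c is a hypothetical user added here for contrast.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

user_a = np.array([5, 3, 0, 4])
user_b = np.array([4, 2, 0, 5])
user_c = np.array([0, 1, 5, 0])   # hypothetical user with very different taste

print(round(cosine_similarity(user_a, user_b), 2))  # ~0.97: similar taste
print(round(cosine_similarity(user_a, user_c), 2))  # ~0.08: dissimilar taste
```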
Norm Comparison
L1 Norm
Sum of absolutes
✅ Promotes sparsity
✅ Robust to outliers
✅ Used in Lasso
❌ Not differentiable at 0
Use when: Want sparse solutions, feature selection
L2 Norm
Euclidean distance
✅ Smooth & differentiable
✅ Unique solution
✅ Used in Ridge
❌ Sensitive to outliers
Use when: Standard distance, optimization, general ML
L∞ Norm
Maximum component
✅ Very robust
✅ Fast to compute
✅ Worst-case measure
❌ Ignores other dimensions
Use when: Minimax problems, game theory
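A rough illustration of the Lasso/Ridge sparsity claims in the comparison above, assuming scikit-learn is installed; the data is synthetic and exact coefficient counts depend on it.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Toy problem: 10 features, but only 3 actually matter
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))   # typically several exact zeros
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))   # typically none
```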
Key Insights
L1 vs L2 regularization:
L1 (Lasso) creates sparse solutions (exactly 0 coefficients). L2 (Ridge) shrinks all coefficients but rarely to exactly 0. Choose based on whether you want feature selection (L1) or just regularization (L2).
Cosine vs Euclidean:
Cosine ignores magnitude, only cares about direction. [1,2] and [100,200] are identical in cosine (=1.0) but very different in Euclidean distance. Use cosine when scale doesn't matter (text, recommendations).
Dot product = 0:
Orthogonal vectors! Perpendicular; for mean-centered data this corresponds to zero correlation. In ML, orthogonal features provide independent information. Gram-Schmidt creates an orthogonal basis.
Unit vectors:
Normalize by dividing by L2 norm: v̂ = v / ||v||₂. Result has ||v̂||₂ = 1. Useful for comparing directions without magnitude bias.
Triangle inequality:
||a + b|| ≤ ||a|| + ||b||. True for L1, L2, L∞. Fundamental property of norms.
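Finally, a small sketch of the unit-vector and triangle-inequality insights; the sample vectors are illustrative.

```python
import numpy as np

v = np.array([3, 4])

# Unit vector: divide by the L2 norm, giving length 1
v_hat = v / np.linalg.norm(v)
print(v_hat, np.linalg.norm(v_hat))   # [0.6 0.8] 1.0

# Triangle inequality ||a + b|| <= ||a|| + ||b|| holds for L1, L2, and Linf
a, b = np.array([3, 4]), np.array([-1, 2])
for ord_ in [1, 2, np.inf]:
    lhs = np.linalg.norm(a + b, ord=ord_)
    rhs = np.linalg.norm(a, ord=ord_) + np.linalg.norm(b, ord=ord_)
    print(ord_, lhs <= rhs)           # True for each norm
```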