Vectors & Norms
Vector operations, distance metrics, and similarity measures in Linear Algebra
Interactive Vector Calculator
Adjust the components of vectors A and B to see their L1 and L2 norms, dot product, cosine similarity, and angle (in degrees) update in real time, together with a 2D visualization of both vectors.
Vector Basics
What is a Vector?
A vector is an ordered list of numbers representing a magnitude and a direction in space. It is a fundamental building block of linear algebra and ML!
Vector Notation
$$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$
Example in 2D: v = [3, 4]
Example in 3D: v = [1, 2, 3]
Components:
• v₁ - first component (x-coordinate)
• v₂ - second component (y-coordinate)
• v₃ - third component (z-coordinate), etc.
Vector Operations
Addition:
$$\mathbf{a} + \mathbf{b} = \begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_n + b_n \end{bmatrix}$$
Scalar Multiplication:
$$c \cdot \mathbf{v} = \begin{bmatrix} c \cdot v_1 \\ c \cdot v_2 \\ \vdots \\ c \cdot v_n \end{bmatrix}$$
Properties:
• Commutative: a + b = b + a
• Associative: (a + b) + c = a + (b + c)
• Distributive: c(a + b) = ca + cb
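A minimal NumPy sketch of these operations; the vectors a, b and the scalar c below are illustrative values, not taken from the text above.

```python
import numpy as np

a = np.array([3, 4])
b = np.array([2, 1])
c = 2.0

# Element-wise addition: [3 + 2, 4 + 1] = [5, 5]
print(a + b)        # [5 5]

# Scalar multiplication: [2 * 3, 2 * 4] = [6, 8]
print(c * a)        # [6. 8.]

# Properties: commutativity and distributivity hold element-wise
print(np.array_equal(a + b, b + a))              # True
print(np.allclose(c * (a + b), c * a + c * b))   # True
```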
Norms (Vector Length)
What is a Norm?
A norm measures the "size" or "length" of a vector. Different norms emphasize different aspects of distance. Essential for ML (regularization, distance metrics).
L2 Norm (Euclidean Distance)
$$||\mathbf{v}||_2 = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}$$
Most common norm! Straight-line distance from origin
Example: v = [3, 4]
||v||₂ = √(3² + 4²) = √(9 + 16) = √25 = 5
Properties:
• Also called Euclidean norm or magnitude
• Used in Ridge regression (L2 regularization)
• Sensitive to outliers (squares large values)
• Differentiable everywhere (good for optimization)
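A quick check of the worked example above, assuming NumPy is available:

```python
import numpy as np

v = np.array([3, 4])

# L2 norm via the definition: sqrt(3^2 + 4^2) = sqrt(25) = 5
l2_manual = np.sqrt(np.sum(v ** 2))

# Same result with the built-in helper (ord=2 is the default)
l2_builtin = np.linalg.norm(v)

print(l2_manual, l2_builtin)  # 5.0 5.0
```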
L1 Norm (Manhattan Distance)
$$||\mathbf{v}||_1 = |v_1| + |v_2| + \cdots + |v_n| = \sum_{i=1}^{n} |v_i|$$
Sum of absolute values! Distance traveled along the axes (like city blocks)
Example: v = [3, 4]
||v||₁ = |3| + |4| = 3 + 4 = 7
Properties:
• Also called taxicab norm or L1 distance
• Used in Lasso regression (L1 regularization)
• Promotes sparsity (drives coefficients to exactly 0)
• More robust to outliers than L2
• Not differentiable at 0 (minor issue in optimization)
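The same kind of check for the L1 norm:

```python
import numpy as np

v = np.array([3, 4])

# L1 norm via the definition: |3| + |4| = 7
l1_manual = np.sum(np.abs(v))

# Same result with ord=1
l1_builtin = np.linalg.norm(v, ord=1)

print(l1_manual, l1_builtin)  # 7 7.0
```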
L∞ Norm (Max Norm / Chebyshev Distance)
$$||\mathbf{v}||_\infty = \max(|v_1|, |v_2|, \ldots, |v_n|)$$
Maximum absolute component!
Example: v = [3, 4, -5]
||v||∞ = max(|3|, |4|, |-5|) = max(3, 4, 5) = 5
Properties:
• Represents worst-case distance
• Used in game theory, optimization
• Computationally cheapest (just find max)
• Very robust to outliers (only cares about largest)
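And the corresponding check for the L∞ norm:

```python
import numpy as np

v = np.array([3, 4, -5])

# L-infinity norm via the definition: max(|3|, |4|, |-5|) = 5
linf_manual = np.max(np.abs(v))

# Same result with ord=np.inf
linf_builtin = np.linalg.norm(v, ord=np.inf)

print(linf_manual, linf_builtin)  # 5 5.0
```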
General Lp Norm
$$||\mathbf{v}||_p = \left(\sum_{i=1}^{n} |v_i|^p\right)^{1/p}$$
Generalization of all norms!
• p = 1: L1 norm
• p = 2: L2 norm
• p = ∞: L∞ norm
• p < 1: Not a true norm (doesn't satisfy triangle inequality)
As p increases, the Lp norm approaches the L∞ norm
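A short sketch of the general formula, illustrating how the Lp norm approaches the L∞ norm as p grows; the helper lp_norm is defined just for this example.

```python
import numpy as np

def lp_norm(v, p):
    """General Lp norm: (sum of |v_i|^p)^(1/p)."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

v = np.array([3, 4, -5])

for p in [1, 2, 4, 10, 100]:
    print(f"p={p:>3}: {lp_norm(v, p):.4f}")   # values shrink toward 5

# For comparison, the L-infinity norm (the limit as p -> infinity)
print("inf:", np.linalg.norm(v, ord=np.inf))  # 5.0
```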
Dot Product (Inner Product)
What is Dot Product?
The dot product measures how much two vectors point in the same direction. It returns a scalar (a number), not a vector!
Dot Product Formula
$$\mathbf{a} \cdot \mathbf{b} = a_1b_1 + a_2b_2 + \cdots + a_nb_n = \sum_{i=1}^{n} a_i b_i$$
$$\mathbf{a} \cdot \mathbf{b} = ||\mathbf{a}|| \cdot ||\mathbf{b}|| \cdot \cos(\theta)$$
Example: a = [3, 4], b = [2, 1]
a·b = (3)(2) + (4)(1) = 6 + 4 = 10
Geometric interpretation:
• θ = angle between vectors
• Measures projection of one vector onto another
Properties:
• Commutative: a·b = b·a
• Distributive: a·(b + c) = a·b + a·c
• a·a = ||a||²
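A small NumPy sketch verifying the worked example and recovering the angle from a·b = ||a||·||b||·cos(θ):

```python
import numpy as np

a = np.array([3, 4])
b = np.array([2, 1])

# Dot product via the definition: 3*2 + 4*1 = 10
dot = np.dot(a, b)

# a·a equals the squared L2 norm
print(np.dot(a, a), np.linalg.norm(a) ** 2)   # 25 25.0

# Recover the angle between a and b
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
theta_deg = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
print(dot, round(theta_deg, 2))               # 10 26.57
```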
Dot Product Interpretation
Positive (> 0): Vectors point in similar direction (angle < 90°)
Zero (= 0): Vectors are orthogonal/perpendicular (angle = 90°)
Negative (< 0): Vectors point in opposite directions (angle > 90°)
Applications:
• Finding angle between vectors
• Checking orthogonality
• Matrix multiplication
• Neural network forward pass
• Calculating work done (physics)
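A minimal sketch of using the sign of the dot product to classify two vectors as described above; the helper relationship and its sample vectors are illustrative.

```python
import numpy as np

def relationship(a, b, tol=1e-9):
    """Classify two vectors by the sign of their dot product."""
    d = np.dot(a, b)
    if abs(d) < tol:
        return "orthogonal (angle = 90 degrees)"
    return "similar direction (angle < 90)" if d > 0 else "opposite directions (angle > 90)"

print(relationship(np.array([3, 4]), np.array([2, 1])))    # similar direction
print(relationship(np.array([1, 0]), np.array([0, 5])))    # orthogonal
print(relationship(np.array([1, 1]), np.array([-2, -1])))  # opposite directions
```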
Cosine Similarity
What is Cosine Similarity?
Cosine similarity measures the cosine of the angle between two vectors, ignoring their magnitudes. Range: [-1, 1]. Widely used in NLP, recommendation systems, and ML!
Cosine Similarity Formula
$$\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{||\mathbf{a}|| \cdot ||\mathbf{b}||} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \cdot \sqrt{\sum_{i=1}^{n} b_i^2}}$$
Example: a = [3, 4], b = [6, 8]
a·b = 18 + 32 = 50
||a|| = 5, ||b|| = 10
cos(θ) = 50 / (5 × 10) = 50/50 = 1.0
(Perfectly aligned! b = 2a)
Range and Interpretation:
• +1: Same direction (angle = 0°)
• 0: Orthogonal/perpendicular (angle = 90°)
• -1: Opposite direction (angle = 180°)
Key advantage: Scale-invariant! [1,2] and [100,200] have cosine = 1
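A short sketch verifying the worked example and the scale-invariance claim, assuming NumPy:

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (a·b) / (||a|| * ||b||)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([3, 4])
b = np.array([6, 8])   # b = 2a, so the vectors are perfectly aligned

print(cosine_similarity(a, b))                                    # 1.0
print(cosine_similarity(np.array([1, 2]), np.array([100, 200])))  # also ~1.0: scale-invariant
```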
Applications of Cosine Similarity
1. Text Similarity (NLP):
Compare document vectors (TF-IDF, word embeddings)
Measures semantic similarity regardless of document length
2. Recommendation Systems:
Find similar users or items based on preferences
User A rated [5,3,0,4], User B rated [4,2,0,5] → similar taste!
3. Image Similarity:
Compare feature vectors from CNN embeddings
4. Anomaly Detection:
Find vectors that don't align with typical patterns
Why use cosine over Euclidean?
• Robust to magnitude differences
• Better for high-dimensional sparse data
• Focuses on direction/orientation, not scale
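A toy sketch of the recommendation example above, comparing rating vectors with cosine similarity; user_c is a hypothetical user added here for contrast.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

user_a = np.array([5, 3, 0, 4])
user_b = np.array([4, 2, 0, 5])
user_c = np.array([0, 1, 5, 0])   # hypothetical user with very different taste

print(round(cosine_similarity(user_a, user_b), 2))  # ~0.97: similar taste
print(round(cosine_similarity(user_a, user_c), 2))  # ~0.08: dissimilar taste
```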
Norm Comparison
L1 Norm
Sum of absolutes
✅ Promotes sparsity
✅ Robust to outliers
✅ Used in Lasso
❌ Not differentiable at 0
Use when: Want sparse solutions, feature selection
L2 Norm
Euclidean distance
✅ Smooth & differentiable
✅ Unique solution
✅ Used in Ridge
❌ Sensitive to outliers
Use when: Standard distance, optimization, general ML
L∞ Norm
Maximum component
✅ Very robust
✅ Fast to compute
✅ Worst-case measure
❌ Ignores other dimensions
Use when: Minimax problems, game theory
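A rough illustration of the Lasso/Ridge sparsity claims in the comparison above, assuming scikit-learn is installed; the data is synthetic and exact coefficient counts depend on it.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Toy problem: 10 features, but only 3 actually matter
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))   # typically several exact zeros
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))   # typically none
```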
Key Insights
L1 vs L2 regularization:
L1 (Lasso) creates sparse solutions (exactly 0 coefficients). L2 (Ridge) shrinks all coefficients but rarely to exactly 0. Choose based on whether you want feature selection (L1) or just regularization (L2).
Cosine vs Euclidean:
Cosine ignores magnitude, only cares about direction. [1,2] and [100,200] are identical in cosine (=1.0) but very different in Euclidean distance. Use cosine when scale doesn't matter (text, recommendations).
Dot product = 0:
Orthogonal vectors! Perpendicular; for mean-centered data this corresponds to zero correlation. In ML, orthogonal features provide independent information. Gram-Schmidt creates an orthogonal basis.
Unit vectors:
Normalize by dividing by L2 norm: v̂ = v / ||v||₂. Result has ||v̂||₂ = 1. Useful for comparing directions without magnitude bias.
Triangle inequality:
||a + b|| ≤ ||a|| + ||b||. True for L1, L2, L∞. Fundamental property of norms.
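Finally, a small sketch of the unit-vector and triangle-inequality insights; the sample vectors are illustrative.

```python
import numpy as np

v = np.array([3, 4])

# Unit vector: divide by the L2 norm, giving length 1
v_hat = v / np.linalg.norm(v)
print(v_hat, np.linalg.norm(v_hat))   # [0.6 0.8] 1.0

# Triangle inequality ||a + b|| <= ||a|| + ||b|| holds for L1, L2, and Linf
a, b = np.array([3, 4]), np.array([-1, 2])
for ord_ in [1, 2, np.inf]:
    lhs = np.linalg.norm(a + b, ord=ord_)
    rhs = np.linalg.norm(a, ord=ord_) + np.linalg.norm(b, ord=ord_)
    print(ord_, lhs <= rhs)           # True for each norm
```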