Linear Regression

Simple linear regression with interactive visualization. Enable add point mode and click on the plot to add points!

Interactive Demo

How to use: Click "Enable Add Point Mode" button, then click anywhere on the plot to add data points. The regression line will update automatically!

Statistics

The statistics panel updates live as you add points, showing the current fit's slope (w), intercept (b), R² score, and MSE.

Algorithm Overview

Goal
Linear regression finds the best-fitting straight line (y = wx + b) through the data by minimizing the sum of squared errors (the squared vertical distances from the points to the line). The algorithm computes the optimal parameters w (slope) and b (intercept) using either the normal equation (an analytical, closed-form solution) or gradient descent (iterative optimization). The result is a model that can predict a y value for any new x (see the end-to-end sketch after the Input/Output summary below).
Input
• Training data: pairs of (x, y) values
x - independent variable (feature)
y - dependent variable (target)
Output
• Parameters: w (slope/weight) and b (intercept/bias)
• Prediction function: y = wx + b
• Performance metrics: R², MSE, RMSE
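
As a concrete reference for this input/output contract, here is a minimal end-to-end sketch using scikit-learn. The library choice and the sample points are assumptions for illustration; the demo itself has its own implementation.

```python
# Minimal sketch of the fit/predict contract using scikit-learn.
# Sample data is made up for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)  # feature column
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])                 # targets

model = LinearRegression().fit(x, y)
w, b = model.coef_[0], model.intercept_  # slope and intercept
y_hat = model.predict(x)                 # predictions y = wx + b

print(f"w = {w:.3f}, b = {b:.3f}")
print(f"MSE = {mean_squared_error(y, y_hat):.3f}")
print(f"R^2 = {r2_score(y, y_hat):.3f}")
```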

Mathematical Formulas

Model Equation
$$y = wx + b$$
where:
y - predicted output value (dependent variable)
x - input feature (independent variable)
w - weight/slope (how much y changes when x increases by 1)
b - bias/intercept (value of y when x = 0)
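
To make the equation concrete, a tiny worked example with hypothetical parameters w = 2 and b = 1:

```python
# Hypothetical parameters: y rises by w for each unit increase in x.
w, b = 2.0, 1.0
for x in [0.0, 1.0, 3.0]:
    print(x, w * x + b)  # y = 1.0, 3.0, 7.0
```
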
Optimal Parameters (Normal Equation)
$$w = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$ $$b = \bar{y} - w \cdot \bar{x}$$
where:
x̄ - mean of all x values
ȳ - mean of all y values
Σ - sum over all data points
w - optimal slope that minimizes error
b - optimal intercept
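
A from-scratch sketch of these two formulas, assuming NumPy (the helper name `fit_line` and the sample points are hypothetical):

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least-squares fit: returns (w, b)."""
    x_bar, y_bar = x.mean(), y.mean()
    w = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b = y_bar - w * x_bar
    return w, b

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])  # exactly y = 2x + 1
print(fit_line(x, y))               # -> (2.0, 1.0)
```
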
Mean Squared Error (MSE)
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
where:
n - number of data points
y - actual/true value
ŷ (y-hat) - predicted value from the model
(y - ŷ) - residual/error for each point
• Lower MSE = better fit
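
Computing MSE directly from the definition, assuming NumPy arrays of equal length (values are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared residuals; lower is better."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 7.5])
print(mse(y_true, y_pred))  # (0.25 + 0 + 0.25) / 3 ≈ 0.167
```
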
R² Score (Coefficient of Determination)
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$ $$SS_{res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \quad SS_{tot} = \sum_{i=1}^{n}(y_i - \bar{y})^2$$
where:
SS_res - residual sum of squares (unexplained variance)
SS_tot - total sum of squares (total variance)
y - actual value
ŷ - predicted value
ȳ - mean of actual values
• R² ranges from 0 to 1 for a least-squares line evaluated on its own training data, as here (1 = perfect fit, 0 = model no better than predicting the mean); on other data it can even be negative
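
The same definition as code, assuming NumPy (note the edge case: SS_tot is zero when all y values are identical):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot; 1.0 means a perfect fit."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0])
print(r2_score(y_true, np.array([3.0, 5.0, 7.0])))  # 1.0 (perfect fit)
print(r2_score(y_true, np.full(3, y_true.mean())))  # 0.0 (mean predictor)
```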

Key Concepts

When to use: Linear regression is best for modeling linear relationships between continuous variables.

Assumptions: Linearity, independence of observations, homoscedasticity (constant variance), normality of residuals.

Residuals: The vertical distances between data points and the regression line. They represent prediction errors. Click "Toggle Residuals" to visualize them!
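
A sketch of how the residuals behind that visualization could be computed, assuming an already-fitted w and b (all values here are hypothetical):

```python
import numpy as np

w, b = 2.0, 1.0              # hypothetical fitted parameters
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.4, 4.6, 7.2])
residuals = y - (w * x + b)  # vertical gap: point minus line
print(residuals)             # [ 0.4 -0.4  0.2]
```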