Linear Regression
Simple linear regression with interactive visualization. Enable add point mode and click on the plot to add points!
Interactive Demo
How to use: Click "Enable Add Point Mode" button, then click anywhere on the plot to add data points. The regression line will update automatically!
Statistics
The statistics panel reports the fitted Slope (w), Intercept (b), R² Score, and MSE; the values update live as points are added.
Algorithm Overview
Goal
Linear regression finds the best-fitting straight line (y = wx + b) for a set of data points by minimizing the sum of squared errors (the vertical distances from the points to the line). The optimal parameters w (slope) and b (intercept) can be computed either with the normal equation (a closed-form analytical solution) or with gradient descent (iterative optimization). The result is a model that predicts a y value for any new x.
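The formulas below give the normal-equation solution; as an illustration of the gradient-descent alternative mentioned above, here is a minimal sketch (the function and parameter names are illustrative, not taken from the demo):

```python
# Minimal gradient-descent sketch for y = wx + b (illustrative only;
# names such as gradient_descent_fit, lr, epochs are not from the demo).
def gradient_descent_fit(xs, ys, lr=0.01, epochs=1000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((y - (wx + b))^2) with respect to w and b
        grad_w = (-2.0 / n) * sum(x * (y - (w * x + b)) for x, y in zip(xs, ys))
        grad_b = (-2.0 / n) * sum(y - (w * x + b) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```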
Input
• Training data: pairs of (x, y) values
• x - independent variable (feature)
• y - dependent variable (target)
Output
• Parameters: w (slope/weight) and b (intercept/bias)
• Prediction function: y = wx + b
• Performance metrics: R², MSE, RMSE
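A hedged sketch of what these outputs look like in practice, using NumPy's polyfit for the degree-1 least-squares fit (the demo itself may compute the fit differently; the toy data and variable names are illustrative):

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0, 4.0])    # toy data, illustrative only
ys = np.array([2.1, 3.9, 6.2, 8.1])

w, b = np.polyfit(xs, ys, 1)            # degree-1 least-squares fit: [slope, intercept]
y_hat = w * xs + b                       # prediction function y = wx + b

mse = np.mean((ys - y_hat) ** 2)
rmse = np.sqrt(mse)
r2 = 1 - np.sum((ys - y_hat) ** 2) / np.sum((ys - np.mean(ys)) ** 2)
print(f"w={w:.3f}, b={b:.3f}, R2={r2:.3f}, MSE={mse:.4f}, RMSE={rmse:.4f}")
```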
Mathematical Formulas
Model Equation
$$y = wx + b$$
where:
• y - predicted output value (dependent variable)
• x - input feature (independent variable)
• w - weight/slope (how much y changes when x increases by 1)
• b - bias/intercept (value of y when x = 0)
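Evaluating the model is a single affine computation; a minimal sketch (names are illustrative):

```python
def predict(x, w, b):
    # The fitted line y = wx + b evaluated at a new x
    return w * x + b
```

For example, with w = 2 and b = 1, predict(3, 2, 1) returns 7.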
Optimal Parameters (Normal Equation)
$$w = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
$$b = \bar{y} - w \cdot \bar{x}$$
where:
• x̄ - mean of all x values
• ȳ - mean of all y values
• Σ - sum over all data points
• w - optimal slope that minimizes error
• b - optimal intercept
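A minimal sketch of this closed-form solution for a single feature (function and variable names are illustrative, not from the demo):

```python
def fit_normal_equation(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # w = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    w = num / den                  # undefined if all x values are identical
    b = y_mean - w * x_mean        # b = ȳ - w·x̄
    return w, b
```

For example, fit_normal_equation([1, 2, 3], [2, 4, 6]) returns (2.0, 0.0).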
Mean Squared Error (MSE)
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
where:
• n - number of data points
• y - actual/true value
• ŷ (y-hat) - predicted value from the model
• (y - ŷ) - residual/error for each point
• Lower MSE = better fit
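A minimal sketch of this metric (names are illustrative):

```python
def mse(ys, y_hats):
    # Mean of the squared residuals (y - ŷ)²
    n = len(ys)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats)) / n
```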
R² Score (Coefficient of Determination)
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
$$SS_{res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \quad SS_{tot} = \sum_{i=1}^{n}(y_i - \bar{y})^2$$
where:
• SSres - residual sum of squares (unexplained variance)
• SStot - total sum of squares (total variance)
• y - actual value
• ŷ - predicted value
• ȳ - mean of actual values
• R² ranges from 0 to 1 (1 = perfect fit, 0 = model no better than mean)
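A minimal sketch of this metric (names are illustrative):

```python
def r2_score(ys, y_hats):
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))   # unexplained variance
    ss_tot = sum((y - y_mean) ** 2 for y in ys)                      # total variance
    return 1.0 - ss_res / ss_tot   # division by zero if all y values are equal
```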
Key Concepts
When to use: Linear regression is best for modeling linear relationships between continuous variables.
Assumptions: Linearity, independence of observations, homoscedasticity (constant variance), normality of residuals.
Residuals: The vertical distances between data points and the regression line. They represent prediction errors. Click "Toggle Residuals" to visualize them!
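A minimal sketch of the quantities the residual toggle draws (names are illustrative):

```python
def residuals(xs, ys, w, b):
    # Vertical gap between each observed y and the line's prediction wx + b
    return [y - (w * x + b) for x, y in zip(xs, ys)]
```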