Least Squares: Best-Fit Line

April 13, 2026

Problem

Fit a line y = mx + c through the points (1,1), (2,3), (3,2) using the normal equations AᵀAx̂ = Aᵀb. Compute m and c.

Explanation

The least-squares problem

Given an overdetermined system $A\mathbf{x} = \mathbf{b}$ (more equations than unknowns), usually no exact solution exists. Least squares finds the $\hat{\mathbf{x}}$ that minimizes

$$\|A\mathbf{x} - \mathbf{b}\|^2 = \sum_i (A\mathbf{x} - \mathbf{b})_i^2$$

(the sum of squared residuals).
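This minimization is exactly what NumPy's `np.linalg.lstsq` computes. A minimal sketch, using the three points from this problem:

```python
import numpy as np

# Design matrix: column of x-values, column of 1s for the intercept
A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])

# lstsq minimizes ||Ax - b||^2 directly
x_hat, residual_ss, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)
m, c = x_hat
print(m, c)  # expect m = 0.5, c = 1.0 (derived below)
```

`residual_ss` holds the minimized sum of squared residuals, which for this data comes out to $1.5$, matching the hand computation later in the article.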

The normal equations

The minimizer solves

$$A^T A \, \hat{\mathbf{x}} = A^T \mathbf{b}$$

If $A$ has linearly independent columns (full column rank), $A^T A$ is invertible and

$$\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$$

Geometrically, $\hat{\mathbf{x}}$ solves $A \hat{\mathbf{x}} = \operatorname{proj}_{\operatorname{Col}(A)}(\mathbf{b})$, so the least-squares fit projects $\mathbf{b}$ onto the column space of $A$.
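The projection view can be checked numerically. A sketch, assuming NumPy, showing that $A\hat{\mathbf{x}}$ equals the orthogonal projection of $\mathbf{b}$ onto $\operatorname{Col}(A)$:

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])

# Solve the normal equations A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Projection matrix onto Col(A): P = A (A^T A)^{-1} A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T

# A x_hat should equal P b, the projection of b onto the column space
assert np.allclose(A @ x_hat, P @ b)
```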

Step-by-step — fit a line to 3 points

Points: $(x_i, y_i) = (1, 1), (2, 3), (3, 2)$. Model: $y = m x + c$ — two unknowns $m, c$; three equations. Overdetermined.

Step 1 — Set up $A \mathbf{x} = \mathbf{b}$.

$$A = \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} m \\ c \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}$$

(Column 1 holds the $x_i$; column 2 is all $1$s for the intercept.)
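In code, this design matrix is typically built by stacking the $x$-values next to a column of ones. A sketch, assuming NumPy:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0])

# Column 1: x-values; column 2: ones for the intercept term c
A = np.column_stack([x, np.ones_like(x)])
b = y
print(A)  # 3x2 matrix: [[1, 1], [2, 1], [3, 1]]
```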

Step 2 — Compute $A^T A$.

$$A^T A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{pmatrix} = \begin{pmatrix} 14 & 6 \\ 6 & 3 \end{pmatrix}$$

  • Entry (1,1): $1 + 4 + 9 = 14$.
  • Entry (1,2): $1 + 2 + 3 = 6$ (and entry (2,1) matches by symmetry).
  • Entry (2,2): $1 + 1 + 1 = 3$.

Step 3 — Compute $A^T \mathbf{b}$.

$$A^T \mathbf{b} = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 + 6 + 6 \\ 1 + 3 + 2 \end{pmatrix} = \begin{pmatrix} 13 \\ 6 \end{pmatrix}$$
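Both products from Steps 2 and 3 can be verified in a couple of lines. A sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])

AtA = A.T @ A   # should be [[14, 6], [6, 3]]
Atb = A.T @ b   # should be [13, 6]
print(AtA)
print(Atb)
```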

Step 4 — Solve the normal equations $A^T A \, \hat{\mathbf{x}} = A^T \mathbf{b}$.

$$\begin{pmatrix} 14 & 6 \\ 6 & 3 \end{pmatrix} \begin{pmatrix} m \\ c \end{pmatrix} = \begin{pmatrix} 13 \\ 6 \end{pmatrix}$$

Two equations:

  • $14 m + 6 c = 13$
  • $6 m + 3 c = 6 \implies 2 m + c = 2 \implies c = 2 - 2m$

Substitute into the first equation: $14 m + 6(2 - 2m) = 13 \implies 14m + 12 - 12m = 13 \implies 2m = 1 \implies m = \tfrac{1}{2}$.

Then $c = 2 - 2\left(\tfrac{1}{2}\right) = 1$.

$$\boxed{y = \tfrac{1}{2} x + 1}$$
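The same $2 \times 2$ system can be handed to a linear solver. A sketch, assuming NumPy:

```python
import numpy as np

# Normal equations from Steps 2 and 3
AtA = np.array([[14.0, 6.0], [6.0, 3.0]])
Atb = np.array([13.0, 6.0])

# Solve A^T A x = A^T b for (m, c)
m, c = np.linalg.solve(AtA, Atb)
print(m, c)  # expect m = 0.5, c = 1.0
```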

Verification: compute fitted values and residuals

  • $x = 1$: $y_{\text{fit}} = 1.5$; residual $= 1 - 1.5 = -0.5$.
  • $x = 2$: $y_{\text{fit}} = 2$; residual $= 3 - 2 = 1$.
  • $x = 3$: $y_{\text{fit}} = 2.5$; residual $= 2 - 2.5 = -0.5$.

Sum of squared residuals: $0.25 + 1 + 0.25 = 1.5$.
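The fitted values and residuals above can be recomputed in vectorized form. A sketch, assuming NumPy:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0])
m, c = 0.5, 1.0  # the least-squares solution found above

y_fit = m * x + c    # fitted values: [1.5, 2.0, 2.5]
r = y - y_fit        # residuals:     [-0.5, 1.0, -0.5]
ssr = np.sum(r**2)   # sum of squared residuals: 1.5
print(y_fit, r, ssr)
```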

Residual orthogonality check. When the model includes a constant, the residuals should sum to $0$, and the residuals weighted by $x_i$ should also sum to $0$:

  • $\sum r_i = -0.5 + 1 - 0.5 = 0$
  • $\sum x_i r_i = -0.5 + 2 - 1.5 = 0$

Both are zero, confirming that the residual vector is orthogonal to the column space of $A$.
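The two sums above are exactly the components of $A^T \mathbf{r}$, so the whole check compresses to one matrix product. A sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])
x_hat = np.array([0.5, 1.0])  # (m, c) from the normal equations

r = b - A @ x_hat
# A^T r = 0 means r is orthogonal to every column of A
assert np.allclose(A.T @ r, 0.0)
```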

The closed-form for simple linear regression

For $y = m x + c$ on data $(x_i, y_i)$:

$$m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \quad c = \bar{y} - m \bar{x}$$

where $\bar{x}, \bar{y}$ are the sample means. This matches what you get by solving the normal equations in the $2 \times 2$ case — it's just a more convenient form.
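The closed form is a direct translation of those two expressions. A sketch, assuming NumPy, on this article's data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0])

x_bar, y_bar = x.mean(), y.mean()  # sample means: 2.0 and 2.0
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar)**2)
c = y_bar - m * x_bar
print(m, c)  # expect m = 0.5, c = 1.0, matching the normal equations
```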

Where least squares appears

  • Linear regression (statistics, econometrics, machine learning)
  • Curve fitting
  • Computer vision (camera calibration, RANSAC's inner step)
  • Physics (parameter estimation from noisy data)
  • Signal processing (system identification)

Pitfalls and extensions

  • Rank-deficient $A$: $A^T A$ becomes singular. Use the pseudoinverse (from the SVD) instead.
  • Ill-conditioning: solve via QR decomposition ($A = QR$, then $R \hat{\mathbf{x}} = Q^T \mathbf{b}$) for numerical stability.
  • Weighted least squares: minimize $\sum_i w_i (A\mathbf{x} - \mathbf{b})_i^2$; solve $(A^T W A) \hat{\mathbf{x}} = A^T W \mathbf{b}$.
  • Regularization (ridge): minimize $\|A\mathbf{x} - \mathbf{b}\|^2 + \lambda \|\mathbf{x}\|^2$; solve $(A^T A + \lambda I) \hat{\mathbf{x}} = A^T \mathbf{b}$.
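Each of these variants takes only a line or two. A sketch, assuming NumPy, on this article's data (the ridge penalty `lam` is an illustrative choice, not from the original):

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])

# Pseudoinverse (handles rank deficiency via the SVD)
x_pinv = np.linalg.pinv(A) @ b

# QR: solve R x = Q^T b -- avoids forming A^T A, which squares the condition number
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Ridge: (A^T A + lambda I) x = A^T b, with a small illustrative lambda
lam = 0.1
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ b)

# Pseudoinverse and QR agree on the ordinary least-squares solution
assert np.allclose(x_pinv, x_qr)
```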

Common mistakes

  • Solving $A \mathbf{x} = \mathbf{b}$ directly. The system is usually inconsistent; solve the normal equations instead.
  • Forgetting the intercept column. If your model has a constant term, include a column of $1$s in $A$.
  • Misidentifying rows and columns. Each observation is a row of $A$; each parameter is a column.

Try it in the visualization

Drag the data points; the best-fit line snaps to its least-squares position. Dashed vertical segments show the residuals; their squared lengths are summed and displayed. The sum can only increase if you drag the line away from the optimal fit.
