Least Squares: Best-Fit Line

April 13, 2026

Problem

Fit a line y = mx + c through the points (1,1), (2,3), (3,2) using the normal equations AᵀAx̂ = Aᵀb. Compute m and c.

Explanation

The least-squares problem

Given an overdetermined system $A\mathbf{x} = \mathbf{b}$ (more equations than unknowns), usually no exact solution exists. Least squares finds the $\hat{\mathbf{x}}$ that minimizes

$$\|A\mathbf{x} - \mathbf{b}\|^2 = \sum_i (A\mathbf{x} - \mathbf{b})_i^2$$

(the sum of squared residuals).
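This minimization is exactly what NumPy's `np.linalg.lstsq` computes. A minimal sketch, using the three points from this problem:

```python
import numpy as np

# Design matrix: column of x-values, column of 1s for the intercept
A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])

# lstsq minimizes ||Ax - b||^2 directly
x_hat, residual_ss, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)
m, c = x_hat
print(m, c)  # expect m = 0.5, c = 1.0 (derived below)
```

`residual_ss` holds the minimized sum of squared residuals, which for this data comes out to $1.5$, matching the hand computation later in the article.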

The normal equations

The minimizer solves

$$A^T A \, \hat{\mathbf{x}} = A^T \mathbf{b}$$

If $A$ has linearly independent columns (full column rank), $A^T A$ is invertible and

$$\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$$

Geometrically, $\hat{\mathbf{x}}$ solves $A \hat{\mathbf{x}} = \operatorname{proj}_{\operatorname{Col}(A)}(\mathbf{b})$, so the least-squares fit projects $\mathbf{b}$ onto the column space of $A$.
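The projection view can be checked numerically. A sketch, assuming NumPy, showing that $A\hat{\mathbf{x}}$ equals the orthogonal projection of $\mathbf{b}$ onto $\operatorname{Col}(A)$:

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])

# Solve the normal equations A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Projection matrix onto Col(A): P = A (A^T A)^{-1} A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T

# A x_hat should equal P b, the projection of b onto the column space
assert np.allclose(A @ x_hat, P @ b)
```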

Step-by-step — fit a line to 3 points

Points: $(x_i, y_i) = (1, 1), (2, 3), (3, 2)$. Model: $y = m x + c$ — two unknowns $m, c$; three equations. Overdetermined.

Step 1 — Set up $A \mathbf{x} = \mathbf{b}$.

$$A = \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} m \\ c \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}$$

(Column 1 holds the $x_i$; column 2 is all $1$s for the intercept.)
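In code, this design matrix is typically built by stacking the $x$-values next to a column of ones. A sketch, assuming NumPy:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0])

# Column 1: x-values; column 2: ones for the intercept term c
A = np.column_stack([x, np.ones_like(x)])
b = y
print(A)  # 3x2 matrix: [[1, 1], [2, 1], [3, 1]]
```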

Step 2 — Compute $A^T A$.

$$A^T A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{pmatrix} = \begin{pmatrix} 14 & 6 \\ 6 & 3 \end{pmatrix}$$

  • Entry (1,1): $1 + 4 + 9 = 14$.
  • Entry (1,2): $1 + 2 + 3 = 6$ (and entry (2,1) matches by symmetry).
  • Entry (2,2): $1 + 1 + 1 = 3$.

Step 3 — Compute $A^T \mathbf{b}$.

$$A^T \mathbf{b} = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 + 6 + 6 \\ 1 + 3 + 2 \end{pmatrix} = \begin{pmatrix} 13 \\ 6 \end{pmatrix}$$
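Both products from Steps 2 and 3 can be verified in a couple of lines. A sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])

AtA = A.T @ A   # should be [[14, 6], [6, 3]]
Atb = A.T @ b   # should be [13, 6]
print(AtA)
print(Atb)
```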

Step 4 — Solve the normal equations $A^T A \, \hat{\mathbf{x}} = A^T \mathbf{b}$.

$$\begin{pmatrix} 14 & 6 \\ 6 & 3 \end{pmatrix} \begin{pmatrix} m \\ c \end{pmatrix} = \begin{pmatrix} 13 \\ 6 \end{pmatrix}$$

Two equations:

  • $14 m + 6 c = 13$
  • $6 m + 3 c = 6 \implies 2 m + c = 2 \implies c = 2 - 2m$

Substitute into the first equation: $14 m + 6(2 - 2m) = 13 \implies 14m + 12 - 12m = 13 \implies 2m = 1 \implies m = \tfrac{1}{2}$.

Then $c = 2 - 2\left(\tfrac{1}{2}\right) = 1$.

$$\boxed{y = \tfrac{1}{2} x + 1}$$
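The same $2 \times 2$ system can be handed to a linear solver. A sketch, assuming NumPy:

```python
import numpy as np

# Normal equations from Steps 2 and 3
AtA = np.array([[14.0, 6.0], [6.0, 3.0]])
Atb = np.array([13.0, 6.0])

# Solve A^T A x = A^T b for (m, c)
m, c = np.linalg.solve(AtA, Atb)
print(m, c)  # expect m = 0.5, c = 1.0
```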

Verification: compute fitted values and residuals

  • $x = 1$: $y_{\text{fit}} = 1.5$; residual $= 1 - 1.5 = -0.5$.
  • $x = 2$: $y_{\text{fit}} = 2$; residual $= 3 - 2 = 1$.
  • $x = 3$: $y_{\text{fit}} = 2.5$; residual $= 2 - 2.5 = -0.5$.

Sum of squared residuals: $0.25 + 1 + 0.25 = 1.5$.
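The fitted values and residuals above can be recomputed in vectorized form. A sketch, assuming NumPy:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0])
m, c = 0.5, 1.0  # the least-squares solution found above

y_fit = m * x + c    # fitted values: [1.5, 2.0, 2.5]
r = y - y_fit        # residuals:     [-0.5, 1.0, -0.5]
ssr = np.sum(r**2)   # sum of squared residuals: 1.5
print(y_fit, r, ssr)
```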

Residual orthogonality check. When the model includes a constant, the residuals should sum to $0$, and the residuals weighted by $x_i$ should also sum to $0$:

  • $\sum r_i = -0.5 + 1 - 0.5 = 0$
  • $\sum x_i r_i = -0.5 + 2 - 1.5 = 0$

Both are zero, confirming that the residual vector is orthogonal to the column space of $A$.
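The two sums above are exactly the components of $A^T \mathbf{r}$, so the whole check compresses to one matrix product. A sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])
x_hat = np.array([0.5, 1.0])  # (m, c) from the normal equations

r = b - A @ x_hat
# A^T r = 0 means r is orthogonal to every column of A
assert np.allclose(A.T @ r, 0.0)
```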

The closed-form for simple linear regression

For $y = m x + c$ on data $(x_i, y_i)$:

$$m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \quad c = \bar{y} - m \bar{x}$$

where $\bar{x}, \bar{y}$ are the sample means. This matches what you get by solving the normal equations in the $2 \times 2$ case — it's just a more convenient form.
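The closed form is a direct translation of those two expressions. A sketch, assuming NumPy, on this article's data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0])

x_bar, y_bar = x.mean(), y.mean()  # sample means: 2.0 and 2.0
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar)**2)
c = y_bar - m * x_bar
print(m, c)  # expect m = 0.5, c = 1.0, matching the normal equations
```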

Where least squares appears

  • Linear regression (statistics, econometrics, machine learning)
  • Curve fitting
  • Computer vision (camera calibration, RANSAC's inner step)
  • Physics (parameter estimation from noisy data)
  • Signal processing (system identification)

Pitfalls and extensions

  • Rank-deficient $A$: $A^T A$ becomes singular. Use the pseudoinverse (from the SVD) instead.
  • Ill-conditioning: solve via QR decomposition ($A = QR$, then $R \hat{\mathbf{x}} = Q^T \mathbf{b}$) for numerical stability.
  • Weighted least squares: minimize $\sum_i w_i (A\mathbf{x} - \mathbf{b})_i^2$; solve $(A^T W A) \hat{\mathbf{x}} = A^T W \mathbf{b}$.
  • Regularization (ridge): minimize $\|A\mathbf{x} - \mathbf{b}\|^2 + \lambda \|\mathbf{x}\|^2$; solve $(A^T A + \lambda I) \hat{\mathbf{x}} = A^T \mathbf{b}$.
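Each of these variants takes only a line or two. A sketch, assuming NumPy, on this article's data (the ridge penalty `lam` is an illustrative choice, not from the original):

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])

# Pseudoinverse (handles rank deficiency via the SVD)
x_pinv = np.linalg.pinv(A) @ b

# QR: solve R x = Q^T b -- avoids forming A^T A, which squares the condition number
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Ridge: (A^T A + lambda I) x = A^T b, with a small illustrative lambda
lam = 0.1
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ b)

# Pseudoinverse and QR agree on the ordinary least-squares solution
assert np.allclose(x_pinv, x_qr)
```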

Common mistakes

  • Solving $A \mathbf{x} = \mathbf{b}$ directly. The system is usually inconsistent; solve the normal equations instead.
  • Forgetting the intercept column. If your model has a constant term, include a column of $1$s in $A$.
  • Misidentifying rows and columns. Each observation is a row of $A$; each parameter is a column.

Try it in the visualization

Drag the data points; the best-fit line snaps to its least-squares position. Dashed vertical segments show the residuals; their squared lengths are summed and displayed. The sum can only increase if you drag the line away from the optimal fit.
