6.8. Least Square Regression

See also

Here is some additional reading material on Least Square Regression. It is one section from Gilbert Strang’s Linear Algebra textbook. ila0403.pdf

Least square regression is a direct application of Vector Projections.

In science and engineering, we often conduct experiments where we measure an output of a system with respect to a control variable. The output and control variables might be time, voltage, weight, length, volume, temperature, pressure, or any number of parameters relative to the system under investigation. We might expect that the output and control variable are related by a polynomial equation, but we likely do not know the coefficients of the polynomial. We might not even know if the relationship of the measured data is linear, quadratic, or a higher order polynomial. We are conducting the experiment to learn about the system. Unfortunately, in addition to measuring the relationship between the output and control variables, we will also measure some noise caused by volatility of the system and measurement errors.

If x is the control variable, the measured output data can be modeled as:

b(x) = f(x) + n(x)

  • b(x) is the output data.
  • f(x) is the underlying relationship that we wish to find.
  • n(x) is the measurement noise, which may be correlated with x or independent of it.

The accuracy of our findings will increase with more sample points. After conducting the experiment, we want to apply Vector Projections to fit the data to a polynomial that minimizes the square of the error (E = \|e\|^2).


Least square regression results can be very good if noise is moderate and does not have outliers that will skew the results. After initial regression fitting, a further analysis of the noise might lead to removal of outlier data so that a better fit can be achieved.

6.8.1. Linear Regression

The model for a linear relationship between measured data and the control variables is:

b(t) = C + D\,t

C + D\,t_1 = b(t_1) \\
C + D\,t_2 = b(t_2) \\
C + D\,t_3 = b(t_3) \\
C + D\,t_4 = b(t_4) \\
 \vdots \\
C + D\,t_n = b(t_n)

Solution Alternatives

We have already seen that calculus and linear algebra give the same solution to the projection problem, of which least square regression is an example. The computations for least square regression are always the same, but the equations look different depending on one’s vantage point. There are three ways to describe the calculations. We will focus on the last.

  1. Statistics Approach
  2. Calculus Approach
  3. Linear Algebra Approach

Statistical Linear Regression

We start by calculating some statistical parameters from both the results, b, and the control variable, t. The equations for calculating the mean and standard deviation are found in the Common Statistical Functions section.

  1. Mean of b is \overline{b}.

  2. Mean of t is \overline{t}.

  3. Standard deviation of b is \sigma_b.

  4. Standard deviation of t is \sigma_t.

  5. The correlation between t and b is r.

    r = \frac{N \sum tb - \left(\sum t\right)\left(\sum b\right)}
        {\sqrt{N \sum t^2 - \left(\sum t\right)^2}\,
         \sqrt{N \sum b^2 - \left(\sum b\right)^2}}

Then the linear regression coefficients are:

\begin{array}{l} D = \frac{r\,\sigma_b}{\sigma_t} \\ \\
C = \overline{b} - D\,\overline{t} \end{array}

Calculus Linear Regression

Setting the partial derivatives of the squared error with respect to C and D equal to zero, the following equations define the linear coefficients.

\begin{array}{l}
D = \frac{N\sum tb - \left(\sum t\right)\left(\sum b\right)}
{N \sum t^2 - \left(\sum t \right)^2} \\ \\
C = \frac{\left(\sum t^2\right)\left(\sum b\right) -
      \left(\sum t\right)\left(\sum tb\right)}
{N \sum t^2 - \left(\sum t \right)^2}
\end{array}

Linear Algebra Linear Regression

The overdetermined system of equations is written in matrix form as

A\,u = b

where A holds the control variable terms and u holds the unknown polynomial coefficients.

A = \left[ \begin{array}{ll}
1 & t_1 \\
1 & t_2 \\
\vdots & \vdots \\
1 & t_N \end{array} \right]

u = \left[ \begin{array}{l} C \\ D \end{array} \right]

The vector b holds the measured output data. Because of the noise, b is not likely to lie in the column space of A, so using the equations from Projections in \mathbb{R}^3, we can find an approximation of u as

\hat{u} = \left(A^T\,A\right)^{-1}\,A^T\,b
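In MATLAB code, the explicit matrix inverse should be avoided. As a minimal sketch with hypothetical sample data, two equivalent ways to compute \hat{u} are:

```matlab
% Hypothetical data: fit b = C + D*t at t = 0, 1, 3
A = [1 0; 1 1; 1 3];
b = [1; 2; 6];

u_hat1 = (A'*A) \ (A'*b);  % solve the normal equations
u_hat2 = A \ b;            % backslash on a tall matrix performs a
                           % QR-based least squares solve; same result
```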

The projection p = A\,\hat{u} follows directly.

Matching Linear Regression Equations

Here, we show that the equations for linear regression from the study of statistics, calculus, and linear algebra are, in fact, the same.

A^T\,A = \left[ \begin{array}{ll}
N & \sum t \\
\sum t & \sum t^2 \end{array} \right]
\qquad
A^T\,b = \left[ \begin{array}{l}
\sum b \\
\sum tb \end{array} \right]

Using Elimination to solve, we get:

\left[ \begin{array}{ll|l}
N & \sum t & \sum b \\
\sum t & \sum t^2 & \sum tb \end{array} \right]

\left[ \begin{array}{ll|l}
N & \sum t & \sum b \\
0 & \sum t^2 - \frac{1}{N}\left(\sum t\right)^2
  & \sum tb - \frac{1}{N}\left(\sum b\right)\left(\sum t\right)
\end{array} \right]

D = \frac{\sum tb - \frac{1}{N}\left(\sum b\right)\left(\sum t\right)}
{\sum t^2 - \frac{1}{N}\left(\sum t\right)^2}

This matches the calculus equation. Substituting the statistical definitions of correlation and standard deviation also produces a match:

D = \frac{r\,\sigma_t\,\sigma_b}{{\sigma_t}^2}
= \frac{r\,\sigma_b}{\sigma_t}

From back substitution of the elimination matrix:

C = \frac{\sum b - D\,\sum t}{N} = \overline{b} - D\,\overline{t}

Expanding D in this expression recovers the calculus equation for C.

\begin{array}{ll}
C &= \frac{\sum b}{N} -
          \frac{\left(\sum t\right)\left(\sum tb\right) -
          \frac{1}{N}\left(\sum t\right)^2\left(\sum b\right)}
     {N\,\sum t^2 - \left(\sum t\right)^2} \\ \\

  &= \frac{\left(\sum t^2\right)\left(\sum b\right) -
            \left(\sum t\right)\left(\sum tb\right)}
     {N\,\sum t^2 - \left(\sum t\right)^2}
\end{array}
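As a quick numeric check that the statistics and linear algebra formulas agree, here is a short MATLAB sketch using a small hypothetical data set:

```matlab
% Check that the statistics and linear algebra coefficients agree.
% Hypothetical data: samples of b at t = 0, 1, 3.
t = [0 1 3]';
b = [1 2 6]';

% Statistics approach
R = corrcoef(t, b);                 % 2x2 matrix of correlations
D_stat = R(1,2)*std(b)/std(t);
C_stat = mean(b) - D_stat*mean(t);

% Linear algebra approach
A = [ones(size(t)) t];
u_hat = (A'*A) \ (A'*b);            % u_hat = [C; D]

fprintf('Statistics:     C = %.4f, D = %.4f\n', C_stat, D_stat);
fprintf('Linear algebra: C = %.4f, D = %.4f\n', u_hat(1), u_hat(2));
```

Both approaches report C = 0.7143 and D = 1.7143 for this data.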

6.8.2. Linear Regression Example

% Least Squares linear regression example showing the projection of the
% test data onto the column space of A.
% Using control variable t, an experiment is conducted.
% Model the data as: b(t) = C + Dt.
% b(0) = C + D*0 = 1
% b(1) = C + D*1 = 2
% b(3) = C + D*3 = 6

% A*u_hat = b, where u_hat = [ C D ]'

% If b(3) = 4, the data would map to a straight line. We would say that
% the vector b is in the column space of A. Since b is not in column space
% of A, we will do least squares linear regression.  We need to project the
% vector b onto the column space of A.
% Note that with three sample points, we can visualize the projection in
% R^3.

%% Least Squares Projection

% Define the column space of A, which forms a plane in R^3.
a1 = [1 1 1]';  % vector from (0, 0, 0) to (1, 1, 1).
a2 = [0 1 3]';
A = [a1 a2];
B = [1 2 6]';
u_hat = (A'*A)\(A'*B);  % least squares solution [C; D]
p = A*u_hat;          % projection vector
e = B - p;            % error vector

C = u_hat(1);
D = u_hat(2);
fprintf('Equation estimate = %.2f + %.2f*t\n', C, D);

%% Plot the linear regression
figure, plot(a2, B, 'o');
line(a2, p, 'Color', 'k');
title('Linear Regression');

6.8.3. Quadratic and Higher Order Regression

Application of least squares regression to quadratic polynomials only requires adding an additional column to the A matrix for the square of the control variable.

C + D\,t_1 + E\,t_1^2 = b(t_1) \\
C + D\,t_2 + E\,t_2^2 = b(t_2) \\
C + D\,t_3 + E\,t_3^2 = b(t_3) \\
C + D\,t_4 + E\,t_4^2 = b(t_4) \\
 \vdots \\
C + D\,t_n + E\,t_n^2 = b(t_n)

A = \left[ \begin{array}{lll}
1 & t_1 & t_{1}^{2}\\
1 & t_2 & t_{2}^{2}\\
\vdots & \vdots  & \vdots \\
1 & t_n & t_{n}^{2} \end{array} \right]

Higher order polynomials would likewise require additional columns in the A matrix.

6.8.4. Quadratic Regression Example

%% Quadratic Least Squares Regression Demonstration
%  Generate data per a quadratic equation, add noise to it, use
%  least squares regression to find the equation.

%% Generate data

f = @(t) 1 - 4*t + t.^2;   % Quadratic function as a handle

Max = 10;
N = 40;        % number of samples
std_dev = 5;   % standard deviation of the noise

t = linspace(0, Max, N);
y = f(t) + std_dev*randn(1,N); % randn -> mean = 0, std_deviation = 1

%% Least Squares Fit to Quadratic Equation
%  model data as: b = C + Dt + Et^2
%  Projection is A*u_hat, where u_hat = [C D E]'

%  Determine A matrix from t.
A = ones(N,3);    % pre-allocate A - Nx3 matrix
% First column of A is already all ones for offset term.
A(:,2) = t';      % linear term, second column
A(:,3) = t'.^2;   % Quadratic term, third column

% b vector comes from sampled data
b = y';

u_hat = (A'*A)\(A'*b);  % least squares equation
p = A*u_hat;            % projection vector
% e = b - p;            % error vector
C = u_hat(1);
D = u_hat(2);
E = u_hat(3);
fprintf('Equation estimate = %.2f + %.2f*t + %.2f*t^2\n', C, D, E);

%% Plot the results

figure, plot(t, y, 'ob', 'DisplayName', 'Sampled Data');
title('Quadratic Regression');
xlabel('Control Variable');
line(t, p, 'Color', 'k', 'DisplayName', 'Regression Fit');
line(t, f(t), 'Color', 'm', 'DisplayName', 'No Noise Line' );
legend('show', 'Location', 'northwest');

6.8.5. MATLAB’s polyfit function

Well, an erudite engineer should understand the principles of how to do least squares regression for fitting a polynomial equation to a set of data. However, as might be expected with MATLAB, it already has a function that does the real work for us. Read MATLAB’s documentation for the polyfit and polyval functions.

%% MATLAB's polyfit function
coef = polyfit(t, y, 2);  % coefficients in descending powers: [E D C]
disp(['Polyfit Coefficients: ', num2str(coef)])
pvals = polyval(coef, t);

6.8.6. Generalized Least Square Regression

The key to least square regression success is to correctly model the data with an appropriate set of basis functions. When fitting the data to a polynomial, we use progressive powers of t as the basis functions.

  f_0(t) =  1 \mbox{, a constant} \\
  f_1(t) =  t \\
  f_2(t) =  t^2 \\
  \; \; \; \vdots \\
  f_n(t) =  t^n

The regression algorithm then uses a simple linear algebra Vector Projections algorithm to compute the coefficients (\alpha_0, \alpha_1, \ldots, \alpha_n) that fit the data to the polynomial.

\hat{y}(t_0) = \alpha_0 f_0(t_0) + \alpha_1 f_1(t_0) + \hdots
        + \alpha_n f_n(t_0) \\
\hat{y}(t_1) = \alpha_0 f_0(t_1) + \alpha_1 f_1(t_1) + \hdots
        + \alpha_n f_n(t_1) \\
\vdots \\
\hat{y}(t_m) = \alpha_0 f_0(t_m) + \alpha_1 f_1(t_m) + \hdots
        + \alpha_n f_n(t_m)

The basis functions need not be polynomials, however. They can be any functions that offer a reasonable model for the data; the only requirement is that the model remain linear in the unknown coefficients. Consider a set of basis functions consisting of a constant offset and oscillating trigonometric functions.

  f_0(t) = 1 \mbox{, a constant} \\
  f_1(t) = \sin(t) \\
  f_2(t) = \cos(t)

The A matrix for the regression would then look like:

A = \left[ \begin{array}{lll}
1 & \sin(t_0) & \cos(t_0) \\
1 & \sin(t_1) & \cos(t_1) \\
1 & \sin(t_2) & \cos(t_2) \\
\vdots & \vdots & \vdots \\
1 & \sin(t_m) & \cos(t_m) \end{array} \right]

Here is a function that implements generalized least square regression. The basis functions are given as input in the form of a cell array containing function handles.

function [ alpha, varargout ] = myregression( f, x, y )
%MYREGRESSION Regression fit of data to specified functions
%   Inputs:
%         f - cell array of basis function handles
%         x - x values of data
%         y - y values of data
%   Output:
%         alpha - Coefficients of the regression fit
%         yHat  - Curve fit data (optional)

    % A will be m-by-n
    % alpha will be n-by-1; b will be m-by-1
    if length(x) ~= length(y)
        error('Input vectors not the same size');
    end
    m = length(x);
    n = length(f);
    A = ones(m,n);
    if size(y, 2) == 1      % y is a column vector
        b = y;
        cv = true;
    else
        b = y';
        cv = false;
    end
    if size(x, 2) == 1      % x is a column vector
        c = x;
    else
        c = x';
    end
    for k = 1:n
        A(:,k) = f{k}(c);   % evaluate basis function k at each sample
    end
    alpha = (A'*A) \ (A'*b);
    if nargout > 1
        yHat = A * alpha;
        if ~cv
            yHat = yHat';
        end
        varargout{1} = yHat;
    end
end

Here is a script to test the above function.

f = {@(x) 1, @sin, @cos };
x = linspace(-pi, pi, 1000)';
y = 0.5 + 2*sin(x) - 3*cos(x) + randn(size(x));
[alpha, curve] = myregression(f, x, y);
figure, plot(x, y, 'b.')
hold on
plot(x, curve,'r', 'LineWidth', 2);
title('Regression Fit of Trig Functions')
legend('data', 'regression', 'Location', 'north')
hold off
axis tight

6.8.7. Fourier Series from Least Square Regression

It turns out that when the basis functions are sine waves of the appropriate frequency and phase that the regression calculation can also find Fourier series coefficients.
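For the square wave used in the example below, the Fourier series is known, which provides a check on the regression result:

\mathrm{sign}(\sin t) = \frac{4}{\pi}\left( \sin(t) + \frac{\sin(3t)}{3}
    + \frac{\sin(5t)}{5} + \frac{\sin(7t)}{7} + \cdots \right)

So the fitted coefficients for \sin(t) through \sin(7t) should come out near 4/\pi, 4/(3\pi), 4/(5\pi), and 4/(7\pi), with the constant term near zero.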

f = {@(t)1,@(t) sin(t),@(t) sin(3*t),@(t) sin(5*t),@(t) sin(7*t)};

x = linspace(0, 4*pi, 1000)';
y = sign(sin(x));
%y = y + 0.1*randn(length(x),1);
[alpha, curve] = myregression(f, x, y);
figure, plot(x, y, 'b.')
hold on
plot(x, curve,'r', 'LineWidth', 2);
title('Regression Fit of Square Wave')
legend('data', 'regression') %, 'Location', 'north')
hold off
axis tight

6.8.8. Fitting Exponential Data

If the data is exponential, rather than linear, use the logarithm function to convert the data to linear before doing regression data fitting.

  • The data takes the form y(x) = c\,e^{a\,x}.
  • Taking the logarithm of the data converts it to linear form, Y = \ln(y(x)) = \ln(c) + a\,x.
  • Use linear regression to find \hat{Y}.
  • Then \hat{y}(x) = e^{\hat{Y}}.
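A minimal MATLAB sketch of these steps, assuming hypothetical values c = 2 and a = 0.5, with multiplicative noise so that the logarithm stays well defined:

```matlab
% Exponential fit: y = c*exp(a*x), hypothetical c = 2, a = 0.5
x = linspace(0, 3, 50);
y = 2*exp(0.5*x) .* exp(0.05*randn(size(x)));  % noisy measurements

Y = log(y);               % linearize: Y = log(c) + a*x
coef = polyfit(x, Y, 1);  % linear regression on (x, Y)
a_hat = coef(1);          % slope estimates a
c_hat = exp(coef(2));     % exp(intercept) estimates c
fprintf('Estimate: y(x) = %.3f*exp(%.3f*x)\n', c_hat, a_hat);
```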


Now complete the Least Square Regression Homework.