6.8. Least Square Regression¶
See also
Here is some additional reading material on Least Square Regression.
It is one section from Gilbert Strang’s Linear Algebra textbook.
ila0403.pdf
Least square regression is a direct application of Vector Projections.
In science and engineering, we often conduct experiments where we measure an output of a system with respect to a control variable. The output and control variables might be time, voltage, weight, length, volume, temperature, pressure, or any number of parameters relative to the system under investigation. We might expect that the output and control variable are related by a polynomial equation, but we likely do not know the coefficients of the polynomial. We might not even know if the relationship of the measured data is linear, quadratic, or a higher order polynomial. We are conducting the experiment to learn about the system. Unfortunately, in addition to measuring the relationship between the output and control variables, we will also measure some noise caused by volatility of the system and measurement errors.
If $t$ is the control variable, the measured output data can be modeled as:

$$b(t) = f(t) + n(t)$$

 $b(t)$ is the output data.
 $f(t)$ is the underlying relationship that we wish to find.
 $n(t)$ is the measurement noise, which may be correlated to $f(t)$, or it may be independent of $f(t)$.
The accuracy of our findings will increase with more sample points. After conducting the experiment, we want to apply Vector Projections to fit the data to a polynomial that minimizes the square of the error ($\|e\|^2$).
Note
Least square regression results can be very good if noise is moderate and does not have outliers that will skew the results. After initial regression fitting, a further analysis of the noise might lead to removal of outlier data so that a better fit can be achieved.
6.8.1. Linear Regression¶
The model for a linear relationship between the measured data and the control variable is:

$$b(t) = C + D\,t$$
Solution Alternatives
We have already seen that calculus and linear algebra arrive at the same solution to the projection problem, of which least square regression is an example. The computations for least square regression are always the same, but the equations look different depending on one's vantage point. There are three ways that one can describe the calculations. We will focus on the last.
 Statistics Approach
 Calculus Approach
 Linear Algebra Approach
6.8.1.1. Statistical Linear Regression¶
We start by calculating some statistical parameters from both the results, $b$, and the control variable, $t$. The equations for calculating the mean and standard deviation are found in the Common Statistical Functions section.
 Mean of $t$ is $\bar{t}$.
 Mean of $b$ is $\bar{b}$.
 Standard deviation of $t$ is $\sigma_t$.
 Standard deviation of $b$ is $\sigma_b$.
 The correlation between $t$ and $b$ is $r$.
Then the linear regression coefficients are:

$$D = r\,\frac{\sigma_b}{\sigma_t}, \qquad C = \bar{b} - D\,\bar{t}$$
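As a quick numerical check, the statistical formulas can be sketched in MATLAB. This is a minimal illustration using three hypothetical data points; `corrcoef` is a built-in function that returns the correlation matrix.

```matlab
% Sketch of the statistical approach to linear regression,
% using hypothetical sample data.
t = [0 1 3]';              % control variable
b = [1 2 6]';              % measured output
R = corrcoef(t, b);        % 2x2 correlation matrix
r = R(1,2);                % correlation between t and b
D = r*std(b)/std(t);       % slope
C = mean(b) - D*mean(t);   % intercept
fprintf('C = %.4f, D = %.4f\n', C, D);
```

For these points, the result agrees with the linear algebra solution computed in the example below.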
6.8.1.2. Calculus Linear Regression¶
Setting the partial derivatives of the squared error, $\sum_{i=1}^{m} (b_i - C - D\,t_i)^2$, with respect to $C$ and $D$ equal to zero, the following equations define the linear coefficients:

$$C\,m + D \sum t_i = \sum b_i$$
$$C \sum t_i + D \sum t_i^2 = \sum t_i\,b_i$$
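The pair of calculus equations can be sketched numerically by building the two-equation system from the sums. A minimal example, using the same hypothetical three data points as the later linear regression example:

```matlab
% Build and solve the calculus (normal) equations from sums.
t = [0 1 3]'; b = [1 2 6]';
m = numel(t);
M = [m       sum(t);
     sum(t)  sum(t.^2)];
v = [sum(b); sum(t.*b)];
cd = M \ v;                % cd = [C; D]
fprintf('C = %.4f, D = %.4f\n', cd(1), cd(2));
```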
6.8.1.3. Linear Algebra Linear Regression¶
The linear algebra model of the data is $A\,u = b$:

$$\begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$

Where $A$ defines the control variables and $u = [C \; D]^T$ represents the unknown polynomial coefficients.
The vector $b$ holds the measured output data. Because of the noise, $b$ is not likely to lie in the column space of $A$, so using the equations from Projections in $\mathbb{R}^n$, we can find an approximation of $u$ as

$$\hat{u} = \left(A^T A\right)^{-1} A^T b$$

The projection $p = A\,\hat{u}$ follows directly.
6.8.1.4. Matching Linear Regression Equations¶
Here, we show that the equations for linear regression from the study of statistics, calculus, and linear algebra are, in fact, the same. The linear algebra normal equations, $A^T A\,\hat{u} = A^T b$, expand to:

$$\begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} \sum b_i \\ \sum t_i\,b_i \end{bmatrix}$$

Using Elimination to solve, we get:

$$D = \frac{m \sum t_i\,b_i - \sum t_i \sum b_i}{m \sum t_i^2 - \left(\sum t_i\right)^2}$$

Which is a match to the calculus equation. Identifying statistical equations in this, we get a match:

$$D = \frac{\mathrm{cov}(t, b)}{\sigma_t^2} = r\,\frac{\sigma_b}{\sigma_t}$$

From back substitution of the elimination matrix:

$$C = \frac{1}{m}\left(\sum b_i - D \sum t_i\right) = \bar{b} - D\,\bar{t}$$

By expanding this, we recover the first calculus equation, $C\,m + D \sum t_i = \sum b_i$.
6.8.2. Linear Regression Example¶
% Least Squares linear regression example showing the projection of the
% test data onto the column space of A.
%
% Using control variable t, an experiment is conducted.
% Model the data as: b(t) = C + Dt.
% b(0) = C + D*0 = 1
% b(1) = C + D*1 = 2
% b(3) = C + D*3 = 6
% A*u_hat = b, where u_hat = [ C D ]'
% If b(3) = 4, the data would map to a straight line. We would say that
% the vector b is in the column space of A. Since b is not in column space
% of A, we will do least squares linear regression. We need to project the
% vector b onto the column space of A.
% Note that with three sample points, we can visualize the projection in
% R^3.
%% Least Squares Projection
% Define the column space of A, which forms a plane in R^3.
a1 = [1 1 1]'; % vector from (0, 0, 0) to (1, 1, 1).
a2 = [0 1 3]';
A = [a1 a2];
B = [1 2 6]';
u_hat = (A'*A)\(A'*B); % least squares line
p = A*u_hat; % projection vector
e = B - p; % error vector
C = u_hat(1);
D = u_hat(2);
fprintf('Equation estimate = %.2f + %.2f*t\n', C, D);
%% Plot the linear regression
figure, plot(a2, B, 'o');
line(a2, p, 'Color', 'k');
title('Linear Regression');
6.8.3. Quadratic and Higher Order Regression¶
Application of least squares regression to quadratic polynomials only requires adding an additional column to the $A$ matrix for the square of the control variable. The model becomes $b(t) = C + D\,t + E\,t^2$, with

$$A = \begin{bmatrix} 1 & t_1 & t_1^2 \\ \vdots & \vdots & \vdots \\ 1 & t_m & t_m^2 \end{bmatrix}$$

Higher order polynomials would likewise require additional columns in the $A$ matrix.
6.8.4. Quadratic Regression Example¶
%% Quadratic Least Squares Regression Demonstration
%
% Generate data per a quadratic equation, add noise to it, use
% least squares regression to find the equation.
%% Generate data
f = @(t) 1 - 4*t + t.^2; % Quadratic function as a handle
Max = 10;
N = 40; % number of samples
std_dev = 5; % standard deviation of the noise
t = linspace(0, Max, N);
y = f(t) + std_dev*randn(1,N); % randn -> mean = 0, std_deviation = 1
%% Least Squares Fit to Quadratic Equation
% model data as: b = C + Dt + Et^2
% Projection is A*u_hat, where u_hat = [C D E]'
% Determine A matrix from t.
A = ones(N,3); % preallocate A - Nx3 matrix
% First column of A is already all ones for offset term.
A(:,2) = t'; % linear term, second column
A(:,3) = t'.^2; % Quadratic term, third column
% b vector comes from sampled data
b = y';
u_hat = (A'*A)\(A'*b); % least squares equation
p = A*u_hat; % projection vector
% e = b - p; % error vector
C = u_hat(1);
D = u_hat(2);
E = u_hat(3);
fprintf('Equation estimate = %.2f + %.2f*t + %.2f*t^2\n', C, D, E);
%% Plot the results
figure, plot(t, y, 'ob', 'DisplayName', 'Sampled Data');
title('Quadratic Regression');
xlabel('Control Variable');
ylabel('Output');
line(t, p, 'Color', 'k', 'DisplayName', 'Regression Fit');
line(t, f(t), 'Color', 'm', 'DisplayName', 'No Noise Line' );
legend('show', 'Location', 'northwest');
6.8.5. MATLAB’s polyfit function¶
Well, an erudite engineer should understand the principles of how to do least squares regression for fitting a polynomial equation to a set of data. However, as might be expected with MATLAB, it already has a function that does the real work for us. Read MATLAB’s documentation for the polyfit and polyval functions.
%% MATLAB's polyfit function
coef = polyfit(t, y, 2);
disp(['Polyfit Coefficients: ', num2str(coef)])
pvals = polyval(coef, t);
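One detail worth noting: polyfit returns coefficients ordered from the highest power down, which is the reverse of the [C D E]' ordering used in the projection solution above. A minimal sketch of the relationship, using noise-free quadratic data for clarity:

```matlab
% polyfit returns the highest-order coefficient first, so flip it to
% compare with the [C D E]' ordering of the projection solution.
t = linspace(0, 10, 40);
y = 1 - 4*t + t.^2;        % noise-free quadratic for illustration
coef = polyfit(t, y, 2);   % coef = [E D C]
u_hat = fliplr(coef)';     % u_hat = [C D E]'
disp(u_hat)
```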
6.8.6. Generalized Least Square Regression¶
The key to least square regression success is to correctly model the data with an appropriate set of basis functions. When fitting the data to a polynomial, we use progressive powers of $t$ as the basis functions.
The regression algorithm then uses a simple linear algebra Vector Projections algorithm to compute the coefficients ($\alpha$) to fit the data to the polynomial.
The basis functions need not be polynomials, however. They can be any linear function that offers a reasonable model for the data. Consider a set of basis functions consisting of a constant offset and oscillating trigonometry functions:

$$b(t) = \alpha_1 + \alpha_2 \sin(t) + \alpha_3 \cos(t)$$

The $A$ matrix for the regression would then look like:

$$A = \begin{bmatrix} 1 & \sin(t_1) & \cos(t_1) \\ \vdots & \vdots & \vdots \\ 1 & \sin(t_m) & \cos(t_m) \end{bmatrix}$$
Here is a function that implements generalized least square regression. The basis functions are given as input in the form of a cell array containing function handles.
function [ alpha, varargout ] = myregression( f, x, y )
%MYREGRESSION Regression fit of data to specified functions
%   Inputs:
%     f - cell array of basis function handles
%     x - x values of data
%     y - y values of data
%   Output:
%     alpha - Coefficients of the regression fit
%     yHat  - Curve fit data (optional)

    % A will be m-by-n
    % alpha and b will be n-by-1
    if length(x) ~= length(y)
        error('Input vectors not the same size');
    end
    m = length(x);
    n = length(f);
    A = ones(m,n);
    if size(y, 2) == 1  % a column vector
        b = y;
        cv = true;
    else
        b = y';
        cv = false;
    end
    if size(x, 2) == 1  % column vector
        c = x;
    else
        c = x';
    end
    for k = 1:n
        A(:,k) = f{k}(c);
    end
    alpha = (A'*A) \ (A'*b);
    if nargout > 1
        yHat = A * alpha;
        if ~cv
            yHat = yHat';
        end
        varargout{1} = yHat;
    end
end
Here is a script to test the above function.
f = {@(x) 1, @sin, @cos };
x = linspace(-pi, pi, 1000)';
y = 0.5 + 2*sin(x) - 3*cos(x) + randn(size(x));
[alpha, curve] = myregression(f, x, y);
disp(alpha)
figure, plot(x, y, 'b.')
hold on
plot(x, curve,'r', 'LineWidth', 2);
xlabel('x')
ylabel('y')
title('Regression Fit of Trig Functions')
legend('data', 'regression', 'Location', 'north')
hold off
axis tight
6.8.7. Fourier Series from Least Square Regression¶
It turns out that when the basis functions are sine waves of the appropriate frequency and phase, the regression calculation can also find Fourier series coefficients.
f = {@(t)1,@(t) sin(t),@(t) sin(3*t),@(t) sin(5*t),@(t) sin(7*t)};
x = linspace(0, 4*pi, 1000)';
y = sign(sin(x));
%y = y + 0.1*randn(length(x),1);
[alpha, curve] = myregression(f, x, y);
disp(alpha)
figure, plot(x, y, 'b.')
hold on
plot(x, curve,'r', 'LineWidth', 2);
xlabel('x')
ylabel('y')
title('Regression Fit of Square Wave')
legend('data', 'regression') %, 'Location', 'north')
hold off
axis tight
6.8.8. Fitting Exponential Data¶
If the data is exponential, rather than linear, use the logarithm function to convert the data to linear before doing regression data fitting.
 The data takes the form $y = a\,e^{b\,t}$.
 Taking the logarithm of the data converts it to linear data, $\ln(y) = \ln(a) + b\,t$.
 Use linear regression to find $\ln(a)$ and $b$.
 Then $a = e^{\ln(a)}$.
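The steps above can be sketched in MATLAB. This is a minimal illustration assuming hypothetical values for $a$ and $b$, with multiplicative noise so that the data stays positive for the logarithm.

```matlab
% Sketch of exponential data fitting via the logarithm.
a = 2; b = -0.5;                   % true parameters (assumed for the demo)
t = linspace(0, 4, 50);
y = a*exp(b*t) .* exp(0.02*randn(size(t)));  % multiplicative noise keeps y > 0
coef = polyfit(t, log(y), 1);      % linear fit to ln(y) = ln(a) + b*t
b_est = coef(1);                   % slope estimates b
a_est = exp(coef(2));              % intercept estimates ln(a), so a = e^(ln a)
fprintf('a = %.3f, b = %.3f\n', a_est, b_est);
```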
Note
Now complete the Least Square Regression Homework.