Orthogonal least squares

Section 8.5 Orthogonal least squares

Suppose we collect some data when performing an experiment and plot it as shown on the left of Figure 8.5.1. Notice that there is no line on which all the points lie; in fact, it would be surprising if there were since we can expect some uncertainty in the measurements recorded. There does, however, appear to be a line, as shown on the right, on which the points almost lie.

Figure 8.5.1. A collection of points and a line approximating the linear relationship implied by them.

In this section, we’ll explore how the techniques developed in this chapter enable us to find the line that best approximates the data. More specifically, we’ll see how the search for a line passing through the data points leads to an inconsistent system \(A\xvec=\bvec\text{.}\) Since we are unable to find a solution, we instead seek the vector \(\xvec\) where \(A\xvec\) is as close as possible to \(\bvec\text{.}\) Orthogonal projection gives us just the right tool for doing this.

Exploration 8.5.1.

Is there a solution to the equation \(A\xvec=\bvec\) where \(A\) and \(\bvec\) are such that

\begin{equation*} \begin{bmatrix} 1 \amp 2 \\ 2 \amp 5 \\ -1 \amp 0 \\ \end{bmatrix} \xvec = \threevec5{-3}{-1}\text{.} \end{equation*}
We know that \(\threevec12{-1}\) and \(\threevec250\) form a basis for \(\col(A)\text{.}\) Find an orthogonal basis for \(\col(A)\text{.}\)
Find the orthogonal projection \(\widehat\bvec\) of \(\bvec\) onto \(\col(A)\text{.}\)
Explain why the equation \(A\xvec=\widehat\bvec\) must be consistent and then find its solution.

Solution.

The reduced row echelon form

\begin{equation*} \left[ \begin{array}{cc|c} 1 \amp 2 \amp 5 \\ 2 \amp 5 \amp -3 \\ -1 \amp 0 \amp -1\\ \end{array} \right] \sim \left[ \begin{array}{cc|c} 1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ \end{array} \right] \end{equation*}

shows that there is no solution.
Applying Gram-Schmidt, we find an orthogonal basis consisting of \(\wvec_1=\cthreevec12{-1}\) and \(\wvec_2=\threevec012\text{.}\)
The projection formula gives \(\bhat = \cthreevec0{-1}{-2}\text{.}\)
The equation is consistent because \(\bhat\) is in \(\col(A)\text{.}\) We find the solution \(\xvec=\twovec2{-1}\text{.}\)

Subsection 8.5.1 A first example

When we’ve encountered inconsistent systems in the past, we’ve simply said there is no solution and moved on. The preview activity, however, shows how we can find approximate solutions to an inconsistent system: if there are no solutions to \(A\xvec = \bvec\text{,}\) we instead solve the consistent system \(A\xvec = \bhat\text{,}\) the orthogonal projection of \(\bvec\) onto \(\col(A)\text{.}\) As we’ll see, this solution is, in a specific sense, the best possible.

Activity 8.5.2.

Suppose we have three data points \((1,1)\text{,}\) \((2,1)\text{,}\) and \((3,3)\) and that we would like to find a line passing through them.

Plot these three points in Figure 8.5.2. Are you able to draw a line that passes through all three points?

Figure 8.5.2. Plot the three data points here.
Remember that the equation of a line can be written as \(b + mx=y\) where \(m\) is the slope and \(b\) is the \(y\)-intercept. We will try to find \(b\) and \(m\) so that the three points lie on the line.

The first data point \((1,1)\) gives an equation for \(b\) and \(m\text{.}\) In particular, we know that when \(x=1\text{,}\) then \(y=1\) so we have \(b + m(1) = 1\) or \(b + m = 1\text{.}\) Use the other two data points to create a linear system describing \(m\) and \(b\text{.}\)
We have obtained a linear system having three equations, one from each data point, for the two unknowns \(b\) and \(m\text{.}\) Identify a matrix \(A\) and vector \(\bvec\) so that the system has the form \(A\xvec=\bvec\text{,}\) where \(\xvec=\ctwovec bm\text{.}\)

Notice that the unknown vector \(\xvec=\ctwovec bm\) describes the line that we seek.
Is there a solution to this linear system? How does this question relate to your attempt to draw a line through the three points above?
Since this system is inconsistent, we know that \(\bvec\) is not in the column space \(\col(A)\text{.}\) Find an orthogonal basis for \(\col(A)\) and use it to find the orthogonal projection \(\widehat\bvec\) of \(\bvec\) onto \(\col(A)\text{.}\)
Since \(\widehat\bvec\) is in \(\col(A)\text{,}\) the equation \(A\xvec = \widehat\bvec\) is consistent. Find its solution \(\xvec = \ctwovec{b}{m}\) and sketch the line \(y=b + mx\) in Figure 8.5.2. We say that this is the line of best fit.

Answer.

It’s not possible to draw a line through all three points.
We have the equations

\begin{align*} b + m \amp {}={} 1\\ b + 2m \amp {}={} 1\\ b + 3m \amp {}={} 3 \end{align*}
We have \(A = \begin{bmatrix} 1 \amp 1 \\ 1 \amp 2 \\ 1 \amp 3 \\ \end{bmatrix}\) and \(\bvec=\threevec113\text{.}\)
This linear system is inconsistent.
\(\bhat=\threevec{2/3}{5/3}{8/3}\text{.}\)
\(\xvec=\twovec{-1/3}1\text{.}\) This line is shown in Figure 8.5.4.

Figure 8.5.4. The line that best approximates the three data points.

Solution.

After plotting the points, we see that it’s not possible to draw a line through all three points.
We have the equations

\begin{align*} b + m \amp {}={} 1\\ b + 2m \amp {}={} 1\\ b + 3m \amp {}={} 3 \end{align*}
We have \(A = \begin{bmatrix} 1 \amp 1 \\ 1 \amp 2 \\ 1 \amp 3 \\ \end{bmatrix}\) and \(\bvec=\threevec113\text{.}\)
Finding the reduced row echelon form of the associated augmented matrix tells us this is an inconsistent system. Since a solution would describe a line passing through the three points, we should expect this.
Applying Gram-Schmidt gives us the orthogonal basis \(\wvec_1=\threevec111\) and \(\wvec_2 = \threevec{-1}01\text{.}\) Projecting \(\bvec\) onto \(\col(A)\) gives \(\bhat=\threevec{2/3}{5/3}{8/3}\text{.}\)
Solving the equation \(A\xvec = \bhat\) gives \(\xvec=\ctwovec{-1/3}1\text{,}\) which describes a line having vertical intercept \(b=-1/3\) and the slope \(=1\text{.}\) This line is shown in Figure 8.5.3.

Figure 8.5.3. The line that best approximates the three data points.

This activity illustrates the idea behind a technique known as orthogonal least squares, which we have been working toward throughout this chapter. If the data points are denoted as \((x_i, y_i)\text{,}\) we construct the matrix \(A\) and vector \(\bvec\) as

\begin{equation*} A = \begin{bmatrix} 1 \amp x_1 \\ 1 \amp x_2 \\ 1 \amp x_3 \\ \end{bmatrix},\hspace{24pt} \bvec = \threevec{y_1}{y_2}{y_3}\text{.} \end{equation*}

With the vector \(\xvec=\ctwovec bm\) representing the line \(b+mx = y\text{,}\) we see that the equation \(A\xvec=\bvec\) describes a line passing through all the data points. In our activity, it is visually apparent that there is no such line, which agrees with the fact that the equation \(A\xvec=\bvec\) is inconsistent.

Remember that \(\bhat\text{,}\) the orthogonal projection of \(\bvec\) onto \(\col(A)\text{,}\) is the closest vector in \(\col(A)\) to \(\bvec\text{.}\) Therefore, when we solve the equation \(A\xvec=\bhat\text{,}\) we are finding the vector \(\xvec\) so that \(A\xvec = \threevec{b+mx_1}{b+mx_2}{b+mx_3}\) is as close to \(\bvec=\threevec{y_1}{y_2}{y_3}\) as possible. Let’s think about what this means within the context of this problem.

The difference \(\bvec-A\xvec = \threevec{y_1-(b+mx_1)}{y_2-(b+mx_2)}{y_3-(b+mx_3)}\) so that the square of the distance between \(A\xvec\) and \(\bvec\) is

\begin{align*} \len{\bvec - A\xvec}^2 \amp =\\ \amp \left(y_1-(b+mx_1)\right)^2 + \left(y_2-(b+mx_2)\right)^2 + \left(y_3-(b+mx_3)\right)^2\text{.} \end{align*}

Our approach finds the values for \(b\) and \(m\) that make this sum of squares as small as possible, which is why we call this a least squares problem.

Drawing the line defined by the vector \(\xvec=\ctwovec bm\text{,}\) the quantity \(y_i - (b + mx_i)\) reflects the vertical distance between the line and the data point \((x_i, y_i)\text{,}\) as shown in Figure 8.5.5. Seen in this way, the square of the distance \(\len{\bvec-A\xvec}^2\) is a measure of how much the line defined by the vector \(\xvec\) misses the data points. The solution to the least squares problem is the line that misses the data points by the smallest amount possible.

Figure 8.5.5. The solution of the least squares problem and the vertical distances between the line and the data points.

Subsection 8.5.2 Solving least squares problems

Now that we’ve seen an example of what we’re trying to accomplish, let’s put this technique into a more general framework.

Given an inconsistent system \(A\xvec = \bvec\text{,}\) we seek the vector \(\xvec\) that minimizes the distance from \(A\xvec\) to \(\bvec\text{.}\) In other words, \(\xvec\) satisfies \(A\xvec = \widehat\bvec\text{,}\) where \(\bhat\) is the orthogonal projection of \(\bvec\) onto the column space \(\col(A)\text{.}\) We know the equation \(A\xvec=\bhat\) is consistent since \(\bhat\) is in \(\col(A)\text{,}\) and we know there is only one solution if we assume that the columns of \(A\) are linearly independent.

We will usually denote the solution of \(A\xvec = \bhat\) by \(\xhat\) and call this vector the least squares approximate solution of \(A\xvec=\bvec\) to distinguish it from a (possibly non-existent) solution of \(A\xvec=\bvec\text{.}\)

In the previous section we saw that if \(Q\) is a matrix whose columns form an orthonormal basis for \(\col(A)\text{,}\) then \(\bhat = QQ^T\bvec\text{.}\) The next step is to find \(\xhat = \ctwovec{b}{m}\text{.}\) In order to do this we solve \(A\xhat =\bhat\text{.}\) We know this system will be consistent since Since \(\bhat\) is in \(\col(A)\text{.,}\)

As the next activity demonstrates, there is an alternate method for finding the least squares approximate solution \(\xhat\) using a \(QR\) factorization of the matrix \(A\text{.}\) This method is preferable as it is numerically more reliable. The matrix \(R\) is upper triangular so finding \(R^{-1}\) is computaionally feasible.

Example 8.5.6.

Consider the equation

\begin{equation*} \begin{bmatrix} 2 \amp 1 \\ 2 \amp 0 \\ -1 \amp 3 \\ \end{bmatrix} \xvec = \threevec{16}{-1}7 \end{equation*}

with matrix \(A\) and vector \(\bvec\text{.}\) Since this equation is inconsistent, we will find the least squares approximate solution \(\xhat\)

Suppose we are interested in finding the least squares approximate solution to the equation \(A\xvec = \bvec\) and that we have the \(QR\) factorization \(A=QR\text{.}\) Explain why the least squares approximation solution is given by solving

\begin{align*} A\xhat \amp = QQ^T\bvec \\\\ QR\xhat \amp = QQ^T\bvec \\ \end{align*}

Multiply both sides of the second expression by \(Q^T\) and explain why

\begin{equation*} R\xhat = Q^T\bvec. \end{equation*}

Since \(R\) is upper triangular, this is a relatively simple equation to solve using back substitution. We will therefore write the least squares approximate solution as

\begin{equation*} \xhat = R^{-1}Q^T\bvec, \end{equation*}

and put this to use in the following context.

Activity 8.5.3.

The rate at which a cricket chirps is related to the outdoor temperature, as reflected in some experimental data that we’ll study in this activity. The chirp rate \(C\) is expressed in chirps per second while the temperature \(T\) is in degrees Fahrenheit. Evaluate the following cell to load the data:

Evaluating this cell also provides:

the vectors chirps and temps formed from the columns of the dataset.
the command onesvec(n), which creates an \(n\)-dimensional vector whose entries are all one.
Remember that you can form a matrix whose columns are the vectors v1 and v2 with column_matrix([v1, v2]).

We would like to represent this relationship by a linear function

\begin{equation*} \beta_0 + \beta_1 C = T\text{.} \end{equation*}

Use the first data point \((C_1,T_1)=(20.0,88.6)\) to write an equation involving \(\beta_0\) and \(\beta_1\text{.}\)
Suppose that we represent the unknowns using a vector \(\xvec = \twovec{\beta_0}{\beta_1}\text{.}\) Use the 15 data points to create the matrix \(A\) and vector \(\bvec\) so that the linear system \(A\xvec= \bvec\) describes the unknown vector \(\xvec\text{.}\)
The command onesvec(n), returns an \(n\)-dimensional vector whose entries are all one.

The command QR(A) returns the \(QR\) factorization of \(A\) as Q, R = QR(A).
Find the least squares approximate solution \(\xhat\) by solving the equation \(R\xhat = Q^T\bvec\text{.}\) Remember the vector \(\bvec\) is the vector of temperatures.
What are the values of \(\beta_0\) and \(\beta_1\) that you found?
If the chirp rate is 22 chirps per second, what is your prediction for the temperature?
You can plot the data and your line, assuming you called the solution xhat, using the cell below.

Answer.

\(\beta_0 + 20.0\beta_1 = 88.6\text{.}\)
\(A\) is the \(15\times2\) matrix whose first column consists only of 1’s and whose second column is the vector of chirp rates. The vector \(\bvec\) is the vector of temperatures.
\(\xhat = \twovec{\beta_0}{\beta_1} = \twovec{22.8}{3.4}\text{.}\)
\(97.9\) degrees.

Solution.

We have the equation \(\beta_0 + 20.0\beta_1 = 88.6\text{.}\)
\(A\) is the \(15\times2\) matrix whose first column consists only of 1’s and whose second column is the vector of chirp rates. The vector \(\bvec\) is the vector of temperatures.
\(\xhat = \twovec{\beta_0}{\beta_1} = \twovec{22.8}{3.4}\text{.}\)
The predicted temperature is \(\beta_0 + 22\beta_1 = 97.9\) degrees.

Prev Top Next