Orthogonal Projections

Section 8.4 Orthogonal Projections

We know that a linear system \(A\xvec=\bvec\) is inconsistent when \(\bvec\) is not in \(\col(A)\text{,}\) the column space of \(A\text{.}\) Later in this chapter, we’ll develop a strategy for dealing with inconsistent systems by finding \(\bhat\text{,}\) the vector in \(\col(A)\) that minimizes the distance to \(\bvec\text{.}\) The equation \(A\xvec=\bhat\) is therefore consistent and its solution set can provide us with useful information about the original system \(A\xvec=\bvec\text{.}\)

In this section and the next, we’ll develop some techniques that enable us to find \(\bhat\text{,}\) the vector in a given subspace \(W\) that is closest to a given vector \(\bvec\text{.}\)

Suppose, as shown in Figure 8.4.1, that we have a subspace \(W\) of \(\real^m\) and a vector \(\bvec\) that is not in that subspace. We would like to find the vector \(\bhat\) in \(W\) that is closest to \(\bvec\text{,}\) meaning the distance between \(\bhat\) and \(\bvec\) is as small as possible.

Figure 8.4.1. Given a plane in \(\real^3\) and a vector \(\bvec\) not in the plane, we wish to find the vector \(\bhat\) in the plane that is closest to \(\bvec\text{.}\)

Activity 8.4.1.

This activity demonstrates how to determine the orthogonal projection of a vector onto a subspace of \(\real^m\text{.}\)

Let’s begin by considering a line \(L\text{,}\) defined by the vector \(\wvec=\twovec21\text{,}\) and a vector \(\bvec=\twovec24\) not on \(L\text{,}\) as illustrated in Figure 8.4.2.

Figure 8.4.2. Finding the orthogonal projection of \(\bvec\) onto the line defined by \(\wvec\text{.}\)
1. To find \(\bhat\text{,}\) first notice that \(\bhat = s\wvec\) for some scalar \(s\text{.}\) Since \(\bvec-\bhat = \bvec - s\wvec\) is orthogonal to \(\wvec\text{,}\) what do we know about the dot product
  
  \begin{equation*} (\bvec-s\wvec)\cdot\wvec\text{?} \end{equation*}
2. Apply the distributive property of dot products to find the scalar \(s\text{.}\) What is the vector \(\bhat\text{,}\) the orthogonal projection of \(\bvec\) onto \(L\text{?}\)
3. More generally, explain why the orthogonal projection of \(\bvec\) onto the line defined by \(\wvec\) is
  
  \begin{equation*} \bhat= \frac{\bvec\cdot\wvec}{\wvec\cdot\wvec}~\wvec\text{.} \end{equation*}
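
As a quick numerical check of this formula, the following Python sketch computes the projection for the vectors \(\wvec=\twovec21\) and \(\bvec=\twovec24\) used in this activity. The helper names `dot` and `project_onto_line` are illustrative choices rather than anything defined in the text, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_line(b, w):
    """Orthogonal projection of b onto the line spanned by a nonzero vector w."""
    s = Fraction(dot(b, w), dot(w, w))  # the scalar s = (b . w) / (w . w)
    return [s * wi for wi in w]

w = [2, 1]
b = [2, 4]
b_hat = project_onto_line(b, w)
residual = [bi - hi for bi, hi in zip(b, b_hat)]

print(b_hat)             # entries 16/5 and 8/5
print(dot(residual, w))  # 0, so b - b_hat is orthogonal to w
```

The second printed value confirms the defining property of the projection: the difference \(\bvec-\bhat\) has zero dot product with \(\wvec\text{.}\)
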
The same ideas apply more generally. Suppose we have an orthogonal set of vectors \(\wvec_1=\threevec22{-1}\) and \(\wvec_2=\threevec102\) that define a plane \(W\) in \(\real^3\text{.}\) If \(\bvec=\threevec396\) is another vector in \(\real^3\text{,}\) we seek the vector \(\bhat\) on the plane \(W\) closest to \(\bvec\text{.}\) As before, the vector \(\bvec-\bhat\) will be orthogonal to \(W\text{,}\) as illustrated in Figure 8.4.3.

Figure 8.4.3. Given a plane \(W\) defined by the orthogonal vectors \(\wvec_1\) and \(\wvec_2\) and another vector \(\bvec\text{,}\) we seek the vector \(\bhat\) on \(W\) closest to \(\bvec\text{.}\)
1. The vector \(\bvec-\bhat\) is orthogonal to \(W\text{.}\) What does this say about the dot products \((\bvec-\bhat)\cdot\wvec_1\) and \((\bvec-\bhat)\cdot\wvec_2\text{?}\)
2. Let the columns of \(Q\) be an orthonormal basis for \(W\text{.}\) Explain why \(Q^T(\bvec-\bhat)= \zerovec\text{.}\)
3. Since \(\bhat\) is in the plane, we can write it as a linear combination of the columns of \(Q\text{,}\) so \(\bhat=Q\xhat\text{,}\) where the vector \(\xhat\) holds the weights of the linear combination. Rewrite \(Q^T(\bvec-\bhat)=\zerovec\) as \(Q^T(\bvec-Q\xhat)= \zerovec\) and explain why \(\xhat=Q^T\bvec\text{.}\)
4. Explain why \(\bhat = QQ^T\bvec\) is the orthogonal projection of \(\bvec\) onto the plane \(W\text{.}\)
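
The plane example can be checked numerically in the same spirit. The sketch below, written in plain Python with exact fractions, computes the weights \(c_i = \frac{\bvec\cdot\wvec_i}{\wvec_i\cdot\wvec_i}\) for the orthogonal vectors \(\wvec_1\) and \(\wvec_2\) and confirms that \(\bvec-\bhat\) is orthogonal to both. The helper names `dot` and `project_onto_subspace` are hypothetical, and the approach assumes the basis vectors are mutually orthogonal and nonzero.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_subspace(b, basis):
    """Project b onto the span of mutually orthogonal, nonzero basis vectors.

    Returns the weights c_i = (b . w_i) / (w_i . w_i) and the projection
    b_hat = c_1 w_1 + ... + c_n w_n.
    """
    weights = [Fraction(dot(b, w), dot(w, w)) for w in basis]
    b_hat = [sum(c * w[i] for c, w in zip(weights, basis))
             for i in range(len(b))]
    return weights, b_hat

w1 = [2, 2, -1]
w2 = [1, 0, 2]
b = [3, 9, 6]

# The weight formula is only valid when the basis is orthogonal.
assert dot(w1, w2) == 0

weights, b_hat = project_onto_subspace(b, [w1, w2])
residual = [bi - hi for bi, hi in zip(b, b_hat)]

print(weights)                                # weights 2 and 3
print(b_hat)                                  # entries 7, 4, 4
print(dot(residual, w1), dot(residual, w2))   # 0 0
```

If the basis were not orthogonal, the individual weights could not be computed independently in this way, which is why the orthogonality check is made explicit.
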

Answer.

1. 0
2. \(\bhat = \twovec{16/5}{8/5}\text{.}\)
3. \(\displaystyle \bhat= \frac{\bvec\cdot\wvec}{\wvec\cdot\wvec}\wvec\)
1. 0
2. \(c_1= 2\)
  
  \(c_2=3\)
3. \(\displaystyle \bhat = \threevec744\)
We require that \(\bvec-\bhat\) be orthogonal to every vector \(\wvec_i\text{.}\)
\(\displaystyle c_i=\frac{\bvec\cdot\uvec_i}{\uvec_i\cdot\uvec_i} = \bvec\cdot\uvec_i\)
Use the fact that \(Q^T\bvec = \threevec{\bvec\cdot\uvec_1}{\vdots}{\bvec\cdot\uvec_n}\)

Solution.

1. This dot product should be 0 since the vectors are orthogonal.
2. \(\bhat=\frac{\bvec\cdot\wvec}{\wvec\cdot\wvec}\wvec = \twovec{16/5}{8/5}\text{.}\)
3. As before, \(\bhat= \frac{\bvec\cdot\wvec}{\wvec\cdot\wvec}\wvec\)
1. These dot products are 0.
2. \(c_1=\frac{\bvec\cdot\wvec_1} {\wvec_1\cdot\wvec_1} = 2\)
  
  \(c_2=\frac{\bvec\cdot\wvec_2} {\wvec_2\cdot\wvec_2} = 3\)
3. \(\displaystyle \bhat = \threevec744\)
We know \(\bhat=c_1\wvec_1 + c_2\wvec_2 + \cdots + c_n\wvec_n\) and we can find \(c_i=\frac{\bvec\cdot\wvec_i}{\wvec_i\cdot\wvec_i}\) by requiring that \(\bvec-\bhat\) be orthogonal to every vector \(\wvec_i\text{.}\)
The vectors \(\uvec_i\text{,}\) the columns of \(Q\text{,}\) form an orthonormal set, so \(\uvec_i\cdot\uvec_i = \len{\uvec_i}^2 = 1\) and the weights are \(c_i=\frac{\bvec\cdot\uvec_i}{\uvec_i\cdot\uvec_i} = \bvec\cdot\uvec_i\text{.}\)
We have \(Q^T\bvec = \cthreevec{\bvec\cdot\uvec_1}{\vdots}{\bvec\cdot\uvec_n}\) so that

\begin{equation*} QQ^T\bvec = (\bvec\cdot\uvec_1)~\uvec_1 + (\bvec\cdot\uvec_2)~\uvec_2 + \cdots + (\bvec\cdot\uvec_n)~\uvec_n=\bhat\text{.} \end{equation*}
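
To see the matrix form at work, the following Python sketch builds \(QQ^T\) for the plane from the activity and compares \(QQ^T\bvec\) with the sum \((\bvec\cdot\uvec_1)\uvec_1 + (\bvec\cdot\uvec_2)\uvec_2\text{.}\) It assumes the orthonormal vectors are obtained by normalizing \(\threevec22{-1}\) and \(\threevec102\text{;}\) the helper names `normalize`, `projection_matrix`, and `matvec` are illustrative only, and floating-point arithmetic means the results agree only up to rounding.

```python
import math

def dot(u, v):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def normalize(w):
    """Scale a nonzero vector to unit length."""
    length = math.sqrt(dot(w, w))
    return [wi / length for wi in w]

def projection_matrix(orthonormal):
    """Return Q Q^T, where the columns of Q are the given orthonormal vectors.

    Entry (i, j) of Q Q^T is the sum over k of u_k[i] * u_k[j].
    """
    m = len(orthonormal[0])
    return [[sum(u[i] * u[j] for u in orthonormal) for j in range(m)]
            for i in range(m)]

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in M]

u1 = normalize([2, 2, -1])
u2 = normalize([1, 0, 2])
b = [3, 9, 6]

P = projection_matrix([u1, u2])
via_matrix = matvec(P, b)
via_weights = [dot(b, u1) * u1[i] + dot(b, u2) * u2[i] for i in range(3)]

print(via_matrix)   # approximately [7, 4, 4]
print(via_weights)  # approximately [7, 4, 4]
```

Both computations give the same vector up to rounding, matching the identity \(QQ^T\bvec=\bhat\) above.
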