Lecture 21 Quiz

[NOTE: This quiz is due at the beginning of lecture Monday, not Friday. At first glance it may look like a lot of work, but in total you just have to (i) answer a simple conceptual question that we have discussed a few times in lecture, (ii) take the derivative of an ordinary function with respect to one variable, and (iii) write a single line of code. Please let us know on Piazza if you have questions!]

In the past few lectures we've covered two fundamental numerical tools used for computer graphics: numerical optimization and numerical linear algebra. In this quiz, you're going to put these ideas together to design a real algorithm for solving linear systems. Hopefully after doing this exercise, you will never again think of a linear solver as a "black box," but rather as something that you yourself can design, build, and tinker with. (Good graphics engineering often involves exactly this kind of tinkering, and you might be surprised how often people in "the real world" design or modify numerical solvers to address the needs of particular problems or hardware platforms.)

The basic idea of our algorithm is to solve a linear system $Ax=b$ by applying gradient descent to the objective

$$ f(x) = \tfrac{1}{2}x^T A x - b^T x. $$
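(As an optional aside: if you'd like to watch this objective actually decrease while your algorithm runs, here is a sketch of how you might evaluate it in C for a 2x2 system, with arrays A, x, and b laid out as in the code skeleton from Question 3.)

/* evaluate f(x) = (1/2) x^T A x - b^T x for a 2x2 system;
   A, x, and b are assumed to be declared as in the skeleton below */
double Ax0 = A[0][0]*x[0] + A[0][1]*x[1]; /* first entry of Ax */
double Ax1 = A[1][0]*x[0] + A[1][1]*x[1]; /* second entry of Ax */
double f = 0.5*( x[0]*Ax0 + x[1]*Ax1 ) - ( b[0]*x[0] + b[1]*x[1] );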

Recall that the gradient of this function is again just $Ax-b$! Therefore, if we end up at a point $x^*$ where the gradient is zero, then we will have solved our original linear system. In other words,

$$ 0 = \nabla f(x^*) = Ax^* - b $$

implies that $Ax^* = b$. Just as a reminder, our basic gradient descent algorithm was:

  1. Pick an initial guess $x$.
  2. Compute the gradient descent direction $d \leftarrow -\nabla f(x)$ at the current point $x$.
  3. Take a step in the descent direction: $x \leftarrow x + \tau d$, where $\tau > 0$ is our step size.
  4. Repeat until the gradient is "sufficiently small" (for instance, until its largest element falls below some fixed fraction of the largest element of $b$; see the sketch below).
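For concreteness, the stopping test in step 4 might look something like this in C. This is only a sketch: n stands for the system size, EPSILON is a small tolerance of your choosing, and fabs comes from <math.h>. (Note that $d = -\nabla f(x)$, so the entries of $d$ and the gradient have the same magnitudes.)

/* stop once the largest entry of the gradient is a small
   fraction of the largest entry of b */
double dmax = 0., bmax = 0.;
for( i = 0; i < n; i++ )
{
   if( fabs(d[i]) > dmax ) dmax = fabs(d[i]);
   if( fabs(b[i]) > bmax ) bmax = fabs(b[i]);
}
if( dmax < EPSILON * bmax ) break; /* gradient is "sufficiently small" */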

Question 1

Our algorithm tries to find a solution by "skiing downhill" until we hit a point where the gradient is zero (i.e., until we can't go downhill anymore). What conditions must $A$ satisfy in order for our algorithm to successfully find a solution to the linear system?

Question 2

In our optimization lecture, we briefly discussed the idea of line search. In a nutshell: once we know which direction to go, we still need to know how far to travel. In other words, we need to determine the step size $\tau$.

In general, determining the ideal step size can be difficult. Fortunately, for our little problem we can compute the optimal step size exactly. In particular, it is the distance we need to travel along the current descent direction $d$ such that our objective $f$ becomes as small as possible along this ray. More explicitly, we can write the values of $f$ along the direction $d$ as

$$ s(\tau) := f(x + \tau d). $$

Your job is to minimize $s(\tau)$ with respect to $\tau$; this minimizing value of $\tau$ gives the optimal step size.

Hint #1: set the derivative of $s$ with respect to $\tau$ equal to zero, and solve!

Hint #2: There are two different ways to compute the derivative of $s$ with respect to $\tau$: one is to expand $s$ in terms of the original definition of $f$, then take the derivative with respect to $\tau$. The other is to apply the chain rule, which yields a somewhat shorter derivation. You decide!

Hint #3: The final expression for the optimal step size $\tau$ should involve only 5 letters.

Question 3

We now have all the ingredients we need to run our gradient descent algorithm (outlined above). Consider the 2x2 linear system

$$ \tfrac{1}{5} \left[\begin{array}{cc} 2 & 1 \\ 1 & 3 \end{array}\right] \underbrace{\left[\begin{array}{c} u \\ v \end{array}\right]}_{=:x} = \left[\begin{array}{c} 4 \\ 5 \end{array}\right], $$

where the entries of $A$ (with the factor of $\tfrac{1}{5}$ included) are $A_{11} = 2/5$, $A_{12} = A_{21} = 1/5$, and $A_{22} = 3/5$.

First, solve this system by hand so that you know what the true solution should be. (Also check your answer by plugging it back into the original equations!)

Next, using the skeleton code below (which is almost entirely written for you!), implement the gradient descent algorithm using the step size you found in Question 2. Run your algorithm starting with an initial guess of $u = v = 0$. Do you arrive at the correct answer? For your quiz submission, print out the value of $x$ for the first 10 iterations of the algorithm.

Here is the code skeleton in C, though you can implement this in any language you like. It should not require fancy data structures, or really much code at all! (We simply wanted you to be able to run this algorithm on your own machine; this is not meant as a "real" coding exercise.)

#include <stdio.h>

int main( int argc, char** argv )
{
   double A[2][2] = { { 2./5., 1./5. }, { 1./5., 3./5. } };
   double x[2] = { 0., 0. };
   double b[2] = { 4., 5. };
   double d[2];
   double tau;
   int i;

   for( i = 0; i < 10; i++ )
   {
      /* d = b - Ax */
      d[0] = b[0] - ( A[0][0]*x[0] + A[0][1]*x[1] );
      d[1] = b[1] - ( A[1][0]*x[0] + A[1][1]*x[1] );

      /* compute the step size tau */
      YOUR CODE HERE!

      /* take a gradient step */
      x[0] += tau * d[0];
      x[1] += tau * d[1];

      printf( "x_%d: %.7f %.7f\n", i, x[0], x[1] );
   }

   return 0;
}
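To run the skeleton, save it to a file and build it with any C compiler. For instance, assuming you named the file quiz.c:

gcc quiz.c -o quiz
./quiz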

For fun, you can try running the code with different initial guesses, step sizes, and matrices. When does it succeed? When does it fail? You can see pretty clearly how to generalize this code to NxN matrices: just replace the fixed arithmetic with loops from 1 to N. In some sense, it's pretty amazing that you can solve big linear systems with code this simple! (Albeit very slowly...) On the other hand, we do not want to see you using this code when you eventually write real applications; there are much better solutions! :-)
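In case it helps with that generalization, here is a sketch of how the residual computation $d = b - Ax$ from the skeleton might look for an NxN system (assuming A, x, b, and d are now declared with size N, and j is another loop index); the step-size computation and the gradient step generalize in exactly the same way:

/* d = b - Ax for an NxN system: the fixed 2x2 arithmetic
   becomes a pair of nested loops */
for( i = 0; i < N; i++ )
{
   d[i] = b[i];
   for( j = 0; j < N; j++ )
   {
      d[i] -= A[i][j] * x[j];
   }
}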