Interview with Good Will Hunting

In the 1997 movie Good Will Hunting, an MIT professor challenges his "applied theories"(?) class to tackle an "advanced Fourier system"(?!) he has written on a hallway chalkboard, promising fame and fortune and possibly Fields Medals or Nobel Prizes to anyone smart enough to crack it.

Later, the movie dares to give the mathematics some screen time. I had low expectations: no doubt they’d cut to gibberish splattered on a wall before cutting away out of embarrassment. Instead, the camera panned so slowly and lingered for so long that those of us in technical fields could see that this mainstream film had the courage to accurately depict university-level mathematics.

Moreover, the questions they chose can actually identify extraordinary talent, albeit in the literal sense of "extraordinary": outside the ordinary. I know this from personal experience. Over a decade or so, I asked the same questions many times when conducting technical interviews, where I hoped to find people with above-average ability.

There’s a catch. The phrasing they used seems geared towards checking that students understand the course material rather than detecting genius. The questions should be reworded if our goal is to assess problem-solving skills.

As I rarely interview these days, I’m finally comfortable partially revealing my secrets. I should mention that although I refer to it as "my" pet problem that I used in technical interviews, I originally found it on another website. The fact that it makes a cameo in Good Will Hunting is a sheer coincidence. And speaking of other websites, others wrote pages similar to this one long ago.

I walk the line

In the movie, we see a particular graph with 4 nodes and 5 edges. It’s a good choice, as this graph is large enough to require real work, yet small enough to comfortably calculate on:

The graph in my problem has a few more nodes and edges, and has directed edges instead of undirected edges; these details are unimportant, so we’ll stick with the graph that, as they say, is now a major motion picture.

By the way, I generated the above with my toy diagrams library and the following:

straightEdge aName bName p = maybe p (p <>) do
  (pa, pb) <- innerPoints aName bName p
  pure $ lineWith "" pa pb

curvedEdge rad aName bName aRad bRad p = maybe p (p <>) $ do
  (pa, pb) <- anglePoints aName bName aRad bRad p
  pure $ arcline pa pb rad

gen file dia = do
  putStrLn $ "cat > " ++ file ++ ".svg << EOF"
  putStrLn $ svg 20 dia
  putStrLn "EOF"

lbl = translate (-0.2 :+ -0.25) . text
object s = lbl s <> (unitCircle |> named s)

main = gen "goodwill" $ strutX 4 ||| object "4" === strutY (2*sqrt 3)
  === (object "1" # object "2" # object "3")
  |> straightEdge "1" "4"
  |> straightEdge "4" "2"
  |> straightEdge "1" "2"
  |> curvedEdge (-tau/6) "2" "3" (tau/8) (tau*3/8)
  |> curvedEdge (tau/6) "2" "3" (-tau/8) (-tau*3/8)
  where m # n = m ||| strutX 4 ||| n

I would ask the candidate:

How many ways can you walk from 1 to 3 in exactly 3 steps?
How many ways can you walk from 1 to 3 in exactly 30 steps?

I didn’t quite ask these questions because in my interviews, the graph was heavily disguised. In fact, it’s possible to pass my interview without noticing the underlying graph at all, which happened to me the first time I tried the problem. But my questions were effectively identical. Whether a candidate noticed or not, they were counting paths on a graph.

My first question serves to encourage the candidate to play with smaller examples, and to check we’re on the same page. Just about everybody should find exactly 2 ways to get from 1 to 3 in exactly 3 steps. Both paths visit the nodes 1-4-2-3 in that order. They differ in their choice of edge going from 2 to 3.

The second question is where the fun begins, and I urge you to try it before reading on.

This is tough to solve within the scant time allotted for an interview, so I was liberal with hints, which improved my interviews: helping a candidate also gave me an idea what it would be like to work with them. Mind you, really good candidates breezed through with little or no input from me, and occasionally, a candidate spotted the crux of the problem so quickly that I’d have difficulty filling the rest of the time.

A typical solution might resemble the following. Let \(f_i(k)\) be the number of ways of reaching \(i\) from 1 in exactly \(k\) steps. Then \( f_1(0) = 1 \) and \( f_i(0) = 0 \) for \(i = 2,3,4\).

The only way to reach 3 in exactly \(k\) steps is to first reach 2 in exactly \(k - 1\) steps then choose one of the two edges from 2 to 3:

\[ f_3(k) = 2 f_2(k - 1) \]

Similar reasoning leads to:

\[ \begin{align} &f_1(k) = f_2(k-1) + f_4(k-1) \\ &f_2(k) = f_1(k-1) + 2 f_3(k-1) + f_4(k-1) \\ &f_4(k) = f_1(k-1) + f_2(k-1) \end{align} \]

We now start from the base case \(f_i(0)\) (for \(i = 1,2,3,4\)) and repeatedly use the recurrences to find \(f_i(1), …, f_i(30)\):

f [x1, x2, x3, x4] = [x2 + x4, x1 + 2*x3 + x4, 2*x2, x1 + x2]
iterate f [1, 0, 0, 0] !! 30

[1112067033826,1847043777225,1401935442782,1112067033825]

The answer is the 3rd item in the resulting list: 1401935442782. Congratulations, you got the job! (I’m kidding of course. In reality, each interviewer would send their feedback to some hiring committee.)

Bad things come in threes

If I had been interviewing mathematicians instead of programmers, I might have also asked:

3. What is the generating function for the number of walks from 1 to 3 in a given number of steps?

For a mathematician, this is a simple but tedious exercise. Define \(g_i(z)\) to be the generating function for the number of walks from 1 to \(i\). Then the base case and recurrences become:

\[ \begin{align} &g_1 = 1 + z g_2 + z g_4 \\ &g_2 = z g_1 + 2 z g_3 + z g_4 \\ &g_3 = 2 z g_2 \\ &g_4 = z g_1 + z g_2 \end{align} \]

Substituting for \(g_3, g_4\) in the first two equations gives:

\[ \begin{align} &g_1 = 1 + z^2 g_1 + (z + z^2) g_2 \\ &g_2 = (z + z^2) g_1 + 5 z^2 g_2 \end{align} \]

Rearrange the first equation:

\[ g_1 = \frac{1 + (z + z^2) g_2}{1 - z^2} \]

and substitute in the second equation, and rearrange:

\[ g_2 = \frac{z}{1 - z - 6 z^2 + 4 z^3} \]

Then substituting in the equation for \(g_3\) yields the answer:

\[ g_3 = \frac{2 z^2}{1 - z - 6 z^2 + 4 z^3} \]

We are not the same

Why do I prefer my questions to their movie versions?

In the film, the first question asks for the adjacency matrix \(A\) of the graph. This implies the student has already been taught its definition, so the question merely tests if they can follow a recipe.

The second question asks for the matrix of the number of 3-step walks from \(i\) to \(j\) for all pairs of nodes \(i, j\) which strongly hints that matrix operations help with these sorts of questions, especially since the first question also asked for a matrix. And asking about paths of length 3 indicates a lack of fear that the student will attempt brute force because a standard method has already been explained in class.

The third and fourth questions appear out of order, because the latter is a special case of the former. Don’t mathematicians play with examples before attempting generalization? If the goal is to find a superstar who can figure out the generalization from scratch, why insult their intelligence with the fourth question? Who is smart enough to derive the general formula, yet too stupid to plug numbers into it?

The most sensible explanation for this unusual ordering of questions is that the teacher wants to check that a student can recall some formula they learned in class, and if so, the student sees this formula as more than a string of symbols to be memorized and can actually use it.

In contrast, instead of spoonfeeding students, our questions focus on the ends and not the means. A good sign is that children can understand the first two of our questions. Some of the most celebrated questions in mathematics are easy to understand, but difficult to answer.

We ask for 30-step walks to prevent brute force, and we keep quiet about matrices, because we’re happy with any good solution. But then one might complain: how can we claim that our questions are essentially the same, if the questions in the movie explicitly ask for various matrices, while we seem to be accepting answers involving less work?

It turns out that all this time, we have been performing matrix operations, albeit in an uncivilized manner. For starters, the adjacency matrix of a graph is a square array where the number in \(i\)th row and \(j\)th column holds the number of edges from \(i\) to \(j\). It’s what computer science students are likely to invent if asked for a data structure to hold a graph, as they are familiar with arrays (perhaps overly so). The adjacency matrix of our graph is:

\[ A = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 2 & 1 \\ 0 & 2 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{pmatrix} \]

These are precisely the coefficients in our definitions of \(f_i(k)\) above.

Next, we can describe how we counted the \(k\)-step walks from 1 to 3 as reading the 3rd entry of:

\[ (…(( \begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix} A)A)…)A \]

where there are \(k\) multiplications by \(A\). If we wanted, say, to count the walks from 4 to 2, we would instead start from \( \begin{pmatrix}0 & 0 & 0 & 1\end{pmatrix} \) and read off the 2nd entry.

If we want counts between all pairs of nodes in a single matrix, we could stick all 4 standard basis vectors together:

\[ I = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

and compute:

\[ (…((I A)A)…)A \]

which is just \(A^k\) because matrix multiplication is associative. The matrix \(A^3\) appears in the film.

How about the third question in the movie? Above, to find the desired generating function, we solved simultaneous equations that can be dressed up in matrices:

\[ \begin{pmatrix} g_1 & g_2 & g_3 & g_4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} g_1 & g_2 & g_3 & g_4 \end{pmatrix} zA \]

which rearranges to:

\[ \begin{pmatrix} g_1 & g_2 & g_3 & g_4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix} (I - zA)^{-1} \]

If we wanted the generating functions \(h_1, h_2, h_3, h_4\) for paths starting from 2, we’d solve:

\[ \begin{pmatrix} h_1 & h_2 & h_3 & h_4 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 \end{pmatrix} (I - zA)^{-1} \]

More generally, let \(G\) be the matrix whose entry at the \(i\)th row and \(j\)th column is the generating function for the paths from \(i\) to \(j\). Then:

\[ G = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} (I - zA)^{-1} = (I - zA)^{-1} \]

so in particular:

\[ G_{ij} = [(I - zA)^{-1}]_{ij} \]

where the subscript \(ij\) means the entry at row \(i\) and column \(j\). This is better than the answer in the movie, which looks like a version of:

\[ \frac{{\rm adj} (I - zA)_{ij}} {\det (I - zA)} \]

(Their answer appears to assume \(A\) is symmetric, which means it only applies to undirected graphs.)

Sure, we can express inverses in terms of adjugates and determinants, but who cares? Certainly not computer scientists, who would scoff at the inefficient algorithm it suggests.

Indeed, to find a particular \(G_{ij}\), based on a brief web search, it seems best to use Gaussian elimination, as we did above. In general, care must be taken when coding this; see the Bareiss algorithm. Just leave the answer as the inverse of a certain matrix, and we’ll do the rest.

By the way, a classier way to derive the generating functions is to start from the fact that \(A^k\) is the matrix of counts of paths of length \(k\), yielding the generating function:

\[ G = I + Az + A^2 z^2 + … \]

By the matrix edition of the identity \(1 + x + x^2 + … = (1 - x)^{-1}\):

\[ G = (I - zA)^{-1} \]

The result has materialized out of mechanically applying laws of linear algebra, rather than thinking about recurrences and rewriting them as generating functions.

The unreasonable effectiveness of matrices

No matter how we conquer the above problem the first time, we should all ultimately learn to view the solution as an application of linear algebra to reinforce an important life lesson: if we spot linearity in a problem, we can avoid thinking hard and mechanically calculate the answer.

I once found the presence of matrices in graph theory scandalous, because in my mind, a matrix represents a linear transformation of vector spaces, and a proper mathematician only grudgingly touches a matrix because doing so forces them commit to a particular basis in each vector space. This is distasteful because we seek truths that are independent of the choice of basis.

Yet here we are, counting paths on a graph, far away from vector spaces and linear transformations, and we find matrices perfectly suit our needs.

I blame my shock on my upbringing. My university taught matrices in the context of vector spaces. And earlier as a teenager, I came across a computer graphics book that detailed how 4x4 matrices can scale, translate, rotate, and project 3D vectors for display on a 2D monitor.

However, historically, matrices predate vector spaces by a millennium or so. In The Nine Chapters on the Mathematical Art (九章算術), we find the 8th chapter describes how to represent linear equations with a matrix and how to perform Gaussian elimination to solve them. Traditionally, Chinese is written vertically, top to bottom, so as one might expect, this ancient text places the equations in columns rather than rows. But other than this cosmetic difference, namely column operations instead of row operations, it’s Gaussian elimination, seventeen centuries before Gauss.

In light of this, perhaps the effectiveness of matrices is not unreasonable. Each \(f_i(n)\) is a linear combination of the \(f_i(n - 1)\), so naturally a matrix can represent this information. On the other hand, the tough part is matrix multiplication, an operation unmentioned in The Nine Chapters, and whose associativity is hard to see from the viewpoint of linear equations.

I’m also reminded of Minkowski’s geometry of numbers, which transforms problems like diophantine approximation into questions about lattices embedded in a real vector space.

Also mind-bending are closed semirings. Semirings are like rings, but may lack additive and multiplicative inverses, and a closed semiring possesses a closure operation, which is denoted with an asterisk and satisfies the axiom:

\[ a^* = 1 + a \cdot a^* = 1 + a^* \cdot a \]

In closed semirings with well-defined infinite series:

\[ a^* = 1 + a + a^2 + … \]

which is a useful expression in discrete mathematics. For example, see our matrix \(G\) above.

It turns out the \(n \times n\) matrices over a closed semiring is itself a closed semiring. Same for the formal power series over a closed semiring. We can define closed semirings so that the closure of analogues of the adjacency matrix result in a matrix that describes reachability, or shortest paths, or longest paths, and so on. We can define unusual notions of linearity so that taking closures reveals a regular expression corresponding to a given finite-state machine, or solves a knapsack problem, or computes the Fibonacci numbers, and so on.

Now you have two problems

In Good Will Hunting, the professor rewards the solver with another two problems on the chalkboard, which we discuss elsewhere. Presumably, he drew upon vast knowledge of mathematics to select a tough-but-fair challenge.

If the professor were a computer scientist, he might have chosen a lazier way to another way to pick the new problems, because there is a simple, automatic method for generating new problems from previously solved mathematical problems. It commonly appears in technical interviews. We just ask for the most efficient program that produces the solution.

Already, we’ve hinted that it’s tricky to implement efficient Gaussian elimination. More generally, many bright minds have worked hard on algorithms for linear algebra. For example, see some of the entries in the top 10 algorithms of the 20th century according to Computing in Science & Engineering.

The movie’s questions don’t need such heavy artillery, but it’s already difficult to determine the fastest algorithm for counting paths on a graph. Even the most fundamental operations rapidly lead to complications.

How fast can we multiply two integers? Obviously it’s at least linear time because it takes that long to write down the digits. But that’s the extent of our knowledge for the lower bound! The best known upper bound is \(O(N \log N)\) (Harvey and Hoeven 2021), though this is a strictly theoretical result; in practice, integer multiplication is performed with a variety of clever algorithms with slightly worse complexity.

How fast can we multiply two matrices? A single matrix multiply takes \(O(N^\omega)\) field multiplications for some \(2 \le \omega \le 3\), where the lower bound is clear from the number of entries we must write down, while the upper bound is implied by the textbook definition. Again, nobody knows how to improve the lower bound, while over the years, many researchers have heroically reduced the exponent from roughly \(2.8074\) (Strassen 1969) to \(2.371552\) (William, Xu, Xu, and Zhou 2024).

How fast can we exponentiate by \(N\)? Surely this is easy to answer, as repeated squaring takes \(O(\log N)\) multiplications, which is optimal because it takes that long to read the number \(N\). But then we can drop the big O and improve the constant factor, perhaps with sliding windows, or figure out the minimum number of multiplications, which leads to addition chains, upon which we discover there is no known efficient algorithm to find the shortest addition chain.

In other words, asking for the fastest algorithm for computing powers of a matrix immediately leads to difficult unsolved problems!

Ben Lynn blynn@cs.stanford.edu 💡