Interview with Good Will Hunting

In the 1997 movie Good Will Hunting, an MIT professor challenges his "applied theories"(?) class to tackle an "advanced Fourier system"(?!) he has written on a hallway chalkboard, promising fame and fortune and possibly Fields Medals or Nobel Prizes to anyone smart enough to crack it.

Later, the movie dares to give the mathematics some screen time. I had low expectations. No doubt they’d cut to gibberish splattered on a wall before cutting away out of embarrassment. Instead, the camera panned so slowly and lingered for so long that those of us in technical fields could see that this mainstream film had the courage to accurately depict university-level mathematics.

Moreover, the questions they chose can actually identify extraordinary talent, albeit in the literal sense of "extraordinary": outside the ordinary. I know this from personal experience. Over a decade or so, I asked the same questions many times when conducting technical interviews, where I hoped to find people with above-average ability.

There’s a catch. The phrasing they used seems geared towards checking that students understand the course material rather than detecting genius. The questions should be reworded if our goal is to assess problem-solving skills.

As I rarely interview these days, I’m finally comfortable partially revealing my secrets. I should mention that although I refer to it as "my" pet problem that I used in technical interviews, I originally found it on another website. The fact that it makes a cameo in Good Will Hunting is a sheer coincidence. And speaking of other websites, others wrote pages similar to this one long ago.

I walk the line

In the movie, we see a particular graph with 4 nodes and 5 edges. It’s a good choice, as this graph is large enough to require real work, yet small enough to comfortably calculate on:

4123

The graph in my problem has a few more nodes and edges, and has directed edges instead of undirected edges; these details are unimportant, so we’ll stick with the graph that, as they say, is now a major motion picture.

By the way, I generated the above with my toy diagrams library and the following:

straightEdge aName bName p = maybe p (p <>) do
  (pa, pb) <- innerPoints aName bName p
  pure $ lineWith "" pa pb

curvedEdge rad aName bName aRad bRad p = maybe p (p <>) $ do
  (pa, pb) <- anglePoints aName bName aRad bRad p
  pure $ arcline pa pb rad

gen file dia = do
  putStrLn $ "cat > " ++ file ++ ".svg << EOF"
  putStrLn $ svg 20 dia
  putStrLn "EOF"

lbl = translate (-0.2 :+ -0.25) . text
object s = lbl s <> (unitCircle |> named s)

main = gen "goodwill" $ strutX 4 ||| object "4" === strutY (2*sqrt 3)
  === (object "1" # object "2" # object "3")
  |> straightEdge "1" "4"
  |> straightEdge "4" "2"
  |> straightEdge "1" "2"
  |> curvedEdge (-tau/6) "2" "3" (tau/8) (tau*3/8)
  |> curvedEdge (tau/6) "2" "3" (-tau/8) (-tau*3/8)
  where m # n = m ||| strutX 4 ||| n

I would ask the candidate:

  1. How many ways can you walk from 1 to 3 in exactly 3 steps?

  2. How many ways can you walk from 1 to 3 in exactly 30 steps?

I didn’t quite ask these questions because in my interviews, the graph was heavily disguised. In fact, it’s possible to pass my interview without noticing the underlying graph at all, which happened to me the first time I tried the problem. But my questions were effectively identical. Whether a candidate noticed or not, they were counting paths on a graph.

My first question serves to encourage the candidate to play with smaller examples, and to check we’re on the same page. Just about everybody should find exactly 2 ways to get from 1 to 3 in exactly 3 steps. Both paths visit the nodes 1-4-2-3 in that order, and there are two edges between 2 and 3.

The second question is where the fun begins, and I urge you to try it before reading on.

This is tough to solve within the scant time allotted, so I was liberal with hints, which improved my interviews: helping a candidate also gave me an idea what it would be like to work with them. Good candidates breezed through with little or no input from me. And occasionally, a candidate spotted the crux of the problem so quickly that I’d have difficulty filling the rest of the time.

A typical solution might resemble the following. Let \(f_i(k)\) be the number of ways of reaching \(i\) from 1 in exactly \(k\) steps. Then \( f_1(0) = 1 \) and \( f_i(0) = 0 \) for \(i = 2,3,4\).

The only way to reach 3 in exactly \(k\) steps is to first reach 2 in exactly \(k - 1\) steps then choose one of the two edges from 2 to 3:

\[ f_3(k) = 2 f_2(k - 1) \]

Similar reasoning leads to:

\[ \begin{align} &f_1(k) = f_2(k-1) + f_4(k-1) \\ &f_2(k) = f_1(k-1) + 2 f_3(k-1) + f_4(k-1) \\ &f_4(k) = f_1(k-1) + f_2(k-1) \end{align} \]

We now start from the base case \(f_i(0)\) (for \(i = 1,2,3,4\)) and repeatedly use the recurrences to find \(f_i(1), …​, f_i(30)\):

f [x1, x2, x3, x4] = [x2 + x4, x1 + 2*x3 + x4, 2*x2, x1 + x2]
iterate f [1, 0, 0, 0] !! 30

[1112067033826,1847043777225,1401935442782,1112067033825]

The answer is the 3rd item in the resulting list: 1401935442782. Congratulations, you got the job! (I’m kidding of course. In reality, each interviewer would send their feedback to some hiring committee.)

Bad things come in threes

If I had been interviewing mathematicians instead of programmers, I might have also asked:

3. What is the generating function for the number of walks from 1 to 3 in a given number of steps?

For a mathematician, this is a simple but tedious exercise. Define \(g_i(z)\) to be the generating function for the number of walks from 1 to \(i\). Then the base case and recurrences become:

\[ \begin{align} &g_1 = 1 + z g_2 + z g_4 \\ &g_2 = z g_1 + 2 z g_3 + z g_4 \\ &g_3 = 2 z g_2 \\ &g_4 = z g_1 + z g_2 \end{align} \]

Substituting for \(g_3, g_4\) in the first two equations gives:

\[ \begin{align} &g_1 = 1 + z^2 g_1 + (z + z^2) g_2 \\ &g_2 = (z + z^2) g_1 + 5 z^2 g_2 \end{align} \]

Rearrange the first equation:

\[ g_1 = \frac{1 + (z + z^2) g_2}{1 - z^2} \]

and substitute in the second equation, and rearrange:

\[ g_2 = \frac{z}{1 - z - 6 z^2 + 4 z^3} \]

Then substituting in the equation for \(g_3\) yields the answer:

\[ g_3 = \frac{2 z^2}{1 - z - 6 z^2 + 4 z^3} \]

We are not the same?

Why do I prefer my questions to their movie versions?

In the movie, the first question asks for the adjacency matrix \(A\) of the graph. This implies the student has already been taught its definition, so the question merely tests if they can follow a recipe.

The second question asks for the matrix of the number of 3-step walks from \(i\) to \(j\) for all pairs of nodes \(i, j\) which strongly hints that matrix operations help with these sorts of questions, especially since the first question also asked for a matrix. And asking about paths of length 3 indicates that the professor has no fear the student will attempt brute force because a standard method has already been explained in class.

The third and fourth questions appear out of order, because the latter is a special case of the former. Don’t mathematicians play with examples before attempting generalization? If the goal is to find a superstar who can figure out the generalization from scratch, why insult their intelligence with the fourth question? Who is smart enough to derive the general formula, yet too stupid to plug numbers into it?

The most sensible explanation I have for this unusual ordering of questions is that the teacher wants to check that a student can recall some formula they learned in class, and if so, the student sees this formula as more than a string of symbols to be memorized and can actually use it.

In contrast, instead of spoonfeeding students, we focus on the ends and not the means. And even children can understand our first two questions. Some of the most celebrated questions in mathematics are easy to understand, but difficult to answer, or unsolved. We ask for 30-step walks to prevent brute force, and we keep quiet about matrices, because we’re happy with any good solution.

At this point, one might complain about an inconsistency. We’ve been claiming that our questions are essentially the same as those in the movie, but how can this be if the questions in the movie explicitly ask for various matrices?

It turns out that all this time, we have been performing matrix operations, though in an uncivilized manner. For starters, the adjacency matrix of a graph is a square array where the number in \(i\)th row and \(j\)th column holds the number of edges from \(i\) to \(j\). It’s what computer science students are likely to invent if asked for a data structure to hold a graph, as they are familiar with arrays (perhaps overly so). The adjacency matrix of our graph is:

\[ A = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 2 & 1 \\ 0 & 2 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{pmatrix} \]

These are precisely the coefficients in our definitions of \(f_i(k)\) above.

Next, we can describe how we counted the \(k\)-step walks from 1 to 3 as reading the 3rd entry of:

\[ (…​(( \begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix} A)A)…​)A \]

where there are \(k\) multiplications by \(A\). If we wanted, say, to count the walks from 4 to 2, we would instead start from \( \begin{pmatrix}0 & 0 & 0 & 1\end{pmatrix} \) and read off the 2nd entry.

If we want counts between all pairs of nodes in a single matrix, we could stick all 4 standard basis vectors together:

\[ I = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

and compute:

\[ (…​((I A)A)…​)A \]

which is just \(A^k\) because matrix multiplication is associative. The matrix \(A^3\) appears in the film.

Naturally, with computers, repeatedly multiplying by \(A\) is fine for \(k = 30\). For large enough \(k\), repeated doubling to compute \(A^k\) probably helps a lot. I’m uncertain of the exact benefit because the matrix entries themselves grow exponentially.

How about the third question in the movie? Above, to find the desired generating function, we solved simultaneous equations that can be dressed up in matrices:

\[ \begin{pmatrix} g_1 & g_2 & g_3 & g_4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} g_1 & g_2 & g_3 & g_4 \end{pmatrix} zA \]

which rearranges to:

\[ \begin{pmatrix} g_1 & g_2 & g_3 & g_4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix} (I - zA)^{-1} \]

If we wanted the generating functions \(h_1, h_2, h_3, h_4\) for paths starting from 2, we’d solve:

\[ \begin{pmatrix} h_1 & h_2 & h_3 & h_4 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 \end{pmatrix} (I - zA)^{-1} \]

Let \(G\) be the matrix with has the generating function for the paths from \(i\) to \(j\) in the \(i\)th row and \(j\)th column. Then:

\[ G = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} (I - zA)^{-1} = (I - zA)^{-1} \]

so in particular:

\[ G_{ij} = [(I - zA)^{-1}]_{ij} \]

where the subscript \(ij\) means the entry at row \(i\) and column \(j\). This is better than the answer in the movie, which looks like a version of:

\[ \frac{{\rm adj} (I - zA)_{ij}} {\det (I - zA)} \]

(Their answer appears to assume \(A\) is symmetric, which means it only applies to undirected graphs.)

Sure, we can express inverses in terms of adjugates and determinants, but who cares? Certainly not computer scientists, who would scoff at the inefficient algorithm it suggests.

Indeed, to find a particular \(G_{ij}\), based on a brief web search, it seems best to use Gaussian elimination, as we did above. In general, care must be taken when coding this; see the Bareiss algorithm. Just leave the answer as the inverse of a certain matrix, and we’ll do the rest.

By the way, a classier way to derive the generating functions is to start from the fact that \(A^k\) is the matrix of counts of paths of length \(k\), yielding the generating function:

\[ G = I + Az + A^2 z^2 + …​ \]

By the matrix edition of the identity \(1 + x + x^2 + …​ = (1 - x)^{-1}\):

\[ G = (I - zA)^{-1} \]

The result has materialized out of mechanically applying laws of linear algebra, rather than thinking about recurrences and rewriting them as generating functions.

The unreasonable effectiveness of matrices

No matter how we conquer the above problem the first time, we should all ultimately learn how to view the solution as an application of linear algebra to reinforce an important life lesson: if we spot linearity in a problem, we can avoid thinking hard and mechanically calculate the answers.

I first found the presence of matrices in graph theory scandalous, because in my mind, a matrix represents a linear transformation of vector spaces. A proper mathematician only grudgingly touches a matrix because doing so forces them commit to a particular basis in each vector space. This is distasteful because we seek truths that are independent of the choice of basis.

Yet here we are, counting paths on a graph, far away from vector spaces and linear transformations, and we find matrices perfectly suit our needs.

I blame my upbringing for my shock. My university taught matrices in the context of vector spaces. And earlier as a teenager, I came across a computer graphics book that detailed how matrices can scale, translate, rotate, and project 3D vectors.

However, historically matrices predate vector spaces by a millennium or so. In The Nine Chapters on the Mathematical Art, we find the 8th chapter describes how to represent linear equations with a matrix and how to perform Gaussian elimination to solve them. Traditionally, Chinese is written vertically, top to bottom, so as one might expect, this ancient text places the equations in columns rather than rows, as we do. But other than this cosmetic difference (column operations instead of row operations), it’s Gaussian elimination, over seventeen centuries before Gauss.

In light of this, perhaps the effectiveness of matrices is not unreasonable. The value of each \(f_i(n)\) is a linear combination of the \(f_i(n - 1)\), so naturally a matrix can represent this information. Admittedly, the tough part is matrix multiplication, an operation unmentioned in The Nine Chapters, and whose associativity is hard to see from the viewpoint of linear equations.

More mind-bending are closed semirings. Semirings are like rings, but may lack additive and multiplicative inverses, and a closed semiring possesses a closure operation, which is denoted with an asterisk and satisfies the axiom:

\[ a^* = 1 + a \cdot a^* = 1 + a^* \cdot a \]

In closed semirings with well-defined infinite series:

\[ a^* = 1 + a + a^2 + …​ \]

which is an expression that is useful in discrete mathematics. Above, our matrix \(G\) takes this form.

It turns out the \(n \times n\) matrices over a closed semiring is itself a closed semiring. Same for the formal power series over a closed semiring. We can define closed semirings so that the closure of analogues of the adjacency matrix result in a matrix that describes reachability, or shortest paths, or longest paths, and so on. We can define unusual notions of linearity so that taking closures reveals a regular expression corresponding to a given finite-state machine, or solves a knapsack problem, or computes the Fibonacci numbers, and so on.

Now you have two problems

In Good Will Hunting, the professor rewards the solver with another two problems on the chalkboard that supposedly took him (and his colleagues) more than two years to prove. These aren’t directly related to my interview question, but let’s walk through them anyway.

The second problem, drawing all homeomorphically irreducible trees with 10 nodes, is unworthy of a savant. The question is impressive mostly because of the jargon. The hardest part is learning the definitions.

A tree is a connected graph with no cycles, that is, given any two nodes, there is a unique path between them. The degree of a node is the number of edges connected to it. A node is external if it has degree 1, and is internal otherwise. A graph is homeomorphically irreducible or just irreducible if no node has degree 2. This is the same as saying all internal nodes have degree 3 or more.

We also need to know exactly what it means for one graph to differ from another. I think the answer occurs naturally to most people, though is hard to pin down in words. We’ll say that two graphs are considered equivalent (isomorphic) if they have the same adjacency matrix, possibly after reordering nodes.

These stringent conditions mean that the 10-node case is readily dispatched.

We first notice that if we remove all external nodes of a tree, then the remaining internal nodes must also form a tree.

Next, suppose there are 4 internal nodes. Doodling reveals that there are two cases, and in both we find 6 external nodes are barely enough. Thus we have drawn 2 of the desired trees, and have also discovered there can be no more than 4 internal nodes.

We continue bashing through cases. If there is 1 internal node, then we attach 9 external nodes to it, resulting in a unique tree. If there are 2 internal nodes, they must be connected to each other, and we find 3 different ways to attach external nodes to reach 10 nodes total. If there are 3 internal nodes, they must be connected in series, and we find 4 different ways to attach the external nodes: the degrees of the 3 nodes are 3-3-5, 3-5-3, 3-4-4, and 4-3-4.

Done! It took nowhere near two years. (Good thing they didn’t ask for a general formula!)

This question looks great on screen. We see a collection of intricate drawings of trees, and are reminded that mathematics is more than numbers and algebraic expressions.

In the movie, Will Hunting only draws 8 trees before he is interrupted and chased away. The board is missing the case with 4 internal nodes in series and what I called the 3-3-5 case.

Counting labeled trees

The other problem asks for the number of \(n\)-node labeled trees, which means this time, the nodes are labeled 1 to \(n\) and must appear in the adjacency matrix in order when deciding equality.

Counting the first 5 cases is easy, and provides enough data for a good educated guess at the general formula. I urge you to try before reading on.

If \(n = 1, 2\) there is only one possible unlabeled tree, and only one way to label it uniquely. If \(n = 3\), the only possible unlabeled tree is 3 nodes connected in series, but there are 3 different labelings, each determined by the choice of the label for the node in the middle.

If \(n = 4\) then there are two shapes: either all 4 nodes are connected in series, or there is one node of degree 3 that connects to the others, all of which are external. In the latter case, there are 4 labelings, where again each is determined by the choice of the label of the central node. In the former case, there are \(4! = 24\) permutations we can arrange 4 labels in a row, but this counts each labeling twice: rotating the line of nodes by 180 degrees yields the reverse of their permutation. Thus there are a total of \(4 + 24/2 = 16\) trees.

If \(n = 5\), then one of the following occurs:

  • we have 5 nodes in series, yielding \(5! / 2 = 60\) unique labelings.

  • we have one node of degree 4 and the other nodes are external, yielding 5 unique labelings.

  • we have one node of degree 3, one of degree 2, and the other nodes are external, yielding \(5\times 4\times 3 = 60\) unique labelings which are determined by the choices of labels of the internal nodes and the external connected to the degree 2 node.

This gives a total of 125 labeled trees. So we now have the sequence \(1, 1, 3, 16 = 4^2, 125 = 5^3\), which leads to the guess:

\[ n^{n-2} \]

This guess turns out to be correct, and is known as Cayley’s formula, who proved it in the 19th century (see Cayley, A theorem on trees). It appears on the board in the movie without proof. However, since Will Hunting was wicked smart, he was likely able to supply it.

I doubt Cayley spent two years proving his formula, but this question is the hardest problem in the movie. It’s too difficult for an interview, and even if a candidate could supply a proof, I would assume it’s because they’d come across it before.

But it is true that many famous mathematicians have answered this question. Aigner and Ziegler, Proofs from THE BOOK, walk through four distinct proofs of Cayley’s formula, each of which is ingenious and elegant. Because of my background in cryptography, my favourite is the proof due to Joyal:

On a labeled tree with \(n\) nodes, choose any node to be the "start", and any node to be the "end"; these may be the same node. Since it’s a tree, there is a unique path from the start node to the end node, which we’ll call the "spine".

Let \(a b …​ z\) be the labels of the nodes of the spine sorted in ascending order. Then the path from the start node to the end node describes a permutation of \(a b …​ z\) which we denote \(\pi\).

Define a function \(f\) from the set of naturals \([1..n]\) to itself as follows. If \(k \in [1..n] \) is on the spine, then \(f(k) = \pi(k)\). Otherwise, there is a unique path from \(k\) to the start node (in fact, we can use any node on the spine). Start walking along this path, and define \(f(k)\) to be the first node you reach. Different labelings and choices of start and end nodes lead to different functions.

There are \(n\) choices for the start node, and same for the end node. As there are \(n^n\) different functions from \([1..n]\) to itself, we have shown there are \(n^{n-2}\) different labeled trees…​provided we fill in a gap.

To prove we have constructed a bijection, we need the flip side: we must show that any function \(f\) from \([1..n]\) to itself corresponds to a labeled tree. But consider repeatedly applying \(f\) on some \(k \in [1..n]\). Either \(k\) is in a cycle, or we’ll eventually reach a value that is in a cycle.

Thus we can reverse the above process. Together, the cycles describe a permutation \(\pi\) of some of the elements of \([1..n]\), which we use to build a spine. As for the other elements, applying \(f\) describes how to connect them via subtrees rooted to the spine.

In cryptography, Pollard’s rho algorithm repeatedly applies a function from a finite set to itself, eventually causing the values to cycle, which is exploited to factor an integer or find a discrete logarithm.


Ben Lynn blynn@cs.stanford.edu 💡