The Shortest Vector Problem

Pick a polynomial with rational coefficients with a real root. Enter the root and the degree of the polynomial, and click the button. Our program will try to guess your polynomial!

(The answer for the default values should be \(x^3 + 2x^2 + 3x + 4\).)

How does this work? See László Lovász, An Algorithmic Theory of Numbers, Graphs, and Convexity. We use the LLL algorithm, which finds an approximate solution to the shortest vector problem.

Shortest Vector Problem: Given a basis \( \newcommand{\vv}[1]{\mathbf{#1}} \vv{b}_1,…,\vv{b}_n\) of \(\mathbb{R}^n\), find a non-zero element of the additive subgroup \(L\) generated by the basis (known as a lattice) with the smallest norm. That is, if we let \(\vv{B}\) denote the matrix of the basis vectors, then:

\[ L = \{ \vv{B} \vv{a} | \vv{a} \in \mathbb{Z}^n \} \]

and we wish to find a nonzero element \(\vv{v} \in L\) such that \( \lVert \vv{v}\rVert \) is minimal.

By norm, we mean the Euclidean norm, that is,

\[ \lVert \vv{v}\rVert = \sqrt{v_1^2 + … + v_n^2} \]

where the \(v_i\) are the coordinates of \(\vv{v}\) with respect to some orthonormal basis.

This problem is hard (NP-hard; Lovász suspects it is NP-complete). Indeed, how do we even prove a a shortest vector must exist in the first place? Perhaps we are doomed to only find vectors whose norms are arbitrarily close to, but strictly greater than some limit?

Easy does it

When a problem appears unsolvable in its full generality, one should temporarily restrict it; perhaps it can then be conquered by a gradual advance.

— Gottlob Frege
Begriffsschrift

Consider orthogonal lattices. Let \(\vv{b}_1,…,\vv{b}_n\) be an orthogonal basis sorted in order of increasing norm. Then \(\vv{b}_1\) is a shortest vector.

More rigorously, for any integer \(k \in [1..n]\) and any \(a_1,…,a_n \in \mathbb{Z}^n\), by orthogonality:

\[ \lVert \sum_i a_i \vv{b}_i \rVert = \sum_i \lVert a_i \vv{b}_i \rVert \ge \lVert a_k \vv{b}_k \rVert \]

And for any nonzero integer \(a\):

\[ \lVert \vv{b}_1 \rVert \le \lVert \vv{b}_k \rVert \le \lVert a \vv{b}_k \rVert \]

Thus \(\vv{b}_1\) is a shortest vector.

Two’s company

There is little to be said in zero or one dimensions. Two dimensions is a big step up in difficulty.

Inspired from above, we strive to become as orthogonal as possible: we find some integer \(r\) so that the angle between \(\vv{b}_1\) and:

\[ \vv{b}'_2 = \vv{b}_2 + r \vv{b}_1 \]

is close as possible to a right angle. Observe the basis \(\vv{b}_1, \vv{b}'_2\) generates the same lattice as \(\vv{b}_1, \vv{b}_2\). Hopefully, we can then extend our reasoning above enough to find a shortest vector.

It’s easiest to explain if we first rotate and scale so that one basis vector is \(\vv{b}_1 = (1\enspace 0)\). This transforms our task to adding some integer to the first coordinate of \(\vv{b}_2\) so it is close as possible to zero. We simply subtract an integer closest to the first coordinate of \(\vv{b}_2\). Let \(\vv{b}'_2 = (x\enspace y)\) be the result. Then we have:

\[-\frac{1}{2} \le x \le \frac{1}{2}\]

We could insist on strictness in one of the inequalities, but this has no effect on our arguments. For any integer \(n\):

\[ \lVert n \vv{b}_1 + \vv{b}'_2 \rVert^2 = (n + x)^2 + y^2 \ge x^2 + y^2 = \lVert \vv{b}'_2 \rVert^2 \]

thus \(\vv{b}'_2\) is the shortest vector of this form.

Suppose:

\[ \lVert \vv{b}_1 \rVert \le \Vert \vv{b}'_2 \rVert \]

that is:

\[ 1 \le \sqrt{x^2 + y^2} \]

Since \(x^2 \le \frac{1}{4}\) we must have \(y^2 \ge \frac{3}{4} \).

Thus for any integers \(r, s\) such that \(|s| \ge 2\):

\[ \lVert r \vv{b}_1 + s \vv{b}'_2 \rVert^2 \ge s^2 y^2 \ge 3 \gt 1 = \lVert \vv{b}_1 \rVert^2 \]

Therefore, if \( \lVert \vv{b}_1 \rVert \le \Vert \vv{b}'_2 \rVert \), then \(\vv{b}_1\) is a shortest vector, as we have just seen adding any nonzero integer multiple of \(\vv{b}'_2\) results in a vector that is at least as long. Undoing the scaling and rotation yields our original \(\vv{b}_1\).

But what if \( \lVert \vv{b}_1 \rVert \gt \Vert \vv{b}'_2 \rVert \)? In this case, we rename \( \vv{b}'_2, \vv{b}_1 \) to \( \vv{b}_1, \vv{b}_2 \) respectively, undo the scaling and rotation, and go back to the first step.

This shortest vector algorithm for two dimensions resembles Euclid’s algorithm, and it is known as Lagrange’s algorithm, Gauss’s algorithm, or the Lagrange-Gauss algorithm. It seems precedence should go to Lagrange, who wrote about it in 1773; he viewed it as a problem about quadratic forms.

Lagrange’s algorithm is a few lines.

nearestZ (a :% b) = if 2*r > b then q + 1 else q where
  (q, r) = divMod a b

dot = (sum .) . zipWith (*)
quadrance v = dot v v

lagrangeReduce [u, v] = [u, (zipWith (-) v $ (m*) <$> u)] where
  m = nearestZ $ dot v u % dot u u

lagrange b
  | quadrance u <= quadrance v = u
  | otherwise = lagrange [v, u]
  where
  [u, v] = lagrangeReduce b

lagrange [[123,456],[123,455]]
lagrange [[123,456],[60,240]]

However, the complexity of this algorithm is not obvious, and indeed, how do we know it even terminates? It turns out to be a polynomial-time algorithm:

a proof by Michael Soltys.
Phong and Stehlé walk through two proofs; they also generalize Lagrange’s algorithm to efficiently solve the cases \(n = 3, 4\).

Close enough

As the number of dimensions \(n\) increases, our troubles grow exponentially, though it could be worse. We should be grateful that exponential-time algorithms exist.

Nevertheless, we can still obtain useful results in polynomial time. The LLL algorithm, a sort of sloppy Euclid’s algorithm for more than two things, given a basis of \(\mathbb{Q}^n\), returns a shortest vector up to a factor of \(2^{(n-1)/2}\). Moreover, for a fixed dimension, it implies a polynomial-time algorithm for finding a shortest vector.

Again, we strive for orthogonality. We can convert any basis \(\vv{b}_1,…,\vv{b}_n\) of \(\mathbb{R}^n\) into an orthogonal basis \(\vv{b}^*_1,...,\vv{b}^*_n\) inductively. Start with:

\[\vv{b}^*_1 = \vv{b}_1\]

then iteratively set \(\vv{b}^*_k\) to be the rejection of \(\vv{b}_k\) on the subspace generated by the basis vectors found so far, that is, the part of \(\vv{b}_k\) perpendicular to each of \(\vv{b}^*_1,...,\vv{b}^*_{k-1}\) which we can compute by subtracting away projections:

\[ \vv{b}^*_k = \vv{b}_k - \sum_{i=1}^{k-1} \frac{\vv{b}_k \cdot \vv{b}^*_i}{\vv{b}^*_i \cdot \vv{b}^*_i } \vv{b}^*_i \]

The result is known as the Gram-Schmidt orthogonalization of the original basis. Although the \(\vv{b}^*_i\) generate a distinct lattice in general, the LLL algorithm uses the Gram-Schmidt orthogonalization to find a lower bound of the shortest vector, and also to define an \(n\)-dimensional analogue of a basis that is orthogonal as possible.

gramSchmidt = go [] . map (map (% 1)) where
  go acc = \case
    [] -> reverse acc
    h:t -> go (foldr (zipWith (+)) h (muMul h <$> acc):acc) t
  muMul w v = (mu w v *) <$> v

mu w v = -(dot w v / dot v v)

Lemma: Let \(\vv{v}\) be a nonzero vector in the lattice generated by \(\vv{b}_1,…,\vv{b}_n\). Then:

\[ \lVert \vv{v} \rVert \ge \min\{\lVert \vv{b}^*_1 \rVert,…,\lVert \vv{b}^*_n \rVert \} \]

In particular, this bound applies to the shortest vector.

Proof: Let:

\[ \vv{v} = \sum_{i=1}^k a_i \vv{b}_i = \sum_{i=1}^{k-1} a_i \vv{b}_i + a_k \vv{b}_k \]

where \(a_k \ne 0\).

With respect to the Gram-Schmidt orthogonalization of the basis:

\[ \vv{v} = \sum_{i=1}^{k-1} a^*_i \vv{b}^*_i + a^*_k \vv{b}^*_k \]

Taking the inner product with \(\vv{b}^*_k\) in both equations implies:

\[ a_k \vv{b}_k \cdot \vv{b}^*_k = a^*_k \vv{b}^*_k \cdot \vv{b}^*_k \]

since by construction \(\vv{b}^*_k\) is perpendicular to the first \(k - 1\) vectors of either basis. Also by construction \( \vv{b}_k \cdot \vv{b}^*_k = \vv{b}^*_k \cdot \vv{b}^*_k \) hence:

\[ a^*_k = a_k \]

and in particular a nonzero integer. Thus:

\[ \lVert\vv{v}\rVert^2 = \sum_{i=1}^{k} |a^*_i|^2 \lVert\vv{b}^*_i\rVert \ge |a_k|^2 \lVert\vv{b}^*_k\rVert \ge \lVert\vv{b}^*_k\rVert \]

which implies the lemma.

Write:

\[ \mu_{ji} = - \frac{\vv{b}_j \cdot \vv{b}^*_i}{\vv{b}^*_i \cdot \vv{b}^*_i } \]

for the coefficients in the Gram-Schmidt orthogonalization. If all of them satisfy:

\[ |\mu_{ji}| \le \frac{1}{2} \]

then we call \(\vv{b}_1,…,\vv{b}_n\) a weakly reduced basis.

We can convert any lattice basis to a weakly reduced basis with the same Gram-Schmidt orthogonalization by repeating the following up to \( \binom{n}{2} \) times: choose \(j\) as large as possible such that \(|\mu_{kj}| \gt \frac{1}{2} \) for some \(k\). Then replace \(\vv{b}_k\) with \(\vv{b}_k - m \vv{b}_j\), where \(m\) is an integer closest to \(\mu_{kj}\).

Our weak reduction routine expects a lattice basis along with its Gram-Schmidt orthogonalization. The return value also takes this form, as this will be easier to work with later. For testing, we add a wrapper that hides Gram-Schmidt business from the caller.

lllWeakReduce' = go [] . reverse where
  go as = \case
    [] -> as
    (h@(b, bo):bs) -> go (h:map (first tweak) as) bs where
      tweak a = zipWith (+) a $ (m*) <$> b where
        m = nearestZ $ -(dot (map (% 1) a) bo / dot bo bo)
lllWeakReduce b = map fst $ lllWeakReduce' $ zip b $ gramSchmidt b

lllWeakReduce [[1,2,3],[100,101,102], [55,-10,-20]]

This generalizes our orthogonal-as-possible basis \(\vv{b}_1, \vv{b}'_2\) above when we studied the two-dimensional case.

Back then, we wanted the first vector to be no longer than the second, and if this wasn’t the case, then we interchanged the vectors and repeated the steps. This time, we merely desire that:

\[ \lVert \vv{b}^*_i \rVert^2 \le \frac{4}{3} \lVert \vv{b}^*_{i+1} + \mu_{(i+1),i}\vv{b}^*_i \rVert^2 \]

for all \(i \in [1..n]\). A reduced basis is a weakly reduced basis satisfying this condition, known as the Lovász condition.

As you might have guessed, we can turn any basis into a reduced basis by first weakly reducing it, then if necessary, interchanging any \(\vv{b}_i, \vv{b_{i+1}}\) violating this inequality and going back to the first step.

Like our weak reduction function, our reduction function also expects the Gram-Schmidt orthogonalization along with the lattice basis. With a couple of vector additions, we update the Gram-Schmidt basis after an interchange. Both weak reduction and reduction need the Gram-Schmidt basis, so we maintain it and pass it around rather than recompute it every time.

lll' = go [] . lllWeakReduce' where
  go acc = \case
    [] -> fst <$> reverse acc
    h@(b, bo):rest -> case acc of
      [] -> go [h] rest
      (a, ao):at -> let
        rhs = (mu (map (% 1) b) ao *) <$> ao
        bo' = zipWith (-) bo rhs
        ao' = zipWith (+) ao $ (mu (map (% 1) a) bo' *) <$> bo'
        in if 3 * quadrance ao <= 4 * quadrance (zipWith (+) bo rhs)
          then go (h:acc) rest
          else lll' $ reverse at ++ ((b, bo') : (a, ao') : rest)
lll b = lll' $ zip b $ gramSchmidt b

lll [[1,2,3,4],[-5,6,7,8],[9,-10,11,-12],[13,-14,-15,-16]]

As you might have also guessed, the 4/3 factor is important for proving that this algorithm terminates in polynomial time for rational inputs, which we will do soon, and also for estimating how close we are to the true answer.

\[ \begin{align} \lVert \vv{b}^*_i \rVert^2 & \le \frac{4}{3} \lVert \vv{b}^*_{i+1} + \mu_{(i+1),i}\vv{b}^*_i \rVert^2 \\ & = \frac{4}{3} \lVert \vv{b}^*_{i+1} \rVert^2 + \frac{4}{3} \mu_{(i+1),i}^2 \lVert \vv{b}^*_i \rVert^2 \\ & \le \frac{4}{3} \lVert \vv{b}^*_{i+1} \rVert^2 + \frac{1}{3} \lVert \vv{b}^*_i \rVert^2 \end{align} \]

where the last step is a consequence of the basis being weakly reduced. Then:

\[ \lVert \vv{b}^*_{i+1} \rVert^2 \ge \frac{1}{2} \lVert \vv{b}^*_i \rVert^2 \]

By induction:

\[ \lVert \vv{b}_1 \rVert^2 \le 2^{n-1} \min_i \lVert \vv{b}^*_i \rVert^2 \]

Thus by the lemma, for any nonzero vector \(\vv{v}\) on the lattice, including a shortest vector:

\[ \lVert \vv{b}_1 \rVert \le 2^{(n-1)/2} \lVert \vv{v} \rVert \]

Now we address the running time when given a rational basis. Define:

\[ D(\vv{b}_1, …, \vv{b}_n) = \prod_{i=1}^n \lVert \vv{b}^*_i \rVert^{n-i} \]

Our weak reduction step above leaves the Gram-Schmidt orthogonalization unchanged, so also leaves this quantity unchanged.

Our reduction step shrinks this quantity by a factor of at least \(2/\sqrt{3}\).

We can scale up the input vectors so all the coordinates are integers, so that we exclusively work with matrices over the integers during the algorithm. It is known that the Gram determinant of a set of vectors is the square of the volume of the parallelotope they form, hence:

\[ \prod_{i=1}^k \lVert \vv{b}^*_i \rVert^2 = \begin{vmatrix} \vv{b}_1 \cdot \vv{b}_1 & … & \vv{b}_1 \cdot \vv{b}_k \\ \vdots & \ddots & \vdots \\ \vv{b}_1 \cdot \vv{b}_k & … & \vv{b}_k \cdot \vv{b}_k \\ \end{vmatrix} \ge 1 \]

where the inequality is justified by noting every entry of the determinant is an integer and that the square of a volume must be positive. This implies:

\[ D(\vv{b}_1, …, \vv{b}_n) \ge 1 \]

Lastly, if the input is the matrix \(\vv{A} = (\vv{a}_1, …, \vv{a}_n)\), then we have the crude bound:

\[ D(\vv{a}_1, …, \vv{a}_n) = (\max_i \lVert \vv{a}_i \rVert)^\binom{n}{2} \]

Therefore, after \(m\) steps:

\[ 1 \le D(\vv{b}_1,…,\vv{b}_n) \le \left( \frac{\sqrt{3}}{2} \right)^m (\max_i \lVert \vv{a}_i \rVert)^\binom{n}{2} \]

Rearranging, and since \( 1 / \lg (2/\sqrt{3}) = 4.8188… \)

\[ m \le 5 \binom{n}{2}\max_i \lg \lVert \vv{a}_i \rVert \le 5 \binom{n}{2} \langle \vv{A} \rangle \]

where \(\langle \vv{A} \rangle\) denotes the number of bits needed to represent \(\vv{A}\). In other words, the LLL algorithm terminates after a polynonial number of steps. (Lovász has 6 here; I may have blundered along the way.)

It remains to show the arithmetic operations stay within polynomial bounds. We skip this, though we mention that it appears this is the only part of the proof that depends on \(\mu_{ji} \le 1/2 \) for all valid indices; above, we only need this inequality when \(j = i + 1.\)

Get Shorty

Let \(\vv{b}_1, …, \vv{b}_n\) be a reduced basis of \(\mathbb{R}^n\), and \(\vv{v} = \sum_i a_i \vv{b}_i\) be a vector on the lattice it generates. Then it can be shown for all \(i\):

\[ |a_i| \lt 3^n \lVert \vv{v} \rVert / \lVert \vv{b}_1 \rVert \]

Thus if we view \(n\) as a fixed constant, then we can find a shortest vector by trying all possible \(a_i \in [-3^n..3^n] \). This also implies a shortest vector must exist.

brutalSVP = go . lll where
  go b = snd $ foldr1 min $ filter ((/= 0) . fst) $ zip (quadrance <$> vs) vs where
    n = length b
    vs = lincom b <$> replicateM n [-3^n..3^n]

lincom b as = foldr1 (zipWith (+)) $ zipWith scale b as where
  scale v a = map (a*) v

Relationship Advice

Let \(\vv{x} \in \mathbb{R}^n\) be nonzero. An integer relation between the coefficients of \(\vv{x}\) exists if for some \(\vv{a} \in \mathbb{Z}^n\) we have:

\[ \vv{a} \cdot \vv{x} = 0 \]

We can find an approximation \(\vv{a}\) for any rational approximation of \(\vv{x}\) by running the LLL algorithm on the matrix:

\[ \pmatrix{ Qx_1 & Qx_2 & … & Qx_n \\ 1 & 0 & … & 0 \\ 0 & 1 & … & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & … & 1 } \]

Here, we only have \(n\) column vectors, but each vector has \(n + 1\) coordinates. This is no problem because the LLL algorithm works even when the number of vectors is less than the number of dimensions. We can imagine filling in the empty spots at the end of the list with phantom vectors that satisfy the Lovász condition.

We choose \(Q\) large enough so we can bound \( |\vv{a} \cdot \vv{x}| \). Lovász leaves the details as an exercise. I took a stab at it: we can add another vector:

\[\vv{y} = \pmatrix{M^2 n \\ 0 \\ \vdots \\ 0}\]

where \(M\) is the largest of the \(Qx_i\), which exceeds 1 for sufficiently large \(Q\). Then the vector \(\vv{b_1}\) returned by the LLL algorithm cannot involve any multiples of \(\vv{y}\), because it is shorter than any of the original column vectors. Roughly speaking, if we assign \(\vv{y}\) a nonzero coefficient and try to make up for it, then one of the other coefficients must be at least \(M\), which leads to a linear combination that must be longer than the original vectors. The determinant of the lattice is \(M^2 n\) which we can use to estimate bounds.

Lovász is more explicit in the case of an approximation \(\alpha\) to a real root \(r\) of a polynomial. We set:

\[x_1 = 1, x_2 = r, …, x_n = r^d\]

where \(d\) is the degree of the minimal polynomial. Then it can be shown if \(Q\) is large enough and \(r\) is close enough to \(\alpha\) then the LLL algorithm returns the coefficients of the minimal polynomial.

In particular, it can be shown shown taking \(Q = 2^{3k^3}\) suffices, where \(k\) is the input size of the polynomial, provided that \(|r - \alpha| < 2^{-4k^3}\).

minpoly n r = head $ lll $ zipWith (:) scaled im where
  scaled = map (nearestZ . (1000000*)) $ iterate (r*) 1
  im = map (take n) $ take n $ iterate (0:) $ 1 : repeat 0

Lastly, we write a widget to show off this function:

showsXPow k
  | k == 0 = id
  | k == 1 = ('x':)
  | otherwise = ("x<sup>"++) . shows k . ("</sup>"++)

showTerm n k
  | n == 0 = id
  | n == 1, k /= 0 = ('+':) . showsXPow k
  | n == -1, k /= 0 = ('-':) . showsXPow k
  | n < 0 = shows n . showsXPow k
  | otherwise = ('+':) . shows n . showsXPow k

magic = do
  (a, b) <- break (== '.') <$> jsEval "root.value;"
  deg <- fromIntegral . readInteger <$> jsEval "degree.value;"
  let
    r = case b of
      [] -> readInteger a % 1
      _:d -> let
        w = readInteger a
        tens = 10^length d in (w * tens + signum w * readInteger d) % tens
    mp = tail $ minpoly (deg + 1) r
    s = foldr ($) "" $ reverse $ zipWith showTerm mp [0..deg]
  jsEval $ "out.innerHTML = '"++s++"';"

jsEval [r|
magic.innerHTML="Root: <input id='root' type=='text' size='8' value='-1.650629'> Degree: <input id='degree' type='text' size='2' value='3'> <button id='magic_button'>Polynomial</button>";
magic_button.addEventListener("click", (ev) => {
  repl.run("chat", ["Main"], "magic");
});
|]

Computing in Science & Engineering named integer relation detection as one of the top 10 algorithms of the 20th century. They credit the 1979 algorithm of Ferguson and Forcade, though it seems the LLL algorithm was the first to be published with complete proofs. Ferguson and others improved the original approach, and now PSLQ is the most widely used algorithm for integer relation detection.

Ben Lynn blynn@cs.stanford.edu 💡