Over four thousand years ago, the Babylonians recorded the first known illustrations of multiplication on clay tablets. Instead of listing all possible products in a multiplication table, the Babylonians listed the squares of integers and used the clever identity \(ab = \frac{1}{4}((a+b)^{2}-(a-b)^{2}) \) to calculate products. By transforming a multiplication problem into two table lookups, a subtraction, and a division by four, the Babylonians devised one of the first multiplication algorithms: a systematic process of operations used to multiply integers. As a result, the Babylonians could compute products quickly simply by consulting their tablets.
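As a toy illustration (the table size and the sample product are arbitrary choices, not anything recorded on the tablets), the quarter-square method might look like this in Python:

```python
# A sketch of the tablet method: precompute squares once, then multiply by
# looking up two squares, subtracting, and dividing by four.
SQUARES = [k * k for k in range(2001)]  # "tablet" of squares for 0..2000

def quarter_square_multiply(a, b):
    """Compute a*b via ab = ((a+b)^2 - (a-b)^2) / 4, valid when a + b <= 2000."""
    return (SQUARES[a + b] - SQUARES[abs(a - b)]) // 4

print(quarter_square_multiply(49, 36))  # 1764
```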

For the past fifty years, mathematicians have been searching for similar algorithms that allow computers to efficiently multiply enormous integers. Although this may seem like a simple task, decades of research have led to the discovery of numerous elaborate algorithms, each faster than the last. It wasn't until March 18, 2019 that two researchers finally announced a new multiplication algorithm widely believed to achieve the best possible efficiency. But how exactly do we measure an algorithm's efficiency, and what are the consequences of this new algorithm?

One way to measure efficiency is complexity. Suppose you were asked to compute the product \(652 \times 225\). Most likely, you would use the long multiplication algorithm taught in grade school: breaking the product down into \(652 \times 200\), \(652 \times 20\), and \(652 \times 5\), and summing the three results.

In general, to multiply two \(n\)-digit integers, we split one number into its \(n\) digits and multiply each of them by each digit of the other number, a total of \(n^2\) single-digit products. One might believe that the grade school algorithm is the quickest way to multiply integers, but when \(n\) becomes very large, performing \(n^2\) operations can be painfully slow, even for computers. When considering how efficient an algorithm is for a computer, we often think about its complexity: how many operations a computer must perform in relation to the input size. In the grade school multiplication algorithm, we perform around \(n^2\) total operations, where \(n\), the number of digits (or the number of bits) of the numbers we wish to multiply, is the input size. Therefore, we say that the time complexity of the algorithm is on the order of \(n^2\), denoted \(O(n^2)\). This tells us that as \(n\) grows large, the total time the computer takes to execute the algorithm grows at most proportionally to \(n^2\). To put that into perspective, when \(n\) reaches into the billions, an \(O(n^2)\) algorithm could take months or even years to execute, while a faster algorithm, such as one in \(O(n \log n)\), could finish in a matter of seconds. Multiplying large integers is an essential task that appears in many areas of mathematics, such as cryptography and number theory, which is why discovering more efficient algorithms is important.
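To make the operation count concrete, here is a minimal Python sketch of the grade school method on digit lists; the function name and the least-significant-digit-first convention are illustrative choices:

```python
def grade_school_multiply(p, q):
    """Schoolbook multiplication on digit lists (least significant digit first).

    Performs roughly n^2 single-digit products for two n-digit inputs.
    """
    result = [0] * (len(p) + len(q))
    for i, a in enumerate(p):
        carry = 0
        for j, b in enumerate(q):
            total = result[i + j] + a * b + carry
            result[i + j] = total % 10   # keep one digit in place
            carry = total // 10          # carry the rest to the next column
        result[i + len(q)] += carry
    return result

# 652 * 225 with digits listed least significant first:
print(grade_school_multiply([2, 5, 6], [5, 2, 2]))
# [0, 0, 7, 6, 4, 1], the digits of 146700 from least to most significant
```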

The first major breakthrough in the multiplication problem was an \(O(n^{\log_2{3}}) \approx O(n^{1.58})\) algorithm discovered by Anatoly Karatsuba in 1960. Karatsuba's algorithm utilized the now commonly used technique of divide and conquer: an approach in which a problem is divided into smaller 'subproblems' in a recursive fashion, and the results of the subproblems are then combined to arrive at the final answer. Suppose we wish to multiply two \(n\)-digit integers \(p\) and \(q\). Here, we consider the simplified case where \(n\) is a power of \(2\), as we can add as many leading zeros to \(p\) and \(q\) as necessary. Implementing divide and conquer, we split \(p\) and \(q\) each into two numbers with \(n/2\) digits by writing \(p=a \cdot 10^{n/2}+b\) and \(q=c \cdot 10^{n/2}+d\). Then the product \(pq\) can be expressed as

\((a \cdot 10^{n/2}+b) \cdot (c \cdot 10^{n/2}+d) = ac \cdot 10^{n}+ (ad+bc) \cdot 10^{n/2}+bd \).

Instead of computing the \(n\)-digit product \(pq\) directly, we can find the four \(n/2\)-digit products \(ac\), \(ad\), \(bc\), and \(bd\). This observation alone is not enough to beat \(O(n^2)\), since four half-size products lead right back to roughly \(n^2\) operations, but Karatsuba discovered the clever trick of rewriting the sum \(ad+bc\) as \((a+b) \cdot (c+d)-(ac+bd)\). Therefore, after we compute \(ac\) and \(bd\), we can find \(ad+bc\) with only one additional \(n/2\)-digit product, \((a+b) \cdot (c+d)\). Repeating the same process for each of the three \(n/2\)-digit products, we see that if \(M(n)\) is the number of operations required to multiply two \(n\)-digit integers with the algorithm, then \(M(n)=3M(n/2)+O(n)\), where the \(O(n)\) term represents the additions needed to assemble the smaller products into \(pq\). As the extra linear term has little effect on the overall time complexity of the algorithm, \(M(n)\) can be approximated as \(n^{\log_2{3}}\). If \(n\) is not a power of \(2\), then at most \(n\) additional leading zeros are added to the beginning of \(p\) and \(q\), resulting in at most \((2n)^{\log_2{3}}=3n^{\log_2{3}}\) operations, which establishes the claimed complexity of the algorithm.
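A compact Python sketch of the three-product recursion, using Python's built-in integers and base-10 splitting purely for illustration:

```python
def karatsuba(p, q):
    """Karatsuba multiplication of non-negative integers (an illustrative sketch)."""
    if p < 10 or q < 10:
        return p * q                       # single-digit base case
    # Split around half of the longer operand's digit count.
    half = max(len(str(p)), len(str(q))) // 2
    a, b = divmod(p, 10 ** half)           # p = a * 10^half + b
    c, d = divmod(q, 10 ** half)           # q = c * 10^half + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    # Karatsuba's trick: ad + bc = (a + b)(c + d) - ac - bd, so three products suffice.
    middle = karatsuba(a + b, c + d) - ac - bd
    return ac * 10 ** (2 * half) + middle * 10 ** half + bd

print(karatsuba(652, 225))  # 146700
```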

Over the ensuing years, numerous improvements were made to Karatsuba's divide and conquer algorithm. In 1963, Andrei Toom generalized Karatsuba's algorithm by dividing each \(n\)-digit number into an arbitrary number of parts \(r\), which leads to a complexity of \(O(C(r) \cdot n^{\log_r(2r-1)})\) for a fixed \(r\), where \(C(r)\) is a constant that grows rapidly as \(r\) increases. By letting \(r\) grow in a certain way as \(n\) increases, Toom and Stephen Cook devised an algorithm with complexity \(O(n \cdot 2^{5\sqrt{\log_2{n}}})\), commonly known as Toom-Cook multiplication. A further improvement was made by Donald Knuth in 1969 with an \(O(n \log n \cdot 2^{\sqrt{2\log_2{n}}})\) algorithm.
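To see where this exponent comes from, note that splitting each factor into \(r\) parts makes the product a polynomial in \(10^{n/r}\) with \(2r-1\) coefficients, so \(2r-1\) products of roughly \(n/r\)-digit numbers suffice. Under the same informal accounting used above for Karatsuba, this gives the recurrence

\(M(n)=(2r-1) \cdot M(n/r)+O(n) \),

which solves to \(M(n) \approx n^{\log_r(2r-1)}\); taking \(r=2\) recovers Karatsuba's exponent \(\log_2{3}\).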

The next major step towards a faster multiplication algorithm was the (re)discovery of the Fast Fourier Transform (FFT) by J. W. Cooley and John Tukey in 1965, an efficient algorithm for evaluating and interpolating polynomials at certain points. Here it is helpful to consider the more general problem of multiplying two polynomials \(A(x)=a_0+a_1x+\cdots+a_{n-1}x^{n-1}\) and \(B(x)=b_0+b_1x+\cdots+b_{n-1}x^{n-1}\), each with \(n\) terms. (We once again consider the special case where \(n\) is a power of \(2\).) If we wish to find the product of the integers with digits \(a_{n-1}a_{n-2}\cdots a_0\) and \(b_{n-1}b_{n-2}\cdots b_0\), it suffices to evaluate the product of the polynomials \(A(x)\cdot B(x)\) at \(x=10\), since any \(n\)-digit number can be written as

\(y_{n-1}y_{n-2}\cdots y_0=y_0+y_1\cdot 10+y_2\cdot 10^2+\cdots +y_{n-1}\cdot 10^{n-1}\). 
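For example, the earlier product \(652 \times 225\) fits this framing with \(A(x)=2+5x+6x^2\) and \(B(x)=5+2x+2x^2\):

\(C(x)=A(x)\cdot B(x)=10+29x+44x^2+22x^3+12x^4, \qquad C(10)=146700=652 \times 225 \).

Note that the coefficients of \(C(x)\) can exceed \(9\), so a final round of carrying is needed to turn them back into the digits of the product.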

Manually multiplying polynomials using long multiplication can be done in \(O(n^2)\) operations, which, as previously discussed, is very slow. To speed up the process, we use a corollary of the Fundamental Theorem of Algebra: a polynomial of degree at most \(n-1\) is uniquely determined by its values at \(n\) distinct points. This means that if we choose \(n\) points \((x_0,y_0),(x_1,y_1),\cdots ,(x_{n-1},y_{n-1})\) with distinct \(x\)-coordinates, there exists a unique polynomial of degree at most \(n-1\) that passes through all \(n\) of them.

So if we wish to find the polynomial \(C(x)=A(x) \cdot B(x)\), which has degree \(2n-2\), we can choose \(2n\) distinct values \(x_0,x_1,\cdots ,x_{2n-1}\) (one more than the \(2n-1\) needed to pin down \(C(x)\)), evaluate \(A(x)\cdot B(x)\) at each of those values, and then use that information to reconstruct \(C(x)\). The FFT enables us to evaluate \(A(x)\cdot B(x)\) and reconstruct \(C(x)\) in \(O(n \log n)\) time by choosing the \(x_i\) to be certain complex numbers: the \(2n\)th roots of unity. With these special complex numbers, another divide and conquer approach applies, splitting \(A(x)\) and \(B(x)\) each into two polynomials of degree roughly \(n/2\). Repeating this process allows us to compute \(A(x) \cdot B(x)\) at \(x_0,x_1,\cdots ,x_{2n-1}\) in \(O(n \log n)\) time. Finally, we can reconstruct \(C(x)\), also in \(O(n \log n)\) time, using an almost identical process called the inverse FFT. In practice, the complexity of the Cooley-Tukey algorithm is a little larger than \(O(n \log n)\) because this analysis neglects floating-point error: as the total number of operations increases, higher-precision arithmetic is required to arrive at an accurate answer, and taking the cost of that extra precision into account increases the overall complexity.
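As a rough illustration of the pipeline (evaluate with the FFT, multiply pointwise, interpolate with the inverse FFT), here is a minimal Python sketch using a recursive radix-2 transform; the function names are illustrative, and a real implementation would manage precision and rounding far more carefully:

```python
import cmath

def fft(coeffs, invert=False):
    """Recursive radix-2 Cooley-Tukey transform; len(coeffs) must be a power of 2."""
    n = len(coeffs)
    if n == 1:
        return list(coeffs)
    even = fft(coeffs[0::2], invert)
    odd = fft(coeffs[1::2], invert)
    sign = -1 if invert else 1
    result = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)   # an n-th root of unity
        result[k] = even[k] + w * odd[k]
        result[k + n // 2] = even[k] - w * odd[k]
    return result

def poly_multiply(a, b):
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2                                      # pad to a power of 2
    fa = fft(a + [0] * (size - len(a)))                # evaluate A at the roots of unity
    fb = fft(b + [0] * (size - len(b)))                # evaluate B at the roots of unity
    pointwise = [x * y for x, y in zip(fa, fb)]        # C = A * B at each point
    c = fft(pointwise, invert=True)                    # interpolate with the inverse FFT
    return [round(v.real / size) for v in c]           # real parts, scaled by 1/size

# Multiply 652 and 225 by convolving their digits and evaluating at x = 10.
coeffs = poly_multiply([2, 5, 6], [5, 2, 2])           # [10, 29, 44, 22, 12, 0, 0, 0]
print(sum(c * 10**i for i, c in enumerate(coeffs)))    # 146700
```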

In 1971, Arnold Schönhage and Volker Strassen devised a variation of the FFT known as the number-theoretic transform, which works with integers modulo a carefully chosen modulus instead of with complex numbers. These modular systems contain elements that behave in the same way that roots of unity do in the complex numbers, allowing for an efficient multiplication algorithm running in \(O(n \log n \cdot \log \log n)\) time. As a result, the Schönhage-Strassen algorithm continues to be one of the most widely used large-integer multiplication algorithms today.
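The same pipeline carries over to modular arithmetic. The sketch below is a simplification rather than Schönhage and Strassen's actual construction: it uses the prime \(998244353 = 119 \cdot 2^{23}+1\) with primitive root \(3\), a conventional choice for illustration, whereas Schönhage and Strassen select their moduli quite differently:

```python
P = 998244353  # prime with P - 1 = 119 * 2^23, so 2^k-th roots of unity exist mod P
G = 3          # a primitive root modulo P

def ntt(a, invert=False):
    """Number-theoretic transform; len(a) must be a power of 2 (at most 2^23)."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    w_n = pow(G, (P - 1) // n, P)        # a primitive n-th root of unity mod P
    if invert:
        w_n = pow(w_n, P - 2, P)         # use its inverse for the inverse transform
    result = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % P
        result[k] = (even[k] + t) % P
        result[k + n // 2] = (even[k] - t) % P
        w = w * w_n % P
    return result

def convolve_mod_p(a, b):
    """Multiply the polynomials a and b (coefficient lists) modulo P."""
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2
    fa = ntt(a + [0] * (size - len(a)))
    fb = ntt(b + [0] * (size - len(b)))
    c = ntt([x * y % P for x, y in zip(fa, fb)], invert=True)
    inv_size = pow(size, P - 2, P)       # divide by the transform length mod P
    return [v * inv_size % P for v in c]

# Digit convolution of 652 and 225 again; exact as long as coefficients stay below P.
print(convolve_mod_p([2, 5, 6], [5, 2, 2]))  # [10, 29, 44, 22, 12, 0, 0, 0]
```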

During their research, Schönhage and Strassen conjectured an asymptotic lower bound of \(\Omega (n \log n)\) for integer multiplication. It wasn't until 36 years later that Martin Fürer published an algorithm, with complexity \(O(n \log n \cdot K^{\log^*n})\) for some unspecified constant \(K\), that narrowed the gap between the Schönhage-Strassen algorithm and the conjectured lower bound (here \(\log^*n\) is the iterated logarithm, a function that grows extremely slowly). In 2014, Dr. David Harvey of the School of Mathematics and Statistics at the University of New South Wales and Joris van der Hoeven of the École Polytechnique in France put an explicit constant in the \(\log^*n\) exponent with an \(O(n \log n \cdot 2^{3 \log^*n})\) algorithm, which was later improved to \(O(n \log n \cdot 2^{2 \log^*n})\) in 2018. With their numerous findings, Harvey and van der Hoeven quickly became leading experts in the field.

In March 2019, Harvey and van der Hoeven at last discovered an \(O(n \log n)\) algorithm for integer multiplication, providing an asymptotically optimal algorithm assuming the Schönhage-Strassen conjecture is correct. As Dr. Harvey states, “Our work is expected to be the end of the road for this problem, although we don’t know yet how to prove this rigorously.” Although the significance of their algorithm is theoretical rather than practical, there are numerous important consequences. Looking back at the Cooley-Tukey algorithm, integer multiplication in \(O(n \log n)\) implies that \(k\)th roots of unity can be computed to a precision of \(n\) significant bits in \(O(n \log n)\) time, and that transcendental constants such as \(e\) and \(\pi\) can be computed to \(n\) bits in \(O(n \log^2n)\) time. As the algorithm continues to be studied and developed, many other applications and connections to related fields will likely be uncovered, revealing the great importance of Harvey and van der Hoeven’s discovery.

“People have been hunting for such an algorithm for almost 50 years. It was not a foregone conclusion that someone would eventually be successful.” — Dr. David Harvey.