Fast Multiplication

Multiplication, in general, is relatively easy to do, understand and code. Even the standard schoolbook algorithm of multiplying each digit with every other digit is simple enough to write. The issue is that such an algorithm takes O(n^2) time, which is a lot, especially if you want to multiply huge numbers. Hence you need better algorithms to do the job. Note that languages like Python have lots of optimisations on top of the specific algorithms used for computing products when the user types a simple ‘*', but this article is more about understanding those algorithms.

Fast Exponentiation

Fast exponentiation allows a user to compute a power of a number in O(log n) time. One can also use this algorithm for computing huge powers modulo some number, or for matrix exponentiation! Since it runs in O(log n) time, the first thought which comes to mind is that the exponent must be converted into binary, which is correct. A simple prerequisite to understanding this algorithm is knowing how exponents work; importantly, that \(a^{b + c} = a^b \cdot a^c\). Let’s take the example of finding \(3^{50}\).

Converting the exponent to binary, we get \(50 = 110010_2 = 32 + 16 + 2\), so \(3^{50} = 3^{32} \cdot 3^{16} \cdot 3^{2}\). Now, this is exciting, since if you notice, all the exponents are powers of 2. It is important because now we do not have to worry about multiplying arbitrary powers with other things; we simply keep squaring, O(log n) times, multiplying the result by the current square whenever the corresponding bit of the exponent is set, at which point we get our answer!

def expo(base, power):
    if power == 0:
        return 1
    result = expo(base, power//2)  # base^(power//2)
    result = result*result         # square it: base^(2*(power//2))
    if power & 1:                  # odd exponent: one more factor of base
        result = result*base
    return result
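The text above also mentions computing huge powers modulo some number. A minimal iterative sketch of that variant (the name `expo_mod` and the `mod` parameter are my own; Python's built-in three-argument `pow` does the same thing):

```python
def expo_mod(base, power, mod):
    """base**power % mod in O(log power) steps, iteratively."""
    result = 1
    base %= mod
    while power > 0:
        if power & 1:                 # this binary digit of the exponent is set
            result = result * base % mod
        base = base * base % mod      # square for the next binary digit
        power >>= 1
    return result

print(expo_mod(3, 50, 10**9 + 7) == pow(3, 50, 10**9 + 7))  # True
```

Reducing modulo `mod` after every multiplication keeps the intermediate numbers small, which is exactly why this beats computing the full power first.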

Karatsuba Algorithm

Discovered by Anatoly Karatsuba, this is a fast multiplication algorithm that improves on the typical O(\(n^2\)) case, running in roughly O(\(n^{1.585}\)). It is based on splitting the numbers into two halves and multiplying them recursively (it also has lower space complexity :p). It rests on the straightforward thought that addition takes less time than multiplication, which, I hope, is not very hard to prove. Here, ideally, we will also be using the previously learnt fast exponentiation.

As can be seen, we split the numbers into two halves based on powers of 10 (you can also do this based on powers of 2), and we multiply recursively. This algorithm saves time in computing ad + bc: it computes (a + b)*(c + d) and subtracts the previously calculated products ac and bd to get the required value. Essentially, instead of computing four products, it computes 3, which is why its complexity is O(\(n^{\log_2 3}\)).

def karatsuba(x, y):
    if len(str(x)) == 1 or len(str(y)) == 1:  # base case: single digit
        return x*y
    n = max(len(str(x)), len(str(y))) // 2    # split position
    a = x // 10**n   # high digits of x
    b = x % 10**n    # low digits of x
    c = y // 10**n   # high digits of y
    d = y % 10**n    # low digits of y

    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    ad_bc = karatsuba(a+b, c+d) - ac - bd     # equals ad + bc
    return 10**(2*n)*ac + 10**n*ad_bc + bd
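To see the three-products trick with concrete numbers, here is one level of the recursion worked out by hand for 1234 × 5678 (the numbers are arbitrary, chosen only for illustration):

```python
# One level of Karatsuba on 1234 * 5678, splitting at n = 2 digits.
n = 2
x, y = 1234, 5678
a, b = divmod(x, 10**n)               # a = 12, b = 34
c, d = divmod(y, 10**n)               # c = 56, d = 78

ac = a * c                            # 672
bd = b * d                            # 2652
ad_bc = (a + b) * (c + d) - ac - bd   # third product stands in for ad and bc

result = 10**(2*n) * ac + 10**n * ad_bc + bd
print(result, result == x * y)        # 7006652 True
```

Only three multiplications of half-size numbers occur; everything else is addition and shifting by powers of 10.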

Fast Fourier Transformation

This is the most complex algorithm of the three. It is essentially used for multiplying polynomials, but you can of course write an ordinary number as a polynomial quite easily. It requires knowledge of complex numbers, so it's advisable to read up on those before going through this.

Let there be two polynomials A(x) and B(x) of degrees n and m respectively, and suppose we want to find C(x) = A(x) * B(x). Before going through the best options, let's analyse the problem a bit. There are 3 traditional ways to represent a polynomial:

  1. Coefficient Vector: F(x) = \(a_0 + a_1x + a_2x^2 + \dots + a_nx^n\)
  2. Roots: F(x) = \( (x-r_1)(x-r_2)…(x-r_n) \)
  3. Samples: F(x) = \( { (x_1, y_1), (x_2, y_2) … (x_{n+1}, y_{n+1})}\)

Depending on the representation, each operation takes a different amount of time:

Operation        Coefficient   Roots   Samples
Evaluation       O(n)          O(n)    O(n^2)
Addition         O(n)          Inf     O(n)
Multiplication   O(n^2)        O(n)    O(n)
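The O(n) multiplication entry for sample form is worth seeing concretely: with enough points, multiplying two polynomials is just multiplying their values pointwise. A toy sketch (the polynomials here are my own picks):

```python
# Pointwise multiplication in sample form: A(x) = 1 + 2x, B(x) = 3 + x.
# Their product has degree 2, so at least 3 sample points are needed.
A = lambda x: 1 + 2*x
B = lambda x: 3 + x
C = lambda x: 3 + 7*x + 2*x*x            # known product, used as a check

points = [0, 1, 2]
samples = [(x, A(x) * B(x)) for x in points]   # one multiply per point
print(samples)                            # [(0, 3), (1, 12), (2, 25)]
assert all(y == C(x) for x, y in samples)
```

Each sample of the product costs one multiplication, so n samples cost O(n), exactly as the table claims.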

As you can see, for multiplication (which is our focus), having the polynomial in coefficient form is bad; if we could convert it into either sample or root form, we would be able to do the whole thing in O(n). So we try to convert it into sample form:

Let the Vandermonde matrix V be the matrix built from any n distinct points \(x_0, x_1, \dots, x_{n-1}\), so that evaluating A(x) at all of them is the single matrix product \(VA = Y\):

$$ \begin{pmatrix} 1 & x_0 & x_0^2 & \dots & x_0^{n-1} \\ 1 & x_1 & x_1^2 & \dots & x_1^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n-1} & x_{n-1}^2 & \dots & x_{n-1}^{n-1} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix} = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n-1} \end{pmatrix} $$

A is the vector of coefficients and Y is the vector of values \(y_j = A(x_j)\). For the Vandermonde matrix V, we know that: $$\det(V) = \prod_{0 \leq j \lt k \leq n-1} (x_k - x_j)$$

Proof for which is available here: https://towardsdatascience.com/the-vandermonde-determinant-a-novel-proof-851d107bd728

Hence, as long as the points are distinct, \(\det(V) \neq 0\), so V is invertible, and we realise that we can convert back to the coefficient form of a polynomial if we are given the sample form, by computing \(A = V^{-1}Y\).

As can be seen from the table, sample form is quite convenient when it comes to addition and multiplication of polynomials. Its main issue is evaluating the polynomial at an arbitrary given point, for which we end up using Lagrange interpolation at O(n^2). Added to this, the degree of the resulting polynomial is the sum of the degrees of the two polynomials we are multiplying, hence we require 2n + 1 points instead of just n + 1 points.

So what we end up doing instead is that, for individual evaluation of the polynomial, we prefer converting it back to coefficient form and evaluating it in O(n).
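The O(n^2) Lagrange interpolation mentioned above can be sketched directly from its formula; exact rational arithmetic via `fractions` avoids floating-point noise (the function name and sample polynomial are my own):

```python
from fractions import Fraction

def lagrange_interpolate(samples, x):
    """Evaluate the unique degree-(n-1) polynomial through n samples
    at the point x. Two nested loops over the samples: O(n^2)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(samples):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(samples):
            if i != j:
                term *= Fraction(x - xj, xi - xj)  # basis polynomial factor
        total += term
    return total

# Samples of F(x) = x^2 + 1 at three points recover F(5) = 26.
print(lagrange_interpolate([(0, 1), (1, 2), (2, 5)], 5))  # 26
```

Every query pays the full O(n^2) cost, which is why coefficient form is preferred when many individual evaluations are needed.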

We can use any points as evaluation points, but we observe that by choosing the points as \(2n^{th}\) roots of unity the conversion time is reduced to O(\(nlogn\)) instead of O(\(n^2\)).

Also, we need some knowledge about complex numbers :p

Nth Roots of Unity

These are defined as $$ \omega_n = e^{\frac{2\pi \iota}{n}} $$

Cancellation Lemma:

For any integers \(n > 0, d > 0, k \geq 0\) $$ \omega_{dn}^{dk} = \omega_{n}^{k} $$

Proof: $$ \omega_{dn}^{dk} = (e^{\frac{2\pi \iota}{dn}})^{dk} = (e^{\frac{2\pi\iota}{n}})^k = \omega_n^k $$

Halving Lemma:

If \(n\) is even, then the squares of the n complex nth roots of unity are the \(n/2\) complex \(n/2\)th roots of unity.

Proof: By Cancellation Lemma we know that $$ (\omega_n^k)^2 = \omega_{\frac{n}{2}}^k $$

And we know each term on the LHS will occur twice, hence proved

Summation Lemma

The sum of all the nth roots of unity is equal to 0 (for n > 1).
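All three lemmas are easy to sanity-check numerically with `cmath` (the helper `omega` and the particular values of n, d, k are chosen arbitrarily for illustration):

```python
import cmath

def omega(n, k=1):
    """omega_n^k: the k-th power of the principal n-th root of unity."""
    return cmath.exp(2j * cmath.pi * k / n)

# Cancellation lemma: omega_{dn}^{dk} = omega_n^k, with n=4, d=3, k=1
assert abs(omega(12, 3) - omega(4, 1)) < 1e-9

# Halving lemma: squaring an 8th root of unity gives a 4th root of unity
assert abs(omega(8, 3)**2 - omega(4, 3)) < 1e-9

# Summation lemma: the eight 8th roots of unity sum to 0
assert abs(sum(omega(8, k) for k in range(8))) < 1e-9

print("all three lemmas check out numerically")
```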

We need to evaluate the polynomial A(x) at the 2nth roots of unity and store the results in Y. This vector Y is called the Discrete Fourier Transform of the coefficient vector A, written as \(y = DFT_n(a)\).

By using a method known as the Fast Fourier Transform (FFT), which takes advantage of the special properties of complex roots of unity, we can compute \(DFT_n(a)\) in time O(\(n \log n\)) as opposed to the O(\(n^2\)) of the straightforward method. Here, we assume that n is a power of 2 for convenience.

FFT employs a divide and conquer technique where it divides the coefficients of A(x) on the bases of even and odd indices as: $$ A^{[0]}(x) = a_0 + a_2x + . . . + a_{n-2}x^{\frac{n}{2} - 1} $$ $$ A^{[1]}(x) = a_1 + a_3x + . . . + a_{n-1}x^{\frac{n}{2} - 1} $$

Where $$ A(x) = A^{[0]}(x^2) + xA^{[1]}(x^2) $$

By the halving lemma, the squares of the n evaluation points are only n/2 distinct points, so the problem size is halved at each level of the recursion tree. Hence, we get the time complexity: $$ T(n) = 2T(n/2) + \Theta(n) = \Theta(n \log n) $$

Now, for interpolation of the complex roots, we write \(A = V^{-1}Y\), where \(V_{j,k}^{-1} = \frac{\omega^{-kj}_n}{n} \). Since this matrix is of the same type as V, we can use the same divide and conquer procedure when interpolating to get back to coefficient form.

#include <bits/stdc++.h> // complex, vector, cmath
using namespace std;

using cd = complex<double>;
const double PI = acos(-1);

void fft(vector<cd> & a, bool invert) {
    int n = a.size();
    if (n == 1)
        return;

    vector<cd> a0(n / 2), a1(n / 2); // divide
    for (int i = 0; 2 * i < n; i++) {
        a0[i] = a[2*i];
        a1[i] = a[2*i+1]; 
    }
    fft(a0, invert); // recursively divide
    fft(a1, invert);

    double ang = 2 * PI / n * (invert ? -1 : 1);
    cd w(1), wn(cos(ang), sin(ang)); // omega 
    for (int i = 0; 2 * i < n; i++) {
        a[i] = a0[i] + w * a1[i];
        a[i + n/2] = a0[i] - w * a1[i];
        if (invert) {
            a[i] /= 2;
            a[i + n/2] /= 2;
        }
        w *= wn;
    }
}

vector<int> multiply(vector<int> const& a, vector<int> const& b) {
    vector<cd> fa(a.begin(), a.end()), fb(b.begin(), b.end()); // defining vectors
    int n = 1;
    while (n < a.size() + b.size()) 
        n <<= 1;
    fa.resize(n);
    fb.resize(n);

    fft(fa, false); // convert into sample form
    fft(fb, false);
    for (int i = 0; i < n; i++)
        fa[i] *= fb[i];
    fft(fa, true); // convert answer back to coeff form

    vector<int> result(n);
    for (int i = 0; i < n; i++)
        result[i] = round(fa[i].real()); // answer is only the real part
    return result;
}
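For readers following along in Python, the same pipeline can be sketched as a direct translation of the C++ above (recursive and allocation-heavy, so for illustration only, not performance):

```python
import cmath

def fft(a, invert):
    """Recursive FFT on a list of complex numbers; invert=True runs the inverse."""
    n = len(a)
    if n == 1:
        return a
    a0 = fft(a[0::2], invert)   # even-index coefficients
    a1 = fft(a[1::2], invert)   # odd-index coefficients
    ang = 2 * cmath.pi / n * (-1 if invert else 1)
    w, wn = 1, cmath.exp(1j * ang)
    out = [0] * n
    for i in range(n // 2):
        out[i] = a0[i] + w * a1[i]
        out[i + n // 2] = a0[i] - w * a1[i]
        if invert:                      # divide by 2 at each level: total 1/n
            out[i] /= 2
            out[i + n // 2] /= 2
        w *= wn
    return out

def poly_multiply(a, b):
    """Multiply coefficient vectors a and b via sample form."""
    n = 1
    while n < len(a) + len(b):
        n <<= 1                         # pad up to a power of 2
    fa = fft([complex(x) for x in a] + [0j] * (n - len(a)), False)
    fb = fft([complex(x) for x in b] + [0j] * (n - len(b)), False)
    fc = fft([x * y for x, y in zip(fa, fb)], True)   # pointwise, then invert
    return [round(x.real) for x in fc]  # answer is only the real part

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(poly_multiply([1, 2], [3, 4]))    # [3, 10, 8, 0]
```

The trailing zero is padding from rounding n up to a power of 2, just as in the C++ version.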


2021-11-07 12:35 +0530