
Algebraic FFTs

The FFT (“Fast Fourier Transform”) is one of the greatest algorithms of all time, but most have only seen it in its classical form, as transforming discrete samples in the time domain to their complex representation in the frequency domain. Its cousin the NTT (“Number Theoretic Transform”), which interpolates polynomials on modular numbers, is much less well known but still in common use, mostly for cryptography.

However, even more exotic “FFTs” have been invented for specialized purposes, over stranger domains and with unfamiliar properties. In this post I’ll illustrate the single algorithm underlying any FFT, and then go through a quick survey of these exotic FFTs and show how we can implement them by plugging in a small set of parameters to our common algorithm.

Ultimately we’ll see that an FFT allows us to change the basis of a function defined on \(N\) points in \(N\log(N)\) time. The naive method would be to change the basis by applying the linear transformation matrix, and applying an \(N \times N\) matrix to a vector takes ~ \(N^2\) time, so this is a huge advantage.
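
That is, the naive method computes the matrix-vector product

\[ \vec{c} = M \begin{bmatrix} f(x_0) \\ \vdots \\ f(x_{N-1}) \end{bmatrix}, \qquad M \in \mathbb{F}^{N \times N}, \]

touching all \(N^2\) entries of \(M\).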

A few notes before we go any further:

This post was written as a Sage notebook. I’ve written a few utility functions, but their meaning should be clear from their names. The only strange one is bind - this lets me add methods to a class after definition, so I can write prose in between.

Imports and utility functions:
%display latex

from collections import namedtuple, defaultdict
from itertools import islice, chain

take = lambda n, xs: list(islice(xs, n))  # first n elements of an iterable
interleave = lambda *xss: list(chain.from_iterable(zip(*xss)))  # [a0, b0, a1, b1, ...]
def iterate(x, f): yield x; yield from iterate(f(x), f)  # x, f(x), f(f(x)), ...
powers = lambda base, start=None: iterate(start or base.parent().one(), lambda x: x*base)  # start, start*base, start*base^2, ...
dot = lambda *xss: sum(product(xs) for xs in zip(*xss))  # sum of elementwise products
rands = lambda f, n: [f.random_element() for _ in range(n)]  # n random elements of f
bind = lambda cls: lambda f: (setattr(cls, f.__name__, f), f)[1]  # decorator: attach f as a method of cls
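
As a toy illustration of bind (my own example, not used elsewhere in the post):

Pair = namedtuple('Pair', 'a b')

@bind(Pair)
def total(self):
    return self.a + self.b

Pair(1, 2).total()  # 3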

Definition of an FFT

For a given field \(\mathbb{F}\), and a size \(N = 2^n\), an FFT for \(\mathbb{F}\) and \(N\) is fully specified by its initial domain \(D_n\) containing \(2^n\) points, and a sequence of \(n\) layers:

\[\ell_n = [(\pi_n,t_n),(\pi_{n-1},t_{n-1}),\dots,(\pi_1,t_1)]\]

Layer = namedtuple('Layer', 'π t')
Fft = namedtuple('Fft', 'domain layers')

Where either \(n = 0\), meaning \(D_0\) has exactly one point and \(\ell_0\) is empty, or:

  1. \(\pi_n\) maps \(D_n\) onto a half-size domain \(D_{n-1}\), sending exactly two points of \(D_n\) to each point of \(D_{n-1}\) (it is two-to-one),
  2. \(t_n : D_n \rightarrow \mathbb{F}\) distinguishes each fiber: whenever \(\pi_n(x_0) = \pi_n(x_1)\) with \(x_0 \neq x_1\), we have \(t_n(x_0) \neq t_n(x_1)\), and
  3. \(D_{n-1}\) together with the remaining layers \(\ell_{n-1}\) is itself an FFT.

From this definition, we derive a sequence of \(n+1\) domains \(D_n, D_{n-1}, ..., D_0\), each halving in size, generated by the maps \(\pi_n, \pi_{n-1}, ..., \pi_1\), and distinguishers (commonly referred to as twiddles) on each domain except \(D_0\).

\[\begin{CD} D_n @>\pi_n>> D_{n-1} @>\pi_{n-1}>> ... @>\pi_2>> D_1 @>\pi_1>> D_0 \\ @VV{t_n}V @VV{t_{n-1}}V @. @VV{t_1}V \\ \mathbb{F} @. \mathbb{F} @. @. \mathbb{F} @. \end{CD}\]

It will be useful to explicitly build the inverse map \(\pi_n^{-1}\) that maps every output point in \(D_{n-1}\) to its two inputs \(\{x_0,x_1\} \in D_n\) (aka its fiber), along with the values of \(t(x_0)\) and \(t(x_1)\). Let’s add a method to Fft for this, which will also check that our requirements are satisfied. The returned dictionary has the form \(\{ \pi_n(x) \rightarrow ((x_0,t(x_0)), (x_1,t(x_1))) \}\), and the set of all of its keys is exactly \(D_{n-1}\).

@bind(Fft)
def build_inverse_map(self: Fft):
    π, t = self.layers[0]
    inv_map = defaultdict(lambda: [])
    for x in self.domain:
        inv_map[π(x)].append((x, t(x)))
    assert all(
        len(fiber) == 2 and fiber[0][1] != fiber[1][1]
        for fiber in inv_map.values()
    )
    return inv_map
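
To make this concrete, here’s a toy layer of my own over \(GF(5)\): squaring is two-to-one on \(\{1, 2, 3, 4\}\), and \(t(x) = x\) distinguishes each fiber.

F5 = GF(5)
toy = Fft(domain=[F5(1), F5(2), F5(3), F5(4)],
          layers=[Layer(π=lambda x: x^2, t=lambda x: x)])
dict(toy.build_inverse_map())  # {1: [(1, 1), (4, 4)], 4: [(2, 2), (3, 3)]}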

If these requirements are satisfied, we can apply the FFT algorithm to any function \(f : D_n \rightarrow \mathbb{F}\) to get a vector of \(2^n\) coefficients \(\vec{c}\):

\[\texttt{FFT}_{D_n,\ell_n}(f) = \vec{c}\]

Given an arbitrary FFT, we can derive its basis \(\hat{b}_n\), which is a vector of functions from \(D_n \rightarrow \mathbb{F}\). Then we can understand the meaning of our coefficients \(\vec{c}\): they’re weights of the basis functions, and the weighted sum of the basis will reconstruct our original function \(f\).

\[\sum_i \vec{c}^{(i)} \cdot \hat{b}_n^{(i)}(x) = f(x)\]

Well… it should, if your FFT forms a basis with enough dimensionality to reconstruct the original function. We’ll assume this is true, because we’re only interested in applying the FFT algorithm, not how useful the result turns out to be.

I will explain how to derive this basis as it is formed by the structure of the FFT, but it’s important to note that the basis is purely conceptual: it tells you how to interpret your resulting coefficients, but isn’t necessary if you just want to run the algorithm.

The FFT algorithm, and our basis

Given \(f\), a function of \(D_n\), the FFT algorithm has three steps:

  1. Decompose \(f\) into two functions \(f_0\) and \(f_1\) of the smaller domain \(D_{n-1}\), using our current layer \((\pi_n,t_n)\), so that:

\[ f(x) = f_0(\pi_n(x)) + t_n(x) \cdot f_1(\pi_n(x)) \]

  2. Recur, calling \(\texttt{FFT}_{D_{n-1},\ell_{n-1}}\) on \(f_0\) and \(f_1\) to obtain their coefficients.
  3. Combine the coefficients of \(f_0\) and \(f_1\) in a “sensible way”, and return these combined coefficients.

Of these steps, decomposition is the easiest to explain, but least illustrative. For now, assume we will write a function decompose that takes our inverse map and evaluations of our function on \(D_n\), and returns evaluations of \(f_0\) and \(f_1\) on \(D_{n-1}\).

Then, with decompose assumed, the algorithm can be fully described:

@bind(Fft)
def fft(self: Fft, f: dict):
    if not self.layers:
        return [f[list(self.domain)[0]]]
    inv_map = self.build_inverse_map()
    f0, f1 = decompose(inv_map, f)
    next_fft = Fft(inv_map.keys(), self.layers[1:])
    return interleave(next_fft.fft(f0), next_fft.fft(f1))

I know it doesn’t make sense yet; I just wanted to get it out of the way, since it’s so short.

We’ll start from the beginning, and using what we’ve laid out above, try to arrive at the same solution.

Our goal is to take a function \(f\) and return a vector of coefficients \(\vec{c}\) as weights of our basis functions \(\hat{b}_n\), so that the weighted sum reconstructs \(f\). (Along the way, we’ll need to define \(\hat{b}_n\) as well). So we need to define the left hand side of this equation:

\[ \sum_i \vec{c}^{(i)} \cdot \hat{b}_n^{(i)}(x) = f(x) \]

In the base case when \(n = 0\), we have a single point \(x_0\) in our domain. \(f\) simply maps \(x_0\) to a single value, so it can be fully described by a constant function that always returns that value \(f(x_0)\). Defining our single basis function as \(\hat{b}_0^{(0)}(x) = 1\), we can return \(f(x_0)\) as our coefficient or weight of this function, fulfilling our criteria that \(\sum_i \vec{c}^{(i)} \cdot \hat{b}_n^{(i)}(x) = \vec{c}^{(0)} \cdot \hat{b}_0^{(0)}(x) = f(x_0) \cdot 1 = f(x)\).

Otherwise, we’ll continue for \(n > 0\). Let’s start by applying our decompose function to \(f\) to split it into \(f_0\) and \(f_1\): “the part that doesn’t depend on \(t_n(x)\)” and “the part that does.”

\[ \sum_i \vec{c}^{(i)} \cdot \hat{b}_n^{(i)}(x) = f_0(\pi_n(x)) + t_n(x) \cdot f_1(\pi_n(x)) \]

So we started with the values of \(f(x)\) at every \(x \in D_n\), and now have the values of \(f_0(\pi_n(x))\) and \(f_1(\pi_n(x))\) at every \(x \in D_n\), or equivalently, the values of \(f_0(y)\) and \(f_1(y)\) at every point \(y \in D_{n-1}\).

Now we recur, but we need to be careful. We’re going to apply the next (smaller) FFT to \(f_0\) and \(f_1\), not \(f_0 \circ \pi_n\) and \(f_1 \circ \pi_n\). What I mean is that for \(f_0\), for example, the smaller FFT will return coefficients \(\vec{c}_0\) of the smaller basis \(\hat{b}_{n-1}\), which consists of functions of the smaller domain. In order to substitute it back into our equation we have to move the \(\pi_n\) inside to project our domain to the smaller domain, and likewise for \(f_1\):

\[ \sum_i \vec{c}^{(i)} \cdot \hat{b}_n^{(i)}(x) = \left[ \sum_i \vec{c}_0^{(i)} \cdot \hat{b}_{n-1}^{(i)}(\pi_n(x)) \right] + t_n(x) \cdot \left[ \sum_i \vec{c}_1^{(i)} \cdot \hat{b}_{n-1}^{(i)}(\pi_n(x)) \right] \]

Again - we applied the smaller FFT to \(f_0\) and \(f_1\), so in order for the results to accurately reconstruct \(f_0(\pi_n(x))\), we need to apply the smaller basis functions to \(\pi_n(x)\) instead of \(x\).

Now remember we’re trying to get expressions for \(\vec{c}\) and \(\hat{b}_n\) so our equation is always true. We’ll try to express the right hand side as a single sum, so start by moving the \(t_n(x)\) inside:

\[ \sum_i \vec{c}^{(i)} \cdot \hat{b}_n^{(i)}(x) = \sum_i \vec{c}_0^{(i)} \cdot \hat{b}_{n-1}^{(i)}(\pi_n(x)) + \sum_i \vec{c}_1^{(i)} \cdot t_n(x) \cdot \hat{b}_{n-1}^{(i)}(\pi_n(x)) \]

But… that’s really as far as we can get in the general case. We have two different inner-product-like sums, and the only way to structure \(\vec{c}\) and \(\hat{b}_n\) to make the sums always equal is to literally just concatenate, setting \(\vec{c} = \vec{c}_0 || \vec{c}_1\), and similarly for \(\hat{b}_n\), so that’s what we do.

Almost. Actually, we choose to interleave the elements of \(\vec{c}_0\) and \(\vec{c}_1\), but this or any other choice is arbitrary. The reason we usually choose to interleave is that \(t_n\) will usually be a degree-1 function, and \(\pi_n\) will usually be degree-2. So when we interleave, or place the terms from \(f_0\) into even indices and terms from \(f_1\) into odd indices:

\[ (\vec{c}^{(i)},\hat{b}_n^{(i)}(x)) = \begin{cases} (\vec{c}_0^{(k)}, \hat{b}_{n-1}^{(k)}(\pi_n(x))) & i = 2k \\ (\vec{c}_1^{(k)}, t_n(x) \cdot \hat{b}_{n-1}^{(k)}(\pi_n(x))) & i = 2k + 1 \\ \end{cases} \]

then the index will correspond to the degree of the basis function, assuming the same is true for \(\hat{b}_{n-1}\). In other words, assuming the degree of \(\hat{b}_{n-1}^{(k)}(x)\) is \(k\), the degree of \(\hat{b}_{n-1}^{(k)}(\pi_n(x))\) will be \(2k\), and the degree of \(t_n(x) \cdot \hat{b}_{n-1}^{(k)}(\pi_n(x))\) will be \(2k + 1\), exactly corresponding to our outer index \(i\).
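
For example, with the squaring layer \(\pi(x) = x^2\), \(t(x) = x\) used by the NTT below: starting from \(\hat{b}_{n-1} = [1, y]\), the even entries become \([1, x^2]\) and the odd entries become \([x \cdot 1, x \cdot x^2]\), so interleaving gives

\[ \hat{b}_n = [\,1,\; x,\; x^2,\; x^3\,], \]

with \(\deg \hat{b}_n^{(i)} = i\), as promised.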

Armed with all of the above math, you should now understand the last 2 lines of our algorithm:

    next_fft = Fft(inv_map.keys(), self.layers[1:])
    return interleave(next_fft.fft(f0), next_fft.fft(f1))

We can also add a method to apply each basis function of our FFT to a point, and return the results as a vector. This is convenient because we can now evaluate our coefficients at any point with dot(coeffs, fft.basis(pt)).

@bind(Fft)
def basis(self: Fft, pt):
    if not self.layers:
        return [1]
    domain, ((π, t), *next_layers) = self
    next_fft = Fft({ π(x) for x in domain }, next_layers)
    next_basis = next_fft.basis(π(pt))
    return interleave(
        next_basis,
        [t(pt)*b for b in next_basis],
    )
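
For example, this lets us evaluate the interpolant at points outside the original domain. A sketch, using make_ntt and GF17 from the Examples section below:

fft = make_ntt(2)
evals = dict(zip(fft.domain, rands(GF17, 4)))
coeffs = fft.fft(evals)
dot(coeffs, fft.basis(GF17(5)))  # the interpolant of evals, evaluated at the off-domain point 5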

At this point it’s also quite trivial to define the inverse FFT, which transforms coefficients into evaluations:

@bind(Fft)
def ifft(self: Fft, coeffs: list) -> dict:
    if not self.layers:
        return { x: coeffs[0] for x in self.domain }
    domain, ((π, t), *next_layers) = self
    next_fft = Fft({ π(x) for x in domain }, next_layers)
    f0 = next_fft.ifft(coeffs[0::2])
    f1 = next_fft.ifft(coeffs[1::2])
    return {
        x: f0[π(x)] + t(x) * f1[π(x)]
        for x in domain
    }

Decomposing the function

At each layer of the FFT, we decompose the function \(f\) (defined on \(D_n\)) into two functions \(f_0\) and \(f_1\), each defined on \(D_{n-1}\):

\[ f(X) = f_0(\pi_n(X)) + t_n(X) \cdot f_1(\pi_n(X)) \]

To do so, we’ll need to take advantage of the special properties of \(\pi_n\) and \(t_n\). Consider two points \(x_0\) and \(x_1\) in \(D_n\) that map to the same point in \(D_{n-1}\)

\[ \begin{align} f(x_0) &= f_0(\pi_n(x_0)) + t_n(x_0) \cdot f_1(\pi_n(x_0)) \\ f(x_1) &= f_0(\pi_n(x_1)) + t_n(x_1) \cdot f_1(\pi_n(x_1)) \end{align} \]

We know the values of \(f(x_0)\) and \(f(x_1)\), because \(f\) (either the function or a vector of all its evaluations) is given as an input to our algorithm, and we know \(t_n(x_0)\) and \(t_n(x_1)\) because \(t_n\) is a parameter of our FFT. That leaves 4 unknowns (the values of \(f_0\) and \(f_1\) at \(\pi_n(x_0)\) and at \(\pi_n(x_1)\)) in 2 equations, which is unsolvable… but wait! \(x_0\) and \(x_1\) map to the same point, meaning \(\pi_n(x_0) = \pi_n(x_1)\), which we’ll simply write as \(\pi_n(x)\). Rewritten:

\[ \begin{align} f(x_0) &= f_0(\pi_n(x)) + t_n(x_0) \cdot f_1(\pi_n(x)) \\ f(x_1) &= f_0(\pi_n(x)) + t_n(x_1) \cdot f_1(\pi_n(x)) \end{align} \]

Now we have 2 equations and 2 unknowns, for which we can solve. If you squint, you may recognize this as \(y = mx + b\), with \(m = f_1(\pi_n(x))\) and \(b = f_0(\pi_n(x))\), so you may already know how to solve this by hand. In my opinion it’s easier to view it as a matrix multiplication, for which we use the formula for 2x2 inversion:

\[\begin{align} \begin{bmatrix} f(x_0) \\ f(x_1) \end{bmatrix} &= \begin{bmatrix} 1 & t_n(x_0) \\ 1 & t_n(x_1) \end{bmatrix} \begin{bmatrix} f_0(\pi_n(x)) \\ f_1(\pi_n(x)) \end{bmatrix} \\ \begin{bmatrix} f_0(\pi_n(x)) \\ f_1(\pi_n(x)) \end{bmatrix} &= \begin{bmatrix} 1 & t_n(x_0) \\ 1 & t_n(x_1) \end{bmatrix}^{-1} \begin{bmatrix} f(x_0) \\ f(x_1) \end{bmatrix} \\ &= \left(\frac{1}{t_n(x_1) - t_n(x_0)}\right) \begin{bmatrix} t_n(x_1) & -t_n(x_0) \\ -1 & 1 \end{bmatrix} \begin{bmatrix} f(x_0) \\ f(x_1) \end{bmatrix} \\ &= \left(\frac{1}{t_n(x_1) - t_n(x_0)}\right) \begin{bmatrix} t_n(x_1) f(x_0) - t_n(x_0) f(x_1) \\ f(x_1) - f(x_0) \end{bmatrix} \end{align}\]

Which gives us explicit formulas for \(f_0(\pi_n(x))\) and \(f_1(\pi_n(x))\) in terms of \(f\) and \(t_n\), which we have access to. You can see why we needed \(t_n\) to distinguish the fibers of \(\pi_n\), guaranteeing \(t_n(x_0) \neq t_n(x_1)\), otherwise our matrix would not be invertible, giving a division by zero.

Or we could’ve just asked Sage to do it for us:

f_x0, f_x1, t_x0, t_x1 = SR.var('f_x0 f_x1 t_x0 t_x1')
(matrix([[1, t_x0],
         [1, t_x1]]).inverse() * vector([f_x0, f_x1])).simplify_rational()

\(\displaystyle \left(\frac{f_{x_{1}} t_{x_{0}} - f_{x_{0}} t_{x_{1}}}{t_{x_{0}} - t_{x_{1}}},\,\frac{f_{x_{0}} - f_{x_{1}}}{t_{x_{0}} - t_{x_{1}}}\right)\)

Note that our expression for \(f_1(\pi_n(x)) = \frac{f(x_1) - f(x_0)}{t_n(x_1) - t_n(x_0)}\) corresponds to computing the “slope” \(m\) as rise over run, and the expression for \(f_0(\pi_n(x))\) corresponds to the “y-intercept” \(b\) you would get if you plugged it back into “\(y=mx+b\)”.

If we precompute the inverse scalars \(\left(\frac{1}{t_n(x_1) - t_n(x_0)}\right)\) at all points of our domain, we can compute our two outputs with 2 multiplies for the numerator of \(f_0\), plus 1 subtraction and 1 multiply by the inverse scalar for each of \(f_0\) and \(f_1\): 4 multiplies and 2 subtractions in total.
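
A sketch of that precomputation (my own illustration, reusing the layout returned by build_inverse_map):

# one precomputed inverse scalar per output point of the layer
inv_scalars = {
    π_x: 1 / (t1 - t0)
    for π_x, ((x0, t0), (x1, t1)) in inv_map.items()
}
# then each fiber costs only multiplies and subtractions:
#   f0[π_x] = (f[x0]*t1 - f[x1]*t0) * inv_scalars[π_x]
#   f1[π_x] = (f[x1] - f[x0]) * inv_scalars[π_x]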

In practice, you hope to choose a twiddle \(t_n\) and clever domains that reduce this computation. For example, if we know that \(t_n(x_0) = -t_n(x_1)\) for all points of our domain, as in the multiplicative case, it’s significantly cheaper:

_.substitute(t_x1 = -t_x0).simplify_rational()

\(\displaystyle \left(\frac{1}{2} \, f_{x_{0}} + \frac{1}{2} \, f_{x_{1}},\,\frac{f_{x_{0}} - f_{x_{1}}}{2 \, t_{x_{0}}}\right)\)

Which can be computed in 1 add, 1 subtract and 2 multiplies. If you delay the scaling by \(\frac{1}{2}\) until the end, you can reduce that to 1 multiply, plus a final scaling which ends up being cheaper (\(N\) total scaling multiplies instead of \(N\log{N}\)).
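
As a sketch (mine, not the generic code path below), the butterfly for a layer with \(t_n(x_1) = -t_n(x_0)\), with the \(\frac{1}{2}\) factors deferred:

# hypothetical butterfly for the t(x1) = -t(x0) case;
# inv_t0 = 1/t(x0) is precomputed, and every output is scaled by 1/2^n once at the very end
def butterfly(f_x0, f_x1, inv_t0):
    return (f_x0 + f_x1,             # 2 * f0(π(x))
            (f_x0 - f_x1) * inv_t0)  # 2 * f1(π(x))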

So here’s decompose:

def decompose(inv_map, f):
    f0 = {
        π_x: (f[x0] * t1 - f[x1] * t0) / (t1 - t0)
        for π_x, ((x0, t0), (x1, t1)) in inv_map.items()
    }
    f1 = {
        π_x: (f[x1] - f[x0]) / (t1 - t0)
        for π_x, ((x0, t0), (x1, t1)) in inv_map.items()
    }
    return f0, f1

Again but all in one place

Here’s what it looks like if you combine our three functions into one. I find it peaceful to meditate on this snippet of code. Print it out and tuck it under your pillow for good luck.

@bind(Fft)
def fft(self: Fft, f: dict):
    if not self.layers:
        return [f[list(self.domain)[0]]]
    domain, ((π, t), *next_layers) = self
    inv_map = defaultdict(lambda: [])
    for x in domain:
        inv_map[π(x)].append(x)
    f0 = {
        π_x: (f[x0] * t(x1) - f[x1] * t(x0)) / (t(x1) - t(x0))
        for π_x, (x0, x1) in inv_map.items()
    }
    f1 = {
        π_x: (f[x1] - f[x0]) / (t(x1) - t(x0))
        for π_x, (x0, x1) in inv_map.items()
    }
    next_fft = Fft(inv_map.keys(), next_layers)
    return interleave(next_fft.fft(f0), next_fft.fft(f1))

At each layer, decomposition is linear in \(N\), and then we recur twice on problems half the size. The number of layers is \(\log N\), so total runtime is \(N\log N\).
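
In recurrence form:

\[ T(N) = 2\,T(N/2) + O(N) \quad\Longrightarrow\quad T(N) = O(N \log N) \]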

Matrix representation

Since the FFT is a change of basis, i.e. a linear transformation of a function’s evaluations, it can be represented by a matrix applied to those evaluations. Applying an \(N \times N\) matrix to a vector takes \(N^2\) time, so the performance is much worse, but we can use this representation for other purposes, such as calculating the minimum distance of our FFT when used as a linear code.

The columns of our matrix come from applying the FFT to “one-hot” vectors \((0, \dots, 1, \dots, 0)\), which you can think of as the Lagrange basis of evaluations being mapped to the FFT basis.

one_hot = lambda n, i: vector([1 if j == i else 0 for j in range(n)])

@bind(Fft)
def fft_matrix(self: Fft):
    domain = list(self.domain)
    N = len(domain)
    return matrix.column([
        self.fft(dict(zip(domain, one_hot(N, i))))
        for i in range(N)
    ])
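
As a sanity check of my own (using make_ntt from the Examples below, whose basis is \(1, X, X^2, \dots\)), the FFT matrix should be the inverse of the Vandermonde matrix of the domain:

fft = make_ntt(2)
V = matrix([[x^i for i in range(4)] for x in fft.domain])  # V[j][i] = x_j^i, i.e. the IFFT matrix
assert fft.fft_matrix() * V == identity_matrix(GF17, 4)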

Examples

Each of these deserves its own blog post, but at the moment I’ll just drop a reference and show that they work for their choices of \(D\), \(\pi\) and \(t\).

def demo_fft(field, sizes, fft_factory, symbolic_point):
    for n in sizes:
        print(f'n={n}:')
        fft = fft_factory(n)
        print(f'  basis:', fft.basis(symbolic_point))
        
        set_random_seed(0)
        evals = dict(zip(fft.domain, rands(field, 2^n)))

        # FFT, then evaluating at each original point yields our original values        
        coeffs = fft.fft(evals)
        evals2 = { x: dot(coeffs, fft.basis(x)) for x in fft.domain }
        assert evals == evals2

        # IFFT works
        evals3 = fft.ifft(coeffs)
        assert evals == evals3

        # FFT matrix correctly calculates coefficients
        M = fft.fft_matrix()
        assert M * vector([evals[x] for x in fft.domain]) == vector(coeffs)
        
        print('  tests succeeded!')

        IM = M.inverse().T[:2^(n-1)]  # generator matrix: evaluations of the first half of the basis (a rate-1/2 code)
        # computing the minimum distance is very slow
        dist = LinearCode(IM).minimum_distance()
        print('  code minimum distance:', dist, f'(best possible is {2^(n-1)+1})')

Multiplicative FFT

The classical FFT works in multiplicative subgroups of \(\mathbb{C}\), with \(e^{2 \pi i / k}\) a primitive \(k\)th root of unity generating the subgroup of size \(k\). This is actually kind of annoying to show off, due to precision issues. If we work with floating point numbers, our \(\pi\) map isn’t exactly two-to-one, and our results won’t be exact. So instead I’ll show off the NTT, which is the exact same algorithm, but applied to the multiplicative subgroup of a finite field.

Note the basis is exactly what you would expect.

SquaringLayer = Layer(
    π = lambda x: x^2,
    t = lambda x: x,
)

GF17 = GF(2^4 + 1)

def make_ntt(n):
    gen = GF17.multiplicative_generator() ^ (2^(4-n))
    return Fft(
        domain = take(2^n, powers(gen)),
        layers = [SquaringLayer] * n,
    )

demo_fft(GF17, [2, 3], make_ntt, polygen(GF17, 'X'))
n=2:
  basis: [1, X, X^2, X^3]
  tests succeeded!
  code minimum distance: 3 (best possible is 3)
n=3:
  basis: [1, X, X^2, X^3, X^4, X^5, X^6, X^7]
  tests succeeded!
  code minimum distance: 5 (best possible is 5)

Circle FFT

From Circle STARKs.

CircleYLayer = Layer(
    π = lambda xy: xy[0],  # forget the y-coordinate: (x, y) ↦ x
    t = lambda xy: xy[1],  # y distinguishes (x, y) from (x, -y)
)
CircleXLayer = Layer(
    π = lambda x: 2*x^2 - 1,  # x-coordinate of the circle doubling map
    t = lambda x: x,
)

GF127 = GF(2^7 - 1)
C127.<i> = GF127.extension(polygen(GF127)^2 + 1)
circle_gen = C127.multiplicative_generator()^(GF127.order()-1)  # generator of the circle group, which has order p+1 = 2^7

def make_cfft(n):
    gen = circle_gen^(2^(7-n-1))
    return Fft(
        domain = take(2^n, powers(gen^2, start=gen)),
        layers = [CircleYLayer] + [CircleXLayer] * (n-1),
    )

demo_fft(GF127, [2, 3], make_cfft, polygens(GF127, 'X,Y'))
n=2:
  basis: [1, Y, X, X*Y]
  tests succeeded!
  code minimum distance: 2 (best possible is 3)
n=3:
  basis: [1, Y, X, X*Y, 2*X^2 - 1, 2*X^2*Y - Y, 2*X^3 - X, 2*X^3*Y - X*Y]
  tests succeeded!
  code minimum distance: 4 (best possible is 5)

Additive NTT

Originally from Novel Polynomial Basis and Its Application to Reed-Solomon Erasure Codes, here I’m using the parameters from FRI-Binius.

GF256 = GF(2^8, repr='int')

subspace = lambda n: [GF256.from_integer(i) for i in range(2^n)]  # the F2-span of beta(0), ..., beta(n-1)
beta = lambda i: GF256.from_integer(2^i)  # i-th basis element of the subspace chain
W = lambda i, x: product(x - u for u in subspace(i))  # vanishing polynomial of subspace(i)
q = lambda i, x: (W(i, beta(i))^2 / W(i+1, beta(i+1))) * x * (x + 1)  # normalized two-to-one map for layer i (x*(x+1) = x^2 + x in char 2)

def make_additive(n):
    return Fft(
        domain = subspace(n),
        layers = [
            Layer(
                # i=i to work around lambda capture quirk
                π = lambda x, i=i: q(i, x),
                t = lambda x: x,
            )
            for i in range(n)
        ]
    )

demo_fft(GF256, [2, 3], make_additive, polygen(GF256, 'X'))
n=2:
  basis: [1, X, 122*X^2 + 122*X, 122*X^3 + 122*X^2]
  tests succeeded!
  code minimum distance: 3 (best possible is 3)
n=3:
  basis: [1, X, 122*X^2 + 122*X, 122*X^3 + 122*X^2, 251*X^4 + 219*X^2 + 32*X, 251*X^5 + 219*X^3 + 32*X^2, 81*X^6 + 81*X^5 + 170*X^4 + 81*X^3 + 251*X^2, 81*X^7 + 81*X^6 + 170*X^5 + 81*X^4 + 251*X^3]
  tests succeeded!
  code minimum distance: 5 (best possible is 5)