# OpenCL Fast Fourier Transform

Eric Bainville - May 2010, updated March 2011## Fast Fourier Transform algorithms

### Introduction

The Fast Fourier Transform is a family of algorithms computing the Discrete Fourier Transform (DFT) in time O(N.log(N)). The most commonly used FFT algorithms use a divide-and-conquer approach similar to the algorithm of Cooley and Tukey (1965). The Wikipedia FFT page is a good starting point for the interested reader.

The computation of a DFT of length N is done by splitting the input sequence into a fixed small number of smaller subsequences, compute their DFT, and assemble the outputs to build the final sequence. Providing the split and assembly steps have linear cost, the complexity of the algorithm will be O(N.log(N)).

One of the simplest subdivision is to take the sub-sequence xe with even indices 0,2,4,...,N-2, and
xo with odd indices 1,3,...,N-1, supposing N is even. Compute the DFT of each sub-sequence,
ye=DFT_{N/2}(θ^{2},xe) and yo=DFT_{N/2}(θ^{2},xo). Then we can build y=DFT_{N}(θ,x) as follows:

y_{k} = ye_{k} + θ^{k} yo_{k}, and

y_{k+N/2} = ye_{k} - θ^{k} yo_{k}, for k=0..N/2-1.

### More details on subdivisions

In this section, we enter into more details about the various subdivision schemes used in FFT, known as decimation in frequency (DIF), and decimation in time (DIT). Both actually refer to the same algorithm.

Consider a sequence x=(x_{0},x_{1},...,x_{N-1}) of length N, and a factorization N=P*Q.
We denote the DFT of x of length N by DFT_{N}(x) or DFT_{N}(x_{0},x_{1},...,x_{N-1}).
Let θ=e^{-2*i*Π/N}, y=DFT_{N}(x) is by definition:

y_{k} = θ^{0.k}.x_{0} + θ^{1.k}.x_{1} + θ^{2.k}.x_{2} + ... (N terms).

We consider the P subsequences of x with length Q defined by elements with the same index modulo P:

x[u]=(x_{u},x_{u+P},x_{u+2.P},...,x_{u+(Q-1).P}) with u=0..P-1.

Highlighting the terms of x[u] in y_{k}, we get:

y_{k} = ... + θ^{u.k}.x_{u} + ... + θ^{(u+P).k}.x_{u+P} + ... + θ^{(u+2.P).k}.x_{u+2.P} + ..., or

y_{k} = Σ_{u} θ^{u.k} (x_{u} + θ^{P.k}.x_{u+P} + θ^{2.P.k}.x_{u+2.P} + ...), and finally

y_{k} = Σ_{u} θ^{u.k} DFT_{Q}(x[u])_{k}, where element k of a DFT of length Q is taken modulo Q.

Let's group together the P values of k using the same element v of y[u]=DFT_{Q}(x[u]), by expressing k as k=v+Q.j,
with v=0..Q-1 and j=0..P-1:

y_{v+Q.j} = Σ_{u} θ^{u.(v+Q.j)} y[u]_{v}, or

y_{v+Q.j} = Σ_{u} θ^{Q.u.j} (θ^{u.v} y[u]_{v}), and finally

y_{v+Q.j} = DFT_{P}(θ^{0.v} y[0]_{v},θ^{1.v} y[1]_{v},...,θ^{(P-1).v} y[P-1]_{v})_{j}.

We have shown that the DFT_{N} can be computed by:

- compute P×DFT_{Q},

- scale all the N resulting values by the "twiddle factors" θ^{u.v},

- compute Q×DFT_{P}.

This algorithm is called either radix-P decimation in time (DIT), or radix-Q decimation in frequency (DIF). Usually,
the name is chosen according to the smallest value between P and Q. A radix-2 DIT algorithm corresponds to P=2
and Q=N/2; it starts with 2 DFT_{N/2}. A radix-2 DIF algorithm corresponds to P=N/2 and Q=2; it starts with N/2 DFT_{2}.

In the next page, Reference implementations, we show how the performance of the reference implementations is measured.

OpenCL FFT : DFT | Top of Page | OpenCL FFT : Reference implementations |