OpenCL Fast Fourier Transform

Eric Bainville - May 2010, updated March 2011


In this article, we will experiment various OpenCL implementations of one-dimensional Fast Fourier Transform (FFT) algorithms.

Since I wrote the first version of this page in May 2010, new generations of GPU have emerged (AMD HD 6xxx and NVIDIA GTX 4xx and 5xx), and CUFFT got a little faster. It's time for a refresh of both the code and the article.

The refresh of this page is still in progress. Meanwhile, please check out the "old" FFT page: OpenCL FFT (OLD)


  1. Introduction - Introduction.
  2. DFT - Discrete Fourier Transform.
  3. FFT - Fast Fourier Transform algorithms.
  4. Reference implementations - FFTW, Intel MKL, and NVidia CUFFT.
  5. Single and double precision - Kernels for 'float' and 'double'.
  6. Radix-2 kernel - Simple radix 2 OpenCL kernel.
  7. Higher radix kernels - Extension to radix 4,8,16, and 32 kernels.
  8. Benchmarks - Benchmarks of the various kernels and variants.



I moved all benchmarks to a single page, easier to update: Benchmarks. Feel free to download and compile the code, and send me your benchmark results. I may include them in this page.