OpenCL Fast Fourier Transform
Eric Bainville - May 2010, updated March 2011Introduction
In this article, we will experiment various OpenCL implementations of one-dimensional Fast Fourier Transform (FFT) algorithms.
Since I wrote the first version of this page in May 2010, new generations of GPU have emerged (AMD HD 6xxx and NVIDIA GTX 4xx and 5xx), and CUFFT got a little faster. It's time for a refresh of both the code and the article.
The refresh of this page is still in progress. Meanwhile, please check out the "old" FFT page: OpenCL FFT (OLD)
Contents
- Introduction - Introduction.
- DFT - Discrete Fourier Transform.
- FFT - Fast Fourier Transform algorithms.
- Reference implementations - FFTW, Intel MKL, and NVidia CUFFT.
- Single and double precision - Kernels for 'float' and 'double'.
- Radix-2 kernel - Simple radix 2 OpenCL kernel.
- Higher radix kernels - Extension to radix 4,8,16, and 32 kernels.
- Benchmarks - Benchmarks of the various kernels and variants.
References
- Discrete Fourier Transform Wikipedia entry.
- Fast Fourier Transform Wikipedia entry.
- Vasily Volkov's page provides links to several GPU optimized numerical algorithms, including the FFT.
- Apple OpenCL FFT sample code.
Benchmarks
I moved all benchmarks to a single page, easier to update: Benchmarks. Feel free to download and compile the code, and send me your benchmark results. I may include them in this page.
OpenCL Sorting | Top of Page | OpenCL FFT : DFT |