OpenCL Fast Fourier Transform
Eric Bainville - May 2010, updated March 2011Reference implementations
We will use several reference implementations to check our results, and compare execution times: FFTW and Intel MKL on the CPU, and NVidia CUFFT on the GPU. The following code snippets show how FFT was computed using the various API. n is the size of the DFT to compute, x[2*n] the input array, and y[2*n] the output array.
// FFTW API: fftwf_plan p1 = fftwf_plan_dft_1d(n,(fftwf_complex *)x,(fftwf_complex *)y,FFTW_FORWARD,FFTW_ESTIMATE); fftwf_execute(p1); fftwf_destroy_plan(p1); // Intel MKL native API (DFTI): DFTI_DESCRIPTOR_HANDLE h; DftiCreateDescriptor(&h,DFTI_SINGLE,DFTI_COMPLEX,1,n); DftiSetValue(h,DFTI_PLACEMENT,DFTI_NOT_INPLACE); DftiCommitDescriptor(h); DftiComputeForward(h,(void *)x,y); DftiFreeDescriptor(&h); // CUFFT API: cufftHandle plan; cufftComplex * inData = 0; cufftComplex * outData = 0; size_t dataSize = sizeof(cufftComplex) * n; cufftPlan1d(&plan,n,CUFFT_C2C,1); cudaMalloc((void **)(&inData),dataSize); cudaMalloc((void **)(&outData),dataSize); cudaMemcpy(inData,x,dataSize,cudaMemcpyHostToDevice); // write X cufftExecC2C(plan,inData,outData,CUFFT_FORWARD); // compute cudaMemcpy(y,outData,dataSize,cudaMemcpyDeviceToHost); // read Y cudaDeviceSynchronize(); cufftDestroy(plan); cudaFree(inData); cudaFree(outData);
Before starting to write FFT kernels, we will see in the next page how to handle the single/double precision selection: Single and double precision.
OpenCL FFT : FFT | Top of Page | OpenCL FFT : Single and double precision |