cuFFT Library
Release 12.3
NVIDIA
2.3.6 cufftMakePlanMany() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.7 cufftMakePlanMany64() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3.8 cufftXtMakePlanMany() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4 cuFFT Estimated Size of Work Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4.1 cufftEstimate1d() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.2 cufftEstimate2d() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.3 cufftEstimate3d() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.4.4 cufftEstimateMany() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5 cuFFT Refined Estimated Size of Work Area . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5.1 cufftGetSize1d() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5.2 cufftGetSize2d() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.5.3 cufftGetSize3d() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.5.4 cufftGetSizeMany() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.5.5 cufftGetSizeMany64() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5.6 cufftXtGetSizeMany() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.6 cufftGetSize() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.7 cuFFT Caller Allocated Work Area Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.7.1 cufftSetAutoAllocation() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.7.2 cufftSetWorkArea() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.7.3 cufftXtSetWorkAreaPolicy() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.8 cuFFT Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.8.1 cufftExecC2C() and cufftExecZ2Z() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.8.2 cufftExecR2C() and cufftExecD2Z() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.8.3 cufftExecC2R() and cufftExecZ2D() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.8.4 cufftXtExec() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.8.5 cufftXtExecDescriptor() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.9 cuFFT and Multiple GPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.9.1 cufftXtSetGPUs() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.9.2 cufftXtSetWorkArea() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.9.3 cuFFT Multiple GPU Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.9.3.1 cufftXtExecDescriptorC2C() and cufftXtExecDescriptorZ2Z() . . . . . . . . . . . . 53
2.9.3.2 cufftXtExecDescriptorR2C() and cufftXtExecDescriptorD2Z() . . . . . . . . . . . 54
2.9.3.3 cufftXtExecDescriptorC2R() and cufftXtExecDescriptorZ2D() . . . . . . . . . . . 55
2.9.4 Memory Allocation and Data Movement Functions . . . . . . . . . . . . . . . . . . . . . 55
2.9.4.1 cufftXtMalloc() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.9.4.2 cufftXtFree() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.9.4.3 cufftXtMemcpy() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.9.5 General Multiple GPU Descriptor Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.9.5.1 cudaXtDesc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.9.5.2 cudaLibXtDesc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.10 cuFFT Callbacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.10.1 cufftXtSetCallback() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.10.2 cufftXtClearCallback() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.10.3 cufftXtSetCallbackSharedSize() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.11 cufftSetStream() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.12 cufftGetVersion() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.13 cufftGetProperty() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.14 cuFFT Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.14.1 Parameter cufftType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.14.2 Parameters for Transform Direction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.14.3 Type definitions for callbacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.14.4 Other cuFFT Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.14.4.1 cufftHandle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.14.4.2 cufftReal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.14.4.3 cufftDoubleReal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.14.4.4 cufftComplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.14.4.5 cufftDoubleComplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.15 Common types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.15.1 cudaDataType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.15.2 libraryPropertyType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7 Deprecated Functionality 77
8 Notices 79
8.1 Notice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
8.2 OpenCL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
8.3 Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Index 81
The API reference guide for cuFFT, the CUDA Fast Fourier Transform library.
This document describes cuFFT, the NVIDIA® CUDA® Fast Fourier Transform (FFT) product. It consists
of two separate libraries: cuFFT and cuFFTW. The cuFFT library is designed to provide high perfor-
mance on NVIDIA GPUs. The cuFFTW library is provided as a porting tool to enable users of FFTW to
start using NVIDIA GPUs with a minimum amount of effort.
The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of
complex or real-valued data sets. It is one of the most important and widely used numerical algorithms
in computational physics and general signal processing. The cuFFT library provides a simple interface
for computing FFTs on an NVIDIA GPU, which allows users to quickly leverage the floating-point power
and parallelism of the GPU in a highly optimized and tested FFT library.
The cuFFT product supports a wide range of FFT inputs and options efficiently on NVIDIA GPUs. This
version of the cuFFT library supports the following features:
▶ Algorithms highly optimized for input sizes that can be written in the form 2^a × 3^b × 5^c × 7^d. In general the smaller the prime factor, the better the performance, i.e., powers of two are fastest.
▶ An O(n log n) algorithm for every input data size
▶ Half-precision (16-bit floating point), single-precision (32-bit floating point) and double-precision
(64-bit floating point). Transforms of lower precision have higher performance.
▶ Complex and real-valued input and output. Real-valued input or output requires fewer computations and less data than complex values and often has a faster time to solution. Types supported are:
▶ C2C - Complex input to complex output
▶ R2C - Real input to complex output
▶ C2R - Symmetric complex input to real output
▶ 1D, 2D and 3D transforms
▶ Execution of multiple 1D, 2D and 3D transforms simultaneously. These batched transforms have
higher performance than single transforms.
▶ In-place and out-of-place transforms
▶ Arbitrary intra- and inter-dimension element strides (strided layout)
▶ FFTW compatible data layout
▶ Execution of transforms across multiple GPUs
▶ Streamed execution, enabling asynchronous computation and data movement
The cuFFTW library provides the FFTW3 API to facilitate porting of existing FFTW applications.
Please note that starting from CUDA 11.0, the minimum supported GPU architecture is SM35. See
Deprecated Functionality.
Chapter 1. Using the cuFFT API
This chapter provides a general overview of the cuFFT library API. For more complete information on
specific functions, see cuFFT API Reference. Users are encouraged to read this chapter before con-
tinuing with more detailed descriptions.
The Discrete Fourier transform (DFT) maps a complex-valued vector $x_k$ (time domain) into its frequency domain representation given by:

$$X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i \frac{kn}{N}}$$

where $X_k$ is a complex-valued vector of the same size. This is known as a forward DFT. If the sign on the exponent of $e$ is changed to be positive, the transform is an inverse transform. Depending on $N$, different algorithms are deployed for the best performance.
The cuFFT API is modeled after FFTW, which is one of the most popular and efficient CPU-based FFT
libraries. cuFFT provides a simple configuration mechanism called a plan that uses internal building
blocks to optimize the transform for the given configuration and the particular GPU hardware selected.
Then, when the execution function is called, the actual transform takes place following the plan of
execution. The advantage of this approach is that once the user creates a plan, the library retains
whatever state is needed to execute the plan multiple times without recalculation of the configuration.
This model works well for cuFFT because different kinds of FFTs require different thread configurations
and GPU resources, and the plan interface provides a simple way of reusing configurations.
Computing a number BATCH of one-dimensional DFTs of size NX using cuFFT will typically look like this:
#define NX 256
#define BATCH 10
#define RANK 1
...
{
    cufftHandle plan;
    cufftComplex *data;
    int n[RANK] = { NX };
    ...
    cudaMalloc((void**)&data, sizeof(cufftComplex)*NX*BATCH);
    /* NULL inembed/onembed selects the basic (contiguous) data layout */
    cufftPlanMany(&plan, RANK, n,
                  NULL, 1, NX,   /* inembed, istride, idist */
                  NULL, 1, NX,   /* onembed, ostride, odist */
                  CUFFT_C2C, BATCH);
    ...
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);
    cudaDeviceSynchronize();
    ...
    cufftDestroy(plan);
    cudaFree(data);
}
The most common case is for developers to modify an existing CUDA routine (for example, filename.cu) to call cuFFT routines. In this case the include file cufft.h or cufftXt.h should be inserted into the filename.cu file and the library included in the link line. A single compile and link line might appear as
▶ /usr/local/cuda/bin/nvcc [options] filename.cu … -I/usr/local/cuda/inc -L/usr/local/cuda/lib -lcufft
Of course there will typically be many compile lines and the compiler g++ may be used for linking so
long as the library path is set correctly.
Users of the FFTW interface (see FFTW Interface to cuFFT) should include cufftw.h and link with
both cuFFT and cuFFTW libraries.
Functions in the cuFFT and cuFFTW library assume that the data is in GPU visible memory. This means
any memory allocated by cudaMalloc, cudaMallocHost and cudaMallocManaged or registered with
cudaHostRegister can be used as input, output or plan work area with cuFFT and cuFFTW functions.
For the best performance, input data, output data and plan work area should reside in device memory.
The cuFFTW library also supports input data and output data that is not GPU visible.
Among the plan creation functions, cufftPlanMany() allows use of more complicated data layouts
and batched executions. Execution of a transform of a particular size and type may take several stages
of processing. When a plan for the transform is generated, cuFFT derives the internal steps that need
to be taken. These steps may include multiple kernel launches, memory copies, and so on. In addition,
all the intermediate buffer allocations (on CPU/GPU memory) take place during planning. These buffers
are released when the plan is destroyed. In the worst case, the cuFFT Library allocates space for
8*batch*n[0]*..*n[rank-1] cufftComplex or cufftDoubleComplex elements (where batch
denotes the number of transforms that will be executed in parallel, rank is the number of dimensions
of the input data (see Multidimensional Transforms) and n[] is the array of transform dimensions)
for single and double-precision transforms respectively. Depending on the configuration of the plan,
less memory may be used. In some specific cases, the temporary space allocations can be as low as
1*batch*n[0]*..*n[rank-1] cufftComplex or cufftDoubleComplex elements. This temporary
space is allocated separately for each individual plan when it is created (i.e., temporary space is not
shared between the plans).
The next step in using the library is to call an execution function such as cufftExecC2C() (see Pa-
rameter cufftType) which will perform the transform with the specifications defined at planning.
One can create a cuFFT plan and perform multiple transforms on different data sets by providing
different input and output pointers. Once the plan is no longer needed, the cufftDestroy() function
should be called to release the resources allocated for the plan.
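For illustration, a minimal sketch of this reuse pattern follows; d_a and d_b are assumed device buffers of NX cufftComplex elements allocated elsewhere:

cufftHandle plan;
cufftPlan1d(&plan, NX, CUFFT_C2C, 1);

cufftExecC2C(plan, d_a, d_a, CUFFT_FORWARD);   /* in-place on the first data set */
cufftExecC2C(plan, d_b, d_b, CUFFT_INVERSE);   /* same plan, a different buffer */

cufftDestroy(plan);   /* release plan resources when no longer needed */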
Note: Complex-to-real (C2R) transforms accept complex-Hermitian input. For one-dimensional signals, this requires the 0th element (and the $\frac{N}{2}$-th input if $N$ is even) to be real-valued, i.e. its imaginary part should be zero. For $d$-dimensional signals, this means $x_{(n_1,n_2,\dots,n_d)} = x^*_{(N_1-n_1,N_2-n_2,\dots,N_d-n_d)}$. Otherwise, the behavior of the transform is undefined. Also see Multidimensional Transforms.

In-place complex-to-real FFTs may overwrite arbitrary imaginary input point values when non-unit input and output strides are chosen. Out-of-place complex-to-real FFT will always overwrite the input buffer. For out-of-place transforms, input and output sizes match the logical transform non-redundant size $\lfloor\frac{N}{2}\rfloor + 1$ and size $N$, respectively.
Multidimensional DFTs map a $d$-dimensional array $x_n$, where $n = (n_1, n_2, \dots, n_d)$, into its frequency domain array given by:

$$X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i \frac{kn}{N}}$$

where $\frac{n}{N} = \left(\frac{n_1}{N_1}, \frac{n_2}{N_2}, \dots, \frac{n_d}{N_d}\right)$, and the summation denotes the set of nested summations

$$\sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} \cdots \sum_{n_d=0}^{N_d-1}$$
cuFFT supports one-dimensional, two-dimensional and three-dimensional transforms, which can all
be called by the same cufftExec* functions (see Fourier Transform Types).
Similar to the one-dimensional case, the frequency domain representation of real-valued input data satisfies Hermitian symmetry, defined as: $x_{(n_1,n_2,\dots,n_d)} = x^*_{(N_1-n_1,N_2-n_2,\dots,N_d-n_d)}$.
C2R and R2C algorithms take advantage of this fact by operating only on half of the elements of the signal array, namely on: $x_n$ for $n \in \{1,\dots,N_1\} \times \dots \times \{1,\dots,N_{d-1}\} \times \{1,\dots,\lfloor\frac{N_d}{2}\rfloor+1\}$.
The general rules of data alignment described in Data Layout apply to higher-dimensional transforms.
The following table summarizes input and output data sizes for multidimensional DFTs:

Dims  FFT type  Input data size                 Output data size
1D    C2C       N1 cufftComplex                 N1 cufftComplex
1D    C2R       ⌊N1/2⌋+1 cufftComplex           N1 cufftReal
1D    R2C       N1 cufftReal                    ⌊N1/2⌋+1 cufftComplex
2D    C2C       N1*N2 cufftComplex              N1*N2 cufftComplex
2D    C2R       N1*(⌊N2/2⌋+1) cufftComplex      N1*N2 cufftReal
2D    R2C       N1*N2 cufftReal                 N1*(⌊N2/2⌋+1) cufftComplex
3D    C2C       N1*N2*N3 cufftComplex           N1*N2*N3 cufftComplex
3D    C2R       N1*N2*(⌊N3/2⌋+1) cufftComplex   N1*N2*N3 cufftReal
3D    R2C       N1*N2*N3 cufftReal              N1*N2*(⌊N3/2⌋+1) cufftComplex
For example, static declaration of a three-dimensional array for the output of an out-of-place real-to-
complex transform will look like this:
cufftComplex odata[N1][N2][N3/2+1];
Passing inembed or onembed set to NULL is a special case and is equivalent to passing n for each. This is the same as the basic data layout; other advanced parameters such as istride are ignored.
If the advanced parameters are to be used, then all of the advanced interface parameters must be
specified correctly. Advanced parameters are defined in units of the relevant data type (cufftReal,
cufftDoubleReal, cufftComplex, or cufftDoubleComplex).
Advanced layout can be perceived as an additional layer of abstraction above the access to in-
put/output data arrays. An element of coordinates [z][y][x] in signal number b in the batch will
be associated with the following addresses in the memory:
▶ 1D
input[ b * idist + x * istride ]
output[ b * odist + x * ostride ]
▶ 2D
input[ b * idist + (x * inembed[1] + y) * istride ]
output[ b * odist + (x * onembed[1] + y) * ostride ]
▶ 3D
input[ b * idist + ((x * inembed[1] + y) * inembed[2] + z) * istride ]
output[ b * odist + ((x * onembed[1] + y) * onembed[2] + z) * ostride ]
The istride and ostride parameters denote the distance between two successive input and output
elements in the least significant (that is, the innermost) dimension respectively. In a single 1D trans-
form, if every input element is to be used in the transform, istride should be set to 1; if every other
input element is to be used in the transform, then istride should be set to 2. Similarly, in a single
1D transform, if it is desired to output final elements one after another compactly, ostride should
be set to 1; if spacing is desired between the least significant dimension output data, ostride should
be set to the distance between the elements.
The inembed and onembed parameters define the number of elements in each dimension in the
input array and the output array respectively. The inembed[rank-1] contains the number of el-
ements in the least significant (innermost) dimension of the input data excluding the istride el-
ements; the number of total elements in the least significant dimension of the input array is then
istride*inembed[rank-1]. The inembed[0] or onembed[0] corresponds to the most significant
(that is, the outermost) dimension and is effectively ignored since the idist or odist parameter pro-
vides this information instead. Note that the size of each dimension of the transform should be less than or equal to the inembed and onembed values for the corresponding dimension, that is n[i] ≤ inembed[i], n[i] ≤ onembed[i], where i ∈ {0, . . . , rank − 1}.
The idist and odist parameters indicate the distance between the first element of two consecutive
batches in the input and output data.
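As an illustration of these parameters, the following sketch (with assumed constants NX and BATCH) plans a batch of 1D C2C transforms that reads every other input element (istride = 2) and writes the output compactly (ostride = 1):

cufftHandle plan;
int n[1]       = { NX };
int inembed[1] = { NX };   /* elements per innermost dimension, excluding stride */
int onembed[1] = { NX };

/* Total innermost input elements per signal: istride * inembed[0] = 2*NX,
   so consecutive signals start 2*NX elements apart (idist = 2*NX). */
cufftPlanMany(&plan, 1, n,
              inembed, 2, 2 * NX,   /* inembed, istride, idist */
              onembed, 1, NX,       /* onembed, ostride, odist */
              CUFFT_C2C, BATCH);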
▶ cufftMakePlan{1d,2d,3d,Many}() - create the plan. These are the same functions used in
the single GPU case although the definition of the argument workSize reflects the number of
GPUs used.
▶ Optional: cufftGetSize{1d,2d,3d,Many}() - refined estimate of the sizes of the work areas
required. These are the same functions used in the single GPU case although the definition of
the argument workSize reflects the number of GPUs used.
▶ Optional: cufftGetSize() - check workspace size. This is the same function used in the single
GPU case although the definition of the argument workSize reflects the number of GPUs used.
▶ Optional: cufftXtSetWorkArea() - do your own workspace allocation.
▶ cufftXtMalloc() - allocate descriptor and data on the GPUs
▶ cufftXtMemcpy() - copy data to the GPUs
▶ cufftXtExecDescriptorC2C()∕cufftXtExecDescriptorZ2Z() - execute the plan
▶ cufftXtMemcpy() - copy data from the GPUs
▶ cufftXtFree() - free any memory allocated with cufftXtMalloc()
▶ cufftDestroy() - free cuFFT plan resources
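Put together, a minimal two-GPU sketch of the steps above might look as follows; h_signal and h_result are assumed host buffers, and NX an assumed power-of-2 transform size:

cufftHandle plan;
cudaLibXtDesc *d_data;
int gpus[2] = { 0, 1 };   /* device IDs to use */
size_t workSizes[2];      /* one work size per GPU */

cufftCreate(&plan);
cufftXtSetGPUs(plan, 2, gpus);
cufftMakePlan1d(plan, NX, CUFFT_C2C, 1, workSizes);

cufftXtMalloc(plan, &d_data, CUFFT_XT_FORMAT_INPLACE);
cufftXtMemcpy(plan, d_data, h_signal, CUFFT_COPY_HOST_TO_DEVICE);
cufftXtExecDescriptorC2C(plan, d_data, d_data, CUFFT_FORWARD);
cufftXtMemcpy(plan, h_result, d_data, CUFFT_COPY_DEVICE_TO_HOST);

cufftXtFree(d_data);
cufftDestroy(plan);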
Supported size configurations for single-transform (batch=1) multiple GPU C2C/Z2Z plans:

1D:
▶ 2, 4, 8, or 16 GPUs
▶ power of 2 sizes only
▶ minimum size for 2-4 GPUs is 64
▶ minimum size for 8 GPUs is 128
▶ minimum size for 16 GPUs is 1024

2D and 3D:
▶ 2-16 GPUs
▶ One of the following conditions is met for each dimension:
  ▶ dimension must factor into primes less than or equal to 127
  ▶ maximum dimension size is 4096 for single precision
  ▶ maximum dimension size is 2048 for double precision
  ▶ minimum size is 32
Producing natural order results in GPU memory for multi-GPU runs in the 1D single transform case requires calling cufftXtMemcpy() with CUFFT_COPY_DEVICE_TO_DEVICE.
2D and 3D multi-GPU transforms support execution of a transform given permuted order results
as input. After execution in this case, the output will be in natural order. It is also possible to use
cufftXtMemcpy() with CUFFT_COPY_DEVICE_TO_DEVICE to return 2D or 3D data to natural order.
See the cuFFT Code Examples section for single GPU and multiple GPU examples.
Note: Starting from CUDA 11.4, support for callback functionality using separately compiled device
code is deprecated on all GPU architectures. Callback functionality will continue to be supported for
all GPU architectures.
The callback routine must match the function prototype for the type of routine specified. If there is already a callback of the specified type associated with the plan, the set callback function will replace it with the new one.
The callback routine extensions to cuFFT are built on the extensible cuFFT API. The general steps in
defining and executing a transform with callbacks are:
▶ cufftCreate() - create an empty plan, as in the single GPU case
▶ cufftMakePlan{1d,2d,3d,Many}() - create the plan. These are the same functions used in
the single GPU case.
▶ cufftXtSetCallback() - called for load and/or store callback for this plan
▶ cufftExecC2C() etc. - execute the plan
▶ cufftDestroy() - free cuFFT plan resources
Callback functions are not supported on transforms with a dimension size that does not factor into
primes smaller than 127. Callback functions on plans whose dimensions’ prime factors are limited to
2, 3, 5, and 7 can safely call __syncthreads(). On other plans, results are not defined.
Note: The callback API is available in the statically linked cuFFT library only, and only on 64-bit Linux operating systems.
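The device-side definition that the host code below refers to is not shown above; a minimal sketch, assuming a real-data load callback named myOwnCallback, would be:

__device__ cufftReal myOwnCallback(void *dataIn,
                                   size_t offset,
                                   void *callerInfo,
                                   void *sharedPtr) {
    /* customized load operation goes here; this sketch simply reads the element */
    cufftReal ret = ((cufftReal*)dataIn)[offset];
    return ret;
}

/* Device symbol holding the callback's address, readable from the host. */
__device__ cufftCallbackLoadR myOwnCallbackPtr = myOwnCallback;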
From the host side, the user then has to get the address of the callback routine, which is stored in
myOwnCallbackPtr. This is done with cudaMemcpyFromSymbol, as follows:
cufftCallbackLoadR hostCopyOfCallbackPtr;
/* Retrieve the device-side function pointer so it can be passed to
   cufftXtSetCallback from the host. */
cudaMemcpyFromSymbol(&hostCopyOfCallbackPtr,
                     myOwnCallbackPtr,
                     sizeof(hostCopyOfCallbackPtr));
hostCopyOfCallbackPtr then contains the device address of the callback routine, which should be passed to cufftXtSetCallback. Note that, for multi-GPU transforms, hostCopyOfCallbackPtr will need to be an array of pointers, and the cudaMemcpyFromSymbol will have to be invoked for each GPU. Please note that __managed__ variables are not suitable to pass to cufftXtSetCallback due to restrictions on variable usage (see the NVIDIA CUDA Programming Guide for more information about __managed__ variables).
When more than one kernel is used to implement a transform, the thread and block structure of the first kernel (the one that does the load) is often different from the thread and block structure of the last kernel (the one that does the store).
One common use of callbacks is to reduce the amount of data read from or written to memory, either by selective filtering or via type conversions. When more than one kernel is used to implement a transform, cuFFT alternates between using the workspace and the output buffer to write intermediate results. This means that the output buffer must always be large enough to accommodate the entire transform.
For multi-GPU transforms, the index passed to the callback routine is the element index from the start
of data on that GPU, not from the start of the entire input or output data array.
For transforms whose dimensions can be factored into powers of 2, 3, 5, or 7, cuFFT guarantees that it will call the load and store callback routines from points in the kernel where it is safe to call the __syncthreads function from within the callback routine. The caller is responsible for guaranteeing that the callback routine is at a point where the callback code has converged, to avoid deadlock. For plans whose dimensions are factored into higher primes, the results of a callback routine calling __syncthreads are not defined.
Note that there are no guarantees on the relative order of execution of blocks within a grid. As such,
callbacks should not rely on any particular ordering within a kernel. For instance, reordering data (such
as an FFT-shift) could rely on the order of execution of the blocks. Results in this case would be unde-
fined.
Note: Starting from CUDA 11.8 (including CUDA 12.0 onward), CUDA Graphs are no longer supported for callback routines that load data in out-of-place mode transforms. An upcoming release will update the cuFFT callback implementation, removing this limitation. cuFFT deprecated callback functionality based on separately compiled device code in cuFFT 11.4.
For cuFFTW on Linux, to compile a small application against the dynamic library, the following command can be used:
nvcc mCufftwApp.c -lcufftw -lcufft -o myCufftwApp
Whereas to compile against the static cuFFT library, extra steps need to be taken. The library needs to be device-linked. This may happen during building and linking of a simple program, or as a separate step. The entire process is described in Using Separate Compilation in CUDA.
For cuFFT and cufftw in version 9.0 or later any supported architecture can be used to do the device
linking:
Static cuFFT compilation command:
nvcc mCufftApp.c -lcufft_static -lculibos -o myCufftApp
Prior to version 9.0 proper linking required specifying a subset of supported architectures, as shown
in the following commands:
Static cuFFT compilation command:
nvcc mCufftApp.c -lcufft_static -lculibos -o myCufftApp\
-gencode arch=compute_20,\"code=sm_20\"\
-gencode arch=compute_30,\"code=sm_30\"\
-gencode arch=compute_35,\"code=sm_35\"\
-gencode arch=compute_50,\"code=sm_50\"\
-gencode arch=compute_60,\"code=sm_60\"\
-gencode arch=compute_60,\"code=compute_60\"
Please note that the cuFFT library might not contain code for certain architectures as long as there is code for a lower architecture that is binary compatible (e.g. SM37, SM52, SM61). This is reflected in the link commands above and is significant when using versions prior to r9.0. To determine if a specific SM is included in the cuFFT library, one may use the cuobjdump utility. For example, if you wish to know if SM_50 is included, the command to run is cuobjdump -arch sm_50 libcufft_static.a. Some kernels are built only on select architectures (e.g. kernels with half precision arithmetic are present only for SM53 and above). This can cause warnings at link time that architectures are missing from these kernels. These warnings can be safely ignored.
It is also possible to use the native Host C++ compiler and perform device link as a separate step.
Please consult NVCC documentation for more details. Depending on the Host Operating system, some
additional libraries like pthread or dl might be needed on the linking line.
Note that in this case, the library cuda is not needed. The CUDA Runtime will try to open explicitly
the cuda library if needed. In the case of a system which does not have the CUDA driver installed, this
allows the application to gracefully manage this issue and potentially run if a CPU-only path is available.
The cuFFT static library supports user-supplied callback routines. The callback routines are CUDA device code, and must be separately compiled with NVCC and linked with the cuFFT library. Please refer to the NVCC documentation regarding separate compilation for details. If you specify an SM when compiling your callback functions, you must specify one of the SMs that cuFFT includes.
cuFFT batched plans require that input data includes valid signal for all batches. Performance opti-
mizations in batched mode can combine signal from different batches for processing. Optimizations
used in cuFFT can vary from version to version.
In version 9.2 cuFFT also introduced the cufftXtSetWorkAreaPolicy function. This function allows fine-tuning of work area memory usage.
cuFFT version 9.2 supports only the CUFFT_WORKAREA_MINIMAL policy, which instructs cuFFT to re-plan the existing plan without the need to use work area memory.
As of cuFFT 9.2, the FFT transforms that allow the CUFFT_WORKAREA_MINIMAL policy are as follows:
▶ Transforms of type C2C are supported with sizes up to 4096 in any dimension.
▶ Transforms of type Z2Z are supported with sizes up to 2048 in any dimension.
▶ Only single GPU transforms are supported.
Depending on the FFT transform size, a different FFT algorithm may be used when the
CUFFT_WORKAREA_MINIMAL policy is set.
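A minimal sketch of selecting this policy, assuming a previously created plan handle plan for a transform size that qualifies:

size_t workSize = 0;
cufftGetSize(plan, &workSize);   /* work area size before re-planning */

/* The third argument is reserved in current releases; pass NULL. */
cufftResult status = cufftXtSetWorkAreaPolicy(plan, CUFFT_WORKAREA_MINIMAL, NULL);
if (status != CUFFT_SUCCESS) {
    /* size/type not eligible; keep the default work area policy */
}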
Chapter 2. cuFFT API Reference

This chapter specifies the behavior of the cuFFT library functions by describing their input/output
parameters, data types, and error codes. The cuFFT library is initialized upon the first invocation of an
API function, and cuFFT shuts down automatically when all user-created FFT plans are destroyed.
Users are encouraged to check return values from cuFFT functions for errors as shown in cuFFT Code
Examples.
2.2.1. cufftPlan1d()
cufftResult cufftPlan1d(cufftHandle *plan, int nx, cufftType type, int batch);
Creates a 1D FFT plan configuration for a specified signal size and data type. The batch input
parameter tells cuFFT how many 1D transforms to configure.
This call can only be used once for a given handle. It will fail and return CUFFT_INVALID_PLAN
if the plan is locked, i.e. the handle was previously used with a different cufftPlan or cufft-
MakePlan call.
Parameters
▶ plan[In] – Pointer to a cufftHandle object.
▶ nx[In] – The transform size (e.g. 256 for a 256-point FFT).
▶ type[In] – The transform data type (e.g., CUFFT_C2C for single precision com-
plex to complex).
▶ batch[In] – Number of transforms of size nx. Please consider using cufft-
PlanMany for multiple transforms.
▶ plan[Out] – Contains a cuFFT 1D plan handle value.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully created the FFT plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle. Handle is
not valid when the plan is locked.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – The nx or batch parameter is not a supported size.
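For illustration, a minimal call sequence with error checking (NX and BATCH as defined in the earlier example; fprintf assumes <stdio.h>):

cufftHandle plan;
cufftResult status = cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);
if (status != CUFFT_SUCCESS) {
    fprintf(stderr, "cufftPlan1d failed with error %d\n", status);
    return;
}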
2.2.2. cufftPlan2d()
cufftResult cufftPlan2d(cufftHandle *plan, int nx, int ny, cufftType type);
Creates a 2D FFT plan configuration according to specified signal sizes and data type.
This call can only be used once for a given handle. It will fail and return CUFFT_INVALID_PLAN
if the plan is locked, i.e. the handle was previously used with a different cufftPlan or cufft-
MakePlan call.
Parameters
▶ plan[In] – Pointer to a cufftHandle object.
▶ nx[In] – The transform size in the x dimension. This is the slowest changing dimension of a transform (strided in memory).
▶ ny[In] – The transform size in the y dimension. This is the fastest changing dimension of a transform (contiguous in memory).
▶ type[In] – The transform data type (e.g., CUFFT_C2R for single precision com-
plex to real).
▶ plan[Out] – Contains a cuFFT 2D plan handle value.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully created the FFT plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle. Handle is
not valid when the plan is locked.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – Either or both of the nx or ny parameters is not a sup-
ported size.
2.2.3. cufftPlan3d()
cufftResult cufftPlan3d(cufftHandle *plan, int nx, int ny, int nz, cufftType type);
Creates a 3D FFT plan configuration according to specified signal sizes and data type. This func-
tion is the same as cufftPlan2d() except that it takes a third size parameter nz.
This call can only be used once for a given handle. It will fail and return CUFFT_INVALID_PLAN
if the plan is locked, i.e. the handle was previously used with a different cufftPlan or cufft-
MakePlan call.
Parameters
▶ plan[In] – Pointer to a cufftHandle object.
▶ nx[In] – The transform size in the x dimension. This is the slowest changing dimension of a transform (strided in memory).
▶ ny[In] – The transform size in the y dimension.
▶ nz[In] – The transform size in the z dimension. This is the fastest changing dimension of a transform (contiguous in memory).
▶ type[In] – The transform data type (e.g., CUFFT_R2C for single precision real
to complex).
▶ plan[Out] – Contains a cuFFT 3D plan handle value.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully created the FFT plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle. Handle is
not valid when the plan is locked.
2.2.4. cufftPlanMany()
cufftResult cufftPlanMany(cufftHandle *plan, int rank, int *n, int *inembed, int istride, int idist, int
*onembed, int ostride, int odist, cufftType type, int batch);
Creates an FFT plan configuration of dimension rank, with sizes specified in the array n. The batch input parameter tells cuFFT how many transforms to configure. With this function, batched plans of 1, 2, or 3 dimensions may be created.
The cufftPlanMany() API supports more complicated input and output data layouts via the
advanced data layout parameters: inembed, istride, idist, onembed, ostride, and odist.
If inembed and onembed are set to NULL, all other stride information is ignored, and default
strides are used. The default assumes contiguous data arrays.
All arrays are assumed to be in CPU memory.
Please note that the behavior of the cufftPlanMany function when inembed and onembed are NULL differs from that of the corresponding FFTW function, fftw_plan_many_dft.
This call can only be used once for a given handle. It will fail and return CUFFT_INVALID_PLAN
if the plan is locked, i.e. the handle was previously used with a different cufftPlan or cufft-
MakePlan call.
Parameters
▶ plan[In] – Pointer to a cufftHandle object.
▶ rank[In] – Dimensionality of the transform (1, 2, or 3).
▶ n[In] – Array of size rank, describing the size of each dimension, n[0] being
the size of the outermost and n[rank-1] innermost (contiguous) dimension of
a transform.
▶ inembed[In] – Pointer of size rank that indicates the storage dimensions of
the input data in memory. If set to NULL all other advanced data layout param-
eters are ignored.
▶ istride[In] – Indicates the distance between two successive input elements
in the least significant (i.e., innermost) dimension.
▶ idist[In] – Indicates the distance between the first element of two consec-
utive signals in a batch of the input data.
▶ onembed[In] – Pointer of size rank that indicates the storage dimensions of
the output data in memory. If set to NULL all other advanced data layout pa-
rameters are ignored.
2.3.1. cufftCreate()
cufftResult cufftCreate(cufftHandle *plan)
Creates only an opaque handle, and allocates small data structures on the host. The cufft-
MakePlan*() calls actually do the plan generation.
Parameters
▶ plan[In] – Pointer to a cufftHandle object.
▶ plan[Out] – Contains a cuFFT plan handle value.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully created the FFT plan.
▶ CUFFT_ALLOC_FAILED – The allocation of resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
2.3.2. cufftDestroy()
cufftResult cufftDestroy(cufftHandle plan)
Frees all GPU resources associated with a cuFFT plan and destroys the internal plan data struc-
ture. This function should be called once a plan is no longer needed, to avoid wasting GPU mem-
ory. In the case of multi-GPU plans, the plan created first should be destroyed last.
Parameters
▶ plan[In] – The cufftHandle object of the plan to be destroyed.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully destroyed the FFT plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
2.3.3. cufftMakePlan1d()
cufftResult cufftMakePlan1d(cufftHandle plan, int nx, cufftType type, int batch, size_t *workSize);
Following a call to cufftCreate() makes a 1D FFT plan configuration for a specified signal size
and data type. The batch input parameter tells cuFFT how many 1D transforms to configure.
This call can only be used once for a given handle. It will fail and return CUFFT_INVALID_PLAN
if the plan is locked, i.e. the handle was previously used with a different cufftPlan or cufft-
MakePlan call.
If cufftXtSetGPUs() was called prior to this call with multiple GPUs, then workSize will contain
multiple sizes. See sections on multiple GPUs for more details.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ nx[In] – The transform size (e.g. 256 for a 256-point FFT). For multiple GPUs,
this must be a power of 2.
▶ type[In] – The transform data type (e.g., CUFFT_C2C for single precision com-
plex to complex). For multiple GPUs this must be a complex to complex trans-
form.
▶ batch[In] – Number of transforms of size nx. Please consider using cufft-
MakePlanMany for multiple transforms.
▶ *workSize[In] – Pointer to the size(s), in bytes, of the work areas. For example
for two GPUs worksize must be declared to have two elements.
▶ *workSize[Out] – Pointer to the size(s) of the work areas.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully created the FFT plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle. Handle is
not valid when the plan is locked or multi-GPU restrictions are not met.
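For illustration, a minimal single-GPU sketch of the create/make-plan pattern described above (NX and BATCH are assumed constants):

cufftHandle plan;
size_t workSize = 0;

cufftCreate(&plan);
cufftMakePlan1d(plan, NX, CUFFT_C2C, BATCH, &workSize);
/* workSize now holds the size, in bytes, of the plan's work area. */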
2.3.4. cufftMakePlan2d()
cufftResult cufftMakePlan2d(cufftHandle plan, int nx, int ny, cufftType type, size_t *workSize);
Following a call to cufftCreate() makes a 2D FFT plan configuration according to specified
signal sizes and data type.
This call can only be used once for a given handle. It will fail and return CUFFT_INVALID_PLAN
if the plan is locked, i.e. the handle was previously used with a different cufftPlan or cufft-
MakePlan call.
If cufftXtSetGPUs() was called prior to this call with multiple GPUs, then workSize will contain
multiple sizes. See sections on multiple GPUs for more details.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ nx[In] – The transform size in the x dimension. This is the slowest changing dimension of a transform (strided in memory). For multiple GPUs, this must be factorable into primes less than or equal to 127.
▶ ny[In] – The transform size in the y dimension. This is the fastest changing dimension of a transform (contiguous in memory). For 2 GPUs, this must be factorable into primes less than or equal to 127.
▶ type[In] – The transform data type (e.g., CUFFT_C2R for single precision com-
plex to real).
▶ workSize[In] – Pointer to the size(s), in bytes, of the work areas. For example
for two GPUs worksize must be declared to have two elements.
▶ *workSize[Out] – Pointer to the size(s) of the work areas.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully created the FFT plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – Either or both of the nx or ny parameters is not a sup-
ported size.
2.3.5. cufftMakePlan3d()
cufftResult cufftMakePlan3d(cufftHandle plan, int nx, int ny, int nz, cufftType type, size_t
*workSize);
Following a call to cufftCreate() makes a 3D FFT plan configuration according to specified signal sizes and data type. This function is the same as cufftMakePlan2d() except that it takes a third size parameter nz.
This call can only be used once for a given handle. It will fail and return CUFFT_INVALID_PLAN
if the plan is locked, i.e. the handle was previously used with a different cufftPlan or cufft-
MakePlan call.
If cufftXtSetGPUs() was called prior to this call with multiple GPUs, then workSize will contain
multiple sizes. See sections on multiple GPUs for more details.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ nx[In] – The transform size in the x dimension. This is the slowest changing dimension of a transform (strided in memory). For multiple GPUs, this must be factorable into primes less than or equal to 127.
▶ ny[In] – The transform size in the y dimension. For multiple GPUs, this must be factorable into primes less than or equal to 127.
▶ nz[In] – The transform size in the z dimension. This is the fastest changing dimension of a transform (contiguous in memory). For multiple GPUs, this must be factorable into primes less than or equal to 127.
▶ type[In] – The transform data type (e.g., CUFFT_R2C for single precision real
to complex).
▶ workSize[In] – Pointer to the size(s), in bytes, of the work areas. For example
for two GPUs worksize must be declared to have two elements.
▶ *workSize[Out] – Pointer to the size(s) of the work area(s).
Return values
▶ CUFFT_SUCCESS – cuFFT successfully created the FFT plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – One or more of the nx, ny, or nz parameters is not a
supported size.
2.3.6. cufftMakePlanMany()
cufftResult cufftMakePlanMany(cufftHandle plan, int rank, int *n, int *inembed, int istride, int idist,
int *onembed, int ostride, int odist, cufftType type, int batch,
size_t *workSize);
Following a call to cufftCreate() makes an FFT plan configuration of dimension rank, with sizes specified in the array n. The batch input parameter tells cuFFT how many transforms to configure. With this function, batched plans of 1, 2, or 3 dimensions may be created.
The cufftPlanMany() API supports more complicated input and output data layouts via the
advanced data layout parameters: inembed, istride, idist, onembed, ostride, and odist.
If inembed and onembed are set to NULL, all other stride information is ignored, and default
strides are used. The default assumes contiguous data arrays.
This call can only be used once for a given handle. It will fail and return CUFFT_INVALID_PLAN
if the plan is locked, i.e. the handle was previously used with a different cufftPlan or cufft-
MakePlan call.
If cufftXtSetGPUs() was called prior to this call with multiple GPUs, then workSize will contain
multiple sizes. See sections on multiple GPUs for more details.
All arrays are assumed to be in CPU memory.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ rank[In] – Dimensionality of the transform (1, 2, or 3)
▶ n[In] – Array of size rank, describing the size of each dimension, n[0] being
the size of the outermost and n[rank-1] innermost (contiguous) dimension of
a transform. For multiple GPUs and rank equal to 1, the sizes must be a power
of 2. For multiple GPUs and rank equal to 2 or 3, the sizes must be factorable
into primes less than or equal to 127.
▶ inembed[In] – Pointer of size rank that indicates the storage dimensions of the input data in memory, inembed[0] being the storage dimension of the outermost dimension. If set to NULL all other advanced data layout parameters are ignored.
▶ istride[In] – Indicates the distance between two successive input elements in the least significant (i.e., innermost) dimension.
▶ idist[In] – Indicates the distance between the first element of two consecutive signals in a batch of the input data.
▶ onembed[In] – Pointer of size rank that indicates the storage dimensions of the output data in memory, onembed[0] being the storage dimension of the outermost dimension. If set to NULL all other advanced data layout parameters are ignored.
▶ ostride[In] – Indicates the distance between two successive output elements in the output array in the least significant (i.e., innermost) dimension.
▶ odist[In] – Indicates the distance between the first element of two consecutive signals in a batch of the output data.
▶ type[In] – The transform data type (e.g., CUFFT_R2C for single precision real
to complex). For 2 GPUs this must be a complex to complex transform.
2.3.7. cufftMakePlanMany64()
cufftResult cufftMakePlanMany64(cufftHandle plan, int rank, long long int *n, long long int
*inembed, long long int istride, long long int idist, long long int
*onembed, long long int ostride, long long int odist, cufftType
type, long long int batch, size_t *workSize);
Following a call to cufftCreate() makes an FFT plan configuration of dimension rank, with sizes
specified in the array n. The batch input parameter tells cuFFT how many transforms to config-
ure. With this function, batched plans of 1, 2, or 3 dimensions may be created.
This API is identical to cufftMakePlanMany except that the arguments specifying sizes and
strides are 64 bit integers. This API makes very large transforms possible. cuFFT includes kernels
that use 32 bit indexes, and kernels that use 64 bit indexes. cuFFT planning selects 32 bit kernels
whenever possible to avoid any overhead due to 64 bit arithmetic.
All sizes and types of transform are supported by this interface, with two exceptions. For trans-
forms whose size exceeds 4G elements, the dimensions specified in the array n must be fac-
torable into primes that are less than or equal to 127. For real to complex and complex to real
transforms whose size exceeds 4G elements, the fastest changing dimension must be even.
The cufftMakePlanMany64() API supports more complicated input and output data layouts via the advanced data layout parameters: inembed, istride, idist, onembed, ostride, and odist.
If inembed and onembed are set to NULL, all other stride information is ignored, and default
strides are used. The default assumes contiguous data arrays.
This call can only be used once for a given handle. It will fail and return CUFFT_INVALID_PLAN
if the plan is locked, i.e. the handle was previously used with a different cufftPlan or cufft-
MakePlan call.
If cufftXtSetGPUs() was called prior to this call with multiple GPUs, then workSize will contain
multiple sizes. See sections on multiple GPUs for more details.
All arrays are assumed to be in CPU memory.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ rank[In] – Dimensionality of the transform (1, 2, or 3).
▶ n[In] – Array of size rank, describing the size of each dimension. For multiple
GPUs and rank equal to 1, the sizes must be a power of 2. For multiple GPUs
and rank equal to 2 or 3, the sizes must be factorable into primes less than or
equal to 127.
▶ inembed[In] – Pointer of size rank that indicates the storage dimensions of
the input data in memory. If set to NULL all other advanced data layout param-
eters are ignored.
▶ istride[In] – Indicates the distance between two successive input elements
in the least significant (i.e., innermost) dimension.
▶ idist[In] – Indicates the distance between the first element of two consec-
utive signals in a batch of the input data.
▶ onembed[In] – Pointer of size rank that indicates the storage dimensions of
the output data in memory. If set to NULL all other advanced data layout pa-
rameters are ignored.
▶ ostride[In] – Indicates the distance between two successive output ele-
ments in the output array in the least significant (i.e., innermost) dimension.
▶ odist[In] – Indicates the distance between the first element of two consec-
utive signals in a batch of the output data.
▶ type[In] – The transform data type (e.g., CUFFT_R2C for single precision real
to complex). For 2 GPUs this must be a complex to complex transform.
▶ batch[In] – Batch size for this transform.
▶ *workSize[In] – Pointer to the size(s), in bytes, of the work areas. For example
for two GPUs worksize must be declared to have two elements.
▶ *workSize[Out] – Pointer to the size(s) of the work areas.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully created the FFT plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle. Handle is
not valid when the plan is locked or multi-GPU restrictions are not met.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – One or more of the parameters is not a supported size.
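For illustration, a minimal sketch of planning a large 1D batched transform with 64-bit sizes; the concrete size and batch count below are assumptions for the example:

cufftHandle plan;
size_t workSize = 0;
long long n[1] = { 1LL << 30 };   /* 2^30-point transform per batch */

cufftCreate(&plan);
/* With NULL inembed/onembed, the stride and distance arguments are
   ignored and default contiguous strides are used. */
cufftMakePlanMany64(plan, 1, n,
                    NULL, 1LL, n[0],   /* inembed, istride, idist */
                    NULL, 1LL, n[0],   /* onembed, ostride, odist */
                    CUFFT_C2C, 2LL, &workSize);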
2.3.8. cufftXtMakePlanMany()
cufftResult cufftXtMakePlanMany(cufftHandle plan, int rank, long long int *n, long long int
*inembed, long long int istride, long long int idist,
cudaDataType inputtype, long long int *onembed, long long int
ostride, long long int odist, cudaDataType outputtype, long
long int batch, size_t *workSize, cudaDataType executiontype);
Following a call to cufftCreate() makes an FFT plan configuration of dimension rank, with
sizes specified in the array n. The batch input parameter tells cuFFT how many transforms to
configure. With this function, batched plans of 1, 2, or 3 dimensions may be created.
Type specifiers inputtype, outputtype and executiontype dictate type and precision of
transform to be performed. Not all combinations of parameters are supported. Currently all
three parameters need to match precision. Parameters inputtype and outputtype need to
match transform type complex-to-complex, real-to-complex or complex-to-real. Parameter ex-
ecutiontype needs to match precision and be of a complex type. Example: for a half-precision
real-to-complex transform, parameters inputtype, outputtype and executiontype would
have values of CUDA_R_16F, CUDA_C_16F and CUDA_C_16F respectively. Similarly, a bfloat16
complex-to-real transform would use CUDA_C_16BF for inputtype and executiontype, and
CUDA_R_16BF for outputtype.
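For illustration, a minimal sketch of the half-precision real-to-complex combination described above, with an assumed power-of-2 transform size NX:

cufftHandle plan;
size_t workSize = 0;
long long n[1] = { NX };

cufftCreate(&plan);
cufftXtMakePlanMany(plan, 1, n,
                    NULL, 1LL, NX, CUDA_R_16F,         /* input layout and type */
                    NULL, 1LL, NX/2 + 1, CUDA_C_16F,   /* output layout and type */
                    1LL, &workSize,
                    CUDA_C_16F);                       /* execution type */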
The cufftXtMakePlanMany() API supports more complicated input and output data layouts
via the advanced data layout parameters: inembed, istride, idist, onembed, ostride, and
odist.
If inembed and onembed are set to NULL, all other stride information is ignored, and default
strides are used. The default assumes contiguous data arrays.
If cufftXtSetGPUs() was called prior to this call with multiple GPUs, then workSize will contain
multiple sizes. See sections on multiple GPUs for more details.
All arrays are assumed to be in CPU memory.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ rank[In] – Dimensionality of the transform (1, 2, or 3).
▶ n[In] – Array of size rank, describing the size of each dimension, n[0] being
the size of the outermost and n[rank-1] innermost (contiguous) dimension of
a transform. For multiple GPUs and rank equal to 1, the sizes must be a power
of 2. For multiple GPUs and rank equal to 2 or 3, the sizes must be factorable
into primes less than or equal to 127.
▶ inembed[In] – Pointer of size rank that indicates the storage dimensions of
the input data in memory, inembed[0] being the storage dimension of the out-
ermost dimension. If set to NULL all other advanced data layout parameters are
ignored.
▶ istride[In] – Indicates the distance between two successive input elements
in the least significant (i.e., innermost) dimension.
▶ idist[In] – Indicates the distance between the first element of two consec-
utive signals in a batch of the input data.
▶ inputtype[In] – Type of input data.
▶ onembed[In] – Pointer of size rank that indicates the storage dimensions of the output data in memory, onembed[0] being the storage dimension of the outermost dimension. If set to NULL all other advanced data layout parameters are ignored.
▶ ostride[In] – Indicates the distance between two successive output ele-
ments in the output array in the least significant (i.e., innermost) dimension.
▶ odist[In] – Indicates the distance between the first element of two consec-
utive signals in a batch of the output data.
▶ outputtype[In] – Type of output data.
▶ batch[In] – Batch size for this transform.
▶ *workSize[In] – Pointer to the size(s), in bytes, of the work areas. For example
for two GPUs worksize must be declared to have two elements.
▶ executiontype[In] – Type of data to be used for computations.
▶ *workSize[Out] – Pointer to the size(s) of the work areas.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully created the FFT plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle. Handle is
not valid when multi-GPU restrictions are not met.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – One or more of the parameters is not a supported size.
2.4.1. cufftEstimate1d()
cufftResult cufftEstimate1d(int nx, cufftType type, int batch, size_t *workSize);
During plan execution, cuFFT requires a work area for temporary storage of intermediate results.
This call returns an estimate for the size of the work area required, given the specified parame-
ters, and assuming default plan settings.
Parameters
▶ nx[In] – The transform size (e.g. 256 for a 256-point FFT).
▶ type[In] – The transform data type (e.g., CUFFT_C2C for single precision com-
plex to complex).
▶ batch[In] – Number of transforms of size nx. Please consider using cufftEs-
timateMany for multiple transforms.
▶ *workSize[In] – Pointer to the size, in bytes, of the work space.
▶ *workSize[Out] – Pointer to the size of the work space.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully returned the size of the work space.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – The nx parameter is not a supported size.
2.4.2. cufftEstimate2d()
cufftResult cufftEstimate2d(int nx, int ny, cufftType type, size_t *workSize);
During plan execution, cuFFT requires a work area for temporary storage of intermediate results.
This call returns an estimate for the size of the work area required, given the specified parame-
ters, and assuming default plan settings.
Parameters
▶ nx[In] – The transform size in the x dimension (number of rows).
▶ ny[In] – The transform size in the y dimension (number of columns).
▶ type[In] – The transform data type (e.g., CUFFT_C2R for single precision com-
plex to real).
▶ *workSize[In] – Pointer to the size, in bytes, of the work space.
▶ *workSize[Out] – Pointer to the size of the work space.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully returned the size of the work space.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
2.4.3. cufftEstimate3d()
cufftResult cufftEstimate3d(int nx, int ny, int nz, cufftType type, size_t *workSize);
During plan execution, cuFFT requires a work area for temporary storage of intermediate results.
This call returns an estimate for the size of the work area required, given the specified parame-
ters, and assuming default plan settings.
Parameters
▶ nx[In] – The transform size in the x dimension.
▶ ny[In] – The transform size in the y dimension.
▶ nz[In] – The transform size in the z dimension.
▶ type[In] – The transform data type (e.g., CUFFT_R2C for single precision real
to complex).
▶ *workSize[In] – Pointer to the size, in bytes, of the work space.
▶ *workSize[Out] – Pointer to the size of the work space.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully returned the size of the work space.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – One or more of the nx, ny, or nz parameters is not a
supported size.
2.4.4. cufftEstimateMany()
cufftResult cufftEstimateMany(int rank, int *n, int *inembed, int istride, int idist, int *onembed, int
ostride, int odist, cufftType type, int batch, size_t *workSize);
During plan execution, cuFFT requires a work area for temporary storage of intermediate results. This call returns an estimate for the size of the work area required, given the specified parameters, and assuming default plan settings.
The cufftEstimateMany() API supports more complicated input and output data layouts
via the advanced data layout parameters: inembed, istride, idist, onembed, ostride, and
odist.
All arrays are assumed to be in CPU memory.
Parameters
▶ rank[In] – Dimensionality of the transform (1, 2, or 3).
▶ n[In] – Array of size rank, describing the size of each dimension.
▶ inembed[In] – Pointer of size rank that indicates the storage dimensions of the input data in memory. If set to NULL all other advanced data layout parameters are ignored.
▶ istride[In] – Indicates the distance between two successive input elements in the least significant (i.e., innermost) dimension.
▶ idist[In] – Indicates the distance between the first element of two consecutive signals in a batch of the input data.
▶ onembed[In] – Pointer of size rank that indicates the storage dimensions of the output data in memory. If set to NULL all other advanced data layout parameters are ignored.
▶ ostride[In] – Indicates the distance between two successive output elements in the output array in the least significant (i.e., innermost) dimension.
▶ odist[In] – Indicates the distance between the first element of two consecutive signals in a batch of the output data.
▶ type[In] – The transform data type (e.g., CUFFT_R2C for single precision real to complex).
▶ batch[In] – Batch size for this transform.
▶ *workSize[In] – Pointer to the size, in bytes, of the work space.
▶ *workSize[Out] – Pointer to the size of the work space.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully returned the size of the work space.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – One or more of the parameters is not a supported size.
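As a sketch of the advanced data layout, the following estimates the work area for a batch of strided 1D transforms (all sizes and strides here are illustrative assumptions):

int n[1]       = { 1024 };  // 1D transforms of length 1024
int inembed[1] = { 1024 };  // non-NULL, so the advanced layout parameters are honored
int onembed[1] = { 1024 };
size_t workEstimate = 0;
// Input elements are 2 apart; consecutive input signals start 2048 elements apart.
// Output is packed contiguously, with signals 1024 elements apart.
cufftResult status = cufftEstimateMany(1, n, inembed, 2, 2048,
                                       onembed, 1, 1024,
                                       CUFFT_C2C, 100, &workEstimate);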
2.5.1. cufftGetSize1d()
cufftResult cufftGetSize1d(cufftHandle plan, int nx, cufftType type, int batch, size_t *workSize);
This call gives a more accurate estimate of the work area size required for a plan than cufftEstimate1d(), given the specified parameters, and taking into account any plan settings that may have been made.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ nx[In] – The transform size (e.g. 256 for a 256-point FFT).
▶ type[In] – The transform data type (e.g., CUFFT_C2C for single precision complex to complex).
▶ batch[In] – Number of transforms of size nx. Please consider using cufftGetSizeMany() for multiple transforms.
▶ *workSize[In] – Pointer to the size(s), in bytes, of the work areas. For example, for two GPUs workSize must be declared to have two elements.
▶ *workSize[Out] – Pointer to the size of the work space.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully returned the size of the work space.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – The nx parameter is not a supported size.
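Unlike cufftEstimate1d(), this call operates on an existing handle, so a usage sketch looks like the following (error handling elided):

cufftHandle plan;
size_t workSize = 0;
cufftCreate(&plan);
// Any settings already made on the handle (for example with
// cufftSetAutoAllocation) are reflected in the returned size.
cufftGetSize1d(plan, 4096, CUFFT_C2C, 8, &workSize);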
2.5.2. cufftGetSize2d()
cufftResult cufftGetSize2d(cufftHandle plan, int nx, int ny, cufftType type, size_t *workSize);
This call gives a more accurate estimate of the work area size required for a plan than cufftEstimate2d(), given the specified parameters, and taking into account any plan settings that may have been made.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ nx[In] – The transform size in the x dimension (number of rows).
▶ ny[In] – The transform size in the y dimension (number of columns).
▶ type[In] – The transform data type (e.g., CUFFT_C2R for single precision complex to real).
▶ *workSize[In] – Pointer to the size(s), in bytes, of the work areas. For example, for two GPUs workSize must be declared to have two elements.
▶ *workSize[Out] – Pointer to the size of the work space.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully returned the size of the work space.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – One or both of the nx and ny parameters is not a supported size.
2.5.3. cufftGetSize3d()
cufftResult cufftGetSize3d(cufftHandle plan, int nx, int ny, int nz, cufftType type, size_t
*workSize);
This call gives a more accurate estimate of the work area size required for a plan than cufftEstimate3d(), given the specified parameters, and taking into account any plan settings that may have been made.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ nx[In] – The transform size in the x dimension.
▶ ny[In] – The transform size in the y dimension.
▶ nz[In] – The transform size in the z dimension.
▶ type[In] – The transform data type (e.g., CUFFT_R2C for single precision real
to complex).
▶ *workSize[In] – Pointer to the size(s), in bytes, of the work areas. For example, for two GPUs workSize must be declared to have two elements.
▶ *workSize[Out] – Pointer to the size of the work space.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully returned the size of the work space.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – One or more of the nx, ny, or nz parameters is not a
supported size.
2.5.4. cufftGetSizeMany()
cufftResult cufftGetSizeMany(cufftHandle plan, int rank, int *n, int *inembed, int istride, int idist,
int *onembed, int ostride, int odist, cufftType type, int batch, size_t
*workSize);
This call gives a more accurate estimate of the work area size required for a plan than cufftEstimateMany(), given the specified parameters, and taking into account any plan settings that may have been made.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ rank[In] – Dimensionality of the transform (1, 2, or 3).
▶ n[In] – Array of size rank, describing the size of each dimension.
▶ inembed[In] – Pointer of size rank that indicates the storage dimensions of the input data in memory. If set to NULL all other advanced data layout parameters are ignored.
▶ istride[In] – Indicates the distance between two successive input elements in the least significant (i.e., innermost) dimension.
▶ idist[In] – Indicates the distance between the first element of two consecutive signals in a batch of the input data.
▶ onembed[In] – Pointer of size rank that indicates the storage dimensions of the output data in memory. If set to NULL all other advanced data layout parameters are ignored.
▶ ostride[In] – Indicates the distance between two successive output elements in the output array in the least significant (i.e., innermost) dimension.
▶ odist[In] – Indicates the distance between the first element of two consecutive signals in a batch of the output data.
▶ type[In] – The transform data type (e.g., CUFFT_R2C for single precision real
to complex).
▶ batch[In] – Batch size for this transform.
▶ *workSize[In] – Pointer to the size(s), in bytes, of the work areas. For example, for two GPUs workSize must be declared to have two elements.
▶ *workSize[Out] – Pointer to the size of the work area.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully returned the size of the work space.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the
API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – One or more of the parameters is not a supported size.
2.5.5. cufftGetSizeMany64()
cufftResult cufftGetSizeMany64(cufftHandle plan, int rank, long long int *n, long long int
*inembed, long long int istride, long long int idist, long long int
*onembed, long long int ostride, long long int odist, cufftType
type, long long int batch, size_t *workSize);
This call gives a more accurate estimate of the work area size required for a plan than cufftEstimateMany(), given the specified parameters, and taking into account any plan settings that may have been made.
This API is identical to cufftGetSizeMany() except that the arguments specifying sizes and strides are 64 bit integers. This API makes very large transforms possible. cuFFT includes kernels that use 32 bit indexes, and kernels that use 64 bit indexes. cuFFT planning selects 32 bit kernels whenever possible to avoid any overhead due to 64 bit arithmetic.
All sizes and types of transform are supported by this interface, with two exceptions. For trans-
forms whose total size exceeds 4G elements, the dimensions specified in the array n must be
factorable into primes that are less than or equal to 127. For real to complex and complex to
real transforms whose total size exceeds 4G elements, the fastest changing dimension must be
even.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ rank[In] – Dimensionality of the transform (1, 2, or 3).
▶ n[In] – Array of size rank, describing the size of each dimension.
▶ inembed[In] – Pointer of size rank that indicates the storage dimensions of the input data in memory. If set to NULL all other advanced data layout parameters are ignored.
▶ istride[In] – Indicates the distance between two successive input elements in the least significant (i.e., innermost) dimension.
▶ idist[In] – Indicates the distance between the first element of two consecutive signals in a batch of the input data.
▶ onembed[In] – Pointer of size rank that indicates the storage dimensions of the output data in memory. If set to NULL all other advanced data layout parameters are ignored.
▶ ostride[In] – Indicates the distance between two successive output elements in the output array in the least significant (i.e., innermost) dimension.
▶ odist[In] – Indicates the distance between the first element of two consecutive signals in a batch of the output data.
▶ type[In] – The transform data type (e.g., CUFFT_R2C for single precision real to complex).
▶ batch[In] – Batch size for this transform.
▶ *workSize[In] – Pointer to the size, in bytes, of the work space.
▶ *workSize[Out] – Pointer to the size of the work area.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully returned the size of the work space.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the API.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_SIZE – One or more of the parameters is not a supported size.
2.5.6. cufftXtGetSizeMany()
cufftResult cufftXtGetSizeMany(cufftHandle plan, int rank, long long int *n, long long int
*inembed, long long int istride, long long int idist, cudaDataType
inputtype, long long int *onembed, long long int ostride, long
long int odist, cudaDataType outputtype, long long int batch,
size_t *workSize, cudaDataType executiontype);
This call gives a more accurate estimate of the work area size required for a plan than cufftEstimateMany(), given the specified parameters that match the signature of the cufftXtMakePlanMany() function, and taking into account any plan settings that may have been made.
For more information about valid combinations of the inputtype, outputtype and executiontype parameters, please refer to the documentation of the cufftXtMakePlanMany() function.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
2.6. cufftGetSize()
cufftResult cufftGetSize(cufftHandle plan, size_t *workSize);
Once plan generation has been done, either with the original API or the extensible API, this call
returns the actual size of the work area required to support the plan. Callers who choose to
manage work area allocation within their application must use this call after plan generation,
and after any cufftSet*() calls subsequent to plan generation, if those calls might alter the
required work space size.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ *workSize[In] – Pointer to the size(s), in bytes, of the work areas. For example, for two GPUs workSize must be declared to have two elements.
▶ *workSize[Out] – Pointer to the size of the work area.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully returned the size of the work space.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
2.7.1. cufftSetAutoAllocation()
cufftResult cufftSetAutoAllocation(cufftHandle plan, int autoAllocate);
cufftSetAutoAllocation() indicates that the caller intends to allocate and manage work
areas for plans that have been generated. cuFFT default behavior is to allocate the work area
at plan generation time. If cufftSetAutoAllocation() has been called with autoAllocate set
to 0 (“false”) prior to one of the cufftMakePlan*() calls, cuFFT does not allocate the work area.
This is the preferred sequence for callers wishing to manage work area allocation.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ autoAllocate[In] – Indicates whether to allocate work area.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully set the work area allocation mode.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
2.7.2. cufftSetWorkArea()
cufftResult cufftSetWorkArea(cufftHandle plan, void *workArea);
cufftSetWorkArea() overrides the work area pointer associated with a plan. If the work area
was auto-allocated, cuFFT frees the auto-allocated space. The cufftExecute*() calls assume
that the work area pointer is valid and that it points to a contiguous region in device memory
that does not overlap with any other work area. If this is not the case, results are indeterminate.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ *workArea[In] – Pointer to workArea. For multiple GPUs, multiple work area
pointers must be given.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully set the work area pointer.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
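Combining cufftSetAutoAllocation(), cufftGetSize() and cufftSetWorkArea(), a caller-managed work area might be set up as in this sketch (error handling elided; the transform size is an arbitrary example):

cufftHandle plan;
size_t workSize = 0;
void *workArea = NULL;
cufftCreate(&plan);
cufftSetAutoAllocation(plan, 0);                      // must precede plan generation
cufftMakePlan1d(plan, 4096, CUFFT_C2C, 1, &workSize); // plan generation, no work area allocated
cufftGetSize(plan, &workSize);                        // actual size the plan requires
cudaMalloc(&workArea, workSize);                      // caller allocates device memory
cufftSetWorkArea(plan, workArea);                     // attach it before any cufftExec*() call
// ... execute the plan here, e.g. with cufftExecC2C() ...
cufftDestroy(plan);
cudaFree(workArea);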
2.7.3. cufftXtSetWorkAreaPolicy()
cufftResult cufftXtSetWorkAreaPolicy(cufftHandle plan, cufftXtWorkAreaPolicy policy, size_t
*workSize);
cufftXtSetWorkAreaPolicy() indicates that the caller intends to change the work area size for a given plan handle. cuFFT’s default behavior is to allocate the work area at plan generation time with a default size that depends on the plan type and other parameters. If cufftXtSetWorkAreaPolicy() has been called with the policy parameter set to CUFFT_WORKAREA_MINIMAL, cuFFT will attempt to re-plan the handle to use zero bytes of work area memory. If the cufftXtSetWorkAreaPolicy() call is successful, the auto-allocated work area memory is released.
Currently the policies CUFFT_WORKAREA_PERFORMANCE, CUFFT_WORKAREA_USER and the workSize parameter are not supported and are reserved for use in future cuFFT releases.
This function can be called once per lifetime of a plan handle.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ policy[In] – Type of work area policy to apply.
▶ *workSize[In] – Reserved for future use.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully applied the work area policy.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_INVALID_SIZE – FFT size does not allow use of the selected policy.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
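A short sketch of requesting the minimal policy on a generated plan (plan creation is assumed to have already happened):

// After cufftMakePlan*(): ask cuFFT to re-plan using zero bytes of work area.
// The workSize argument is reserved, so NULL is passed here.
cufftResult status = cufftXtSetWorkAreaPolicy(plan, CUFFT_WORKAREA_MINIMAL, NULL);
if (status == CUFFT_INVALID_SIZE) {
    // this FFT size does not allow use of the minimal policy
}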
2.8.2. cufftExecR2C() and cufftExecD2Z()
cufftResult cufftExecR2C(cufftHandle plan, cufftReal *idata, cufftComplex *odata);
cufftResult cufftExecD2Z(cufftHandle plan, cufftDoubleReal *idata, cufftDoubleComplex *odata);
cufftExecR2C() (cufftExecD2Z()) executes a single-precision (double-precision) real-to-complex, implicitly forward, cuFFT transform plan. cuFFT uses as input data the GPU memory pointed to by the idata parameter. This function stores the nonredundant Fourier coefficients in the odata array. Pointers to idata and odata are both required to be aligned to cufftComplex data type in single-precision transforms and cufftDoubleComplex data type in double-precision transforms. If idata and odata are the same, this method does an in-place transform. Note the data layout differences between in-place and out-of-place transforms as described in Parameter cufftType.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ idata[In] – Pointer to the real input data (in GPU memory) to transform.
▶ odata[In] – Pointer to the complex output data (in GPU memory).
▶ odata[Out] – Contains the complex Fourier coefficients.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully executed the FFT plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_INVALID_VALUE – At least one of the parameters idata and odata is
not valid.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_EXEC_FAILED – cuFFT failed to execute the transform on the GPU.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
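A minimal out-of-place sketch (the transform length is an arbitrary example; the R2C output holds n/2 + 1 non-redundant coefficients; error handling elided):

int n = 1024;
cufftHandle plan;
cufftReal *idata;     // device input: n real samples
cufftComplex *odata;  // device output: n/2 + 1 complex coefficients
cudaMalloc((void**)&idata, sizeof(cufftReal) * n);
cudaMalloc((void**)&odata, sizeof(cufftComplex) * (n / 2 + 1));
cufftPlan1d(&plan, n, CUFFT_R2C, 1);
cufftExecR2C(plan, idata, odata);  // implicitly forward transform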
2.8.4. cufftXtExec()
cufftResult cufftXtExec(cufftHandle plan, void *input, void *output, int direction);
Function cufftXtExec() executes any cuFFT transform regardless of precision and type. In the case of complex-to-real and real-to-complex transforms, the direction parameter is ignored. cuFFT uses
the GPU memory pointed to by the input parameter as input data. This function stores the
Fourier coefficients in the output array. If input and output are the same, this method does
an in-place transform.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ input[In] – Pointer to the input data (in GPU memory) to transform.
▶ output[In] – Pointer to the output data (in GPU memory).
▶ direction[In] – The transform direction: CUFFT_FORWARD or CUFFT_INVERSE. Ignored for complex-to-real and real-to-complex transforms.
▶ output[Out] – Contains the complex Fourier coefficients.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully executed the FFT plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_INVALID_VALUE – At least one of the parameters input, output, and direction is not valid.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_EXEC_FAILED – cuFFT failed to execute the transform on the GPU.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
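Because cufftXtExec() dispatches on the plan’s transform type, one call site can serve multiple precisions. A sketch for a single-precision C2C plan (error handling elided; sizes are arbitrary examples):

cufftHandle plan;
size_t ws;
void *in, *out;
cudaMalloc(&in,  sizeof(cufftComplex) * 1024);
cudaMalloc(&out, sizeof(cufftComplex) * 1024);
cufftCreate(&plan);
cufftMakePlan1d(plan, 1024, CUFFT_C2C, 1, &ws);
// The same call would work unchanged for a Z2Z plan; for R2C or C2R
// plans the direction argument is ignored.
cufftXtExec(plan, in, out, CUFFT_FORWARD);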
2.8.5. cufftXtExecDescriptor()
cufftResult cufftXtExecDescriptor(cufftHandle plan, cudaLibXtDesc *input, cudaLibXtDesc
*output, int direction);
Function cufftXtExecDescriptor() executes any cuFFT transform regardless of precision and type. In the case of complex-to-real and real-to-complex transforms, the direction parameter is ignored. cuFFT uses the GPU memory pointed to by the cudaLibXtDesc *input descriptor as input data and cudaLibXtDesc *output as output data.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ input[In] – Pointer to the complex input data (in GPU memory) to transform.
▶ output[In] – Pointer to the complex output data (in GPU memory).
▶ direction[In] – The transform direction: CUFFT_FORWARD or CUFFT_INVERSE. Ignored for complex-to-real and real-to-complex transforms.
2.9.1. cufftXtSetGPUs()
cufftResult cufftXtSetGPUs(cufftHandle plan, int nGPUs, int *whichGPUs);
cufftXtSetGPUs() identifies which GPUs are to be used with the plan. As in the single GPU case
cufftCreate() creates a plan and cufftMakePlan*() does the plan generation. In cuFFT prior
to 10.4.0, this call will return an error if a non-default stream has been associated with the plan.
Note that the call to cufftXtSetGPUs() must occur after the call to cufftCreate() and prior to the call to cufftMakePlan*(). The whichGPUs parameter of cufftXtSetGPUs() determines the ordering of the GPUs with respect to data decomposition (the first data chunk is placed on the GPU denoted by the first element of whichGPUs).
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ nGPUs[In] – Number of GPUs to use.
▶ whichGPUs[In] – The GPUs to use.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully set the GPUs to use.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle, or a non-
default stream has been associated with the plan in cuFFT prior to 10.4.0.
▶ CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
2.9.2. cufftXtSetWorkArea()
cufftResult cufftXtSetWorkArea(cufftHandle plan, void **workArea);
cufftXtSetWorkArea() overrides the work areas associated with a plan. If the work area was auto-allocated, cuFFT frees the auto-allocated space. The cufftXtExec*() calls assume that the work areas are valid and that each points to a contiguous region in the corresponding device’s memory that does not overlap with any other work area. If this is not the case, results are indeterminate.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ workArea[In] – Pointer to an array of work area pointers, one per GPU.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully set the work areas.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
▶ CUFFT_INVALID_DEVICE – A GPU associated with the plan could not be selected.
2.9.4.1 cufftXtMalloc()
cufftResult cufftXtMalloc(cufftHandle plan, cudaLibXtDesc **descriptor, cufftXtSubFormat format);
cufftXtMalloc() allocates a descriptor, and all memory for data in the GPUs associated with the plan, and returns a pointer to the descriptor.
cufftXtSubFormat_t is an enumerated type that indicates if the buffer will be used for input or output and the ordering of the data.
typedef enum cufftXtSubFormat_t {
    CUFFT_XT_FORMAT_INPUT = 0x00,             //by default input is in linear order across GPUs
    CUFFT_XT_FORMAT_OUTPUT = 0x01,            //by default output is in scrambled order depending on transform
    CUFFT_XT_FORMAT_INPLACE = 0x02,           //by default inplace is input order, which is linear across GPUs
    CUFFT_XT_FORMAT_INPLACE_SHUFFLED = 0x03,  //shuffled output order after execution of the transform
    CUFFT_XT_FORMAT_1D_INPUT_SHUFFLED = 0x04, //shuffled input order prior to execution of 1D transforms
    CUFFT_FORMAT_UNDEFINED = 0x05
} cufftXtSubFormat;
2.9.4.2 cufftXtFree()
cufftResult cufftXtFree(cudaLibXtDesc *descriptor);
cufftXtFree() frees the descriptor and all memory associated with it. The descriptor and memory must have been returned by a previous call to cufftXtMalloc().
2.9.4.3 cufftXtMemcpy()
cufftResult cufftXtMemcpy(cufftHandle plan, void *dstPointer, void *srcPointer, cufftXtCopyType type);
cufftXtCopyType_t is an enumerated type for multiple GPU functions that specifies the type of copy for cufftXtMemcpy().
CUFFT_COPY_HOST_TO_DEVICE copies data from a contiguous host buffer to multiple device buffers,
in the layout cuFFT requires for input data. dstPointer must point to a cudaLibXtDesc structure,
and srcPointer must point to a host memory buffer.
CUFFT_COPY_DEVICE_TO_HOST copies data from multiple device buffers, in the layout cuFFT produces for output data, to a contiguous host buffer. dstPointer must point to a host memory buffer, and srcPointer must point to a cudaLibXtDesc structure.
CUFFT_COPY_DEVICE_TO_DEVICE copies data from multiple device buffers, in the layout cuFFT produces for output data, to multiple device buffers, in the layout cuFFT requires for input data. dstPointer and srcPointer must point to different cudaLibXtDesc structures (and therefore memory locations). That is, the copy cannot be in-place. Note that device_to_device cufftXtMemcpy() for 2D and 3D data is not currently supported.
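A sketch of the typical multiple GPU sequence built from these copy types (two GPUs are assumed; n, host_in and host_out are hypothetical; error handling elided):

cufftHandle plan;
size_t workSizes[2];
int gpus[2] = { 0, 1 };
cudaLibXtDesc *d_data;
cufftCreate(&plan);
cufftXtSetGPUs(plan, 2, gpus);
cufftMakePlan1d(plan, n, CUFFT_C2C, 1, workSizes);
cufftXtMalloc(plan, &d_data, CUFFT_XT_FORMAT_INPUT);              // per-GPU buffers
cufftXtMemcpy(plan, d_data, host_in, CUFFT_COPY_HOST_TO_DEVICE);  // distribute input
cufftXtExecDescriptorC2C(plan, d_data, d_data, CUFFT_FORWARD);    // execute in place
cufftXtMemcpy(plan, host_out, d_data, CUFFT_COPY_DEVICE_TO_HOST); // gather output
cufftXtFree(d_data);
cufftDestroy(plan);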
2.9.5.1 cudaXtDesc
A descriptor type used in multiple GPU routines that contains information about the GPUs and their memory locations.
struct cudaXtDesc_t {
    int version;                            //descriptor version
    int nGPUs;                              //number of GPUs
    int GPUs[MAX_CUDA_DESCRIPTOR_GPUS];     //array of device IDs
    void *data[MAX_CUDA_DESCRIPTOR_GPUS];   //array of pointers to data, one per GPU
    size_t size[MAX_CUDA_DESCRIPTOR_GPUS];  //array of data sizes, one per GPU
    void *cudaXtState;                      //opaque CUDA utility structure
};
typedef struct cudaXtDesc_t cudaXtDesc;
2.9.5.2 cudaLibXtDesc
A descriptor type used in multiple GPU routines that contains information about the library used.
struct cudaLibXtDesc_t {
    int version;             //descriptor version
    cudaXtDesc *descriptor;  //multi-GPU memory descriptor
    libFormat library;       //which library recognizes the format
    int subFormat;           //library specific enumerator of sub formats
    void *libDescriptor;     //library specific descriptor, e.g. FFT transform plan object
};
typedef struct cudaLibXtDesc_t cudaLibXtDesc;
2.10.1. cufftXtSetCallback()
cufftResult cufftXtSetCallback(cufftHandle plan, void **callbackRoutine, cufftXtCallbackType type, void **callerInfo);
cufftXtSetCallback() specifies a load or store callback to be used with the plan. This call
is valid only after a call to cufftMakePlan*(), which does the plan generation. If there was
already a callback of this type associated with the plan, this new callback routine replaces it.
If the new callback requires shared memory, you must call cufftXtSetCallbackSharedSize
with the amount of shared memory it needs. cuFFT will not retain the amount of shared memory
associated with the previous callback.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ callbackRoutine[In] – Array of callback routine pointers, one per GPU.
▶ type[In] – Type of callback routine.
▶ callerInfo[In] – Optional array of device pointers to caller specific information, one per GPU.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully associated the callback function with the
plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle, or a non-
default stream has been associated with the plan in cuFFT prior to 10.4.0.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
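A sketch of installing a complex load callback (callbacks require relocatable device code and linking against the static cuFFT library; myLoadCallback is a hypothetical identity callback shown only for illustration):

// Device-side: runs each time cuFFT loads an input element for this plan.
__device__ cufftComplex myLoadCallback(void *dataIn, size_t offset,
                                       void *callerInfo, void *sharedPointer) {
    return ((cufftComplex *)dataIn)[offset];  // identity load, for illustration
}
__device__ cufftCallbackLoadC d_loadCallbackPtr = myLoadCallback;

// Host-side: fetch the device function pointer, then attach it to a plan
// after cufftMakePlan*() has been called.
void attachLoadCallback(cufftHandle plan) {
    cufftCallbackLoadC h_loadCallbackPtr;
    cudaMemcpyFromSymbol(&h_loadCallbackPtr, d_loadCallbackPtr,
                         sizeof(h_loadCallbackPtr));
    cufftXtSetCallback(plan, (void **)&h_loadCallbackPtr, CUFFT_CB_LD_COMPLEX, NULL);
}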
2.10.2. cufftXtClearCallback()
cufftResult cufftXtClearCallback(cufftHandle plan, cufftXtCallbackType type);
cufftXtClearCallback() instructs cuFFT to stop invoking the specified callback type when
executing the plan. Only the specified callback is cleared. If no callback of this type had been
specified, the return code is CUFFT_SUCCESS.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ type[In] – Type of callback routine.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully disassociated the callback function from the plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle, or a non-
default stream has been associated with the plan in cuFFT prior to 10.4.0.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
2.10.3. cufftXtSetCallbackSharedSize()
cufftResult cufftXtSetCallbackSharedSize(cufftHandle plan, cufftXtCallbackType type, size_t sharedSize);
cufftXtSetCallbackSharedSize() instructs cuFFT to dynamically allocate shared memory
at launch time, for use by the callback. The maximum allowable amount of shared memory is
16K bytes. cuFFT passes a pointer to this shared memory to the callback routine at execution
time. This shared memory is only valid for the life of the load or store callback operation. During
execution, cuFFT may overwrite shared memory for its own purposes.
Parameters
▶ plan[In] – cufftHandle returned by cufftCreate.
▶ type[In] – Type of callback routine.
▶ sharedSize[In] – Amount of shared memory requested.
Return values
▶ CUFFT_SUCCESS – cuFFT will invoke the callback routine with a pointer to the
requested amount of shared memory.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle, or a non-
default stream has been associated with the plan in cuFFT prior to 10.4.0.
▶ CUFFT_INTERNAL_ERROR – An internal driver error was detected.
▶ CUFFT_ALLOC_FAILED – cuFFT will not be able to allocate the requested
amount of shared memory.
2.11. cufftSetStream()
cufftResult cufftSetStream(cufftHandle plan, cudaStream_t stream);
Associates a CUDA stream with a cuFFT plan. All kernel launches made during plan execution are
now done through the associated stream, enabling overlap with activity in other streams (e.g.
data copying). The association remains until the plan is destroyed or the stream is changed with
another call to cufftSetStream().
Note that starting from CUDA 11.2 (cuFFT 10.4.0), cufftSetStream() is supported on multi-
GPU plans. When associating a stream with a plan, cufftXtMemcpy() remains synchronous
across the multiple GPUs. For previous versions of cuFFT, cufftSetStream() will return an
error in multiple GPU plans.
Note that starting from CUDA 12.2 (cuFFT 11.0.8), on multi-GPU plans, stream can be associated
with any context on any GPU. However, repeated calls to cufftSetStream() with streams from
different contexts incur a small time penalty. Optimal performance is obtained when repeated
calls to cufftSetStream use streams from the same CUDA context.
Parameters
▶ plan[In] – The cufftHandle object to associate with the stream.
▶ stream[In] – A valid CUDA stream created with cudaStreamCreate(); 0 for
the default stream.
Return values
▶ CUFFT_SUCCESS – The stream was associated with the plan.
▶ CUFFT_INVALID_PLAN – The plan parameter is not a valid handle, or the plan is a multi-GPU plan in a cuFFT version prior to 10.4.0.
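A sketch of associating a stream so that execution can overlap work in other streams (plan, idata and odata are assumed to exist already):

cudaStream_t stream;
cudaStreamCreate(&stream);
cufftSetStream(plan, stream);  // subsequent kernel launches for this plan use 'stream'
cufftExecC2C(plan, idata, odata, CUFFT_FORWARD);
cudaStreamSynchronize(stream); // wait for the transform to complete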
2.12. cufftGetVersion()
cufftResult cufftGetVersion(int *version);
Returns the version number of cuFFT.
Parameters
▶ *version[In] – Pointer to the version number.
▶ *version[Out] – Contains the version number.
Return values
▶ CUFFT_SUCCESS – cuFFT successfully returned the version number.
2.13. cufftGetProperty()
cufftResult cufftGetProperty(libraryPropertyType type, int *value);
Returns in *value the value of the property described by type for the dynamically linked cuFFT library.
Parameters
▶ type[In] – CUDA library property.
▶ value[Out] – Contains the integer value for the requested property.
Return values
▶ CUFFT_SUCCESS – The property value was successfully returned.
▶ CUFFT_INVALID_TYPE – The property type is not recognized.
▶ CUFFT_INVALID_VALUE – value is NULL.
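For example, the version of the linked library can be queried wholesale or per component:

int version, major;
cufftGetVersion(&version);                // full encoded version number
cufftGetProperty(MAJOR_VERSION, &major);  // just the major component
printf("cuFFT version %d (major %d)\n", version, major);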
cuFFT performs un-normalized FFTs; that is, performing a forward FFT on an input data set followed
by an inverse FFT on the resulting set yields data that is equal to the input, scaled by the number of
elements. Scaling either transform by the reciprocal of the size of the data set is left for the user to
perform as seen fit.
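For example, after a forward transform followed by an inverse transform of n elements, each element equals the original value multiplied by n, so the caller can divide by n to recover the input (h_signal is a hypothetical host buffer holding the round-trip result):

// Undo the implicit scaling of a forward + inverse C2C round trip.
for (int i = 0; i < n; ++i) {
    h_signal[i].x /= n;  // real part
    h_signal[i].y /= n;  // imaginary part
}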
The corresponding function prototypes and pointer type definitions are as follows:
typedef cufftComplex (*cufftCallbackLoadC)(void *dataIn, size_t offset, void *callerInfo, void *sharedPointer);
2.14.4.1 cufftHandle
A handle type used to store and access cuFFT plans. The user receives a handle after creating a
cuFFT plan and uses this handle to execute the plan.
typedef unsigned int cufftHandle;
2.14.4.2 cufftReal
A single-precision, floating-point real data type.
typedef float cufftReal;
2.14.4.3 cufftDoubleReal
A double-precision, floating-point real data type.
typedef double cufftDoubleReal;
2.14.4.4 cufftComplex
A single-precision, floating-point complex data type that consists of interleaved real and imaginary
components.
typedef cuComplex cufftComplex;
2.14.4.5 cufftDoubleComplex
A double-precision, floating-point complex data type that consists of interleaved real and imaginary
components.
typedef cuDoubleComplex cufftDoubleComplex;
2.15.1. cudaDataType
The cudaDataType data type is an enumeration of the types supported by CUDA libraries.
typedef enum cudaDataType_t
{
    CUDA_R_16F = 2, // 16 bit real
    CUDA_C_16F = 6, // 16 bit complex
    CUDA_R_32F = 0, // 32 bit real
    CUDA_C_32F = 4, // 32 bit complex
    CUDA_R_64F = 1, // 64 bit real
    CUDA_C_64F = 5, // 64 bit complex
    CUDA_R_8I  = 3, // 8 bit real as a signed integer
    CUDA_C_8I  = 7, // 8 bit complex as a pair of signed integers
    CUDA_R_8U  = 8, // 8 bit real as an unsigned integer
    CUDA_C_8U  = 9  // 8 bit complex as a pair of unsigned integers
} cudaDataType;
2.15.2. libraryPropertyType
The libraryPropertyType data type is an enumeration of library property types (i.e., CUDA version X.Y.Z would yield MAJOR_VERSION = X, MINOR_VERSION = Y, PATCH_LEVEL = Z).
typedef enum libraryPropertyType_t
{
MAJOR_VERSION,
MINOR_VERSION,
PATCH_LEVEL
} libraryPropertyType;
For simple examples of complex and real 1D, 2D, and 3D transforms that use cuFFT to perform forward
and inverse FFTs, refer to the cuFFT Library samples on GitHub.
This chapter explains how data are distributed between the GPUs, before and after a multiple GPU
transform. For simplicity, it is assumed in this chapter that the caller has specified GPU 0 and GPU 1
to perform the transform.
For a 2D transform on 4 GPUs, consider an array declared in C as data[x][y], where x is 65 and y is 99. The surface is distributed prior to the transform such that each GPU receives a contiguous portion, divided in the x dimension. After the transform, each GPU again has a portion of the surface, but divided in the y dimension. GPUs 0…2 have surfaces with dimensions [65][25], and GPU 3 has a surface with dimensions [65][24].
For a 3D transform on 4 GPUs, consider an array declared in C as data[x][y][z], where x is 103, y is 122, and z is 64. The volume is distributed prior to the transform such that GPUs 0…2 each receive volumes with dimensions [26][122][64], and GPU 3 receives a volume with dimensions [25][122][64]. After the transform, each GPU again has a portion of the volume, but divided in the y dimension. GPUs 0 and 1 have volumes with dimensions [103][31][64], and GPUs 2 and 3 have volumes with dimensions [103][30][64].
On GPU 0:
string 0 has substrings with indices 0...7 64...71 128...135 ... 960...967
string 1 has substrings with indices 8...15 72...79 136...143 ... 968...975
...
On GPU 1:
string 4 has substrings with indices 32...39 96...103 160...167 ... 992...999
The cufftXtQueryPlan API allows the caller to retrieve a structure containing the number of strings,
the decomposition factors, and (in the case of power of 2 size) some useful mask and shift elements.
The example below shows how cufftXtQueryPlan is invoked. It also shows how to translate from an
index in the host input array to the corresponding index on the device, and vice versa.
/*
 * These routines demonstrate the use of cufftXtQueryPlan to get the 1D
 * factorization and convert between permuted and linear indexes.
 */

/*
 * Set up a 1D plan that will execute on GPU 0 and GPU 1, and query
 * the decomposition factors
 */
int main(int argc, char **argv){
    cufftHandle plan;
    cufftResult stat;
    int whichGPUs[2] = { 0, 1 };
    cufftXt1dFactors factors;
    int size = 1 << 20;   // illustrative transform size; declaration missing from the original excerpt
    size_t workSizes[2];  // one work-size entry per GPU; declaration missing from the original excerpt
    stat = cufftCreate( &plan );
    if (stat != CUFFT_SUCCESS) {
        printf("Create error %d\n",stat);
        return 1;
    }
    stat = cufftXtSetGPUs( plan, 2, whichGPUs );
    if (stat != CUFFT_SUCCESS) {
        printf("SetGPU error %d\n",stat);
        return 1;
    }
    stat = cufftMakePlan1d( plan, size, CUFFT_C2C, 1, workSizes );
    if (stat != CUFFT_SUCCESS) {
        printf("MakePlan error %d\n",stat);
        return 1;
    }
    stat = cufftXtQueryPlan( plan, (void *) &factors, CUFFT_QUERY_1D_FACTORS );
    if (stat != CUFFT_SUCCESS) {
        printf("QueryPlan error %d\n",stat);
        return 1;
    }
    printf("Factor 1 %zd, Factor2 %zd\n",factors.factor1,factors.factor2);
    cufftDestroy(plan);
    return 0;
}
/*
 * Given an index into a permuted array, and the GPU index, return the
 * corresponding linear index from the beginning of the input buffer.
 *
 * Parameters:
 *   factors    input: pointer to cufftXt1dFactors as returned by
 *              cufftXtQueryPlan
 *   permutedIx input: index of the desired element in the device output
 *              array
 *   linearIx   output: index of the corresponding input element in the
 *              host input buffer
 */
/*
 * Given a linear index into a 1D array, return the GPU containing the permuted
 * result, and the index from the start of the data buffer for that element.
 *
 * Parameters:
 *   factors    input: pointer to cufftXt1dFactors as returned by
 *              cufftXtQueryPlan
 *   linearIx   input: index of the desired element in the host input
 *              array
 *   permutedIx output: index of the corresponding result in the device
 *              output array
 *   GPUix      output: index of the GPU containing the result
 */
cufftResult linear2Permuted( cufftXt1dFactors * factors,
                             size_t linearIx,
                             size_t *permutedIx,
                             int *GPUIx ) {
    size_t indexInSubstring;
    size_t whichString;
    size_t whichSubstring;
    size_t whichStringMask;
    int whichStringShift;
    if (linearIx >= factors->size) {
        return CUFFT_INVALID_VALUE;
    }
    // get a useful additional mask and shift count
    whichStringMask = factors->stringCount - 1;
cuFFT differs from FFTW in that FFTW has many plans and a single execute function while cuFFT has
fewer plans, but multiple execute functions. The cuFFT execute functions determine the precision
(single or double) and whether the input is complex or real valued. The following table shows the
relationship between the two interfaces.
NVIDIA provides FFTW3 interfaces to the cuFFT library. This allows applications using FFTW to use NVIDIA GPUs with minimal modifications to program source code. To use the interface, first do the following steps:
▶ It is recommended that you replace the include file fftw3.h with cufftw.h
▶ Instead of linking with the double/single precision libraries such as fftw3/fftw3f, link with both the cuFFT and cuFFTW libraries
▶ Ensure the search path includes the directory containing cuda_runtime_api.h
After an application is working using the FFTW3 interface, users may want to modify their code to
move data to and from the GPU and use the routines documented in the FFTW Conversion Guide for
the best performance.
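As a sketch, an unmodified FFTW3-style code path may then run on the GPU with only the header swap, assuming the calls it makes are among the supported ones listed in the tables that follow:

#include <cufftw.h>  // instead of <fftw3.h>; link with the cuFFT and cuFFTW libraries

void forward_1d_example(int n) {
    fftwf_complex *in  = (fftwf_complex *)fftwf_malloc(sizeof(fftwf_complex) * n);
    fftwf_complex *out = (fftwf_complex *)fftwf_malloc(sizeof(fftwf_complex) * n);
    fftwf_plan p = fftwf_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftwf_execute(p);  // executed by cuFFT under the hood
    fftwf_destroy_plan(p);
    fftwf_free(in);
    fftwf_free(out);
}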
The following tables show which components and functions of FFTW3 are supported in cuFFT.
Note that for each of the double precision functions below there is a corresponding single precision
version with the letters fftw replaced by fftwf.
8.1. Notice
This document is provided for information purposes only and shall not be regarded as a warranty of a
certain functionality, condition, or quality of a product. NVIDIA Corporation (“NVIDIA”) makes no repre-
sentations or warranties, expressed or implied, as to the accuracy or completeness of the information
contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall
have no liability for the consequences or use of such information or for any infringement of patents
or other rights of third parties that may result from its use. This document is not a commitment to
develop, release, or deliver any Material (defined below), code, or functionality.
NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any
other changes to this document, at any time without notice.
Customer should obtain the latest relevant information before placing orders and should verify that
such information is current and complete.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the
time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by
authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects
to applying any customer general terms and conditions with regards to the purchase of the NVIDIA
product referenced in this document. No contractual obligations are formed either directly or indirectly
by this document.
NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military,
aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA
product can reasonably be expected to result in personal injury, death, or property or environmental
damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or
applications and therefore such inclusion and/or use is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on this document will be suitable for
any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA.
It is customer’s sole responsibility to evaluate and determine the applicability of any information con-
tained in this document, ensure the product is suitable and fit for the application planned by customer,
and perform the necessary testing for the application in order to avoid a default of the application or
the product. Weaknesses in customer’s product designs may affect the quality and reliability of the
NVIDIA product and may result in additional or different conditions and/or requirements beyond those
contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or prob-
lem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is
contrary to this document or (ii) customer product designs.
No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other
NVIDIA intellectual property right under this document. Information published by NVIDIA regarding
third-party products or services does not constitute a license from NVIDIA to use such products or
services or a warranty or endorsement thereof. Use of such information may require a license from a
third party under the patents or other intellectual property rights of the third party, or a license from
NVIDIA under the patents or other intellectual property rights of NVIDIA.
Reproduction of information in this document is permissible only if approved in advance by NVIDIA
in writing, reproduced without alteration and in full compliance with all applicable export laws and
regulations, and accompanied by all associated conditions, limitations, and notices.
THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS,
DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE
BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR
OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WAR-
RANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CON-
SEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARIS-
ING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY
OF SUCH DAMAGES. Notwithstanding any damages that customer might incur for any reason whatso-
ever, NVIDIA’s aggregate and cumulative liability towards customer for the products described herein
shall be limited in accordance with the Terms of Sale for the product.
8.2. OpenCL
OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.
8.3. Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the
U.S. and other countries. Other company and product names may be trademarks of the respective
companies with which they are associated.
Copyright
©2007-2023, NVIDIA Corporation & affiliates. All rights reserved.
Index
C
cufftCreate (C function), 29
cufftDestroy (C function), 30
cufftEstimate1d (C function), 38
cufftEstimate2d (C function), 38
cufftEstimate3d (C function), 39
cufftEstimateMany (C function), 39
cufftExecC2C (C function), 49
cufftExecC2R (C function), 50
cufftExecD2Z (C function), 49
cufftExecR2C (C function), 49
cufftExecZ2D (C function), 50
cufftExecZ2Z (C function), 49
cufftGetProperty (C function), 61
cufftGetSize (C function), 47
cufftGetSize1d (C function), 41
cufftGetSize2d (C function), 42
cufftGetSize3d (C function), 42
cufftGetSizeMany (C function), 43
cufftGetSizeMany64 (C function), 44
cufftGetVersion (C function), 61
cufftHandle (C type), 63
cufftMakePlan1d (C function), 30
cufftMakePlan2d (C function), 31
cufftMakePlan3d (C function), 32
cufftMakePlanMany (C function), 33
cufftMakePlanMany64 (C function), 34
cufftPlan1d (C function), 26
cufftPlan2d (C function), 26
cufftPlan3d (C function), 27
cufftPlanMany (C function), 28
cufftSetAutoAllocation (C function), 47
cufftSetStream (C function), 60
cufftSetWorkArea (C function), 48
cufftXtClearCallback (C function), 59
cufftXtExec (C function), 51
cufftXtExecDescriptor (C function), 51
cufftXtExecDescriptorC2C (C function), 53
cufftXtExecDescriptorC2R (C function), 55
cufftXtExecDescriptorD2Z (C function), 54
cufftXtExecDescriptorR2C (C function), 54
cufftXtExecDescriptorZ2D (C function), 55
cufftXtExecDescriptorZ2Z (C function), 53
cufftXtFree (C function), 56
cufftXtGetSizeMany (C function), 45
cufftXtMakePlanMany (C function), 36
cufftXtMalloc (C function), 55
cufftXtMemcpy (C function), 57
cufftXtSetCallback (C function), 58
cufftXtSetCallbackSharedSize (C function), 59
cufftXtSetGPUs (C function), 52
cufftXtSetWorkArea (C function), 53
cufftXtSetWorkAreaPolicy (C function), 48