CUDA

CUFFT Library

PG-05327-032_V01
August, 2010

Published by
NVIDIA Corporation
2701 San Tomas Expressway
Santa Clara, CA 95050

Notice
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS,
LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING
PROVIDED "AS IS". NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR
OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED
WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR
PURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no
responsibility for the consequences of use of such information or for any infringement of patents or other
rights of third parties that may result from its use. No license is granted by implication or otherwise under
any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are
subject to change without notice. This publication supersedes and replaces all information previously
supplied. NVIDIA Corporation products are not authorized for use as critical components in life support
devices or systems without express written approval of NVIDIA Corporation.

Trademarks
NVIDIA, CUDA, and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation
in the United States and other countries. Other company and product names may be trademarks of the
respective companies with which they are associated.

Copyright
© 2005–2010 by NVIDIA Corporation. All rights reserved.

Table of Contents

CUFFT Library
CUFFT Types and Definitions
    Type cufftHandle
    Type cufftResult
    Type cufftReal
    Type cufftDoubleReal
    Type cufftComplex
    Type cufftDoubleComplex
    Type cufftCompatibility
    CUFFT Transform Types
    CUFFT Transform Directions
    Streamed CUFFT Transforms
    FFTW Compatibility Mode
CUFFT API Functions
    Function cufftPlan1d()
    Function cufftPlan2d()
    Function cufftPlan3d()
    Function cufftPlanMany()
    Function cufftDestroy()
    Function cufftExecC2C()
    Function cufftExecR2C()
    Function cufftExecC2R()
    Function cufftExecZ2Z()
    Function cufftExecD2Z()
    Function cufftExecZ2D()
    Function cufftSetStream()
    Function cufftSetCompatibilityMode()
Accuracy and Performance
CUFFT Code Examples
    1D Complex-to-Complex Transforms
    1D Real-to-Complex Transforms
    2D Complex-to-Complex Transforms
    Batched 2D Complex-to-Complex Transforms
    2D Complex-to-Real Transforms
    3D Complex-to-Complex Transforms

CUFFT Library

This document describes CUFFT, the NVIDIA® CUDA™ Fast Fourier
Transform (FFT) library. The FFT is a divide-and-conquer algorithm
for efficiently computing discrete Fourier transforms of complex or
real-valued data sets, and it is one of the most important and widely
used numerical algorithms, with applications that include
computational physics and general signal processing. The CUFFT
library provides a simple interface for computing parallel FFTs on an
NVIDIA GPU, which allows users to leverage the floating-point power
and parallelism of the GPU without having to develop a custom,
GPU-based FFT implementation.

FFT libraries typically vary in terms of supported transform sizes and
data types. For example, some libraries only implement Radix-2 FFTs,
restricting the transform size to a power of two, while other
implementations support arbitrary transform sizes. This version of the
CUFFT library supports the following features:
- 1D, 2D, and 3D transforms of complex and real-valued data
- Batch execution for doing multiple transforms of any dimension in
  parallel
- 2D and 3D transform sizes in the range [2, 16384] in any
  dimension
- 1D transform sizes up to 8 million elements
- In-place and out-of-place transforms for real and complex data
- Double-precision transforms on compatible hardware (GT200 and
  later GPUs)
- Support for streamed execution, enabling simultaneous
  computation together with data movement


CUFFT Types and Definitions


The next sections describe the CUFFT types and transform directions:
- "Type cufftHandle"
- "Type cufftResult"
- "Type cufftReal"
- "Type cufftDoubleReal"
- "Type cufftComplex"
- "Type cufftDoubleComplex"
- "Type cufftCompatibility"
- "CUFFT Transform Types"
- "CUFFT Transform Directions"

Type cufftHandle
    typedef unsigned int cufftHandle;
A handle type used to store and access CUFFT plans (see "CUFFT API
Functions" for more information about plans). For example, the user
receives a handle after creating a CUFFT plan and uses this handle to
execute the plan.


Type cufftResult
    typedef enum cufftResult_t cufftResult;
An enumeration of values used exclusively as API function return
values. The possible return values are defined as follows:

Return Values
    CUFFT_SUCCESS           Any CUFFT operation is successful.
    CUFFT_INVALID_PLAN      CUFFT is passed an invalid plan handle.
    CUFFT_ALLOC_FAILED      CUFFT failed to allocate GPU memory.
    CUFFT_INVALID_TYPE      The user requests an unsupported type.
    CUFFT_INVALID_VALUE     The user specifies a bad memory pointer.
    CUFFT_INTERNAL_ERROR    Used for all internal driver errors.
    CUFFT_EXEC_FAILED       CUFFT failed to execute an FFT on the GPU.
    CUFFT_SETUP_FAILED      The CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE      The user specifies an unsupported FFT size.
    CUFFT_UNALIGNED_DATA    Input or output does not satisfy texture
                            alignment requirements.
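
Since every API function reports its status through these values, callers usually want to turn a code into readable text. The following is only a sketch under stated assumptions: demo_result is a hypothetical stand-in enum that lists the same names in the order of the table above (its numeric values are assumed, and it is not the real cufftResult from cufft.h), and demo_result_string() and demo_require_success() are hypothetical helpers, not CUFFT functions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for cufftResult so the sketch compiles without the
 * CUDA toolkit. With the real library, include cufft.h and use cufftResult. */
typedef enum {
    DEMO_SUCCESS,
    DEMO_INVALID_PLAN,
    DEMO_ALLOC_FAILED,
    DEMO_INVALID_TYPE,
    DEMO_INVALID_VALUE,
    DEMO_INTERNAL_ERROR,
    DEMO_EXEC_FAILED,
    DEMO_SETUP_FAILED,
    DEMO_INVALID_SIZE,
    DEMO_UNALIGNED_DATA
} demo_result;

/* Map a status code to the name used in the return-value table. */
static const char *demo_result_string(demo_result r)
{
    switch (r) {
    case DEMO_SUCCESS:        return "CUFFT_SUCCESS";
    case DEMO_INVALID_PLAN:   return "CUFFT_INVALID_PLAN";
    case DEMO_ALLOC_FAILED:   return "CUFFT_ALLOC_FAILED";
    case DEMO_INVALID_TYPE:   return "CUFFT_INVALID_TYPE";
    case DEMO_INVALID_VALUE:  return "CUFFT_INVALID_VALUE";
    case DEMO_INTERNAL_ERROR: return "CUFFT_INTERNAL_ERROR";
    case DEMO_EXEC_FAILED:    return "CUFFT_EXEC_FAILED";
    case DEMO_SETUP_FAILED:   return "CUFFT_SETUP_FAILED";
    case DEMO_INVALID_SIZE:   return "CUFFT_INVALID_SIZE";
    case DEMO_UNALIGNED_DATA: return "CUFFT_UNALIGNED_DATA";
    }
    return "unknown status";
}

/* Print the status name and stop (in debug builds) on any failure. */
static void demo_require_success(demo_result r)
{
    if (r != DEMO_SUCCESS) {
        fprintf(stderr, "CUFFT call failed: %s\n", demo_result_string(r));
    }
    assert(r == DEMO_SUCCESS);
}
```

In real code the same pattern would wrap each plan-creation and execution call, so that a failure is reported at the call that caused it.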

Type cufftReal
    typedef float cufftReal;
A single-precision, floating-point real data type.

Type cufftDoubleReal
    typedef double cufftDoubleReal;
A double-precision, floating-point real data type.

Type cufftComplex
    typedef cuComplex cufftComplex;
A single-precision, floating-point complex data type that consists of
interleaved real and imaginary components.
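
"Interleaved" means that each complex element stores its real part immediately followed by its imaginary part, so an array of N complex values occupies 2N consecutive floats. The sketch below illustrates that layout with a hypothetical stand-in type, demo_complex, whose members x (real) and y (imaginary) are assumed to mirror cuComplex; demo_interleave() is a hypothetical helper for packing data that starts out as separate real and imaginary arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the interleaved complex type:
 * x holds the real part, y holds the imaginary part. */
typedef struct {
    float x;
    float y;
} demo_complex;

/* Pack separate real and imaginary arrays into interleaved form. */
static void demo_interleave(const float *re, const float *im,
                            demo_complex *out, size_t n)
{
    assert(re != NULL && im != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i].x = re[i];   /* real part first */
        out[i].y = im[i];   /* imaginary part second */
    }
}
```

Viewed as a flat float array, the packed result reads re[0], im[0], re[1], im[1], and so on.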


Type cufftDoubleComplex
    typedef cuDoubleComplex cufftDoubleComplex;
A double-precision, floating-point complex data type that consists of
interleaved real and imaginary components.

Type cufftCompatibility
    typedef enum cufftCompatibility_t cufftCompatibility;
An enumeration of values used to control FFTW data compatibility.
See "FFTW Compatibility Mode" for details.

CUFFT Transform Types


The CUFFT library supports complex- and real-data transforms. The
cufftType data type is an enumeration of the types of transform data
supported by CUFFT:
    typedef enum cufftType_t {
        CUFFT_R2C = 0x2a,  // Real to complex (interleaved)
        CUFFT_C2R = 0x2c,  // Complex (interleaved) to real
        CUFFT_C2C = 0x29,  // Complex to complex, interleaved
        CUFFT_D2Z = 0x6a,  // Double to double-complex
        CUFFT_Z2D = 0x6c,  // Double-complex to double
        CUFFT_Z2Z = 0x69   // Double-complex to double-complex
    } cufftType;
For complex FFTs, the input and output arrays must interleave the real
and imaginary parts (the cufftComplex type). The transform size in
each dimension is the number of cufftComplex elements. The
CUFFT_C2C constant can be passed to any plan creation function to
configure a single-precision complex-to-complex FFT. Pass the
CUFFT_Z2Z constant to configure a double-precision complex-to-complex
FFT.
For real-to-complex FFTs, the output array holds only the
non-redundant complex coefficients. So for an N-element transform, the
output array holds $\lfloor N/2 \rfloor + 1$ cufftComplex terms. For
higher-dimensional real transforms of the form
$N_0 \times N_1 \times \dots \times N_n$, the last dimension is cut in
half such that the output data is
$N_0 \times N_1 \times \dots \times (\lfloor N_n/2 \rfloor + 1)$
complex elements. Therefore, in order to perform an in-place FFT, the
user has to pad the input array in the last dimension to
$\lfloor N_n/2 \rfloor + 1$ complex elements or
$2 (\lfloor N_n/2 \rfloor + 1)$ real elements. Note that the
real-to-complex transform is implicitly forward. Passing the CUFFT_R2C
constant to any plan creation function configures a single-precision
real-to-complex FFT. Passing the CUFFT_D2Z constant configures a
double-precision real-to-complex FFT.

The requirements for complex-to-real FFTs are similar to those for
real-to-complex. In this case, the input array holds only the
non-redundant, $\lfloor N/2 \rfloor + 1$ complex coefficients from a
real-to-complex transform. The output is simply N elements of type
cufftReal. However, for an in-place transform, the input size must be
padded to $2 (\lfloor N/2 \rfloor + 1)$ real elements. The
complex-to-real transform is implicitly inverse. Passing the CUFFT_C2R
constant to any plan creation function configures a single-precision
complex-to-real FFT. Passing the CUFFT_Z2D constant configures a
double-precision complex-to-real FFT.
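
The size rules for real-data transforms reduce to a little integer arithmetic, which is easy to get wrong when allocating buffers. The sketch below is a minimal illustration, assuming C integer division for N/2; the demo_ function names are hypothetical helpers and are not part of CUFFT.

```c
#include <assert.h>
#include <stddef.h>

/* Non-redundant complex elements of an N-point real transform: N/2 + 1. */
static size_t demo_r2c_complex_count(size_t n)
{
    assert(n > 0);
    return n / 2 + 1;
}

/* Real elements needed in the last dimension for an in-place
 * real transform: 2 * (N/2 + 1). */
static size_t demo_r2c_padded_reals(size_t n)
{
    return 2 * demo_r2c_complex_count(n);
}

/* Total complex elements for a real transform of size
 * dims[0] x dims[1] x ... x dims[rank-1]: only the last
 * dimension is reduced to dims[rank-1]/2 + 1. */
static size_t demo_r2c_total_complex(const size_t *dims, size_t rank)
{
    assert(dims != NULL && rank > 0);
    size_t total = 1;
    for (size_t i = 0; i + 1 < rank; ++i) {
        total *= dims[i];
    }
    return total * demo_r2c_complex_count(dims[rank - 1]);
}
```

For example, an 8-point real transform yields 5 complex coefficients and needs 10 real elements when run in place, and a 4 x 6 x 8 real transform yields 4 * 6 * 5 = 120 complex elements.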
For 1D complex-to-complex transforms, the stride between signals in a
batch is assumed to be the number of cufftComplex elements in the
logical transform size. However, for real-data FFTs, the distance
between signals in a batch depends on whether the transform is
in-place or out-of-place. For in-place FFTs, the input stride is
assumed to be $2 (\lfloor N/2 \rfloor + 1)$ cufftReal elements or
$\lfloor N/2 \rfloor + 1$ cufftComplex elements. For out-of-place
transforms, input and output strides match the logical transform size
N and the non-redundant size $\lfloor N/2 \rfloor + 1$, respectively.
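
To locate a given signal inside a batched 1D real-to-complex buffer, the distances described above can be turned into element offsets. This is a sketch only, assuming signals are numbered from 0 and packed back to back; the demo_ helper names are hypothetical and not part of CUFFT.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, in real elements, of input signal `index` in a batched 1D
 * real-to-complex transform of logical size n.
 * In-place: each signal occupies 2*(n/2 + 1) reals (padded).
 * Out-of-place: each signal occupies n reals. */
static size_t demo_r2c_input_offset(size_t n, size_t index, int in_place)
{
    assert(n > 0);
    size_t distance = in_place ? 2 * (n / 2 + 1) : n;
    return index * distance;
}

/* Offset, in complex elements, of output signal `index`:
 * each signal occupies n/2 + 1 complex elements in both cases. */
static size_t demo_r2c_output_offset(size_t n, size_t index)
{
    assert(n > 0);
    return index * (n / 2 + 1);
}
```

With n = 8, the third signal (index 2) starts at real element 20 for an in-place transform and at real element 16 for an out-of-place one; its output starts at complex element 10 either way.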
Starting with CUFFT version 3.0, batched transforms are supported
through the cufftPlanMany() function. Although this function takes
input parameters that specify input- and output-data strides, as of
version 3.0 it is assumed that the data for each signal within the
batch immediately follows the data of the previous one (a stride of 1).

CUFFT Transform Directions


The CUFFT library defines forward and inverse Fast Fourier
Transforms according to the sign of the complex exponential term:
    #define CUFFT_FORWARD -1
    #define CUFFT_INVERSE 1
For higher-dimensional transforms (2D and 3D), CUFFT performs
FFTs in row-major or C order. For example, if the user requests a 3D
transform plan for sizes X, Y, and Z, CUFFT transforms along Z, Y, and
then X. The user can configure column-major FFTs by simply changing
the order of the size parameters to the plan creation API functions.
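
Row-major (C) order means the last size parameter is the fastest-varying index in memory. A minimal sketch of the corresponding linear index for a 3D array of sizes nx, ny, nz follows; demo_index3d() is a hypothetical helper, not part of CUFFT.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index of element (x, y, z) in a row-major array of size
 * nx x ny x nz: z varies fastest, then y, then x. */
static size_t demo_index3d(size_t x, size_t y, size_t z,
                           size_t ny, size_t nz)
{
    assert(y < ny && z < nz);
    return (x * ny + y) * nz + z;
}
```

Stepping z by one moves one element in memory, stepping y moves nz elements, and stepping x moves ny * nz elements. Data stored in column-major order can therefore be described by passing the sizes in reverse order when creating the plan.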
CUFFT performs un-normalized FFTs; that is, performing a forward
FFT on an input data set followed by an inverse FFT on the resulting
set yields data that is equal to the input scaled by the number of
elements. Scaling either transform by the reciprocal of the size of the
data set is left for the user to perform as seen fit.
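
The sign convention and the missing normalization can both be seen in a direct evaluation of the discrete Fourier transform. The sketch below is a reference 4-point DFT written for illustration, not the FFT algorithm CUFFT uses; the size is fixed at 4 so that every twiddle factor is exactly 0 or ±1 and no math library is needed. The names demo_zcomplex, demo_dft4, DEMO_FORWARD, and DEMO_INVERSE are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double x;   /* real part */
    double y;   /* imaginary part */
} demo_zcomplex;

#define DEMO_FORWARD (-1)
#define DEMO_INVERSE 1

/* Direct 4-point DFT without normalization:
 * out[k] = sum over j of in[j] * exp(sign * 2*pi*i * j*k / 4). */
static void demo_dft4(const demo_zcomplex in[4], demo_zcomplex out[4],
                      int sign)
{
    /* cos(2*pi*m/4) and sin(2*pi*m/4) for m = 0..3 */
    static const double cos4[4] = { 1.0, 0.0, -1.0, 0.0 };
    static const double sin4[4] = { 0.0, 1.0, 0.0, -1.0 };

    assert(sign == DEMO_FORWARD || sign == DEMO_INVERSE);
    for (size_t k = 0; k < 4; ++k) {
        double re = 0.0;
        double im = 0.0;
        for (size_t j = 0; j < 4; ++j) {
            size_t m = (j * k) % 4;
            double c = cos4[m];
            double s = (double)sign * sin4[m];
            /* complex multiply in[j] * (c + i*s) and accumulate */
            re += in[j].x * c - in[j].y * s;
            im += in[j].x * s + in[j].y * c;
        }
        out[k].x = re;
        out[k].y = im;
    }
}
```

Applying demo_dft4() with DEMO_FORWARD and then with DEMO_INVERSE returns the input multiplied by 4, so dividing either result by the number of elements restores the original data.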

Streamed CUFFT Transforms


Execution of a transform of a particular size and type may take several
stages of processing. A plan for the transform is generated, in which
CUFFT specifies the internal steps that need to be taken. These steps
may include multiple kernel launches, memory copies, and so on.

Every CUFFT plan may be associated with a CUDA stream. Once so
associated, all launches of the internal stages of that plan take place
through the specified stream. Streaming of launches allows for
potential overlap between transforms and memory copies; see the
NVIDIA CUDA Programming Guide for more information on streams. If
no stream is associated with a plan, launches take place in stream 0
(the default CUDA stream).

FFTW Compatibility Mode


For some transform sizes, FFTW requires additional padding bytes
between rows and planes of Real2Complex (R2C) and Complex2Real
(C2R) transforms of rank greater than 1. (For details, please refer to the
FFTW online documentation at http://www.fftw.org.)

To bring the speed of R2C and C2R transforms of power-of-2 sizes close
to that of their Complex2Complex (C2C) equivalents, one can disable
the FFTW-compatible layout using cufftSetCompatibilityMode(),
introduced in release 3.1 and described in "Function
cufftSetCompatibilityMode()". When native mode is selected with this
function, power-of-2 transform sizes are compact and CUFFT does not
use padding. Non-power-of-2 sizes continue to use the same padding
layout as FFTW.


The FFTW compatibility modes are as follows:
    CUFFT_COMPATIBILITY_NATIVE
    CUFFT_COMPATIBILITY_FFTW_PADDING
    CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC
    CUFFT_COMPATIBILITY_FFTW_ALL

CUFFT_COMPATIBILITY_NATIVE mode disables FFTW compatibility, but
achieves the highest performance.

CUFFT_COMPATIBILITY_FFTW_PADDING supports FFTW data padding by
inserting extra padding between packed in-place transforms for
batched transforms with power-of-2 size.

CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC waives the C2R symmetry
requirement. Once set, it guarantees FFTW-compatible output for
non-symmetric complex inputs for transforms with power-of-2 size. This
is only useful for artificial (that is, random) data sets, as actual
data will always be symmetric if it has come from the real plane.
Enabling this mode can significantly impact performance.

CUFFT_COMPATIBILITY_FFTW_ALL enables full FFTW compatibility. Refer
to the FFTW documentation (http://www.fftw.org) for FFTW data
layout specifications.


CUFFT API Functions


The CUFFT API is modeled after FFTW, which is one of the most
popular and efficient CPU-based FFT libraries. FFTW provides a
simple configuration mechanism called a plan that completely specifies
the optimal (that is, the minimum floating-point operation, or flop)
plan of execution for a particular FFT size and data type. The
advantage of this approach is that once the user creates a plan, the
library stores whatever state is needed to execute the plan multiple
times without recalculation of the configuration. The FFTW model
works well for CUFFT because different kinds of FFTs require different
thread configurations and GPU resources, and plans are a simple way
to store and reuse configurations.

The CUFFT library initializes internal data upon the first invocation of
an API function. Therefore, all API functions could return the
CUFFT_SETUP_FAILED error code if the library fails to initialize. CUFFT
shuts down automatically when all user-created FFT plans are
destroyed.

The CUFFT functions are as follows:
- "Function cufftPlan1d()"
- "Function cufftPlan2d()"
- "Function cufftPlan3d()"
- "Function cufftPlanMany()"
- "Function cufftDestroy()"
- "Function cufftExecC2C()"
- "Function cufftExecR2C()"
- "Function cufftExecC2R()"
- "Function cufftExecZ2Z()"
- "Function cufftExecD2Z()"
- "Function cufftExecZ2D()"
- "Function cufftSetStream()"
- "Function cufftSetCompatibilityMode()"


Function cufftPlan1d()
    cufftResult
    cufftPlan1d(
        cufftHandle *plan, int nx, cufftType type, int batch);
Creates a 1D FFT plan configuration for a specified signal size and
data type. The batch input parameter tells CUFFT how many 1D
transforms to configure.

Input
    plan     Pointer to a cufftHandle object
    nx       The transform size (e.g., 256 for a 256-point FFT)
    type     The transform data type (e.g., CUFFT_C2C for complex to complex)
    batch    Number of transforms of size nx

Output
    plan     Contains a CUFFT 1D plan handle value

Return Values
    CUFFT_SUCCESS           CUFFT successfully created the FFT plan.
    CUFFT_ALLOC_FAILED      Allocation of GPU resources for the plan failed.
    CUFFT_INVALID_TYPE      The type parameter is not supported.
    CUFFT_INTERNAL_ERROR    Internal driver error is detected.
    CUFFT_SETUP_FAILED      CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE      The nx parameter is not a supported size.

Function cufftPlan2d()
    cufftResult
    cufftPlan2d(
        cufftHandle *plan, int nx, int ny, cufftType type);
Creates a 2D FFT plan configuration according to specified signal sizes
and data type. This function is the same as cufftPlan1d() except that
it takes a second size parameter, ny, and does not support batching.

Input
    plan     Pointer to a cufftHandle object
    nx       The transform size in the X-dimension (number of rows)
    ny       The transform size in the Y-dimension (number of columns)
    type     The transform data type (e.g., CUFFT_C2R for complex to real)

Output
    plan     Contains a CUFFT 2D plan handle value

Return Values
    CUFFT_SUCCESS           CUFFT successfully created the FFT plan.
    CUFFT_ALLOC_FAILED      Allocation of GPU resources for the plan failed.
    CUFFT_INVALID_TYPE      The type parameter is not supported.
    CUFFT_INTERNAL_ERROR    Internal driver error is detected.
    CUFFT_SETUP_FAILED      CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE      The nx or ny parameter is not a supported size.

Function cufftPlan3d()
    cufftResult
    cufftPlan3d(
        cufftHandle *plan, int nx, int ny, int nz,
        cufftType type);
Creates a 3D FFT plan configuration according to specified signal sizes
and data type. This function is the same as cufftPlan2d() except that
it takes a third size parameter, nz.

Input
    plan     Pointer to a cufftHandle object
    nx       The transform size in the X-dimension
    ny       The transform size in the Y-dimension
    nz       The transform size in the Z-dimension
    type     The transform data type (e.g., CUFFT_R2C for real to complex)

Output
    plan     Contains a CUFFT 3D plan handle value

Return Values
    CUFFT_SUCCESS           CUFFT successfully created the FFT plan.
    CUFFT_ALLOC_FAILED      Allocation of GPU resources for the plan failed.
    CUFFT_INVALID_TYPE      The type parameter is not supported.
    CUFFT_INTERNAL_ERROR    Internal driver error is detected.
    CUFFT_SETUP_FAILED      CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE      The nx, ny, or nz parameter is not a supported size.


Function cufftPlanMany()
    cufftResult
    cufftPlanMany(
        cufftHandle *plan, int rank, int *n, int *inembed,
        int istride, int idist, int *onembed, int ostride,
        int odist, cufftType type, int batch);
Creates an FFT plan configuration of dimension rank, with sizes
specified in the array n. The batch input parameter tells CUFFT how
many transforms to configure in parallel. With this function, batched
plans of any dimension may be created.

Input parameters inembed, istride, and idist and output parameters
onembed, ostride, and odist will allow setup of non-contiguous input
data in a future version. Note that for the current version of CUFFT,
these parameters are ignored and the layout of batched data must be
side-by-side and not interleaved.

Input
    plan       Pointer to a cufftHandle object
    rank       Dimensionality of the transform (1, 2, or 3)
    n          An array of size rank, describing the size of each dimension
    inembed    Unused: pass NULL
    istride    Unused: pass 1
    idist      Unused: pass 0
    onembed    Unused: pass NULL
    ostride    Unused: pass 1
    odist      Unused: pass 0
    type       Transform data type (e.g., CUFFT_C2C, as per other CUFFT calls)
    batch      Batch size for this transform

Output
    plan       Contains a CUFFT plan handle

Return Values
    CUFFT_SUCCESS           CUFFT successfully created the FFT plan.
    CUFFT_ALLOC_FAILED      Allocation of GPU resources for the plan failed.
    CUFFT_INVALID_TYPE      The type parameter is not supported.
    CUFFT_INTERNAL_ERROR    Internal driver error is detected.
    CUFFT_SETUP_FAILED      CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE      A size given in the array n is not a supported size.

Function cufftDestroy()
    cufftResult
    cufftDestroy(cufftHandle plan);
Frees all GPU resources associated with a CUFFT plan and destroys
the internal plan data structure. This function should be called once a
plan is no longer needed to avoid wasting GPU memory.

Input
    plan     The cufftHandle object of the plan to be destroyed.

Return Values
    CUFFT_SUCCESS         CUFFT successfully destroyed the FFT plan.
    CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
    CUFFT_SETUP_FAILED    CUFFT library failed to initialize.

Function cufftExecC2C()
    cufftResult
    cufftExecC2C(
        cufftHandle plan, cufftComplex *idata,
        cufftComplex *odata, int direction);
Executes a CUFFT single-precision complex-to-complex transform
plan as specified by direction. CUFFT uses as input data the GPU
memory pointed to by the idata parameter. This function stores the
Fourier coefficients in the odata array. If idata and odata are the same,
this method does an in-place transform.

Input
    plan         The cufftHandle object for the plan to execute
    idata        Pointer to the single-precision complex input data (in GPU
                 memory) to transform
    odata        Pointer to the single-precision complex output data (in GPU
                 memory)
    direction    The transform direction: CUFFT_FORWARD or CUFFT_INVERSE

Output
    odata        Contains the complex Fourier coefficients

Return Values
    CUFFT_SUCCESS           CUFFT successfully executed the FFT plan.
    CUFFT_INVALID_PLAN      The plan parameter is not a valid handle.
    CUFFT_INVALID_VALUE     The idata, odata, and/or direction parameter
                            is not valid.
    CUFFT_INTERNAL_ERROR    Internal driver error is detected.
    CUFFT_EXEC_FAILED       CUFFT failed to execute the transform on GPU.
    CUFFT_SETUP_FAILED      CUFFT library failed to initialize.
    CUFFT_UNALIGNED_DATA    Input or output does not satisfy texture
                            alignment requirements.

Function cufftExecR2C()
    cufftResult
    cufftExecR2C(
        cufftHandle plan, cufftReal *idata, cufftComplex *odata);
Executes a CUFFT single-precision real-to-complex (implicitly
forward) transform plan. CUFFT uses as input data the GPU memory
pointed to by the idata parameter. This function stores the
non-redundant Fourier coefficients in the odata array. If idata and
odata are the same, this method does an in-place transform. (See
"CUFFT Transform Types" for details on real data FFTs.)

Input
    plan     The cufftHandle object for the plan to execute
    idata    Pointer to the single-precision real input data (in GPU memory)
             to transform
    odata    Pointer to the single-precision complex output data (in GPU
             memory)

Output
    odata    Contains the complex Fourier coefficients

Return Values
    CUFFT_SUCCESS           CUFFT successfully executed the FFT plan.
    CUFFT_INVALID_PLAN      The plan parameter is not a valid handle.
    CUFFT_INVALID_VALUE     The idata and/or odata parameter is not valid.
    CUFFT_INTERNAL_ERROR    Internal driver error is detected.
    CUFFT_EXEC_FAILED       CUFFT failed to execute the transform on GPU.
    CUFFT_SETUP_FAILED      CUFFT library failed to initialize.
    CUFFT_UNALIGNED_DATA    Input or output does not satisfy texture
                            alignment requirements.

Function cufftExecC2R()
    cufftResult
    cufftExecC2R(
        cufftHandle plan, cufftComplex *idata, cufftReal *odata);
Executes a CUFFT single-precision complex-to-real (implicitly inverse)
transform plan. CUFFT uses as input data the GPU memory pointed to
by the idata parameter. The input array holds only the non-redundant
complex Fourier coefficients. This function stores the real output
values in the odata array. If idata and odata are the same, this method
does an in-place transform. (See "CUFFT Transform Types" for details
on real data FFTs.)

Input
    plan     The cufftHandle object for the plan to execute
    idata    Pointer to the single-precision complex input data (in GPU
             memory) to transform
    odata    Pointer to the single-precision real output data (in GPU
             memory)

Output
    odata    Contains the real-valued output data

Return Values
    CUFFT_SUCCESS           CUFFT successfully executed the FFT plan.
    CUFFT_INVALID_PLAN      The plan parameter is not a valid handle.
    CUFFT_INVALID_VALUE     The idata and/or odata parameter is not valid.
    CUFFT_INTERNAL_ERROR    Internal driver error is detected.
    CUFFT_EXEC_FAILED       CUFFT failed to execute the transform on GPU.
    CUFFT_SETUP_FAILED      CUFFT library failed to initialize.
    CUFFT_UNALIGNED_DATA    Input or output does not satisfy texture
                            alignment requirements.

Function cufftExecZ2Z()
    cufftResult
    cufftExecZ2Z(
        cufftHandle plan, cufftDoubleComplex *idata,
        cufftDoubleComplex *odata, int direction);
Executes a CUFFT double-precision complex-to-complex transform
plan as specified by direction. CUFFT uses as input data the GPU
memory pointed to by the idata parameter. This function stores the
Fourier coefficients in the odata array. If idata and odata are the same,
this method does an in-place transform.

Input
    plan         The cufftHandle object for the plan to execute
    idata        Pointer to the double-precision complex input data (in GPU
                 memory) to transform
    odata        Pointer to the double-precision complex output data (in GPU
                 memory)
    direction    The transform direction: CUFFT_FORWARD or CUFFT_INVERSE

Output
    odata        Contains the complex Fourier coefficients

Return Values
    CUFFT_SUCCESS           CUFFT successfully executed the FFT plan.
    CUFFT_INVALID_PLAN      The plan parameter is not a valid handle.
    CUFFT_INVALID_VALUE     The idata, odata, and/or direction parameter
                            is not valid.
    CUFFT_INTERNAL_ERROR    Internal driver error is detected.
    CUFFT_EXEC_FAILED       CUFFT failed to execute the transform on GPU.
    CUFFT_SETUP_FAILED      CUFFT library failed to initialize.
    CUFFT_UNALIGNED_DATA    Input or output does not satisfy texture
                            alignment requirements.


Function cufftExecD2Z()
    cufftResult
    cufftExecD2Z(
        cufftHandle plan, cufftDoubleReal *idata,
        cufftDoubleComplex *odata);
Executes a CUFFT double-precision real-to-complex (implicitly
forward) transform plan. CUFFT uses as input data the GPU memory
pointed to by the idata parameter. This function stores the
non-redundant Fourier coefficients in the odata array. If idata and
odata are the same, this method does an in-place transform. (See
"CUFFT Transform Types" for details on real data FFTs.)

Input
    plan     The cufftHandle object for the plan to execute
    idata    Pointer to the double-precision real input data (in GPU
             memory) to transform
    odata    Pointer to the double-precision complex output data (in GPU
             memory)

Output
    odata    Contains the complex Fourier coefficients

Return Values
    CUFFT_SUCCESS           CUFFT successfully executed the FFT plan.
    CUFFT_INVALID_PLAN      The plan parameter is not a valid handle.
    CUFFT_INVALID_VALUE     The idata and/or odata parameter is not valid.
    CUFFT_INTERNAL_ERROR    Internal driver error is detected.
    CUFFT_EXEC_FAILED       CUFFT failed to execute the transform on GPU.
    CUFFT_SETUP_FAILED      CUFFT library failed to initialize.
    CUFFT_UNALIGNED_DATA    Input or output does not satisfy texture
                            alignment requirements.

Function cufftExecZ2D()
    cufftResult
    cufftExecZ2D(
        cufftHandle plan, cufftDoubleComplex *idata,
        cufftDoubleReal *odata);
Executes a CUFFT double-precision complex-to-real (implicitly
inverse) transform plan. CUFFT uses as input data the GPU memory
pointed to by the idata parameter. The input array holds only the
non-redundant complex Fourier coefficients. This function stores the
real output values in the odata array. If idata and odata are the same,
this method does an in-place transform. (See "CUFFT Transform Types"
for details on real data FFTs.)

Input
    plan     The cufftHandle object for the plan to execute
    idata    Pointer to the double-precision complex input data (in GPU
             memory) to transform
    odata    Pointer to the double-precision real output data (in GPU
             memory)

Output
    odata    Contains the real-valued output data

Return Values
    CUFFT_SUCCESS           CUFFT successfully executed the FFT plan.
    CUFFT_INVALID_PLAN      The plan parameter is not a valid handle.
    CUFFT_INVALID_VALUE     The idata and/or odata parameter is not valid.
    CUFFT_INTERNAL_ERROR    Internal driver error is detected.
    CUFFT_EXEC_FAILED       CUFFT failed to execute the transform on GPU.
    CUFFT_SETUP_FAILED      CUFFT library failed to initialize.
    CUFFT_UNALIGNED_DATA    Input or output does not satisfy texture
                            alignment requirements.

Function cufftSetStream()
cufftResult
cufftSetStream(cufftHandle plan, cudaStream_t stream);

Associates a CUDA stream with a CUFFT plan. All kernel launches
made during plan execution are now done through the associated
stream, enabling overlap with activity in other streams (for example,
data copying). The association remains until the plan is destroyed or
the stream is changed with another call to cufftSetStream().

Input
plan      The cufftHandle object to associate with the stream
stream    A valid CUDA stream created with cudaStreamCreate() (or 0 for the default stream)

Return Values
CUFFT_SUCCESS         The stream was associated with the plan.
CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.

Function cufftSetCompatibilityMode()
cufftResult
cufftSetCompatibilityMode(cufftHandle plan, cufftCompatibility mode);

Configures the layout of CUFFT output in FFTW-compatible modes.
When FFTW compatibility is desired, it can be configured for padding
only, for asymmetric complex inputs only, or to be fully compatible.

Input
plan    The cufftHandle object for the plan to configure
mode    The cufftCompatibility option to be used (see "Type cufftCompatibility" on page 7):
          CUFFT_COMPATIBILITY_NATIVE
          CUFFT_COMPATIBILITY_FFTW_PADDING (Default)
          CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC
          CUFFT_COMPATIBILITY_FFTW_ALL

Return Values
CUFFT_SUCCESS         CUFFT successfully set the compatibility mode for the plan.
CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
CUFFT_SETUP_FAILED    CUFFT library failed to initialize.

Accuracy and Performance


A general DFT can be implemented as a matrix-vector multiplication
that requires O(N^2) operations. However, the CUFFT Library employs
the Cooley-Tukey algorithm to reduce the number of required
operations and, thereby, to optimize the performance of particular
transform sizes. This algorithm expresses a DFT recursively in terms of
smaller DFT building blocks. The CUFFT Library implements the
following DFT building blocks: radix-2, radix-3, radix-5, and radix-7.
Hence the performance of any transform size that can be factored as
2^a * 3^b * 5^c * 7^d (where a, b, c, and d are non-negative integers) is
optimized in the CUFFT library. For other sizes, single dimensional
transforms are handled by the Bluestein algorithm, which is built on
top of the Cooley-Tukey algorithm. The accuracy of the Bluestein
implementation degrades with larger sizes compared to the pure
Cooley-Tukey code path, specifically in single-precision mode, due to
the accumulation of floating-point operation inaccuracies. On the
other hand, the pure Cooley-Tukey implementation has excellent
accuracy, with the relative error growing proportionally to log2(N),
where N is the transform size in points.
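
The factorization condition above can be checked on the host before a
plan is created. The following is a minimal sketch in plain C; it is not
part of the CUFFT API, and the function names has_only_radix_2357 and
next_radix_2357_size are hypothetical. It only tests the documented
condition and does not reproduce how the library selects a code path.

```c
#include <assert.h>

/* Returns 1 if n = 2^a * 3^b * 5^c * 7^d for non-negative a, b, c, d,
   and 0 otherwise. Hypothetical helper, not a CUFFT function. */
int has_only_radix_2357(int n)
{
    static const int radices[4] = { 2, 3, 5, 7 };
    assert(n > 0);
    for (int i = 0; i < 4; ++i) {
        /* Divide out every occurrence of this radix. */
        while (n % radices[i] == 0)
            n /= radices[i];
    }
    /* Anything left over is a prime factor other than 2, 3, 5, or 7. */
    return n == 1;
}

/* Smallest size >= n whose only prime factors are 2, 3, 5, and 7.
   Intended for modest sizes; it does not guard against int overflow. */
int next_radix_2357_size(int n)
{
    assert(n > 0);
    while (!has_only_radix_2357(n))
        ++n;
    return n;
}
```

For example, next_radix_2357_size(1001) returns 1008 (2^4 * 3^2 * 7).
Padding a signal to such a size changes the transform that is computed,
so this is only an option where the application tolerates it.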
For sizes handled by the Cooley-Tukey code path (that is, sizes whose
only prime factors are 2, 3, 5, and 7), the most efficient
implementation is obtained by applying the following constraints
(listed in order from the most generic to the most specialized
constraint, with each subsequent constraint providing the potential of
an additional performance improvement).
• Restrict the size along any dimension to be a power of a single
  factor (2, 3, 5, or 7) only. For example, a transform of size 3^n
  will likely be faster than one of size 2^i * 3^j, even if the latter
  is slightly smaller.
• Restrict the power-of-two factorization term of the X-dimension to
  be a multiple of at least 16 for single-precision transforms or at
  least 8 for double-precision transforms. This aids with memory
  coalescing on Tesla-class and Fermi-class GPUs.
• Restrict the power-of-two factorization term of the X-dimension to
  be a multiple of either 256 for single-precision transforms or 64
  for double-precision transforms. This further aids with memory
  coalescing on Tesla-class and Fermi-class GPUs.
• Restrict the X-dimension of single-precision transforms to be
  strictly a power of two between either 2 and 2048 for Tesla-class
  GPUs or 2 and 8192 for Fermi-class GPUs. These transforms are
  implemented as specialized hand-coded kernels that keep all
  intermediate results in shared memory.
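
The constraints on the power-of-two factorization term can be evaluated
for a candidate X-dimension with a few lines of host code. The sketch
below is plain C and assumes only the thresholds stated in the list
above; pow2_factor, coalescing_tier, and fits_specialized_kernel are
hypothetical names, not CUFFT functions.

```c
#include <assert.h>

/* Largest power of two that divides n, that is, the power-of-two
   factorization term of n. */
int pow2_factor(int n)
{
    int p = 1;
    assert(n > 0);
    while (n % 2 == 0) {
        n /= 2;
        p *= 2;
    }
    return p;
}

/* Returns 2 if the power-of-two term is a multiple of 256 (single
   precision) or 64 (double precision), 1 if it is a multiple of 16
   (single) or 8 (double), and 0 otherwise. */
int coalescing_tier(int nx, int is_double)
{
    int p = pow2_factor(nx);
    int first = is_double ? 8 : 16;
    int second = is_double ? 64 : 256;
    if (p % second == 0)
        return 2;
    if (p % first == 0)
        return 1;
    return 0;
}

/* Returns 1 if nx is a power of two in [2, max_pow2]. Per the last
   constraint, max_pow2 would be 2048 (Tesla class) or 8192 (Fermi
   class) for single-precision transforms. */
int fits_specialized_kernel(int nx, int max_pow2)
{
    return nx >= 2 && nx <= max_pow2 && pow2_factor(nx) == nx;
}
```

For example, an X-dimension of 768 (256 * 3) reaches tier 2 in single
precision, while 100 (4 * 25) reaches neither threshold.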
Starting with version 3.1 of the CUFFT Library, the conjugate
symmetry property of real-to-complex output data arrays and
complex-to-real input data arrays is exploited; specifically, when the
power-of-two factorization term of the X-dimension is at least a
multiple of 4. Large 1D sizes (powers-of-two larger than 65,536) and
2D and 3D transforms benefit the most from the performance
optimizations in the implementation of real-to-complex or
complex-to-real transforms.
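
The conjugate symmetry mentioned above means that, for real input of
length N, coefficient X[N-k] is the complex conjugate of X[k], so only
N/2+1 coefficients carry information. The sketch below illustrates this
on the host with a direct 4-point DFT in plain C. It is an illustration
only and says nothing about how CUFFT computes the transform; complex_f
and dft4_real are hypothetical names, with complex_f borrowing the x/y
field names of cufftComplex.

```c
#include <assert.h>

/* Hypothetical host-side complex type: x is the real part, y the
   imaginary part. */
typedef struct { float x, y; } complex_f;

/* Forward 4-point DFT of real input:
   X[k] = sum over n of x[n] * W^(n*k), with W = exp(-2*pi*i/4) = -i. */
void dft4_real(const float x[4], complex_f X[4])
{
    /* W^0..W^3 = 1, -i, -1, i. These are exact, so no trigonometric
       functions are needed for N = 4. */
    static const complex_f w[4] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };
    for (int k = 0; k < 4; ++k) {
        X[k].x = 0.0f;
        X[k].y = 0.0f;
        for (int n = 0; n < 4; ++n) {
            X[k].x += x[n] * w[(n * k) % 4].x;
            X[k].y += x[n] * w[(n * k) % 4].y;
        }
    }
}
```

For the input {1, 2, 3, 4} the coefficients are 10, -2+2i, -2, and
-2-2i: X[3] is the conjugate of X[1], so storing X[0] through X[2]
(N/2+1 = 3 values) is sufficient.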

CUFFT Code Examples


This section provides six simple examples of 1D, 2D, and 3D complex
and real data transforms that use CUFFT to perform forward and
inverse FFTs. The examples are as follows:
• "1D Complex-to-Complex Transforms" on page 24
• "1D Real-to-Complex Transforms" on page 25
• "2D Complex-to-Complex Transforms" on page 26
• "Batched 2D Complex-to-Complex Transforms" on page 27
• "2D Complex-to-Real Transforms" on page 28
• "3D Complex-to-Complex Transforms" on page 29

1D Complex-to-Complex Transforms
#define NX 256
#define BATCH 10

cufftHandle plan;
cufftComplex *data;
cudaMalloc((void**)&data, sizeof(cufftComplex)*NX*BATCH);

/* Create a 1D FFT plan. */
cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);

/* Use the CUFFT plan to transform the signal in place. */
cufftExecC2C(plan, data, data, CUFFT_FORWARD);

/* Inverse transform the signal in place. */
cufftExecC2C(plan, data, data, CUFFT_INVERSE);

/* Note:
   (1) Divide by number of elements in data set to get back original data
   (2) Identical pointers to input and output arrays implies in-place
       transformation
*/

/* Destroy the CUFFT plan. */
cufftDestroy(plan);
cudaFree(data);
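
Note (1) in the example above can be made concrete. CUFFT transforms are
unnormalized, so a forward transform followed by an inverse transform
returns each signal multiplied by its length. The following is a minimal
host-side sketch in plain C of the scaling step, assuming the data has
already been copied back from the GPU; complex_f and normalize_inverse
are hypothetical names, and in practice the scaling would more likely be
done in a kernel on the device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-in with the same x/y (real/imaginary)
   field names as cufftComplex. */
typedef struct { float x, y; } complex_f;

/* Scales `batch` contiguous signals of length nx by 1/nx. For batched
   1D transforms the divisor is the transform length nx, not the total
   element count nx * batch. */
void normalize_inverse(complex_f *data, size_t nx, size_t batch)
{
    assert(data != NULL && nx > 0);
    const float scale = 1.0f / (float)nx;
    for (size_t i = 0; i < nx * batch; ++i) {
        data[i].x *= scale;
        data[i].y *= scale;
    }
}
```

With the definitions in the example, the call would be
normalize_inverse(host_data, NX, BATCH), where host_data is a
hypothetical host copy of the result.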

1D Real-to-Complex Transforms
#define NX 256
#define BATCH 10

cufftHandle plan;
cufftComplex *data;
cudaMalloc((void**)&data, sizeof(cufftComplex)*(NX/2+1)*BATCH);

/* Create a 1D FFT plan. */
cufftPlan1d(&plan, NX, CUFFT_R2C, BATCH);

/* Use the CUFFT plan to transform the signal in place. */
cufftExecR2C(plan, (cufftReal*)data, data);

/* Destroy the CUFFT plan. */
cufftDestroy(plan);
cudaFree(data);
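
The allocation in the example above is sized by the complex output, not
by the real input: a real signal of length NX yields NX/2+1
non-redundant complex coefficients, which occupy more bytes than the NX
real input values. The sketch below, in plain C, reproduces that
arithmetic on the host; complex_f, r2c_output_elems, and
r2c_inplace_bytes are hypothetical names, and the 8-byte element size
assumes a single-precision complex value stored as two floats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-in for a single-precision complex
   element (two floats). */
typedef struct { float x, y; } complex_f;

/* Number of non-redundant complex coefficients produced by a
   real-to-complex transform of length nx. */
size_t r2c_output_elems(size_t nx)
{
    assert(nx > 0);
    return nx / 2 + 1;
}

/* Bytes needed for `batch` in-place transforms: the buffer must hold
   the complex output, which is larger than the real input. */
size_t r2c_inplace_bytes(size_t nx, size_t batch)
{
    return sizeof(complex_f) * r2c_output_elems(nx) * batch;
}
```

For NX = 256 and BATCH = 10 this gives 129 complex elements per signal,
matching the (NX/2+1)*BATCH term passed to cudaMalloc() above.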

2D Complex-to-Complex Transforms
#define NX 256
#define NY 128

cufftHandle plan;
cufftComplex *idata, *odata;
cudaMalloc((void**)&idata, sizeof(cufftComplex)*NX*NY);
cudaMalloc((void**)&odata, sizeof(cufftComplex)*NX*NY);

/* Create a 2D FFT plan. */
cufftPlan2d(&plan, NX, NY, CUFFT_C2C);

/* Use the CUFFT plan to transform the signal out of place. */
cufftExecC2C(plan, idata, odata, CUFFT_FORWARD);

/* Note: idata != odata indicates an out-of-place transformation
   to CUFFT at execution time. */

/* Inverse transform the signal in place. */
cufftExecC2C(plan, odata, odata, CUFFT_INVERSE);

/* Destroy the CUFFT plan. */
cufftDestroy(plan);
cudaFree(idata); cudaFree(odata);

Batched 2D Complex-to-Complex Transforms


#define NX 128
#define NY 256
#define BATCHSIZE 1000

int datalen;
int n[2] = {NX, NY};  /* Dimensions of each 2D transform */
cufftHandle plan;
cufftComplex *indata, *outdata;

datalen = NX*NY*BATCHSIZE;
cudaMalloc((void**)&indata, sizeof(cufftComplex)*datalen);
cudaMalloc((void**)&outdata, sizeof(cufftComplex)*datalen);

/* Create a batched 2D plan. The dimensions are passed as an int array. */
cufftPlanMany(&plan, 2, n, NULL, 1, 0, NULL, 1, 0, CUFFT_C2C, BATCHSIZE);

/* Execute the transform out-of-place */
cufftExecC2C(plan, indata, outdata, CUFFT_FORWARD);

/* Destroy the CUFFT plan */
cufftDestroy(plan);
cudaFree(indata);
cudaFree(outdata);
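
When filling indata or reading outdata for the batched example above, it
helps to have the element index written out. The sketch below is plain
host C and assumes the layout that the example implies: the transforms
are stored back to back, each in row-major order with the NY index
varying fastest. The function name batched_index is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index of element (x, y) of transform b, assuming `nx * ny`
   elements per transform stored contiguously in row-major order
   (y varies fastest). */
size_t batched_index(size_t b, size_t x, size_t y, size_t nx, size_t ny)
{
    assert(x < nx && y < ny);
    return b * nx * ny + x * ny + y;
}
```

With NX = 128, NY = 256, and BATCHSIZE = 1000, the last element of the
last transform is at index NX*NY*BATCHSIZE - 1, which equals datalen - 1
in the example.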

2D Complex-to-Real Transforms
#define NX 256
#define NY 128

cufftHandle plan;
cufftComplex *idata;
cufftReal *odata;
cudaMalloc((void**)&idata, sizeof(cufftComplex)*NX*NY);
cudaMalloc((void**)&odata, sizeof(cufftReal)*NX*NY);

/* Create a 2D FFT plan. */
cufftPlan2d(&plan, NX, NY, CUFFT_C2R);

/* Use the CUFFT plan to transform the signal out of place. */
cufftExecC2R(plan, idata, odata);

/* Destroy the CUFFT plan. */
cufftDestroy(plan);
cudaFree(idata); cudaFree(odata);
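
The complex input of a 2D complex-to-real transform holds only the
non-redundant coefficients, so it needs fewer elements than the NX*NY
allocated for idata in the example above; the example simply allocates
more than the minimum. The sketch below, in plain C, computes both
counts, assuming that only the last (fastest-varying) dimension is
halved. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Non-redundant complex input elements of an nx-by-ny complex-to-real
   transform: only the last dimension is reduced to ny/2 + 1. */
size_t c2r_2d_input_elems(size_t nx, size_t ny)
{
    assert(nx > 0 && ny > 0);
    return nx * (ny / 2 + 1);
}

/* Real output elements of the same transform. */
size_t c2r_2d_output_elems(size_t nx, size_t ny)
{
    assert(nx > 0 && ny > 0);
    return nx * ny;
}
```

For NX = 256 and NY = 128 the input has 256 * 65 = 16640 complex
elements, compared with the 32768 allocated in the example.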

3D Complex-to-Complex Transforms
#define NX 64
#define NY 64
#define NZ 128

cufftHandle plan;
cufftComplex *data1, *data2;
cudaMalloc((void**)&data1, sizeof(cufftComplex)*NX*NY*NZ);
cudaMalloc((void**)&data2, sizeof(cufftComplex)*NX*NY*NZ);

/* Create a 3D FFT plan. */
cufftPlan3d(&plan, NX, NY, NZ, CUFFT_C2C);

/* Transform the first signal in place. */
cufftExecC2C(plan, data1, data1, CUFFT_FORWARD);

/* Transform the second signal using the same plan. */
cufftExecC2C(plan, data2, data2, CUFFT_FORWARD);

/* Destroy the CUFFT plan. */
cufftDestroy(plan);
cudaFree(data1); cudaFree(data2);
