CUFFT Library
PG-05327-032_V01
August, 2010
Published by
NVIDIA Corporation
2701 San Tomas Expressway
Santa Clara, CA 95050
Notice
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS,
LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING
PROVIDED "AS IS". NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR
OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED
WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR
PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no
responsibility for the consequences of use of such information or for any infringement of patents or other
rights of third parties that may result from its use. No license is granted by implication or otherwise under
any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are
subject to change without notice. This publication supersedes and replaces all information previously
supplied. NVIDIA Corporation products are not authorized for use as critical components in life support
devices or systems without express written approval of NVIDIA Corporation.
Trademarks
NVIDIA, CUDA, and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation
in the United States and other countries. Other company and product names may be trademarks of the
respective companies with which they are associated.
Copyright
© 2005–2010 by NVIDIA Corporation. All rights reserved.
Table of Contents
CUFFT Library
  CUFFT Types and Definitions
    Type cufftHandle
    Type cufftResult
    Type cufftReal
    Type cufftDoubleReal
    Type cufftComplex
    Type cufftDoubleComplex
    Type cufftCompatibility
    CUFFT Transform Types
    CUFFT Transform Directions
    Streamed CUFFT Transforms
    FFTW Compatibility Mode
  CUFFT API Functions
    Function cufftPlan1d()
    Function cufftPlan2d()
    Function cufftPlan3d()
    Function cufftPlanMany()
    Function cufftDestroy()
    Function cufftExecC2C()
    Function cufftExecR2C()
    Function cufftExecC2R()
    Function cufftExecZ2Z()
    Function cufftExecD2Z()
    Function cufftExecZ2D()
    Function cufftSetStream()
    Function cufftSetCompatibilityMode()
  Accuracy and Performance
  CUFFT Code Examples
    1D Complex-to-Complex Transforms
    1D Real-to-Complex Transforms
    2D Complex-to-Complex Transforms
    Batched 2D Complex-to-Complex Transforms
    2D Complex-to-Real Transforms
    3D Complex-to-Complex Transforms
CUFFT Library
This document describes CUFFT, the NVIDIA® CUDA™ Fast Fourier
Transform (FFT) library. The FFT is a divide-and-conquer algorithm
for efficiently computing discrete Fourier transforms of complex or
real-valued data sets, and it is one of the most important and widely
used numerical algorithms, with applications that include
computational physics and general signal processing. The CUFFT
library provides a simple interface for computing parallel FFTs on an
NVIDIA GPU, which allows users to leverage the floating-point power
and parallelism of the GPU without having to develop a custom,
GPU-based FFT implementation.
FFT libraries typically vary in terms of supported transform sizes and
data types. For example, some libraries only implement Radix-2 FFTs,
restricting the transform size to a power of two, while other
implementations support arbitrary transform sizes. This version of the
CUFFT library supports the following features:
- 1D, 2D, and 3D transforms of complex and real-valued data
- Batch execution for doing multiple transforms of any dimension in
  parallel
- 2D and 3D transform sizes in the range [2, 16384] in any
  dimension
- 1D transform sizes up to 8 million elements
- In-place and out-of-place transforms for real and complex data
- Double-precision transforms on compatible hardware (GT200 and
  later GPUs)
- Support for streamed execution, enabling simultaneous
  computation together with data movement
Type cufftHandle
typedef unsigned int cufftHandle;
A handle type used to store and access CUFFT plans (see "CUFFT API
Functions" for more information about plans). For
example, the user receives a handle after creating a CUFFT plan and
uses this handle to execute the plan.
Type cufftResult
typedef enum cufftResult_t cufftResult;
An enumeration of values used exclusively as API function return
values. The possible return values are defined as follows:
Return Values
CUFFT_SUCCESS          Any CUFFT operation is successful.
CUFFT_INVALID_PLAN     CUFFT is passed an invalid plan handle.
CUFFT_ALLOC_FAILED     CUFFT failed to allocate GPU memory.
CUFFT_INVALID_TYPE     The user requests an unsupported type.
CUFFT_INVALID_VALUE    The user specifies a bad memory pointer.
CUFFT_INTERNAL_ERROR   Used for all internal driver errors.
CUFFT_EXEC_FAILED      CUFFT failed to execute an FFT on the GPU.
CUFFT_SETUP_FAILED     The CUFFT library failed to initialize.
CUFFT_INVALID_SIZE     The user specifies an unsupported FFT size.
CUFFT_UNALIGNED_DATA   Input or output does not satisfy texture
                       alignment requirements.
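Because every API function reports its status through cufftResult, it can help to turn a code into its symbolic name when logging. The following is a minimal sketch, not part of the CUFFT API: cufft_result_name() is a hypothetical helper, and the local enum is a stand-in that lets the sketch compile without cufft.h. Its numeric values (0 through 9, in the order of the table above) are an assumption; real code should include cufft.h and drop the stand-in.

```c
/* Stand-in for the cufftResult enumeration so this sketch compiles without
   cufft.h. The numeric values are assumed to follow the order of the table
   above; in real code, include cufft.h and remove this enum. */
typedef enum {
    CUFFT_SUCCESS = 0,
    CUFFT_INVALID_PLAN,
    CUFFT_ALLOC_FAILED,
    CUFFT_INVALID_TYPE,
    CUFFT_INVALID_VALUE,
    CUFFT_INTERNAL_ERROR,
    CUFFT_EXEC_FAILED,
    CUFFT_SETUP_FAILED,
    CUFFT_INVALID_SIZE,
    CUFFT_UNALIGNED_DATA
} cufftResult;

/* Hypothetical helper: maps a result code to its symbolic name. */
static const char *cufft_result_name(cufftResult r)
{
    switch (r) {
    case CUFFT_SUCCESS:        return "CUFFT_SUCCESS";
    case CUFFT_INVALID_PLAN:   return "CUFFT_INVALID_PLAN";
    case CUFFT_ALLOC_FAILED:   return "CUFFT_ALLOC_FAILED";
    case CUFFT_INVALID_TYPE:   return "CUFFT_INVALID_TYPE";
    case CUFFT_INVALID_VALUE:  return "CUFFT_INVALID_VALUE";
    case CUFFT_INTERNAL_ERROR: return "CUFFT_INTERNAL_ERROR";
    case CUFFT_EXEC_FAILED:    return "CUFFT_EXEC_FAILED";
    case CUFFT_SETUP_FAILED:   return "CUFFT_SETUP_FAILED";
    case CUFFT_INVALID_SIZE:   return "CUFFT_INVALID_SIZE";
    case CUFFT_UNALIGNED_DATA: return "CUFFT_UNALIGNED_DATA";
    }
    /* Reached only for values outside the enumeration. */
    return "CUFFT_UNKNOWN_RESULT";
}
```

A caller would typically compare the return value of each CUFFT call against CUFFT_SUCCESS and print cufft_result_name() of anything else.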
Type cufftReal
typedef float cufftReal;
A single-precision, floating-point real data type.
Type cufftDoubleReal
typedef double cufftDoubleReal;
A double-precision, floating-point real data type.
Type cufftComplex
typedef cuComplex cufftComplex;
A single-precision, floating-point complex data type that consists of
interleaved real and imaginary components.
Type cufftDoubleComplex
typedef cuDoubleComplex cufftDoubleComplex;
A double-precision, floating-point complex data type that consists of
interleaved real and imaginary components.
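To make "interleaved" concrete: an array of complex values is laid out in memory as real, imaginary, real, imaginary, and so on. The sketch below illustrates this with a stand-in struct, demo_complex, which is assumed to have the same two-member layout as cuComplex (x for the real part, y for the imaginary part). The accessor names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Stand-in with the same two-float layout assumed for cuComplex:
   x holds the real part, y holds the imaginary part. */
typedef struct { float x; float y; } demo_complex;

/* Real part of complex element i, read from the interleaved float view. */
static float real_part(const float *interleaved, size_t i)
{
    return interleaved[2 * i];
}

/* Imaginary part of complex element i, read from the interleaved float view. */
static float imag_part(const float *interleaved, size_t i)
{
    return interleaved[2 * i + 1];
}
```

The same indexing applies to the double-precision type, with double in place of float.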
Type cufftCompatibility
typedef enum cufftCompatibility_t cufftCompatibility;
An enumeration of values used to control FFTW data compatibility.
See "FFTW Compatibility Mode" for details.
$N_0 \times N_1 \times \cdots \times (\lfloor N_n/2 \rfloor + 1)$ complex elements. Therefore, in order to
perform an in-place FFT, the user has to pad the input array in the last
dimension to $\lfloor N_n/2 \rfloor + 1$ complex elements or $2(\lfloor N_n/2 \rfloor + 1)$ real
elements. Note that the real-to-complex transform is implicitly
forward. Passing the CUFFT_R2C constant to any plan creation function
configures a single-precision real-to-complex FFT. Passing the
CUFFT_D2Z constant configures a double-precision real-to-complex FFT.
The requirements for complex-to-real FFTs are similar to those for
real-to-complex. In this case, the input array holds only the non-redundant
$\lfloor N/2 \rfloor + 1$ complex coefficients from a real-to-complex transform. The
output is simply N elements of type cufftReal. However, for an
in-place transform, the input size must be padded to $2(\lfloor N/2 \rfloor + 1)$ real
elements. The complex-to-real transform is implicitly inverse. Passing
the CUFFT_C2R constant to any plan creation function configures a
single-precision complex-to-real FFT. Passing the CUFFT_Z2D constant
configures a double-precision complex-to-real FFT.
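The size rules above reduce to a few integer expressions. The following sketch shows them as small C helpers. The function names are hypothetical and not part of CUFFT, and integer division supplies the floor in N/2 + 1.

```c
#include <assert.h>
#include <stddef.h>

/* Non-redundant complex elements along a dimension of n real samples. */
static size_t r2c_complex_count(size_t n)
{
    assert(n > 0);
    return n / 2 + 1;              /* integer division floors n/2 */
}

/* Real elements the last dimension must be padded to for an in-place
   real-to-complex (or complex-to-real) transform. */
static size_t r2c_padded_reals(size_t n)
{
    return 2 * r2c_complex_count(n);
}

/* Complex elements produced by a 3D real-to-complex transform of size
   n0 x n1 x n2, where n2 is the last (halved) dimension. */
static size_t r2c_total_complex_3d(size_t n0, size_t n1, size_t n2)
{
    return n0 * n1 * r2c_complex_count(n2);
}
```

For N = 256 this gives 129 complex elements, so an in-place buffer needs room for 258 real elements per row rather than 256.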
For 1D complex-to-complex transforms, the stride between signals in a
batch is assumed to be the number of cufftComplex elements in the
logical transform size. However, for real-data FFTs, the distance
between signals in a batch depends on whether the transform is
in-place or out-of-place. For in-place FFTs, the input stride is assumed to
be $2(\lfloor N/2 \rfloor + 1)$ cufftReal elements or $\lfloor N/2 \rfloor + 1$ cufftComplex elements.
For out-of-place transforms, input and output strides match the logical
transform size N and the non-redundant size $\lfloor N/2 \rfloor + 1$, respectively.
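As an illustration of the batch distances just described, the sketch below computes where signal b of a batched 1D real-to-complex transform starts. The helper names are hypothetical, and the distances are expressed in elements (cufftReal for input, cufftComplex for output), following the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Distance between consecutive input signals, in cufftReal elements. */
static size_t r2c_input_distance(size_t n, int in_place)
{
    assert(n > 0);
    return in_place ? 2 * (n / 2 + 1) : n;
}

/* Distance between consecutive output signals, in cufftComplex elements. */
static size_t r2c_output_distance(size_t n)
{
    assert(n > 0);
    return n / 2 + 1;
}

/* Element offset of the first sample of signal b within a batch. */
static size_t signal_offset(size_t b, size_t distance)
{
    return b * distance;
}
```

With N = 256, an in-place batch places signal 3 at real offset 3 * 258 = 774, while an out-of-place batch reads it from real offset 3 * 256 = 768.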
Starting with CUFFT version 3.0, batched transforms are supported
through the cufftPlanMany() function. Although this function takes
input parameters that specify input- and output-data strides, as of
version 3.0 it is assumed that the data for each signal within the batch
immediately follows the data of the previous one (a stride of 1).
For higher-dimensional transforms (2D and 3D), CUFFT performs
FFTs in row-major or C order. For example, if the user requests a 3D
transform plan for sizes X, Y, and Z, CUFFT transforms along Z, Y, and
then X. The user can configure column-major FFTs by simply changing
the order of the size parameters to the plan creation API functions.
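Row-major (C) order means the last size parameter varies fastest in memory. A minimal sketch of the corresponding linear index for a 3D array of sizes X, Y, Z follows; index3d() is a hypothetical helper used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Linear offset of element (x, y, z) in a row-major array whose sizes are
   nx * ny * nz. z is the fastest-varying index, x the slowest; nx is not
   needed to compute the offset. */
static size_t index3d(size_t x, size_t y, size_t z, size_t ny, size_t nz)
{
    assert(y < ny && z < nz);
    return (x * ny + y) * nz + z;
}
```

For a 64 x 64 x 128 volume, stepping z by one moves one element, stepping y by one moves 128 elements, and stepping x by one moves 64 * 128 = 8192 elements.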
CUFFT performs un-normalized FFTs; that is, performing a forward
FFT on an input data set followed by an inverse FFT on the resulting
set yields data that is equal to the input scaled by the number of
elements. Scaling either transform by the reciprocal of the size of the
data set is left for the user to perform as seen fit.
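The scaling behavior can be checked on the CPU with a tiny un-normalized DFT. The sketch below is not CUFFT code: it is a direct 4-point DFT, chosen because its twiddle factors are exactly 1, i, -1, and -i, so no math library is needed. The cpx type and dft4() are hypothetical names, and the sign convention (-1 for forward, +1 for inverse) is an assumption that mirrors the usual FFT definition.

```c
#include <assert.h>

typedef struct { float x; float y; } cpx;   /* x = real, y = imaginary */

/* Un-normalized 4-point DFT: out[k] = sum_j in[j] * w^(j*k), where
   w = exp(sign * i * pi / 2). Use sign = -1 for forward, +1 for inverse. */
static void dft4(const cpx in[4], cpx out[4], int sign)
{
    assert(sign == 1 || sign == -1);
    /* Powers of w: w^0 = 1, w^1 = sign*i, w^2 = -1, w^3 = -sign*i. */
    const float wr[4] = { 1.0f, 0.0f, -1.0f, 0.0f };
    const float wi[4] = { 0.0f, (float)sign, 0.0f, (float)-sign };
    for (int k = 0; k < 4; ++k) {
        cpx acc = { 0.0f, 0.0f };
        for (int j = 0; j < 4; ++j) {
            int p = (j * k) % 4;
            /* Complex multiply-accumulate: acc += in[j] * w^p. */
            acc.x += in[j].x * wr[p] - in[j].y * wi[p];
            acc.y += in[j].x * wi[p] + in[j].y * wr[p];
        }
        out[k] = acc;
    }
}
```

Applying dft4() forward and then inverse returns every input element multiplied by 4, the number of elements; dividing by 4 recovers the original data.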
The FFTW compatibility modes are as follows:
- CUFFT_COMPATIBILITY_NATIVE
- CUFFT_COMPATIBILITY_FFTW_PADDING
- CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC
- CUFFT_COMPATIBILITY_FFTW_ALL
CUFFT_COMPATIBILITY_NATIVE mode disables FFTW compatibility, but
achieves the highest performance.
CUFFT_COMPATIBILITY_FFTW_PADDING supports FFTW data padding by
inserting extra padding between packed in-place transforms for
batched transforms with power-of-2 size.
CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC waives the C2R symmetry
requirement. Once set, it guarantees FFTW-compatible output for
non-symmetric complex inputs for transforms with power-of-2 size. This is
only useful for artificial (that is, random) data sets as actual data will
always be symmetric if it has come from the real plane. Enabling this
mode can significantly impact performance.
CUFFT_COMPATIBILITY_FFTW_ALL enables full FFTW compatibility. Refer
to the FFTW documentation (http://www.fftw.org) for FFTW data
layout specifications.
Function cufftPlan1d()
cufftResult
cufftPlan1d(
    cufftHandle *plan, int nx, cufftType type, int batch);
Creates a 1D FFT plan configuration for a specified signal size and
data type. The batch input parameter tells CUFFT how many 1D
transforms to configure.
Input
plan     Pointer to a cufftHandle object
nx       The transform size (e.g., 256 for a 256-point FFT)
type     The transform data type (e.g., CUFFT_C2C for complex to complex)
batch    Number of transforms of size nx
Output
plan     Contains a CUFFT 1D plan handle value
Return Values
CUFFT_SUCCESS          CUFFT successfully created the FFT plan.
CUFFT_ALLOC_FAILED     Allocation of GPU resources for the plan failed.
CUFFT_INVALID_TYPE     The type parameter is not supported.
CUFFT_INTERNAL_ERROR   Internal driver error is detected.
CUFFT_SETUP_FAILED     CUFFT library failed to initialize.
CUFFT_INVALID_SIZE     The nx parameter is not a supported size.
Function cufftPlan2d()
cufftResult
cufftPlan2d(
    cufftHandle *plan, int nx, int ny, cufftType type);
Creates a 2D FFT plan configuration according to specified signal sizes
and data type. This function is the same as cufftPlan1d() except that
it takes a second size parameter, ny, and does not support batching.
Input
plan     Pointer to a cufftHandle object
nx       The transform size in the X-dimension (number of rows)
ny       The transform size in the Y-dimension (number of columns)
type     The transform data type (e.g., CUFFT_C2R for complex to real)
Output
plan     Contains a CUFFT 2D plan handle value
Return Values
CUFFT_SUCCESS          CUFFT successfully created the FFT plan.
CUFFT_ALLOC_FAILED     Allocation of GPU resources for the plan failed.
CUFFT_INVALID_TYPE     The type parameter is not supported.
CUFFT_INTERNAL_ERROR   Internal driver error is detected.
CUFFT_SETUP_FAILED     CUFFT library failed to initialize.
CUFFT_INVALID_SIZE     The nx or ny parameter is not a supported size.
Function cufftPlan3d()
cufftResult
cufftPlan3d(
    cufftHandle *plan, int nx, int ny, int nz,
    cufftType type);
Creates a 3D FFT plan configuration according to specified signal sizes
and data type. This function is the same as cufftPlan2d() except that
it takes a third size parameter, nz.
Input
plan     Pointer to a cufftHandle object
nx       The transform size in the X-dimension
ny       The transform size in the Y-dimension
nz       The transform size in the Z-dimension
type     The transform data type (e.g., CUFFT_R2C for real to complex)
Output
plan     Contains a CUFFT 3D plan handle value
Return Values
CUFFT_SUCCESS          CUFFT successfully created the FFT plan.
CUFFT_ALLOC_FAILED     Allocation of GPU resources for the plan failed.
CUFFT_INVALID_TYPE     The type parameter is not supported.
CUFFT_INTERNAL_ERROR   Internal driver error is detected.
CUFFT_SETUP_FAILED     CUFFT library failed to initialize.
CUFFT_INVALID_SIZE     The nx, ny, or nz parameter is not a supported size.
Function cufftPlanMany()
cufftResult
cufftPlanMany(
    cufftHandle *plan, int rank, int *n, int *inembed,
    int istride, int idist, int *onembed, int ostride,
    int odist, cufftType type, int batch);
Creates an FFT plan configuration of dimension rank, with sizes
specified in the array n. The batch input parameter tells CUFFT how
many transforms to configure in parallel. With this function, batched
plans of any dimension may be created.
Input parameters inembed, istride, and idist and output parameters
onembed, ostride, and odist will allow setup of non-contiguous input
data in a future version. Note that for the current version of CUFFT,
these parameters are ignored and the layout of batched data must be
side-by-side and not interleaved.
Input
plan      Pointer to a cufftHandle object
rank      Dimensionality of the transform (1, 2, or 3)
n         An array of size rank, describing the size of each dimension
inembed   Unused: pass NULL
istride   Unused: pass 1
idist     Unused: pass 0
onembed   Unused: pass NULL
ostride   Unused: pass 1
odist     Unused: pass 0
type      Transform data type (e.g., CUFFT_C2C, as per other CUFFT calls)
batch     Batch size for this transform
Output
plan      Contains a CUFFT plan handle
Return Values
CUFFT_SUCCESS          CUFFT successfully created the FFT plan.
CUFFT_ALLOC_FAILED     Allocation of GPU resources for the plan failed.
CUFFT_INVALID_TYPE     The type parameter is not supported.
CUFFT_INTERNAL_ERROR   Internal driver error is detected.
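The "side-by-side" layout required by cufftPlanMany() means that all elements of one signal are stored contiguously, followed by all elements of the next. The sketch below shows the index arithmetic for complex-to-complex batches. Both helpers are hypothetical names introduced for illustration and are not part of CUFFT.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in one transform: the product of the rank sizes in n. */
static size_t transform_length(int rank, const int *n)
{
    size_t len = 1;
    assert(rank >= 1 && rank <= 3);
    for (int d = 0; d < rank; ++d) {
        assert(n[d] > 0);
        len *= (size_t)n[d];
    }
    return len;
}

/* Side-by-side layout: element i of signal b follows all of signals 0..b-1.
   (An interleaved layout would instead use i * batch + b, which the text
   above says is not accepted.) */
static size_t side_by_side_index(size_t b, size_t i, size_t len)
{
    assert(i < len);
    return b * len + i;
}
```

For a batch of 256 x 128 transforms, each signal occupies 32768 consecutive elements, so element 5 of signal 2 sits at offset 65541.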
Function cufftDestroy()
cufftResult
cufftDestroy(cufftHandle plan);
Frees all GPU resources associated with a CUFFT plan and destroys
the internal plan data structure. This function should be called once a
plan is no longer needed to avoid wasting GPU memory.
Input
plan     The cufftHandle object of the plan to be destroyed.
Return Values
CUFFT_SUCCESS          CUFFT successfully destroyed the FFT plan.
CUFFT_INVALID_PLAN     The plan parameter is not a valid handle.
CUFFT_SETUP_FAILED     CUFFT library failed to initialize.
Function cufftExecC2C()
cufftResult
cufftExecC2C(
    cufftHandle plan, cufftComplex *idata,
    cufftComplex *odata, int direction);
Executes a CUFFT single-precision complex-to-complex transform
plan as specified by direction. CUFFT uses as input data the GPU
memory pointed to by the idata parameter. This function stores the
Fourier coefficients in the odata array. If idata and odata are the same,
this method does an in-place transform.
Input
plan        The cufftHandle object for the plan to execute
idata       Pointer to the single-precision complex input data (in GPU
            memory) to transform
odata       Pointer to the single-precision complex output data (in GPU
            memory)
direction   The transform direction: CUFFT_FORWARD or CUFFT_INVERSE
Output
odata       Contains the complex Fourier coefficients
Return Values
CUFFT_SUCCESS          CUFFT successfully executed the FFT plan.
CUFFT_INVALID_PLAN     The plan parameter is not a valid handle.
CUFFT_INVALID_VALUE    The idata, odata, and/or direction parameter
                       is not valid.
CUFFT_INTERNAL_ERROR   Internal driver error is detected.
CUFFT_EXEC_FAILED      CUFFT failed to execute the transform on GPU.
CUFFT_SETUP_FAILED     CUFFT library failed to initialize.
CUFFT_UNALIGNED_DATA   Input or output does not satisfy texture
                       alignment requirements.
Function cufftExecR2C()
cufftResult
cufftExecR2C(
    cufftHandle plan, cufftReal *idata, cufftComplex *odata);
Executes a CUFFT single-precision real-to-complex (implicitly
forward) transform plan. CUFFT uses as input data the GPU memory
pointed to by the idata parameter. This function stores the
non-redundant Fourier coefficients in the odata array. If idata and odata
are the same, this method does an in-place transform. (See "CUFFT
Transform Types" for details on real data FFTs.)
Input
plan     The cufftHandle object for the plan to execute
idata    Pointer to the single-precision real input data (in GPU memory)
         to transform
odata    Pointer to the single-precision complex output data (in GPU
         memory)
Output
odata    Contains the complex Fourier coefficients
Return Values
CUFFT_SUCCESS          CUFFT successfully executed the FFT plan.
CUFFT_INVALID_PLAN     The plan parameter is not a valid handle.
Function cufftExecC2R()
cufftResult
cufftExecC2R(
    cufftHandle plan, cufftComplex *idata, cufftReal *odata);
Executes a CUFFT single-precision complex-to-real (implicitly inverse)
transform plan. CUFFT uses as input data the GPU memory pointed to
by the idata parameter. The input array holds only the non-redundant
complex Fourier coefficients. This function stores the real output
values in the odata array. If idata and odata are the same, this method
does an in-place transform. (See "CUFFT Transform Types"
for details on real data FFTs.)
Input
plan     The cufftHandle object for the plan to execute
idata    Pointer to the single-precision complex input data (in GPU
         memory) to transform
odata    Pointer to the single-precision real output data (in GPU
         memory)
Output
odata    Contains the real-valued output data
Return Values
CUFFT_SUCCESS          CUFFT successfully executed the FFT plan.
CUFFT_INVALID_PLAN     The plan parameter is not a valid handle.
CUFFT_INVALID_VALUE    The idata and/or odata parameter
                       is not valid.
CUFFT_INTERNAL_ERROR   Internal driver error is detected.
CUFFT_EXEC_FAILED      CUFFT failed to execute the transform on GPU.
Function cufftExecZ2Z()
cufftResult
cufftExecZ2Z(
    cufftHandle plan, cufftDoubleComplex *idata,
    cufftDoubleComplex *odata, int direction);
Executes a CUFFT double-precision complex-to-complex transform
plan as specified by direction. CUFFT uses as input data the GPU
memory pointed to by the idata parameter. This function stores the
Fourier coefficients in the odata array. If idata and odata are the same,
this method does an in-place transform.
Input
plan        The cufftHandle object for the plan to execute
idata       Pointer to the double-precision complex input data (in GPU
            memory) to transform
odata       Pointer to the double-precision complex output data (in GPU
            memory)
direction   The transform direction: CUFFT_FORWARD or CUFFT_INVERSE
Output
odata       Contains the complex Fourier coefficients
Return Values
CUFFT_SUCCESS          CUFFT successfully executed the FFT plan.
CUFFT_INVALID_PLAN     The plan parameter is not a valid handle.
CUFFT_INVALID_VALUE    The idata, odata, and/or direction parameter
                       is not valid.
CUFFT_INTERNAL_ERROR   Internal driver error is detected.
CUFFT_EXEC_FAILED      CUFFT failed to execute the transform on GPU.
CUFFT_SETUP_FAILED     CUFFT library failed to initialize.
CUFFT_UNALIGNED_DATA   Input or output does not satisfy texture
                       alignment requirements.
Function cufftExecD2Z()
cufftResult
cufftExecD2Z(
    cufftHandle plan, cufftDoubleReal *idata,
    cufftDoubleComplex *odata);
Executes a CUFFT double-precision real-to-complex (implicitly
forward) transform plan. CUFFT uses as input data the GPU memory
pointed to by the idata parameter. This function stores the
non-redundant Fourier coefficients in the odata array. If idata and odata
are the same, this method does an in-place transform. (See "CUFFT
Transform Types" for details on real data FFTs.)
Input
plan     The cufftHandle object for the plan to execute
idata    Pointer to the double-precision real input data (in GPU
         memory) to transform
odata    Pointer to the double-precision complex output data (in GPU
         memory)
Output
odata    Contains the complex Fourier coefficients
Return Values
CUFFT_SUCCESS          CUFFT successfully executed the FFT plan.
CUFFT_INVALID_PLAN     The plan parameter is not a valid handle.
CUFFT_INVALID_VALUE    The idata and/or odata parameter
                       is not valid.
CUFFT_INTERNAL_ERROR   Internal driver error is detected.
CUFFT_EXEC_FAILED      CUFFT failed to execute the transform on GPU.
CUFFT_SETUP_FAILED     CUFFT library failed to initialize.
CUFFT_UNALIGNED_DATA   Input or output does not satisfy texture
                       alignment requirements.
Function cufftExecZ2D()
cufftResult
cufftExecZ2D(
    cufftHandle plan, cufftDoubleComplex *idata,
    cufftDoubleReal *odata);
Executes a CUFFT double-precision complex-to-real (implicitly
inverse) transform plan. CUFFT uses as input data the GPU memory
pointed to by the idata parameter. The input array holds only the
non-redundant complex Fourier coefficients. This function stores the real
output values in the odata array. If idata and odata are the same, this
method does an in-place transform. (See "CUFFT Transform Types"
for details on real data FFTs.)
Input
plan     The cufftHandle object for the plan to execute
idata    Pointer to the double-precision complex input data (in GPU
         memory) to transform
odata    Pointer to the double-precision real output data (in GPU
         memory)
Output
odata    Contains the real-valued output data
Return Values
CUFFT_SUCCESS          CUFFT successfully executed the FFT plan.
CUFFT_INVALID_PLAN     The plan parameter is not a valid handle.
CUFFT_INVALID_VALUE    The idata and/or odata parameter
                       is not valid.
CUFFT_INTERNAL_ERROR   Internal driver error is detected.
CUFFT_EXEC_FAILED      CUFFT failed to execute the transform on GPU.
CUFFT_SETUP_FAILED     CUFFT library failed to initialize.
CUFFT_UNALIGNED_DATA   Input or output does not satisfy texture
                       alignment requirements.
Function cufftSetStream()
cufftResult
cufftSetStream(cufftHandle plan, cudaStream_t stream);
Associates a CUDA stream with a CUFFT plan. All kernel launches
made during plan execution are now done through the associated
stream, enabling overlap with activity in other streams (for example,
data copying). The association remains until the plan is destroyed or
the stream is changed with another call to cufftSetStream().
Input
plan     The cufftHandle object to associate with the stream
stream   A valid CUDA stream created with cudaStreamCreate() (or 0
         for the default stream)
Return Values
CUFFT_SUCCESS          The stream was associated with the plan.
CUFFT_INVALID_PLAN     The plan parameter is not a valid handle.
Function cufftSetCompatibilityMode()
cufftResult
cufftSetCompatibilityMode(
    cufftHandle plan, cufftCompatibility mode);
Configures the layout of CUFFT output in FFTW-compatible modes.
When FFTW compatibility is desired, it can be configured for padding
only, for asymmetric complex inputs only, or to be fully compatible.
Input
plan     The cufftHandle object of the plan to configure
mode     The cufftCompatibility option to be used (see "Type
         cufftCompatibility"):
         CUFFT_COMPATIBILITY_NATIVE
         CUFFT_COMPATIBILITY_FFTW_PADDING (Default)
         CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC
         CUFFT_COMPATIBILITY_FFTW_ALL
Return Values
CUFFT_SUCCESS          CUFFT successfully set the compatibility mode.
CUFFT_INVALID_PLAN     The plan parameter is not a valid handle.
CUFFT_SETUP_FAILED     CUFFT library failed to initialize.
precision transforms. This further aids with memory coalescing on
Tesla-class and Fermi-class GPUs.
Restrict the X-dimension of single-precision transforms to be strictly a
power of two between 2 and 2048 for Tesla-class GPUs, or between 2 and
8192 for Fermi-class GPUs. These transforms are implemented as
specialized hand-coded kernels that keep all intermediate results
in shared memory.
Starting with version 3.1 of the CUFFT Library, the conjugate
symmetry property of real-to-complex output data arrays and
complex-to-real input data arrays is exploited; specifically, when the
power-of-two factorization term of the X-dimension is at least a
multiple of 4. Large 1D sizes (powers of two larger than 65,536) and
2D and 3D transforms benefit the most from the performance
optimizations in the implementation of real-to-complex or
complex-to-real transforms.
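The size guidelines above can be checked before a plan is created. The following sketch uses hypothetical helper names. The limits 2048 and 8192 come from the text above and are passed in by the caller. The last helper interprets "power-of-two factorization term is at least a multiple of 4" as the size being divisible by 4, which is an assumption about the intended meaning.

```c
#include <assert.h>

/* Nonzero if n is an exact power of two. */
static int is_power_of_two(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Nonzero if nx is a power of two in [2, max_size]. Pass 2048 for
   Tesla-class or 8192 for Fermi-class GPUs, per the text above. */
static int in_specialized_kernel_range(unsigned nx, unsigned max_size)
{
    assert(max_size >= 2);
    return nx >= 2 && nx <= max_size && is_power_of_two(nx);
}

/* Nonzero if the power-of-two factor of nx is at least 4, read here as
   "nx is divisible by 4" (an assumed interpretation). */
static int has_pow2_factor_of_4(unsigned nx)
{
    return nx != 0 && nx % 4 == 0;
}
```

For example, 4096 passes the range check only with the Fermi-class limit, and 1000 fails it on both classes because it is not a power of two.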
1D Complex-to-Complex Transforms
#include <cuda_runtime.h>
#include <cufft.h>

#define NX 256
#define BATCH 10

int main(void)
{
    cufftHandle plan;
    cufftComplex *data;
    cudaMalloc((void**)&data, sizeof(cufftComplex) * NX * BATCH);

    /* Create a 1D FFT plan. */
    cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);

    /* Use the CUFFT plan to transform the signal in place. */
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);

    /* Inverse transform the signal in place. */
    cufftExecC2C(plan, data, data, CUFFT_INVERSE);

    /* Note:
       (1) Divide each element by NX (the transform size) to get back
           the original data.
       (2) Identical pointers to input and output arrays imply an in-place
           transformation.
    */

    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
1D Real-to-Complex Transforms
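#include <cuda_runtime.h>
#include <cufft.h>

#define NX 256
#define BATCH 10

int main(void)
{
    cufftHandle plan;
    cufftComplex *data;
    /* NX/2+1 complex elements per signal hold the padded in-place data. */
    cudaMalloc((void**)&data, sizeof(cufftComplex) * (NX/2 + 1) * BATCH);
    /* Create a 1D FFT plan. */
    cufftPlan1d(&plan, NX, CUFFT_R2C, BATCH);
    /* Use the CUFFT plan to transform the signal in place. */
    cufftExecR2C(plan, (cufftReal*)data, data);
    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}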
2D Complex-to-Complex Transforms
#include <cuda_runtime.h>
#include <cufft.h>

#define NX 256
#define NY 128

int main(void)
{
    cufftHandle plan;
    cufftComplex *idata, *odata;
    cudaMalloc((void**)&idata, sizeof(cufftComplex) * NX * NY);
    cudaMalloc((void**)&odata, sizeof(cufftComplex) * NX * NY);

    /* Create a 2D FFT plan. */
    cufftPlan2d(&plan, NX, NY, CUFFT_C2C);

    /* Use the CUFFT plan to transform the signal out of place. */
    cufftExecC2C(plan, idata, odata, CUFFT_FORWARD);
    /* Note: idata != odata indicates an out-of-place transformation
       to CUFFT at execution time. */

    /* Inverse transform the signal in place. */
    cufftExecC2C(plan, odata, odata, CUFFT_INVERSE);

    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(idata);
    cudaFree(odata);
    return 0;
}
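Batched 2D Complex-to-Complex Transforms
#include <cuda_runtime.h>
#include <cufft.h>

/* NX, NY, and BATCHSIZE are not defined in this excerpt; these values are assumed. */
#define NX 256
#define NY 128
#define BATCHSIZE 10

int main(void)
{
    int n[2] = {NX, NY};   /* sizes must be passed as an array, not a brace list */
    int datalen = NX * NY * BATCHSIZE;
    cufftHandle plan;
    cufftComplex *indata, *outdata;
    cudaMalloc((void**)&indata, sizeof(cufftComplex) * datalen);
    cudaMalloc((void**)&outdata, sizeof(cufftComplex) * datalen);
    /* Create a batched 2D plan. */
    cufftPlanMany(&plan, 2, n, NULL, 1, 0, NULL, 1, 0, CUFFT_C2C, BATCHSIZE);
    /* Execute the transform out-of-place. */
    cufftExecC2C(plan, indata, outdata, CUFFT_FORWARD);
    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(indata);
    cudaFree(outdata);
    return 0;
}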
2D Complex-to-Real Transforms
#include <cuda_runtime.h>
#include <cufft.h>

#define NX 256
#define NY 128

int main(void)
{
    cufftHandle plan;
    cufftComplex *idata;
    cufftReal *odata;
    /* Only NX*(NY/2+1) complex inputs are required; NX*NY is a safe upper bound. */
    cudaMalloc((void**)&idata, sizeof(cufftComplex) * NX * NY);
    cudaMalloc((void**)&odata, sizeof(cufftReal) * NX * NY);
    /* Create a 2D FFT plan. */
    cufftPlan2d(&plan, NX, NY, CUFFT_C2R);
    /* Use the CUFFT plan to transform the signal out of place. */
    cufftExecC2R(plan, idata, odata);
    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(idata);
    cudaFree(odata);
    return 0;
}
3D Complex-to-Complex Transforms
#include <cuda_runtime.h>
#include <cufft.h>

#define NX 64
#define NY 64
#define NZ 128

int main(void)
{
    cufftHandle plan;
    cufftComplex *data1, *data2;
    cudaMalloc((void**)&data1, sizeof(cufftComplex) * NX * NY * NZ);
    cudaMalloc((void**)&data2, sizeof(cufftComplex) * NX * NY * NZ);

    /* Create a 3D FFT plan. */
    cufftPlan3d(&plan, NX, NY, NZ, CUFFT_C2C);

    /* Transform the first signal in place. */
    cufftExecC2C(plan, data1, data1, CUFFT_FORWARD);

    /* Transform the second signal using the same plan. */
    cufftExecC2C(plan, data2, data2, CUFFT_FORWARD);

    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(data1);
    cudaFree(data2);
    return 0;
}