    NVIDIA cuFFT

    NVIDIA cuFFT, a library that provides GPU-accelerated Fast Fourier Transform (FFT) implementations, is used for building applications across disciplines, such as deep learning, computer vision, computational physics, molecular dynamics, quantum chemistry, and seismic and medical imaging.

    Available in the CUDA Toolkit

    cuFFT

    Divide-and-conquer algorithms for computing discrete Fourier transforms. Multi-GPU support for FFT calculations on up to 16 GPUs in a single node.

    Available in the HPC SDK

    cuFFT

    Divide-and-conquer algorithms for computing discrete Fourier transforms. Multi-GPU support for FFT calculations on up to 16 GPUs in a single node.

    cuFFTMp

    Multi-node support for FFTs in exascale problems.

    Available as Standalone

    cuFFTDx Device APIs

    cuFFT Device Extensions for performing FFT calculations inside a CUDA kernel.


    cuFFT

    The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued datasets. It’s one of the most important and widely used numerical algorithms in computational physics and general signal processing. The cuFFT library provides a simple interface for computing FFTs on an NVIDIA GPU, which allows users to quickly leverage the GPU’s floating-point power and parallelism in a highly optimized and tested FFT library.
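
    To make the host API concrete, here is a minimal sketch (not taken from this page, with error checking omitted) of a batched, in-place, single-precision complex-to-complex 1D transform on one GPU:

        #include <cufft.h>
        #include <cuda_runtime.h>

        // Minimal sketch: batched 1D complex-to-complex FFT on a single GPU.
        // Real code should check every cufftResult and cudaError_t return value.
        void forward_fft_1d(cufftComplex* h_signal, int nx, int batch) {
            cufftComplex* d_signal;
            size_t bytes = sizeof(cufftComplex) * (size_t)nx * batch;
            cudaMalloc(&d_signal, bytes);
            cudaMemcpy(d_signal, h_signal, bytes, cudaMemcpyHostToDevice);

            cufftHandle plan;
            cufftPlan1d(&plan, nx, CUFFT_C2C, batch);              // plan once, reuse many times
            cufftExecC2C(plan, d_signal, d_signal, CUFFT_FORWARD); // in-place forward transform

            cudaMemcpy(h_signal, d_signal, bytes, cudaMemcpyDeviceToHost);
            cufftDestroy(plan);
            cudaFree(d_signal);
        }

    Plans are reusable across many executions, and cufftSetStream can be used to queue transforms asynchronously on a CUDA stream.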

    When calculations are distributed across GPUs, cuFFT supports using up to 16 GPUs connected to a CPU to perform Fourier transforms through its cuFFTXt APIs. Performance is a function of the bandwidth between the GPUs, the computational ability of the individual GPUs, and the type and number of FFTs to be performed.
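
    Multi-GPU transforms through the cuFFTXt APIs follow the same plan/execute pattern. The following is a hedged sketch (the GPU indices and sizes are placeholders, and error checking is omitted) of a single 3D complex-to-complex FFT spread across two GPUs in one node:

        #include <cufftXt.h>

        // Sketch: one 3D C2C FFT distributed across two GPUs via the cuFFTXt APIs.
        void multi_gpu_fft_3d(cufftComplex* h_data, int nx, int ny, int nz) {
            cufftHandle plan;
            cufftCreate(&plan);

            int gpus[2] = {0, 1};                        // which GPUs participate (placeholder ids)
            cufftXtSetGPUs(plan, 2, gpus);

            size_t work_sizes[2];                        // one workspace size per GPU
            cufftMakePlan3d(plan, nx, ny, nz, CUFFT_C2C, work_sizes);

            cudaLibXtDesc* d_data;                       // library-managed multi-GPU descriptor
            cufftXtMalloc(plan, &d_data, CUFFT_XT_FORMAT_INPLACE);
            cufftXtMemcpy(plan, d_data, h_data, CUFFT_COPY_HOST_TO_DEVICE);

            cufftXtExecDescriptorC2C(plan, d_data, d_data, CUFFT_FORWARD);

            cufftXtMemcpy(plan, h_data, d_data, CUFFT_COPY_DEVICE_TO_HOST);
            cufftXtFree(d_data);
            cufftDestroy(plan);
        }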

    Available in both the HPC SDK and the CUDA Toolkit:
    • 1D, 2D, and 3D transforms of complex and real data types

    • Familiar APIs similar to the advanced interface of the Fastest Fourier Transform in the West (FFTW)

    • Flexible data layouts allowing arbitrary strides between individual elements and array dimensions

    • Streamed asynchronous execution

    • Half-, single-, and double-precision transforms

    • Batch execution

    • In-place and out-of-place transforms

    • Support for up to 16-GPU systems

    • Thread-safe and callable from multiple host threads


    cuFFTDx Device Extensions

    cuFFT Device Extensions (cuFFTDx) enable users to perform FFT calculations inside their CUDA kernels. Fusing numerical operations in this way can decrease latency and improve application performance; a device-side sketch follows the feature list below.

    Download cuFFTDx
    • FFT embeddable into a CUDA kernel

    • High performance, with no unnecessary data movement to and from global memory

    • Customizable with options to adjust selection of FFT routine for different needs (size, precision, batches, etc.)

    • Ability to fuse FFT kernels with other operations, saving global memory trips

    • Compatible with future versions of the CUDA Toolkit

    • Support for Windows
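
    As a rough illustration of the device API, the sketch below shows a block-level FFT fused into a user kernel. The FFT description, register data layout, and launch configuration are assumptions chosen for illustration, not code from this page:

        #include <cufftdx.hpp>
        using namespace cufftdx;

        // Compile-time FFT description: a 128-point single-precision forward C2C
        // transform, one FFT per thread block, targeting SM 8.0 (illustrative values).
        using FFT = decltype(Size<128>() + Precision<float>() + Type<fft_type::c2c>()
                             + Direction<fft_direction::forward>()
                             + ElementsPerThread<8>() + FFTsPerBlock<1>()
                             + SM<800>() + Block());
        using complex_type = FFT::value_type;

        // Each block computes one FFT in registers and shared memory, so extra
        // element-wise work (scaling, filtering) can be fused before the store.
        __global__ void fft_kernel(complex_type* data) {
            complex_type thread_data[FFT::storage_size];
            constexpr unsigned int stride = size_of<FFT>::value / FFT::elements_per_thread;

            // Load this thread's elements from global memory into registers.
            for (unsigned int i = 0; i < FFT::elements_per_thread; ++i)
                thread_data[i] = data[threadIdx.x + i * stride];

            extern __shared__ complex_type shared_mem[];
            FFT().execute(thread_data, shared_mem);      // the FFT itself, inside the kernel

            // Store results; fused post-processing would go here instead of a plain copy.
            for (unsigned int i = 0; i < FFT::elements_per_thread; ++i)
                data[threadIdx.x + i * stride] = thread_data[i];
        }

        // Possible launch: fft_kernel<<<1, FFT::block_dim, FFT::shared_memory_size>>>(d_data);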


    cuFFTMp Multi-Node Support

    The multi-node FFT functionality, available through the cuFFTMp API, enables scientists and engineers to solve distributed 2D and 3D FFTs in exascale problems. The library handles all the communication between machines, allowing users to focus on other aspects of their problems; a usage sketch follows the feature list below.

    Download cuFFTMp
    Download HPC SDK
    • 2D and 3D distributed-memory FFTs

    • Slab (1D) and pencil (2D) data decompositions, with arbitrary block sizes

    • Message Passing Interface (MPI) compatible

    • Low-latency implementation using NVSHMEM, optimized for single-node and multi-node FFTs
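
    A hedged sketch of the MPI-based workflow follows. It assumes MPI has been initialized and a GPU has been selected for each rank; the sizes, communicator handling, and lack of error checking are simplifications:

        #include <mpi.h>
        #include <cufftMp.h>

        // Sketch: distributed 3D C2C FFT across the ranks of an MPI communicator.
        // Each rank owns a slab of the global nx x ny x nz grid.
        void distributed_fft_3d(cufftComplex* h_slab, int nx, int ny, int nz, MPI_Comm comm) {
            cufftHandle plan;
            cufftCreate(&plan);
            cufftMpAttachComm(plan, CUFFT_COMM_MPI, &comm);  // cuFFTMp communicates over this MPI communicator

            size_t workspace;
            cufftMakePlan3d(plan, nx, ny, nz, CUFFT_C2C, &workspace);

            cudaLibXtDesc* desc;                             // library-managed distributed descriptor
            cufftXtMalloc(plan, &desc, CUFFT_XT_FORMAT_INPLACE);
            cufftXtMemcpy(plan, desc, h_slab, CUFFT_COPY_HOST_TO_DEVICE);

            // Forward transform; all inter-rank communication is handled by the library.
            // Note that the output is left in cuFFTMp's permuted (shuffled) distribution.
            cufftXtExecDescriptor(plan, desc, desc, CUFFT_FORWARD);

            cufftXtMemcpy(plan, h_slab, desc, CUFFT_COPY_DEVICE_TO_HOST);
            cufftXtFree(desc);
            cufftDestroy(plan);
        }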


    cuFFTDx + cuFFT LTO EA Preview

    This early-access version of cuFFT and cuFFTDx previews an innovative way of expanding the features of the device library, cuFFTDx, through the host library, cuFFT. It leverages the device Link-Time Optimization (LTO) features of the CUDA Toolkit to combine code segments and achieve optimal performance.

    Download Now
    • A new way of enhancing your cuFFTDx project via the cuFFT host library

    • Over 1,000 additional sizes supported, with improved performance and no workspace requirement, via LTO-enabled code sharing across the libraries

    • Support for both offline builds (using NVCC) and runtime builds (using NVRTC/nvJitLink)

    • Additional link-time optimization in cuFFTDx applications


    Resources



    Visit the Forums


    Contact Us
