5 Multi-threaded FFTW
*********************
In this chapter we document the parallel FFTW routines for shared-memory
parallel hardware. These routines, which support parallel one- and
multi-dimensional transforms of both real and complex data, are the
easiest way to take advantage of multiple processors with FFTW. They
work just like the corresponding uniprocessor transform routines, except
that you have an extra initialization routine to call, and there is a
routine to set the number of threads to employ. Any program that uses
the uniprocessor FFTW can therefore be trivially modified to use the
multi-threaded FFTW.
A shared-memory machine is one in which all CPUs can directly access
the same main memory, and such machines are now common due to the
ubiquity of multi-core CPUs. FFTW's multi-threading support allows you
to utilize these additional CPUs transparently from a single program.
However, this does not necessarily translate into performance
gains--when multiple threads/CPUs are employed, there is an overhead
required for synchronization that may outweigh the computational
parallelism. Therefore, you benefit from threads only if your
problem is sufficiently large.
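One way to act on this is to choose the thread count from the problem
size. The sketch below is a hypothetical heuristic, not part of FFTW:
the function 'choose_nthreads' and its 'min_points_per_thread'
parameter are assumptions for illustration, and a sensible threshold
has to be found by timing your own transforms on your own hardware.

```c
#include <stddef.h>

/* Hypothetical heuristic (not an FFTW routine): use one thread per
   min_points_per_thread points of the transform, with at least one
   thread and at most max_threads.  The threshold is an assumption to
   be tuned by measurement. */
int choose_nthreads(size_t total_points, int max_threads,
                    size_t min_points_per_thread)
{
    size_t useful;

    if (max_threads < 1 || min_points_per_thread == 0)
        return 1;

    useful = total_points / min_points_per_thread;
    if (useful < 1)
        return 1;  /* too small: threading overhead likely dominates */
    if (useful > (size_t)max_threads)
        return max_threads;
    return (int)useful;
}
```

The result would then be passed to 'fftw_plan_with_nthreads' before
the plan is created.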
Menu