fftw3: MPI Plan Creation
6.12.5 MPI Plan Creation
------------------------
Complex-data MPI DFTs
.....................
Plans for complex-data DFTs (
2d MPI example) are created by:
fftw_plan fftw_mpi_plan_dft_1d(ptrdiff_t n0, fftw_complex *in, fftw_complex *out,
MPI_Comm comm, int sign, unsigned flags);
fftw_plan fftw_mpi_plan_dft_2d(ptrdiff_t n0, ptrdiff_t n1,
fftw_complex *in, fftw_complex *out,
MPI_Comm comm, int sign, unsigned flags);
fftw_plan fftw_mpi_plan_dft_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
fftw_complex *in, fftw_complex *out,
MPI_Comm comm, int sign, unsigned flags);
fftw_plan fftw_mpi_plan_dft(int rnk, const ptrdiff_t *n,
fftw_complex *in, fftw_complex *out,
MPI_Comm comm, int sign, unsigned flags);
fftw_plan fftw_mpi_plan_many_dft(int rnk, const ptrdiff_t *n,
ptrdiff_t howmany, ptrdiff_t block, ptrdiff_t tblock,
fftw_complex *in, fftw_complex *out,
MPI_Comm comm, int sign, unsigned flags);
These are similar to their serial counterparts (
Complex DFTs)
in specifying the dimensions, sign, and flags of the transform. The
'comm' argument gives an MPI communicator that specifies the set of
processes to participate in the transform; plan creation is a collective
function that must be called for all processes in the communicator. The
'in' and 'out' pointers refer only to a portion of the overall transform
data (
MPI Data Distribution) as specified by the 'local_size'
functions in the previous section. Unless 'flags' contains
'FFTW_ESTIMATE', these arrays are overwritten during plan creation as
for the serial interface. For multi-dimensional transforms, any
dimensions '> 1' are supported; for one-dimensional transforms, only
composite (non-prime) 'n0' are currently supported (unlike the serial
FFTW). Requesting an unsupported transform size will yield a 'NULL'
plan. (As in the serial interface, highly composite sizes generally
yield the best performance.)
The advanced-interface 'fftw_mpi_plan_many_dft' additionally allows
you to specify the block sizes for the first dimension ('block') of the
n[0] x n[1] x n[2] x ... x n[d-1] input data and the first dimension
('tblock') of the n[1] x n[0] x n[2] x ... x n[d-1] transposed data (at
intermediate steps of the transform, and for the output if
'FFTW_TRANSPOSED_OUT' is specified in 'flags'). These must be the same
block sizes as were passed to the corresponding 'local_size' function;
you can pass 'FFTW_MPI_DEFAULT_BLOCK' to use FFTW's default block size
as in the basic interface. Also, the 'howmany' parameter specifies that
the transform is of contiguous 'howmany'-tuples rather than individual
complex numbers; this corresponds to the same parameter in the serial
advanced interface (
Advanced Complex DFTs) with 'stride =
howmany' and 'dist = 1'.
MPI flags
.........
The 'flags' can be any of those for the serial FFTW (
Planner
Flags), and in addition may include one or more of the following
MPI-specific flags, which improve performance at the cost of changing
the output or input data formats.
* 'FFTW_MPI_SCRAMBLED_OUT', 'FFTW_MPI_SCRAMBLED_IN': valid for 1d
transforms only, these flags indicate that the output/input of the
transform are in an undocumented "scrambled" order. A forward
'FFTW_MPI_SCRAMBLED_OUT' transform can be inverted by a backward
'FFTW_MPI_SCRAMBLED_IN' (times the usual 1/N normalization).
One-dimensional distributions.
* 'FFTW_MPI_TRANSPOSED_OUT', 'FFTW_MPI_TRANSPOSED_IN': valid for
multidimensional ('rnk > 1') transforms only, these flags specify
that the output or input of an n[0] x n[1] x n[2] x ... x n[d-1]
transform is transposed to n[1] x n[0] x n[2] x ... x n[d-1] .
Transposed distributions.
Real-data MPI DFTs
..................
DFTs of Real Data::) are created by:
fftw_plan fftw_mpi_plan_dft_r2c_2d(ptrdiff_t n0, ptrdiff_t n1,
double *in, fftw_complex *out,
MPI_Comm comm, unsigned flags);
fftw_plan fftw_mpi_plan_dft_r2c_2d(ptrdiff_t n0, ptrdiff_t n1,
double *in, fftw_complex *out,
MPI_Comm comm, unsigned flags);
fftw_plan fftw_mpi_plan_dft_r2c_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
double *in, fftw_complex *out,
MPI_Comm comm, unsigned flags);
fftw_plan fftw_mpi_plan_dft_r2c(int rnk, const ptrdiff_t *n,
double *in, fftw_complex *out,
MPI_Comm comm, unsigned flags);
fftw_plan fftw_mpi_plan_dft_c2r_2d(ptrdiff_t n0, ptrdiff_t n1,
fftw_complex *in, double *out,
MPI_Comm comm, unsigned flags);
fftw_plan fftw_mpi_plan_dft_c2r_2d(ptrdiff_t n0, ptrdiff_t n1,
fftw_complex *in, double *out,
MPI_Comm comm, unsigned flags);
fftw_plan fftw_mpi_plan_dft_c2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
fftw_complex *in, double *out,
MPI_Comm comm, unsigned flags);
fftw_plan fftw_mpi_plan_dft_c2r(int rnk, const ptrdiff_t *n,
fftw_complex *in, double *out,
MPI_Comm comm, unsigned flags);
Similar to the serial interface (
Real-data DFTs), these
transform logically n[0] x n[1] x n[2] x ... x n[d-1] real data to/from
n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1) complex data, representing
the non-redundant half of the conjugate-symmetry output of a real-input
DFT (
Multi-dimensional Transforms). However, the real array
must be stored within a padded n[0] x n[1] x n[2] x ... x [2 (n[d-1]/2
+ 1)] array (much like the in-place serial r2c transforms, but here for
out-of-place transforms as well). Currently, only multi-dimensional
('rnk > 1') r2c/c2r transforms are supported (requesting a plan for 'rnk
= 1' will yield 'NULL'). As explained above (
Multi-dimensional
MPI DFTs of Real Data), the data distribution of both the real and
complex arrays is given by the 'local_size' function called for the
dimensions of the _complex_ array. Similar to the other planning
functions, the input and output arrays are overwritten when the plan is
created except in 'FFTW_ESTIMATE' mode.
As for the complex DFTs above, there is an advance interface that
allows you to manually specify block sizes and to transform contiguous
'howmany'-tuples of real/complex numbers:
fftw_plan fftw_mpi_plan_many_dft_r2c
(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
ptrdiff_t iblock, ptrdiff_t oblock,
double *in, fftw_complex *out,
MPI_Comm comm, unsigned flags);
fftw_plan fftw_mpi_plan_many_dft_c2r
(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
ptrdiff_t iblock, ptrdiff_t oblock,
fftw_complex *in, double *out,
MPI_Comm comm, unsigned flags);
MPI r2r transforms
..................
There are corresponding plan-creation routines for r2r transforms (
More DFTs of Real Data), currently supporting multidimensional ('rnk >
1') transforms only ('rnk = 1' will yield a 'NULL' plan):
fftw_plan fftw_mpi_plan_r2r_2d(ptrdiff_t n0, ptrdiff_t n1,
double *in, double *out,
MPI_Comm comm,
fftw_r2r_kind kind0, fftw_r2r_kind kind1,
unsigned flags);
fftw_plan fftw_mpi_plan_r2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
double *in, double *out,
MPI_Comm comm,
fftw_r2r_kind kind0, fftw_r2r_kind kind1, fftw_r2r_kind kind2,
unsigned flags);
fftw_plan fftw_mpi_plan_r2r(int rnk, const ptrdiff_t *n,
double *in, double *out,
MPI_Comm comm, const fftw_r2r_kind *kind,
unsigned flags);
fftw_plan fftw_mpi_plan_many_r2r(int rnk, const ptrdiff_t *n,
ptrdiff_t iblock, ptrdiff_t oblock,
double *in, double *out,
MPI_Comm comm, const fftw_r2r_kind *kind,
unsigned flags);
The parameters are much the same as for the complex DFTs above,
except that the arrays are of real numbers (and hence the outputs of the
'local_size' data-distribution functions should be interpreted as counts
of real rather than complex numbers). Also, the 'kind' parameters
specify the r2r kinds along each dimension as for the serial interface
DONTPRINTYET (
Real-to-Real Transform Kinds). *NoteOther Multi-dimensional
DONTPRINTYET (
Real-to-Real Transform Kinds).
Other Multi-dimensional
Real-data MPI Transforms.
MPI transposition
.................
FFTW also provides routines to plan a transpose of a distributed 'n0' by
'n1' array of real numbers, or an array of 'howmany'-tuples of real
numbers with specified block sizes (
FFTW MPI Transposes):
fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
double *in, double *out,
MPI_Comm comm, unsigned flags);
fftw_plan fftw_mpi_plan_many_transpose
(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
ptrdiff_t block0, ptrdiff_t block1,
double *in, double *out, MPI_Comm comm, unsigned flags);
These plans are used with the 'fftw_mpi_execute_r2r' new-array
execute function (
Using MPI Plans), since they count as (rank
zero) r2r plans from FFTW's perspective.