fftw3: MPI Plan Creation

 
 6.12.5 MPI Plan Creation
 ------------------------
 
 Complex-data MPI DFTs
 .....................
 
 Plans for complex-data DFTs (see 2d MPI example) are created by:
 
      fftw_plan fftw_mpi_plan_dft_1d(ptrdiff_t n0, fftw_complex *in, fftw_complex *out,
                                     MPI_Comm comm, int sign, unsigned flags);
      fftw_plan fftw_mpi_plan_dft_2d(ptrdiff_t n0, ptrdiff_t n1,
                                     fftw_complex *in, fftw_complex *out,
                                     MPI_Comm comm, int sign, unsigned flags);
      fftw_plan fftw_mpi_plan_dft_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                     fftw_complex *in, fftw_complex *out,
                                     MPI_Comm comm, int sign, unsigned flags);
      fftw_plan fftw_mpi_plan_dft(int rnk, const ptrdiff_t *n,
                                  fftw_complex *in, fftw_complex *out,
                                  MPI_Comm comm, int sign, unsigned flags);
      fftw_plan fftw_mpi_plan_many_dft(int rnk, const ptrdiff_t *n,
                                       ptrdiff_t howmany, ptrdiff_t block, ptrdiff_t tblock,
                                       fftw_complex *in, fftw_complex *out,
                                       MPI_Comm comm, int sign, unsigned flags);
 
    These are similar to their serial counterparts (see Complex DFTs)
 in specifying the dimensions, sign, and flags of the transform.  The
 'comm' argument gives an MPI communicator that specifies the set of
 processes to participate in the transform; plan creation is a collective
 function that must be called by all processes in the communicator.  The
 'in' and 'out' pointers refer only to a portion of the overall transform
 data (see MPI Data Distribution) as specified by the 'local_size'
 functions in the previous section.  Unless 'flags' contains
 'FFTW_ESTIMATE', these arrays are overwritten during plan creation as
 for the serial interface.  For multi-dimensional transforms, any
 dimension sizes '> 1' are supported; for one-dimensional transforms,
 only composite (non-prime) 'n0' are currently supported (unlike the
 serial FFTW).  Requesting an unsupported transform size will yield a
 'NULL' plan.  (As in the serial interface, highly composite sizes
 generally yield the best performance.)
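    Since an unsupported size yields a 'NULL' plan, a caller may want to
 anticipate the documented restrictions before planning.  The following
 is a minimal sketch that encodes only the rule stated above; the
 function names are hypothetical (not part of FFTW), and the planner
 remains the authority, so the returned plan should still be tested for
 'NULL'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of FFTW): returns 1 if n has a factor f
   with 1 < f < n, i.e. n is composite; returns 0 otherwise. */
static int is_composite(ptrdiff_t n)
{
    for (ptrdiff_t f = 2; f * f <= n; ++f)
        if (n % f == 0)
            return 1;
    return 0;
}

/* Hypothetical helper mirroring only the documented rule: rank 1 needs a
   composite n[0]; rank > 1 is treated as supported when every size is > 1.
   The real planner may still return NULL, so always check the plan. */
static int documented_mpi_dft_size_ok(int rnk, const ptrdiff_t *n)
{
    assert(rnk >= 1 && n != NULL);
    if (rnk == 1)
        return is_composite(n[0]);
    for (int d = 0; d < rnk; ++d)
        if (n[d] <= 1)
            return 0;  /* outside the documented "> 1" guarantee */
    return 1;
}
```

 For example, a one-dimensional request with n0 = 13 fails this check,
 while n0 = 12 passes.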
 
    The advanced-interface 'fftw_mpi_plan_many_dft' additionally allows
 you to specify the block sizes for the first dimension ('block') of the
 n[0] x n[1] x n[2] x ... x n[d-1] input data and the first dimension
 ('tblock') of the n[1] x n[0] x n[2] x ... x n[d-1] transposed data (at
 intermediate steps of the transform, and for the output if
 'FFTW_MPI_TRANSPOSED_OUT' is specified in 'flags').  These must be the
 same block sizes as were passed to the corresponding 'local_size'
 function; you can pass 'FFTW_MPI_DEFAULT_BLOCK' to use FFTW's default
 block size as in the basic interface.  Also, the 'howmany' parameter
 specifies that the transform is of contiguous 'howmany'-tuples rather
 than individual complex numbers; this corresponds to the same parameter
 in the serial advanced interface (see Advanced Complex DFTs) with
 'stride = howmany' and 'dist = 1'.
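    The 'block' and 'howmany' parameters translate into simple index
 arithmetic.  This sketch assumes the block distribution described under
 MPI Data Distribution, in which process p holds first-dimension indices
 p*block through (p+1)*block - 1 (clipped to n[0]), together with the
 'stride = howmany', 'dist = 1' tuple layout; the helper names are
 hypothetical, and in practice the 'local_size' functions report the
 actual local sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rank of the process holding first-dimension index
   i0, assuming consecutive blocks of 'block' indices per process. */
static int owner_of_index(ptrdiff_t i0, ptrdiff_t block)
{
    assert(i0 >= 0 && block > 0);
    return (int) (i0 / block);
}

/* Hypothetical helper: how many first-dimension indices process p holds
   under the same assumption (later processes may hold fewer, or none). */
static ptrdiff_t local_count(ptrdiff_t n0, ptrdiff_t block, int p)
{
    assert(n0 >= 0 && block > 0 && p >= 0);
    ptrdiff_t start = (ptrdiff_t) p * block;
    if (start >= n0)
        return 0;
    return (n0 - start < block) ? n0 - start : block;
}

/* Hypothetical helper: offset of component k of the tuple at row-major
   position 'flat' when tuples are contiguous (stride = howmany, dist = 1). */
static ptrdiff_t tuple_offset(ptrdiff_t flat, ptrdiff_t howmany, ptrdiff_t k)
{
    assert(flat >= 0 && howmany > 0 && k >= 0 && k < howmany);
    return flat * howmany + k;
}
```

 With n0 = 10 and block = 4, for instance, processes 0, 1, 2, and 3 hold
 4, 4, 2, and 0 indices respectively.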
 
 MPI flags
 .........
 
 The 'flags' can be any of those for the serial FFTW (see Planner
 Flags), and in addition may include one or more of the following
 MPI-specific flags, which improve performance at the cost of changing
 the output or input data formats.
 
    * 'FFTW_MPI_SCRAMBLED_OUT', 'FFTW_MPI_SCRAMBLED_IN': valid for 1d
      transforms only, these flags indicate that the output/input of the
      transform are in an undocumented "scrambled" order.  A forward
      'FFTW_MPI_SCRAMBLED_OUT' transform can be inverted by a backward
      'FFTW_MPI_SCRAMBLED_IN' (times the usual 1/N normalization).  See
      One-dimensional distributions.
 
    * 'FFTW_MPI_TRANSPOSED_OUT', 'FFTW_MPI_TRANSPOSED_IN': valid for
      multidimensional ('rnk > 1') transforms only, these flags specify
      that the output or input of an n[0] x n[1] x n[2] x ...  x n[d-1]
       transform is transposed to n[1] x n[0] x n[2] x ... x n[d-1].  See
       Transposed distributions.
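    The transposed layouts can be pictured with a row-major offset
 calculation on the global array.  This sketch handles the rank-3 case
 and compares the normal n[0] x n[1] x n[2] order with the transposed
 n[1] x n[0] x n[2] order; it describes the logical global ordering only
 (each process still holds just its own slice of the first stored
 dimension), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i0, i1, i2) in the usual n0 x n1 x n2 row-major order. */
static ptrdiff_t offset_normal(const ptrdiff_t n[3],
                               ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2)
{
    assert(i0 >= 0 && i1 >= 0 && i2 >= 0);
    assert(i0 < n[0] && i1 < n[1] && i2 < n[2]);
    return (i0 * n[1] + i1) * n[2] + i2;
}

/* Offset of the same element when the first two dimensions are swapped,
   i.e. in n1 x n0 x n2 row-major order (the shape produced by
   FFTW_MPI_TRANSPOSED_OUT or expected by FFTW_MPI_TRANSPOSED_IN). */
static ptrdiff_t offset_transposed(const ptrdiff_t n[3],
                                   ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2)
{
    assert(i0 >= 0 && i1 >= 0 && i2 >= 0);
    assert(i0 < n[0] && i1 < n[1] && i2 < n[2]);
    return (i1 * n[0] + i0) * n[2] + i2;
}
```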
 
 Real-data MPI DFTs
 ..................
 
 Plans for real-input/output (r2c/c2r) DFTs (see Multi-dimensional MPI
 DFTs of Real Data) are created by:
 
      fftw_plan fftw_mpi_plan_dft_r2c_2d(ptrdiff_t n0, ptrdiff_t n1,
                                         double *in, fftw_complex *out,
                                         MPI_Comm comm, unsigned flags);
      fftw_plan fftw_mpi_plan_dft_r2c_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                         double *in, fftw_complex *out,
                                         MPI_Comm comm, unsigned flags);
      fftw_plan fftw_mpi_plan_dft_r2c(int rnk, const ptrdiff_t *n,
                                      double *in, fftw_complex *out,
                                      MPI_Comm comm, unsigned flags);
      fftw_plan fftw_mpi_plan_dft_c2r_2d(ptrdiff_t n0, ptrdiff_t n1,
                                         fftw_complex *in, double *out,
                                         MPI_Comm comm, unsigned flags);
      fftw_plan fftw_mpi_plan_dft_c2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                         fftw_complex *in, double *out,
                                         MPI_Comm comm, unsigned flags);
      fftw_plan fftw_mpi_plan_dft_c2r(int rnk, const ptrdiff_t *n,
                                      fftw_complex *in, double *out,
                                      MPI_Comm comm, unsigned flags);
 
    Similar to the serial interface (see Real-data DFTs), these
 transform logically n[0] x n[1] x n[2] x ... x n[d-1] real data to/from
 n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1) complex data, representing
 the non-redundant half of the conjugate-symmetric output of a real-input
 DFT (see Multi-dimensional Transforms).  However, the real array must
 be stored within a padded n[0] x n[1] x n[2] x ... x [2 (n[d-1]/2 + 1)]
 array (much like the in-place serial r2c transforms, but here for
 out-of-place transforms as well).  Currently, only multi-dimensional
 ('rnk > 1') r2c/c2r transforms are supported (requesting a plan for
 'rnk = 1' will yield 'NULL').  As explained above (see Multi-dimensional
 MPI DFTs of Real Data), the data distribution of both the real and
 complex arrays is given by the 'local_size' function called for the
 dimensions of the _complex_ array.  Similar to the other planning
 functions, the input and output arrays are overwritten when the plan is
 created except in 'FFTW_ESTIMATE' mode.
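    Because the distribution is computed from the complex dimensions
 while the real array is padded, it helps to derive both shapes from the
 logical real dimensions.  A minimal sketch follows; 'r2c_shapes' and
 'padded_offset_2d' are hypothetical helpers, and the complex dimensions
 they produce are what would be passed to a 'local_size' function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: from logical real dimensions n[0..rnk-1], fill
   cdims with the complex-array dimensions (last one n[rnk-1]/2 + 1) and
   return the padded length 2 * (n[rnk-1]/2 + 1) of the real array's
   last dimension. */
static ptrdiff_t r2c_shapes(int rnk, const ptrdiff_t *n, ptrdiff_t *cdims)
{
    assert(rnk > 1 && n != NULL && cdims != NULL);
    for (int d = 0; d < rnk - 1; ++d)
        cdims[d] = n[d];
    cdims[rnk - 1] = n[rnk - 1] / 2 + 1;
    return 2 * cdims[rnk - 1];
}

/* Offset of real element (i0, i1) in a padded 2d real array whose rows
   have length 'padded'; only the first n1 entries of each row hold data. */
static ptrdiff_t padded_offset_2d(ptrdiff_t padded, ptrdiff_t i0, ptrdiff_t i1)
{
    assert(padded > 0 && i0 >= 0 && i1 >= 0 && i1 < padded);
    return i0 * padded + i1;
}
```

 For a 4 x 5 real array, the complex array is 4 x 3 and each real row is
 padded to 6 doubles.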
 
    As for the complex DFTs above, there is an advanced interface that
 allows you to manually specify block sizes and to transform contiguous
 'howmany'-tuples of real/complex numbers:
 
      fftw_plan fftw_mpi_plan_many_dft_r2c
                    (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                     ptrdiff_t iblock, ptrdiff_t oblock,
                     double *in, fftw_complex *out,
                     MPI_Comm comm, unsigned flags);
      fftw_plan fftw_mpi_plan_many_dft_c2r
                    (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                     ptrdiff_t iblock, ptrdiff_t oblock,
                     fftw_complex *in, double *out,
                     MPI_Comm comm, unsigned flags);
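    For the advanced r2c/c2r interface, the following sketches where a
 tuple component lives in the rank-2 case.  It assumes (this is not
 stated above) that tuples are contiguous and that the padding of the
 real array's last dimension is counted in tuples, so a real n0 x n1
 array of 'howmany'-tuples occupies n0 x 2 (n1/2 + 1) x howmany doubles;
 the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (assumed layout): offset, in doubles, of component k of the
   real tuple at (i0, i1) in the padded n0 x 2(n1/2 + 1) array of tuples. */
static ptrdiff_t real_tuple_offset(ptrdiff_t n1, ptrdiff_t howmany,
                                   ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t k)
{
    ptrdiff_t padded = 2 * (n1 / 2 + 1);  /* padded row length, in tuples */
    assert(i0 >= 0 && i1 >= 0 && i1 < n1 && k >= 0 && k < howmany);
    return (i0 * padded + i1) * howmany + k;
}

/* Hypothetical (assumed layout): offset, in complex numbers, of component k
   of the tuple at (i0, j1) in the n0 x (n1/2 + 1) complex array of tuples. */
static ptrdiff_t complex_tuple_offset(ptrdiff_t n1, ptrdiff_t howmany,
                                      ptrdiff_t i0, ptrdiff_t j1, ptrdiff_t k)
{
    ptrdiff_t nc = n1 / 2 + 1;  /* complex row length, in tuples */
    assert(i0 >= 0 && j1 >= 0 && j1 < nc && k >= 0 && k < howmany);
    return (i0 * nc + j1) * howmany + k;
}
```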
 
 MPI r2r transforms
 ..................
 
 There are corresponding plan-creation routines for r2r transforms (see
 More DFTs of Real Data), currently supporting multidimensional ('rnk >
 1') transforms only ('rnk = 1' will yield a 'NULL' plan):
 
      fftw_plan fftw_mpi_plan_r2r_2d(ptrdiff_t n0, ptrdiff_t n1,
                                     double *in, double *out,
                                     MPI_Comm comm,
                                     fftw_r2r_kind kind0, fftw_r2r_kind kind1,
                                     unsigned flags);
      fftw_plan fftw_mpi_plan_r2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                     double *in, double *out,
                                     MPI_Comm comm,
                                     fftw_r2r_kind kind0, fftw_r2r_kind kind1, fftw_r2r_kind kind2,
                                     unsigned flags);
      fftw_plan fftw_mpi_plan_r2r(int rnk, const ptrdiff_t *n,
                                  double *in, double *out,
                                  MPI_Comm comm, const fftw_r2r_kind *kind,
                                  unsigned flags);
      fftw_plan fftw_mpi_plan_many_r2r(int rnk, const ptrdiff_t *n,
                                       ptrdiff_t howmany,
                                       ptrdiff_t iblock, ptrdiff_t oblock,
                                       double *in, double *out,
                                       MPI_Comm comm, const fftw_r2r_kind *kind,
                                       unsigned flags);
 
    The parameters are much the same as for the complex DFTs above,
 except that the arrays are of real numbers (and hence the outputs of the
 'local_size' data-distribution functions should be interpreted as counts
 of real rather than complex numbers).  Also, the 'kind' parameters
 specify the r2r kinds along each dimension as for the serial interface
 (see Real-to-Real Transform Kinds).  See Other Multi-dimensional
 Real-data MPI Transforms.
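    Because the 'local_size' results count real numbers for r2r plans,
 the local arrays must be sized in 'double's rather than in
 'fftw_complex's.  The sketch below contrasts the two; it uses a
 stand-in 'complex_like' type with the layout of FFTW's default
 'fftw_complex' ('double[2]') so that it compiles without 'fftw3.h', and
 the helper names are hypothetical.  In real code, 'fftw_alloc_real' and
 'fftw_alloc_complex' take these counts directly.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in with the same layout as FFTW's default fftw_complex (double[2]),
   so this sketch compiles without fftw3.h. */
typedef double complex_like[2];

/* For r2r plans, the 'local_size' count is a number of real values. */
static size_t r2r_alloc_bytes(ptrdiff_t alloc_local)
{
    assert(alloc_local >= 0);
    return (size_t) alloc_local * sizeof(double);
}

/* For complex DFT plans, the same kind of count is a number of complex values. */
static size_t dft_alloc_bytes(ptrdiff_t alloc_local)
{
    assert(alloc_local >= 0);
    return (size_t) alloc_local * sizeof(complex_like);
}
```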
 
 MPI transposition
 .................
 
 FFTW also provides routines to plan a transpose of a distributed 'n0' by
 'n1' array of real numbers, or an array of 'howmany'-tuples of real
 numbers with specified block sizes (see FFTW MPI Transposes):
 
      fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                        double *in, double *out,
                                        MPI_Comm comm, unsigned flags);
      fftw_plan fftw_mpi_plan_many_transpose
                      (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                       ptrdiff_t block0, ptrdiff_t block1,
                       double *in, double *out, MPI_Comm comm, unsigned flags);
 
    These plans are used with the 'fftw_mpi_execute_r2r' new-array
 execute function (see Using MPI Plans), since they count as (rank
 zero) r2r plans from FFTW's perspective.
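    The logical result of such a transpose can be stated with a serial
 reference loop: tuple (i, j) of the n0 x n1 input becomes tuple (j, i)
 of the n1 x n0 output.  This sketch is only a specification of that
 result, useful for checking small cases; it is not how FFTW performs
 the distributed transpose, it works out of place, and
 'reference_transpose' is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Serial reference for the logical result of transposing an n0 x n1 array
   of howmany-tuples into an n1 x n0 array: tuple (i, j) of 'in' becomes
   tuple (j, i) of 'out'.  Out of place only (in and out must differ). */
static void reference_transpose(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                                const double *in, double *out)
{
    assert(n0 >= 0 && n1 >= 0 && howmany > 0 && in != out);
    for (ptrdiff_t i = 0; i < n0; ++i)
        for (ptrdiff_t j = 0; j < n1; ++j)
            for (ptrdiff_t k = 0; k < howmany; ++k)
                out[(j * n0 + i) * howmany + k] = in[(i * n1 + j) * howmany + k];
}
```

 Applied to the 2 x 3 array {0, 1, 2, 3, 4, 5} with howmany = 1, it
 produces the 3 x 2 array {0, 3, 1, 4, 2, 5}.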