fftw3: MPI Data Distribution Functions

 
 6.12.4 MPI Data Distribution Functions
 --------------------------------------
 
 As described above (see MPI Data Distribution), in order to allocate
 your arrays, _before_ creating a plan, you must first call one of the
 following routines to determine the required allocation size and the
 portion of the array locally stored on a given process.  The 'MPI_Comm'
 communicator passed here must be equivalent to the communicator used
 below for plan creation.
 
    The basic interface for multidimensional transforms consists of the
 functions:
 
      ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                                       ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
      ptrdiff_t fftw_mpi_local_size_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                       MPI_Comm comm,
                                       ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
      ptrdiff_t fftw_mpi_local_size(int rnk, const ptrdiff_t *n, MPI_Comm comm,
                                    ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
 
      ptrdiff_t fftw_mpi_local_size_2d_transposed(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                                                  ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                                  ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
      ptrdiff_t fftw_mpi_local_size_3d_transposed(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                                  MPI_Comm comm,
                                                  ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                                  ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
      ptrdiff_t fftw_mpi_local_size_transposed(int rnk, const ptrdiff_t *n, MPI_Comm comm,
                                               ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                               ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
 
    These functions return the number of elements to allocate (complex
 numbers for DFT/r2c/c2r plans, real numbers for r2r plans), while the
 'local_n0' and 'local_0_start' arguments return the portion
 ('local_0_start' to 'local_0_start + local_n0 - 1') of the first
 dimension of an n[0] x n[1] x n[2] x ...  x n[d-1] array that is stored
 on the local process.  See Basic and advanced distribution interfaces.
 For 'FFTW_MPI_TRANSPOSED_OUT' plans, the '_transposed' variants also
 return the local portion ('local_1_start' to 'local_1_start + local_n1
 - 1') of the first dimension of the n[1] x n[0] x n[2] x ...  x n[d-1]
 transposed output.  See Transposed distributions.

    The advanced interface for multidimensional transforms is:
 
      ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                                         ptrdiff_t block0, MPI_Comm comm,
                                         ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
      ptrdiff_t fftw_mpi_local_size_many_transposed(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                                                    ptrdiff_t block0, ptrdiff_t block1, MPI_Comm comm,
                                                    ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                                    ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
 
    These differ from the basic interface in only two ways.  First, they
 allow you to specify block sizes 'block0' and 'block1' (the latter for
 the transposed output); you can pass 'FFTW_MPI_DEFAULT_BLOCK' to use
 FFTW's default block size as in the basic interface.  Second, you can
 pass a 'howmany' parameter, corresponding to the advanced planning
 interface below: this is for transforms of contiguous 'howmany'-tuples
 of numbers ('howmany = 1' in the basic interface).
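
    To make the roles of the block size and of 'howmany' concrete, the
 following is a minimal sketch in plain C that models a block
 distribution of the first dimension.  It is not FFTW's implementation:
 'block_portion' and 'min_local_elements' are hypothetical helpers, the
 value 0 merely stands in for 'FFTW_MPI_DEFAULT_BLOCK', and the default
 block size is assumed to be ceil(n0/P) for P processes.  The element
 count it computes is only a lower bound, since the value returned by
 the real routines may be larger.  In a real program, always use the
 values reported by the 'fftw_mpi_local_size' routines themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an FFTW function: portion of n indices held by
   `rank` when the indices are dealt out in contiguous blocks of `block`.
   Passing block == 0 is this sketch's stand-in for FFTW_MPI_DEFAULT_BLOCK
   and selects ceil(n / nprocs), an assumption about the default. */
void block_portion(ptrdiff_t n, ptrdiff_t block, int nprocs, int rank,
                   ptrdiff_t *local_n, ptrdiff_t *local_start)
{
    assert(n > 0 && block >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    if (block == 0)
        block = (n + nprocs - 1) / nprocs;   /* assumed default: ceil(n / P) */
    assert(block * nprocs >= n);             /* the blocks must cover all indices */

    ptrdiff_t start = (ptrdiff_t)rank * block;
    if (start >= n) {                        /* nothing is left for this rank */
        *local_n = 0;
        *local_start = 0;                    /* arbitrary choice of this sketch */
        return;
    }
    *local_start = start;
    *local_n = (n - start < block) ? n - start : block;
}

/* Lower bound on the number of elements a rank stores: local_n0 rows, each
   holding n[1] * ... * n[rnk-1] points of `howmany` numbers apiece. */
ptrdiff_t min_local_elements(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                             ptrdiff_t local_n0)
{
    ptrdiff_t count = local_n0 * howmany;
    for (int d = 1; d < rnk; ++d)
        count *= n[d];
    return count;
}
```

    Under these assumptions, n0 = 100 on 3 processes gives 34, 34, and
 32 rows, whereas an explicit block size of 50 gives 50, 50, and 0 rows.
 Applying the same helper to n[1] with 'block1' models the transposed
 output portion ('local_n1', 'local_1_start') in the same way.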
 
    The corresponding basic and advanced routines for one-dimensional
 transforms (currently only complex DFTs) are:
 
      ptrdiff_t fftw_mpi_local_size_1d(
                   ptrdiff_t n0, MPI_Comm comm, int sign, unsigned flags,
                   ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
                   ptrdiff_t *local_no, ptrdiff_t *local_o_start);
      ptrdiff_t fftw_mpi_local_size_many_1d(
                   ptrdiff_t n0, ptrdiff_t howmany,
                   MPI_Comm comm, int sign, unsigned flags,
                   ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
                   ptrdiff_t *local_no, ptrdiff_t *local_o_start);
 
    As above, the return value is the number of elements to allocate
 (complex numbers, for complex DFTs).  The 'local_ni' and 'local_i_start'
 arguments return the portion ('local_i_start' to 'local_i_start +
 local_ni - 1') of the 1d array that is stored on this process for the
 transform _input_, and 'local_no' and 'local_o_start' are the
 corresponding quantities for the _output_.  The 'sign' ('FFTW_FORWARD' or
 'FFTW_BACKWARD') and 'flags' must match the arguments passed when
 creating a plan.  Although the inputs and outputs have different data
 distributions in general, it is guaranteed that the _output_ data
 distribution of an 'FFTW_FORWARD' plan will match the _input_ data
 distribution of an 'FFTW_BACKWARD' plan and vice versa; similarly for
 the 'FFTW_MPI_SCRAMBLED_OUT' and 'FFTW_MPI_SCRAMBLED_IN' flags.  See
 One-dimensional distributions.
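
    As an illustration of how the four output arguments are typically
 used, the sketch below maps between local storage slots and global
 indices of the 1d array.  It is plain C with no FFTW calls: 'struct
 portion_1d', 'fill_local_input', and 'local_output_slot' are
 hypothetical names, and complex numbers are assumed to be stored as
 'double[2]' pairs.  The output mapping assumes in-order output; it does
 not apply when 'FFTW_MPI_SCRAMBLED_OUT' is used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container (not part of FFTW) for the four values that
   fftw_mpi_local_size_1d reports through its pointer arguments. */
struct portion_1d {
    ptrdiff_t local_ni, local_i_start;   /* input portion on this process  */
    ptrdiff_t local_no, local_o_start;   /* output portion on this process */
};

/* Fill the locally stored input slice: local slot k holds the element with
   global index local_i_start + k.  Here each element is set to its own
   global index, with a zero imaginary part. */
void fill_local_input(double (*in)[2], const struct portion_1d *p)
{
    assert(in != NULL && p != NULL);
    for (ptrdiff_t k = 0; k < p->local_ni; ++k) {
        ptrdiff_t global = p->local_i_start + k;
        in[k][0] = (double)global;
        in[k][1] = 0.0;
    }
}

/* Return the local slot that holds global output index `global`, or -1 if
   that index is stored on another process (in-order output assumed). */
ptrdiff_t local_output_slot(const struct portion_1d *p, ptrdiff_t global)
{
    assert(p != NULL);
    if (global < p->local_o_start || global >= p->local_o_start + p->local_no)
        return -1;
    return global - p->local_o_start;
}
```

    Because the input and output portions generally differ, a process
 that fills input indices 8 to 11 may hold a different range of output
 indices after the transform, so the two ranges should be tracked
 separately as above.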