fftw3: MPI Data Distribution Functions
6.12.4 MPI Data Distribution Functions
--------------------------------------
As described above (see MPI Data Distribution), you must call one of
the following routines _before_ creating a plan, in order to determine
the required allocation size and the portion of the array that is
stored locally on a given process. The 'MPI_Comm' communicator passed
here must be equivalent to the communicator used below for plan
creation.
The basic interface for multidimensional transforms consists of the
functions:
     ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
     ptrdiff_t fftw_mpi_local_size_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                      MPI_Comm comm,
                                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
     ptrdiff_t fftw_mpi_local_size(int rnk, const ptrdiff_t *n, MPI_Comm comm,
                                   ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
     ptrdiff_t fftw_mpi_local_size_2d_transposed(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                                                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                                 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
     ptrdiff_t fftw_mpi_local_size_3d_transposed(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                                 MPI_Comm comm,
                                                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                                 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
     ptrdiff_t fftw_mpi_local_size_transposed(int rnk, const ptrdiff_t *n, MPI_Comm comm,
                                              ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                              ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
These functions return the number of elements to allocate (complex
numbers for DFT/r2c/c2r plans, real numbers for r2r plans), whereas the
'local_n0' and 'local_0_start' arguments return the portion
('local_0_start' to 'local_0_start + local_n0 - 1') of the first
dimension of an n[0] x n[1] x n[2] x ... x n[d-1] array that is stored
on the local process.
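The range 'local_0_start' to 'local_0_start + local_n0 - 1' translates
directly into index arithmetic on the local buffer. The following is a
minimal sketch for the 2d case, not part of FFTW: the 'slab_2d' struct
and the helpers 'slab_owns_row' and 'slab_index' are hypothetical names
introduced here, and the code assumes that the local rows are stored
contiguously in row-major order. The fields are meant to be filled from
a call such as 'fftw_mpi_local_size_2d'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container (not an FFTW type) for the values produced by
   fftw_mpi_local_size_2d(n0, n1, comm, &local_n0, &local_0_start). */
typedef struct {
    ptrdiff_t n1;            /* full size of the second dimension */
    ptrdiff_t local_n0;      /* number of rows stored on this process */
    ptrdiff_t local_0_start; /* global index of the first local row */
} slab_2d;

/* Nonzero if global row i lies in
   local_0_start .. local_0_start + local_n0 - 1. */
static int slab_owns_row(const slab_2d *s, ptrdiff_t i)
{
    return i >= s->local_0_start && i < s->local_0_start + s->local_n0;
}

/* Offset of global element (i, j) within the local buffer, assuming the
   local rows are stored contiguously in row-major order. */
static ptrdiff_t slab_index(const slab_2d *s, ptrdiff_t i, ptrdiff_t j)
{
    assert(slab_owns_row(s, i));
    assert(j >= 0 && j < s->n1);
    return (i - s->local_0_start) * s->n1 + j;
}
```

For example, with 'local_0_start = 8', 'local_n0 = 4' and 'n1 = 5',
global element (9, 3) sits at offset 8 of the local buffer. The buffer
itself should be sized from the function's return value rather than
from 'local_n0 * n1', since the returned count can be larger.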
See Basic and advanced distribution interfaces.

For 'FFTW_MPI_TRANSPOSED_OUT' plans, the '_transposed' variants also
return, in 'local_n1' and 'local_1_start', the local portion of the
first dimension of the n[1] x n[0] x n[2] x ... x n[d-1] transposed
output. See Transposed distributions.

The advanced interface for multidimensional transforms is:
     ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                                        ptrdiff_t block0, MPI_Comm comm,
                                        ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
     ptrdiff_t fftw_mpi_local_size_many_transposed(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                                                   ptrdiff_t block0, ptrdiff_t block1, MPI_Comm comm,
                                                   ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                                   ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
These differ from the basic interface in only two ways. First, they
allow you to specify block sizes 'block0' and 'block1' (the latter for
the transposed output); you can pass 'FFTW_MPI_DEFAULT_BLOCK' to use
FFTW's default block size as in the basic interface. Second, you can
pass a 'howmany' parameter, corresponding to the advanced planning
interface below: this is for transforms of contiguous 'howmany'-tuples
of numbers ('howmany = 1' in the basic interface).
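Both the 'howmany'-tuples and the transposed output layout reduce to
the same offset calculation on the local buffer. The sketch below
illustrates this for a 2d transform. It rests on several assumptions:
the helper names 'tuple_offset', 'input_offset' and 'transposed_offset'
are hypothetical (none are FFTW functions), the local data are taken to
be row-major, and the 'howmany' components of each tuple are taken to
be stored contiguously.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an FFTW function).
   Offset, in elements, of component k of the howmany-tuple at position
   (row, col) of a slab whose first local row is row_start and whose
   rows each hold ncols tuples. Assumes row-major order with the
   howmany components of each tuple stored contiguously. */
static ptrdiff_t tuple_offset(ptrdiff_t row_start, ptrdiff_t row,
                              ptrdiff_t ncols, ptrdiff_t col,
                              ptrdiff_t howmany, ptrdiff_t k)
{
    assert(row >= row_start);
    assert(col >= 0 && col < ncols);
    assert(k >= 0 && k < howmany);
    return ((row - row_start) * ncols + col) * howmany + k;
}

/* Input layout of an n0 x n1 transform: the local rows run over
   dimension 0, starting at local_0_start. */
static ptrdiff_t input_offset(ptrdiff_t local_0_start, ptrdiff_t n1,
                              ptrdiff_t howmany,
                              ptrdiff_t i, ptrdiff_t j, ptrdiff_t k)
{
    return tuple_offset(local_0_start, i, n1, j, howmany, k);
}

/* Transposed (n1 x n0) output layout: the local rows run over
   dimension 1, starting at local_1_start. */
static ptrdiff_t transposed_offset(ptrdiff_t local_1_start, ptrdiff_t n0,
                                   ptrdiff_t howmany,
                                   ptrdiff_t i, ptrdiff_t j, ptrdiff_t k)
{
    return tuple_offset(local_1_start, j, n0, i, howmany, k);
}
```

With 'howmany = 1' and 'k = 0' these reduce to the offsets of the basic
interface. Neither helper checks the upper bound of the local row
range; in real code that bound would come from 'local_n0' or
'local_n1'.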
The corresponding basic and advanced routines for one-dimensional
transforms (currently only complex DFTs) are:
     ptrdiff_t fftw_mpi_local_size_1d(
                  ptrdiff_t n0, MPI_Comm comm, int sign, unsigned flags,
                  ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
                  ptrdiff_t *local_no, ptrdiff_t *local_o_start);
     ptrdiff_t fftw_mpi_local_size_many_1d(
                  ptrdiff_t n0, ptrdiff_t howmany,
                  MPI_Comm comm, int sign, unsigned flags,
                  ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
                  ptrdiff_t *local_no, ptrdiff_t *local_o_start);
As above, the return value is the number of elements to allocate
(complex numbers, for complex DFTs). The 'local_ni' and 'local_i_start'
arguments return the portion ('local_i_start' to 'local_i_start +
local_ni - 1') of the 1d array that is stored on this process for the
transform _input_, and 'local_no' and 'local_o_start' are the
corresponding quantities for the _output_. The 'sign' ('FFTW_FORWARD' or
'FFTW_BACKWARD') and 'flags' must match the arguments passed when
creating a plan. Although the inputs and outputs have different data
distributions in general, it is guaranteed that the _output_ data
distribution of an 'FFTW_FORWARD' plan will match the _input_ data
distribution of an 'FFTW_BACKWARD' plan and vice versa; similarly for
the 'FFTW_MPI_SCRAMBLED_OUT' and 'FFTW_MPI_SCRAMBLED_IN' flags.
See One-dimensional distributions.
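Because the input and output portions of a 1d transform generally
differ, it can help to keep all four returned values together and to
check the forward/backward guarantee explicitly while debugging. The
following is a sketch only: the 'dist_1d' struct and the helper
functions are hypothetical names, not FFTW types or functions, and the
output offset assumes that neither 'FFTW_MPI_SCRAMBLED_IN' nor
'FFTW_MPI_SCRAMBLED_OUT' is used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record (not an FFTW type) of the four values that
   fftw_mpi_local_size_1d stores through its pointer arguments. */
typedef struct {
    ptrdiff_t local_ni, local_i_start; /* input portion on this process */
    ptrdiff_t local_no, local_o_start; /* output portion on this process */
} dist_1d;

/* Nonzero if the output portion of `first` coincides with the input
   portion of `second`, as expected when `first` describes a forward
   plan and `second` the matching backward plan (or vice versa). */
static int output_feeds_input(const dist_1d *first, const dist_1d *second)
{
    return first->local_no == second->local_ni
        && first->local_o_start == second->local_i_start;
}

/* Offset of global input index i within the local input data. */
static ptrdiff_t input_offset_1d(const dist_1d *d, ptrdiff_t i)
{
    assert(i >= d->local_i_start && i < d->local_i_start + d->local_ni);
    return i - d->local_i_start;
}

/* Offset of global output index i within the local output data,
   assuming the output is stored in natural (unscrambled) order. */
static ptrdiff_t output_offset_1d(const dist_1d *d, ptrdiff_t i)
{
    assert(i >= d->local_o_start && i < d->local_o_start + d->local_no);
    return i - d->local_o_start;
}
```

A call such as 'output_feeds_input(&fwd, &bwd)', with 'fwd' and 'bwd'
filled from two 'fftw_mpi_local_size_1d' calls that differ only in
'sign', could serve as a sanity check before feeding the output of a
forward transform to a backward plan.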