6.7.1 Basic distributed-transpose interface
-------------------------------------------
Suppose that we have an 'n0' by 'n1' array in row-major order,
block-distributed across the 'n0' dimension. To transpose this into an
'n1' by 'n0' array block-distributed across the 'n1' dimension, we
would create a plan by calling the following function:
     fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                       double *in, double *out,
                                       MPI_Comm comm, unsigned flags);
The input and output arrays ('in' and 'out') can be the same. The
transpose is actually executed by calling 'fftw_execute' on the plan, as
usual.
The 'flags' are the usual FFTW planner flags, but may additionally
include 'FFTW_MPI_TRANSPOSED_OUT' and/or 'FFTW_MPI_TRANSPOSED_IN'. For
transpose plans, these flags indicate that the output and/or input,
respectively, is _locally_ transposed. That is, on each process the
input data is normally stored as a
'local_n0' by 'n1' array in row-major order, but for an
'FFTW_MPI_TRANSPOSED_IN' plan the input data is stored as 'n1' by
'local_n0' in row-major order. Similarly, 'FFTW_MPI_TRANSPOSED_OUT'
means that the output is 'n0' by 'local_n1' instead of 'local_n1' by
'n0'.
To determine the local size of the array on each process before and
after the transpose, as well as the amount of storage that must be
allocated, one should call 'fftw_mpi_local_size_2d_transposed', just as
for a 2d DFT as described in the previous section:
     ptrdiff_t fftw_mpi_local_size_2d_transposed
                     (ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                      ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
Again, the return value is the local storage to allocate, which in
this case is the number of _real_ ('double') values rather than complex
numbers as in the previous examples.
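Putting these pieces together, a typical usage sketch might look as follows (assuming FFTW is built with MPI support; sizes are illustrative and error handling is omitted):

```c
#include <fftw3-mpi.h>

int main(int argc, char **argv) {
    const ptrdiff_t n0 = 128, n1 = 256; /* illustrative sizes */
    ptrdiff_t local_n0, local_0_start, local_n1, local_1_start;

    MPI_Init(&argc, &argv);
    fftw_mpi_init();

    /* How much local storage (in doubles) does this process need? */
    ptrdiff_t alloc_local = fftw_mpi_local_size_2d_transposed(
        n0, n1, MPI_COMM_WORLD,
        &local_n0, &local_0_start, &local_n1, &local_1_start);

    double *data = fftw_alloc_real(alloc_local);

    /* In-place transpose: in == out is allowed. */
    fftw_plan plan = fftw_mpi_plan_transpose(
        n0, n1, data, data, MPI_COMM_WORLD, FFTW_ESTIMATE);

    /* ... fill the local_n0 x n1 block of rows starting
       at global row local_0_start ... */

    fftw_execute(plan); /* data now holds a local_n1 x n0 block */

    fftw_destroy_plan(plan);
    fftw_free(data);
    fftw_mpi_cleanup();
    MPI_Finalize();
    return 0;
}
```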