fftw3: Multi-dimensional MPI DFTs of Real Data
6.5 Multi-dimensional MPI DFTs of Real Data
===========================================
FFTW's MPI interface also supports multi-dimensional DFTs of real data,
similar to the serial r2c and c2r interfaces. (Parallel one-dimensional
real-data DFTs are not currently supported; you must use a complex
transform and set the imaginary parts of the inputs to zero.)
The key points to understand for r2c and c2r MPI transforms (compared
to the MPI complex DFTs or the serial r2c/c2r transforms), are:
* Just as for serial transforms, r2c/c2r DFTs transform n[0] x n[1] x
n[2] x ... x n[d-1] real data to/from n[0] x n[1] x n[2] x ... x
(n[d-1]/2 + 1) complex data: the last dimension of the complex data
is cut in half (rounded down), plus one. As for the serial
transforms, the sizes you pass to the 'plan_dft_r2c' and
'plan_dft_c2r' are the n[0] x n[1] x n[2] x ... x n[d-1]
dimensions of the real data.
* Although the real data is _conceptually_ n[0] x n[1] x n[2] x ...
x n[d-1] , it is _physically_ stored as an n[0] x n[1] x n[2] x ...
x [2 (n[d-1]/2 + 1)] array, where the last dimension has been
_padded_ to make it the same size as the complex output. This is
much like the in-place serial r2c/c2r interface (
Multi-Dimensional DFTs of Real Data), except that in MPI the
padding is required even for out-of-place data. The extra padding
numbers are ignored by FFTW (they are _not_ like zero-padding the
transform to a larger size); they are only used to determine the
data layout.
* The data distribution in MPI for _both_ the real and complex data
is determined by the shape of the _complex_ data. That is, you
call the appropriate 'local size' function for the n[0] x n[1] x
n[2] x ... x (n[d-1]/2 + 1) complex data, and then use the _same_
distribution for the real data except that the last complex
dimension is replaced by a (padded) real dimension of twice the
length.
For example suppose we are performing an out-of-place r2c transform
of L x M x N real data [padded to L x M x 2(N/2+1) ], resulting in L x M
x N/2+1 complex data. Similar to the example in 2d MPI example,
we might do something like:
#include <fftw3-mpi.h>
int main(int argc, char **argv)
{
const ptrdiff_t L = ..., M = ..., N = ...;
fftw_plan plan;
double *rin;
fftw_complex *cout;
ptrdiff_t alloc_local, local_n0, local_0_start, i, j, k;
MPI_Init(&argc, &argv);
fftw_mpi_init();
/* get local data size and allocate */
alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD,
&local_n0, &local_0_start);
rin = fftw_alloc_real(2 * alloc_local);
cout = fftw_alloc_complex(alloc_local);
/* create plan for out-of-place r2c DFT */
plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD,
FFTW_MEASURE);
/* initialize rin to some function my_func(x,y,z) */
for (i = 0; i < local_n0; ++i)
for (j = 0; j < M; ++j)
for (k = 0; k < N; ++k)
rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start+i, j, k);
/* compute transforms as many times as desired */
fftw_execute(plan);
fftw_destroy_plan(plan);
MPI_Finalize();
}
Note that we allocated 'rin' using 'fftw_alloc_real' with an argument
of '2 * alloc_local': since 'alloc_local' is the number of _complex_
values to allocate, the number of _real_ values is twice as many. The
'rin' array is then local_n0 x M x 2(N/2+1) in row-major order, so its
'(i,j,k)' element is at the index '(i*M + j) * (2*(N/2+1)) + k' (
Multi-dimensional Array Format).
As for the complex transforms, improved performance can be obtained
by specifying that the output is the transpose of the input or vice
versa (Transposed distributions). In our L x M x N r2c example,
including 'FFTW_TRANSPOSED_OUT' in the flags means that the input would
be a padded L x M x 2(N/2+1) real array distributed over the 'L'
dimension, while the output would be a M x L x N/2+1 complex array
distributed over the 'M' dimension. To perform the inverse c2r
transform with the same data distributions, you would use the
'FFTW_TRANSPOSED_IN' flag.