6.13 FFTW MPI Fortran Interface
===============================
The FFTW MPI interface is callable from modern Fortran compilers
supporting the Fortran 2003 'iso_c_binding' standard for calling C
functions. As described in Calling FFTW from Modern Fortran,
this means that you can directly call FFTW's C interface from Fortran
with only minor changes in syntax. There are, however, a few things
specific to the MPI interface to keep in mind:
* Instead of including 'fftw3.f03' as in Overview of Fortran
interface, you should 'include 'fftw3-mpi.f03'' (after 'use,
intrinsic :: iso_c_binding' as before). The 'fftw3-mpi.f03' file
includes 'fftw3.f03', so you should _not_ 'include' them both
yourself. (You will also want to include the MPI header file,
usually via 'include 'mpif.h'' or similar, although this is
not needed by 'fftw3-mpi.f03' per se.) (To use the 'fftwl_' 'long
double' extended-precision routines in supporting compilers, you
should include 'fftw3l-mpi.f03' in _addition_ to 'fftw3-mpi.f03'.
See Extended and quadruple precision in Fortran.)
* Because of the different storage conventions between C and Fortran,
you reverse the order of your array dimensions when passing them to
FFTW (Reversing array dimensions). This is merely a
difference in notation and incurs no performance overhead.
However, it means that, whereas in C the _first_ dimension is
distributed, in Fortran the _last_ dimension of your array is
distributed.
* In Fortran, communicators are stored as 'integer' types; there is
no 'MPI_Comm' type, nor is there any way to access a C 'MPI_Comm'.
Fortunately, this is taken care of for you by the FFTW Fortran
interface: whenever the C interface expects an 'MPI_Comm' type, you
should pass the Fortran communicator as an 'integer'.(1)
* Because you need to call the 'local_size' function to find out how
much space to allocate, and this may be _larger_ than the local
portion of the array (MPI Data Distribution), you should
_always_ allocate your arrays dynamically using FFTW's allocation
routines as described in Allocating aligned memory in
Fortran. (Coincidentally, this also provides the best
performance by guaranteeing proper data alignment.)
* Because all sizes in the MPI FFTW interface are declared as
'ptrdiff_t' in C, you should use 'integer(C_INTPTR_T)' in Fortran
(FFTW Fortran type reference).
* In Fortran, because of the language semantics, we generally
recommend using the new-array execute functions for all plans, even
in the common case where you are executing the plan on the same
arrays for which the plan was created (Plan execution in
Fortran). However, note that in the MPI interface these
functions are changed: 'fftw_execute_dft' becomes
'fftw_mpi_execute_dft', etcetera. See Using MPI Plans.
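Taken together, the points above fit into a program skeleton like the
following. (This consolidated sketch is not from the manual itself; it
assumes an MPI library and an FFTW build with MPI support, and omits
error handling for brevity.)

  ! Sketch of the typical setup/teardown around the MPI FFTW calls.
  program sketch
    use, intrinsic :: iso_c_binding
    implicit none
    include 'mpif.h'          ! MPI constants and interfaces
    include 'fftw3-mpi.f03'   ! includes fftw3.f03 itself; do not include both
    integer :: ierr

    call MPI_Init(ierr)       ! initialize MPI first
    call fftw_mpi_init()      ! then initialize FFTW's MPI interface

    ! ... allocate arrays with fftw_alloc_complex, create and
    !     execute plans as shown in the example below ...

    call fftw_mpi_cleanup()   ! frees FFTW's internal MPI data
    call MPI_Finalize(ierr)
  end program sketch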
For example, here is a Fortran code snippet to perform a distributed
L x M complex DFT in-place. (This assumes you have already initialized
MPI with 'MPI_init' and have also performed 'call fftw_mpi_init'.)
  use, intrinsic :: iso_c_binding
  include 'fftw3-mpi.f03'
  integer(C_INTPTR_T), parameter :: L = ...
  integer(C_INTPTR_T), parameter :: M = ...
  type(C_PTR) :: plan, cdata
  complex(C_DOUBLE_COMPLEX), pointer :: data(:,:)
  integer(C_INTPTR_T) :: i, j, alloc_local, local_M, local_j_offset

  ! get local data size and allocate (note dimension reversal)
  alloc_local = fftw_mpi_local_size_2d(M, L, MPI_COMM_WORLD, &
                                       local_M, local_j_offset)
  cdata = fftw_alloc_complex(alloc_local)
  call c_f_pointer(cdata, data, [L,local_M])

  ! create MPI plan for in-place forward DFT (note dimension reversal)
  plan = fftw_mpi_plan_dft_2d(M, L, data, data, MPI_COMM_WORLD, &
                              FFTW_FORWARD, FFTW_MEASURE)

  ! initialize data to some function my_function(i,j)
  do j = 1, local_M
    do i = 1, L
      data(i, j) = my_function(i, j + local_j_offset)
    end do
  end do

  ! compute transform (as many times as desired)
  call fftw_mpi_execute_dft(plan, data, data)

  call fftw_destroy_plan(plan)
  call fftw_free(cdata)
Note that we called 'fftw_mpi_local_size_2d' and
'fftw_mpi_plan_dft_2d' with the dimensions in reversed order, since an
L x M Fortran array is viewed by FFTW in C as an M x L array. This
means that the array is distributed over the 'M' dimension, the local
portion of which is an L x local_M array in Fortran. (You must _not_ use an
'allocate' statement to allocate an L x local_M array, however; you must
allocate 'alloc_local' complex numbers, which may be greater than 'L *
local_M', in order to reserve space for intermediate steps of the
transform.) Finally, we mention that because C's array indices are
zero-based, the 'local_j_offset' argument can conveniently be
interpreted as an offset in the 1-based 'j' index (rather than as a
starting index as in C).
If instead you had used the 'ior(FFTW_MEASURE,
FFTW_MPI_TRANSPOSED_OUT)' flag, the output of the transform would be a
transposed M x local_L array, associated with the _same_ 'cdata'
allocation (since the transform is in-place), and which you could
declare with:
  complex(C_DOUBLE_COMPLEX), pointer :: tdata(:,:)
  ...
  call c_f_pointer(cdata, tdata, [M,local_L])
where 'local_L' would have been obtained by changing the
'fftw_mpi_local_size_2d' call to:
  alloc_local = fftw_mpi_local_size_2d_transposed(M, L, MPI_COMM_WORLD, &
                       local_M, local_j_offset, local_L, local_i_offset)
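Putting the transposed-output pieces together, the modified
size/plan/execute calls might look like the following sketch (this
consolidated fragment is not in the manual; variable names follow the
example above, and plan destruction and deallocation are omitted):

  ! Sketch: in-place 2d DFT with transposed output (names as above).
  ! Both pointer views refer to the same cdata allocation.
  alloc_local = fftw_mpi_local_size_2d_transposed(M, L, MPI_COMM_WORLD, &
                       local_M, local_j_offset, local_L, local_i_offset)
  cdata = fftw_alloc_complex(alloc_local)
  call c_f_pointer(cdata, data,  [L, local_M])   ! input view:  L x local_M
  call c_f_pointer(cdata, tdata, [M, local_L])   ! output view: M x local_L

  plan = fftw_mpi_plan_dft_2d(M, L, data, tdata, MPI_COMM_WORLD, &
             FFTW_FORWARD, ior(FFTW_MEASURE, FFTW_MPI_TRANSPOSED_OUT))
  call fftw_mpi_execute_dft(plan, data, tdata)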
---------- Footnotes ----------
(1) Technically, this is because you aren't actually calling the C
functions directly. You are calling wrapper functions that translate
the communicator with 'MPI_Comm_f2c' before calling the ordinary C
interface. This is all done transparently, however, since the
'fftw3-mpi.f03' interface file renames the wrappers so that they are
called in Fortran with the same names as the C interface functions.