fftw3: FFTW MPI Fortran Interface
6.13 FFTW MPI Fortran Interface
===============================
The FFTW MPI interface is callable from modern Fortran compilers
supporting the Fortran 2003 'iso_c_binding' standard for calling C
functions. As described in
Calling FFTW from Modern Fortran,
this means that you can directly call FFTW's C interface from Fortran
with only minor changes in syntax. There are, however, a few things
specific to the MPI interface to keep in mind:
* Instead of including 'fftw3.f03' as in
Overview of Fortran
interface, you should 'include 'fftw3-mpi.f03'' (after 'use,
intrinsic :: iso_c_binding' as before). The 'fftw3-mpi.f03' file
includes 'fftw3.f03', so you should _not_ 'include' them both
yourself. (You will also want to include the MPI header file,
usually via 'include 'mpif.h'' or similar, although though this is
not needed by 'fftw3-mpi.f03' per se.) (To use the 'fftwl_' 'long
double' extended-precision routines in supporting compilers, you
should include 'fftw3f-mpi.f03' in _addition_ to 'fftw3-mpi.f03'.
Extended and quadruple precision in Fortran.)
* Because of the different storage conventions between C and Fortran,
you reverse the order of your array dimensions when passing them to
FFTW (
Reversing array dimensions). This is merely a
difference in notation and incurs no performance overhead.
However, it means that, whereas in C the _first_ dimension is
distributed, in Fortran the _last_ dimension of your array is
distributed.
* In Fortran, communicators are stored as 'integer' types; there is
no 'MPI_Comm' type, nor is there any way to access a C 'MPI_Comm'.
Fortunately, this is taken care of for you by the FFTW Fortran
interface: whenever the C interface expects an 'MPI_Comm' type, you
should pass the Fortran communicator as an 'integer'.(1)
* Because you need to call the 'local_size' function to find out how
much space to allocate, and this may be _larger_ than the local
portion of the array (
MPI Data Distribution), you should
_always_ allocate your arrays dynamically using FFTW's allocation
routines as described in
Allocating aligned memory in
Fortran. (Coincidentally, this also provides the best
performance by guaranteeding proper data alignment.)
* Because all sizes in the MPI FFTW interface are declared as
'ptrdiff_t' in C, you should use 'integer(C_INTPTR_T)' in Fortran
(
FFTW Fortran type reference).
* In Fortran, because of the language semantics, we generally
recommend using the new-array execute functions for all plans, even
in the common case where you are executing the plan on the same
arrays for which the plan was created (
Plan execution in
Fortran). However, note that in the MPI interface these
functions are changed: 'fftw_execute_dft' becomes
'fftw_mpi_execute_dft', etcetera.
Using MPI Plans.
For example, here is a Fortran code snippet to perform a distributed
L x M complex DFT in-place. (This assumes you have already initialized
MPI with 'MPI_init' and have also performed 'call fftw_mpi_init'.)
use, intrinsic :: iso_c_binding
include 'fftw3-mpi.f03'
integer(C_INTPTR_T), parameter :: L = ...
integer(C_INTPTR_T), parameter :: M = ...
type(C_PTR) :: plan, cdata
complex(C_DOUBLE_COMPLEX), pointer :: data(:,:)
integer(C_INTPTR_T) :: i, j, alloc_local, local_M, local_j_offset
! get local data size and allocate (note dimension reversal)
alloc_local = fftw_mpi_local_size_2d(M, L, MPI_COMM_WORLD, &
local_M, local_j_offset)
cdata = fftw_alloc_complex(alloc_local)
call c_f_pointer(cdata, data, [L,local_M])
! create MPI plan for in-place forward DFT (note dimension reversal)
plan = fftw_mpi_plan_dft_2d(M, L, data, data, MPI_COMM_WORLD, &
FFTW_FORWARD, FFTW_MEASURE)
! initialize data to some function my_function(i,j)
do j = 1, local_M
do i = 1, L
data(i, j) = my_function(i, j + local_j_offset)
end do
end do
! compute transform (as many times as desired)
call fftw_mpi_execute_dft(plan, data, data)
call fftw_destroy_plan(plan)
call fftw_free(cdata)
Note that when we called 'fftw_mpi_local_size_2d' and
'fftw_mpi_plan_dft_2d' with the dimensions in reversed order, since a L
x M Fortran array is viewed by FFTW in C as a M x L array. This means
that the array was distributed over the 'M' dimension, the local portion
of which is a L x local_M array in Fortran. (You must _not_ use an
'allocate' statement to allocate an L x local_M array, however; you must
allocate 'alloc_local' complex numbers, which may be greater than 'L *
local_M', in order to reserve space for intermediate steps of the
transform.) Finally, we mention that because C's array indices are
zero-based, the 'local_j_offset' argument can conveniently be
interpreted as an offset in the 1-based 'j' index (rather than as a
starting index as in C).
If instead you had used the 'ior(FFTW_MEASURE,
FFTW_MPI_TRANSPOSED_OUT)' flag, the output of the transform would be a
transposed M x local_L array, associated with the _same_ 'cdata'
allocation (since the transform is in-place), and which you could
declare with:
complex(C_DOUBLE_COMPLEX), pointer :: tdata(:,:)
...
call c_f_pointer(cdata, tdata, [M,local_L])
where 'local_L' would have been obtained by changing the
'fftw_mpi_local_size_2d' call to:
alloc_local = fftw_mpi_local_size_2d_transposed(M, L, MPI_COMM_WORLD, &
local_M, local_j_offset, local_L, local_i_offset)
---------- Footnotes ----------
(1) Technically, this is because you aren't actually calling the C
functions directly. You are calling wrapper functions that translate
the communicator with 'MPI_Comm_f2c' before calling the ordinary C
interface. This is all done transparently, however, since the
'fftw3-mpi.f03' interface file renames the wrappers so that they are
called in Fortran with the same names as the C interface functions.