6.13 FFTW MPI Fortran Interface
===============================
The FFTW MPI interface is callable from modern Fortran compilers
supporting the Fortran 2003 'iso_c_binding' standard for calling C
functions. As described in Calling FFTW from Modern Fortran,
this means that you can directly call FFTW's C interface from Fortran
with only minor changes in syntax. There are, however, a few things
specific to the MPI interface to keep in mind:
* Instead of including 'fftw3.f03' as in Overview of Fortran
interface, you should 'include 'fftw3-mpi.f03'' (after 'use,
intrinsic :: iso_c_binding' as before). The 'fftw3-mpi.f03' file
includes 'fftw3.f03', so you should _not_ 'include' them both
yourself. (You will also want to include the MPI header file,
usually via 'include 'mpif.h'' or similar, although this is
not needed by 'fftw3-mpi.f03' per se.) (To use the 'fftwl_' 'long
double' extended-precision routines in supporting compilers, you
should include 'fftw3l-mpi.f03' in _addition_ to 'fftw3-mpi.f03'.
See Extended and quadruple precision in Fortran.)
* Because of the different storage conventions between C and Fortran,
you reverse the order of your array dimensions when passing them to
FFTW (Reversing array dimensions). This is merely a
difference in notation and incurs no performance overhead.
However, it means that, whereas in C the _first_ dimension is
distributed, in Fortran the _last_ dimension of your array is
distributed.
* In Fortran, communicators are stored as 'integer' types; there is
no 'MPI_Comm' type, nor is there any way to access a C 'MPI_Comm'.
Fortunately, this is taken care of for you by the FFTW Fortran
interface: whenever the C interface expects an 'MPI_Comm' type, you
should pass the Fortran communicator as an 'integer'.(1)
* Because you need to call the 'local_size' function to find out how
much space to allocate, and this may be _larger_ than the local
portion of the array (MPI Data Distribution), you should
_always_ allocate your arrays dynamically using FFTW's allocation
routines as described in Allocating aligned memory in
Fortran. (Coincidentally, this also provides the best
performance by guaranteeing proper data alignment.)
* Because all sizes in the MPI FFTW interface are declared as
'ptrdiff_t' in C, you should use 'integer(C_INTPTR_T)' in Fortran
(FFTW Fortran type reference).
* In Fortran, because of the language semantics, we generally
recommend using the new-array execute functions for all plans, even
in the common case where you are executing the plan on the same
arrays for which the plan was created (Plan execution in
Fortran). However, note that in the MPI interface these
functions are changed: 'fftw_execute_dft' becomes
'fftw_mpi_execute_dft', etcetera. See Using MPI Plans.
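Taken together, the points above fit into a program skeleton like the
following. (This consolidated sketch is not from the manual itself; it
assumes an MPI library and an FFTW build with MPI support, and omits
error handling for brevity.)

  ! Sketch of the typical setup/teardown around the MPI FFTW calls.
  program sketch
    use, intrinsic :: iso_c_binding
    implicit none
    include 'mpif.h'          ! MPI constants and interfaces
    include 'fftw3-mpi.f03'   ! includes fftw3.f03 itself; do not include both
    integer :: ierr

    call MPI_Init(ierr)       ! initialize MPI first
    call fftw_mpi_init()      ! then initialize FFTW's MPI interface

    ! ... allocate arrays with fftw_alloc_complex, create and
    !     execute plans as shown in the example below ...

    call fftw_mpi_cleanup()   ! frees FFTW's internal MPI data
    call MPI_Finalize(ierr)
  end program sketch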
For example, here is a Fortran code snippet to perform a distributed
L x M complex DFT in-place. (This assumes you have already initialized
MPI with 'MPI_init' and have also performed 'call fftw_mpi_init'.)
  use, intrinsic :: iso_c_binding
  include 'fftw3-mpi.f03'
  integer(C_INTPTR_T), parameter :: L = ...
  integer(C_INTPTR_T), parameter :: M = ...
  type(C_PTR) :: plan, cdata
  complex(C_DOUBLE_COMPLEX), pointer :: data(:,:)
  integer(C_INTPTR_T) :: i, j, alloc_local, local_M, local_j_offset

  ! get local data size and allocate (note dimension reversal)
  alloc_local = fftw_mpi_local_size_2d(M, L, MPI_COMM_WORLD, &
                                       local_M, local_j_offset)
  cdata = fftw_alloc_complex(alloc_local)
  call c_f_pointer(cdata, data, [L,local_M])

  ! create MPI plan for in-place forward DFT (note dimension reversal)
  plan = fftw_mpi_plan_dft_2d(M, L, data, data, MPI_COMM_WORLD, &
                              FFTW_FORWARD, FFTW_MEASURE)

  ! initialize data to some function my_function(i,j)
  do j = 1, local_M
    do i = 1, L
      data(i, j) = my_function(i, j + local_j_offset)
    end do
  end do

  ! compute transform (as many times as desired)
  call fftw_mpi_execute_dft(plan, data, data)

  call fftw_destroy_plan(plan)
  call fftw_free(cdata)
Note that we called 'fftw_mpi_local_size_2d' and
'fftw_mpi_plan_dft_2d' with the dimensions in reversed order, since an
L x M Fortran array is viewed by FFTW in C as an M x L array. This
means that the array is distributed over the 'M' dimension, the local
portion of which is an L x local_M array in Fortran. (You must _not_ use an
'allocate' statement to allocate an L x local_M array, however; you must
allocate 'alloc_local' complex numbers, which may be greater than 'L *
local_M', in order to reserve space for intermediate steps of the
transform.) Finally, we mention that because C's array indices are
zero-based, the 'local_j_offset' argument can conveniently be
interpreted as an offset in the 1-based 'j' index (rather than as a
starting index as in C).
If instead you had used the 'ior(FFTW_MEASURE,
FFTW_MPI_TRANSPOSED_OUT)' flag, the output of the transform would be a
transposed M x local_L array, associated with the _same_ 'cdata'
allocation (since the transform is in-place), and which you could
declare with:
  complex(C_DOUBLE_COMPLEX), pointer :: tdata(:,:)
  ...
  call c_f_pointer(cdata, tdata, [M,local_L])
where 'local_L' would have been obtained by changing the
'fftw_mpi_local_size_2d' call to:
  alloc_local = fftw_mpi_local_size_2d_transposed(M, L, MPI_COMM_WORLD, &
                       local_M, local_j_offset, local_L, local_i_offset)
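Putting the transposed-output pieces together, the modified
size/plan/execute calls might look like the following sketch (this
consolidated fragment is not in the manual; variable names follow the
example above, and plan destruction and deallocation are omitted):

  ! Sketch: in-place 2d DFT with transposed output (names as above).
  ! Both pointer views refer to the same cdata allocation.
  alloc_local = fftw_mpi_local_size_2d_transposed(M, L, MPI_COMM_WORLD, &
                       local_M, local_j_offset, local_L, local_i_offset)
  cdata = fftw_alloc_complex(alloc_local)
  call c_f_pointer(cdata, data,  [L, local_M])   ! input view:  L x local_M
  call c_f_pointer(cdata, tdata, [M, local_L])   ! output view: M x local_L

  plan = fftw_mpi_plan_dft_2d(M, L, data, tdata, MPI_COMM_WORLD, &
             FFTW_FORWARD, ior(FFTW_MEASURE, FFTW_MPI_TRANSPOSED_OUT))
  call fftw_mpi_execute_dft(plan, data, tdata)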
---------- Footnotes ----------
(1) Technically, this is because you aren't actually calling the C
functions directly. You are calling wrapper functions that translate
the communicator with 'MPI_Comm_f2c' before calling the ordinary C
interface. This is all done transparently, however, since the
'fftw3-mpi.f03' interface file renames the wrappers so that they are
called in Fortran with the same names as the C interface functions.