fftw3: SIMD alignment and fftw_malloc

 
 3.1 SIMD alignment and fftw_malloc
 ==================================
 
 SIMD, which stands for "Single Instruction Multiple Data," is a set of
 special operations supported by some processors to perform a single
 operation on several numbers (usually 2 or 4) simultaneously.  SIMD
 floating-point instructions are available on several popular CPUs:
 SSE/SSE2/AVX/AVX2/AVX512/KCVI on some x86/x86-64 processors, AltiVec and
 VSX on some POWER/PowerPCs, NEON on some ARM models.  FFTW can be
 compiled to support the SIMD instructions on any of these systems.
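
    As an illustration of the idea (a sketch, not FFTW source code), the
 hypothetical function 'add_arrays' below adds two arrays of doubles
 using x86 SSE2 intrinsics.  Each '_mm_add_pd' is one instruction that
 adds two doubles at once.  It assumes a compiler that defines
 '__SSE2__' on SSE2-capable x86 targets, and it falls back to a plain
 scalar loop elsewhere.  The aligned load '_mm_load_pd' requires
 16-byte-aligned addresses, which is where the alignment requirement
 described in this section comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics: an __m128d register holds 2 doubles */
#endif

/* Hypothetical example: out[i] = a[i] + b[i] for i in [0, n).
   With SSE2, each _mm_add_pd is ONE instruction operating on TWO doubles. */
void add_arrays(double *out, const double *a, const double *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* _mm_load_pd/_mm_store_pd require 16-byte-aligned addresses. */
    assert((uintptr_t)a % 16 == 0 && (uintptr_t)b % 16 == 0 &&
           (uintptr_t)out % 16 == 0);
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_load_pd(a + i);           /* a[i], a[i+1] */
        __m128d vb = _mm_load_pd(b + i);           /* b[i], b[i+1] */
        _mm_store_pd(out + i, _mm_add_pd(va, vb)); /* two sums at once */
    }
#endif
    /* Scalar loop: a leftover odd element, or everything without SSE2. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```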
 
    A program linking to an FFTW library compiled with SIMD support can
 obtain a nonnegligible speedup for most complex and r2c/c2r transforms.
 In order to obtain this speedup, however, the arrays of complex (or
 real) data passed to FFTW must be specially aligned in memory (typically
 16-byte aligned), and often this alignment is more stringent than that
 provided by the usual 'malloc' (etc.) allocation routines.
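
    A minimal sketch of testing whether a pointer meets such a
 requirement follows.  'is_aligned' is a hypothetical helper, not part
 of FFTW's API, and the 16-byte figure is only the typical value
 mentioned above.  Converting a pointer to 'uintptr_t' and masking its
 low bits is implementation-defined in C, but it works on common
 platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p lies on an `alignment`-byte boundary.
   `alignment` must be a power of two (e.g. 16 for SSE-class SIMD). */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power of two, (alignment - 1) masks the low address bits. */
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

    For example, 'is_aligned(malloc(64), 16)' may well be true on a
 typical 64-bit system.  That is not guaranteed for the wider alignments
 some SIMD extensions need.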
 
    Therefore, to guarantee proper alignment for SIMD in case your
 program is ever linked against a SIMD-using FFTW, we recommend
 allocating your transform data with 'fftw_malloc' and de-allocating it
 with 'fftw_free'.  These have exactly the same interface and behavior as
 'malloc'/'free', except that for a SIMD FFTW they ensure that the
 returned pointer has the necessary alignment (by calling 'memalign' or
 its equivalent on your OS).
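
    The exact mechanism 'fftw_malloc' uses varies by platform.  The
 following hypothetical sketch shows the same idea using the standard
 C11 function 'aligned_alloc'.  The names 'simd_malloc', 'simd_free',
 and the 32-byte 'SIMD_ALIGNMENT' are illustrative assumptions, not
 FFTW internals.  In real code, simply call 'fftw_malloc' and
 'fftw_free'.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 32 bytes covers 16-byte (SSE-class) and 32-byte
   (AVX-class) vector loads.  Must be a power of two. */
#define SIMD_ALIGNMENT 32
static_assert((SIMD_ALIGNMENT & (SIMD_ALIGNMENT - 1)) == 0,
              "SIMD_ALIGNMENT must be a power of two");

/* Hypothetical malloc-like allocator in the spirit of fftw_malloc:
   returns SIMD_ALIGNMENT-aligned memory, or NULL on failure. */
void *simd_malloc(size_t n)
{
    if (n > SIZE_MAX - (SIMD_ALIGNMENT - 1))
        return NULL; /* rounding up would overflow */
    /* C11 originally required the size passed to aligned_alloc to be a
       multiple of the alignment, so round n up. */
    size_t rounded = (n + SIMD_ALIGNMENT - 1) / SIMD_ALIGNMENT * SIMD_ALIGNMENT;
    if (rounded == 0)
        rounded = SIMD_ALIGNMENT; /* n == 0: still return a usable pointer */
    return aligned_alloc(SIMD_ALIGNMENT, rounded);
}

/* Memory from aligned_alloc is released with the ordinary free(). */
void simd_free(void *p)
{
    free(p);
}
```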
 
    You are not _required_ to use 'fftw_malloc'.  You can allocate your
 data in any way that you like, from 'malloc' to 'new' (in C++) to a
 fixed-size array declaration.  If the array happens not to be properly
 aligned, FFTW will not use the SIMD extensions.
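
    The following hypothetical sketch illustrates this "use SIMD only if
 aligned" behavior with a function that scales an array in place.  It
 takes an SSE2 path when the data is 16-byte aligned and a scalar path
 otherwise.  FFTW's real mechanism is more involved; 'scale_array' is
 an illustrative name, and the code assumes 'sizeof(double) == 8'.  A
 common way to lose alignment is passing an offset into an aligned
 array, such as 'buf + 1'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical example: multiply x[0..n-1] by s in place.
   Returns 1 if the SIMD path was used, 0 if the scalar fallback was
   used (x not 16-byte aligned, or SSE2 unavailable). */
int scale_array(double *x, size_t n, double s)
{
    assert(x != NULL || n == 0);
    size_t i = 0;
    int used_simd = 0;
#if defined(__SSE2__)
    if ((uintptr_t)x % 16 == 0) {    /* aligned: _mm_load_pd is safe */
        __m128d vs = _mm_set1_pd(s); /* s copied into both lanes */
        for (; i + 2 <= n; i += 2)
            _mm_store_pd(x + i, _mm_mul_pd(_mm_load_pd(x + i), vs));
        used_simd = 1;
    }
#endif
    /* Scalar path: the leftover element, or the whole array if unaligned. */
    for (; i < n; ++i)
        x[i] *= s;
    return used_simd;
}
```

    Either path produces the same numbers; only the speed differs.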
 
    Since 'fftw_malloc' only ever needs to be used for real and complex
 arrays, we provide two convenient wrapper routines 'fftw_alloc_real(N)'
 and 'fftw_alloc_complex(N)' that are equivalent to
 '(double*)fftw_malloc(sizeof(double) * N)' and
 '(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)', respectively (or
 their equivalents in other precisions).
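
    To make the equivalence concrete, here is a hypothetical sketch of
 analogous wrappers built on a stand-in aligned allocator.  The names
 'my_malloc', 'my_free', 'my_alloc_real', 'my_alloc_complex',
 'my_complex', and the 32-byte alignment are all illustrative
 assumptions.  'my_complex' mirrors FFTW's default 'fftw_complex'
 layout, two consecutive doubles holding the real and imaginary parts.
 Unlike the plain expressions above, this sketch also rejects element
 counts whose byte size would overflow 'size_t'.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same layout as FFTW's default fftw_complex: {real, imaginary}. */
typedef double my_complex[2];
static_assert(sizeof(my_complex) == 2 * sizeof(double),
              "two contiguous doubles");

#define MY_ALIGNMENT 32 /* assumption, as for a SIMD build */

/* Hypothetical stand-in for fftw_malloc: aligned, NULL on failure. */
static void *my_malloc(size_t bytes)
{
    if (bytes > SIZE_MAX - (MY_ALIGNMENT - 1))
        return NULL;
    size_t rounded = (bytes + MY_ALIGNMENT - 1) / MY_ALIGNMENT * MY_ALIGNMENT;
    return aligned_alloc(MY_ALIGNMENT, rounded ? rounded : MY_ALIGNMENT);
}

/* Hypothetical stand-in for fftw_free. */
void my_free(void *p)
{
    free(p);
}

/* Analogue of fftw_alloc_real(N): room for n doubles. */
double *my_alloc_real(size_t n)
{
    if (n > SIZE_MAX / sizeof(double))
        return NULL; /* byte count would overflow */
    return (double *)my_malloc(sizeof(double) * n);
}

/* Analogue of fftw_alloc_complex(N): room for n complex numbers. */
my_complex *my_alloc_complex(size_t n)
{
    if (n > SIZE_MAX / sizeof(my_complex))
        return NULL; /* byte count would overflow */
    return (my_complex *)my_malloc(sizeof(my_complex) * n);
}
```

    Element 'k' of a complex array 'c' is then accessed as 'c[k][0]'
 (real part) and 'c[k][1]' (imaginary part).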