fftw3: Real even/odd DFTs (cosine/sine transforms)

 
 2.5.2 Real even/odd DFTs (cosine/sine transforms)
 -------------------------------------------------
 
 The Fourier transform of a real-even function f(-x) = f(x) is real-even,
 and i times the Fourier transform of a real-odd function f(-x) = -f(x)
 is real-odd.  Similar results hold for a discrete Fourier transform, and
 thus for these symmetries the need for complex inputs/outputs is
 entirely eliminated.  Moreover, one gains a factor of two in speed/space
 from the fact that the data are real, and an additional factor of two
 from the even/odd symmetry: only the non-redundant (first) half of the
 array need be stored.  The result is the real-even DFT ("REDFT") and the
 real-odd DFT ("RODFT"), also known as the discrete cosine and sine
 transforms ("DCT" and "DST"), respectively.
 
    (In this section, we describe the 1d transforms; multi-dimensional
 transforms are just a separable product of these transforms operating
 along each dimension.)
 
    Because of the discrete sampling, one has an additional choice: is
 the data even/odd around a sampling point, or around the point halfway
 between two samples?  The latter corresponds to _shifting_ the samples
 by _half_ an interval, and gives rise to several transform variants
 denoted by REDFTab and RODFTab: a and b are 0 or 1, and indicate whether
 the input (a) and/or output (b) are shifted by half a sample (1 means it
 is shifted).  These are also known as types I-IV of the DCT and DST, and
 all four types are supported by FFTW's r2r interface.(1)
 
    The r2r kinds for the various REDFT and RODFT types supported by
 FFTW, along with the boundary conditions at both ends of the _input_
 array ('n' real numbers 'in[j=0..n-1]'), are:
 
    * 'FFTW_REDFT00' (DCT-I): even around j=0 and even around j=n-1.
 
    * 'FFTW_REDFT10' (DCT-II, "the" DCT): even around j=-0.5 and even
      around j=n-0.5.
 
    * 'FFTW_REDFT01' (DCT-III, "the" IDCT): even around j=0 and odd
      around j=n.
 
    * 'FFTW_REDFT11' (DCT-IV): even around j=-0.5 and odd around j=n-0.5.
 
    * 'FFTW_RODFT00' (DST-I): odd around j=-1 and odd around j=n.
 
    * 'FFTW_RODFT10' (DST-II): odd around j=-0.5 and odd around j=n-0.5.
 
    * 'FFTW_RODFT01' (DST-III): odd around j=-1 and even around j=n-1.
 
    * 'FFTW_RODFT11' (DST-IV): odd around j=-0.5 and even around j=n-0.5.
 
    Note that these symmetries apply to the "logical" array being
 transformed; *there are no constraints on your physical input data*.
 So, for example, if you specify a size-5 REDFT00 (DCT-I) of the data
 abcde, it corresponds to the DFT of the logical even array abcdedcb of
 size 8.  A size-4 REDFT10 (DCT-II) of the data abcd corresponds to the
 size-8 logical DFT of the even array abcddcba, shifted by half a sample.
 
    All of these transforms are invertible.  The inverse of R*DFT00 is
 R*DFT00; of R*DFT10 is R*DFT01 and vice versa (these are often called
 simply "the" DCT and IDCT, respectively); and of R*DFT11 is R*DFT11.
 However, the transforms computed by FFTW are unnormalized, exactly like
 the corresponding real and complex DFTs, so computing a transform
 followed by its inverse yields the original array scaled by N, where N
 is the _logical_ DFT size.  For REDFT00, N=2(n-1); for RODFT00,
 N=2(n+1); otherwise, N=2n.
 
    Note that the boundary conditions of the transform output array are
 given by the input boundary conditions of the inverse transform.  Thus,
 the above transforms are all inequivalent in terms of input/output
 boundary conditions, even neglecting the 0.5 shift difference.
 
    FFTW is most efficient when N is a product of small factors; note
 that this _differs_ from the factorization of the physical size 'n' for
 REDFT00 and RODFT00!  There is another oddity: 'n=1' REDFT00 transforms
 correspond to N=0, and so are _not defined_ (the planner will return
 'NULL').  Otherwise, any positive 'n' is supported.
 
    For the precise mathematical definitions of these transforms as used
 by FFTW, see SeeWhat FFTW Really Computes.  (For people accustomed
 to the DCT/DST, FFTW's definitions have a coefficient of 2 in front of
 the cos/sin functions so that they correspond precisely to an even/odd
 DFT of size N. Some authors also include additional multiplicative
 factors of sqrt(2) for selected inputs and outputs; this makes the
 transform orthogonal, but sacrifices the direct equivalence to a
 symmetric DFT.)
 
 Which type do you need?
 .......................
 
 Since the required flavor of even/odd DFT depends upon your problem, you
 are the best judge of this choice, but we can make a few comments on
 relative efficiency to help you in your selection.  In particular,
 R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11 (especially
 for odd sizes), while the R*DFT00 transforms are sometimes significantly
 slower (especially for even sizes).(2)
 
    Thus, if only the boundary conditions on the transform inputs are
 specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over
 R*DFT11 (unless the half-sample shift or the self-inverse property is
 significant for your problem).
 
    If performance is important to you and you are using only small sizes
 (say n<200), e.g.  for multi-dimensional transforms, then you might
 consider generating hard-coded transforms of those sizes and types that
 you are interested in (SeeGenerating your own code).
 
    We are interested in hearing what types of symmetric transforms you
 find most useful.
 
    ---------- Footnotes ----------
 
    (1) There are also type V-VIII transforms, which correspond to a
 logical DFT of _odd_ size N, independent of whether the physical size
 'n' is odd, but we do not support these variants.
 
    (2) R*DFT00 is sometimes slower in FFTW because we discovered that
 the standard algorithm for computing this by a pre/post-processed real
 DFT--the algorithm used in FFTPACK, Numerical Recipes, and other sources
 for decades now--has serious numerical problems: it already loses
 several decimal places of accuracy for 16k sizes.  There seem to be only
 two alternatives in the literature that do not suffer similarly: a
 recursive decomposition into smaller DCTs, which would require a large
 set of codelets for efficiency and generality, or sacrificing a factor
 of 2 in speed to use a real DFT of twice the size.  We currently employ
 the latter technique for general n, as well as a limited form of the
 former method: a split-radix decomposition when n is odd (N a multiple
 of 4).  For N containing many factors of 2, the split-radix method seems
 to recover most of the speed of the standard algorithm without the
 accuracy tradeoff.