elisp: Garbage Collection

 
 E.3 Garbage Collection
 ======================
 
 When a program creates a list or the user defines a new function (such
 as by loading a library), that data is placed in normal storage.  If
 normal storage runs low, then Emacs asks the operating system to
 allocate more memory.  Different types of Lisp objects, such as symbols,
 cons cells, small vectors, markers, etc., are segregated in distinct
 blocks in memory.  (Large vectors, long strings, buffers and certain
 other editing types, which are fairly large, are allocated in individual
 blocks, one per object; small strings are packed into blocks of 8k
 bytes, and small vectors are packed into blocks of 4k bytes.)
 
    Beyond the basic vector, many objects such as windows, buffers, and
 frames are managed as if they were vectors.  The corresponding C data
 structures include the ‘struct vectorlike_header’ field, whose ‘size’
 member contains the subtype enumerated by ‘enum pvec_type’ along with
 information about how many ‘Lisp_Object’ fields the structure contains
 and the size of the remaining data.  This information is needed to
 calculate the memory footprint of an object, and is used by the vector
 allocation code while iterating over the vector blocks.
 
    It is quite common to use some storage for a while, then release it
 by (for example) killing a buffer or deleting the last pointer to an
 object.  Emacs provides a “garbage collector” to reclaim this abandoned
 storage.  The garbage collector operates by finding and marking all Lisp
 objects that are still accessible to Lisp programs.  To begin with, it
 assumes all the symbols, their values and associated function
 definitions, and any data presently on the stack, are accessible.  Any
 objects that can be reached indirectly through other accessible objects
 are also accessible.
 
    When marking is finished, all objects still unmarked are garbage.  No
 matter what the Lisp program or the user does, it is impossible to refer
 to them, since there is no longer a way to reach them.  Their space
 might as well be reused, since no one will miss them.  The second
 (sweep) phase of the garbage collector arranges to reuse them.
 
    The sweep phase puts unused cons cells onto a “free list” for future
 allocation; likewise for symbols and markers.  It compacts the
 accessible strings so they occupy fewer 8k blocks; then it frees the
 other 8k blocks.  Unreachable vectors in vector blocks are coalesced to
 create the largest possible free areas; if a free area spans a complete
 4k block, that block is freed.  Otherwise, the free area is recorded in
 a free list array, where each entry corresponds to a free list of areas
 of the same size.  Large vectors, buffers, and other large objects are
 allocated and freed individually.
 
      Common Lisp note: Unlike other Lisps, GNU Emacs Lisp does not call
      the garbage collector when the free list is empty.  Instead, it
      simply asks the operating system to allocate more storage, and
      processing continues until ‘gc-cons-threshold’ bytes have been
      used.
 
      This means that you can make sure that the garbage collector will
      not run during a certain portion of a Lisp program by calling the
      garbage collector explicitly just before it (provided that portion
      of the program does not use so much space as to force a second
      garbage collection).
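
      As a minimal sketch of this technique, the hypothetical helper
      ‘my-call-after-gc’ below (an illustrative name, not part of Emacs)
      collects garbage and then calls a function, so that the function
      starts with the full ‘gc-cons-threshold’ allowance:

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-call-after-gc (fn)
  "Run a garbage collection, then call FN with no arguments.
Return the value that FN returns."
  ;; Collect now, so the allocation counter starts from zero
  ;; when FN begins to run.
  (garbage-collect)
  (funcall fn))
```

      This only helps if FN allocates less than the threshold; a
      collection can still occur inside FN otherwise.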
 
  -- Command: garbage-collect
      This command runs a garbage collection, and returns information on
      the amount of space in use.  (Garbage collection can also occur
      spontaneously if you use more than ‘gc-cons-threshold’ bytes of
      Lisp data since the previous garbage collection.)
 
      ‘garbage-collect’ returns a list with information on the amount of
      space in use, where each entry has the form ‘(NAME SIZE USED)’ or
      ‘(NAME SIZE USED FREE)’.  In each entry, NAME is a symbol describing
      the kind of objects this entry represents, SIZE is the number of
      bytes used by each one, USED is the number of those objects that
      were found live in the heap, and the optional FREE is the number of
      those objects that are not live but that Emacs keeps around for
      future allocations.  So the overall result is:
 
           ((conses CONS-SIZE USED-CONSES FREE-CONSES)
            (symbols SYMBOL-SIZE USED-SYMBOLS FREE-SYMBOLS)
            (miscs MISC-SIZE USED-MISCS FREE-MISCS)
            (strings STRING-SIZE USED-STRINGS FREE-STRINGS)
            (string-bytes BYTE-SIZE USED-BYTES)
            (vectors VECTOR-SIZE USED-VECTORS)
            (vector-slots SLOT-SIZE USED-SLOTS FREE-SLOTS)
            (floats FLOAT-SIZE USED-FLOATS FREE-FLOATS)
            (intervals INTERVAL-SIZE USED-INTERVALS FREE-INTERVALS)
            (buffers BUFFER-SIZE USED-BUFFERS)
            (heap UNIT-SIZE TOTAL-SIZE FREE-SIZE))
 
      Here is an example:
 
           (garbage-collect)
                 ⇒ ((conses 16 49126 8058) (symbols 48 14607 0)
                            (miscs 40 34 56) (strings 32 2942 2607)
                            (string-bytes 1 78607) (vectors 16 7247)
                            (vector-slots 8 341609 29474) (floats 8 71 102)
                            (intervals 56 27 26) (buffers 944 8)
                            (heap 1024 11715 2678))
 
      Below is a table explaining each element.  Note that the last ‘heap’
      entry is optional and is present only if the underlying ‘malloc’
      implementation provides the ‘mallinfo’ function.
 
      CONS-SIZE
           Internal size of a cons cell, i.e., ‘sizeof (struct
           Lisp_Cons)’.
 
      USED-CONSES
           The number of cons cells in use.
 
      FREE-CONSES
           The number of cons cells for which space has been obtained
           from the operating system, but that are not currently being
           used.
 
      SYMBOL-SIZE
           Internal size of a symbol, i.e., ‘sizeof (struct
           Lisp_Symbol)’.
 
      USED-SYMBOLS
           The number of symbols in use.
 
      FREE-SYMBOLS
           The number of symbols for which space has been obtained from
           the operating system, but that are not currently being used.
 
      MISC-SIZE
           Internal size of a miscellaneous entity, i.e., ‘sizeof (union
           Lisp_Misc)’, which is the size of the largest type enumerated
           in ‘enum Lisp_Misc_Type’.
 
      USED-MISCS
           The number of miscellaneous objects in use.  These include
           markers and overlays, plus certain objects not visible to
           users.
 
      FREE-MISCS
           The number of miscellaneous objects for which space has been
           obtained from the operating system, but that are not currently
           being used.
 
      STRING-SIZE
           Internal size of a string header, i.e., ‘sizeof (struct
           Lisp_String)’.
 
      USED-STRINGS
           The number of string headers in use.
 
      FREE-STRINGS
           The number of string headers for which space has been obtained
           from the operating system, but that are not currently being
           used.
 
      BYTE-SIZE
           This is used for convenience and is equal to ‘sizeof (char)’.
 
      USED-BYTES
           The total size of all string data in bytes.
 
      VECTOR-SIZE
           Internal size of a vector header, i.e., ‘sizeof (struct
           Lisp_Vector)’.
 
      USED-VECTORS
           The number of vector headers allocated from the vector blocks.
 
      SLOT-SIZE
           Internal size of a vector slot, always equal to ‘sizeof
           (Lisp_Object)’.
 
      USED-SLOTS
           The number of slots in all used vectors.
 
      FREE-SLOTS
           The number of free slots in all vector blocks.
 
      FLOAT-SIZE
           Internal size of a float object, i.e., ‘sizeof (struct
           Lisp_Float)’.  (Do not confuse it with the native platform
           ‘float’ or ‘double’.)
 
      USED-FLOATS
           The number of floats in use.
 
      FREE-FLOATS
           The number of floats for which space has been obtained from
           the operating system, but that are not currently being used.
 
      INTERVAL-SIZE
           Internal size of an interval object, i.e., ‘sizeof (struct
           interval)’.
 
      USED-INTERVALS
           The number of intervals in use.
 
      FREE-INTERVALS
           The number of intervals for which space has been obtained from
           the operating system, but that are not currently being used.
 
      BUFFER-SIZE
           Internal size of a buffer, i.e., ‘sizeof (struct buffer)’.
           (Do not confuse it with the value returned by the
           ‘buffer-size’ function.)
 
      USED-BUFFERS
           The number of buffer objects in use.  This includes killed
           buffers invisible to users, i.e., all buffers in ‘all_buffers’
           list.
 
      UNIT-SIZE
           The unit of heap space measurement, always equal to 1024
           bytes.
 
      TOTAL-SIZE
           Total heap size, in UNIT-SIZE units.
 
      FREE-SIZE
           Heap space which is not currently used, in UNIT-SIZE units.
 
      If there was overflow in pure space (see Pure Storage),
      ‘garbage-collect’ returns ‘nil’, because a real garbage collection
      cannot be done.
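
      Since each entry carries a SIZE and a USED count, multiplying them
      gives the bytes occupied by live objects of that kind.  The sketch
      below assumes only the entry format described above;
      ‘my-gc-live-bytes’ is a hypothetical name, and the ‘heap’ entry is
      skipped because its fields are measured in different units:

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-gc-live-bytes (gc-info)
  "Return an alist of (NAME . BYTES) for the entries in GC-INFO.
GC-INFO is a list in the format returned by `garbage-collect'.
BYTES is SIZE times USED.  The `heap' entry is skipped.
If GC-INFO is nil, return nil."
  (let (result)
    (dolist (entry gc-info (nreverse result))
      (let ((name (nth 0 entry))
            (size (nth 1 entry))
            (used (nth 2 entry)))
        (unless (eq name 'heap)
          (push (cons name (* size used)) result))))))
```

      With the example result shown earlier, the ‘conses’ entry yields
      16 * 49126 = 786016 bytes.  A typical call would be
      ‘(my-gc-live-bytes (garbage-collect))’.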
 
  -- User Option: garbage-collection-messages
      If this variable is non-‘nil’, Emacs displays a message at the
      beginning and end of garbage collection.  The default value is
      ‘nil’.
 
  -- Variable: post-gc-hook
      This is a normal hook that is run at the end of garbage collection.
      Garbage collection is inhibited while the hook functions run, so be
      careful writing them.
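
      Because garbage collection is inhibited while the hook runs, a hook
      function should do very little and allocate as little as possible.
      A minimal sketch, in which ‘my-gc-count’ and ‘my-count-gc’ are
      hypothetical names used only for illustration:

```elisp
;; Hypothetical names for illustration; not part of Emacs.
(defvar my-gc-count 0
  "Number of garbage collections seen since `my-count-gc' was installed.")

(defun my-count-gc ()
  "Increment `my-gc-count'.  Intended for use in `post-gc-hook'."
  ;; Incrementing a small integer keeps the hook cheap.
  (setq my-gc-count (1+ my-gc-count)))

(add-hook 'post-gc-hook #'my-count-gc)
```

      The function can be removed again with ‘(remove-hook 'post-gc-hook
      #'my-count-gc)’.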
 
  -- User Option: gc-cons-threshold
      The value of this variable is the number of bytes of storage that
      must be allocated for Lisp objects after one garbage collection in
      order to trigger another garbage collection.  You can use the
      result returned by ‘garbage-collect’ to get information about the
      size of a particular object type; space allocated to the contents
      of buffers does not count.  Note that the subsequent garbage
      collection does not happen immediately when the threshold is
      exhausted, but only the next time the Lisp interpreter is called.
 
      The initial threshold value is ‘GC_DEFAULT_THRESHOLD’, defined in
      ‘alloc.c’.  Since it’s defined in ‘word_size’ units, the value is
      400,000 for the default 32-bit configuration and 800,000 for the
      64-bit one.  If you specify a larger value, garbage collection will
      happen less often.  This reduces the amount of time spent garbage
      collecting, but increases total memory use.  You may want to do
      this when running a program that creates lots of Lisp data.
 
      You can make collections more frequent by specifying a smaller
      value, down to 1/10th of ‘GC_DEFAULT_THRESHOLD’.  A value less than
      this minimum will remain in effect only until the subsequent
      garbage collection, at which time ‘garbage-collect’ will set the
      threshold back to the minimum.
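
      One way to apply a larger value only while a particular piece of
      code runs is to bind the variable with ‘let’.  The following is a
      sketch; the helper name ‘my-call-with-high-gc-threshold’ and the
      64 MiB figure are arbitrary choices for illustration, not
      recommendations:

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-call-with-high-gc-threshold (fn)
  "Call FN with no arguments while `gc-cons-threshold' is raised.
The previous value is restored when FN returns or exits nonlocally."
  ;; 64 MiB is an arbitrary illustrative value.
  (let ((gc-cons-threshold (* 64 1024 1024)))
    (funcall fn)))
```

      Binding the variable rather than setting it means the old value
      comes back automatically, even if FN signals an error.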
 
  -- User Option: gc-cons-percentage
      The value of this variable specifies the amount of consing before a
      garbage collection occurs, as a fraction of the current heap size.
      This criterion and ‘gc-cons-threshold’ apply in parallel, and
      garbage collection occurs only when both criteria are satisfied.
 
      As the heap size increases, the time to perform a garbage
      collection increases.  Thus, it can be desirable to do them less
      frequently in proportion.
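
      Since both criteria must be satisfied, the amount of consing needed
      to trigger a collection is, roughly, the larger of the two limits.
      The sketch below is a simplified model of that rule rather than the
      code Emacs actually uses; ‘my-effective-gc-threshold’ and its
      arguments are hypothetical:

```elisp
;; Simplified model for illustration; not the logic in alloc.c.
(defun my-effective-gc-threshold (threshold percentage heap-bytes)
  "Return the approximate bytes of consing needed to trigger a collection.
THRESHOLD plays the role of `gc-cons-threshold', PERCENTAGE that of
`gc-cons-percentage', and HEAP-BYTES is the current heap size in bytes."
  ;; Both criteria must hold, so the larger limit decides.
  (max threshold (truncate (* percentage heap-bytes))))
```

      For instance, with a threshold of 800000, a fraction of 0.25 and a
      heap of 40000000 bytes, the model gives 10000000; with a heap of
      1000000 bytes the fixed threshold of 800000 dominates.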
 
    The value returned by ‘garbage-collect’ describes the amount of
 memory used by Lisp data, broken down by data type.  By contrast, the
 function ‘memory-limit’ provides information on the total amount of
 memory Emacs is currently using.
 
  -- Function: memory-limit
      This function returns the address of the last byte Emacs has
      allocated, divided by 1024.  We divide the value by 1024 to make
      sure it fits in a Lisp integer.
 
      You can use this to get a general idea of how your actions affect
      the memory usage.
 
  -- Variable: memory-full
      This variable is ‘t’ if Emacs is nearly out of memory for Lisp
      objects, and ‘nil’ otherwise.
 
  -- Function: memory-use-counts
      This returns a list of numbers that count the number of objects
      created in this Emacs session.  Each of these counters increments
      for a certain kind of object.  See the documentation string for
      details.
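
      Taking the difference of two calls gives an estimate of what a
      piece of code allocated.  The sketch below assumes that the first
      element of the list counts cons cells; check the documentation
      string in your Emacs version before relying on this.
      ‘my-conses-created’ is a hypothetical name:

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
;; Assumes the first element of `memory-use-counts' counts cons cells.
(defun my-conses-created (fn)
  "Call FN with no arguments; return the number of conses created meanwhile.
The result also includes conses created by the measurement itself."
  (let ((before (car (memory-use-counts))))
    (funcall fn)
    (- (car (memory-use-counts)) before)))
```

      The count is slightly inflated because ‘memory-use-counts’ itself
      builds a list, so treat the result as an approximation.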
 
  -- Function: memory-info
      This function returns the amount of total system memory and how
      much of it is free.  On an unsupported system, the value may be
      ‘nil’.
 
  -- Variable: gcs-done
      This variable contains the total number of garbage collections done
      so far in this Emacs session.
 
  -- Variable: gc-elapsed
      This variable contains the total number of seconds of elapsed time
      during garbage collection so far in this Emacs session, as a
      floating-point number.
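
    Reading ‘gcs-done’ and ‘gc-elapsed’ before and after a computation
 shows how much garbage collection it caused.  The following is a
 minimal sketch; ‘my-measure-gc’ is a hypothetical helper, not part of
 Emacs:

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-measure-gc (fn)
  "Call FN with no arguments and return (RESULT GCS SECONDS).
RESULT is the value of FN, GCS is the number of garbage collections
that ran during the call, and SECONDS is the time spent in them."
  (let* ((gcs-before gcs-done)
         (elapsed-before gc-elapsed)
         (result (funcall fn)))
    (list result
          (- gcs-done gcs-before)
          (- gc-elapsed elapsed-before))))
```

    For example, ‘(my-measure-gc (lambda () (make-list 1000000 nil)))’
 returns the new list along with the collection count and time it cost.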