elisp: Character Sets

 
 32.7 Character Sets
 ===================
 
 An Emacs “character set”, or “charset”, is a set of characters in which
 each character is assigned a numeric code point.  (The Unicode Standard
 calls this a “coded character set”.)  Each Emacs charset has a name
 which is a symbol.  A single character can belong to any number of
 different character sets, but it will generally have a different code
 point in each charset.  Examples of character sets include ‘ascii’,
 ‘iso-8859-1’, ‘greek-iso8859-7’, and ‘windows-1255’.  The code point
 assigned to a character in a charset is usually different from its code
 point used in Emacs buffers and strings.
 
    Emacs defines several special character sets.  The character set
 ‘unicode’ includes all the characters whose Emacs code points are in the
 range ‘0..#x10FFFF’.  The character set ‘emacs’ includes all ASCII and
 non-ASCII characters.  Finally, the ‘eight-bit’ charset includes the
 8-bit raw bytes; Emacs uses it to represent raw bytes encountered in
 text.
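    As a sketch of how the three special charsets differ, the
 hypothetical helper ‘my-special-charsets-of’ below uses ‘encode-char’
 (described later in this section) as a membership test, since it returns
 ‘nil’ when a charset has no code point for a character.  The last call
 assumes that the raw byte #xFF is represented as the Emacs character
 #x3FFFFF, the top of the raw-byte range #x3FFF80..#x3FFFFF.

```elisp
(defun my-special-charsets-of (char)
  "Return those of the charsets unicode, emacs and eight-bit containing CHAR.
A charset contains CHAR when `encode-char' returns a code point for it."
  (let (result)
    (dolist (cs '(unicode emacs eight-bit))
      (when (encode-char char cs)
        (push cs result)))
    (nreverse result)))

(my-special-charsets-of ?a)        ; => (unicode emacs)
(my-special-charsets-of #x110000)  ; => (emacs), beyond Unicode
(my-special-charsets-of #x3FFFFF)  ; => (eight-bit), the raw byte #xFF
```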
 
  -- Function: charsetp object
      Returns ‘t’ if OBJECT is a symbol that names a character set, ‘nil’
      otherwise.
 
  -- Variable: charset-list
      The value is a list of all defined character set names.
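      For example, a minimal sketch relating ‘charsetp’ and
      ‘charset-list’.  The helper ‘my-all-charsets-valid-p’ is
      hypothetical; it only checks that every listed name is accepted by
      ‘charsetp’.

```elisp
(charsetp 'iso-8859-1)       ; => t
(charsetp 'no-such-charset)  ; => nil
(charsetp "iso-8859-1")      ; => nil, a string is not a symbol

(defun my-all-charsets-valid-p ()
  "Return t if every element of `charset-list' satisfies `charsetp'."
  (catch 'invalid
    (dolist (cs charset-list t)
      (unless (charsetp cs)
        (throw 'invalid nil)))))

(my-all-charsets-valid-p)    ; => t
```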
 
  -- Function: charset-priority-list &optional highestp
      This function returns a list of all defined character sets ordered
      by their priority.  If HIGHESTP is non-‘nil’, the function returns
      a single character set of the highest priority.
 
  -- Function: set-charset-priority &rest charsets
      This function gives CHARSETS the highest priority among all
      character sets, in the order given: the first of CHARSETS has the
      highest priority of all.
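      A sketch of reading and changing the priority order.  Because
      ‘set-charset-priority’ changes a global setting, the last three
      forms affect the rest of the session; the result for é (#xE9)
      assumes only that ‘iso-8859-1’, which contains that character, is
      now first in the priority list.

```elisp
;; With HIGHESTP, only the head of the full priority list is returned.
(eq (charset-priority-list t) (car (charset-priority-list)))  ; => t

;; Caution: the following changes the priority order for the session.
(set-charset-priority 'iso-8859-1)
(charset-priority-list t)  ; => iso-8859-1
(char-charset #xE9)        ; => iso-8859-1, for é
```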
 
  -- Function: char-charset character &optional restriction
      This function returns the name of the character set of highest
      priority that CHARACTER belongs to.  ASCII characters are an
      exception: for them, this function always returns ‘ascii’.
 
      If RESTRICTION is non-‘nil’, it should be a list of charsets to
      search.  Alternatively, it can be a coding system, in which case
      the returned charset must be supported by that coding system (see
      Coding Systems).
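      The sketch below, using α (#x3B1) and é (#xE9) as sample
      characters, shows the effect of RESTRICTION.  With a list, the
      first listed charset that contains the character is returned, or
      ‘nil’ if none does.  The coding-system form is shown without a
      result, since the answer depends on which charsets ‘iso-latin-1’
      supports.

```elisp
;; Without RESTRICTION, ASCII characters always report `ascii'.
(char-charset ?A)                              ; => ascii

;; With a list, the first listed charset containing α (#x3B1) wins.
(char-charset #x3B1 '(iso-8859-1 iso-8859-7))  ; => iso-8859-7
(char-charset #x3B1 '(iso-8859-1))             ; => nil

;; With a coding system, the result is a charset that it supports.
(char-charset #xE9 'iso-latin-1)
```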
 
  -- Function: charset-plist charset
      This function returns the property list of the character set
      CHARSET.  Although CHARSET is a symbol, this is not the same as the
      property list of that symbol.  Charset properties include important
      information about the charset, such as its documentation string,
      short name, etc.
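      As a sketch, the hypothetical helper ‘my-charset-summary’ reads two
      entries from a charset’s property list.  The property names
      ‘:short-name’ and ‘:long-name’ are assumed here; a charset that
      lacks one of them yields ‘nil’ for that entry.

```elisp
(defun my-charset-summary (charset)
  "Return a one-line summary of CHARSET built from its property list.
Reads the :short-name and :long-name properties, which may be nil."
  (let ((plist (charset-plist charset)))
    (format "%s (%s)"
            (plist-get plist :short-name)
            (plist-get plist :long-name))))

(my-charset-summary 'iso-8859-1)
```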
 
  -- Function: put-charset-property charset propname value
      This function sets the PROPNAME property of CHARSET to the given
      VALUE.
 
  -- Function: get-charset-property charset propname
      This function returns the value of CHARSET’s property PROPNAME.
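      The round trip below illustrates both functions, and also that a
      charset property is not stored on the symbol’s own property list.
      The property name ‘:my-note’ and the helper
      ‘my-charset-property-demo’ are hypothetical; the sketch saves a
      copy of the charset’s property list first and reinstates it with
      ‘set-charset-plist’, so the property does not persist.

```elisp
(defun my-charset-property-demo ()
  "Set and read the hypothetical property :my-note of `iso-8859-1'.
Return the value seen through `get-charset-property' and through the
symbol's own plist, then restore the charset's original plist."
  (let ((saved (copy-sequence (charset-plist 'iso-8859-1))))
    (unwind-protect
        (progn
          (put-charset-property 'iso-8859-1 :my-note "Western European")
          (list (get-charset-property 'iso-8859-1 :my-note)
                (get 'iso-8859-1 :my-note)))
      ;; Reinstate the copy, so :my-note does not outlive the demo.
      (set-charset-plist 'iso-8859-1 saved))))

(my-charset-property-demo)  ; => ("Western European" nil)
```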
 
  -- Command: list-charset-chars charset
      This command displays a list of characters in the character set
      CHARSET.
 
    Emacs can convert between its internal representation of a character
 and the character’s code point in a specific charset.  The following two
 functions support these conversions.
 
  -- Function: decode-char charset code-point
      This function decodes the code point CODE-POINT of CHARSET into the
      corresponding Emacs character and returns it.  If CHARSET doesn’t
      contain a character with that code point, the value is ‘nil’.  If
      CODE-POINT doesn’t fit in a Lisp integer (see
      ‘most-positive-fixnum’ in Integer Basics), it can be specified as a
      cons cell ‘(HIGH . LOW)’, where LOW is the lower 16 bits of the
      value and HIGH is the upper 16 bits.
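      A few sample calls, assuming the standard charsets ‘iso-8859-1’ and
      ‘iso-8859-7’ (ISO/IEC 8859-7, where α is #xE1).  The last call
      passes the code point #x10000 in the cons form, with HIGH = 1 and
      LOW = 0.

```elisp
(decode-char 'iso-8859-1 #xE9)   ; => 233, the character é
(decode-char 'iso-8859-7 #xE1)   ; => 945, the character α
(decode-char 'ascii #xE9)        ; => nil, outside ASCII's code space
(decode-char 'unicode '(1 . 0))  ; => 65536, that is #x10000
```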
 
  -- Function: encode-char char charset
      This function returns the code point assigned to the character CHAR
      in CHARSET.  If the result does not fit in a Lisp integer, it is
      returned as a cons cell ‘(HIGH . LOW)’ suitable as the CODE-POINT
      argument of ‘decode-char’ above.  If CHARSET doesn’t have a code
      point for CHAR, the value is ‘nil’.
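      The sketch below shows that one character, α (#x3B1), has
      different code points in different charsets, and uses the
      hypothetical helper ‘my-round-trips-p’ to check that ‘decode-char’
      undoes ‘encode-char’.

```elisp
;; One character, α (#x3B1), with different code points per charset.
(encode-char #x3B1 'unicode)     ; => 945, that is #x3B1
(encode-char #x3B1 'iso-8859-7)  ; => 225, that is #xE1
(encode-char #x3B1 'iso-8859-1)  ; => nil, Latin-1 has no α

(defun my-round-trips-p (char charset)
  "Return non-nil if CHAR is encoded in CHARSET and decodes back to CHAR."
  (let ((code (encode-char char charset)))
    (and code (eql (decode-char charset code) char))))

(my-round-trips-p #xE9 'iso-8859-1)   ; => t, é survives the round trip
(my-round-trips-p #x3B1 'iso-8859-1)  ; => nil
```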
 
    The following function comes in handy for applying a certain function
 to all or part of the characters in a charset:
 
  -- Function: map-charset-chars function charset &optional arg from-code
           to-code
      Call FUNCTION for characters in CHARSET.  FUNCTION is called with
      two arguments.  The first one is a cons cell ‘(FROM . TO)’, where
      FROM and TO indicate a range of characters contained in CHARSET.
      The second argument passed to FUNCTION is ARG.
 
      By default, the ranges passed to FUNCTION cover all the characters
      in CHARSET, but the optional arguments FROM-CODE and TO-CODE limit
      them to the characters whose code points in CHARSET lie between
      these two values.  If either of them is ‘nil’, it defaults to the
      first or last code point of CHARSET, respectively.
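      For example, the hypothetical helper ‘my-charset-size’ counts the
      characters in a charset by summing the lengths of the ranges it
      receives.  It passes a one-element list as ARG, so FUNCTION can
      accumulate the count without relying on a closure.

```elisp
(defun my-charset-size (charset &optional from-code to-code)
  "Return the number of characters in CHARSET.
FROM-CODE and TO-CODE, if non-nil, limit the count to that range of
code points, as for `map-charset-chars'."
  (let ((counter (list 0)))
    (map-charset-chars
     (lambda (range acc)
       ;; RANGE is (FROM . TO), an inclusive range of characters;
       ;; ACC is the ARG passed below, a one-element list.
       (setcar acc (+ (car acc) (1+ (- (cdr range) (car range))))))
     charset counter from-code to-code)
    (car counter)))

(my-charset-size 'ascii)                 ; => 128
(my-charset-size 'iso-8859-1 #xC0 #xFF)  ; => 64
```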