 32.5 Character Codes
 ====================
 
 The unibyte and multibyte text representations use different character
 codes.  The valid character codes for unibyte representation range from
 0 to ‘#xFF’ (255), the values that can fit in one byte.  The valid
 character codes for multibyte representation range from 0 to ‘#x3FFFFF’
 (4194303).  In this code space, values 0 through ‘#x7F’ (127) are for
 ASCII characters, and values ‘#x80’ (128) through ‘#x3FFF7F’ (4194175)
 are for non-ASCII characters.
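    The two code ranges can be restated as simple predicates.  This is a
 minimal sketch: ‘my-unibyte-code-p’, ‘my-multibyte-code-p’ and
 ‘my-ascii-code-p’ are hypothetical helper names used only for
 illustration; they are not part of Emacs.

```elisp
;; Hypothetical helpers (not Emacs functions) restating the ranges above.
(defun my-unibyte-code-p (code)
  "Return non-nil if CODE is a valid unibyte character code (0..#xFF)."
  (and (integerp code) (<= 0 code #xFF)))

(defun my-multibyte-code-p (code)
  "Return non-nil if CODE is a valid multibyte character code (0..#x3FFFFF)."
  (and (integerp code) (<= 0 code #x3FFFFF)))

(defun my-ascii-code-p (code)
  "Return non-nil if CODE lies in the ASCII part of the code space (0..#x7F)."
  (and (integerp code) (<= 0 code #x7F)))

;; Usage:
(my-unibyte-code-p 255)           ; ⇒ t
(my-unibyte-code-p 256)           ; ⇒ nil
(my-multibyte-code-p #x3FFFFF)    ; ⇒ t
(my-ascii-code-p 128)             ; ⇒ nil
```

 Because non-integers are rejected as well, ‘my-multibyte-code-p’ should
 give the same answer as the built-in ‘characterp’ for any argument.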
 
    Emacs character codes are a superset of the Unicode standard.  Values
 0 through ‘#x10FFFF’ (1114111) correspond to Unicode characters of the
 same codepoint; values ‘#x110000’ (1114112) through ‘#x3FFF7F’ (4194175)
 represent characters that are not unified with Unicode; and values
 ‘#x3FFF80’ (4194176) through ‘#x3FFFFF’ (4194303) represent eight-bit
 raw bytes.
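    The three subranges of the multibyte code space can be told apart
 with a few comparisons.  The sketch below assumes nothing beyond the
 boundaries stated above; ‘my-char-code-class’ and ‘my-raw-byte-value’
 are hypothetical names, not Emacs functions.  The second helper relies
 on the fact that ‘#x3FFF80’ through ‘#x3FFFFF’ map linearly onto the
 byte values ‘#x80’ through ‘#xFF’.

```elisp
;; Hypothetical helpers (not Emacs functions) that classify a code
;; according to the ranges of the multibyte code space.
(defun my-char-code-class (code)
  "Return a symbol naming the range that CODE falls in.
The result is `unicode' for #x0..#x10FFFF, `non-unicode' for
#x110000..#x3FFF7F, `eight-bit' for #x3FFF80..#x3FFFFF, and
`invalid' for anything that is not a valid character."
  (cond ((not (characterp code)) 'invalid)
        ((<= code #x10FFFF) 'unicode)
        ((<= code #x3FFF7F) 'non-unicode)
        (t 'eight-bit)))

(defun my-raw-byte-value (code)
  "Return the byte (#x80..#xFF) represented by eight-bit raw-byte CODE.
Return nil if CODE is not an eight-bit raw byte."
  (when (eq (my-char-code-class code) 'eight-bit)
    ;; #x3FFF80 - #x3FFF00 = #x80, ..., #x3FFFFF - #x3FFF00 = #xFF.
    (- code #x3FFF00)))

;; Usage:
(my-char-code-class #x110000)   ; ⇒ non-unicode
(my-raw-byte-value #x3FFFFF)    ; ⇒ 255
```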
 
  -- Function: characterp charcode
      This returns ‘t’ if CHARCODE is a valid character, and ‘nil’
      otherwise.
 
           (characterp 65)
                ⇒ t
           (characterp 4194303)
                ⇒ t
           (characterp 4194304)
                ⇒ nil
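      A few more calls show that ‘characterp’ accepts only integers within
      the valid range.  Here ‘?A’ is just read syntax for the integer 65,
      and ‘#x3FFF80’ is the eight-bit raw byte that stands for ‘#x80’.

```elisp
(characterp ?A)         ; ?A reads as the integer 65     ⇒ t
(characterp #x3FFF80)   ; an eight-bit raw byte          ⇒ t
(characterp -1)         ; negative integers are invalid  ⇒ nil
(characterp "A")        ; a string is not a character    ⇒ nil
```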
 
  -- Function: max-char
      This function returns the largest value that a valid character
      codepoint can have.
 
           (characterp (max-char))
                ⇒ t
           (characterp (1+ (max-char)))
                ⇒ nil
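      Called with no arguments, ‘max-char’ returns the top of the
      multibyte code space, that is, the last eight-bit raw byte
      described earlier in this section.

```elisp
(max-char)                ; ⇒ 4194303
(= (max-char) #x3FFFFF)   ; ⇒ t
```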
 
  -- Function: get-byte &optional pos string
      This function returns the byte at character position POS in the
      current buffer; POS defaults to point.  If the current buffer is
      unibyte, this is literally the byte at that position.  If the
      buffer is multibyte, byte values of ASCII characters are the same
      as their character codepoints, whereas eight-bit raw bytes are
      converted to their 8-bit codes.  The function signals an error if
      the character at POS is neither ASCII nor an eight-bit raw byte.
 
      The optional argument STRING means to get a byte value from that
      string instead of the current buffer; POS is then a zero-based
      index into STRING.
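      The following calls illustrate each case.  They assume only
      standard primitives: ‘with-temp-buffer’ (which starts out
      multibyte), ‘set-buffer-multibyte’, and ‘string-to-multibyte’,
      which turns each byte ‘#x80’ or above of a unibyte string into the
      corresponding eight-bit raw-byte character.

```elisp
;; Multibyte buffer, ASCII character: the byte equals the codepoint.
(with-temp-buffer
  (insert "A")
  (get-byte 1))                       ; ⇒ 65

;; Multibyte buffer, eight-bit raw byte #x3FFFE9: converted to its 8-bit code.
(with-temp-buffer
  (insert (string-to-multibyte "\xe9"))
  (get-byte 1))                       ; ⇒ 233

;; Unibyte buffer: the byte is returned literally.
(with-temp-buffer
  (set-buffer-multibyte nil)
  (insert "\xe9")
  (get-byte 1))                       ; ⇒ 233

;; With STRING, POS indexes into the string.
(get-byte 0 "abc")                    ; ⇒ 97

;; U+00E9 is neither ASCII nor a raw byte, so an error is signaled.
(condition-case nil
    (get-byte 0 (string #xE9))
  (error 'signaled))                  ; ⇒ signaled
```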