Info: (elisp) Converting Representations

Info Catalog
elisp: Disabling Multibyte
elisp: Non-ASCII Characters
elisp: Selecting a Representation
elisp: Converting Representations

 
 32.3 Converting Text Representations
 ====================================
 
 Emacs can convert unibyte text to multibyte; it can also convert
 multibyte text to unibyte, provided that the multibyte text contains
 only ASCII and 8-bit raw bytes.  In general, these conversions happen
 when inserting text into a buffer, or when putting text from several
 strings together in one string.  You can also explicitly convert a
 string’s contents to either representation.
 
    Emacs chooses the representation for a string based on the text from
 which it is constructed.  The general rule is to convert unibyte text to
 multibyte text when combining it with other multibyte text, because the
 multibyte representation is more general and can hold whatever
 characters the unibyte text has.
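    The following sketch shows this rule with ‘concat’.  It assumes a
 current Emacs and uses ‘unibyte-string’, a function not described in
 this node, to build a unibyte string holding the raw byte 200.  The
 expected values in the comments follow from the raw-byte codepoint
 range ‘#x3FFF80’ through ‘#x3FFFFF’.

```elisp
;; Combine a unibyte string with multibyte text and inspect the result.
(let* ((raw (unibyte-string 200))      ; unibyte: one raw byte, #xC8
       (mixed (concat raw "\u03bb")))  ; "\u03bb" (lambda) is multibyte
  (list (multibyte-string-p raw)       ; nil
        (multibyte-string-p mixed)     ; t: the result is multibyte
        (aref mixed 0)))               ; #x3FFFC8: byte 200 as a raw byte
;; => (nil t 4194248)
```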
 
    When inserting text into a buffer, Emacs converts the text to the
 buffer’s representation, as specified by ‘enable-multibyte-characters’
 in that buffer.  In particular, when you insert multibyte text into a
 unibyte buffer, Emacs converts the text to unibyte, even though this
 conversion cannot in general preserve all the characters that might be
 in the multibyte text.  The other natural alternative, to convert the
 buffer contents to multibyte, is not acceptable because the buffer’s
 representation is a choice made by the user that cannot be overridden
 automatically.
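    Here is a minimal sketch of inserting multibyte text into a unibyte
 buffer.  It assumes a current Emacs and uses ‘with-temp-buffer’ and
 ‘set-buffer-multibyte’ to obtain an empty unibyte buffer.  The lambda
 character (codepoint ‘#x3BB’) cannot be represented there and is
 reduced to its low 8 bits.

```elisp
(with-temp-buffer
  ;; Make the empty temporary buffer unibyte.
  (set-buffer-multibyte nil)
  ;; Insert multibyte text: "A" followed by lambda (#x3BB).
  (insert "A\u03bb")
  (list enable-multibyte-characters  ; nil: the buffer stays unibyte
        (buffer-size)                ; 2: one byte per inserted character
        (char-after 2)))             ; 187 (#xBB), the low 8 bits of #x3BB
;; => (nil 2 187)
```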
 
    Converting unibyte text to multibyte text leaves ASCII characters
 unchanged, and converts bytes with codes 128 through 255 to the
 multibyte representation of raw eight-bit bytes.
 
    Converting multibyte text to unibyte converts all ASCII and eight-bit
 characters to their single-byte form, but loses information for all
 other non-ASCII characters by discarding all but the low 8 bits of each
 character’s codepoint.  Converting unibyte text to multibyte and back to
 unibyte reproduces the original unibyte text.
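    The round trip can be checked over every possible byte value.  This
 sketch assumes a current Emacs; ‘unibyte-string’ and ‘number-sequence’
 are standard functions that this node does not describe.

```elisp
;; A unibyte string holding each byte value 0 through 255 once.
(let* ((bytes (apply #'unibyte-string (number-sequence 0 255)))
       (multi (string-to-multibyte bytes))  ; unibyte -> multibyte
       (back (string-to-unibyte multi)))    ; multibyte -> unibyte
  (list (multibyte-string-p multi)          ; t
        (multibyte-string-p back)           ; nil
        (string= bytes back)))              ; t: original text reproduced
;; => (t nil t)
```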
 
    The next two functions either return the argument STRING unchanged
 or return a newly created string with no text properties.
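    A minimal sketch of these two behaviors, using ‘string-to-multibyte’
 and assuming a current Emacs: a string that is already multibyte comes
 back as the same object, while a converted string is a new object that
 carries none of the original’s text properties.

```elisp
(let* ((multi (string #x3bb))                   ; already multibyte (lambda)
       (plain (propertize "abc" 'face 'bold))   ; unibyte, with a property
       (converted (string-to-multibyte plain)))
  (list (eq (string-to-multibyte multi) multi)  ; t: same object returned
        (multibyte-string-p converted)          ; t: a new multibyte string
        (get-text-property 0 'face plain)       ; bold
        (get-text-property 0 'face converted))) ; nil: properties dropped
;; => (t t bold nil)
```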
 
  -- Function: string-to-multibyte string
      This function returns a multibyte string containing the same
      sequence of characters as STRING.  If STRING is a multibyte string,
      it is returned unchanged.  The function assumes that STRING
      includes only ASCII characters and raw 8-bit bytes; the latter are
      converted to their multibyte representation corresponding to the
      codepoints ‘#x3FFF80’ through ‘#x3FFFFF’, inclusive (*note
      codepoints: Text Representations.).
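    The codepoint mapping can be checked directly: each raw byte B in the
 range 128 through 255 becomes the character B + ‘#x3FFF00’.  This
 sketch assumes a current Emacs and uses ‘unibyte-string’ and
 ‘number-sequence’, which are not described in this node.

```elisp
(let* ((high (number-sequence 128 255))  ; every non-ASCII byte value
       (multi (string-to-multibyte (apply #'unibyte-string high))))
  (list (aref multi 0)                    ; #x3FFF80 for byte 128
        (aref multi 127)                  ; #x3FFFFF for byte 255
        ;; t: every byte B maps to B + #x3FFF00
        (equal (append multi nil)
               (mapcar (lambda (b) (+ b #x3FFF00)) high))))
;; => (4194176 4194303 t)
```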
 
  -- Function: string-to-unibyte string
      This function returns a unibyte string containing the same sequence
      of characters as STRING.  If STRING is a unibyte string, it is
      returned unchanged.  Otherwise each eight-bit character is converted
      to the corresponding byte, and the function signals an error if
      STRING contains any character that is neither ASCII nor eight-bit.
      Use this function for STRING arguments that contain only ASCII and
      eight-bit characters.
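    A short sketch of both outcomes, assuming a current Emacs: a string of
 ASCII and eight-bit characters converts cleanly, while the lambda
 character (‘#x3BB’) makes ‘string-to-unibyte’ signal an error, caught
 here with ‘condition-case’.

```elisp
;; "A" plus the eight-bit character for byte 128 becomes two bytes.
(append (string-to-unibyte (string ?A #x3FFF80)) nil)
;; => (65 128)

;; A character that is neither ASCII nor eight-bit cannot be converted.
(condition-case nil
    (string-to-unibyte "\u03bb")
  (error 'not-convertible))
;; => not-convertible
```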
 
  -- Function: byte-to-string byte
      This function returns a unibyte string containing the single byte
      BYTE.  It signals an error if BYTE is not an integer between 0 and
      255.
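    A quick sketch, assuming a current Emacs; the out-of-range case is
 caught with ‘condition-case’ so that evaluation continues.

```elisp
(let ((s (byte-to-string 200)))
  (list (multibyte-string-p s)  ; nil: the result is unibyte
        (length s)              ; 1
        (aref s 0)))            ; 200
;; => (nil 1 200)

;; 256 is outside the range 0 through 255, so an error is signaled.
(condition-case nil
    (byte-to-string 256)
  (error 'out-of-range))
;; => out-of-range
```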
 
  -- Function: multibyte-char-to-unibyte char
      This converts the multibyte character CHAR to a unibyte character,
      and returns that character.  If CHAR is neither ASCII nor
      eight-bit, the function returns −1.
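    A minimal sketch covering the three cases, assuming a current Emacs;
 ‘#x3BB’ is the Greek letter lambda, which is neither ASCII nor
 eight-bit.

```elisp
(list (multibyte-char-to-unibyte ?A)        ; 65: ASCII is unchanged
      (multibyte-char-to-unibyte #x3FFF80)  ; 128: eight-bit char to byte
      (multibyte-char-to-unibyte #x3BB))    ; -1: lambda is neither
;; => (65 128 -1)
```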
 
  -- Function: unibyte-char-to-multibyte char
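      This converts the unibyte character CHAR to a multibyte character,
      and returns that character.  It assumes that CHAR is either an ASCII
      character or a raw 8-bit byte; a raw byte is converted to the
      corresponding character in the range ‘#x3FFF80’ through ‘#x3FFFFF’.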
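    A minimal sketch, assuming a current Emacs, of the conversion for an
 ASCII character and for both ends of the raw-byte range, followed by a
 check that ‘multibyte-char-to-unibyte’ undoes it for all 256 byte
 values.

```elisp
(mapcar #'unibyte-char-to-multibyte '(65 128 255))
;; => (65 4194176 4194303), i.e. (?A #x3FFF80 #x3FFFFF)

;; Every byte value survives a round trip through this function and
;; `multibyte-char-to-unibyte'.
(let ((ok t))
  (dotimes (b 256)
    (unless (= b (multibyte-char-to-unibyte (unibyte-char-to-multibyte b)))
      (setq ok nil)))
  ok)
;; => t
```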