elisp: Explicit Encoding
32.10.7 Explicit Encoding and Decoding
--------------------------------------
All the operations that transfer text in and out of Emacs have the
ability to use a coding system to encode or decode the text. You can
also explicitly encode and decode text using the functions in this
section.
The result of encoding, and the input to decoding, are not ordinary
text. They logically consist of a series of byte values; that is, a
series of ASCII and eight-bit characters. In unibyte buffers and
strings, these characters have codes in the range 0 through #xFF (255).
In a multibyte buffer or string, eight-bit characters have character
codes higher than #xFF, namely #x3FFF80 through #x3FFFFF (see Text
Representations), but Emacs transparently converts them to their
single-byte values when you encode or decode such text.
The usual way to read a file into a buffer as a sequence of bytes, so
you can decode the contents explicitly, is with
‘insert-file-contents-literally’ (see Reading from Files);
alternatively, specify a non-‘nil’ RAWFILE argument when visiting a file
with ‘find-file-noselect’. These methods result in a unibyte buffer.
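Here is a minimal sketch of that pattern: the bytes are read without conversion and then decoded explicitly. The helper name ‘my-read-and-decode’ and the temporary file are hypothetical, and the sketch makes the buffer unibyte explicitly before inserting, as a defensive step.

```elisp
;; Minimal sketch; `my-read-and-decode' is a hypothetical helper.
(defun my-read-and-decode (file coding-system)
  "Return the contents of FILE, decoded with CODING-SYSTEM."
  (with-temp-buffer
    ;; Defensive: make sure the buffer is unibyte before inserting bytes.
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    ;; DESTINATION t: return the decoded text as a string.
    (decode-coding-region (point-min) (point-max) coding-system t)))

;; Usage: store the UTF-8 bytes of "Grüss Gott" in a temporary file,
;; writing them without any conversion.
(defvar my-demo-file (make-temp-file "explicit-decoding"))
(let ((coding-system-for-write 'no-conversion))
  (write-region "Gr\303\274ss Gott" nil my-demo-file))

(my-read-and-decode my-demo-file 'utf-8)   ; a multibyte "Grüss Gott"
```

Decoding the same bytes with ‘latin-1’ instead yields two separate characters (U+00C3 and U+00BC) where ‘utf-8’ yields one, which is why choosing the right coding system matters.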
The usual way to use the byte sequence that results from explicitly
encoding text is to copy it to a file or process. For example, you can
write it with ‘write-region’ (see Writing to Files), suppressing
encoding by binding ‘coding-system-for-write’ to ‘no-conversion’.
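A minimal sketch of the write side follows; ‘my-write-encoded’ is a hypothetical helper name. The text is encoded once by ‘encode-coding-string’, and the binding of ‘coding-system-for-write’ keeps ‘write-region’ from converting the bytes again.

```elisp
;; Minimal sketch; `my-write-encoded' is a hypothetical helper.
(defun my-write-encoded (text coding-system file)
  "Encode TEXT with CODING-SYSTEM and write the bytes to FILE as-is."
  (let ((bytes (encode-coding-string text coding-system)))
    ;; Bind `coding-system-for-write' so `write-region' does not
    ;; encode the already-encoded bytes a second time.
    (let ((coding-system-for-write 'no-conversion))
      (write-region bytes nil file))))

;; Usage: the file receives 11 bytes, since u-umlaut takes two bytes
;; in UTF-8.
(let ((file (make-temp-file "explicit-encoding")))
  (my-write-encoded "Gr\u00fcss Gott" 'utf-8 file)
  (prog1 (file-attribute-size (file-attributes file))
    (delete-file file)))
```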
Here are the functions to perform explicit encoding or decoding. The
encoding functions produce sequences of bytes; the decoding functions
are meant to operate on sequences of bytes. All of these functions
discard any text properties of their input. They also set
‘last-coding-system-used’ to the precise coding system they used.
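The following sketch illustrates both points with one call; the variable names ‘my-encoded’ and ‘my-coding-used’ are hypothetical. Because the precise coding system recorded in ‘last-coding-system-used’ may include an end-of-line variant, the sketch only inspects its base with ‘coding-system-base’.

```elisp
;; Minimal sketch; the variable names are hypothetical.
(defvar my-encoded nil)
(defvar my-coding-used nil)

;; Capture `last-coding-system-used' right after the call, before any
;; other operation can change it.
(setq my-encoded (encode-coding-string
                  (propertize "Gr\u00fcss" 'face 'bold) 'utf-8)
      my-coding-used last-coding-system-used)

(text-properties-at 0 my-encoded)     ; the `face' property is gone
(coding-system-base my-coding-used)   ; base coding system: utf-8
```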
-- Command: encode-coding-region start end coding-system &optional destination
This command encodes the text from START to END according to coding
system CODING-SYSTEM. Normally, the encoded text replaces the
original text in the buffer, but the optional argument DESTINATION
can change that. If DESTINATION is a buffer, the encoded text is
inserted in that buffer after point (point does not move); if it is
‘t’, the command returns the encoded text as a unibyte string
without inserting it.
If encoded text is inserted in some buffer, this command returns
the length of the encoded text.
The result of encoding is logically a sequence of bytes, but the
buffer remains multibyte if it was multibyte before, and any 8-bit
bytes are converted to their multibyte representation (see Text
Representations).
Do _not_ use ‘undecided’ for CODING-SYSTEM when encoding text,
since that may lead to unexpected results. Instead, use
‘select-safe-coding-system’ (see User-Chosen Coding Systems) to
suggest a suitable encoding if there’s no obvious pertinent value
for CODING-SYSTEM.
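A minimal sketch contrasting the two main uses of DESTINATION follows; ‘my-encode-region-demo’ is a hypothetical name. It first asks for the encoded text as a string, then encodes in place, and reports what the multibyte buffer holds afterwards.

```elisp
;; Minimal sketch; `my-encode-region-demo' is a hypothetical name.
(defun my-encode-region-demo ()
  "Encode a short multibyte text first to a string, then in place."
  (with-temp-buffer
    (insert "Gr\u00fcss")                     ; 5 characters
    ;; DESTINATION t: return a unibyte string; the buffer is unchanged.
    (let ((as-string (encode-coding-region (point-min) (point-max)
                                           'utf-8 t)))
      ;; DESTINATION omitted: replace the text in the buffer itself.
      (encode-coding-region (point-min) (point-max) 'utf-8)
      (list as-string                         ; the 6 bytes "Gr\303\274ss"
            enable-multibyte-characters       ; the buffer is still multibyte
            (buffer-size)                     ; 6, one character per byte
            ;; Byte #xC3 is now the eight-bit character #x3FFFC3.
            (char-after 3)))))
```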
-- Function: encode-coding-string string coding-system &optional nocopy buffer
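This function encodes the text in STRING according to coding system
CODING-SYSTEM. It returns a new string containing the encoded
text, except when NOCOPY is non-‘nil’, in which case the function
may return STRING itself if the encoding operation is trivial. The
result of encoding is a unibyte string.
If optional argument BUFFER specifies a buffer, the encoded text is
inserted in that buffer after point (point does not move). In this
case, the return value is the length of the encoded text.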
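For instance, the same string yields different byte sequences under different coding systems, as in this minimal sketch (the variable names are hypothetical). The character ‘ü’ becomes two bytes in UTF-8 but a single byte in Latin-1.

```elisp
;; Minimal sketch; the variable names are hypothetical.
(defvar my-utf-8-bytes
  (encode-coding-string "Gr\u00fcss Gott" 'utf-8))    ; "Gr\303\274ss Gott"
(defvar my-latin-1-bytes
  (encode-coding-string "Gr\u00fcss Gott" 'latin-1))  ; "Gr\374ss Gott"

(list (length my-utf-8-bytes) (length my-latin-1-bytes))   ; 11 and 10
```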
-- Command: decode-coding-region start end coding-system &optional destination
This command decodes the text from START to END according to coding
system CODING-SYSTEM. To make explicit decoding useful, the text
before decoding ought to be a sequence of byte values, but both
multibyte and unibyte buffers are acceptable (in the multibyte
case, the raw byte values should be represented as eight-bit
characters). Normally, the decoded text replaces the original text
in the buffer, but the optional argument DESTINATION can change
that. If DESTINATION is a buffer, the decoded text is inserted in
that buffer after point (point does not move); if it is ‘t’, the
command returns the decoded text as a multibyte string without
inserting it.
If decoded text is inserted in some buffer, this command returns
the length of the decoded text.
This command puts a ‘charset’ text property on the decoded text.
The value of the property states the character set used to decode
the original text.
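The sketch below decodes bytes held in a unibyte buffer into a separate output buffer; ‘my-decode-region-demo’ and the buffer name ‘ *decoded*’ are hypothetical. It reports the decoded text, the output buffer's point, and the command's return value.

```elisp
;; Minimal sketch; `my-decode-region-demo' is a hypothetical name.
(defun my-decode-region-demo ()
  "Decode UTF-8 bytes from a unibyte buffer into another buffer."
  (let ((out (generate-new-buffer " *decoded*")))
    (unwind-protect
        (with-temp-buffer
          (set-buffer-multibyte nil)
          (insert "Gr\303\274ss Gott")          ; 11 raw bytes
          ;; DESTINATION is a buffer: the decoded text is inserted there,
          ;; and the return value is the length of the decoded text.
          (let ((len (decode-coding-region (point-min) (point-max)
                                           'utf-8 out)))
            (with-current-buffer out
              ;; Point in OUT has not moved from its starting position 1.
              (list (buffer-string) (point) len))))
      (kill-buffer out))))
```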
-- Function: decode-coding-string string coding-system &optional nocopy buffer
This function decodes the text in STRING according to
CODING-SYSTEM. It returns a new string containing the decoded
text, except when NOCOPY is non-‘nil’, in which case the function
may return STRING itself if the decoding operation is trivial. To
make explicit decoding useful, the contents of STRING ought to be a
unibyte string with a sequence of byte values, but a multibyte
string is also acceptable (assuming it contains 8-bit bytes in
their multibyte form).
If optional argument BUFFER specifies a buffer, the decoded text is
inserted in that buffer after point (point does not move). In this
case, the return value is the length of the decoded text.
This function puts a ‘charset’ text property on the decoded text.
The value of the property states the character set used to decode
the original text:
(decode-coding-string "Gr\374ss Gott" 'latin-1)
⇒ #("Grüss Gott" 0 9 (charset iso-8859-1))
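A minimal sketch of both calling conventions follows, using UTF-8 input; the variable names are hypothetical. Note that ‘equal’ ignores text properties, so a ‘charset’ property on the result does not affect comparisons.

```elisp
;; Minimal sketch; the variable names are hypothetical.
;; Without BUFFER: return the decoded (multibyte) string.
(defvar my-decoded
  (decode-coding-string "Gr\303\274ss Gott" 'utf-8))

;; With BUFFER: insert the text after point and return its length.
(defvar my-decoded-into-buffer
  (with-temp-buffer
    (let ((len (decode-coding-string "Gr\303\274ss Gott" 'utf-8
                                     nil (current-buffer))))
      ;; Collect the return value, the buffer text, and point.
      (list len (buffer-string) (point)))))
```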
-- Function: decode-coding-inserted-region from to filename &optional visit beg end replace
This function decodes the text from FROM to TO as if it were being
read from file FILENAME by ‘insert-file-contents’. The optional
arguments VISIT, BEG, END, and REPLACE have the same meanings as for
‘insert-file-contents’.
The normal way to use this function is after reading text from a
file without decoding, if you decide you would rather have decoded
it. Instead of deleting the text and reading it again, this time
with decoding, you can call this function.
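The following sketch simulates that situation: raw bytes sit in a multibyte buffer as eight-bit characters, as if read without decoding, and are then decoded in place. The name ‘my-decode-after-the-fact’ and the file name "demo.txt" are hypothetical (the file need not exist). The sketch assumes that binding ‘coding-system-for-read’ pins the coding system here just as it does for ‘insert-file-contents’, so no detection takes place.

```elisp
;; Minimal sketch; `my-decode-after-the-fact' and "demo.txt" are
;; hypothetical names.
(defun my-decode-after-the-fact ()
  "Decode, in place, bytes that were inserted without decoding."
  (with-temp-buffer
    ;; Simulate text read without decoding: raw bytes represented as
    ;; eight-bit characters in a multibyte buffer.
    (insert (string-to-multibyte "Gr\303\274ss Gott"))
    ;; Assumption: `coding-system-for-read' selects the coding system,
    ;; as it would for `insert-file-contents'.
    (let ((coding-system-for-read 'utf-8))
      (decode-coding-inserted-region (point-min) (point-max) "demo.txt"))
    (buffer-string)))
```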