url: URI Encoding
2.2 URI Encoding
================
The ‘url-generic-parse-url’ parser does not obey RFC 3986 in one
respect: it allows non-ASCII characters in URI strings.
Strictly speaking, RFC 3986 compatible URIs may only consist of ASCII
characters; non-ASCII characters are represented by converting them to
UTF-8 byte sequences, and performing “percent encoding” on the bytes.
For example, the o-umlaut character (U+00F6) is converted to the UTF-8
byte sequence ‘\xC3\xB6’, then percent encoded to ‘%C3%B6’. (Certain
“reserved” ASCII characters must also be percent encoded when they
appear in URI components.)
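The two steps described above (UTF-8 conversion, then percent encoding
of each byte) can be sketched by hand in Emacs Lisp. This is only an
illustration of the mechanism; ‘my-percent-encode-bytes’ is a
hypothetical helper name, not part of the ‘url’ library, and it encodes
every byte without regard to which characters are reserved.

```elisp
;; Hypothetical helper, for illustration only: encode STRING as UTF-8
;; and write every resulting byte as "%XX" with upper-case hex digits.
(defun my-percent-encode-bytes (string)
  (mapconcat (lambda (byte) (format "%%%02X" byte))
             (encode-coding-string string 'utf-8)
             ""))

;; "\u00f6" is the o-umlaut character.
(my-percent-encode-bytes "\u00f6")
     ;; ⇒ "%C3%B6"
```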
The function ‘url-encode-url’ can be used to convert a URI string
containing arbitrary characters to one that is properly percent-encoded
in accordance with RFC 3986.
-- Function: url-encode-url url-string
This function returns a properly URI-encoded version of URL-STRING.
It also performs “URI normalization”, e.g., converting the scheme
component to lowercase if it was previously uppercase.
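As a rough sketch of how ‘url-encode-url’ might be called, the example
below passes it a URI with an uppercase scheme and a non-ASCII
character in the path. The host ‘example.com’ and the path are made up
for illustration, and the exact form of the returned string is not
shown because it may depend on the Emacs version.

```elisp
(require 'url-util)

;; Illustrative input: uppercase scheme, e-acute (U+00E9) in the path.
(setq my-encoded-url
      (url-encode-url "HTTP://example.com/caf\u00e9"))

;; The result should contain only ASCII characters, with the scheme
;; in lowercase and the e-acute replaced by its percent-encoded
;; UTF-8 bytes.
(string-match-p "%C3%A9" my-encoded-url)
```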
To convert between a string containing arbitrary characters and a
percent-encoded all-ASCII string, use the functions ‘url-hexify-string’
and ‘url-unhex-string’:
-- Function: url-hexify-string string &optional allowed-chars
This function performs percent-encoding on STRING, and returns the
result.
If STRING is multibyte, it is first converted to a UTF-8 byte
string. Each byte corresponding to an allowed character is left
as-is, while all other bytes are converted to a three-character
sequence: ‘%’ followed by two upper-case hex digits.
The allowed characters are specified by ALLOWED-CHARS. If this
argument is ‘nil’, the allowed characters are those specified as
“unreserved characters” by RFC 3986 (see the variable
‘url-unreserved-chars’). Otherwise, ALLOWED-CHARS should be a
vector whose N-th element is non-‘nil’ if character N is allowed.
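The following sketch shows ‘url-hexify-string’ with the default set of
allowed characters and with a custom ALLOWED-CHARS vector. The variable
name ‘my-allowed-chars’ is hypothetical, and the vector is assumed to
need 256 elements, one per possible byte value.

```elisp
(require 'url-util)

;; Default: only RFC 3986 unreserved characters are left as-is.
(url-hexify-string "a b/c")
     ;; ⇒ "a%20b%2Fc"

;; Custom table: the unreserved characters plus the slash.
(defvar my-allowed-chars
  (let ((table (make-vector 256 nil)))
    (dolist (char url-unreserved-chars)
      (aset table char t))
    (aset table ?/ t)
    table))

(url-hexify-string "a b/c" my-allowed-chars)
     ;; ⇒ "a%20b/c"
```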
-- Function: url-unhex-string string &optional allow-newlines
This function replaces percent-encoding sequences in STRING with
their character equivalents, and returns the resulting string.
If ALLOW-NEWLINES is non-‘nil’, it allows the decoding of carriage
returns and line feeds, which are normally forbidden in URIs.
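A short sketch of decoding follows. It assumes that the string returned
by ‘url-unhex-string’ holds raw bytes, so a percent-encoded non-ASCII
character is passed through ‘decode-coding-string’ to recover the
original character; that extra step is an assumption about typical
usage, not something stated above.

```elisp
(require 'url-util)

;; Plain ASCII round trip.
(url-unhex-string "a%20b")
     ;; ⇒ "a b"

;; Percent-encoded UTF-8 bytes: decode them to get the character back.
(decode-coding-string (url-unhex-string "%C3%B6") 'utf-8)
     ;; ⇒ the one-character string containing o-umlaut

;; With ALLOW-NEWLINES non-nil, an encoded line feed is decoded.
(url-unhex-string "a%0Ab" t)
     ;; ⇒ "a\nb"
```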