url: URI Encoding

 
 2.2 URI Encoding
 ================
 
 The ‘url-generic-parse-url’ parser does not obey RFC 3986 in one
 respect: it allows non-ASCII characters in URI strings.
 
    Strictly speaking, RFC 3986 compatible URIs may only consist of ASCII
 characters; non-ASCII characters are represented by converting them to
 UTF-8 byte sequences, and performing “percent encoding” on the bytes.
 For example, the o-umlaut character is converted to the UTF-8 byte
 sequence ‘\xC3\xB6’, then percent encoded to ‘%C3%B6’.  (Certain
 “reserved” ASCII characters must also be percent encoded when they
 appear in URI components.)
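
    The two steps described above (UTF-8 conversion, then percent
 encoding of each byte) can be sketched by hand as follows.  This is
 only an illustration: ‘example-percent-encode-all’ is a hypothetical
 helper, not part of the url library, and unlike the library functions
 it encodes every byte, including unreserved ASCII characters.

```elisp
;; Hypothetical helper, for illustration only (not part of the url library).
(defun example-percent-encode-all (string)
  "Percent-encode every byte of the UTF-8 encoding of STRING."
  (mapconcat (lambda (byte) (format "%%%02X" byte))
             ;; Step 1: convert the characters to a UTF-8 byte string.
             (encode-coding-string string 'utf-8)
             ;; Step 2 (the lambda above): write each byte as %XX.
             ""))

(example-percent-encode-all "ö")   ; ⇒ "%C3%B6"
```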
 
    The function ‘url-encode-url’ can be used to convert a URI string
 containing arbitrary characters to one that is properly percent-encoded
 in accordance with RFC 3986.
 
  -- Function: url-encode-url url-string
      This function returns a properly URI-encoded version of URL-STRING.
      It also performs “URI normalization”, e.g., converting the scheme
      component to lowercase if it was previously uppercase.
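
    As a brief sketch of both effects, the call below uses a made-up
 URL (‘example.com’ is only a placeholder) containing an uppercase
 scheme and a space in the path, and assumes ‘url-util’ is available
 to load:

```elisp
(require 'url-util)

;; The space in the path is percent-encoded, and the scheme is
;; normalized to lowercase.
(url-encode-url "HTTP://example.com/a b")
;; ⇒ "http://example.com/a%20b"
```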
 
    To convert between a string containing arbitrary characters and a
 percent-encoded all-ASCII string, use the functions ‘url-hexify-string’
 and ‘url-unhex-string’:
 
  -- Function: url-hexify-string string &optional allowed-chars
      This function performs percent-encoding on STRING, and returns the
      result.
 
      If STRING is multibyte, it is first converted to a UTF-8 byte
      string.  Each byte corresponding to an allowed character is left
      as-is, while all other bytes are converted to a three-character
      sequence: ‘%’ followed by two upper-case hex digits.
 
      The allowed characters are specified by ALLOWED-CHARS.  If this
      argument is ‘nil’, the allowed characters are those specified as
      “unreserved characters” by RFC 3986 (see the variable
      ‘url-unreserved-chars’).  Otherwise, ALLOWED-CHARS should be a
      vector whose N-th element is non-‘nil’ if character N is allowed.
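
    The following sketch contrasts the default behavior with a custom
 ALLOWED-CHARS vector.  The variable name ‘example-allowed’ and the
 sample string are made up for illustration; the vector is built by
 hand with 256 entries, one per byte value, on the assumption that
 this covers every byte the function can look up.

```elisp
(require 'url-util)

;; Default: only the RFC 3986 unreserved characters are left as-is,
;; so both "/" and the space are encoded.
(url-hexify-string "docs/my file.txt")
;; ⇒ "docs%2Fmy%20file.txt"

;; Custom table: start from the unreserved characters and also allow
;; "/" so that path separators survive.
(let ((example-allowed (make-vector 256 nil)))
  (dolist (ch url-unreserved-chars)
    (aset example-allowed ch t))
  (aset example-allowed ?/ t)
  (url-hexify-string "docs/my file.txt" example-allowed))
;; ⇒ "docs/my%20file.txt"
```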
 
  -- Function: url-unhex-string string &optional allow-newlines
      This function replaces percent-encoding sequences in STRING with
      their character equivalents, and returns the resulting string.
 
      If ALLOW-NEWLINES is non-‘nil’, the function allows the decoding
      of carriage returns and line feeds, which are normally forbidden
      in URIs.
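
    A short sketch of decoding follows.  It assumes, as the function's
 docstring indicates, that decoded non-ASCII bytes are returned raw, so
 an explicit ‘decode-coding-string’ call is needed to recover the
 original characters; the sample strings are made up.

```elisp
(require 'url-util)

;; Plain ASCII escapes decode directly.
(url-unhex-string "a%20b")              ; ⇒ "a b"

;; With ALLOW-NEWLINES non-nil, an encoded line feed is decoded.
(url-unhex-string "line1%0Aline2" t)    ; ⇒ "line1\nline2"

;; Non-ASCII characters come back as UTF-8 bytes; decode them to
;; recover the character.
(decode-coding-string (url-unhex-string "%C3%B6") 'utf-8)   ; ⇒ "ö"
```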