grub2: Internationalisation

 
 17 Internationalisation
 ***********************
 
 17.1 Charset
 ============
 
 GRUB uses UTF-8 internally, except in rendering, where a GRUB-specific
 representation is used.  All text files (including the configuration
 file) are assumed to be encoded in UTF-8.
 
 17.2 Filesystems
 ================
 
 NTFS, JFS, UDF, HFS+, exFAT, long filenames in FAT and the Joliet part
 of ISO9660 are treated as UTF-16, as per specification.  AFS and BFS
 are read as UTF-8, again according to specification.  BtrFS, cpio, tar,
 squash4, minix, minix2, minix3, ROMFS, ReiserFS, XFS, ext2, ext3, ext4,
 FAT (short names), the RockRidge part of ISO9660, nilfs2, UFS1, UFS2
 and ZFS are assumed to be UTF-8.  This assumption might be false on
 systems configured with a legacy charset, but as long as the charset
 used is a superset of ASCII you should be able to access ASCII-named
 files.  It is recommended to configure your system to use UTF-8 to
 access the filesystem; convmv may help with migration.

    ISO9660 (plain) filenames are specified as being ASCII or as being
 described with unspecified escape sequences.  GRUB assumes that ISO9660
 names are UTF-8 (since any ASCII is valid UTF-8).  Some old CD-ROMs use
 CP437 in a non-compliant way.  You are still able to access files whose
 names contain only ASCII characters on such filesystems.  You are also
 able to access any file if the filesystem contains valid Joliet
 (UTF-16) or RockRidge (UTF-8) data.

    AFFS, SFS and HFS never use Unicode, and GRUB assumes them to be in
 Latin1, Latin1 and MacRoman respectively.

    GRUB handles filesystem case-insensitivity, but makes no attempt at
 case conversion of international characters, so e.g. a file named with
 a lowercase Greek alpha is treated as different from one named with an
 uppercase alpha.  The filesystems in question are NTFS (except POSIX
 namespace), HFS+ (configurable at mkfs time, default insensitive), SFS
 (configurable at mkfs time, default insensitive), JFS (configurable at
 mkfs time, default sensitive), HFS, AFFS, FAT, exFAT and ZFS
 (configurable on a per-subvolume basis by property "casesensitivity",
 default sensitive).  On ZFS subvolumes marked as case-insensitive,
 files containing lowercase international characters are inaccessible.

    Also, for all supported filesystems except HFS+ and ZFS
 (configurable on a per-subvolume basis by property "normalization",
 default none), GRUB makes no attempt to check canonical equivalence, so
 a file named with u-diaeresis is treated as distinct from one named
 with u followed by a combining diaeresis.  This, however, means that in
 order to access a file on HFS+ its name must be specified in
 normalisation form D.  On normalised ZFS subvolumes, filenames out of
 normalisation are inaccessible.
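
 The comparison rules described above can be illustrated with a small
 Python sketch.  It is not GRUB code: the helper names ascii_fold and
 names_match are hypothetical, and the sketch only assumes what the text
 states, namely that case folding covers ASCII letters alone and that no
 Unicode normalisation is applied before comparing names.

```python
import unicodedata


def ascii_fold(name: str) -> str:
    """Lowercase only A-Z; leave every other character untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in name
    )


def names_match(a: str, b: str, case_insensitive: bool = False) -> bool:
    """Compare code point by code point, with no normalisation step."""
    if case_insensitive:
        a, b = ascii_fold(a), ascii_fold(b)
    return a == b


# ASCII letters fold; uppercase and lowercase Greek alpha do not.
print(names_match("GRUB.CFG", "grub.cfg", case_insensitive=True))  # True
print(names_match("\u0391", "\u03b1", case_insensitive=True))      # False

# Precomposed u-diaeresis versus u followed by a combining diaeresis.
composed = "\u00fc"
decomposed = "u\u0308"
print(names_match(composed, decomposed))                           # False

# Normalisation form D, the form the text requires for names on HFS+.
print(unicodedata.normalize("NFD", composed) == decomposed)        # True
```

 Normalising a name to form D on the host before writing it into a
 configuration file is one way to obtain the spelling that HFS+ expects.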
 
 17.3 Output terminal
 ====================
 
 The firmware output console "console" on ARC and IEEE1275 is limited to
 ASCII.

    The BIOS firmware console and VGA text are limited to ASCII and some
 pseudographics.

    None of the above is appropriate for displaying international text;
 any unsupported character is replaced with a question mark, except for
 pseudographics, which GRUB attempts to approximate with ASCII.
 
    The EFI console, on the other hand, nominally supports UTF-16, but
 the actual language coverage depends on the firmware and may be very
 limited.
 
    The encoding used on serial can be chosen with 'terminfo' as either
 ASCII, UTF-8 or "visual UTF-8".  The last one is against the
 specification but results in correct rendering of right-to-left text on
 some readers which don't have their own bidi implementation.

    On emu, GRUB checks whether the charset is UTF-8 and uses it if so;
 otherwise it uses ASCII.
 
    When using gfxterm or gfxmenu, GRUB itself is responsible for
 rendering the text.  In this case GRUB is limited by the loaded fonts.
 If the fonts contain all required characters, then bidirectional text,
 cursive variants and combining marks other than enclosing, half (e.g.
 left half tilde or combining overline) and double ones are supported.
 Ligatures aren't supported, though.  This should cover European, Middle
 Eastern (if you don't mind the lack of the lam-alif ligature in Arabic)
 and East Asian scripts.  Notable unsupported scripts are the Brahmic
 family and derived scripts, as well as Mongolian, Tifinagh, Korean Jamo
 (precomposed characters have no problem) and tonal writing
 (U+02E5-U+02E9).  GRUB also ignores deprecated (as specified in
 Unicode) characters (e.g. tags).  GRUB also doesn't handle so-called
 "annotation characters".  If you can complete either of the two lists
 or, better, propose a patch to improve rendering, please contact the
 developer team.
 
 17.4 Input terminal
 ===================
 
 The firmware console on BIOS, IEEE1275 and ARC doesn't allow you to
 enter non-ASCII characters.  The EFI specification allows for such
 input, but the author is unaware of any actual implementations.  Serial
 input is currently limited to Latin1 (unlikely to change).  GRUB's own
 keyboard implementations (at_keyboard and usb_keyboard) support any key
 but work on a one-char-per-keystroke basis, so there are no dead keys
 or advanced input methods.  There is also no hotkey for changing the
 keymap.  In practice this makes it difficult to enter any text using a
 non-Latin alphabet.  Moreover, all current input consumers are limited
 to ASCII.
 
 17.5 Gettext
 ============
 
 GRUB supports being translated.  For this you need to have the language
 *.mo files in $prefix/locale, load the gettext module and set the
 "lang" variable.
 
 17.6 Regexp
 ===========
 
 Regexps work on Unicode characters; however, no attempt at checking
 canonical equivalence has been made.  Moreover, classes like
 [:alpha:] match only the ASCII subset.
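
 The following Python sketch mimics the three properties just listed.
 It uses Python's own re module, not GRUB's regexp engine, so it is only
 an analogy: the pattern [A-Za-z] stands in for an ASCII-only [:alpha:]
 class, and the helper is_ascii_alpha is a hypothetical name.

```python
import re

# Stand-in for a character class restricted to ASCII letters.
ASCII_ALPHA = re.compile(r"[A-Za-z]+")


def is_ascii_alpha(text: str) -> bool:
    """True only if every character is an ASCII letter."""
    return ASCII_ALPHA.fullmatch(text) is not None


# "." consumes one whole character, not one UTF-8 byte.
print(re.fullmatch(r".", "\u03b1") is not None)       # True

# ASCII-only class: Latin letters match, Greek alpha does not.
print(is_ascii_alpha("grub"))                         # True
print(is_ascii_alpha("\u03b1"))                       # False

# No canonical equivalence: a precomposed pattern does not match the
# decomposed spelling of the same letter.
print(re.fullmatch("\u00fc", "u\u0308") is not None)  # False
```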
 
 17.7 Other
 ==========
 
 Currently GRUB always uses the YEAR-MONTH-DAY HOUR:MINUTE:SECOND
 [WEEKDAY] 24-hour datetime format, but weekdays are translated.  GRUB
 always uses the decimal number format with [0-9] as digits, "." as the
 decimal separator and no group separator.  IEEE1275 aliases are matched
 case-insensitively, except for non-ASCII characters, which are matched
 as binary.  Matching of OSBundleRequired behaves similarly.  Since
 IEEE1275 aliases and OSBundleRequired don't contain any non-ASCII
 characters, this should never be a problem in practice.  Case-sensitive
 identifiers are matched as raw strings; no canonical equivalence check
 is performed.  Case-insensitive identifiers are matched as raw strings,
 but additionally [a-z] is equivalent to [A-Z].  GRUB-defined
 identifiers use only ASCII, and so should user-defined ones.
 Identifiers containing non-ASCII characters may work but aren't
 supported.  Only the ASCII space characters (space U+0020, tab U+0009,
 CR U+000d and LF U+000a) are recognised.  Other Unicode space
 characters aren't valid field separators.  The 'test' (see test)
 operators <, >, <=, >=, -pgt and -plt compare strings in the
 lexicographical order of Unicode codepoints, replicating the behaviour
 of test from coreutils.  Environment variables and commands are listed
 in the same order.
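
 Two of the rules above, ASCII-only field separators and ordering by
 Unicode codepoint, can be sketched in Python.  This is an illustration
 under stated assumptions rather than GRUB's implementation: the helper
 split_fields is hypothetical, and the tab character is taken to be the
 horizontal tab, U+0009.

```python
import re

# Only ASCII space, horizontal tab, CR and LF separate fields.
ASCII_SPACE = re.compile(r"[ \t\r\n]+")


def split_fields(line: str) -> list:
    """Split on ASCII whitespace only, dropping empty fields."""
    return [field for field in ASCII_SPACE.split(line) if field]


print(split_fields("linux /vmlinuz\troot=/dev/sda1"))
# ['linux', '/vmlinuz', 'root=/dev/sda1']

# U+00A0 NO-BREAK SPACE is not a separator here, although Python's own
# str.split() would treat it as one.
print(split_fields("a\u00a0b c"))   # ['a\xa0b', 'c']

# Python compares str values codepoint by codepoint, so sorted() gives
# the lexicographical codepoint order: uppercase ASCII sorts before
# lowercase ASCII, and both sort before non-ASCII letters.
names = ["zeta", "Zeta", "\u00e9t\u00e9", "alpha"]
print(sorted(names))                # ['Zeta', 'alpha', 'zeta', 'été']
```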