gawk: Locales
6.6 Where You Are Makes a Difference
====================================
Modern systems support the notion of "locales": a way to tell the system
about the local character set and language. The ISO C standard defines
a default '"C"' locale, which is an environment that is typical of what
many C programmers are used to.
Once upon a time, the locale setting affected regexp matching, but
this is no longer true (see Ranges and Locales).
Locales can affect record splitting. For the normal case of 'RS =
"\n"', the locale is largely irrelevant. For other single-character
record separators, setting 'LC_ALL=C' in the environment will give you
much better performance when reading records. Otherwise, 'gawk' has to
make several function calls, _per input character_, to find the record
terminator.
Locales can affect how dates and times are formatted (see Time
Functions). For example, a common way to abbreviate the date
September 4, 2015, in the United States is "9/4/15." In many countries
in Europe, however, it is abbreviated "4.9.15." Thus, the '%x'
specification in a '"US"' locale might produce '9/4/15', while in a
'"EUROPE"' locale, it might produce '4.9.15'.
According to POSIX, string comparison is also affected by locales
(similar to regular expressions). The details are presented in
POSIX String Comparison.
Finally, the locale affects the value of the decimal point character
used when 'gawk' parses input data. This is discussed in detail in
Conversion.