gawk: Time Functions

 
 9.1.5 Time Functions
 --------------------
 
 'awk' programs are commonly used to process log files containing
 timestamp information, indicating when a particular log record was
 written.  Many programs log their timestamps in the form returned by the
 'time()' system call, which is the number of seconds since a particular
 epoch.  On POSIX-compliant systems, it is the number of seconds since
 1970-01-01 00:00:00 UTC, not counting leap seconds.(1)  All known
 POSIX-compliant systems support timestamps from 0 through 2^31 - 1,
 which is sufficient to represent times through 2038-01-19 03:14:07 UTC.
 Many systems support a wider range of timestamps, including negative
 timestamps that represent times before the epoch.
 
    In order to make it easier to process such log files and to produce
 useful reports, 'gawk' provides the following functions for working with
 timestamps.  They are 'gawk' extensions; they are not specified in the
 POSIX standard.(2)  However, recent versions of 'mawk' (see Other
 Versions) also support these functions.  Optional parameters are
 enclosed in square brackets ([ ]):
 
 'mktime(DATESPEC' [', UTC-FLAG' ]')'
      Turn DATESPEC into a timestamp in the same form as is returned by
      'systime()'.  It is similar to the function of the same name in ISO
      C. The argument, DATESPEC, is a string of the form
      '"YYYY MM DD HH MM SS [DST]"'.  The string consists of six or seven
      numbers representing, respectively, the full year including
      century, the month from 1 to 12, the day of the month from 1 to 31,
      the hour of the day from 0 to 23, the minute from 0 to 59, the
      second from 0 to 60,(3) and an optional daylight-savings flag.
 
      The values of these numbers need not be within the ranges
      specified; for example, an hour of -1 means 1 hour before midnight.
      The origin-zero Gregorian calendar is assumed, with year 0
      preceding year 1 and year -1 preceding year 0.  If UTC-FLAG is
      present and is either nonzero or non-null, the time is assumed to
      be in the UTC time zone; otherwise, the time is assumed to be in
      the local time zone.  If the DST daylight-savings flag is positive,
      the time is assumed to be daylight savings time; if zero, the time
      is assumed to be standard time; and if negative (the default),
      'mktime()' attempts to determine whether daylight savings time is
      in effect for the specified time.
 
      If DATESPEC does not contain enough elements or if the resulting
      time is out of range, 'mktime()' returns -1.
 
 'strftime('[FORMAT [',' TIMESTAMP [',' UTC-FLAG] ] ]')'
      Format the time specified by TIMESTAMP based on the contents of the
      FORMAT string and return the result.  It is similar to the function
      of the same name in ISO C. If UTC-FLAG is present and is either
      nonzero or non-null, the value is formatted as UTC (Coordinated
      Universal Time, formerly GMT or Greenwich Mean Time).  Otherwise,
      the value is formatted for the local time zone.  The TIMESTAMP is
      in the same format as the value returned by the 'systime()'
      function.  If no TIMESTAMP argument is supplied, 'gawk' uses the
      current time of day as the timestamp.  Without a FORMAT argument,
      'strftime()' uses the value of 'PROCINFO["strftime"]' as the format
      string (see Built-in Variables).  The default string value is
      '"%a %b %e %H:%M:%S %Z %Y"'.  This format string produces output
      that is equivalent to that of the 'date' utility.  You can assign a
      new value to 'PROCINFO["strftime"]' to change the default format;
      see the following list for the various format directives.
 
 'systime()'
      Return the current time as the number of seconds since the system
      epoch.  On POSIX systems, this is the number of seconds since
      1970-01-01 00:00:00 UTC, not counting leap seconds.  It may be a
      different number on other systems.
 
    The 'systime()' function allows you to compare a timestamp from a log
 file with the current time of day.  In particular, it is easy to
 determine how long ago a particular record was logged.  It also allows
 you to produce log records using the "seconds since the epoch" format.
 
    The 'mktime()' function allows you to convert a textual
 representation of a date and time into a timestamp.  This makes it easy
 to do before/after comparisons of dates and times, particularly when
 dealing with date and time data coming from an external source, such as
 a log file.
 
    The 'strftime()' function allows you to easily turn a timestamp into
 human-readable information.  It is similar in nature to the 'sprintf()'
 function (see String Functions), in that it copies nonformat
 specification characters verbatim to the returned string, while
 substituting date and time values for format specifications in the
 FORMAT string.
 
    'strftime()' is guaranteed by the 1999 ISO C standard(4) to support
 the following date format specifications:
 
 '%a'
      The locale's abbreviated weekday name.
 
 '%A'
      The locale's full weekday name.
 
 '%b'
      The locale's abbreviated month name.
 
 '%B'
      The locale's full month name.
 
 '%c'
      The locale's "appropriate" date and time representation.  (This is
      '%A %B %d %T %Y' in the '"C"' locale.)
 
 '%C'
      The century part of the current year.  This is the year divided by
      100 and truncated to the next lower integer.
 
 '%d'
      The day of the month as a decimal number (01-31).
 
 '%D'
      Equivalent to specifying '%m/%d/%y'.
 
 '%e'
      The day of the month, padded with a space if it is only one digit.
 
 '%F'
      Equivalent to specifying '%Y-%m-%d'.  This is the ISO 8601 date
      format.
 
 '%g'
      The year modulo 100 of the ISO 8601 week number, as a decimal
      number (00-99).  For example, January 1, 2012, is in week 52 of
      2011.  Thus, the year of its ISO 8601 week number is 2011, even
      though its year is 2012.  Similarly, December 31, 2012, is in week
      1 of 2013.  Thus, the year of its ISO week number is 2013, even
      though its year is 2012.
 
 '%G'
      The full year of the ISO week number, as a decimal number.
 
 '%h'
      Equivalent to '%b'.
 
 '%H'
      The hour (24-hour clock) as a decimal number (00-23).
 
 '%I'
      The hour (12-hour clock) as a decimal number (01-12).
 
 '%j'
      The day of the year as a decimal number (001-366).
 
 '%m'
      The month as a decimal number (01-12).
 
 '%M'
      The minute as a decimal number (00-59).
 
 '%n'
      A newline character (ASCII LF).
 
 '%p'
      The locale's equivalent of the AM/PM designations associated with a
      12-hour clock.
 
 '%r'
      The locale's 12-hour clock time.  (This is '%I:%M:%S %p' in the
      '"C"' locale.)
 
 '%R'
      Equivalent to specifying '%H:%M'.
 
 '%S'
      The second as a decimal number (00-60).
 
 '%t'
      A TAB character.
 
 '%T'
      Equivalent to specifying '%H:%M:%S'.
 
 '%u'
      The weekday as a decimal number (1-7).  Monday is day one.
 
 '%U'
      The week number of the year (with the first Sunday as the first day
      of week one) as a decimal number (00-53).
 
 '%V'
      The week number of the year (with the first Monday as the first day
      of week one) as a decimal number (01-53).  The method for
      determining the week number is as specified by ISO 8601.  (To wit:
      if the week containing January 1 has four or more days in the new
      year, then it is week one; otherwise it is the last week [52 or 53]
      of the previous year and the next week is week one.)
 
 '%w'
      The weekday as a decimal number (0-6).  Sunday is day zero.
 
 '%W'
      The week number of the year (with the first Monday as the first day
      of week one) as a decimal number (00-53).
 
 '%x'
      The locale's "appropriate" date representation.  (This is '%A %B %d
      %Y' in the '"C"' locale.)
 
 '%X'
      The locale's "appropriate" time representation.  (This is '%T' in
      the '"C"' locale.)
 
 '%y'
      The year modulo 100 as a decimal number (00-99).
 
 '%Y'
      The full year as a decimal number (e.g., 2015).
 
 '%z'
      The time zone offset in a '+HHMM' format (e.g., the format
      necessary to produce RFC 822/RFC 1036 date headers).
 
 '%Z'
      The time zone name or abbreviation; no characters if no time zone
      is determinable.
 
 '%Ec %EC %Ex %EX %Ey %EY %Od %Oe %OH'
 '%OI %Om %OM %OS %Ou %OU %OV %Ow %OW %Oy'
      "Alternative representations" for the specifications that use only
      the second letter ('%c', '%C', and so on).(5)  (These facilitate
      compliance with the POSIX 'date' utility.)
 
 '%%'
      A literal '%'.
 
    If a conversion specifier is not one of those just listed, the
 behavior is undefined.(6)
 
    For systems that are not yet fully standards-compliant, 'gawk'
 supplies a copy of 'strftime()' from the GNU C Library.  It supports all
 of the just-listed format specifications.  If that version is used to
 compile 'gawk' (see Installation), then the following additional
 format specifications are available:
 
 '%k'
      The hour (24-hour clock) as a decimal number (0-23).  Single-digit
      numbers are padded with a space.
 
 '%l'
      The hour (12-hour clock) as a decimal number (1-12).  Single-digit
      numbers are padded with a space.
 
 '%s'
      The time as a decimal timestamp in seconds since the epoch.
 
    Additionally, the alternative representations are recognized but
 their normal representations are used.
 
    The following example is an 'awk' implementation of the POSIX 'date'
 utility.  Normally, the 'date' utility prints the current date and time
 of day in a well-known format.  However, if you provide an argument to
 it that begins with a '+', 'date' copies nonformat specifier characters
 to the standard output and interprets the current time according to the
 format specifiers in the string.  For example:
 
      $ date '+Today is %A, %B %d, %Y.'
      -| Today is Monday, September 22, 2014.
 
    Here is the 'gawk' version of the 'date' utility.  It has a shell
 "wrapper" to handle the '-u' option, which requires that 'date' run as
 if the time zone is set to UTC:
 
      #! /bin/sh
      #
      # date --- approximate the POSIX 'date' command
 
      case $1 in
      -u)  TZ=UTC0     # use UTC
           export TZ
           shift ;;
      esac
 
      gawk 'BEGIN  {
          format = PROCINFO["strftime"]
          exitval = 0
 
          if (ARGC > 2)
              exitval = 1
          else if (ARGC == 2) {
              format = ARGV[1]
              if (format ~ /^\+/)
                  format = substr(format, 2)   # remove leading +
          }
          print strftime(format)
          exit exitval
      }' "$@"
 
    ---------- Footnotes ----------
 
    (1) See Glossary, especially the entries "Epoch" and "UTC."
 
    (2) The GNU 'date' utility can also do many of the things described
 here.  Its use may be preferable for simple time-related operations in
 shell scripts.
 
    (3) Occasionally a leap second is added to the last minute of a day,
 which is why the seconds can go up to 60.
 
    (4) Unfortunately, not every system's 'strftime()' necessarily
 supports all of the conversions listed here.
 
    (5) If you don't understand any of this, don't worry about it; these
 facilities are meant to make it easier to "internationalize" programs.
 For other internationalization features, see Internationalization.
 
    (6) This is because ISO C leaves the behavior of the C version of
 'strftime()' undefined and 'gawk' uses the system's version of
 'strftime()' if it's there.  Typically, the conversion specifier either
 does not appear in the returned string or appears literally.