gawk: Derived Files

 
 C.2.4 Why Generated Files Are Kept In Git
 -----------------------------------------
 
 If you look at the 'gawk' source in the Git repository, you will notice
 that it includes files that are automatically generated by GNU
 infrastructure tools, such as 'Makefile.in' from Automake and even
 'configure' from Autoconf.
 
    This differs from many Free Software projects, which do not store
 the derived files: leaving them out keeps the repository less cluttered
 and makes it easier to see the substantive changes between commits.
 
    However, there are several reasons why the 'gawk' maintainer likes to
 have everything in the repository.
 
    First, it is then easy to reproduce any given version completely,
 without relying upon the availability of other tools (older, likely
 obsolete, and maybe even impossible to find).
 
    As an extreme example, if you ever even think about trying to
 compile, oh, say, the V7 'awk', you will discover that not only do you
 have to bootstrap the V7 'yacc' to do so, but you also need the V7
 'lex'.  And the latter is pretty much impossible to bring up on a modern
 GNU/Linux system.(1)
 
    (Or, let's say 'gawk' 1.2 required 'bison' whatever-it-was in 1989
 and that there was no 'awkgram.c' file in the repository.  Is there a
 guarantee that we could find that 'bison' version?  Or that _it_ would
 build?)
 
    If the repository has all the generated files, then it's easy to just
 check them out and build.  (Or _easier_, depending upon how far back we
 go.)
 
    And that brings us to the second (and stronger) reason why all the
 files really need to be in Git.  It boils down to whom you cater
 to--the 'gawk' developer(s), or the user who just wants to check out a
 version and try it out?
 
    The 'gawk' maintainer wants it to be possible for any interested
 'awk' user in the world to just clone the repository, check out the
 branch of interest, and build it, without needing the correct
 version(s) of the autotools.(2)  That is the point of the
 'bootstrap.sh' file.  It touches the various other files in the right
 order such that
 
      # The canonical incantation for building GNU software:
      ./bootstrap.sh && ./configure && make
 
 will _just work_.
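
    To illustrate the idea of touching files "in the right order," here
 is a minimal sketch.  It is _not_ the actual contents of
 'bootstrap.sh'; the function name 'touch_in_order' and the abbreviated
 file list are assumptions made for illustration.  The point is only
 that each generated file ends up newer than the inputs it was built
 from, so that 'make' does not try to rerun the autotools.

```shell
# Hypothetical sketch; not the real bootstrap.sh from the gawk repository.
# Touch each named file in the order given, so that files listed later
# end up with newer timestamps than files listed earlier.
touch_in_order() {
    for f in "$@"; do
        # Refresh only files that already exist; never create new ones.
        if [ -f "$f" ]; then
            touch "$f"
            sleep 1     # keep timestamps distinct on coarse-grained filesystems
        fi
    done
}

# Inputs first, then the files generated from them (assumed, abbreviated list).
touch_in_order aclocal.m4 configure Makefile.in
```

    Files that are missing are skipped, so the same list can be used
 across branches that do not all carry the same generated files.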
 
    This is extremely important for the 'master' and 'gawk-X.Y-stable'
 branches.
 
    Further, the 'gawk' maintainer would argue that it's also important
 for the 'gawk' developers.  When he tried to check out the 'xgawk'
 branch(3) to build it, he couldn't.  (No 'ltmain.sh' file, and he had no
 idea how to create it, and that was not the only problem.)
 
    He felt _extremely_ frustrated.  With respect to that branch, the
 maintainer is no different from Jane User who wants to try to build
 'gawk-4.1-stable' or 'master' from the repository.
 
    Thus, the maintainer thinks that it's not just important, but
 critical, that for any given branch, the above incantation _just works_.
 
    A third reason to have all the files is that without them, using 'git
 bisect' to try to find the commit that introduced a bug is exceedingly
 difficult.  The maintainer tried to do that on another project that
 requires running bootstrapping scripts just to create 'configure' and so
 on; it was really painful.  When the repository is self-contained, using
 'git bisect' in it is very easy.
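
    As a rough sketch of what this can look like, the helper below
 builds and tests one commit and reports the result using the exit
 statuses that 'git bisect run' understands: 0 for a good commit, 125
 for a commit that cannot be tested and should be skipped, and any other
 status from 1 to 127 for a bad commit.  The function name 'bisect_step'
 and the commands passed to it are hypothetical.

```shell
# Hypothetical helper for use with 'git bisect run'.
#   $1 = command that builds the checked-out commit
#   $2 = command that succeeds only if the bug is absent
bisect_step() {
    build_cmd=$1
    test_cmd=$2
    # A commit that cannot be built tells us nothing: ask git to skip it.
    sh -c "$build_cmd" >/dev/null 2>&1 || return 125
    # The build worked; a failing test marks this commit as bad.
    sh -c "$test_cmd" >/dev/null 2>&1 || return 1
    # Build and test both succeeded: this commit is good.
    return 0
}

# Possible use, with the function saved in an untracked file
# (file names, commits, and the test program are placeholders):
#   git bisect start BAD-COMMIT GOOD-COMMIT
#   git bisect run sh -c '. ./bisect-step.sh &&
#       bisect_step "./bootstrap.sh && ./configure && make" \
#                   "./gawk -f /tmp/test.awk"'
```

    Because the generated files are in the repository, the build command
 is the same at every commit visited, which is what keeps the bisection
 simple.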
 
    What are some of the consequences and/or actions to take?
 
   1. We don't mind that there are differing files in the different
      branches as a result of different versions of the autotools.
 
        A. It's the maintainer's job to merge them and he will deal with
           it.
 
        B. He is really good at 'git diff x y > /tmp/diff1 ; gvim
           /tmp/diff1' to remove the diffs that aren't of interest in
           order to review code.
 
   2. It would certainly help if everyone used the same versions of the
      GNU tools as he does, which in general are the latest released
      versions of Automake, Autoconf, 'bison', and GNU 'gettext'.
 
      Installing from source is quite easy.  It's how the maintainer
      worked for years (and still works).  He had '/usr/local/bin' at the
      front of his 'PATH' and just did:
 
           wget https://ftp.gnu.org/gnu/PACKAGE/PACKAGE-X.Y.Z.tar.gz
           tar -xpzvf PACKAGE-X.Y.Z.tar.gz
           cd PACKAGE-X.Y.Z
           ./configure && make && make check
           make install    # as root
 
           NOTE: Because of the 'https://' URL, you may have to supply
           the '--no-check-certificate' option to 'wget' to download the
           file.
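
    To compare the installed tool versions with the ones the maintainer
 uses, something like the following sketch may help.  The function name
 'show_tool_versions' is hypothetical; it relies only on each tool
 accepting the standard GNU '--version' option.

```shell
# Hypothetical helper: print the first line of '--version' output for
# each named tool, or note that the tool is not installed.
show_tool_versions() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s: %s\n' "$tool" "$("$tool" --version 2>/dev/null | head -n 1)"
        else
            printf '%s: not found\n' "$tool"
        fi
    done
}

show_tool_versions automake autoconf bison gettext
```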
 
    Most of the above was originally written by the maintainer to the
 other 'gawk' developers.  It drew the objection from one of them
 "... that anybody pulling down the source from Git is not an end user."
 
    However, this is not true.  There are "power 'awk' users" who can
 build 'gawk' (using the magic incantation shown previously) but who
 can't program in C. Thus, the major branches should be kept buildable
 all the time.
 
    It was then suggested that there be a 'cron' job to create nightly
 tarballs of "the source."  Here, the problem is that there are multiple
 source trees, one for each branch!  So, nightly tarballs aren't the
 answer, especially as the repository can go for weeks without any
 significant change being introduced.
 
    Fortunately, the Git server can meet this need.  For any given branch
 named BRANCHNAME, use:
 
      wget https://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-BRANCHNAME.tar.gz
 
 to retrieve a snapshot of the given branch.
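
    For scripted use, the URL pattern above can be wrapped in a small
 helper.  This is only a sketch; the function name 'snapshot_url' is
 hypothetical, and the URL layout is simply the one shown above.

```shell
# Hypothetical helper: print the snapshot URL for the branch named in $1,
# following the URL pattern shown above.
snapshot_url() {
    printf 'https://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-%s.tar.gz\n' "$1"
}

# Example: fetch a snapshot of the 'master' branch.
#   wget "$(snapshot_url master)"
```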
 
    ---------- Footnotes ----------
 
    (1) We tried.  It was painful.
 
    (2) There is one GNU program that is (in our opinion) severely
 difficult to bootstrap from the Git repository.  For example, on the
 author's old (but still working) PowerPC Macintosh with Mac OS X 10.5,
 it was necessary to bootstrap a ton of software, starting with Git
 itself, in order to try to work with the latest code.  It's not
 pleasant, and especially on older systems, it's a big waste of time.
 
    Starting with the latest tarball was no picnic either.  The
 maintainers had dropped '.gz' and '.bz2' files and distributed only
 '.tar.xz' files.  It was necessary to bootstrap 'xz' first!
 
    (3) A branch (since removed) created by one of the other developers
 that did not include the generated files.