Sunday, January 22, 2012

Make 3.82 can't handle parentheses in prerequisites

GNU Make 3.82, which is the latest official release of GNU Make, behaves unpredictably when you try to run it with the following Makefile:
a : libawesome.a(x.o y.o)
The above Makefile is a perfectly valid Makefile that uses archive member targets. When you run Make with this Makefile, the expected output is:
make: *** No rule to make target `libawesome.a(x.o)', needed by `a'.  Stop.

But when I ran Make 3.82 (compiled from source) under Ubuntu 10.04 it printed:
make: *** No rule to make target `libawesome.a(x.o', needed by `a'.  Stop.

Note the missing right parenthesis in the target name.

The reason for this incorrect behavior is that there is a serious bug in the source code of Make 3.82 in read.c on line 3031 that causes it to read past the end of a string, which means it is probably reading into unallocated memory, which can cause undefined behavior. If you have the ability to compile Make from source, you can prove this by downloading the Make 3.82 source and adding a two-line if statement to read.c above line 3031:

if (! (flags & PARSEFS_NOAR)
          && tp == tmpbuf && tp[0] != '(' && tp[nlen-1] != ')')
        {
          char *n = strchr (tp, '(');
          if (n)
            {
              /* This looks like the first element in an open archive group.
                 A valid group MUST have ')' as the last character.  */
              if (nlen > strlen(p) + 1)
                printf("About to read past end of string!  p = %s, nlen = %d\n", p, nlen);
              const char *e = p + nlen;
              do
                {
                  e = next_token (e);
When you compile and run this modified version of Make with the Makefile above, it should print something like:
About to read past end of string!  p =  y.o), nlen = 16
make: *** No rule to make target `libawesome.a(x.o', needed by `a'.  Stop.
Here's the problem: the pointer p is already pointing to the end of the current token, so adding nlen (the length of the current token) makes no sense and can cause Make to read past the end of the string. The memory after the end of the string is probably unallocated memory, so when the program tries to read from it this is undefined behavior: anything can happen!

The solution to this bug is to remove the "+ nlen". This solution appears to prevent Make from crashing, but it looks like there are still more bugs with archive target groups that are not fixed by this simple change.

The bug was actually fixed a long time ago


This bug was reported by Peter Breitenlohner on July 31st, 2010 as Savannah bug #30612 and fixed by Paul Smith, a maintainer of GNU Make. Unfortunately, there hasn't been a new release of GNU Make since then. Until there is a new release of GNU Make, people can keep using GNU Make 3.81 or they can use the version 3.82-pololu2 which I just released the other day. It contains the fix discussed above and another fix that affects Windows users who have a parenthesis in the path to their shell.

Based on the timestamps at ftp://ftp.gnu.org/gnu/make/ it looks like new release of Make only happen about every 4 years, so it would be a while before we have a new release.


How I found the bug


This was a tricky bug to find. If you're interested in debugging open source C programs you might want to know how I did it.

A while ago, some users of the Wixel SDK were erratically getting a non-sensical error from GNU Make in Windows which was preventing them from building their Wixel apps.

Initially, tracking the bug down was frustrating. I eventually managed to reproduce the bug on my computer, but it was very slippery: sometimes Make would run fine, sometimes it would display the non-sensical error, and sometimes it would just crash with an Access Violation error when trying to read from an invalid address. Almost any perturbation to the environment would cause it to switch randomly between these three behaviors. Deleting an empty folder sitting in the current directory, switching to a different Command Prompt, or piping the output of Make to a file: any of these changes would change the result of running Make, sometimes permanently.

But the most frustrating thing was that Make would never crash when it was running in gdb. I was compiling Make with MinGW with debug options enabled and attempting to debug it with gdb, but it would never crash in gdb so I could not get a stack trace.

Whenever Make did crash, its exception handler would print the address of the code where it crashed and I could use gdb to convert that address to a file name and line number. That turned out to be a dead end because it was crashing in a very simple function that was called from many places (next_token).

I knew there must be a way to get a stack trace in situations like this, so I searched around and found DrMingw by José Fonseca. It was easy to install and use. After downloading it, I extracted it to a folder in "Program Files (x86)", put it on my PATH, ran Command Prompt as Administrator, and ran "drmingw -i" to install it as the Just-In-Time Debugger on my system. I commented out the call to SetUnhandledExceptionFilter in main.c so that the exception would be handled by Windows instead of by Make's own exception handler. I got Make to crash again. When the standard Windows crash dialog came up, I clicked "Debug" to launch DrMingw and it opened a window with a full stack trace with file names and line numbers. Here is an excerpt:
Call stack:
0041A930  gnumake.exe:0041A930  next_token  misc.c:511

 ...
 next_token (const char *s)
 {
>   while (isblank ((unsigned char)*s))
     ++s;
   return (char *)s;
 ...

00415AA8  gnumake.exe:00415AA8  parse_file_seq  read.c:3038

 ...
               do
                 {
>                   e = next_token (e);
                   /* Find the end of this word.  We don't want to unquote and
                      we don't care about quoting since we're looking for the
 ...
I looked in read.c around line 3038 and found that it was a block of code that had to do with parsing archive member targets in make. From there is was pretty easy to find the bug.