David Grayson's blog

Wednesday, March 21, 2012

The Ruby Enumerable module

When programming in the Ruby language, it is essential to know about the Enumerable module. The more Enumerable methods you know, the more concise and elegant your code will be. The Enumerable module is mixed into many common classes such as Array, Hash, and Range. It is also mixed into some not-so-common classes such as Set, Struct, and Enumerator.

The official documentation of Ruby 1.9.3's Enumerable module is good, but I don't like the alphabetical organization. I think you can understand what's going on better if you group the methods into similar categories. To that end, I have created a Ruby 1.9.3 Enumerable module quick reference. It's a one-page sheet that briefly documents ALL of enumerable's methods. Check it out!

Download links:

Ruby 1.9.3 Enumerable module quick reference (PDF)

Ruby 1.9.3 Enumerable module quick reference (ODS)

Let me know if there are any errors or you think it could be better in some way!

Sunday, January 22, 2012

Make 3.82 can't handle parentheses in prerequisites

GNU Make 3.82, which is the latest official release of GNU Make, behaves unpredictably when you try to run it with the following Makefile:

a : libawesome.a(x.o y.o)

The above Makefile is a perfectly valid Makefile that uses archive member targets. When you run Make with this Makefile, the expected output is:

make: *** No rule to make target `libawesome.a(x.o)', needed by `a'.  Stop.

But when I ran Make 3.82 (compiled from source) under Ubuntu 10.04 it printed:

make: *** No rule to make target `libawesome.a(x.o', needed by `a'.  Stop.

Note the missing right parenthesis in the target name.

The reason for this incorrect behavior is that there is a serious bug in the source code of Make 3.82 in read.c on line 3031 that causes it to read past the end of a string, which means it is probably reading into unallocated memory, which can cause undefined behavior. If you have the ability to compile Make from source, you can prove this by downloading the Make 3.82 source and adding a two-line if statement to read.c above line 3031:

if (! (flags & PARSEFS_NOAR)
          && tp == tmpbuf && tp[0] != '(' && tp[nlen-1] != ')')
        {
          char *n = strchr (tp, '(');
          if (n)
            {
              /* This looks like the first element in an open archive group.
                 A valid group MUST have ')' as the last character.  */
              if (nlen > strlen(p) + 1)
                printf("About to read past end of string!  p = %s, nlen = %d\n", p, nlen);
              const char *e = p + nlen;
              do
                {
                  e = next_token (e);

When you compile and run this modified version of Make with the Makefile above, it should print something like:

About to read past end of string!  p =  y.o), nlen = 16
make: *** No rule to make target `libawesome.a(x.o', needed by `a'.  Stop.

Here's the problem: the pointer p is already pointing to the end of the current token, so adding nlen (the length of the current token) makes no sense and can cause Make to read past the end of the string. The memory after the end of the string is probably unallocated memory, so when the program tries to read from it this is undefined behavior: anything can happen!

The solution to this bug is to remove the "+ nlen". This solution appears to prevent Make from crashing, but it looks like there are still more bugs with archive target groups that are not fixed by this simple change.

The bug was actually fixed a long time ago

This bug was reported by Peter Breitenlohner on July 31st, 2010 as Savannah bug #30612 and fixed by Paul Smith, a maintainer of GNU Make. Unfortunately, there hasn't been a new release of GNU Make since then. Until there is a new release of GNU Make, people can keep using GNU Make 3.81 or they can use the version 3.82-pololu2 which I just released the other day. It contains the fix discussed above and another fix that affects Windows users who have a parenthesis in the path to their shell.

Based on the timestamps at ftp://ftp.gnu.org/gnu/make/ it looks like new release of Make only happen about every 4 years, so it would be a while before we have a new release.

How I found the bug

This was a tricky bug to find. If you're interested in debugging open source C programs you might want to know how I did it.

A while ago, some users of the Wixel SDK were erratically getting a non-sensical error from GNU Make in Windows which was preventing them from building their Wixel apps.

Initially, tracking the bug down was frustrating. I eventually managed to reproduce the bug on my computer, but it was very slippery: sometimes Make would run fine, sometimes it would display the non-sensical error, and sometimes it would just crash with an Access Violation error when trying to read from an invalid address. Almost any perturbation to the environment would cause it to switch randomly between these three behaviors. Deleting an empty folder sitting in the current directory, switching to a different Command Prompt, or piping the output of Make to a file: any of these changes would change the result of running Make, sometimes permanently.

But the most frustrating thing was that Make would never crash when it was running in gdb. I was compiling Make with MinGW with debug options enabled and attempting to debug it with gdb, but it would never crash in gdb so I could not get a stack trace.

Whenever Make did crash, its exception handler would print the address of the code where it crashed and I could use gdb to convert that address to a file name and line number. That turned out to be a dead end because it was crashing in a very simple function that was called from many places (next_token).

I knew there must be a way to get a stack trace in situations like this, so I searched around and found DrMingw by José Fonseca. It was easy to install and use. After downloading it, I extracted it to a folder in "Program Files (x86)", put it on my PATH, ran Command Prompt as Administrator, and ran "drmingw -i" to install it as the Just-In-Time Debugger on my system. I commented out the call to SetUnhandledExceptionFilter in main.c so that the exception would be handled by Windows instead of by Make's own exception handler. I got Make to crash again. When the standard Windows crash dialog came up, I clicked "Debug" to launch DrMingw and it opened a window with a full stack trace with file names and line numbers. Here is an excerpt:

Call stack:
0041A930  gnumake.exe:0041A930  next_token  misc.c:511

 ...
 next_token (const char *s)
 {
>   while (isblank ((unsigned char)*s))
     ++s;
   return (char *)s;
 ...

00415AA8  gnumake.exe:00415AA8  parse_file_seq  read.c:3038

 ...
               do
                 {
>                   e = next_token (e);
                   /* Find the end of this word.  We don't want to unquote and
                      we don't care about quoting since we're looking for the
 ...

I looked in read.c around line 3038 and found that it was a block of code that had to do with parsing archive member targets in make. From there is was pretty easy to find the bug.

Friday, November 11, 2011

Crippling Hash Collisions in Rubinius

Last weekend, I was happily and innocently coding away in Ruby, when I noticed that my tests were running one hundred times slower in one Ruby implementation than in another. The different implementations naturally have different weaknesses and strengths in performance, but when you find some code that runs 100 times slower there is probably something going very wrong. I figured out the root cause of this problem and I hope you will find it interesting.

I managed to simplify my code to the simplest possible thing that demonstrates the problem:

P = Struct.new :x, :y, :z
hash = {}
(0..15).each do |x|
  (0..15).each do |y|
    (0..15).each do |z|
      key = P.new x, y, z
      hash[key] = true
    end
  end
end

This code looks pretty innocent, right? If I were on the jury I would say "not guilty".

Here's how long the code took to run in various Ruby implementations on my computer (according to the Linux time command):

Implementation	Run time
ruby 1.9.2p290 (MRI)	0.020 s
rubinius 1.2.4	3.210 s
rubinius 2.0.0dev	0.894 s

Here are some gratuitous SVG bar graphs:

That's right, you need to use a logarithmic scale to even see the run time of this code under MRI.

I was very curious about why Rubinius was so slow, so I used the profiler built in to Rubinius and discovered that most of the time was being spent searching for matching hash entries (Hash::Entry#match?). That's because the hash function in Rubinius 1.2.4 only generated 16 different hash-codes for the 4096 different keys I put into the hash. Every time a new key was added to the hash, there was a hash collision with approximately one sixteenth of the keys already in the hash, so the new key had to be compared to each of them to make sure it wasn't equal to any of the keys already in the hash. Rubinius 2.0.0dev did better, with 576 unique hash-codes. In MRI 1.9.2, all the hash-codes were unique.

If you didn't understand the paragraph above, read on because I'm going to explain how hash tables work.

As we know, the job of a hash table to associate keys to values. The keys and values can be any Ruby object. Imagine the hash table is a city, and the key-value pairs are houses in the city. Whenever you access an element in the hash table, you are visiting one of the houses.

When you want to visit a house (key-value pair) in the city, the first thing you need to do is compute its hash code, which is like the street address of the house. In Ruby, this is done by calling the key's hash method, which returns an integer. The roads in the hash city are designed so that once you know the address of a house, you can quickly find all the houses that have that address.

In our hash city, it possible that two or more houses will have the same address. This is called a hash collision, but unlike other collisions it doesn't leave body parts lying in the middle of the road. The consequence of a hash collision is relatively mild: after you find the houses that have the address you are looking for, you just need to look at each house individually to see whether it is the house you are looking for. This can increase the time it takes to find the right house. This applies equally to both read and write operations, but in our example code we only did write operations.

For example, if you are looking for a house (key-value pair) with address (hash-code) 34, you would quickly walk to the houses that have address 34. You would then look at each house (key) and see if it is the house you are looking for (using the eql? method). If you find the house, you can stop searching, but otherwise you will have to look at all the houses with that address.

There are different ways to generate hash-codes and for a given set of data some of those methods will produce more hash collisions than others.

In Ruby 1.9, the 4096 keys generated in the code above all had different hash-codes so there were no hash collisions.

In Rubinius 1.2.4, because the hash method was implemented differently, the 4096 keys in the code above had only *16* different addresses. Imagine a city with 4096 houses, but only 16 addresses. It would take forever to find anyone's house!

I hope this illuminated something for you. Let me know what you think in the comments!