Monday, August 9, 2010

How does a bug survive for a very long time?

I recently took on the task of giving some TLC to the RT queue for Crypt::SSLeay.

All software has bugs. Some bugs are easier to detect and diagnose than others. I once lost a gig to someone who claimed to write bug free C. When I was asked if I could guarantee that there would be no bugs in my code, I honestly answered No. I did not feel bad about it, because, clearly, the programmer and the project manager deserved each other.

Repeat: All software has bugs. Some bugs are easier to detect and diagnose than others.

One of the bugs for Crypt::SSLeay involved someone using a self-built perl and self-built OpenSSL library. Makefile.PL was setting /home/user/.../openssl-0.9.8l/include/openssl as the include directory to pass to gcc which was resulting in a failure to compile SSLeay.c.

I suspect the person who reported the bug did not have an OpenSSL installation in the system directories because when I tried to build with some older versions of the library, everything seemed to compile fine. Makefile.PL had told me:

BUILD INFORMATION
================================================
ssl library: OpenSSL 0.9.3p-dev in /home/sinan/testing
ssl header:  openssl/ssl.h
libraries:   -L/home/sinan/testing/lib -lssl -lcrypto -lgcc
include dir: -I/home/sinan/testing/include/openssl
================================================

But, programs using the OpenSSL library use:

[sinan@kas]$ cat crypt_ssleay_version.h
#include <openssl/ssl.h>
#include <openssl/crypto.h>
#include <openssl/err.h>
#include <openssl/rand.h>
#include <openssl/pkcs12.h>

Clearly, this would not result in /home/sinan/testing/include/openssl/ssl.h being included.

How do we explain the lack of failures in CPAN Testers then? How do I explain that everything seemed to compile fine?

Well, since the include statements above use <…>, if the include files are not found in the path passed using -I, gcc will happily go and use the system headers from /usr/include (that's where they live on my system). According to the Changes file, the move from using #include "…" to angle brackets was implemented in 0.57_01. I am not sure why the change was made, but it does have the effect of masking a failure to include files in the correct location if there is an OpenSSL installation in system directories.

The way I finally verified 100% that this was the problem was to add an #error directive in /usr/include/openssl/crypto.h:
[sinan@kas]$ head /usr/include/openssl/crypto.h
#error "WRRRRRROOOONGGGGG!!!!"

Which resulted in:

In file included from /usr/include/openssl/bio.h:69:0,
                 from /usr/include/openssl/ssl.h:152,
                 from crypt_ssleay_version.h:1,
                 from SSLeay.xs:25:
/usr/include/openssl/crypto.h:1:2: error: #error "WRRRRRROOOONGGGGG!!!!"

There are a few other reports pointing out the same issue. I'll think about the correct fix and upload a new development release in few days.

How did the bug survive for a long time? Probably because both CPAN Testers and most users have an OS default OpenSSL library. So, even though Makefile.PL picks the wrong directory, /usr/include/openssl (as in almost all the PASS reports from CPAN Testers), it ends up not mattering, because /usr/include is the compiler's default include path.

A Fix

IMHO, a fairly simple fix works here. Replace:

$inc_dir = dirname($version_file);
return unless -e "$inc_dir/ssl.h";

my $prefix;
if ($^O eq 'MSWin32') {
    $inc_dir =~ s{[\\/]openssl\z}{};
    $prefix  = 'openssl/';
}
elsif (index($inc_dir, '/../') > -1) {
    # OpenSSL include directory is in a sibling directory
    $inc_dir =~ s{\/openssl\z}{};
    $prefix  = 'openssl/';
}
else {
    $prefix = ($inc_dir =~ /\bopenssl/i) ? 'openssl/' : '';
}

with

my $prefix;
if ( 'openssl' eq lc basename $inc_dir ) {
    $prefix = 'openssl/';
    $inc_dir = dirname $inc_dir;
}
else {
    $prefix = '';
}

So far, everything seems to be OK after this change in my tests on ArchLinux with various versions of OpenSSL and on Windows with Strawberry Perl and OpenSSL 1.0.0a. However, I would love to hear about it if the change creates any problems.

No comments:

Post a Comment