Tuesday, August 20, 2013

Regarding a note on optimization in Perl in perlperf

I have a habit of regularly browsing through Perl's extensive documentation as new versions come out. It's a way of reminding myself of what I might have forgotten or new information I might not have noticed. This time, as I was reading through perlperf, I noticed the following:

The difference is clear to see and the dereferencing approach is slower. While it managed to execute an average of 628,930 times a second during our test, the direct approach managed to run an additional 204,403 times, unfortunately. Unfortunately, because there are many examples of code written using the multiple layer direct variable access, and it's usually horrible. It is, however, minusculy faster. The question remains whether the minute gain is actually worth the eyestrain, or the loss of maintainability.

You can examine it yourself, but I thought the benchmark was artificially contrived to give an advantage to the $hashref->{this}{that}{and}{the}{other} syntax over the cleaner one. I wanted to see if I could reverse the result by making just a small change.

I noticed that the data structure was holding scores for two players. It makes much more sense to do something arithmetic with two players' scores than just to concatenate them. So, first, I calculated the normalized score for just one player. The ugly syntax still performed better. But, who on earth would normalize the score of just one player?
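The single-player variant is not shown in this post. A minimal sketch of what it might look like, assuming the same two-score structure (the %single hash name and the one-second run time are choices made for this illustration, not part of the original benchmark):

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use Benchmark qw(cmpthese);

my $ref = {
    'ref' => {
        myscore => 101,
        yourscore => 99,
    },
};

# Hypothetical reconstruction: normalize only one player's score.
my %single = (
    'direct' => sub {
        my $x = $ref->{ref}{myscore} /
            ($ref->{ref}{yourscore} + $ref->{ref}{myscore});
    },
    'dereference' => sub {
        # The right-hand side still sees the outer $ref here.
        my $ref = $ref->{ref};
        my $myscore = $ref->{myscore};
        my $yourscore = $ref->{yourscore};
        my $x = $myscore / ($myscore + $yourscore);
    },
);

# Run each variant for at least one CPU second and compare rates.
cmpthese(-1, \%single);
```

Which variant comes out ahead, and by how much, will depend on the machine and the perl version.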

Here's the changed benchmark script:

#!/usr/bin/env perl

use strict;
use warnings;
use Benchmark qw(cmpthese);

my $ref = {
    'ref' => {
        myscore => 101,
        yourscore => 99,
    },
};

cmpthese(-3, {
    'direct' => sub {
        my $x = $ref->{ref}{myscore} /
            ($ref->{ref}{yourscore} + $ref->{ref}{myscore});
        my $y = $ref->{ref}{yourscore} /
            ($ref->{ref}{yourscore} + $ref->{ref}{myscore});
    },
    'dereference' => sub {
        my $ref = $ref->{ref};
        my $myscore = $ref->{myscore};
        my $yourscore = $ref->{yourscore};
        my $x = $myscore / ($myscore + $yourscore);
        my $y = $yourscore / ($myscore + $yourscore);
    },
});

On my aging MacBook Pro, I got the following result:

                 Rate      direct dereference
direct       845835/s          --        -29%
dereference 1184222/s         40%          --

In about ten runs, the dereference version was between 36% and 44% faster.

Now, of course, the fact that we are normalizing scores might mean that we want to return them. So, this version:

cmpthese(-3, {
    'direct' => sub {
        my $x = $ref->{ref}{myscore} /
            ($ref->{ref}{yourscore} + $ref->{ref}{myscore});
        my $y = $ref->{ref}{yourscore} /
            ($ref->{ref}{yourscore} + $ref->{ref}{myscore});
        return [$x, $y];
    },
    'dereference' => sub {
        my $ref = $ref->{ref};
        my $myscore = $ref->{myscore};
        my $yourscore = $ref->{yourscore};
        my $denominator = $myscore + $yourscore;
        my $x = $myscore / $denominator;
        my $y = $yourscore / $denominator;
        return [$x, $y];
    },
});

resulted in:

                Rate      direct dereference
direct      590396/s          --        -22%
dereference 756560/s         28%          --

On the other hand, maybe we want to update the structure itself:

cmpthese(-3, {
    'direct' => sub {
        my $x = $ref->{ref}{myscore} /
            ($ref->{ref}{yourscore} + $ref->{ref}{myscore});
        my $y = $ref->{ref}{yourscore} /
            ($ref->{ref}{yourscore} + $ref->{ref}{myscore});
        $ref->{ref}{myscore} = $x;
        $ref->{ref}{yourscore} = $y;
        return;
    },
    'dereference' => sub {
        my $ref = $ref->{ref};
        my $myscore = $ref->{myscore};
        my $yourscore = $ref->{yourscore};
        my $denominator = $myscore + $yourscore;
        my $x = $myscore / $denominator;
        my $y = $yourscore / $denominator;
        $ref->{myscore} = $x;
        $ref->{yourscore} = $y;
    },
});

which gives us:

               Rate      direct dereference
direct      590219/s          --        -28%
dereference 824560/s         40%          --

I can't help but think neither of these is a reasonable way of writing a little routine that updates scores. For example, I might write it as:

    'reasonable' => sub {
        my $ref = $ref->{ref};
        my $denominator = $ref->{myscore} + $ref->{yourscore};
        $ref->{myscore} /= $denominator;
        $ref->{yourscore} /= $denominator;
        return;
    },

which gives:

                 Rate      direct dereference  reasonable
direct       631583/s          --        -28%        -49%
dereference  876773/s         39%          --        -29%
reasonable  1235334/s         96%         41%          --
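Before reading too much into these rates, it is worth confirming that the three variants actually compute the same thing. A minimal sketch, assuming a hypothetical reset_scores helper (not part of the benchmark) that rebuilds the structure before each call:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

my $ref;

# Hypothetical helper: rebuild the structure so that every variant
# starts from the same pair of scores.
sub reset_scores {
    $ref = { 'ref' => { myscore => 101, yourscore => 99 } };
    return;
}

my %variant = (
    'direct' => sub {
        my $x = $ref->{ref}{myscore} /
            ($ref->{ref}{yourscore} + $ref->{ref}{myscore});
        my $y = $ref->{ref}{yourscore} /
            ($ref->{ref}{yourscore} + $ref->{ref}{myscore});
        $ref->{ref}{myscore} = $x;
        $ref->{ref}{yourscore} = $y;
        return;
    },
    'dereference' => sub {
        my $ref = $ref->{ref};
        my $myscore = $ref->{myscore};
        my $yourscore = $ref->{yourscore};
        my $denominator = $myscore + $yourscore;
        $ref->{myscore} = $myscore / $denominator;
        $ref->{yourscore} = $yourscore / $denominator;
        return;
    },
    'reasonable' => sub {
        my $ref = $ref->{ref};
        my $denominator = $ref->{myscore} + $ref->{yourscore};
        $ref->{myscore} /= $denominator;
        $ref->{yourscore} /= $denominator;
        return;
    },
);

# Each line should show the same pair of normalized scores.
for my $name (sort keys %variant) {
    reset_scores();
    $variant{$name}->();
    printf "%-12s %s %s\n", $name, @{ $ref->{ref} }{qw(myscore yourscore)};
}
```

Note that the in-place benchmarks never reset the structure. After the first call the two scores already sum to 1 (up to rounding), so every later iteration divides values that are already normalized.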

Of course, now I have broken another piece of advice from perlperf:

the best advice comes from the renowned Japanese Samurai, Miyamoto Musashi, who said:

Do Not Engage in Useless Activity

in 1645.

Also, note that having all the routines reach for a variable in the enclosing file scope is not a good idea. In real code, the structure would be either passed as an argument or obtained through another call. Simulating the latter with a closure that returns a fresh structure on each call:

#!/usr/bin/env perl

use strict;
use warnings;
use Benchmark qw(cmpthese);

my $gimme = sub {{
    'ref' => {
        myscore => 101,
        yourscore => 99,
    },
}};

cmpthese(-3, {
    'direct' => sub {
        my $ref = $gimme->();
        my $x = $ref->{ref}{myscore} /
            ($ref->{ref}{yourscore} + $ref->{ref}{myscore});
        my $y = $ref->{ref}{yourscore} /
            ($ref->{ref}{yourscore} + $ref->{ref}{myscore});
        $ref->{ref}{myscore} = $x;
        $ref->{ref}{yourscore} = $y;
        return;
    },
    'dereference' => sub {
        my $ref = $gimme->()->{ref};
        my $myscore = $ref->{myscore};
        my $yourscore = $ref->{yourscore};
        my $denominator = $myscore + $yourscore;
        my $x = $myscore / $denominator;
        my $y = $yourscore / $denominator;
        $ref->{myscore} = $x;
        $ref->{yourscore} = $y;
        return;
    },
    'reasonable' => sub {
        my $ref = $gimme->()->{ref};
        my $denominator = $ref->{myscore} + $ref->{yourscore};
        $ref->{myscore} /= $denominator;
        $ref->{yourscore} /= $denominator;
        return;
    },

});

we get:

                Rate      direct dereference  reasonable
direct      250558/s          --        -19%        -28%
dereference 308861/s         23%          --        -11%
reasonable  347251/s         39%         12%          --
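The other option mentioned earlier, passing the structure as an argument, might look like the following. This is only a sketch: the normalize_scores name and the zero-denominator guard are assumptions added for illustration and were not part of any of the benchmarked routines.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: normalize the two scores in place.
sub normalize_scores {
    my ($scores) = @_;
    my $denominator = $scores->{myscore} + $scores->{yourscore};
    return unless $denominator;    # nothing to do if both scores are 0
    $scores->{$_} /= $denominator for qw(myscore yourscore);
    return;
}

my $game = { 'ref' => { myscore => 101, yourscore => 99 } };

# Dereference once at the call site; the routine only sees the inner hash.
normalize_scores($game->{ref});
printf "%.3f %.3f\n", @{ $game->{ref} }{qw(myscore yourscore)};
```

The routine no longer depends on a variable outside itself, which makes it easier to test and to reuse.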

What's the point? I would say don't pay too much attention to this particular bit of advice. Write readable code in short horizontal lines. I hate horizontal scrolling.

And, please, don't say minusculy.
