Sunday, May 20, 2012

Updating all outdated Perl modules

The following is probably not the kind of procedure one would follow in a well organized environment.

I, on the other hand, have a handful of computers, each of which has a bunch of perls installed. Most of those might be activated (via perlbrew) once in a while. That means I sometimes find that I need to make sure everything is up-to-date before checking a bunch of things.

Now, with ActivePerl, you get ppm, and it is pretty convenient.

But then, on the same laptop, I also have a couple of Strawberry Perl distributions, as well as perls I built from scratch using the free Microsoft compilers.

Then, we have the Mac, with the various perls installed via MacPorts, and, also, via perlbrew. Then, there are two netbooks with ArchLinux on them, and the dual boot partition on my main laptop.

I think you get the picture. Nothing is mission critical. Each Perl distribution has a bunch of modules I installed using cpanm at some point.

I'd rather try to upgrade all the modules/distributions I have for one specific perl at once: If anything is messed up, I am not averse to nuking that perl and everything that came with it, and starting from scratch. It hasn't happened yet, but if it does, I doubt I'll spend more time dealing with that than I would have fiddling with all the individual environments in turn.

Most of the time, I can just type a couple of commands and leave:

$ cpanm --self-upgrade
$ cpan-outdated |cpanm

If you want to inspect the CHANGES for everything that is outdated, you can use cpan-outdated |cpan-listchanges (hat tip Daniel Lavelle who seems to be much more disciplined about the whole thing ;-)

cpan-outdated and cpan-listchanges are helpful on their own, even if you don't go the "not a care in the world" route of cpan-outdated |cpanm.

Saturday, May 12, 2012

A file download CGI script in Perl

As "Why is my image download CGI script written in Perl not working?" on Stackoverflow (and the SitePoint article which the OP got his inspiration from) show, it seems the mechanics of CGI are still a mystery. In this post, I am going to use CGI.pm instead of my favorite, CGI::Simple, because the former has been in the core since perl 5.004, but I do recommend CGI::Simple because it does not include all the HTML generation cruft CGI.pm carries around for compatibility reasons.

There are important considerations in a CGI script used for file downloads. One important reason for using such a script is to make sure certain files can only be served to certain visitors. Assuming the files are not generated on the fly, one must ensure that arbitrary files on the server are not exposed to the outside world. Therefore, the file download script should not allow the path of a file to be passed in as a CGI parameter. Instead, there should be a defined mapping between the values of a query parameter and the paths of files eligible to be served. For example, if you are going to serve a bunch of files from the vacation, work, and school directories under C:\web\static\photos, don't have a query parameter corresponding to the relative path to the image file (e.g., download.pl?image=vacation%2Fphoto01.jpg), but rather use two query parameters (e.g., download.pl?section=vacation&image=photo01.jpg). For your own sanity, limit the set of characters that can be used for directory and image file names. You control the server, right? Just because a whole bunch of characters can be used in file names, it doesn't mean you can't restrict what you're willing to handle.
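As a minimal sketch of that idea, the lookup might go something like the following. The helper name resolve_photo, the relative base directory, and the file name pattern are all made up for illustration. The section has to be in a fixed whitelist, and the image name has to match a deliberately narrow pattern before any path is built:

```perl
use strict;
use warnings;
use File::Spec::Functions qw( catfile );

# Hypothetical base directory and section whitelist, for illustration only.
my $PHOTO_DIR = 'photos';
my %SECTION   = map { $_ => 1 } qw( vacation work school );

# Returns a path under $PHOTO_DIR, or undef if either value is unacceptable.
sub resolve_photo {
    my ($section, $image) = @_;

    return unless defined $section and $SECTION{$section};
    return unless defined $image
        and $image =~ /\A [a-z0-9][a-z0-9_-]* \. (?: jpg | png | gif ) \z/x;

    return catfile($PHOTO_DIR, $section, $image);
}

for my $try ([ vacation => 'photo01.jpg' ], [ vacation => '../../etc/passwd' ]) {
    my $path = resolve_photo(@$try);
    print defined $path ? "serve $path\n" : "reject @$try\n";
}
```

Since neither value can contain a slash, a backslash, or a leading dot, there is no way for a request to climb out of the base directory.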

This is not an exhaustive list of things to watch out for when writing a safe and robust script. While the WWW Security FAQ may seem dated, its advice is still relevant today, and gives you an idea of what you're dealing with.

So, here we go. First, the shebang line:

#!/opt/www/perl/bin/perl

It is true that perl is installed as part of the system on Unixy systems, but you should not use the system perl for your web servers. Instead, use perlbrew and cpanm to manage a dedicated Perl environment for the web server. This will make your life easier in the long run by decoupling the management of the web server's environment from the system's package manager.

If you're using Apache on Windows, you can use the ScriptInterpreterSource directive to specify a different location than the one that appears in the shebang line of the script.

use strict;
use warnings;

strict and warnings will save you a lot of headaches. Without them, things may go wrong without any outward indication.

use CGI;

I do not recommend using use CGI::Carp qw( fatalsToBrowser ). CGI::Carp is a fine and useful module, but it is better for you to learn to find, access, read, and interpret your web server's error logs.

use File::Copy qw( copy );
use File::Spec::Functions qw( catfile );

Path::Class is nicer, but File::Spec is in the core. catfile is how you concatenate components of a file path without trying to remember what happens to "C:\Users" (of course, you could always use 'C:\Users' or "C:/Users", but I still prefer treating paths as paths rather than plain strings.)
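To make the "C:\Users" remark concrete: inside a double-quoted string, \U is the escape that upper-cases everything after it, so the backslash and the U both vanish. Here is a small sketch. File::Spec::Win32 is called directly only so that the result is the same on any platform, and the file name is made up:

```perl
use strict;
use warnings;
use File::Spec::Win32;

my $mangled = "C:\Users";     # \U upper-cases the rest of the string
my $literal = 'C:\Users';     # single quotes leave the backslash alone

print "$mangled\n";           # prints C:SERS
print "$literal\n";           # prints C:\Users

# Build the path from components instead of gluing strings together.
my $path = File::Spec::Win32->catfile('C:\Users', 'me', 'notes.txt');
print "$path\n";              # prints C:\Users\me\notes.txt
```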

use constant IMG_DIR => catfile(qw(
    E:\ srv localhost images
));

It is important for the top level directory to be defined independently of user input, and for your script not to be able to traverse outside of this directory. Otherwise, an attacker may access vital system files.

serve_logo(IMG_DIR);

We call a function to do the actual serving of the file. This makes sure you don't inadvertently use global variables and also makes transitioning to a persistent environment much simpler.

sub serve_logo {
    my ($dir) = @_;

    # The mapping of CGI request parameter values to 
    # actual filenames. This also gives you the flexibility
    # to change the filenames without changing URLs in links.
    # In real life, the mapping may come from a database or
    # configuration file.

    my %mapping = (
        'big' => 'logo-1600x1200px.png',
        'medium' => 'logo-800x600.png',
        'small' => 'logo-400x300.png',
        'thumb' => 'logo-200x150.jpg',
        'icon' => 'logo-32x32.gif',
    );

    my $cgi = CGI->new;

    my $file = $mapping{ $cgi->param('which') };
    defined ($file)
        or die "Invalid image name in CGI request\n";

    # The components of the path are "known good" at
    # this point.
    send_file($cgi, $dir, $file);

    return;
}

sub send_file {
    my ($cgi, $dir, $file) = @_;

    my $path = catfile($dir, $file);

    open my $fh, '<:raw', $path
        or die "Cannot open '$path': $!";

    print $cgi->header(
        -type => 'application/octet-stream',
        -attachment => $file,
    );

    binmode STDOUT, ':raw';

    copy $fh => \*STDOUT, 8_192;

    close $fh
        or die "Cannot close '$path': $!";

    return;
}

Note that I do not bother with fancy error pages etc. You can set up a 500 handler in your web server's configuration. The error messages emitted with die go to your web server's log files, where they belong, and are only viewable by you as opposed to the whole wide world.

Here is the entire script in one chunk:

#!/opt/www/perl/bin/perl

use strict;
use warnings;

use CGI;
use File::Copy qw( copy );
use File::Spec::Functions qw( catfile );

use constant IMG_DIR => catfile(qw(
    E:\ srv localhost images
));

serve_logo(IMG_DIR);

sub serve_logo {
    my ($dir) = @_;

    my %mapping = (
        'big' => 'logo-1600x1200px.png',
        'medium' => 'logo-800x600.png',
        'small' => 'logo-400x300.png',
        'thumb' => 'logo-200x150.jpg',
        'icon' => 'logo-32x32.gif',
    );

    my $cgi = CGI->new;

    my $file = $mapping{ $cgi->param('which') };
    defined ($file)
        or die "Invalid image name in CGI request\n";

    send_file($cgi, $dir, $file);

    return;
}

sub send_file {
    my ($cgi, $dir, $file) = @_;

    my $path = catfile($dir, $file);

    open my $fh, '<:raw', $path
        or die "Cannot open '$path': $!";

    print $cgi->header(
        -type => 'application/octet-stream',
        -attachment => $file,
    );

    binmode STDOUT, ':raw';

    copy $fh => \*STDOUT, 8_192;

    close $fh
        or die "Cannot close '$path': $!";

    return;
}

Thursday, May 3, 2012

Generate pretty weekly schedule charts using HTML::Template

This is another post inspired by a question on Stackoverflow.

The task is to take textual schedule information such as

0 24000 97200
1 52200 95400
2 0 0
3 37800 180000
4 0 0
5 48000 95400
6 0 0

and turn it into an overview of the busy and free times over the course of the week. The end result of my attempt looks like this:

In the input data, the first number is the day of the week (Sunday = 0), the second number is the beginning of busy time in seconds since midnight, and the third number is the duration of the task in seconds. If a task does not fit in a given day, it spills over to the following day(s) until it's done.
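To make the spill-over rule concrete, here is a small sketch that computes how many seconds of a task run past midnight. The helper name overhang is mine and is not part of the script further down. For the first input line, a task starting at 24000 and lasting 97200 seconds leaves 34800 seconds for Monday:

```perl
use strict;
use warnings;

use constant ONE_DAY => 24 * 60 * 60;   # 86400 seconds

# Seconds of a task that do not fit in the day on which it starts.
sub overhang {
    my ($start, $duration) = @_;
    my $end = $start + $duration;
    return $end > ONE_DAY ? $end - ONE_DAY : 0;
}

printf "%d\n", overhang(24_000, 97_200);   # prints 34800
printf "%d\n", overhang(48_000, 10_000);   # prints 0
```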

I used div elements appropriately floated and given a height and width to contain free and busy blocks for each day. Clearly, this is not very semantic: Converting it to an ordered list with title attributes on each block displaying the time periods is left as an exercise to the reader (mostly because I am afraid I'll end up wasting a lot of time styling those lis).

I used my old standby, HTML::Template to generate the HTML. It is pretty standard fare and nicely separates the data from presentation (however, I used tpage from Template-Toolkit to generate the escaped HTML included in this post :-)

<!doctype html>
<html>
<head>
<title>Pretty Schedule Example</title>

<style type="text/css">
    .row .day { margin:0; padding:0; padding-right:1%; width:9% }
    .row .container { width: 90% }

    .row,
    .container,
    .container .busy,
    .container .free
    {
        height:1.5em;
        margin:0; 
        padding:0;
        overflow:hidden;
        white-space:nowrap;
    }

    .row .container,
    .row .day, 
    .container .busy,
    .container .free { float:left }

    .container .free { background-color:#f0f0a0 }
    .container .busy { background-color:#70b070 }

    .row {
        margin-bottom:0.125em;
        max-width:600px;
        width:100%;
    }

    .clear { clear:both }
</style>
</head>
<body>
<TMPL_LOOP DAYS>
<div class="row clear">
<div class="day"><TMPL_VAR DAY></div>
<div class="container">
<TMPL_LOOP BLOCKS>
<div class="<TMPL_VAR CLASS>" style="width:<TMPL_VAR WIDTH>"></div>
</TMPL_LOOP>
</div>
<br class="clear">
</div>
</TMPL_LOOP>
</body>
</html>

The Perl side is also relatively straightforward (and, as a self-contained example, reads the schedule data from the __DATA__ section):

#!/usr/bin/env perl

use strict; use warnings;
use HTML::Template;

use constant ONE_MINUTE => 60;
use constant ONE_HOUR   => 60 * ONE_MINUTE;
use constant ONE_DAY    => 24 * ONE_HOUR;

my @days = qw(Sun Mon Tue Wed Thu Fri Sat);

my $remainder = 0;
my @rows;

while (my $line = <DATA>) {
    next unless $line =~ m{
        \A
        ( [0-6]  ) \s+
        ( [0-9]+ ) \s+
        ( [0-9]+ ) \s*
        \z
    }x;
    my ($daynum, $start, $duration) = ($1, $2, $3);

    my $dayrow = make_dayrow(
        $days[$daynum],
        $remainder,
        $start,
        $duration,
    );

    push @rows, $dayrow->[0];
    $remainder = $dayrow->[1];
}

my $tmpl = HTML::Template->new(filename => 'pretty-schedule.html');
$tmpl->param(
    DAYS => \@rows
);

print $tmpl->output;

sub make_dayrow {
    my ($day, $remainder, $start, $duration) = @_;
    my $row = {DAY => $day};

    if ($remainder > ONE_DAY) {
        $row->{BLOCKS} = [
            { CLASS => 'busy', WIDTH => '100%' }
        ];
        return [$row, $remainder - ONE_DAY];
    }

    my @blocks;
    my $hang = $start + $duration > ONE_DAY
             ? $duration - (ONE_DAY - $start)
             : 0
             ;

    push @blocks, {
        CLASS => 'busy',
        WIDTH => format_width($remainder),
    } if $remainder > 0;

    if ($start > $remainder) {
        push @blocks, {
            CLASS => 'free',
            WIDTH => format_width($start - $remainder),
        }, {
            CLASS => 'busy',
            WIDTH => format_width($duration - $hang),
        };
    }

    unless ($hang) {
        my $taken = $start > $remainder ? $start : $remainder;
        $taken += $duration;
        my $leftover = ONE_DAY - $taken;
        if ($leftover > 0) {
            push @blocks, {
                CLASS => 'free',
                WIDTH => format_width($leftover),
            };
        }
    }

    $row->{BLOCKS} = \@blocks;
    return [$row, $hang];
}

sub format_width {
    my ($width) = @_;
    return sprintf('%.6f%%', 100 * ($width / ONE_DAY));
}

__DATA__
0 24000 97200
1 52200 95400
2 0 0
3 37800 180000
4 0 0
5 48000 95400
6 0 0

Enjoy!


Friday, April 27, 2012

How do you pass the PostData argument to the Navigate method of IWebBrowser2 using Win32::OLE?

Some years ago, well before Selenium made its debut, I wrote a program that scraped some information from a U.S. government web site. It was one of those "Made for Internet Explorer 6" monstrosities with 1 - 3 MB ViewStates embedded in each page. Most of the crucial actions could only be initiated via JavaScript from within the browser which made the usual Mechanize solution unworkable. Sure, I could have tried to capture and decipher the traffic using the Web Scraping Proxy, but my first few attempts were extremely disheartening.

The scraper was going to run on Windows workstations anyway, so I figured, I could just use Win32::OLE to control Internet Explorer: The approach worked, and I wrote a monstrosity that collected data fast enough to help Cornell Extension staff guide seniors through the process of choosing among Medicare Part D plans.

A few days ago, a question on Stackoverflow titled How can I read a https page's content using Perl on Windows, without installing OpenSSL? caught my attention. Now, to be honest, if either Crypt::SSLeay or IO::Socket::SSL etc are installed automatically with whatever distribution the OP is using, the whole issue is moot.

But, I was too lazy to investigate that (e.g., I know Strawberry Perl includes Crypt::SSLeay, but I wasn't motivated enough to go through the whole "discovery" process with the OP).

No, I was suddenly nostalgic for the pain of the days of controlling 16 parallel Internet Explorer instances on each machine, and I realized, I had never figured out how to do a POST using IWebBrowser2.

It turns out, it is quite straightforward: You just have to include a PostData argument to the invocation. Except …

The post data specified by PostData is passed as a SAFEARRAY Data Type structure. The VARIANT should be of type VT_ARRAY|VT_UI1 and point to a SAFEARRAY Data Type. The SAFEARRAY Data Type should be of element type VT_UI1, dimension one, and have an element count equal to the number of bytes of post data.

Well … OK then. How do I do that in Perl?

That is, given:

    my $html = $poster->post(
        'http://test.localdomain:8080/cgi-bin/showcgi.pl',
        [
            [ var1 => 'Yağmur', ],
            [ var2 => 'Øl']
        ],
    );

what do I do with that second argument so that, by the time it's passed on to IWebBrowser2's Navigate method, it is in the appropriate format?

Here are the steps:

  1. Make a single string:

    my $postdata = join '&', map join('=', @$_), @$data;

  2. Encode that string into octets:

    $postdata = encode('UTF-8', $postdata);

  3. Create a Variant to hold that:

    my $vPostData = Variant(VT_ARRAY|VT_UI1, length $postdata);

  4. Put $postdata in the Variant:

    $vPostData->Put($postdata);

  5. Invoke Navigate:

    $ie->Navigate(
          $url,
          $flags,
          '_self',
          $vPostData,
          "Content-Type: application/x-www-form-urlencoded\015\012",
    );

I deciphered that last part thanks to this Microsoft KB article: How To Use the PostData Parameter in WebBrowser Control.
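One thing the steps above do not do is escape the values: the pairs are joined as-is, which is fine for the test data but would break if a value contained & or =. A minimal sketch of a percent-encoding variant follows, using only the core Encode module. The helper names escape and build_postdata are hypothetical. The resulting string would then go into the Variant exactly as in steps 3 and 4:

```perl
use strict;
use warnings;
use utf8;                       # this source contains literal non-ASCII text
use Encode qw( encode );

# Percent-encode every octet that is not an RFC 3986 "unreserved" character.
sub escape {
    my ($text) = @_;
    my $octets = encode('UTF-8', $text);
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $octets;
}

# Turn [ [ name => value ], ... ] into a form-urlencoded request body.
sub build_postdata {
    my ($data) = @_;
    my @pairs;
    for my $pair (@$data) {
        push @pairs, join '=', map { escape($_) } @$pair;
    }
    return join '&', @pairs;
}

print build_postdata([ [ var1 => 'Yağmur' ], [ var2 => 'Øl' ] ]), "\n";
# prints var1=Ya%C4%9Fmur&var2=%C3%98l
```

The result is plain ASCII, so length still gives the number of bytes the Variant needs to hold.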

Here's a short script demonstrating the technique. In my case, the target, showcgi.pl, just creates a CGI::Simple and prints the output of $cgi->Dump.

#!/usr/bin/env perl

package My::Poster;

use strict; use warnings;
use Const::Fast;
use Encode;
use Try::Tiny;
use Win32::OLE;
use Win32::OLE::Variant;
local $Win32::OLE::Warn = 3;

# http://msdn.microsoft.com/en-us/library/aa768360%28v=vs.85%29.aspx
const my %BrowserNavConstants => (
    navOpenInNewWindow => 0x1,
    navNoHistory => 0x2,
    navNoReadFromCache => 0x4,
    navNoWriteToCache => 0x8,
    navAllowAutosearch => 0x10,
    navBrowserBar => 0x20,
    navHyperlink => 0x40,
    navEnforceRestricted => 0x80,
    navNewWindowsManaged => 0x0100,
    navUntrustedForDownload => 0x0200,
    navTrustedForActiveX => 0x0400,
    navOpenInNewTab => 0x0800,
    navOpenInBackgroundTab => 0x1000,
    navKeepWordWheelText => 0x2000,
    navVirtualTab => 0x4000,
    navBlockRedirectsXDomain => 0x8000,
    navOpenNewForegroundTab => 0x10000,
);

sub new {
    my $class = shift;
    my $self = bless {} => $class;
    $self->init;
    return $self;
}

sub ie {
    my $self = shift;
    my $ie = shift;

    return $self->{ie} unless defined $ie;

    $self->{ie} = $ie;
    return;
}

sub init {
    my $self = shift;

    $self->ie(
        Win32::OLE->new(
            'InternetExplorer.Application',
            sub {
                my $ie = shift;
                try { $ie->Quit if $ie } catch { warn "$_\n" };
            },
        )
    );
    return;
}

sub post {
    my $self = shift;
    my ($url, $data) = @_;

    my $ie = $self->ie;

    my $flags = $BrowserNavConstants{navNoHistory} |
                $BrowserNavConstants{navNoReadFromCache} |
                $BrowserNavConstants{navNoWriteToCache} |
                $BrowserNavConstants{navEnforceRestricted} |
                $BrowserNavConstants{navNewWindowsManaged} |
                $BrowserNavConstants{navUntrustedForDownload} |
                $BrowserNavConstants{navBlockRedirectsXDomain}
    ;

    my $postdata = join '&', map join('=', @$_), @$data;
    $postdata = encode('UTF-8', $postdata);

    my $vPostData = Variant(VT_ARRAY|VT_UI1, length $postdata);
    $vPostData->Put($postdata);

    # http://msdn.microsoft.com/en-us/library/aa752133%28v=vs.85%29.aspx
    $ie->Navigate(
        $url,
        $flags,
        '_self',
        $vPostData,
        "Content-Type: application/x-www-form-urlencoded\015\012",
    );

    sleep 1 until $ie->{ReadyState} == 4;
    return $ie->Document->documentElement->innerHTML;
}

sub DESTROY {
    my $self = shift;
    try { $self->ie->Quit } catch { warn "$_\n" };
    return;
}

package main;

use strict; use warnings;

my $poster = My::Poster->new;

my $html = $poster->post(
    'http://test.localdomain:8080/cgi-bin/showcgi.pl',
    [
        [ var1 => 'Yağmur', ],
        [ var2 => 'Øl']
    ],
);

print $html if defined $html;

PS: I have no idea how well this works with UAC in Windows versions after XP SP3, so take it with a grain of salt. In my case, with local $CGI::Simple::PARAM_UTF8 = 1; in the CGI script, I got the expected output:

$VAR1 = bless( {
                 '.parameters' => [
                                    'var1',
                                    'var2'
                                  ],
                 '.globals' => {
                                 'DEBUG' => 0,
                                 'NO_UNDEF_PARAMS' => 0,
                                 'NO_NULL' => 1,
                                 'FATAL' => -1,
                                 'USE_PARAM_SEMICOLONS' => 0,
                                 'PARAM_UTF8' => 1,
                                 'DISABLE_UPLOADS' => 1,
                                 'USE_CGI_PM_DEFAULTS' => 0,
                                 'NPH' => 0,
                                 'POST_MAX' => 102400,
                                 'HEADERS_ONCE' => 0
                               },
                 'var1' => [
                             "Ya\x{c4}\x{9f}mur"
                           ],
                 '.fieldnames' => {
                                    'var1' => 1,
                                    'var2' => 1
                                  },
                 'var2' => [
                             "\x{c3}\x{98}l"
                           ],
                 '.crlf' => '
',
                 '.header_printed' => 1
               }, 'CGI::Simple' );

Saturday, April 21, 2012

My favorite editor is Vim

When I tell people that my favorite editor is Vim, they usually expect me to be familiar with various Vim wizardry.

I am not.

For the most part, I like Vim because it provides the best cross-platform syntax highlighting I have been able to find, and it doesn't take a lot of IQ to do basic editing tasks.

Most of the time, I do as much work in Vim as I would have done in Notepad or Nano to accomplish mildly time consuming tasks. It takes habit to get used to various combinations of modes and commands, and I have never invested the time into making anything beyond the mundane a habit.

I recently decided to change that after coming across an early version of Practical Vim by Drew Neil.

It looked interesting enough that I decided to part with $20 for the e-book. As brian so clearly explained, the real question when you buy a book is not if it is worth the cover price, but, rather, whether the time spent studying it will add enough to your productivity to make it worthwhile.

As much as I like Vim's documentation, sometimes you need an appealing step-by-step guide to practicing even the things you know well. I am happy to report that Practical Vim has not disappointed me yet, as I quickly went through Part I.

If this book helps me avoid wasting 30% of the time I currently waste hunting through the documentation whenever I need something more than regular editing, it will have amply paid for the time I invested in it.

If you use Vim at all, or if you think your current editor is not all it could be, you owe it to yourself to at least take a look at this book, and consider learning Vim.

PS: If you are able to open a file in Emacs, make a couple of changes, save it in a different directory, and close the program without any help, or stumbles, you've already established that you are at least 1,000 times smarter than I am and there is no reason to rub it in ;-)


Monday, April 16, 2012

HTML::TableExtract is beautiful

And, it will help you save time and make money ;-)

I was motivated to post this because of another one of those Stackoverflow questions. I decided at the outset not to answer that question because the poster basically wants a job done for him for free:

I need the script to get the HTML, parse the table then to save the content (User + Online time), I would also want it to run every 15 mins and to make a report in the end of the day.

However, a so-called answer stated:

in my opinion perl can get a little ugly.

does it need to be perl....if it does ot i would recommend python.

Of course, I am kinda used to people proclaiming Perl sucks, but the supreme irony of the ugliness of the post asserting Perl's ugliness motivated me.

HTML::TableExtract is beautiful. Over the years, it has saved me a lot of time, and even helped me make some money.

So, consider the Personal Income table available from the Bureau of Economic Analysis.

Let's say I want to get the Unemployment Insurance row out of that table. Here's how you do it using HTML::TableExtract:

#!/usr/bin/env perl

use strict; use warnings;
use HTML::TableExtract;

my $te = HTML::TableExtract->new(
    attribs => { id => 'tbl' },
);

# local copy of
# http://bea.gov/iTable/iTableHtml.cfm?reqid=9&step=3&isuri=1&903=58

$te->parse_file('personal-income.html');

my ($table) = $te->tables;

for my $row ($table->rows) {
    my (undef, $label, @row) = @$row;
    next unless defined $label;
    if ($label eq 'Unemployment insurance') {
        print "$label\t@row\n";
    }
}

And, here is the output:

C:\temp> uu
Unemployment insurance 101.1 127.9 144.8 148.7 152.8 137.4 135.8 128.7 117.5 108.8 103.0 100.1

Of course, things can be refined, but this is pretty beautiful.

Wednesday, April 11, 2012

Can Parallel::ForkManager speed up a seemingly IO bound task?

This post is motivated by a question on Stackoverflow titled Faster way to do “perl -anle 'print $F[1]' *manyfiles* > result” ('cut' fails). I cautiously mentioned that it might help to read files in parallel. brian d foy emphasized that it might not do much given that the task seems to be IO bound. I admit I don't understand much about filesystem caches, but I thought reading input in parallel might help utilize them better. So, I decided to test if that made sense. My preliminary check on Windows seemed to show using Parallel::ForkManager with two processes resulted in the run time being reduced by 40%. But, we all know that Windows is a little funky when it comes to forking, so I rebooted into Linux, and decided to try there.

The tests were run on my aging laptop with an ancient Core Duo processor and 2 GB of physical memory. No, I still haven't replaced it, although I do also use a newer Mac. Both perls were 5.14.2. The Windows system was XP Professional SP3 and the Linux system has ArchLinux with the latest updates. I am only going to show results from runs on Linux below.

First, I generated ten files with 1,000,000 lines each using the following short script:

#!/usr/bin/env perl

use strict; use warnings;

for (1 .. 1_000_000) {
    my $str;
    if (0.2 > rand) {
        $str .= ' ' x rand(10);
    }
    $str .= 'a' x 20 . ' ' . 'a' x 20;
    print $str, "\n";
}

I then used the following script to read all the files and capture the second field:

#!/usr/bin/env perl

use strict; use warnings;

use Parallel::ForkManager;

my ($maxproc) = @ARGV;
my @files = ('01' .. '10');

my $pm = Parallel::ForkManager->new($maxproc);

for my $file (@files) {
    my $pid = $pm->start and next;
    my $ret = open my $h, '<', $file;

    unless ($ret) {
        warn "Cannot open '$file': $!";
        $pm->finish;
    }

    while (my $line = <$h>) {
        next unless $line =~ /^\s*\S+\s+(\S+)/;
        print "$1\n";
    }

    $pm->finish;
}

$pm->wait_all_children;

Here are the results:

# sync
# echo 3 > /proc/sys/vm/drop_caches 
$ /usr/bin/time -f '%Uuser %Ssystem %Eelapsed %PCPU' ./process.pl 1 > output
24.44user 0.93system 0:29.08elapsed 87%CPU

$ rm output
# sync
# echo 3 > /proc/sys/vm/drop_caches 
$ /usr/bin/time -f '%Uuser %Ssystem %Eelapsed %PCPU' ./process.pl 2 > output
24.95user 0.91system 0:18.31elapsed 141%CPU

$ rm output
# sync
# echo 3 > /proc/sys/vm/drop_caches 
$ /usr/bin/time -f '%Uuser %Ssystem %Eelapsed %PCPU' ./process.pl 4 > output
24.70user 0.88system 0:17.45elapsed 146%CPU

$ rm output 
# sync
# echo 3 > /proc/sys/vm/drop_caches 
$ /usr/bin/time -f '%Uuser %Ssystem %Eelapsed %PCPU' ./process.pl 1 > output
25.31user 0.95system 0:29.72elapsed 88%CPU

The results were consistent through the handful of runs I tried.

So, even if the task is IO-bound, it may pay to utilize all the cores you have.
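For the record, here is how the elapsed times above compare. This is just arithmetic on the numbers reported by time, wrapped in a throwaway helper (reduction is my name for it), using the first single-process run as the baseline:

```perl
use strict;
use warnings;

# Percentage by which the elapsed time dropped relative to the baseline.
sub reduction {
    my ($baseline, $elapsed) = @_;
    return 100 * (1 - $elapsed / $baseline);
}

my $baseline = 29.08;                       # seconds, one process
my %elapsed  = (2 => 18.31, 4 => 17.45);    # seconds, two and four processes

for my $procs (sort { $a <=> $b } keys %elapsed) {
    printf "%d processes: %.1f%% less elapsed time, %.2fx speedup\n",
        $procs,
        reduction($baseline, $elapsed{$procs}),
        $baseline / $elapsed{$procs};
}
```

That works out to roughly 37% less wall-clock time with two processes and 40% with four. It is in line with what I saw on Windows, and going beyond the two cores of the machine adds very little.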