A file download CGI script in Perl

As Why is my image download CGI script written in Perl not working? on Stackoverflow (and the SitePoint article which the OP got his inspiration from) show, it seems the mechanics of CGI are still a mystery. In this post, I am going to use CGI.pm, instead of my favorite, CGI::Simple because the former has been in the core since perl 5.004, but I do recommend CGI::Simple because it does not include all the HTML generation cruft CGI.pm carries around for compatibility reasons.

There are important considerations in a CGI script used for file downloads. One important reason for using such a script is to make sure certain files can only be served to certain visitors. Assuming the files are not generated on the fly, one must ensure that arbitrary files on the server are not exposed to the outside world. Therefore, the file download script should not allow the path of a file to be passed in as a CGI script. Instead, there should be a defined mapping between the values of a query parameter and the paths of files eligible to be served. For example, if you are going to serve a bunch of files from the vacation, work, and school directories under C:\web\static\photos, don’t have a query parameter corresponding to the relative path to the image file (e.g. download.pl?image=vacation%2Fphoto01.jpg), but rather, use two query parameters: E.g., download.pl?section=vacation&image=photo01.jpg). For your own sanity, limit the set of characters that can be used for directory and image file names (you control the server, right? Just because a whole bunch of characters can be used in file names, it doesn’t mean you can’t restrict what you’re willing to handle.)

This is not an exhaustive list of things to watch out for writing a safe and robust script. While the WWW Security FAQ may seem dated, its advice is still relevant today, and gives you an idea of what you’re dealing with.

So, here we go. First, the shebang line:

#!/opt/www/perl/bin/perl

It is true that perl is installed as part of the system on Unixy systems, but you should not use the system perl for your web servers. Instead, use perlbrew and cpanm to manage a dedicated Perl environment for the web server. This will make your life easier in the long run by decoupling the management of the web server’s environment from the system’s package manager.

If you’re using Apache on Windows, you can use the ScriptInterpreterSource to specify a different location than the one that appears in the shebang line of the script.

use strict;
use warnings;

strict and warnings will save you a lot of headaches. Without them, things may go wrong without any outward indication.

use CGI;

I do not recommend using use CGI::Carp qw( fatalsToBrowser ). CGI::Carp is a fine and useful module, but it is better for you to learn to find, access, read, and interpret your web server’s error logs.

use File::Copy qw( copy )

use File::Spec::Functions qw( catfile );

Path::Class is nicer, but File::Spec is in the core. catfile is how you concatenate components of a file path without trying to remember what happens to "C:\Users" (of course, you could always use 'C:\Users' or "C:/Users", but I still prefer treating paths as paths rather than plain strings.)

use constant IMG_DIR => catfile(qw(
    E:\ srv localhost images
));

It is important for the top level directory to be defined independently of user input, and for your script not to be able to traverse outside of this directory. Otherwise, an attacker may access vital system files.

serve_logo(IMG_DIR);

We call a function to do the actual serving of the file. This makes sure you don’t inadvertently use global variables and also makes transitioning to a persistent environment much simpler.

sub serve_logo {
    my ($dir) = @_;

    # The mapping of CGI request parameter values to
    # actual filenames. This also gives you the flexibility
    # to change the filenames without changing URLs in links.
    # In real life, the mapping may come from a database or
    # configuration file.

    my %mapping = (
        'big' => 'logo-1600x1200px.png',
        'medium' => 'logo-800x600.png',
        'small' => 'logo-400x300.png',
        'thumb' => 'logo-200x150.jpg',
        'icon' => 'logo-32x32.gif',
    );

    my $cgi = CGI->new;

    my $file = $mapping{ $cgi->param('which') };
    defined ($file)
        or die "Invalid image name in CGI request\n";

    # The components of the path are "known good" at
    # this point.
    send_file($cgi, $dir, $file);

    return;
}

sub send_file {
    my ($cgi, $dir, $file) = @_;

    my $path = catfile($dir, $file);

    open my $fh, '<:raw', $path
        or die "Cannot open '$path': $!";

    print $cgi->header(
        -type => 'application/octet-stream',
        -attachment => $file,
    );

    binmode STDOUT, ':raw';

    copy $fh => \*STDOUT, 8_192;

    close $fh
        or die "Cannot close '$path': $!";

    return;
}

Note that I do not bother with fancy error pages etc. You can set up a 500 handler in your web server’s configuration. The error messages emitted with die go to your web server’s log files, where they belong, and are only viewable by you as opposed to the whole wide world.

Here is the entire script in one chunk:

#!/opt/www/perl/bin/perl

use CGI;
use File::Copy qw( copy );
use File::Spec::Functions qw( catfile );

use constant IMG_DIR => catfile(qw(
    E:\ srv localhost images
));

serve_logo(IMG_DIR);

sub serve_logo {
    my ($dir) = @_;

    my %mapping = (
        'big' => 'logo-1600x1200px.png',
        'medium' => 'logo-800x600.png',
        'small' => 'logo-400x300.png',
        'thumb' => 'logo-200x150.jpg',
        'icon' => 'logo-32x32.gif',
    );

    my $cgi = CGI->new;

    my $file = $mapping{ $cgi->param('which') };
    defined ($file)
        or die "Invalid image name in CGI request\n";

    send_file($cgi, $dir, $file);

    return;
}

sub send_file {
    my ($cgi, $dir, $file) = @_;

    my $path = catfile($dir, $file);

    open my $fh, '<:raw', $path
        or die "Cannot open '$path': $!";

    print $cgi->header(
        -type => 'application/octet-stream',
        -attachment => $file,
    );

    binmode STDOUT, ':raw';

    copy $fh => \*STDOUT, 8_192;

    close $fh
        or die "Cannot close '$path': $!";

    return;
}