Saturday, May 12, 2012

A file download CGI script in Perl

As Why is my image download CGI script written in Perl not working? on Stackoverflow (and the SitePoint article which the OP got his inspiration from) show, it seems the mechanics of CGI are still a mystery. In this post, I am going to use CGI.pm, instead of my favorite, CGI::Simple because the former has been in the core since perl 5.004, but I do recommend CGI::Simple because it does not include all the HTML generation cruft CGI.pm carries around for compatibility reasons.

There are important considerations in a CGI script used for file downloads. One important reason for using such a script is to make sure certain files can only be served to certain visitors. Assuming the files are not generated on the fly, one must ensure that arbitrary files on the server are not exposed to the outside world. Therefore, the file download script should not allow the path of a file to be passed in as a CGI script. Instead, there should be a defined mapping between the values of a query parameter and the paths of files eligible to be served. For example, if you are going to serve a bunch of files from the vacation, work, and school directories under C:\web\static\photos, don't have a query parameter corresponding to the relative path to the image file (e.g. download.pl?image=vacation%2Fphoto01.jpg), but rather, use two query parameters: E.g., download.pl?section=vacation&image=photo01.jpg). For your own sanity, limit the set of characters that can be used for directory and image file names (you control the server, right? Just because a whole bunch of characters can be used in file names, it doesn't mean you can't restrict what you're willing to handle.)

This is not an exhaustive list of things to watch out for writing a safe and robust script. While the WWW Security FAQ may seem dated, its advice is still relevant today, and gives you an idea of what you're dealing with.

So, here we go. First, the shebang line:

#!/opt/www/perl/bin/perl

It is true that perl is installed as part of the system on Unixy systems, but you should not use the system perl for your web servers. Instead, use perlbrew and cpanm to manage a dedicated Perl environment for the web server. This will make your life easier in the long run by decoupling the management of the web server's environment from the system's package manager.

If you're using Apache on Windows, you can use the ScriptInterpreterSource to specify a different location than the one that appears in the shebang line of the script.

use strict;
use warnings;

strict and warnings will save you a lot of headaches. Without them, things may go wrong without any outward indication.

use CGI;

I do not recommend using use CGI::Carp qw( fatalsToBrowser ). CGI::Carp is a fine and useful module, but it is better for you to learn to find, access, read, and interpret your web server's error logs.

use File::Copy qw( copy )
use File::Spec::Functions qw( catfile );

Path::Class is nicer, but File::Spec is in the core. catfile is how you concatenate components of a file path without trying to remember what happens to "C:\Users" (of course, you could always use 'C:\Users' or "C:/Users", but I still prefer treating paths as paths rather than plain strings.)

use constant IMG_DIR => catfile(qw(
    E:\ srv localhost images
));

It is important for the top level directory to be defined independently of user input, and for your script not to be able to traverse outside of this directory. Otherwise, an attacker may access vital system files.

serve_logo(IMG_DIR);

We call a function to do the actual serving of the file. This makes sure you don't inadvertently use global variables and also makes transitioning to a persistent environment much simpler.

sub serve_logo {
    my ($dir) = @_;

    # The mapping of CGI request parameter values to 
    # actual filenames. This also gives you the flexibility
    # to change the filenames without changing URLs in links.
    # In real life, the mapping may come from a database or
    # configuration file.

    my %mapping = (
        'big' => 'logo-1600x1200px.png',
        'medium' => 'logo-800x600.png',
        'small' => 'logo-400x300.png',
        'thumb' => 'logo-200x150.jpg',
        'icon' => 'logo-32x32.gif',
    );

    my $cgi = CGI->new;

    my $file = $mapping{ $cgi->param('which') };
    defined ($file)
        or die "Invalid image name in CGI request\n";

    # The components of the path are "known good" at
    # this point.
    send_file($cgi, $dir, $file);

    return;
}

sub send_file {
    my ($cgi, $dir, $file) = @_;

    my $path = catfile($dir, $file);

    open my $fh, '<:raw', $path
        or die "Cannot open '$path': $!";

    print $cgi->header(
        -type => 'application/octet-stream',
        -attachment => $file,
    );

    binmode STDOUT, ':raw';

    copy $fh => \*STDOUT, 8_192;

    close $fh
        or die "Cannot close '$path': $!";

    return;
}

Note that I do not bother with fancy error pages etc. You can set up a 500 handler in your web server's configuration. The error messages emitted with die go to your web server's log files, where they belong, and are only viewable by you as opposed to the whole wide world.

Here is the entire script in one chunk:

#!/opt/www/perl/bin/perl

use CGI;
use File::Copy qw( copy );
use File::Spec::Functions qw( catfile );

use constant IMG_DIR => catfile(qw(
    E:\ srv localhost images
));

serve_logo(IMG_DIR);

sub serve_logo {
    my ($dir) = @_;

    my %mapping = (
        'big' => 'logo-1600x1200px.png',
        'medium' => 'logo-800x600.png',
        'small' => 'logo-400x300.png',
        'thumb' => 'logo-200x150.jpg',
        'icon' => 'logo-32x32.gif',
    );

    my $cgi = CGI->new;

    my $file = $mapping{ $cgi->param('which') };
    defined ($file)
        or die "Invalid image name in CGI request\n";

    send_file($cgi, $dir, $file);

    return;
}

sub send_file {
    my ($cgi, $dir, $file) = @_;

    my $path = catfile($dir, $file);

    open my $fh, '<:raw', $path
        or die "Cannot open '$path': $!";

    print $cgi->header(
        -type => 'application/octet-stream',
        -attachment => $file,
    );

    binmode STDOUT, ':raw';

    copy $fh => \*STDOUT, 8_192;

    close $fh
        or die "Cannot close '$path': $!";

    return;
}

10 comments:

  1. It took me a while to figure what was wrong, so for those that end up here (as I did) having random issues serving large files, here's my advice:

    Avoid File::Copy, as it's bugged for this purpose.
    When serving data through CGI, syswrite can return undef ($! being 'Resource temporarily unavailable') for some time.

    File::Copy stops in that case (returns 0, sets $!), failing to transfer the entire file (or stream).

    Many different options to work around that, retrying the syswrite, or using blocking sockets, not sure which on is the best though !

    Anyway code above that very clean, thanks for it, but I wanted to highlight this pitfall.

    ReplyDelete
  2. " my $file = $mapping{ $cgi->param('which') }; "

    What does 'which' in the above statement refer to?

    Thanks.

    ReplyDelete
    Replies
    1. @Akanksha It refers to the CGI parameter in the request. E.g. http://www.example.com/download?which=big

      Delete
    2. Thanks a lot for your reply!

      Can we not pass this cgi parameter 'which', through an html form?

      I am trying to write a cgi script for downloading an image requested by the user, and I intend to get this image name through an html form.

      Whenever I press the submit button on my html form, it returns the entire cgi script instead of the function it is required to perform, i.e., the script written for the submit button is returned instead of the output the script is to display.

      Thanks again!

      Delete
    3. @Akanksha Of course, you can pass it through an HTML form.

      You are seeing the text of the CGI script instead of the server running the script because your server is not configured correctly.

      E.g. see this question on Serverfault.

      Delete
    4. Sorry for bothering you again but I've checked my server and apparently its configured correctly. However, the problem is still persisting.

      Kindly note that in your cgi script given on this blog, if I alter the statement

      "my $file = $mapping{ $cgi->param('which') }; "
      to
      "my $file = 'File one.jpg'; "
      ('File one.jpg' is the file I intend to download)

      the script works perfectly and 'File one.jpg' is downloaded.

      But, the html form output is not working.

      I'd be obliged if you can help.

      Thanks again!

      Delete
    5. Problem solved!
      Thanks for helping!

      Delete
  3. use constant IMG_DIR => catfile(qw(
    E:\ srv localhost images
    ));
    What is this location used for?

    ReplyDelete
    Replies
    1. It is passed as an argument to *serve_logo* which passes it on to *send_file*.

      Delete
  4. I am using a local server apache to download images from it. When i pass the the argument using an html form it shows the following error-

    Cannot open 'C:\Program\Files(x86)\Apache\Group\Apache2\cgi-bin\fh': No such file or directory at C:/Program Files (x86)/Apache Group/Apache2/cgi-bin/pics1.pl line 26.

    Could you please help me out?

    ReplyDelete