USENIX ;login: - Effective Perl Programming


CGI Barbie says, "Programming is hard!"

by Joseph N. Hall

Joseph N. Hall is the author of Effective Perl Programming (Addison-Wesley, 1998). He teaches Perl classes, consults, and plays a lot of golf in his spare time.

CGI programming in Perl has brought a lot of new programmers into the Perl and UNIX world. While I'm always happy to see more people learning the fine art of programming, CGI isn't the gentlest introduction to the art. Nor is it always the safest. It is notoriously difficult to write programs that safely take user input and use it to perform file operations and/or run commands.

In this column, I examine an application that is similar to one I have had my students write in my CGI-programming classes. What it does is simple, yet it turns out to be surprisingly difficult to make it robust and secure from misuse. But isn't life always this way?

A "Simple" Directory Lister
Our program is a relatively simple one that lists the contents of directories and displays the contents of text files in those directories. I'll give it to you in pieces with a bit of explanation in between.

    #!/usr/local/bin/perl -w
    use strict;
    use CGI qw(:standard);
    use URI::Escape;       # for uri_escape
    use HTML::Entities;    # for encode_entities
    use Cwd;               # for cwd

We'll run the program with warnings (-w) turned on. The CGI, URI::Escape, and HTML::Entities modules are CGI-related. The Cwd module gives us a portable way of converting the current directory to an absolute pathname.

    my $root_dir = ".";
    die "couldn't read the root dir" unless -r $root_dir and -x $root_dir;
    chdir $root_dir or die "couldn't change to the root dir: $!";
    my $script_name  = script_name;
    my $orig_cwd_dir = cwd;

The program lists the contents of $root_dir by default. The "specs" for this program say that we should be able to list the contents of directories below but not above $root_dir. script_name() is a function from CGI.pm that returns the URL of this script. cwd() is a function from Cwd that returns an absolute path for the current working directory, like the pwd command.

    sub errorpage {
        my $msg = shift;
        print header, start_html, h1('Error'),
              p(encode_entities($msg)),
              p(a({-href => $script_name}, "go back to $script_name")),
              end_html;
        exit;
    }

The program defines a subroutine called errorpage used in the event of fatal errors. It generates HTML around an error message. Here, as well as in other places below, we use the encode_entities function from HTML::Entities to escape HTML entities that might appear in the output, so that if an error message contains an HTML metacharacter like "&" it will be converted into its escaped equivalent ("&amp;" in this case) before it is output. A link to $script_name gives users an alternative to the back button.

    my $dir = param('dir');
    $dir = '.' if !defined($dir) or $dir eq '';
    my $file = param('file');

The program takes two CGI parameters. If a file parameter is present, the program will treat it as the name of a text file and display the contents of that file. The dir parameter, if present, specifies the directory for the file parameter. If no file parameter is present, the program lists the contents of the dir directory, or $root_dir if there is no dir either. The dir parameter is relative to $root_dir.

    if (defined($file) and $file ne '') {
        chdir $dir or errorpage "can't change to directory $dir: $!";
        open F, $file or errorpage "couldn't open $file: $!";
        print header('text/plain');
        print "file: $file\n\n";
        print <F>;

Here is the code that checks to see whether there is a file parameter. If there is, we use it as the name of a file in the current directory (set from the dir parameter) and display its contents as a text/plain document.

    } else {
        my $dir_uri = uri_escape $dir, '\W';
        chdir $dir or errorpage "can't change to directory $dir: $!";
        my $cwd_dir = cwd;
        opendir DIR, '.' or errorpage "can't open directory $cwd_dir: $!";
        my @files = sort readdir DIR;
        closedir DIR;

If there's no file to display, we display the contents of the dir directory (or .) instead. The uri_escape() function changes URI metacharacters like "+" and "%" into URI-escaped equivalents like "%2B" and "%25" — we'll need this when we construct HTML anchors in our document, later. $cwd_dir is an absolute pathname for the directory in the dir parameter.
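To make the escaping concrete, here is a minimal hand-rolled sketch of the same idea. It is not how URI::Escape is implemented. The helper name pct_escape is hypothetical, and the sketch assumes plain ASCII input, where it should behave much like uri_escape($s, '\W').

```perl
use strict;
use warnings;

# Hypothetical stand-in for uri_escape($s, '\W'), for illustration only:
# replace every character that is not an ASCII letter, digit, or underscore
# with "%" followed by its two-digit uppercase hex code.
sub pct_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_])/sprintf("%%%02X", ord($1))/ge;
    return $s;
}

print pct_escape('a+b'),     "\n";    # a%2Bb
print pct_escape('100%'),    "\n";    # 100%25
print pct_escape('my file'), "\n";    # my%20file
```

In the program itself the module function is the better choice; the sketch only shows what kind of transformation the links depend on.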

        my @table;
        push @table, "<table>\n";
        push @table, "<tr><td>Filename</td><td>Size</td>"
                   . "<td>Last Modified</td></tr>\n";

Output gets "spooled" into an array called @table. I usually prefer to accumulate and save output while there is still interesting work left to perform. If the program emits half a page of HTML and then encounters a fatal error, the resulting unfinished HTML will look ugly, so I don't emit HTML until the very end.

        if ($orig_cwd_dir ne $cwd_dir) {
            my $up_dir = $dir;
            $up_dir =~ s#/[^/]*$##;    # lop off last /dir
            push @table,
                "<tr><td><tt><a href=\"$script_name?dir="
                . uri_escape($up_dir)
                . '">Up one directory</a></tt></td>'
                . "<td></td><td></td></tr>\n";
        }

This code adds a link to the directory "above" if we are browsing a directory other than $root_dir.

        foreach my $fn (@files) {
            next if $fn eq "." or $fn eq "..";
            my $fn_enc = encode_entities $fn;    # HTML-escaped name for display
            my $fn_uri = uri_escape $fn, '\W';   # URI-escaped name for links
            my ($a, $aa) = (-T $fn)
                ? (qq(<a href="$script_name?file=$fn_uri&dir=$dir_uri">), "</a>")
                : ('', '');
            my ($b, $bb) = (-d $fn)
                ? (qq(<a href="$script_name?dir=$dir_uri/$fn_uri">), "</a>")
                : ('', '');
            my $bytes_k  = int(((-s $fn) + 1023) / 1024);
            my $mod_time = localtime((stat $fn)[9]);
            push @table,
                "<tr><td><tt>$a$b$fn_enc$bb$aa</tt></td>"
                . "<td><tt>${bytes_k}k</tt></td>"
                . "<td><tt>$mod_time</tt></td></tr>\n";
        }
        push @table, "</table>\n";

This block of code traverses the list of files that we obtained with the readdir earlier and creates a line of output for each of them. We display the filename (HTML-escaping it so that filenames containing characters like "&" will display correctly) and its size and modification time. We also create links that can be used to view text files and subdirectories. Filenames in the link URLs have to be URI-escaped so that links to files with names like a+b work correctly.

        print header, start_html($cwd_dir),
              h1("Directory of ", tt($cwd_dir)), "\n",
              @table, end_html;
    }

Finally, we emit our HTML and are done. Or are we?

Closing Up the Holes — They're Everywhere!
Despite its relatively simple construction and user interface, this program is filled with not-so-obvious flaws. The worst among them is that it can be used to execute arbitrary UNIX commands on the server via Perl's open operator. Notice that when we opened the file above we just said open F, $file. What if we run this program with the query string "file=cat+/etc/passwd|"? Well, the open becomes open F, "cat /etc/passwd|", which opens a pipe to the cat command and displays the contents of /etc/passwd! I'm not fond of the syntax of the open operator and think that it should never be used to process a user-specified filename in a CGI program. I prefer to use Fcntl and open the file with sysopen instead:

    use Fcntl;    # supplies O_RDONLY; belongs with the other use lines
    sysopen F, $file, O_RDONLY or errorpage "couldn't open $file: $!";

We now no longer have to worry about metacharacters like "|", because sysopen only knows how to open files.
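The difference is easy to see in isolation. The following is a small sketch, not part of the column's program: the helper name open_plain_file is hypothetical, and the hostile value simply mirrors the query string discussed above.

```perl
use strict;
use warnings;
use Fcntl;    # exports O_RDONLY

# Hypothetical helper: open $name read-only, treating it strictly as a
# filename. Returns a filehandle on success, or nothing on failure.
sub open_plain_file {
    my ($name) = @_;
    sysopen(my $fh, $name, O_RDONLY) or return;
    return $fh;
}

# To sysopen this is only an oddly named file, so the call fails
# unless a file with exactly this name exists in the current directory.
my $hostile = 'cat /etc/passwd|';
my $fh = open_plain_file($hostile);
print defined $fh
    ? "opened a file literally named '$hostile'\n"
    : "refused: $!\n";
```

The three-argument form open($fh, '<', $name), available since Perl 5.6, also treats the name as a plain filename and is a reasonable alternative where it is supported.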

The program also has problems in that it can display files outside $root_dir and its subdirectories. It takes a bit of work to fix it so that it doesn't violate the spec that requires it to operate in just that part of the directory tree. First, before we do the sysopen, let's perform some other checks to disallow things like file=../../../../../../etc/passwd:

    if (index($file, '/') > -1) {
        errorpage "filenames containing / are not allowed";
    }
    errorpage "no such file $file" unless -f $file;
    errorpage "$file doesn't look like a text file" unless -T $file;

That takes care of the filename, but what about the dir parameter? It has the same relative path problems, so we have to work on that too. The following code would go after my $dir = param('dir'):

    if (defined($dir) and $dir ne '') {
        errorpage "directory starting with / not allowed" if $dir =~ m#^/#;
        errorpage "directory can't have .." if index($dir, '..') > -1;
        errorpage "no such directory $dir" unless -d $dir;
    }
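Checks like these are easier to exercise from the command line when they are pulled out into a function that returns a message instead of calling errorpage. The sketch below is one possible arrangement. The name dir_param_error and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an error message if $dir fails the checks
# described above, or nothing if it passes (or is empty).
sub dir_param_error {
    my ($dir) = @_;
    return if !defined($dir) or $dir eq '';
    return "directory starting with / not allowed" if $dir =~ m#^/#;
    return "directory can't have .."               if index($dir, '..') > -1;
    return "no such directory $dir"                unless -d $dir;
    return;
}

# Try a few sample values, including some hostile ones.
for my $candidate ('.', '/etc', '../..', 'docs/../../etc') {
    my $err = dir_param_error($candidate);
    print "$candidate: ", (defined $err ? $err : 'ok'), "\n";
}
```

In the CGI program the caller would pass any returned message to errorpage.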

This eliminates the obvious cases. However, if the directory we are browsing has a symlink to some other directory, we'll be able to use the symlink to look at a directory outside the tree. To fix this, after if ($orig_cwd_dir ne $cwd_dir), put:

    errorpage "directory $cwd_dir doesn't seem right"
        unless index($cwd_dir, $orig_cwd_dir) == 0;

Because $cwd_dir is supposed to be "below" $orig_cwd_dir, $orig_cwd_dir should be a prefix of $cwd_dir. This will abort the execution of the script if it's not. We also want to avoid generating anchors to bad places in the first place:

    my ($b, $bb) = (-d $fn and not -l $fn and -x $fn and -r $fn)
        ? (qq(<a href="$script_name?dir=$dir_uri/$fn_uri">), "</a>")
        : ('', '');

This tests for symlinks (-l $fn), as well as ensures that directories are executable and readable.
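One caveat about the prefix test on $cwd_dir shown earlier in this section: a plain string prefix also matches a sibling directory whose name merely begins with the root's name, so a root of /home/www would accept /home/www2. A slightly stricter comparison, sketched here with a hypothetical helper is_within and made-up paths, requires a path separator after the root:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $path is $root itself or lies below it.
# Both arguments are assumed to be absolute paths such as cwd() returns.
sub is_within {
    my ($root, $path) = @_;
    return 1 if $path eq $root;
    # Make sure the prefix ends in "/" so that /home/www2 is not
    # mistaken for something below /home/www.
    my $prefix = $root =~ m{/\z} ? $root : "$root/";
    return index($path, $prefix) == 0 ? 1 : 0;
}

print is_within('/home/www', '/home/www/docs'), "\n";    # 1
print is_within('/home/www', '/home/www2'),     "\n";    # 0
```

In the program this would take the place of the bare index() comparison, with $orig_cwd_dir as the root and $cwd_dir as the path.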

Additional Paranoia
After spending hours or days on a program intended to run in the slightly weird environment of CGI, it's easy to forget that the same program can be run from the command line. A poorly written CGI program can be a source of exploits for local users. Hopefully you don't have any bad eggs logged onto your Web server, but who knows? One thing I like to do in my more complex CGI programs is to test to see whether the program is being run by the expected user. This is very important for setuid programs and not a bad idea for other programs. I usually do this with a BEGIN block before everything else in the program (even the use directives), so that the check will occur before any other code in the program runs.

    BEGIN {
        if ($< != getpwnam('joseph') and $< != getpwnam('nobody')) {
            die "This program may not be run by user $<";
        }
    }

Even though this program doesn't run any subshells or commands (not anymore, anyway), it's not a bad idea to turn on Perl's tainting feature. When tainting is turned on, Perl will yield a fatal error whenever user-supplied input (from files, standard input, environment variables, or a variety of other sources) is used as part of a subshell, command, or file operation. Tainting also invokes a number of other checks, such as checking the execution path for insecure directories. Tainting is automatically turned on for setuid programs, but you can activate it explicitly with the -T command-line switch:

    #!/usr/local/bin/perl -Tw

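With tainting turned on, we have to clean up the execution path and remove the environment variables that can influence a subshell:

    $ENV{PATH} = "/bin:/usr/bin";
    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};    # as the perlsec man page suggests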

The prohibition on file operations using user-supplied data includes the chdir operator. To use the dir parameter with chdir, we have to untaint its value. To untaint data, you perform a pattern match and extract portions with subexpressions (parentheses). Data captured in the subexpressions is untainted:

    ($dir) = $dir =~ /([^\0]*)/;

The purpose of untainting is to eliminate metacharacters that might be the source of some unwanted or unintended behavior: often shell metacharacters in commands, but others as well. All manner of characters are valid in directory names, but I've used this as an opportunity to ensure that the directory name does not contain nulls. Because the pattern is unanchored, it keeps only the text before the first null rather than rejecting the value outright.
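A stricter variation is to anchor the pattern and allow only characters you expect, so that anything else is rejected instead of silently trimmed. The sketch below assumes a policy of word characters, dots, hyphens, and slashes. The helper name untaint_dir and that character set are illustrative choices, not something the original program uses.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an untainted copy of $dir if it consists
# only of word characters, dots, hyphens, and slashes; otherwise returns
# nothing. The capture group is what removes the taint under -T.
sub untaint_dir {
    my ($dir) = @_;
    return unless defined $dir;
    my ($clean) = $dir =~ m{\A([\w./-]+)\z};
    return $clean;    # undef if the pattern did not match
}

for my $candidate ('docs/perl', "docs\0/etc", 'a;b') {
    my $clean = untaint_dir($candidate);
    print defined $clean ? "accepted: $clean\n" : "rejected\n";
}
```

This pattern still lets ".." through, so it complements the earlier directory checks and does not replace them.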

Tainting doesn't actually help this program very much, but it's a good feature to use so that it'll be harder to make silly mistakes if you pass the code along to someone else or modify it yourself in the future.

Programming Really Is Hard
My point in presenting this example hasn't been to teach the gentle art of CGI programming. Rather, it has been to illustrate that although the concepts behind CGI programming are relatively simple, it can be difficult to eliminate unwanted side effects in programs that deal with user input. Your system's CGI users may be some of its least-experienced programmers, yet the issues that they have to confront when dealing with access to files and subshells can be tricky even for very experienced developers. You should never underestimate the ability of seemingly innocuous user-written CGI programs to create security risks for your servers.

For more about security issues in CGI programming, see the World Wide Web security FAQ at <http://www.w3.org/Security/Faq/www-security-faq.html>. See the Perl perlsec man page for more about tainting, and the perlfunc man page for more about Perl's strangely (and sometimes dangerously) versatile open operator. A commented version of the entire program referred to in this article is available in <http://www.perlfaq.com/examples>.