effective perl programming
CGI Barbie says, "Programming is hard!"
by Joseph N. Hall
Joseph N. Hall is the author of Effective Perl Programming (Addison-Wesley, 1998). He teaches Perl classes, consults, and plays a lot of golf in his spare time.
CGI programming in Perl has brought a lot of new programmers into the Perl and UNIX world. While I'm always happy to see more people learning the fine art of programming, CGI isn't the gentlest introduction to the art. Nor is it always the safest. It is notoriously difficult to write programs that safely take user input and use it to perform file operations and/or run commands. In this column, I examine an application that is similar to one I have had my students write in my CGI-programming classes. What it does is simple, yet it turns out to be surprisingly difficult to make it robust and secure from misuse. But isn't life always this way?
A "Simple" Directory Lister
#!/usr/local/bin/perl -w
We'll run the program with warnings (-w) turned on. The CGI, URI::Escape, and HTML::Entities modules are CGI-related. The Cwd module gives us a portable way of converting the current directory to an absolute pathname.
my $root_dir = ".";
The program lists the contents of $root_dir by default. The "specs" for this program say that we should be able to list the contents of directories below but not above $root_dir. script_name() is a function from CGI.pm that returns the URL of this script. cwd() is a function from Cwd that returns an absolute path for the current working directory, like the pwd command.
sub errorpage {
The program defines a subroutine called errorpage that is used in the event of fatal errors. It generates HTML around an error message. Here, as well as in other places below, we use the encode_entities function from HTML::Entities to escape HTML entities that might appear in the output, so that if an error message contains an HTML metacharacter like "&" it will be converted into its escaped equivalent ("&amp;" in this case) before it is output. A link to $script_name gives users an alternative to the back button.
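As a rough illustration of the idea, here is a minimal sketch of an error page builder. It is not the column's errorpage: the helper name errorpage_html is hypothetical, it returns the page as a string instead of printing it and exiting, and the $script_name value is a made-up placeholder for whatever script_name() would return.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical stand-in for the value script_name() would return.
my $script_name = '/cgi-bin/lister.cgi';

# Build an HTML error page around a message. The message is escaped so
# that characters such as "&" and "<" cannot be interpreted as markup.
sub errorpage_html {
    my ($message) = @_;
    my $safe = encode_entities($message);
    my $link = encode_entities($script_name);
    return join "\n",
        '<html><head><title>Error</title></head><body>',
        '<h1>Error</h1>',
        "<p>$safe</p>",
        qq{<p><a href="$link">Start over</a></p>},
        '</body></html>';
}

my $page = errorpage_html(q{couldn't open a&b <tmp>: No such file});
print $page, "\n";
```

Returning a string keeps the sketch easy to test; a real CGI version would print a header and the page, then exit.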
my $dir = param('dir');
The program takes two CGI parameters. If a file parameter is present, the program will treat it as the name of a text file and display the contents of that file. The dir parameter, if present, specifies the directory for the file parameter. If no file parameter is present, the program lists the contents of the dir directory, or $root_dir if there is no dir either. The dir parameter is relative to $root_dir.
if (defined($file) and $file ne '') {
Here is the code that checks to see whether there is a file parameter. If there is, we use it as the name of a file in the current directory (set from the dir parameter) and display its contents as a text/plain document.
} else {
If there's no file to display, we display the contents of the dir directory (or .) instead. The uri_escape() function changes URI metacharacters like "+" and "%" into URI-escaped equivalents like "%2B" and "%25". We'll need this later, when we construct HTML anchors in our document. $cwd_dir is an absolute pathname for the directory in the dir parameter.
my @table;
Output gets "spooled" into an array called @table. I usually prefer to accumulate and save output while there is still interesting work left to perform. If the program emits half a page of HTML and then encounters a fatal error, the resulting unfinished HTML will look ugly, so I don't emit HTML until the very end.
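The spooling approach can be sketched roughly as follows. This is an illustration under assumptions, not the column's code: build_listing is a hypothetical helper, the empty-name failure stands in for a real error such as a failed stat, and the names are assumed to be HTML-escaped already.

```perl
use strict;
use warnings;

# Collect output lines in @table and only build the page once all the
# work that could fail has finished.
sub build_listing {
    my (@names) = @_;
    my @table;
    foreach my $name (@names) {
        # Hypothetical failure condition, standing in for a real error
        # such as a failed stat or opendir.
        die "bad entry\n" if $name eq '';
        # $name is assumed to be HTML-escaped by the caller.
        push @table, "<tr><td>$name</td></tr>";
    }
    return join "\n", '<table>', @table, '</table>';
}

# Nothing has been printed yet, so an error here can still be reported
# as a complete, well-formed page of its own.
my $good  = eval { build_listing('a.txt', 'b.txt') };
my $bad   = eval { build_listing('a.txt', '') };
my $error = $@;

print defined $good ? "$good\n" : "error page\n";
print defined $bad  ? "$bad\n"  : "error: $error";
```

Because the failing call never printed anything, the caller is free to emit an error page instead of half a table.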
if ($orig_cwd_dir ne $cwd_dir) {
This code adds a link to the directory "above" if we are browsing a directory other than $root_dir.
foreach my $fn (@files) {
This block of code traverses the list of files that we obtained with the readdir earlier and creates a line of output for each of them. We display the filename (HTML-escaping it so that filenames containing characters like "&" will display correctly) along with its size and modification time. We also create links that can be used to view text files and subdirectories. Filenames in the link URLs have to be URI-escaped so that links to files with names like a+b work correctly.
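The two kinds of escaping are easy to mix up, so here is a small sketch of building one row. It rests on assumptions: file_row is a hypothetical helper, $script_name is a placeholder value, and the row layout is invented for illustration. The point is only where each escape is applied.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);
use URI::Escape qw(uri_escape);

# Hypothetical stand-in for the value script_name() would return.
my $script_name = '/cgi-bin/lister.cgi';

# Build one table row for a plain file. The name is URI-escaped where it
# becomes part of a query string and HTML-escaped wherever it is placed
# into the page, including inside the href attribute.
sub file_row {
    my ($dir, $fn, $size, $mtime) = @_;
    my $href = "$script_name?dir=" . uri_escape($dir)
             . '&file=' . uri_escape($fn);
    return sprintf
        '<tr><td><a href="%s">%s</a></td><td>%d</td><td>%s</td></tr>',
        encode_entities($href), encode_entities($fn),
        $size, encode_entities(scalar(localtime($mtime)));
}

my $row = file_row('docs', 'a+b&c.txt', 120, 0);
print "$row\n";
```

The link carries a%2Bb%26c.txt in its query string, while the visible text shows the name with only the "&" turned into an entity.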
print header,
Finally, we emit our HTML and are done. Or are we?
Closing Up the Holes: They're Everywhere!
sysopen F, $file, O_RDONLY or errorpage "couldn't open $file: $!";
We now no longer have to worry about metacharacters like "|", because sysopen only knows how to open files. The program has another problem: it can display files outside $root_dir and its subdirectories. It takes a bit of work to make it honor the spec, which confines it to that part of the directory tree. First, before we do the sysopen, let's perform some other checks to disallow things like file=../../../../../../etc/passwd:
if (index($file, '/') > -1) {
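Put together, the slash check and the sysopen call might look something like the sketch below. It is only an illustration: read_plain_file is a hypothetical helper, it dies instead of calling errorpage, and it takes the directory as an argument rather than relying on the current directory as the column's program does.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use File::Temp qw(tempdir);

# Return the contents of $file inside $dir, or die with a message.
# Any name containing a slash is rejected, which rules out both
# absolute paths and relative paths such as ../../etc/passwd.
sub read_plain_file {
    my ($dir, $file) = @_;
    die "bad filename\n" if index($file, '/') > -1;
    # sysopen treats the name literally: a trailing "|" is just part of
    # a (probably nonexistent) filename, never a command to run.
    sysopen(my $fh, "$dir/$file", O_RDONLY)
        or die "couldn't open $file: $!\n";
    local $/;    # slurp mode: read the whole file at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Demonstration in a scratch directory.
my $tmp = tempdir(CLEANUP => 1);
open(my $out, '>', "$tmp/notes.txt") or die "create: $!";
print {$out} "hello\n";
close $out;

my $ok      = eval { read_plain_file($tmp, 'notes.txt') };
my $escaped = eval { read_plain_file($tmp, '../../etc/passwd') };
my $piped   = eval { read_plain_file($tmp, 'echo hi |') };

print defined $ok      ? "read: $ok"         : "read failed\n";
print defined $escaped ? "path got through\n" : "path rejected\n";
print defined $piped   ? "pipe got through\n" : "pipe treated as a missing file\n";
```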
That takes care of the filename, but what about the dir parameter? It has the same relative path problems, so we have to work on that too. The following code would go after my $dir = param('dir'):
if (defined($dir) and $dir ne '') {
This eliminates the obvious cases. However, if the directory we are browsing has a symlink to some other directory, we'll be able to use the symlink to look at a directory outside the tree. To fix this, after if ($orig_cwd_dir ne $cwd_dir), put:
errorpage "directory $cwd_dir doesn't seem right" unless
Because $cwd_dir is supposed to be "below" $orig_cwd_dir, $orig_cwd_dir should be a prefix of $cwd_dir. This will abort the execution of the script if it's not. We also want to avoid generating anchors to bad places in the first place:
my ($b, $bb) = (-d $fn and not -l $fn and -x $fn and -r $fn) ?
This tests for symlinks (-l $fn) and also ensures that directories are executable and readable.
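Both checks can be tried out in isolation with a sketch like the one below. It makes a few assumptions: is_below and is_listable_dir are hypothetical helper names, and it uses abs_path from Cwd (which resolves symlinks) in place of the chdir-then-cwd sequence the program uses. It also compares against the root plus a trailing slash, a slightly stricter variation on a plain prefix test.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Temp qw(tempdir);

# True if $dir, once symlinks are resolved, is $root itself or below it.
sub is_below {
    my ($root, $dir) = @_;
    my $abs_root = abs_path($root);
    my $abs_dir  = abs_path($dir);
    return 0 unless defined $abs_root and defined $abs_dir;
    return 1 if $abs_dir eq $abs_root;
    # Compare with a trailing slash so that /srv/pub2 is not mistaken
    # for something below /srv/pub.
    my $prefix = $abs_root eq '/' ? '/' : "$abs_root/";
    return index($abs_dir, $prefix) == 0 ? 1 : 0;
}

# True if $path is a real, searchable, readable directory, not a symlink.
sub is_listable_dir {
    my ($path) = @_;
    return (-d $path and not -l $path and -x $path and -r $path) ? 1 : 0;
}

# Demonstration: a root with one real subdirectory and one symlink
# that points outside the root.
my $root    = tempdir(CLEANUP => 1);
my $outside = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
# symlink may be unavailable on some platforms, hence the eval.
my $have_link = eval { symlink($outside, "$root/escape") };

print "sub is below root: ", is_below($root, "$root/sub"), "\n";
print "sub is listable: ", is_listable_dir("$root/sub"), "\n";
if ($have_link) {
    print "link is below root: ", is_below($root, "$root/escape"), "\n";
    print "link is listable: ", is_listable_dir("$root/escape"), "\n";
}
```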
Additional Paranoia
BEGIN {
Even though this program doesn't run any subshells or commands (not anymore, anyway), it's not a bad idea to turn on Perl's tainting feature. When tainting is turned on, Perl will yield a fatal error whenever user-supplied input (from files, standard input, environment variables, or a variety of other sources) is used as part of a subshell, command, or file operation. Tainting also invokes a number of other checks, such as checking the execution path for insecure directories. Tainting is automatically turned on for setuid programs, but you can activate it explicitly with the -T command-line switch:
#!/usr/local/bin/perl -Tw
With tainting turned on, we have to clean up the execution path:
$ENV{PATH} = "/bin:/usr/bin";
The prohibition on file operations using user-supplied data includes the chdir operator. To use the dir parameter with chdir, we have to untaint its value. To untaint data, you perform a pattern match and extract portions with subexpressions (parentheses). Data captured in the subexpressions is untainted:
($dir) = $dir =~ /([^\0]*)/;
The purpose of untainting is to eliminate metacharacters that might be the source of some unwanted or unintended behavior: often shell metacharacters in commands, but others as well. All manner of characters are valid in directory names, but I've used this as an opportunity to ensure that the directory name does not contain nulls. Tainting doesn't actually help this program very much, but it's a good feature to use so that it'll be harder to make silly mistakes if you pass the code along to someone else or modify it yourself in the future.
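The capture-to-untaint idiom can be wrapped in a small routine, sketched here under assumptions. The name untaint_dir is hypothetical, and the pattern is a variation on the one above: it is anchored at both ends, so a value containing a null is rejected outright instead of being cut off at the first null. Without -T the capture works the same way; there is simply no taint flag to clear.

```perl
use strict;
use warnings;

# Return a copy of $dir taken from a regex capture, or undef if the
# value is missing, empty, or contains a null byte. Under -T, text
# captured by a parenthesized group is considered untainted.
sub untaint_dir {
    my ($dir) = @_;
    return unless defined $dir;
    my ($clean) = $dir =~ /\A([^\0]+)\z/;
    return $clean;
}

my $ok  = untaint_dir('docs/reports');
my $bad = untaint_dir("docs\0/etc");

print defined $ok  ? "accepted: $ok\n" : "rejected\n";
print defined $bad ? "accepted\n"      : "rejected name with a null byte\n";
```

A pattern this permissive untaints almost anything, so the other checks on the dir parameter still have to do the real work.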
Programming Really Is Hard
For more about security issues in CGI programming, see the World Wide
Web security FAQ at
<http://www.w3.org/Security/Faq/www-security-faq.html>.
See the Perl perlsec man page for more about tainting, and the perlfunc
man page for more about Perl's strangely (and sometimes dangerously)
versatile open operator. A commented version of the entire program
referred to in this article is available in
<http://www.perlfaq.com/examples>.
Last changed: 20 Jul. 2000 mc