Chapter 7 Extending HTML's Capabilities with CGI

CGI and HTTP
Brainstorming CGI
How to Write a CGI Script

Chapter 6, "Reducing Maintenance Costs with Server-Side Includes," shows how to write programs whose output is displayed to the screen. Starting with this chapter, this book describes increasingly sophisticated scenarios for using programs to direct the visitor's interaction with the site.

The effective use of Common Gateway Interface (CGI) scripts makes your site stand out from the crowd of static HTML files. CGI scripts collect form information and e-mail it to the site owner. They display random images from a list to liven up a site. They serve as the interface to search programs, databases, and batch jobs. Without CGI, the Web would be a much less interesting place.

CGI and HTTP

Chapter 4, "Designing Faster Sites," describes the HyperText Transfer Protocol (HTTP) and shows that, in most cases, the client sends a GET command to the server. This section looks at the GET and POST methods for starting a program on the server, passing data to that program, and getting data back to the client.

Setting Up the Server for CGI

As with SSIs, the person maintaining the server must enable CGI before Webmasters on the machine can use it. Many of the difficulties in setting up CGI come from miscommunication between the Webmaster and the superuser.

System managers are understandably anxious when Webmasters talk about allowing non-privileged users to write code and run it on their machine. Not only are there performance issues involved, but there are well-documented ways for even a non-privileged user to write code that can compromise security and allow private information on the machine to be accessed or even changed by unauthorized users. Many system managers prohibit the use of CGI exec for this reason. Still others restrict CGI to approved scripts, or at least to programs in selected directories, which they can review from time to time. Chapter 17, "How to Keep Portions of the Site Private," addresses these and other security issues in more detail.

In the srm.conf file, the server manager specifies which directories will contain CGI with the ScriptAlias directive. By convention, the alias /cgi-bin/ is set up to handle all CGI scripts. If the server handles multiple Web sites, each site may be set up as a virtual host with its own CGI directory.

The server manager can also allow or deny the execution of CGI scripts through access.conf (or .htaccess for directory-by-directory control) using the Options ExecCGI directive.
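As a rough illustration, the configuration of an NCSA-style server might contain lines like the following. The filesystem paths are hypothetical, and the file in which each directive belongs can vary with the server and its version, so treat this as a sketch rather than a recipe.

```
# srm.conf: map the URL prefix /cgi-bin/ onto a directory of scripts
ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/

# access.conf: permit CGI execution in one additional directory
<Directory /usr/local/etc/httpd/htdocs/tools>
Options ExecCGI
</Directory>
```

For directories outside the ScriptAlias area, the server usually also has to be told which files count as scripts (typically by file extension), so enabling ExecCGI alone may not be enough.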

Note

For a complete description of the ScriptAlias and Options ExecCGI directives see Chapter 8, "Six Common CGI Mistakes and How to Avoid Them."

`GET` and `POST` in a CGI Context

The GET command specifies a Uniform Resource Identifier, or URI, which in turn contains a path describing the entity to be retrieved. That entity is often a static HTML page, but it can be a program. When the entity is a program, HTTP specifies that the program will be run, and its output is sent in response to the request.

The `GET` Method of Invoking a CGI

In addition to specifying the path, the protocol allows a user to specify a query string by placing a question mark after the path but before the query string. The query string may consist of any of the following "unreserved" characters:

ALPHA, ranging from a to z and A to Z
DIGIT, ranging from 0 to 9
safe, consisting of "$", "-", "_", "." and "+"
extra, consisting of "!", "*", "'", "(",")" and ","
national, consisting of any eight-bit character except control characters, space, ALPHA, DIGIT, safe, extra, and reserved characters

The reserved characters are ";", "/", "?", ":", "@", "&" and "=".

Any characters other than the unreserved must be escaped by packing them into three characters, as follows:

'%' + high order nibble in hex + low order nibble in hex

For example, to send "Test/Line," the client must escape the slash. The ASCII code for a forward slash is 47, or (in hexadecimal) 2F, so the client sends Test%2FLine. A space is sent as a "+".
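The escaping rule can be sketched as a small Perl routine. The name escape_query is invented for this illustration and is not part of any library. The sketch works on single-byte characters, and it deliberately escapes a literal "+" so that it cannot be confused with an encoded space.

```perl
use strict;
use warnings;

# Escape a string for use in a query string (illustrative helper only).
# Letters, digits, and most "safe" and "extra" punctuation pass through,
# spaces become "+", and everything else becomes %XX.
sub escape_query {
    my ($text) = @_;
    # A literal "+" is escaped too, so it is not mistaken for a space.
    $text =~ s/([^A-Za-z0-9\$\-_.!*'(), ])/sprintf("%%%02X", ord($1))/eg;
    $text =~ tr/ /+/;
    return $text;
}

print escape_query("Test/Line"), "\n";
print escape_query("John T. Smith"), "\n";
```

The first call prints Test%2FLine, matching the example above, and the second prints John+T.+Smith.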

Many people have the contents of small forms mailed to them using the GET method, and either suffer through the escaped characters, or use one of the many "escape removers," which translate the characters back to their original form. The last section of this chapter shows a better way to receive form data, using the POST method, but here's a quick workaround in Perl for GET:

if ($ENV{'REQUEST_METHOD'} eq 'GET')
{
  $query = $ENV{'QUERY_STRING'};
  # set pluses back to spaces
  $query =~ tr/+/ /;
  # turn each %XX escape back into the character it stands for
  $query =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
.
.
.

This code fragment says that if the information was sent via GET (which means the data is waiting in the environment variable named QUERY_STRING), read out the data, translate each "+" to a space, and then look for patterns consisting of a percent sign followed by two hexadecimal digits (a to f, A to F, or 0 to 9). When it finds such a pattern, it interprets the two digits as the hexadecimal code of a character and packs that code into a single byte.

The programmer can now use the query string (stored in $query) as desired.

Note that not all browsers have "got it right" in encoding reserved characters. Glenn Trewitt at Digital Equipment Corporation (DEC) has set up a test that shows which browsers work, which ones don't, and where they fail. See http://www.research.digital.com/nsl/formtest/home.html for details.

The `POST` Method

Unlike GET, which is intended primarily to move entities from the server to the client, POST is designed explicitly to move data from the client back to the server. At the HTTP level, a POST request looks like this:

POST /the/requested/URL HTTP/1.0 <CRLF>
general, request, and entity headers <CRLF>
entity body

The server must be able to determine the length of the uploaded entity. This task is usually handled by the Content-Length header.

The entity body consists of a series of bytes (known as octets in Netspeak). Once the server determines the length of the entity, it sends those bytes to the standard input (STDIN) of the program named in the URL.
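From the script's side, receiving the entity body amounts to reading a known number of bytes. The sketch below assumes that the filehandle and the length are passed in as parameters (read_body is a made-up name), so it can be tried without a server. In a real CGI script the handle would be STDIN and the length would come from $ENV{'CONTENT_LENGTH'}.

```perl
use strict;
use warnings;

# Read up to $length bytes of entity body from a filehandle.
# Looping guards against a read that returns fewer bytes than requested.
sub read_body {
    my ($fh, $length) = @_;
    my $body = '';
    my $got  = 0;
    while ($got < $length) {
        my $n = read($fh, $body, $length - $got, $got);
        die "read failed: $!" unless defined $n;
        last if $n == 0;    # the client sent less than it promised
        $got += $n;
    }
    return $body;
}

# Usage: simulate the server's pipe with an in-memory filehandle.
my $sent = "realname=John+T.+Smith&email=jtsmith\@somewhere.com";
open(my $fake_stdin, '<', \$sent) or die "Cannot open in-memory handle: $!";
print read_body($fake_stdin, length($sent)), "\n";
```

Reading exactly the advertised number of bytes matters because the script cannot rely on seeing an end-of-file marker after the body.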

CGI Environment Variables

At last the meaning of the CGI variables becomes clear. Some, such as CONTENT_LENGTH and QUERY_STRING, are taken from the client's request (the Content-Length header and the part of the URI after the question mark, respectively). Others, like SERVER_NAME and REMOTE_HOST, are set by the server from what it knows about itself and about the connection. All are made available to the CGI script as environment variables.
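One quick way to see these variables is a script that simply prints them. In the following sketch the helper name env_report is invented for illustration, and the environment is passed in as a hash reference so the routine can also be exercised from the command line.

```perl
use strict;
use warnings;

# Format selected CGI variables as "NAME = value" lines.
sub env_report {
    my ($env, @names) = @_;
    my $report = '';
    foreach my $name (@names) {
        my $value = defined $env->{$name} ? $env->{$name} : '(not set)';
        $report .= "$name = $value\n";
    }
    return $report;
}

# Run as a CGI script, this returns a plain-text page of the variables.
print "Content-type: text/plain\n\n";
print env_report(\%ENV, qw(REQUEST_METHOD QUERY_STRING CONTENT_LENGTH
                           REMOTE_HOST SERVER_NAME));
```

Run outside a server, most of the variables show up as "(not set)", which is itself a useful reminder that the server, not the script, supplies them.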

CGI and Security

Why are system administrators so concerned about CGI scripts? Here's just one example of what an unsuspecting CGI programmer can unleash upon the system.

Suppose a form prompts a user for an e-mail address, then sends that e-mail address on to UNIX's sendmail program. Sendmail is expecting a well-behaved e-mail address, like this:

sendmail jtsmith@somewhere.com

But the user might send it something like this:

| cat /etc/passwd; rm *

and the script runs

sendmail | cat /etc/passwd; rm *

The pipe symbol is used in UNIX to connect two programs together, and the semicolon separates one command from the next, which is why the rm in the example above would run. The exclamation mark is used by many programs as a "shell escape." Following any of these characters, control is passed to a UNIX command interpreter, or shell, and the user can do things they would usually do from the command line.

Once upon a time, shell escapes were useful. A user on a simple character-based terminal might start an editor or a mail program, and then want to pop out to the shell for a moment to issue a command. The shell escape was a mechanism that allowed this behavior without requiring the user to exit the program.

Today, with large monitors and multiple windows, the usefulness of shell escapes is dwindling. Many users keep several windows open, including one or more to the shell. Gradually, shell escapes may be turned off. For the present, however, any lines that are being handed back out to a program, which might accept shell escapes or other "metacharacters," should be protected like this:

if ($query !~ /^[a-zA-Z0-9_\-+ \t\/\@%]+$/)
{
  # Complain to the user about illegal characters
  # Make the user fix them before we accept the input
  exit;
}
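The same idea can be packaged as a reusable test. The sketch below is one possible version, not a complete defense. The name is_safe_input is made up, and the list of permitted characters is an assumption chosen for this example (it includes "." so that ordinary e-mail addresses pass, and leaves out "/" and "%").

```perl
use strict;
use warnings;

# Return 1 only if every character of $input is on a short list of
# characters assumed to be harmless, and 0 otherwise.
sub is_safe_input {
    my ($input) = @_;
    return 0 unless defined $input && length $input;
    # \z (rather than $) refuses a trailing newline as well.
    return $input =~ /^[a-zA-Z0-9_\-+.\@ ]+\z/ ? 1 : 0;
}

foreach my $candidate ('jtsmith@somewhere.com', '| cat /etc/passwd; rm *') {
    printf "%-26s %s\n", $candidate,
           is_safe_input($candidate) ? 'accepted' : 'rejected';
}
```

Listing the characters that are allowed, rather than the ones that are forbidden, means that a metacharacter nobody thought of is rejected by default.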

To help make scripts more secure, some system managers require scripts to be run through CGI-Wrap. CGI-Wrap is available at http://wwwcgi.umr.edu/~cgiwrap.

There is an excellent tutorial on CGI security at http://csclub.uwaterloo.ca/u/mlvanbie/cgisec/. Chapter 17, "How to Keep Portions of the Site Private," addresses Web site security issues.

Security is important in CGI scripting. This book emphasizes a basic understanding of the technology, so most of the scripts have had only minimal security features added, which keeps the relevant points clearly illustrated. Review the guidelines in Chapter 17 and the online CGI security tutorial for ideas about how to make your site more secure.

Brainstorming CGI

A CGI script can do anything that a program running on the server can do. This statement means that, for CGI programmers, the sky is the limit. Here are a few ideas for CGI scripts that may prove useful in practice.

IfNew

The IfNew script was introduced in the last chapter. It takes the name of a file as a parameter (either in the query string or in the name by which the file is called) and checks that file's modification date. If the file is newer than a prescribed number of days, the script returns an IMG tag pointing to a "New" graphic, and writes out the file's "Last modified" date.
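The core of such a script is a comparison between the file's modification time and the current time. The following is only a sketch of that test, not the IfNew script itself. The routine name and the image path are placeholders.

```perl
use strict;
use warnings;

# Return an IMG tag if the file was modified within the last $days days,
# or an empty string otherwise.
sub new_marker {
    my ($path, $days) = @_;
    my $mtime = (stat($path))[9];          # element 9 is the modification time
    return '' unless defined $mtime;       # missing or unreadable file
    my $age_in_days = (time() - $mtime) / 86400;
    return $age_in_days <= $days
        ? '<IMG SRC="/graphics/new.gif" ALT="New">'
        : '';
}

# Usage: check this script's own file against a seven-day limit.
print new_marker($0, 7), "\n";
```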

Browser Steering

As described in Chapters 2, "Reducing Site Maintenance Costs Through Testing and Validation," and 3, "Deciding What to Do About Netscape," it is possible to write pages that look good on any standards-compliant browser. And newer Netscape features such as frames and JavaScript are designed so that other browsers are not confused by the presence of these Netscapeisms.

But there are times when a particular page must be written one way for one browser and another way for others. By reading the CGI environment variable HTTP_USER_AGENT, a script can send back different HTML depending upon the visitor's browser.
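A minimal sketch of browser steering follows. The page names are hypothetical, and the pattern is an assumption about identification strings: Netscape announces itself as "Mozilla/...", while some other browsers send a string that also begins with Mozilla but contains the word "compatible".

```perl
use strict;
use warnings;

# Pick a page variant from the browser's identification string.
sub page_for_browser {
    my ($agent) = @_;
    $agent = '' unless defined $agent;
    return 'index-netscape.html'
        if $agent =~ /^Mozilla\// && $agent !~ /compatible/i;
    return 'index-plain.html';             # everything else, including no header
}

print page_for_browser($ENV{'HTTP_USER_AGENT'}), "\n";
```

Falling back to the plain page for unknown or missing identification keeps the site usable for browsers the script has never heard of.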

Introducing a URL-Minder Workalike

At the end of Chapter 6, "Reducing Maintenance Costs with Server-Side Includes," we introduced URL-minder, a third-party solution that captures the e-mail address of visitors interested in the page. URL-minder represents a nice solution that can be put on a page in just a few lines of HTML. There are two shortcomings to URL-minder, however. First, the page owner has no idea who has registered an interest in the page, so the amount of feedback and interaction with the visitor is limited. Second, URL-minder notifies registered users of all changes in a page. Correcting a typo sends out a notification. Taking down a "New" notice triggers a notification. Putting a counter on the page causes it to send out constant "New" notices. To be sure, there are workarounds to some of these problems, but a better solution is to write our own page-minder.

The custom page-minder has two parts. The first part connects to the form on the page. When users register an interest in a page, the page-minder writes their e-mail address to a database and associates it with that page. This part constructs a page-specific mailing list. The second part allows the site owner to notify everyone on the list. Because this is a custom page-minder, we can allow the site owner to tailor the message and send it out when he wants it sent.

Building a Counter with CGI

Counters are so easy to implement that they are frequently the first script a new CGI programmer writes. Perhaps that fact accounts for the many counters found on Web pages. The usefulness of counters is limited, of course. Most people probably don't care how many visitors you've had since February. They come for the content; if you don't have what they are looking for, they leave (but the counter increments nonetheless). Among those visitors who are interested in the counter, whatever number is displayed, some will interpret it as a statement that the page is unpopular. For the site owner, the counter says nothing that the access logs don't say, and the logs say it better.

While a counter is of questionable utility in promoting a page, it does make for a nice learning experience in CGI. So by all means write one, but don't use it.
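As a learning exercise, the heart of a counter might look like the sketch below. The routine name bump_counter is invented for this example, and the caller chooses where the count file lives. The one subtle point is the lock: without it, two requests arriving together can read the same old value and one hit is lost.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);

# Add one to the number stored in $path and return the new total.
sub bump_counter {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)                      or die "Cannot lock $path: $!";
    my $line  = <$fh>;
    my $count = (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
    $count++;
    seek($fh, 0, 0)  or die "Cannot rewind $path: $!";
    truncate($fh, 0) or die "Cannot truncate $path: $!";
    print {$fh} "$count\n";
    close($fh)       or die "Cannot close $path: $!";   # also releases the lock
    return $count;
}
```

A CGI script would call bump_counter with the path of a file the server is allowed to write, then print the returned number as part of its page.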

Setting Up a Guest Book

Guest books are like counters. They don't add much content to a site. Most people don't read them. (Well, when was the last time you read the guest book someplace other than your own site?) But they make a nice "second project," and have become something of a rite of passage for fledgling CGI scripters.

The redeeming value of guest books is that they serve as a foundation for bulletin board scripts, which can be used to enhance the effectiveness of the site. Part VIII, "Advanced Applications: Web-based Bulletin Boards," describes a variety of bulletin board systems.

A Script to Search an Online Text Database

Among the first uses of computers was data storage and retrieval. A variety of scripts are available to search pages or to search a simple text database. Visit Matt's Script Archive at http://www.worldwidemart.com/scripts/ and look at Simple Search for an example of a page-searcher. HTGREP, by Oscar Nierstrasz at the University of Berne, Switzerland, is an excellent example of a text-file searcher.

HTGREP can be installed in a CGI "wrapper" so that the installer controls most of its options, and brings out just enough control to the end user that the script accomplishes its task without confusing the user. The programmer can also write a custom backend filter to make sure the data is displayed in a way that is easy to understand. For a full description of HTGREP, see Chapter 16, "How to Index and Search the Information on Your Site."

Searching a Site-Specific Index of Pages

It is often desirable to allow the user to search the whole site for pages that match certain keywords. Maintaining a keyword database becomes a maintenance nightmare, however. The easiest way to maintain keywords is to store them in META tags at the top of each page. Two scripts from Robert Thau at MIT are available to help use this data.

The first, site-idx.pl, starts at the root and explores each page on the site. When it finds a page, it looks for keywords and adds the reference to the file site-idx. Various options are available, such as requiring that the META tag exist, or blocking access to any directory that has a .htaccess file.

The resulting site index can be read by search engines like ALIWEB to make the pages findable by the Web community as a whole. By using a META distribution tag, the Webmaster can also set up a local index and allow visitors to search it using the aliwebsimple.pl script.

A Form Mailer

Among the most useful scripts are those that take the contents of a form and mail it to the site owner. Using such scripts visitors can ask to join a mailing list, send in orders for a product, ask technical support questions, or provide feedback about the site.

One of the most powerful scripts is formmail.pl, also from Matt's Script Archive (http://www.worldwidemart.com/scripts/). There is a detailed discussion of formmail at the end of this chapter.

Printing a Filled-In Form

Sometimes users want a copy of a form they have filled in. If they have placed an online order, they want a copy for their records. If they are sending in a check or sending a credit card number by facsimile, they need a printed copy of the form. Users who use forms to submit data (such as announcing a new site) want to save the contents of the form.

Unfortunately, in most browsers printing a form causes the form to print, but not the contents of the fields. The solution is to have the user submit the form, then return to the user a printable page.

Ordering Through First Virtual

First Virtual Holdings is one of the Web's first online order processing systems. To use First Virtual (http://www.fv.com/) both the buyer and the seller need an account. The buyer enters his or her First Virtual ID. The script first checks to make sure the value entered in the field at least looks like a valid First Virtual ID. Then it calls "finger," a TCP/IP application for getting small pieces of information in real-time. Finger checks with First Virtual's server to see if the account is valid.

Once the script has determined that the user has entered a valid account ID, it sends an e-mail message to First Virtual using a specific protocol that can be automatically processed by their server. Then it builds a separate e-mail message describing the order and sends it to the site owner and the buyer. Finally it sends a "Thank You" page back to the user's browser.

First Virtual has examples of scripts that implement their protocol at their Web site. Chapter 25, "Getting Paid: Taking Orders over the Internet," describes First Virtual Holdings in more detail.

Accessing a Relational Database Through SQL

While the text-based database described above is sufficient for simple queries against small databases, "real" databases contain hundreds of thousands of items and allow complex queries. The software to allow access to such large databases has been finely tuned over many years, and often sells for tens of thousands of dollars. Leaders in this industry include Oracle, Sybase, Informix and Gupta.

All of these relational database (RDB) products are accessed using the Structured Query Language, or SQL (pronounced see-quel). Each RDB vendor uses their own version of SQL. From the point of view of the script author, however, these versions are close enough that most of what would come in from the Web can be written for ANSI standard SQL.

Chapter 18, "How to Query Databases," describes RDB access scripts.

A Shopping Cart Script

For many site owners, the measure of effectiveness is sales. For many sites, a buyer might want more than one item, and might want the item in a variety of sizes, colors, or styles. In an ideal world, the user could browse from one page of the catalog to the next, picking up items and having the script remember them. Then when the user was ready to check out, he or she would review the order and enter information about where to ship the items and how they will make payment.

But HTTP is a stateless protocol. Recall from Chapter 4, "Designing Faster Sites," that after the user completes a GET, the connection is closed. The server has no way of knowing that a subsequent GET comes from the same user. There are several workarounds that allow a script to keep track of which user is making the request. Some of these mechanisms are given in Chapter 9, "Making a User's Life Simpler with Multipart Forms." Chapters 25-29 describe the online shopping experience.

How to Write a CGI Script

To get started writing scripts, this section describes a small but useful script in detail. The script is formmail.pl, available from Matt's Script Archive (http://www.worldwidemart.com/scripts/). formmail allows the HTML author to specify nearly all the options in hidden fields on the form, so authors who do not know much about CGI can put up sophisticated scripts.

The script comes with an excellent README file that describes how to install and configure the script. There are also some small changes the installer must make at the top of the script. Get the latest version from the archive and go through the README file to see how to hook up the latest features. Here is the header to formmail, showing these configuration variables:

#!/usr/bin/perl
##############################################################################
# FormMail                        Version 1.5                                #
# Copyright 1996 Matt Wright      mattw@misha.net                            #
# Created 6/9/95                  Last Modified 2/5/96                       #
# Scripts Archive at:             http://www.worldwidemart.com/scripts/      #
##############################################################################
# COPYRIGHT NOTICE                                                           #
# Copyright 1996 Matthew M. Wright  All Rights Reserved.                     #
#                                                                            #
# FormMail may be used and modified free of charge by anyone so long as this #
# copyright notice and the comments above remain intact.  By using this      #
# code you agree to indemnify Matthew M. Wright from any liability that      #  
# might arise from its use.                                                  #  
#                                                                            #
# Selling the code for this program without prior written consent is         #
# expressly forbidden.  In other words, please ask first before you try and  #
# make money off of my program.                                              #
##############################################################################
# Define Variables 
#  Detailed Information Found In README File.
# $mailprog defines the location of your sendmail program on your unix 
# system.
$mailprog = '/usr/lib/sendmail';
# @referers allows forms to be located only on servers which are defined 
# in this field.  This fixes a security hole in the last version which 
# allowed anyone on any server to use your FormMail script.
@referers = ('www.worldwidemart.com','worldwidemart.com','206.31.72.203');
# Done
#############################################################################

Once a CGI programmer has become familiar with the basics, they may want to use a Perl library so they don't have to continually recode the same features. There is a good Perl5 library (CGI.pm) at http://www-genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html. Libraries like CGI.pm hide many of the details of CGI processing so the user cannot get them wrong. This approach has many merits. This book emphasizes understanding the underlying technology, so the scripts here make little use of libraries.

Anatomy of formmail.pl

formmail is an example of well-designed, modular code. The main routine is lean, almost to the point of being sparse. It makes a series of calls to Perl subroutines, and then exits. Each subroutine is short (typically under 20 lines) and fits comfortably on a screen or page. (Many software engineers recommend this design style, since a piece of code short enough to be seen on one screen is easier to understand and more maintainable.)

`formmail`'s Main Routine

Here is the main routine of formmail:

# Check Referring URL
&check_url;
# Retrieve Date
&get_date;
# Parse Form Contents
&parse_form;
# Check Required Fields
&check_required;
# Return HTML Page or Redirect User
&return_html;
# Send E-Mail
&send_mail;

The `check_url` Subroutine

Breaking long programs into modules is good programming practice. The first such module in formmail.pl is check_url:

sub check_url {
   if ($ENV{'HTTP_REFERER'}) {
      foreach $referer (@referers) {
         if ($ENV{'HTTP_REFERER'} =~ /$referer/i) {
            $check_referer = '1';
            last;
         }
      }
   }
   else {
      $check_referer = '1';
   }
   if ($check_referer != 1) {
      &error('bad_referer');
   }
}

The check_url subroutine looks at the CGI environment variable HTTP_REFERER to see which machine served the form that invoked this script. It is considered poor form (to say the least) to use someone else's machine to send your mail. Since so much of formmail is configurable at the form level, however, there used to be nothing to block you from setting up a form on your machine and hooking it to the copy of formmail on my machine. formmail would run on my machine, and send the resulting mail back to you!

Starting in version 1.4, Matt added check_url, which makes sure that the page the user was just on (the form page) came from a machine on the @referers list. By setting this list to the authorized hosts, the installer can keep unauthorized sites from hooking to this copy of the script and loading up this server. Note that a request that carries no HTTP_REFERER at all is let through, so the check discourages casual misuse rather than guaranteeing who is calling.
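Because check_url uses a pattern match, a referring URL that merely contains an authorized name somewhere in it will pass. A stricter variant, sketched below, pulls out the host part of the URL and compares it exactly. This is an illustration written for this discussion, not the test formmail performs. The routine name is invented, and unlike check_url it rejects a missing referer.

```perl
use strict;
use warnings;

# Return 1 if the host part of $referer exactly matches one of @allowed.
sub referer_allowed {
    my ($referer, @allowed) = @_;
    return 0 unless defined $referer;
    return 0 unless $referer =~ m{^http://([^/:]+)}i;
    my $host = lc $1;
    foreach my $ok (@allowed) {
        return 1 if $host eq lc $ok;
    }
    return 0;
}

my @referers = ('www.worldwidemart.com', 'worldwidemart.com');
print referer_allowed('http://www.worldwidemart.com/scripts/form.html',
                      @referers), "\n";
```

Whether to accept requests without a referer is a policy decision: rejecting them is stricter, but it also turns away browsers that never send the header.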

The `get_date` Subroutine

There are plenty of date and time "pretty-printers" around, and this subroutine is one of them. It calls the Perl function localtime() to break the current time into its components (seconds, minutes, hours, and so on), pads the single-digit time fields with a leading zero, and assembles the pieces into a readable string:

sub get_date {
   @days = ('Sunday','Monday','Tuesday','Wednesday','Thursday',
   'Friday','Saturday');
   @months = ('January','February','March','April','May','June','July',
	'August','September','October','November','December');
   ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
   if ($hour < 10) { $hour = "0$hour"; }
   if ($min < 10) { $min = "0$min"; }
   if ($sec < 10) { $sec = "0$sec"; }
   $date = "$days[$wday], $months[$mon] $mday, 19$year at $hour\:$min\:$sec";
}
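One detail of the listing deserves a note: localtime() returns the year as a count of years since 1900, so prefixing "19" gives the right answer only until 1999 (for the year 2000 it would print "19100"). A possible variant that adds 1900 and lets sprintf do the zero-padding is sketched here. It is not Matt's code, and format_date is a name chosen for this example.

```perl
use strict;
use warnings;

my @days   = qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday);
my @months = qw(January February March April May June July August
                September October November December);

# Format the list of fields returned by localtime().
sub format_date {
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = @_;
    # $year counts years since 1900, so add rather than prefixing "19".
    return sprintf("%s, %s %d, %d at %02d:%02d:%02d",
                   $days[$wday], $months[$mon], $mday, 1900 + $year,
                   $hour, $min, $sec);
}

print format_date(localtime(time)), "\n";
```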

The `parse_form` Subroutine

parse_form is where the real work is done. parse_form is a good candidate for use in a library of routines, since it is a generic form reader. (Just be sure to remember to give credit to Matt for the original work, in accordance with his copyright notice.)

sub parse_form {
   if ($ENV{'REQUEST_METHOD'} eq 'GET') {
      # Split the name-value pairs
      @pairs = split(/&/, $ENV{'QUERY_STRING'});
   }
   elsif ($ENV{'REQUEST_METHOD'} eq 'POST') {
      # Get the input
      read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
 
      # Split the name-value pairs
      @pairs = split(/&/, $buffer);
   }
   else {
      &error('request_method');
   }
   foreach $pair (@pairs) {
      ($name, $value) = split(/=/, $pair);
 
      $name =~ tr/+/ /;
      $name =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
      $value =~ tr/+/ /;
      $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
      # If they try to include server side includes, erase them, so they
      # aren't a security risk if the html gets returned.  Another 
      # security hole plugged up.
      $value =~ s/<!--(.|\n)*-->//g;
      # Create two associative arrays here.  One is a configuration array
      # which includes all fields that this form recognizes.  The other
      # is for fields which the form does not recognize and will report 
      # back to the user in the html return page and the e-mail message.
      # Also determine required fields.
      if ($name eq 'recipient' ||
          $name eq 'subject' ||
          $name eq 'email' ||
          $name eq 'realname' ||
          $name eq 'redirect' ||
          $name eq 'bgcolor' ||
          $name eq 'background' ||
          $name eq 'link_color' ||
          $name eq 'vlink_color' ||
          $name eq 'text_color' ||
          $name eq 'alink_color' ||
          $name eq 'title' ||
          $name eq 'sort' ||
          $name eq 'print_config' ||
          $name eq 'return_link_title' ||
          $name eq 'return_link_url' && ($value)) {

         $CONFIG{$name} = $value;
      }
      elsif ($name eq 'required') {
         @required = split(/,/,$value);
      }
      elsif ($name eq 'env_report') {
         @env_report = split(/,/,$value);
      }
      else {
         if ($FORM{$name} && ($value)) {
            $FORM{$name} = "$FORM{$name}, $value";
         }
         elsif ($value) {
            $FORM{$name} = $value;
         }
      }
   }
}

Recall that forms can be set up to use either the GET or the POST method. GET sends the data in a query string, which reaches the script through an environment variable. POST is better suited for longer messages, since it passes the data through the HTTP entity: the script gets the CONTENT_LENGTH variable from the environment and reads that many bytes from STDIN.

Here's where formmail handles GET:

if ($ENV{'REQUEST_METHOD'} eq 'GET') {
   # Split the name-value pairs
   @pairs = split(/&/, $ENV{'QUERY_STRING'});
}

If the method is GET, the fields are separated by ampersands, so QUERY_STRING might contain

 realname=John+T.+Smith&email=jtsmith@somewhere.com

Here's the code that handles POST in formmail:

elsif ($ENV{'REQUEST_METHOD'} eq 'POST') {
   # Get the input
   read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
 
   # Split the name-value pairs
   @pairs = split(/&/, $buffer);
}

If the form uses POST, the read brings CONTENT_LENGTH bytes into the buffer from STDIN. As with GET, the incoming data is separated into fields by ampersands.

Once the above code has run, the script no longer cares whether GET or POST was used. The data is stored in the list @pairs, and looks like this:

$pairs[0] = "realname=John+T.+Smith"
$pairs[1] = "email=jtsmith@somewhere.com"

and so on. Each pair is then taken apart:

foreach $pair (@pairs) {
   ($name, $value) = split(/=/, $pair);
 
   $name =~ tr/+/ /;
   $name =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
   $value =~ tr/+/ /;
   $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

Once the data is in @pairs, this foreach loop splits each field into the variables $name and $value. In the above example, on the first pass through the loop $name would get realname and $value would get John+T.+Smith. Next the plus signs are changed back into spaces, and the escaped values are turned back into their original form. If the name or value contained a %2F, for example, it would be packed into a character with the value 2F hexadecimal, or 47 decimal. That character is an ASCII '/'.
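The splitting and decoding steps can be tried out on their own, away from the rest of formmail. The following sketch uses invented routine names (unescape and parse_pairs) and returns a hash reference rather than filling global arrays. It also passes a limit of 2 to split, an addition made here so that an "=" inside a value is kept as part of the value.

```perl
use strict;
use warnings;

# Decode one escaped name or value: "+" to space, then %XX to a character.
sub unescape {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    return $text;
}

# Turn "name=value&name=value" into a hash reference.
sub parse_pairs {
    my ($raw) = @_;
    my %field;
    foreach my $pair (split(/&/, $raw)) {
        my ($name, $value) = split(/=/, $pair, 2);
        next unless defined $name && length $name;
        $name  = unescape($name);
        $value = unescape(defined $value ? $value : '');
        # Repeated names are joined with a comma, as formmail does.
        $field{$name} = exists $field{$name} ? "$field{$name}, $value" : $value;
    }
    return \%field;
}

my $form = parse_pairs('realname=John+T.+Smith&email=jtsmith%40somewhere.com');
print "$form->{'realname'} <$form->{'email'}>\n";
```

Feeding the routine the sample query string prints the decoded name followed by the address in angle brackets, with the %40 restored to "@".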

if ($name eq 'recipient' ||
    $name eq 'subject' ||
    # ... and so forth ...
    $name eq 'return_link_url' && ($value)) {

   $CONFIG{$name} = $value;
}

The interesting fields are saved in an associative array called CONFIG. Later we will be able to retrieve fields by name, so $CONFIG{'email'} will give jtsmith@somewhere.com.

elsif ($name eq 'required') {
   @required = split(/,/,$value);
}
elsif ($name eq 'env_report') {
   @env_report = split(/,/,$value);
}

The fields required and env_report get special handling. They contain lists of field names separated by commas. The above code splits these lists into Perl lists.

else {
   if ($FORM{$name} && ($value)) {
      $FORM{$name} = "$FORM{$name}, $value";
   }
   elsif ($value) {
      $FORM{$name} = $value;
   }
}

Finally, we put the names and values of any fields the user has added to the form into the FORM array. So the user can put, say, quantity information on an ordering page like this:

Quantity: <INPUT TYPE=Text NAME=Quantity VALUE=1><BR>

and get $FORM{Quantity} equal to the number the user put in that field.

The `check_required` Subroutine

Once the script has read in the required fields, it's time to check them for data. That's the job of the check_required subroutine:

sub check_required {
   foreach $require (@required) {
      if ($require eq 'recipient' ||
          $require eq 'subject' ||
          # ... and so forth ...
         ) {
         if (!($CONFIG{$require}) || $CONFIG{$require} eq ' ') {
            push(@ERROR,$require);
         }
      }
      elsif (!($FORM{$require}) || $FORM{$require} eq ' ') {
         push(@ERROR,$require);
      }
   }
   if (@ERROR) {
      &error('missing_fields', @ERROR);
   }
}

The above code loops through the @required array (which, you will recall, was set up from the required field) and checks each required field to be sure that it is not empty or a single blank. First it checks the fields it knows about (for example, realname and email), then it checks fields the form author has added (for example, Quantity). If the field has no contents, the name of the field is pushed onto a list of errors.

After checking all the fields on the required list, if there are any fields named on the @ERROR list, the script calls its error handler (subroutine error) and asks it to complain about 'missing_fields'.
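The same check can be written as a routine that returns the list of missing names instead of calling an error handler. The version below is a sketch with an invented name (missing_fields). It treats a value as present when it contains at least one non-blank character, so a Quantity of "0" is accepted, whereas the test in the listing above would report it as missing.

```perl
use strict;
use warnings;

# Return the names from @$required that have no usable value in either hash.
sub missing_fields {
    my ($required, $config, $form) = @_;
    my @missing;
    foreach my $name (@$required) {
        my $value = exists $config->{$name} ? $config->{$name}
                                            : $form->{$name};
        push(@missing, $name) unless defined $value && $value =~ /\S/;
    }
    return @missing;
}

my @missing = missing_fields(
    [ 'email', 'Quantity' ],
    { 'email'    => 'jtsmith@somewhere.com' },
    { 'Quantity' => ' ' },
);
print "Missing: @missing\n";
```

Returning a list leaves the caller free to decide how to complain, which makes the routine easier to reuse in other scripts.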

The `return_html` Subroutine

Once all the decisions are made, it's time for the script to answer the user. If the developer specified a "redirect" page, that page is sent. Otherwise a dynamic page is built and sent.

Tip

In programs like formmail that allow you to specify a redirect page, it's a good idea to use that feature. You will have better control of the end result, lower maintenance costs (since an HTML coder rather than a CGI programmer can maintain the page) and, as an added benefit, use slightly less computing power.

sub return_html {
   if ($CONFIG{'redirect'} =~ /http\:\/\/.*\..*/) {
      # If the redirect option of the form contains a valid url,
      # print the redirectional location header.
      print "Location: $CONFIG{'redirect'}\n\n";
   }
   else {
      print "Content-type: text/html\n\n";
      print "<html>\n <head>\n";
      # Print out title of page
      if ($CONFIG{'title'}) {
         print "  <title>$CONFIG{'title'}</title>\n";
      }
      else {
         print "  <title>Thank You</title>\n";
      }
      print " </head>\n <body";
      # Get Body Tag Attributes
      &body_attributes;
      # Close Body Tag
      print ">\n  <center>\n";
      if ($CONFIG{'title'}) {
         print "   <h1>$CONFIG{'title'}</h1>\n";
      }
      else {
         print "   <h1>Thank You For Filling Out This Form</h1>\n";
      }
      print "</center>\n";
      print "Below is what you submitted to $CONFIG{'recipient'} on ";
      print "$date<p><hr size=7 width=75\%><p>\n";
      if ($CONFIG{'sort'} eq 'alphabetic') {
         foreach $key (sort keys %FORM) {
            # Print the name and value pairs in FORM array to html.
            print "<b>$key:</b> $FORM{$key}<p>\n";
         }
      }
      elsif ($CONFIG{'sort'} =~ /^order:.*,.*/) {
         $sort_order = $CONFIG{'sort'};
         $sort_order =~ s/order://;
         @sorted_fields = split(/,/, $sort_order);
         foreach $sorted_field (@sorted_fields) {
            # Print the name and value pairs in FORM array to html.
            if ($FORM{$sorted_field}) {
               print "<b>$sorted_field:</b> $FORM{$sorted_field}<p>\n";
            }
         }
      }
      else {
         foreach $key (keys %FORM) {
            # Print the name and value pairs in FORM array to html.
            print "<b>$key:</b> $FORM{$key}<p>\n";
         }
      }
      print "<p><hr size=7 width=75%><p>\n";
      # Check for a Return Link
      if ($CONFIG{'return_link_url'} =~ /http\:\/\/.*\..*/ && $CONFIG{'return_link_title'}) {
         print "<ul>\n";
         print "<li><a href=\"$CONFIG{'return_link_url'}\">$CONFIG{'return_link_title'}</a>\n";
         print "</ul>\n";
      }
      print "<a href=\"http://www.worldwidemart.com/scripts/formmail.shtml\">FormMail
      </a> Created by Matt Wright and can be found at
      <a href=\"http://www.worldwidemart.com/scripts/\">Matt's Script
      Archive</a>.\n";
      print "</body>\n</html>";
   }
}

formmail allows the form designer to specify a custom page to be returned after the form is processed. Typically the designer would prepare a 'Thank You' page for this purpose, and put its URL in the redirect field. If redirect is empty or does not contain something that looks like a URL, formmail puts up its own page. It sends the obligatory "Content-type: text/html" header, followed by a blank line, to satisfy the requirements of HTTP. Then it puts up the necessary HTML to display each of the user-relevant fields. This routine concludes by putting up the return link (if the form designer has provided one) and a link to Matt's Script Archive. The return link is only used if the default 'Thank You' page is displayed, and provides a convenient way for the user to reenter the site from the Thank You page.
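The choice between the two kinds of reply comes down to which header the script prints first. Here is a minimal sketch of that decision, using the same URL test as return_html; the subroutine name response_header and the example URL are assumptions for illustration, not part of formmail.

```perl
use strict;
use warnings;

# Return the CGI header block for the reply: a redirect when the value
# looks like an http URL, otherwise the header for an HTML page.
sub response_header {
    my ($redirect) = @_;
    if (defined $redirect && $redirect =~ m{http://.*\..*}) {
        return "Location: $redirect\n\n";
    }
    return "Content-type: text/html\n\n";
}

# Hypothetical redirect value, then an empty one.
print response_header('http://www.example.com/thanks.html');
print response_header('');
```

In both cases the header ends with a blank line, which tells the server where the headers stop.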

Note that return_html, like send_mail below, uses the sort field. The designer can specify an alphabetic sort (not particularly useful) or a specific order (usually a much nicer design).
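The handling of the sort field can be isolated in a few lines. This is a sketch under stated assumptions: field_order is a hypothetical helper, not a formmail subroutine, and the form data is invented. It follows the same three cases as the listing: alphabetic, an explicit order: list, or no sorting at all.

```perl
use strict;
use warnings;

# Return the form field names in the order selected by the sort value.
sub field_order {
    my ($sort, $form) = @_;
    $sort = '' unless defined $sort;
    if ($sort eq 'alphabetic') {
        return sort keys %$form;
    }
    if ($sort =~ /^order:(.*,.*)/) {
        # Keep only the listed fields that were actually filled in.
        return grep { $form->{$_} } split /,/, $1;
    }
    return keys %$form;    # No sort requested: hash order, which varies.
}

# Hypothetical form contents.
my %form = (name => 'Pat', phone => '555-0100', comments => 'Hello');
print join(', ', field_order('order:name,comments,phone', \%form)), "\n";
print join(', ', field_order('alphabetic', \%form)), "\n";
```

Because a Perl hash has no fixed order, the order: form is the only way to make the fields appear in the sequence the designer intended.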

The `send_mail` Subroutine

The real work of formmail is to send the results of the HTML form to the site owner by e-mail. Here is the routine that does that work:

sub send_mail {
   # Open The Mail Program
   open(MAIL,"|$mailprog -t");
   print MAIL "To: $CONFIG{'recipient'}\n";
   print MAIL "From: $CONFIG{'email'} ($CONFIG{'realname'})\n";
   # Check for Message Subject
   if ($CONFIG{'subject'}) {
      print MAIL "Subject: $CONFIG{'subject'}\n\n";
   }
   else {
      print MAIL "Subject: WWW Form Submission\n\n";
   }
   print MAIL "Below is the result of your feedback form.  It was ";
   print MAIL "submitted by $CONFIG{'realname'} ($CONFIG{'email'}) on ";
   print MAIL "$date\n";
   print MAIL "----------------------------------------------------------------------------\n\n";
   if ($CONFIG{'print_config'}) {
      @print_config = split(/,/,$CONFIG{'print_config'});
      foreach $print_config (@print_config) {
         if ($CONFIG{$print_config}) {
            print MAIL "$print_config: $CONFIG{$print_config}\n\n";
         }
      }
   }
   if ($CONFIG{'sort'} eq 'alphabetic') {
      foreach $key (sort keys %FORM) {
         # Print the name and value pairs in FORM array to mail.
         print MAIL "$key: $FORM{$key}\n\n";
      }
   }
   elsif ($CONFIG{'sort'} =~ /^order:.*,.*/) {
      $CONFIG{'sort'} =~ s/order://;
      @sorted_fields = split(/,/, $CONFIG{'sort'});
      foreach $sorted_field (@sorted_fields) {
         # Print the name and value pairs in FORM array to mail.
         if ($FORM{$sorted_field}) {
            print MAIL "$sorted_field: $FORM{$sorted_field}\n\n";
         }
      }
   }
   else {
      foreach $key (keys %FORM) {
         # Print the name and value pairs in FORM array to mail.
         print MAIL "$key: $FORM{$key}\n\n";
      }
   }
   print MAIL "----------------------------------------------------------------------------\n";
   # Send Any Environment Variables To Recipient.
   foreach $env_report (@env_report) {
      print MAIL "$env_report: $ENV{$env_report}\n";
   }
   close (MAIL);
}

This routine illustrates Perl's (and UNIX's) ability to open a pipe to a program just as easily as it opens a file. The script opens a pipe to the mail program; the -t switch tells programs like sendmail to read the recipient addresses from the header lines in the message itself, so the script writes the headers (To:, From:, Subject:) at the top of the message, followed by a blank line. Then it pumps the fields out to sendmail, obeying the sort field just as return_html did.
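The pipe technique is easy to try without a mail program. The sketch below assumes a UNIX machine and uses cat as a stand-in for "$mailprog -t", so the message simply appears on standard output; pipe_to_program is a hypothetical name, and unlike the listing above it checks whether the pipe could be opened and closed.

```perl
use strict;
use warnings;

# Write lines to the standard input of another program.
# $command is the command line that should receive the text;
# formmail uses "$mailprog -t" in this position.
sub pipe_to_program {
    my ($command, @lines) = @_;
    open(my $pipe, '|-', $command)
        or die "Cannot start $command: $!";
    print {$pipe} @lines;
    close($pipe)
        or die "$command failed: " . ($! || "exit status $?");
}

# Stand-in for the mail program (assumption): cat echoes its input.
pipe_to_program('cat',
                "To: owner\@example.com\n",
                "Subject: WWW Form Submission\n\n",
                "Quantity: 3\n");
```

Closing the pipe waits for the program to finish, so a failure in the receiving program shows up at close rather than at open.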

Finally, this routine sends the recipient the contents of the requested environment variables. The list of requested variables (@env_report) allows the Webmaster or site owner to keep track of browsers, remote hosts, and anything else of interest that is contained in the environment variables. The most useful are likely to be:

REMOTE_HOST - The name of the host from which the user is accessing the form
REMOTE_ADDR - The IP address of REMOTE_HOST
REMOTE_USER - Username if authentication is used
REMOTE_IDENT - RFC 931 identification of the user, not usually available
HTTP_USER_AGENT - The user's client
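To see what such a report looks like, the loop at the end of send_mail can be run against a made-up environment. This is only a sketch: env_report_lines is a hypothetical helper, and the hash %fake_env stands in for the real %ENV so that the example works outside a Web server.

```perl
use strict;
use warnings;

# Build one "NAME: value" line per requested environment variable.
# Variables the server did not set are reported with an empty value.
sub env_report_lines {
    my ($env, @names) = @_;
    my @lines;
    foreach my $name (@names) {
        my $value = defined $env->{$name} ? $env->{$name} : '';
        push @lines, $name . ': ' . $value . "\n";
    }
    return @lines;
}

# Hypothetical values; under a real server these come from %ENV.
my %fake_env = (REMOTE_ADDR     => '192.0.2.10',
                HTTP_USER_AGENT => 'ExampleBrowser/1.0');
print env_report_lines(\%fake_env,
                       qw(REMOTE_HOST REMOTE_ADDR HTTP_USER_AGENT));
```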

The Error Handler

The error subroutine prepares a page of HTML to handle any of the various errors the script can throw. For example, if the error is 'missing_fields' the script puts out a message that lists the missing required fields and tells the user what he or she must do (fill in the fields) to fix the error. Good error handling is a sign of quality code. The following handler is a good example of how to do it right.

sub error {
   ($error,@error_fields) = @_;
   print "Content-type: text/html\n\n";
   if ($error eq 'bad_referer') {
      print "<html>\n <head>\n  <title>Bad Referrer -
      Access Denied</title>\n </head>\n";
      print " <body>\n  <center>\n   <h1>Bad Referrer -
      Access Denied</h1>\n  </center>\n";
      print "The form that is trying to use this <a
      href=\"http://www.worldwidemart.com/scripts/\">FormMail
      Program</a>\n";
      print "resides at: $ENV{'HTTP_REFERER'},
      which is not allowed to access this cgi script.<p>\n";
      print "Sorry!\n";
      print "</body></html>\n";
   }
   elsif ($error eq 'request_method') {
      print "<html>\n <head>\n  <title>Error: Request Method</title>\n </head>\n";
      print " <body";
      # Get Body Tag Attributes
      &body_attributes;
      # Close Body Tag
      print ">\n <center>\n\n";
      print "   <h1>Error: Request Method</h1>\n  </center>\n\n";
      print "The Request Method of the Form you submitted did not match\n";
      print "either GET or POST.  Please check the form, and make sure the\n";
      print "method= statement is in upper case and matches GET or POST.\n";
      print "<p><hr size=7 width=75%><p>\n";
      print "<ul>\n";
      print "<li><a href=\"$ENV{'HTTP_REFERER'}\">
      Back to the Submission Form</a>\n";
      print "</ul>\n";
      print "</body></html>\n";
   }
   elsif ($error eq 'missing_fields') {
      print "<html>\n <head>\n  <title>Error: Blank Fields</title>\n </head>\n";
      print " <body";
      # Get Body Tag Attributes
      &body_attributes;
      # Close Body Tag
      print ">\n  <center>\n";
      print "   <h1>Error: Blank Fields</h1>\n  </center>\n\n";
      print "The following fields were left blank in your submission form:<p>\n";
      # Print Out Missing Fields in a List.
      print "<ul>\n";
      foreach $missing_field (@error_fields) {
         print "<li>$missing_field\n";
      }
      print "</ul>\n";
      # Provide Explanation for Error and Offer Link Back to Form.
      print "<p><hr size=7 width=75\%><p>\n";
      print "These fields must be filled out before you can successfully submit\n";
      print "the form.  Please return to the <a href=\"$ENV{'HTTP_REFERER'}\">
      Fill Out Form</a> and try again.\n";
      print "</body></html>\n";
   }
   exit;
}

The `body_attributes` Subroutine

Finally, the script allows the form designer to specify the background graphic or color and the various link colors if desired. These attributes are used on the default 'Thank You' page if the form designer does not specify a redirect page, and on most of the error pages.

sub body_attributes {
   # Check for Background Color
   if ($CONFIG{'bgcolor'}) {
      print " bgcolor=\"$CONFIG{'bgcolor'}\"";
   }
   # Check for Background Image
   if ($CONFIG{'background'} =~ /http\:\/\/.*\..*/) {
      print " background=\"$CONFIG{'background'}\"";
   }
   # Check for Link Color
   if ($CONFIG{'link_color'}) {
      print " link=\"$CONFIG{'link_color'}\"";
   }
   # Check for Visited Link Color
   if ($CONFIG{'vlink_color'}) {   
      print " vlink=\"$CONFIG{'vlink_color'}\"";
   }
   # Check for Active Link Color
   if ($CONFIG{'alink_color'}) {
      print " alink=\"$CONFIG{'alink_color'}\"";
   }
   # Check for Body Text Color
   if ($CONFIG{'text_color'}) {
      print " text=\"$CONFIG{'text_color'}\"";
   }
}

Hooking Up `formmail`

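To use formmail.pl, set up a form and specify the URL to your copy of formmail in the form's ACTION attribute. While formmail accepts both GET and POST methods, use POST for forms. With GET, the server passes the form data to the script in an environment variable (QUERY_STRING), and some machines have a limit on the length of environment variables. Forms usually generate a lot of data and benefit from being sent by POST, which delivers the data on the script's standard input instead.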
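The difference between the two methods shows up in where a script finds its data. The following is a minimal sketch, not formmail's own parsing code: raw_form_data is a hypothetical name, and the environment hash and input handle are passed in as arguments (standing in for %ENV and STDIN) so the example can run outside a Web server.

```perl
use strict;
use warnings;

# Return the raw, still URL-encoded form data for the request.
sub raw_form_data {
    my ($env, $in) = @_;
    my $method = $env->{REQUEST_METHOD} || '';
    if ($method eq 'GET') {
        # GET: the server places the data in an environment variable.
        return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
    }
    if ($method eq 'POST') {
        # POST: the data arrives on standard input; CONTENT_LENGTH
        # says how many bytes to read.
        my $data = '';
        read($in, $data, $env->{CONTENT_LENGTH} || 0);
        return $data;
    }
    die "Unsupported request method\n";
}

# Simulate a POST by reading from an in-memory filehandle.
my $body = 'realname=Pat&Quantity=3';
open(my $in, '<', \$body) or die "Cannot open in-memory handle: $!";
print raw_form_data({ REQUEST_METHOD => 'POST',
                      CONTENT_LENGTH => length($body) }, $in), "\n";
```

Under a real server the call would be raw_form_data(\%ENV, \*STDIN).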

Note

Here's the code to connect an HTML form page to formmail.pl: <FORM METHOD=POST ACTION="/cgi-bin/formmail.pl">

When the user submits the form, the script runs as described in the preceding section. If an error occurs, the error handler puts up the appropriate message. Otherwise the script looks for a redirect page. If one exists, it is returned to the user's browser. Otherwise the default Thank You page is constructed on the fly and sent back. (Note that, from a design standpoint, a custom page is usually superior to the default page, but the default page is handy during setup and testing.)

Finally, the script formats the mail message and sends it.

That's what happens if everything is working correctly. The next chapter looks at what can go wrong.
