Chapter 20 Preserving Data

Methods of Preserving Data
Philosophy of Data Preservation
Implementing Uses for Data Preservation

Chapter 19, "How to Build Pages On the Fly," presented the basic concept of generating pages dynamically. This chapter focuses on the sample applications presented in Chapter 19, and gives you access to specific tools for preserving data dynamically or statically.

Preserving data is a serious application of dynamic page generation. Data preservation involves using a CGI script to set up information so that it can be recalled by the next instance of another CGI script generating subsequent pages. Preserving data requires using HTML to store information. This chapter shows you how to preserve data with the markup language or by other more covert means.

The fundamentals of dynamic page generation are

Preserving data
Recursive CGI
Building HTML pages on the fly

This chapter explores the methods used to preserve data. We use the phrases "passing data" and "preserving data" almost interchangeably. The difference is that passing data is primarily concerned with data being shared or given to another CGI script. It's not important, in that case, what the data is used for; it's simply relayed. Preserving data is one purpose for passing data. If you're preserving data, you're primarily concerned with maintaining state information about a user or situation generated via your CGI scripts and HTML pages. In Chapter 21, "Recursive CGI," you learn how to apply recursive CGI programming techniques. In Chapter 22, "How to Build HTML On the Fly," you learn how to build HTML pages on the fly. The progression has been to introduce ideas and methods for how to preserve data, then techniques to make CGI scripts act more dynamic (making them recursive), to arrive at the topic of generating HTML on the fly.

Methods of Preserving Data

The methods of passing data involve either the markup language or the URL of the CGI script. So far, all the methods we're concerned with can be done by a default Web server and practically any Web browser. There are new developments in the HTTP protocol and HTML language specification that implement the HTTP cookie. (Chapter 9, "Making a User's Life Simpler with Multipart Forms," examines this advancement.) Cookies are a significant step toward making the process of page generation and user recognition seamless. Until now, most sites and applications that utilize some form of data preservation do so by "traditional" means: passing information from HTML forms or in the URL of CGI scripts.

Several chapters later in the book cover the use of cookies. Although the specifications for using HTTP cookies are in development, the idea of client-side persistent HTTP cookies opens up a whole new set of solutions for our applications. We'll find cookies particularly useful in Part VII, "Advanced CGI Applications: Commercial Applications," for maintaining user identity within a Web-based shopping environment.

Some methods of preserving data are built in to the HTML language. There are several types of "input" tags in HTML:

INPUT tags that are "hidden."
INPUT tags that are "visible," like type="text", type="radio", and so on.
HTML comments are not INPUT tags, but they are another way to preserve data.
HTML comments cannot be processed or used once they become part of a page. HTML comments are a way to document version control for files, copyrights, and any other information that should be retained with the document, but they are not visible to the user reading the page.

Other methods of preserving data rely on the URL of the CGI script, in some cases with the HTTP server software modified to maintain "user identification":

Adding arguments to the URL of a CGI script
Modifying the Web server to embed data within the URL

Passing Data Versus Preserving Data

Passing data between CGI scripts and HTML files is, in practice, how data is preserved. Unless all the information stored or collected in HTML forms is saved in a database, the only place you can preserve information is in the transaction between static HTML files and CGI scripts.

The application of data passing techniques is very important when you write dynamic CGI scripts that depend upon one another to work as a system. Chapter 21, "Recursive CGI," covers complex examples where data passing techniques are required to build a system of dynamic page generation.

The basic need for data preservation techniques stems from the need to retain information (which might or might not be provided by the user) and use that information again, perhaps repeatedly.

You can see the use of data preservation with the CGI scripts that build the Web chat application. The user's name, e-mail address, and so on are requested when the user enters the Web chat environment. Because you don't want to keep asking for this information, store it in the page with a hidden input type.

Before we learn, in detail, how to preserve data, let's review some applications we've seen in this book that depend on data preservation. Data preservation is not always an extravagant event. Data preservation can be as simple as the Web chat application that passes the user name and e-mail address to the "next page" as a hidden data type. Data preservation can be as quick as passing data in the real estate searching tool in Chapter 19, "How to Build Pages On the Fly."

We've also used data preservation when creating user identifiers or state information. Finally, we've relied on users to help preserve data by using the chat environment: whenever users post a new message or refresh the transcript page, they force data about themselves to be preserved for the next iteration.

Preserving Data Using HTML Tags (Hidden and Visible)

Using HTML form data is the easiest and most common method of passing data. For the most part, form data that the user does not supply is passed with the hidden input type.

Recall that a hidden input type in an HTML form looks something like this:

<input name="pi" type="hidden" value="3.1416">

The CGI that receives data from a form containing this input tag assigns 3.1416 to the form variable $Form{'pi'}. The data is invisible to the user unless she looks at the source of the document and sees the hidden input tags.
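How a form variable such as $Form{'pi'} gets filled in can be sketched as follows. This is only a minimal sketch, not the code of the web.pl library used in this book: the function name parse_form is hypothetical, and we assume the form data arrives in the usual URL-encoded format (name=value pairs joined by "&").

```perl
use strict;
use warnings;

# Hypothetical helper (not from web.pl): decode a URL-encoded string
# such as "pi=3.1416&name=Ann+Lee" into a hash of form variables.
sub parse_form {
    my ($data) = @_;
    my %form;
    foreach my $pair (split /&/, $data) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        foreach ($name, $value) {
            tr/+/ /;                                # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %XX encodes one byte
        }
        $form{$name} = $value;
    }
    return %form;
}

my %Form = parse_form('pi=3.1416&name=Ann+Lee');
print "pi is $Form{'pi'}\n";    # prints "pi is 3.1416"
```

If a name appears more than once, this sketch keeps only the last value; a fuller library would probably collect repeated names into a list.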

The data is stored in the page. The chain of pages generated by CGI scripts can read and rewrite hidden data to the next page as long as the data is needed. This data isn't persistent in the way that data stored using cookies is, but for the typical flow of pages the user sees on a site, this data can be persistent enough to help identify the user as she goes from page to page.

As you learned earlier, the Web server has no way to actually track the fact that the same user has visited any number of pages. We can use the hidden input data type to pass along a user ID, quantity of items purchased from a Web shopping tool, or any other piece of information that needs to be "remembered."

Data hidden this way is really useful only for the CGI that will process all the information from the HTML form that contains the hidden data. Customarily, the HTML form is used to get input from the user, but there's no rule that all HTML form data must be the kind that a user supplies.

For example, consider the static HTML page in Listing 20.1.

Listing 20.1 fromWhere.html: A Sample Page Using SSI to Preserve Data

<HTML>
<TITLE>Listing 2, Chapter 20 Programs for Webmaster Expert Solutions</TITLE>
<BODY BGCOLOR=FFFFFF>
<H1>Listing 2, Chapter 20</H1>

You seem to be coming from...
<p>
<!--#exec cmd="/export/home/jdw/bookweb/cgi-bin/fromWhere.cgi" -->

<HR>
<A HREF="../index.html">Home</A>
</BODY>
</HTML>

The SSI that handles this checks the REMOTE_HOST environment variable, which the Web server sets to describe the client. Depending on whether the value looks like a hostname or an IP address, the SSI displays a different message (see Listing 20.2).

Listing 20.2 fromWhere.cgi: Checks the REMOTE_HOST Environment Variable and Preserves the Address for the Next "Page"

#!/usr/local/bin/perl

$fromwhere = $ENV{'REMOTE_HOST'};

print "<form method=\"post\" action=\"/cgi-bin/someScript.cgi\">\n";

if ($fromwhere =~ /(\d+)$/) {
   print "Not sure, it's <b>$fromwhere.</b><p>\n";
}
else
{
   $fromwhere =~ /\.(\w+)$/;
   print "It's a <b>$1</b> site.<p>\n";
}


print "<input name=\"fromWhere\" type=\"hidden\" value=\"$fromwhere\">\n",
      "<input type=\"submit\" value=\"Continue\">\n",
      "</form>\n";

When this SSI is executed, the resulting page is a simple page with a Submit button, but the data is preserved in a hidden INPUT type (see Fig. 20.1).

Figure 20.1: The resulting HTML from the fromWhere.cgi SSI.

The document source (see Fig. 20.2) shows where the hidden INPUT type is.

Figure 20.2: This is the HTML source of Figure 20.1. The remote hostname is preserved as a hidden INPUT type.

Preserving data with hidden data types in HTML forms is a favorite method for keeping data "alive" from page to page. The requirement to use hidden data types in HTML forms is the cumbersome part: you need an HTML form! Actually, it's not always possible to use an HTML form for embedding data inside hidden input types, so use care if you choose to use hidden input types to preserve data. If the chain of pages in which you want the data preserved can contain HTML forms, then there's no problem.

For example, an HTML form doesn't necessarily have to look like an HTML form with input boxes and buttons. A toolbar comprised of individual graphic icons rather than a composite (imagemap-ready) graphic could hide data well.

Let's try that out:

<form method="post" action="/cgi-bin/doSomething.cgi">
<input type="hidden" name="userID" value="123ABC">
<input type="image" src="/graphics/A.gif" name="iconA" alt="A">
</form>
<form method="post" action="/cgi-bin/doSomething.cgi">
<input type="hidden" name="userID" value="123ABC">
<input type="image" src="/graphics/B.gif" name="iconB" alt="B">
</form>

This code should continue as far as necessary. What the user sees, then, is a string of icons that make up a toolbar, but each image is really the submit control of its own form, and each form carries a name and value.

Realize that this probably isn't going to be a static HTML page. Notice the assignment of userID to the value 123ABC; unless the Web has only one patron or we're forcing everyone to have the same user ID, this value must change for different users. This HTML therefore must be generated dynamically by an SSI (or CGI script) so that a custom userID can be retrieved and used to build a custom page for each user.

Visible tags are those that are present in the browser window, artifacts of the page that the user can see. This includes input text areas, buttons, and list boxes. Link tags are also visible tags that can be used to help preserve data. Remember that the goal of preserving data is to keep it in the page and associate it with an event that can "pass" it to another CGI script that can process the data. Data preservation needs support in terms of a CGI script to manipulate the data.

For example, consider this HTML form:

<form method="post" action="/cgi-bin/handleThings.cgi">
UserID <input name="userID">
</form>

This passes the variable userID to the CGI script handleThings.cgi, but it's up to the CGI script to preserve the data. The data isn't preserved until it becomes part of the page in a way that cannot be altered by the user.

Imagine that handleThings.cgi is a CGI script that generates HTML. In order to preserve userID, the CGI script has to do one of two things. The first option is to create an HTML tag to place the data into the page:

#!/usr/local/bin/perl
# standard linkage
print "<input name=\"savedUserID\" type=\"hidden\" value=\"$Form{'userID'}\">\n";

The alternative is to use a visible input tag:

print "<input type=\"radio\" name=\"savedUserID\" value=\"$Form{'userID'}\">\n";
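One detail the two print statements above leave out is what happens when the preserved value itself contains a quotation mark or an angle bracket: the tag would be cut short. A minimal sketch of one way to guard against that follows. The helper names escape_html and hidden_input are hypothetical, not part of web.pl, and the sample value is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: escape a value so it can sit safely inside a
# double-quoted HTML attribute.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # must come first, or later escapes get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: build a hidden input tag that preserves one value.
sub hidden_input {
    my ($name, $value) = @_;
    return sprintf qq{<input type="hidden" name="%s" value="%s">\n},
        escape_html($name), escape_html($value);
}

print hidden_input('savedUserID', 'Ann "Red" Lee');
```

The browser turns the escaped characters back into the original ones when it submits the form, so the receiving CGI script sees the value unchanged.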

Preserving Data Using HTML Comments

Data preservation techniques usually involve saving data so that other CGI scripts can archive it, or so that software can use it to decide how new pages are dynamically generated.

There are some situations where the preserved data doesn't need to be processed by a CGI script, but merely must "exist" for later reference. In the case of a Web chat environment, there are situations where you need the true identity of the user posting messages. For example, as new messages are written to the transcript log file, an HTML comment like

<!-- This message posted by user at IP address: 199.174.46.40 -->

is embedded into the HTML of the transcript page. If users or moderators need to learn more about the true origin of the message, they can look at the source of the HTML transcript and see the data saved about each message (in this case, the IP address of the client who posted the message). No software analyzes the HTML transcript, but a system administrator or chat environment moderator can quickly find out who has posted particular messages.

The trick of preserving data as HTML comments is used mostly for placing administrative information into a page, including copyright information, acknowledgements, or version labels.
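Writing such a comment from a script might look like the sketch below. The function name html_comment is hypothetical. REMOTE_ADDR is the standard CGI environment variable holding the client's IP address, and we assume the script runs under a Web server that sets it.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap administrative text in an HTML comment.
sub html_comment {
    my ($text) = @_;
    $text =~ s/-{2,}/-/g;    # a "--" inside the text could end the comment early
    $text =~ s/[<>]//g;      # drop angle brackets for the same reason
    return "<!-- $text -->\n";
}

# Outside a Web server REMOTE_ADDR is not set, so fall back to a placeholder.
my $addr = defined $ENV{'REMOTE_ADDR'} ? $ENV{'REMOTE_ADDR'} : 'unknown';
print html_comment("This message posted by user at IP address: $addr");
```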

For example, the Perl library we use, web.pl, has a function named beginHTML() that writes out a standard MIME type declaration:

Content-type: text/html

After that, it prints the following:

<!-- web.pl version 1.0 -->

It prints this so that the developer can record revision numbers in the documents that the SSI or CGI scripts generate.

Preserving Data in the URL of a CGI

Data can exist in several locations while it's being passed to another CGI script. We've already seen how data can be passed to a CGI script by collecting the data from an HTML form. Data associated with visible (or hidden) HTML tags can be part of the data stream fed into a CGI script (specified by the URL of the HTML form).

When the POST method is used in an HTML form to pass information to a CGI script, the information does not appear in the URL of the resulting CGI script. Instead, the browser sends it in the body of the request, and the Web server passes it to the CGI script on stdin. Chapter 19 covers POST and GET in detail. The GET method for passing information from an HTML form to a CGI script modifies the URL of the resulting CGI, and that's where you can store information you wish to preserve.
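The difference between the two methods, as seen from inside a CGI script, can be sketched as follows. REQUEST_METHOD, CONTENT_LENGTH, and QUERY_STRING are the standard CGI environment variables; the function name raw_form_data is hypothetical and is not how web.pl necessarily does it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw, still-encoded form data.
# $env is a reference to the CGI environment (normally \%ENV) and
# $in is the filehandle to read a POST body from (normally \*STDIN).
sub raw_form_data {
    my ($env, $in) = @_;
    my $method = uc($env->{'REQUEST_METHOD'} || 'GET');
    if ($method eq 'POST') {
        # POST: the data arrives on stdin, CONTENT_LENGTH bytes long
        my $length = $env->{'CONTENT_LENGTH'} || 0;
        my $body   = '';
        read($in, $body, $length) if $length > 0;
        return $body;
    }
    # GET: the data is whatever follows the "?" in the URL
    return defined $env->{'QUERY_STRING'} ? $env->{'QUERY_STRING'} : '';
}

print "Raw form data: ", raw_form_data(\%ENV, \*STDIN), "\n";
```

Passing the environment and the filehandle as arguments is a design choice made only so the function can be tried out without a Web server.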

A CGI script doesn't necessarily always follow an HTML form; a URL can be explicitly specified as a CGI script. For example, consider this CGI script:

#!/usr/local/bin/perl
# standard linkage
print "Your number is $Form{'number'}\n";
exit(0);

A URL pointing to this CGI script would resemble the following:

<a href="/cgi-bin/myNumber.cgi?number=10">My number is 10</a>

So far, this example doesn't necessarily preserve any information; it just passes data from the URL to the CGI script myNumber.cgi, which in turn processes the information. In this case, the script just echoes the number back to the user.

To preserve data using information stored in the URL, you have to go a step further. The CGI script starts the process of preserving the information by re-creating a link with the data assembled and appended to the URL as an argument list for the CGI script. For example, you can modify the CGI sample above to be this:

#!/usr/local/bin/perl
# standard linkage
print "Your number is $Form{'number'}\n";
print "<a href=\"/cgi-bin/myNumber.cgi?","number=$Form{'number'}\">Again</a>\n";

When the CGI script generates HTML and constructs links, those links can be to other CGI scripts. Since you are "writing" HTML, you can emulate the passing of information by generating URLs to CGI scripts with the correct arguments built in to each URL.
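When the preserved value is something less tidy than a number, it has to be encoded before it can be appended to a URL. The following is a minimal sketch under that assumption; url_encode and link_with_args are hypothetical helper names, not functions from web.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode a value for use in a query string.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Hypothetical helper: build a link to a CGI script with its arguments
# appended to the URL. Arguments are given as name, value, name, value...
sub link_with_args {
    my ($script, $label, @args) = @_;
    my @pairs;
    while (@args) {
        my ($name, $value) = splice(@args, 0, 2);
        push @pairs, url_encode($name) . '=' . url_encode($value);
    }
    # "&amp;" is the HTML-escaped form of the "&" that separates the pairs
    return sprintf qq{<a href="%s?%s">%s</a>\n},
        $script, join('&amp;', @pairs), $label;
}

print link_with_args('/cgi-bin/myNumber.cgi', 'Again', 'number', '10');
```

For the number 10, this produces the same link as the hand-written print statement above.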

You'll find that recursive objects (pages and content that are generated from CGI scripts) contain links and constructions that inherit the data passing schemes of previous CGI scripts. Part of the wonder of HTML and CGI programming is that, although HTML is a fairly tight and restrictive markup language, CGI scripting allows you to generate any kind of HTML construction you wish. You can create new links, HTML forms, and most other kinds of dynamic artifacts to perpetuate the passing of data from one page to another.

State information usually is preserved this way, especially when there are only a few items to keep alive. For instance, sites that depend on knowing the identity of users on every page they visit can easily preserve information using techniques based on modifying the URL.

Consider a Web-based shopping tool. Chapter 25, "Getting Paid: Taking Orders over the Internet," explains how the mechanics of available shopping tools are structured. CGI scripts generating pages in such a shopping environment are required to know that the same user who saw page A is now viewing page B. Web shopping environments utilize a container analogy to preserve data concerning what a user has selected from the shelves. To save space and streamline the URLs that make up a shopping site, the best thing to do is to assign a unique shopping ID string.

The shopping ID string is passed from page to page as the user navigates through the shopping environment. Throughout, the user never touches down on "solid static HTML ground," and all the pages are held up by dynamic CGI scripts.

Because the pages traversed in the shopping environment are generated dynamically, you can create shopping IDs and incorporate them into the pages at will. In this way, all links that have anything to do with the shopping environment contain the ID of the user, a piece of information valuable to the CGI scripts that maintain the shopping container for the user.

Let's look at the home page to a basic shopping environment:

<html>
<title>Home Page for Shopping Environment</title>
<!-- standard banner graphic and text -->
<!--#exec cmd="/var/web/bookweb/bin/generateID.pl" -->
<!-- standard footer and toolbar, text -->
</html>

This home page uses an SSI to create userID and generate HTML with userID embedded into the HTML. A fragment of that SSI looks like this:

$userID = &createUserID;
print <<"end of body";
You can enter the shopping environment
<a href="/cgi-bin/shopping.cgi?userID=$userID">HERE</a>
end of body
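The fragment calls createUserID without showing it. One possible stand-in is sketched below; it is purely an assumption about what such a function could look like, not the version used for the book's examples. It mixes the time, the process ID, a counter, and a random number, which is adequate for telling shoppers apart in a demonstration but is guessable and so should not be treated as a secret.

```perl
use strict;
use warnings;

# Hypothetical stand-in for createUserID. The counter only distinguishes
# IDs made within one process; the time, process ID, and random number
# distinguish IDs made by separate runs of the SSI.
my $counter = 0;

sub createUserID {
    $counter++;
    return sprintf('%08X%04X%04X%04X',
        time() & 0xFFFFFFFF,     # seconds since the epoch
        $$ & 0xFFFF,             # process ID of this script
        $counter & 0xFFFF,       # calls made so far in this process
        int(rand(0x10000)));     # a little randomness
}

my $userID = createUserID();
print qq{<a href="/cgi-bin/shopping.cgi?userID=$userID">HERE</a>\n};
```

Because the ID contains only digits and the letters A through F, it can be placed in a URL without any further encoding.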

Preserving data is rarely an application by itself, but using techniques to preserve data builds better applications, especially those that need to maintain state information between pages, or that need the user's identity so that every page can be customized for that user as she navigates through parts of the site.

Philosophy of Data Preservation

Data can be preserved whenever pages are generated dynamically. You've seen some small examples where SSIs were utilized to generate HTML from within a static HTML file. You also know how to generate HTML from a CGI script. Whenever you have the opportunity to execute a program on the server, you can generate HTML or HTTP-specific codes to preserve data.

Preserving data is a commitment to keeping that data for the entire time your users remain in the environment. In some cases, users are even allowed to leave the site, shut down their machines, and return days later with the server still recognizing them. That requires an extreme case of data preservation, and enters the category of user profiling rather than just data preservation. Chapter 24, "User Profiles and Tracking," covers the details of user profiling in situations where the user's identity and "characteristic profile" need to be saved for future use.

Any system that maintains information about the users who visit a site deploys some kind of data preservation technique. A situation where the data only needs to be preserved while users are in the environment is best solved by incorporating data into the URL of scripts, or by using hidden data in HTML forms. For example, a small Web shopping environment usually doesn't need to record every detail of what happens when the user collects items but never brings them to the "checkout counter" to purchase them.

Preserving data, therefore, can be split up into two essentially different styles:

The ability for the Web server to support users only for the current session. If a user leaves the site and returns, you don't reestablish the user with data collected from the previous visit.
The ability for the Web server to support users throughout the current session, and remember information about the users when they revisit the site.

Levels of Data Preservation

Imagine that visiting the Web server is the same as visiting a festival or outdoor fair. The fair has a perimeter where every 20 yards, a person stands to sell tickets that allow people to enter. If you pay, enter the fair, and later exit the perimeter, you still have a receipt for what you paid when you entered. If you decide to return later, you simply present your receipt to the person selling tickets and she lets you pass without buying another ticket. Your identity as "a paid customer" is maintained by a token of information you have in your possession.

On the other hand, consider the same outdoor fair without any tickets. Instead of tickets, the person at each entry point sells one-time access to the fair. You pay each time you want to enter. You can decide to leave the fair and come back, but even if it's after just a few minutes, the person at the entry point (even if it's the same as your original entry point) cannot allow you to reenter without paying the fee again.

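These two fairs correspond to the two styles of data preservation described in the preceding list, in reverse order: the fair with receipts is like a Web server that remembers users when they return, and the fair with one-time access is like a Web server that supports users only for the current session.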

If a Web server needs to maintain user information throughout the user's visit to the Web site (for example, in a shopping or chat environment), then it's important to find a way to maintain the user's identity without having to ask the user to repeat the process of "registering" on multiple pages.

Much of this discussion makes better sense when you look at real examples of sites that deal with the problem of preserving data. Both programming and design problems are at issue. The layout of a site depends greatly on how dynamic the transition between pages is, but the same holds true in the other direction. The technical limitations of what can and cannot be generated dynamically also influence the flow of the site, the "story board" a user follows to reach certain places in the site.

Earlier in this book, you can find several example applications that deploy some kind of data preservation technique. There isn't much mystery about how to preserve data with basic HTML tools and well-written scripts and programs.

The Role of Data Preservation

You should come away with a few key points. The main function of CGI scripts is to generate pages and perform system tasks; you know that. What you should add to this knowledge is that the execution of CGI scripts is the best (and usually the only) time you can construct pages that preserve data. This elevates the importance of CGI scripting enough to make it practically an art form. What CGI scripts can do is amazing, and the responsibility they bear in the development of Web-based applications has grown throughout this book.

The role that CGI scripts play in the development of applications for the Web is as crucial as the visual and aural aesthetics of the site. The art of CGI programming involves putting data through the Net and onto the user's browser. In the course of doing so, you have to build systems that get around the default limitations of HTTP. The Web server doesn't know how to relate a user to the page she's viewing, at least not in every Web server and client interaction.

Implementing Uses for Data Preservation

All this talk about data preservation makes us sound like digital naturalists. The aim of preserving information doesn't mean that all information generated by CGI scripts (or all input by users) is worth saving just for saving's sake.

To show a practical use of data preservation, let's enlist the help of Nina, our CGI programmer from Chapter 19.

Nina is working on an application where several pages need to be generated dynamically. She's working on creating a generic pre-search refinement tool, which allows users to pick a criterion to search upon, and then allows them to gradually refine the search so that a more exact match is made against the available data pool. For Nina's sample application, the search can be defined once (Step 1) and refined twice (Steps 2 and 3) before the user must redefine the search criteria (return to Step 1). See Chapter 21, "Recursive CGI," to learn how to modify Nina's sample application so that it accepts any number of search refinements.

For this sample application, we don't know (or care) what type of data source we're searching. We've been given a set of APIs for communicating with the data source:

@queryResults = sendQuery(queryString)
@displayReady = displayResults(@queryResults)
@queryResults = resendQuery(queryString, additionalCriteria)

These three functions are provided as part of the exercise. They're hypothetical functions, not part of the application proper, and they're meant to be "black boxes" with no visible internal specifications. Their effects, however, are plainly described:

sendQuery(): This function accepts a query for a search. It returns with an array of matches, each element containing the pertinent information about one match. If there are no matches, sendQuery() returns an undefined array.
displayResults(): This function is built to properly display the results of a query. The argument taken by displayResults() is the same type as the data returned from a sendQuery() function call.
resendQuery(): This function is almost redundant, but according to our hypothetical specification, the data source can be re-searched as long as the original query plus the additional criteria are passed separately. The internals of resendQuery() are optimized to promptly handle a refinement of the original query if the criteria are passed to the search engine as two units.

Nina figures that she only has to preserve two pieces of data: the original search criteria and a list of any additional search criteria offered by the user.

She chooses to take advantage of an HTML form because she can hide data as necessary with a hidden input type, and also because the type of data requested is textual in nature (meaning that text boxes, and therefore an HTML form, are required).

The HTML entry point to her generic searching tool is built as shown in Listing 20.3.

Listing 20.3 searchEntry.html: The Entry Point to the Generic Searching Tool, search.cgi

<html>
<body bgcolor=ffffff>
<title>Generic Searching Refinement Tool (three steps)</title>
<h1>Generic Searching Refinement Tool, Step 1</h1>
<form method="post" action="/cgi-bin/search.cgi">
What is your initial query string:
<input name="query">
<input name="next" type="hidden" value="2">
<p>
<input type="submit" value="Begin search">
</form>
</html>

This is Nina's HTML for the entry point to the searching tool. Nina decides, for this application, not to mix static HTML with dynamically generated HTML, and instead to have the application generate all of its pages dynamically (see Fig. 20.3).

Figure 20.3: This is the entry point for Nina's generic search tool. This shows how data preservation is used with a generic searching tool system.

Her search.cgi script, as seen in Listing 20.4, grows as she develops more features in this application. For the initial step, however, the CGI script is written this way.

Listing 20.4 search.cgi: Nina's Searching Tool

#!/usr/local/bin/perl

@INC = ('../lib', @INC);
require 'web.pl';

%Form = &getStdin;

&beginHTML;

if ($Form{'next'} == 2 ) {
     # handle the second step
     @queryResults = &sendQuery($Form{'query'});
}
elsif ($Form{'next'} == 3) {
     # handle the third step
     @queryResults = &resendQuery($Form{'originalQuery'}, $Form{'query'});
}
else
{
     print <<"End Of Entry";
<html>
<title>Generic Searching Refinement Tool (three steps)</title>
<body bgcolor=ffffff>
<h1>Generic Searching Refinement Tool, Step 1</h1>
<form method="post" action="/cgi-bin/search.cgi">
What is your initial query string:
<input name="query">
<input type="hidden" name="next" value=2>
<p>
<input type="submit" value="Begin search">
</form>
</html>
End Of Entry
exit(0);
}
@results = &displayResults(@queryResults);

push(@criteria, $Form{'originalQuery'}) if $Form{'originalQuery'};
push(@criteria, $Form{'query'}) if $Form{'query'};

print "<html>\n",
      "<title>Results from Query</title>\n",
      "<body bgcolor=ffffff>\n",
      "<h1>The results from the query:</h1>\n",
      "<h3>Words that begin with:</h3>\n",
      join(",  ", @criteria), 
      "<hr>\n",
       @results;
 

$Form{'next'}++ if $Form{'next'};
if ( $Form{'next'} < 4 ) {
   print <<"End of Refinement";
<form method="post" action="/cgi-bin/search.cgi">
What is your initial query string:
<input name="query">
<input type="hidden" name="next" value="$Form{'next'}">
<p>
<input type="hidden" name="originalQuery" value="$Form{'query'}">
<input type="submit" value="Begin search">
</form>
End of Refinement
}
else
{
   print <<"End of TryAgain";
You're at the end of the refinement chain.  Do you wish to restart
the search with a new initial search entry?
<a href="/cgi-bin/search.cgi">Yes</a> or
<a href="/index.html">No</a>
End of TryAgain
}
print "</html>\n";
exit(0);




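sub sendQuery {
   local($arg)=@_;
   local(@results);

   # keep only word characters so the argument is safe to hand to the shell
   $arg =~ s/\W//g;
   # backticks run the UNIX "look" command and capture its output lines
   @results = `look $arg`;
   return @results;
}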

sub resendQuery {
   local($arg, $arg2) = @_;
   local(@results);

   @results = (&sendQuery($arg), &sendQuery($arg2));
   return @results;
}
 

sub displayResults {

   local(@res) = @_;

   return ("<p>", join(",  ", @res), "<p>");
}

As the sequence continues, the user can see the output from the search, where the search criteria are passed to the next page (see Fig. 20.4). The result of the first search shows what words started with the expression entered.

Figure 20.4: The output from the first search includes a box to refine the search.

The next stage is to ask the user for more information to simulate the refinement of the search.

After the second refinement, the last page of output generated by the script shows a link to ask the user if she wants to go back and try a new sequence or stop searching (see Fig. 20.5).

Figure 20.5: This is the final stage of the searching refinement tool. If the user wants to start over, she would click on Yes.

Let's take a detailed look at each section of Nina's script:

#!/usr/local/bin/perl

@INC = ('../lib', @INC);
require 'web.pl';

%Form = &getStdin;

&beginHTML;

That's a standard linkage for Perl scripts.

The HTML form asking for the first query is State 1. When a user makes her first query, she enters State 2. You can set the state to 2 by using a hidden input type while asking for the initial search query:

if ($Form{'next'} == 2 ) {
     # handle the second step
     @queryResults = sendQuery($Form{'query'});
}

If a query has already been processed, then the user is in State 3 and is resending a query for processing. The original query and the new query are combined by the resendQuery() function:

elsif ($Form{'next'} == 3) {
     # handle the third step
     @queryResults = resendQuery($Form{'originalQuery'}, $Form{'query'});
}

If it's unclear what state the user's in, then obviously there's no preserved information; in this case, the user starts from scratch. The first page a user sees with this application is a welcome message and a request to enter the first query:

else
{
     print <<"End Of Entry";
<html>
<title>Generic Searching Refinement Tool (three steps)</title>
<h1>Generic Searching Refinement Tool, Step 1</h1>
<form method="post" action="/cgi-bin/search.cgi">
<input type="hidden" name="next" value=2>
What is your initial query string:
<input name="query">
<p>
<input type="submit" value="Begin search">
</form>
</html>
End Of Entry

That's all the program needs to do, so let's end it:

exit(0);
}

We exit the script unless the current state is one where a query has been made; in that case, it's time to display the results. The internal format of the query results is unknown to us, so we pass the query results to the displayResults() function so that they are "HTMLized" and each line is stored in the @results array:

@results = &displayResults(@queryResults);

push(@criteria, $Form{'originalQuery'}) if $Form{'originalQuery'};
push(@criteria, $Form{'query'}) if $Form{'query'};

Next, start the CGI sandwich:

print "<html>\n",
      "<title>Results from Query</title>\n",
      "<body bgcolor=ffffff>\n",
      "<h1>The results from the query:</h1>\n",
      "<h3>Words that begin with:</h3>\n",
      join(",  ", @criteria), 
      "<hr>\n",
       @results;

If the application was in a state specified by the $Form{'next'} variable, then increment the variable by one:

$Form{'next'}++ if $Form{'next'};

If the new state requires additional search criteria, then build the HTML form to ask for additional information. Remember to preserve the old query as a hidden input type:

if ( $Form{'next'} < 4 ) {
   print <<"End of Refinement";
<form method="post" action="/cgi-bin/search.cgi">
What is your initial query string:
<input type="hidden" name="next" value=$Form{'next'}>
<input name="query">
<p>
<input type="hidden" name="originalQuery" value="$Form{'query'}">
<input type="submit" value="Begin search">
</form>
End of Refinement
}

If we've run through all the possible states allowed by this application, we then offer the user two links, one to return to State 1 to make a totally new query and the other to exit to the home page:

else
{
   print <<"End of TryAgain";
You're at the end of the refinement chain.  Do you wish to restart
the search with a new initial search entry?
<a href="/cgi-bin/search.cgi">Yes</a> or
<a href="/index.html">No</a>
End of TryAgain
}
print "</html>\n";
exit(0);

The preceding example used the UNIX command "look" to generate output on a "search."
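The state handling walked through in this section can also be summarized as one small function. The sketch below is a restatement for illustration, not part of Nina's script: the name plan_step and the shape of its return value are our own assumptions. Given the preserved "next" value, it reports which query function applies, which state to preserve in the following page, and whether another refinement form should be shown.

```perl
use strict;
use warnings;

# Hypothetical helper summarizing the states of search.cgi.
sub plan_step {
    my ($next) = @_;
    # Anything that is not a plain number means no state was preserved.
    $next = 0 unless defined $next && $next =~ /^\d+$/;
    # Only states 2 and 3 run a query; everything else shows the entry form.
    return ('action' => 'entry') unless $next == 2 || $next == 3;
    my $following = $next + 1;
    return (
        'action'    => ($next == 2 ? 'sendQuery' : 'resendQuery'),
        'next'      => $following,
        'ask_again' => ($following < 4 ? 1 : 0),
    );
}

my %plan = plan_step(2);
print "$plan{'action'}, then state $plan{'next'}\n";    # prints "sendQuery, then state 3"
```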
