Chapter 22 How to Build HTML On the Fly

Building HTML On the Fly
The Real Estate Project
Project Implementation
Other Lessons

This chapter is about building HTML on the fly. We probably could have talked about dynamic HTML before recursive objects and data preservation, because the topic of generating HTML is a little easier than the higher-level issues of recursive CGI programming. But the techniques of recursive CGI programming and data preservation lend themselves well to the simple task of generating HTML on the fly.

Generating HTML on the fly is really just about sending HTML to the browser from a CGI script or SSI. By default, there isn't anything dynamic about generating HTML on the fly. The HTML pages are generated via the execution of CGI scripts on the Web server. So they are dynamic in the sense that they are built on the spur of the moment, but the content of the pages isn't necessarily dynamic and changing from time to time.

This chapter covers several examples of generating HTML on the fly. We use the term on the fly interchangeably with generated dynamically and dynamically; the distinction among them is too small to matter. The authors prefer generated dynamically and dynamically.

Building HTML On the Fly

We consider on the fly to mean at the moment when the page is created. Pages are created on the fly, or dynamically, when a process on the Web server generates output that a Web browser can interpret as a document. In our case, the document is an HTML document, so the output of the Web server must be in HTML.

Output from Programs

A Web server generates output in several ways. The nature of a Web server is to serve files, that is, to make information available to the user who is requesting that information. Most of the documents currently on the Web are static HTML documents. A file exists on the Web server. When the Web server honors a request for the document, the file is read and transferred to the client as requested.

The same idea holds true with respect to dynamic HTML and building HTML on the fly. Instead of a flat file or document that is read by the server when requested, a page generated dynamically is considered to be the output of a program running on the Web server. More specifically, the program is started at a moment determined by the actions of the user.

The actions of users in the Web environment involve the manipulation of buttons, links, and the like. The actions of the user that cause pages to be generated dynamically usually involve the HTML form and, moreover, require some kind of script or program to execute on the Web server.

The programs that execute on the Web server, generating output that is interpreted as HTML, are what we are calling CGI scripts. We have already given you numerous examples of CGI scripts, but we haven't related them to the process of generating HTML on the fly in a formal way.

The formal specification of how CGI scripts generate HTML is partially documented within the configuration of the Web server. Documents pouring out of the Web server can be formatted as any of the many MIME types. HTML documents are MIME type text/html. Other documents have different MIME type labels associated with them.

To generate HTML on the fly, you start with a program or script that you write on the Web server. Most of the examples for CGI scripts so far have been in the programming language Perl. The authors have been using Perl as the language for implementing programs that generate HTML on the fly.

The process of generating HTML on the fly again begins with the CGI program. When the program is executed, it generates output. In other words, if you were logged into the Web server and could type the name of the CGI script at the shell prompt, the output of the CGI script would appear. The output of the CGI script is written to stdout.

The CGI script is an executable program on the Web server that is going to generate output. For now, the only kind of output that you should be concerned about is HTML. So the script is going to generate an HTML page. The data generated by the CGI script is transmitted to the Web browser over the network. The Web browser accepts the data and interprets it according to the kind of data it is. For example, if you point your browser to a GIF image

http://some.server.com/graphics/boat.gif

you see an image. The MIME type of the document is image/gif. The Web server determines this MIME type from the file extension and reports it to the browser in the Content-type header that precedes the data.

The contents of the file boat.gif are transmitted to the browser. The Web server is aware of several MIME types.

Note

You should find the mime.types file located in the conf directory for NCSA Web servers. Netscape Commerce servers store the mime.types file in the config directory.

The MIME type of the "document" precedes the transfer of the document itself. The MIME type tells the Web browser what type of document to expect and how to interpret the contents.
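To make the extension-to-type mapping concrete, here is a minimal sketch of how a table in mime.types format could be read and consulted. The table excerpt, the subroutine names parse_mime_types and mime_type_for, and the fallback type are our own assumptions for illustration; a real server's table is much longer and its lookup code is not shown in this chapter.

```perl
use strict;
use warnings;

# Illustrative excerpt in mime.types format: a MIME type followed by the
# file extensions mapped to it. A real server's file has many more entries.
my $MIME_TABLE = <<'END_TABLE';
# type        extensions
text/html     html htm
text/plain    txt
image/gif     gif
image/jpeg    jpeg jpg jpe
END_TABLE

# Build an extension => MIME type hash from mime.types-style text.
sub parse_mime_types {
    my ($text) = @_;
    my %type_for;
    foreach my $line (split /\n/, $text) {
        $line =~ s/#.*//;                 # strip comments
        my ($type, @exts) = split ' ', $line;
        next unless defined $type;        # skip blank lines
        $type_for{lc $_} = $type foreach @exts;
    }
    return %type_for;
}

# Look up the MIME type for a file name by its extension.
sub mime_type_for {
    my ($file, %type_for) = @_;
    my ($ext) = $file =~ /\.([^.\/]+)$/;
    return defined $ext && $type_for{lc $ext}
         ? $type_for{lc $ext}
         : 'application/octet-stream';    # assumed fallback for unknown types
}

my %types = parse_mime_types($MIME_TABLE);
print mime_type_for('/graphics/boat.gif', %types), "\n";   # image/gif
```

The lookup is case-insensitive on the extension, so boat.GIF and boat.gif map to the same type.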

When we are writing CGI scripts to generate HTML on the fly, we need to generate output that the Web browser eventually interprets as HTML. The MIME type for an HTML document (plain-vanilla HTML; other kinds of HTML exist) is text/html.

The first thing that our CGI script will do to generate HTML on the fly is output the following line:

Content-type: text/html

If this output is read by the browser, any subsequent data sent during the connection is handled as an HTML document.

The following outputs the MIME type in Perl:

#!/usr/local/bin/perl
print "Content-type: text/html\n\n";

The following outputs the MIME type in C:

#include <stdio.h>

int main(void) {
  /* The blank line after the header ends the header block. */
  printf("Content-type: text/html\n\n");
  return 0;
}

In summary, the CGI script generates output and the output is read in by the Web browser. The first thing that the Web browser should see is the MIME type of the data to follow.

In the examples, we've created a subroutine called beginHTML that does the work of the following:

print "Content-type: text/html\n\n";

Most of the examples that we have generate HTML, so we added the subroutine beginHTML to our library of subroutines, web.pl.
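The library web.pl itself is not reproduced in this chapter, so the following is only a hypothetical stand-in that shows what a subroutine like beginHTML might do. It is written in Perl 5 style (use strict, my) and returns the header as well as printing it, which is our own choice to make the behavior easy to check.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the beginHTML subroutine in web.pl; the real
# library code is not shown in this chapter.
sub beginHTML {
    # The header line, then a blank line that ends the header block.
    my $header = "Content-type: text/html\n\n";
    print $header;
    return $header;
}

beginHTML();
print "<html>\n<title>Hello</title>\n<h1>Hello from a CGI script</h1>\n</html>\n";
```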

Now our CGI scripts start as follows:

#!/usr/local/bin/perl
require './web.pl';
&beginHTML;

As far as generating HTML on the fly is concerned, the use of

require './web.pl';

is specific to our examples. When the Web server executes the CGI script, the "current working directory" of the script is the directory in which the script exists.

Where Scripts Exist

The cgi-bin directory is the root directory for all the CGI scripts for the Web server. In URL terms, http://some.host.com/cgi-bin/foo.cgi means that the path /cgi-bin/foo.cgi refers to a file, foo.cgi, located in the directory cgi-bin.

Now consider the URL http://some.host.com/document.html. The path /document.html refers to a document in the document root of the Web server. The leading slash (/) signifies the document root.

The leading slash in /cgi-bin/foo.cgi, however, does not refer to the document root, but to the server root of the Web server.

In our CGI scripts, we have used the following standard linkage:

#!/usr/local/bin/perl
require './web.pl';

Because the current working directory of a CGI script is the directory in which the CGI is located, ./web.pl refers to the file web.pl, located in the same directory as the CGI script itself. If we say require "web.pl" (we're not explicit about the location), the Perl script uses the built-in array @INC to find the library referenced as web.pl from the search path of @INC.

If we move web.pl into a directory specified by @INC, we could just say

require "web.pl";

or

require 'web.pl';

Our examples on the CD that accompanies this book use a custom library that we have been adding to since the beginning of the book; we placed it in the /cgi-bin directory so that the CGIs that we write can reference it.
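The search that require performs can be imitated in a few lines. The sketch below is an illustration under our own assumptions: find_library is a hypothetical helper (not part of Perl or of web.pl), and ../lib is just an example directory. It walks a list of directories in order and reports the first one that holds the named file.

```perl
use strict;
use warnings;

# Return the full path of the first copy of $name found in the given
# directories, searched in order, or undef if no directory has it.
# This mimics how require scans @INC for a bare file name.
sub find_library {
    my ($name, @dirs) = @_;
    foreach my $dir (@dirs) {
        next if ref $dir;                # skip code-ref hooks some @INC lists hold
        my $path = "$dir/$name";
        return $path if -f $path;
    }
    return;
}

# Put a library directory at the front of the search path.
# '../lib' is an assumed location used only for illustration.
unshift @INC, '../lib';

my $found = find_library('web.pl', @INC);
print defined $found ? "web.pl found at $found\n"
                     : "web.pl is not in any \@INC directory\n";
```

Because directories are searched in order, a directory added with unshift wins over a copy of the same file later in @INC.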

Back to the Output

The output from our CGI scripts starts with the MIME type of the data that is about to follow:

#!/usr/local/bin/perl
require './web.pl';
&beginHTML;

From this point on, anything that is sent to stdout (output) is treated as HTML.

We continue our script as follows:

print "<html>\n",
   "<title>This is a dynamically generated page</title>\n";

These lines print the proper HTML to declare a block of HTML code, and the TITLE tag defines the text that is to appear in the window's title bar. (In most browsers, the string of text between the <TITLE> and </TITLE> tags appears as a title-bar title.)

We continue with the following:

print "<H1>Welcome to the world of Dynamic HTML</H1>\n";

We could go on, but the message so far is that the output that you send to the browser via a program (a CGI script) is interpreted as data to be processed according to the MIME type of the data.

Generating HTML on the fly, then, involves the following steps:

1. Cause a script or program to be invoked by the Web server.
2. Generate output.

Step 2 has three parts:

Generate any optional output that precedes the MIME type declaration, such as headers that set HTTP cookies.
Generate output for the MIME type of the data to be sent next.
Send the data in the format specified by the MIME type.

Step 1 starts when the user does something to cause the script to be invoked by the Web server. The Web-server process is the parent process of the CGI script, using the operating system to load and execute the CGI script. When the CGI script is executing, the script begins to generate output.

Step 2 is split into three main parts. The first part generates optional output, such as Set-Cookie headers that set HTTP-cookie values. (The browser returns cookie values along with its request; a CGI script reads them from the HTTP_COOKIE environment variable rather than asking for them in its output.) As the technology of HTTP and browsers improves and expands, more details could occur at this initial step of data output. For now, the practical uses of HTTP cookies are just becoming apparent.

The next part of Step 2 sends the MIME type to the browser. This output, generated by the CGI script, is a one-line text message in the following format:

Content-type: XXX

XXX is the intended MIME type. To generate HTML on the fly, the MIME type is text/html.

The third part of Step 2 sends the data itself. A blank line (the last header line is followed by two new-line characters) separates the "preamble" of the document (the optional HTTP cookie headers and the MIME-type declaration) from the body of the document. By body of the document, we mean the entire HTML intended to be sent out, not just the data enclosed within the <BODY> and </BODY> HTML tags.
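The three parts of Step 2 can be put together in one small routine. This is a sketch rather than code from web.pl: build_response is a name we made up, and the cookie name and value are invented for the example.

```perl
use strict;
use warnings;

# Assemble a complete CGI response: optional headers, the Content-type
# header, a blank line, then the document itself.
sub build_response {
    my ($mime_type, $document, @extra_headers) = @_;
    my $out = '';
    $out .= "$_\n" foreach @extra_headers;      # part 1: optional headers
    $out .= "Content-type: $mime_type\n";       # part 2: the MIME type
    $out .= "\n";                               # blank line ends the preamble
    $out .= $document;                          # part 3: the data itself
    return $out;
}

my $page = "<html>\n"
         . "<title>This is a dynamically generated page</title>\n"
         . "<h1>Welcome to the world of Dynamic HTML</h1>\n"
         . "</html>\n";

# The cookie name and value are made up for illustration.
print build_response('text/html', $page, 'Set-Cookie: visitor=12345');
```

Returning the response as a string, instead of printing piece by piece, keeps the ordering of the three parts in one place.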

These introductory words about the process of generating HTML on the fly are meant to illustrate the importance of relating the MIME type to the data that is being sent to the browser. Most of the examples and applications in this book that deal with generating pages send the MIME type text/html, because most of the pages generated by the applications in the book are written in HTML.

The Real Estate Project

Our fictional Web development company from Chapter 19, "How to Build Pages On the Fly," is hard at work devising a plan to develop a site for a real estate company. One of the issues is providing users of the site a way to browse homes and property that are for sale. The Web company has assigned Nina, the CGI programmer, the task of developing the home- and property-searching tool. The function specified is to allow a user to locate a home or property based on criteria such as price, location, and size of the home.

Data Formatting

The data source is important for determining the scope of functionality. The format of data is not as important as the kind of data. For our real estate project, we have been given a schema of the information for all the homes and property available. The extent of the data and its relation to other data are part of the schema.

For each property and home, we have been told that the following data is available:

Address
City
ZipCode
Price
Property Type (home or property)
State (for example, is the home new?)
Size (square feet)
Acreage
Availability
FlavorText (a paragraph that describes the property, including keywords for features of the property, such as fireplace, deck, and garage)

The data has been placed in a table. We're using a flat file on the Web server to store the information, because this phase of the project is only the first phase. In other chapters, we show you how to integrate a true relational database with a Web server. Storing data in a database has significant advantages, but for our proof of concept project with the real estate company, we are going to use a flat file with tab delimiters between fields.

The types of the elements are as follows:

Address: A variable-length text string (at most, 255 characters).
City: A variable-length text string (at most, 50 characters).
ZipCode: A variable-length text string (at most, 20 characters).
Price: A large cardinal number (such as 129000 for a home that costs $129,000).
Type of Property: A fixed-length string that contains a keyword (such as HOME, LAND, CONDO, and DUPLEX). If the property is under construction, we'll precede the type with the string PARTIAL. A home under construction, for example, will have the type PARTIAL HOME. We will also allow the use of CUSTOM for a property type when the property's type is unknown.
State of Property: For new properties, the keyword NEW will be used. If the property is old, the age of the property (the home, not the land) is specified as a floating-point number. The number 1.2 will be interpreted as a property (home, condo, or duplex) that is 1.2 years old.
Size of Property: A cardinal number, such as 1200. The size of the property will be in either of two formats. If the type of property is HOME, 1200 will be interpreted as 1,200 square feet. If the type of property is DUPLEX, 1200 will be interpreted as 1,200 square feet per unit.
Property Acreage: All property is built on a piece of land, and the total acreage of the land parcel is specified as a floating-point number. The number 2.4 will be interpreted as 2.4 acres.
FlavorText describing property: A variable-length text string (at most, 512 characters).

The format of the data file is a list of lines. Each line has ten fields. The fields are separated by tab characters. The order of the fields for each row (record) is Address, City, ZipCode, Price, Type, State, Size, Acreage, Availability, and FlavorText.
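As a sketch of how one record in this format might be read, the routine below splits a tab-delimited line into a hash keyed by field name. The field names and the ten-field order are an assumption taken from the index table at the end of Listing 22.2, parse_record is a hypothetical helper, and the sample record is invented.

```perl
use strict;
use warnings;

# Field names in file order. The names and the ten-field order are an
# assumption based on the index table at the end of Listing 22.2.
my @FIELDS = qw(Address City ZipCode Price Type State Size Acreage
                Availability FlavorText);

# Split one tab-delimited line into a hash keyed by field name.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my @values = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    my %record;
    @record{@FIELDS} = @values;
    return %record;
}

# A made-up sample record, not real listing data.
my $sample = join("\t", '12 Elm Street', 'Springfield', '01101', 129000,
                  'HOME', 'NEW', 1200, 2.4, 'YES',
                  'Fireplace, deck, and two-car garage') . "\n";

my %home = parse_record($sample);
print "$home{Type} in $home{City}: \$$home{Price}, $home{Size} sq ft\n";
```

Naming the fields once, in one array, means a change in the file layout touches only that array.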

The Interface Design

Now that we have settled on the scope of the data available to search, we can define what inputs the user can make to request a query. Not all the fields are good candidates for the interface page. In other words, we are not going to use address as a search criterion, because we don't expect the user to know the addresses of the properties. Users are using the tool to find properties, and the X on the map that marks the location of a property is the last thing that users would know about the kind of property that they are looking for. We want to get users to find as many Xs (matches) on the map as possible, in accordance with the input search criteria that they provide.

City, ZipCode, Price, Type, State, Size, and Acreage are fields that can be used as search criteria for a property. Addresses and text descriptions will be stored as information to help flesh out the result page (when the search finds matches).

The art of CGI programming involves some forethought about how data will be used, especially when the data is central to the use of the site. The data on homes and properties is the object of the search. While we design the interface to search the data, we can look at our schema for help in deciding how to ask the user for input. Fields such as Price, Size, and Acreage contain values that are specific to each property, so we will allow users to enter those criteria directly (in text boxes). Fields such as Type and State are stored as keywords drawn from a small set, so many properties share the same value. We'll use a list box to allow users to select the type and state of the property.

For the fields that are property-specific (Price, Size, and Acreage), we will use two buttons: Less Than and Greater Than.

In a way, the HTML form built to ask the criteria for the search foreshadows the CGI script's logic. The questions asked in the HTML form set the direction and logic that the CGI script will use to find the best match. In the HTML form that queries for the real estate information, we are asking the user to supply a price range and to indicate whether the results should be less than or greater than that amount. In the CGI script, when we shuffle through all the data, finding matches, we will compare prices of property with the price entered by the user and determine whether the price is greater or less than the value that the user entered.
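The comparison that the form foreshadows might look like the following sketch. The subroutine name within_limit and the direction values 'less' and 'greater' are hypothetical; they stand for whatever the Less Than and Greater Than buttons would actually submit.

```perl
use strict;
use warnings;

# Decide whether a property's value satisfies the user's criterion.
# $direction is 'less' or 'greater'; these values are hypothetical names
# for what the two buttons would submit.
sub within_limit {
    my ($value, $target, $direction) = @_;
    return 1 unless defined $target && $target ne '';  # no criterion: always matches
    return $direction eq 'greater' ? $value >= $target
                                   : $value <= $target;
}

print within_limit(129000, 150000, 'less')    ? "match\n" : "no match\n";
print within_limit(129000, 150000, 'greater') ? "match\n" : "no match\n";
```

A criterion that the user leaves blank matches every property, so an empty text box never filters anything out.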

The Search Flow

Our search project consists of a cycle of page generation. The user who comes to the search page starts the process of searching by selecting the initial criteria. Our real estate search tool is going to be as interactive as possible. To realize this goal, we need to involve the user in the searching process and give him or her immediate, reversible results. After the initial criteria are selected and submitted, the CGI generates matches based on the criteria and responds by generating a new page: the result page.

The result page does two things. First, space on the page is reserved for the matches. The data returned by the CGI script is the purpose of the searching tool. Users will see what elements from the set of real estate data match their initial search criteria. The second part of the return page is a list of input options that the user can modify to refine the search. We are building a search tool that allows the user to locate information based on criteria, but we also should allow the user to backtrack to an earlier set of search criteria.

Suppose that a user chooses to find homes that cost less than $129,000 and that are built on at least 4 acres of land. The search might come back with only one home that meets those criteria. We are obligated to make the results "immediate and reversible," which means that the result page (with the one match) must display a new HTML form to allow the user to switch back to an older set of criteria.
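One way to make a search reversible is to carry the earlier criteria along in hidden form fields, so that the next form can restore them. The sketch below is an assumption about how that could be done, not the book's implementation; hidden_fields, escape_html, and the prev* field names are invented for the example.

```perl
use strict;
use warnings;

# Escape characters that have special meaning in HTML attribute values.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Turn the previous search criteria into hidden form fields so the next
# form can resubmit or restore them.
sub hidden_fields {
    my (%criteria) = @_;
    my $html = '';
    foreach my $name (sort keys %criteria) {
        $html .= sprintf qq{<input type="hidden" name="%s" value="%s">\n},
                         escape_html($name), escape_html($criteria{$name});
    }
    return $html;
}

# The prev* field names are invented for this example.
print hidden_fields(prevTargetPrice => 129000, prevTargetAcre => 4);
```

Escaping the values matters because they come from the user; an unescaped quotation mark would otherwise end the value attribute early.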

We are using the basic HTML form and CGI method to construct the real estate search tool, so we are limited in how "immediately" the result data can be generated. A CGI script must execute, apply its logic to the set of all real estate data, and come back with a set of matches. This isn't as instantaneous as we would like it to be. The flow requires the user to do the following things:

Enter search criteria.
Click a button to begin searching.
Wait for a new page to be generated.

With other dynamic page-generation techniques, such as JavaScript, you can build pages that update themselves and change as new data is entered without any page being generated via CGI scripts. We discuss this interesting method of page generation later in the chapter.

The flow of our searching tool is traditional for the first attempt. We start with a start-search HTML form; successive pages are generated via CGI scripts. Each of these pages has two parts: a result or response, and access to updating or altering the query. This repetitive process of searching lends itself well to data sources that are inherently relational.

If our real estate database is fully maintained, we will design the first dynamically created HTML form to be customized based on the initial search. If a user of that system comes to the first static HTML search form and indicates that he is looking for houses, the script will list all houses (see Fig. 22.1). The next HTML form won't ask again what kind of property the user is looking for (houses, condos, and so on); it already knows that houses are the key to the search and offers other query options, such as number of bedrooms and gas or electric heat. If the initial search criteria are not for houses, but land, the resulting pages will further refine the search criteria through additional questions about view and zoning (see Fig. 22.2).

Figure 22.1: The primary search page criteria.

Figure 22.2: After the first search, refinements to the search are requested.

Our data source doesn't suggest that we are capable of creating a relational searching tool. Our tool will generate result pages and new query options that match the initial search criteria.

The Result Page

The result of searching the real estate data produces pages that are split into two parts. The top part of the page has the result, neatly formatted, with all the elements of the unit or land parcel. If there are several matches, the matches are listed in order (best match to worst match). If the number of matches exceeds our usability specification (in other words, if there are too many matches to make the search useful), we will borrow ideas from BusyChat to allow the user to visit more matches.
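The chapter does not spell out how a long result set would be divided, so the following is only one possible sketch: it cuts the list of matches into fixed-size pages. The name page_of_results, the page size, and the placeholder matches are all assumptions; the page number could travel between requests as one more form field.

```perl
use strict;
use warnings;

# Return the slice of @$results that belongs on page $page (1-based),
# with at most $per_page matches per page.
sub page_of_results {
    my ($results, $page, $per_page) = @_;
    my $first = ($page - 1) * $per_page;
    return () if $first < 0 || $first > $#$results;
    my $last = $first + $per_page - 1;
    $last = $#$results if $last > $#$results;
    return @{$results}[$first .. $last];
}

my @matches = map { "match $_" } 1 .. 7;     # placeholder result set
print join(', ', page_of_results(\@matches, 2, 3)), "\n";
```

A page number past the end of the results yields an empty list, which the result page can report as "no more matches."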

We entered with an HTML form for setting criteria, and we want to remain in a cycle of pages that are generated by refining the criteria. Following links away from the result page for more results does not conform to our design model.

Imagine an auto race. You start the race in your car and make laps around the track. If, for some reason, your car needs repair, you return to the pit and get a new car. What happens on the track is the experience of racing; you can experience the race only on the track. You can use any car that you have available, but if you enter the race with a car, you have to remain in the race with a car.

The analogy is that you enter the search through an HTML form. You should remain in the environment of searching by continuing to refine the search by using new instances of the HTML form, whether generated dynamically after a search iteration or taken from the initial static HTML form. As for any Web component that you build, an exit link is always available to stop the search. But any link away from the searching tool that leads to a new instance of a page generated by the searching tool should come from the event of refining the search.

The reason for closing the searching cycle so tightly is that users can be fickle. Users notice subtle differences in what they see if they try different links in the site. If the search tool gives you the impression that you can customize the search without using the HTML form, users will be led away from the very tool that was designed to help them find what they are looking for. This is a finer concept of dynamic HTML generation that will come back in other applications. This issue plays out significantly in designing dynamic shopping areas.

Project Implementation

The process of creating the HTML form and CGI script to process queries is iterative. Seeing is believing, so we should create the HTML form and a skeleton CGI script to actually see the process of searching and generating results. We are going to disable the real searching functions of the CGI script until the flow is working properly. As you recall from earlier chapters, the responsibilities of CGI scripts are to generate content and to perform system tasks. Our HTML form and CGI skeletons will satisfy the first role of CGI scripts; we're writing only the code and HTML necessary to generate the HTML to define the search flow. When we are satisfied with how the interface and interactivity work, we can insert into the CGI script the modules of code that fetch information from our data source and process it for display.

The Real Estate Search HTML Form

Now we need to create the HTML form that allows the user to search the real estate data source. We'll place those fields into the HTML form (the usual things, such as HTML and TITLE tags, are being left out of the skeleton).

Our plan is to start the searching flow with questions about the type of property, such as new or old construction (see Listing 22.1 and Fig. 22.3). The remaining search criteria will be built into the result pages.

Figure 22.3: Real estate can be searched by type and new/old construction.

Listing 22.1 RealEstate.html-Real Estate Searching Interface

<FORM METHOD="POST" ACTION="/cgi-bin/real-estate/search.cgi">
Type of Property:
<P>
<select name="propertyType">
<option value="HOME">Homes
<option value="CONDO">Condos
<option value="DUPLEX">Duplex
<option value="LAND">Land Parcels
</select>
<P>
New Construction: <INPUT name="newConstruction" value=1 type="radio"> Yes
<input name="newConstruction" value=0 type="radio"> No
<P>

<input type="submit" value="Start Search"> <BR>
<input type="reset" value="Reset">
</FORM>

Listing 22.2 shows the code for the real estate searching tool.

Listing 22.2 realSearch.cgi-The Real Estate Searching Tool

#!/usr/local/bin/perl

require '../lib/web.pl';

%Form = &getStdin;
&beginHTML;

&setUpGlobals;
&setUpPage;

$searchCriteria = &buildSearch(%Form);
@theResults     = &genericSearch($searchCriteria);

&displayForm(%Form);  
&displayResults(@theResults);
# &displayProlog;

exit(0);

sub genericSearch {
    local($searchParameters) = @_;
    local(@dataSet, $element, @match, @normalizedResults);

    @dataSet = &grabData;

    foreach $element (@dataSet) {
       if (&compareData($searchParameters, $element)) {
             push(@match, $element);
       }
    }
    @normalizedResults = &parseResult(@match);
    return @normalizedResults;
}


sub displayResults {
     local(@normalizedResults) = @_;
 
     &resultSummary(@normalizedResults);
     foreach $resultItem (@normalizedResults) {
          &constructResult($resultItem);
     }
}

sub resultSummary 
{
  local(@theResults) = @_;

  print "There were ", $#theResults+1, " entries in the result set<P>\n";
}


sub constructResult 
{
   local(@res);
   @res = split(/\t/, $_[0]);
   print "<table border=4>\n",
    &row("Type of property", $res[$propType]),
    &row("Address", $res[$propAddr]),
    &row("City", $res[$propCity]),
    &row("Zip", $res[$propZip]),
    &row("Cost", "\$ $res[$propPrice]"),
    &row("Construction", $res[$propState]),
    &row("Square Feet", $res[$propArea]),
    &row("Acreage", sprintf("%0.02f", $res[$propAcre])),
    &row("Available", $res[$propAvail]),
    &row("Description", $res[$propFlav]),
    "</table>\n";
}

sub row 
{
  local(@r) = @_;
  local($out, $col); 
  
  $out = " <tr>\n"; 
  foreach $col (@r) {
      $out .= " <td>\n";
      $out .= "   $col\n";
      $out .= " </td>\n";
  }
  $out .= " </tr>\n";

  return $out;
}

sub displayForm {

  local(%fdata) = @_;


print <<"endofForm";

<HTML>
<TITLE>Listing 1, Chapter 22 Programs for Webmaster Expert Solutions</TITLE>
<BODY BGCOLOR=FFFFFF>
<H1>Listing 1, Chapter 22</H1>


<FORM METHOD="POST" ACTION="/cgi-bin/realSearch.cgi">
Type of Property: $fdata{'propertyType'}
<p>
<input type="hidden" name="propertyType" value="$fdata{'propertyType'}">

<P>

New Construction: <INPUT name="newConstruction" value="NEW" type="radio"> Yes
<input name="newConstruction" value="OLD" type="radio" checked> No
<P>

Price:  Less than:
<input name="targetPrice" size=9>
<p>
Size:  Less than:
<input name="targetSize" size=5>
square feet
<p>
Land: Less than:
<input name="targetAcre" size=5>
acres
<p>

<input type="submit" value="Start Search"> <BR>
<input type="reset" value="Reset">
</FORM>


<HR>
endofForm

}



## functions

sub buildSearch 
{

   local(%formData) = @_;
   local($query);

   # propertyType
   # newConstruction

   $query =  "(\$elements[$propType]  =~ /$formData{'propertyType'}/) ";
   $query .= "&& (\$elements[$propState] =~ /$formData{'newConstruction'}/)";
   
   if ($formData{'targetPrice'}) {
      $query .= " && (\$elements[$propPrice] <= $formData{'targetPrice'}) ";
   }
   if ($formData{'targetSize'}) {
      $query .= " && (\$elements[$propArea] <= $formData{'targetSize'}) ";
   }
   if ($formData{'targetAcre'}) {
      $query .= " && (\$elements[$propAcre] <= $formData{'targetAcre'}) ";
   }

   return $query;
}

sub compareData
{
   local($q, $item) = @_;
   local(@elements, $x);
   @elements = split(/\t/, $item);
 
   eval "\$x = $q;";
   return $x;
}

sub grabData 
{
   local(@theData);

   # Return an empty list if the data file cannot be opened.
   open(IT, "< $PROP_DATAFILE") || return ();
   chomp(@theData = <IT>);
   close(IT);
  
   $TOO_LARGE = $#theData>0?int($#theData/2):0;

   return @theData;
}


sub parseResult
{
    local(@in) = @_;

    return @in;
}

sub setUpGlobals 
{
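  # Each line after __END__ holds a variable name, whitespace, and a value.
  while(<DATA>) {
    ($var, $val) = split(/\s+/, $_, 2);
    next unless $var =~ /^\w+$/;
    eval "\$$var = $val;";
  }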
}

sub setUpPage
{
   print "<body bgcolor=ffffff>\n";
}

__END__
PROP_DATAFILE    "/t2/home/jdw/bookweb/cgi-bin/properties.txt"
propAddr   0
propCity   1
propZip    2
propPrice  3
propType   4
propState  5
propArea   6
propAcre   7
propAvail  8
propFlav   9
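
Listing 22.2 builds a Perl expression out of the form values and runs it with eval, so a form value that contains Perl code would be executed on the server. As a hedged alternative, the sketch below compares the fields directly and never evaluates user input. It is not the authors' code: build_matcher and contains are names we made up, the field positions are copied from the index table above, and the two sample records are invented.

```perl
use strict;
use warnings;

# Field positions, copied from the index table that closes Listing 22.2.
my %INDEX = (propPrice => 3, propType => 4, propState => 5,
             propArea  => 6, propAcre => 7);

# True if $want is empty or occurs literally inside $text.
sub contains {
    my ($text, $want) = @_;
    return 1 unless defined $want && length $want;
    return index($text, $want) >= 0;
}

# Return a subroutine that tests one tab-delimited record against the
# form data. Form values are only compared, never executed as Perl.
sub build_matcher {
    my (%form) = @_;
    my %field_for = (targetPrice => 'propPrice',
                     targetSize  => 'propArea',
                     targetAcre  => 'propAcre');
    return sub {
        my @elements = split /\t/, $_[0];
        return 0 unless contains($elements[$INDEX{propType}],  $form{propertyType});
        return 0 unless contains($elements[$INDEX{propState}], $form{newConstruction});
        foreach my $key (keys %field_for) {
            my $limit = $form{$key};
            # Ignore limits that are missing or are not plain numbers.
            next unless defined $limit && $limit =~ /^\d+(?:\.\d+)?$/;
            return 0 unless $elements[$INDEX{$field_for{$key}}] <= $limit;
        }
        return 1;
    };
}

# Invented sample records in the ten-field layout.
my @records = (
    join("\t", '12 Elm Street', 'Springfield', '01101', 129000,
         'HOME', 'NEW', 1200, 2.4, 'YES', 'Deck'),
    join("\t", '9 Oak Avenue', 'Springfield', '01101', 250000,
         'HOME', 'NEW', 2400, 1.0, 'YES', 'Garage'),
);

my $matches = build_matcher(propertyType => 'HOME', newConstruction => 'NEW',
                            targetPrice  => 150000);
print scalar(grep { $matches->($_) } @records), " match(es)\n";
```

The matcher is an ordinary subroutine reference, so it can be handed to a search loop in place of the evaluated query string.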

Analyzing the Flow of the Search

We are developing a searching mechanism for finding homes from a flat-file data source, yet our application is built with several thoughts in mind.

First, we want the interface to the user to flow in a normalized manner. While searching, the user should see artifacts (buttons, list boxes) that connote a "searching tool" interface. In other words, the main function of the real estate search should be to find a piece of real estate that best matches the search criteria. On the other hand, we want to give the user other options if nothing appropriate can be found. Even if nothing matches the user's preferences, we need to devise a way to give the user something else to satisfy the act of searching. Instant gratification (a great deal of it) is the mainstay of CGI programming. We have to be prepared for the possibility that the search will come up empty, just as we are ready to allow the user to peruse several successful matches. The matches are searchable in a style that is regular and consistent with the design of the site. That is what we mean by a normalized manner.

A technological step is being made with the searching tool. When designing the CGI script to process the search, we should keep in mind that CGI scripts evolve just as Web sites do. We began the framework as a skeleton script and skeleton HTML search page. As the Webmaster who will develop similar applications, you should bear in mind that CGI programming for generating dynamic HTML is a cycle.

We start by looking at the data source. We devise a relational viewpoint to the data, if any can be found. For our example data set, we don't have any strong relationships among the fields to split them apart. In another situation, such as a university library, data sets on their own are microcosms of data. The book table, which stores all information about the books on the shelves, is in itself a database. But the library has patrons, and the relationships of books to patrons exist through an intermediary loans table. Books are loaned to patrons.

For our real estate application, we have objects such as houses, duplexes, condos, and land parcels. Each object has attributes, such as size, state of repair, and age. Our CGI script to implement the searching capabilities of the real estate data source must take into account the possibility of a new data-storage system. The function for retrieving the data should be opaque to the rest of the CGI script. The database searching paradigms of the application are influenced by data abstraction tendencies. CGI scripting isn't so abstract, since it must come up with HTML "on demand" and not leave that to the browser. CGI scripts don't output: "Hey, uh, show them a table of neat links that have a certain pattern." Databases are designed to allow such abstractions, given that those abstractions can be formulated in SQL or some other query language. The CGI script that utilizes any sort of database must connect the "world of data abstraction" (database theory) with a world built around a presentation language like HTML.

The functionality of the real estate application shows how the data used to solve the query can be interchanged with other data and still produce the same result: matches based on search criteria. Comparing the search criteria with the available data set is a section of the model that can be replaced by newer methods or methods that use a higher level of data organization. A database query to a relational database could expand the potential searching capabilities by exploring concepts of fuzzy matches and relations that are special to the data source.
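The idea that the data source can be swapped without disturbing the search can be sketched by passing both the source and the test as code references. This is an illustration under our own assumptions, not the structure of Listing 22.2: generic_search here is a simplified stand-in, and the records are invented.

```perl
use strict;
use warnings;

# Run a search against any data source. $fetch is a code reference that
# returns the records; $matches is a code reference that tests one record.
sub generic_search {
    my ($fetch, $matches) = @_;
    return grep { $matches->($_) } $fetch->();
}

# One possible source: an in-memory list standing in for the flat file.
# The records are invented sample data.
my $from_memory = sub {
    return ("HOME\t129000", "LAND\t45000", "HOME\t310000");
};

# A second source could read a file or query a database instead; only
# that subroutine would change, not generic_search.
my $is_home = sub { (split /\t/, $_[0])[0] eq 'HOME' };

my @found = generic_search($from_memory, $is_home);
print scalar(@found), " homes found\n";
```

Replacing $from_memory with a subroutine that reads the flat file, or one that queries a relational database, leaves the comparison and display code untouched.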

The issues we've covered so far concern the behind-the-scenes activity of dynamic page generation. The other exciting part of dynamic page generation is making the pages themselves. It almost seems that the larger the data source, the more generalized the CGI script must be to handle the presentation of information. We can explore this situation with a small example: the matrix builder.

The Matrix Builder

A matrix is a grid of information. Mathematical notation usually encloses the grid of numbers in brackets; in plain text we can draw it with vertical bars on each side, as follows:

| 1 4 5 |
| 3 4 7 |
| 0 2 2 |

This matrix is 3 by 3. We might want to build a CGI script that can display information in a matrix layout. We might start with a 3-by-3 matrix-display function, as in Listing 22.3.

Listing 22.3 display3by3Matrix.cgi: First Attempt at a Function to Display a 3-by-3 Matrix

#!/usr/local/bin/perl


@INC = ('../lib', @INC);
require 'web.pl';

&beginHTML('build 3by3 Matrix', 'bgcolor=ffffff');

# A 3-by-3 matrix has nine elements: 'A' through 'I'.
&displayMatrix3by3('A' .. 'I');

exit;


sub displayMatrix3by3 {
  local(@element) = @_;
  print "<table>\n",
     "<tr>\n",
      "<td> $element[0] </td>\n",
      "<td> $element[1] </td>\n",
      "<td> $element[2] </td>\n",
     "</tr>\n",
     "<tr>\n",
      "<td> $element[3] </td>\n",
      "<td> $element[4] </td>\n",
      "<td> $element[5] </td>\n",
     "</tr>\n",
     "<tr>\n",
      "<td> $element[6] </td>\n",
      "<td> $element[7] </td>\n",
      "<td> $element[8] </td>\n",
     "</tr>\n",
     "</table>\n";
}

The nine elements of a 3-by-3 matrix, passed as an array to displayMatrix3by3(), are displayed in table format (see Fig. 22.4).

Figure 22.4: Each cell (element) is printed out "by hand."

This function exaggerates the problems of hard-coding. A CGI script built from functions such as this one cannot adapt to the changes needed to meet new demands on the site. Functions that generate pages should follow the tool and filter philosophy, which is the root of scripting and programming in the UNIX environment: "Do one thing well." Programs and scripts that do one thing well can be connected together because the "stdout" of one program is the "stdin" of another, and so on. Similarly, functions that are generic and do one thing well can be reused. The following matrix-builder function (see Listing 22.4) is redesigned to handle a matrix of any shape (see Fig. 22.5).

Figure 22.5: The arguments to genericMatrix() are passed in the URL.

Listing 22.4 genericMatrix.cgi: Second Step to Refine the 3-by-3 Matrix Builder

#!/usr/local/bin/perl

@INC = ('../lib', @INC);
require 'web.pl';

%Form = &getStdin;

&beginHTML('generic matrix', 'bgcolor=ffffff');

&genericMatrix($Form{'rows'}, $Form{'cols'},
               $Form{'start'} .. $Form{'end'});
exit;


sub genericMatrix {
  local($rows, $cols, @elements) = @_;
  local($i, $j);
  local($x) = 0;    # index of the next element to display

  print "<table>\n";

  for($i=0;$i<$rows;$i++) {

    print "<tr>\n";

    for($j=0;$j<$cols;$j++) {
     print "<td>$elements[$x]</td> ";
     $x++;
    }

    print "</tr>\n";
  }
  print "</table>\n";

}

This function is generic; it can accept an array of elements of any size and display those elements in a matrix of any shape. Although we are oversimplifying the example, the point is that CGI scripts that generate pages need to be generic. Functions in CGI scripts should be abstract about the data they receive but explicit in how they present it. This abstraction makes them reusable in any situation where they are appropriate.
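
To make the reuse concrete, here is a minimal sketch of a variant that returns the HTML instead of printing it, so the same function can feed a page, a file, or another function. The name matrixToHTML is hypothetical, the empty-cell behavior for short lists is an assumption of this sketch, and it is written in modern Perl (my, use strict) rather than the Perl 4 style of the listings.

```perl
use strict;
use warnings;

# Hypothetical variant of the generic matrix builder: same nested loops,
# but it returns the HTML so the caller decides where it goes.
sub matrixToHTML {
    my ($rows, $cols, @elements) = @_;
    my $x    = 0;            # index of the next element to place
    my $html = "<table>\n";

    for my $i (1 .. $rows) {
        $html .= "<tr>";
        for my $j (1 .. $cols) {
            # Leave the cell empty if the caller supplied too few elements.
            my $cell = defined $elements[$x] ? $elements[$x] : '';
            $html .= "<td>$cell</td>";
            $x++;
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}

# The same ten elements, shaped two different ways.
print matrixToHTML(2, 5, 1 .. 10);
print matrixToHTML(5, 2, 1 .. 10);
```

Nothing in the function depends on the shape or the content of the data, which is what lets one function serve both calls.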

To refine the matrix example one step further, we can pass in the kind of display we want (see Listing 22.5).

Listing 22.5 genericMatrix2: The Final Version of a Generic Matrix Builder

#!/usr/local/bin/perl


@INC = ('../lib', @INC);
require 'web.pl';

%Form = &getStdin;
&beginHTML('Generic matrix2', 'bgcolor=ffffff');

print "Using TABLE<br>\n";

&genericMatrix2('table',$Form{'rows'}, $Form{'cols'}, 
                $Form{'start'} ..  $Form{'end'});

print "Using PRE<br>\n";


&genericMatrix2('pre',$Form{'rows'}, $Form{'cols'}, 
                $Form{'start'} ..  $Form{'end'});

exit;



sub genericMatrix2 {
 local($kind, $rows, $cols, @elements) = @_;
 &beginMatrix($kind);
 while ($rows > 0) {
   # splice() removes and returns the next $cols elements: one full row.
   &buildRow($kind, splice(@elements, 0, $cols));
   $rows--;
 }
 &endMatrix($kind);
}


sub beginMatrix {
   local($type) = $_[0];
   print "<$type>\n";
}

sub endMatrix {
   local($type) = $_[0];
   print "</$type>\n";
}

sub buildRow {
  local($type, @elements) = @_;

  if ($type =~ /table/i) {
      print "<tr>\n",
            "<td>\n",
            join("</td>\n<td>\n", @elements),
            "</td>\n",
            "</tr>\n";
  }
  elsif ($type =~ /pre/i) {
      print join("\t", @elements), "\n"; 
  }

}

Now our matrix builder can be useful for several kinds of display needs (see Fig. 22.6).

Figure 22.6: The generic function needs arguments for what kind of table to display.

We can define a beginMatrix() function to start the display, perhaps with a <TABLE> tag, but not necessarily; the matrix could just as well start with a <PRE> tag. The buildRow() function uses the next group of elements from @elements to construct a new row of the matrix. The endMatrix() function is almost a twin of beginMatrix(), ending the display with the matching closing tag.
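
The row-by-row consumption of @elements can be sketched on its own. The sketch below assumes that each row takes exactly $cols elements from the front of the list. The name matrixAs and the %rowFormat lookup table are hypothetical, and the code uses modern Perl (code references, my) rather than the style of Listing 22.5.

```perl
use strict;
use warnings;

# Hypothetical row formatters, one per kind of display.
my %rowFormat = (
    table => sub { "<tr><td>" . join("</td><td>", @_) . "</td></tr>\n" },
    pre   => sub { join("\t", @_) . "\n" },
);

# Build the whole matrix as a string: opening tag, one row per pass,
# closing tag.
sub matrixAs {
    my ($kind, $rows, $cols, @elements) = @_;
    my $format = $rowFormat{lc $kind} or die "unknown kind: $kind\n";
    my $out = "<$kind>\n";
    while ($rows-- > 0 && @elements) {
        # splice() removes and returns the next $cols elements (fewer if
        # the list runs short), so each pass consumes one row.
        $out .= $format->(splice(@elements, 0, $cols));
    }
    return $out . "</$kind>\n";
}

print matrixAs('pre', 2, 3, 'A' .. 'F');
```

Keeping the formatters in a lookup table is one possible design choice: supporting another kind of display would mean adding one entry rather than another elsif branch.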

Other Lessons

The matrix builder example shows that when we start writing CGI scripts to generate HTML on the fly, it becomes necessary to create subroutines and functions for tasks that come up repeatedly. We created the beginHTML function, for example, as a way to say, "The MIME type of the data following this is to be accepted as an HTML document."
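
As an illustration of what such a helper involves, here is a guess at a beginHTML-style function. It is not the version in web.pl, which may differ; the name beginHTMLText and the exact markup are assumptions made for this sketch.

```perl
use strict;
use warnings;

# A guess at what a beginHTML-style helper does; the version in web.pl
# may differ. It returns the text so the caller can print it.
sub beginHTMLText {
    my ($title, $bodyAttributes) = @_;
    $bodyAttributes = '' unless defined $bodyAttributes;
    my $body = length($bodyAttributes) ? "<body $bodyAttributes>" : "<body>";
    return "Content-type: text/html\n\n"      # MIME header, then a blank line
         . "<html>\n<head><title>$title</title></head>\n"
         . "$body\n";
}

print beginHTMLText('generic matrix', 'bgcolor=ffffff');
```

The blank line after the Content-type header matters: it marks the end of the headers, and everything after it is treated as the document itself.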

HTML that contains server-side includes (SSIs) is called parsed HTML, because the Web server reads each line of the document and checks for the tokens that make up the syntax of an SSI.

The HTML

<!--#exec cmd="/var/web/bin/thisProg.pl" -->

is almost like the HTML for a comment, as follows:

<!-- This is a comment -->

The process of generating HTML on the fly with SSIs is very similar to CGI scripting, except that the MIME type isn't sent to the browser for the output generated from the SSI.
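
A program run by an SSI can therefore be very small. The following sketch shows a hypothetical program of the kind the #exec directive might run; the function name fragment and the message it prints are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical program run by an SSI directive. It produces only an HTML
# fragment: no Content-type header and no <html> or <body> tags, because
# the surrounding document already supplies them.
sub fragment {
    my ($when) = @_;
    return "<p>This page was generated on $when.</p>\n";
}

print fragment(scalar localtime);
```

Compare this with a CGI script, which must print the Content-type header and a blank line before any HTML.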

Additionally, it's worth mentioning that SSIs cannot be embedded in the output of a CGI script. SSIs are detected when the Web server parses an HTML document, but output from a CGI script is not parsed by the Web server, so the server cannot sandwich the output of the page generated from the CGI script with the output gleaned from the SSIs.
