Chapter 6

Reducing Maintenance Costs with Server-Side Includes


The preceding chapters describe how to set up an effective HTML page of the kind often called a static page. Most sites are made up primarily of static pages. Some pages, on the other hand, need to be generated on the fly. These pages are best produced using the techniques introduced in Chapter 7, "Extending HTML's Capabilities with CGI."

Between static pages and pages generated on the fly lie pages that add some dynamic behavior to what is otherwise a static page. These pages are best built using server-side includes, or SSIs. While not all servers have SSIs enabled, most do, allowing the Web site developer to automate a number of burdensome maintenance tasks.

SSI Basics

In addition to the variety of HTML tags available, Web servers support a standard set of commands called server-side includes (SSIs). Here's how they work. Recall that HTML comments look like this:

<!--This is a comment-->

Server-side includes are embedded in comments like this:

<!--#command tag="value"-->

Realize that tag here has nothing to do with HTML tags. Instead, these tags carry information used by the command in performing its job.

Let's look at a couple of examples. The following line returns the date and time that a file was last modified:

<!--#echo var="LAST_MODIFIED"-->

The following line runs a CGI script and returns the output of the script:

<!--#exec cgi="/cgi-bin/dse/test.cgi"-->

Tip
Some site administrators disable #exec because that SSI can be used to attack the security of the site. Review the recommendations in Chapters 17, "How to Keep Portions of the Site Private," and 40, "Site Security," to decide whether SSIs in general, or #exec in particular, should be part of your site's security stance.

Functions

The general syntax for a server-side include is

<!--#command tag="value"-->

Each of the SSI functions, or commands, begins with a pound sign (#). Each parameter to the command, called a tag, finishes with an equal sign (=) and then takes a value. The following sections show the syntax and meaning of each of the SSI commands.
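
To make the syntax concrete, here is a minimal Python sketch that pulls the command and its tags out of a page. It is only an illustration of the comment format described above; the regular expressions and the find_directives name are assumptions made for this example, not the parser that any real server uses.

```python
import re

# Matches <!--#command tag="value" ...-->. The pattern is an illustrative
# assumption, not the grammar of any particular server.
DIRECTIVE = re.compile(r'<!--#(\w+)((?:\s+\w+="[^"]*")*)\s*-->')
TAG = re.compile(r'(\w+)="([^"]*)"')


def find_directives(html):
    """Return a list of (command, {tag: value}) pairs found in html."""
    found = []
    for match in DIRECTIVE.finditer(html):
        command = match.group(1)
        tags = dict(TAG.findall(match.group(2)))
        found.append((command, tags))
    return found


page = 'Last modified: <EM><!--#echo var="LAST_MODIFIED"--></EM>'
print(find_directives(page))
```

An ordinary comment such as <!--This is a comment--> has no pound sign after the opening characters, so the sketch skips it.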

#echo

#echo returns the value of a variable. The six variables specific to SSIs are DOCUMENT_NAME, DOCUMENT_URI, QUERY_STRING_UNESCAPED, DATE_LOCAL, DATE_GMT, and LAST_MODIFIED.

These variables, as well as the CGI variables (which also can be processed by #echo), are described later in this chapter.

Suppose that somewhere in your file you write the following:

Last modified: <EM><!--#echo var="LAST_MODIFIED"--></EM>

In that part of the document, you expect a result like this:

Last modified: Tuesday, 06-Feb-96 16:41:36 EST

Because the server fills in the date each time the page is requested, the line never has to be edited by hand when the page changes.

#include

The SSI #include is used to embed the contents of one file in another file. Here's an example:

<!--#include file="footer.txt"-->
</BODY>
</HTML>

In this case, the result resembles the following:

<H3>Comments to Author</H3>
<P>
<A HREF="mailto:bob@bobshomes.com">bob@bobshomes.com</A><BR>
<A HREF="http://www.xyz.com/BobsHomes">Bob Moore</A><BR>
Bob's Homes<BR>
1243 Smith Street<BR>
Anytown, USA<BR>
</P>
<P>
Phone: +1 800 555-1212 (24-hours) or FAX: +1 804 555-1234
</P>
</BODY>
</HTML>

#include serves the same function in HTML as #include does in languages like C and C++. It helps to make pages more modular and maintainable: you collect the items that need to appear on nearly every page into a small set of files, and then refer to the appropriate file instead of retyping its contents in each location.

Another tag that can be used with #include is virtual=. While file= is used to include a file that's in the same directory as the document, virtual= can reach any document within the server's document tree by its virtual path (the path portion of its URL).

In the above example, we could say:

<!--#include virtual="/BobsHomes/footer.txt"-->

Now, the footer file can be placed in the site's root directory, and can be accessed from any file, even if that file is deeper in the directory tree.
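
The difference between the two tags can be sketched in a few lines of Python. This is a simplified model under stated assumptions: resolve_include is a hypothetical helper, the document root /www/docs is made up for the example, and a real server applies its own rules (aliases, access checks) that are not modeled here.

```python
import posixpath


def resolve_include(doc_root, current_doc, tag, value):
    """Map an #include tag to a file system path (hypothetical helper).

    doc_root    -- directory that holds the site's documents
    current_doc -- URL path of the page being parsed, e.g. "/a/b/page.shtml"
    """
    current_dir = posixpath.dirname(current_doc)
    if tag == "file":
        # file= stays in the directory of the current document
        if value.startswith("/") or ".." in value.split("/"):
            raise ValueError("file= may not leave the current directory")
        url_path = posixpath.join(current_dir, value)
    elif tag == "virtual":
        # virtual= is a URL path; a leading slash means "from the document root"
        url_path = posixpath.normpath(posixpath.join(current_dir, value))
    else:
        raise ValueError("unknown tag: " + tag)
    return posixpath.join(doc_root, url_path.lstrip("/"))


page = "/BobsHomes/listings/house1.shtml"
print(resolve_include("/www/docs", page, "file", "footer.txt"))
print(resolve_include("/www/docs", page, "virtual", "/BobsHomes/footer.txt"))
```

With file=, the footer has to sit next to every page that uses it; with virtual=, one copy near the top of the tree serves pages at any depth.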

#exec

#exec can take either of two tags: cmd= or cgi=.

The command named in the cmd= tag is executed using /bin/sh. If the tag is cgi=, the server looks for the named file in the specified cgi-bin directory. If the script returns a Location header instead of output, the server constructs a link to that location. Otherwise, the output of the script is simply merged back into the HTML file.

Some servers (most notably Apache) have extended the #include virtual= semantics to cover what #exec does. With Apache, you can use the following:

<!--#include virtual="/cgi-bin/dse/test.cgi"-->

You even can add a query string to a CGI script called in this manner:

<!--#include virtual="/cgi-bin/dse/test.cgi?This is my query."-->

For more information about Apache, visit http://www.apache.org/.

#config

The #config command has three valid tags: errmsg=, timefmt=, and sizefmt=.

errmsg= is used to control what message is sent back to the client if an error occurs while parsing the document.

timefmt= gives the server a new format to use when providing dates. The formatting string comes from UNIX's strftime. Table 6.1 shows examples of this formatting.

Table 6.1  How to Use strftime Tokens in timefmt

strftime Token  Meaning                                           Example
%a              The abbreviated weekday name                      Sun for Sunday
%A              The full weekday name                             Sunday
%b              The abbreviated month name                        Oct for October
%B              The full month name                               October
%d              The day of the month as a decimal number (01 through 31)
%D              The date in mm/dd/yy format
%e              The day of the month in a two-digit field (1 through 31)
%H              The hour of the 24-hour clock as a decimal number (00 through 23)
%I              The hour of the 12-hour clock as a decimal number (01 through 12)
%j              The day of the year as a decimal number (001 through 366)
%m              The month of the year as a decimal number (01 through 12)
%M              The minutes of the hour as a decimal number (00 through 59)
%p              The local AM or PM string
%r              The 12-hour clock time in local AM/PM notation    10:24:58 AM
%S              The seconds of the minute as a decimal number (00 through 59)
%T              The 24-hour clock time in HH:MM:SS format         16:23:43
%U              The week of the year (00 through 53) with Sunday as the first day of the week
%w              The day of the week as a decimal number (0 through 6, with Sunday as 0)
%W              The week of the year (00 through 53) with Monday as the first day of the week
%y              The year of the century (00 through 99)           94 for 1994
%Y              The year as a decimal number                      1994
%Z              The time zone name (if one can be determined)     EST
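
Python's datetime.strftime accepts the same family of tokens, so it offers a quick way to preview a timefmt string before putting it on a page. The sketch below uses the modification time from the #echo example earlier in the chapter; the particular format strings are only examples, and the day and month names assume the default English (C) locale.

```python
from datetime import datetime

# The modification time shown in the earlier #echo example
stamp = datetime(1996, 2, 6, 16, 41, 36)

# A few candidate timefmt strings to preview
formats = ("%A, %d-%b-%y %H:%M:%S", "%m/%d/%y", "%j", "%I:%M %p", "%Y")

for fmt in formats:
    print(fmt, "->", stamp.strftime(fmt))
```

A string that looks right here can then be handed to the server with a directive of the form <!--#config timefmt="..."-->.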

sizefmt= determines the formatting to be used when displaying the size of a file. The choices are bytes, for a count in bytes, or abbrev, for an abbreviated size.

#fsize

#fsize takes either the file= or the virtual= tag, and returns the size of the specified file, formatted as set by #config sizefmt=.

#flastmod

#flastmod takes either the file= or the virtual= tag, and returns the most recent modification date of the specified file, formatted as set by #config timefmt=.

The Variables

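Six environment variables (not counting the CGI variables) are available to SSIs: DOCUMENT_NAME (the name of the current file), DOCUMENT_URI (the virtual path to the document), QUERY_STRING_UNESCAPED (the unescaped version of any query string sent by the client), DATE_LOCAL (the current date in the local time zone), DATE_GMT (the current date in Greenwich Mean Time), and LAST_MODIFIED (the date the current document was last modified).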

How (and Where) Server-Side Includes Work

In order to work, SSIs must be enabled at the server level. You may enable SSIs for the entire site or on a directory-by-directory basis. You may also configure the site to permit SSIs but prohibit the use of #exec. Finally, you may activate SSIs for all files or only for files with a special file extension (usually shtml).

The Quick Answer: How to Activate Server-Side Includes Processing

To configure an NCSA server for SSI processing, simply follow two steps (similar directives are available on other servers).

First, to activate SSIs server-wide, open the access.conf configuration file and put in the following directive:

Options Includes

To activate all SSIs except for exec, put in this directive:

Options IncludesNoExec

Note
To configure SSIs on a directory-by-directory basis, put the same directive in the .htaccess file of the directory where SSIs should be allowed.

Second, specify which files should be examined for SSIs. This examination process (called parsing) takes time since every line of the file must be examined for comments with the appropriate characters. Many Webmasters elect not to parse every file on the server. Instead, they specify that only files with names ending in shtml should be parsed. To do this, they use the following lines:

AddType text/html shtml
AddType text/x-server-parsed-html .shtml

They place these lines in the srm.conf configuration file.

The Long Answer: How Server-Side Includes Processing Works

Suppose that a Web server has been set up as described previously, with SSI processing allowed for any file with a name ending in shtml. When the server sees a GET request from a client, it looks to see if the requested file ends in shtml. If it does, the server examines the file for SSIs because of the file extension.

When it finds directives, it processes them, then sends the entire document back with Content-type set to text/html because the following line was placed in the srm.conf file:

AddType text/html shtml
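
The request-time behavior just described can be modeled with a short Python sketch. It is a toy, not a description of NCSA or Apache internals: the serve function, the PARSED_SUFFIX setting, and the date format are assumptions made for illustration, and only two directives are handled.

```python
import os
import re
import time

PARSED_SUFFIX = ".shtml"   # assumption: only files with this suffix are parsed
DIRECTIVE = re.compile(r'<!--#(\w+)\s+(\w+)="([^"]*)"\s*-->')


def serve(path):
    """Return (content_type, body) for a requested file (toy model only)."""
    with open(path, encoding="utf-8") as handle:
        body = handle.read()
    if not path.endswith(PARSED_SUFFIX):
        return "text/html", body          # static page: sent untouched

    def expand(match):
        command, tag, value = match.groups()
        if command == "echo" and tag == "var" and value == "LAST_MODIFIED":
            modified = time.localtime(os.path.getmtime(path))
            return time.strftime("%A, %d-%b-%y %H:%M:%S", modified)
        if command == "include" and tag == "file":
            included = os.path.join(os.path.dirname(path), value)
            with open(included, encoding="utf-8") as handle:
                return handle.read()
        return "[an error occurred while processing this directive]"

    # Parsed page: every directive is replaced before the page is sent
    return "text/html", DIRECTIVE.sub(expand, body)
```

The sketch also shows why parsing costs time: every parsed page is scanned from top to bottom on every request, whether or not it contains a directive.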

If the configuration files are not set up correctly, this process fails. When the person who maintains the Web site is not the same person as the one who maintains the server, miscommunication is common and the process is likely to fail. Remember that the Webmaster and server maintainer must agree on three things for SSI processing to work: whether SSIs are enabled for the directory in question, whether #exec is permitted, and which file extension marks a file for parsing.

Here's a procedure to test SSI to make sure it is working:

  1. Build a page with a simple SSI that does not require #exec. Consider the example in Listing 6.1:

    Listing 6.1  List61.htm: A Simple Page to Test SSI Function

    <HTML>
    <HEAD>
    <TITLE>SSI Test</TITLE>
    </HEAD>
    <BODY>
    <P>
    Here is an SSI:<BR>
    <!--#echo var="LAST_MODIFIED"-->
    </BODY>
    </HTML>



    Save the page with the proper suffix. On most servers it is shtml. For this example, use the name test.shtml.

  2. Access the test.shtml page and verify that the SSI runs. The page should produce something like this:

    Here is an SSI:
    Thur June 6 10:42:32 1996

    If the results don't contain a date, talk to the person maintaining the server. Make sure that you have the correct file extension and that the directory the page is in has been enabled for SSIs.

  3. Once the test page works correctly, add a known working script. Many servers have a test-cgi script in the cgi-bin directory, so you might add the following:

    Here is another SSI:<BR>
    <!--#exec cgi="/cgi-bin/test-cgi"-->

If this page works correctly, you see the following:

Here is another SSI:
CGI/1.0 test script report:
argc is 0, argv is .
SERVER_SOFTWARE = the name of your server software

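This goes on to list a dozen or more CGI variables. If the page doesn't look like this, but the SSI echo test has run successfully, check the following items: that #exec is enabled for the directory, that the path to the script is correct, and that the script has execute permission.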

The last double-check is not likely to be a problem for test-cgi, since the person who installed the server probably set up test-cgi and some other standard scripts in the cgi-bin directory. When you write your own CGI scripts, however, you must make sure that the scripts have execute permission. On UNIX, you can use telnet to log on to the server; then, enter the following:

cd /path/to/cgi-bin
chmod +x yourScript.cgi

After the server passes all three tests, you can begin to write your own scripts and execute them with SSIs. On servers like Apache, you also can run them using the #include directive as follows:

Here is yet another SSI:<BR>
<!--#include virtual="/cgi-bin/test-cgi?Here+is+a+query+string"-->

If the server supports a CGI #exec from the #include directive, it produces the following:

CGI/1.0 test script report:
argc is 5, argv is Here is a query string.
SERVER_SOFTWARE = the name of your server software

The ability to pass query strings to an executed CGI script simplifies the writing of certain kinds of scripts. You'll find an example of such a script at the end of this chapter.
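
The argc and argv values in the report follow from the CGI convention that a query string containing no equal sign is split into words at each plus sign. A minimal Python sketch of that decoding follows; query_to_argv is a name invented for this example.

```python
from urllib.parse import unquote


def query_to_argv(query_string):
    """Split an index-style query string into command-line words.

    Follows the CGI convention that a query with no "=" is treated as a
    list of words separated by "+" signs.
    """
    if not query_string or "=" in query_string:
        return []                     # form-style query: no arguments
    return [unquote(word) for word in query_string.split("+")]


args = query_to_argv("Here+is+a+query+string")
print("argc is %d, argv is %s." % (len(args), " ".join(args)))
```

This prints the same argc and argv line that test-cgi reports above.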

When SSIs Are Useful

In the development of conventional software, there's a rule of thumb that if you write something down in more than one place, all versions but one are likely to be wrong. The same principle holds true for the development of Web pages. Recall the discussion on style guides in Chapter 1, "How to Make a Good Site Look Great." Your style guide may call for certain standard, boilerplate entries. For example, most Web sites have the same background on all pages to increase the feeling of integration. Most pages have some sort of copyright notice and a link to the content author or Webmaster. Not only is there a lot of work required to enter all this information on every page, but when it changes, it's nearly impossible to make the changes correctly on every page.

The good news is, there's a better way.

Using #include to Set Up a Standard Body and Footer

Open a text editor and enter the template shown in Listing 6.2.


Listing 6.2  template.shtml: A Template for Pages with Included body and footer

<HTML>
<HEAD>
<TITLE>Template</TITLE>
</HEAD>
<!--#include virtual="body.inc"-->
<H1>Header</H1>
<P>
</P>
<!--#include virtual="footer.inc"-->
</BODY>
</HTML>

Save this file as template.shtml or use another file extension if that's what your server requires.

Notice that here we use the virtual element of the #include directive so that we can put the included files anywhere in the site. Also, notice that we've named included files with the extension inc, not html. These included files contain HTML, but they are not complete pages in themselves. You'll keep better track of your files by using file extensions as a key to what each file contains.

Now, go build the body.inc and footer.inc files in accordance with your style guide and preferences. For example, in body.inc, you might say the following:

<BODY BACKGROUND="paper_green.gif">
<A HREF="CompanyInfo.shtml"><IMG SRC="Graphics/logo.gif"
ALT="Logo" HEIGHT=150 WIDTH=75></A>

Then, in the footer, you might include copyright information and information about how to contact the person responsible for the site (including a link to the person's e-mail address).

Keeping Last Modified Dates Current with #echo

Another common style requirement is that every page have a Last modified: entry. Web visitors usually are looking for the freshest content. If they see a page that's out-of-date, they might ignore it. If they see a page without a date, they might assume it's out-of-date. At the very least, they'll conclude that the Webmaster doesn't care enough about fresh content to mark a date on the site.

To overcome these problems, add the following line to your template file:

<!--#echo var="LAST_MODIFIED"-->

Tempting Examples That Won't Work with Server-Side Includes

Most style guides require buttons somewhere that link the page to the preceding and following pages, and to a table of contents page or top-of-section page. Keeping these links up-to-date is difficult because pages are added in the middle of sequences.

You might be tempted to view the buttons and links as boilerplate, and put them in an include file; however, that approach doesn't work. You quickly realize that an include file includes the same text every time. In this case, you need different text for every page.

Your second temptation might be to write a CGI script that takes the name of a page and looks up the preceding page. While this works, it can hurt the download time of the page. Each SSI takes a small amount of computer time. Putting a few SSIs on a page doesn't change the download time much, but when every button becomes an SSI link, the delays add up and the page becomes noticeably slower to arrive.

The following sections cover two techniques that give the page include-like features without costing any CPU time when the page is requested.

Using make and cpp to Perform Large Integrations

First, if you need a page with lots of simple includes (such as body.inc and footer.inc introduced previously), but don't want to pay the download penalty, take advantage of the fact that most UNIX systems come with a development environment that includes utilities named make and cpp. The make utility is used by software developers to keep projects up-to-date; it is told (in a file named Makefile) about file dependencies and it follows the procedure given in Makefile to make target files out of components.

The cpp utility is the C language's pre-processor. Don't be confused by the name: this tool can be used for any language, not just C. Suppose that you have a set of HTML page files that should be assembled out of various components. A product catalog is one example of such a page. You might have one master page showing all products, and then a page for each product with more detail. From time to time, you change some components. If you copy all the components into each file, then sooner or later the copies get out of sync. The catalog might start offering an item for one price while the detail page offers the same item at a different price. When you're ready to make a maintenance change, the prospect of proofreading all the pages and getting them into sync becomes daunting.

Assemble the master page like Listing 6.3.


Listing 6.3  List63.htm: Name the Master Page with a .shtml Suffix on the UNIX Server

<HTML>
<HEAD>
<TITLE>Acme Catalog</TITLE>
</HEAD>
#include "body.inc"
<H1>Our products</H1>
<P>
Here is a summary list of our product lines.
</P>
<H2>Sporting Goods</H2>
#include "roadrunnerTrap.inc"
#include "coyoteTrap.inc"
#include "canaryTrap.inc"
<H2>Transportation</H2>
#include "jetBelt.inc"
#include "flyingCarpet.inc"
#include "catapult.inc"
</BODY>
</HTML>

Save this file with a name like catalog.o. The output of the next step will be HTML.

The first UNIX command that follows runs the C pre-processor, which interprets the #include directives in the raw file and outputs a processed file with an i extension. The second line changes the extension of the i file to html.

cc -P catalog.o
mv catalog.i catalog.html

For occasional use, or on small sites, you might be satisfied typing these commands in directly. At some point, however, when this process becomes burdensome, you'll want to use make to automate the work.
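
If no C compiler is at hand, the expansion that the pre-processor performs on these pages can be approximated with a short Python sketch. It handles only quoted #include lines and ignores everything else cpp does (macros, conditionals); expand_includes is a name chosen for this example.

```python
import os
import re

INCLUDE = re.compile(r'^#include\s+"([^"]+)"\s*$')


def expand_includes(path, seen=()):
    """Return the text of path with each #include "file" line replaced
    by that file's (recursively expanded) contents."""
    if path in seen:
        raise ValueError("circular include: " + path)
    pieces = []
    with open(path, encoding="utf-8") as source:
        for line in source:
            match = INCLUDE.match(line)
            if match:
                # Included files are looked up next to the including file
                target = os.path.join(os.path.dirname(path), match.group(1))
                text = expand_includes(target, seen + (path,))
                if text and not text.endswith("\n"):
                    text += "\n"
                pieces.append(text)
            else:
                pieces.append(line)
    return "".join(pieces)
```

Because the work happens once, when the site is built, the finished page is plain HTML and the server never has to parse it.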

Listing 6.4 is a simplified Makefile (it shows only one target and one include file) that supports several advanced features.


Listing 6.4  Makefile: Good make Utilities Are Available for UNIX, DOS, and Macintosh Platforms

.SUFFIXES :
.SUFFIXES : .html .i .o .o,v
CP = /usr/bin/cp
RM = /usr/bin/rm -f
CO = /usr/local/bin/co
MAKEDEPEND = /usr/local/makedepend
.i.html:
	$(CP) $< $@
.o.i:
	$(CC) -P $<
.o,v.i:
	$(CO) $<
	$(CO) $(RCSINC)
	$(CC) -P $*.o
	$(RM) $*.o
	$(RM) $(INC)
.o,v.o:
	$(CO) $<
	$(CO) $(RCSINC)
.inc,v.inc:
	$(CO) $<
#---- here begins the site-specific part of this makefile ----
SRC = catalog.o
RCSSRC = $(SRC:.o=.o,v)
INC = roadrunnerTrap.inc
RCSINC = $(INC:.inc=.inc,v)
all: catalog.html
catalog.html: catalog.i
roadrunnerTrap.inc: roadrunnerTrap.inc,v
clean:
	$(RM) *.html *.i *.o *.inc
depend:
	$(CO) $(RCSSRC)
	$(CO) $(RCSINC)
	$(MAKEDEPEND) $(SRC)
	$(RM) $(SRC) $(INC)
# DO NOT DELETE THIS LINE - make depend depends on it.
catalog.o: roadrunnerTrap.inc

Let's review this Makefile to see how make can help you manage a Web site.

The first two lines tell make to ignore its built-in rules (which are mainly useful for programmers) and replace them with some of your own rules. The second line says that we are working with four types of files: html, i, o, and o,v (an o file that has been checked into RCS).

The next four lines tell make where to find various UNIX commands. While these lines are not strictly necessary, it's a good idea to be very specific in a Makefile since it gets used heavily by people who aren't necessarily familiar with UNIX. The third command, co, is the RCS checkout command. The fourth command, makedepend, is a special program that we'll examine in more detail momentarily.

The next 15 lines tell make how to make one kind of file from another. Consider this section:

.i.html:
	$(CP) $< $@
.o,v.i:
	$(CO) $<
	$(CO) $(RCSINC)
	$(CC) -P $*.o
	$(RM) $*.o
	$(RM) $(INC)

It says that the way to change an i file into HTML is to copy the i to html. The second rule says that the way to change an o file that has been checked into RCS (o,v) into an i file is to check out the file from RCS, check out all the include files, run the C pre-processor on the o file that was checked out, and then remove the o file and all the inc files.

If you keep your RCS files in a subdirectory such as rcs, specify the path to the RCS files in the VPATH macro. (VPATH is not available on all versions of make.)

One problem with make is that it does not look inside the files for #include directives. If you want to, you can specify all the includes in the Makefile. For example, to indicate that catalog.html depends upon catalog.i and roadrunnerTrap.inc, you would say:

catalog.html: catalog.i roadrunnerTrap.inc

Once your pages get beyond 10 or 20 include files, maintaining the Makefile can become tedious. That's where a utility named makedepend comes in. This program opens every source file, looks for the files it includes, and writes the resulting dependencies to the Makefile. makedepend is a C program written by Todd Brunhoff of Tektronix and MIT Project Athena, and it must be compiled for your machine. It can be downloaded from ftp://expo.lcs.mit.edu. Once it's compiled and installed, make sure that the Makefile shows the correct location (such as /usr/local/makedepend). Then, enter make depend to have make run makedepend and discover all the dependencies. Rerun make depend whenever the dependencies change, for example, if you add a new product to the catalog.
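
The core idea behind such a tool is simple enough to sketch in Python: scan the source text for #include lines and emit one dependency rule. This is not how makedepend itself is implemented; dependency_rule is a hypothetical helper that covers only quoted includes in a single file.

```python
import re

INCLUDE = re.compile(r'^#include\s+"([^"]+)"', re.MULTILINE)


def dependency_rule(target, text):
    """Build a Makefile dependency line for target from its #include lines."""
    includes = sorted(set(INCLUDE.findall(text)))
    if not includes:
        return ""                     # nothing to depend on
    return target + ": " + " ".join(includes)


source = (
    "<HTML>\n"
    '#include "roadrunnerTrap.inc"\n'
    "<H2>Transportation</H2>\n"
    '#include "jetBelt.inc"\n'
)
print(dependency_rule("catalog.o", source))
```

The printed line has the same shape as the catalog.o rule at the bottom of Listing 6.4, with one entry per included file.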

Most UNIX systems have make and a C compiler (with the C pre-processor). If your machine does not, you can get an excellent set of development tools from the Free Software Foundation. Their online archive has many mirrors. The master site is at ftp://prep.ai.mit.edu. The GNU version of make is particularly well integrated with RCS, has a sophisticated VPATH feature, and allows for a simpler Makefile because it generates some of the rules on its own.

Linking Buttons with Perl

Another task that requires "include-like" capability but is difficult to do with pure SSIs is giving each page a set of custom links. Suppose that a site has the following four pages: 1.One.html, 2.Two.html, 3.Three.html, and 4.Four.html.

On each page, there's a Prev button and a Next button. The page named 3.Three.html should have the following HTML:

<A HREF="2.Two.html"><IMG SRC="Graphics/prev.gif"
WIDTH=90 HEIGHT=30 ALT=Prev></A>
<A HREF="4.Four.html"><IMG SRC="Graphics/next.gif"
WIDTH=90 HEIGHT=30 ALT=Next></A>

While this process isn't too bad with four pages, it becomes tedious with 20 or more pages. When the site owner drops in a few new pages like 2~1.TwoPlus.html, life really gets interesting.

To automate this hook-up process, do two things. First, name the files in such a way that the default ordering of ls outputs the files in the desired order. This convention allows you to make a list of the files in their proper order just by issuing this command:

ls > theList
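
Once the names sort into the right order, finding the neighbors of any page is a matter of looking one position to either side. The Python sketch below assumes plain byte-wise sorting, which matches ls in the C locale (other locales may order punctuation differently); the neighbors function and the 1.One.html entry are assumptions made for this illustration.

```python
def neighbors(pages, current):
    """Return (previous, next) for current; None marks a missing neighbor."""
    ordered = sorted(pages)          # byte-wise order, like ls in the C locale
    index = ordered.index(current)
    previous = ordered[index - 1] if index > 0 else None
    following = ordered[index + 1] if index + 1 < len(ordered) else None
    return previous, following


pages = ["1.One.html", "2.Two.html", "3.Three.html", "4.Four.html",
         "2~1.TwoPlus.html"]
print(neighbors(pages, "3.Three.html"))
```

Note that the newly added 2~1.TwoPlus.html sorts between 2.Two.html and 3.Three.html, so it becomes the Prev target of 3.Three.html without any renaming.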

Second, follow a coding standard that includes putting the anchor tag, image tag, and end-anchor tag all on the same line. (It's possible to make this work when they're spread out over multiple lines, but then it's more complex.) Your coding standard should also specify how tags and attributes are capitalized. For example, you might adopt a standard that tags and attributes are entirely in uppercase and values are in mixed case. Finally, you must resolve to set up the anchor and <IMG> the same way for every button. These conditions are easy to meet by using a template or an include file. Here's what the code for one button looks like when you follow these conventions:

<A HREF="2.Two.html"><IMG SRC="Graphics/prev.gif" WIDTH=90 HEIGHT=30 ALT=Prev></A>

Now, let's design a Perl script to hook up the buttons on one page. For obvious reasons, let's name the script shown in Listing 6.5 hookUp.pl.


Listing 6.5  hookUp.pl: This Perl Script Can Save a Lot of Work Hooking Up Buttons

#!/usr/local/bin/perl
# name of file which contains the ordered list of pages
$theListFile = "./theList";
# path of the Previous button graphic, exactly as it appears in the pages
$prevString = "Graphics/prev.gif";
# same song, second verse
$nextString = "Graphics/next.gif";
# now do the real work
# find the name of the input file
$fileName = $ARGV[0];
open (LIST, $theListFile) ||
die "Cannot open list file $theListFile\n";
$prevPage = "";
while (<LIST>)
{
  chomp;
  last if ($_ eq $fileName);
  $prevPage = $_;
}
# the line after the current page names the next page (empty on the last page)
$nextPage = <LIST>;
$nextPage = "" unless defined $nextPage;
chomp $nextPage;
close (LIST);
open (FILE, $fileName) || die "Cannot open page file $fileName\n";
while (<FILE>)
{
  # [^"]* keeps each match inside one HREF value;
  # \Q...\E makes the graphic path match literally
  s#<A HREF="[^"]*"><IMG SRC="\Q$prevString\E"#<A HREF="$prevPage"><IMG SRC="$prevString"#;
  s#<A HREF="[^"]*"><IMG SRC="\Q$nextString\E"#<A HREF="$nextPage"><IMG SRC="$nextString"#;

  print;
}
close (FILE);
exit;

You probably should omit the Prev anchor on the first page and the Next anchor on the last page. If you ever add a page ahead of or following those pages, respectively, you can hook the new page up by hand. After that, just run hookUp on every file by hand-or from the Makefile-before you release the site. To add hookUp to the Makefile, just add its name at the top and put it into the rule for how to make HTML from an i file. The hookUp script looks for a name on the command line and writes the hooked up file to the standard output (STDOUT).

HOOKUP = /path/to/your/copy/of/hookup
.i.html:
	$(HOOKUP) $< > $@

An SSI Example

The previous section explained how to avoid using SSIs for some include-like tasks when SSIs are not appropriate. Here, on the other hand, we'll look at a task that's best solved by an SSI.

The Problems

A major theme of this book is that "content is king," and that fresh content is the best mechanism for bringing visitors to your site. How do you tell them the site has changed, though? And, when they come back, how do you show them the newest material?

URL-minder

The first problem is solved with a third-party referral service named URL-minder. Put the URL-minder code on your first page and invite visitors to "sign up" for your site. Every few days, the URL-minder robot visits your site and checks that page to see if it has changed. The URL-minder robot can't tell what has changed, but even a difference of a single character is enough to tell the robot that something is different. When the robot discovers a difference, it sends a message to everyone who has registered an interest in that page.
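
One common way for a robot to notice that a page has changed without knowing what changed is to store a digest of the page and compare it on the next visit. The Python sketch below illustrates that general idea only; it is an assumption for the sake of example and is not a description of how URL-minder itself works.

```python
import hashlib


def fingerprint(page_bytes):
    """Return a digest that changes whenever the page content changes."""
    return hashlib.sha256(page_bytes).hexdigest()


def has_changed(previous_fingerprint, page_bytes):
    """Compare the stored digest with the digest of the page as fetched now."""
    return fingerprint(page_bytes) != previous_fingerprint


stored = fingerprint(b"<H1>Our products</H1>")
print(has_changed(stored, b"<H1>Our products</H1>"))    # same bytes
print(has_changed(stored, b"<H1>Our products!</H1>"))   # one character added
```

A comparison like this also explains why such a robot can report that a page is different but not which part of it is new.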

To make URL-minder available to visitors, put the following code on the index page of your site:

<FORM METHOD=GET ACTION="http://www.netmind.com/cgi-bin/uncgi/url-mind/URL-minder/URL-minder.txt">
<P>Enter your e-mail address to receive e-mail when this 
page is updated.</P>
<P><B>Your e-mail address: </B>
<BR><INPUT TYPE=Text SIZE=40 NAME="required-email"><BR>
<BR><INPUT TYPE=Hidden NAME=url 
VALUE="http://www.xyz.com/path/to/your/index.html">
<INPUT TYPE=Hidden NAME=message VALUE="Thank you for 
registering your interest in the XYZ site.">
<P><INPUT TYPE=Submit VALUE="Register to receive e-mail when 
this page is updated.">
</FORM>

The message field should contain whatever reply you want the user to see after they have registered. The url field, of course, should contain your index page's URL.

Part of the Solution

Now, your visitors have a way to indicate interest in the site. When you update the content of the site, you can update the index page to show what's new. The registered users get e-mail saying that the page has changed and, hopefully, they'll come back to find out what's new.

But, here's your next problem. Keeping the index page up-to-date is a real chore, particularly if you manage many sites. In an ideal world, you would have each page automatically turn on a "New" graphic next to its entry on the index page and maybe even display the date on which the file was last modified.

Good news! For once, it's an ideal world. The following two sections explain ways to use SSIs to put up exactly that information.

The Simple Way-If Your Server Supports It

On some servers, like Apache, you can call a CGI script from an #include SSI directive:

<!--#include virtual="/cgi-bin/path/to/script.cgi"-->

You can even include a query string after the cgi path. Thus, to automatically track which files have new content, you can write:

<P><A HREF="catalog.html">Our product catalog</A>
<BR>
<!--#include virtual="/cgi-bin/isNew.cgi?catalog.html"-->
</P>

Listing 6.6 shows a version of isNew.cgi that works on the Apache server.


Listing 6.6  isNew.cgi: This Perl Script Identifies New Files on an Apache Server

#!/usr/bin/perl
require "ctime.pl";
# Look in this directory for pages 
$pageDirectory = "/users/dse/pages/test/";
# Look for files that are newer than this number of days
$newTime = 14;
# And apply this graphic if the file is new
$newGraphic =  "Graphics/new.gif";
#--
# Now go to work
#
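# Always send the header, even when the file turns out not to be new
print "Content-type: text/html\n\n";
# Keep only the base name from the query string and look for it in $pageDirectory
($page = $ENV{QUERY_STRING}) =~ s#.*/##;
$filename = $pageDirectory . $page;
$age = (-M $filename);
if ((-e $filename) && ($age < $newTime))
{
  print "<IMG SRC=\"$newGraphic\" ALT=New>\n";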
  ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
 $atime, $mtime, $ctime, $blksize, $blocks) = stat ($filename);
  $fileDate = &simplifyDate(&ctime($mtime));
  print "Last modified: <EM>$fileDate</EM>\n";
}
exit;
sub simplifyDate
{
  local($ctime) = @_;
  chop $ctime;
  $ctime =~ s/  / /;
  ($day, $month, $date, $time, $timeZone, $year) = 
  split (/ /, $ctime);
  $simpleDate = $day . ' ' . $month . ' ' . $date . ' ' . $year;
  $simpleDate;
}

This program reads the name of the file to check from QUERY_STRING and uses the -M test to get the file's age in days (including fractional days). If the file exists and is not too old, the script outputs the "New" graphic and then runs the stat function on the file and gets the modification time. The calls to ctime and simplifyDate serve to make the date easier to read.
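
For readers more comfortable with Python, the same age test can be sketched as follows. This is an illustrative equivalent, not a drop-in replacement for the Perl script: is_new and new_marker are names invented here, and the date format is only one reasonable choice.

```python
import os
import time


def is_new(path, max_age_days=14, now=None):
    """True if path exists and was modified less than max_age_days ago."""
    if not os.path.exists(path):
        return False
    now = time.time() if now is None else now
    age_days = (now - os.path.getmtime(path)) / 86400.0
    return age_days < max_age_days


def new_marker(path, graphic="Graphics/new.gif"):
    """Return the HTML fragment for a new page, or an empty string."""
    if not is_new(path):
        return ""
    modified = time.localtime(os.path.getmtime(path))
    stamp = time.strftime("%a %b %d %Y", modified)
    return '<IMG SRC="%s" ALT=New>\nLast modified: <EM>%s</EM>\n' % (graphic, stamp)
```

As in the Perl version, the age is measured in days and may be fractional, so a file changed 13.9 days ago still counts as new.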

A More Complex Way That Works on Any Server

Not all servers allow a page to pass a query string to a script executed with #exec. If you can only use #exec, then you cannot pass environment variables, path information, or a file handle to the script; all the script gets is a call to run. To pass even a little information in this situation, you have to use the file name itself.

Suppose that your site has ten pages listed on the index page and that each page is updated from time to time. For simplicity, let's name these files One.html through Ten.html. Set up ten symbolic links in the cgi-bin directory, as follows:

ln -s isNew.cgi One.cgi
ln -s isNew.cgi Two.cgi
...
...
ln -s isNew.cgi Ten.cgi

Now, when isNew runs, it can tell which file it's supposed to check, since the name of the file is encoded in the name under which the script was invoked. Listing 6.7 shows the modified isNew.cgi.


Listing 6.7  isNew.cgi: This Version of isNew Does Not Require Apache's Special Features

#!/usr/bin/perl
require "ctime.pl";
# Look in this directory for pages
$pageDirectory = "/users/dse/pages/test/";
# Look for files that end in one of these suffixes
@suffixes = (".html", ".shtml");
# Look for files that are newer than this number of days
$newTime = 14;
# And apply this graphic if the file is new
$newGraphic =  "Graphics/new.gif";
#--
# Now go to work
#
# get the name under which this file was invoked.
$_ = $0;
# tease out the base name
if (/[\/.]*\/(\w+)\.cgi/)
{
  $file = $1;
}
# find the file to be monitored.
foreach $suffix (@suffixes)
{
  $filename = $pageDirectory . $file . $suffix;
  last if (-e $filename);
}
# and from here, we run just like we do on Apache servers.
# Always send the header, even when the file turns out not to be new
print "Content-type: text/html\n\n";
$age = (-M $filename);
if ((-e $filename) && ($age < $newTime))
{
  print "<IMG SRC=\"$newGraphic\" ALT=New>\n";
  ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
 $atime, $mtime, $ctime, $blksize, $blocks) = stat ($filename);
  $fileDate = &simplifyDate(&ctime($mtime));
  print "Last modified: <EM>$fileDate</EM>\n";
}
exit;
sub simplifyDate
{
  local($ctime) = @_;
  chop $ctime;
  $ctime =~ s/  / /;
  ($day, $month, $date, $time, $timeZone, $year) = 
  split (/ /, $ctime);
  $simpleDate = $day . ' ' . $month . ' ' . $date . ' ' . $year;
  $simpleDate;
}
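
The name-based lookup at the heart of Listing 6.7 can be restated as a small Python sketch. The page_for_script function is hypothetical, and the exists parameter is there only so the logic can be tried without real files. Unlike the Perl script, the sketch returns None when no matching page is found.

```python
import os


def page_for_script(script_path, page_directory,
                    suffixes=(".html", ".shtml"), exists=os.path.exists):
    """Map /cgi-bin/One.cgi to the first existing page_directory/One<suffix>."""
    base, extension = os.path.splitext(os.path.basename(script_path))
    if extension != ".cgi":
        return None
    for suffix in suffixes:
        candidate = os.path.join(page_directory, base + suffix)
        if exists(candidate):
            return candidate
    return None


# Pretend that only the .shtml version of page One exists
found = page_for_script("/cgi-bin/One.cgi", "/users/dse/pages/test/",
                        exists=lambda path: path.endswith("One.shtml"))
print(found)
```

Each symbolic link therefore acts as a one-word argument: the script is the same file every time, and only the name it was invoked under changes.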

This chapter described how to extend HTML using server-side includes. Some of the most useful SSIs allow the user to bring the output of a program (such as isNew.cgi) directly onto the page. The next chapter describes how to build scripts that can produce entire new pages at runtime.