Chapter 18 How to Query Databases

Technology
The mSQL Family

From time to time, lists appear defining the "hot technologies" of the Internet. Nearly all those lists include "database access" as one of their entries. Yet database access continues to be among the most mysterious techniques of the Web.

At least part of this mystery comes from the fact that many programmers have little experience in database access. Simple PC database managers such as dBASE use a different access language than the larger databases usually associated with Web sites. Furthermore, most Web sites need a client/server database architecture, which is a new design for some programmers.

While there are many different ways to link a Web page to a database, most of these techniques have been outside the mainstream. There was no documentation except that which was on the Net. Unless a Webmaster already knew the name of the tool, there was no good search strategy for finding interface programs.

Finally, most of the good database managers have been quite expensive, with many development kits starting at $20,000 or more. These prices discourage casual experimentation.

Technology

Chapter 16, "How to Index and Search the Information on Your Site," introduced search and indexing software in the context of searching a Web site. For many applications the Web site must allow a visitor to search much more. For example, a book distributor carries over a thousand titles. The visitor does not need (nor can the book distributor afford!) a page describing each title. They don't need to index the contents of the books (such as WAIS would allow). Instead they want to allow structured queries. For example,

Show the publisher and title of all books with the phrase "German Shepherd" in their title which were published after 1990 and have a retail price less than $25.00.

Database technology is ideally suited for this sort of task. Database technology comes in various flavors, each suitable for a particular class of problems.

Flat Files

Consider the previous query on books about German Shepherds. It suggests that the data is stored in a table like Table 18.1.

Table 18.1 Structure of the "Books" Table

Publisher	Title	Publication Year	Retail Price
BeanBag Press	Our Doggies, Our Selves	1994	$14.95
Dee	German Shepherds as Pets	1993	$19.95
BeanBag Press	German Shepherds on the Job	1992	$24.95
Eggles and West	A History of the German Shepherd	1989	$12.00
Tabb Press	The Way Dogs Ought To Be	1992	$23.50

If the table isn't too long, the query could be answered with a brute force search: examine each line looking at the title, the publication year, and the price. If the book meets the search criteria, print the publisher and title. This kind of search is what computer scientists call an O(n) (pronounced order-n) search. As the number of entries in the database grows, the time required to search the database grows at the same rate. Suppose it takes an average of one millisecond to read a record from the disk and determine if it satisfies the search conditions. When there are 100 records in the table, the query is answered in just a tenth of a second. If the number of entries grows to 1,000, the pro-cessing time grows to one second. When 10,000 entries are in the database, it takes 10 seconds to select all the right records-a figure unacceptably slow for many applications.

Sorting the list helps somewhat. If the list is sorted by publication year, then the search engine can immediately focus on those entries that were published in 1990 or later. If the list is sorted by price, again only part of the list needs to be searched. If the computer maintains indexes on both those fields, it could quickly pull out those records that have the right year and price and search the titles of this much smaller list for the phrase "German Shepherd." For many real-world problems, the data is searched often but updated infrequently, so a lot of computation is saved by storing the data in sorted lists (or, almost equivalently, maintaining sorted indexes to the records). Algorithms exist to search certain kinds of indexes in O(log2n), or even O(1) time, a fraction of the time needed for a sequential search.

For real-world problems, flat files use disk space inefficiently. Consider an accounting system, with accounts payable and vendors. Part of the flat file might look like Table 18.2.

Table 18.2 Flat ASCII Files Are Notorious Wasters of Disk Space

Vendor	Vendor Address	Invoice #	Amount Due
HPM	47-001 Kam Highway	0001	125.00
HPM	47-001 Kam Highway	0002	243.00
HPM	47-001 Kam Highway	0003	119.00

Each record duplicates information, wasting disk space. Furthermore, this design is difficult to maintain. If Fujimoto & Son moves, the merchant may have to update hundreds of records.

A better design separates invoices and vendors into two different tables, like the ones shown in Tables 18.3 and 18.4.

Table 18.3 The Vendors Table, Also Called a Relation, Holds Information About Vendors Only

Vendor	Vendor Address	Vendor ID
HPM	47-001 Kam Highway	0001

Table 18.4 The Invoices Table Has Pointers, or "Secondary Keys," Back to the Vendors Table

Vendor ID	Invoice #	Amount Due
0001	0001	125.00
0001	0002	243.00
0001	0003	119.00

This approach is used in relational database management systems, described in the section later titled, "Relational Databases."

The Indexed Sequential Access Method (ISAM)

As the size of the database grows, first the data and then even the indexes overflow main memory and must be stored on the disk. Disk accesses are several thousand times slower than memory accesses, so doing as much work as possible in memory before looking at something on the disk saves a lot of time. The indexed sequential search technique, also known as the Indexed Sequential Access Method, or ISAM, involves balancing hardware factors such as disk blocking and track size to build a partial index. The index is called "partial" because it does not lead to an individual record. Instead, it gets the searcher to a set of data on the disk that can be read sequentially. A complete index might overflow main memory, but the partial index can fit and therefore be accessed much faster.

Here's an example, illustrated in Figure 18.1. Suppose the book database is based on Books in Print, or the Library of Congress. There might easily be over 1,000,000 records. If each record requires just 100 bytes, the database takes 100M of storage. That much main memory is expensive. If we store 32 records per disk block, and index the disk blocks, the index requires just over 31,000 entries. If even that figure represents too much memory, an index to the index could be prepared, with perhaps 1,000 entries. Now access to the records is through a primary index (in memory), then a secondary index (with a single disk fetch), then to a block (requiring one more disk fetch), which must be searched sequentially.

Figure 18.1: The Indexed Sequential Access Method stores records on the disk and keeps indexes to the records (or indexes to the indexes) in high-speed memory.

If the file changes regularly, the problem becomes more complex. ISAM relies upon the file being stored on disk in the same order as the indexed field. When records are added or deleted from the database, the file and its indexes must be rebuilt. This process is not fast and cannot be done while the database is in use. Some implementations add a "changed" section on the disk, so that after purse ISAM has run, the system sequentially checks the changes before returning its results. Until the system can be stopped for rebuild, the "changed" section continues to grow, leading some pundits to dub ISAM the "Intrinsically Slow Access Method."

Relational Databases

Many real-world problems get more complex than a single table and index can handle. For example, if the previous example were used by a book reseller, she would want to keep track of the price at which she bought books (which might be bought in different lots, at different prices), the quantity sold, the publisher (and the publisher's contact information) and perhaps information on the wholesalers who actually deliver the books. Several tables come to mind:

Books
Publishers
Wholesalers
Purchase orders
Sales orders

Now more complex queries are possible: What is the total value of books on hand for which the last sales order over quantity 10 was more than 30 days ago (including books which have never been ordered in quantities greater than 10)?

To answer this query, the database user must construct a plan: first, select all the sales orders for quantities over 10 that are 30 days old or newer. Select from the books table all the books that are not on that list. For each title, compute the difference between the most recent sales price and the purchase price. (There are some accounting decisions here that will be ignored for the sake of simplicity.) Find the quantity on hand by subtracting the quantity sold from the quantity purchased. Finally, multiply the difference between selling price and purchase price by the quantity on hand to compute the value of that title, and add up all the values to answer the query.

When databases and queries become this complex, most people turn to the relational database management system (RDBMS). Many vendors offer an RDBMS solution; Oracle, Sybase, and Informix are among the best known. Most RDBMSs are accessed using the Structured Query Language, or SQL (pronounced see-quel).

Tip

Developers using Windows servers (such as Windows NT) have the option of using Microsoft's Open Database Connectivity (ODBC) technology. Use the ODBC Administrator program (available from Microsoft) to add ODBC drivers and define the characteristics of the data sources you want to access. The ODBC Administrator knows about many possible data sources, including SQL Server, dBASE files, Microsoft Excel, Paradox, FoxPro, Access, and Text).

Note

Database access typically includes a back end, or server, and a front end, or client. If you use ODBC, add the driver for the back end. Then build a front end using SQL. For example, Visual FoxPro has a set of "SQL Pass-Through" functions that allow the programmer to establish an ODBC connection and send SQL commands. That connection supports the transaction model, which explicitly names points where each set of operations is committed or cancelled.

For more information on the SQL Pass-Through functions in Visual FoxPro, see Chapter 27 ("Accessing ODBC Server Data") of Miriam Liskin's Visual FoxPro Expert Solutions (Que, 1995).

For more general information on setting up an operating Microsoft's SQL Server, see Microsoft SQL Server 6 Unleashed by Jeff Garbus, David Solomon, Ray Rankins, and Daniel Woodbeck (Sams, 1996).

Designing a database is a specialty. For large, complex databases, expert designers should be consulted. This chapter focuses on how to access such a database from the Web and only incidentally on design and language issues.

In the relational vocabulary, the tables are referred to as relations. Each entry in a relation must have a unique identifier, called a primary key. Tables are linked by having columns in two or more tables that share a primary key. For example, to model the concept that a book is purchased, the book table might have an ISBN as its primary key. A purchase order, or PO, would have a header table to contain information about the wholesaler and a details table that lists each line item on the PO.

One of the columns of the PO detail table would be the ISBN. In this model, the title would not be stored in the PO detail table. To find out the title of a book on a PO, the database would use the ISBN and look up the book in the book table.

The process of looking up a key in one table, then searching for it in the corre-sponding column of another table to assemble a unified record is called a join. Joins are computationally expensive. SQL allows the user to specify indexes on frequently-accessed columns to decrease the time required for joins. In most versions of SQL, one such index may be declared a "clustered index." Clustered indexes force the table to be rewritten to the disk in the order of the indexed field, in much the same way as ISAM data is stored.

Management of joins and indexes constitutes a major distinguishing factor between the competitors in the RDBMS market.

Object-Oriented Databases

The newest member of the database technology family is the object-oriented database, or OODB. OODBs are a natural choice when the overall system is being written in an object-oriented language such as C++. With many OODBs, the programmer does not need to learn a separate language like SQL-C++ operators are used to put data into the database and retrieve it again. OODBs make particularly good sense when much of the information to be modeled lies in the connections between tables, rather than in the tables themselves.

The mSQL Family

High-end RDBMS products routinely cost tens of thousands of dollars. For many purposes on the Web, a much simpler product will suffice. High-end products are often used to produce reports, which may take many minutes to run. Most Web queries need to complete within a few seconds to satisfy user's real-time requirements. To fill this need David Hughes wrote "miniSQL," also known as mSQL. mSQL is a light-weight RDBMS that supports a subset of the SQL language. It is offered under a commercial license; the price is in Australian dollars. At present exchange rates the product costs under $200 U.S. For details, visit http://Hughes.com.au/product/msql/.

A SQL Primer

Most RDBs contain an interactive SQL interpreter. mSQL calls its interpreter msql. Here is a summary of common SQL commands that can be understood by msql (as well as most other SQL interpreters).

Making and Filling a Database

To make a new database named test, type

msqladmin CREATE test

In the following examples, SQL commands are shown in upper case. Column names, table names, and other parameters are shown in lower or mixed case. mSQL will accept commands in either case. To begin to work with the new database, type

$ msql testWelcome to the miniSQL monitor.  Type \h for help.
mSQL > CREATE TABLE books
    -> (Title char (30) not null,
    -> Publisher char(20) not null,
    -> PY int,
    -> Price real,
    -> ISBN char(13) primary key)
    -> \g
Query OK.

After connecting to the test database, the operator instructed mSQL to make a new table with five columns. Title and Publisher are text strings of 30 and 20 characters, respectively. The phrase not null says that those columns cannot be left empty when making a new instance. PY (publication year) is an integer, and Price is a floating-point value. ISBN is a 13-character string and is declared as the primary key. The \g tells the interpreter to "go." The resulting table is shown in Figure 18.2.

Figure 18.2: This simple table is used in the example describing the capabilities of mSQL.

The operator has inserted one record into the test database.

mSQL > INSERT INTO books
-> VALUES ('Our Doggies, Our Selves', 'BeanBag Press', 1990,
    -> 24.95, '0-555-12345-3')
    -> \gQuery OK.

Queries

Now the user asks the database to display all fields from all records.

mSQL > SELECT * FROM books    -> \g
Query OK.
1 rows matched.
 +--------------------------------+----------------------+----------+--------------+--------------+
 | Title                          | Publisher            | PY       | Price        | ISBN          |
 +--------------------------------+----------------------+----------+--------------+--------------+
 | Our Doggies, Our Selves        | BeanBag Press        | 1990     | 24.95        | 0-555-12345-3 |
 +--------------------------------+----------------------+----------+--------------+--------------+
+--------------------------------+----------------------+ 
 | Title                          | Publisher            |
 +--------------------------------+----------------------+
                               | Our Doggies, Our Selves        | BeanBag Press        |
                               -----------+--------------+---------------+
                               | PY       | Price        | ISBN          |
                               -----------+--------------+---------------+
                               | 1990     | 24.95        | 0-555-12345-3 |
                               -----------+--------------+---------------+

In this query the operator requests the title and publication year of all records published after 1990. There are none.

mSQL > SELECT Title, PY FROM books
	-> WHERE PY > 1990
    -> \g
Query OK.
0 rows matched.
 +--------------------------------+----------+
 | Title                          | PY       |
 +--------------------------------+----------+ 
 +--------------------------------+----------+

The operator asks for the title and publication year of all records with a retail price of $24.95.

mSQL > SELECT Title, PY FROM books    
	 -> WHERE Price=24.95
     -> \g
Query OK.
1 rows matched.
 +--------------------------------+----------+
 | Title                          | PY       |
 +--------------------------------+----------+
 | Our Doggies, Our Selves        | 1990     | 
 +--------------------------------+----------+

Although matching real numbers exactly is often a poor idea in traditional programming languages, mSQL has no problem selecting and returning the desired data.

Joins

Relational joins show off the true power of the RDBMS. Joins are queries which span more than one table. Suppose the previous example has been expanded, so there are books, publishers, and wholesalers. The database is shown in Figure 18.3.

Figure 18.3: The programmer must use SQL Joins to build queries that span the tables of this expanded database.

A simple query such as

"Show the titles of books which are published in California."

becomes

SELECT books.Title FROM books, publishers
WHERE books.PubID = publishers.ID AND
publishers.State = 'CA'

More complex queries are also possible:

"Show the names of distributors in California who handle books 
which are published in Massachusetts."

becomes

SELECT distributors.Names FROM books, publishers, pubDetails, 
distributors, distribDetails
WHERE books.ISBN = distribDetails.ISBN AND
books.PubID = publishers.ID AND
distribDetails.ID = distributors.ID AND
distributors.State = 'CA' AND
publishers.State = 'MA'

While this query is complex, it can be coded in just a few minutes.

Commercial RDBMSs support indexes, transactions, and other features not found in mSQL, but for lightweight use on the Web, mSQL is highly effective.

mSQL

Get the install kit for mSQL from http://Hughes.com.au/product/msql/, and follow the directions in the README file. By default, mSQL installs expects to be run from root and installed into the directory /usr/local/Minerva. Both of these assumptions may be changed.

mSQL is another example of a program which uses a daemon, a program which is left running in the background. Make sure msqld is started by UNIX when the machine is rebooted. Otherwise when the server is taken down for maintenance, the database will go down for good.

Access Control

When mSQLadmin is first run, it may complain that it cannot find the ACL. It is looking for the Access Control List, a security feature. You can use mSQL without an ACL, but it is good practice to enable access control. A typical ACL is shown below:

database=test
read=jones, root
write=root
host=*
access=local,remoteoption=rfc931

This ACL says that it controls access to the database named test. Read access is granted to users jones and root. No one else can run SELECT against the database. Only root can write to test. To grant access to everyone, use *-the default action is global denial, so if the ACL had

database=test
write=root
host=*
access=local,remoteoption=rfc931

the database would be unreadable. Note, too, that the database entry must be followed by a blank line, to show the end of the entry.

Debugging

The mSQL engine is created with various debug lines. To see how the program is handling various requests, turn on debug with the MINERVA__DEBUG environment variable. For example, in the Korn or Bourne shell, enter:

MINERVA__DEBUG=query:error:key

The full list of debug options is:

cache__Display the workings of the table cache
query__Display each query before it is executed
error__Display error message as well as sending them to the client
key__Display details of key-based data lookups
malloc__Display details of memory allocation
trace__Display a function call trace as the program executes
mmap__Display details of memory-mapped regions
general__Anything that doesn't fit into a category above

Note

mSQL is supported by a high-volume mailing list. There are over 1,000 mSQL users subscribed, so this list is an excellent place to ask mSQL questions. To subscribe, send an e-mail message containing the word "subscribe" to msql-list-request@Bunyip.com. Subscribers can send a message to the entire list at msql-list@Bunyip.com. Archives of the mailing list, as well as general information on mSQL, are available at http://Hughes.com.au/.

MsqlPERL

Once an RDBMS such as mSQL has been installed, there are several ways to access it from the Web. The following discussion uses mSQL in its examples. Similar methods work for Oracle, Sybase, and other commercial products.

The first access method is to link Perl directly to the database. mSQL comes with a set of C language Application Programmer Interface (API) library routines. Several mSQL users have developed bindings from this library to their favorite language. Andreas Koenig (mailto: k@franze.ww.TU-Berlin.DE) developed MsqlPerl, a Perl5 adapter for mSQL. His program is available at ftp://Bond.edu.au/pub/Minerva/msql/Contrib/.

Before installing MsqlPerl, install Perl5. The MsqlPerl installation kit extends Perl5 in-place. Once MsqlPerl is installed, code like that shown in Listing 18.1 will work.

Listing 18.1 List181.pl-Connecting to a Database with MsqlPERL

#!/usr/bin/perluse Msql;
use html;
package main;
# Connect to the local host
#host = "";
$dbh = Msql->Connect($host) || &die ("Cannot connect to local host.\n");
$dbh->SelectDB("test") || &die("Cannot find test database.");
# Run a query, which may return multiple rows
$sth = $dbh->Query("select Title, PY from books") || 
 &die("Error: Msql::db_errstr\n");
while (@row = $sth->FetchRow())
{
 print "Title: $row[0] published in $row[1]\n";
}exit;

Note that, following a query, the results are stored in memory allocated by mSQL. If another query is run, the results from the new query overwrite the old results. To prevent this occurrence, call StoreResult() before making the next call to Query().

W3-mSQL

Some users want to simplify their interface to the database. They may want to pull up rows from the database in the middle of their HTML, without having to write an MsqlPerl script. David Hughes, the author of mSQL, has a solution: W3-mSQL.

W3-mSQL is available at Hughes.com.au/product/w3-msql/. Version 2, a new release, is documented at Hughes.com.au/product/w3-msql/manual-2/w3-msql.htm, and is described here.

With W3-msql, the programmer can build a page like the one in Listing 18.2.

Listing 18.2 w3msql.html-A Demo of W3-msql

<HTML><HEAD>
<TITLE>Demo of W3-mSQL</TITLE>
</HEAD>
<BODY>
<H1>Demo of W3-mSQL</H1>
<! printf("This line actually works!\nHello, world!\n");>
</BODY></HTML>

To get this code to run, specify a URL that executes the W3-mSQL binary (called nph-w3-msql), typically located in the cgi-bin directory. If the script is in /xyz/demo.html, the URL should be /cgi-bin/nph-w3-msql/xyz/demo.html.

W3-mSQL comes with a standard module (which provides most of the behavior of the C language) and an mSQL module (which encapsulates the C API to mSQL). This interface is similar to the MsqlPerl interface since both are based on the mSQL C API. To implement the example from the previous section in W3-mSQL, one would use

<!$host="";
$dbh = msqlConnect($host);
if ($dbh < 0)
{
 echo ("Cannot connect to local host. Error: $ERRMSG\n");
}
>
<!
if (msqlSelectDB($dbh, "test") < 0)
{
 echo ("Cannot find test database.");
}
>
<!
$res = msqlQuery($dbh, "select Title, PY from books");
if ($res < 0)
{
 echo ("Error: ERRMSG\n");
}
>
<!
$row = msqlFetchRow($res);
if (#$row == 0)
{
 echo ("ERROR: $ERRMSG\n");
}
else
{
 echo ("Title: $row[0] published in $row[1]\n";
}>

A new feature with version 2 is enhanced security. W3-msql includes W3-auth, allowing the Webmaster to define secure areas restricted by user authentication.

PHP/FI

Yet another embedded scripting language, similar to W3-msql, is PHP/FI. The acronym stands for "Personal Home Page/Forms Interface," which doesn't clear things up much. PHP is primarily responsible for access control and logging. FI is responsible for the user interface and database access.

PHP/FI's home page describes the software as "a server-side HTML-embedded scripting language." In concept it is similar to JavaScript, with this exception: JavaScript runs in the client (and specifically, the Netscape client), whereas PHP/FI runs on the server (and so runs with any client). A full description of PHP/FI is available at http://www.vex.net/php/.

PHP started life as a sophisticated access logger, and it is no surprise that it does that feature well. It also affords access control, access to mSQL and DBM databases and to Thomas Boutell's on the fly GIF image creation package, RFC-1867-compliant file upload, and a full programming language reminiscent of Perl. What is surprising is that this very complete and very well-supported package is free.

For many applications it is faster for the programmer and the end user to execute a short PHP script than to launch Perl and run a CGI program. Maintenance costs are reduced somewhat since the programmer maintains an integrated piece of code rather than a separate HTML page and CGI script.

With PHP/FI, some applications that might otherwise be handled in mSQL can be handled in DBM. DBM is a disk-based data format commonly used in the UNIX community to manage associative arrays. Associative arrays are used in Perl scripts, for example, to allow the programmer to say $FORM{'email'} to get to the email cell of the @FORM array.

Once installed, the PHP/FI binary is invoked in much the same way as W3-msql: Assuming the binary is in the cgi-bin directory and has the name php.cgi, the URL of a PHP/FI-enhanced page located at /xyz/demo.html is /cgi-bin/php.cgi/xyz/demo.html.

When a page is displayed by PHP/FI, PHP/FI adds a footer showing the number of times the page has been accessed. To turn this information off, add a tag like the following one to your page.

<?setshowinfo(0)>

The footer can also be controlled from within the ?config section of PHP/FI or on Apache servers from the server configuration files. These options are described in the PHP/FI documentation.

Once PHP/FI has control of a page, it can handle many tasks locally that would otherwise require a CGI script or a JavaScript program. For example, suppose you put up a page with the following HTML:

<FORM ACTION="/cgi-bin/php.cgi/xyz/display.html" METHOD=POST><INPUT TYPE="text" name="name">
<INPUT TYPE="submit"></FORM>

When the user submits the form, the response goes to display.html by way of PHP/FI. display.html contains

<? $hour = Date("H");
if ($hour < 12);
  echo "Good morning, $name<P>";
elseif ($hour < 19);
  echo "Good afternoon, $name<P>";
elseif ($hour < 22);
  echo "Good evening, $name<P>";
else;
  echo "Good grief, $name, what are you doing up so late?";
endif;>

To C and Perl programmers, those extra semicolons can be a bit unsettling. They are part of required syntax of PHP/FI. Unlike other languages, PHP/FI does not use braces. The previous code could also have been written with each statement in its own angle-brackets, like this:

<? $hour = Date("H")><? if ($hour < 12)>
<? echo "Good morning, $name<P>">
<? elseif ($hour < 19)>
<? echo "Good afternoon, $name<P>">
<? elseif ($hour < 22)>
<? echo "Good evening, $name<P>">
<? else>
<? echo "Good grief, $name, what are you doing up so late?";><? endif>

If the application does not require interpolated string variables, the programmer could even write

<? $hour = Date("H")><? if ($hour < 12)>
Good morning<P>
<? elseif ($hour < 19)>
Good afternoon<P>
<? elseif ($hour < 22)>
Good evening<P>
<? else>
Good grief, what are you doing up so late?<P><? endif>

To access an mSQL database in a manner similar to that shown in the previous sections, the programmer writes

<?$host="localhost";
msql_connect($host);
$res = msql("test", "select Title, PY from books");
if ($res < 0);
 echo "Error: $phperrmsg\n";
elseif ($res == 0);
 echo "No books available";
else;
$num = msql_numrows($res);
$i = 0;
while ($i < $num);
  echo "Title: ";
  echo msql_Result($res, $i, "Title");
  echo " published in ";
  echo msql_Result($res, $i, "PY");
  echo "<P>";
  $i++;
endwhile;
 echo ("Title: $row[0] published in $row[1]\n";>

Figure 18.4 shows one real-world application of PHP/FI. QMS, known for their printers, has put their technical notes online. A user can visit their site, at http://www.qms.com/cgi-bin/supportbase/www/faq/search-faq-display.html and enter a term like "noise." As shown in Figure 18.5, the server returns a list of support notes which address that term.

Figure 18.4: The QMS Technical Support Search Form allows the user to enter search terms and query the online technical support database.

Figure 18.5: Using PHP/FI, the QMS site returns the results of a search for "noise".

An example with a more complex interface is given at http://www. nerosworld.com/realestate/or/, shown in Figures 18.6 and 18.7.

Figure 18.6: Search real estate listings in Oregon with PHP/FI.

Figure 18.7: The search engine returns these results almost instantly.

The most elaborate example of PHP/FI shown in this section was put online by the Atlanta Metro Listing Service (MLS). This site, http://atlantamls.com/H/ allows the user to search through over 17,000 properties in just seconds. The query page and sample results are shown in Figures 18.8 and 18.9.

Figure 18.8: The Atlanta MLS has about 15,000 listings.

Figure 18.9: These search results came back in about 5 seconds.

PHP/FI also supports "Secure Variables," which are somewhat similar to Perl's tainted variables described in Chapter 17. To declare the variable foo as secure, use

<? SecureVar("foo")>

Once a variable is marked as secure, attempts to fill it from GET will fail, leaving it empty. Secure variables may be set by POST, or directly inside the PHP script.

JDBC

Sun Microsystems, developers of the object-oriented server-side language Java, has announced an interface between Java and SQL databases. Called JDBC, the application programmer interface defines classes for database connections, SQL statements, and result sets. Using JDBC, the programmer can connect to an RDBMS, issue one or more queries, and format the results for the client.

More information on Java is available in Chapter 3, "Deciding What to Do About Netscape."

More information on JDBC is avalable at http://www.iti.upv.es/~java/jdbc/jdbc-index.html and in the Web Week article at http://pubs.iworld.com/ww-online/96Jan/products/database-firms.html>.

WDB

The preceding products (MsqlPerl, W3-msql, and PHP/FI) require the developer to think about two things at once: the layout of the HTML page and the display of the data. For some applications, developers appreciate this level of control. Sometimes, however, a Webmaster just wants to give the visitor access to the database in the fastest way possible. WDB, by Bo Frese Rasmussen, is an excellent choice for those times.

To install WDB, first set up mSQL and then install MsqlPerl. Make a new directory, say wdb1.3a2, and untar the contents of the installation kit into it. Point a Web browser at README.html. Select the link to the Installation Guide and follow the directions given there.

Tip

Note that the directions are given for Sybase. The Postscript version has an appendix that addresses mSQL.

To use WDB, the developer builds a special file called the Form Definition File, or FDF, which describes the data. Rasmussen provides a tool, called mkfdf, which makes FDFs from the database schema. In the early release it is quite Sybase-specific; it may be easier to write the FDFs by hand than to adapt the tool to mSQL.

Listing 18.3 shows a simple FDF. At the top of this FDF are a few Form attributes. Other Form attributes are available to specify HTML to be placed at the top of queries and results screens, and Perl to be executed, typically to define functions. Below the Form attributes are the attributes of each of the three fields described on this form. One powerful set of field attributes are from__db and to__db. These attributes are used to transform data as it moves between the user's query, the database, and the result page.

Listing 18.3 book.fdf-A Sample FDF to Save the Programmer the Work of Writing Query Forms and Dynamic Pages

TABLE   = booksDATABASE  = test
TITLE   = Books
Q_HEADER  = Demo Query Form
R_HEADER  = Sample Query Result
COMMENTS_TO = morganm@dse.com
FIELD   = ISBN
label   = ISBN
column  = ISBN
type   = char
length  = 13
key
url   = "$WDB/test/books/query/$val{'ISBN'}"

FIELD   = Title
label   = Title
column  = Title
type   = char
length  = 30
FIELD   = PY
label   = PubYear
column  = PY
type   = intlength  = 4

Note

Most of the real power of WDF lies in the field attributes. Be sure to check out fdf_syntax.html, which comes with the installation kit.

Once the FDF is written and installed, WDB builds a query form from it. As a starting point, point the browser to http://your/server/cgi-bin/wdb/database/table/default. Figure 18.10 shows the resulting default list. From the default list the user can choose the query button and get http://your/server/cgi-bin/wdb/database/table/query, as shown in Figure 18.11. The fields on the query form permit relational operators, such as less than '<' and 'OR'. The help button on the query form is linked to a page describing how to use such operators.

Figure 18.10: Without writing any code, the programmer can still put up a meaningful list of records from the databse.

Figure 18.11: WDB automatically produces a default query form.

Caution

The documentation says that one can ask for a range, e.g., PubYear '1990 .. 1995'. The range operator builds a query using 'between,' which mSQL does not understand.

Figure 18.12 shows the results of a query. Of course, the developer can arrange the site so that the visitor has direct access to specific queries.

Figure 18.12: WDB formats the search results without any help from the programmer.

Although WDB allows the developer to put queries online without writing any code, much of the power of WDB comes from the ability to do calculations with the data, either before or after it is sent to the database. To define a computed field, specify the computed field attribute and compute the value in the from_db attribute.

Tip

When specifying a field as "computed," be sure to also set the no_query attribute so that the user doesn't try to look up the value in the database.

For full information on WDB visit http://arch-http.hq.eso.org/bfrasmus/wdb/wdb.html.

Chapter 18

How to Query Databases

CONTENTS

Technology

Flat Files

The Indexed Sequential Access Method (ISAM)

Relational Databases

Object-Oriented Databases

The mSQL Family

A SQL Primer

Making and Filling a Database

Queries

Joins

mSQL

Access Control

Debugging

MsqlPERL

W3-mSQL

PHP/FI

JDBC

WDB