Chapter 4

Designing Faster Sites


Asking a Webmaster how fast a site should be is like asking a wealthy person how much money is enough. The answer is the same: "Just a little bit more."

This chapter addresses the question of site performance by looking at where the time goes. We examine the details of HTTP, the protocol of the Web, and look at how much overhead is associated with the various components of TCP/IP. In this chapter we cover everything that can be done to speed up a site except changing the graphics. In Chapter 5, "Designing Graphics for the Web," we concentrate on the one element-graphics-that usually has the greatest impact on performance.

Where Does the Time Go?

Once a client connects to a server and asks for a page, the clock begins to run. In general, the time goes to three places: the overhead of the HTTP and TCP/IP exchanges themselves, the time the server spends preparing the response (parsing server-side includes, running CGI scripts, and so on), and the time spent moving the data, especially graphics, over the network.

What Is HTTP and Why Do I Care?

Anyone who has entered a URL has wondered about the letters "http" and why they're omnipresent on the Web. HTTP, the HyperText Transfer Protocol, is a series of handshakes exchanged between a browser like Netscape and the server.

There are many different servers. CERN, a research center in Switzerland that did the original development of the Web, has one. So does the National Center for Supercomputing Applications, or NCSA, which did much of the early work on the graphical portions of the Web. Netscape Communications sells two servers, one for general use and one with special security features for commercial transactions. The one thing all servers have in common is that they speak HTTP.

The definitive description of HTTP is found at http://www.ics.uci.edu/pub/ietf/http/draft-ietf-http-v10-spec-03.html. This document contains a detailed memo from the HTTP Working Group of the Internet Engineering Task Force. The current version, HTTP/1.0, is the standard for how all communication is done over the Web.

Communication on the Internet takes place using a set of protocols named TCP/IP, which stands for Transmission Control Protocol/Internet Protocol. This chapter provides more details on TCP/IP later-for now, just think of TCP/IP as similar to the telephone system, and HTTP as a conversation that two people have over the phone.

The Request

When a user enters a URL such as http://www.xyz.com/index.html, TCP/IP on the user's machine talks to the network name servers to find out the IP address of the xyz.com server. TCP/IP then opens a conversation with the machine named www at that domain. TCP/IP defines a set of ports-each of which provides some service-on a server. By default, the HTTP server (commonly named httpd) is listening on port 80.

The client software (a browser like Netscape) starts the conversation. To get the file named index.html from www.xyz.com, the browser says the following:

GET /index.html HTTP/1.0

This instruction is followed by a carriage return and a line feed, denoted by <CRLF>.

Formally, index.html is an instance of a uniform resource identifier (URI). A uniform resource locator (URL) is a type of URI.

Note
There are provisions in the Web specifications for identifiers to specify a particular document, regardless of where that document is located. There are also provisions that allow a browser to recognize that two documents are different versions of the same original-differing in language, perhaps, or in format (for example, one might be plain text, and another might be in PDF). For now, most servers and browsers know only about one type of URI-the URL.

The GET method asks the server to return whatever information is indicated by the URI. If the URI represents a file (like index.html), then the contents of the file are returned. If the URI represents a process (like formmail.cgi), then the server runs the process and sends the output.

Most commonly, the URI is expressed in terms relative to the document root of the server. For example, the server might be configured to serve pages starting at

/usr/local/etc/httpd/htdocs

If the user wants a file, for instance, whose full path is

/usr/local/etc/httpd/htdocs/hypertext/WWW/TheProject.html

the client sends the following instruction:

GET /hypertext/WWW/TheProject.html HTTP/1.0

The HTTP/1.0 at the end of the line indicates to the server what version of HTTP the client is able to accept. As the HTTP standard evolves, this field will be used to provide backward compatibility with older browsers.
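
To see the exchange in action, you can issue the request by hand. The following sketch (in Python, with error handling omitted and www.xyz.com standing in as a placeholder host) opens a connection to port 80, sends the GET line described above, and prints the first line of whatever comes back:

import socket

# Connect to port 80 on the server; the name lookup happens inside
# create_connection. www.xyz.com is only a placeholder.
host = "www.xyz.com"
sock = socket.create_connection((host, 80))

# The request line, followed by <CRLF> and a blank line to end the request.
sock.sendall(b"GET /index.html HTTP/1.0\r\n\r\n")

# Read until the server closes the connection, as HTTP/1.0 servers do.
response = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:
        break
    response += chunk
sock.close()

# The first line is the status line, for example "HTTP/1.0 200 OK".
print(response.split(b"\r\n", 1)[0].decode())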

The Response

When the server gets a request, it generates a response. The response a client wants usually looks something like this:

HTTP/1.0 200 OK
Date: Mon, 19 Feb 1996 17:24:19 GMT
Server: Apache/1.0.2
Content-type: text/html
Content-length: 5244
Last-modified: Tue, 06 Feb 1996 19:23:01 GMT
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 3.0//EN">
<HTML>
<HEAD>
.
.
.
</BODY>
</HTML>

The first line is called the status line. It contains three elements, separated by spaces: the HTTP version, a numeric status code, and a human-readable reason phrase.

When the server is able to find and return an entity associated with the requested URI, it returns status code 200, which has the reason phrase OK.

The first digit of the status code defines the class of response. Table 4.1 lists the five classes.

Table 4.1  HTTP Response Status Code Classes

Code   Class           Meaning
1xx    Informational   These codes are not used, but are reserved for future use.
2xx    Success         The request was successfully received, understood, and accepted.
3xx    Redirection     Further action must be taken in order to complete the request.
4xx    Client error    The request contained bad syntax or could not be fulfilled through no fault of the server.
5xx    Server error    The server failed to fulfill an apparently valid request.

Table 4.2 shows the individual values of all status codes presently in use, and a typical reason phrase for each code. These phrases are given as examples in the standard-each site or server can replace these phrases with local equivalents.

Table 4.2  Status Codes and Reason Phrases

Status Code   Reason Phrase
200           OK
201           Created
202           Accepted
203           Partial Information
204           No Content
301           Moved Permanently
302           Moved Temporarily
303           Method
304           Not Modified
400           Bad Request
401           Unauthorized
402           Payment Required
403           Forbidden
404           Not Found
500           Internal Server Error
501           Not Implemented
502           Server Temporarily Overloaded (Bad Gateway)
503           Server Unavailable (Gateway Timeout)

The most common responses are 200, 204, 302, 401, 404, and 500. These and other status codes are discussed more fully in the document located at http://www.w3.org/hypertext/www/Protocols/HTTP/HTRESP.html. We have already described code 200. It means that the request has succeeded, and data is coming.

Code 204 means that the document has been found, but is completely empty. This code is returned if the developer has associated an empty file with a URL, perhaps as a placeholder. The most common browser response when code 204 is returned is to leave the current data on-screen and put up an alert dialog box that says Document contains no data or something to that effect.

When a document has been moved, a code 3xx is returned. Code 302 is most commonly used when the URI is a CGI script that outputs something like the following:

 Location: http://www.xyz.com/newPage.html

Typically, this is followed by two line feeds. Most browsers recognize code 302, and look in the Location: line to see which URL to retrieve; they then issue a GET to the new location. Chapter 7 contains details about outputting Location: from a CGI script.
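
A script that produces this kind of redirection can be very short. Here is a minimal sketch of the idea in Python (the destination URL is the example above; the details of running CGI scripts are left to Chapter 7). The server turns the Location line into a 302 response:

#!/usr/bin/env python
# Emit a Location header followed by a blank line; the server
# converts this output into a 302 (Moved Temporarily) response.
print("Location: http://www.xyz.com/newPage.html")
print()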

Status code 401 is seen when the user accesses a protected directory. The response includes a WWW-Authenticate header field with a challenge. Typically, a browser interprets a code 401 by giving the user an opportunity to enter a username and password. Chapter 17, "How to Keep Portions of the Site Private," contains details on protecting a Web site.

Status code 402 has some tantalizing possibilities. So far, it has not been implemented in any common browsers or servers. Chapter 25, "Getting Paid: Taking Orders over the Internet," describes some methods in common use that allow the site owner to collect money.

When working on new CGI scripts, the developer frequently sees code 500. The most common explanation of code 500 is that the script has a syntax error, or that it's producing a malformed header. Chapters 7, "Extending HTML's Capabilities with CGI," and 8, "Six Common CGI Mistakes and How to Avoid Them," describe how to write CGI scripts to avoid error 500.

Other Requests

The preceding examples involved GET, the most common request. A client can also send requests involving HEAD, POST, and "conditional GET."

The HEAD request is just like the GET request, except that the server returns only the headers, not the entity body. HEAD can be used by special programs called proxy servers to test URIs to see if an updated version is available, or just to ensure that the URI is available at all.

POST is like GET in reverse; POST is used to send data to the server. Developers use POST most frequently when writing CGI scripts to handle form output.

Typically, a POST request brings a code 200 or code 204 response.
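
Both request types use the same request-line format as GET. The sketch below uses Python's http.client module against the placeholder host www.xyz.com; the form fields and the formmail.cgi script are only illustrations:

import http.client
from urllib.parse import urlencode

host = "www.xyz.com"   # placeholder host

# HEAD: ask for the headers of /index.html without the entity body.
conn = http.client.HTTPConnection(host, 80)
conn.request("HEAD", "/index.html")
reply = conn.getresponse()
print(reply.status, reply.reason, reply.getheader("Last-Modified"))
conn.close()

# POST: send form data to a CGI script.
body = urlencode({"name": "Susan", "comment": "Nice site"})
headers = {"Content-Type": "application/x-www-form-urlencoded"}
conn = http.client.HTTPConnection(host, 80)
conn.request("POST", "/cgi-bin/formmail.cgi", body, headers)
reply = conn.getresponse()
print(reply.status, reply.reason)
conn.close()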

Requests Through Proxy Servers

Some online services, like America Online, set up machines to be proxy servers. A proxy server sits between the client and the real server. When the client sends a GET request to, say, www.xyz.com, the proxy server checks to see if it has the requested data stored locally. This local storage is called a cache.

If the requested data is available in the cache, the proxy server determines whether to return the cached data or the version that's on the real server. This decision usually is made on the basis of time-if the proxy server has a recent copy of the data, it can be more efficient to return the cached copy.

To find out whether the data on the real server has been updated, the proxy server can send a conditional GET, like this:

GET /index.html HTTP/1.0
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT <CRLF>

If the request would not normally succeed, the response is the same as if the request were a GET. The request is processed as a GET if the date is invalid (including a date that's in the future). The request also is processed as a GET if the data has been modified since the specified date.

If the data has not been modified since the requested date, the server returns status code 304 (Not Modified).

If the proxy server sends a conditional GET, it either gets back data, or it doesn't. If it gets data, it updates the cache copy. If it gets code 304, it sends the cached copy to the user. If it gets any other code, it passes that code back to the client.
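
The cache logic amounts to one extra header and a check of the status code. Here is a rough sketch in Python (again using http.client and a placeholder host); a real proxy would also manage the cache storage itself:

import http.client

host = "www.xyz.com"   # placeholder host
cached_copy = b"<HTML>...the copy already in the cache...</HTML>"
cached_date = "Sat, 29 Oct 1994 19:43:31 GMT"

conn = http.client.HTTPConnection(host, 80)
conn.request("GET", "/index.html",
             headers={"If-Modified-Since": cached_date})
reply = conn.getresponse()

if reply.status == 304:
    page = cached_copy          # not modified: serve the cached copy
elif reply.status == 200:
    page = reply.read()         # modified: update the cache and serve the new copy
    cached_copy = page
else:
    page = reply.read()         # any other status is passed back to the client
conn.close()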

Header Fields

If-Modified-Since is an example of a header field. There are four types of header fields: general headers, request headers, response headers, and entity headers.

General headers may be used on both requests and responses. Data can flow both ways. On a GET request, data comes from the server to the client. On a POST request, data goes from the client to the server. In either case, the data is known as the entity.

The three general headers defined in the standard are Date, MIME-Version, and Pragma.

By convention, the server should send its current date with the response. By the standard, only one Date header is allowed.

Although HTTP does not conform to the MIME standard, it is useful to report content types using MIME notation. To avoid confusion, the server may send the MIME version that it uses. MIME version 1.0 is the default.

Optional behavior can be described in Pragma directives. HTTP/1.0 defines the no-cache directive on request messages, to tell proxy servers to ignore their cached copy and GET the entity from the server.

Request header fields are sent by the browser software. The request header fields include Authorization, From, If-Modified-Since, Referer, and User-Agent.

Referer can be used by CGI scripts to determine the preceding link. For example, if Susan announces Bob's site to a major real estate listing, she can keep track of the Referer variable to see how often users follow that link to get to Bob's site.

User-Agent is sent by the browser to report what software and version the user is running. This field ultimately appears in the HTTP_USER_AGENT CGI variable and can be used to return pages with browser-specific code.
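
Both values arrive in the CGI environment, so a script can log them with a few lines of code. The following sketch is hypothetical (the log file location is arbitrary), but it shows how Susan's referral tracking might work:

#!/usr/bin/env python
# Log the Referer and User-Agent of each hit, then return a page.
import os

referer = os.environ.get("HTTP_REFERER", "-")
agent = os.environ.get("HTTP_USER_AGENT", "-")

with open("/tmp/referer.log", "a") as log:   # arbitrary log location
    log.write("%s\t%s\n" % (referer, agent))

print("Content-type: text/html")
print()
print("<HTML><BODY>Thanks for visiting!</BODY></HTML>")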

Response header fields appear in server responses, and can be used by the browser software. The response header fields described here are Location, Server, and WWW-Authenticate.

Location was mentioned earlier in this chapter, in the section entitled "The Response." Most browsers expect to see a Location field in a response with a 3xx code, and interpret it by requesting the entity at the new location.

Server gives the name and version number of the server software.

WWW-Authenticate is included in responses with status code 401. The syntax is

WWW-Authenticate: 1#challenge

The browser reads the challenge(s)-there must be at least one-and asks the user to respond. Most popular browsers handle this process with a dialog box prompting the user for a username and password. Chapter 17, "How to Keep Portions of the Site Private," describes the authentication process in more detail.

Entity header fields contain information about the data. Recall that the data is called the entity; information about the contents of the entity body, or metainformation, is sent in entity header fields. Much of this information can be supplied in an HTML document using the <META> tag.

The entity header fields include Allow, Content-Encoding, Content-Length, Content-Type, Expires, and Last-Modified.

In addition, new field types can be added to an entity without extending the protocol. It's up to the author to determine what software (if any) will recognize the new type. Client software ignores entity headers that it doesn't recognize.

The Expires header is used as another mechanism to keep caches up-to-date. For example, an HTML document might contain the following line:

<META http-equiv="Expires" Content="Thu, 01 Dec 1994 16:00:00 GMT">

This means that a proxy server should discard the document at the indicated time, and should not send out data after that time.

Note
The exact format of the date is specified by the standard, and the date must always be in Greenwich Mean Time (GMT).
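
The decision a cache makes with this header is a simple date comparison. Here is a sketch of that check in Python, using the expiration date from the example above:

from email.utils import parsedate_to_datetime
from datetime import datetime, timezone

expires = "Thu, 01 Dec 1994 16:00:00 GMT"   # from the Expires header or <META> tag

def still_fresh(expires_header):
    """Return True if the cached entity has not yet expired."""
    expiry = parsedate_to_datetime(expires_header)
    return datetime.now(timezone.utc) < expiry

print(still_fresh(expires))   # False; this entity expired long ago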

Server-Parsed Documents

A document containing HTML sometimes also contains special directives called server-side includes (SSIs). Before being returned, these directives must be read and executed by the server.

Chapter 6, "Reducing Maintenance Costs with Server-Side Includes," describes the details of SSIs. Here, it's worth noting that the extra time required to parse the document is typically on the order of milliseconds. This time is inconsequential to the user, but heavy use of SSIs can put a noticeable load on the server.

Good practice on a server is to parse only those documents with names that end in .shtml. This way, developers can use SSIs, but the server has to parse only those documents that are known to hold SSIs.

Common Gateway Interface

Sometimes the requested entity is a Common Gateway Interface (CGI) script. In these cases, the server runs the script and returns the results using HTTP format. Chapters 7 and 8 provide details on CGI scripting. The running time is difficult to predict because CGI scripts can do virtually anything that a program can do.

A reasonable goal on a lightly loaded server is to have all CGI scripts complete within 50 milliseconds or so.
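
One rough way to check a script against that budget is to time it from the command line. The sketch below assumes the script (here a hypothetical search.cgi) can be run standalone; it reports the wall-clock time of each run, which includes interpreter start-up that the server pays as well:

import subprocess
import time

SCRIPT = "./search.cgi"   # hypothetical CGI script, runnable standalone

for trial in range(5):
    start = time.perf_counter()
    subprocess.run([SCRIPT], capture_output=True)   # discard output; we only want the time
    elapsed = (time.perf_counter() - start) * 1000
    print("run %d: %.1f ms" % (trial + 1, elapsed))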

Server-Side Clickable Imagemaps

For timing purposes, clickable imagemaps are similar to CGI scripts. Client-side imagemaps, such as those discussed in Chapter 3, "Deciding What to Do About Netscape," incur the load only on the local machine, and they are almost always faster than server-side imagemaps.

Overall Impact of HTTP on Download Time

A typical exchange between a client and server goes like this:

Client (calling on the server's port 80): Please give me index.html.
Server: Here it comes. It's 2782 bytes long.
   Connection closed.
Client (calling on the server's port 80):
Now give me /Graphics/Background.gif.
Server: Okay. This entity is 1542 bytes long.
   Connection closed.
Client (calling on the server's port 80):
Please give me /Graphics/ButtonLeft.gif.
Server: This entity is 564 bytes long.
   Connection closed.
Client (calling on the server's port 80):
Now give me /Graphics/ButtonRight.gif.
Server: Okay. This entity is 566 bytes long.
   Connection closed.
Client (calling on the server's port 80): 
Now give me /Graphics/Logo.jpeg.
Server: Okay. This entity is 20312 bytes long.
   Connection closed.

The conversation continues this way, downloading entities (such as graphics) that make up the complete page, until all the entities have been requested. Next, time passes while the user reads the page.

As a rule of thumb, the handshake-opening the connection, sending the request, and generating the response (not counting the data)-takes between 0.5 seconds and 1 second, so budget about 0.75 seconds.

Most users have at least a 14,400 bps modem. These modems do some compression of the data stream, but big files (such as graphics) are already compressed, so these modems don't deliver much faster than the advertised 14,400 bps. A byte is eight bits, but by the time you consider start bits, stop bits, and (possibly) parity bits, a byte costs around 10 bits to send. Under ideal conditions, therefore, a 14,400 bps modem sends about 1,440 bytes per second.

Suppose that a page has the components shown in Table 4.3.

Table 4.3  Time Required to Download a Page

Component            Size in Bytes   Handshake Time (secs)   Download Time at 14.4 Kbps (secs)   Total Time (secs)
Text                 2,782           0.75                    1.93                                2.68
Background GIF       1,542           0.75                    1.07                                1.82
Prev Button          564             0.75                    0.393                               1.14
Next Button          566             0.75                    0.391                               1.14
Up Button            563             0.75                    0.391                               1.14
Logo                 20,715          0.75                    14.39                               15.14
Incidental Graphic   10,212          0.75                    7.09                                7.84
Total Time                           5.25                    25.65                               30.9

This table reveals several things. First, nearly 17 percent of the total time spent at this example site is taken up by HTTP overhead. Second, over 92 percent of the download time is spent moving graphics.
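
The arithmetic behind Table 4.3 is easy to reproduce. The short calculation below uses the 0.75-second handshake budget and the 1,440 bytes-per-second estimate described earlier:

HANDSHAKE = 0.75              # seconds per request (rule of thumb)
BYTES_PER_SEC = 14400 / 10    # 14,400 bps, roughly 10 bits per byte on the wire

components = {                # sizes in bytes, from Table 4.3
    "Text": 2782,
    "Background GIF": 1542,
    "Prev Button": 564,
    "Next Button": 566,
    "Up Button": 563,
    "Logo": 20715,
    "Incidental Graphic": 10212,
}

total = 0.0
for name, size in components.items():
    seconds = HANDSHAKE + size / BYTES_PER_SEC
    total += seconds
    print("%-20s %6.2f secs" % (name, seconds))
print("%-20s %6.2f secs" % ("Total", total))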

If you want the site to be downloaded faster, you can do three things: reduce the number of entities on the page (and with it the per-request overhead), make the entities themselves (especially the graphics) smaller, or rearrange the page so that it seems to load faster.

Later in this chapter, in the section entitled "Making the Browser Seem Faster," we describe ways to make the download seem faster even if the overall time is the same. Chapter 5, "Designing Graphics for the Web," deals with techniques to make the graphics smaller.

As covered in the discussion of HTTP, there are three major components to the protocol overhead (that is, the time not spent sending data): opening the connection, sending the request, and generating and returning the response headers.

The next section describes how to choose a service provider who'll give your site a competitive edge in performance.

Choosing a Fast Service Provider

There are numerous factors to be considered in choosing an Internet service provider (ISP). Many of these factors have nothing to do with performance. The fastest site in the world is useless if the server is down or overloaded. The fastest site in the world is of appreciably less value if the pages cannot use SSIs or CGI scripts. The following discussion gives recommendations for choosing a high-performance provider, balanced with other factors.

High-Bandwidth Connections

Connections to the Internet come in various sizes, from ISDN lines to huge T-3 pipes. Every site hosting Web pages should support speeds of at least T-1.

Redundant Equipment

Most ISPs have at least one T-1 link to the Internet. In the United States, such links are provided by major national carriers such as Sprint and MCI. If the carrier has a problem and the link goes down, the site becomes unavailable.

The best sites have multiple T-1 connections. Not only does this design offer higher bandwidth, but if the connections are truly independent, one faulty link is far less likely to bring down others.

Similarly, distributing workload across multiple computers is a good way to keep performance high. That way, if one machine fails or is taken down for maintenance, the site is still available.

Technical Support

A faulty network interface, transceiver, connector, or cable can halt access to a site-or make performance erratic. It's best if the ISP has someone on-site 24 hours a day, 7 days a week. Failing this, the ISP should provide a pager number so that problems on the site can be brought to the attention of management and technical support staff as soon as the problems occur.

It's good practice to provide a system status line, a phone message that reports any known problems and an estimate of how long it will take for the system to be up again. This way, when a user notices a problem, the user can check the status line to see if someone already is working on the trouble.

Being Close to the Target Market

The section entitled "How Do TCP Connections Work?" later in this chapter talks about round-trip time (RTT). There are only two ways to reduce RTT: move closer, or increase the speed of light. Of the two, moving closer is markedly less expensive.

Okay, move closer to where? For many sites, the base of prospective customers is literally worldwide, or at least nationwide. In such cases, it's best to choose the ISP based on other factors, because physical location is not a major factor contributing to performance. If the target market is geographically focused, however, locating an ISP in that area can reduce RTT and improve performance slightly.

Major gains are to be had with mirroring. If a business has a truly international presence, consider setting up servers around the world with sites that are exact copies of each other-these sites are known as mirrors. Many ISPs in Europe and Asia rent space to U.S. businesses that need a mirror host.

Access to cgi-bin

Many ISPs do not grant access to the local cgi-bin directory. If the developer cannot access the server's cgi-bin directory, CGI scripts have to be hosted on another server. Such a design can add complexity and overhead to your maintenance tasks.

Access to SSIs, Including exec

Similarly, some ISPs do not parse documents containing SSIs. Even if they do, many prohibit the exec SSI on security grounds. Unlike CGIs, a page with SSIs must be processed by the server that hosts it. Because the use of SSIs can enhance quality and reduce your maintenance costs, the ability to have the server handle SSIs is highly desirable. Chapter 6, "Reducing Maintenance Costs with Server-Side Includes," discusses SSIs in detail.

Closing Security Holes

All operating systems have security holes. Various developers of UNIX have elected to publicly announce such holes and the changes required to close them. Putting a machine directly on the Internet exposes it to severe security threats.

A good ISP should have a process in place by which the staff and management regularly review newly reported security holes and do what is necessary to close them.

Making the Browser "Seem" Faster

It's always nice to make a site respond more quickly. Once a choice of ISPs is made, however, and your site design stabilizes, the number of things that you can do to decrease site loading time is limited. Instead, what you can do is to give users the impression that the site is loaded before the loading actually completes.

Advancing the Layout Complete Moment

Watch a page load into a graphical browser like Netscape. Initially, the page is blank, and the computer reports that it's busy downloading the data. There are no scroll bars, and most of the text is off the page. The user sits in frustration, waiting to click something.

Then the status bar reports Layout complete. Scroll bars appear, and the user begins reading the contents of the site.

For many pages, this layout complete moment occurs just as the last data is loaded from the server. In browsers like Netscape, however, there's no reason to wait. Here are three ways to advance the layout complete moment: include HEIGHT and WIDTH attributes in the <IMG> tags, use interlaced GIFs for large graphics, and use LOWSRC in the <IMG> tags. The following sections describe each technique.

Using HEIGHT and WIDTH Tags

When a browser is reading in a file, it attempts to lay out the text around the graphics. Once the layout is complete, the user can begin to scroll the page and read the text. As you learned in the discussion of HTTP, the browser loads the page, then goes back and requests the images. If the <IMG> tags have HEIGHT and WIDTH attributes, the browser can complete the layout as soon as the text is loaded.

Do not use the HEIGHT and WIDTH tags to change the size of the image. Use them only to record the actual height and width in pixels. If HEIGHT and WIDTH are used to "shrink" an image, the whole image is still loaded, so no bandwidth is saved. If HEIGHT and WIDTH are used to increase the size of an image, the image has low resolution, and quality is degraded.

Furthermore, if a page with HEIGHT and WIDTH tags is read by a browser that does not recognize those tags, the graphic is displayed in its original size. If the page has been designed for this size, the only consequence is that the layout complete moment is delayed. If the image has been contorted to a new WIDTH and HEIGHT, though, the user will be surprised by a graphic that is unusually large or unusually small.

Using Interlaced GIFs for Large Graphics

Most software that can produce GIFs has an option to create interlaced GIFs. Interlaced GIFs load in four passes. First, rows 0, 8, 16, 24, and so on load. Then rows 4, 12, 20, 28, and so on load. This process continues until the whole graphic has been downloaded. If the WIDTH and HEIGHT tags are in place, the layout is complete as soon as the text is loaded. Then, as the user watches, a blurry version of the graphic loads quickly. Four successive passes improve the quality of the image.

If you use interlaced GIFs, the user can quickly get a sense of whether the graphic (and the page) is worth seeing, and can move on if desired.

Using LOWSRC in the IMG Tags

Although LOWSRC only works for Netscape, it's a useful way to quickly load an image, and should be considered as an alternative to interlaced GIFs.

To use LOWSRC, use a graphics program to make a small version of the graphic. If the original graphic is, let's say, 200×400 pixels, make a version that is 50×100. The original graphic might be 40K in size, and might take 29 seconds to load. The smaller version would be under 5K, and would load in less than six seconds. The <IMG> tag would say:

<IMG SRC="highResVersion.gif" LOWSRC="lowResVersion.gif"
ALT="text description" HEIGHT=200 WIDTH=400>

When a browser is ready to retrieve this graphic, it uses GET for lowResVersion.gif and expands it to 200×400 pixels. The resulting low-resolution image serves as a placeholder that's visible quickly. The high-resolution version gradually loads and replaces the low-resolution version.

Being Faster Versus Seeming Faster

Look again at the page described in Table 4.3. The original total time was estimated to be 30.9 seconds. From the time of the first GET to the time the page is available to the user is more than 30 seconds.

Now, suppose that we add HEIGHT and WIDTH tags to each <IMG>. The download time stays the same, but the page is available for scrolling as soon as the text is received-just 2.68 seconds.

Suppose that the buttons are interlaced, and the larger graphics have a LOWSRC that is one-eighth of the size of the original graphic. (For technical reasons, it's unwise to interlace a background GIF.) Further, suppose that the page is laid out such that the background graphic loads first, the logo and incidental graphic load next, and the buttons load last. Of course, all graphics have HEIGHT and WIDTH attributes. The new timeline is shown in Table 4.4.

Table 4.4  Improved Download Times

Event                             Time (secs)   Status
First GET                         0.00
Text loaded                       2.68          Layout complete; scroll bars appear
Background GIF loaded             4.50
LOWSRC logo loaded                7.08          Low-resolution logo visible
LOWSRC incidental loaded          8.72          Low-resolution incidental visible
Prev button first pass complete   9.30          Prev button vaguely recognizable
Next button fully loaded          11.00         Prev and Next buttons fully loaded
Up button fully loaded            12.14         All graphics visible
Full-res logo loaded              27.28         Logo at full resolution
Full-res incidental loaded        35.12         Page fully loaded

Even though the total loading process takes slightly longer than before, the user is able to do useful work on the site less than three seconds after the initial GET. A little more than 12 seconds after the page starts loading, the entire page is visible (albeit at low resolution).

Many industry experts believe that users start "timing out" or losing the focus of their attention, after about 12 seconds. If you make the page usable within that 12 seconds, most users will still be there browsing the page when the final graphics load at about 35 seconds.

TCP/IP and Performance

Earlier in this chapter, we likened TCP/IP to a phone company for Web sites. TCP/IP provides basic connectivity, while higher-level protocols like HTTP implement the applications. By understanding more about how TCP/IP works, a Web developer can tune each page for maximum performance.

Understanding TCP/IP

TCP/IP is a layered architecture. This means that the protocols can be thought of as independent layers (see Fig. 4.1). Any given layer can be changed without changing the protocols in the other layers.

Figure 4.1: TCP/IP is a layered architecture-each layer uses the layers below to talk to its peer.

In practice, this means that the Internet as a whole is speaking TCP, IP, and related protocols, regardless of whether the lower layers are ethernet over coaxial cable or PPP over dial-up lines, and whether the application is FTP, Telnet, or HTTP.

What Is TCP/IP?

Networks often are considered to have as many as seven layers. Of those, the TCP/IP protocols present a standard for four: the Link layer, the Network layer, the Transport layer, and the Application layer.

The Link layer consists of the hardware and device drivers necessary to put the computer on the network. Standards here include ethernet, token ring, SLIP, and PPP.

The Network layer protocols define how packets of data move around the network. Three protocols comprise the defining standard for the Internet: IP (Internet Protocol), ICMP (Internet Control Message Protocol), and IGMP (Internet Group Management Protocol).

The Transport layer is responsible for end-to-end connectivity. In the TCP/IP family of protocols, there are two very different transport mechanisms: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).

Suppose that you log into your local service provider, abc.net, to access the Web site at xyz.com. The Network layer moves the data hop by hop: from your desktop computer to your ISP's machine, and from the abc.net machine across the Internet to xyz.com. The Transport protocols that manage the end-to-end conversation are TCP and UDP.

UDP is a simple service that sends a set of packets (called datagrams) from one machine to another. Because HTTP is a protocol that only has two steps (request and response), you might think that UDP is a natural choice-you would be wrong. UDP does not guarantee delivery of a message; it's known as an unreliable service.

Instead, designers choose TCP to transport HTTP. TCP sets up a long-term, reliable connection. One way to think of TCP versus UDP is to compare them to a telephone call and a telegram, respectively. If you send a telegram, you have no way to know if it got to the intended recipient. Moreover, it's a one-shot message-if you want to send another telegram immediately after the other one, you have to fill out a new form specifying the addressee, phone number, and so on. TCP is more like a phone call-you dial, set up the connection, talk to the other party, and then hang up. If, during the call, the connection breaks or the other party goes away, you know it almost instantly.

Finally, the Application layer provides the services we most often associate with the Internet, including FTP, Telnet, e-mail, and the Web's own HTTP.

How Do TCP Connections Work?

To ensure reliability, TCP breaks a message down into chunks called segments. When a segment is sent, TCP starts a timer. If the timer runs out before the distant host acknowledges receipt of the segment, TCP retransmits the segment.

When TCP receives a segment, it looks at a checksum in the segment to be sure that the segment arrived uncorrupted. If the segment is okay, TCP sends an acknowledgment. If the segment is bad, TCP discards it and does not acknowledge it.

The acknowledgment often is delayed up to 200 ms (milliseconds)-if the host sending the acknowledgment has any data of its own going back to the other host, it can piggyback the acknowledgment onto a data segment.

As valid segments are received, TCP puts them in order and discards duplicates.

Finally, each host maintains flow control, so that a fast host does not use up all the buffers on a slower host. TCP flow control is based on a sliding window protocol that will be discussed in a moment.

To set up a connection, TCP/IP exchanges three segments as described in Table 4.5.

Table 4.5  TCP Connection Sequence

Sender   Receiver   Message
Client   Server     Let's synchronize sequence numbers. Here are my sequence number and the maximum segment size I can accept.
Server   Client     Acknowledged. Here are my sequence number and window size.
Client   Server     Your last message is acknowledged.

These three segments are often called the three-way handshake. While it seems tedious, this handshake is an established way to ensure reliability on an otherwise unreliable network.

In the first segment, the client announces the Maximum Segment Size (MSS) that it will accept. The default value is 536 bytes. The MSS mechanism is used to help avoid fragmenting TCP segments across multiple IP datagrams.

In the second segment, the server advertises its window size. This window is the one used in the sliding window protocol mentioned above. Here's how sliding windows work. Suppose that the sender, a fast Sparc server, has 8,192 bytes to send to the receiver, a slow desktop computer, and that the receiver has advertised a window of 4,096 bytes and an MSS of 1,024 bytes. The sequence can run as shown in Table 4.6.

Table 4.6  TCP Sliding Window Sequence

Sender   Receiver   Message
Server   Client     Here are 1,024 bytes of data.
Server   Client     Here are 1,024 bytes of data.
Server   Client     Here are 1,024 bytes of data.
Server   Client     Here are 1,024 bytes of data.
Client   Server     Acknowledged. My window is 0.
Client   Server     Window update. My window is now 4,096.
Server   Client     Here are 1,024 bytes of data.
Server   Client     Here are 1,024 bytes of data.
Server   Client     Here are 1,024 bytes of data.
Server   Client     Here are the last 1,024 bytes of data.
Client   Server     Acknowledged. My window is 0.
Client   Server     Acknowledged. My window is 4,096.

The sender keeps track of the receiver's window. When segments that have been sent are acknowledged, the window closes. When the receiver processes data and frees up the TCP receive buffers, the window opens.

Notice that after the fourth send, the client acknowledges the data, and says that it has no window left because it's still processing the data. The server stops sending, because it knows that the window is full. When the client provides another window update to say that the window is available, the server knows that the client has finished processing the first batch of data and is ready for more.

After another four sends, the server stops sending, because it's finished. The client acknowledges the new data, and says that it has no window left because it's still processing the data. This time, when the client announces that the full window is available, the client and server proceed to tear down the connection.

To tear down a connection, TCP/IP exchanges four segments as described in Table 4.7. Each direction of the connection is shut down separately; this sequence is called a half-close. Either end can initiate the close-in this example, it's initiated by the client.

Table 4.7  TCP Disconnect Sequence

Sender   Receiver   Message
Client   Server     I'm finished sending data.
Server   Client     I acknowledge that you're finished.
Server   Client     I'm finished sending data.
Client   Server     I acknowledge that you're finished.

Why TCP/IP Was a Poor Choice for the Web

From the previous description, it's clear that TCP performance is largely governed by window size. Many UNIX systems default to 4,096-byte receive buffers. Some newer versions of UNIX advertise 8,192- or even 16,384-byte buffers. The sender is also constrained by a TCP algorithm called slow start, whose effect depends upon the bandwidth and the RTT.

Suppose that the receiver advertises a window size of 4,096 bytes. Can the sender assume that there is room for 4,096 bytes, and start sending? If both machines are on the same LAN, the answer is yes. On the Internet, however, there are likely to be routers and slower links between the sender and the receiver. One of these intermediate data handlers could easily overload without the sender knowing that there's a problem.

To avoid this predicament, TCP designers invented slow start. With slow start, another window, the congestion window (cwnd), is added to the protocol. When the connection is set up, cwnd is set to one segment. Each time a segment is acknowledged, cwnd is increased by the number of bytes in one segment. The sender's limit is the lesser of the number of bytes in the advertised window and the number of bytes in cwnd. Table 4.8 shows how Table 4.6 would look with slow start.

Table 4.8  TCP Sliding Window Sequence with Slow Start

Sender   Receiver   Message
Server   Client     Here are 1,024 bytes of data.
Client   Server     Acknowledged. My window is 3,072.
Server   Client     Here are 1,024 bytes of data.
Server   Client     Here are 1,024 bytes of data.
Client   Server     Acknowledged. My window is 3,072.
Server   Client     Here are 1,024 bytes of data.
Server   Client     Here are 1,024 bytes of data.
Server   Client     Here are 1,024 bytes of data.
Client   Server     Acknowledged. My window is 2,048.
Server   Client     Here are 1,024 bytes of data.
Server   Client     Here are the last 1,024 bytes of data.
Client   Server     Acknowledged. My window is 0.
Client   Server     Acknowledged. My window is 4,096.

After the first segment, the server cannot send another segment, because cwnd equals 1,024 at the outset of this exchange.

When the client announces that the window is available, the server sends two segments, but cannot send another segment, because cwnd equals 2,048.

The next time the client announces that the window is available, the server sends three segments, because cwnd equals 3,072.

The next time the client announces that the window is available, the server only sends two segments, because that completes the data that needs to be sent. The client acknowledges this data, but says its window size is zero because it's still processing the data. Finally, the client announces that its window is available again, and the client and server proceed to tear down the connection.

Even if the portion of the Internet between the two machines has plenty of capacity, the exchange is limited initially by the RTT between the two ends. If the sender sends a segment, the ACK comes back after a delay of up to 200 ms (while the receiver waits for a chance to piggyback) plus the RTT. During the first few segments, RTT is the dominant factor in determining throughput.

For long-term connections like FTP and Telnet, slow start doesn't matter much, because within a few segments the sender computes a realistic cwnd. If only a few segments are sent before the connection is torn down, however, the sender and receiver are constantly in slow start, recomputing cwnd.
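
The effect on a short connection is easy to estimate. The sketch below models only the simplified rules used in the tables above (one acknowledgment per burst, cwnd opening by one segment per acknowledgment, no delayed-ACK timing or header bytes); it reproduces the pattern in the walkthrough that follows:

def data_bursts(size, mss=1024, window=4096):
    """Count the bursts of data needed to move `size` bytes under slow start."""
    segments_left = -(-size // mss)          # ceiling division
    cwnd = mss                               # congestion window starts at one segment
    bursts = 0
    while segments_left > 0:
        allowed = min(cwnd, window) // mss   # lesser of cwnd and the advertised window
        segments_left -= min(allowed, segments_left)
        bursts += 1
        cwnd += mss                          # each acknowledgment opens cwnd by one segment
    return bursts

for name, size in [("thePage.html", 2782), ("background.gif", 1542),
                   ("prev.gif", 564), ("logo.gif", 20715)]:
    print("%-16s needs %d bursts of data" % (name, data_bursts(size)))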

Now, let's put the whole sequence together. The following description shows what happens when our example page from Table 4.3 is requested. Assume that the MSS is 1,024 bytes.

First, the client sends GET thePage.html (size=2,782).

  1. Three segments are exchanged to set up the connection.
  2. The cwnd is set to 1,024 (one segment).
  3. The receiver advertises a window of 4,096.
  4. One segment of data is sent (1,024 bytes) because cwnd=1024 (one segment).
  5. The receiver ACKs first segment.
  6. Two segments of data are sent because cwnd=2,048 (two segments); all data has been sent.
  7. Four segments are exchanged to tear down connection.

Next, the client sends GET background.gif (1,542 bytes).

  1. Three segments are exchanged to set up the connection.
  2. The cwnd is set to 1,024 (one segment).
  3. The receiver advertises window of 4,096.
  4. One segment of data is sent (1,024 bytes) because cwnd=1,024 (one segment).
  5. The receiver ACKs first segment.
  6. One segment of data is sent because cwnd=2,048 (two segments); all data has been sent.
  7. Four segments are exchanged to tear down the connection.

Now the client sends GET prev.gif (564 bytes).

  1. Three segments are exchanged to set up the connection.
  2. The cwnd is set to 1,024 (one segment).
  3. The receiver advertises window=4,096.
  4. One segment of data is sent (564 bytes+response header); all data has been sent.
  5. Four segments are exchanged to tear down the connection.

The next two buttons are loaded just as prev.gif was, so we won't describe that. After those two buttons are loaded, GET logo.gif (20,715 bytes) takes place:

  1. Three segments are exchanged to set up connection.
  2. The cwnd is set to 1,024 (one segment).
  3. The receiver advertises window=4,096.
  4. One segment of data is sent (1,024 bytes) because cwnd=1,024 (one segment).
  5. The receiver ACKs first segment.
  6. Two segments of data are sent because cwnd=2,048 (two segments).
  7. The receiver ACKs next two segments.
  8. Three segments of data are sent because cwnd=3,072 (three segments).
  9. The receiver ACKs next three segments.
  10. Four segments of data are sent because cwnd=4,096 (four segments) and window=4,096.
  11. The receiver ACKs next four segments.
  12. Four segments of data are sent because window and cwnd haven't changed.
  13. The receiver ACKs next four segments.
  14. Four segments of data are sent because window and cwnd haven't changed.
  15. The receiver ACKs next four segments.
  16. Three segments of data are sent because window and cwnd haven't changed, but only three segments are left.
  17. The receiver ACKs next three segments; all data has been sent.
  18. Four segments are exchanged to tear down the connection.

The incidental graphic downloads in a manner similar to the logo.

Even when sending the logo-the most efficient of these file transfers-20 percent of the segments dealt with opening and closing the connection, and three of the seven exchanges were limited by slow start. The buttons, background GIF, and text were all sent entirely under slow start.

Simon E. Spero at the University of North Carolina (UNC) has empirically demonstrated the impact of slow start and other elements of the TCP protocol on HTTP. He timed an actual download from the server at NCSA to a client at UNC, and found that, in this test, HTTP spent more time waiting than transferring data. Table 4.9 summarizes his results.

Table 4.9  Empirical Impact of TCP on HTTP

Activity             Experimental Time
Total transaction    530 ms
Opening connection   350 ms (includes first piggyback ACK)
Slow start           70 ms
Theoretical speed    25 ms

The theoretical speed in this table assumes fetching ten documents of 1,668 bytes each, with a long-lived connection, and with all entities downloaded on the same connection (150 ms transfer, 70 ms latency, and 30 ms processing time, divided by 10 documents).

Spero's paper on the subject is available at http://sunsite.unc.edu/mdma-release/http-prob.html.

Measuring TCP/IP

While it may be a few years before an improved HTTP is fielded, today there are several TCP/IP-level tools that Webmasters can use to determine how well their sites are performing.

ping

The TCP/IP utility ping sends a series of packets (a ping) from one machine to another and shows the RTT. To run ping on a UNIX machine, enter the following:

/etc/ping host

With most variants of UNIX, this command produces a ping once per second. If it does not, check the man page for the switch to put ping into continuous mode.

You should see all packets being returned. If no packets are returned, there's a network problem. If most packets are returned but one is dropped occasionally, there's an intermittent problem.

Take note of the RTT-anything below 100 ms is good, and lower is better. If you have a choice between several ISPs, telnet to a site across the country from each (or find a friend who lives far away) and ping the site at various times of day. There can be considerable variance in RTT between servers that are within even a few miles of each other.

netstat

The discussion about TCP mentioned that TCP acknowledges packets that have been received correctly. A lost or corrupted packet causes the sender to time out and send the packet again. Thus, network integrity problems directly cause performance degradation. To check the quality of a machine's network interface, run the following command:

netstat -i

The output lists each of the interfaces to the Net. One of the column headings is Ierrs. Find the row(s) describing your machine's link to the Net-use the Net/Dest and Address columns to help pinpoint the correct row(s). The Ierrs column for that row should be below 0.025 percent of the value in the column Ipkts. If it isn't, report the problem to your technical support staff. Now, look at the Oerrs column. Likewise, it should be below 0.025 percent of the value in the column Opkts. These numbers represent the total input and output errors since the system was last booted. If Ipkts and Opkts are low, wait a day or so and check again.

Be sure to also watch the number of collisions listed; collisions are indicative of network congestion. While this figure is mainly relevant to LANs, machines within a site are often connected by a LAN. A congested LAN can spell performance problems. If the number of collisions is consistently close to or greater than 10 percent of the number of output packets, consider redesigning the LAN to break up traffic patterns.
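
These checks can be scripted. The sketch below runs netstat -i and applies the thresholds just described; the column names (Ipkts, Ierrs, Opkts, Oerrs, Collis) and their positions vary between UNIX flavors, so treat it as a starting point rather than a portable tool:

import subprocess

output = subprocess.run(["netstat", "-i"], capture_output=True, text=True).stdout
lines = output.splitlines()
header = lines[0].split()

def col(row, name):
    # Look up a numeric column by its header name; the names above are
    # assumptions and differ from system to system.
    return float(row[header.index(name)])

for line in lines[1:]:
    row = line.split()
    if len(row) < len(header):
        continue
    try:
        ipkts, ierrs = col(row, "Ipkts"), col(row, "Ierrs")
        opkts, oerrs = col(row, "Opkts"), col(row, "Oerrs")
        collis = col(row, "Collis")
    except (ValueError, IndexError):
        continue
    if ipkts and ierrs / ipkts > 0.00025:      # 0.025 percent
        print(row[0], "input error rate is too high")
    if opkts and oerrs / opkts > 0.00025:      # 0.025 percent
        print(row[0], "output error rate is too high")
    if opkts and collis / opkts > 0.10:        # 10 percent
        print(row[0], "collision rate suggests a congested LAN")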

You also can use netstat to check the gateway leading out of your LAN and onto the Net. Enter the following:

netstat -s | grep "checksum"
netstat -s | grep "total packets"

The Bad Checksums figures should be below 0.01 percent of the Total Packets Received line. If not, ask your technical support staff to check out the gateways with a network analyzer.

spray

Network problems can be intermittent. To increase the likelihood of finding an error, use spray. Enter the following:

/etc/spray [host]

This sends approximately 100,000 bytes (1,162 packets of 86 bytes each) to the given host. spray reports the percentage of packets dropped by the other host. Figures of five percent or less are acceptable. If the numbers go much above five percent, then you know that your computer can send to the other host at a fast enough rate to fill up the channel connecting the two machines. The other machine might be very slow-or under an unusually heavy load-or the network connecting the machines might have a bottleneck.

Here's how to get a better idea of why packets are being dropped. Suppose that your machine is named mickey and the other host is named minnie. Log onto the distant system, minnie, and enter the following:

netstat -s | grep "socket"

You should get a line that talks about "socket full drops" or "socket buffer overflow" or something similar. Note the number-this is the number of times minnie has dropped a UDP packet because the server was too busy or didn't have room in the buffer.

Now, log onto your local system, mickey, and run spray at minnie; note the number of dropped packets.

Back on minnie, run netstat again. Compare the number of socket full drops (or socket buffer overflow) with the number of dropped packets that spray reported. If they're the same, then the problem is that minnie is overloaded. If the number of socket full drops is smaller than the number of dropped packets reported by spray, then packets are being lost or corrupted between you and the distant host. Use a network analyzer to trace these packets and find out what's happening to them. (Some new versions of UNIX have network analysis tools built into them as software.)

If you find indications that a server is overloaded, use standard UNIX tools like sar and vmstat to find out where the load is coming from, and then add resources or redistribute work to try to reduce the load. Occasional heavy loads are to be expected, but if the system is overloaded day in and day out, it's time to spend some money and alleviate the problem.

traceroute

Another UNIX utility that can shed light on a server's performance is traceroute. In its simplest form, you can run traceroute this way:

traceroute [host]

This utility lists several lines showing which machines were used as gateways as packets moved from your machine to the distant host. Suppose that you are logged into mickey and enter the following:

traceroute donald.com

The output might look like this:

traceroute to donald.com (--some IP address--), 30 hops max, 40 byte packets
 1  minnie.com (--some IP address--)  20 ms  10 ms  10 ms
 2  pluto.com (--some IP address--)  120 ms  120 ms  120 ms
 3  donald.com (--some IP address--)  150 ms  140 ms  150 ms

For each host in the path, traceroute sends three datagrams and records the RTT for each. If no reply to a datagram arrives within five seconds, traceroute prints an asterisk for that probe and moves on.

When examining traceroute results, look for any unusually large steps in RTT from one line to the next. In the example above, the hop from minnie.com to pluto.com averaged over 100 ms. While something might occasionally happen to slow down a datagram, consistently slow behavior is a sign that something is wrong.

The last line of the traceroute results generally should be consistent with the RTT in ping. If it takes unusually long to ping a host, consider running traceroute to get an idea of where the delay is being inserted.

A few minutes playing with traceroute should show you why sites with heavy international traffic are well advised to consider overseas mirror sites. Also, remember the lesson of our TCP/IP analysis-RTT and bandwidth play a major role in overall site performance.

How Caches Distort Timing Tests

At some point while evaluating site performance, you might encounter a page that comes back much faster than you expect. There's a good likelihood that this effect is caused by the cache on a proxy server somewhere in the path. For example, let's say that a user on America Online accesses the first page of Yahoo at 9:52. The page does not exist (or has expired) on the proxy server, so the proxy server does an HTTP GET, stores the page in its cache, and forwards a copy to the requesting user.

At 10:12, another AOL user asks for the same page and happens to get the same proxy server. The proxy server determines that the page is available in the cache and is current, so the proxy server sends the page to the user from the cache. Because the user has a direct phone link to AOL, the user avoids the intermediate links of the Internet and (in theory) gets the page more quickly.

In practice, the online servers have so many proxy servers that the likelihood of the page of a lightly trafficked site being in the cache is small. Nevertheless, it can happen when you least expect it. AOL reports that any page in its cache that's more than two hours old is re-retrieved from the origin server. There are stories circulating on the Net, however, of the page that would not die-a page that got into a cache and would not flush, even after the origin server had long since updated the page.

Enhancing the Sample Site

In the next release of the Realtor's site, the developer added HEIGHT and WIDTH attributes to the <IMG> tags, used some LOWSRC, and converted the rest of the GIFs to interlaced format. The developer asked a friend who lived on the other side of the country to ping and traceroute the site, and learned that the server was typically responding in 70 to 100 milliseconds. But like many sites, the download time was still dominated by large graphics.

This chapter addressed site performance by looking at where the time goes. We examined HTTP, and saw that many features of TCP/IP make it a poor choice for underlying transport of the Web protocol. Within those constraints, however, you saw that some things can be done to speed up a site, and more things can be done to make the site "seem" faster.

In the next chapter, "Designing Graphics for the Web," we focus on decreasing the size and download time of the one element that usually has the greatest impact on performance-graphics.