A Web server is the server software behind the World Wide Web. It listens for requests from a client, such as a browser like Netscape or Microsoft's Internet Explorer. When it gets one, it processes that request and returns some data. This data usually takes the form of a formatted page with text and graphics. The browser then renders this data to the best of its ability and presents it to the user. Web servers are, in concept, very simple programs. They wait for requests and fulfill them when received.
Web servers communicate with browsers or other clients using the Hypertext Transfer Protocol (HTTP), which is a simple protocol that standardizes the way requests are sent and processed. This allows a variety of clients to communicate with any vendor's server without compatibility problems.
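To make the request and response cycle concrete, here is a minimal sketch in Python that uses only the standard library. It is not how Apache is implemented; it only illustrates a server that waits for an HTTP request and returns an HTML page, and a client that asks for one. The handler name, the page contents, and the /index.html path are assumptions made up for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page contents, invented for this illustration.
PAGE = b"<html><body><h1>Hello from a tiny Web server</h1></body></html>"


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with the same small HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # suppress the default per-request log line


def fetch_once():
    """Start a server on a free loopback port, request one page, stop."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # The client side: send a GET request and read the reply.
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/index.html")
        response = conn.getresponse()
        body = response.read()
        content_type = response.getheader("Content-Type")
        conn.close()
        return response.status, content_type, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, content_type, body = fetch_once()
    print(status, content_type, len(body), "bytes")
```

Because both sides speak HTTP, the client needs to know nothing about the server beyond its address; any other HTTP client could request the same page.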
Most of the documents requested are formatted using Hypertext Markup Language (HTML). HTML is a simple application of a much larger markup language called Standard Generalized Markup Language (SGML), which is in wide use by many organizations and the U.S. Government.
HTML is the lifeblood of the Web. It is a simple markup language used for formatting text. Browsers interpret the markup information and render its intent to the best of their abilities. More importantly, HTML allows linking to different documents and resources; this is the hypertext portion of the Web.
Hypertext allows a user to refer to other documents stored in the same computer or in a computer located in a different part of the world. It allows information to be almost three-dimensional. Not only can you read sequentially, but you can also jump elsewhere for more.
The information retrieval process is completely transparent to the user; it's easy and free-form. You navigate through this sea of information in an ad hoc way. While the results and implications of this new learning process are yet to be seen, it sure is powerful. It provides a seamless exploration experience of documents and services. This is what the Web is all about. It allows you to gather information easily and presents it in a way that is easy to digest. It's graphic, and it can combine sound and moving pictures.
You can learn more and find related issues that spark your interest; it's interactive. Instead of paging through a book (still a good thing), you can use the computer to remove much of the legwork associated with retrieving related information, and it allows you to explore the material that fits your needs or mood. It's like TV, but you get to choose the programming.
The Web server is responsible for fetching you this information. While Web servers may have been simple at some point, they are not anymore. Not all Web servers are created equal.
If all of a sudden you were asked to set up a Web site, you would be confronted with a variety of issues that you would need to resolve before you code your first HTML page. The most important issue, and probably the reason why you bought this book, is deciding which server software to use. Given the myriad of Web servers available, the choice is undoubtedly difficult.
The following is a list of issues that you should use to evaluate any piece of software before you commit work to it:
Commercial versus freeware
Ease of installation
Ease of configuration
Ease of extending or customizing some aspect of the server
Size of the installed base
Performance and resource consumption
Secure transaction support
Source code availability
Apache is freeware; therefore, you invariably have to confront the issue of free software. Is free software really cheap? After all, who do you go to if you have problems? What are the odds of the product not being supported? Free software usually has sparse documentation and no direct technical support. However, there are a few software packages out there that are superbly documented, supported, and maintained. In my view, Apache belongs to that list.
If you have a little understanding of UNIX, know how to use a UNIX editor such as vi, pico, or emacs, and are not afraid of the shell, Apache is a great choice. It is easy to install and configure, even for people who are not too savvy in the UNIX ways. The software is available in two forms: precompiled and source.
Precompiled binaries are available for
There's also an OS/2 version. Rumor has it that a Windows NT version of Apache is also in the works. Given the growing popularity of both Apache and Windows NT in corporate networks, such a port would help to capture an even bigger segment of the Web server market.
If a precompiled binary is not available for your OS/hardware combination and you have a UNIX compiler such as GNU's GCC, there are configuration directives for almost every UNIX variant imaginable. All you need to do is set up a few options, type make, and your software will be built. The next chapter walks you through the process of compiling and installing your Apache server.
Installing a Web server, be it through a graphical user interface (GUI)-based Netscape product or Apache, is simple. The GUI tools may give more reassurance to people who are afraid of command line interfaces, but in general, configuration and installation are fairly straightforward in either version. If you are not sure what a configuration value is, more than likely you will be stumped in either presentation. This is where documentation, such as this book, will help you. This book doesn't assume that you know much about UNIX or anything else. However, it will give you enough background to get going and will enlighten you about some of the possibilities. I hope your curiosity awakens.
Configuring Apache is very easy to do. Apache uses three configuration files, all of which are already preset to safe default behaviors. You just need to specify a few file locations and name your server so that Apache can find its configuration files and the location of the document tree it is serving. To do this, all you need is a good UNIX text editor that you feel comfortable with. Chances are your version of UNIX has some sort of graphical editor you already feel comfortable with. Chapter 2, "Installing and Configuring an Apache Server," explains all the basic configuration steps you need to take to get your Apache server running fast.
If you are customizing some aspect of the server, you'll love Apache. Its source code is clearly written but is a little thin on the documentation side. The Apache server is implemented as a set of software modules, so creating a new module that modifies the behavior of the server in some way requires you to learn less than you otherwise would. There's a growing list of freely available third-party modules. More than likely you can find a module that implements the functionality you need.
The Apache server implements as many features as the equivalent commercial Web server, if not more, and basically every aspect of the Apache server's functionality is configurable. This makes it easy to get the server to behave the way you want or need for your site.
Some of the most important features include the following:
At the time of this writing, Apache was the leader in Web server installed base. According to the August 1996 Netcraft survey (http://www.netcraft.co.uk/Survey/Reports/), Apache brand Web servers comprised 35.68 percent (37.16 percent if you add all the secure versions of the server) of the Web servers in their survey. That doesn't seem like much until you realize that the competition is far behind, and the entire Web server market is dominated by three products:
Netscape Communications 7.25%
Netscape Commerce 6.83%
Microsoft Internet Information Server 5.49%
Apache SSL US 1.43%
None of the remaining 424 server brands reached even one percent of the market! The Netcraft survey currently includes 342,081 servers. Check the details of the survey for yourself on Netcraft's Web site because this information changes monthly.
The Apache server was originally based on the NCSA code. Its development was started because of some security problems associated with the NCSA server and concerns for having a UNIX Web server that implemented some special features. Also driving the effort was a concern to somehow guarantee that a free UNIX server would always be available to the UNIX community. The Apache group was then formed, which developed a series of patches for the NCSA server; hence the name, Apache, from A PAtCHy server.
The Apache server has undergone several releases since its inception. There have been 32 different releases, including five beta releases for version 1.1.0; the current version at the time of this writing is 1.1.1. However, by the time this book is printed, version 1.2 should be available. The Apache team develops new revisions faster than I can document. Bug fixes are addressed almost immediately. More importantly, advisories from the Computer Emergency Response Team (CERT) Coordination Center and other parties are acted on immediately to ensure that no security problems are present in the server.
In terms of total code lines, the server has more than doubled from approximately 10,000 lines in its initial NCSA-fixed version, to over 25,000 in its current released incarnation (these counts include comment lines). In its current form, it bears little resemblance to the NCSA server; it's a whole new animal.
While this track record is not an indication of anything but past performance, the development patterns are there. The main reason for Apache is to ensure that a freely available UNIX Web server is always available to the UNIX community. Apache was created so that there would be an HTTP server that functioned the way some WWW providers thought it ought to work. After all, many of the Apache programmers have real jobs involved with providing WWW services or Internet connectivity.
The Apache server offers better performance than the NCSA server. It has a preforking model that manages a configurable number of child processes. A preforking model is one where server processes are already running and waiting to answer requests. Many servers start a process only when they receive a request. Once the request is fulfilled, the server process dies, requiring this cycle to start all over with the next request.
Under UNIX, forking (launching) a process can be very expensive. It takes time that a busy server doesn't have and creates an impact on the system that further aggravates performance. By preallocating a number of servers to wait for requests, Apache can respond more quickly and efficiently. By reusing already available servers to respond to new requests, Apache is able to gain additional performance because it spends more time serving requests than setting up and cleaning up after itself.
Using a conservative approach, the number of child processes changes dynamically. As server demands change, the number of processes is adjusted accordingly. Child processes are reused for a certain number of connections, after which the parent server process kills them and recycles those resources with a new process.
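The preforking idea can be sketched in a few lines of Python on a UNIX system. This is an illustration of the general technique, not Apache's code: the function names (start_pool, stop_pool, child_loop, ask) are invented for this example, the pool size is fixed rather than adjusted dynamically, and children are never recycled. The parent opens the listening socket once and forks its children up front; every request is then answered by a process that already exists.

```python
import os
import signal
import socket


def child_loop(listener):
    """Runs inside each preforked child: accept, answer, repeat."""
    while True:
        conn, _ = listener.accept()
        with conn:
            conn.recv(1024)  # read (and ignore) the request
            # Reply with this child's process ID so the caller can see
            # which process handled the connection.
            conn.sendall(str(os.getpid()).encode())


def start_pool(num_children=3):
    """Open one listening socket, then fork the children before any
    request arrives. Returns the socket and the child process IDs."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the system pick one
    listener.listen(16)
    pids = []
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:
            # Child process: serve until the parent terminates us.
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            try:
                child_loop(listener)
            finally:
                os._exit(0)
        pids.append(pid)
    return listener, pids


def stop_pool(listener, pids):
    """Terminate and reap every child, then close the socket."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
    listener.close()


def ask(port):
    """Connect once and return the process ID of the child that answered."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        chunks = []
        while True:
            data = client.recv(64)
            if not data:
                break
            chunks.append(data)
    return int(b"".join(chunks).decode())


if __name__ == "__main__":
    listener, pids = start_pool(3)
    try:
        port = listener.getsockname()[1]
        print("children:", pids)
        print("answered by:", [ask(port) for _ in range(6)])
    finally:
        stop_pool(listener, pids)
```

No fork happens between a request arriving and its being answered; the cost of creating the processes was paid once, at startup. A fuller version would also watch the load and add, remove, or replace children, as described above.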
Apache also implements a new HTTP 1.1 draft feature: Keep-Alive persistent connections. (Apache 1.2 is fully HTTP/1.1 compliant.) Keep-Alive allows a single Transmission Control Protocol (TCP) connection to serve several requests from a client. The original model opened a new connection for each request, incurring the overhead of opening and closing a TCP connection with the server for each element in a page. If an HTML page referenced five images, a total of six different requests would be generated, each over its own connection. Under the new persistent connection scheme, the six elements are transmitted through the same TCP connection. This approach increases performance on pages that contain multiple images by as much as 50 percent. However, this feature only works with browsers that support it (currently Netscape Navigator, Spyglass, and Microsoft's Internet Explorer). Also, Keep-Alive connections only work if the size of a file is known beforehand, meaning that output from a CGI program doesn't benefit from the new connection model.
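The page with five images can be replayed as a small experiment. The following Python sketch uses the standard library's HTTP server and client rather than Apache, and the helper names (make_handler, count_connections) and the file paths are assumptions invented for the example. It fetches the same six elements twice, once closing the connection after every request and once over a persistent connection, and counts how many TCP connections the server had to accept. Note that the server sends a Content-Length header, which reflects the requirement that the size of each reply be known beforehand.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"x" * 100  # stand-in for a page or an image


def make_handler(connections):
    """Build a handler class that records every accepted connection."""

    class CountingHandler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # permits persistent connections

        def setup(self):
            super().setup()
            # setup() runs once per TCP connection, not once per request.
            connections.append(self.client_address)

        def do_GET(self):
            self.send_response(200)
            # The reply size must be announced for the connection to be
            # reusable after this response.
            self.send_header("Content-Length", str(len(BODY)))
            self.end_headers()
            self.wfile.write(BODY)

        def log_message(self, format, *args):
            pass  # suppress the default per-request log line

    return CountingHandler


def count_connections(paths, persistent):
    """Fetch every path and return how many connections the server saw."""
    connections = []
    server = HTTPServer(("127.0.0.1", 0), make_handler(connections))
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = None
        for path in paths:
            if conn is None:
                conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            headers = {} if persistent else {"Connection": "close"}
            conn.request("GET", path, headers=headers)
            conn.getresponse().read()
            if not persistent:
                conn.close()  # old model: one connection per element
                conn = None
        if conn is not None:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(connections)


if __name__ == "__main__":
    # One HTML page plus the five images it references.
    paths = ["/index.html"] + [f"/image{i}.gif" for i in range(1, 6)]
    print("one connection per request:", count_connections(paths, False))
    print("persistent connection:", count_connections(paths, True))
```

The first run costs the server six connection setups and teardowns; the second costs one. The sketch counts connections only and makes no attempt to measure the speedup itself.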
The standard Apache release does not provide secure transaction support. However, there are products based on the Apache source code that do (Apache-SSL and Apache-SSL-US). Both of these servers are fully compatible with Netscape's encryption mechanism, and rumor has it that these two products are the reason why the Netscape product offerings were drastically reduced in price.
Following UNIX's tradition of source code availability, Apache's source code is distributed with every copy at no extra charge, making it possible to compile your own customized version easily. This allows you to compile a version of Apache for a less popular UNIX box. Source code availability invariably plays an important role in UNIX software acceptance. On sites where stringent security is important, system administrators have the ability to examine the source code line by line, thus ensuring no undesirable effects or hidden security risks are present.
Great technical support for Apache is available from Usenet at comp.infosystems.www.servers.unix. Many members of the Apache team answer questions there regularly.
To search for questions that may have been already asked, surf to http://www.dejanews.com. Dejanews is a useful Usenet database. It archives all Usenet newsgroups and has a fast search engine that allows you to find the information you need in seconds. If your question has been asked before, it will be there.
Commercial support for Apache is available through a number of third-party organizations. One of the most popular is available from http://www.ukweb.com. However, it is expensive (around $1,500 per year).
Because of Apache's modular design, many third-party developers have programmed modules to meet their specific needs, such as custom authentication modules, embedded Perl interpreters, SSI scripting extensions, and so on. Chapter 7, "Third-Party Modules," covers many of the more important ones. Chances are good that something you need has already been developed by someone else. If the module you need was not included with the Apache release, you can probably find what you need at
Some of the extensions and modules you'll find there include support for the following:
Support for setting the UID and GID of executing CGI scripts
User authentication using UNIX system account databases
Enhanced Server Side Include (SSI) modules
Embedded Perl interpreter
Character set translators
Postgres95 and mSQL database user authentication modules
Web page counters
Extended log formats
URI/URL rewriting modules
Tcl scripting processor
CGI module alternatives
The most important reason to use Apache should be subjective. It should be based on how Apache fits you. Software evaluation is like test-driving a car. You wouldn't buy a car just by looking at the brochure. To feel how you like the server, you need to test-drive it.
Luckily, most of the Web servers out there have some sort of demo version available. Unless you are going to customize some portion of the Apache server, such as writing a custom module, your commitment is small. Should a better product for UNIX appear, your HTML documents would migrate effortlessly. So your risk is minimal. To avoid a crapshoot, test-drive the software and deploy on the basis of your experiences with it. Experience is your best guide.
While it is impossible to determine in a few hours if a solution will be the right one for you, the Internet is a close-knit community of software junkies. Make use of the collective knowledge. Ask questions on Usenet and read the frequently asked questions (FAQ).
Apache is the choice for a server at accessLINK, Inc. They were NCSA users in the beginning, but a few features won them over and they have liked Apache ever since. For the sake of credibility, I should state that I don't have any association with the Apache group, save that I use their software and like it very much. The reasons why I use Apache are the subject of this book. Apache provides many features that make it a very powerful Web server. Its continued development has put pressure on even the high-powered brands that are currently available, which were complacent until recently.