In our poster we present two functional examples of highly portable, minimalistic web server applications. One is written in ANSI C, while the other is simply a Bourne Again Shell script. These simple programs provide a surprisingly large variety of features and services, including all of those that can be regarded as fundamental. They may serve as a counterpoint to the current trend towards extensible but highly resource-consuming server software.
C.2.2 [Network Protocols]: Applications
H.4.3 [Communications Applications]: Miscellaneous
Keywords: web server, portability, performance, reliability
It has become apparent in the last few years that the World Wide Web
has become the most frequently used application of the Internet. It
has overtaken the traditional functionality of ftp, gopher, USENET
news, etc. Moreover, many e-mail users use SMTP indirectly through
a WWW front-end. It would therefore be hard to overemphasize the
importance of the questions and problems surrounding the
development of the underlying server-side software.
Both the HTTP standards and World
Wide Web servers have undergone a lot of development and
improvement since the appearance of the CERN HTTP Daemon. Regarding
the protocols, MIME, support for keep-alive requests, server-side
cache control, and ETags may be listed as some of the key steps in
this development.
Meanwhile, named virtual host support, auto-indexing of the
filesystem, proxy support, encrypted communication, and support for
several scripting languages have become natural requirements
for up-to-date server software.
Examining this development of
WWW server software more closely, however, one finds that the servers
have become rather complicated, and in many cases their operation
can be regarded as byzantine. When installing new
functionality on a working web server, one can easily damage its
existing functionality. This may be the consequence of
sophisticated dependencies and a less rigorous hierarchy. A typical
Apache server loads about forty modules for its
operation.
The question naturally arises: is this growing
complexity due to purposeful development, or is it simply an ad
hoc side effect of increasingly complex, and even
conflicting, requirements? One may claim that the situation is
similar to that of the operating systems of the third generation
(integrated circuits and multiprogramming) in the 1970s.
The appearance of universal operating systems induced a demand
for a huge number of applications. This was beyond the scope of the
technical level and programming methodology of the age, which led
to practically useless operating systems with a huge number of
intractable bugs. Only the development of system design concepts
and the underlying methodology led to a solution.
The main aim of our
poster is to draw attention to this possibility. We show that
it is indeed possible to create web server software supporting
surprisingly many of today's requirements, using very limited
resources, while keeping the code itself simple and tractable. Our
main concern was not to produce functional code as such: several such
attempts have been made already (see e.g. [1]). Rather, we
intend to demonstrate certain ideas. We have constructed two web
server applications, which are functional demonstrations of
these principles.
This paper is organized as follows. In Section 2, a minimal
web server written in the C programming language is introduced.
In Section 3, a working web server written in the Bourne Again
Shell (bash) is introduced. Section 4 summarizes our conclusions.
This server may be regarded as an HTTP skeleton written in POSIX
ANSI C. It uses no external libraries, only the standard POSIX
features of the operating system. Therefore it is highly
portable and can be used on any POSIX-compliant operating system. (For
the sake of simplicity, it depends slightly on minor features of the
init available on the Debian GNU/Linux platform (version
3.0) on which it was developed, but this dependency can be
easily eliminated.) It is a stand-alone network server, which
performs extremely well at high hit rates. It supports HTML
and image files, but may be easily extended to support all MIME
types. Interactions are logged by the syslog function.
Our demonstration version has certain limitations: for example,
neither auto-indexing nor scripting languages are supported.
These could, however, be added easily.
The operation of the server can be
described simply: it can even be understood by tracing its system
calls. The program first issues an accept() system call,
with which it waits for incoming connections. When
a connection arrives, a fork() call is issued, which
duplicates the core image of the program, thereby enabling the
acceptance of another connection. The request is then handled:
a read() call reads the request
of the client from the network, and a stat() call
checks whether the file named in the request exists. If it
does, the file is loaded into memory by
an open()-read()-close() call triplet.
This is followed by two write() calls, sending the header
and the file itself. The network connection is closed by a
shutdown() call, and then the forked process is
terminated by an exit() call. (Remember that meanwhile
another replica of the server is running in order to receive
further connections.)
The program source consists of 169 lines, yielding
a single binary of about 6 kilobytes. (This is 0.7% of the size of
the apache binary on the same system.) Though it is of limited
functionality, it might even be practically useful, e.g. for
making local documentation available on the operating system.
Though it is a very minimal application, it provides partial
support for the HTTP/1.0 standard, and
therefore it offers more features than the original CERN HTTPD.
On a system that has a POSIX Internet superserver daemon
(inetd or xinetd) running, a single UNIX shell
such as the Bourne Again Shell (bash), supplemented by a few
standard POSIX file utilities, enables us to create a minimal web
server resembling today's most frequently used web servers. (In
detail, bash ver. 2.0.3, grep ver. 2.5.1,
file ver. 3.39, cat ver. 4.5.10, cut ver.
4.5.10, date ver. 4.5.10, ls ver. 4.5.10, and
rev have been used in the actual implementation.) Relying only on
these tools makes the software extremely portable.
Our aim was to provide a demonstration server offering full
HTTP/1.0 and partial HTTP/1.1 compliance. In
addition, it provides logging features, and uses configurable
modules providing auto-index and vhost support. The server, in
contrast with those in common use today, requires
practically no memory if no requests are present; therefore it is a
good solution for infrequently used servers.
Though this server
still does not support features such as ETags, it may be regarded
as useful in practice. While the use of a high-level programming
language provides us with the opportunity of easy modification and
development of the software, it also introduces certain disadvantages.
Indeed, our bash-based server, though supporting more
features than the one introduced in the previous Section, is much
more limited in performance. Security issues are partially
delegated to the operating system itself.
Regarding the
operation of the server, we should first note that the
communication with the client is carried out via the standard input
and output (STDIN/STDOUT) of the script itself. Thus this program
is not a stand-alone application; it is inetd-based. The
script is essentially an infinite loop, which terminates when no
keep-alive request is received from the client. The loop is needed
because starting an interpreted program for every request would
take more time.
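
The loop structure described above might be sketched as follows. This is only an illustration under stated assumptions, not the script from the poster: the function name serve_connection and the placeholder response body are hypothetical, and a real server would send the requested file instead.

```shell
#!/bin/bash
# Hypothetical sketch of an inetd-style request loop: the client is on
# stdin/stdout, and the loop ends when a request does not ask for keep-alive.

serve_connection() {
    local request line keep_alive body cr
    cr=$(printf '\r')
    while :; do
        # Read the request line; stop if the client closed the connection.
        IFS= read -r request || break
        request=${request%"$cr"}
        keep_alive=no
        # Read header lines up to the empty line that ends the header.
        while IFS= read -r line; do
            line=${line%"$cr"}
            [ -z "$line" ] && break
            case "$line" in
                [Cc]onnection:*[Kk]eep-[Aa]live*) keep_alive=yes ;;
            esac
        done
        # Placeholder response; a real server would send the file here.
        body="You asked for: $request"
        printf 'HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n%s' "${#body}" "$body"
        # Without a keep-alive request the script terminates.
        [ "$keep_alive" = yes ] || break
    done
}
```

Under inetd the function would simply be called once at the end of the script, so that one process serves all requests arriving on the same connection.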
The header
received from the client is read line by line, and it is parsed by
the standard grep utility. Meanwhile, a timeout provided by the
read builtin guards against broken communication with the client
(possibly due to an attack).
The full header may be logged if the debug option is enabled in
the configuration file.
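
A minimal sketch of this step is given below, assuming bash's read -t option for the timeout. The names TIMEOUT, read_header and header_field, as well as the 5-second value, are illustrative assumptions and are not taken from the original script.

```shell
#!/bin/bash
# Hypothetical sketch: read a request header from stdin with a per-line
# timeout, then pick out a field with grep.

TIMEOUT=5   # seconds to wait for each header line (assumed value)

read_header() {
    local line cr
    cr=$(printf '\r')
    # read -t fails on timeout or end of input, which ends the loop.
    while IFS= read -r -t "$TIMEOUT" line; do
        line=${line%"$cr"}
        [ -z "$line" ] && return 0      # an empty line ends the header
        printf '%s\n' "$line"
    done
    return 1                            # timed out or connection broken
}

# Extract the value of a header field, ignoring the case of its name.
header_field() {   # usage: header_field NAME HEADER_TEXT
    local value
    value=$(printf '%s\n' "$2" | grep -i "^$1:" | head -n 1 | cut -d: -f2-)
    printf '%s\n' "${value# }"
}
```

A caller could store the output of read_header in a variable, treat a non-zero return value as a broken connection, and then query individual fields with header_field.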
It is important to verify whether a double-dot
string is present in the request, which is not allowed by the
HTTP/1.0 standard, as it would make it possible to go beyond the
document root of the server. A 404 error code is sent as a reply in
this case. In the next step, it is verified whether the requested
file is a regular file or a directory. If it is a regular file, it
is sent if it exists; a 404 error is sent otherwise. If it is a
directory, the presence of the standard trailing slash is verified,
and a 301 status code is returned if it is missing. Next, the
presence of an index file in the requested directory is checked.
The possible names of index files can be set in the configuration
file. If an index file is present, it is sent to the client. In
the absence of index files, the auto-index module of the script is
called if auto-indexing is enabled. If auto-indexing is disabled,
a 403 error code is returned.
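
The decision sequence of this paragraph can be sketched as a single function. The sketch only prints the status code (and, for 200, what would be sent) instead of answering a client. The names DOCROOT, INDEX_FILES, AUTOINDEX and resolve_request are hypothetical stand-ins for values that the real script reads from its configuration file.

```shell
#!/bin/bash
# Hypothetical sketch of the request checks. It prints the status code and,
# for 200, the file (or auto-index directory) that would be sent.

DOCROOT=/var/www                      # assumed document root
INDEX_FILES="index.html index.htm"    # assumed index file names
AUTOINDEX=yes                         # assumed configuration switch

resolve_request() {   # usage: resolve_request URL_PATH
    local path=$1 target name
    # Reject any path containing "..", which could escape the document root.
    case "$path" in
        *..*) echo "404"; return ;;
    esac
    target=$DOCROOT$path
    if [ -f "$target" ]; then
        echo "200 $target"
    elif [ -d "$target" ]; then
        # A directory must be requested with its trailing slash.
        case "$path" in
            */) ;;
            *) echo "301 $path/"; return ;;
        esac
        # Send the first index file that exists.
        for name in $INDEX_FILES; do
            if [ -f "$target$name" ]; then
                echo "200 $target$name"
                return
            fi
        done
        if [ "$AUTOINDEX" = yes ]; then
            echo "200 autoindex $target"
        else
            echo "403"
        fi
    else
        echo "404"
    fi
}
```

For instance, with the assumed settings, resolve_request /docs prints "301 /docs/" whenever /var/www/docs is a directory.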
MIME types are identified with the
standard file utility, while the last modification time is
obtained by the ls program. The locale is set to C in
order to produce time stamps in the HTTP date format. The header is
composed with builtin commands, and it is sent in the simplest
possible way, in binary fashion, using the cat utility.
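
The composition of a response might look roughly like the sketch below. It deviates from the description in two labeled ways: the modification time is obtained with GNU date -r FILE rather than by parsing ls output, and the MIME type uses the --mime-type option of current file versions. The function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch of composing a response. Assumes GNU date (for the
# -r FILE option) and a file utility that supports --mime-type; the
# original script used ls for the modification time.

export LC_ALL=C   # HTTP dates need English day and month names

http_date() {     # usage: http_date [FILE]; without FILE, the current time
    if [ -n "$1" ]; then
        date -u -r "$1" '+%a, %d %b %Y %H:%M:%S GMT'
    else
        date -u '+%a, %d %b %Y %H:%M:%S GMT'
    fi
}

mime_type() {     # usage: mime_type FILE
    if command -v file >/dev/null 2>&1; then
        file -b --mime-type "$1"
    else
        echo application/octet-stream   # fallback if file is not installed
    fi
}

send_file() {     # usage: send_file FILE
    local size
    size=$(wc -c < "$1")
    printf 'HTTP/1.0 200 OK\r\n'
    printf 'Date: %s\r\n' "$(http_date)"
    printf 'Content-Type: %s\r\n' "$(mime_type "$1")"
    printf 'Content-Length: %d\r\n' "$size"
    printf 'Last-Modified: %s\r\n\r\n' "$(http_date "$1")"
    cat "$1"      # the body is copied byte for byte
}
```

Setting LC_ALL=C matters here because a localized date would produce day and month names that HTTP clients cannot parse.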
Named virtual
host support is provided by a loadable module, which can be enabled
in the configuration file. The actual hostname is extracted from
the header and compared with the list of virtual hosts in the
configuration file. According to this data, the appropriate
document root is set.
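
The lookup performed by such a module could be sketched as follows. The layout of the host list (one "hostname docroot" pair per line) and the names VHOSTS, DEFAULT_DOCROOT and docroot_for_host are assumptions made for illustration; the format of the actual configuration file is not specified here.

```shell
#!/bin/bash
# Hypothetical sketch of named virtual host lookup. VHOSTS stands in for the
# list in the configuration file: one "hostname docroot" pair per line.

DEFAULT_DOCROOT=/var/www/default
VHOSTS="www.example.org /var/www/example
docs.example.org /var/www/docs"

docroot_for_host() {   # usage: docroot_for_host HOST_HEADER_VALUE
    local host name root
    host=${1%%:*}                                   # drop an optional :port
    host=$(printf '%s' "$host" | tr 'A-Z' 'a-z')    # host names ignore case
    while read -r name root; do
        if [ "$name" = "$host" ]; then
            printf '%s\n' "$root"
            return 0
        fi
    done <<EOF
$VHOSTS
EOF
    # Unknown hosts fall back to the default document root.
    printf '%s\n' "$DEFAULT_DOCROOT"
}
```

With the assumed list, docroot_for_host www.example.org prints /var/www/example, and an unlisted host name yields the default document root.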
The base script consists of 106
lines altogether. In addition, there is a 6-line module for
auto-indexing and an 18-line module for vhosts. This strikingly
illustrates the extreme simplicity of the program, especially
compared with its capabilities.
We have presented two functional examples of portable minimal web
servers, which can possibly be applied in everyday practice. We
believe that these illustrations may serve as a reminder to
developers, drawing their attention to certain points. On the one
hand, they show that existing high-level programming tools are useful
in application protocol testing and development. On the other hand,
even though more resources are sometimes needed, it may indeed be
worthwhile to aim for simple and tractable code. The programs may
even be of practical use in their current form. Potential applications
include providing local documentation on a non-networked computer,
use on infrequently visited web servers, and serving
static HTML pages.
The software introduced in our poster is
available under the GNU General Public License at the URL
ftp://linux.pte.hu/pub/minwebservers/.
The authors thank Mátyás Koniorczyk at the Institute of Physics, University of Pécs, for inspiring discussions.