From the overhead analysis, we can conclude that the mechanisms behind ClubWeb-1w are very efficient even when the network is close to saturation.
We now evaluate the scalability of the proposed Web switch for increasing numbers of server nodes in the Web cluster.
We consider two hardware configurations for the machine hosting the Web switch:
a single-CPU machine and a two-CPU SMP machine.
The client requests are for a small static file (ca. 1500 bytes, the
home page of the Apache site).
We evaluate the performance of ClubWeb-1w with one and two CPUs and,
for the sake of comparison, we also consider a two-way mechanism implemented
at the application layer, namely a TCP Gateway based on
mod_rewrite [12], running on the same inexpensive
PC/Linux machines. Performance comparisons
with dedicated (and expensive) commercial products are
beyond the scope of our academic research.
Unfortunately, the most interesting performance comparison, against the other
public domain solution [3], was impossible because the version of the FreeBSD
kernel on which the ScalaServer architecture was implemented is incompatible
with present PC architectures.
Figures 7 and 8 show the system throughput in Mbps and in connections per second,
respectively, as the number of server nodes increases.
The first result from all these experiments is that the layer-7 Web switch is not the bottleneck of the
cluster. The Web switch utilization never reached a critical threshold.
Figure 7 confirms that ClubWeb-1w is able to
saturate a 100 Mbps LAN even with requests for small files (i.e., 1.5 KB). The scalability of ClubWeb-1w
is almost linear, without significant
performance losses, until the network becomes the bottleneck. On the other hand,
the TCP Gateway does not scale beyond two nodes
and serves at most 600 TCP connections per second.
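As a rough check on what saturating a 100 Mbps link means for small files, the following minimal sketch computes an upper bound on completed requests per second. It assumes 1 Mbps = 10^6 bits per second and that each connection transfers exactly one response; the function name and the `overhead_bytes` parameter are hypothetical, introduced only for illustration, and protocol overhead is ignored unless it is passed explicitly.

```python
def max_connections_per_second(link_mbps, response_bytes, overhead_bytes=0):
    """Upper bound on responses per second that fit on a saturated link.

    Assumptions (not taken from the measurements): one response per
    connection, 1 Mbps = 1_000_000 bit/s, and overhead_bytes is a
    hypothetical per-response allowance for HTTP/TCP/IP headers.
    """
    bits_per_response = (response_bytes + overhead_bytes) * 8
    return link_mbps * 1_000_000 / bits_per_response


# 100 Mbps link, ca. 1500-byte file, no protocol overhead counted.
print(round(max_connections_per_second(100, 1500)))
```

Under these assumptions the bound is about 8,300 connections per second; any real per-connection overhead lowers it, so this is only a ceiling set by the network and not a prediction of the measured throughput.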
These experiments, together with other results not reported here because of space limits,
should clarify that the common belief
about the poor scalability of content-aware Web switches
applies to two-way architectures. In contrast,
a careful kernel-based implementation of a one-way system that can take advantage
of a simple SMP architecture does not seem to have any performance
problem in providing content-aware functionality. Layer-4 solutions remain one to two orders of magnitude
faster, but the question is whether a locally distributed Web system
really needs a throughput higher than that shown here.
Our conclusion is that when a Web
site has to manage more than ten thousand connections per second,
it is better (also for availability reasons) to move to a different architecture, such
as two or more Web clusters distributed over different network locations.
Figure 7: Throughput in Mbps.
Figure 8: Throughput in connections per second.
Mauro Andreolini
2003-03-13