Scalability analysis

From the overhead analysis, we can conclude that the mechanisms behind ClubWeb-1w are very efficient even when the network is close to saturation. We now evaluate the scalability of the proposed Web switch as the number of server nodes in the Web cluster increases. We consider two hardware platforms for the Web switch: a single-CPU machine and a dual-CPU SMP machine. Client requests are for a small static file (about 1500 bytes, the home page of the Apache site). We evaluate the performance of ClubWeb-1w with one and two CPUs and, for comparison, we also consider a two-way mechanism implemented at the application layer, namely a TCP gateway based on mod_rewrite [12], running on the same inexpensive PC/Linux machines. Performance comparisons with dedicated (and expensive) commercial products are beyond the scope of our academic research. Unfortunately, the most interesting comparison, against the other public-domain solution [3], was not possible: the FreeBSD kernel version on which the ScalaServer architecture was implemented is largely incompatible with current PC hardware.

Figures 7 and 8 show the system throughput in Mbps and in connections per second, respectively, as the number of server nodes grows. The first result of all these experiments is that the layer-7 Web switch is not the bottleneck of the cluster: its utilization never reached a critical threshold. Figure 7 confirms that ClubWeb-1w is able to saturate a 100 Mbps LAN even with requests for small (1.5 KB) files. ClubWeb-1w scales almost linearly, without significant performance losses, until the network becomes the bottleneck. By contrast, the TCP gateway does not scale beyond two nodes and serves at most 600 TCP connections per second. These experiments, together with other results not reported here for space reasons, indicate that the common belief about the poor scalability of content-aware Web switches applies to two-way architectures.
On the other hand, a careful kernel-level implementation of a one-way system that takes advantage of a simple SMP architecture does not appear to have any performance problem in providing content-aware functionality. Layer-4 solutions remain one to two orders of magnitude faster, but the question is whether a locally distributed Web system really needs a higher throughput than that shown here. We conclude that when a Web site has to manage more than ten thousand connections per second, it is better (also for availability reasons) to move to a different architecture, such as two or more Web clusters distributed over different network locations.
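The two throughput metrics can be related with a simple bottleneck model. The Python sketch below is a minimal back-of-the-envelope illustration, assuming that each connection transfers only the 1500-byte response (all HTTP, TCP, and IP overhead ignored), that 1 Mbps means 10^6 bit/s, and that cluster throughput grows linearly with the number of nodes until the shared link saturates. The function names and the per-node capacity used in the usage lines are hypothetical and are not measured values from these experiments.

```python
"""Back-of-the-envelope bottleneck model for a Web cluster (illustrative sketch).

Assumptions (hypothetical, not taken from the measurements):
- each connection carries only `resp_bytes` of payload (no header overhead);
- each server node contributes a fixed capacity `node_conn_per_s`;
- the Web switch itself is never the bottleneck.
"""


def network_bound_conn_per_s(link_mbps: float, resp_bytes: int) -> float:
    """Upper bound on connections/s a link can carry when each connection
    transfers only `resp_bytes` of payload (TCP/IP/HTTP overhead ignored)."""
    link_bytes_per_s = link_mbps * 1_000_000 / 8  # Mbps -> bytes per second
    return link_bytes_per_s / resp_bytes


def cluster_throughput(nodes: int, node_conn_per_s: float,
                       link_mbps: float, resp_bytes: int) -> float:
    """Idealized cluster throughput in connections/s: linear in the number
    of nodes until the shared network link saturates."""
    return min(nodes * node_conn_per_s,
               network_bound_conn_per_s(link_mbps, resp_bytes))


if __name__ == "__main__":
    bound = network_bound_conn_per_s(100, 1500)
    print(f"100 Mbps link, 1500-byte responses: at most {bound:.0f} conn/s")
    # Hypothetical per-node capacity of 1500 conn/s, chosen only to show
    # where the modeled curve flattens; it is not a measured value.
    for n in range(1, 9):
        print(n, round(cluster_throughput(n, 1500.0, 100, 1500)))
```

Under these assumptions a 100 Mbps link carries at most about 8,300 such connections per second, the same order of magnitude as the ten-thousand-connections threshold mentioned above; real rates are lower because headers and connection set-up and tear-down packets also consume bandwidth. Taking the minimum of node capacity and link capacity is the simplest model consistent with near-linear scaling up to a network bottleneck, and it deliberately ignores any overhead added by the switch.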

**Figure 7:** Throughput in Mbps.

**Figure 8:** Throughput in connections per second.

Mauro Andreolini 2003-03-13