Next: Design of a scalable
Up: Scalability of content-aware server
Previous: Scalability of content-aware server
Scalability remains a key requirement of modern
Web-based systems, which must accommodate
user requests that grow in number and complexity.
Unfortunately, simply increasing the number of servers is not
a valid solution to the scalability problem, because it
would move the bottleneck from the (back-end) server side to the
front-end side.
This risk is even more serious when we consider that new Web-based services
require the front-end component to extract from each request as much
as possible of the information that exists at the
application level but not at the TCP level.
Content-aware features aggravate the front-end scalability issues by one to two
orders of magnitude.
Many solutions have been proposed to improve the delivery of
Web content [5,3,8,11] through locally distributed
Web-server systems, or Web clusters for short.
For a recent survey on the topic, see [6].
Basically, a Web cluster is a set of server machines that are interconnected through a high-speed LAN. The cluster is publicized
through one site name and one virtual IP address that
typically corresponds to the address of a dedicated front-end node. This important component, also called a
Web switch, is the main focus of this paper. It acts as an interface between the nodes of the
cluster and the rest of the Internet, thus masking the distributed architecture
of the site from the users and the clients. The Web switch receives all
client requests and routes each of them to a
Web server node according to a centralized dispatching policy.
We distinguish layer-4 from layer-7 Web switches. A layer-4 switch performs content-blind
routing; that is, it does not take into account any content information
in the client request when making assignment decisions. On the other hand,
a layer-7 Web switch performs content-aware routing: it first establishes
a complete TCP connection with the client, parses each request, and
assigns a Web server node according to the content.
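
The distinction can be illustrated with a minimal Python sketch. It is not the kernel-level implementation described in this paper: the server names, the URL partitioning, and both function names are hypothetical and chosen only for illustration. The layer-4 function decides without seeing any request data, while the layer-7 function parses the HTTP request line before choosing a node.

```python
from itertools import cycle

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical node names

# Layer-4 (content-blind): the decision is taken before any HTTP data is
# available, so only a content-independent rule such as round robin applies.
_round_robin = cycle(SERVERS)


def dispatch_layer4() -> str:
    """Return the next server in round-robin order."""
    return next(_round_robin)


# Layer-7 (content-aware): the switch has already received the request,
# so it can inspect the URL. The partitioning below is an assumption.
PARTITION = {"/images/": "server-a", "/cgi-bin/": "server-b"}
DEFAULT_SERVER = "server-c"


def dispatch_layer7(raw_request: bytes) -> str:
    """Parse the HTTP request line and route by URL prefix."""
    request_line = raw_request.split(b"\r\n", 1)[0].decode("latin-1")
    parts = request_line.split(" ")
    if len(parts) != 3:  # malformed request line: fall back to the default
        return DEFAULT_SERVER
    _method, path, _version = parts
    for prefix, server in PARTITION.items():
        if path.startswith(prefix):
            return server
    return DEFAULT_SERVER
```

With this partitioning, a request for `/images/logo.png` always reaches the same node, which is what makes effects such as better cache hit rates possible; the round-robin function cannot offer that guarantee.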
Content-aware routing allows a Web cluster to use sophisticated
dispatching strategies, improves cache hit rates, permits
content partitioning, and gives access to a much larger set of user/client information.
However, a layer-7 Web switch tends not to be used as the front-end component of a popular
Web-based information system because
it has been shown to be less efficient than a layer-4 Web switch.
As an example, Aron et al. [3] show that the peak
throughput achieved by a layer-7 switch
is limited to 3500 conn/sec, while a software-based layer-4 switch
implemented on the same hardware is able to sustain a throughput
of up to 20000 conn/sec.
To improve the scalability of layer-7 architectures, alternative solutions
for scalable Web-server systems that combine content-blind
and content-aware request handling
have been proposed, e.g. [18,27].
The motivation of this paper comes from the observation that
absolute efficiency is not the right measure for judging
whether a content-aware Web switch can be used. Instead, its performance
should be related to the operational requirements that a Web switch must
satisfy in a realistic multi-tier environment. These include the interconnection of the Web cluster to the Internet
(for example, for economic reasons the large majority
of Web clusters use nothing faster than T3-based connections, which have a peak bandwidth of 45 Mbps), the HTTP servers
(with typical workloads and modern hardware, they are no longer the system bottleneck, unless they have to manage
secure transmissions), and the back-end servers
(which can easily become the system bottleneck when the dynamic requests
are computationally expensive).
Moreover, when the classes
of services provided by the Web site require peak throughputs
higher than 40-50 Mbps, it is more likely that a different
architecture should be considered, for
example a system distributed over a geographical area.
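
A back-of-the-envelope calculation makes the argument about the Internet connection concrete. The sketch below is only illustrative: the 45 Mbps figure is the T3 peak bandwidth quoted above, whereas the average response size of 10,000 bytes is an assumption introduced here and is not a measurement from this paper. Protocol overhead is ignored.

```python
def max_responses_per_second(link_mbps: float, avg_response_bytes: int) -> float:
    """Upper bound on the responses per second that fit in the outgoing link."""
    bytes_per_second = link_mbps * 1_000_000 / 8
    return bytes_per_second / avg_response_bytes


T3_MBPS = 45                 # peak T3 bandwidth quoted in the text
AVG_RESPONSE_BYTES = 10_000  # ASSUMPTION: illustrative average response size

bound = max_responses_per_second(T3_MBPS, AVG_RESPONSE_BYTES)
print(f"{bound:.1f} responses/sec")  # prints "562.5 responses/sec"
```

Under this assumed response size, the link saturates at a few hundred responses per second, well below the 3500 conn/sec reported for a layer-7 switch in [3]. A different average response size shifts the bound proportionally.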
These considerations led us to investigate whether the existing prejudices
against layer-7 Web switches are still valid when one considers
modern hardware and multi-tier architectures for content-aware distribution in cluster-based Web information systems.
We describe the design and implementation of
an efficient, content-aware Web switch (called ClubWeb-1w) that takes advantage
of the features and optimizations available on
modern PC-based architectures.
We demonstrate that careful design and implementation
choices produce a Web switch with content-aware functionality and very limited overhead. A careful analysis
of its performance demonstrates that the proposed
solution is extremely scalable, thus making a
content-aware Web switch a viable solution to the performance requirements
of the majority of popular Web sites based on cluster architectures.
The most important contributions related to the proposed layer-7 Web switch are outlined below
and discussed in the following sections.
- Almost all Web switches are based on
two-way architectures that, even if implemented at the kernel level, are less efficient because
both requests and responses transit through the Web switch
[16,21,7,19,30,9,15,13,24,29,1,28,4]. In contrast, as in [3], the proposed Web switch uses a one-way architecture where only the client-to-server requests
flow through the Web switch, while the larger server-to-client responses
follow a different path.
- All content-aware dispatching features are implemented and integrated at the kernel level of
a Linux operating system.
- The design and implementation avoid the most serious
inefficiencies existing in other known implementations.
- The Web switch can operate on single-processor architectures as well as
on SMP architectures
(most experiments refer to a dual Pentium architecture).
In particular, we exploited
the spinlock primitives [26] to guarantee
efficient, mutually exclusive
access by different CPUs to shared data structures.
- Transfers of TCP connections between the Web switch and the server are based on the so-called
TCP Handoff protocol that was proposed by Aron et al. for FreeBSD Unix [3].
Our version is the first that has been specifically designed and implemented
for Linux operating systems. (The FreeBSD and Linux kernels
differ so much in their handling of network operations
that very few ideas could be taken from the previous
TCP Handoff implementation, not to mention the
optimizations that work only on Linux-based systems.)
- The Web switch design is highly modular from the point of view of request
dispatching policies: we have experimented with content-blind,
content-aware, server load-aware, and combined
content- and server load-aware dispatching policies, although only a subset of the results can be reported in the paper.
- The content-aware distribution mechanism has been designed to be
compliant with both the HTTP/1.0 and HTTP/1.1 protocols.
- The Web servers do not need specific configurations to
communicate with the switch node. Hence, it would be possible to
reassign the switch node role without reconfiguring the entire
cluster. This improves the availability of the system.
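
The modularity with respect to dispatching policies mentioned in the list above can be pictured with the following Python sketch. It is a user-level illustration under stated assumptions, not the kernel code of the switch and not the policies evaluated in the paper: the `Policy` signature, the policy names, the server names, and the URL partitioning are all hypothetical.

```python
from typing import Callable, Dict, List

# ASSUMPTION: a policy maps (request path, per-server load) to a server name.
Policy = Callable[[str, Dict[str, int]], str]


def least_loaded(path: str, load: Dict[str, int]) -> str:
    """Server load-aware but content-blind: ignore the path entirely."""
    return min(load, key=load.get)


def make_content_and_load_aware(partition: Dict[str, List[str]]) -> Policy:
    """Restrict the candidates by URL prefix, then pick the least loaded one."""
    def policy(path: str, load: Dict[str, int]) -> str:
        for prefix, candidates in partition.items():
            if path.startswith(prefix):
                return min(candidates, key=lambda server: load[server])
        return least_loaded(path, load)  # no prefix matched
    return policy


# Policies are interchangeable because they share one signature.
POLICIES: Dict[str, Policy] = {
    "least_loaded": least_loaded,
    "content_and_load": make_content_and_load_aware({"/cgi-bin/": ["s2", "s3"]}),
}

load = {"s1": 1, "s2": 9, "s3": 4}  # hypothetical active connections per node
print(POLICIES["least_loaded"]("/cgi-bin/x", load))      # prints "s1"
print(POLICIES["content_and_load"]("/cgi-bin/x", load))  # prints "s3"
```

Because every policy exposes the same signature, swapping one for another does not affect the rest of the dispatching path, which is the property the list item refers to.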
The implemented Web switch has been subjected to a large variety of performance tests.
All results confirm that the proposed layer-7 Web switch has a low
overhead, even when the Web servers tend to be saturated.
Moreover, we show that the Web switch scales well across multiple
server nodes.
Finally, we also evaluate the performance of the Web cluster under
realistic workload conditions.
Again, we show that the
switch is able to handle several thousands of connections per second
without being the bottleneck of the whole system.
We can conclude that the proposed one-way
architecture is extremely scalable, thus making
content-aware routing a viable solution to the requirements
of the majority of network services provided by cluster-based architectures.
The rest of this paper is organized as follows.
In Section 2, we describe the main requirements,
the major issues, and our solutions for an efficient design of the layer-7 one-way Web switch.
Section 3 outlines two content-aware dispatching policies
that we use for the experiments. Section 4 presents
the implementation details, with a focus on the techniques used to
obtain the best performance from single- and dual-processor architectures.
Section 5 contains the performance study.
Section 6 concludes the paper with some final remarks.
Mauro Andreolini
2003-03-13