The efficient delivery of Internet content has long been recognized as a key research issue. Forward (or reverse) proxies, positioned along the request route from users' browsers to the origin content servers, maintain a cache with copies of content from their origin servers. The strategic placement of proxies across the backbone ISP network (or the Content Delivery Network) can drastically improve the performance of the system, in terms of network bandwidth savings, origin server load, and user-request latency.
The ultimate goal of this work is to develop a tool that decides on the position and number of proxies required to achieve given performance improvements (expressed in terms of network bandwidth, origin server load, and user latency). We believe such a tool will be very helpful to ISPs/CDNs, content providers, and end-users, and that it fills a real gap: no such tool currently exists.
Keywords: web proxies, caching, replacement algorithms, performance measurements, proxy placement algorithms.
Proxies are categorized into forward and reverse proxies (the latter are also called surrogate servers). Forward proxies are found at ISP backbone nodes, intercepting all web traffic. Reverse proxies are found at CDN servers and are contacted only for requests for content belonging to origin servers that have specific business agreements with the CDN; cached content in reverse proxies is under the explicit control of the content provider. Despite these differences, both proxy types perform fundamentally the same caching tasks, and in both environments appropriately placed proxies can prove very beneficial to the system's efficiency. The problem of appropriately placing proxies within a given network therefore constitutes a fundamental issue for the efficient delivery of content.
The central aim of this research is to develop a tool, coined ProxyTeller, for the efficient placement of proxies. The problem ProxyTeller solves accepts as inputs the network topology over which proxies will be placed, the desired sizes of their caches, the characteristics of the workload of the community served by a proxy (i.e., representative workload traces), and the desired performance improvement. The latter involves one of the metrics of network bandwidth, user-request latency, and hit ratio, and is specified as the percentage improvement to be achieved compared to the case where no proxies are deployed. The output of ProxyTeller is the position and number of proxies required to achieve the stated performance goals.
We model the network over which proxy placement takes place as a graph of nodes and edges connecting them. This graph could represent the backbone network of a large ISP. Each node covers a specific geographic region, with thousands of users connected to it. Nodes are connected by weighted edges, where a weight indicates how loaded the corresponding link is. Nodes carry weights as well: a node's weight represents the amount of information that all users connected to that node request from the web origin server, and can be derived by appropriately processing the expected workload of the web origin server.
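To make the model concrete, the following is a minimal sketch of this graph in Python; the class and field names are illustrative, not ProxyTeller's actual data structures.

```python
# A minimal sketch of the network model described above.
# Names (Node, Network, demand) are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    demand: float  # node weight: amount of data requested by users attached here

@dataclass
class Network:
    nodes: dict = field(default_factory=dict)   # name -> Node
    edges: dict = field(default_factory=dict)   # (u, v) -> link weight (load)

    def add_node(self, name, demand):
        self.nodes[name] = Node(name, demand)

    def add_edge(self, u, v, weight):
        self.edges[(u, v)] = weight
        self.edges[(v, u)] = weight  # backbone links are treated as bidirectional
```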
The performance impact of proxy servers plays an important role. Based on our previous work on the performance evaluation of proxy cache replacement algorithms [3], we were able to summarize proxy performance under different sizes of available cache, different values of the parameter theta of the Zipf popularity distribution, and different replacement algorithms (LRU, CSP, GDS(1), and GDS(lat)). For each combination of these values, we know which replacement algorithm is preferred, and its performance in terms of BHR (Byte Hit Ratio).
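Conceptually, these results form a lookup table. The sketch below illustrates the idea only: the entries are placeholder values, not the measured results of [3].

```python
# Hypothetical excerpt of the performance table distilled from [3]:
# (cache size as % of working set, Zipf theta) -> (preferred algorithm, BHR).
# All numbers below are illustrative placeholders, NOT measured values.
PERF_TABLE = {
    (5,  0.6): ("GDS(1)",   0.24),
    (5,  0.8): ("LRU",      0.31),
    (10, 0.6): ("GDS(lat)", 0.35),
    (10, 0.8): ("CSP",      0.43),
}

def expected_bhr(cache_pct, theta):
    """Return the preferred replacement algorithm and its expected BHR
    for the nearest tabulated (cache size, theta) combination."""
    key = min(PERF_TABLE, key=lambda k: abs(k[0] - cache_pct) + abs(k[1] - theta))
    return PERF_TABLE[key]
```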
The placement algorithm tries to place the proxy servers at those nodes that lead to the minimum total proxy placement cost. The proxy placement cost is computed taking into account the edge and node weights, the given topology, and the performance of each proxy (measured by the BHR metric).
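The paper does not spell out the cost formula. One plausible reading, building on the Network sketch above and assuming shortest-path distances over the weighted edges, is that each node's demand travels to its closest proxy, with the miss fraction (1 - BHR) continuing on to the origin:

```python
# One plausible reading of the total placement cost (an assumption,
# not the tool's published formula). For simplicity a single BHR value
# is used here; the tool derives a per-node BHR from the performance table.
def placement_cost(net, proxies, origin, bhr, dist):
    """net: Network; proxies: set of node names; origin: origin node name;
    dist: all-pairs shortest-path lengths, e.g. from Floyd-Warshall."""
    total = 0.0
    for u, node in net.nodes.items():
        if proxies:
            p = min(proxies, key=lambda q: dist[u][q])  # closest proxy to u
            # traffic to the proxy, plus the miss fraction onward to the origin
            total += node.demand * (dist[u][p] + (1.0 - bhr) * dist[p][origin])
        else:
            total += node.demand * dist[u][origin]      # no proxies deployed
    return total
```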
We examined the performance of the well-known Greedy, HotSpot, and Random algorithms for web-server replica placement. The Greedy algorithm [1] iteratively places proxies at the network nodes that yield the minimum total placement cost. HotSpot [1] tries to place proxies near the communities of users that generate the greatest load. For the performance study of the three algorithms, the backbone topologies of two major ISPs in North America (AT&T and C&W) were used. The performance results were similar for both topologies. The Greedy algorithm clearly proved considerably more efficient, so we utilize it in the remainder of this paper.
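For concreteness, here is a sketch of the Greedy strategy of [1], reusing the placement_cost function sketched above. The stopping condition (a fixed budget of k proxies) is an assumption; the tool can equally stop once a target improvement is reached.

```python
# Sketch of the Greedy placement strategy: at each step, add the candidate
# node that most reduces the total placement cost; stop after k proxies
# or when no candidate improves the cost.
def greedy_placement(net, origin, k, bhr, dist):
    placed = set()
    for _ in range(k):
        best = None
        best_cost = placement_cost(net, placed, origin, bhr, dist)
        for cand in net.nodes:
            if cand in placed:
                continue
            cost = placement_cost(net, placed | {cand}, origin, bhr, dist)
            if cost < best_cost:
                best, best_cost = cand, cost
        if best is None:   # no remaining candidate improves the cost
            break
        placed.add(best)
    return placed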
The likely users of ProxyTeller are administrators who wish to appropriately place proxy servers at nodal points of the Internet. The tool could also be used by bandwidth-selling corporations to associate envisaged investments with desired performance improvements. It can be used by content providers to estimate the off-loading of their origin servers, as well as the performance seen by their customers/consumers. Finally, it can also be used for testing different placement algorithms and how they interact with replacement algorithms. In the section that follows we present the tool's basic components and functionality.
A Description of ProxyTeller
We used off-the-shelf tools for developing interactive Internet applications, in order to make our placement tool easily accessible to a large number of users. A demo of the tool is available [2].
Initially, the user registers with the system. On the main page, the user defines certain parameters that are essential for the execution of the algorithms (a sketch of such an input specification follows the list). These are:
Network Topology. The user can select a predefined network topology, such as the AT&T or C&W backbone, or create her own.
Expected Workload. A trace file containing the request stream is required in order to compute the node weights. The characteristics of the workload are measured so as to determine the expected proxy server performance.
Available Cache Size. The user is asked to define the size of the proxy caches that are going to be placed in the network.
Desired Performance Improvement. The user defines the percentage of performance improvement to be achieved by placing the proxy servers (and/or the number of proxies to place).
Performance Metrics. The user determines the metric of interest for individual proxy servers. There are three choices: bandwidth requirements, latency, and hit ratio.
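Gathering the above, a request to the tool could look like the following sketch; the field names and values are hypothetical, not ProxyTeller's actual interface.

```python
# Hypothetical input specification mirroring the five parameters above;
# all field names and values are illustrative assumptions.
request = {
    "topology":      "att_backbone",      # predefined (AT&T, C&W) or user-defined
    "workload":      "origin_trace.log",  # trace file with the request stream
    "cache_size_mb": 512,                 # per-proxy cache size
    "target":        {"metric": "bandwidth", "improvement_pct": 30},
    # alternatively: metric "latency" or "hit_ratio", or a fixed proxy budget
}
```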
How it Works
The tool starts by processing the expected workload in order to compute the node weights for the given topology. In addition, the request stream is analyzed in order to determine the value of theta of the Zipf object-popularity distribution. This value and the available disk size are then used as an index into the performance table, so as to select the best replacement algorithm and the performance (expressed as BHR) of a proxy placed at each node. The next step involves computing the total proxy placement cost. Finally, the Greedy placement algorithm is applied, placing more and more proxies, until the desired performance improvement specified by the user is met.
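The fitting procedure for theta is not detailed here; a common approach, sketched below under that assumption, is to fit a line to the log-log rank-frequency plot of the trace.

```python
# Estimate the Zipf parameter theta from a request trace, assuming
# popularity p(i) ~ 1/i^theta: fit a line to log(frequency) vs log(rank).
# This is a standard technique, not necessarily the tool's exact method.
import math
from collections import Counter

def estimate_theta(requests):
    freqs = sorted(Counter(requests).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return -slope  # theta is the negated slope of the fitted line

# Example: estimate_theta(["a", "a", "a", "b", "b", "c"])
```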
The output shows the network with the proxy servers placed, the percentage of improvement achieved, the number of proxies placed, and the replacement algorithm that the tool proposes for each proxy server.
Example Scenarios Utilizing ProxyTeller
In this section we show indicative scenarios for which ProxyTeller can be called to provide an answer. The scenarios are:
Single Web Origin Server with Proxies / Full Replicas. Consider a small organization with a single web origin server that needs to decide where to place proxy servers within the network to improve the efficiency of content delivery to its clients. Supplied with the expected workload (i.e., the web server's trace file for a period of time) and the network topology, ProxyTeller can compute the best placement based on the expressed needs (e.g., how many proxy servers the organization can afford). Similarly, ProxyTeller can be called to place full replicas, able to store all objects of the web origin server, instead of proxies.
Several Web Origin Servers with Proxies / Full Replicas. This scenario could emerge when there are many web origin servers storing different content. For example, a large CDN may wish to decide where to place reverse proxies and/or full replicas within its network to improve efficiency.
Incremental Improvement: Adding Surrogate Servers and/or Full Replicas in CDNs. Within a CDN, the goal can be to further improve the overall performance by placing surrogate and/or full-replica servers.
The results of ProxyTeller can be used to decide whether it is preferable to place surrogate servers vs full replicas.
ProxyTeller is a novel tool guiding the placement of web proxies for efficient content delivery. ProxyTeller is based on two pillars: (i) a study of the performance impact of different algorithms/mechanisms for web proxy cache replacement, and (ii) the adaptation of full-server replica placement algorithms to a proxy (partial replica) placement setting, together with a study of their performance given the results from (i).
The tool can be used for both forward and reverse proxies and, we hope, will be of real use to the organizations maintaining them. We note that our approach to the efficient content delivery problem within the framework of CDNs differs from that of works focusing on content placement. Instead of continuously deciding which objects to replicate and where, we decide how many proxies to place and where, with each proxy internally employing the best algorithms/mechanisms for deciding which objects to keep in its cache, given the specific circumstances (cache size, access skew), thus utilizing the wealth of previous research results.
We have implemented the tool and made it available to the community through the Internet [2]. Our ultimate goal is to accumulate and combine the related research results that form its basis (i.e., new results on the performance of mechanisms that improve single-proxy performance, improvements/extensions of the cache replacement problem solution chart, and new results for server/proxy placement), make it available to all, and provide the infrastructure to utilize it in order to answer key questions faced in the context of efficient content delivery.