Caching schemes have been proposed for WWW clients to reduce network traffic and improve document access latency. Collaborative caching of web documents at the client end has been shown to be an effective technique for reducing web traffic and improving access latencies. This paper proposes a distributed scheme for collaborative caching, in which the proxy server does not cache actual documents but maintains an index of the local caches of the individual users it services. The proxy server itself is distributed in nature, which ensures high availability and load balancing.
The enormous growth of WWW-based services has made network traffic a major concern over the last few years. Long delays in web document retrieval are common, due to slow connections, network congestion, remote server overload, etc. Most WWW clients use memory and disk caches to speed up access to frequently used web documents.
In collaborative caching schemes, read access to cached documents is given to a cooperating group of users. Such integration creates the illusion of a single cache of much bigger size. Since all users of the group share the same integrated cache, the probability that the next accessed document is found in the cache is higher. Collaborative caching schemes therefore provide a better hit ratio than individual caching schemes; hit ratios in the range of 30-50% have been reported in [1]. Caching proxy servers implement collaborative caching.
Proxy servers control all accesses to the web from the subnet they service. Caching proxies can be designed in a centralized or a distributed fashion. In the centralized design, in addition to servicing the requests of clients from the subnet, the proxy also caches documents in its local cache. The cache maintained by the proxy thus contains the documents accessed by all the individual clients, and making it available to everyone ensures a better cache hit rate [4]. The disadvantages of this scheme arise from the centralized nature of the proxy server and its cache: systems that use centralized servers are not considered fault tolerant, and the server can become a hot spot and a bottleneck when the load is high. These issues become more prominent when a single proxy is the only gateway to the Internet for a large group of users. Hence a proxy server serving a large group of people generally needs a lot of resources.
This paper proposes a distributed alternative to collaborative caching (henceforth referred to as DCC), in which the proxy server does not actually cache any documents but maintains an index of all the documents cached by its clients within the subnet. The proxy itself is designed as a distributed server to provide high availability and reliability.
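As a rough illustration, the proxy's index can be thought of as a map from a URL to the client holding a cached copy, with the fields returned on a hit. The following Python sketch is purely illustrative; all class and method names are our own, not taken from the DCC implementation:

```python
import time


class CacheIndex:
    """Illustrative proxy-side index: URL -> (user_id, filename, last_update).

    The real DCC index layout is not specified beyond the fields it
    returns on a hit, so this is a minimal dictionary-backed sketch.
    """

    def __init__(self):
        self._entries = {}

    def register(self, url, user_id, filename, last_update=None):
        # Record that user_id now holds a copy of url in its local cache.
        stamp = time.time() if last_update is None else last_update
        self._entries[url] = (user_id, filename, stamp)

    def lookup(self, url):
        # On a hit, return (user_id, filename, last_update); else None.
        return self._entries.get(url)

    def invalidate(self, url):
        # Drop the entry, e.g. when a client evicts the document.
        self._entries.pop(url, None)


# Example: register one cached document and look it up.
index = CacheIndex()
index.register("http://example.com/a.html", "alice", "cache/a.html", 100.0)
hit = index.lookup("http://example.com/a.html")
```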
New technology trends make the performance of this scheme comparable to other collaborative caching schemes [6] [3] [5]. First, LANs are becoming much faster than WANs. In addition to its lower bandwidth, WAN traffic uses heavyweight protocols, which worsen the latency of short messages. LANs, on the other hand, have higher bandwidths, and lightweight protocols developed for high-speed LANs benefit short messages between machines [9] [10] [11]. Hence, message passing within the distributed proxy server is always fast, and the time required to consult local machines is a small fraction of the time required to fetch a document from a remote WWW server. Second, individual workstations now have fast processors and high-capacity local disks, which allow large disk caches for individual clients.
Since the DCC scheme makes use of data cached at the individual clients, new security issues arise that do not occur in a centralized caching scheme.
This section provides a brief overview of the DCC system. DCC is being implemented on a network of Sun workstations running Solaris 2.5.1, connected by Myrinet [12]. The WWW client used is Netscape 3.01 Gold.
The DCC system consists of a distributed proxy server and WWW clients. The clients referred to above differ from currently existing WWW clients (e.g., Netscape) in that they have the additional functionality of receiving requests for cached documents from the proxy server and servicing them. Since no commercially available clients (including Netscape) support this, there are two ways to provide the facility: modify the client itself to service such requests, or run a separate user stub process that serves documents from the client's cache on its behalf.
We chose the second option, in spite of the difficulties involved, for two reasons: the first option needs access to the Netscape sources, which we do not have, and the second option is more general purpose and can be used with other clients with little modification.
The basic block diagram of the system is shown below.
The WWW client has to be configured for the proxy connection, so any request sent by the client first goes to the proxy. The proxy looks up the cache index to locate the document within the subnet; on success, the lookup returns the user id of the user who has cached the document, the filename, and the date and time of the last update. If the document is located, two requests are sent: one to that user's stub for the cached copy, and a conditional request to the remote server carrying the last-update date.
If the reply from the remote server indicates that the document has not changed since the last-update date supplied, the document received from the user stub is sent to the requesting client. On the other hand, if the remote server sends the document itself, that copy is sent to the requesting client and the proxy's index of locally cached documents is updated.
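The hit path described above can be sketched as follows. Here `fetch_from_stub` and `conditional_fetch_remote` are hypothetical stand-ins for the proxy-to-stub and proxy-to-server protocols; neither name comes from the implementation, and the real proxy issues the two requests concurrently rather than in sequence:

```python
def handle_request(url, index, fetch_from_stub, conditional_fetch_remote):
    """Sketch of the DCC lookup path (illustrative names throughout).

    index: dict mapping url -> (user_id, filename, last_update)
    fetch_from_stub(user_id, filename): returns the cached document
    conditional_fetch_remote(url, since): returns None if the document
        is unchanged since `since`, else the fresh document
    """
    entry = index.get(url)
    if entry is None:
        # Miss: no client in the subnet has the document; fetch it
        # unconditionally (indexing the new copy happens elsewhere).
        return conditional_fetch_remote(url, since=None)

    user_id, filename, last_update = entry
    fresh = conditional_fetch_remote(url, since=last_update)
    if fresh is None:
        # Remote server reports no change: serve the locally cached copy.
        return fetch_from_stub(user_id, filename)
    # Document changed: serve the fresh copy from the remote server.
    return fresh
```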
In addition to serving the requested document, the stub also supplies the proxy with a list of related documents that the client has cached. The proxy uses this data to maintain a TLB.
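The exact structure of this TLB is not given here; one plausible form, assumed purely for illustration, is a small bounded map of recently reported holdings with oldest-first eviction:

```python
from collections import OrderedDict


class RelatedDocTLB:
    """Bounded recently-used map: url -> (user_id, filename).

    Illustrative sketch only; the capacity, eviction policy, and
    names are our assumptions, not the DCC implementation's.
    """

    def __init__(self, capacity=256):
        self.capacity = capacity
        self._table = OrderedDict()

    def record(self, url, user_id, filename):
        # Insert or refresh an entry, evicting the oldest when full.
        if url in self._table:
            self._table.move_to_end(url)
        self._table[url] = (user_id, filename)
        if len(self._table) > self.capacity:
            self._table.popitem(last=False)

    def probe(self, url):
        # Fast check before consulting the full cache index.
        return self._table.get(url)
```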
Since each individual cache is under the ownership of its user, many interesting security and protection problems arise. This design of DCC attempts to provide very secure access to the cached documents. Also, keeping in mind that most clients were not designed for collaborative caching, DCC is kept completely transparent to the client: only the proxy server and the user stubs know about the caching scheme.
The second scheme is better than the first, as it avoids message passing on the critical path, but it has the additional overhead of maintaining the consistency of the tables and dealing with inconsistencies.
Every process of the proxy server maintains its own copy of the usertable and the TLB on its machine.
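As a sketch of this arrangement, each proxy process could keep a private copy of the usertable and apply updates broadcast by its peers; how inconsistencies between copies are detected and resolved is part of the consistency scheme and is outside this fragment. All names here are illustrative:

```python
class ReplicatedTable:
    """One proxy process's private copy of a shared table (sketch).

    Each process holds its own dictionary; updates arrive via
    broadcast from whichever process made the change.
    """

    def __init__(self):
        self._local = {}

    def apply_update(self, key, value):
        # Install an update received from any proxy process.
        self._local[key] = value

    def get(self, key):
        return self._local.get(key)


def broadcast(replicas, key, value):
    # Propagate one update to every process's local copy.
    for replica in replicas:
        replica.apply_update(key, value)
```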
The above measures ensure that, while and after receiving a document, the user is completely unaware of its source: either the remote server or the local cache of another user. The UNIX protection bits also prevent users from looking into the caches of other users. A covert channel may exist: from the time required to access a document, users may conclude that it was served from a local cache, but they will be unable to identify the user who supplied it.
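The reliance on standard UNIX protection bits can be illustrated with a short sketch that restricts a cached file to its owner, so that only the user stub, running as that owner, can serve it (the helper name is ours):

```python
import os
import stat
import tempfile


def protect_cache_file(path):
    """Restrict a cached document to owner read/write (mode 0600).

    Illustrative helper: with these bits set, other users on the
    machine cannot read the cache file directly; only the owner's
    user stub can serve its contents.
    """
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)


# Example: create a scratch cache file and lock it down.
fd, path = tempfile.mkstemp()
os.close(fd)
mode = protect_cache_file(path)
os.remove(path)
```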
We run the processes of the proxy on a set of 8 machines under a common namespace (e.g., the machines in room 214 of Pond Lab are collectively called Pond-214). Specifying Pond-214 as the location of the proxy provides the flexibility to run a distributed proxy while still presenting the interface of a centralized proxy.
This system is being developed on a Network of Workstations (NOW) platform. We expect the performance to be comparable to that of centralized caching schemes under low load. Under heavy load, due to its distributed nature, we expect better response times, load balancing, and no hot-spot effect, with additional reliability and availability and no significant dedicated hardware requirements.
Although the current implementation is for the NOW platform, the scheme can be implemented on any network, possibly with a slight degradation of performance.
The web is moving towards a cached and mirrored architecture. Caching clients, servers, and intermediate agents are coming into common use, so a robust design of these components is important. NOW is a fast and reliable platform for providing them. The DCC scheme described above runs on a network of workstations, can provide performance comparable to centralized caching proxy servers, and is still highly available and reliable.
[1] Marc Abrams et al., Caching Proxies: Limitations and Potentials, Computer Science Department, Virginia Tech, Blacksburg, VA 24061-0106, USA.
[2] Michael D. Dahlin, Randolph Y. Wang, Thomas Anderson, David Patterson, Cooperative Caching: Using Remote Client Memory to Improve File System Performance, OSDI 1994.
[3] Chanda Dharap and Mic Bowman, Rudimentary Type Analysis of Wide-Area Accesses, Technical Report TR CSE-96-044, Department of Computer Science and Engineering, The Pennsylvania State University.
[4] Chanda Dharap and Mic Bowman, Preliminary Analysis of Wide-Area Access Traces, Technical Report CSE-95-030, The Pennsylvania State University.
[5] Chir Ben Abdelkader, A Prefetching Scheme for the World Wide Web, MS Thesis, Department of Computer Science and Engineering, The Pennsylvania State University, 1997.
[6] Cache Now! Campaign, http://vancouver-webpages.com/CacheNow/detail.html
[7] Various technical reference pages, Netscape Communications Corporation, http://www.netscape.com
[8] Yennun Huang and Chandra Kintala, Software Fault Tolerance in the Application Layer, Chapter 10 in Software Fault Tolerance, edited by Michael R. Lyu, John Wiley & Sons.
[9] S. Pakin, M. Lauria, and A. Chien, High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet, Proceedings of Supercomputing '95, December 1995.
[10] T. von Eicken, A. Basu, V. Buch, and W. Vogels, U-Net: A User-Level Network Interface for Parallel and Distributed Computing, Proceedings of the 15th ACM Symposium on Operating System Principles, December 1995.
[11] T. von Eicken, D. E. Culler, S. C. Goldstein, and K. E. Schauser, Active Messages: A Mechanism for Integrated Communication and Computation, Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 256-266, May 1992.
[12] N. J. Boden et al., Myrinet: A Gigabit-per-Second Local Area Network, IEEE Micro, 15(1):29-36, February 1995.
[13] T. Anderson et al., A Case for Networks of Workstations, IEEE Micro, pages 54-64, February 1995.
Acknowledgements: We thank Dr. Anand Sivasubramaniam and Dr. Thomas Keefe for their encouragement and help.