Exploiting Neglected Data Locality in Browsers

Li Xiao, Xiaodong Zhang
Department of Computer Science, College of William and Mary, Williamsburg, VA

Abstract

With the significant increase of memory and disk capacity in workstations and PCs, and with the steady improvement of web browser caching capability, users are able to enlarge the browser cache so that cached data objects are accessed more frequently and are retained in an organized manner for a longer period of time. However, browser caches are not shared among themselves, and the locality available in browsers is neglected in Web proxy caching. In this paper, we propose an enhanced caching technique, called "Browser-Aware Proxy Server", to exploit the neglected data locality in browser caches for further performance improvement. Conducting trace-driven simulations, we present three new findings and contributions: (1) The neglected data locality in browser caches is significant and can be utilized for improving caching performance. (2) The browser-aware proxy server outperforms a browser-unaware proxy server by up to 21% and 40%, measured by the average hit ratio and byte hit ratio, respectively. (3) Web server access latency can be reduced by 35% with a slight increase of local traffic on the client side.

Introduction

Proxy caching is an effective solution to quickly access and reuse the cached data on the client side and to reduce Internet traffic to web servers. A group of networked clients connects to a proxy cache server, where each client has a browser whose cache buffers popular and recently requested data objects. A standard web caching model built on such a system has the following data flows. When a client issues a web request, the browser first checks whether the requested object exists in the local browser cache. If so, the request is served by the client's own browser cache. Otherwise, the request is sent to the proxy cache. If the requested data object is not found in the proxy cache, the proxy server immediately sends the request to its cooperative caches, if any, or to an upper level proxy cache, or to the web server, without considering whether the object exists in other browsers' caches. We believe there are three practical reasons for a proxy server to exclude this consideration. First, the browser caches are not shared and the dynamic status of each cache is unknown to the proxy server. Second, the probability that a proxy cache miss is a hit in some browser cache may have been considered low, although we have found no such study in the literature. Finally, a browser cache was initially developed as a small data buffer with a few simple data manipulation operations, so users may not effectively retain the cached data with a high quality of spatial and temporal locality.

How much potential benefit can we gain in caching performance by exploiting the neglected locality? First, since browser caches are not shared among themselves, it is certainly possible that a data object is stored in one or more browser caches but has been replaced in the proxy cache. Second, browsers provide a function for users to set the browser cache size. With the rapid increase of memory and disk capacity in workstations and PCs, and with the rapid growth of web applications, user browser cache sizes will tend to increase as time passes. Third, with the aid of more functions provided by browser software, users will pay more attention to the organized data objects in the browser cache, and will tend to keep them in the cache much longer than unorganized data objects. Fourth, new technologies have been developed to improve the browsing speed. Finally, the number and types of web servers have increased and will continue to increase dramatically, providing services to a wider and wider range of clients with more diverse interests. It is impossible for proxy caches to hold all repeatedly requested file objects of so many types, even with a perfect cache replacement strategy, which increases the possibility that browser caches will keep file objects that have been replaced in proxy caches. In summary, the quality of spatial and temporal locality in browser caches has been and will continue to be improved.

Browser-Aware Proxy Server

In this design, the proxy server connecting to a group of networked clients maintains an index file of the data objects in all clients' browser caches. If a user request misses in both the local browser cache and the proxy cache, the browser-aware proxy server searches this index file, attempting to find the object in another client's browser cache, before sending the request to an upper level proxy cache or the web server. If such a hit is found in a client, the proxy fetches the file object from this client and then forwards it to the requesting client.
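As an illustration, the lookup order described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used in this study: the function `resolve`, the dictionary-based caches, the `browser_index` mapping, and the `fetch_upstream` callback are hypothetical names introduced here. It also assumes that the requesting browser stores every object it receives, and that the proxy re-checks the remote browser because the index may be out of date.

```python
from typing import Callable, Dict, Tuple

# Hypothetical in-memory model: each cache maps a URL to the object body.
Cache = Dict[str, bytes]


def resolve(
    url: str,
    client_id: int,
    browser_caches: Dict[int, Cache],
    proxy_cache: Cache,
    browser_index: Dict[str, int],
    fetch_upstream: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (body, source) following the browser-aware lookup order."""
    # 1. Local browser cache of the requesting client.
    local = browser_caches[client_id]
    if url in local:
        return local[url], "local-browser"

    # 2. Proxy cache.
    if url in proxy_cache:
        body = proxy_cache[url]
        local[url] = body  # assumption: the browser keeps what it receives
        return body, "proxy"

    # 3. Browser index: another client's browser cache may hold the object.
    owner = browser_index.get(url)
    if owner is not None and owner != client_id:
        remote = browser_caches.get(owner, {})
        if url in remote:  # the index may be stale, so verify before use
            body = remote[url]  # the proxy relays the object to the requester
            local[url] = body
            return body, "remote-browser"

    # 4. Fall back to an upper level proxy cache or the web server.
    body = fetch_upstream(url)
    proxy_cache[url] = body
    local[url] = body
    return body, "upstream"
```

Only step 3 differs from a browser-unaware proxy; removing it yields the standard data flow.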

In order to implement the browser-aware concept in a proxy server, we create a browser index file in the proxy server. This index file records a directory of the cached file objects in each client machine. Each entry of the index file includes the ID number of a client machine, the URL including the full path name of the cached file object, and a time stamp of the file, or the TTL (Time To Live). Since the dynamic changes in browser caches are only partially visible to the proxy server (namely, when a file object is sent from the proxy cache to the browser), the browser index file is updated periodically by each browser cache.
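One possible in-memory form of such an index is sketched below. The class and method names (`IndexEntry`, `BrowserIndex`, `replace_client`, `lookup`) are hypothetical. The sketch also assumes that an entry stores both a time stamp and a TTL, both in seconds, whereas the description above allows either one; and it models a periodic update as a full replacement of one client's entries.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class IndexEntry:
    client_id: int    # ID number of the client machine
    url: str          # URL including the full path name of the cached object
    timestamp: float  # time stamp of the file (assumed: seconds)
    ttl: float        # time to live (assumed: seconds after the time stamp)

    def is_fresh(self, now: float) -> bool:
        return now < self.timestamp + self.ttl


class BrowserIndex:
    """Proxy-side directory of the objects held in clients' browser caches."""

    def __init__(self) -> None:
        # url -> {client_id -> entry}
        self._by_url: Dict[str, Dict[int, IndexEntry]] = {}

    def replace_client(self, client_id: int, entries: Iterable[IndexEntry]) -> None:
        """Apply a periodic update: drop the client's old entries, add new ones."""
        for holders in self._by_url.values():
            holders.pop(client_id, None)
        for entry in entries:
            self._by_url.setdefault(entry.url, {})[client_id] = entry
        # Forget URLs that no client holds any more.
        self._by_url = {u: h for u, h in self._by_url.items() if h}

    def lookup(self, url: str, requester: int, now: float) -> Optional[IndexEntry]:
        """Return a fresh entry held by a client other than the requester."""
        for cid, entry in self._by_url.get(url, {}).items():
            if cid != requester and entry.is_fresh(now):
                return entry
        return None
```

Keying the directory by URL keeps the lookup on a proxy miss cheap, at the cost of a scan over all URLs when a client reports its periodic update.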

Performance Evaluation

We have used several web traces for performance evaluation: (1) NLANR traces [2]: We have used one day's trace from the "uc" proxy, one day's trace from the "bo1" proxy, and one day's trace from the "pa" proxy. (2) Boeing traces [1]: We have used one day's trace on March 4, 1999 and one day's trace on March 5, 1999 from the sixth proxy. We have built a simulator to construct a system with a group of clustered clients connecting to a proxy server. The cache replacement algorithm used in our simulator is LRU. We have implemented and compared the following web caching organizations using trace-driven simulations: (1) Proxy-and-local-browser: each client has a private browser cache, and there is a proxy cache server for the group of client machines. If a request misses in its local browser, it will be sent to the proxy to check if the requested document is cached there. If it misses again, the proxy will send the request to an upper level server. (2) Browser-aware-proxy-server: this is the enhanced proxy caching technique presented in the previous section.
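The proxy-and-local-browser organization with LRU replacement can be illustrated with a small trace replay. This is a sketch under assumptions and not the simulator used for the results reported here: the names `LRUCache` and `simulate_baseline` are hypothetical, trace records are assumed to be `(client_id, url, size)` tuples, capacities are counted in bytes, and all browser caches are given the same size.

```python
from collections import OrderedDict
from typing import Dict, Iterable, Tuple


class LRUCache:
    """Byte-capacity LRU cache mapping a URL to the object size."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self._items: "OrderedDict[str, int]" = OrderedDict()

    def get(self, url: str) -> bool:
        if url not in self._items:
            return False
        self._items.move_to_end(url)  # mark as most recently used
        return True

    def put(self, url: str, size: int) -> None:
        if size > self.capacity:
            return  # the object can never fit, so it is not cached
        if url in self._items:
            self.used -= self._items.pop(url)
        while self.used + size > self.capacity:
            _, evicted = self._items.popitem(last=False)  # evict the LRU entry
            self.used -= evicted
        self._items[url] = size
        self.used += size


def simulate_baseline(
    trace: Iterable[Tuple[int, str, int]], browser_bytes: int, proxy_bytes: int
) -> Dict[str, int]:
    """Replay (client_id, url, size) records and count where each is served."""
    browsers: Dict[int, LRUCache] = {}
    proxy = LRUCache(proxy_bytes)
    counts = {"browser": 0, "proxy": 0, "upstream": 0}
    for client_id, url, size in trace:
        if client_id not in browsers:
            browsers[client_id] = LRUCache(browser_bytes)
        browser = browsers[client_id]
        if browser.get(url):
            counts["browser"] += 1
            continue
        if proxy.get(url):
            counts["proxy"] += 1
        else:
            counts["upstream"] += 1  # miss everywhere: go to the upper level
            proxy.put(url, size)
        browser.put(url, size)
    return counts
```

For example, replaying `[(1, "a", 10), (1, "a", 10), (2, "a", 10)]` with a 10-byte browser cache and a 100-byte proxy cache serves the first request upstream, the second from client 1's browser, and the third from the proxy.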

We compared the hit ratios and byte hit ratios of the two organizations on the "NLANR-uc" trace, the "NLANR-bo1" trace, the "NLANR-pa" trace, the "boeing-4" trace, and the "boeing-5" trace, respectively. The average hit ratio improvements of the browser-aware-proxy-server for the five traces are 17.6%, 16.27%, 20.95%, 13.05%, and 12.6%, respectively. The average byte hit ratio improvements are 17.1%, 14.34%, 13.58%, 39.46%, and 39.94%, respectively. Our experiments also show that the browser-aware-proxy-server achieves an average latency reduction of 35%, compared with the proxy-and-local-browser scheme. The additional overheads of the browser-aware proxy cache come from (1) the data transfer time for the hits in remote browsers; (2) the update of the browser index file, if the update is conducted at an unsuitable time or too frequently; and (3) the space required in the proxy cache to store the browser index records. Our analyses show that the overheads are insignificant.
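The two metrics can be computed from a replayed trace as sketched below, using the usual definitions: the hit ratio is the fraction of requests served from a cache, and the byte hit ratio is the fraction of requested bytes served from a cache. The function name `hit_ratios` and the `(was_hit, size)` record format are assumptions made for this illustration. The text above does not state whether the reported improvements are relative or absolute differences, so the sketch stops at the two ratios.

```python
from typing import Iterable, Tuple


def hit_ratios(records: Iterable[Tuple[bool, int]]) -> Tuple[float, float]:
    """Map (was_hit, size_in_bytes) records to (hit ratio, byte hit ratio)."""
    requests = hits = total_bytes = hit_bytes = 0
    for was_hit, size in records:
        requests += 1
        total_bytes += size
        if was_hit:
            hits += 1
            hit_bytes += size
    hit_ratio = hits / requests if requests else 0.0
    byte_hit_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return hit_ratio, byte_hit_ratio
```

The two ratios can diverge sharply: with hits on two 100-byte objects and misses on a 300-byte and a 500-byte object, half of the requests are hits but only one fifth of the bytes are.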

Summary and Technical Issues

We have proposed and evaluated the browser-aware proxy server to exploit neglected data locality in browsers. Our study shows that neglected locality in browsers is significant and should be utilized. We have shown that proxy caching performance can be further improved by exploiting browser cache locality with a simple software structure.

Besides exploiting data locality in browsers, our next objective is to effectively control and manage the data flows among the proxy and browser caches, so that the proxy cache serves as the commonly shared place and each browser cache serves mainly its individual user. In order to incorporate these objectives into caching system designs and implementations, we must address two major technical issues. First, a client browser is not normally configured as a server. One solution is to add some simple server functions to a browser so that it is able to actively communicate with other clients and the proxy server. An alternative is to let the proxy provide additional services on behalf of each browser. Second, the reliability and security of the browser data must be seriously considered. For example, browser data files which have been modified by an owner client are not reliable for sharing among clients. In addition, the contents of browsers should not be visible among clients, in order to preserve the privacy of each client. This concern can be addressed by requiring the proxy server to handle the data searching and transferring among the browsers.

References

  1. Boeing log files. ftp://researchsmp2.cc.vt.edu/pub/boeing/
  2. National Lab of Applied Network Research. http://www.ircache.net/