Hans-Werner Braun (http://www.sdsc.edu/0/SDSC/Research/ANR/)
Kimberly Claffy (http://www.sdsc.edu/0/SDSC/Research/ANR/kc/kc.html)
Applied Network Research
San Diego Supercomputer Center
Thu Sep 15 22:53:16 PDT 1994
We analyze two days of queries to the popular NCSA Mosaic server to assess the geographic distribution of transaction requests. The wide geographic diversity of query sources and popularity of a relatively small portion of the web server file set present a strong case for deployment of geographically distributed caching mechanisms to improve server and network efficiency.
The NCSA web server consists of four servers in a cluster. We show time series of bandwidth and transaction demands for the server cluster, and break these demands down into components according to the geographic source of the query. We analyze the impact of caching query results within the geographic zone from which the request originated, measured as the reduction in transactions with, and bandwidth volume from, the main server. We investigate a range of timeouts for flushing documents from the cache, outlining the tradeoff between bandwidth savings and memory/cache management costs. We discuss the implications of this tradeoff in the face of possible future usage-based pricing of backbone services that may connect several cache sites.
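The effect of a per-zone cache with a flush timeout can be illustrated with a small log replay. The sketch below is a minimal illustration, not the analysis procedure used in the study: the `Request` record, the zone labels, the document names, and the rule that a cached document is flushed `timeout` seconds after it was fetched from the main server are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical log record: time in seconds, source zone, document, size in bytes.
Request = namedtuple("Request", "time zone doc size")

def simulate_zone_caches(requests, timeout):
    """Replay requests through one cache per zone.

    Assumption: a cached document is flushed `timeout` seconds after it was
    fetched from the main server. Returns the number of transactions and the
    number of bytes that the main server still has to serve.
    """
    fetched_at = {}   # (zone, doc) -> time of the last fetch from the main server
    server_hits = 0
    server_bytes = 0
    for r in sorted(requests, key=lambda r: r.time):
        key = (r.zone, r.doc)
        last = fetched_at.get(key)
        if last is not None and r.time - last < timeout:
            continue  # served by the zone cache, main server is not contacted
        fetched_at[key] = r.time
        server_hits += 1
        server_bytes += r.size
    return server_hits, server_bytes

# Made-up sample data, for illustration only.
sample = [
    Request(0, "EU", "/index.html", 1000),
    Request(10, "EU", "/index.html", 1000),
    Request(20, "US", "/index.html", 1000),
    Request(100, "EU", "/index.html", 1000),
    Request(105, "EU", "/logo.gif", 5000),
]

for timeout in (0, 60, 1000):
    hits, volume = simulate_zone_caches(sample, timeout)
    print(f"timeout={timeout}s: {hits} server transactions, {volume} bytes")
```

A timeout of zero corresponds to no caching at all. Longer timeouts lower both counts, but each cache must then hold its documents for longer, which is the memory and cache management cost side of the tradeoff.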
We also discuss other issues that caching inevitably poses, such as how to redirect queries initially destined for a central server to a preferred cache site. The choice of preferred cache site may depend not only on geographic proximity but also on the current load on nearby servers or network links. Such refinements in the web architecture will be essential to the stability of the network as the web continues to grow, and operational geographic analysis of queries to archive and library servers will be fundamental to its effective evolution.
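One way to picture a cache site preference that combines proximity and load is a single cost per site, with the query redirected to the site of lowest cost. The following sketch is purely hypothetical: the `CacheSite` record, the site names and coordinates, the additive cost, and the `load_weight_km` parameter (how many kilometers of extra distance a fully loaded site is treated as costing) are assumptions for illustration, not a mechanism proposed here.

```python
import math
from dataclasses import dataclass

@dataclass
class CacheSite:
    name: str      # hypothetical site label
    lat: float     # latitude in degrees
    lon: float     # longitude in degrees
    load: float    # current load as a fraction between 0.0 and 1.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (haversine formula)."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def preferred_site(sites, lat, lon, load_weight_km=5000.0):
    """Pick the site with the lowest cost: distance plus a load penalty.

    Assumption: load is converted to an equivalent distance through
    load_weight_km, so a busy nearby site can lose to an idle distant one.
    """
    def cost(site):
        return great_circle_km(lat, lon, site.lat, site.lon) + load_weight_km * site.load
    return min(sites, key=cost)

# Made-up sites and a client near San Diego, for illustration only.
client = (32.7, -117.2)
idle = [CacheSite("west", 37.4, -122.1, 0.1), CacheSite("east", 40.7, -74.0, 0.1)]
busy = [CacheSite("west", 37.4, -122.1, 0.95), CacheSite("east", 40.7, -74.0, 0.1)]

print(preferred_site(idle, *client).name)  # nearby site wins when loads are equal
print(preferred_site(busy, *client).name)  # heavy load can push the query elsewhere
```

The load penalty could equally be driven by link utilization rather than server load; the additive form is simply the easiest to reason about.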
keywords: traffic analysis, geographic distribution, server workload, caching, accounting