Next: requests per minute Up: Web traffic characterization: an Previous: server statistics for

trace-driven analysis

We used the two days of traces to simulate the effect of caching documents locally upon their first reference to the NCSA server. Our trace-based analysis assumes that each zone maintained its own local cache during those two days. The local cache acts as an intermediary between the client and the central server: upon the first request for a document, the cache agent retrieves it from the central server and keeps a local copy. Subsequent requests for that document that arrive within a given timeout would then require neither the query nor the document to traverse the national backbone to reach the central server; we also investigate the effect of varying this timeout. We then compute the savings such a policy offers in terms of backbone bandwidth and transactions to the central server.
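The per-zone simulation described above can be sketched as follows. This is a minimal illustration and not the code used for the study: the record layout `(timestamp, zone, doc, size)`, the function name `simulate_zone_caches`, and the choice to measure the timeout from the most recent reference are all assumptions made for the example.

```python
from collections import defaultdict


def simulate_zone_caches(trace, timeout):
    """Replay a trace against one hypothetical cache per zone.

    trace   -- iterable of (timestamp_sec, zone, doc, size_bytes) tuples,
               sorted by timestamp (assumed record layout)
    timeout -- seconds a document may go unreferenced and still be served
               from the zone's cache
    """
    # zone -> {doc: time of the most recent reference from that zone}
    last_ref = defaultdict(dict)
    stats = {"requests": 0, "hits": 0, "bytes": 0, "bytes_saved": 0}

    for t, zone, doc, size in trace:
        stats["requests"] += 1
        stats["bytes"] += size

        prev = last_ref[zone].get(doc)
        if prev is not None and t - prev <= timeout:
            # Served locally: neither the query nor the document
            # crosses the backbone.
            stats["hits"] += 1
            stats["bytes_saved"] += size

        # A hit refreshes the copy; a miss fetches and stores it.
        last_ref[zone][doc] = t

    return stats
```

The fraction of central-server transactions avoided is then `hits / requests`, and the fraction of backbone bytes avoided is `bytes_saved / bytes`.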

We note that a cache may save not only bandwidth but also workload on the information server that must process the queries; which of the two matters more depends on the individual network situation. Our results apply to conservation of both resources.

A parameter of particular interest is the cache timeout, or how long a document can remain in the cache without being referenced before it is removed. Using cache timeouts ranging from 2 to 10000 seconds, we track how many requests would require intervention from the central server and how many the local cache could handle. We also keep track of how much memory the local cache would require to hold all documents that have not yet timed out.
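One way to track both quantities for a single cache is sketched below. It is an illustrative sketch under stated assumptions, not the procedure used to produce our results: the trace layout `(timestamp, doc, size)` and the names `simulate_timeout` and `sweep_timeouts` are hypothetical, and a document is treated as removed once it has gone unreferenced for strictly longer than the timeout.

```python
import heapq


def simulate_timeout(trace, timeout):
    """Replay (timestamp_sec, doc, size_bytes) records, sorted by time,
    against one cache that drops documents idle longer than `timeout`."""
    cached = {}   # doc -> (time of last reference, size in bytes)
    expiry = []   # min-heap of (expiry time, doc); may hold stale entries
    cached_bytes = peak_bytes = hits = misses = 0

    for t, doc, size in trace:
        # Remove every document whose idle period ended before time t.
        while expiry and expiry[0][0] < t:
            exp, d = heapq.heappop(expiry)
            # Ignore heap entries made obsolete by a later reference.
            if d in cached and cached[d][0] + timeout == exp:
                cached_bytes -= cached[d][1]
                del cached[d]

        if doc in cached:
            hits += 1                      # the local cache handles it
            cached[doc] = (t, cached[doc][1])
        else:
            misses += 1                    # the central server must intervene
            cached[doc] = (t, size)
            cached_bytes += size

        heapq.heappush(expiry, (t + timeout, doc))
        peak_bytes = max(peak_bytes, cached_bytes)

    return {"hits": hits, "misses": misses, "peak_bytes": peak_bytes}


def sweep_timeouts(trace, timeouts):
    """Run the simulation once per timeout value."""
    trace = list(trace)
    return {timeout: simulate_timeout(trace, timeout) for timeout in timeouts}
```

Calling `sweep_timeouts(trace, [2, 10, 100, 1000, 10000])` would give, for each timeout, the split between central-server and cache-served requests together with the largest cache occupancy observed; the particular list of timeout values here is only an example within the range studied.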

We do not simulate cache replacement algorithms. Our goal is rather to demonstrate the benefits of caching and to determine minimal reasonable cache size requirements. We assume that local disk space for the cache is not a constraint: disk storage is relatively cheap and is more a matter of administrative planning. Cache contention is only an issue if the cached resource is expensive. Since one can purchase a gigabyte hard disk today for under $500, and our measurements show that much smaller caches can yield significant benefits, we did not focus on contention for this resource. As a temporary measure, one can always reduce the cache timeout until additional disk space is procured. We expect, however, that the timeout parameter will matter less for cache contention than for preventing documents from becoming stale when new versions are placed on the original server without the caches being updated.






kc@
Thu Sep 15 22:53:05 PDT 1994