Web Cache Coherence

The Caching Tradeoff

Benefit

Cost

How Distributed File Systems Maintain Coherence

Callbacks are not scalable to the Web

Validation Checks

Staleness

Def: Given a cached copy of a page, let its LAST-GOOD TIME equal the time at which its content was last identical to the master copy of the page. If the cached and master copies are still identical, let the LAST-GOOD TIME equal the current time.

Def: Given a cached copy of a page, let its STALENESS equal the time elapsed since its last-good time.
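
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the `master_changed_at` parameter are hypothetical, and the sketch assumes the master copy never reverts to its earlier content, so the cached copy was last identical to the master at the moment of the first change after the fetch.

```python
from typing import Optional


def last_good_time(now: float, master_changed_at: Optional[float]) -> float:
    """Last-good time of a cached copy, in seconds.

    master_changed_at (hypothetical parameter) is the time of the first
    change to the master copy after the cached copy was fetched, or None
    if the cached and master copies are still identical.
    """
    if master_changed_at is None or master_changed_at > now:
        # Copies are still identical: the last-good time is the current time.
        return now
    return master_changed_at


def staleness(now: float, master_changed_at: Optional[float]) -> float:
    """Staleness = time elapsed since the last-good time."""
    return now - last_good_time(now, master_changed_at)
```

Under these assumptions, a copy whose master has not changed has staleness 0, and a copy whose master changed 30 seconds ago has staleness 30.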

HTTP headers used to maintain coherence

Last-Modified:

If-Modified-Since:

Date:

Pragma: no-cache

Expires:
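
As a rough sketch of how Last-Modified and If-Modified-Since combine into a validation check, the fragment below models the server's decision. The function name `origin_status` is hypothetical and no network traffic is involved. The status codes are the standard HTTP ones: 304 Not Modified tells the cache its copy is still good, and 200 carries a fresh copy.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def origin_status(master_last_modified: str,
                  if_modified_since: Optional[str]) -> int:
    """Status a server would return for a (possibly conditional) GET.

    Both arguments are HTTP date strings such as
    "Mon, 01 Jan 1996 00:00:00 GMT".
    """
    if if_modified_since is None:
        # Unconditional request: always send the full page.
        return 200
    master = parsedate_to_datetime(master_last_modified)
    cached = parsedate_to_datetime(if_modified_since)
    # Master unchanged since the cached copy's timestamp: no body needed.
    return 304 if master <= cached else 200
```

A cache would send the Last-Modified value it stored with its copy as If-Modified-Since. On a 304 it keeps serving the cached copy, and on a 200 it replaces it.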

Existing Mechanisms

Naive (Netscape Navigator)

check always

never check

expiration-based coherence (W3C httpd, Netscape Proxy)
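
The three existing policies differ only in when a cache decides to contact the server before serving its copy. A minimal sketch, with hypothetical policy names and a hypothetical `expires_at` timestamp standing in for whatever expiration time the cache has assigned to the page:

```python
def needs_validation(policy: str, now: float, expires_at: float) -> bool:
    """Decide whether a cache should check with the server before serving.

    policy is one of "check-always", "never-check", or "expiration"
    (names chosen for this sketch only).
    """
    if policy == "check-always":
        return True            # every request pays for a validation check
    if policy == "never-check":
        return False           # fast, but staleness is unbounded
    if policy == "expiration":
        return now >= expires_at  # check only once the copy has expired
    raise ValueError(f"unknown policy: {policy}")
```

The sketch shows the tradeoff directly: "check-always" bounds staleness at the cost of latency on every request, "never-check" does the opposite, and "expiration" sits in between.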

Problems of Expiration-Based Coherence

1) Users must wait for expiration checks to occur

2) When not satisfied with staleness, users can only RELOAD

3) Expiration mechanism provides no hard guarantee on staleness

4) Users can't specify their staleness tolerance

New Coherence Mechanisms

Three independent extensions to existing coherence mechanisms:

Returning a stream of documents for each request

The user's view

Implementation

Browser changes needed to work well with version streams

Proxies can be programmed to work around these limitations for the time being.

More speculative cache coherence techniques

1. Pre-fetching

2. Replication

Replication - the distribution of popular pages to caches even before those pages have been requested through those caches

Berners-Lee's argument why caching alone is inadequate and replication is necessary:

A proposed mechanism for replication

Conclusion

We hope that Web caches will replace all archives and mirrors on the net.

Theoretical work to be done:

Experimental work to be done: