2. Replication
Replication - the distribution of popular pages to caches even before
those pages have been requested through those caches
Berners-Lee's argument for why caching alone is inadequate and replication
is necessary:
- "the nesting of proxy servers increases response time"
- but proposed extensions to HTTP minimize this inefficiency
- even Netscape 2.0 has support for keeping alive HTTP connections
- "unnested proxies will be very numerous and put a high load on
  original servers"
  - but as Web proxy hierarchies become more widespread, users' incentive
    to use caches will only grow
- today, the bottleneck for most page transmission is the Internet itself,
not server load
- "cache hit rates tend to be under 50%"
  - yes, but the pages Berners-Lee wants to replicate (e.g. Olympics results)
    are exactly those most likely to be found in Web caches already
- this is not really so bad - in a hierarchy, many caches cooperate to
provide a high user hit rate, even though each cache itself may have only
a low rate
- we feel that when sufficiently large (continental) caching systems
are built, hit rates will be at least 80-90%
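The keep-alive point above can be sketched as raw HTTP traffic. This is an
illustrative fragment only, assuming the pre-HTTP/1.1 "Connection: keep-alive"
request header (the extension Netscape 2.0 supported); the paths are made up:

```
GET /index.html HTTP/1.0
Connection: keep-alive

(server responds; the TCP connection stays open)

GET /results.html HTTP/1.0
Connection: keep-alive
```

With nested proxies, each hop reuses one open connection for many requests
instead of paying a fresh TCP handshake per request and per hop, which is
what makes the nesting overhead small.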
Web Cache Coherence
(A. Dingle, T. Partl)