The UK (Academic) National Web Cache

Neil Smith, HENSA Unix, University of Kent at Canterbury

This presentation describes a JISC-funded service provided for the benefit of the Higher Education community in the United Kingdom.

Introduction

History

November 1993
Introduced experimental Web Caching service

Summer 1995
Recognised as the UK National Service
About 320,000 requests per day
To about 9,600 distinct sites
From about 2,700 client machines

11th March 1996
New trans-Atlantic link
A total of 1,650,000 requests per day
To about 20,000 distinct sites
From about 8,000 client machines

Software

Lagoon - CERN
First Generation Caches

Netscape - Harvest
Second generation. Offering different facilities.

The Next Generation
Netscape 2
Netcache/Harvest 2

Hardware

CPU Evolution
From Uniprocessor Sparc 10
To Five Challenge S's + One Multiprocessor DM

Disk
Disk is always the bottleneck

RAM
Netscape Proxy is RAM hungry, but how much depends on bandwidth

Network
Distribution across multiple sites
Increases available bandwidth
Improves resilience

Networks

Back then
4Mbps to the US
4Mbps to Europe

Real Soon Now
17Mbps to the US
34Mbps European Triangle

The Future
Dedicated bandwidth promised ...
... not yet delivered.

Users

You can please some of the people all of the time. And you can please all of the people some of the time. But you cannot please all of the people all of the time.

Proxy resilience

A Client Issue
Caches have to work around poor browser implementations
Fall-back or round-robin used in FTP and telnet

Netscape Auto-config
Introduces resilience and configurability
But depends on Netscape Navigator Version 2
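
The resilience comes from letting the client hold an ordered list of proxies rather than a single one. Navigator's auto-config file is a JavaScript function that returns a string such as "PROXY host:port; PROXY host:port; DIRECT". The sketch below models only the fall-back over such a string, in Python; it is not Navigator's code. The host names and the is_alive health check are hypothetical.

```python
def parse_proxy_list(result):
    """Split a 'PROXY host:port; ...; DIRECT' string into ordered entries."""
    entries = []
    for part in result.split(";"):
        fields = part.split()
        if not fields:
            continue
        if fields[0] == "DIRECT":
            entries.append(("DIRECT", None))
        elif fields[0] == "PROXY" and len(fields) == 2:
            entries.append(("PROXY", fields[1]))
    return entries


def choose_route(result, is_alive):
    """Return the first usable entry; is_alive is a caller-supplied check."""
    for kind, target in parse_proxy_list(result):
        if kind == "DIRECT" or is_alive(target):
            return kind, target
    # Nothing usable was listed: assume the client goes direct.
    return "DIRECT", None


# Hypothetical host names, for illustration only.
CONFIG_RESULT = (
    "PROXY cache-a.example.ac.uk:8080; "
    "PROXY cache-b.example.ac.uk:8080; DIRECT"
)
down = {"cache-a.example.ac.uk:8080"}
route = choose_route(CONFIG_RESULT, lambda target: target not in down)
print(route)
```

With the first cache marked as down, the client moves on to the second entry, and only goes direct when no listed proxy responds.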

Cache clusters
Dumb clients limit capabilities
Round-robin DNS with short TTLs
Need 'buddies' to monitor and pick up failed machines
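
A minimal sketch of the buddy idea, assuming each machine has exactly one named buddy that answers on its behalf when it fails. The Cluster class, the machine names, and the ring of buddies are hypothetical; a real cluster would take over network addresses, not Python strings.

```python
from itertools import cycle


class Cluster:
    def __init__(self, buddies):
        # buddies maps each machine to the machine that covers for it.
        self.buddies = dict(buddies)
        self.failed = set()

    def mark_failed(self, machine):
        self.failed.add(machine)

    def serving_machine(self, machine):
        """Follow the buddy chain until a live machine is found."""
        seen = set()
        while machine in self.failed:
            if machine in seen:
                raise RuntimeError("every machine in the buddy chain has failed")
            seen.add(machine)
            machine = self.buddies[machine]
        return machine

    def answers(self, count):
        """Simulate `count` round-robin DNS answers, after buddy takeover."""
        rotation = cycle(sorted(self.buddies))
        return [self.serving_machine(next(rotation)) for _ in range(count)]


# Hypothetical three-machine cluster arranged as a ring of buddies.
cluster = Cluster({"cache1": "cache2", "cache2": "cache3", "cache3": "cache1"})
cluster.mark_failed("cache2")
print(cluster.answers(6))
```

The DNS rotation itself is unchanged, so dumb clients need no knowledge of the failure; the cost is that the buddy carries a double share of the load until the failed machine returns.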

Cache co-operation

Introduced in Harvest
Hand crafted co-operatives
Neighbo(u)r and parent caches
Communication through light-weight UDP protocol
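
The neighbour and parent arrangement can be sketched as a lookup order: local store, then neighbours, then the parent, then the origin server. This is an illustration of that ordering only, not Harvest's implementation. The Cache class and the cache names are hypothetical, and the has() method stands in for the UDP hit/miss query.

```python
class Cache:
    def __init__(self, name, neighbours=(), parent=None):
        self.name = name
        self.store = {}
        self.neighbours = list(neighbours)
        self.parent = parent

    def has(self, url):
        """Stands in for the light-weight hit/miss query between caches."""
        return url in self.store

    def fetch(self, url, origin):
        """Return (body, source) and keep a local copy on the way back."""
        if url in self.store:
            return self.store[url], self.name
        for peer in self.neighbours:
            if peer.has(url):
                # A neighbour is only used when it already holds the object.
                body, source = peer.store[url], peer.name
                break
        else:
            if self.parent is not None:
                # A parent resolves the miss on the child's behalf.
                body, source = self.parent.fetch(url, origin)
            else:
                body, source = origin(url), "origin"
        self.store[url] = body
        return body, source


def origin(url):
    # Placeholder for a real HTTP request to the origin server.
    return "body of " + url


national = Cache("national")
site_b = Cache("site-b", parent=national)
site_a = Cache("site-a", neighbours=[site_b], parent=national)

print(site_b.fetch("http://www.example.org/", origin)[1])
print(site_a.fetch("http://www.example.org/", origin)[1])
print(site_a.fetch("http://www.example.org/", origin)[1])
```

The difference between the two roles is visible in fetch(): a neighbour is asked whether it has the object and is skipped on a miss, while a parent is handed the request and fetches the object itself if necessary.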

But...
Need a unifying and complete protocol
Need to recognise clusters of machines
Broadcast queries do not scale
Hand-built co-operatives cumbersome

Networks for Caches

The Problem
Caching is crucial
Hits give 0.05 to 0.2 second responses
Misses depend on bandwidth
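
To see why miss performance dominates, the average response time can be written as hit_rate * hit_time + (1 - hit_rate) * miss_time. The sketch below works this through. The hit time comes from the range above; the 5 second miss time and the hit rates are assumed values for illustration, not measurements from the service.

```python
def mean_response(hit_rate, hit_time, miss_time):
    """Average response time in seconds for a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_time + (1.0 - hit_rate) * miss_time


HIT_TIME = 0.2   # seconds; slow end of the hit range quoted above
MISS_TIME = 5.0  # seconds; assumed, not a measured figure

for hit_rate in (0.0, 0.25, 0.5, 0.75):
    average = mean_response(hit_rate, HIT_TIME, MISS_TIME)
    print(f"hit rate {hit_rate:.2f}: {average:.2f} s")
```

Because hits are already fast, the average is governed almost entirely by the miss term, which is where bandwidth matters.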

The Solution
Dedicated bandwidth improves miss performance

The Benefits
Encourages use of the cache
Potential savings are enormous
Does it warrant blocking port 80?

HTTP developments

Log exchange
Log information passed to servers
Encourages cache friendly pages
Privacy issues

User configuration
User configures speed vs staleness
Need cache maintainer overrides
Need server overrides
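
One way the three parties could interact, sketched under a stated assumption: the user picks a maximum acceptable age for cached copies, and the cache maintainer and the origin server can each tighten that limit but never loosen it. This precedence rule is an assumption made for the example, not something any protocol defines, and the function and parameter names are hypothetical.

```python
def effective_max_age(user_max_age, maintainer_cap=None, server_max_age=None):
    """Seconds a cached copy may be served without checking the server.

    Assumed rule: the smallest limit wins, so overrides only tighten.
    """
    limit = user_max_age
    if maintainer_cap is not None:
        limit = min(limit, maintainer_cap)
    if server_max_age is not None:
        limit = min(limit, server_max_age)
    return limit


def is_fresh(age, user_max_age, maintainer_cap=None, server_max_age=None):
    """True when a copy of the given age (in seconds) can be served as-is."""
    return age <= effective_max_age(user_max_age, maintainer_cap, server_max_age)


ONE_HOUR = 3600
ONE_DAY = 24 * ONE_HOUR

# A user who prefers speed accepts week-old copies; the others tighten that.
print(effective_max_age(7 * ONE_DAY))
print(effective_max_age(7 * ONE_DAY, maintainer_cap=ONE_DAY))
print(effective_max_age(7 * ONE_DAY, maintainer_cap=ONE_DAY, server_max_age=ONE_HOUR))
```

A user who prefers freshness simply sets a small user_max_age, in which case neither override has any effect.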

Protocol efficiencies
HTTP-NG
HTTP 1.1

Politics and The Law

Politics
This section censored

The Law
Are caches publishers or distributors?
Censorship or promotion?
Copyright?
Lots of untested issues