Real-Time Geographic Visualization of World Wide Web Traffic
Stephen E. Lamm
and
Daniel A. Reed
Department of Computer Science
University of Illinois
Urbana, Illinois 61801
Will H. Scullin
Netscape Communications Corporation
501 East Middlefield Road
Mountain View, California 94043
Introduction
- Growth of the WWW
- Understanding WWW Traffic
- NCSA Traffic Growth & Diversity
- Avatar - visualization & interaction system
WWW Traffic Growth
- Largest segment of Internet traffic
- Fastest growing
- Rapidly expanding commercial applications
- Large demand on servers
- Current: 4-gigabyte WWW document tree with digital audio/video
- Future: Multi-gigabyte, multimedia database
NCSA WWW Server Growth
Current NCSA WWW Server Statistics
- Request types
- Text requests dominate request volume
- Images rival text for data volume
- A few pages account for large volume of requests
- Site distribution
- Educational sites are the largest component
- Commercial sites are a growing fraction
- Commercial gateways are most prolific
NCSA WWW Server Architecture
- Single domain name: www.ncsa.uiuc.edu
- Scalable design
- Each server capable of handling any request
- Receiving 2,000,000 requests per week
NCSA WWW Server Architecture (continued)
- 4/6/8/9/7 HP 735 workstation servers
- 96MB memory
- 130MB AFS cache
- 100 megabit/second FDDI connection
- Andrew File System (AFS)
- 3 Sun SPARC 10 file servers
- Each has 120GB disk space
- Shared with NCSA researchers
- FDDI Ring
- NCSA Backbone
- T3 connection to Internet
Domain Name Server (DNS)
- Resolves domain names (oboe.cs.uiuc.edu)
- Returns IP addresses (128.174.327.130)
- Translations are cached at remote sites
- NCSA's modified DNS distributes IP addresses in round-robin fashion
- Server IP addresses have a recommended 15-minute time-to-live (TTL)
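The round-robin scheme and the client-side caching described on this slide can be sketched as a toy model. This is not NCSA's modified DNS; the class names are hypothetical and the 192.0.2.x addresses are documentation placeholders, not real server addresses. It only illustrates why a remote site keeps hitting one server until its cached translation expires.

```python
from itertools import cycle


class RoundRobinResolver:
    """Toy DNS server that hands out pool addresses in rotation."""

    def __init__(self, addresses, ttl_seconds=15 * 60):
        self._pool = cycle(addresses)    # endless rotation over the pool
        self.ttl_seconds = ttl_seconds   # recommended TTL from the slide

    def resolve(self, name):
        # Each query for the single public name gets the next address in turn.
        return next(self._pool), self.ttl_seconds


class CachingClient:
    """Remote site that caches a translation until its TTL expires."""

    def __init__(self, resolver):
        self.resolver = resolver
        self.cache = {}  # name -> (address, expiry time in seconds)

    def lookup(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]              # still fresh: reuse the same server
        address, ttl = self.resolver.resolve(name)
        self.cache[name] = (address, now + ttl)
        return address


resolver = RoundRobinResolver(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
client = CachingClient(resolver)
first = client.lookup("www.ncsa.uiuc.edu", now=0)
cached = client.lookup("www.ncsa.uiuc.edu", now=899)
refreshed = client.lookup("www.ncsa.uiuc.edu", now=900)
```

A client that ignores the TTL and caches for longer would keep sending every request to the same server, which is one way load can become unbalanced.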
WWW Performance Data
- Hypertext Transfer Protocol Daemon (HTTPD) log files
- What: Document accesses
- How: Agents
- Why: Referrers
- Whoops: Errors
- Access log files
a.com - - [05/Apr/1996:17:10:12] "GET t.gif HTTP/1.0" 200 512
- IP domains
- com, edu, gov, net, org, and two-letter country codes
- Type classifications
- Text, graphics, audio, video, scientific
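As a minimal sketch of how an access log line like the one above could be split into fields and classified, consider the following. The regular expression targets only the layout shown on the slide, and the extension-to-type table is an assumption made for illustration; the slides do not say how the original system assigned file types.

```python
import os
import re
from datetime import datetime

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

GENERIC_DOMAINS = {"com", "edu", "gov", "net", "org"}

# Assumed mapping from file extension to the slide's type classes.
FILE_TYPES = {
    ".html": "text", ".txt": "text",
    ".gif": "graphics", ".jpg": "graphics",
    ".au": "audio", ".wav": "audio",
    ".mpg": "video", ".mov": "video",
    ".hdf": "scientific",
}


def classify_domain(host):
    """Return the generic domain, 'country:xx', or 'unknown'."""
    suffix = host.rsplit(".", 1)[-1].lower()
    if suffix in GENERIC_DOMAINS:
        return suffix
    if len(suffix) == 2 and suffix.isalpha():
        return "country:" + suffix
    return "unknown"  # e.g. a bare IP address


def classify_file(path):
    """Map a requested path to a type class by its extension."""
    extension = os.path.splitext(path)[1].lower()
    return FILE_TYPES.get(extension, "other")


def parse_access_line(line):
    """Parse one access log line into a dict, or return None on mismatch."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    record["domain_class"] = classify_domain(record["host"])
    record["file_class"] = classify_file(record["path"])
    return record


record = parse_access_line(
    'a.com - - [05/Apr/1996:17:10:12] "GET t.gif HTTP/1.0" 200 512'
)
```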
Geographic Location Mapping
- Reveal temporal and spatial access patterns
- InterNIC whois database
- Cache look-ups
- 95.0% Map to location
- 4.5% Location not found
- 0.5% Search required
- Real-time performance
- Easily supports 30-50 requests per second
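One way to read the percentages above is that most look-ups are answered from a local cache, including remembered failures, and only a small remainder needs a slow database search. The sketch below follows that reading, which is an assumption; `LocationCache` and the `slow_lookup` callback are hypothetical names, and the sample coordinates are the Urbana values that appear in the SDDF record later in this deck.

```python
class LocationCache:
    """Cache of domain -> location, falling back to a slow search on a miss."""

    def __init__(self, slow_lookup):
        self._cache = {}
        self._slow_lookup = slow_lookup  # e.g. a whois query (hypothetical)
        self.stats = {"cached": 0, "not_found": 0, "searched": 0}

    def locate(self, domain):
        key = domain.lower()
        if key in self._cache:
            location = self._cache[key]
            # A remembered failure is counted as "not found", not as a hit.
            self.stats["cached" if location is not None else "not_found"] += 1
            return location
        location = self._slow_lookup(key)
        self._cache[key] = location      # remember failures as well
        self.stats["searched"] += 1
        return location


def fake_whois(domain):
    """Stand-in for the real database search, used only for this example."""
    known = {"uiuc.edu": (40.112, -88.200, "URBANA")}
    return known.get(domain)


cache = LocationCache(fake_whois)
results = [
    cache.locate("UIUC.EDU"),         # first sight: search required
    cache.locate("uiuc.edu"),         # answered from the cache
    cache.locate("nowhere.example"),  # search fails
    cache.locate("nowhere.example"),  # failure answered from the cache
]
```

Caching failures matters for real-time use: without it, every request from an unmappable site would trigger another slow search.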
Whois Entry Example
Computing and Communications Services Office (UIUC-DOM)
1120 Digital Computer Laboratory
1304 West Springfield Avenue
Urbana, IL 61801-2910
Domain Name: UIUC.EDU
Administrative Contact:
Krol, Ed (EK10) e-krol@UIUC.EDU
(217) 333-7886 (FAX) 217-244-7089
Technical Contact, Zone Contact:
Joyner, Reece (RJ87) rjoyner@UIUC.EDU
217-244-7686 (FAX) 217-244-7089
Record last updated on 28-Aug-95.
Record created on 18-Jul-85.
Domain servers in listed order:
ARGUS.CSO.UIUC.EDU 128.174.5.58
CYCLOPS.CSO.UIUC.EDU 128.174.36.254
IUGATE.UCS.INDIANA.EDU 129.79.1.9
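A whois entry like this one carries the city and state in its postal address, which is what makes geographic mapping possible. The following is a rough sketch of extracting those fields; it assumes a US-style "City, ST ZIP" line and is not the authors' parser. The function name and patterns are illustrative only.

```python
import re

# Abbreviated copy of the entry above, used as sample input.
SAMPLE_ENTRY = """\
Computing and Communications Services Office (UIUC-DOM)
1120 Digital Computer Laboratory
1304 West Springfield Avenue
Urbana, IL 61801-2910
Domain Name: UIUC.EDU
Administrative Contact:
Krol, Ed (EK10) e-krol@UIUC.EDU
"""

DOMAIN_RE = re.compile(r"^[ \t]*Domain Name:[ \t]*(\S+)", re.MULTILINE)
CITY_RE = re.compile(
    r"^[ \t]*(?P<city>[A-Za-z .'-]+),[ \t]+(?P<state>[A-Z]{2})[ \t]+"
    r"(?P<zip>\d{5}(?:-\d{4})?)[ \t]*$",
    re.MULTILINE,
)


def parse_whois(entry):
    """Return domain, city, state, and ZIP code, or None if either is missing."""
    domain = DOMAIN_RE.search(entry)
    place = CITY_RE.search(entry)
    if domain is None or place is None:
        return None
    return {
        "domain": domain.group(1).lower(),
        "city": place.group("city").strip(),
        "state": place.group("state"),
        "zip": place.group("zip"),
    }


parsed = parse_whois(SAMPLE_ENTRY)
```

Addresses outside the United States, or entries whose registrant is in a different city from its users, would need extra handling.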
Real-time Analysis Architecture
Avatar Immersive Environment
- Data interaction environment
- Flexibility and extensibility
- Multiple display metaphors
- Scattercube
- Time tunnel
- Globe and map displays
- Multiple interaction techniques
- Multiple hardware support
- Head-mounted display (HMD)
- CAVE
- Workstation with stereo glasses
NCSA CAVE Virtual Reality Theater
- Features
- 3 x 3 x 2.75 meter cube
- High resolution rear-projection
- Supports multiple viewers
- Viewed with lightweight stereo glasses
Globe VR Metaphor
- Point-to-point arcs
- Stacked bars
- Height
- Color bands
- File type, domain classes, servers, time intervals
- Position
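Placing a stacked bar on the globe comes down to converting latitude and longitude to a point on a sphere and extending the bar radially. The sketch below uses the standard spherical-to-Cartesian conversion; the unit radius, the band values, and the function names are assumptions for illustration, not details of Avatar's renderer.

```python
import math


def globe_point(lat_deg, lon_deg, radius=1.0):
    """Convert latitude/longitude in degrees to (x, y, z) on a sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        radius * math.cos(lat) * math.cos(lon),
        radius * math.cos(lat) * math.sin(lon),
        radius * math.sin(lat),
    )


def stacked_bar(lat_deg, lon_deg, bands, scale=1.0):
    """Return (label, base_point, top_point) segments stacked outward.

    bands is a list of (label, value) pairs, one per color band.
    """
    segments = []
    base_radius = 1.0  # start at the globe's surface
    for label, value in bands:
        top_radius = base_radius + value * scale
        segments.append((
            label,
            globe_point(lat_deg, lon_deg, base_radius),
            globe_point(lat_deg, lon_deg, top_radius),
        ))
        base_radius = top_radius  # next band starts where this one ends
    return segments


origin = globe_point(0.0, 0.0)
pole = globe_point(90.0, 0.0)
bar = stacked_bar(0.0, 0.0, [("text", 0.2), ("graphics", 0.3)])
```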
Avatar WWW Controls
Interaction
- Buttons
- Limited by number of buttons
- Good for common commands
- Menus
- Familiar to most users
- Easy to extend
- Can be disabled for unobstructed view
- Voice
- Must know commands
- Takes less space than menus
- More natural interaction
Pablo/Avatar Data Format
- Self Defining Data Format (SDDF)
- Descriptors
- Record instances
- Flexibility
- Add new types
- Extend existing types
- Two processing options
- Real-time analysis (socket data streams)
- Off-line analysis (data files)
Self Defining Data Format (SDDF) Example
- Record Descriptor
SDDFA
#1:
"Mosaic_Metric" {
int "time";
int "server";
int "size";
int "file_type";
int "domain_type";
float "latitude";
float "longitude";
char "city"[];
char "state"[];
char "country"[];
char "hostname"[];
};;
- Record Instance
"Mosaic_Metric" {
1300, 1, 12000, 2, 3, 40.112, -88.200,
[6] "URBANA", [2] "IL", [3] "USA",
[8] "uiuc.edu"
};;
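To illustrate the record layout, here is a small sketch that renders a list of Python values in the same textual shape as the record instance above. It mimics only what the slide shows: strings carry a bracketed length prefix and floats are printed to three decimals. It is not the Pablo SDDF library, and `format_record` is a hypothetical helper.

```python
def format_value(value):
    """Render one field the way the slide's record instance shows it."""
    if isinstance(value, str):
        return '[{}] "{}"'.format(len(value), value)  # length-prefixed array
    if isinstance(value, float):
        return "{:.3f}".format(value)
    return str(value)


def format_record(name, values):
    """Render a whole record instance on one line."""
    body = ", ".join(format_value(value) for value in values)
    return '"{}" {{ {} }};;'.format(name, body)


line = format_record(
    "Mosaic_Metric",
    [1300, 1, 12000, 2, 3, 40.112, -88.200, "URBANA", "IL", "USA", "uiuc.edu"],
)
```

Because each record names its descriptor, a consumer reading a socket stream or a data file can match fields by position against the descriptor it saw earlier.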
WWW Patterns (August 22, 1995 at 6 AM)
Analysis Experiences
- Wide variation in request frequency
- Distribution along population lines
- Business hours create most requests
- Time of day affects file type requested
- Server load imbalance
- Recommended time-to-live not honored
- Advantage of visualization
Research Directions
- Multiple servers
- Improved granularity
- Incorporate new metrics
- Refined IP address mapping
- Exceptions common
- City databases
- Demographic Data Mining
- Distributed Query Analysis
- Java and VRML
Conclusions
- Growing demand on WWW servers
- Access patterns are an important performance determinant
- Expanded Avatar to visualize WWW server data
- Helpful tool for WWW analysis
Further Information