
server statistics for 2-3 August 1994

Table 1 shows the top 25 documents requested from the NCSA Mosaic server for 2-3 August 1994, with the number of requests for each document and the total number of bytes the server sent for it. A total of 8,949 unique documents were requested during the two-day period, each between 1 and 129,917 times. Together these 8,949 documents comprise 398 megabytes, the size of the "active document set" for this period. The top 25 documents accounted for 59% of all document requests and 45% of the bytes requested during the two-day period, yet together they comprise only 260,112 bytes: 0.065% of the bytes in all files requested, and 0.0019% of the total bytes the server sent during the two days. These initial statistics suggest that caching a very small fraction of the NCSA server's documents at regional sites could save a significant portion of the overall traffic leaving the NCSA server.
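The shares above can be computed from a request log in a few lines. The following is a minimal Python sketch, not the tooling used for this study: it assumes the log has already been reduced to hypothetical (document, bytes_sent) pairs, one per request, and that a document is always sent whole at a single size, so the observed size stands in for the document's size. The last line checks the 0.065% figure, assuming 1 megabyte = 10^6 bytes.

```python
from collections import Counter, defaultdict


def top_n_shares(log, n=25):
    """Summarize a log of (document, bytes_sent) pairs, one pair per request."""
    requests = Counter()
    bytes_sent = defaultdict(int)
    size = {}
    for doc, nbytes in log:
        requests[doc] += 1
        bytes_sent[doc] += nbytes
        size[doc] = nbytes  # assumes a document is always sent whole, at one size

    top = [doc for doc, _ in requests.most_common(n)]  # the n most-requested documents
    return {
        "unique_documents": len(requests),
        "active_set_bytes": sum(size.values()),  # size of the active document set
        "top_request_share": sum(requests[d] for d in top) / sum(requests.values()),
        "top_byte_share": sum(bytes_sent[d] for d in top) / sum(bytes_sent.values()),
        "top_set_bytes": sum(size[d] for d in top),  # combined size of the top n
    }


# Toy log with hypothetical documents and sizes, summarized for the top 1 document.
toy_log = [("/index.html", 1000)] * 6 + [("/icon.gif", 200)] * 3 + [("/big.ps", 50000)]
stats = top_n_shares(toy_log, n=1)
print(stats["top_request_share"])  # 0.6

# Check of the 0.065% figure in the text, assuming 1 MB = 10**6 bytes.
print(f"{260112 / 398e6:.3%}")  # 0.065%
```

In the toy log one document draws 60% of requests but only about 11% of bytes, because a single large file dominates the byte count; the NCSA numbers show the same gap between request share and byte share.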

Figure 2 shows the distribution of retrieved document sizes. The mean document size, 17 kilobytes, is not very representative, because a few very large documents skew the distribution: the largest document is over 17 megabytes (requested only once), while the 95th percentile is only 57 kilobytes and the median only 3 kilobytes. The top 25 documents range from 198 bytes to 59 kilobytes; interestingly, the smallest of these is a GIF image (of the kind often used as an icon) and the largest is the NCSA What's New page. Table 1 indicates that most of the very popular files are under 10 kilobytes, small relative to the sizes of other NCSA files.
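To illustrate why the mean misleads here, this minimal Python sketch computes the same summary statistics on a hypothetical size list (not the NCSA data): a single 17 MB outlier among otherwise modest documents pulls the mean above even the 95th percentile, while the median and 95th percentile stay in the tens of kilobytes. The percentile uses the default (exclusive) interpolation of the standard library's `statistics.quantiles`; the text does not say how its percentiles were computed.

```python
import statistics


def size_summary(sizes):
    """Mean, median, 95th percentile, and maximum of a list of sizes in bytes."""
    return {
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
        "p95": statistics.quantiles(sizes, n=20)[-1],
        "max": max(sizes),
    }


# Hypothetical sizes: 100 documents of 1 KB to 100 KB plus one 17 MB outlier.
sizes = [kb * 1000 for kb in range(1, 101)] + [17_000_000]
summary = size_summary(sizes)
print(summary["median"], summary["p95"], round(summary["mean"]))
```

The median and percentile depend only on the ordering of the middle and upper ranks, so one enormous file moves them hardly at all, whereas it contributes its full size to the mean.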

To assess the geographic sources of queries, we divide the continental US into eight zones, as shown in Figure 3, and use a ninth zone for Alaska and Hawaii and a tenth zone for all non-US and unknown sites. Table 2 shows use of the NCSA Mosaic server for 2-3 August 1994 by these zones; the appendix includes a table of use by state.
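A minimal Python sketch of the zone tabulation, assuming each request has already been resolved to a two-letter US state code, or to None for non-US and unknown sites (the text does not describe how sites were located). The assignment of continental states to zones 1-8 comes from Figure 3 and is not reproduced here, so it is passed in as a mapping; the one in the usage example is hypothetical. Zones 9 and 10 follow the text: Alaska and Hawaii, and all non-US or unknown sites.

```python
from collections import Counter

ALASKA_HAWAII_ZONE = 9
OTHER_ZONE = 10  # all non-US and unknown sites


def requests_by_zone(request_states, continental_zone):
    """Count requests per zone.

    request_states: one entry per request, a two-letter US state code or None.
    continental_zone: dict mapping each continental state code to a zone 1-8.
    """
    counts = Counter()
    for state in request_states:
        if state in ("AK", "HI"):
            counts[ALASKA_HAWAII_ZONE] += 1
        elif state in continental_zone:
            counts[continental_zone[state]] += 1
        else:  # non-US, unresolved, or unrecognized code
            counts[OTHER_ZONE] += 1
    return counts


# Hypothetical zone assignments for illustration only (not those of Figure 3).
example_zones = {"CA": 1, "IL": 5}
zone_counts = requests_by_zone(["IL", "IL", "CA", "HI", None], example_zones)
print(zone_counts)
```

Folding unrecognized codes into the tenth zone keeps every request counted exactly once, so the zone totals always sum to the total number of requests.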





kc@
Thu Sep 15 22:53:05 PDT 1994