The influence of geographical and cultural issues on
the cache proxy server workload
Virgílio F. Almeida,
Márcio G. Cesário,
Rodrigo C. Fonseca,
Wagner Meira Jr., and
Cristina D. Murta*
Computer Science Department, Federal University of Minas Gerais,
Belo Horizonte, MG, Brazil
virgilio@dcc.ufmg.br,
magc@dcc.ufmg.br,
rfonseca@dcc.ufmg.br,
meira@dcc.ufmg.br, and
cristina@dcc.ufmg.br
Abstract
A key characteristic of the Internet is its global diffusion, marked by
rapid growth in the number of hosts and international links around the
world. This diffusion has been accompanied by serious performance
problems, such as long response times, server overload, and network
congestion. Caching has been used as a standard solution to minimize
these problems. In this paper, we analyze logs of caching proxy servers
and show evidence that geographical, cultural, and social issues strongly
influence the workload of a proxy server. Therefore, the cultural and
social context provides relevant information for planning efficient
caching proxy architectures.
Keywords
WWW; Caching; Internationalization; Internet; Performance
A key characteristic of the Internet is its global diffusion,
marked by rapid growth in the number of hosts and international
links around the world [10].
The fastest-growing nations between 1996 and 1997 were Japan,
Malaysia, Singapore, Korea, and Brazil, confirming the
widespread penetration of the information society.
However, [8] shows that the exponential growth of the Internet has
been accompanied by serious performance problems. To minimize these
problems, caching proxy servers have been used to reduce server
overload and network congestion by bringing data as close to the
client as possible.
The way users access the Internet depends heavily on the telecommunication
infrastructure and social context of each country. Thus, to
understand the performance behavior of the WWW, one must consider
geographical, cultural, and social issues. Based on an analysis of
logs from different proxy servers, this paper presents evidence of the
influence of these issues on the workload of caching proxy servers.
Our approach is to interpret statistics drawn from the logs of
a busy Brazilian proxy server in light of geographical and cultural issues.
Web cache workload characteristics have been studied in several
works [2,5], but none of them mentions geographical, social, or
cultural influences. Some studies help to explain the relevance of
geographical aspects to caching [6,9].
To our knowledge, no previous work investigates the relationship
between cache workload and cultural and social issues.
The history of the Internet in Brazil dates back to 1989, with the
implementation of the backbone of the National Research Network
(RNP), which provides Internet access throughout the country.
Points of Presence (POPs) were created in most states to
provide universities and institutions with a link to the Internet.
Like other countries, Brazil has seen exponential growth of
the Internet in its territory. As of January 1997, Brazil ranked
19th in the world in number of hosts, and 3rd in the Americas,
after the US and Canada. According to [11], the number of .com
hosts under the Brazilian national domain (.com.br) grew 1947%
from January 1996 to July 1997.
Cache proxy servers around the world exhibit different access
patterns. Using data available at [4], we compiled statistics for
cache proxy servers in several countries. We observed that in five
countries (the USA, Brazil, Japan, Italy, and Taiwan), the majority
of accesses are to their national domain. In other countries, however,
such as Belgium and the Netherlands, most accesses are directed to
the .com domain. The hit ratio for objects in the national
domain is always higher than that for objects from other domains.
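Comparisons like these require grouping each request by the top-level domain of the requested host. A minimal sketch follows, assuming the log has already been reduced to a list of request URLs; the function names are hypothetical and this is not necessarily how the statistics at [4] were produced.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_level_domain(url: str) -> str:
    """Return the last label of the URL's host, e.g. 'br' or 'com'."""
    host = urlsplit(url).hostname or ""  # hostname is already lowercased
    # Numeric IP addresses carry no domain; group them separately.
    if host and host.replace(".", "").isdigit():
        return "(ip)"
    return host.rsplit(".", 1)[-1] if host else "(none)"


def domain_shares(urls):
    """Fraction of requests directed to each top-level domain."""
    counts = Counter(top_level_domain(u) for u in urls)
    total = sum(counts.values())
    return {tld: n / total for tld, n in counts.items()} if total else {}
```

In this sketch a host such as www.uol.com.br counts toward .br, so national subdomains like .com.br are folded into the national domain rather than into .com.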
We analyze the influence of cultural characteristics on the proxy
server workload by studying 4,235,311 requests to the POP-MG proxy
server. The server has a total bandwidth of 7 Mbps and a measured
average total traffic rate close to 5 Mbps.
POP-MG is the main gateway to the Internet for twenty-five
universities and a hundred business organizations, including Internet
Service Providers. The requests correspond to a ten-day period of
operation and come from both commercial (40%) and educational (60%)
organizations. More than 25 gigabytes of data were transferred.
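As a rough back-of-the-envelope reading of these figures, assuming the ten-day interval is continuous, the mean request arrival rate and the utilization of the 7 Mbps bandwidth are approximately:

```latex
\lambda = \frac{4\,235\,311\ \text{requests}}{10 \times 86\,400\ \text{s}} \approx 4.9\ \text{requests/s},
\qquad
\rho = \frac{5\ \text{Mbps}}{7\ \text{Mbps}} \approx 0.71
```

Note that the utilization refers to the average total traffic, so it says nothing by itself about the hourly peaks discussed later.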
Table 1. Statistics for accesses to the first-level proxy of POP-MG
(percentages marked "% Req" are relative to all requests; those marked "% Obj" are relative to all objects)

| Metric | All | .br | .com |
|---|---|---|---|
| Requests | 4,235,311 (100% Req) | 2,146,625 (50.2% Req) | 1,402,558 (33.2% Req) |
| Objects | 1,079,044 (100% Obj) | 319,937 (29.7% Obj) | 518,256 (48.0% Obj) |
| Accesses/object | 3.92 | 6.70 | 2.70 |
| 1-access objects | 709,759 (65.8% Obj) | 180,385 (16.7% Obj) | 349,737 (32.4% Obj) |
| Non-first accesses | 74.52% | 85.10% | 63.05% |
| Hit ratio | 47% | 58% | 36% |
Table 1 displays the workload statistics.
The column labeled "All" covers all requests handled by the proxy during
the period, and the other two columns represent the two most accessed
domains, .br and .com, which together account for more than 80% of the
accesses.
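Two rows of Table 1 can be derived from the request and object counts. The sketch below assumes that "Non-first accesses" means the fraction of requests that are not the first request to an object; the values in the table are consistent with that reading. Since every first request must miss, this fraction can be read as an upper bound on the hit ratio for an unbounded cache that never invalidates objects. The function and constant names are hypothetical.

```python
def derived_metrics(requests: int, objects: int) -> dict:
    """Derive Table 1 rows from request and distinct-object counts."""
    return {
        "accesses_per_object": requests / objects,
        # Each object's first request is a compulsory miss; only the
        # remaining (requests - objects) requests could ever be hits.
        "non_first_fraction": (requests - objects) / requests,
    }


# (requests, distinct objects) per column, copied from Table 1.
TABLE_1 = {
    "All": (4_235_311, 1_079_044),
    ".br": (2_146_625, 319_937),
    ".com": (1_402_558, 518_256),
}

for column, (req, obj) in TABLE_1.items():
    m = derived_metrics(req, obj)
    print(f"{column:5} {m['accesses_per_object']:.3f} {m['non_first_fraction']:.2%}")
```

The gap between the non-first fraction and the measured hit ratio (for example 85.10% versus 58% for .br) reflects effects such as finite cache size, object expiration, and uncacheable responses.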
Results available in [1] show that, in some countries (e.g., Brazil,
the USA, Japan, and Italy), the majority of accesses are to the
national domain. Analyzing the results of the Brazilian Internet User
Survey [7] and considering the telecommunication infrastructure, we
offer the following explanations for the high percentage of accesses
to the .br domain:
(1) only 58% of the users speak English and are thus able to use
English-language sites; (2) most Brazilian users are interested in news
(80%), scientific information (67%), music (67%), and adult entertainment
(61%), topics heavily tied to regional culture; and (3)
accesses to Brazilian sites are usually faster, since they do not require
traversing busy international links.
The second observation regards the average number of accesses per
object. Both this average and the hit ratio are much higher for
Brazilian objects (6.7 and 58%, respectively) than for objects from
the .com domain (2.7 and 36%, respectively). This is explained not
only by the number of accesses to .br sites, but also by the fact that
the number of distinct Brazilian objects (319,937) is significantly
smaller than the number of distinct objects from the .com domain
(518,256).
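Per-domain hit ratios like these can be computed directly from proxy log lines. The paper does not say which proxy software POP-MG ran or how hits were counted, so the sketch below is an assumption throughout: it parses Squid's native access.log layout (time, elapsed, client, code/status, bytes, method, URL, ...) and treats any result code containing "HIT" (such as TCP_HIT or TCP_MEM_HIT) as a hit, which is a simplification.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def tld(url: str) -> str:
    """Last label of the requested host, e.g. 'br' or 'com'."""
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1] if host else "(none)"


def hit_ratio_by_domain(lines):
    """Per-TLD hit ratio from Squid-style native access.log lines.

    Assumed fields: time elapsed client code/status bytes method URL ...
    A request is counted as a hit when its result code contains 'HIT'.
    """
    hits = defaultdict(int)
    total = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed or truncated lines
        result_code = fields[3].split("/", 1)[0]  # 'TCP_HIT/200' -> 'TCP_HIT'
        domain = tld(fields[6])
        total[domain] += 1
        if "HIT" in result_code:
            hits[domain] += 1
    return {d: hits[d] / total[d] for d in total}
```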
By examining the POP-MG and NLANR logs, we found a significant
difference in the popularity of HTTP-based chat sites in Brazil.
Accesses to sites with chat applications correspond to 4.9% of the
total accesses recorded in the POP-MG log, whereas in the US, requests
to chat sites represent 1.2% of the accesses in the NLANR logs.
It is worth noting that Web chat sites are among the most popular
sites in Brazil. This characteristic is important for caching projects,
because chat pages are dynamic and cannot be cached.
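The paper does not describe how chat sites were identified, so the sketch below uses a hypothetical rule (the host name contains "chat") purely for illustration. It also includes a simple dynamic-content test in the spirit of the cgi-bin/query-string rule found in early Squid configurations; a real classifier would need a curated list of chat hosts and response headers.

```python
from urllib.parse import urlsplit

# Hypothetical markers for chat hosts; not the paper's actual criterion.
CHAT_MARKERS = ("chat",)


def is_chat(url: str) -> bool:
    """Tag a request as going to a chat site by its host name."""
    host = urlsplit(url).hostname or ""
    return any(marker in host for marker in CHAT_MARKERS)


def looks_dynamic(url: str) -> bool:
    """Heuristic: query strings and cgi-bin paths are treated as uncacheable."""
    parts = urlsplit(url)
    return bool(parts.query) or "cgi-bin" in parts.path


def chat_share(urls) -> float:
    """Fraction of requests directed to (heuristically detected) chat sites."""
    urls = list(urls)
    if not urls:
        return 0.0
    return sum(is_chat(u) for u in urls) / len(urls)
```

Read this way, the figures above mean that chat requests make up roughly four times as large a share of traffic at POP-MG (4.9%) as at NLANR (1.2%), and that this share is lost to the cache regardless of its size.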
Our last observation regards the telecommunication infrastructure.
In Brazil, telecommunication services are more expensive than in the US.
Thus, most Internet users tend to browse the WWW during periods
when telephone rates are lower. As a consequence, we
observed heavy peak loads in the low-rate periods.
Using the logs, we calculated the hourly arrival rates for the
NLANR and POP-MG proxy servers.
We noticed high variability in the load, which we attribute to the
different tariff schemes in the two countries (i.e., Brazil and the US).
The traffic patterns seen at POP-MG follow the phone rate variations:
during the least expensive period, the peak arrival rate
is 116% higher than the average rate, whereas at the NLANR servers
the peak exceeds the average by only 46%.
This type of information is therefore useful for planning the capacity
of proxy servers, which must be able to handle the peak load.
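The peak-to-average comparison can be sketched as follows, assuming Unix timestamps for each request and a fixed UTC offset for the proxy's local time (both assumptions; the paper does not give its exact procedure). Requests are binned by hour of day, and the excess is the busiest hour's count divided by the mean hourly count, minus one, so a value of 1.16 corresponds to the 116% reported for POP-MG. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def hourly_counts(timestamps, utc_offset_hours=0):
    """Number of requests in each hour of day (index 0-23), local time."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    counts = [0] * 24
    for ts in timestamps:
        counts[datetime.fromtimestamp(ts, tz).hour] += 1
    return counts


def peak_excess(counts):
    """How far the busiest hour exceeds the mean hourly rate (1.16 = 116%)."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0
    return max(counts) / mean - 1.0


def required_capacity(mean_rate, excess):
    """Arrival rate a server must sustain to absorb the peak hour."""
    return mean_rate * (1.0 + excess)
```

With the reported figures, a server sized this way needs about 2.16 times its mean request rate at POP-MG, against 1.46 times at NLANR, which is why sizing for the average alone would underestimate the Brazilian peak.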
In this paper, we have analyzed the logs of a busy cache proxy
server in light of geographical and cultural issues, such as language,
social interaction, and the cost of bandwidth. We noted
a correlation between national characteristics (taking Brazil as our
example) and the quantitative behavior of a cache proxy server,
represented by the percentage of accesses to the national domain, the
hit ratio for each domain, and pronounced traffic peaks.
As noted by [3], Brazilians naturally like to chat, and this is
reflected in a high percentage of accesses to chat sites compared
with an American server. Language and interest in regional information,
as reported by a Brazilian WWW user survey [7], together with the
limited bandwidth of international links, serve to explain the
high percentage of accesses from Brazilian users to pages in Brazilian
sites and the high hit ratios observed in the POP-MG cache.
The tariff scheme adopted by the local phone company, a strong
geographical factor, was found to have a significant influence on the
traffic patterns of the POP-MG cache server.
These conclusions are being used to define the architecture of the
POP-MG cache proxy hierarchy (e.g., domain-based caching), as well as
to size cache capacity to handle load peaks.
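Domain-based caching can be illustrated with a small routing rule that sends requests for national hosts to one parent cache and everything else to another. This is only a sketch under stated assumptions: the parent cache names are hypothetical placeholders, not POP-MG hosts, and the paper does not specify the rule actually adopted.

```python
from urllib.parse import urlsplit

# Hypothetical parent caches; placeholders, not real POP-MG machines.
NATIONAL_CACHE = "cache-br.example"
INTERNATIONAL_CACHE = "cache-intl.example"


def choose_parent(url: str, national_suffix: str = ".br") -> str:
    """Route a request to a parent cache according to the host's domain."""
    host = (urlsplit(url).hostname or "").rstrip(".")  # drop a trailing root dot
    if host.endswith(national_suffix):
        return NATIONAL_CACHE
    return INTERNATIONAL_CACHE
```

Splitting the hierarchy this way keeps the highly reused .br objects in a cache that is not competing for space with the larger, less reused .com population, which is one way the per-domain differences in Table 1 could inform the design.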
References
[1] V. Almeida, M. Cesário, R. Fonseca, W. Meira Jr., and C. Murta, The influence of geographical and cultural issues on the cache proxy server workload, http://www.dcc.ufmg.br/anades/submissions/habits/
[2] A. Bestavros, C.R. Cunha, and M.E. Crovella, Characteristics of WWW client-based traces, Technical Report TR-95-010, Boston University Computer Science Department, 1995.
[3] M. Eakin, Brazil: The Once and Future Country, St. Martin's Press, New York, NY, 1997.
[4] National Laboratory for Applied Network Research, Cache statistics pages, http://ircache.nlanr.net/Cache/cache-stats-links.html
[5] M. Abrams, G. Abdulla, E.A. Fox, and S. Williams, WWW proxy traffic characterization with application to caching, Technical Report 97-04, Virginia Tech, Computer Science Department, 1997.
[6] J. Gwertzman and M. Seltzer, The case for geographical push-caching, in: Proc. of the 5th Annual Workshop on Hot Topics in Operating Systems, May 1995, pp. 51-55.
[7] IBOPE, 2a. Pesquisa Cadê?/IBOPE (2nd Cadê?/IBOPE survey), http://www.ibope.com.br/cade97/welcome.htm
[8] C. Kehoe and J. Pitkow, Surveying the territory: GVU's five WWW user surveys, The World Wide Web Journal, 1(3), 1996.
[9] M. Nabeshima, The Japan cache project: an experiment on domain cache, in: Proc. 6th International World Wide Web Conference, 1997.
[10] L. Press, Tracking the global diffusion of the Internet, Communications of the ACM, 40(11): 11-17, November 1997.
[11] Brazilian Science and Technology Ministry, Hosts por Domínio (Hosts by domain), http://www.gt-er.cg.org.br/estatisticas/hosts/tab-host.html
Footnotes
* Supported by a grant from CAPES, Brazil. Permanent address: Departamento de Informática, Universidade Federal do Paraná, Brazil.