Email: haf93@aber.ac.uk
URL: http://www.dcs.aber.ac.uk/
This poster describes work carried out to determine the types of activity (e.g. research, teaching, leisure) for which JANET resources are being used.
JANET is a major investment of UK public funds. This investment has, historically, been justified by the impressive growth in the use of the network, and examples of successful use. In the future this may not be sufficient; continuing and growing investment may well need to be justified by more extensive understanding of the benefits to users of the network.
Before we can understand the benefits, we need to understand what tasks are being carried out, and even this is currently ill-understood.
Several approaches to identifying the types of activities being carried out using JANET resources have been considered:
The first approach, interrogating network users or service providers by means of questionnaires, can provide useful information so long as the respondents give an accurate account of their reasons for using the network resources. However, this approach breaks down because the number of people using the network is growing so rapidly that it is difficult to identify everyone responsible for the traffic travelling through JANET at any one time.
The second approach, traffic analysis, involves examining a representative proportion of the packets transmitted through the network and categorising them by the application that generated them, as identified by the port numbers they contain. This approach captures all of the traffic, but has the disadvantage that only well-established applications have designated port numbers. In addition, the port numbers only differentiate between applications (e.g. ftp, WWW) and do not distinguish between categories of use (e.g. research, teaching).
The final option is the analysis of the log information available at end-user application servers such as those for ftp and WWW. This process involves identifying the files that have been transmitted by the servers and then categorising those files on the basis of their content and their context within the other information available at the server. This approach yields, for each category, both the number of events carried out and the amount of traffic transmitted. The main problems with this approach are that it needs to be adapted for each application type and that it relies on the server providers making the log information available.
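As an illustration of the port-based traffic analysis described above as the second approach, the following is a minimal sketch rather than the tooling used in this work. It assumes each sampled packet has already been reduced to a hypothetical tuple of (source port, destination port, size in bytes), and it uses only a small subset of the standard well-known port assignments. Packets from applications with no designated port fall into an "unknown" bucket, which is the limitation noted above.

```python
from collections import Counter

# A small illustrative subset of well-known port assignments.
WELL_KNOWN_PORTS = {
    20: "ftp-data",
    21: "ftp",
    23: "telnet",
    25: "smtp",
    80: "www",
}


def classify_packet(src_port, dst_port):
    """Return the application name for a packet, or 'unknown'.

    Either end of the connection may be the server, so both ports are checked.
    """
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"


def bytes_per_application(packets):
    """Sum packet sizes per application.

    packets: iterable of (src_port, dst_port, size_bytes) tuples
    (a hypothetical record layout chosen for this sketch).
    """
    totals = Counter()
    for src_port, dst_port, size in packets:
        totals[classify_packet(src_port, dst_port)] += size
    return totals


# Made-up sample data, for illustration only.
sample = [(1025, 80, 400), (80, 1025, 1500), (1030, 21, 60), (4000, 5000, 200)]
totals = bytes_per_application(sample)
```

Note that the result separates applications only; nothing in a port number indicates whether a transfer served research, teaching or leisure.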
The poster below describes a pilot experiment to categorise the uses to which JANET is being put by analysing the log information available at a number of World Wide Web servers. It describes the technique that was developed to achieve this, which was, briefly, as follows. WWW servers that provided detailed summary log information were identified, and this log information was downloaded to cover the three-month period of the experiment. This was followed by the process of categorisation, which involved identifying each of the files listed in the log information as having been requested from the servers and examining their contents and position in the server directory structure. Each of these files was then assigned a category based on an interpretation of its intended use. The categories were chosen to represent the different types of activity in which users made use of the files, examples of which are research, leisure and teaching. The poster also shows some example results and draws some conclusions about the experiment.
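The tallying step of the log-based technique can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the experiment: the directory prefixes, the (path, bytes transferred) record layout and the sample entries are all hypothetical, and matching on directory position alone stands in for the examination of each file's contents and context described above.

```python
from collections import defaultdict

# Hypothetical mapping from server directory prefixes to activity categories.
CATEGORY_BY_PREFIX = {
    "/research/": "research",
    "/courses/": "teaching",
    "/societies/": "leisure",
}


def categorise(path):
    """Assign a category from the file's position in the directory structure."""
    for prefix, category in CATEGORY_BY_PREFIX.items():
        if path.startswith(prefix):
            return category
    return "uncategorised"


def summarise(log_entries):
    """Count requests and bytes transferred per category.

    log_entries: iterable of (path, bytes_transferred) pairs
    (a hypothetical record layout chosen for this sketch).
    """
    summary = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for path, size in log_entries:
        entry = summary[categorise(path)]
        entry["requests"] += 1
        entry["bytes"] += size
    return dict(summary)


# Made-up log entries, for illustration only.
log_entries = [
    ("/research/papers/p1.html", 12000),
    ("/courses/cs101/notes.html", 8000),
    ("/research/index.html", 3000),
    ("/societies/chess.html", 2000),
    ("/index.html", 1000),
]
summary = summarise(log_entries)
```

The two figures kept per category correspond to the number of events and the amount of traffic that this approach is able to report.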