We describe a model for log data of user search sessions obtained from a trail-based search and navigation system for documentation. The model elicits patterns that can be used to better understand Web users' search and navigation behaviour. Our study shows that such log data reveal patterns beyond those found by the typical statistical analysis of query terms.
Log analysis, searching behaviour, navigation pattern, search process, selection process.
Figure 1: AutoDoc Interface
With AutoDoc, users choose one of two search strategies: (1) submitting a query by typing query terms into the search box, or (2) navigating by following a link in the trail window, which displays a collection of trails that the user may follow. Users may use both strategies during the rest of the search session.
All transactions that occur during user interaction with AutoDoc are written to the log files stored in an Oracle database. We note that apart from the query terms, our server also records the links selected from the trail window generated by AutoDoc. The information relevant to this study is stored in two tables, lquery and lclick. The lquery table consists of information about the queries, such as the session id (SID), the query id (QID), the query string (QS) that users entered and a timestamp. The lclick table contains information about the links (or URLs) that the user clicked on after issuing the query, together with the matching query id. In order to construct a complete user search session, a new table is constructed by joining the lquery and lclick tables. Table 1 illustrates a simple example of the combined log entries in the joined table. Note that the entries in the URL column have been simplified to display only dummy Web pages.
Table 1. A simple log entry example
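The join of lquery and lclick described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the queries actually run against the Oracle database: an in-memory SQLite database stands in for Oracle, the column names (sid, qid, qs, ts, url) are chosen for the example, the click timestamp is assumed, the sample rows are invented, and a left join is assumed so that queries without any click are kept.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the Oracle database in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lquery (sid INTEGER, qid INTEGER, qs TEXT, ts TEXT);
    CREATE TABLE lclick (qid INTEGER, url TEXT, ts TEXT);
""")

# Invented sample rows for illustration only; not data from the study.
conn.executemany("INSERT INTO lquery VALUES (?, ?, ?, ?)", [
    (1, 10, "thread sleep", "2003-01-01 10:00:00"),
    (1, 11, "thread join", "2003-01-01 10:02:00"),
    (2, 12, "", "2003-01-01 11:00:00"),
])
conn.executemany("INSERT INTO lclick VALUES (?, ?, ?)", [
    (10, "page1.html", "2003-01-01 10:00:30"),
    (11, "page2.html", "2003-01-01 10:02:20"),
    (11, "page3.html", "2003-01-01 10:03:00"),
])


def combined_log(connection):
    """Return one row per (query, click); queries without clicks get url=None."""
    sql = """
        SELECT q.sid, q.qid, q.qs, c.url
        FROM lquery AS q
        LEFT JOIN lclick AS c ON c.qid = q.qid
        ORDER BY q.sid, q.ts, c.ts
    """
    return connection.execute(sql).fetchall()


rows = combined_log(conn)
for row in rows:
    print(row)
```

Ordering by session, query time and click time keeps each session's entries in the sequence in which they occurred, which is what the later pattern encoding relies on.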
We model Web search sessions of AutoDoc users as a process with the following stages: (1) a query formulation stage, q, in which query terms are entered in the search box or, when no query terms are entered, the user navigates with the aid of the trail window; (2) a selection stage, s, in which a sequence of links (filtered with respect to the search terms) is clicked; and (3) a query reformulation stage, r, in which users either modify their previous query or submit a new query to the system. The final, terminating stage is modelled by the termination symbol $. As an example, the log data in Table 1 can be modelled as follows:
Table 2. Users' search behaviour patterns
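One way to derive such patterns from the joined log is sketched below. The function names and the input layout (tuples of session id, query id and URL, sorted by session and time) are assumptions made for this example. The rule used is the simplest reading of the stage definitions: the first query of a session maps to q, every later query to r, every clicked link to s, and the end of the session to $.

```python
from itertools import groupby


def encode_session(rows):
    """Encode one session as a list of stage symbols.

    rows: (qid, url) pairs in time order; url is None when the query
    produced no click. Hypothetical layout chosen for this sketch.
    """
    symbols = []
    last_qid = None
    for qid, url in rows:
        if qid != last_qid:
            # First query of the session is 'q'; any later query is 'r'.
            symbols.append("q" if last_qid is None else "r")
            last_qid = qid
        if url is not None:
            symbols.append("s")
    symbols.append("$")
    return symbols


def encode_log(rows):
    """rows: (sid, qid, url) tuples sorted by session id and time."""
    return {
        sid: encode_session([(qid, url) for _, qid, url in group])
        for sid, group in groupby(rows, key=lambda r: r[0])
    }


# Invented sample rows; not the contents of Table 1.
sample = [
    (1, 10, "page1.html"),
    (1, 11, "page2.html"),
    (1, 11, "page3.html"),
    (2, 12, "page4.html"),
]
patterns = encode_log(sample)
print(patterns)
```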
These patterns can be represented more informatively in a trie-like structure [3]. Because all user search sessions begin with the query formulation stage, q, it is unnecessary to include this stage in the trie. For the trie model to be useful, it should provide information about users' activity, such as the frequency of occurrence of specific search actions and patterns. For instance, based on the data in Table 2, the frequency of occurrence of s (i.e. a selection) at level two is four, since four users (IDs 1, 2, 5 and 6) make a selection at that level, and two users have the pattern <q,s,s,$>. Table 2 can therefore be visualised as follows:
Figure 2: Trie For Table 2
Using this model, we can ask questions such as how many users only select links without reformulating, or how many users reformulate after x selections.
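A counting trie of this kind, and the two example questions, can be sketched as follows. This is an illustrative implementation rather than the one used in the study: the class and function names are hypothetical, the sample patterns are invented (they are not Table 2), and "reformulate after x selections" is read as the first reformulation occurring after exactly x selections.

```python
class TrieNode:
    """A trie node that counts how many sessions pass through it."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_trie(patterns):
    """Build a counting trie; the leading 'q' of each pattern is skipped."""
    root = TrieNode()
    for pattern in patterns:
        node = root
        node.count += 1
        for symbol in pattern[1:]:
            node = node.children.setdefault(symbol, TrieNode())
            node.count += 1
    return root


def prefix_count(root, prefix):
    """Number of sessions whose pattern (after 'q') starts with prefix."""
    node = root
    for symbol in prefix:
        node = node.children.get(symbol)
        if node is None:
            return 0
    return node.count


def selection_only(root):
    """Sessions of the form q, s, ..., s, $ with at least one selection."""
    total, node = 0, root
    while "s" in node.children:
        node = node.children["s"]
        if "$" in node.children:
            total += node.children["$"].count
    return total


def reformulate_after(root, x):
    """Sessions whose first reformulation follows exactly x selections."""
    return prefix_count(root, ["s"] * x + ["r"])


# Invented sample patterns for illustration only.
sample_patterns = [
    list("qss$"), list("qss$"), list("qrs$"), list("qsrs$"), list("qs$"),
]
trie = build_trie(sample_patterns)
print(prefix_count(trie, ["s"]))   # sessions with a selection right after q
print(selection_only(trie))        # sessions with selections only
print(reformulate_after(trie, 1))  # first reformulation after one selection
```

Storing a count on every node means each such question is answered by walking a single path, without rescanning the log.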
Several months' worth of AutoDoc log data were collected and examined. The log data comprise 7755 entries from 3601 unique sessions. After a data cleaning process, in which all entries generated when the initial AutoDoc page is loaded were deleted, the cleaned log data consist of 1962 unique sessions.
In Sections 5.1 and 5.2, we examine two distinct categories of user search behaviour that arise from analysis of the trie of user behaviour patterns.
When starting a search session, users are more likely to type in query terms (55.6%) than to navigate via the trail window (44.4%). Further analysis reveals that, of the users who choose a URL, t, as their first action in a search session, 94.3% continue to navigate via the trail window until the end of the session. Only 5.7% of users in this group switched to query submission, possibly followed by further navigation via the trail window. Similarly, of the users who enter a query, q, to start a search session, 97.2% continue to submit new queries until they have achieved their informational goals. Only 2.8% switched search strategy in their subsequent activities. The behaviour of the two groups is consistent: users tend to select one preferred search strategy and keep to it throughout the search session.
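The kind of calculation behind these percentages can be illustrated with a short sketch. It assumes each session has been reduced to a sequence of actions, with q for a typed query and t for a link followed in the trail window. The function name and the sample sessions are invented for the example, and the figures it produces are unrelated to those reported above.

```python
def strategy_consistency(sessions):
    """Share of sessions starting with each strategy, and share that never switch.

    sessions: non-empty sequences over {'q', 't'} (hypothetical encoding).
    """
    stats = {}
    for first in ("q", "t"):
        group = [s for s in sessions if s[0] == first]
        # A session "stays" if every action uses the starting strategy.
        stayed = sum(1 for s in group if set(s) == {first})
        stats[first] = {
            "start_share": len(group) / len(sessions),
            "stay_share": stayed / len(group) if group else 0.0,
        }
    return stats


# Invented sample sessions for illustration only.
sample_sessions = ["qqq", "q", "tt", "tq", "t"]
stats = strategy_consistency(sample_sessions)
print(stats)
```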
Ninety-two unique search patterns were generated. On average, 2.5 links are selected per session. The majority of users (80.1%) select or follow at most three result links per session, while the maximum number of result links selected in a session is 34. Looking at individual query submissions, 71.5% of users select one result link per query entered, and only 9.2% select four or more links. A staggering 90.0% of users did not reformulate their query, i.e. they did not submit a new query or modify their previous one. Thus, 10.0% of users reformulated their previous query during a search session. Further investigation reveals that 60.2% of the users who reformulated did so after selecting only one result link, while users who reformulated after selecting four or more result links formed 12.2% of this group.
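Summary figures of this kind can be computed directly from the encoded patterns, as the following sketch shows. The function name and the sample patterns are invented, so the output does not reproduce the figures above. One interpretive assumption is made: "reformulated after selecting only one result link" is taken to mean exactly one selection before the first r in the pattern.

```python
def summarise(patterns):
    """Compute per-session summary statistics from stage-symbol patterns."""
    n = len(patterns)
    selections = [p.count("s") for p in patterns]
    reformulators = [p for p in patterns if "r" in p]
    # Assumption: count the selections made before the first reformulation.
    after_one = [
        p for p in reformulators if p[1:p.index("r")].count("s") == 1
    ]
    return {
        "mean_selections": sum(selections) / n,
        "share_at_most_three": sum(1 for k in selections if k <= 3) / n,
        "share_reformulating": len(reformulators) / n,
        "share_reformulating_after_one": (
            len(after_one) / len(reformulators) if reformulators else 0.0
        ),
    }


# Invented sample patterns for illustration only.
sample_patterns = [
    list("qss$"), list("qss$"), list("qrs$"), list("qsrs$"), list("qs$"),
]
summary = summarise(sample_patterns)
print(summary)
```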
The proposed model allows us to elicit additional information from search engine log data and provides a better understanding of Web users' search and navigation behaviour. This study revealed that users select only a few result links during their interaction with the search system. It is also interesting to note that users often prefer not to follow additional result links before reformulating their initial query. These findings suggest that effort should be directed towards generating more informative summaries of the result links, in order to help users make a decision before reformulating their queries. It is evident from this study that users extensively follow the trails generated during their navigation activities. For future improvement of AutoDoc, the findings suggest that providing fewer but highly informative trails may be more useful to the user than trying to display all available links for selection. However, we note that these observations may result from the fact that AutoDoc users approach the search system with a specific item to find. As most users of AutoDoc intend to solve Java programming problems, they are generally more task-oriented. We have also extended this model to elicit interesting user navigation patterns from the log data of a general Web site. In this case, each node in the trie represents an information category of the site. We aim to model how users navigate the Web site and to scrutinise the main informational goals they set out to achieve.