Visualizing the World-Wide Web with the Navigational View Builder
Sougata Mukherjea
Graphics, Visualization & Usability Center
College of Computing
Georgia Institute of Technology
Email: sougata@cc.gatech.edu
URL: http://www.cc.gatech.edu/gvu/people/Phd/Sougata.Mukherjea.html
James D. Foley
Graphics, Visualization & Usability Center
College of Computing
Georgia Institute of Technology
Email: foley@cc.gatech.edu
URL: http://www.cc.gatech.edu/gvu/people/Faculty/James.D.Foley.html
- Abstract:
- Overview diagrams are one of the best tools for orientation and navigation in
hypermedia systems. However, constructing effective overview diagrams is a
challenging task. This paper describes the Navigational View Builder, a tool which allows
the user to interactively create useful visualizations of the information space. It uses
four strategies to form effective views. These are binding, clustering, filtering and
hierarchization. These strategies use a combination of structural and content analysis of
the underlying space for forming the visualizations. This paper discusses these strategies
and shows how they can be applied for forming visualizations for the World-Wide Web.
- Keywords:
- Information Visualization, Overview Diagrams, Binding, Clustering, Filtering,
Hierarchization.
Introduction
One of the major problems with current hypermedia systems is being
lost in hyperspace. For example in Mosaic [1] the
process of jumping from one location to another can easily confuse the user. This is
primarily a result of the user's lack of knowledge of the overall structure of the
information space. For this purpose overview diagrams or navigational views are
very useful. By presenting a map of the underlying information space, they allow the users
to see where they are, what other information is available and how to access the other
information. In fact, they are one of the best tools for orientation and navigation in
hypermedia documents [12].
However, construction of effective overview diagrams is a challenging task. There are
various problems involved:
- The navigational views are two or three dimensional projections of generally
multidimensional hypermedia networks. Finding effective views of such complex network
structures that convey all the required information is hard. In particular, this task is
impossible without knowledge of the structure and the contents of the information space.
- Even if such a view is developed, the resulting structure would be very complex for
any non-trivial hypermedia system. Creating an aesthetic layout of such a complex
structure is extremely difficult. In fact the overview diagrams are generally represented
as node and link graph diagrams and graph layout is a very complex problem
[2].
- As the size of the underlying information space increases, it becomes very difficult
to fit the whole information structure on a screen. If the size is reduced to fit on the
screen, the details become too small to be seen. An alternative is to browse the large
layout by scrolling and arc traversing. However, this tends to obscure the global
structure. The goal should be to display both the details and the context smoothly
integrated together in a single screen.
- Just displaying the structure to the user is not enough. To be really useful, the
user should be able to get an idea of not only the structure but the actual contents of
the nodes and links just by looking at the navigational views.
We are building the Navigational View Builder, a tool for letting the designer
develop effective overview diagrams of hypermedia systems. (The implementation
is done using C++, Motif and Open Inventor [13]).
The tool uses various strategies to reduce the problems concerned with
developing overview diagrams. This paper discusses these strategies and shows
how they can be used to form useful views of the World-Wide Web.
We assume a database-oriented hypermedia system where the nodes are described
with attributes. Since the WWW is unstructured and does not have this model of the
hypermedia system, the model was built from the node and link structure of the WWW
extracted by parsing the html documents using the strategy described in
[10]. Some of the attributes of the nodes like the author
(the owner of the file) and the file-size could be extracted automatically from the files.
However, a major drawback of the World-Wide Web is the absence of many useful semantic
attributes for the pages. Therefore, to fully show the power of our tool, attributes like
the topic of the page (whether it is a research page or a personal page, etc.) were
inserted manually. (Efforts are underway to incorporate metadata into WWW and hopefully in
the near future we can extract all useful information from WWW automatically).
Figure 1 shows an unstructured overview diagram of the WWW pages
about the research activities at the Graphics Visualization & Usability (GVU) Center at
Georgia Tech (URL: http://www.gatech.edu/gvu/gvutop.html). These pages mainly contain
information about the faculty, students and research activities of GVU. This figure shows
how useless such overview diagrams can be for helping the user understand anything. The
whole information space does not fit on a screen, and the interconnections between the
nodes make the structure too complex for easy understanding. The figure also does not give
any details of the characteristics of the nodes and links. The following sections will
describe various strategies to make this diagram more understandable.
Binding
To make the overview diagrams useful for the user, they should depict more information
than merely the structure of the information space. The user should get an idea of the
content of the space so that just by looking at the diagram the user can decide what part
of the overall information space is of interest to her. For example, in the WWW, an user
should be able to get an idea about the topic of the page by just looking at the overview
diagram. Similarly, if certain nodes are important (known as landmarks in
hypermedia literature), they should be identifiable from the views. To achieve this
purpose, visual properties can be used in the overview diagram to represent information
from the underlying information space. For example, the topic of the nodes may be
represented by different colors. Similarly, special icons or brighter colors can be used
to represent the landmark nodes.
The designer of the overview diagrams should be able to specify the bindings
between the information attributes and the visual attributes of the nodes and links. These
bindings specify a mapping between the information space and the visual space of the
hypermedia system. At the time of displaying the navigational views, these bindings and
the information in the underlying information space control the actual appearance of the
views. For example, if the topic is bound to the color of the nodes, the color of the
nodes in the view would be controlled by the topic of the page which the node represents.
Unfortunately, the number of visual properties that can be used to depict information is
usually less than the amount of information in the underlying database. Therefore, only
the important data attributes should be bound to the visual properties. Thus, the mapping
between the database and the visual attributes should not be hardwired into the system
since the importance of the data attributes cannot be fixed a-priori. The Navigational
View Builder allows the designer of the navigational views to specify the
bindings between the data and the visual attributes of the nodes and links in an
easy-to-use interface. A sample interface for binding node attributes is shown in
Figure 2.
Figure 3 shows an example of a view of the GVU WWW pages with
various visual properties bound to information attributes. In this example icons are used
to represent various media types, the node size is bound to the file size (inversely)
while the shape represents whether the author of the page was a student (circles represent
students). The color hue represents the topic of the page while the color saturation
represents the last modified time. Moreover, links connecting two html files are solid
while links connecting html files to other kind of files are dashed. More information is
presented to the user in this diagram than in Figure 1. A more
detailed discussion of our binding strategy and its usefulness is presented in
[5].
Clustering
One of the major problems with the overview diagrams is that the screen gets cluttered
with too much information for even a reasonably sized hypermedia system. One way to
overcome the problem is to reduce the information that needs to be shown on the screen
through suitable abstractions. These abstractions will show the user the overall
information space without cluttering the screen. For example, it is obviously not possible
to show the entire GVU WWW pages on a single screen. Hence we may form an abstraction
which clusters all files in the same directory into a single node. Clustering is possible
for the abstracted views also. For example, if the number of directories are large, even
the abstraction which represents each directory as a single node may be too confusing for
the user. Hence there may be further clustering based on, for example, the topic of the
pages. These abstracted views would be able to show the overall information space on a
single screen without much cluttering. Hence the user would have a better feel of the
global information structure from these abstracted views. Depending on the intentions and
goals of the users, different kinds of abstractions would be useful to them. For example,
one user may want to see an abstraction of the information space based on the topic. Some
other user may want to see the abstraction based on the author of the pages. Hence, for
the abstracted navigational views to be really useful, we should allow the interactive
specification of the abstractions.
Types of Clustering
Two types of clustering are useful. They are as follows:
- Structure-based clustering:
In structure-based clustering, the hypermedia structure is searched for subnetworks
that match a given pattern. These subnetworks form a cluster. The simplest form of this
type of clustering is link-based clustering in which nodes that are joined by a
particular class of links are clustered together. For example, in an overview diagram we
may want to form clusters of nodes that are linked by annotation links.
- Content-based clustering:
In this type of clustering the nodes of the hypermedia are considered individually and
their attributes are examined to determine the clusters. The simplest form of
content-based clustering is when all nodes whose attributes satisfy certain properties
form a cluster. Sometimes the user would like to cluster all nodes that have the same
value for a certain attribute. If the user has to generate a clustering command for each
value of the attribute it would be quite tedious. Therefore,
Attribute-similarity-based clustering is allowed in which for each value of the
specified attribute, clusters of nodes having that value is formed. For example, all pages
built by the same author can be clustered together (as shown in
Figure 4).
To allow the clustering to be done interactively, it is essential that the clustering
algorithms are efficient. We discuss our clustering algorithms in
[6]. Note that clustering can be combined with binding so
that information about the clusters are shown visually. Thus in
Figure 4, shapes represent the type of the authors
(circles represent students).
Visualizing the Abstracted Views
By a series of clustering, the user will be able to see the global view of the information
space. After the user gets an idea of the global structure, she would generally want
details of some part of the space and the problem with the abstracted views is that the
details would be lost to the user. To show the details together with the context, we have
developed a visualization strategy using 3 dimensions. Initially the user will be shown
the most abstracted view as in Figure 5. (Here the abstraction was
done by directory, sub-topic, topic, etc.) When the user wants the details of a particular
node, those details will be shown at the front while the more abstract layers move deeper.
For example, Figure 6 shows a view where the user wanted to see the
details of the Research pages. The x-y plane is used for showing the details of each
particular layer and the layers are arranged in the z dimension with the most detailed
view (of the Research pages) in the front. It can be argued that as more and more details
are available, the context would move deeper and deeper into the screen. However, the
eye-point can be shifted to bring different parts of the space in focus. For example,
besides the front view, back, side and top views can also be provided. Figure
7 shows a top view of the information space after the user wanted
to see details of the Hypertext research area. (Note that colors are used to represent
different research areas and the nodes whose details are being shown are greyed out).
We allow smooth animation so that the view changes are
not abrupt and allows the user to see the changes easily. This view shows that the details
of the user's interest as well as the context are integrated together on a single screen.
(However, it should be noted that the user can generally see the details of one part of
the space only - otherwise the view becomes very confusing).
Filtering
To reduce the complexity of the overview diagrams, the user may want to remove unwanted
information (or show the useful information only). Therefore, we allow various filtering
options.
- Content-based: In this type of filtering only nodes whose attributes
satisfy certain properties are shown/ hidden. For example, Figure 8
shows only the pages related to research in Animation.
- Link-based: In this type of filtering only certain types of links are shown/ hidden.
- Structure-based: Suppose some organizations have guidelines for their
employees when they are making their WWW home pages. It may state that a central html
page should be linked to a gif file and another html page. Later on, an administrator for
these pages may wish to check whether people have been following the guidelines.
Traditional content-based database querying would not be of much help. Therefore we allow
structure-based querying which allows the showing/hiding of certain subgraph patterns
only. For example, Figure 9 shows the html pages that are linked to
both gif files and movies. (Icons represent the media types of the pages).
Note that our filtering algorithms are similar to the clustering algorithms. Also note
that our content-based filtering is similar to the concept of dynamic queries
[14].
Hierarchization
The previous sections have assumed that the overview diagrams would be presented as node
and link graph diagrams. Since the underlying hypermedia structure is a network, this may
seem to be the best way to present the information. However, graphs are the most difficult
data organizations to visualize. Moreover, views of the full graph structure are difficult
for the users to comprehend. Although clustering and filtering help in reducing the
complexity, unless the user has some idea about the underlying information space, she
would not know what type of clustering or filtering would be useful for her. Therefore,
some automatic mechanism which shows the user the overall information space in an
easy-to-understand visualization would be required.
In [9] Parunak notes that: ``The insight for hypermedia
is that a hyperbase structured as a set of distinguishable hierarchies will offer
navigational and other cognitive benefits that an equally complex system of
undifferentiated links does not, even if the union of all the hierarchies is not itself
hierarchical.'' [8] observed that the ability to view
knowledge from different perspectives is important. Thus, if different hierarchies, each
of which gives a different perspective to the underlying information can be formed, the
user would be better able to comprehend the information. It should be also noted that
unlike graphs some very effective ways of visualizing hierarchies have been proposed.
Examples are Treemaps [3] and Cone Trees
[11].
Therefore, we have developed an algorithm for forming hierarchies from hypermedia
graphs. It uses both structural and content analysis to identify the hierarchy. The
structural analysis looks at the structure of the graph while the content analysis looks
at the contents of the nodes. These hierarchies can be visualized in various ways.
Visualizing WWW through Multiple Hierarchies
Our algorithm has been implemented in the Navigational View Builder. The left hand screen
of Figure 10 shows the top level of the default hierarchy created
for the data by our algorithm. The file research.html which lists the various
research activities of GVU is the root. It has branches to the major research area as well
as to gvutop.html, a file containing general information about GVU. The right
hand side of Figure 10 shows a view of a section of this hierarchy
(showing research in Software Visualization) where the nodes are listed as a table of
content of a book. Figure 11 shows a 3d tree view of this
hierarchy. Binding can be used in these views as well. Thus, in this figure the node
colors represent author types while the link colors depend on the media type of the
destination pages. Also note that the interface allows various zooming, rotating
and filtering operations for the 3d tree.
In each step, our hierarchization algorithm tries to identify a root and partition the
other nodes of the graph into different branches. These branches are themselves graphs
and the algorithm is recursively called for each branch. At each step various partitions
are possible based on structural and content analysis. A metric is used to rank these.
By default the partitioning with the best metric is chosen and thus, the algorithm forms
a hierarchy automatically. However, the user can guide the hierarchization process.
Various choices for forming partitions can be shown in a menu as shown in Figure
12 and the user can choose any one. Details of the algorithm is
presented in [7].
The left hand screen of Figure 13 shows a Treemap view of a
hierarchy that was formed with the attribute topic being used to initially partition the
nodes. Here also colors are used to represent author-type. In this way, multiple
hierarchies, each giving a different perspective to the underlying information space can
be formed. If a user selects a node in one view, its positions in other views are also
highlighted. Thus, these views help the user in comprehending the data. It should be also
noted that the user can go directly to the corresponding WWW page for the selected node.
Thus in the Treemap view the node visdebug.html is highlighted. The corresponding
WWW page is shown on the right hand screen of Figure 13.
Generating Other Views
Once a hierarchy is formed from the original graph structure, other views can be
developed as well. For example, if the original partitioning for forming the hierarchy
was done by a quantitative attribute, a linear structure sorted by that attribute can be
formed from the subtrees of the root node. For example, Figure 14
represents a perspective wall [4] view of a linear structure
sorted by the last-modified-time. From the hierarchy whose initial partitioning was by the
attribute last-modified-time, the files were divided into partitions based on the time
when they were last modified. These partitions were arranged on walls. Only some walls are
in the focus at a given time. The user can easily control the walls which are in focus
through a scrollbar. Similarly a tabular view showing useful statistics for the various
pages and also for groups of pages organized by topic can be formed by a depth-first
traversal of the hierarchical structure whose initial partitioning is done by the
attribute topic.
Conclusion
We have presented the Navigational View Builder, a tool for forming various useful
visualizations of the WWW. We believe that by using the various strategies that are
described in the paper the user will be able to form visualizations that help in
reducing the lost in hyperspace problem. Future work is planned along the following
directions:
- A limitation of our system is that no evaluation of how useful our strategies and
views really are have been done so far. We plan to do
serious usability studies in the near future. These studies may give us new insights that
will help to improve our system.
- At present our system runs for the research pages of GVU having about 400 nodes and
800 links. For larger data sets our hierarchization algorithm will become slower.
(Presently the algorithm takes about 7 seconds on a SGI reality engine). Therefore, we are
trying to use smarter data structures to improve the efficiency of the code. Moreover, for
larger data sets better visualizations of hierarchies will be needed. Research needs to
be done in this direction.
- To improve our content-based analysis we need to incorporate more useful metadata
into our system. For example, information like the last time a file was accessed or the
number of times a link was traversed would be very useful. These information can be easily
accessed by incorporating a web analysis tool [10] into our
system. Finding other useful metadata is an open research issue.
Acknowledgment
This work is supported by grants from Digital Equipment Corporation, Bell South
Enterprises, Inc. and Emory University System of Health Care, Atlanta, Georgia as part of
the Hypermedia Interface for Multimedia Databases project.
References
- 1
-
M. Andreessen.
NCSA Mosaic Technical Summary.
Technical report, National Center for Supercomputing Applications,
1993.
- 2
-
G. Battista, P. Eades, R. Tamassia, and I. Tollis.
Algorithms for Drawing Graphs: an Annotated Bibliography.
Technical report, Brown University, June 1993.
- 3
-
B. Johnson and B. Shneiderman.
Treemaps: A Space-filling Approach to the Visualization of
Hierarchical Information.
In Proceedings of IEEE Visualization '91 Conference, pages
284-291, 1991.
- 4
-
J. D. Mackinlay, S. Card, and G. Robertson.
Perspective Wall: Detail and Context Smoothly Integrated.
In Proceedings of the ACM SIGCHI '91 Conference on Human Factors
in Computing Systems, pages 173-179, New Orleans, La, April 1991.
- 5
-
S. Mukherjea and J. Foley.
Navigational View Builder: A Tool for Building Navigational Views of
Information Spaces.
In ACM SIGCHI '94 Conference Companion, pages 289-290, Boston,
Ma, April 1994.
- 6
-
S. Mukherjea, J. Foley, and S. Hudson.
Interactive Clustering for Navigating in Hypermedia Systems.
In Proceedings of the ACM European Conference of Hypermedia
Technology, pages 136-144, Edinburgh, Scotland, September 1994.
- 7
-
S. Mukherjea, J. Foley, and S. Hudson.
Visualizing Complex Hypermedia Networks through Multiple
Hierarchical Views.
To appear in Proceedings of ACM SIGCHI '95, May 1995.
- 8
-
C. Neuwirth, D. Kauffer, R. Chimera, and G. Terilyn.
The Notes Program: A Hypertext Application for Writing from Source
Texts.
In Proceedings of Hypertext '87 Conference, pages 121-135,
Chapel Hill, NC, 1987.
- 9
-
H. Parunak.
Hypermedia Topologies and User Navigation.
In Proceedings of Hypertext '89 Conference, pages 43-50,
Pittsburgh, Pa, November 1989.
- 10
-
J. Pitkow and K. Bharat.
WEBVIZ: A Tool for World-Wide Web Access Log Visualization.
In Proceedings of the First International World-Wide Web
Conference, Geneva, Switzerland, May 1994.
- 11
-
G. G. Robertson, J. D. Mackinlay, and S. Card.
Cone Trees: Animated 3D Visualizations of Hierarchical Information.
In Proceedings of the ACM SIGCHI '91 Conference on Human Factors
in Computing Systems, pages 189-194, New Orleans, La, April 1991.
- 12
-
K. Utting and N. Yankelovich.
Context and Orientation in Hypermedia Networks.
ACM Transactions on Office Information Systems, 7(1):58-84,
1989.
- 13
-
J. Wernecke.
The Inventor Mentor: Programming Object-Oriented 3D Graphics with Open
Inventor, Addison-Wesley Publishing Company, 1994.
- 14
-
C. Williamson and B. Shneiderman.
The Dynamic HomeFinder: Evaluating Dynamic Queries in a Real-Estate
Information Exploration System.
In Proceedings of the ACM SIGIR '92 Conference on Research and
Development in Information Retrieval,
pages 338-346, Copenhagen, Denmark, June 1992.