WebGUIDE: Querying and Navigating Changes in Web Repositories
Fred Douglis | Yih-Farn Chen | Thomas Ball | Eleftherios Koutsofios
Motivation
- The web as a publishing medium
- Lots of data out there, changing at different rates
- Examples: WWW5 home page, tech support, org charts
- Useful to track changes to web pages
- Learn about changed content
- Find new links of interest
- Numerous tools for tracking when individual pages change
- We focus on how they change, and on clusters of pages
WebGUIDE
- Web Graphical User Interface to a Difference Engine
- Combines Ciao (graphical interface) and AIDE (difference engine)
- Ciao shows high-level structural changes on collections of pages
- AIDE shows low-level differences
- Can see changes graphically or textually
- Web Time Travel
- View a snapshot of web-of-the-past
- Compare web as of two points in time (recursive diff)
WebGUIDE Example
HtmlDiff Example
HtmlDiff: Here is the first difference. There are 9 differences on this page. [marker] is old. [marker] is new.
...
AIDE :
AIDE: AT&T Internet Difference Engine
Description: AIDE is a tool that "remembers" HTTP pages for users and then uses Tom Ball's HtmlDiff program to highlight changes to subsequent versions of the page. It is related to WebGUIDE, which provides a graphical interface. For more information, contact Fred Douglis
Recursive HtmlDiff Example
HtmlDiff: Here is the first difference. There are 9 differences on this page. [marker] is old. [marker] is new. [marker]/% for recursive diff.
...
AIDE : %
AIDE: AT&T Internet Difference Engine
Description: AIDE is a tool that "remembers" HTTP pages for users and then uses Tom Ball's HtmlDiff program to highlight changes to subsequent versions of the page. It is related to % WebGUIDE, which provides a graphical interface. For more information, contact % Fred Douglis
Ciao
- Graphical navigation of document repositories
- Three components
- Abstractor converts source documents to database representation
- Repository contains entity-relationship (E-R) database and documents
- Graphical interface to query and visualize structure
- Used for C/C++, shell scripts, HTML
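The abstractor step for HTML can be illustrated with a minimal sketch. It assumes that pages are the entities and hyperlinks are the relationships; the names `LinkAbstractor` and `abstract_page`, the record layout, and the example URL are hypothetical and are not taken from Ciao itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAbstractor(HTMLParser):
    """Collects the title and outgoing links of one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def abstract_page(url, html):
    """Return (entity, relationships) for one page (hypothetical layout)."""
    parser = LinkAbstractor(url)
    parser.feed(html)
    parser.close()
    entity = {"kind": "page", "url": url, "title": parser.title.strip()}
    relationships = [("links_to", url, target) for target in parser.links]
    return entity, relationships


if __name__ == "__main__":
    sample = '<title>AIDE</title><a href="webguide.html">WebGUIDE</a>'
    entity, rels = abstract_page("http://example.com/aide/index.html", sample)
    print(entity)
    print(rels)
```

The entity and relationship records produced this way are the kind of data a repository could store and a graphical front end could query.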
AT&T Internet Difference Engine (AIDE)
- CGI interface
- Notification of updated pages, organized by priority
- Shared version archive
- Differencing (HtmlDiff)
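To give a feel for the differencing step, here is a rough sketch that marks old and new material between two versions of a page. It is not HtmlDiff's algorithm: it swaps in Python's standard `difflib.SequenceMatcher` over a simple tag-and-word tokenization, and the function names are hypothetical.

```python
import difflib
import re


def tokenize(html):
    """Split HTML into tags and words so markup is compared as whole units."""
    return re.findall(r"<[^>]+>|[^<\s]+", html)


def mark_differences(old_html, new_html):
    """Return (status, text) pairs with status 'same', 'old', or 'new'."""
    old_tokens = tokenize(old_html)
    new_tokens = tokenize(new_html)
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    result = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            result.append(("same", " ".join(old_tokens[i1:i2])))
            continue
        if i2 > i1:  # tokens present only in the old version
            result.append(("old", " ".join(old_tokens[i1:i2])))
        if j2 > j1:  # tokens present only in the new version
            result.append(("new", " ".join(new_tokens[j1:j2])))
    return result


if __name__ == "__main__":
    old = "<b>AIDE</b> is a tool"
    new = "<b>AIDE</b> is a new tool"
    for status, text in mark_differences(old, new):
        print(status, text)
```

A presentation layer could then render the 'old' and 'new' runs with distinct markers and count them to report how many differences a page has.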
Integrating AIDE and Ciao
- Ciao uses AIDE repository to get old versions to compare structures
- AIDE & Ciao share logic to get current versions from web
- Ciao uses AIDE "What's New?" functionality to annotate nodes with changed data
- Click on graph node (image map) goes to WebGUIDE form
- Ciao graphical operations
- AIDE operations (archive, diff, ...)
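The structural comparison described above can be sketched as a diff of two link graphs, with a separate set of pages known to have changed content used to annotate nodes. This is only an illustration under assumed data shapes: edges are (source, target) pairs, and `compare_structures` and its status labels are hypothetical names.

```python
def compare_structures(old_edges, new_edges, changed_pages=frozenset()):
    """Compare two link graphs given as iterables of (source, target) pairs.

    changed_pages is the set of URLs whose content is known to differ,
    e.g. as reported by a change-tracking service (an assumption here).
    """
    old_edges, new_edges = set(old_edges), set(new_edges)
    old_nodes = {node for edge in old_edges for node in edge}
    new_nodes = {node for edge in new_edges for node in edge}

    status = {}
    for node in old_nodes | new_nodes:
        if node not in old_nodes:
            status[node] = "added"
        elif node not in new_nodes:
            status[node] = "removed"
        elif node in changed_pages:
            status[node] = "changed"
        else:
            status[node] = "unchanged"

    return {
        "nodes": status,
        "added_links": sorted(new_edges - old_edges),
        "removed_links": sorted(old_edges - new_edges),
    }


if __name__ == "__main__":
    old = [("index", "aide"), ("index", "ciao")]
    new = [("index", "aide"), ("index", "webguide")]
    print(compare_structures(old, new, changed_pages={"index"}))
```

A graph front end could map each status to a node color or shape, so that added, removed, and changed pages stand out at a glance.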
Architecture
- System components
- Graph Generator (Ciao)
- Difference Engine (HtmlDiff)
- Robot to track modifications
- Version and meta-data repository
Recursion
- Collections of documents often interesting
- Virtual library page and its descendants
- Related pages in a directory
- Operations to perform
- Find when any page in collection is updated
- Archive changes to related pages as a group
- Run HtmlDiff on pages referenced by page being compared
- What versions to compare?
- Base on timestamps
- Use newest archived version no newer than root page
- AIDE marks available diffs with an icon and unavailable ones with %
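The timestamp rule above ("newest archived version no newer than root page") can be sketched as follows. The sketch assumes integer timestamps, an archive given as a dict of sorted timestamp lists, and only one level of recursion below the root; `newest_not_after` and `plan_recursive_diff` are hypothetical names.

```python
from bisect import bisect_right


def newest_not_after(timestamps, limit):
    """Return the newest timestamp <= limit from a sorted list, or None."""
    index = bisect_right(timestamps, limit)
    return timestamps[index - 1] if index else None


def plan_recursive_diff(archive, links, root, old_root_time, new_root_time):
    """Choose a version pair for the root page and each page it references.

    archive: dict mapping URL -> sorted list of archived timestamps
    links:   dict mapping URL -> list of URLs referenced by that page
    Returns a dict mapping URL -> (old_version, new_version), or None
    when no diff is available for that page.
    """
    plan = {}
    for url in [root] + list(links.get(root, [])):
        versions = archive.get(url, [])
        old = newest_not_after(versions, old_root_time)
        new = newest_not_after(versions, new_root_time)
        # A diff needs two distinct archived versions.
        if old is None or new is None or old == new:
            plan[url] = None
        else:
            plan[url] = (old, new)
    return plan


if __name__ == "__main__":
    archive = {"root": [10, 20], "a": [5, 15, 25], "b": [12]}
    links = {"root": ["a", "b", "c"]}
    print(plan_recursive_diff(archive, links, "root", 10, 20))
```

Pages mapped to a version pair correspond to links with an available diff; pages mapped to `None` correspond to the links that would be flagged as unavailable.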
Picking Versions
Performance Issues
- Graph generation
- Data extraction on the fly
- Graph layout
- Recursive diff
- Database access slows down HtmlDiff
Future Work
- Direct access to Ciao
- Ciao outside of a web browser much more effective
- Fast panning and zooming
- Simple GUI for graph operations
- Ciao plug-in for Netscape
- Performance enhancements
- Caching and optimistic generation of HtmlDiff output
- Caching of Ciao metadata
- AIDE-specific issues
- Security and privacy concerns
- Copyright
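One way the proposed caching of HtmlDiff output might look is sketched below. It assumes that archived versions never change once stored, so a (URL, old version, new version) key never needs invalidation; the class `DiffCache` and its interface are hypothetical.

```python
class DiffCache:
    """Memoizes diff output per (url, old_version, new_version)."""

    def __init__(self, diff_function):
        self._diff = diff_function  # any callable taking (old_text, new_text)
        self._store = {}
        self.misses = 0

    def get(self, url, old_version, new_version, old_text, new_text):
        key = (url, old_version, new_version)
        if key not in self._store:
            # Only compute the diff the first time this version pair is seen.
            self.misses += 1
            self._store[key] = self._diff(old_text, new_text)
        return self._store[key]


if __name__ == "__main__":
    cache = DiffCache(lambda old, new: f"{len(old)} -> {len(new)} characters")
    print(cache.get("http://example.com/", 1, 2, "old text", "newer text"))
    print(cache.get("http://example.com/", 1, 2, "old text", "newer text"))
    print("diffs computed:", cache.misses)
```

Optimistic generation would then amount to calling `get` when a new version is archived, before any user asks for the comparison.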
Related Work
- Tracking modifications
- Smart Bookmarks: integrated with bookmark file
- URL-minder: Internet-wide service via email
- Difference detection
- Do-I-Care: finding interesting changes (Starr, et al.)
- Hierarchical differences (Chawathe, et al.)
- Browsing
- WebMap: visualize relationships between pages seen in browser
- Hyper-G: hierarchical navigation and searches
Conclusions
- WebGUIDE combines Ciao with our Difference Engine
- Tracking and viewing of changes to web pages
- Graphical interface for comparing document structures
- Recursive tracking and differencing at both textual and graphical levels
- User-interface and performance considerations suggest some extensions and modifications to this model
- Building a user community