3. File System Document Access

The network congestion and server load incurred by many HTTP and FTP servers demonstrate some limitations of the current information retrieval services. In particular, these services lack adequate facilities for caching, replication, and security [3]. This section describes a simple layered architecture that builds on existing features and capabilities of distributed computing systems to address these problems. Specifically, it presents simple WWW client modifications that use a wide-area file system to store and transfer data whenever possible.

3.1 File System Access with WWW

Clients refer to a document stored in the WWW with a Uniform Resource Locator (URL) [2]. A URL consists of a naming scheme, a network host address, and a location-specific document name. The URL specification provides for file system access through the FILE naming scheme. For example, the URL:

  file://localhost/afs/transarc.com/public/www/Announce.html
refers to a document in the local file system. To resolve this URL, WWW clients such as Mosaic simply open the file in the local file system. If the host name in the URL is not localhost, then the URL is resolved using the FTP protocol.

The WWW implementation of the FILE URL assumes that the file name space is host-specific, as it is for transfer protocols such as HTTP and FTP. However, wide-area file system clients can access all files, even those on remote file servers, through a single name space, so the host name is irrelevant and the file should be opened locally. For example, the following URL can be resolved locally by any wide-area file system client, regardless of location, without using FTP:

	file://grand.central.org/afs/grand.central.org/public/www/Home.html
To correct the WWW assumption that physically remote files reside in a separate name space, we propose a modified scheme for resolving URLs that prefers file access through the locally mounted wide-area file system, even if the host is not local. For AFS clients, the WWW client converts any URL of the form:
		file://HOST/afs/CELL/path/name
into a URL for an AFS file that can be accessed locally:
		file://localhost/afs/CELL/path/name
If the file is not accessible through AFS, then the URL is converted into an FTP request:
		ftp://HOST/afs/CELL/path/name
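
The resolution scheme above can be sketched in a few lines of Python. This is only an illustration, not the code of the modified client: the names AFS_FILE_URL and resolve_afs_url are hypothetical, the accessibility check is assumed to be a plain path-existence test on the locally mounted /afs tree, and leaving localhost URLs unchanged is likewise an assumption.

```python
import os
import re

# Hypothetical pattern for URLs of the form file://HOST/afs/CELL/path/name.
AFS_FILE_URL = re.compile(r"^file://([^/]*)/afs/(.*)$")


def resolve_afs_url(url, exists=os.path.exists):
    """Prefer the locally mounted AFS path; otherwise fall back to FTP.

    `exists` is an assumed accessibility check (default: os.path.exists).
    """
    match = AFS_FILE_URL.match(url)
    if match is None:
        return url  # not an AFS-style FILE URL; leave it unchanged
    host, rest = match.groups()
    if host in ("", "localhost"):
        return url  # already refers to the local file system
    local_path = "/afs/" + rest
    if exists(local_path):
        return "file://localhost" + local_path
    # The file is not reachable through AFS: ask the named host via FTP.
    return "ftp://" + host + local_path
```

Passing a different exists function makes the sketch usable on a machine without an AFS mount, for example to check which form a given URL would take.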

3.2 URL Translation

The example described in the previous section illustrates the benefit of converting URLs into a form that facilitates document access through a wide-area file system. The translation facility requires only minor changes to the WWW client and is general enough to accommodate translations for other types of URLs.

We augmented the WWW client with a URL translation table that converts URLs to a preferred form. The translation table specifies a list of translation rules. Each translation rule consists of a regular expression and its modified version, obtained by making substitutions and simple transformations in the original expression. Every requested URL is matched against the regular expression of each translation rule. If a match succeeds, the URL is translated into the modified version, which, in turn, is used to access the document. If this attempt fails, the next matching rule is used. If all matching rules fail, the client returns the original URL.

For example, the translation described in the previous section is accomplished through the following two rules:

		file://[^/]*/afs/(.*)      file://localhost/afs/\1
		file://([^/]*)/afs/(.*)    ftp://\1/afs/\2
Similarly, the translation rule:
	http://www.cs.cmu.edu/afs/cs/(.*)   file://localhost/afs/cs.cmu.edu/\1
defines preferred access for documents exported by the www.cs.cmu.edu HTTP server.

The order of rules in the translation file is important; a rule is used only when all previous candidates fail. The rules can also be of the form "no translation", meaning that the matching URLs should not be translated. This allows full customization and expression of preferences within the translation facility.
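
A minimal sketch of such an ordered translation table, using Python's re module, is shown below. The names RULES, NO_TRANSLATION, candidates, and fetch are hypothetical, as is the try_access callback that stands in for the client's attempt to retrieve a document. The sketch also assumes that a matching "no translation" rule stops the search so that the original URL is used; the text does not spell out this detail.

```python
import re

NO_TRANSLATION = None  # hypothetical marker for a "no translation" rule

# (pattern, replacement) pairs, tried in order; taken from the examples above.
RULES = [
    (r"file://[^/]*/afs/(.*)", r"file://localhost/afs/\1"),
    (r"file://([^/]*)/afs/(.*)", r"ftp://\1/afs/\2"),
    (r"http://www.cs.cmu.edu/afs/cs/(.*)", r"file://localhost/afs/cs.cmu.edu/\1"),
]


def candidates(url, rules=RULES):
    """Yield translated URLs for every matching rule, in table order."""
    for pattern, template in rules:
        match = re.fullmatch(pattern, url)
        if match is None:
            continue
        if template is NO_TRANSLATION:
            return  # assumed semantics: stop and leave the URL untranslated
        yield match.expand(template)


def fetch(url, try_access, rules=RULES):
    """Return the first translation that try_access accepts, else the original URL."""
    for candidate in candidates(url, rules):
        if try_access(candidate):
            return candidate
    return url
```

Placing a "no translation" rule ahead of the general rules exempts the URLs it matches, which is one way the rule order expresses preferences.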

There are times when a preferred translation no longer offers a route to the document. To alleviate this problem, our prototype of the translation facility uses a "3-out-of-5" heuristic for assessing the quality of a translation. When a rule fails three times during the last five attempts, that rule is no longer used during the current browsing session.
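
The heuristic can be sketched with a fixed-size window of recent outcomes per rule. The class name RuleHealth and its fields are hypothetical, and the prototype may keep its bookkeeping differently; the sketch only assumes that each attempt reports success or failure.

```python
from collections import deque


class RuleHealth:
    """Track the last five outcomes of one rule; disable it after three failures."""

    def __init__(self, window=5, max_failures=3):
        self.outcomes = deque(maxlen=window)  # True = success, False = failure
        self.max_failures = max_failures
        self.disabled = False

    def record(self, success):
        """Record one attempt and update the disabled flag."""
        self.outcomes.append(success)  # the oldest outcome drops out automatically
        if self.outcomes.count(False) >= self.max_failures:
            self.disabled = True  # stays disabled for the rest of the session
```

A client would keep one such object per translation rule and skip any rule whose disabled flag is set.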

The translation facility acts like a proxy server [10]. The advantage of building the translation facility into the WWW client is reduced document access time: the client does not have to contact a proxy server, but can access documents directly, most often through a wide-area file system. The translation facility is general enough to be used with other wide-area file systems, proxy servers, and WWW gateways.


Mirjana Spasojevic, C. Mic Bowman, and Alfred Spector.