1. Introduction

The dramatic increase in networked information access through the World-Wide Web [1] demonstrates the value of wide-area information sharing. However, the Web faces many challenges as it continues to scale to accommodate the ever-increasing number of users. Among these challenges are increased server and network load, network latency and inadequate security.

Several decades of distributed systems research address the problems of scale. Mechanisms for caching, replication, location transparency and authentication facilitate improved performance, reliability and security. Past research and commercial use show that the distributed file system paradigm, in particular, is sustainable at a very large scale. Commercial wide-area file systems are capable of providing effective access to several terabytes of data [12].

This paper proposes an architecture that addresses many problems experienced by current Web clients and servers. The paper discusses the impact of wide-area file system features such as location transparency, access control lists, authentication, client caching, data replication, and file migration. It demonstrates how these features can be used to improve performance, decrease server and network load, and increase security. Although we use AFS [7] for our study, we believe that our experience demonstrates how a global, general-purpose file sharing system can be used successfully as an information retrieval service.

The paper presents a simple WWW modification that attempts to retrieve files, whenever possible, through a locally mounted wide-area file system. Files available through HTTP (a common WWW file transfer protocol) or FTP servers are often also stored in the wide-area file system and can be retrieved more efficiently through the file system interface. To accomplish this, we have developed a translation facility for World-Wide Web clients that converts a WWW uniform resource locator (URL) into the name of a file in a locally available distributed file system. The translation facility is general enough to be used with other wide-area file systems, proxy servers and gateways.
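The idea of URL translation can be illustrated with a minimal sketch. It is not the facility described in Section 3; it only assumes that translation is driven by a table of prefix rules. The table entries, the host names, the /afs/example.edu paths, and the function names translate_url and open_document are all hypothetical and chosen for illustration.

```python
import os
from urllib.parse import urlsplit

# Hypothetical rules: (scheme, host, URL path prefix, file system prefix).
# These entries are illustrative assumptions, not taken from the paper.
TRANSLATION_TABLE = [
    ("http", "www.example.edu", "/pub/", "/afs/example.edu/public/www/"),
    ("ftp", "ftp.example.edu", "/", "/afs/example.edu/public/ftp/"),
]


def translate_url(url, table=TRANSLATION_TABLE):
    """Return a candidate local file name for url, or None if no rule matches."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Refuse paths with ".." segments so a URL cannot escape the mapped subtree.
    if ".." in parts.path.split("/"):
        return None
    for scheme, rule_host, url_prefix, fs_prefix in table:
        if (parts.scheme == scheme and host == rule_host
                and parts.path.startswith(url_prefix)):
            return fs_prefix + parts.path[len(url_prefix):]
    return None


def open_document(url, fetch_remote):
    """Read the document through the file system if possible, else fetch it remotely."""
    path = translate_url(url)
    if path is not None and os.path.isfile(path):
        with open(path, "rb") as f:
            return f.read()
    # No rule matched or the file is not locally visible: use the caller's fetcher.
    return fetch_remote(url)
```

In this sketch, a client would call open_document with its ordinary HTTP or FTP retrieval routine as fetch_remote, so the file system path is tried first and the network protocol remains the fallback.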

Storing documents in a wide-area file system offers many advantages, including efficient document retrieval. The ability to access documents quickly encourages the formation of an organizational repository for documents of interest to a group of users. When these documents are stored in a wide-area file system, they are accessible to any client. We describe the construction and use of document repositories that cache documents not otherwise available through the wide-area file system.

The next section discusses features of a wide-area file system and how they can alleviate current Web problems. Section 3 describes the URL translation facility. Section 4 presents the design of a document repository. Finally, Section 5 provides conclusions and suggestions for future extensions of wide-area file systems.

Mirjana Spasojevic, C. Mic Bowman, and Alfred Spector.