WWW5 Fifth International World Wide Web Conference
May 6-10, 1996, Paris, France


Towards a World-Wide Data Base

Erik Sandewall
Department of Computer and Information Science
Linköping University
58183 Linköping, Sweden


Introduction and summary

The present paper proposes that a database can often be organized as a large collection of small text files, each containing the structured information that relates to a particular object. This means that the basic idea of WWW pages, small text files that are cross-linked by symbolic addresses, is generalized to become a database technique as well. The database pages are expressed in HORL notation (HyperObject Representation Language). Just as a WWW browser accesses and reads HTML pages dynamically as it needs them, our WWDB database system accesses and reads HORL pages as it needs them in the course of its data processing operations. When an HORL page is read by the database system, its contents are converted to an internal representation, and it does not have to be read again during the same session.

The WWDB technique can easily be combined with WWW usage. It was developed as a tool for generating WWW pages with structured contents, for example annotated publication lists and directories of authors, journals, and conferences. It has also been put to a second use, in a mail management system. Our experience suggests that it may be an attractive and viable technique for many kinds of low-to-medium duty database applications, in particular those where the database is a source of information rather than, e.g., the basis of a transaction processing system.

In this paper, we first describe the application within which the WWDB concepts were developed, and then proceed to a description of the essential technical characteristics of the existing, experimental implementation. Finally we discuss what conclusions can be drawn from the project so far, and the perspectives for continued use of this technique.

Application: The Electronic Colloquium

A scientific colloquium is a group of scientists who meet more or less regularly in order to exchange information about recent developments in their field. Besides presentation of new research results, the colloquium is also an important source of references to new results: interesting new articles are mentioned, conferences are advertised, and so on. One can think of it as a joint information-gathering activity: since no single member is able to continuously scan all the possibly relevant information that appears in journals and other media, the colloquium members share the work and tell each other about what is worth knowing.

An electronic colloquium generalizes this concept to the electronic arena, and the WWW is an ideal substrate for it. Simply put, the colloquium home page offers a menu of specific services, in particular:

The important thing about a colloquium is that it should have a very clear focus on a particular research topic. Articles addressing that topic may appear in any one of a large number of journals or conferences, but in each of them only a small percentage of the contents is actually relevant to the colloquium topic; the colloquium must therefore be highly selective. Ideally, it will present all relevant contributions from those sources, and no irrelevant ones. The colloquium members define the focus and perform the selection.

Electronic colloquia of this kind are particularly important on the European scene, where they offer a possibility for researchers in different countries to obtain continuous interaction with a group of sufficient critical size.

The Compulog project of Esprit (European Union research program in information technology) has recently started an electronic colloquium for spatial and temporal reasoning (ECSTER). This is a sub-area of research in knowledge representation and artificial intelligence, dealing with logical and algorithmic methods for reasoning about actions and their effects, developments over time, etc. Planning, scheduling, and diagnosis based on temporal and spatio-temporal data are some of its application areas. It is an example of a specialized research topic, containing work that ranges from the highly theoretical to the quite practical, where an electronic colloquium would be of interest. The present number of active researchers in Europe in this area is estimated to be around one hundred, most of them working in isolation or in small local groups.

For obvious reasons, we chose to use the WWW as the information carrier for ECSTER. At first, experimental versions of the important colloquium pages were set up completely manually, using a text editor, but it soon became apparent that this was inefficient and inconvenient. The problem lay not only in having to write HTML syntax, but also in the redundancy of the actual information contents: the same data tended to appear repeatedly in multiple contexts. Furthermore, in order to keep the accumulated information in reasonable order, it was necessary to organize it in a multi-level directory structure under the operating system being used (Unix), but the chore of locating files at different directory levels was a nuisance in itself. Finally, we wished to have a clear separation between the working version and the public version of each HTML file, so that the maintainer of a page or substructure could work on it until satisfied, and only then release it for public viewing.

In summary, the practical overhead concerned both the structure within the WWW pages and between them. Furthermore, the same problems arose with other information that we were dealing with, besides the WWW pages. Distribution of published papers is an important function of an electronic colloquium, and the various aspects of a paper (full text, abstract, commentary, annex containing experimental data or software and its documentation, etc.) impose administrative burdens that are fairly analogous to those that arise for the WWW pages in HTML format.

It became clear very soon, therefore, that we needed to introduce a structured representation for these kinds of information. This structured representation should be a database in the sense of having little or no redundancy, so that each essential information element is only represented once, and it should lend itself easily to being processed in operations that combine related information elements. The HTML representation should then be generated from that database, or, to be precise, large sections of the HTML files should be generated from underlying structured data. There must always be some parts which only serve presentation purposes, and which continue to be best written in HTML. For example, an HTML page may have the following essential structure:

  1. Heading, identifying the page as a part of the ECSTER structure.
  2. Introductory text describing the purpose of this particular page.
  3. A list of things (papers, authors, or conferences, for example), with links to more information about each of them, and which is generated from underlying structured data.
  4. Additional information which is so unstructured that it does not fit well into the data base, but which should anyway be present.
  5. Footing, identifying the person responsible for the present page, and the date of last update.
In such a case, items 1, 3, and 5 might be generated automatically, and items 2 and 4 would be written manually. There must therefore be a convenient way of regenerating one or more of items 1, 3, and 5, while retaining the manually edited parts 2 and 4.

We believe that this situation is typical of many "bread and butter" applications of WWW. Naturally, home pages and other pages that a user encounters more or less immediately must be more interesting and less standardized, but as regards those pages that serve a productive purpose, it seems that the information one wants to present is often structured, and can best be generated from an underlying representation. The availability of a richer presentation language, with audio, animation, color, and embedded video capabilities does not change the essential situation: if anything, it will increase the need for a structured representation of the information. We will return to this topic in the final section of the paper.

If HTML pages are generated from underlying data, there is a choice whether the generation is to be done in advance, under the direction of the person editing the data, or on demand as the user accesses the information. The difference is a practical one, since the generation process is quite similar in both cases. In our application there has not yet been any strong reason for on-line generation of HTML, so we have chosen the former alternative so far. The methods proposed below would work equally well in the case of on-line generation, however.

For the representation of the structured data, the most obvious choice might have been to use a conventional database system. However, we chose instead to organize our database using large numbers of small text files, which are expressed in the HORL syntax. The resulting database is still of moderate size, but it has the inherent capability of growth that is suggested by its name, a World-Wide Data Base. We now proceed to describe this design and the reasons why it was chosen.

The World-Wide Data Base

General

Like most software systems, the WWDB has a basic design that captures the key ideas, and then a number of modifications and extensions that answer specific needs. We now describe the basic design.

The WWDB is an object-oriented database in the literal sense that it is organized as a collection of objects each of which has a number of properties. Objects are classified into types; objects of the same type have similar sets of properties. Other notions that are often associated with the term "object-oriented", such as message-passing and inheritance, are not presently used in the WWDB. In database terminology, the WWDB may be described as a binary database.

Each object has a name and a description. The object's name is like an identifier in an ordinary programming language. The description is an expression that maps labels to properties. For example, the combination of the name |France| and the type |countries| may be assigned the following description:

      { CAPITAL ~ |Paris|,
        CURRENCY ~ FRF,
        NEIGHBORS ~ { |Belgium|, |Germany|, |Switzerland|, 
                      |Italy|, |Spain|, |Andorra| }}
where the tilde character is to be read as an arrow, connecting a label and the corresponding property. The description is the real "object"; several objects may have the same name, but for each combination of a name and a type there may be at most one description. (For example, if persons are denoted by their last name, then the combination of |France| and |persons| may represent Anatole France). Properties may be names, or sets of things, but also numbers, strings, sequences of things, new mappings, etc.
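To make the rule "at most one description per combination of name and type" concrete, here is a minimal Python sketch; it is an illustration, not the Xlisp implementation. It assumes Python dicts, sets and strings as stand-ins for HORL mappings, sets and names (so the distinction between |Paris|, FRF and strings is flattened), and the FIRSTNAME label is hypothetical.

```python
class ObjectStore:
    """Maps each (name, type) combination to at most one description."""

    def __init__(self):
        self._descriptions = {}

    def put(self, name, type_name, description):
        # Storing again under the same (name, type) replaces the old entry,
        # so there is never more than one description per combination.
        self._descriptions[(name, type_name)] = description

    def get(self, name, type_name):
        # Returns None if no description exists for this combination.
        return self._descriptions.get((name, type_name))


store = ObjectStore()
store.put("France", "countries", {
    "CAPITAL": "Paris",
    "CURRENCY": "FRF",
    "NEIGHBORS": {"Belgium", "Germany", "Switzerland",
                  "Italy", "Spain", "Andorra"},
})
# The same name under a different type is a different object.
store.put("France", "persons", {"FIRSTNAME": "Anatole"})  # hypothetical label

print(store.get("France", "countries")["CAPITAL"])   # Paris
print(store.get("France", "persons")["FIRSTNAME"])   # Anatole
```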

So far, this is quite conventional, and it should be clear how one can build a database with authors, publications, universities, cities, countries, journals, conferences, and so on as some of the types. Rather than storing these objects and object descriptions in an ordinary database system, we chose to create one file for each object, and to store the description in textual form in that file. Instead of a database system, we now have a database browser, that is, a program that reads database text files as it needs them.

One of the uses of the database browser is interactive updating of the database: adding more information, or correcting its existing data contents. Typically, this usage is interleaved with the generation of HTML pages. Other tasks are also evident, such as database search and consistency checking, but so far the HTML generation task has dominated in our applications.

Connection between HTML and WWDB

Let us discuss how the WWDB browser is used in the context of maintaining WWW pages. A particular example is the ECSTER page containing a list of authors in its area. This page exists in three versions. There is a public version, which is the page normally seen by colloquium members and other visitors. There is also a working version, which is similar to the public version, but which is used in the interactive loop where the database is alternately edited and HTML pages are generated. In this way, one does not have to put a page on the net until one is sure that the generation works correctly. Finally, there is an extended version, where the maintainer may keep additional information that is potentially relevant and may later be included in the public version, but not at present.

The full ECSTER structure consists of a number of such pages, which are linked in an approximate tree structure. The public versions and the working versions are linked as parallel structures, so that a public page links to a public subpage, and a working page to the corresponding working sub-page. Only on exit from the structure, for example in references to the full text of an archived article, or the reference to the home page of a researcher, do the parallel structures converge to common points.

Extended versions do not form a third parallel structure. Instead, an extended-version page links to working versions of subordinate or neighboring pages.

The working version of a page is in principle the same as the public version plus the recent improvements, except that at the top of the working-version page there is a short section containing three links: to the corresponding public version, to the corresponding extended version, and to the WWDB browser. The routine when maintaining this page structure is therefore to move around the structure of working pages and change them until satisfied, all the time seeing them exactly as the visitor sees the published pages. Only when one wishes to compare the present working version with the present published version does one use the link at the top of the working page. The link from the working page to the extended page is used similarly for reference to related data.

The on-line reader is invited to visit ECSTER's public home page and the working home page, as well as their respective sub-pages, in order to see how this works. The clickable item "[revision]" locally invokes the WWDB system; remote users will only see the Lisp code, not its execution.

As discussed above, each page typically contains some parts that are to be edited directly at the HTML level, and some parts that are to be generated from the database. The directly edited parts are modified in the usual fashion, for example using the editing capability of the HTML browser, or using a plain text editor (we are currently using Emacs for this purpose). The automatically generated parts are delimited by a separate quasi-HTML command, label, so the text of an HTML page may have the following structure:

   <label heading>
   Automatically generated heading
   </label heading>
   Text pertaining to manually written and edited parts...
   <label contents>
   Automatically generated part
   </label contents>
   More text pertaining to manually written and edited parts...
   <label footing>
   Automatically generated footing
   </label footing>
Of course, the label and /label commands are ignored by HTML browsers, and for the maintainer of the page they indicate that whatever goes between <label x> and </label x> shall be left alone, since it will be regenerated anyway.
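The regeneration rule just stated (replace whatever lies between <label x> and </label x>, keep every other line) can be sketched in a few lines of Python. This is an illustrative sketch, not the WWDB code; it assumes the label lines appear on lines of their own, as in the structure shown above.

```python
def regenerate_segment(lines, label, new_body):
    """Return a copy of lines in which everything between <label LABEL> and
    </label LABEL> is replaced by new_body; all other lines are kept verbatim."""
    start_tag, end_tag = f"<label {label}>", f"</label {label}>"
    out, inside, found = [], False, False
    for line in lines:
        tag = line.strip()
        if tag == start_tag and not inside:
            out.append(line)
            out.extend(new_body)          # freshly generated content
            inside, found = True, True
        elif tag == end_tag and inside:
            out.append(line)
            inside = False
        elif not inside:
            out.append(line)              # manually edited text is kept as is
        # Lines inside the segment are dropped: they are about to be regenerated.
    if inside or not found:
        raise ValueError(f"segment {label!r} is missing or unterminated")
    return out
```

Because the manually written lines are copied through untouched, the same function can be applied in turn to the heading, contents, and footing segments of one page.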

In order to change the autogenerated information, for example for adding one more author, or updating the information about a particular conference, the maintainer clicks the [revision] link at the top of the working page. This invokes or resets the WWDB browser, which is put in a state where the database object corresponding to the current HTML page is the current object, and |webpages| is the current type. The maintainer may then use the database browser to update the data structures, and finally invoke commands that regenerate the current HTML page.

Concretely, suppose the current HTML page has the filename

        /info/www/ext/brs/researchers/index-wv.html
and that it contains three autogenerated segments as described above. The corresponding database object is stored as the file
        /info/www/ext/brs/researchers/index-wv.horl
with contents which may look as follows (simplified form):
    { META ~
        { ACCESSPATH ~ "/info/www/ext/brs/authors",
          OBJNAME ~ |author-index|,
          FILENAME ~ |index-wv|,
          PUBLNAME ~ |index|,
          EXTENDNAME ~ |index-xv| },
      TITLE ~ "Catalogue of authors in the ECSTER area",
      FORMAT ~ |ecster-page|,
      LANGUAGE ~ |english|,
      GENERATORS ~ {
        |heading| ~ (GENERATE |ecster-heading|),
        |contents| ~ (ALLMEMBERS |authorlist| |author-display|),
        |footing| ~ (GENERATE |ecster-footing|) }}
The name of this database object is author-index; it could not simply be index or index-wv, since many different HTML pages are called index. The object description contains enough information to reconstruct the full file names of the working version, the public version, and the extended version of the HTML page, since both the access path and the respective file names are given. It also contains relevant parameters, such as the language in which the page is written, which is needed in order to write language-independent generators. Finally, it specifies the generator methods for regenerating the three auto-generated segments.
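The reconstruction of the three page file names from the META property can be sketched as follows. This is a minimal Python sketch assuming that the HTML extension is .html and that the description file is named after FILENAME with extension .horl, following the pair index-wv.html / index-wv.horl above; the function name page_files is hypothetical.

```python
import posixpath

def page_files(meta, html_ext=".html", horl_ext=".horl"):
    """Derive the file names of one web-page object from its META property."""
    path = meta["ACCESSPATH"]

    def full(stem, ext):
        # (access path) + (file name) + (extension)
        return posixpath.join(path, stem + ext)

    return {
        "working": full(meta["FILENAME"], html_ext),
        "public": full(meta["PUBLNAME"], html_ext),
        "extended": full(meta["EXTENDNAME"], html_ext),
        "description": full(meta["FILENAME"], horl_ext),
    }

# The META property of |author-index| from the example above.
META = {
    "ACCESSPATH": "/info/www/ext/brs/authors",
    "OBJNAME": "author-index",
    "FILENAME": "index-wv",
    "PUBLNAME": "index",
    "EXTENDNAME": "index-xv",
}
```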

To use those methods, the maintainer uses the interactive commands rg and rgp. Writing

    rg contents
to the WWDB browser (or selecting the same command from pull-down menus; not implemented at present) when |author-index| is the current object, will cause the current HTML working page to be regenerated, retaining all lines except the text between the lines <label contents> and </label contents>. The expression (ALLMEMBERS |authorlist| |author-display|) in the object description specifies the recipe for this generation process: the object |authorlist| is an object containing a list (ordered set) of authors, which is here used as the basis for generation, and |author-display| is a script specifying how to generate the appropriate HTML expressions based on a given object of type author. (Naturally, |authorlist| may be used in several different contexts.) The operation ALLMEMBERS looks up the list of members represented by its first argument, and generates HTML code for each of them in succession using the script of the second argument.
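The behaviour of ALLMEMBERS described above can be illustrated with a short Python sketch. It is not the WWDB script language: the in-memory db dict, the type names "lists" and "authors", the MEMBERS and NAME labels, and the author_display function are all hypothetical stand-ins for |authorlist| and |author-display|.

```python
from html import escape

def author_display(author):
    """Hypothetical display script: one HTML list item per author object."""
    return f"<li>{escape(author['NAME'])}</li>"

def allmembers(db, list_name, script, list_type="lists", member_type="authors"):
    """Sketch of ALLMEMBERS: look up the ordered member list of list_name and
    generate HTML for each member in succession using script."""
    members = db[(list_name, list_type)]["MEMBERS"]
    return [script(db[(name, member_type)]) for name in members]
```

Keeping the list object and the display script separate is what allows the same |authorlist| to be reused with other scripts in other pages.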

As the WWDB browser is invoked from the working page, only the program and a kernel set of objects are loaded. Additional object descriptions are loaded from their text files as they are needed. For example, the first time the command rg contents is given in the example, it will cause the WWDB browser to load the description of the object |authorlist| from its file, and then in turn it will load the descriptions of all the authors that are in the author list. The presentation of these authors may in turn require the loading of additional objects. For example, the current affiliation of an author may be represented by specifying a WWDB name, for example |TU-Munich|, and then the corresponding description has to be loaded in its turn. An encouraging observation from the present experimental implementation is that this loading of successive objects can be performed quite rapidly, and does not pose any practical performance problems.

Loaded objects are retained in working memory, so the next time the same rg command is issued they do not have to be re-loaded.
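The load-on-demand behaviour, including the transitive loading of an affiliation and the retention of loaded objects, is sketched below. In this Python sketch the read_description callback stands in for reading a .horl file, the loads counter exists only to make the caching visible, and the type name "universities" and the NAME and AFFILIATION labels are hypothetical.

```python
class LazyBrowser:
    """Loads object descriptions on first use and keeps them in memory."""

    def __init__(self, read_description):
        # read_description(name, type_name) -> dict; stands in for a file read.
        self._read = read_description
        self._loaded = {}
        self.loads = 0  # number of actual reads, for illustration only

    def get(self, name, type_name):
        key = (name, type_name)
        if key not in self._loaded:
            self._loaded[key] = self._read(name, type_name)
            self.loads += 1
        return self._loaded[key]


def display_author(browser, author_name):
    author = browser.get(author_name, "authors")
    # The affiliation is itself a WWDB name, so its description is loaded in turn.
    affiliation = browser.get(author["AFFILIATION"], "universities")
    return f"{author['NAME']} ({affiliation['NAME']})"
```

Two authors sharing the affiliation |TU-Munich| cause three reads on the first pass and none on a repeated pass.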

The operation GENERATE, by contrast, is a simpler operation which generates HTML expressions according to the formatting directive of its single argument, in the presence of the current object. For example the |ecster-heading| format will use the HEADING property of the current object for both the HTML TITLE lines and the first-level headings.

Thus the routine of the web-page maintainer is to modify the database, using the viewing and editing commands of the WWDB browser, and to regenerate the HTML working page from time to time using the rg command. Since the WWW-HTML browser and the WWDB browser appear in separate windows on the workstation, it is easy to reload the regenerated HTML page, look at it, and return to the WWDB browser as necessary.

Finally, when the new HTML page or set of pages is satisfactory, one regenerates the public page(s) as well, using the command rgp (for "regenerate public"), as in
    rgp contents

What has now been described is the basic organization of the system. Additional modifications can easily be introduced into the same architecture. For example, if a given update or set of updates of the database affects a number of HTML pages, it would be desirable to keep track of those dependencies, regenerate all affected pages, and inform the user of which pages have been changed. The existence of a database with flexible data structures is a sound basis for implementing such services.
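One possible way to keep track of such dependencies, sketched here in Python as an assumption rather than a description of the existing system, is to record which objects each page read during generation and then invert that record after an update. The page name "conference-index" is hypothetical.

```python
from collections import defaultdict

class DependencyTracker:
    """Records which objects each generated page read, to find affected pages."""

    def __init__(self):
        self._readers = defaultdict(set)  # (name, type) -> set of page names

    def record(self, page, obj_key):
        # Called whenever generation of `page` reads the object `obj_key`.
        self._readers[obj_key].add(page)

    def affected_pages(self, updated_keys):
        # All pages that read at least one of the updated objects.
        pages = set()
        for key in updated_keys:
            pages |= self._readers.get(key, set())
        return sorted(pages)
```

The resulting list could then be used both to drive one rg command per affected page and to tell the user which pages were changed.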

For another example, if the manually written (non-automatic) parts of a working page have been changed and are to be given public status, then all links to working versions of subpages or other related pages must be replaced by links to the corresponding public versions. This requires a systematic scan of the entire text, either removing all substrings of the form -wv or (more reliably) removing such substrings only where they appear in an appropriate context, and otherwise issuing a warning message.
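The more reliable variant, which rewrites -wv only in an appropriate context and warns otherwise, might look like the following Python sketch. It assumes that the appropriate context is a double-quoted href attribute ending in -wv.html (following the index-wv.html naming above); single-quoted or differently written links are left alone and reported by the warning scan.

```python
import re

# href="...-wv.html" or href="...-wv.html#anchor"; assumed link convention.
HREF_WV = re.compile(r'(href="[^"]*?)-wv(\.html[^"]*")')

def publish_links(html_text):
    """Rewrite links to working versions into links to public versions.

    Returns the rewritten text and a list of warnings for any remaining
    '-wv' substrings that did not appear in a recognised link context.
    """
    rewritten = HREF_WV.sub(r"\1\2", html_text)
    warnings = []
    for lineno, line in enumerate(rewritten.splitlines(), start=1):
        if "-wv" in line:
            warnings.append(f"line {lineno}: unhandled '-wv': {line.strip()}")
    return rewritten, warnings
```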

Details of the current implementation

The present implementation is a single-user system which is being used regularly as a working tool for maintaining the ECSTER information structures. (It is also used locally as a mail manager, and for administering the user's own publications.) The WWDB browser has been implemented in Xlisp, which is a variant of CommonLisp [Steele, 1984]. The Xlisp implementation is available for Unix, PC, and Macintosh platforms. The present system runs in a Unix environment on Sun workstations under the Solaris operating system.

The reason for choosing CommonLisp for the experimental system was that the operations of printing and reading datastructures are built into the language, so that the transfer between the text-file representation and the in-memory representation of the object descriptions is trivial. An additional reason was that CommonLisp datastructures lend themselves easily to the implementation of embedded sublanguages, such as the script language for defining HTML generators.
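For readers working outside Lisp, a rough Python analogue of the built-in print/read round trip is sketched below, with one description per text file. It assumes Python literal syntax as a stand-in for HORL or Lisp notation; ast.literal_eval evaluates only literals, which keeps the "read" direction safe.

```python
import ast
import pprint
from pathlib import Path

def save_description(path, description):
    """Write a description to its own text file (the 'print' direction)."""
    Path(path).write_text(pprint.pformat(description) + "\n", encoding="utf-8")

def load_description(path):
    """Read a description back from its text file (the 'read' direction).

    ast.literal_eval accepts only Python literals (dicts, sets, tuples,
    strings, numbers, ...), so loading a file never executes code in it.
    """
    return ast.literal_eval(Path(path).read_text(encoding="utf-8"))
```

Any language with a comparable pair of functions, or a small hand-written parser and printer, makes the text-file representation equally cheap to support.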

The present implementation is experimental, and has been written without any particular consideration of efficiency. In spite of this, it operates with quite adequate speed. The loading time for the WWDB system is 9 seconds on a SPARCstation 10, provided that the LAN does not slow it down. (Typically it is only loaded once a day, and then used repeatedly throughout the day.) The time for regenerating the list of authors in the example document, with its current 170 members, was measured at 32 seconds for the first-time generation, where all participating objects, including objects for affiliations, cities, etc., have to be loaded dynamically, and at 9 seconds for repeated generations. This is quite adequate, but if it becomes a problem as the database grows, one can easily divide the list into several sections that are regenerated separately. Then only the affected sections need to be regenerated after a database update session. (The separation into sections is invisible to the viewer of the generated HTML page.) These figures were obtained with interpreted code and without any particular effort to optimize the implementation.

The technique described here can be implemented quite compactly. The following figures for the size of the present program show that it is easy to implement and re-implement a WWDB browser. The figures refer to lines of Lisp S-expressions, spaciously printed:

  • 600 lines for the core program
  • 800 lines of general-purpose definitions of scripts and interactive commands
  • 600 lines of scripts (mostly) and interactive commands for publications and related concepts in the colloquium application
  • 800 lines of similar material for the E-mail management application.
In other words, at 50 lines per page, the core program is about 12 pages of CommonLisp code. There is of course no reason why it cannot be implemented in any other language.

The main limitation of CommonLisp and Xlisp at present is the limited access to screen dialogue capabilities. An obvious alternative would be to use Java, which would remedy that limitation. On the other hand, it would require a separate implementation of a package for printing and reading data structures. The same requirement would arise if the work were redone in, e.g., C++.

For convenience in the development stage, we are using standard Lisp I/O of data structures (that is, Lisp's read and print functions) in parallel with the HORL representation.

The program will be made freely available: after some additional polishing, we intend to distribute the program via ftp and the documentation via WWW.

Discussion

The presently implemented system has the advantage of simplicity: with a small implementation effort it has been possible to realize a tool which is reasonably general, and which works well for the intended first application. However, the purpose of the present article is to promote the general idea embodied in the program, rather than the program itself. We shall now address the design idea from several distinct perspectives.

Separation of content and appearance in WWW languages

In spite of its intentions, HTML does in fact combine content and appearance. It is true that the contents are presented in a somewhat media-independent format, but HTML nonetheless remains a markup language: an HTML file contains everything that is going to be written on the screen, plus high-level information about how it is to be written.

The additional step that has been taken by WWDB is to make a much more complete separation of content and appearance, and to organize content as a database, while at the same time retaining the distributed text-file organization of WWW. Appearance has not been an issue for the present project: we are satisfied with the appearance capabilities offered by HTML for the time being, which is why we manage by generating HTML pages.

Java, which is generally viewed as the next step of development after HTML, is a programming language, no more, no less. Improved appearance capabilities are its particular strength, so in this respect it represents a development orthogonal to the one shown by WWDB. It follows that WWDB and Java together would most likely be a very powerful combination.

One must recognize, however, that an absolute separation of content and appearance is not possible. It must be understood as a guiding principle, and not as a strict rule.

The world-wide perspective

The example above showed how a WWDB object may have a property containing the file name of an HTML file; this file can then be read and regenerated from the WWDB browser. In the same way, a WWDB object can have a property containing the file name of an HORL file, which is how the browser can start from some objects and successively read additional ones. We first used this technique within the same computer system, but we have also started to use it with arbitrary URLs as properties, allowing the WWDB browser to access and read HTML or HORL files from foreign servers. Let us first describe how this works within the local file system, and then discuss the extension.

Description-file retrieval within one file system. Briefly, the details are as follows. For every combination of an object name and a type name, the WWDB browser must be able to retrieve the full name of the file containing the object description, so that it can then load the contents of that file. The full name is constructed as (access path) + (file name) + (extension), where the extension is standardized as .horl for the hyperobject representation language used above, and .lsp if the same information is expressed in classical CommonLisp format. The retrieval process assumes that the description of the type is already available; if it is not, retrieval is first called recursively with the type name as the new object name, and with |types| as the new type. (Types are, of course, a special kind of object.) Then, two main cases are allowed:

(1) The same access path for all objects of the same type. In this case, the type description contains the access path for the members of the type, and the object name serves as file name. The construction of the full file name for the object is trivial.

(2) Each object in the type has its own access path. In this case, the access path is a property of the object, but not of the type. In fact, it is stored as a subproperty under the property META; an example of this was shown above for the case of an object of type |webpages|. The problem, of course, is that as long as the object description is still only stored as a text file, the browser does not know how to find it.

For this reason, WWDB contains the notion of concierges. A concierge, who is a key person particularly in Paris, is someone to whom you mention a name, and he or she will tell you where to go in order to find the person with that name. Similarly, a WWDB concierge is a WWDB object containing a mapping from names to access paths.

Therefore, the retrieval process which is given an object name, a type name, and the description for that type, will first check with the type description whether this type has a single common access path, or individual paths. In the former case, the type contains the access path and the object name becomes file name. In the latter case, the browser will go through all the currently loaded concierge objects and ask each of them whether they have an appropriate access path for the present combination of object name and type name, until it finds one that can provide the information.
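The two cases and the concierge lookup can be combined in one small resolution function. This Python sketch assumes that a type with a common access path stores it under a hypothetical ACCESSPATH property, that each concierge is a mapping from (object name, type name) to an access path, and, for simplicity, that the object name also serves as file name in case (2); the recursive loading of a missing type description is not shown. The path "/db/authors" is hypothetical.

```python
import posixpath

def description_path(name, type_name, type_desc, concierges, ext=".horl"):
    """Return the full file name of the description of (name, type_name).

    type_desc:  the already loaded description of the type.
    concierges: the currently loaded concierge objects, each a mapping
                (object name, type name) -> access path.
    """
    if "ACCESSPATH" in type_desc:
        # Case (1): one access path for all members; object name is file name.
        return posixpath.join(type_desc["ACCESSPATH"], name + ext)
    # Case (2): individual access paths; ask each loaded concierge in turn.
    for concierge in concierges:
        path = concierge.get((name, type_name))
        if path is not None:
            return posixpath.join(path, name + ext)
    raise LookupError(f"no concierge knows {name!r} of type {type_name!r}")
```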

Types, in particular, have distributed access paths. The initially loaded WWDB system therefore only needs to load the description of the types |types| and |concierges|, a concierge for all types (that is, all members of the type |types|) that may need to be used initially, and concierge(s) for other relevant objects, for example for relevant members of |webpages|.

Description-file retrieval by world-wide access. The access mechanism with concierges which know about access paths has been generalized to allow arbitrary URLs, besides local paths. In this way, it is possible to construct a database that is similar to the vast body of displayable information already existing in HTML format. Individual contributions can be set up locally and made available on the Internet, and these contributions can be accessed and used by the database browsers of other users regardless of where they are. The power of this concept is that the usage of the information is not limited to viewing; it can also be processed, combined with information from elsewhere, and presented in very flexible ways.

The usage for electronic colloquia is important enough, but we foresee that the same technique can be used for much broader purposes. Imagine, for example, a world-wide database containing geographical and historical information: countries, cities, activities in those cities, historical events, and so on. It would be reasonable to start with fairly elementary facts, and then to extend the database by gradually attaching additional information to existing ones. A world-wide database with those kinds of contents could develop into an encyclopaedia that is available freely to everyone (in the same sense and to the same extent as the present WWW is free). More specialized knowledge bases in various academic disciplines might use the same technique.

Some additional constructs would be necessary as the world-wide database becomes larger and larger. The present design requires all participating partners to use the same naming scheme for types and for concierges. The distributed system may accommodate multiple descriptions for a given combination of object name and type name, for types with individual access paths, as long as each user only selects a subset of all available concierges. In this way the user only "sees" one of the descriptions for each object/type combination. But in a world-wide context, it may be necessary to accommodate different uses of the same type name or the same concierge name concurrently. One plausible way of doing that is to allow multiple domains, where each domain consists of a set of information providers, and the present naming scheme is used within each domain. For information exchange between domains, one would use the well-known technique of mediators, that is, devices that translate a query that has been issued in one domain as an object/type combination into a corresponding query in another domain.
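A mediator of the kind mentioned above can be reduced, for illustration, to a rewriting of object/type pairs. The following is a minimal sketch under the assumption that a fixed table maps type names (and, optionally, object names) from one domain to another; real mediators may need far richer translations. The classes Query and Mediator and the domain names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Query:
    domain: str
    object_name: str
    type_name: str


class Mediator:
    """Hypothetical mediator translating object/type queries between two domains."""

    def __init__(self, source: str, target: str,
                 type_map: Dict[str, str],
                 name_map: Optional[Dict[str, str]] = None):
        self.source = source
        self.target = target
        self.type_map = type_map        # source type name -> target type name
        self.name_map = name_map or {}  # optional renaming of object names

    def translate(self, query: Query) -> Optional[Query]:
        if query.domain != self.source:
            raise ValueError(f"query is from {query.domain!r}, not {self.source!r}")
        target_type = self.type_map.get(query.type_name)
        if target_type is None:
            return None  # the target domain has no counterpart for this type
        target_name = self.name_map.get(query.object_name, query.object_name)
        return Query(self.target, target_name, target_type)
```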

The distributed database perspective.

The current wisdom in the database area is that databases are represented by database systems, which are a particular kind of software, and where all data are "owned" by a particular database system. The user is supposed to enter his or her data into the database system, and this system can then be used for performing various operations on the data that are in it. Distributed database systems allow the additional possibility of having several such software systems which run on different computers, and which exchange information as needed. Heterogeneous distributed database systems allow, in addition, that the participating database systems can have different internal structure, that is, they may organize "their" respective data in different ways.

The WWDB approach represents a departure from this traditional mode of thinking. It allows data to be represented in small and simple text files whose contents are open to everyone. One is not dependent on the continued use of a particular piece of database software, since it is very easy to implement and re-implement support for the HORL format. This is demonstrated by the moderate size of the present operational program, whose kernel is a mere 12 pages of code.

Besides bringing independence from any particular software, the compactness of the WWDB design has another important effect: access to the world-wide database can easily be integrated with any user interface, be it a conventional WWW browser, a UIMS, a document preparation system, or a particular application program.

The TSIMMIS project at Stanford University [Garcia-Molina et al, 1995] advocates a tagged object model which is similar to our view of data. However, the notation used in TSIMMIS is at a rather low level compared to the set-theoretic notation used in our HORL. The TSIMMIS authors do not report using access paths or URLs as first-class objects in their database.

The WWDB approach goes against current trends in the database area in another respect as well: large main-memory databases are presently a subject of considerable interest. Although these are important for many applications, one cannot hold a world-wide database in core. The WWDB approach instead uses a browser-like database tool that loads HORL pages as it needs them.
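The browser-like loading behaviour, where a page is read and converted only when it is first needed and then kept for the rest of the session, amounts to a memoizing store. The sketch below assumes caller-supplied fetch and parse functions (for instance a file or URL reader and an HORL parser, neither of which is shown); the class LazyPageStore and its methods are hypothetical.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (object name, type name)


class LazyPageStore:
    """Hypothetical session store: reads and converts a page only on first use."""

    def __init__(self, fetch: Callable[[str, str], str],
                 parse: Callable[[str], dict]):
        self._fetch = fetch  # returns the raw HORL text for a name/type pair
        self._parse = parse  # converts that text to an internal representation
        self._loaded: Dict[Key, dict] = {}

    def get(self, object_name: str, type_name: str) -> dict:
        key = (object_name, type_name)
        if key not in self._loaded:
            # First access in this session: read and convert the page.
            self._loaded[key] = self._parse(self._fetch(object_name, type_name))
        return self._loaded[key]

    def loaded_count(self) -> int:
        return len(self._loaded)
```

Only the pages actually touched during a session occupy memory, which is what allows the database as a whole to be far larger than any single process could hold.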

What are the disadvantages of the WWDB approach? One of the major issues in traditional database technology is data consistency and integrity: a database system shall contain type declarations for the data it contains, and various control mechanisms for verifying the structural correctness and the consistency of those data. In a WWDB, we make a virtue of necessity and treat data consistency and integrity as a separate issue. Anyone who posts an information source in the WWDB has to make his or her own commitments as to the structural properties of the data provided. In some cases this is a very minor issue. In other cases such commitments by individual providers are not sufficient, and then the WWDB approach is not appropriate.

Concurrent update is another, although related, topic. The WWDB approach assumes that the data are object-oriented, and that different information providers make non-conflicting contributions and updates to the body of object descriptions for name/type pairs. The classical example from transaction data processing, making a withdrawal from one account and a corresponding deposit to another account, would perform very poorly in the WWDB architecture.

Related and relevant work

We have not been able to find any earlier usage of our basic idea - a world-wide database defined by a simple, textual language (HORL) whereby database object descriptions can be stored in a fully distributed fashion as small text files. However, the proposal obviously touches on a number of current topics in different parts of computer science, including databases, knowledge bases, office systems, multi-media, and so on. It is neither possible nor meaningful to give a full account of all those ramifications here. We refer instead to the home page of the WWDB project (please refer to the URL in the list of references below), which contains both an account of these related areas, and references to other articles (including forthcoming ones) about the WWDB project.

Actually, one observation from our Electronic Colloquium project has been that the traditional list of references in scientific articles is likely to become an obsolete construct in the age of electronic publication. Why should one freeze the reference list into the article; why not generalize it into a bibliographic reference structure which connects articles by binary links, and which can be gradually incremented over time, even after the article has been published?
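The reference structure suggested here could be as simple as a set of dated citation links that can be extended at any time. The following hypothetical sketch (ReferenceGraph, add_link, references_of, cited_by, added_on, and the article identifiers are all illustrative) keeps both directions of each link, so that an article's references and the articles citing it can both be listed.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Set, Tuple


class ReferenceGraph:
    """Hypothetical bibliographic structure: binary links between articles."""

    def __init__(self) -> None:
        self._cites: Dict[str, Set[str]] = defaultdict(set)
        self._cited_by: Dict[str, Set[str]] = defaultdict(set)
        self._added: Dict[Tuple[str, str], date] = {}

    def add_link(self, citing: str, cited: str, when: date) -> None:
        # Links may be added at any time, including after publication;
        # a repeated link keeps its original date.
        if (citing, cited) not in self._added:
            self._cites[citing].add(cited)
            self._cited_by[cited].add(citing)
            self._added[(citing, cited)] = when

    def references_of(self, article: str) -> List[str]:
        return sorted(self._cites.get(article, set()))

    def cited_by(self, article: str) -> List[str]:
        return sorted(self._cited_by.get(article, set()))

    def added_on(self, citing: str, cited: str) -> date:
        return self._added[(citing, cited)]
```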

Summary

In summary, the WWDB represents an approach that has significant similarities to, and significant differences from, current HTML-oriented WWW technology. It is orthogonal to the HotJava development, since Java is a programming language whereas WWDB addresses data structuring. Similarly, it has significant connections to, and significant differences from, current database technology. In other contexts, we discuss its relationship to knowledge-base technology and to office information systems (please refer to the WWDB home page for the references). The following are the salient points of this new approach:
  • The World-Wide Data Base consists of short text pages in HORL syntax - analogous to HTML pages but for structured data.
  • WWDB pages live in symbiosis with HTML pages: the WWDB system is invoked in context from HTML pages, and WWDB pages are used to regenerate HTML pages - often, the same page as the system was invoked from or reset from.
  • The WWDB design is simple, easy to implement, and supports a rapid dialogue. Our implementation is in routine use today. It is independent of programming language.
  • WWDB pages which are put on-line at one site can be accessed and used at any other site, world-wide, and not only for viewing but also for processing. Contributions can be accumulated towards a world-wide database.

References

WWW pages:

The homepage of the WWDB project: [http://vir.liu.se/brs/database/]

The homepage of the ECSEL electronic colloquium: [http://vir.liu.se/brs/]

The author's homepage: [http://www.ida.liu.se/~erisa/]

Conventional publications:

Hector Garcia-Molina, Joachim Hammer, et al: Integrating and Accessing Heterogeneous Information Sources in TSIMMIS. Presented at the AAAI Symposium, 1995. Also available online in PostScript.

Guy L. Steele Jr: Common LISP: The Language. Digital Press, 1984.