Not free, but growing fast: the Web in Germany

Michael Lenz
German National Research Center for Computer Science (GMD),
53754 Sankt Augustin, Germany

Content

Web servers in Germany
Sprechen Sie Englisch?
Data Protection Laws in Germany
Towards the liberalization of telecommunications in Germany and Europe
Cache Servers
Looking ahead
Tables
Biographies

The webscape in Germany is becoming more interesting by the month. Today, three-quarters of the servers are run by students and researchers, but businesses and the media are catching up. This paper describes these players, as well as issues specific to the Web and Internet in Germany: One is the umlauts, and multi-lingual documents in general. Another is Germany's data protection law and its implications for the Web's content. How the German Web is evolving and will evolve depends on the state of the Internet in general. As in the rest of continental Europe, the national PTT plays a central role here. And cache servers will help organizations use bandwidth more efficiently.

Web servers in Germany

As of 8 September 1994, there were 201 main entries in the canonical list of German Web servers, which is maintained on a voluntary basis at the Free University of Berlin. In our count, about 60% of these Web servers were at universities and trade schools, and 15% at research institutes, such as GMD. The university home pages have often been set up recognizably on the initiative of a particular institute. Typically, they show lists of faculties and departments, up to a third of which may already have links to departmental home pages. Some universities also provide links to the library and computer center, course catalogs, cafeteria menus, events calendars, and tourist information on the cities where they are located, such as Dresden. Increasingly, universities and research institutes are mounting databases and archives: German legal documents in full text at the law schools in Saarbruecken; environmental sample data at Technical University of Ilmenau provides an innovative clickable overview of its own Web structure.

Commercial companies are still less than 15% of the total. Most of these are publishers or providers of network services or software, who use the Web to advertise their products and post bulletins for users. NetUSE GmbH, a searchable email directory of companies in Germany. As costs drop and the Internet becomes commercialized, we expect the number of businesses in the Web to grow. About 7% of the Web sites are associations (Vereine); these include user groups (XWindows) and national network associations, such as Federal Ministry for Research and Technology, no government agencies have appeared on the Web. Since 8 September, Deutsche Welle, a national radio and television broadcaster with an international audience, has come online; at present, they offer program schedules and news analyses in several languages.

In order to characterize the contents of German Web servers in a way analogous to how libraries are described -- ie, by number of volumes held -- we wrote a Perl script that recursively followed all of the links within selected Web sites, thus retrieving all of the linked documents within a sub-domain. (To avoid overloading the network, it asked for each document only once and paused between requests.) For twelve servers -- chosen somewhat arbitrarily, but including the full range from small to big, from educational to commercial -- the script produced a list of all documents in the given Web site, along with a list of links from each document.
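
The traversal described above can be sketched roughly as follows. This is an illustration in Python, not the Perl script itself: the `fetch` callable, the restriction to a single host name, and the one-second default pause are all assumptions made for the example, and the recursion is unrolled into a queue.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag
import time


class LinkCollector(HTMLParser):
    """Collects the href/src targets found in one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url, fetch, delay=1.0):
    """Follow links within start_url's host, requesting each URL once.

    `fetch` is a caller-supplied (hypothetical) function that takes a URL
    and returns the document text, or None for files that are not HTML.
    Returns a dict mapping each visited URL to the links found in it.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = [start_url]
    site = {}
    while queue:
        url = queue.pop(0)
        text = fetch(url)
        links = []
        if text is not None:
            parser = LinkCollector()
            parser.feed(text)
            # Resolve relative links and drop "#fragment" parts.
            links = [urldefrag(urljoin(url, l)).url for l in parser.links]
        site[url] = links
        for link in links:
            # Stay inside the site and never ask for a document twice.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
        if queue:
            time.sleep(delay)  # pause between requests
    return site
```

The returned dictionary corresponds to the two lists mentioned above: its keys are the documents of the site, and each value is the list of links from that document.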

The servers we examined each held between 42 and 26,415 files, the majority of which were HTML documents. At a few sites, such as ART+COM's, up to one-third of the files were graphics files, which included GIF and PostScript files, logos, and icons. Sound and video clips rarely added up to more than 1% of the total. The largest site, a university of 24,000 students, had 12,475 HTML documents, 7,492 graphics files, 690 non-HTML texts, 67 sound bites, and 327 video clips, as well as 4,247 links to news groups and 321 to ftp directories. GMD had 2,369 HTML documents, 287 non-HTML texts, 769 graphics, 27 sound bites, and 9 video clips. The smallest Web server we visited had only 22 HTML documents and 20 graphics files.

One software company keeps 8,472 files on just one server, but many of the Web sites have several; the most we found at one site was seventy-five. At GMD, there are nineteen, as employees increasingly are setting up their own Web servers, either because they do not have full access to the central server or because they want to experiment on their own. By collecting the appropriate header lines of the replies to full HTTP 1.0 GET requests, we found that about 60% of the German Web servers use the NCSA server on a UNIX platform; about 33% use the CERN server; and a handful use Plexus, HTTPS, or MacHTTP.
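
A tally of this kind might look like the sketch below. It assumes that the relevant header line is the `Server:` line of an HTTP 1.0 reply and that the product name is whatever precedes the first slash; the function names are invented for the illustration.

```python
from collections import Counter


def server_product(response_head):
    """Return the product name from the Server header of a raw HTTP
    response head, or "Unknown" if there is no usable Server line."""
    for line in response_head.splitlines()[1:]:  # skip the status line
        if not line.strip():
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        if name.strip().lower() == "server" and value.strip():
            # "NCSA/1.3" -> "NCSA"; keep only the first product token
            return value.split()[0].split("/")[0]
    return "Unknown"


def tally(response_heads):
    """Count servers by product name."""
    return Counter(server_product(head) for head in response_heads)
```

Servers that send no such line end up under "Unknown", which is how a category like the one in Table 3 could arise.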

Sprechen Sie Englisch?

Over half of the Web servers in Germany use a second language, mostly English; indeed, 15% use English only. The multilingual servers handle the language problem in one of two ways: Most commonly, a server has parallel sets of pages in the two languages. Alternatively, parallel sentences or paragraphs in the two languages are placed beside each other on the pages, though sometimes the languages are mixed arbitrarily, which can be confusing.

At GMD, we decided to use both German and English in our Web server: German, because it is our language and one of the official languages of the European Union; and English, because it is the lingua franca of the Internet. We avoid using mixed-language documents except for relatively simple pages, such as address lists; and even here, the two languages are visually distinct. Documents intended for internal company use only, such as administrative news and cafeteria menus, are often needed in German only. Publicity materials and scientific papers are generally written in (or translated into) English. Obviously, it makes no sense to translate everything; we use our judgment here.

GMD's external home page and some institute home pages, along with the informational pages immediately under them, are systematically maintained in parallel sets of German and English. Rather than separate the two versions of a document into parallel directories, we assign them a common base name and two-letter suffixes for language. We have discovered, however, that it is very hard to keep parallel versions of evolving documents consistent with each other. And quite often, the English and German versions are maintained by different colleagues, each with a different schedule and priorities. Great effort and a lot of email are needed to coordinate revisions among a relatively small circle of colleagues; and we wonder how such arrangements will scale as the body of documents and circle of colleagues grow.

Different languages can also be a problem for the users; they have to find information in an appropriate language. As on most other Web servers with pages in parallel, GMD's home pages point to each other. One could decide that beneath the home page, all further links should point only to documents in the same language or to the rare multilingual documents, but this would be needlessly rigid. And a user should be able to switch languages without starting over from the top. We have therefore decided to have all parallel pages point to each other. Links to documents in the other language are permissible as long as the anchors warn the reader.

This arrangement, however, is not ideal. It would be simpler for both users and system managers if one could provide the client with a list of acceptable languages and the server could use this list to attempt automatically to retrieve, for any given URL, a version in one of these languages. In fact, HTTP already supports such language negotiation between client and server. This has been implemented in the CERN server software, and Toshihiro Takada has presented a modified version of CERN's X-Mosaic client (version 2.4) that supports this feature. However, not all clients yet support the language lists, and even if they did, there would have to be standards for language suffixes; Takada has suggested using ISO language and country codes for this (Takada 1994:214). We plan to test this after migrating from the NCSA to the CERN server software, though we are well aware that it will take much work to rename all of the existing documents along with their corresponding links. Many links from documents outside our server will also be affected by these changes, though there are ways around this.
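
The server-side half of such a negotiation can be illustrated with a few lines of Python. This is only a sketch under stated assumptions, not the algorithm of the CERN server: the `<base>.<lang>.html` naming scheme, the file names, and the English fallback are all hypothetical.

```python
def pick_variant(base, accepted, available, default="en"):
    """Choose the language version of a document to serve.

    base      -- common base name of the document, e.g. "welcome"
    accepted  -- the client's languages in order of preference
    available -- set of file names that exist on the server
    The "<base>.<lang>.html" naming scheme is an assumption.
    """
    for lang in accepted:
        # Reduce "de-CH" or "DE" to the bare two-letter language code.
        code = lang.strip().lower().split("-")[0]
        name = f"{base}.{code}.html"
        if name in available:
            return name
    fallback = f"{base}.{default}.html"
    return fallback if fallback in available else None
```

With the files `welcome.de.html` and `welcome.en.html` on the server, a client listing `["de", "en"]` would receive the German page, and one listing `["fr", "en-GB"]` the English one.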

Another problem we face is the umlauts. Fortunately, HTML is flexible enough to allow users to represent an umlaut directly as a character in the ISO LATIN1 character set, with an entity reference (such as "&auml;"), or with a character reference (such as "&#228;"). But getting them into HTML documents in the first place, transmitting them unharmed across the net, and displaying them readably on the screen remain problematic. What are needed, of course, are standards. Or rather, there are standards, but there are too many of them, and none of them is universally accepted and supported.

Most of GMD's employees use UNIX workstations, Macs, or PCs. And they would all rather prepare their HTML documents with their favorite word processors than learn a special editor. Moreover, most of them use German keyboards with German versions of software, which together insert umlauted letters with a single keystroke; this is better than having to remember and type entity or character references and thereby to disturb the work flow with extra typing. Unfortunately, these three platforms use incompatible character sets. We currently use small filters that convert proprietary umlaut codes into the 8-bit ISO LATIN1 character set. And there are public-domain converters for translating various word-processor formats into HTML that, for lack of anything better, we misuse for converting umlauts into entity references. But these methods are troublesome and error-prone. It would be better if word processors supported ISO LATIN1 directly; this alone would solve our problems with German umlauts (and many other European special characters as well). Ideally, of course, the Web would adopt a vastly extended character set, such as Unicode or the ISO 10646 character set.
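
What such a filter does can be shown in a few lines of Python. The sketch is not one of the filters mentioned above; it assumes, purely for illustration, that the Macintosh text is in the Mac Roman character set and the PC text in the DOS code page 437, which Python knows as the codecs `mac_roman` and `cp437`.

```python
from html.entities import codepoint2name


def to_latin1(raw, source_charset):
    """Re-encode bytes from a platform character set as ISO Latin-1."""
    return raw.decode(source_charset).encode("latin-1")


def to_entities(raw, source_charset):
    """Replace every non-ASCII character with an HTML entity reference,
    falling back to a numeric character reference."""
    out = []
    for ch in raw.decode(source_charset):
        if ord(ch) < 128:
            out.append(ch)
        elif ord(ch) in codepoint2name:
            out.append(f"&{codepoint2name[ord(ch)]};")
        else:
            out.append(f"&#{ord(ch)};")
    return "".join(out)
```

The same letter has a different code on each platform: "ä" is byte 0x8A in Mac Roman and 0x84 in code page 437, and both come out as 0xE4 in ISO Latin-1 or as an entity reference in the second function.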

Once it is prepared by an author, an HTML document must be uploaded to a Web server and, from there, downloaded to the readers' Web clients -- preferably with umlauts intact. Programs for uploading, such as ftp, transfer all eight bits of a document's characters, though many email programs clear the eighth bit, turning umlauted letters into garbage. The HTTP protocol preserves the eighth bit when downloading HTML documents from server to client, so in theory, a Web client should never have trouble properly displaying umlauts. However, even popular Web clients occasionally have trouble doing this, especially when the versions are new, which suggests that this feature is not always tested before release.
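
The effect of clearing the eighth bit is easy to reproduce. The following lines of Python are only an illustration of the arithmetic, with an invented sample string; they do not model any particular mail program.

```python
def strip_eighth_bit(data):
    """Clear the high bit of every byte, as a 7-bit mail path would."""
    return bytes(b & 0x7F for b in data)


# ISO Latin-1 stores each umlaut as one byte above 127.
original = "Grüße aus Köln".encode("latin-1")
damaged = strip_eighth_bit(original)
```

Each umlaut is replaced by the ASCII character that lies 128 positions lower, so the result is still plain text and the damage cannot be detected or undone automatically.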

Data Protection Laws in Germany

Internet service providers in Germany must operate within the framework of privacy laws that give every citizen the right to self-determination in matters of information. The current data protection laws (Datenschutz) date back to a Constitutional Court decision of 1983, which ruled that the census bureau had violated citizens' rights by asking too many questions and established that the right to informational self-determination is based on the constitution. Henceforth, the state would have to pass laws specifying what data it wanted to collect, and for what purposes. In principle, personal data is not to be collected, stored, and distributed unless a law is passed or unless the parties concerned come to a specific agreement. This notion underlies every situation in which one might want to distribute a list of people, such as a membership directory or list of conference participants.

In practice, however, some basic data elements, such as name, profession, title, university degree, address, and year of birth, are considered "free". And employers and government agencies may collect certain types of data if they are clearly needed for, say, social insurance registration. A business may store and transmit personal data if they are necessary for the protection of its rightful interests and there is no reason to assume that the interests of the individual involved carry more weight. Data may also be stored and transmitted if they are already available from publicly accessible sources or if they are needed for scientific research.

In all cases, the guiding principle for determining what data may be collected is whether or not the use or dissemination of the data could possibly harm the interests of those concerned. Examples of data clearly beyond the limits of the allowable are criminal and medical records, membership in political or religious associations, and the fact of having been fired from a job. Of course, if an individual actively and voluntarily agrees to allow data to be used, as with a signed release, then anything is possible. But under German law, passive approval -- simply failing to object -- generally does not count as agreement.

These issues arose when DFN announced it wanted to make personal data available in an X.500 directory. In their brief, DFN's lawyers pointed out that X.500 is useful only if the directories are reasonably comprehensive; yet to obtain active approval from all concerned would clearly be impractical. In order to justify using passive approval -- posting everyone's name unless they object -- they argued that the legislators did not mean that the interests of every single individual needed to be considered separately. They pointed out, furthermore, that written approval should not be necessary for storing the "minimal set" of data; that people will be able to read their own X.500 data and thus determine their accessibility; and that data can be stored for an organization's own purposes, the purpose here being communication within the scientific community. By the very act of logging in, they concluded, users are participating in this community and thus implicitly giving their consent to participation in the directory.

At GMD, the issue of data protection arose in connection with personal home pages in the Web. We concluded that employees can have their own home pages, but because of the data protection law, they must create and maintain the pages on their own, thus remaining in control of their own information. Since most employees do not know HTML, we have designed templates and scripts to automate these tasks. Employees can fill in an order form to receive a default home page, containing their name and address, via email; they can specify whether it should be accessible within GMD only or world-wide; and they can mark it for inclusion in separate lists of personal home pages. After editing the page, they can submit it as a reply to the original message. Related scripts can be used to edit, remove, or change the contents of a page and its accessibility. These scripts use generic email addresses (Firstname.Lastname@gmd.de) and access keys to ensure that people access their own pages only. The security this provides is probably good enough for the purpose, though security would be better with something like Privacy Enhanced Mail (PEM) or Pretty Good Privacy (PGP).

Towards the liberalization of telecommunications in Germany and Europe

The growth of the Internet in Germany is best understood against the backdrop of the controversy in Europe over protocols. Scientific networking began in Europe in 1984, when EARN was established as a sister network of BITNET. As EARN was based on proprietary technology from IBM, European politicians created national networking organizations for implementing open networks based on ISO-OSI protocols. These national organizations, such as the German Research Network (DFN), were coordinated by a European umbrella organization, RARE; and EU programs set about developing OSI-conformant services. But with no network to test on and OSI standards that were complex and often preliminary, developers showed little interest. The first OSI-conformant X.25 networks were too expensive for scientific users; and RARE's IXI was plagued with technical problems. In 1989, then, DFN got the Deutsche Bundespost (Telekom) to provide WiN, a private X.25 network for scientific computing in Germany, at flat, affordable rates.

Meanwhile EUnet, a private company, began offering services based on TCP/IP, which was simpler than OSI, vendor-independent, and a big success in America. In 1989, IBM asked GMD to set up a TCP/IP-based network for its European Supercomputer Initiative (EASInet). By then, the tide had already begun to turn, as official national networking organizations increasingly defected to TCP/IP. In 1991, EBONE, a consortium of official and unofficial organizations, built an academic network based on TCP/IP. Then EBONE, in turn, was eclipsed by the official organizations' DANTE, whose EuropaNET, supported by the EU, provided TCP/IP services (as well as OSI X.400 and EARN services) at attractive prices. More and more commercial companies in Europe began using the Internet. The Bangemann Report showed that the political establishment in Europe and Germany was ready to accept the Internet as an important part of future information highways; and in June 1994, the European Union moved to work TCP/IP into its procurement policies (Birkenbihl 1994).

Currently, there are four main Internet service providers with international links in Germany. The most important is DFN, which provides access to WiN. DFN members can connect at 9.6 kbps, 64 kbps, or 1.92 Mbps; a 34-Mbps ATM-based network called High-Speed WiN is planned for the end of 1995. They can connect to the rest of Europe over a 1.92-Mbps link to DANTE; and to the United States over a T1 link (1.544 Mbps) to ESnet.

The other three major service providers are commercial. These are: EUnet Germany GmbH, which is part of EUnet, the European counterpart of UUnet; the Xlink division of NTG Netzwerk und Telematic GmbH, a subsidiary of Honeywell Bull, with links to EBONE and ANS; and a relatively small newcomer, MAZ Hamburg GmbH, which is connected to the English service provider PIPEX. Their backbones are based on leased lines controlled by Telekom, and their customers gain access largely through 64-kbps ISDN. Their international links range from 64 to 512 kbps. Some of their customers are themselves providers; Individual Net e.V., for example, connects to EUnet, Xlink, and WiN and specializes in accounts for individuals. Higher-bandwidth Internet connectivity is provided by various regional networks, such as Belwue in Baden-Wuerttemberg, by fiber-optic FDDI- and DQDB-based MANs, and by mission-specific links, such as Desy's HEPnet link to CERN.

Most service providers levy volume charges for bits carried. The exception is DFN, whose annual fees for a WiN connection are based primarily on a fixed cost per cable, depending on bandwidth, to which is added a comparatively small surcharge based on capacity class (the total bandwidth of all links) and volume class (average monthly traffic relayed or gatewayed on the application level in the second half of the previous year). With this policy, universities and research organizations can plan their networking expenditures in advance.

We were not surprised to find that three out of four German Web sites are customers of DFN: in the absence of volume charges, clicking on a graphics, sound, or PostScript file incurs no extra costs. Indeed, nearly every DFN customer with a 1.92-Mbps connection has one or more Web servers -- as opposed to about one third of the organizations with 64-kbps lines and less than ten percent of those with 9.6-kbps lines. At GMD, with institutes in Sankt Augustin, Darmstadt, and Berlin, we pay about 1 million DM ($600,000) per year for three 1.92-Mbps access points. Many large universities, such as Bonn, with 40,000 students, must get by with 64 or 512 kbps.
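
The DFN fee structure described above amounts to a simple calculation, sketched here in Python with the figures of Table 5. The function is hypothetical, and it rests on one reading of the tariff: that an organization pays the fixed cost of each connection plus a single surcharge for its capacity and volume class.

```python
# Figures transcribed from Table 5 (DM per year, excluding value-added tax).
FIXED_COST = {"9.6k": 15790, "64k": 52630, "1.92M": 310000}

CAPACITY_CLASSES = ["0", "1", "2a", "2b", "3"]
SURCHARGE = {  # volume class -> surcharge per capacity class, in column order
    "I":   [5000, 5000, 10000, 17500, 35000],
    "IIa": [7000, 7000, 12000, 19500, 37000],
    "IIb": [11000, 11000, 16000, 23500, 41000],
    "III": [17000, 17000, 22000, 29500, 47000],
    "IV":  [33000, 33000, 38000, 45500, 63000],
    "V":   [57000, 57000, 62000, 69500, 87000],
}


def annual_fee(connections, capacity_class, volume_class):
    """Fixed cost of every connection plus one volume surcharge (assumed)."""
    fixed = sum(FIXED_COST[c] for c in connections)
    column = CAPACITY_CLASSES.index(capacity_class)
    return fixed + SURCHARGE[volume_class][column]
```

Under this reading, a single 64-kbps connection in capacity class 2a and volume class IIb comes to 68,630 DM per year. Three 1.92-Mbps connections in capacity class 3 come to between 965,000 DM (volume class I) and 1,017,000 DM (volume class V), which is in line with the figure of about 1 million DM given above.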

Internet prices in Germany are generally so high because the lines are so expensive; according to The Economist, high-speed leased lines are five times more expensive than in the United States (13 Aug 1994, p. 59). And they are so expensive because Telekom, like the other state-owned telecommunications companies of continental Europe, has a national monopoly on basic telephone and network services. New companies are not allowed to offer services over any cables but Telekom's; and with no competition, Telekom is free to use a fee structure for data services that is derived from its own high telephone prices. The high costs of leased lines are passed on to home and business users, who must often in addition pay high long-distance charges to reach their nearest service provider.

Since last year, Telekom's telephone monopoly has been scheduled to end in 1998, when the European Union's internal telecommunications market is due to open; its network monopoly was supposed to end within the following two years. Powerful interest groups have begun to argue, however, that the monopoly should end sooner, lest Germany fall behind in building the infobahn. Telekom's most dangerous competitors may turn out to be the German energy companies and the railroad, who for years have used their privileged access to federal land for setting up their own nation-wide networks of fiber-optic cables. If the current postal minister has his way, liberalization could come about faster in Germany than in the rest of continental Europe (FAZ Sep 12 1994, p.15).

In the meantime, Web users must endure relatively slow response times when accessing popular Web servers such as those at CERN and NCSA; they are careful about clicking on bandwidth-consuming multimedia objects such as pictures, sound bites, and video clips. Indeed, German Web servers are often more text-oriented than those in other countries. That the Internet in Germany has grown by a respectable 20% in the past three months is all the more impressive when seen against this less-than-ideal backdrop.

Cache Servers

As Web use is growing faster than bandwidth, something must be done to reduce the load. One solution is to use cache servers, whereby Web clients no longer directly contact the Web server named in a URL, but send the request first to a cache server. If the document is already cached and is not out of date, the cache server returns it immediately; otherwise, the cache server forwards the request to the document's home server.
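
The decision a cache server makes for each request can be sketched as follows. The class is a toy written for this illustration: the `fetch_from_origin` callable is hypothetical, and treating a copy as out of date after a fixed number of seconds is an assumed policy, not a description of any existing server software.

```python
import time


class CacheServer:
    """Toy cache that maps each URL to (document, time fetched)."""

    def __init__(self, fetch_from_origin, max_age=3600, clock=time.time):
        self.fetch_from_origin = fetch_from_origin  # hypothetical callable
        self.max_age = max_age  # seconds a copy counts as fresh (assumed)
        self.clock = clock
        self.store = {}

    def get(self, url):
        entry = self.store.get(url)
        if entry is not None:
            document, fetched_at = entry
            if self.clock() - fetched_at < self.max_age:
                return document  # cached and not out of date
        # Not cached, or too old: forward the request to the home server.
        document = self.fetch_from_origin(url)
        self.store[url] = (document, self.clock())
        return document
```

Only the first request for a document, and the first request after its copy has aged out, reach the home server; all others are answered locally.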

Caching will reduce the load on the most popular Web servers and on the net as a whole. It will improve response times for retrieving documents and improve the availability of documents by keeping them available even when their home servers are unreachable. Further cache servers can be added in a cascading manner from a center outward in order to distribute the network load more evenly. On the other hand, the proliferation of copies will raise the problem of version control. It will be difficult to know how often a given document is accessed. And cache servers will pose potential security risks. To avoid bottlenecks at one local cache, client software will need to be improved so as to make it possible to define different caches for different URLs, or even to bypass the cache entirely. As many of the most popular Web servers are located outside of Germany, a central German cache server would have a big impact on international traffic.
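
Defining different caches for different URLs could be as simple as a list of rules on the client side. The sketch below is one hypothetical way to express this in Python; the rule format and all host names are invented for the example.

```python
from urllib.parse import urlparse


def choose_cache(url, rules, default_cache=None):
    """Return the cache server to ask for `url`, or None to go direct.

    rules -- list of (domain, cache) pairs, checked in order; a cache of
             None means "bypass the cache for hosts in this domain".
    """
    host = urlparse(url).hostname or ""
    for domain, cache in rules:
        if host == domain or host.endswith("." + domain):
            return cache
    return default_cache
```

A client at GMD might, for instance, use the rules `[("gmd.de", None), ("de", "cache.example.de")]` to fetch local documents directly, send other German requests to one cache, and leave everything else to a default cache for international traffic.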

DFN is planning to set up a central cache server in Berlin. For a trial period of one year, it will be accessible to WiN customers at no extra cost -- DFN wants to determine how much it will cost to maintain such a cache server and whether WiN customers should have to pay for it in the future. A prototype for this, which is accessible only by other local cache servers, has been set up at the University of Frankfurt. The NCSA server software we are using at GMD does not support cache servers, but we plan to set one up when we migrate to the CERN software. This will free up much bandwidth on our WiN connection. The less bandwidth an institution has, the more these savings can make a difference.

Looking ahead

We expect the Web to continue growing, with bigger servers, more servers, and more commercial sites. The problem with umlauts will fade away, as new software replaces the old. Uncertainties about the application of Germany's data protection laws in the Internet will be clarified. Prices for Internet access will fall with the liberalization of Europe's internal telecommunications market. Still, the cache server will remain an important means of conserving network resources. At GMD, we will concentrate on developing tools for the quality control of our Web servers, for maintaining our multi-lingual pages, and for supporting our users.

Tables

Table 1: Web sites in Germany by type of organization

    Educational                          119    59.2%
    Research                              31    15.4%
    Commercial                            27    13.4%
    Verein (1)                            13     6.5%
    Error (not accessible etc.)           11     5.5%

    Total number of servers              201

    (1) A "Verein" is an organization with the special legal
    status of "association".

Table 2: Documents at Web Sites in Germany (Examples)

  GMD (Research)
    HTML                2369          62 %
    non HTML texts       287           7 %
    graphics             769          20 %
    video                  9         < 1 %
    audio                 27         < 1 %
    misc                 390          10 %
    Total               3851

  A large university with about 24,000 students
    HTML               12475          47 %
    non HTML texts       690           3 %
    graphics            7492          28 %
    video                327           1 %
    audio                 67         < 1 %
    news articles       4247          16 %
    misc                1117           4 %
    Total              26415
    
  European branch of an American computer company
    HTML                4052          48 %
    non HTML texts       270           3 %
    graphics              34         < 1 %
    video                ---
    audio                ---
    files for download  3375          40 %
    misc                 741           9 %
    Total               8472

  One of the associations in Germany's Web
    HTML                  23          49 %
    graphics               4           8 %
    video                  2           4 %
    audio                  6          13 %
    misc                  12          26 %
    Total                 47

Table 3: Software used by German Web servers (broadly defined)

    NCSA                                 169     60.1%
         1.0                         25
         1.1                         47
         1.2                         16
         1.3                         80
         V1.3 (MSWindows)             1

    CERN                                  94     33.5%
         2.15                         1
         2.16beta                     4
         2.17beta                     1
         2.18beta                     2
         3.0                         86

    Plexus                                 3      1.0%
         2.2.1                        1
         3.0d                         1
         3.0k                         1

    HTTPS/0.8                              1      0.4%
    MacHTTP 1.3.1b9                        1      0.4%

    Unknown                               13      4.6%

    Total                                281    100.0%

Table 4: Language of Web Sites in Germany

    Only German                           88    43.8%
    English and German pages              38    18.9%
    Only English                          31    15.4%
    English and German mixed              24    11.9%
    Other (French, Swedish)                4     2.0%
    Error (not accessible etc.)           16     8.0%

    Total number of servers              201

Table 5: Prices for access to DFN's scientific network (WiN), per year

    Note: prices do not include value-added tax.

    Fixed costs per cable

        Connection at 9600 bps           15,790 DM
        Connection at 64 kbps            52,630 DM
        Connection at 1.92 Mbps         310,000 DM
    
    Variable volume costs

                                    capacity classes
        volume       0          1          2a         2b         3
        classes      DM         DM         DM         DM         DM
          I        5,000.-    5,000.-   10,000.-   17,500.-   35,000.-
          IIa      7,000.-    7,000.-   12,000.-   19,500.-   37,000.-
          IIb     11,000.-   11,000.-   16,000.-   23,500.-   41,000.-
          III     17,000.-   17,000.-   22,000.-   29,500.-   47,000.-
          IV      33,000.-   33,000.-   38,000.-   45,500.-   63,000.-
          V       57,000.-   57,000.-   62,000.-   69,500.-   87,000.-

    Capacity class 0:  indirect WiN access; up to 64 kbps
    Capacity class 1:  direct WiN access; up to 64 kbps
    Capacity class 2a: direct WiN access; 64 kbps
    Capacity class 2b: direct WiN access; between 64 and 200 kbps
    Capacity class 3:  direct WiN access; over 200 kbps

    Volume class I:   Usage less than 2 Mbytes per month
    Volume class IIa: Usage between 2 and 20 Mbytes per month
    Volume class IIb: Usage between 20 and 100 Mbytes per month
    Volume class III: Usage between 100 Mbytes and 1 Gbyte per month
    Volume class IV:  Usage between 1 and 10 Gbytes per month
    Volume class V:   Usage over 10 Gbytes per month

    Source: Wir im Deutschen Forschungsnetz [No. 14, March 1994, page 64]

Biographies

Thomas Baker, a researcher in the Media and Communication department of GMD, prepares much of the English-language material for the GMD Web server. He has a master's in library science from Rutgers and a doctorate in anthropology from Stanford University, and has worked as a researcher at a policy institute in Italy.

Inke Brüning has been working at GMD since 1988. After four years in the Department for Human-Computer Interaction designing user-friendly software, she is currently engaged in multi-media mail development and user support in the Department for Network Engineering. Since late 1993 she has been involved with establishing and maintaining the Web server at GMD.

Lothar Klein received his master's degree in informatics (computer science) from the University of Bonn and has been working as a scientist at GMD since 1991. He was engaged in the development of hypertext systems in the Department for Human-Computer Interaction. At present his major work areas are high-speed networking and multi-protocol applications in the Department for Network Engineering. Since late 1993 he has been involved with establishing and maintaining the Web server at GMD.

Michael Lenz received his master's degree in informatics (computer science) from the University of Bonn. He has been working at GMD in the Department for Network Engineering since 1993. His major topics are value-added services, security, and information systems. Since late 1993 he has been involved with establishing and maintaining the Web server at GMD.


Michael.Lenz@gmd.de