

3.1 Direct access

As already mentioned, the pages of a scanned-in document are not directly accessible as HTML nodes. Instead, they are served by a CGI script written in Perl. The script evaluates two main parameters, passed in the environment variables PATH_INFO and QUERY_STRING, and produces HTML nodes as its output. PATH_INFO contains the name of the file that stores all the relevant data for a node (the name of the included GIF image, the preceding and succeeding pages, etc.). QUERY_STRING contains the zoom factor to be applied to the inlined GIF image of the document page.
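The parameter handling described above can be sketched as follows. The original script is written in Perl; this Python version only illustrates the idea. The function name `parse_request` and the error handling are assumptions. The default zoom factor of 1 and the exclusion of 0 follow the description of the zoom factor in this section.

```python
import os

def parse_request(environ=os.environ):
    """Return (node_file, zoom) taken from the CGI environment.

    Hypothetical helper: the real script is written in Perl and its
    internals are not shown in the text.
    """
    # PATH_INFO names the file holding the data for one node.
    node_file = environ.get("PATH_INFO", "")
    # QUERY_STRING carries the zoom factor; assume 1 when it is empty.
    query = environ.get("QUERY_STRING", "").strip()
    zoom = int(query) if query else 1  # raises ValueError on non-integers
    if zoom == 0:
        raise ValueError("a zoom factor of 0 is not allowed")
    return node_file, zoom
```

Passing a plain dictionary instead of `os.environ` makes the function easy to try out without a web server.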

The default value of the zoom factor is 1. The factor, and with it the display size of the images, changes according to the user's actions: if the zoom factor z is greater than 1, the original image of size x * y is enlarged to (x * z) * (y * z); if z <= -1, the image is scaled by the factor 1 / (-z), so that z = -2 halves both dimensions. A value of 0 is not allowed for z and could only be passed by manually creating a URI ending in "?0".
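The mapping from zoom factor to image size can be written down as a small sketch. The function names `scale_factor` and `scaled_size` are hypothetical, and rounding to whole pixels is an assumption; the text does not say how fractional sizes are handled.

```python
from fractions import Fraction

def scale_factor(z):
    """Translate the zoom factor z into a multiplicative scale."""
    if z == 0:
        raise ValueError("a zoom factor of 0 is not allowed")
    if z >= 1:
        return Fraction(z)      # enlarge: x * z, y * z
    return Fraction(1, -z)      # reduce: scale by 1 / (-z)

def scaled_size(x, y, z):
    """Return the display size of an x * y image at zoom factor z."""
    f = scale_factor(z)
    # Assumption: sizes are rounded to whole pixels.
    return round(x * f), round(y * f)

print(scaled_size(640, 480, 2))    # (1280, 960)
print(scaled_size(640, 480, -2))   # (320, 240)
```

Note that under this rule z = -1 yields the same size as z = 1.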

The value of the zoom factor should therefore only be modified using the appropriate buttons (the "+" and "-" buttons in figure 3): clicking on the zoom-in or zoom-out button increases or decreases the value of z by 1, skipping the value 0. If z is not equal to 1, the Perl script scales the GIF image in question and includes the newly created temporary image file in the 'artificial' HTML node. Otherwise, the original image is used. Because the scaled image has to be generated first, a modified zoom factor slows down the response to user actions. It should therefore only be changed when scaling the images is sensible or necessary. For example, if pages have to be transferred over a slow network, reducing the size of the images may be sensible.
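The button behaviour can be illustrated with a minimal sketch. The names `step_zoom` and `needs_scaling` are hypothetical; the only rules taken from the text are that each click changes z by 1, that 0 is skipped, and that scaling is triggered whenever z differs from 1.

```python
def step_zoom(z, direction):
    """Return the zoom factor after one click.

    direction is +1 for the "+" button and -1 for the "-" button.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    z += direction
    if z == 0:          # 0 is not a valid zoom factor, so skip it
        z += direction
    return z

def needs_scaling(z):
    """The original image is used only for z == 1."""
    return z != 1

z = 1
for _ in range(3):      # three clicks on "-"
    z = step_zoom(z, -1)
print(z)                # -3
```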

A scaled example node is shown in figure 3, which also shows the basic structure of a node: at the top, the number and the name of the page are given; in the center follows the (possibly scaled) image of the page; at the bottom, a row of icons gives the user access to further functions.

Such a node can be accessed directly in two ways: either by using the appropriate URI or (semi-directly) via the table of contents (see figure 4). Creating a URI manually is rarely sensible, because the user has to know the internal reference name of a page. However, users can put a specific page in their hotlist and enter the page from there without having to know about names and zoom factors.
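To make the URI form concrete, the following sketch assembles such a URI from its parts. The script location `/cgi-bin/show`, the page name, and the function name `page_uri` are invented for illustration; the text specifies only that the page's reference name travels in PATH_INFO and the zoom factor in QUERY_STRING. Omitting the query for the default zoom factor of 1 is also an assumption.

```python
from urllib.parse import quote

def page_uri(script_uri, page_name, zoom=1):
    """Build the URI of a page node (hypothetical layout)."""
    if zoom == 0:
        raise ValueError("a zoom factor of 0 is not allowed")
    # The reference name is appended as extra path information.
    uri = script_uri.rstrip("/") + "/" + quote(page_name)
    # Assumption: the default zoom factor 1 needs no query string.
    if zoom != 1:
        uri += "?" + str(zoom)
    return uri

print(page_uri("/cgi-bin/show", "report/page007"))
print(page_uri("/cgi-bin/show", "report/page007", -2))
```

A hotlist entry would simply store the resulting string, which is why the user does not need to know the name or the zoom factor afterwards.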

The table of contents gives users the basic access mechanisms known from books: they may either inspect the headings and choose the one that comes closest to their information need, or select a page by its number. The table of contents that we have implemented is based on the physical structure of the document, i.e. there is exactly one entry for each physical page. Of course, a table of contents could also (or additionally) be based solely on the logical structure of the document. There would then be exactly one entry for each heading, mirroring the table of contents of the printed document exactly.
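The difference between the two kinds of table of contents can be sketched with a small data structure. Everything here is hypothetical: the `Page` record, the sample pages, and the simplification that at most one heading starts on a page are assumptions made only for illustration.

```python
from typing import NamedTuple, Optional

class Page(NamedTuple):
    number: int              # physical page number
    name: str                # internal reference name of the page
    heading: Optional[str]   # heading starting on this page, if any

def physical_toc(pages):
    """One entry per physical page, as in the implemented version."""
    return [(p.number, p.heading or "", p.name) for p in pages]

def logical_toc(pages):
    """One entry per heading, like a printed table of contents."""
    return [(p.heading, p.name) for p in pages if p.heading is not None]

# Invented sample data.
pages = [
    Page(1, "page001", "Introduction"),
    Page(2, "page002", None),
    Page(3, "page003", "Document access"),
]

print(len(physical_toc(pages)))   # 3
print(logical_toc(pages))
```

In the physical variant a reader can select any page by its number; in the logical variant only pages on which a heading starts are reachable directly.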

___________________________________________________

Figure 3: Screendump of a page display

___________________________________________________

Figure 4: Part of a table of contents
