Jutta Treviranus
Adaptive Technology Resource Centre, University of Toronto, Toronto, Ontario, Canada
The user who is reading hyperlinked multimedia documents (e.g., Web pages, Intranets) using access technology such as screen readers, Braille displays or screen magnifiers, faces three challenges:
- obtaining an overview of the document, which includes getting a sense of the structure of the document,
- moving to specific sections or elements in the document, and
- obtaining translations of graphically presented information (i.e., animation, video, graphics).
The sighted user can get a good sense of the content and scope of a document at a glance. Formatting and layout allow the reader to quickly find a specific part of a document. When using screen readers and screen magnifiers, only a very small piece of information can be viewed at one time, and visually communicated information about the structure or content of a document cannot be used. This paper will discuss developments on various fronts which make it easier to retrieve information and manipulate documents when using screen readers and screen magnifiers. Web publishing tools, browser enhancements and document creation supports within authoring tools will be highlighted.
When reading a hyperlinked multimedia document we do much more than simply read the document word by word from left to right and top to bottom. We approach a new document with a series of questions, which we answer using a number of conventions or strategies, many of which are visually based. The first question is: what is this document? We get an overall sense of the document: the topic, the scope, the format, the author's approach, possibly the level, by quickly glancing at it. We also take a quick inventory of what it contains (e.g., images, image maps, forms). We get a sense of what the author feels is important by looking at the formatting or other graphical conventions. Concepts or points are communicated or illustrated through pictures, videos, or graphs. We determine what we can do with the document by noting items like forms or links. We assess whether the document contains anything that interests us by scanning it for key words or graphics. We answer the question "where can we go from here?" by noting links. While reading the document, we get a sense of its organization or structure, and of the emphasis intended by the author, by noting the format and layout. At a glance we can determine where we are in the document. The visually communicated structure also lets us quickly locate specific sections of the document.
For many computer users this same information must be gleaned by listening to a flat synthesized voice which reads only text. In North America, screen readers are the most popular computer access systems for people who are blind. Screen readers present aurally the information which is presented visually on the computer screen. Translating a rich visual environment or document into the aural channel is very difficult. The primary constraint is that, while we can process many pieces of visual information virtually simultaneously, the aural channel is largely serial. When information is spoken by a monotone voice synthesizer, it can only be presented one piece at a time, in a linear or serial fashion.
Screen readers were originally developed when computers displayed only text. They were adequate and well suited to the DOS interface, which also presented information in a linear fashion. However, today's interfaces rely heavily on graphic devices or conventions, spatial layout, icons, pictures, video and animation to communicate information. Multiple pieces of information are presented simultaneously. There is frequently no logical serial order to the information on the screen. Associations between objects are expressed through visual means (e.g., proximity, color coding, etc.). Interface consistency and predictability are not the norm (e.g., labels for text input fields may be located above, below or to the side of the input field within a single dialog box or form). Mouse pointer actions have largely replaced keyboard commands.
Due to the evolution of the computer user interface and the digital document, users of screen readers face three major unmet challenges:
- obtaining an overview of the document, including a sense of its structure,
- moving to specific sections or elements in the document, and
- obtaining translations of graphically presented information.
People who use screen magnifiers or Braille displays encounter similar constraints. The amount of information that can be reviewed at one time is very restricted. When only a small section of the screen can be seen at once, it is easy to miss dialog boxes that pop up elsewhere, it may be hard to determine the relationship between objects, and it becomes very difficult to locate specific objects on the screen.
These challenges can be addressed by modifying the following:
- screen readers and the aural presentation they provide,
- browsers and the operating system services they rely upon,
- the HTML specification and the documents themselves, and
- the authoring tools used to create documents.
This paper will briefly review accommodations made in all of these areas but will highlight accommodations in authoring tools and browser utilities or extensions.
Screen readers are slowly emerging which translate visual properties into aural properties other than spoken text. These screen readers begin to communicate visual conventions such as highlighting, layout, color coding, or visual grouping using aural properties (www.prodworks.com, www.citi.doc.ca/Citi-Mosaic/Citihome/Programs/TECSOE/Projects/ProjectsTECSO.html#pc, multimedia.pnl.gov:2080/staff/elopresti/ad/). These aural properties include pitch, voice, rate, inflection, and various audio tones. For example, links may be spoken in a male voice while other text is spoken in a female voice, underlined text may be spoken with a higher pitch, and headers may be preceded by a number of ascending tones. (Aural properties which have not yet been exploited include background music and three-dimensional sound.)
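As an illustration, the translation from visual convention to aural property can be thought of as a simple lookup table. The Python sketch below is hypothetical: the synth object and its say and play_cue methods stand in for a speech synthesizer interface, and the mappings simply mirror the examples given above.

    # A minimal sketch of translating visual conventions into aural properties.
    # The "synth" object and its methods are hypothetical stand-ins for a
    # speech synthesizer API; the mappings mirror the examples in the text.
    VISUAL_TO_AURAL = {
        "link":      {"voice": "male",   "pitch": 1.0, "cue": None},
        "plain":     {"voice": "female", "pitch": 1.0, "cue": None},
        "underline": {"voice": "female", "pitch": 1.3, "cue": None},
        "header":    {"voice": "female", "pitch": 1.0, "cue": "ascending tones"},
    }

    def speak(text, style, synth):
        props = VISUAL_TO_AURAL.get(style, VISUAL_TO_AURAL["plain"])
        if props["cue"]:
            synth.play_cue(props["cue"])   # e.g., ascending tones before a header
        synth.say(text, voice=props["voice"], pitch=props["pitch"])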
Proposed aural Cascading Style Sheets would allow authors (or users) to specify the aural presentation of document elements by specifying properties such as speed, voice, pitch, stress, richness, as well as background audio and spatial properties. These proposed style sheets must be adopted by authors, screen reader developers and hardware developers in order to be fully implemented. (www.w3.org/pub/WWW/Style/)
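A fragment of such a style sheet might look as follows. The property names are drawn from the W3C aural style sheet proposal; the sound file name is a placeholder. It expresses the same conventions described above: a male voice for links, a higher pitch for underlined text, and a tone cue before headers.

    H1, H2, H3 { voice-family: female; cue-before: url("ascending-tones.au") }
    A:link     { voice-family: male }
    U          { pitch: high }
    EM         { stress: 70; richness: 80; speech-rate: medium }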
Screen readers can only communicate what is revealed by the browser or operating system. Tools are under development which would allow application developers to reveal the necessary information to screen readers in a consistent fashion (http://www.microsoft.com/windows/enable/activex.htm).
Overt controls are required to move to and interact with the menu, the toolbar, the scroll bars, dialog box controls, links and input fields. In present browsers this is frequently done using the mouse pointer. For obvious reasons, people who are blind require keyboard equivalents for these mouse actions. Browsers are slowly adding these keyboard equivalents. When using a screen reader, overt controls are also needed to shift focus between various segments of the document, between frames, or between different cells of a table. Unlike visual focus, which shifts at a glance, the screen reader must be manually directed to read the desired segment. The need for keyboard equivalents to move between desired document elements is being addressed by some browsers and screen readers (www.prodworks.com, www.hj.com).
Screen readers can only speak text, and this text need not be displayed on the screen. Graphics, videos, animation, and bitmapped text are inaccessible to screen readers unless there is a textual label or textual description associated with the graphic object. This is achieved using alternative text attributes, linked descriptive files for images, or textual alternatives for image buttons, applets, background images, image maps, or video files. The HTML specification and the file formats must accommodate these textual labels and descriptions, and individual authors must include them in their documents.
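In HTML these labels take the form of ALT attributes and linked description files. The file names below are hypothetical; the attributes themselves are part of the HTML specification.

    <!-- ALT text travels with the image; a linked file carries a fuller description -->
    <IMG SRC="sales.gif" ALT="Bar chart of 1996 quarterly sales">
    <A HREF="sales-description.html">Text description of the sales chart</A>

    <!-- each region of a client-side image map gets its own ALT text -->
    <MAP NAME="navbar">
      <AREA SHAPE="rect" COORDS="0,0,80,30" HREF="home.html" ALT="Home">
    </MAP>

    <!-- applets can carry ALT text and alternative content -->
    <APPLET CODE="Ticker.class" WIDTH=400 HEIGHT=40 ALT="Scrolling stock ticker">
      Current prices are also listed in the <A HREF="prices.html">text version</A>.
    </APPLET>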
Principles and guidelines for creating web pages which are accessible to people who use assistive technology are well documented (http://trace.wisc.edu/world/web/index.html, http://www.yuri.org/webable/index.html, http://www.utoronto.ca/atrc/). Advocacy groups exist which review web sites and inform authors who produce inaccessible web pages (http://www.yuri.org/webable/index.html). However, the content on the Web will not become accessible unless every Web author has the will and knowledge needed to create accessible documents. Given that authors range from school children to corporations, academics, and government organizations, it is highly unlikely that an education and advocacy campaign would reach all authors. Authors who are aware of the need to make their documents accessible, and wish to do so, can use services such as "Bobby," a validation utility which checks documents for access barriers and advises authors on how to fix them (http://www.cast.org:80/bobby/index.html).
A more effective method of reaching a large number of authors may be to provide the necessary prompting, education and supports in HTML authoring software. The majority of authors use Web authoring tools and the percentage who prefer authoring tools to manual HTML markup is increasing. Thus authoring tools can be used to inform a wide range of authors of the need to create accessible documents. Clear, easy to follow guidelines and prompts can be provided and as many modifications as possible can be automated using wizards or other utilities.
Barriers which would be addressed through these authoring supports could include:
- missing alternative text for images, image buttons, applets, background images, image maps, and video files,
- missing linked textual descriptions for complex graphics, and
- form fields and other elements whose purpose is communicated only visually.
Guidance and help files would provide justifications for recommendations as well as tips on writing useful alternative text or textual descriptions. A utility which simulates the presentation of the author's document through a screen reader or screen magnifier could be included, to illustrate recommendations and assist the author in visualizing the needs of screen reader or screen magnifier users. An accessibility checker, similar to a spell checker, could check new, existing, and imported documents for access barriers. When barriers are detected, the author could be automatically linked to the appropriate dialog boxes and guidelines needed to modify the document.
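A minimal sketch of such a checker, written in Python for illustration, is given below. It detects only a few of the barrier types discussed here; a production checker would cover many more, and would link each problem to the relevant authoring dialog.

    # A minimal accessibility-checker sketch: flags a few common barriers.
    from html.parser import HTMLParser

    class AccessChecker(HTMLParser):
        def __init__(self):
            super().__init__()
            self.problems = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            # graphic objects with no textual alternative
            if tag in ("img", "area", "applet") and not attrs.get("alt"):
                self.problems.append("%s element is missing ALT text" % tag.upper())
            # server-side image maps offer no text the screen reader can speak
            if tag == "img" and "ismap" in attrs:
                self.problems.append("server-side image map needs a text alternative")

    checker = AccessChecker()
    checker.feed(open("index.html").read())   # "index.html" is a placeholder
    for problem in checker.problems:
        print(problem)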
A viable method of providing access to Web pages by screen readers (as well as text browsers or low-bandwidth clients) is to provide a parallel text-only site which presents all the information of the standard site in text form. The problem encountered with this approach is that, while the standard site gets updated, the text-only site is neglected and quickly becomes out of date. A wizard which automates the creation of the text-only site as much as possible, and ensures that the text site is updated whenever the standard site is updated, would help to alleviate this problem.
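The core substitution step of such a wizard might look like the Python sketch below: each image is replaced by its ALT text, and unlabeled images are flagged. A real wizard would also handle image maps, frames, tables and site-wide synchronization, which are omitted here.

    # A sketch of the image-substitution step of a text-only wizard.
    from html.parser import HTMLParser

    class TextOnly(HTMLParser):
        def __init__(self):
            super().__init__()
            self.parts = []

        def handle_starttag(self, tag, attrs):
            if tag == "img":
                alt = dict(attrs).get("alt")
                # substitute the ALT text; flag images the author left unlabeled
                self.parts.append(alt if alt else "[image: no description supplied]")

        def handle_data(self, data):
            self.parts.append(data)

    def to_text(html):
        converter = TextOnly()
        converter.feed(html)
        return "".join(converter.parts)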
In order to find information efficiently in web-based documents, we rely upon our ability to sift through and ignore irrelevant information. This is assisted by well structured, hierarchical documents. We want to get a quick overview and then efficiently zero in on the items of interest, rather than going sequentially through all the possible items. Using the visually communicated document structure to guide us, we move with agility in and out of the various levels of a document. With presently available systems, this same mobility cannot be achieved when using screen readers or screen magnifiers.
"Luckily," screen size is finite, as is the size of legible or recognizable text or graphics. Consequently, users without visual impairments feel restricted in the amount of information they can view at one time when attempting to read large complex documents. Software developers have responded to these challenges by creating tools or utilities which make it easier to read, navigate through and manipulate large WEB or Intranet documents or sites (http://www.sq.com/hip/). These tools address problems which are similar, but greatly intensified, when using screen readers or screen magnifiers. Consequently, they can be exploited to make information retrieval and document manipulation more efficient for users of screen readers or screen magnifiers.
These tools include:
- navigators and dynamic tables of contents,
- personalized document views,
- annotation utilities,
- targeted search tools, and
- dynamic information retrieval utilities.
Navigators provide an outline of a document using the structure expressed in HTML. This outline can be expanded or compressed, similar to a tree view of a file structure. Thus a great deal of detail can be displayed about one area of a document while only the general headers are displayed for another area. The user can navigate up and down through the document and "in and out" of the levels of detail. Dynamic tables of contents (TOCs) allow users to determine what will be listed as table-of-contents items, for example: headers, images, figures, or tables. These tables of contents or tree structures can also display multiple hyperlinked documents. The web of documents is translated into a tree structure by specifying a point of view. Points of view, or top-level documents, can be changed, thereby changing the tree structure.
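The outline itself can be derived directly from the HTML header elements. The Python sketch below is offered as an illustration rather than an account of any shipping navigator: it collects H1-H6 headers and prints an indented outline whose depth can be limited to show more or less detail.

    # A sketch of building a collapsible outline from HTML headers (H1-H6).
    from html.parser import HTMLParser

    class OutlineBuilder(HTMLParser):
        def __init__(self):
            super().__init__()
            self.outline = []     # list of (level, header text) pairs
            self._level = None
            self._buffer = []

        def handle_starttag(self, tag, attrs):
            if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
                self._level = int(tag[1])
                self._buffer = []

        def handle_endtag(self, tag):
            if self._level and tag == "h%d" % self._level:
                self.outline.append((self._level, "".join(self._buffer).strip()))
                self._level = None

        def handle_data(self, data):
            if self._level:
                self._buffer.append(data)

    def print_outline(html, max_level=6):
        # max_level plays the role of expanding or compressing the tree
        builder = OutlineBuilder()
        builder.feed(html)
        for level, text in builder.outline:
            if level <= max_level:
                print("  " * (level - 1) + text)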
These navigators or tables of contents have obvious advantages for users of screen readers. To exploit these resources, keyboard equivalents must be provided to move through and manipulate the TOCs. In addition, the document levels of the various items should be expressed using audio or vocal cues. Thus, screen reader users can get a sense of what the document or site looks like overall, and then zero in on specifics.
Certain intranet publishing tools allow the specification of personal views of a document. Thus the same document can be viewed with a different page layout, with more or less detail, or with a different emphasis on certain pieces of information. A personalized view for users of screen readers, with particular areas of relevance or interest highlighted, would make document viewing and information retrieval much easier.
Tools exist which allow annotation of documents by both content creators and users. These annotations could be used by access system users to highlight or mark areas of particular interest, or to add text labels where they are missing. Thus areas of interest could be quickly found again, and clarification of an inaccessible item would only need to be obtained once. Combined with dynamic TOCs, a truly personalized table of contents could be created.
Dynamic TOCs can be further enhanced by search tools which search within specific documents or sites. The results could be displayed as dynamic tables of contents. If these can be easily navigated using keyboard equivalents, they would provide the equivalent of visually scanning the document for content of interest. When loading a document, one option could be to create a dynamic TOC of items related to a number of areas of personal interest, as in the sketch below. Thus the user who is blind or visually impaired could quickly assess whether the document has information matching their interests.
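Building on the OutlineBuilder sketch above, such a keyword filter might be as simple as the following; the interest list is, of course, hypothetical.

    # Filter an outline down to entries matching the user's areas of interest.
    def filter_outline(outline, interests):
        interests = [word.lower() for word in interests]
        return [(level, text) for level, text in outline
                if any(word in text.lower() for word in interests)]

    # e.g., filter_outline(builder.outline, ["access", "screen reader"])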
Additional tools, which would greatly benefit users of alternative access systems, are dynamic information retrieval utilities which alert the user to newly posted items of interest. Unfortunately these are frequently not accessible.
The visual channel and purely visual conventions are heavily relied upon to communicate the structure, the emphasis and the content of Web and Intranet documents. While the visual channel lends itself well to targeted document navigation, the aural channel encourages sequential document viewing. Through work on various fronts, strategies and tools are being developed which afford the same information retrieval and document manipulation capabilities to users of screen readers and screen magnifiers.
The author would like to acknowledge the following ATRC staff for their insightful input into this paper: Jan Richards, Dena Shumila, and Kevin Nguyen.