Gregg C. Vanderheiden, Ph.D.
Trace R&D Center
Industrial Engineering Department, University of Wisconsin-Madison
Madison, Wisconsin 53705
gv@trace.wisc.edu
With increasing power, miniaturization, and thin-client/NetPC structures, people will soon be able to access the full network environment wherever they are. Information access points/appliances will be built into the walls, incorporated into our working environments, carried and even worn by us, and used as an integral part of most of our daily activities.
At the same time, as the Internet and information technologies are being woven into the fabric of education, business, and daily life, greater attention is being focused on whether the ordinary person, including those with disabilities, will be able to access and use these systems.
It is interesting that these two seemingly different objectives have similar solutions. If we design systems which are truly ubiquitous and nomadic; that we can use whether we are walking down the hall, driving the car, sitting at our workstation, or sitting in a meeting; that we can use when we're under stress or distracted; and that make it easy for us to locate and use new services, then we will have created systems which are accessible to almost anyone with a physical or sensory disability. We will also have gone a long way toward creating systems that are usable by a large percentage of the population who currently find systems aversive or difficult to learn. In addition, strategies and ideas developed for people with disabilities can provide valuable techniques and insights for creating devices for all nomadic computer users.
The devices of tomorrow, which might be referred to as TeleTransInfoComm (telecommunications / transaction / information / communication) devices, will be operating in a wide range of environments. Miniaturization, advances in wireless communication, and thin-client architectures are all quickly breaking our need to be tied to a workstation, or to carry a large device with us if we want to have access to our computing, communication, and information services and functions.
As a result, we will need interfaces that we can use while we're driving a car, sitting in an easy chair, sitting in a library, participating in a meeting, walking down the street, sitting on the beach, walking through a noisy shopping mall, taking a shower, or relaxing in a bathtub, as well as when we're sitting at a desk. The interfaces will also need to be usable in hostile environments: when camping or hiking, in factories, or in shopping malls at Christmas time.
In addition, many of us will need to access our information appliance (or appliances) in very different environments within the same day, and perhaps within the same communication or interaction activity. These different environments will put constraints on the types of physical and sensory input and output techniques that will work (e.g., it is difficult to use a keyboard when walking; it is difficult and dangerous to use visual displays when driving a car; and speech input and output, which work great in a car, may not be usable in a shared environment, in a noisy mall, in the midst of a meeting, or in the library). Systems designed to work across these environments will therefore need flexible input and output options. The techniques, however, must operate essentially the same conceptually, even though they may be quite different physically (visual versus aural). Users will not want to master three or four completely different interface paradigms in order to operate their devices in different environments (perhaps even on the same task). There will need to be continuity in the metaphor(s) and the "look and feel" even though the devices may be operating entirely visually at one point (for example, in a meeting) and entirely aurally at another (e.g., while driving a car). As noted above, many users will also want to be able to transition from one environment to another, from one device to another (e.g., workstation to hand-held), and from one mode to another (e.g., visual to voice) in the midst of a task.
If both government and industry are going to build the infrastructure needed for the National Information Infrastructure (NII) of the future, we will need systems that are usable by a much greater percentage of the population than today's systems are. This is necessary for both economic and political reasons.
The systems will need to be usable and understandable by individuals who today avoid technologies or use them only when they have to. They will have to be operable by people who have difficulty figuring out household appliances. They will also need to address the issues of individuals with literacy problems, as well as individuals with physical, sensory, and cognitive disabilities. These latter groups account for between 15% and 20% of the general population and close to 50% of the elderly population.
At the same time, however, these interfaces need to be both operable by and efficient for experienced and power users. The argument that it is not economically efficient to create special interfaces, or to count on special devices being developed, for the bottom quartile of the population with regard to interface skills is just as valid in arguing that the mass-market interfaces for next-generation NII products need to be usable by the top quartile of the population. Interestingly, many individuals with disabilities (such as blindness) turn out to be some of the best power users as well, as long as the interfaces stay within their sensory capabilities.
It is also interesting to note that almost all of the issues around providing access to people with disabilities will be addressed if we simply address the issues raised by the "range of environments" discussion above. For example:
Thus, although there may be residual disability access specifics which need to be covered, the bulk of the disability access issues are addressed automatically through the process of developing environment/situation independent (modality-independent) interfaces.
The range of activities that will need to be carried out by these new devices on the next-generation Internet is growing rapidly and will vary widely. As interface devices become smaller and more intelligent, and the Internet itself becomes more highly utilized and intelligent, it is hard to imagine any activity which would not conceivably involve these technologies in some role. Communication and information technologies will begin to resemble electricity in that they will be incorporated into almost every device, every environment, and every activity. Activities will include writing, talking, shopping, virtual travel, learning, authoring, disseminating, selling, voting, working, playing, collaborating, etc. These technologies will also give us new tools for doing things we cannot now do, including visualizing concepts which are not inherently visible; listening to data or information which is not auditory; defining laws of physics in order to better explore either real or constructed environments; enhancing our sensory, physical, and cognitive skills; and tackling tasks which we would not otherwise attempt due to the sheer amount of work that would be required.
We will also undoubtedly be seeing these new technologies spawn more and different applications - applications we have not thought of yet because they are not possible without these technologies. We will also probably become as dependent upon these technologies as we are on electricity today.
This great diversity cannot be handled with a single interface or interface approach. We are going to need a variety of interfaces, many of which will be tuned to specific tasks or types of tasks.
Taking the above requirements together, then, it would appear that in the near future we need to develop a family of interface techniques and strategies which will allow us to build interfaces which are:
Widely varying - To meet the diversity of tasks that will be addressed. Some interfaces will only need to deal with text capture, transmission, and display. Others will need to be able to deal with display, editing, and manipulation of audiovisual materials. Some may involve VR, but be basically shop and select strategies underneath. Others may require full immersion, such as data visualization and telepresence.
Modality independent - The interfaces will need to allow the user to choose the sensory modalities which are appropriate to the environment, situation, or user. Text-based systems will need to allow users to display information visually at some times and auditorially at others- on high-resolution displays when they are available and on smaller, low-resolution displays when that is all that is handy.
Flexible/adaptable - We will need interfaces which can take advantage of fine motor movements and three-dimensional gestures when a user's situation and/or abilities allow, but which can also be operated using speech, keyboard, or other input techniques when that is all that is practical given the environment the user is in, the activities they're engaged in, or any motor constraints.
Straightforward and easy to learn - So that as much of the population as possible is able to use them, and so that all users can easily master new functions and capabilities as they evolve and are introduced.
Trying to create an "everyone" interface sounds wonderful but is unobtainable. Trying to design to a least common denominator clearly does not work. If we only use those abilities or input techniques which everyone has or which we could use in any environment, we would have to rule out all visual and auditory displays and probably tactile displays, as well. Even thinking about limiting interfaces to only those that we could use while driving a car or in a noisy environment seems to eliminate many of the multimedia techniques and approaches.
No matter how flexible an interface you create, there will always be someone with a combination of two or three severe disabilities that together render the interface unusable. There are also applications, such as telepresence (for example, a cultural tour of the museums and orchestras of Europe), which cannot be made fully accessible to people who are blind or deaf. Some aspects can be made accessible, and all of it can be made partially accessible to both of these groups, but neither group would be able to have full access to all of the information presented because of its nature.
A tremendous degree of access to general information and transaction systems, however, can be provided in a fairly straightforward fashion -- much more than is usually assumed. For example, it is possible to allow individuals who are blind to access and use a 3-D virtual shopping center whether it is rendered in VRML or as a high-resolution, total-immersion simulation. At the same time, the same techniques allow an individual who is driving a car to access and use the shopping simulation, and they allow the simulated shopping center to be more easily accessed and used by artificial intelligence agents as well.
A couple of examples of systems which provide modality-independent, accessible interfaces are helpful here. These are not currently on nomadic systems, but they do demonstrate how a single system can be made to work (at different times) in hands-free, vision-free, or hearing-free fashion.
The first example is a touchscreen kiosk interface which has just been unveiled at the Mall of America in Minneapolis (Figure 1) and is being incorporated into other multimedia kiosks across the country. This touchscreen kiosk interface includes a set of features developed at the University of Wisconsin called the EZ Access package. The EZ Access features add flexibility to the user interface for those who would ordinarily have difficulty using or be unable to use a touchscreen kiosk. They add this flexibility without changing the way that the kiosk looks or behaves to users without disabilities. With the EZ Access features in place, the kiosk can now be used by individuals:
Moreover, the techniques can be implemented on a modern multimedia kiosk by adding only a single switch (which appears to the kiosk's computer as the right mouse button) and incorporating the EZ Access features into the standard interface software for the computer. Once the EZ Access features are built into the standard user interface software a company uses to create its kiosks, implementing the techniques on subsequent kiosk designs is simple and straightforward. The kiosk demonstrates that very flexible interfaces are feasible and can be implemented on a public commercial information system. (For a discussion of the strategies used, see below.)
These techniques are now being adapted and extended for touchscreen kiosks which browse the web.
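To make the kiosk approach concrete, the following is a minimal sketch, in Python, of how a single switch that the computer sees as the right mouse button could drive a step-and-confirm interaction of the kind described above. It is an illustration only, not the actual EZ Access implementation; the item labels and the speak() stand-in for the kiosk's speech output are assumptions.

```python
# Hypothetical sketch (not the actual EZ Access code) of a single-switch
# "step and confirm" layer: each press steps through the items currently on
# screen and announces one; a long press activates the announced item.

class StepAndConfirm:
    def __init__(self, items, speak):
        self.items = items      # list of (label, action) pairs currently on screen
        self.speak = speak      # callable that voices or displays a string
        self.index = -1

    def short_press(self):
        """Advance to the next on-screen item and announce its label."""
        if not self.items:
            self.speak("Nothing to select.")
            return
        self.index = (self.index + 1) % len(self.items)
        self.speak(self.items[self.index][0])

    def long_press(self):
        """Activate the item that was last announced."""
        if self.index < 0:
            self.speak("Press the switch to step through the choices first.")
            return
        label, action = self.items[self.index]
        self.speak("Selected " + label)
        action()

# Example: two on-screen choices, with print() standing in for speech output.
ui = StepAndConfirm([("Job listings", lambda: None), ("Mall map", lambda: None)],
                    speak=print)
ui.short_press()   # announces "Job listings"
ui.long_press()    # confirms it
```

The same loop works whether the announcements are spoken, displayed in large print, or sent to a braille display.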
Figure 1. Curtis Chong, President of the Computer Science Division of the National Federation of the Blind, using the Mall of America/Knight-Ridder Newspaper's Jobs kiosk at the Mall of America. Professor Gregg Vanderheiden looks on.
A (very large) video clip of the kiosk with EZ Access features is available at http://trace.wisc.edu/world/kiosk/ezmovie.mov.
QuickTime movies on the Web (Figure 2) are being captioned and described in order to make them accessible to and viewable by people who can't listen to them (because they are deaf, because they cannot turn up the volume in the environment they're in, or because the environment they're in is too noisy) as well as people who can't see them (because they are blind or because their vision is otherwise occupied). These movies take advantage of QuickTime's ability to have multiple audio and time-synched text tracks. What would be thought of as closed captions on a television show are stored in a text track as part of the QuickTime data structure. Users who cannot hear or listen to the sound track can turn on the text track and have the "captions" of the audio track displayed immediately below the QuickTime movie as it plays. Similarly, an alternate audio track can be pulled up which adds a verbal description of what is visually happening on screen, so that someone who cannot see the image can "view" the QuickTime movie.
An example of captioned and described QuickTime movies prepared by the CPB/WGBH National Center for Accessible Media in Boston is available at http://www.boston.com:80/wgbh/pages/ncam/captionedmovies.html.
It is also possible for a user to use the search command built right into QuickTime to search for any occurrence of a particular word in the movie and jump to that instant in the movie. These movies can also be searched by intelligent agent software which can pull a movie or clips out of a movie in response to a user's requests.
When full-length movies and other programming are prepared in this way, they will be accessible in audio-visual mode (the standard viewing format) as well as viewable in audio-only or video-only format. This will allow them to be 'viewed' in a wide variety of fashions. Persons viewing a movie in standard format could (if they have to get up) switch to audio-only mode while they go give Jimmy a drink of water, pick up milk at the store, etc. They could also switch to all-video mode with the sound turned off if a spouse decides to go to sleep while they want to finish the end of the movie, the in-laws call in the middle of the game, or the vacuum cleaner wipes out the audio.
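The sketch below, written in Python with purely hypothetical names (it does not use the QuickTime API), illustrates the underlying idea: when a movie is stored with redundant video, audio, caption, and description tracks, choosing a viewing mode amounts to selecting which tracks to present for the viewer's current situation.

```python
# Illustrative sketch: a movie stored with redundant tracks can be presented
# in whatever combination the viewer's situation allows.

from dataclasses import dataclass

@dataclass
class Movie:
    video_track: str          # the picture
    audio_track: str          # the main sound track
    caption_track: str        # time-synched text of the dialogue and sounds
    description_track: str    # spoken description of what is happening on screen

def tracks_to_play(movie: Movie, can_watch: bool, can_listen: bool) -> list:
    """Pick tracks for the current situation (driving, noisy mall, sleeping spouse)."""
    selected = []
    if can_watch:
        selected.append(movie.video_track)
        if not can_listen:
            selected.append(movie.caption_track)       # sound off: show captions
    if can_listen:
        selected.append(movie.audio_track)
        if not can_watch:
            selected.append(movie.description_track)   # eyes busy: describe the picture
    return selected

movie = Movie("video", "audio", "captions", "description")
print(tracks_to_play(movie, can_watch=False, can_listen=True))  # audio-only viewing
```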
To achieve these flexible mobile interfaces, we are going to need new interface strategies and new interface architectures that allow a user to switch between modalities in a seamless, coherent, and intuitive fashion. Users will need to be able to choose from different compatible input/control techniques depending on their situation, and to choose display formats compatible with their environments.
Although research into AA+A (anytime, anywhere, anyone) interfaces has just begun, a few basic principles and strategies drawn from disability research and development have already been used to provide modality-independent, user-independent, and hardware-independent interfaces.
These principles/strategies include:
All of the basic information should be stored and available in either modality-independent or modality-redundant form.
Modality-independent refers to information which is stored in a form which is not tied to any particular form of presentation.
For example, ASCII text is not inherently visual, auditory, or tactile. It can be easily presented visually on a visual display or printer. It can just as easily be presented auditorially through a voice synthesizer, or tactually through a dynamic braille display or braille printer.
Modality-redundant refers to information which is stored in multiple modalities.
An example would be a movie which includes a text description of the audio track (e.g., captions) and a description of the video track in audio and electronic text format, so that all (or essentially all) of the information can be presented visually, auditorially, or tactually at the user's request, based upon their needs, preferences, or environmental situation.
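As a simple illustration of the modality-independent case described above, the sketch below hands the same stored text to interchangeable presenters standing in for a visual display, a voice synthesizer, and a dynamic braille display. The presenter classes are assumptions for illustration, not real device drivers.

```python
# The stored form is plain text; any presenter that can render text can be
# plugged in. The print() calls stand in for real display, speech, and braille
# drivers.

class ScreenPresenter:
    def present(self, text):
        print(text)                      # visual display

class SpeechPresenter:
    def present(self, text):
        print("[spoken] " + text)        # stand-in for a speech synthesizer call

class BraillePresenter:
    def present(self, text):
        print("[braille] " + text)       # stand-in for a dynamic braille display

def show(text, presenter):
    """The stored information never changes; only the presenter does."""
    presenter.present(text)

show("The train to Madison departs at 4:15.", SpeechPresenter())
```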
The system should have viewers which support the selective modality presentation of the information. That is, it should provide a mechanism for displaying captions, playing alternate audio tracks, etc.
By maintaining an updated listing of all of the information currently available to the user, as well as all of the actions or commands available or displayed for the user, it becomes relatively easy to provide great flexibility in the techniques that can be used to operate the device or system.
For example, in a 3D virtual shopping mall, a database is used to generate the image seen by the user and to react to user movements or choices of objects in the view. If properly constructed, this database would be able to provide a listing of all of the objects in view, as well as information about any actionable objects presented to the user at any point in time. By including verbal (e.g., text) information about the various objects and items, it is possible for individuals to navigate and use this 3D virtual shopping system in a wide variety of ways, including purely verbally.
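A rough sketch of such a database is given below. The object names, descriptions, and actions are invented for illustration; the point is that the same records which drive the rendered view can be listed as text and operated on by name, whether by a person or by an agent.

```python
# Hypothetical sketch of the scene database described above: the same records
# that drive the 3-D rendering can be listed verbally or handed to an agent.

from dataclasses import dataclass, field

@dataclass
class SceneObject:
    name: str                                   # short label ("bookstore entrance")
    description: str                            # verbal description of the object
    actions: dict = field(default_factory=dict) # verb -> callable

class SceneInView:
    def __init__(self, objects):
        self.objects = objects                  # everything currently presented

    def list_verbally(self):
        """Return the view as text, suitable for speech, braille, or an agent."""
        return [f"{o.name}: {o.description} (actions: {', '.join(o.actions) or 'none'})"
                for o in self.objects]

    def act(self, name, verb):
        """Carry out an action chosen by name, e.g., from a spoken command."""
        for o in self.objects:
            if o.name == name and verb in o.actions:
                return o.actions[verb]()
        raise LookupError(f"no '{verb}' action on '{name}' in view")

view = SceneInView([SceneObject("bookstore entrance", "a lit doorway to the left",
                                {"enter": lambda: "entering the bookstore"})])
print(view.list_verbally())
print(view.act("bookstore entrance", "enter"))
```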
The use of a simple set of alternate selection techniques, which can accommodate the varying physical and sensory abilities that an individual may have due to their environment/situation (e.g., walking, wearing heavy gloves, etc.), can provide coverage for a very wide range of environmental situations and/or personal abilities.
A suggested selection of operating modes might be:
A text-based auxiliary interface port can take the form of either a software connection point or a hardware connection point such as an infrared port. The purpose of the port is to allow external hardware or software to query the system (to receive the list of information and action objects available) and to make selections from among the available actions. This port would be used in conjunction with the 'external list' mode described above.
The port might, for example, be used to connect an external dynamic braille display for viewing and controlling the device (kiosk, PDA, or tele/trans/info/comm appliance). As mentioned above, this port can also allow intelligent agents or devices to have (text-based) access to the information and functions in the device/system.
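A minimal sketch of such a port is shown below as a line-oriented text protocol with two commands, LIST and SELECT. The command names and the transport are assumptions made for illustration; any equivalent query-and-select mechanism would serve.

```python
# Illustrative line-oriented protocol for the auxiliary text port described
# above: "LIST" returns the items currently available; "SELECT n" activates
# one. The transport (software hook, serial line, infrared link) is left abstract.

def handle_command(line, items):
    """items: list of (label, action) pairs the interface currently exposes."""
    parts = line.strip().split()
    if not parts:
        return "ERR empty command"
    verb = parts[0].upper()
    if verb == "LIST":
        return "\n".join(f"{i}: {label}" for i, (label, _) in enumerate(items))
    if verb == "SELECT" and len(parts) == 2 and parts[1].isdigit():
        index = int(parts[1])
        if 0 <= index < len(items):
            label, action = items[index]
            action()
            return "OK selected " + label
        return "ERR no such item"
    return "ERR unknown command"

# Example exchange with an external braille display or a software agent:
items = [("Job listings", lambda: None), ("Mall map", lambda: None)]
print(handle_command("LIST", items))        # 0: Job listings / 1: Mall map
print(handle_command("SELECT 1", items))    # OK selected Mall map
```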
Using the modality-independent data storage and serving discussed above has a number of advantages beyond supporting the nomadic and disability access already discussed. Because it allows access via a number of modalities, it also allows information to be made available today through a number of channels. The same information or service can be accessed via a graphic web browser or via telephone. Displays of different resolutions can easily be supported; even very small, low-resolution displays can be used (in fact, the problems small, low-resolution displays pose resemble low-vision issues). Low-bandwidth systems can also take advantage of the text-only access that such a system provides, while those with higher bandwidth are not limited to this format and can take advantage of the full graphic interface that their displays and bandwidth allow. As a result, information/service providers could use a common information or service server to handle inquiries from a wide variety of people (and agents) using devices with a wide range of speeds and display technologies. And as technologies evolve, the same serving structure could be used across them.
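The sketch below illustrates this kind of common serving: a single modality-independent record answers a telephone caller, a small low-resolution display, and a full graphic client. The record fields and the capability descriptions are assumptions made for illustration.

```python
# Sketch of one server answering very different clients from a single
# modality-independent record.

RECORD = {
    "title": "Store hours",
    "text": "The mall is open 10 a.m. to 9 p.m., Monday through Saturday.",
    "image": "hours_graphic.png",     # optional rich presentation of the same facts
}

def render(record, client):
    """client: dict describing the requesting device's display and bandwidth."""
    if client.get("display") == "none":                 # telephone / speech-only access
        return record["text"]
    if client.get("bandwidth") == "low" or client.get("resolution") == "low":
        return record["title"] + "\n" + record["text"]  # small or slow text screen
    return {"title": record["title"], "body": record["text"],
            "image": record["image"]}                   # full graphic client

print(render(RECORD, {"display": "none"}))                       # phone caller
print(render(RECORD, {"resolution": "low", "bandwidth": "low"}))  # tiny wireless display
```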
Through the incorporation of presentation-independent data structures, an available information/command menu, and several easy-to-program selection options, it is possible to create interfaces which begin to approximate the anytime/anywhere and anyone (AAA) interface goal. These types of interfaces have been constructed and are now being used in public information kiosks to provide access to individuals with a wide range of abilities. The same strategies can be incorporated into next-generation tele-trans/info/comm devices to provide users with the nomadicity they will be seeking and requiring in their next-generation Internet appliances.
It won't be long before individuals will be looking for systems which will allow them to begin preparing an important communication at their desk, stand up and continue it as they walk to their car, and finish it while driving to the next appointment. Similarly, users will want to be able to move freely between high- and low-bandwidth systems to meet their needs and circumstances. They will want to access their information databases using visual displays, and perhaps advanced data visualization and navigation strategies, while at their desk, but will want to access many of the same databases using auditory-only systems as they walk to their next appointment. They may even wish to access their personal rolodexes or people-databases while engaged in conversation at a social gathering, using a keypad in their pocket and an earphone in their ear ("What is Mary Jones's husband's name?").
The approaches discussed will also allow these systems to address the equity issues of providing access to those with disabilities and those with lower-technology and lower-bandwidth devices, and to provide support for intelligent (or not-so-intelligent) agent software.
The AAA strategies presented here do not provide full cross-environment access to all types of interface or information systems. In particular, as noted above, fully immersive systems which present inherently graphic content (e.g., paintings) or inherently auditory content (e.g., symphonies) will not be accessible to anyone without use of the primary senses for which the information was prepared (text descriptions are insufficient). However, the majority of today's information and almost all services can be made available through these approaches, and extensions may provide access to even more.
Finally, it is important to note not only that Environment/Situation-Independent interfaces and Disability-Accessible interfaces appear to be closely related, but also that one of the best ways to explore Environment/Situation-Independent nomadic interface strategies may be to explore past and developing strategies for providing cross-disability access to computer and information systems.
For more information on these and related topics, see Designing an Accessible World (http://trace.wisc.edu/world), a cooperative web site on universal design hosted by the Trace R&D Center at the University of Wisconsin-Madison.