The integration of Emacspeak and W3 provides a powerful speech environment in which the user can fluently browse the WWW while reading email or Usenet news, or upon encountering a URL within a document that is being edited or proof-read. In addition to the advantages of a speech-enabled browser described in the section on user experience, this tight integration between the speech-enabled WWW browser and the audio desktop is a crucial factor in making the Emacspeak platform a highly productive environment for day-to-day work.
The techniques used to implement visual and aural styles are analogous. In Emacs, all textual content is displayed and manipulated by placing the text in a buffer; the Emacs system is responsible for managing and displaying text placed in buffers. Such text can be annotated with additional properties that control its visual appearance, e.g., the color and font used. W3 implements the visual style sheet by annotating the text being displayed with the appropriate visual properties. In addition, when running in a speech-enabled context such as Emacspeak, W3 annotates the text with the speech properties specified in the speech style sheet. When the displayed document is spoken by Emacspeak, the user hears an audio formatted rendering (see [1]). This implementation strategy has the added advantage of keeping the spoken and visual renderings synchronized; thus someone who is both looking at the screen and listening to the output perceives the effect of both the visual and aural style sheets.
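In Emacs Lisp terms, this dual annotation might look roughly as follows. This is a minimal sketch: the `face' property governs visual appearance, while Emacspeak renders the `personality' property as a change of speaking voice; the specific values `bold' and `voice-bolden', and the region bounds, are illustrative.

```emacs-lisp
;; Sketch: annotate one region of buffer text with both a visual and
;; an aural property, so the same text drives both renderings.
(let ((start (point-min))
      (end   (point-max)))
  ;; Visual style: display the region in a bold face.
  (put-text-property start end 'face 'bold)
  ;; Aural style: Emacspeak speaks the region in a bolder voice.
  (put-text-property start end 'personality 'voice-bolden))
```

Because both properties hang off the same stretch of buffer text, a single annotation pass keeps the two renderings in step with no extra bookkeeping.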
We use a sample radio button group that might appear in a coffee order form to demonstrate the mapping of HTML form elements to contextually rich user interface widgets. Notice that the radio button group has been encoded using some of the enhancements detailed in the section on proposed extensions. See the section on user experience for details on the user experience when working with forms encoded using the current HTML standard, and the improvements that result from the enhanced encoding.
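For concreteness, the standard encoding of such a radio group might look like the fragment below. The shared name a0 and the label 5 pounds are taken from the example discussed here; the question text and the other choice are assumptions made for illustration.

```html
<!-- Hypothetical sketch of the coffee-order fragment in standard HTML:
     the question is ordinary prose, and only the shared name attribute
     (a0) ties the buttons together into a group. -->
How much coffee would you like?
<input type="radio" name="a0" value="1"> 1 pound
<input type="radio" name="a0" value="5" checked> 5 pounds
```

With this standard encoding, nothing in the markup explicitly associates the question with the buttons; that association is exactly what the proposed extensions make explicit and what the browser must otherwise recover heuristically.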
When W3 parses an HTML document containing interactive form elements, it maps these to appropriate user interface widgets such as checkboxes or radio button groups; in this sense, the processing is no different from that of other WWW browsers. However, when processing the radio group shown in the example, W3 annotates each radio button with its associated name, e.g., a0 in the example, as well as its label (if available), e.g., 5 pounds. The interface treats the entire group of radio buttons as a single logical entity; this is easier in cases where the grouping tag is used, but is still possible with the standard encoding by applying heuristic techniques. When providing speech feedback, Emacspeak examines this additional contextual information stored in the interface widgets to produce spoken dialogues that approximate what a human would say. Thus, for example, the user hears
Group how much coffee would you like is currently set to 5 pounds
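A simplified sketch of how such a spoken summary could be assembled from the contextual information stored on the widget; the function and argument names are hypothetical, not Emacspeak's actual interface.

```emacs-lisp
;; Hypothetical sketch: build a spoken summary for a radio group from
;; the group label and the currently selected button's label, as
;; annotated on the widget by W3.
(defun sketch-radio-group-summary (group-label current-choice)
  "Return spoken feedback for a radio group GROUP-LABEL set to CURRENT-CHOICE."
  (format "Group %s is currently set to %s" group-label current-choice))

(sketch-radio-group-summary "how much coffee would you like" "5 pounds")
;; => "Group how much coffee would you like is currently set to 5 pounds"
```

The point of the sketch is that the spoken dialogue is synthesized from stored annotations at speech time, rather than being hard-wired into the rendered document.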