This paper presents a prototype intermediary system for enhancing the mobile Internet user experience. With this system, mobile users can interact with Internet services through multiple channels (e.g., text messages and voice interaction in addition to a phone-based browser), actively switching among channels according to their preferences, web content, and environmental factors. We discuss the architecture, implementation, and usage scenarios.
Internet Intermediary System, Mobile Internet User Experience, Multi-channel Interaction, Channel Switching
Mobile web users suffer from many limitations of current interfaces based on the Wireless Application Protocol (WAP), such as small screen size and limited interaction capabilities [1,2]. In many cases, users would prefer more flexible methods of interaction. For example, a user with even a minor visual impairment might prefer to hear news rather than read it on a small screen, and a user in a meeting might prefer to have information pushed to him or her by voice or text message rather than pulled explicitly through a WAP user interface.
Furthermore, a user "on the move" may prefer quite different interaction methods from time to time. Different web content, devices (e.g., WAP-enabled or not), user contexts [1,4], and environmental factors lead users to choose different modalities, and to change their choices frequently. A mobile user's environment changes as the user moves, altering illumination, background noise, and privacy, any of which might lead the user to change his or her preferences. Thus, to enhance mobile Internet usability, users should be able to interact with services through multiple channels and to switch channels as environmental and other factors change.
This paper presents several interaction functions that enhance the mobile Internet user experience: (1) users can interact with the same Internet service through different channels; (2) users can switch channels at any transaction step; (3) on the voice channel, users can select an item verbally or by pressing keypad buttons; and (4) supplemental or non-critical content can be eliminated when the Internet service was originally designed for users with large displays.
We developed an intermediary-based (proxy-based) system as the middleware that manages multi-channel interaction. More precisely, in our system, users interact with web services through a WAP Gateway, an SMS Gateway, or a Voice Server, in conjunction with a specially tailored intermediary that transcodes web content into suitable forms and coordinates multi-channel interaction.
The middleware includes a transcoding function, implemented with the Web Intermediaries (WBI) Development Kit [3], that transforms HTML pages into WML, VoiceXML, and text messages at run time. It also includes a component called the HTML Session Manager, which coordinates multi-channel interaction.
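To make the run-time transcoding step concrete, the following Java sketch shows one way a middleware could decide, per request, which of the three target formats to produce. It does not use the WBI Development Kit API; the `TargetFormat` enum, the `TranscoderSelector` class, and the User-Agent substrings it tests are assumptions for illustration, loosely modeled on the `User-Agent=*voice*` condition that appears in the system's annotation files.

```java
import java.util.Locale;

/** Target formats named in the text: WML (WAP), VoiceXML (voice), and plain text (SMS). */
enum TargetFormat { WML, VOICEXML, SMS_TEXT, HTML }

/**
 * Hypothetical helper that chooses which transcoder output to produce for a request.
 * The User-Agent substrings below are assumptions, not values the real WAP gateway,
 * SMS gateway, or voice server are known to send.
 */
final class TranscoderSelector {
    private TranscoderSelector() {}

    static TargetFormat select(String userAgent) {
        if (userAgent == null) {
            return TargetFormat.HTML; // no hint: pass the page through unchanged
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        if (ua.contains("voice")) {
            return TargetFormat.VOICEXML; // same idea as the "*voice*" annotation condition
        }
        if (ua.contains("sms")) {
            return TargetFormat.SMS_TEXT;
        }
        if (ua.contains("wap")) {
            return TargetFormat.WML;
        }
        return TargetFormat.HTML; // desktop browsers receive the original HTML
    }
}
```

In a fuller design, each format would map to its own transcoder (HTML to WML, VoiceXML, or SMS text); keeping the selection separate from the transcoding makes it straightforward to add another channel.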
The middleware implements channel switching, which can be initiated either by the user or by the middleware itself. The middleware initiates a switch when it discovers data that are beyond the current channel's capability. We record such channel limitations for specific Web URLs in a middleware database, so the middleware can make switching decisions by checking HTTP request headers for the annotated URL and channel information. For example, it might be impractical to enter some text (e.g., mailing addresses) by voice, so switching to SMS or WAP would be reasonable; in this case, the middleware suggests alternative channels for the user to adopt. The user might also switch channels for any of the reasons mentioned previously. To handle this properly, the middleware must determine whether the user is in the middle of an ongoing transaction. We assume that mobile users tend to continue their most recent transaction in progress. The middleware keeps a supplemental annotation file of specific URLs so that it can check whether a transaction has been completed.
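The switching logic above has two parts: middleware-initiated suggestions driven by per-URL channel limitations, and resumption of an unfinished transaction after a user-initiated switch. A minimal Java sketch, assuming in-memory maps in place of the middleware database and the annotation file (the `ChannelAdvisor` class, its methods, and the example URLs are hypothetical):

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

enum Channel { WAP, SMS, VOICE }

/**
 * Hypothetical in-memory stand-in for the database of URL/channel limitations
 * and for the annotation file of transaction-completion URLs.
 */
final class ChannelAdvisor {
    private final Map<String, EnumSet<Channel>> unsupported = new HashMap<>();
    private final Set<String> completionUrls = new HashSet<>();
    private final Map<String, String> lastUrlByUser = new HashMap<>();

    /** Records that the page at url cannot be handled on the given channel. */
    void annotateLimitation(String url, Channel channel) {
        unsupported.computeIfAbsent(url, k -> EnumSet.noneOf(Channel.class)).add(channel);
    }

    /** Marks url as the page that ends a transaction (e.g., a confirmation page). */
    void markCompletionUrl(String url) {
        completionUrls.add(url);
    }

    /**
     * Middleware-initiated switching: returns the channels to suggest for url,
     * or an empty set when the current channel can handle the page.
     */
    Set<Channel> suggestAlternatives(String url, Channel current) {
        EnumSet<Channel> blocked = unsupported.getOrDefault(url, EnumSet.noneOf(Channel.class));
        if (!blocked.contains(current)) {
            return EnumSet.noneOf(Channel.class);
        }
        return EnumSet.complementOf(blocked); // every channel not annotated as limited
    }

    /** Remembers the last URL each user requested, on whatever channel. */
    void recordRequest(String userId, String url) {
        lastUrlByUser.put(userId, url);
    }

    /**
     * User-initiated switching: the URL at which to resume on the new channel,
     * or null when the user's last transaction has already completed.
     */
    String resumeUrl(String userId) {
        String last = lastUrlByUser.get(userId);
        return (last == null || completionUrls.contains(last)) ? null : last;
    }
}
```

For example, after `annotateLimitation("/bank/address", Channel.VOICE)`, a voice request for that URL yields the suggestion {WAP, SMS}, in the spirit of the mailing-address example. A real deployment would take the URL from the incoming request rather than from a method argument.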
To synchronize transactions across channels, the middleware maintains a data model of the current HTML page and adds "partial-submit widgets" to the transcoded pages. A sample partial-submit widget looks like this:
<input type="submit" name="partialSubmitWidget" value="Update">
The intermediary checks each HTTP request for this parameter/value pair (i.e., "partialSubmitWidget=Update") and copies the user's input into the corresponding HTML model. In this way, any partial update made in one channel can be shared with the other channels. If the user switches channels in the middle of an HTML page, the middleware loads the HTML model instead of fetching a new page from the Internet.
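The partial-submit mechanism can be sketched as follows. This is a hedged illustration, not the HTML Session Manager itself: it assumes the form data arrive as an `application/x-www-form-urlencoded` string (the query string of a GET or the body of a POST) and models the HTML page as a simple per-user map from field names to values. `PartialSubmitHandler` and its methods are hypothetical; `URLDecoder.decode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of partial-submit handling. The per-user Map stands in for the
 * middleware's HTML model; both it and the parsing helper are hypothetical.
 */
final class PartialSubmitHandler {
    static final String WIDGET_NAME = "partialSubmitWidget";
    static final String WIDGET_VALUE = "Update";

    private final Map<String, Map<String, String>> modelByUser = new HashMap<>();

    /** Decodes form data such as "a=1&b=2" into an ordered name/value map. */
    static Map<String, String> parseForm(String formData) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (formData == null || formData.isEmpty()) {
            return fields;
        }
        for (String pair : formData.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            fields.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return fields;
    }

    /**
     * Returns true if the request carries partialSubmitWidget=Update; the other
     * fields are then merged into the user's HTML model rather than forwarded.
     */
    boolean handle(String userId, String formData) {
        Map<String, String> fields = parseForm(formData);
        if (!WIDGET_VALUE.equals(fields.remove(WIDGET_NAME))) {
            return false; // an ordinary submit: forward it to the web site as usual
        }
        modelByUser.computeIfAbsent(userId, k -> new HashMap<>()).putAll(fields);
        return true;
    }

    /** Values entered so far on any channel, used to rebuild the page after a switch. */
    Map<String, String> model(String userId) {
        return modelByUser.getOrDefault(userId, Map.of());
    }
}
```

Because the model is keyed by user rather than by channel, a value entered over WAP is available when the same page is rendered as VoiceXML or as a text message.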
When transcoding pages, our system adds a DTMF-enabling widget to every VoiceXML selection widget so that it can be controlled from the phone keypad. As a result, users can select an item verbally or by pressing buttons. The transcoding function also uses annotation files to work around other interaction difficulties. For example, with the help of annotation files, our HTML2VoiceXML syntactic transcoder translates an HTML "text-input" widget that asks for a phone number into a VoiceXML "phone-input" widget. The middleware annotates such an HTML phone query as follows, where the widget's XPath is "/HTML[1]/BODY[1]/FORM[1]".
<?xml version='1.0' ?>
<annot version="2.0">
  <description take-effect="before" target="/HTML[1]/BODY[1]/FORM[1]" condition="(User-Agent=*voice*)">
    <remove/>
    <insertmarkup>
      <form>
        <field name="userphone" type="phone">
          <prompt>What is your telephone number?</prompt>
          <filled>
            <return namelist="userphone"/>
          </filled>
        </field>
      </form>
    </insertmarkup>
  </description>
</annot>
Because the annotation makes phone-number entry practical on the voice channel, the middleware does not initiate a channel switch in this case.
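One way to realize the DTMF-enabling step, assuming the transcoder emits VoiceXML 2.0 `<field>` elements with `<option>` children, is to set each option's standard `dtmf` attribute so that the first nine choices also answer to keys 1 to 9. (For `<menu>` elements, VoiceXML instead offers `dtmf="true"`, which assigns keys to the first nine choices implicitly.) The paper does not say which construct its DTMF-enabling widget uses; the `DtmfEnabler` class below is hypothetical and relies only on the JDK's DOM API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical post-processing step for transcoded VoiceXML. */
final class DtmfEnabler {
    private DtmfEnabler() {}

    /** Parses a VoiceXML string; wraps checked parser exceptions. */
    static Document parse(String voiceXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(voiceXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed VoiceXML", e);
        }
    }

    /**
     * Gives the first nine <option> children of every <field> the keys 1..9 via the
     * dtmf attribute, so each item can be spoken or keyed in. Assumes the transcoder's
     * own output, in which no option has a dtmf value yet. Returns how many were keyed.
     */
    static int enableDtmf(Document doc) {
        int assigned = 0;
        NodeList fields = doc.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            int key = 1;
            for (Node n = fields.item(i).getFirstChild(); n != null; n = n.getNextSibling()) {
                boolean isOption = n.getNodeType() == Node.ELEMENT_NODE && "option".equals(n.getNodeName());
                if (isOption && key <= 9) {
                    ((Element) n).setAttribute("dtmf", Integer.toString(key++));
                    assigned++;
                }
            }
        }
        return assigned;
    }
}
```

Options beyond the ninth stay voice-only in this sketch, since single keypad digits run out; a real transcoder might split long lists into several prompts.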
Financial services are becoming increasingly popular among mobile users. A prototype, implemented in Java, was demonstrated at the China Banking Show 2001. To support this demonstration, we also developed an SMS gateway and adopted two commercial products: IBM WebSphere Voice Server and a WAP gateway. These three components cache the transcoded pages, deliver them in different modalities, and translate between protocols. Our system is connected to a remote Internet bank that provides all standard services for web access.
We eliminate supplemental or non-critical content through annotation authoring. For example, a typical HTML page, such as the one in Figure 2, consists of two parts (part 1 and part 2). When a user is in the middle of a multi-page transaction such as an account transfer, he or she needs only part 2. We therefore author page-specific annotation files (e.g., files targeting the account-transfer pages) that direct the transcoders to remove part 1 from those pages.
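Removing part 1 through an annotation amounts to evaluating the annotation's `condition` against the request and deleting the node named by its `target` XPath. The Java sketch below assumes the page has already been cleaned up into well-formed XHTML (real-world HTML usually needs that step first) and uses the JDK's `javax.xml.xpath` API. `AnnotationRemover`, its methods, and the simple `(Header=pattern)` condition grammar are assumptions extrapolated from the single annotation example given earlier, not a description of the WBI annotation engine.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helpers for applying a <remove/> annotation to a well-formed page. */
final class AnnotationRemover {
    private AnnotationRemover() {}

    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("page is not well-formed XHTML", e);
        }
    }

    /** Case-insensitive match where '*' in the pattern stands for any run of characters. */
    static boolean wildcardMatches(String pattern, String value) {
        if (value == null) {
            return false;
        }
        String[] parts = pattern.toLowerCase(Locale.ROOT).split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            if (!parts[i].isEmpty()) {
                regex.append(Pattern.quote(parts[i])); // literal text between wildcards
            }
        }
        return Pattern.matches(regex.toString(), value.toLowerCase(Locale.ROOT));
    }

    /** Evaluates a condition such as "(User-Agent=*voice*)" against one request header. */
    static boolean conditionHolds(String condition, String headerName, String headerValue) {
        String body = condition.trim();
        if (body.startsWith("(") && body.endsWith(")")) {
            body = body.substring(1, body.length() - 1);
        }
        int eq = body.indexOf('=');
        if (eq < 0) {
            return false;
        }
        return body.substring(0, eq).trim().equalsIgnoreCase(headerName)
                && wildcardMatches(body.substring(eq + 1).trim(), headerValue);
    }

    /** Removes the node addressed by an XPath such as "/HTML[1]/BODY[1]/FORM[1]". */
    static boolean removeTarget(Document page, String xpath) {
        Node target;
        try {
            target = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, page, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + xpath, e);
        }
        if (target == null || target.getParentNode() == null) {
            return false; // nothing at that path: leave the page unchanged
        }
        target.getParentNode().removeChild(target);
        return true;
    }
}
```

Keeping condition evaluation separate from node removal lets the same removal code serve both voice-only annotations and channel-independent ones such as the account-transfer pages.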
Mobile users could accomplish nearly all banking transactions through WAP, SMS text messages, and voice, and could switch among the three channels at any step of a transaction. To see the benefits, consider the following mobile Internet banking scenario, in which a user named Sarah wants to transfer money between accounts while on the go.
WAP interaction: To begin, Sarah logs into the Internet banking service through her WAP phone. She first checks her account balances and transaction history. While this information is being displayed across several pages on the phone's small screen, she realizes she is late for her next appointment. Because it is difficult to see all the information on the screen, and impractical to read it at all while traveling, she presses a sequence of buttons on the phone to switch to the voice channel.
Voice interaction: Over the phone, Sarah can continue on the way to her appointment, listening to her banking records and interacting with the service through verbal commands and keypad buttons. If need be, she can also switch back to WAP simply by refreshing the WAP interface. Using verbal commands, she completes the transfer of funds among her accounts.
Text-message interaction: Finally, Sarah wants a text record of the transaction, so she sends a text message to switch from voice to text mode. The text-message response contains command templates that Sarah edits and sends back to complete her transaction.
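The paper does not specify the SMS command templates. Purely as an illustration, assume a template of the form `TRANSFER <from-account> <to-account> <amount>`; a Java sketch of how the middleware might validate the edited reply could then look like this (the template syntax and the `TransferCommand` class are hypothetical):

```java
import java.math.BigDecimal;
import java.util.Locale;

/** Hypothetical parsed form of an edited SMS reply such as "TRANSFER 1234 5678 50.00". */
final class TransferCommand {
    final String fromAccount;
    final String toAccount;
    final BigDecimal amount;

    private TransferCommand(String fromAccount, String toAccount, BigDecimal amount) {
        this.fromAccount = fromAccount;
        this.toAccount = toAccount;
        this.amount = amount;
    }

    /** The (assumed) template text sent to the phone for the user to fill in. */
    static String template() {
        return "TRANSFER <from-account> <to-account> <amount>";
    }

    /** Parses an edited template; returns null if the reply is malformed. */
    static TransferCommand parse(String sms) {
        if (sms == null) {
            return null;
        }
        String[] tokens = sms.trim().split("\\s+");
        if (tokens.length != 4 || !"TRANSFER".equals(tokens[0].toUpperCase(Locale.ROOT))) {
            return null;
        }
        if (!tokens[1].matches("\\d+") || !tokens[2].matches("\\d+")) {
            return null; // account fields still hold placeholders or other text
        }
        BigDecimal amount;
        try {
            amount = new BigDecimal(tokens[3]);
        } catch (NumberFormatException e) {
            return null; // amount placeholder left unedited
        }
        return amount.signum() > 0 ? new TransferCommand(tokens[1], tokens[2], amount) : null;
    }
}
```

Returning `null` for a malformed reply lets the middleware resend the template instead of forwarding a bad request to the bank.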
In the near future, we plan to extend the multimodal interaction support of our demonstration system with synchronized interaction across channels, and to add support for other services such as news portals and e-commerce sites.