HTML Made Easy: The XTND HTML Translator


Jonathan Ryan Day and Brian A. Sullivan
Computer Science & Engineering

Jeff Spitulnik
School of Education

Elliot Soloway
Professor of Electrical Engineering and Computer Science
Professor of Education

Highly Interactive Computing Group (HI-C)
Department of Electrical Engineering and Computer Science
University of Michigan
1101 Beal Avenue
Ann Arbor, MI 48109

Abstract


Accessing documents on the World-Wide Web using NCSA Mosaic as an interface provides an unparalleled on-line educational information resource. Enabling users to present their compositions on the Web is as important as accessing existing information. To facilitate this, it is necessary for users to be able to easily compose documents to be accessed on the Web.
At the Highly Interactive Computing group (HI-C) at the University of Michigan, we have created a tool which simplifies the publishing process. By developing a translator which allows users to create documents in a familiar application and save them as HTML files, we have eliminated the need to know how to compose HTML documents. The translator also allows importation of any HTML document for modification. We chose to target ClarisWorks for Macintosh as the document composition application. It is a popular application in the educational market as well as in the industry in general.
Our translator allows users to create documents and presentations containing text styles and pictures as they normally would in ClarisWorks, and to save their compositions as HTML files. To do so, users simply select the HTML file format option in the 'Save As...' Standard File dialog. The translator is also useful for converting existing ClarisWorks documents to HTML format. It provides WYSIWYG conversion and allows the user to specify links to other files. Once saved, the HTML files can be made accessible by uploading them to a WWW server.

Introduction


The Highly Interactive Computing Group at the University of Michigan consists of undergraduate and graduate students in both education and computer science, as well as professors from both fields and professionals. We work closely with schools to research ways in which computers can aid and further stimulate the educational process. Through interaction with students, educators, and administrators at Community High School in Ann Arbor, we are trying to determine which types of applications and tools are beneficial for educators and students. We then develop the applications or tools and test them in the classroom with the high school students. This gives us information about which types of software tools are useful and which are inappropriate or ineffective.
Importance of the World-Wide Web and Mosaic in the classroom

The World-Wide Web makes it possible to find information on virtually every subject from sources around the world. For students, this type of resource is a tremendous asset in the learning process. Not only does the Web provide information on topics which cannot be found anywhere else; with an HTML browser such as Mosaic, it also presents the data in a form which is new and interesting to students and other users. From this point on, we will assume NCSA Mosaic 1.0.3 or newer for Macintosh in all discussions of an HTML browser. As the world of information continues to increase in size, the Web provides a standard, easy-to-navigate information structure while Mosaic adds functionality and an inviting graphical interface [1]. All of this is very important when targeting students as users. The easier it is for them to find the resources they need, the more likely they will be to use and benefit from them. It is also important to realize that publishing information, in the form of HyperText Markup Language (HTML) documents placed on a Web server for worldwide browsing, is crucial in the educational process as well. It is relatively easy to get started browsing the Web to find a plethora of information, even as a novice computer user. However, browsing constitutes only half of the two-way information sharing path that the Web provides. Users also need to be able to publish documents for others to view. Publishing, however, requires individuals to have some knowledge of the HyperText Markup Language to compose even the most basic documents.
The HTML barrier to document composition

The HyperText Markup Language is a markup language defined as a Document Type Definition (DTD) using the Standard Generalized Markup Language (SGML) model [2]. It is the standard format for hypertext documents being served by World-Wide Web servers. An HTML document is a text document containing tags. Some tags dictate the style of the body of text they surround, while others indicate links to other documents and to other types of resources available on the Web, such as movie clips and audio clips.
Several stumbling blocks exist for novice users trying to begin composing HTML documents. There is the task of finding information on HTML itself. Many on-line references and primers are easily accessible on the Web, but currency, completeness and clarity in documentation are some factors that complicate the process. For users not familiar with a markup language, the whole framework may be unfamiliar as well. Even when armed with documentation and an idea of how HTML is structured, it is often necessary to expend considerable time reformatting a document to achieve the desired result. Overall, it is very time consuming to compose documents when compared with composition through use of a familiar page layout or word processing application.
Motivation to develop the translator

The desire to allow users to compose HTML documents without having to learn HTML was sparked by research in the Highly Interactive Computing Group involving K-12 curricula. We realize how important it is to open the door to the Web for students, but at the same time, not to overwhelm them with the task of learning HTML. We found that, in order to make the procedure of creating HTML documents seamless to the students when using ClarisWorks for Macintosh, it was necessary to develop a translator which provided transparent translation to HTML format. It would not have been enough to develop a drag and drop conversion utility or a separate translation application. Rather, a translation option accessible to the user through the 'Save As...' dialog box from within the application was exactly the desired solution. This was our primary motivation to develop the translator. We have, however, realized the potential for application outside of the educational world as well.

Implementation


The XTND System

Developed by the Claris Corporation, the XTND System allows an application to use a virtually unlimited number of file formats. The XTND System itself is a document which contains several routines which are called by XTND-capable applications. The job of the XTND System is to present a list of all available translators to the Standard File dialog for the user to select and indicate which document to read from or write to [3]. The XTND System then loads the selected translator. From this point the translator does the remaining work. That is, it uses the available data either from the application on export or from the input file on import and performs the translation. The translator is, then, the intermediate piece in the model that determines which information gets passed from the application to a destination file or vice versa.
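The division of labor described above can be pictured with a minimal sketch in Python. It is not the XTND interface, which is a set of Macintosh routines documented in [3]; the class names, method names, and the two toy translators below are all hypothetical and serve only to show a registry that lists the available formats and hands control to the selected translator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Translator:
    """Hypothetical stand-in for one installed translator."""
    format_name: str
    export_fn: Callable[[str], str]  # application data -> file contents
    import_fn: Callable[[str], str]  # file contents -> application data


class TranslatorRegistry:
    """Hypothetical stand-in for the role played by the XTND System."""

    def __init__(self) -> None:
        self._translators: Dict[str, Translator] = {}

    def register(self, translator: Translator) -> None:
        # Each translator is added independently of the others.
        self._translators[translator.format_name] = translator

    def available_formats(self) -> List[str]:
        # The names that would fill the pop-up list in the file dialog.
        return sorted(self._translators)

    def load(self, format_name: str) -> Translator:
        # Hand control to the translator the user selected.
        return self._translators[format_name]


registry = TranslatorRegistry()
registry.register(Translator("Text", lambda s: s, lambda s: s))
registry.register(
    Translator("HTML", lambda s: "<P>" + s, lambda s: s.replace("<P>", ""))
)
exported = registry.load("HTML").export_fn("hello")
```

Because the registry knows nothing about any particular format, a new translator can be added without touching the existing ones, which is the modularity the XTND System is credited with here.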
The modularity of the XTND System allows each translator to be developed independently of other translators. The ability to create import and export translators for a specific file format without being concerned with other file formats or translators adds much flexibility in development. For these reasons, we chose to develop using the XTND System.

XTND HTML Translator

During the export process, the translator has access to data structures which contain data from the application. This data comes in different phases. If the data is text, it comes in the form of text runs, or continuous streams of text of like style [3]. Each text run may have any number of the available style characteristics associated with it. These characteristics include font type, size, style, color, and other formatting characteristics. They are important during export as the basis for which tags to write when exporting data to the destination HTML text file. The translator also has access to information about images contained in documents. If an image is encountered during export, it is denoted by an image flag which the translator checks at the beginning of each phase. If an image exists, the translator receives a pointer to the image [3] which it uses to write the image to a file. A more detailed description of how images are handled appears in the section on functionality mapping. As explained above, the translator receives information about a document being exported, then writes the data to the destination file as described in the mapping between ClarisWorks and HTML in the section on functionality mapping.
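As a rough illustration of the phase-by-phase export described above, the following Python sketch checks an image indicator at the start of each phase and dispatches to either an image writer or a text writer. The Phase record, the function names, and the two writer callbacks are assumptions made for this sketch; the actual data structures are those defined by the XTND System [3].

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Phase:
    """Hypothetical record for the data delivered in one export phase."""
    text: str = ""
    image: Optional[bytes] = None  # a non-None value plays the role of the image flag


def export_document(
    phases: List[Phase],
    write_text: Callable[[str], str],
    write_image: Callable[[bytes], str],
) -> str:
    """Check the image flag at the start of each phase and dispatch."""
    pieces = []
    for phase in phases:
        if phase.image is not None:
            pieces.append(write_image(phase.image))
        else:
            pieces.append(write_text(phase.text))
    return "".join(pieces)


html = export_document(
    [Phase(text="Hello "), Phase(image=b"\x00"), Phase(text="world")],
    write_text=lambda text: text,
    write_image=lambda data: "[image: %d byte(s)]" % len(data),
)
```

The writers are passed in as parameters so that the loop itself stays independent of how text runs and images are turned into HTML.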
The import process is very similar to the export process. During the import process, the translator reads data from the source file, which is an HTML text file. As expected, nearly the reverse of the export process occurs. The text is read in by the translator and the tags are interpreted to determine what information is to be sent to the application. The translator sends information regarding the input data using data structures similar to those used as information sources during export. The appropriate characteristics are set for the incoming text, which is then passed to the application to be displayed for the user. Since the input file must be a text file, there are only text runs to process. Depending on the tags read, however, it may be necessary to open an image which is linked to within the HTML document. The method followed in such cases is described in the section on functionality mapping.

The user interface

There are several pieces which make up the translation package for use with ClarisWorks. First, there is the XTND System itself, which comes with all Claris XTND-capable applications and resides in the Claris folder in the System Folder on a Macintosh. Then, the XTND HTML Translator is located in the Claris Translators folder in the Claris folder. Finally, to add ClarisWorks-specific functionality for handling links and styles, there is HTML Document stationery for ClarisWorks. This stationery document is a template which is used to aid in composing HTML documents. It has styles specific to HTML headers, preformatted text, and literal HTML commands listed in the Style menu as shown in Figure 1. This is done to provide a method for the user to specify headers and styles from the familiar ClarisWorks Style menu. Also included in the stationery is a macro with an associated shortcut button. When the shortcut palette is displayed, the 'Link' button appears as in Figure 2. When the user clicks on the macro button, a link is created in the manner described in the next section.






The style menu additions and the shortcut macro mentioned above are the only noticeable differences in ClarisWorks using the HTML Document stationery as opposed to using no stationery at all. As mentioned above, the translation is initiated in the Standard File dialog by selecting the translator from the pop-up list of available translators. The dialogs are very similar for both export and import as shown in Figure 3. Therefore, without breaking from familiar user interface guidelines on the Macintosh [5], we have added functionality not inherent in ClarisWorks, enabling users familiar with ClarisWorks to translate their documents to HTML format easily.


Mapping ClarisWorks and HTML functionality


Due to the differences in the types of styles and document formatting supported by ClarisWorks and HTML, it was necessary to evaluate how to map styles and formatting between the two representations of a document. This mapping was the model on which the translator code structure was based. The main issues to consider in the export mapping from ClarisWorks to HTML were document formatting; font type, size, and style; and support for links to other resources or documents. In the import mapping from HTML to ClarisWorks, the issues are identical but are handled very differently due to the nature of importing versus exporting.
In exporting, since margins, multiple columns, page breaks, footnotes, document headers, and footers do not have direct counterparts in HTML files, these characteristics were ignored. We assumed the user created the document in a single column and that footnotes would be used for a special purpose other than as footnotes, which is described later in this section. Since the type of font used to display an HTML document is dependent only on the browser, the font type is irrelevant and is ignored. The remaining text export issues deal with font attributes.
Font style and size are important issues in both ClarisWorks and HTML. In ClarisWorks any size font can be used, but the only method for specifying differences in font sizes in HTML is to tag the selected text with header level tags or other style tags, then specify different font sizes for each header or other tag type in the preferences of the HTML browser. This assumes use of a browser which allows users to specify such preferences, as is true in the case of NCSA Mosaic but may not necessarily be true for other browsers. In order to maintain a certain level of uniformity in font size mapping we divided font sizes into six different groups, each to be tagged as a different header. Table 1 shows the mapping between font sizes in ClarisWorks and their corresponding header tags in HTML. Provided that the font size representations of the different headers scale down in the HTML browser from largest font for <H1> to smallest for <H6>, the relative font sizes will look similar, if not identical, in an HTML browser and ClarisWorks.
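The grouping of font sizes into six header levels can be sketched as a simple lookup in Python. The point-size boundaries below are placeholders chosen only for illustration and are not the values given in Table 1; the function names are likewise hypothetical.

```python
# Placeholder boundaries as (minimum point size, header level) pairs.
# These numbers are NOT taken from Table 1; they only illustrate six groups.
SIZE_GROUPS = [(36, 1), (24, 2), (18, 3), (14, 4), (12, 5), (0, 6)]


def header_level(point_size: int) -> int:
    """Return the header level of the first group the size falls into."""
    for minimum, level in SIZE_GROUPS:
        if point_size >= minimum:
            return level
    return 6  # fallback for sizes below every boundary


def tag_as_header(text: str, point_size: int) -> str:
    """Wrap text in the header tag chosen for its font size."""
    level = header_level(point_size)
    return "<H%d>%s</H%d>" % (level, text, level)


title = tag_as_header("Title", 24)
```

With these placeholder boundaries, 24-point text would be written as a level 2 header. Whatever the real boundaries are, larger sizes map to lower header numbers, so relative sizes are preserved whenever the browser renders <H1> largest and <H6> smallest.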



In terms of font styles, the translator supports bold, italics, and underline. ClarisWorks also allows strike through, outline, shadow, condensed, extended, superscript, subscript, and text colors; however, these styles do not have equivalent tags in HTML. Superscript numbers and text colors are used by the translator as link flags and literal HTML text flags, as explained later in this section. All other unmatched styles are ignored, and the translator treats the run as plain text.
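A minimal Python sketch of this style mapping follows. The TextRun record is hypothetical, and the use of the physical style tags <B>, <I>, and <U> is an assumption, since the text does not state which tags are written for each style.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """Hypothetical text run: a stretch of text sharing one set of styles."""
    text: str
    bold: bool = False
    italic: bool = False
    underline: bool = False
    outline: bool = False  # example of a style with no HTML counterpart


def run_to_html(run: TextRun) -> str:
    """Wrap the run in one tag per supported style; ignore the rest."""
    html = run.text
    if run.underline:
        html = "<U>" + html + "</U>"  # assumed tag
    if run.italic:
        html = "<I>" + html + "</I>"  # assumed tag
    if run.bold:
        html = "<B>" + html + "</B>"  # assumed tag
    return html  # run.outline is never consulted, so it is dropped


styled = run_to_html(TextRun("note", bold=True, italic=True))
plain = run_to_html(TextRun("note", outline=True))
```

Wrapping from the innermost tag outward keeps the opening and closing tags properly nested when a run carries several styles at once.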
Export translation of images within documents occurs as follows. When the XTND System notifies the translator of an image during the export process, a file titled filename.gif is created in the current folder in the Macintosh file system, where filename is the name specified in the Save dialog by the user. The image information is then stored in the new file as a GIF image. An <IMG SRC = "filename.gif"> tag, which is a link to filename.gif, is written to the HTML export text file so that the image can be inlined when viewed with an HTML browser. This inline image link is generated assuming that the image file will be located in the same directory as the HTML file itself, which is true during export since the translator ensures this.
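The image export step can be sketched in Python as follows. The function name and parameters are hypothetical, the image bytes are assumed to be GIF data already (the conversion to GIF is not shown), and the image name is formed by appending ".gif" to the saved name, which is a literal reading of the description above.

```python
import os
import tempfile


def export_image(gif_bytes: bytes, folder: str, save_name: str) -> str:
    """Write the image beside the HTML file and return the inline image tag."""
    image_name = save_name + ".gif"
    with open(os.path.join(folder, image_name), "wb") as image_file:
        image_file.write(gif_bytes)
    # A relative name suffices because both files share one folder.
    return '<IMG SRC = "%s">' % image_name


# Usage in a temporary folder standing in for the folder chosen in the dialog.
with tempfile.TemporaryDirectory() as folder:
    tag = export_image(b"GIF89a", folder, "report")
    written = os.listdir(folder)
```

The sketch handles a single image per document. How several images in one document would be named is not covered by the description above, so it is left out here.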
Even though ClarisWorks does not provide a method for users to add certain media such as sound files to documents, which would be suitable in HTML, there is support for user specified links to facilitate this. We realize that there are limitations to ClarisWorks in terms of being able to display all possible types of media, but, with the ability to specify links explicitly, the user still has full control to add different media and to specify links to other HTML documents. The 'Link' macro described in the section on the user interface above enables users to specify links. Clicking on the macro button while a selection of text is highlighted causes the selection to be changed to blue, underlined text and tagged with a superscripted footnote number. Footnote text is then created in association with the particular footnote number at the bottom of the page in ClarisWorks. Initially the footnote text is 'INSERT URL HERE.' The user specifies a link to another document on the Web by typing its Uniform Resource Locator (URL) as the footnote text. The Style menu associated with the HTML Document stationery, as described above, also provides a way to insert literal HTML text which will not be translated, but rather echoed to the HTML export text file. A selection of text with the 'Literal HTML Text' style associated with it appears in red. This can be any valid HTML text including plain text, tags, links to other resources, or anchors. This functionality makes it possible to create HTML documents in ClarisWorks which support all possible HTML tags. We based the translator on Level 1 HTML specifications, but with the literal HTML text option, any currently valid tags may be used.
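To make the link and literal text conventions concrete, here is a minimal Python sketch of how footnoted link text and literal HTML text might be written on export. The Run record, the footnote dictionary, the anchor tag form, and the escaping of ordinary text are assumptions of this sketch, and the URL is a made-up example.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, List, Optional


@dataclass
class Run:
    """Hypothetical text run carrying the two flags described above."""
    text: str
    footnote: Optional[int] = None  # superscript footnote number marking a link
    literal: bool = False           # 'Literal HTML Text' style (shown in red)


def export_runs(runs: List[Run], footnotes: Dict[int, str]) -> str:
    """Turn runs into HTML; footnotes maps footnote numbers to URLs."""
    out = []
    for run in runs:
        if run.literal:
            # Literal HTML text is echoed to the output without translation.
            out.append(run.text)
        elif run.footnote is not None:
            url = footnotes[run.footnote]
            out.append('<A HREF="%s">%s</A>' % (url, escape(run.text, quote=False)))
        else:
            # Escaping ordinary text is an assumption of this sketch.
            out.append(escape(run.text, quote=False))
    return "".join(out)


page = export_runs(
    [
        Run("See "),
        Run("this page", footnote=1),
        Run(" for details."),
        Run("<HR>", literal=True),
    ],
    {1: "http://example.org/page.html"},
)
```

The literal branch is what lets a document use tags the translator itself knows nothing about, since the text passes through untouched.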
When importing HTML documents into ClarisWorks using the translator, the process is fairly straightforward. The text is read from the HTML file and the translator checks for valid tags. Header tags, logical and physical styles, and normal text are all handled at the time they are read. The appropriate style flags are set, and the particular styles are applied to the text read in, which is then passed to the application. Any list tags read are simulated with tabs in ClarisWorks since it has no built-in functionality for displaying lists. If the tag read is a link to another document, it is displayed as a numbered footnote at the bottom of the page with blue link text in the body above. If the text read is a reference to an inlined image, an attempt to find the image is made. Upon a successful search, the image is inserted into the document being displayed in ClarisWorks. If the image file is not found, the link text is displayed in red. The same is true for all tags read that are not understood by the translator. After importation, what the user sees is text with possibly different styles, footnotes denoting links if any exist, and red text displaying any information read which was not understood by the translator, such as bad HTML code or unsupported tags.
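The import direction can be sketched with the HTMLParser class from Python's standard library, which is used here only as a convenient tokenizer and is not what the translator itself uses. The class name, the style names, and the choice of recognized tags are assumptions; lists and inlined images are omitted to keep the sketch short.

```python
from html.parser import HTMLParser

# Tags this sketch understands, mapped to hypothetical style names.
STYLE_TAGS = {"b": "bold", "i": "italic", "u": "underline"}
STYLE_TAGS.update(("h%d" % n, "header %d" % n) for n in range(1, 7))


class ImportSketch(HTMLParser):
    """Collect styled runs, link footnotes, and red runs for unknown tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.active = set()   # styles currently in effect
        self.runs = []        # (text, tuple of style names) pairs
        self.footnotes = []   # link URLs in order of appearance

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag in STYLE_TAGS:
            self.active.add(STYLE_TAGS[tag])
        elif tag == "a" and href:
            self.footnotes.append(href)  # becomes a numbered footnote
            self.active.add("link")      # link text is shown in blue
        else:
            # Not understood: keep the raw tag text and mark it red.
            self.runs.append((self.get_starttag_text(), ("red",)))

    def handle_endtag(self, tag):
        if tag in STYLE_TAGS:
            self.active.discard(STYLE_TAGS[tag])
        elif tag == "a":
            self.active.discard("link")
        else:
            # HTMLParser reports tag names in lower case.
            self.runs.append(("</%s>" % tag, ("red",)))

    def handle_data(self, data):
        self.runs.append((data, tuple(sorted(self.active))))


parser = ImportSketch()
parser.feed('Plain <B>bold</B> <A HREF="http://example.org/">link</A> <BLINK>x</BLINK>')
parser.close()
```

After parsing, parser.runs holds the text with its styles, parser.footnotes holds the link targets, and the unrecognized BLINK tags appear as runs marked red, mirroring what the user would see after importation.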


Use and effectiveness


The XTND HTML Translator was developed to make composition of documents as easy as possible; however, some basic knowledge is required. At least a working knowledge of the Macintosh operating system and file system structure is imperative. It is also assumed that the user is familiar with ClarisWorks for Macintosh, Claris stationery, use of macros in ClarisWorks, and ClarisWorks tools and menus. It is also important to understand how the Standard File dialogs for saving and opening documents operate. These are reasonable requirements. The high school students with whom we conduct research have all been introduced to these concepts through classroom instruction and have been able to use the HTML translator successfully.

Building knowledge

The way the translator functions is very conducive to gradually learning more about HTML documents and how to compose them. Initially, it is easy to create flat documents, that is, documents with text styles and images, but no links to other documents or resources. Without much instruction about URLs and Web navigation, users can become familiar with links and start including them in their own compositions by using the provided shortcut macro in ClarisWorks. Eventually, the ability to include any HTML text will enable users to go beyond the basics and incorporate other resources by specifying them in literal HTML text, which is written directly to the output file without translation. This is helpful in exploring new resources supported by HTML while still being able to benefit from the simplicity of composing in a familiar, user-friendly environment.

Future directions


The XTND HTML Translator is currently being further developed to add even more flexibility in composing HTML documents. One such development is support for QuickTime movies. ClarisWorks does allow for inclusion of QuickTime movies in documents; however, this is not currently supported by the translator. Future versions will accommodate movies.
As mentioned earlier, Level 1 HTML specifications [6] were used to model the current version of the translator. Since Level 2 HTML documents are proliferating and support is being added for Level 2 functionality in HTML browsers such as NCSA Mosaic, future versions of the XTND HTML Translator will also handle Level 2 compliant documents. As Level 3 HTML or HTML+ is further developed, it will be considered for support in future versions of the translator as well.
We will continue testing use of the translator in the educational environment to find ways to further increase its usefulness and ease of use. As possible improvements in the translator and the included HTML Document stationery are recognized, such changes will be implemented in subsequent releases.

Conclusion


Using the XTND HTML Translator greatly decreases the amount of time required to compose HTML documents while also decreasing the level of knowledge necessary to begin. It is a valuable tool for users at all levels of HTML knowledge for increasing document composition efficiency, while still allowing as much flexibility in manipulating text and images as ClarisWorks allows, and also providing a means for support of future developments in the HTML standard. It also simplifies the task of progressing through the different aspects of Web publishing. It is possible to publish documents on the Web with little or no knowledge of HTML, thus enabling anyone with access to a Macintosh, ClarisWorks, and the XTND HTML Translator to create HTML documents. The translator allows more people to participate in the growth of the Web and to move away from being merely observers.

Acknowledgements


Recognition of the need for the translator and its application in educational curricula is due entirely to Jeff Spitulnik, School of Education, University of Michigan, and Professor Elliot Soloway, Professor of Electrical Engineering and Computer Science, Professor of Education, University of Michigan. Sean DeMonner, School of Education, University of Michigan, collected data and instructed a group of high school students about how to use Mosaic, what the World-Wide Web is, and how to use the XTND HTML Translator to compose documents to be viewed in Mosaic. He is also currently using the translator as a key tool in forming a model for bringing schools up as Web sites. Mike Farr, formerly of Claris Corporation, now with Helios, Inc., gave us help in resolving several issues specific to ClarisWorks during the development process. Kathy Brade, University of Michigan, supplied information on Macintosh Toolbox routines. The entire HI-C Group gave us hardware and software support during development of the translator. Special thanks to the Community High School students who tested the translator.

References


[1] Software Development Group, National Center for Supercomputing Applications, NCSA Mosaic for Macintosh User's Guide Version 1.0, University of Illinois at Urbana-Champaign, Champaign, IL, 1994.

[2] National Center for Supercomputing Applications, A Beginner's Guide to HTML, University of Illinois at Urbana-Champaign, 1994.
URL: http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html

[3] Developer Technical Publications, XTND Programmer's Guide For XTND 1.3 version 1.3, Apple Computer, Inc., Cupertino, CA, 1991.

[4] Claris Corporation, ClarisWorks for Macintosh Handbook, Claris Corporation, Santa Clara, CA, 1993.

[5] Apple Computer, Inc., Macintosh Human Interface Guidelines (Apple Technical Library), Addison-Wesley, New York, 1992.

[6] Berners-Lee, Tim and Connolly, Dan, A specification in hypertext, CERN, Geneva, Switzerland, 1994.
URL: http://info.cern.ch/hypertext/WWW/MarkUp/HTML.html

Authors


Jonathan Ryan Day is a senior undergraduate Computer Science student at the University of Michigan. He has worked with the University of Michigan Libraries Recon group maintaining data integrity in the on-line catalog searching system. He has also worked with Learning Center, Ltd, a computer networking and service company, developing in-house applications. Recently, he has been working with the Highly Interactive Computing group at the University of Michigan on several research projects.
Brian Sullivan is a senior undergraduate Computer Engineering student at the University of Michigan. He has worked with Chase Manhattan Bank as a systems engineer, at Northern Telecom as a system administrator, and at the Computer Aided Engineering Network (University of Michigan) as a network specialist. Recently, he has been working with the Highly Interactive Computing group at the University of Michigan on several research projects.
Jeff Spitulnik is a doctoral student in the School of Education, University of Michigan. He has been one of the primary links in the project involving the University of Michigan School of Education, the Highly Interactive Computing group, and Community High School, Ann Arbor, MI.
Elliot Soloway is a Professor in the Department of Electrical Engineering and Computer Science, College of Engineering, University of Michigan. He is also a Professor in the School of Education. Previously he was an Associate Professor at Yale University, Department of Computer Science. His area of research is currently interactive learning environments. In particular, he is concerned with how learners of all ages will routinely manipulate computational media in pursuit of their learning and workplace goals. Soloway's research group has produced MediaText, a multimedia composition environment, that is available commercially and is being used in hundreds of classrooms around the country. Soloway is Editor-in-Chief of the Interactive Learning Environments journal, published by Ablex, Inc. He was a Keynote Speaker at the ACM Conference on Computer-Human Interfaces in 1989, and has given numerous tutorials and presentations at other CHI conferences.

_________________
This project was supported, in part, by a grant from the National Science Foundation
[RED 9353481].

_________________
Contact: Jonathan Ryan Day, jrday@eecs.umich.edu