www.goyoyo.com, Yahoo search.yahoo.co.jp and AnySearch www.anysearch.com.
Usually, users are expected to find their own way to enter CJK
characters into the text field of an HTML form when searching for
keywords. This poses no problem for users on a native CJK platform, but
users on English or other non-CJK locale platforms must install
third-party applications to input CJK text. Many such applications,
usually running as keyboard managers, are available for Windows, but
few exist for the Macintosh. Moreover, third-party applications for
Windows and Macintosh are usually either commercial software or trial
versions that expire. Various IME servers have been developed for UNIX
systems, but they are not easy for novice users to set up. Users thus
face considerable inconvenience just to input a few characters for a
search.
To assist users on non-CJK locale platforms, we developed a Java applet,
jInput, which allows the user to input CJK text without installing any
third-party application. Websites adopting this applet do not need to
give users elaborate instructions on installing third-party applications
in order to input CJK text.
The advantage of this approach is that the user does not need to install
any keyboard manager. The user simply waits for the applet to download
and enters the text into it. Before the form is submitted, JavaScript
calls a public method in the Java applet to retrieve the text content.
Netscape LiveConnect [6] technology allows JavaScript to call methods in
Java classes. In this way, the applet works seamlessly with the HTML form
as if it were a plain text field. Both Netscape 3.0/4.0 and Internet
Explorer 4.0 currently support such JavaScript-to-Java communication.
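For the LiveConnect call described above to work, the applet only needs to expose a public method that returns its text. The Java sketch below is a hypothetical stand-in: the class name JInputBuffer and the methods commit, backspace and getText are assumptions, not jInput's actual API. It is written as a plain class so that it runs without a browser, whereas the real applet would be a public subclass of java.applet.Applet. The comment shows how a form's submit handler might copy the text into a form field, assuming the applet is named jinput and the field is named q.

```java
// Hypothetical stand-in for the jInput applet's text state. The real applet
// would extend java.applet.Applet; a plain class keeps the sketch runnable.
//
// JavaScript side via LiveConnect, assuming <applet name="jinput"> and a
// form field named "q":
//   <form onsubmit="this.q.value = document.applets['jinput'].getText();">
class JInputBuffer {
    // Text already committed by the input method, held as Unicode.
    private final StringBuilder committed = new StringBuilder();

    // Called when the user selects a candidate character or string.
    void commit(String chars) {
        committed.append(chars);
    }

    // Removes the last committed UTF-16 char (e.g. on Backspace).
    void backspace() {
        if (committed.length() > 0) {
            committed.deleteCharAt(committed.length() - 1);
        }
    }

    // Public so that JavaScript can call it through LiveConnect
    // just before the HTML form is submitted.
    public String getText() {
        return committed.toString();
    }
}
```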
The applet currently supports the following language encodings and input
methods:
- Chinese GB2312 with PinYin and CangJie methods.
- Chinese Big5 with PinYin, CangJie and Simplex methods.
- Japanese EUC-JP with RomanKana and TCode methods.
- Korean KSC with Hangul and Hanja methods.
For a demo of the jInput applet, see
http://www.irdu.nus.sg/multilingual/jinput/. A screen shot of the
applet is shown in Fig. 1.
Fig. 1. Screen shot of jInput applet.
5.2.2. JIMEPlug Netscape Composer plugin
Netscape Composer 4.0 allows developers to write plug-ins (in Java) [7] to
extend the functionality of the HTML editor. Composer currently lets
users view multilingual text in its editing window, provided that the
required fonts are installed and the appropriate settings are configured.
Unfortunately, it does not provide a localized keyboard for editing the
multilingual text being displayed; the host platform is expected to
provide the input method manager.
The applet described above has been ported to run as a Netscape Composer
plug-in named "JIMEPlug". This is especially useful for users on English
or other non-CJK locale platforms who need I18N capability. A user who
sets up Netscape Messenger to send HTML email can even type an email
message in CJK with the help of the plug-in.
As with any plug-in, installation simply involves downloading a ZIP file
to the plug-ins directory of the Communicator installation. The plug-in
is a self-contained unit with its own fonts and keyboard input methods,
supporting various Chinese, Japanese and Korean input methods and
encodings. Because Netscape Composer uses Unicode for its internal
representation of characters within an HTML document, authoring CJK
documents in Unicode is also possible.
A beta prototype of the plug-in is available at
http://www.irdu.nus.edu.sg/jime/jimeplug/.
6. Design and implementation (based on JDK 1.1)
6.1. Design issues
Our early prototypes described above were designed with ease of use and
compact file size in mind. We focused on making the Java bitmap fonts and
input method classes compact to minimize download time. However, this
approach has some inherent shortcomings. In particular, the input methods
and bitmap fonts are based on individual native encodings. This lets each
applet operate well and efficiently on its own, but the input methods and
bitmap fonts could not be combined into a single entity because they are
based on different encodings. In the next version of our work, we
therefore realigned our effort and made the following improvements.
- We made use of Java 1.1, which offers several advantages over 1.0.
The new event-handling model in JDK 1.1 is more flexible and eases the
path to turning our work into JavaBeans. Java 1.1 also makes it possible
to use host fonts to display Unicode characters. If the target platform
has an appropriate Unicode font installed, rendering multilingual text in
different sizes is much easier.
- All input method mapping definitions are now based on Unicode instead
of individual native encodings (e.g. GB, Big5, JIS). The characters
mapped from the user's keystrokes are all in Unicode, so when working with
multiple languages we no longer perform redundant conversions between
character sets, except when exporting the text in a particular native
encoding.
- The simple "table lookup array" implementation of the keyboard mapping
has been replaced with a more efficient and compact "tree"
implementation. On average, the input method mapping classes for CJK are
40 to 60 percent smaller with the "tree" implementation.
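A "tree" keyboard mapping of this kind is essentially a trie keyed on keystrokes. The sketch below is an assumption about how such a structure might look, not the JIME implementation: each node is one keystroke, a node may carry the Unicode candidates for the key sequence that leads to it, and sequences sharing a prefix (such as the PinYin syllables zhong and zhang) share nodes, so the prefix is stored only once. The class name KeyMapTree is hypothetical, and modern Java collections are used for brevity; a JDK 1.1 version would use Hashtable and Vector.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trie-based keyboard mapping: keystroke sequence -> Unicode candidates.
class KeyMapTree {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final List<String> candidates = new ArrayList<>();
    }

    private final Node root = new Node();

    // Adds one mapping, e.g. put("zhong", "\u4E2D") for PinYin "zhong".
    void put(String keys, String unicodeCandidate) {
        Node node = root;
        for (char k : keys.toCharArray()) {
            node = node.children.computeIfAbsent(k, c -> new Node());
        }
        node.candidates.add(unicodeCandidate);
    }

    // Candidates for a complete key sequence, in insertion order (empty if none).
    List<String> lookup(String keys) {
        Node node = find(keys);
        return node == null ? Collections.<String>emptyList()
                            : Collections.unmodifiableList(node.candidates);
    }

    // True if at least one mapping starts with these keys,
    // i.e. the user may keep typing.
    boolean isPrefix(String keys) {
        return find(keys) != null;
    }

    // Walks the tree one keystroke at a time; null if the path does not exist.
    private Node find(String keys) {
        Node node = root;
        for (char k : keys.toCharArray()) {
            node = node.children.get(k);
            if (node == null) {
                return null;
            }
        }
        return node;
    }
}
```

An over-the-spot input method could call isPrefix after each keystroke to decide whether to keep composing, and lookup to fill its candidate window.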
Although JDK 1.1 offers significant advantages over 1.0, it is deficient
in other ways, and we designed our JIME framework to address the
following shortcomings.
- A single consistent font interface and convenient font utilities for
multiple languages. Java 1.1 does not yet let you choose a font for a
given encoding, or find out which range of an encoding a font can render.
- Because different Java virtual machines may or may not ship with the
Sun packages of JDK 1.1, we wrote our own converter classes, instead of
relying on the sun.io.* classes, to convert between different character
sets and Unicode.
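A converter that does not depend on sun.io.* can be as simple as a lookup table from Unicode characters to native byte sequences. The sketch below is hypothetical (the class name and structure are assumptions, not JIME's converter classes): it converts Unicode to GB2312, passing ASCII through as single bytes and emitting each mapped character as a two-byte code. Only two table entries are filled in, for 中 (U+4E2D, GB2312 0xD6D0) and 文 (U+6587, GB2312 0xCEC4); a real converter would load the complete mapping table, and decoding would use the inverted table.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical table-driven Unicode -> GB2312 converter, independent of sun.io.*.
class UnicodeToGB2312 {
    // Unicode char -> 16-bit GB2312 code (lead byte in the high 8 bits).
    private final Map<Character, Integer> table = new HashMap<>();

    UnicodeToGB2312() {
        // Illustrative subset only; a real table has thousands of entries.
        table.put('\u4E2D', 0xD6D0); // U+4E2D "zhong"
        table.put('\u6587', 0xCEC4); // U+6587 "wen"
    }

    // ASCII passes through as one byte; mapped characters become two bytes;
    // anything unmapped is replaced by '?'.
    byte[] convert(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.write(c);
                continue;
            }
            Integer code = table.get(c);
            if (code == null) {
                out.write('?');
            } else {
                out.write(code >> 8);   // lead byte
                out.write(code & 0xFF); // trail byte
            }
        }
        return out.toByteArray();
    }
}
```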
In addition, the JIME design tries to overcome a limitation of JDK 1.2,
whose initial input method support covers only input methods provided by
the host platform. JIME provides input methods and keyboard input for
languages other than US English, regardless of the host platform the Java
application is running on. For instance, a Java application using JIME
still gets Japanese input methods even when it runs on a Chinese locale
host platform.
JIME consists of five packages:
- jime.font: contains typeface implementations that use both the Java
host system fonts and the bitmap fonts we designed for JDK 1.0 (compiled
as Java classes), and provides one consistent interface to all kinds of
typefaces.
- jime.fontlib: holds all the glyphs of the bitmap fonts.
- jime.ime: deals with keyboard mappings and input methods. Input methods
are classified into two classes: direct input and over-the-spot input.
Direct input covers keyboards such as Thai and most Western European
languages. Over-the-spot input covers Chinese, Japanese and Korean
keyboard input methods, which require a pop-up window from which the user
selects characters.
- jime.imelib: holds the mapping tables of the various input methods.
- jime.widget: as the name implies, contains the components needed to
draw strings and text, as well as layout controllers that arrange
components in a clean and flexible way. It also provides auxiliary
widgets such as buttons, pull-down menus and over-the-spot windows.
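The two classes of input method can be sketched behind one interface. In this hypothetical sketch (InputMethod, DirectInput and OverTheSpotInput are illustrative names, not JIME's classes), direct input maps each keystroke to a character immediately, as on a French AZERTY layout where the unshifted 2 key yields é, while over-the-spot input accumulates keystrokes into a composition string and commits only when the user picks a candidate from the pop-up window, here by typing a digit. The dictionary is passed in as a plain map; in JIME the mapping tables live in jime.imelib.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface shared by both classes of input method.
interface InputMethod {
    // Handles one keystroke; returns the text to commit, or "" if the
    // keystroke was absorbed into a composition.
    String keyTyped(char key);
}

// Direct input: each keystroke maps straight to one character; keys
// without a mapping pass through unchanged.
class DirectInput implements InputMethod {
    private final Map<Character, Character> layout = new HashMap<>();

    DirectInput(Map<Character, Character> layout) {
        this.layout.putAll(layout);
    }

    @Override
    public String keyTyped(char key) {
        return String.valueOf(layout.getOrDefault(key, key));
    }
}

// Over-the-spot input: keystrokes build a composition string, and a digit
// selects one of the candidates shown in the pop-up window.
class OverTheSpotInput implements InputMethod {
    private final Map<String, List<String>> dictionary;
    private final StringBuilder composition = new StringBuilder();

    OverTheSpotInput(Map<String, List<String>> dictionary) {
        this.dictionary = dictionary;
    }

    // Candidates the pop-up window would display for the current composition.
    List<String> candidates() {
        return dictionary.getOrDefault(composition.toString(),
                                       Collections.<String>emptyList());
    }

    @Override
    public String keyTyped(char key) {
        if (key >= '1' && key <= '9') {
            int index = key - '1'; // '1' selects the first candidate
            List<String> shown = candidates();
            if (index < shown.size()) {
                composition.setLength(0);
                return shown.get(index);
            }
            return ""; // no such candidate: ignore the digit
        }
        composition.append(key);
        return "";
    }
}
```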
The JIME architecture focuses on enabling input method support in JDK 1.0
and 1.1. The jime.widget components are written to make use of the
jime.ime and jime.font libraries.
6.2. Implementation
The Java applet and the Netscape Composer plug-in have been redeployed
using JIME, based on Java 1.1 code. In addition, to further illustrate
JIME's flexible multilingual framework, we implemented a multilingual
text editor, JIMEWord. Its basic multilingual features include:
- saving and loading of files in Unicode UTF-8 or UTF-7 encoding, since
Unicode is used for internal representation and processing. Saving and
loading in other native encodings is also supported through code
conversion routines between Unicode and the target encoding.
- display and input method support for Chinese, Japanese, Korean, Thai,
French, German and many other languages.
- a user-friendly graphical keyboard for ease of typing. This helps when,
for example, a user with a US-English keyboard device wishes to input
French: he or she can click the keys of the graphical keyboard with the
mouse to type French.
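Because the editor keeps text in Unicode internally, saving and loading reduce to choosing an encoding at the I/O boundary. The sketch below (DocumentIO is a hypothetical name, not a JIMEWord class) uses the standard java.io reader and writer classes with a java.nio.charset.Charset, defaulting to UTF-8; a native encoding such as Big5 can be passed via Charset.forName. UTF-7 is not shown. The try-with-resources and UncheckedIOException constructs require Java 7/8, not JDK 1.1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical save/load helper: Unicode in memory, encoding chosen at I/O time.
final class DocumentIO {
    // Assumed default file encoding for saved documents.
    static final Charset DEFAULT_ENCODING = StandardCharsets.UTF_8;

    private DocumentIO() {}

    // Encodes the Unicode text with the given charset and writes it out.
    // Closing the writer flushes the encoder and closes the stream.
    static void save(String text, OutputStream out, Charset encoding) {
        try (Writer w = new OutputStreamWriter(out, encoding)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes bytes in the given charset back into Unicode text.
    static String load(InputStream in, Charset encoding) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, encoding)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```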
A screen shot of JIMEWord with the floating graphical keyboard window
displaying the Thai keyboard mapping is shown in Fig. 2 below.
Fig. 2. Screen shot of JIMEWord with the floating graphical
keyboard.
7. Problems and limitations
Because of the complex nature of internationalization, it is not easy to
arrive at a perfect design. JIME is a good attempt: it strives to meet
its objectives and has an extensible structure. However, it inevitably
has some limitations. Currently, JIME does not handle bi-directional
text, but its API leaves room for bi-directional horizontal text layout
and editing, because layout is left to the individual StringView objects.
No major changes are required beyond implementing a bidi StringView type
and adding it to the BlockView.
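The extension point described above can be illustrated with a small model. The real JIME StringView and BlockView classes are not shown in this paper, so the interface below is an assumption: BlockView depends only on a StringView interface, so a right-to-left run is supported by writing a new implementation, with no change to BlockView. The reversal in RtlStringView is a naive placeholder for illustration and is not the Unicode Bidirectional Algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: a run of text that knows how to order itself for display.
interface StringView {
    // Characters in the order they are painted, left to right.
    String visualText();
}

// Left-to-right run: visual order equals logical order.
class LtrStringView implements StringView {
    private final String logical;
    LtrStringView(String logical) { this.logical = logical; }
    @Override public String visualText() { return logical; }
}

// Right-to-left run added later; naive reversal, not full bidi.
class RtlStringView implements StringView {
    private final String logical;
    RtlStringView(String logical) { this.logical = logical; }
    @Override public String visualText() {
        return new StringBuilder(logical).reverse().toString();
    }
}

// Lays out runs in sequence; unaware of which StringView types exist.
class BlockView {
    private final List<StringView> runs = new ArrayList<>();
    void add(StringView run) { runs.add(run); }
    String visualLine() {
        StringBuilder sb = new StringBuilder();
        for (StringView run : runs) {
            sb.append(run.visualText());
        }
        return sb.toString();
    }
}
```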
The classes in the jime.widget package do not aim to rival the Java2D API
and the advanced text layout features of JDK 1.2 [9]. The JIME framework
is focused on providing full input method support to JDK 1.0 and JDK 1.1
applications, given the unique features of the jime.imelib and
jime.fontlib packages.
8. Ongoing/future developments
Java support in browsers is not consistent. Older browsers such as
Netscape 3.0 and Internet Explorer 3.0 support only Java 1.0. Netscape
4.0 with a JDK patch supports Java 1.1, while Internet Explorer 4.0 has
many proprietary extensions and modifications to its Java
implementation. Because of this inconsistency, the newer features of our
JDK 1.1 based work cannot be shown on older browsers. Some wrapper code
is required to work around this and provide backward compatibility.
We plan to make a JDK 1.0 applet that runs on all Java-enabled browsers
(whether they contain a 1.0 or a 1.1 Java VM). If the browser is JDK 1.1
enabled, the host system has the appropriate native fonts installed, and
Netscape is configured to use them, the applet will try to use these
fonts. If either the native fonts are missing or a JDK 1.1 VM is not
present, the applet will fall back to the bitmap font classes of our
jime.fontlib package. The wrapper applet should dynamically load the
correct Java code base depending on these conditions.
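The wrapper's decision can be sketched as follows. This is an assumption about one possible approach, not the planned JIME code: a JDK 1.1 VM is detected by probing for a class that first appeared in 1.1 (java.awt.event.KeyEvent, part of the 1.1 event model), and that result, together with whether a native font was found, selects which font implementation to load. The class names jime.font.HostTypeface and jime.font.BitmapTypeface are hypothetical. The sketch uses modern syntax (multi-catch); an actual wrapper applet would have to compile for a 1.0 VM.

```java
// Hypothetical selector for the wrapper applet's font fallback.
final class FontSourceSelector {
    static final String HOST_FONT_IMPL = "jime.font.HostTypeface";     // hypothetical
    static final String BITMAP_FONT_IMPL = "jime.font.BitmapTypeface"; // hypothetical

    private FontSourceSelector() {}

    // True if the named class can be loaded. initialize=false means the
    // class's static initializers do not run, so probing has no side effects.
    static boolean classExists(String name) {
        try {
            Class.forName(name, false, FontSourceSelector.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Host fonts only when a 1.1 VM is present AND a native font was found;
    // otherwise fall back to the bitmap font classes.
    static String chooseFontImpl(boolean jdk11Present, boolean nativeFontInstalled) {
        return (jdk11Present && nativeFontInstalled) ? HOST_FONT_IMPL : BITMAP_FONT_IMPL;
    }
}
```

A wrapper would then pass the result of chooseFontImpl(classExists("java.awt.event.KeyEvent"), nativeFontFound) to Class.forName and instantiate it, where nativeFontFound is a hypothetical flag from a font check; the 1.1-only classes are thus never referenced directly on a 1.0 VM.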
To make the JIME code base and framework reusable, we are porting it to
JavaBeans. With JavaBeans, software developers can easily reuse JIME
components and build native keyboard input methods into their Java
1.0/1.1 applications, regardless of the locale of the host platform they
will run on.
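As an illustration of what a JavaBeans port might expose, the sketch below shows a component with bound properties, using java.beans.PropertyChangeSupport (available since JDK 1.1). The class name MultilingualTextBean and the properties inputMethod and text are assumptions, not JIME's actual API. Following the JavaBeans conventions, each property has get/set accessors and changes are reported to registered listeners, which lets builder tools and other beans react when, for example, the input method is switched from PinYin to CangJie. A real bean would be a public class in its own file.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical JIME component packaged as a JavaBean with bound properties.
class MultilingualTextBean {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private String inputMethod = "PinYin"; // assumed default input method
    private String text = "";

    public String getInputMethod() { return inputMethod; }

    // Fires a "inputMethod" change event unless the value is unchanged.
    public void setInputMethod(String newMethod) {
        String old = this.inputMethod;
        this.inputMethod = newMethod;
        changes.firePropertyChange("inputMethod", old, newMethod);
    }

    public String getText() { return text; }

    // Fires a "text" change event unless the value is unchanged.
    public void setText(String newText) {
        String old = this.text;
        this.text = newText;
        changes.firePropertyChange("text", old, newText);
    }

    public void addPropertyChangeListener(PropertyChangeListener l) {
        changes.addPropertyChangeListener(l);
    }

    public void removePropertyChangeListener(PropertyChangeListener l) {
        changes.removePropertyChangeListener(l);
    }
}
```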
To broaden JIME's support base, its extensibility has to be enhanced
further by adding more languages to its portfolio. We are extending the
framework to include more European languages, Indian languages (such as
Hindi and Tamil), and possibly even bi-directional scripts such as Arabic
and Hebrew.
Conclusion
The Java 1.2 input method framework is a step in the right direction.
Unfortunately, only input methods supported by the host platform's native
input method managers are available to Java applications. However, with
Java 1.2's support for the Java Foundation Classes (JFC), peer-based AWT
widgets are being complemented by peerless JFC components. Because JFC
widgets are lightweight, standalone components, they do not rely on the
functionality of the host platform's widgets. It is therefore expected
(according to the Java 1.2 input method framework documentation [8]) that
future releases of Java and JFC may provide full input method support
regardless of the host platform the Java application is running on.
In the meantime, JIME serves as a good transitional component for JDK
1.0 and 1.1 (or even 1.2) developers who need native input method support
in their Java applications, especially since Web browser support for the
latest Java VM does not keep pace with JavaSoft's JDK releases.
In conclusion, the Web is moving towards a more "World Wide" reach and so
is Java. With Java, we are close to realizing true internationalization of
cross-platform applications. Java Input Methods will make your localized
applications more complete.
Acknowledgments
We wish to thank the following people, who contributed to the coding and
Web page design in one way or another: Rose Boey, Chen Ling, Chen Yu,
Gong Min, Yak Shu Herng, Yin Jun, Wen Qiang, Zhu Xiao Peng (in
alphabetical order). Their effort has made this project possible.
References
[1] D. Raggett, HTML 3.2 Reference Specification, 14 Jan 1997,
http://www.w3.org/TR/REC-html32.html
[2] D. Raggett, A. Le Hors and I. Jacobs, HTML 4.0 Specification, W3C
Working Draft, 17 Sep 1997, http://www.w3.org/TR/WD-html40/
[3] F. Yergeau, G. Nicol, G. Adams and M. Duerst, RFC 2070,
Internationalization of the Hypertext Markup Language, Jan 1997,
ftp://ds.internic.net/rfc/rfc2070.txt
[4] <FONT FACE> considered harmful,
http://www.isoc.org:8080/web_ml/html/fontface.html
[5] JDK 1.1 Internationalization Specification, 4 Dec 1996,
http://java.sun.com/products/jdk/1.1/intl/html/intlspecTOC.doc.html
[6] Netscape LiveConnect,
http://home.netscape.com/eng/mozilla/3.0/handbook/javascript/livecon.htm
[7] Netscape Composer Plug-in Guide,
http://developer.netscape.com/library/documentation/communicator/composer/plugin/contents.htm
[8] JDK 1.2 Beta2 Documentation, Input Method Framework,
http://developer.javasoft.com
[9] IBM's Java Education, International Text in JDK 1.2,
http://www.ibm.com/java/education/international-text/
[10] Hanzi Bitmap Format (HBF), ftp://ftp.ifcss.org
Vitae
Leong Kok Yong
is the principal researcher in the I18N group of the
Internet R&D Unit (IRDU) of the National University of Singapore. He has
been working on multilingual development since 1995, with a focus on the
World Wide Web and Java.
kokyong@irdu.nus.edu.sg
[http://www.irdu.nus.edu.sg/~kokyong/]
Internet R&D Unit, National University of Singapore, 10 Kent Ridge
Crescent, Singapore 119260
Oliver P. Wu
was formerly attached to IRDU as a student researcher, working
on the very early design and implementation phase of JIME. He has since
joined the BioKleisli research group of the BioInformatics Centre (BIC) of
the National University of Singapore. He is currently a senior software
engineer at the Kent Ridge Digital Laboratories.
owu@bic.nus.edu.sg
[http://adenine.krdl.org.sg:8080/~owu/]
Research Unit, BioInformatics Centre, Institute of Systems Science /
Kent Ridge Digital Laboratories, 21, Heng Mui Keng Terrace, Singapore
119613
Liu Hai
is a student researcher working with the IRDU I18N group. He is
completing his undergraduate degree in Information Systems and Computer
Science at the National University of Singapore. After working on JIME,
he was attached to Netscape Communications Corp. for a three-month
summer internship starting in May 1997.
liuhai@irdu.nus.edu.sg
[http://www.irdu.nus.edu.sg/~liuhai/]
Internet R&D Unit, National University of Singapore, 10 Kent Ridge
Crescent, Singapore 119260