Philip Thrift
Member, Technical Staff
Texas Instruments
Dallas TX USA
In authoring HTML, one frequently needs to include
notations external to HTML. Notations such as LaTeX or
pic may be useful to embed in an HTML document.
Typically this process involves either:
Neither approach is particularly practical for an author who works primarily in HTML and only occasionally needs to escape to an external notation. In this paper a technique for directly embedding external notations in HTML, called preprocessing instructions (PPIs), is presented.
Advantages gained by including PPIs are:
Preprocessing instructions are proposed as a means of embedding non-HTML data and associated translation programs (called filters) in HTML documents. They are useful for encoding data objects not currently expressible in HTML (e.g. math, tables, and graphics) within the HTML document itself, facilitating authoring. In the implementation presented in this document they are syntactically analogous to server-side includes.
Server-side includes (SSIs) provide a way to embed deliver-time
information in HTML documents. They are supported by NCSA's httpd,
provided certain conventions are followed in the location and file type
of HTML documents containing SSIs. In a typical setup,
HTML files with embedded SSIs have a .shtml file extension,
to distinguish them from "pure" HTML files with a .html
file extension. According to the
documentation:
All directives to the server are formatted as SGML comments within the document. This is in case the document should ever find itself in the client's hands unparsed. Each directive has the following format:
<!--#command tag1="value1" tag2="value2"-->
SSIs provide a way of including deliver-time (or run-time) information within the delivered HTML document, such as the date or text sections that are periodically updated. There is perhaps some controversy about this approach with regard to server load, security, and certainly SGML correctness. These issues will not be addressed here.
While SSIs are a run-time processing feature, preprocessing instructions (PPIs) are proposed here as a compile-time processing feature. Syntactically, in this experimental implementation, they are similar to SSIs.
PPIs are used to embed data in HTML documents that is then passed
to designated filters that translate the data, the result
being incorporated in a new HTML document. HTML documents with
PPIs will have the file extension .phtml (pre-HTML)
and will be called PHTML files.
cphtml is a PHTML compiler with the following usage:
cphtml [-s] [doc.phtml]+
where one or more PHTML files appear on the command line,
with the result being for each file either a
corresponding doc.html file, or
a doc.shtml file, if the -s flag is present.
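The naming rule above can be stated as a small function. The following C++ sketch is illustrative only: the name outputName is hypothetical and not taken from phtml.c. It strips a trailing .phtml and appends .html, or .shtml when -s is given.

```cpp
#include <string>

// Hypothetical helper (not from phtml.c): map a PHTML input file name to
// the output name cphtml writes, doc.phtml -> doc.html, or doc.shtml
// when the -s flag is present.
std::string outputName(const std::string& input, bool sFlag) {
    const std::string ext = ".phtml";
    std::string root = input;
    // Strip the .phtml extension when present; otherwise keep the whole name.
    if (root.size() >= ext.size() &&
        root.compare(root.size() - ext.size(), ext.size(), ext) == 0) {
        root.erase(root.size() - ext.size());
    }
    return root + (sFlag ? ".shtml" : ".html");
}
```

The .shtml variant matters when the filters themselves emit SSIs that httpd must still process at delivery time.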
The format for embedding PPIs within PHTML files is:
<!--%filter cdata-->
where filter is the program that will process
the character data in cdata.
Note: filter cannot contain a space character, and cdata cannot contain the close comment delimiter (--). There must be at least one space character between filter and cdata.
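The rules in the note above are enough to split a comment into its two parts. This C++ sketch is an assumption, not the actual phtml.c code. The hypothetical parsePPI receives the text between <!-- and --> of one comment. It treats any whitespace as the separator, not only a space, because the examples below put cdata on the line after the filter name. It also drops that separating whitespace; whether cphtml passes it on to the filter is not stated.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// One PPI split into its parts.
struct PPI {
    std::string filter;  // program that processes the data
    std::string cdata;   // text handed to it on standard input
};

// Parse the body of one comment (everything between "<!--" and "-->").
// Returns nothing for ordinary comments, SSI "#" directives, or PPIs
// that break the rules: no filter name, no separator, or "--" in cdata.
std::optional<PPI> parsePPI(const std::string& body) {
    const char* ws = " \t\r\n";  // assumption: any whitespace separates
    if (body.empty() || body[0] != '%') return std::nullopt;
    std::size_t end = body.find_first_of(ws, 1);
    if (end == std::string::npos || end == 1) return std::nullopt;
    PPI p;
    p.filter = body.substr(1, end - 1);  // filter name has no spaces
    std::size_t start = body.find_first_not_of(ws, end);
    p.cdata = (start == std::string::npos) ? std::string() : body.substr(start);
    if (p.cdata.find("--") != std::string::npos) return std::nullopt;
    return p;
}
```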
When cphtml is executed, it finds each PPI
within the doc.phtml file. The designated
filter program is executed with two additional
environment variables:
$PPIDOC=doc
$PPINUM=ppi#
where doc is the file name root (doc
in the above format description) and
ppi# is the count of the PPI in the file
(starting at 1). The cdata is passed to the program
on the standard input (EOF is reached at
the close comment delimiter --, which cannot appear
in cdata). Whatever filter writes on
standard output is inserted in the output HTML file.
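The filter protocol just described can be sketched on a POSIX system as follows. The function runFilter is hypothetical and only approximates what cphtml does. It writes cdata to a temporary file and sets PPIDOC and PPINUM. It then runs the filter through popen with standard input redirected from that file, so the filter sees EOF where the PPI ends, and returns everything the filter prints.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <unistd.h>

// Hypothetical sketch of running one PPI filter on a POSIX system:
// cdata becomes the filter's standard input, PPIDOC and PPINUM are
// set in its environment, and its standard output is returned.
std::string runFilter(const std::string& filter, const std::string& cdata,
                      const std::string& doc, int ppiNum) {
    // Save cdata to a temporary file so it can be redirected to stdin.
    char path[] = "/tmp/ppiXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) throw std::runtime_error("mkstemp failed");
    std::size_t done = 0;
    while (done < cdata.size()) {
        ssize_t w = write(fd, cdata.data() + done, cdata.size() - done);
        if (w < 0) {
            close(fd);
            unlink(path);
            throw std::runtime_error("write failed");
        }
        done += static_cast<std::size_t>(w);
    }
    close(fd);

    // The shell started by popen inherits these variables.
    setenv("PPIDOC", doc.c_str(), 1);
    setenv("PPINUM", std::to_string(ppiNum).c_str(), 1);

    // By the PPI rules filter contains no spaces, so it is used directly
    // as the command name.
    std::string cmd = filter + " < " + path;
    FILE* out = popen(cmd.c_str(), "r");
    if (out == nullptr) {
        unlink(path);
        throw std::runtime_error("popen failed");
    }
    std::string result;
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, out)) > 0) result.append(buf, n);
    pclose(out);
    unlink(path);
    return result;
}
```

Setting the variables with setenv changes the compiler's own environment. That is harmless in a single-threaded tool; a fork/exec version that passes an explicit environment would avoid it.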
A typical use of PPIs is to be able to embed external notations in HTML documents, and to designate a filter to convert the external notation into HTML. For example:
<!--%TEX \Large{\(E = mc^2\)}-->
where TEX produces a transparent gif image
named doc-ppi#.gif (doc and
ppi# being the values of the environment variables passed to
TEX) from the given
data and writes on standard output, for example,
<IMG src="doc-ppi#.gif" ALT="doc-ppi#">
which a viewer displays as the typeset equation.
The TEX filter can be used to produce inlined transparent
gifs for tables and mathematical entities. Here is a shell script for
the TEX filter:
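#!/bin/sh
# TEX filter: typeset the LaTeX fragment read on standard input
# and write an IMG tag for the resulting transparent GIF.
# cphtml sets PPIDOC (file name root) and PPINUM (PPI count).
BASE="$PPIDOC-$PPINUM"

# Wrap the fragment in a minimal one-page LaTeX document.
# Quoting the here-document delimiters keeps the shell from
# interpreting anything in the LaTeX text.
cat > "$BASE.tex" << 'HEAD'
\documentstyle[12pt]{article}
\unitlength 0.2in
\thispagestyle{empty}
\begin{document}
\noindent
HEAD
cat >> "$BASE.tex"        # the PPI's cdata, up to end of input
cat >> "$BASE.tex" << 'FOOT'
\end{document}
FOOT

# LaTeX -> DVI -> PostScript -> GIF; giftrans then makes
# color index 1 transparent and sets the background to index 0.
latex "$BASE.tex" > .latex-errors
dvips -o "$BASE.ps" "$BASE.dvi"
pstogif "$BASE.ps" "$BASE.GIF" > .pstogif-errors
giftrans -t 1 -b 0 "$BASE.GIF" > "$BASE.gif"

# Remove the intermediate files.
rm "$BASE.tex" "$BASE.dvi" "$BASE.ps" \
   "$BASE.aux" "$BASE.log" "$BASE.GIF"

# Whatever is written on standard output replaces the PPI.
printf '<IMG SRC="%s.gif" ALT="%s">' "$BASE" "$BASE"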
Here is an example from a PHTML file where some embedded math appears:
<P>Euler's equation looks like:
<P ALIGN=center><A NAME=euler>
<!--%TEX
\Large{\[ e^{i\pi} + 1 = 0 \]}
--></A>
<P>where <!--%TEX \(e\)--> is the natural
logarithm base
and <!--%TEX \(i = \sqrt{-1}\)-->. In larger type, this
looks like <!--%TEX \large{\(i = \sqrt{-1}\)}-->.
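This is then translated by cphtml into HTML that a viewer renders as:
Euler's equation looks like: [image of e^{i\pi} + 1 = 0]
where [image of e] is the natural logarithm base and
[image of i = \sqrt{-1}]. In larger type, this looks like
[larger image of i = \sqrt{-1}].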
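The translation just shown can be sketched as a single pass over the document. In this hypothetical C++ compilePHTML, a caller-supplied callback produces the replacement for each <!--%filter cdata-->. In a real compiler that callback would run the named program as described earlier. PPIs are numbered from 1 in order of appearance, and every other comment, SSIs included, is copied through unchanged. This is a sketch under those assumptions, not the phtml.c source.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Callback that turns one PPI into HTML: (filter, cdata, doc, ppiNum).
// In a real compiler it would run the named program; here it is a parameter.
using FilterFn = std::function<std::string(const std::string&, const std::string&,
                                           const std::string&, int)>;

// Hypothetical sketch of cphtml's translation pass: copy src, replacing each
// "<!--%filter cdata-->" with run(...)'s output. PPIs are numbered from 1 in
// order of appearance; ordinary comments and SSIs ("<!--#...") are untouched.
std::string compilePHTML(const std::string& src, const std::string& doc,
                         const FilterFn& run) {
    const std::string openTag = "<!--%";
    const std::string closeTag = "-->";
    const char* ws = " \t\r\n";
    std::string out;
    std::size_t pos = 0;
    int num = 0;
    for (;;) {
        std::size_t start = src.find(openTag, pos);
        if (start == std::string::npos) break;
        // cdata may not contain "--", so the first "-->" ends the PPI.
        std::size_t stop = src.find(closeTag, start + openTag.size());
        if (stop == std::string::npos) break;  // unterminated: copied below
        std::size_t end = stop + closeTag.size();
        std::string body =
            src.substr(start + openTag.size(), stop - start - openTag.size());
        std::size_t sep = body.find_first_of(ws);
        if (sep == std::string::npos || sep == 0) {
            out.append(src, pos, end - pos);  // malformed PPI: keep as written
        } else {
            out.append(src, pos, start - pos);  // text before the PPI
            std::size_t c = body.find_first_not_of(ws, sep);
            std::string cdata = (c == std::string::npos) ? "" : body.substr(c);
            out += run(body.substr(0, sep), cdata, doc, ++num);
        }
        pos = end;
    }
    out.append(src, pos, std::string::npos);  // remainder of the document
    return out;
}
```

Suppose the callback returns <IMG SRC="doc-N.gif" ALT="doc-N"> for the Nth PPI and the Euler example holds the only PPIs in doc.phtml. The example would then come out with four IMG tags, doc-1 through doc-4.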
An example of embedding a table in HTML (whenever HTML 2.0 is released, tables should be supported natively):
<P ALIGN=center><A NAME=table1>
<!--%TEX
\begin{tabular}{|l|l|r|} \hline\hline
{\em type} &
\multicolumn{2}{c|}{\em style} \\ \hline\hline
smart & red & short \\
rather silly & blue & tall \\ \hline\hline
\end{tabular}
--></A>
which becomes a
rendered table, inlined as an image.
Here is an example of a picture environment:
<!--%TEX
% unitlength default is 0.2in
\begin{picture}(5,5)(0,0)
\put(2,2){\circle{4}}
\put(2,2){\vector(1,1){1}}
\end{picture}
-->
which becomes an inlined image of a circle with an arrow drawn from its center.
PPIs can also be used for picture description languages such as pic. A PIC filter script (similar to TEX) can turn
<!--%PIC
circle rad .25
spline right 1 then down .5 left 1 then right 1
circle same
-->
into an inlined image of the drawing.
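Because a filter can be any program, PPIs can also insert information available at compile time. With ORGANIZATION set in the environment, cphtml turns
I work at <!--%/bin/sh echo -n $ORGANIZATION-->.
into
I work at Texas Instruments.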
They can also be used to embed expressions from other SGML DTDs
(along with an associated filter program to produce HTML) and
to implement macros. For example,
<!--%FWBK A.html B.html-->
could produce
<A href="A.html"><IMG src="/www2/fwd.gif"></A> <A href="B.html"><IMG src="/www2/back.gif"></A>
which can be used at the bottom of pages for forward and backward links.
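A macro filter like FWBK needs no external notation at all. Below is a hypothetical C++ core for it: fwbk reads the two URLs from the cdata and produces the links shown above, reusing the /www2 icon paths from that example. A complete filter would wrap it in a main that reads standard input to EOF and writes the result on standard output.

```cpp
#include <sstream>
#include <string>

// Hypothetical FWBK macro filter core: cdata holds "forward-URL backward-URL"
// separated by whitespace; the result is the pair of navigation links.
std::string fwbk(const std::string& cdata) {
    std::istringstream in(cdata);
    std::string fwd, back;
    in >> fwd >> back;  // whitespace-separated, so newlines are fine too
    return "<A href=\"" + fwd + "\"><IMG src=\"/www2/fwd.gif\"></A> "
           "<A href=\"" + back + "\"><IMG src=\"/www2/back.gif\"></A>";
}
```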
<PP FILTER=filter>cdata</PP>
The following can be viewed in the on-line version of this
paper at <http://www.ncsa.uiuc.edu/SDG/IT94/IT94Info.html>.
phtml.c: the compiler written
in C++ (use CC phtml.c -o phtml to compile)
TEX and
PIC: example filters
pstoppm.ps,
pstogif and
giftrans.c: needed by the
example filters (also needed are gs, latex,
dvips, ppmtogif, and pnmcrop,
available from various public domain sources).
LaTeX and pic
can easily be embedded in HTML files.
Note: in contrast to latex2html, PPIs support "native" HTML development, allowing an escape to external notations only when necessary. cphtml makes it easy for HTML authors to incorporate complex mathematical expressions, a forte of LaTeX.
Dr. Philip Thrift received the Ph.D. in Applied Mathematics in 1979 from Brown University and joined Texas Instruments in 1982. In various laboratories at TI, he has done research in image understanding, object-oriented and logic programming, machine learning, database mining and information systems. He has several publications in these areas and holds two patents. He is currently working on networked information system applications.
Contact: thrift@csc.ti.com