Philip Thrift
Member, Technical Staff
Texas Instruments
Dallas TX USA
In authoring HTML, one is frequently challenged to include notations external to HTML. Notations like TeX or pic may be useful to embed in an HTML document. Typically this process involves either:
Neither approach is particularly practical for an author who works primarily in HTML and only occasionally needs to escape to an external notation. This paper presents a technique for directly embedding external notations in HTML, called preprocessing instructions (PPIs).
Advantages gained by including PPIs are:
Preprocessing instructions are proposed as a means of embedding non-HTML data in HTML documents, together with the translation programs (called filters) that process that data. They are useful for encoding data objects not currently expressible in HTML (e.g., math, tables, and graphics) within the HTML document itself, which simplifies authoring. In the implementation presented here, they are syntactically analogous to server-side includes.
Server-side includes (SSIs) provide a way to embed delivery-time information in HTML documents. NCSA's httpd supports them, provided certain conventions are followed in the location and file type of HTML documents containing SSIs. In a typical setup, HTML files with embedded SSIs have a .shtml file extension, to distinguish them from `pure' HTML files with a .html file extension. According to the documentation:
All directives to the server are formatted as SGML comments within the document. This is in case the document should ever find itself in the client's hands unparsed. Each directive has the following format:
<!--#command tag1="value1" tag2="value2"-->
SSIs provide a way of including delivery-time (or run-time) information within the delivered HTML document, such as the current date or text sections that are periodically updated. There is perhaps some controversy about this approach regarding server load, security, and certainly SGML correctness. These issues will not be addressed here.
While SSIs are a run-time processing feature, preprocessing instructions (PPIs) are proposed here as a compile-time processing feature. Syntactically, in this experimental implementation, they are similar to SSIs.
PPIs embed data in HTML documents. The data is passed to designated filters that translate it, and the results are incorporated into a new HTML document. HTML documents with PPIs have the file extension .phtml (pre-HTML) and are called PHTML files.
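Incorporating filter results into a new HTML document amounts to splicing: copy the PHTML text and put each PPI's filter output in place of the PPI itself. The C++ sketch below assumes each PPI has already been located as a byte range [begin, end). The Expansion struct and the splice function are hypothetical names for illustration and are not taken from cphtml.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// A located PPI: the byte range [begin, end) it occupies in the PHTML
// text, and the HTML its filter produced. Hypothetical structure.
struct Expansion {
    std::size_t begin;
    std::size_t end;
    std::string html;
};

// Copy the document, replacing each PPI range with its filter output.
// Expansions must be sorted by position and must not overlap.
std::string splice(const std::string& phtml, const std::vector<Expansion>& xs) {
    std::string out;
    std::size_t pos = 0;
    for (const Expansion& x : xs) {
        if (x.begin < pos || x.end < x.begin || x.end > phtml.size())
            throw std::invalid_argument("bad expansion range");
        out.append(phtml, pos, x.begin - pos);  // text before the PPI
        out += x.html;                           // filter output in its place
        pos = x.end;
    }
    out.append(phtml, pos, std::string::npos);   // text after the last PPI
    return out;
}
```

Working from offsets rather than rewriting the string in place keeps every offset valid while the output is being built.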
cphtml is a PHTML compiler with the following usage:
cphtml [-s] [doc.phtml]+
where one or more PHTML files appear on the command line. For each file, the result is a corresponding doc.html file or, if the -s flag is present, a doc.shtml file.
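The naming rule above can be stated precisely as a small C++ helper. The name outputName is hypothetical. The paper specifies only the convention (doc.phtml becomes doc.html, or doc.shtml with -s), not how cphtml computes it.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Map a PHTML input name to the file cphtml writes: doc.phtml becomes
// doc.html, or doc.shtml when the -s flag is given.
std::string outputName(const std::string& input, bool ssiFlag) {
    const std::string ext = ".phtml";
    if (input.size() <= ext.size() ||
        input.compare(input.size() - ext.size(), ext.size(), ext) != 0)
        throw std::invalid_argument("not a .phtml file: " + input);
    // The root ("doc") is also the value later passed to filters as PPIDOC.
    std::string root = input.substr(0, input.size() - ext.size());
    return root + (ssiFlag ? ".shtml" : ".html");
}
```

For example, outputName("doc.phtml", true) yields "doc.shtml".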
The format for embedding PPIs within PHTML files is:
<!--%filter cdata-->
where filter is the program that will process the character data in cdata.
Note: filter cannot contain a space character, and cdata cannot contain the close comment delimiter (--). There must be at least one space character between filter and cdata.
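The rules above are enough to locate PPIs with a simple scan. Below is a minimal C++ sketch of such a scanner. The Ppi struct and findPpis are hypothetical names, not the author's phtml.c. The sketch assumes that a PPI ends at the first "-->" and that exactly one whitespace character separating filter from cdata is dropped. Malformed instructions, such as those with no filter name or no separating space, are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One preprocessing instruction found in a PHTML document.
struct Ppi {
    std::string filter;   // program name, e.g. "TEX" (contains no spaces)
    std::string cdata;    // text to pass to the filter on standard input
    std::size_t begin;    // offset of "<!--%" in the document
    std::size_t end;      // offset just past the closing "-->"
};

// Scan a PHTML document for <!--%filter cdata--> constructs.
std::vector<Ppi> findPpis(const std::string& text) {
    static const std::string open = "<!--%";
    static const std::string close = "-->";
    std::vector<Ppi> result;
    std::size_t pos = 0;
    while ((pos = text.find(open, pos)) != std::string::npos) {
        std::size_t nameStart = pos + open.size();
        std::size_t closePos = text.find(close, nameStart);
        if (closePos == std::string::npos) break;  // unterminated: stop
        std::size_t nameEnd = text.find_first_of(" \t\r\n", nameStart);
        if (nameEnd == std::string::npos || nameEnd >= closePos ||
            nameEnd == nameStart) {
            pos = closePos + close.size();          // malformed: skip it
            continue;
        }
        Ppi ppi;
        ppi.filter = text.substr(nameStart, nameEnd - nameStart);
        // Drop the single separating whitespace character.
        ppi.cdata = text.substr(nameEnd + 1, closePos - (nameEnd + 1));
        ppi.begin = pos;
        ppi.end = closePos + close.size();
        result.push_back(ppi);
        pos = ppi.end;
    }
    return result;
}
```

Because cdata may not contain "--", stopping at the first "-->" never cuts a well-formed PPI short.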
When cphtml is executed, it finds each PPI within the doc.phtml file. The designated filter program is executed with two additional environment variables:
$PPIDOC=doc
$PPINUM=ppi#
where doc is the file name root (the doc of doc.phtml in the usage line above) and ppi# is the count of the PPI in the file (starting at 1). The cdata is passed to the program on standard input (EOF is reached at the close comment delimiter --, which cannot appear in cdata). Whatever filter writes on standard output is inserted in the output HTML file.
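The filter protocol can be shown concretely with a minimal C++ sketch of how a compiler like cphtml might invoke one filter on a POSIX system. It is not the author's phtml.c. The function name runFilter is hypothetical. The strategy is also an assumption: cdata is staged in a temporary file and the command `filter < file` is run through popen, and therefore through /bin/sh. The sketch sets PPIDOC and PPINUM, supplies cdata on standard input, and returns everything the filter writes to standard output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>     // popen, pclose, fread (popen/pclose are POSIX)
#include <cstdlib>    // setenv, mkstemp (POSIX)
#include <stdexcept>
#include <string>
#include <unistd.h>   // write, close, unlink (POSIX)

// Hypothetical sketch: run one PPI filter with PPIDOC/PPINUM set,
// cdata on standard input, and standard output captured as the result.
std::string runFilter(const std::string& filter, const std::string& cdata,
                      const std::string& docRoot, int ppiNumber) {
    assert(ppiNumber >= 1);  // PPIs are counted from 1

    // Stage cdata in a temporary file so it can be redirected to stdin.
    char path[] = "/tmp/ppiXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) throw std::runtime_error("mkstemp failed");
    ssize_t written = write(fd, cdata.data(), cdata.size());
    close(fd);
    if (written < 0 || static_cast<std::size_t>(written) != cdata.size()) {
        unlink(path);
        throw std::runtime_error("could not stage cdata");
    }

    // The child process started by popen inherits these variables.
    setenv("PPIDOC", docRoot.c_str(), 1);
    setenv("PPINUM", std::to_string(ppiNumber).c_str(), 1);

    std::string command = filter + " < " + path;
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) {
        unlink(path);
        throw std::runtime_error("popen failed");
    }
    std::string output;
    char buf[4096];
    std::size_t got;
    while ((got = std::fread(buf, 1, sizeof buf, pipe)) > 0)
        output.append(buf, got);
    pclose(pipe);
    unlink(path);
    return output;
}
```

Because popen hands the command to /bin/sh, a filter name such as TEX is looked up on the PATH. A fork/exec with two pipes would avoid the temporary file, at the cost of more code and care to avoid deadlock.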
A typical use of PPIs is to embed external notations in HTML documents and to designate a filter that converts the external notation into HTML. For example:
<!--%TEX \Large{\(E = mc^2\)}-->
where TEX produces, from the given data, a transparent GIF image named doc-ppi#.gif (doc and ppi# being the values of the environment variables passed to TEX) and writes, for example,
<IMG src="doc-ppi#.gif" ALT="doc-ppi#">
to standard output. In a viewer, this appears as an inline image of the typeset equation.
The TEX filter can be used to produce inline transparent GIFs for tables and mathematical expressions. Here is a shell script for the TEX filter:
#!/bin/sh
# TEX filter: typeset the LaTeX read on standard input as a transparent GIF.
DOC=$PPIDOC
NUM=$PPINUM
cat > "$DOC-$NUM.tex" << 'HEAD'
\documentstyle[12pt]{article}
\unitlength 0.2in
\thispagestyle{empty}
\begin{document}
\noindent
HEAD
cat >> "$DOC-$NUM.tex"
cat >> "$DOC-$NUM.tex" << 'FOOT'
\end{document}
FOOT
latex "$DOC-$NUM.tex" > .latex-errors
dvips -o "$DOC-$NUM.ps" "$DOC-$NUM.dvi"
pstogif "$DOC-$NUM.ps" "$DOC-$NUM.GIF" > .pstogif-errors
giftrans -t 1 -b 0 "$DOC-$NUM.GIF" > "$DOC-$NUM.gif"
rm "$DOC-$NUM.tex" "$DOC-$NUM.dvi" "$DOC-$NUM.ps" \
   "$DOC-$NUM.aux" "$DOC-$NUM.log" "$DOC-$NUM.GIF"
printf '<IMG SRC="%s.gif" ALT="%s">' "$DOC-$NUM" "$DOC-$NUM"
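The last line of the script shows the filter side of the protocol: the filter derives its output file name and its IMG tag from PPIDOC and PPINUM alone. A minimal C++ sketch of that step follows. The name ppiImageTag is hypothetical, and the sketch does not create the GIF itself.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Build the tag the TEX script writes, from the two environment
// variables cphtml sets for each filter invocation.
std::string ppiImageTag() {
    const char* doc = std::getenv("PPIDOC");
    const char* num = std::getenv("PPINUM");
    if (!doc || !num) throw std::runtime_error("PPIDOC/PPINUM not set");
    std::string base = std::string(doc) + "-" + num;  // e.g. "doc-3"
    return "<IMG SRC=\"" + base + ".gif\" ALT=\"" + base + "\">";
}
```

Because every PPI in a file gets a distinct PPINUM, each image name is unique within the document.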
Here is an example from a PHTML file where some embedded math appears:
<P>Euler's equation looks like:
<P ALIGN=center><A NAME=euler>
<!--%TEX \Large{\[ e^{i\pi} + 1 = 0 \]} --></A>
<P>where <!--%TEX \(e\)--> is the natural logarithm base
and <!--%TEX \(i = \sqrt{-1}\)-->.
In larger type, this looks like <!--%TEX \large{\(i = \sqrt{-1}\)}-->.
This is then translated by cphtml into:
Euler's equation looks like: [image] where [image] is the natural logarithm base and [image]. In larger type, this looks like [image].
An example of embedding a table in HTML (whenever HTML 2.0 is released, tables should be supported natively):
<P ALIGN=center><A NAME=table1>
<!--%TEX \begin{tabular}{|l|l|r|}
\hline\hline
{\em type} & \multicolumn{2}{c|}{\em style} \\
\hline\hline
smart & red & short \\
rather silly & blue & tall \\
\hline\hline
\end{tabular}
--></A>
which becomes a rendered table:
Here is an example of a picture environment:
<!--%TEX % unitlength default is 0.2in
\begin{picture}(5,5)(0,0)
\put(2,2){\circle{4}}
\put(2,2){\vector(1,1){1}}
\end{picture}
-->
which becomes:
PPIs can also be used for picture description languages such as pic. A PIC filter script (similar to TEX) can turn
<!--%PIC circle rad .25
spline right 1 then down .5 left 1 then right 1
circle same -->
into an inline image of the drawing.
They can also be used to embed expressions from other SGML DTDs (along with an associated filter program to produce HTML) and to implement macros. For example,
I work at <!--%/bin/sh echo -n $ORGANIZATION-->.
could produce, with ORGANIZATION set to Texas Instruments,
I work at Texas Instruments.
Similarly, a macro such as
<!--%FWBK A.html B.html-->
could produce
<A href="A.html"><IMG src="/www2/fwd.gif"></A> <A href="B.html"><IMG src="/www2/back.gif"></A>
which can be used at the bottom of pages for forward and backward links.
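The FWBK macro needs only a few lines of filter logic. The C++ sketch below assumes that FWBK's cdata is simply the forward and backward page names separated by whitespace. The name fwbkLinks is hypothetical. The icon paths are the ones shown in the example output above.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Expand an FWBK macro: cdata such as "A.html B.html" names the
// forward and backward pages.
std::string fwbkLinks(const std::string& cdata) {
    std::istringstream in(cdata);
    std::string forward, backward;
    if (!(in >> forward >> backward))
        throw std::invalid_argument("FWBK needs two page names");
    return "<A href=\"" + forward + "\"><IMG src=\"/www2/fwd.gif\"></A> "
           "<A href=\"" + backward + "\"><IMG src=\"/www2/back.gif\"></A>";
}
```

A real filter would read cdata from standard input and write the result to standard output. Keeping the expansion in a function makes the macro easy to check in isolation.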
<PP FILTER=filter>cdata</PP>
The following can be viewed in the on-line version of this paper at <http://www.ncsa.uiuc.edu/SDG/IT94/IT94Info.html>.
phtml.c: the compiler, written in C++ (use CC phtml.c -o phtml to compile)
TEX and PIC: example filters
pstoppm.ps, pstogif, and giftrans.c: needed by the example filters (also needed are gs, latex, dvips, ppmtogif, and pnmcrop, available from various public domain sources)
Note: in contrast to latex2html, PPIs allow for "native" HTML development, escaping to external notations only when necessary. cphtml makes it easy for HTML authors to incorporate complex mathematical expressions, a forte of TeX.
Dr. Philip Thrift received the Ph.D. in Applied Mathematics in 1979 from Brown University and joined Texas Instruments in 1982. In various laboratories at TI, he has done research in image understanding, object-oriented and logic programming, machine learning, database mining and information systems. He has several publications in these areas and holds two patents. He is currently working on networked information system applications.
Contact: thrift@csc.ti.com