This paper describes a prototype browser designed to solve some of these problems. It supports a subset of HTTP 1.0.
While it currently works very well for text and graphics, it is not as well suited to distributing stream-oriented media such as audio and video. Even for users on a local Ethernet, the time needed to download a complete audio or video segment discourages browsing.
In addition, it is currently not possible to access ``live'' media streams through Web browsers. Live media are media of possibly indeterminate duration, where capture, transmission, and playback are overlapped, and latency is nearly constant and relatively short; for instance, a live newscast might have a latency of a few seconds. In current browsers, such as NCSA Mosaic and tclwww, it is not possible to overlap transmission and playback, resulting in a delay proportional to the length of the segment being transmitted.
This paper describes a scriptable, extensible, and embeddable browser that allows live multimedia to be transmitted using HTTP. Examples of live audio and video are presented, and approaches to live multimedia are discussed. The paper also describes the performance of the browser on prerecorded multimedia, and presents some applications using the browser as a platform for distributed computing.
In a typical transaction, a browser first generates an HTTP request on behalf of a user. The URL provided is parsed to find the host and port of a server, and a TCP connection is opened. The request header is sent over the connection, and the browser then waits for the HTTP reply header.
On the server side, the header is parsed, the appropriate data is located, an HTTP reply header with the data type and length is generated and sent, and the data is written directly to the network connection.
The browser then reads and parses the HTTP reply header, determining the length and type of the data being sent. If the data is of a type that is supported natively by the browser, it is read and processed. If it is not, the data is copied from the TCP connection to a temporary file on local storage, and the name of this file is then passed to a user-defined application through a command line parameter.
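This conventional transaction can be sketched in a few lines. The example below is in Python rather than the Tcl used by the browser described here, and all function and variable names are illustrative, not taken from any actual implementation.

```python
import os
import socket
import tempfile

def fetch(host, port, path):
    """Open a TCP connection, send an HTTP/1.0 GET request, and return
    the parsed reply headers plus the socket, positioned at the body."""
    sock = socket.create_connection((host, port))
    sock.sendall("GET {} HTTP/1.0\r\n\r\n".format(path).encode())
    # Read one byte at a time so nothing past the header is consumed.
    buf = b""
    while not buf.endswith(b"\r\n\r\n"):
        byte = sock.recv(1)
        if not byte:
            break
        buf += byte
    headers = {}
    for line in buf.decode().split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name:
            headers[name.strip().lower()] = value.strip()
    return headers, sock

def dispatch_to_file(sock):
    """The conventional dispatch step: copy the entire body into a
    temporary file whose name would then be handed to a sub-application.
    This extra copy is what delays playback of long media segments."""
    fd, name = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            f.write(chunk)
    sock.close()
    return name
```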
This basic model is very effective. Most importantly, it is extensible; adding new data types is fairly simple, usually requiring only a small modification to a configuration file, with no recompilation of the browser. Sub-applications can be designed and tested completely separately from the browser, and there is an established procedure to promote experimental data types to generally accepted ones.
However, there are some problems with the details of the implementation. The most significant is the step where the data is copied from the network connection into a temporary file before a sub-application is spawned. For small text and graphic files where transmission time is short compared to the connection setup time, there is little effect on response time.
For large audio and video files, however, this extra copy means that the entire file must be transferred before the sub-application can be spawned. Several problems result directly from this. The most important is that it is not possible to send live media streams; playback cannot be overlapped with transmission or capture.
Even for prerecorded media, response times for long segments may be measured in minutes, and the client must have enough local storage to store the entire file whether it will be played completely or not. Another disadvantage of this approach is that it is inefficient; not only is the data being copied one more time than necessary, it probably is being copied to a relatively slow mechanical storage device. As network connections become faster, this may become a serious bottleneck.
On the server side, no modifications are necessary to take advantage of this approach to playing media files. The server sends data to the client as fast as the TCP connection will accept, and blocks when the TCP connection does.
For live multimedia, some server-side scripts are necessary to create live media sources from media capture applications. Fortunately, both the NCSA and CERN HTTP servers have a gateway interface that makes it easy to do this by passing TCP sockets directly to sub-applications.
The language platform that was chosen was TCL 7.3b, with the TCL-DP networking extensions and some C extensions to handle HTTP header parsing. The computing platform was a DEC Alpha running OSF/1 1.2.
The size of the final code was approximately 300 lines of TCL, and 50 lines of C.
When called with a URL and method, the http function first parses the URL to determine the name of the server to access and the port number to use. The library then opens a TCP connection and sends and receives the appropriate HTTP headers.
The TCP connection is read only until the end of the HTTP header, so that the next byte read would be the start of the data requested. At that point, the HTTP library returns the handle of the TCP connection and passes the parsed list of MIME headers to the calling function.
For some types of data, such as "text/plain" and "app/x-tcl-script", the data may be read and processed within the browser. For others, a sub-application is started.
Here, instead of writing the data to a file before starting the sub-application, the sub-application is started as a child process, and the existing TCP connection that was returned to the browser is redirected to be the sub-application's standard input and output.
At this point, the browser's handle to the TCP connection is closed, and there is no further interaction with the server by the browser.
Most applications in the UNIX environment can easily be configured to read from standard input. In many cases, this is the default behavior. In TCL, starting these applications with a socket redirected to standard input and output can be handled in a single command.
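The redirection described above translates almost directly into other environments. The sketch below shows the idea in Python rather than the single Tcl command the browser uses; the command being spawned is an illustrative stand-in.

```python
import socket
import subprocess

def spawn_on_socket(cmd, sock):
    """Start a sub-application with the TCP connection as both its
    standard input and standard output, then close the browser's own
    handle; the child process keeps its duplicated descriptor, so the
    browser has no further interaction with the server."""
    proc = subprocess.Popen(cmd, stdin=sock.fileno(), stdout=sock.fileno())
    sock.close()
    return proc
```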
The browser currently supports most of the common content-types typically found on the Web, including text/plain and text/html, image/tiff, image/jpeg and image/gif, audio/basic, and video/mpeg. In addition, it supports VuSystem \cite{Vusystem} streams, which are interleaved audio, video, and text streams used in the MIT TNS VuSystem project.
The NPH-Header scripts used to invoke the media capture applications are very similar to those used in the browser. Most UNIX applications are easily configured through command line options to write their output to standard output rather than a file.
Some data formats, such as the GIF and JPEG image formats, essentially encode the content length in their headers, or have some other way of determining end-of-file. Other applications, such as audio stream players, may simply accept data until the connection is closed, which happens when the HTTP server has no more data to send.
For completeness, a convention was adopted where a negative content-length signified an indefinite-length data stream.
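On the receiving side, this convention amounts to a two-way branch when reading the body. A minimal sketch in Python (the browser itself is written in Tcl):

```python
def read_body(sock, content_length):
    """Read an HTTP body under the convention above: a non-negative
    Content-Length means read exactly that many bytes; a negative one
    marks an indefinite-length stream, read until the server closes
    the connection."""
    data = b""
    if content_length >= 0:
        while len(data) < content_length:
            chunk = sock.recv(content_length - len(data))
            if not chunk:
                break
            data += chunk
    else:
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data
```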
For this application, the browser was run with the TK extensions, a popular windowing system built around TCL. The server application transferred a program to the client that created the user interface. Then, for every log access, it sent a TCL command to the client, modifying the client's state and updating the client's display.
The server application also monitored the connection for commands sent by the client. For instance, the client could specify to the server whether it was interested in all connections logged, or only connections made by previously unseen hosts.
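The interest filter can be sketched as a small server-side function. The names and the (host, log line) representation below are illustrative, not taken from the actual application.

```python
def filter_accesses(entries, seen_hosts, new_hosts_only):
    """Apply the client's stated interest: forward every logged access,
    or only those from hosts not seen before. `entries` is a list of
    (host, log_line) pairs; `seen_hosts` is updated in place."""
    forwarded = []
    for host, line in entries:
        if new_hosts_only and host in seen_hosts:
            continue
        seen_hosts.add(host)
        forwarded.append(line)
    return forwarded
```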
When started from the TNS browser, running over a local Ethernet, there was no noticeable delay before the start of playback, and the file played without audible glitches. In contrast, when started from NCSA Mosaic, there was an approximately six second pause before the start of playback when playing a 1 MB sound file.
For live audio, the arecord application provided with DEC OSF/1 was used for sampling. Live audio performance was identical to pre-recorded audio performance; there was no noticeable delay before output started, and no glitches.
Audio performance over the Internet was highly variable. In general, the delay between the start of the request and the start of output was less than a second, compared to nearly a minute to transfer a similar (1 MB) audio file in its entirety. However, the quality of output varied. On some occasions, several minutes of audio could be played without audible glitches. At other times, long periods of uninterrupted audio were followed by long periods of silence, with occasional fragments of sound. It is possible that this is caused in part by rigid rate control in the audio player; if the incoming data stream falls behind the audio output, output is stopped, and all data that arrives "late" is discarded. Output continues only when the incoming data rate increases enough that the individual samples are "on time".
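The rigid rate-control policy hypothesized above can be stated concretely: each sample has a fixed scheduled output time, and any sample arriving after its slot is dropped rather than played late. A sketch, with illustrative time units:

```python
def discard_late_samples(arrivals, sample_period, start_time):
    """Keep only samples that arrive by their scheduled output time.
    `arrivals` is an ordered list of (arrival_time, sample) pairs;
    sample i is due at start_time + i * sample_period, and anything
    arriving after its deadline is discarded, producing silence."""
    kept = []
    for i, (arrival_time, sample) in enumerate(arrivals):
        if arrival_time <= start_time + i * sample_period:
            kept.append(sample)
    return kept
```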
Live audio transmissions over the Internet were not attempted. However, it is likely that performance would be similar to that of pre-recorded audio.
mpeg_play was also run with a video file located on an NCSA server. The download time from NCSA was approximately 30 seconds for a 740 KB video file. When played directly over the Internet using the TNS browser, MPEG video playback performance was highly variable. The playback rate was bursty; several seconds of video would play at a fast rate, followed by pauses of about half a second.
It is possible that this behavior is a result of a lack of rate control in mpeg_play and the variable bandwidth compression of MPEG video streams. mpeg_play will play back frames as soon as they are received, keeping any TCP receive buffers empty. Also, MPEG video streams are of variable bit rate; low bandwidth interframe data is sent between higher bandwidth frames.
Because the only MPEG encoders available were batch encoders, live MPEG transmission was not attempted.
Live VuSystem media streams were also transmitted over the local Ethernet. Eight bit, 320 by 240 pixel frames were sent at sustained rates of five fps, along with a single basic audio stream. At this relatively low rate, network playback was not distinguishable from local playback. At this time, testing at higher bit rates was not attempted.
There is more difficulty in using these techniques over slower networks. For any given media type, there will be some networks that simply do not have the necessary bandwidth for live transmission. For these networks, the only choice is to use local storage to cache the incoming data.
The most interesting area is transmission over networks such as the Internet that may have adequate average bandwidth, but high variability of bandwidth. Sub-applications must be developed that can tolerate this.
One solution might involve buffering at the receiving end, so that occasional pauses in transmission can be tolerated. Another approach might be to take advantage of the full duplex TCP connection and use feedback to control the transmission rate, possibly affecting the final output quality.
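The first approach can be sketched as a playout buffer that withholds output until a threshold of data has accumulated, so that occasional transmission pauses drain the buffer instead of interrupting output. This is a Python sketch with an illustrative interface, not an implemented component of the browser.

```python
import collections

class PlayoutBuffer:
    """Accumulate incoming chunks until `threshold` bytes are buffered
    before allowing playback; afterwards, short network stalls are
    absorbed by the buffered data rather than heard as gaps."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.chunks = collections.deque()
        self.buffered = 0
        self.playing = False

    def push(self, chunk):
        self.chunks.append(chunk)
        self.buffered += len(chunk)
        if self.buffered >= self.threshold:
            self.playing = True  # start playing once primed

    def pop(self):
        if not self.playing or not self.chunks:
            return None  # still priming, or an underrun: output pauses
        chunk = self.chunks.popleft()
        self.buffered -= len(chunk)
        return chunk
```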
In all of these cases, a new perspective on HTTP is needed. MIME types can be regarded as naming protocols in addition to data types; HTTP, rather than being viewed merely as a browser protocol, can then be seen as a powerful, widely used protocol negotiation tool.
Berners-Lee, T., ``HyperText Transfer Protocol Requirements'', European Laboratory for Particle Physics.
Tennenhouse, D. L. et al., ``A Software Oriented Approach to the Design of Media Processing Environments'', Proceedings of the International Conference on Multimedia Computing and Systems, pp. 435-444, May 1994.
Ousterhout, J. K., ``Tcl: An Embedded Command Language'', Computer Science Division (EECS), University of California, Berkeley, CA, January 1990.