Live Multimedia over HTTP

Jonathan C. Soo
Telemedia, Networks and Systems Group
MIT Laboratory for Computer Science
Cambridge, MA 02139

Abstract

The World Wide Web is currently not well oriented towards distributing stream-oriented media such as audio and video. The limitation is not in HTTP itself, but in currently existing browsers. After opening an HTTP connection to a server, most browsers write all data to a local file before passing it to an external viewer. While this works well for text and graphics, it makes viewing stream-oriented media impractical because of the long delay before start of playback, and because the entire file must be stored on the local host. In addition, it is not possible to send "live" streams of data.

This paper describes a prototype browser designed to solve some of these problems. It supports a subset of HTTP 1.0.

Introduction

The World Wide Web is a popular mechanism for distributing many types of data. In the last few years, it has become a major source of traffic on the Internet and an important part of the academic and commercial information infrastructure.

While it currently works very well for text and graphics, it is not as well suited for distributing stream-oriented media such as audio and video. Even for users on a local Ethernet, the time needed to download a complete audio or video segment discourages browsing.

In addition, it is currently not possible to access "live" media streams through Web browsers. Live media are media of possibly indeterminate duration, where capture, transmission, and playback are overlapped, and latency is nearly constant and relatively short; for instance, a live newscast might have latency of a few seconds. In current browsers, such as NCSA Mosaic and tclwww, it is not possible to overlap transmission and playback, resulting in a delay proportional to the length of the segment being transmitted.

This paper describes a scriptable, extensible, and embeddable browser that allows live multimedia to be transmitted using HTTP. Examples of live audio and video are presented, and approaches to live multimedia are discussed. The paper also describes the performance of the browser on prerecorded multimedia, and presents some applications using the browser as a platform for distributed computing.

Approach

Current Web Browser Implementations

Most browsers today are designed primarily for browsing through text and graphics over fast network connections, using a "store and forward" approach.

In a typical transaction, a browser first generates an HTTP request on behalf of a user. The URL provided is parsed to find the host and port of a server, and a TCP connection is opened. The request header is sent over the connection, and the browser then waits for the HTTP reply header.

On the server side, the header is parsed, the appropriate data is located, an HTTP reply header with the data type and length is generated and sent, and the data is written directly to the network connection.

The browser then reads and parses the HTTP reply header, determining the length and type of the data being sent. If the data is of a type that is supported natively by the browser, it is read and processed. If it is not, the data is copied from the TCP connection to a temporary file on local storage, and the name of this file is then passed to a user-defined application through a command line parameter.
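The store-and-forward step for non-native types can be sketched as follows. This is a minimal illustration in Python rather than code from any existing browser; the names spool_to_tempfile and viewer_command are hypothetical.

```python
import os
import tempfile


def spool_to_tempfile(stream, chunk_size=8192):
    """Copy an entire byte stream to a temporary file and return its path.

    Nothing can be handed to a viewer until the last byte has arrived.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # end of stream: the transfer is complete
                break
            out.write(chunk)
    return path


def viewer_command(viewer, path):
    """Build the command line that passes the file name to the viewer."""
    return [viewer, path]
```

The viewer can only be started once spool_to_tempfile has returned, which is the source of the start-up delay discussed in the rest of this section.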

This basic model is very effective. Most importantly, it is extensible; adding new data types is fairly simple, usually requiring a small modification to a configuration file and not requiring compilation of the browser. Sub-applications can be designed and tested completely separately from the browser, and there is an established procedure to promote experimental data types to generally accepted ones.

However, there are some problems with the details of the implementation. The most significant is the step where the data is copied from the network connection into a temporary file before a sub-application is spawned. For small text and graphic files where transmission time is short compared to the connection setup time, there is little effect on response time.

For large audio and video files, however, this extra copy means that the entire file must be transferred before the sub-application can be spawned. Several problems result directly from this. The most important is that it is not possible to send live media streams; playback cannot be overlapped with transmission or capture.

Even for prerecorded media, response times for long segments may be measured in minutes, and the client must have enough local storage to store the entire file whether it will be played completely or not. Another disadvantage of this approach is that it is inefficient; not only is the data being copied one more time than necessary, it probably is being copied to a relatively slow mechanical storage device. As network connections become faster, this may become a serious bottleneck.

Proposed Changes

The approach taken to solve these problems was to restructure the client to avoid the copy to disk. Instead of immediately reading the data stream and writing it to a file, the connection is passed directly to the sub-application. The sub-application can then read data from the server as it is needed, and overlap processing with I/O. In fact, the sub-application has a TCP connection to the HTTP server process, and can send and receive data simultaneously.

On the server side, no modifications are necessary to take advantage of this approach to playing media files. The server sends data to the client as fast as the TCP connection will accept, and blocks when the TCP connection does.

For live multimedia, some server-side scripts are necessary to create live media sources from media capture applications. Fortunately, both the NCSA and CERN HTTP servers have a gateway interface that makes it easy to do this by passing TCP sockets directly to sub-applications.

Goals

The design goal of this project was to create a client that would implement the above approach, as well as form a platform for future research in related areas. The client was intended to be easily understandable, portable, extensible, and embeddable. Although an existing browser could have been modified, this was not done for several reasons.

Implementation

The implementation was divided into two parts: an HTTP library and a simple browser. The HTTP library implements a subset of HTTP, and the browser supports several of the common data types found on the Web, and several common sub-applications used to view those types.

The language platform that was chosen was TCL 7.3b, with the TCL-DP networking extensions and some C extensions to handle HTTP header parsing. The computing platform was a DEC Alpha running OSF/1 1.2.

The size of the final code was approximately 300 lines of TCL, and 50 lines of C.

HTTP Library

The HTTP library is written almost completely in TCL. It uses the socket management functions of TCL-DP to establish TCP connections to HTTP servers, and generates an HTTP header. Because TCL has difficulty using non-UNIX line delimiters, it also uses a function written in C to read the HTTP reply header.

When called with a URL and method, the http function first parses the URL to determine the name of the server to access, and the port number to use. The library then opens a TCP connection and sends and receives the appropriate HTTP headers.
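The URL parsing and request generation described here might look like the following Python sketch. It is not the paper's TCL code; the function names parse_http_url and build_request are hypothetical, and the default port of 80 and the bare HTTP/1.0 request line are assumptions about the subset of HTTP being used.

```python
from urllib.parse import urlsplit


def parse_http_url(url):
    """Return (host, port, path) for an http URL, defaulting to port 80."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("not an http URL: %r" % url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, parts.port or 80, path


def build_request(method, path):
    """Build a minimal HTTP/1.0 request header, terminated by a blank line."""
    return ("%s %s HTTP/1.0\r\n\r\n" % (method, path)).encode("ascii")
```

The host and port would then be used to open the TCP connection, and the bytes from build_request written to it.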

The TCP connection is read only until the end of the HTTP header, so that the next byte read would be the start of the data requested. At that point, the HTTP library returns the handle of the TCP connection and passes the parsed list of MIME headers to the calling function.
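A sketch of this header-only read is given below, again in Python rather than the TCL and C of the actual library. The name read_reply_header is hypothetical. Reading one byte at a time is an assumption made here to guarantee that nothing beyond the header is consumed; both CRLF and bare LF line endings are accepted.

```python
import socket


def read_reply_header(conn):
    """Read from conn up to and including the blank line that ends the header.

    Returns (status_line, headers), with header names lower-cased. The next
    byte read from conn afterwards is the first byte of the data.
    """
    lines = []
    current = bytearray()
    while True:
        byte = conn.recv(1)
        if not byte:
            raise EOFError("connection closed inside the header")
        if byte != b"\n":
            current += byte
            continue
        # End of a line: drop an optional carriage return before decoding.
        line = bytes(current).rstrip(b"\r").decode("latin-1")
        current.clear()
        if line == "":  # a blank line terminates the header
            break
        lines.append(line)
    status_line = lines[0] if lines else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers
```

Because the function stops exactly at the blank line, the same connection can afterwards be handed to another reader without losing any data.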

Browser

The browser accepts user input and determines how to present the data received from the server to the user. After calling the HTTP library, a dispatch function determines the type of data being received by examining the Content-type entry of the MIME header. For each type, there is a user-defined handler procedure that the dispatcher calls.
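The dispatch step can be illustrated with a small table of handlers keyed by content type. This Python sketch is an assumption about structure, not the browser's TCL code; make_dispatcher is a hypothetical name, and headers is assumed to be a dictionary with lower-cased header names.

```python
def make_dispatcher(handlers, default=None):
    """Return a function that routes a connection to a per-type handler."""

    def dispatch(headers, conn):
        # Drop parameters such as "; charset=..." and normalise the case.
        content_type = headers.get("content-type", "").split(";")[0].strip().lower()
        handler = handlers.get(content_type, default)
        if handler is None:
            raise LookupError("no handler for %r" % content_type)
        return handler(conn, headers)

    return dispatch
```

Adding support for a new type then amounts to adding one entry to the handlers dictionary.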

For some types of data, such as "text/plain" and "app/x-tcl-script", the data may be read and processed within the browser. For others, a sub-application is started.

Here, instead of writing the data to a file before starting the sub-application, the sub-application is started as a child process, and the existing TCP connection that was returned to the browser is redirected to be the sub-application's standard input and output.

At this point, the browser's handle to the TCP connection is closed, and there is no further interaction with the server by the browser.

Most applications in the UNIX environment can easily be configured to read from standard input. In many cases, this is the default behavior. In TCL, starting these applications with a socket redirected to standard input and output can be handled in a single command.
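The same hand-off can be sketched in Python for a POSIX system, where a socket is an ordinary file descriptor. This is an illustration of the technique, not the TCL command used in the browser; start_sub_application is a hypothetical name.

```python
import socket
import subprocess
import sys


def start_sub_application(conn, argv):
    """Start argv with conn as its standard input and standard output.

    The parent then closes its own handle, so that only the child process
    reads from and writes to the connection.
    """
    child = subprocess.Popen(argv, stdin=conn.fileno(), stdout=conn.fileno())
    conn.close()
    return child
```

In the browser, argv would name a player that reads from standard input. Once the call returns, the parent takes no further part in the transfer.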

The browser currently supports most of the common content-types typically found on the Web, including text/plain and text/html, image/tiff, image/jpeg and image/gif, audio/basic, and video/mpeg. In addition, it supports VuSystem [Tennenhouse et al., 1994] streams, which are interleaved audio, video, and text streams used in the MIT TNS VuSystem project.

Server

At the server side, no modifications are needed for sending existing files. For live media, however, some scripts are necessary to run the live media capture applications.

The Common Gateway Interface

Both the NCSA and CERN HTTP servers have a sub-application interface known as the Common Gateway Interface (CGI) [McCool]. This interface provides a standard way of passing details of HTTP transactions to sub-applications. Two slightly different variants exist on each server. A regular CGI script needs only to write a short content-type header before sending data to the client. The data from the CGI script is first read by the server, which calculates its length before sending it to the client. The data is not sent to the client until the CGI script signals the end of the data by terminating itself. Live media sources therefore cannot be implemented using regular CGI scripts.

NPH-Header Scripts

An NPH-Header script is similar to the regular CGI script except that the sub-application is required to generate a complete HTTP header itself, and the sub-application is given the direct TCP connection to the client rather than a pipe to the server. This makes live media possible.

The NPH-Header scripts used to invoke the media capture applications are very similar to those used in the browser. Most UNIX applications are easily configured through command line options to write their output to standard output rather than a file.
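The core of such a script can be sketched in Python as follows. This is an illustration, not one of the scripts used in the project; nph_stream is a hypothetical name, and omitting the Content-length header entirely is an assumption made for this sketch.

```python
def nph_stream(source, out, content_type, chunk_size=4096):
    """Write a complete HTTP/1.0 reply header, then relay source to out.

    No Content-length is sent because the length of a live stream is not
    known in advance; the end of the data is signalled by closing the
    connection. Returns the number of data bytes relayed.
    """
    header = "HTTP/1.0 200 OK\r\nContent-type: %s\r\n\r\n" % content_type
    out.write(header.encode("ascii"))
    out.flush()
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # the capture application has finished
            break
        out.write(chunk)
        out.flush()  # push each chunk to the client as soon as it is read
        total += len(chunk)
    return total
```

In an actual script, source would be the standard output of the capture application and out would be the script's own standard output, which the server has connected directly to the client.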

Content-length

It is important to note that in many cases, no use is made of the content-length value returned by the server. In fact, none of the sub-applications used have a provision for a length parameter.

Some data formats, including image formats such as GIF and JPEG, have the content length essentially encoded in the header, or may have some other way of determining end-of-file. Other applications, such as audio stream players, may simply accept data until the connection is closed, which happens when the HTTP server has no more data to send.

For completeness, a convention was adopted where a negative content-length signified an indefinite-length data stream.
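A reader that honours this convention might be written as in the Python sketch below. The name body_chunks is hypothetical, and treating a missing or malformed content-length in the same way as a negative one is an assumption of this sketch rather than something the convention specifies.

```python
def body_chunks(stream, headers, chunk_size=4096):
    """Yield body chunks, honouring content-length when it is usable.

    A negative length (or, by assumption, a missing or malformed one) means
    "read until the stream is closed".
    """
    try:
        remaining = int(headers.get("content-length", "-1"))
    except ValueError:
        remaining = -1
    while remaining != 0:
        want = chunk_size if remaining < 0 else min(chunk_size, remaining)
        chunk = stream.read(want)
        if not chunk:  # the sender closed the stream
            break
        if remaining > 0:
            remaining -= len(chunk)
        yield chunk
```

With a non-negative length the generator stops after exactly that many bytes; otherwise it runs until the server closes the connection.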

Distributed Applications

Two simple distributed applications were also created. Both of these used the direct TCP connection to transfer live data.

A simple access log monitor

The first application written was a simple access log monitor. A simple server-side nph-header script was written to copy data written to the server's access log to standard output. On the client side, the data was written to the user display as it was received.

A complex access log monitor

The second application was similar to the access log monitor, but was implemented using RPC and was interactive. A server-side nph-script was written with a new output content-type of x-tcl-rpc. On the client side, a handler was written that read from the connection, and evaluated the TCL code.

For this application, the browser was run with the TK extensions, a popular windowing system built around TCL. The server application transferred a program to the client that created the user interface. Then, for every log access, it sent a TCL command to the client, modifying the client's state and updating the client's display.

The server application also monitored the connection for commands sent by the client. For instance, the client could specify to the server whether it was interested in all connections logged, or only connections made by previously unseen hosts.
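The client side of this exchange can be sketched in Python. Note that the sketch deliberately swaps in a different technique: where the application described above evaluated received TCL code directly, this version looks each received command up in a fixed table. The line format ("name arg ...") and the name run_remote_commands are hypothetical.

```python
import shlex


def run_remote_commands(lines, commands):
    """Apply each received command line to local state via a command table.

    Each line has the form "<name> <arg> ...". Blank lines are ignored.
    Unknown command names are skipped; the number skipped is returned.
    """
    skipped = 0
    for raw in lines:
        words = shlex.split(raw)
        if not words:
            continue
        handler = commands.get(words[0])
        if handler is None:
            skipped += 1
            continue
        handler(*words[1:])
    return skipped
```

Here lines would be a file-like object wrapped around the connection, and each handler would update the client's state and display, for example by incrementing a per-host access counter.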

Preliminary Results

The browser was tested with a variety of sub-applications, file types, and servers over a local Ethernet and the Internet. Due to time restrictions, detailed measurements could not be made; however, the subjective results are fairly clear and are reported here.

Server Compatibility

The browser was tested with both the NCSA and CERN HTTP servers. Initially, there were some problems because the servers use different line delimiters; the NCSA server uses MS-DOS delimiters, while the CERN server uses UNIX conventions. Because TCL has difficulty with MS-DOS line delimiters, a short C function was written that appears to have solved this problem.

Audio Transmission

The audio application aplay provided with DEC OSF/1 was used for audio playback, and all audio samples were of type audio/basic, which has 8 bit samples and 22k samples per second.

When started from the TNS browser, running over a local Ethernet, there was no noticeable delay before start of playback and the file played without audible glitches. In contrast, when started from NCSA Mosaic, there was approximately a six second pause before start of playback when playing a 1 MB sound file.

For live audio, the arecord application provided with DEC OSF/1 was used for sampling. Live audio performance was identical to pre-recorded audio performance; there was no noticeable delay before output started, and no glitches.

Audio performance over the Internet was highly variable. In general, the delay between the start of the request and the start of output was less than a second, compared to nearly a minute to download a similar (1 MB) audio file before playback. However, the quality of output varied. On some occasions, several minutes of audio could be played without audible glitches. At other times, long periods of uninterrupted audio were followed by long periods of silence, with occasional fragments of sound. It is possible that this is caused in part by rigid rate control in the audio player; if the incoming data stream falls behind the audio output, output is stopped, and all data that arrives "late" is discarded. Output continues only when the incoming data rate increases enough so that the individual samples are "on time".

Live audio transmissions over the Internet were not attempted. However, it is likely that performance will be similar to the performance of pre-recorded audio.

Video Transmission

The video applications used for testing were the Berkeley mpeg_play application and the TNS VuSystem applications vsrecord and vsplay. mpeg_play accepts an MPEG-encoded, video-only stream, while vsplay accepts a multiplexed audio/video/text stream. For these tests, vsplay was used without compression.

MPEG video transmission

When started from Mosaic, mpeg_play took approximately three seconds to download and play a 740 KB video file over a local Ethernet. When started from the TNS browser, playback started approximately five seconds after the initial request. There was no visible difference in playback time between the two invocations.

mpeg_play was also run with a video file located on an NCSA server. The download time from NCSA was approximately 30 seconds for a 740 KB video file. When played directly over the Internet, using the TNS browser, MPEG video playback performance was highly variable. The playback rate was bursty; several seconds of video would play at a fast rate, followed by pauses of about half a second.

It is possible that this behavior is a result of a lack of rate control in mpeg_play and the variable bandwidth compression of MPEG video streams. mpeg_play will play back frames as soon as they are received, keeping any TCP receive buffers empty. Also, MPEG video streams are of variable bit rate; low bandwidth interframe data is sent between higher bandwidth frames.

Because the only MPEG encoders available were batch encoders, live MPEG transmission was not attempted.

VuSystem video transmission

When run on a local Ethernet on pre-recorded video files, vsplay performed similarly to aplay and mpeg_play; there was a significant reduction in response time when using the TNS browser, and there was no visible difference between the two approaches.

Live VuSystem media streams were also transmitted over the local Ethernet. Eight bit, 320 by 240 pixel frames were sent at sustained rates of five fps, along with a single basic audio stream. At this relatively low rate, network playback was not distinguishable from local playback. At this time, testing at higher bit rates was not attempted.
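The bandwidth implied by these figures can be worked out directly. The short Python calculation below counts only the uncompressed video, leaving out the audio stream and any protocol overhead. The 10 Mbit/s Ethernet capacity used for comparison is an assumption; the speed of the test network is not stated above.

```python
def uncompressed_video_rate(width, height, bits_per_pixel, frames_per_second):
    """Return the raw video data rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


rate = uncompressed_video_rate(320, 240, 8, 5)
print(rate)         # 3072000 bits per second
print(rate / 1e6)   # 3.072 Mbit/s
print(rate / 10e6)  # 0.3072 of an assumed 10 Mbit/s Ethernet
```

Under that assumption the video alone occupies roughly a third of the nominal capacity of the network.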

Conclusions

Live multimedia is possible through HTTP without much modification to clients and servers. Even a very simple implementation of an HTTP client allows for adequate performance over a local area network. In addition, many of the techniques used result in improved performance for prerecorded media.

There is more difficulty in using these techniques over slower networks. For any given media type, there will be some networks that simply do not have the necessary bandwidth for live transmission. For these networks, the only choice is to use local storage to cache the incoming data.

The most interesting area is transmission over networks such as the Internet that may have adequate average bandwidth, but high variability of bandwidth. Sub-applications must be developed that can tolerate this.

One solution might involve buffering at the receiving end, so that occasional pauses in transmission can be tolerated. Another approach might be to take advantage of the full duplex TCP connection and use feedback to control the transmission rate, possibly affecting the final output quality.
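The effect of receive-side buffering can be illustrated with a small simulation in Python. This is a simplified model, not a measurement and not part of the browser: it assumes fixed-duration chunks, constant-rate playback, and a player that pauses rather than discards when data is late. The name count_stalls is hypothetical.

```python
def count_stalls(arrival_times, chunk_duration, prebuffer):
    """Count playback stalls for chunks arriving at arrival_times (seconds).

    Playback begins `prebuffer` seconds after the first chunk arrives and
    consumes one chunk every `chunk_duration` seconds. If a chunk has not
    arrived by the time it is due, playback pauses until it does.
    """
    if not arrival_times:
        return 0
    due = arrival_times[0] + prebuffer
    stalls = 0
    for arrived in arrival_times:
        if arrived > due:  # the chunk is late: pause until it arrives
            stalls += 1
            due = arrived
        due += chunk_duration  # the next chunk is due once this one has played
    return stalls


arrivals = [0.0, 1.0, 2.0, 4.5, 4.6, 5.0]  # one pause in delivery after 2.0 s
print(count_stalls(arrivals, 1.0, 0.0))  # 1
print(count_stalls(arrivals, 1.0, 2.0))  # 0
```

In this example, two seconds of initial buffering is enough to absorb the pause in delivery, at the cost of two seconds of additional latency.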

In all of these cases, a new perspective on HTTP is needed. Rather than viewing HTTP only as a browser protocol, one can treat MIME types as names for protocols as well as for data types. HTTP can then be seen as a powerful, widely used protocol negotiation tool.

Acknowledgements

The author would like to thank all of the members of the TNS group, especially Professor David Tennenhouse for the initial motivation, and David Weatherall, Chris Lindblad, and Henry Houh for their feedback and support.

Bibliography

Rob McCool, "The Common Gateway Interface", National Center for Supercomputing Applications.

Tim Berners-Lee, "HyperText Transfer Protocol Requirements", European Laboratory for Particle Physics.

David L. Tennenhouse et al., "A Software Oriented Approach to the Design of Media Processing Environments", Proceedings of the International Conference on Multimedia Computing and Systems, pp. 435-444, May 1994.

J. K. Ousterhout, "Tcl: An Embedded Command Language", Computer Science Division (EECS), University of California, Berkeley, CA, January 1990.

About the Author

Jonathan Soo is an M.Eng. student in the Telemedia, Networks, and Systems Group at the MIT Laboratory for Computer Science. He is interested in the applications of distributed systems, and in the tools that enable them. He was an undergraduate at MIT, and is a vice president and founder of Agora Technology Group, Inc. He is reachable at jcsoo@mit.edu.