Synchronized multimedia for the WWW

Franck Rousseau (a) and Andrzej Duda (b)

(a) The Open Group Research Institute
2, avenue de Vignate, 38610 Gières, France

f.rousseau@opengroup.org

(b) LSR-IMAG,
BP 72, 38402 Saint Martin d'Hères, France

Andrzej.Duda@imag.fr

Abstract
We propose temporal extensions to HTML that allow seamless integration of synchronized multimedia into WWW documents. The extensions are based on three concepts: hypertime links for temporal composition, common time bases for close lip-sync synchronization between media objects, and dynamic layout. We have designed a flexible execution architecture to support these concepts and experimented with the playback of simple synchronized presentations.

Keywords
Synchronized multimedia; Temporal markup languages; Dynamic layout

1. Introduction

We are interested in synchronized multimedia presentations that have inherent temporal behavior. Components of such presentations may include continuous multimedia data (audio and video clips) either stored on WWW servers or coming from live data sources. The WWW Consortium has initiated an activity to define a new standard for synchronized multimedia documents: SMIL. SMIL adds to HTML some features related to temporal behavior: components of a SMIL document may be continuous multimedia. However, SMIL has some drawbacks, which we detail in Section 2.2: its two composition operators, seq and par, lead to a hybrid and sometimes ambiguous temporal specification; its lipsync attribute is too coarse to express close intermedia synchronization; and its layout is static.

We would like to contribute to the development of SMIL by proposing temporal extensions to HTML based on three concepts: hypertime links for temporal composition, common time bases for close synchronization, and dynamic layout.

A hypertime link is similar to the standard WWW hypertext link; however, it has explicit temporal semantics: it relates two media samples (e.g. video frames) and assigns consecutive time instants to the samples. Following a hypertime link consists of skipping in time from the origin media sample to the target. A time base is a means for specifying close intermedia synchronization. It can be thought of as a virtual time space in which media objects "live". It defines a common time coordinate space for objects that are synchronization dependent. Finally, we extend the notion of a document layout: it may be specified as an object having temporal behavior, in a similar way to media objects. This feature allows seamless integration of the spatial and temporal dimensions of a multimedia document.

We have defined an execution architecture to play back documents specified using the temporal extensions to HTML. The architecture is also based on three concepts: synchronization events, synchronization managers, and synchronizable media objects. We have prototyped the concepts using Java and experimented with the playback of simple synchronized presentations.

In the remainder of the paper, we discuss existing models of multimedia documents (Section 2), propose temporal extensions to HTML (Section 3), present an execution architecture that supports the extensions (Section 4), and outline conclusions (Section 5).

2. Models of multimedia documents

Integration of multimedia data into documents raises the problem of time. Traditional documents such as those specified in HTML have a static spatial layout filled with different static elements such as text or images. Other elements of a layout, such as audio or video clips, may have temporal behavior; however, there is no means for specifying temporal relations between elements. Time adds a new dimension to the existing two-dimensional coordinate space of documents: we must specify how the elements of a document evolve in time. We discuss below existing models for temporal composition.

2.1. Models for temporal composition

Existing temporal models for multimedia can be divided into two classes: point-based and interval-based [WR94]. An example of the point-based approach is the timeline, in which media objects are placed on several time axes called tracks, one for each media type. All events have an absolute time reference, so they are totally ordered on the timeline. The timeline model has been adopted in HyTime, Macromedia Director, and many other tools. The model is well suited for temporal composition of media segments of known durations; however, it is limited because of the absolute time reference of media objects.

Temporal point nets are based on a functional relation that may exist between two events: for example, the end of one segment may start or stop another one [BZ93]. This model allows reasoning about events in a relative way. It is simple, powerful, and intuitive. Its use may result in complex, unstructured graphs and inconsistent specifications, but this drawback is comparable to the difficulty of using HTML to format traditional documents: neither is intended to be used by the author directly, yet it is always possible.

Interval-based models consider elementary media entities as time intervals ordered according to some relations. Existing models are mainly based on the relations defined by Hamblin [Ham72] and Allen [All83] for expressing knowledge about time. The interval relations are ambiguous: for example, the meets relation states that the end of the first interval coincides with the beginning of the second one, but it does not say whether the first interval starts the second one, whether the second interval stops the first one, or whether it is a pure coincidence. Temporal composition specified by the relations may lead to inconsistent specifications. Their expressive power is lower than that of temporal point nets: there are some scenarios that cannot be expressed using the interval relations.

Algebraic Video has defined a different paradigm based on functional operators that can be applied to time intervals [WDG95]. Interval Expressions extend this approach [KD96]. The model guarantees the consistency of temporal composition.

Interval models can be thought of as higher-level abstractions that may be used in an authoring environment and compiled into a low-level one such as HTML with temporal extensions. We can observe that any interval model such as Allen relations or algebraic operators can be transformed into temporal point nets.

2.2. Multimedia standards

Several international standards such as HyTime [HyT97,NKN91], MHEG [EMB95], and PREMO [HRvL96] deal with multimedia. HyTime is derived from SGML and adds mechanisms to specify hyperlinks and schedule multimedia information in time and space. MHEG and PREMO propose an object-oriented approach in which a multimedia presentation is defined by a hierarchy of objects that execute on a presentation engine. They do not propose any sophisticated temporal composition model, but take care of low-level close synchronization support.

The W3C initiated an activity to explore integration of synchronized multimedia into WWW documents. It has defined a working draft of SMIL (Synchronized Multimedia Integration Language), a new language that extends HTML with temporal functionalities. It is based on XML and provides some basic functionalities for including continuous multimedia data such as video and audio in WWW documents. However, it has several drawbacks.

First, there are only two temporal composition operators: seq and par. A scenario is represented as a tree in which seq and par operators are nodes and intervals corresponding to multimedia objects are leaves. The semantics of the seq operator is clear and unambiguous: A seq B means that when A ends, B should start. In contrast, the semantics of the par operator is ambiguous: it does not define the exact relation between the four end-points of intervals A and B. To overcome this problem, Interval Expressions [KD96] define several operators that specialize the par operator. SMIL takes another approach by defining additional attributes that refer to time-points in order to specify the exact semantics of the par operator. As a result, the temporal specification of SMIL is based on a hybrid approach that mixes two different abstractions: intervals and time-points. A given scenario can be expressed in several different ways and the specification may become confusing.

Second, SMIL provides an optional lipsync attribute to specify how different media segments must be kept synchronized. It applies to a group of objects enclosed within the par operator. This is not sufficient to specify the desired close intermedia synchronization between multimedia objects, because in many cases we want to designate some objects as masters that control the synchronization of slaves independently of temporal composition. Moreover, SMIL does not define the exact behavior of the lipsync attribute and leaves it to the implementation.

Third, the layout of a SMIL document is static, because the tuner element uses a simplified version of CSS positioning. Even if we are able to specify the temporal composition of media objects, the objects appear at fixed positions on the screen. This problem arises because in SMIL, temporal composition is separated from the spatial dimension. The static layout limits authoring possibilities, because temporal objects may require placement that varies in time.

We propose to use a simple functional paradigm derived from temporal point nets to specify a temporal extension of HTML. We also define a sophisticated mechanism for close intermedia synchronization and a way of specifying dynamic layouts.

3. Integrating time into HTML documents

A document has a dual structure [RV94]. The logical structure defines how different components of a document are related. It is used for editing and allows a clear separation of style properties (such as fonts, colours or typefaces) from properties related to a physical medium. The physical structure defines how logical components of a document should be presented on a physical medium such as a paper page or a screen. In particular, it specifies a spatial layout of logical components. Most multimedia document models add the temporal structure as a separate dimension. This approach is not integrated with the existing logical and physical structures: spatial and temporal descriptions are independent and cannot be mixed, so, for example, it is impossible to define a layout that changes in time. As a result, a document has a static layout that is applied to components with temporal behavior. SMIL proposes such a temporal extension to HTML.

We would like to propose another approach in which the temporal dimension is well integrated with the logical and physical structures. Figure 1 presents an example of a scenario that we want to take into account: a document with a layout changing in time. When the video clip finishes, a new layout is used to present the components that follow the clip: subsequent text paragraphs and an audio clip. After the audio clip, a third layout defines how to present the last paragraph and an image. The components of the document, as well as the layout, have temporal behavior.


Fig. 1. A document with a dynamic temporal layout.

The logical structure of a document defines how different parts are related, and in this respect it is similar to the temporal structure. For example, we can say that in a document, one section is before another one. Temporal composition may be expressed in a similar manner: we can say that a video clip should be played back before another one. However, time is also a physical parameter and the temporal dimension must appear in the physical structure. So, time relates to both the logical and physical structures: it is not orthogonal to them, but rather a part of them. This aspect can be taken into account by defining a temporal layout in the same way as there is a spatial layout for the logical structure. The temporal layout defines how temporal media are mapped onto the absolute timeline to be played at the right time instants.

So, to integrate temporal behavior into HTML documents, we need to specify how the structure and the layout of an HTML document evolve in time. In addition, we need a flexible mechanism to specify the temporal composition of document elements. Finally, we also need to express close synchronization between some elements. Our proposal is based on the following concepts: hypertime links, time bases, and dynamic layout.

3.1. Hypertime links for temporal composition

We propose to use a simple functional paradigm derived from temporal point nets to specify temporal composition: a temporal link between an origin and a target. We call it a hypertime link by analogy to its WWW companion. A hypertime link has an explicit temporal semantics: it relates two media samples (e.g. video frames) and assigns time instants to the samples. Following a hypertime link is automatic in the sense that it does not require any user interaction and consists of skipping in time from the origin media sample to the target. The presentation of the target sample may be delayed according to the presentation rate of the media (for example in the case of video played at the nominal rate of 25 fps, the delay would be 40 ms). The action expressed by the link depends on the target: if the target is the beginning of a media object or a sample somewhere inside the object, the link activates the target. If the target is the end of a media segment, the link terminates the object.

We also extend the notion of the origin of a hypertime link to include the possibility of specifying a portion of a temporal medium (a range of samples) as an origin, in the same way as a portion of text can be an origin of a hypertext link.

A hypertime link provides an intuitive way to express time relations of two events. The functional relation has a nice analogy with hypertext links: a hypertext link has an origin and a target in a spatial document space. It expresses a relation between an origin place and a target one, and its activation allows the user to jump (instantaneously, in theory) from one place to another in the space of documents.

Figure 2 shows the logical and physical structures of an example multimedia document. Solid lines express relations between logical components of the document. Solid arrows represent standard hypertext links by means of which the user can browse different parts of the document. Dashed arrows are hypertime links that are used for specifying temporal composition. In the example, when the video clip finishes, it triggers the display of the text of Section 2 and activates the audio clip.


Fig. 2. An example multimedia document with hypertime links.

The notion of a hypertime link has many advantages. First, it is consistent with the current hypertext link paradigm, since both have similar semantics. This property is useful if we want to mix hypertime and hypertext links. A link may have both properties simultaneously: a hypertext/hypertime link defines a place (for example, a portion of text in a document) and a time interval during which the link can be activated. For example, we can specify that frames 63 to 157 form the origin of the link; if the user clicks while these frames are being played back, the link is activated. Second, a hypertime link is independent of any absolute time, because it only specifies a relative temporal relation between media samples. Third, this concept is simple yet powerful: any type of temporal scenario can be expressed using hypertime links.

3.2. Common time bases for close synchronization

Specification of temporal composition is not sufficient to play back a multimedia document. We need some more information about how media objects must be synchronized. This information is particularly useful in a computing environment that does not provide strong real-time support. In such an environment, different media segments started at the same instant may run out of synchronization after some time and require some corrective action (such as dropping samples) to become synchronized again. We want to be able to specify which objects should be kept synchronized, how often, and what the nature of this close synchronization is (in other words, who is the master of time).

For this purpose, we define the notion of a time base. A time base is a virtual time space in which media objects "live". A time base defines a common time coordinate space for all objects that are related by some relations, for example a master–slave dependency. A time base can be seen as a perfect time space in which real-world phenomena such as jitter or drift do not exist and media objects behave in a perfect manner. Obviously, such a perfect time space does not exist; however, it can be implemented closely enough using real-time support. If such support is not available, as is the case for many existing systems, a document should indicate how the quality of presentation is to be maintained and what kind of synchronization is to be enforced.

We define the nature of synchronization between media segments using the notions of master and slave. A master–slave relationship defines a master controlling a slave according to its needs (Fig. 3a). We extend this notion to multiple masters and slaves (Fig. 3b) through the common time base: a master can accelerate time or slow it down, so slaves and other masters must adjust to this time.

The master–slave relationship allows the user to easily define the behavior of media segments with respect to synchronization. Imagine, for example, a video clip synchronized with audio comments and close-captioning text. We might want to define the audio comments and the close-captioning text as masters to be sure that these critical data govern the playback of the video: we do not want to slow down the audio if, for some reason, the video is delayed. In this case, it is better to skip frames. A sketch of this scenario, using the tags defined in Section 3.4, is given below (after Fig. 3).


Fig. 3. Masters and slaves in a common time base.
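
As an illustration, and anticipating the tag syntax defined in Section 3.4, the video/audio/close-captioning scenario above could be sketched as follows (file names and identifiers are hypothetical):

   <timebase>
      <!-- the audio comments and the close-captioning text are masters -->
      <object id="comments" src="comments.au" role="master">
      </object>
      <object id="captions" src="captions.txt" role="master">
      </object>
      <!-- the video is a slave: frames may be skipped if it falls behind -->
      <object id="video" src="lecture.mpg" role="slave">
      </object>
   </timebase>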

Another way to control synchronization between media segments is through synchronization points. In the previous example, we have supposed that close synchronization should be enforced between audio, video and close-captioning text, but such a constraint may be too strong. When close lip-sync synchronization is not necessary, for example between a video clip and an audio comment, we do not need to enforce synchronization for each video frame or audio buffer. We rather specify some time instants, called synchronization points, at which synchronization should be enforced. Synchronization points can be specified at intermittent user-defined instants, for example at the beginning or at the end of a video shot, or at the instants when close-captioning text must be presented along with a video scene. Synchronization points may also be specified as periodic; for example, we can say that two media objects must synchronize every second. When no explicit synchronization points are defined, synchronization is enforced at the smallest possible grain, i.e. at each video frame or audio sample buffer.

The two mechanisms for synchronizing media segments, the master–slave relationship within a time base and synchronization points, allow authors to express complex synchronization constraints and thus ensure that the document will be played back as the authors intended, preserving the semantics of the document.

3.3. Media objects and dynamic layout

We suppose that a synchronized multimedia document may include a variety of media objects having temporal behavior. A media object defines the time evolution of media samples of one type. Media samples must be presented at precise time instants defined by the presentation rate. The rate may be imposed by the author, adapted to match the duration of another object, or adjusted to synchronize with other objects. A media object schedules the presentation of samples within a given time base. In this way, objects in the same time base are synchronized. We suppose that traditional media objects such as audio and video can be complemented with temporally scrolled text.

In addition to synchronized presentation of media samples, a media object can be controlled by other objects according to temporal composition. A hypertime link activated by another object can change the current presentation of an object and force it to skip to the target sample or to stop.

We define a dynamic layout as a special case of a media object: it defines the temporal behavior of the physical layout. The only difference is that layouts are neither masters nor slaves, since they do not contain media samples to be synchronized. A layout encapsulates frames, a means for defining regions of the screen in which media objects are presented. Frames can be mixed with static elements such as traditional HTML text paragraphs. Frames can also include other layouts to specify nested layouts that provide nested coordinate spaces. Hypertime links define how the layout changes in time. This approach allows seamless integration of the spatial and temporal dimensions of a multimedia document.

More work is needed to investigate how the notion of a dynamic layout can be integrated with CSS (Cascading Style Sheets). Such integration requires extending positioning in the X and Y directions, and layering/overlaying (Z positioning), with a fourth dimension: time. For the sake of presentation clarity, we define the layout below as an extension to HTML and not by means of CSS.

3.4. Temporal extensions to HTML

We present below the syntax of tags that correspond to the concepts presented in this section: hypertime links, time bases, synchronization points, media objects, and dynamic layout. Unlike SMIL, we define the syntax of tags and not a grammar. We describe the semantics of tags and attributes, and illustrate their use through examples. Keywords in bold typeface are terminal; those in normal typeface are non-terminal.

3.4.1. Hypertime link

The hypertime link tag relates an origin and a target defined in a media object or layout. We use the notion of time points to define an origin and a target.


   < htlink orig   = " object-id.time-point-id | layout-id.time-point-id |
                     [ object-id.time-point-id, object-id.time-point-id ] |
                     [ layout-id.time-point-id, layout-id.time-point-id ] "
	    target = " object-id.time-point-id | layout-id.time-point-id " >
      

Examples:
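
For instance, using hypothetical object and time-point identifiers, a hypertime link may relate the end of a video clip to the beginning of an audio comment, or take a range of samples as its origin:

   <!-- when the video clip ends, start the audio comment -->
   <htlink orig="video.end" target="audio.beg">

   <!-- frames 63 to 157 of the video form the origin of the link -->
   <htlink orig="[video.frame63, video.frame157]" target="audio.beg">

The time points frame63 and frame157 would be declared inside the video object (see Section 3.4.2).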

3.4.2. Time points

The time point tag defines a position in a media object or layout that can be used as an origin or a target of a hypertime link. The position can be of two types: nominal or absolute. A nominal position is defined using the nominal presentation rate of a media object. An absolute position is based on an absolute time coordinate of a time point.


   < time-point id    = " name "
                value = " integer value | float value "
                unit  = " frame | sample | timestamp | second | ... "
                type  = " nominal | absolute " >
      

Examples:
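
For instance (identifiers and values are illustrative), a time point may designate frame 300 of a video clip at its nominal presentation rate, or a position 10 seconds into an object in absolute time:

   <!-- frame 300 of a video clip, at its nominal presentation rate -->
   <time-point id="audio-trigger" value="300" unit="frame" type="nominal">

   <!-- a position 10 seconds from the beginning, in absolute time -->
   <time-point id="ten-seconds" value="10.0" unit="second" type="absolute">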

3.4.3. Media objects

The media object tag defines an object that refers to the URL of the media content, specifies the role for master or slave synchronization, and defines a scale temporal transformation. Within the tag, we can encapsulate other objects and define synchronization and time points. Unlike SMIL, we do not distinguish between video and image objects. The media object tag may contain arbitrary content presented using a given layout.


   < object id    = " name "
            src   = " url "
            role  = " master | slave "
            scale = " object-id | layout-id | float value " >
        objects
        synchronization points
        time points
        hypertime links
   < /object >
      

The scale attribute specifies, in a simple way, temporal control over a media object.

If the value of the scale attribute is an identifier of another object, the duration will be scaled to the duration of the referenced object. This is useful, for example, to play a video synchronized with an audio track so that the video changes its rate to terminate at the same time as the audio. It may also scale the duration of a layout to match a media object.

Examples:
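
As an illustration (file names are hypothetical), the following pair of objects defines a master audio comment and a slave video clip whose duration is scaled to that of the audio:

   <!-- a slave video clip scaled to the duration of the audio object -->
   <object id="video" src="clip.mpg" role="slave" scale="audio">
      <time-point id="audio-trigger" value="300" unit="frame" type="nominal">
   </object>

   <!-- a master audio comment played at its nominal rate -->
   <object id="audio" src="comment.au" role="master">
   </object>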

3.4.4. Dynamic layout

The layout tag is a special case of the media object tag. It specifies neither a synchronization role nor media content. Instead of encapsulating other objects, it encapsulates frames, a means for defining regions of the screen in which media objects may be presented.


   < layout id    = " name "
	    scale = " object-id | layout-id | float value " >
        frames
        time points
        hypertime links
   < /layout >
   < frame id    = " name "
           src   = " object-id | layout-id | url "
           layer = " integer value "
           shape = " shape "
           mask  = " mask " >
      

The layer attribute defines the priority of objects that can be overlaid. The shape and mask attributes are used to produce special visual effects.

Examples:
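
For instance (identifiers are illustrative), a layout may overlay a logo on top of a video frame, the layer attribute giving the logo a higher priority:

   <layout id="lay1">
      <!-- the video region, at the lowest layer -->
      <frame id="frm1" src="video" layer="1">
      <!-- the logo, overlaid on top of the video -->
      <frame id="frm2" src="logo" layer="2">
   </layout>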

3.4.5. Time bases

The time base tag groups layouts and media objects that should be synchronized closely. Synchronization roles (master or slave) are specified in objects themselves.


   < timebase >
        layouts
        objects
   < /timebase >
      

3.4.6. Synchronization points

The synchronization point tag specifies a synchronization point in a media object.


   < sync-point id     = " name "
                value  = " integer value | float value "
                period = " integer value | float value "
                unit   = " frame | sample | timestamp | second | ... "
                type   = " nominal |  absolute " >
      

Synchronization points beg and end are predefined and correspond to the beginning and the end of an object, respectively. They can be overloaded.

Examples:
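
For instance (identifiers are illustrative), a synchronization point may be placed at a given frame of a video, or declared periodic so that synchronization is enforced every second:

   <!-- synchronize at frame 250, e.g. at a shot boundary -->
   <sync-point id="shot-boundary" value="250" unit="frame" type="nominal">

   <!-- periodic synchronization every second -->
   <sync-point id="every-second" period="1" unit="second" type="absolute">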

3.4.7. Other possible extensions

To obtain a fully-fledged multimedia markup language, we need some more extensions; however, they require further work. In particular, we would like to add functionalities to define special effects on media objects such as zoom in/out, fade in/out, and many others. For audio objects, we would need to specify sound effects such as 3D, stereo, or echo. Many such effects depend on extended features implemented by media objects.

We would also like to add attributes for positioning, showing, hiding and moving frames in a layout. This could be used, for example, to describe a video presented in a layout frame that moves across the screen.

3.5. Example of a synchronized multimedia document

The example below presents a complex scenario with a dynamic layout. Figure 4 shows a graphic representation.


<timebase>
   <layout id="lay1">
      <frame id="frm1" src="video">
      <frame id="frm2" src="logo">
      <frame id="frm3" src="lay2">
   </layout>
   <layout id="lay2">
      <frame id="frm4" src="movie">
   </layout>
   <object id="video" src="clip.mpg" role="slave" scale="+2">
      <time-point id="audio-trigger" value="300" unit="frame"
                  type="nominal">
   </object>
   <object id="audio" src="comment.au" role="master">
   </object>
</timebase>
<timebase>
   <layout id="lay3">
      <frame id="frm5" src="logo">
      <frame id="frm6" src="movie">
   </layout>
</timebase>
<timebase>
   <object id="music" src="http://www.imag.fr/music.wav"
           role="master" scale="++1">
   </object>
</timebase>
<timebase>
   <object id="logo" src="logo.gif" scale="++1">
   </object>
</timebase>
<timebase>
   <object id="movie">
      <object id="see" src="hello.qt" role="slave">
      </object>
      <object id="hear" src="hello.au" role="master">
      </object>
      <htlink orig="hear.beg" target="see.beg">
      <htlink orig="hear.end" target="see.end">
   </object>
</timebase>
<htlink target="lay1.beg">
<htlink target="logo.beg">
<htlink target="video.beg">
   <htlink orig="video.beg" target="music.beg">
   <htlink orig="video.audio-trigger" target="audio.beg">
      <htlink orig="audio.end" target="video.end">
      <htlink orig="audio.end" target="lay3.beg">
<htlink target="movie.beg">
   <htlink orig="movie.end" target="music.end">
   <htlink orig="movie.end" target="logo.end">
      


Fig. 4. An example scenario.

When the document starts, three objects are activated: an animated logo that loops, a movie, and a video. video starts background music that loops. movie is a composite object containing a video track see and an audio track hear. movie encapsulates its internal temporal composition: hear starts see and stops it when it ends. All objects are presented using two nested layouts, lay1 and lay2. When video reaches frame 300, it starts audio, an associated audio track. audio stops video when it ends and changes the layout to lay3. Then, when movie ends, it stops music and logo, which terminates the presentation.

Close synchronization is specified as a time base that contains the first two layouts, lay1 and lay2, and the media objects video and audio. This means that they will be kept synchronized according to the roles specified by each object.

Since audio and video are in the same time base, they will be kept synchronized, audio being the master and video the slave. The same holds for see and hear. lay3, logo and music are the only objects in their respective time bases, so they are not closely synchronized with other objects.

4. Execution architecture for synchronized multimedia documents

4.1. Architecture

To support the model presented above, we have defined an execution architecture to play back documents specified using the proposed temporal extensions to HTML. The extensions are fairly low-level, so all the proposed concepts have their counterparts at the system level. The architecture is based on three components: synchronizable objects, synchronization events, and synchronization managers.

A synchronizable object integrates media and synchronization: it encapsulates the services needed for a medium to be played back and for controlling its execution. A synchronizable object controls the fine-grain scheduling of media samples at a nominal or requested rate. A synchronizable object is a media object wrapped with a synchronizable interface that defines the methods needed by the underlying multimedia architecture to handle intramedia and intermedia synchronization and scheduling. Synchronizable objects generate synchronization events and are controlled by synchronization managers. By clearly separating media processing functions from synchronization, the architecture becomes modular and dynamically extensible: objects that deal with new media or compression formats can be integrated in a seamless way.

Synchronization events convey time information between various entities. A synchronization event defines an action aimed at a target that will happen in the future. When its deadline is reached, the event is sent to the target to perform the required action. A hypertime link can be easily implemented using such a synchronization event. The deadline for a hypertime link can be obtained from the relative temporal information associated with the link.

Synchronization managers implement time bases. They manage a pool of synchronizable objects belonging to the same time base. They handle synchronization events on behalf of objects and enforce the synchronization policy defined in the time base by means of roles and synchronization points. Managers do not handle time themselves, but use the internal scheduler for scheduling events.

Figure 5 presents the global functional view of the architecture.


Fig. 5. Functional structure of the execution architecture.

The three concepts on which our architecture is based make the notion of time virtual. We can say that time in our architecture is elastic, which means that it provides flexible adaptive synchronization in a computing environment that does not have strong real-time support. If such support is available, the architecture guarantees quality of service.

4.2. Implementation

We have prototyped the concepts of our multimedia architecture using Java. We have experimented with temporal presentations containing synchronized video and audio clips, and close-captioning text in two languages. Figure 6 shows snapshots of video clips. Each of them is synchronized with two close-captioning text sequences.


Fig. 6. Video clips synchronized with close-captioning text.

Using Sun's JIT compiler on an UltraSparc, our prototype achieves 15 fps for small MPEG-1 videos (160×120). We expect better performance with TurboJ, developed at the Open Group Research Institute.

5. Conclusion

Integration of time into traditional WWW documents increases their complexity and may be successful only if all the required functionalities are available and if temporal extensions to HTML follow the spirit of the WWW. We have designed our temporal extensions with these objectives in mind. Hypertime links are simple, powerful, and analogous to WWW links. Time bases provide close intermedia synchronization features and dynamic layout opens new ways for specifying multimedia documents. More work is needed to specify useful attributes for media effects and the movement of frames.

We have also defined an execution architecture for the playback of multimedia documents. We have prototyped the architecture and experimented with simple synchronized presentations. Our experience shows that the temporal extensions to HTML provide an interesting contribution to SMIL.

References

[All83] J.F. Allen, Maintaining knowledge about temporal intervals, Communications of the ACM, 26(11), November 1983.
[BZ93] M.C. Buchanan and P.T. Zellweger, Automatic temporal layout mechanisms, in: Proc. ACM Multimedia'93, Anaheim, CA, August 1–6, 1993, pp. 341–350.
[EMB95] W. Effelsberg and T. Meyer-Boudnik, MHEG explained, IEEE MultiMedia, 2(1): 26–38, Spring 1995.
[Ham72] C.L. Hamblin, Instants and intervals, in: Proc. 1st Conf. of the Intl. Society for the Study of Time, New York, 1972, pp. 324–331.
[HRvL96] I. Herman, G.J. Reynolds, and J. van Loo, PREMO: An emerging standard for multimedia presentation, IEEE MultiMedia, 3(3–4), Fall–Autumn 1996.
[HyT97] Information technology — Hypermedia/Time-Based Structuring Language (HyTime), International Standard ISO/IEC 10744:1997.
[KD96] C. Keramane and A. Duda, Interval expressions — a functional model for interactive dynamic multimedia presentations, in: Proc. IEEE International Conference on Multimedia Computing and Systems (ICMCS'96), Hiroshima, Japan, June 1996.
[NKN91] S.R. Newcomb, N.A. Kipp, and V.T. Newcomb, The "HyTime" hypermedia/time-based document structuring language, Communications of the ACM, 34(11): 67–83, November 1991.
[RV94] C. Roisin and I. Vatton, Merging logical and physical structures in documents, Electronic Publishing, 6(4): 327–337, April 1994.
[WDG95] R. Weiss, A. Duda and D.K. Gifford, Composition and search with a video algebra, IEEE MultiMedia, 2(1): 12–25, Spring 1995.
[WR94] T. Wahl and K. Rothermel, Representing time in multimedia systems, in: Proc. of the IEEE International Conference on Multimedia Computing and Systems (ICMCS'94), Boston, MA, May 14–19, 1994, pp. 538–543.

Vitae

Franck Rousseau is a Ph.D. candidate at LSR-IMAG laboratory and a member of technical staff at the Open Group Research Institute in Grenoble. He is supported by Bull. His current research interests include multimedia documents, architectures and communication.
 
Andrzej Duda is a Professor at INPG (Institut National Polytechnique de Grenoble). He is a member of LSR-IMAG laboratory in Grenoble. Previously, he was a Visiting Scientist at the MIT Laboratory for Computer Science, a Research Scientist at CNRS, and an Assistant Professor at the Université de Paris-Sud. He worked on the design and performance evaluation of distributed systems and his current research interests include distributed multimedia systems, information access, resource discovery and new network applications.