Multimedia
Thursday, 10:30 AM – 12:00 PM
Chair: Wei-Ying Ma
Statistical Models of Music-listening Sessions in Social Media
Elena Zheleva, John Guiver, Eduarda Mendes Rodrigues, Natasa Milic-Frayling
User experience in social media is characterized by rich interaction with media content and with other participants in the online community. We use statistical models of increasing complexity to describe the patterns of music listening in online communities. First, we adapt the LDA model to capture users' taste in songs and identify the corresponding clusters of media and users. Second, we define a graphical model that takes listening sessions into consideration and captures the listening mood of users. Our session model yields clusters of media and users that capture the behavior exhibited across listening sessions, and it allows faster inference than the LDA model. Our experiments with data from an online media site, the Zune Social music community, demonstrate that the session model achieves lower perplexity on music genre co-occurrence than both the LDA-based taste model, which does not incorporate cross-session information, and a baseline model that does not use latent clusters.
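As a rough sketch of the first modeling step, the snippet below fits an LDA-style taste model in which each user's listening history is a document and each song is a token, using the gensim library; the toy data, topic count, and pass count are illustrative assumptions rather than the authors' setup.

    # LDA "taste" model sketch: one document per user, songs as tokens.
    # Toy data and parameter values are illustrative, not the paper's setup.
    from gensim import corpora, models

    # Each inner list is one user's listening history (song IDs as tokens).
    histories = [
        ["song_a", "song_b", "song_a", "song_c"],
        ["song_c", "song_d", "song_d", "song_e"],
        ["song_a", "song_e", "song_b", "song_b"],
    ]

    dictionary = corpora.Dictionary(histories)           # song-ID vocabulary
    corpus = [dictionary.doc2bow(h) for h in histories]  # bag-of-songs per user

    # Latent "taste" clusters over songs; num_topics is an assumed value.
    lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)

    for user_bow in corpus:                  # per-user mixture over clusters
        print(lda.get_document_topics(user_bow))
    print(lda.show_topics(num_words=3))      # top songs per cluster
    print(lda.log_perplexity(corpus))        # perplexity-style fit measure

The per-user topic mixtures play the role of taste profiles; the session model described above additionally conditions on session boundaries, which this sketch does not capture.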
Unlocking the Semantics of Multimedia Presentations in the Web with the Multimedia Metadata Ontology
Carsten Saathoff, Ansgar Scherp
The semantics of rich multimedia presentations in the Web such as SVG, SMIL, and Flash can be understood by today's search engines only to a very limited extent, if at all. This hampers the retrieval of such presentations and makes their archival and management difficult. Existing metadata models and metadata standards are either conceptually too narrow, focus on a single media type, cannot be combined with each other, or are not practically applicable to the semantic description of such rich multimedia presentations. In this paper, we propose the Multimedia Metadata Ontology (M3O) for annotating rich, structured multimedia presentations. The M3O provides a generic modeling framework for representing sophisticated multimedia metadata and allows the features provided by existing metadata models and standards to be integrated. Our approach is based on Semantic Web technologies and can easily be integrated with multimedia formats such as the W3C standards SMIL and SVG. With the M3O, we unlock the semantics of rich multimedia presentations in the Web by making them machine-readable and machine-understandable. The M3O is used within our SemanticMM4U framework for the multi-channel generation of semantically rich multimedia presentations.
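As a loose illustration of making presentation semantics machine-readable, the snippet below attaches RDF annotations to an SVG presentation with rdflib; the m3o namespace URI and the class and property names are placeholders standing in for the published M3O vocabulary, which is not reproduced here.

    # RDF annotation sketch for a multimedia presentation using rdflib.
    # The m3o namespace, classes, and properties below are placeholders,
    # not the published M3O vocabulary.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    M3O = Namespace("http://example.org/m3o#")  # placeholder namespace
    g = Graph()
    g.bind("m3o", M3O)

    presentation = URIRef("http://example.org/presentations/demo.svg")
    title_region = URIRef("http://example.org/presentations/demo.svg#title")

    g.add((presentation, RDF.type, M3O.MultimediaPresentation))  # placeholder class
    g.add((presentation, DCTERMS.title, Literal("Demo presentation")))
    g.add((title_region, RDF.type, M3O.AnnotatedRegion))          # placeholder class
    g.add((presentation, M3O.hasPart, title_region))              # placeholder property

    print(g.serialize(format="turtle"))  # machine-readable Turtle output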
What are the Most Eye-Catching and Ear-Catching Features in the Video? Implications for Video Summarization
Yaxiao Song, Gary Marchionini, Chi Young Oh
With the rapid growth of computing technology and the explosive proliferation of digital videos online, it is imperative to give web users effective summarization and skimming tools that facilitate finding and browsing videos. Video summarization, a mechanism for generating short summaries of videos, has become a common approach for helping users browse and retrieve relevant videos from large video collections. To produce reliable automatic video summarization algorithms, it is essential to first understand how human beings create video summaries manually. This paper examines a set of video summaries created by multiple human assessors for instructional documentary videos and identifies the most eye-catching and ear-catching features in these manually generated summaries. The paper provides insights into which features automatic algorithms should look for when performing video summarization.
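To make the intended downstream use concrete, the sketch below ranks video segments by a weighted combination of visual and audio salience cues and greedily fills a summary time budget; the feature names and weights are illustrative assumptions, not findings reported in the paper.

    # Summary-selection sketch: rank segments by assumed eye-/ear-catching
    # features and pick the best within a time budget. Features and weights
    # are illustrative, not the paper's empirical findings.
    from dataclasses import dataclass

    @dataclass
    class Segment:
        start: float            # seconds
        end: float
        motion: float           # visual-activity score in [0, 1]
        face_area: float        # fraction of frame covered by faces
        speech_emphasis: float  # audio-emphasis score in [0, 1]

    def salience(seg: Segment) -> float:
        # Weighted sum of assumed salience features.
        return 0.4 * seg.motion + 0.3 * seg.face_area + 0.3 * seg.speech_emphasis

    def summarize(segments, budget_seconds):
        # Greedily take the most salient segments until the budget is spent.
        chosen, used = [], 0.0
        for seg in sorted(segments, key=salience, reverse=True):
            length = seg.end - seg.start
            if used + length <= budget_seconds:
                chosen.append(seg)
                used += length
        return sorted(chosen, key=lambda s: s.start)  # restore temporal order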