Tutorial abstracts

Enhancing Search Relevance: Machine Learning Techniques for Better Matching of Query and Document

In this tutorial, we will give a systematic and detailed presentation of newly developed machine learning technologies for query-document matching in search. We will focus on descriptions of the fundamental problems as well as the recent solutions. People in industry can get a summary of the state-of-the-art methods and think about how to apply them in practice, and people in academia can get a reference to the recent work and leverage the results in their own research. Matching between query and document is not limited to search: similar problems arise in online advertising, recommender systems, and other applications, all of which can be cast as matching between objects from two spaces. The technologies we introduce can be generalized into a broader family of machine learning techniques, which we call learning to match.
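
As a minimal illustration of the learning-to-match idea (a generic sketch, not material from the tutorial itself), a query and a document can each be represented as a feature vector and scored with a learned bilinear form; all names and values below are hypothetical.

    import numpy as np

    def match_score(q_vec, d_vec, W):
        """Bilinear matching score s(q, d) = q^T W d between a query
        and a document, each represented as a dense feature vector."""
        return float(q_vec @ W @ d_vec)

    # Toy example with 3-dimensional features; in practice the interaction
    # matrix W would be learned from relevance judgments or click data.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 3))
    query = np.array([0.2, 0.7, 0.1])
    document = np.array([0.3, 0.5, 0.2])
    print(match_score(query, document, W))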

Location-Based Social Networks

In this tutorial, we first introduce and define the meaning of location-based social network (LBSN). Then, we discuss the research philosophy behind LBSNs from the perspective of users and locations, and point out its unique challenges beyond traditional social networks. Later, we explore some representative research in this field and summarize the general methodologies that can be used in this research theme.

Mining, Searching and Exploiting Collaboratively Generated Content on the Web

The proliferation of ubiquitous access to the Internet enables millions of Web users to collaborate online on a variety of activities. Many of these activities result in the construction of large repositories of knowledge, either as their primary aim (e.g., Wikipedia) or as a by-product (e.g., Yahoo! Answers). In this tutorial, we will discuss mining and exploiting collaboratively generated content (CGC) on the Web for web search and information retrieval tasks. Specifically, we intend to cover two complementary areas of the problem: (1) organizing and filtering collaboratively generated content and (2) using CGC as a powerful enabling resource for knowledge-enriched, intelligent text representations and new web search and information retrieval techniques.

Querying and Exchanging XML and RDF on the Web

This tutorial is intended to benefit researchers and system designers in the broad area of scalable query engines for XML and RDF. It will be useful both to designers of such query engines and to their users, since a survey of the current systems and an in-depth understanding of them are essential for choosing an appropriate system as well as for designing an effective one. The tutorial does not require any prior knowledge of XML or RDF query engines.

The Web of Things: Integrating Sensors to the Web

The tutorial on the Web of Things will walk through the full system stack, identifying the relevant components, illustrating their functionality, and showing existing tools and systems. First, the tutorial will cover architectural aspects and discuss the levels of abstraction for integrating "things" into the Web. Then it will focus on semantic technologies and analytic methods for building services and applications on top of the things. Finally, state-of-the-art technology and tools will be shown through live demos. Existing projects and research directions will also be presented.

Watson and the Deep QA architecture

Open-domain Question Answering (QA) is a long-standing research problem. Recently, IBM took on this challenge in the context of Jeopardy!, a well-known TV quiz show that has been airing on television in the United States for more than 25 years. It pits three human contestants against one another in a competition that requires answering rich natural language questions over a very broad domain of topics. The development of a system able to compete with grand champions in the Jeopardy! challenge led to the design of the DeepQA architecture and the implementation of Watson. The DeepQA project shapes a grand challenge in Computer Science that aims to illustrate how the wide and growing accessibility of natural language content and the integration and advancement of Natural Language Processing, Information Retrieval, Machine Learning, Knowledge Representation and Reasoning, and massively parallel computation can drive open-domain automatic Question Answering technology to a point where it clearly and consistently rivals the best human performance.
Natural Language Processing (NLP) plays a crucial role in the overall DeepQA architecture. It makes it possible to "make sense" of both the question and the unstructured knowledge contained in the large corpora where most of the answers are located. Semantic Web technology, enhanced by a massive use of open linked data, is another key component of Watson. Linked data and triple stores have been used to generate candidate answers and to score them from multiple points of view, such as type coercion and geographic proximity. In addition, the connection between linked data and natural language text offered by Wikipedia has been very useful for generating open-domain training data for relation detection and entity recognition.

The Role of Human-Generated and Automatically-Extracted Lexico-Semantic Resources in Web Search

This tutorial examines the role of knowledge from lexico-semantic resources, whether human-generated or automatically extracted, in information retrieval in general and Web search in particular. It teaches the audience about the characteristics, advantages, and limitations of existing human-generated resources; methods for extracting open-domain classes, instances, and relations from the Web; the role of human-generated vs. automatically-extracted knowledge resources in enhancing information retrieval; and implications for semantic annotation of queries, understanding of query intent, and information access and retrieval in general.
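
As a rough, purely illustrative sketch of one of the extraction methods mentioned above (not the tutorial's own implementation), class-instance pairs can be harvested from Web text with Hearst-style lexical patterns such as "C such as I1, I2 and I3":

    import re

    # Simplified Hearst-style pattern: "<class> such as <Instance>, <Instance> and <Instance>".
    # Instances are restricted to single capitalized tokens to keep the sketch short.
    PATTERN = re.compile(
        r"(\w+)\s+such as\s+"
        r"([A-Z]\w+(?:,\s*[A-Z]\w+)*(?:,?\s*and\s+[A-Z]\w+)?)"
    )

    def extract_pairs(sentence):
        """Return (class, instance) pairs matched by the pattern."""
        pairs = []
        for match in PATTERN.finditer(sentence):
            cls = match.group(1)
            instances = re.split(r",\s*|\s+and\s+", match.group(2))
            pairs.extend((cls, inst) for inst in instances if inst)
        return pairs

    print(extract_pairs("Tourists flock to capitals such as Paris, Berlin and Madrid every year."))

Real extractors combine many such patterns with part-of-speech information and aggregate evidence across a large Web corpus to filter out noisy pairs.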

CSS3 for Style (W3C Tutorial)

The goal of the tutorial is to explain the newest features of Cascading Style Sheets (CSS). The tutorial will teach how to use the features in the CSS Snapshot 2010 that are new since 1999 (i.e., since CSS Level 2). These features can be applied to all existing versions of HTML and XHTML, and also work for the next version, HTML5. That includes such features as Media Queries (style sheets that are adapted for specific devices), Namespaces (styles for XML files with mixed vocabularies), semi-transparent colors, and many new selectors. The tutorial will also take a brief look at the features expected in the next snapshot of CSS, for which there are currently some experimental implementations.

Developing Mobile Web Applications (W3C Tutorial)

Participants in the tutorial will learn how to build applications for mobile devices using Web technologies. We will first focus on what makes using the Web on mobile devices different from using it on desktop computers: the specific constraints of these devices, as well as their growing set of specific advantages.

We will then learn how to work around these constraints to provide a good user experience on mobile devices:

  • learning how best practices can help hide the limitations of mobile devices
  • using features from HTML5 and CSS to make sites more mobile-friendly
  • using the exciting new APIs available on modern mobile platforms
  • and looking at which content adaptation solutions can be used to cater to a large number of devices

The tutorial will then look at how to exploit all the specificities of the mobile user experience, via JavaScript APIs, touch interactions, camera integration, etc.

Practical Cross-Dataset Queries on the Web of Data

The web is developing into a platform for data exchange, as shown by the rise of web APIs, Microdata, Schema.org, Facebook’s Open Graph Protocol, and the Linked Open Data Cloud. All these sources of web data have one thing in common: they can be converted to the RDF data model with off-the-shelf tools. In this tutorial, participants will learn how to do ad-hoc queries and data mashups across such datasets using W3C’s SPARQL query language. Emphasis is placed on practical recipes and hands-on sessions.
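
For a concrete flavour of such an ad-hoc query, the short sketch below sends a SPARQL query to the public DBpedia endpoint using the SPARQLWrapper Python library; the endpoint, ontology terms, and query are illustrative choices, not material taken from the tutorial.

    # pip install SPARQLWrapper
    from SPARQLWrapper import SPARQLWrapper, JSON

    # DBpedia is one publicly queryable node of the Linked Open Data Cloud.
    endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
    endpoint.setReturnFormat(JSON)

    # Illustrative query: countries and their capitals.
    endpoint.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?country ?capital WHERE {
            ?country a dbo:Country ;
                     dbo:capital ?capital .
        }
        LIMIT 10
    """)

    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["country"]["value"], "->", row["capital"]["value"])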

HTML5 tutorial

The tutorial will present the main characteristics of HTML5 and focus on selected parts such as the WebSocket API, the HTML5 canvas element, external JavaScript libraries such as jQuery Mobile for smartphone games, semantics, database persistence, etc. During the exercises, a small multiplayer Pictionary game involving several participants in real time will be developed on top of a Node.js server.

Large Scale Machine Learning and Its Applications to Web Information Management

Machine learning techniques have been widely used in web information management, such as document retrieval, text categorization, web page clustering, and social network analysis. One of the main challenges in applying learning algorithms to web data management is scalability: many well-established learning algorithms are unable to handle the scale of web data. This tutorial will cover the general theories of large-scale machine learning as well as tools and algorithms that have been developed for learning from web-scale data sets. It will also highlight the web information management problems where these algorithms have been successfully applied.
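
As one concrete illustration of the scalability issue (a generic sketch, not an algorithm taken from the tutorial), stochastic gradient descent trains a model by streaming over mini-batches, so memory use stays constant no matter how large the dataset is:

    import numpy as np

    def sgd_logistic_regression(batches, dim, lr=0.1):
        """Train logistic regression by streaming over mini-batches,
        keeping memory use independent of the total dataset size."""
        w = np.zeros(dim)
        for X, y in batches:                   # X: (batch, dim), y: (batch,) in {0, 1}
            p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted relevance probabilities
            grad = X.T @ (p - y) / len(y)      # gradient of the logistic loss
            w -= lr * grad
        return w

    # Toy data stream standing in for a web-scale dataset.
    rng = np.random.default_rng(0)
    def toy_batches(n_batches=200, batch=32, dim=5):
        true_w = rng.normal(size=dim)
        for _ in range(n_batches):
            X = rng.normal(size=(batch, dim))
            y = (X @ true_w > 0).astype(float)
            yield X, y

    print(sgd_logistic_regression(toy_batches(), dim=5))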

Integrating and Ranking Aggregated Content on the Web

In this tutorial, we will present the core problems associated with content aggregation, which include: sources of predictive evidence, sources of training data, relevance modelling, and evaluation. While much of the aggregation literature is in the context of Web search, we also present material related to aggregation more generally. Furthermore, we will present material from both academic and commercial perspectives and review solutions developed in both environments. This will provide a holistic view for researchers and a set of tools for different types of practitioners.

Digital Advertising and Marketing: A review of three generations

Over the past 15 years online advertising, a $65 billion industry worldwide in 2010, has been pivotal to the success of the World Wide Web. This success has arisen largely from the transformation of the advertising industry from a low-tech, human-intensive, "Mad Men" way of doing work (commonplace for much of the 20th century and the early days of online advertising) to the highly optimized, mathematical, computer-centric processes (some of them adapted from Wall Street) that form the backbone of many current online advertising systems. We have already been through three generations of digital advertising. Generation 1 borrowed offline sales techniques (a human sales force) and pricing models (CPM); generation 2 introduced automatic pricing (via online auctions and CPC pricing models) and automatic targeting; and generation 3 has focused on personalization (e.g., behavioral targeting) and new marketplaces such as ad exchanges.
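
The auction-based CPC pricing mentioned for generation 2 can be illustrated with a bare-bones second-price auction, in which the highest bidder wins the ad slot but pays the second-highest bid, and is charged only when the ad is clicked. The sketch below uses made-up bids and is not a description of any production system:

    def second_price_auction(bids):
        """bids: dict mapping advertiser -> bid (dollars per click).
        Returns (winner, price): the highest bidder pays the second-highest bid."""
        ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
        winner = ranked[0][0]
        price = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
        return winner, price

    winner, cpc = second_price_auction({"ad_a": 1.50, "ad_b": 1.10, "ad_c": 0.75})
    print(winner, cpc)  # ad_a wins and pays $1.10, charged per click

Production systems typically rank ads by bid weighted by a predicted click-through rate rather than by bid alone.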