Tutorials May 19 | International World Wide Web Conference

Tutorial #9 – Diffusion in Social and Information Networks: Research Problems, Probabilistic Models & Machine Learning Methods

(Morning)

Presenters

Manuel Gomez Rodriguez, Max Planck Institute for Software Systems, Germany
Le Song, Georgia Institute of Technology, USA

In recent years, there has been a growing effort to develop realistic models, as well as learning and inference algorithms, to understand, predict, and influence diffusion over networks. This has been due in part to the increasing availability and granularity of large-scale diffusion data, which, in principle, allows for understanding and modeling not only macroscopic diffusion but also microscopic (node-level) diffusion. To this end, a bottom-up approach has typically been taken: it starts by considering how particular ideas, pieces of information, products, or, more generally, contagions spread locally from node to node, apparently at random, and later produce global, macroscopic patterns at the network level. However, this bottom-up approach also raises significant modeling, algorithmic, and computational challenges, which require leveraging methods from machine learning, probabilistic modeling, event history analysis, and graph theory, as well as the nascent field of network science. In this tutorial, we will present several diffusion models designed for fine-grained, large-scale diffusion data, present some canonical research problems in the context of diffusion, and introduce state-of-the-art algorithms to solve some of these problems, in particular network estimation, influence estimation, and influence control.
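
As a rough illustration of node-to-node spreading and of influence estimation, the sketch below simulates the discrete-time independent cascade model and averages the cascade size over many runs. This is a deliberately simple stand-in and not necessarily one of the models presented in the tutorial; the function names, the single transmission probability `p`, and the toy graph are all assumptions made for the example.

```python
import random


def independent_cascade(adj, seeds, p, rng):
    """Run one cascade: each newly active node gets a single chance to
    activate each inactive out-neighbor, succeeding with probability p."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_influence(adj, seeds, p, n_runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(adj, seeds, p, rng)) for _ in range(n_runs))
    return total / n_runs


# Hypothetical toy network: directed edges as adjacency lists.
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(estimate_influence(toy_graph, seeds=["a"], p=0.3))
```

Averaging over simulated cascades is only one way to estimate influence; it is easy to write down but becomes expensive on large networks.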

Tutorial #10 – Processing Large Graphs: Representations, Storage, Systems and Algorithms

(Morning)

Presenters

Deepak Ajwani, Bell Labs Ireland, Ireland
Alessandra Sala, Bell Labs Ireland, Ireland
Marcel Karnstedt, Bell Labs Ireland, Ireland

Analyzing and processing large graphs is of fundamental importance for an ever-growing number of applications. Significant advances in the last few years on both the systems and the algorithmic side have made graph processing increasingly scalable and efficient. Often, these advances are still not well known or well understood outside the systems and algorithms communities. In particular, there is very little understanding of the various trade-offs involved in using particular combinations of algorithms, data structures, and systems. This tutorial will focus on this aspect, imparting theoretical knowledge intertwined with hands-on experience.

Since no system/algorithm combination clearly performs best on all metrics, it is of utmost importance to understand the pros and cons of the various alternatives. The tutorial will enable application developers in industry and academia, as well as students and researchers, to make the corresponding decisions in an informed way. Participants need neither any particular prior knowledge, apart from a basic understanding of core computer science concepts, nor any special equipment apart from a laptop.

After a general introduction, we will describe the critical dimensions that need to be tackled together to effectively and efficiently overcome problems in large graph processing: data representation, data storage, acceleration via multi-core programming, and horizontally scalable graph-processing infrastructures. Thereafter, we will provide an overview of existing graph-processing systems and graph databases. This will be followed by hands-on experience with popular representatives of such systems. Finally, we will provide a detailed description of the algorithms used in these systems for fundamental problems like shortest paths and PageRank, how they are implemented, and how this affects overall performance. We will also cover basic data structures, such as distance oracles, that can be built on these systems to efficiently answer distance queries for real-world graphs.
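
To make one of the fundamental problems named above concrete, here is a minimal single-machine sketch of PageRank by power iteration over an adjacency-list representation. It is not taken from any of the systems covered in the tutorial; the function name, the damping factor of 0.85, the convergence tolerance, and the toy graph are assumptions, and every node is assumed to appear as a key in the dictionary.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank on a dict mapping node -> list of out-neighbors.

    Rank held by dangling nodes (no out-links) is spread uniformly over all
    nodes so that the ranks keep summing to 1.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not adj[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for w in adj[v]:
                    new_rank[w] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: "c" is linked to by every other node.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for node, score in sorted(pagerank(toy_graph).items()):
    print(node, round(score, 4))
```

A dictionary of lists is convenient for a sketch but memory-hungry at scale, which is one reason data representation appears among the dimensions listed above.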

Tutorial #11 – An Introduction to Entity Recommendation and Understanding

(Morning)

Presenters

Hao Ma, Microsoft Research, USA
Yan Ke, Microsoft Bing, USA

Recent years have witnessed rapidly increasing interest in semantic search research. Knowledge-base-powered entity search and recommendation experiences have been widely adopted by major search engine companies. In this tutorial, we provide the first detailed introduction to how entity search and recommendation work and how various entity understanding techniques could further improve the entity search and recommendation experience.

This tutorial consists of four major parts. In the first part, we give a brief introduction to entities and knowledge bases. We also show how we collect information from different data sources and how we infer users’ interests in specific entities. In the second part, we demonstrate various entity recommendation and search applications that the presenters developed and productionized in Bing, including entity recommendation, natural language interpretation of recommendation, attribute ranking, carousel ranking, entity exploration, factoid answers, conversational search, semantic question answering, etc. The architectures, challenges, and corresponding solutions for these systems will also be briefly introduced in the second part. The third part will give a deep dive into the algorithms related to entity recommendation, including basic non-personalized recommendation algorithms as well as recommendation models that tailor related entities to an individual search user’s unique tastes and preferences. The fourth part will focus on how to further improve the semantic recommendation and search experience by employing other entity understanding techniques. The tutorial will conclude by summarizing and reflecting on the semantic search applications that users are experiencing on the Web, and will posit that what we have presented is just the tip of the iceberg of a whole area of exciting and dynamic research that is worthy of more detailed investigation for many years to come.

Tutorial #12 – Online Experiments for Computational Social Science

(Afternoon)

Presenters

Eytan Bakshy, Facebook, USA
Sean J. Taylor, Facebook, USA

Experiments are the gold standard for establishing causal relationships. While Web-based experiments (“A/B tests”) have routinely been used to assess alternative ranking models or user interface designs, they have become increasingly popular for answering important questions in the social sciences. This tutorial teaches attendees how to design, plan, implement, and analyze online experiments. First, we review basic concepts in causal inference and motivate the need for experiments. Next, we discuss basic statistical tools that help plan experiments: exploratory analysis, power calculations, and the use of simulation in R. We then cover statistical methods for estimating causal quantities of interest and constructing appropriate confidence intervals. After that, we discuss how to design and implement online experiments using PlanOut, an open-source toolkit for advanced online experimentation used at Facebook. We will show how to implement a variety of experiments, including basic A/B tests, within-subjects designs, and more sophisticated experiments. We demonstrate how experimental designs from the social computing literature can be implemented, and then collaboratively plan and implement an experiment. We also discuss issues with logging and common errors in the deployment and analysis of experiments. Finally, we conclude with a discussion of strategies and scalable methods for analyzing online experiments, including working with weighted data and data with single and multi-way dependence. Throughout the tutorial, attendees will be given code examples and will participate in the planning, implementation, and analysis of a Web application using Python, PlanOut, and R.
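
The idea of a power calculation by simulation can be sketched in a few lines. The abstract mentions doing this in R; the version below uses Python instead and does not use PlanOut. The function name, the two-proportion z-test, the 5% two-sided significance level, and the conversion rates in the usage example are all assumptions chosen for illustration.

```python
import math
import random


def simulate_power(p_control, p_treatment, n_per_arm, n_sims=500, seed=0):
    """Estimate the power of a two-sided two-proportion z-test at the 5% level
    by repeatedly simulating an A/B test with binary outcomes."""
    rng = random.Random(seed)
    z_crit = 1.959963984540054  # two-sided 5% critical value of the normal
    rejections = 0
    for _ in range(n_sims):
        # Simulate the number of successes in each arm.
        control = sum(rng.random() < p_control for _ in range(n_per_arm))
        treatment = sum(rng.random() < p_treatment for _ in range(n_per_arm))
        pooled = (control + treatment) / (2 * n_per_arm)
        std_err = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
        if std_err == 0:
            continue  # no variation in the outcomes, so no rejection
        z = (treatment - control) / n_per_arm / std_err
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


# Hypothetical planning question: how does power grow with sample size
# when the true rates are 10% (control) and 13% (treatment)?
for n in (200, 1000, 2000):
    print(n, simulate_power(0.10, 0.13, n))
```

Running the same function with equal rates in both arms gives a quick check of the false positive rate, which should land close to the nominal 5%.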

Tutorial #13 – Constructing and Mining Web-scale Knowledge Graphs

(Afternoon)

Presenters

Antoine Bordes, Facebook, USA
Evgeniy Gabrilovich, Google, USA

Recent years have witnessed a proliferation of large-scale knowledge graphs, such as Freebase, YAGO, Facebook’s Entity Graph, Google’s Knowledge Graph, and Microsoft’s Satori. Whereas there is a large body of research on mining homogeneous graphs, this new generation of information networks is highly heterogeneous, with thousands of entity and relation types and billions of instances of vertices and edges. In this tutorial, we will present the state of the art in constructing, mining, and growing knowledge graphs. The purpose of the tutorial is to equip newcomers to this exciting field with an understanding of the basic concepts, tools and methodologies, available datasets, and open research challenges. A publicly available knowledge base (Freebase) will be used throughout the tutorial to exemplify the different techniques.

Tutorial #14 – From Complex Object Exploration to Complex Crowdsourcing

(Afternoon)

Presenters

Sihem Amer-Yahia, Centre National de la Recherche Scientifique/Laboratory of Informatics of Grenoble, France
Senjuti Basu Roy, University of Washington Tacoma, USA

Forming and exploring complex objects is at the heart of a variety of emerging web applications. Historically, work on complex objects has developed in two separate areas: composite item retrieval and team formation. Composite item retrieval is prevalent in online shopping, where products are bundled together to provide discounts, and in travel itinerary recommendation, where points of interest in a city are combined into a single trip offer. Team formation is encountered in social network analysis and group recommendation, where the objective is to form a team to solve a problem or to consume some items together. At the same time, emerging applications that harness the wisdom of crowd workers, such as document editing by workers, sentence translation by fans (fan-subbing), innovative design, and citizen science or journalism, represent complex crowdsourcing, in which an object may be a complex task formed by a set of sub-tasks or a team of workers who work together to solve the task. The goal of this tutorial is to bridge the gap between composite item retrieval and team formation and to define new research directions for complex crowdsourcing applications.

This tutorial will start with a review of a number of web applications that have been developed in practice over the last few years and then move on to a summary of the various formalizations proposed to solve complex object formation in those applications. It will then focus on the algorithmic challenges and solutions, together with a summary of the empirical findings. The last part, which makes up a third of the material, will study how formalizations and solutions that were developed in silos in composite item retrieval and in team formation apply to emerging web applications in complex crowdsourcing.