Tutorials
Events schedule
May
13
9:00
12:30
The Practice of Labeling: everything you always wanted to know about labeling
Organizers:
Omar Alonso
Description:
Many data-intensive applications that use machine learning or artificial intelligence techniques depend on humans to provide the initial dataset, so that algorithms can process the rest, or to evaluate the performance of such algorithms. There are, however, practical issues with the adoption of human computation and crowdsourcing at scale in the real world. Building systems and data processing pipelines that require crowd computing remains difficult. In this tutorial, we present practical considerations for designing and implementing tasks that require the use of humans and machines in combination, with the goal of producing high-quality labels.
May
13
9:00
12:30
Deep Chit-Chat: Deep Learning for Chatbots
Organizers:
Wei Wu and Rui Yan
Description:
The tutorial is based on long-term efforts to build conversational models for chatbots with deep learning approaches. We will summarize the fundamental challenges in modeling open domain dialogues, clarify how they differ from modeling goal-oriented dialogues, and give an overview of state-of-the-art methods for open domain conversation, including both retrieval-based and generation-based methods. In addition, our tutorial will cover some new trends in chatbot research, such as how to design a reasonable evaluation metric and how to conduct dialogue management for conversational systems in the open domain.
May
13
9:00
12:30
Sequence-aware Recommender Systems
Organizers:
Paolo Cremonesi, Massimo Quadrana and Dietmar Jannach
Description:
In recent years, more and more recommendation algorithms have been proposed that are based on time-ordered user interaction logs. Algorithms for session-based recommendation tasks are among the most prominent examples of such approaches. Unlike more traditional rating prediction algorithms, sequence-aware algorithms are typically designed to learn sequential patterns from user behavior data. These patterns can then be used to predict the user's next action within an ongoing session or to detect short-term trends in the community. In this tutorial, we first outline the application areas of sequence-aware recommendation. We then focus on sequential and session-based recommendation techniques and discuss algorithmic proposals as well as evaluation challenges. Finally, the tutorial will conclude with a hands-on session.
May
13
9:00
12:30
Scalable Subgraph Counting: The Methods Behind The Madness
Organizers:
Comandur Seshadhri and Srikanta Tirthapura
Description:
Subgraph counting is a fundamental and widely applied problem in graph analysis that asks to count or approximate the occurrences of a small subgraph (pattern) in a large graph (dataset). The last few years have seen a rich literature develop around scalable solutions for this challenging problem. While research results have so far appeared as a somewhat disconnected set of ideas, we observe a few common algorithmic building blocks that they build on. In this tutorial, we summarize the state-of-the-art in terms of such building blocks, and highlight the practical utility of various approaches. We will also cover methods for subgraph counting in "big data" computational models such as the streaming model and parallel and distributed models.
May
13
9:00
12:30
Cloud Economics
Organizers:
Ian Kash
Description:
Current cloud pricing schemes are generally simple utility-style metering where customers pay for what is used. However, this hides substantial complexity as there are many different ways to buy the same underlying resources and subtle but important details to the pricing structure. This tutorial will survey both the resource allocation challenges faced by cloud providers and the economic mechanisms used to resolve them. Topics include spot markets, reservations, storage, resource bundling, and higher-level services such as access to datasets and machine learning models.
May
13
9:00
12:30
Designing Equitable Algorithms for the Web
Organizers:
Ricardo Baeza-Yates and Sharad Goel
Description:
Machine learning algorithms increasingly affect both our online and offline experiences. Researchers and policymakers, however, have rightfully raised concerns that these systems might inadvertently exacerbate societal biases. We provide an introduction to fair machine learning, beginning with a general overview of algorithmic fairness, and then discussing these issues specifically in the context of the Web.
May
14
9:00
17:30
The Challenge of API Management: API Strategies for Decentralized API Landscapes
Organizers:
Erik Wilde and Mike Amundsen
Description:
The rapidly evolving "Web of Services" is based on a diverse set of approaches and technologies. This can make architectural decisions hard when it comes to choosing how to expose information and services through an API. This challenge becomes more pronounced in organizations with continuously evolving API landscapes. This tutorial takes participants through two different journeys: (1) discussing API styles and API technologies, comparing and contrasting them to highlight that there is no single best choice; and (2) defining an API strategy that helps teams make effective choices about APIs in a given context, and managing that context over time in landscapes of thousands of evolving APIs.
May
13
14:00
17:30
A/B Testing at Scale: Accelerating Software Innovation
Organizers:
Somit Gupta, Ronny Kohavi, Alex Deng, Jeff Omhover and Pawel Janowski
Description:
Online controlled experiments help make data-driven decisions in a number of products and services like search engines (e.g., Google, Bing), retail services (e.g., Amazon, eBay, Etsy), social networking services (e.g., Facebook, LinkedIn, Twitter), and travel services (e.g., Expedia, Airbnb, Booking.com). The theory of a controlled experiment is simple. In practice, the deployment and evaluation of online controlled experiments at scale (hundreds of concurrently running experiments) presents many pitfalls and challenges. In this tutorial, we will introduce the A/B testing methodology, walk through use cases using real examples, and then focus on practical and research challenges in scaling experimentation. We will share key lessons learned from scaling experimentation at Microsoft to thousands of experiments per year and outline directions for future work.
May
13
14:00
17:30
Crowdsourcing Inclusivity: Dealing with diversity of opinions, perspectives and ambiguity in annotated data - The CrowdTruth Tutorial
Organizers:
Lora Aroyo, Anca Dumitrache, Oana Inel, Zoltán Szlávik, Benjamin Timmermans and Chris Welty
Description:
We introduce the CrowdTruth methodology for crowdsourcing ground truth by harnessing and interpreting inter-annotator disagreement. CrowdTruth is a widely used crowdsourcing methodology adopted by industrial partners and public organizations (Google, IBM, New York Times, The Cleveland Clinic, Crowdynews, The Netherlands Institute for Sound and Vision, Rijksmuseum), in multiple domains (news, medicine, cultural heritage, social sciences). The central characteristic of CrowdTruth is harnessing the diversity in human interpretation to capture the wide range of opinions and perspectives, and thus, provide more reliable and realistic real-world annotated data for training and evaluating machine learning components. This tutorial aims to introduce this novel approach to crowdsourcing that contributes to the larger discussion on how to make the Web more reliable, diverse and inclusive.
May
13
14:00
17:30
Socially Responsible NLP
Organizers:
Yulia Tsvetkov, Vinodkumar Prabhakaran and Rob Voigt
Description:
As language technologies have become increasingly prevalent in analyzing online data, there is a growing awareness that decisions we make about our data, methods, and tools often have immense impact on people and societies. This tutorial will provide an overview of real-world applications of NLP technologies and their potential ethical implications. We intend to provide researchers with an overview of tools to ensure that the data, algorithms, and models that they build are socially responsible. These tools will include a checklist of common pitfalls to avoid, as well as methods to mitigate these issues. Issues of bias, ethics, and impact are often not clear-cut; this tutorial will also discuss the complexities inherent in this area.
May
13
14:00
17:30
Concept to Code: Deep Learning for Fashion Recommendation
Organizers:
Omprakash Sonie, Muthusamy Chelliah and Shamik Sural
Description:
Deep Learning has shown significant results in various domains. In this tutorial, we provide a conceptual understanding of embedding methods, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). We present a fashion use case and apply these techniques to model image, text and sequence data in order to build user profiles and give personalized recommendations tailored to changing user taste and interest. Given the image of a fashion item, recommending complementary matches is a challenge. Users' tastes evolve over time and depend on persona. Humans relate objects based on their appearance and on non-visual factors of lifestyle merchandise, which further complicates the recommendation task. In addition, composing outfits requires the constituent items to be compatible: similar in some aspects but different in others.
May
13
14:00
17:30
Economic Theories of Distributive Justice for Fair Machine Learning
Organizers:
Krishna Gummadi and Hoda Heidari
Description:
Machine Learning is increasingly employed to make consequential decisions for humans. In response to the ethical issues that may ensue, an active area of research in ML has been dedicated to the study of algorithmic unfairness. This tutorial introduces fair-ML to the web conference community and offers a new perspective on it through the lens of the long-established economic theories of distributive justice. Based on our own past and ongoing research, we believe that economic theories of equality of opportunity, inequality measurement, and social choice have a lot to offer, in terms of tools and insights, to data scientists and practitioners interested in understanding the ethical implications of their work. We give an overview of these theories and discuss their connections to fair-ML.
May
13
14:00
17:30
Modeling and Mining Feature-Rich Networks
Organizers:
Rushed Kanawati and Martin Atzmueller
Description:
In the fields of web mining and web science, as well as data science and data mining, there has been a lot of interest in the analysis of (social) networks. With the growing complexity of heterogeneous data, feature-rich networks have emerged as a powerful modeling approach: they capture data and knowledge at different scales from multiple heterogeneous data sources, and allow mining and analysis from different perspectives. The challenge is to devise novel algorithms and tools for the analysis of such networks. This tutorial provides a unified perspective on feature-rich networks, focusing on different modeling approaches, in particular multiplex and attributed networks. It outlines important principles, methods, tools and future research directions in this emerging field.
May
14
9:00
12:30
Online User Engagement: Metrics and Optimization
Organizers:
Liangjie Hong and Mounia Lalmas
Description:
User engagement plays a central role in online services. The main challenge is to leverage collected knowledge about the daily online behavior of millions of users to understand what engages them in the short term and the long term. Two critical steps in improving user engagement are defining metrics and optimizing them. Engagement is most commonly measured through various online metrics. This tutorial will review these metrics, their advantages and drawbacks, and their appropriateness to various types of online services. Once metrics are defined, how to optimize them becomes the key issue. We will survey methodologies used to optimize these metrics directly or indirectly, with case studies in the domains of news, search, entertainment, and e-commerce.
May
14
9:00
12:30
Continuous Analytics of Web Streams
Organizers:
Riccardo Tommasini, Robin Keskisärkkä, Jean-Paul Calbimonte, Eva Blomqvist, Emanuele Della Valle and Albert Bifet
Description:
We provide a comprehensive introduction to web stream processing, including the fundamental stream reasoning concepts, as well as an introduction to practical implementations and how to use them in concrete web applications. To this end, we intend to 1) survey existing research outcomes from Stream Reasoning / RDF Stream Processing that arise in querying, reasoning on and learning from a variety of highly dynamic data, 2) introduce deductive and inductive stream reasoning techniques as powerful tools to use when addressing a data-centric problem characterized by both variety and velocity, and 3) present a relevant use case, which requires addressing data velocity and variety simultaneously on the web, and guide the participants in developing a Web stream processing application.
May
14
9:00
12:30
From Research Articles to Knowledge Graphs: Methods for ontology-driven knowledge base creation from text
Organizers:
Vayianos Pertsas and Panos Constantopoulos
Description:
We address the challenge of transforming text into knowledge graphs. We will introduce participants to methods for modeling domain knowledge, extracting information from texts using ML techniques and associating it with other information mined from the Web in order to create knowledge graphs according to a domain model. The scholarly domain will serve as a use case, where we will show how to model research processes, extract them from research articles, associate them with contextual information from article metadata and other digital repositories, and create knowledge bases available as linked data. Our aim is to show how different methodologies, namely NLP, ML and conceptual modeling, can be combined with Web technologies in a meaningful workflow.
May
14
9:00
17:30
Representation Learning on Networks: Theories, Algorithms, and Applications
Organizers:
Jie Tang and Yuxiao Dong
Description:
We will give a systematic introduction to representation learning on large-scale networks, covering theories, algorithms, and applications. We will introduce both the history of and recent advances in network representation learning. Uniquely, this tutorial aims to provide the audience with 1) underlying theories in network representation learning and 2) our experience in translating network representation learning into real-world applications on the Web, including Alibaba, AMiner, Microsoft Academic Search, as well as WeChat and Tencent. Finally, all the work introduced in the tutorial is accompanied by open code, and we will also take the opportunity to release the Open Challenge on Network Embedding with open datasets and benchmarks.
May
14
9:00
17:30
Human Mobility from theory to practice: Data, Models and Applications
Organizers:
Filippo Simini, Gianni Barlacchi, Luca Pappalardo and Roberto Pellungrini
Description:
The rapid inclusion of tracking technologies in personal devices opened the doors to the analysis of large sets of mobility data like GPS traces and call detail records. This tutorial presents an overview of both modeling principles of human mobility and machine learning models applicable to specific problems. We review the state of the art of five main aspects in human mobility: (1) human mobility data landscape; (2) key measures of individual and collective mobility; (3) generative models at the level of individual, population and mixture of the two; (4) next location prediction algorithms; (5) applications for social good. For each aspect, we show experiments and simulations using the Python library "scikit-mobility" developed by the presenters of the tutorial.
May
14
9:00
17:30
Social audience, under the influence
Organizers:
Augustin Chaintreau, Arthi Ramachandran and Elissa Redmiles
Description:
The increasing availability of knowledge and interconnectivity has brought with it an amplification of propaganda and influence. We must account for the influence that the systems we design may have on audience beliefs and societal outcomes. This tutorial will provide an overview of the current state-of-the-art knowledge about how audiences react to and engage with content and how audience influence has been engineered, both by human and algorithmic intervention. We will provide in-depth discussions and surveys of key research methodologies for each topic (surveying and quantifying perceptions, assessing audience size, reproducing reinforcing dynamics, and computational limits of fair rankings). We will conclude with breakout sessions to develop concrete directions for future work based on current shortcomings and underexplored areas.
May
14
14:00
17:30
Privacy-preserving Data Mining in Industry
Organizers:
Krishnaram Kenthapadi, Ilya Mironov and Abhradeep Thakurta
Description:
Preserving privacy of users is a key requirement of web-scale data mining applications and systems, and has witnessed a renewed focus in light of recent data breaches and regulations such as GDPR. We will first present the lessons learned from privacy breaches over the last two decades and an overview of differential privacy. Then, we will focus on the application of privacy-preserving data mining techniques in practice, by presenting case studies such as Apple's differential privacy deployment for iOS / macOS, Google's RAPPOR, LinkedIn Salary, and Microsoft's differential privacy deployment for collecting Windows telemetry. We will conclude with open problems and challenges for the data mining / machine learning community, based on our experiences in industry.
May
14
9:00
12:30
Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned
Organizers:
Sarah Bird, Ben Hutchinson, Krishnaram Kenthapadi, Emre Kıcıman and Margaret Mitchell
Description:
Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial aims to present an overview of algorithmic bias / discrimination and techniques for achieving fairness in machine learning systems. We will motivate the need for adopting a "fairness-first" approach when developing machine learning based models and systems in practice. Based on our experiences in industry, we will present case studies from different technology companies, highlight best practices, and identify open problems and research challenges for the data mining / machine learning community.
May
14
14:00
17:30
Explainable Recommendation and Search
Organizers:
Yongfeng Zhang, Jiaxin Mao and Qingyao Ai
Description:
Explainable recommendation and search aim to develop search and recommendation models that are both accurate (i.e., produce high-quality recommendation or search results) and explainable (i.e., the model itself is explainable, or intuitive explanations of the results can be generated), which can help improve system transparency, persuasiveness, trustworthiness, and effectiveness. The tutorial focuses on recent research on explainable recommendation and search algorithms, as well as their application in real-world systems such as search engines, e-commerce, and social networks. It aims to introduce explainable recommendation and search methods to the community and to bring together researchers and practitioners interested in this research direction for discussion and exchange of ideas.