Tutorial #1 – LIKE and Recommendation in Social Media
(Morning)
Presenters
- Dongwon Lee, Penn State University, USA
- Huan Liu, Arizona State University, USA
The recent dramatic increase in the usage and prevalence of social media has led to the creation and sharing of a significant amount of information in various formats, such as text, photos, and videos. When it comes to information consumption, people not only access and appreciate published and shared content, but also interact with it by adding comments or pressing the Like button (or by expressing other relationships similar in nature to Like, such as “+1” in Google+, “re-pin” in Pinterest, and “favorite” in Flickr). Given such massive social media data, rich in LIKE-like relationships, recommendation has proven effective in mitigating the information overload problem. It has improved the quality of user experience and contributed positively to the success of social media. The new types of data introduced by social media not only provide more information to advance traditional recommender systems but also open up new research possibilities for recommendation.
In this tutorial, therefore, we aim to provide a comprehensive overview of: (1) various examples of LIKE in social media, the existing literature on LIKE in social media, the analysis and modeling of LIKE activities, and techniques to predict the creation and deletion of LIKE relationships in social media; and (2) various recommendation tasks in social media, especially their recent advances and new frontiers, as well as the emerging challenges and opportunities in recommending friends, content, or locations in social media.
Tutorial #2 – Deep Learning for the Web
(Morning)
Presenters
- Kyomin Jung, Seoul National University, Korea
- Byoung-Tak Zhang, Seoul National University, Korea
- Prasenjit Mitra, Qatar Computing Research Institute, Qatar
The recent success of deep learning has shown that it can outperform previous state-of-the-art systems in image processing, voice recognition, web search, recommender systems, and other areas. In this tutorial, we will introduce the basics of deep learning algorithms and architectures such as deep belief networks, convolutional neural networks, and recursive neural networks. Then we will discuss how these deep learning systems have been used in the above-mentioned fields. We will also discuss neural networks used to generate word embeddings, such as Word2Vec; the Deep Structured Semantic Model (DSSM) for deep semantic similarity; and networks for object detection in images, such as GoogLeNet and AlexNet.
The first part of the tutorial will present the basics of neural networks and their training via backpropagation, a common method for training artificial neural networks. We will emphasize how each of these concepts can be used in various Web data analysis tasks. In the second part, we will describe learning algorithms for deep neural networks and related ideas, such as contrastive divergence, the wake-sleep algorithm, and Monte Carlo simulation. We will then describe the different kinds of deep architectures, including deep belief networks and convolutional neural networks. In the third part, we will present recursive neural networks, which can learn structured tree outputs as well as vector representations for phrases and sentences. We will discuss applications including part-of-speech (POS) tagging, sentiment analysis, and social network data analysis. The audience will gain a clear understanding of how to build a deep learning system for word-, sentence-, and document-level tasks. The fourth part of the tutorial will cover other applications of deep learning, including speech recognition, object classification, action recognition from videos, web data analytics, and wearable/IoT sensor data modeling for smart services.
Tutorial #3 – Urban Informatics and the Web
(Morning)
Presenters
- Konstantinos Pelechrinis, University of Pittsburgh, USA
- Daniele Quercia, University of Cambridge, UK
According to a recent United Nations report, more than 50% of the world’s population currently lives in cities, and this percentage is projected to increase to 70% by 2050. As large numbers of people move to urban areas, cities need to be run more efficiently while also improving the quality of life of their residents. However, the very force that creates this requirement, namely rising urbanization, also makes the task much harder, especially in megacities. Despite these conflicting dynamics, many city management operations can be facilitated by appropriately exploiting the unprecedented amount of data that can be made available to authorities from a variety of sources. In the era of big data and ubiquitous, pervasive mobile computing, different types of sensors, such as parking meters, weather sensors, traffic sensors, pipe sensors, public transportation ticket readers, and even human sensors (e.g., through web technologies, social media, or cell phone usage data), can assist in these efforts. Furthermore, civic applications can exploit web and mobile technologies to deliver a livable, sustainable, and resilient environment to citizens. Harnessing these information streams and technologies presents many challenges, which are the focus of this tutorial. We will present current practices and methods in the emerging field of urban informatics, as well as its open challenges. The topics are organized into three sessions: (i) an introduction to urban studies and urban informatics, (ii) civic data and technologies for urban sensing, and (iii) analytical techniques for urban data analysis. Finally, we will provide concrete examples of urban informatics applications.
Tutorial #4 – Knowledge Bases for Web Content Analytics
(Morning)
Presenters
- Johannes Hoffart, Max Planck Institute for Informatics, Germany
- Nicoleta Preda, University of Versailles, France
- Fabian Suchanek, Télécom ParisTech University, France
- Gerhard Weikum, Max Planck Institute for Informatics, Germany
Link: http://resources.mpi-inf.mpg.de/yago-naga/www2015-tutorial
Recent years have seen the automatic construction of very large knowledge bases, such as DBpedia, Freebase, NELL, and YAGO, as well as industrial knowledge graphs at Google, Microsoft, Bloomberg, Walmart, and others. Some of these knowledge bases contain many millions of entities, organized into thousands of fine-grained semantic classes, and billions of facts that capture entity attributes or relationships between entities. This digital world knowledge enables intelligent applications and knowledge-centric services such as natural-language text disambiguation, entity linking, deep question answering, semantic search, and text analytics over Web content, boosted by identifying and aggregating over entities and relations. This tutorial will cover a wide spectrum of methods for automatically constructing large knowledge bases, for extending them, and for harnessing them in intelligent applications such as text annotation, disambiguation, entity linking, and Web content analytics.
Participants will obtain an in-depth understanding of state-of-the-art knowledge bases, how they are built and maintained, how knowledge harvesting can utilize scale-out algorithms, and how knowledge can contribute to analytic tasks over news and Web content. As the relevant literature is widely dispersed across different communities, such as Web Mining (WSDM, WWW, and KDD), Artificial Intelligence (IJCAI and AAAI), Natural Language Processing (ACL and EMNLP), Semantic Web (ISWC), and Data Management (SIGMOD, VLDB, and ICDE), the tutorial also serves as a guided tour of the latest research in these venues and aims to offer a unifying big picture.
Tutorial #5 – Geo-Social Media Analytics
(Afternoon)
Presenters
- Cheng-Te Li, Academia Sinica, Taiwan
- Hsun-Ping Hsieh, National Taiwan University, Taiwan
With the maturity of wireless communication techniques, GPS-equipped mobile devices have become ubiquitous, and location-acquisition technologies and services are flourishing. Combined with social networking services, these location-based applications and mobile devices have fostered the emergence of geo-social media, which produce a novel type of user-generated geo-social data, such as data from Facebook, Twitter, and Foursquare. In geo-social media, users’ social connections and geo-location information are the essential elements, capturing their interactions and their spatio-temporal activities. Social interactions are depicted by online network structures, while geographical activities are usually represented as check-in records. Due to the pervasive mobility of users, a huge amount of user-generated geo-social data is rapidly generated. Such big geo-social data not only collectively represents diverse kinds of real-world human activities, but also serves as a handy resource for various geo-social applications. In this tutorial, we aim to present recent advances in geo-social media analytics systematically, in five parts: (a) properties of geo-social networks, which unveil the relationships between human mobility and social structures; (b) geo-social link prediction, using geographical, mobility, and activity features with various inference models; (c) location recommendation, leveraging personal, social, contextual, geographical, and content information; (d) geo-social influence propagation and maximization; and (e) connecting online and offline social networks to revisit conventional social network analysis (SNA) wisdom and to develop applications that bridge the virtual and physical social worlds. We also highlight the open problems in each of these topics and future directions for geo-social media analytics.
We believe this tutorial can benefit the research communities of mobile web, data mining, information retrieval, social network analysis, recommender systems, and marketing and advertising.
Tutorial #6 – Large Scale Network Analytics with SNAP
(Afternoon)
Presenters
- Rok Sosic, Stanford University, USA
- Jure Leskovec, Stanford University, USA
This entry-level, 3-hour tutorial will give an overview of the basic principles of network analytics and of how to use SNAP for network analytics in real-world scenarios.
Stanford Network Analysis Platform (SNAP) is a general-purpose, high-performance system for the analysis and manipulation of large networks. SNAP is widely used in studies of Web-based and other large-scale networks. SNAP consists of two components: software that provides a rich set of functions for performing network analytics and is available for Python and C++, and a popular repository of real-world network datasets. All software is freely available under a liberal open source license. All datasets discussed are publicly available for download from the Web. Hundreds of copies of the software and datasets are downloaded each month.
The tutorial is designed to proceed from entry-level to more advanced topics. At the end of the tutorial, participants will understand the basics of network analytics, the resources provided by SNAP, and how to apply those resources to network analytics tasks on Web-based datasets. They will have SNAP installed on their computers and will gain hands-on experience with it. The tutorial is structured in five parts: the Python API, the C++ API, analytic functionality, datasets, and hands-on exercises. These parts will be combined into three 50-minute sessions.
Tutorial #7 – Mining Mobility Data
(Afternoon)
Presenters
- Spiros Papadimitriou, Rutgers University, USA
- Tina Eliassi-Rad, Rutgers University, USA
The fairly recent explosion in the availability of reasonably fast wireless and mobile data networks has spurred demand for more capable mobile computing devices. Conversely, the emergence of new devices increases demand for better networks, creating a virtuous cycle. The current concept of a smartphone as an always-connected computing device with multiple sensing modalities was brought into the mainstream by the Apple iPhone just a few years ago, and such devices are now seeing explosive growth. Furthermore, small, cheap, always-connected devices (standalone or peripheral) with additional sensing capabilities have only very recently begun to emerge, further blurring the lines between the web, apps, and the real world. All of this opens up countless possibilities for data collection and analysis across a broad range of applications.
In this tutorial, we survey the state of the art in mining mobility data across different application areas, such as advertising, geo-social applications, privacy, and security. Our tutorial consists of three parts. (1) We summarize the possibilities and challenges in collecting data from various sensing modalities. (2) We cover cross-cutting challenges, such as real-time analysis and security, and outline cross-cutting algorithms for mobile data mining, such as network inference and streaming algorithms. (3) We focus on how all of this can be usefully applied to broad classes of applications, notably mobile and location-based social applications, mobile advertising and search, the mobile Web, and privacy and security. We conclude by showcasing the opportunities for new data collection techniques and new data mining methods to meet the challenges and applications that are unique to the mobile arena (e.g., leveraging emerging embedded computing and sensing technologies to collect a large variety and volume of new kinds of “big data”).
Tutorial #8 – Diversity and Novelty on the Web: Search, Recommendation, and Data Streaming Aspects
(Afternoon)
Presenters
- Rodrygo L. T. Santos, Universidade Federal de Minas Gerais, Brazil
- Pablo Castells, Universidad Autónoma de Madrid, Spain
- Ismail Sengor Altingovde, Middle East Technical University, Turkey
- Fazli Can, Bilkent University, Turkey
The main goal of this tutorial is to derive a common understanding of the diversification/novelty problem and its existing solutions, including their commonalities and differences, across multiple Web information systems, namely search engines, recommender systems, and data streams. In particular, tutorial attendees will:
- Understand the importance and complexities of achieving diversity/novelty for various Web IR domains;
- Learn the state-of-the-art approaches for diversity/novelty in search results, documents and streaming data, and recommender systems;
- Learn the fundamental evaluation metrics and have an overview of past and current evaluation campaigns;
- Obtain a unified view of the topic (i.e., the commonalities and connections between various methods employed in different domains as well as the differences between them) as a take-away message and as a means to foster new research directions.
In this respect, the tutorial aims to bring together Web information systems researchers and practitioners who have introductory knowledge of the broad domains of search, recommendation, and data streams, and who must cope with ambiguous requests or redundant results in some manner.