Visual search over The Web Conference 2021 papers
ACCEPTED PAPERS
Title: “Is it a Qoincidence?”: An Exploratory Study of QAnon on Voat |
Authors: Antonis Papasavva (University College London), Jeremy Blackburn (Binghamton University), Gianluca Stringhini (Boston University), Savvas Zannettou (Max Planck Institute) and Emiliano De Cristofaro (University College London). |
Online fringe communities offer fertile grounds for users to seek and share paranoid ideas fueling suspicion of mainstream news and outright conspiracy theories. Among these, the QAnon conspiracy theory emerged in 2017 on 4chan, broadly supporting the idea that powerful politicians, aristocrats, and celebrities are closely engaged in a global pedophile ring. At the same time, governments are thought to be controlled by "puppet masters," with democratically elected officials serving as a fake showroom of democracy. In this paper, we provide an empirical exploratory analysis of the QAnon community on Voat.co, a Reddit-esque news aggregator that has recently captured the interest of the press for its toxicity and for providing a platform to QAnon followers. More precisely, we analyze a large dataset from /v/GreatAwakening, the most popular QAnon-related subverse (the Voat equivalent of a subreddit), to characterize activity and user engagement. To further understand the discourse around QAnon, we study the most popular named entities mentioned in the posts, along with the most prominent topics of discussion, which focus on US politics, Donald Trump, and world events. We also use word2vec models to identify narratives around QAnon-specific keywords, and our graph visualization shows that QAnon-related keywords are closely related to those from the Pizzagate conspiracy theory and to "drops" by "Q." Finally, we analyze content toxicity, finding that discussions on /v/GreatAwakening are less toxic than those in the broader Voat community. |
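The word2vec narrative analysis this abstract mentions can be illustrated with a minimal sketch (our illustration, not the authors' code), assuming gensim and a hypothetical tokenized corpus of subverse posts:

```python
# Minimal sketch of a word2vec neighborhood query; the corpus below is a
# hypothetical stand-in for tokenized /v/GreatAwakening posts.
from gensim.models import Word2Vec

corpus = [
    ["q", "posted", "a", "new", "drop", "today"],
    ["the", "great", "awakening", "is", "here"],
    ["trust", "the", "plan", "a", "new", "drop", "soon"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=5, min_count=1, seed=1)

# Terms whose vectors lie closest to a QAnon-specific keyword hint at the
# narratives that keyword participates in.
print(model.wv.most_similar("drop", topn=5))
```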
Title: “Short is the Road that Leads from Fear to Hate”: Fear Speech in Indian WhatsApp Groups |
Authors: Punyajoy Saha (Indian Institute of Technology, Kharagpur), Binny Mathew (IIT Kharagpur), Pawan Goyal (Indian Institute of Technology, Kharagpur), Kiran Garimella (MIT Institute for Data, Systems and Society) and Animesh Mukherjee (Indian Institute of Technology, Kharagpur). |
WhatsApp is the most popular messaging app in the world. Owing to this popularity, it has become a powerful and cheap tool for political campaigning and was widely used during the 2019 Indian general election to connect with voters at scale. Alongside campaigning, there have been reports that WhatsApp has become a breeding ground for harmful speech against various protected groups and religious minorities. Many such messages attempt to instil fear among the population about a specific (minority) community. According to research on inter-group conflict, such "fear speech" messages could have a lasting impact and might lead to real offline violence. In this paper, we perform the first large-scale study of fear speech across thousands of public WhatsApp groups discussing politics in India. We curate a new dataset and use it to characterize fear speech. We observe that users writing fear speech messages use various events and symbols to create fear in the reader about a target community. We build models to classify fear speech and observe that current state-of-the-art NLP models do not perform well at this task. Fear speech messages tend to spread faster and could go undetected by classifiers built for traditional toxic speech, owing to their low toxicity. |
Title: “Go eat a bat, Chang!”: On the Emergence of Sinophobic Behavior on Web Communities in the Face of COVID-19 |
Authors: Fatemeh Tahmasbi (Binghamton University), Leonard Schild (CISPA Helmholtz Center for Information Security), Chen Ling (Boston University), Jeremy Blackburn (Binghamton University), Gianluca Stringhini (Boston University), Yang Zhang (CISPA Helmholtz Center for Information Security) and Savvas Zannettou (Max Planck Institute for Informatics). |
The outbreak of the COVID-19 pandemic has changed our lives in unprecedented ways. In the face of the projected catastrophic consequences, most countries have enacted social distancing measures in an attempt to limit the spread of the virus. Under these conditions, the Web has become an indispensable medium for information acquisition, communication, and entertainment. At the same time, unfortunately, the Web is being exploited to disseminate potentially harmful and disturbing content, such as conspiracy theories and hateful speech toward specific ethnic groups, in particular Chinese people and people of Asian descent, since COVID-19 is believed to have originated in China. In this paper, we make a first attempt to study the emergence of Sinophobic behavior on the Web during the outbreak of the COVID-19 pandemic. We collect two large datasets from Twitter and 4chan's Politically Incorrect board (/pol/) over a period of approximately five months and analyze them to investigate whether there is a rise in, or important differences in, the dissemination of Sinophobic content. We find that COVID-19 indeed drives a rise of Sinophobia on the Web and that the dissemination of Sinophobic content is a cross-platform phenomenon: it exists on fringe Web communities like /pol/ and, to a lesser extent, on mainstream ones like Twitter. Using word embeddings over time, we characterize the evolution of Sinophobic slurs on both Twitter and /pol/. Finally, we find interesting differences in the context in which words related to Chinese people are used on the Web before and after the COVID-19 outbreak: on Twitter we observe a shift toward blaming China for the situation, while on /pol/ we find a shift toward using more (and new) Sinophobic slurs. |
Title: #Twiti: Social Listening for Threat Intelligence |
Authors: Hyejin Shin (Samsung Research), Woochul Shim (Samsung Research), Saebom Kim (Samsung Research), Sol Lee (Samsung Research), Yong Goo Kang (School of Cybersecurity, Korea University) and Yong Ho Hwang (Samsung Research). |
Twitter is a popular public source for threat hunting. Many security vendors and security professionals use Twitter in practice for collecting Indicators of Compromise (IOCs). However, little is known about IOCs on Twitter: their important characteristics, such as earliness, uniqueness, and accuracy, have never been investigated, and how to extract IOCs from Twitter with high accuracy is not obvious. In this paper, we present Twiti, a system that automatically extracts various forms of malware IOCs from Twitter. Based on the collected IOCs, we conduct the first empirical assessment and thorough analysis of malware IOCs on Twitter. Twiti extracts IOCs from tweets identified as containing malware IOC information by leveraging natural language processing and machine learning techniques. With extensive evaluation, we demonstrate that not only can Twiti extract malware IOCs accurately, but the extracted IOCs are also unique and early. By analyzing the IOCs in Twiti from various aspects, we find that Twitter captures ongoing malware threats, such as Emotet variants and malware distribution sites, better than other public threat intelligence (TI) feeds. We also find that only a tiny fraction of IOCs on Twitter come from commercial vendor accounts and that individual Twitter users are the main contributors of early-detected or exclusive IOCs, which indicates that Twitter can provide many valuable IOCs that are not covered in the commercial domain. |
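As a rough illustration of the extraction step (a sketch under our own assumptions, not Twiti's actual pipeline), IOC candidates are typically matched with patterns like these before any ML filtering:

```python
# Minimal sketch of regex-based IOC extraction from a tweet; real systems
# add NLP-based relevance filtering on top of pattern matching.
import re

IOC_PATTERNS = {
    "ipv4":   re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5":    re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    # Tweets often "defang" URLs as hxxp and [.] to avoid live links.
    "url":    re.compile(r"\bhxxps?://\S+|\bhttps?://\S+"),
}

def extract_iocs(tweet: str) -> dict:
    # Re-fang common obfuscations before matching.
    text = tweet.replace("[.]", ".").replace("(.)", ".")
    return {kind: pat.findall(text) for kind, pat in IOC_PATTERNS.items()}

print(extract_iocs("Emotet C2 at hxxp://198[.]51[.]100[.]7/gate.php"))
```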
Title: A Cooperative Memory Network for Personalized Task-oriented Dialogue Systems with Incomplete User Profiles |
Authors: Jiahuan Pei (University of Amsterdam), Pengjie Ren (University of Amsterdam) and Maarten de Rijke (University of Amsterdam & Ahold Delhaize). |
There is increasing interest in developing personalized Task-oriented Dialogue Systems (TDSs). Previous work on personalized TDSs often assumes that complete user profiles are available for most or even all users. This is unrealistic because (1) not everyone is willing to expose their profile due to privacy concerns; and (2) rich user profiles may involve a large number of attributes (e.g., gender, age, tastes, ...). In this paper, we study personalized TDSs without assuming that user profiles are complete. We propose a Cooperative Memory Network (CoMemNN) that has a novel mechanism to gradually enrich user profiles as dialogues progress and to simultaneously improve response selection based on the enriched profiles. CoMemNN consists of two core modules: User Profile Enrichment (UPE) and Dialogue Response Selection (DRS). The former enriches incomplete user profiles by utilizing collaborative information from neighboring users as well as current dialogues. The latter uses the enriched profiles to update the current user query so as to encode more useful information, based on which a personalized response to a user request is selected. We conduct extensive experiments on the personalized bAbI dialogue benchmark datasets. We find that CoMemNN is able to enrich user profiles effectively, which results in an improvement of 3.06% in response selection accuracy compared to state-of-the-art methods. We also test the robustness of CoMemNN against incompleteness of user profiles by randomly discarding attribute values. Even when 50% of the attribute values are discarded, CoMemNN is able to match the performance of the best-performing baseline operating on complete user profiles, demonstrating its robustness. |
Title: A First Look at DeepFake Videos in the Wild: Analysis and Detection |
Authors: Jiameng Pu (Virginia Tech), Neal Mangaokar (University of Michigan), Lauren Kelly (Virginia Tech), Parantapa Bhattacharya (University of Virginia), Kavya Sundaram (Virginia Tech), Mobin Javed (LUMS Pakistan), Bolun Wang (Facebook) and Bimal Viswanath (Virginia Tech). |
AI-manipulated videos, commonly known as deepfakes, are an emerging problem. Recently, researchers in academia and industry have contributed several (self-created) benchmark deepfake datasets and deepfake detection algorithms. However, little effort has gone toward understanding deepfake videos in the wild, leading to a limited understanding of the real-world applicability of research contributions in this space. Even if detection schemes are shown to perform well on existing datasets, it is unclear how well the methods generalize to real-world deepfakes. To bridge this gap in knowledge, we make the following contributions: First, we collect and present the first large-scale dataset of deepfake videos in the wild, containing 1,918 videos from YouTube and Bilibili with over 5,000,000 frames of content. Second, we present a comprehensive analysis of the growth patterns, popularity, creators, manipulation strategies, and production methods of deepfake content in the real world. Third, we systematically evaluate and interpret existing defenses using our new dataset and observe that they are not ready for real-world deployment. Fourth, we explore the potential for transfer learning schemes and competition-winning techniques to improve defenses. |
Title: A Generative Adversarial Click Model for Information Retrieval |
Authors: Xinyi Dai (Shanghai Jiao Tong University), Jianghao Lin (Shanghai Jiao Tong University), Weinan Zhang (Shanghai Jiao Tong University), Shuai Li (Shanghai Jiao Tong University), Weiwen Liu (Huawei Noah's Ark Lab), Ruiming Tang (Huawei Noah's Ark Lab), Xiuqiang He (Huawei Noah's Ark Lab), Jianye Hao (Huawei Noah's Ark Lab), Jun Wang (University College London) and Yong Yu (Shanghai Jiao Tong University). |
Modern information retrieval systems, including web search, ads placement, and recommender systems, typically rely on learning from user feedback. Click models, which study how users interact with a ranked list of items, provide a useful understanding of user feedback for learning ranking models. Constructing the "right" dependencies is the key to any successful click model. However, probabilistic graphical models (PGMs) have to rely on manually assigned dependencies and oversimplify user behaviors. Existing neural-network-based methods improve on PGMs by enhancing expressive ability and allowing flexible dependencies, but they still suffer from exposure bias and inferior estimation. In this paper, we propose a novel framework, the Generative Adversarial Click Model (GACM), based on imitation learning. First, we explicitly learn a reward function that recovers users' intrinsic utility and underlying intentions. Second, we model user interactions with a ranked list as a dynamic system instead of one-step click prediction, alleviating the exposure bias problem. Finally, we minimize the JS divergence through adversarial training and learn a stable distribution of click sequences, which makes GACM generalize well across different ranked list distributions. Theoretical analysis shows that GACM reduces the exposure bias from $O(T^2)$ to $O(T)$. Our studies on a public web search dataset show that GACM not only outperforms state-of-the-art models in traditional click metrics but also achieves superior performance in addressing the exposure bias and recovering the underlying patterns of click sequences. We also demonstrate that GACM generalizes well across different ranked list distributions, allowing safe exploration of the ranking function. |
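For readers unfamiliar with the objective named in this abstract, the Jensen-Shannon (JS) divergence between the generated and real click-sequence distributions is defined in general (our notation, not GACM-specific) as:

```latex
\mathrm{JS}(P \,\|\, Q) \;=\; \tfrac{1}{2}\,\mathrm{KL}\!\Big(P \,\Big\|\, \tfrac{P+Q}{2}\Big) \;+\; \tfrac{1}{2}\,\mathrm{KL}\!\Big(Q \,\Big\|\, \tfrac{P+Q}{2}\Big)
```

As in standard GANs, adversarial training with an optimal discriminator minimizes exactly this quantity between the model and data distributions.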
Title: A Hybrid Bandit Model with Visual Priors for Creative Ranking in Display Advertising |
Authors: Shiyao Wang (Alibaba group), Qi Liu (University of Science and Technology of China), Tiezheng Ge (Alibaba Group), Defu Lian (University of Science and Technology of China) and Zhiqiang Zhang (Alibaba Group). |
Creatives play an important role in e-commerce by exhibiting products to attract customers. Sellers usually create multiple creatives for comprehensive demonstrations, so it is crucial to display the most appealing design to maximize the Click-Through Rate (CTR). For this purpose, modern recommender systems dynamically rank creatives when a product is proposed to a user. However, this task suffers from a more severe cold-start problem than conventional product recommendation, since user-click data is scarcer and creatives potentially change more frequently. In this paper, we propose a hybrid bandit model with visual priors, which first makes predictions via a visual evaluation and then naturally evolves to focus on the specialities of individual creatives through the hybrid bandit model. Our contributions are three-fold: 1) We present a visual-aware ranking model (called VAM) that incorporates a list-wise ranking loss for ordering creatives according to their visual appearance. 2) Regarding the visual evaluations as a prior, the hybrid bandit model (called HBM) is proposed to evolve consistently and make better posterior estimates by taking more observations into consideration in online scenarios. 3) A first large-scale creative dataset, CreativeRanking, is constructed, containing over 1.7M creatives of 500k products as well as their real impression and click data. Extensive experiments have been conducted on both our dataset and the public Mushroom dataset, demonstrating the effectiveness and generalizability of the proposed method. |
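One simple way to realize "visual prior, bandit posterior" (a sketch under our own assumptions, not the paper's VAM/HBM) is a Beta-Bernoulli Thompson sampler whose prior pseudo-counts are seeded by a visual model's predicted CTR:

```python
# Minimal sketch: the visual model's predicted CTR seeds the Beta prior, so
# early rankings lean on the visual score and later ones on observed clicks.
import random

class CreativeArm:
    def __init__(self, visual_ctr: float, prior_strength: float = 20.0):
        # Encode the visual prediction as pseudo-counts.
        self.alpha = 1.0 + prior_strength * visual_ctr
        self.beta = 1.0 + prior_strength * (1.0 - visual_ctr)

    def sample(self) -> float:
        return random.betavariate(self.alpha, self.beta)

    def update(self, clicked: bool):
        self.alpha += clicked
        self.beta += not clicked

arms = [CreativeArm(0.02), CreativeArm(0.05), CreativeArm(0.03)]
chosen = max(range(len(arms)), key=lambda i: arms[i].sample())
arms[chosen].update(clicked=True)  # feed back the observed click
```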
Title: A Linguistic Study on Relevance Modeling in Information Retrieval |
Authors: Yixing Fan (ICT, Chinese Academy of Sciences), Jiafeng Guo (ICT, Chinese Academy of Sciences), Xinyu Ma (ICT, Chinese Academy of Sciences), Ruqing Zhang (ICT, Chinese Academy of Sciences), Yanyan Lan (ICT, Chinese Academy of Sciences) and Xueqi Cheng (ICT, Chinese Academy of Sciences). |
Relevance plays a central role in information retrieval (IR) and has been studied extensively since the 20th century. The definition and modeling of relevance have always been critical challenges in both information science and computer science. Along with the debate and exploration on relevance, IR has become a core task in many real-world applications, such as Web search engines, question answering systems, conversational bots, and so on. While relevance acts as a unified concept in all these tasks, its specific definitions are generally considered different due to the heterogeneity of these retrieval problems. This raises a question: do these different forms of relevance really lead to different modeling focuses? To answer this question, we conduct a quantitative analysis of relevance modeling in three representative IR tasks, i.e., document retrieval, answer retrieval, and response retrieval. Specifically, we study the following two questions: 1) Does relevance modeling in these tasks really show differences in terms of natural language understanding (NLU)? We employ 16 linguistic tasks to probe a unified retrieval model over these three retrieval tasks to answer this question. 2) If differences do exist, how can we leverage the findings to enhance relevance modeling? We propose a parameter intervention method to improve relevance models with our findings. We believe the way we study the problem, as well as our findings, will be beneficial to the IR community. |
Title: A Longitudinal Study of Removed Apps in iOS App Store |
Authors: Fuqi Lin (Peking University), Haoyu Wang (Beijing University of Posts and Telecommunications), Liu Wang (Beijing University of Posts and Telecommunications) and Xuanzhe Liu (Peking University). |
To improve app quality and nip potential threats in the bud, modern app markets have released strict guidelines along with an app vetting process before app publishing. However, there has been growing evidence of the ineffectiveness of app vetting, allowing potentially harmful and policy-violating apps to sneak into the market from time to time. Therefore, app removal is a common practice, and market maintainers have to remove undesired apps from the market periodically in a reactive manner. Although a number of reports and news media have mentioned removed apps, our research community still lacks a comprehensive understanding of the landscape of such apps. To fill the void, in this paper we present a large-scale and longitudinal study of removed apps in the iOS App Store. We first record daily snapshots of the iOS App Store continuously over a span of 1.5 years. By comparing every two consecutive snapshots, we collect information on over 1 million removed apps along with their accurate removal dates. This comprehensive dataset enables us to characterize the overall landscape of removed apps. We observe that, although most of the removed apps are low-quality apps (e.g., outdated and abandoned), a number of them are quite popular. We further investigate the practical reasons leading to the removal of such popular apps and observe several interesting reasons, including ranking fraud, fake descriptions, and content issues. More importantly, most of these misbehaviors are reflected in app meta-information, including the app description, app reviews, and ASO keywords. This motivates us to design an automated approach to flagging removed apps. Experimental results suggest that, even without access to the bytecode of mobile apps, we can identify removed apps with good performance (F1=83%). Furthermore, we are able to flag removed apps accurately 6 days in advance. To engage the community and facilitate further study along this direction, we will release our dataset to the research community. |
Title: A Model of Two Tales: Dual Transfer Learning Framework for Improved Long-tail Item Recommendation |
Authors: Yin Zhang (Texas A&M University), Derek Zhiyuan Cheng (Google), Tiansheng Yao (Google), Xinyang Yi (Google), Lichan Hong (Google) and Ed H. Chi (Google). |
Highly skewed long-tail item distributions are very common in recommendation systems. They significantly affect model performance, especially on tail items. To improve tail-item recommendation, we conduct research on transferring knowledge from head items to tail items, leveraging the rich user feedback on head items and the semantic connections between head and tail items. Specifically, we propose a novel dual transfer learning framework that collaboratively learns knowledge transfer at both the model level and the item level. The model-level knowledge transfer builds a generic meta-mapping of model parameters from the few-shot to the many-shot model. It captures implicit data augmentation at the model level to improve the representation learning of tail items. The item-level transfer connects head and tail items through item-level features, to ensure a smooth transfer of the meta-mapping from head items to tail items. The two types of transfer are combined to ensure that the knowledge learned from head items can be applied effectively to tail-item representation learning in long-tail distribution settings. Through extensive experiments on two benchmark datasets, results show that our proposed dual transfer learning framework significantly outperforms other state-of-the-art methods for tail-item recommendation in hit ratio and NDCG. It is also very encouraging that our framework further improves head items and overall performance on top of the gains on tail items. |
Title: A Multi-Agent Reinforcement Learning Framework for Intelligent Electric Vehicle Charging Recommendation |
Authors: Weijia Zhang (University of Science and Technology of China), Hao Liu (Business Intelligence Lab, Baidu Research, China), Fan Wang (Baidu, Inc.), Tong Xu (University of Science and Technology of China), Haoran Xin (University of Science and Technology of China), Dejing Dou (Baidu) and Hui Xiong (Rutgers, the State University of New Jersey). |
Electric Vehicles (EVs) have become a preferable choice in the modern transportation system due to their environmental and energy sustainability. However, in many large cities, EV drivers often fail to find proper spots for charging because of limited charging infrastructure and spatiotemporally unbalanced charging demands. Indeed, the recent emergence of deep reinforcement learning provides great potential to improve the charging experience from various aspects over a long-term horizon. In this paper, we propose a framework named Multi-Agent Spatio-Temporal Reinforcement Learning (MAST) for intelligently recommending publicly accessible charging stations by jointly considering various long-term spatiotemporal factors. Specifically, by regarding each charging station as an individual agent, we formulate the problem as a multi-objective multi-agent reinforcement learning task. We first develop a multi-agent actor-critic framework with a centralized attentive critic to coordinate recommendations between geo-distributed agents. Moreover, to quantify the influence of unpredictable future charging competition, we introduce a delayed access strategy. After that, to effectively optimize multiple learning objectives, we propose a multi-critic architecture with a dynamic gradient reweighting strategy to adaptively guide the optimization direction. Finally, extensive experiments on two real-world datasets demonstrate that MAST achieves the best comprehensive performance compared with several baseline approaches. |
Title: A Novel Macro-Micro Fusion Network for User Representation Learning on Mobile Apps |
Authors: Shuqing Bian (Renmin University of China), Xin Zhao (Renmin University of China, School of Information), Kun Zhou (Renmin University of China), Xu Chen (Renmin University of China), Jing Cai (Platform and Content Group, Tencent), Yancheng He (Platform and Content Group, Tencent), Xingji Luo (Platform and Content Group, Tencent) and Ji-Rong Wen (Renmin University of China). |
The evolution of mobile apps has greatly changed the way we live, and it has become increasingly important to understand and model users on mobile apps. Instead of focusing on one specific app alone, it has become a popular paradigm to study user behavior across various mobile apps in a symbiotic environment. In this paper, we study the task of user representation learning with both macro and micro interaction data on mobile apps. Specifically, macro and micro interactions refer to user-app interactions and user-item interactions within a specific app, respectively. By combining the two kinds of user data, we expect to derive a more comprehensive and robust user representation model for mobile apps. To effectively fuse information across the two views, we propose a novel macro-micro fusion network for user representation learning on mobile apps. With a Transformer architecture as the base model, we design a representation fusion component that captures category-based semantic alignment at the user level. After such semantic alignment, information across the two views can be adaptively fused in our approach. Furthermore, we adopt mutual information maximization to derive a self-supervised loss that enhances the learning of our fusion network. Extensive experiments with three downstream tasks on two real-world datasets have demonstrated the effectiveness of our approach. |
Title: A Recommender System for Crowdsourcing Food Rescue Platforms |
Authors: Zheyuan Ryan Shi (Carnegie Mellon University), Leah Lizarondo (412 Food Rescue) and Fei Fang (Carnegie Mellon University). |
The challenges of food waste and insecurity arise in wealthy and developing nations alike, impacting millions of livelihoods. The ongoing pandemic only exacerbates the problem. A major force in combating food waste and insecurity, food rescue (FR) organizations match food donations to the non-profits that serve low-resource communities. Since they rely on external volunteers to pick up and deliver the food, some FRs use web-based mobile applications to reach the right set of volunteers. In this paper, we propose the first machine learning based model to improve volunteer engagement in the food waste and insecurity domain. We (1) develop a recommender system to send push notifications to the most likely volunteers for each given rescue, (2) leverage a mathematical programming based approach to diversify our recommendations, and (3) propose an online algorithm to dynamically select the volunteers to notify without knowledge of future rescues. Our recommendation system improves the hit ratio from the 44% achieved by the previous method to 73%. A pilot study of our method is scheduled to take place in the near future. |
Title: A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix Completion |
Authors: Yaqing Wang (Baidu Research), Quanming Yao (4Paradigm) and James Kwok (Hong Kong University of Science and Technology). |
Many real-world applications, such as collaborative filtering and text mining, can be formulated as low-rank matrix completion problems, which recover an incomplete matrix under low-rank assumptions. To ensure that the matrix solution has a low rank, a recent trend is to use nonconvex regularizers that adaptively penalize singular values. They offer good recovery performance and have nice theoretical properties, but are computationally expensive due to repeated access to individual singular values. In this paper, based on the key insight that adaptive shrinkage on singular values improves empirical performance, we propose a new nonconvex low-rank regularizer, the "nuclear norm minus Frobenius norm" regularizer, which is scalable, adaptive, and sound. We first show that it provably holds the adaptive shrinkage property. Further, we derive its factored form, which bypasses the computation of singular values and allows fast optimization by general optimization algorithms. Stable recovery and convergence are guaranteed. Extensive experiments on synthetic data and real-world recommendation and climate record datasets show that the proposed method obtains state-of-the-art recovery performance while being the fastest in comparison to existing low-rank methods. |
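In symbols, the regularized completion problem this abstract describes can be written (our notation; the paper's exact formulation may differ) as:

```latex
\min_{X} \;\; \tfrac{1}{2}\,\big\|P_{\Omega}(X - O)\big\|_{F}^{2} \;+\; \lambda\,\big(\|X\|_{*} - \|X\|_{F}\big)
```

where $O$ is the partially observed matrix, $P_{\Omega}$ keeps only the observed entries, $\|\cdot\|_{*}$ is the nuclear norm, and $\|\cdot\|_{F}$ the Frobenius norm. Since the penalty's derivative with respect to a singular value $\sigma_i$ is $1 - \sigma_i / \|\sigma\|_2$, small singular values are shrunk more aggressively than large ones, which is the adaptive shrinkage behavior.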
Title: A Targeted Attack on Black-Box Neural Machine Translation with Parallel Data Poisoning |
Authors: Chang Xu (University of Melbourne), Jun Wang (University of Melbourne), Yuqing Tang (Facebook AI), Francisco Guzman (Facebook AI), Benjamin Rubinstein (University of Melbourne) and Trevor Cohn (University of Melbourne). |
As modern neural machine translation (NMT) systems are now in widespread deployment, their security vulnerabilities require close scrutiny. Most recently, NMT systems have been found to suffer from targeted attacks that cause them to produce specific, unsolicited, and even harmful translations. Such vulnerability is typically exploited via white-box analysis of a known target system, in which adversarial inputs causing targeted translations are discovered. However, this approach is less viable when the target system is a black box and unknown to the public (e.g., secured commercial systems). In this paper, we show that targeted attacks on black-box NMT systems are feasible, based on poisoning a small fraction of their parallel training data. We show that this attack can be achieved simply through targeted corruption of web documents crawled to form the system's training data. We then analyse the effectiveness of poisoning two common NMT training scenarios: the one-off training and pre-train & fine-tune paradigms. Our findings are alarming: even on state-of-the-art systems trained with massive parallel data (tens of millions of sentence pairs), the attacks are still successful (over 50% success rate) with only a 0.006% poisoning rate. Lastly, we discuss available defences to counter such attacks. |
Title: A Trigger-Sense Memory Flow Framework for Joint Entity and Relation Extraction |
Authors: Yongliang Shen (Zhejiang University), Xinyin Ma (Zhejiang University), Yechun Tang (Zhejiang University) and Weiming Lu (Zhejiang University). |
Joint entity and relation extraction frameworks construct a unified model that performs entity recognition and relation extraction simultaneously, exploiting the dependency between the two tasks to mitigate the error propagation problem suffered by pipeline models. Current efforts on joint entity and relation extraction focus on enhancing the interaction between entity recognition and relation extraction through parameter sharing, joint decoding, or other ad-hoc tricks (e.g., modeling the problem as a semi-Markov decision process or casting it as a multi-round reading comprehension task). However, two issues remain on the table. First, the interaction utilized by most methods is still weak and uni-directional, and thus unable to model the mutual dependency between the two tasks. Second, most methods ignore relation triggers, i.e., the words or phrases that help explain why humans would extract a relation from a sentence; triggers are essential for relation extraction but overlooked. To this end, we present a Trigger-Sense Memory Flow Framework (TriMF) for joint entity and relation extraction. We build a memory module to remember category representations learned in the entity recognition and relation extraction tasks, and on top of it we design a multi-level memory flow attention mechanism to enhance the bi-directional interaction between entity recognition and relation extraction. Moreover, without any human annotations, our model can enhance relation trigger information in a sentence through a trigger sensor module, which improves model performance and makes model predictions more interpretable. Experimental results show that our proposed framework achieves state-of-the-art results, improving the relation F1 to 52.44% (+3.2%) on SciERC, 66.49% (+4.9%) on ACE05, 72.35% (+0.6%) on CoNLL04, and 80.66% (+2.3%) on ADE. |
Title: A Workflow Analysis of Context-driven Conversational Recommendation |
Authors: Shengnan Lyu (University of Toronto), Arpit Rana (University of Toronto), Scott Sanner (University of Toronto) and Mohamed Reda Bouadjenek (Deakin University). |
A number of recent works have made seminal contributions to the understanding of user intent and recommender interaction in conversational recommendation. However, to date, these studies have not focused explicitly on context-driven interaction that underlies typical use of more pervasive (non-recommendation) QA conversational assistants like Amazon Alexa, Apple Siri, and Google Assistant. In this paper, we aim to understand a general workflow of natural context-driven conversational recommendation that arises from a pairwise study of a human user interacting with a human simulating the role of a recommender. In our analysis of this intrinsically organic human-to-human conversation, we observe a clear structure of interaction workflow consisting of a preference elicitation and refinement stage, followed by inquiry and critiquing stages after the first recommendation. To better understand the nature of these stages and the conversational flow within them, we augment existing taxonomies of intent and action to label all interactions at each stage and analyze the workflow. From this analysis, we identify distinct conversational characteristics of each stage, e.g., (i) the preference elicitation stage consists of significant iteration to clarify, refine, and obtain a mutual understanding of preferences, (ii) the inquiry and critiquing stage consists of extensive informational queries to understand features of the recommended item and to (implicitly) specify critiques, and (iii) explanation appears to drive a substantial portion of the post-recommendation interaction, suggesting that beyond the purpose of justification, explanation serves a critical role to direct the evolving conversation itself. Altogether, we contribute a novel qualitative and quantitative analysis of workflow in conversational recommendation that further refines our existing understanding of this important frontier of conversational systems and suggests a number of critical avenues for further research to better automate natural recommendation conversations. |
Title: Adapting to Context-Aware Knowledge in Natural Conversation for Multi-Turn Response Selection |
Authors: Chen Zhang (Alibaba Group), Hao Wang (AI Labs Alibaba Group), Feijun Jiang (Alibaba Group) and Hongzhi Yin (The University of Queensland). |
Virtual assistants aim to build a human-like conversational agent. However, current human-machine conversations still do not feel intelligent enough to users to sustain a continued dialog over time: responses from agents are often inconsistent, uninformative, unengaging, and even memoryless. In recent years, most researchers have tried to incorporate conversation context and external knowledge, e.g., wiki pages and knowledge graphs, into models that only address specific conversation problems from a local perspective. Few researchers are dedicated to the overall capability of a conversational agent endowed with the ability not only to react passively to the conversation but also to lead it proactively. In this paper, we first explore the essence of conversations among humans by analyzing real dialog records. We find that some conversations revolve around the same context and topic, while others require additional information or even move on to a new topic. Based on this, we identify three conversation modes (shown in Figure 1 of the paper) and study how to adapt to them for a continuous conversation. To this end, we define ``Adaptive Knowledge-Grounded Conversations'' (AKGCs), in which knowledge grounds the conversation within a multi-turn context by adapting to the three modes. To achieve AKGC, we propose a model called MNDB that Models Natural Dialog Behaviors for multi-turn response selection. To ensure a consistent response, MNDB constructs a multi-turn context flow. Then, to mimic user behaviors of incorporating knowledge in natural conversations, we design a ternary-grounding network along the context flow. In this network, to gain the ability to adapt to diversified conversation modes, we exploit multi-view semantic relations among response candidates, context, and knowledge. Thus, three adaptive matching signals are extracted for final response selection. Evaluation results on two benchmarks indicate that MNDB significantly outperforms state-of-the-art models. |
Title: Advanced Semantics for Commonsense Knowledge Extraction |
Authors: Tuan-Phong Nguyen (Max Planck Institute for Informatics), Simon Razniewski (Max Planck Institute for Informatics) and Gerhard Weikum (Max Planck Institute for Informatics). |
Structured commonsense knowledge about concepts and their properties is a foundational asset for building reliable AI applications. Previous projects like ConceptNet, TupleKB, or Quasimodo have compiled substantial collections of such knowledge, yet are restricted in terms of a simple data model and limited precision and/or recall. In this paper we present a methodology and resource called ASCENT, advanced semantics for commonsense knowledge extraction. ASCENT advances knowledge representation in CSKBs by enabling subgroups and aspects of concepts to be subjects. Beyond existing quantitative rankings of KB assertions, ASCENT also introduces the notion of qualitative facets of assertions, allowing it to capture, for instance, the location or time during which an assertion is true, or truth modifiers such as occasionally or frequently. Although these components are in principle known in knowledge representation and semantic role labelling, to our knowledge this is the first time that such knowledge in this format has actually been acquired at scale for commonsense. The extraction approach of ASCENT relies on a combination of automatically scoped web document retrieval and filtering, dependency-based open information extraction, and a consolidation stage relying on pretrained language models. Intrinsic and extrinsic evaluations point to the superior precision and recall of ASCENT. A web interface, data, and code can be found at https://ascentkb.herokuapp.com. |
Title: Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation |
Authors: Zhe Xie (Shanghai Jiao Tong University), Chengxuan Liu (Shanghai Jiao Tong University), Yichi Zhang (Shanghai Jiao Tong University), Hongtao Lu (Shanghai Jiao Tong University), Dong Wang (Shanghai Jiao Tong University) and Yue Ding (Shanghai Jiao Tong University). |
Sequential recommendation, as an emerging topic, has attracted increasing attention due to its practical significance. Models based on deep learning and attention mechanisms have achieved good performance in sequential recommendation. Recently, generative models based on the Variational Autoencoder (VAE) have shown unique advantages in collaborative filtering. In particular, the Sequential VAE model, a recurrent version of VAE, can effectively capture temporal dependencies among items in a user's sequence and perform sequential recommendation. However, VAE-based models suffer from a common limitation: the expressive power of the obtained approximate posterior distribution is limited, resulting in lower-quality generated samples. This is especially true for sequence generation. To address this problem, we propose a novel method called Adversarial and Contrastive Variational Autoencoder (ACVAE) for sequential recommendation. Specifically, we first employ a contrastive loss: by maximizing the mutual information between input sequences and latent variables, the latent variables learn more personalized characteristics for different users. Then, we introduce adversarial training for sequence generation under the Adversarial Variational Bayes framework, which enables our model to generate high-quality latent variables. Besides, we apply a convolutional layer to capture local relationships between adjacent latent variables of items in the sequence. Finally, we conduct extensive experiments on three real-world datasets. The experimental results show that our proposed ACVAE model outperforms other state-of-the-art methods. |
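The mutual-information term can be made concrete with the standard InfoNCE bound (a common choice for such contrastive losses; the paper's exact loss may differ). With one positive $z^{+}$ and $N-1$ negatives:

```latex
\mathcal{L}_{\mathrm{NCE}} \;=\; -\,\mathbb{E}\left[\log \frac{\exp f(x, z^{+})}{\sum_{j=1}^{N} \exp f(x, z_{j})}\right], \qquad I(x; z) \;\ge\; \log N \;-\; \mathcal{L}_{\mathrm{NCE}}
```

so minimizing the contrastive loss tightens a lower bound on the mutual information between sequences $x$ and latent variables $z$.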
Title: Adversarial Item Promotion: Vulnerabilities at the Core of Top-N Recommenders that Use Images to Address Cold Start |
Authors: Zhuoran Liu (Radboud University) and Martha Larson (Radboud University and Delft University of Technology). |
E-commerce platforms provide their customers with ranked lists of recommended items matching the customers' preferences. Merchants on e-commerce platforms would like their items to appear as high as possible in the top-N of these ranked lists. In this paper, we demonstrate how unscrupulous merchants can create item images that artificially promote their products, improving their rankings. Recommender systems that use images to address the cold-start problem are vulnerable to this security risk. We describe a new type of attack, Adversarial Item Promotion (AIP), that strikes directly at the core of top-N recommenders: the ranking mechanism itself. Existing work on adversarial images in recommender systems investigates the implications of conventional attacks, which target deep learning classifiers. In contrast, our AIP attacks are embedding attacks that push feature representations in a way that fools the ranker (not a classifier) and directly leads to item promotion. We introduce three AIP attacks: insider attack, expert attack, and semantic attack, defined with respect to three successively more realistic attack models. Our experiments evaluate the danger of these attacks when mounted against three representative visually-aware recommender algorithms in a framework that uses images to address cold start. We also evaluate potential defenses, including adversarial training, and find that common, currently existing techniques do not eliminate the danger of AIP attacks. In sum, we show that using images to address cold start opens recommender systems to potential threats with clear practical implications. |
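To make the idea of an embedding attack concrete, here is a minimal gradient-based sketch (our illustration with assumed helper names; not the authors' insider/expert/semantic attacks): perturb the item image so its visual embedding moves toward a direction known to raise ranking scores.

```python
# Sketch of an embedding attack; `feature_extractor` and `target_direction`
# (e.g., a popular item's embedding) are hypothetical inputs.
import torch

def aip_attack(image, feature_extractor, target_direction,
               epsilon=8 / 255, steps=10, lr=0.01):
    """Gradient ascent on the cosine between the item's visual embedding
    and a score-increasing direction, under an L-infinity budget."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        emb = feature_extractor(image + delta)
        score = torch.cosine_similarity(emb, target_direction, dim=-1).mean()
        score.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()   # FGSM-style ascent step
            delta.clamp_(-epsilon, epsilon)   # keep the perturbation small
            delta.grad.zero_()
    return (image + delta).detach()
```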
Title: AID: Active Distillation Machine to Leverage Pre-Trained Black-Box Models in Private Data Settings |
Authors: Nghia Hoang (MIT-IBM Watson AI Lab, IBM Research), Shenda Hong (Peking University), Cao Xiao (IQVIA), Bryan Low (National University of Singapore) and Jimeng Sun (Georgia Institute of Technology). |
This paper presents an active distillation method for a local institution (e.g., a hospital) to find the best queries within a given budget to distill an on-server black-box model's predictive knowledge into a local surrogate with transparent parameterization. This allows local institutions to better understand the predictive reasoning of the black-box model in their own local context, or to further customize the distilled knowledge with private datasets that cannot be centralized and fed into the server model. The proposed method thus addresses several challenges of deploying machine learning (ML) in many industrial settings (e.g., healthcare analytics) with strong proprietary constraints. These include: (1) the opaqueness of the server model's architecture, which prevents local users from understanding its predictive reasoning in their local data contexts; (2) the increasing cost and risk of uploading local data to the cloud for analysis; and (3) the need to customize the server model with private onsite data. We evaluated the proposed method on both benchmark and real-world healthcare data, where significant improvements over existing local distillation methods were observed. A theoretical analysis of the proposed method is also presented. |
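The core distillation step can be sketched as follows (our PyTorch sketch under stated assumptions; the paper's contribution, the active selection of which queries to send, is not shown): match the surrogate's softened predictions to the probabilities the black-box API returns.

```python
# Minimal distillation step: `teacher_probs` are probability vectors
# returned by the black-box API for the queried batch `x` (assumption).
import torch
import torch.nn.functional as F

def distill_step(student, teacher_probs, x, optimizer, T=2.0):
    student_log_probs = F.log_softmax(student(x) / T, dim=-1)
    # KL(teacher || student), the standard distillation objective.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```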
Title: An Adversarial Transfer Network for Knowledge Representation Learning |
Authors: Huijuan Wang (Sun Yat-sen University), Shuangyin Li (Department of Computer Science, South China Normal University) and Rong Pan (Sun Yat-sen University). |
Knowledge representation learning has received a lot of attention in the past few years. The success of existing methods heavily relies on the quality of knowledge graphs. The entities with few triplets tend to be learned with less expressive power. Fortunately, there are many knowledge graphs constructed from various sources, the representations of which could contain much information. We propose an adversarial embedding transfer network ATransN, which transfers knowledge from one or more teacher knowledge graphs to a target one through an aligned entity set without explicit data leakage. Specifically, we add soft constraints on aligned entity pairs and neighbours to the existing knowledge representation learning methods. To handle the problem of possible distribution differences between teacher and target knowledge graphs, we introduce an adversarial adaption module. The discriminator of this module evaluates the degree of consistency between the embeddings of an aligned entity pair. The consistency score is then used as the weights of soft constraints. It is not necessary to acquire the relations and triplets in teacher knowledge graphs because we only utilize the entity representations. Knowledge graph completion results show that ATransN achieves better performance against baselines without transfer on three datasets, CN3l, WK3l, and DWY100k. The ablation study demonstrates that ATransN can bring steady and consistent improvement in different settings. The extension of combining other knowledge graph embedding algorithms and the extension with three teacher graphs display the promising generalization of the adversarial transfer network. |
Title: An Alternative Cross Entropy Loss for Learning-to-Rank |
Authors: Sebastian Bruch (Google). |
Listwise learning-to-rank methods form a powerful class of ranking algorithms that are widely adopted in applications such as information retrieval. These algorithms learn to rank a set of items by optimizing a loss that is a function of the entire set---as a surrogate to a typically non-differentiable ranking metric. Despite their empirical success, existing listwise methods are based on heuristics and remain theoretically ill-understood. In particular, none of the empirically successful loss functions are related to ranking metrics. In this work, we propose a cross entropy-based learning-to-rank loss function that is theoretically sound, is a convex bound on NDCG---a popular ranking metric---and is consistent with NDCG under learning scenarios common in information retrieval. Furthermore, empirical evaluation of an implementation of the proposed method with gradient boosting machines on benchmark learning-to-rank datasets demonstrates the superiority of our proposed formulation over existing algorithms in quality and robustness. |
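As background, a generic listwise cross entropy of this family compares a relevance-induced distribution with the softmax of model scores (our generic form for illustration; the paper's construction is more refined so as to bound NDCG):

```latex
\ell(y, s) \;=\; -\sum_{i=1}^{n} \frac{2^{y_i} - 1}{\sum_{j=1}^{n} \big(2^{y_j} - 1\big)} \,\log \frac{\exp(s_i)}{\sum_{j=1}^{n} \exp(s_j)}
```

where $y_i$ are graded relevance labels and $s_i$ the model's scores for the $n$ items in the list.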
Title: An Empirical Study of Real-World WebAssembly Binaries: Security, Languages, Use Cases |
Authors: Aaron Hilbig (University of Stuttgart), Daniel Lehmann (University of Stuttgart) and Michael Pradel (University of Stuttgart). |
WebAssembly has emerged as a low-level language for the web and beyond. Despite its popularity in different domains, little is known about WebAssembly binaries that occur in the wild. This paper presents a comprehensive empirical study of 8,461 unique WebAssembly binaries gathered from a wide range of sources, including source code repositories, package managers, and live websites. We study the security properties, source languages, and use cases of the binaries and how they influence the security of the WebAssembly ecosystem. Our findings update some previously held assumptions about real-world WebAssembly and highlight problems that call for future research. For example, we show that vulnerabilities that propagate from insecure source languages potentially affect a wide range of binaries (e.g., two thirds of the binaries are compiled from memory unsafe languages, such as C and C++) and that 21% of all binaries import potentially dangerous APIs from their host environment. We also show that cryptomining, which once accounted for the majority of all WebAssembly code, has been marginalized (less than 1% of all binaries found on the web) and gives way to a diverse set of use cases. Finally, 29% of all binaries on the web are minified, calling for techniques to decompile and reverse engineer WebAssembly. Overall, our results show that WebAssembly has left its infancy and is growing up into a language that powers a diverse ecosystem, with new challenges and opportunities for security researchers and practitioners. Besides these insights, we also share the dataset underlying our study, which is 58x larger than the largest previously reported benchmark. |
Title: An Experimental Study to Understand User Experience and Perception Bias Occurred by Fact-checking Messages |
Authors: Sungkyu Park (Institute for Basic Science (IBS)), Jaimie Yejean Park (Samsung Electronics), Hyojin Chin (Institute for Basic Science (IBS)), Jeong-Han Kang (Yonsei University) and Meeyoung Cha (Institute for Basic Science (IBS)). |
Fact-checking has become the de facto solution for fighting fake news online. This research brings attention to the unexpected and diminished effect of fact-checking due to cognitive biases. We ran an experiment (66,870 decisions) comparing the change in users' stance toward unproven claims before and after they were presented with a hypothetical fact-checked condition. The study shows that claims marked with the 'Lack of Evidence' label are perceived similarly to false information, unlike other borderline labels such as 'Mixed Evidence' or 'Divided Evidence,' which indicates an uncertainty-aversion bias in response to insufficient information. Next, users who initially show disapproval toward a claim are less likely to correct their views later than those who initially approve of the same claim when opposite fact-checking labels are shown - an indication of disapproval bias. On average, we confirm that fact-checking helps users correct their views and reduces the circulation of falsehoods by leading them to abandon extreme views. Despite this positive role, the presence of the two biases, uncertainty aversion and disapproval bias, reveals that fact-checking does not always produce the desired user experience and that the outcome varies with the design of fact-checking messages and people's initial views. These new observations have direct implications for multiple stakeholders, including platforms, policy-makers, and online users. |
Title: An Investigation of Identity-Account Inconsistency in Single Sign-On |
Authors: Guannan Liu (University of Delaware), Xing Gao (University of Delaware) and Haining Wang (University of Delaware). |
Single Sign-On (SSO) has been widely adopted for online authentication due to its favorable usability and security. However, it also introduces a single point of failure, since all service providers fully trust the identity of a user created by the SSO identity provider. In this paper, we investigate the identity-account inconsistency threat, a new SSO vulnerability that can cause the compromise of online accounts. The vulnerability exists because current SSO systems rely heavily on a user's email address to bind an account with a real identity, but ignore the fact that email addresses might be reused by other users. We reveal that under SSO authentication, such inconsistency allows an adversary controlling a reused email address to take over associated online accounts without knowing any credentials, such as passwords. Specifically, we first conduct a measurement study on the account management policies of multiple cloud email providers, showing the feasibility of acquiring previously used email accounts. We further perform a systematic study of 100 popular websites using the Google business email service with our own domain address and demonstrate that most online accounts can be compromised by exploiting this inconsistency vulnerability. To shed light on email reuse in the wild, we analyze the commonly used naming conventions that lead to a wide existence of potential email address collisions, and conduct a case study on the account policies of U.S. universities. Finally, we propose several useful practices for end-users, service providers, and identity providers to protect against this identity-account inconsistency threat. |
Title: Assessing the Effects of Friend-to-Friend Texting on Turnout in the 2018 US Midterm Elections |
Authors: Aaron Schein (Columbia University), Keyon Vafa (Columbia University), Dhanya Sridhar (Columbia University), Victor Veitch (Columbia University), Jeffrey Quinn (PredictWise), James Moffet (JDM Design), David M. Blei (Columbia University) and Donald P. Green (Columbia University). |
Recent mobile app technology lets people systematize the process of messaging their friends to urge them to vote. Prior to the most recent US midterm elections in 2018, the mobile app Outvote randomized an aspect of their system, hoping to unobtrusively assess the causal effect of their users' messages on voter turnout. However, properly assessing this causal effect is hindered by multiple statistical challenges, including attenuation bias due to mismeasurement of subjects' outcomes and low precision due to two-sided non-compliance with subjects' assignments. We address these challenges, which are likely to impinge upon any study that seeks to randomize authentic friend-to-friend interactions, by tailoring the statistical methodology to additional data available in this case about both users and subjects. Using meta-data of users' in-app behavior, we reconstruct subjects' positions in users' queues. We use this information to refine the study population to more compliant subjects who were higher in the queues, and we do so in a systematic way which optimizes a proxy for the study's power. To mitigate attenuation bias, we then use ancillary data of subjects' matches to the voter rolls that lets us refine the study population to one with low rates of outcome mismeasurement. Our analysis reveals statistically significant treatment effects from friend-to-friend mobilization efforts (CACE = 8.3, CI = (1.2, 15.3)) that are among the largest reported in the get-out-the-vote (GOTV) literature. While social pressure from friends has long been conjectured to play a role in effective GOTV treatments, the present study is among the first to assess these effects experimentally. |
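For readers unfamiliar with the acronym: the complier average causal effect (CACE) reported above is conventionally estimated with the Wald (instrumental-variables) ratio, with assignment $Z$, realized treatment $D$, and turnout $Y$ (the conventional estimator, not necessarily the paper's exact specification):

```latex
\widehat{\mathrm{CACE}} \;=\; \frac{\widehat{\mathbb{E}}[\,Y \mid Z=1\,] - \widehat{\mathbb{E}}[\,Y \mid Z=0\,]}{\widehat{\mathbb{E}}[\,D \mid Z=1\,] - \widehat{\mathbb{E}}[\,D \mid Z=0\,]}
```

The denominator rescales the intent-to-treat effect by the compliance rate, which is why two-sided non-compliance reduces precision.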
Title: ATJ-Net: Auto-Table-Join Network for Automatic Learning on Relational Databases |
Authors: Jinze Bai (Peking University), Jialin Wang (Peking University), Zhao Li (Alibaba Group), Donghui Ding (Alibaba Group), Ji Zhang (The University of Southern Queensland) and Jun Gao (Peking University). |
A relational database, consisting of multiple tables, provides heterogeneous information across various entities and is widely used in real-world services. This paper studies the supervised learning task on multiple tables, aiming to predict one label column with the help of multi-table data. However, classical ML techniques mainly focus on single-table data. Multi-table data involves many-to-many mappings among joinable attributes and n-ary relations, which cannot be utilized directly by classical ML techniques. Besides, current graph techniques, like heterogeneous information networks (HINs) and graph neural networks (GNNs), cannot be deployed directly and automatically in a multi-table environment, which limits learning on databases. For automatic learning on relational databases, we propose an auto-table-join network (ATJ-Net). Multiple tables with relationships are considered as a hypergraph, where vertices are joinable attributes and hyperedges are tuples of tables. ATJ-Net then builds a graph neural network on the heterogeneous hypergraph, which samples and aggregates the vertices and hyperedges of n-hop sub-graphs as the receptive field. In order to enable ATJ-Net to be deployed automatically on different datasets and avoid the "no free lunch" dilemma, we use random architecture search to select optimal aggregators and prune redundant paths in the network. To verify the effectiveness of our method across various tasks and schemas, we conduct extensive experiments on 4 tasks, 8 different schemas, and 19 sub-datasets covering citation prediction, review classification, recommendation, and a task-blind challenge. ATJ-Net achieves the best performance over state-of-the-art approaches on three tasks and is competitive with the KDD Cup winning solution on the task-blind challenge. Besides, our approach can be deployed automatically on databases satisfying the first normal form (1NF). |
Title: ATTENT: Active Attributed Network Alignment |
Authors: Qinghai Zhou (University of Illinois at Urbana-Champaign), Liangyue Li (Amazon), Xintao Wu (University of Arkansas), Nan Cao (Tongji University), Lei Ying (University of Michigan, Ann Arbor) and Hanghang Tong (University of Illinois at Urbana-Champaign). |
Network alignment finds node correspondences across multiple networks, where the alignment accuracy is of crucial importance because of its profound impact on downstream applications. The vast majority of existing works focus on how to best utilize the topology and attribute information of the input networks, as well as the anchor links when available. Nonetheless, with a few exceptions, how to boost alignment performance by actively obtaining high-quality and informative anchor links has not been well studied. The sparse literature on active network alignment introduces a human in the loop to label some seed node correspondences (i.e., anchor links), which are deemed informative from the perspective of querying the most uncertain node given a few potential matchings. However, the direct influence of the intrinsic network attribute information on the alignment results has largely remained unknown. In this paper, we tackle this challenge and propose an active network alignment method (Attent) to identify the best nodes to query. The key idea of the proposed method is to leverage effective and efficient influence functions defined over the alignment solution to evaluate the goodness of the candidate nodes for query. Our proposed query strategy bears three distinct advantages: (1) effectiveness, being able to accurately quantify the influence of the candidate nodes on the alignment results; (2) efficiency, scaling linearly with a 15-17x speed-up over the straightforward implementation without any quality loss; (3) generality, consistently improving the alignment performance of a variety of network alignment algorithms. |
Title: Auction Design for ROI-Constrained Buyers |
Authors: Negin Golrezaei (MIT), Ilan Lobel (NYU) and Renato Paes Leme (Google). |
We combine theory and empirics to (i) show that some buyers in online advertising markets are financially constrained and (ii) demonstrate how to design auctions that take into account such financial constraints. We use data from a field experiment where reserve prices were randomized on Google’s advertising exchange (AdX). We find that, contrary to the predictions of classical auction theory, a significant set of buyers lowers their bids when reserve prices go up. We show that this behavior can be explained if we assume buyers have constraints on their minimum return on investment (ROI). We proceed to design auctions for ROI-constrained buyers. We show that optimal auctions for symmetric ROI-constrained buyers are either second-price auctions with reduced reserve prices or subsidized second-price auctions. For asymmetric buyers, the optimal auction involves a modification of virtual values. Going back to the data, we show that using ROI-aware optimal auctions can lead to large revenue gains and large welfare gains for buyers. |
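The headline empirical finding, bids falling as reserves rise, can be reproduced in a toy simulation (ours, not the paper's model): a bidder who maximizes won value subject to a minimum-ROI constraint in a second-price auction with reserve is forced to shade its bid downward when the reserve, and hence its average payment, increases. All parameters below are invented.

    import numpy as np

    rng = np.random.default_rng(1)
    v = 1.0                        # bidder's value per impression
    others = rng.random(200_000)   # highest competing bid, uniform on [0, 1]

    def outcome(bid, reserve):
        price = np.maximum(others, reserve)            # second-price with reserve
        win = bid >= price
        return v * win.mean(), (price * win).mean()    # (expected value, payment)

    def best_bid(reserve, min_roi=2.0):
        feasible = []
        for b in np.linspace(0.0, 1.0, 101):
            val, pay = outcome(b, reserve)
            if pay == 0 or val / pay >= min_roi:       # the ROI constraint
                feasible.append((val, b))
        return max(feasible)[1]                        # value-maximizing feasible bid

    print(best_bid(0.1), best_bid(0.4))   # the optimal bid drops as the reserve rises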
Title: Auditing for Discrimination in Algorithms Delivering Job Ads |
Authors: Basileal Imana (University of Southern California), Aleksandra Korolova (University of Southern California) and John Heidemann (USC/ISI). |
Platforms such as Facebook and Google support targeted advertising, promising good value for advertisers. However, multiple studies have shown that such platforms can create skewed outcomes: their ad-delivery algorithms can produce outcomes that are skewed by gender or race, sometimes due to hidden factors not explicitly requested by the advertiser. In this work, we focus on developing a methodology for measuring potential skew in the delivery of job advertisements. A challenge in studying skew of job ads is distinguishing between skew due to a difference in qualification among the underlying audience and skew due to other factors such as the ad platform’s optimization for engagement or changes in the on-line audience. Our work provides a novel methodology that controls for job qualification and audience with paired, concurrent ads and careful statistical tests. We apply our algorithm to two prominent platforms for job ads: Facebook and LinkedIn, confirming skew by gender in Facebook’s results and failing to find skew in LinkedIn’s results. We further examine LinkedIn and show that they do not optimize on the professional background of users unless explicitly requested by the ad purchaser—a choice that reduces the possibility of discriminatory outcomes. Finally, we suggest improvements ad platforms could introduce to make external auditing more efficient and accurate. |
Title: Autodidactic Neurosurgeon: Collaborative Deep Inference for Mobile Edge Intelligence via Online Learning |
Authors: Letian Zhang (University of Miami), Lixing Chen (University of Miami) and Jie Xu (University of Miami). |
Recent breakthroughs in deep learning (DL) have led to the emergence of many intelligent mobile applications and services, but at the same time pose unprecedented computing challenges for resource-constrained mobile devices. This paper builds a collaborative deep inference system between a resource-constrained mobile device and a powerful edge server, aiming to join the power of both on-device processing and computation offloading. The basic idea of this system is to partition a deep neural network (DNN) into a front-end part running on the mobile device and a back-end part running on the edge server, with the key challenge being how to locate the optimal partition point to minimize the end-to-end inference delay. Unlike existing efforts on DNN partitioning that rely heavily on a dedicated offline profiling stage to search for the optimal partition point, our system has a built-in online learning module, called Autodidactic Neurosurgeon (ANS), to automatically learn the optimal partition point on the fly. Therefore, ANS is able to closely follow changes in the system environment by generating new knowledge for adaptive decision making. The core of ANS is a novel contextual bandit learning algorithm, called μLinUCB, which not only has a provable theoretical learning performance guarantee but is also ultra-lightweight for easy real-world implementation. We implement our system on a video stream object detection testbed to validate the design of ANS and evaluate its performance. The experiments show that ANS significantly outperforms state-of-the-art benchmarks in terms of tracking system changes and reducing the end-to-end inference delay. |
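For readers unfamiliar with the underlying machinery, below is a generic LinUCB contextual bandit of the family μLinUCB belongs to (a sketch under our own simplifications, not the authors' algorithm); the arms would be candidate DNN partition points and the context a feature vector describing current device, network, and server load.

    import numpy as np

    class LinUCB:
        def __init__(self, n_arms, dim, alpha=1.0):
            self.alpha = alpha
            self.A = [np.eye(dim) for _ in range(n_arms)]     # per-arm covariance
            self.b = [np.zeros(dim) for _ in range(n_arms)]   # per-arm reward sums

        def select(self, x):
            # upper confidence bound: estimated reward plus exploration bonus
            scores = []
            for A, b in zip(self.A, self.b):
                A_inv = np.linalg.inv(A)
                theta = A_inv @ b
                scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
            return int(np.argmax(scores))

        def update(self, arm, x, reward):
            self.A[arm] += np.outer(x, x)
            self.b[arm] += reward * x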
Title: Automated Creative Optimization for E-Commerce Advertising |
Authors: Jin Chen (University of Electronic Science and Technology of China), Ju Xu (Alibaba Group), Gangwei Jiang (University of Science and Technology of China), Tiezheng Ge (Alibaba Group), Zhiqiang Zhang (Alibaba Group), Defu Lian (University of Science and Technology of China) and Kai Zheng (University of Electronic Science and Technology of China). |
Advertising creatives are ubiquitous in E-commerce advertisements, and aesthetically pleasing creatives may improve the click-through rate (CTR) of the products. Nowadays, smart advertising platforms provide the function of compositing creatives based on source materials provided by advertisers. Since a great number of creatives can be generated, it is difficult to accurately predict their CTR given a limited amount of feedback. Factorization machines (FM), which model inner-product interactions between features, can be applied to the CTR prediction of creatives. However, interactions between creative elements may be more complex than the inner product, and the FM-estimated CTR may be of high variance due to limited feedback. To address these two issues, we propose an Automated Creative Optimization (AutoCO) framework to model complex interactions between creative elements and to balance exploration and exploitation. Specifically, motivated by AutoML, we propose one-shot search algorithms for searching effective interaction functions between elements. We then develop stochastic variational inference to estimate the posterior distribution of parameters based on the reparameterization trick, and apply Thompson Sampling for efficiently exploring potentially better creatives. We evaluate the proposed method with both a synthetic dataset and two public datasets. The experimental results show our method can outperform competing baselines with respect to cumulative regret. The online A/B test shows our method leads to a 7% increase in CTR compared to the baseline. |
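The exploration side can be illustrated with the simplest possible variant: Thompson Sampling over candidate creatives with Beta-Bernoulli posteriors. This is a deliberately reduced stand-in for the paper's variational posterior over interaction-function parameters; creative IDs and priors below are invented.

    import random

    class ThompsonCTR:
        def __init__(self, creatives):
            self.stats = {c: [1, 1] for c in creatives}   # Beta(alpha, beta) priors

        def choose(self):
            # sample a plausible CTR for each creative, show the argmax
            draws = {c: random.betavariate(a, b) for c, (a, b) in self.stats.items()}
            return max(draws, key=draws.get)

        def feedback(self, creative, clicked):
            a, b = self.stats[creative]
            self.stats[creative] = [a + clicked, b + (1 - clicked)]

    bandit = ThompsonCTR(["creative_A", "creative_B", "creative_C"])
    shown = bandit.choose()
    bandit.feedback(shown, clicked=1)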
Title: Automatic Intent-Slot Induction for Dialogue Systems |
Authors: Zengfeng Zeng (Ping An Life Insurance of China, Ltd), Dan Ma (Ping An Life Insurance of China, Ltd), Haiqin Yang (Ping An Life Insurance of China, Ltd), Zhen Gou (Ping An Life Insurance of China, Ltd) and Jianping Shen (Ping An Life Insurance of China, Ltd). |
Automatically and accurately identifying user intents and filling the associated slots from their spoken language are critical to the success of dialogue systems. Traditional methods require manually defining the DOMAIN-INTENT-SLOT schema and asking many domain experts to annotate the corresponding utterances, upon which neural models are trained. This procedure brings the challenges of hindered information sharing, out-of-schema utterances, and data sparsity in open-domain dialogue systems. To tackle these challenges, we explore a new task of automatic intent-slot induction and propose a novel domain-independent tool. That is, we design a coarse-to-fine three-step procedure comprising Role-labeling, Concept-mining, And Pattern-mining (RCAP): (1) role-labeling: extracting key phrases from users' utterances and classifying them into a quadruple of coarsely-defined intent-roles via sequence labeling; (2) concept-mining: clustering the extracted intent-role mentions and naming them into abstract fine-grained concepts; (3) pattern-mining: applying the Apriori algorithm to mine intent-role patterns and automatically inferring the intent-slot using these coarse-grained intent-role labels and fine-grained concepts. Empirical evaluations on both real-world in-domain and out-of-domain datasets show that: (1) our RCAP can generate a satisfactory SLU schema and outperforms the state-of-the-art supervised learning method; (2) our RCAP can be directly applied to out-of-domain datasets and gains at least 76% improvement in F1 score on intent detection and 41% improvement in F1 score on slot filling; (3) our RCAP exhibits its power in generic intent-slot extraction with less manual effort, which opens pathways for schema induction on new domains and unseen intent-slot discovery for generalizable dialogue systems. |
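For orientation, here is a compact Apriori pass of the kind step (3) applies (the standard algorithm without the subset-pruning optimization, not the authors' code). Each "transaction" is the set of coarse intent-role labels observed in one utterance; the labels below are invented examples.

    from itertools import combinations

    def apriori(transactions, min_support=2):
        items = {i for t in transactions for i in t}
        frequent = []
        candidates = [frozenset([i]) for i in sorted(items)]
        while candidates:
            counts = {s: sum(s <= t for t in transactions) for s in candidates}
            survivors = [s for s, c in counts.items() if c >= min_support]
            frequent += survivors
            # join step: merge surviving k-sets into (k+1)-item candidates
            candidates = list({a | b for a, b in combinations(survivors, 2)
                               if len(a | b) == len(a) + 1})
        return frequent

    utterances = [{"ACTION", "OBJECT"}, {"ACTION", "OBJECT", "CONDITION"},
                  {"ACTION", "CONDITION"}]
    print(apriori([frozenset(u) for u in utterances]))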
Title: AutoSTG: Neural Architecture Search for Predictions of Spatio-Temporal Graph |
Authors: Zheyi Pan (Shanghai Jiao Tong University), Songyu Ke (Shanghai Jiao Tong University), Xiaodu Yang (Southwest Jiaotong University), Yuxuan Liang (National University of Singapore), Yong Yu (Shanghai Jiao Tong University), Junbo Zhang (JD Intelligent Cities Research) and Yu Zheng (JD Finance). |
Spatio-temporal graphs are important structures for describing urban sensory data, e.g., traffic speed and air quality. Predicting over spatio-temporal graphs enables many essential applications in intelligent cities, such as traffic management and environment analysis. Recently, many deep learning models have been proposed for spatio-temporal graph prediction and have achieved significant results. However, designing such neural networks requires rich domain knowledge and expert effort. To this end, we study automated neural architecture search for spatio-temporal graphs with application to urban traffic prediction, which faces two challenges: 1) how to define a search space for capturing complex spatio-temporal correlations; and 2) how to learn network weight parameters related to the corresponding attributed graph of a spatio-temporal graph. To tackle these challenges, we propose a novel framework, entitled AutoSTG, for automated spatio-temporal graph prediction. In AutoSTG, spatial graph convolution and temporal convolution operations are adopted in the search space to capture complex spatio-temporal correlations. Besides, we employ the meta-learning technique to learn the adjacency matrices of spatial graph convolution layers and the kernels of temporal convolution layers from the meta knowledge of the attributed graph. Specifically, such meta knowledge is learned by a graph meta knowledge learner that iteratively aggregates knowledge on the attributed graph. Finally, extensive experiments were conducted on two real-world benchmark datasets to demonstrate that AutoSTG can find effective network architectures and achieve state-of-the-art results. To the best of our knowledge, we are the first to study neural architecture search for spatio-temporal graphs. |
Title: BBR Bufferbloat in DASH Video |
Authors: Santiago Vargas (Stony Brook University), Rebecca Drucker (Stony Brook University), Aiswarya Renganathan (Stony Brook University), Aruna Balasubramanian (Stony Brook University) and Anshul Gandhi (Stony Brook University). |
BBR is a new congestion control algorithm that is seeing increased adoption, especially for video traffic. BBR solves the bufferbloat problem of legacy loss-based congestion control algorithms, where application performance drops considerably when router buffers are deep. BBR regulates traffic such that router queues don't build up, avoiding the bufferbloat problem while still maintaining high throughput. However, our analysis shows that video applications experience significantly degraded performance when using BBR under deep buffers. In fact, we find that video traffic sees inflated latencies because of long queues at the router, ultimately degrading video performance. To understand this dichotomy, we study the interaction between BBR and DASH video. Our investigation reveals that, under deep buffers and high network burstiness, BBR severely overestimates available bandwidth and does not converge to steady state, both of which result in BBR sending substantially more data into the network, causing a queue buildup. This elevated packet sending rate under BBR is ultimately caused by the router's ability to absorb bursts in traffic, which destabilizes BBR's bandwidth estimation and overrides BBR's expected logic for exiting the startup phase. We design a new bandwidth estimation algorithm and apply it to BBR (and a still-unreleased newer version of BBR called BBR2). Our modified BBR and BBR2 both see significantly improved video QoE even under deep buffers. |
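BBR's bandwidth estimate is, at its core, a windowed maximum over recent delivery-rate samples. The sketch below (our simplification of public BBR descriptions, not the authors' modified estimator) hints at why a burst-absorbing deep buffer is problematic: a max filter latches onto the transiently inflated samples that bursts produce.

    from collections import deque

    class WindowedMax:
        """Track the max delivery-rate sample seen in the last `window` time units."""
        def __init__(self, window):
            self.window = window
            self.samples = deque()   # (timestamp, rate) pairs

        def update(self, t, rate):
            self.samples.append((t, rate))
            while self.samples[0][0] < t - self.window:
                self.samples.popleft()
            return max(r for _, r in self.samples)

    f = WindowedMax(window=10)
    rates = [10, 10, 25, 10, 10]     # one burst-inflated delivery-rate sample
    print([f.update(t, r) for t, r in enumerate(rates)])   # estimate sticks at 25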
Title: Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases |
Authors: Yu Gu (The Ohio State University), Sue Kase (U.S. Army Research Laboratory), Michelle Vanni (U.S. Army Research Laboratory), Brian Sadler (U.S. Army Research Laboratory), Percy Liang (Stanford University), Xifeng Yan (University of California, Santa Barbara) and Yu Su (The Ohio State University). |
Existing studies on question answering over knowledge bases (KBQA) mainly operate with the standard i.i.d. assumption, i.e., the training distribution over questions is the same as the test distribution. However, i.i.d. may be neither reasonably achievable nor desirable on large-scale KBs because 1) the true user distribution is hard to capture and 2) randomly sampling training examples from the enormous space would be highly data-inefficient. Instead, we suggest that KBQA models should have three levels of built-in generalization: i.i.d., compositional, and zero-shot. To facilitate the development of KBQA models with stronger generalization, we construct and release a new large-scale, high-quality dataset with 64,495 questions, GrailQA, and provide evaluation settings for all three levels of generalization. In addition, we propose a novel BERT-based KBQA model. The combination of our dataset and model enables us to thoroughly examine and demonstrate, for the first time, the key role of pre-trained contextual embeddings like BERT in the generalization of KBQA. |
Title: Beyond Outlier Detection: Interpreting Outliers by Attention-Guided Triplet Deviation Network |
Authors: Hongzuo Xu (National University of Defense Technology), Yijie Wang (National University of Defense Technology), Songlei Jian (National University of Defense Technology), Zhenyu Huang (National University of Defense Technology), Yongjun Wang (National University of Defence Technology), Ning Liu (National University of Defense Technology) and Fei Li (Alibaba Cloud Computing Co. Ltd.). |
Outlier detection is an important task in many domains and has been intensively studied over the past decade. Going further, explaining outliers, i.e., outlier interpretation, is even more significant: it can provide valuable insights for analysts to better understand, solve, and prevent the detected outliers. However, only limited studies consider this problem. Most of the existing methods are based on the score-and-search paradigm: they select a feature subspace as the interpretation of each queried outlier by estimating its outlying scores in searched subspaces. Due to the tremendous search space, they have to utilize pruning strategies and set a maximum subspace length, resulting in suboptimal interpretation results. Accordingly, this paper proposes a novel Attention-guided Triplet deviation network for Outlier interpretatioN (ATON). Instead of searching a subspace, ATON directly learns an embedding space and learns how to attach attention to each embedding dimension (i.e., capturing the contribution of each dimension to the outlierness of the queried outlier). Specifically, ATON consists of a feature embedding module and a customized self-attention learning module, which are optimized by a triplet deviation-based loss function. We obtain an optimal attention-guided embedding space with expanded high-level information and rich semantics, so that the outlying behaviors of the queried outlier can be better unfolded. ATON finally distills a subspace of original features from the embedding module and the attention coefficients. Thanks to its good generality, ATON can be employed as an additional step of any black-box outlier detector. A comprehensive suite of experiments is conducted to evaluate the effectiveness and efficiency of ATON. The proposed ATON significantly outperforms state-of-the-art competitors on 12 real-world datasets and scales well w.r.t. both data dimensionality and data size. This is the first work to release ground-truth outlier interpretation annotations for real-world datasets. |
Title: Bid Prediction in Repeated Auctions with Learning |
Authors: Gali Noti (Hebrew University of Jerusalem) and Vasilis Syrgkanis (Microsoft Research). |
We consider the problem of bid prediction in repeated auctions and evaluate the performance of econometric methods for learning agents using a dataset from a mainstream sponsored search auction marketplace. Sponsored search auctions are a billion-dollar industry and the main source of revenue for several tech giants. A critical problem in optimizing such marketplaces is understanding how bidders will react to changes in the auction design. We propose the use of no-regret-based econometrics for bid prediction, modeling players as no-regret learners with respect to a utility function unknown to the analyst. We propose new econometric approaches to simultaneously learn the parameters of a player's utility and her learning rule, and apply these methods to a real-world dataset from the BingAds sponsored search auction marketplace. We show that the no-regret econometric methods perform comparably to state-of-the-art time-series machine learning methods when there is no covariate shift, but significantly outperform machine learning methods when there is a covariate shift between the training and test periods. This underscores the importance of using structural econometric approaches in predicting how players will respond to changes in the market. Moreover, we show that among structural econometric methods, approaches based on no-regret learning outperform more traditional, equilibrium-based, econometric methods that assume that players continuously best-respond to competition. Finally, we demonstrate how the prediction performance of the no-regret learning algorithms can be further improved by considering bidders who optimize a utility function with a visibility bias component. |
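As a reference point for what a "no-regret learner" is here, below is a minimal Hedge (multiplicative-weights) bidder over a discrete bid grid, learning from counterfactual utilities. It is a generic sketch with invented parameters; the paper's econometric method additionally infers the utility's unknown parameters and the learning rule from observed bids.

    import numpy as np

    class HedgeBidder:
        def __init__(self, bid_grid, eta=0.1, seed=0):
            self.bids = np.asarray(bid_grid)
            self.eta = eta
            self.w = np.ones(len(bid_grid))
            self.rng = np.random.default_rng(seed)

        def bid(self):
            p = self.w / self.w.sum()
            return self.rng.choice(self.bids, p=p)

        def observe(self, utilities):
            # utilities[i]: counterfactual utility had bid_grid[i] been played
            self.w *= np.exp(self.eta * np.asarray(utilities))

    bidder = HedgeBidder(np.linspace(0, 1, 11))
    for _ in range(200):
        bidder.bid()   # sample a bid from the current mixed strategy
        # toy utility: value 0.8 minus a clearing price of 0.3, if the bid clears
        bidder.observe([(0.8 - 0.3) * (x >= 0.3) for x in bidder.bids])
    print(bidder.bids[bidder.w.argmax()])   # weight concentrates on clearing bids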
Title: Bidirectional Distillation for Top-K Recommender System |
Authors: Wonbin Kweon (Pohang University of Science and Technology), Seongku Kang (Pohang University of Science and Technology) and Hwanjo Yu (Pohang University of Science and Technology). |
Recommender systems (RS) have started to employ knowledge distillation, a model compression technique that trains a compact model (student) with knowledge transferred from a cumbersome model (teacher). The state-of-the-art methods rely on unidirectional distillation transferring the knowledge only from the teacher to the student, with the underlying assumption that the teacher is always superior to the student. However, we demonstrate that the student performs better than the teacher on a significant proportion of the test set, especially in RS. Based on this observation, we propose a Bidirectional Distillation (BD) framework whereby the teacher and the student collaboratively improve each other. Specifically, each model is trained with a distillation loss that makes it follow the other's predictions, along with its original loss function. For effective bidirectional distillation, we propose a rank discrepancy-aware sampling scheme to distill only the informative knowledge that can fully enhance each other. The proposed scheme is designed to effectively cope with a large performance gap between the teacher and the student. Trained in this bidirectional way, both the teacher and the student turn out to be significantly improved compared to when trained separately. Our extensive experiments on real-world datasets show that our proposed framework consistently outperforms state-of-the-art competitors. We also provide analyses for an in-depth understanding of BD and ablation studies to verify the effectiveness of each proposed component. |
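Schematically, the bidirectional objective adds to each model's original loss a distillation term toward the other's (frozen) predictions. The PyTorch sketch below uses a plain softened-KL formulation for clarity; the paper's actual loss is ranking-oriented and uses the rank discrepancy-aware sampling described above, which is omitted here.

    import torch
    import torch.nn.functional as F

    def bd_losses(teacher_logits, student_logits, targets, lam=0.5, T=2.0):
        ce_teacher = F.cross_entropy(teacher_logits, targets)
        ce_student = F.cross_entropy(student_logits, targets)

        # each KL pulls one model toward the other's detached soft prediction
        kl_teacher = F.kl_div(F.log_softmax(teacher_logits / T, dim=1),
                              F.softmax(student_logits.detach() / T, dim=1),
                              reduction="batchmean") * T * T
        kl_student = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                              F.softmax(teacher_logits.detach() / T, dim=1),
                              reduction="batchmean") * T * T

        # (loss to backprop through the teacher, loss for the student)
        return ce_teacher + lam * kl_teacher, ce_student + lam * kl_student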
Title: Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus |
Authors: Vinh Nguyen (National Library of Medicine, NIH), Hong Yung Yip (Kno.e.sis, Wright State University) and Olivier Bodenreider (US National Library of Medicine). |
With 214 source vocabularies, the construction and maintenance process of the UMLS (Unified Medical Language System) Metathesaurus terminology integration system is costly, time-consuming, and error-prone, as it primarily relies on (1) lexical and semantic processing for suggesting groupings of synonymous terms, and (2) the expertise of UMLS editors for curating these synonymy predictions. This paper aims to improve the UMLS Metathesaurus construction process by developing a novel supervised learning approach for the task of suggesting synonymous pairs that can scale to the size and diversity of the UMLS source vocabularies. We evaluate this deep learning (DL) approach against a rule-based approach (RBA) that approximates the current UMLS Metathesaurus construction process. The key to the generalizability of our approach is the use of various degrees of lexical similarity in negative pairs during the training process. Our initial experiments demonstrate the strong performance of our DL approach across multiple datasets in terms of recall (91-92%), precision (88-99%), and F1 score (89-95%). Our DL approach largely outperforms the RBA method in recall (+23%), precision (+2.4%), and F1 score (+14.1%). This novel approach has great potential for improving the UMLS Metathesaurus construction process by providing better synonymy suggestions to the UMLS editors. |
Title: Boosting the Speed of Entity Alignment 10×: Dual Attention Matching Network with Normalized Hard Sample Mining |
Authors: Xin Mao (East China Normal University), Wenting Wang (Alibaba Group), Yuanbin Wu (East China Normal University) and Man Lan (East China Normal University). |
Seeking equivalent entities among multi-source Knowledge Graphs (KGs) is the pivotal step in KG integration, also known as entity alignment (EA). However, most existing EA methods are inefficient and scale poorly; a recent summary points out that some of them even require several days to deal with a dataset containing 200,000 nodes (DWY100K). We believe an over-complex graph encoder and an inefficient negative sampling strategy are the two main reasons. In this paper, we propose a novel KG encoder, the Dual Attention Matching Network (Dual-AMN), which not only smartly models both intra-graph and cross-graph information, but also greatly reduces computational complexity. Furthermore, we propose the Normalized Hard Sample Mining Loss to smoothly select hard negative samples while reducing the loss shift. The experimental results on widely used public datasets indicate that our method achieves both high accuracy and high efficiency. On DWY100K, the whole running process of our method finishes in 1,100 seconds, at least 10x faster than previous work. Our method also outperforms previous works across all datasets, with improvements in Hits@1 and MRR ranging from 6% to 13%. |
Title: Bridging the Gap between von Neumann Graph Entropy and Structural Information: Theory and Applications |
Authors: Xuecheng Liu (Shanghai Jiao Tong University), Luoyi Fu (Shanghai Jiao Tong University) and Xinbing Wang (Shanghai Jiao Tong University). |
The von Neumann graph entropy is a measure of graph complexity based on the Laplacian spectrum. It has recently found applications in various learning tasks driven by networked data. However, it is computationally demanding and hard to interpret using simple structural patterns. Due to the close relation between the Laplacian spectrum and the degree sequence, we conjecture that the structural information, defined as the Shannon entropy of the normalized degree sequence, might be a good approximation of the von Neumann graph entropy that is both scalable and interpretable. In this work, we thereby study the difference between the structural information and the von Neumann graph entropy, termed the entropy gap. Based on the knowledge that the degree sequence is majorized by the Laplacian spectrum, we prove for the first time that the entropy gap lies between 0 and log2(e) in any undirected unweighted graph. Consequently, we certify that the structural information is a good approximation of the von Neumann graph entropy that achieves provable accuracy, scalability, and interpretability simultaneously. We further study two entropy-based applications that can benefit from the bounded entropy gap and the structural information: network design and graph similarity measurement. We combine a greedy method and a pruning strategy to develop a fast algorithm for network design, and propose a novel graph similarity measure with a fast incremental algorithm for graph streams. Our experimental results on graphs of various scales and types show that the very small entropy gap readily extends to a wide range of graphs and weighted graphs. As an approximation of the von Neumann graph entropy, the structural information is the only one among the prominent methods that achieves both high efficiency and high accuracy; it is at least two orders of magnitude faster than SLaQ with comparable accuracy. Our structural information based methods also exhibit superior performance in the two entropy-based applications. |
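The claimed bound is easy to check numerically on a small graph. In the sketch below (our illustration, assuming numpy and networkx are available), the structural information is the Shannon entropy of the degree sequence normalized by its sum, and the von Neumann entropy is the Shannon entropy of the Laplacian spectrum normalized by its trace.

    import numpy as np
    import networkx as nx

    G = nx.karate_club_graph()

    deg = np.array([d for _, d in G.degree()], dtype=float)
    p_deg = deg / deg.sum()
    structural = -(p_deg * np.log2(p_deg)).sum()

    lam = np.linalg.eigvalsh(nx.laplacian_matrix(G).toarray().astype(float))
    p_lam = lam / lam.sum()
    p_lam = p_lam[p_lam > 1e-12]          # drop zero eigenvalues (0 log 0 = 0)
    von_neumann = -(p_lam * np.log2(p_lam)).sum()

    gap = structural - von_neumann
    print(gap, 0 <= gap <= np.log2(np.e))   # the paper proves gap in [0, log2(e)]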
Title: BRIGHT: A Bridging Algorithm for Network Alignment |
Authors: Yuchen Yan (University of Illinois at Urbana-Champaign), Si Zhang (University of Illinois at Urbana-Champaign) and Hanghang Tong (University of Illinois at Urbana-Champaign). |
Multiple networks emerge in a wealth of high-impact applications. Network alignment, which aims to find the node correspondence across different networks, plays a fundamental role in many data mining tasks. Most existing methods can be divided into two categories: (1) consistency optimization based methods, which often explicitly assume the alignment to be consistent in terms of neighborhood topology and attributes across networks, and (2) network embedding based methods, which learn low-dimensional node embedding vectors to infer alignment. In this paper, by analyzing representative methods of these two categories, we show that (1) the consistency optimization based methods are essentially specific random walk propagations from anchor links, which might be restrictive; and (2) the embedding based methods no longer explicitly assume alignment consistency but inevitably suffer from the space disparity issue. To overcome these two limitations, we bridge these methods and propose a novel family of network alignment algorithms, BRIGHT, to handle both non-attributed and attributed networks. Specifically, it constructs a space by random walk with restart (RWR) whose bases are one-hot encoding vectors of anchor nodes, followed by a shared linear layer. Our experiments on real-world networks show that the proposed family of algorithms BRIGHT outperforms the state of the art on both non-attributed and attributed network alignment tasks. |
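The space construction can be read as follows: each anchor node contributes one coordinate, namely every node's random-walk-with-restart score with respect to that anchor. A power-iteration sketch (ours, with a toy adjacency matrix; the shared linear layer on top is omitted):

    import numpy as np

    def rwr_scores(adj, anchor, restart=0.15, iters=200):
        P = adj / adj.sum(axis=0, keepdims=True)   # column-stochastic transitions
        e = np.zeros(adj.shape[0])
        e[anchor] = 1.0                            # one-hot restart vector
        r = e.copy()
        for _ in range(iters):
            r = (1 - restart) * P @ r + restart * e
        return r

    adj = np.array([[0, 1, 1, 0],
                    [1, 0, 1, 0],
                    [1, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
    anchors = [0, 3]
    # each node's embedding: its RWR proximity to every anchor node
    embedding = np.stack([rwr_scores(adj, a) for a in anchors], axis=1)
    print(embedding.shape)   # (4 nodes, 2 anchor dimensions)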
Title: BrowseLite: A Private Data Saving Solution for the Web |
Authors: Conor Kelton (Stony Brook University), Matteo Varvello (Nokia Bell Labs), Andrius Aucinas (Brave Software) and Ben Livshits (Brave Software). |
The median webpage has increased in size by more than 80% in the last 4 years. This extra complexity allows for a rich browsing experience, but it hurts the majority of mobile users, who still pay for their traffic. This has motivated several data-saving solutions, which aim at reducing the complexity of webpages by transforming their content. Despite each method being unique, they either reduce user privacy by further centralizing web traffic through data-saving middleboxes or introduce web compatibility (Web-compat) issues by removing content that breaks pages in unpredictable ways. In this paper, we argue that data saving is still possible without impacting either users' privacy or Web-compat. Our main observation is that Web images make up a large portion of Web traffic and have negligible impact on Web-compat. To this end, we make two main contributions. First, we quantify the potential savings that image manipulation, such as dimension resizing, quality compression, and transcoding, enables at large scale: 300 landing and 880 internal pages. Next, we design and build BrowseLite, an entirely client-side tool that achieves such data savings by opportunistically instrumenting existing server-side tooling to perform image compression, while simultaneously reducing the total amount of image data fetched. The effect of BrowseLite on the user experience is quantified using standard page load metrics and a real user study of over 200 users across 50 optimized web pages. BrowseLite allows for similar savings to middlebox approaches, while offering additional security, privacy, and Web-compat guarantees. |
Title: Causal Network Motifs: Identifying Heterogeneous Spillover Effects in A/B Tests |
Authors: Yuan Yuan (Massachusetts Institute of Technology), Kristen Altenburger (Facebook) and Farshad Kooti (Facebook). |
Randomized experiments, or "A/B" tests, remain the gold standard for evaluating the causal effect of a policy intervention or product change. However, experimental settings such as social networks, where users interact with and influence one another, violate the conventional no-interference assumption needed for credible causal inference. Existing solutions include accounting for the fraction or count of treated neighbors in a user's network, among other strategies. Yet there are often many researcher degrees of freedom in specifying network interference conditions, and most current methods do not account for local network structure beyond simply counting the number of neighbors. Capturing local network structure is important because it can reflect theories such as structural diversity and echo chambers. Our study provides an approach that accounts for both the local structure in a user's social network via motifs and the assignment conditions of neighbors. We propose a two-part approach. We first introduce and employ "causal network motifs", i.e., network motifs that characterize the assignment conditions in local ego networks; then we propose a tree-based algorithm for identifying different network interference conditions and estimating their average potential outcomes. We test our method on a real-world experiment on a large-scale network and in a synthetic network setting, which highlights how accounting for local structure can better capture different interference patterns in networks. |
Title: Characterizing Impacts of Heterogeneity in Federated Learning upon Large-Scale Smartphone Data |
Authors: Chengxu Yang (Peking University), Qipeng Wang (Peking University), Mengwei Xu (Beijing University of Posts and Telecommunications, Peking University), Zhenpeng Chen (Peking University), Kaigui Bian (Peking University), Yunxin Liu (Microsoft Corporation) and Xuanzhe Liu (Peking University). |
Federated learning (FL) is an emerging, privacy-preserving machine learning paradigm, drawing tremendous attention in both academia and industry. A unique characteristic of FL is heterogeneity, which resides in the various hardware specifications and dynamic states across the participating devices. Theoretically, heterogeneity can exert a huge influence on the FL training process, e.g., rendering a device unavailable for training or unable to upload its model updates. Unfortunately, such impacts have never been systematically studied and quantified in the existing FL literature. In this paper, we carry out the first empirical study to characterize the impacts of heterogeneity in FL. We collect large-scale data from 136k smartphones that faithfully reflect heterogeneity in real-world settings. We also build a heterogeneity-aware FL platform that complies with the standard FL protocol but takes heterogeneity into consideration. Based on the data and the platform, we conduct extensive experiments to compare the performance of state-of-the-art FL algorithms under heterogeneity-aware and heterogeneity-unaware settings. Results show that heterogeneity causes non-trivial performance degradation in FL, including up to a 9.2% accuracy drop, 2.32x longer training time, and undermined fairness. Furthermore, we perform a cause analysis and find that device failure and participation bias are two potential root causes of the degradation. Our study has insightful implications for FL practitioners: on one hand, our findings suggest that FL algorithm designers consider heterogeneity during evaluation; on the other hand, they urge system providers to design specific mechanisms to mitigate the impacts of heterogeneity. |
Title: Chinese Wall or Swiss Cheese? Keyword filtering in the Great Firewall of China |
Authors: Raymond Rambert (Carnegie Mellon University), Zachary Weinberg (Carnegie Mellon University), Diogo Barradas (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa) and Nicolas Christin (Carnegie Mellon University). |
The Great Firewall of China (GFW) prevents Chinese citizens from accessing online content deemed objectionable by the Chinese government. One way it does this is to search for forbidden keywords in unencrypted packet streams. When it detects them, it terminates the offending stream by injecting TCP RST packets, and blocks further traffic between the same two hosts for a few minutes. Previous studies of Chinese keyword filtering have concentrated on probing for the contents of the forbidden keywords list, identifying regional variations within China, and devising methods for evading the GFW. We report changes since 2014 in the organization and contents of the forbidden keywords list. In particular, we identify and distinguish between three different sub-lists used for HTTP, which contain forbidden terms for a) hostnames, b) page names, and c) search queries. Our experiments reveal that over 86% of the forbidden keywords have been replaced since 2014. By performing finer-grained experiments, we observe some conditions where forbidden keywords do not trigger the GFW blocking mechanisms (e.g., some HTTP headers are ignored), and differences in behavior depending on context (e.g., some keywords are only blocked when paired with the word “search”). We also conducted a pilot experiment to assess whether the GFW is able to detect keywords sent within HTTPS requests, e.g., by tampering with TLS certificates. The results of our experiment provided no evidence for bulk decryption of HTTPS traffic. |
Title: CLEAR: Contrastive-Prototype Learning with Drift Estimation for Resource Constrained Stream Mining |
Authors: Zhuoyi Wang (University of Texas at Dallas), Chen Zhao (University of Texas at Dallas), Yuqiao Chen (University of Texas at Dallas), Hemeng Tao (University of Texas at Dallas), Yu Lin (University of Texas at Dallas), Xujiang Zhao (University of Texas at Dallas), Yigong Wang (University of Texas at Dallas) and Latifur Khan (University of Texas at Dallas). |
Non-stationary data stream mining aims to classify large-scale online instances that emerge continuously. Compared with offline learning, the most apparent challenge is the consecutive emergence of new categories under a non-static categorical distribution. Non-stationary stream settings often appear in real-world applications, e.g., online classification in E-commerce systems that must handle newly arriving products, or the summarization of news topics on social networks (Twitter). Ideally, a learning model should be able to learn novel concepts from labeled data (in new tasks) while reducing the abrupt degradation of model performance on old concepts (the catastrophic forgetting problem). In this work, we focus on improving the performance of stream mining under constrained resources, where both the memory for old data and the labeled new instances are limited/scarce. We introduce an embedding model that works on the class-embedding space from the encoder output and continually constructs prototypes to represent new categories, without additional learned weights like a softmax classifier. We propose a simple yet efficient resource-constrained stream mining framework, CLEAR, based on this prototype mechanism; it consists of two sub-steps: contrastive-prototype learning and drift estimation. Specifically, contrastive prototype learning is applied to unlabeled data to encode semantically similar instances into an embedding space and then generate a discriminative prototype for each class. Next, during model updating on new tasks/categories, we implement a drift estimation strategy to compensate for the drift of each class's prototype, which reduces knowledge forgetting without storing previous data. We perform experiments on public datasets (e.g., CUB200, CIFAR100) under the stream setting; our approach is consistently and clearly better than many state-of-the-art methods under both the memory and annotation restrictions. |
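In prototype terms, the two sub-steps reduce to (i) representing each class by the mean of its embeddings and (ii) shifting stored prototypes to track encoder drift after an update. The sketch below is our simplified reading with invented function names (a single global mean-shift estimate); the paper's drift estimation is per-class and more careful.

    import numpy as np

    def build_prototypes(embeddings, labels):
        return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

    def compensate_drift(old_prototypes, emb_before, emb_after):
        # estimate how the embedding space moved, using current data embedded
        # under both the old and the updated encoder, then shift old prototypes
        drift = (emb_after - emb_before).mean(axis=0)
        return {c: p + drift for c, p in old_prototypes.items()}

    def classify(x_emb, prototypes):
        # nearest-prototype rule: no softmax classifier weights are needed
        return min(prototypes, key=lambda c: np.linalg.norm(x_emb - prototypes[c]))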
Title: ColChain: Collaborative Linked Data Networks |
Authors: Christian Aebeloe (Aalborg University), Gabriela Montoya (Aalborg University) and Katja Hose (Aalborg University). |
One of the major obstacles that currently prevents the Semantic Web from exploiting its full potential is that the data it provides access to is sometimes unavailable or outdated. The reason is rooted deep within its architecture, which relies on data providers to keep the data available, queryable, and up-to-date at all times – an expectation that many data providers in reality cannot live up to for an extended (or infinite) period of time. Hence, decentralized architectures have recently been proposed that use replication to keep the data available in case the data provider fails. Although this increases availability, it does not help keep the data up-to-date or allow users to query and access previous versions of a dataset. In this paper, we therefore propose ColChain (COLlaborative knowledge CHAINs), a novel decentralized architecture based on blockchains that not only lowers the burden on the data providers but at the same time also allows users to propose updates to faulty or outdated data, trace updates back to their origin, and query older versions of the data. Our extensive experiments show that ColChain reaches these goals while achieving query processing performance comparable to the state of the art. |
Title: Collaborative Filtering with Preferences Inferred from Brain Signals |
Authors: Keith Davis (University of Helsinki), Michiel Spapé (University of Helsinki) and Tuukka Ruotsalo (University of Helsinki). |
Collaborative filtering is a common technique in which data from a large number of users are used to infer preferences and recommend items to an individual that they may prefer but have not interacted with. Previous approaches have achieved this using a variety of behavioral signals, from dwell time and clickthrough rates to self-reported ratings. However, such signals are mere estimations of the real underlying preferences of the users. Here, we use brain-computer interfacing to infer preferences directly from the human brain. We then utilize these preferences in a collaborative filtering setting and report results from an experiment where brain-inferred preferences are used in a neural collaborative filtering framework. Our results demonstrate, for the first time, that brain-computer interfacing can provide a viable alternative to behavioral and self-reported preferences in realistic recommendation scenarios. We also discuss the broader implications of our findings for personalization systems and user privacy. |
Title: CollaEdge: A Decentralized Blockchain-based Platform for Cooperative Edge Computing |
Authors: Liang Yuan (Swinburne University of Technology), Qiang He (Swinburne University of Technology), Siyu Tan (Swinburne University of Technology), Bo Li (Swinburne University of Technology), Jiangshan Yu (Monash University), Feifei Chen (Deakin University), Hai Jin (Huazhong University of Science and Technology) and Yun Yang (Swinburne University of Technology). |
Edge computing (EC) has recently emerged as a novel computing paradigm that offers users low-latency services. Suffering from constrained computing resources due to their limited physical sizes, edge servers cannot always handle all incoming computation tasks in a timely manner when they operate independently; they often need to cooperate through peer-offloading. Deployed and managed by different stakeholders, edge servers operate in a distrusted environment, and trust and incentive are the two main issues that challenge cooperative computing between them. Another unique challenge in the EC environment is to facilitate trust and incentive in a decentralized manner. To tackle these challenges systematically, this paper proposes CollaEdge, a novel blockchain-based decentralized platform, to drive and support cooperative edge computing. On CollaEdge, an edge server can publish a computation task for other edge servers to contend for. A winner is selected from the candidate edge servers based on their reputations. After that, a consensus is reached among edge servers to record the task-execution performance on the blockchain. We implement CollaEdge based on Hyperledger Sawtooth and evaluate it experimentally against a baseline and two state-of-the-art implementations in a simulated EC environment. The results validate the usefulness of CollaEdge and demonstrate its performance. |
Title: Columbus: Fast, Reliable Co-residence Detection for Lambdas |
Authors: Anil Yelam (University of California, San Diego), Ariana Mirian (University of California, San Diego), Keerthana Ganesan (University of California, San Diego), Shibani Subbareddy (University of California, San Diego) and Stefan Savage (University of California, San Diego). |
"Serverless" cloud services, such as AWS lambdas, are one of the fastest growing segments of the cloud services market. These services are lighter-weight and provide more flexibility in scheduling and cost, which contributes to their popularity; however, the security issues associated with serverless computing are not well understood. In this work, we explore the feasibility of constructing a practical covert channel from lambdas. We establish that fast and scalable co-residence detection for lambdas is key to enabling such a covert channel, and proceed to develop a generic, reliable, and scalable co-residence detector based on the memory bus hardware. Our technique enables dynamic neighbor discovery for co-resident lambdas and is incredibly fast, executing in a matter of seconds. We evaluate our approach for correctness and scalability, and perform a measurement study of lambda density in the AWS cloud to demonstrate the practicality of establishing cloud covert channels using our co-residence detector. Through this work, we show that efforts to secure cloud platforms against co-residency detection are not yet complete. |
Title: Communication Efficient Federated Generalized Tensor Factorization for Collaborative Health Data Analytics |
Authors: Jing Ma (Emory University), Qiuchen Zhang (Emory University), Jian Lou (Emory University), Li Xiong (Emory University) and Joyce C. Ho (Emory University). |
Modern healthcare systems, knitted together by a web of entities (e.g., hospitals, clinics, pharmacy companies), are collecting a huge volume of healthcare data from a large number of individuals, covering all sorts of medical procedures, medications, diagnoses, lab tests, and so on. To extract meaningful medical concepts (i.e., phenotypes) from such higher-arity relational healthcare data, tensor factorization has proven to be an effective approach and has received increasing research attention, due to its intrinsic capability to accommodate high-dimensional data and provide adequate representation power. Recently, federated learning has offered a privacy-preserving paradigm for collaborative learning among different entities, which seemingly provides an ideal way to further enhance tensor factorization-based collaborative phenotyping so that it can handle sensitive personal health data. However, existing attempts at federated tensor factorization come with various limitations, e.g., restriction to the classic tensor factorization, high communication cost, and reduced accuracy. We propose a communication-efficient federated generalized tensor factorization, which is flexible enough to choose from a variety of losses to best suit different types of data in practice. We design a three-level communication reduction strategy tailored to the generalized tensor factorization, which is able to reduce the uplink communication cost by up to 99.90%. In addition, we theoretically prove that our algorithm does not compromise convergence speed despite the aggressive communication compression. Extensive experiments on two real-world electronic health record datasets demonstrate the efficiency improvements in terms of computation and communication cost. |
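One standard ingredient behind uplink reductions of this magnitude is magnitude-based top-k sparsification of the factor updates. The sketch below is generic (our illustration, not the paper's three-level strategy) and keeps 0.1% of the entries, which alone is in the ballpark of the 99.9% figure quoted above.

    import numpy as np

    def sparsify(update, keep_ratio=0.001):
        """Return (indices, values) of the largest-magnitude entries to upload."""
        flat = update.ravel()
        k = max(1, int(keep_ratio * flat.size))
        idx = np.argpartition(np.abs(flat), -k)[-k:]
        return idx, flat[idx]

    def densify(idx, vals, shape):
        out = np.zeros(int(np.prod(shape)))
        out[idx] = vals
        return out.reshape(shape)

    grad = np.random.default_rng(2).normal(size=(500, 40))
    idx, vals = sparsify(grad)
    print(len(vals), "of", grad.size, "entries uploaded")   # 20 of 20000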
Title: Community Value Prediction in Social E-commerce |
Authors: Guozhen Zhang (Department of Electronic Engineering, Tsinghua University), Yong Li (Department of Electronic Engineering, Tsinghua University), Yuan Yuan (Tsinghua University), Fengli Xu (Department of Computer Science and Engineering, Hong Kong University of Science and Technology), Hancheng Cao (Department of Computer Science, Stanford University), Depeng Jin (Department of Electronic Engineering, Tsinghua University) and Yujian Xu (Beibei group). |
The phenomenal success of the newly emerging social e-commerce has demonstrated that utilizing social relations is becoming a promising approach to promote e-commerce platforms. In this new scenario, one of the most important problems is to predict the value of a community formed by closely connected users in social networks, due to its tremendous business value. However, few works have addressed this problem because of 1) its novel setting and 2) its challenging nature, namely that the structure of a community has complex effects on its value. To bridge this gap, we develop a Multi-scale Structure-aware Community value prediction network (MSC) that jointly models structural information of different scales, including peer relations, community structure, and inter-community connections, to predict the value of given communities. Specifically, we first propose a Masked Edge Learning Graph Convolutional Network (MEL-GCN) based on a novel masked propagation mechanism to model peer influence. Then, we design a Pair-wise Community Pooling (PCPool) module to capture critical community structures. Finally, we model inter-community connections by distinguishing intra-community edges from inter-community edges and employing a Multi-aggregator Framework (MAF). Extensive experiments on a large-scale real-world social e-commerce dataset demonstrate our method's superior performance over state-of-the-art baselines, with relative performance gains of 11.40%, 10.01%, and 10.97% in MAE, RMSE, and NRMSE, respectively. A further ablation study shows the effectiveness of our designed components. |
Title: Completing Missing Prevalence Rates for Multiple Chronic Diseases by Jointly Leveraging Both Intra- and Inter-Disease Population Health Data Correlations |
Authors: Yujie Feng (Peking University), Jiangtao Wang (Coventry University), Yasha Wang (Peking University) and Sumi Helal (Lancaster University). |
Population health data are becoming more publicly available on the Internet than ever before. Such datasets offer great potential for enabling a better understanding of the health of populations, and inform health professionals and policy makers for better resource planning, disease management, and prevention across different regions. However, due to the laborious and high-cost nature of collecting such public health data, it is commonplace to find many missing entries in these datasets, which challenges the utility of the data and hinders reliable analysis. To tackle this problem, this paper proposes a deep-learning-based approach, called Compressive Population Health (CPH), to infer and recover (i.e., complete) the missing prevalence rate entries of multiple chronic diseases. The key insight of CPH relies on the combined exploitation of both intra-disease and inter-disease correlations. Specifically, we first propose a Convolutional Neural Network (CNN) based approach to extract and model both of these two types of correlations, and then adopt a Generative Adversarial Network (GAN) based prevalence inference model to jointly fuse them and facilitate the recovery of missing prevalence rate entries. We extensively evaluate the inference model on real-world public health datasets publicly available on the Web. Results show that our inference method outperforms other baseline methods in various settings and with significantly improved accuracy. |
Title: Compositional Question Answering via Hierarchical Graph Neural Networks |
Authors: Bingning Wang (Sogou Inc.), Ting Yao (Sogou Inc.), Weipeng Chen (Sogou Inc.), Jingfang Xu (Sogou Inc.) and Xiaochuan Wang (Tsinghua University). |
With the development of deep learning techniques and large-scale datasets, question answering (QA) systems have improved quickly, providing more accurate and satisfying answers. However, current QA systems focus on either sentence-level answers, i.e., answer selection, or phrase-level answers, i.e., machine reading comprehension; how to produce compositional answers has not been thoroughly investigated. In compositional question answering, a system must assemble several pieces of supporting evidence from the document to generate the final answer, which is more difficult than sentence-level or phrase-level QA. In this paper, we present a large-scale compositional question answering dataset containing more than 120k human-labeled questions, where each answer is composed of discontiguous sentences in the corresponding document. To tackle this ComQA problem, we propose a hierarchical graph neural network, which represents the document from the low-level word to the high-level sentence. We also devise a question selection and node selection task for pre-training. Our proposed model achieves a significant improvement over previous machine reading comprehension methods and pre-training methods. We will release the proposed datasets and code after the review process. |
Title: Computing Views of OWL Ontologies for the Semantic Web: A Forgetting-Based Approach |
Authors: Jiaqi Li (Nanjing University) and Yizheng Zhao (Nanjing University). |
This paper is concerned with the problem of computing views of OWL ontologies using a forgetting-based approach. In traditional relational databases, a view is a subset of a database, whereas in ontologies, a view is more than a subset; it contains not only axioms contained in the original ontology, but also newly derived axioms entailed by the original ontology (implicitly contained in the original ontology). Specifically, given an ontology O, the signature sig(O) of O is the set of all terms in O, and a view V of O is a new ontology obtained from O using only part of O's signature, namely the target signature, while preserving all logical entailments up to the target signature. Computing views of ontologies is useful for applications such as ontology-based query answering, in the sense that the view can be used as a substitute for the original ontology to answer queries formulated with the target signature, and also useful for security purposes, in the sense that it restricts users from viewing certain information in an ontology. Forgetting is a form of non-standard reasoning concerned with eliminating from an ontology a subset of its signature, namely the forgetting signature, in such a way that all logical entailments are preserved up to the remaining signature. Forgetting can thus be used as an ontology engineering tool to compute views of ontologies: the solution of forgetting a set F of terms from an ontology O is the view V of O for the target signature sig(O) \ F. In this paper, we introduce a forgetting-based method for computing views of OWL ontologies specified in the description logic ALCHOI, basic ALC extended with role hierarchies, nominals, and inverse roles. The method is terminating and sound. Despite the method not being complete, an evaluation with a prototype implementation on a corpus of real-world ontologies has shown superb results. This is very useful from the perspective of the Semantic Web, as it provides knowledge engineers with a powerful approach/tool for creating views of OWL ontologies. |
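The semantics of forgetting is easiest to see in the propositional case, where forgetting p from a formula phi is phi[p := true] OR phi[p := false]: every entailment not mentioning p is preserved. A sympy sketch of that baby case (our illustration; the paper handles the far more expressive description logic ALCHOI):

    from sympy import symbols, And, Or, Implies, simplify_logic, true, false

    p, q, r = symbols("p q r")
    phi = And(Implies(p, q), Implies(q, r))   # ontology analogue: p -> q, q -> r

    # forget q: take the disjunction of the two substitution instances
    view = simplify_logic(Or(phi.subs(q, true), phi.subs(q, false)))
    print(view)   # prints r | ~p, i.e. the derived entailment p -> r survives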
Title: ConceptGuide: Supporting Online Video Learning with Concept Map-based Recommendation of Learning Path |
Authors: Chien-Lin Tang (National Chiao Tung University), Jingxian Liao (University of California, Davis), Hao-Chuan Wang (University of California, Davis), Ching-Ying Sung (University of Washington) and Wen-Chieh Lin (National Chiao Tung University). |
People increasingly use online video platforms, e.g., YouTube, to locate educational videos to acquire knowledge or skills to meet personal learning needs. However, most existing video platforms display video search results as generic ranked lists based on relevance to queries. This relevance-oriented design does not take into account the inner structure of the knowledge domain and may not suit the needs of online learners. In this paper, we present ConceptGuide, a prototype system providing learning orientation to support ad hoc online learning from unorganized video materials. ConceptGuide features a computational pipeline that performs content analysis on the transcripts of YouTube videos retrieved for a topic and generates concept-map-based visual recommendations of inter-concept and inter-video links, forming learning pathways as structures for learners to consume. We evaluated ConceptGuide by comparing the design to the general-purpose interface of YouTube in terms of learning experiences and behaviors. ConceptGuide was found to improve the efficiency of video learning and to help learners explore the knowledge of interest in many constructive ways. |
Title: Consistent Sampling Through Extremal Process |
Authors: Ping Li (Baidu Research), Xiaoyun Li (Rutgers University), Gennady Samorodnitsky (Cornell University) and Weijie Zhao (Baidu Research). |
The Jaccard similarity has been widely used in search and machine learning. For binary (0/1) data, it is often called "resemblance", and the method of min-wise hashing has been the standard practice for computing resemblance in massive data. For Jaccard similarity in general weighted data, the commonly used sampling algorithm is Consistent Weighted Sampling (CWS). A convenient (and perhaps mysterious) implementation of CWS is the so-called "0-bit CWS", which we refer to as the "relaxed CWS"; it was purely an empirical observation without any theoretical justification, the difficulty being the highly complicated probability problem underlying "relaxed CWS". In this paper, we propose using an extremal process to generate samples for estimating Jaccard similarity. Surprisingly, this scheme makes it possible to analyze the analogous "relaxed ES" variant. Through novel probability endeavours, we rigorously compute the bias of "relaxed ES", which explains why it works so well and when it does not in extreme corner cases. Interestingly, compared with CWS, the resultant algorithm only involves counting and does not need sophisticated mathematical operations (as required by CWS). It is not surprising that the proposed "Extremal Sampling" (ES) is noticeably faster than CWS. Although ES is different from CWS (and other previous algorithms for Jaccard similarity), in retrospect it is closely related to CWS. This paper provides the insight which connects CWS with extremal processes. This insight will help understand CWS (and variants) and help develop new algorithms for similarity estimation. |
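For the binary case, the min-wise hashing baseline mentioned above can be sketched in a few lines; the hash construction and parameters here are illustrative, and this is standard resemblance estimation rather than the paper's extremal sampling:

```python
import random

def make_hashers(k, seed=0, prime=2**61 - 1):
    # k random affine hash functions applied to the built-in hash value
    rng = random.Random(seed)
    params = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(k)]
    return [lambda x, a=a, b=b: (a * hash(x) + b) % prime for a, b in params]

def minhash_signature(items, hashers):
    # one min-wise hash value per hash function
    return [min(h(x) for x in items) for h in hashers]

def estimate_resemblance(sig_a, sig_b):
    # Pr[min-hashes collide] equals the Jaccard similarity
    return sum(u == v for u, v in zip(sig_a, sig_b)) / len(sig_a)

hashers = make_hashers(512)
A = set("the quick brown fox jumps over".split())
B = set("the quick brown dog sleeps over".split())
print(len(A & B) / len(A | B))  # exact Jaccard: 0.5
print(estimate_resemblance(minhash_signature(A, hashers),
                           minhash_signature(B, hashers)))
```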
Title: Constructing a Comparison-based Click Model for Web Search |
Authors: Ruizhe Zhang (Tsinghua University), Xiaohui Xie (Tsinghua University), Jiaxin Mao (Tsinghua University), Yiqun Liu (Tsinghua University), Min Zhang (Tsinghua University) and Shaoping Ma (Tsinghua University). |
Extracting valuable feedback information from user behavior logs is one of the major concerns in Web search studies. Among the tremendous efforts that aim to improve search performance with user behavior modeling, constructing click models is of vital importance because it provides a direct estimation of result relevance. Most existing click models assume that whether a user clicks a result depends only on the examination probability and the content of the result. However, through a carefully designed user eye-tracking study we found that users do not make click-through decisions in isolation. Instead, they also take the context of a result (e.g. adjacent results) into consideration. This finding leads to the design of a novel click model named the Comparison-based Click Model (CBCM). Different from the traditional examination hypothesis, CBCM introduces the concept of an examination viewport and assumes users click results after comparing adjacent results within the same viewport. Experimental results on a publicly available user behavior dataset show the effectiveness of CBCM. |
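To make the contrast concrete, here is a hedged sketch of the two hypotheses; the comparison variant treats a click as a softmax competition among results in one viewport, which is only a simplified stand-in for CBCM's actual parameterization:

```python
import math

def examination_click_prob(exam_prob, attractiveness):
    # classic examination hypothesis: click = examined AND attractive,
    # independent of neighbouring results
    return exam_prob * attractiveness

def comparison_click_probs(relevances, viewport):
    # comparison-style sketch: results in the same viewport compete,
    # so each click probability depends on the adjacent results too
    exps = [math.exp(relevances[i]) for i in viewport]
    z = sum(exps)
    return {i: e / z for i, e in zip(viewport, exps)}

rels = [0.9, 0.5, 0.7, 0.2]
print(examination_click_prob(0.8, rels[0]))              # context-free
print(comparison_click_probs(rels, viewport=[0, 1, 2]))  # context-aware
```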
Title: Constructing Explainable Opinion Graphs from Reviews |
Authors: Nofar Carmeli (Technion), Xiaolan Wang (Megagon Labs), Yoshihiko Suhara (Megagon Labs), Stefanos Angelidis (University of Edinburgh), Yuliang Li (Megagon Labs), Jinfeng Li (Megagon Labs) and Wang-Chiew Tan (Megagon Labs). |
The Web is a major resource of both factual and subjective information. While there are significant efforts to organize factual information into knowledge bases, there is much less work on organizing opinions, which are abundant in subjective data, into a structured format. We present ExplainIt, a system that extracts and organizes opinions into an opinion graph, which is useful for downstream applications such as generating explainable review summaries and facilitating search over opinion phrases. In such graphs, a node represents a set of semantically similar opinions extracted from reviews and an edge between two nodes signifies that one node explains the other. ExplainIt mines explanations in a supervised manner and groups similar opinions together in a weakly supervised way before combining the clusters of opinions together with their explanation relationships into an opinion graph. We experimentally demonstrate that the explanation relationships generated in the opinion graph are of good quality, and our labeled datasets for explanation mining and grouping opinions will be made publicly available. |
Title: Contrastive Lexical Diffusion Coefficient: Quantifying the Stickiness of the Ordinary |
Authors: Mohammadzaman Zamani (Stony Brook University) and H. Andrew Schwartz (Stony Brook University). |
Linguistic phenomena, such as clusters of related words, disseminate through social networks at different rates, but most diffusion models focus on the discrete adoption of new linguistic phenomena (i.e. new topics or memes). It is possible that much of linguistic diffusion happens via the changing rates of existing word categories or concepts (those that are already regularly being used) rather than new ones. In this study we introduce a new metric, the contrastive lexical diffusion (CLD) coefficient, which attempts to measure the degree to which ordinary language (here, clusters of common words) catches on over friendship connections over time. For instance, topics related to meeting and job are found to be sticky, while negative thinking and emotion, and global events, like 'school orientation', were found to be less sticky even though they change rates over time. We evaluate the CLD coefficient with both quantitative and qualitative tests, finding that CLD scores predict the spread of tweets and friendship connections, converge with human judgments of lexical diffusion (r=0.92), and replicate across disjoint networks (r=0.85). Comparing CLD scores can help understand lexical diffusion: positive emotion words appear more diffusive than negative emotions, first-person plurals (we) score higher than other pronouns, and numbers and time appear less diffusive. |
Title: Controllable and Diverse Text Generation in E-commerce |
Authors: Huajie Shao (University of Illinois at Urbana-Champaign), Jun Wang (Alibaba Group, Seattle), Haohong Lin (Zhejiang University), Xuezhou Zhang (University of Wisconsin, Madison), Aston Zhang (University of Illinois at Urbana- Champaign), Heng Ji (University of Illinois at Urbana-Champaign) and Tarek Abdelzaher (University of Illinois at Urbana-Champaign). |
In E-commerce, a key challenge in text generation is to find a good trade-off between word diversity and accuracy (relevance) in order to make generated text appear more natural and human-like. In order to improve the relevance of generated results, conditional text generators were developed that use input keywords or attributes to produce the corresponding text. Prior work, however, does not finely control the diversity of automatically generated sentences. For example, it does not control the order of keywords to put more relevant ones first. Moreover, it does not explicitly control the balance between diversity and accuracy. To remedy these problems, we propose a fine-grained controllable generative model, called Apex, that uses an algorithm borrowed from automatic control (namely, a variant of the proportional-integral-derivative (PID) controller) to precisely manipulate the diversity/accuracy trade-off of generated text. The algorithm is injected into a Conditional Variational Autoencoder (CVAE), allowing Apex to control both (i) the order of keywords in the generated sentences (conditioned on the input keywords and their order), and (ii) the trade-off between diversity and accuracy. Evaluation results on real-world datasets (the sampled data is publicly available at https://github.com/paper-data-open/Text-Gen-Ecommerce) show that the proposed method outperforms existing generative models in terms of diversity and relevance. Moreover, it achieves about 97% accuracy in the control of the order of keywords. Apex is currently deployed to generate product descriptions and item recommendation reasons in Taobao (https://www.taobao.com/), the largest E-commerce platform in China. The A/B production test results show that our method improves click-through rate (CTR) by 13.17% compared to the existing method for product descriptions. For item recommendation reasons, it increases CTR by 6.89% and 1.42% compared to user reviews and top-K item recommendation without reviews, respectively. |
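The control loop can be sketched generically: a discrete PID controller drives an observed KL term toward a target by adjusting the KL weight in a (hypothetical) CVAE training loop. The gains, setpoint, and simulated training response below are all made up for illustration and are not the paper's values:

```python
class PIDController:
    """Minimal discrete PID controller: u = Kp*e + Ki*sum(e) + Kd*de/dt."""
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def step(self, measurement, dt=1.0):
        error = self.setpoint - measurement
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

pid = PIDController(kp=0.01, ki=0.0001, kd=0.0, setpoint=5.0)
kl_weight, observed_kl = 1.0, 12.0
for _ in range(5):
    # a KL above the target yields a negative control signal; subtracting
    # it raises the penalty weight, pushing KL back toward the setpoint
    kl_weight = max(0.0, kl_weight - pid.step(observed_kl))
    observed_kl = 5.0 + (observed_kl - 5.0) * 0.8  # pretend training responds
    print(round(kl_weight, 4), round(observed_kl, 3))
```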
Title: Controllable Gradient Item Retrieval |
Authors: Haonan Wang (University of Illinois at Urbana-Champaign), Chang Zhou (Alibaba Group), Carl Yang (Emory University), Hongxia Yang (Alibaba Group) and Jingrui He (University of Illinois Urbana-Champaign). |
In this paper, we identify and study the research problem of gradient item retrieval. We define the problem as retrieving a sequence of items with a gradual change in a certain attribute, given a reference item and a modification text. For example, we may want a more floral dress after we see a white one. The extent of "floral" is subjective, so we may ask the system to present a sequence of products with a gradually increasing floral attribute. Existing item retrieval methods mainly focus on whether the target items appear at the top of the list of items ranked by similarity between query and items. However, those methods ignore the demand for retrieving a sequence of products with a gradual change in a certain attribute. To deal with this problem, we propose a weakly-supervised method to learn a disentangled item representation from user-item interaction data and ground the semantic meaning to dimensions of the item representation. Our method takes a reference item and a modification as a query. During inference, we start from the reference item, then gradually change the value of certain meaningful dimensions of the item representation to retrieve a sequence of items. We demonstrate that our proposed method can achieve disentanglement through weak supervision. Besides, we empirically show that our method can retrieve items in a gradient manner and, on the item retrieval task, outperforms existing approaches on three different datasets. |
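The inference procedure lends itself to a small sketch: starting from the reference item's embedding, repeatedly nudge the grounded dimension and retrieve the nearest item at each step. The embeddings and the nearest-neighbour search below are hypothetical stand-ins; the paper's model learns the disentangled space from interaction data:

```python
import numpy as np

def gradient_retrieve(reference, items, dim, steps=3, delta=0.5):
    # walk along one disentangled dimension, retrieving at each step
    sequence, query = [], reference.astype(float)
    for _ in range(steps):
        query[dim] += delta                      # "a bit more floral"
        dists = np.linalg.norm(items - query, axis=1)
        sequence.append(int(np.argmin(dists)))   # nearest item in the space
    return sequence

# toy embeddings where dimension 1 is assumed to encode "floral";
# items are ordered by increasing floralness
items = np.array([[0.9, 0.1], [0.8, 0.9], [0.7, 1.8]])
print(gradient_retrieve(np.array([1.0, 0.0]), items, dim=1))  # [0, 1, 2]
```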
Title: Controlling the Risk of Conversational Search via Reinforcement Learning |
Authors: Zhenduo Wang (University of Utah) and Qingyao Ai (University of Utah). |
Users often formulate their search queries with language fragments instead of complete and grammatical natural language. Such queries are likely to fail to express their true information needs and raise ambiguity, as fragmental language often yields various interpretations and aspects. This gives the search engine a hard time processing and understanding the query, so search engines can only occasionally return what users really need. An alternative to directly answering an ambiguous query is to proactively ask the user clarifying questions. Recent years have seen many works and shared tasks from both the NLP and IR communities on identifying the need for asking clarifying questions and on methods to generate them. A fact often neglected by these works is that even when the need for clarifying questions is correctly recognized, the clarifying questions these systems generate are still often off-topic and dissatisfying to users, and may simply cause users to leave the conversation. In this work, we propose a risk-aware conversational search agent model to balance the risk of answering the user's query against that of asking clarifying questions. The agent is fully aware that asking clarifying questions can potentially collect more information from the user, but it compares all of its choices and evaluates the risks; only then does it decide between answering and asking. To demonstrate that our system is able to retrieve better answers, we conduct experiments on the MSDialog dataset, which contains real-world customer service conversations from the Microsoft products community. We also propose a reinforcement learning strategy which allows us to train our model on the original dataset directly, saving us from any further data annotation efforts. Our experiment results show that our risk-aware conversational search agent is able to significantly outperform strong non-risk-aware baselines. |
Title: Conversations Gone Alright: Quantifying and Predicting Prosocial Outcomes in Online Conversations |
Authors: Jiajun Bao (Carnegie Mellon University), Junjie Wu (Hong Kong University of Science and Technology), Yiming Zhang (University of Michigan), Eshwar Chandrasekharan (University of Illinois at Urbana-Champaign) and David Jurgens (University of Michigan). |
Online conversations can go in many directions: some turn out poorly due to antisocial behavior, while others turn out positively to the benefit of all. Research on improving online spaces has focused primarily on detecting and reducing antisocial behavior. Yet we know little about positive outcomes in online conversations and how to increase them—is a prosocial outcome simply the lack of antisocial behavior or something more? Here, we examine how conversational features lead to prosocial outcomes within online discussions. We introduce a series of new theory-inspired metrics to define prosocial outcomes such as mentoring and esteem enhancement. Using a corpus of 26M Reddit conversations, we show that these outcomes can be forecasted from the initial comment of an online conversation, with the best model providing a relative 24% improvement over human forecasting performance at ranking conversations by predicted outcome. Our results indicate that platforms can use these early cues in their algorithmic ranking of early conversations to prioritize better outcomes. |
Title: Cookie Swap Party: Abusing First-Party Cookies for Web Tracking |
Authors: Quan Chen (North Carolina State University), Panagiotis Ilia (University of Illinois at Chicago), Michalis Polychronakis (Stony Brook University) and Alexandros Kapravelos (North Carolina State University). |
As a step towards protecting user privacy, most web browsers perform some form of third-party HTTP cookie blocking or periodic deletion by default, while users typically have the option to select even stricter blocking policies. As a result, web trackers have shifted their efforts to work around these restrictions and retain or even improve the extent of their tracking capability. In this paper, we shed light on the increasingly used practice of relying on first-party cookies that are set by third-party JavaScript code to implement user tracking and other potentially unwanted capabilities. Although, unlike third-party cookies, first-party cookies are not sent automatically by the browser to third parties on HTTP requests, this tracking is possible because any included third-party code runs in the context of the parent page, and thus can fully set or read existing first-party cookies—which it can then leak to the same or other third parties. Previous works that survey user privacy on the web in relation to cookies, third-party or otherwise, have not fully explored this mechanism. To address this gap, we propose a dynamic data flow tracking system based on Chromium to track the leakage of first-party cookies to third parties, and use it to conduct a large-scale study of the Alexa top 10K websites. In total, we find that 97.72% of the websites have first-party cookies that are set by third-party JavaScript, and that on 57.66% of these websites there is at least one such cookie that contains a unique user identifier that is diffused to multiple third parties. Our results highlight the privacy-intrusive capabilities of first-party cookies, even when a privacy-savvy user has taken mitigative measures such as blocking third-party cookies or employing popular crowd-sourced filter lists such as EasyList/EasyPrivacy and the Disconnect list. |
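A much simpler string-matching heuristic conveys the measurement idea (though not the paper's dynamic data-flow tracking, which also catches encoded identifiers): flag first-party cookie values that reappear in requests to third-party hosts. All names and values below are hypothetical:

```python
from urllib.parse import urlparse

def leaked_cookies(first_party_cookies, outgoing_urls, site_domain):
    """Flag first-party cookie values appearing in third-party request URLs.

    A crude stand-in for taint tracking: real trackers often hash or
    encode the identifier before exfiltrating it, which this check misses.
    """
    leaks = []
    for url in outgoing_urls:
        host = urlparse(url).hostname or ""
        if host == site_domain or host.endswith("." + site_domain):
            continue  # same-site request, not a third-party leak
        for name, value in first_party_cookies.items():
            if len(value) >= 8 and value in url:  # skip short, generic values
                leaks.append((name, host))
    return leaks

cookies = {"_uid": "a1b2c3d4e5f6", "lang": "en"}
urls = [
    "https://cdn.example.com/app.js",
    "https://tracker.not-example.net/collect?id=a1b2c3d4e5f6",
]
print(leaked_cookies(cookies, urls, "example.com"))
# [('_uid', 'tracker.not-example.net')]
```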
Title: Cost-Effective and Interpretable Job Skill Recommendation with Deep Reinforcement Learning |
Authors: Ying Sun (Institute of Computing Technology, Chinese Academy of Sciences), Fuzhen Zhuang (Institute of Computing Technology, Chinese Academy of Sciences), Hengshu Zhu (Baidu Inc.), Qing He (Institute of Computing Technology, CAS) and Hui Xiong (Rutgers University). |
Nowadays, as organizations operate in very fast-paced and competitive environments, the workforce has to be agile and adaptable, regularly learning new job skills. However, it is nontrivial for talents to know which skills to develop at each working stage. To this end, in this paper, we aim to develop a cost-effective recommendation system based on deep reinforcement learning, which can provide personalized and interpretable job skill recommendations for each talent. Specifically, we first design an environment to estimate the utilities of skill learning by mining massive job advertisement data, which includes a skill-matching-based salary estimator and a frequent-itemset-based learning difficulty estimator. Based on the environment, we design a Skill Recommendation Deep Q-Network (SRDQN) with a multi-task structure to estimate the long-term utilities of skill learning. In particular, SRDQN can recommend job skills in a personalized and cost-effective manner; that is, talents will only learn the recommended skills necessary for achieving their career goals. Finally, extensive experiments on a real-world dataset clearly validate the effectiveness and interpretability of our approach. |
Title: Cross-domain Knowledge Distillation for Retrieval-based Question Answering Systems |
Authors: Cen Chen (Ant Financial Services Group), Chengyu Wang (Alibaba Group), Minghui Qiu (Alibaba Group), Dehong Gao (Alibaba Group), Linbo Jin (Alibaba Group) and Wang Li (Ant Financial Services Group). |
Question Answering (QA) systems have been extensively studied in both academia and industry due to their wide real-world applications. When building such industrial-scale QA applications, we face two prominent challenges: i) lacking a sufficient amount of training data to learn an accurate model, and ii) requiring high inference speed for online model serving. There are generally two ways to mitigate these problems. One is to adopt transfer learning to leverage information from other domains; the other is to distill the "dark knowledge" from a large teacher model into small student models. The former usually employs parameter-sharing mechanisms for knowledge transfer, but does not utilize the "dark knowledge" of pre-trained large models. The latter usually does not consider the cross-domain information from other domains. We argue that these two types of methods can be complementary to each other. Hence, in this work, we provide a new perspective on the potential of the teacher-student paradigm to facilitate cross-domain transfer learning, where the teacher and student tasks belong to heterogeneous domains, with the goal of improving the student model's performance in the target domain. Our framework considers the "dark knowledge" learned from large teacher models and also leverages adaptive hints to alleviate the domain differences between teacher and student models. Extensive experiments have been conducted on two text matching tasks for retrieval-based QA systems. Results show the proposed method performs better than competing methods, including existing state-of-the-art transfer learning methods. We have also deployed our method in an online production system and observed significant improvements over existing approaches in terms of both accuracy and cross-domain robustness. |
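The teacher-student objective at the heart of such distillation can be written compactly. This is the generic Hinton-style loss (hard-label cross-entropy blended with temperature-scaled KL to the teacher), not the paper's full framework, which additionally uses adaptive hints to bridge the domain gap:

```python
import math

def softmax(logits, T=1.0):
    exps = [math.exp(l / T) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    # hard part: cross-entropy against the ground-truth label
    hard = -math.log(softmax(student_logits)[label])
    # soft part: KL(teacher || student) at temperature T ("dark knowledge")
    p_t, p_s = softmax(teacher_logits, T), softmax(student_logits, T)
    soft = sum(t * math.log(t / s) for t, s in zip(p_t, p_s))
    # T*T rescaling keeps soft-gradient magnitudes comparable across T
    return alpha * hard + (1 - alpha) * (T * T) * soft

print(distillation_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], label=0))
```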
Title: Cross-lingual Language Model Pretraining for Retrieval |
Authors: Puxuan Yu (University of Massachusetts Amherst), Hongliang Fei (Baidu Research) and Ping Li (Baidu). |
Existing research on cross-lingual retrieval cannot take good advantage of large-scale pretrained language models such as multilingual BERT and XLM. In this paper, we hypothesize that the absence of cross-lingual passage-level relevance data for finetuning and the lack of query-document style pretraining are key factors. We propose to directly finetune language models on the evaluation collection by making Transformers capable of accepting longer sequences. We introduce two novel retrieval-oriented pretraining tasks to further pretrain cross-lingual language models for downstream retrieval tasks, such as cross-lingual ad-hoc retrieval (CLIR) and cross-lingual question answering (CLQA). We construct distant supervision data from multilingual Wikipedia using section alignment to support retrieval-oriented language model pretraining. Experiments on multiple benchmark datasets show that our proposed model can significantly improve upon general multilingual language models in both the cross-lingual retrieval and cross-lingual transfer settings. We make our pretraining implementation and checkpoints publicly available for future research. |
Title: Cross-Positional Attention for Debiasing Clicks |
Authors: Honglei Zhuang (Google), Zhen Qin (Google), Xuanhui Wang (Google), Michael Bendersky (Google), Xinyu Qian (Google), Po Hu (Google) and Dan Chary Chen (Google). |
A well-known challenge in leveraging implicit user feedback like clicks to improve real-world search services and recommender systems is its inherent bias. Most existing click models are based on the examination hypothesis in user behaviors and differ in how to model such an examination bias. However, they are constrained by assuming a simple position-based bias or enforcing a sequential order in user examination behaviors. These assumptions are insufficient to capture complex real-world user behaviors and hardly generalize to modern user interfaces (UI) in web applications (e.g., results shown in a grid view). In this work, we propose a fully data-driven neural model for the examination bias, Cross-Positional Attention (XPA), which is more flexible in fitting complex user behaviors. Our model leverages the attention mechanism to effectively capture cross-positional interactions among displayed items and is applicable to arbitrary UIs. We employ XPA in a novel neural click model that can both predict clicks and estimate relevance. Our experiments on offline synthetic data sets show that XPA is robust among different click generation processes. We further apply XPA to a large-scale real-world recommender system, showing significantly better results than baselines in online A/B experiments that involve millions of users. This validates the necessity to model more complex user behaviors than those proposed in the literature. |
Title: Crosslingual Topic Modeling with WikiPDA |
Authors: Tiziano Piccardi (Ecole Polytechnique Fédérale de Lausanne) and Robert West (Ecole Polytechnique Fédérale de Lausanne). |
We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA), a crosslingual topic model that learns to represent Wikipedia articles written in any language as distributions over a common set of language-independent topics. It leverages the fact that Wikipedia articles link to each other and are mapped to concepts in the Wikidata knowledge base, such that, when represented as bags of links, articles are inherently language-independent. WikiPDA works in two steps, by first densifying bags of links using matrix completion and then training a standard monolingual topic model. A human evaluation shows that WikiPDA produces more coherent topics than monolingual text-based LDA, thus offering crosslinguality at no cost. We demonstrate WikiPDA's utility in two applications: a study of topical biases in 28 Wikipedia editions, and crosslingual supervised classification. Finally, we highlight WikiPDA's capacity for zero-shot language transfer, where a model is reused for new languages without any fine-tuning. |
Title: CrowdGP: a Gaussian Process Model for Inferencing Relevance from Crowd Annotations |
Authors: Dan Li (University of Amsterdam), Zhaochun Ren (Shandong University) and Evangelos Kanoulas (University of Amsterdam). |
Test collections have been a crucial factor in the development of information retrieval systems. Constructing a test collection requires human annotators to assess the relevance of massive numbers of query-document pairs (tasks). Relevance annotations acquired through crowdsourcing platforms alleviate the enormous cost of this process, but they are often noisy. Existing models to infer true relevance labels from noisy annotations mostly assume that annotations are generated independently, based on which a probabilistic graphical model is designed to rebuild the annotation generation process. In this paper, we relax the independence assumption by assuming a Gaussian process on the true relevance labels of tasks to model their correlation. We propose a new crowd annotation generation model named CrowdGP, where the true relevance labels, the difficulty and bias of tasks, and the competence and bias of annotators are modelled through a Gaussian process and multiple Gaussian variables, respectively. The CrowdGP model shows superior performance in terms of inferring true relevance labels compared with state-of-the-art baselines on two crowdsourcing relevance datasets. The experiments also demonstrate its effectiveness at predicting relevance labels for new tasks that have no crowd annotations, which is a new functionality of CrowdGP. Ablation studies demonstrate that this effectiveness is attributed to the modelling of task correlation based on the auxiliary information of tasks and the prior relevance information of documents to queries. |
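The core modelling move (correlated true labels under a Gaussian process prior, plus annotation noise) can be sketched with plain GP regression; the features, kernel, and noise level below are toy stand-ins for CrowdGP's full annotator and task model:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

# toy tasks with 1-D features and noisy aggregated crowd labels
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 0.8, 0.9, 0.2])   # noisy relevance annotations
noise = 0.1                           # annotation noise variance

# GP posterior mean at a new task that has no crowd annotations at all:
# correlation with nearby tasks is what makes the prediction possible
X_new = np.array([[1.5]])
K = rbf_kernel(X, X) + noise * np.eye(len(X))
k_star = rbf_kernel(X_new, X)
print(k_star @ np.linalg.solve(K, y))
```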
Title: CurGraph: Curriculum Learning for Graph Classification |
Authors: Yiwei Wang (National University of Singapore), Wei Wang (National University of Singapore), Yuxuan Liang (National University of Singapore), Yujun Cai (Nanyang Technological University) and Bryan Hooi (National University of Singapore). |
Graph neural networks (GNNs) have achieved state-of-the-art performance on graph classification tasks. Existing work usually feeds graphs to GNNs in a random order for training. However, graphs can vary greatly in their difficulty for classification, and we argue that GNNs can benefit from an easy-to-difficult curriculum, similar to the learning process of humans. Evaluating the difficulty of graphs is challenging due to the high irregularity of graph data. To address this issue, we present the CurGraph (Curriculum Learning for Graph Classification) framework, which analyzes graph difficulty in a high-level semantic feature space. Specifically, we use the infomax method to obtain graph-level embeddings and a neural density estimator to model the embedding distributions. We then calculate the difficulty scores of graphs based on the intra-class and inter-class distributions of their embeddings. Given the difficulty scores, CurGraph first exposes a GNN to easy graphs before gradually moving on to hard ones. To provide a soft transition from easy to hard, we propose a smooth-step method, which utilizes a time-variant smooth function to filter out hard graphs. Thanks to CurGraph, a GNN learns from the graphs at the border of its capability, neither too easy nor too hard, gradually expanding its border at each training step. Empirically, CurGraph yields significant gains for popular GNN models on graph classification and enables them to achieve superior performance on miscellaneous graphs. |
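The smooth-step idea can be illustrated directly: given precomputed difficulty scores, a time-variant smooth function assigns soft weights so hard graphs fade in as training progresses. The window width and schedule below are illustrative choices, not the paper's exact function:

```python
def smooth_step(x):
    # classic smoothstep: 0 below 0, 1 above 1, smooth in between
    x = min(max(x, 0.0), 1.0)
    return x * x * (3 - 2 * x)

def curriculum_weights(difficulties, progress, window=0.2):
    # graphs easier than the moving threshold get full weight; harder
    # graphs are smoothly suppressed rather than hard-filtered
    return [smooth_step((progress - d) / window + 1.0) for d in difficulties]

difficulties = [0.1, 0.4, 0.7, 0.95]   # precomputed difficulty scores
for progress in (0.2, 0.5, 0.9):       # fraction of training completed
    weights = curriculum_weights(difficulties, progress)
    print(progress, [round(w, 2) for w in weights])
```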
Title: Curriculum CycleGAN for Textual Sentiment Domain Adaptation with Multiple Sources |
Authors: Sicheng Zhao (University of California Berkeley), Yang Xiao (Nankai University), Jiang Guo (Massachusetts Institute of Technology), Xiangyu Yue (University of California Berkeley), Jufeng Yang (Nankai University), Ravi Krishna (University of California Berkeley), Pengfei Xu (Didi Chuxing) and Kurt Keutzer (University of California Berkeley). |
Sentiment analysis of user-generated reviews or comments on products and services on social media can help enterprises to analyze the feedback from customers and take corresponding actions for improvement. Training deep neural networks for textual sentiment analysis requires large-scale labeled data, which is expensive and time-consuming to obtain. Domain adaptation (DA) provides an alternate solution by learning a transferable model from another labeled source domain to the unlabeled or sparsely labeled target domain. Since the labeled data may be from multiple sources, multi-source domain adaptation (MDA) would be more practical to effectively exploit the complementary information from different domains. Existing MDA methods for textual sentiment analysis mainly focus on extracting domain-invariant features of different domains, aligning each source and the target separately, or assigning weights to the source samples statically. However, they might fail to extract some discriminative features in the target domain that are related to sentiment, neglect the correlations of different sources as well as the distribution difference among different sub-domains even in the same source, and cannot reflect the varying optimal weighting during different training stages. In this paper, we propose an instance-level multi-source domain adaptation framework, named curriculum cycle-consistent generative adversarial network (C-CycleGAN), to address the above issues. Specifically, C-CycleGAN consists of three components: (1) pre-trained text encoder which encodes textual input from different domains into a continuous representation space, (2) intermediate domain generator with curriculum instance-level adaptation which bridges the gap across source and target domains, and (3) task classifier trained on the intermediate domain for final sentiment classification. C-CycleGAN transfers source samples at an instance-level to an intermediate domain that is closer to target domain with sentiment semantics preserved and without losing discriminative features. Further, our dynamic instance-level weighting mechanisms can assign the optimal weights to different source samples in each training stage. We conduct extensive experiments on three benchmark datasets and achieve substantial gains over state-of-the-art approaches, which demonstrate the superiority of the proposed C-CycleGAN for textual sentiment classification. |
Title: DAPter: Preventing User Data Abuse in Deep Learning Inference Services |
Authors: Hao Wu (National Key Lab for Novel Software Technology, Nanjing University), Xuejin Tian (National Key Lab for Novel Software Technology, Nanjing University), Yuhang Gong (National Key Lab for Novel Software Technology, Nanjing University), Xing Su (National Key Lab for Novel Software Technology, Nanjing University), Minghao Li (Cornell University) and Fengyuan Xu (National Key Lab for Novel Software Technology, Nanjing University). |
The data abuse concern has risen along with the widespread development of Deep Learning Inference Services (DLIS). Mobile users specifically worry about their DLIS input data being taken advantage of in the training of new deep learning (DL) models. Mitigating this new concern is demanding because it requires an excellent balance between data abuse prevention and a highly usable service. Unfortunately, existing works do not meet this unique requirement. In this work, we propose the first data abuse prevention mechanism, called DAPter. DAPter is a user-side DLIS-input converter, and its outputs, although still good for inference, can hardly be labeled for new model training. At the core of DAPter is a lightweight generative model trained with a novel loss function to minimize abusable information in the inference input. Moreover, adopting DAPter does not require changing the existing provider backend or DLIS models. We conduct comprehensive experiments with our DAPter prototype on mobile devices and demonstrate that DAPter can substantially raise the bar of data abuse difficulty with little impact on service quality and little overhead. |
Title: Data Poisoning Attacks and Defenses to Crowdsourcing Systems |
Authors: Minghong Fang (Iowa State University), Minghao Sun (Iowa State University), Qi Li (Iowa State University), Neil Zhenqiang Gong (Duke University), Jin Tian (Iowa State University) and Jia Liu (The Ohio State University). |
A key challenge of big data analytics is how to collect a large volume of (labeled) data. Crowdsourcing aims to address this challenge via aggregating and estimating high-quality data (e.g., sentiment label for text) from pervasive clients/users. Existing studies on crowdsourcing focus on designing new methods to improve the aggregated data quality from unreliable/noisy clients. However, the security aspects of such crowdsourcing systems remain under-explored to date. We aim to bridge this gap in this work. Specifically, we show that crowdsourcing is vulnerable to data poisoning attacks, in which malicious clients provide carefully crafted data to corrupt the aggregated data. We formulate our proposed data poisoning attacks as an optimization problem that maximizes the error of the aggregated data. Our evaluation results on one synthetic and two real-world benchmark datasets demonstrate that the proposed attacks can substantially increase the estimation errors of the aggregated data. We also propose two defenses to reduce the impact of malicious clients. Our empirical results show that the proposed defenses can substantially reduce the estimation errors of the data poisoning attacks. |
Title: DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems |
Authors: Ruoxi Wang (Google), Rakesh Shivanna (Google), Derek Cheng (Google), Sagar Jain (Google), Dong Lin (Google), Lichan Hong (Google) and Ed Chi (Google). |
Learning effective feature crosses is the key behind building recommender systems. However, the sparse and large feature space requires exhaustive search to identify effective crosses. Deep & Cross Network (DCN) was proposed to automatically and efficiently learn bounded-degree predictive feature interactions. Unfortunately, in models that serve web-scale traffic with billions of training examples, DCN showed limited expressiveness in its cross network at learning more predictive feature interactions. Despite significant research progress made, many deep learning models in production still rely on traditional feed-forward neural networks to learn feature crosses inefficiently. In light of the pros/cons of DCN and existing feature interaction learning approaches, we propose an improved framework DCN-V2 to make DCN more practical in large-scale industrial settings. In a comprehensive experimental study with extensive hyper-parameter search and model tuning, we observed that DCN-V2 approaches outperform all the state-of-the-art algorithms on popular benchmark datasets. The improved DCN-V2 is more expressive yet remains cost efficient at feature interaction learning, especially when coupled with a mixture of low-rank architecture. DCN-V2 is simple, can be easily adopted as building blocks, and has delivered significant offline accuracy and online business metrics gains across many web-scale learning to rank systems at Google. |
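The cross layer at the center of this architecture is compact enough to sketch: per the paper's public description, layer l computes x_{l+1} = x_0 ⊙ (W x_l + b) + x_l, and the low-rank variant factors W as U V^T. The dimensions and random weights here are placeholders:

```python
import numpy as np

def cross_layer(x0, xl, W, b):
    # x_{l+1} = x0 * (W @ x_l + b) + x_l  (element-wise product + residual)
    return x0 * (W @ xl + b) + xl

def low_rank_cross_layer(x0, xl, U, V, b):
    # low-rank variant: W ~ U @ V.T cuts cost from O(d^2) to O(d*r)
    return x0 * (U @ (V.T @ xl) + b) + xl

rng = np.random.default_rng(0)
d, r = 8, 2
x0 = rng.normal(size=d)   # embedding-layer output
x = x0
for _ in range(3):        # a stack of three low-rank cross layers
    U, V, b = rng.normal(size=(d, r)), rng.normal(size=(d, r)), rng.normal(size=d)
    x = low_rank_cross_layer(x0, x, U, V, b)
print(x.shape)            # (8,): explicit high-order feature crosses of x0
```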
Title: Debiasing Career Recommendations with Neural Fair Collaborative Filtering |
Authors: Rashidul Islam (University of Maryland, Baltimore County), Kamrun Naher Keya (University of Maryland, Baltimore County), Ziqian Zeng (The Hong Kong University of Science and Technology), Shimei Pan (University of Maryland, Baltimore County) and James Foulds (University of Maryland, Baltimore County). |
A growing proportion of human interactions are digitized on social media platforms and subjected to algorithmic decision-making, and it has become increasingly important to ensure fair treatment from these algorithms. In this work, we investigate gender bias in collaborative-filtering recommender systems trained on social media data. We develop neural fair collaborative filtering (NFCF), a practical framework for mitigating gender bias in recommending career-related sensitive items (e.g. jobs, academic concentrations, or courses of study) using a pre-training and fine-tuning approach to neural collaborative filtering, augmented with bias correction techniques. We show the utility of our methods for gender de-biased career and college major recommendations on the MovieLens dataset and a Facebook dataset, respectively, and achieve better performance and fairer behavior than several state-of-the-art models. |
Title: Deep Co-Attention Network for Multi-View Subspace Learning |
Authors: Lecheng Zheng (University of Illinois at Urbana-Champaign), Yu Cheng (Microsoft), Hongxia Yang (Alibaba Group), Nan Cao (TongJi University) and Jingrui He (University of Illinois at Urbana-Champaign). |
Many real-world applications involve data from multiple modalities and thus exhibit view heterogeneity. For example, user modeling on social media might leverage both the topology of the underlying social network and the content of the users' posts; in the medical domain, multiple views could be X-ray images taken at different poses. To date, various techniques have been proposed to achieve promising results, such as canonical correlation analysis based methods. Meanwhile, it is critical for decision-makers to be able to understand the prediction results from these methods. For example, given the diagnostic result that a model provided based on the X-ray images of a patient at different poses, the doctor needs to know why the model made such a prediction. However, state-of-the-art techniques usually suffer from the inability to utilize the complementary information of each view and to explain the predictions in an interpretable manner. To address these issues, in this paper, we propose a deep co-attention network for multi-view subspace learning, which aims to extract both the common information and the complementary information in an adversarial setting and provide robust interpretations behind the prediction to the end-users via the co-attention mechanism. In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation by incorporating the classifier into our model. This improves the quality of the latent representation and accelerates the convergence speed. Finally, we develop an efficient iterative algorithm to find the global optimal solution of the representation, which is evaluated extensively on synthetic and real-world data sets. We also conduct a case study to demonstrate how the proposed method robustly interprets the predictive results on an image data set. |
Title: DeepFEC: Energy Consumption Prediction under Real-World Driving Conditions for Smart Cities |
Authors: Sayda Elmi (NUS) and Kian-Lee Tan (NUS). |
Air pollution is a serious problem all over the world, and analysing and predicting vehicle energy consumption has become a major concern. Vehicle energy consumption depends not only on speed but also on a number of external factors such as road topology, traffic, and driving style. Obtaining the cost of each link (i.e., link energy consumption) in a road network plays a key role in the energy-optimal route planning process. This paper presents a novel framework that identifies vehicle- and driving-environment-dependent factors to predict energy consumption over a road network based on historical consumption data for different vehicle types. We design a deep-learning-based structure, called DeepFEC, to forecast energy consumption accurately on each and every road in a city based on real traffic conditions. A residual neural network and a recurrent neural network are employed to model spatial and temporal closeness, respectively. Static vehicle data reflecting vehicle type, vehicle weight, engine configuration and displacement are also learned. The outputs of these neural networks are dynamically aggregated to improve the forecasting of spatially correlated time series data. Extensive experiments conducted on a diverse fleet consisting of 264 gasoline vehicles, 92 Hybrid Electric Vehicles, and 27 Plug-in Hybrid Electric Vehicles/Electric Vehicles driven on the Michigan road network show that our proposed deep learning algorithm significantly outperforms state-of-the-art prediction algorithms. |
Title: DeepRec: On-device Deep Learning for Privacy-Preserving Sequential Recommendation in Mobile Commerce |
Authors: Jialiang Han (Peking University), Yun Ma (Peking University), Qiaozhu Mei (University of Michigan) and Xuanzhe Liu (Peking University). |
Sequential recommendation techniques are considered to be a promising way of providing a better user experience in mobile commerce by learning temporal interests within historical user interaction behaviors. However, the recently increasing focus on privacy concerns, such as the General Data Protection Regulation (GDPR), can significantly affect the deployment of state-of-the-art sequential recommendation techniques, because user behavior data are no longer allowed to be arbitrarily used without the user's explicit permission. This paper proposes a novel sequential recommendation technique, namely DeepRec, which provides an on-device deep learning framework for mining interaction behaviors for recommendation without sending any raw data or intermediate results out of the device, preserving user privacy maximally. DeepRec constructs a global model using data collected before GDPR and fine-tunes a personal model continuously on individual mobile devices using data collected after GDPR. DeepRec employs model pruning and embedding sparsity techniques to reduce the computation and network overhead, making the model training process practical on computation-constrained mobile devices. Evaluation results over a widely adopted, publicly released user behavior dataset from Taobao, which contains around a million users and 72 million clicks, show that DeepRec can achieve comparable recommendation accuracy to existing centralized recommendation approaches with small computation overhead and up to a 10x reduction in network overhead. |
Title: DeepVista: 16K Panoramic Cinema on Your Mobile Device |
Authors: Wenxiao Zhang (Hong Kong University of Science and Technology), Feng Qian (University of Minnesota, Twin Cities), Bo Han (George Mason University) and Pan Hui (Hong Kong University of Science and Technology). |
In this paper, we design, implement, and evaluate DeepVista, which is to our knowledge the first consumer-class system that streams panoramic videos far beyond the ultra high-definition resolution (up to 16K) to mobile devices, offering truly immersive experiences. Such an immense resolution makes streaming video-on-demand (VoD) content extremely resource-demanding. To tackle this challenge, DeepVista introduces a novel framework that leverages an edge server to perform efficient, intelligent, and quality-guaranteed content transcoding, by extracting from panoramic frames the viewport stream that will be delivered to the client. To support real-time transcoding of 16K content, DeepVista employs several key mechanisms such as dual-GPU acceleration, lossless viewport extraction, deep viewport prediction, and a two-layer streaming design. Our extensive evaluations using real users’ viewport movement data indicate that DeepVista outperforms existing solutions, and can smoothly stream 16K panoramic videos to commodity mobile devices over diverse wireless networks including WiFi, LTE, and mmWave 5G. |
Title: Demystifying Illegal Mobile Gambling Apps |
Authors: Yuhao Gao (Beijing University of Posts and Telecommunications), Haoyu Wang (Beijing University of Posts and Telecommunications), Li Li (Monash University), Xiapu Luo (The Hong Kong Polytechnic University), Xuanzhe Liu (Peking University) and Guoai Xu (BUPT). |
Mobile gambling apps, a new type of online gambling service emerging in the mobile era, have become one of the most popular and lucrative underground businesses in the mobile app ecosystem. Since their birth, mobile gambling apps have received strict regulation from both government authorities and app markets. However, to the best of our knowledge, mobile gambling apps have not been investigated by our research community. In this paper, we take the first step to fill this void. Specifically, we first perform a 5-month dataset collection process to harvest illegal gambling apps in China, where mobile gambling apps are outlawed. We have collected 3,366 unique gambling apps with 5,344 different versions. We then characterize the gambling apps from various perspectives including app distribution channels, network infrastructure, malicious behaviors, and abused third-party and payment services. Our work reveals a number of covert distribution channels, the unique characteristics of gambling apps, and the abused fourth-party payment services. Finally, we propose a "guilt-by-association" expansion method to identify new suspicious gambling services, which helps us further identify over 140K suspicious gambling domains and over 57K gambling app candidates. Our study demonstrates the urgency of detecting and regulating illegal gambling apps. |
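The expansion step is easy to convey with a toy bipartite app-domain graph: domains shared by enough known gambling apps become suspicious, and unknown apps contacting those domains become candidates. The thresholds and names are invented, and the paper's method works over a richer relation graph:

```python
from collections import Counter

def guilt_by_association(seed_apps, app_to_domains, min_shared=2):
    # domains reachable from >= min_shared seed apps become suspicious
    hits = Counter(d for app in seed_apps for d in app_to_domains[app])
    suspicious_domains = {d for d, c in hits.items() if c >= min_shared}
    # any non-seed app contacting a suspicious domain becomes a candidate
    candidates = {app for app, domains in app_to_domains.items()
                  if app not in seed_apps and domains & suspicious_domains}
    return suspicious_domains, candidates

graph = {
    "app_a": {"bet1.example", "pay.example"},   # known gambling app
    "app_b": {"bet1.example", "cdn.example"},   # known gambling app
    "app_c": {"bet1.example"},                  # shares infrastructure
    "app_d": {"news.example"},                  # unrelated
}
print(guilt_by_association({"app_a", "app_b"}, graph))
```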
Title: Density-Ratio Based Personalised Ranking from Implicit Feedback |
Authors: Riku Togashi (CyberAgent, Inc.), Masahiro Kato (CyberAgent, Inc.), Mayu Otani (CyberAgent, Inc.) and Shin'Ichi Satoh (National Institute of Informatics). |
Learning from implicit user feedback is challenging as we can only observe positive samples but never access negative ones. Most conventional methods cope with this issue by adopting a pairwise ranking approach with negative sampling. However, the pairwise ranking approach has a severe disadvantage in convergence time owing to its computational cost increasing quadratically with the sample size; this is problematic, particularly for large-scale datasets and complex models such as neural networks. By contrast, a pointwise approach does not directly solve a ranking problem, and is therefore inferior to a pairwise counterpart in top-K ranking tasks; however, it is generally advantageous with regard to convergence time. This study aims to establish an approach to learning personalised ranking from implicit feedback that reconciles the training efficiency of the pointwise approach with the ranking effectiveness of the pairwise counterpart. The key idea is to estimate the ranking of items in a pointwise manner; we first reformulate the conventional pointwise approach based on density ratio estimation and then incorporate the essence of ranking-oriented approaches (e.g. the pairwise approach) into our formulation. Through experiments on three real-world datasets, we demonstrate that our approach not only dramatically reduces the convergence time (being one to two orders of magnitude faster) but also significantly improves the ranking performance. |
Title: Detecting and Quantifying Wash Trading on Decentralized Cryptocurrency Exchanges |
Authors: Friedhelm Victor (TU Berlin) and Andrea Marie Weintraud (TU Berlin). |
Cryptoassets such as cryptocurrencies and tokens are increasingly traded on decentralized exchanges. The advantage for users is that the funds are not in the custody of a centralized external entity. However, these exchanges are prone to manipulative behavior. In this paper, we illustrate how wash trading activity can be identified on two of the first popular decentralized exchanges on the Ethereum blockchain, IDEX and EtherDelta. We identify accounts and trading structures that meet the legal definitions of wash trading, discovering that they are responsible for a wash trading volume equivalent to 159 million U.S. dollars. While self-trades and two-account structures are predominant, complex forms also occur. We quantify the activity in detail, finding that on both exchanges more than 30% of all traded tokens have been subject to wash trading activity. On EtherDelta, 10% of the tokens have almost exclusively been wash traded. All data is made available for future research. Our findings underpin the need for countermeasures that are applicable in decentralized systems. |
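The two predominant structures can be screened for with a few lines over a trade log. This is a deliberately minimal heuristic in the spirit of the paper's structural definitions; it ignores longer cycles, amounts, and timing, and the threshold is illustrative:

```python
from collections import defaultdict

def wash_trade_candidates(trades, min_round_trips=4):
    """Flag self-trades and repeated two-account back-and-forth trading.

    trades: iterable of (buyer, seller, token, amount) tuples.
    """
    self_trades = []
    pair_counts = defaultdict(int)
    for buyer, seller, token, amount in trades:
        if buyer == seller:
            self_trades.append((buyer, token, amount))
        else:
            # direction-agnostic count of trades between the same two
            # accounts in the same token
            pair_counts[(frozenset((buyer, seller)), token)] += 1
    two_account = [k for k, c in pair_counts.items() if c >= min_round_trips]
    return self_trades, two_account

trades = [
    ("0xA", "0xA", "TOK", 100.0),                      # self-trade
    ("0xA", "0xB", "TOK", 50.0), ("0xB", "0xA", "TOK", 50.0),
    ("0xA", "0xB", "TOK", 50.0), ("0xB", "0xA", "TOK", 50.0),
]
print(wash_trade_candidates(trades))
```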
Title: DF-TAR: A Deep Fusion Network for Citywide Traffic Accident Risk Prediction with Dangerous Driving Behavior |
Authors: Patara Trirat (Korea Advanced Institute of Science and Technology) and Jae-Gil Lee (Korea Advanced Institute of Science and Technology). |
Because traffic accidents cause huge social and economic losses, it is of prime importance to precisely predict the traffic accident risk for reducing future accidents. In this paper, we propose a Deep Fusion network for citywide Traffic Accident Risk prediction (DF-TAR) with dangerous driving statistics that contain the frequencies of various dangerous driving offences in each region. Our unique contribution is to exploit these statistics, obtained by processing the data from in-vehicle sensors, for modeling the traffic accident risk. Toward this goal, we first examine the correlation between dangerous driving offences and traffic accidents, and the analysis shows a strong correlation between them in terms of both location and time. Specifically, quick start (0.83), rapid acceleration (0.76), and sharp turn (0.76) are the top three offences that have the highest average correlation scores. We then train the DF-TAR model using the dangerous driving statistics as well as external environmental features. By extensive experiments on various frameworks, the DF-TAR model is shown to improve the accuracy of the baseline models by up to 54% by virtue of the integration of dangerous driving into the modeling of traffic accident risk. |
Title: Dialect Diversity in Text Summarization on Twitter |
Authors: L. Elisa Celis (Yale University) and Vijay Keswani (Yale University). |
Discussions on Twitter involve participation from different communities with different dialects and it is often necessary to summarize a large number of posts into a representative sample to provide a synopsis. Yet, any such representative sample should sufficiently portray the underlying dialect diversity to present the voices of different participating communities representing the dialects. Extractive summarization algorithms perform the task of constructing subsets that succinctly capture the topic of any given set of posts. However, we observe that there is dialect bias in the summaries generated by common summarization approaches, i.e., they often return summaries that under-represent certain dialects. The vast majority of existing "fair" summarization approaches require socially salient attribute labels (in this case, dialect) to ensure that the generated summary is fair with respect to the socially salient attribute. Nevertheless, in many applications, these labels do not exist. Furthermore, due to the ever-evolving nature of dialects in social media, it is unreasonable to label or accurately infer the dialect of every social media post. To correct for the dialect bias, we employ a framework that takes an existing text summarization algorithm as a blackbox and, using a small set of dialect-diverse sentences, returns a summary that is relatively more dialect-diverse. Crucially, this approach does not need the posts being summarized to have dialect labels, ensuring that the diversification process is independent of dialect classification/identification models. We show the efficacy of our approach on Twitter datasets containing posts written in dialects used by different social groups defined by race or gender; in all cases, our approach leads to improved dialect diversity compared to standard text summarization approaches. |
Title: DISCOS: Bridging the Gap between Discourse Knowledge and Commonsense Knowledge |
Authors: Tianqing Fang (Hong Kong University of Science and Technology), Hongming Zhang (Hong Kong University of Science and Technology), Weiqi Wang (Hong Kong University of Science and Technology), Yangqiu Song (Hong Kong University of Science and Technology) and Bin He (Huawei Noah's Ark Lab). |
Commonsense knowledge is crucial for artificial intelligence systems to understand natural language. Previous commonsense knowledge acquisition approaches typically rely on human annotations (e.g., ATOMIC) or text generation models (e.g., COMET). Human annotation could provide high-quality commonsense knowledge, yet its high cost often results in relatively small scale and low coverage. On the other hand, generation models have the potential to automatically generate more knowledge. Nonetheless, machine learning models often fit the training data well and thus struggle to generate high-quality novel knowledge. To address the limitation of previous approaches, in this paper, we propose an alternative commonsense knowledge acquisition framework DISCOS (from DIScourse to COmmonSense), which automatically mines expensive complex commonsense knowledge from more affordable linguistic knowledge resources. Experiments demonstrate that we can successfully convert discourse knowledge about eventualities from ASER, a large-scale discourse knowledge graph, into if-then commonsense knowledge defined in ATOMIC without any additional annotation effort. Further study suggests that DISCOS significantly outperforms previous supervised approaches in terms of novelty and diversity with comparable quality. |
Title: Disentangling User Interest and Conformity for Recommendation with Causal Embedding |
Authors: Yu Zheng (Tsinghua University), Chen Gao (Tsinghua University), Xiang Li (University of Hong Kong), Xiangnan He (University of Science and Technology of China), Yong Li (Tsinghua University) and Depeng Jin (Department of Electronic Engineering, Tsinghua University). |
Recommendation models are usually trained on observational interaction data. However, observational interactions may result from users' conformity towards popular items, which entangles with users' real interest. Existing methods treat this problem as one of eliminating popularity bias, e.g., by re-weighting training samples or leveraging a small fraction of unbiased data. However, these approaches ignore the variety of user conformity, and they bundle different causes of an interaction together into unified representations, so robustness and interpretability are not guaranteed when the underlying causes change. In this paper, we present DICE, a general framework that learns representations in which interest and conformity are structurally disentangled, and into which various backbone recommendation models can be smoothly integrated. We assign users and items separate embeddings for interest and conformity, and make each embedding capture only one cause by training with cause-specific data and imposing direct disentanglement supervision on the embedding distribution. Our proposed method outperforms state-of-the-art baselines with remarkable improvements on two real-world datasets on top of various backbone models. We further demonstrate that the learned embeddings successfully capture the desired causes, and show that DICE guarantees the robustness and interpretability of recommendation. |
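Structurally, the disentanglement amounts to giving each user and item two embeddings whose matches are summed at scoring time. The sketch below shows only that scoring structure, with random embeddings standing in for the cause-specific training and disentanglement supervision:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, d = 4, 5, 3
# separate embedding tables for the two causes of an interaction
user_interest = rng.normal(size=(n_users, d))
user_conform = rng.normal(size=(n_users, d))
item_interest = rng.normal(size=(n_items, d))
item_conform = rng.normal(size=(n_items, d))

def score(u, i):
    # the click score aggregates an interest match and a conformity match;
    # in the actual framework each part is trained on cause-specific samples
    return user_interest[u] @ item_interest[i] + user_conform[u] @ item_conform[i]

print(score(0, 2))
```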
Title: Dissecting Performance of Production QUIC |
Authors: Alexander Yu (Brown University) and Theophilus Benson (Brown University). |
IETF QUIC, the standardized version of Google's UDP-based layer-4 network protocol, has seen increasing adoption from large Internet companies for its benefits over TCP. Yet despite its rapid adoption, performance analysis of QUIC in production is scarce. Most existing analyses have only used unoptimized open-source QUIC servers on non-tuned kernels: these analyses are unrepresentative of production deployments which raises the question of whether QUIC actually outperforms TCP in practice. In this paper, we conduct one of the first comparative studies on the performance of QUIC and TCP against production endpoints hosted by Google, Facebook, and Cloudflare under various dimensions: network conditions, workloads, and client implementations. To understand our results, we create a tool to systematically analyze the root cause of the performance differences between the two protocols. Using our tool we make several key observations. First, while QUIC has some inherent advantages over TCP, such as worst-case 1-RTT handshakes, much of its overall performance gains are largely determined by the server's choice of congestion-control algorithm and the robustness of its congestion-control implementation under edge-case network scenarios. Second, we find that performance of QUIC clients is a function of the configured flow control and TLS, which can be non-trivial to tune. Lastly, we demonstrate that QUIC's removal of head-of-line (HOL) blocking has little impact on web-page performance in practice. |
Title: DistCare: Distilling Knowledge from Publicly Available Online EMR Data to Emerging Epidemic for Prognosis |
Authors: Liantao Ma (School of Electronics Engineering and Computer Science, Peking University, China), Xinyu Ma (School of Electronics Engineering and Computer Science, Peking University, China), Junyi Gao (National Engineering Research Center of Software Engineering, Peking University, China), Xianfeng Jiao (National Engineering Research Center of Software Engineering, Peking University, China), Zhihao Yu (National Engineering Research Center of Software Engineering, Peking University, China), Chaohe Zhang (School of Electronics Engineering and Computer Science, Peking University, China), Wenjie Ruan (University of Exeter), Yasha Wang (School of Electronics Engineering and Computer Science, Peking University, China), Wen Tang (Division of Nephrology, Peking University Third Hospital, China) and Jiangtao Wang (United Kingdom Center for Intelligent Healthcare, Coventry University, UK). |
Due to the characteristics of COVID-19, the epidemic develops rapidly and overwhelms health service systems worldwide. Many patients suffer from life-threatening systemic problems and need to be carefully monitored in ICUs. Intelligent prognosis can help physicians intervene early, prevent adverse outcomes, and optimize medical resource allocation, and is thus urgently needed, especially in this ongoing global pandemic crisis. However, in the early stage of an epidemic outbreak, the data available for analysis is limited due to the lack of effective diagnostic mechanisms, the rarity of cases, and privacy concerns. In this paper, we propose a distilled transfer learning framework, DistCare, which leverages existing publicly available online electronic medical records (EMRs) to enhance the prognosis of inpatients with emerging infectious diseases. It learns to embed COVID-19-related medical features based on massive existing EMR data. The transferred parameters are further trained, via distillation, to imitate the teacher model's representation behavior, which comprehensively embeds the health status in the source dataset. We conduct length-of-stay prediction experiments for ICU patients on a real-world COVID-19 dataset. The results indicate that our proposed model consistently outperforms competitive baseline methods, especially when data is insufficient. To further verify the scalability of DistCare across different clinical tasks and EMR datasets, we conduct an additional mortality prediction experiment on multiple end-stage renal disease datasets. The extensive experiments demonstrate that DistCare can significantly benefit the prognosis of emerging pandemics and other diseases with limited EMRs. As a proof of concept, we also implement DistCare as a real-world AI-Doctor interaction system that reveals the patient's health trajectory for prognosis. We release our code and the AI-Doctor interaction system anonymously at https://github.com/anonymous201902/DistCare. |
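The distillation step can be pictured with a simple imitation term. A hedged sketch follows: the plain MSE task loss and the single scalar trade-off `beta` are illustrative assumptions, and the paper's architecture and losses are richer.

```python
import numpy as np

def distillation_loss(student_repr, teacher_repr, student_pred, y, beta=0.5):
    """Task loss plus an imitation term that penalizes the student for
    deviating from the teacher's hidden representation of each patient."""
    imitation = np.mean((student_repr - teacher_repr) ** 2)
    task = np.mean((student_pred - y) ** 2)   # e.g., length-of-stay regression
    return task + beta * imitation
```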
Title: Extract the Knowledge of Graph Neural Networks and Go Beyond it: An Effective Knowledge Distillation Framework |
Authors: Cheng Yang (Beijing University of Posts and Telecommunications), Jiawei Liu (Beijing University of Posts and Telecommunications) and Chuan Shi (Beijing University of Posts and Telecommunications). |
Semi-supervised learning on graphs is an important problem in the machine learning area. In recent years, state-of-the-art classification methods based on graph neural networks (GNNs) have shown their superiority over traditional ones such as label propagation. However, these neural models are mostly ``black boxes'': their sophisticated architectures lead to a complex prediction mechanism that cannot make full use of valuable prior knowledge lying in the data, e.g., that structurally correlated nodes tend to have the same class. In this paper, we propose a framework based on knowledge distillation to address these issues. Our framework extracts the knowledge of an arbitrary learned GNN model (teacher model) and injects it into a well-designed student model. The student model is built with two simple prediction mechanisms, label propagation and feature transformation, which naturally preserve structure-based and feature-based prior knowledge, respectively. Specifically, we design the student model as a trainable combination of parameterized label propagation and feature transformation modules. As a result, the learned student can benefit from both prior knowledge and the knowledge in GNN teachers for more effective predictions. Moreover, the learned student model has a more interpretable prediction process than GNNs. We conduct experiments on five public benchmark datasets and employ seven GNN models, including GCN, GAT, APPNP, SAGE, SGC, GCNII and GLP, as teacher models. Experimental results show that the learned student model can outperform its corresponding teacher model by 1.4%-4.7% on average. The improvements are consistent and significant, with better interpretability. |
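The student's two-branch design can be sketched directly. A minimal numpy illustration: the single propagation step and one scalar combination weight are simplifications, since the paper parameterizes both branches and their combination more finely.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def student_predict(A_norm, X, Y, W, alpha):
    """Student = combination of label propagation and feature transformation.

    A_norm: (n, n) row-normalized adjacency; X: (n, f) node features;
    Y: (n, c) current label distribution (teacher soft labels on labeled
    nodes, zeros elsewhere); W: (f, c) feature-transform weights;
    alpha: combination weight in [0, 1]."""
    lp = A_norm @ Y        # structure-based prior: neighbors share labels
    ft = softmax(X @ W)    # feature-based prior: labels from attributes
    return alpha * lp + (1.0 - alpha) * ft
```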
Title: Diverse and Specific Clarification Question Generation with Keywords |
Authors: Zhiling Zhang (Shanghai Jiao Tong University) and Kenny Zhu (Shanghai Jiao Tong University). |
Product descriptions on e-commerce websites often omit important aspects. Clarification question generation (CQGen) is a promising approach to help alleviate this problem. Unlike traditional QGen, which assumes the existence of answers in the context and generates questions accordingly, CQGen mimics user behavior of asking for unstated information. The generated CQs can help vendors check and fill in the missing information before posting, and consequently improve consumer experience. Due to the variety of possible user backgrounds and use cases, the information need can be quite diverse and specific, whereas previous works assume generating one CQ per context, and their generation results tend to be generic. We thus propose the task of Diverse CQGen and also tackle the challenge of specificity. We propose a new model named KPCNet, which generates CQs with Keyword Prediction and Conditioning, to deal with these tasks. Automatic and human evaluation on 2 datasets (Home & Kitchen, Office) showed that KPCNet can generate more specific questions and promote better group-level diversity than several competitive baselines. |
Title: Diversification-Aware Learning to Rank using Distributed Representation |
Authors: Le Yan (Google), Zhen Qin (Google), Rama Kumar Pasumarthi (Google), Xuanhui Wang (Google) and Michael Bendersky (Google). |
Existing work on search result diversification typically falls into the ``next document'' paradigm, that is, selecting the next document based on the ones already chosen. A sequential process of selecting documents one-by-one is naturally modeled in learning-based approaches. However, such a process makes learning difficult because there is an exponential number of ranking lists to consider. Sampling is usually used to reduce the computational complexity, but this makes the learning less effective. In this paper, we propose a soft version of the ``next document'' paradigm in which we associate each document with an approximate rank, so that the subtopics covered prior to a document can also be estimated. We show that we can derive differentiable diversification-aware losses, which are smooth approximations of diversity metrics like $\alpha$-NDCG, based on these estimates. We further propose to optimize the losses in the learning-to-rank setting using neural distributed representations of queries and documents. Experiments are conducted on the public benchmark TREC datasets. By comparing with an extensive list of baseline methods, we show that our Diversification-Aware LEarning-TO-Rank (DALETOR) approaches outperform them by a large margin, while being much simpler during learning and inference. |
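The "approximate rank" at the heart of the soft paradigm is typically built from pairwise sigmoids. A sketch of one common smooth-rank construction used in this line of work; the paper's exact formulation may differ.

```python
import numpy as np

def approx_ranks(scores, temperature=1.0):
    """Differentiable rank estimate: rank_i ~ 1 + sum_j sigmoid((s_j - s_i)/T).
    With ranks smooth in the scores, rank-based diversity metrics such as
    alpha-NDCG admit smooth surrogates with usable gradients."""
    s = np.asarray(scores, dtype=float)
    pairwise = 1.0 / (1.0 + np.exp(-(s[None, :] - s[:, None]) / temperature))
    np.fill_diagonal(pairwise, 0.0)
    return 1.0 + pairwise.sum(axis=1)

print(approx_ranks([3.0, 1.0, 2.0], temperature=0.1))  # roughly [1, 3, 2]
```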
Title: Diversified Recommendation Through Similarity-Guided Graph Neural Networks |
Authors: Yu Zheng (Tsinghua University), Chen Gao (Tsinghua University), Liang Chen (Sun Yat-sen University), Yong Li (Tsinghua University) and Depeng Jin (Department of Electronic Engineering, Tsinghua University). |
In recent years, much effort has been devoted to improving the accuracy and relevance of recommendation systems. Diversity, a crucial factor that measures the dissimilarity among the recommended items, has received rather little scrutiny. Although directly related to user satisfaction, diversification is usually taken into consideration only after generating the candidate items. However, this decoupled design of diversification and candidate generation makes the whole system suboptimal. In this paper, we aim at pushing diversification upstream, into the representation learning stage, with the help of Graph Neural Networks (GNNs). Although GNN-based recommendation algorithms have shown great power in modeling the complex collaborative filtering effect, which makes the recommended items more relevant, how diversity changes is ignored in those advanced works. We propose to perform rebalanced neighbor discovering, category-boosted negative sampling, and adversarial learning under the guidance of item similarity. We conduct extensive experiments on real-world e-commerce datasets. Experimental results verify the effectiveness of our proposed method in providing diverse contents. Further ablation studies validate that our proposed method significantly alleviates the accuracy-diversity dilemma. |
Title: Diversity on the Go! Streaming Determinantal Point Processes under a Maximum Induced Cardinality Objective |
Authors: Paul Liu (Stanford University), Akshay Soni (Microsoft), Eun Yong Kang (Microsoft), Yajun Wang (Microsoft) and Mehul Parsana (Microsoft). |
Over the past decade, Determinantal Point Processes (DPPs) have proven to be a mathematically elegant framework for modeling diversity. Given a set of items $N$, DPPs define a probability distribution over subsets of $N$, with sets of larger diversity having greater probability. Recently, DPPs have achieved success in the domain of recommendation systems, as a method to enforce diversity of recommendations in addition to relevance. In large-scale recommendation applications, however, the input typically comes in the form of a \emph{stream} too large to fit into main memory, and the natural greedy algorithm for DPP-based recommendations is memory intensive and cannot be used in a streaming setting. In this work, we give the first streaming algorithm for optimizing DPPs under the Maximum Induced Cardinality (MIC) objective of Gillenwater et al.~\cite{gillenwater_maximizing_2018}. As noted in \cite{gillenwater_maximizing_2018}, the MIC objective is better suited to recommendation systems than the classically used maximum a posteriori (MAP) DPP objective. In the insertion-only streaming model, our algorithm runs in $\tilde{O}(k^2)$ time per update and uses $\tilde{O}(k)$ memory, where $k$ is the number of diverse items to be selected. In the sliding-window streaming model, our algorithm runs in $\tilde{O}(\sqrt{n}k^2)$ time per update and $\tilde{O}(\sqrt{n}k)$ memory, where $n$ is the size of the sliding window. The approximation guarantees are simple, and depend on the largest and the $k$-th largest eigenvalues of the kernel matrix used to model diversity. We show that in practice the algorithm often achieves close to optimal results, and meets the memory and latency requirements of production systems. Furthermore, the algorithm works well even in a non-streaming setting, and runs in a fraction of the time of the classic greedy algorithm. |
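For intuition, the MIC objective of the cited work can be read as the expected sample size of a DPP restricted to the chosen set, which has a closed form in the eigenvalues of the kernel's principal submatrix. The following is a hedged sketch of that objective, under this reading, together with the classic memory-hungry greedy baseline the streaming algorithm replaces; the streaming algorithm itself is the paper's contribution and is not shown.

```python
import numpy as np

def mic_objective(L, S):
    """f(S) = tr(L_S (L_S + I)^{-1}) = sum_i lam_i / (1 + lam_i): the expected
    cardinality of a sample from the DPP with kernel L restricted to S."""
    lam = np.linalg.eigvalsh(L[np.ix_(S, S)])
    return float(np.sum(lam / (1.0 + lam)))

def greedy_mic(L, k):
    """Non-streaming greedy: O(n) objective evaluations per step."""
    S = []
    for _ in range(k):
        best = max((j for j in range(L.shape[0]) if j not in S),
                   key=lambda j: mic_objective(L, S + [j]))
        S.append(best)
    return S
```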
Title: Dr.Emotion: Disentangled Representation Learning for Emotion Analysis on Social Media to Improve Community Resilience in the COVID-19 Era and Beyond |
Authors: Mingxuan Ju (Case Western Reserve University), Wei Song (Case Western Reserve University), Shiyu Sun (Case Western Reserve University), Yanfang Ye (Case Western Reserve University), Yujie Fan (Case Western Reserve University), Shifu Hou (Case Western Reserve University), Kenneth Loparo (Case Western Reserve University) and Liang Zhao (Emory University). |
The deadly outbreak of coronavirus disease (COVID-19) has posed grand challenges to human society. In response to the spread of COVID-19, many activities have moved online, and social media has played an important role by enabling people to discuss their experiences and feelings about this global crisis. To help combat the prolonged pandemic, which has exposed vulnerabilities impacting community resilience, in this paper, based on our established large-scale COVID-19-related social media data, we propose and develop an integrated framework (named Dr.Emotion) to learn disentangled representations of social media posts (i.e., tweets) for emotion analysis, and thus to gain deep insights into public perceptions of COVID-19. In Dr.Emotion, for given social media posts, we first post-train a transformer-based model to obtain initial post embeddings. Users may express their emotions implicitly, highly entangled with the content of a post; to address this challenge for emotion analysis, we propose an adversarial disentangler that integrates emotion-independent (i.e., sentiment-neutral) priors of the posts, generated by another post-trained transformer-based model, to separate the implicitly encoded emotions from the content in latent space for emotion classification, in a first attempt of its kind. Extensive experimental studies fully evaluate Dr.Emotion, and promising results demonstrate its performance in emotion analysis in comparison with state-of-the-art baseline methods. Exploiting our developed Dr.Emotion, we further perform emotion analysis on a large number of social media posts (i.e., 107,434 COVID-19-related tweets posted by users in the United States from Mar 1, 2020 to Sep 30, 2020) and provide in-depth investigations from both temporal and geographical perspectives, based on which additional work can extract and transform the constructive ideas, experiences, and support into actionable information to improve community resilience in response to the variety of crises created by COVID-19 and well beyond. |
Title: Drug Package Recommendation via Interaction-aware Graph Induction |
Authors: Zhi Zheng (University of Science and Technology of China), Chao Wang (University of Science and Technology of China), Tong Xu (University of Science and Technology of China), Dazhong Shen (University of Science and Technology of China), Penggang Qin (University of Science and Technology of China), Baoxing Huai (Huawei Technologies), Tongzhu Liu (The First Affiliated Hospital of USTC) and Enhong Chen (University of Science and Technology of China). |
Recent years have witnessed the rapid accumulation of massive electronic medical records (EMRs), which strongly support intelligent medical services such as drug recommendation. However, prior art mainly follows traditional recommendation strategies like collaborative filtering, which usually treat individual drugs as mutually independent, while the latent interactions among drugs, e.g., synergistic or antagonistic effects, have been largely ignored. To that end, in this paper we develop a new paradigm for drug package recommendation that considers the interaction effects within drug packages, where these interaction effects can be affected by patient conditions. Specifically, we first design a pre-training method based on neural collaborative filtering to obtain initial embeddings of patients and drugs. Then, the drug interaction graph is initialized based on medical records and domain knowledge. Along this line, we propose a new Drug Package Recommendation (DPR) framework with two variants, DPR on Weighted Graph (DPR-WG) and DPR on Attributed Graph (DPR-AG), in which the interactions are described as signed weights or attribute vectors, respectively. In detail, a mask layer is utilized to capture the impact of patient condition, and graph neural networks (GNNs) are leveraged for the final graph induction task to embed the package. Extensive experiments on a real-world dataset from a first-rate hospital demonstrate the effectiveness of our DPR framework compared with several competitive baseline methods, and further support the heuristic study of the drug package generation task with adequate performance. |
Title: Dual Side Deep Context-aware Modulation for Social Recommendation |
Authors: Bairan Fu (Nanjing University), Wenming Zhang (Nanjing University), Guangneng Hu (Hong Kong University of Science and Technology), Xinyu Dai (Nanjing University), Shujian Huang (Nanjing University) and Jiajun Chen (Nanjing University). |
Social recommendation improves recommendation performance by leveraging social relations from online social networking platforms. Social relations among users provide friends' information for modelling a user's interest in a candidate item, and help items reach potential consumers (i.e., item attraction). However, two issues have not been well studied. First, for user interest, existing methods typically aggregate friends' information contextualized on the candidate item only, and this shallow context-aware aggregation leaves them with limited friends' information. Second, for item attraction, if an item's past consumers are friends of the target user or share similar consumption habits, the item may be more attractive to that user, but most existing methods neglect such relation-enhanced context-aware item attraction. To address these issues, we propose DICER (Dual sIde deep Context-awarE modulation for social Recommendation). Specifically, we first propose a novel graph neural network to model the social relation and the collaborative relation, and on top of high-order relations, a dual-side deep context-aware modulation is introduced to capture friends' information and item attraction. Empirical results on two real-world datasets show the effectiveness of the proposed model, and further experiments are conducted to help understand how the dual context-aware modulation works. |
Title: DYMOND: DYnamic MOtif-NoDes Network Generative Model |
Authors: Giselle Zeno (Purdue University), Timothy La Fond (Lawrence Livermore National Laboratory) and Jennifer Neville (Purdue University). |
The graph structure in dynamic networks changes rapidly. By leveraging temporal information inherent in network connections, models of these dynamic networks can be constructed to analyze how their structure changes over time. However, most existing generative models for temporal graphs grow networks, i.e., links are only added over time, ignoring that real networks are dynamic with frequent lulls in activity. Motifs have been established as building blocks for the structure of networks, so modeling these higher-order structures can help to generate the graph structure seen in real-world networks. Furthermore, motifs can capture correlations in node connections and activity. To date, there are few dynamic-graph generative models, and only a minority of these consider higher-order network structure (instead of only node pair-wise connections). Such models have been evaluated using static graph structure metrics, without incorporating measures that reflect the temporal behavior of the network. Our proposed DYnamic MOtif-NoDes (DYMOND) model considers both the dynamic changes in overall graph structure, using temporal motif activity, and the roles nodes play in motifs (e.g., one node plays the hub role in a wedge, while the remaining two act as spokes). We compare our model against three dynamic-graph generative model baselines on real-world networks. We also propose a new methodology to adapt graph structure metrics to include the temporal aspect of the network. Our contributions in this paper are: (1) a statistical dynamic-graph generative model that samples graphs with realistic structure and temporal node behavior using motifs, and (2) a novel methodology for comparing dynamic-graph generative models and measuring how well they capture the underlying graph structure distribution and temporal node behavior of a real graph. |
Title: Dynamic Embeddings for Interaction Prediction |
Authors: Zekarias Kefato (KTH Royal Institute of Technology), Sarunas Girdzijauskas (Royal Institute of Technology (KTH), Sweden), Nasrullah Sheikh (IBM) and Alberto Montresor (University of Trento). |
In recommender systems (RSs), predicting the next item that a user interacts with is critical for user retention. While the last decade has seen an explosion of RSs aimed at identifying relevant items that match user preferences, there is still a range of aspects that could be considered to further improve their performance. For example, RSs are often centered around the user, who is modeled using her recent sequence of activities. Recent studies, however, have shown the effectiveness of modeling the mutual interactions between users and items using separate user and item embeddings. Building on the success of these studies, we propose a novel method called DeepRed that addresses some of their limitations. In particular, we avoid recursive and costly interactions between consecutive short-term embeddings by using long-term (stationary) embeddings as a proxy. This enables us to train DeepRed using simple mini-batches, without the overhead of the specialized mini-batches proposed in previous studies. Moreover, DeepRed's effectiveness comes from this design and a multi-way attention mechanism that inspects user-item compatibility. Experiments show that DeepRed outperforms the best state-of-the-art approach by at least 14% on the next-item prediction task, while gaining more than an order of magnitude speedup over the best performing baselines. Although this study is mainly concerned with temporal interaction networks, we also show the power and flexibility of DeepRed by adapting it to the case of static interaction networks, substituting the short- and long-term aspects with local and global ones. |
Title: ECLARE: Extreme Classification with Label Graph Correlations |
Authors: Anshul Mittal (Indian Institute of Technology Delhi), Noveen Sachdeva (UC San Diego), Sheshansh Agrawal (Microsoft India), Sumeet Agarwal (Indian Institute of Technology Delhi), Purushottam Kar (Microsoft Research) and Manik Varma (Microsoft Research). |
Deep extreme multi-label classification seeks to train deep architectures that can tag a data point with its most relevant subset of labels from an extremely large label set. The core utility of extreme classification comes from predicting labels which are rarely seen during training. Such rare labels hold the key to extremely personalized yet relevant recommendations that can delight and surprise a user. However, the large number of rare labels and extremely small amount of training data per rare label offer significant challenges, both statistical and computational. The state-of-the-art in deep extreme classification tries to remedy this by using label metadata such as textual descriptions of labels, but fails to adequately address the problem. This paper presents ECLARE, a scalable deep learning architecture that incorporates not only label metadata, but also label correlations to offer accurate real-time predictions within a few milliseconds on commodity hardware. The core contributions of ECLARE include a frugal architecture and scalable techniques to accurately train deep architectures along with label correlation graphs at the scale of millions of labels. In particular, ECLARE offers predictions that are up to 9% more accurate on both publicly available benchmark datasets as well as proprietary datasets for a related products recommendation task sourced from a major search engine. Code for ECLARE will be made available on a public repository. |
Title: Effective and Scalable Clustering on Massive Attributed Graphs |
Authors: Renchi Yang (Nanyang Technological University), Jieming Shi (The Hong Kong Polytechnic University), Yin Yang (Hamad bin Khalifa University), Keke Huang (National University of Singapore), Shiqi Zhang (National University of Singapore) and Xiaokui Xiao (National University of Singapore). |
Given a graph G where each node is associated with a set of attributes, and a parameter k specifying the number of output clusters, k-attributed graph clustering (k-AGC) groups nodes in G into k disjoint clusters, such that nodes within the same cluster share similar topological and attribute characteristics, while those in different clusters are dissimilar. This problem is challenging on massive graphs, e.g., with millions of nodes and billions of edges. For such graphs, existing solutions either incur prohibitively high costs, or produce clustering results of compromised quality. In this paper, we propose ACMin, an effective approach to k-AGC that yields high-quality clusters with cost linear in the size of the input graph G. The main contributions of ACMin are twofold: (i) a novel formulation of the k-AGC problem based on an attributed multi-hop conductance quality measure custom-made for this problem setting, which effectively captures cluster coherence in terms of both topological proximities and attribute similarities, and (ii) a linear-time optimization solver that obtains high-quality clusters iteratively, based on efficient matrix operations such as orthogonal iterations, an alternative optimization approach, as well as an initialization technique that significantly speeds up the convergence of ACMin in practice. Extensive experiments, comparing 11 competitors on 6 real datasets, demonstrate that ACMin consistently outperforms all competitors in terms of result quality measured against ground-truth labels, while being up to orders of magnitude faster. In particular, on the Microsoft Academic Knowledge Graph dataset with 265.2 million edges and 1.1 billion attribute values, ACMin outputs high-quality results for 5-AGC within 1.68 hours using a single CPU core, while none of the 11 competitors finish within 3 days. |
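Orthogonal iteration, one of the matrix primitives named above, is short to state. A generic sketch, not ACMin's actual solver: the matrix M stands in for whatever proximity operator the method iterates with.

```python
import numpy as np

def orthogonal_iteration(M, k, n_iters=50, seed=0):
    """Simultaneous power iteration: repeatedly multiply a random n x k
    block by M and re-orthonormalize; Q converges to a basis of M's
    dominant k-dimensional invariant subspace using only mat-vec products."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(M.shape[0], k)))
    for _ in range(n_iters):
        Q, _ = np.linalg.qr(M @ Q)
    return Q
```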
Title: Effective Named Entity Recognition with Boundary-aware Bidirectional Neural Networks |
Authors: Fei Li (School of Computer Science and Technology, Beijing Institute of Technology), Zheng Wang (Nanyang Technological University), Siu Cheung Hui (Nanyang Technological University), Lejian Liao (School of Computer Science and Technology, Beijing Institute of Technology), Dandan Song (School of Computer Science and Technology, Beijing Institute of Technology) and Jing Xu (School of Computer Science and Technology, Beijing Institute of Technology). |
Named Entity Recognition (NER) is a fundamental problem in Natural Language Processing and has received much research attention. Although current neural-based NER approaches have achieved state-of-the-art performance, they still suffer from one or more of the following three problems in their architectures: (1) boundary tag sparsity, (2) lack of global decoding information, and (3) boundary error propagation. In this paper, we propose a novel Boundary-aware Bidirectional Neural Networks (Ba-BNN) model to tackle these problems for neural-based NER. The proposed Ba-BNN model is constructed based on the structure of pointer networks, tackling the first problem of boundary tag sparsity. Moreover, we use a boundary-aware binary classifier to capture global decoding information as input to the decoders. In the Ba-BNN model, we propose to use two decoders to process information in two different directions (i.e., left-to-right and right-to-left). The final hidden states of the left-to-right decoder are obtained by incorporating the hidden states of the right-to-left decoder in the decoding process. In addition, a boundary retraining strategy is proposed to help reduce the boundary error propagation caused by the pointer networks in boundary detection and entity classification. We have conducted extensive experiments on three NER benchmark datasets. The results show that the proposed Ba-BNN model outperforms the current state-of-the-art models. |
Title: Efficient Computation of Semantically Cohesive Subgraphs for Keyword-Based Knowledge Graph Exploration |
Authors: Yuxuan Shi (Nanjing University), Gong Cheng (Nanjing University), Trung-Kien Tran (Bosch Center for Artificial Intelligence), Evgeny Kharlamov (University of Oslo) and Yulin Shen (Nanjing University). |
A knowledge graph (KG) represents a set of entities and their relations. To explore the content of a large and complex KG, a convenient approach is keyword-based querying. Traditional methods assign small weights to salient entities or relations, and answer an exploratory keyword query by computing a group Steiner tree (GST), which is a minimum-weight subgraph that connects all the keywords in the query. Recent studies have suggested improving the semantic cohesiveness of a query answer by minimizing the pairwise semantic distances between the entities in a subgraph, but it remains unclear how to efficiently compute such a semantically cohesive subgraph. In this paper, we formulate it as a quadratic group Steiner tree problem (QGSTP) by extending the classical minimum-weight GST problem, which is NP-hard. We design two approximation algorithms for QGSTP and prove their approximation ratios. Furthermore, to improve their practical performance, we present various heuristics, e.g., pruning and ranking strategies. |
Title: Efficient Knowledge Graph Embedding without Negative Sampling |
Authors: Zelong Li (Rutgers University), Jianchao Ji (Rutgers University), Zuohui Fu (Rutgers University), Yingqiang Ge (Rutgers University), Shuyuan Xu (Rutgers University), Chong Chen (Tsinghua University) and Yongfeng Zhang (Rutgers University). |
A Knowledge Graph (KG) is a flexible structure able to describe complex relationships between data entities. Currently, most KG embedding models are trained with negative sampling: the model aims to maximize some similarity of connected entities in the KG, while minimizing the similarity of sampled disconnected entities. Negative sampling reduces the time complexity of model learning by considering only a subset of negative instances, but it may fail to deliver stable model performance due to the uncertainty in the sampling procedure. To avoid this deficiency, we propose a new framework for KG embedding --- Efficient Non-Sampling Knowledge Graph Embedding (NS-KGE). The basic idea is to consider all of the negative instances in the KG for model learning, thus avoiding negative sampling. The framework can be applied to square-loss-based knowledge graph embedding models, or to models whose loss can be converted to a square loss. A natural side-effect of this non-sampling strategy is the increased computational complexity of model learning. To solve this problem, we leverage mathematical derivations to reduce the complexity of the non-sampling loss function, which eventually gives us both better efficiency and better accuracy in KG embedding compared with existing models. Experiments on benchmark datasets show that our NS-KGE framework achieves better efficiency and accuracy than traditional negative-sampling-based models, and that the framework is applicable to a large class of knowledge graph embedding models. |
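The complexity reduction rests on a decomposition that is easy to state for a plain dot-product score. A sketch of the idea in matrix-factorization notation for clarity; the paper derives the analogous forms for KG embedding losses, so this is an illustration of the trick, not the authors' derivation.

```python
import numpy as np

def non_sampling_square_loss(U, V, pos_pairs, c_pos=1.0, c_neg=0.1):
    """Weighted square loss over ALL pairs without enumerating negatives.

    Naively, sum_{u,i} c_ui (y_ui - s_ui)^2 costs O(n*m*d). But the all-pair
    part c_neg * sum_{u,i} s_ui^2 equals c_neg * tr((U^T U)(V^T V)), an
    O((n+m) d^2) quantity, so only positive pairs need explicit terms."""
    all_pairs = c_neg * np.trace((U.T @ U) @ (V.T @ V))
    correction = 0.0
    for u, i in pos_pairs:
        s = U[u] @ V[i]
        # swap the c_neg*s^2 counted above for the true positive-pair term
        correction += c_pos * (1.0 - s) ** 2 - c_neg * s ** 2
    return all_pairs + correction
```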
Title: Efficient Probabilistic Truss Indexing on Uncertain Graphs |
Authors: Zitan Sun (HKBU), Xin Huang (HKBU), Jianliang Xu (HKBU) and Francesco Bonchi (ISI Foundation). |
Many complex networks are associated with edge uncertainty, such as social networks, communication networks, and biological networks. Although various graph analytic tasks have been studied over such uncertain graphs, the problem of (k, γ)-truss indexing has received little attention in the literature. In this paper, we study a new problem of (k, γ)-truss indexing and querying over an uncertain graph G. A (k, γ)-truss is the largest dense subgraph of G such that the probability of each edge being contained in at least k-2 triangles is no less than γ. We first propose a compact and elegant CPT-index to keep all the (k, γ)-trusses. Based on the CPT-index, (k, γ)-truss retrieval for any given k and γ can be answered in time linear in the size of the queried (k, γ)-truss, which is optimal. We develop a novel CPT-index construction scheme and a faster algorithm based on graph partitions. To trade off between offline (k, γ)-truss indexing and online querying, we further develop an approximate indexing approach, (ϵ, ∆r)-Indexing, which keeps approximate trussness information for part of the edges using two tolerated errors, ϵ and ∆r. This approximate scheme can produce exact results and freely adjust parameters to efficiently support (k, γ)-truss indexing and querying tasks over large uncertain graphs. Extensive experiments using large-scale uncertain graphs with 261 million edges validate the efficiency of our proposed indexing and querying algorithms against state-of-the-art methods. |
Title: Efficient Reductions and A Fast Algorithm of Maximum Weighted Independent Set |
Authors: Mingyu Xiao (UESTC), Sen Huang (University of Electronic Science and Technology of China), Yi Zhou (LERIA) and Bolin Ding (Data Analytics and Intelligence Lab, Alibaba Group). |
The maximum independent set problem is one of the most extensively studied optimization problems in graph algorithms and social networks. The weighted version, in which each vertex is assigned a nonnegative weight, has also received much attention due to its potential applications in many areas. However, many nice properties and fast algorithms for the unweighted version cannot be extended to the weighted version. In this paper, we study the structural properties of this problem, giving sufficient conditions for a vertex to be, or not to be, in a maximum weighted independent set. These properties yield a suite of reduction rules that includes and generalizes almost all frequently used reduction rules for this problem. These rules can efficiently find partial solutions and greatly reduce the instances, especially for sparse graphs. Based on them, we also propose a simple, exact, yet practical algorithm. To demonstrate its efficiency, we compare it with state-of-the-art algorithms on several well-known real-world datasets. The experimental results reveal that our exact algorithm is not only faster than existing algorithms but can also exactly solve more hard instances within 1,000 seconds. For the instances that remain unsolved, our reduction rules also improve existing heuristic algorithms, producing higher-quality solutions in less time. |
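One frequently used rule of this kind is simple to state: if a vertex outweighs its entire neighborhood, it belongs to some maximum weighted independent set. A sketch follows; the dict-of-sets graph representation is an illustrative choice, and the paper's rule suite is far broader.

```python
def neighborhood_weight_rule(adj, w):
    """Repeatedly take any vertex v with w[v] >= sum of its neighbors'
    weights: swapping v's neighbors for v in any solution never decreases
    total weight, so v is forced into some optimum; then delete v and N(v).
    adj: {vertex: set(neighbors)}, w: {vertex: weight}."""
    adj = {v: set(ns) for v, ns in adj.items()}   # local copy
    forced, changed = set(), True
    while changed:
        changed = False
        for v in list(adj):
            if w[v] >= sum(w[u] for u in adj[v]):
                forced.add(v)
                dead = adj[v] | {v}
                for x in dead:                     # remove v and N(v)
                    for y in adj.get(x, ()):
                        if y not in dead:
                            adj[y].discard(x)
                    adj.pop(x, None)
                changed = True
                break
    return forced, adj
```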
Title: ELIXIR: Learning from User Feedback on Explanations to Improve Recommender Models |
Authors: Azin Ghazimatin (Max Plank Institute for Informatics), Soumajit Pramanik (Indian Institute of Technology Bhilai), Rishiraj Saha Roy (Max Planck Institute for Informatics) and Gerhard Weikum (Max Planck Institute for Informatics). |
System-provided explanations for recommendations are an important component of transparent and trustworthy AI. In state-of-the-art work, however, explanations are a one-way signal intended to improve user acceptance. In this paper, we turn the role of explanations around and investigate how they can contribute to enhancing the quality of the generated recommendations themselves. We devise an active learning framework, called ELIXIR, in which user feedback on explanations is leveraged for pair-wise learning of user preferences. ELIXIR leverages feedback on pairs of recommendations and explanations to learn user-specific latent preference vectors, overcoming sparseness by label propagation with item-similarity-based neighborhoods. Our framework is instantiated using generalized graph recommendation via Random Walk with Restart. Insightful experiments with a real-user study show significant improvements in the quality of movie recommendations over item-level feedback. |
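Random Walk with Restart, the recommendation backbone mentioned above, is compact enough to sketch in its generic power-iteration form; ELIXIR's learned preference vectors would enter through the restart distribution, which here is just an arbitrary input.

```python
import numpy as np

def random_walk_with_restart(A, restart, c=0.15, n_iters=100):
    """Stationary scores mixing graph proximity with a restart distribution:
    x = (1 - c) * P x + c * restart.
    A: (n, n) nonnegative adjacency (every node assumed to have an edge);
    restart: (n,) probability distribution, e.g., user preferences."""
    P = A / A.sum(axis=0, keepdims=True)   # column-stochastic transitions
    x = restart.copy()
    for _ in range(n_iters):
        x = (1.0 - c) * (P @ x) + c * restart
    return x
```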
Title: Elo-MMR: A Rating System for Massive Multiplayer Competitions |
Authors: Aram Ebtekar (No Affiliation) and Paul Liu (Stanford University). |
In competitive sports and games, rating systems play an often underrated role. Rating systems allow competitors to measure their own skill, incentivize competitive performances, and are crucial to providing balanced match-ups between competitors. In this paper, we present a novel Bayesian rating system for contests with many participants. This system can be viewed as an extension of the popular Glicko rating system to multiple players, and is widely applicable to popular coding websites such as \emph{Kaggle}, \emph{LeetCode}, \emph{Codeforces}, and \emph{TopCoder}. The simplicity of our system allows us to show theoretical bounds for properties such as outlier robustness and running time, among others. In particular, we show that the system encourages \emph{truthful play}: that is, intentional losses by a competitor will never raise their rating. Experimentally, the rating system outperforms existing systems in terms of accuracy, and is faster than existing systems by up to an order of magnitude. |
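For contrast with the multiplayer setting, the classic two-player Elo update that systems like this generalize fits in a few lines. This is textbook Elo with standard constants, not Elo-MMR itself.

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """Two-player Elo: expected score is logistic in the rating gap, and
    ratings move toward the observed result (score_a: 1 win, 0.5 draw, 0 loss)."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

print(elo_update(1500.0, 1500.0, 1.0))  # winner gains what the loser sheds
```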
Title: Enquire One’s Parent and Child Before Decision: Fully Exploit Hierarchical Structure for Self-Supervised Taxonomy Expansion |
Authors: Suyuchen Wang (Université de Montréal), Ruihui Zhao (Tencent Jarvis Lab), Xi Chen (Tencent Jarvis Lab), Yefeng Zheng (Tencent Jarvis Lab) and Bang Liu (Université de Montréal/MILA). |
Taxonomy is a hierarchically structured knowledge graph that plays a crucial role in machine intelligence. The taxonomy expansion task aims to find a position for a new term in an existing taxonomy, so as to capture emerging knowledge and keep the taxonomy dynamically updated. Previous taxonomy expansion solutions neglect valuable information brought by the hierarchical structure and evaluate the correctness of merely an added edge, downgrading the problem to node-pair scoring or mini-path classification. In this paper, we propose the Hierarchy Expansion Framework (HEF), which fully exploits the hierarchical structure's properties to maximize the coherence of the expanded taxonomy. HEF makes use of the taxonomy's hierarchical structure in multiple aspects: i) HEF utilizes subtrees containing the most relevant nodes as self-supervision data for a complete comparison of parental and sibling relations; ii) HEF adopts a coherence modeling module to evaluate the coherence of a taxonomy's subtree by integrating hypernymy relation detection and several tree-exclusive features; iii) HEF introduces the Fitting Score for position selection, which explicitly evaluates both path and level selections and takes full advantage of parental relations to interchange information for disambiguation and self-correction. Extensive experiments show that by better exploiting the hierarchical structure and optimizing the taxonomy's coherence, HEF vastly surpasses the prior state-of-the-art on three benchmark datasets, with average improvements of 46.7% in accuracy and 32.3% in mean reciprocal rank. |
Title: Enriched Models for Legislative Edit Prediction |
Authors: Victor Kristof (Ecole Polytechnique Fédérale de Lausanne), Aswin Suresh (Ecole Polytechnique Fédérale de Lausanne), Matthias Grossglauser (Ecole Polytechnique Fédérale de Lausanne) and Patrick Thiran (Ecole Polytechnique Fédérale de Lausanne). |
The European Union legislative process is an instance of a peer-production system. We introduce a model of the success of legislative edits proposed by parliamentarians on new laws. Each edit can be in conflict with the edits of other parliamentarians and with the original proposition in the law. Our model combines three categories of features: (a) explicit features extracted from data related to the edits and to the parliamentarians, (b) latent features that capture bilinear interactions between parliamentarians and laws, and (c) text features of the edits. We show experimentally that this combination enables us to accurately predict the success of edits. Furthermore, it leads to model parameters that are interpretable and therefore provide valuable insights into the legislative process. |
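The shape of such a three-part model can be sketched in a few lines. All symbols here are illustrative assumptions (x for explicit features, u/v for the latent bilinear factors, t for text features); the paper's actual parameterization and training are not reproduced.

```python
import numpy as np

def edit_success_probability(x, u_parl, v_law, t, w_x, w_t, b=0.0):
    """Logistic model: explicit features + bilinear parliamentarian-law
    interaction + text features of the edit, combined in one logit."""
    logit = w_x @ x + u_parl @ v_law + w_t @ t + b
    return 1.0 / (1.0 + np.exp(-logit))
```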
Title: Equilibrium Inverse Reinforcement Learning for Ride-hailing Vehicle Network |
Authors: Takuma Oda (Mobility Technologies Co., Ltd.). |
Ubiquitous mobile computing has enabled ride-hailing services to collect vast amounts of behavioral data from riders and drivers, and to optimize supply-demand matching in real time. While these mobility service providers have some degree of control over the market by assigning vehicles to requests, workers are usually free to drive when they are not assigned tasks, so providers must deal with the uncertainty arising from self-interested driver behavior. If driver behavior can be accurately replicated in a digital twin, more detailed and realistic counterfactual simulations will enable decision making that improves mobility services as well as validates urban planning. In this work, we formulate the problem of passenger-vehicle matching in a sparsely connected graph and propose an algorithm to derive an equilibrium policy in a multi-agent environment. Our framework combines value iteration methods, to estimate the optimal policy given expected state visitation, with policy propagation, to compute multi-agent state visitation frequencies. Furthermore, we develop a method to learn a driver reward function that is transferable to environments with dynamics significantly different from the training data. We evaluate robustness to changes in spatio-temporal supply-demand distributions and to deterioration in data quality using a real-world taxi trajectory dataset; our approach significantly outperforms several baselines in terms of imitation accuracy. |
Title: Estimation of Fair Ranking Metrics with Incomplete Judgments |
Authors: Omer Kirnap (University College London), Fernando Diaz (Google Brain), Asia Biega (Microsoft Research, Max Planck Institute for Security and Privacy), Michael Ekstrand (Boise State Computer Science), Ben Carterette (Spotify Research) and Emine Yilmaz (University College London). |
Several methodologies have been proposed to extend the evaluation of search systems to include the fairness of system decisions. These metrics often consider the membership of documents' authors in particular groups, defined by protected attributes such as gender or ethnicity. To date, these metrics typically assume the availability and completeness of the protected attribute labels of authors. However, for privacy or policy reasons, the protected attributes of individuals may not always be present, limiting the application of fair ranking metrics in large-scale systems. To address this problem, we propose two sampling strategies and an estimation technique for four different fair ranking metrics. We formulate a robust and unbiased estimator that can operate even with a very limited number of labeled items. We evaluate our approach using both simulated and real-world data. Our experimental results demonstrate that our method can estimate this family of fair ranking metrics and provides a robust, reliable alternative to exhaustive or random data annotation. |
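The estimation problem has a standard inverse-propensity backbone. Below is a Horvitz-Thompson-style sketch of a group-exposure estimate under sampled labels; the paper's estimators and sampling strategies are considerably more refined, so this only illustrates the unbiased-estimation pattern.

```python
def estimated_group_exposure(exposure, sampled_groups, sample_probs, group):
    """Unbiased exposure estimate for `group` when protected attributes are
    observed only for a sampled subset of ranked items.

    exposure[i]: position weight of ranked item i (e.g., 1 / log2(i + 2));
    sampled_groups[i]: observed attribute, or None if item i was not sampled;
    sample_probs[i]: inclusion probability of item i in the labeled sample."""
    total = 0.0
    for e, g, p in zip(exposure, sampled_groups, sample_probs):
        if g is not None and g == group:
            total += e / p   # inverse-propensity correction
    return total
```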
Title: Evaluating the Rationales of Retail Investors |
Authors: Chung-Chi Chen (National Taiwan University), Hen-Hsen Huang (National Chengchi University) and Hsin-Hsi Chen (National Taiwan University). |
Social media's rise in popularity has demonstrated the usefulness of the wisdom of the crowd. Most previous works invoke the law of large numbers and simply average the results extracted from tasks such as opinion mining and sentiment analysis; few attempt to identify high-quality opinions from the mined results. In this paper, we propose an approach for capturing expert-like rationales from social media platforms without requiring annotated data. By leveraging stylistic and semantic features, our approach achieves an F1-score of 90.81%. Comparing the rationales of experts and those of the crowd from stylistic and semantic perspectives reveals that stylistic and semantic information provide complementary cues for professional rationales. We further show the advantage of using these analysis results in the financial market, and find that top-ranked opinions identified by our approach increase potential returns by up to 90.31% and reduce downside risk by up to 71.69%, compared with opinions ranked by feedback from social media users. Moreover, the performance of our method on downside risk control is even better than that of professional analysts. |
Title: Exploring the Scale-Free Nature of Stock Markets: Hyperbolic Graph Learning for Algorithmic Trading |
Authors: Ramit Sawhney (IIIT Delhi), Shivam Agarwal (Manipal Institute of Technology), Arnav Wadhwa (MIDAS, IIITD) and Rajiv Shah (IIIT Delhi). |
Quantitative trading and investment decision making are intricate financial tasks in the ever-growing, sixty-trillion-dollar global stock market. Despite advances in stock forecasting, a limitation of existing methods is that they treat stocks independently of each other, ignoring the rich, valuable signals among related stocks' movements. Motivated by financial literature showing that stock markets and inter-stock correlations exhibit scale-free network characteristics, we leverage domain knowledge on the Web to model inter-stock relations as a graph in four major global stock markets, and formulate stock selection as a scale-free graph-based learning-to-rank problem. To capture the scale-free spatial and temporal dependencies in stock prices, we propose ASTHGCN: Attentive Spatio-Temporal Hyperbolic Graph Convolution Network, the first neural hyperbolic model for stock selection. Our work's key novelty is modeling the complex, scale-free nature of inter-stock relations through temporal hyperbolic graph learning on Riemannian manifolds, which can represent the spatial correlations between stocks more accurately. Through extensive experiments on long-term real-world data spanning over six years on four of the world's biggest markets (NASDAQ, NYSE, TSE, and Chinese exchanges), we show that ASTHGCN significantly outperforms state-of-the-art stock forecasting methods, improving profitability by over 22% and the risk-adjusted Sharpe Ratio by over 27%. We analyze the contributions of ASTHGCN's components through a series of exploratory and ablative experiments to demonstrate its practical applicability to real-world trading. Furthermore, we propose a novel hyperbolic architecture that can be applied across various spatiotemporal problems on the Web's commonly occurring scale-free networks. |
Title: Extracting Contextualized Quantity Facts from Web Tables |
Authors: Vinh Thinh Ho (Max Planck Institute for Informatics), Koninika Pal (Max Planck Institute for Informatics), Simon Razniewski (Max Planck Institute for Informatics), Klaus Berberich (Saarbruecken University of Applied Sciences (htw saar)) and Gerhard Weikum (Max Planck Institute for Informatics). |
Quantity queries, with filter conditions on quantitative measures of entities, are so far out of reach of search engines and QA assistants. To enable such queries over web contents, this paper develops the first method for automatically extracting quantity facts from ad-hoc web tables. This involves recognizing quantities, with normalized values and units, aligning them with the proper entities, and contextualizing these pairs with informative cues to match sophisticated queries with modifiers. Our method performs joint inference on entity linking and on entity-quantity column alignment. The latter was oversimplified in prior works by assuming a single subject-column per table, whereas our approach is geared for complex tables and leverages external corpora as evidence. For contextualization, we identify informative cues from text and structural markup that surrounds a table. For query-time fact ranking, we devise a new scoring technique that exploits both context similarity and inter-fact consistency. Comparisons of our building blocks against state-of-the-art baselines and extrinsic experiments with two query benchmarks demonstrate the benefits of our method. |
Title: Fair and Representative Subset Selection from Data Streams |
Authors: Yanhao Wang (University of Helsinki), Francesco Fabbri (Pompeu Fabra University) and Michael Mathioudakis (University of Helsinki). |
We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications, such as social network analysis and recommender systems, this problem is formulated as maximizing a monotone submodular function subject to a cardinality constraint $k$. In this work, we consider a setting where data items in the stream belong to one of several disjoint groups, and investigate the optimization problem with an additional \emph{fairness constraint} that limits the selection to a given number of items from each group. We propose efficient algorithms for this fairness-aware variant of the streaming submodular maximization problem. In particular, we first give a $(\frac{1}{2}-\varepsilon)$-approximation algorithm that requires $O(\frac{1}{\varepsilon} \cdot \log |
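Offline, the fairness-constrained variant admits a natural greedy baseline, sketched below with hypothetical names; the paper's contribution is approximating this regime in a streaming pass with limited memory, which the sketch does not attempt.

```python
def greedy_with_group_caps(items, group_of, caps, marginal_gain, k):
    """Pick, at each step, the feasible item with the best marginal gain,
    where feasible means its group's quota is not yet exhausted.
    marginal_gain(S, x): gain of adding item x to the current selection S."""
    S, used = [], {g: 0 for g in caps}
    while len(S) < k:
        candidates = [x for x in items
                      if x not in S and used[group_of[x]] < caps[group_of[x]]]
        if not candidates:
            break
        best = max(candidates, key=lambda x: marginal_gain(S, x))
        S.append(best)
        used[group_of[best]] += 1
    return S
```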
Title: Fair Partitioning of Public Resources: Redrawing District Boundary to Minimize Spatial Inequality in School Funding |
Authors: Nuno Mota (Max Planck Institute for Software Systems), Negar Mohammadi (Tehran Institute for Advanced Studies), Palash Dey (Indian Institute of Technology Kharagpur), Krishna P. Gummadi (Max Planck Institute for Software Systems) and Abhijnan Chakraborty (Max Planck Institute for Software Systems). |
Public schools in the US offer tuition-free primary and secondary education to students, and are divided into school districts funded by the local and state governments. Although the primary source of school district revenue is public money, several studies have pointed to the inequality in funding across different school districts. In this paper, we focus on the spatial geometry of such inequality, i.e., how the highly funded and lesser-funded school districts are located relative to each other. Due to the major reliance on local property taxes for school funding, we find that existing school district boundaries promote financial segregation, with highly funded school districts surrounded by lesser-funded districts and vice versa. To counter such issues, we formally propose the Fair Partitioning problem: divide a given set of schools into k districts such that the spatial inequality in district-level funding is minimized. However, Fair Partitioning turns out to be computationally challenging, and we formally show that it is strongly NP-complete. We further provide a greedy algorithm that offers a practical solution to Fair Partitioning, and show its effectiveness in lowering spatial inequality in school district funding across different states in the US. |
Title: Fairness Aware PageRank |
Authors: Sotiris Tsioutsiouliklis (University of Ioannina), Evaggelia Pitoura (Univ. of Ioannina), Panayiotis Tsaparas (University of Ioannina), Ilias Kleftakis (University of Ioannina) and Nikos Mamoulis (University of Ioannina). |
In this paper, we consider fairness for link analysis, and in particular for the celebrated PageRank algorithm. We provide definitions of fairness for both PageRank and Personalized PageRank. We propose two families of fair PageRank algorithms: the first (Fairness-Sensitive PageRank) modifies the jump vector of the PageRank algorithm to enforce fairness; the second (Locally Fair PageRank) imposes a fair behavior per node. We prove that the Locally Fair algorithms also achieve personalized fairness, and that this is the only family of algorithms that is personalized fair, establishing an equivalence between personalized fairness and local fairness. We also consider the problem of achieving fairness while minimizing the utility loss with respect to the original algorithm; the utility loss for a network can be seen as a measure of the cost of being fair. We present experiments with real and synthetic graphs that examine the fairness of the original PageRank and demonstrate qualitatively and quantitatively the properties of our algorithms. |
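One simple member of the jump-vector family can be sketched directly: reweight the teleport distribution so a target share phi of restart mass lands on the protected group. This is a toy instance for intuition; the paper selects the jump vector more carefully and proves properties about the resulting algorithms.

```python
import numpy as np

def jump_reweighted_pagerank(P, protected, phi, alpha=0.85, n_iters=100):
    """Power iteration with a group-aware jump vector.
    P: (n, n) column-stochastic transition matrix;
    protected: (n,) boolean mask; phi: restart mass for the protected group."""
    n = protected.size
    jump = np.where(protected,
                    phi / protected.sum(),
                    (1.0 - phi) / (n - protected.sum()))
    x = np.full(n, 1.0 / n)
    for _ in range(n_iters):
        x = alpha * (P @ x) + (1.0 - alpha) * jump
    return x
```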
Title: FANCY: Human-centered, Deep Learning-based Framework for Fashion Style Analysis |
Authors: Youngseung Jeon (Ajou University), Seungwan Jin (Ajou University) and Kyungsik Han (Ajou University). |
Fashion style analysis is of the utmost importance for fashion professionals. However, style classification criteria differ across professionals and rely heavily on subjective experience, with no quantitative standard. We present FANCY (Fashion Attributes detectioN for Clustering stYle), a human-centered, deep learning-based framework that supports fashion professionals' analytic tasks with a computational method integrated with their insights. We worked closely with fashion professionals throughout the study to reflect their domain knowledge and experience as much as possible. We redefine fashion attributes, demonstrate a strong association between fashion attributes and styles, and develop a deep learning model that detects attributes in a given fashion image and reflects fashion professionals' insight. Based on 302,772 attribute-annotated runway images, we derive 25 new fashion styles. We summarize quantitative standards for the fashion style groups and present fashion trends by time, location, and brand. |
Title: Fast Evaluation for Relevant Quantities of Opinion Dynamics |
Authors: Wanyue Xu (Fudan University), Qi Bao (Fudan University) and Zhongzhi Zhang (Fudan University). |
One of the main subjects in the field of social networks is to quantify conflict, disagreement, controversy, and polarization, and several quantitative indicators have been developed for these concepts. However, direct computation of these indicators involves matrix inversion and multiplication, which is computationally infeasible for large-scale graphs with millions of nodes. In this paper, by reducing the problem of computing the relevant quantities to evaluating $\ell_2$ norms of certain vectors, we present a nearly linear time algorithm to estimate all these quantities. Our algorithm is based on Laplacian solvers, and has a proven theoretical error guarantee for each quantity. We perform extensive numerical experiments on a variety of real networks, which demonstrate that our approximation algorithm is efficient, effective, and scalable to large graphs with millions of nodes. |
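As a concrete instance of the reduction, under the widely used Friedkin-Johnsen opinion model the equilibrium is z = (I + L)^{-1} s, and quantities like disagreement and polarization become quadratic forms in z, i.e., squared $\ell_2$ norms that a fast Laplacian solver can approximate. A dense-solve sketch follows; it illustrates the quantities, while the paper replaces the expensive solve with a nearly linear time solver and its own definitions.

```python
import numpy as np

def fj_quantities(L, s):
    """L: (n, n) graph Laplacian; s: (n,) innate opinions.
    Returns equilibrium opinions, disagreement, and polarization."""
    n = L.shape[0]
    z = np.linalg.solve(np.eye(n) + L, s)   # the expensive step a Laplacian
                                            # solver does in nearly linear time
    disagreement = float(z @ L @ z)         # sum over edges of (z_u - z_v)^2
    polarization = float(np.sum((z - z.mean()) ** 2))
    return z, disagreement, polarization
```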
Title: FedPS: A Privacy Protection Enhanced Personalized Search Framework |
Authors: Jing Yao (Renmin University of China), Zhicheng Dou (Renmin University of China) and Ji-Rong Wen (Renmin University of China). |
Personalized search returns more accurate results to each user by collecting the user's historical search behaviors to infer user interests and query intents. However, it brings the risk of user privacy leakage, which may greatly limit the practical application of personalized search. In this paper, we focus on the problem of privacy protection in personalized search and propose a privacy-protection-enhanced personalized search framework, denoted FedPS. Under this framework, we keep each user's private data on her individual client, and train a shared personalized ranking model over all users' decentralized data by means of federated learning. We implement two models within the framework: the first applies a personalization model with a personal module for each user to alleviate the challenge of data heterogeneity in federated learning, and the second introduces trustworthy proxies and group servers to address the problems of limited communication, performance bottlenecks, and privacy attacks in FedPS. Experimental results verify that our proposed framework can enhance privacy protection without losing too much accuracy. |
Title: Few-Shot Knowledge Validation using Rules |
Authors: Michael Loster (Hasso Plattner Institute), Davide Mottin (Aarhus University), Paolo Papotti (Eurecom), Jan Ehmueller (Hasso Plattner Institute), Benjamin Feldmann (Hasso Plattner Institute) and Felix Naumann (HPI). |
Knowledge graphs (KGs) form the basis of modern intelligent search systems – their network structure helps with the semantic reasoning and interpretation of complex tasks. A KG is a highly dynamic structure in which facts are continuously updated, added, and removed. A typical approach to ensure data quality in the presence of continuous changes is to apply logic rules, which are automatically mined from the data using frequency-based approaches. As a result, these approaches depend on the data quality of the KG and are susceptible to errors and incompleteness. To address these issues, we propose Colt, a few-shot rule-based knowledge validation framework that enables the interactive quality assessment of logical rules. It evaluates the quality of a rule by asking a user to validate only a small percentage of the facts entailed by that rule on the KG. We formalize the problem as learning a validation function over the rule's outcomes and study the theoretical connections to the generalized maximum coverage problem. Our model obtains (i) an accurate estimate of the quality of a rule with fewer than 20 user interactions and (ii) 75% quality (F1) with 5% annotations in the task of validating facts entailed by any rule. |
Title: Few-Shot Molecular Property Prediction |
Authors: Zhichun Guo (University of Notre Dame), Chuxu Zhang (Brandeis University), Wenhao Yu (University of Notre Dame), John Herr (University of Notre Dame), Olaf Wiest (University of Notre Dame), Meng Jiang (University of Notre Dame) and Nitesh Chawla (University of Notre Dame). |
The recent success of deep learning has significantly boosted molecular property prediction, advancing therapeutic activities such as drug discovery. Existing deep neural network methods usually require sufficient training data for each property, impairing their performance in cases with only a limited amount of laboratory data (especially for new molecular properties), which are common in real situations. To this end, we propose Meta-MGNN, a novel model for few-shot molecular property prediction. Meta-MGNN applies molecular graph neural networks to learn molecular representations and builds a meta-learning framework for model optimization. To exploit unlabeled molecular information and address task heterogeneity across different molecular properties, Meta-MGNN further incorporates a molecular structure and attribute based self-supervised module and self-attentive task weights into the framework, strengthening the whole learning model. Extensive experiments on two public multi-property datasets demonstrate that Meta-MGNN outperforms a variety of state-of-the-art methods. |
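A rough sketch of the meta-learning loop, using a simplified first-order (Reptile-style) update as a stand-in for the paper's optimization and omitting the self-supervised module and task weights; `task.support_batches` and `model.property_loss` are hypothetical names:

    import copy
    import torch

    def meta_step(model, tasks, inner_lr=1e-3, meta_lr=0.1):
        """Simplified first-order meta-update over molecular-property tasks.

        A stand-in for Meta-MGNN's meta-learning loop, not its exact
        algorithm: each task adapts from the meta-parameters on its
        few-shot support set, then the meta-parameters move toward the
        adapted weights.
        """
        meta_weights = copy.deepcopy(model.state_dict())
        for task in tasks:
            model.load_state_dict(meta_weights)          # start from meta-parameters
            opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
            for graphs, labels in task.support_batches():  # few-shot support set
                loss = model.property_loss(graphs, labels)
                opt.zero_grad()
                loss.backward()
                opt.step()
            adapted = model.state_dict()
            for k, v in meta_weights.items():            # move toward adapted weights
                if v.is_floating_point():
                    meta_weights[k] = v + meta_lr * (adapted[k] - v)
        model.load_state_dict(meta_weights)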
Title: Few-shot Network Anomaly Detection via Cross-network Meta-learning |
Authors: Kaize Ding (ASU), Qinghai Zhou (UIUC), Hanghang Tong (UIUC) and Huan Liu (ASU). |
Network anomaly detection aims to find network elements (e.g., nodes, edges, subgraphs) whose behavior differs significantly from the vast majority. It has a profound impact in a variety of applications ranging from finance and healthcare to social network analysis. Due to the unbearable labeling cost, existing methods are predominantly developed in an unsupervised manner. Nonetheless, the anomalies they identify may turn out to be data noise or uninteresting data instances due to the lack of prior knowledge on the anomalies of interest. Hence, it is critical to investigate and develop few-shot learning for network anomaly detection. In real-world scenarios, a few labeled anomalies are often accessible on similar networks from the same domain as the target network, yet most existing works fail to leverage them and merely focus on a single network. Taking advantage of this potential, in this work we tackle the problem of few-shot network anomaly detection by (1) proposing a new family of graph neural networks -- Graph Deviation Networks (GDN) -- that can leverage a small number of labeled anomalies to enforce statistically significant deviations between abnormal and normal nodes on a network; and (2) equipping the proposed GDN with a new cross-network meta-learning algorithm to realize few-shot network anomaly detection by transferring meta-knowledge from multiple auxiliary networks. Extensive experimental evaluations demonstrate the efficacy of the proposed approach on few-shot or even one-shot network anomaly detection. |
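The deviation idea at the heart of GDN can be illustrated with the deviation loss from the deviation-networks literature, which pushes scores of labeled anomalies a margin away from a Gaussian reference; a minimal sketch with hypothetical tensor shapes:

    import torch

    def deviation_loss(scores, labels, n_ref=5000, margin=5.0):
        """Deviation loss in the spirit of GDN (illustrative sketch).

        scores: anomaly scores from a GNN head; labels: 1 for labeled
        anomalies, 0 otherwise. Scores are standardized against a Gaussian
        reference; anomalies are pushed at least `margin` deviations away.
        """
        ref = torch.randn(n_ref)                    # reference scores ~ N(0, 1)
        dev = (scores - ref.mean()) / ref.std()     # standardized deviation
        loss = (1 - labels) * dev.abs() + labels * torch.clamp(margin - dev, min=0)
        return loss.mean()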
Title: Field-aware Embedding Space Searching in Recommender Systems |
Authors: Xiangyu Zhao (Michigan State University), Haochen Liu (Michigan State University), Hui Liu (Michigan State University), Jiliang Tang (Michigan State University), Weiwei Guo (LinkedIn), Jun Shi (LinkedIn), Sida Wang (LinkedIn Corporation), Huiji Gao (Arizona State University) and Bo Long (LinkedIn Corporation). |
Practical large-scale recommender systems usually contain thousands of feature fields from users, items, contextual information, and their interactions. Most of them empirically allocate a unified dimension to all feature fields, which is memory inefficient. It is therefore highly desirable to assign different embedding dimensions to different feature fields according to their importance and predictability. Due to the large number of feature fields and the nuanced relationship between embedding dimensions, feature distributions and neural network architectures, manually allocating embedding dimensions in practical recommender systems can be very difficult. To this end, we propose an AutoML-based framework (AutoDim) in this paper, which can automatically select dimensions for different feature fields in a data-driven fashion. Specifically, we first propose an end-to-end differentiable framework that can calculate the weights over various dimensions for feature fields in a soft and continuous manner with an AutoML-based optimization algorithm; we then derive a hard and discrete embedding component architecture according to the maximal weights and retrain the whole recommender framework. We conduct extensive experiments on benchmark datasets to validate the effectiveness of the AutoDim framework. We have released the implementation code to ease reproducibility. |
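A minimal sketch of the soft, continuous dimension search for a single feature field, in the DARTS-like style the description suggests; the candidate sizes and names are illustrative, not the paper's configuration:

    import torch
    import torch.nn as nn

    class SoftDimEmbedding(nn.Module):
        """Soft embedding-dimension search for one feature field (sketch).

        Candidate embeddings of several sizes are projected to a common
        dimension and mixed with learnable softmax weights; after search,
        the candidate with the maximal weight is kept and the recommender
        is retrained with that hard choice.
        """
        def __init__(self, vocab_size, candidates=(2, 8, 32), out_dim=32):
            super().__init__()
            self.tables = nn.ModuleList(nn.Embedding(vocab_size, d) for d in candidates)
            self.projs = nn.ModuleList(nn.Linear(d, out_dim) for d in candidates)
            self.alpha = nn.Parameter(torch.zeros(len(candidates)))  # architecture weights

        def forward(self, ids):
            w = torch.softmax(self.alpha, dim=0)
            return sum(w[i] * self.projs[i](self.tables[i](ids))
                       for i in range(len(self.tables)))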
Title: Fine-grained Urban Flow Prediction |
Authors: Yuxuan Liang (National University of Singapore), Kun Ouyang (National University of Singapore), Junkai Sun (JD Digits), Yiwei Wang (National University of Singapore), Junbo Zhang (JD Digits), Yu Zheng (JD Digits), David Rosenblum (George Mason University) and Roger Zimmermann (National University of Singapore). |
Urban flow prediction benefits smart cities in many aspects, such as traffic management and risk assessment. However, a critical prerequisite for these benefits is having fine-grained knowledge of the city. Thus, unlike previous works that are limited to coarse-grained data, we extend the horizon of urban flow prediction to fine granularity, which raises specific challenges: 1) the predominance of inter-grid transitions observed in fine-grained data makes it more complicated to capture the spatial dependencies among grid cells at a global scale; 2) it is very challenging to learn the impact of external factors (e.g., weather) on a large number of grid cells separately. To address these two challenges, we present a Spatio-Temporal Relation Network (STRN) to predict fine-grained urban flows. First, a backbone network is used to learn high-level representations for each cell. Second, we present a Global Relation Module (GloNet) that captures global spatial dependencies much more efficiently than existing methods. Third, we design a Meta Learner that takes external factors and land functions (e.g., POI density) as inputs to produce meta-knowledge and boost model performance. We conduct extensive experiments on two real-world datasets. The results show that STRN reduces the errors by 7.1% to 11.5% compared to the state-of-the-art method while using far fewer parameters. |
Title: FINN: Feedback Interactive Neural Network for Intent Recommendation |
Authors: Yatao Yang (Alibaba Group), Biyu Ma (Alibaba), Jun Tan (Alibaba Group), Hongbo Deng (Alibaba Group), Haikuan Huang (Alibaba Group) and Zibin Zheng (Sun Yat-sen University). |
Intent recommendation, a new type of recommendation service, recommends a predicted query to a user in the search box when the user lands on the homepage of an application without any input. Such a service has been widely used in e-commerce applications such as Taobao and Amazon. The most difficult part is to accurately predict the user's search intent, so as to improve the search experience and reduce tedious typing, especially on mobile phones. Existing methods mainly rely on a user's historical search behaviors to estimate the current intent, but they do not make full use of the feedback information between the user and the intent recommendation system. Essentially, feedback information is the key to capturing the dynamics of user search intents in real time. We therefore propose a feedback interactive neural network (FINN) to estimate a user's potential search intent more accurately by making full use of feedback interaction, with the following three parts: 1) Both positive feedback (PF) and negative feedback (NF) information are collected simultaneously. PF includes search intent information the user is interested in, such as the query used and the title clicked; NF indicates search intent information the user is not interested in, such as a query recommended by the system but not clicked by the user. 2) A filter-attention (FAT) structure is proposed to filter out noisy feedback and obtain more accurate positive and negative user intentions. 3) A multi-task learning scheme is designed to match the correlation between the user's search intent and query candidates, which can learn and recommend query candidates from the interests and disinterests associated with each user. Finally, extensive experiments comparing against state-of-the-art methods show that FINN achieves the best performance on the Taobao mobile application dataset. In addition, online experimental results show that our method improves the CTR by 8% and attracts 7.98% more users than the baseline. |
Title: FM^2: Field-matrixed Factorization Machines for CTR Prediction |
Authors: Yang Sun (Yahoo! Research), Junwei Pan (Yahoo! Research), Alex Zhang (Yahoo! Research) and Aaron Flores (Yahoo! Research). |
Click-through rate (CTR) prediction plays a critical role in recommender systems and online advertising. The data used in these applications are multi-field categorical data, where each feature belongs to one field. Field information has proved important, and several works take fields into account in their models. In this paper, we propose a novel way to model field information effectively and efficiently, which we call Field-matrixed Factorization Machines (FmFM, or FM^2), a direct improvement over FwFMs. We also give a new interpretation of FMs and FwFMs within the FmFM framework, and compare FFMs and FmFMs. Besides pruning cross terms, our model supports field-specific variable embedding dimensions, which act as a soft pruning. We also propose an efficient way to minimize the dimensions while keeping the model performance. The FmFM model can be further optimized by caching the intermediate vectors, after which it takes only thousands of floating-point operations (FLOPs) to make a prediction. Our experimental results show that FmFM can outperform FFMs, a higher-complexity model, and its performance is comparable to that of complex DNN models requiring millions of FLOPs or more. We open-sourced our code at https://github.com/fmfm2020/FmFM |
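The core interaction can be illustrated directly: the contribution of a feature pair is v_i^T M_{F(i),F(j)} v_j, with a learned matrix per field pair. A small numpy sketch with illustrative names:

    import numpy as np

    def fmfm_interactions(v, field_of, M):
        """Sum of field-matrixed feature interactions (illustrative sketch).

        v: (num_active_features, d) embeddings; field_of: field index per
        feature; M: (num_fields, num_fields, d, d) field-pair matrices.
        FwFM is recovered when each M[f, g] is a scalar times the identity,
        and FM when every M[f, g] is the identity.
        """
        total, n = 0.0, len(v)
        for i in range(n):
            for j in range(i + 1, n):
                fi, fj = field_of[i], field_of[j]
                total += v[i] @ M[fi, fj] @ v[j]    # v_i^T M_{F(i),F(j)} v_j
        return total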
Title: From personal data to digital legacy: Exploring conflicts in the sharing, security and privacy of post-mortem data |
Authors: Jack Holt (Newcastle University), James Nicholson (Northumbria University) and Jan Smeddinck (Newcastle University). |
As digital technologies become more prevalent there is a growing awareness of the importance of good security and privacy practices. The tools and techniques used to achieve this are typically designed with the living user in mind, with little consideration of how they should or will perform after the user has died. We report on two workshops carried out with users of password managers to explore their views on the post-mortem sharing, security and privacy of a range of common digital assets. We discuss a post-mortem privacy paradox where users recognise value in planning for their digital legacy, yet avoid actively doing so. Importantly, our findings highlight a tension between the use of recommended security tools during life and facilitating appropriate post-mortem access to chosen assets. We offer design recommendations to assist and encourage digital legacy planning while promoting good security habits during life. |
Title: Future-Aware Diverse Trends Framework for Recommendation |
Authors: Yujie Lu (Tencent), Shengyu Zhang (Zhejiang University), Yingxuan Huang (Tencent), Xinyao Yu (Zhejiang University), Luyao Wang (Zhejiang University), Zhou Zhao (Zhejiang University) and Fei Wu (Zhejiang University). |
In recommender systems, modeling user-item behaviors is essential for user representation learning. Existing sequential recommenders consider the sequential correlations between historically interacted items to capture users' historical preferences. However, since users' preferences are by nature time-evolving and diversified, solely modeling the historical preference (without being aware of the time-evolving trends of preferences) can be inferior for recommending complementary or fresh items and thus hurt the effectiveness of recommender systems. In this paper, we bridge the gap between past preferences and potential future preferences by proposing the future-aware diverse trends (FAT) framework. By future-aware, for each inspected user, we construct future sequences from other similar users, which comprise behaviors that happen after the last behavior of the inspected user, based on a proposed neighbor behavior extractor. By diverse trends, supposing that future preferences can be diversified, we propose a diverse trends extractor and a time-interval aware mechanism to represent the possible trends of preferences for a given user with multiple vectors. We leverage both the representations of historical preference and possible future trends to obtain the final recommendation. Quantitative and qualitative results from extensive experiments on real-world datasets demonstrate that the proposed framework not only outperforms state-of-the-art sequential recommendation methods across various metrics, but also makes complementary and fresh recommendations. |
Title: GalaXC: Graph neurAl networks with Labelwise Attention for eXtreme Classification |
Authors: Deepak Saini (Microsoft Research), Arnav Kumar Jain (Microsoft), Kushal Dave (Microsoft), Jian Jiao (Microsoft), Amit Singh (Microsoft), Ruofei Zhang (Microsoft) and Manik Varma (Microsoft Research). |
This paper develops the GalaXC algorithm for Extreme Classification (XC), where the task is to annotate a document with the most relevant subset of labels from an extremely large label set. XC has been successfully applied to many real-world web-scale applications such as web search, product recommendation, and query rewriting. Leading XC algorithms generally treat documents and labels as two disjoint sets, even though the two sets may come from the same space. Further, many XC algorithms that can scale to millions of labels do not utilize label meta-data. To this end, GalaXC learns document representations by building a joint graph over documents and labels, and employs an efficient label attention mechanism to learn label classifiers. This allows the proposed algorithm to be up to 23% more accurate than leading deep extreme classifiers, while being up to 30-120x faster to train and 10x faster to predict on benchmark datasets. A joint graph over documents and labels also allows GalaXC to naturally incorporate auxiliary sources of information. GalaXC is particularly well suited for warm-start scenarios where predictions need to be made on training points with partially revealed labels, where it was found to be up to 13% more accurate than XC algorithms specifically developed for that setting. An efficient implementation of GalaXC allowed it to be trained on a dataset with 50M labels and 97M training documents in less than 100 hours on 4xV100 GPUs. In A/B tests conducted on the Bing search engine, GalaXC improved Click Yield (CY) and coverage by 1.52% and 1.11% respectively. Code for GalaXC will be made publicly available. |
Title: Generalizing Discriminative Retrieval Models using Generative Tasks |
Authors: Binsheng Liu (RMIT University), Hamed Zamani (University of Massachusetts Amherst), Xiaolu Lu (Microsoft) and J. Shane Culpepper (RMIT University). |
Information Retrieval has a long history of applying either discriminative or generative modeling to retrieval and ranking tasks. Recent developments in transformer architectures and multi-task learning techniques have dramatically improved our ability to train effective neural models capable of resolving a wide variety of tasks using either of these paradigms. In this paper, we propose a novel multi-task learning approach which can be used to produce more effective neural ranking models. The key idea is to improve the quality of the underlying transformer model by cross-training a retrieval task and one or more complementary language generation tasks. By targeting the training on the encoding layer in the transformer architecture, our experimental results show that the proposed multi-task learning approach consistently improves retrieval effectiveness on the targeted collection and can easily be retargeted to new ranking tasks. We provide an in-depth analysis showing how multi-task learning modifies model behaviors, resulting in more general models. |
Title: Generating Accurate Caption Units For Figure Captioning |
Authors: Xin Qian (University of Maryland, College Park), Eunyee Koh (Adobe Research), Fan Du (Adobe Research), Sungchul Kim (Adobe Research), Joel Chan (University of Maryland, College Park), Ryan Rossi (Adobe Research), Sana Malik (Adobe Research) and Tak Yeon Lee (Adobe Research). |
Scientific-style figures are commonly used on the web to present numerical information. Captions that convey accurate figure information and sound natural would significantly improve figure accessibility. In this paper, we present promising results on machine figure captioning. A recent corpus analysis of real-world captions reveals that machine figure captioning systems should start by generating accurate caption units. We formulate the caption unit generation problem as a controlled captioning problem: given a caption unit type as a control signal, a model generates an accurate and natural caption unit of that type. As a proof-of-concept, we propose a new deep learning model, FigJAM, that, given a caption unit control signal, utilizes metadata information and a joint static and dynamic dictionary to generate an accurate and natural caption unit of that type. We conduct quantitative evaluations with two datasets from the related task of figure question answering, and show that our model can generate more accurate caption units than competitive baseline models. Finally, a user study with 10 human experts confirms the value of machine-generated caption units in terms of their standalone accuracy and naturalness. |
Title: GNEM: A Generic One-to-Set Neural Entity Matching Framework |
Authors: Runjin Chen (Shanghai Jiao Tong University), Yanyan Shen (Shanghai Jiao Tong University) and Dongxiang Zhang (Zhejiang University). |
Entity matching is a classic research problem in data analytics pipelines, aiming to identify records that refer to the same real-world entity. It plays an important role in data cleansing and integration. Advanced entity matching techniques focus on extracting syntactic or semantic features from record pairs via complex neural architectures or pre-trained language models. However, their performance suffers from noisy or missing attribute values in the records. We observe that comparing one record with several relevant records in a collective manner allows each pairwise matching decision to be made by borrowing valuable insights from other pairs, which benefits overall matching performance. In this paper, we propose a generic one-to-set neural framework named GNEM for entity matching. GNEM predicts matching labels between one record and a set of relevant records simultaneously. It constructs a record pair graph with weighted edges and adopts a graph neural network to propagate information among pairs. We further show that GNEM can be interpreted as an extension and generalization of existing pairwise matching techniques. Extensive experiments on real-world data sets demonstrate that GNEM consistently outperforms existing pairwise entity matching techniques and achieves up to 8.4% improvement in F1-score over state-of-the-art neural methods. |
Title: Graph Contrastive Learning with Adaptive Augmentation |
Authors: Yanqiao Zhu (Institute of Automation, Chinese Academy of Sciences), Yichen Xu (Beijing University of Posts and Telecommunications), Feng Yu (Alibaba), Qiang Liu (RealAI and Tsinghua University), Shu Wu (Institute of Automation, Chinese Academy of Sciences) and Liang Wang (Institute of Automation, Chinese Academy of Sciences). |
Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and then maximize the agreement of representations across the two views. Despite the prosperous development of graph CL methods, the design of graph augmentation schemes—a crucial component in CL—remains rarely explored. We argue that data augmentation schemes should preserve the intrinsic structural and attribute information of graphs, which forces the model to learn representations that are insensitive to perturbations on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for the topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant features, forcing the model to recognize the underlying semantic information. We perform extensive node classification experiments on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art methods and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation. |
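As one concrete instance of the topology-level scheme, the sketch below drops edges with probability inversely tied to a degree-based centrality score; the choice of degree centrality and the constants are assumptions, and the paper also covers other centrality measures and an analogous feature-masking scheme:

    import numpy as np

    def adaptive_edge_drop(edges, degrees, p_e=0.3, p_max=0.7):
        """Centrality-guided edge dropping (sketch, degree centrality assumed).

        Edge importance is the mean log-degree of its endpoints; less
        important edges are dropped with higher probability, capped at
        p_max so no edge is removed with certainty.
        """
        cent = np.array([(np.log(degrees[u] + 1) + np.log(degrees[v] + 1)) / 2
                         for u, v in edges])
        # normalize so that high values mean "unimportant"
        s = (cent.max() - cent) / (cent.max() - cent.mean() + 1e-9)
        p_drop = np.minimum(p_e * s, p_max)
        keep = np.random.rand(len(edges)) > p_drop
        return [e for e, k in zip(edges, keep) if k]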
Title: Graph Embedding for Recommendation against Attribute Inference Attacks |
Authors: Shijie Zhang (The University of Queensland), Hongzhi Yin (The University of Queensland), Tong Chen (The University of Queensland), Zi Huang (The University of Queensland), Lizhen Cui (Shandong University) and Xiangliang Zhang (King Abdullah University of Science & Technology). |
In recent years, recommender systems have played a pivotal role in helping users identify the most suitable items that satisfy their personal preferences. As user-item interactions can be naturally modelled as graph-structured data, variants of graph convolutional networks (GCNs) have become a well-established building block in the latest recommenders. Due to the wide utilization of sensitive user profile data, existing recommendation paradigms are likely to expose users to the threat of privacy breaches, and GCN-based recommenders are no exception. Apart from the leakage of raw user data, the fragility of current recommenders under inference attacks offers malicious attackers a backdoor to estimate users' private attributes via their behavioral footprints and the recommendation results. However, little attention has been paid to developing recommender systems that can defend against such attribute inference attacks, and existing works achieve attack resistance by either sacrificing considerable recommendation accuracy or only covering specific attack models or protected information. In this paper, we propose GERAI, a novel differentially private graph convolutional network, to address these limitations. Specifically, in GERAI we bind the information perturbation mechanism of differential privacy with the recommendation capability of graph convolutional networks. Furthermore, based on local differential privacy and the functional mechanism, we innovatively devise a dual-stage encryption paradigm to simultaneously enforce privacy guarantees on users' sensitive features and the model optimization process. Extensive experiments show the superiority of GERAI in terms of its resistance to attribute inference attacks and its recommendation effectiveness. |
Title: Graph Neural Networks for Friend Ranking in Large-scale Social Platforms |
Authors: Aravind Sankar (University of Illinois, Urbana-Champaign), Yozen Liu (Snap Inc.), Jun Yu (Snap Inc.) and Neil Shah (Snap Inc.). |
Graph Neural Networks (GNNs) have recently enabled substantial advances in graph learning. Despite their rich representational capacity, GNNs remain relatively under-explored for large-scale social modeling applications. One such industrially ubiquitous application in online social platforms is friend suggestion: platforms recommend their users other candidate users to befriend, to improve user connectivity, retention and engagement. However, modeling such user-user interactions on large-scale social platforms poses unique challenges: such graphs often have heavy-tailed degree distributions, where a significant fraction of users are inactive and have limited structural and engagement information. Moreover, users interact with different functionalities, communicate with diverse groups, and have multifaceted interaction patterns. We study the application of GNNs for friend suggestion, providing the first investigation of GNN design for this task, to our knowledge. To leverage the rich knowledge of heterogeneous in-platform user actions, we formulate friend suggestion as multi-faceted friend ranking with multi-modal user features and link communication features. We present a neural architecture, GraFRank, carefully designed to learn expressive user representations from multiple user feature modalities and user-user interactions. Specifically, GraFRank handles heterogeneity in modality homophily via modality-specific neighbor aggregators, and learns non-linear modality correlations through cross-modality attention. We conduct experiments on two multi-million-user social network datasets from Snapchat, a leading and widely popular mobile social platform, where GraFRank outperforms several state-of-the-art approaches on candidate retrieval (by 30% MRR) and ranking (by 20% MRR) tasks. Moreover, our qualitative analysis suggests notable gains for critical populations of less-active and low-degree users. |
Title: Graph Structure Estimation Neural Networks |
Authors: Ruijia Wang (Tencent TEG; Beijing University of Posts and Telecommunications), Shuai Mou (Tencent TEG), Xiao Wang (Beijing University of Posts and Telecommunications), Wanpeng Xiao (Tencent TEG), Qi Ju (Tencent TEG), Chuan Shi (Beijing University of Posts and Telecommunications) and Xing Xie (Microsoft Research). |
Graph Neural Networks (GNNs) have drawn considerable attention in recent years and achieved outstanding performance in many tasks. Most empirical studies of GNNs assume that the observed graph represents a complete and accurate picture of node relationships. However, this fundamental assumption cannot always be satisfied, since real-world graphs from complex systems are error-prone and may not be compatible with the properties of GNNs. Therefore, GNNs relying solely on the original graph may produce unsatisfactory results; one typical example is that GNNs perform well on graphs with homophily but fail in disassortative situations. In this paper, we propose Graph structure Estimation neural Networks (GEN), which estimate graph structure for GNNs. Specifically, GEN presents a structure model that fits the mechanism of GNNs by generating graphs with community structure, and an observation model that injects multifaceted observations into calculating the posterior distribution of graphs, and is the first to incorporate multi-order neighborhood information. With these two models, the graph estimation is implemented via Bayesian inference to maximize the posterior probability, which attains mutual optimization with the GNN parameters in an iterative framework. To comprehensively evaluate the performance of GEN, we perform a set of experiments on several benchmark datasets with different levels of homophily and a synthetic dataset, where the experimental results demonstrate the effectiveness of GEN and the rationality of the estimated graph. |
Title: Graph Topic Neural Network for Document Representation |
Authors: Qianqian Xie (School of Computer, Wuhan University), Jimin Huang (Wuhan University), Pan Du (University of Montreal), Min Peng (Wuhan University) and Jian-Yun Nie (University of Montreal). |
Graph Neural Networks (GNNs) such as GCNs can effectively learn document representations via the semantic relation graph among documents and words. However, most previous work in this line of research (with only a few exceptions) does not consider the underlying topical semantics inherent in document contents and the relation graph, making the representations less effective and hard to interpret. In the few recent studies that try to incorporate latent topics into GNNs, the topics have been learned independently of the relation graph modeling. Intuitively, topic extraction can benefit greatly from the information propagation of the relation graph structure - connected and indirectly connected documents and words have similar topics. In this paper, we propose a novel Graph Topic Neural Network (GTNN) model to mine latent topic semantics for interpretable document representation learning, taking into account the document-document, document-word and word-word relationships in the graph. We also show that our model can be viewed as semi-amortized inference for a relational topic model based on the Poisson distribution, with high-order correlations. We test our model in several settings: unsupervised, semi-supervised and supervised representation learning, for both connected and unconnected documents. In all cases, our model outperforms the state-of-the-art models for these tasks. |
Title: Graph-based Hierarchical Relevance Matching Signals for Ad-hoc Retrieval |
Authors: Xueli Yu (CASIA), Weizhi Xu (CASIA), Zeyu Cui (CASIA), Shu Wu (CASIA) and Liang Wang (CASIA). |
The ad-hoc retrieval task is to rank related documents given a query and a document collection. A series of deep learning based approaches have been proposed to solve this problem and have attracted considerable attention. However, we argue that they are inherently based on local word sequences, ignoring subtle long-distance, document-level word relationships. To solve this problem, we explicitly model document-level word relationships through a graph structure, capturing the subtle information via graph neural networks. In addition, due to the complexity and scale of document collections, it is worthwhile to explore hierarchical matching signals of different granularity at a more general level. Therefore, we propose a Graph-based Hierarchical Relevance Matching model (GHRM) for ad-hoc retrieval, with which we can capture subtle and general hierarchical matching signals simultaneously. We validate the effectiveness of GHRM on two representative ad-hoc retrieval benchmarks; comprehensive experiments demonstrate its superiority over state-of-the-art methods. |
Title: GuideBoot: Guided Bootstrap for Deep Contextual Bandits in Online Advertising |
Authors: Feiyang Pan (Institute of Computing Technology, Chinese Academy of Sciences), Haoming Li (Institute of Computing Technology, Chinese Academy of Sciences), Xiang Ao (Institute of Computing Technology, Chinese Academy of Sciences), Wei Wang (Tencent Advertising and Marketing Service), Yanrong Kang (Tencent Advertising and Marketing Service), Ao Tan (Tencent Advertising and Marketing Service) and Qing He (Institute of Computing Technology, Chinese Academy of Sciences). |
The exploration/exploitation (E&E) dilemma lies at the core of interactive systems such as online advertising, for which contextual bandit algorithms have been proposed. Bayesian approaches provide guided exploration with principled uncertainty estimation, but their applicability is often limited by over-simplified assumptions. Non-Bayesian bootstrap methods, on the other hand, can apply to complex problems by using deep reward models, but lack clear guidance for the exploration behavior. Developing a practical method for complex deep contextual bandits thus remains largely unsolved. In this paper, we introduce Guided Bootstrap (GuideBoot for short), combining the best of both worlds. GuideBoot provides explicit guidance to the exploration behavior by training multiple models over both real samples and noisy samples with fake labels, where the noise is added according to the predictive uncertainty. The proposed method is efficient, as it can make decisions on the fly by utilizing only one randomly chosen model, but is also effective, as we show that it can be viewed as a non-Bayesian approximation of Thompson sampling. Moreover, we extend it to an online version that can learn solely from streaming data, which is favored in real applications. Extensive experiments on both synthetic tasks and large-scale advertising environments show that GuideBoot achieves significant improvements over previous state-of-the-art methods. |
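A rough sketch of the two ingredients, with hypothetical model objects: decisions use one randomly chosen bootstrap model, and training occasionally injects a fake-label sample at a rate tied to ensemble disagreement as a crude proxy for predictive uncertainty (the paper's exact noise scheme may differ):

    import random
    import numpy as np

    def choose_ad(models, candidates):
        """GuideBoot-style decision: one randomly chosen model decides."""
        m = random.choice(models)                   # Thompson-like randomization
        return int(np.argmax([m.predict(x) for x in candidates]))

    def training_batch(x, y, models, noise_scale=1.0):
        """Mix in a fake-label sample at a rate tied to uncertainty."""
        preds = np.array([m.predict(x) for m in models])
        uncertainty = preds.std()                   # disagreement across bootstraps
        batch = [(x, y)]
        if np.random.rand() < min(1.0, noise_scale * uncertainty):
            batch.append((x, 1 - y))                # fake label guides exploration
        return batch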
Title: Hashing-Accelerated Graph Neural Networks for Link Prediction |
Authors: Wei Wu (Leibniz University Hannover), Bin Li (Fudan University), Chuan Luo (Microsoft Research Asia) and Wolfgang Nejdl (Leibniz University Hannover). |
Networks are ubiquitous in the real world. Link prediction, one of the key problems for network-structured data, aims to predict whether there exists a link between two nodes. Traditional approaches are based on explicit similarity computation between compact node representations obtained by embedding each node into a low-dimensional space. To efficiently handle the intensive similarity computation in link prediction, the hashing technique has been successfully used to produce node representations in Hamming space. However, hashing-based link prediction algorithms suffer either accuracy loss from randomized hashing techniques or inefficiency from learning-to-hash techniques in the embedding process. Meanwhile, the Graph Neural Network (GNN) framework has been widely applied to graph-related tasks in an end-to-end manner, but it commonly requires substantial computational resources and memory due to massive parameter learning, which makes GNN-based algorithms impractical without a powerful workhorse. In this paper, we propose a simple and effective model called #GNN, which balances the trade-off between accuracy and efficiency. #GNN efficiently acquires node representations in Hamming space for link prediction by exploiting randomized hashing to implement message passing and capture high-order proximity in the GNN framework. Furthermore, we characterize the discriminative power of #GNN in probability. Extensive experimental results demonstrate that the proposed #GNN algorithm achieves accuracy comparable to learning-based algorithms and outperforms the randomized algorithm, while running significantly faster than the learning-based algorithms. Furthermore, the proposed algorithm shows excellent scalability on a large-scale network with limited resources. |
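The flavor of hashing-based message passing can be sketched as iterated min-hashing: signatures start from hashed node ids and each layer takes element-wise minima with neighbors, so matching entries approximate neighborhood overlap. This is a simplified stand-in for #GNN's actual scheme:

    import numpy as np

    def hash_message_passing(adj_list, n_nodes, k=64, n_layers=2, seed=0):
        """Iterated min-hash message passing (simplified sketch).

        Each node starts with k hashed values of its own id; every layer
        takes element-wise minima over the node and its neighbors, so the
        fraction of matching signature entries between two nodes estimates
        the overlap of their (multi-hop) neighborhoods.
        """
        rng = np.random.default_rng(seed)
        p = 2**31 - 1
        a = rng.integers(1, p, size=k)
        b = rng.integers(0, p, size=k)
        sig = (a * np.arange(n_nodes)[:, None] + b) % p   # k hash values per node
        for _ in range(n_layers):
            new = sig.copy()
            for v, nbrs in enumerate(adj_list):
                for u in nbrs:
                    new[v] = np.minimum(new[v], sig[u])
            sig = new
        return sig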
Title: Have You been Properly Notified? Automatic Compliance Analysis of Privacy Policy Text with GDPR Article 13 |
Authors: Shuang Liu (College of Intelligence and Computing, Tianjin University), Baiyang Zhao (College of Intelligence and Computing, Tianjin University), Renjie Guo (College of Intelligence and Computing, Tianjin University), Guozhu Meng (Institute of Information Engineering, Chinese Academy of Sciences), Fan Zhang (Tianjin University) and Meishan Zhang (Tianjin University). |
With the rapid development of web and mobile applications, as well as their wide adoption in different domains, more and more personal data is provided, consciously or unconsciously, to different application providers. A privacy policy is an important medium for users to understand what personal information has been collected and used. As data privacy protection becomes a critical social issue, laws and regulations are being enacted in different countries and regions, the most representative of which is the EU General Data Protection Regulation (GDPR). It is thus important to detect compliance issues between regulations, e.g., GDPR, and privacy policies, and to provide intuitive results for data subjects (i.e., users), data collection parties (i.e., service providers) and the regulatory authorities. In this work, we aim to solve the problem of compliance analysis between GDPR (Article 13) and privacy policies. We formulate the task as a combination of a sentence classification step and a rule-based analysis step. We manually curate a corpus of 36,610 labeled sentences from 304 privacy policies, and benchmark our corpus with several standard sentence classifiers. We also conduct a rule-based analysis to detect compliance issues and a user study to evaluate the usability of our approach. The web-based tool AutoCompliance is publicly accessible. |
Title: HDMI: High-order Deep Multiplex Infomax |
Authors: Baoyu Jing (University of Illinois at Urbana-Champaign), Chanyoung Park (Korea Advanced Institute of Science & Technology) and Hanghang Tong (University of Illinois at Urbana-Champaign). |
Networks have been widely used to represent relations between objects, as in academic networks and social networks, and learning embeddings for networks has thus garnered plenty of research attention. Self-supervised network representation learning aims at extracting node embeddings without external supervision. Recently, maximizing the mutual information between the local node embedding and the global summary (e.g. Deep Graph Infomax, or DGI for short) has shown promising results on many downstream tasks such as node classification. However, DGI has two major limitations. Firstly, it merely considers the extrinsic supervision signal (i.e., the mutual information between node embedding and global summary) while ignoring the intrinsic signal (i.e., the mutual dependence between node embedding and node attributes). Secondly, nodes in a real-world network are usually connected by multiple edges with different relations, which DGI does not fully explore. To address these problems, we propose a novel framework, called High-order Deep Multiplex Infomax (HDMI), for learning node embeddings on multiplex networks in a self-supervised way. More specifically, we first design a joint supervision signal containing both extrinsic and intrinsic mutual information by means of high-order mutual information, and we propose High-order Deep Infomax (HDI) to optimize the proposed signal. We then propose an attention-based fusion module to combine node embeddings from different layers of the multiplex network. Finally, we evaluate HDMI on various downstream tasks such as unsupervised clustering and supervised classification. The experimental results show that HDMI achieves state-of-the-art performance on these tasks. |
Title: Heterogeneous Graph Neural Network via Attribute Completion |
Authors: Di Jin (College of Intelligence and Computing, Tianjin University), Cuiying Huo (College of Intelligence and Computing, Tianjin University), Chundong Liang (College of Intelligence and Computing, Tianjin University) and Liang Yang (School of Artificial Intelligence, Hebei University of Technology). |
Heterogeneous information networks (HINs), also called heterogeneous graphs, are composed of multiple types of nodes and edges, and contain comprehensive information and rich semantics. Graph neural networks (GNNs), as powerful tools for graph data, have shown superior performance on network analysis. Recently, many excellent models have been proposed to process hetero-graph data using GNNs and have achieved great success. These GNN-based heterogeneous models can be interpreted as smoothing node attributes guided by graph structure, which requires all nodes to have attributes. However, this is not easy to satisfy, because some types of nodes often have no attributes in heterogeneous graphs. Previous studies use handcrafted methods to solve this problem, which separate the attribute completion from the graph learning process and result in poor performance. In this paper, we hold that missing attributes can be acquired in a learnable manner, so we propose a general framework for Heterogeneous Graph Neural Networks via Attribute Completion (HGNN-AC), including pre-learning of topological embeddings and attribute completion with an attention mechanism. HGNN-AC first uses existing HIN-embedding methods to obtain node topological embeddings. It then uses the topological relationships between nodes as guidance to complete attributes for attribute-less nodes by weighted aggregation of the attributes of attributed nodes. Our completion mechanism can easily be combined with an arbitrary GNN-based heterogeneous model, making the whole process end-to-end. We conduct extensive experiments on three real-world heterogeneous graphs. The results demonstrate the superiority of the proposed framework over state-of-the-art baselines. |
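The attribute completion step can be sketched as attention over topological embeddings: an attribute-less node receives a softmax-weighted average of attributed nodes' features. The paper restricts attention to neighbors and trains end-to-end; this standalone numpy version is only illustrative:

    import numpy as np

    def complete_attributes(topo_emb, attrs, has_attr):
        """Topology-guided attribute completion (illustrative sketch).

        topo_emb: (n, d) embeddings from any HIN-embedding method;
        attrs: (n, f) attribute matrix; has_attr: boolean mask. Each
        attribute-less node gets a softmax-weighted average of attributed
        nodes' features, weighted by topological affinity.
        """
        src = np.where(has_attr)[0]
        out = attrs.copy()
        for v in np.where(~has_attr)[0]:
            logits = topo_emb[src] @ topo_emb[v]    # topological affinity scores
            w = np.exp(logits - logits.max())
            w /= w.sum()                            # softmax attention weights
            out[v] = w @ attrs[src]                 # weighted aggregation
        return out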
Title: HGCF: Hyperbolic Graph Convolution Networks for Collaborative Filtering |
Authors: Jianing Sun (Layer 6 AI), Zhaoyue Cheng (Layer 6 AI), Saba Zuberi (Layer 6 AI), Felipe Perez (Layer 6 AI) and Maksims Volkovs (Layer 6 AI). |
Hyperbolic spaces offer a rich setup to learn embeddings with superior properties that have been leveraged in areas such as computer vision, natural language processing and computational biology. Recently, several hyperbolic approaches have been proposed to learn high-quality representations for users and items in the recommendation setting. However, these approaches don't capture the higher order relations that typically exist in the implicit feedback domain. Graph convolutional neural networks (GCNs) on the other hand excel at capturing such information by applying multiple levels of aggregation to local representations. In this paper we combine these two approaches in a novel way: by applying the graph convolutions in the tangent space of a reference point. We show that this hyperbolic GCN architecture can be effectively learned with a margin ranking loss, and test-time retrieval is done using the hyperbolic distance. We conduct extensive empirical analysis on three public benchmarks and compare against a large set of baselines. Our approach achieves highly competitive performance with a significant improvement over state-of-the-art methods. We further study the properties of the embeddings obtained by our hyperbolic model and show that they offer meaningful insights into the data. Full code for this work will be released at the time of publication. |
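The central trick, convolving in the tangent space, can be sketched with the standard exponential and logarithmic maps at the origin of the Poincaré ball; the paper works with the tangent space of a reference point and its model details may differ, so treat this as a simplified illustration:

    import numpy as np

    def expmap0(v, c=1.0):
        """Exponential map at the origin of the Poincare ball (curvature -c)."""
        n = np.linalg.norm(v, axis=-1, keepdims=True).clip(1e-9)
        return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

    def logmap0(x, c=1.0):
        """Logarithmic map at the origin of the Poincare ball."""
        n = np.linalg.norm(x, axis=-1, keepdims=True).clip(1e-9, 1 - 1e-9)
        return np.arctanh(np.sqrt(c) * n) * x / (np.sqrt(c) * n)

    def hyperbolic_gcn_layer(A_norm, X_hyp, c=1.0):
        """One aggregation step: convolve in the tangent space, map back."""
        return expmap0(A_norm @ logmap0(X_hyp, c), c)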
Title: Hierarchical Personalized Federated Learning for User Modeling |
Authors: Jinze Wu (University of Science and Technology of China), Qi Liu (University of Science and Technology of China), Zhenya Huang (University of Science and Technology of China), Yuting Ning (University of Science and Technology of China), Hao Wang (University of Science and Technology of China), Enhong Chen (University of Science and Technology of China), Jinfeng Yi (JD AI Research) and Bowen Zhou (JD AI Research). |
User modeling aims to capture the latent characteristics of users from their behaviors, and is widely applied in numerous applications. Centralized user modeling usually suffers from the risk of privacy leakage. Instead, federated user modeling aims to provide secure multi-client collaboration for user modeling through federated learning. Existing federated learning methods are mainly designed for consistent clients and cannot be directly applied to practical scenarios, where different clients usually store inconsistent user data. It is therefore crucial to design an appropriate federated solution that better adapts to user modeling tasks, which, however, must meet the following critical challenges: 1) Statistical heterogeneity. The distributions of user data in different clients are not always independent and identically distributed, which leads to personalized clients; 2) Privacy heterogeneity. User data contains both public and private information, which have different levels of privacy, meaning that we should balance which information is shared and which is protected; 3) Model heterogeneity. The local user models trained on client records are heterogeneous and hence need flexible aggregation on the server. In this paper, we propose a novel client-server architecture framework, namely Hierarchical Personalized Federated Learning (HPFL), to serve federated learning in user modeling with inconsistent clients. In this framework, we first define hierarchical information to finely partition the data according to privacy heterogeneity. On this basis, each client trains a user model containing different components designed for the hierarchical information. Moreover, the client performs a fine-grained personalized update strategy to update its personalized user model under statistical heterogeneity. Correspondingly, the server performs a differentiated component aggregation strategy to flexibly aggregate heterogeneous user models under privacy heterogeneity and model heterogeneity. Finally, we conduct extensive experiments on real-world datasets for principal user modeling tasks, which demonstrate the effectiveness of the HPFL framework. |
Title: High-Dimensional Sparse Cross-Modal Hashing with Fine-Grained Similarity Embedding |
Authors: Yongxin Wang (Shandong University), Zhen-Duo Chen (Shandong University), Xin Luo (Shandong University) and Xin-Shun Xu (Shandong University). |
Recently, with discoveries in neurobiology, high-dimensional sparse hashing has attracted increasing attention. In contrast with general hashing, which generates low-dimensional hash codes, high-dimensional sparse hashing maps inputs into a higher-dimensional space and generates sparse hash codes, achieving superior performance. However, sparse hashing has not yet been fully studied in the hashing literature: for example, how to fully explore the power of sparse coding in cross-modal retrieval tasks, and how to discretely solve the binary and sparse constraints so as to avoid the quantization error problem. Motivated by these issues, in this paper we present an efficient sparse hashing method, High-dimensional Sparse Cross-modal Hashing, HSCH for short. It not only takes the high-level semantic similarity of data into consideration, but also properly exploits low-level feature similarity. Specifically, we theoretically design a fine-grained similarity with two critical fusion rules. We then take advantage of sparse codes to embed the fine-grained similarity into the to-be-learnt hash codes. Moreover, an efficient discrete optimization algorithm is proposed to solve the binary and sparse constraints, reducing the quantization error. As a result, the model becomes easier to train and the learnt hash codes are more discriminative. More importantly, the retrieval complexity of HSCH is as efficient as that of general hash methods. Extensive experiments on three widely-used datasets demonstrate the superior performance of HSCH compared with several state-of-the-art cross-modal hashing approaches. |
Title: High-dimensional Sparse Embeddings for Collaborative Filtering |
Authors: Jan Van Balen (Antwerp University) and Bart Goethals (Antwerp University). |
A widely adopted paradigm in the design of recommender systems is to represent users and items as vectors, often referred to as latent factors or embeddings. A user's predicted affinity to an item is then represented as the inner product between their embeddings. Embeddings can be obtained using a variety of recommendation models and served in production using a variety of data engineering solutions. Embeddings also facilitate transfer learning, where trained embeddings from one model are reused in another. In contrast, some of the best-performing collaborative filtering models today are high-dimensional linear models that do not rely on factorization, and so they do not produce embeddings. They also require pruning, amounting to a trade-off between model size and the sparsity of the predicted affinities. This paper argues instead for the use of sparse latent factor models. We propose a new recommendation model based on a full-rank factorization of the inverse Gram matrix. The resulting high-dimensional embeddings can be made sparse while still factorizing a dense affinity matrix. We show how the embeddings combine the advantages of latent representations with the performance of high-dimensional linear models. |
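A loose sketch of the central construction: with a regularized Gram matrix G = X^T X + lambda*I, any full-rank factor E with E E^T = G^{-1} yields high-dimensional item embeddings whose inner products reproduce a dense affinity matrix. The regularizer and the Cholesky choice below are assumptions, and the paper's sparsification of E is omitted:

    import numpy as np

    def inverse_gram_embeddings(X, lam=100.0):
        """Item embeddings from a full-rank factorization of the inverse Gram.

        X: (num_users, num_items) interaction matrix. The Cholesky factor
        E of inv(G) satisfies E @ E.T == inv(G), giving full-rank,
        high-dimensional item embeddings.
        """
        G = X.T @ X + lam * np.eye(X.shape[1])
        E = np.linalg.cholesky(np.linalg.inv(G))
        return E

    # Predicted affinities then take the usual inner-product form:
    # user embeddings X @ E scored against item embeddings (rows of E).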
Title: Highly Liquid Temporal Interaction Graph Embeddings |
Authors: Huidi Chen (Fudan University), Yun Xiong (Fudan University), Yangyong Zhu (Fudan University) and Philip S. Yu (University of Illinois at Chicago). |
Capturing the topological and temporal information of interactions and predicting future interactions are crucial for many domains, such as social networks, financial transactions, and e-commerce. With the advent of co-evolutionary models, the mutual influence between interacting users and items is captured. However, existing models only update the interaction information of nodes along the timeline. This causes the problem of information lag, where nodes updated early often have much less information than the most recently updated nodes. We propose HILI (Highly Liquid Temporal Interaction Graph Embeddings) to predict highly liquid embeddings on temporal interaction graphs. Our embedding model makes interaction information highly liquid without information lag. Specific least-recently-used-based and frequency-based windows are used to determine the priority of the nodes that receive the latest interaction information. HILI updates node embeddings with attention layers, which learn the correlations between nodes and update node embeddings simply and quickly. In addition, HILI includes a carefully designed self-linear layer, a linear layer initialized in a novel way. The self-linear layer reduces the expected space of the predicted embedding of the next interacting node and makes the predicted embedding focus more on relevant nodes. We illustrate the geometric meaning of the self-linear layer in the paper. Furthermore, experimental results show that our model outperforms other state-of-the-art temporal interaction prediction models. |
Title: HINTS: Citation Time Series Prediction for New Publications via Dynamic Heterogeneous Information Network Embedding |
Authors: Song Jiang (University of California, Los Angeles), Bernard Koch (University of California, Los Angeles) and Yizhou Sun (University of California, Los Angeles). |
Accurate prediction of scientific impact is important for scientists, academic recommender systems, and granting organizations alike. Existing approaches rely on many years of leading citation values to predict a scientific paper's citations (a proxy for impact), even though most papers make their largest contributions in the first few years after they are published. In this paper, we tackle a new problem: predicting a newly published paper's citation time series from the date of publication (i.e., without leading values). We propose HINTS, a novel end-to-end deep learning framework that turns citation signals from dynamic heterogeneous information networks (DHIN) into citation time series. HINTS imputes pseudo-leading values for a paper in the years before it is published from DHIN embeddings, and then transforms these embeddings into the parameters of a model that can predict citation counts immediately after publication. Empirical analysis on two real-world datasets from Computer Science and Physics shows that HINTS is competitive with baseline citation prediction models. While we focus on citations, our approach generalizes to other "cold start" time series prediction tasks where relational data is available and accurate prediction in early timestamps is crucial. |
Title: How Do Hyperedges Overlap in Real-World Hypergraphs? – Patterns, Measures, and Generators |
Authors: Geon Lee (Korea Advanced Institute of Science and Technology), Minyoung Choe (Korea Advanced Institute of Science and Technology) and Kijung Shin (Korea Advanced Institute of Science and Technology). |
Hypergraphs, a generalization of graphs, naturally represent groupwise relationships among multiple individuals or objects, which are common in many application areas, including the web, bioinformatics, and social networks. The flexibility in the number of nodes in each hyperedge, which gives hypergraphs their expressiveness, also brings about structural differences between graphs and hypergraphs. In particular, the overlaps of hyperedges lead to complex high-order relations beyond pairwise relations, raising new questions that have not been considered for graphs: How do hyperedges overlap in real-world hypergraphs? Are there any pervasive characteristics? What underlying process can cause such patterns? In this work, we closely investigate thirteen real-world hypergraphs from various domains and share interesting observations about the overlaps of hyperedges. To this end, we define principled measures and statistically compare the overlaps of hyperedges in real-world hypergraphs with those in null models. Additionally, based on the observations, we propose HyperLap, a realistic hypergraph generative model. HyperLap is (a) Realistic: it accurately reproduces overlapping patterns of real-world hypergraphs, (b) Automatically Fittable: its parameters can be tuned automatically using HyperLap+ to generate hypergraphs particularly similar to a given target hypergraph, (c) Scalable: it generates and fits hypergraphs with 0.7 billion hyperedges within a few hours. |
Title: IFSpard: An Information Fusion-based Framework for Spam Review Detection |
Authors: Yao Zhu (Peking University), Hongzhi Liu (Peking University), Yingpeng Du (Peking University) and Zhonghai Wu (Peking University). |
Online reviews, which contain quality information and user experiences of products, strongly affect the consumption decisions of customers. Unfortunately, quite a number of spammers attempt to mislead consumers by writing fake reviews for various purposes. Existing methods for detecting spam reviews mainly focus on constructing discriminative features, which heavily depend on experts and may miss some complex but effective features. Recently, some models attempt to learn latent representations of reviews, users, and items, but the learned embeddings usually lack interpretability. Moreover, most existing methods are based on a single classification model, ignoring the complementarity of different classification models. To solve these problems, we propose IFSpard, a novel information fusion-based framework that aims at exploring and exploiting useful information from various aspects for spam review detection. First, we design a graph-based feature extraction method and an interaction-mining-based feature crossing method to automatically extract basic and complex features from different sources of data. Then, we propose a mutual-information-based feature selection and representation learning method to remove irrelevant and redundant information contained in the automatically constructed features. Finally, we devise an adaptive ensemble model to make use of the information of the constructed features and the abilities of different classifiers for spam review detection. Experimental results on several public datasets show that the proposed model performs better than state-of-the-art methods. |
Title: Improving Cyberbully Detection with User Interaction |
Authors: Suyu Ge (Tsinghua University), Lu Cheng (Arizona State University) and Huan Liu (Arizona State University). |
Cyberbullying, defined as intentional and repeated online bullying behavior, has become increasingly prevalent in the past few decades. Despite the significant progress made thus far, the focus of most existing work on cyberbullying detection lies in the independent content analysis of different comments within a social media session. We argue that such prevailing approaches suffer from three key limitations: they overlook the temporal correlations among different comments; they only consider the content within a single comment rather than the topic coherence across comments; and they remain generic and exploit limited interactions between social media users. In this work, we observe that user comments in the same session may be inherently related, e.g., discussing similar topics, and that their interaction may evolve over time. We also show that modeling such topic coherence and temporal interaction is critical to capturing the repetitive characteristics of bullying behavior, thus leading to better prediction performance. To achieve this goal, we first construct a unified temporal graph for each social media session. Drawing on recent advances in graph neural networks, we then propose a principled approach for modeling the temporal dynamics and topic coherence of user interactions. We empirically evaluate the effectiveness of our approach with the tasks of session-level bullying detection and a comment-level case study. |
Title: Improving Graph Neural Networks with Structural Adaptive Receptive Fields |
Authors: Xiaojun Ma (Peking University), Junshan Wang (Peking University), Hanyue Chen (Peking University) and Guojie Song (Peking University). |
The abundant information in the graph helps us to learn more expressive node representations. Different nodes in the neighborhood bring different information, and the average weight aggregation in most Graph Neural Networks fails to distinguish such differences. GAT-based models introduce an attention mechanism to solve this problem. However, the attention mechanism they introduce only considers the similarity between node features and, to some degree, ignores the rich structural information contained in the graph. In addition, each node needs a specific encoder to aggregate the distinguishing information from its neighborhood. In this paper, we propose Graph Neural Networks with Structural Adaptive Receptive fields (STAR-GNN), which generates a learnable receptive field with local structure for each node. Further, STAR-GNN preserves not only the simple tree-like graph structure of each node, but also the induced subgraph of the central node and the learned receptive field. Experimental results demonstrate the power of STAR-GNN in learning local-structural receptive fields adaptively and encoding more informative structural characteristics in graph neural networks. |
Title: Improving Neural Question Generation using Deep Linguistic Representation |
Authors: Wei Yuan (Nanjing University), Tieke He (Nanjing University) and Xinyu Dai (Nanjing University). |
Question Generation (QG) is a challenging Natural Language Processing (NLP) task which aims at generating questions with given answers and context. Many works incorporate linguistic features to improve the performance of QG. However, similar to traditional word embeddings, these works normally embed such features with a set of trainable parameters, with the result that the linguistic features are not fully exploited. In this work, inspired by recent achievements in text representation, we propose to utilize linguistic information via large pre-trained neural models. First, these models are trained on several specific NLP tasks in order to better represent linguistic features. Then, such feature representations are fused into a seq2seq-based QG model to guide question generation. Extensive experiments were conducted on two benchmark Question Generation datasets to evaluate the effectiveness of our approach. The experimental results demonstrate that our approach outperforms state-of-the-art QG systems, significantly improving the baseline by 17.2% and 6.2% under the BLEU-4 metric on the two datasets, respectively. |
Title: Improving Text Encoder via Graph Neural Network in Sponsored Search |
Authors: Jason Zhu (Stanford University), Yanling Cui (Microsoft), Yuming Liu (Microsoft), Huasha Zhao (Microsoft), Hao Sun (Microsoft), Xue Li (Microsoft), Markus Pelger (Stanford University), Liangjie Zhang (Microsoft), Tianqi Yang (Microsoft) and Ruofei Zhang (Microsoft). |
Text encoders based on C-DSSM or transformers have demonstrated strong performance in many Natural Language Processing (NLP) tasks. Low-latency variants of these models have also been developed in recent years in order to apply them in the field of sponsored search, which has strict computational constraints. However, these models are not a panacea for all Natural Language Understanding (NLU) challenges, as the purely semantic information in the data is not sufficient to fully identify user intents. We propose the TextGNN model, which naturally extends strong twin-tower structured encoders with complementary graph information from user historical behaviors, serving as a natural guide to help us better understand user intents and hence generate better language representations. The model inherits all the benefits of twin-tower models such as C-DSSM and TwinBERT, so that it can still be used in low-latency environments while achieving a significant performance gain over strong encoder-only baseline models in both offline evaluations and the online production system. In offline experiments, the model achieves a 0.14% overall increase in ROC-AUC with a 1% accuracy increase for long-tail low-frequency Ads, and in online A/B testing, the model shows a 2.03% increase in Revenue Per Mille with a 2.32% decrease in Ad defect rate. |
Title: Incentive Mechanism for Horizontal Federated Learning Based on Reputation and Reverse Auction |
Authors: Jingwen Zhang (School of Data and Computer Science, Sun Yat-sen University), Yuezhou Wu (School of Data and Computer Science, Sun Yat-sen University) and Rong Pan (School of Data and Computer Science, Sun Yat-sen University). |
Current research on federated learning has focused on federated optimization, improving efficiency and effectiveness, preserving privacy, etc., but there are relatively few studies on incentive mechanisms. Most studies overlook the fact that without profit, participants are not motivated to provide data and train the model, and that the task requester has no way to identify and select reliable participants with high-quality data. Therefore, in this paper, we propose an incentive mechanism for federated learning based on reputation and reverse auction theory. Participants bid for tasks, and reputation indirectly reflects their reliability and data quality. In this federated learning scenario, we select and reward participants by combining their reputation and bid price under limited budget conditions. Theoretical analysis proves that this mechanism satisfies properties such as truthfulness, and simulation results show its effectiveness. |
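As a rough illustration of the selection step such a mechanism performs, the sketch below greedily picks participants with high reputation per unit of bid price under a payment budget. This is a simplified heuristic for intuition only; the paper's mechanism additionally establishes properties such as truthfulness, which this sketch does not:

    def select_participants(candidates, budget):
        # Greedy reverse-auction selection: prefer high reputation per unit
        # of bid price, paying each selected participant their bid,
        # subject to a total payment budget.
        ranked = sorted(candidates, key=lambda c: c["reputation"] / c["bid"], reverse=True)
        chosen, spent = [], 0.0
        for c in ranked:
            if spent + c["bid"] <= budget:
                chosen.append(c["id"])
                spent += c["bid"]
        return chosen, spent

    candidates = [
        {"id": "A", "reputation": 0.9, "bid": 4.0},
        {"id": "B", "reputation": 0.6, "bid": 1.5},
        {"id": "C", "reputation": 0.8, "bid": 3.0},
        {"id": "D", "reputation": 0.3, "bid": 2.5},
    ]
    print(select_participants(candidates, budget=6.0))  # (['B', 'C'], 4.5)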
Title: Incremental Spatio-Temporal Graph Learning for Online Query-POI Matching |
Authors: Zixuan Yuan (Rutgers University), Hao Liu (Business Intelligence Lab, Baidu Research), Junming Liu (City University of Hong Kong), Yanchi Liu (Rutgers University), Yang Yang (Nanjing University of Science and Technology), Renjun Hu (Beihang University) and Hui Xiong (Rutgers University). |
Query and Point-of-Interest (POI) matching, aiming at recommending the most relevant POIs from partial query keywords, has become one of the most essential functions in online navigation and ride-hailing applications. Existing methods for query-POI matching, such as those used in Google Maps and Uber, naturally focus on measuring the static semantic similarity between the contextual information of queries and the geographical information of POIs. However, dynamic and personalized online query-POI matching remains challenging because query-POI relevance is non-stationary and situational context-dependent. Moreover, the large volume of online queries requires an adaptive and incremental model training strategy that is efficient and scalable in the online scenario. To this end, in this paper, we propose an Incremental Spatio-Temporal Graph Learning (IncreSTGL) framework for intelligent online query-POI matching. Specifically, we first model dynamic query-POI interactions as microscopic and macroscopic graphs. Then, we propose an incremental graph representation learning module to refine and update query-POI interaction graphs in an online incremental fashion, which includes: (i) a contextual graph attention operation quantifying query-POI correlation based on historical queries under dynamic situational context, (ii) a graph discrimination operation capturing the sequential query-POI relevance drift from a holistic view of personalized preference and social homophily, and (iii) a multi-level temporal attention operation summarizing the temporal variations of query-POI interaction graphs for subsequent query-POI matching. Finally, we introduce a lightweight semantic matching module for online query-POI similarity measurement. To demonstrate the effectiveness and efficiency of the proposed algorithm, we conduct extensive experiments on real-world data from Baidu Maps, the leading online navigation and map service provider in China. |
Title: Incrementality Testing in Programmatic Advertising: Enhanced Precision with Double-Blind Designs |
Authors: Joel Barajas (Yahoo! Research, Verizon Media) and Narayan Bhamidipati (Yahoo! Research, Verizon Media). |
Measuring the incremental value of advertising (incrementality) is critical for financial planning and budget allocation by advertisers. Running randomized controlled experiments is the gold standard in marketing incrementality measurement. Current literature and industry practices for incrementality experiments focus on running placebo, intention-to-treat (ITT), or ghost-bidding-based experiments. A fundamental challenge with these is that the serving engine, as treatment administrator, is not blind to the user treatment assignment. Similarly, ITT and ghost bidding solutions provide greatly decreased precision since many experiment users never see ads. We present a novel randomized design solution for incrementality testing based on ghost bidding with improved measurement precision. Our design provides faster and cheaper results, including double-blind (to both the users and the serving engine) post-auction experiment execution without ad targeting bias. We also identify ghost impressions in open ad exchanges by matching the bidding values or ads sent to external auctions with held-out bid values. This design leads to greater precision than ITT or current ghost bidding solutions. Our proposed design has been fully deployed in a real production system within a commercial programmatic ad network combined with a Demand Side Platform (DSP) that places ad bids in third-party ad exchanges. We have found reductions of up to 85% in the advertiser budget needed to reach statistical significance, given typical ghost-bid conversion and winner rates. By deploying this design for an advertiser in the insurance industry, to measure the incrementality of display and native programmatic advertising, we have found conclusive evidence that the last-touch attribution framework (the current industry standard) undervalues these channels by 87% when compared to the incremental conversions derived from the experiment. |
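The core measurement in any such experiment is the difference in conversion rates between treatment and control, together with its statistical significance. A minimal sketch using a standard two-proportion z-test (illustrative numbers, not the paper's data):

    from math import sqrt

    def incremental_lift(conv_t, n_t, conv_c, n_c):
        # Difference in conversion rates between treatment (ads served) and
        # control (e.g., ghost bids held out), plus a two-proportion z statistic.
        p_t, p_c = conv_t / n_t, conv_c / n_c
        pooled = (conv_t + conv_c) / (n_t + n_c)
        se = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
        return p_t - p_c, (p_t - p_c) / se

    lift, z = incremental_lift(conv_t=460, n_t=100_000, conv_c=400, n_c=100_000)
    print(f"incremental conversion rate: {lift:.4%}, z = {z:.2f}")

Improved precision, as the paper pursues, shrinks the standard error se, so the same lift reaches significance with a smaller budget.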
Title: Inductive Entity Representations from Text via Link Prediction |
Authors: Daniel Daza (Vrije Universiteit Amsterdam), Michael Cochez (Vrije Universiteit Amsterdam) and Paul Groth (University of Amsterdam). |
Knowledge Graphs (KG) are of vital importance for multiple applications on the web, including information retrieval, recommender systems, and metadata annotation. Regardless of whether they are built manually by domain experts or with automatic pipelines, KGs are often incomplete. To address this problem, a large body of work proposes using machine learning to complete these graphs by predicting new links. Recent work has begun to explore the use of textual descriptions available in knowledge graphs to learn vector representations of entities in order to perform link prediction. However, the extent to which these representations learned for link prediction generalize to other tasks is unclear. This is important given the cost of learning such representations. Ideally, we would prefer representations that do not need to be trained again when transferring to a different task, while retaining reasonable performance. Therefore, in this work, we propose a holistic evaluation protocol for entity representations learned via a link prediction objective. We consider the inductive link prediction and entity classification tasks, which involve entities not seen during training. We also consider an information retrieval task for entity-oriented search. We evaluate an architecture based on a pretrained language model that exhibits strong generalization to entities not observed during training, and outperforms related state-of-the-art methods (22% MRR improvement in link prediction on average). We further provide evidence that the learned representations transfer well to other tasks without fine-tuning. In the entity classification task we obtain an average improvement of 16% in accuracy compared with baselines that also employ pre-trained models. In the information retrieval task, we obtain significant improvements of up to 8.8% in NDCG@10 for natural language queries. We thus show that the learned representations are not limited to KG-specific tasks, and have greater generalization properties than evaluated in previous work. |
Title: Information Elicitation from Rowdy Crowds |
Authors: Grant Schoenebeck (University of Michigan), Fang-Yi Yu (Harvard University) and Yichi Zhang (University of Michigan). |
We initiate the study of information elicitation mechanisms for a crowd containing both self-interested agents, who respond to incentives, and adversarial agents, who may collude to disrupt the system. Our mechanisms work in the peer prediction setting, where ground truth need not be accessible to the mechanism or even exist. We provide a meta-mechanism that reduces the design of peer prediction mechanisms to a related robust learning problem. The resulting mechanisms are $\epsilon$-informed truthful, which means truth-telling is the highest-paid $\epsilon$-Bayesian Nash equilibrium (up to $\epsilon$-error) and pays strictly more than uninformative equilibria. The value of $\epsilon$ depends on the properties of the robust learning algorithm and typically tends to $0$ as the number of tasks and/or agents increases. We show how to use our meta-mechanism to design mechanisms with provable guarantees in two important crowdsourcing settings, even when some agents are self-interested and others are adversarial. |
Title: Information Extraction From Co-Occurring Similar Entities |
Authors: Nicolas Heist (University of Mannheim) and Heiko Paulheim (University of Mannheim). |
Knowledge about entities and their interrelations is a crucial factor of success for tasks like question answering or text summarization. Publicly available knowledge graphs like Wikidata or DBpedia are, however, far from being complete. In this paper, we explore how information extracted from similar entities that co-occur in structures like tables or lists can help to increase the coverage of such knowledge graphs. In contrast to existing approaches, we do not focus on relationships within a listing (e.g., between two entities in a table row) but on the relationship between a listing's subject entities and the context of the listing. To that end, we propose a descriptive rule mining approach that uses distant supervision to derive rules for these relationships based on a listing's context. Extracted from a suitable data corpus, the rules can be used to extend a knowledge graph with novel entities and assertions. In our experiments we demonstrate that the approach is able to extract up to 3M novel entities and 30M additional assertions from listings in Wikipedia. We find that the extracted information is of high quality and thus suitable to extend Wikipedia-based knowledge graphs like DBpedia, YAGO, and CaLiGraph. For the case of DBpedia, this would result in an increase of covered entities by roughly 50%. |
Title: Insightful Dimensionality Reduction with Very Low Rank Variable Subsets |
Authors: Bruno Ordozgoiti (Aalto University), Sachith Pai (University of Helsinki) and Marta Kołczyńska (Polish Academy of Sciences). |
Dimensionality reduction techniques can be employed to produce robust, cost-effective predictive models, and to enhance interpretability in exploratory data analysis. However, the models produced by many of these methods are formulated in terms of abstract factors or are too high-dimensional to facilitate insight and fit within low computational budgets. In this paper we explore an alternative approach to interpretable dimensionality reduction. Given a data matrix, we study the following question: are there subsets of variables that can be primarily explained by a single factor? We formulate this challenge as the problem of finding submatrices close to rank one. Despite its potential, this topic has not been sufficiently addressed in the literature, and there exist virtually no algorithms for this purpose that are simultaneously effective, efficient and scalable. We formalize the task as two problems which we characterize in terms of computational complexity, and propose efficient, scalable algorithms with approximation guarantees. Our experiments demonstrate how our approach can produce insightful findings in data, and show our algorithms to be superior to strong baselines. |
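The "close to rank one" criterion can be made concrete with a small numpy sketch: score a column subset by the fraction of its energy captured by the top singular value. This scoring rule is an illustrative proxy, not the paper's formal problem definitions or algorithms:

    import numpy as np

    def rank_one_score(M):
        # Fraction of the submatrix's energy captured by its top singular
        # value; a score near 1.0 means the variable subset is close to rank one.
        s = np.linalg.svd(M, compute_uv=False)
        return s[0] ** 2 / np.sum(s ** 2)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))
    # Make the first three variables (columns) driven by one latent factor.
    X[:, :3] = np.outer(rng.normal(size=200), [1.0, 2.0, -1.0]) + 0.05 * rng.normal(size=(200, 3))
    print(rank_one_score(X[:, :3]))  # close to 1.0
    print(rank_one_score(X[:, 3:]))  # well below 1.0

A subset scoring near 1.0 is "primarily explained by a single factor", which is what makes it interpretable as one latent dimension.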
Title: Integrating Floor Plans into Hedonic Models for Rent Price Appraisal |
Authors: Kirill Solovev (University of Giessen) and Nicolas Pröllochs (University of Giessen). |
Online real estate platforms have become significant marketplaces facilitating users' search for an apartment or a house. Yet it remains challenging to accurately appraise a property's value. Prior works have primarily studied real estate valuation based on hedonic price models that take structured data into account, while accompanying unstructured data is typically ignored. In this study, we investigate to what extent an automated visual analysis of apartment floor plans on online real estate platforms can enhance hedonic rent price appraisal. We propose a tailored two-stage deep learning approach to learn price-relevant aesthetics of floor plans from historical price data. Subsequently, we integrate the floor plan predictions into hedonic rent price models that account for both structural and locational characteristics of an apartment. Our empirical analysis based on a unique dataset of 9,174 real estate listings suggests that there is an underutilization of the available data in current hedonic models. We find that (1) the aesthetics of floor plans have significant explanatory power regarding rent prices – even after controlling for structural and locational apartment characteristics, and (2) harnessing floor plans results in an up to 10.56% lower out-of-sample prediction error. We further find that floor plans yield a particularly high gain in predictive performance for older and smaller apartments. Altogether, our empirical findings contribute to the existing research body by establishing the link between the visual aesthetics of floor plans and real estate prices. Moreover, our approach has important implications for online real estate platforms, which can use our findings to enhance user experience in their real estate listings. |
Title: Interest-aware Message-Passing GCN for Recommendation |
Authors: Fan Liu (Shandong University), Zhiyong Cheng (Shandong Artificial Intelligence Institute), Lei Zhu (Shandong Normal University), Zan Gao (Shandong Artificial Intelligence Institute) and Liqiang Nie (Shandong University). |
Graph Convolution Networks (GCNs) manifest great potential in recommendation. This is attributed to their capability of learning good user and item embeddings by exploiting the collaborative signals from high-order neighbors. Like other GCN models, GCN-based recommendation models also suffer from the notorious over-smoothing problem: when stacking more layers, node embeddings become more similar and eventually indistinguishable, resulting in performance degradation. The recently proposed LightGCN and LR-GCN alleviate this problem to some extent; however, we argue that they overlook an important factor behind the over-smoothing problem in recommendation, namely that high-order neighboring users who share no common interests with a user can also be involved in her embedding learning during the graph convolution operation. As a result, multi-layer graph convolution will make users with dissimilar interests have similar embeddings. In this paper, we propose a novel Interest-aware Message-Passing GCN (IMP-GCN) recommendation model, which performs high-order graph convolution inside subgraphs. A subgraph consists of users with similar interests and their interacted nodes. To form the subgraphs, we design a subgraph generation algorithm, which can effectively identify users with common interests by exploiting both user features and the graph structure. In this way, our model avoids propagating negative information from high-order neighbors into embedding learning. Experimental results on three large-scale benchmark datasets show that our model can gain performance improvements by stacking more layers and significantly outperforms the state-of-the-art GCN-based recommendation models. |
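A toy numpy illustration of the core idea: restricting high-order propagation to an interest subgraph keeps dissimilar users from being smoothed together. The propagation rule, graph, and grouping below are hypothetical simplifications, not the IMP-GCN model or its subgraph generation algorithm:

    import numpy as np

    def propagate(A, E, layers=3):
        # LightGCN-style propagation: mean of the embeddings across hops,
        # using a row-normalized adjacency matrix.
        d = A.sum(axis=1, keepdims=True)
        A_hat = A / np.where(d == 0, 1.0, d)
        out, cur = E.copy(), E.copy()
        for _ in range(layers):
            cur = A_hat @ cur
            out += cur
        return out / (layers + 1)

    # Nodes 0-3 are users (interest groups {0, 1} and {2, 3}); nodes 4-5 are items.
    A = np.zeros((6, 6))
    for u, i in [(0, 4), (1, 4), (1, 5), (2, 5), (3, 5)]:
        A[u, i] = A[i, u] = 1.0

    # Interest-aware variant: drop user 1's cross-group link before propagating,
    # so high-order smoothing stays inside each interest subgraph.
    A_sub = A.copy()
    A_sub[1, 5] = A_sub[5, 1] = 0.0

    E = np.eye(6)  # one-hot initial embeddings, for illustration only
    print(propagate(A, E))     # users 0-3 all drift toward each other
    print(propagate(A_sub, E)) # groups {0, 1} and {2, 3} stay separated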
Title: Interpreting and Unifying Graph Neural Networks with An Optimization Framework |
Authors: Meiqi Zhu (Beijing University of Posts and Telecommunications), Xiao Wang (Beijing University of Posts and Telecommunications), Chuan Shi (Beijing University of Posts and Telecommunications), Houye Ji (Beijing University of Posts and Telecommunications) and Peng Cui (Tsinghua University). |
Graph Neural Networks (GNNs) have received considerable attention for graph-structured data learning in a wide variety of tasks. The well-designed propagation mechanism, which has been demonstrated to be effective, is the most fundamental part of GNNs. Although most GNNs basically follow a message-passing scheme, little effort has been made to discover and analyze their essential relations. In this paper, we establish a surprising connection between different propagation mechanisms and a unified optimization problem, showing that despite the proliferation of various GNNs, their propagation mechanisms are in fact the optimal solutions of an objective that combines a feature fitting function over a wide class of graph filters with a graph regularization term. Our proposed unified optimization framework, summarizing the commonalities between several of the most representative GNNs, not only provides a macroscopic view for surveying the relations between different GNNs, but also opens up new opportunities for flexibly designing new GNNs. With the proposed framework, we discover that existing works usually utilize naive graph filters for the feature fitting function, and we further develop two novel objective functions considering adjustable low-pass or high-pass graph filters, respectively. Moreover, we provide convergence proofs and expressive power comparisons for the proposed models. Extensive experiments on benchmark datasets clearly show that the proposed GNNs not only outperform state-of-the-art methods but also have a good ability to alleviate over-smoothing, further verifying the feasibility of designing GNNs with our unified optimization framework. Codes and datasets will be publicly available after the review. |
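One well-known instance of such a unification can be sketched numerically: with a squared-error feature fitting term, the fixed point of min_Z ||Z - X||_F^2 + alpha * tr(Z^T L Z), where L = I - A_hat, gives an APPNP-like propagation rule Z = (X + alpha * A_hat @ Z) / (1 + alpha). The sketch below assumes this simple objective rather than the paper's adjustable graph filters:

    import numpy as np

    def unified_propagation(A_hat, X, alpha=1.0, iters=50):
        # Fixed-point iteration for the regularized fitting objective above.
        # Setting its gradient to zero gives (1 + alpha) Z = X + alpha A_hat Z,
        # so each step blends the input features with neighbor-propagated ones.
        Z = X.copy()
        for _ in range(iters):
            Z = (X + alpha * A_hat @ Z) / (1 + alpha)
        return Z

    # Toy symmetrically normalized adjacency of a 3-node path graph.
    A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    d = A.sum(1)
    A_hat = A / np.sqrt(np.outer(d, d))
    X = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
    print(unified_propagation(A_hat, X))

The iteration converges because its contraction factor alpha / (1 + alpha) is strictly below 1; different fitting terms and graph filters in the same template recover different GNN propagation rules.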
Title: Interventions for Softening Can Lead to Hardening of Opinions: Evidence from a Randomized Controlled Trial |
Authors: Andreas Spitz (Ecole Polytechnique Fédérale de Lausanne), Ahmad Abu-Akel (University of Lausanne) and Robert West (Ecole Polytechnique Fédérale de Lausanne). |
Motivated by the goal of designing interventions for softening polarized opinions on the Web, and building on results from psychology, we hypothesized that people would be moved more easily towards opposing opinions when these are voiced by a celebrity they like, rather than by a celebrity they dislike or by an expert. We tested this hypothesis in a survey-based randomized controlled trial in which we exposed participants to opinions that were randomly assigned to one of four spokespersons each: a disagreeing but liked celebrity, a disagreeing and disliked celebrity, a disagreeing expert, and an agreeing but disliked celebrity. After the treatment, we measured changes in the participants' opinions, empathy towards the spokespersons, and use of affective language. Contrary to our hypothesis, no softening of opinions was observed, regardless of the participants' attitudes towards the celebrity. Instead, we found strong evidence of a hardening of pre-treatment opinions when a disagreeing opinion was attributed to an expert and when an agreeing opinion was attributed to a disliked celebrity. While opinion change was elusive, we observed a pronounced reduction in empathy for disagreeing spokespersons, indicating a punitive response. The only celebrity for whom empathy remained unchanged was the one who agreed, even though they were disliked. Our results could be explained as a reaction to violated expectations towards experts and as a perceived breach of trust by liked celebrities. They confirm that naive mediation strategies may not yield the intended results, and show how difficult it is to depolarize, and how easy it is to further polarize or provoke emotional responses. |
Title: It’s Not Just the Site, it’s the Contents: Intra-domain Fingerprinting Social Media Websites Through CDN Bursts |
Authors: Kailong Wang (National University of Singapore), Junzhe Zhang (Sichuan University), Guangdong Bai (The University of Queensland), Ryan Ko (The University of Queensland) and Jin Song Dong (National University of Singapore). |
Analysis of encrypted traffic using various machine learning techniques could threaten user privacy in web browsing. Website fingerprinting (or inter-domain WSF) has been shown to identify the websites a user has visited. To the best of our knowledge, the finer-grained problem of web page fingerprinting (or intra-domain WPF) has not been systematically studied by our research community. WPF attackers, such as government agencies who enforce Internet censorship, are keen to identify the particular web pages (e.g., a political dissident’s social media page) a target user has visited, rather than mere web domain information. In this work, we investigate intra-domain WPF against social media websites. Our study involves a realistic on-path passive attack scenario. We reveal that delivering large-size data such as images and videos via Content Delivery Networks (CDNs), which is a common practice among social media websites, makes intra-domain WPF highly feasible. The network traffic that occurs while the browser is rendering a social media page exhibits temporal patterns, which may be due to the critical rendering path and packet segmentation, and these patterns are sufficiently recognizable by machine learning algorithms. We characterize such patterns as CDN bursts, and use features extracted from them to empower classification algorithms to achieve a high classification accuracy (96%) and a low false positive rate (0.02%). To alleviate the threat of intra-domain WPF, we also propose and evaluate countermeasures such as perturbing the packet interval time and inserting dummy requests. |
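As an illustration of what a CDN-burst feature extractor might look like, the sketch below splits a (timestamp, size) packet trace into bursts separated by idle gaps and summarizes each burst. The gap threshold and the features are hypothetical choices, not the paper's exact feature set:

    def extract_bursts(packets, gap=0.05):
        # Split a (timestamp, size) trace into bursts separated by idle gaps
        # longer than `gap` seconds, then summarize each burst.
        bursts, cur = [], [packets[0]]
        for prev, pkt in zip(packets, packets[1:]):
            if pkt[0] - prev[0] > gap:
                bursts.append(cur)
                cur = []
            cur.append(pkt)
        bursts.append(cur)
        return [{"packets": len(b),
                 "bytes": sum(s for _, s in b),
                 "duration": round(b[-1][0] - b[0][0], 3)} for b in bursts]

    trace = [(0.00, 1500), (0.01, 1500), (0.02, 900),   # burst 1: page skeleton
             (0.40, 1500), (0.41, 1500), (0.43, 1500)]  # burst 2: CDN media fetch
    for features in extract_bursts(trace):
        print(features)

Per-burst feature vectors like these can then be fed to an off-the-shelf classifier to distinguish pages within the same domain.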
Title: Itinerary-aware Personalized Deep Matching at Fliggy |
Authors: Jia Xu (College of Computer, Electronics and Information, Guangxi University), Ziyi Wang (Alibaba Group), Zulong Chen (Alibaba Group), Detao Lv (Alibaba Group), Yao Yu (Alibaba Group) and Chuanfei Xu (Huawei Technologies Co. Ltd). |
Matching items for a user from a travel item pool of large cardinality has been the most important technology for increasing business at Fliggy, one of the most popular online travel platforms (OTPs) in China. There are three major challenges facing OTPs: sparsity, diversity, and implicitness. In this paper, we present a novel Fliggy ITinerary-aware deep matching NETwork (FitNET) to address these three challenges. FitNET is designed based on the popular deep matching network, which has been successfully employed in many industrial recommendation systems due to its effectiveness. |
Title: Joint Spatio-Textual Reasoning for Answering Tourism Questions |
Authors: Danish Contractor (IBM), Shashank Goel (IIT Delhi), Mausam (IIT Delhi) and Parag Singla (Indian Institute of Technology Delhi). |
Our goal is to answer real-world tourism questions that seek Points-of-Interest (POI) recommendations. Such questions express various kinds of spatial and non-spatial constraints, necessitating a combination of textual and spatial reasoning. In response, we develop the first joint spatio-textual reasoning model, which combines geo-spatial knowledge with information in textual corpora to answer questions. We first develop a modular spatial-reasoning network that uses geo-coordinates of location names mentioned in a question, and of candidate answer POIs, to reason over spatial constraints alone. We then combine our spatial-reasoner with a textual reasoner in a joint model and present experiments on a real-world POI recommendation task. We report substantial improvements over existing models without joint spatio-textual reasoning. To the best of our knowledge, we are the first to develop a joint QA model that combines reasoning over external geo-spatial knowledge with textual reasoning. |
Title: Keyword-aware Abstractive Summarization by Extracting Set-level Intermediate Summaries |
Authors: Yizhu Liu (Shanghai Jiao Tong University), Qi Jia (Shanghai Jiao Tong University) and Kenny Zhu (Shanghai Jiao Tong University). |
Abstractive summarization is useful for providing a summary or digest of news or other web texts and enhancing users' reading experience, especially when they are reading on small displays such as mobile phones. However, existing encoder-decoder summarization models have difficulty learning the latent alignment between source documents and summaries because of their vast disparity in length. In this paper, we propose an extractor-abstractor framework in which the keyword-based extractor selects a few sets of salient sentences from the input document, and the abstractor then paraphrases these sets of sentences, which are better aligned with the summary, in parallel to generate the final summary. The new extractor and abstractor are pre-trained from a set of “pseudo summaries” extracted by specially designed heuristics, and then further trained together in a reinforcement learning framework. The results show that the proposed model generates high-quality summaries with faster training speed and a smaller training memory footprint, and outperforms the state-of-the-art models on the CNN/Daily Mail, Webis-TLDR-17, Webis-Snippet-20, WikiHow and DUC-2002 datasets. |
Title: Knowledge Embedding Based Graph Convolutional Network |
Authors: Donghan Yu (Carnegie Mellon University), Yiming Yang (Carnegie Mellon University), Ruohong Zhang (Carnegie Mellon University) and Yuexin Wu (Carnegie Mellon University). |
Recently, a considerable literature has grown up around the theme of Graph Convolutional Networks (GCNs). How to effectively leverage the rich structural information in complex graphs, such as knowledge graphs with heterogeneous types of entities and relations, is a primary open challenge in the field. Most GCN methods are either restricted to graphs with a homogeneous type of edges (e.g., citation links only), or focus on representation learning for nodes only, instead of jointly propagating and updating the embeddings of both nodes and edges for target-driven objectives. This paper addresses these limitations by proposing a novel framework, namely the Knowledge Embedding based Graph Convolutional Network (KE-GCN), which combines the power of GCNs in graph-based belief propagation and the strengths of advanced knowledge embedding (a.k.a. knowledge graph embedding) methods, and goes beyond. Our theoretical analysis shows that KE-GCN offers an elegant unification of several well-known GCN methods as specific cases, with a new perspective of graph convolution. Experimental results on benchmark datasets show the advantageous performance of KE-GCN over strong baseline methods in the tasks of knowledge graph alignment and entity classification. |
Title: Knowledge-Aware Procedural Text Understanding with Multi-Stage Training |
Authors: Zhihan Zhang (Peking University), Xiubo Geng (STCA NLP Group, Microsoft), Tao Qin (MSRA), Yunfang Wu (Peking University) and Daxin Jiang (STCA NLP Group, Microsoft). |
Procedural text describes dynamic state changes during a step-by-step natural process (e.g., photosynthesis). In this work, we focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process. Although recent approaches have achieved substantial progress, their results are far behind human performance. Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved, which require the incorporation of external knowledge bases. Previous works on external knowledge injection usually rely on noisy web mining tools and heuristic rules with limited applicable scenarios. In this paper, we propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge in this task. Specifically, we retrieve informative knowledge triples from ConceptNet and perform knowledge-aware reasoning while tracking the entities. Besides, we employ a multi-stage training schema which fine-tunes the BERT model over unlabeled data collected from Wikipedia before further fine-tuning it on the final model. Experimental results on two procedural text datasets, ProPara and Recipes, verify the effectiveness of the proposed methods, in which our model achieves state-of-the-art performance in comparison to various baselines. |
Title: Knowledge-Preserving Incremental Social Event Detection via Heterogeneous GNNs |
Authors: Yuwei Cao (University of Illinois at Chicago), Hao Peng (Beihang University), Jia Wu (Macquarie University), Yingtong Dou (University of Illinois at Chicago), Jianxin Li (Beihang University) and Philip Yu (University of Illinois at Chicago). |
Social events provide valuable insights into group social behaviors and public concerns and therefore have many applications in fields such as product recommendation and crisis management. The complexity and streaming nature of social messages make it appealing to address social event detection in an incremental learning setting, where acquiring, preserving, and extending knowledge are major concerns. Most existing methods, including those based on incremental clustering and community detection, learn limited amounts of knowledge as they ignore the rich semantics and structural information contained in the social data. Moreover, they cannot memorize previously acquired knowledge. In this paper, we propose a novel Knowledge-Preserving Incremental Heterogeneous Graph Neural Network (KPGNN) for social event detection. To acquire more knowledge, KPGNN models complex social messages into unified social graphs to facilitate data utilization and explores the expressive power of GNNs for knowledge extraction. To continuously adapt to the incoming data, KPGNN adopts contrastive loss terms that cope with a changing number of event classes. It also leverages the inductive learning ability of GNNs to efficiently detect events and extends its knowledge from previously unseen data. To deal with large social streams, KPGNN adopts mini-batch subgraph sampling for scalable training and periodically removes obsolete data to maintain a dynamic embedding space. KPGNN requires no feature engineering and has few hyperparameters to tune. Extensive experimental results demonstrate the superiority of KPGNN over various baselines. |
Title: Large-scale Comb-K Recommendation |
Authors: Houye Ji (Beijing University of Posts and Telecommunications), Junxiong Zhu (alibaba group), Chuan Shi (Beijing University of Posts and Telecommunications), Xiao Wang (Beijing University of Posts and Telecommunications), Bai Wang (Beijing University of Posts and Telecommunications), Chaoyu Zhang (Alibaba Group), Zixuan Zhu (Alibaba Group), Feng Zhang (Alibaba Group) and Yanghua Li (Alibaba Group). |
The prosperous development of e-commerce has spawned diverse recommendation scenarios (e.g., the promotion scenario), accompanied by a new recommendation paradigm: comb-K recommendation. In the promotion scenario, the recommender system first selects K promotional items with lightning deals or limited-time discounts and then dispenses them to each user, encouraging users' desire to purchase and maximizing the total revenue of the K items over all users. Significantly different from traditional top-K recommendation, when selecting a combination of K items, comb-K recommendation needs to fully consider the preferences of all users, rather than the preference of a single user. Considering the fact that each user only views a small part of the selected items (a.k.a. the dispensation window phenomenon), when selecting K items, we need to consider how many items are validly dispensed to each user and indeed generate revenue. To maximize the total revenue of K items, comb-K recommendation needs to address the following questions: (1) How to seamlessly integrate item selection and item dispensation? (2) How to fully consider the preferences of all users in a productive way? Thus, we take the first step to formulate comb-K recommendation as a combinatorial optimization problem with the crucial constraint of the dispensation window. Specifically, we model the promotion scenario as a heterogeneous graph and leverage a heterogeneous graph neural network to estimate user-item preferences, serving as the basis of comb-K recommendation at the user level. However, for large-scale promotion scenarios, comb-K recommendation suffers from combinatorial explosion and becomes unsolvable. To handle large-scale promotion scenarios, we design a novel heterogeneous graph pooling model to cluster massive users into limited crowds and estimate crowd-item preferences, so that large-scale comb-K recommendation becomes solvable at the crowd level. Then, considering the "long tail" phenomenon in e-commerce, we design a fast strategy called restricted neighbor heuristic search to further accelerate the solving process of large-scale comb-K recommendation. Extensive experiments on four datasets demonstrate the superiority of comb-K recommendation over top-K recommendation. On billion-scale datasets, the proposed comb-K recommendation significantly improves Total Click and Hit Ratio by 9.35% and 7.14%, respectively. |
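The dispensation-window constraint can be illustrated with a small greedy baseline: revenue only counts the top-w selected items each user actually sees, and items are added one at a time by marginal gain. This heuristic is for intuition only; it is not the paper's graph pooling model or restricted neighbor heuristic search:

    import numpy as np

    def total_revenue(P, selected, window):
        # Each user validly sees only their top-`window` items among the
        # selected combination; revenue counts only those dispensed items.
        sub = P[:, selected]
        return np.sort(sub, axis=1)[:, -window:].sum()

    def greedy_comb_k(P, k, window):
        # Add, at each step, the item with the largest marginal revenue gain.
        selected = []
        for _ in range(k):
            rest = [i for i in range(P.shape[1]) if i not in selected]
            gains = [total_revenue(P, selected + [i], window) for i in rest]
            selected.append(rest[int(np.argmax(gains))])
        return selected

    rng = np.random.default_rng(1)
    P = rng.random((50, 12))  # estimated user-item preferences (users x items)
    print(greedy_comb_k(P, k=4, window=2))

Note how the window changes the objective: an item that is merely everyone's third choice adds nothing once two better items are already selected, which is exactly why comb-K differs from picking the K individually best items.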
Title: Latent Target-Opinion as Prior for Document-Level Sentiment Classification: A Variational Approach from Fine-Grained Perspective |
Authors: Hao Fei (Wuhan University), Yafeng Ren (Wuhan University), Shengqiong Wu (Wuhan University), Bobo Li (Wuhan University) and Donghong Ji (Wuhan University). |
Existing works on the document-level sentiment classification task treat the review document as an overall text unit, performing feature extraction with various sophisticated model architectures. In this paper, we draw inspiration from fine-grained sentiment analysis, proposing to first learn the latent target-opinion distribution behind the documents, and then leverage such fine-grained prior knowledge in the classification process. We model the latent target-opinion distribution with hierarchical variables, where a global-level variable captures the overall target and opinion, and local-level variables retrieve the detailed opinion clues at the word level. The proposed method consists of two main parts: a variational module and a classification module. We employ a conditional variational autoencoder to reconstruct the document, during which the user and product information can be integrated. In the classification module, we build a hierarchical model based on Transformer encoders, where the local-level and global-level prior distribution representations induced from the variational module are injected into the word-level and sentence-level Transformers, respectively. Experimental results on benchmark datasets show that the proposed method significantly outperforms strong baselines, achieving state-of-the-art performance. Further analysis shows that our model is capable of capturing the latent fine-grained target and opinion prior information, which is highly effective for improving task performance. |
Title: LChecker: Detecting Loose Comparison Bugs in PHP |
Authors: Penghui Li (The Chinese University of Hong Kong) and Wei Meng (The Chinese University of Hong Kong). |
Weakly-typed languages such as PHP support loosely comparing two values. This language feature is widely used but can also pose severe security threats, because operand values can be implicitly converted into a different type or value. Under certain conditions, buggy loose comparisons can cause unexpected results, leading to authentication bypass and other functionality problems. In this paper, we present the first in-depth study of such loose comparison bugs. We develop LChecker, a system to detect PHP loose comparison bugs. It employs a context-sensitive inter-procedural data-flow analysis together with several new techniques to precisely detect loose comparison bugs. We also enhance the execution engine to help validate loose comparison bugs dynamically. Our evaluation shows that LChecker can both effectively and efficiently detect bugs with a reasonably low false-positive rate. LChecker successfully detected all previously known bugs in our evaluation dataset with no false negatives. Using LChecker, we have confirmed 50 loose comparison bugs, of which 42 are new bugs. |
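For readers unfamiliar with PHP's type juggling, the sketch below mimics loose comparison in Python (a rough approximation for illustration, not LChecker's analysis): numeric-looking strings are compared as numbers, which is what enables the classic "magic hash" authentication bypass:

    def php_loose_eq(a, b):
        # Rough Python mimic of PHP's `==` type juggling, for illustration only:
        # if both operands look numeric, compare them as numbers.
        def as_number(v):
            try:
                return float(v)
            except (TypeError, ValueError):
                return None
        na, nb = as_number(a), as_number(b)
        if na is not None and nb is not None:
            return na == nb
        return a == b

    # The "magic hash" pitfall: both strings parse as 0 x 10^k, i.e. 0.0, so a
    # loose comparison of two different MD5 hashes returns true.
    h1 = "0e462097431906509019562988736854"
    h2 = "0e830400451993494058024219903391"
    print(php_loose_eq(h1, h2))  # True  (like PHP's ==)
    print(h1 == h2)              # False (like PHP's ===)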
Title: Learning a Product Relevance Model from Click-Through Data in e-Commerce |
Authors: Shaowei Yao (Peking University), Jiwei Tan (Alibaba Group), Xi Chen (Alibaba Group), Keping Yang (Alibaba Group), Rong Xiao (Alibaba Group), Hongbo Deng (Alibaba Group) and Xiaojun Wan (Peking University). |
The search engine plays a fundamental role in online e-commerce systems, helping users find the products they want from massive product collections. Relevance is an essential requirement for e-commerce search, since showing products that do not match the search query intent degrades the user experience. Given the vocabulary gap between the user language of queries and the seller language of products, measuring semantic relevance is necessary, and neural networks are employed to address this task. However, semantic relevance differs from click-through rate prediction in that no direct training signal is available. Most previous attempts learn relevance models from user click-through data, which are cheap and abundant. Unfortunately, click behavior is noisy and misleading, as it is affected not only by relevance but also by factors including price, image, and attractive titles. Therefore, it is challenging but valuable to learn relevance models from click-through data. In this paper, we propose a new relevance learning framework that concentrates on how to train a relevance model from the weak supervision of click-through data. Different from previous efforts that treat samples as either relevant or irrelevant, we construct more fine-grained samples for training. We propose a novel way to consider samples of different relevance confidence, and come up with a new training objective to learn a robust relevance model with a desirable score distribution. The proposed model is evaluated on offline annotated data and in online A/B testing, and it achieves both promising performance and high computational efficiency. The model has already been deployed online, serving the search traffic of Taobao for over a year. |
Title: Learning Dynamic User Behavior Based on Error-driven Event Representation |
Authors: Honglian Wang (University of Science and Technology of China), Peiyan Li (Ludwig-Maximilians-Universität München), Wujun Tao (University of Science and Technology of China), Bailin Feng (University of Science and Technology of China) and Junming Shao (University of Science and Technology of China). |
Understanding the evolution of large graphs over time is of significant importance for understanding and predicting user behaviors. Modeling user behavior with temporal networks has gained increasing attention in recent years since it allows capturing users' dynamic preferences and predicting their next actions. Recently, several approaches have been proposed to model user behavior. However, these methods suffer from two problems: they either work on static data, which ignores the dynamic evolution, or they model whole behavior sequences directly with recurrent neural networks and thus suffer from noisy information. To tackle these problems, we propose a dynamic user behavior learning algorithm, called LDBR. It views user behaviors as a set of dynamic events, and uses recent event embeddings to predict future user behavior and infer the current semantic labels. Specifically, we propose a new strategy to automatically learn a good event embedding in behavior sequences by introducing a smooth sampling strategy and minimizing the temporal link prediction error. It is hard to obtain real-world datasets with evolving labels, so in this paper we provide a new dynamic network dataset with evolving labels, called Arxiv, and make it publicly available. Based on the Arxiv dataset, we conduct a case study to verify the quality of the event embeddings. Extensive experiments on temporal link prediction tasks further demonstrate the effectiveness of the LDBR model. |
Title: Learning Fair Representations for Recommendation: A Graph-based Perspective |
Authors: Le Wu (Hefei University of Technology), Lei Chen (Hefei University of Technology), Pengyang Shao (Hefei University of Technology), Richang Hong (Hefei University of Technology), Xiting Wang (Microsoft Research Asia) and Meng Wang (Hefei University of Technology). |
As a key application of artificial intelligence, recommender systems are among the most pervasive computer-aided systems that help users find potential items of interest. Recently, researchers have paid considerable attention to fairness issues in artificial intelligence applications. Most of these approaches assume independence of instances and design sophisticated models to eliminate sensitive information to facilitate fairness. However, recommender systems differ greatly from these approaches, as users and items naturally form a user-item bipartite graph and are collaboratively correlated in the graph structure. In this paper, we propose a novel graph-based technique for ensuring the fairness of any recommendation model. Here, the fairness requirements refer to not exposing the sensitive feature set in the user modeling process. Specifically, given the original embeddings from any recommendation model, we learn a composition of filters that transform each user's and each item's original embeddings into a filtered embedding space based on the sensitive feature set. For each user, this transformation is achieved under the adversarial learning of a user-centric graph, in order to obfuscate each sensitive feature in both the filtered user embedding and the subgraph structures of this user. Finally, extensive experimental results clearly show the effectiveness of our proposed model for fair recommendation. |
Title: Learning from Graph Propagation via Ordinal Distillation for One-Shot Automated Essay Scoring |
Authors: Zhiwei Jiang (Nanjing University), Meng Liu (Nanjing University), Yafeng Yin (Nanjing University), Hua Yu (Nanjing University), Zifeng Cheng (Nanjing University) and Qing Gu (Nanjing University). |
One-shot automated essay scoring (AES) aims to assign scores to a set of essays written for a certain prompt, with only one manually scored essay per distinct score. Compared to previously studied prompt-specific AES, which usually requires a large number of manually scored essays for model training (e.g., about 600 manually scored essays out of 1,000 in total), one-shot AES can greatly reduce the workload of manual scoring. In this paper, we propose a Transductive Graph-based Ordinal Distillation (TGOD) framework to tackle the task of one-shot AES. Specifically, we design a transductive graph-based model as a teacher model to generate pseudo labels for unlabeled essays based on the one-shot labeled essays. Then, we distill the knowledge in the teacher model into a neural student model by learning from the high-confidence pseudo labels. Different from general knowledge distillation, we propose an ordinal-aware unimodal distillation which imposes a unimodal distribution constraint on the output of the student model, to tolerate the minor errors contained in pseudo labels. Experimental results on the public dataset ASAP show that TGOD can improve the performance of existing neural AES models under the one-shot AES setting and achieve an acceptable average QWK of 0.69. |
Title: Learning Heterogeneous Temporal Patterns of User Preference for Timely Recommendation |
Authors: Junsu Cho (POSTECH), Dongmin Hyun (POSTECH), Seongku Kang (POSTECH) and Hwanjo Yu (POSTECH). |
Recommender systems have achieved great success in modeling users' preferences on items and predicting the next item a user will consume. Recently, there have been many efforts to utilize the time information of users' interactions with items to capture inherent temporal patterns of user behaviors and offer timely recommendations at a given time. Existing studies regard the time information as a single type of feature and focus on how to associate it with user preferences on items. However, we argue that this is insufficient for fully learning the time information because the temporal patterns of user preference are usually heterogeneous. A user's preference for a particular item may 1) increase periodically or 2) evolve over time under the influence of significant recent events, and each of these two kinds of temporal patterns appears with some unique characteristics. In this paper, we first define the unique characteristics of the two kinds of temporal patterns of user preference that should be considered in time-aware recommender systems. Then we propose a novel recommender system for timely recommendations, called TimelyRec, which jointly learns the heterogeneous temporal patterns of user preference considering all of the defined characteristics. In TimelyRec, a cascade of two encoders captures the temporal patterns of user preference using a proposed attention module for each encoder. Moreover, we introduce an evaluation scenario that evaluates the performance on predicting an interesting item and when to recommend the item simultaneously in top-K recommendation (i.e., item-timing recommendation). Our extensive experiments on a scenario for item recommendation and the proposed scenario for item-timing recommendation on real-world datasets demonstrate the superiority of TimelyRec and the proposed attention modules. |
Title: Learning Intents behind Interactions with Knowledge Graph for Recommendation |
Authors: Xiang Wang (National University of Singapore), Tinglin Huang (Zhejiang University), Dingxian Wang (eBay), Yancheng Yuan (The Hong Kong Polytechnic University), Zhenguang Liu (Zhejiang Gongshang University), Xiangnan He (University of Science and Technology of China) and Tat-Seng Chua (National University of Singapore). |
Knowledge graph (KG) plays an increasingly important role in recommender systems. A recent technical trend is to develop end-to-end models founded on graph neural networks (GNNs). However, existing GNN-based models are coarse-grained in relational modeling, failing to (1) identify user-item relations at a fine-grained level of intents, and (2) exploit relation dependencies to preserve the semantics of long-range connectivity. In this study, we explore the intents behind a user-item interaction by using auxiliary item knowledge, and propose a new model, Knowledge Graph-based Intent Network (KGIN). Technically, we model each intent as an attentive combination of KG relations, encouraging the independence of different intents for better model capability and interpretability. Furthermore, we devise a new information aggregation scheme for GNN, which recursively integrates the relation sequences of long-range connectivity (i.e., relational paths). This scheme allows us to distill useful information about user intents and encode them into the representations of users and items. Experimental results on three benchmark datasets show that KGIN achieves significant improvements over the state-of-the-art methods like KGAT, KGNN-LS, and CKAN. Further analyses show that KGIN offers interpretable explanations for predictions by identifying influential intents and relational paths. We will release our implementation upon acceptance. |
Title: Learning Neural Point Processes with Latent Graphs |
Authors: Qiang Zhang (University College London), Aldo Lipani (University College London) and Emine Yilmaz (University College London). |
Neural point processes (NPPs) employ neural networks to capture the complicated dynamics of asynchronous event sequences. Existing NPPs feed all history events into neural networks, assuming that all event types contribute to the prediction of the target type. However, this assumption can be problematic, because in reality some event types do not contribute to the predictions of another type. To correct this defect, we learn to omit non-contributing types to remove their disturbance. Towards this end, we simultaneously consider the tasks of (1) finding event types that contribute to predictions of the target types and (2) learning an NPP model from event sequences. For the former, we formulate a latent graph, with event types being vertices and non-zero contributing relationships being directed edges; we then propose a probabilistic graph generator, from which we sample a latent graph. For the latter, the sampled graph can be readily used as a plug-in to modify an existing NPP model. Because these two tasks are nested, we propose to optimize the model parameters through bilevel programming and develop an efficient solution. Experimental results on both synthetic and real-world datasets show improved performance against state-of-the-art baselines. This work removes the disturbance of non-contributing event types with the aid of a validation procedure, similar to the practice used to mitigate overfitting when training machine learning models. |
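The gating role of the latent graph can be illustrated with a toy Hawkes-style intensity in which a 0/1 adjacency over event types decides which history events may contribute to a target type's prediction. The intensity form and all parameters below are hypothetical simplifications, not the paper's NPP model or graph generator:

    import numpy as np

    def gated_intensity(history, t_now, target_type, graph, weights, decay=1.0):
        # Hawkes-style intensity for `target_type` at time `t_now`, where the
        # 0/1 latent graph gates which source event types may contribute.
        lam = 0.1  # base rate
        for t, etype in history:
            if graph[etype, target_type]:
                lam += weights[etype, target_type] * np.exp(-decay * (t_now - t))
        return lam

    history = [(0.2, 0), (0.9, 1), (1.5, 0), (2.0, 2)]  # (time, event type)
    weights = np.full((3, 3), 0.5)
    graph = np.array([[1, 1, 0],   # type 0 excites types 0 and 1
                      [0, 1, 0],   # type 1 only self-excites
                      [0, 0, 1]])  # type 2 only self-excites
    print(gated_intensity(history, t_now=2.5, target_type=1, graph=graph, weights=weights))

In the paper's setting the graph is sampled from a learned probabilistic generator and the gated model's parameters are optimized jointly via bilevel programming; here the graph is simply fixed by hand to show the plug-in effect.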
Title: Leveraging Review Properties for Effective Recommendation |
Authors: Xi Wang (University of Glasgow), Iadh Ounis (University of Glasgow) and Craig Macdonald (University of Glasgow). |
Many state-of-the-art recommendation systems leverage explicit item reviews posted by users by considering their usefulness in representing the users’ preferences and describing the items’ attributes. These posted reviews may have various associated properties, such as their length, their age since they were posted, or their item rating. However, it remains unclear how these different review properties contribute to the usefulness of their corresponding reviews in addressing the recommendation task. In particular, users show distinct preferences when considering different aspects of the reviews (i.e. properties) for making decisions about the items. Hence, it is important to model the relationship between the reviews’ properties and the usefulness of reviews while learning the users’ preferences and the items’ attributes. Therefore, we propose to model the reviews with their associated available properties. We introduce a novel review properties-based recommendation model (RPRM) that learns which review properties are more important than others in capturing the usefulness of reviews, thereby enhancing the recommendation results. Furthermore, inspired by the users’ information adoption framework, we integrate two loss functions and a negative sampling strategy into our proposed RPRM model, to ensure that the properties of reviews are correlated with the users’ preferences. We examine the effectiveness of RPRM using the well-known Yelp and Amazon datasets. Our results show that RPRM significantly outperforms a classical baseline and five state-of-the-art baselines. Moreover, we experimentally show the advantages of using our proposed loss functions and negative sampling strategy, which further enhance the recommendation performance of RPRM. |
Title: Leveraging User Behavior History for Personalized Email Search |
Authors: Keping Bi (University of Massachusetts Amherst), Pavel Metrikov (Microsoft), Chunyuan Li (Microsoft) and Byungki Byun (Microsoft). |
An effective email search engine can facilitate users' search tasks and improve their communication efficiency. Users can have varied preferences on various ranking signals of an email, such as relevance and recency, based on their tasks at hand and even their jobs. Thus a uniform matching pattern is not optimal for all users. Instead, an effective email ranker should conduct personalized ranking by taking users' characteristics into account. Existing studies have explored user characteristics from various angles to make email search results personalized. However, little attention has been given to users' search history for characterizing users. Although users' historical behaviors have been shown to be beneficial as context in Web search, their effect in email search has not been studied and remains unknown. Given these observations, we propose to leverage user search history as query context to characterize users and build a context-aware ranking model for email search. In contrast to previous context-dependent ranking techniques that are based on raw texts, we use ranking features in the search history. This frees us from potential privacy leakage while giving better generalization power to unseen users. Accordingly, we propose a context-dependent neural ranking model (CNRM) that encodes the ranking features in users' search history as query context, and show that it can significantly outperform the baseline neural model that does not use the context. We also investigate the benefit of the query context vectors obtained from CNRM on the state-of-the-art learning-to-rank model LambdaMart by clustering the vectors and incorporating the cluster information. Experimental results show that significantly better results can be achieved on LambdaMart as well, indicating that the query clusters can characterize different users and effectively make the ranking model personalized. |
Title: Linear-Time Self Attention with Codeword Histogram for Efficient Recommendation |
Authors: Yongji Wu (Duke University), Defu Lian (University of Science and Technology of China), Neil Gong (Duke University), Lu Yin (Alibaba Group), Mingyang Yin (Alibaba Group), Jingren Zhou (Alibaba Group) and Hongxia Yang (Alibaba Group). |
Self-attention has become increasingly popular in a variety of sequence modeling tasks from natural language processing to recommendation, due to its effectiveness. However, self-attention suffers from quadratic computational and memory complexities, prohibiting its applications on long sequences. Existing approaches that address this issue mainly rely on a sparse attention context, either using a local window or a permuted bucket obtained by locality-sensitive hashing (LSH) or sorting, which risks losing crucial information. Inspired by the idea of vector quantization that uses cluster centroids to approximate items, we propose LISA (LInear-time Self Attention), which enjoys both the effectiveness of vanilla self-attention and the efficiency of sparse attention. LISA scales linearly with the sequence length, while enabling full contextual attention via computing differentiable histograms of codeword distributions. Meanwhile, unlike some efficient attention methods, our method poses no restriction on causal masking or sequence length. We evaluate our method on four real-world datasets for sequential recommendation. The results show that LISA outperforms the state-of-the-art efficient attention methods in both performance and speed; it is up to 57x faster and 78x more memory efficient than vanilla self-attention. |
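To make the codeword-histogram idea concrete, here is a rough non-causal sketch assuming keys are hard-quantized to a small codebook; LISA itself uses differentiable histograms and supports causal masking, so all names here are illustrative:

```python
import numpy as np

def codeword_attention(Q, K, V, codebook):
    """Approximate attention in O(n * c) with c codewords instead of O(n^2).

    Q, K, V: (n, d) arrays; codebook: (c, d) array of centroids.
    """
    # 1. Quantize each key to its nearest codeword (hard assignment here).
    dists = ((K[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, c)
    assign = dists.argmin(axis=1)                                  # (n,)

    c = codebook.shape[0]
    # 2. Histogram of codeword counts and per-codeword value sums.
    counts = np.bincount(assign, minlength=c).astype(float)        # (c,)
    v_sum = np.zeros((c, V.shape[1]))
    np.add.at(v_sum, assign, V)                                    # (c, d)

    # 3. Attend over codewords; scores are weighted by occurrence counts,
    #    which recovers grouped softmax attention exactly under the
    #    approximation key_i ~ its codeword.
    scores = Q @ codebook.T                                        # (n, c)
    weights = np.exp(scores) * counts[None, :]
    weights /= weights.sum(axis=1, keepdims=True) + 1e-9
    v_mean = v_sum / np.maximum(counts, 1.0)[:, None]
    return weights @ v_mean                                        # (n, d)

rng = np.random.default_rng(0)
Q = rng.normal(size=(100, 8)); K = rng.normal(size=(100, 8))
V = rng.normal(size=(100, 8)); C = rng.normal(size=(16, 8))
out = codeword_attention(Q, K, V, C)
```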
Title: Linguistically-Enriched and Context-Aware Zero-shot Slot filling |
Authors: A.B. Siddique (University of California, Riverside), Fuad Jamour (University of California, Riverside) and Vagelis Hristidis (University of California, Riverside). |
Slot filling is the task of identifying contiguous spans of words in an utterance that correspond to certain parameters (i.e., slots) of a user request/query. Slot filling is one of the most important challenges in modern task-oriented dialog systems. Supervised learning approaches have proven effective at tackling this challenge, but they need a significant amount of labeled training data in a given domain. However, new domains (i.e., unseen in training) may emerge after deployment. Thus, it is imperative that these models seamlessly adapt and fill slots from both seen and unseen domains -- unseen domains contain unseen slot types with no training data, and even seen slots in unseen domains are typically presented in different contexts. This setting is commonly referred to as zero-shot slot filling. Little work has focused on this setting, with limited experimental evaluation. Existing models that mainly rely on context-independent embedding-based similarity measures fail to detect slot values in unseen domains or do so only partially. We propose a new zero-shot slot filling neural model, LEONA, which works in three steps. Step one acquires domain-oblivious, context-aware representations of the utterance words by exploiting (a) linguistic features such as part-of-speech; (b) named entity recognition cues; and (c) contextual embeddings from pre-trained language models. Step two fine-tunes these rich representations and produces slot-independent tags for each word. Step three exploits generalizable context-aware utterance-slot similarity features at the word level, uses slot-independent tags, and contextualizes them to produce slot-specific predictions for each word. Our thorough evaluation on four diverse public datasets demonstrates that our approach consistently outperforms the state-of-the-art models by 17.52%, 22.15%, 17.42%, and 17.95% on average for unseen domains on the SNIPS, ATIS, MultiWOZ, and SGD datasets, respectively. |
Title: Local Clustering in Contextual Multi-Armed Bandits |
Authors: Yikun Ban (University of Illinois at Urbana-Champaign) and Jingrui He (University of Illinois at Urbana-Champaign). |
We study identifying user clusters in contextual multi-armed bandits (MAB). Contextual MAB is an effective tool for many real applications, such as content recommendation and online advertisement. In practice, user dependency plays an essential role in the user's actions, and thus the rewards. Clustering similar users can improve the quality of reward estimation, which in turn leads to more effective content recommendation and targeted advertising. Different from traditional clustering settings, we cluster users based on an unknown bandit parameter, which is estimated incrementally. In particular, we define the problem of cluster detection in contextual MAB and propose a bandit algorithm, LOCB, embedded with a local clustering procedure. We provide a theoretical analysis of LOCB in terms of the correctness and efficiency of clustering and its regret bound. Finally, we evaluate the proposed algorithm from various aspects, showing that it significantly outperforms state-of-the-art baselines. |
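As a rough illustration of clustering on an incrementally estimated bandit parameter, the sketch below keeps a ridge-regression estimate per user and merges users whose estimates sit within each other's shrinking confidence radii. This is a simplified stand-in for LOCB's procedure, with illustrative thresholds:

```python
import numpy as np

class UserBandit:
    """Per-user ridge-regression estimate of the unknown bandit parameter."""
    def __init__(self, d, lam=1.0):
        self.A = lam * np.eye(d)   # Gram matrix of observed contexts
        self.b = np.zeros(d)
        self.t = 0

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x
        self.t += 1

    @property
    def theta(self):
        return np.linalg.solve(self.A, self.b)

def detect_clusters(bandits, alpha=1.0):
    """Group users whose parameter estimates are statistically close.

    Two users join the same cluster when their estimates are closer than
    the sum of their (shrinking) confidence radii.
    """
    radius = lambda b: alpha * np.sqrt(np.log(b.t + 2) / (b.t + 1))
    clusters = []
    for u in bandits:
        for cl in clusters:
            rep = cl[0]
            if np.linalg.norm(bandits[u].theta - bandits[rep].theta) \
                    <= radius(bandits[u]) + radius(bandits[rep]):
                cl.append(u)
                break
        else:
            clusters.append([u])
    return clusters

bandits = {u: UserBandit(d=3) for u in range(6)}
# ... after updates from observed (context, reward) pairs:
print(detect_clusters(bandits))
```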
Title: Long Short-Term Session Search with Joint Document Reranking and Next Query Prediction |
Authors: Qiannan Cheng (Shandong University), Yujie Lin (Shandong University), Zhaochun Ren (Shandong University), Pengjie Ren (University of Amsterdam), Zhumin Chen (Shandong University), Xiangyuan Liu (Shandong University) and Maarten de Rijke (University of Amsterdam). |
Document reranking (DR) and next query prediction (NQP) are two core tasks in session search, often driven by the same search intent. So far, most proposed models for DR and NQP focus only on users' short-term intents in the current search session, failing to recognize and address the long-term intents present in historical search sessions. We consider a personalized mechanism for learning a user's profile from their long-term behavior to simultaneously enhance the performance of DR and NQP in an ongoing session. We propose a personalized session search model, called Long short-term session search Network (LostNet), that jointly learns to rerank documents for the current query and predict the next query. LostNet consists of three modules: a hierarchical session-based attention mechanism, a personalized multi-hop memory network, and a DR and NQP network. The hierarchical session-based attention mechanism tracks fine-grained short-term intent from a user's search session. The personalized multi-hop memory network tracks a user's dynamic profile information from their search sessions so as to infer their personal search intent. The DR and NQP network reranks documents and predicts the next query synchronously based on outputs from the above two modules. We conduct experiments on two benchmark session search datasets. The results show that LostNet achieves significant improvements over state-of-the-art baselines. |
Title: Lorentzian Graph Convolutional Neural Networks |
Authors: Yiding Zhang (Beijing University of Posts and Telecommunications), Xiao Wang (Beijing University of Posts and Telecommunications), Chuan Shi (Beijing University of Posts and Telecommunications), Nian Liu (Beijing University of Posts and Telecommunications) and Guojie Song (Peking University). |
Graph convolutional networks (GCNs) have received considerable research attention recently. Most GCNs learn node representations in Euclidean geometry, which can incur high distortion when embedding graphs with scale-free or hierarchical structure. Recently, some GCNs have been proposed to deal with this problem in non-Euclidean geometry, e.g., hyperbolic geometry. Although hyperbolic GCNs achieve promising performance, existing hyperbolic graph operations do not rigorously follow hyperbolic geometry, which may limit the ability of hyperbolic geometry and thus hurt the performance of hyperbolic GCNs. In this paper, we propose a novel hyperbolic GCN named Lorentzian graph convolutional network (LGCN), which rigorously guarantees that the learned node features follow hyperbolic geometry. Specifically, we rebuild the graph operations of hyperbolic GCNs in a Lorentzian version, e.g., the feature transformation and non-linear activation. Also, an elegant neighborhood aggregation method is designed based on the centroid of the Lorentzian distance. Moreover, we prove that some of the proposed graph operations are equivalent in different types of hyperbolic geometry, which fundamentally indicates their correctness. Experiments on six datasets show that LGCN performs better than state-of-the-art methods. LGCN achieves lower distortion when learning representations of tree-like graphs than existing hyperbolic GCNs. We also find that the performance of some hyperbolic GCNs can be improved by simply replacing their graph operations with those we define in this paper. |
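The centroid-based aggregation has a convenient closed form in the Lorentz model: the weighted sum of points, renormalized back onto the hyperboloid. A sketch under unit negative curvature (names illustrative, not the paper's code):

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[..., 0] * y[..., 0] + (x[..., 1:] * y[..., 1:]).sum(-1)

def lorentzian_centroid(points, weights):
    """Weighted centroid minimizing squared Lorentzian distance.

    points: (n, d+1) points on the hyperboloid <x, x>_L = -1, x0 > 0.
    weights: (n,) non-negative aggregation weights.
    """
    s = (weights[:, None] * points).sum(axis=0)   # weighted sum
    # Project back onto the hyperboloid by Lorentzian normalization.
    return s / np.sqrt(np.abs(lorentz_inner(s, s)))

# Toy check: lift spatial coordinates onto the hyperboloid, then aggregate.
rng = np.random.default_rng(0)
v = rng.normal(scale=0.1, size=(4, 3))
x = np.concatenate([np.sqrt(1 + (v ** 2).sum(-1, keepdims=True)), v], axis=1)
c = lorentzian_centroid(x, np.ones(4) / 4)
assert abs(lorentz_inner(c, c) + 1) < 1e-9        # centroid stays on manifold
```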
Title: Mask-GVAE: Blind De-noising Graphs via Partition |
Authors: Jia Li (The Chinese University of Hong Kong), Mengzhou Liu (The Chinese University of Hong Kong), Honglei Zhang (Tianjin Univerisity), Pengyun Wang (Huawei Noah's Ark Lab), Yong Wen (Huawei Noah's Ark Lab), Lujia Pan (Huawei Noah's Ark Lab) and Hong Cheng (The Chinese University of Hong Kong). |
We present Mask-GVAE, a variational generative model for blind de-noising large discrete graphs, in which ``blind de-noising'' means we do not require any supervision from clean graphs. We focus on recovering clean graph structures by deleting irrelevant edges and adding missing edges, which has many applications in real-world scenarios, for example, enhancing the quality of connections in a co-authorship network. Mask-GVAE makes use of the robustness of the low eigenvectors of the graph Laplacian against random noise, and decomposes the input graph into several stable clusters. It then harnesses the huge computations by decoding probabilistic smoothed subgraphs in a variational manner. On a wide variety of benchmarks, Mask-GVAE outperforms competing approaches by a significant margin on PSNR and WL similarity. |
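The partition step rests on a standard observation: the smallest eigenvectors of the graph Laplacian are robust to random edge noise, which is also the basis of spectral clustering. A compact sketch using NumPy, SciPy, and scikit-learn (not the paper's implementation):

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

def stable_clusters(adj: np.ndarray, k: int) -> np.ndarray:
    """Partition a graph using the k smallest Laplacian eigenvectors."""
    L = laplacian(adj, normed=True)
    # Eigenvectors of the symmetric PSD Laplacian, smallest first.
    eigvals, eigvecs = np.linalg.eigh(L)
    embedding = eigvecs[:, :k]                     # (n, k) spectral embedding
    # Row-normalize, then cluster in the spectral space.
    embedding /= np.linalg.norm(embedding, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)

# Two dense blocks should be recovered as two clusters.
A = np.kron(np.eye(2), np.ones((5, 5))) - np.eye(10)
labels = stable_clusters(A, k=2)
```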
Title: Match Plan Generation in Web Search with Parameterized Action Reinforcement Learning |
Authors: Ziyan Luo (University of California, San Diego), Linfeng Zhao (Northeastern University), Wei Cheng (North China Electric Power University), Sihao Chen (University of California, Berkeley), Qi Chen (Microsoft Research Asia), Hui Xue (Microsoft Research Asia), Haidong Wang (Microsoft Asia), Chuanjie Liu (Microsoft Asia), Mao Yang (Microsoft Research Asia) and Lintao Zhang (Microsoft Research Asia). |
To achieve good result quality and short query response time, search engines use specific match plans on the inverted index to help retrieve a small set of relevant documents from billions of web pages. A match plan is composed of a sequence of match rules, which contain discrete match rule types and continuous stopping quotas. Currently, match plans are manually designed by experts according to several years' experience, which makes it difficult to deal with heterogeneous queries and varying data distributions. In this work, we formulate match plan generation as a Partially Observable Markov Decision Process (POMDP) with a parameterized action space, and propose a novel reinforcement learning algorithm, Parameterized Action Soft Actor-Critic (PASAC), to effectively enhance the exploration in both spaces. In our setting, we also discover a skewed prioritizing issue in the original Prioritized Experience Replay (PER) and introduce Stratified Prioritized Experience Replay (SPER) to address it. We are the first group to generalize this task for all queries as a learning problem with zero prior knowledge and successfully apply deep reinforcement learning in a real web search environment. Our approach greatly outperforms the well-designed production match plans, with over a 70% reduction in index block accesses while the quality of documents is almost unchanged, and a 9% reduction in query response time even with the model inference cost. Our method also beats the baselines on some open-source benchmarks (our code is available at https://github.com/RL-matchplangeneration/Match-Plan-Generation-in-Web-Search). |
Title: MATCH: Metadata-Aware Text Classification in A Large Hierarchy |
Authors: Yu Zhang (University of Illinois at Urbana-Champaign), Zhihong Shen (Microsoft Research), Yuxiao Dong (Microsoft Research), Kuansan Wang (Microsoft Research) and Jiawei Han (University of Illinois at Urbana-Champaign). |
Multi-label text classification refers to the problem of assigning each given document to its most relevant labels from the label set. Commonly, the metadata of the given documents and the hierarchy of the labels are available in real-world applications. However, most existing studies focus on only modeling the text information, with a few attempts to utilize a very small label hierarchy (up to several hundred labels). In this paper, we bridge the gap by formalizing the problem of metadata-aware text classification in a large hierarchy (orders of magnitude larger than previously used). To address this problem, we present the MATCH solution---an end-to-end framework that leverages both the document metadata and large-scale label hierarchy. To incorporate metadata, we pre-train the embeddings of text and metadata in the same space and also leverage the fully-connected attentions to capture the interrelations between them. To leverage the label hierarchy, we propose different ways to regularize the parameters and output probabilities of each child label by its parents. Extensive experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH over state-of-the-art deep learning baselines. |
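The parent-child regularization is described only at a high level; one plausible reading penalizes both the parameter distance between a child label and its parent and violations of probability monotonicity (a child should not be more probable than its parent). A hedged PyTorch sketch with illustrative names and weights:

```python
import torch

def hierarchy_regularizers(W, probs, parent, lambda_w=1e-3, lambda_p=1.0):
    """Two plausible stand-ins for parent-child regularization.

    W:      (L, d) per-label classifier weights.
    probs:  (batch, L) predicted label probabilities.
    parent: list where parent[c] is the parent of label c (-1 for roots).
    """
    children = [c for c in range(len(parent)) if parent[c] >= 0]
    pa = torch.tensor([parent[c] for c in children])
    ch = torch.tensor(children)

    # (1) Pull each child's parameters toward its parent's.
    reg_w = ((W[ch] - W[pa]) ** 2).sum()
    # (2) Penalize P(child) > P(parent) with a hinge on the violation.
    reg_p = torch.relu(probs[:, ch] - probs[:, pa]).mean()
    return lambda_w * reg_w + lambda_p * reg_p

# Usage inside a training step (illustrative shapes and hierarchy):
W = torch.randn(4, 8, requires_grad=True)
probs = torch.sigmoid(torch.randn(2, 4))
loss_reg = hierarchy_regularizers(W, probs, parent=[-1, 0, 0, 1])
```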
Title: Maximizing Marginal Fairness for Dynamic Learning to Rank |
Authors: Tao Yang (University of Utah) and Qingyao Ai (University of Utah). |
Rankings, especially those in search and recommendation systems, often determine how people access information and how information is exposed to people. Therefore, how to balance the relevance and fairness of information exposure is considered one of the key problems for modern IR systems. As conventional ranking frameworks that myopically sort documents by their relevance will inevitably introduce unfair result exposure, recent studies on ranking fairness mostly focus on dynamic ranking paradigms where result rankings can be adapted in real time to support fairness across groups (e.g., races or genders). Existing studies on fairness in dynamic learning to rank, however, often achieve the overall fairness of document exposure in ranked lists by significantly sacrificing the performance of result relevance and fairness on the top results. To address this problem, we propose a fair and unbiased ranking method named Maximal Marginal Fairness (MMF). The algorithm integrates unbiased estimators for both relevance and merit-based fairness, while providing an explicit controller that balances the selection of documents to maximize the marginal relevance and fairness in top-k results. Theoretical and empirical analysis shows that, with small compromises on space complexity and long-list fairness, our method achieves superior efficiency and effectiveness compared to the state-of-the-art algorithms in both relevance and fairness for top-k rankings. |
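The explicit relevance/fairness controller is reminiscent of maximal marginal relevance: greedily fill each of the top-k slots with the document maximizing a convex combination of relevance and a marginal fairness gain. A simplified sketch in which the λ trade-off and the fairness gain are illustrative, and the paper's unbiased estimators are omitted:

```python
def mmf_topk(relevance, groups, k, lam=0.5):
    """Greedy top-k: trade off marginal relevance against marginal fairness.

    relevance: dict doc -> estimated relevance.
    groups:    dict doc -> group id (e.g., a demographic group).
    The fairness gain rewards picking documents from under-exposed groups.
    """
    exposure = {}                     # group -> slots it holds so far
    selected = []
    candidates = set(relevance)
    for _ in range(min(k, len(candidates))):
        def marginal(d):
            fairness_gain = 1.0 / (1 + exposure.get(groups[d], 0))
            return lam * relevance[d] + (1 - lam) * fairness_gain
        best = max(candidates, key=marginal)
        selected.append(best)
        candidates.remove(best)
        exposure[groups[best]] = exposure.get(groups[best], 0) + 1
    return selected

# Example: two groups with skewed relevance.
rel = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.3}
grp = {"a": 0, "b": 0, "c": 0, "d": 1}
print(mmf_topk(rel, grp, k=3))
```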
Title: MedPath: Augmenting Health Risk Prediction via Medical Knowledge Paths |
Authors: Muchao Ye (Pennsylvania State University), Suhan Cui (Northeastern University China), Yaqing Wang (SUNY at Buffalo), Junyu Luo (Pennsylvania State University), Cao Xiao (IQVIA) and Fenglong Ma (Pennsylvania State University). |
The broad adoption of electronic health record (EHR) data and the availability of biomedical knowledge graphs (KGs) on the web have provided clinicians and researchers with unprecedented resources and opportunities for conducting health risk predictions to improve healthcare quality and medical resource allocation. Existing methods have focused on improving EHR feature representations using attention mechanisms, time-aware models, or external knowledge. However, they ignore the importance of using personalized information to make predictions. Besides, the reliability of their prediction interpretations needs to be improved, since their interpretable attention scores are not explicitly reasoned from disease progression paths. In this paper, we propose MedPath to solve these challenges and augment existing risk prediction models with the ability to use personalized information and provide reliable interpretations inferred from disease progression paths. First, MedPath extracts personalized knowledge graphs (PKGs) containing all possible disease progression paths from observed symptoms to target diseases from a large-scale online medical knowledge graph. Next, to augment existing EHR encoders for achieving better predictions, MedPath learns a PKG embedding by conducting multi-hop message passing from symptom nodes to target disease nodes through a graph neural network encoder. Since MedPath reasons about disease progression via paths in PKGs, it can provide explicit explanations for the prediction by pointing out how observed symptoms can finally lead to target diseases. Experimental results on three real-world medical datasets show that MedPath is effective in improving the performance of eight state-of-the-art methods, with higher F1 scores and AUCs. Our case study also demonstrates that MedPath can greatly improve the explicitness of risk prediction interpretation. |
Title: Meta-HAR: Federated Representation Learning for Human Activity Recognition |
Authors: Chenglin Li (University of Alberta), Di Niu (University of Alberta), Bei Jiang (Department of Mathematical and Statistical Sciences, University of Alberta), Xiao Zuo (Tencent) and Jianming Yang (Platform and Content Group, Tencent, Shenzhen, China). |
Human activity recognition (HAR) based on mobile sensors plays an important role in ubiquitous computing. However, the rise of data regulatory constraints precludes collecting private and labeled signal data from personal devices at scale. Thanks to the growth of computational power on mobile devices, federated learning has emerged as a decentralized alternative for model training, which iteratively aggregates locally updated models into a shared global model, and is therefore able to leverage decentralized, private data without central collection. However, the effectiveness of federated learning for HAR is affected by the fact that each user has different activity types and even a different signal distribution for the same activity type. Furthermore, it is uncertain whether a single trained global model can generalize well to individual users or to new users with heterogeneous data. In this paper, we propose Meta-HAR, a federated representation learning framework, in which a signal embedding network is meta-learned in a federated manner, while the learned signal representations are further fed into a personalized classification network at each user for activity prediction. In order to boost the representation ability of the embedding network, we treat the HAR problem at each user as a different task and train the shared embedding network through a Model-Agnostic Meta-learning framework, such that the embedding network can generalize to any individual user. Personalization is further achieved on top of the robustly learned representations in an adaptation procedure. We conducted extensive experiments based on two publicly available HAR datasets as well as a newly created HAR dataset. Results verify that Meta-HAR is effective at maintaining high test accuracies for individual users, including new users, and significantly outperforms several baselines, including Federated Averaging, Reptile, and even centralized learning in certain cases. Our collected dataset will be open-sourced to facilitate future development in the field of sensor-based human activity recognition. |
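A first-order flavor of a federated meta-learning round can be sketched with Reptile-style updates (Reptile is one of the paper's baselines; Meta-HAR itself uses MAML, which also differentiates through the inner loop). Names, and folding the classification head into `embed_net`, are illustrative:

```python
import copy
import torch

def federated_meta_round(embed_net, user_loaders, inner_lr=0.01,
                         meta_lr=0.1, inner_steps=5):
    """One federated round of Reptile-style meta-learning.

    Each user adapts a copy of the shared network on local data; the
    server then moves the shared weights toward the adapted ones.
    Assumes a buffer-free network such as a plain MLP.
    """
    global_state = copy.deepcopy(embed_net.state_dict())
    deltas = {k: torch.zeros_like(v) for k, v in global_state.items()}

    for loader in user_loaders:                    # each user = one task
        local = copy.deepcopy(embed_net)
        opt = torch.optim.SGD(local.parameters(), lr=inner_lr)
        for _, (x, y) in zip(range(inner_steps), loader):
            loss = torch.nn.functional.cross_entropy(local(x), y)
            opt.zero_grad(); loss.backward(); opt.step()
        for k, v in local.state_dict().items():
            deltas[k] += (v - global_state[k]) / len(user_loaders)

    # Meta-update: interpolate toward the average adapted weights.
    embed_net.load_state_dict(
        {k: global_state[k] + meta_lr * deltas[k] for k in global_state})

net = torch.nn.Sequential(torch.nn.Linear(6, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 4))
data = [[(torch.randn(8, 6), torch.randint(0, 4, (8,)))] * 5 for _ in range(3)]
federated_meta_round(net, data)
```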
Title: MicroRank: End-to-End Latency Issue Localization with Extended Spectrum Analysis in Microservice Environments |
Authors: Guangba Yu (Sun Yat-Sen University), Pengfei Chen (Sun Yat-sen University), Hongyang Chen (Sun Yat-sen University), Zijie Guan (Tencent), Zicheng Huang (Sun Yat-sen University), Linxiao Jing (Sun Yat-sen University), Tianjun Weng (Sun Yat-Sen University), Xinmeng Sun (Sun Yat-Sen University) and Xiaoyun Li (Sun Yat-sen University). |
With the advantages of strong scalability and fast delivery, microservice has become a popular software architecture in the modern IT industry. Most microservice faults manifest themselves as service latency increases and impact user experience. The explosion in the number of service instances and the complex dependencies among them make application diagnosis extremely challenging. To help understand and troubleshoot a microservice system, end-to-end tracing technology has been widely applied to capture the execution path of each request. However, tracing data are not fully leveraged by cloud and application providers when conducting latency issue localization in microservice environments. This paper proposes a novel system, named MicroRank, which analyzes clues provided by normal and abnormal traces to locate root causes of latency issues. Once a latency issue is detected by the Anomaly Detector in MicroRank, the cause localization procedure is triggered. MicroRank first distinguishes which traces are abnormal. Then, MicroRank's PageRank Scorer module uses the abnormal and normal trace information as its input and differentiates the importance of different traces for extended spectrum techniques. Finally, the spectrum techniques calculate a ranking list based on the weighted spectrum information from the PageRank Scorer to locate root causes more effectively. Experimental evaluations on a widely-used open-source system and a production system show that MicroRank achieves excellent results not only in single-root-cause situations but also when two issues happen at the same time. Moreover, MicroRank achieves a 6% to 22% improvement in recall in localizing root causes compared to current state-of-the-art methods. |
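Spectrum-based localization scores each operation by how concentrated it is in failing executions; weighting traces, e.g., by a PageRank-style score, is the "extended" part. A sketch with the classic Ochiai formula and externally supplied trace weights (illustrative, not MicroRank's exact formulas):

```python
import math

def weighted_ochiai(traces, weights):
    """Rank operations by a trace-weighted Ochiai suspiciousness score.

    traces:  list of (set_of_operations, is_abnormal) per request trace.
    weights: per-trace importance, e.g., from a PageRank-style scorer.
    """
    ops = set().union(*(t for t, _ in traces))
    total_fail = sum(w for (_, bad), w in zip(traces, weights) if bad)
    scores = {}
    for op in ops:
        ef = sum(w for (t, bad), w in zip(traces, weights) if bad and op in t)
        ep = sum(w for (t, bad), w in zip(traces, weights) if not bad and op in t)
        denom = math.sqrt(total_fail * (ef + ep)) or 1.0
        scores[op] = ef / denom
    return sorted(scores.items(), key=lambda kv: -kv[1])

traces = [({"A", "B"}, True), ({"B", "C"}, False), ({"A", "C"}, True)]
print(weighted_ochiai(traces, weights=[1.0, 0.5, 0.8]))
```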
Title: Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks |
Authors: Xinyang Zhang (University of Illinois at Urbana-Champaign), Chenwei Zhang (Amazon), Xin Luna Dong (Amazon), Jingbo Shang (University of California San Diego) and Jiawei Han (University of Illinois at Urbana-Champaign). |
Text categorization is an essential task in Web content analysis. Considering the ever-evolving Web data and new emerging categories, instead of the laborious supervised setting, in this paper, we focus on the minimally-supervised setting that aims to categorize documents effectively, with a couple of seed documents annotated per category. We recognize that texts collected from the Web are often structure-rich, i.e., accompanied by various metadata. One can easily organize the corpus into a text-rich network, joining raw text documents with document attributes, high-quality phrases, label surface names as nodes, and their associations as edges. Such a network provides a holistic view of the corpus' heterogeneous data sources and enables a joint optimization for network-based analysis and deep textual model training. We therefore propose a novel framework for minimally supervised categorization by learning from the text-rich network. Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning. Each module generates pseudo training labels from the unlabeled document set, and both modules mutually enhance each other by co-training using pooled pseudo labels. We test our model on two real-world datasets. On the challenging e-commerce product categorization dataset with 683 categories, our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%, significantly outperforming all compared methods; our accuracy is only less than 2% away from the supervised BERT model trained on about 50K labeled documents. |
Title: Mining Dual Emotion for Fake News Detection |
Authors: Xueyao Zhang (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Juan Cao (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Xirong Li (Renmin University of China), Qiang Sheng (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Lei Zhong (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences) and Kai Shu (Illinois Institute of Technology). |
Emotion plays an important role in detecting fake news online. When leveraging emotional signals, existing methods focus on exploiting the emotions of news content conveyed by the publishers (i.e., publisher emotion). However, fake news often evokes high-arousal or activating emotions in people, so the emotions of news comments aroused in the crowd (i.e., social emotion) should not be ignored. Furthermore, it remains to be explored whether there exists a relationship between publisher emotion and social emotion (i.e., dual emotion), and how the dual emotion appears in fake news. In this paper, we verify that dual emotion is distinctive between fake and real news, and propose Dual Emotion Features to represent dual emotion and the relationship between its components for fake news detection. Further, we show that our proposed features can be easily plugged into existing fake news detectors as an enhancement. Extensive experiments on three real-world datasets (one in English and the others in Chinese) show that our proposed feature set: 1) outperforms the state-of-the-art task-related emotional features; and 2) is compatible with existing fake news detectors and effectively improves the performance of detecting fake news. |
Title: MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks |
Authors: Yusi Zhang (Microsoft Bing Platform), Chuanjie Liu (Microsoft Bing Platform), Angen Luo (Microsoft Bing Platform), Hui Xue (Microsoft Research), Xuan Shan (Microsoft Bing Platform), Yuxiang Luo (Microsoft Bing Platform), Yiqian Xia (Microsoft Bing Platform), Yuanchi Yan (Microsoft Bing Platform) and Haidong Wang (Microsoft Bing Platform). |
We study the problem of deep recall models in industrial web search: given a user query, retrieve hundreds of the most relevant documents from billions of candidates. The common framework is to encode queries and documents separately into distributed representations and match them in a latent semantic space. However, all existing deep encoding models only leverage the information of the document itself, which is often not sufficient in practice when matching with query terms, especially for hard tail queries. In this work we aim to leverage additional information for documents from their co-click neighbours to help document retrieval. The challenges include how to effectively extract information and eliminate noise when involving co-click information, while meeting the demands of industrial scalability for real-time online serving. To handle the noise in co-click relations, we first propose a web-scale Multi-Intention Co-click document Graph (MICG) which builds the co-click connections between documents at the click-intention level rather than at the document level. Then we present an encoding framework MIRA based on BERT and graph attention networks, which leverages a two-factor attention mechanism to aggregate neighbours. To meet the online latency requirements, we only involve neighbour information on the document side, which saves the time-consuming query neighbour search in real-time serving. We conduct extensive offline experiments on two public datasets and one private web-scale dataset from major commercial search engines (Bing, https://www.bing.com/, and Sogou, https://www.sogou.com/), demonstrating the effectiveness and scalability of the proposed method compared with several baselines. A further case study reveals that co-click relations mainly help improve web search quality in two aspects: key concept enhancement and query term complementation. |
Title: Mitigating Gender Bias in Captioning Systems |
Authors: Ruixiang Tang (Texas A&M University), Mengnan Du (Texas A&M University), Yuening Li (Texas A&M University), Zirui Liu (Texas A&M University), Na Zou (Texas A&M University) and Xia Hu (Texas A&M University). |
Image captioning has made substantial progress with huge supporting image collections sourced from the web. However, recent studies have pointed out that captioning datasets, such as COCO, contain the gender bias found in web corpora. As a result, learning models could rely heavily on the learned priors and image context for gender identification, leading to incorrect or even offensive errors. To encourage models to learn correct gender features, we reorganize the COCO dataset and present two new splits, the COCO-GB v1 and v2 datasets, where the train and test sets have different gender-context joint distributions. Models relying on contextual cues will suffer large gender prediction errors on the anti-stereotypical test data. Benchmarking experiments reveal that most captioning models learn gender bias, leading to high gender prediction errors, especially for women. To alleviate the unwanted bias, we propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence. Experimental results validate that GAIC can significantly reduce gender prediction errors while maintaining competitive caption quality. Our code and the designed benchmark datasets are available at https://github.com/CaptionGenderBias2020. |
Title: Mixed-Curvature Multi-relational Graph Neural Network for Knowledge Graph Completion |
Authors: Shen Wang (UIC), Xiaokai Wei (Amazon), Cicero Nogueira dos Santos (Amazon), Zhiguo Wang (Amazon), Ramesh Nallapati (Amazon), Andrew Arnold (Amazon), Bing Xiang (Amazon), Isabel F. Cruz (UIC) and Philip S. Yu (UIC). |
Knowledge graphs (KGs) have gradually become valuable assets for many AI applications. In a KG, a node denotes an entity, and an edge (or link) denotes a relationship between the entities represented by the nodes. Knowledge graph completion infers and predicts missing edges in a KG automatically. Knowledge graph embeddings have shed light on addressing this task. Recent research embeds KGs in hyperbolic (negatively curved) space instead of conventional Euclidean (zero curved) space and has been effective in capturing hierarchical structures. However, as a multi-relational graph, KGs are not structured uniformly and display intrinsic heterogeneous structures. They usually contain rich types of structures, such as hierarchical or cyclical structure. Embedding KGs in single-curvature space, such as hyperbolic or Euclidean space, overlooks the intrinsic heterogeneous structures of KGs, and therefore cannot accurately capture their structure. To address this issue, we propose a Mixed-Curvature Multi-Relational Graph Neural Network (M2GNN), a generic approach that embeds multi-relational KGs in mixed-curvature space. Specifically, we define and construct a mixed-curvature space through a product manifold combining multiple single-curvature spaces (e.g., spherical, hyperbolic, or Euclidean) with the purpose of modeling a variety of structures. However, constructing a mixed-curvature space typically requires manually defining the fixed curvatures, which requires domain knowledge and additional data analysis. Improperly defined curvature spaces also cannot capture the structure of KGs accurately. To overcome this problem, we adopt trainable curvatures to better capture the underlying structure of the KGs. Furthermore, we propose a Graph Neural Updater by leveraging the heterogeneous relational context in mixed-curvature space to improve the quality of the embedding. Experiments on three KG datasets demonstrate that the proposed M2GNN can outperform its single-geometry counterpart as well as state-of-the-art embedding methods on the KG completion task. |
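Distances in a product manifold decompose across the factor spaces, which is what lets a single embedding mix spherical, hyperbolic, and Euclidean geometry. A sketch with curvatures fixed at ±1 for brevity (M2GNN instead learns the curvatures; names illustrative):

```python
import numpy as np

def dist_euclidean(x, y):
    return np.linalg.norm(x - y)

def dist_sphere(x, y):
    """Geodesic distance on the unit sphere (inputs assumed unit-norm)."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def dist_poincare(x, y):
    """Geodesic distance in the Poincare ball (inputs with norm < 1)."""
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + 2 * sq / denom)

def product_distance(xs, ys, dists):
    """Product-manifold distance: sqrt of the sum of squared factor
    distances, one factor per (component, metric) pair."""
    return np.sqrt(sum(d(x, y) ** 2 for x, y, d in zip(xs, ys, dists)))

xs = [np.array([1.0, 0.0]), np.array([0.1, 0.2]), np.array([3.0, 4.0])]
ys = [np.array([0.0, 1.0]), np.array([-0.2, 0.1]), np.array([0.0, 0.0])]
print(product_distance(xs, ys, [dist_sphere, dist_poincare, dist_euclidean]))
```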
Title: Mixup for Node and Graph Classification |
Authors: Yiwei Wang (National University of Singapore), Wei Wang (National University of Singapore), Yuxuan Liang (National University of Singapore), Yujun Cai (Nanyang Technological University) and Bryan Hooi (National University of Singapore). |
Mixup is an advanced data augmentation method for training neural network based image classifiers, which interpolates both the features and labels of a pair of images to produce synthetic samples. However, devising Mixup methods for graph learning is challenging due to the irregularity and connectivity of graph data. In this paper, we propose Mixup methods for two fundamental tasks in graph learning: node and graph classification. To interpolate the irregular graph topology, we propose a two-branch graph convolution to mix the receptive field subgraphs for the paired nodes. Mixup on different node pairs can interfere with each other's mixed features due to the connectivity between nodes. To block this interference, we propose a two-stage Mixup framework, which uses each node's neighbors' representations before Mixup for graph convolutions. For graph classification, we interpolate complex and diverse graphs in the semantic space. Qualitatively, our Mixup methods enable GNNs to learn more discriminative features and reduce over-fitting. Quantitative results show that our method yields consistent gains in terms of test accuracy and F1-micro scores on standard datasets, for both node and graph classification. Overall, our method effectively regularizes popular graph neural networks for better generalization without increasing their time complexity. |
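The interpolation at the heart of node-level Mixup is easy to state in isolation; the sketch below mixes paired node representations and one-hot labels with a Beta-distributed coefficient, omitting the paper's two-branch convolution and two-stage machinery for handling connectivity (names illustrative):

```python
import torch

def node_mixup(h, y, num_classes, alpha=1.0):
    """Interpolate paired node representations and their one-hot labels.

    h: (n, d) node representations (e.g., receptive-field encodings).
    y: (n,) integer class labels.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(h.size(0))            # random pairing of nodes
    h_mix = lam * h + (1 - lam) * h[perm]
    y_onehot = torch.nn.functional.one_hot(y, num_classes).float()
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return h_mix, y_mix

h = torch.randn(6, 16)
y = torch.tensor([0, 1, 2, 0, 1, 2])
h_mix, y_mix = node_mixup(h, y, num_classes=3)
```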
Title: Modeling Human Motives and Emotions from Personal Narratives Using External Knowledge And Entity Tracking |
Authors: Prashanth Vijayaraghavan (MIT) and Deb Roy (MIT). |
The ability to automatically understand and infer characters' motivations and their emotional states is key towards better narrative comprehension. In this work, we propose a Transformer-based architecture to model characters' motives and emotions from personal narratives. Towards this goal, we incorporate social commonsense knowledge about the mental states of people related to social events and employ dynamic state tracking of entities using an augmented memory module. Our model learns to produce contextual embeddings and explanations of characters' mental states by integrating external knowledge along with prior narrative context and mental state encodings. We leverage weakly-annotated personal narratives and knowledge data to train our model and demonstrate its effectiveness on the publicly available STORYCOMMONSENSE dataset containing annotations for character mental states. Further, we show that the learnt mental state embeddings can be applied in downstream tasks such as empathetic response generation. |
Title: Modeling Sparse Information Diffusion at Scale via Lazy Multivariate Hawkes Processes |
Authors: Maximilian Nickel (Facebook AI Research) and Matthew Le (Facebook AI Research). |
Multivariate Hawkes Processes (MHPs) are an important class of temporal point processes that have enabled key advances in understanding and predicting social information systems. However, due to their complex modeling of temporal dependencies, MHPs have proven to be notoriously difficult to scale, which has limited their applications to relatively small domains. In this work, we propose a novel model and computational approach to overcome this important limitation. By exploiting a characteristic sparsity pattern in real-world diffusion processes, we show that our approach allows us to compute the exact likelihood and gradients of an MHP -- independently of the ambient dimensions of the underlying network. We show on synthetic and real-world datasets that our model not only achieves state-of-the-art predictive results, but also improves runtime performance by multiple orders of magnitude compared to standard methods on sparse event sequences. In combination with easily interpretable latent variables and influence structures, this allows us to analyze diffusion processes at previously unattainable scale. |
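For exponential kernels the MHP log-likelihood has a closed form that can be evaluated with a simple recursion; the paper's contribution is making this scale via sparsity, which the dense reference sketch below deliberately does not do (names illustrative):

```python
import numpy as np

def mhp_loglik(times, dims, mu, alpha, omega, T):
    """Exact log-likelihood of a multivariate Hawkes process with
    exponential kernels: lambda_i(t) = mu_i + sum over past events j of
    alpha[i, d_j] * omega * exp(-omega * (t - t_j)).

    times: (n,) increasing event times; dims: (n,) event dimensions.
    """
    decay_state = np.zeros(len(mu))  # sum of alpha[., d_j] * exp decay terms
    last_t = 0.0
    loglik = 0.0
    for t, d in zip(times, dims):
        decay_state *= np.exp(-omega * (t - last_t))
        loglik += np.log(mu[d] + omega * decay_state[d])
        decay_state += alpha[:, d]   # an event in dim d excites every dim i
        last_t = t
    # Compensator: integral of all intensities over [0, T], closed form.
    loglik -= mu.sum() * T
    for t, d in zip(times, dims):
        loglik -= alpha[:, d].sum() * (1 - np.exp(-omega * (T - t)))
    return loglik

rng = np.random.default_rng(1)
times = np.sort(rng.uniform(0, 10, size=20)); dims = rng.integers(0, 2, 20)
print(mhp_loglik(times, dims, mu=np.ones(2) * 0.5,
                 alpha=np.full((2, 2), 0.2), omega=1.0, T=10.0))
```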
Title: Motif-driven Dense Subgraph Discovery in Directed and Labeled Networks |
Authors: A. Erdem Sarıyüce (University at Buffalo). |
Dense regions in networks are an indicator of interesting and unusual information. However, most existing methods only consider simple, undirected, unweighted networks. Complex networks in the real world often have rich information, though: edges are asymmetrical and nodes/edges have categorical and numerical attributes. Finding dense subgraphs in such networks in accordance with this rich information is an important problem with many applications. Furthermore, most existing algorithms ignore the higher-order relationships (i.e., motifs) among the nodes. Motifs have been shown to be helpful for dense subgraph discovery, but their wide spectrum in heterogeneous networks makes it challenging to utilize them effectively. In this work, we propose the quark decomposition framework to locate dense subgraphs that are rich in a given motif. We focus on networks with directed edges and categorical attributes on nodes/edges. For a given motif, our framework builds subgraphs, called quarks, of varying quality and with hierarchical relations. Our framework is versatile, efficient, and extendible. We discuss the limitations and practical instantiations of our framework as well as the role confusion problem that needs to be considered in directed networks. We give an extensive evaluation of our framework on directed, signed-directed, and node-labeled networks. We consider various motifs and evaluate the quark decomposition using several real-world networks. Results show that quark decomposition performs better than the state-of-the-art techniques. Our framework is also practical and scalable to networks with up to 101M edges. |
Title: Motif-Preserving Dynamic Attributed Network Embedding |
Authors: Zhijun Liu (Yantai University), Chao Huang (University of Notre Dame), Yanwei Yu (Ocean University of China) and Junyu Dong (Ocean University of China). |
Network embedding has emerged as a new learning paradigm to embed complex networks into a low-dimensional vector space while preserving node proximities in both network structures and properties. It advances various network mining tasks, ranging from link prediction to node classification. However, most existing works primarily focus on static networks, while many real-life networks evolve over time with the addition/deletion of links and nodes, naturally with associated attribute evolution. In this work, we present the Motif-preserving Temporal Shift Network (MTSN), a novel dynamic network embedding framework that simultaneously models the local high-order structures and temporal evolution for dynamic attributed networks. Specifically, MTSN learns node representations by stacking the proposed TIME module, which captures local high-order structural proximities and node attributes via a motif-preserving encoder, and temporal dynamics via a temporal shift operation, in a dynamic attributed network. Finally, we perform extensive experiments on four real-world network datasets to demonstrate the superiority of MTSN against state-of-the-art network embedding baselines in terms of both effectiveness and efficiency. |
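A temporal shift operation, in the spirit of temporal shift modules, moves a fraction of feature channels one step backward or forward along the time axis so that each snapshot sees its neighbors' features at no extra parameter cost. A sketch on a (time, nodes, channels) tensor (fold size illustrative):

```python
import torch

def temporal_shift(x: torch.Tensor, fold: int = 8) -> torch.Tensor:
    """Shift channel groups along the time axis of node embeddings.

    x: (T, N, C) embeddings for T snapshots of N nodes.
    The first C/fold channels move one step back in time, the next
    C/fold move one step forward, and the rest stay in place.
    """
    T, N, C = x.shape
    g = C // fold
    out = torch.zeros_like(x)
    out[:-1, :, :g] = x[1:, :, :g]            # shift toward the past
    out[1:, :, g:2 * g] = x[:-1, :, g:2 * g]  # shift toward the future
    out[:, :, 2 * g:] = x[:, :, 2 * g:]       # untouched channels
    return out

x = torch.randn(4, 10, 16)   # 4 snapshots, 10 nodes, 16 channels
y = temporal_shift(x)
```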
Title: MQuadE: a Unified Model for Knowledge Fact Embedding |
Authors: Jinxing Yu (Baidu Research), Yunfeng Cai (Baidu Research), Mingming Sun (Baidu Research) and Ping Li (Baidu Research). |
The methodology of knowledge graph embedding (KGE) tries to find appropriate representations for entities and relations, and appropriate mathematical computations between the representations, to approximate the symbolic and logical relationships between entities. One major challenge for knowledge graph embedding is that the relations in real-world knowledge bases exhibit very complex behaviors: they can be injective (1-1) or non-injective (1-N, N-1, or N-N), symmetric or skew-symmetric; one relation may be the inversion of another relation; one relation may be the composition of two other relations (where the composition can be either Abelian or non-Abelian). However, to our knowledge, there has been no theoretical guarantee that these complex behaviors can be modeled by existing KGE methods. In this paper, we propose a method called MQuadE to solve the problem. In MQuadE, we represent a fact triple $(h, r, t)$ (that is, (head entity, relation, tail entity)) in the knowledge graph with a matrix quadruple $(\bm{H}, \bm{R}, \hat{\bm{R}}, \bm{T})$, where $\bm{H}$ and $\bm{T}$ are the representations of $h$ and $t$ respectively, and $\bm{R}$ and $\hat{\bm{R}}$ are the representations of the relation $r$. |
Title: MSTREAM: Fast Anomaly Detection in Multi-Aspect Streams |
Authors: Siddharth Bhatia (National University of Singapore), Arjit Jain (IIT Bombay), Pan Li (Purdue University), Ritesh Kumar (IIT Kanpur) and Bryan Hooi (National University of Singapore). |
Given a stream of entries in a multi-aspect data setting, i.e., entries having multiple dimensions, how can we detect anomalous activities in an unsupervised manner? For example, in the intrusion detection setting, existing work seeks to detect anomalous events or edges in dynamic graph streams, but this does not allow us to take into account additional attributes of each entry. Our work aims to define a streaming multi-aspect data anomaly detection framework, termed MSTREAM, which can detect unusual group anomalies as they occur, in a dynamic manner. MSTREAM has the following properties: (a) it detects anomalies in multi-aspect data including both categorical and numeric attributes; (b) it is online, thus processing each record in constant time and constant memory; (c) it can capture the correlation between multiple aspects of the data. MSTREAM is evaluated over the KDDCUP99, CICIDS-DoS, UNSW-NB15 and CICIDS-DDoS datasets, and outperforms state-of-the-art baselines. |
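As a stripped-down illustration of constant-time, constant-memory scoring, the sketch below hashes each record's aspect-value pairs into shared count buckets and flags records whose recent bucket counts deviate from their long-run expectation; the real MSTREAM also models correlations across aspects and treats numeric attributes more carefully (names and the lazy decay are illustrative):

```python
from collections import defaultdict

class StreamScorer:
    """Constant-time, constant-memory anomaly scores for streaming records."""
    def __init__(self, num_buckets=1024, decay=0.95):
        self.num_buckets = num_buckets
        self.decay = decay                   # forgets old behavior
        self.current = defaultdict(float)    # recent bucket counts
        self.total = defaultdict(float)      # long-run bucket counts
        self.n = 0.0

    def _bucket(self, aspect, value):
        if isinstance(value, float):
            value = round(value, 1)          # crude numeric discretization
        return hash((aspect, value)) % self.num_buckets

    def score(self, record):
        """record: dict aspect -> value. Higher score = more anomalous."""
        self.n += 1
        s = 0.0
        for aspect, value in record.items():
            b = self._bucket(aspect, value)
            # Decay applied lazily when a bucket is touched (sketch only).
            self.current[b] = self.current[b] * self.decay + 1
            self.total[b] += 1
            expected = self.total[b] / self.n
            # Chi-squared-style deviation of recent count from expectation.
            s += (self.current[b] - expected) ** 2 / max(expected, 1e-9)
        return s

scorer = StreamScorer()
print(scorer.score({"src_ip": "10.0.0.1", "bytes": 512.0, "proto": "tcp"}))
```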
Title: MulDE: Multi-teacher Knowledge Distillation for Low-dimensional Knowledge Graph Embeddings |
Authors: Kai Wang (School of Software, Dalian University of Technology), Yu Liu (School of Software, Dalian University of Technology), Qian Ma (School of Software, Dalian University of Technology) and Quan Z. Sheng (Department of Computing, Macquarie University). |
Link prediction based on knowledge graph embeddings (KGE) aims to predict new triples to automatically construct knowledge graphs (KGs). However, recent KGE models achieve performance improvements by excessively increasing the embedding dimensions, which may cause enormous training costs and require more storage space. To address this challenge, we first conduct a theoretical analysis of the capacity of low-dimensional space for KG embeddings based on the principle of minimum entropy. Instead of training high-dimensional models, we propose MulDE, a novel knowledge distillation framework, which includes multiple low-dimensional hyperbolic KGE models as teachers and two student components, Junior and Senior. Under a novel iterative distillation strategy, the Junior component, a low-dimensional KGE model, actively queries the teachers based on its preliminary prediction results, and the Senior component adaptively integrates the teachers' knowledge to train the Junior component based on two mechanisms: relation-specific scaling and contrast attention. The experimental results show that MulDE can effectively improve the performance and training speed of low-dimensional KGE models. The distilled 32-dimensional model is competitive with state-of-the-art high-dimensional methods on several commonly-used datasets. The source code of our work is available on GitHub. |
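A hedged sketch of a multi-teacher distillation loss: the student fits hard labels plus teacher distributions, with each teacher weighted by its confidence in the true entity as a crude stand-in for contrast attention (temperature and names illustrative, not MulDE's exact mechanisms):

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_scores, teacher_scores_list, labels, T=2.0):
    """Distill several teachers into one low-dimensional student.

    student_scores: (batch, num_entities) link-prediction scores.
    teacher_scores_list: list of (batch, num_entities) teacher scores.
    """
    hard = F.cross_entropy(student_scores, labels)
    log_p_s = F.log_softmax(student_scores / T, dim=-1)
    weights, kls = [], []
    for t_scores in teacher_scores_list:
        p_t = F.softmax(t_scores / T, dim=-1)
        # Teacher confidence in the true entity drives its weight.
        weights.append(p_t.gather(1, labels[:, None]).mean())
        kls.append(F.kl_div(log_p_s, p_t, reduction="batchmean"))
    w = torch.stack(weights); w = w / w.sum()
    soft = (w * torch.stack(kls)).sum() * (T * T)
    return hard + soft

labels = torch.tensor([0, 3])
student = torch.randn(2, 5); teachers = [torch.randn(2, 5) for _ in range(3)]
loss = multi_teacher_kd_loss(student, teachers, labels)
```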
Title: Multi-domain Dialogue State Tracking with Recursive Inference |
Authors: Lizi Liao (National University of Singapore), Tongyao Zhu (National University of Singapore), Le Hong Long (National University of Singapore) and Tat Seng Chua (National University of Singapore). |
Multi-domain dialogue state tracking (DST) is a critical component for monitoring user goals during the course of an interaction. Existing approaches either rely on the dialogue history indiscriminately or update incrementally on the most recent turns. However, whether modeled with a fixed ontology or an open vocabulary, the former setting violates the interactive and progressing nature of dialogue, while the latter is easily affected by the error-accumulation conundrum. Here, we propose a Recursive Inference mechanism (ReInf) to resolve DST in multi-domain scenarios that call for more robust and accurate tracking capability. Specifically, our agent reviews the dialogue history in reverse until it has confidently pinpointed sufficient turns for slot value prediction. It also recursively factors in potential dependencies among domains and slots to further solve the co-reference and value sharing problems. The quantitative and qualitative experimental results on the MultiWOZ 2.1 corpus demonstrate that the proposed ReInf not only outperforms the state-of-the-art methods, but also achieves reasonable turn reference and interpretable slot co-reference. |
Title: Multi-level Connection Enhanced Representation Learning for Script Event Prediction |
Authors: Lihong Wang (CNCERT), Juwei Yue (Beijing Advanced Innovation Center of Big data and Brain Computing, Beihang University), Shu Guo (CNCERT), Jiawei Sheng (Institute of Information Engineering, China Academy of Sciences), Qianren Mao (Beijing Advanced Innovation Center of Big data and Brain Computing, Beihang University), Zhenyu Chen (Beijing Advanced Innovation Center of Big data and Brain Computing, Beihang University), Shenghai Zhong (Beijing Advanced Innovation Center of Big data and Brain Computing, Beihang University) and Chen Li (Beijing Advanced Innovation Center of Big data and Brain Computing, Beihang University). |
Script event prediction (SEP) aims to choose the correct subsequent event from a candidate list, given a chain of ordered context events. Event representation learning has been proposed and successfully applied to this task. Most previous representation learning methods mainly focus on coarse-grained connections at the event or chain level, while ignoring more fine-grained connections between events. Here we propose a novel framework which enhances the representation learning of events by mining their connections at multiple granularity levels, including the argument level, event level and chain level. In our method, we first employ a masked self-attention mechanism to model the relations between the components of events (i.e., arguments). Then, a directed graph convolutional network is further utilized to model the temporal or causal relations between events in the chain. Finally, we introduce an attention module over the context event chain, so as to dynamically aggregate context events with respect to the current candidate event. By fusing these threefold connections in a unified framework, our approach learns more accurate argument/event/chain representations, and thus leads to better prediction performance. Comprehensive experimental results on the public New York Times corpus demonstrate that our model outperforms other state-of-the-art baselines. |
Title: Multi-level Hyperedge Distillation for Social Linking Prediction on Sparsely Observed Networks |
Authors: Xiangguo Sun (Southeast University), Bo Liu (Southeast University), Hongxu Chen (University of Technology Sydney, Australia), Wang Han (Southeast University), Qing Meng (Southeast University), Jiuxin Cao (Southeast University) and Hongzhi Yin (The University of Queensland). |
Social linking prediction is one of the most fundamental problems in online social networks and has attracted researchers' persistent attention. Most existing works predict unobserved links using graph neural networks (GNNs) to learn node embeddings from pair-wise relations. Despite promising results given enough observed links, these models still struggle to achieve satisfactory performance when observed links are extremely limited. The main reason is that they only focus on the smoothness of node representations over pair-wise relations. Unfortunately, this assumption may fail when the networks do not have enough observed links to support it. To this end, we go beyond pair-wise relations and propose a novel framework using hypergraph neural networks with multi-level hyperedge distillation strategies. To break through the limitations of sparsely observed links, we introduce the hypergraph to uncover higher-level relations, which are exceptionally crucial for deducing unobserved links. A hypergraph allows one edge to connect multiple nodes, making it easier to learn better higher-level relations for link prediction. To overcome the restrictions of manually designed hypergraphs, which are fixed in most hypergraph research, we propose a new method to automatically learn high-quality hyperedges using three novel hyperedge distillation strategies. The generated hyperedges are hierarchical and follow a power-law distribution, which can significantly improve link prediction performance. To predict unobserved links, we present a novel hypergraph neural network named HNN. HNN takes the multi-level hypergraphs as input and makes the node embeddings smooth on hyperedges instead of on pairwise links only. Extensive evaluations on four real-world datasets demonstrate our model's superior performance over state-of-the-art baselines, especially when the observed links are extremely reduced. |
Title: Multi-Session Diversity to Improve User Satisfaction in Web Applications |
Authors: Mohammadreza Esfandiari (NJIT), Ria Mae Borromeo (Univ. of the Philippines), Sepideh Nikookar (NJIT), Paras Sakharkar (NJIT), Sihem Amer-Yahia (CNRS) and Senjuti Basu Roy (NJIT). |
In various Web applications, users consume content in a series of sessions. This is prevalent in online music listening, where a session is a channel and channels are listened to in sequence, or in crowdsourcing, where a session is a set of tasks and task sets are completed in sequence. Content diversity can be defined in more than one way, e.g., based on artists or genres for music, or on requesters or rewards in crowdsourcing. A user may prefer to experience diversity within or across sessions. Naturally, intra-session diversity is set-based, whereas inter-session diversity is sequence-based. This novel multi-session diversity gives rise to four bi-objective problems with the goal of minimizing or maximizing inter- and intra-session diversities. Given the hardness of those problems, we propose to formulate a constrained optimization problem that optimizes inter-session diversity, subject to a constraint on intra-session diversity. We develop an efficient algorithm to solve our problem. Our experiments with human subjects on two real datasets, music and crowdsourcing, show that our diversity formulations do serve different user needs and yield high user satisfaction. Our large-data experiments on real and synthetic data empirically demonstrate that our solution satisfies the theoretical bounds and is highly scalable compared to baselines. |
Title: Multi-view Graph Contrastive Representation Learning for Drug-Drug Interaction Prediction |
Authors: Yingheng Wang (Department of Electronic Engineering, Tsinghua University), Yaosen Min (Institute of Interdisciplinary Information Sciences, Tsinghua University), Xin Chen (Technology and Engineering Group, Tencent) and Ji Wu |
Potential Drug-Drug Interactions (DDIs) occurring while treating complex or co-existing diseases with drug combinations may cause changes in drugs' pharmacological activity. Therefore, DDI prediction has been an important task in the medical machine learning community. Recently, graph-based learning methods have aroused widespread interest and have proven well suited to this task. However, these methods are often limited to exploiting the inter-view drug molecular structure while ignoring the drug's intra-view interaction relationships, which are vital to capturing complex DDI patterns. This study presents a new method, multi-view graph contrastive representation learning for drug-drug interaction prediction, MIRACLE for brevity, to capture inter-view molecule structure and intra-view interactions between molecules simultaneously. MIRACLE treats a DDI network as a multi-view graph where each node in the interaction graph is itself a drug molecular graph instance. We use a GCN to encode DDI relationships and a bond-aware attentive message propagating method to capture drug molecular structure information in the MIRACLE learning stage. Also, we propose a novel unsupervised contrastive learning component to balance and integrate the multi-view information. Comprehensive experiments on multiple real datasets show that MIRACLE consistently outperforms state-of-the-art DDI prediction models. |
Title: Multilingual COVID-QA: Learning towards Global Information Sharing via Web Question Answering in Multiple Languages |
Authors: Rui Yan (Peking University), Weiheng Liao (Peking University), Jianwei Cui (Xiaomi Inc.), Hailei Zhang (Laiye Inc.), Yichuan Hu (Laiye Inc.) and Dongyan Zhao (Peking University). |
Since late December 2019, the city of Wuhan in China has reported an outbreak of atypical pneumonia, now known as COVID-19, caused by the novel coronavirus. Cases have spread to other cities in China and to more than 200 countries and regions internationally. The World Health Organization (WHO) officially declared the coronavirus outbreak a pandemic, and the public health emergency has had a worldwide impact on daily life: people are advised to keep social distance, in-person events have been moved online, and some facilities have been locked down. Alternatively, the Web has become an active venue for people to share knowledge and information. On this ongoing topic, people continuously post questions online and seek answers. Yet, sharing global information conveyed in different languages is challenging because the language barrier is intrinsically unfriendly to monolingual speakers. In this paper, we propose a multilingual COVID-QA model to answer people's questions in their own languages while the model is able to absorb knowledge from other languages. Another challenge is that in most cases, the information to share does not have parallel data in multiple languages. To this end, we propose a novel framework which incorporates (unsupervised) translation alignments as pseudo-parallel data for learning. Then we train (cross-lingual) question-answering mapping and generation. We demonstrate the effectiveness of our proposed approach against a series of competitive baselines. In this way, we make it easier to share global information across language barriers and, hopefully, contribute to the battle against COVID-19. |
Title: Multiplex Bipartite Network Embedding using Dual Hypergraph Convolutional Networks |
Authors: Hansheng Xue (The Australian National University), Luwei Yang (Alibaba Group), Vaibhav Rajan (National University of Singapore), Wen Jiang (Alibaba Group), Yi Wei (Alibaba Group) and Yu Lin (Research School of Computer Science, Australian National University). |
A bipartite network is a graph structure where nodes are from two distinct domains and only inter-domain interactions exist as edges. A large number of network embedding methods exist to learn vectorial node representations from general graphs with both homogeneous and heterogeneous node and edge types, including some that can specifically model the distinct properties of bipartite networks. However, these methods are inadequate to model multiplex bipartite networks (e.g., in e-commerce) that have multiple types of interactions (e.g., click, inquiry, and buy) and node attributes. Most real-world multiplex bipartite networks are also sparse and have imbalanced node distributions that are challenging to model. In this paper, we develop an unsupervised Dual HyperGraph Convolutional Network (DualHGCN) model that scalably transforms the multiplex bipartite network into two sets of homogeneous hypergraphs and uses spectral hypergraph convolutional operators, along with intra- and inter-message passing strategies to promote information exchange within and across domains, to learn effective node embeddings. We benchmark DualHGCN using four real-world datasets on link prediction and node classification tasks. Our extensive experiments demonstrate that DualHGCN significantly outperforms state-of-the-art methods, and is robust to varying sparsity levels and imbalanced node distributions. |
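DualHGCN's building block is the spectral hypergraph convolution applied to each homogeneous hypergraph. The sketch below shows a standard HGNN-style operator on an incidence matrix as a rough stand-in; the paper's exact normalization, message passing strategies, and layer stacking may differ.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_w=None):
    """One spectral hypergraph convolution layer (HGNN-style sketch).

    X: (n_nodes, f) node features; H: (n_nodes, n_edges) incidence matrix;
    Theta: (f, f_out) learnable weights; edge_w: optional hyperedge weights.
    """
    n, m = H.shape
    w = edge_w if edge_w is not None else np.ones(m)
    Dv = np.diag(1.0 / np.sqrt(H @ w + 1e-9))     # node degree normalization
    De = np.diag(1.0 / (H.sum(axis=0) + 1e-9))    # hyperedge degrees
    A = Dv @ H @ np.diag(w) @ De @ H.T @ Dv       # normalized propagation
    return np.maximum(A @ X @ Theta, 0.0)         # ReLU activation

X = np.random.rand(5, 4)
H = (np.random.rand(5, 3) > 0.5).astype(float)
out = hypergraph_conv(X, H, np.random.rand(4, 2))
```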
Title: Network of Tensor Time Series |
Authors: Baoyu Jing (University of Illinois at Urbana-Champaign), Hanghang Tong (University of Illinois at Urbana-Champaign) and Yada Zhu (IBM Thomas J. Watson Research Center). |
Co-evolving time series appear in a multitude of applications, such as environmental monitoring, financial analysis, and smart transportation. This paper aims to address three challenges: (C1) how to effectively model the multi-mode tensor structure at each time step; (C2) how to incorporate explicit relationship networks of the time series; and (C3) how to model the implicit relationships of the temporal dynamics. We propose a novel model called Network of Tensor Time Series, which is comprised of two modules: a Tensor Graph Convolutional Network (TGCN) and a Tensor Recurrent Neural Network (TRNN). TGCN tackles the first two challenges by generalizing Graph Convolutional Networks (GCNs) from flat graphs to tensor graphs, capturing the synergy between the multiple graphs associated with the tensors. TRNN leverages tensor decomposition to balance the trade-off between the commonality and the specificity of the co-evolving time series. Experimental results on five real-world datasets demonstrate the efficacy of the proposed method. |
Title: Neural Collaborative Reasoning |
Authors: Hanxiong Chen (Rutgers University), Shaoyun Shi (Tsinghua University), Yunqi Li (Rutgers University) and Yongfeng Zhang (Rutgers University). |
Collaborative Filtering (CF) is an important approach to recommendation. However, existing CF methods are mostly designed based on the idea of matching, i.e., by learning user and item embeddings from data using shallow or deep models, they try to capture the relevance patterns in data, so that a user embedding can be matched with appropriate item embeddings using designed or learned similarity functions. However, as a cognitive rather than a perceptual intelligence task, recommendation requires not only the ability of pattern recognition and matching from data, but also the ability of cognitive logical reasoning over data. In this work, we propose to advance Collaborative Filtering (CF) to Collaborative Reasoning (CR), which means that each user knows part of the logical space, and they collaborate to conduct logical reasoning in that space to estimate preferences for each other. Inspired by recent progress on neural-symbolic learning, we propose a Neural Collaborative Reasoning (NCR) framework to integrate the power of embedding learning and logical reasoning, where the embeddings capture similarity patterns in data from perceptual perspectives, and the logic facilitates cognitive reasoning for informed decision making. An important challenge, however, is to bridge differentiable neural networks and symbolic reasoning in a shared architecture for optimization and inference. To solve this problem, we propose a neural logic reasoning architecture, which learns logical operations such as AND (∧), OR (∨) and NOT (¬) as neural modules for implication reasoning (→) in the form of Horn clauses. In this way, each Horn clause can be equivalently organized as a neural network, so that logical reasoning and prediction can be conducted in a continuous space. Experiments on several real-world datasets verify the advantages of our framework compared with both shallow and deep recommendation models as well as state-of-the-art logical reasoning models. |
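To make the neural-logic idea concrete, here is a minimal sketch of AND/OR/NOT as learnable modules, with implication rewritten in the Horn-clause form (NOT a) OR b. Layer shapes are assumptions; the paper additionally constrains these modules with logical regularizers and scores the result against an anchor TRUE vector, both omitted here.

```python
import torch
import torch.nn as nn

class NeuralLogic(nn.Module):
    """Sketch of logical operations as neural modules, NCR-style."""
    def __init__(self, dim):
        super().__init__()
        mlp2 = lambda: nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                     nn.Linear(dim, dim))
        self.AND, self.OR = mlp2(), mlp2()   # AND is unused in this snippet
        self.NOT = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def implies(self, premise, conclusion):
        # a -> b is rewritten as (NOT a) OR b, the Horn-clause form
        return self.OR(torch.cat([self.NOT(premise), conclusion], dim=-1))

logic = NeuralLogic(dim=32)
event = logic.implies(torch.randn(1, 32), torch.randn(1, 32))
# In NCR the resulting event vector is compared against a constant
# TRUE anchor vector to score how likely the implication holds.
```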
Title: NeuroPose: 3D Hand Pose Tracking using EMG Wearables |
Authors: Yilin Liu (Pennsylvania State University), Shijia Zhang (Pennsylvania State University) and Mahanth Gowda (Pennsylvania State University). |
Ubiquitous finger motion tracking enables a number of exciting applications in augmented reality, sports analytics, rehabilitation healthcare, haptics, etc. This paper presents NeuroPose, a system that shows the feasibility of 3D finger motion tracking using a platform of wearable ElectroMyoGraphy (EMG) sensors. EMG sensors can sense electrical potential from muscles due to finger activation, thus offering rich information for fine-grained finger motion sensing. However, converting the sensor information to 3D finger poses is non-trivial, since signals from multiple fingers superimpose at the sensor in complex patterns. Towards solving this problem, NeuroPose fuses information from anatomical constraints of finger motion with machine learning architectures based on Recurrent Neural Networks (RNNs), encoder-decoder networks, and ResNets to extract 3D finger motion from noisy EMG data. The generated motion pattern is temporally smooth as well as anatomically consistent. Furthermore, a transfer learning algorithm is leveraged to adapt a model pretrained on one user to a new user with minimal training overhead. A systematic study with 12 users demonstrates a median error of 6.24 degrees and a 90th-percentile error of 18.33 degrees in tracking 3D finger joint angles. The accuracy is robust to natural variation in sensor mounting positions as well as changes in the user's wrist position. NeuroPose is implemented on a smartphone with a processing latency of 0.101 s and a low energy overhead. |
Title: Nonlinear Higher-Order Label Spreading |
Authors: Francesco Tudisco (Gran Sasso Science Institute), Austin R. Benson (Cornell University) and Konstantin Prokopchik (Gran Sasso Science Institute). |
Label spreading is a general technique for semi-supervised learning with point cloud or network data, which can be interpreted as a diffusion of labels on a graph. While there are many variants of label spreading, nearly all of them are linear models, where the incoming information to a node is a weighted sum of information from neighboring nodes. Here, we add nonlinearity to label spreading through nonlinear functions of higher-order structure in the graph, namely its triangles. For a broad class of nonlinear functions, we prove convergence of our nonlinear higher-order label spreading algorithm to the global solution of a constrained semi-supervised loss function. We demonstrate the efficiency and efficacy of our approach on a variety of point cloud and network datasets, where the nonlinear higher-order model compares favorably to classical label spreading, as well as to hypergraph models and graph neural networks. |
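A toy version of the update, under stated assumptions: the linear spread is the usual normalized-adjacency diffusion, and the triangle term applies one sample nonlinearity (the square root of a product of neighbor labels); the paper analyzes a broad class of such maps, not this specific one.

```python
import numpy as np

def nonlinear_hols(Y, A, triangles, alpha=0.9, n_iter=50):
    """Toy nonlinear higher-order label spreading.

    Y: (n, c) one-hot labels (zero rows for unlabeled nodes).
    A: (n, n) symmetrically normalized adjacency matrix.
    triangles: list of (i, j, k) node index triples.
    """
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        higher = np.zeros_like(F)
        for i, j, k in triangles:
            for a, b, c in ((i, j, k), (j, i, k), (k, i, j)):
                higher[a] += np.sqrt(F[b] * F[c])  # nonlinear triangle term
        F = alpha * (A @ F + higher) + (1 - alpha) * Y
        F /= np.maximum(F.sum(axis=1, keepdims=True), 1e-12)  # stay on simplex
    return F
```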
Title: Not All Features Are Equal: Discovering Essential Features for Preserving Prediction Privacy |
Authors: Fatemehsadat Mireshghallah (University of California San Diego), Mohammadkazem Taram (University of California San Diego), Ali Jalali (Amazon), Ahmed Taha Elthakeb (University of California San Diego), Dean Tullsen (University of California San Diego) and Hadi Esmaeilzadeh (University of California San Diego). |
When receiving machine learning based services from the cloud, the service provider does not actually need all the features: only a subset of the features is necessary for the target prediction task. Discerning this subset is the key problem of this work. We formulate this problem as a gradient-based perturbation maximization method that discovers this subset in the input feature space with respect to the functionality of the prediction model used by the provider. After identifying the subset, our framework, Sieve, suppresses the rest of the features using utility-preserving constant values that are discovered through a separate gradient-based optimization process. These optimization processes are performed offline. During prediction, the features are accordingly suppressed and further obfuscated. We show that Sieve does not necessarily require collaboration from the service provider beyond its normal service, and can be applied in scenarios where we only have black-box access to the service provider's model. We theoretically guarantee that Sieve's optimizations reduce the upper bound of the Mutual Information (MI) between the data and the sifted representations that are sent out. Experimental results show that Sieve reduces the mutual information between the input and the sifted representations by 85.01% with only a negligible reduction in utility (1.42%). In addition, we show that Sieve greatly diminishes adversaries' ability to learn and infer non-conducive features. |
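The snippet below is a crude stand-in for Sieve's first stage only: it ranks input features by the gradient magnitude of the model's top prediction and keeps the top-k. The paper's actual objective is a perturbation-maximization formulation, and the constant replacement values come from a second optimization that is not shown; everything here is illustrative.

```python
import torch

def essential_feature_mask(model, x, k):
    """Rank features by input-gradient magnitude; keep the top-k."""
    x = x.clone().requires_grad_(True)
    model(x).max().backward()              # gradient of the top logit
    saliency = x.grad.abs()
    keep = torch.zeros_like(x, dtype=torch.bool)
    keep.view(-1)[saliency.view(-1).topk(k).indices] = True
    return keep  # features outside the mask get utility-preserving constants

model = torch.nn.Sequential(torch.nn.Linear(10, 3))
mask = essential_feature_mask(model, torch.randn(10), k=4)
```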
Title: NTAM: Neighborhood-Temporal Attention Model for Disk Failure Prediction in Cloud Platforms |
Authors: Chuan Luo (Microsoft Research), Pu Zhao (Microsoft Research), Bo Qiao (Microsoft Research), Youjiang Wu (Microsoft Azure), Hongyu Zhang (The University of Newcastle), Wei Wu (Leibniz University Hannover), Weihai Lu (Microsoft Research), Yingnong Dang (Microsoft Azure), Saravanakumar Rajmohan (Microsoft Office), Qingwei Lin (Microsoft Research) and Dongmei Zhang (Microsoft Research). |
With the rapid deployment of cloud platforms, high service reliability is of critical importance. An industrial cloud platform contains a huge number of disks, and disk failure is a common cause of service unreliability. In recent years, many machine learning based disk failure prediction approaches have been proposed, which can predict disk failures based on disk status data before the failures actually happen. In this way, proactive actions can be taken in advance to improve service reliability. However, existing approaches treat each disk individually and do not explore the influence of neighboring disks. In this paper, we propose the Neighborhood-Temporal Attention Model (NTAM), a novel deep learning based approach to disk failure prediction. When predicting whether a disk will fail in the near future, NTAM not only utilizes the disk's own status data but also considers its neighbors' status data. Moreover, NTAM includes a novel attention-based temporal component to capture the temporal nature of the disk status data. Besides, we propose a data enhancement method, called Temporal Progressive Sampling (TPS), to handle the extreme data imbalance issue. We evaluate NTAM on a public dataset as well as two industrial datasets collected from around 9 million disks in an industrial public cloud platform. Our experimental results show that NTAM significantly outperforms state-of-the-art competitors. Also, our empirical evaluations indicate the effectiveness of the neighborhood-aware component and the temporal component underlying NTAM, as well as the effectiveness of TPS. More encouragingly, we have successfully applied NTAM and TPS to Company M's public cloud platform and obtained practical benefits in industrial practice. |
Title: OCT-GAN: Neural ODE-based Conditional Tabular GANs |
Authors: Jayoung Kim (Sookmyung Women's University), Jingsung Jeon (Yonsei University), Jaehoon Lee (Yonsei University), Jihyeon Hyeong (Yonsei University) and Noseong Park (Yonsei University). |
Tabular data is an important data source for many web-based applications. Therefore, synthesizing tabular data is receiving much attention these days for various purposes. With sophisticated synthetic data, one can augment the training data. Over the past couple of years, tabular data synthesis techniques have greatly improved. Recent work has made progress in addressing many problems in synthesizing tabular data, such as the imbalanced distribution and multimodality problems. However, the data utility of state-of-the-art methods is not yet satisfactory. In this work, we significantly improve the utility by designing our generator and discriminator based on neural ordinary differential equations (NODEs). After showing that NODEs have theoretically preferred characteristics for generating tabular data, we introduce our designs. The NODE-based discriminator performs a classification based on the evolution trajectory of a hidden vector, rather than classifying with the hidden vector at the last layer only. Our generator also adopts an ODE layer at the very beginning of its architecture to transform its initial input vector (i.e., the concatenation of a noise vector and a condition vector in our case) into another latent vector space suitable for the generation process. We conduct experiments with 13 datasets, including but not limited to insurance fraud detection and online news article prediction, and our method outperforms other state-of-the-art tabular data synthesis methods in many cases of our classification, regression, and clustering experiments. |
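The distinguishing piece is the ODE layer at the head of the generator. A minimal fixed-step Euler version is sketched below; the actual model uses proper ODE solvers and NODE-based designs on both the generator and discriminator sides, so treat this purely as an illustration of transporting a latent vector along a learned vector field.

```python
import torch
import torch.nn as nn

class ODELayer(nn.Module):
    """Euler-discretized neural ODE block (illustrative sketch)."""
    def __init__(self, dim, steps=10):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim + 1, dim), nn.Tanh(),
                               nn.Linear(dim, dim))  # learned field f(h, t)
        self.steps = steps

    def forward(self, h):
        dt = 1.0 / self.steps
        for s in range(self.steps):
            t = torch.full((h.size(0), 1), s * dt)
            h = h + dt * self.f(torch.cat([h, t], dim=1))  # Euler step
        return h

z = torch.randn(4, 16)        # e.g., concatenated noise + condition vector
latent = ODELayer(16)(z)      # transformed latent fed to the generator
```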
Title: On the Equivalence of Decoupled Graph Convolution Network and Label Propagation |
Authors: Hande Dong (University of Science and Technology of China), Jiawei Chen (University of Science and Technology of China), Fuli Feng (National University of Singapore), Xiangnan He (University of Science and Technology of China), Shuxian Bi (University of Science and Technology of China), Zhaolin Ding (North Carolina State University) and Peng Cui (Tsinghua University). |
The original design of the Graph Convolution Network (GCN) couples feature transformation and neighborhood aggregation for node representation learning. Recently, some work has shown that coupling is inferior to decoupling, which supports deep graph propagation and has become the latest paradigm of GCN (e.g., APPNP and SGCN). Despite its effectiveness, the working mechanisms of the decoupled GCN are not well understood. In this paper, we explore the decoupled GCN for semi-supervised node classification from a novel and fundamental perspective: label propagation. We conduct thorough theoretical analyses, proving that the decoupled GCN is essentially the same as two-step label propagation: first, propagating the known labels along the graph to generate pseudo-labels for the unlabeled nodes, and second, training normal neural network classifiers on the augmented pseudo-labeled data. More interestingly, we reveal the source of the decoupled GCN's effectiveness: going beyond conventional label propagation, it automatically assigns structure- and model-aware weights to the pseudo-label data. This explains why the decoupled GCN is relatively robust to structure noise and over-smoothing, but sensitive to label noise and model initialization. Based on this insight, we propose a new label propagation method named Propagation then Training Adaptively (PTA), which overcomes the flaws of the decoupled GCN with a dynamic and adaptive weighting strategy. Our PTA is simple yet more effective and robust than the decoupled GCN. We empirically validate our findings on four benchmark datasets, demonstrating the advantages of our method. |
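The two-step reading of the decoupled GCN can be written down directly. The sketch below uses APPNP-style personalized-PageRank propagation for step one and only indicates step two; the adaptive weighting that PTA adds on top is omitted.

```python
import numpy as np

def two_step_label_propagation(A_hat, Y, labeled_mask, alpha=0.1, K=10):
    """Step 1: propagate known labels to get pseudo-labels (APPNP-style).

    A_hat: (n, n) normalized adjacency; Y: (n, c) one-hot labels;
    labeled_mask: (n,) 1 for labeled nodes, 0 otherwise.
    """
    Z = Y * labeled_mask[:, None]      # zero out unknown labels
    H = Z.copy()
    for _ in range(K):                 # personalized-PageRank propagation
        H = (1 - alpha) * (A_hat @ H) + alpha * Z
    pseudo = H.argmax(axis=1)
    # Step 2 (omitted): train an ordinary neural classifier on the
    # pseudo-labeled nodes, ideally with structure- and model-aware weights.
    return pseudo
```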
Title: On the Feasibility of Automated Built-in Function Modeling for PHP Symbolic Execution |
Authors: Penghui Li (The Chinese University of Hong Kong), Wei Meng (The Chinese University of Hong Kong), Kangjie Lu (University of Minnesota) and Changhua Luo (The Chinese University of Hong Kong). |
Analyzing language-specific built-in functions is a necessary component of multiple program analysis tasks. In symbolic execution and exploit generation, built-in functions are usually manually translated into SMT-LIB specifications for constraint solving. Such translation requires an excessive amount of human effort and a deep understanding of the function behaviors. Incorrect translation can invalidate the results of client applications and can cause severe security problems, e.g., a true positive case being classified as a (false) negative. In this paper, we explore the feasibility of automating the process of modeling PHP built-in functions as SMT-LIB specifications. We synthesize C programs that are equivalently transformed from the constraints collected in PHP symbolic execution. We then apply symbolic execution to the C program to find a suitable path for the program analysis task, which turns a constraint solving problem into a path finding problem. As the source code of PHP built-in functions is available, our method smoothly tackles the challenge of modeling PHP built-in functions in an automated manner. We thoroughly compare our automated method with the state-of-the-art manual modeling tool. The evaluation results show that our automated method can easily model more built-in functions and achieve higher correctness, and it can solve a similar number of constraints when generating exploits for web application vulnerabilities. Our empirical analysis shows that the manual and automated methods have different strengths, suggesting that manual and automated modeling should be combined to achieve better accuracy. |
Title: On the Value of Wikipedia as a Gateway to the Web |
Authors: Tiziano Piccardi (Ecole Polytechnique Fédérale de Lausanne), Miriam Redi (Wikimedia Foundation), Giovanni Colavizza (University of Amsterdam) and Robert West (Ecole Polytechnique Fédérale de Lausanne). |
By linking to external websites, Wikipedia acts as a gateway to the Web. To date, however, little is known about the amount of traffic generated by Wikipedia's external links. We fill this gap in a detailed analysis of usage logs gathered from Wikipedia users' client devices. Our analysis proceeds in three steps: First, we quantify the level of engagement with external links, finding that, in one month, English Wikipedia generates 43M clicks to external websites, which are roughly evenly distributed over links in infoboxes, cited references, and article bodies. Official links listed in infoboxes have by far the highest click-through rate (CTR), 2.47% on average. In particular, official links associated with articles about businesses, educational institutions, and websites have the highest CTR, whereas official links associated with articles about geographical content, television, and music have the lowest CTR. Second, we investigate patterns of engagement with external links, finding that Wikipedia frequently serves as a stepping stone between search engines and third-party websites, effectively fulfilling information needs that search engines do not meet. Third, we quantify the hypothetical economic value of the clicks received by external websites from English Wikipedia, by estimating that the respective website owners would need to pay a total of $8-15 million per month to obtain the same volume of traffic via sponsored search. Overall, these findings shed light on Wikipedia's role not only as an important source of information, but also as a high-traffic gateway to the broader Web ecosystem. |
Title: One Detector to Rule Them All: Towards a General Deepfake Attack Detection Framework |
Authors: Shahroz Tariq (Sungkyunkwan University), Sangyup Lee (Sungkyunkwan University) and Simon Woo (Sungkyunkwan University). |
Deep learning-based video manipulation methods have become widely accessible to the masses. With little to no effort, people can quickly learn how to generate deepfake (DF) videos. In particular, females have been occasional victims of deepfakes, which are widely spread on the Web. While deep learning-based detection methods have been proposed to identify specific types of DFs, their performance suffers for other types of deepfake methods, including real-world deepfakes, on which they are not sufficiently trained. In other words, most of the proposed deep learning-based detection methods lack transferability and generalizability. Beyond detecting a single type of DF from benchmark deepfake datasets, we focus on developing a generalized approach to detect multiple types of DFs, including deepfakes from unknown generation methods such as DeepFake-in-the-Wild (DFW) videos. To better cope with unknown and unseen deepfakes, we introduce a Convolutional LSTM-based Residual Network (CLRNet), which adopts a unique model training strategy and exploits spatial as well as temporal information in deepfakes. Through extensive experiments, we show that existing defense methods are not ready for real-world deployment, whereas our defense method (CLRNet) achieves far better generalization when detecting various benchmark deepfake methods (97.57% on average). Furthermore, we evaluate our approach with a high-quality DeepFake-in-the-Wild dataset collected from the Internet, containing numerous videos and more than 150,000 frames. Our CLRNet model generalizes well against high-quality DFW videos, achieving 93.86% detection accuracy and outperforming existing state-of-the-art defense methods by a considerable margin. |
Title: Online Disease Self-diagnosis with Inductive Heterogeneous Graph Convolutional Networks |
Authors: Zifeng Wang (Tsinghua University), Rui Wen (Tencent), Xi Chen (Tencent), Shilei Cao (Tencent Jarvis Lab), Shao-Lun Huang (Tsinghua-Berkeley Shenzhen Institute, Tsinghua University), Buyue Qian (Xi'an Jiaotong University) and Yefeng Zheng (Tencent Jarvis Lab). |
We propose a Healthcare Graph Convolutional Network (HealGCN) to offer a disease self-diagnosis service for online users, based on Electronic Healthcare Records (EHRs). This paper focuses on two main challenges of online disease self-diagnosis: (1) serving cold-start users via graph convolutional networks and (2) handling scarce clinical descriptions via a symptom retrieval system. To this end, we first organize the EHR data into a heterogeneous graph that is capable of modeling complex interactions among users, symptoms and diseases, and tailor the graph representation learning towards disease diagnosis with an inductive learning paradigm. Then, we build a disease self-diagnosis system with a corresponding EHR Graph-based Symptom Retrieval System (GraphRet) that can search and provide a list of relevant alternative symptoms by tracing predefined meta-paths. GraphRet helps enrich the seed symptom set through the EHR graph, resulting in better reasoning ability for our HealGCN model when confronting users with scarce descriptions. Finally, we validate our model on a large-scale EHR dataset; its superior performance confirms the model's effectiveness in practice. |
Title: Online Label Aggregation: A Variational Bayesian Approach |
Authors: Chi Hong (Delft University of Technology), Amirmasoud Ghiassi (Delft University of Technology), Yichi Zhou (Tsinghua University), Robert Birke (ABB Research) and Lydia Y. Chen (Delft University of Technology). |
Noisy labels are the norm rather than the exception for crowdsourced content. Aggregating the results from crowd workers is an effective way to distill noise and infer correct labels. To ensure time relevance and overcome the slow responses of workers, online label aggregation is increasingly requested, calling for solutions that can incrementally infer the true label distribution from subsets of data items. In this paper, we propose a novel online label aggregation framework, BILA, which employs a variational Bayesian inference method and a novel stochastic optimization scheme for incremental training. BILA is flexible enough to accommodate any generating distribution of labels through the exact computation of its posterior distribution. We also derive the convergence bound of the proposed optimizer. We compare BILA with the state of the art based on minimax entropy, neural networks and expectation maximization algorithms, on synthetic and real-world data sets. Our evaluation results on various online scenarios show that BILA can effectively infer the true labels, with an error rate reduction of at least 10 and 1.5 percentage points on synthetic and real-world datasets, respectively. |
Title: Online Mobile App Usage as an Indicator of Sleep Behavior and Job Performance |
Authors: Chunjong Park (University of Washington), Morelle Arian (University of Washington), Xin Liu (University of Washington), Leon Sasson (Rise Science Inc.), Jeffrey Kahn (Rise Science Inc.), Shwetak Patel (University of Washington), Alex Mariakakis (University of Toronto) and Tim Althoff (University of Washington). |
Sleep is critical to human function, mediating factors like memory, mood, energy, and alertness; therefore, it is commonly conjectured that a good night's sleep is important for job performance. However, both real-world sleep behavior and job performance are difficult to measure at scale. In this work, we demonstrate that people's everyday interactions with online mobile apps can reveal insights into their job performance in real-world contexts. We present an observational study in which we objectively tracked the sleep behavior and job performance of salespeople (N=15) and athletes (N=19) for 18 months, leveraging a mattress sensor and online mobile app to conduct the largest study of this kind to date. We first demonstrate that cumulative sleep measures are significantly correlated with job performance metrics, showing that an hour of daily sleep loss for a week was associated with a 9.0% average reduction in contracts established for salespeople and a 9.5% average reduction in game grade for the athletes. We then investigate the utility of online app interaction time as a passively collectible and scalable performance indicator. We show that app interaction time is correlated with the job performance of the athletes, but not the salespeople. To support that our app-based performance indicator truly captures meaningful variation in psychomotor function as it relates to sleep and is robust against potential confounds, we conducted a second study to evaluate the relationship between sleep behavior and app interaction time in a cohort of 274 participants. Using a generalized additive model to control for per-participant random effects, we demonstrate that participants who lost one hour of daily sleep for a week exhibited average app interaction times that were 0.5 seconds slower. We also find that app interaction time exhibits meaningful, chronobiologically consistent correlations with sleep history, time awake, and circadian rhythms. The findings from this work reveal an opportunity for online app developers to generate new insights regarding cognition and productivity. |
Title: OntoZSL: Ontology-enhanced Zero-shot Learning |
Authors: Yuxia Geng (Zhejiang University), Jiaoyan Chen (University of Oxford), Zhuo Chen (Zhejiang University), Jeff Z. Pan (University of Edinburgh), Zhiquan Ye (Zhejiang University), Zonggang Yuan (Huawei NAIE CTO Office), Yantao Jia (Huawei Technologies Co., Ltd) and Huajun Chen (Zhejiang University). |
Zero-shot Learning (ZSL), which aims to predict classes that never appear in the training data, has attracted intense research interest. The key to implementing ZSL is to leverage prior knowledge of classes, which builds semantic relationships between classes and enables the transfer of learned models (e.g., features) from training classes (i.e., seen classes) to unseen classes. However, the priors adopted by existing methods are relatively limited, with incomplete semantics. In this paper, we explore richer and more competitive prior knowledge to model the inter-class relationships for ZSL via ontology-based knowledge representation and semantic embedding. Meanwhile, to address the data imbalance between seen and unseen classes, we develop a generative ZSL framework with Generative Adversarial Networks (GANs). Our main findings include: (i) an ontology-enhanced ZSL framework that can be applied to different domains, such as image classification (IMGC) and knowledge graph completion (KGC); (ii) a comprehensive evaluation with multiple zero-shot datasets from different domains, where our method often achieves better performance than state-of-the-art models. In particular, on four representative ZSL baselines for IMGC, the ontology-based class semantics outperform previous priors (e.g., the word embeddings of classes) by an average of 12.3 accuracy points in standard ZSL across two example datasets (see Figure 4). |
Title: Outlier-Resilient Web Service QoS Prediction |
Authors: Fanghua Ye (University College London), Zhiwei Lin (Sun Yat-sen University), Chuan Chen (Sun Yat-sen University), Zibin Zheng (Sun Yat-sen University) and Hong Huang (Huazhong University of Science and Technology). |
The proliferation of Web services makes it difficult for users to select the most appropriate one among numerous functionally identical or similar service candidates. Quality-of-Service (QoS) describes the non-functional characteristics of Web services, and it has become the key differentiator for service selection. However, users cannot invoke all Web services to obtain the corresponding QoS values due to high time cost and huge resource overhead. Thus, it is essential to predict unknown QoS values. Although various QoS prediction methods have been proposed, few of them have taken outliers into consideration, which may dramatically degrade the prediction performance. To overcome this limitation, we propose an outlier-resilient QoS prediction method in this paper. Our method utilizes Cauchy loss to measure the discrepancy between the observed QoS values and the predicted ones. Owing to the robustness of Cauchy loss, our method is resilient to outliers. We further extend our method to provide time-aware QoS prediction results by taking the temporal information into consideration. Finally, we conduct extensive experiments on both static and dynamic datasets. The results demonstrate that our method is able to achieve better performance than state-of-the-art baseline methods. |
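For concreteness, the Cauchy loss on a residual r with scale gamma takes the common form log(1 + (r/gamma)^2); its gradient is bounded, so a single wild QoS observation cannot dominate the fit the way it does under squared error. The snippet assumes this standard form rather than quoting the paper's exact formulation.

```python
import numpy as np

def cauchy_loss(observed, predicted, gamma=1.0):
    """Cauchy loss: grows only logarithmically with the residual."""
    residual = observed - predicted
    return np.log(1.0 + (residual / gamma) ** 2).sum()

# A 99-unit outlier adds ~9.2 to the Cauchy loss but 9801 to squared error:
obs, pred = np.array([1.0, 100.0]), np.array([1.0, 1.0])
print(cauchy_loss(obs, pred), ((obs - pred) ** 2).sum())
```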
Title: PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer |
Authors: Yiling Jia (University of Virginia), Huazheng Wang (University of Virginia), Stephen Guo (WalmartLabs) and Hongning Wang (University of Virginia). |
Online Learning to Rank (OL2R) eliminates the need for explicit relevance annotation by directly optimizing rankers from their interactions with users. However, the required exploration drives it away from successful practices in offline learning to rank, which limits OL2R's empirical performance and practical applicability. In this work, we propose to estimate a pairwise learning to rank model online. In each round, candidate documents are partitioned and ranked according to the model's confidence in the estimated pairwise rank order, and exploration is performed only on the uncertain pairs of documents, i.e., divide-and-conquer. A regret bound defined directly on the number of mis-ordered pairs is proven, which connects the online solution's theoretical convergence with its ranking performance. Comparisons against an extensive list of OL2R baselines on two public learning to rank benchmark datasets demonstrate the effectiveness of the proposed solution. |
Title: PARIMA: Viewport Adaptive 360-Degree Video Streaming |
Authors: Lovish Chopra (Indian Institute of Technology Kharagpur), Sarthak Chakraborty (Indian Institute of Technology Kharagpur), Abhijit Mondal (Indian Institute of Technology) and Sandip Chakraborty (Indian Institute of Technology Kharagpur). |
With increasing advancements in technologies for capturing 360-degree videos, streaming such videos has become a popular research topic. However, streaming 360-degree videos requires high bandwidth, escalating the need for optimized streaming algorithms. Researchers have proposed various methods to tackle this problem, taking the network bandwidth into account or attempting to predict future viewports in advance. However, most of the existing works either (1) do not consider video content to predict user viewports, or (2) do not adapt to user preferences dynamically, or (3) require a lot of training data for new videos, making them potentially unfit for video streaming purposes. We develop PARIMA, a fast and efficient online viewport prediction algorithm that uses users' past viewports along with the trajectories of prime objects, as a representative of video content, to predict future viewports. We argue that a user's head movement depends chiefly on the trajectories of the prime objects in the video. We employ a pyramid-based bitrate allocation scheme and perform a comprehensive evaluation of PARIMA's performance. In our evaluation, we show that PARIMA outperforms state-of-the-art approaches, improving the Quality of Experience by over 30% while maintaining a short response time. |
Title: Partial-Softmax Loss based Deep Hashing |
Authors: Rong-Cheng Tu (Beijing Institute of Technology), Xian-Ling Mao (Beijing Institute of Technology), Jia-Nan Guo (Beijing Institute of Technology), Wei Wei (Huazhong University of Science and Technology) and Heyan Huang (Beijing Institute of Technology). |
Recently, deep supervised hashing methods have shown state-of-the-art performance by integrating feature learning and hash code learning into an end-to-end network to generate high-quality hash codes. However, it is still a challenge to learn discriminative hash codes that efficiently preserve the label information of images. To overcome this difficulty, we propose a novel Partial-Softmax Loss based Deep Hashing method, called PSLDH, to generate high-quality hash codes. Specifically, PSLDH first trains a category hashing network to generate a discriminative hash code for each category, where the hash code preserves the semantic information of the corresponding category well. Then, instead of defining the similarity between data pairs using their corresponding label vectors, we directly use the learned hash codes of categories to supervise the learning process of the image hashing network, and we propose a novel Partial-Softmax loss to optimize the image hashing network. By minimizing this loss, the learned hash codes can sufficiently preserve the label information of images. Extensive experiments on three benchmark datasets show that the proposed method outperforms state-of-the-art baselines in the image retrieval task. |
Title: Pathfinder Discovery Networks for Neural Message Passing |
Authors: Benedek Rozemberczki (The University of Edinburgh), Peter Englert (Google Research), Amol Kapoor (Google Research), Martin Blais (Google Research) and Bryan Perozzi (Google Research). |
In this work we propose Pathfinder Discovery Networks (PDNs), a method for jointly learning a message passing graph over a multiplex network with a downstream semi-supervised model. PDNs inductively learn an aggregated weight for each edge, optimized to produce the best outcome for the downstream learning task. PDNs are a generalization of attention mechanisms on graphs which allow flexible construction of similarity functions between nodes, edge convolutions, and cheap multiscale mixing layers. We show that PDNs overcome weaknesses of existing methods for graph attention (e.g. Graph Attention Networks), such as the diminishing weight problem. Our experimental results demonstrate competitive predictive performance on academic node classification tasks. Additional results from a challenging suite of node classification experiments show how PDNs can learn a wider class of functions than existing baselines. We analyze the relative computational complexity of PDNs, and show that PDN runtime is not considerably higher than static-graph models. Finally, we discuss how PDNs can be used to construct an easily interpretable attention mechanism that allows users to understand information propagation in the graph. |
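The core move, learning one aggregated weight per edge from its per-layer weights in the multiplex network, can be sketched as a small edge-scoring network. The layer sizes and the Softplus positivity constraint are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PathfinderEdge(nn.Module):
    """Map each edge's multiplex feature vector to one positive weight."""
    def __init__(self, n_views):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(n_views, 8), nn.ReLU(),
                                   nn.Linear(8, 1), nn.Softplus())

    def forward(self, edge_feats):                 # (n_edges, n_views)
        return self.score(edge_feats).squeeze(-1)  # learned edge weights

# The resulting weights define the message passing graph consumed by the
# downstream semi-supervised model, and are trained end-to-end with it.
weights = PathfinderEdge(n_views=3)(torch.rand(10, 3))
```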
Title: Peer Grading the Peer Reviews: A Dual-Role Approach for Lightening Scholarly Paper Review Processes |
Authors: Ines Arous (University of Fribourg), Jie Yang (Delft University of Technology), Mourad Khayati (University of Fribourg) and Philippe Cudre-Mauroux (University of Fribourg). |
Scientific peer review is pivotal for maintaining quality standards in academic publication. The effectiveness of the process is currently being challenged by the rapid increase of paper submissions at various conferences, which need to recruit large numbers of reviewers with different levels of expertise and background. Submitted reviews often do not meet the conformity standards of the conference, a situation that poses an ever-bigger burden on meta-reviewers in making decisions. In this work, we propose a human-AI approach that estimates the conformity of reviews to the conference standards. Specifically, we ask peers to grade each other's reviews anonymously on important criteria of review conformity, such as clarity and consistency. We introduce a Bayesian framework that learns the conformity of reviews from the peer gradings and from the historical reviews and decisions of a conference, while taking grading reliability into account. Our approach helps meta-reviewers easily identify reviews that require clarification and detect submissions requiring discussion, while not inducing additional overhead for reviewers. Through a large-scale crowdsourced study where crowd workers are recruited as graders, we show that the proposed approach outperforms machine learning or review gradings alone, and that it can be easily integrated into existing peer review systems. |
Title: Personalized Approximate Pareto-Efficient Recommendation |
Authors: Ruobing Xie (WeChat Search Application Department, Tencent), Yanlei Liu (WeChat Search Application Department, Tencent), Shaoliang Zhang (WeChat Search Application Department, Tencent), Rui Wang (WeChat Search Application Department, Tencent), Feng Xia (WeChat Search Application Department, Tencent) and Leyu Lin (WeChat Search Application Department, Tencent). |
Real-world recommendation systems usually have different learning objectives and evaluation criteria regarding accuracy, diversity or novelty. Therefore, multi-objective recommendation (MOR) has been widely explored to jointly model different objectives. Pareto efficiency, where no objective can be further improved without hurting others, is viewed as an optimal situation in multi-objective optimization. Recently, Pareto efficiency has been introduced to MOR, but all existing scalarization methods share the same objective weights across all instances. To capture users' objective-level preferences and enhance personalization in Pareto-efficient recommendation, we propose a novel Personalized Approximate Pareto-Efficient Recommendation (PAPERec) framework for multi-objective recommendation. Specifically, we design approximate Pareto-efficient learning based on scalarization with KKT conditions that closely mimics Pareto efficiency, where users have personalized weights on different objectives. We propose a Pareto-oriented reinforcement learning module to find appropriate personalized objective weights for each user, with the weighted sum of multiple objectives' gradients considered in the reward. In experiments, we conduct extensive offline and online evaluations on a real-world recommendation system. The significant improvements verify the effectiveness of PAPERec in practice. We have deployed PAPERec on a well-known real-world recommendation system, affecting millions of users. The source code will be released to facilitate future exploration. |
Title: Personalized Treatment Selection using Causal Heterogeneity |
Authors: Ye Tu (Linkedin), Kinjal Basu (Linkedin), Cyrus DiCiccio (Linkedin), Romil Bansal (Linkedin), Preetam Nandy (Linkedin), Padmini Jaikumar (Linkedin) and Shaunak Chatterjee (Linkedin). |
Randomized experimentation (also known as A/B testing or bucket testing) is widely used in the internet industry to measure the metric impact of different treatment variants. A/B tests identify the treatment variant showing the best performance, which then becomes the chosen or selected treatment for the entire population. However, the effect of a given treatment can differ across experimental units, and a personalized approach to treatment selection can greatly improve upon the usual global selection strategy. In this work, we develop a framework for personalization through (i) estimation of heterogeneous treatment effects at either a cohort or member level, followed by (ii) selection of optimal treatment variants for cohorts (or members) obtained through (deterministic or stochastic) constrained optimization. We perform a two-fold evaluation of our proposed methods. First, a simulation analysis is conducted to study the effect of personalized treatment selection under carefully controlled settings. This simulation illustrates the differences between the proposed methods and the suitability of each with increasing uncertainty. We also demonstrate the effectiveness of the method through a real-life example related to serving notifications at LinkedIn. The solution significantly outperformed both heuristic solutions and the global treatment selection baseline, leading to a sizable win on top-line metrics like member visits. |
Title: PFA: Privacy-preserving Federated Adaptation for Effective Model Personalization |
Authors: Bingyan Liu (Peking University), Yao Guo (Peking University) and Xiangqun Chen (Peking University). |
Federated learning (FL) has become a prevalent distributed machine learning paradigm with improved privacy. After learning, the resulting federated model should be further personalized to each client. While several methods have been proposed to achieve personalization, they are typically limited to a single local device, which may incur bias or overfitting since the data in a single device is extremely limited. In this paper, we attempt to realize personalization beyond a single client. The motivation is that during the FL process, there may exist many clients with similar data distributions, so personalization performance could be significantly boosted if these similar clients could cooperate with each other. Inspired by this, this paper introduces a new concept called federated adaptation: adapting the trained model in a federated manner to achieve better personalization results. However, the key challenge for federated adaptation is that no raw data may leave the client during adaptation, due to privacy concerns. In this paper, we propose PFA, a framework to accomplish Privacy-preserving Federated Adaptation. PFA leverages the sparsity property of neural networks to generate privacy-preserving representations and uses them to efficiently identify clients with similar data distributions. Based on the grouping results, PFA conducts an FL process in a group-wise way on the federated model to accomplish the adaptation. For evaluation, we manually construct several practical FL datasets based on public datasets in order to simulate both class-imbalance and background-difference conditions. Extensive experiments on these datasets and popular model architectures demonstrate the effectiveness of PFA, which outperforms other state-of-the-art methods by a large margin while ensuring user privacy. |
Title: Pick and Choose: A GNN-based Imbalanced Learning Approach for Fraud Detection |
Authors: Yang Liu (Institute of Computing Technology, Chinese Academy of Sciences), Xiang Ao (Institute of Computing Technology, Chinese Academy of Sciences), Zidi Qin (Institute of Computing Technology, Chinese Academy of Sciences), Jianfeng Chi (Alibaba Group), Jinghua Feng (Alibaba Group), Hao Yang (Alibaba Group) and Qing He (Institute of Computing Technology, CAS). |
Graph-based fraud detection approaches have attracted much attention recently due to the abundant relational information of graph-structured data, which may be beneficial for detecting fraudsters. However, GNN-based algorithms can fare poorly when the label distribution of nodes is heavily skewed, which is common in sensitive areas such as financial fraud. To remedy the class imbalance problem in graph-based fraud detection, we propose a Pick and Choose Graph Neural Network (PC-GNN for short) for imbalanced supervised learning on graphs. First, nodes and edges are picked with a devised label-balanced sampler to construct sub-graphs for mini-batch training. Next, for each node in the sub-graph, the neighbor candidates are chosen by a proposed neighborhood sampler. Finally, information from the selected neighbors and different relations is aggregated to obtain the final representation of a target node. Experiments on both benchmark and real-world graph-based fraud detection tasks demonstrate that PC-GNN clearly outperforms state-of-the-art baselines. |
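A minimal version of the "pick" step might sample nodes with probability inversely proportional to class frequency, so the rare fraud class is well represented in each mini-batch sub-graph. PC-GNN's actual sampler also takes graph structure into account; this conveys only the flavor of the idea.

```python
import numpy as np

def label_balanced_sample(labels, size, seed=0):
    """Sample node indices with probability inverse to class frequency."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    freq = dict(zip(classes, counts))
    p = np.array([1.0 / freq[y] for y in labels])
    return rng.choice(len(labels), size=size, replace=False, p=p / p.sum())

labels = np.array([0] * 95 + [1] * 5)            # 5% fraud nodes
batch = label_balanced_sample(labels, size=20)   # fraud is oversampled
```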
Title: Pivot-based Candidate Retrieval for Cross-lingual Entity Linking |
Authors: Qian Liu (University of Technology Sydney), Xiubo Geng (Microsoft Company), Jie Lu (University of Technology Sydney) and Daxin Jiang (Microsoft Company). |
Entity candidate retrieval plays a critical role in cross-lingual entity linking (XEL). In XEL, entity candidate retrieval needs to retrieve a list of plausible candidate entities from a large knowledge graph in a target language, given a piece of text in a sentence or question, namely a mention, in a source language. Existing works mainly fall into two categories: lexicon-based and semantic-based approaches. The lexicon-based approach usually creates cross-lingual and mention-entity lexicons, which is effective but relies heavily on bilingual resources (e.g., inter-language links in Wikipedia). The semantic-based approach maps mentions and entities in different languages to a unified embedding space, which reduces dependence on large-scale bilingual dictionaries. However, its effectiveness is limited by the representation capacity of fixed-length vectors. In this paper, we propose a pivot-based approach which inherits the advantages of the aforementioned two approaches while avoiding their limitations. It takes an intermediary set of plausible target-language mentions as pivots to bridge two types of gaps: the cross-lingual gap and the mention-entity gap. Specifically, it first converts mentions in the source language into an intermediary set of plausible mentions in the target language by cross-lingual semantic retrieval and a selective mechanism, and then retrieves candidate entities based on the generated mentions by lexical retrieval. The proposed approach relies only on a small bilingual word dictionary, and fully exploits the benefits of both lexical and semantic matching. Experimental results on two challenging cross-lingual entity linking datasets spanning 11 languages show that the pivot-based approach outperforms both the lexicon-based and semantic-based approaches by a large margin. All datasets and code will be made publicly available. |
Title: Predicting Customer Value with Social Relationships via Motif-based Graph Attention Networks |
Authors: Jinghua Piao (Department of Electronic Engineering, Tsinghua University), Guozhen Zhang (Department of Electronic Engineering, Tsinghua University), Fengli Xu (Department of Computer Science & Engineering, HKUST), Zhilong Chen (Department of Electronic Engineering, Tsinghua University) and Yong Li (Department of Electronic Engineering, Tsinghua University). |
Customer value is essential for successful customer relationship management. Although growing evidence suggests that customers' purchase decisions can be influenced by social relationships, social influence has largely been overlooked in previous research. In this work, we fill this gap with a novel framework, Motif-based Multi-view Graph Attention Networks with Gated Fusion (MAG), which jointly considers customer demographics, past behaviors, and social network structures. Specifically, (1) to make the best use of higher-order information in complex social networks, we design a motif-based multi-view graph attention module, which explicitly captures different higher-order structures, with the attention mechanism automatically assigning high weights to informative ones. (2) To model the complex joint effects of customer attributes and social influence, we propose a gated fusion module with two gates: one depicts the susceptibility to social influence and the other depicts the dependency between the two factors. Extensive experiments on a large-scale dataset show the superior performance of our model over state-of-the-art baselines. Further, we find that adding more motifs does not guarantee better performance, and we identify how different motifs play different roles. These findings shed light on how to understand socio-economic relationships among customers and find high-value customers. |
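A minimal sketch of the two-gate fusion described above: one gate scales how much social signal a customer admits, the other mixes the attribute and social views. The wiring and layer sizes are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Two-gate fusion of attribute and social-influence embeddings."""
    def __init__(self, dim):
        super().__init__()
        self.susceptibility = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.dependency = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, attr, social):
        social = self.susceptibility(attr) * social   # how open to influence
        g = self.dependency(torch.cat([attr, social], dim=-1))
        return g * attr + (1 - g) * social            # mix the two factors

fused = GatedFusion(32)(torch.randn(4, 32), torch.randn(4, 32))
```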
Title: Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset |
Authors: Ryan Amos (Princeton University), Gunes Acar (Katholieke Universiteit Leuven), Elena Lucherini (Princeton University), Mihir Kshirsagar (Princeton University), Arvind Narayanan (Princeton University) and Jonathan Mayer (Princeton University). |
Automated analysis of privacy policies has proved a fruitful research direction, with developments such as automated policy summarization, question answering systems, and compliance detection. So far, prior research has been limited to analyses of privacy policies from a single point in time or from short spans of time, as researchers did not have access to a large-scale, longitudinal, curated dataset. To address this gap, we developed a crawler that discovers, downloads, and extracts archived privacy policies from the Internet Archive's Wayback Machine. Using the crawler and following a series of validation and quality control steps, we curated a dataset of 1,071,488 English-language privacy policies, spanning more than two decades and more than 130,000 distinct websites. Our analyses of the data paint a troubling picture of the transparency and accessibility of privacy policies. We find evidence that the self-regulation industry has grown, but has largely been driven by advertising trade groups rather than first-party sites. Our results add to the literature demonstrating the widespread impact of the GDPR, which stands out in our data. By comparing the abundance of tracking-related terminology in our dataset against prior works' measurements, we find that privacy policies under-report the presence of many tracking technologies and of all of the most common third parties. We also find that privacy policies, already shown to be inaccessible, have become even more difficult to read over the last twenty years, doubling in length and increasing a full grade level in median reading level. |
Title: Progressive, Holistic Geospatial Interlinking |
Authors: George Papadakis (National and Kapodistrian University of Athens), Georgios Mandilaras (National and Kapodistrian University of Athens), Nikos Mamoulis (University of Ioannina) and Manolis Koubarakis (National and Kapodistrian University of Athens). |
Geospatial data constitute a considerable part of Semantic Web data, but at the moment, their sources are not sufficiently interlinked with topological relations in the Linked Open Data cloud. Geospatial Interlinking aims to cover this gap through space tiling techniques, which significantly restrict the search space. Yet, the state-of-the-art techniques operate exclusively in a batch manner that produces results only after processing the entire input datasets. In each run, they are also restricted to searching for an individual topological relation, even though most operations are common to the 10 main relations. In this work, we address both issues: we introduce a batch algorithm that simultaneously computes all topological relations, and we define the task of Progressive Geospatial Interlinking, which produces results in a pay-as-you-go manner when the available computational or temporal resources are limited. We propose two progressive algorithms and explain how they can be adapted to massive parallelization with Apache Spark. We conduct a thorough experimental study over six large, real datasets, demonstrating the superiority of our techniques over the current state-of-the-art. |
Title: Projected Hamming Dissimilarity for Bit-Level Importance Coding in Collaborative Filtering |
Authors: Christian Hansen (University of Copenhagen), Casper Hansen (University of Copenhagen), Jakob Grue Simonsen (Department of Computer Science, University of Copenhagen) and Christina Lioma (University of Copenhagen). |
When reasoning about tasks that involve large amounts of data, a common approach is to represent data items as objects in the Hamming space, where operations can be done efficiently and effectively. Object similarity can then be computed by learning binary representations (hash codes) of the objects and computing their Hamming distance. While this is highly efficient, each bit dimension is equally weighted, which means that potentially discriminative information in the data is lost. A more expressive alternative is to use real-valued vector representations and compute their inner product; this allows varying the weight of each dimension, but is many orders of magnitude slower. To fix this, we derive a new way of measuring the dissimilarity between two objects in the Hamming space with binary weighting of each dimension (i.e., disabling bits): we consider a field-agnostic dissimilarity that projects the vector of one object onto the vector of the other. When working in the Hamming space, this results in a novel projected Hamming dissimilarity, which, by choice of projection, effectively allows a binary importance weighting of the hash code of one object through the hash code of the other. We propose a variational hashing model for learning hash codes optimized for this projected Hamming dissimilarity, and evaluate it experimentally in collaborative filtering experiments. The resulting hash codes lead to effectiveness gains of up to +7% in NDCG and +14% in MRR compared to state-of-the-art hashing-based collaborative filtering baselines, while requiring no additional storage and no computational overhead compared to using the Hamming distance. |
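One plausible reading of the projection idea, shown only to fix intuition: the hash code of one object acts as a bit-level importance mask, so mismatches are counted only on the dimensions that code switches on. The 0/1 encoding and this exact masking rule are assumptions; consult the paper for the precise definition.

```python
import numpy as np

def projected_hamming_dissimilarity(user_code, item_code):
    """Hamming mismatches counted only where the user code has a 1 bit."""
    active = user_code == 1                  # bit-level importance mask
    return int(np.sum(user_code[active] != item_code[active]))

u = np.array([1, 0, 1, 1, 0, 1])
v = np.array([1, 1, 0, 1, 1, 1])
print(projected_hamming_dissimilarity(u, v))  # 1: only bit 2 mismatches
```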
Title: Quiz-Style Question Generation for News Stories |
Authors: Adam D. Lelkes (Google Research), Vinh Tran (Google Research) and Cong Yu (Google Research). |
A large majority of American adults get at least some of their news from the Internet. Even though many online news products aim to inform their users about the news, they lack scalable and reliable tools for measuring how well they achieve this goal, and therefore have to resort to noisy proxy metrics (e.g., click-through rates or reading time) to track their performance. As a first step towards measuring news informedness at scale, we study the problem of quiz-style multiple-choice question generation, which may be used to survey users about their knowledge of recent news. In particular, we formulate the problem as two sequence-to-sequence tasks: question-answer generation (QAG) and distractor (i.e., incorrect answer) generation (DG). We introduce NewsQuizQA, the first dataset intended for quiz-style question-answer generation, containing 20K human-written question-answer pairs from 5K news article summaries. Using this dataset, we propose a series of novel techniques for applying large pre-trained Transformer encoder-decoder models, namely PEGASUS and T5, to the tasks of question-answer generation and distractor generation. We show that our models outperform strong baselines using both automated metrics and human raters. We provide a case study of running weekly quizzes on real-world users via the Google Surveys platform over the course of two months. Users generally found the automatically generated questions to be educational and enjoyable. Finally, to serve the research community, we plan to release the NewsQuizQA dataset upon publication. |
Title: Rabbit Holes and Taste Distortion: Distribution-Aware Recommendation with Evolving Interests |
Authors: Xing Zhao (Texas A&M University), Ziwei Zhu (Texas A&M University) and James Caverlee (Texas A&M University). |
To mitigate the rabbit hole effect in recommendations, conventional distribution-aware recommendation systems aim to ensure that a user’s prior interest areas are reflected in the recommendations that the system makes. For example, a user who historically prefers comedies to dramas by 2:1 should see a similar ratio in recommended movies. Such approaches have proven to be an important building block for recommendation tasks. However, existing distribution-aware approaches enforce that the target taste distribution should exactly match a user’s prior interests (typically revealed through training data), based on the assumption that users’ taste distribution is fundamentally static. This assumption can lead to large estimation errors. We empirically identify this taste distortion problem through a data-driven study over multiple datasets. We show how taste preferences dynamically shift and how a calibration mechanism should be designed with these shifts in mind. We further demonstrate how to incorporate these shifts into a taste-enhanced calibrated recommender system, which simultaneously mitigates both the rabbit hole effect and the taste distortion problem. |
Title: Radflow: A Recurrent, Aggregated, and Decomposable Model for Networks of Time Series |
Authors: Alasdair Tran (Australian National University), Alexander Mathews (Australian National University), Cheng Soon Ong (CSIRO) and Lexing Xie (Australian National University). |
We propose a new model for networks of time series that influence each other. Graph structures among time series are found in diverse domains, such as web traffic influenced by hyperlinks, product sales influenced by recommendation, or urban transport volume influenced by the road network and weather. There has been recent progress in modeling graphs and time series, respectively, but an expressive and scalable approach for a network of series does not yet exist. We introduce Radflow, a novel model that embodies three main ideas: the recurrent structure of LSTM to obtain time-dependent node embeddings, aggregation of the flow of influence from other nodes with multi-head attention, and multi-layer decomposition of time series. Radflow naturally takes into account dynamic networks where nodes and edges appear over time, and it can be used for prediction and data imputation tasks. On four real-world datasets ranging from a few hundred to a few hundred thousand nodes, we observe that Radflow variants are the best-performing models across all tasks. We also report that the recurrent component in Radflow consistently outperforms N-BEATS, the state-of-the-art time series model. We show that Radflow can learn different trends and seasonal patterns, that it is robust to missing nodes and edges, and that correlated temporal patterns among network neighbors reflect influence strength. We curate WikiTraffic, the largest dynamic network of time series, with 360K nodes and 22M time-dependent links spanning five years; this dataset provides an open benchmark for developing models in this area, and for prototyping applications for problems such as estimating web resources and optimizing collaborative infrastructures. More broadly, Radflow can be used to improve the forecasts in correlated time series networks such as the stock market, or to impute missing measurements of natural phenomena such as geographically dispersed networks of waterbodies. |
Title: Random Graphs with Prescribed K-Core Sequences: A New Null Model for Network Analysis |
Authors: Katherine Van Koevering (Cornell University), Austin Benson (Cornell University) and Jon Kleinberg (Cornell University). |
In the analysis of large-scale network data, a fundamental operation is the comparison of observed phenomena to the predictions provided by null models: when we find an interesting structure in a family of real networks, it is important to ask whether this structure is also likely to arise in random networks with similar characteristics to the real ones. A long-standing challenge in network analysis has been the relative scarcity of reasonable null models for networks; arguably the most common such model has been the configuration model, which starts with a graph G and produces a random graph with the same node degrees as G. This leads to a very weak form of null model, since fixing the node degrees does not preserve many of the crucial properties of the network, including the structure of its subgraphs. Guided by this challenge, we propose a new family of network null models that operate on the k-core decomposition. For a graph G, the k-core is its maximal subgraph of minimum degree k; and the core number of a node v in G is the largest k such that v belongs to the k-core of G. We provide the first efficient sampling algorithm to solve the following basic combinatorial problem: given a graph G, produce a random graph sampled nearly uniformly from among all graphs with the same sequence of core numbers as G. This opens the opportunity to compare observed networks G with random graphs that exhibit the same core numbers, a comparison that preserves aspects of the structure of G that are not captured by more local measures like the degree sequence. We illustrate the power of this core-based null model on some fundamental tasks in network analysis, including the enumeration of network motifs. |
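For concreteness, the snippet below computes the core-number sequence that such a null model must preserve, using networkx's standard peeling implementation; the paper's near-uniform sampler itself is the contribution and is not sketched here.

```python
# Sketch: the core-number sequence a k-core-preserving null model must match.
# networkx's core_number implements the standard peeling algorithm.
import networkx as nx

G = nx.karate_club_graph()             # stand-in for an observed network
core = nx.core_number(G)               # node -> largest k with the node in the k-core
core_sequence = sorted(core.values())  # the invariant preserved by the null model

# By contrast, a configuration-model sample preserves only the degree sequence:
config = nx.configuration_model([d for _, d in G.degree()])
```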
Title: Random Walks with Erasure: Diversifying Personalized Recommendations on Social and Information Networks |
Authors: Bibek Paudel (Stanford University) and Abraham Bernstein (University of Zurich). |
Most existing information filtering systems promote items that match a user's previous choices or those that are popular among other similar users. This results in recommendations that are highly similar to the information users are already exposed to, isolating them inside familiar but insulated information silos. In this context, we propose to develop a recommendation framework with the goal of improving information diversity. We focus on the problem of political content recommendation without any assumption about the availability of labels regarding the bias of users or content producers. By exploiting social network signals, we propose to first estimate ideological positions for both users and the information items they share. Based on these positions, we then generate diversified personalized recommendations using a modified random-walk based recommendation algorithm. With experimental evaluations on large datasets of Twitter discussions, we show that our method based on random walks with erasure is able to generate more diverse recommendations. This research addresses a general problem and can be extended to recommendations in other domains, as we show with experiments on open benchmark datasets. |
Title: ReACt: A Resource-centric Access Control System for Web-app Interactions on Android |
Authors: Xin Zhang (SUNY Binghamton) and Yifan Zhang (SUNY Binghamton). |
We identify and survey five mechanisms through which web content interacts with mobile apps. While useful, these web-app interaction mechanisms cause various notable security vulnerabilities in mobile apps or web content. The root cause is the lack of proper access control mechanisms for web-app interactions on mobile OSes. Existing solutions usually adopt either an origin-centric design or a code-centric design, and suffer from one or several of the following limitations: coarse protection granularity, poor flexibility in access control policy establishment, and incompatibility with existing apps/OSes due to the need to modify the apps and/or the underlying OS. More importantly, none of the existing works can organically deal with all five web-app interaction mechanisms. In this paper, we propose ReACt, a novel Resource-centric Access Control design that can coherently work with all the web-app interaction mechanisms while addressing the above-mentioned limitations. We have implemented a prototype system on Android and performed extensive evaluation on it. The evaluation results show that our system works well with existing commercial off-the-shelf Android apps and different versions of Android OS, and that it achieves the design goals with small overhead. |
Title: RECON: Relation Extraction using Knowledge Graph Context in a Graph Neural Network |
Authors: Anson Bastos (Indian Institute of Technology, Hyderabad), Abhishek Nadgeri (RWTH Aachen and Zerotha Research), Kuldeep Singh (Cerence GmbH and Zerotha Research), Isaiah Onando Mulang (University of Bonn and Zerotha Research), Saeedeh Shekarpour (University of Dayton), Johannes Hoffart (Goldman Sachs) and Manohar Kaul (Indian Institute of Technology, Hyderabad). |
In this paper, we present a novel method named RECON that automatically identifies relations in a sentence (sentential relation extraction) and aligns them to a knowledge graph (KG). RECON uses a graph neural network to learn representations of both the sentence and the facts stored in a KG, improving the overall extraction quality. These facts, including entity attributes (label, alias, description, instance-of) and factual triples, have not been collectively used in state-of-the-art methods. We evaluate the effect of various forms of representing the KG context on the performance of RECON. The empirical evaluation on two standard relation extraction datasets shows that RECON significantly outperforms all state-of-the-art methods on the NYT Freebase and Wikidata datasets. RECON reports an 87.23 F1 score (vs. the 82.29 baseline) on the Wikidata dataset, whereas on NYT Freebase our reported values are 87.5 (P@10) and 74.1 (P@30), compared to the previous baseline scores of 81.3 (P@10) and 63.1 (P@30). |
Title: Reinforcement Recommendation with User Multi-aspect Preference |
Authors: Xu Chen (Gaoling School of Artificial Intelligence, Renmin University of China), Yali Du (University College London), Long Xia (School of Information Technology, York University) and Jun Wang (University College London). |
Formulating recommender systems with reinforcement learning (RL) frameworks has attracted increasing attention from both the academic and industry communities. While many promising results have been achieved, existing models mostly simulate the environment reward with a unified value, which may hinder the understanding of users' complex preferences and limit model performance. In this paper, we consider how to model user multi-aspect preferences in the context of an RL-based recommender system. More specifically, we base our model on the framework of deterministic policy gradient (DPG), which is effective in dealing with large action spaces. A major challenge in modeling user multi-aspect preferences lies in the fact that they may contradict each other. To solve this problem, we introduce Pareto optimization into the DPG framework. We assign each aspect a tailored critic, and all the critics share the same actor model. The Pareto optimization is realized by a gradient-based method, which can be easily integrated into the actor and critic learning process. Based on our designed model, we theoretically analyze its gradient bias in the optimization process, and we design a weight-reuse mechanism to lower the upper bound of this bias, which is shown to be effective for improving model performance. We conduct extensive experiments on different real-world datasets to demonstrate our model's superiority. |
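To make the multi-critic setup concrete, here is a minimal sketch, under assumed shapes and with a fixed convex combination standing in for the paper's gradient-based Pareto solver, of how per-aspect critic gradients could be aggregated into one actor update.

```python
# Minimal sketch: one shared actor, one critic per preference aspect.
# A fixed convex combination stands in for the paper's gradient-based
# Pareto solver; all shapes and names here are illustrative assumptions.
import numpy as np

def actor_update(actor_params, aspect_grads, weights, lr=1e-3):
    """aspect_grads[i]: gradient of aspect i's critic value w.r.t. the actor.
    Ascend the weighted combination so no single aspect dominates."""
    assert abs(sum(weights) - 1.0) < 1e-9 and min(weights) >= 0.0
    combined = sum(w * g for w, g in zip(weights, aspect_grads))
    return actor_params + lr * combined

params = np.zeros(4)
grads = [np.array([1.0, 0, 0, 0]), np.array([0, 1.0, 0, 0])]  # two aspects
params = actor_update(params, grads, weights=[0.5, 0.5])
```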
Title: REST: Reciprocal Framework for Spatiotemporal coupled predictions |
Authors: Haozhe Lin (Tsinghua University), Yushun Fan (Tsinghua University), Jia Zhang (Southern Methodist University) and Bing Bai (Tencent). |
In recent years, Graph Convolutional Networks (GCNs) have been applied to benefit spatiotemporal predictions. Current approaches to spatiotemporal prediction often rely heavily on the quality of handcrafted, fixed graphical structures; however, we argue that such a paradigm can be expensive and sub-optimal in many applications. To raise the bar, this paper proposes to jointly mine the spatial dependencies and model temporal patterns in a coupled framework, i.e., to make spatiotemporal-coupled predictions. We come up with a novel Reciprocal SpatioTemporal (REST) framework, which introduces Edge Inference Networks (EINs) to couple with GCNs. From the temporal side to the spatial side, EINs infer spatial dependencies among time series vertices and generate multi-modal directed weighted graphs to serve GCNs. And from the spatial side to the temporal side, GCNs utilize these spatial dependencies to make predictions and then provide feedback to optimize EINs. The REST framework is incrementally trained for higher spatiotemporal prediction performance, powered by the reciprocity between its two constituent components in this iterative joint learning process. Additionally, to maximize the power of the REST framework, we design a phased heuristic approach, which effectively stabilizes the training procedure and prevents early stopping. Extensive experiments on two real-world datasets demonstrate that the proposed REST framework significantly outperforms baselines and can learn meaningful spatial dependencies beyond predefined graphical structures. (The code will be released upon paper acceptance.) |
Title: REST: Relational Event-driven Stock Trend Forecasting |
Authors: Wentao Xu (Sun Yat-sen University), Weiqing Liu (Microsoft), Chang Xu (Microsoft), Jiang Bian (Microsoft), Jian Yin (Sun Yat-sen University) and Tie-Yan Liu (Microsoft). |
Stock trend forecasting, which aims to predict future stock trends, is crucial for investors seeking to maximize profits from the stock market. In recent years, many event-driven methods have utilized events extracted from news, social media, and discussion boards to forecast stock trends. However, existing event-driven methods have two main shortcomings: 1) overlooking the influence of event information differentiated by stock-dependent properties; 2) neglecting the effect of event information from other related stocks. In this paper, we propose a relational event-driven stock trend forecasting (REST) framework, which addresses the shortcomings of existing methods. To remedy the first shortcoming, we propose to model the stock context and learn the effect of event information on stocks under different contexts. To address the second shortcoming, we construct a stock graph and design a new propagation layer to propagate the effect of event information from related stocks. Experimental studies on real-world data demonstrate the efficiency of our REST framework, and the results of an investment simulation show that our framework can achieve a higher return on investment than the baselines. |
Title: RETA: A Schema-Aware, End-to-End Solution for Instance Completion in Knowledge Graphs |
Authors: Paolo Rosso (University of Fribourg), Dingqi Yang (University of Fribourg), Natalia Ostapuk (University of Fribourg) and Philippe Cudré-Mauroux (University of Fribourg). |
Knowledge Graph (KG) completion has been widely studied to tackle the incompleteness issue (i.e., missing facts) in modern KGs. A fact in a KG is represented as a triplet (h,r,t) linking two entities h and t via a relation r. Existing work mostly considers link prediction to solve this problem, i.e., given two elements of a triplet, predicting the missing one, such as (h,r,?). This task, however, makes a strong assumption about the two given elements of a triplet, which have to be correlated; otherwise it results in meaningless predictions, such as (Marie Curie, headquarters location, ?). In addition, the KG completion problem has also been formulated as a relation prediction task, i.e., predicting relations r for a given entity h. Without predicting t, however, this task falls a step short of the ultimate goal of KG completion. Against this background, this paper studies an instance completion task suggesting r-t pairs for a given h, i.e., (h,?,?). We propose an end-to-end solution called RETA (as it suggests the Relation and Tail for a given head entity) consisting of two components: a RETA-Filter and a RETA-Grader. More precisely, our RETA-Filter first generates candidate r-t pairs for a given h by extracting and leveraging the schema of a KG; our RETA-Grader then evaluates and ranks the candidate r-t pairs considering the plausibility of both the candidate triplet and its corresponding schema using a newly designed KG embedding model. We evaluate our methods against a sizable collection of state-of-the-art techniques on three real-world KG datasets. Results show that our RETA-Filter generates high-quality candidate r-t pairs, outperforming the best baseline techniques while reducing the candidate size by 10.61%-84.75% under the same candidate quality guarantees. Moreover, our RETA-Grader also significantly outperforms state-of-the-art link prediction techniques on the instance completion task by 16.25%-65.92% across different datasets. |
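The schema-based filtering idea can be pictured with a toy index from head-entity types to the (relation, tail-type) pairs observed in the KG; the sketch below uses hypothetical data and simplified logic, in the spirit of RETA-Filter rather than its exact algorithm.

```python
# Hedged sketch of schema-based candidate generation in the spirit of
# RETA-Filter; the toy triples, types, and filtering logic are illustrative.
from collections import defaultdict

triples = [("MarieCurie", "fieldOfWork", "Physics"),
           ("Einstein", "fieldOfWork", "Physics"),
           ("Google", "headquartersLocation", "MountainView")]
etype = {"MarieCurie": "Person", "Einstein": "Person", "Google": "Company",
         "Physics": "Field", "MountainView": "City"}

# Schema: which (relation, tail-type) pairs co-occur with each head type.
schema = defaultdict(set)
for h, r, t in triples:
    schema[etype[h]].add((r, etype[t]))

def candidate_rt(head):
    """All (relation, tail) pairs schema-compatible with head's type, ruling
    out meaningless ones like (MarieCurie, headquartersLocation, ?)."""
    return [(r, e) for (r, tt) in schema[etype[head]]
            for e, ty in etype.items() if ty == tt and e != head]

print(candidate_rt("MarieCurie"))  # [('fieldOfWork', 'Physics')]
```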
Title: RetaGNN: Relational Temporal Attentive Graph Neural Networks for Holistic Sequential Recommendation |
Authors: Cheng Hsu (National Cheng Kung University) and Cheng-Te Li (National Cheng Kung University). |
Sequential recommendation (SR) aims to accurately recommend a list of items for a user based on the items she has recently accessed. As new users continuously arrive in the real world, one crucial task is to have inductive SR that can produce embeddings of users and items without re-training. Given that user-item interactions can be extremely sparse, another critical task is to have transferable SR that can transfer the knowledge derived from one domain with rich data to another domain. In this work, we aim to present holistic SR that simultaneously accommodates the conventional, inductive, and transferable settings. We propose a novel deep learning-based model, Relational Temporal Attentive Graph Neural Networks (RetaGNN), for holistic SR. The main idea of RetaGNN is three-fold. First, to have inductive and transferable capabilities, we train a relational attentive GNN on the local subgraph extracted from a user-item pair, in which the learnable weight matrices are on the various relations among users, items, and attributes, rather than on nodes or edges. Second, long-term and short-term temporal patterns of user preferences are encoded by a proposed sequential self-attention mechanism. Third, a relation-aware regularization term is devised for better training of RetaGNN. Experiments conducted on the MovieLens, Instagram, and Book-Crossing datasets show that RetaGNN can outperform state-of-the-art methods under the conventional, inductive, and transferable settings. The derived attention weights also bring model explainability. |
Title: Revisiting the Evaluation Protocol of Knowledge Graph Completion Methods for Link Prediction |
Authors: Sudhanshu Tiwari (Rochester Institute of Technology), Iti Bansal (Rochester Institute of Technology) and Carlos R. Rivero (Rochester Institute of Technology). |
Completion methods learn models to infer missing (subject, predicate, object) triples in knowledge graphs, a task known as link prediction. The training phase is based on samples of positive triples and their negative counterparts. The test phase consists of ranking each positive triple with respect to its negative counterparts based on the scores obtained by a learned model. The best model ranks all positive triples first. Metrics like mean rank, mean reciprocal rank, and hits at k are used to assess accuracy. Under this generic evaluation protocol, we observe several shortcomings: 1) Current metrics assume that each measurement is upper bounded by the same constant value and are therefore oblivious to the fact that, in link prediction, each positive triple has a different number of negative counterparts, which alters the difficulty of ranking positive triples. 2) Benchmarking datasets contain anomalies (unrealistic redundancy) that allegedly simplify link prediction; however, current instantiations of the generic evaluation protocol do not integrate anomalies, which are simply discarded based on a user-defined threshold. 3) Benchmarking datasets have been randomly split, which typically alters the graph topology and results in the training split not resembling the original dataset. 4) A single model is typically kept based on its accuracy over the validation split using a given metric; however, since metrics aggregate ranks into a single value, there may be no significant differences among the ranks produced by several models, which must then all be evaluated in the test phase. In this paper, we contribute to the evaluation of link prediction as follows: 1) We propose a variation of the mean rank that considers the number of negative counterparts. 2) We define the anomaly coefficient of a predicate and integrate this coefficient into the protocol. 3) We propose a downscaling algorithm to generate training splits that reflect the original graph topology, based on a nonparametric, unpaired statistical test. 4) During validation, we discard a learned model only if its output ranks are significantly different from other ranks based on a nonparametric, paired statistical test. Our experiments over three well-known datasets show that translation-based methods (TransD, TransE and TransH) significantly outperform recent methods, which entails that our understanding of the accuracy of completion methods for link prediction is far from perfect. |
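As a worked illustration of why a fixed upper bound misleads, the sketch below contrasts the raw mean rank with a per-triple normalization by candidate-set size; this normalization is an illustrative stand-in, not necessarily the paper's exact metric.

```python
# Illustrative only: raw mean rank vs. a variant that rescales each rank by
# its own candidate-set size (an assumed stand-in, not the paper's formula).

def mean_rank(ranks):
    return sum(ranks) / len(ranks)

def adjusted_mean_rank(ranks, num_candidates):
    """Map rank r out of n candidates to (r - 1) / (n - 1): 0 is a perfect
    rank, 1 the worst, regardless of how many negatives the triple has."""
    return sum((r - 1) / (n - 1)
               for r, n in zip(ranks, num_candidates)) / len(ranks)

# Rank 5 out of 10 candidates is far worse than rank 5 out of 10,000,
# yet the raw mean rank treats them identically:
print(mean_rank([5, 5]))                        # 5.0 either way
print(adjusted_mean_rank([5, 5], [10, 10000]))  # ~0.22, dominated by the hard case
```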
Title: RIGA: Covert and Robust White-Box Watermarking of Deep Neural Networks |
Authors: Tianhao Wang (Harvard University) and Florian Kerschbaum (University of Waterloo). |
Watermarking deep neural networks (DNNs) can enable a data owner to trace models once they have been released. In this paper, we generalize white-box watermarking algorithms for DNNs, where the data owner needs white-box access to the model to extract the watermark. White-box watermarking algorithms have the advantage that they do not impact the accuracy of the watermarked model. We propose Robust whIte-box GAn watermarking (RIGA), a novel white-box watermarking algorithm that uses adversarial training. Our extensive experiments demonstrate that the proposed watermarking algorithm not only does not impact accuracy, but also significantly improves covertness and robustness over the current state of the art. |
Title: Robust Android Malware Detection Against Adversarial Example Attacks |
Authors: Heng Li (Huazhong University of Science and Technology), Shiyao Zhou (The Hong Kong Polytechnic University), Wei Yuan (Huazhong University of Science and Technology), Xiapu Luo (The Hong Kong Polytechnic University), Cuiying Gao (Huazhong University of Science and Technology) and Shuiyan Chen (Huazhong University of Science and Technology). |
Adversarial examples pose severe threats to Android malware detection because they can render machine learning-based detection systems useless. How to effectively detect Android malware under various adversarial example attacks is therefore an essential but very challenging issue. Existing adversarial example defense mechanisms usually rely heavily on instances or knowledge of the adversarial examples, and their usability and effectiveness are thus significantly limited because they often cannot resist unseen types of adversarial examples. In this paper, we propose a novel robust Android malware detection approach that can resist adversarial examples without requiring their instances or knowledge, by jointly investigating malware detection and adversarial example defenses. More precisely, our approach employs a new VAE (variational autoencoder) and an MLP (multi-layer perceptron) to detect malware, and combines their detection outcomes to make the final decision. In particular, we share a feature extraction network between the VAE and the MLP to reduce model complexity and design a new loss function to disentangle the features of different classes, hence improving detection performance. Extensive experiments confirm our model's advantage in accuracy and robustness. Our method outperforms 11 state-of-the-art robust Android malware detection models when resisting 7 kinds of adversarial example attacks. |
Title: Robust Generalization and Safe Query-Specialization in Counterfactual Learning to Rank |
Authors: Harrie Oosterhuis (University of Amsterdam) and Maarten de Rijke (University of Amsterdam & Ahold Delhaize). |
Existing work in counterfactual Learning to Rank (LTR) has focused on optimizing feature-based models that predict the optimal ranking based on document features. LTR methods based on bandit algorithms often optimize tabular models that memorize the optimal ranking per query. These types of models have their own advantages and disadvantages. Feature-based models provide very robust performance across many queries, including previously unseen ones; however, the available features often limit the rankings the model can predict. In contrast, tabular models can converge on any possible ranking through memorization. However, memorization is extremely prone to noise, which makes tabular models reliable only when large numbers of user interactions are available. Can we develop a robust counterfactual LTR method that pursues memorization-based optimization whenever it is safe to do so? We introduce the Generalization and Specialization (GENSPEC) algorithm, a robust feature-based counterfactual LTR method that pursues per-query memorization when it is safe to do so. GENSPEC optimizes a single feature-based model for generalization, i.e., robust performance across all queries, and many tabular models for specialization, each optimized for high performance on a single query. GENSPEC uses novel relative high-confidence bounds to choose which model to deploy per query. By doing so, GENSPEC enjoys the high performance of successfully specialized tabular models with the robustness of a generalized feature-based model. Our results show that GENSPEC leads to optimal performance on queries with sufficient click data, while having robust behavior on queries with little or noisy data. |
Title: Robust Network Alignment via Attack Signal Scaling and Adversarial Perturbation Elimination |
Authors: Yang Zhou (Auburn University), Zeru Zhang (Auburn University), Sixing Wu (Peking University), Victor Sheng (Texas Tech University), Xiaoying Han (Auburn University), Zijie Zhang (Auburn University) and Ruoming Jin (Kent State University). |
Recent studies have shown that graph learning models are highly vulnerable to adversarial attacks, and network alignment methods are no exception. How to enhance the robustness of network alignment against adversarial attacks remains an open research problem. In this paper, we propose a robust network alignment solution, RNA, that offers preemptive protection of existing network alignment algorithms, enhanced with the guidance of effective adversarial attacks. First, we analyze how popular iterative gradient-based adversarial attack techniques suffer from gradient vanishing issues and give a false sense of attack effectiveness. Based on dynamical isometry theory, an attack signal scaling (ASS) method with an established upper bound on feasible signal scaling is introduced to alleviate the gradient vanishing issues for effective adversarial attacks while maintaining the decision boundary of network alignment. Second, we develop an adversarial perturbation elimination (APE) model to neutralize adversarial nodes in the vulnerable space into adversarial-free nodes in the safe area, by integrating Dirac delta approximation (DDA) techniques and LSTM models. Our proposed APE method is able to provide proactive protection to existing network alignment algorithms against adversarial attacks. Theoretical analysis demonstrates the existence of an optimal distribution for the APE model to reach a lower bound. Last but not least, extensive evaluation on real datasets shows that RNA is able to offer preemptive protection to trained network alignment methods against three popular adversarial attack models. |
Title: Role-Aware Modeling for N-ary Relational Knowledge Bases |
Authors: Yu Liu (Tsinghua University), Quanming Yao (4Paradigm Inc) and Yong Li (Tsinghua University). |
N-ary relational knowledge bases (KBs) represent knowledge with binary and beyond-binary relational facts. In particular, in an n-ary relational fact, the involved entities play different roles, e.g., the ternary relation PlayCharacterIn consists of three roles: Actor, Character and Movie. However, existing approaches are often directly extended from binary relational KBs, i.e., knowledge graphs, and thus miss the important semantic property of roles. Therefore, we start from the role level and propose Role-Aware Modeling, RAM for short, for facts in n-ary relational KBs. RAM explores a latent space that contains basis vectors and represents roles by linear combinations of these vectors. This encourages semantically related roles to have close representations. RAM further introduces a pattern matrix that captures the compatibility between a role and all involved entities. It then presents a multilinear scoring function to measure the plausibility of a fact composed of certain roles and entities. We show that RAM achieves both full theoretical expressiveness and computational efficiency, and it also provides an elegant generalization of approaches for binary relational KBs. Experiments demonstrate that RAM outperforms representative baselines on both n-ary and binary relational datasets. |
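A hedged sketch of the two core ideas follows: roles as linear combinations of shared basis vectors, and a multilinear plausibility score. The shapes, weights, and DistMult-style scoring below are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: roles as linear combinations of shared basis vectors plus a
# DistMult-style multilinear score; all shapes and weights are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_basis, dim = 4, 8
basis = rng.normal(size=(n_basis, dim))   # shared latent basis vectors

def role_embedding(weights):
    """A role is a linear combination of the basis, so semantically related
    roles (similar weights) end up with close representations."""
    return np.asarray(weights) @ basis

def score(role_embs, entity_embs):
    """Multilinear plausibility of an n-ary fact: elementwise products of all
    (role, entity) pairs, summed over dimensions (a DistMult-like stand-in)."""
    x = np.ones(dim)
    for rho, e in zip(role_embs, entity_embs):
        x *= rho * e
    return float(x.sum())

actor = role_embedding([0.7, 0.2, 0.1, 0.0])
character = role_embedding([0.6, 0.3, 0.1, 0.0])  # close to `actor`
movie = role_embedding([0.1, 0.1, 0.4, 0.4])
entities = rng.normal(size=(3, dim))              # toy entity embeddings
print(score([actor, character, movie], entities))
```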
Title: Rumor Detection with Field of Linear and Non-Linear Propagation |
Authors: An Lao (Beijing Institute of Technology), Chongyang Shi (Beijing Institute of Technology) and Yayi Yang (Beijing Institute of Technology). |
The propagation of rumors is a complex and varied phenomenon. In the process of rumor dissemination, in addition to rumor claims, there is abundant social context information surrounding the rumor. It is therefore vital to learn the characteristics of rumors in terms of both the linear temporal sequence and the non-linear diffusion structure simultaneously. However, in some recent research, time-dependent and diffusion-related information has not been fully utilized. Accordingly, in this paper, we propose a novel model, Rumor Detection with Field of Linear and Non-Linear Propagation (RDLNP), which attempts to detect rumors from the above two fields automatically by taking advantage of claim content, social context, and temporal information. First, the rumor hybrid feature learning (RHFL) module we designed extracts the correlations between claims and temporal information in order to differentiate the hybrid features of specific posts and generate unified node embeddings for rumors. Second, we propose non-linear structure learning (NLSL) and linear sequence learning (LSL) to integrate contextual features along the path of the diffusion structure and the temporal variation of response engagement, respectively. Moreover, the introduction of stance attention grants LSL the ability to flexibly capture the dependency between the source node and the child nodes. Finally, shared feature learning (SFL) models the representation reinforcement and mutual influence between NLSL and LSL, and highlights their valuable features. Experiments conducted on two public and widely used datasets, i.e., PHEME and RumorEval, demonstrate both the effectiveness and the outstanding performance of the proposed approach. |
Title: Scalable Auto-weighted Discrete Multi-view Clustering |
Authors: Longqi Yang (State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology), Liangliang Zhang (Beijing Institute of System Engineering) and Yuhua Tang (State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology). |
Multi-view clustering has been widely studied in machine learning; it uses complementary information to improve clustering performance. However, challenges remain when handling large-scale multi-view data due to the high time complexity of traditional approaches. Besides, existing approaches suffer from the problem of parameter selection. Due to the lack of labeled data, parameter selection in practical clustering applications is difficult, especially on big data. In this paper, we propose a novel approach for large-scale multi-view clustering to overcome the above challenges. Our approach focuses on learning the low-dimensional binary embedding of multi-view data, preserving the samples' local structure during binary embedding, and optimizing the embedding and clustering in a unified framework. Furthermore, we propose to learn the parameters using a combination of data-driven and heuristic approaches. Experiments on five large-scale multi-view datasets show that the proposed method is superior to the state of the art in terms of clustering quality and running time. |
Title: SDFVAE: Static and Dynamic Factorized VAE for Anomaly Detection of Multivariate CDN KPIs |
Authors: Liang Dai (Institute of Information Engineering, Chinese Academy of Sciences), Tao Lin (Communication University of China), Chang Liu (Institute of Information Engineering, Chinese Academy of Sciences & University of Chinese Academy of Sciences), Bo Jiang (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University), Yanwei Liu (Institute of Information Engineering, Chinese Academy of Sciences), Zhen Xu (Institute of Information Engineering, Chinese Academy of Sciences) and Zhi-Li Zhang (University of Minnesota). |
Content Delivery Networks (CDNs) are critical for providing a good user experience of cloud services. CDN providers typically collect various multivariate Key Performance Indicator (KPI) time series to monitor and diagnose system performance. State-of-the-art anomaly detection methods mostly use deep learning to extract the normal patterns of data, due to its superior performance. However, KPI data usually exhibit non-additive Gaussian noise, which makes it difficult for deep learning models to learn the normal patterns, resulting in degraded anomaly detection performance. In this paper, we propose a robust and noise-resilient anomaly detection mechanism using multivariate KPIs. Our key insight is that different KPIs are constrained by certain time-invariant characteristics of the underlying system, and that explicitly modelling such invariance may help resist noise in the data. We thus propose a novel anomaly detection method called SDFVAE, short for Static and Dynamic Factorized VAE, that learns the representations of KPIs by explicitly factorizing the latent variables into dynamic and static parts. Extensive experiments using real-world data show that SDFVAE achieves an F1-score ranging from 0.92 to 0.99 on both regular and noisy datasets, outperforming state-of-the-art methods by a large margin. |
Title: Search Engines vs. Symptom Checkers: A Comparison of their Effectiveness for Online Health Advice |
Authors: Sebastian Cross (University of Queensland), Ahmed Mourad (The University of Queensland), Guido Zuccon (The University of Queensland) and Bevan Koopman (CSIRO). |
Increasingly, people go online to seek health advice. They commonly use the symptoms they are experiencing to identify the health conditions they may have (the self-diagnosis task) as well as to determine an appropriate action to take (the triaging task); e.g., should they seek emergency medical attention or attempt to treat themselves at home? This paper investigates the effectiveness of two of the most common methods people use for self-diagnosis and triaging: online symptom checkers and traditional web search engines. To this end, we conducted a user study with 64 real-world users performing 8 simulated self-diagnosis tasks. Participants were exposed to both a representative symptom checker and a search engine. The results of our study provide empirical evidence on whether using a search engine for health information improves people's understanding of their health conditions and their ability to act on them, compared to interacting with a symptom checker, which bases its interaction model on a question-answering process. Additionally, recorded answers to qualitative questionnaires from study participants provide insights into which style of interaction and system they prefer to use for obtaining medical information, and how helpful they thought each system was. These findings can help inform the development of better search engines and symptom checkers that support people seeking health advice online. |
Title: Searching to Sparsify Tensor Decomposition for N-ary relational data |
Authors: Shimin Di (The Hong Kong University of Science and Technology), Quanming Yao (4Paradigm Inc.) and Lei Chen (The Hong Kong University of Science and Technology). |
Tensors, an extension of vectors and matrices to the multi-dimensional case, are a natural way to describe N-ary relational data. Recently, tensor decomposition methods have been introduced to N-ary relational data and have become the state of the art in embedding learning. However, the performance of existing tensor decomposition methods is not as good as desired. First, they suffer from the data-sparsity issue, since they can only learn from N-ary relational data with a specific arity, i.e., part of the available N-ary relational data. Besides, they are neither effective nor efficient enough to train due to the over-parameterization problem. In this paper, we propose a novel method, S2S, for effectively and efficiently learning from N-ary relational data. Specifically, we propose a new tensor decomposition framework, which allows embedding sharing to learn from facts with mixed arities. Since the core tensors may still suffer from over-parameterization, and inspired by the success of neural architecture search (NAS) in designing data-dependent architectures, we propose to reduce parameters by sparsifying the core tensors while retaining their expressive power using NAS techniques. As a result, the proposed S2S is not only guaranteed to be expressive but also learns efficiently from mixed arities. Finally, empirical results demonstrate that S2S is efficient to train and achieves state-of-the-art performance. |
Title: Security of Alerting Authorities in the WWW: Measuring Namespaces, DNSSEC, and Web PKI |
Authors: Pouyan Fotouhi Tehrani (Weizenbaum Institute / Fraunhofer FOKUS), Eric Osterweil (GMU), Jochen Schiller (FU Berlin), Thomas Schmidt (HAW Hamburg) and Matthias Wählisch (FU Berlin). |
During disasters, crises, and emergencies, the public relies on online services provided by official authorities to receive timely alerts, trustworthy information, and access to relief programs. It is therefore crucial for the authorities to reduce the risks involved in accessing their online services. At a minimum, this requires secure identification of the service, secure resolution of names to network services, and content security and privacy as a base for trustworthy communication. In this paper, we take a first look at Alerting Authorities (AA) in the US and investigate security measures related to trustworthy and secure communication. We study the domain namespace structure, DNSSEC penetration, and web certificates. We introduce an integrative threat model to better understand whether and how the online presence and services of AAs are harmed. As an illustrative example, we investigate 1,388 Alerting Authorities, backed by the United States Federal Emergency Management Agency (US FEMA). We observe partially heightened security relative to global Internet trends, yet find cause for concern, as about 80% of service providers fail to deploy measures for trustworthy service provision. Our analysis shows two major shortcomings. First, in how the DNS ecosystem is leveraged: about 50% of organizations do not own their dedicated domain names and are dependent on others, 55% opt for unrestricted-use namespaces, which simplifies phishing, and less than 0.4% of unique AA domain names are secured by DNSSEC, which can lead to DNS poisoning and possibly to certificate misissuance. Second, in how Web PKI certificates are utilized: 15% of all hosts provide no or invalid certificates and thus cannot cater to confidentiality and data integrity, 64% of the hosts provide domain validation certificates that lack any identity information, and shared certificates have gained in popularity, which leads to fate-sharing and can be a cause of instability. |
Title: Self-Supervised Hyperboloid Representations from Logical Queries over Knowledge Graphs |
Authors: Nurendra Choudhary (Virginia Tech), Nikhil Rao (Amazon), Sumeet Katariya (Amazon), Karthik Subbian (Amazon) and Chandan K. Reddy (Virginia Tech). |
Knowledge Graphs (KGs) are ubiquitous structures for information storage in several real-world applications such as web search, e-commerce, social networks, and biology. Querying KGs remains a foundational and challenging problem due to their size and complexity. Promising approaches to tackle this problem include embedding the KG units (e.g., entities and relations) in a Euclidean space such that the query embedding contains the information relevant to its results. These approaches, however, fail to capture the hierarchical nature and semantic information of the entities present in the graph. Additionally, most of these approaches only utilize multi-hop queries (which can be modeled by simple translation operations) to learn embeddings and ignore more complex operations such as the intersection and union of simpler queries. To tackle such complex operations, in this paper, we formulate KG representation learning as a self-supervised logical query reasoning problem that utilizes translation, intersection, and union queries over KGs. We propose Hyperboloid Embeddings (HypE), a novel self-supervised dynamic reasoning framework that utilizes positive first-order existential queries on a KG to learn representations of its entities and relations as hyperboloids in a Poincaré ball. HypE models the positive first-order queries as geometric translation, intersection, and union. For the problem of KG reasoning on real-world datasets, the proposed HypE model significantly outperforms state-of-the-art results. We also apply HypE to an anomaly detection task on a popular e-commerce website product taxonomy as well as hierarchically organized web articles and demonstrate significant performance improvements compared to existing baseline methods. Finally, we visualize the learned HypE embeddings in a Poincaré ball to clearly interpret and comprehend the representation space. |
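For orientation, the snippet below computes the standard Poincaré-ball distance, the basic geometric primitive such hyperbolic embeddings rely on; HypE's translation, intersection, and union operators build richer machinery on top of it and are not sketched here.

```python
# Standard Poincare-ball distance between two points of norm < 1; this is
# only the geometric primitive, not HypE's query operators.
import numpy as np

def poincare_dist(u, v):
    duv = np.dot(u - v, u - v)
    denom = (1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v))
    return float(np.arccosh(1.0 + 2.0 * duv / denom))

# Points near the boundary are exponentially far apart, which is what makes
# the ball a natural home for hierarchies:
print(poincare_dist(np.array([0.0, 0.0]), np.array([0.5, 0.0])))
print(poincare_dist(np.array([0.9, 0.0]), np.array([0.0, 0.9])))
```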
Title: Self-Supervised Learning of Contextual Embeddings for Link Prediction in Heterogeneous Networks |
Authors: Ping Wang (Virginia Tech), Khushbu Agarwal (Pacific Northwest National Laboratory), Colby Ham (Pacific Northwest National Laboratory), Sutanay Choudhury (Pacific Northwest National Laboratory) and Chandan K. Reddy (Virginia Tech). |
Representation learning methods for heterogeneous networks produce a low-dimensional vector embedding for each node that is typically fixed for all tasks involving the node. Many of the existing methods focus on obtaining a static vector representation for a node in a way that is agnostic to the downstream application where it is used. In practice, however, downstream tasks such as link prediction require specific contextual information that can be extracted from the subgraphs related to the nodes provided as input to the task. To tackle this challenge, we develop SLiCE, a framework that bridges static representation learning methods, which use global information from the entire graph, with localized attention-driven mechanisms to learn contextual node representations. We first pre-train our model in a self-supervised manner by introducing higher-order semantic associations and masking nodes, and then fine-tune our model for a specific link prediction task. Instead of training node representations by aggregating information from all semantic neighbors connected via metapaths, we automatically learn the composition of different metapaths that characterize the context for a specific task, without the need for any pre-defined metapaths. SLiCE significantly outperforms both static and contextual embedding learning methods on several publicly available benchmark network datasets. We also demonstrate the interpretability, the effectiveness of contextual learning, and the scalability of SLiCE through extensive evaluation. |
Title: Self-Supervised Multi-Channel Hypergraph Convolutional Network for Social Recommendation |
Authors: Junliang Yu (The University of Queensland), Hongzhi Yin (The University of Queensland), Jundong Li (University of Virginia), Qinyong Wang (The University of Queensland), Nguyen Quoc Viet Hung (Griffith University) and Xiangliang Zhang (King Abdullah University of Science and Technology). |
Social relations are often used to improve recommendation quality when user-item interaction data is sparse in recommender systems. Most existing social recommendation models exploit pairwise relations to mine potential user preferences. However, real-life interactions among users are much more complicated than we can imagine, and relations can be high-order. In light of this, there is a need to think beyond pairwise interactions. A hypergraph provides a natural way to model complex high-order relations, and its potential for social recommendation has rarely been fully exploited. In this paper, we fill this gap and propose a multi-channel hypergraph convolutional network to enhance social recommendation by leveraging high-order user relations. Each channel in the network encodes a motif-induced hypergraph which depicts a common high-order user relation pattern in social recommender systems. By aggregating the embeddings learned via multiple channels, we obtain comprehensive user representations to generate recommendation results. However, the aggregation operation (pooling/attention mechanism) might obscure the features of different types of high-order information and discard the inherent characteristics of users. To compensate for the aggregating loss and fully inherit the rich information in the hypergraphs, we innovatively integrate self-supervised learning into the training of the hypergraph convolutional network. The self-supervised task serves as an auxiliary task to regularize the user representation by hierarchically maximizing the mutual information between representations of the user, the user-centered sub-hypergraph, and the hypergraph in each channel, with the intuition that the aggregated user representation should reflect the user node's local and global high-order connectivity patterns in different hypergraphs. Experimental results on multiple real-world datasets show that the proposed model outperforms the state-of-the-art baselines, and further analysis verifies the rationality and effectiveness of the self-supervised task. |
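One channel of such a network can be pictured as a standard hypergraph convolution over a motif-induced incidence matrix; the sketch below shows a single propagation step under assumed dense inputs (no isolated nodes or empty hyperedges), not the paper's full multi-channel model.

```python
# Sketch of one hypergraph-convolution step, D_v^{-1} H D_e^{-1} H^T X W,
# over a motif-induced incidence matrix H. Assumes no isolated nodes or
# empty hyperedges; illustrative only.
import numpy as np

def hypergraph_conv(H, X, W):
    """H: (nodes x hyperedges) 0/1 incidence; X: node features; W: weights."""
    De = H.sum(axis=0)            # hyperedge degrees
    Dv = H.sum(axis=1)            # node degrees
    agg = (H / De) @ (H.T @ X)    # node -> hyperedge -> node averaging
    return (agg / Dv[:, None]) @ W

H = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)  # 3 nodes, 2 hyperedges
X = np.eye(3)                                        # one-hot node features
print(hypergraph_conv(H, X, np.eye(3)))
```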
Title: Semi-Open Information Extraction |
Authors: Bowen Yu (Institute of Information Engineering Chinese Academy of Sciences), Zhenyu Zhang (Institute of Information Engineering, Chinese Academy of Sciences), Jiawei Sheng (Institute of Information Engineering, Chinese Academy of Sciences), Tingwen Liu (Institute of Information Engineering, Chinese Academy of Sciences), Yubin Wang (Institute of Information Engineering, Chinese Academy of Sciences), Yucheng Wang (Institute of Information Engineering, Chinese Academy of Sciences) and Bin Wang (Xiaomi AI Lab, Xiaomi Inc). |
Open Information Extraction (OIE), the task of discovering all textual facts within a sentence, organized in the form of (subject, predicate, object) triples, has gained much attention recently. However, in some knowledge-driven applications such as question answering, we often have a target entity and hope to obtain its structured factual knowledge for better understanding, instead of extracting all possible facts aimlessly from the corpus. In this paper, we define a new task, Semi-Open Information Extraction (SOIE), to address this need. The goal of SOIE is to discover domain-independent facts about a particular entity from general and diverse web text. To facilitate research on this new task, we propose a large-scale human-annotated benchmark called SOIED, consisting of 61,984 facts for 8,013 subject entities annotated on 24,000 Chinese sentences collected from a web search engine. In addition, we propose a novel unified model called USE for this task. First, we introduce a subject-guided sequence as input to a pre-trained language model and normalize the hidden representations conditioned on the subject embedding to encode the sentence in a subject-aware manner. Second, we decompose SOIE into three uncoupled subtasks: predicate extraction, object extraction, and boundary alignment. They can all be formulated as table-filling problems by forming a two-dimensional tag table based on a task-specific tagging scheme. Third, we introduce a collaborative learning strategy that enables the interactive relations among subtasks to be better exploited by explicitly exchanging informative clues. Finally, we evaluate USE and several strong baselines on our new dataset. Experimental results demonstrate the advantages of the proposed method and reveal insights for future improvement. |
Title: SEPAL: Towards a Large-scale Analysis of SEAndroid Policy Customization |
Authors: Dongsong Yu (School of Cyber Security, UCAS and Institute of Information Engineering, CAS), Guangliang Yang (Georgia Institute of Technology), Guozhu Meng (Institute of Information Engineering, Chinese Academy of Sciences, China), Xiaorui Gong (Institute of Information Engineering, Chinese Academy of Sciences, China), Xiu Zhang (School of Cyber Security, UCAS and Institute of Information Engineering, CAS), Xiaobo Xiang (School of Cyber Security, UCAS and Institute of Information Engineering, CAS), Xiaoyu Wang (School of Cyber Security, UCAS and Institute of Information Engineering, CAS), Yue Jiang (School of Cyber Security, UCAS and Institute of Information Engineering, CAS), Kai Chen (Institute of Information Engineering, Chinese Academy of Sciences, China), Wei Zou (Institute of Information Engineering, Chinese Academy of Sciences, China), Wenke Lee (Georgia Institute of Technology) and Wenchang Shi (Renmin University of China, MOE, Beijing, China). |
Nowadays, SEAndroid is widely deployed in Android devices to enforce security policies and provide flexible mandatory access control (MAC), for the purpose of narrowing down attack surfaces and restricting risky operations (e.g., privilege escalation). Mobile device manufacturers have to customize policy rules and add their own rules to satisfy their functionality extensions. Ideally, these policy rules in SEAndroid should be carefully written, verified, and maintained. However, many security issues have been found during the course of SEAndroid policy customization. Even worse, it is challenging to identify these issues due to the large and ever-increasing number of policy rules, as well as the complexity of policy semantics. To investigate the status quo of SEAndroid policy customization, we propose SEPAL, a universal tool to automatically retrieve and examine customized policy rules. We perform a lightweight static analysis to extract atomic rules and runtime permissions from Android firmware. Then, we employ natural language processing to construct a variety of features from both policy rules and their comments. A wide & deep model is trained to predict whether a rule is unregulated or not. SEPAL proves effective in classifying AOSP policy rules, outperforming EASEAndroid by 15% in accuracy on average. To evaluate its practicality, we collect 774 Android firmware images from 70 distinct manufacturers and extract 595,236 customized rules. SEPAL identifies 7,111 unregulated rules with a low false positive rate. With the help of SEPAL, we study the distribution of the unregulated rules. It shows that, thanks to Google’s efforts, the security issues reported in earlier studies were significantly reduced in the Android 7 era. However, after Android 8, the policy customization problem is worsening again with the growing complexity of the policy: nearly 20% of the customized atomic rules in Android 9 are unregulated, while the percentage in Android 7 is less than 8%. We then summarize four common reasons why unregulated rules are introduced by policy developers and how these rules compromise all categories of defenses provided by the original SEAndroid. We further conduct two proof-of-concept attacks to validate their severity. Last, we reported some unregulated rules to seven vendors, and four of them confirmed our findings. |
Title: SEPAR: Towards Regulating Future of Work Multi-Platform Crowdworking Environments with Privacy Guarantees |
Authors: Mohammadjavad Amiri (University of Pennsylvania), Joris Duguépéroux (Univ Rennes, CNRS, IRISA), Tristan Allard (Univ Rennes, CNRS, IRISA), Divyakant Agrawal (University of California Santa Barbara) and Amr El Abbadi (University of California Santa Barbara). |
Crowdworking platforms provide the opportunity for diverse workers to execute tasks for different requesters. The popularity of the "gig" economy has given rise to independent platforms that provide competing and complementary services. Workers as well as requesters with specific tasks may need to work for, or avail themselves of the services of, multiple platforms, resulting in the rise of multi-platform crowdworking systems. Recently, there has been increasing interest by governmental, legal, and social institutions in enforcing regulations, such as minimal and maximal work hours, on crowdworking platforms. Platforms within multi-platform crowdworking systems therefore need to collaborate to enforce cross-platform regulations. While collaborating to enforce global regulations requires the transparent sharing of information about tasks and their participants, the privacy of all participants needs to be preserved. In this paper, we propose an overall vision exploring the regulation, privacy, and architecture dimensions of future-of-work multi-platform crowdworking environments. We then present SEPAR, a multi-platform crowdworking system that enforces a large sub-space of practical global regulations on a set of distributed independent platforms in a privacy-preserving manner. SEPAR enforces privacy using lightweight and anonymous tokens, while transparency is achieved using fault-tolerant blockchains shared across multiple platforms. The privacy guarantees of SEPAR against covert adversaries are formalized and thoroughly demonstrated, and experiments reveal the efficiency of SEPAR in terms of performance and scalability. |
Title: Session-aware Linear Item-Item Models for Session-based Recommendation |
Authors: Minjin Choi (Sungkyunkwan University), Jinhong Kim (Sungkyunkwan University), Joonseok Lee (Google Research), Hyunjung Shim (Yonsei University) and Jongwuk Lee (Sungkyunkwan University). |
Session-based recommendation aims at predicting the next item given a sequence of items previously consumed in the session, e.g., on e-commerce or multimedia streaming services. Session data exhibits unique characteristics: topical coherence and sequential dependency over items within the session, repeated item consumption, and timeliness of sessions. In this paper, we propose simple yet effective session-aware linear models that consider these holistic aspects of sessions. This comprehensive nature of our models helps improve the quality of recommendations. More importantly, it provides a generalized framework for various types of session data. Because our models can be solved by a closed-form solution, they are highly scalable. Experimental results demonstrate that our simple linear models show competitive or state-of-the-art performance in various metrics on multiple real-world datasets. |
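The closed-form solvability can be illustrated with the plain (non-session-aware) linear item-item model of the EASE family, which models of this kind generalize; the regularizer and shapes below are illustrative assumptions.

```python
# Minimal sketch of a closed-form linear item-item model (EASE-style); the
# paper's session-aware variants extend this, so treat the shapes and the
# regularization value as illustrative assumptions.
import numpy as np

def fit_item_item(X, reg=100.0):
    """X: (sessions x items) binary matrix. Returns item-item weights B with
    a zero diagonal so an item never predicts itself."""
    G = X.T @ X + reg * np.eye(X.shape[1])  # regularized Gram matrix
    P = np.linalg.inv(G)
    B = P / (-np.diag(P))                   # closed-form solve, column-wise
    np.fill_diagonal(B, 0.0)
    return B

X = np.random.default_rng(0).integers(0, 2, size=(50, 8)).astype(float)
B = fit_item_item(X)
scores = X[0] @ B   # next-item scores for the first session
```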
Title: Sinkhorn Collaborative Filtering |
Authors: Xiucheng Li (Nanyang Technological University), Jin Yao Chin (Nanyang Technological University), Yile Chen (Nanyang Technological University) and Gao Cong (Nanyang Technological University). |
Recommender systems play a vital role in modern web services. In a typical recommender system, we are given a set of observed user-item interaction records and seek to uncover the hidden behavioral patterns of users from these historical interactions. By exploiting these hidden patterns, we aim to discover users' personalized tastes and recommend new items to them. Among the various types of recommendation methods, latent factor collaborative filtering models are the dominant ones. In this paper, we develop a unified view of the existing latent factor models from a probabilistic perspective. This is accomplished by interpreting the latent representations as being drawn from their corresponding underlying representation distributions, and feeding the representations into a transformation function to obtain the parameters of the observation sampling distribution. The unified framework enables us to discern the underlying connections between different latent factor models and deepen our understanding of their advantages and limitations. In particular, we observe that the loss functions adopted by existing models are oblivious to the geometry induced by item similarity and thus might lead to undesired models. To address this, we propose a novel model, SinkhornCF, based on the Sinkhorn divergence. To address the challenge of the expensive computational cost of the Sinkhorn divergence, we also propose new techniques that enable the resulting model to scale to large datasets. Its effectiveness is verified on two real-world recommendation datasets. |
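For readers unfamiliar with the primitive, the sketch below runs the standard Sinkhorn fixed-point iteration for entropy-regularized optimal transport; the divergence used by SinkhornCF is built from such transport costs, and embedding it into a collaborative filtering loss is the paper's contribution, not shown here.

```python
# Standard Sinkhorn iteration for entropy-regularized optimal transport;
# illustrative primitive only, not the SinkhornCF training objective.
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.1, iters=200):
    """a, b: nonnegative histograms summing to 1; C: pairwise cost matrix."""
    K = np.exp(-C / eps)             # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)            # alternating scaling updates
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return float((P * C).sum())

a = np.array([0.5, 0.5]); b = np.array([0.25, 0.75])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
print(sinkhorn_cost(a, b, C))        # small cost: the histograms are close
```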
Title: Situation and Behavior Understanding by Trope Detection on Films |
Authors: Chen-Hsi Chang (National Taiwan University), Hung-Ting Su (National Taiwan University), Juiheng Hsu (National Taiwan University), Yu-Siang Wang (University of Toronto), Yu-Cheng Chang (National Taiwan University), Zhe Yu Liu (National Taiwan University), Ya-Liang Chang (National Taiwan University), Wen-Feng Cheng (National Taiwan University), Ke-Jyun Wang (National Taiwan University) and Winston Hsu (National Taiwan University). |
Deep cognitive skills are crucial for the development of various real-world applications that process diverse and abundant user-generated input. While recent progress in deep learning and natural language processing has enabled learning systems to reach human performance on some benchmarks requiring shallow semantics, such human abilities still remain challenging for even modern contextual embedding models, as pointed out by many recent studies. Existing machine comprehension datasets assume sentence-level input, lack causal or motivational inferences, or could be answered with question-answer bias. Here, we present a challenging novel task, trope detection on films, in an effort to create situation and behavior understanding for machines. Tropes are storytelling devices that are frequently used as ingredients in recipes for creative works. Compared to existing movie tag prediction tasks, tropes are more sophisticated, as they can vary widely, from a moral concept to a series of circumstances, and embed motivations and cause-and-effect relations. We introduce a new dataset, Tropes in Movie Synopses (TiMoS), with 5623 movie synopses and 95 different tropes collected from a Wikipedia-style database, TVTropes. We present a multi-stream comprehension network (MulCom) leveraging multi-level attention over words, sentences, and role relations. Experimental results demonstrate that modern models, including BERT contextual embedding, movie tag prediction systems, and relational networks, reach at most 37% of human performance (23.97/64.87) in terms of F1 score. Our MulCom outperforms all modern baselines by 1.5 to 5.0 F1 score and 1.5 to 3.0 mean average precision (mAP) score. We also provide a detailed analysis and human evaluation to pave ways for future research. |
Title: Sketch-Based Algorithms for Approximate Shortest Paths in Road Networks |
Authors: Gaurav Aggarwal (Google), Sreenivas Gollapudi (Google), Raghavender R (Google) and Ali Kemal Sinop (Google). |
Constructing efficient data structures (distance oracles) for fast computation of shortest paths and other connectivity measures in graphs has been a promising area of study in computer science [TZ05, PR10, SGNP10]. In this paper, we propose a distance oracle for computing approximate shortest paths and alternate paths in road networks that exploits the existence of small separators in such networks. The existence of a small cut in a graph allows us to partition the graph into balanced components with a small number of inter-component edges. Specifically, we demonstrate the efficacy of our algorithm by using it to find near optimal shortest paths. Our algorithm also has the desired properties of ALT algorithms [Goldberg, Harrelson'05], such as exploring very few edges and fast query times. We further extend our distance oracle to produce multiple alternative routes, and we empirically demonstrate that our method, while exploring few edges, produces high quality alternates with respect to optimality-loss and diversity of paths. |
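As background on the ALT property referenced above, ALT guides A* search with an admissible heuristic obtained from precomputed landmark distances via the triangle inequality; a minimal sketch (function and argument names are illustrative, not from the paper):

```python
def alt_lower_bound(dists_u, dists_v):
    # dists_u[i], dists_v[i]: precomputed shortest-path distances from
    # landmark i to u and v. The triangle inequality gives, per landmark L:
    #   d(u, v) >= |d(L, u) - d(L, v)|
    return max(abs(du - dv) for du, dv in zip(dists_u, dists_v))
```

A* with this lower bound as its heuristic explores few edges while remaining exact, which is the behavior the proposed separator-based oracle also targets. |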
Title: Slot Self-Attentive Dialogue State Tracking |
Authors: Fanghua Ye (University College London), Jarana Manotumruksa (University College London), Qiang Zhang (University College London), Shenghui Li (Uppsala University) and Emine Yilmaz (University College London). |
An indispensable component in task-oriented dialogue systems is the dialogue state tracker, which keeps track of users' intentions in the course of a conversation. The typical approach towards this goal is to fill in multiple pre-defined slots that are essential to complete the task. Although various dialogue state tracking methods have been proposed in recent years, most of them predict the value of each slot separately and fail to consider the correlations among slots. In this paper, we propose a slot self-attention mechanism that can learn the slot correlations automatically. Specifically, a slot-token attention is first utilized to obtain slot-specific features from the dialogue context. Then a stacked slot self-attention is applied on these features to learn the correlations among slots. We conduct comprehensive experiments on two multi-domain task-oriented dialogue datasets, including MultiWOZ 2.0 and MultiWOZ 2.1. The experimental results demonstrate that our approach achieves state-of-the-art performance on both datasets, which verifies the necessity and effectiveness of taking slot correlations into consideration. |
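The two-stage attention described above can be sketched in a few lines; this is an illustrative outline under assumed module names and dimensions, not the authors' released architecture:

```python
import torch.nn as nn

class SlotSelfAttention(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        # Stage 1: each slot queries the dialogue context tokens
        self.slot_token_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Stage 2: stacked self-attention over slots learns slot correlations
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.slot_self_attn = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, slot_emb, ctx_emb):
        # slot_emb: (batch, n_slots, d_model); ctx_emb: (batch, n_tokens, d_model)
        slot_feat, _ = self.slot_token_attn(slot_emb, ctx_emb, ctx_emb)
        return self.slot_self_attn(slot_feat)
```

Each slot's value is then predicted from its correlation-aware feature rather than in isolation. |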
Title: Soft-mask: Adaptive Substructure Extractions for Graph Neural Networks |
Authors: Mingqi Yang (Dalian University of Technology), Yanming Shen (Dalian University of Technology), Heng Qi (Dalian University of Technology) and Baocai Yin (Dalian University of Technology). |
For learning graph representations, not all detailed structures within a graph are relevant to the given graph tasks. Task-relevant structures can be localized or sparse: they may be confined to subgraphs or characterized by the interactions of subgraphs (a hierarchical perspective). A graph neural network should be able to efficiently extract task-relevant structures and be invariant to irrelevant parts, which is challenging for general message-passing GNNs. In this work, we propose to learn graph representations from a sequence of subgraphs of the original graph to better capture task-relevant substructures or hierarchical structures and skip noisy parts. To this end, we design a soft-mask GNN layer that extracts desired subgraphs through a mask mechanism. The soft mask is defined in a continuous space to maintain differentiability and to characterize the weights of different parts. Compared with existing subgraph or hierarchical representation learning methods and graph pooling operations, the soft-mask GNN layer is not limited by a fixed sample or drop ratio, and is therefore more flexible in extracting subgraphs of arbitrary size. Extensive experiments on public graph benchmarks show that the soft-mask mechanism brings performance improvements. It also provides interpretability: visualizing the values of the masks in each layer allows us to gain insight into the structures learned by the model. |
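A minimal sketch of the soft-mask idea (a learnable mask in (0, 1) applied to node features, keeping the layer differentiable); the paper's layer is more elaborate, so names and structure here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SoftMaskLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, h):
        # h: (n_nodes, dim) node features from a preceding GNN layer.
        mask = torch.sigmoid(self.scorer(h))  # continuous per-node mask in (0, 1)
        return h * mask, mask                 # masked features + mask for inspection
```

Because the mask is continuous rather than a hard top-k selection, subgraphs of arbitrary size can be emphasized, and the returned mask values provide the interpretability mentioned above. |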
Title: SRVAR: Joint Discrete Hidden State Discovery and Structure Learning from Time Series Data |
Authors: Tsung-Yu Hsieh (The Pennsylvania State University), Yiwei Sun (The Pennsylvania State University), Xianfeng Tang (The Pennsylvania State University), Suhang Wang (The Pennsylvania State University) and Vasant Honavar (The Pennsylvania State University). |
Learning DAG-structured inter-variable dependencies from observational data has a wide range of potential applications, given the convenient explainability that is favored in today's high-stakes applications of artificial intelligence. In a variety of scientific disciplines, the data generation mechanism exhibits time-varying characteristics, which calls for effective methods to address the problem of time-varying structure learning from time-series data. Meanwhile, many practical time-varying systems exhibit state-transitioning behavior, and state space models provide strong generalization ability and interpretable results. This makes learning time-varying systems with state space models appealing. Against this background, we study the novel problem of jointly discovering discrete hidden states and learning dynamic structure from multivariate time series data, and introduce the State-Regularized Vector Autoregression Model (SRVAR). SRVAR exploits a state-regularized recurrent neural network to discover the underlying finite discrete state transition pattern, while leveraging a dynamic vector autoregression model together with a recent algebraic result to learn state-dependent inter-variable dependencies. Results of extensive experiments on simulated data as well as a real-world dataset show the superiority of SRVAR over state-of-the-art baselines at recovering the unobserved state transitions and discovering the state-dependent inter-variable relationships. |
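The state-dependent autoregressive core described above corresponds to a switching vector autoregression of the general form (the exact parameterization in the paper may differ):

$$x_t = \sum_{k=1}^{p} A_k^{(s_t)}\, x_{t-k} + \epsilon_t,$$

where $s_t$ is the discrete hidden state at time $t$ and the nonzero pattern of the state-specific coefficient matrices $A_k^{(s_t)}$ encodes the state-dependent inter-variable dependencies to be recovered. |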
Title: STAN: Spatio-Temporal Attention Network for next Point-of-Interest Recommendation |
Authors: Yingtao Luo (University of Washington), Qiang Liu (RealAI) and Zhaocheng Liu (RealAI). |
Next Point-of-Interest recommendation is at the core of various location-based applications. Current state-of-the-art models have attempted to solve spatial sparsity with hierarchical gridding and model temporal relations with explicit time intervals, while some vital questions remain unsolved. Non-adjacent locations and non-consecutive visits provide non-trivial correlations for understanding a user's behavior, but were rarely considered in previous models. To aggregate all relevant visits from a user trajectory and recall the most plausible candidates from weighted representations, here we propose a Spatio-Temporal Attention Network (STAN) for location recommendation. STAN explicitly exploits relative spatio-temporal information of all the check-ins with self-attention layers along the trajectory. This improvement allows point-to-point interaction between non-adjacent locations and non-consecutive check-ins with explicit spatio-temporal effect. STAN uses a bi-layer self-attention architecture that first aggregates spatio-temporal correlation within the user trajectory and then recalls the target with consideration of personalized item frequency (PIF). By visualization, we show that STAN is in line with the above intuition. Experimental results unequivocally show that our model outperforms the existing state-of-the-art methods by more than 10%. |
Title: Stimuli-Sensitive Hawkes Processes for Personalized Student Procrastination Modeling |
Authors: Mengfan Yao (University at Albany - SUNY), Siqian Zhao (University at Albany - SUNY), Shaghayegh Sahebi (University at Albany - SUNY) and Reza Feyzi Behnagh (University at Albany - SUNY). |
Student procrastination and cramming for deadlines are major challenges in online learning environments, with negative educational and mental health side effects. Modeling student activities in continuous time and predicting their next study time are important problems that can help in creating personalized timely interventions to mitigate these challenges. However, previous attempts at dynamic modeling of student procrastination suffer from major issues: they are unable to predict the next activity times, cannot deal with missing activity history, are not personalized, and ignore important course properties, such as assignment deadlines, that are essential in explaining cramming behavior. To resolve these problems, we introduce a new personalized stimuli-sensitive Hawkes process model (SSHP), which jointly models all student-assignment pairs and utilizes their similarities, to predict students' next activity times even when there are no historical observations. Unlike regular point processes that assume a constant external triggering effect from the environment, we model three dynamic types of external stimuli, according to assignment availabilities, assignment deadlines, and each student's time management habits. Our experiments on a synthetic dataset and two real-world datasets show superior performance in future activity prediction compared with state-of-the-art models. Moreover, we show that our model achieves a flexible and accurate parameterization of student activity intensities. |
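For context, a Hawkes process models a student's activity rate with the conditional intensity

$$\lambda(t) = \mu(t) + \sum_{t_i < t} \phi(t - t_i),$$

where past activities at times $t_i$ self-excite future activity through the kernel $\phi$. The abstract's key departure is that the exogenous rate $\mu(t)$, usually a constant, is made stimuli-sensitive, varying with assignment availability, deadlines, and the student's habits (this display is the standard formulation, not the paper's full model). |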
Title: Stochastic bandits for multi-platform budget optimization in online advertising |
Authors: Vashist Avadhanula (Facebook), Riccardo Colini Baldeschi (Facebook), Stefano Leonardi (Sapienza University of Rome), Karthik Abinav Sankararaman (Facebook) and Okke Schrijvers (Facebook). |
We study the problem of an online advertising system that wants to optimally spend an advertiser's given budget for a campaign across multiple platforms, without knowing the value of showing an ad to the users on those platforms. We model this challenging practical application as a Stochastic Bandits with Knapsacks problem over $T$ rounds of bidding, with the set of arms given by the set of distinct bidding $m$-tuples, where $m$ is the number of platforms. This paper makes three contributions. First, we give algorithms that efficiently spend the budget across platforms for both discrete and continuous bid spaces: despite the exponential number of arms in the stochastic bandits modeling, the regret only grows polynomially with $m$, $T$, the size $n$ of the discrete bid space of the platforms, and the budget $B$. Namely, for discrete bid spaces we give an algorithm with regret $O\left(\mathrm{OPT}\sqrt{\frac{mn}{B}} + \sqrt{mn\,\mathrm{OPT}}\right)$, where $\mathrm{OPT}$ is the performance of the optimal algorithm that knows the distributions. For continuous bid spaces the regret of our algorithm is $\tilde{O}\left(m^{1/3} \cdot \min\left\{B^{2/3}, (mT)^{2/3}\right\}\right)$. Second, we show an $\Omega\left(\sqrt{m\,\mathrm{OPT}}\right)$ lower bound for the discrete case and an $\Omega\left(m^{1/3} B^{2/3}\right)$ lower bound for the continuous setting, almost matching the upper bounds. Finally, we use a real-world dataset from a large internet online advertising company with multiple ad platforms and show that our algorithms outperform common benchmarks. |
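To make the arms-as-bid-tuples modeling concrete, here is a plain UCB1 sketch over discrete bid $m$-tuples; it ignores the budget/knapsack constraint and enumerates all $n^m$ arms, which is exactly the blow-up the paper's algorithms avoid, so treat it only as an illustration of the problem setup:

```python
import itertools, math

def ucb_over_bid_tuples(bid_levels, m, T, reward_fn):
    # Arms: all m-tuples of discrete bids (one bid per platform).
    arms = list(itertools.product(bid_levels, repeat=m))
    counts = {a: 0 for a in arms}
    sums = {a: 0.0 for a in arms}
    for t in range(1, T + 1):
        def ucb(a):
            if counts[a] == 0:
                return float("inf")  # play each arm at least once
            return sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
        arm = max(arms, key=ucb)
        counts[arm] += 1
        sums[arm] += reward_fn(arm)  # observed value of this bid tuple
    return max(arms, key=lambda a: sums[a] / max(counts[a], 1))
```

The paper's contribution is precisely that regret need not scale with the $n^m$ arm count, but only polynomially in $m$, $n$, $T$, and $B$. |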
Title: Strongly Local Hypergraph Diffusions for Clustering and Semi-supervised Learning |
Authors: Meng Liu (Computer Science, Purdue University), Nate Veldt (Center for Applied Mathematics, Cornell University), Haoyu Song (Computer Science, Purdue University), Pan Li (Computer Science, Purdue University) and David Gleich (Computer Science, Purdue University). |
Hypergraph-based machine learning methods are now widely recognized as important for modeling and using higher-order and multiway relationships between data objects. Local hypergraph clustering and semi-supervised learning specifically involve finding a well-connected set of nodes near a given set of labeled vertices. Although many methods for local clustering exist for graphs, there are relatively few for localized clustering in hypergraphs. Moreover, those that exist often lack flexibility to model a general class of hypergraph cut functions or cannot scale to large problems. To tackle these issues, this paper proposes a new diffusion-based hypergraph clustering algorithm that solves a quadratic hypergraph-cut-based objective akin to a hypergraph analog of Andersen-Chung-Lang personalized PageRank clustering for graphs. We prove that, for hypergraphs with fixed maximum hyperedge size, this method is strongly local, meaning that its runtime only depends on the size of the output instead of the size of the hypergraph, and is highly scalable. Moreover, our method enables us to compute with a wide variety of cardinality-based hypergraph cut functions. We also prove that the clusters found by solving the new objective function satisfy a Cheeger-like quality guarantee. We demonstrate that on large real-world hypergraphs our new method finds better clusters and runs much faster than existing approaches. Specifically, it runs in a few seconds for hypergraphs with a few million hyperedges, compared with minutes for a flow-based technique. We furthermore show that our framework is general enough that it can also be used to solve other p-norm-based cut objectives on hypergraphs. |
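As graph-case background, the Andersen-Chung-Lang diffusion referenced above computes a personalized PageRank vector $\pi$ solving

$$\pi = \alpha s + (1 - \alpha)\, \pi P,$$

with seed distribution $s$ and random-walk matrix $P$, and admits strongly local solvers whose cost scales with the output cluster. The paper's objective plays the analogous role for hypergraphs, with cardinality-based hyperedge cut functions replacing graph edges (this display is standard background, not the paper's exact objective). |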
Title: Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion |
Authors: Bo Wang (JiLin University), Tao Shen (University of Technology Sydney), Guodong Long (University of Technology Sydney), Tianyi Zhou (University of Washington), Ying Wang (JiLin University) and Yi Chang (Jilin University). |
Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks, but these graphs are usually incomplete, urging auto-completion of them (a.k.a. knowledge graph completion). Prevalent graph embedding approaches, e.g., TransE, learn structured knowledge by representing graph elements (i.e., entities/relations) as dense embeddings and capturing their triple-level relationship with spatial distance. However, they are hardly generalizable to the elements never visited in training and are intrinsically vulnerable to graph incompleteness. In contrast, textual encoding approaches, e.g., KG-BERT, resort to graph triples' text and triple-level contextualized representations. They are generalizable enough and robust to the incompleteness, especially when coupled with pre-trained encoders. But two major drawbacks limit the performance: (1) high overheads due to the costly scoring of all possible triples in inference, and (2) a lack of structured knowledge in the textual encoder. In this paper, we follow the textual encoding paradigm and aim to alleviate its drawbacks by augmenting it with graph embedding techniques -- a complementary hybrid of both paradigms. Specifically, we partition each triple into two asymmetric parts as in translation-based graph embedding approaches, and encode both parts into contextualized representations with a Siamese-style textual encoder. Built upon the representations, our model employs both a deterministic classifier and a spatial measurement for representation and structure learning, respectively. It thus reduces the overheads by reusing graph elements' embeddings to avoid combinatorial explosion, and enhances structured knowledge by exploring the spatial characteristics. Moreover, we develop a self-adaptive ensemble scheme to further improve the performance by incorporating triple scores from an existing graph embedding model. In experiments, we achieve state-of-the-art performance on three benchmarks and a zero-shot dataset for link prediction, with inference costs reduced by 1-2 orders of magnitude compared to a sophisticated textual encoding method. |
Title: STruD: Truss Decomposition of Simplicial Complexes |
Authors: Giulia Preti (ISI Foundation), Gianmarco De Francisci Morales (ISI Foundation) and Francesco Bonchi (ISI Foundation). |
A simplicial complex is a generalization of a graph: a collection of n-ary relationships (instead of binary as the edges of a graph), named simplices. In this paper, we develop a new tool to study the structure of simplicial complexes: we generalize the graph notion of truss decomposition to complexes, and show that this more powerful representation gives rise to different properties compared to the graph-based one. This power, however, comes with important computational challenges derived from the combinatorial explosion caused by the downward closure property of complexes. Drawing upon ideas from itemset mining and similarity search, we design a memory-aware algorithm, dubbed STruD, which is able to efficiently compute the truss decomposition of a simplicial complex. STruD adapts its behavior to the amount of available memory by storing intermediate data in a compact way. We then devise a variant that computes directly the n simplices of maximum trussness. By applying STruD to several datasets, we prove its scalability, and provide an analysis of their structure. Finally, we show that the truss decomposition can be seen as a "filtration", and as such it can be used to study the persistent homology of a dataset, a method for computing topological features at different spatial resolutions, prominent in Topological Data Analysis. |
Title: STUaNet: Understanding uncertainty in spatiotemporal collective human mobility |
Authors: Zhengyang Zhou (University of Science and Technology of China), Yang Wang (University of Science and Technology of China), Xike Xie (University of Science and Technology of China), Lei Qiao (Beijing Institute of Control Engineering) and Yuantao Li (University of Science and Technology of China). |
The high dynamics and heterogeneous interactions in complicated urban systems have raised the issue of uncertainty quantification in spatiotemporal human mobility, which supports critical decision-making in risk-aware web applications such as urban event detection and commercial promotion, where fluctuations are of significant interest. Although uncertainty quantifies the potential variations around prediction results, traditional learning schemes lack definite uncertainty labels for training, and conventional uncertainty quantification approaches mostly rely upon statistical estimation with dropout-based Bayesian neural networks or ensemble methods. However, these approaches do not capture the spatiotemporal evolution of uncertainties under various contexts, and they suffer from the poor efficiency of statistical uncertainty estimation, which requires training models multiple times. To provide high-quality uncertainty quantification for spatiotemporal forecasting, we propose an interpretable uncertainty learning mechanism that simultaneously estimates internal data quality and quantifies external uncertainty regarding various contextual interactions. To address the issue of lacking uncertainty labels, we propose a hierarchical data turbulence scheme in which we can actively inject controllable uncertainty for guidance, and hence provide insights into both uncertainty quantification and weakly supervised learning. Finally, we re-calibrate and boost the prediction performance by devising a gated bridge that adaptively leverages the learned uncertainty in predictions. Extensive experiments on three real-world spatiotemporal mobility datasets corroborate the superiority of our proposed model in terms of both forecasting and uncertainty quantification. |
Title: SUGAR: Subgraph Neural Network with Reinforcement Pooling and Self-Supervised Mutual Information Mechanism |
Authors: Qingyun Sun (Beihang University), Hao Peng (Beihang University), Jianxin Li (Beihang University), Yuanxing Ning (Beihang University), Jia Wu (Macquarie University), Philip S. Yu (University of Illinois at Chicago) and Lifang He (Lehigh University). |
Graph representation learning has attracted increasing research attention. However, most existing studies fuse all structural features and node attributes to provide an overarching view of graphs, neglecting finer substructures' semantics, and suffering from interpretation enigmas. This paper presents a novel hierarchical subgraph-level selection and embedding based graph neural network for graph classification, namely SUGAR, to learn more discriminative subgraph representations and respond in an explanatory way. SUGAR reconstructs a sketched graph by extracting striking subgraphs as the representative part of the original graph to reveal subgraph-level patterns. To adaptively select striking subgraphs without prior knowledge, we develop a reinforcement pooling mechanism, which improves the generalization ability of the model. To differentiate subgraph representations among graphs, we present a self- supervised mutual information mechanism to encourage subgraph embedding to be mindful of the global graph structural properties by maximizing their mutual information. Extensive experiments on six typical bioinformatics datasets demonstrate a significant and consistent improvement in model quality with competitive performance and interpretability. |
Title: Superways: A Datacenter Topology for Incast-Heavy Workloads |
Authors: Hamed Rezaei (University of Illinois at Chicago) and Balajee Vamanan (University of Illinois at Chicago). |
Several important datacenter applications cause incast congestion, which severely degrades the flow completion times of short flows and the throughput of long flows. Further, because most flows are short and the incast duration is shorter than typical round-trip times, reactive mechanisms that rely on congestion control are not effective. While modern datacenter topologies provide high bisection bandwidth to support all-to-all traffic, incast is fundamentally a many-to-one traffic pattern, and therefore requires deep buffers or high bandwidth at the network edge. Deep buffers for high-speed routers are prohibitively expensive and incur high design complexity. Further, deep buffers would likely increase end-to-end delay and may render some congestion control schemes unstable. We propose Superways, a heterogeneous datacenter topology that provides higher bandwidth for some servers to absorb incasts, as incasts occur only at a small number of servers that aggregate responses from many senders. Our design is based on the key observation that a small subset of servers which aggregate responses are likely to be network bound, whereas most other servers that communicate only with random servers are not. Superways can be implemented over many of the existing datacenter topologies and can be expanded flexibly without incurring high cost and cabling complexity. We also provide a heuristic for scheduling jobs in our topology to fully utilize the extra capacity. Using a real CloudLab implementation and ns-3 simulations, we show that Superways significantly improves flow completion times and throughput over existing datacenter topologies. We also analyze cost and cabling complexity, and discuss how to expand our topology. |
Title: Surrounded by the Clouds |
Authors: Lorenzo Corneo (Uppsala University), Maximilian Eder (Technical University of Munich), Nitinder Mohan (Technical University of Munich), Aleksandr Zavodovski (Uppsala University), Suzan Bayhan (University of Twente), Walter Wong (University of Helsinki), Per Gunningberg (Uppsala University), Jussi Kangasharju (University of Helsinki) and Jörg Ott (Technical University of Munich). |
Cloud computing, with its seemingly unlimited storage capacity and computational capabilities, has become over the years the de facto computing paradigm that ensures scalability and performance where the limited hardware of commodity devices fails. In the early days, datacenters were sparsely deployed at locations distant from end-users and delivered high end-to-end communication latency. However, today's cloud datacenters have become more geographically spread, and the speed and bandwidth of networks keep increasing, pushing end-user latency down. In this paper, we provide a state-of-the-art cloud reachability study, performing extensive client-to-cloud latency measurements towards 200 datacenters deployed globally by the major cloud providers. We leverage the well-known measurement platform RIPE Atlas, involving in our study up to 12000 probes deployed in heterogeneous environments, e.g., homes and offices. In order to evaluate and quantify the current state of cloud computing, we compare our latency results against three known timing thresholds, namely human reaction time, perceivable latency, and motion-to-photon. These three thresholds provide a good meter for checking whether novel applications, e.g., augmented reality, can be supported by the current cloud infrastructure. We find that a good portion of the world's population can access cloud datacenters even within motion-to-photon latency, the most stringent threshold. |
Title: Target-adaptive Graph for Cross-target Stance Detection |
Authors: Bin Liang (School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)), Yonghao Fu (School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)), Lin Gui (Department of Computer Science, University of Warwick), Min Yang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Jiachen Du (School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)), Yulan He (Department of Computer Science, University of Warwick) and Ruifeng Xu (School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)). |
The target plays an essential role in stance detection of an opinionated review/claim, since the stance expressed in the text often depends on the target. In practice, we need to deal with targets unseen in the annotated training data. As such, detecting stance for an unknown or unseen target is an important research problem. This paper presents a novel approach that automatically identifies and adapts the target-dependent and target-independent roles that a word plays with respect to a specific target in stance expressions, so as to achieve cross-target stance detection. More concretely, we explore a novel solution of constructing heterogeneous target-adaptive semantic dependency graphs (TSDG) for each sentence towards a given target. An in-target graph is constructed to produce inherent semantic dependencies of words for a distinct target. In addition, another cross-target graph is constructed to develop the versatility of words across all targets for boosting the learning of dominant word-level stance expressions available to an unknown target. A novel graph-aware model with interactive Graph Convolutional Network (GCN) blocks is developed to derive the target-adaptive graph representation of the context for stance detection. The experimental results on a number of benchmark datasets show that our proposed model outperforms state-of-the-art methods in cross-target stance detection. |
Title: Task-adaptive Neural Process for User Cold-Start Recommendation |
Authors: Xixun Lin (Institute of Information Engineering, Chinese Academy of Sciences), Jia Wu (Department of Computing, Macquarie University), Chuan Zhou (Academy of Mathematics and Systems Science, Chinese Academy of Sciences), Shirui Pan (Monash University), Yanan Cao (Institute of Information Engineering, Chinese Academy of Sciences) and Bin Wang (Xiaomi AI Lab). |
User cold-start recommendation is a long-standing challenge for recommender systems, due to the fact that only a few interactions of cold-start users can be exploited. Recent studies investigate how to address this challenge from the perspective of meta learning. Most current meta-learning models follow a manner of parameter initialization, where the model parameters can be learned by a few steps of gradient updates. While these gradient-based meta-learning models achieve promising performance to some extent, a fundamental problem is how to adapt the global knowledge learned from previous tasks more effectively for the recommendations of cold-start users. In this paper, we develop a novel meta-learning model called task-adaptive neural process (TaNP). TaNP is a new member of the neural process family, where making recommendations for each user is associated with a corresponding stochastic process. TaNP directly maps the observed interactions of each user to a predictive distribution, sidestepping some training issues in gradient-based meta-learning models. More importantly, to balance the trade-off between model capacity and adaptation reliability, we introduce a novel task-adaptive mechanism, which enables TaNP to learn the relevance of different tasks as well as to customize the global knowledge to the task-related model parameters of our decoder for estimating user preferences. We validate TaNP on multiple benchmark datasets with different experimental settings. Empirical results demonstrate that TaNP yields consistent improvements over several state-of-the-art meta-learning recommenders. |
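As background, a neural process makes predictions by conditioning a latent task variable $z$ on the observed context $\mathcal{C}$ (here, a cold-start user's few interactions):

$$p(y^{*} \mid x^{*}, \mathcal{C}) = \int p(y^{*} \mid x^{*}, z)\; q(z \mid \mathcal{C})\, dz,$$

so that a single forward pass, rather than gradient-based fine-tuning, adapts the model to a new user; TaNP's task-adaptive mechanism additionally customizes the decoder parameters per task (the display is the generic neural-process formulation, not TaNP's full objective). |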
Title: Taxonomy-aware Learning for Few-shot Event Detection |
Authors: Jianming Zheng (National University of Defense Technology), Fei Cai (National University of Defense Technology), Wanyu Chen (National University of Defense Technology), Wengqiang Lei (National University of Singapore) and Honghui Chen (National University of Defense Technology). |
Event detection classifies unlabeled sentences into event labels, which can benefit numerous applications including information retrieval, question answering and script learning. One of the major obstacles to event detection in practice is insufficient training data. To deal with this low-resource problem, we investigate few-shot event detection in this paper and propose TaLeM, a novel taxonomy-aware learning model consisting of two components, i.e., the taxonomy-aware self-supervised learning framework (TaSeLF) and the taxonomy-aware prototypical networks (TaPN). Specifically, TaSeLF mines taxonomy-aware distance relations to increase the training examples, which alleviates the generalization bottleneck brought by the insufficient data. TaPN introduces Poincaré embeddings to represent the label taxonomy, and integrates them into task-adaptive projection networks, which tackles the problems of the class centroid distribution and the taxonomy-aware embedding distribution in the vanilla prototypical networks. Extensive experiments in four types of meta tasks demonstrate the superiority of our proposal over competitive baselines, and verify the effectiveness as well as the importance of modeling the label taxonomy. |
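For reference, Poincaré embeddings place labels in the Poincaré ball, where the distance

$$d(u, v) = \operatorname{arcosh}\!\left(1 + 2\, \frac{\lVert u - v \rVert^{2}}{(1 - \lVert u \rVert^{2})(1 - \lVert v \rVert^{2})}\right)$$

grows rapidly near the boundary, making the geometry well suited to tree-like label taxonomies such as event hierarchies (this is the standard formulation of Nickel and Kiela, given here as background). |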
Title: TCN: Table Convolutional Network for Web Table Interpretation |
Authors: Daheng Wang (University of Notre Dame), Prashant Shiralkar (Amazon), Colin Lockard (Amazon), Binxuan Huang (Amazon), Xin Luna Dong (Amazon) and Meng Jiang (University of Notre Dame). |
Information extraction from semi-structured webpages provides valuable long-tailed facts for augmenting knowledge graphs. Relational Web tables are a critical component containing additional entities and attributes of rich and diverse knowledge. However, extracting knowledge from relational tables is challenging because of the sparse contextual information, where each table cell typically contains only one or a few words. Existing work linearizes table cells and relies heavily on modifying deep language models such as BERT, which only captures information from related cells in the same table. In this work, we propose a novel relational table representation learning approach considering both the intra- and inter-table contextual information. On one hand, the proposed Table Convolutional Network model employs the attention mechanism to adaptively focus on the most informative intra-table cells of the same row or column; on the other hand, it aggregates inter-table contextual information from various types of implicit connections between cells across different tables. Specifically, we propose three novel aggregation modules for (i) cells of the same value, (ii) cells of the same schema position, and (iii) cells linked to the same page topic. We further devise a supervised multi-task training objective for jointly predicting column type and pairwise column relations, as well as a table cell recovery objective for pre-training. Experiments on real Web table datasets demonstrate that our method outperforms competitive baselines by +4.8% F1 for column type prediction and by +4.1% F1 for pairwise column relation prediction. |
Title: TEDIC: Neural Modeling of Behavioral Patterns in Dynamic Social Interaction Networks |
Authors: Yanbang Wang (Stanford University), Pan Li (Purdue University), Chongyang Bai (Dartmouth College) and Jure Leskovec (Stanford University). |
Extracting patterns from dynamic social interaction networks containing time-stamped social interactions, such as eye contact between people, is critical for inferring people's social characters and relationships. Previous approaches to extracting such patterns primarily rely on sophisticated expert knowledge of psychology and social science, and the obtained features are often overly task-specific. More generic models based on representation learning of dynamic networks may be applied, but the unique properties of social interactions cause severe model mismatch and degrade the quality of the obtained representations. Here we fill this gap by proposing a novel framework, termed Temporal Network-diffusion Convolutional networks (TEDIC), for generic representation learning on dynamic social interaction networks. We make TEDIC a good fit by designing two components: (1) diffusion of node attributes over a combination of the original network and its complement, to capture long-hop interactive patterns embedded in the behaviors of people making or avoiding contact; (2) temporal convolution networks with a hierarchical set-pooling operation, to flexibly extract patterns from different-length interactions scattered over a long time span. The design also endows TEDIC with a certain self-explaining power. We evaluate TEDIC on five real datasets for four different social character prediction tasks, including deception detection, dominance identification, nervousness detection and community detection. It not only consistently outperforms previous state-of-the-art methods, but also provides two important pieces of social insight. In addition, it exhibits favorable societal characteristics by remaining unbiased towards people from different regions. |
Title: Temporal Analysis of the Entire Ethereum Blockchain Network |
Authors: Lin Zhao (Nanyang Technological University, Singapore), Sourav Sen Gupta (Nanyang Technological University, Singapore), Arijit Khan (Nanyang Technological University, Singapore) and Robby Luo (Advanced Micro Devices, Inc.). |
With over 42 billion USD market capitalization (October 2020), Ethereum is the largest public blockchain that supports smart contracts. Recent works have modeled transactions, tokens, and other interactions in the Ethereum blockchain as static graphs to provide new observations and insights by conducting relevant graph analysis. Surprisingly, there is much less study of the evolution and temporal properties of these networks. In this paper, we investigate the evolutionary nature of Ethereum interaction networks from a temporal graphs perspective. We study the growth rate and model of four Ethereum blockchain networks, as well as the active lifespan and update rate of high-degree vertices. We detect anomalies based on temporal changes in global network properties, and forecast the survival of network communities in succeeding months by leveraging relevant graph features and machine learning models. |
Title: TG-GAN: Continuous-time Temporal Graph Deep Generative Models with Time-Validity Constraints |
Authors: Liming Zhang (George Mason University), Liang Zhao (George Mason University), Dieter Pfoser (George Mason University), Shan Qin (Beijing University of Posts and Telecommunications) and Chen Ling (Emory University). |
Recent deep generative models for static graphs have focused on areas such as molecular design. However, many real-world problems involve temporal graphs whose topology and attribute values evolve dynamically over time. Examples include theoretical models such as activity-driven networks and link-node memories, as well as important applications such as protein folding, human mobility networks, and social network growth. This work proposes the "Temporal Graph Generative Adversarial Network" (TG-GAN) model for continuous-time temporally-bounded graph generation, which captures the deep generative process of temporal graphs through compositions of time-budgeted temporal walks, which are themselves composed of truncated temporal walks. Specifically, a novel temporal graph generator is proposed that jointly models truncated edge sequences, time budgets, and node attributes, incorporating novel activation functions that enforce temporal validity constraints under a recurrent architecture. In addition, a new temporal graph discriminator is proposed that combines time and node encoding operations over a recurrent architecture to distinguish generated sequences from real ones sampled by a newly-developed truncated temporal walk sampler. Extensive experiments on both synthetic and real-world datasets confirm that TG-GAN significantly outperforms five benchmarking methods in terms of efficiency and effectiveness. |
Title: The Interaction between Political Typology and Filter Bubbles in News Recommendation Algorithms |
Authors: Ping Liu (Illinois Institute of Technology), Karthik Shivaram (Tulane University), Aron Culotta (Tulane University), Matthew A. Shapiro (Illinois Institute of Technology) and Mustafa Bilgic (Illinois Institute of Technology). |
Algorithmic personalization of news and social media content aims to improve user experience; however, there is evidence that this filtering can have the unintended side effect of creating homogeneous "filter bubbles," in which users are over-exposed to ideas that conform with their preexisting perceptions and beliefs. In this paper, we investigate this phenomenon in the context of political news recommendation algorithms, which have important implications for civil discourse. We first collect and curate a collection of over 900K news articles from over 40 sources annotated by topic and partisan lean. We then conduct simulation studies to investigate how different algorithmic strategies affect filter bubble formation. Drawing on Pew studies of political typologies, we identify heterogeneous effects based on the user's pre-existing preferences. For example, we find that i) users with more extreme preferences are shown less diverse content but have higher click-through rates than users with less extreme preferences, ii) content-based and collaborative-filtering recommenders result in markedly different filter bubbles, and iii) when users have divergent views on different topics, recommenders tend to have a homogenization effect. |
Title: The Structure of Toxic Conversations on Twitter |
Authors: Martin Saveski (Massachusetts Institute of Technology), Brandon Roy (Massachusetts Institute of Technology) and Deb Roy (Massachusetts Institute of Technology). |
Social media platforms promise to enable rich and vibrant conversations online; however, their potential is often hindered by antisocial behaviors. In this paper, we study the relationship between structure and toxicity in conversations on Twitter. We collect 1.18M conversations (58.5M tweets, 4.4M users) prompted by tweets that are posted by or mention major news outlets over one year and candidates who ran in the 2018 US midterm elections over four months. We analyze the conversations at the individual, dyad, and group level. At the individual level, we find that toxicity is spread across many low to moderately toxic users. At the dyad level, we observe that toxic replies are more likely to come from users who do not have any social connection nor share many common friends with the poster. At the group level, we find that toxic conversations tend to have larger, wider, and deeper reply trees, but sparser follow graphs. To test the predictive power of the conversational structure, we consider two prediction tasks. In the first prediction task, we demonstrate that the structural features can be used to predict whether the conversation will become toxic as early as the first ten replies. In the second prediction task, we show that the structural characteristics of the conversation are also predictive of whether the next reply posted by a specific user will be toxic or not. We observe that the structural and linguistic characteristics of the conversations are complementary in both prediction tasks. Our findings inform the design of healthier social media platforms and demonstrate that models based on the structural characteristics of conversations can be used to detect early signs of toxicity and potentially steer conversations in a less toxic direction. |
Title: The Surprising Performance of Simple Baselines for Misinformation Detection |
Authors: Kellin Pelrine (McGill University; Mila - Quebec AI Institute), Jacob Danovitch (McGill University; Mila - Quebec AI Institute) and Reihaneh Rabbany (McGill University; Mila - Quebec AI Institute). |
As social media becomes increasingly prominent in our day-to-day lives, it is increasingly important to detect informative content and prevent the spread of disinformation and unverified rumours. While many sophisticated and successful models have been proposed in the literature, they are often compared with older NLP baselines such as SVMs, CNNs, and LSTMs. In this paper, we examine the performance of a broad set of modern transformer-based language models and show that with basic fine-tuning, these models are competitive with and can even significantly outperform recently proposed state-of-the-art methods. We present our framework as a baseline for creating and evaluating new methods for misinformation detection. We further study a comprehensive set of benchmark datasets, and discuss potential data leakage and the need for careful design of the experiments and understanding of datasets to account for confounding variables. As an extreme case example, we show that classifying based only on the first three digits of tweet ids, which contain information on the date, gives state-of-the-art performance on a commonly used benchmark dataset for fake news detection - Twitter16. We provide a simple tool to detect this problem and suggest steps to mitigate it in future datasets. |
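The tweet-id leakage check described above is easy to reproduce in spirit; this is an illustrative re-creation (hypothetical function, scikit-learn based), not the authors' released tool:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def tweet_id_leakage_score(tweet_ids, labels):
    # One-hot the first three digits of each tweet id, a proxy for the
    # posting date; a high cross-validated score means the benchmark can
    # be "solved" from timing alone, signalling label leakage.
    X = [[str(t)[:3]] for t in tweet_ids]
    clf = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                        LogisticRegression(max_iter=1000))
    return cross_val_score(clf, X, labels, cv=5).mean()
```

A score far above the majority-class baseline indicates the confounding the authors warn about. |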
Title: Theoretically Improving Graph Neural Networks via Anonymous Walk Graph Kernels |
Authors: Qingqing Long (Peking University), Yilun Jin (The Hong Kong University of Science and Technology), Yi Wu (Peking University) and Guojie Song (Peking University). |
Graph neural networks (GNNs) have achieved tremendous success in graph mining. However, the inability of GNNs to model substructures in graphs remains a significant drawback. Specifically, message-passing GNNs (MPGNNs), the prevailing type of GNNs, have been theoretically shown unable to distinguish, detect or count many graph substructures. While efforts have been made to remedy this inability, existing works either rely on pre-defined substructure sets, and are thus less flexible, or lack theoretical insights. In this paper, we propose GSKN, a GNN model with a theoretically stronger ability to distinguish graph structures. Specifically, we design GSKN based on anonymous walks (AWs), flexible substructure units, and derive it from feature mappings of graph kernels (GKs). We theoretically show that GSKN provably extends the 1-WL test, and hence the maximally powerful MPGNNs, from both graph-level and node-level viewpoints. Correspondingly, both graph and node classification experiments are leveraged to evaluate GSKN, where GSKN outperforms a wide range of baselines on both synthetic and real-world datasets, endorsing the analysis. |
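The anonymous-walk unit on which GSKN is built has a simple standard definition: a walk is mapped to the pattern of first occurrences of its nodes, so structurally identical walks coincide regardless of node identities. A minimal sketch:

```python
def anonymize(walk):
    # ('a', 'b', 'a', 'c') -> (0, 1, 0, 2): each node is replaced by the
    # index of its first appearance in the walk.
    first_seen = {}
    return tuple(first_seen.setdefault(v, len(first_seen)) for v in walk)
```

Counting such patterns over sampled walks yields the flexible substructure features that the kernel-derived feature mappings build on. |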
Title: Time-series Change Point Detection with Self-Supervised Contrastive Predictive Coding |
Authors: Shohreh Deldari (RMIT University), Daniel V. Smith (CSIRO), Hao Xue (RMIT University) and Flora D. Salim (RMIT University). |
Change point detection techniques aim to capture changes in trends and sequences in time-series data to describe the underlying behaviour of the system. Detecting changes and anomalies in web services and in trends of application usage can provide valuable insight into the system; however, many existing approaches work in a supervised manner, requiring well-labelled data. As the amount of data produced and captured by sensors grows rapidly, it is becoming harder and even impossible to annotate the data. Therefore, a self-supervised solution is a necessity. In this work, we propose TS-CP2, a novel self-supervised technique for temporal change point detection, based on representation learning with a Temporal Convolutional Network (TCN). To the best of our knowledge, our proposed method is the first to employ contrastive learning for prediction with the aim of change point detection. Through extensive evaluations, we demonstrate that our method outperforms multiple state-of-the-art change point detection and anomaly detection baselines, including those adopting either unsupervised or semi-supervised approaches. TS-CP2 is shown to improve both non-deep-learning- and deep-learning-based methods by 0.28 and 0.12, respectively, in terms of average F1-score across three datasets. |
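For context, contrastive predictive coding of this kind typically trains with the InfoNCE objective,

$$\mathcal{L} = -\log \frac{\exp\!\big(\mathrm{sim}(z_t, z_{t+1}) / \tau\big)}{\sum_{k} \exp\!\big(\mathrm{sim}(z_t, z_k) / \tau\big)},$$

which pulls the embeddings $z_t, z_{t+1}$ of adjacent time windows together against negative windows $z_k$; at detection time, a sharp drop in similarity between consecutive window embeddings then signals a change point (the display is the standard InfoNCE form, which may differ in detail from TS-CP2's loss). |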
Title: TLS 1.3 in Practice: How TLS 1.3 Contributes to Internet |
Authors: Hyunwoo Lee (Purdue University), Doowon Kim (University of Tennessee, Knoxville) and Yonghwi Kwon (University of Virginia). |
Transport Layer Security (TLS) has become the norm for secure communication over the Internet. In August 2018, TLS 1.3, the latest version, which improves the security and performance of the previous TLS version, was approved. In this paper, we take a closer look at TLS 1.3 deployments in practice regarding adoption rate, security, performance, and implementation, by applying temporal, spatial, and platform-based approaches to 587M connections. Overall, TLS 1.3, which has been rapidly adopted mainly due to third-party platforms such as Content Delivery Networks (CDNs), makes a significant contribution to the Internet: it deprecates vulnerable cryptographic primitives and substantially reduces the time required to perform the TLS 1.3 full handshake compared to the TLS 1.2 handshake. We quantify these aspects and show that TLS 1.3 is beneficial to websites that do not rely on third-party platforms. We also review Common Vulnerabilities and Exposures (CVEs) regarding TLS libraries and show that many recent vulnerabilities can be easily addressed by upgrading to TLS 1.3. However, some websites exhibit unstable support for TLS 1.3 due to multiple platforms with different TLS versions or migration to other platforms, which means that a website can present a lower TLS version at a certain time or from a certain region. Furthermore, we find that most implementations (including TLS libraries) do not fully support the new features of TLS 1.3, such as downgrade protection and certificate extensions. |
Title: Topic-enhanced knowledge-aware retrieval model for diverse relevance estimation |
Authors: Xiangsheng Li (Tsinghua University), Jiaxin Mao (Tsinghua University), Weizhi Ma (Tsinghua University), Yiqun Liu (Tsinghua University), Min Zhang (Tsinghua University), Shaoping Ma (Tsinghua University), Zhaowei Wang (Noah's Ark Lab, Huawei) and Xiuqiang He (Noah's Ark Lab, Huawei). |
Relevance is a key notion in information retrieval. It measures the relation between query and document, which contains several different dimensions, e.g., semantic similarity, topical relatedness, cognitive relevance (relations in the aspect of knowledge), usefulness, timeliness, utility and so on. However, existing retrieval models mainly focus on semantic similarity and cognitive relevance while ignoring other possible dimensions of relevance. Topical relatedness, an important dimension for measuring relevance, is not well studied in existing neural information retrieval. In this paper, we propose a Topic-Enhanced Knowledge-aware retrieval Model (TEKM) that jointly learns semantic similarity, knowledge relevance and topical relatedness to estimate relevance between query and document. We first construct a neural topic model to learn topical information and generate topic embeddings of a query. Then we combine the topic embeddings with a knowledge-aware retrieval model to estimate different dimensions of relevance. Specifically, we exploit kernel pooling to soft-match topic embeddings with words and entities in a unified embedding space to generate fine-grained topical relatedness. The whole model is trained in an end-to-end manner. Experiments on a large-scale publicly available benchmark dataset show that TEKM outperforms existing retrieval models. Further analysis also shows how topical relatedness is modeled to improve traditional retrieval models based on semantic similarity and knowledge relevance. |
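The kernel pooling mentioned above usually follows the K-NRM formulation: given a similarity matrix $M$ between query-side embeddings (here, topic embeddings) and document words/entities, each RBF kernel $k$ softly counts matches around a similarity level $\mu_k$:

$$K_k(M_i) = \sum_{j} \exp\!\left(-\frac{(M_{ij} - \mu_k)^2}{2\sigma_k^2}\right).$$

The pooled kernel features across levels form the fine-grained topical-relatedness signal (shown here in the standard K-NRM form, which TEKM may adapt). |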
Title: Touchscreen Exploration of Visual Artwork for Blind People |
Authors: Dragan Ahmetovic (University of Milan), Nahyun Kwon (Ewha Womans University), Uran Oh (Ewha Womans University), Cristian Bernareggi (University of Milan) and Sergio Mascetti (University of Milan). |
This paper investigates how touchscreen exploration and verbal feedback can be used to support blind people to access visual artwork. We present two artwork exploration modalities. The first one, attribute-based exploration, extends prior work on touchscreen image accessibility, and provides fine-grained segmentation of artwork visual elements; when the user touches an element, the associated attributes are read. The second one, hierarchical exploration, is designed with domain experts and provides multi-level segmentation of the artwork; the user initially accesses a general description of the entire artwork and then explores a coarse segmentation of the visual elements with the corresponding high-level descriptions; once selected, coarse segments are subdivided into fine-grained ones, which the user can access for more detailed descriptions. The two exploration modalities, implemented as a mobile web app, were evaluated through a user study with 10 blind participants. Both modalities were appreciated by the participants. Attribute-based exploration is perceived to be easier to access. Instead, the hierarchical exploration was considered more understandable, useful, interesting and captivating, and the participants remembered more details about the artwork with this modality. Participants commented that the two modalities work well together and therefore both should be made available. |
Title: Towards a Better Understanding of Query Reformulation Behavior in Web Search |
Authors: Jia Chen (Tsinghua University), Jiaxin Mao (Tsinghua University), Yiqun Liu (Tsinghua University), Fan Zhang (Tsinghua University), Min Zhang (Tsinghua University) and Shaoping Ma (Tsinghua University). |
As queries submitted by users directly affect search experiences, how to organize queries has always been a research focus in Web search studies. As search requests become complex and exploratory, many search sessions contain more than a single query, and reformulation becomes a necessity. To help users better formulate their queries in these complex search tasks, modern search engines usually provide a series of reformulation entries on search engine result pages (SERPs), i.e., query suggestions and related entities. However, few existing works have thoroughly studied why and how users perform query reformulations in these heterogeneous interfaces. Therefore, whether search engines provide sufficient assistance for users in reformulating queries remains under-investigated. To shed light on this research question, we conducted a field study to analyze fine-grained user reformulation behaviors, including reformulation type, entry, reason, and inspiration source, across various search intents. Different from existing efforts that rely on external assessors to make judgments, in the field study we collect both implicit behavior signals and explicit user feedback information. Analysis results demonstrate that query reformulation behavior in Web search varies with the type of search task. We also found that the current query suggestions/related query recommendations provided by search engines do not offer enough help for users in complex search tasks. Based on the findings in our field study, we design a supervised learning framework to predict: 1) the reason behind each query reformulation, and 2) how users organize the reformulated query, both of which are novel challenges in this domain. This work provides insight into complex query reformulation behavior in Web search as well as guidance for designing better query suggestion techniques in search engines. |
Title: Towards a Lightweight, Hybrid Approach for Detecting DOM XSS Vulnerabilities with Machine Learning |
Authors: William Melicher (Carnegie Mellon University), Clement Fung (Carnegie Mellon University), Lujo Bauer (Carnegie Mellon University) and Limin Jia (Carnegie Mellon University). |
Client-side cross-site scripting (DOM XSS) vulnerabilities in web applications are common, hard to identify, and difficult to prevent. Taint tracking is the most promising approach for detecting DOM XSS with high precision and recall, but is too computationally expensive for many practical uses, e.g., in a web browser or for offline analysis at scale. We investigate whether machine learning (ML) classifiers can replace or augment taint tracking as a method to identify DOM XSS vulnerabilities. Through a large-scale web crawl we collect over 18 billion JavaScript functions and use taint tracking to label over 180,000 functions as potentially vulnerable. With this data, we train a 3-layer, feed-forward deep neural network (DNN) to analyze a JavaScript function and predict if it is vulnerable. In the process, we experiment with a range of hyperparameters and show how to train a low-latency, high-recall classifier that could serve as a pre-filter to taint tracking to reduce the cost of stand-alone taint tracking by 3.43x while detecting 94.5% of unique vulnerabilities. We argue that this combination of a DNN and taint tracking is efficient enough for a range of use cases for which taint tracking by itself is not, including in-browser run-time DOM XSS detection and analyzing large codebases. |
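A 3-layer feed-forward pre-filter of the kind described is structurally simple; the sketch below assumes some fixed-size featurization of a JavaScript function (the input width and layer sizes are placeholders, not the paper's tuned hyperparameters):

```python
import torch.nn as nn

def make_prefilter(n_features: int = 4096) -> nn.Sequential:
    # Outputs an estimated probability that the function is vulnerable;
    # functions above a recall-oriented threshold go on to full taint tracking.
    return nn.Sequential(
        nn.Linear(n_features, 512), nn.ReLU(),
        nn.Linear(512, 128), nn.ReLU(),
        nn.Linear(128, 1), nn.Sigmoid(),
    )
```

The threshold is chosen for high recall, since a missed vulnerable function is never re-examined, while false positives merely cost taint-tracking time. |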
Title: Towards Content Provider Aware Recommender Systems: A Simulation Study on the Interplay between User and Provider Utilities |
Authors: Ruohan Zhan (Stanford University), Konstantina Christakopoulou (Google), Elaine Le (Google), Jayden Ooi (Google), Martin Mladenov (Google), Alex Beutel (Google), Craig Boutilier (Google), Ed Chi (Google) and Minmin Chen (Google). |
Most existing recommender systems focus primarily on matching users (content consumers) to content that maximizes user satisfaction on the platform. It is increasingly obvious, however, that content providers have a critical influence on user satisfaction through content creation, largely determining the content pool available for recommendation. A natural question thus arises: can we design recommenders that take into account the long-term utility of both users and content providers? By doing so, we hope to sustain more content providers and a more diverse content pool for long-term user satisfaction. Understanding the full impact of recommendations on both user and content provider groups is challenging. This paper investigates one approach toward building a content provider aware recommender and evaluates its impact in a simulated setup. To characterize the user-recommender-provider interdependence, we complement user modeling by formalizing provider dynamics as well. The resulting joint dynamical system gives rise to a weakly-coupled partially observable Markov decision process driven by recommender actions and user feedback to providers. We then build a REINFORCE recommender agent, coined EcoAgent, to optimize a joint objective combining user utility and the counterfactual utility lift of the content provider associated with the recommended content, which we show to be equivalent to maximizing overall user utility and the utilities of all content providers on the platform. To evaluate our approach, we introduce a simulation environment capturing the key interactions among users, providers, and the recommender. We offer a number of simulated experiments that shed light on both the benefits and the limitations of our approach. These results help clarify how and when a content provider aware recommender agent is of benefit in building multi-stakeholder recommender systems. |
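A minimal REINFORCE sketch of such a joint objective, assuming a toy state encoding and hand-set utility values; the mixing weight and reward definitions are illustrative, not EcoAgent's exact formulation:

    import torch
    import torch.nn as nn

    # Toy REINFORCE step with a joint user/provider reward, in the
    # spirit of EcoAgent. Sizes, alpha, and utilities are assumptions.
    policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 100))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    state = torch.randn(1, 16)                # user + provider context
    dist = torch.distributions.Categorical(logits=policy(state))
    item = dist.sample()                      # recommended item

    user_utility = 0.7                        # e.g., observed engagement
    provider_lift = 0.2                       # utility vs. not recommending
    alpha = 0.5                               # user/provider trade-off weight
    reward = user_utility + alpha * provider_lift

    loss = -(dist.log_prob(item) * reward).mean()   # REINFORCE gradient
    opt.zero_grad()
    loss.backward()
    opt.step()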
Title: Towards Efficient Auctions for Auto-bidders |
Authors: Yuan Deng (Google), Jieming Mao (Google), Vahab Mirrokni (Google) and Song Zuo (Google). |
Auto-bidding has become one of the main options for bidding in online advertisements, in which advertisers only need to specify high-level objectives and leave the complex task of bidding to auto-bidders. In this paper, we propose a family of auctions with boosts to improve welfare for auto-bidders with both return-on-ad-spend and budget constraints. Our empirical results validate our theoretical findings and show that both welfare and revenue can be improved by properly selecting the weight of the boosts. |
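A hedged sketch of what a single-slot auction with additive boosts can look like: allocate by boosted bid and charge the lowest bid that would still win. The boost values and payment rule here are illustrative assumptions, not the paper's mechanism:

    # Illustrative single-slot auction with additive boosts (assumed
    # rules, not the paper's): rank by bid + boost, winner pays the
    # minimum bid that keeps its boosted score at the runner-up's level.
    def boosted_auction(bids, boosts):
        scores = [b + c for b, c in zip(bids, boosts)]
        ranked = sorted(range(len(bids)), key=lambda i: scores[i], reverse=True)
        winner, runner_up = ranked[0], ranked[1]
        payment = max(scores[runner_up] - boosts[winner], 0.0)
        return winner, payment

    print(boosted_auction(bids=[3.0, 2.5, 1.0], boosts=[0.0, 1.0, 0.2]))
    # -> (1, 2.0)

Here bidder 1 wins with boosted score 3.5 and pays 2.0, the bid at which its score would just tie bidder 0's.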
Title: Towards Facilitating Empathic Conversations in Online Mental Health Support: A Reinforcement Learning Approach |
Authors: Ashish Sharma (University of Washington), Inna Lin (University of Washington), Adam Miner (Stanford University), David Atkins (University of Washington) and Tim Althoff (University of Washington). |
Online peer-to-peer support platforms enable conversations between millions of people who seek and provide mental health support. If successful, web-based mental health conversations could improve access to treatment and reduce the global disease burden. Psychologists have repeatedly demonstrated that empathy, the ability to understand and feel the emotions and experiences of others, is a key component leading to positive outcomes in supportive conversations. However, recent studies have shown that highly empathic conversations are rare on online mental health platforms. In this paper, we work towards improving empathy in online mental health support conversations. We introduce a new task of Empathic Rewriting, which aims to transform low-empathy conversational posts into higher-empathy ones. Learning such transformations is challenging and requires a deep understanding of empathy while maintaining conversation quality through text fluency and specificity to the conversational context. Here we propose Partner, a deep reinforcement learning (RL) agent that learns to make sentence-level edits to posts in order to increase the expressed level of empathy while maintaining conversation quality. Our RL agent leverages a policy network, based on a transformer language model adapted from GPT-2, which performs the dual task of generating candidate empathic sentences and adding those sentences at appropriate positions. During training, we reward transformations that increase empathy in posts while maintaining text fluency, context specificity, and diversity. Through a combination of automatic and human evaluations, we demonstrate that our approach successfully generates more empathic, diverse, and context-specific responses and outperforms current natural language processing methods from related tasks such as style transfer and sequence-to-sequence generation. This work has direct implications for facilitating empathetic conversations on web-based platforms. |
Title: Towards Realistic and Reproducible Web Crawl Measurements |
Authors: Jordan Jueckstock (North Carolina State University), Shaown Sarker (North Carolina State University), Peter Snyder (Brave Software), Aidan Beggs (North Carolina State University), Panagiotis Papadopoulos (Telefonica Research), Matteo Varvello (Nokia Bell Labs), Ben Livshits (Brave Software / Imperial College London) and Alexandros Kapravelos (North Carolina State University). |
Accurate web measurement is critical for understanding and improving security and privacy online. Implicit in these measurements is the assumption that automated crawls generalize to the experiences of typical web users, despite significant anecdotal evidence to the contrary: the web behaves differently when approached from well-known measurement endpoints or with well-known measurement and automation frameworks, for reasons ranging from DDoS mitigation and bot detection to hiding malicious behavior. This work explores improving the state of web privacy and security by investigating how, and in what ways, privacy and security measurements change when using typical web measurement tools, compared to measurement configurations intentionally designed to match "real" web users. We build a web measurement framework encompassing network endpoints and browser configurations ranging from off-the-shelf defaults commonly used in research studies to configurations more representative of typical web users, and we note the effect of these realism factors on security- and privacy-relevant measurements when applied to the Tranco top 25k web domains. We find that web privacy and security measurements are significantly affected by measurement vantage point and browser configuration, and conclude that unless researchers carefully consider whether and how their web measurement tools match real-world users, the research community is likely systematically missing important signals. For example, we find that browser configuration alone can cause shifts in 19% of known ad and tracking domains encountered, and similarly affects the loading frequency of up to 10% of distinct families of JavaScript code units executed. We also find that the choice of measurement network point has similar, though less dramatic, effects on privacy and security measurements. To aid measurement replicability and future web research, we share our dataset and precise measurement configurations. |
Title: Towards Understanding and Demystifying Bitcoin Mixing Services |
Authors: Lei Wu (Zhejiang University), Yufeng Hu (Zhejiang University), Yajin Zhou (Zhejiang University), Haoyu Wang (Beijing University of Posts and Telecommunications), Xiapu Luo (The Hong Kong Polytechnic University), Zhi Wang (Florida State University), Fan Zhang (Zhejiang University) and Kui Ren (Zhejiang University). |
The popularity of Bitcoin benefits a lot from its anonymity. However, the anonymity of Bitcoin is pseudonymity, or relationship anonymity between addresses, and researchers have proposed several heuristics to break it by clustering addresses. Meanwhile, new approaches have been invented to provide enhanced anonymity. Among the most promising, third-party services known as mixing services have emerged and become widely used in recent years. Unfortunately, they are already abused to facilitate money laundering for criminal activities. Despite the urgent need to understand Bitcoin mixing services, few studies have systematically demystified them. In this paper, we take the first step to study state-of-the-art Bitcoin mixing services. Specifically, we propose a generic abstraction model for mixing services. According to our investigation, most mixing services share the same three-phase procedure but differ in their mixing mechanisms; two mechanisms are used in the wild, i.e., swapping and obfuscating. Based on this model, we conducted a graph-based analysis and successfully revealed the mixing mechanisms and workflows of four representative mixing services. Besides, we propose a method to further demystify mixing services that apply the obfuscating mechanism by identifying mixing transactions. The proposed approach is capable of identifying most (over 92%) of the mixing transactions. Based on the identified transactions, we then estimate the profit of mixing services and provide a case study of tracing the money flow of stolen Bitcoins. |
Title: Towards Understanding Cryptocurrency Derivatives: A Case Study of BitMEX |
Authors: Kyle Soska (Carnegie Mellon University), Jin-Dong Dong (Carnegie Mellon University), Alex Khodaverdian (University of California, Berkeley), Ariel Zetlin-Jones (Carnegie Mellon University), Bryan Routledge (Carnegie Mellon University) and Nicolas Christin (Carnegie Mellon University). |
Since 2018, the cryptocurrency trading landscape has undergone a transformation from a collection of spot markets (fiat for cryptocurrency) to a hybrid ecosystem containing complex and popular derivatives products. In this paper we explore this new paradigm through a deep dive into BitMEX, one of the first and most successful derivatives platforms for leveraged cryptocurrency trading, which trades on average over 3 billion dollars' worth of volume per day and allows users to go long or short Bitcoin with up to 100x leverage. To understand the complete picture of cryptocurrency derivatives, we analyzed the evolution of BitMEX's products, which consist of both settled and perpetual offerings that have become the standard across other cryptocurrency derivatives platforms. We additionally utilized on-chain forensics, public liquidation events, and a site-wide chat room to reveal that BitMEX is inhabited by a diverse ensemble of amateur and professional traders, ranging from wealthy agents running automated strategies to individuals who trade small, risky positions on very short time-frames. Finally, we used our study of BitMEX and cryptocurrency derivatives to understand the impact they have had on cryptocurrency asset prices. In particular, we discuss the role that derivatives have played in several dramatic price movements that influenced not only the derivatives market but also the underlying spot markets. |
Title: Trav-SHACL: Efficiently Validating Networks of SHACL Constraints |
Authors: Philipp D. Rohde (TIB Leibniz Information Centre for Science and Technology), Mónica Figuera (University of Bonn) and Maria-Esther Vidal (TIB Leibniz Information Centre for Science and Technology). |
Knowledge graphs have emerged as expressive data structures for Web data. The potential of knowledge graphs, and the demand for ecosystems to facilitate their creation, curation, and understanding, is testified in diverse domains, e.g., biomedicine. The Shapes Constraint Language (SHACL) is the W3C recommendation language for integrity constraints over RDF knowledge graphs. Enabling quality assessments of knowledge graphs, SHACL is rapidly gaining attention in real-world scenarios. SHACL models integrity constraints as a network of shapes, where a shape contains the constraints to be fulfilled by the same entities. Validating a SHACL shape schema can face tractability issues, and efficient computational methods are required to facilitate full adoption. We present Trav-SHACL, a SHACL engine capable of planning the traversal and execution of a shape schema in a way that invalid entities are detected early and needless validations are minimized. Trav-SHACL reorders the shapes in a shape schema for efficient validation and rewrites target and constraint queries for the fast detection of invalid entities. Trav-SHACL is empirically evaluated on 27 testbeds executed against knowledge graphs of up to 34M triples. Our experimental results suggest that Trav-SHACL exhibits high performance and reduces validation time by a factor of up to 28.93 compared to the state of the art. |
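For readers unfamiliar with the task itself, here is a minimal SHACL validation run using the pySHACL library (Trav-SHACL is a separate engine; this merely illustrates the input and output of shape validation, and the file names are hypothetical):

    # Baseline SHACL validation with pySHACL; file names are
    # placeholders, and Trav-SHACL itself is a different engine.
    from pyshacl import validate
    from rdflib import Graph

    data_graph = Graph().parse("knowledge_graph.ttl")
    shacl_graph = Graph().parse("shapes.ttl")

    conforms, results_graph, results_text = validate(
        data_graph, shacl_graph=shacl_graph)
    print(conforms)
    print(results_text)  # lists the entities violating each shape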
Title: Twin Peaks, a Model for Recurring Cascades |
Authors: Matteo Almanza (Sapienza University of Rome), Silvio Lattanzi (Google), Alessandro Panconesi (Sapienza University of Rome) and Giuseppe Re (Sapienza University of Rome). |
Understanding information dynamics and their resulting cascades is a central topic in social network analysis. In a recent seminal work, Cheng et al. analyzed multiple cascades on Facebook over several months and noticed that many of them exhibit a recurring behaviour: they tend to have multiple peaks of popularity, with periods of quiescence in between. In this paper, we propose the first mathematical model that provably explains this interesting phenomenon, besides exhibiting other fundamental properties of information cascades. Our model is simple and shows that a good clustering structure is enough to observe this recurring behaviour with a standard information diffusion model. Furthermore, we complement our theoretical analysis with an experimental evaluation in which we show that our model is able to reproduce the observed phenomenon on several social networks. |
Title: Typing Errors in Factual Knowledge Graphs: Severity and Possible Ways Out |
Authors: Peiran Yao (University of Alberta) and Denilson Barbosa (University of Alberta). |
Factual knowledge graphs (KGs) such as DBpedia and Wikidata serve various downstream tasks and are also widely adopted by artificial intelligence research communities as benchmark datasets. However, we found those KGs to be surprisingly noisy. In this study, we question the quality of these KGs: we estimate the typing error rate to be 27% on average for coarse-grained types, and as high as 73% for certain fine-grained types. In pursuit of solutions, we propose an active typing-error detection algorithm that maximizes the utilization of both gold and noisy labels. We also comprehensively discuss and compare unsupervised, semi-supervised, and supervised paradigms for dealing with typing errors in factual KGs. The outcomes of this study provide guidelines for researchers who use noisy factual KGs, and we will also publish our code and data along with the paper to help practitioners deploy the techniques and conduct further research. |
Title: Unbiased Loss Functions for Extreme Scale Classification With Missing Labels |
Authors: Mohammadreza Mohammadnia Qararei (Aalto University), Erik Schultheis (Aalto University), Priyanshu Gupta (Indian Institute of Technology, Kanpur) and Rohit Babbar (Aalto University). |
Extreme-scale classification (XC) refers to the task of tagging an instance with a small subset of relevant labels from an extremely large set of all possible labels. The XC framework has been widely employed in web applications such as automatic labeling of web encyclopedias and recommendation systems. While most state-of-the-art models in XC achieve high overall accuracy by performing well on the frequently occurring labels, they perform poorly on a large number of infrequent (tail) labels. This arises from two statistical challenges in XC: (i) missing labels, as it is virtually impossible to manually assign every relevant label to an instance, and (ii) a highly imbalanced data distribution where a large fraction of labels are tail labels. In this work, we consider common loss functions that decompose over labels and calculate unbiased estimates that compensate for missing labels, following Natarajan et al. (2017). This turns out to be disadvantageous from an optimization perspective, as important properties such as convexity and lower-boundedness are lost. To circumvent this problem, we use the fact that typical loss functions in XC are convex surrogates of the 0-1 loss, and thus propose to switch to convex surrogates of its unbiased version. These surrogates are further adapted to the label imbalance by incorporating label-frequency-based rebalancing and adaptive margins. The resulting losses can be easily incorporated into current frameworks for extreme classification. We tested two state-of-the-art algorithms, DiSMEC (Babbar and Scholkopf, 2017), a shallow classifier based on the squared hinge loss, and AttentionXML (You et al., 2019), a deep learning method based on binary cross-entropy, and observed consistent improvements (up to 15%) in terms of propensity-scored metrics. |
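A small sketch of the propensity-scored idea: observed positives are up-weighted by the inverse of the propensity that a true positive is actually labeled. All values below are toy inputs, and this plain reweighted loss is exactly the kind of estimator that loses convexity and lower-boundedness, motivating the paper's convex surrogates:

    import numpy as np

    # Toy inverse-propensity-weighted binary cross-entropy for missing
    # labels. With w > 1, the (1 - w) term goes negative, which is the
    # loss of convexity/lower-boundedness the abstract refers to.
    def unbiased_bce(y_obs, y_hat, propensity, eps=1e-12):
        y_hat = np.clip(y_hat, eps, 1 - eps)
        w = y_obs / propensity                      # inverse-propensity weight
        return -(w * np.log(y_hat) + (1 - w) * np.log(1 - y_hat)).mean()

    y_obs = np.array([1, 0, 0, 1])                  # observed (possibly missing)
    y_hat = np.array([0.9, 0.2, 0.6, 0.7])
    propensity = np.array([0.9, 0.3, 0.3, 0.8])     # tail labels: low propensity
    print(unbiased_bce(y_obs, y_hat, propensity))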
Title: Understanding the complexity of detecting political ads |
Authors: Vera Sosnovik (University Grenoble Alpes) and Oana Goga (University Grenoble Alpes, CNRS). |
Online political advertising has grown significantly in recent years. To be able to monitor sponsored political discourse online, companies such as Facebook, Google and Twitter have put in place Ad Libraries that contain the political ads that have run on their platforms. The problem is that there is no global consensus on what constitutes a political ad: each platform provides its own definition, and these definitions can be interpreted in different ways by different people. In this paper we investigate whether there are significant differences between ads labeled as political by advertisers and ads labeled as political by a group of volunteers. Our results show that advertisers underreport ads about social issues, while volunteers underreport ads from news organizations and NGOs. We analyze how variations in labeling strategies impact political ad classifiers. Finally, we devise a set of experiments to study how political ad definitions and experimental settings affect how users decide whether an ad is political. |
Title: Understanding the Impact of Encrypted DNS on Internet Censorship |
Authors: Lin Jin (University of Delaware), Shuai Hao (Old Dominion University), Haining Wang (Virginia Tech) and Chase Cotton (University of Delaware). |
DNS traffic is transmitted in plaintext, resulting in privacy leakage. To combat this problem, secure protocols have been used to encrypt DNS messages. Existing studies have investigated the performance overhead and privacy benefits of encrypted DNS communications, yet little has been done from the perspective of censorship. In this paper, we study the impact of encrypted DNS on Internet censorship in two aspects. On one hand, we explore the severity of DNS manipulation, which could be leveraged for Internet censorship, given the use of encrypted DNS resolvers. In particular, we perform 7.4 million DNS lookup measurements on 3,813 DoT and 75 DoH resolvers and identify that 1.66% of DoT responses and 1.42% of DoH responses undergo DNS manipulation. More importantly, we observe that more than two-thirds of the DoT and DoH resolvers manipulate DNS responses from at least one domain, indicating that DNS manipulation is prevalent in encrypted DNS and can be further exploited to enhance Internet censorship. On the other hand, we evaluate the effectiveness of using encrypted DNS resolvers for censorship circumvention. Specifically, we first discover the vantage points where DNS manipulation occurs through on-path devices, and then we apply encrypted DNS resolvers at these vantage points to access the censored domains. We reveal that 37% of the domains are accessible from the vantage points in China, but none of the domains is accessible from the vantage points in Iran, indicating that the effectiveness of censorship circumvention via encrypted DNS resolvers varies from country to country. Moreover, for the same vantage point, using a different encrypted DNS resolver does not lead to a noticeable difference in accessing the censored domains. |
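As an illustration of the kind of lookup such measurements rely on, here is a minimal DoH comparison using two public JSON endpoints. Note that differing answers alone do not prove manipulation (CDNs legitimately return different IPs), so this is only a sketch of the probe, not the paper's methodology:

    import requests

    # Resolve the same name via two public DoH JSON endpoints and
    # compare the answer sets; a real study needs ground truth and
    # many vantage points.
    def doh_answers(endpoint, name):
        r = requests.get(endpoint, params={"name": name, "type": "A"},
                         headers={"accept": "application/dns-json"},
                         timeout=5)
        return {a["data"] for a in r.json().get("Answer", [])}

    a1 = doh_answers("https://dns.google/resolve", "example.com")
    a2 = doh_answers("https://cloudflare-dns.com/dns-query", "example.com")
    print("answers differ" if a1 != a2 else "answers agree")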
Title: Understanding User Sensemaking in Machine Learning Fairness Assessment Systems |
Authors: Jing Nathan Yan (Cornell University), Ziwei Gu (Cornell University) and Jeff Rzeszotarski (Cornell University). |
A variety of systems have been proposed to assist users in detecting machine learning (ML) fairness issues. These systems approach bias reduction from a number of perspectives, including recommender systems, exploratory tools, and dashboards. In this paper, we seek to inform the design of these systems by examining how individuals make sense of fairness issues as they use different de-biasing affordances. In particular, we consider the tension between de-biasing recommendations, which are quick but may lack nuance, and "what-if" style exploration, which is time-consuming but may lead to deeper understanding and transferable insights. Using logs, think-aloud data, and semi-structured interviews, we find that exploratory systems promote a rich pattern of hypothesis generation and testing, while recommendations deliver quick answers that satisfy participants at the cost of reduced information exposure. We highlight design requirements and trade-offs in the design of ML fairness systems to promote accurate and explainable assessments. |
Title: Unifying Offline Causal Inference and Online Bandit Learning for Data Driven Decision |
Authors: Ye Li (The Chinese University of Hong Kong), Hong Xie (College of Computer Science, Chongqing University), Yishi Lin (Tencent) and John C.S. Lui (The Chinese University of Hong Kong). |
A fundamental question for companies with large amounts of logged data is: how can such logged data be used together with incoming streaming data to make good decisions? Many companies currently make decisions via online A/B tests, but wrong decisions during testing hurt users’ experiences and cause irreversible damage. A typical alternative is offline causal inference, which analyzes logged data alone to make decisions. However, these decisions are not adaptive to the new incoming data, so a wrong decision will continuously hurt users’ experiences. To overcome these limitations, we propose a framework to unify offline causal inference algorithms (e.g., weighting, matching) and online learning algorithms (e.g., UCB, LinUCB). We propose novel algorithms and derive bounds on the decision accuracy via the notion of “regret”. We derive the first upper regret bound for forest-based online bandit algorithms. Experiments on two real datasets show that our algorithms outperform other algorithms that use only logged data or online feedback, as well as algorithms that do not use the data properly. |
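A toy sketch of the unification: warm-start a UCB bandit with inverse-propensity-scored estimates from logged data, then continue learning online. The arm count, propensities, and reward probabilities are illustrative assumptions, not the paper's algorithm:

    import math
    import random

    # Logged records: (arm, reward, propensity of having chosen arm).
    logged = [(0, 1.0, 0.5), (0, 0.0, 0.5), (1, 1.0, 0.25)]

    n_arms = 2
    counts = [0.0] * n_arms
    sums = [0.0] * n_arms
    for arm, reward, p in logged:             # offline phase: IPS estimates
        counts[arm] += 1.0 / p
        sums[arm] += reward / p

    true_means = [0.4, 0.6]                   # hidden environment (toy)
    for _ in range(1000):                     # online phase: standard UCB
        total = sum(counts)
        ucb = [sums[a] / counts[a] + math.sqrt(2 * math.log(total) / counts[a])
               for a in range(n_arms)]
        a = max(range(n_arms), key=lambda i: ucb[i])
        r = 1.0 if random.random() < true_means[a] else 0.0
        counts[a] += 1
        sums[a] += r

    print([sums[a] / counts[a] for a in range(n_arms)])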
Title: UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data |
Authors: Chacha Chen (Pennsylvania State University), Junjie Liang (Pennsylvania State University), Fenglong Ma (Pennsylvania State University), Lucas Glass (IQVIA), Jimeng Sun (UIUC) and Cao Xiao (IQVIA). |
Successful health risk prediction demands accuracy and reliability of the model. Existing predictive models mainly depend on mining electronic health records (EHR) with advanced deep learning techniques to improve model accuracy. However, they all ignore the importance of publicly available online health data, especially socioeconomic status, environmental factors, and detailed demographic information for each location, which are all strong predictive signals that can augment precision medicine. To achieve model reliability, the model needs to provide both an accurate prediction and an uncertainty score for that prediction. However, existing uncertainty estimation approaches often fail in handling high-dimensional data, which are present in multi-sourced data. To fill the gap, we propose the UNcertaInTy-based hEalth risk prediction (UNITE) model. Building upon an adaptive multimodal deep kernel and a stochastic variational inference module, UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data including EHR data, patient demographics, and public health data collected from the web. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer’s disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming the best state-of-the-art baseline by up to 19%. We also show that UNITE can model meaningful uncertainties and provide evidence-based clinical support by clustering similar patients. |
Title: Unsupervised Lifelong Learning with Curricula |
Authors: Yi He (University of Louisiana at Lafayette), Sheng Chen (University of Louisiana at Lafayette), Baijun Wu (University of Louisiana at Lafayette), Xu Yuan (University of Louisiana at Lafayette) and Xindong Wu (HeFei University of Technology). |
Lifelong machine learning (LML) has extensively driven the development of web applications, enabling the learning systems deployed on web servers to deal with a sequence of tasks in an incremental fashion. Such systems can retain knowledge from learned tasks in a knowledge base and seamlessly apply it to improve future learning. Unfortunately, most existing LML methods require labels in every task, whereas providing persistent human labeling for all future tasks is costly, onerous, error-prone, and hence impractical. Motivated by this situation, we propose a new paradigm named unsupervised lifelong learning with curricula (ULLC), where only one task needs to be labeled for initialization and the system then performs lifelong learning for subsequent tasks in an unsupervised fashion. A main challenge in realizing this paradigm lies in the occurrence of negative knowledge transfer, where parts of the old knowledge become detrimental for learning a given task yet cannot be filtered out by the learner without the help of labels. To overcome this challenge, we draw insights from the learning behaviors of humans. When faced with a difficult task that cannot be well tackled by our current knowledge, we usually postpone it and work on some easier tasks first, which allows us to grow our knowledge. Thereafter, once we return to the postponed task, we are more likely to tackle it well as we are more knowledgeable now. The key idea of ULLC is similar: at any time, a pool of candidate tasks is organized in a curriculum by their distances to the knowledge base. The learner then starts from the closer tasks, accumulates knowledge from learning them, and moves on to the faraway tasks with a gradually augmented knowledge base. The viability and effectiveness of our proposal are substantiated through both theoretical analyses and empirical studies. |
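A minimal sketch of the curriculum loop, assuming tasks are sets of feature vectors and using centroid distance as a stand-in for the paper's distance to the knowledge base:

    import numpy as np

    # Toy curriculum: repeatedly learn the task closest to the current
    # knowledge base, then absorb it. Distance is an assumed proxy.
    def distance(task, knowledge):
        return np.linalg.norm(task.mean(axis=0) - knowledge.mean(axis=0))

    rng = np.random.default_rng(0)
    knowledge = rng.normal(0.0, 1.0, size=(50, 8))     # from the one labeled task
    tasks = [rng.normal(m, 1.0, size=(30, 8)) for m in (0.5, 3.0, 1.5)]

    while tasks:
        tasks.sort(key=lambda t: distance(t, knowledge))
        nearest = tasks.pop(0)                         # easiest task first
        # ... learn `nearest` without labels here ...
        knowledge = np.vstack([knowledge, nearest])    # grow the knowledge base
    print(knowledge.shape)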
Title: Unsupervised Multi-Index Semantic Hashing |
Authors: Christian Hansen (University of Copenhagen), Casper Hansen (University of Copenhagen), Jakob Grue Simonsen (Department of Computer Science, University of Copenhagen), Stephen Alstrup (University of Copenhagen) and Christina Lioma (University of Copenhagen). |
Semantic hashing represents documents as compact binary vectors (hash codes) and allows both efficient and effective similarity search in large-scale information retrieval. The state of the art has primarily focused on learning hash codes that improve similarity search effectiveness, while assuming a brute-force linear scan strategy for searching over all the hash codes, even though much faster alternatives exist. One such alternative is multi-index hashing, an approach that constructs a smaller candidate set to search over, which, depending on the distribution of the hash codes, can lead to sub-linear search time. In this work, we propose Multi-Index Semantic Hashing (MISH), an unsupervised hashing model that learns hash codes that are both effective and highly efficient by being optimized for multi-index hashing. We derive novel training objectives that enable learning hash codes that reduce the candidate sets produced by multi-index hashing, while being end-to-end trainable. In fact, our proposed training objectives are model-agnostic, i.e., not tied to how the hash codes are generated specifically in MISH, and are straightforward to include in existing and future semantic hashing models. We experimentally compare MISH to state-of-the-art semantic hashing baselines in the task of document similarity search. We find that even though multi-index hashing also improves the efficiency of the baselines compared to a linear scan, they are still upwards of 33% slower than MISH, while MISH still obtains state-of-the-art effectiveness. |
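For intuition, here is a small sketch of plain multi-index hashing, the search strategy MISH optimizes for: each binary code is split into m substrings and each substring is indexed exactly, so any code within Hamming distance below m must match a query in at least one substring. Codes and sizes are toy values:

    from collections import defaultdict

    # Toy multi-index hashing over 8-bit codes split into m substrings.
    def build_index(codes, m=2):
        k = len(codes[0]) // m
        tables = [defaultdict(set) for _ in range(m)]
        for cid, code in enumerate(codes):
            for i in range(m):
                tables[i][code[i * k:(i + 1) * k]].add(cid)
        return tables, k

    def candidates(query, tables, k):
        out = set()
        for i, table in enumerate(tables):
            out |= table[query[i * k:(i + 1) * k]]  # exact substring match
        return out

    codes = ["10110010", "10111111", "00000001", "10110001"]
    tables, k = build_index(codes)
    print(candidates("10110000", tables, k))   # only nearby codes are checked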
Title: Unsupervised Semantic Association Learning with Latent Label Inference |
Authors: Yanzhao Zhang (Beihang University), Richong Zhang (Beihang University), Jaein Kim (Beihang University), Xudong Liu (Beihang University) and Yongyi Mao (University of Ottawa). |
In this paper, we unify a diverse set of learning tasks in NLP, semantic retrieval and related areas, under a common umbrella, which we call unsupervised semantic association learning (USAL). Examples of this generic task include word sense disambiguation, answer selection and question retrieval. We then present a novel modeling framework to tackle such tasks. The framework introduces, under the deep learning paradigm, a latent label indexing the true target in the candidate target set. An EM algorithm is then developed for learning the deep model and inferring the latent variables, principled under variational techniques and noise contrastive estimation. We apply the model and algorithm to several semantic retrieval benchmark tasks and the superior performance of the proposed approach is demonstrated via empirical studies. |
Title: User Simulation via Supervised Generative Adversarial Network |
Authors: Xiangyu Zhao (Michigan State University), Long Xia (School of Information Technology, York University), Lixin Zou (Tsinghua University), Hui Liu (Michigan State University), Dawei Yin (JD.com) and Jiliang Tang (Michigan State University). |
With the recent advances in Reinforcement Learning (RL), there has been tremendous interest in employing RL for recommender systems. However, directly training and evaluating a new RL-based recommendation algorithm requires collecting users' real-time feedback in the real system, which is time- and effort-consuming and could negatively impact users' experiences. Thus, it calls for a user simulator that can mimic real users' behaviors, on which we can pre-train and evaluate new recommendation algorithms. Simulating users' behaviors in a dynamic system faces immense challenges: (i) the underlying item distribution is complex, and (ii) historical logs for each user are limited. In this paper, we develop a user simulator based on a Generative Adversarial Network (GAN). To be specific, the generator captures the underlying distribution of users' historical logs and generates realistic logs that can be considered augmentations of real logs, while the discriminator not only distinguishes real and fake logs but also predicts users' behaviors. The experimental results based on benchmark datasets demonstrate the effectiveness of the proposed simulator. Further experiments have been conducted to understand the importance of each component in the simulator. We have released the implementation code of the simulator to advance RL-based recommendation research. |
Title: User tracking in the post-cookie era: How websites bypass GDPR consent to track users |
Authors: Emmanouil Papadogiannakis (FORTH/University of Crete), Panagiotis Papadopoulos (Telefonica Research), Nicolas Kourtellis (Telefonica Research) and Evangelos Markatos (ICS/FORTH). |
During the past couple of years, mostly as a result of GDPR and CCPA, websites have started to present users with cookie consent banners. These banners are web forms where users can state their preferences and declare which cookies, if any, they would like to accept. Although requesting consent before storing any identifiable information is a good start towards respecting user privacy, previous research has shown that websites do not always respect user choices. In this paper, we go a step further and explore whether websites use more persistent and sophisticated forms of tracking in order to track users who have said they do not want cookies. Such forms of tracking include canvas fingerprinting, first-party ID leaking, and cookie synchronisation. Our results suggest that websites use such modern forms of tracking even before users have had the opportunity to register their choice with respect to cookies. To add insult to injury, when users choose to raise their voice and reject all cookies, user tracking only intensifies. As a result, users’ choices play very little role with respect to tracking: we measured that more than 75% of tracking activities happened before users had the opportunity to make a selection in the cookie consent banner, or when users chose to reject all cookies. |
Title: User-oriented Group Fairness In Recommender Systems |
Authors: Yunqi Li (Rutgers University), Hanxiong Chen (Rutgers University-New Brunswick Computer Science Department), Zuohui Fu (Rutgers University), Yingqiang Ge (Rutgers University) and Yongfeng Zhang (Rutgers University). |
As a highly data-driven application, recommender systems can be affected by data bias, resulting in unfair results for different groups of users, which can in turn affect system performance. Therefore, it is important to identify and address unfairness issues in recommendation scenarios. In this paper, we address the unfairness problem in recommender systems from the user perspective. We group users into advantaged and disadvantaged groups according to their level of activity and conduct experiments showing that current recommender systems behave unfairly between the two groups. Specifically, the advantaged (active) users, who account for only a small proportion of the data, enjoy much higher recommendation quality than the disadvantaged (inactive) users. Such bias also affects overall performance, since the disadvantaged users are the majority. To solve this problem, we provide a re-ranking approach that mitigates this unfairness by adding constraints over evaluation metrics. Experiments on several real-world datasets with various recommendation algorithms show that our approach not only improves the group fairness of users in recommender systems but also achieves better overall recommendation performance. |
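A toy sketch of the constraint-driven idea: keep improving the disadvantaged group's recommendation quality (e.g., through re-ranking swaps) until the between-group gap in a quality metric falls below a tolerance. The quality values and the fixed per-swap gain are illustrative assumptions, not the paper's algorithm:

    import numpy as np

    # Toy per-user recommendation quality for two groups.
    rng = np.random.default_rng(1)
    quality = {"active": list(rng.uniform(0.6, 0.9, 5)),
               "inactive": list(rng.uniform(0.2, 0.5, 20))}

    def gap(q):
        return float(np.mean(q["active"]) - np.mean(q["inactive"]))

    eps, max_swaps, n = 0.05, 200, 0
    while gap(quality) > eps and n < max_swaps:
        j = int(np.argmin(quality["inactive"]))   # worst-off inactive user
        quality["inactive"][j] += 0.05            # assumed gain from one swap
        n += 1
    print(n, round(gap(quality), 3))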
Title: Using Prior Knowledge to Guide BERT’s Attention in Semantic Textual Matching Tasks |
Authors: Tingyu Xia (JiLin University), Yue Wang (University of North Carolina at Chapel Hill), Yuan Tian (JiLin University) and Yi Chang (JiLin University). |
We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., BERT, to enhance its performance in semantic textual matching tasks. By probing and analyzing what BERT already knows about this task, we obtain a better understanding of what task-specific knowledge BERT needs most and where it is most needed. The analysis further motivates us to take a different approach than existing work: instead of using prior knowledge to create a new training task for fine-tuning BERT, we directly inject knowledge into BERT's multi-head attention mechanism. This leads us to a simple yet effective approach that enjoys fast training, as it saves the model from training on additional data or tasks other than the main task. Extensive experiments demonstrate that our knowledge-enhanced BERT is able to consistently improve semantic textual matching performance over the original BERT model, and the performance benefit is most salient when training data is scarce. |
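One simple way to picture direct knowledge injection into attention is an additive bias on the pre-softmax scores; the prior matrix below is a hypothetical stand-in for task-specific word-pair knowledge, and the paper's exact injection scheme may differ:

    import math
    import torch

    # Scaled dot-product attention with an additive prior bias
    # (illustrative; not necessarily the paper's injection scheme).
    def attention_with_prior(q, k, v, prior):
        d = q.size(-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(d)
        weights = torch.softmax(scores + prior, dim=-1)  # prior as bias
        return weights @ v

    seq, d = 5, 16
    q, k, v = (torch.randn(seq, d) for _ in range(3))
    prior = torch.zeros(seq, seq)
    prior[0, 3] = 2.0        # e.g., tokens 0 and 3 are known to be related
    out = attention_with_prior(q, k, v, prior)
    print(out.shape)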
Title: Variable Interval Time Sequence Modeling for Career Trajectory Prediction: Deep Collaborative Perspective |
Authors: Chao Wang (University of Science and Technology of China), Hengshu Zhu (Baidu Inc.), Qiming Hao (University of Science and Technology of China), Keli Xiao (Stony Brook University) and Hui Xiong (Rutgers University). |
In today's fast-evolving job market, the timely and effective understanding of the career trajectories of talents can help them quickly develop necessary skills and make the right career transitions at the right time. However, developing a successful career trajectory prediction method is non-trivial: it should be able to find the right timing for job-hopping, identify the right companies, and match the right positions for candidates. While people have been trying to develop solutions for providing some of the above abilities, no complete framework integrates all of them. To this end, in this paper, we propose a unified time-aware career trajectory prediction framework, namely TACTP, which is capable of jointly providing the above three abilities for better understanding the career trajectories of talents. Along this line, we first exploit a hierarchical deep sequential modeling network for career embedding and extract latent talent factors from multiple networks, which are designed to handle the related issues of timing, companies, and positions for job-hopping. Then, we perform collaborative filtering to generate personalized predictions. Furthermore, we propose a temporal encoding mechanism to handle dynamic temporal information, so that TACTP is capable of generating time-aware predictions by addressing the challenges of variable-interval time sequence modeling. Finally, we have conducted extensive experiments on large-scale real-world data to evaluate TACTP against state-of-the-art baselines, and the results show that TACTP outperforms the baselines on all targeted tasks for career trajectory prediction. |
Title: Variation Control and Evaluation for Generative Slate Recommendations |
Authors: Shuchang Liu (Rutgers University), Fei Sun (Alibaba Group), Yingqiang Ge (Rutgers University), Changhua Pei (Tsinghua University) and Yongfeng Zhang (Rutgers University). |
Slate recommendation generates a list of items as a whole instead of ranking each item individually, so as to better model the intra-list positional biases and item relations. In order to deal with the enormous combinatorial space of slates, recent work treats slate recommendation as a generation task so that a slate can be directly compressed and reconstructed. However, we observe that such generative approaches---despite their proven effectiveness in computer vision---suffer from a reconstruction-concentration trade-off dilemma in recommender systems: when focusing on reconstruction, they easily over-fit the training data and hardly generate satisfactory recommendations; on the other hand, when focusing on satisfying user interests, they get trapped in a few items and fail to cover the item variation in slates. In this paper, we propose to enhance accuracy-based evaluation with slate variation metrics to estimate the stochastic behavior of generative models. We then illustrate that instead of reaching one of the two undesirable extremes of the dilemma, a valid generative slate recommendation model can be found in a narrow "elbow" region in between. We also show that item perturbation can enforce slate variation and mitigate the over-concentration of generated slates, which expands the "elbow" region and makes it easier to find. We further propose to separate a pivot selection phase from the generation process so that the model can apply perturbation before generation. Empirical results show that this simple modification can provide even better variation with the same level of accuracy compared to post-generation perturbation methods. |
Title: Verdi: Quality Estimation and Error Detection for Bilingual Corpora |
Authors: Mingjun Zhao (University of Alberta), Haijiang Wu (Tencent), Di Niu (University of Alberta), Zixuan Wang (Tencent) and Xiaoli Wang (Tencent). |
Translation Quality Estimation is critical to reducing post-editing effort in machine translation and to cross-lingual corpus cleaning. As a research problem, quality estimation (QE) aims to directly estimate the quality of translation for a given pair of source and target sentences, and to highlight the words that need corrections, without reference to golden translations. In this paper, we propose Verdi, a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora. Verdi adopts two word predictors to enable diverse features to be extracted from a pair of sentences for subsequent quality estimation: a transformer-based neural machine translation (NMT) model and a pre-trained cross-lingual language model (XLM). We exploit the symmetric nature of bilingual corpora and apply model-level dual learning in the NMT predictor, which handles a primal task and a dual task simultaneously with weight sharing, leading to stronger context prediction ability than single-direction NMT models. By taking advantage of the dual learning scheme, we further design a novel feature to directly encode the translated target information without relying on the source context. Extensive experiments conducted on WMT20 QE tasks demonstrate that our method beats the winner of the competition and outperforms other baseline methods by a large margin. We further use the sentence-level scores provided by Verdi to clean a parallel corpus and observe benefits in both model performance and training efficiency. |
Title: Wait, Let’s Think about Your Purchase Again: A Study on Interventions for Supporting Self-Controlled Online Purchases |
Authors: Yunha Han (Ulsan National Institute of Science and Technology), Hwiyeon Kim (Ulsan National Institute of Science and Technology), Hyeshin Chu (Ulsan National Institute of Science and Technology), Joohee Kim (Ulsan National Institute of Science and Technology), Hyunwook Lee (Ulsan National Institute of Science and Technology), Seunghyeong Choe (Ulsan National Institute of Science and Technology), Dooyoung Jung (Ulsan National Institute of Science and Technology), Dongil Chung (Ulsan National Institute of Science and Technology), Bum Chul Kwon (IBM Research) and Sungahn Ko (Ulsan National Institute of Science and Technology). |
As online marketplaces adopt new technologies (e.g., one-click purchase) to encourage consumers’ purchases, the number of consumers who impulsively purchase products also increases. Although many interventions have been introduced to support consumers’ self-controlled purchases, few studies have evaluated the effectiveness of these techniques in the wild. For intervention evaluation, we first conduct a survey with 118 consumers in their 20s to investigate their impulse-buying patterns and self-control strategies. Based on the survey results and the literature, we develop interventions that can assist consumers’ self-controlled online purchases, including reflection, distraction, cost saliency, and desire reduction. The experiment results with 107 consumers indicate that all interventions are effective in reducing the impulse-buying urge, although user experiences with them vary. Lastly, we discuss our findings and design implications. |
Title: Weakly-Supervised Question Answering with Effective Rank and Weighted Loss over Candidates |
Authors: Haozhe Qin (Shanghai Jiao Tong University), Jiangang Zhu (Microsoft) and Beijun Shen (Shanghai Jiao Tong University). |
We study the weakly supervised question answering problem. Weakly supervised question answering aims to learn how the questions should be answered directly from the |
Title: WebSocket Adoption and the Landscape of the Real-Time Web |
Authors: Paul Murley (University of Illinois), Zane Ma (University of Illinois), Joshua Mason (University of Illinois), Michael Bailey (University of Illinois) and Amin Kharraz (Florida International University). |
Developers are increasingly deploying web applications that require real-time bidirectional updates, a use case that does not naturally align with the original client-server architecture of the web. Many solutions have arisen to address this need over the preceding decades, including HTTP polling, Server-Sent Events, and WebSockets. This paper investigates this ecosystem and reports on the prevalence, benefits, and drawbacks of these technologies, with a particular focus on WebSockets. We crawl the Tranco Top 1 Million websites to build a client-side dataset for studying real-time updates in the wild. We find that HTTP polling remains significantly more common than WebSockets, and WebSocket adoption appears to have stagnated in the past 2-3 years. When WebSockets are used, the prescribed best practices for securing them are often disregarded. We investigate and discuss some of the possible reasons for this slowing adoption. We make our dataset available in the hope that it may help inform the development of future real-time solutions for the web. |
Title: Whale Watching in Inland Indonesia: Analyzing a Small, Remote, Internet-Based Community Cellular Network |
Authors: Matthew Johnson (University of Washington), Jenny Liang (University of Washington), Michelle Lin (University of Washington), Sudheesh Singanamalla (University of Washington) and Kurtis Heimerl (University of Washington). |
While only generating a minuscule percentage of global traffic, largely lost in the noise of large-scale analyses, remote rural networks are the physical frontier of the Internet today. Through tight integration with a local operator's infrastructure, we gather a unique dataset to characterize and report a year of interaction between finances, utilization, and performance of a novel, remote, data-only Community LTE Network in Kea (name anonymized for submission), Indonesia. With visibility down to individual users, we find usage highly unbalanced and the network supported by only a handful of relatively heavy consumers. 45% of users are offline more days than online, and the median user consumes only 77 MB per day online and 36 MB per day on average, limiting consumption by frequently "topping up" in small amounts. Outside video and social media, messaging and IP calling provided by over-the-top services like Facebook Messenger, QQ, and WhatsApp comprise a relatively large percentage of traffic consistently across both heavy and light users. Our analysis shows that Internet-only Community Cellular Networks can be profitable despite most users spending less than $1 USD/day, and offers insights into the unique properties of these networks. |
Title: What do You Mean? Interpreting Image Classification with Crowdsourced Concept Extraction and Analysis |
Authors: Agathe Balayn (Delft University of Technology), Panagiotis Soilis (Delft University of Technology), Christoph Lofi (Delft University of Technology), Jie Yang (Delft University of Technology) and Alessandro Bozzon (Delft University of Technology). |
Global interpretability is a vital requirement for image classification in many application domains, ranging from medical diagnosis to autonomous driving, both from a legal perspective and for improving the classifier's performance or fairness. Existing interpretability methods mainly explain model behavior by identifying salient image patches, which require manual effort from users to make sense of and do not typically support model validation with questions that contain multiple concepts. In this paper, we introduce a scalable, crowdsourcing-based, human-in-the-loop approach for global interpretability. Salient image areas identified by local interpretability methods are annotated with semantic concepts, which are then aggregated into a tabular representation of images to facilitate automatic statistical analysis of model behavior. We show that this approach can answer both interpretability needs for model validation and exploration, and provides semantically more diverse, informative, and relevant explanations while still allowing for scalable and cost-efficient execution. |
Title: Where are you taking me? Understanding Abusive Traffic Distribution Systems |
Authors: Janos Szurdi (Carnegie Mellon University), Meng Luo (Stony Brook University), Brian Kondracki (Stony Brook University), Nick Nikiforakis (Stony Brook University) and Nicolas Christin (Carnegie Mellon University). |
Illicit website owners frequently rely on traffic distribution systems (TDSs) operated by less-than-scrupulous advertising networks to acquire user traffic. While researchers have described a number of case studies on various TDSs or the businesses they serve, we still lack an understanding of how users are differentiated in these ecosystems and how different illicit activities frequently leverage the same advertisement networks and, subsequently, the same malicious advertisers. We design ODIN (Observatory of Dynamic Illicit ad Networks), the first system to study cloaking, user differentiation, and business integration at the same time in four different types of traffic sources: typosquatting, copyright-infringing movie streaming, ad-based URL shortening, and illicit online pharmacy websites. ODIN performed 874,494 scrapes over two months (June 19, 2019 - August 24, 2019), posing as six different types of users (e.g., mobile, desktop, and crawler) and accumulating over 2TB of screenshots, browser events, and archived HTTP communications. We observed 81% more malicious pages compared to using only the best-performing crawl profile by itself. Three of the traffic sources we study redirect users to the same traffic broker domain names up to 44% of the time, and all of them often expose users to the same malicious advertisers. Our experiments show that novel cloaking techniques could halve the number of malicious pages observed. Worryingly, popular blacklists not only suffer from lack of coverage and delayed detection, as observed in previous work, but also miss the vast majority of malicious pages targeting mobile users. We use these findings to design a classifier that can make precise predictions about the likelihood of a user being redirected to a malicious advertiser. |
Title: Where Next? A Dynamic Model of User Preferences |
Authors: Francesco Sanna Passino (Imperial College London), Lucas Maystre (Spotify), Dmitrii Moor (Spotify), Ashton Anderson (University of Toronto, Spotify) and Mounia Lalmas (Spotify). |
We consider the problem of predicting users' preferences on online platforms. We build on recent findings suggesting that users' preferences change over time, and that helping users expand their horizons beyond the narrow set of their current preferences is important in ensuring that they stay engaged. Most existing models of user preferences attempt to capture simultaneous preferences: "users who like A tend to like B as well". In this paper, we argue that these models fail to anticipate changing preferences. To overcome this issue, we seek to understand the structure that underlies the evolution of user preferences. To this end, we propose the Preference Transition Model (PTM), a dynamic model of user preferences towards classes of items. The model enables estimating the probabilities of transitions between classes of items over time, which can be used to estimate how users' tastes are expected to evolve based on their past history. We test our model's predictive performance on a number of prediction tasks on data from three different domains: music streaming, restaurant recommendation, and movie recommendation, and find that it outperforms competing approaches. We then focus on a music application and inspect the structure learned by our model. We find that PTM uncovers remarkable regularities in users' preference trajectories over time. We believe that these findings could inform a new generation of dynamic, diversity-enhancing recommender systems. |
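The core transition idea can be pictured with a plain count-based estimate of class-to-class transition probabilities; the PTM itself is a richer statistical model, and the classes and sequences below are toy data:

    import numpy as np

    # Estimate a row-stochastic transition matrix between item classes
    # from users' consumption sequences (toy count-based version).
    classes = ["rock", "jazz", "electronic"]
    idx = {c: i for i, c in enumerate(classes)}
    sequences = [["rock", "rock", "jazz"],
                 ["rock", "jazz", "electronic"],
                 ["jazz", "electronic", "electronic"]]

    T = np.zeros((3, 3))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            T[idx[a], idx[b]] += 1
    T = T / T.sum(axis=1, keepdims=True)       # rows become probabilities

    print(dict(zip(classes, T[idx["rock"]])))  # where rock listeners go next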
Title: Wiki2Prop: A Multi-Modal Approach for Predicting Wikidata Properties from Wikipedia |
Authors: Michael Luggen (University of Fribourg), Julien Audiffren (Fribourg University), Djellel Difallah (NYU) and Philippe Cudre-Mauroux (U. of Fribourg). |
Wikidata is rapidly emerging as a key resource for a multitude of online tasks such as Speech Recognition, Entity Linking, Question Answering, and Semantic Search. The value of Wikidata is directly linked to the rich information associated with each entity -- that is, the properties describing each entity as well as the relationships to other entities. Despite the tremendous manual and automatic efforts the community has invested in the Wikidata project, the growing number of entities (now more than 100 million) presents multiple challenges in terms of knowledge gaps in the graph that are hard to track. To help guide the community in filling these gaps, we propose to identify and rank the properties that an entity might be missing. In this work, we focus on entities that have a dedicated Wikipedia page in any language, to make predictions directly based on textual content. We show that this problem can be formulated as a multi-label classification problem where every property defined in Wikidata is a potential label. Our main contribution, Wiki2Prop, solves this problem using a multimodal Deep Learning method to predict which properties should be attached to a given entity, using its Wikipedia page embeddings. Moreover, Wiki2Prop is able to incorporate additional features in the form of multilingual embeddings and multimodal data such as images whenever available. We empirically evaluate our approach against the state of the art and show that Wiki2Prop significantly outperforms its competitors for the task of property prediction in Wikidata, and that the use of multilingual and multimodal data improves the results further. |
Title: WiseKG: Balanced Access to Web Knowledge Graphs |
Authors: Amr Azzam (Vienna University of Business and Economics), Christian Aebeloe (Aalborg University), Gabriela Montoya (Aalborg University), Ilkcan Keles (Turkcell), Axel Polleres (Vienna University of Business and Economics) and Katja Hose (Aalborg University). |
SPARQL query services that balance processing between clients and servers become more and more essential to endure the increasing load on published open and decentralized knowledge graphs over the Web. To this end, Linked Data Fragments (LDF) have introduced a foundational framework that has sparked research exploring a spectrum of potential Web querying interfaces in between SPARQL endpoint servers on the one end, and client-side processing of data dumps on the other. Current proposals in between usually suffer from imbalanced load on either the client (TPF, smart-KG) or the server (SaGe, SPF) side. The present paper, to the best of our knowledge, is the first work to combine both client-side and server-side query processing optimizations in a truly dynamic fashion: we introduce WiseKG, which employs a cost model that dynamically delegates the load between servers and clients by combining client-side processing of shipped partitions (à la smart-KG) with efficient server-side processing of star-shaped sub-queries (à la SPF), based on current server workload and client capabilities. Our experiments show that WiseKG significantly outperforms state-of-the-art solutions in terms of average total query execution time per client, while at the same time decreasing network traffic and increasing server-side availability. |
Title: WiseTrans: Adaptive Transport Protocol Selection for Mobile Web Service |
Authors: Jia Zhang (Tsinghua University), Enhuan Dong (Tsinghua University), Zili Meng (Tsinghua University), Yuan Yang (Tsinghua University), Mingwei Xu (Tsinghua University), Sijie Yang (Baidu Inc.), Miao Zhang (Baidu Inc.) and Yang Yue (Tsinghua University). |
To improve the performance of mobile web service, a new transport protocol, QUIC, has been recently proposed. However, for large-scale real-world deployments, deciding whether and when to use QUIC in mobile web service is challenging. Complex temporal correlation of network conditions, high spatial heterogeneity of users in a nationwide deployment, and limited resources on mobile devices all affect the selection of transport protocols. In this paper, we present WiseTrans to adaptively switch transport protocols for mobile web service online and improve the completion time of web requests. WiseTrans introduces machine learning techniques to deal with temporal heterogeneity, makes decisions with historical information to handle spatial heterogeneity, and switches transport protocols at the request level to reach both high performance and acceptable overhead. We implement WiseTrans on two platforms (Android and iOS) in a popular mobile web service application of Company B. Comprehensive experiments demonstrate that WiseTrans can reduce request completion time by up to 26.5% on average compared to the usage of a single protocol. |
Title: XY-Sketch: on Sketching Data Streams at Web Scale |
Authors: Yongqiang Liu (University of Science and Technology of China) and Xike Xie (University of Science and Technology of China). |
Conventional sketching methods for counting stream item frequencies use hash functions to map data items to a concise structure, e.g., a two-dimensional array, at the expense of overcounting due to hashing collisions. Despite their popularity, the errors that accumulate from hashing collisions deteriorate sketching accuracy as data grows rapidly, which poses a great challenge to sketching big data streams at web scale. In this paper, we propose a novel structure, called XY-sketch, which estimates the frequency of a data item by estimating the probability of the item appearing in the data stream. The framework associated with XY-sketch consists of two phases, namely decomposition and recomposition. A data item is split into a set of compactly stored basic elements, which can be strung together in a probabilistic manner for query evaluation during the recomposition phase. Throughout, we conduct optimization under space constraints and detailed theoretical analysis. Experiments on both real and synthetic datasets show superior scalability in sketching large-scale streams. Remarkably, XY-sketch is orders of magnitude more accurate than existing solutions when the space budget is small. |
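For contrast with the paper's proposal, a minimal count-min sketch illustrates the conventional hashing-based approach whose collision-driven overcounting the abstract describes; the width and depth are illustrative:

    import random

    # Classic count-min sketch: estimates only overcount, and only
    # because of hash collisions. Width/depth are toy parameters.
    class CountMin:
        def __init__(self, width=256, depth=4, seed=7):
            rnd = random.Random(seed)
            self.width = width
            self.seeds = [rnd.randrange(1 << 30) for _ in range(depth)]
            self.table = [[0] * width for _ in range(depth)]

        def _cells(self, item):
            return [(i, hash((s, item)) % self.width)
                    for i, s in enumerate(self.seeds)]

        def add(self, item):
            for i, j in self._cells(item):
                self.table[i][j] += 1

        def estimate(self, item):
            return min(self.table[i][j] for i, j in self._cells(item))

    cm = CountMin()
    for x in ["a"] * 100 + ["b"] * 5:
        cm.add(x)
    print(cm.estimate("a"), cm.estimate("b"))   # >= true counts 100 and 5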