Future of the Web
Chair: Evgeniy Gabrilovich, Google
16:00-16:30 CEST – Gerhard Weikum
15 Years of Knowledge Graphs: Lessons, Challenges and Opportunities
Abstract: Machines with comprehensive knowledge of the world’s entities and their relationships have been a long-standing vision and challenge of AI. Over the last 15 years, huge knowledge bases, also known as knowledge graphs, have been automatically constructed from web data and text sources, and have become a key asset for search engines and other use cases. Machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, contributing to question answering, natural language processing and data analytics. This talk reviews these advances and discusses lessons learned (see also https://arxiv.org/abs/2009.11564 for a comprehensive survey). Moreover, the talk identifies open challenges and new research opportunities. In particular, extracting quantitative measures of entities (e.g., the height of buildings or the energy efficiency of cars) from text and web tables presents an opportunity to further enhance the scope and value of knowledge bases.
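As a concrete illustration of querying machine knowledge for quantitative measures of entities, the following sketch retrieves building heights from the public Wikidata endpoint. It is our illustrative example rather than a system from the talk, and it assumes Wikidata's identifiers (P31 for "instance of", Q41176 for "building", P2048 for "height") and the SPARQLWrapper Python package.

```python
# Minimal sketch: ask a public knowledge graph for a quantitative
# entity attribute (building heights). Assumes the Wikidata endpoint
# and its identifiers: P31 = instance of, Q41176 = building,
# P2048 = height (values as stored, typically metres).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="kg-example/0.1 (illustrative)")
sparql.setQuery("""
SELECT ?buildingLabel ?height WHERE {
  ?building wdt:P31 wd:Q41176 ;
            wdt:P2048 ?height .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?height)
LIMIT 5
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["buildingLabel"]["value"], row["height"]["value"])
```

The harder problem the talk points to is populating such properties in the first place, by extracting the numbers (with units) from text and web tables rather than reading them from an already-curated graph.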
Bio: Gerhard Weikum is a Scientific Director at the Max Planck Institute for Informatics in Saarbruecken, Germany, and an Adjunct Professor at Saarland University. He co-authored a comprehensive textbook on transactional systems, received the VLDB Test-of-Time Award 2002 for his work on automatic database tuning, and is one of the creators of the YAGO knowledge base which was recognized by the WWW Test-of-Time Award in 2018. Weikum is an ACM Fellow and elected member of various academies. He received the ACM SIGMOD Contributions Award in 2011, a Google Focused Research Award in 2011, an ERC Synergy Grant in 2014, and the ACM SIGMOD Edgar F. Codd Innovations Award in 2016.
16:30-17:00 CEST – Soumen Chakrabarti
A Brief History of Question Answering
Abstract: We will review the cross-community co-evolution of question answering (QA) with the advent of large-scale knowledge graphs (KGs), continuous representations of text and graphs, and deep sequence analysis. Early QA systems were information retrieval (IR) systems enhanced to extract named entity spans from high-scoring passages. Starting with WordNet, a series of structured curations of language and world knowledge, called KGs, enabled further improvements. In contrast, the corpus is unstructured and messy to exploit for QA. If a question can be answered using the KG alone, it is attractive to ‘interpret’ the free-form question into a structured query, which is then executed on the structured KG. This process is called KGQA. Answers can be high-quality and explainable if the KG has an answer, but manual curation results in low coverage. KGs were soon found useful for harnessing corpus information. Named entity mention spans could be tagged with fine-grained types (e.g., scientist), or even specific entities (e.g., Einstein). The QA system can learn to decompose a query into functional parts, e.g., “which scientist” and “played the violin”. With the increasing success of such systems, ambition grew to address multi-hop or multi-clause queries, e.g., “the father of the director of La La Land teaches at which university?” or “who directed an award-winning movie and is the son of a Princeton University professor?” Questions limited to simple path traversals in KGs have been encoded to a vector representation, which a decoder then uses to guide the KG traversal. Recently, the corpus counterpart of such strategies has also been proposed. However, for general multi-clause queries that do not necessarily translate to paths, that seek to bind multiple variables to satisfy multiple clauses, or that involve logic, comparison, aggregation and other arithmetic, neural programmer-interpreter systems have seen some success. Our focus will be on identifying situations where manual introduction of structural bias is essential for accuracy, as opposed to cases where sufficient data can compensate for distant or no supervision.
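To make the multi-hop setting concrete, here is a minimal sketch (ours, not from the talk) of KGQA as relation-path traversal over a toy knowledge graph, using the abstract's own example question; the triples and relation names are illustrative assumptions.

```python
# A toy knowledge graph as (subject, relation, object) triples.
# The facts and relation names are illustrative assumptions.
from collections import defaultdict

triples = [
    ("La La Land", "directed_by", "Damien Chazelle"),
    ("Damien Chazelle", "father", "Bernard Chazelle"),
    ("Bernard Chazelle", "teaches_at", "Princeton University"),
]

index = defaultdict(set)  # (subject, relation) -> set of objects
for s, r, o in triples:
    index[(s, r)].add(o)

def traverse(seeds, path):
    """Follow a relation path from a set of seed entities."""
    entities = set(seeds)
    for relation in path:
        entities = {o for e in entities for o in index[(e, relation)]}
    return entities

# "the father of the director of La La Land teaches at which university?"
# interpreted as the relation path [directed_by, father, teaches_at]:
print(traverse({"La La Land"}, ["directed_by", "father", "teaches_at"]))
# -> {'Princeton University'}
```

A neural KGQA system replaces the hand-written relation path with one predicted by a decoder from the question's vector encoding, which is precisely where the coverage and supervision issues discussed in the abstract arise.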
Bio: Soumen Chakrabarti is a Professor of Computer Science at IIT Bombay. He works on linking unstructured text to knowledge bases and exploiting these links for better search and ranking. Other interests include link formation and influence propagation in social networks, and personalized proximity search in graphs. He has published extensively in WWW, SIGKDD, EMNLP, VLDB, SIGIR, ICDE and other conferences. He won the best paper award at WWW 1999 and was a coauthor on the best student paper at ECML 2008. His work on keyword search in databases won the 10-year influential paper award at ICDE 2012. He received his PhD from the University of California, Berkeley, and worked on Clever Web search and Focused Crawling at IBM Almaden Research Center. He has also worked at Carnegie Mellon University and Google. He received the Bhatnagar Prize in 2014 and the Jagadis Bose Fellowship in 2019.
16:00-16:30 CEST – Mounia Lalmas
Personalization at Spotify: Enriching Life Through Audio
Abstract: The aim of the Personalization mission at Spotify is to “connect listeners and creators in a unique and enriching way”. This talk will describe some of the (research) work to achieve this, from applications of machine learning to metric validation and understanding listening diversity. The talk will close with the long-term vision of “enriching life through audio”.
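Listening diversity can be quantified in many ways; as a rough illustration (our assumption, not necessarily Spotify's metric), one simple and generic choice is the Shannon entropy of a listener's play distribution over genres:

```python
# A minimal sketch (not Spotify's method): quantify a listener's
# diversity as the Shannon entropy of play counts over genres.
# The genre labels and counts are made-up illustrative data.
import math

plays = {"pop": 120, "jazz": 30, "podcasts": 25, "classical": 25}

total = sum(plays.values())
entropy = -sum((n / total) * math.log2(n / total) for n in plays.values())
print(f"listening diversity: {entropy:.2f} bits")  # max is log2(4) = 2 bits
```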
Bio: Mounia Lalmas is a Director of Research at Spotify and the Head of Tech Research in Personalization. Mounia also holds an honorary professorship at University College London. Her work focuses on studying user engagement in areas such as native advertising, digital media, social media, search, and now audio. She has given numerous talks and tutorials on these and related topics, including tutorials at WWW 2019 and KDD 2020 on ‘Online User Engagement: Metrics and Optimization’. She was co-programme chair for SIGIR 2015, WWW 2018 and WSDM 2020. She is also the co-author of a book written as the outcome of her WWW 2013 tutorial on ‘Measuring User Engagement’.
16:30-17:00 CEST – Mor Naaman
To Fix Online Trustworthiness, Learn from the Animals
Abstract: From conspiracy theories about vaccines to false claims of extensive voter fraud in the US election, our information ecosystem has been overwhelmed by misinformation. The success of attempts to counter it, whether by detection and removal or by bolstering trust in credible sources, has been limited. In this talk, I will argue that to better address these challenges we must rethink our approach to evaluating credibility and trust on our information platforms, and that lessons from the animal kingdom could help point us in the right direction.
Bio: Mor Naaman is a professor of Information Science at the Jacobs Institute at Cornell Tech. He leads a research group focused on topics related to the intersection of technology, media and democracy. The group applies multidisciplinary techniques, from machine learning to qualitative social science, to study our information ecosystem and its challenges. Previously, Mor was on the faculty at the Rutgers School of Communication and Information, led a research team at Yahoo! Research Berkeley, received a Ph.D. in Computer Science from the Stanford University InfoLab, and played professional basketball for Hapoel Tel Aviv. He is also a former startup co-founder, and advises startup companies in social computing and related areas. His research is widely recognized, including with an NSF Faculty Early Career Development (CAREER) Award, research awards and grants from numerous corporations, and multiple best paper awards.
17:00-17:30 CEST – Michael Ostrovsky
The Evolution of Sponsored Listings: Past, Present, and Conjectures About the Future
Abstract: Sponsored listings (also referred to as promoted listings, sponsored search, or sponsored products, depending on the setting) are one of the key sources of monetization on the Internet. I will briefly discuss the history of sponsored listings, cover their more recent evolution (both in terms of the expansion in the areas in which they are successfully deployed and in terms of their features), and speculate about their potential future development.
Bio: Michael Ostrovsky is the Fred H. Merrill Professor of Economics at the Stanford Graduate School of Business, and is co-director of the Market Design working group at the National Bureau of Economic Research. His research spans game theory, market design, auction theory and practice, matching, and e-commerce. He contributed seminal research on the theory and practice of advertising auctions: “Internet Advertising and the Generalized Second Price Auction: Selling Billions of Dollars Worth of Keywords,” the foundational paper on the analysis of sponsored search auctions, and “Reserve Prices in Internet Advertising Auctions,” the first field experiment on optimal reserve prices. More recently, he has also studied the economics of autonomous transportation and the design of choice screen auctions for default search engines on mobile platforms.
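For readers unfamiliar with the mechanism behind that foundational paper, here is a minimal sketch of the Generalized Second Price (GSP) auction in its simplest form, without quality scores: advertisers are ranked by bid, and each slot winner pays the next-highest bid per click. The bids and click-through rates below are made-up illustrative numbers.

```python
# Minimal sketch of a Generalized Second Price (GSP) auction,
# simplest form (no quality scores). Bids and slot click-through
# rates are illustrative assumptions.

bids = {"alice": 4.00, "bob": 3.00, "carol": 1.50}  # $ per click
slots = [0.20, 0.10]                                 # slot click-through rates

ranked = sorted(bids, key=bids.get, reverse=True)    # rank bidders by bid
for slot, ctr in enumerate(slots):
    winner = ranked[slot]
    price = bids[ranked[slot + 1]]                   # next-highest bid per click
    print(f"slot {slot + 1}: {winner} pays ${price:.2f}/click "
          f"(expected cost ${ctr * price:.2f} per impression)")
```

Note the defining GSP property visible in the sketch: a bidder's payment depends on the bid ranked just below, not on her own bid, which is what drives the strategic analysis in the paper cited above.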