Rich Queries
Thursday 3:30–5:00 PM
Chair: Kevin Chang
Liquid Query: Multi-domain Exploratory Search on the Web
Marco Brambilla, Alessandro Bozzon, Stefano Ceri, Piero Fraternali
User search activities on the Web are becoming increasingly specialized: users expect more precise, domain-specific results from search engines and typically perform complex tasks that involve exploratory, multi-step search processes. In this paper we propose the Liquid Query paradigm, which allows users to find answers to multi-domain queries through an exploratory information-seeking approach, over structured information collected from Web documents, deep Web data sources, and personal data repositories, all wrapped by means of a uniform notion of search service. Liquid queries aim to fill the gap between generalized search systems, which are unable to find information spanning multiple topics, and domain-specific search systems, which cannot go beyond their domain limits. Liquid Query provides a set of interaction primitives that let users pose questions and explore results spanning multiple sources, thus getting progressively closer to the sought information. We demonstrate our approach with a prototype built upon the YQL (Yahoo! Query Language) framework.
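As an illustration of the kind of multi-domain question this paradigm targets, here is a minimal sketch of a YQL query composed and issued from Python. The sub-select joining two standard YQL tables (geo.places and weather.forecast) stands in for the paper's notion of combining search services; the endpoint shown is YQL's historical public URL and may no longer be available, so this is an assumption-laden sketch rather than the authors' prototype.

```python
# Sketch: one question spanning two domains, expressed as a YQL sub-select.
# The inner select resolves a place name to a WOEID; the outer select feeds
# that identifier to the weather table, joining the two "search services".
import json
import urllib.parse
import urllib.request

YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"  # historical URL

query = (
    'select * from weather.forecast where woeid in '
    '(select woeid from geo.places where text="Milan, Italy")'
)

params = urllib.parse.urlencode({"q": query, "format": "json"})
with urllib.request.urlopen(f"{YQL_ENDPOINT}?{params}") as resp:
    results = json.load(resp)["query"]["results"]
print(results)
```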
Atomate It! End-user Context-sensitive Automation Using Heterogeneous Information Sources on the Web
Max Van Kleek, Brennan Moore, Paul André, David Karger, mc schraefel
The transition of personal information management (PIM) tools off the desktop to the Web presents an opportunity to augment these tools with capabilities provided by the wealth of real-time information readily available. In this paper, we describe a next-generation personal information assistance engine that lets end-users delegate to it various simple context- and activity-reactive tasks and reminders. Our system, Atomate, treats RSS/ATOM feeds from social networking and life-tracking sites as sensor streams, integrating information from such feeds into a simple unified RDF world model representing people, places and things and their time-varying states and activities. Combined with other information sources on the web, including the user’s online calendar, web-based e-mail client, news feeds and messaging services, Atomate can be made to automatically carry out a variety of simple tasks for the user, ranging from context-aware filtering and messaging, to sharing and social coordination actions. Atomate’s open architecture and world model easily accommodate new information sources and actions via the addition of feeds and web services. To make routine use of the system easy for non-programmers, Atomate provides a constrained-input natural language interface (CNLI) for behavior specification, and a direct-manipulation interface for inspecting and updating its world model.
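To make the world-model idea concrete, the following is a minimal sketch (not Atomate's actual schema) of how a feed entry might be treated as a sensor reading and folded into an RDF model, using the rdflib library. The EX namespace, the property names, and the sample entry are all illustrative assumptions.

```python
# Sketch: a life-tracking feed entry becomes RDF triples describing a
# person's time-varying state; a simple behavior then queries the model.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/atomate/")  # hypothetical vocabulary
world = Graph()

# Pretend this came from a social-networking site's ATOM feed: a check-in
# reporting that a known person arrived at a known place.
entry = {"person": "alice", "place": "cafe_central",
         "time": "2010-04-29T15:30:00Z"}

person = EX[entry["person"]]
place = EX[entry["place"]]
world.add((person, RDF.type, EX.Person))
world.add((place, RDF.type, EX.Place))
world.add((person, EX.currentLocation, place))           # time-varying state
world.add((person, EX.locationObservedAt, Literal(entry["time"])))

# A rule-like behavior ("remind me when Alice is at Cafe Central") can be
# checked by asking whether the triple currently holds in the world model.
at_cafe = (person, EX.currentLocation, place) in world
print("trigger reminder:", at_cafe)
```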
A Novel Traffic Analysis for Identifying Search Fields in the Long Tail of Web Sites
George Forman, Evan Kirshenbaum, Shyamsundar Rajaram
Using a clickstream sample of 2 billion URLs from many thousands of volunteer Web users, we wish to analyze typical usage of keyword searches across the Web. In order to do this, we need to be able to determine whether a given URL represents a keyword search and, if so, which field contains the query. Although it is easy to recognize “q” as the query field in “http://www.google.com/search?hl=en&q=music”, we must do this automatically for the long tail of diverse websites. This problem is the focus of this paper. Since the names, types, and number of fields differ across sites, the problem does not conform to traditional text classification or multi-class problem formulations. It also exhibits highly non-uniform importance across websites, since traffic follows a Zipf distribution. We developed a solution based on manually identifying the query fields on the most popular sites, followed by an adaptation of machine learning for the rest. It involves an interesting case-instance structure: labeling each website “case” usually involves selecting at most one of its field “instances” as positive, based on seeing sample field values. This problem structure and soft constraint, which we believe has broader applicability, can be used to greatly reduce the manual labeling effort. We employed active learning and judicious GUI presentation to efficiently train a classifier with accuracy estimated at 96%, beating several baseline alternatives.
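The task itself is easy to state in code. Below is a toy illustration (not the paper's learned classifier): given a URL, decide which query-string field, if any, carries the user's search keywords. The heuristics used here, a list of well-known field names and a multi-word-value fallback, are illustrative assumptions standing in for the learned model.

```python
# Sketch of the task: pick the field most likely to hold search keywords.
from urllib.parse import urlparse, parse_qsl

# Assumed list of common query-field names; the paper learns this instead.
COMMON_QUERY_FIELDS = {"q", "query", "search", "s", "p", "kw", "keywords"}

def guess_search_field(url: str) -> str | None:
    """Return the name of the field that most likely holds the query."""
    fields = parse_qsl(urlparse(url).query)
    # First pass: a field whose name is a well-known query-field name.
    for name, value in fields:
        if name.lower() in COMMON_QUERY_FIELDS and value:
            return name
    # Fallback: a field whose value looks like free text.
    for name, value in fields:
        if " " in value:
            return name
    return None  # no field looks like a keyword search

print(guess_search_field("http://www.google.com/search?hl=en&q=music"))  # "q"
```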