WWW2010

Facets

Friday, 3:30–5:00 PM
Chair: Torsten Suel

Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia

Chengkai Li, Ning Yan, Senjuti Roy, Lekhendro Lisham, Gautam Das

This paper proposes Facetedpedia, a faceted retrieval system for information discovery and exploration over Wikipedia. Given the set of Wikipedia articles resulting from a keyword query, Facetedpedia generates a faceted interface for navigating the result articles. Compared with traditional faceted retrieval systems, Facetedpedia is fully automatic and dynamic in both facet generation and hierarchy construction, and the facets are based on rich semantic information from Wikipedia. The essence of our approach is to build upon the collaborative vocabulary in Wikipedia, more specifically the intensive structures (hyperlinks) and folksonomy (category system). Given the sheer size and complexity of Wikipedia, the space of possible choices of faceted interfaces is prohibitively large. We propose metrics for ranking individual facet hierarchies by users’ navigational cost, and metrics for ranking interfaces (each with k facets) by both their average pairwise similarities and average navigational costs. We then develop faceted interface discovery algorithms that optimize the ranking metrics and generate the faceted interface. Our experimental evaluation and user study verify the effectiveness of the system.
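As a rough illustration of the interface-level metrics mentioned in the abstract, the sketch below computes the average navigational cost and the average pairwise similarity of a k-facet interface. The abstract does not define either quantity, so everything here is an assumption: each facet is a hypothetical `(cost, covered_articles)` pair, similarity is taken to be Jaccard overlap of the covered article sets, and the function names are invented for this example. It is not the paper's actual formulation.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap of two sets (assumed similarity measure, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def interface_scores(facets):
    """Return (average cost, average pairwise similarity) for a non-empty list of facets.

    Each facet is a hypothetical (navigational_cost, covered_articles) pair.
    """
    avg_cost = sum(cost for cost, _ in facets) / len(facets)
    pairs = list(combinations(facets, 2))
    if pairs:
        avg_sim = sum(jaccard(a[1], b[1]) for a, b in pairs) / len(pairs)
    else:
        avg_sim = 0.0  # a single facet has no pairs to compare
    return avg_cost, avg_sim

# Toy interface with k = 3 facets.
facets = [
    (2.0, {"a1", "a2"}),
    (4.0, {"a2", "a3"}),
    (3.0, {"a4"}),
]
avg_cost, avg_sim = interface_scores(facets)
print(avg_cost, avg_sim)
```

Under these assumptions, an interface with low average cost and low average similarity would be one whose facets are cheap to navigate and overlap little with one another.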

Ad-Hoc Object Retrieval in the Web of Data

Jeffrey Pound, Peter Mika, Hugo Zaragoza

Semantic Search vaguely refers to a loose set of concepts, challenges and techniques having to do with harnessing the information of the growing Web of Data (WOD) for Web search. Here we propose a formal model of one specific semantic search task: ad-hoc object retrieval. We show that this task provides a solid framework to study some of the semantic search problems currently tackled by commercial Web search engines. We connect this task to the traditional ad-hoc document retrieval and discuss appropriate evaluation metrics. Finally, we carry out a realistic evaluation of this task in the context of a Web search application.

Towards Rich Query Interpretation: Back and Forth on Query Template Mining

Govind Kabra, Kevin Chang, Ganesh Agarwal

In this paper, we propose to mine templates from search engine query logs, with the goal of rich structured query interpretation. To begin with, we formalize the notion of a template as a sequence of keywords and domain attributes, which instantiates many queries based on the instances of these domain attributes. We identify the key challenge in template discovery as the limited seed knowledge. Our solution bootstraps from a small seed input to discover relevant query templates, by harnessing the wealth of information available in search logs. We model this information in a tripartite inference network of queries, sites and templates, together forming the “QueST” network. We propose an iterative probabilistic inference framework based on dual metrics of precision and recall. We have deployed and tested our algorithm over a real-world large-scale search log of 15 million queries from the MSN search engine. We find the accuracy of our algorithm to be as much as 90% (on F-measure), with very little seed input knowledge and even an incomplete domain schema.
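The notion of a template as a sequence of keywords and domain attributes can be made concrete with a small sketch. The `#attribute` placeholder syntax, the `instantiate` function, and the example data are all assumptions made for illustration; the abstract does not specify a notation.

```python
from itertools import product

def instantiate(template, instances):
    """Expand a template into the concrete queries it can generate.

    template:  list of tokens; a token starting with '#' is treated as a
               domain-attribute placeholder (hypothetical notation).
    instances: dict mapping attribute names to lists of known instances.
    """
    slots = []
    for token in template:
        if token.startswith("#"):
            slots.append(instances[token[1:]])  # every instance of the attribute
        else:
            slots.append([token])               # a plain keyword stays fixed
    return [" ".join(combo) for combo in product(*slots)]

template = ["jobs", "in", "#location"]
instances = {"location": ["chicago", "boston"]}
queries = instantiate(template, instances)
print(queries)  # ['jobs in chicago', 'jobs in boston']
```

Template mining runs in the opposite direction: given logged queries such as these and a few seed attribute instances, the goal is to recover the template that generated them.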
