Presenter: Soumen Chakrabarti, IIT Bombay, India
Search engines today harness many sources of structured and semi-structured data in addition to unstructured Web/text pages. These range from extremely well-structured product catalogs, geographical information systems, and yellow pages, to less structured but fairly precise ontologies of concepts and relations like WordNet or CYC, to “organically” maintained knowledge bases like Wikipedia and Freebase. To exploit these sources successfully, search engines need a much deeper understanding of the query and the information need behind it. In this 3-hour tutorial we will describe the latest statistical techniques for interpreting a query in terms of intent, time, location, target types, entities, attributes, and relations, so that the query becomes more structured and the ranking of responses with respect to semi-structured or unstructured corpora becomes better informed by the annotations made on the query. We will also discuss how the uncertainties of query annotation can be mitigated by more sophisticated query execution techniques.