Heterogeneous Web Data Search Using Relevance-based On The Fly Data Integration
Published: 2012 April
Book title: World Wide Web Conference (WWW2012)
Place of publication: Lyon, France
Searching over heterogeneous structured data on the Web is challenging due to vocabulary and structure mismatches among different data sources. In this paper, we study two existing strategies and present a new approach to integrating additional data sources into the search process. The first strategy relies on data integration to mediate mismatches through upfront computation of mappings, based on which queries are rewritten to fit individual sources. At the other extreme is keyword search, which requires no upfront investment but ignores structure information. Building on these strategies, we present a hybrid approach that combines the advantages of both: it requires no upfront data integration, yet still leverages the fine-grained structure of the underlying data. For a structured query adhering to the vocabulary of just one source, the so-called seed query, we construct an entity relevance model (ERM), which captures the content and the structure of the seed query results. This ERM is then aligned on the fly with keyword search results retrieved from other sources and is also used to rank these results. Our experiments on large-scale real-world data sets suggest that data integration leads to higher search effectiveness than keyword search, and that our new hybrid approach consistently outperforms both strategies.
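The following is a minimal Python sketch of the general idea described above, not the paper's actual formulation, which the abstract does not spell out. It assumes entities are flat dictionaries mapping attribute names to text, models each seed attribute as a unigram term distribution, and aligns and scores candidates by average term probability. The function names (`build_erm`, `field_score`, `align_and_score`, `rank`) and the scoring scheme are illustrative assumptions.

```python
from collections import Counter, defaultdict


def build_erm(seed_results):
    """Build one unigram term distribution per attribute of the seed results.

    Assumption: each entity is a dict of attribute name -> text value.
    """
    counts = defaultdict(Counter)
    for entity in seed_results:
        for attr, text in entity.items():
            counts[attr].update(text.lower().split())
    erm = {}
    for attr, counter in counts.items():
        total = sum(counter.values())
        erm[attr] = {term: n / total for term, n in counter.items()}
    return erm


def field_score(term_probs, text):
    """Average model probability of the tokens in a candidate attribute value."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(term_probs.get(t, 0.0) for t in tokens) / len(tokens)


def align_and_score(erm, candidate):
    """Align each ERM attribute with the best-matching candidate attribute.

    Returns (score, alignment); attributes with no term overlap stay unaligned.
    """
    alignment = {}
    total = 0.0
    for attr, term_probs in erm.items():
        best_attr, best = None, 0.0
        for cand_attr, text in candidate.items():
            s = field_score(term_probs, text)
            if s > best:
                best_attr, best = cand_attr, s
        if best_attr is not None:
            alignment[attr] = best_attr
            total += best
    return total, alignment


def rank(erm, candidates):
    """Order keyword search results by their score against the ERM."""
    return sorted(candidates, key=lambda c: align_and_score(erm, c)[0], reverse=True)
```

In this sketch, no mappings are computed upfront: the alignment between the seed source's attributes and another source's attributes is derived per candidate at query time, and the same scores that drive the alignment are reused for ranking.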
DOI Link: 10.1145/2187836.2187856