
Inproceedings3250: Difference between revisions

From Aifbportal
(Page created: "{{Publikation Erster Autor |ErsterAutorNachname=Herzig |ErsterAutorVorname=Daniel M. }} {{Publikation Author |Rank=2 |Author=Duc Thanh Tran }} {{Inproceedings |Re…")
 
 
(3 intermediate revisions by the same user are not shown)
Line 8: Line 8:
 
}}
 
}}
 
{{Inproceedings
 
{{Inproceedings
|Referiert=False
+
|Referiert=True
 
|Title=Heterogeneous Web Data Search Using Relevance-based On The Fly Data Integration
 
|Title=Heterogeneous Web Data Search Using Relevance-based On The Fly Data Integration
 
|Year=2012
 
|Year=2012
Line 17: Line 17:
 
}}
 
}}
 
{{Publikation Details
 
{{Publikation Details
|Abstract=Searching over heterogeneous structured data on the Web is challenging due to vocabulary and structure mismatches among different data sources. In this paper, we study two main directions. The first one relies on data integration to mediate these mismatches through upfront computation of mappings, based on which queries are rewritten to fit the vocabulary and structure of individual sources. The other extreme is keyword search, which does not require any upfront investment, but ignores structure information that can be exploited for more effective search. Then, we present a hybrid approach, which assumes only one single structured query that adheres to the vocabulary of just one of the sources. However, this so-called seed query is not rewritten to obtain structured queries for individual sources, but processed as a keyword query. For more effective keyword search that also takes structure information into account, we construct an entity relevance model (ERM), which captures both the content and structure of the seed query results. On the fly, this ERM is then aligned with keyword search results retrieved from other sources to bridge vocabulary mismatches, and finally used to rank these results. Through experiments using large-scale real-world datasets, we study these three different strategies. The outcomes suggest that upfront investment in data integration leads to higher search effectiveness compared to keyword search, and that the hybrid strategy clearly provides the best results.
+
|Abstract=Searching over heterogeneous structured data on the Web is challenging due to vocabulary and structure mismatches among different data sources. In this paper, we study two existing strategies and present a new approach to integrate additional data sources into the search process. The first strategy relies on data integration to mediate mismatches through upfront computation of mappings, based on which queries are rewritten to fit individual sources. The other extreme is keyword search, which does not require any upfront investment, but ignores structure information. Building on these strategies, we present a hybrid approach, which combines the advantages of both. Our approach does not require any upfront data integration, but also leverages the fine-grained structure of the underlying data. For a structured query adhering to the vocabulary of just one source, the so-called seed query, we construct an entity relevance model (ERM), which captures the content and the structure of the seed query results. This ERM is then aligned on the fly with keyword search results retrieved from other sources and also used to rank these results. The outcome of our experiments using large-scale real-world data sets suggests that data integration leads to higher search effectiveness compared to keyword search and that our new hybrid approach consistently exceeds both strategies.
 +
|ISBN=978-1-4503-1229-5
 +
|Download=Dherzig-web-data-search-integration.pdf
 +
|Link=http://dl.acm.org/citation.cfm?id=2187856
 +
|DOI Name=10.1145/2187836.2187856
 
|Projekt=IGreen
 
|Projekt=IGreen
 
|Forschungsgruppe=Wissensmanagement
 
|Forschungsgruppe=Wissensmanagement
Line 33: Line 37:
 
|Forschungsgebiet=WWW Systeme
 
|Forschungsgebiet=WWW Systeme
 
}}
 
}}
 +
{{Forschungsgebiet Auswahl
 +
|Forschungsgebiet=Vernetzte Daten
 +
}}
 +
{{#set: equivalent URI=http://data.semanticweb.org/conference/www/2012/paper/329}}

Latest revision as of 23 July 2012, 17:59


Heterogeneous Web Data Search Using Relevance-based On The Fly Data Integration





Published: April 2012

Book title: World Wide Web Conference (WWW2012)
Publisher: ACM
Place of publication: Lyon, France

Refereed publication

BibTeX

Abstract
Searching over heterogeneous structured data on the Web is challenging due to vocabulary and structure mismatches among different data sources. In this paper, we study two existing strategies and present a new approach to integrate additional data sources into the search process. The first strategy relies on data integration to mediate mismatches through upfront computation of mappings, based on which queries are rewritten to fit individual sources. The other extreme is keyword search, which does not require any upfront investment, but ignores structure information. Building on these strategies, we present a hybrid approach, which combines the advantages of both. Our approach does not require any upfront data integration, but also leverages the fine-grained structure of the underlying data. For a structured query adhering to the vocabulary of just one source, the so-called seed query, we construct an entity relevance model (ERM), which captures the content and the structure of the seed query results. This ERM is then aligned on the fly with keyword search results retrieved from other sources and also used to rank these results. The outcome of our experiments using large-scale real-world data sets suggests that data integration leads to higher search effectiveness compared to keyword search and that our new hybrid approach consistently exceeds both strategies.
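
The hybrid idea in the abstract (build a model from seed query results, align it with results from another source, then rank) can be illustrated with a toy Python sketch. This is not the paper's formulation, which the abstract does not spell out. Everything here is an assumption made for illustration: entities are plain dicts from attribute name to text, the model is one unigram term distribution per attribute, alignment picks the candidate attribute with the largest term overlap, and the score is a smoothed negative cross-entropy. The function names (build_erm, align, score, rank) and the parameters lam and background are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; any tokenizer would do)."""
    return text.lower().split()


def build_erm(seed_results):
    """Build one unigram term distribution per attribute from seed query results."""
    counts = defaultdict(Counter)
    for entity in seed_results:
        for attr, value in entity.items():
            counts[attr].update(tokenize(value))
    erm = {}
    for attr, counter in counts.items():
        total = sum(counter.values())
        erm[attr] = {term: n / total for term, n in counter.items()}
    return erm


def align(dist, entity):
    """Return the entity attribute whose terms cover the most probability mass
    of the given model distribution, or None if nothing overlaps."""
    best_attr, best_mass = None, 0.0
    for attr, value in entity.items():
        mass = sum(dist.get(term, 0.0) for term in set(tokenize(value)))
        if mass > best_mass:
            best_attr, best_mass = attr, mass
    return best_attr


def score(erm, entity, lam=0.1, background=1e-3):
    """Negative cross-entropy between each model attribute and its aligned
    entity attribute; higher means more similar to the seed results."""
    total = 0.0
    for attr, dist in erm.items():
        target = align(dist, entity)
        counts = Counter(tokenize(entity[target])) if target is not None else Counter()
        n = sum(counts.values())
        for term, p_model in dist.items():
            ml = counts[term] / n if n else 0.0
            # Mix with a constant background so unseen terms do not yield log(0).
            p_entity = (1 - lam) * ml + lam * background
            total += p_model * math.log(p_entity)
    return total


def rank(erm, candidates):
    """Order keyword search results by decreasing model score."""
    return sorted(candidates, key=lambda e: score(erm, e), reverse=True)


if __name__ == "__main__":
    # Hypothetical seed results from one source and candidates from another
    # source that uses a different vocabulary for its attributes.
    seed = [
        {"title": "blade runner", "director": "ridley scott"},
        {"title": "alien", "director": "ridley scott"},
    ]
    candidates = [
        {"name": "titanic", "regisseur": "james cameron"},
        {"name": "gladiator", "regisseur": "ridley scott"},
    ]
    for entity in rank(build_erm(seed), candidates):
        print(entity["name"])
```

In this sketch the attribute names never have to match across sources: the "director" distribution is matched to "regisseur" purely through shared terms, which is one simple way to picture bridging a vocabulary mismatch without upfront mappings.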

ISBN: 978-1-4503-1229-5
Download: Media:Dherzig-web-data-search-integration.pdf
Further information: http://dl.acm.org/citation.cfm?id=2187856
DOI Link: 10.1145/2187836.2187856

Project

IGreen



Research group

Knowledge Management (Wissensmanagement)


Research area

Linked Data, Information Retrieval, Semantic Search, Semantic Web, WWW Systems