
Techreport3039: Difference between revisions

From Aifbportal
(The page was newly created: "{{Publikation Erster Autor |ErsterAutorNachname=Wagner |ErsterAutorVorname=Andreas }} {{Publikation Author |Rank=2 |Author=Veli Bicer }} {{Publikation Author |Ran…")
 
 
(4 intermediate revisions by the same user are not shown)
Line 20:
  }}
  {{Publikation Details
- |Abstract=The Resource Description Framework (RDF) has become an accepted standard for describing entities on the Web. At the same time, many RDF descriptions today are text-rich: besides structured data, they also feature large portions of unstructured text. Such semi-structured data is frequently queried using predicates matching structured data, combined with string predicates for textual constraints: hybrid queries. Evaluating hybrid queries efficiently requires effective means for selectivity estimation. Previous works on selectivity estimation, however, target either structured or unstructured data alone. In contrast, we study the problem in a uniform manner by exploiting a topic model as data synopsis, which enables us to accurately capture correlations between structured and unstructured data. Relying on this synopsis, our novel topic-based approach (TopGuess) uses a small, fine-grained query-specific Bayesian network (BN). In experiments on real-world data we show that the query-specific BN allows for great improvements in estimation accuracy. Compared to a baseline relying on PRMs we could achieve a gain of 20%. In terms of efficiency TopGuess performed comparably to our baselines.
- |Download=TR-topguess-selectivityestimation.pdf
+ |Abstract=Many RDF descriptions today are text-rich: besides structured data they also feature much unstructured text. Text-rich RDF data is frequently queried via predicates matching structured data, combined with string predicates for textual constraints (hybrid queries). Evaluating hybrid queries efficiently requires means for selectivity estimation. Previous works on selectivity estimation, however, suffer from inherent drawbacks, which are reflected in efficiency and effectiveness issues. We propose a novel estimation approach, TopGuess, which exploits topic models as data synopsis. This way, we capture correlations between structured and unstructured data in a uniform and scalable manner. We study TopGuess in a theoretical analysis and show it to guarantee a linear space complexity w.r.t. text data size. Further, we show selectivity estimation time complexity to be independent from the synopsis size. In experiments on real-world data, TopGuess allowed for great improvements in estimation accuracy, without sacrificing efficiency.
+ |Download=Awa-topguess-selectivity-estimation-tr.pdf‎.pdf
  |Projekt=IZEUS
  |Forschungsgruppe=Wissensmanagement

Current revision as of 15 January 2014, 16:43


Topic-based Selectivity Estimation for Hybrid Queries over RDF Graphs




Published: May 2013
Institution: Institute AIFB, KIT
Place of publication: Karlsruhe
Archive number: 3039




Abstract
Many RDF descriptions today are text-rich: besides structured data they also feature much unstructured text. Text-rich RDF data is frequently queried via predicates matching structured data, combined with string predicates for textual constraints (hybrid queries). Evaluating hybrid queries efficiently requires means for selectivity estimation. Previous works on selectivity estimation, however, suffer from inherent drawbacks, which are reflected in efficiency and effectiveness issues. We propose a novel estimation approach, TopGuess, which exploits topic models as data synopsis. This way, we capture correlations between structured and unstructured data in a uniform and scalable manner. We study TopGuess in a theoretical analysis and show it to guarantee a linear space complexity w.r.t. text data size. Further, we show selectivity estimation time complexity to be independent from the synopsis size. In experiments on real-world data, TopGuess allowed for great improvements in estimation accuracy, without sacrificing efficiency.
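
To make the notion of a hybrid query and its selectivity concrete, the following is a minimal sketch and not the TopGuess estimator itself. It computes the exact selectivity of a query that combines one structured predicate with one string predicate over a tiny, made-up set of entity descriptions, and compares it with an estimate that assumes the two predicates are independent. The entity data, attribute names, and helper functions are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: exact selectivity of a hybrid query over a tiny,
# made-up data set. This does NOT implement TopGuess or any topic model.

# Each hypothetical entity has one structured attribute and one text attribute.
entities = [
    {"type": "Movie", "comment": "a romantic comedy set in Rome"},
    {"type": "Movie", "comment": "a war drama set in Berlin"},
    {"type": "Movie", "comment": "a romantic drama set in Paris"},
    {"type": "Person", "comment": "an actor known for comedy roles"},
    {"type": "Person", "comment": "a director of war films"},
]


def selectivity(predicate):
    """Fraction of entities for which the predicate holds."""
    return sum(1 for e in entities if predicate(e)) / len(entities)


def is_movie(e):
    # Structured predicate: match on the type attribute.
    return e["type"] == "Movie"


def mentions_romantic(e):
    # String predicate: keyword containment in the text attribute.
    return "romantic" in e["comment"].split()


def hybrid(e):
    # Hybrid query: both predicates must hold for the same entity.
    return is_movie(e) and mentions_romantic(e)


# Exact selectivity of the hybrid query.
true_sel = selectivity(hybrid)

# Estimate under an independence assumption between the two predicates.
independent_sel = selectivity(is_movie) * selectivity(mentions_romantic)

print(true_sel, independent_sel)
```

In this toy data the keyword occurs only in movie descriptions, so the independence-based estimate falls below the exact value. Correlations of this kind between structured and unstructured data are what the abstract says the synopsis is meant to capture.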

Download: Media:Awa-topguess-selectivity-estimation-tr.pdf‎.pdf

Project

IZEUS



Research group

Wissensmanagement (Knowledge Management)


Research area

Semantic Search