Published: 2013 Oktober
Institution: Institut AIFB, KIT
Erscheinungsort / Ort: Karlsruhe
In recent years, the amount of RDF data on the Web has drastically increased. For an effective search over such a large Web of data, ranking of results is crucial. To allow efficient query processing of ranked queries, top-k join processing has been proposed, which aims at computing k top-ranked results, without complete result materialization. However, Web search poses novel challenges. Most notably, users are frequently willing to sacrifice result accuracy in favor of result computation time. Thus, there is a strong need for approximation of the top-k results. Previous work on approximate top-k processing, however, is not applicable for the join top-k setting – it solely targets the selection top-k problem. Further, existing approaches require offline computed score statistics to be available. Unfortunately, for many important kinds of Web search queries, e.g., keyword queries or queries with geographical con- straints, one has no score information before runtime. That is, these queries commonly employ query-/user-dependent ranking. In this paper, we address these shortcomings and propose a novel approximate top-k join framework, well-suited for Web search queries and ranking. For this, all necessary score statistics are learned via a pay-as-you-go training at runtime. We study our approach in a theoretical manner, and show its efficiency as well as effectiveness. Further, we conducted experiments on benchmarks comprising synthetic as well as real-world Web data/queries. Our results are very promising, as we could achieve up to 90% performance gains, while maintaining a high level of precision/recall.