Date: July 1, 2010
KIT, Faculty of Economics (Fakultät für Wirtschaftswissenschaften)
Place of publication: Karlsruhe
Referee(s): Rudi Studer
In this thesis, we elaborate on the conceptualization, objectives, problems, and challenges of Semantic Web search. We first introduce a general model for search, which allows us to compare the different notions that exist in the literature. Specifically, the task of data retrieval can be distinguished from that of document retrieval. Independent of the task and the objects to be retrieved, there is one category of systems, called semantic search systems, that employ a semantic model for searching. Semantic Web search can be considered a special kind of semantic search that focuses on the task of data retrieval on the Semantic Web. The main objective of Semantic Web search is to address more complex information needs. This amounts to delivering more relevant results for more complex queries; these results might range from precise answers in the form of facts, to complex results in the form of entities and their relations, up to integrated units of content that combine heterogeneous data from different sources on the Web. The challenges that lie ahead include dealing with the large and increasing volume of data on the Semantic Web, its heterogeneity, and its complexity. In fact, complexity in this scenario has many facets. From the system point of view, complexity poses additional requirements for data management and processing. Complexity also imposes an additional burden on the user. An effective Semantic Web search solution must not only be efficient, be scalable, and deliver high-quality results; users must also be able to exploit it.
Based on this framework of Semantic Web search, we survey the state of the art and briefly present and position the specific contributions we made over the last four years to address some of these problems and challenges. We look at the different tasks that are required for Semantic Web search. In particular, we study concepts and techniques that have been proposed for crawling, storing, indexing, querying, and ranking resources on the Semantic Web. Since Semantic Web search necessarily involves the use of multiple data sources, we also discuss solutions for federated query processing and data integration that aim to deal with the multi-source search problem.
Most of the contributions presented in the survey part have been implemented and integrated into an approach called SemSearchPro, one of the first solutions for large-scale Semantic Web search. This thesis focuses on the presentation of this approach. SemSearchPro addresses the challenges of data volume, heterogeneity, and complexity. It uses an index whose design is inspired by Web search technologies that have proven to scale to the Web scenario. In fact, the same data structure that is used for managing Web documents is used for managing and indexing Semantic Web data. For dealing with heterogeneous data, SemSearchPro leverages the large body of research on database integration to precompute mappings (links) between data sources. These links, stored in a specialized index, are then used to improve the performance of federated query processing, i.e., to combine results from different sources more efficiently. For usability as well as efficiency, SemSearchPro relies heavily on semantic models.
On the one hand, semantic models can be seen as a source of knowledge that can be used to support users in various tasks. In fact, we argue that for usable Semantic Web search, the entire search process, including the steps of query construction, query processing, and result presentation and refinement, has to be taken into account. That is, we have to look beyond the task of processing queries against resources and investigate where else semantics can help users deal with the complex data, queries, and results involved in Semantic Web search. SemSearchPro implements the concept of process-oriented Semantic Web search to accomplish this. It uses semantic models to translate simple user keywords into possibly complex structured queries, to automatically select the appropriate widgets for the given results, and to generate facets with which the user can refine the current results.
On the other hand, semantic models are treated as summary models. SemSearchPro shows that operating mainly on the summary models instead of the actual data can improve the performance of several tasks. This thesis elaborates in detail on how these semantic models (summary models) can be exploited to translate keywords into structured queries and to process complex structured queries more efficiently.
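The idea of translating keywords into structured queries by operating on a summary model rather than on the data can be illustrated with a minimal sketch. It is not the algorithm used in SemSearchPro, which the abstract does not specify. The summary graph, the keyword-to-class mapping, and the function names below are hypothetical, and a plain breadth-first search stands in for whatever exploration strategy the thesis actually uses.

```python
from collections import deque

# Hypothetical summary graph: one edge per (subject class, property, object class).
# It describes the schema-level structure only, not the individual data items.
SUMMARY_EDGES = [
    ("Person", "worksAt", "Organization"),
    ("Organization", "locatedIn", "City"),
    ("Person", "authorOf", "Publication"),
]

# Hypothetical mapping from user keywords to classes in the summary graph.
KEYWORD_TO_CLASS = {
    "researcher": "Person",
    "city": "City",
    "paper": "Publication",
}


def find_path(start, goal, edges):
    """Return a shortest list of summary edges connecting two classes.

    Edges are traversed in both directions. Returns [] if start equals
    goal and None if the classes are not connected.
    """
    if start == goal:
        return []
    adjacency = {}
    for edge in edges:
        subject_class, _, object_class = edge
        adjacency.setdefault(subject_class, []).append((object_class, edge))
        adjacency.setdefault(object_class, []).append((subject_class, edge))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        for neighbor, edge in adjacency.get(node, []):
            if neighbor in seen:
                continue
            if neighbor == goal:
                return path + [edge]
            seen.add(neighbor)
            queue.append((neighbor, path + [edge]))
    return None


def translate(keyword_a, keyword_b, edges=SUMMARY_EDGES, mapping=KEYWORD_TO_CLASS):
    """Translate two keywords into a list of triple patterns.

    Each pattern is a (subject variable, property, object variable) tuple,
    with one variable per class on the connecting path. Returns None if a
    keyword is unknown or the classes cannot be connected.
    """
    class_a = mapping.get(keyword_a.lower())
    class_b = mapping.get(keyword_b.lower())
    if class_a is None or class_b is None:
        return None
    path = find_path(class_a, class_b, edges)
    if path is None:
        return None
    return [("?" + s.lower(), p, "?" + o.lower()) for s, p, o in path]


if __name__ == "__main__":
    for pattern in translate("researcher", "city"):
        print(" ".join(pattern), ".")
```

Because the search runs over class-level edges only, its cost depends on the size of the summary graph and not on the number of data items. The resulting triple patterns would still have to be evaluated against the actual data to produce answers.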