Relational Schemata for Distributed SPARQL Query Processing

Published: 2019

Book title: Proceedings of the International Workshop on Semantic Big Data (SBD@SIGMOD'19)
Publisher: ACM

Refereed publication


To benefit from mature database technology, RDF stores are built on top of relational databases and SPARQL queries are mapped into SQL. Using a shared-nothing computer cluster is one way to achieve scalability, since query processing over large RDF datasets can then be carried out in a distributed fashion. To this end, the paper examines the impact of relational schema design when queries are mapped into Apache Spark SQL. The designs considered are a single triple table, a set of tables resulting from partitioning by predicate, a single wide table covering all properties, and a set of tables based on the application model specification, called the domain-dependent schema. For each approach, the rows of the corresponding tables are stored in the distributed file system HDFS using the columnar storage format Parquet. Experiments using standard benchmarks demonstrate that the single wide property table approach, despite its simplicity, is superior to the other approaches. Further experiments demonstrate that this single-table approach remains attractive even when repartitioning by key (RDF subject) is applied before executing queries.
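
The difference between three of the schema designs named in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it uses Python's built-in sqlite3 module as a single-machine stand-in for Apache Spark SQL over Parquet, and the sample triples, table names (triples, vp_name, vp_age, wide) and the query are all hypothetical. It also assumes every property is single-valued, and it leaves out the domain-dependent schema, which depends on an application model not given here.

```python
import sqlite3

# Hypothetical toy RDF triples (subject, predicate, object); not from the paper.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "age", "30"),
    ("bob", "name", "Bob"),
    ("bob", "age", "25"),
    ("carol", "name", "Carol"),
]

# SQL translations of the SPARQL pattern { ?x name ?n . ?x age ?a },
# one per schema design.
QUERIES = {
    # Single triple table: one self-join per additional triple pattern.
    "triple_table": (
        "SELECT t1.o, t2.o FROM triples t1 JOIN triples t2 ON t1.s = t2.s "
        "WHERE t1.p = 'name' AND t2.p = 'age' ORDER BY t1.o"
    ),
    # Partitioning by predicate: join the two per-predicate tables.
    "predicate_partitioned": (
        'SELECT n.o, a.o FROM "vp_name" n JOIN "vp_age" a ON n.s = a.s '
        "ORDER BY n.o"
    ),
    # Single wide property table: no join, only NULL filters.
    "wide_table": (
        'SELECT "name", "age" FROM wide '
        'WHERE "name" IS NOT NULL AND "age" IS NOT NULL ORDER BY "name"'
    ),
}


def build_schemas(conn, triples):
    """Load the same triples into three different relational layouts."""
    cur = conn.cursor()
    predicates = sorted({p for _, p, _ in triples})
    subjects = sorted({s for s, _, _ in triples})

    # 1) Single triple table.
    cur.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    cur.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)

    # 2) One (subject, object) table per predicate.
    for p in predicates:
        cur.execute(f'CREATE TABLE "vp_{p}" (s TEXT, o TEXT)')
        cur.execute(
            f'INSERT INTO "vp_{p}" SELECT s, o FROM triples WHERE p = ?', (p,)
        )

    # 3) Wide property table: one row per subject, one column per predicate.
    #    Assumes single-valued properties; missing values stay NULL.
    cols = ", ".join(f'"{p}" TEXT' for p in predicates)
    cur.execute(f"CREATE TABLE wide (s TEXT PRIMARY KEY, {cols})")
    cur.executemany("INSERT INTO wide (s) VALUES (?)", [(s,) for s in subjects])
    for s, p, o in triples:
        cur.execute(f'UPDATE wide SET "{p}" = ? WHERE s = ?', (o, s))
    conn.commit()


def run_all(triples):
    """Run the same logical query against each layout and collect the rows."""
    conn = sqlite3.connect(":memory:")
    try:
        build_schemas(conn, triples)
        return {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
    finally:
        conn.close()


if __name__ == "__main__":
    for name, rows in run_all(TRIPLES).items():
        print(name, rows)
```

All three queries return the same rows. What changes is the query shape: the triple table needs a self-join, the predicate-partitioned layout needs a join between two smaller tables, and the wide table answers the pattern with a single scan. The sketch says nothing about performance on a cluster, which is what the paper's experiments evaluate.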

Download: Media:RelationalSchemata_SBD2019.pdf
Further information: Link
DOI Link: 10.1145/3323878.3325804


Web Science


Distributed Algorithms, Semantic Web