Bibliometric-Enhanced arXiv: A Data Set for Paper-Based and Citation-Based Tasks

Published: April 2019

Book title: Proceedings of the 8th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) co-located with the 41st European Conference on Information Retrieval (ECIR 2019)
Pages: 14–26
Publisher: CEUR-WS

Peer-reviewed publication


In recent years, several research paper-based tasks, such as paper recommendation, and citation-based tasks, such as citation recommendation and citation context-based document summarization, have been proposed. The evaluation of approaches to such tasks and their applicability in real-world scenarios depend heavily on the data set used. However, existing data sets are limited in several regards. In this paper, we propose a new data set based on all publications from all scientific fields available on arXiv. Apart from providing the papers' plain text, in-text citations were annotated via global identifiers. As far as possible, cited publications were linked to the Microsoft Academic Graph. Our data set consists of over one million documents and 29.2 million citation contexts. The data set, which is made freely available for research purposes, can not only enhance the future evaluation of research paper-based and citation context-based approaches but also serve as a basis for novel ideas to analyze papers.

Download: Media:arXiv_Dataset_BIR2019.pdf
Further information: Link

Linked datasets



Web Science


Information Retrieval, Information Extraction, Natural Language Processing