
Inproceedings3756



Bibliometric-Enhanced arXiv: A Data Set for Paper-Based and Citation-Based Tasks





Published: April 2019

Book title: Proceedings of the 8th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) co-located with the 41st European Conference on Information Retrieval (ECIR 2019)
Pages: 14–26
Publisher: CEUR-WS

Peer-reviewed publication
Note: https://dblp.org/rec/bibtex/conf/ecir/SaierF19

BibTeX


Abstract
In recent years, several research paper-based tasks, such as paper recommendation, and citation-based tasks, such as citation recommendation and citation context-based document summarization, have been proposed. The evaluation of approaches to such tasks, and their applicability in real-world scenarios, depends heavily on the data set used. However, existing data sets are limited in several regards. In this paper, we propose a new data set based on all publications from all scientific fields available on arXiv.org. In addition to the papers' plain text, the data set provides in-text citations annotated via global identifiers. As far as possible, cited publications were linked to the Microsoft Academic Graph. Our data set consists of over one million documents and 29.2 million citation contexts. The data set, which is made freely available for research purposes, can not only enhance the future evaluation of research paper-based and citation context-based approaches but also serve as a basis for novel ways of analyzing papers.

Download: Media:arXiv_Dataset_BIR2019.pdf
Further information: Link


Linked datasets

UnarXive


Research group

Web Science


Research area

Information Retrieval, Information Extraction, Natural Language Processing