Version of 26 September 2019, 09:24

Bibliometric-Enhanced arXiv: A Data Set for Paper-Based and Citation-Based Tasks

Published: April 2019

Book title: Proceedings of the 8th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) co-located with the 41st European Conference on Information Retrieval (ECIR 2019)
Pages: 14–26
Publisher: CEUR-WS

Refereed publication


Abstract
In recent years, several research paper-based tasks, such as paper recommendation, and citation-based tasks, such as citation recommendation and citation context-based document summarization, have been proposed. The evaluation of approaches to such tasks, and their applicability in real-world scenarios, depend heavily on the data set used. However, existing data sets are limited in several regards. In this paper, we propose a new data set based on all publications from all scientific fields available on arXiv.org. In addition to the papers' plain text, the data set provides in-text citations annotated with global identifiers. As far as possible, cited publications were linked to the Microsoft Academic Graph. Our data set consists of over one million documents and 29.2 million citation contexts. The data set, which is made freely available for research purposes, can not only enhance the future evaluation of research paper-based and citation context-based approaches but also serve as a basis for novel ideas to analyze papers.

Further information: Link

Linked data sets

UnarXive

Research group

Web Science

Research area

Information Retrieval, Information Extraction, Natural Language Processing
