A Large-Scale Analysis of Cross-Lingual Citations in English Papers
Book title: Proceedings of the 22nd International Conference on Asia-Pacific Digital Libraries
Citation data is an important source of insight into the scholarly discourse and the reception of publications. Outcomes of citation analyses and the applicability of citation-based machine learning approaches depend heavily on the completeness of citation data. One particular shortcoming of scholarly data nowadays is language coverage. That is, non-English publications are often not included in data sets, or language metadata is not available. While national citation indices exist, these are often not interconnected with other data sets. Because of this, citations between publications of differing languages (cross-lingual citations) have only been studied to a very limited degree. In this paper, we present an analysis of cross-lingual citations based on one million English papers, covering three scientific disciplines and a time span of 27 years. Our results unveil differences between languages and disciplines, show developments over time, and give insight into the impact of cross-lingual citations on scholarly data mining as well as on the publications that contain them. To facilitate further analyses, we make our collected data and code for analysis publicly available.