Cyrillic Script Publication Metadata Extraction


Data for training and evaluating metadata extraction models, based on about 15,000 Cyrillic-script publications


Contact person: Tarek Saier

https://zenodo.org/record/4708696

Research group: Web Science

Publication date: 2021/04/22


Description

Data for training and evaluating sequence labeling models for metadata extraction based on 15,553 Cyrillic script language papers spanning 27 years and three languages. For each paper, ground truth sequence labeling output is provided in TEI format and as annotated plain text.
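As an illustration of how TEI-formatted ground truth could be read, here is a minimal sketch using Python's standard library. The sample document, its element layout, and the function name `extract_metadata` are assumptions made for this example; the actual files in the dataset may be structured differently, so check them before relying on these paths.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; assumed to be used by the ground truth files.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI header, invented for this sketch (not taken from the dataset).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Пример заголовка статьи</title>
      </titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author><persName><forename>Иван</forename><surname>Иванов</surname></persName></author>
            <author><persName><forename>Пётр</forename><surname>Петров</surname></persName></author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def extract_metadata(tei_xml: str) -> dict:
    """Return the title and author names found in a TEI header string."""
    root = ET.fromstring(tei_xml)

    # Title: first <title> inside <titleStmt>, if present.
    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    title = title_el.text.strip() if title_el is not None and title_el.text else None

    # Authors: join forename and surname for every <author>/<persName>.
    authors = []
    for pers in root.findall(".//tei:author/tei:persName", TEI_NS):
        parts = [
            pers.findtext("tei:forename", default="", namespaces=TEI_NS),
            pers.findtext("tei:surname", default="", namespaces=TEI_NS),
        ]
        authors.append(" ".join(p.strip() for p in parts if p and p.strip()))

    return {"title": title, "authors": authors}


metadata = extract_metadata(SAMPLE_TEI)
print(metadata["title"])
print(metadata["authors"])
```

The namespace mapping is passed explicitly because TEI elements live in a default namespace, and unqualified paths would not match them.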


People involved
Johan Krause, Igor Shapiro, Tarek Saier, Michael Färber


Publications

inproceedings
Igor Shapiro, Tarek Saier, Michael Färber
Sequence Labeling for Citation Field Extraction from Cyrillic Script References
Proceedings of the AAAI Workshop on Scientific Document Understanding (SDU@AAAI'22), ACM


Johan Krause, Igor Shapiro, Tarek Saier, Michael Färber
Bootstrapping Multilingual Metadata Extraction: A Showcase in Cyrillic
Proceedings of the Second Workshop on Scholarly Document Processing, CEUR-WS



Projects