
Preface to the Understanding Literature References in Academic Full Text workshop at JCDL 2022

Anastasiia Iurshina

Anastasiia.Iurshina@ipvs.uni-stuttgart.de

Muhammad Ahsan Shahid

Ahsan.Shahid@gesis.org

Tobias Backes

Tobias.Backes@gesis.org

Philipp Mayr

Philipp.Mayr@gesis.org

Steffen Staab

Steffen.Staab@ipvs.uni-stuttgart.de

2022

This preface describes the Understanding Literature References in Academic Full Text (ULITE) workshop. ULITE was held as a virtual event on June 24, 2022. It was co-located with the Joint Conference on Digital Libraries (JCDL 2022).

CEUR Workshop Proceedings

1. Introduction

The goal of the ULITE workshop at JCDL 2022 is to engage communities interested in the broad topic of literature reference understanding and the automatic processing of scientific full-text publications. The workshop focuses on working with open infrastructures and tools and on offering the extracted information as open data for reuse. Our aim is to expose people from each community to the work of the other and to foster fruitful interaction across communities.

2.1. Keynote

We had one keynote speaker:

Silvio Peroni (University of Bologna, Italy), OpenCitations: a short introduction. In this paper, Silvio gives a brief history of open citations and describes their main characteristics and use in the context of OpenCitations, a scholarly infrastructure organisation dedicated to open scholarship and the publication of open bibliographic and citation data using Semantic Web technologies.

2.2. Research papers

Four research papers were presented at ULITE.

• Frederik Arnold and Robert Jäschke: A Game with Complex Rules: Literature References in Literary Studies

• Christian Boulanger and Anastasiia Iurshina: Extracting bibliographic references from footnotes with EXcite-docker

• Bastian Birkeneder, Philipp Aufenvenne, Christian Haase, Philipp Mayr and Malte Steinbrink: Extracting literature references in German Speaking Geography – the GEOcite project

• Tarek Saier, Meng Luan and Michael Färber: A Blocking-Based Approach to Enhance Large-Scale Reference Linking

2.3. Invited talks

Four invited talks were given; papers were submitted for two of them:

• Arcangelo Massari and Ivan Heibi: How to structure citations data and bibliographic metadata in the OpenCitations accepted format

• Silvia Eunice Gutiérrez De la Torre, Julián Equihua, Andreas Niekler and Manuel Burghardt: Into the bibliography jungle: using random forests to predict dissertations’ reference section

The two talks without papers:

• Bikash Joshi: “Inline Citation Extraction from Scientific Manuscripts”

• Swati Sanagar: “Finest Tool for Bibliography Reference Matching to Article and Deduplication”

3. Workshop outcome

The main outcome of the joint discussion among participants is the decision to join forces in creating a multi-domain gold standard dataset for literature reference extraction and segmentation. The lack of annotated data is clearly one of the most serious limitations on progress in automatic reference extraction and segmentation. Because annotating data is a very time-consuming and laborious process, it is difficult for a single team to obtain enough of it. By combining several smaller datasets, however, we can create one of substantial size. Beyond size, because the participants come from very different domains (law, literature, geography, etc.), the annotated articles would also be very diverse in format.