Introduction

Editorial for the Second Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics (CLBib2017)

Iana Atanassova, Centre Tesnière - CRIT, University of Bourgogne Franche-Comté, Besançon, France
Marc Bertin, ELICO Laboratory, University of Lyon, Lyon, France
Philipp Mayr, GESIS - Leibniz-Institute for the Social Sciences, Cologne, Germany

2017

Scope and Motivation

The CLBib workshops aim to bring together researchers in bibliometrics and computational linguistics in order to study the ways bibliometrics can benefit from large-scale text analytics and sense mining of scientific papers, thus exploring the interdisciplinarity of Bibliometrics and Natural Language Processing. Working with full text allows us to go beyond metadata used in bibliometrics. Full text offers a new field of investigation, where the major problems arise around the organization and structure of text, the extraction of information and its representation on the level of metadata. Furthermore, the study of contexts around in-text citations offers new perspectives related to the semantic dimension of citations. The analyses of citation contexts and the semantic categorization of publications will allow us to rethink co-citation networks, bibliographic coupling and other bibliometric techniques.

The first edition of this workshop¹, co-located with the International Society of Scientometrics and Informetrics Conference (ISSI) in 2015, attracted more than 70 participants and six full paper contributions, showing a large interest in these topics in the community. From a technical point of view, during the first edition of the workshop, the efforts to provide articles in machine-readable formats and the rise of Open Access publishing had resulted in a number of standardized formats for scientific papers, full-text datasets and corpora for research experiments, and a focus on a number of open-source tools for versatile text processing.

The goal of this second edition of the CLBib workshop, co-located with the ISSI conference 2017, is to continue to encourage the collaboration between these two domains and to answer questions like: How can we enhance author network analysis and Bibliometrics using data obtained by text analytics? What insights can NLP provide on the structure of scientific writing, on citation networks, and on in-text citation analysis? Natural Language Processing and Bibliometrics meet again in this second workshop in a context where Open Access is at the heart of exchanges between scientists and publishers and raises many economic and ethical issues, but also new research problems through the access to articles in full text. Indeed, the possibility of enriching metadata currently used in bibliometrics with information from the text is an essential step towards building the tools of tomorrow.

As the CLBib 2017 workshop was held in China, at Wuhan University, the discussions raised important questions not only around the processing of scientific papers but also on the need to take into account the multilingual aspect of scientific production. Even if English is essential today on the international stage, national-level publications can also be rich in information and relevant for bibliometric studies. The linguistic aspect, which is increasingly present at the ISSI conference, must be taken into consideration. It highlights the importance of this workshop series and the growing interest in Natural Language Processing, both among bibliometricians and in other communities.

Overview of the papers

¹ See the proceedings of the first edition of the workshop: http://ceur-ws.org/Vol-1384/, [1].

² https://easychair.org/cfp/CLBib2017

... Gu Dongxiao and Shi Jin [7]. The methods used include social network analysis, keyword and co-word analysis, and clustering. Starting from the hypothesis that authors who work on similar topics and keywords could potentially be contributors, this paper provides a method for author similarity analysis.
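To make the idea of keyword-based author similarity concrete, the following is a minimal sketch and not the method of [7]. It assumes that each author is represented by a set of keywords, that similarity is measured with the Jaccard index (an assumption chosen for simplicity), and that pairs above a hypothetical threshold who have not yet co-authored are reported as candidates. The function names, the threshold and the toy data are all illustrative.

```python
from itertools import combinations


def keyword_coupling(author_keywords):
    """Return {(a, b): jaccard} for author pairs sharing at least one keyword.

    Jaccard similarity is an illustrative choice, not taken from the source.
    """
    scores = {}
    for a, b in combinations(sorted(author_keywords), 2):
        ka, kb = set(author_keywords[a]), set(author_keywords[b])
        shared = ka & kb
        if shared:
            scores[(a, b)] = len(shared) / len(ka | kb)
    return scores


def candidate_collaborators(author_keywords, coauthored, threshold=0.3):
    """List pairs with similarity >= threshold that have not co-authored yet.

    `coauthored` is a set of frozensets, one per existing co-author pair.
    The result is sorted by decreasing similarity, then by author names.
    """
    scores = keyword_coupling(author_keywords)
    pairs = [
        (pair, score)
        for pair, score in scores.items()
        if score >= threshold and frozenset(pair) not in coauthored
    ]
    return sorted(pairs, key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: author identifiers and their keywords.
author_keywords = {
    "A": ["citation analysis", "text mining", "bibliometrics"],
    "B": ["text mining", "bibliometrics", "clustering"],
    "C": ["case-based reasoning"],
    "D": ["bibliometrics", "citation analysis"],
}
# A and D already share a publication, so they are not suggested.
coauthored = {frozenset(("A", "D"))}

for (a, b), score in candidate_collaborators(author_keywords, coauthored):
    print(a, b, round(score, 2))
```

In a fuller pipeline, the resulting pairs could serve as weighted edges of an author network, on which social network analysis or clustering would then be applied.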

Outlook

Interest in this interdisciplinary research has been growing in recent years (see e.g. the workshops of BIRNDL - "Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries" and WoSP - "Workshop on Mining Scientific Publications"), and the series of CLBib workshops has so far shown that both fields, Natural Language Processing and Bibliometrics, can benefit from addressing the problem of full-text processing of papers.

As a result of this workshop series, a new Research Topic "Mining Scientific Papers: NLP-enhanced Bibliometrics"³ has been launched as part of the "Frontiers in Research Metrics and Analytics" journal published in Open Access. We intend to continue the effort to bring both communities together and foster the development of semantic technologies dedicated to Bibliometrics and Scientometrics.

Acknowledgements

Part of this research has been funded by the FEDER (Fonds européen de développement régional) and selected by the French-Swiss programme Interreg V: Webso+ project⁴.

³ https://www.frontiersin.org/research-topics/7043/mining-scientific-papers-nlp-enhanced-bibliometrics

⁴ http://tesniere.univ-fcomte.fr/projet-webso/

References

[1] Atanassova, I., Bertin, M., Mayr, P.: Editorial for the first workshop on mining scientific papers: Computational linguistics and bibliometrics. In: Proceedings of the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics co-located with 15th International Society of Scientometrics and Informetrics Conference (ISSI 2015), Istanbul, Turkey, June 29, 2015. pp. 1-4 (2015), http://ceur-ws.org/Vol-1384/editorial.pdf

[2] Bertin, M., Atanassova, I., Larivière, V., Gingras, Y.: The invariant distribution of references in scientific articles. Journal of the Association for Information Science and Technology (JASIST) 67(1), 164-177 (2016), http://dx.doi.org/10.1002/asi.23367

[3] Chen, C.: CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology 57(3), 359-377 (2006), http://dx.doi.org/10.1002/asi.20317

[4] Gu, D., Liu, B., Bichindaritz, I., Liang, C.: Temporal evolution, research themes, and emerging trends in case-based reasoning literature. In: Atanassova, I., Bertin, M., Mayr, P. (eds.) 2nd Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics co-located with 16th International Conference on Scientometrics and Informetrics (ISSI 2017). CEUR-WS.org (2017)

[5] He, J., Chen, C.: Understanding the changing roles of scientific publications via citation embeddings. In: Atanassova, I., Bertin, M., Mayr, P. (eds.) 2nd Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics co-located with 16th International Conference on Scientometrics and Informetrics (ISSI 2017). CEUR-WS.org (2017)

[6] Mayr, P., Scharnhorst, A.: Combining bibliometrics and information retrieval: preface. Scientometrics 102(3), 2191-2192 (Mar 2015), https://doi.org/10.1007/s11192-015-1529-2

[7] Peng, Y., Gu, D., Jin, S.: Mining the potential collaborative relationships based on the author keyword coupling analysis and social network analysis. In: Atanassova, I., Bertin, M., Mayr, P. (eds.) 2nd Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics co-located with 16th International Conference on Scientometrics and Informetrics (ISSI 2017). CEUR-WS.org (2017)

[8] Shotton, D.: CiTO, the citation typing ontology. Journal of Biomedical Semantics 1(1), S6 (Jun 2010), https://doi.org/10.1186/2041-1480-1-S1-S6

[9] Wang, J., Ma, S., Zhang, C.: CitationAS: A summary generation tool based on clustering of retrieved citation content. In: Atanassova, I., Bertin, M., Mayr, P. (eds.) 2nd Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics co-located with 16th International Conference on Scientometrics and Informetrics (ISSI 2017). CEUR-WS.org (2017)