Mapping the Trending Topics of Bibliometric-enhanced Information Retrieval

Francisco Bolaños
Universidad Espíritu Santo, Km 2.5 vía a Samborondón, Ecuador
EMAIL: fcobolanos@uees.edu.ec

BIR 2022: 12th International Workshop on Bibliometric-enhanced Information Retrieval at ECIR 2022, April 10, 2022, hybrid. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).

Abstract

This paper maps the trending topics of Bibliometric-enhanced Information Retrieval (BIR) based on co-word network analysis and a strategic diagram. The bibliographic dataset was gathered from proceedings and special issues from 2014 to 2021, yielding a final corpus of 227 papers. The units of analysis were titles, abstracts, and keywords. In the co-word network, we found five communities: Scientific Summarization Shared Tasks (SSST), Information Retrieval (IR), Bibliometrics and Science Mapping (BSM), Citation Context Analysis (CCA), and Mathematical Citation Context Analysis (MCCA). The first and the fourth communities rely heavily on NLP and Computational Linguistics. The strategic map identified SSST, CCA, and IR as BIR's core themes. Furthermore, language models were recognized as a hot topic, used mostly in scientific summarization. In the beginning, BIR's focus was on bibliometrics and information retrieval, but in recent years, with the developments in NLP, other topics such as scientific fact-checking, argumentation mining, information extraction, and scientific search based on question answering are being studied.

Keywords

Bibliometric-enhanced Information Retrieval, Information Retrieval, Bibliometrics and Science Mapping, Citation Context Analysis, Mathematical Citation Context Analysis, Language Models

1. Introduction

Searching for scientific information is not a trivial task [1] due to the vast amount of literature, the lack of standardized vocabulary, abstract reporting formats, gold-standard selection bias, process automation, and output generalizability [2]. Bibliometric-enhanced Information Retrieval (BIR) aims to fill this gap by exploiting the synergies between bibliometrics and IR and related fields such as Computational Linguistics (CL) and NLP. The main purpose of BIR is to bring the bibliometrics and IR communities together around possible links and overlaps in their research fields. The first BIR workshop examined how statistical models such as Bradfordizing or science mapping analysis (co-authorship) can improve IR [3]. In the second BIR, the research community published on citation context (citance) and complementary search techniques (stratagem, article recommendation, polyrepresentation) [4]. The third and fourth BIR workshops aimed to bring BIR into academic search engine interfaces, considering both scholarly and industrial researchers' perspectives [5, 6]. In the fifth BIR, IR systems, together with how scientific knowledge is created, communicated, and used, were part of the published manuscripts [7]. The sixth BIR focused on the search and recommendation process [8]. The last two BIRs studied expert finding and ranking models, citations, learning to rank and evaluation [9], and a variety of topics such as NLP (NER, language models) for information extraction and related work recommendation, to name a few [10].
BIRNDL was launched as an initiative to study how NLP, IR, scientometrics, and recommendation techniques contribute to the understanding, analysis, and retrieval of scholarly documents at scale, mainly in digital libraries. The first BIRNDL included a variety of the aforementioned topics as well as the CL-SciSumm Shared Task. CL-SciSumm challenges participants to fulfill tasks related to scientific summarization based on a curated CL corpus [11]. The second BIRNDL produced scholarly documents in the areas of paper recommendation, annotation of research papers, automatic paper classification, collaboration patterns, bibliographic database coverage, user behavior, and the CL-SciSumm Shared Task [12]. The last BIRNDL dug into scholarly search, retrieval, and user experience in digital libraries or related sources based on bibliometrics, deep learning, or NLP, and the CL-SciSumm Shared Task [13]. Since 2020, BIRNDL has been replaced by the Scholarly Document Processing (SDP) workshop, which considers wider sources, not only digital libraries. The first SDP examined end uses of the scholarly literature; challenges associated with automated understanding, generation, or question answering; and three shared tasks: CL-SciSumm, CL-LaySumm, and LongSumm [14]. Finally, regarding proceedings, CLBIB highlighted enhancements in science mapping based on text analysis, and NLP contributions to the structure of scientific writing, citation networks, and in-text citation analysis [15].

With regard to special issues, most of the papers are extended versions of the aforementioned proceedings, and a few belong to open calls. Scientometrics has three special issues. The first portrayed the overlap between bibliometrics, scientometrics, informetrics, and IR [16]. The second covered a variety of papers at the crossroads between bibliometrics and IR [17]. The last depicted bibliometrics/scientometrics with IR and NLP, IR and text mining of scholarly literature, and NLP-oriented papers [18]. Frontiers in Research Metrics and Analytics (FRMA) focused on the implementation of NLP, CL, and bibliometrics in large-scale text analytics of scientific corpora [19]. Finally, the International Journal on Digital Libraries (IJDL) presented two sets of papers, one focusing on the synergies between NLP, bibliometrics, and IR in digital libraries and the other examining the CL-SciSumm Shared Task [20].

BIR is a hot topic, and due to the Covid-19 pandemic its research areas are gaining relevance for the scientific community. To our knowledge, there is no scholarly document that maps BIR's trending topics in terms of emerging themes, declining themes, core themes, and hot themes. The aim of this paper is to map BIR's trending topics based on co-word network analysis and the strategic diagram approach proposed by Cobo et al. [21]. The former describes BIR's conceptual structure, and the latter identifies its trending topics.

2. Method

2.1. Methodology

For the conceptual structure, we implemented co-word analysis, which builds on the idea that a paper's keywords are a true portrayal of its content. Two keywords that appear within the same manuscript represent a thematic connection [22]. Thus, themes are strengthened as more co-occurring keyword pairs emerge [23]. The trending topics followed the methodology proposed by Cobo et al. [21] (a toy sketch of steps 1 and 2 appears after Section 2.2):
1. Detection of the research themes: a network is built on the terms extracted from the documents based on a co-occurrence matrix and a normalization technique. Themes are identified by partitioning the network with a clustering algorithm such as simple centers, single link, complete link, average link, or sum link [24], yielding sets of strongly related terms which constitute the main themes.

2. Visualizing research themes and thematic networks: for research themes, a two-dimensional strategic diagram provides a spatial layout according to centrality and density measures. The former quantifies the external interaction of a network and can be read as the topic's relevance for the research field, whereas the latter assesses the network's internal cohesion and can be interpreted as the theme's degree of development. Themes fall into four groups: (a) motor themes in the upper-right quadrant, which are well developed and relevant for the research field; (b) basic and transversal themes in the lower-right quadrant, which are relevant for the research field but not fully developed; (c) emerging or declining themes in the lower-left quadrant, which are poorly or marginally developed; and (d) highly developed and isolated themes in the upper-left quadrant, which are well developed but not relevant for the research area. Themes are plotted as spheres whose sizes are proportional to the number of documents or to bibliometric performance indicators (see Figure 1a). Thematic networks (network graphs) are labelled with the most significant keyword of the theme or by manual assignment, depending on the chosen clustering algorithm. The size of each term is associated with the number of documents or bibliometric performance indicators, and the thickness of the line between two keywords represents the strength of the normalization technique (see Figure 1b).

3. Performance analysis: it identifies the most relevant and productive themes based on the number of documents or bibliometric performance indicators.

This study was developed in several stages: a) data collection, b) data processing, and c) data visualization. For the conceptual structure, we created an ad-hoc script based on the Tidyverse [25] which cleaned the data and created the co-occurrence matrix. We used the VOSviewer standalone version [26] for data visualization. Regarding the trending topics, we created the csv file required by SciMAT [27] with the ad-hoc script and used SciMAT for data visualization.

2.2. Data Collection

We used the BIR bibliography (https://github.com/PhilippMayr/Bibliometric-enhanced-IR_Bibliography/) as the reference dataset and the work of Cabanac et al. [28] as a checklist, since it gives a clear description of each workshop and special issue. For BIR, BIRNDL, CLBIB, Scientometrics, and IJDL, the metadata was retrieved from Scopus, whereas for FRMA and SDP it was obtained from Dimensions and by manual search, respectively. Bibliometrix [29] allowed us to retrieve the metadata from Scopus and Dimensions. The scholarly documents included in our corpus cover the period 2014-2021. We did not include editorial letters or keynote papers in our final corpus. Furthermore, original papers from proceedings published as extended versions in the special issues were excluded. The initial corpus had 297 papers; after the exclusion process, 227 documents remained (see Appendix A). Most of them belong to proceedings (77.5%) and the rest to journals (22.5%). Figure 2 depicts the PRISMA diagram [30].
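The ad-hoc scripts themselves are not published (the paper's pipeline is Tidyverse- and SciMAT-based); purely as an illustration of methodology steps 1 and 2, the following minimal Python sketch builds keyword co-occurrence counts, normalizes them with the equivalence index (the measure this study uses for SciMAT), and computes Callon-style centrality and density for a candidate theme. The toy documents, the theme, and the 10x/100x scaling variant are illustrative assumptions, not the paper's corpus or code.

from collections import Counter
from itertools import combinations

# Toy corpus: one list of normalized noun phrases per paper.
docs = [
    ["citation context", "bert", "scientific summarization"],
    ["citation context", "citance", "reference paper"],
    ["bert", "scientific summarization", "shared task"],
]

# Step 1: term frequencies and co-occurrence counts.
freq = Counter(t for d in docs for t in set(d))
cooc = Counter()
for d in docs:
    for a, b in combinations(sorted(set(d)), 2):
        cooc[(a, b)] += 1

# Equivalence index: e_ij = c_ij^2 / (c_i * c_j).
eq = {pair: c ** 2 / (freq[pair[0]] * freq[pair[1]]) for pair, c in cooc.items()}

# Step 2: Callon's measures for a theme (a cluster of terms); one common
# variant: centrality = 10 * sum of links to terms outside the theme,
# density = 100 * mean of links among the theme's own terms.
def centrality_density(theme):
    inside = [e for (a, b), e in eq.items() if a in theme and b in theme]
    outside = [e for (a, b), e in eq.items() if (a in theme) != (b in theme)]
    centrality = 10 * sum(outside)
    density = 100 * sum(inside) / len(inside) if inside else 0.0
    return centrality, density

print(centrality_density({"bert", "scientific summarization", "shared task"}))

Each theme's (centrality, density) pair then determines its position in the strategic diagram of step 2.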
2.3. Data Preprocessing

Our units of analysis were titles, abstracts, and keywords. For the titles and abstracts, we extracted noun phrases with spaCy using the en_core_web_lg model; this was executed in an ad-hoc Python script (a toy version of this step is sketched below). The cleaning process was implemented in the ad-hoc Tidyverse-based script. After removing stop words, the corpus had 386 noun phrases. Subsequently, normalization and the deletion of general terms yielded a corpus of 209 noun phrases.
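The extraction script is not included in the paper; the following is a minimal sketch of the noun-phrase step, assuming spaCy and the en_core_web_lg model named above (the helper function, its stop-word trimming, and the example text are illustrative assumptions, not the author's code).

import spacy

# Requires: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")

def noun_phrases(text):
    """Extract lower-cased, lemmatized noun chunks, dropping stop words."""
    phrases = []
    for chunk in nlp(text).noun_chunks:
        tokens = [t for t in chunk if not t.is_stop]  # trim "the", "a", ...
        if tokens:
            phrases.append(" ".join(t.lemma_.lower() for t in tokens))
    return phrases

title = ("Mapping the trending topics of bibliometric-enhanced "
         "information retrieval")
print(noun_phrases(title))

The resulting phrases would then go through the cleaning described above (stop-word removal, normalization, and deletion of general terms).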
2.4. Data Visualization

For the conceptual structure, after many analyses and in alignment with our reading of the 227 papers, we set a minimum term frequency of one and a minimum co-occurrence of one, because this yielded the best results. The co-occurrence matrix was created and transformed into the VOSviewer network file format. The network was normalized with the association strength, which according to van Eck & Waltman [31] is the most suitable normalization measure for co-occurrence data. The network layout used the Visualization of Similarities technique with an attraction of 2 and a repulsion of 0 as parameters. For community detection, we used the Leiden algorithm, and after analyzing many scenarios (network outputs), we chose a resolution parameter of 0.10.

For the trending topics, we used the aforementioned ad-hoc script, so there was no need for term normalization. We based our analysis on a co-occurrence matrix and used the equivalence index for network normalization. The selected clustering algorithm was the simple centers algorithm with a maximum network size of 12 and a minimum network size of 3. The network reduction threshold was one, and the data reduction threshold was two. This algorithm labels each thematic network with its most relevant topic (the most central term). We chose these four parameters after several content analyses of the resulting scenarios (strategic diagram outputs). The sizes of the spheres are proportional to the number of papers. These settings followed the recommendations of Cobo et al. [21].

Figure 1: Examples of a Strategic Map (a) and a Thematic Network (b) [21]

Figure 2: PRISMA diagram of the BIR final corpus, summarized as follows:

Source          Identified   After excluding editorials/keynotes   After excluding extended versions
BIR             95           69                                    60
BIRNDL          82           67                                    55
CLBIB           14           10                                    7
SDP             55           54                                    54
Scientometrics  30           30                                    30
IJDL            14           14                                    14
FRMA            7            7                                     7

Proceedings: 176; Journals: 51; Final dataset: 227.

3. Results

To get a clear picture of the papers contributing to each community of the conceptual structure, we used the ad-hoc Tidyverse-based script to match the words from the VOSviewer map file against the titles, abstracts, and keywords of the 227 scholarly documents. For each community, we retained the papers related to its topic, giving a final corpus of 104 articles. Figure 3 shows the BIR co-word network built on the main component, with 205 nodes and 801 edges (the original network had 209 nodes and 807 edges); the network can also be explored interactively in VOSviewer online. It comprises five communities: Scientific Summarization Shared Tasks (red), Information Retrieval (green), Bibliometrics and Science Mapping (blue), Citation Context Analysis (yellow), and Mathematical Citation Context Analysis (purple).

The community Scientific Summarization Shared Tasks (SSST) has 52 documents: 23 belong to BIRNDL, 22 to SDP, 4 to Scientometrics, and 3 to IJDL. Regarding the terms' positioning, it has 67 terms, of which sentence, cl-scisumm, shared task, reference paper, task 1a, bert, task 1b, and scientific summarization are the most central. The CL-SciSumm Shared Task was launched in 2014 [32] as a pilot task with the main goal of addressing the challenges of scientific summarization, using a dataset of CL papers. It has three subtasks:

• Task 1A: for each citance of the Citing Paper (CP), identify the cited text span in the Reference Paper (RP). It can be a sentence fragment, a full sentence, or consecutive sentences (no more than 5).
• Task 1B: for each cited text span in the RP, identify its facet.
• Task 2: generate a summary of no more than 250 words of the cited text spans of the RP (an optional task).

From 2014 to 2021, diverse heuristic, lexical, and supervised approaches have been implemented, as well as language models such as BERT, SciBERT, and their fine-tuned variants [33] (a toy lexical baseline for Task 1A is sketched after this discussion). In SDP 2020, two new shared tasks were introduced: LaySumm and LongSumm. LaySumm challenges participants to create a summary of about 70-100 words for a non-technical audience, avoiding technical jargon; its corpus is based on full-text articles, abstracts, and author-generated lay summaries in Materials Science, Archaeology, and Hepatology. LongSumm addresses the generation of a 600-word summary, with extractive summaries built from conference video talks and abstractive summaries from blog posts by NLP and ML researchers [34]. Most of the systems implemented BERT, SciBERT, or their fine-tuned variants, and abstractive techniques have been applied since 2020 with the new shared tasks LaySumm and LongSumm. As stated by Jaidka et al. [33], scientific summarization is in the middle of its scientific development, but significant progress has been made since 2014. Chandrasekaran, Feigenblat, Hovy, et al. [34] argued for new solutions to the domain-specific challenges as well as a shift from single-document to multi-document summarization. These directions are aligned with the review by Ibrahim Altmami & El Bachir Menai [35], who also suggested working on benchmark corpora and gold-standard summaries.
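To make Task 1A concrete, here is a toy lexical baseline in the spirit of the early heuristic and lexical systems mentioned above, not any specific participant's submission: it ranks reference-paper sentences by TF-IDF cosine similarity to a citance (scikit-learn is assumed; all sentences are invented).

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy reference-paper (RP) sentences and one citance from a citing paper.
rp_sentences = [
    "We propose a neural model for citation classification.",
    "The corpus contains annotated citation contexts.",
    "Results show a large improvement over the baseline.",
]
citance = "Their neural model improves citation classification over baselines."

# Rank RP sentences by cosine similarity to the citance (Task 1A).
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(rp_sentences + [citance])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
best = scores.argmax()
print(f"Predicted cited text span: {rp_sentences[best]!r} ({scores[best]:.2f})")

More recent systems typically replace the TF-IDF representation with BERT- or SciBERT-based encoders, in line with the trend described above.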
Figure 3: Co-Word Network of BIR

The community Information Retrieval has 19 documents: 9 belong to Scientometrics, 7 to BIR, 2 to SDP, and the rest to BIRNDL. Regarding the terms' positioning, it has 45 terms, of which abstract, title, academic search, query, citation network, cluster, and scientific domain are the most central. The two main topics of this community are search behavior and search techniques. It uses a variety of techniques, such as clustering, semantometrics, and deep learning, which shows that there is no clear trend toward a specific one.

The community Bibliometrics and Science Mapping has 16 documents: 8 belong to BIR, 4 to BIRNDL, 2 to CLBIB, and one each to Scientometrics and IJDL. Regarding the terms' positioning, it has 42 terms, of which digital library, sentence similarity, recommender system, citation linkage, tf-idf, co-citation analysis, and co-citation are the most central. Its scope covers a wide range of topics, such as citation-based and topical relevance ranking, satellite documents in co-citation networks, Author-Journal-Topic modeling based on citation sentences, and bag of works, to name a few. As in the previous community, there is no clear trend in the techniques.

The community Citation Context Analysis has 13 documents: 7 belong to BIR, 3 to FRMA, 2 to BIRNDL, and one to IJDL. Regarding the terms' positioning, it has 34 terms, of which citance, citation context analysis, imrad structure, recurrent neural network, imrad, and reference mining are the most central. In most cases it focuses on the semantic features of the citation context (citance), applying rule-based methods. Few studies implemented machine learning or deep learning techniques, and the authors' limitations are aligned with the literature regarding the lack of annotated corpora, variability in citation styles [36], and citation polarity [37]. Furthermore, there is a need to detect uncertainty and controversy and to create large datasets for scaling [38], as well as to implement advanced NLP techniques for citation classification [39, 40], citation polarity [41], and citation context extraction [42].

The community Mathematical Citation Context Analysis (MCCA) has four documents: 3 belong to BIRNDL and the other to Scientometrics. Regarding the terms' positioning, it has 17 terms, of which word embedding, mathematical information retrieval, mathematical document, and mathematics are the most central. Compared with the NTCIR-11 Math-2 Task [43], which challenged participants to find the best solution for mathematical formula search and retrieval, MCCA implements less novel techniques. Appendix B depicts each community.

Regarding the strategic map, it is composed of 85 papers and has seven themes (see Figure 4). The themes were named according to the content analysis of each thematic network (see Appendix C). There are three motor themes, which represent the core topics of BIR: Scientific Summarization Shared Task (CL-SCISUM), Citation Context Analysis (CITANCE), and Information Retrieval (ACADEMIC RESEARCH). The basic and transversal themes comprise one topic, Language Models (BERT). The highly developed and isolated themes comprise one topic, Mathematical Citation Context Analysis (MATHEMATICAL-DOCUMENT). The emerging or declining themes comprise two topics: Recommender Systems and Science Mapping (RECOMMENDER-SYSTEM) and Advanced NLP in Scholarly Documents (ABSTRACT).

Figure 4: BIR Strategic Map

The motor themes Scientific Summarization Shared Task, Citation Context Analysis, and Information Retrieval overlap strongly with the papers found in the communities of the same names in the co-word network. They are considered BIR's core topics since they have high centrality and high density; their order of importance is Scientific Summarization Shared Task, Citation Context Analysis, and Information Retrieval. Thus, we could say that these are BIR's current topics. The basic and transversal theme Language Models is a hot topic: it has high centrality and low density, and it will likely become well developed in the future, so we could infer that this is a hidden topic. Language models have been implemented mostly in scientific summarization, but in recent years BERT, SciBERT, and their fine-tuned variants have been applied to a mixture of topics. Mathematical Citation Context Analysis can be considered an isolated topic because it has low centrality and high density and is not influential in the BIR community; this is also reflected in Figure 3, where the MCCA community has few nodes and is the smallest one. The Recommender Systems and Science Mapping theme is linked to the community Bibliometrics and Science Mapping. It is considered a declining theme since its papers were published between 2014 and 2018, and few articles have been published recently in the aforementioned community. Finally, for the Advanced NLP in Scholarly Documents theme, there is not enough evidence to name it an emerging topic because it has only three recent papers.
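The quadrant labels above follow directly from where each theme falls relative to the middle of the centrality and density axes. As a hypothetical illustration (the numeric values below are invented to reproduce the reported classification; the real values come from SciMAT), a median split suffices:

from statistics import median

# Hypothetical (centrality, density) pairs for the seven themes.
themes = {
    "CL-SCISUM": (9.0, 8.5),
    "CITANCE": (8.0, 7.0),
    "ACADEMIC RESEARCH": (7.5, 6.5),
    "BERT": (8.5, 2.0),
    "MATHEMATICAL-DOCUMENT": (1.5, 7.5),
    "RECOMMENDER-SYSTEM": (2.0, 2.5),
    "ABSTRACT": (1.0, 1.5),
}

c_med = median(c for c, _ in themes.values())
d_med = median(d for _, d in themes.values())

def quadrant(c, d):
    if c >= c_med and d >= d_med:
        return "motor theme"
    if c >= c_med:
        return "basic and transversal theme"
    if d >= d_med:
        return "highly developed and isolated theme"
    return "emerging or declining theme"

for name, (c, d) in themes.items():
    print(f"{name}: {quadrant(c, d)}")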
4. Conclusion

In this paper, we mapped the conceptual structure of BIR using co-word network analysis, finding five communities: Scientific Summarization Shared Tasks, Information Retrieval, Bibliometrics and Science Mapping, Citation Context Analysis, and Mathematical Citation Context Analysis. The first and the fourth communities rely heavily on NLP and CL. The strongest community is Scientific Summarization Shared Tasks and the weakest is Mathematical Citation Context Analysis. Regarding BIR's trending topics, we applied a strategic map approach and identified Scientific Summarization Shared Task, Citation Context Analysis, and Information Retrieval as BIR's core themes. Furthermore, we found that language models are a topic heavily used in scientific summarization that will soon become a motor theme. We also found a decrease in scholarly documents on bibliometrics topics, visible in the declining theme Recommender Systems and Science Mapping as well as in the few papers published recently.

Our work has a limitation regarding the number of papers included in the co-word network and the strategic diagram. Of the 227 papers, only 104 contributed to the conceptual structure. In fact, ninety-nine of the missing one hundred twenty-three articles studied themes not related to the co-word network (see Appendix D). The other missing papers should have been included in the communities but were not; we argue that their abstracts did not portray their meaning. The same contribution pattern appears in the strategic diagram: only 85 scholarly documents were part of it. The low contribution of papers may be due to the fact that the topics are still in development because of BIR's interdisciplinarity.

In the beginning, BIR's focus was on bibliometrics and information retrieval, but in recent years, with the developments in NLP, other topics such as scientific fact-checking, argumentation mining, information extraction, and scientific search based on question answering are being studied. Future research is required on the creation of large datasets: most studies are based on NLP or biomedical corpora, and there is a need for datasets from other domains, such as [44]. Furthermore, the creation of domain-specific pretrained models or general-domain models like [45] needs more research effort. The implementation of semantic publishing [46] has not been explored by the community and could contribute to citation context analysis [47]. Despite the limitations of this exploratory study, we believe it is an initial contribution to the BIR community toward mapping its current and trending topics.

5. Appendices

Appendices A, B, C, and D can be found via the link.

6. Acknowledgments

I want to thank Philipp Mayr, Jodi Schneider, and Ludo Waltman for their valuable comments on this paper.

References

[1] C. Neilson and M. L. Lê, "A failed attempt at developing a search filter for systematic review methodology articles in Ovid Embase," J. Med. Libr. Assoc., vol. 107, no. 2, pp. 203–209, 2019, doi: 10.5195/jmla.2019.519.
[2] J. Harbour et al., "Reporting methodological search filter performance comparisons: a literature review," Health Info. Libr. J., pp. 176–194, 2014, doi: 10.1111/hir.12070.
[3] P. Mayr, P. Schaer, A. Scharnhorst, and P. Mutschke, "Editorial for the Bibliometric-enhanced Information Retrieval Workshop at ECIR 2014," in BIR 2014 Workshop on Bibliometric-enhanced Information Retrieval, 2014, vol. 1143, pp. 1–4. Available: http://ceur-ws.org/Vol-1143/editorial.pdf.
[4] P. Mayr, I. Frommholz, and P. Mutschke, "Editorial for the 2nd Bibliometric-enhanced Information Retrieval Workshop at ECIR 2015," in BIR 2015 Workshop on Bibliometric-enhanced Information Retrieval, 2015, vol. 1344, pp. 1–4. Available: http://ceur-ws.org/Vol-1344/editorial.pdf.
[5] P. Mayr, I. Frommholz, and G. Cabanac, "Editorial for the 3rd Bibliometric-enhanced Information Retrieval Workshop at ECIR 2016," in BIR 2016 Workshop on Bibliometric-enhanced Information Retrieval, 2016, vol. 1567, pp. 1–5. Available: http://ceur-ws.org/Vol-1567/.
[6] P. Mayr, I. Frommholz, and G. Cabanac, "Editorial for the 5th Bibliometric-enhanced Information Retrieval Workshop at ECIR 2017," in BIR 2017 Workshop on Bibliometric-enhanced Information Retrieval, 2017, vol. 1823, pp. 1–5. Available: http://ceur-ws.org/Vol-1823/.
[7] P. Mayr, I. Frommholz, and G. Cabanac, "Editorial for the 7th Bibliometric-enhanced Information Retrieval Workshop at ECIR 2018," in BIR 2018 Workshop on Bibliometric-enhanced Information Retrieval, 2018, vol. 2080, pp. 4–8. Available: http://ceur-ws.org/Vol-2080/editorial.pdf.
[8] G. Cabanac, I. Frommholz, and P. Mayr, "Editorial for the 8th Bibliometric-enhanced Information Retrieval Workshop at ECIR 2019," in BIR 2019 Workshop on Bibliometric-enhanced Information Retrieval, 2019, vol. 2345, pp. 1–7. Available: http://ceur-ws.org/Vol-2345/editorial.pdf.
[9] G. Cabanac, I. Frommholz, and P. Mayr, "Preface to the 10th Workshop on Bibliometric-enhanced Information Retrieval at ECIR 2020," in BIR 2020 Workshop on Bibliometric-enhanced Information Retrieval, 2020, vol. 2591, pp. 1–4.
[10] I. Frommholz, P. Mayr, G. Cabanac, and S. Verberne, "Preface to the 11th Workshop on Bibliometric-enhanced Information Retrieval at ECIR 2021," in BIR 2021 Workshop on Bibliometric-enhanced Information Retrieval, 2021, vol. 2847, pp. 1–4. Available: http://ceur-ws.org/Vol-2847/paper-01.pdf.
[11] P. Mayr, I. Frommholz, G. Cabanac, and D. Wolfram, "Editorial for the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL) at JCDL 2016," in BIRNDL 2016 Joint Workshop on Bibliometric-enhanced Information Retrieval and NLP for Digital Libraries, 2016, vol. 1610, pp. 1–5.
[12] P. Mayr, M. K. Chandrasekaran, and K. Jaidka, "Editorial for the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL) at SIGIR 2017," in BIRNDL 2017 Joint Workshop on Bibliometric-enhanced Information Retrieval and NLP for Digital Libraries, 2017, vol. 1888, pp. 1–5. Available: http://ceur-ws.org/Vol-1888/editorial.pdf.
[13] M. K. Chandrasekaran and P. Mayr, "Preface: 4th Joint Workshop on BIRNDL at SIGIR 2019," in BIRNDL 2019 Joint Workshop on Bibliometric-enhanced Information Retrieval and NLP for Digital Libraries, 2019, vol. 2414, pp. 1–5. Available: http://ceur-ws.org/Vol-2414/preface.pdf.
[14] M. K. Chandrasekaran et al., "Overview of the First Workshop on Scholarly Document Processing (SDP)," in Proceedings of the First Workshop on Scholarly Document Processing, 2020, pp. 1–6, doi: 10.18653/v1/2020.sdp-1.1.
[15] I. Atanassova, M. Bertin, and P. Mayr, "Editorial for the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics," in Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics, 2015, vol. 1384, pp. 1–4. Available: http://ceur-ws.org/Vol-1384/editorial.pdf.
[16] P. Mayr and A. Scharnhorst, "Scientometrics and information retrieval: weak-links revitalized," Scientometrics, vol. 102, no. 3, pp. 2193–2199, 2015, doi: 10.1007/s11192-014-1484-3.
[17] G. Cabanac, I. Frommholz, and P. Mayr, "Bibliometric-enhanced information retrieval: preface," Scientometrics, vol. 116, no. 2, pp. 1225–1227, 2018, doi: 10.1007/s11192-018-2861-0.
[18] G. Cabanac, I. Frommholz, and P. Mayr, "Scholarly literature mining with information retrieval and natural language processing: Preface," Scientometrics, vol. 125, no. 3, pp. 2835–2840, 2020, doi: 10.1007/s11192-020-03763-4.
[19] I. Atanassova, M. Bertin, and P. Mayr, "Editorial: Mining Scientific Papers: NLP-enhanced Bibliometrics," Front. Res. Metrics Anal., vol. 4, pp. 2–4, 2019, doi: 10.3389/frma.2019.00002.
[20] P. Mayr et al., "Introduction to the special issue on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL)," Int. J. Digit. Libr., vol. 19, no. 2–3, pp. 107–111, 2018, doi: 10.1007/s00799-017-0230-x.
[21] M. J. Cobo, A. G. López-Herrera, E. Herrera-Viedma, and F. Herrera, "An approach for detecting, quantifying, and visualizing the evolution of a research field: A practical application to the Fuzzy Sets Theory field," J. Informetr., vol. 5, no. 1, pp. 146–166, 2011, doi: 10.1016/j.joi.2010.10.002.
[22] M. Callon, J.-P. Courtial, W. A. Turner, and S. Bauin, "From translations to problematic networks: An introduction to co-word analysis," Soc. Sci. Inf., vol. 22, no. 2, pp. 191–235, 1983, doi: 10.1177/053901883022002003.
[23] Y. Ding, G. G. Chowdhury, and S. Foo, "Bibliometric cartography of information retrieval research by using co-word analysis," Inf. Process. Manag., vol. 37, no. 6, pp. 817–842, 2001, doi: 10.1016/S0306-4573(00)00051-0.
[24] N. Coulter, I. Monarch, S. Konda, and M. Carr, "Software Engineering as Seen through Its Research Literature: A Study in Co-Word Analysis," J. Am. Soc. Inf. Sci., vol. 49, no. 13, 1998, doi: 10.1002/(SICI)1097-4571(1998)49.
[25] G. Grolemund and H. Wickham, R for Data Science, 1st ed. O'Reilly Media, 2017.
[26] N. J. van Eck and L. Waltman, "Software survey: VOSviewer, a computer program for bibliometric mapping," Scientometrics, vol. 84, no. 2, pp. 523–538, 2010, doi: 10.1007/s11192-009-0146-3.
[27] M. J. Cobo, A. G. López-Herrera, E. Herrera-Viedma, and F. Herrera, "SciMAT: A new science mapping analysis software tool," J. Am. Soc. Inf. Sci. Technol., vol. 63, no. 8, pp. 1609–1630, 2012, doi: 10.1002/asi.22688.
[28] G. Cabanac, I. Frommholz, and P. Mayr, "Bibliometric-enhanced information retrieval: 10th anniversary workshop edition," in Advances in Information Retrieval, LNCS vol. 12036, J. M. Jose et al., Eds. Springer, 2020, pp. 641–647.
[29] M. Aria and C. Cuccurullo, "bibliometrix: An R-tool for comprehensive science mapping analysis," J. Informetr., vol. 11, no. 4, pp. 959–975, 2017, doi: 10.1016/j.joi.2017.08.007.
[30] A. Liberati et al., "The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration," BMJ, vol. 339, 2009, doi: 10.1136/bmj.b2700.
[31] N. J. van Eck and L. Waltman, "How to normalize cooccurrence data? An analysis of some well-known similarity measures," J. Am. Soc. Inf. Sci. Technol., vol. 60, no. 8, pp. 1635–1651, 2009, doi: 10.1002/asi.21075.
[32] K. Jaidka et al., "The Computational Linguistics Summarization Pilot Task," 2014.
[33] K. Jaidka, M. K. Chandrasekaran, S. Rustagi, and M. Y. Kan, "Insights from CL-SciSumm 2016: the faceted scientific document summarization Shared Task," Int. J. Digit. Libr., vol. 19, no. 2–3, pp. 163–171, 2018, doi: 10.1007/s00799-017-0221-y.
[34] M. K. Chandrasekaran, G. Feigenblat, E. Hovy, A. Ravichander, M. Shmueli-Scheuer, and A. de Waard, "Overview and Insights from the Shared Tasks at Scholarly Document Processing 2020: CL-SciSumm, LaySumm and LongSumm," in Proceedings of the First Workshop on Scholarly Document Processing, 2020, pp. 214–224, doi: 10.18653/v1/2020.sdp-1.24.
[35] N. Ibrahim Altmami and M. El Bachir Menai, "Automatic summarization of scientific articles: A survey," J. King Saud Univ. - Comput. Inf. Sci., 2020, doi: 10.1016/j.jksuci.2020.04.020.
[36] S. Iqbal, S.-U. Hassan, N. R. Aljohani, S. Alelyani, R. Nawaz, and L. Bornmann, "A Decade of In-text Citation Analysis based on Natural Language Processing and Machine Learning Techniques: An overview of empirical studies," Scientometrics, vol. 126, pp. 6551–6599, 2021, doi: 10.1007/s11192-021-04055-1.
[37] I. Tahamtan and L. Bornmann, "What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018," Scientometrics, vol. 121, no. 3, 2019.
[38] I. Atanassova, "Beyond metadata: The new challenges in mining scientific papers," in BIR 2019 Workshop on Bibliometric-enhanced Information Retrieval, 2019, vol. 2345, pp. 8–13. Available: http://ceur-ws.org/Vol-2345/paper1.pdf.
[39] M. Hernández-Alvarez and J. M. Gomez, "Survey about citation context analysis: Tasks, techniques, and resources," Nat. Lang. Eng., vol. 22, no. 3, pp. 327–349, 2016, doi: 10.1017/S1351324915000388.
[40] S. N. Kunnath, D. Herrmannova, D. Pride, and P. Knoth, "A meta-analysis of semantic classification of citations," Quant. Sci. Stud., pp. 1–46, 2021, doi: 10.1162/qss_a_00159.
[41] S. Machado, A. C. Ribeiro, and J. Oliveira e Sá, "Machine learning algorithms and techniques for sentiment analysis in scientific paper reviews: A systematic literature review," in 19.ª Conferência da Associação Portuguesa de Sistemas de Informação, 2019, pp. 1–12.
[42] A. Rotondi, A. Di Iorio, and F. Limpens, "Identifying citation contexts: A review of strategies and goals," in Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), 2018, vol. 2253, pp. 10–12, doi: 10.4000/books.aaccademia.3594.
[43] A. Aizawa, M. Kohlhase, I. Ounis, and M. Schubotz, "NTCIR-11 Math-2 Task Overview," in Proceedings of NTCIR-11, 2014, pp. 88–98. Available: http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings11/pdf/NTCIR/OVERVIEW/01-NTCIR11-OV-MATH-AizawaA_poster.pdf.
[44] D. Wright and I. Augenstein, "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding," in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021, pp. 1796–1807.
[45] L. He, A. Shah, M. Ostendorf, and H. Hajishirzi, "A General Framework for Information Extraction using Dynamic Span Graphs," 2019.
[46] D. Shotton, "Semantic publishing: the coming revolution in scientific journal publishing," Learn. Publ., vol. 22, no. 2, pp. 85–94, 2009, doi: 10.1087/2009202.
[47] M. A. Qureshi, "CCRO: Citation's Context & Reasons Ontology," IEEE Access, vol. 7, pp. 30423–30436, 2019, doi: 10.1109/ACCESS.2019.2903450.