A preliminary study to compare deep learning with rule-based approaches for citation classification

Julien Perier-Camby julien.perier-camby@etu.univ-lyon.fr Laboratoire LIRIS Université Claude Bernard Lyon 1

France

Marc Bertin marc.bertin@univ-lyon1.fr Laboratoire ELICO Université Claude Bernard Lyon 1

France

Iana Atanassova iana.atanassova@univ-fcomte.fr CRIT Université de Bourgogne Franche-Comté

France

Frédéric Armetta frederic.armetta@univ-lyon1.fr Laboratoire LIRIS Université Claude Bernard Lyon 1

France

Keywords: Biattentive Classification Network, Citation Classification, Citation Analysis, Citation Contexts.

BIR 2019 Workshop on Bibliometric-enhanced Information Retrieval

Categorization of semantic relationships between scientific papers is key to characterizing the state of a research field and to identifying influential works. Recently, new approaches based on Deep Learning have demonstrated strong capabilities on Natural Language Processing problems such as text classification and information extraction. In this paper, we show how deep learning algorithms can automatically learn to classify citations, and how they could provide a relevant alternative to recent state-of-the-art methods based on pattern extraction. We also discuss their appropriateness, given that training neural networks requires large datasets.

Introduction

The categorization of semantic relationships is at the very heart of bibliometrics and Natural Language Processing research. As described by Garfield more than 50 years ago [7], understanding how scholars use and frame citations is an essential prerequisite for characterizing the state of a scientific field and identifying influential works. Research on citation acts has already produced numerous empirical studies and models, in particular ontologies such as CiTO (see [14,5]) and studies applying sentiment analysis to citation contexts [3,11].

Most studies in this field rely on rule-based information extraction to categorize and semantically annotate citation acts. The general idea of such Natural Language Processing approaches is to categorize citation contexts by identifying patterns or text structures [16,9,2,10,1]. Nevertheless, the declarative nature of rule-based approaches has drawbacks, and they tend to be replaced by machine learning alternatives [4].

To our knowledge, deep learning methods have not yet been applied to categorize citations in texts, i.e. to assign a class to each citation act. This is mainly because few datasets are publicly available, and those that are tend to be small and unbalanced, which makes them difficult to use for developing deep learning approaches. Given the progress that deep learning has enabled in many domains, it is nevertheless worth examining how deep learning approaches behave in this new context.

In this paper, we compare the most efficient rule-based approach from the state of the art for citation categorization [8] with a well-known deep learning approach recognized for its ability to capture sentence meaning [15]. In Section 2, we describe how rule-based approaches have been applied to categorize citations, as well as the main principles of deep learning approaches. We discuss the advantages and drawbacks of both approaches, and highlight the challenges of training a neural network on a dataset that is limited in size and unbalanced across classes. This section also introduces the model we experiment with: the Biattentive Classification Network (BCN, [12]) combined with Embeddings from Language Models (ELMo, [15]) word representations. Section 3 introduces the corpus and the evaluation protocol. The results are presented and discussed in Section 4, and Section 5 concludes.

Categorization of Semantic Relationships

In this section, we describe and discuss the general functioning of rule-based and deep learning approaches and their requirements. Rather than giving a detailed description of each approach, we present their general properties for the sake of comparison.

Rule-based information extraction

In rule-based approaches, one has to define a set of discourse features that are relevant for characterizing the semantics of sentences with respect to the different citation classes. A state-of-the-art method for rule-based information extraction applied to citation framing has been proposed in [8], using pattern-based features, topic-based features and prototypical argument features. As a final step, a training phase weights the relevance of each available pattern depending on the class to predict. This is usually done with shallow machine learning models (for instance, k-nearest neighbors [17] or random forests [8]). Compared to deep neural networks, such models require smaller training datasets to provide satisfying results.
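The two stages described above can be illustrated with a minimal plain-Python sketch: hand-crafted cue-phrase patterns turn a citation context into a binary feature vector, and a shallow learner (here a k-nearest-neighbors vote, the family of model cited in [17]) maps feature vectors to classes. The patterns, the toy training sentences and the function names are hypothetical illustrations, not the feature set of [8] or [17].

```python
import re
from collections import Counter

# Hypothetical cue-phrase patterns; a real system would use a much larger,
# expert-curated set (pattern-based, topic-based, argument features, ...).
PATTERNS = {
    "future": re.compile(r"\b(we plan to|in the future|future work)\b", re.I),
    "uses": re.compile(r"\b(we use|using|described by|following)\b", re.I),
    "extends": re.compile(r"\b(we extend|we improve|extension of)\b", re.I),
    "contrast": re.compile(r"\b(unlike|whereas|in contrast|other approaches)\b", re.I),
    "motivation": re.compile(r"\b(as shown in|motivated|need for)\b", re.I),
}
FEATURE_NAMES = sorted(PATTERNS)


def extract_features(sentence):
    """Map a citation context to a binary vector: 1 where a pattern fires."""
    return tuple(int(bool(PATTERNS[name].search(sentence))) for name in FEATURE_NAMES)


def knn_predict(train, sentence, k=3):
    """Majority vote among the k training samples closest in Hamming distance."""
    query = extract_features(sentence)
    by_distance = sorted(
        train,
        key=lambda item: sum(a != b for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy labelled citation contexts (hypothetical), as (text, class) pairs.
examples = [
    ("We plan to apply this in the future [X].", "FUTURE"),
    ("Future work will follow [X].", "FUTURE"),
    ("We use the parser described by [X].", "USES"),
    ("Following [X], we tokenize the corpus.", "USES"),
    ("Unlike [X], our model is unsupervised.", "COMPARISON OR CONTRAST"),
    ("Other approaches use deeper resources [X].", "COMPARISON OR CONTRAST"),
]
# Training set for the shallow learner: (feature vector, class) pairs.
train = [(extract_features(text), label) for text, label in examples]
```

For instance, `knn_predict(train, "We plan to revisit this in the future [X].")` returns `"FUTURE"`. The expert knowledge lives in `PATTERNS`; the learner only has to weigh which patterns matter for which class, which is why such models can work with modest amounts of labelled data.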

It should be noted that rule-based methods suffer only slightly from unbalanced datasets: the features are hand-crafted, and are therefore derived from broader knowledge rather than limited to the samples in the training dataset. If a citation class is underrepresented, the classifier can still capture part of its meaning, as the knowledge used to capture it is provided by an expert. Thus, class imbalance in the dataset only slightly degrades the learning of the classifier.

Deep learning information extraction

Deep learning algorithms are artificial neural networks that learn to perform tasks from examples. For the specific problem we address, the network takes as input selected characteristics of the citation and learns to output the appropriate prediction (the citation class). Such algorithms do not rely on task-specific rules; instead, they use non-linear functions to capture complex patterns during the learning phase, producing a model capable of categorizing new samples.
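The input/output contract described above can be illustrated with a minimal, untrained forward pass in plain Python: a feature vector goes through one non-linear hidden layer (ReLU) and a softmax output layer with one unit per citation class. The layer sizes, the random weights, the 8-dimensional input and the function names are assumptions for illustration; BCN itself is far larger, and the backpropagation training that would fit the weights is omitted here.

```python
import math
import random

CLASSES = ["BACKGROUND", "MOTIVATION", "USES",
           "EXTENSION", "COMPARISON OR CONTRAST", "FUTURE"]


def init_layer(n_in, n_out, rng):
    """Random weight matrix (n_out x n_in) and zero bias vector."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(x, layer):
    """Compute W x + b for one input vector."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, hidden, output):
    """Return (predicted class, class probabilities) for one input vector."""
    probs = softmax(affine(relu(affine(x, hidden)), output))
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    return CLASSES[best], probs


rng = random.Random(0)
hidden_layer = init_layer(n_in=8, n_out=16, rng=rng)
output_layer = init_layer(n_in=16, n_out=len(CLASSES), rng=rng)
label, probs = predict([1.0, 0.0, 0.5, 0.0, 0.0, 1.0, 0.0, 0.2], hidden_layer, output_layer)
```

Training would adjust both weight matrices so that the probability of the correct class rises on the labelled samples; nothing in the network encodes prior knowledge about citation functions, so everything must come from those samples.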

Deep learning algorithms are highly sensitive to the quality of the training data, as they do not rely on any external knowledge. As for any machine learning algorithm, the training samples should ideally be independent and identically distributed, the classes should be as balanced as possible, and the training dataset should be large enough for the system to learn. For the problem addressed here, we therefore need a dataset that is large and balanced across the different citation classes. If a citation class is underrepresented in the dataset, its characteristics must be extracted from a smaller number of samples, and the inference mechanism will provide sub-optimal results.

For the purpose of comparison, we have selected the BCN model (Biattentive Classification Network, [12]), designed to handle sentence classification tasks. ELMo (Embeddings from Language Models, [15]) produces contextualized word representations and can be used to encode sentences before passing them to a classifier. BCN complemented by ELMo is the current state of the art on fine-grained (five-class) sentiment classification (SST-5, Stanford Sentiment Treebank), and one of the best available state-of-the-art algorithms for inference in text understanding.
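For readers unfamiliar with BCN, its core biattention step [12] can be sketched as follows for single-sentence classification, where the same sentence is fed in as both inputs X and Y. The affinity matrix A = XY^T scores every pair of tokens; for each token of X, a softmax over its row gives attention weights over Y, whose weighted sum is a context vector c. Each token is then represented as [x; x - c; x * c]. This is a simplified sketch under stated assumptions: the ELMo embeddings, BiLSTM encoders, integration BiLSTM, min and self-attentive pooling, and maxout classifier of the real model are omitted, and the token vectors are made-up numbers.

```python
import math


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def biattention_context(X, Y):
    """For each token x_i of X, return its weights over Y and context vector c_i.

    A[i][j] = x_i . y_j;  weights_i = softmax_j(A[i]);  c_i = sum_j weights_i[j] * y_j.
    """
    weights, contexts = [], []
    for x in X:
        w = softmax([dot(x, y) for y in Y])
        c = [sum(wj * y[d] for wj, y in zip(w, Y)) for d in range(len(Y[0]))]
        weights.append(w)
        contexts.append(c)
    return weights, contexts


def integrate(X, C):
    """Concatenate [x; x - c; x * c] per token (the input to BCN's integration layer)."""
    return [x + [a - b for a, b in zip(x, c)] + [a * b for a, b in zip(x, c)]
            for x, c in zip(X, C)]


def pool(H):
    """Max- and mean-pool over tokens into one fixed-size sentence vector."""
    dims = range(len(H[0]))
    return ([max(h[d] for h in H) for d in dims] +
            [sum(h[d] for h in H) / len(H) for d in dims])


# Toy 3-token sentence with 2-dimensional token vectors; X and Y are the same sentence.
X = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, C = biattention_context(X, X)
sentence_vector = pool(integrate(X, C))  # 2 poolings * (3 * 2) features = 12 values
```

The fixed-size `sentence_vector` is what a final classification layer would map to the six citation classes.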

Method and experimental setup

The dataset that we use for training and evaluating the BCN model is the one used in [8]. It has been fully annotated manually, which makes it particularly suitable for studying the ability of an automatic classifier to imitate human performance. Table 1 presents the six classes used for labelling citations.

To mark the citation act to be classified, each in-text reference in a sentence is replaced in turn by a marker ('[X]'). The resulting sentences (one marker per sentence) are passed through the neural network for inference. In this paper, we use the BCN model implemented in the AllenNLP library [6], a high-level framework built on PyTorch [13].
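The marker substitution can be sketched as below. For illustration, we assume that the character spans of the in-text references are already known (for example from the PDF extraction step) and that references other than the one being classified are left unchanged; neither detail is specified above, and the function name is hypothetical.

```python
def marker_variants(sentence, citation_spans, marker="[X]"):
    """Return one copy of `sentence` per citation, with that citation replaced by `marker`.

    citation_spans: list of (start, end) character offsets, end exclusive.
    All other citations in the sentence are kept as they are.
    """
    return [sentence[:start] + marker + sentence[end:] for start, end in citation_spans]


sentence = "Unlike Smith (2010), we follow Lee et al. (2015)."
spans = [(7, 19), (31, 48)]  # "Smith (2010)" and "Lee et al. (2015)"
variants = marker_variants(sentence, spans)
# ["Unlike [X], we follow Lee et al. (2015).",
#  "Unlike Smith (2010), we follow [X]."]
```

A sentence with two references thus yields two classification inputs, each carrying exactly one marker.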

The network was trained and tested using k-fold cross-validation with k = 10, in order to obtain more robust performance estimates than a single train/test split would give. The original samples were randomly partitioned into 10 equally sized subsamples. For each of the 10 combinations, the network was trained on 9 subsamples and tested on the remaining one. The reported results are averaged over the 10 training sessions⁴. Micro-F1 and Macro-F1 scores are used to report the overall performance of the network across classes, where Micro-F1 denotes the arithmetic average of the per-class F1 scores weighted by class sample size (sometimes called weighted F1), and Macro-F1 denotes their non-weighted arithmetic average.
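The evaluation protocol can be summarised by the following plain-Python sketch: a random partition of the sample indices into k = 10 near-equal folds (3087 samples do not divide exactly by 10), and per-class F1 scores combined into the unweighted macro average and the sample-size-weighted average that the paper reports as Micro-F1. The function names and the random seed are assumptions; the actual experiments rely on the authors' AllenNLP-based code.

```python
import random
from collections import Counter


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k random, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def per_class_f1(gold, pred):
    """F1 score for every label seen in gold or pred."""
    scores = {}
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores


def macro_and_weighted_f1(gold, pred):
    """Unweighted mean of per-class F1, and mean weighted by gold class size."""
    scores = per_class_f1(gold, pred)
    support = Counter(gold)
    macro = sum(scores.values()) / len(scores)
    weighted = sum(scores[c] * support[c] for c in scores) / len(gold)
    return macro, weighted
```

In the protocol, each of the 10 splits trains a fresh model on `train`, scores it on `test` with `macro_and_weighted_f1`, and the 10 resulting values are averaged.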

Results and discussion

Global results

The performances of the selected deep learning and rule-based approaches are presented in Table 2. Jurgens et al. [8] only report the Macro-F1 metric, which is therefore our basis for comparison: the rule-based approach obtains a Macro-F1 of 0.530, compared with 0.405 for BCN + ELMo. Because such data are rare, accurately labelled samples of sufficient size for this kind of study are difficult to acquire, which can be a major obstacle to clearly identifying the potential of deep learning approaches for citation categorization.

Results by class

Table 3 presents the F1 scores and sample sizes for the different classes. Some classes are clearly more challenging to predict than others with the BCN model, and the classes are highly imbalanced in the dataset. For this reason, we also consider the Micro-average metric, which aggregates the contributions of all classes into a single value weighted by class size.

For each class, the reported performance correlates with the size of the available data. While rule-based approaches become effective mainly thanks to expert knowledge, neural networks depend entirely on samples for their learning. As a result, it is not surprising that the model performs poorly on small classes, such as "FUTURE" (70 samples, F1 = 0.093). Under-represented classes pull down the Macro-F1 value but only slightly influence the Micro-F1 value. On the other hand, the classifier performs well on larger classes, e.g. "BACKGROUND" and "USES", with F1 scores of 0.720 and 0.640 respectively.
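The effect of class imbalance on the two averages can be checked directly from the per-class F1 scores and sample sizes in Table 3. The short computation below recomputes both averages: in the size-weighted average, FUTURE (70 of 3087 samples) carries a weight of about 2.3%, whereas in the macro average every class carries 1/6, about 16.7%, which is why the poorly predicted small classes pull the macro value down much more.

```python
# Per-class F1 scores and sample sizes, copied from Table 3.
table3 = {
    "BACKGROUND": (0.720, 1000),
    "MOTIVATION": (0.306, 185),
    "USES": (0.640, 823),
    "EXTENSION": (0.103, 152),
    "COMPARISON OR CONTRAST": (0.570, 857),
    "FUTURE": (0.093, 70),
}

total = sum(size for _, size in table3.values())
macro = sum(f1 for f1, _ in table3.values()) / len(table3)    # every class counts 1/6
weighted = sum(f1 * size for f1, size in table3.values()) / total  # classes count by size

print(total)               # 3087
print(round(macro, 3))     # 0.405
print(round(weighted, 3))  # 0.588
for name, (_, size) in table3.items():
    print(f"{name}: weighted share {size / total:.1%}, macro share {1 / len(table3):.1%}")
```

The recomputed values match the Micro-average and Macro-average rows of Table 3, confirming that the Micro-F1 reported here is the sample-size-weighted mean of the per-class F1 scores.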

Conclusion

This paper addresses citation classification, a central problem with many applications in bibliometrics. In this work, we study the ability of deep learning to capture the semantics of citations, compared with rule-based approaches. To do so, we compare two approaches from the recent state of the art.

We cannot yet define an upper bound on the performance of deep learning approaches for citation classification, because the experiment is based on a dataset that is small compared to those generally used in deep learning. New datasets need to be created to delineate more precisely the F1 scores that such approaches can reach. The results encourage the use of neural networks when large samples are available. When large samples are not available, the efforts invested in rule-based approaches remain worthwhile, as they prove more reliable and produce more accurate output.

Table 1. Scheme used for the labelling of citations, extracted from [8]

Class | Description | Example
BACKGROUND | Provides relevant information for this domain. | "This is often referred to as incorporating deterministic closure (Dörre, 1993)."
MOTIVATION | Illustrates need for data, goals, methods, etc. | "As shown in Meurers (1994), this is a well-motivated convention [...]"
USES | Uses data, methods, etc. | "The head words can be automatically extracted [...] in the manner described by Magerman (1994)."
EXTENSION | Extends data, methods, etc. | "[...] we improve a two-dimensional multimodal version of LDA (Andrews et al, 2009) [...]"
COMPARISON OR CONTRAST | Expresses similarity/differences. | "Other approaches use less deep linguistic resources (e.g., POS-tags Stymne (2008)) [...]"
FUTURE | Is a potential avenue for future work. | "[...] but we plan to do so in the near future using the algorithm of Littlestone and Warmuth (1992)."

Table 2. Experimental results

Approach | Macro-F1 | Micro-F1
BCN + ELMo (2018) | 0.405 | 0.588
Jurgens et al. (2018) | 0.530 | -

Table 3. F1 score reported by class for BCN and ELMo

Class | F1 score | Sample size
BACKGROUND | 0.720 | 1000
MOTIVATION | 0.306 | 185
USES | 0.640 | 823
EXTENSION | 0.103 | 152
COMPARISON OR CONTRAST | 0.570 | 857
FUTURE | 0.093 | 70
Micro-average | 0.588 | 3087
Macro-average | 0.405 | 3087
Random | 0.138 | 3087

⁴ The source code of the approach presented here is available on GitHub: https://github.com/jperier/BIR2019_citationBCN

Acknowledgements

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

References

[1] A. Abu-Jbara, J. Ezra, D. Radev. Purpose and polarity of citation: Towards NLP-based bibliometrics. In: The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), Westin Peachtree Plaza Hotel, Atlanta, Georgia, USA, Jun 2013.

[2] N.R. Aljohani, R. Nawaz. Mining the context of citations in scientific publications. In: Maturity and Innovation in Digital Libraries: 20th International Conference on Asia-Pacific Digital Libraries, Hamilton, New Zealand, Nov. 19-22, 2018, 316.

[3] Y. Chali, S.A. Hasan. Towards automatic topical question generation. In: Proceedings of COLING 2012, The COLING 2012 Organizing Committee, 2012.

[4] L. Chiticariu, Y. Li, F.R. Reiss. Rule-based information extraction is dead! Long live rule-based information extraction systems! In: Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2013), Grand Hyatt Seattle, Seattle, Washington, USA, Association for Computational Linguistics, Oct 2013.

[5] P. Ciancarini, A. Di Iorio, A.G. Nuzzolese, S. Peroni, F. Vitali. Evaluating citation functions in CiTO: Cognitive issues. In: V. Presutti, C. d'Amato, F. Gandon, M. d'Aquin, S. Staab, A. Tordai (eds.), The 11th conference proceedings for Semantic Evaluation Challenge 2014 (ESWC2014) - The Semantic Web: Trends and Challenges, Anissaras, Crete, Greece, Springer International Publishing, May 2014.

[6] M. Gardner, J. Grus, M. Neumann, O. Tafjord, P. Dasigi, N.F. Liu, M. Peters, M. Schmitz, L. Zettlemoyer. AllenNLP: A deep semantic natural language processing platform. 2018.

[7] E. Garfield. Can citation indexing be automated? In: Statistical association methods for mechanized documentation, symposium proceedings, National Bureau of Standards, 269, 1965.

[8] D. Jurgens, S. Kumar, R. Hoover, D. McFarland, D. Jurafsky. Measuring the evolution of a scientific field through citation frames. Transactions of the Association for Computational Linguistics 6, 2018.

[9] I.C. Kim, G.R. Thoma. Automated classification of author's sentiments in citation using machine learning techniques: A preliminary study. In: IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Niagara Falls, Canada, Aug 2015. doi:10.1109/CIBCB.2015.7300319

[10] W. Lamers, N.J. van Eck, L. Waltman, H. Hoos. Patterns in citation context: the case of the field of scientometrics. In: 23rd International Conference on Science and Technology Indicators (STI 2018), Leiden, The Netherlands, September 12-14, 2018. Centre for Science and Technology Studies (CWTS), 2018.

[11] Z. Ma, J. Nam, K. Weihe. Improve sentiment analysis of citations with author modelling. In: Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Association for Computational Linguistics, 2016. doi:10.18653/v1/W16-0420

[12] B. McCann, J. Bradbury, C. Xiong, R. Socher. Learned in Translation: Contextualized Word Vectors. In: The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS), Aug 2017.

[13] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer. Automatic differentiation in PyTorch. In: The Thirty-first Annual Conference on Neural Information Processing Systems (NeurIPS), The Neural Information Processing Systems Foundation, Long Beach Convention Center, CA, USA, Dec 2017.

[14] S. Peroni, D. Shotton. FaBiO and CiTO: ontologies for describing bibliographic resources and citations. Web Semantics: Science, Services and Agents on the World Wide Web 17, 2012.

[15] M.E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer. Deep contextualized word representations. In: The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018), 2018.

[16] S.B. Pham, A. Hoffmann. A new approach for scientific citation classification using cue phrases. In: T.D. Gedeon, L.C.C. Fung (eds.), AI 2003: Advances in Artificial Intelligence, Springer, Berlin, Heidelberg, 2003.

[17] S. Teufel, A. Siddharthan, D. Tidhar. Automatic classification of citation function. In: Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2006), Association for Computational Linguistics, 2006.