=Paper=
{{Paper
|id=Vol-1178/CLEF2012wn-QA4MRE-PakrayEt2012
|storemode=property
|title=An Automatic System for Modality and Negation Detection
|pdfUrl=https://ceur-ws.org/Vol-1178/CLEF2012wn-QA4MRE-PakrayEt2012.pdf
|volume=Vol-1178
|dblpUrl=https://dblp.org/rec/conf/clef/PakrayBBBG12
}}
==An Automatic System for Modality and Negation Detection==
Partha Pakray1, Pinaki Bhaskar1, Somnath Banerjee1, Sivaji Bandyopadhyay1, and Alexander Gelbukh2

1 Department of Computer Science and Engineering, Jadavpur University, Kolkata – 700032, India
2 Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico

{pinaki.bhaskar, parthapakray, s.banerjee1980}@gmail.com, sivaji_cse_ju@yahoo.com, gelbukh@gelbukh.com

Abstract. This article presents the experiments carried out as part of our participation in the pilot task (Modality and Negation)1 of QA4MRE@CLEF 2012. Modality and negation are two main grammatical devices for expressing extra-propositional aspects of meaning: modality is a grammatical category that expresses aspects related to the attitude of the speaker towards a statement, and negation is a grammatical category that changes the truth value of a proposition. The input to participating systems is a text in which all events expressed by verbs are identified and numbered; the output should be one label per event, with the possible values mod, neg, neg-mod, and none. In the developed system, we first build a database of modal verbs of two categories, epistemic and deontic, and we use a list of 1877 negative verbs to identify negative modality. We extract each tagged event from each sentence, and the system then checks each sentence against the modal verb database. If a modal verb is found before an event, the event is considered modalised and tagged as mod. If both a modal verb and a negation word are found before the event, the event is considered negated modal and tagged as neg-mod. If no modal verb is found before the event but a negation word is, the event is considered negated and tagged as neg. Otherwise the event is tagged as none. We trained our system on the training (sample) data provided by the QA4MRE organizers and then tested it on the test data set. The test set contains eight documents, two for each of the four topics: Alzheimer's disease, music and society, AIDS, and climate change. The overall accuracy of our system is 0.6262 (779 out of 1244).

Keywords: QA4MRE Data Sets, Modal Verbs List.

1 Introduction

The main objective of QA4MRE2 [1] is to develop a methodology for evaluating machine reading systems through question answering and reading comprehension tests. Besides the Main Task, two pilot tasks were offered by the organizers at QA4MRE this year: Processing Modality and Negation for Machine Reading, and Machine Reading of Biomedical Texts about Alzheimer's. The former, in which we participated, is defined as an annotation task where systems have to determine whether an event mentioned in a text is presented as negated, modalised (i.e., affected by an expression of modality), or both. This information is relevant for machine reading systems, since negated and modalised events should be treated differently from factual events in the inference-making process. We have participated in RespubliQA@CLEF 2010 [2], QA4MRE@CLEF 2011 [3], and QA4MRE@CLEF 2012 [4]; this year we participated in both the Main Task and the Pilot Task [1].

Section 2 describes the corpus statistics. Section 3 describes the system architecture. The experiments carried out on the test data sets are discussed in Section 4, along with the results.

1 http://celct.fbk.eu/QA4MRE/index.php?page=Pages/modalityTask.html
2 http://celct.fbk.eu/QA4MRE/index.php
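The lexical resources outlined in the abstract are described in detail in Section 3. The following is a minimal sketch of how such word lists could be represented and loaded; the file names and the handful of entries are hypothetical illustrations, not the actual resources (the manually prepared lists are far larger, e.g., 1877 negative verbs).

```python
# Minimal sketch: representing and loading the modal and negation lexicons.
# File names and sample entries are hypothetical illustrations only.

def load_list(path):
    """Read a lexicon file with one lowercase entry per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

# Toy examples of epistemic/deontic modal cues.
MODAL_WORDS = {"may", "might", "can", "could", "should", "must", "would"}

# Toy examples of explicit negation cues; the full lists also cover
# negative nouns, verbs, prepositions, determiners, pronouns, conjunctions.
NEGATION_WORDS = {"not", "no", "never", "nothing", "nobody", "n't"}

# In practice these would come from the prepared list files, e.g.:
# MODAL_WORDS = load_list("modal_verbs.txt")
# NEGATION_WORDS = load_list("negative_verbs.txt") | load_list("negative_nouns.txt")
```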
2 Corpus Statistics

The organizers provided a test set consisting of 8 documents, 2 per topic. The documents are annotated as illustrated in Table 1: events are marked in the text and assigned an identification number, as in (1), and one label is given per event, in the format shown in (2). The possible values are mod, neg, neg-mod, and none.

Table 1. QA4MRE corpus detail

(1) Europe's climate policy: Being ambitious. The European Commission maps a path to a low-carbon future. Now to walk it. Mar 10th 2011, from the print edition. About half of Europe's electricity comes from fossil fuels, with CO2 emissions as an unwanted by-product. By 2050, proposes a "road map" released by the European Commission this week, all that gassy baggage must go.

(2) e1=NONE e2=MOD e3=NONE e4=NONE e5=NONE e6=MOD

There are eight documents, two for each of the four topics: Alzheimer's disease, music and society, AIDS, and climate change. For each document the organizers provided the plain-text version in the directory "txt" and the version with marked events in the directory "events"; the files are listed in Table 2. The task was defined as an annotation task where systems have to determine whether an event mentioned in a text is presented as negated, modalised (i.e., affected by an expression of modality), or both.

Table 2. Test corpus files for the pilot task Processing Modality and Negation

No  Txt files                                             Event files
1   aids-all-colors-of-the-brainbow.txt                   aids-all-colors-of-the-brainbow-EVENTS.txt
2   aids-darc-continent.txt                               aids-darc-continent-EVENTS.txt
3   alz-barking-up-wrong-trip.txt                         alz-barking-up-wrong-trip-EVENTS.txt
4   alz-have-have-not.txt                                 alz-have-have-not-EVENTS.txt
5   climate-a-record-making-effort.txt                    climate-a-record-making-effort-EVENTS.txt
6   climate-are-economists-erring-on-climate-change.txt   climate-are-economists-erring-on-climate-change-EVENTS.txt
7   music-can-hiphop-change-the-world.txt                 music-can-hiphop-change-the-world-EVENTS.txt
8   music-how-to-sink-pirates.txt                         music-how-to-sink-pirates-EVENTS.txt

3 System Architecture

The architecture of the machine reading system is shown in Figure 1. The system consists of a database and the following four modules:

i. Sentence Extractor
ii. Event Tag Identifier and Event Generator
iii. Modality and Negation Processing
iv. Decision Maker

The Modality and Negation Processing module is further divided into two sub-modules: (i) the Modality Processor and (ii) the Negation Processor.

Database: The modal lists contain the following lists: modal verbs, epistemic adjectives, epistemic adverbs, epistemic nouns, propositional attitude verbs and adjectives, epistemic judgment verbs, epistemic evidential verbs, and epistemic deductive verbs. Explicit negation lists have also been prepared manually to handle explicit negation; they include negative nouns, negative verbs, negative prepositions, negative determiners, negative pronouns, and negative conjunctions.

Fig. 1. System architecture

Sentence Extractor Module: The input to this module is a single document and the output is a list of sentences S = {S1, S2, S3, …, Sn-1, Sn}. The objective of this module is to identify each sentence and build the list S for the next level.

Event Tag Identifier and Event Generator: This module takes the list of sentences S as input and processes each sentence to extract its individual events; it is able to identify the event tags. For each sentence in the sentence list S, the individual events are identified using the event tags, and a list of events E = {e1, e2, e3, …, en-1, en} is generated. A sketch of these first two modules is given below.
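The exact inline markup of the event files is not reproduced in this paper, so the following minimal sketch assumes a simple <event id="N">verb</event> convention; under that assumption it shows how the Sentence Extractor and the Event Tag Identifier and Event Generator could be realized.

```python
# Minimal sketch of the Sentence Extractor and the Event Tag Identifier /
# Event Generator. The <event id="N">...</event> markup is an assumption;
# the actual tag format of the QA4MRE event files may differ.
import re

EVENT_TAG = re.compile(r'<event id="(?P<id>\d+)">(?P<verb>[^<]+)</event>')

def extract_sentences(document):
    """Split a document into sentences at end-of-sentence punctuation."""
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', document) if s.strip()]

def extract_events(sentence):
    """Return (event_id, verb, character_offset) triples for one sentence."""
    return [(int(m.group("id")), m.group("verb"), m.start())
            for m in EVENT_TAG.finditer(sentence)]

doc = ('All that gassy baggage must <event id="6">go</event>. '
       'The Commission did not <event id="7">hesitate</event>.')
for sentence in extract_sentences(doc):
    print(extract_events(sentence))
# [(6, 'go', 28)]
# [(7, 'hesitate', 23)]
```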
Modality and Negation Processing: This is the core module of the system. It comprises two sub-modules, the Modality Processor and the Negation Processor. The Modality Processor is responsible for determining whether an event is modalised or not: the manually prepared lists described under Database are applied to the event being processed, and a pair {event, modality} is generated for each event. Next, the Negation Processor uses the negation lists to check whether a negation word appears before the event; if one does, the event is marked as negated. Thus, for each event this module generates a new pair eventi = {modality, negation}, e.g., e1 = {yes, no}, where e1 ∈ E = {e1, e2, e3, …, en-1, en}.

Decision Maker: This module takes the event list E = {e1, e2, e3, …, en-1, en} and assigns each event one of four output labels according to Table 3.

Table 3. Event list options

    Modality  Negation  Result
1   no        no        NONE
2   no        yes       NEG
3   yes       no        MOD
4   yes       yes       NEGMOD

where:
NONE: the event is presented as certain and it happened;
NEG: the event is presented as certain and did not happen;
MOD: the event is not presented as certain and is not negated;
NEGMOD: the event is not presented as certain and is negated.
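Combining the pieces, the sketch below shows how the Modality Processor, the Negation Processor, and the Decision Maker could be implemented; the token-level "cue before the event" test and the toy lexicons repeat the simplifying assumptions of the earlier sketches.

```python
# Minimal sketch of Modality/Negation Processing and the Decision Maker,
# implementing the four-way decision of Table 3. The lexicons are the toy
# sets from the earlier sketch, not the full manually prepared lists.

MODAL_WORDS = {"may", "might", "can", "could", "should", "must", "would"}
NEGATION_WORDS = {"not", "no", "never", "nothing", "nobody", "n't"}

# Table 3 as a lookup: (modality, negation) -> result label.
DECISION_TABLE = {
    (False, False): "NONE",
    (False, True): "NEG",
    (True, False): "MOD",
    (True, True): "NEGMOD",
}

def label_event(sentence_tokens, event_index):
    """Label one event given its tokenized sentence and token position."""
    before = {t.lower() for t in sentence_tokens[:event_index]}
    modality = bool(before & MODAL_WORDS)     # modal cue before the event?
    negation = bool(before & NEGATION_WORDS)  # negation cue before the event?
    return DECISION_TABLE[(modality, negation)]

tokens = "All that gassy baggage must not go".split()
print(label_event(tokens, tokens.index("go")))  # NEGMOD
```

Looking only at the material to the left of the event keeps this rule-based check cheap, at the cost of missing cues that follow the event verb.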
4 Evaluation

We trained our system on the training (sample) data and tested it on the test data set. The results of the experiments are shown in Table 4.

Table 4. Experiment results on the test data

Dataset name                                            Tag     Precision  Recall  F-score
eval.JUCSENLP-aids-all-colors-of-the-brainbow-r1.txt    MOD     0.2500     0.4737  0.3273
                                                        NONE    0.7857     0.6627  0.7190
    Overall accuracy: 0.5766 (64 out of 111)
eval.JUCSENLP-aids-darc-continent-r1.txt                MOD     0.4118     0.5385  0.4667
                                                        NONE    0.8000     0.7500  0.7742
    Overall accuracy: 0.6379 (74 out of 116)
eval.JUCSENLP-alz-barking-up-wrong-trip-r1.txt          MOD     0.7667     0.6053  0.6765
                                                        NONE    0.7324     0.8387  0.7820
    Overall accuracy: 0.7009 (75 out of 107)
eval.JUCSENLP-alz-have-have-not-r1.txt                  MOD     0.7727     0.6296  0.6939
                                                        NEGMOD  0.1250     0.3333  0.1818
                                                        NONE    0.6744     0.7838  0.7250
    Overall accuracy: 0.6438 (47 out of 73)
eval.JUCSENLP-climate-a-record-making-effort-r1.txt     MOD     0.5926     0.5369  0.5634
                                                        NONE    0.6034     0.7447  0.6667
    Overall accuracy: 0.5867 (220 out of 375)
eval.JUCSENLP-climate-are-economists-erring-on-climate-change-r1.txt
                                                        MOD     0.7889     0.5917  0.6762
                                                        NEGMOD  0.0667     0.0909  0.0769
                                                        NONE    0.5727     0.8289  0.6774
    Overall accuracy: 0.6279 (135 out of 215)
eval.JUCSENLP-music-can-hiphop-change-the-world-r1.txt  MOD     0.6875     0.5500  0.6111
                                                        NONE    0.6753     0.7429  0.7075
    Overall accuracy: 0.6028 (85 out of 141)
eval.JUCSENLP-music-how-to-sink-pirates-r1.txt          MOD     0.6452     0.5714  0.6061
                                                        NONE    0.8429     0.8551  0.8489
    Overall accuracy: 0.7453 (79 out of 106)
Overall                                                 MOD     0.6268     0.5633  0.5933
                                                        NEGMOD  0.0286     0.0488  0.0360
                                                        NONE    0.6818     0.7669  0.7219
    Macro-averaged F-measure (beta=1.0): 0.3378
    Micro-averaged F-measure (beta=1.0): 0.6132
    Overall accuracy: 0.6262 (779 out of 1244)

Acknowledgements. We acknowledge the support of the IFCPAR-funded Indo-French project "An Advanced Platform for Question Answering Systems", the partial support of the DST India–CONACYT Mexico project "Answer Validation through Textual Entailment", and the DIT, Government of India funded project "Development of Cross Lingual Information Access (CLIA) System Phase II".

References

1. Anselmo Peñas, Eduard Hovy, Pamela Forner, Álvaro Rodrigo, Richard Sutcliffe, Caroline Sporleder, Corina Forascu, Yassine Benajiba, Petya Osenova. Overview of QA4MRE at CLEF 2012: Question Answering for Machine Reading Evaluation. CLEF 2012 Evaluation Labs and Workshop Working Notes Papers, 17-20 September 2012, Rome, Italy (2012)
2. Partha Pakray, Pinaki Bhaskar, Santanu Pal, Dipankar Das, Sivaji Bandyopadhyay, Alexander Gelbukh. JU_CSE_TE: System Description QA@CLEF 2010 – ResPubliQA. CLEF 2010 Workshop on Multiple Language Question Answering (MLQA 2010) (2010)
3. Partha Pakray, Pinaki Bhaskar, Somnath Banerjee, Bidhan Chandra Pal, Sivaji Bandyopadhyay, Alexander Gelbukh. A Hybrid Question Answering System based on Information Retrieval and Answer Validation. CLEF 2011 Workshop on Question Answering For Machine Reading Evaluation (QA4MRE), CLEF 2011 Labs and Workshop, Notebook Papers, 19-22 September 2011, Amsterdam. ISBN 978-88-904810-1-7, ISSN 2038-4726, 16 pp. (2011)
4. Pinaki Bhaskar, Partha Pakray, Somnath Banerjee, Samadrita Banerjee, Sivaji Bandyopadhyay, Alexander Gelbukh. Question Answering System for QA4MRE@CLEF 2012. CLEF 2012 Workshop on Question Answering For Machine Reading Evaluation (QA4MRE), CLEF 2012 Labs and Workshop, Notebook Papers, 17-20 September 2012, Rome, Italy (2012)