=Paper=
{{Paper
|id=Vol-2036/T3-3
|storemode=property
|title=Catchphrase Extraction from Legal Documents Using LSTM Networks
|pdfUrl=https://ceur-ws.org/Vol-2036/T3-3.pdf
|volume=Vol-2036
|authors=Rupal Bhargava,Sukrut Nigwekar,Yashvardhan Sharma
|dblpUrl=https://dblp.org/rec/conf/fire/BhargavaNS17
}}
==Catchphrase Extraction from Legal Documents Using LSTM Networks==
Rupal Bhargava, Sukrut Nigwekar, Yashvardhan Sharma
WiSoc Lab, Department of Computer Science
Birla Institute of Technology and Science, Pilani Campus, Pilani-333031
{rupal.bhargava1, f20150292, yash3}@pilani.bits-pilani.ac.in

ABSTRACT
Legal texts usually have a complex structure, and reading through them is a time-consuming and strenuous task. Hence it is essential to provide legal practitioners with a concise representation of the text. Catchphrases are those phrases which state the important issues present in the text, thus effectively characterizing it. This paper proposes an approach for subtask 1 of the IRLeD (Information Retrieval from Legal Documents) track at FIRE 2017. The proposed algorithm uses a four-step approach for extracting catchphrases from legal documents.

CCS Concepts
• Information systems → Retrieval tasks and goals • Information systems → Information extraction

Keywords
Keyword Extraction; Legal Documents; Deep Learning; LSTM; Natural Language Processing; Information Retrieval

1. INTRODUCTION
A prior case (also called a precedent) is an older court case related to the current case, which discusses similar issue(s) and which can be used as a reference in the current case. If an ongoing case has any related/relevant legal issue(s) that has already been decided, then the court is expected to follow the interpretations made in the prior case. For this purpose, it is critical for legal practitioners to find and study previous court cases, so as to examine how the ongoing issues were interpreted in the older cases.

Generally, legal texts (e.g., court case descriptions) are long and have complex structures. This makes their thorough reading time-consuming and strenuous. So, it is essential for legal practitioners to have a concise representation of the core legal issues described in a legal text. One way to list the core legal issues is by keywords or key phrases, which are known as "catchphrases" in the legal domain.

In order to address this issue, FIRE 2017 organized a task to extract catchphrases from legal documents. The task was: given a training set of documents and their corresponding catchphrases, extract catchphrases from new documents.

The rest of the paper is organized as follows. Section 2 explains the related work that has been done in the past years. Section 3 describes the dataset provided by the IRLeD 2017 organizers. Section 4 explains the proposed technique. Section 5 elaborates the evaluation and error analysis. Section 6 concludes the paper and presents future work.

2. RELATED WORK
Various techniques have been used for the task of keyword extraction [12]. They are broadly divided into supervised learning, unsupervised learning, and heuristic-based approaches. The goal of supervised learning approaches is to train a classifier on documents annotated with keyphrases to determine whether a candidate phrase is a keyphrase (Witten et al., 1999; Frank et al., 1999) [4]. Another approach is to build a ranker for keyword ranking (Jiang et al., 2009) [11].

The unsupervised techniques proposed can be categorized into four groups. Graph-based ranking builds a graph from the input document and ranks its nodes according to their importance using a ranking method (e.g., Brin and Page (1998)) [10]. Topic-based clustering involves grouping the candidates into topics such that each topic is composed only of candidates related to that topic (Grineva et al., 2009) [5]. The simultaneous learning approach is based on the assumption that important words occur in important sentences and a sentence is important if it contains important words (Wan et al., 2007) [9]. Language modeling scores keywords based on two features, namely phraseness and informativeness (Tomokiyo and Hurst, 2003) [8].

Typical heuristics include (1) using a stop word list to remove stop words (Liu et al., 2009b) [7], (2) allowing words with certain part-of-speech tags (e.g., nouns, adjectives, verbs) to be candidate keywords (Mihalcea and Tarau, 2004) [6], (3) allowing n-grams that appear in Wikipedia article titles to be candidates (Grineva et al., 2009) [5], and (4) extracting n-grams (Witten et al., 1999) [4] or noun phrases (Barker and Cornacchia, 2000) [3] that satisfy pre-defined lexico-syntactic pattern(s) (Nguyen and Phan, 2009) [2].

3. DATASET DESCRIPTION
The dataset provided by the organizers [1] contained two sets of legal texts: training and testing. The training set was accompanied by the catchphrases corresponding to each text. The given catchphrases mainly consisted of words present in the text and rarely included phrases which were not present in the document.

4. PROPOSED TECHNIQUE
The problem is formulated as a classification task, and the objective is to learn a classifier using an LSTM network. The proposed methodology involves a pipelined approach and is divided into four phases:

Pre-processing
Candidate phrase generation
Creating vector representations for the phrases
Training an LSTM network

4.1 Pre-Processing
The legal texts were pre-processed in order to ensure uniformity. Pre-processing included removal of special characters, numbers, and words not present in the English dictionary, and converting all characters to lower case.

4.2 Candidate Phrase Generation
To generate candidates, n-grams with n in the range 1 to 4 were created from the text. A standard stop list of common English words was taken to reduce the candidates: if a candidate starts or ends with a stop word, it is removed. To reduce candidates further, an assumption was made that words adjacent to a given catchphrase will not be catchphrases. The assumption is justified because catchphrases are identified by removing stop words; conversely, stop words can be generated by removing catchphrases. This modification to the stop list was done simultaneously with generating catchphrases. The method carries an inherent bias, as candidates generated from documents processed early are chosen according to a smaller stop list and those processed later according to a larger one. To remove this bias, the documents were chosen randomly to generate candidates.

4.3 Creating Vector Representation
Word vector representations were created using the Google News word2vec model. For phrases containing more than one word, word vectors were combined by taking their weighted average, with the weights being the TF×IDF scores of the constituent words.

4.4 Training the Model
Long Short-Term Memory (LSTM) units were used because text is considered a continuous input: words used earlier can affect words used later in the text. The Keras framework on top of a TensorFlow backend was used to build the model. The number of LSTM units in the model was 100, dropout was set to 0.5, and a dense layer was added at the end to combine the outputs of the units into a probability.

5. EVALUATION RESULTS
The proposed method achieved a mean average precision of 0.0931 and an overall recall of 0.0988. The precision could probably be improved by using a different model. Although the results are not very good, this does not rule out the possibility of using deep learning for the task.

6. CONCLUSION
Catchphrases present a summary of a legal text and are very useful for practitioners. They can be used to implement a document retrieval system, as they can serve as a representation of the document. This working note presents an extraction system using an LSTM network. The results are poor, but LSTMs are suited to the task at hand because of their continuous nature and hence should be explored further.

References
[1] Mandal, K. Ghosh, A. Bhattacharya, A. Pal and S. Ghosh. Overview of the FIRE 2017 track: Information Retrieval from Legal Documents (IRLeD). In Working notes of FIRE 2017 – Forum for Information Retrieval Evaluation, Bangalore, India, December 8-10, 2017, CEUR Workshop Proceedings. CEUR-WS.org, 2017.
[2] Chau Q. Nguyen and Tuoi T. Phan. 2009. An ontology-based approach for key phrase extraction. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing: Short Papers, pages 181–184.
[3] Ken Barker and Nadia Cornacchia. 2000. Using noun phrase heads to extract document keyphrases. In Proceedings of the 13th Biennial Conference of the Canadian Society on Computational Studies of Intelligence, pages 40–52.
[4] Ian H. Witten, Gordon W. Paynter, Eibe Frank, Carl Gutwin, and Craig G. Nevill-Manning. 1999. KEA: Practical automatic keyphrase extraction. In Proceedings of the 4th ACM Conference on Digital Libraries, pages 254–255.
[5] Maria Grineva, Maxim Grinev, and Dmitry Lizorkin. 2009. Extracting key terms from noisy and multitheme documents. In Proceedings of the 18th International Conference on World Wide Web, pages 661–670.
[6] Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into texts. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 404–411.
[7] Zhiyuan Liu, Peng Li, Yabin Zheng, and Maosong Sun. 2009b. Clustering to find exemplar terms for keyphrase extraction. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 257–266.
[8] Takashi Tomokiyo and Matthew Hurst. 2003. A language model approach to keyphrase extraction. In Proceedings of the ACL Workshop on Multiword Expressions, pages 33–40.
[9] Xiaojun Wan, Jianwu Yang, and Jianguo Xiao. 2007. Towards an iterative reinforcement approach for simultaneous document summarization and keyword extraction. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 552–559.
[10] Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual Web search engine. Computer Networks, 30(1–7):107–117.
[11] Xin Jiang, Yunhua Hu, and Hang Li. 2009. A ranking approach to keyphrase extraction. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 756–757.
[12] Kazi Saidul Hasan and Vincent Ng. 2014. Automatic Keyphrase Extraction: A Survey of the State of the Art. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 1262–1273.
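The pre-processing of Section 4.1 (lower-casing, stripping special characters and numbers, and dropping words outside an English dictionary) could be sketched as follows; the function name and the idea of passing the dictionary in as a set are illustrative assumptions, since the paper does not show its implementation.

```python
import re

def preprocess(text, english_vocab):
    """Normalize a legal text per Section 4.1: lower-case it, remove
    special characters and numbers, and keep only words found in an
    English dictionary. `english_vocab` is a set of known English
    words, assumed to be supplied by the caller (e.g. a wordlist)."""
    text = text.lower()
    # Replace anything that is not a lowercase letter or whitespace
    # (digits, punctuation, special characters) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [w for w in text.split() if w in english_vocab]
    return " ".join(tokens)

vocab = {"the", "court", "held", "that", "contract", "was", "void"}
print(preprocess("The Court HELD (on 12/03/1998) that the contract was void!", vocab))
# -> the court held that the contract was void
```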
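The candidate generation of Section 4.2 (all n-grams for n = 1..4, discarding any candidate that starts or ends with a stop word) could look roughly like this minimal sketch; the paper's dynamically grown stop list is simplified here to a fixed set passed in by the caller.

```python
def candidate_phrases(tokens, stoplist, max_n=4):
    """Generate n-gram candidates (n = 1..max_n) from a token list and
    discard any candidate that starts or ends with a stop word, as in
    Section 4.2. `stoplist` is a set of stop words."""
    candidates = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # A candidate beginning or ending with a stop word is removed.
            if gram[0] in stoplist or gram[-1] in stoplist:
                continue
            candidates.append(" ".join(gram))
    return candidates

stops = {"the", "of"}
print(candidate_phrases("the wilful breach of contract".split(), stops, max_n=2))
# -> ['wilful', 'breach', 'contract', 'wilful breach']
```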
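The phrase-vector construction of Section 4.3 is a TF×IDF-weighted average of the constituent word vectors. A minimal sketch, with toy 2-dimensional vectors standing in for the 300-dimensional Google News word2vec embeddings (the dictionaries of embeddings and TF×IDF scores are assumed inputs):

```python
import numpy as np

def phrase_vector(words, word_vecs, tfidf):
    """Combine word vectors into a single phrase vector by a weighted
    average, with TF-IDF scores as weights (Section 4.3). `word_vecs`
    maps word -> embedding; `tfidf` maps word -> TF-IDF score."""
    vecs = np.array([word_vecs[w] for w in words])       # (n_words, dim)
    weights = np.array([tfidf[w] for w in words])        # (n_words,)
    # Weighted average: sum_i weight_i * vec_i / sum_i weight_i
    return weights @ vecs / weights.sum()

wv = {"habeas": np.array([1.0, 0.0]), "corpus": np.array([0.0, 1.0])}
scores = {"habeas": 3.0, "corpus": 1.0}
print(phrase_vector(["habeas", "corpus"], wv, scores))   # -> [0.75 0.25]
```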
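The model of Section 4.4 (100 LSTM units, dropout 0.5, a dense layer producing a probability, built with Keras on TensorFlow) might be sketched as below. The input shape is an assumption not stated in the paper: here each candidate is treated as a padded sequence of up to 4 word vectors of dimension 300, matching the n-gram length and word2vec dimensionality used elsewhere in the pipeline.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    # 100 LSTM units with dropout 0.5, as reported in Section 4.4.
    # input_shape=(4, 300) is assumed: up to 4 timesteps (the max
    # n-gram length) of 300-d word vectors.
    LSTM(100, dropout=0.5, input_shape=(4, 300)),
    # Final dense layer combines the LSTM outputs into a probability
    # that the candidate phrase is a catchphrase.
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```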