=Paper=
{{Paper
|id=Vol-2143/paper5
|storemode=property
|title=Exploring the Use of Text Classification in the Legal Domain
|pdfUrl=https://ceur-ws.org/Vol-2143/paper5.pdf
|volume=Vol-2143
|authors=Octavia-Maria Şulea,Marcos Zampieri,Shervin Malmasi,Mihaela Vela,Liviu P. Dinu,Josef van Genabith
|dblpUrl=https://dblp.org/rec/conf/icail/SuleaZMVDG17
}}
==Exploring the Use of Text Classification in the Legal Domain==
Octavia-Maria Şulea (University of Bucharest, Romania), Marcos Zampieri (University of Wolverhampton, United Kingdom), Shervin Malmasi (Harvard Medical School, United States), Mihaela Vela (Saarland University, Germany), Liviu P. Dinu (University of Bucharest, Romania), Josef van Genabith (Saarland University and DFKI, Germany)

In: Proceedings of the Second Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL 2017), June 16, 2017, London, UK. Copyright © 2017 held by the authors. Copying permitted for private and academic purposes. Published at http://ceur-ws.org

Abstract

In this paper, we investigate the application of text classification methods to support law professionals. We present several experiments applying machine learning techniques to predict with high accuracy the ruling of the French Supreme Court and the law area to which a case belongs. We also investigate the influence of the time period in which a ruling was made on the form of the case description, and the extent to which we need to mask information in a full case ruling to automatically obtain training and test data that resemble case descriptions. We developed a mean probability ensemble system combining the output of multiple SVM classifiers. We report results of 98% average F1 score in predicting a case ruling, 96% F1 score for predicting the law area of a case, and 87.07% F1 score on estimating the date of a ruling.

1 Introduction

Text classification methods have been successfully applied to a number of NLP tasks and applications, ranging from plagiarism [2] and pastiche detection [6] to estimating the period in which a text was published [18]. In this paper we discuss the application of text classification methods in the legal domain, which, to the best of our knowledge, is relatively under-explored: to date, their application has been mostly restricted to forensics [5].

In this paper we argue that law professionals would greatly benefit from the type of automation provided by machine learning. This is particularly the case for legal research, more specifically the preparation a legal practitioner has to undertake before initiating or defending a case. The objective of the research reported in this paper is the following: given a case, law professionals have to make complex decisions, including which area of law applies to a given case, what the ruling might be, which laws apply to the case, etc. Given the data available on previous court rulings, is it possible to train text classification systems that are able to predict some of these decisions, given a textual “draft” case description provided by the professional? Such a system could act as a decision support system, or at least as a sanity check, for law professionals.

At present, law professionals have access to court ruling data through search portals (for example, the German portal https://www.juris.de/jportal/index.jsp) and keyword-based search. In our work we want to go beyond this: instead of keyword-based search, we use the full “draft” case description and text classification methods. For this purpose we acquire a large corpus of French court rulings with over 126,000 documents, spanning from the 1800s until the present day. We explore the use of lexical features and Support Vector Machine (SVM) ensembles to predict the law area and the ruling, and to estimate the date of the ruling. We compare the results of our method to those reported by a previous study [24] which used the same data. Finally, we also investigate how much of the final case description attached to the judge's ruling needs to be masked to obtain a synthetic draft description, close to what a lawyer would have at their disposal, and how predictable the ruling is based on this description. All results reported in this paper are in fact predictions based on these synthetic draft case descriptions, where what is to be predicted is masked in the training and test data and in their feature representations.

2 Related Work

While text classification methods have been investigated and applied with commercial or forensic goals in mind in other areas (e.g. serving better content or products to users through user profiling [23] and sentiment analysis, or identifying potential criminals [25], crimes [21], or anti-social behavior [4]), an area where these methods have been under-explored, although both commercial and forensic interests exist, is the legal domain.

Assuming that argumentation plays an important role in law practice, [19] investigate to what extent one can automatically identify argumentative propositions in legal text, as well as their argumentative function and structure. They use a corpus containing legal texts extracted from the European Court of Human Rights (ECHR) and classify argumentative vs. non-argumentative sentences with an accuracy of 80%.
Based on the association between a legal text and its domain label in a database of legal texts, [3] present a classification approach to identify the relevant domain to which a specific legal text belongs. Using TF-IDF weighting and Information Gain for feature selection, and SVM for classification, [3] attain an F1 measure of 76% for the identification of the domains related to a legal text and 97.5% for the correct classification of a text into a specific domain.

Following the observation of a thematic structure in Canadian court rulings, where the intro, context, reasoning, and conclusion were found to be independent of the ruling itself, [8] present an automatic summarization approach for court rulings. [9] introduce a hybrid summarization system for legal text which combines hand-crafted knowledge base rules with already existing automatic summarization techniques.

[11] proposed a system for classifying sentences for the task of summarizing court rulings and, using SVM and Naive Bayes applied to bag-of-words, TF-IDF, and dense features (e.g. the position of a sentence in the document), obtained 65% F1 on 7 classes. Similarly, another study [10] used BOW, POS tags, and TF-IDF to classify legal text into 3,000 categories, based on a taxonomy of legal concepts, and reported 64% and 79% F1.

For court ruling prediction, the task closest to our present work, a few papers have been published: [12], using extremely randomized trees, reported 70% accuracy in predicting the US Supreme Court's behavior, and, more recently, [26] tackled the task of predicting patent litigation and time to litigation (TTL), obtaining a lower-than-baseline 19% F1 for predicting the litigation outcome, but a remarkable 87% F1 for TTL prediction when the interval considered was less than 4 years, and only 43% F1 when the interval considered was narrowed down to less than a year. Among the most recent studies, [1] proposed a computational method to predict decisions of the European Court of Human Rights (ECHR) and [24] applied linear SVM classifiers to predict the decisions of the French Supreme Court using the same dataset presented in this paper.

As evidenced in this section, predicting court rulings is a new area for text classification methods, and our paper contributes in this direction, achieving performance substantially higher than previous work [24].
3 Corpus and Data Preparation

In this paper, we use the diachronic collection of court rulings from the French Supreme Court, the Court of Cassation (Cour de cassation). The complete collection, acquired from https://www.legifrance.gouv.fr, contains 131,830 documents, each consisting of a unique court ruling including metadata formatted in XML. Common metadata available in most documents include: law area, time stamp, case ruling (e.g. cassation, rejet, non-lieu, etc.), case description, and cited laws. We use the metadata provided as “natural” labels to be predicted by the machine learning system. In order to simulate realistic test scenarios, we automatically remove all mentions from the training and test data that explicitly refer to our target prediction classes.

During pre-processing, we removed all duplicate and incomplete entries in the dataset. This resulted in a corpus comprising 126,865 unique court rulings. Each instance contains a case description and four different types of labels: a law area, the date of the ruling, the case ruling itself, and a list of articles and laws cited within the description.

Taking the results by [24], henceforth Şulea et al. (2017), as a baseline, in this paper we tackle 3 tasks:

1. Predicting the law area of cases and rulings (Section 5.1).
2. Predicting the court ruling based on the case description (Section 5.2).
3. Estimating the time span when a case description and a ruling were issued (Section 5.3).

Deciding which labels to use in each experiment was not trivial, as this information was very often not explicit in the instances of the dataset, and the distribution of instances across the classes was very unbalanced and sometimes inconsistent. For this reason, we follow the decisions taken by Şulea et al. (2017) and summarize them next.

For task 1, predicting the law area of cases and rulings, out of 17 initial unique labels, the 8 labels that appeared in the corpus more than 200 times were kept. Table 1 shows the distribution of cases among each label.

Table 1. Distribution of cases according to the law area.

| Law Area | # of cases |
|---|---|
| CHAMBRE_SOCIALE | 33,139 |
| CHAMBRE_CIVILE_1 | 20,838 |
| CHAMBRE_CIVILE_2 | 19,772 |
| CHAMBRE_CRIMINELLE | 18,476 |
| CHAMBRE_COMMERCIALE | 18,339 |
| CHAMBRE_CIVILE_3 | 15,095 |
| ASSEMBLEE_PLENIERE | 544 |
| CHAMBRE_MIXTE | 222 |

For task 2, ruling prediction, we carry out two sets of experiments. A first set of experiments (6-class setup) considers only the first word within each label and only those labels which appeared more than 200 times in the corpus. This led to an initial set of 6 unique labels: cassation, annulation, irrecevabilite, rejet, non-lieu, and qpc (question prioritaire de constitutionnalité). In the second set of ruling prediction experiments (8-class setup), we consider all labels which had over 200 dataset entries, and this time we did not reduce them to their first word, as shown in Table 2.

Table 2. Distribution of cases according to ruling type.

| First-word ruling (6-class setup) | # of cases |
|---|---|
| rejet | 68,516 |
| cassation | 53,813 |
| irrecevabilite | 2,737 |
| qpc | 409 |
| annulation | 377 |
| non-lieu | 246 |

| Full ruling (8-class setup) | # of cases |
|---|---|
| cassation | 37,659 |
| cassation sans renvoi | 2,078 |
| cassation partielle | 9,543 |
| cassation partielle sans renvoi | 1,015 |
| cassation partielle cassation | 1,162 |
| cassation partielle rejet cassation | 906 |
| rejet | 67,981 |
| irrecevabilite | 2,376 |

Finally, in task 3, we investigate whether the text of the case description contains indicators of the period when it was written, a popular NLP task called temporal text classification, recently addressed by a SemEval task [22]. Due to the small number of cases per decade before 1960, we grouped all cases dated 1959 and before into a single class, and we run the temporal text classification experiments with 7 classes. Table 3 shows the distribution of cases per decade.

Table 3. Distribution of cases in seven time intervals.

| Time Span | # of cases |
|---|---|
| Until 1959 | 201 |
| 1960 - 1969 | 4,797 |
| 1970 - 1979 | 23,964 |
| 1980 - 1989 | 18,233 |
| 1990 - 1999 | 16,693 |
| 2000 - 2009 | 12,577 |
| 2010 - 2016 | 4,541 |
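To make the label selection concrete, here is a minimal sketch (our illustration, not the authors' code) of the reduction just described; the `cases` list, the function name, and the threshold handling are assumptions based on the description in this section.

```python
# Hypothetical sketch of the ruling-label selection described above.
# `cases` is assumed to be a list of (case_description, ruling_label) pairs.
from collections import Counter

def select_ruling_labels(cases, first_word_only=True, min_count=200):
    """6-class setup: reduce each label to its first word (e.g. "cassation
    partielle sans renvoi" -> "cassation"); 8-class setup: keep full labels.
    In both setups, keep only labels with more than min_count instances."""
    reduce_label = (lambda lab: lab.split()[0]) if first_word_only else (lambda lab: lab)
    reduced = [(text, reduce_label(label)) for text, label in cases]
    counts = Counter(label for _, label in reduced)
    return [(text, label) for text, label in reduced if counts[label] > min_count]
```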
For all three tasks we eliminated the occurrence of each word of the label from the text of the corresponding case description, following the methodology described in Şulea et al. (2017). For task 1, law area prediction, we eliminated all words contained in the label. For predicting the ruling, we eliminated the ruling words themselves from all case descriptions. Aiming at a complete masking of the ruling, we additionally looked at the top 20 most important features of each class to investigate whether some of them could be directly linked to the target label. In this step, we realized that the label was present both in its nominal form (e.g. cassation, irrecevabilite) and in its verbal form (e.g. casse, casser), and we eliminated both. For the task of predicting the century and decade in which a particular ruling took place, we eliminated all digits from the case description text, even though some of the digits referred to cited laws.
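The paper does not publish its masking code; the sketch below shows one plausible reading of the step just described, where `mask_description`, the word lists, and the regular expressions are our own assumptions rather than the authors' implementation.

```python
# Hedged sketch of the masking step: remove give-away label words (nominal and
# verbal forms) and, for the temporal task, all digits. Names are hypothetical.
import re

def mask_description(text, giveaway_words, strip_digits=False):
    """Return a synthetic draft description with target-label cues removed."""
    for word in giveaway_words:
        # Whole-word, case-insensitive removal of each give-away form.
        text = re.sub(r"(?i)\b" + re.escape(word) + r"\b", " ", text)
    if strip_digits:
        # The temporal task removes every digit, even those in cited laws.
        text = re.sub(r"\d", " ", text)
    # Collapse the whitespace left behind by the deletions.
    return re.sub(r"\s+", " ", text).strip()

# Example: masking a ruling expressed in both nominal and verbal form.
draft = mask_description("La Cour casse et annule l'arret rendu le 3 mai 1985 ...",
                         ["cassation", "casse", "casser", "annulation", "annule"],
                         strip_digits=True)
```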
4 Methodology

We approach the three tasks using a system based on classifier ensembles. Classifier ensembles have proven to achieve high performance in many classification tasks such as grammatical error detection [27], complex word identification [15], identifying self-harm risk in mental health forums [16], and dialect identification [17].

There are many types of classifier ensembles; in this work we apply a mean probability classifier. The method works by adding the probability estimates for each class together and assigning the class label with the highest average probability as the prediction. By using probability outputs in this way, a classifier's support for the true class label is taken into account even when it is not the predicted label (e.g. it could have the second highest probability). This method is simple and has been shown to work well on a wide range of problems. It is intuitive, stable [14], and resilient to estimation errors [13], making it one of the most robust combiners described in the literature.

As features, our system uses word unigrams and word bigrams. To evaluate the success of our method, we compare the results obtained by the mean probability ensemble system with the results reported in Şulea et al. (2017), who approach the three tasks described in this paper using the scikit-learn implementation [20] of the LIBLINEAR SVM classifier [7] trained on bags of words and bags of bigrams.

Finally, as to the evaluation, we employ a stratified 10-fold cross-validation setup for all experiments. We chose this approach to be able to compare our results with those reported by Şulea et al. (2017) and also to take the inherent imbalance of the classes present in the dataset into account. We report results in terms of average precision, recall, F1 score, and accuracy over all classes.
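As a rough, self-contained illustration of this setup, the sketch below combines calibrated LinearSVC members over unigram and bigram counts with the mean probability rule and evaluates them with stratified 10-fold cross-validation. The exact member configurations are our assumption, since the paper does not enumerate the classifiers in its ensemble.

```python
# Illustrative mean probability ensemble with scikit-learn; the three member
# configurations below are assumptions, not the authors' exact setup.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def make_members():
    # LinearSVC has no predict_proba, so each member is wrapped in a
    # calibration layer that turns decision values into probabilities.
    return [
        make_pipeline(CountVectorizer(ngram_range=(1, 1)),
                      CalibratedClassifierCV(LinearSVC())),   # word unigrams
        make_pipeline(CountVectorizer(ngram_range=(2, 2)),
                      CalibratedClassifierCV(LinearSVC())),   # word bigrams
        make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                      CalibratedClassifierCV(LinearSVC())),   # both combined
    ]

def mean_probability_predict(members, X_train, y_train, X_test):
    """Fit each member, average their per-class probabilities on the test
    texts, and predict the class with the highest mean probability."""
    probas, classes = [], None
    for member in members:
        member.fit(X_train, y_train)
        probas.append(member.predict_proba(X_test))
        classes = member.classes_  # identical ordering across fitted members
    return classes[np.mean(probas, axis=0).argmax(axis=1)]

def cross_validate(texts, labels, n_splits=10):
    """Stratified k-fold evaluation, reporting the average weighted F1."""
    texts, labels = np.asarray(texts), np.asarray(labels)
    scores = []
    for train_idx, test_idx in StratifiedKFold(n_splits=n_splits).split(texts, labels):
        preds = mean_probability_predict(make_members(),
                                         texts[train_idx], labels[train_idx],
                                         texts[test_idx])
        scores.append(f1_score(labels[test_idx], preds, average="weighted"))
    return float(np.mean(scores))
```

Averaging probabilities rather than hard votes lets a member's second-choice support count toward the final decision, which is exactly the property of the mean probability rule highlighted above.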
5 Results

5.1 Law Area

In our first experiment, we trained our system to predict the law area of a case, given its case description preprocessed as described in Section 3 (i.e. removing all “give-away” references in the original data to simulate a realistic draft case description scenario, where the prediction, here the law area, is not already preempted). Table 4 reports the average precision, recall, F1 score, and accuracy obtained by our method when discriminating between the aforementioned 8 classes, each of them containing at least 200 instances. The scores reported by Şulea et al. (2017) using the same dataset are presented for comparison.

Table 4. Classification results for law area prediction.

| Model | P | R | F1 | Acc. |
|---|---|---|---|---|
| Ensemble | 96.8% | 96.8% | 96.5% | 96.8% |
| Şulea et al. (2017) | 90.9% | 90.2% | 90.3% | 90.2% |

We observe that the ensemble method outperforms the linear SVM classifier by a large margin: 96.8% accuracy compared to the 90.2% reported by Şulea et al. (2017). We investigate the performance of the ensemble system for each individual class by looking at the confusion matrix presented in Figure 1.

[Figure 1. Confusion matrix for law area prediction over the eight chamber classes.]

The confusion matrix in Figure 1 shows that cases from the chambre mixte are the most difficult to predict. This is firstly because this class and assemblee pleniere, the second most difficult class to predict, contain the two lowest numbers of instances in the dataset (222 and 544 respectively), and secondly because, by its nature, the chambre mixte received mixed cases from other courts, such as civil and commercial.

5.2 Case Ruling

The results for the second task, court ruling prediction, are presented in Table 5. We report the results obtained in both experimental setups, the 6-class setup and the 8-class setup. The mean probability ensemble once again outperforms the method by Şulea et al. (2017) in both settings. We observe a 2.9 percentage point decrease in absolute average F1 score when the ensemble classifier is trained on the dataset with more classes, which is explained by the increase in the number of classes from 6 to 8 leading to a more challenging classification scenario.

Table 5. Classification results for ruling prediction.

| Classes | Model | P | R | F1 | Acc. |
|---|---|---|---|---|---|
| 6 | Ensemble | 98.6% | 98.6% | 98.6% | 98.6% |
| 6 | Şulea et al. (2017) | 97.1% | 96.9% | 97.0% | 96.9% |
| 8 | Ensemble | 95.9% | 96.2% | 95.8% | 96.2% |
| 8 | Şulea et al. (2017) | 93.2% | 92.8% | 92.7% | 92.8% |

To better understand the difficulties faced by our method in discriminating between the ruling classes, we first looked at the list of the most informative unigrams for each class. We found a few clear cases of top-ranked words that are related to the target class, but even so the analysis did not go very far, indicating that a more interesting analysis is not possible without the aid of an expert in French law.

Subsequently, we looked at the confusion matrix of the predictions. In Figure 2 we present the confusion matrix of the performance obtained for each individual class in the 6-class setup experiment. We observe that the two most difficult classes for the system were non-lieu and annulation. These two classes are also the two classes which contained the smallest number of examples, which probably led to the poor performance of the classifier in identifying instances from these classes.

[Figure 2. Confusion matrix for ruling prediction in the 6-class setup (classes: annulation, cassation, irrecevabilite, non-lieu, qpc, rejet).]

5.3 Temporal Text Classification

Finally, in Table 6 we present the results obtained in the third set of experiments described in this paper, predicting the time span of cases and rulings in a 7-class setting. Again, all data was preprocessed as indicated in Section 3. The mean probability ensemble system achieved an 87.0% F1 score against the 73.2% reported by Şulea et al. (2017).

Table 6. Classification results for temporal prediction.

| Model | P | R | F1 | Acc. |
|---|---|---|---|---|
| Ensemble | 87.3% | 87.0% | 87.0% | 87.0% |
| Şulea et al. (2017) | 75.9% | 74.3% | 73.2% | 74.3% |

The results obtained by the ensemble system in this experiment outperform the method by Şulea et al. (2017) by a large margin. This outcome once again confirms the robustness of classifier ensembles for many text classification tasks, including those presented in this paper.

The results obtained by our system in the temporal text classification task suggest that classifier ensembles are a good fit for predicting the publication date not only of legal texts but of other types of texts as well. This is a particularly relevant application for researchers in the digital humanities, who often work with manuscripts of unknown or uncertain publication date. The use of ensembles for this task is, to the best of our knowledge, under-explored and should be investigated further.

It should be noted, however, that the predictions in this experiment are only estimates, as the definition of time spans in units such as months, years, or decades (in the case of this paper) is arbitrary. Previous work in temporal text classification stressed that supervised methods, such as the one presented in this paper, fail to capture the linearity of time [18, 28]. Other methods, such as ranking or regression, could be applied to obtain more accurate predictions.
6 Conclusions and Future Work

In this paper we investigated the application of text classification methods to the legal domain using the cases and rulings of the French Supreme Court. We showed that a system based on SVM ensembles can obtain high scores in predicting the law area and the ruling of a case, given the case description, as well as the time span of cases and rulings. The ensemble method presented in this paper outperformed the previously proposed method of Şulea et al. (2017) using the same dataset.

We applied computational methods to mask the case descriptions attached to judges' rulings so that they convey as little information as possible about the ruling. This simulates the knowledge a lawyer would have prior to entering court.

The work presented in this paper confirms that text classification techniques can indeed serve as the basis of valuable assistive technology to support law professionals in obtaining guidance and orientation from large corpora of previous court rulings. In future work, we would like to investigate the extent to which a more accurate draft form can be induced from the court's case description.

Acknowledgements

Parts of this work were carried out while the first and second authors, Octavia-Maria Şulea and Marcos Zampieri, were at the German Research Center for Artificial Intelligence (DFKI). We would like to thank the anonymous reviewers for providing us with constructive feedback and suggestions.

References

[1] Nikolaos Aletras, Dimitrios Tsarapatsanis, Daniel Preoţiuc-Pietro, and Vasileios Lampos. 2016. Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective. PeerJ Computer Science 2 (2016), e93.
[2] Alberto Barrón-Cedeño, Marta Vila, M. Antònia Martí, and Paolo Rosso. 2013. Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection. Computational Linguistics 39, 4 (2013), 917–947.
[3] Guido Boella, Luigi Di Caro, and Llio Humphreys. 2011. Using Classification to Support Legal Knowledge Engineers in the Eunomos Legal Document Management System. In Proceedings of JURISIN.
[4] Justin Cheng, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. 2015. Antisocial Behavior in Online Discussion Communities. In Proceedings of ICWSM.
[5] Olivier De Vel, Alison Anderson, Malcolm Corney, and George Mohay. 2001. Mining E-mail Content for Author Identification Forensics. ACM SIGMOD Record 30, 4 (2001), 55–64.
[6] Liviu P. Dinu, Vlad Niculae, and Octavia-Maria Şulea. 2012. Pastiche Detection Based on Stopword Rankings: Exposing Impersonators of a Romanian Writer. In Proceedings of the Workshop on Computational Approaches to Deception Detection.
[7] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research 9 (2008), 1871–1874.
[8] Atefeh Farzindar and Guy Lapalme. 2004. Legal Text Summarization by Exploration of the Thematic Structures and Argumentative Roles. In Proceedings of the Text Summarization Branches Out Workshop.
[9] Filippo Galgani, Paul Compton, and Achim Hoffmann. 2012. Combining Different Summarization Techniques for Legal Text. In Proceedings of the Hybrid Workshop.
[10] Teresa Gonçalves and Paulo Quaresma. 2005. Evaluating Preprocessing Techniques in a Text Classification Problem. In Proceedings of the Conference of the Brazilian Computer Society.
[11] Ben Hachey and Claire Grover. 2006. Extractive Summarisation of Legal Texts. Artificial Intelligence and Law 14, 4 (2006), 305–345.
[12] Daniel Martin Katz, Michael J. Bommarito II, and Josh Blackman. 2014. Predicting the Behavior of the Supreme Court of the United States: A General Approach. CoRR abs/1407.6333 (2014). http://arxiv.org/abs/1407.6333
[13] Josef Kittler, Mohamad Hatef, Robert P. W. Duin, and Jiri Matas. 1998. On Combining Classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 3 (1998), 226–239.
[14] Ludmila I. Kuncheva. 2014. Combining Pattern Classifiers: Methods and Algorithms (second ed.). Wiley.
[15] Shervin Malmasi, Mark Dras, and Marcos Zampieri. 2016. LTG at SemEval-2016 Task 11: Complex Word Identification with Classifier Ensembles. In Proceedings of SemEval.
[16] Shervin Malmasi, Marcos Zampieri, and Mark Dras. 2016. Predicting Post Severity in Mental Health Forums. In Proceedings of the CLPsych Workshop.
[17] Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, and Jörg Tiedemann. 2016. Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task. In Proceedings of the VarDial Workshop.
[18] Vlad Niculae, Marcos Zampieri, Liviu P. Dinu, and Alina Maria Ciobanu. 2014. Temporal Text Ranking and Automatic Dating of Texts. In Proceedings of EACL.
[19] Raquel Mochales Palau and Marie-Francine Moens. 2009. Argumentation Mining: The Detection, Classification and Structure of Arguments in Text. In Proceedings of ICAIL.
[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (Oct 2011), 2825–2830.
[21] Verónica Pérez-Rosas and Rada Mihalcea. 2015. Experiments in Open Domain Deception Detection. In Proceedings of EMNLP. http://aclweb.org/anthology/D/D15/D15-1133.pdf
[22] Octavian Popescu and Carlo Strapparava. 2015. SemEval 2015, Task 7: Diachronic Text Evaluation. In Proceedings of SemEval.
[23] O. Roozmand, N. Ghasem-Aghaee, M. A. Nematbakhsh, A. Baraani, and G. J. Hofstede. 2011. Computational Modeling of Uncertainty Avoidance in Consumer Behavior. International Journal of Research and Reviews in Computer Science (April 2011), 18–26.
[24] Octavia-Maria Şulea, Marcos Zampieri, Mihaela Vela, and Josef van Genabith. 2017. Predicting the Law Area and Decisions of French Supreme Court Cases. In Proceedings of Recent Advances in Natural Language Processing (RANLP).
[25] Chris Sumner, Alison Byers, Rachel Boochever, and Gregory J. Park. 2012. Predicting Dark Triad Personality Traits from Twitter Usage and a Linguistic Analysis of Tweets. In Proceedings of ICMLA. DOI: http://dx.doi.org/10.1109/ICMLA.2012.218
[26] Papis Wongchaisuwat, Diego Klabjan, and John O. McGinnis. 2016. Predicting Litigation Likelihood and Time to Litigation for Patents. arXiv preprint arXiv:1603.07394 (2016).
[27] Yang Xiang, Xiaolong Wang, Wenying Han, and Qinghua Hong. 2015. Chinese Grammatical Error Diagnosis Using Ensemble Learning. In Proceedings of the NLP-TEA Workshop.
[28] Marcos Zampieri, Shervin Malmasi, and Mark Dras. 2016. Modeling Language Change in Historical Corpora: The Case of Portuguese. In Proceedings of LREC.