 Convolutional Neural Network-based Automatic
 Prediction of Judgments of the European Court
                of Human Rights

                           Arshdeep Kaur and Bojan Božić

             School of Computer Science, Technological University Dublin
             arshdeep.kaur@myTUDublin.ie, bojan.bozic@TUDublin.ie



        Abstract. In the past few years, predictive modelling has brought revo-
        lutionary changes in the way various industries function. Advancements
        in the areas of Deep Learning (DL) and Natural Language Processing
        (NLP) have made their application to different problem areas highly
        promising. In the legal domain, positive results have been obtained in
        predicting the judgements of various Courts of different countries using
        DL and NLP. However, not much research has been carried out in the
        area of legal judgement forecasting for the European Court of Human
        Rights (ECHR). The models designed in the previous research employ
        only one Machine Learning algorithm, namely a Support Vector Machine
        (SVM), to solve this problem.
            This study applies DL and NLP to the problem of automatic pre-
        diction of judgements for ECHR. Extensive experiments are conducted
        which compare the performance of models trained on SVM with linear
        kernel as part of previous research (Medvedeva, Vols & Wieling, 2018)
        with the models trained on Convolutional Neural Networks (CNN) as
        proposed in this study. To implement this, state-of-the-art NLP tech-
        niques are applied to the text data. Moreover, pre-trained and custom
        trained Word Embedding text representations are considered. Statistical
        tests are performed to gather sufficient statistical evidence to determine
        which algorithm performs better at providing a solution to this problem.
        Based on the results obtained, it is established that overall, CNN models
        outperform SVM models as the former achieves an average accuracy of
        82% whereas the latter achieves 75%. Specifically, CNN models for four
        Articles out of nine achieve statistically significant higher accuracy than
        SVM models.
        Keywords: Convolutional Neural Network, Support Vector Machine, Nat-
        ural Language Processing, Word Embedding, European Court of Human
        Rights


1     Introduction
Law and Order is one of the integral components of society. It is a well-known
fact that in any legal court, case proceedings take a lot of time before a final
judicial decision is declared. This leads to a backlog of cases and long waiting times,




which is an ordeal for innocent people. Lawlor once surmised that one day, com-
puters would be able to analyze and predict the outcomes of judicial decisions
(Lawlor, 1963). Automating the process of forming legal judgments on different
court cases is one of the promising areas of application of artificial intelligence
that can revolutionize the legal domain. Building an automated system that can
accurately predict the outcome of a case will help in reducing the error of judg-
ment that might happen by a human judge. It will allow the judges to focus
on more complex tasks by prioritizing the cases. This will also reduce the delays
in handling legal cases and provide justice to individuals faster.
Hence, the efficiency of the judicial system will be improved.
    Automatic prediction of ECHR case judgments has been previously carried
out by developing systems that employ Machine Learning (ML) algorithms like
SVM. Although such systems have achieved a decent prediction accuracy, there is
a possibility of building more accurate prediction models by leveraging the power
of DL algorithms. Models built using such algorithms are capable of learning the
complexities of large text datasets which other ML models may not be able to
do. Based on the literature review performed in the next section, CNN models
have been observed to give good results for many text classification problems
and hence, they seem to be a potential modeling technique for ECHR cases.
    In the previous work, N-grams were used as the text representation. Although
they can capture some of the context of the text data, they can lead to high-
dimensional sparse matrices. Text representations such as word embeddings can
capture the semantics of data well and, based on the literature review, have been
observed to give good results. This aspect of NLP has not been explored previously
and needs to be evaluated for the ECHR problem. If a legal
assistant system is built using DL algorithms based on Word Embeddings and
succeeds at achieving higher accuracy, it can be useful in assisting judges at
ECHR with unforeseen real-world legal cases once it is deployed.
    This study has implemented automated decision making for ECHR by lever-
aging the power of Semantic Analysis of text and the ability of CNN to learn
local features in text without manually engineering the features. The research
methodologies used were a mix of qualitative and quantitative methods where
a Sequential Exploratory design was followed. Secondary research has been con-
ducted to collect an existing dataset for ECHR judicial cases, perform a system-
atic literature review and summarise the findings. Using a constructive form of
research, two different models were built, a general ML model and a DL model,
which were analytically compared to determine whether the new model outper-
forms the baseline model. Deductive reasoning has been employed in validating
the hypothesis and concluding whether the model built as part of this study
was better at predicting the judicial decisions of ECHR cases than the existing
model based on statistical test results.
2    Related Work

   2.1 Implementation of Artificial Neural Networks (ANNs) for NLP applications in various domains
This section describes the research work carried out in various text classification
problems where models trained on DL algorithms outperformed models trained
on ML algorithms.
    Classification of medical documents at the sentence level into 26 clinical sub-
ject categories has been performed using various modeling and data processing
techniques. It has been observed that CNN models with Word2Vec achieved
15% higher accuracy compared to Logistic Regression with Sentence Embedding,
Mean Word Embedding, and Bag of Words (Hughes, Li, Kotoulas & Suzumura,
2017). A system was designed to predict Colorectal Cancer among patients by
capturing temporal nature of medical records dataset obtained from Julius Cen-
tre, Netherlands. It was found that Recurrent Neural Network (RNN) models
performed on par with state-of-the-art ML algorithms, namely SVM, Random
Forests, Logistic Regression, and Decision Trees, and achieved an Area Under the
Curve (AUC) of 0.811 (Amirkhan, Hoogendoorn, Numans & Moons, 2017). All
such research work signifies that ANNs like CNN and variants of RNN can give
better performance than traditional ML models without any need of manually
handcrafting the features for text classification problems in various domains.
    2.2 Implementation of Artificial Intelligence (AI) in the Legal Industry
The wide spectrum of research carried out in the past years has constantly
reiterated the potential of AI in the legal domain. It has been suggested that if
an accurate legal expert system is deployed, it can have a profound effect on the
legal sector (Berman & Hafner, 1989).
    Research related to the application of AI in the legal domain has been ongo-
ing for the past couple of years. ML algorithms have been designed for predictive
modeling for different problems and promising results have been obtained. For
the French Supreme Court, a model designed using SVM based on unigram and bi-
gram text representation achieved 98% average F1 score in predicting the case
judgment (Sulea, Zampieri, Vela, Dinu & Genabith, 2017). A Random Forest
model trained on the asylum cases dataset predicted with an accuracy of 80%
whether an applicant would be granted asylum by the US Courts (Dunn, Sagun,
Sirin & Chen, 2017). ANN architectures like CNN and RNN variants and dif-
ferent NLP techniques have also been employed for various legal problems. For
the US Supreme Court, ANNs have been designed for classification of documents for
determining the outcome of legal cases. It has been observed that CNN models
with Word2Vec text representation achieved 72.4% accuracy in classifying the
court decision into 15 categories. Various modeling methods were implemented
like Latent Dirichlet Allocation with logistic regression, Doc2vec with logistic
regression, SVM with bag of words, CNN with Word2Vec, GloVe and fasttext,
Long short-term memory (LSTM) with Word2Vec and Gated Recurrent Unit
(GRU) with Word2Vec. CNN with Word2vec was found to achieve the best ac-
curacy (Undavia, Meyers & Ortega, 2018). In another research work, a Deep Neural
Network (DNN) using the ReLU activation function, momentum optimization and dropout technique
trained on data containing 7700 cases outperformed SVM models and achieved
70.4% accuracy in predicting the judgments of US Supreme Court (Sharma,
Mittal, Tripathi & Acharya, 2015). All such research work implies that ML and
DL algorithms give positive results in solving many problems of legal domain.
    2.3 Implementation of ANNs in ECHR
A small body of research has been previously carried out for designing a legal
forecasting system for ECHR. The previous work on judicial cases developed
an SVM model with a linear kernel, applied to data represented using N-grams
and topics, which predicted with 79% accuracy whether an Article of the ECHR
Convention (Articles 3, 6 and 8 were considered) had been violated (Aletras,
Tsarapatsanis, Preotiuc-Pietro & Lampos, 2016). The scope of that study
was limited in the way that it did not train SVM models for all the Articles of
ECHR. This was considered in further research conducted as an extension of this
work. SVM models based on data represented using Bag of Words were trained
for each of the Articles 2, 3, 5, 6, 8, 10, 11, 13 and 14 which achieved an average
training accuracy of 75% (Medvedeva, Vols & Wieling, 2018).
    Based on the related work, it has been observed that ANNs like CNN and
RNN variants can provide promising results for various text classification prob-
lems. It depends on the data, whether capturing local key phrases or long-term
semantic dependencies by a model would lead to better classification perfor-
mance. Hence, it cannot be said beforehand which model out of CNN and RNN
variants would perform better for a given problem. For the ECHR problem, only
a general ML algorithm, SVM, has been implemented and good perfor-
mance results have been obtained. But there is a possibility that if ANNs are
trained and evaluated for ECHR, they can outperform the results obtained from
SVM models. This is the motivation which led to further investigation in this
study.


3   Methods

The implementation of the research work was performed as per the CRISP-DM
model. Accordingly, major steps incorporated in this study are Data Under-
standing, Data Preparation, Modelling, and Evaluation as depicted in Figure
3.1. In the first phase, existing dataset was obtained and understood as part of
the Data Collection and Data Understanding unit. In the second phase, the ex-
periment conducted previously in the referred research paper (Medvedeva, Vols
& Wieling, 2018) was replicated. Data preparation was performed, and a model
trained on SVM was designed. In the third phase, a new experiment was conducted
where data preparation was done differently followed by building a CNN model
for each article. These two phases were part of Data preparation and Modelling
units. In the last phase, performance of designed models was evaluated, and hy-
pothesis testing was carried out using statistical tests. Detailed description of
the experimental process has been provided in the following sections.




                  Fig. 3.1. High Level Description of Experiments


     3.1 Data Understanding
Dataset collection was a primary step performed for achieving the research ob-
jectives of this study. As data is the foundation on which all the models are
built, it is important to ensure that the collected data is of high quality. Since
the aim of this study was to compare the performance of previously built SVM
models with the proposed CNN models, it was important to collect the same
dataset to carry out such an analysis. For this reason, the Crystal Ball dataset
was obtained from the previous research work (Medvedeva, Vols & Wieling, 2018)
on predicting the judgments of ECHR legal cases.
It contained training and test datasets for each of Articles 2, 3, 5, 6, 8, 10, 11,
13 and 14, which correspond to different Human Rights. For




          Table 3.1. Count of cases in Training and Test dataset for ECHR



each article, these datasets contained text files of different published cases along
with their declared judgments by ECHR. They were obtained from HUDOC
website on September 11, 2017 (https://www.dropbox.com/s/lxpvvqdwby30157/crystal_ball_data.tar.gz).
All the text files present in the dataset were in a structured format where each
file of any article had the same subsections. These subsections were Procedures,
Circumstances, Relevant Laws and Facts, which were treated as features of the
dataset. The target value of any case was
of type binary and could take any of the two-class values namely ‘violation’
or ‘non-violation’, implying violation or non-violation of an article. The total
number of judicial cases for all the articles in training and test dataset were
3132 and 8400 respectively. The distribution of cases for each article of ECHR
for training and test dataset can be understood from Table 3.1. The training
dataset was a balanced dataset and the test dataset contained cases with only
‘violation’ target for all the articles except Article 14 which contained cases with
only the ‘non-violation’ target. Other articles, namely Articles 4, 7, 9, 12 and 18,
were dropped as they contained very few cases.
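    As an illustration of how the case documents and their binary labels can be read into memory, the minimal sketch below assumes a hypothetical directory layout with 'violation' and 'non-violation' subfolders of text files per Article; the actual layout of the Crystal Ball archive may differ, so the paths are placeholders.

```python
from pathlib import Path

def load_cases(article_dir):
    """Read case texts and binary labels for one Article.

    Assumes a hypothetical layout with 'violation' and 'non-violation'
    subfolders of .txt files; the real archive layout may differ.
    """
    texts, labels = [], []
    for label in ("violation", "non-violation"):
        for path in sorted(Path(article_dir, label).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8", errors="ignore"))
            labels.append(1 if label == "violation" else 0)
    return texts, labels

# Example: training cases for Article 3 (the path is a placeholder)
train_texts, train_labels = load_cases("crystal_ball_data/Article3/train")
```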

   3.2 Data Preparation
The existing dataset was loaded and visualized for better understanding. It was
observed that the data needed to be cleaned to improve its quality and text pre-
processing was required to convert it into a form suitable for modeling. Since
this study involved a comparison of previous work with the new solution, the
previous work has been reproduced. Data preparation for SVM model was very
different from the data preparation performed for CNN model.

Data Preparation for SVM Models
The text data was organized in the form of various features namely Procedures,
Circumstances, Relevant Laws and Facts. All these features were not equally
important in predicting the decision of a judicial case of ECHR.


                    Table 3.2. Best parameters for SVM models


In the previous work, the best predictors were identified for each of the articles. In order
to reproduce the previous work, only the data corresponding to the best pre-
dictor was retained. The text data was represented using N-grams as this was
the representation used in the prior work and Tf-Idf Vectorizer was used to re-
move irrelevant N-grams from text. A grid search operation was conducted over
different parameters for SVM model for each article and the best parameters
found are described in Table 3.2. The data pre-processing for SVM model was
performed as per the results of Grid Search operation.
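A minimal sketch of this pipeline using scikit-learn is shown below; the n-gram ranges, minimum document frequencies and C values in the grid are illustrative and are not the exact grid or best parameters reported in Table 3.2.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# N-gram tf-idf features feeding a linear-kernel SVM, tuned by grid search.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True)),
    ("svm", LinearSVC()),
])
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2), (1, 3)],  # illustrative grid values
    "tfidf__min_df": [2, 5],
    "svm__C": [0.1, 1, 10],
}
search = GridSearchCV(pipeline, param_grid, cv=10, scoring="accuracy", n_jobs=-1)
search.fit(train_texts, train_labels)
print(search.best_params_, search.best_score_)
```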
Data Preparation for CNN Models

As discussed in the literature review, the advantage of ANNs lies in the fact that
they can automatically extract important features from text without having to
manually engineer them. Because of this reasoning, the whole text data was consid-
ered for building predictive CNN models.
    It is a well-known fact that word embeddings are good at capturing the con-
text of text data (Rudkowsky et al., 2018), therefore this text representation has
been chosen for building CNN model. They were assumed to provide better pre-
diction accuracy as they captured relevant information from data. Initially, all
the data from different subsections were extracted and merged together for each
case as no feature selection was to be performed. Then, data cleaning operation
was performed where unwanted symbols, white spaces, and digits were removed.
Various techniques of NLP were employed for text data pre-processing. Punctu-
ation marks were removed using regular expressions, all the text was converted
to lowercase, pre-defined stop word list was used to remove stop words and all
the words were lemmatized. Lemmatization was implemented instead of Stem-
ming as it is considered more effective in areas like legal domain where language
plays an important role (Plisson, Lavrac & Mladenic, 2004). Since there were
too many words in the text data which could add noise, ten percent of the words
with lowest tf-idf scores were removed for all the cases. This helped in removing
less important document specific words which might not have been present in
the pre-defined stop word list, which contains only the most commonly used words.
Tf-Idf scores were used instead of other methods such as Zipf’s law for generating
the list of irrelevant words because tf-idf determines the significance of a word not
only based on the document in which it appears but also across the entire collection
of documents. The words with the lowest tf-idf scores have the least relevance
across all documents, hence they were eliminated.
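    The pre-processing steps described above can be sketched as follows. The snippet assumes NLTK's English stop word list and WordNet lemmatizer, and interprets the tf-idf filtering as dropping the ten percent of vocabulary terms with the lowest mean tf-idf across documents, which is one plausible reading of the procedure rather than the study's exact implementation.

```python
import re

import numpy as np
from nltk.corpus import stopwords          # requires nltk.download("stopwords")
from nltk.stem import WordNetLemmatizer    # requires nltk.download("wordnet")
from sklearn.feature_extraction.text import TfidfVectorizer

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def clean(text):
    # Strip punctuation and digits, lowercase, remove stop words, lemmatize.
    text = re.sub(r"[^a-zA-Z\s]", " ", text).lower()
    return " ".join(lemmatizer.lemmatize(t) for t in text.split() if t not in stop_words)

cleaned = [clean(t) for t in train_texts]

# Drop the 10% of vocabulary terms with the lowest mean tf-idf score.
vec = TfidfVectorizer()
scores = np.asarray(vec.fit_transform(cleaned).mean(axis=0)).ravel()
vocab = np.array(vec.get_feature_names_out())
low_value = set(vocab[scores <= np.quantile(scores, 0.10)])
cleaned = [" ".join(w for w in doc.split() if w not in low_value) for doc in cleaned]
```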
    Various word embedding text representations have been used for preparing
data for CNN models. The pre-trained word embeddings used are fasttext, GloVe
and Word2vec. Since these embeddings have a vocabulary which is defined be-
forehand using data from different domains, a customized word embedding has
also been learned in order to build the legal domain-specific vocabulary. This
takes care of the fact that the pre-trained embeddings may not contain the
words of legal domain and hence, they may or may not be good representations
for ECHR data. All these representations have been implemented on the pre-
processed text data. Different CNN models were built in the later stages with
each of these word embeddings and a comparison was made to determine their
effectiveness.
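    A custom embedding of the kind described above could be trained with gensim as sketched below; the vector size, window and minimum count are illustrative values, not the parameters used in the study, and the output path is a placeholder.

```python
from gensim.models import Word2Vec

# Domain-specific embedding learned from the pre-processed case texts.
sentences = [doc.split() for doc in cleaned]
custom_w2v = Word2Vec(sentences, vector_size=300, window=5, min_count=2, workers=4)
custom_w2v.wv.save_word2vec_format("echr_custom_embedding.txt")  # placeholder path
```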

   3.3 Modeling

Once the data pre-processing was carried out and text data was converted into
vectors of real numbers using appropriate text representation, a model trained
on SVM with linear kernel and a model trained on CNN were designed for each
article to answer the research question.
SVM Modelling
For performing a comparative analysis, it was important to reproduce the previ-
ous work. To achieve this, an experiment was conducted to build an SVM model
with linear kernel for each article of ECHR. All the parameters found using grid
search as presented in Table 3.2 were employed in building SVM models. In
the previous work, the performance of the models was evaluated using accuracy
as the performance metric, considering the dataset was balanced in nature. To
evaluate the performance of SVM model, 10-fold cross-validation was performed
to determine the training accuracy of the model. A total of 9 SVM models, one
for each article, were fitted on the training data. To check the testing accuracy,
each of the models trained on SVM was applied on the test dataset and test-
ing accuracies were reported. Confusion matrices and Classification reports for
training and test datasets were generated to better understand the performance
of the models. The results (training accuracies) obtained indicated that
the experiment was successful at replicating the original work.
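The evaluation described above can be sketched as follows, reusing the grid-searched pipeline from the earlier snippet; test_texts and test_labels are assumed to have been loaded analogously to the training data (e.g. with the hypothetical load_cases helper shown before).

```python
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score

# 10-fold cross-validated training accuracy for the reproduced SVM model.
best_svm = search.best_estimator_
cv_scores = cross_val_score(best_svm, train_texts, train_labels, cv=10, scoring="accuracy")
print("Mean training accuracy:", cv_scores.mean())

# Fit on the full training set and report test-set performance.
best_svm.fit(train_texts, train_labels)
test_pred = best_svm.predict(test_texts)
print(confusion_matrix(test_labels, test_pred))
print(classification_report(test_labels, test_pred))
```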

CNN Modelling

The CNN modelling was carried out in two different experiments. In the first
experiment, a comparison was made between different word embedding repre-
sentations to identify the best text representation for training CNN model for
ECHR problem. In the second experiment, the best-found text representations
were used to determine the training accuracy of CNN models to compare their
performance with the previous work based on SVM models.
    After data preparation, different word embeddings like fasttext, Word2Vec,
GloVe, and custom trained embedding were used for text representation. A con-
servative CNN architecture was considered as a starting point for all articles
for the ease of comparison. Since DL models usually take more time to build
than ML algorithms, the holdout method (75:25 train-test split) was considered for
evaluating the performance of the CNN models. The CNN models were fitted
on the data and architecture changes were performed to find the best models.
The criteria chosen to select the best model was that if the training
accuracy of the CNN model for any of the considered text represen-
tations was more than that of the SVM model, the architecture was
selected. If the model architecture was considered unsuitable for data, either
the hyperparameters or the number of layers in the architecture were modified
manually. Early Stopping and Checkpointing were used as the callbacks to mon-
itor the model performance. Since accuracy was used as a performance measure
for SVM models, the same metric has been used for this study to ensure consis-
tency and ease of comparison. The best performing CNN model as per the
selection criteria considered was found similar for all Articles except
5 and 11. The best-found word embedding was Word2Vec for articles
2, 8 and 10, GloVe for article 3, fasttext for articles 5 and 11, and
custom for articles 6, 13 and 14.
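    As a rough illustration, a conservative single-block CNN of the kind described above could look as follows in Keras; the vocabulary size, sequence length and layer sizes are illustrative assumptions, since the exact architectures selected per Article are not reproduced here, and the checkpoint filename is a placeholder.

```python
import numpy as np
from tensorflow.keras import callbacks, layers, models
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

MAX_WORDS, MAX_LEN, EMB_DIM = 20000, 2000, 300  # illustrative sizes

# Tokenize and pad the pre-processed case texts.
tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(cleaned)
X = pad_sequences(tokenizer.texts_to_sequences(cleaned), maxlen=MAX_LEN)
y = np.array(train_labels)

# Single convolutional block over the embedded token sequence.
model = models.Sequential([
    layers.Embedding(MAX_WORDS, EMB_DIM),  # weights can be initialised from a pre-trained embedding matrix
    layers.Conv1D(128, 5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

cbs = [
    callbacks.EarlyStopping(monitor="val_accuracy", patience=3, restore_best_weights=True),
    callbacks.ModelCheckpoint("best_cnn.h5", monitor="val_accuracy", save_best_only=True),
]
# 75:25 holdout, as in the first CNN experiment.
model.fit(X, y, validation_split=0.25, epochs=20, batch_size=32, callbacks=cbs)
```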
    To get a common ground for such a comparison, same CNN architectures
for corresponding articles with the best-found text representations were used.
CNN models were then trained and evaluated on the data using 10-fold cross-
validation, which was used in the previous work. This helped in attaining training
and testing accuracy which could be compared with the corresponding values
of SVM models. Based on the results, it was found that the best-found
CNN models performed better at predicting the judgements of ECHR
for all articles as compared to reproduced SVM models.

    3.4 Evaluation

After the modeling phase, it was found that CNN models achieved higher train-
ing accuracy than SVM models. To determine if such differences were actual and
not due to random chance, statistical significance difference tests were performed.
For such tests to be performed, several samples of training accuracies for both
SVM and CNN models for each article were required. To achieve this, 2 times
repeated 10-fold cross-validation was performed to evaluate the performance of
models. For each article, 20 (2*10) such training accuracies were obtained for
SVM models and the best-found CNN models respectively. To perform differ-
ence tests, it was important to understand if the distribution of data samples was
Gaussian. Visual normality inspections like histograms, probability distributions
and QQ plots were performed. To further validate normality, normality test such
as Shapiro-Wilk test was used on the distribution of training accuracies of both
SVM and CNN models as this test is considered to be a highly powerful normal-
ity test (Mendes & Pala, 2003). If the p-value was less than 0.05, then one could
reject normality hypothesis else one failed to reject it. The distribution was not
normal for SVM models for Article 2 and 5, and CNN models for Article 5 and
11. For SVM and CNN models of all other articles, the distribution was found
normal. If the distribution was found Gaussian for both the models for each
article, then parametric difference test was conducted otherwise non-parametric
difference test was conducted. Although the same dataset was used for SVM and
CNN models, the data for both were processed in different ways and was shuf-
fled randomly before training, hence independent samples difference test was
considered. Student’s t-Test was considered as parametric difference test and
Mann-Whitney U Test was considered as non-parametric difference test for such
case. The p-value obtained from the difference test was indicative of whether the
difference in the training accuracies was statistically significant. If the p-value
was less than 0.05, then it was stated with confidence that there was enough
evidence to reject the null hypothesis and there was statistically significant dif-
ference in the training accuracy of both the models. If value was more than 0.05,
then there was not enough evidence to reject the null hypothesis.
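    The decision logic described above can be sketched with SciPy as follows; svm_acc and cnn_acc stand for the 20 cross-validated training accuracies per model and are placeholder names.

```python
from scipy.stats import mannwhitneyu, shapiro, ttest_ind

def compare_models(svm_acc, cnn_acc, alpha=0.05):
    """Choose the parametric or non-parametric independent-samples test
    depending on whether both accuracy samples pass Shapiro-Wilk."""
    both_normal = shapiro(svm_acc).pvalue > alpha and shapiro(cnn_acc).pvalue > alpha
    if both_normal:
        _, p = ttest_ind(svm_acc, cnn_acc)      # Student's t-Test
    else:
        _, p = mannwhitneyu(svm_acc, cnn_acc)   # Mann-Whitney U Test
    return p, p < alpha                         # significant difference if p < 0.05

# svm_acc and cnn_acc: 20 accuracies each from 2x repeated 10-fold cross-validation.
```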
    The findings could be related to the research question as the statistical tests
helped in comparing both SVM and CNN models and determining whether there
was a statistically significant difference in their training accuracies. If there was
such a difference, then it could be concluded whether the solution in the re-
ferred paper or the solution proposed in this study was better at predicting the
decisions of ECHR cases.
4   Results

This section provides an analysis of the results obtained and uses them to validate
the hypothesis and answer the research question. It was observed that the best-
found CNN models achieved a higher training accuracy than SVM models for
all articles when their performance was evaluated using 10-fold cross-validation
technique. To validate whether such difference in mean accuracies was statisti-
cally significant, difference tests were performed for all the articles. Based on the
results obtained, the hypothesis assumed for different articles can be confirmed
or refuted as below:
    Article 2, 10, 11 and 14: Experimental results provided sufficient statis-
tical evidence to reject the null hypothesis. A model trained with CNN, using
Word embedding as text representation, achieved a statistically significant higher
training accuracy than a model trained with SVM having the linear kernel, using
N-gram as text representation, for predicting whether an Article of ECHR has
been violated or not.
    Article 3, 5, 6, 8, and 13: Experimental results failed to provide sufficient
statistical evidence to reject the null hypothesis. A model trained with CNN,
using Word embedding as text representation, did not achieve a statistically sig-
nificant higher training accuracy than a model trained with SVM having the
linear kernel, using N-gram as text representation, for predicting whether an
Article of ECHR has been violated or not.
    For Articles 3, 5, 6, 8 and 13, there exists a possibility that a statistically
significant higher training accuracy could have been achieved if a different ar-
chitecture of CNN model was considered. This requires further experiments and
observations to be validated, which can be carried out as part of future work.
    The results of hypothesis testing indicate that overall, the models trained on
CNN achieved either similar or higher training accuracy than models trained
on SVM with linear kernel for predicting the judgments of ECHR. The aver-
age training accuracy of CNN models was 82% which has been found
higher than the average training accuracy of SVM models which was
75%. Thus, the CNN models trained as part of this study prove to
be better predictive modeling solutions compared to the SVM models
proposed in previous research (Medvedeva, Vols & Wieling, 2018) for
the problem of legal forecasting for ECHR.


5   Conclusion

    5.1 Contributions and impact
Various experiments have been conducted in this study to develop a highly ac-
curate legal forecasting system for ECHR. Based on the statistical test results, it
has been found that the CNN models achieved either similar training accuracies
(Articles 3, 5, 6, 8 and 13) or higher training accuracies (Articles 2, 10, 11 and
14) than SVM models. Overall, CNN models built as part of this study achieved
an average training accuracy of 82% which is higher than 75% as achieved by
SVM models designed in the previous research. This is a significant improvement
in the accuracy with which the outcomes of the legal cases are predicted. This
study makes a significant contribution to the existing body of research on the
application of AI in the legal domain by building a legal forecasting system
for ECHR using a DL algorithm and word embeddings, which has not been done
before. The results act as proof of the positive impact that the improved predic-
tion accuracy can make in the legal domain. If such CNN models are deployed
as legal assistant systems in real time, they can speed up the litigation process and
help in improving the current judicial state of ECHR. Also, the study opens
various avenues of future research which can make great strides in automating
many tasks of legal domain.

   5.2 Future Work and recommendations

As part of future work, some recommendations can be suggested for further ex-
ploration. First, the CNN models were not trained for Articles 4, 7, 9, 12 and
18 due to the small number of data samples available in the dataset considered for this study.
Therefore, more published case judgments can be collected from ECHR website
in order to design a comprehensive legal assistant system. Second, the part ‘Cir-
cumstances’ was identified as the most important part of text for determining
the decision of a legal case (Aletras, Tsarapatsanis, Preoţiuc-Pietro & Lampos,
2016) using SVM models for Articles 3, 6 and 8. A similar experiment can be
conducted to identify the best part(s)/predictor(s) to train CNN model for each
article, instead of using the whole data. It can be verified whether CNN mod-
els trained on the best predictor(s) achieve higher training accuracy compared
to the results obtained in this study. Third, since the dataset considered was
collected in September 2017, further data augmentation can be performed by
collecting data published on the ECHR website after this date. A larger dataset
may improve the learning of CNN models, and it can be investigated whether
they yield better accuracy results. Fourth, the best-found CNN models in the
study were the ones which achieved higher training accuracy than SVM models,
for specific word embeddings. There exists a possibility that even better results
can be obtained if further architecture changes are made to the CNN models
for each article. Also, it might be possible that with custom trained word em-
beddings, such architectural changes can significantly improve the performance
of the models. The experimental study can be conducted on this in future and
the training accuracies obtained can be compared with the results of this study.
Fifth, since GRU recurrent models usually perform well with text data, further
exploration can be performed to find appropriate ways of fitting such models on
training data. It can be identified whether learning long term dependencies in
the text is more useful than learning key-phrases for predicting the judgments
of ECHR. Sixth, the accuracy with which CNN models make decisions can be
compared with that of an actual human judge on various cases of all articles.
This will be useful in determining whether CNN models can be deployed as
standalone systems rather than as legal assistant systems.
References

   Lawlor, R. (1963). What Computers Can Do: Analysis and Prediction of
Judicial Decisions. American Bar Association Journal, 49(4), 337-344.
     Hughes, M., Li, I., Kotoulas, S., & Suzumura, T. (2017). Medical text clas-
sification using convolutional neural networks. Studies in health technology and
informatics, 235, 246-250.
   Amirkhan, R., Hoogendoorn, M., Numans, M. E. & Moons, L. (2017). Using
Recurrent Neural Networks to Predict Colorectal Cancer among Patients. 2017
IEEE Symposium Series on Computational Intelligence (SSCI), 1-8.
   Berman, D. H. & Hafner, C. D. (1989). The Potential of Artificial Intelligence
to Help Solve the Crisis in Our Legal System. Communications of the ACM,
32(8), 928-938.
    Sulea, O., Zampieri, M., Vela, M., Dinu, L. P. & Genabith, J. (2017). Pre-
dicting the Law Area and Decisions of French Supreme Court Cases. Proceedings
of Recent Advances in Natural Language Processing, 716–722.
    Dunn, M., Sagun, L., Şirin, H., & Chen, D. (2017, June). Early predictability
of asylum court decisions. Proceedings of the 16th edition of the International
Conference on Artificial Intelligence and Law, 233-236.
    Undavia, S., Meyers, A., & Ortega, J. E. (2018). A comparative study of
classifying legal documents with neural networks. In 2018 Federated Conference
on Computer Science and Information Systems (FedCSIS), 515-522. IEEE.
   Sharma, R. D., Mittal, S., Tripathi, S., & Acharya, S. (2015). Using Modern
Neural Networks to Predict the Decisions of Supreme Court of the United States
with State-of-the-Art Accuracy. International Conference on Neural Information
Processing, 475-483. Springer, Cham.
   Aletras, N., Tsarapatsanis, D., Preoţiuc-Pietro, D., & Lampos, V. (2016).
Predicting judicial decisions of the European Court of Human Rights: A Natural
Language Processing perspective. PeerJ Computer Science, 2(e93), 1-19.
    Medvedeva, M., Vols, M., & Wieling, M. (2018). Judicial decisions of the
European Court of Human Rights: Looking into the crystal ball. Proceedings of
the Conference on Empirical Legal Studies in Europe 2018, 1-24.
   Rudkowsky, E., Haselmayer, M., Wastian, M., Jenny, M., Emrich, Š., & Sedl-
mair, M. (2018). More than bags of words: Sentiment analysis with word em-
beddings. Communication Methods and Measures, 12(2-3), 140-157.
  Plisson, J., Lavrac, N., & Mladenic, D. (2004). A Rule based Approach to
Word Lemmatization. Proceedings of IS-2004, 83-86.
    Mendes, M., & Pala, A. (2003). Type I error rate and power of three normality
tests. Pakistan Journal of Information and Technology, 2(2), 135-139.