<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Surviving the Legal Jungle: Text Classification of Italian Laws in Extremely Noisy Conditions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Riccardo Coltrinari</string-name>
          <email>( railcecsasradnod.rcoo.latnrtiinnaorrii )@studenti.unicam.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Antinori</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Celli</string-name>
          <email>fabio.celli@maggioli.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science dept., University of Camerino</institution>
          ,
          <addr-line>Camerino, MC</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Computer Science dept., University of Camerino</institution>
          ,
          <addr-line>Camerino, MC</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Research and Development, Maggioli S.p.A.</institution>
          ,
          <addr-line>Santarcangelo, RN</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present a method based on Linear Discriminant Analysis for legal text classification of extremely noisy data, such as duplicated documents classified in different classes. The results show that Linear Discriminant Analysis achieves very good performance in both clean and noisy conditions, when used as a classifier in ensemble learning and in multi-label text classification.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
We address the text categorization of
business-oriented legal documents in Italian,
organized in a custom and overlapping hierarchy of
product categories. A typical approach to similar tasks
is to exploit resources such as EUROVOC
        <xref ref-type="bibr" rid="ref10">(Daudaravicius, 2012)</xref>
        , a multilingual thesaurus
consisting of over 6700 hierarchically-organised class
descriptors used by many organizations of the
European Union (EU) for the classification and
retrieval of official documents. Our editorial
system has a hierarchy of 23 product categories and
more than 20600 labels, manually annotated and
customized for different clients over more than 15
years, hence it is not possible to exploit resources
like EUROVOC to categorize our documents.
      </p>
      <p>In this paper, we propose a fast and efficient
method for document classification for noisy data
based on Linear Discriminant Analysis, a
dimensionality reduction technique that has been
employed successfully in many domains, including
neuroimaging and medicine. We believe that our
contribution will be useful to the NLP
community in the context of document categorization as
well as automatic ontology population, in
particular when dealing with very noisy data.</p>
      <p>The paper is structured as follows: in Section
1.1 we present related work in the field of
text classification and the potential of Linear
Discriminant Analysis, in Section 2 we describe the
datasets we used, in Section 3 we report and
discuss the results of our classification experiments,
and in Section 4 we draw our conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        There are many applications of NLP in the
legal text domain, such as the creation of
ontologies for knowledge extraction
        <xref ref-type="bibr" rid="ref14">(Lenci et al., 2009)</xref>
        or legal reasoning
        <xref ref-type="bibr" rid="ref18">(Palmirani et al., 2018)</xref>
        . Other
tasks include dependency parsing
        <xref ref-type="bibr" rid="ref11">(Dell’Orletta et
al., 2012)</xref>
        , deception detection
        <xref ref-type="bibr" rid="ref12">(Fornaciari et al.,
2013)</xref>
        and semantic annotation exploiting external
resources like FrameNet
        <xref ref-type="bibr" rid="ref24">(Venturi, 2011)</xref>
        . In this
domain, the most popular way to perform text
categorization is to use ontologies: for example, many
works have used EUROVOC to label documents in several
languages
        <xref ref-type="bibr" rid="ref22">(Steinberger et al., 2013)</xref>
        with one label
for each document, in order to train SVMs
        <xref ref-type="bibr" rid="ref4">(Boella
et al., 2013)</xref>
        or deep learning models
        <xref ref-type="bibr" rid="ref6">(Caled et al.,
2019)</xref>
        , for the prediction of labels at different
levels of granularity in the label hierarchy. Another
approach is to use the judgments of the Supreme
Court as gold standard labels, thus reducing the
complexity of the task, and then train machine
learning models, such as SVMs, to perform
classification
        <xref ref-type="bibr" rid="ref23">(Sulea et al., 2017)</xref>
        . It is known that
active learning does not reach a good performance in
the legal domain
        <xref ref-type="bibr" rid="ref7">(Cardellino et al., 2015)</xref>
        , but it is
possible to align different resources to perform
ontology population or expansion
        <xref ref-type="bibr" rid="ref8">(Cardellino et al.,
2017)</xref>
        . The state-of-the-art in text classification
ranges from 40% to 85% or more, depending on
the complexity and size of the dataset and on
the number of document classes
        <xref ref-type="bibr" rid="ref1">(Adhikari et al.,
2019)</xref>
        . The results of a noise-introduction
simulation study revealed that substituting up to 40% of
words with random text strings yields only a small
decrease in text classification performance, while
the substitution of more than 40% of the text yields
a dramatic decrease in classification performance
        <xref ref-type="bibr" rid="ref2">(Agarwal et al., 2007)</xref>
        .
      </p>
      <p>
        A similar task, Extreme Multi-Label Text
Classification (XMTC), consists in the classification
of documents annotated with multiple tags.
Recent experiments of XMTC with Convolutional
Neural Networks on a dataset of 57k legal
documents annotated with multiple concepts from
EUROVOC, revealed that word embeddings extracted
with label-wise attention Networks
        <xref ref-type="bibr" rid="ref17">(Mullenbach
et al., 2018)</xref>
        lead to the best overall performance,
compared to pre-trained word embeddings,
hierarchical word embeddings and Max-Pooling
scorers that produce section-based word embeddings
        <xref ref-type="bibr" rid="ref9">(Chalkidis et al., 2019)</xref>
        . It has been demonstrated
in more than one context that cNNs perform well
for text categorization, but also that there is no
single algorithm that performs best across
the combinations of data sets and training sample
sizes
        <xref ref-type="bibr" rid="ref13">(Keeling et al., 2019)</xref>
        . The rationale behind
the good performance of label-wise attention
networks is their ability to maximise the difference
between the words/features associated with different
labels. A very similar, but faster, approach is Linear
Discriminant Analysis
        <xref ref-type="bibr" rid="ref3">(Balakrishnama and
Ganapathiraju, 1998)</xref>
        , a feature selection and
classification technique that has been successfully used for
the incremental classification of large streams of
data
        <xref ref-type="bibr" rid="ref19">(Pang et al., 2005)</xref>
        , to find identity patterns in
images before the advent of deep learning
        <xref ref-type="bibr" rid="ref21">(Prince
and Elder, 2007)</xref>
        and as feature selection technique
for discriminating fMRI response patterns to
visual stimuli
        <xref ref-type="bibr" rid="ref16">(Mandelkow et al., 2016)</xref>
        .
      </p>
      <p>
        Linear Discriminant Analysis (henceforth
LDA) is a widely accepted dimensionality
reduction and classification method, which aims to
find a transformation matrix to convert a feature
space to a smaller space by maximising the
between-class scatter matrix while minimising
the within-class scatter matrix
        <xref ref-type="bibr" rid="ref5">(Boroujeni et al.,
2018)</xref>
        . Criticism of this technique
emphasizes that it suffers from the domination
of the largest objectives, in particular when close
class pairs tend to overlap in a feature subspace,
but this can be solved with various optimizations,
including eigenvalue decomposition, among
others
        <xref ref-type="bibr" rid="ref15">(Li et al., 2017)</xref>
        .
      </p>
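The scatter-matrix intuition described above can be illustrated with off-the-shelf tools. The following minimal sketch (not the paper's setup) uses scikit-learn's LinearDiscriminantAnalysis on synthetic data, assuming numpy and scikit-learn are available:

```python
# Minimal sketch of LDA as both classifier and dimensionality reducer,
# on synthetic data with three shifted-mean classes. LDA seeks the
# projections that maximise between-class scatter relative to
# within-class scatter.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(50, 10)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 50)

lda = LinearDiscriminantAnalysis()
Z = lda.fit_transform(X, y)  # at most n_classes - 1 = 2 components
print(Z.shape)               # (150, 2)
print(lda.score(X, y))       # training accuracy on this easy data
```

With well-separated class means, the two discriminant components are enough to classify the synthetic points almost perfectly.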
      <sec id="sec-2-1">
        <title>Data</title>
        <p>Our dataset consists of 2030 Italian legal
documents with an average of 800 words each. We
have 23 classes representing products, manually
annotated over 15 years; every document is
categorized in one or more classes. Classes are not
balanced, but their distribution is proportional to
the whole editorial system, which consists of 443.7k
documents. We extracted such a small dataset
from the editorial system because we plan to
update our models very frequently, using a small
portion of documents each time in order to save
computational power and time. Figure 1 reports the
distribution of the classes in our dataset.</p>
        <p>Since documents can fall under more than one
class, we have 43% of documents repeated under
different classes. We tested the performance of
different classifiers under two different conditions:
noisy (with repeated documents) and clean
(without the repeated documents).</p>
      </sec>
      <sec id="sec-2-2">
        <title>Experiments and Discussion</title>
        <p>In both cases (noisy and clean) we
preprocessed the text, deleting punctuation and
Italian stopwords. We did not use stemming or
lemmatization, since their usage led to a
degradation of results. We formalize the task in two
ways: a simple multinomial classification, where
we train a classifier to predict one class per
document, and a multi-label classification, where we
produce a score ranking of labels for each
document and evaluate if the gold standard label occurs
in the first N positions.</p>
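The preprocessing step described above (delete punctuation and Italian stopwords, no stemming or lemmatization) might be sketched as follows; the stopword list here is a tiny illustrative subset, not the list actually used in the paper:

```python
# Hedged sketch of the preprocessing: lowercase, strip punctuation,
# drop Italian stopwords. The stopword set is illustrative only.
import re

ITALIAN_STOPWORDS = {"il", "la", "di", "e", "che", "in", "un", "una", "per", "i"}

def preprocess(text: str) -> list[str]:
    text = re.sub(r"[^\w\s]", " ", text.lower())  # delete punctuation
    return [t for t in text.split() if t not in ITALIAN_STOPWORDS]

tokens = preprocess("La legge, in vigore per il 2020, regola i contratti.")
print(tokens)  # ['legge', 'vigore', '2020', 'regola', 'contratti']
```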
        <p>
          We tested different feature settings and algorithms
with 10-fold cross validation (10f-cv) and a
70%-30% training-testing split in the clean and noisy
dataset conditions. Table 1 reports the results in
terms of accuracy, that is to say the percentage
of documents correctly classified. In both
conditions the majority baseline is very low,
ranging from 4.6% to 8.3%. First we experimented
with pre-trained GloVe word vectors as features
(vector size 200). The GloVe
Project provides word vectors of different
dimensions for word representation, trained on massive
web datasets
          <xref ref-type="bibr" rid="ref20">(Pennington et al., 2014)</xref>
          . The word vectors we used here have been
pre-trained by the GloVe Project on two massive
corpora, Wikipedia 2014 and Gigaword 5. As
can be seen in Table 1, in the GloVe embeddings setting
we used the following classification algorithms:
cNN (with 2 convolutional layers with ReLU
activation, 1 pooling layer and 1 output layer), rNN
(with 1 rNN sequence layer, 1 LSTM layer with
tanH activation and 1 rNN output layer), bayesian
networks, naïve bayes, SVMs, random forest and
LDA. In general, Deep Learning algorithms
suffer from the small amount of data used in the experiment,
but surprisingly, cNNs performed badly and rNNs
worked better, indicating that the sequentiality of
text plays an important role.
        </p>
        <p>Among the other
classification algorithms, it turned out that random
forest and LDA obtained the best performance,
proving that the ability of the algorithm to
generalize is crucial. The generally low accuracies
obtained with these features might indicate that the
contexts of our documents represented by word
embeddings are not very discriminative. The
results increased significantly in the classification
with the TF-IDF scores of 4700 words, especially
with SVMs as the algorithm. This suggests that using
more features brings better results without
overfitting the data, as shown by the similar accuracies
obtained with 10-fold cross validation and with
the training-test split. Next we experimented with
feature selection, using LDA and Pearson's
correlations to select the best 200 words for the
prediction. Results show that, in this feature setting,
random forests are the best classification algorithm
and that LDA outperforms correlations as a feature
selection algorithm. Furthermore, as can be seen
in the last part of Table 1, we were able to reach
state-of-the-art results with an ensemble learning
scheme: using LDA as a classifier, we transformed
the initial space of 200 word features, previously
selected with LDA, into a space of 23 binary
features corresponding to the final classes. On top of
that we applied different classification algorithms,
finding that SVM is the best performing one in
the noisy dataset, while random forest obtained the
best performance in the clean dataset.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Multi-Label Classification</title>
      <p>The Multi-Label classification task is structured as
follows: for each document label in the training
set, we create a Bag-of-Words (BoW) from the
words of its associated documents, then we use
TF-IDF scores to weight every word within the
BoW, obtaining a word ranking that we use for
feature selection, since words with higher values
better characterize a particular label. Then we apply
LDA classification but, unlike the previous
experiment, here the prediction returns a list of all the
labels, ordered by the total score achieved; we call
this algorithm score ranking. Since the classifier
returns a list as an outcome, but the editors (our
customers) want to choose one or more labels from
this list, we have to evaluate whether the gold standard
label occurs in the returned list; thus we can assign
multiple labels to a document and test whether the
original one is present or not. In this sense, the
score ranking classifier is evaluated as a
multi-class classifier (so the metrics in Table 2 are
actually Hit@N metrics, where N is the size of the
returned list), but the returned list is used by the
end users to simulate a Multi-Label functionality,
leaving to the editors the choice of the best labels
to assign among the ones returned. The results of
this experiment, reported in Table 2, show that the
performance with 1 label is in line with the
ensemble learning setting of the Multinomial
classification, and that the score ranking system also works well
in the noisy dataset, as the results are very
similar in both noisy and clean conditions. The
performance increases by an average of +3.9% when
keeping more than one label. In general, we
observe that using 500 or 1000 words per label yields
similar results in our small dataset, but using more
words can help to capture more nuances in the text,
which might be useful in larger sets of documents.
We also observe that 1000 words per label increase
the results in the clean condition, while 500 words
per label are enough in the noisy condition.</p>
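The Hit@N evaluation described above can be sketched as follows: the gold label counts as correct if it appears among the top-N labels ranked by classifier score. Any classifier exposing per-class scores (e.g. scikit-learn's `predict_proba`) could supply the score matrix; here it is hard-coded for illustration:

```python
# Minimal sketch of Hit@N evaluation for a score-ranking classifier.
import numpy as np

def hit_at_n(score_matrix, gold_labels, n):
    """Fraction of documents whose gold label is in the top-n scores."""
    top_n = np.argsort(score_matrix, axis=1)[:, ::-1][:, :n]
    hits = [gold in row for gold, row in zip(gold_labels, top_n)]
    return sum(hits) / len(hits)

doc_scores = np.array([[0.1, 0.7, 0.2],   # gold 1 -> hit at N=1
                       [0.5, 0.3, 0.2],   # gold 2 -> hit only at N=3
                       [0.2, 0.3, 0.5]])  # gold 2 -> hit at N=1
gold = [1, 2, 2]
print(hit_at_n(doc_scores, gold, 1))  # 0.6666666666666666
print(hit_at_n(doc_scores, gold, 3))  # 1.0
```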
      <sec id="sec-3-1">
        <title>Conclusion and Future Work</title>
        <p>
          We experimented with various settings, feature
selection methods and classification algorithms, and
we found a method to extract good models in
extremely noisy conditions, even with documents
repeated under different labels. LDA proved to
be a valuable classification and feature selection
technique, but we obtained the best performance
when LDA is combined with other algorithms.
The results we obtained with the score ranking
classification are in line with the state-of-the-art,
but our method is more suitable for small and
noisy datasets. In the future we plan to apply the
score ranking algorithm on a larger dataset and to
use it in a real multi-label environment
comparing the results with the state-of-the-art of Extreme
Multi-Label Document Classification
          <xref ref-type="bibr" rid="ref9">(Chalkidis
et al., 2019)</xref>
          . We also plan to make comparisons
with more recent state-of-the-art deep learning
techniques and to apply semantic indexing to the
documents to check for improvements.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Ashutosh</given-names>
            <surname>Adhikari</surname>
          </string-name>
          , Achyudh Ram,
          <string-name>
            <given-names>Raphael</given-names>
            <surname>Tang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Docbert: Bert for document classification</article-title>
          . arXiv preprint arXiv:1904.08398.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Sumeet</given-names>
            <surname>Agarwal</surname>
          </string-name>
          , Shantanu Godbole, Diwakar Punjani, and
          <string-name>
            <given-names>Shourya</given-names>
            <surname>Roy</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>How much noise is too much: A study in automatic text classification</article-title>
          .
          <source>In Seventh IEEE International Conference on Data Mining (ICDM 2007)</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>12</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Suresh</given-names>
            <surname>Balakrishnama</surname>
          </string-name>
          and
          <string-name>
            <given-names>Aravind</given-names>
            <surname>Ganapathiraju</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Linear discriminant analysis-a brief tutorial</article-title>
          .
          <source>In Institute for Signal and information Processing</source>
          , volume
          <volume>18</volume>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Guido</given-names>
            <surname>Boella</surname>
          </string-name>
          , Luigi Di Caro, Daniele Rispoli, and
          <string-name>
            <given-names>Livio</given-names>
            <surname>Robaldo</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A system for classifying multi-label text into eurovoc</article-title>
          .
          <source>In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Law</source>
          , pages
          <fpage>239</fpage>
          -
          <lpage>240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Forough</given-names>
            <surname>Rezaei Boroujeni</surname>
          </string-name>
          , Sen Wang,
          <string-name>
            <given-names>Zhihui</given-names>
            <surname>Li</surname>
          </string-name>
          , Nicholas West,
          <string-name>
            <given-names>Bela</given-names>
            <surname>Stantic</surname>
          </string-name>
          , Lina Yao, and
          <string-name>
            <given-names>Guodong</given-names>
            <surname>Long</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Trace ratio optimization with feature correlation mining for multiclass discriminant analysis</article-title>
          .
          <source>In Thirty-Second AAAI Conference on Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Danielle</given-names>
            <surname>Caled</surname>
          </string-name>
          , Miguel Won, Bruno Martins, and Ma´rio
          <string-name>
            <given-names>J</given-names>
            <surname>Silva</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>A hierarchical label network for multi-label eurovoc classification of legislative contents</article-title>
          .
          <source>In International Conference on Theory and Practice of Digital Libraries</source>
          , pages
          <fpage>238</fpage>
          -
          <lpage>252</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Cristian</given-names>
            <surname>Cardellino</surname>
          </string-name>
          , Serena Villata, Laura Alonso Alemany, and
          <string-name>
            <given-names>Elena</given-names>
            <surname>Cabrio</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Information extraction with active learning: A case study in legal text</article-title>
          .
          <source>In International Conference on Intelligent Text Processing and Computational Linguistics</source>
          , pages
          <fpage>483</fpage>
          -
          <lpage>494</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Cristian</given-names>
            <surname>Cardellino</surname>
          </string-name>
          , Milagro Teruel, Laura Alonso Alemany, and
          <string-name>
            <given-names>Serena</given-names>
            <surname>Villata</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Ontology population and alignment for the legal domain: Yago, wikipedia and lkif</article-title>
          .
          <source>In International Semantic Web Conference: Posters Demos and Industry Tracks</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Ilias</given-names>
            <surname>Chalkidis</surname>
          </string-name>
          , Emmanouil Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, and
          <string-name>
            <given-names>Ion</given-names>
            <surname>Androutsopoulos</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Extreme multi-label legal text classification: A case study in eu legislation</article-title>
          .
          <source>In Proceedings of the Natural Legal Language Processing Workshop</source>
          <year>2019</year>
          , pages
          <fpage>78</fpage>
          -
          <lpage>87</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Vidas</given-names>
            <surname>Daudaravicius</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Automatic multilingual annotation of eu legislation with eurovoc descriptors</article-title>
          .
          <source>In EEOP2012: Exploring and Exploiting Official Publications Workshop Programme</source>
          , page
          <volume>14</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Felice</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          , Simone Marchi, Simonetta Montemagni, Barbara Plank, and
          <string-name>
            <given-names>Giulia</given-names>
            <surname>Venturi</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>The splet-2012 shared task on dependency parsing of legal texts</article-title>
          .
          <source>In Semantic Processing of Legal Texts (SPLeT-2012) Workshop Programme</source>
          , page
          <volume>42</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Tommaso</given-names>
            <surname>Fornaciari</surname>
          </string-name>
          , Fabio Celli, and
          <string-name>
            <given-names>Massimo</given-names>
            <surname>Poesio</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>The effect of personality type on deceptive communication style</article-title>
          .
          <source>In Intelligence and Security Informatics Conference (EISIC)</source>
          ,
          <year>2013</year>
          European, pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Robert</given-names>
            <surname>Keeling</surname>
          </string-name>
          , Rishi Chhatwal,
          <string-name>
            <given-names>Nathaniel</given-names>
            <surname>Huber-Fliflet</surname>
          </string-name>
          , Jianping Zhang, Fusheng Wei,
          <string-name>
            <given-names>Haozhen</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shi</given-names>
            <surname>Ye</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Han</given-names>
            <surname>Qin</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Empirical comparisons of cnn with other learning algorithms for text classification in legal document review</article-title>
          .
          <source>arXiv preprint arXiv:1912.09499</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Lenci</surname>
          </string-name>
          , Simonetta Montemagni, Vito Pirrelli, and
          <string-name>
            <given-names>Giulia</given-names>
            <surname>Venturi</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Ontology learning from italian legal texts</article-title>
          .
          <source>Law, Ontologies and the Semantic Web</source>
          ,
          <volume>188</volume>
          :
          <fpage>75</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Zhihui</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Feiping</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiaojun</given-names>
            <surname>Chang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yi</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Beyond trace ratio: weighted harmonic mean of trace ratios for multiclass discriminant analysis</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>29</volume>
          (
          <issue>10</issue>
          ):
          <fpage>2100</fpage>
          -
          <lpage>2110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Hendrik</given-names>
            <surname>Mandelkow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jacco A</given-names>
            <surname>de Zwart</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jeff H</given-names>
            <surname>Duyn</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Linear discriminant analysis achieves high classification accuracy for the bold fmri response to naturalistic movie stimuli</article-title>
          .
          <source>Frontiers in human neuroscience</source>
          ,
          <volume>10</volume>
          :
          <fpage>128</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>James</given-names>
            <surname>Mullenbach</surname>
          </string-name>
          , Sarah Wiegreffe, Jon Duke, Jimeng Sun, and
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Eisenstein</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Explainable prediction of medical codes from clinical text</article-title>
          . arXiv preprint arXiv:1802.05695.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Monica</given-names>
            <surname>Palmirani</surname>
          </string-name>
          , Michele Martoni, Arianna Rossi, Cesare Bartolini, and
          <string-name>
            <given-names>Livio</given-names>
            <surname>Robaldo</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>PrOnto: privacy ontology for legal reasoning</article-title>
          .
          <source>In International Conference on Electronic Government and the Information Systems Perspective</source>
          , pages
          <fpage>139</fpage>
          -
          <lpage>152</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Shaoning</given-names>
            <surname>Pang</surname>
          </string-name>
          , Seiichi Ozawa, and
          <string-name>
            <given-names>Nikola</given-names>
            <surname>Kasabov</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Incremental linear discriminant analysis for classification of data streams</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)</source>
          ,
          <volume>35</volume>
          (
          <issue>5</issue>
          ):
          <fpage>905</fpage>
          -
          <lpage>914</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          , pages
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Simon JD</given-names>
            <surname>Prince</surname>
          </string-name>
          and
          <string-name>
            <given-names>James H</given-names>
            <surname>Elder</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Probabilistic linear discriminant analysis for inferences about identity</article-title>
          .
          <source>In 2007 IEEE 11th International Conference on Computer Vision</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Ralf</given-names>
            <surname>Steinberger</surname>
          </string-name>
          , Mohamed Ebrahim, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Turchi</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>JRC EuroVoc Indexer JEX - a freely available multi-label categorisation tool</article-title>
          . arXiv preprint arXiv:1309.5223.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Octavia-Maria</given-names>
            <surname>Sulea</surname>
          </string-name>
          , Marcos Zampieri, Shervin Malmasi, Mihaela Vela, Liviu P Dinu, and Josef Van Genabith.
          <year>2017</year>
          .
          <article-title>Exploring the use of text classification in the legal domain</article-title>
          .
          arXiv preprint arXiv:1710.09306.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Giulia</given-names>
            <surname>Venturi</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Semantic annotation of Italian legal texts: a FrameNet-based approach</article-title>
          .
          <source>Constructions and Frames</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>46</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>