<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Sentence based System for Measuring Syntax Complexity using a Recurrent Deep Neural Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giosue Lo Bosco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Pilato</string-name>
          <email>giovanni.pilato@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele Schicchi</string-name>
          <email>daniele.schicchig@unipa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Dipartimento di Matematica e Informatica, Università degli Studi di Palermo</institution>
          ,
          <country country="IT">ITALY</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ICAR-CNR - National Research Council of Italy</institution>
          ,
          <addr-line>Palermo</addr-line>
          ,
          <country country="IT">ITALY</country>
        </aff>
      </contrib-group>
      <fpage>95</fpage>
      <lpage>101</lpage>
      <abstract>
<p>In this paper we present a deep neural network model capable of inducing the rules that identify the syntactic complexity of an Italian sentence. Beyond deciding whether a sentence needs simplification, our system gives a score that represents the confidence of the model during the decision-making process and that can be taken as representative of the sentence complexity. Experiments have been carried out on a public corpus created specifically for the text simplification problem.</p>
      </abstract>
      <kwd-group>
<kwd>Text Simplification</kwd>
        <kwd>Neural Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>Text Simplification (TS) is a Natural Language Processing task that aims at making a text more easily understandable for a given target audience by changing the lexical and syntactic content of the original text.</p>
      <p>
The usefulness of TS can be appreciated by different kinds of people, such as non-native speakers or people with language disabilities. For example, people affected by aphasia have difficulties understanding syntactic structure while reading [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], deaf children have trouble comprehending syntactically complex sentences [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and people affected by dyslexia have difficulties reading infrequent and long words.
      </p>
      <p>
For the Italian language, TS is an underdeveloped research area, as is evident from the scarcity of available resources and of developed methodologies. A likely cause is that the English language is more widespread. Nonetheless, work has been done to face different NLP problems in the Italian language [
        <xref ref-type="bibr" rid="ref1 ref3 ref4">1, 3, 4</xref>
        ].
      </p>
      <p>
The problem of evaluating the complexity of a document has already been tackled in the past using indexes like GulpEase [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and Flesch-Vacca [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which are based on structural features of the text such as the average number of syllables per word, the average number of words per sentence, the number of sentences and the average number of characters per word. The problems with these indexes are that they are not suitable for measuring the complexity of a single sentence and that they ignore other important aspects of text complexity, such as how common the words in the text are. Nowadays, the most common index for assessing sentence complexity is READ-IT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]: a Support Vector Machine based system capable of measuring text complexity by taking into account many different text features related to lexical, morpho-syntactic and syntactic aspects. Another system capable of measuring sentence complexity for the Italian language is described in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. It is based on a Recurrent Neural Network that measures the lexical and syntactic complexity of a sentence using only words and punctuation symbols as tokens.
      </p>
<p>In the domain of TS, words like complex and simple should be used keeping in mind that the complexity of a sentence is strictly related to a specific kind of reader, who may have particular needs. Since the corpus we have used contains examples that represent the simplification process for different classes of readers, our system is not specialized for any specific target reader. Nonetheless, the corpus is well suited to the goal of this work, which is to understand the potential of a Neural Network (NN) based model for classifying Italian sentences using only the part-of-speech (PoS) tags, which represent the syntactic aspects of the text.</p>
<p>In this paper, we contribute to the TS field by using a NN to develop a system capable of inducing the patterns which characterize the syntactic complexity of a sentence. Our system classifies a sentence into two classes, difficult-to-read and simple-to-read, and produces a score which represents the confidence of the network during the decision-making process and which can be interpreted as a measure of the complexity of the given sentence.</p>
<p>The paper is organized as follows: section 2 describes the system and our approach to the problem, section 3 explains the test methodology and the results, and the final section gives our conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Proposed Methodology</title>
<p>Our method is based on NN algorithms and is able to discriminate whether an Italian sentence needs to be simplified in order to be more easily understandable by different classes of target readers. Furthermore, the network gives a score that can be interpreted as a measure of sentence complexity and that represents the confidence of the network during decision making.</p>
<p>To manage the task of understanding sentence complexity we have chosen to use Recurrent Neural Networks (RNNs), a class of NN useful for analyzing sequences. In the recent past RNNs have shown their effectiveness in many different linguistic fields, since it is well known that a sentence can be structured as a sequence of tokens such as words, punctuation symbols or part-of-speech tags.</p>
      <sec id="sec-2-1">
        <title>Architecture and Parameters</title>
        <p>
We represent a sentence as the sequence of part-of-speech tags computed using a pre-trained version of TreeTagger [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], a tool for annotating text with part-of-speech tags that has been successfully used to tag many different languages such as German, English and Italian. The tool is customizable and allows the choice of different tag-sets for each supported language. For the Italian language two different tag-sets exist (Baroni and Stein), which we have used separately for parsing the sentences of the corpus.
        </p>
<p>Both tag-sets contain tags that identify linguistic elements such as adverbs, adjectives, verbs and nouns, but they represent these linguistic categories in different ways. For instance, in the description of verbs one tag-set (Baroni) contains 17 different verb categories while the other (Stein) contains 12. In total, the Baroni tag-set contains 52 different categories of part-of-speech tags while the Stein tag-set contains 38.</p>
<p>Each part-of-speech tag produced by TreeTagger is then coded as a vector using one-hot encoding, in which a part-of-speech tag becomes a vector full of 0s except for a single, unique position in which the value is 1. Every sentence is thus represented as the sequence of one-hot encoded vectors that is passed as input to the network. The complete process is shown in figures 1 and 2.</p>
        <p>[Figure 1: an example sentence, "Salve, avrei bisogno di una informazione piuttosto urgente", is turned by TreeTagger into the tag sequence NOM PON VER:cond NOM ... SENT.]</p>
        <p>
          The network that we have used to tackle the problem of evaluating the complexity of an Italian sentence is an RNN based on Long Short-Term Memory (LSTM) artificial neurons [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Networks based on LSTM neurons have shown good results on many sequence modeling tasks. The main features of LSTM are its ability to cope with the vanishing gradient problem [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and its ability to remember dependencies among elements of a sequence that are distant from each other.
        </p>
<p>
          The first layer of the network is made up of 512 LSTM artificial neurons. The outcome of this layer is then handled by a fully connected layer composed of two neurons which use the softmax activation function. Finally, we have applied L2 regularization [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. The network architecture is shown in figure 2. The probability that a sentence belongs either to the difficult-to-read class or to the simple-to-read class, which is given by the last layer of the network, can be interpreted as a cumulative score that measures the complexity of the sentence by taking into account solely its syntactic structure. (TreeTagger: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/; Baroni tag-set: http://sslmit.unibo.it/~baroni/collocazioni/itwac.tagset.txt; Stein tag-set: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt)
        </p>
<p>
          We have used the well-known cross-entropy loss function, minimized using the RMSPROP [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] algorithm on balanced mini-batches of size 50; each batch thus contains 25 complex sentences and 25 simple sentences. To avoid overfitting during the training process, we have taken into account an L2 regularization factor with a weight value of 0.01. We have limited the source sentences to 20 tokens and trained the network for 10 epochs for both tag-sets; we have not observed any significant improvement when choosing a number of tokens greater than 20. The whole set of network parameters has been obtained through a set of trials.
        </p>
<p>[Figure 2: the network architecture. The sequence of part-of-speech tags PoS1 PoS2 PoS3 ... PoSn is one-hot encoded, processed by the LSTM layer and then by the fully connected layer, whose softmax output gives the COMPLEX-class and SIMPLE-class probabilities.]</p>
        <p>
          There is a lack of corpora useful for tackling the text simplification problem for the Italian language by means of machine learning algorithms. We have therefore chosen what is, to the best of our knowledge, the biggest dataset currently available for Italian text simplification [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
<p>
          The corpus [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] contains about 63,000 pairs of sentences in which, for each original sentence, there is a corresponding sentence that keeps the same meaning and represents the simplified version of the original one. The paired sentences contain the structural transformations that identify how to simplify a sentence, so all the simplified sentences can be considered easy-to-read and the corpus can be used as a development resource for training a sentence classification algorithm. Some of the simplification rules found in the corpus are, for example, deletion of some words from a source sentence, lexical substitution of source words so as to obtain a sentence that is simpler to understand, and insertion of other words that can help to better convey the meaning of the sentence.
        </p>
<p>The corpus has been entirely tagged with the TreeTagger parser; both training and tests are based only on the tags, taking into account neither lemmas nor punctuation symbols. The experiments suggest that a NN is capable of discovering the syntactic rules which characterize the two classes, learning how to associate each sentence with the correct class.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Experiments</title>
<p>The evaluation of the model has been carried out using the K-FOLD cross-validation (K-FOLD) method. K-FOLD is a validation method useful for assessing the abilities of a statistical model especially in the presence of little data, which is our case. In fact, 63,000 pairs of sentences are not enough to evaluate this kind of model, and the K-FOLD methodology is necessary to clearly understand how well the classifier is capable of generalizing its knowledge to an independent dataset. The method randomly partitions the dataset into K equal-sized subsets (in our case K=10); in turn, each subset is held out for validation while the remaining K-1 subsets are used to train the model. The K models have been trained to classify the two classes of sentences present in the corpus: difficult-to-read (positive class) and simple-to-read (negative class).</p>
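<p>The K-FOLD partitioning just described can be sketched as follows; the shuffling seed and the truncation of any remainder when K does not divide the dataset size are our assumptions.</p>

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Randomly partition n example indices into k equal-sized folds.
    Fold i serves as the validation set for the i-th trained model,
    and the other k-1 folds form its training set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    size = n // k
    folds = [idx[i * size:(i + 1) * size] for i in range(k)]
    splits = []
    for i in range(k):
        valid = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, valid))
    return splits
```

<p>Averaging the metrics over the K validation folds then gives the figures reported in Table 1.</p>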
<p>The results have been quantified using the Precision, Recall, True Positive Ratio (TPR) and True Negative Ratio (TNR) measures for each iteration of K-FOLD. Recall measures the percentage of positive-class elements that the model is able to correctly classify, while Precision measures the percentage of elements classified as positive that truly belong to the positive class. TPR and TNR measure, respectively, the proportion of elements correctly identified as positive and the proportion of elements correctly identified as negative. Finally, the results have been averaged over the K iterations. We have decided to use as a baseline a support vector machine (SVM) model trained using two different kernels: RBF and polynomial. This choice is justified by the fact that, to our knowledge, no other classification system exists that takes as input only the part-of-speech tags. READ-IT can measure the syntactic complexity of a sentence, but it only makes available an online interface that is not practical for running a large number of tests. The SVM model takes as input the part-of-speech tags of the input sentence as a vector in which each position represents a different part-of-speech tag and whose value is the number of occurrences of the corresponding tag in the source text. Table 1 shows the results obtained by both models.</p>
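<p>The four measures above follow directly from the confusion counts of one K-FOLD iteration, with difficult-to-read as the positive class; the counts in the example below are hypothetical.</p>

```python
def metrics(tp, fp, tn, fn):
    """Precision, Recall, TPR and TNR from the confusion counts of one fold
    (difficult-to-read = positive class)."""
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of true positives that are recovered
    tpr = tp / (tp + fn)         # identical to Recall, as in Table 1
    tnr = tn / (tn + fp)         # fraction of true negatives that are recovered
    return {"precision": precision, "recall": recall, "tpr": tpr, "tnr": tnr}

# Hypothetical confusion counts for one fold:
m = metrics(tp=80, fp=20, tn=80, fn=20)
```

<p>Note that Recall and TPR coincide by definition, which is why the Recall and True Positive Ratio columns of Table 1 carry identical values.</p>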
<table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Results obtained by the RNN and the SVM baseline with both tag-sets.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Model</th><th>Kernel</th><th>TAG-SET</th><th>Recall</th><th>Precision</th><th>True Positive Ratio</th><th>True Negative Ratio</th></tr>
            </thead>
            <tbody>
              <tr><td>RNN-S</td><td>-</td><td>STEIN</td><td>0.819</td><td>0.834</td><td>0.819</td><td>0.837</td></tr>
              <tr><td>RNN-B</td><td>-</td><td>BARONI</td><td>0.764</td><td>0.845</td><td>0.764</td><td>0.859</td></tr>
              <tr><td>SVM-SP</td><td>polynomial</td><td>STEIN</td><td>0.589</td><td>0.832</td><td>0.589</td><td>0.881</td></tr>
              <tr><td>SVM-SR</td><td>RBF</td><td>STEIN</td><td>0.750</td><td>0.798</td><td>0.750</td><td>0.810</td></tr>
              <tr><td>SVM-BP</td><td>polynomial</td><td>BARONI</td><td>0.506</td><td>0.839</td><td>0.506</td><td>0.903</td></tr>
              <tr><td>SVM-BR</td><td>RBF</td><td>BARONI</td><td>0.731</td><td>0.793</td><td>0.731</td><td>0.809</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Discussion</title>
<p>The results show the performance of the NN model compared to that obtained by the SVM with different kernels. The RNN reaches the best Recall (and therefore the best True Positive Ratio) with the STEIN tag-set, and the best Precision using the BARONI tag-set. The True Negative Ratio is better with the SVM model and the polynomial kernel for both tag-sets. Despite the good performance of SVM-BP as measured by the True Negative Ratio, its Recall reaches only 0.506. In our opinion, the best model is RNN-B, which uses the BARONI tag-set, because it shows a good Recall, better than those obtained by the SVM, together with the best Precision. Furthermore, both its Recall and its True Negative Ratio are not far from the best values obtained respectively by RNN-S and SVM-BP (approximately 0.05 points of difference). The results suggest the effectiveness of our model in evaluating the syntactic complexity of an Italian sentence. The SVM model reaches a high True Negative Ratio; in future work we will try to understand the key to this outcome and whether it can be embedded in the RNN-B model.</p>
<p>Looking into how the tag-sets influence the results, we observe that both of them allow the models to obtain good values of Precision and True Negative Ratio: the maximum difference, computed as the best value minus the worst value, among the Precision results is 0.052, and the maximum difference among the True Negative Ratio results is 0.094. Conversely, the choice of tag-set affects the Recall measure more, for which the maximum difference is 0.313. The problem is specifically related to the polynomial kernel, which seems to have more difficulty inferring a sufficient number of rules identifying the elements of the difficult-to-read class. The good performance achieved by the models, except for the Recall of the SVM model with the polynomial kernel, suggests that both tag-sets express the syntactic features of the text well and that, coupled with a neural model, they are suited to addressing this kind of problem.</p>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Conclusion</title>
<p>We have presented a system for measuring the syntactic complexity of a sentence written in the Italian language. Our system takes a sentence as input and expresses its syntax as a sequence of part-of-speech tags. The RNN at the base of our system, after learning through a specific corpus created for TS the patterns that determine syntactic complexity, is capable of classifying a sentence as difficult-to-read or simple-to-read. We have tested the system using two different tag-sets and we have compared the RNN with an SVM model using different kernels. The results show the effectiveness of the Neural Network model in addressing the task of classifying Italian sentences based on their readability. The system can be used either as a stand-alone tool or as a component of a larger system addressing different problems, such as the generation of simplified text.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alfano</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzitti</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lo Bosco</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perticone</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>An automatic system for helping health consumers to understand medical texts</article-title>
          . pp.
          <volume>622</volume>
-
          <issue>627</issue>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Brunato</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dell'Orletta</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venturi</surname>
          </string-name>
          , G.:
<article-title>Paccss-it: A parallel corpus of complex-simple sentences for automatic text simplification</article-title>
          .
          <source>In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <volume>351</volume>
-
          <fpage>361</fpage>
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chiavetta</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lo Bosco</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pilato</surname>
          </string-name>
          , G.:
<article-title>A lexicon-based approach for sentiment classification of amazon books reviews in italian language</article-title>
          . vol.
          <volume>2</volume>
          , pp.
          <volume>159</volume>
-
          <issue>170</issue>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chiavetta</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lo Bosco</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pilato</surname>
          </string-name>
          , G.:
<article-title>A layered architecture for sentiment classification of products reviews in italian language</article-title>
          . In: Monfort,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Krempels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.H.</given-names>
            ,
            <surname>Majchrzak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.A.</given-names>
            ,
            <surname>Traverso</surname>
          </string-name>
          , P. (eds.)
          <source>Web Information Systems and Technologies</source>
          . pp.
          <volume>120</volume>
-
          <fpage>141</fpage>
          . Springer International Publishing,
          <string-name>
            <surname>Cham</surname>
          </string-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dell'Orletta</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montemagni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venturi</surname>
          </string-name>
          , G.:
<article-title>Read-it: Assessing readability of italian texts with a view to text simplification</article-title>
          .
          <source>In: Proceedings of the second workshop on speech and language processing for assistive technologies</source>
          . pp.
          <volume>73</volume>
-
          <fpage>83</fpage>
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Franchina</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vacca</surname>
          </string-name>
          , R.:
<article-title>Adaptation of flesch readability index on a bilingual text written by the same author both in italian and english languages</article-title>
          .
          <source>Linguaggi</source>
          <volume>3</volume>
          ,
          <issue>47</issue>
-
          <fpage>49</fpage>
          (
          <year>1986</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Deep Learning</article-title>
          . MIT Press (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swersky</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Neural networks for machine learning lecture 6a overview of mini-batch gradient descent (</article-title>
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation 9(8)</source>
          ,
          <volume>1735</volume>
-
          <fpage>1780</fpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Lo</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Pilato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Schicchi</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.:</surname>
          </string-name>
          <article-title>A recurrent deep neural network model to measure sentence complexity for the italian language</article-title>
          .
<source>In: Proceedings of the sixth International Workshop on Artificial Intelligence and Cognition</source>
          . (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lucisano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piemontese</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          :
<article-title>Gulpease: una formula per la predizione della difficoltà dei testi in lingua italiana</article-title>
          .
<source>Scuola e città</source>
          <volume>3</volume>
          (
          <issue>31</issue>
          ),
          <volume>110</volume>
-
          <fpage>124</fpage>
          (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Ng</surname>
          </string-name>
          , A.Y.:
          <article-title>Feature selection, l1 vs. l2 regularization, and rotational invariance</article-title>
          .
<source>In: Proceedings of the Twenty-first International Conference on Machine Learning</source>
          . pp.
          <volume>78</volume>
          {
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Schmid</surname>
          </string-name>
          , H.:
          <article-title>Probabilistic part-of-speech tagging using decision trees</article-title>
          .
          <source>In: New methods in language processing</source>
          . p.
          <volume>154</volume>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Shewan</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Canter</surname>
            ,
            <given-names>G.J.:</given-names>
          </string-name>
<article-title>Effects of vocabulary, syntax, and sentence length on auditory comprehension in aphasic patients</article-title>
          .
          <source>Cortex</source>
          <volume>7</volume>
          (
          <issue>3</issue>
          ),
          <volume>209</volume>
-
          <fpage>226</fpage>
          (
          <year>1971</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Siddharthan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
<article-title>A survey of research on text simplification</article-title>
          .
          <source>ITL-International Journal of Applied Linguistics</source>
          <volume>165</volume>
          (
          <issue>2</issue>
          ),
          <volume>259</volume>
-
          <fpage>298</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>