<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>EVALITA 2023: 8th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>at DisCoTeX: Predicting Text Coherence by Tree-based Modelling of Linguistic Features</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martina Galletti</string-name>
          <email>martina.galletti@sony.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pietro Gravino</string-name>
          <email>pietro.gravino@sony.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giulio Prevedello</string-name>
          <email>giulio.prevedello@sony.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>00185</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer, Control and Management Engineering (DIAG) “Antonio Ruberti”, Sapienza University of Rome</institution>
          ,
          <addr-line>via Ariosto 25, Rome</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Enrico Fermi's Research Center (CREF)</institution>
          ,
          <addr-line>via Panisperna 89A, 00184, Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Sony Computer Science Laboratories Paris</institution>
          ,
          <addr-line>6, Rue Amyot, 75005, Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Workshop Proceedings</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>0</volume>
      <fpage>7</fpage>
      <lpage>08</lpage>
      <abstract>
        <p>Automatic text coherence modelling plays a crucial role in natural language processing tasks such as machine translation, summarisation, and question answering. Moreover, text coherence is fundamental to reading comprehension and reader engagement, which are essential in a number of application domains. In this report, we describe our participation in the Assessing Discourse Coherence in Italian Texts (DisCoTeX) task at EVALITA 2023, whose goal is automatic coherence detection. We addressed the task by extracting linguistic features and using them to train a machine learning classifier, which yielded a minor improvement over the baseline. A feature importance analysis revealed the relevance of semantic features, providing indications for future feature engineering and modelling efforts.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural language processing</kwd>
        <kwd>text coherence</kwd>
        <kwd>Italian language</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Coherence is an essential quality to facilitate
compre[…]sures the quality of organization in the structure of a
text and the extent to which a reader can follow the
relationships between sentences and paragraphs. Several
text coherence models exist in the literature which aim
[…]ments. This distinction significantly affects some
downstream tasks, such as document summarisation,
auto[…]tual connectives [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], entity-grid based approaches [
        <xref ref-type="bibr" rid="ref7">7, 8</xref>
        ],
which take inspiration from Centering Theory [9] to
capture coherence [
        <xref ref-type="bibr" rid="ref7">7, 10</xref>
        ]. Recent approaches rely on
neural architectures and use Convolutional Neural
Networks (CNNs) over an entity-based representation of
text [11, 12], sequence-to-sequence models [
        <xref ref-type="bibr" rid="ref3">3, 13</xref>
        ] and
multi-task learning [14]. Nevertheless, language models
still struggle to capture and predict global coherence
across longer texts, and targeted evaluation paradigms
are still being developed [15, 16, 17, 18, 19].
      </p>
      <p>EVALITA 2023: 8th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, September 2023, Parma, Italy. ORCID: 0000-0002-0937-8830 (P. Gravino); 0000-0002-9857-2351.</p>
      <p>[…] “Assessing Discourse Coherence in Italian Texts”
(DisCoTeX) [20], presented at EVALITA 2023 [21]. In this report,
the feature extraction procedure and the modelling
strategy are described in Section 2, while the results are
illustrated in Section 3 and discussed in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Description of the system</title>
      <p>The data provided by the task organisers were cleaned
and pre-processed in a series of systematic steps. After
removing irrelevant characters, the texts were normalised
into a standardised format, i.e. lemmas, to break them
into meaningful units. Then, words with fewer than three
characters were removed, except for stop words. Short
words often carry less semantic meaning than longer
ones, so they can increase the vocabulary size of a model
without contributing significantly to its training; removing
them can therefore reduce noise during training. Stop
words, especially conjunctions, were nevertheless kept for
their role in connecting sentences across a text and in
preserving syntactic and grammatical relationships during
training, even though they carry little semantic meaning
per se. After lemmatization, we enriched the data by
computing the length of words and sentences, since these
lengths can provide insights into the complexity and
readability of the text, which can, in turn, affect its
intrinsic coherence. Longer words may indicate a more
complex and domain-specific vocabulary, while longer
sentences may indicate the presence of multiple
subordinate clauses, which could affect the overall
coherence. Moreover, an abrupt variation in the length of
words and sentences could indicate an imbalance in the
structure of the text, thus endangering its linguistic
coherence. For similar reasons, statistics on the use of
the different tenses in sentences were also computed,
since the use of appropriate tenses ensures temporal
consistency and a logical progression of information.
Afterwards, we extracted lexical features such as word
frequency, using a document-term matrix and Term
Frequency - Inverse Document Frequency (TF-IDF), as well
as sentence embeddings, i.e. Sentence-BERT (SBERT) [22].</p>
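      <p>The token filtering and length statistics described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the text has already been lemmatised upstream, splits sentences naively on terminal punctuation, and uses a hypothetical, deliberately tiny STOP_WORDS set in place of a full Italian stop-word list. The standard deviation of sentence lengths is included only as one possible, assumed way of capturing abrupt length variation.</p>
      <preformat><![CDATA[
```python
import re
import statistics

# Hypothetical, deliberately tiny stand-in for a full Italian stop-word list.
STOP_WORDS = {"e", "o", "ma", "se", "di", "a", "da", "in", "il", "la", "che", "non"}


def filter_tokens(lemmas, min_len=3, stop_words=STOP_WORDS):
    """Drop lemmas shorter than min_len characters, but always keep stop words."""
    return [w for w in lemmas if len(w) >= min_len or w.lower() in stop_words]


def length_stats(text):
    """Mean word length (characters), mean sentence length (words) and its spread.

    Sentences are split naively on '.', '!' and '?', an assumption made for
    illustration; a proper sentence splitter would be used in practice.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    sent_lens = [len(re.findall(r"\w+", s)) for s in sentences]
    word_lens = [len(w) for w in words]
    return {
        "mean_word_len": statistics.mean(word_lens),
        "mean_sentence_len": statistics.mean(sent_lens),
        # Spread of sentence lengths: a crude signal of abrupt variation.
        "sentence_len_std": statistics.pstdev(sent_lens),
    }


# Lemmas are assumed to come from an upstream lemmatiser (not shown).
lemmas = ["il", "gatto", "e", "su", "tavolo", "ma", "io", "dormire"]
print(filter_tokens(lemmas))  # → ['il', 'gatto', 'e', 'tavolo', 'ma', 'dormire']
print(length_stats("Il gatto dorme. Il cane abbaia forte!"))
```
]]></preformat>
      <p>In this example, “su” and “io” are dropped because they are shorter than three characters and are not in the toy stop-word set, while the short stop words “il”, “e” and “ma” are kept.</p>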
      <p>To compress the high-dimensional vectors obtained from
the TF-IDF analysis and the sentence embeddings, Uniform
Manifold Approximation and Projection (UMAP) was used
to reduce their dimension to 30 components [23].
UMAP was chosen over principal component analysis
(PCA) because it tends to better preserve local distances.
Compared to t-distributed stochastic neighbour embedding
(t-SNE), it is faster and better preserves the global
structure of the data.</p>
      <p>Finally, prompt and target were compared by means of
the statistics mentioned above, resulting in the following
list of features for each data point:
• tfidf_raw_wjac: weighted Jaccard distance between the
prompt’s and the target’s TF-IDF vectors;
• tfidf_raw_cos: cosine distance between the prompt’s
and the target’s TF-IDF vectors;
• tfidf_red_euc: Euclidean distance between the prompt’s
and the target’s TF-IDF vectors projected by UMAP;
• tfidf_red_cos: cosine distance between the prompt’s
and the target’s TF-IDF vectors projected by UMAP;
• upper_case_word_count_t_over_tp: the number of
upper-case words in the target divided by their total
in prompt and target;
• word_density_t_over_tp: word density in the target
divided by its sum over prompt and target;
• punctuation_count_t_over_tp: the number of
punctuation marks in the target divided by their total
in prompt and target;
• char_count_t_over_tp: the number of characters in the
target divided by their total in prompt and target;
• word_count_t_over_tp: the number of words in the
target divided by their total in prompt and target;
• tense_tar_per: the size of the set of tenses shared by
target and prompt divided by the size of the set of
tenses in the target;
• tense_iou: the size of the set of tenses shared by
target and prompt divided by the size of the set of
tenses in either target or prompt (intersection over union);
• ent_tar_per: the size of the set of entities shared by
target and prompt divided by the size of the set of
entities in the target;
• ent_iou: the size of the set of entities shared by target
and prompt divided by the size of the set of entities in
either target or prompt (intersection over union);
• sbert_umap1_pro: first component of the 2D UMAP
projection of the average of the prompt’s sentence
embeddings;
• sbert_umap2_pro: second component of the 2D UMAP
projection of the average of the prompt’s sentence
embeddings;
• sbert_umap1_tar: first component of the 2D UMAP
projection of the target sentence’s embedding;
• sbert_umap2_tar: second component of the 2D UMAP
projection of the target sentence’s embedding;
• sbert_red_euc: Euclidean distance between the 30D
UMAP projections of the average of the prompt’s
sentence embeddings and of the target sentence’s
embedding;
• sbert_red_cos: cosine distance between the 30D UMAP
projections of the average of the prompt’s sentence
embeddings and of the target sentence’s embedding;
• emb_cos_mean: average of the pairwise cosine
distances between the embedding vectors of the
prompt’s sentences and that of the target;
• emb_cos_max: maximum of the pairwise cosine
distances between the embedding vectors of the
prompt’s sentences and that of the target;
• emb_cos_last: cosine distance between the embedding
of the prompt’s last sentence and that of the target;
• emb_cos_min: minimum of the pairwise cosine
distances between the embedding vectors of the
prompt’s sentences and that of the target.</p>
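      <p>Several of the features listed above reduce to simple vector and set operations. The following plain-Python sketch is an illustration under stated assumptions, not the authors' code: TF-IDF vectors are represented as sparse {term: weight} dictionaries computed with one common weighting (raw term count times log(N / df)), since the report does not state which variant was used; tenses and named entities are assumed to have been extracted already into Python sets; and all helper names are hypothetical. The toy corpus of lemmatised Italian words is made up.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def tfidf(docs):
    """Sparse TF-IDF vectors ({term: weight}) for a list of tokenised documents.

    Weighting assumed here: raw term count * log(N / df). Other variants exist.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def weighted_jaccard_distance(u, v):
    """1 - sum(min) / sum(max) over the union of keys (non-negative weights)."""
    keys = set(u) | set(v)
    num = sum(min(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    den = sum(max(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    return 1.0 - num / den if den else 0.0


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors (1.0 if either is all zeros)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return 1.0 - dot / (norm_u * norm_v) if norm_u and norm_v else 1.0


def target_share(count_target, count_prompt):
    """The *_t_over_tp ratios: target count / (prompt count + target count)."""
    total = count_target + count_prompt
    return count_target / total if total else 0.0


def overlap_features(target_set, prompt_set):
    """*_tar_per = |T ∩ P| / |T| and *_iou = |T ∩ P| / |T ∪ P|."""
    shared = len(target_set & prompt_set)
    tar_per = shared / len(target_set) if target_set else 0.0
    union = len(target_set | prompt_set)
    iou = shared / union if union else 0.0
    return tar_per, iou


# Toy lemmatised corpus (made up): prompt, target, and an unrelated document.
corpus = [["gatto", "dormire", "divano"], ["gatto", "mangiare"], ["cane", "correre"]]
prompt_vec, target_vec = tfidf(corpus)[:2]
print(weighted_jaccard_distance(prompt_vec, target_vec))
print(cosine_distance(prompt_vec, target_vec))
print(target_share(3, 1))  # → 0.75
print(overlap_features({"presente", "passato"}, {"passato", "futuro"}))
```
]]></preformat>
      <p>The same two helpers, target_share and overlap_features, would cover all the “_t_over_tp”, “_tar_per” and “_iou” features; only the counted quantity or the extracted set changes.</p>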
      <p>These features were then passed to a machine learning
model that classifies whether the target sentence
coherently follows the prompt text, thus tackling
Subtask 1 of the challenge.</p>
      <p>The classifier of choice was LGBMClassifier, a popular
machine learning solution that combines computational
efficiency with good predictive performance in various
problems. The model was imported from the LightGBM
gradient boosting framework, which uses tree-based
learning algorithms [24], and binary cross-entropy was
set as the training objective. The model’s hyperparameters
were selected by stratified 10-fold cross-validation on
shuffled data [25], exhaustively searching the
hyperparameter space (num_leaves ∈ {24, 25, 26},
max_depth ∈ {5, 6, −1}, learning_rate ∈
{0.009, 0.01, 0.011}, n_estimators ∈ {185, 190, 195}) for
the combination with the best overall accuracy. This
search space was defined empirically, starting from
intervals centred on extreme parameter values
(num_leaves = 50, max_depth = 10, learning_rate = 0.01,
and n_estimators = 1000). Then, for every new training
run, each interval was re-centred on the previous
best-fitting value if that value lay at the edge of the
interval; otherwise, the interval was narrowed.
The random seed was set to 42 for reproducibility.</p>
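      <p>The search space and the interval-update rule described above can be sketched as follows. The grid is the one reported in the text (3^4 = 81 combinations per search). The update_interval helper is hypothetical and encodes one reading of the rule: a three-value interval whose best value lies at an edge is shifted to be centred on it, and otherwise the step around the best value is halved, since the report does not specify how much the interval was narrowed. In a real run, each combination would be scored by stratified 10-fold cross-validation of an LGBMClassifier (for example with scikit-learn's GridSearchCV and StratifiedKFold(n_splits=10, shuffle=True, random_state=42)); that part is omitted here to keep the sketch dependency-free.</p>
      <preformat><![CDATA[
```python
from itertools import product

# Final search space reported in the text, using LightGBM parameter names.
PARAM_GRID = {
    "num_leaves": [24, 25, 26],
    "max_depth": [5, 6, -1],  # -1 disables the depth limit in LightGBM
    "learning_rate": [0.009, 0.01, 0.011],
    "n_estimators": [185, 190, 195],
}


def all_combinations(grid):
    """Expand a parameter grid into every combination (exhaustive search)."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def update_interval(values, best):
    """Hypothetical re-centring rule for a sorted, evenly spaced 3-value interval.

    If the best value sits at an extremity, shift the interval to centre on it;
    otherwise keep it centred on the best value and halve the step (narrowing).
    """
    step = values[1] - values[0]
    if best in (values[0], values[-1]):
        new_step = step
    elif isinstance(step, int):
        new_step = max(1, step // 2)  # keep integer parameters integral
    else:
        new_step = step / 2
    return [best - new_step, best, best + new_step]


combos = all_combinations(PARAM_GRID)
print(len(combos))  # → 81
print(update_interval([185, 190, 195], 195))  # → [190, 195, 200]
print(update_interval([185, 190, 195], 190))  # → [188, 190, 192]
```
]]></preformat>
      <p>The max_depth values {5, 6, −1} do not form an evenly spaced interval (−1 removes the depth limit), so this helper is not meant to be applied to them.</p>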
      <p>The model’s performance was evaluated against the
baseline provided by the challenge organizers. See [20]
for more details.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>Trained on the data available for Subtask 1, the model
achieved an accuracy of 0.595 on the test set, improving
upon the challenge baseline (0.525) by only 0.07 points.
To provide some insight into this performance, the
confusion matrix on the training data and the feature
importances are shown in Figure 1 and Figure 2,
respectively. Finally, the relevant hyperparameters
resulting from the grid-search cross-validation are
reported in Table 1.
      </p>
      <p>[Figure 2: feature importance (gain) of each feature used by the classifier.]</p>
    </sec>
    <sec id="sec-discussion">
      <title>4. Discussion</title>
      <p>The feature importances show that standard frequency
statistics (such as comparisons between prompt and
target on TF-IDF vectors and counts of upper-case words,
tenses, punctuation marks, words, and characters) are
not very informative in predicting coherence for the
system employed. Yet “ent_tar_per”, the proportion of
the target’s entities also present in the prompt’s text,
was rather important. Semantic information seemed the
most relevant, as supported by the many important
features extracted by leveraging the sentence embedding
model. Of note, “sbert_umap1_tar”, the projection of the
target’s sentence embedding, was quite important,
although it is not derived from a comparison between
prompt and target. Finally, the most important features
resulted from aggregating the pairwise cosine distances
between the embedding of the target sentence and those
of the prompt sentences. While the average and the
maximum of these distances (“emb_cos_mean”,
“emb_cos_max”) seemed of little importance, the minimum
distance and the distance between the target and the
prompt’s last sentence (“emb_cos_min”, “emb_cos_last”)
were the two most important features. These findings
suggest that coherence is elicited by one or a few nearby
sentences, while the rest of the prompt might be of
secondary importance.</p>
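      <p>The four aggregated features discussed above (emb_cos_mean, emb_cos_max, emb_cos_min, emb_cos_last) can be sketched as follows, assuming one embedding vector per prompt sentence and one for the target sentence, for instance from an SBERT model. The toy three-dimensional vectors are made up purely for illustration and are not taken from the task data; the function names are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two dense, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def pairwise_features(prompt_embs, target_emb):
    """Aggregate cosine distances between the target and each prompt sentence."""
    dists = [cosine_distance(p, target_emb) for p in prompt_embs]
    return {
        "emb_cos_mean": sum(dists) / len(dists),
        "emb_cos_max": max(dists),
        "emb_cos_min": min(dists),
        # Distance to the prompt's last sentence, the one just before the target.
        "emb_cos_last": dists[-1],
    }


# Made-up 3-d "embeddings": three prompt sentences and one target sentence.
prompt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
target = [1.0, 1.0, 0.0]
print(pairwise_features(prompt, target))
```
]]></preformat>
      <p>In this toy case the target points in the same direction as the last prompt sentence, so emb_cos_min and emb_cos_last are both close to zero, while the mean and the maximum are pulled up by the less related sentences. This is the kind of contrast that would let a tree-based model rely on the nearest sentence rather than on the prompt as a whole.</p>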
      <p>Our results suggest that including syntactic and
discourse-level features might improve performance.
Syntactic features, such as part-of-speech tags,
dependency relations, or parse trees, can provide
insights into sentence structure and overall grammatical
coherence. Moreover, discourse-level features, such as
entity co-reference, readability metrics, argumentative
structure, discourse markers, or topic-progression
features, could assess the flow of ideas in the documents
provided. Future work will address the extraction of
discourse-structure and syntactic features to enable our
model to assess a text’s logical connections,
organization, and grammatical structure.</p>
      <p>The moderate performance improvement might also
point to limitations of standard machine learning
models. This could be due to several reasons. First,
compared to deep learning models, these models have a
limited capacity to represent semantic relationships, as
they lack the sequential modelling needed to capture a
text’s underlying coherence. Moreover, they lack
automatic contextual understanding, relying instead on
the provided features, which might not capture the global
context appropriately. Finally, they generally struggle with
high-dimensional data. If the number of features is high, as in</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work has been supported by the Horizon Europe
VALAWAI project (grant agreement number 101070930).</p>
      <p>We also wish to thank the Evalita-23 organizers for
organizing the task and emphasizing the significance of text
coherence measures for the Italian language. The
creation and annotation of the corpus have been
instrumental in advancing the field of Natural Language Processing
for the Italian language and fostering community interest
in coherence assessment. Your efforts will enable us to
develop AI-driven methods for fostering comprehension
assessment applied to both the infosphere and hybrid
speech and language practices.
      </p>
    </sec>
    <sec id="sec-references">
      <title>References</title>
      <p>[7] R. Barzilay, M. Lapata, Modeling local coherence: An entity-based approach, Computational Linguistics 34 (2008) 1–34.</p>
      <p>[8] M. Elsner, E. Charniak, Extending the entity grid with entity-specific features, in: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011, pp. 125–129.</p>
      <p>[9] B. J. Grosz, A. K. Joshi, S. Weinstein, Centering: A framework for modelling the local coherence of discourse, IRCS Technical Reports Series (1995).</p>
      <p>[10] M. Lapata, R. Barzilay, et al., Automatic evaluation of text coherence: Models and representations, in: IJCAI, volume 5, 2005, pp. 1085–1090.</p>
      <p>[11] D. T. Nguyen, S. Joty, A neural local coherence model, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017, pp. 1320–1330.</p>
      <p>[12] H. C. Moon, T. Mohiuddin, S. Joty, X. Chi, A unified neural coherence model, arXiv preprint arXiv:1909.00349 (2019).</p>
      <p>[13] M. Mesgar, M. Strube, A neural local coherence model for text quality assessment, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 4328–4339. URL: https://aclanthology.org/D18-1464. doi:10.18653/v1/D18-1464.</p>
      <p>[14] Y. Farag, H. Yannakoudakis, Multi-task learning for coherence modeling, arXiv preprint arXiv:1907.02427 (2019).</p>
      <p>[15] A. Beyer, S. Loáiciga, D. Schlangen, Is incoherence surprising? Targeted evaluation of coherence prediction from language models, arXiv preprint arXiv:2105.03495 (2021).</p>
      <p>[16] Y. Farag, J. Valvoda, H. Yannakoudakis, T. Briscoe, Analyzing neural discourse coherence models, arXiv preprint arXiv:2011.06306 (2020).</p>
      <p>[17] A. Lai, J. Tetreault, Discourse coherence in the wild: A dataset, evaluation and methods, arXiv preprint arXiv:1805.04993 (2018).</p>
      <p>[18] L. Pishdad, F. Fancellu, R. Zhang, A. Fazly, How coherent are neural models of coherence?, in: Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp. 6126–6138.</p>
      <p>[19] A. Shen, M. Mistica, B. Salehi, H. Li, T. Baldwin, J. Qi, Evaluating document coherence modeling, Transactions of the Association for Computational Linguistics 9 (2021) 621–640.</p>
      <p>[20] D. Brunato, D. Colla, F. Dell’Orletta, I. Dini, D. P. Radicioni, A. A. Ravelli, DisCoTeX at EVALITA 2023: Overview of the assessing discourse coherence in Italian texts task, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
      <p>[21] M. Lai, S. Menini, M. Polignano, V. Russo, R. Sprugnoli, G. Venturi, EVALITA 2023: Overview of the 8th evaluation campaign of natural language processing and speech tools for Italian, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
      <p>[22] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, arXiv preprint arXiv:1908.10084 (2019).</p>
      <p>[23] L. McInnes, J. Healy, N. Saul, L. Grossberger, UMAP: Uniform manifold approximation and projection, The Journal of Open Source Software 3 (2018) 861.</p>
      <p>[24] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, T. Liu, LightGBM: A highly efficient gradient boosting decision tree, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 30, Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/paper_files/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf.</p>
      <p>[25] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sontag</surname>
          </string-name>
          ,
          <article-title>Discourse-based objectives for fast unsupervised sentence representation learning</article-title>
          ,
          <source>arXiv preprint arXiv:1705.00557</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Learning to extract coherent summary via deep reinforcement learning</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>32</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <article-title>Neural net models of open-domain discourse coherence</article-title>
          ,
          <source>in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Copenhagen, Denmark,
          <year>2017</year>
          , pp.
          <fpage>198</fpage>
          -
          <lpage>209</lpage>
          . URL: https://aclanthology.org/D17-1019. doi:10.18653/v1/D17-1019.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Kan</surname>
          </string-name>
          ,
          <article-title>Automatically evaluating text coherence using discourse relations</article-title>
          ,
          <source>in: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>997</fpage>
          -
          <lpage>1006</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. W.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Qin</surname>
          </string-name>
          , G. Hirst, T. Liu,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Encoding world knowledge in the evaluation of local coherence</article-title>
          ,
          <source>in: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1087</fpage>
          -
          <lpage>1096</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Albertin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Miaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brunato</surname>
          </string-name>
          ,
          <article-title>On the role of textual connectives in sentence comprehension: A new dataset for italian</article-title>
          , in: CLiC-it,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Barzilay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lapata</surname>
          </string-name>
          , Modeling local coherence:
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>