<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CLiC-it 2024: Tenth Italian Conference on Computational Linguistics</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Exploring the Use of Cohesive Devices in Dementia within an Elderly Italian Semi-spontaneous Speech Corpus</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgia Albertin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Martinelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alma Mater Studiorum - University of Bologna, Department of Classical Philology and Italian Studies</institution>
          ,
          <addr-line>32 Zamboni Street, 40126 Bologna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>26</volume>
      <fpage>5234</fpage>
      <lpage>5242</lpage>
      <abstract>
        <p>The study of language disruption in dementia, aimed at identifying which features correlate with cognitive impairment, is a growing area of computational linguistics research. Still, further work is needed on discourse phenomena that also undergo deterioration and that can help expand our understanding of dementia-related speech and refine automatic tools. This paper explores the discourse property of cohesion by investigating three types of cohesive devices: reference, lexical iteration, and connectives. Ten features related to these categories were defined and automatically extracted from an Italian corpus of semi-spontaneous speech collected from dementia patients and healthy controls. Some of the designed features proved significant for the binary classification of the two groups, and further quantitative analysis highlights interesting differences in the use of cohesive devices that seem to be associated with cognitive decline.</p>
      </abstract>
      <kwd-group>
        <kwd>Cohesion</kwd>
        <kwd>Cohesive devices</kwd>
        <kwd>Dementia</kwd>
        <kwd>Cognitive Impairment</kwd>
        <kwd>Semi-spontaneous speech</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Linguistic deficits commonly characterize neurodegenerative diseases from their onset. In Dementia, or Major Neurocognitive Disorder (DSM-5 [1]), a syndrome of acquired and progressive impairment in cognitive function that interferes with independence in everyday life, language deterioration manifests itself within a broader framework of cognitive impairment, which can also affect memory, visuo-spatial skills, executive functions and reasoning. Deficits in both verbal production and comprehension have been observed, despite the specificity of the different etiological subtypes of Dementia, the most common of which is Alzheimer’s Disease (AD), characterized by a primary impairment in episodic memory. In AD, for example, the well-established linguistic deficits include word-finding problems, which comprise anomia, the production of semantic paraphasias [2, 3] and the "tip-of-the-tongue" experience [4], as well as low speech rate, poor word comprehension [5] and, as the disease worsens, a generalized simplification of syntax [6]. The discourse and pragmatic levels are also affected by cognitive decline. Errors in referential cohesion have been registered, in particular regarding the ambiguous use of pronouns [7]. Coherence is compromised, especially in spontaneous speech: the discourse shows an abundance of irrelevant details and an overt difficulty in mentioning the key concept or referring to the topic, resulting in a lack of informativeness in communication [8, 9, 10].</p>
      <p>In recent years, speech analysis in cognitive decline has gained increasing importance in the development of low-cost and portable tools for dementia screening, supported also by the remarkable advancements in Natural Language Processing (NLP) and Machine Learning (ML) technologies [11]. The refinement of classification systems goes hand in hand with the operationalization of linguistic features computed from oral productions, which need to be adapted to different languages. Regarding Italian, the OPLON (OPportunities for active and healthy LONgevity) project (2014-2016) was devoted to the automatic extraction of an extensive group of linguistic features at the acoustic, rhythmic, readability, lexical, morpho-syntactic and syntactic levels from a speech corpus of cognitively impaired patients and healthy peers [12, 13]. Analysis of the significance of the features highlighted that the acoustic ones largely correlated with the cognitive state of the subjects [14].</p>
      <p>Expanding the list of language levels covered to include speech properties would enrich the features used for classification and, in addition, could broaden our understanding of how cognitive decline manifests itself in verbal competence. Nevertheless, defining specific features of higher-level and complex phenomena is not trivial. Drawing inspiration from works that propose a "stratified" approach to discourse analysis, which individually considers macro-phenomena that intersect with one another [15, 16], this paper examines cohesion, the property of the surface form of the text to reflect its internal unity [17]. Cohesion ensures continuity in discourse through a network of cohesive devices, mainly words or morphemes, that help maintain the semantic relations occurring in the text [17]. Therefore, we propose a method to design and formalize a set of cohesion features, with the aim of observing whether they help discriminate the speech of individuals with dementia from that of healthy peers. Specifically, three types of elements, which Halliday &amp; Hasan [18] indicate among the major contributors to cohesion, were taken into consideration: reference, lexical iteration and connectives. The implementation of measures based on cohesive devices is a first step towards including discourse properties in the automatic analysis of language in cognitive decline. Studying their interaction with features of other linguistic levels is crucial to observe whether they have a positive impact on the discrimination between dementia subjects and healthy subjects. The work presented in this paper is therefore intended as a preliminary analysis that will serve to pursue more sophisticated ML classification in the future.</p>
      <p>CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy. * Corresponding author. † The contribution of each author to the paper is specified in the CRediT authorship statement declaration. E-mail: giorgia.albertin3@unibo.it (G. Albertin); elena.martinelli12@unibo.it (E. Martinelli). Web: https://www.unibo.it/sitoweb/giorgia.albertin3 (G. Albertin); https://www.unibo.it/sitoweb/elena.martinelli12/ (E. Martinelli). ORCID: 0000-0002-5728-3473 (G. Albertin); 0009-0007-4399-6951 (E. Martinelli). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-1-1">
      <title>2. Corpus Description</title>
      <p>In this study, we used the corpus collected within the project "Linguistic characteristics of the speech of elderly subjects with dementia" [20, 21], approved by the Bioethics Committee of the University of Bologna (Prot. N. 0072032/2022). The corpus consists of the oral linguistic production of 40 Italian-speaking individuals living in Basilicata, forming two groups balanced by sex and age. Although the initial objective was to balance the cohorts on education level as well, this was not possible because the information was missing from some patients’ medical records. From a sociolinguistic perspective, it is important to note that some participants, albeit Italian-speaking, were also exposed to dialect systems in their lives. This explains the frequent occurrence of substandard linguistic expressions in the collected speech, and will be discussed in Section 4 in relation to the results of the analysis.</p>
      <p>The Pathological Group (PG) consists of 20 patients suffering from different forms of dementia (9 cases of Alzheimer’s Disease, 2 of Mixed Dementia, 5 of unspecified Dementia, 3 of Vascular Dementia, 1 of Frontotemporal Dementia), recruited at the “Universo Salute - Opera Don Uva (PZ)” rest home, and the Control Group (CG) consists of 20 subjects with neurotypical cognitive aging. Informed consent was obtained from all participants (in the case of patients, from their family members, caregivers, or legal guardians). As a first step, the recruited subjects underwent an evaluation of their cognitive status through the administration of the following four neuropsychological tests: Mini-Mental State Examination (MMSE [22]), Montreal Cognitive Assessment (MoCA [23]), and Verbal Fluency Test, both Phonemic [24, 25, 26, 27] and Semantic [28]. Table 1 summarizes the recruitment criteria and the demographics of the study participants.</p>
      <p>Then, two narrative tasks (the story of a journey and the story of Christmas holiday traditions) and one picture description task (using the stimulus figure in “Language Examination II” [19], see Figure 2) were administered to collect semi-spontaneous speech, elicited with the following stimulus sentences: 1) “Do you want to tell me about a trip you took?”; 2) “How do you usually spend Christmas day?”; 3) “Could you describe this figure to me?”. This protocol allowed the collection of approximately 9 hours of audio (i.e., 8 hours for the recruited groups and 1 hour for the interviewer), subsequently annotated at various linguistic levels. Using the ELAN software [29], the corpus was manually transcribed at the orthographic level, segmented into utterances (i.e., the reference unit of discursive analysis [30]), and annotated at the prosodic level (theoretical framework: the Language into Act Theory, L-AcT [31]). Table 2 summarizes the size of the corpus and the average material (audio/tokens) collected for each patient and control subject. The total number of tokens was calculated on the orthographic transcription of the corpus (cleaned of annotation tags) and amounts to 49,263 tokens (i.e., 23,518 for PG and 25,745 for CG). Finally, using the Gagliardi &amp; Tamburini pipeline [32], tokenization, lemmatization, part-of-speech tagging, and syntactic parsing were automatically performed for the entire corpus.</p>
      <p>Figure 1: Esame del Linguaggio II [19], stimulus figure used in the picture description task.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Cohesive Devices’ Features</title>
      <p>Ten features that quantify the use of cohesive devices by the speakers were designed and formalised. The features were computed for each subject, and thus refer to the amount of speech produced by the single individual across the three tasks. To comprehensively address the categories of cohesive devices considered, we used the .conll file resulting from the data annotation as the input for our analysis. Automatic feature extraction was done via Python scripts. The methodology is described in detail in the following sections.</p>
      <sec id="sec-2-1">
        <title>3.1. Reference</title>
        <p>Reference is involved when an expression that requires interpretation by referring to something else occurs in the discourse [18]. This mechanism can be employed in both anaphoric and cataphoric uses, to refer respectively to something already known in the text or to anticipate it. Reference functions either by repetition, which can be partial (e.g., through a synonym) or total, by semantic contiguity, or by substitution with pronouns or other elements [17]. It is the substitution type of referential expression, closely linked to the textual dimension, that is investigated through the features, which thus focus on the occurrence of anaphora and cataphora.</p>
        <p>An extensive literature review was necessary to select a relevant group of these expressions in Italian (see [33, 34, 35]). The group of elements collected includes pronouns, namely personal (e.g., io, tu, lei, lui), demonstrative (e.g., questo, quello), indefinite (e.g., alcuni, tutti), and possessive pronouns, possessive adjectives (e.g., mio, tuo), as well as deictics (e.g., fuori, sopra, avanti, qua, qui, dentro, dietro, giù, indietro, su, lì, oltre, ci). The occurrences of these groups were counted and divided by the total number of tokens per subject (COE_REF). Additionally, the pronoun density (COE_PRON_DENS), defined as the ratio between pronouns and nouns uttered [36], was computed for each subject.</p>
        <sec id="sec-2-1-1">
          <title>3.2. Lexical iteration</title>
          <p>According to Halliday and Hasan [18], the iteration of
a lexical item is a specific use of the repetition-type
referential mechanism, which acquires cohesive force on
its own because it is typically used when the referent is
farther in the text. This set of features focuses on the
repetition of three main open-class categories, namely
nouns, (main) verbs, and adjectives. The use of words
from these classes affects the richness of vocabulary,
reflecting the speaker’s tendency toward lexical variation.
Word-finding problems occurring in cognitive decline
often manifest as difficulties in retrieving forms from
the lexicon. The repetition of the same words can then
occur as a sort of repair mechanism, resulting in
semantically impoverished speech. Conversely, the use of some
types of closed-class particles, such as prepositions and
auxiliaries, is bound to the syntactic structure.</p>
          <p>Lexical iteration features were computed by
separately considering word forms and lemmas of nouns,
verbs, and adjectives. These features include the
repetitions of elements divided by the total number
of words (COE_RIP_LEM, COE_RIP_WORD), the
average number of repetitions for repeated elements
(COE_MEDRIP_LEM, COE_MEDRIP_WORD), and the
maximum number of repetitions over the total number of
iterations (COE_MAXRIP_LEM, COE_MAXRIP_WORD).</p>
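          <p>The three families of lexical iteration features can be illustrated with a minimal Python sketch. It is not the authors’ extraction script: the token dictionaries, the UD-style tags NOUN, VERB and ADJ, the function name iteration_features and, above all, the reading of a "repetition" as every occurrence of an item after its first one are assumptions made for illustration.</p>
          <preformat><![CDATA[
```python
from collections import Counter

# Assumption: UD-style coarse tags for the three open classes considered.
CONTENT_POS = {"NOUN", "VERB", "ADJ"}


def iteration_features(tokens, key):
    """Repetition features for one subject.

    tokens: list of dicts with 'form', 'lemma' and 'upos' (hypothetical layout).
    key: 'lemma' or 'form', i.e. which level the repetitions are counted on.
    """
    counts = Counter(t[key].lower() for t in tokens if t["upos"] in CONTENT_POS)
    # Assumption: each occurrence after the first counts as one repetition.
    reps = [c - 1 for c in counts.values() if c > 1]
    total_reps = sum(reps)
    n_tokens = len(tokens)
    return {
        # repetitions over the total number of words
        "RIP": total_reps / n_tokens if n_tokens else 0.0,
        # average number of repetitions per repeated element
        "MEDRIP": total_reps / len(reps) if reps else 0.0,
        # largest repetition count over the total number of iterations
        "MAXRIP": max(reps) / total_reps if reps else 0.0,
    }


# Toy utterance: "la casa è bella la casa è grande vedo due case"
ROWS = [
    ("la", "il", "DET"), ("casa", "casa", "NOUN"), ("è", "essere", "AUX"),
    ("bella", "bello", "ADJ"), ("la", "il", "DET"), ("casa", "casa", "NOUN"),
    ("è", "essere", "AUX"), ("grande", "grande", "ADJ"),
    ("vedo", "vedere", "VERB"), ("due", "due", "NUM"), ("case", "casa", "NOUN"),
]
tokens = [{"form": f, "lemma": l, "upos": p} for f, l, p in ROWS]

by_lemma = iteration_features(tokens, "lemma")  # casa/case collapse to one lemma
by_form = iteration_features(tokens, "form")    # casa and case stay distinct
print(by_lemma, by_form)
```
]]></preformat>
          <p>In this toy input the lemma-level count exceeds the form-level count only because casa and case share a lemma, which shows why the two levels are kept as separate features.</p>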
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>3.3. Connectives</title>
        <p>As defined by Ferrari [37], connectives are morphologically invariable forms (e.g., conjunctions or locutions) that explicitly indicate logical relations between parts of the text and pertain to the logical level. Elements from different grammatical classes can be used as connectives and are classified based on their function, which usually reflects their meaning (e.g., temporal, causal, additive).</p>
        <p>To compile an extensive list of connectives, we relied on the Lexicon of Italian Connectives (LICO)<sup>1</sup> [38, 39]. LICO contains 173 entries, including single words (e.g., e, se, ma, infatti, quando, quindi), complex expressions (e.g., a causa di, da allora), and correlatives (e.g., da un lato ... dall’altro). Connectives are reported along with their lexical or orthographic variants, part-of-speech category, the semantic relations conveyed according to the Penn Discourse Treebank 3.0 schema [40], examples of usage, and alignments with connectives from other languages. One feature computes the occurrences of connectives relative to the total number of tokens per subject (COE_TC).</p>
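        <p>A lexicon-based count of this kind could be sketched as follows. The sketch is only illustrative: the lexicon is the handful of entries quoted above rather than the LICO resource itself, the greedy longest-match strategy and the function name count_connectives are assumptions, correlatives are not handled, and every surface match is counted even when the word is not used as a connective.</p>
        <preformat><![CDATA[
```python
# Illustrative subset: only the entries quoted in the text, not the LICO file.
LEXICON = ["e", "se", "ma", "infatti", "quando", "quindi", "a causa di", "da allora"]


def count_connectives(forms, lexicon):
    """Count lexicon entries in a token sequence, preferring longer entries."""
    # Longest entries first, so "a causa di" is tried before any shorter entry.
    entries = sorted({tuple(e.lower().split()) for e in lexicon}, key=len, reverse=True)
    words = [f.lower() for f in forms]
    hits, i = 0, 0
    while i < len(words):
        for entry in entries:
            if tuple(words[i:i + len(entry)]) == entry:
                hits += 1
                i += len(entry)  # skip the tokens consumed by this connective
                break
        else:
            i += 1  # no entry starts at this token
    return hits


forms = "siamo rimasti a casa a causa di un temporale e quindi niente gita".split()
n_conn = count_connectives(forms, LEXICON)
coe_tc = n_conn / len(forms)  # connectives relative to the number of tokens
print(n_conn, round(coe_tc, 3))
```
]]></preformat>
        <p>The toy sentence contains one complex expression (a causa di) and two single-word entries (e, quindi); the isolated preposition a is not counted because it is not a lexicon entry on its own.</p>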
        <p>Finally, the last feature was designed as an attempt
to capture the overall impact of the classes of cohesive
devices studied in this paper in the two cohorts of
corpus speakers. Therefore, the role of cohesion elements
was comprehensively measured in COE_TOT by summing
referential-substitute expressions, lexical iteration items
and connectives, divided by the total number of words.</p>
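        <p>The density-style features (COE_REF, COE_PRON_DENS and COE_TOT) reduce to simple ratios once the annotated tokens are available. The sketch below assumes a tab-separated layout with ID, FORM, LEMMA and UPOS columns and UD-style tags; the actual column layout of the .conll file is not specified here, the word lists are small subsets of the examples given in Section 3.1, and all function names are hypothetical.</p>
        <preformat><![CDATA[
```python
# Subsets of the examples listed in Section 3.1 (illustrative only).
POSSESSIVES = {"mio", "tuo"}
DEICTICS = {"qui", "qua", "lì", "fuori", "sopra", "dentro", "dietro", "giù", "oltre"}


def read_conll(text):
    """Parse tab-separated rows; assumed columns: ID, FORM, LEMMA, UPOS."""
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        tokens.append({"form": cols[1], "lemma": cols[2], "upos": cols[3]})
    return tokens


def is_reference_item(tok):
    """True for pronouns, possessives and deictics; each token counts once."""
    return (tok["upos"] == "PRON"
            or tok["lemma"].lower() in POSSESSIVES
            or tok["form"].lower() in DEICTICS)


def density_features(tokens, n_iteration_items, n_connectives):
    n = len(tokens)
    if n == 0:
        raise ValueError("no tokens for this subject")
    n_ref = sum(is_reference_item(t) for t in tokens)
    n_pron = sum(t["upos"] == "PRON" for t in tokens)
    n_noun = sum(t["upos"] == "NOUN" for t in tokens)
    return {
        "COE_REF": n_ref / n,
        "COE_PRON_DENS": n_pron / n_noun if n_noun else float("nan"),
        "COE_TOT": (n_ref + n_iteration_items + n_connectives) / n,
    }


ROWS = [
    ("1", "lui", "lui", "PRON"), ("2", "porta", "portare", "VERB"),
    ("3", "il", "il", "DET"), ("4", "mio", "mio", "DET"),
    ("5", "cane", "cane", "NOUN"), ("6", "qui", "qui", "ADV"),
    ("7", "e", "e", "CCONJ"), ("8", "questo", "questo", "PRON"),
    ("9", "piace", "piacere", "VERB"), ("10", "a", "a", "ADP"),
    ("11", "tutti", "tutto", "PRON"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

# One connective ("e") and no repeated content word in this toy sentence.
feats = density_features(read_conll(SAMPLE), n_iteration_items=0, n_connectives=1)
print(feats)
```
]]></preformat>
        <p>The counts of lexical iteration items and connectives are passed in as plain numbers so that the sketch stays independent of how those two categories are extracted.</p>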
        <p>Figure 3.3 shows, as an example, an excerpt from the
annotation in .conll format, in which some of the linguistic
elements considered are highlighted.</p>
        <p><sup>1</sup> http://connective-lex.info/</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Results</title>
      <p>The statistical significance of the cohesion features for
the binary discrimination of PG and CG cohorts was
calculated using the non-parametric Kolmogorov-Smirnov
test, due to the limited sample size of the corpus. Given
the number of comparisons performed, we adjusted the
results with Bonferroni correction to control for Type I
error. This approach involves adjusting the significance
level by dividing the conventional alpha value (0.05) by
the total number of comparisons made. The results of
the test, reported in Table 3, show that two of the
designed features contribute significantly to differentiating
the two groups: a feature related to the iteration of lemmas
(COE_RIP_LEM) and the comprehensive feature of
cohesive devices (COE_TOT). The distribution of these features
is reported in Figure 4.</p>
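      <p>The testing procedure can be sketched in plain Python as a two-sample Kolmogorov-Smirnov statistic followed by a Bonferroni-adjusted threshold. This is an illustration, not the authors’ analysis code: the number of comparisons is assumed to be ten (one per feature), the p-value uses the asymptotic Kolmogorov series with a commonly used small-sample correction and is therefore only approximate for groups of 20 subjects, and the sample values are invented.</p>
      <preformat><![CDATA[
```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    d = 0.0
    for x in list(a) + list(b):
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue_asymptotic(d, n, m):
    """Approximate two-sided p-value from the asymptotic Kolmogorov series."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d  # small-sample correction (approximation)
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the p-value is ~1
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * s))


def bonferroni_alpha(alpha, n_tests):
    """Significance level after dividing alpha by the number of comparisons."""
    return alpha / n_tests


# Invented, fully separated feature values for two groups of 20 subjects.
pg = [0.10 + 0.001 * i for i in range(20)]
cg = [0.20 + 0.001 * i for i in range(20)]

d = ks_statistic(pg, cg)
p = ks_pvalue_asymptotic(d, len(pg), len(cg))
alpha_adj = bonferroni_alpha(0.05, 10)  # assumption: ten features, ten tests
print(d, p < alpha_adj)
```
]]></preformat>
      <p>A feature is retained as significant only if its p-value falls below the adjusted level, which is why features that pass the conventional 0.05 threshold can fail after correction.</p>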
      <p>After the application of Bonferroni’s correction, two
initially significant features, namely COE_TC and
COE_MAXRIP_WORD, no longer reached significance. Given
the exploratory nature of the experiment, which involves
the formalisation of new features to discriminate
subjects with cognitive impairment from healthy
controls in Italian, we have nevertheless chosen to highlight
the p-values of these features in Table 3.</p>
      <p>Compared with the control group, the speech of dementia subjects is characterized by fewer repetitions of the same noun, verb and adjective lemmas out of the total number of words uttered, as captured by COE_RIP_LEM. In this dataset, the PG thus appears less prone to lexical iteration of lemmas than the CG. However, the distributions of the occurrences of the cohesive elements considered, reported in Table 4, show interesting trends. The quantitative analysis of lexical repetitions revealed a disparity between repeated lemmas and repeated word forms of the same grammatical categories (nouns, adjectives and verbs) across the two groups. Specifically, despite the high variability due to individual differences, in PG the average repetition of forms (mean=74.15) is higher than the repetition of lemmas (mean=68.9), while the two values are very similar in CG (lemmas: mean=87.05, words: mean=87.8). This imbalance in favor of forms in the dementia patients appears to reveal lexical impoverishment compared to healthy subjects. In CG, although a higher overall number of repetitions is registered, it is combined with a more balanced distribution between lemmas and forms, suggesting greater lexical variety.</p>
      <p>The opposing trend observed between lemmas and forms
could also be explained by the sociolinguistic profile of the
data, related to the diatopic variation of the Italian language
[41]. Indeed, speakers from both groups show an
extensive use of dialectal terms and structures characteristic of
the Italian variety spoken in the Lucanian Apennine area.
As reported in Section 2, the annotation was conducted
automatically using the pipeline developed by Gagliardi
&amp; Tamburini [32], which is designed to analyze standard
Italian. Therefore, it is likely that the system struggled
to handle some substandard expressions, which often
orthographically diverge from the other words in the
transcription, as can be observed in this example from a
PG subject:
gemm’ a trua’ [=andammo a fare visita] a mia
suocera, ca [=che] mio suocero è morto (. . . ).</p>
      <p>It is not excluded that the presence of dialect may also
have influenced the automatic extraction of other
cohesive devices. Indeed, the higher frequency in CG of
substitution-type reference items (mean=161) and
connectives (mean=36.65) compared to PG (ref. mean=146.5,
conn. mean=23.8) contrasts with what has been observed
in oral production of narrative discourse in cohorts of
dementia subjects and healthy controls [8]. Therefore,
we consider it possible that automatic feature
extraction performed on manually checked annotation may
yield results different from those obtained here.</p>
      <p>Nevertheless, the significance of the comprehensive
feature (COE_TOT) indicates that the use of cohesive
devices investigated in this paper plays a role in
distinguishing dementia subjects from healthy controls. In
Figure 4 it can be noted that COE_TOT shows, on average,
lower values for the PG compared to the CG. This result
suggests that the linguistic processing of some
phenomena related to cohesion (i.e. substitution-type reference
elements, lexical iteration items, and connectives) is
generally affected by cognitive decline in semi-spontaneous
speech. Thus, the analysis of discourse properties seems
to be a promising path for studying the linguistic
characterisation of neurodegenerative disorders. We therefore
hope that our approach can in the future be applied to
phenomena closely related to cohesion, first of all
coherence, or extended to other domains, such as pragmatics,
that may mask subtle clues of cognitive frailty.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>In this work, we present a methodology for delineating linguistic features of cohesion to track and study changes in discourse properties in the speech of individuals with cognitive impairment compared to healthy peers. The research focused on three types of cohesive devices, i.e., reference, lexical iteration, and connectives, that were automatically extracted from an Italian corpus of semi-spontaneous speech from dementia subjects and controls, collected in Basilicata. Statistical significance for binary discrimination was computed by applying the Kolmogorov-Smirnov test and then adjusting the results with Bonferroni’s method. The test shows that one feature based on the repetition of lemmas and the feature that jointly considers the whole set of cohesive devices contribute to distinguishing the two groups. Moreover, the quantitative distribution of the cohesive devices reveals differences in the use of elements within the considered categories between PG and CG, which seem to highlight a general deterioration in discursive competencies associated with dementia. The results obtained provide a preliminary basis for further study of discourse properties in cognitive decline, with the aim of expanding the set of linguistic features that can be automatically extracted to other levels of language. This expansion is intended to refine digital systems that could be employed as support for the early diagnosis and monitoring of neurodegenerative diseases, potentially improving timely interventions for patients and their caregivers.</p>
    </sec>
    <sec id="sec-5">
      <title>CRediT authorship statement declaration</title>
      <p>GA: Conceptualization, Methodology, Software (i.e. features formalization), Formal analysis, Writing (§ 1, 3, 4, 5).</p>
      <p>EM: Resources (i.e. data collection), Data curation (i.e. manual transcription), Writing (§ 2).</p>
    </sec>
    <sec id="sec-6">
      <title>References</title>
      <p>[1] American Psychiatric Association, Diagnostic and statistical manual of mental disorders: DSM-5, volume 5, American Psychiatric Association, Washington, DC, 2013.</p>
      <p>[2] E. Catricalà, P. A. Della Rosa, V. Plebani, D. Perani, P. Garrard, S. F. Cappa, Semantic feature degradation and naming performance. Evidence from neurodegenerative disorders, Brain and language 147 (2015) 58–65.</p>
      <p>[3] V. Taler, N. A. Phillips, Language performance in Alzheimer’s disease and mild cognitive impairment: a comparative review, Journal of clinical and experimental neuropsychology 30 (2008) 501–556.</p>
      <p>[4] E. A. Stamatakis, M. A. Shafto, G. Williams, P. Tam, L. K. Tyler, White matter changes and word finding failures with increasing age, PloS one 6 (2011) e14496.</p>
      <p>[5] A. E. Budson, N. W. Kowall, The handbook of Alzheimer’s disease and other dementias, John Wiley &amp; Sons, 2011.</p>
      <p>[6] S. O. Orimaye, J. S.-M. Wong, K. J. Golden, Learning predictive linguistic features for Alzheimer’s disease and related dementias using verbal utterances, in: Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From linguistic signal to clinical reality, 2014, pp. 78–87.</p>
      <p>[7] S. Carlomagno, A. Santoro, A. Menditti, M. Pandolfi, A. Marini, Referential communication in Alzheimer’s type dementia, Cortex 41 (2005) 520–534.</p>
      <p>[8] C. Drummond, G. Coutinho, R. P. Fonseca, N. Assunção, A. Teldeschi, R. de Oliveira-Souza, J. Moll, F. Tovar-Moll, P. Mattos, Deficits in narrative discourse elicited by visual stimuli are already present in patients with mild cognitive impairment, Frontiers in aging neuroscience 7 (2015) 96.</p>
      <p>[9] S. Ahmed, A.-M. F. Haigh, C. A. de Jager, P. Garrard, Connected speech as a marker of disease progression in autopsy-proven Alzheimer’s disease, Brain 136 (2013) 3727–3737.</p>
      <p>[10] T. Bschor, K.-P. Kühl, F. M. Reischies, Spontaneous speech of patients with dementia of the Alzheimer type and mild cognitive impairment, International psychogeriatrics 13 (2001) 289–298.</p>
      <p>[11] S. De la Fuente Garcia, C. W. Ritchie, S. Luz, Artificial intelligence, speech, and language processing approaches to monitoring Alzheimer’s disease: a systematic review, Journal of Alzheimer’s Disease 78 (2020) 1547–1574.</p>
      <p>[12] L. Calzà, G. Gagliardi, R. R. Favretti, F. Tamburini, Linguistic features and automatic classifiers for identifying mild cognitive impairment and dementia, Computer Speech &amp; Language 65 (2021) 101113.</p>
      <p>[13] D. Beltrami, G. Gagliardi, R. Rossini Favretti, E. Ghidoni, F. Tamburini, L. Calzà, Speech analysis by natural language processing techniques: a possible tool for very early detection of cognitive decline?, Frontiers in aging neuroscience 10 (2018) 369.</p>
      <p>[14] G. Gagliardi, F. Tamburini, Linguistic biomarkers for the detection of mild cognitive impairment, Lingue e linguaggio 20 (2021) 3–31.</p>
      <p>[15] B. S. Kim, Y. B. Kim, H. Kim, Discourse measures to differentiate between mild cognitive impairment and healthy aging, Frontiers in aging neuroscience 11 (2019) 221.</p>
      <p>[16] J. Kim, J. Shim, J. H. Yoon, Subjective rating scale for</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>