Exploring the Use of Cohesive Devices in Dementia within an Elderly Italian Semi-spontaneous Speech Corpus

Giorgia Albertin*,†, Elena Martinelli†
Alma Mater Studiorum - University of Bologna, Department of Classical Philology and Italian Studies, 32 Zamboni Street, 40126 Bologna, Italy

Abstract
The study of language disruption in dementia, aimed at identifying which features correlate with cognitive impairment, is a growing area in computational linguistic research. Still, it needs further development in analyzing some discourse phenomena that also undergo deterioration and that can help expand our understanding of dementia-related speech and refine automatic tools. This paper explores the discourse property of cohesion by investigating three types of cohesive devices: reference, lexical iteration, and connectives. Ten features related to these categories have been defined and automatically extracted from an Italian corpus of semi-spontaneous speech collected from dementia patients and healthy controls. Some of the designed features have proven significant for the binary classification of the two groups, and further quantitative analysis highlights interesting differences in the use of cohesive devices that seem to be associated with cognitive decline.

Keywords
Cohesion, Cohesive devices, Dementia, Cognitive Impairment, Semi-spontaneous speech


CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
* Corresponding author.
† The contribution of each author to the paper is specified in the CRediT authorship statement.
Email: giorgia.albertin3@unibo.it (G. Albertin); elena.martinelli12@unibo.it (E. Martinelli)
URL: https://www.unibo.it/sitoweb/giorgia.albertin3 (G. Albertin); https://www.unibo.it/sitoweb/elena.martinelli12/ (E. Martinelli)
ORCID: 0000-0002-5728-3473 (G. Albertin); 0009-0007-4399-6951 (E. Martinelli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Linguistic deficits commonly characterize neurodegenerative diseases from their onset. In Dementia, or Major Neurocognitive Disorder (DSM-5 [1]), a syndrome of acquired and progressive impairment in cognitive function that interferes with independence in everyday life, language deterioration manifests itself within a broader framework of cognitive impairment, which can also affect memory, visuo-spatial skills, executive functions and reasoning. Deficits in both verbal production and comprehension have been observed, despite the specificity of the different etiological subtypes of Dementia, among which the most common is Alzheimer’s Disease (AD), characterized by a primary impairment in episodic memory. In AD, for example, the well-established linguistic deficits include word-finding problems, such as anomia, the production of semantic paraphasias [2, 3] and the “tip-of-the-tongue” experience [4], low speech rate, poor word comprehension [5] and, as the disease worsens, a generalized simplification of syntax [6]. The discourse and pragmatic levels are also affected by cognitive decline. Errors in referential cohesion have been registered, in particular regarding the ambiguous use of pronouns [7]. Coherence is compromised, especially in spontaneous speech: the discourse shows an abundance of irrelevant details and an overt difficulty in mentioning the key concept or referring to the topic, resulting in a lack of informativeness in communication [8, 9, 10].

In recent years, speech analysis in cognitive decline has gained increasing importance in the development of low-cost and portable tools for dementia screening, also supported by the remarkable advancements in Natural Language Processing (NLP) and Machine Learning (ML) technologies [11]. The refinement of classification systems goes hand in hand with the operationalization of linguistic features computed from oral productions, which need to be adapted to different languages. Regarding Italian, the OPLON (OPportunities for active and healthy LONgevity) project [2014-2016] was devoted to the automatic extraction of an extensive group of linguistic features at the acoustic, rhythmic, readability, lexical, morpho-syntactic and syntactic levels from a speech corpus of cognitively impaired patients and healthy peers [12, 13]. Analysis of the significance of the features highlighted that the acoustic ones largely correlated with the cognitive state of the subjects [14].

Expanding the list of language levels covered to include speech properties would enrich the features used for classification and, in addition, could broaden our understanding of how cognitive decline manifests itself in verbal competence. Nevertheless, defining specific features of higher-level and complex phenomena is not trivial. Drawing inspiration from works that propose a “stratified” approach to discourse analysis, which individually considers macro-phenomena that intersect with one another [15, 16], this paper will examine cohesion, the property of the superficial form of the text to reflect its internal unity [17]. Cohesion assures continuity in dis-
Table 1
Recruitment Criteria (age; language exposure; neurological status or diagnosis; cognitive scores: MMSE, MoCA, phonemic (PF)
and semantic (SF) fluency) and Demographics (age and sex).

                                    Control Group                                Pathological Group
                                    Age > 60 years                               Age > 60 years
                                    Monolingual                                  Monolingual
                                    Italian L1                                   Italian L1
                                    Absence of neurological/sensory deficits     Clinical diagnosis of dementia
             Recruitment criteria
                                    MMSE ≥ 22                                    MMSE < 22
                                    MoCA > 19.262                                MoCA ≤ 19.262
                                    PF ≥ 17.35                                   PF < 17.35
                                    SF ≥ 7.25                                    SF < 7.25
                     Age            81 ± 6.3 (range: 63-91)                      81 ± 6.9 (range: 63-92)
                     Sex            12F, 8M                                      12F, 8M


course through a network of cohesive devices, which are mainly words or morphemes that contribute to maintaining the semantic relations occurring in the text [17]. Therefore, we propose a method to design and formalize a set of cohesion features, with the aim of observing whether they contribute to discriminating the speech of individuals with dementia from that of healthy peers. Specifically, three types of elements, which Halliday & Hasan [18] indicate among the major contributors to cohesion, were taken into consideration: reference, lexical iteration and connectives. The implementation of measures based on cohesive devices is a first step towards including discourse properties in the automatic analysis of language in cognitive decline. The study of their interaction with features of other linguistic levels is crucial to observe whether they have a positive impact on the discrimination between dementia subjects and healthy subjects. The work presented in this paper, therefore, is intended as a preliminary analysis that will serve to pursue more sophisticated ML classification in the future.

2. Corpus Description

In this study, we used the corpus collected within the project “Linguistic characteristics of the speech of elderly subjects with dementia” [20, 21], approved by the Bioethics Committee of the University of Bologna (Prot. N. 0072032/2022). The corpus consists of the oral linguistic production of 40 Italian-speaking individuals living in Basilicata, forming two groups balanced by sex and age. Although the initial objective was to balance the cohorts on education level as well, this was not possible because the information was missing from some patients’ medical records. From a sociolinguistic perspective, it is important to note in advance that some participants, albeit Italian-speaking, were also exposed to dialect systems in their lives. This aspect explains the frequent occurrence of substandard linguistic expressions in the collected speech, and will be discussed in Section 4 in relation to the results of the analysis.

The Pathological Group (PG) consists of 20 patients suffering from different forms of dementia (9 cases of Alzheimer’s Disease, 2 of Mixed Dementia, 5 of unspecified Dementia, 3 of Vascular Dementia, 1 of Frontotemporal Dementia), recruited at the “Universo Salute - Opera Don Uva (PZ)” rest home, and the Control Group (CG) consists of 20 subjects with neurotypical cognitive aging. Informed consent was obtained from all participants (in the case of patients, from their family members, caregivers, or legal guardians). As a first step, the recruited subjects underwent an evaluation of their cognitive status through the administration of the following four neuropsychological tests: Mini-Mental State Examination (MMSE [22]), Montreal Cognitive Assessment (MoCA [23]), and the Verbal Fluency Test, both Phonemic [24, 25, 26, 27] and Semantic [28]. Table 1 summarizes the recruitment criteria and the demographics of the study participants.

Figure 1: Esame del Linguaggio II [19], stimulus figure used in the picture description task.

Then, two narrative tasks (the story of a journey and the story of Christmas holiday traditions) and one picture description task (using the stimulus figure in “Lan-
Table 2
Corpus Size. Audio duration and number of tokens (of the transcriptions) are reported, both with respect to the groups (Gr.
durat. and Gr. count), to the single subject (Subj. avg (st.dev)) and to the whole corpus.

                                                      Audio                              Tokens
                                         Gr. durat.     Subj. avg. (sd)      Gr. count      Subj. avg. (sd)
                   Pathological group     04:25:26     00:12:00 (00:08:00)     23,518        1,176 (1,218)
                   Control group          03:23:17     00:10:00 (00:05:00)     25,745         1,287 (710)
                   Total                  07:48:43              -              49,263              -


guage Examination II” [19], see Figure 1) were administered to collect semi-spontaneous speech, elicited with the following stimulus sentences: 1) “Do you want to tell me about a trip you took?”; 2) “How do you usually spend Christmas day?”; 3) “Could you describe this figure to me?”. This protocol allowed the collection of approximately 9 hours of audio (i.e., 8 hours for the recruited groups and 1 hour for the interviewer), subsequently annotated at various linguistic levels. Using the ELAN software [29], the corpus was manually transcribed at the orthographic level, segmented into utterances (i.e., the reference unit of discursive analysis [30]), and annotated at the prosodic level (theoretical framework: the Language into Act Theory, L-AcT [31]). Table 2 summarizes the size of the corpus and the average material (audio/tokens) collected for each patient and control subject. The total number of tokens was calculated on the orthographic transcription of the corpus (cleaned of annotation tags) and amounts to 49,263 tokens (i.e., 23,518 for PG and 25,745 for CG). Finally, using the Gagliardi & Tamburini pipeline [32], tokenization, lemmatization, part-of-speech tagging, and syntactic parsing were automatically performed for the entire corpus.

3. Cohesive Devices’ Features

Ten features that quantify the use of cohesive devices by the speakers were designed and formalised. The features were computed for each subject, thus referring to the amount of speech produced by the single individual across the three tasks. To comprehensively address the categories of cohesive devices considered, we used the .conll file resulting from the data annotation as the input for our analysis. The automatic extraction of the features was done via Python scripts. The methodology used is described in detail in the following sections.

3.1. Reference

Reference is involved when an expression that requires interpretation by referring to something else occurs in the discourse [18]. This mechanism can be employed in both anaphoric and cataphoric uses, to refer respectively to something already known in the text or to anticipate it. Reference functions either by repetition, which can be partial (e.g., through a synonym) or total, by semantic contiguity, or by substitution with pronouns or other elements [17]. It is this last type of referential expression, based on substitution and closely linked to the textual dimension, that is investigated through the features, thus focusing on the occurrence of anaphora and cataphora.

An extensive literature review was necessary to select a relevant group of those expressions in the Italian language (see [33, 34, 35]). The group of elements collected includes pronouns, namely personal (e.g., io, tu, lei, lui), demonstrative (e.g., questo, quello), indefinite (e.g., alcuni, tutti), and possessive; possessive adjectives (e.g., mio, tuo); and deictics (e.g., fuori, sopra, avanti, qua, qui, dentro, dietro, giù, indietro, su, lì, avanti, oltre, ci). The occurrences of these groups were counted and divided by the total number of tokens per subject (COE_REF). Additionally, the pronoun density (COE_PRON_DENS), defined as the ratio between pronouns and nouns uttered [36], was computed for each subject.

3.2. Lexical iteration

According to Halliday and Hasan [18], the iteration of a lexical item is a specific use of the repetition-type referential mechanism, which acquires cohesive force on its own because it is typically used when the referent is farther away in the text. This set of features focuses on the repetition of three main open-class categories, namely nouns, (main) verbs, and adjectives. The use of words from these classes affects the richness of vocabulary, reflecting the speaker’s tendency toward lexical variation. Word-finding problems occurring in cognitive decline often manifest as difficulties in retrieving forms from the lexicon. The repetition of the same words can then occur as a sort of repair mechanism, resulting in semantically impoverished speech. Conversely, the use of some types of closed-class particles, such as prepositions and auxiliaries, is bound to the syntactic structure.

Lexical iteration features were computed by separately considering word forms and lemmas of nouns, verbs, and adjectives. These features include the
Figure 2: Example of .conll annotation. Occurrences of automatically extracted cohesive devices are framed: lui as a referential expression (note the specification PronType:Prs in the FEAT column), the repetition of the word forms and lemma of a verb (parlava - parlare) and the connectives e and quando.


repetitions of elements divided by the total number of words (COE_RIP_LEM, COE_RIP_WORD), the average number of repetitions for repeated elements (COE_MEDRIP_LEM, COE_MEDRIP_WORD), and the maximum number of repetitions over the total number of iterations (COE_MAXRIP_LEM, COE_MAXRIP_WORD).

3.3. Connectives

As defined by Ferrari [37], connectives are morphologically invariable forms (e.g., conjunctions or locutions) that explicitly indicate logical relations between parts of the text and pertain to the logical level. Elements from different grammatical classes can be used as connectives and are classified based on their function, which usually reflects their meaning (e.g., temporal, causal, additive).

To compile an extensive list of connectives, we relied on the Lexicon of Italian Connectives, LICO (http://connective-lex.info/) [38, 39]. LICO contains 173 entries, including single words (e.g., e, se, ma, infatti, quando, quindi), complex expressions (e.g., a causa di, da allora), and correlatives (e.g., da un lato ... dall’altro). Connectives are reported along with their lexical or orthographic variants, part-of-speech category, the semantic relations conveyed according to the Penn Discourse Tree Bank 3.0 schema [40], examples of usage, and alignments with connectives from other languages. One feature computes the occurrences of connectives relative to the total number of tokens per subject (COE_TC).

Finally, the last feature was designed as an attempt to capture the overall impact of the classes of cohesive devices studied in this paper in the two cohorts of corpus speakers. Therefore, the role of cohesion elements was comprehensively measured in COE_TOT by summing referential-substitute expressions, lexical iteration items and connectives, divided by the total number of words.

Figure 2 shows, as an example, an excerpt from the annotation in .conll format, in which some of the linguistic elements considered are highlighted.

Table 3
Results of the Kolmogorov-Smirnov test. The cohesive devices’ features are reported along with their p-values; significant ones are marked in bold. The p-values of features that were significant in the Kolmogorov-Smirnov test but not after Bonferroni’s correction are given in italic.

             Features                      p-value
             COE_TC                         0.33
             COE_REF                          1
             COE_REF_DENS                     1
             COE_RIP_LEM                    0.04
             COE_RIP_WORD                     1
             COE_MEDRIP_LEM                 0.81
             COE_MEDRIP_WORD                0.33
             COE_MAXRIP_LEM                   1
             COE_MAXRIP_WORD                  1
             COE_TOT                        0.04

Table 4
Frequencies of cohesive devices by subject. The average number of occurrences of substitution-type reference items, iterations of lemmas and of word forms (of nouns, adjectives and verbs) and connectives for each subject in PG and CG is reported, along with the standard deviation (in parentheses).

    Cohesive devices           PG                    CG
    Reference             146.5 (152.23)      161 (90.93)
    Iter. lemma            68.9 (68.00)      87.05 (42.25)
    Iter. word form        74.15 (74.38)      87.8 (49.25)
    Connectives            23.8 (35.15)      36.65 (26.68)

4. Results

The statistical significance of the cohesion features for the binary discrimination of the PG and CG cohorts was calculated using the non-parametric Kolmogorov-Smirnov test, due to the limited sample size of the corpus. Given the number of comparisons performed, we adjusted the results with Bonferroni correction to control for Type I error. This approach involves adjusting the significance
level by dividing the conventional alpha value (0.05) by the total number of comparisons made. The results of the test, reported in Table 3, show that two of the designed features significantly contribute to differentiating the two groups: a feature related to the iteration of lemmas (COE_RIP_LEM) and the comprehensive feature of cohesive devices (COE_TOT). The distribution of these features is reported in Figure 3.

Figure 3: Distribution plots of significantly discriminative features. COE_RIP_LEM indicates the repetitions of lemmas of nouns, adjectives and verbs and COE_TOT is a comprehensive feature of all the classes of cohesive devices considered.

After the application of Bonferroni’s correction, two initially significant features, namely COE_TC and COE_MAXRIP_WORD, no longer reached significance. Given the exploratory nature of the experiment, which involves the formalisation of new features in order to discriminate subjects with cognitive impairment from healthy controls in Italian, we have nevertheless chosen to highlight the p-values of these features in Table 3.

We can observe that, compared with the control group, the speech of dementia subjects is characterized by fewer repetitions of the same noun, verb and adjective lemmas out of the total number of words uttered, as captured by COE_RIP_LEM. Thus, in the dataset, the PG is less prone to lexical iteration of lemmas than the CG. However, if we look at the distributions of the occurrences of the cohesive elements considered, reported in Table 4, interesting trends can be noticed. Indeed, the quantitative analysis of lexical repetitions revealed a disparity between repeated lemmas and repeated word forms of the same grammatical categories (nouns, adjectives and verbs) between the two groups. Specifically, despite the high variability due to individual differences, it is observed that in PG the average repetition of forms (mean=74.15) is higher than the repetition of lemmas (mean=68.9), while the two values are very similar in CG (lemmas: mean=87.05, words: mean=87.8). This imbalance in favor of forms in the dementia patients appears to uncover lexical impoverishment compared to healthy subjects. Indeed, in CG, although a higher overall number of repetitions is registered, it is combined with a more balanced distribution between lemmas and forms, suggesting greater lexical variety.

As an additional consideration, the opposing trend observed between lemmas and forms could be explained with respect to the sociolinguistic profile of the data, related to the diatopic variation of the Italian language [41]. Indeed, speakers from both groups show an extensive use of dialectal terms and structures characteristic of the Italian variety spoken in the Lucanian Apennine area. As reported in Section 2, the annotation was conducted automatically using the pipeline developed by Gagliardi & Tamburini [32], which is designed to analyze standard Italian. Therefore, it is likely that the system struggled to handle some substandard expressions, which often diverge orthographically from the other words in the transcription, as can be observed in this example from a PG subject:

       gemm’ a trua’ [=andammo a fare visita] a mia suocera, ca [=che] mio suocero è morto (...).
       (English: “we went to visit my mother-in-law, because my father-in-law has died”)

It cannot be excluded that the presence of dialect may also have influenced the automatic extraction of other cohesive devices. Indeed, the higher frequency in CG of substitution-type reference items (mean=161) and connectives (mean=36.65) compared to PG (ref. mean=146.5, conn. mean=23.8) contrasts with what has been observed in the oral production of narrative discourse in cohorts of dementia subjects and healthy controls [8]. Therefore, we consider the possibility that automatic feature extraction preceded by manually checked annotation may yield results different from those obtained.

Nevertheless, the significance of the comprehensive feature (COE_TOT) indicates that the use of the cohesive devices investigated in this paper plays a role in distinguishing dementia subjects from healthy controls. In Figure 3 it can be noted that COE_TOT shows, on average, lower values for the PG compared to the CG. This result suggests that the linguistic processing of some phenomena related to cohesion (i.e., substitution-type reference elements, lexical iteration items, and connectives) is generally affected by cognitive decline in semi-spontaneous speech. Thus, the analysis of discourse properties seems to be a promising path for studying the linguistic characterisation of neurodegenerative disorders. Therefore, we hope that in the future our approach can be applied to phenomena strictly related to cohesion (first of all, coherence) or extended to other domains, such as pragmatics, that may mask subtle clues of cognitive frailty.
5. Conclusion

In this work, we present a methodology for delineating linguistic features
of cohesion to track and study changes in discourse properties in the speech
of individuals with cognitive impairment compared to healthy peers. The
research focused on three types of cohesive devices, i.e., reference, lexical
iteration, and connectives, that were automatically extracted from an Italian
corpus of semi-spontaneous speech from dementia subjects and controls,
collected in Basilicata. Statistical significance for binary discrimination
was computed by applying the Kolmogorov-Smirnov test and then adjusting the
results with Bonferroni's method. The test shows that one feature based on
the repetition of lemmas and the feature covering the set of cohesive devices
jointly considered contribute to distinguishing the two groups. Moreover, the
quantitative distribution of the cohesive devices reveals differences in the
use of elements within the considered categories between PG and CG, which
seem to highlight a general deterioration in discursive competencies
associated with dementia. The results obtained provide a preliminary basis
for further study of discourse properties in cognitive decline, with the aim
of extending the set of linguistic features that can be automatically
extracted to other levels of language. This expansion is intended to refine
digital systems that could be employed as support for the early diagnosis and
monitoring of neurodegenerative diseases, potentially improving timely
interventions for patients and their caregivers.
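The testing procedure summarized above (a two-sample Kolmogorov-Smirnov test per feature, followed by Bonferroni's adjustment) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature names and values are invented toy data, the helper names are hypothetical, and the p-value uses the standard asymptotic approximation of the Kolmogorov distribution rather than an exact computation (a library routine such as scipy.stats.ks_2samp would normally be preferred).

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x (this handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value for the KS statistic (an approximation, not exact)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series converges slowly here and the true value is ~1.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def compare_groups(pg, cg, alpha=0.05):
    """Test every feature and apply Bonferroni's adjustment.

    pg and cg map a feature name to the list of per-speaker values for the
    patient group and the control group. Returns, for each feature, the
    tuple (D, adjusted p-value, significant at alpha).
    """
    n_tests = len(pg)
    results = {}
    for name in pg:
        d = ks_statistic(pg[name], cg[name])
        p = ks_pvalue(d, len(pg[name]), len(cg[name]))
        p_adj = min(1.0, p * n_tests)  # Bonferroni: multiply by number of tests
        results[name] = (d, p_adj, p_adj < alpha)
    return results


# Toy values, invented for illustration only (NOT the corpus data).
toy_pg = {
    "lemma_repetition": [52, 55, 58, 61, 63, 66, 70, 74, 79, 85],
    "connective_rate": [0.10, 0.12, 0.15, 0.18, 0.20],
}
toy_cg = {
    "lemma_repetition": [20, 24, 27, 30, 33, 35, 38, 41, 44, 48],
    "connective_rate": [0.10, 0.12, 0.15, 0.18, 0.20],
}

for feature, (d, p_adj, significant) in compare_groups(toy_pg, toy_cg).items():
    print(f"{feature}: D={d:.2f}, adjusted p={p_adj:.4g}, significant={significant}")
```

Bonferroni's adjustment is deliberately conservative: with ten features tested, each raw p-value is multiplied by ten, so only features with a clearly separated distribution between the two groups remain significant.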

CRediT authorship statement declaration

GA: Conceptualization, Methodology, Software (i.e., features formalization),
Formal analysis, Writing (§ 1, 3, 4, 5).
EM: Resources (i.e., data collection), Data curation (i.e., manual
transcription), Writing (§ 2).
References                                                           tia, Computer Speech & Language 65 (2021) 101113.
                                                                [13] D. Beltrami, G. Gagliardi, R. Rossini Favretti, E. Ghi-
 [1] D. American Psychiatric Association, D. American                doni, F. Tamburini, L. Calzà, Speech analysis by
     Psychiatric Association, et al., Diagnostic and statis-         natural language processing techniques: a possible
     tical manual of mental disorders: DSM-5, volume 5,              tool for very early detection of cognitive decline?,
     American psychiatric association Washington, DC,                Frontiers in aging neuroscience 10 (2018) 369.
     2013.                                                      [14] G. Gagliardi, F. Tamburini, Linguistic biomark-
 [2] E. Catricalà, P. A. Della Rosa, V. Plebani, D. Perani,          ers for the detection of mild cognitive impairment,
     P. Garrard, S. F. Cappa, Semantic feature degrada-              Lingue e linguaggio 20 (2021) 3–31.
     tion and naming performance. evidence from neu-            [15] B. S. Kim, Y. B. Kim, H. Kim, Discourse measures
     rodegenerative disorders, Brain and language 147                to differentiate between mild cognitive impairment
     (2015) 58–65.                                                   and healthy aging, Frontiers in aging neuroscience
 [3] V. Taler, N. A. Phillips, Language performance in               11 (2019) 221.
     alzheimer’s disease and mild cognitive impairment:         [16] J. Kim, J. Shim, J. H. Yoon, Subjective rating scale for
     discourse: Evidence from the efficacy of subjective       [28] H. Spinnler, G. Tognoni, Standardizzazione
     rating scale in amnestic mild cognitive impairments,           e taratura italiana di test neuropsicologici:
     Medicine 98 (2019) e14041.                                     gruppo italiano per lo studio neuropsicologico
[17] A. Ferrari, Linguistica del testo, Principi, fenomeni,         dell’invecchiamento, Masson Italia periodici,
     strutture, Roma, Carocci (2014).                               Milano, 1987. Supplementum 8 - Italian journal of
[18] M. A. K. Halliday, R. Hasan, Cohesion in english,              neurological sciences.
     Routledge, 2014.                                          [29] ELAN (version 6.2) [computer software], 2021. URL:
[19] P. Ciurli, P. Marangolo, A. Basso, Esame del Lin-              https://archive.mpi.nl/tla/elan.
     guaggio II. Manuale e materiale d’esame, Giunti,          [30] J. L. Austin, How to do things with words, Claren-
     Firenze, 1996.                                                 don Press, Oxford, 1962.
[20] E. Martinelli, V. Garrammone, F. Mori, I. Nolè,           [31] E. Cresti, M. Moneglia, The illocutionary basis of
     F. Cameriero, M. Martino, G. Di Bello, G. Gagliardi,           information structure: The language into act theory
     DemCorpus-basilicata: Dementia corpus, 2022.                   (l-act), in: E. Adamou, et al. (Eds.), Information
     URL: http://hdl.handle.net/20.500.11752/OPEN-989,              Structure in Lesser-described Languages: Studies
     ILC-CNR for CLARIN-IT repository hosted at Insti-              in prosody and syntax, John Benjamins Publishing
     tute for Computational Linguistics "A. Zampolli",              Company, Amsterdam, 2018, pp. 360–402.
     National Research Council, in Pisa.                       [32] G. Gagliardi, F. Tamburini, The automatic extrac-
[21] E. Martinelli, G. Gagliardi,          Compromissioni           tion of linguistic biomarkers as a viable solution for
     semantico-lessicali nei pazienti italofoni affetti da          the early diagnosis of mental disorders, in: N. Calzo-
     demenza: un’analisi corpus-based, ITALIANO                     lari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. De-
     LINGUADUE 15 (2023) 711–732. doi:10.54103/                     clerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani,
     2037-3597/21986.                                               H. Mazo, J. Odijk, S. Piperidis (Eds.), Proceedings
[22] E. Magni, G. Binetti, A. Bianchetti, R. Rozzini,               of the Thirteenth Language Resources and Evalu-
     M. Trabucchi, Mini-mental state examination: a                 ation Conference, European Language Resources
     normative study in italian elderly population, Eu-             Association, Marseille, France, 2022, pp. 5234–5242.
     ropean Journal of Neurology 3 (1996). URL: https:              URL: https://aclanthology.org/2022.lrec-1.561.
     //api.semanticscholar.org/CorpusID:24843663.              [33] M. Prandi, C. De Santis, Le regole e le scelte, In-
[23] S. Conti, S. Bonazzi, M. Laiacona, M. Masina,                  troduzione alla grammatica italiana, UTET, Torino
     M. V. Coralli, Montreal cognitive assessment                   (2006).
     (moca)-italian version: regression based norms            [34] C. Andorno, Linguistica testuale. Un’introduzione,
     and equivalent scores, Neurological Sciences 36                Carocci, 2003.
     (2015) 209–214. URL: https://api.semanticscholar.         [35] A. Ferrari, L. Zampese, Dalla frase al testo: una
     org/CorpusID:3026657.                                          grammatica per l’italiano, Zanichelli, 2000.
[24] C. Caltagirone, G. Gainotti, G. Carlesimo, L. Par-        [36] M. M. Louwerse, P. M. McCarthy, D. S. McNamara,
     netti, L. Fadda, R. Gallassi, et al., Batteria per la          A. C. Graesser, Variation in language and cohesion
     valutazione del deterioramento mentale (parte I):              across written and spoken registers, in: Proceed-
     descrizione di uno strumento di diagnosi neurop-               ings of the Annual Meeting of the Cognitive Science
     sicologica, Archivio di Psicologia, Neurologia e               Society, volume 26, 2004.
     Psichiatria 56 (1995) 461–470.                            [37] A. Ferrari, Connettivi, Enciclopedia dell’italiano
[25] G. A. Carlesimo, C. Caltagirone, G. Gainotti, et al.,          (2010).
     Batteria per la valutazione del deterioramento men-       [38] A. Feltracco, E. Ježek, B. Magnini, Enriching a lex-
     tale (parte II): standardizzazione e affidabilità di-          icon of discourse connectives with corpus-based
     agnostica nell’identificazione di pazienti affetti da          data, in: Proceedings of the Eleventh International
     sindrome demenziale, Archivio di Psicologia, Neu-              Conference on Language Resources and Evaluation
     rologia e Psichiatria 56 (1995) 471–488.                       (LREC 2018), 2018.
[26] G. Carlesimo, C. Caltagirone, L. Fadda, et al., Batte-    [39] A. Feltracco, E. Jezek, B. Magnini, M. Stede, Lico:
     ria per la valutazione del deterioramento mentale              A lexicon of italian connectives, CLiC it (2016) 141.
     (parte III): analisi dei profili qualitativi di compro-   [40] B. Webber, R. Prasad, A. Lee, A. Joshi, A discourse-
     missione cognitiva, Archivio di Psicologia, Neu-               annotated corpus of conjoined vps, in: Proceedings
     rologia e Psichiatria 56 (1995) 489–502.                       of the 10th Linguistic Annotation Workshop held
[27] G. A. Carlesimo, C. Caltagirone, G. Gainotti, et al.,          in conjunction with ACL 2016 (LAW-X 2016), 2016,
     The mental deterioration battery: Normative data,              pp. 22–31.
     diagnostic reliability and qualitative analyses of cog-   [41] G. Berruto, Sociolinguistica dell’italiano contempo-
     nitive impairment, European Neurology 36 (1996)                raneo, Roma: Carocci (2021).
     378–384.