=Paper= {{Paper |id=Vol-2769/65 |storemode=property |title=The AEREST Reading Database |pdfUrl=https://ceur-ws.org/Vol-2769/paper_65.pdf |volume=Vol-2769 |authors=Marcello Ferro,Sara Giulivi,Claudia Cappa |dblpUrl=https://dblp.org/rec/conf/clic-it/FerroGC20 }} ==The AEREST Reading Database== https://ceur-ws.org/Vol-2769/paper_65.pdf

The AEREST Reading Database

Marcello Ferro, Istituto di Linguistica Computazionale, ILC-CNR Pisa, Italy, marcello.ferro@ilc.cnr.it
Sara Giulivi, Scuola Professionale della Svizzera Italiana, SUPSI Locarno, Switzerland, sara.giulivi@supsi.ch
Claudia Cappa, Istituto di Fisiologia Clinica, IFC-CNR Pisa, Italy, claudia.cappa@cnr.it

Abstract

Aerest is a reading assessment protocol for the concurrent evaluation of a child’s decoding and comprehension skills. Reading data complying with the Aerest protocol were automatically collected and structured with the ReadLet web-based platform in a pilot study, to form the Aerest Reading Database. The content, structure and potential of the database are described here, together with the main directions of current and future developments.

Aerest is a reading assessment protocol that measures decoding ability and text comprehension in parallel. The protocol was applied in a pilot study whose data were collected through the ReadLet web platform. The article describes the content, structure and potential of the resulting data set, together with future directions of development.

Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

In the PISA 2000 report (OECD, 2003), a distinction is introduced between the concept of “reading literacy” as opposed to “reading”, the latter being restricted to the ability of decoding or reading aloud, the former including a much wider and more complex range of cognitive and meta-cognitive competencies: decoding, vocabulary, grammar, mastery of larger linguistic and textual structures and features, knowledge about the world, but also use of appropriate strategies necessary to process a text (p. 23). In the PISA 2019 report (OECD, 2019) “reading literacy” is defined as “an individual’s capacity to understand, use, evaluate, reflect on and engage with texts in order to achieve one’s goals, develop one’s knowledge and potential, and participate in society”, and as the “range of cognitive and linguistic competencies, from basic decoding to knowledge of words, grammar and the larger linguistic and textual structures needed for comprehension, as well as integration of meaning with one’s knowledge about the world” (p. 28). Achieving reading literacy is crucial for an individual’s participation in society and ultimately for their realization in academic contexts, in the workplace or, more generally, in life.

To achieve reading literacy, pupils need first and foremost to be able to read accurately, understand what they read, and do this in a reasonably small amount of time. This multifaceted ability is defined here as “reading efficiency”. Efficient reading in turn implies the development of deep comprehension skills in the reader. Comprehension is a complex construct that requires coordination and processing of several cognitive abilities at word, sentence, and text level (Perfetti et al., 2005; Padovani, 2006), including, but not limited to, building coherent semantic representations of what is being read (Nation and Snowling, 2000), making lexical and semantic inferences, using reading strategies, and activating metacognitive control (Carretti et al., 2002).

When it comes to assessment, the complexity described above is not given due consideration, and this is one of the reasons why most currently available protocols are inadequate. These protocols often measure comprehension performance (in a way the “product” of reading comprehension) without considering the underlying processes, or treat those processes as if they were independent rather than in interaction with one another. In addition, reading comprehension tests often tend to be used interchangeably, while they actually
measure different skills or processes and are not really comparable to one another (Colenbrander et al., 2017; Keenan et al., 2008; Cutting and Scarborough, 2006; Calet et al., 2020; Joshi, 2019). Finally, most currently available reading assessment tools fail to focus on reading efficiency, as they normally measure decoding and reading comprehension separately. As a result, children who have difficulties in integrating these abilities are not identified.

The AEREST protocol for reading assessment was designed and developed to fill this gap, by testing student skills in three tasks: reading aloud, silent reading, and listening comprehension. In the last two conditions, the student’s comprehension of the text being read or heard is assessed through a questionnaire. Only in the reading aloud condition can the text also contain non-words.

In 2019, AEREST was tested in schools located in Southern Tuscany (Italy) and in the Canton of Ticino (Switzerland), involving a total of 433 children, from the 3rd grade of the Italian primary school through to the first grade of the Italian middle school (6th grade). The protocol was automatically administered using a prototype version of ReadLet (Ferro et al., 2018a; Ferro et al., 2018b), a web-based platform that records large streams of time-aligned, multimodal reading data.

2 ReadLet

The ReadLet platform monitors and records a user’s behaviour during the execution of various reading tasks. It includes a central repository and a set of web applications, background services for pre- and post-processing analysis, and query tools.

The ReadLet endpoint is an ordinary tablet running a web application which is responsible for the administration of the reading protocol. The ReadLet app overrides most of the actions taken by a tablet to respond to typical touch events on the screen (tapping, scrolling etc.), which is needed to allow a reader to slide a finger across the text displayed on the touchscreen as one would normally do on a printed text on paper.

The child is asked to read a short story displayed on the tablet screen either silently or aloud, and to finger-point to the text while reading. The story is displayed on the tablet one page at a time and the child is free to flip the pages back and forth. During each reading session, the audio stream is recorded along with the time-stamped touch events caused by the interaction of the user with the touchscreen. At the end of a session, all data are sent to the central repository, ready for post-processing and for further analysis. In the listening task, ReadLet provides an audio player playing a pre-recorded story. As the user finishes reading or listening, a multiple-choice questionnaire is presented one question at a time. In answering each question, the reader/listener can go back to the full text or replay the audio, and search for relevant information.

Captured data are recorded, anonymized, and encrypted locally by the application, and sent to a remote server: i) the user information along with the session settings; ii) the text disposition and layout on the screen; iii) the audio stream (i.e. the user’s voice while reading aloud); iv) the time-stamped finger interaction during the reading task and in filling the questionnaire; v) the timing of the answers to each question, along with possible self-corrections. ReadLet is equipped with tools for the automated linguistic analysis of texts. The tools, together with a finger-tracking-to-text alignment module, make it possible to capture the user’s finger-tracking behaviour (e.g. forward tracking, regressions, tracking pauses) and the time spent on the text for different text unit levels (page, paragraph, sentence, token, syllable, morpheme, n-gram, letter) and different linguistic levels (e.g. morphological, lexical, syntactic). Furthermore, the ReadLet speech-to-text alignment module (currently under development) will allow the automatic assessment of decoding accuracy during reading-aloud sessions, by analysing hesitations, reading errors, and self-corrections.

3 The AEREST protocol

As already mentioned, the AEREST protocol was created to provide teachers and education professionals with an accurate, non-invasive, child-friendly assessment tool that could identify the full range of students with low reading efficiency. Current protocols usually fail to identify students who do well in the single abilities underlying reading when these are assessed one at a time, but who struggle to integrate those abilities. The AEREST protocol, by contrast, allows identification of all children manifesting difficulties, and in so doing favors access to specifically tailored enhancement training programs for all those who may need them.
The AEREST assessment protocol includes three
tasks: 1. Reading comprehension; 2. Listening comprehension; 3. Decoding.

3.1 Reading comprehension

In order to carry out this task, subjects are provided with a tablet displaying a story that contains narrative as well as descriptive parts. The texts used for comprehension assessment are based on existing stories written by well-known authors and modified by adding or cutting out text, in order to achieve two main objectives.

The first objective is to obtain a balanced mixture of narrative and descriptive text. In our opinion, this reflects more closely the kind of texts we normally encounter in life, which are hardly ever purely descriptive or purely narrative. Keeping this separation (as most reading assessment tools actually do) would lead, in our opinion, to a less ecological way of assessing reading comprehension.

The second objective is to obtain a text that would allow assessment of all (or most of) the cognitive processes involved in reading comprehension (this is usually not found in other assessment tools currently available). This is made possible through 15 comprehension questions that engage subjects in:

1. retrieving the general content of the text;
2. identifying specific information in the text (who/what/where/when/...). Usually 4 questions out of 15 concern this kind of information;
3. identifying temporal relations;
4. identifying cause-effect and sequential relations;
5. making inferences of different kinds;
6. retrieving information from syntactic structure (for example understanding if some event in the story has actually happened or not, based on the verb tenses used by the author);
7. forming mental representations (in general, subjects are prompted with 4 different images of a character or situation in the story and are asked to determine which image corresponds to what they have read);
8. spotting incongruities and errors;
9. retrieving word meaning from context;
10. identifying text register and style;
11. identifying text type.

For each question, the subject can choose among four different answers, out of which only one is correct.

Before starting the task, children are told that they have no time limit. Subjects are instructed to read the story silently from beginning to end, always pointing their finger to the text being read. Once they reach the end of the story, they are prompted with 15 comprehension questions. These are displayed, one at a time, on the bottom part of the screen, while the text is available in the top part. They can re-read the text, or chunks of it, as many times as they want, by scrolling up and down the text on the screen.

Analysing the responses to the comprehension questions, built as described above, makes it possible to understand which of the processes underlying comprehension are leveraged by the subject and which ones are not efficient and need support through specific, personalised training.

In order to consider comprehension abilities independently of decoding skills (which may be weaker in some subjects, for example in children with dyslexia), the listening comprehension test described below was included in the protocol.

3.2 Listening comprehension

As with the reading comprehension task, subjects are given a tablet and headphones for story listening. After hearing the whole story for the first time, children start answering comprehension questions one by one, upon hearing them through their headphones and reading them on the tablet’s screen. In order to reduce the child’s working memory load, some of the questions are asked only after the text passage containing the relevant information is heard for the second time.

3.3 Reading aloud

In this task, children are asked to read aloud stories with a similar narrative structure. At the end of each story, one of the story characters (typically with some kind of supernatural powers: an alien, a witch, etc.) starts speaking an unknown language, which consists of non-words following the phonology and morpho-syntax of Italian, and some Italian function words. We include here an example of text used for this task.

E come se stesse leggendo su quel vetro, rivelò a Lucilla la ricetta della segretissima pozione: “Prendi una sirta mellusa e gafala in un tulo. Spisola una rifa e lubica una buva. Non zudugnare e non tapire le vughe. Quita le puggie, zuba i mumini e ralla un tifurno.”

(The opening sentence reads, in English: “And, as if reading from that glass, [the character] revealed to Lucilla the recipe for the top-secret potion”. The quoted recipe consists of non-words and Italian function words.)

The administrator takes notes on the subject’s errors, hesitations and self-corrections throughout the task. Meanwhile, the subject’s performance is also recorded by the tablet. In addition, as for the reading comprehension task, children are instructed to always finger-point to the text being read. The child’s reading score is then calculated by subtracting 1 point for each spelling error, 0.5 points for each word stress error, and 0.5 points for each self-correction. No points or fractions of a point are subtracted for hesitations, as they already have an impact on reading time.

4 Data structure

Data are stored at different levels. Texts are pre-processed with NLP tools (Dell’Orletta et al., 2011) for text tokenization, POS tagging, dependency parsing, readability analysis, syllabification, n-gram splitting, and, finally, frequency information by means of a reference corpus.

Session settings are stored to include metadata such as the administrator identifier, user information (a unique identifier, child’s affiliation and grade level, possible annotations), the text being read and its layout (e.g. margins, font size and family, letter and line spacing), and task type (i.e. silent reading, reading aloud, or listening comprehension).

At the end of each session, all recorded data are sent to a remote server. Basic data include information about the tablet (e.g. the user agent string, the screen resolution), and time-stamps of the beginning and end of the reading task and of questionnaire answering. More detailed data include the disposition of the text on the tablet screen (i.e. coordinates of the bounding box of each letter), touchscreen events (i.e. event type, time-stamp, and finger coordinates), the audio stream (sampled at 48 kHz stereo and compressed in MP3 format at 128 kbps), and answers to the questionnaire and their timing.

Post-processing tools enrich stored data offline. A finger-tracking-to-text alignment algorithm binds touchscreen events over time to the text layout at the character level. This is done by creating two black and white images and performing a convolution operation over them: the first image represents the text disposition on the screen, where each line is rendered as a filled black rectangle on a white background; the second represents the user finger-tracking over time, where each segment between a touch-begin and a touch-end event is rendered as a black rectangle on a white background. During the execution of the convolution operation, the vertical and horizontal offsets which maximize the overlap of the black areas within the two images indicate the optimal alignment to be taken into account. Such binding allows for subsequent modelling and evaluation of the reading dynamic, as well as for measurement of the reading time at different levels of granularity: from single letters and syllables through to sentences, and whole pages or documents.

5 Collected Data

In 2019, the AEREST protocol was administered to a total of 433 students. A total of 12 narrative texts were used, one for each of the four grade levels and the three assessment tasks. Details of participants and texts are reported respectively in Tables 1 and 2.

         Italy                    Switzerland
Grade    N          Age           N          Age
3        78 (13)    8.6 (0.4)     22 (4)     8.8 (0.4)
4        71 (14)    9.6 (0.3)     21 (2)     9.7 (0.5)
5        94 (25)    10.6 (0.4)    23 (2)     10.7 (0.4)
6        54 (6)     11.5 (0.4)    70 (2)     11.9 (0.4)
TOT      297 (58)   10.0 (1.1)    136 (10)   10.9 (1.3)

Table 1: Sample size (number of children with disorders between brackets) and mean age (standard deviation between brackets) of the participants involved in the study, across grades (from the 3rd to the 6th grade level) and countries (Italy and Switzerland).

         silent    aloud                listening
Grade    words     words    nonwords    words
3        588       177      53          572
4        750       180      74          527
5        951       216      80          941
6        711       352      83          734

Table 2: Number of tokens in the texts administered during the study, across grades (from the 3rd to the 6th grade level) and decoding conditions (silent reading, reading aloud, and listening).
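The finger-tracking-to-text alignment described in Section 4 can be illustrated with a small sketch. It is not the ReadLet implementation: each image is represented as a set of black pixels rather than a bitmap, the convolution is replaced by an exhaustive search over a small window of offsets, the time dimension of the finger trace is ignored, and all names (`rect_pixels`, `best_offset`, `max_shift`) and coordinates are hypothetical.

```python
def rect_pixels(rects):
    """Return the set of (row, col) pixels covered by filled rectangles.

    Each rectangle is given as (top, left, height, width).
    """
    return {
        (r, c)
        for top, left, height, width in rects
        for r in range(top, top + height)
        for c in range(left, left + width)
    }


def best_offset(text_px, finger_px, max_shift):
    """Find the (dy, dx) shift of the finger image that maximizes
    the number of black pixels it shares with the text image."""
    best, best_score = (0, 0), -1
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = {(r + dy, c + dx) for r, c in finger_px}
            score = len(text_px & shifted)
            if score > best_score:
                best, best_score = (dy, dx), score
    return best, best_score


# Two text lines, each rendered as a 4 x 40 black rectangle (made-up layout).
text_px = rect_pixels([(10, 5, 4, 40), (20, 5, 4, 40)])
# The same two strokes as traced by the finger, displaced 3 px down and 2 px left.
finger_px = rect_pixels([(13, 3, 4, 40), (23, 3, 4, 40)])

offset, overlap = best_offset(text_px, finger_px, max_shift=5)
print(offset, overlap)  # (-3, 2) 320
```

The returned offset is the correction to apply to the finger coordinates before binding touch events to letters. A real implementation would presumably use an image-based convolution or cross-correlation for speed; the exhaustive search is used here only because it makes the "maximize the overlap" criterion explicit.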
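The reading-aloud scoring rule of Section 3.3 amounts to a weighted count of errors. A minimal sketch follows; the starting score `max_score` is an assumption, since the text only states the deductions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Deductions stated in Section 3.3.
SPELLING_PENALTY = 1.0
STRESS_PENALTY = 0.5
SELF_CORRECTION_PENALTY = 0.5


@dataclass
class ReadingAloudErrors:
    """Counts noted by the administrator during a reading-aloud session."""
    spelling: int = 0
    stress: int = 0
    self_corrections: int = 0
    hesitations: int = 0  # recorded, but not penalized (they affect reading time)


def penalty(errors: ReadingAloudErrors) -> float:
    """Total points deducted; hesitations are deliberately left out."""
    return (
        SPELLING_PENALTY * errors.spelling
        + STRESS_PENALTY * errors.stress
        + SELF_CORRECTION_PENALTY * errors.self_corrections
    )


def reading_score(max_score: float, errors: ReadingAloudErrors) -> float:
    """Score after deductions; max_score is an assumed starting value."""
    return max_score - penalty(errors)


errors = ReadingAloudErrors(spelling=3, stress=2, self_corrections=1, hesitations=4)
print(penalty(errors))               # 4.5
print(reading_score(100.0, errors))  # 95.5
```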
6 Results and discussion

Tablets proved to be easy to use and well accepted devices, extremely instrumental and accurate for data collection with toddlers and older children (Frank et al., 2016; Semmelmann et al., 2016). Tablet data confirmed high standards of ecological validity, and a high correspondence with data collected with other, more traditional tools (e.g. eye-tracking, see Lio et al. (2019)) and protocols. Within the present work, the collected data allowed for the evaluation of the decoding and comprehension skills of the children involved in the study. For each grade level, Aerest decoding performance, expressed in syllables per second, was shown to be in line with more classical reading assessment reports (Cornoldi et al., 2010), for both words and non-words. Furthermore, the use of finger tracking allowed for the validation of the correlation of the time spent on each word with basic features such as frequency and length: statistical analysis with linear mixed-effect models shows a highly significant correlation (p<0.0001), thus confirming the reliability of the adopted technique.

Decoding and comprehension performance scores are shown in Fig. 1. Data are normalized for each grade level group, so that all data groups can be overlapped on the same plot: the data belonging to each group were divided by the median value of the control children in that group. In this way data can be compared graphically: a value of 0.5 corresponds to half the reference performance of control children, a value of 1 to the reference performance itself, and a value of 2 to twice the reference performance.

[Figure 1: Reading Efficiency Plane (REP). Horizontal axis: decoding (normalized syllables per second), from 0 to 3. Vertical axis: comprehension (normalized questionnaire accuracy), from 0 to 2.]

Figure 1: Reading Efficiency Plane for the reading comprehension task (silent reading and comprehension questions). The decoding performance on silent reading (expressed as the normalized syllables per second) is shown on the horizontal axis, while the comprehension performance (expressed as the normalized questionnaire accuracy) is shown on the vertical axis. For each grade level group (from the 3rd to the 6th grade level), the two measures are normalized on the basis of the performance of children with typical reading development. Each child is represented by a digit marker indicating the grade level. Typically and atypically developing readers are shown respectively in gray and black.

7 Conclusions and future work

The AEREST protocol was shown to be effective in characterizing the decoding and comprehension performance of children of late primary school and early middle school in text reading tasks. Results are clear and encouraging, opening the way to further, more detailed, dynamic, and multimodal analysis. Completion of the current AEREST protocol with a second battery of tests is foreseen in the near future. This will provide schools with two different test batteries, to be used for assessment at the beginning and end of the school year, for adequate monitoring of pupils’ reading and reading comprehension skills. A version of the protocol conceived for clinical contexts is also foreseen, as well as translation and adaptation of the protocol to languages other than Italian.

The collected data will be assembled in a multimodal linguistic resource and made freely available to the scientific community.

Acknowledgments

This work was supported by the Swiss grant “AEREST: An Ecological Reading Efficiency Screening Tool” (2017-2020) funded by the Department of Teaching and Learning of the University of Applied Sciences and Arts of Southern Switzerland (SUPSI), and by the Italian project “(Bio-)computational models of language usage” (2018-) funded by the Italian National Research Council (DUS.AD016.075.004, ILC-CNR).

A special thanks goes to all schools that took part in the study, in particular: Ist. Comprensivo of Manciano-Capalbio (Grosseto, Italy), elementary school of Novaggio (Ticino, Switzerland), lower
secondary school of Bedigliora (Ticino, Switzerland).

References

Nuria Calet, Rocío López-Reyes, and Gracia Jiménez-Fernández. 2020. Do reading comprehension assessment tests result in the same reading profile? A study of Spanish primary school children. Journal of Research in Reading, 43:98–115.

Barbara Carretti, Cesare Cornoldi, and Rossana De Beni. 2002. Il disturbo specifico di comprensione del testo scritto. In S. Vicari and M.C. Caselli, editors, I disturbi dello sviluppo: neuropsicologia clinica e ipotesi riabilitative, pages 169–189. Il Mulino, Bologna.

Danielle Colenbrander, Lyndsey Nickels, and Saskia Kohnen. 2017. Similar but different: differences in comprehension diagnosis on the Neale Analysis of Reading Ability and the York Assessment of Reading for Comprehension. Journal of Research in Reading, 40(4):403–419.

Cesare Cornoldi, Patrizio E. Tressoldi, and Nicoletta Perini. 2010. Valutare la rapidità e la correttezza della lettura di brani. Nuove norme e alcune chiarificazioni per l’uso delle prove MT. Dislessia, 7:89–101.

Laurie E. Cutting and Hollis S. Scarborough. 2006. Prediction of reading comprehension: Relative contributions of word recognition, language proficiency, and other cognitive skills can depend on how comprehension is measured. Scientific Studies of Reading, 10(3):277–299.

Felice Dell’Orletta, Simonetta Montemagni, and Giulia Venturi. 2011. READ–IT: Assessing readability of Italian texts with a view to text simplification. In Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies, pages 73–83.

Marcello Ferro, Claudia Cappa, Sara Giulivi, Claudia Marzi, Franco Alberto Cardillo, and Vito Pirrelli. 2018a. ReadLet: an ICT platform for the assessment of reading efficiency in early graders. page 61, Edmonton, Alberta (Canada), 25-29 September, 2018. 11th International Conference on the Mental Lexicon.

Marcello Ferro, Claudia Cappa, Sara Giulivi, Claudia Marzi, Ouaphae Nahli, Franco Alberto Cardillo, and Vito Pirrelli. 2018b. Readlet: Reading for understanding. In 2018 IEEE 5th International Congress on Information Science and Technology (CiSt), pages 1–6.

Michael C. Frank, Elise Sugarman, Alexandra C. Horowitz, Molly L. Lewis, and Daniel Yurovsky. 2016. Using tablets to collect data from young children. Journal of Cognition and Development, 17(1):1–17.

R. Malatesha Joshi. 2019. Componential model of reading (CMR): Implications for assessment and instruction of literacy problems. In D. A. Kilpatrick, R. M. Joshi, and R. K. Wagner, editors, Reading development and difficulties, pages 3–18. Springer, Dordrecht (The Netherlands).

Janice M. Keenan, Rebecca S. Betjemann, and Richard K. Olson. 2008. Reading comprehension tests vary in the skills they assess: Differential dependence on decoding and oral comprehension. Scientific Studies of Reading, 12(3):281–300.

Guillaume Lio, Roberta Fadda, Giuseppe Doneddu, Jean-René Duhamel, and Angela Sirigu. 2019. Digit-tracking as a new tactile interface for visual perception analysis. Nature Communications, 10(5392):1–13.

Kate Nation and Maggie J. Snowling. 2000. Factors influencing syntactic awareness skills in normal readers and poor comprehenders. Applied Psycholinguistics, 21(2):229–241.

OECD. 2003. Learners for life. Student approaches to learning. Results from PISA 2000. https://doi.org/10.1787/9789264103917-en, OECD Publishing, Paris.

OECD. 2019. Assessment and analytical framework. https://doi.org/10.1787/b25efab8-en, OECD Publishing, Paris.

Roberto Padovani. 2006. La comprensione del testo scritto in età scolare. Una rassegna sullo sviluppo normale e atipico. Psicologia clinica dello sviluppo, x(3):369–398.

Charles A. Perfetti, Nicole Landi, and Jane Oakhill. 2005. The acquisition of reading comprehension skill. In M. J. Snowling and C. Hulme, editors, The science of reading: a handbook, chapter 13, pages 227–247. Blackwell, Oxford.

Kilian Semmelmann, Marisa Nordt, Katharina Sommer, Rebecka Röhnke, Luzie Mount, Helen Prüfer, Sophia Terwiel, Tobias W Meissner, Kami Koldewyn, and Sarah Weigelt. 2016. U Can Touch This: How Tablets Can Be Used to Study Cognitive Development. Frontiers in Psychology, 7:1021, July.