<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>PRELEARN @ EVALITA 2020: Overview of the Prerequisite Relation Learning Task for Italian</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chiara Alzetta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessio Miaschi</string-name>
          <email>alessio.miaschi@phd.unipi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felice Dell'Orletta</string-name>
          <email>felice.dellorletta@ilc.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frosina Koceva</string-name>
          <email>frosina.kocevag@edu.unige.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ilaria Torre</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DIBRIS, Università degli Studi di Genova</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Prerequisite Relation Learning (PRELEARN) task is the EVALITA 2020 shared task on concept prerequisite learning, which consists of classifying prerequisite relations between pairs of concepts, distinguishing prerequisite pairs from non-prerequisite pairs. Four sub-tasks were defined: two of them specify the types of features that participants are allowed to use when training their models, while the other two define the classification scenarios where the proposed models were tested. In total, 14 runs were submitted by 3 teams comprising 9 individual participants in total.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>The present paper provides an overview of the
systems participating in PRELEARN, the first shared
task on automatic prerequisite learning between
educational concepts.</p>
      <p>
        In the past decades we have witnessed a great
revolution in the field of Education: technological
advances have drastically transformed teaching
methods and learning settings, thanks to the rise
of e-learning platforms and electronic educational
materials. While these have so far been used mainly
in lifelong learning, the current pandemic has made
it very clear that distance learning is a valuable
resource at all educational levels. This new era in
education is commonly referred to as Education 4.0
        <xref ref-type="bibr" rid="ref12 ref19 ref20">(Saxena et al.,
2017; Hussin, 2018; Salmon, 2019)</xref>
        and its main
novelty is to put students at the core of every
learning activity, promoting the mission of fostering
and improving personalisation techniques. While
there is still much work to do to develop usable
and scalable personalisation systems, much of the
attention has been devoted to building and testing
the building blocks of such applications.
      </p>
      <p>Copyright © 2020 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).</p>
      <p>The massive use of distance learning platforms
has shed light on the need for
intelligent agents able to support both students and
teachers by, e.g., automatically identifying
educational relations between learning concepts.
Educational resources are designed to guide students
through learning paths consisting of concepts
related to each other. Among all pedagogical
relations, the prerequisite relation is the most fundamental, since it
establishes which sequence of concepts allows
students to gain a full understanding of the domain.
In fact, the order in which concepts are presented
to the learner plays a crucial role in avoiding
students’ frustration and misunderstandings when
approaching a new topic, so teachers are very careful
to organise the content of their learning materials
accordingly and to highlight relevant connections
to their students. Doing this automatically is still
challenging from many perspectives.</p>
      <p>
        The NLP community has tackled automatic
prerequisite learning in the past with the goal of
integrating prerequisite relations in systems for, e.g.,
curriculum planning
        <xref ref-type="bibr" rid="ref2">(Agrawal et al., 2016)</xref>
        ,
reading list generation
        <xref ref-type="bibr" rid="ref11 ref9">(Gordon et al., 2017; Fabbri et
al., 2018)</xref>
        , automatic assessment
        <xref ref-type="bibr" rid="ref2 ref22 ref8">(Wang and Liu,
2016)</xref>
        and automatic educational content creation
        <xref ref-type="bibr" rid="ref15">(Lu et al., 2019)</xref>
        . Wikipedia is rightfully
considered a rich and freely available resource for
training and testing educational applications, and
this is also true in the case of prerequisite
learning systems, which are often evaluated against
manually annotated prerequisite relations between
Wikipedia pages
        <xref ref-type="bibr" rid="ref10 ref21 ref23">(Talukdar and Cohen, 2012;
Gasparetti et al., 2018; Zhou and Xiao, 2019)</xref>
        .
      </p>
      <p>
        Based on the works available in the
literature, we distinguish prerequisite learning systems
into two main categories: 1) those based on
relational metrics and 2) those based on machine
learning approaches. Relational metrics are designed
to capture the strength of the relation between
co-occurring concepts and to identify pairs of
concepts obtaining low values as non-prerequisites.
The RefD metric
        <xref ref-type="bibr" rid="ref13">(Liang et al., 2015)</xref>
        is possibly
the most popular: it measures how differently
two concepts refer to each other, considering the
Wikipedia links of the pages associated with the
concepts of the pair. Prerequisite learning
from textbooks is addressed in Adorni
et al. (2019), which presents a method based on
burst analysis combined with temporal
reasoning to identify possible propaedeutic relations and
compares it with a concept co-occurrence metric.
Among machine learning approaches, we
distinguish between those that exploited link-based
features (e.g.
        <xref ref-type="bibr" rid="ref10 ref13">(Liang et al., 2015; Gasparetti et al.,
2018)</xref>
        ), text-based features only (e.g.
        <xref ref-type="bibr" rid="ref1 ref16 ref3">(Miaschi et
al., 2019; Alzetta et al., 2019)</xref>
        ), or a combination
of the two
        <xref ref-type="bibr" rid="ref14">(Liang et al., 2018)</xref>
        .
      </p>
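To make the relational-metric idea concrete, the sketch below implements a simplified, equally-weighted variant of RefD over sets of Wikipedia out-links (the toy link graph is invented for illustration; the original metric also supports more refined weighting schemes):

```python
def refd(graph, a, b):
    """Simplified, equally-weighted RefD (after Liang et al., 2015):
    the fraction of A's out-links that link to B, minus the fraction
    of B's out-links that link to A.  A positive value suggests that
    B is a prerequisite of A."""
    la, lb = graph.get(a, set()), graph.get(b, set())
    to_b = sum(1 for c in la if b in graph.get(c, set())) / max(len(la), 1)
    to_a = sum(1 for c in lb if a in graph.get(c, set())) / max(len(lb), 1)
    return to_b - to_a

# Toy link graph (hypothetical data): each page maps to its out-links.
graph = {
    "Potenza": {"Aritmetica", "Moltiplicazione"},
    "Aritmetica": {"Numero"},
    "Moltiplicazione": {"Aritmetica", "Numero"},
    "Numero": set(),
}
# Pages linked from "Potenza" tend to link back to "Aritmetica", while
# the reverse does not hold, so RefD comes out positive.
print(refd(graph, "Potenza", "Aritmetica"))  # → 0.5
```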
      <p>
        Unfortunately, the results obtained by those
systems are not directly comparable: their approaches
are based on different assumptions about what a
concept is and about which features are distinctive
of a prerequisite relation. Moreover, knowledge
structures defined by domain experts are not
always easily available, or are missing for some
domains. With PRELEARN we propose what is,
to the best of our knowledge, the first shared
task on automatic prerequisite learning.
Located in the context of the EVALITA 2020
evaluation campaign
        <xref ref-type="bibr" rid="ref6">(Basile et al., 2020)</xref>
        , the task
challenges participants to develop prerequisite
learning systems that exploit either only
information derived from textual educational resources or
a combination of that information with the
structural properties of a knowledge structure. We aim
to compare the performances of systems based on
these two different approaches and to verify whether they
obtain similar results or, conversely, one
strategy far outperforms the other. The
goal of the PRELEARN shared task is not only to
offer a setting where different approaches and
systems can be directly compared, but also to gather
the research teams working on automatic
prerequisite learning, which is scattered and lacks
dedicated venues, possibly fostering
collaborations within the community. More broadly, we
expect the outcomes of the task to be relevant to
the wider information extraction and knowledge
structure construction communities, as it offers the
opportunity to test which information – either
textual or extracted from a knowledge structure – is
more effective for retrieving pedagogical relations
in educational data.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Task Description</title>
      <p>PRELEARN (Prerequisite Relation Learning) is a
shared task on concept prerequisite learning which
consists of classifying prerequisite relations
between pairs of concepts. This is the first time, to
the best of our knowledge, that automatic
prerequisite learning is addressed in a shared task.
PRELEARN challenges participants to test their
models for automatic prerequisite learning on four
different domains and four training scenarios.</p>
      <sec id="sec-2-1">
        <title>Problem Formulation</title>
        <p>For the purposes of this task, prerequisite
relation learning is framed as a binary
classification problem over concept pairs: given a pair of
concepts (A, B), we ask to predict whether or not
concept B is a prerequisite of concept A. We define a
“concept” as a single- or multi-word domain term
corresponding to the title of a page on the
Italian Wikipedia: Prodotto scalare and Aritmetica
are both concepts of the precalculus domain and
are also the titles of two Italian Wikipedia pages.
Prerequisite relations, instead, are dependency
relations that naturally occur between educational
concepts, determining their learning precedence.</p>
        <p>Consider the knowledge structure proposed as
an example in Figure 1. Here, nodes represent
concepts while links identify the prerequisite
relations that connect them. According to the graph,
“Aritmetica” is a prerequisite of “Potenza” since,
if a student wants to understand what “Potenza”
is, he/she has to know “Aritmetica” first. Hence,
we formally define a prerequisite relation as a
relation connecting a target and a prerequisite
concept if the second has to be known in order to
understand the first. In other words, the Wikipedia
page of the prerequisite concept contains the prior
knowledge required to understand the content of
the Wikipedia page of the target concept.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Task Settings</title>
        <p>We defined four sub-tasks for addressing
automatic concept prerequisite learning: two of them
concern the features available to the models used by
participants for tackling the task, while the other two
distinguish different classification scenarios where
the proposed models can be tested. In order to make
a valid submission, we asked participants to submit
at least one model complying with at least one of these settings:
i) Raw features setting (RF): a model that acquires
information only from raw text (e.g. textual
content of the Wikipedia pages offered as training
set, corpora for acquiring distributional
representations, etc.);
ii) Raw and structured features setting (RnS): a
model that can rely both on raw text and
structured information (e.g. Wikipedia graph structure
of a domain and metadata of a Wikipedia page,
DBpedia, page hierarchical structure in terms of
sections and paragraphs, etc.).</p>
        <p>Each submitted model was tested in two
evaluation scenarios, defined as follows:
i) In-domain scenario: the model(s) can be trained
on data belonging to any domain, including the
one appearing in the test set;
ii) Cross-domain scenario: the model(s) can be
trained on data belonging to any domain but the
domain of the test set.</p>
        <p>Overall, we defined a total of four sub-tasks:
1) RF setting in an in–domain scenario;
2) RF setting in a cross–domain scenario;
3) RnS setting in an in–domain scenario;
4) RnS setting in a cross–domain scenario.</p>
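The two scenarios amount to different choices of training data for a given test domain; a minimal sketch, assuming the labelled pairs are grouped by domain (the data layout and toy pairs below are illustrative):

```python
def training_pool(pairs_by_domain, test_domain, scenario):
    """Assemble the training pairs allowed for a given test domain.

    in-domain:    pairs from any domain, including the test one;
    cross-domain: pairs from every domain except the test one.
    """
    if scenario == "in-domain":
        domains = pairs_by_domain
    elif scenario == "cross-domain":
        domains = {d: p for d, p in pairs_by_domain.items() if d != test_domain}
    else:
        raise ValueError(scenario)
    return [pair for pairs in domains.values() for pair in pairs]

pairs_by_domain = {          # one toy labelled pair per domain
    "physics": [("Campo magnetico", "Magnete", 1)],
    "geometry": [("Quadrato", "Poligono", 1)],
    "data_mining": [("Clustering", "Distanza", 1)],
    "precalculus": [("Potenza", "Aritmetica", 1)],
}
# Cross-domain evaluation on physics trains on the other three domains.
print(len(training_pool(pairs_by_domain, "physics", "cross-domain")))  # → 3
```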
        <p>
          Only a few works in the literature test their
systems in a cross-domain scenario: our previous
attempts in this direction
          <xref ref-type="bibr" rid="ref16 ref3">(Miaschi et al., 2019)</xref>
          highlighted some issues in transferring the information
acquired from one domain to an unknown one. At
the same time, although the two proposed settings
correspond to the most widely used approaches
for automatic prerequisite learning, systems rarely
rely on textual information alone, and when
they do, performances are generally worse than
those obtained by exploiting structural
information extracted from knowledge bases. This makes,
in our view, the RF setting tested in the
cross-domain scenario the most challenging sub-task.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Evaluation</title>
        <p>Metrics. Evaluation of participants’ system
outputs was carried out on four balanced datasets,
one for each domain, used for both in– and cross–
domain evaluation. The size of the test sets is
reported in Table 1. Each sub-task (i.e. each
model on each scenario) was evaluated
independently from the others using standard metrics:
Accuracy (A), Precision (P), Recall (R)
and F1-score (F1). Since the test sets are balanced,
we used the Accuracy metric to rank participants’
submitted runs.</p>
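For reference, the four metrics reduce to simple counts over a run's binary predictions (1 = prerequisite); a minimal sketch:

```python
def evaluate(gold, pred):
    """Accuracy, Precision, Recall and F1 for binary predictions."""
    tp = sum(g == p == 1 for g, p in zip(gold, pred))  # true positives
    tn = sum(g == p == 0 for g, p in zip(gold, pred))  # true negatives
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    acc = (tp + tn) / len(gold)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

# Toy run: one hit, one miss, one correct rejection, one false alarm.
print(evaluate([1, 1, 0, 0], [1, 0, 0, 1]))  # → (0.5, 0.5, 0.5, 0.5)
```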
        <p>Baseline. For all settings we used a linear SVM
classifier trained on two binary features
capturing mentions between the concepts of a pair.
Each feature returns 1 if the name of one concept
is mentioned in the text of the Wikipedia page of
the other concept, and 0 otherwise.</p>
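The two baseline features can be sketched as follows (the plain case-insensitive substring mention check is our assumption about the implementation; a linear SVM would then be trained on these two features):

```python
def baseline_features(concept_a, concept_b, page_a, page_b):
    """Two binary features for a pair (A, B): does B's name appear in
    A's page text, and does A's name appear in B's page text?"""
    b_in_a = int(concept_b.lower() in page_a.lower())
    a_in_b = int(concept_a.lower() in page_b.lower())
    return [b_in_a, a_in_b]

# Invented page snippets for illustration.
page_potenza = "La potenza è un'operazione che estende l'aritmetica elementare."
page_aritmetica = "L'aritmetica studia i numeri e le operazioni elementari."
print(baseline_features("Potenza", "Aritmetica", page_potenza, page_aritmetica))
# → [1, 0]: "Aritmetica" is mentioned in Potenza's page but not vice versa.
```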
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Data</title>
      <p>
        We relied on ITA-PREREQ dataset
        <xref ref-type="bibr" rid="ref16 ref3">(Miaschi et al.,
2019)</xref>
        , a dataset annotated with prerequisite
relations between pairs of concepts in Italian. The
dataset was built upon the AL-CPL dataset
        <xref ref-type="bibr" rid="ref14">(Liang
et al., 2018)</xref>
        , a collection of binary-labelled
concept pairs extracted from textbooks on four
domains: data mining, geometry, physics and
precalculus. In AL-CPL, for each domain, the
authors extracted the relevant terms from the
textbook: those appearing in the title of a English
Wikipedia page were promoted as domain
concepts and matched with their corresponding page.
Finally, domain experts were asked to manually
annotate the presence of absence o a
prerequisite relation between all concept pairs. The
final dataset consists of both positive and negative
concept pairs that can be represented as a concept
map, a specific type of knowledge graph where
each node is a scientific concept and edges
represent pedagogical relations.
      </p>
      <p>
        The construction of ITA-PREREQ was carried
out as follows, as described in
        <xref ref-type="bibr" rid="ref16 ref3">(Miaschi et al.,
2019)</xref>
        . First, we took the Italian version of the
Wikipedia pages considered for AL-CPL,
excluding from the dataset those concepts (and the
relations in which they are involved) for which an
Italian page was not available. Then, we mapped
both positive and negative relations between pairs
of the remaining concepts from AL-CPL to
ITA-PREREQ. As in AL-CPL, the ITA-PREREQ dataset
was expanded by creating irreflexive relations (add
(B, A) as a negative sample if (A, B) is a positive
sample) and transitive pairs (add (A, C) if both (A,
B) and (B, C) are positive samples). In summary,
ITA-PREREQ consists of pairs of concepts (A, B),
labelled as follows: 1 if B is a prerequisite of A and
0 in all other cases. Participants were not allowed to use any
sort of prerequisite-labelled data apart from the
ITA-PREREQ dataset provided by task organisers as
the official training set.
      </p>
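The expansion step can be sketched as follows (a minimal single-pass implementation of the two rules; whether the rules were applied iteratively to the newly added pairs is not specified, so this is an assumption):

```python
def expand(pairs):
    """Expand labelled pairs with irreflexive negatives and one step of
    transitive positives, as done for AL-CPL and ITA-PREREQ:
      - if (A, B) is positive, add (B, A) as a negative sample;
      - if (A, B) and (B, C) are positive, add (A, C) as a positive sample.
    """
    labels = dict(pairs)
    positives = [p for p, l in labels.items() if l == 1]
    for a, b in positives:                      # irreflexive negatives
        labels.setdefault((b, a), 0)
    for a, b in positives:                      # transitive positives
        for b2, c in positives:
            if b == b2 and (a, c) not in labels:
                labels[(a, c)] = 1
    return labels

# Toy data: Aritmetica -> Potenza and Numero -> Aritmetica are positives.
pairs = {("Potenza", "Aritmetica"): 1, ("Aritmetica", "Numero"): 1}
expanded = expand(pairs)
print(expanded[("Potenza", "Numero")])  # → 1 (transitive positive)
```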
      <sec id="sec-3-1">
        <title>Format</title>
        <p>PRELEARN participants were provided, upon
request, with five files: a “concept pairs file” for
each of the four domains containing the labelled
concept pairs and one “Wikipedia pages file”
containing the raw text and the link of the Wikipedia
pages referring to the concepts appearing in the
dataset. Here’s an example of the pairs contained
in the “concept pairs file”:
Riflessione interna totale,Luce,1
Plasticita' (fisica),Durezza,0
...
Campo magnetico,Magnete,1</p>
        <p>1https://github.com/attardi/wikiextractor</p>
        <p>The number of concepts and
pairs varies for each domain: while Geometry and
Data Mining have a comparable number of
concepts, the latter shows a significantly smaller
number of labelled pairs. It is interesting to note that,
despite not being the richest domain in terms of
concepts, Physics shows the highest number of
relations. As can be noted, regardless of the domain
the dataset is strongly unbalanced, since the
majority of concept pairs do not show a prerequisite
relation (Non-PR Pairs). For each domain we split
the pairs into a training portion and a test portion.
For the test portion, we defined a
fixed number of pairs to include (i.e. 200 pairs),
with the exception of Data Mining where, given
the limited number of total pairs, we included only
99 pairs. The distribution of prerequisite and
non-prerequisite labels was balanced (50/50) for each
domain only in the test datasets.</p>
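Reading the comma-separated "concept pairs file" format shown above is straightforward with the standard csv module (the function and field names are illustrative; the files have no header row):

```python
import csv
import io

def read_pairs(fileobj):
    """Parse 'target,prerequisite-candidate,label' lines into tuples."""
    return [(a, b, int(label)) for a, b, label in csv.reader(fileobj)]

# In-memory stand-in for one of the distributed "concept pairs" files.
sample = io.StringIO(
    "Riflessione interna totale,Luce,1\n"
    "Plasticita' (fisica),Durezza,0\n"
    "Campo magnetico,Magnete,1\n"
)
pairs = read_pairs(sample)
print(pairs[0])  # → ('Riflessione interna totale', 'Luce', 1)
```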
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Participants</title>
      <p>Following a call for interest, 16 teams registered
for the task and thus obtained the training data.
Eventually, three teams submitted their
predictions, for a total of 14 runs, each executed on all
four domains of the dataset. Two teams
participated in all four sub-tasks while one team
submitted results only for the two sub-tasks involving the
RF setting. A summary of participants is provided
in Table 2.</p>
      <sec id="sec-4-1">
        <title>4.1 Submitted Systems</title>
        <p>
          NLP-CIC
          <xref ref-type="bibr" rid="ref4">(Angel et al., 2020)</xref>
          presented three
different systems trained on both hand-crafted and
embedding-based features. In particular, the team
developed one model for the RF setting and two
models for the RnS setting. Concerning the RF
setting, the submitted model corresponds to a
single layer Neural Network trained using concept
pairs representations extracted from a BERT
Italian model2 fine-tuned on the training datasets.
With respect to the RnS setting, the two submitted
models are quite similar and differ by only one
feature. The first model (Complex) is based on
a tree-ensemble learner trained using a set
of complexity-based features modelled on those
defined by Aroyehun et al. (2018), combined with a
feature capturing concept view frequency, i.e. the
daily average of unique visits to the concept page
by Wikipedia users (including editors, anonymous
editors and readers) over the last year.
2https://huggingface.co/dbmdz/bert-base-italian-cased
        </p>
        <p>The
second model (Complex+wd) is an improved version
of the first one: it takes as input the same set
of features along with the Wiki-data embedding
of each concept appearing in the concept pairs of
ITA-PREREQ dataset.</p>
        <p>
          B4DS
          <xref ref-type="bibr" rid="ref18">(Puccetti et al., 2020)</xref>
          presented two
different classification models, one based on the
XGBoost
          <xref ref-type="bibr" rid="ref2 ref22 ref8">(Chen and Guestrin, 2016)</xref>
          classifier and
one based on a Gated Recurrent Unit (GRU)
model. The first classifier, Model 1, was trained
using a combination of lexical and hand-crafted
features. Specifically, lexical features were
computed by averaging the 300-dimensional pretrained
word2vec embeddings
          <xref ref-type="bibr" rid="ref7">(Berardi et al., 2015)</xref>
          of
titles A and B respectively, with A and B being the
two concepts involved in a pair. The set of 14
hand-crafted text-based features, inspired by
Miaschi et al. (2019), is extracted for each pair of
the datasets and aims at capturing mentions and
lexical similarity between the two pages associated
with the concepts in the pair. The second classifier
(Model 2) was trained with a GRU model
(hidden size=8, encoding size=32, learning rate=0.01)
that takes as input the first 400 words of each
Wikipedia page of the (A, B) pair. The output was
computed with a linear layer that takes the
concatenation of the two learned vectors.
        </p>
        <p>
          UNIGE SE
          <xref ref-type="bibr" rid="ref17 ref18 ref4">(Moggio and Parizzi, 2020)</xref>
          proposed a classifier based on a two-dense-layers
Neural Network trained using a set of features
automatically extracted from the Wikipedia pages
associated with the concepts appearing in
ITAPREREQ dataset. In particular, the RF model was
trained exploiting features that capture concepts
co-occurrence and the lexical similarity between
the pages referring to the concepts of a pair. On
the other hand, the RnS model is trained
combining the previous set of features with information
based on the hyperlink and category structure of
Wikipedia.
5
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>In this section we provide both a discussion of the
approaches and an analysis of the results reported
in Tables 3 and 4.</p>
      <p>Participants experimented with more classical
machine learning algorithms as well as with
Neural Networks (NN): we received results computed
by 7 different systems, 4 trained using only
raw text features (RF setting) and 3 also exploiting
structural information (RnS setting).
Considering their average performances across all four
domains, all systems outperformed the baseline. In
this section, we describe the results obtained by
the submitted models and compare their
performances on the official test set based on their
average accuracy scores over the four domains
(column AVG in the Tables).</p>
      <sec id="sec-5-1">
        <title>Comparing Scenarios</title>
        <p>
          In–Domain Scenario. As shown in Table 3,
overall the model showing the best performances
is Italian BERT, achieving an average accuracy
score of 0.887 in the RF setting. Such result
is not surprising if we consider the
state-of-theart performances obtained by recent Neural
Language Models in the resolution of downstream
NLP tasks. However, results obtained by BERT
show only a small gap with respect to some of
the other models. For instance, B4DS’ Model
1, exploiting a decision tree based on XGBoost
framework and trained using both word
embedding and handcrafted features, achieved 0.866
accuracy thus gaining the second place in the in–
domain scenario. Similar competitive results are
obtained by the Complex+wd model submitted by
NLP-CIC team: this model combines Wiki-data
embedding of each concept with a set of manually
defined features that measure concept
complexity and were designed to solve the task of
complex word identification
          <xref ref-type="bibr" rid="ref5">(Aroyehun et al., 2018)</xref>
          .
B4DS team submitted also a more sophisticated
model (i.e. a GRU-based classifier) trained using
only Word2vec embeddings with no other
handcrafted features. Considering the results,
combining lexical features, like word embeddings, with
handcrafted features allows to achieve better
performances regardless of the model employed for
classification, while using these two types of
features independently seems a worse strategy. As
proof, B4DS’ Model 2, despite being more
sophisticated, achieved lower scores than Model 1. The
fact that these models obtained similar results
suggests that automatic prerequisite learning is more
affected by predictors rather than the model used
for classification.
        </p>
        <p>Among the submitted systems, only three did not
exploit word embeddings: the NLP-CIC team submitted
a tree-ensemble learner trained using only
complexity features, and the UNIGE SE team used two
versions of a two-layer NN trained with different
sets of handcrafted features to comply with the
settings requirements. The results obtained by these
models provide some interesting insights on the
role of raw and structural features for solving the
task. First, we observe that exploiting raw
textual features based on lexical similarity and topic
modelling (UNIGE SE NN in the RF setting) only
slightly outperforms the baseline; thus, when no
lexical features are available, it seems more
useful to rely on structural information. Even so, the
complexity-based features exploited by NLP-CIC
are more informative for the prerequisite learning task
than Wikipedia category and link structure. The
intuition behind the NLP-CIC team’s approach is
that less complex concepts are prerequisites for the
more complex ones and, considering that the
results are only slightly below those obtained using
word embeddings, the intuition that complexity is
involved in the process of defining prerequisite
sequences seems confirmed.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Cross–Domain Scenario</title>
        <p>Moving to the cross–domain
evaluation scenario (see Table 4), we
observe only small variations in the ranking of the
submitted systems. In spite of this, we also
observe a consistent drop in the accuracies obtained
by the submitted systems.</p>
        <p>Considering again the average accuracy scores,
the BERT model proved to be the best performing
model in this scenario as well. Interestingly, this
time NLP-CIC’s Complex+wd model outperforms
B4DS’s Model 1: both models are trained
using both word embeddings and handcrafted
features, with the latter possibly being more useful
because they capture domain-independent
properties. The different performances of the two
systems could again be due to the higher
effectiveness of complexity-based features for
identifying prerequisite relations. Consequently, these
results suggest that, unlike in the in-domain
scenario, lexical information alone is not enough to
identify prerequisite relations. Nevertheless, lexical
features proved somewhat useful, since using
handcrafted features only, as in the case of the Complex
NLP-CIC model and the NN models submitted
by the UNIGE SE team, is outperformed by B4DS’s
Model 2 (based solely on word embeddings).</p>
      </sec>
      <sec id="sec-5-3">
        <title>Domains Impact</title>
        <p>Focusing on the differences between the four
domains, we observe that for almost all submitted
systems the results obtained on concept pairs
belonging to the Data Mining domain are lower than
those on the others. This is especially true for the cross–
domain scenario and seems to corroborate what
was already stated in Miaschi et al. (2019), namely
that Data Mining is a relatively new and more
specialised topic that presents shorter pages and,
therefore, less clear prerequisite
relationships. Nevertheless, the model submitted by
the UNIGE SE team for the RF setting achieved
its lowest results when tested on concept pairs
belonging to the Physics domain.</p>
        <p>With the exception of UNIGE SE’s RF
model in the cross–domain setting, all systems
achieved their best (and similar) results when
classifying Geometry and Precalculus concept pairs.
This might be due to the fact that these two
domains are more fundamental and broad subjects
and, therefore, present clearer learning
dependencies expressed through Wikipedia.
Furthermore, since Geometry and Precalculus share more
lexicon than the others, we believe that the models
can take advantage of this overlap to better
classify concept pairs, especially in the cross–domain
evaluation setting.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>Automatic prerequisite learning was for the first
time the focus of a dedicated shared task. In
particular, the PRELEARN task aimed at
comparing the performances of different approaches
and models tested within and across the four
domains of the ITA-PREREQ dataset. Although the
results of the 14 submitted runs were all above the
baseline, we observe several differences within the
proposed settings and across domains. In particular,
the results suggest that automatic prerequisite
learning is more affected by the predictors than
by the classification models. The results also confirm
that the RF cross–domain setting is the most
challenging scenario. Nevertheless, BERT achieved the
best scores in both RF settings, also outperforming
models trained with structural features extracted
from the knowledge structure of Wikipedia.</p>
      <p>In the future, it would be interesting to test
the impact of hand-crafted features combined with
a contextual language model such as BERT and,
considering the effectiveness of complexity–based
features, to explore the contribution of predictors
encoding text readability properties in prerequisite
learning systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Adorni</surname>
          </string-name>
          , Chiara Alzetta, Frosina Koceva, Samuele Passalacqua, and
          <string-name>
            <given-names>Ilaria</given-names>
            <surname>Torre</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Towards the identification of propaedeutic relations in textbooks</article-title>
          .
          <source>In International Conference on Artificial Intelligence in Education (AIED)</source>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Rakesh</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , Behzad Golshan, and
          <string-name>
            <given-names>Evangelos</given-names>
            <surname>Papalexakis</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Toward data-driven design of educational courses: A feasibility study</article-title>
          .
          <source>Journal of Educational Data Mining</source>
          ,
          <volume>8</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Chiara</given-names>
            <surname>Alzetta</surname>
          </string-name>
          , Alessio Miaschi, Giovanni Adorni, Felice Dell'Orletta, Frosina Koceva, Samuele Passalacqua, and
          <string-name>
            <given-names>Ilaria</given-names>
            <surname>Torre</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Prerequisite or not prerequisite? That's the problem! An NLP-based approach for concept prerequisites learning</article-title>
          .
          <source>In 6th Italian Conference on Computational Linguistics</source>
          , CLiC-it
          <year>2019</year>
          , volume
          <volume>2481</volume>
          .
          <article-title>CEUR-WS.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Jason</given-names>
            <surname>Angel</surname>
          </string-name>
          , Segun Taofeek Aroyehun, and
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>NLP-CIC @ PRELEARN: Mastering prerequisites relations, from handcrafted features to embeddings</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          Online
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Segun Taofeek</given-names>
            <surname>Aroyehun</surname>
          </string-name>
          , Jason Angel, Daniel Alejandro Pe´rez Alvarez, and
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Complex word identification: Convolutional neural network vs. feature engineering</article-title>
          .
          <source>In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications</source>
          , pages
          <fpage>322</fpage>
          -
          <lpage>327</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
          <string-name>
            <given-names>Lucia C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>EVALITA 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian</article-title>
          .
          In Valerio Basile
          , Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          Online
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Giacomo</given-names>
            <surname>Berardi</surname>
          </string-name>
          , Andrea Esuli, and Diego Marcheggiani.
          <year>2015</year>
          .
          <article-title>Word embeddings go to Italy: A comparison of models and training datasets</article-title>
          .
          <source>In IIR.</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Tianqi</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Guestrin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>XGBoost: A scalable tree boosting system</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , pages
          <fpage>785</fpage>
          -
          <lpage>794</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Fabbri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Irene</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Prawat</given-names>
            <surname>Trairatvorakul</surname>
          </string-name>
          , Yijiao He, Weitai Ting, Robert Tung, Caitlin Westerfield, and
          <string-name>
            <given-names>Dragomir</given-names>
            <surname>Radev</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>TutorialBank: A manually-collected corpus for prerequisite chains, survey extraction and resource recommendation</article-title>
          .
          <source>In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>611</fpage>
          -
          <lpage>620</lpage>
          , Melbourne, Australia, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Gasparetti</surname>
          </string-name>
          , Carlo De Medio, Carla Limongelli, Filippo Sciarrone, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Temperini</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Prerequisites between learning objects: Automatic extraction based on a machine learning approach</article-title>
          .
          <source>Telematics and Informatics</source>
          ,
          <volume>35</volume>
          (
          <issue>3</issue>
          ):
          <fpage>595</fpage>
          -
          <lpage>610</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Gordon</surname>
          </string-name>
          , Stephen Aguilar, Emily Sheng, and
          <string-name>
            <given-names>Gully</given-names>
            <surname>Burns</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Structured generation of technical reading lists</article-title>
          .
          <source>In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications</source>
          , pages
          <fpage>261</fpage>
          -
          <lpage>270</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Anealka</given-names>
            <surname>Aziz Hussin</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Education 4.0 made simple: Ideas for teaching</article-title>
          .
          <source>International Journal of Education and Literacy Studies</source>
          ,
          <volume>6</volume>
          (
          <issue>3</issue>
          ):
          <fpage>92</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Chen</given-names>
            <surname>Liang</surname>
          </string-name>
          , Zhaohui Wu, Wenyi Huang, and
          <string-name>
            <given-names>C Lee</given-names>
            <surname>Giles</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Measuring prerequisite relations among concepts</article-title>
          .
          <source>In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>1668</fpage>
          -
          <lpage>1674</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Chen</given-names>
            <surname>Liang</surname>
          </string-name>
          , Jianbo Ye, Shuting Wang,
          <string-name>
            <given-names>Bart</given-names>
            <surname>Pursel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C Lee</given-names>
            <surname>Giles</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Investigating active learning for concept prerequisite learning</article-title>
          .
          <source>In Thirty-Second AAAI Conference on Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Weiming</given-names>
            <surname>Lu</surname>
          </string-name>
          , Pengkun Ma, Jiale Yu,
          <string-name>
            <given-names>Yangfan</given-names>
            <surname>Zhou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Baogang</given-names>
            <surname>Wei</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Metro maps for efficient knowledge learning by summarizing massive electronic textbooks</article-title>
          .
          <source>International Journal on Document Analysis and Recognition (IJDAR)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Alessio</given-names>
            <surname>Miaschi</surname>
          </string-name>
          , Chiara Alzetta,
          Franco Alberto Cardillo, and Felice Dell'Orletta
          .
          <year>2019</year>
          .
          <article-title>Linguistically-driven strategy for concept prerequisites learning on Italian</article-title>
          .
          <source>In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications</source>
          , pages
          <fpage>285</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Alessio</given-names>
            <surname>Moggio</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Parizzi</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>UniGe SE @ PRELEARN: Utility for automatic prerequisite learning from Italian Wikipedia</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          Online
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Puccetti</surname>
          </string-name>
          , Luis Bolanos, Filippo Chiarello, and
          <string-name>
            <given-names>Gualtiero</given-names>
            <surname>Fantoni</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>B4DS @ PRELEARN: Ensemble method for prerequisite learning</article-title>
          .
          In Valerio Basile
          , Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          Online
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Gilly</given-names>
            <surname>Salmon</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>May the fourth be with you: Creating education 4.0</article-title>
          .
          <source>Journal of Learning for Development-JL4D</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Rajan</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Vinod</given-names>
            <surname>Bhat</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A</given-names>
            <surname>Jhingan</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Leapfrogging to education 4.0: Student at the core</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Partha Pratim</given-names>
            <surname>Talukdar</surname>
          </string-name>
          and William W Cohen.
          <year>2012</year>
          .
          <article-title>Crowdsourced comprehension: Predicting prerequisite structure in Wikipedia</article-title>
          .
          <source>In Proceedings of the Seventh Workshop on Building Educational Applications Using NLP</source>
          , pages
          <fpage>307</fpage>
          -
          <lpage>315</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Shuting</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lei</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Prerequisite concept maps extraction for automatic assessment</article-title>
          .
          <source>In Proceedings of the 25th International Conference Companion on World Wide Web</source>
          , pages
          <fpage>519</fpage>
          -
          <lpage>521</lpage>
          . International World Wide Web Conferences Steering Committee.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Yang</given-names>
            <surname>Zhou</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kui</given-names>
            <surname>Xiao</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Extracting prerequisite relations among concepts in Wikipedia</article-title>
          .
          <source>In 2019 International Joint Conference on Neural Networks (IJCNN)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>