<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UNIGE SE @ PRELEARN: Utility for Automatic Prerequisite Learning from Italian Wikipedia</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessio Moggio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Parizzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DIBRIS, Università degli Studi di Genova</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the approach proposed by the UNIGE SE team for the EVALITA 2020 shared task on Prerequisite Relation Learning (PRELEARN). We developed a neural network classifier that exploits features extracted both from the raw text and from the structure of the Wikipedia pages provided by the task organisers as training sets. We participated in all four sub-tasks: the neural network was trained on a different set of features for each of the two training settings (i.e., raw and structured features) and evaluated in both proposed scenarios (i.e., in-domain and cross-domain). On the official test sets, the system improved on the provided baselines, even though it ranked third out of three participants. This contribution also describes the interface we developed to compare multiple runs of our models.<sup>1</sup></p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Prerequisite relations are essential relations between
educational items, since they express the order in which
a student should learn concepts in order to fully
understand a topic. Automatic prerequisite learning is
therefore a relevant task for the development of many
educational applications.</p>
      <p>
        Prerequisite Relation Learning (PRELEARN)
        <xref ref-type="bibr" rid="ref1">(Alzetta et al., 2020)</xref>
        , a shared task organized
within EVALITA 2020, the 7th evaluation
campaign of Natural Language Processing and Speech
tools for Italian
        <xref ref-type="bibr" rid="ref2">(Basile et al., 2020)</xref>
        , aims at the automatic learning of prerequisite relations
between pairs of concepts.<fn id="fn1"><label>1</label><p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p></fn> For the purposes of the
shared task, concepts are represented as learning
materials written in Italian. In particular, each
concept corresponds to the page of the Italian Wikipedia
that has the concept name as its title. The goal of
the shared task is to build a system able to
automatically identify the presence or absence of a
prerequisite relation between two given concepts.
The task is divided into four sub-tasks. To make a
valid submission, participants are asked to build at
least one model for automatic prerequisite learning,
to be tested in both the in-domain and the cross-domain
scenario, since the task organisers released four
official training sets, one for each domain of the
dataset. The model can exploit either 1) information
extracted from the raw textual content of Wikipedia
pages, or 2) information acquired from any kind of
structured knowledge resource (excluding the
prerequisite-labelled datasets). We submitted results
on the official test sets for all four sub-tasks. To
tackle the task, we propose a deep learning classifier
trained on different sets of features in order to
comply with the requirements of each sub-task. We also
developed a user interface to support the comparison
of the results obtained by models trained on different
sets of features. Besides selecting which features are
used to train the model, the user can set the value of
a number of parameters in order to customise the
classifier structure. For each run, the interface
reports standard evaluation metrics (i.e., accuracy,
precision, recall and F-score) and other statistics
that allow the user to explore the model's performance.
      </p>
      <p>The remainder of the paper is organised as
follows: we present our approach and system in
Section 2, then we discuss the results and evaluation
(Section 3). Section 4 describes the interface in
detail. We conclude the paper in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2 System Description</title>
      <p>In this section we present our approach to
automatic prerequisite learning between Wikipedia
pages. We used a deep learning model that the user
can customise through a dedicated GUI.
The model was trained and tested on the official
dataset of the PRELEARN task.</p>
      <p>
        ITA-PREREQ
        <xref ref-type="bibr" rid="ref6">(Miaschi et al., 2019)</xref>
        is a binary-labelled
dataset in which the labels mark the presence (1) or
absence (0) of a prerequisite relation between a pair
of concepts. Each concept is an educational item
associated with a Wikipedia page, so the concept name
matches the title of the corresponding Wikipedia page.
The dataset released for the shared task therefore also
includes the content of, and the link to, the Wikipedia
page of each concept appearing in the dataset.
It covers four domains, namely precalculus,
geometry, physics and data mining.
      </p>
      <sec id="sec-2-1">
        <title>2.1 Classifier</title>
        <p>
          The classifier was built with the aim of testing
the combination of different hand-crafted
features on the automatic prerequisite learning task.
More specifically, our classifier, whose
architecture is shown in Figure 1, is a neural network with
two dense layers built with the Scikit-Learn
and Keras libraries (the latter running on TensorFlow). The
hidden layer uses the ReLU activation function,
while the Adam optimizer
          <xref ref-type="bibr" rid="ref4">(Kingma and Ba, 2014)</xref>
          is used as the training algorithm. The output layer
consists of a single neuron with a sigmoid activation
function.
        </p>
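        <p>As an illustration of the architecture described above, the following is a minimal pure-Python sketch of the forward pass of such a network: one dense hidden layer with ReLU followed by a single sigmoid output neuron. It is not the Keras implementation used in the system; the function names, the random weight initialisation and the example feature vector are assumptions made only for illustration.</p>
        <preformat>
```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights (illustrative init)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases):
    """Affine transform: one output value per row of the weight matrix."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def predict_proba(features, hidden, output):
    """Forward pass: dense + ReLU hidden layer, then one sigmoid neuron."""
    hidden_out = relu(dense(features, *hidden))
    (logit,) = dense(hidden_out, *output)
    return sigmoid(logit)


rng = random.Random(0)
features = [1.0, 0.0, 0.25, 0.7]             # hypothetical feature vector for a pair (A, B)
hidden = init_layer(len(features), 20, rng)  # input size follows the feature vector
output = init_layer(20, 1, rng)
probability = predict_proba(features, hidden, output)
label = 1 if probability >= 0.5 else 0       # 1 = prerequisite relation predicted
```
        </preformat>
        <p>Because the hidden layer is created from the length of the feature vector, the same code accepts any subset of features, which mirrors the dynamically sized input layer of the system.</p>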
        <p>Some structural properties of the classifier can
be customised by the user from a dedicated GUI.
In particular, for the structure of the
neural network the user can define the size of the
hidden layer and the number of epochs, while for
the evaluation the user can set the number of
cross-validation folds. Moreover, training can be
performed on a customisable set of features (see
Section 2.2 for the complete list), since the size of
the input layer is set dynamically to match the size
of the feature vector. For this work,
we used in every scenario a model with a hidden layer
of 20 neurons trained for 15 epochs.
4-fold cross-validation was used for the in-domain
scenario.</p>
        <p>Training. The official training set, containing
concept pairs and their binary labels, was
formatted as a pair of NumPy arrays: the first has
variable length and contains the serialised
features, which are the model input, while the
second contains the binary labels of the pairs. For
the in-domain scenario, the model was trained
using stratified random folds of concept pairs that
preserve the original proportion of pairs from each domain.
For the cross-domain scenario, a “leave
one domain out” approach was used, training the
model on all domains but the one used for testing.</p>
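        <p>The two data-splitting schemes described above can be sketched as follows. This is a minimal illustration, not the code of the system: the dictionary layout of a pair, the function names, the fixed seed and the toy data are all assumptions.</p>
        <preformat>
```python
import random
from collections import defaultdict


def leave_one_domain_out(pairs):
    """Yield (held_out, train, test): train on every domain except one."""
    for held_out in sorted({p["domain"] for p in pairs}):
        train = [p for p in pairs if p["domain"] != held_out]
        test = [p for p in pairs if p["domain"] == held_out]
        yield held_out, train, test


def stratified_folds(pairs, n_folds, seed=0):
    """Shuffle pairs within each domain, then deal them round-robin so that
    each fold keeps roughly the original proportion of pairs per domain."""
    rng = random.Random(seed)
    by_domain = defaultdict(list)
    for p in pairs:
        by_domain[p["domain"]].append(p)
    folds = [[] for _ in range(n_folds)]
    for domain in sorted(by_domain):
        items = by_domain[domain][:]
        rng.shuffle(items)
        for i, p in enumerate(items):
            folds[i % n_folds].append(p)
    return folds


# Hypothetical toy data: four pairs per domain, alternating labels.
DOMAINS = ["data_mining", "geometry", "physics", "precalculus"]
pairs = [
    {"domain": d, "a": f"{d}_A{i}", "b": f"{d}_B{i}", "label": i % 2}
    for d in DOMAINS
    for i in range(4)
]
splits = list(leave_one_domain_out(pairs))   # one split per held-out domain
folds = stratified_folds(pairs, n_folds=4)   # folds for in-domain cross-validation
```
        </preformat>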
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Features</title>
        <p>We defined a set of features, extracted from the
content and structure of the Wikipedia pages, that are
available in the GUI and can be selected by the
user to train a model. While the page content
was provided in the official release of the training
set, we used Wikipedia-API<fn id="fn2"><label>2</label><p>https://github.com/martin-majlis/Wikipedia-API</p></fn> to extract the
Wikipedia metadata and knowledge structure.
Depending on the sub-task requirements, we trained
our models with different combinations of
features.</p>
        <p>Features used for the raw features model:</p>
        <list list-type="bullet">
          <list-item><p>titleInText: given a pair (A, B), checks whether the title of each page is mentioned in the page of the other concept.</p></list-item>
          <list-item><p>Jaccard similarity: a concept-based metric that measures the similarity between two pages by the number of words they share.</p></list-item>
          <list-item><p>LDA: the Shannon entropy of the LDA
          <xref ref-type="bibr" rid="ref3">(Deerwester et al., 1990)</xref>
          of nouns and verbs in A and B. Nouns and verbs are identified through a morphosyntactic analysis of the page content performed by the UDPipe pipeline
          <xref ref-type="bibr" rid="ref7">(Straka and Straková, 2017)</xref>
          .</p></list-item>
          <list-item><p>LDA Cross Entropy: the cross entropy of the LDA vectors of A and B.</p></list-item>
        </list>
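        <p>The raw-text features listed above can be approximated with a few lines of standard-library Python. The sketch below is only indicative: the tokenisation, the function names and the example sentences are assumptions, and the topic vectors passed to the entropy functions are assumed to have been produced beforehand by an LDA model, since the paper does not specify how the entropy is computed from the LDA output.</p>
        <preformat>
```python
import math
import re


def tokens(text):
    """Lower-cased word tokens (a simplifying assumption about tokenisation)."""
    return re.findall(r"\w+", text.lower())


def title_in_text(title, other_text):
    """1 if the page title occurs in the other page's text, else 0."""
    return int(title.lower() in other_text.lower())


def jaccard(text_a, text_b):
    """Size of the shared vocabulary divided by the size of the joint vocabulary."""
    set_a, set_b = set(tokens(text_a)), set(tokens(text_b))
    union = set_a | set_b
    return len(set_a.intersection(set_b)) / len(union) if union else 0.0


def shannon_entropy(dist):
    """Entropy (in bits) of a probability distribution, e.g. a topic vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def cross_entropy(dist_a, dist_b, eps=1e-12):
    """Cross entropy H(a, b) in bits; eps guards against log(0)."""
    return -sum(p * math.log2(q + eps) for p, q in zip(dist_a, dist_b) if p > 0)


# Hypothetical page snippets for the pair (Rettangolo, Poligono).
page_a = "Il rettangolo è un poligono con quattro angoli retti"
page_b = "Un poligono è una figura geometrica piana"

b_title_in_a = title_in_text("Poligono", page_a)
a_title_in_b = title_in_text("Rettangolo", page_b)
similarity = jaccard(page_a, page_b)
uniform_entropy = shannon_entropy([0.5, 0.5])
```
        </preformat>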
        <p>Features used for the raw and structural features model. We used all the above features combined with the following:</p>
        <list list-type="bullet">
          <list-item><p>extractCategories: the Wikipedia categories to which each page of the pair (A, B) belongs.</p></list-item>
          <list-item><p>extractLinkConnections: for each pair of concepts (A, B), checks whether the Wikipedia page of B contains a link to A.</p></list-item>
          <list-item><p>totalIncoming/OutgoingLinks: computes how many links a concept has from and to other concepts.</p></list-item>
          <list-item><p>Reference distance: a link-based metric that measures the relation between two pages by the links contained in each of them, using the EQUAL weight
          <xref ref-type="bibr" rid="ref5">(Liang et al., 2015)</xref>
          .</p></list-item>
        </list>
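        <p>The link-based features can be sketched on a toy link graph as follows. The reference distance follows our reading of Liang et al. (2015) with EQUAL weights: the fraction of pages linked from A that refer to B, minus the fraction of pages linked from B that refer to A. The graph, the function names and the feature keys are hypothetical, and the actual system obtains the links from Wikipedia rather than from a hand-written dictionary.</p>
        <preformat>
```python
def refers_fraction(source, target, out_links):
    """Fraction of the pages linked from `source` that themselves link to `target`."""
    related = out_links.get(source, set())
    if not related:
        return 0.0
    hits = sum(1 for page in related if target in out_links.get(page, set()))
    return hits / len(related)


def ref_d(a, b, out_links):
    """Reference distance with EQUAL weights (every linked page weighs 1)."""
    return refers_fraction(a, b, out_links) - refers_fraction(b, a, out_links)


def incoming(page, out_links):
    """Number of pages in the graph that link to `page`."""
    return sum(1 for links in out_links.values() if page in links)


def link_features(a, b, out_links):
    """Structural features for the pair (a, b); the keys are illustrative names."""
    return {
        "b_links_to_a": int(a in out_links.get(b, set())),
        "incoming_a": incoming(a, out_links),
        "outgoing_a": len(out_links.get(a, set())),
        "incoming_b": incoming(b, out_links),
        "outgoing_b": len(out_links.get(b, set())),
        "ref_d": ref_d(a, b, out_links),
    }


# Hypothetical link graph: page title -> titles of the pages it links to.
out_links = {
    "Rettangolo": {"Poligono", "Angolo"},
    "Poligono": {"Angolo", "Geometria"},
    "Angolo": {"Poligono"},
    "Geometria": {"Poligono"},
}
distance = ref_d("Rettangolo", "Poligono", out_links)
features = link_features("Poligono", "Rettangolo", out_links)
```
        </preformat>
        <p>In this toy graph half of the pages linked from “Rettangolo” refer to “Poligono”, while none of the pages linked from “Poligono” refer to “Rettangolo”, so the distance is positive for the pair (Rettangolo, Poligono) and changes sign when the pair is reversed.</p>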
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Results and Error Analysis</title>
      <p>Table 1 reports the results obtained by our models
on the runs submitted for all four sub-tasks. On
average, the performance of our systems in the
different scenarios shows that, as expected,
training the model in an in-domain scenario yields
better results. Among the different sets
of features, exploiting structural information
extracted from Wikipedia pages is in
general more effective than relying only on raw
textual data. Thus, our best-performing model
is the one that exploits both raw and structural
features in the in-domain scenario, which achieves
an average accuracy of 0.700 across all four
domains. Interestingly, Data Mining is the only
domain where raw textual features
are more effective than a combination of raw and
structural features. This domain shows
lower accuracies in the structured settings,
possibly because it has fewer entries in the
dataset than the other three domains and lower
coverage in Wikipedia.</p>
      <p>If we compare the results obtained for each
domain with those of the official
baseline, there are only two cases where our models
do not outperform the baseline: Geometry in
the raw-features in-domain sub-task and Physics in
both cross-domain sub-tasks. During error
analysis on the Geometry pairs, we observed that, although
pairs of pages about geometric figures, e.g. “Rettangolo”
and “Poligono”, are marked with a prerequisite relation in the
gold dataset, our systems always fail to classify
them correctly. Concerning Physics, we observed
that in both cross-domain settings the classifier
did not consider the page “Fisica” a prerequisite
of other pages belonging to the Physics domain,
which brought the performance below the baseline.</p>
      <p>If we look at how the accuracy of each model
varies with the classifier
confidence (see Figure 2), we notice that, although
the four systems have similar accuracy when
confidence is low, the two in-domain
systems show a similar increase in
accuracy as confidence grows. Comparing the cross-domain
settings, we notice that only the structured one
reaches higher accuracy, and only when it is highly
confident.</p>
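      <p>An accuracy-versus-confidence analysis of this kind can be reproduced along the lines of the sketch below. The paper does not define how confidence is measured, so the sketch assumes that it is the distance of the sigmoid output from 0.5, rescaled to the range from 0 to 1; the function name, the number of bins and the example values are likewise assumptions.</p>
      <preformat>
```python
def accuracy_by_confidence(probabilities, gold, n_bins=5):
    """Return one (lower, upper, accuracy, count) tuple per confidence bin.

    Accuracy is None for bins that contain no predictions.
    """
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probabilities, gold):
        confidence = abs(p - 0.5) * 2               # 0 = undecided, 1 = certain
        index = min(int(confidence * n_bins), n_bins - 1)
        counts[index] += 1
        hits[index] += int((p >= 0.5) == bool(y))   # prediction matches gold label
    return [
        (i / n_bins, (i + 1) / n_bins,
         hits[i] / counts[i] if counts[i] else None, counts[i])
        for i in range(n_bins)
    ]


# Hypothetical classifier outputs and gold labels.
probabilities = [0.55, 0.45, 0.52, 0.95, 0.05, 0.90]
gold = [0, 0, 1, 1, 0, 1]
bins = accuracy_by_confidence(probabilities, gold, n_bins=2)
```
      </preformat>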
    </sec>
    <sec id="sec-4">
      <title>4 System Interface</title>
      <p>Together with our system, we also developed a
user interface for customising the
network and comparing the results obtained with
different models. The interface is composed of
three modules: i) a setup module; ii) a results
module; iii) a statistics module.</p>
      <p>The setup module, loaded at the start of the
program, lets the user define:</p>
      <list list-type="bullet">
        <list-item><p>the input dataset;</p></list-item>
        <list-item><p>the parameters that set up the neural network architecture;</p></list-item>
        <list-item><p>the features for training the model.</p></list-item>
      </list>
      <p>The module also includes a table from which previously
saved configurations can be selected in order to
run them again.</p>
      <p>Table 1. Results of the submitted runs. Rows: Raw Feats in-domain, Raw+Struct in-domain, Baseline in-domain, Raw Feats cross-domain, Raw+Struct cross-domain, Baseline cross-domain. Recovered column headers: Geometry, Physics, Precalc. [Cell values missing in the source.]</p>
      <p>After running the model, the user reaches
the results module, which reports the
performance statistics (accuracy, precision, recall,
F-score) achieved by the configuration just run.
The results module also provides
buttons that allow the user to:</p>
      <list list-type="bullet">
        <list-item><p>save the configuration just run;</p></list-item>
        <list-item><p>see how the classifier labelled each concept pair;</p></list-item>
        <list-item><p>save and download the results as a CSV or TXT file.</p></list-item>
      </list>
      <p>The statistics module plots, in four bar charts, the
accuracy, precision, F-score and recall
of all configurations saved in the interface. The
repository containing the system and its GUI can
be consulted on GitHub.<sup>3</sup></p>
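      <p>The four metrics shown by the results and statistics modules can be computed from gold and predicted labels as in the sketch below. It assumes that precision, recall and F-score refer to the positive class (label 1, i.e. a prerequisite relation); the function name and the example labels are illustrative and are not taken from the system.</p>
      <preformat>
```python
def binary_metrics(gold, predicted):
    """Accuracy, precision, recall and F-score for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical labels for six concept pairs.
gold = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(gold, predicted)
```
      </preformat>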
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion</title>
      <p>In this paper we described the approach proposed
by the UNIGE SE team for the EVALITA 2020
PRELEARN shared task. The classifier relies on
a set of features that was customised to address
the specific requirements of each sub-task. When
accuracy is averaged across all domains, all our
models score above the baseline, although in some
cases the results obtained by the baseline are still
highly competitive. This suggests that automatic
prerequisite learning is a difficult task that
requires many different kinds of information to
train the models. However, the results suggest that,
at least in the in-domain setting,
features extracted from raw text are sufficient to
achieve competitive results, whereas in the cross-domain
setting this type of feature alone is not
enough. Using information extracted
from knowledge structures yields better
results in all sub-tasks. Although our
results are promising, future work will focus
on analysing the impact of each feature on training
the model and on exploring new
features to improve the performance of the classifier.<fn id="fn3"><label>3</label><p>https://github.com/mnarizzano/se20-project-16</p></fn></p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name><given-names>Chiara</given-names> <surname>Alzetta</surname></string-name>,
          <string-name><given-names>Alessio</given-names> <surname>Miaschi</surname></string-name>,
          <string-name><given-names>Felice</given-names> <surname>Dell'Orletta</surname></string-name>,
          <string-name><given-names>Frosina</given-names> <surname>Koceva</surname></string-name>, and
          <string-name><given-names>Ilaria</given-names> <surname>Torre</surname></string-name>.
          <year>2020</year>.
          <article-title>PRELEARN@EVALITA 2020: Overview of the prerequisite relation learning task for Italian</article-title>.
          In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020)</source>,
          Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name><given-names>Valerio</given-names> <surname>Basile</surname></string-name>,
          <string-name><given-names>Danilo</given-names> <surname>Croce</surname></string-name>,
          <string-name><given-names>Maria</given-names> <surname>Di Maro</surname></string-name>, and
          <string-name><given-names>Lucia C.</given-names> <surname>Passaro</surname></string-name>.
          <year>2020</year>.
          <article-title>EVALITA 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian</article-title>.
          In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020)</source>,
          Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
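          <string-name><given-names>Scott</given-names> <surname>Deerwester</surname></string-name>,
          <string-name><given-names>Susan T</given-names> <surname>Dumais</surname></string-name>,
          <string-name><given-names>George W</given-names> <surname>Furnas</surname></string-name>,
          <string-name><given-names>Thomas K</given-names> <surname>Landauer</surname></string-name>, and
          <string-name><given-names>Richard</given-names> <surname>Harshman</surname></string-name>.
          <year>1990</year>.
          <article-title>Indexing by latent semantic analysis</article-title>.
          <source>Journal of the American Society for Information Science</source>,
          <volume>41</volume>(<issue>6</issue>):<fpage>391</fpage>-<lpage>407</lpage>.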
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name><given-names>Diederik P</given-names> <surname>Kingma</surname></string-name> and
          <string-name><given-names>Jimmy</given-names> <surname>Ba</surname></string-name>.
          <year>2014</year>.
          <article-title>Adam: A method for stochastic optimization</article-title>.
          <source>arXiv preprint arXiv:1412.6980</source>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name><given-names>Chen</given-names> <surname>Liang</surname></string-name>,
          <string-name><given-names>Zhaohui</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>Wenyi</given-names> <surname>Huang</surname></string-name>, and
          <string-name><given-names>C Lee</given-names> <surname>Giles</surname></string-name>.
          <year>2015</year>.
          <article-title>Measuring prerequisite relations among concepts</article-title>.
          In <source>Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>,
          pages <fpage>1668</fpage>-<lpage>1674</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name><given-names>Alessio</given-names> <surname>Miaschi</surname></string-name>,
          <string-name><given-names>Chiara</given-names> <surname>Alzetta</surname></string-name>,
          <string-name><given-names>Franco Alberto</given-names> <surname>Cardillo</surname></string-name>, and
          <string-name><given-names>Felice</given-names> <surname>Dell'Orletta</surname></string-name>.
          <year>2019</year>.
          <article-title>Linguistically-driven strategy for concept prerequisites learning on Italian</article-title>.
          In <source>Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications</source>,
          pages <fpage>285</fpage>-<lpage>295</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name><given-names>Milan</given-names> <surname>Straka</surname></string-name> and
          <string-name><given-names>Jana</given-names> <surname>Straková</surname></string-name>.
          <year>2017</year>.
          <article-title>Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe</article-title>.
          In <source>Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies</source>,
          pages <fpage>88</fpage>-<lpage>99</lpage>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>