<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>B4DS @ PRELEARN: Ensemble Method for Prerequisite Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giovanni Puccetti</string-name>
          <email>giovanni.puccetti@sns.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filippo Chiarello</string-name>
          <email>filippo.chiarello@unipi.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luis Bolanos</string-name>
          <email>luis.bolanos@texty.biz</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gualtiero Fantoni</string-name>
          <email>g.fantoni@ing.unipi.it</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Scuola Normale Superiore</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Texty S.r.l.</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università di Pisa</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Università di Pisa</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we describe the methodologies we proposed to tackle the EVALITA 2020 shared task PRELEARN. We propose both a methodology based on gated recurrent units and one using more classical word embeddings together with ensemble methods. Our goal in choosing these approaches is twofold: on one side, we wish to see how much of the prerequisite information is present within the pages themselves; on the other, we would like to measure how much the information from the rest of Wikipedia can help in identifying this type of relation. The second approach is particularly useful for extending the system to new entities that are close to, but not actually present in, the corpus provided for the task. With these methodologies we reached second place in the challenge.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>The PRELEARN task consists in classifying pairs
of concepts according to whether one is a
prerequisite for the other or not. The concepts are
presented as Wikipedia pages and are divided
into four domains: physics, precalculus,
data mining and geometry.</p>
      <p>
        The task was organized in four subtasks: i) two
of them concern the type of information that can be
exploited by the submitted models, either solely
textual or including metadata, e.g. Wikipedia
hyperlinks; ii) the other two concern different
classification scenarios: training and testing could
happen either on the same domain, or three domains
could be used as the training set and the fourth
for testing. A more extensive description of the
task, together with all the results, can be found
in the report
        <xref ref-type="bibr" rid="ref2">(Alzetta et al.,
2020)</xref>
        , which is part of EVALITA 2020
        <xref ref-type="bibr" rid="ref3">(Basile
et al., 2020)</xref>
        . The concept of being a prerequisite
is highly complex and can be misunderstood by
humans as well. Indeed, this relation can be subtle
and, depending on the domain, it may take a deep
level of expertise to recognize. One of the reasons
this challenge is very interesting is that several
applications can arise from this same setting. In
this regard, we point out how it could be
interesting to apply the systems we develop for
this task to evaluate teaching modules. Indeed,
one could design a quality assessment for courses
based on the level of agreement between
subsequent chapters and sections and their prerequisite
relations. A different application could be the
definition of a new way to move around Wikipedia
itself, identifying which links move in the same
direction as the prerequisite relation and which, on
the contrary, move against it.
      </p>
      <p>
        Let us now outline three main aspects common
to different works tackling similar tasks; we will
take these into account while developing our own
models. The first is that hand-crafted features are
commonly used: in
        <xref ref-type="bibr" rid="ref11">(Miaschi
et al., 2019)</xref>
        such features are developed mostly by
analysing textual statistics, for example the
occurrence of one concept in the page of another.
In
        <xref ref-type="bibr" rid="ref10">(Liang et al., 2015)</xref>
        they also develop top-down features, but the
information they structure does not come from the
body of the pages; instead, they use the structure
of Wikipedia as a graph with hyperlinks. Following
this line, the second aspect is the use of graph
structures. Most works predicting prerequisites
interpret pages as nodes and hyperlinks as edges.
Both
        <xref ref-type="bibr" rid="ref13">(Talukdar and Cohen, 2012)</xref>
        and
        <xref ref-type="bibr" rid="ref10">(Liang et al.,
2015)</xref>
        use this feature, in some cases joining it
with textual information, in others as a
stand-alone one. On the contrary,
        <xref ref-type="bibr" rid="ref1">(Adorni et al.,
2019)</xref>
        use a bottom-up graph structure created
to help in the prediction. The third and last
aspect is the use of neural networks, as done in
        <xref ref-type="bibr" rid="ref11">(Miaschi
et al., 2019)</xref>
        , where they are employed to create
representations of text that can afterwards be fed
as features to simpler classifiers. We remark that
structuring information into a graph is a practice
used also in other tasks involving several
documents. One example is topic modeling
        <xref ref-type="bibr" rid="ref8">(Gerlach
et al., 2018)</xref>
        ; it is interesting to notice how this
task shares some of the steps needed for
prerequisite learning. Indeed, in both cases one needs
to create a hierarchy of concepts, which is then
exploited in different ways. Since we wish to
exploit textual knowledge, we can also employ word
embeddings; for the Italian language they are
developed in
        <xref ref-type="bibr" rid="ref4">(Berardi et al., 2015)</xref>
        . On top of them we use ensemble
methodologies, since they can proficiently exploit the
information in these representations. Notice that, in
principle, more modern techniques such as
transformer models
        <xref ref-type="bibr" rid="ref7">(Devlin et
al., 2019)</xref>
        could be used to boost performance on this
task; however, as we will see, we preferred not
to do so. The main reason supporting this choice
is that the dataset provided for this task is not
very big, and thus we avoided overly large models.
The systems we developed try to combine all the
pieces of information reported above. Indeed, we try
to exploit both the knowledge strictly present within
the Wikipedia pages provided for this task and the
information coming from the rest of the online
encyclopedia.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Description of the System</title>
      <p>In this section we describe the methodology we
developed to tackle the PRELEARN task, reporting
the choices made and the steps that led us to them.
In particular, we focused on the raw-text setting,
for which we adopted two systems aimed at
prerequisite learning. Although both use the texts
of the Wikipedia pages, each one does so in a
different way.</p>
      <sec id="sec-2-1">
        <title>2.1 Model 1</title>
        <p>
          This model exploits a combination of pretrained
word embeddings, of GloVe type
          <xref ref-type="bibr" rid="ref12">(Pennington et
al., 2014)</xref>
          , as trained for Italian in
          <xref ref-type="bibr" rid="ref4">(Berardi et al.,
2015)</xref>
          , and handcrafted features, the latter inspired
by
          <xref ref-type="bibr" rid="ref11">(Miaschi et al., 2019)</xref>
          . In particular, for each
page title in a concept pair (A, B), we computed a
300-dimensional vector by averaging the word
embeddings of the words in the A/B title. These two
resulting vectors were concatenated together with
the following 14 handcrafted features.
        </p>
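        <p>The embedding step described above can be sketched as follows (a minimal sketch: the lookup table glove is a hypothetical stand-in for the Italian 300-dimensional GloVe vectors, and tokenization is reduced to whitespace splitting):</p>

```python
import numpy as np

def title_vector(title, glove, dim=300):
    """Average the word embeddings of the words in a page title.

    Words missing from the lookup table are skipped; a title with no
    known words maps to the zero vector.
    """
    vecs = [glove[w] for w in title.lower().split() if w in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def pair_vector(title_a, title_b, glove):
    """600-dimensional concatenation of the two averaged title vectors."""
    return np.concatenate([title_vector(title_a, glove),
                           title_vector(title_b, glove)])
```

        <p>The 14 handcrafted features are then appended to this 600-dimensional vector, giving the 614 dimensions fed to the classifier.</p>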
        <p>The 14 handcrafted features, computed for the ordered pair (A, B), are:</p>
        <list list-type="bullet">
          <list-item><p>whether B (A) appears in A's (B's) text;</p></list-item>
          <list-item><p>the number of occurrences of B (A) in A's (B's) text;</p></list-item>
          <list-item><p>whether B (A) appears in the first sentence of A (B);</p></list-item>
          <list-item><p>whether B appears in A's title;</p></list-item>
          <list-item><p>the length of A (B);</p></list-item>
          <list-item><p>the Jaccard similarity between the two texts;</p></list-item>
          <list-item><p>the Jaccard similarity between the nouns in the two texts;</p></list-item>
          <list-item><p>the difference in length between the first paragraphs;</p></list-item>
          <list-item><p>the difference in the number of nouns in the first paragraphs;</p></list-item>
          <list-item><p>the Jaccard similarity between the nouns in the first paragraphs.</p></list-item>
        </list>
        <p>
          Then, for each pair (A, B), the final feature
vector of 614 dimensions was fed to an XGBoost
classifier
          <xref ref-type="bibr" rid="ref5">(Chen and Guestrin, 2016)</xref>
          , whose model selection was performed via
nested cross-validation with grid search.
        </p>
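        <p>Some of the occurrence-based features above can be sketched in plain Python (a simplified sketch: whitespace tokenization stands in for proper tokenization, and the noun-based variants would additionally require a POS tagger):</p>

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity |A intersect B| / |A union B| of two token sets."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def occurrence_features(title_a, text_a, title_b, text_b):
    """A subset of the pairwise features for the ordered pair (A, B)."""
    ta, tb = text_a.lower(), text_b.lower()
    return {
        "B_in_A_text": title_b.lower() in ta,
        "A_in_B_text": title_a.lower() in tb,
        "B_count_in_A": ta.count(title_b.lower()),
        "A_count_in_B": tb.count(title_a.lower()),
        "B_in_A_title": title_b.lower() in title_a.lower(),
        "len_A": len(ta.split()),
        "len_B": len(tb.split()),
        "jaccard_texts": jaccard(ta.split(), tb.split()),
    }
```

        <p>The resulting values, together with the averaged title embeddings, form the vector passed to the classifier.</p>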
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Model 2</title>
        <p>
          This model takes as input the first 400
words of each Wikipedia page and, for each pair
(A, B), predicts whether concept B is a prerequisite
for concept A. It is composed of a Gated Recurrent
Unit
          <xref ref-type="bibr" rid="ref6">(Cho
et al., 2014)</xref>
          with hidden size 8 and encoding size 32,
and a linear layer that takes as input the
concatenation of the two vectors representing the
two Wikipedia pages and predicts the prerequisite
relation. This model is similar to model M1 in
          <xref ref-type="bibr" rid="ref11">(Miaschi et al., 2019)</xref>
          , though simpler; it performs well enough
and is fast to train. The parameters were chosen
through a grid search, selecting the best results
achieved on a validation set. The aforementioned
values are the best performing choices for all
settings, and we keep them for the cross-domain
task as well. We tried different learning rates,
though ultimately a constant one of 0.01 for the
whole training was the best choice.
        </p>
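        <p>In PyTorch terms, this architecture can be sketched roughly as follows (a sketch under our reading of the description: "encoding size 32" is interpreted here as the word-embedding dimension, and all names are illustrative):</p>

```python
import torch
import torch.nn as nn

class PairGRU(nn.Module):
    """GRU encoder per page plus a linear classifier on the pair (a sketch)."""

    def __init__(self, vocab_size, enc_size=32, hidden_size=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, enc_size)
        self.gru = nn.GRU(enc_size, hidden_size, batch_first=True)
        # the two final hidden states are concatenated and scored
        self.out = nn.Linear(2 * hidden_size, 1)

    def encode(self, token_ids):
        _, h = self.gru(self.embed(token_ids))  # h: (1, batch, hidden)
        return h.squeeze(0)

    def forward(self, page_a, page_b):
        pair = torch.cat([self.encode(page_a), self.encode(page_b)], dim=1)
        return self.out(pair)  # one logit: is B a prerequisite for A?
```

        <p>Each page would be fed as the token ids of its first 400 words; training would minimize a binary cross-entropy loss on these logits at the constant learning rate of 0.01 mentioned above.</p>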
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Discarded Models</title>
      <p>
        We attempted to tackle the structured-data task
as well, in particular adding the Wikipedia hyperlink
structure to see whether it would be useful. In
order to exploit this knowledge we tried a Graph
Convolutional Network (GCN)
        <xref ref-type="bibr" rid="ref9">(Kipf and
Welling, 2017)</xref>
        . To do so, we added the GCN between the
Gated Recurrent Unit and the linear layer of Model 2,
so as to perform the prediction based on the
concatenation of the embeddings of the two nodes
(Wikipedia pages) in each pair. However, this
methodology resulted in lower scores on all
datasets, so we ended up not submitting it. We
believe this is not the appropriate way to leverage
the information present in the Wikipedia structure,
since we know from
        <xref ref-type="bibr" rid="ref11">(Miaschi et al., 2019)</xref>
        that the information itself is relevant.
      </p>
      <p>For Model 1, a variation with a multi-layer
perceptron was tested as well, but its results were
below those reported for the XGBoost ensemble.</p>
      <p>A further approach we rejected is the use of
transformer models. To obtain a representation of
the text composing each page we could have employed
a representation extracted from BERT. However, after
observing that much smaller models were already
overfitting the training set, we concluded that the
amount of available textual data is not enough to
exploit such a model, and we avoided it.</p>
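      <p>For reference, the propagation step of a GCN layer follows the rule of Kipf and Welling (2017): H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). A minimal NumPy sketch of one such step (with placeholder weights standing in for learned ones) is:</p>

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # D^-1/2 (diagonal)
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ features @ weight, 0.0)
```

      <p>In our discarded variant, a layer of this kind sat between the GRU encodings (as node features H) and the final linear layer, with the Wikipedia hyperlink graph as the adjacency matrix.</p>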
    </sec>
    <sec id="sec-4">
      <title>4 Results</title>
      <p>
        In Table 1 we report the accuracy achieved on
the test set. As we can see, Model 1 outperformed
Model 2. This is remarkable in the sense that
the former is simpler than the one based on
recurrent networks. The same can be said about
the hand-crafted features, which are mostly
statistics of each pair of pages based on
occurrences. Indeed, as shown also in
        <xref ref-type="bibr" rid="ref11">(Miaschi et al., 2019)</xref>
        , this information does help the model. We believe
Model 1 attained a higher score thanks to its
pretrained word embeddings and the larger corpora
they are trained upon. Indeed, the dataset used
to create those vectors is composed of the whole
Italian Wikipedia and a large amount of novels,
which encodes within these representations a wider
knowledge than the one provided for this task alone.
Looking at the accuracy achieved with the GCN layer
(values from our own validation-set split), we see
that its performance is systematically lower than
the others, which is why we chose not to submit it.
      </p>
      <p>After looking at the challenge results, we
proceeded to explore more generally how well our
models performed. To do so, for each model we
estimated precision, recall, accuracy and F1 score
(reported in Table 2).</p>
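      <p>These four scores can be computed from the confusion counts in the usual way (a plain-Python sketch, with 1 marking a prerequisite pair):</p>

```python
def binary_scores(y_true, y_pred):
    """Precision, recall, accuracy and F1 for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, accuracy, f1
```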
      <p>When comparing Model 1 and Model 2, we
noticed that the latter exhibited higher precision
in three of the four domains, but also lower recall
in three of them. As a result, there was a
systematic difference in accuracy and F1 score
favouring Model 1 over Model 2. If we look closely
at the Model 1 scores in Table 2, we see that
Physics and Precalculus show a broader gap between
precision and recall. This suggests that in these
two domains there are concepts which, despite being
involved in several prerequisite relations, are less
represented in the general knowledge. Moreover, the
same behavior is observed for Model 2, indicating
that the models started to miss some positive
samples. The fact that it happens in this second
setting as well makes us believe this phenomenon is
also due to more spread-out information within the
Wikipedia pages of the concepts in these domains.
As we mentioned, the second model has higher
precision in three domains, whereas the first has
higher recall; in two domains the difference in
recall is strongly in favour of Model 1, which is
indeed the better performing one.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Discussion</title>
      <p>Regarding the first model, we see that the
vectorization obtained from the Wikipedia corpus
performs well, particularly considering that it
represents exclusively the pages’ titles. We also
notice that the comparison between the two models
is not straightforward, since the ensemble model we
used was not tested on the vectors obtained from
the recurrent neural networks. We did not
experiment with this mixed setting, since we believe
it would not make sense to deploy a methodology
with the power of XGBoost on embeddings solely
based on the information present in the pages
provided for this task. Indeed, there is a high
chance that the results for such a complex model
would still be worse than those with the pretrained
embeddings, since, as we mentioned in Section 4, the
knowledge available exclusively in the pages
proposed for this task is limited.</p>
      <p>The other remarkable aspect is that, to surpass
the performance of the GRU, handcrafted features
were helpful, despite being mostly word-occurrence
counts. This same information is available to the
GRU model, which nevertheless performs worse.
This underlines how the recurrent architecture,
though powerful and able to capture long-distance
relations, cannot retain this type of substantial
detail. Regarding the second model, we remark that
the hidden size and the encoding size are very
small. This is coherent with the fact that the
dataset is not large enough to exploit the scaling
potential of a larger recurrent neural network.
However, with this small model the results are
better than the baseline and, as we mentioned, the
training times are all quite short. Thus, performing
more ablation studies in which bag-of-words
methodologies are used together with recurrent ones
could lead to further improvements, while still
supporting a more bottom-up solution than
hand-crafted features.</p>
      <p>Following the analysis of the models we used,
we can conclude that being a prerequisite is a
complex property, and thus the use of large amounts
of data can be useful. On the other hand, the fact
that the model based solely on the data at hand
performs only marginally worse than the other
underlines how this information is present in the
pages themselves. Possibly a dataset intermediate
in size between the one at hand and the whole
Italian Wikipedia could be a way to move further
in prerequisite learning.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Adorni</surname>
          </string-name>
          , Chiara Alzetta, Frosina Koceva, Samuele Passalacqua, and
          <string-name>
            <given-names>Ilaria</given-names>
            <surname>Torre</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Towards the identification of propaedeutic relations in textbooks</article-title>
          . In Seiji Isotani, Eva Millán, Amy Ogan,
          <string-name>
            <surname>Peter M. Hastings</surname>
          </string-name>
          ,
          <string-name>
            <surname>Bruce M. McLaren</surname>
          </string-name>
          , and Rose Luckin, editors,
          <source>Artificial Intelligence in Education - 20th International Conference, AIED 2019</source>
          , Chicago, IL, USA, June 25-29,
          <year>2019</year>
          , Proceedings, Part I
          , volume
          <volume>11625</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Chiara</given-names>
            <surname>Alzetta</surname>
          </string-name>
          , Alessio Miaschi, Felice Dell'Orletta,
          <string-name>
            <given-names>Frosina</given-names>
            <surname>Koceva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ilaria</given-names>
            <surname>Torre</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Prelearn @ evalita 2020: Overview of the prerequisite relation learning task for italian</article-title>
          .
          <source>In Valerio Basile</source>
          , Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          <article-title>Online</article-title>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
          <string-name>
            <given-names>Lucia C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Evalita 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for italian</article-title>
          .
          <source>In Valerio Basile</source>
          , Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          <article-title>Online</article-title>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Giacomo</given-names>
            <surname>Berardi</surname>
          </string-name>
          , Andrea Esuli, and Diego Marcheggiani.
          <year>2015</year>
          .
          <article-title>Word embeddings go to italy: A comparison of models and training datasets</article-title>
          .
          <source>In IIR.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Tianqi</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Guestrin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Xgboost: A scalable tree boosting system</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, page 785-794</source>
          , New York, NY, USA. Association for Computing Machinery.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Kyunghyun</given-names>
            <surname>Cho</surname>
          </string-name>
          , Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Learning phrase representations using RNN encoder-decoder for statistical machine translation</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>1724</fpage>
          -
          <lpage>1734</lpage>
          , Doha, Qatar, October. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <surname>Ming-Wei</surname>
            <given-names>Chang</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          , Minneapolis, Minnesota, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Martin</given-names>
            <surname>Gerlach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tiago P.</given-names>
            <surname>Peixoto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Eduardo G.</given-names>
            <surname>Altmann</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A network approach to topic models</article-title>
          .
          <source>Science Advances</source>
          ,
          <volume>4</volume>
          (
          <issue>7</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Thomas N.</given-names>
            <surname>Kipf</surname>
          </string-name>
          and
          <string-name>
            <given-names>Max</given-names>
            <surname>Welling</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Semisupervised classification with graph convolutional networks</article-title>
          .
          <source>In 5th International Conference on Learning Representations, ICLR</source>
          <year>2017</year>
          , Toulon, France,
          <source>April 24-26</source>
          ,
          <year>2017</year>
          , Conference Track Proceedings. OpenReview.net.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Chen</given-names>
            <surname>Liang</surname>
          </string-name>
          , Zhaohui Wu, Wenyi Huang, and
          <string-name>
            <given-names>C. Lee</given-names>
            <surname>Giles</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Measuring prerequisite relations among concepts</article-title>
          .
          <source>In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>1668</fpage>
          -
          <lpage>1674</lpage>
          , Lisbon, Portugal, September. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Alessio</given-names>
            <surname>Miaschi</surname>
          </string-name>
          , Chiara Alzetta, Franco Alberto Cardillo, and Felice Dell'Orletta
          .
          <year>2019</year>
          .
          <article-title>Linguistically-driven strategy for concept prerequisites learning on italian</article-title>
          .
          <source>In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications</source>
          , pages
          <fpage>285</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Glove: Global vectors for word representation</article-title>
          .
          <source>In EMNLP</source>
          , volume
          <volume>14</volume>
          , pages
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Partha</given-names>
            <surname>Talukdar</surname>
          </string-name>
          and
          <string-name>
            <given-names>William</given-names>
            <surname>Cohen</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Crowdsourced comprehension: Predicting prerequisite structure in Wikipedia</article-title>
          .
          <source>In Proceedings of the Seventh Workshop on Building Educational Applications Using NLP</source>
          , pages
          <fpage>307</fpage>
          -
          <lpage>315</lpage>
          , Montréal, Canada, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>