<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhancing Categorization of Learning Resources in the DAtaset of Joint Educational Entities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Carla Limongelli</string-name>
          <email>limongel@ing.uniroma3.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Lombardi</string-name>
          <email>matteo.lombardi@griffithuni.edu.au</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Marani</string-name>
          <email>alessandro.marani@griffithuni.edu.au</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Taibi</string-name>
          <email>davide.taibi@itd.cnr.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Engineering Department, Roma Tre University</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Istituto per le Tecnologie Didattiche, Consiglio Nazionale delle Ricerche</institution>
          ,
          <addr-line>Palermo</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Information and Communication Technology, Griffith University</institution>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The DAtaset of Joint Educational Entities (DAJEE) is a repository that hosts more than 20,000 educational resources crawled from the MOOC platform Coursera. The resources are divided into categories according to the MOOC categorization of Coursera, which is, however, very shallow. This contribution focuses on a more meaningful categorization of the resources in DAJEE, tailored to their content. To achieve this goal, our approach enriches the resources in DAJEE with semantic entities by applying state-of-the-art semantic techniques. The result is a significantly better categorization of the resources in DAJEE than in the previous version.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Entities</kwd>
        <kwd>OER</kwd>
        <kwd>Linked Data for Education</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The DAtaset of Joint Educational Entities (DAJEE) provides a wide variety of
learning resources from the popular Massive Open Online Course (MOOC) platform
Coursera (<ext-link ext-link-type="uri" xlink:href="https://www.coursera.org/">https://www.coursera.org/</ext-link>). The novelty of DAJEE lies in contextualizing each
learning resource within the lessons and courses through which it is delivered [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This information can help describe the teaching approach of the author of a
course, for example concept sequencing and semantic density [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>In Coursera, MOOCs are grouped into 10 top-level categories, each with a
different number of sub-categories, while individual resources are not categorized at all.
In DAJEE, a resource inherits the category of its MOOC, reproducing the categorization
assigned to the course. However, Coursera offers only a shallow categorization with at most
two levels, and some top-level categories have no sub-categories. Hence, hundreds of
resources end up in a single category with no further distinction among them. Moreover,
an educational resource may belong to categories that differ from those of its course.
DAJEE currently replicates this limited categorization of the resources.</p>
      <p>
        We propose a method that enhances DAJEE with a finer-grained categorization of
the resources, based on their own content rather than on their course. The method first
exploits Semantic Web techniques and DBpedia data to interlink the content of a resource
with semantic entities in DBpedia [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The hierarchical structure of the DBpedia categories is then used to build the
category graph of the resource. Next, we apply Dijkstra's algorithm and the Spreading
Activation technique [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to reduce the noise that interlinking the resources with DBpedia entities may
introduce, and thus to refine the overall category graphs. The result is a markedly more
detailed categorization in DAJEE, tailored directly to the content of the resources. A
fine-grained categorization improves browsing of the educational resources and supports
further applications such as category-based retrieval and recommender systems.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Categories in DAJEE</title>
      <p>The shallow category structure of Coursera is attached to the courses, not to the
resources. In contrast, DBpedia offers a much more detailed category structure that can
categorize the resources directly. For example, the Coursera category Math and Logic has
no sub-categories, while its DBpedia counterpart (named Mathematics) has several
sub-categories at different levels of depth.</p>
      <p>In DBpedia, a semantic entity typically belongs to many categories. To achieve our
goal, resources in DAJEE should be associated with categories tailored to the semantic
entities extracted from their transcripts. To keep track of how a semantic entity is
categorized, we describe it by a sub-graph of the DBpedia category structure. This
sub-graph starts from the source category (the root of the graph) and ends at the
categories stated in the DBpedia page of the entity. Since each transcript contains several
entities, each with its own sub-graph, the main problem is how to merge these sub-graphs
properly to obtain a correct and meaningful categorization of a resource or text.</p>
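      <p>The construction of the category sub-graph of an entity, and the naive union of such sub-graphs, can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation behind DAJEE: the dictionary BROADER is a hypothetical toy hierarchy standing in for the DBpedia category links (in DBpedia, skos:broader relates a category to a broader one), ROOT is a hypothetical name for the source category, and category_subgraph and naive_merge are illustrative names. The toy data deliberately include a Business branch to show how a plain union keeps categories that may be unrelated to the resource.</p>
      <preformat preformat-type="code">
```python
from collections import deque

# Hypothetical toy hierarchy: child category -> parent categories, standing in
# for DBpedia skos:broader links (illustrative only, not real DBpedia data).
BROADER: dict[str, set[str]] = {
    "Hash_functions": {"Cryptography"},
    "Cryptography": {"Applied_mathematics", "Banking"},
    "Applied_mathematics": {"Mathematics"},
    "Banking": {"Business"},
    "Mathematics": {"ROOT"},
    "Business": {"ROOT"},
}


def category_subgraph(stated: set[str]) -> set[tuple[str, str]]:
    """Return the (parent, child) edges on every upward path from the
    categories stated for an entity to the top of the toy hierarchy."""
    edges: set[tuple[str, str]] = set()
    seen = set(stated)
    queue = deque(sorted(stated))
    while queue:
        child = queue.popleft()
        for parent in BROADER.get(child, set()):
            edges.add((parent, child))  # edge oriented from parent to child
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return edges


def naive_merge(entities: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Plain union of the per-entity sub-graphs: every branch is kept,
    whether or not it is related to the resource."""
    merged: set[tuple[str, str]] = set()
    for stated in entities.values():
        merged |= category_subgraph(stated)
    return merged


graph = category_subgraph({"Hash_functions"})
print(len(graph), ("ROOT", "Business") in graph)  # prints: 7 True
```
      </preformat>
      <p>Even in this tiny example, the sub-graph of a single entity already reaches Business through Banking, which is the kind of branch the filtering described next is meant to discard.</p>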
      <p>
        A simple merge of the category graphs of the entities associated with a resource
is not an effective solution: DBpedia entities can have a very wide set of categories,
some of which are only loosely related, or not related at all, to the resource. Consider,
as a reference, the resource Generic birthday attack from the course Cryptography I. The
DBpedia entity dbr:Cryptographic_hash_function (<ext-link ext-link-type="uri" xlink:href="http://dbpedia.org/resource/Cryptographic_hash_function">http://dbpedia.org/resource/Cryptographic_hash_function</ext-link>) was found in its transcript,
and Figure 1 shows its category graph. The graph includes Science, Mathematics, and
History, while other categories descend from Business and Belief, which seem to be
unrelated to the resource.
      </p>
    </sec>
    <sec id="sec-3">
      <title></title>
      <p>
        For a more effective categorization of the resource, we propose a novel method for
filtering DBpedia categories based on Spreading Activation (SA) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and Dijkstra's algorithm. For each entity extracted from the resource text, the
spreading phase starts by assigning one activation unit only to the categories explicitly
stated in the DBpedia page of the entity. Then, edges are weighted in three steps:
      </p>
      <list list-type="order">
        <list-item><p>Dijkstra's algorithm finds the shortest paths among the categories stated in the DBpedia page of the entity, if any;</p></list-item>
        <list-item><p>the edges that lie on those paths receive weight 1, all other edges weight 0;</p></list-item>
        <list-item><p>the weight is also set to 1 for all the edges that connect the categories on the DBpedia page with the root of the graph.</p></list-item>
      </list>
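      <p>The three weighting steps can be sketched as follows, under assumptions that the text leaves open: every edge costs 1 for Dijkstra's algorithm and starts with weight 0; step 1 searches for paths among the stated categories on an undirected view of the graph; and step 3 is read as marking one shortest upward path from each stated category to the root. The names dijkstra_path, weight_edges, EDGES and ROOT, as well as the toy categories, are hypothetical.</p>
      <preformat preformat-type="code">
```python
import heapq
import math
from itertools import combinations
from typing import Optional

Edge = tuple[str, str]  # (parent, child)


def dijkstra_path(adj: dict[str, set[str]], src: str, dst: str) -> Optional[list[str]]:
    """Shortest src -> dst path with unit edge costs, or None if unreachable."""
    dist = {src: 0}
    prev: dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == dst:
            break
        for v in sorted(adj.get(u, ())):
            nd = d + 1  # every category edge costs 1
            if dist.get(v, math.inf) > nd:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def weight_edges(edges: set[Edge], stated: set[str], root: str) -> dict[Edge, int]:
    undirected: dict[str, set[str]] = {}
    upward: dict[str, set[str]] = {}  # child -> parents
    for p, c in edges:
        undirected.setdefault(p, set()).add(c)
        undirected.setdefault(c, set()).add(p)
        upward.setdefault(c, set()).add(p)
    weights = {e: 0 for e in edges}  # assumption: every edge starts at weight 0

    def mark(path: Optional[list[str]]) -> None:
        if path is None:
            return
        for a, b in zip(path, path[1:]):
            weights[(a, b) if (a, b) in weights else (b, a)] = 1

    # Steps 1-2: shortest paths among the stated categories get weight 1.
    for s, t in combinations(sorted(stated), 2):
        mark(dijkstra_path(undirected, s, t))
    # Step 3 (our reading): a shortest upward path from each stated category to the root.
    for s in sorted(stated):
        mark(dijkstra_path(upward, s, root))
    return weights


# Hypothetical toy sub-graph (not real DBpedia data).
EDGES = {
    ("ROOT", "Mathematics"), ("ROOT", "Business"), ("Business", "Finance"),
    ("Finance", "Banking"), ("Mathematics", "Applied_mathematics"),
    ("Applied_mathematics", "Cryptography"), ("Banking", "Cryptography"),
    ("Cryptography", "Hash_functions"), ("Applied_mathematics", "Hash_functions"),
}
w = weight_edges(EDGES, {"Cryptography", "Hash_functions"}, "ROOT")
print(sorted(e for e, x in w.items() if x == 0))
# [('Banking', 'Cryptography'), ('Business', 'Finance'), ('Finance', 'Banking'), ('ROOT', 'Business')]
```
      </preformat>
      <p>With these toy data, only the edges of the Business branch keep weight 0, so they will not carry any activation in the spreading phase.</p>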
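      <p>To determine the most important top-level categories for an entity, the activation is
spread through the category graph opposite to the edge direction (from "child" to
"parent"). Denoting by Ch(<italic>j</italic>) the set of sub-categories of a category <italic>j</italic>, the activation <italic>A</italic><sub>j</sub> of <italic>j</italic> and its outgoing activation <italic>O</italic><sub>j</sub> are computed as follows:</p>
      <p>
        <disp-formula>
          <tex-math><![CDATA[A_j = \sum_{i \in \mathrm{Ch}(j)} O_i \, w_{ij}]]></tex-math>
        </disp-formula>
        <disp-formula>
          <tex-math><![CDATA[O_j = \begin{cases} 1, & \text{if } A_j > 0 \\ 0, & \text{otherwise} \end{cases}]]></tex-math>
        </disp-formula>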
where <italic>w</italic><sub>ij</sub> is the weight of an edge from category <italic>i</italic> to <italic>j</italic>. The algorithm stops when
the outgoing activation is 0. Figure 2 (left) reports the activations of the top-level
categories, showing that Generic birthday attack is mostly about Mathematics (57% of the
activations). The same process can also filter the sub-categories of the most frequent
top-level categories, removing from the graph of the resource the edges with weight 0
and the sub-categories with no activation (the final category graph for the resource
Generic birthday attack is available at <ext-link ext-link-type="uri" xlink:href="http://virtuosa.pa2.itd.cnr.it/iswc17/generic_birthday_attack.png">http://virtuosa.pa2.itd.cnr.it/iswc17/generic_birthday_attack.png</ext-link>).</p>
      <p>Interestingly, the resource Alpha Beta Pruning is also identified as Mathematics, yet
it differs from Generic birthday attack. Our method shows that the category graph for
Alpha Beta Pruning (available at <ext-link ext-link-type="uri" xlink:href="http://virtuosa.pa2.itd.cnr.it/iswc17/alpha_beta_pruning_graph.png">http://virtuosa.pa2.itd.cnr.it/iswc17/alpha_beta_pruning_graph.png</ext-link>) focuses on
graph theory and algorithms, while that of Generic birthday attack presents many
connections to mathematical analysis and algebra. Hence, we can further distinguish
resources that belong to the same top-level category, such as Generic birthday attack and
Alpha Beta Pruning. By applying our methodology, DAJEE now includes a more meaningful
categorization tailored to the content of its resources.</p>
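      <p>The spreading phase, the ranking of top-level categories and the pruning step can be sketched in the same spirit. The sketch assumes a binary outgoing activation (1 when the incoming activation of a category is positive, 0 otherwise), lets each category fire at most once, and sums the top-level activations over all entities of a resource before normalizing. The names spread, top_level_shares, prune, W1 and W2 are hypothetical, and the weightings are toy data, so the resulting shares are illustrative numbers, not DAJEE results.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict

Edge = tuple[str, str]  # (parent, child)


def spread(weights: dict[Edge, int], stated: set[str]) -> dict[str, float]:
    """Spread activation from child to parent over weighted edges."""
    parents: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for (p, c), w in weights.items():
        parents[c].append((p, w))
    activation: dict[str, float] = defaultdict(float)
    for s in stated:
        activation[s] = 1.0  # one activation unit per stated category
    fired: set[str] = set()
    frontier = set(stated)
    while frontier:  # stop once no category emits outgoing activation
        fired |= frontier
        nxt: set[str] = set()
        for i in sorted(frontier):
            out = 1.0 if activation[i] > 0 else 0.0  # binary output O_i
            for j, w in parents[i]:
                activation[j] += out * w  # A_j accumulates O_i * w_ij
                if activation[j] > 0 and j not in fired:
                    nxt.add(j)
        frontier = nxt
    return dict(activation)


def top_level_shares(entities: list[tuple[dict[Edge, int], set[str]]],
                     root: str = "ROOT") -> dict[str, float]:
    """Sum the activation of the root's children over all entities, then normalize."""
    top = {c for weights, _ in entities for (p, c) in weights if p == root}
    totals = {t: 0.0 for t in top}
    for weights, stated in entities:
        activation = spread(weights, stated)
        for t in top:
            totals[t] += activation.get(t, 0.0)
    total = sum(totals.values())
    return {t: v / total for t, v in totals.items()} if total else totals


def prune(weights: dict[Edge, int], activation: dict[str, float]) -> set[Edge]:
    """Keep only edges with non-zero weight between activated categories."""
    return {
        (p, c) for (p, c), w in weights.items()
        if w > 0 and activation.get(p, 0.0) > 0 and activation.get(c, 0.0) > 0
    }


# Hypothetical toy weightings (not real DBpedia data).
W1 = {
    ("ROOT", "Mathematics"): 1, ("ROOT", "Business"): 0,
    ("Business", "Finance"): 0, ("Finance", "Banking"): 0,
    ("Mathematics", "Applied_mathematics"): 1,
    ("Applied_mathematics", "Cryptography"): 1, ("Banking", "Cryptography"): 0,
    ("Cryptography", "Hash_functions"): 1, ("Applied_mathematics", "Hash_functions"): 1,
}
W2 = {("ROOT", "Business"): 1, ("Business", "Finance"): 1, ("Finance", "Banking"): 1}
entities = [(W1, {"Cryptography", "Hash_functions"}), (W1, {"Hash_functions"}), (W2, {"Banking"})]
shares = top_level_shares(entities)
print({t: round(v, 2) for t, v in sorted(shares.items())})  # {'Business': 0.33, 'Mathematics': 0.67}
print(len(prune(W1, spread(W1, {"Cryptography", "Hash_functions"}))))  # 5
```
      </preformat>
      <p>Here two of the three toy entities activate Mathematics, so it receives about two thirds of the top-level activation, and pruning the first entity keeps only the five weighted edges of its Mathematics branch.</p>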
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Estivill-Castro</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Limongelli</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lombardi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Marani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>DAJEE: A dataset of joint educational entities for information retrieval in technology enhanced learning</article-title>
          .
          <source>In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval</source>
          (
          <year>2016</year>
          ), ACM, pp.
          <fpage>681</fpage>
          -
          <lpage>684</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Limongelli</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lombardi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Taibi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Enrichment of the dataset of joint educational entities with the web of data</article-title>
          .
          <source>In 17th IEEE International Conference on Advanced Learning Technologies (ICALT'17)</source>
          (
          <year>2017</year>
          ), IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <article-title>Application of spreading activation techniques in information retrieval</article-title>
          .
          <source>Artificial Intelligence Review</source>
          <volume>11</volume>
          ,
          <issue>6</issue>
          (
          <year>1997</year>
          ),
          <fpage>453</fpage>
          -
          <lpage>482</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dietze</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H. Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giordano</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaldoudi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dovrolis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Taibi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Linked Education: interlinking educational resources and the Web of Data</article-title>
          .
          <source>In ACM Symposium on Applied Computing (SAC-2012), Special Track on Semantic Web and Applications</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>