<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Socio-semantic Networks of Research Publications in the Learning Analytics Community</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Soude Fazeli</string-name>
          <email>soude.fazeli@ou.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hendrik Drachsler</string-name>
          <email>hendrik.drachsler@ou.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Sloep</string-name>
          <email>peter.sloep@ou.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Open University of the Netherlands (OUNL), Centre for Learning Sciences and Technologies (CELSTEC)</institution>
          ,
          <addr-line>6401 DL Heerlen, The Netherlands, 0031-(0)45-576-2218</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present network visualizations and an analysis of publication data from the LAK (Learning Analytics and Knowledge) conference in 2011 and 2012, and from the special edition on Learning and Knowledge Analytics of the Journal of Educational Technology and Society (JETS) in 2012.</p>
      </abstract>
      <kwd-group>
        <kwd>Network</kwd>
        <kwd>recommender</kwd>
        <kwd>visualization</kwd>
        <kwd>dataset</kwd>
        <kwd>learning analytics</kwd>
        <kwd>degree</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The Society for Learning Analytics Research (SOLAR)<sup>1</sup> provided
a dataset to solicit contributions to the LAK data challenge<sup>2</sup>,
sponsored by the FP7 European project LinkedUp<sup>3</sup>. The dataset
contains research publications in learning analytics and
educational data mining for the years 2010, 2011, and 2012
        <xref ref-type="bibr" rid="ref5">(Taibi
&amp; Dietze, 2013)</xref>
        . An overview of the dataset is shown in Figure 1.
The dataset contains a total of 173 authors and 76 papers from the
LAK (Learning Analytics and Knowledge) conference series in
2011 and 2012, and from the special edition on learning and knowledge
analytics in the Journal of Educational Technology and Society
(JETS) in 2012. We found 24 authors who contributed to all three
publication venues.
      </p>
      <p>
        Access to a dataset always offers new opportunities,
particularly in the educational domain, which lacks public datasets
for running experimental studies
        <xref ref-type="bibr" rid="ref6">(Verbert, Drachsler, Manouselis,
Wolpers, Vuorikari, &amp; Duval, 2011)</xref>
        . We therefore used this
dataset to visualize the networks of authors and papers
and to analyze the generated networks in more depth. Our
overall aim is to use such a graph of authors and papers to
recommend similar items to a target user. In the following
sections, we evaluate the suitability of the LAK dataset for this
purpose.
      </p>
      <fn-group>
        <fn><label>1</label><p>http://www.solaresearch.org/</p></fn>
        <fn><label>2</label><p>http://www.solaresearch.org/events/lak/lak-data-challenge/</p></fn>
        <fn><label>3</label><p>http://linkedup-project.eu/</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-2">
      <title>2. Motivation</title>
      <p>It is often difficult for conference attendees to decide which
workshops or sessions are suitable and relevant for them.
A list of recommended authors and papers based on
shared interests could therefore help them plan their conference
participation more efficiently and effectively. Several papers have
already addressed awareness support for
researchers (Reinhardt et al., 2012; Fisichella et al., 2010; Ochoa
et al., 2009; Henry et al., 2009) and scientific recommender
systems (Huang et al., 2002; Wang &amp; Blei, 2010), but none of
them has yet analyzed the Learning Analytics datasets for this
purpose.</p>
      <p>Our overall vision is to support LAK attendees with a list of
LAK authors and papers that are relevant to their own research
interests. Such a recommendation could be based on one
or more of their own research papers, but also on a short essay or
even a tag cloud summarizing their research interests and objectives.
Such a prioritized list can raise the attendees’ awareness and
strengthen the network of like-minded authors around the attendees’
particular research focus.</p>
      <p>In this paper, then, we aim to explore and identify like-minded
authors within the LAK dataset. Supposing that we have a
network of all the LAK authors and papers, the main research
questions are:</p>
      <p>RQ1. How are the authors connected, and which authors share
more connections and are more central in terms of sharing
commonalities with the others?</p>
      <p>RQ2. How are the papers connected to each other in terms of
similarity?</p>
      <p>To answer these questions, our analysis went through two main
steps: (1) finding patterns of similarity between authors and
papers, and (2) visualizing networks of the LAK authors and papers.
We now describe each step in detail.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Data processing</title>
      <p>To find relationships between authors, we first computed the
similarity of the papers with the TF-IDF<sup>4</sup> algorithm. TF-IDF can
create a weighted list of the most commonly used terms in
research articles. To generate the TF-IDF matrix for the LAK
dataset, we first converted the LAK data from RDF to text files,
a format that the Mahout<sup>5</sup> system accepts. Then, we ran
the default TF-IDF algorithm provided by Mahout on the text
files. We removed stop words by setting the configuration
variables within Mahout to 90%. Thus, if a word appears in 90%
of the documents, it is considered a stop word (e.g. and, or, the)
and is removed from the similarity matrix. The final
outcome consisted of:</p>
      <list list-type="bullet">
        <list-item><p>A so-called dictionary of all the terms in the LAK dataset</p></list-item>
        <list-item><p>A binary sequence file that includes the TF-IDF weighted vectors</p></list-item>
      </list>
      <p>
        To compute the similarity between the LAK authors, we used the
T-index algorithm
        <xref ref-type="bibr" rid="ref2">(Fazeli, Zarghami, Dokoohaki, &amp; Matskin,
2010)</xref>
        , a collaborative filtering recommender algorithm that
generates a graph of users. In this graph, the nodes are users and the edges
represent relationships between users that originate from the similarity
of their profiles. The T-index algorithm originally makes
recommendations based on the ratings data of users. We extended
it to process tags and keywords
extracted from linked data, e.g. RDF files. We used the Jena<sup>6</sup> APIs
to process RDF files and to handle the Web Ontology Language
(OWL) files that describe the generated graph of authors and
papers. Jena supports the development of Semantic Web applications and tools.
      </p>
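      <p>The paper-similarity step can be illustrated with a minimal, self-contained sketch. It is not the Mahout pipeline used in this study: it assumes the textbook TF-IDF weighting, assumes cosine similarity as the comparison measure, and treats a term as a stop word when it occurs in at least 90% of the documents. The function names, the max_df parameter, and the toy corpus are hypothetical.</p>
      <preformat>
```python
import math
from collections import Counter


def tfidf_vectors(docs, max_df=0.9):
    """Return one {term: weight} dict per document.

    Assumption: a term found in at least max_df of the documents
    is treated as a stop word and dropped.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    stop_words = {term for term, n in df.items() if n / n_docs >= max_df}
    vectors = []
    for tokens in tokenized:
        tf = Counter(term for term in tokens if term not in stop_words)
        # Textbook weighting: term frequency times log inverse document frequency.
        vectors.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus standing in for the converted text files.
docs = [
    "the learning analytics dashboard",
    "the learning analytics recommender",
    "the social network degree",
    "the trust network recommender",
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```
      </preformat>
      <p>In this toy corpus the word "the" occurs in every document and is therefore dropped, and the first two documents, which share two terms, come out as more similar than the first and third, which share none.</p>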
    </sec>
    <sec id="sec-4">
      <title>4. Data visualization</title>
      <p>We visualized the generated graphs of authors and papers with the
Welkin<sup>7</sup> tool. Welkin takes an OWL file as input and produces a
visualization of the data as output. We present the visualizations of
the LAK authors and the LAK papers generated by Welkin in the
following subsections.</p>
      <fn-group>
        <fn><label>4</label><p>http://en.wikipedia.org/wiki/Tf–idf</p></fn>
        <fn><label>5</label><p>http://mahout.apache.org/</p></fn>
        <fn><label>6</label><p>http://jena.apache.org/</p></fn>
        <fn><label>7</label><p>http://simile.mit.edu/welkin/</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-5">
      <title>4.1. The LAK authors network</title>
      <p>Figure 2 presents a network of the LAK authors in which red
nodes represent the authors and the edges show the similarity
between the publications of two authors. The result shows how
the LAK authors are connected in terms of their publications'
commonalities. Moreover, the network shows which authors share
more commonalities than the others. We call them ‘central
authors’. In the next section, we show how they are connected
with the other authors in the network.</p>
    </sec>
    <sec id="sec-6">
      <title>4.2. The LAK authors’ degree centrality</title>
      <p>
        For a given node in the network, the degree centrality is the
total number of its incoming and outgoing edges. It is a metric
commonly used in Social Network Analysis (SNA)
        <xref ref-type="bibr" rid="ref1 ref3 ref4">(De Liddo,
Buckingham Shum, Quinto, Bachler, &amp; Cannavacciuolo, 2011;
Guéret, Groth, Stadler, &amp; Lehmann, 2012; Opsahl, Agneessens,
&amp; Skvoretz, 2010)</xref>
        . In other words, the degree of a node describes
how many other nodes are connected to it. It also
helps to measure how many hubs are in the network. We define
hubs as the nodes that have the most connections to the others in
the network. The degree centrality metric may be used to
strengthen a network by providing its nodes with more
connections. In this data study, degree centrality is used to
measure the relevance of an author’s papers to the other authors in
the network.
      </p>
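      <p>The definition above can be sketched in a few lines of code. This is a minimal illustration, assuming the network is available as a list of directed (source, target) edges. The edge list and the node names are hypothetical and are not taken from the LAK dataset.</p>
      <preformat>
```python
from collections import Counter


def degree_centrality(edges):
    """Count incoming plus outgoing edges for every node of a directed graph."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1  # outgoing edge of the source node
        degree[target] += 1  # incoming edge of the target node
    return degree


# Hypothetical directed edges between four authors.
edges = [("u1", "u2"), ("u3", "u1"), ("u4", "u1"), ("u2", "u3")]
degree = degree_centrality(edges)

# Hubs are the nodes with the most connections.
hubs = [node for node, _ in degree.most_common(2)]
print(degree, hubs)
```
      </preformat>
      <p>Here u1 has one outgoing and two incoming edges, so its degree is 3 and it is the first hub in the list.</p>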
      <p>[Figure: the first ten central authors; recovered axis labels u1 to u7]</p>
    </sec>
    <sec id="sec-7">
      <title>4.3. The LAK papers network</title>
      <p>Figure 4 shows a network of the LAK papers. The red nodes are
papers and the edges between them represent the similarity of the
papers. By finding similar papers, we can recommend the most
similar papers to specific authors. This raises the authors’
awareness of papers that are relevant to them and published
in their communities.</p>
      <p>Figure 4 shows that some of the papers share more similarity with
the others and have a higher degree. As with the central
authors, these papers will appear more often in the top
recommendation list than the other papers of the dataset. One
may interpret their degree as their popularity:
papers with higher degree values are more popular and,
presumably, of more interest to users. For the
publication data, the interests of users derive from the words and
terms they have used more frequently in their papers.</p>
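      <p>The link between similarity, nearest neighborhoods, and degree can be sketched as follows. This is an illustrative assumption about how a neighborhood of size n yields a degree value, not the T-index implementation used in this study: each paper points to its n most similar papers, and its degree counts the edges it sends plus the edges it receives. The similarity values and function names are hypothetical.</p>
      <preformat>
```python
def top_n_neighbours(similarity, n):
    """For each item, return the n other items with the highest similarity."""
    neighbours = {}
    for item, row in similarity.items():
        ranked = sorted((other for other in row if other != item),
                        key=lambda other: row[other], reverse=True)
        neighbours[item] = ranked[:n]
    return neighbours


def degree(neighbours):
    """Degree = outgoing edges (own list) + incoming edges (others' lists)."""
    counts = {item: len(chosen) for item, chosen in neighbours.items()}
    for chosen in neighbours.values():
        for other in chosen:
            counts[other] = counts.get(other, 0) + 1
    return counts


# Hypothetical pairwise similarities between four papers.
similarity = {
    "p1": {"p2": 0.9, "p3": 0.8, "p4": 0.7},
    "p2": {"p1": 0.9, "p3": 0.2, "p4": 0.1},
    "p3": {"p1": 0.8, "p2": 0.2, "p4": 0.3},
    "p4": {"p1": 0.7, "p2": 0.1, "p3": 0.3},
}
degree_n1 = degree(top_n_neighbours(similarity, 1))
degree_n2 = degree(top_n_neighbours(similarity, 2))
print(degree_n1, degree_n2)
```
      </preformat>
      <p>In this toy example p1 is the nearest neighbour of every other paper, so it has the highest degree for n=1, and no paper's degree decreases when n grows from 1 to 2.</p>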
    </sec>
    <sec id="sec-8">
      <title>5. Discussion and conclusions</title>
      <p>The results presented here allow us to answer our research
questions as follows.</p>
      <p>RQ1. How are the authors connected? Which authors share more
connections and are more central in terms of sharing
commonalities with the others?</p>
      <p>We presented a visualization of the authors’ network to provide an
overview of how they are connected to each other. To substantiate the
authors’ connections and relationships, we evaluated the degree
centrality of the ten most central authors. Table 1 presents
these ten authors and their degrees, showing the authors
whose publications are most relevant to those of the others in the
network. The degrees in Table 1 are computed for a
neighborhood size of 10.</p>
    </sec>
    <sec id="sec-9">
      <title>4.4. The LAK papers’ degree centrality</title>
      <p>Figure 5 shows the degree centrality of the ten papers that
are most similar to the other papers, that is, the ten
papers with the highest degrees. The horizontal axis (x) shows the
top ten papers, e.g. p1 is the paper with the highest similarity and
thus the highest degree value among them; the
vertical axis (y) shows the degree. Figure 5 shows degree centrality for two
different sizes of nearest neighborhoods (n), 5 and 10. As
n increases, the degree of the papers increases accordingly.
As a result, we will have a larger number of top papers if n is
higher (here, when n=10). In Figure 5, the degree of the top
paper (p1) is 53 for n=10 and 29 for n=5. This shows how
much similarity p1 shares with other papers. As a consequence, p1
can be considered the most popular paper, and it has the highest
chance of appearing in the top paper recommendations.</p>
      <p>We presented the degree centrality of the LAK papers to give insight
into their relationships in the papers’ visualized network. We
selected the ten papers that have the highest similarity with the
other papers. To show which papers are in the top ten
papers’ list, we present the title and authors of each paper.</p>
      <p>The top ten papers are not necessarily by the authors who are
identified as the central authors. Although most of the central
authors also appear in the top ten papers’ list (see Table 2), the order
is not the same. As we investigated the LAK data, we found
that some of the central authors have more than one paper. For
instance, Hendrik Drachsler has contributed to four papers. In this
study, similarity is calculated based on all papers of an author. So
it is quite probable that not every one of an author’s
papers individually has the highest similarity to the other papers.
Although some of the central authors are common to the two</p>
      <p>Overall, we found that the LAK dataset can help conference
attendees to become more aware of their research network, which,
in turn, is useful for sharing knowledge and experiences.
However, the current dataset contains no user feedback or
evaluations with which to assess either an author or a paper recommender
system in terms of common metrics such as prediction accuracy
and coverage of the generated recommendations. For future
analysis, it would be helpful if the LAK dataset also contained
the papers’ references. The references could be used to identify
the top cited authors and papers within the LAK dataset and
beyond. As a further step, we are planning to try additional social
network analysis measures besides degree, such as betweenness or
closeness.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>De Liddo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buckingham Shum</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quinto</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bachler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Cannavacciuolo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Discourse-centric learning analytics</article-title>
          .
          <source>LAK 2011: 1st International Conference on Learning Analytics &amp; Knowledge. Banff</source>
          , Alberta.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Fazeli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zarghami</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dokoohaki</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Matskin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Elevating Prediction Accuracy in Trust-aware Collaborative Filtering Recommenders through T-index Metric and TopTrustee lists</article-title>
          .
          <source>Journal of Emerging Technologies in Web Intelligence</source>
          ,
          <volume>2</volume>
          (
          <issue>4</issue>
          ),
          <fpage>300</fpage>
          -
          <lpage>309</lpage>
          . doi:10.4304/jetwi.2.4.300-309
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Guéret</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stadler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Assessing Linked Data Mappings using Network Measures</article-title>
          .
          <source>Proceedings of the 9th international conference on The Semantic Web: research and applications</source>
          (pp.
          <fpage>87</fpage>
          -
          <lpage>102</lpage>
          ). Springer-Verlag Berlin, Heidelberg. doi:10.1007/978-3-642-30284-8_13
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Opsahl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agneessens</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Skvoretz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Node centrality in weighted networks: Generalizing degree and shortest paths</article-title>
          .
          <source>Social Networks</source>
          ,
          <volume>32</volume>
          (
          <issue>3</issue>
          ),
          <fpage>245</fpage>
          -
          <lpage>251</lpage>
          . doi:10.1016/j.socnet.2010.03.006
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Taibi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Dietze</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Fostering analytics on learning analytics research: the LAK dataset</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Verbert</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drachsler</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manouselis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolpers</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vuorikari</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Duval</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Dataset-driven research for improving recommender systems for learning</article-title>
          .
          <source>Proceedings of the 1st International Conference on Learning Analytics and Knowledge</source>
          (pp.
          <fpage>44</fpage>
          -
          <lpage>53</lpage>
          ). ACM, New York, NY, USA.
        </mixed-citation>
      </ref>
    </ref-list>
    <app-group>
      <app>
        <title>7. Appendix</title>
        <sec>
          <title>7.1. The LAK authors' network</title>
        </sec>
        <sec>
          <title>7.2. The LAK papers' network</title>
        </sec>
      </app>
    </app-group>
  </back>
</article>