=Paper= {{Paper |id=None |storemode=property |title=Socio-semantic Networks of Research Publications in the Learning Analytics Community |pdfUrl=https://ceur-ws.org/Vol-974/lakdatachallenge2013_02.pdf |volume=Vol-974 |dblpUrl=https://dblp.org/rec/conf/lak/FazeliDS13 }} ==Socio-semantic Networks of Research Publications in the Learning Analytics Community== https://ceur-ws.org/Vol-974/lakdatachallenge2013_02.pdf
           Socio-semantic Networks of Research Publications
                  in the Learning Analytics Community
                                       Soude Fazeli, Hendrik Drachsler, Peter Sloep

                                           Open University of the Netherlands (OUNL)
                                   Centre for Learning Sciences and Technologies (CELSTEC)
                                                6401 DL Heerlen, The Netherlands
                                                      0031-(0)45-576-2218
                                   {soude.fazeli,hendrik.drachsler,peter.sloep}@ou.nl

ABSTRACT
In this paper, we present network visualizations and an analysis of publication data from the LAK (Learning Analytics and Knowledge) conference in 2011 and 2012, and from the special edition on Learning and Knowledge Analytics of the Journal of Educational Technology and Society (JETS) in 2012.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information filtering;
K.3.m [Computers and Education]: Miscellaneous

General Terms
Algorithms, Visualization

Keywords
Network, recommender, visualization, dataset, learning analytics, degree

1. Introduction
The Society for Learning Analytics Research (SOLAR)¹ provided a dataset to solicit contributions to the LAK data challenge², sponsored by the FP7 European project LinkedUp³. The dataset contains research publications in learning analytics and educational data mining for the years 2010, 2011, and 2012 (Taibi & Dietze, 2013). An overview of the dataset is shown in Figure 1. In total, the dataset contains 173 authors and 76 papers from the LAK (Learning Analytics and Knowledge) conference series in 2011 and 2012 and from the special edition on learning and knowledge analytics of the Journal of Educational Technology and Society (JETS) in 2012. We found 24 authors who contributed to all three scientific proceedings.

Figure 1. The used datasets

Having access to a dataset always offers new opportunities. This is particularly true in the educational domain, which lacks public datasets for running experimental studies (Verbert, Drachsler, Manouselis, Wolpers, Vuorikari, & Duval, 2011). We therefore used this dataset to visualize the network of authors and papers and to carry out a deeper analysis of the generated networks. Our overall aim is to use such a graph of authors and papers to recommend similar items to a target user. In the following sections, we evaluate the suitability of the LAK dataset for this purpose.

1 http://www.solaresearch.org/
2 http://www.solaresearch.org/events/lak/lak-data-challenge/
3 http://linkedup-project.eu/

2. Motivation
It is often difficult for conference attendees to decide which workshops or sessions are suitable and relevant for them. A list of recommended authors and papers based on shared interests could therefore help them plan their conference participation more efficiently and effectively. Several papers have already addressed awareness support for researchers (Reinhardt et al., 2012; Fisichella et al., 2010; Ochoa et al., 2009; Henry et al., 2009) and scientific recommender systems (Huang et al., 2002; Wang & Blei, 2010). However, none of them has yet analyzed the Learning Analytics datasets for this purpose.
Our overall vision is to support LAK attendees with a list of LAK authors and papers that are relevant to their own research interests. Such a recommendation could be created from one or more of their own research papers. It could also be created from a short essay or even a tag cloud summarizing their research interests and objectives. Such a priority list can raise the attendees' awareness and strengthen their network of like-minded authors within their particular research focus.

In this paper, then, we aim to explore and identify like-minded authors within the LAK dataset. Given a network of all the LAK authors and papers, our main research questions are:

RQ1. How are the authors connected, and which authors share more connections and are more central in terms of sharing commonalities with the others?

RQ2. How are the papers connected to each other in terms of similarity?

To answer these questions, we went through two main steps in our analysis:
1. Finding patterns of similarity between authors and papers;
2. Visualizing networks of the LAK authors and papers.
We will now describe each step in detail.

3. Data processing
To find relationships between authors, we first computed the similarity of the papers with the TF-IDF algorithm⁴. For each research article, TF-IDF creates a weighted list of its terms. A term's weight grows with how often the term occurs in that article and shrinks with the number of articles in the collection that contain it. To generate the TF-IDF matrix for the LAK dataset, we first converted the LAK data from RDF to text files, a format that Mahout⁵ accepts as input. Then, we ran the default TF-IDF algorithm provided by Mahout on the text files. We removed stop words by setting the corresponding configuration variable within Mahout to 90%. Thus, if a word appears in 90% of the documents, it is considered a stop word (e.g., and, or, the) and is removed from the similarity matrix. As a final outcome we had:

       •    A so-called dictionary of all the terms in the LAK dataset
       •    A binary sequence file that includes the TF-IDF weighted vectors

For computing the similarity between the LAK authors, we used the T-index algorithm (Fazeli, Zarghami, Dokoohaki, & Matskin, 2010). T-index is a collaborative filtering recommender algorithm that generates a graph of users. In this graph, the nodes are users, and the edges represent relationships between users that originate from the similarity of their profiles. The T-index algorithm originally makes recommendations based on users' rating data. We extended it to process tags and keywords extracted from the linked data, e.g., the RDF files. We used the Jena⁶ APIs to process the RDF files and to handle the Web Ontology Language (OWL) files that describe the generated graph of authors and papers. Jena is a framework for developing Semantic Web applications and tools.

4. Data visualization
We visualized the generated graphs of authors and papers with the Welkin⁷ tool. Welkin takes an OWL file as input and produces a visualization of the data as output. We present the visualizations of the LAK authors and the LAK papers generated by Welkin in the following subsections.

4.1. The LAK authors network
Figure 2 presents a network of the LAK authors. The red nodes represent authors, and the edges show the similarity between the publications of two authors. The result shows how the LAK authors are connected in terms of their publications' commonalities. Moreover, the network reveals the authors who share more commonalities with others than the remaining authors do. We call them ‘central authors’. In the next subsection, we show how they are connected with the other authors in the network.

Figure 2. The LAK authors’ network (the Appendix shows a larger version)

4.2. The LAK authors’ degree centrality
For a node in the network, degree centrality is the total number of its incoming and outgoing edges. It is a metric commonly used in Social Network Analysis (SNA) (De Liddo, Buckingham Shum, Quinto, Bachler, & Cannavacciuolo, 2011; Guéret, Groth, Stadler, & Lehmann, 2012; Opsahl, Agneessens, & Skvoretz, 2010). In other words, the degree of a node describes how many other nodes are connected to it. It therefore helps to identify the hubs in the network, which we describe as the nodes that have the most connections to the others. The degree centrality metric may be used to strengthen a network by providing its nodes with more connections. In this data study, degree centrality is used to measure the relevance of an author’s papers to those of the other authors in the network.

[Bar chart: in-degree (y-axis) of the first ten central authors, u1 to u10 (x-axis), for neighborhood sizes n=5 and n=10]
Figure 3. The degree centrality of the top ten central authors

Figure 3 shows the degree centrality of the first ten authors with the highest similarity degree with respect to the LAK publications. The horizontal axis (x) shows the top ten central users; e.g., u1 is the author whose paper(s) have the highest degree. The vertical axis (y) shows the degree values, i.e., the number of relationships of each author shown on the x-axis. Figure 3 also shows the degree centrality for two different sizes of nearest neighborhoods (n). Such neighborhoods are commonly used in collaborative filtering recommender algorithms. As the neighborhood size n increases, each author is linked to more nearest neighbors, and the degree of the authors increases accordingly. As a result, we have a larger number of central authors when n is higher (e.g., n=10). As can be seen in Figure 3, the degree of the first central author (u1) is 121 if n=10 and 97 if n=5. These high scores show the high relevance of u1’s publications to those of the other authors. As a consequence, u1 will appear in top-n author recommendations more often than the other authors.
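The nearest-neighborhood graph and the degree counts discussed above can be illustrated with a minimal Python sketch. It is not the T-index algorithm or the authors' implementation. It rests on three assumptions. First, each node (an author or a paper) is represented by a sparse term-weight vector. Second, similarity is plain cosine similarity. Third, every node gets a directed edge to its n most similar nodes. The toy vectors and the helper names (cosine, knn_graph, in_degree, total_degree) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_graph(vectors, n):
    """Directed graph {node: [neighbors]}: each node points to its n most similar nodes."""
    graph = {}
    for a, va in vectors.items():
        scored = [(cosine(va, vb), b) for b, vb in vectors.items() if b != a]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most similar first; ties by name
        graph[a] = [b for _, b in scored[:n]]
    return graph


def in_degree(graph):
    """Incoming edges per node, i.e. how often a node is another node's neighbor."""
    counts = {node: 0 for node in graph}
    for neighbors in graph.values():
        for b in neighbors:
            counts[b] += 1
    return counts


def total_degree(graph):
    """Incoming plus outgoing edges; out-degree is n for every node when there are more than n nodes."""
    incoming = in_degree(graph)
    return {node: incoming[node] + len(graph[node]) for node in graph}


# Hypothetical toy profiles; in the study these would come from TF-IDF vectors.
profiles = {
    "a": {"analytics": 1.0, "learning": 1.0},
    "b": {"analytics": 1.0, "learning": 0.5},
    "c": {"analytics": 1.0},
    "d": {"recommender": 1.0, "learning": 0.2},
}

for n in (1, 2):
    print(n, in_degree(knn_graph(profiles, n)))
# 1 {'a': 2, 'b': 2, 'c': 0, 'd': 0}
# 2 {'a': 3, 'b': 3, 'c': 2, 'd': 0}
```

Each node's neighbor list for n+1 extends its list for n, so degrees can only grow with n. This mirrors the effect of the neighborhood size in Figure 3. Ranking nodes by degree then yields the "central" nodes.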

4 http://en.wikipedia.org/wiki/Tf–idf
5 http://mahout.apache.org/
6 http://jena.apache.org/
7 http://simile.mit.edu/welkin/
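The text-processing step of Section 3 can be approximated in plain Python. This is a minimal sketch, not Mahout's implementation, and it rests on three assumptions. Documents are already tokenized. The TF-IDF weight is the raw term count times log(N/df), so Mahout's exact weighting and normalization may differ. Stop words follow the 90% rule from Section 3: a term occurring in at least 90% of the documents is dropped. The author_profiles helper sums the vectors of an author's papers. It is a hypothetical reading of the remark in Section 5 that author similarity is calculated over all papers of an author, not necessarily how T-index combines them.

```python
import math
from collections import Counter


def tfidf_vectors(docs, max_df=0.9):
    """docs: {doc_id: [token, ...]} -> {doc_id: {term: tf-idf weight}}."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each term once per document
    # Assumed stop-word rule: drop terms found in at least max_df of all documents.
    stop_words = {t for t, count in df.items() if count / n_docs >= max_df}
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(t for t in tokens if t not in stop_words)
        vectors[doc_id] = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
    return vectors


def author_profiles(vectors, papers_by_author):
    """Hypothetical author profile: the sum of the TF-IDF vectors of the author's papers."""
    profiles = {}
    for author, doc_ids in papers_by_author.items():
        total = Counter()
        for doc_id in doc_ids:
            total.update(vectors[doc_id])  # adds weights term by term
        profiles[author] = dict(total)
    return profiles


# Hypothetical toy corpus standing in for the LAK papers.
docs = {
    "p1": ["learning", "analytics", "the", "dashboard"],
    "p2": ["learning", "analytics", "the", "network"],
    "p3": ["the", "network", "recommender"],
}
vectors = tfidf_vectors(docs)
print(sorted(vectors["p1"]))  # "the" occurs in all three papers, so it is removed
# ['analytics', 'dashboard', 'learning']
profiles = author_profiles(vectors, {"A": ["p1", "p2"], "B": ["p3"]})
```

Summing per-paper vectors is only one way to build author profiles. Averaging would stop prolific authors from dominating, at the cost of diluting their distinctive terms.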
4.3. The LAK papers network
Figure 4 shows a network of the LAK papers. The red nodes are papers, and the edges between them represent the similarity of the papers. By finding similar papers, we can recommend the most similar papers to specific authors. This makes authors more aware of papers that are relevant to them and published in their communities.

Figure 4. The LAK papers network (the Appendix shows a larger version)

Figure 4 shows that some of the papers share more similarity with the others and therefore have a higher degree. As with the central authors, these papers will appear in the top recommendation list more often than the other papers of the dataset. One may interpret their degree as their popularity. Therefore, papers with higher degree values are more popular and, presumably, of more interest to users. For the publication data, users' interests derive from the words and terms they have used most frequently in their papers.

4.4. The LAK papers’ degree centrality

[Bar chart: degree (y-axis) of the top ten papers, p1 to p10 (x-axis), for neighborhood sizes n=5 and n=10]
Figure 5. The degree centrality of the top ten papers

Figure 5 shows the degree centrality of the ten papers that are most similar to the other papers, i.e., the ten papers with the highest degrees. The horizontal axis (x) shows the top ten papers. For example, p1 is the paper with the highest similarity and thus the highest degree value, shown on the vertical axis (y). Figure 5 shows the degree centrality for two different sizes of nearest neighborhoods (n), 5 and 10. As n increases, the degree of the papers increases accordingly. As a result, we have a larger number of top papers if n is higher (here, when n=10). In Figure 5, the degree of the top paper (p1) is 53 (n=10) and 29 (n=5). This shows how much p1 shares similarity with other papers. As a consequence, p1 can be considered the most popular paper, and it has the highest chance to appear in the top paper recommendations.

5. Discussion and conclusions
The results presented here allow us to answer our research questions as follows:

RQ1. How are the authors connected? Which authors share more connections and are more central in terms of sharing commonalities with the others?
We presented a visualization of the authors’ network to provide an overview of how they are connected to each other. To substantiate the authors’ connections and relationships, we evaluated the degree centrality of the first ten, most central authors. Table 1 presents these ten central authors and their degree. It shows the authors whose publications are most relevant to the others in the network. The degrees in Table 1 are for a neighborhood size of 10.

Table 1. The first ten central authors
Author                      Degree
Hendrik Drachsler           116
Kon Shing Kenneth Chung     87
Wolfgang Greller            80
Javier Melenchon            66
Brandon White               59
Vania Dimitrova             50
Erik Duval                  45
Rebecca Ferguson            44
Anna Lea Dyckhoff           40
Simon Buckingham Shum       39

RQ2. How are the papers connected to each other in terms of similarity?
We presented the degree centrality of the LAK papers to give insight into their relationships in the visualized papers’ network. We selected the top ten papers that have the highest similarity with the other papers. To show which papers are placed in the top ten papers’ list, we present the title and authors of each paper.

The top ten papers are not necessarily by the authors who are identified as the central authors. Most of the central authors also appear in the top ten papers’ list (see Table 2), but the order is not the same. When we investigated the LAK data, we found that some of the central authors have more than one paper. For instance, Hendrik Drachsler has contributed to four papers. In this study, similarity is calculated based on all papers of an author. So it is quite probable that not every one of a central author’s papers individually has the highest similarity to the other papers. Although some of the central authors are common to the two tables, only one of the papers authored by those central authors appears in the top ten papers list shown by Table 2.

Table 2. The top ten papers
Paper                                   Authors

Learning Dispositions and               Simon Buckingham-Shum,
Transferable Competencies:              Ruth Deakin Crick
Pedagogy, Modelling and Learning
Analytics

The Pulse of Learning Analytics         Hendrik Drachsler,
Understandings and Expectations         Wolfgang Greller
from the Stakeholders

Social Learning Analytics: Five         Rebecca Ferguson,
Approaches                              Simon Buckingham-Shum

Multi-mediated Community                Dan Suthers, Kar Hai Chu
Structure in a Socio-Technical
Network

Modelling Learning &                    Walter Christian Paredes,
Performance: A Social Networks          Kon Shing Kenneth Chung
Perspective

Teaching Analytics: A Clustering        Beijie Xu,
and Triangulation Study of Digital      Mimi M Recker
Library User Data

Monitoring Student Progress             Johann Ari Larusson,
Through Their Written "Point of         Brandon White
Originality"

Learning Designs and Learning           Lori Lockyer,
Analytics                               Shane Dawson

A Multidimensional Analysis Tool        Eunchul Lee,
for Visualizing Online Interactions     M'hammed Abdous

Using computational methods to          Bruce Sherin
discover student science
conceptions in interview data

Overall, we found that the LAK dataset can help conference attendees become more aware of their research network, which, in turn, is useful for sharing knowledge and experiences. However, the current dataset contains no user feedback or ratings with which to evaluate an author or a paper recommender system. Such data would be needed for common metrics such as prediction accuracy and coverage of the generated recommendations. For future analyses, it would be helpful if the LAK dataset also contained the references of the papers. The references could be used to identify the most cited authors and papers within the LAK dataset and beyond. As a further step, we plan to try additional social network analysis measures besides degree, such as betweenness or closeness.

6. References

De Liddo, A., Buckingham Shum, S., Quinto, I., Bachler, M., & Cannavacciuolo, L. (2011). Discourse-centric learning analytics. LAK 2011: 1st International Conference on Learning Analytics & Knowledge. Banff, Alberta.

Fazeli, S., Zarghami, A., Dokoohaki, N., & Matskin, M. (2010). Elevating Prediction Accuracy in Trust-aware Collaborative Filtering Recommenders through T-index Metric and TopTrustee lists. Journal of Emerging Technologies in Web Intelligence, 2(4), 300–309. doi:10.4304/jetwi.2.4.300-309

Guéret, C., Groth, P., Stadler, C., & Lehmann, J. (2012). Assessing Linked Data Mappings using Network Measures. Proceedings of the 9th International Conference on The Semantic Web: Research and Applications (pp. 87–102). Springer-Verlag Berlin, Heidelberg. doi:10.1007/978-3-642-30284-8_13

Opsahl, T., Agneessens, F., & Skvoretz, J. (2010). Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks, 32(3), 245–251. doi:10.1016/j.socnet.2010.03.006

Taibi, D., & Dietze, S. (2013). Fostering analytics on learning analytics research: the LAK dataset.

Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., & Duval, E. (2011). Dataset-driven research for improving recommender systems for learning. Proceedings of the 1st International Conference on Learning Analytics and Knowledge (pp. 44–53). ACM, New York, NY, USA.
7. Appendix
7.1. The LAK authors’ network




7.2. The LAK papers’ network