=Paper=
{{Paper
|id=None
|storemode=property
|title=Socio-semantic Networks of Research Publications in the Learning Analytics Community
|pdfUrl=https://ceur-ws.org/Vol-974/lakdatachallenge2013_02.pdf
|volume=Vol-974
|dblpUrl=https://dblp.org/rec/conf/lak/FazeliDS13
}}
==Socio-semantic Networks of Research Publications in the Learning Analytics Community==
Soude Fazeli, Hendrik Drachsler, Peter Sloep
Open University of the Netherlands (OUNL)
Centre for Learning Sciences and Technologies (CELSTEC)
6401 DL Heerlen, The Netherlands
0031-(0)45-576-2218
{soude.fazeli,hendrik.drachsler,peter.sloep}@ou.nl
ABSTRACT
In this paper, we present network visualizations and an analysis of publication data from the LAK (Learning Analytics and Knowledge) conferences in 2011 and 2012, and the special edition on Learning and Knowledge Analytics in the Journal of Educational Technology and Society (JETS) in 2012.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information filtering; K.3.m [computers and education]: Miscellaneous
General Terms
Algorithm, visualizations
Keywords
Network, recommender, visualization, dataset, learning analytics, degree
1. Introduction
The Society for Learning Analytics Research (SOLAR) [1] provided a dataset to solicit contributions to the LAK data challenge [2] sponsored by the FP7 European Project LinkedUp [3]. The dataset contains research publications in learning analytics and educational data mining for the years 2010, 2011, and 2012 (Taibi & Dietze, 2013). An overview of the dataset is shown in Figure 1. In total, the dataset contains 173 authors and 76 papers from the LAK (Learning Analytics and Knowledge) conference series in 2011 and 2012, and the special edition on learning and knowledge analytics in the Journal of Educational Technology and Society (JETS) in 2012. We found 24 authors who contributed to all three scientific proceedings.
Having access to a dataset always offers new opportunities, particularly in the educational domain, which lacks public datasets for running experimental studies (Verbert, Drachsler, Manouselis, Wolpers, Vuorikari, & Duval, 2011). Therefore, we used this dataset to present a visualization of the authors and papers network, and to carry out a deeper analysis of the generated networks. Our overall aim is to use such a graph of authors and papers to recommend similar items to a target user. In the following sections, we evaluate the suitability of the LAK dataset for this purpose.
[1] http://www.solaresearch.org/
[2] http://www.solaresearch.org/events/lak/lak-data-challenge/
[3] http://linkedup-project.eu/
2. Motivation
It is often difficult for conference attendees to decide which workshops or sessions are suitable and relevant for them. Therefore, a list of recommended authors and papers based on shared interests could help attendees plan their conference participation more efficiently and effectively. Several papers have already been published on awareness support for researchers (Reinhardt et al., 2012; Fisichella et al., 2010; Ochoa et al., 2009; Henry et al., 2009) and on scientific recommender systems (Huang et al., 2002; Wang & Blei, 2010), but none of them has analyzed the Learning Analytics datasets for this purpose yet.
Our overall vision is to support the LAK attendees with a list of LAK authors and papers that are relevant for their own research interests. Such a recommendation could be created based on one or more of their own research papers but also on a short essay or even a tag cloud summarizing their research interests and objectives.
Such a priority list can support the awareness of the attendees and empower the network of like-minded authors in the attendees’ particular research focus.
Figure 1. The used datasets
In this paper, then, we aim to explore and identify like-minded authors within the LAK dataset. Supposing that we have a network of all the LAK authors and papers, the main research questions are:
RQ1. How are the authors connected and which authors share more connections and are more central in terms of sharing commonalities with the others?
RQ2. How are the papers connected to each other in terms of similarity?
To answer these questions, we went through two main steps in our analysis: 1. Finding patterns of similarity between authors and
papers, 2. Visualizing networks of the LAK authors and papers. We will now describe each step in detail.
3. Data processing
To find relationships between authors, we first computed the similarity of the papers with the TF-IDF [4] algorithm. TF-IDF can create a weighted list of the most commonly used terms in research articles. To generate the TF-IDF matrix for the LAK dataset, we first converted the LAK data from RDF to text files, which is an accepted format for the Mahout [5] system. Then, we ran the default TF-IDF algorithm provided by Mahout on the text files. We removed the stop words by setting the configuration variables within Mahout to 90%. Thus, if a word appears in 90% of the documents, it is considered a stop word (e.g. and, or, the, etc.) and is removed from the similarity matrix. As a final outcome we had:
• A so-called dictionary of all the terms in the LAK dataset
• A binary sequence file that includes the TF-IDF weighted vectors
For computing similarity between the LAK authors, we used the T-index algorithm (Fazeli, Zarghami, Dokoohaki, & Matskin, 2010), a collaborative filtering recommender algorithm that generates a graph of users. In this graph, the nodes are users and the edges show the relationships between users that originate from the similarity of their user profiles. The T-index algorithm originally makes recommendations based on the ratings data of users. We extended the T-index algorithm to be able to process tags and keywords extracted from linked data, e.g. RDF files. We used Jena [6] APIs to process RDF files and to handle Web Ontology Language (OWL) files that describe the generated graph of authors and papers. Jena helps to develop semantic Web applications and tools.
[4] http://en.wikipedia.org/wiki/Tf–idf
[5] http://mahout.apache.org/
[6] http://jena.apache.org/
4. Data visualization
We visualized the generated graphs of authors and papers with the Welkin [7] tool. Welkin takes an OWL file as input and provides a visualization of the data as output. We present visualizations of the LAK authors and the LAK papers generated by Welkin in the following subsections.
[7] http://simile.mit.edu/welkin/
4.1. The LAK authors network
Figure 2 presents a network of the LAK authors in which red nodes represent the authors and the edges show the similarity between the publications of two authors. The result shows how the LAK authors are connected in terms of their publications' commonalities. Moreover, the network shows the users who share more commonalities than do other authors. We call them ‘central authors’. In the next section, we show how they are connected with the other authors in the network.
Figure 2. The LAK authors’ network (The Appendix shows a larger version)
4.2. The LAK authors’ degree centrality
For a given node in the network, the degree centrality shows the total number of incoming and outgoing edges. It is a metric commonly used for Social Network Analysis (SNA) (De Liddo, Buckingham Shum, Quinto, Bachler, & Cannavacciuolo, 2011; Guéret, Groth, Stadler, & Lehmann, 2012; Opsahl, Agneessens, & Skvoretz, 2010). In other words, the degree of a node describes how many other nodes are connected to the target node. In fact, it helps to measure how many hubs are in the network. We describe hubs as the nodes that have the most connections to the others in the network. The degree centrality metric may be used to strengthen a network by providing its nodes with more connections. In this data study, degree centrality is used to measure the relevance of an author’s papers to the other authors in the network.
Figure 3. The degree centrality of the top ten central authors (x-axis: the first ten central authors, u1 to u10; y-axis: indegree; series: n=5 and n=10)
Figure 3 shows the degree centrality for the first ten authors with the highest similarity degree with respect to the LAK publications. The horizontal axis (x) shows the top ten central users, e.g. u1 is the author whose paper(s) has the highest degree. The vertical axis (y) shows the degree values that describe the number of relationships of each user shown on the x-axis. Figure 3 also shows degree centrality for two different sizes of nearest neighborhoods (n). Such neighborhoods are commonly used in collaborative filtering recommender algorithms. By increasing the neighborhood size n, the degree of the authors increases accordingly. As a result, we will have a larger number of central authors when n is higher (e.g. n=10). As can be seen in Figure 3, the degree for the first central author (u1) is equal to 121 if n=10 and 97 if n=5. These high scores show the high relevancy of u1’s publications to the authors. As a consequence, u1 will appear in the top-n authors recommendations more often than the other authors.
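As a rough illustration of the processing chain described in Sections 3 and 4.2, the following Python sketch builds TF-IDF vectors with a document-frequency cut-off for stop words, links each item to its n most similar items, and counts incoming edges as the degree. It is only a minimal sketch under stated assumptions: plain cosine similarity stands in for the T-index algorithm, whitespace splitting stands in for Mahout's text analysis, and the four toy documents and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs, max_df=0.9):
    """Return one {term: weight} dict per document.

    Terms occurring in at least `max_df` of the documents are dropped,
    which imitates the 90% stop-word cut-off (an assumption about how
    that threshold is applied).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        vec = {}
        for term, count in Counter(tokens).items():
            if doc_freq[term] / n_docs >= max_df:
                continue  # treated as a stop word
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_n_neighbours(vectors, n):
    """For each item, list the indices of its n most similar other items."""
    neighbours = []
    for i, u in enumerate(vectors):
        scored = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        # Keep only neighbours that share at least one weighted term.
        neighbours.append([j for score, j in scored[:n] if score > 0])
    return neighbours


def in_degree(neighbours):
    """Count how often each item appears in the other items' top-n lists."""
    counts = Counter(j for top in neighbours for j in top)
    return [counts.get(i, 0) for i in range(len(neighbours))]


# Hypothetical toy corpus; "learning" occurs everywhere and is dropped.
docs = [
    "learning analytics dashboard students",
    "learning analytics social network students",
    "learning social network analysis centrality",
    "learning recommender systems datasets",
]
vectors = tfidf_vectors(docs, max_df=0.9)
degree_n1 = in_degree(top_n_neighbours(vectors, n=1))
degree_n2 = in_degree(top_n_neighbours(vectors, n=2))
print(degree_n1, degree_n2)
```

Because each top-n list in this sketch is a prefix of the corresponding top-(n+1) list, an item's in-degree can only stay equal or grow as n increases, which matches the pattern reported for n=5 and n=10.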
Figure 4. The LAK papers network (The Appendix shows a larger version)
4.3. The LAK papers network
Figure 4 shows a network of the LAK papers. The red nodes are papers and the edges between them represent the similarity of the papers. By finding similar papers, we can recommend the most similar papers to specific authors. This increases the awareness of the authors about papers which are relevant to them and published in their communities.
Figure 4 shows that some of the papers share more similarity with the others and have a higher degree. As with the central authors, these papers will appear more often in the top recommendation list than the other papers of the dataset. One may interpret their degree as their popularity. Therefore, the papers with higher degree values are more popular and, presumably, of more interest to users. For the publication data, the interests of users derive from the words and terms they have used more frequently in their papers.
Figure 5. The degree centrality of Top ten papers (x-axis: top ten papers, p1 to p10; y-axis: degree; series: n=5 and n=10)
4.4. The LAK papers’ degree centrality
Figure 5 shows the degree centrality for the first ten papers that are most similar to the other papers. We selected the ten papers with the highest degrees. The horizontal axis (x) shows the top ten papers, e.g. p1 is the paper with the highest similarity and thus the highest degree value among the others, as shown by the vertical axis (y). Figure 5 shows degree centrality for two different sizes of nearest neighborhoods (n), 5 and 10. By increasing n, the degree of the papers increases accordingly. As a result, we will have a larger number of top papers if n is higher (here, when n=10). In Figure 5, the degree for the first top paper (p1) is equal to 53 (n=10) and 29 (n=5). This shows how much p1 shares similarity with other papers. As a consequence, p1 can be considered the most popular paper and it has the highest chance to appear in the top paper recommendations.
5. Discussion and conclusions
The results presented here allow us to answer our research questions in the following way:
RQ1. How are the authors connected? Which authors share more connections and are more central in terms of sharing commonalities with the others?
We presented a visualization of the authors’ network to provide an overview of how they are connected to each other. To justify the authors’ connections and relationships, we evaluated the degree centrality for the ten most central authors. Table 1 presents the first ten central authors and their degree to show the authors with the highest relevancy of their publications with others in the network. Table 1 shows the degree of the authors for a neighborhood size equal to 10.
Table 1. The first ten central authors
Author | Degree
Hendrik Drachsler | 116
Kon Shing Kenneth Chung | 87
Wolfgang Greller | 80
Javier Melenchon | 66
Brandon White | 59
Vania Dimitrova | 50
Erik Duval | 45
Rebecca Ferguson | 44
Anna Lea Dyckhoff | 40
Simon Buckingham Shum | 39
RQ2. How are the papers connected to each other in terms of similarity?
We presented the degree centrality of the LAK papers to give insight into their relationships in the papers’ visualized network. We selected the top ten papers that have the highest similarity with the other papers. To show which papers are placed in the top ten papers’ list, we present the title and authors for each paper.
The top ten papers are not necessarily by the authors who are identified as the central authors. Although most of the central authors also appear in the top ten papers’ list (see Table 2), the order is not the same. As we investigated the LAK data, we found out that some of the central authors have more than one paper. For instance, Hendrik Drachsler has contributed to four papers. In this study, similarity is calculated based on all papers of an author. So, it is quite probable that not each and every one of the authors’ papers individually has the highest similarity to the other papers. Although some of the central authors are common to the two
tables, only one of the papers authored by those central authors appears in the top ten papers list shown by Table 2.
Table 2. The Top ten papers
Paper | Authors
Learning Dispositions and Transferable Competencies: Pedagogy, Modelling and Learning Analytics | Simon Buckingham-Shum, Ruth Deakin Crick
The Pulse of Learning Analytics Understandings and Expectations from the Stakeholders | Hendrik Drachsler, Wolfgang Greller
Social Learning Analytics: Five Approaches | Rebecca Ferguson, Simon Buckingham-Shum
Multi-mediated Community Structure in a Socio-Technical Network | Dan Suthers, Kar Hai Chu
Modelling Learning & Performance: A Social Networks Perspective | Walter Christian Paredes, Kon Shing Kenneth Chung
Teaching Analytics: A Clustering and Triangulation Study of Digital Library User Data | Beijie Xu, Mimi M Recker
Monitoring Student Progress Through Their Written "Point of Originality" | Johann Ari Larusson, Brandon White
Learning Designs and Learning Analytics | Lori Lockyer, Shane Dawson
A Multidimensional Analysis Tool for Visualizing Online Interactions | Eunchul Lee, M'hammed Abdous
Using computational methods to discover student science conceptions in interview data | Bruce Sherin
Overall, we found that the LAK dataset can help conference attendees to become more aware of their research network, which, in turn, is useful for sharing knowledge and experiences. However, the current dataset contains no user feedback or evaluations that could be used to assess either an author or a paper recommender system in terms of common metrics such as prediction accuracy and coverage of the generated recommendations. For future analysis it would be helpful if the LAK dataset also contained references to the papers. The references could be used to identify the top cited authors and papers within the LAK dataset and beyond. As a further step, we are planning to try additional social network analysis measures besides degree, such as betweenness or closeness.
6. References
De Liddo, A., Buckingham Shum, S., Quinto, I., Bachler, M., & Cannavacciuolo, L. (2011). Discourse-centric learning analytics. LAK 2011: 1st International Conference on Learning Analytics & Knowledge. Banff, Alberta.
Fazeli, S., Zarghami, A., Dokoohaki, N., & Matskin, M. (2010). Elevating Prediction Accuracy in Trust-aware Collaborative Filtering Recommenders through T-index Metric and TopTrustee lists. Journal of Emerging Technologies in Web Intelligence, 2(4), 300–309. doi:10.4304/jetwi.2.4.300-309
Guéret, C., Groth, P., Stadler, C., & Lehmann, J. (2012). Assessing Linked Data Mappings using Network Measures. Proceedings of the 9th international conference on The Semantic Web: research and applications (pp. 87–102). Springer-Verlag Berlin, Heidelberg. doi:10.1007/978-3-642-30284-8_13
Opsahl, T., Agneessens, F., & Skvoretz, J. (2010). Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks, 32(3), 245–251. doi:10.1016/j.socnet.2010.03.006
Taibi, D., & Dietze, S. (2013). Fostering analytics on learning analytics research: the LAK dataset.
Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., & Duval, E. (2011). Dataset-driven research for improving recommender systems for learning. Proceedings of the 1st International Conference on Learning Analytics and Knowledge (pp. 44–53). ACM, New York, NY, USA.
7. Appendix
7.1. The LAK authors’ network
7.2. The LAK papers’ network