Socio-semantic Networks of Research Publications in the Learning Analytics Community

Soude Fazeli, Hendrik Drachsler, Peter Sloep
Open University of the Netherlands (OUNL)
Centre for Learning Sciences and Technologies (CELSTEC)
6401 DL Heerlen, The Netherlands
0031-(0)45-576-2218
{soude.fazeli,hendrik.drachsler,peter.sloep}@ou.nl

ABSTRACT
In this paper, we present network visualizations and an analysis of publications data from the LAK (Learning Analytics and Knowledge) conference in 2011 and 2012, and the special edition on Learning and Knowledge Analytics in the Journal of Educational Technology and Society (JETS) in 2012.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information filtering; K.3.m [Computers and Education]: Miscellaneous

General Terms
Algorithm, visualizations

Keywords
Network, recommender, visualization, dataset, learning analytics, degree

1. Introduction
The Society for Learning Analytics Research (SOLAR) [1] provided a dataset to solicit contributions to the LAK data challenge [2] sponsored by the FP7 European Project LinkedUp [3]. The dataset contains research publications in learning analytics and educational data mining for the years 2010, 2011, and 2012 (Taibi & Dietze, 2013). An overview of the dataset is shown in Figure 1. The dataset contains, in total, 173 authors and 76 papers from the LAK (Learning Analytics and Knowledge) conference series in 2011 and 2012, and the special edition on learning and knowledge analytics in the Journal of Educational Technology and Society (JETS) in 2012. We found 24 authors who contributed to all three scientific proceedings.

Having access to a dataset always offers new opportunities, particularly in the educational domain, which lacks public datasets for running experimental studies (Verbert, Drachsler, Manouselis, Wolpers, Vuorikari, & Duval, 2011). Therefore, we used this dataset to present a visualization of the authors and papers network, and to carry out a deeper analysis of the generated networks. Our overall aim is to use such a graph of authors and papers to recommend similar items to a target user. In the following sections, we evaluate the suitability of the LAK dataset for this purpose.

[1] http://www.solaresearch.org/
[2] http://www.solaresearch.org/events/lak/lak-data-challenge/
[3] http://linkedup-project.eu/

2. Motivation
It is often difficult for conference attendees to decide which workshops or sessions are suitable and relevant for them. Therefore, a list of recommended authors and papers based on shared interests could help attendees plan their conference participation more efficiently and effectively. Several papers have already been published on awareness support for researchers (Reinhardt et al., 2012; Fisichella et al., 2010; Ochoa et al., 2009; Henry et al., 2009) and on scientific recommender systems (Huang et al., 2002; Wang & Blei, 2010), but none of them has analyzed the Learning Analytics datasets for this purpose yet.

Our overall vision is to support the LAK attendees with a list of LAK authors and papers that are relevant for their own research interests. Such a recommendation could be created based on one or more of their own research papers, but also on a short essay or even a tag cloud summarizing their research interests and objectives. Such a priority list can support the awareness of the attendees and empower the network of like-minded authors in the attendees' particular research focus.

Figure 1. The used datasets

In this paper, then, we aim to explore and identify like-minded authors within the LAK dataset. Supposing that we have a network of all the LAK authors and papers, the main research questions are:

RQ1. How are the authors connected and which authors share more connections and are more central in terms of sharing commonalities with the others?

RQ2. How are the papers connected to each other in terms of similarity?

To answer these questions, we went through two main steps in our analysis:
1. Finding patterns of similarity between authors and papers,
2. Visualizing networks of the LAK authors and papers.
We will now describe each step in detail.

3. Data processing
To find relationships between authors, we first computed the similarity of the papers with the TF-IDF algorithm [4]. TF-IDF can create a weighted list of the most commonly used terms in research articles. To generate the TF-IDF matrix for the LAK dataset, we first converted the LAK data from RDF to text files, which is an accepted format for the Mahout system [5]. Then, we ran the default TF-IDF algorithm provided by Mahout on the text files. We removed the stop words by setting the corresponding Mahout configuration variable to 90%. Thus, if a word appears in 90% of the documents, it is considered a stop word (e.g. and, or, the, etc.) and is removed from the similarity matrix. As a final outcome we had:
• A so-called dictionary of all the terms in the LAK dataset
• A binary sequence file that includes the TF-IDF weighted vectors

For computing similarity between the LAK authors, we used the T-index algorithm (Fazeli, Zarghami, Dokoohaki, & Matskin, 2010), a collaborative filtering recommender algorithm that generates a graph of users. In this graph, the nodes are users and the edges show the relationships between users that originate from the similarity of their user profiles. The T-index algorithm originally makes recommendations based on the ratings data of users. We extended the T-index algorithm to be able to process tags and keywords extracted from linked data, e.g. RDF files. We used the Jena APIs [6] to process RDF files and to handle the Web Ontology Language (OWL) files that describe the generated graph of authors and papers. Jena helps to develop semantic Web applications and tools.

4. Data visualization
We visualized the generated graphs of authors and papers with the Welkin [7] tool. Welkin takes an OWL file as input and provides a visualization of the data as output. We present visualizations of the LAK authors and the LAK papers generated by Welkin in the following subsections.

[4] http://en.wikipedia.org/wiki/Tf–idf
[5] http://mahout.apache.org/
[6] http://jena.apache.org/
[7] http://simile.mit.edu/welkin/

4.1. The LAK authors' network
Figure 2 presents a network of the LAK authors in which red nodes represent the authors and the edges show the similarity between the publications of two authors. The result shows how the LAK authors are connected in terms of their publications' commonalities. Moreover, the network shows the authors who share more commonalities than other authors do. We call them 'central authors'. In the next section, we show how they are connected with the other authors in the network.

Figure 2. The LAK authors' network (The Appendix shows a larger version)

4.2. The LAK authors' degree centrality
For a given node in the network, the degree centrality is the total number of incoming and outgoing edges. It is a metric commonly used for Social Network Analysis (SNA) (De Liddo, Buckingham Shum, Quinto, Bachler, & Cannavacciuolo, 2011; Guéret, Groth, Stadler, & Lehmann, 2012; Opsahl, Agneessens, & Skvoretz, 2010). In other words, the degree of a node describes how many other nodes are connected to the target node. It thus helps to measure how many hubs are in the network. We describe hubs as the nodes that have the most connections to the others in the network. The degree centrality metric may be used to strengthen a network by providing its nodes with more connections. In this data study, degree centrality is used to measure the relevance of an author's papers to the other authors in the network.

Figure 3. The degree centrality of the top ten central authors (x-axis: the first ten central authors, u1 to u10; y-axis: indegree; series: n=5 and n=10)

Figure 3 shows the degree centrality for the first ten authors with the highest similarity degree with respect to the LAK publications. The horizontal axis (x) shows the top ten central users, e.g. u1 is the author whose paper(s) has the highest degree. The vertical axis (y) shows the degree values that describe the number of relationships of each user shown on the x-axis. Figure 3 also shows degree centrality for two different sizes of nearest neighborhoods (n). Such neighborhoods are commonly used in collaborative filtering recommender algorithms. By increasing the neighborhood size n, the degree of the authors increases accordingly. As a result, we will have a larger number of central authors when n is higher (e.g. n=10). As can be seen in Figure 3, the degree for the first central author (u1) is equal to 121 if n=10 and 97 if n=5. These high scores show the high relevancy of u1's publications to the authors. As a consequence, u1 will appear in the top-n authors recommendations more often than the other authors.

4.3. The LAK papers' network
Figure 4 shows a network of the LAK papers. The red nodes are papers and the edges between them represent the similarity of the papers. By finding similar papers, we can recommend the most similar papers to specific authors. This increases the awareness of the authors about papers which are relevant to them and published in their communities.

Figure 4. The LAK papers' network (The Appendix shows a larger version)

Figure 4 shows that some of the papers share more similarity with the others and have a higher degree. As with the central authors, these papers will appear more often in the top recommendation list than the other papers of the dataset. One may interpret their degree as their popularity. Therefore, the papers with higher degree values are more popular and, presumably, they are of more interest to users. For the publication data, the interests of users derive from the words and terms they have used more frequently in their papers.

4.4. The LAK papers' degree centrality
Figure 5 shows the degree centrality for the first ten papers that are most similar to the other papers. We selected the ten papers with the highest degrees. The horizontal axis (x) shows the top ten papers, e.g. p1 is the paper with the highest similarity and thus the highest degree value among the others, as shown by the vertical axis (y). Figure 5 shows degree centrality for two different sizes of nearest neighborhoods (n), 5 and 10. By increasing n, the degree of the papers increases accordingly. As a result, we will have a larger number of top papers if n is higher (here, when n=10). In Figure 5, the degree for the first top paper (p1) is equal to 53 (n=10) and 29 (n=5). This shows how much similarity p1 shares with other papers. As a consequence, p1 can be considered the most popular paper and it has the highest chance to appear in the top paper recommendations.

Figure 5. The degree centrality of Top ten papers (x-axis: top ten papers, p1 to p10; y-axis: degree; series: n=5 and n=10)

5. Discussion and conclusions
The results presented here allow us to answer our research questions in the following way:

RQ1. How are the authors connected? Which authors share more connections and are more central in terms of sharing commonalities with the others?

We presented a visualization of the authors' network to provide an overview of how they are connected to each other. To justify the authors' connections and relationships, we evaluated the degree centrality for the first ten, most central authors. Table 1 presents the first ten central authors and their degree to show the authors with the highest relevancy of their publications with others in the network. Table 1 shows the degree of the authors for a neighborhood size equal to 10.

Table 1. The first ten central authors

Author                      Degree
Hendrik Drachsler           116
Kon Shing Kenneth Chung     87
Wolfgang Greller            80
Javier Melenchon            66
Brandon White               59
Vania Dimitrova             50
Erik Duval                  45
Rebecca Ferguson            44
Anna Lea Dyckhoff           40
Simon Buckingham Shum       39

RQ2. How are the papers connected to each other in terms of similarity?

We presented the degree centrality of the LAK papers to give insight into their relationships in the papers' visualized network. We selected the top ten papers that have the highest similarity with the other papers. To show which papers are placed in the top ten papers' list, we present the title and authors for each paper.

The top ten papers are not necessarily by the authors who are identified as the central authors. Although most of the central authors also appear in the top ten papers' list (see Table 2), the order is not the same. As we investigated the LAK data, we found out that some of the central authors have more than one paper. For instance, Hendrik Drachsler has contributed to four papers. In this study, similarity is calculated based on all papers of an author. So, it is quite probable that not each and every one of the authors' papers individually has the highest similarity to the other papers. Although some of the central authors are common to the two tables, only one of the papers authored by those central authors appears in the top ten papers list shown by Table 2.

Table 2. The Top ten papers

Paper: Learning Dispositions and Transferable Competencies: Pedagogy, Modelling and Learning Analytics
Authors: Simon Buckingham-Shum, Ruth Deakin Crick

Paper: The Pulse of Learning Analytics Understandings and Expectations from the Stakeholders
Authors: Hendrik Drachsler, Wolfgang Greller

Paper: Social Learning Analytics: Five Approaches
Authors: Rebecca Ferguson, Simon Buckingham-Shum

Paper: Multi-mediated Community Structure in a Socio-Technical Network
Authors: Dan Suthers, Kar Hai Chu

Paper: Modelling Learning & Performance: A Social Networks Perspective
Authors: Walter Christian Paredes, Kon Shing Kenneth Chung

Paper: Teaching Analytics: A Clustering and Triangulation Study of Digital Library User Data
Authors: Beijie Xu, Mimi M Recker

Paper: Monitoring Student Progress Through Their Written "Point of Originality"
Authors: Johann Ari Larusson, Brandon White

Paper: Learning Designs and Learning Analytics
Authors: Lori Lockyer, Shane Dawson

Paper: A Multidimensional Analysis Tool for Visualizing Online Interactions
Authors: Eunchul Lee, M'hammed Abdous

Paper: Using computational methods to discover student science conceptions in interview data
Authors: Bruce Sherin

Overall, we found that the LAK dataset can help conference attendees to become more aware of their research network, which, in turn, is useful for sharing knowledge and experiences. However, the current dataset contains no user feedback or ratings with which to evaluate either an author or a paper recommender system in terms of common metrics such as prediction accuracy and coverage of the generated recommendations. For future analysis it would be helpful if the LAK dataset also contained the references of the papers. The references could be used to identify the top cited authors and papers within the LAK dataset and beyond. As a further step, we are planning to try additional social network analysis measures besides degree, such as betweenness or closeness.

6. References
De Liddo, A., Buckingham Shum, S., Quinto, I., Bachler, M., & Cannavacciuolo, L. (2011). Discourse-centric learning analytics. LAK 2011: 1st International Conference on Learning Analytics & Knowledge. Banff, Alberta.

Fazeli, S., Zarghami, A., Dokoohaki, N., & Matskin, M. (2010). Elevating Prediction Accuracy in Trust-aware Collaborative Filtering Recommenders through T-index Metric and TopTrustee lists. Journal of Emerging Technologies in Web Intelligence, 2(4), 300–309. doi:10.4304/jetwi.2.4.300-309

Guéret, C., Groth, P., Stadler, C., & Lehmann, J. (2012). Assessing Linked Data Mappings using Network Measures. Proceedings of the 9th international conference on The Semantic Web: research and applications (pp. 87–102). Springer-Verlag Berlin, Heidelberg. doi:10.1007/978-3-642-30284-8_13

Opsahl, T., Agneessens, F., & Skvoretz, J. (2010). Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks, 32(3), 245–251. doi:10.1016/j.socnet.2010.03.006

Taibi, D., & Dietze, S. (2013). Fostering analytics on learning analytics research: the LAK dataset.

Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., & Duval, E. (2011). Dataset-driven research for improving recommender systems for learning. Proceedings of the 1st International Conference on Learning Analytics and Knowledge (pp. 44–53). ACM, New York, NY, USA.

7. Appendix
7.1. The LAK authors' network
7.2. The LAK papers' network
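
7.3. Illustrative sketch of the degree computation
The processing chain described in Sections 3 and 4 (TF-IDF weighting with a 90% stop-word threshold, similarity between items, a nearest-neighborhood of size n, and degree counting) can be illustrated with a small, self-contained sketch. It is only a stand-in under stated assumptions: it uses whitespace tokenization, cosine similarity, and a plain top-n neighbor graph rather than Mahout or the T-index algorithm used in the study, and the function names and toy texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(texts, max_df=0.9):
    """Return one {term: weight} dict per text.

    Terms found in a share of texts >= max_df are dropped as stop words,
    loosely mirroring the 90% setting described in Section 3 (assumption).
    """
    tokenized = [text.lower().split() for text in texts]
    n_texts = len(tokenized)
    # Document frequency: in how many texts does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * math.log(n_texts / doc_freq[term])
            for term, count in term_freq.items()
            if doc_freq[term] / n_texts < max_df
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def indegree(vectors, n):
    """Link every item to its n most similar items and count incoming links.

    This is a plain nearest-neighbor graph, not the T-index algorithm.
    """
    counts = [0] * len(vectors)
    for i, u in enumerate(vectors):
        scored = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        # Keep only items that share at least one weighted term.
        scored = [(s, j) for s, j in scored if s > 0.0]
        # Highest similarity first; ties broken by index for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, j in scored[:n]:
            counts[j] += 1
    return counts


# Hypothetical toy texts standing in for paper contents.
toy_texts = [
    "learning analytics dashboard for students",
    "social network analysis of learning",
    "recommender systems for learning resources",
    "learning analytics and social learning",
]
vectors = tfidf_vectors(toy_texts)
for n in (1, 2):
    print(f"n={n}: in-degree per text = {indegree(vectors, n)}")
```

In the toy texts the word "learning" occurs in every text and is therefore pruned as a stop word. Because each item's top-n list for a larger n contains its list for a smaller n, raising n can only add incoming links, which is consistent with the observation in Sections 4.2 and 4.4 that degree grows with the neighborhood size.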