=Paper=
{{Paper
|id=None
|storemode=property
|title=Analysis of the Community of Learning Analytics
|pdfUrl=https://ceur-ws.org/Vol-974/lakdatachallenge2013_05.pdf
|volume=Vol-974
|dblpUrl=https://dblp.org/rec/conf/lak/NawazMS13
}}
==Analysis of the Community of Learning Analytics==
* Sadia Nawaz, Purdue University, West Lafayette, IN, USA (sadia@alumni.purdue.edu)
* Farshid Marbouti, Purdue University, West Lafayette, IN, USA (fmarbout@purdue.edu)
* Johannes Strobel, Purdue University, West Lafayette, IN, USA (jstrobel@purdue.edu)

===ABSTRACT===
This paper presents trends in the learning analytics community in terms of authors, their affiliations and their geographical locations. From these trends it identifies the most influential authors, institutes and countries that have been actively contributing to the field. The paper also identifies collaborations among authors, institutes and countries, and explores the research themes followed by the learning analytics community.

===1. DATA AND TOOLS===
The data analyzed in this paper covers three sources:
* the Learning Analytics and Knowledge (LAK) conference, 2011–2012;
* the Educational Data Mining (EDM) conference, 2008–2012;
* the special issue of the Journal of Educational Technology and Society (JETS) on learning and knowledge analytics.

The data was provided in XML format on the Society for Learning Analytics Research (SoLAR) website [1]. It was converted to tabular form with an XML-to-CSV converter [2]. The resulting CSV files were processed and merged using macro programming in MS Excel.

The merged data was then analyzed with NodeXL, an open-source template for Microsoft® Excel® [3]. NodeXL uses separate worksheets for different operations. For example, the "Edges" worksheet can be used to compute inter- and intra-collaboration, and the "Vertices" worksheet displays and computes individual node properties such as degree and betweenness centrality. The paper also uses NetDraw [4] and IBM's Many Eyes [5].

===2. MOTIVATION===
Attention to the interdisciplinary field of Learning Analytics is increasing, and scholars from disciplines such as education, technology and the social sciences are contributing to it [6]. Authors with different backgrounds, expertise and purposes publish and present their work in journals and conferences related to Learning Analytics. We analyzed the data described in the previous section to answer two questions: who the top collaborators in the field are, and which institutes and countries are most active in creating and disseminating knowledge.

===3. AUTHORSHIP TRENDS===
Table 1 gives a complete summary of the author-related statistics; detailed definitions of these graph-theory terms are available at [7]. Analysis of authors helps explain the growth of the field (publication counts, author counts, etc.), and it can also be used to anticipate the field's future. For example, the growth in "connected components" and in the "maximum edges in a connected component" shows that the graphs are becoming more populated and connected. This indicates a growing inclination towards collaboration.

Overall, the field itself is growing, as the node counts (2008–2012) and the article counts (the sum of single- and multi-author article counts) show. The self-loop count, together with the single-vertex connected components, shows how many authors of single-authored publications have or have not collaborated within this data. The Total column shows 26 single-authored articles by 25 authors. Of these authors, 14 have no collaborative work in this data, and "Stephen E. Fancsali" is the only author with two single-authored publications.

{| class="wikitable"
|+ Table 1: Combined statistics for EDM, LAK and JETS
! Graph metric (graph-theory terminology) !! 2008 !! 2009 !! 2010 !! 2011 !! 2012 !! Total
|-
| Total unique vertices/nodes (authors) || 74 || 79 || 151 || 193 || 281 || 623
|-
| Unique edges (a loop for single-author articles, a straight line otherwise) || 100 || 106 || 208 || 251 || 435 || 938
|-
| Edges with duplicates (edge weight greater than 1, i.e., joint authorship in more than one publication) || 17 || 18 || 50 || 42 || 48 || 337
|-
| Total edges || 117 || 124 || 258 || 293 || 483 || 1275
|-
| Self-loops (single-author articles) || 4 || 1 || 3 || 10 || 8 || 26
|-
| Multi-author article count || 27 || 31 || 61 || 75 || 96 || 27
|-
| Connected components (authors forming a cluster based on authorship) || 20 || 22 || 38 || 53 || 79 || 140
|-
| Single-vertex connected components (authors of single-author articles who did not collaborate) || 4 || 0 || 3 || 8 || 7 || 14
|-
| Maximum vertices in a connected component || 15 || 7 || 15 || 29 || 22 || 113
|-
| Maximum edges in a connected component || 33 || 16 || 36 || 72 || 76 || 370
|}

===4. COLLABORATION TRENDS===
The Oxford dictionary [8] defines collaboration as the "action of working with someone to produce something". In the current context it means co-authorship of an article by two or more researchers. The term extends to institutes and even countries, so collaboration patterns are also extracted between and within institutes and countries.

Table 2 shows that 938 pairs of authors collaborated just once. This number includes single-author articles, since in that case a self-loop serves as an edge from the author to itself. Put differently, 938 of the 1,275 edges (73.57%) link authors who have collaborated just once. This could mean that new collaborations are forming. It could also mean that these authors published once and then moved to other research areas, other co-authors or other venues. Initiatives such as the LAK Data Challenge can therefore attract more researchers to the field and help authorship networks grow and develop.

{| class="wikitable"
|+ Table 2: Overall collaboration pattern
! Author pairs !! Article count
|-
| 1 || 10
|-
| 2 || 6
|-
| 2 || 5
|-
| 10 || 4
|-
| 15 || 3
|-
| 110 || 2
|-
| 938 || 1
|}

Total edges: 1(10) + 2(6) + 2(5) + 10(4) + 15(3) + 110(2) + 938(1) = 1275.

Table 3 lists some of the top collaborators. For example, N.T. Heffernan co-authored 6 articles with J.E. Beck and 6 with Z.A. Pardos. Such analysis can help identify active researchers and collaborators in the field.

{| class="wikitable"
|+ Table 3: Top collaborators based on article count
! Author !! Co-author !! Article count
|-
| S. Ventura || C. Romero || 10
|-
| Neil T. Heffernan || Joseph E. Beck || 6
|-
| Neil T. Heffernan || Zachary A. Pardos || 6
|-
| Arnon Hershkovitz || Rafi Nachmias || 5
|-
| Sujith M. Gowda || Ryan S. J. d. Baker || 5
|}

===5. DIVERSITY===
Diversity here means the number of distinct researchers an author has worked with. Table 4 identifies the contributors who have worked with the most diverse groups of authors. For example, K.R. Koedinger has worked with 34 distinct authors and Ryan Baker with 25.

We also extracted the graph of these top contributors (ranked by degree): a graph containing the top authors and all of their collaborators. This graph has 128 authors, roughly 21% of the 623 authors in total. That share shows how important the top authors are to EDM, LAK, JETS and learning analytics in general.

{| class="wikitable"
|+ Table 4: Top 10 authors with highest degree counts
! Author !! Degree !! Article count
|-
| Kenneth R. Koedinger || 34 || 17
|-
| Ryan S. J. d. Baker || 25 || 11
|-
| C. Romero || 19 || 11
|-
| Vincent Aleven || 18 || 5
|-
| S. Ventura || 17 || 11
|-
| Neil T. Heffernan || 16 || 16
|-
| Sujith M. Gowda || 15 || 5
|-
| Mykola Pechenizkiy || 15 || 7
|-
| Arthur C. Graesser || 14 || 4
|-
| Jack Mostow || 13 || 12
|}

===6. GEOGRAPHICAL LOCATION===
Next, we present a geographical analysis of the dataset. It explores which countries have been extending the field, especially through contributions to EDM, LAK and JETS. There have been contributions from 41 different countries. To extract this information, all aliases of a country's name were merged; for example, Netherland, Netherlands and The_Netherlands were treated as one country. Table 5 lists the countries with the most international collaborations, with the USA and the UK at the top.

Figure 1, drawn with NetDraw, illustrates the collaboration patterns between countries. An edge between two countries depicts co-authorship between researchers from those countries, and the edge width (also given as a number) shows the strength of that collaboration. Node symbols encode betweenness values. "Betweenness centrality" is the "number of times a node acts as a bridge along the shortest path between two other nodes" [9]. The USA, the UK and Germany rank highest on both degree and centrality. Most nodes have a betweenness value of zero, shown with a "+" symbol. This indicates that these nodes are peripheral and reflects a field that is still forming: new nodes are being added and the graph is currently sparse.

Figure 2 illustrates the geographical diversity of collaborators. Smaller circles indicate less diversity of collaboration with researchers from other countries. Larger circles indicate countries whose researchers have a more diverse group of co-authors from across the world. A small table at the bottom of the figure gives the number of papers from each continent, which shows the most active regions for learning analytics research. North America and Europe lead this list; the complete geographical mapping is available at [5].

{| class="wikitable"
|+ Table 5: Top international collaborators
! Country !! Degree
|-
| USA || 11
|-
| UK || 10
|-
| Australia, Germany || 6
|-
| Netherlands || 5
|-
| Canada, Belgium, Greece, Spain || 4
|}
Figure 1: Collaboration in terms of geographical location

Figure 2: Geographical diversity of collaborators

===7. AUTHOR AFFILIATION===
Next, the institutional affiliations of authors were analyzed; there have been contributions from 200 different institutes worldwide. Table 6 ranks the top institutes by collaboration with other institutes. Degree here is the number of unique institutes that a given institute has worked with, and it can be influenced by both the article counts and the co-author counts. Table 7 lists the institutes with the most intra-institute collaboration, and Table 8 lists the institute pairs with the most collaboration. Such analysis can help research institutes and organizations find partners and extend their studies in learning analytics. Figure 3 illustrates trends of collaboration between institutes.

{| class="wikitable"
|+ Table 6: Top institutes with highest counts of distinct collaborators
! Institute !! Degree
|-
| Carnegie Mellon University || 20
|-
| University of Cordoba || 9
|-
| Stanford University || 8
|-
| Fraunhofer Institute for Applied Information Technology || 7
|-
| Dept. Computer wetenschappen, KU Leuven || 7
|-
| Worcester Polytechnic Institute || 7
|-
| Open University of the Netherlands || 6
|-
| University of Pittsburgh || 6
|}

{| class="wikitable"
|+ Table 7: Top institutes with highest count of intra-institute collaboration
! Institute !! Self-loop count
|-
| Worcester Polytechnic Institute || 116
|-
| Carnegie Mellon University || 107
|-
| Eindhoven University of Technology || 36
|-
| University of Cordoba || 33
|-
| University of Memphis || 31
|-
| Universitat Oberta de Catalunya (UOC) || 31
|-
| University of North Carolina at Charlotte || 20
|-
| RWTH Aachen University || 16
|}

{| class="wikitable"
|+ Table 8: Top pairs for inter-institute collaboration
! Institute !! Institute !! Edge weight
|-
| Worcester Polytechnic Institute || Carnegie Mellon University || 37
|-
| Claremont Graduate University || University of Memphis || 18
|-
| University of Belgrade || Simon Fraser University || 9
|-
| Northern Illinois University || University of Memphis || 9
|-
| Hochschule fur Wirtschaft und Recht || Hochschule fur Technik und Wirtschaft || 8
|-
| Beuth Hochschule fur Technik Berlin || Hochschule fur Technik und Wirtschaft || 8
|-
| Universidade Federal de Alagoas || Carnegie Mellon University || 8
|-
| Fraunhofer Institute for Applied Information Technology || Saarland University || 8
|}

Figure 3: Trends of collaboration in terms of author affiliation

===8. RESEARCH THEMES===
To track the research themes followed by the learning analytics community and how they emerged over time, the authors conducted a keyword-based analysis. The first candidate source was the keyword (subject) section of the data provided on the SoLAR website [1]. However, this field is empty for the first two years (2008–2009) and for some articles in later years. The "title" field was therefore used for keyword extraction. The choice of titles rather than abstracts follows an earlier study by the authors of this paper [10].

The keywords were extracted and cleaned as follows:
* Hermetic Word Frequency Counter (HWFC) [11] parsed out the top 30 keywords for each year; it already ignores common English words through its stop-word list.
* Words that are obvious given the nature of the venues (EDM, LAK and JETS), such as student, learn, knowledge and education, were removed manually, since they add no insight to this analysis.
* Variants of the same word, such as "visual", "visualize" and "visualization", were merged.
* IBM's Many Eyes produced the Matrix Chart in Figure 4, which shows the top 30 keywords for each year.

Because the numbers of articles and venues increased over the years, the discussion uses the relative rank of keywords rather than absolute frequency counts. In Figure 4, the use of keywords such as "visualization", "intelligent" and "network*" increases over time. Keywords such as "model*", "system*" and "tutor*" keep their ranks, while "online", "collaborat*" and "performance" fluctuate. Other trends can be read from the figure in the same way.

The authors also extracted the context of these keywords. "Visualization" co-occurs with "data mining", and "intelligent" appears with "tutoring system". "Online" has a broader set of co-occurring keywords, including "learning", "education", "university", "assessment systems", "tutoring", "courses" and "curriculum". In 2012, however, its context shifted to "online communities", "interactions" and "social learning". Space restrictions prevent further analysis in this paper.

Figure 4: Keyword analysis for research theme extraction

===CONCLUSION===
This paper analyzes five years of publications related to learning analytics. The trends show a growing number of authors and increasing collaboration among authors and among institutes. Geographical analysis shows that scholars from different countries have been collaborating and contributing to the field. The paper identifies the top authors, collaborators and institutes, and uses keyword frequencies to bring out the research themes followed by the learning analytics community. The authors plan to extend this study to authors' disciplinary diversity and to the association between authors and the research areas they explore within learning analytics.

===REFERENCES===
* [1] Taibi, D., Dietze, S. Fostering analytics on learning analytics research: the LAK dataset. Technical Report, 03/2013.
* [2] Luxon Software, 2013. Luxon software converter. http://www.luxonsoftware.com/converter/xmltocsv
* [3] NodeXL, 2013. NodeXL. http://nodexl.codeplex.com/
* [4] Borgatti, S.P., 2002. NetDraw Software for Network Visualization. Analytic Technologies: Lexington, KY.
* [5] IBM, 2013. Many Eyes. http://www-958.ibm.com/software/analytics/manyeyes/visualizations/analysis-of-the-community-of-learn
* [6] Ferguson, R., 2012. The State of Learning Analytics in 2012: A Review and Future Challenges. Technical Report KMI 12-01, Knowledge Media Institute, The Open University, UK. http://kmi.open.ac.uk/publications/techreport/kmi-12-01
* [7] yWorks, 2013. yWorks developer's guide glossary. http://docs.yworks.com/yfiles/doc/developers-guide/glossary.html
* [8] Oxford Dictionaries, 2013. Oxford dictionary: collaboration. http://oxforddictionaries.com/definition/english/collaboration
* [9] Wikipedia, 2013. Wikipedia: centrality. http://en.wikipedia.org/wiki/Betweenness#Betweenness_centrality
* [10] Nawaz, S., Strobel, J., 2013. IEEE Transactions on Education – authorship and content analysis. Under preparation.
* [11] Hermetic, 2013. Hermetic Word Frequency Counter. http://www.hermetic.ch/wfc/wfc.htm
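Section 7 reports three institute-level measures: degree (Table 6), self-loop counts (Table 7) and edge weights (Table 8). All three can be illustrated by projecting author-level co-authorship edges onto institutes, as in the sketch below. It makes two assumptions that the paper does not confirm. First, each author has exactly one affiliation. Second, every author-author edge whose endpoints share an institute counts as one intra-institute self-loop. The institute labels in the example are placeholders.

```python
from collections import Counter


def institute_graph(author_edges, affiliation):
    """Project author-level edges onto institutes.

    author_edges: (author, author) pairs with duplicates kept, self-loops allowed.
    affiliation: author -> institute (assumes one affiliation per author).
    Returns (self_loops, pair_weights, degree).
    """
    self_loops = Counter()    # intra-institute edges, cf. Table 7
    pair_weights = Counter()  # inter-institute edge weights, cf. Table 8
    for a, b in author_edges:
        ia, ib = affiliation[a], affiliation[b]
        if ia == ib:
            self_loops[ia] += 1
        else:
            pair_weights[tuple(sorted((ia, ib)))] += 1
    partners = {}
    for ia, ib in pair_weights:
        partners.setdefault(ia, set()).add(ib)
        partners.setdefault(ib, set()).add(ia)
    degree = {inst: len(p) for inst, p in partners.items()}  # cf. Table 6
    return self_loops, pair_weights, degree


if __name__ == "__main__":
    edges = [("a1", "a2"), ("a1", "a2"), ("a1", "b1"), ("b1", "c1"), ("c1", "c1")]
    where = {"a1": "Inst A", "a2": "Inst A", "b1": "Inst B", "c1": "Inst C"}
    loops, weights, degree = institute_graph(edges, where)
    print(loops.most_common(), weights.most_common(), degree)
```

An institute that appears only in self-loops gets no entry in `degree`, so `degree.get(name, 0)` is the safe lookup.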
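Section 8 used Hermetic Word Frequency Counter [11] plus manual clean-up to find the top 30 title keywords per year. The sketch below is a stand-in for that tool, not a reproduction of it. It tokenizes titles, drops stop words and venue-specific words, and merges word variants through a hypothetical lookup table. It then ranks keywords per year so that relative ranks, rather than raw counts, can be compared across years. The stop-word, venue-word and merge lists are illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative lists only; HWFC ships its own stop-word list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}
VENUE_WORDS = {"student", "students", "learn", "learning", "knowledge", "education"}
# Hypothetical merge map collapsing variants of one word.
MERGE = {"visual": "visualization", "visualize": "visualization",
         "visualizing": "visualization"}


def title_keywords(title):
    """Yield cleaned keywords from one article title."""
    for token in re.findall(r"[a-z]+(?:-[a-z]+)*", title.lower()):
        token = MERGE.get(token, token)
        if token not in STOP_WORDS and token not in VENUE_WORDS:
            yield token


def top_keywords_by_year(titles_by_year, n=30):
    """{year: [(keyword, count), ...]} holding the n most frequent keywords."""
    result = {}
    for year, titles in titles_by_year.items():
        counts = Counter(kw for t in titles for kw in title_keywords(t))
        # Descending count, then alphabetical, so the ranking is deterministic.
        result[year] = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    return result


def rank_of(keyword, ranked):
    """1-based rank of a keyword in one year's list, or None if absent."""
    for i, (kw, _) in enumerate(ranked, start=1):
        if kw == keyword:
            return i
    return None


if __name__ == "__main__":
    titles = {
        2011: ["Visualizing student data", "Data mining for student models"],
        2012: ["Social network analysis of online communities", "Network visualization"],
    }
    top = top_keywords_by_year(titles, n=5)
    print({year: rank_of("visualization", ranked) for year, ranked in top.items()})
```

The paper's wildcard terms (e.g. "model*", "collaborat*") would need either more merge entries or a stemmer. This sketch does neither.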