=Paper= {{Paper |id=None |storemode=property |title=Analysis of the Community of Learning Analytics |pdfUrl=https://ceur-ws.org/Vol-974/lakdatachallenge2013_05.pdf |volume=Vol-974 |dblpUrl=https://dblp.org/rec/conf/lak/NawazMS13 }} ==Analysis of the Community of Learning Analytics== https://ceur-ws.org/Vol-974/lakdatachallenge2013_05.pdf
Analysis of the Community of Learning Analytics

Sadia Nawaz, Purdue University, West Lafayette, IN, USA, sadia@alumni.purdue.edu
Farshid Marbouti, Purdue University, West Lafayette, IN, USA, fmarbout@purdue.edu
Johannes Strobel, Purdue University, West Lafayette, IN, USA, jstrobel@purdue.edu

ABSTRACT
This paper presents trends in the learning analytics community in terms of authors, their affiliations, and their geographical locations, and thereby identifies the most influential authors, institutes, and countries that have been actively contributing to this field. In addition, the paper identifies collaborations among authors, institutes, and countries, and explores the research themes followed by the learning analytics community.


1. DATA AND TOOLS
The data analyzed in this paper consist of the Learning Analytics and Knowledge (LAK) conference 2011–2012, the Educational Data Mining (EDM) conference 2008–2012, and the special issue of the Journal of Educational Technology and Society (JETS) on learning and knowledge analytics. These data were provided in XML format on the Society for Learning Analytics Research (SoLAR) website [1]. The XML data were converted to tabular form using an XML-to-CSV converter [2], and the resulting CSV files were processed and merged using macros in MS Excel. The data were then analyzed using NodeXL, an open-source template for Microsoft® Excel® [3]. NodeXL lets the user work on different worksheets for different operations: for example, the 'Edges' worksheet can be used to compute collaboration within and between groups, and the 'Vertices' worksheet displays and computes individual node properties such as degree and betweenness centrality. Other tools used in this paper include NetDraw [4] and IBM's Many Eyes [5].


2. MOTIVATION
With increasing attention to the interdisciplinary field of learning analytics, scholars from disciplines such as education, technology, and the social sciences are contributing to this field [6]. Authors with different backgrounds, expertise, and purposes publish and present their work in journals and conferences related to learning analytics. To better understand who the top collaborators in the field are, and which institutes and countries are most active in creating and disseminating knowledge, we analyzed the data described in the previous section.


3. AUTHORSHIP TRENDS
Table 1 summarizes the author-related statistics (detailed definitions of these graph-theory terms are available in [7]). Analysis of authors not only helps in understanding the growth of the field (in terms of publication counts, author counts, etc.) but can also be used to anticipate its future. For example, the numbers of connected components and of maximum edges in a connected component suggest that the graphs are becoming more populated and connected, indicating a growing inclination towards collaboration. Overall, the node counts (2008–2012) and the article counts (the sum of single- and multi-author article counts) show that the field itself is growing. Similarly, the self-loop count, together with the count of single-vertex connected components, shows how many authors of single-authored publications have or have not collaborated (within this data). For example, the last column indicates that there have been 26 single-authored articles by 25 authors overall. Of these authors, 14 have no collaborative work in this data, and Stephen E. Fancsali is the only author with two single-authored publications.
Table 1: Combined statistics for EDM, LAK and JETS

Graph metric (graph-theory terminology)                               2008  2009  2010  2011  2012  Total
Total unique vertices/nodes (authors)                                   74    79   151   193   281    623
Unique edges (a self-loop for a single-author article,
  a straight line otherwise)                                           100   106   208   251   435    938
Edges with duplicates, i.e., edge weight greater than 1
  (joint authorship in more than one publication)                       17    18    50    42    48    337
Total edges                                                            117   124   258   293   483   1275
Self-loops (single-author articles)                                      4     1     3    10     8     26
Multi-author article count                                              27    31    61    75    96     27
Connected components (clusters of authors linked by co-authorship)      20    22    38    53    79    140
Single-vertex connected components (authors of single-author
  articles who did not collaborate)                                      4     0     3     8     7     14
Maximum vertices in a connected component                               15     7    15    29    22    113
Maximum edges in a connected component                                  33    16    36    72    76    370
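Table 1 was produced with NodeXL, and the paper does not state exactly how articles were turned into edges. As a hedged illustration of what the rows count, the Python sketch below derives the same metrics from a list of articles under our own assumption that a single-author article contributes a self-loop and a multi-author article contributes one edge per pair of co-authors. The same helper reproduces the edge rows from the distribution published in Table 2: weighting each author pair by its article count gives 1275 total edges, the 938 pairs that occur once are the 'unique edges', and the remaining 337 are the 'edges with duplicates'. This also shows that the 73.57% figure quoted in Section 4 equals 938/1275, a share of co-authorship edges.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(articles):
    """Turn each article's author list into undirected co-authorship edges.

    Assumption (ours, not the paper's): a single-author article yields a
    self-loop (a, a); a multi-author article yields one edge per author pair.
    """
    edges = []
    for authors in articles:
        if len(authors) == 1:
            edges.append((authors[0], authors[0]))
        else:
            edges.extend(combinations(sorted(authors), 2))
    return edges


def components(edges):
    """Group vertices into connected components with a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for v in list(parent):
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def edge_counts(weight_distribution):
    """Edge rows of Table 1 from {edge weight: number of author pairs}."""
    total = sum(w * n for w, n in weight_distribution.items())
    unique = weight_distribution.get(1, 0)  # pairs that occur only once
    return {"total_edges": total, "unique_edges": unique,
            "edges_with_duplicates": total - unique}


def graph_metrics(articles):
    """Table 1-style metrics for one list of articles (e.g., one year)."""
    edges = coauthor_edges(articles)
    weights = Counter(edges)  # articles per author pair (or per self-loop)
    comps = components(edges)
    metrics = edge_counts(Counter(weights.values()))
    metrics.update({
        "vertices": len({v for edge in edges for v in edge}),
        "self_loops": sum(1 for a, b in edges if a == b),
        "connected_components": len(comps),
        "single_vertex_components": sum(1 for c in comps if len(c) == 1),
        "max_vertices_in_component": max((len(c) for c in comps), default=0),
    })
    return metrics


# Table 2 as published: {articles per author pair: number of pairs}.
TABLE_2 = {10: 1, 6: 2, 5: 2, 4: 10, 3: 15, 2: 110, 1: 938}

if __name__ == "__main__":
    counts = edge_counts(TABLE_2)
    print(counts)  # {'total_edges': 1275, 'unique_edges': 938, 'edges_with_duplicates': 337}
    print(f"{counts['unique_edges'] / counts['total_edges']:.2%}")  # 73.57%
```

For example, with the hypothetical articles [["Ana"], ["Ben", "Cai"], ["Ben", "Cai", "Dee"], ["Eve", "Fox"]], graph_metrics reports 6 edges (4 unique, 2 with duplicates), one self-loop, and three connected components, one of which (Ana alone) is a single-vertex component.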
4. COLLABORATION TRENDS
Collaboration, as defined in the Oxford dictionary [8], is the 'action of working with someone to produce something'; in the current context it denotes co-authorship of an article by two or more researchers. The term can be extended to institutes and even countries, so collaboration patterns are also extracted between and within institutes and countries. Table 2 shows that 938 pairs of authors collaborated only once (this number includes single-author articles, for which a self-loop serves as the edge). In other words, 73.57% of all edges (938 of 1275) belong to author pairs, or single-author self-loops, that occur only once. This could mean either that new collaborations are forming, or that authors published once and then moved on to other research areas, other co-authors, or other venues. Initiatives such as the LAK Data Challenge may therefore attract more researchers to this field and help further grow and develop its authorship networks.

Table 2: Overall collaboration pattern

Author pairs   Article count
           1              10
           2               6
           2               5
          10               4
          15               3
         110               2
         938               1
Total: 1(10) + 2(6) + 2(5) + 10(4) + 15(3) + 110(2) + 938(1) = 1275 edges

Table 3 presents some of the top collaborators; for example, N.T. Heffernan co-authored 6 articles with each of J.E. Beck and Z.A. Pardos. Such analysis can help in finding active researchers and collaborators in this field.

Table 3: Top collaborators based on article count

Author                 Co-author              Article count
S. Ventura             C. Romero                         10
Neil T. Heffernan      Joseph E. Beck                     6
Neil T. Heffernan      Zachary A. Pardos                  6
Arnon Hershkovitz      Rafi Nachmias                      5
Sujith M. Gowda        Ryan S. J. d. Baker                5


5. DIVERSITY
Diversity in this context is the count of distinct researchers that a given author has worked with. Table 4 identifies the contributors who have worked with the most diverse groups of authors; for example, K.R. Koedinger has worked with 34 distinct authors and Ryan Baker with 25. We also extracted the graph of these top contributors (by degree), i.e., a graph that includes the top authors and all of their collaborators, and found that it consists of 128 authors (roughly 21% of all 623 authors). This share shows the significance of the top authors for EDM, LAK, and JETS, and for learning analytics in general.

Table 4: Top 10 authors with highest degree counts

Author                    Degree  Article count
Kenneth R. Koedinger          34             17
Ryan S. J. d. Baker           25             11
C. Romero                     19             11
Vincent Aleven                18              5
S. Ventura                    17             11
Neil T. Heffernan             16             16
Sujith M. Gowda               15              5
Mykola Pechenizkiy            15              7
Arthur C. Graesser            14              4
Jack Mostow                   13             12


6. GEOGRAPHICAL LOCATION
Next, we present the geographical analysis of this dataset, which explores the countries that have been advancing this field, particularly through contributions to EDM, LAK, and JETS. There have been contributions from 41 different countries. To extract this information, all aliases of a country's name were merged (e.g., Netherland, Netherlands, and The_Netherlands were merged into one). The countries with the most international collaborations are listed in Table 5; USA and UK top the list. To illustrate the collaboration patterns between countries, Figure 1 was drawn using NetDraw. In this figure, an edge between two countries depicts co-authorship between researchers from those countries, and the edge width (also given as a number) shows the strength of that collaboration. Different node symbols denote different betweenness values. Betweenness centrality is the "number of times a node acts as a bridge along the shortest path between two other nodes" [9]. USA, UK, and Germany top this list on both degree and centrality measures. Most nodes have a betweenness value of zero, depicted with a '+' symbol. This indicates the peripheral position of these nodes and reflects the emergence and growth of the field: newer nodes are being added and the graph is currently sparse. Figure 2 illustrates the geographical diversity of collaborators. Smaller circles indicate less diversity in terms of collaboration with researchers from other countries; larger circles indicate countries whose researchers have more diverse groups of co-authors from across the world. A small table at the bottom of the figure gives the count of papers from each continent, bringing out the most active regions for research in learning analytics; North America and Europe top this list (the complete geographical mapping is available at [5]).

Table 5: Top international collaborators

Country                           Degree
USA                                   11
UK                                    10
Australia, Germany                     6
Netherlands                            5
Canada, Belgium, Greece, Spain         4
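The paper does not spell out how country aliases were merged or how the degrees in Table 5 were counted. One plausible reading, sketched below in Python, is to normalize each affiliation country through an alias table and then count, for each country, the distinct other countries it shares at least one article with. The alias table, the function names, and the per-article input format are all hypothetical.

```python
from itertools import combinations

# Hypothetical alias table: the paper names only the Netherlands variants.
ALIASES = {
    "netherland": "Netherlands",
    "netherlands": "Netherlands",
    "the_netherlands": "Netherlands",
    "the netherlands": "Netherlands",
    "united states": "USA",
    "usa": "USA",
}


def canonical(name):
    """Map a raw country string to a single canonical spelling."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def country_degrees(articles):
    """Number of distinct partner countries for each country.

    `articles` holds one list of raw country names per article (an assumed
    input format). Two countries are linked if they share an article.
    """
    partners = {}
    for raw_names in articles:
        countries = sorted({canonical(n) for n in raw_names})
        for country in countries:
            partners.setdefault(country, set())
        for a, b in combinations(countries, 2):
            partners[a].add(b)
            partners[b].add(a)
    return {country: len(p) for country, p in partners.items()}


if __name__ == "__main__":
    sample = [["USA", "The_Netherlands"], ["united states", "UK"],
              ["Netherland", "Netherlands"], ["UK"]]
    print(country_degrees(sample))  # {'Netherlands': 1, 'USA': 2, 'UK': 1}
```

In this hypothetical sample the three spellings of the Netherlands collapse to one node, so an article listing both 'Netherland' and 'Netherlands' does not count as an international collaboration; the USA ends with degree 2 and the Netherlands and UK with degree 1.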
Figure 1: Collaboration in terms of geographical location




   Figure 2: Geographical diversity of collaborators
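NetDraw computed the betweenness values behind the node symbols in Figure 1. To make the quoted definition concrete, the sketch below implements Brandes' algorithm for unweighted, undirected graphs in Python. It is a standard method, not necessarily NetDraw's implementation (NetDraw may, for instance, normalize its scores), and the toy country graph is hypothetical. It shows why peripheral nodes, which lie on no shortest path between two other nodes, score zero.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes) for an undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: distances, shortest-path counts, predecessors.
        dist = {v: -1 for v in adj}
        sigma = {v: 0 for v in adj}
        preds = {v: [] for v in adj}
        dist[s], sigma[s] = 0, 1
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest vertices towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


if __name__ == "__main__":
    toy = build_adjacency([("USA", "UK"), ("USA", "Canada"),
                           ("UK", "Germany"), ("UK", "Greece")])
    print(betweenness(toy))
    # {'USA': 3.0, 'UK': 5.0, 'Canada': 0.0, 'Germany': 0.0, 'Greece': 0.0}
```

In this toy graph Canada connects only through the USA, and Germany and Greece only through the UK, so the UK bridges five pairs, the USA three, and the three leaf countries none, mirroring the zero-betweenness '+' nodes described for Figure 1.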
7. AUTHOR AFFILIATION
Next, the institutional affiliations of the authors were analyzed; there have been contributions from 200 different institutes worldwide. Table 6 ranks the top institutes in terms of collaboration with other institutes. Here, degree is the count of unique institutes that a given institute has worked with; it can be influenced by both article counts and co-author counts. Table 7 lists the institutes with the highest counts of intra-institute collaboration, and Table 8 lists the institute pairs with the most collaboration. Such analysis can help research institutes and organizations find partners with whom to collaborate and extend further studies in learning analytics. Figure 3 illustrates trends of collaboration between institutes.

Table 6: Top institutes with highest counts of distinct collaborators

Institute                                                   Degree
Carnegie Mellon University                                      20
University of Cordoba                                            9
Stanford University                                              8
Fraunhofer Institute for Applied Information Technology          7
Dept. Computerwetenschappen (Computer Science), KU Leuven        7
Worcester Polytechnic Institute                                  7
Open University of the Netherlands                               6
University of Pittsburgh                                         6

Table 7: Top institutes with highest count of intra-institute collaboration

Institute                                   Self-loop count
Worcester Polytechnic Institute                         116
Carnegie Mellon University                              107
Eindhoven University of Technology                       36
University of Cordoba                                    33
University of Memphis                                    31
Universitat Oberta de Catalunya (UOC)                    31
University of North Carolina at Charlotte                20
RWTH Aachen University                                   16

Table 8: Top pairs for inter-institute collaboration

Institute                                                 Institute                               Edge weight
Worcester Polytechnic Institute                           Carnegie Mellon University                       37
Claremont Graduate University                             University of Memphis                            18
University of Belgrade                                    Simon Fraser University                           9
Northern Illinois University                              University of Memphis                             9
Hochschule für Wirtschaft und Recht                       Hochschule für Technik und Wirtschaft             8
Beuth Hochschule für Technik Berlin                       Hochschule für Technik und Wirtschaft             8
Universidade Federal de Alagoas                           Carnegie Mellon University                        8
Fraunhofer Institute for Applied Information Technology   Saarland University                               8

Figure 3: Trends of collaboration in terms of author affiliation


8. RESEARCH THEMES
To track the research themes followed by the learning analytics community and their emergence over time, the authors conducted a keyword-based analysis. The information for this analysis was first sought in the keyword (subject) section of the data provided on the SoLAR website [1]. However, this field is empty for the first two years (2008–2009), and some articles in later years also leave it empty. Therefore, the 'title' field was used for keyword extraction instead; the choice of the title rather than the abstract field relies on an earlier study by the authors of this paper [10]. Hermetic Word Frequency Counter (HWFC) [11] was then used to extract the top 30 keywords for each year. Common English words are ignored by this software through its stop-word list. Words that are obvious given the nature of the venues EDM, LAK, and JETS (e.g., student, learn, knowledge, education) were then eliminated manually, since they would not yield insight for this analysis. Further refinement merged varying forms of the same word, such as 'visual, visualize, visualization'. IBM's Many Eyes was then used to obtain the matrix chart in Figure 4, which presents the top 30 keywords for each year. Since the numbers of articles and venues have increased over the years, the relative rank of keywords is discussed rather than their absolute frequency counts. Figure 4 shows that the usage of some keywords, such as 'visualization, intelligent, network*', is increasing over time; some, such as 'model*, system*, tutor*', retain their ranks; and others, such as 'online, collaborat*, performance', show fluctuating trends. Other trends can be interpreted similarly. The authors further extracted the context of these keywords and found that 'visualization' co-occurs with 'data mining', and 'intelligent' appears with 'tutoring system'. The word 'online' has a broader class of co-occurring keywords, including 'learning, education, university, assessment systems, tutoring, courses, curriculum'. Interestingly, in 2012 its context changed to 'online communities, interactions and social learning'. Due to space restrictions, further analysis cannot be provided in this paper.


CONCLUSION
In this paper, the data from the past five years of publications related to learning analytics are analyzed. The trends show an increasing number of authors and more collaboration among authors as well as among institutes. Geographical analysis shows that scholars from different countries have been collaborating and contributing to this field. Top authors, collaborators, and institutes are identified, and the research themes followed by the learning analytics community are explored based on the frequency of keyword usage.

The authors plan to extend this study to the authors' disciplinary diversity and to the association between authors and the research areas they explore within learning analytics.



Figure 4: Keyword analysis for research theme extraction

REFERENCES
[1] Taibi, D., Dietze, S. Fostering analytics on learning analytics research: the LAK dataset. Technical Report, 03/2013.
[2] LUXON SOFTWARE, 2013. Luxon software converter. http://www.luxonsoftware.com/converter/xmltocsv
[3] NODEXL, 2013. NodeXL. http://nodexl.codeplex.com/
[4] Borgatti, S.P., 2002. NetDraw Software for Network Visualization. Analytic Technologies: Lexington, KY.
[5] IBM, 2013. Many Eyes. http://www-958.ibm.com/software/analytics/manyeyes/visualizations/analysis-of-the-community-of-learn
[6] Ferguson, R., 2012. The State of Learning Analytics in 2012: A Review and Future Challenges. Technical Report KMI-12-01, Knowledge Media Institute, The Open University, UK. http://kmi.open.ac.uk/publications/techreport/kmi-12-01
[7] YWORKS, 2013. yWorks developer's guide glossary. http://docs.yworks.com/yfiles/doc/developers-guide/glossary.html
[8] OXFORD DICTIONARIES, 2013. Oxford dictionary: collaboration. http://oxforddictionaries.com/definition/english/collaboration
[9] WIKIPEDIA, 2013. Wikipedia: centrality. http://en.wikipedia.org/wiki/Betweenness#Betweenness_centrality
[10] Nawaz, S., Strobel, J., 2013. IEEE Transactions on Education: authorship and content analysis. Under preparation.
[11] HERMETIC, 2013. Hermetic Word Frequency Counter. http://www.hermetic.ch/wfc/wfc.htm