Recommendations for Network Research in Learning Analytics:
To Open a Conversation
Oleksandra Poquet 1, Mohammed Saqr 2, Bodong Chen 3
1 Centre for Change and Complexity in Learning (C3L), University of South Australia, Australia
2 University of Eastern Finland, Finland
3 University of Minnesota, USA


                                  Abstract
                                  Network science methods are widely adopted in learning analytics, an applied research area
                                  that focuses on the analysis of learning data to understand and improve learning. The
                                  workshop, taking place at the 11th International Learning Analytics and Knowledge
                                  conference, focused on the applications of network science in learning analytics. The workshop
                                  attracted over twenty researchers and practitioners working with network analysis and
                                  educational data. The workshop included work-in-progress and group-wide conversations
                                  about enhancing the quality of network research in learning analytics. The conversations were
                                  driven by concerns around reproducibility and interpretability currently discussed across
                                  research communities. This paper presents a snapshot of the workshop discussions beyond its
                                  work-in-progress papers. To this end, we summarize a literature review presented to the
                                  workshop participants, with the focus on the elements related to the reproducibility and
                                  interpretability of network research in education settings. We also provide a summary of the
                                  workshop discussions and conclude with suggested guidelines for the reporting of network
                                  methods to improve generalizability and reproducibility.

                                  Keywords
                                  Network science, education, learning analytics, learning sciences, recommendations

1. Introduction
    Learning analytics (LA) aims to use “measurement, collection, analysis and reporting of data about
learners and their contexts, for purposes of understanding and optimizing learning and the environments
in which it occurs” [1, p.1382]. Network analysis (NA) is one of the methodological approaches used
in LA. NA enables the modeling and analysis of relational data in education. In essence, a network is
composed of a group of entities or elements referred to as nodes or vertices and a relationship that
connects them referred to as edges or links. Network visualizations created by NA can be useful in
mapping relations and interactions, identifying patterns of interactions, finding active and inactive
students, and detecting student or teacher roles. Mathematical analysis of network graphs commonly
entails the calculation of indices at the levels of the whole network to show global structural properties
or individual nodes/actors comprising the network. Individual-level measures, referred to as centrality
measures, can quantify the importance of learners in the network or characterize their roles. Given that
the meaning of an actor’s importance, roles and contributions is context-specific, centrality measures
may vary in their applications and interpretations.
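As an illustration of the distinction between whole-network indices and node-level centrality measures described above, here is a minimal sketch using NetworkX (one of the tools encountered later in this review); the toy graph and student names are hypothetical.

```python
# A minimal sketch of whole-network and node-level (centrality) indices.
import networkx as nx

# Toy discussion network: an edge means "these students exchanged a reply".
G = nx.Graph([("Ana", "Ben"), ("Ana", "Cem"), ("Ben", "Cem"), ("Cem", "Dia")])

# Whole-network (global) structural properties.
print(nx.density(G))                 # how saturated the network is with ties
print(nx.average_clustering(G))      # tendency of neighbours to interconnect

# Node-level centrality measures, often read as learner "importance";
# their interpretation is context-specific, as noted in the text.
print(nx.degree_centrality(G))       # activity: share of possible direct ties
print(nx.betweenness_centrality(G))  # brokerage: lying on shortest paths
```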
    In education, researchers have used NA to meet various analytical goals: to represent interactions
among collaborators, to examine mediated communication in Computer-Supported Collaborative
Learning (CSCL), and to represent and study the relationships among epistemics, to name a few. To
construct a network, many choices must be made by the researcher, for instance, what are the elements

Proceedings of the NetSciLA21 workshop, April 12, 2021
EMAIL: sasha.poquet@unisa.edu.au (A. 1); saqr@saqr.me (A. 2); chenbd@umn.edu (A. 3)
ORCID: 0000-0001-9782-816X (A. 1); 0000-0001-5881-3109 (A. 2); 0000-0003-4616-4353 (A. 3)
                               © 2021 Copyright for this paper by its authors.
                               Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
that need to be studied (nodes), what kind of relationships between them are of interest (edges), and
whether the strength of relationship should be considered (weight). These choices influence the
constructed network in significant ways, and thereby impact insights based on the researcher’s
interpretation of the network. While NA can be flexibly applied across diverse contexts, this flexibility raises challenges in selecting appropriate network analysis methods and in generalizing findings across contexts.
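To make these construction choices concrete, the following sketch builds two networks from the same hypothetical reply log; the data and student identifiers are illustrative, not drawn from any cited study.

```python
# A sketch of how construction choices (edges, weight, direction) yield
# different networks from the same hypothetical reply log.
import networkx as nx

replies = [("s1", "s2"), ("s1", "s2"), ("s2", "s1"), ("s3", "s1")]

# Choice A: directed, weighted -- who replied to whom, and how often.
G_dw = nx.DiGraph()
for src, dst in replies:
    w = G_dw[src][dst]["weight"] + 1 if G_dw.has_edge(src, dst) else 1
    G_dw.add_edge(src, dst, weight=w)

# Choice B: undirected, unweighted -- "has ever exchanged a reply".
G_u = nx.Graph(replies)

# The same actors, different structures, and thus different indices.
print(G_dw.edges(data=True))  # [('s1', 's2', {'weight': 2}), ...]
print(G_u.edges())            # [('s1', 's2'), ('s1', 's3')]
```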

2. Review of Literature

    Researchers aspire to conduct research that can have a societal impact, which needs to be grounded
in trustworthy findings that are transferable to various contexts and populations. For research findings
to be transferred into practice and policy making, those research findings have to be valid. Two types
of validity are essential here: internal validity, which refers to the rigor of research methods (sampling, data collection, statistical analysis); and external validity, also known as generalizability, which refers to whether results from one sample can be extended to the population or applied widely, or “externally”, in other settings. On the one hand, valid and transferable findings bolster trust in our methods, help establish impactful practices, and advance the field in general. On the other hand, research without sound methodological rigor erodes that trust. As Bergner put it:
“without foregrounding methodological choices in learning analytics we run the risk of generating more
doubt” [2, p. 3].
    Previous research using NA in learning analytics repeatedly confirmed that variability in research
findings can result from methodological choices. Fincham et al. [3] and Wise et al. [4] examined the effects of different tie extraction methods in CSCL (direct reply ties, star reply ties, total co-presence, limited co-presence, moving window) on the resulting networks and research findings. The same analysis produced different results when applied to networks constructed with different tie extraction methods. While a
significant positive association between centrality measures and academic performance was found in
some networks, significant negative association was found in other networks. This work serves as a
reminder that tie extraction should be cautiously chosen and justified. Saqr et al. [5] further examined
different network configurations and tie weight assignment in multiple courses. Their results similarly
show significant differences between network configuration methods as they influence observed
correlations between centralities and learning outcomes, such as performance.
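The sketch below contrasts two of the tie extraction methods named above (direct reply ties versus total co-presence) on a single hypothetical thread; the data are illustrative and not taken from the cited studies.

```python
# Two tie-extraction methods applied to the same hypothetical thread.
from itertools import combinations
import networkx as nx

# Each post: (author, replied_to); None marks the thread starter.
thread = [("s1", None), ("s2", "s1"), ("s3", "s1"), ("s4", "s3")]

# Direct-reply ties: an edge only where one post explicitly answers another.
G_reply = nx.DiGraph((a, b) for a, b in thread if b is not None)

# Total co-presence ties: everyone posting in the thread is linked to everyone else.
G_copresence = nx.Graph(combinations({a for a, _ in thread}, 2))

print(G_reply.number_of_edges())       # 3
print(G_copresence.number_of_edges())  # 6 -- a much denser network
```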
    Sound methodological choices, valid results, and transparent scientific reporting of the findings are
essential for the transferability and reproducibility of the research applying NA in learning analytics.
The multifaceted rigor of network research determines the potential of research findings to influence educational practice and policy making. When applying NA in learning settings, it is important to define the network model, including its nodes and relations, with guidance from
theory and context. Asking what defines a non-relation in a network and whether the relationships that
are excluded comply with that definition, for instance, can help evaluate if the tie definitions were
carefully considered. Similar deliberation around the weight and direction of ties is needed. Calculating
the network indices either at node or network level relies heavily on these configurations, which is why
these choices need to be explained and justified. These choices are essential for the internal validity of
network research, as well as the reproducibility and transferability of the research results.

3. Scanning Highly Cited Literature: Methods
    To offer a common ground for applying network analysis to learning data, we selected the fifty most cited papers matching the combination of keywords ‘network analysis’ and ‘education’ or ‘learning’. These studies provide valuable insight into work that is highly visible in the field. Because they are highly cited, they set the standard for research practices and
offer points of entry for researchers who aspire to conduct network analysis in the field. In February
2021, the authors queried the Web of Science and Scopus databases, with the query: (TITLE-ABS-
KEY ( "network analysis" AND ( education OR learning ) ) ) AND PUBYEAR > 2010 AND (
LIMIT-TO ( DOCTYPE , "ar" ) OR LIMIT-TO ( DOCTYPE , "cp" ) ) AND ( LIMIT-TO (
LANGUAGE , "English" )). All the results returned by the query were sorted by the citation count, and
fifty papers with the highest number of citations were selected. Two researchers scanned abstracts of
the selected fifty studies to exclude eleven papers that were not relevant, e.g., not analyzing networks
related to learning and education. The remaining 39 papers were split between two authors who
independently recorded information pertinent to the design, method, and sample of each study. The
third author coded a small subset of the studies to calibrate the overall categories used to record
information about the papers. During the process, 4 more papers were excluded due to the lack of
relevant focus (e.g., focus on infrastructure for learner networks rather than analyzing them), leaving
35 papers to be included in the final analysis.
    From each paper we extracted information guided by the coding scheme reported in Table 1. During
the workshop, we presented the summary of the dataset to the workshop participants to stimulate group
discussions around rigor of network analysis in learning analytics. Here, we report the descriptive
summary of the selected papers as well as a summary of group discussions from the workshop. We
conclude with suggestions for practical recommendations of reporting data, methods, and analysis
conducted in this area of work.

       Table 1
       Coding categories used to analyze highly cited papers
        Category                                         Explanation
       Meta-data             ● Authors, journal, pages, title, year published, volume, issue,
                                 number of citations
    Network modality         ● One mode or Two mode
      Use of theory          ● Was the study theoretically framed?
                                 Yes/No
                             ● If yes, what was the theory used to frame the study?
         Region              ● Where were the participants from
                                 (International for MOOCs, otherwise as specified)
   Learning Modality         ● What was the modality of the educational setting (online, blended,
                                 face-to-face)
         Sector              ● What was the level of the educational setting (higher education,
                                 adults/MOOCs, secondary education)
      Network size           ● What was the size of the network analyzed in the study?
         Scope               ● How many courses did the study analyze
                                 (one, two, three, program-level analysis)
    Type of research         ● Was the study reporting the results of descriptive or statistical
                                 network analysis?
   Research question         ● What research question was investigated in the study?
      Node definition         ● How were nodes defined in the study?
     Edge definition         ● How were edges defined in the study? Were they directed or
                                 undirected?
     Mixed methods           ● Was network analysis used in combination with another method?
   Descriptive analysis      ● Were centralities identified in the study? Were they normalized?
                                 Did the study report equations?
                                 If sub-graph analysis was a part of the study, what community
                                 detection algorithm was used?
    Analysis software        ● What software was used in the study?
         Results             ● Does the study offer interpretation of network centrality
                                 measures?


4. Results

Table 2 presents the set of thirty-five highly cited papers selected for this review.
   Which settings were represented in the highly cited papers? 37% of the papers listed in Table 2 were
based on educational data collected in North America, 31% in Europe, and 11% in Australasia. The
remainder of the papers focused on fully online settings in international courses, such as MOOCs. In
63% of the studies, data were collected in higher education settings; 23% in adult learning, professional
learning, and MOOC settings; and only 11% in primary and secondary education. In terms of the
modality of educational provisions, 63% of the studies examined fully online courses, 26% face-to-face
educational settings, and 11% blended settings that required both in-person and online interpersonal
learning interactions. Across all highly cited studies, students were treated as network nodes; that is, these studies analyzed networks of students. The smallest network comprised 12 students, whereas the largest one included 4,337 students; the mean and median network sizes across highly cited studies were 511 and 83 students, respectively. Two papers did not report the number of students studied. The scope of the studies also varied: in 57% of the studies, networks were collected in one course only; 14% of the studies analyzed student networks in two courses. The remaining studies analyzed program-level and similar networks, spanning varying numbers of courses.
    What theoretical frames were used for the analysis of student networks in highly cited papers? 57%
of the papers in the dataset used theory to frame network analysis. 42% of papers did not use theory or
only loosely referenced theory without explicitly incorporating its lenses in the analysis or
interpretation. Frequently used theoretical frames included ‘social capital’, ‘structural holes’, retention, and integration into social structures (e.g., being central to the network). Studies often applied socio-cultural, socio-technological, and social constructivist perspectives on learning. Studies also offered examples of how centrality measures in student networks were interpreted to identify student roles (brokers, gatekeepers, peripheral participants).
    What methodological choices were reported in the highly cited papers? Before reporting
methodological approaches in these highly cited papers, it is important to note that around two thirds
of the dataset focused on student relations through online communication and the other third examined
face-to-face networks. This signifies that in this set of papers, different data sources were used for edge
definitions. In particular, most online student networks were operationalized through text-based
technology-mediated online events (discussion logs), such as ‘replies’, ‘mentions’, ‘comments’, ‘co-
occurs with’ in the shared online space (such as a discussion thread), and less frequently log events,
such as ‘follows’ and ‘reads’. In essence, these data sources were used to create network projections of
digital event data. In contrast, networks constructed from face-to-face settings were based on self-
reported relational states, such as ‘is friends with’, ‘is knowledgeable’, and ‘cooperates with’. Multiplex
tie definitions (combined relations, such as ‘uses the same computer’, ‘does homework with’,
‘submitted at the same time’) were rare within the set of highly cited studies.
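The projection step described above can be made concrete with a short sketch: a hypothetical student-by-thread bipartite graph is projected onto a one-mode student network in which an edge means ‘co-occurs in the same discussion thread’.

```python
# Projecting hypothetical student-by-thread event data onto a student network.
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
B.add_nodes_from(["s1", "s2", "s3"], bipartite=0)      # students
B.add_nodes_from(["thread1", "thread2"], bipartite=1)  # shared online spaces
B.add_edges_from([("s1", "thread1"), ("s2", "thread1"),
                  ("s2", "thread2"), ("s3", "thread2")])

# Weighted one-mode projection: edge weight = number of shared threads.
P = bipartite.weighted_projected_graph(B, ["s1", "s2", "s3"])
print(P.edges(data=True))  # [('s1', 's2', {'weight': 1}), ('s2', 's3', {'weight': 1})]
```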
    In terms of the direction of ties, 71% of the studies analyzed directed networks. Three studies did
not specify whether directed or undirected networks were analyzed. Approximately 6% of studies used different tie definitions to construct several networks of different relations for the same actors, and then analyzed and compared these networks. Only 23% of selected papers constructed weighted
networks. This finding is important since around two thirds of the studies analyzed online interactions
where information about the frequency of exchanges between learners could be easily extracted and
may constitute an important element of the analysis. In some studies, information about edge weight was not explicit: the authors would not state whether the ties were weighted but would then include a weighted degree measure in their analysis, suggesting that the networks were, in fact, weighted.
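The ambiguity matters because, on a weighted network, plain degree and weighted degree (strength) can rank the same students differently, as the following sketch with hypothetical data shows; reports should therefore state which measure was used.

```python
# Degree vs. weighted degree (strength) on a hypothetical weighted network.
import networkx as nx

G = nx.Graph()
G.add_edge("s1", "s2", weight=10)  # s1 exchanges many messages with one peer
G.add_edge("s3", "s1", weight=1)
G.add_edge("s3", "s2", weight=1)
G.add_edge("s3", "s4", weight=1)   # s3 has several weak ties

print(dict(G.degree()))                 # {'s1': 2, 's2': 2, 's3': 3, 's4': 1}
print(dict(G.degree(weight="weight")))  # {'s1': 11, 's2': 11, 's3': 3, 's4': 1}
```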
    57% of the studies reported descriptive network measures. 82% of the papers calculated centrality
measures; only 1 paper stated that these measures were normalized. 69% of the studies analyzed student
networks in combination with other data, most commonly text/content sent among students, student
grades, and self-reported measures (such as the sense of belonging). Complementary data were used in
studies seeking to correlate learning measures with network indices. Some network studies also
examined network structures to understand the networks’ ‘inclusiveness’ and ‘communication
patterns’.
    UCINET was predominantly used to calculate measures (given that some papers date back over a decade), but other tools were also used, including R, Gephi, Pajek, Stata, NodeXL, Python-NetworkX, Cytoscape, Meerkat-ED, KBDeX, NetDraw, and NetMiner. However, three studies did not mention the
software that was used to calculate the metrics. Given that software packages may use somewhat varied
versions of metrics, it is worth noting that only 14% of the papers included equations for the metrics
they calculated.
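To illustrate why stating the exact formula matters: libraries differ in whether and how they normalize a metric. In NetworkX, for example, degree centrality divides raw degree by (n - 1), whereas other tools may report raw counts; the sketch below (toy graph) makes the formula explicit.

```python
# Making the normalization behind a library's centrality metric explicit.
import networkx as nx

G = nx.path_graph(4)  # 4 nodes in a line: 0-1-2-3
n = G.number_of_nodes()

raw = dict(G.degree())                        # {0: 1, 1: 2, 2: 2, 3: 1}
normalized = {v: d / (n - 1) for v, d in raw.items()}

print(normalized)                # {0: 0.333..., 1: 0.666..., ...}
print(nx.degree_centrality(G))   # identical values: NetworkX divides by n - 1
```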
    Table 2
    Overview of selected studies (citations as in Scopus, April, 2021)
Authors                                 Title                                                                                    Venue                           Citations
Stewart & Abidi, 2011                   Applying social network analysis to understand the knowledge sharing behaviour of        Journal of Medical Internet        50
                                        practitioners in a clinical online discussion forum                                      Research
Rabbany, Takaffoli & Zaiane, 2011       Analyzing Participation of Students in Online Courses Using Social Network Analysis      Proceedings of educational         50
                                        Techniques.                                                                              data mining
Vaughan et al., 2015                    Bridging the gap: The roles of social capital and ethnicity in medical student           Medical Education                  38
                                        achievement
Vercellone-Smith, Jablokow, &           Characterizing communication networks in a web-based classroom: Cognitive styles         Computers and Education            42
Friedel, 2012                           and linguistic behavior of self-organizing groups in online discussions
Ryu & Lombardi, 2015                    Coding Classroom Interactions for Collective and Individual Engagement                   Educational Psychologist           37
Marcos-Garcia, Martinez-Mones, &        DESPRO: A method based on roles to provide collaboration analysis support                Computers and Education            36
Dimitriadis, 2015                       adapted to the participants in CSCL situations
Yang et al., 2015                       Group interactive network and behavioral patterns in online english-to-Chinese           Internet and Higher Education      34
                                        cooperative translation activity
Thoms & Eryilmaz, 2014                  How media choice affects learner interactions in distance learning classes               Computers and Education            61
Xie,Yu, & Bradshaw, 2014                Impacts of role assignment and participation in asynchronous discussions in college-     Internet and Higher Education      49
                                        level online classes
Zhang et al., 2017                      Interactive networks and social knowledge construction behavioral patterns in            Computers and Education            42
                                        primary school teachers' online collaborative learning activities
Oshima, Oshima, & Matsuzawa, 2012 Knowledge Building Discourse Explorer: A social network analysis application for               Educational Technology             84
                                        knowledge building discourse                                                             Research and Development
Wise & Cui, 2018                        Learning communities in the crowd: Characteristics of content related interactions       Computers and Education            36
                                        and social relationships in MOOC discussion forums
Shea et al., 2013                       Online learner self-regulation: Learning presence viewed through quantitative            IRRODL                             41
                                        content- and social network analysis
Zheng & Warschauer, 2015                Participation, interaction, and academic achievement in an online discussion             Computers and Education            40
                                        environment
Skrypnyk et al., 2015                   Roles of course facilitators, learners, and technology in the flow of information of a   IRRODL                             31
                                        CMOOC
Tirado, Hernando, Aguaded, 2015         The effect of centralization and cohesion on the social construction of knowledge in     Interactive Learning               34
                                        discussion forums                                                                        Environments
Lu & Churchill, 2014                The effect of social interaction on learning engagement in a social networking         Interactive Learning             44
                                    environment                                                                            Environments
Rienties et al. 2012                The role of scaffolding and motivation in CSCL                                         Computers & Education            76
Rienties & Kinchin, 2014            Understanding (in)formal learning in an academic development programme: A              Teaching and Teacher             40
                                    social network perspective                                                             Education
Eckles & Stradley, 2012             A social network analysis of student retention using archival data                     Social Psychology of Education   35
Kellogg, Booth, & Oliver, 2014      A social network perspective on peer supported learning in MOOCs for educators         IRRODL                           67
Hernandez-Garcia et al. 2015        Applying social learning analytics to message boards in online distance learning: A    Computers in Human Behavior      60
                                    case study
Gillani & Eynon, 2014               Communication patterns in massively open online courses                                Internet and Higher Education    131
Chen et al., 2018                   Fostering student engagement in online discussion through social learning analytics    Internet and Higher Education    35
Grunspan et al., 2016               Males under-estimate academic performance of their female peers in                     PLoS ONE                         91
                                    undergraduate biology classrooms
Dawson, Tan, & McWilliam, 2011      Measuring creative potential: Using social network analysis to monitor a learners'     Australasian Journal of          33
                                    creative capacity                                                                      Educational Technology
Conlan et al., 2011                 Measuring social networks in British primary schools through scientific engagement     Proceedings of the Royal         41
                                                                                                                           Society B: Biological Sciences
Fire, 2012                          Predicting Student Exam's Scores by Analyzing Social Network Data                      Lecture Notes in Computer        31
                                                                                                                           Science
Gasevic et al., 2019                SENS: Network analytics to combine social and cognitive perspectives of                Computers in Human Behavior      50
                                    collaborative learning
De-Marcos et al., 2016              Social network analysis of a gamified e-learning course: Small-world phenomenon        Computers in Human Behavior      58
                                    and network metrics as predictors of academic performance
Lambropoulos, Faulkner, & Culwin,   Supporting social awareness in collaborative e-learning                                British Journal of Educational   44
2012                                                                                                                       Technology
Bruun & Brewe, 2013                 Talking and learning physics: Predicting future grades from network measures and       Physical Review Special Topics   32
                                    Force Concept Inventory pretest scores                                                 - Physics Education Research
Jimoyiannis, 2012                   Towards an analysis framework for investigating students' engagement and learning      Journal of Computer Assisted     38
                                    in educational blogs                                                                   Learning
Joksimovic et al., 2016             Translating network position into performance: Importance of centrality in different   ACM International Conference     44
                                    network configurations                                                                 Proceeding Series
Grunspan, Wiggins, & Goodreau,      Understanding classrooms through social network analysis: A primer for social          CBE Life Sciences Education      101
2014                                network analysis in education research
5. Workshop Discussion

   The presented overview of research settings, theoretical framings, and methodological details in
these highly cited studies of student networks suggests limited generalizability of research findings
across contexts, given the lack of details needed to understand and interpret the findings. Descriptive
measures also lack generalizability, as they are predominantly derived for one or two cases (i.e. one or
two networks) and are embedded in a specific pedagogical context. These contexts are not always
explicitly described. To open the conversation about how to improve this state of research, the workshop participants were invited to discuss questions such as: (1) Are we asking practice-related questions
that enable action? (2) Are we asking questions in a way that elicits theoretical insights? (3) Are we
reporting our methods in ways that are reproducible? (4) What social science and learning theories
apply to socio-technical networks? What measures and interpretations are relevant for socio-technical
networks? (5) What recommendations can be provided at this stage to improve the quality of network
research in learning analytics? Group notes from the discussions are available at https://learningfutures.github.io/lak-network/.
   Much discussion in relation to theoretical issues revolved around the need to understand what
elements of social theories (e.g., theories of social capital, social ties, constructs of prestige, power,
harmonics) translate to digital settings, and what conceptual and methodological adjustments may be
needed to incorporate these theories in studies on digital learning and computer-mediated
communication.
   Another prominent discussion revolved around the issues of reporting methodological details to
ensure both methodological and conceptual rigor.

6. Recommendations

Whereas contributions to theory are beyond the scope of our workshop, suggestions around the
reporting of methodological details could be made. To improve the quality and transferability of
empirical work focused on the analysis of learning, communication, and social processes in digitally
mediated settings, we put forward a set of recommendations applicable to future network research in
learning analytics. As researchers working at the intersection of digital data and social networks, we
suggest that future studies incorporate the following in their reports.

1) Recommended elements for describing network studies in learning analytics:

       ●   What are the nodes and their relevance to the context, theory, or research question?
       ●   What are the edges, and what do they represent? Are there any assumptions made for the
           edge definition and what are the justifications for such assumptions?
       ●   Is the network directed, undirected, or mixed?
       ●   Is the network weighted? Is the network simplified or filtered based on a certain threshold?
           If so, how was the threshold justified?
       ●   How do the edges, weight, and direction align with the context, theory, and interpretation?
       ●   Is the network unipartite or bipartite?
       ●   If edges were aggregated, what was the duration over which the aggregation was made?
       ●   What software, and which version, was used to calculate network indices? What algorithms
           or equations were implemented in the software to compute these indices?
       ●   What software was used for network visualization? Which network layout was used?
       ●   What community detection method, algorithm, and parameters were used?
       ●   What was the size of the network: the number of nodes and edges in each of the studied
           networks? Were there any isolates, and if they were excluded from the analysis, why?
           (A sketch for extracting these descriptives follows this list.)
       ●   In communication networks constructed from event data, were time (discrete or
           continuous) and the frequency of exchanges included in the network construction or
           statistical modelling? If not, why was this information excluded?
       ●    What setting (e.g., learning context, pedagogical design) was the network developed in?
            How does this setting compare with other reported studies?
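As a small aid to the checklist above, the following sketch (hypothetical graph) extracts the basic descriptives it asks authors to report: network size, edge count, and isolates.

```python
# Extracting reportable descriptives: size, edge count, isolates.
import networkx as nx

G = nx.Graph([("s1", "s2"), ("s2", "s3")])
G.add_node("s4")  # a student with no ties: an isolate

print(G.number_of_nodes(), G.number_of_edges())  # 4 2
print(list(nx.isolates(G)))  # ['s4'] -- report whether/why isolates were excluded
```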

   2) Additional suggestions for analysis of networks that are in part constructed from digital data:
       ● Justify how the findings contribute to theory or address an applied problem. If possible,
           consider explaining why social network theory is applicable to networks constructed from
           digital trace events; social science constructs, such as power or influence, may not
           carry their meaning into the digital context, at least not in full, and need elaboration.
       ● If possible, justify the choice of centrality measures and elaborate on the interpretations of
           centrality measures in digital learning.
       ● If relevant, use null models to compare networks or to infer whether observed structures
           occurred by chance, rather than drawing inferences from descriptive measures across
           potentially incomparable structures (see the sketch after this list).
       ● When modelling networks statistically, include goodness of fit plots for relevant methods.
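A hedged sketch of the null-model suggestion above: an observed clustering coefficient is compared against degree-preserving randomizations before reading structure into it. The graph, the number of replications, and the swap budget are all illustrative choices, not prescriptions.

```python
# Comparing an observed statistic against a degree-preserving null model.
import networkx as nx

G = nx.karate_club_graph()  # stand-in for an observed student network
observed = nx.average_clustering(G)

null = []
for seed in range(100):
    R = G.copy()
    # Degree-preserving rewiring: keeps each node's degree, shuffles structure.
    nx.double_edge_swap(R, nswap=4 * R.number_of_edges(), max_tries=10**5, seed=seed)
    null.append(nx.average_clustering(R))

p = sum(x >= observed for x in null) / len(null)
print(f"observed={observed:.3f}, null mean={sum(null) / len(null):.3f}, p~{p:.2f}")
```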

We hope these recommendations could serve as points of departure for future dialogues that will
continue to add to and refine the list. We invite network researchers in learning analytics to join the
conversation.

7. Citation Diversity Statement

Recent work in several fields of science has identified a bias in citation practices such that papers from
women and other minorities are under-cited relative to the number of such papers in the field [6]. We manually checked the first and last authors' names in the reference list and inferred gender. By this measure, 16% of the cited literature was written by woman (first author)/woman (last author), 16% by man (first)/woman (last), 16% by woman (first)/man (last), and 50% by man
(first)/man (last). This method is limited as it is not indicative of gender identity, and it cannot account
for intersex, non‐binary, or transgender people. We look forward to future work that could help us to
better understand and support equitable practices in science.

8. References

 [1] G. Siemens. Learning analytics: The emergence of a discipline. American Behavioral Scientist
     (2013): 57(10), pp. 1380–1400.
 [2] Y. Bergner. Measurement and its uses in learning analytics. in: C. Lang, G. Siemens, A. Wise, D.
     Gasevic (Eds.), Handbook of Learning Analytics, first ed., Society for Learning Analytics
     Research (SoLAR) (2017): pp. 23–33.
 [3] E. Fincham, D. Gašević, A. Pardo. From social ties to network processes: Do tie definitions
     matter? Journal of Learning Analytics (2018): 5(2), pp. 9–28. https://doi.org/10.18608/jla.2018.52.2.
 [4] A. Wise, Y. Cui. Unpacking the relationship between discussion forum participation and learning
     in MOOCs: Content is key. in LAK '18: Proceedings of the 8th International Conference on
     Learning Analytics and Knowledge (2018): pp. 330–339.
     https://doi.org/10.1145/3170358.3170403.
 [5] M. Saqr, O. Viberg, H. Vartiainen. Capturing the participation and social dimensions of
     computer-supported collaborative learning through social network analysis: which method and
     measures matter? International Journal of Computer-Supported Collaborative Learning (2020):
     15, pp. 227–248. https://doi.org/10.1007/s11412-020-09322-6
 [6] P. Zurn, D. Bassett, N. Rust. The citation diversity statement: A practice of transparency, a
     way of life. Trends in Cognitive Sciences (2020): 24(9), 669–672.
     https://doi.org/10.1016/j.tics.2020.06.009