<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Construction of Weighted Course Co-Enrollment Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>XunFei Li</string-name>
          <email>xunfeil@uci.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Renzhe Yu</string-name>
          <email>renzhey@uci.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of California</institution>
          ,
          <addr-line>Irvine, 92697</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The increasing availability of digitized campus administrative data provides researchers with the opportunity to systematically quantify how co-presence in classes shapes individual students' educational outcomes. Social network analysis suits this purpose because it supports both the construction of course co-enrollment networks and network-based statistical models. This study explores different ways to construct the course co-enrollment network and evaluates their capacity to capture meaningful student connections through courses. We specifically compare a simple unweighted co-enrollment network with a network weighted by course characteristics along two dimensions: the relationship between network indices and students' academic performance, and the degree to which students with stronger weighted ties to each other experience more peer influence on individual performance than peers who are less connected through course co-enrollment.</p>
      </abstract>
      <kwd-group>
        <kwd>Social network analysis</kwd>
        <kwd>course co-enrollment</kwd>
        <kwd>network autocorrelation model</kwd>
        <kwd>peer effect</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Course-taking experience is a critical part of undergraduate students’ college life. Exposure to peers
who take the same course might significantly impact individual academic achievement. The
demographic composition (in regard to gender, ethnicity, etc.) of classmates shapes the socio-cultural
contexts of students’ academic experience, and the direct (such as group work) and indirect (such as
presentations) interactions with peer students exert intangible influence on individual outcomes from
time to time
        <xref ref-type="bibr" rid="ref2">(Eckles &amp; Stradley, 2012)</xref>
        .
      </p>
      <p>With the availability of campus administrative data, researchers are able to evaluate this important
peer influence at scale. Among the available methodological traditions, social network analysis (SNA)
is appropriate for this purpose because it is one of the most widely used methods for studying relational data and
can explicitly model how students are connected through the course co-enrollment network as well as
the effects of network properties on individual-level outcomes.</p>
      <p>
        Studies applying SNA to course co-enrollment networks have found that network statistics such as
degree and density contribute to explaining students’ educational outcomes
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref7">(Fincham et al., 2018; Israel
et al., 2020; Weeden &amp; Cornwell, 2020)</xref>
        . However, the network edge in most of these studies is defined
as a binary indicator of whether two students enroll in the same course. This is a rather coarse proxy
for peer exposure because the strength of connections between students varies considerably with
course type, delivery format, and meeting schedule, among other factors. Given that
the relationship between network statistics and node-level outcomes is affected by how the network is
constructed, an overly simplified construction of the course co-enrollment network might mask the
actual effect of enrolling in the same course. To date, little effort has been put into examining alternative
ways of constructing this network, and this study investigates which network construction
best captures class-based peer influence. Specifically, by comparing weighting strategies that
leverage different course-level information, we aim to identify the construction approach that
best predicts student achievement from network characteristics. The findings of this inquiry can help both
researchers and practitioners gain deeper insights into students’ college experience from
administrative data, which are largely standardized and usable across institutions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. Social Network Analysis and Course Co-Enrollment Networks</title>
      <p>
        Social network analysis (SNA) has been used in studying educational contexts for a long time
        <xref ref-type="bibr" rid="ref1">(Biancani &amp; McFarland, 2013)</xref>
        . Traditionally, students’ friendship and residence-based networks have
gained much attention for examining how significant others’ preference and selection affect focal
students’ educational performance and behavior. At the micro-level, SNA has also been applied to
students’ posts in online discussion forums to understand how students interact with each other through
discourse in individual classrooms
        <xref ref-type="bibr" rid="ref3 ref4">(Fincham et al., 2018)</xref>
        . While these networks capture different
aspects of peer influence in college experience, they are either very context-specific (e.g., course design
contexts for discussion forum networks) or require extensive data collection effort from researchers.
These characteristics limit the scalability of such analyses.
      </p>
      <p>
        As various campus-wide data become digitally available from the administrative end, some other
aspects of peer influence become measurable on a larger scale and at low cost. A prominent example is
course transcript data, which can be used to construct course co-enrollment networks. Course
co-enrollment captures the most important academic relations between students and their fellow students,
but only a handful of studies in the field of higher education have examined how the structure of the
co-enrollment network relates to students’ connections and behavior
        <xref ref-type="bibr" rid="ref3 ref4">(Fincham et al., 2018; Weeden &amp;
Cornwell, 2020)</xref>
        . As a cost of scalability, student-by-course enrollment records can only capture
between-course variations in peer exposure and miss variations in granular peer interaction within
a class. Accordingly, the main challenge of constructing a course co-enrollment network is how to
understand and model peer exposure and peer influence in relation to course contexts more
accurately.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.2. Approaches to Network Construction</title>
      <p>
        Different network constructions represent researchers’ understanding of the relation(s) being
modeled. In friendship networks, the existence of a tie depends on students’ self-reports of their best
friend(s), which assumes that perceived intimacy between friends has a significant effect on individual
students. Ties could also be constructed based on students’ direct interactions. The discussion forum
network, for example, usually defines a tie as a student’s response to another student’s post. Networks
in the context of small groups such as study groups or orientation groups are also based on the
assumption that students affect each other through direct interaction. Another type is the
co-presence network, which defines ties as students’ physical presence in the same space at the same
time, such as networks constructed from campus network data, course co-enrollment, and campus
activity participation
        <xref ref-type="bibr" rid="ref2 ref5 ref7">(Eckles &amp; Stradley, 2012; Nguyen et al., 2020)</xref>
        .
      </p>
      <p>
        A course co-enrollment network can either be constructed as a two-mode
course-student network or be projected into two separate one-mode networks (a student-student and a
course-course network). The network structure and tie definition can also be affected by the time span, node
inclusion criteria, and other research-specific concerns
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref7">(Gardner et al., 2018; Israel et al., 2020; Weeden
&amp; Cornwell, 2020)</xref>
        . Weeden and Cornwell (2020) construct a two-mode course co-enrollment network
from a single term’s transcript data at Cornell University. Undergraduate, graduate, and professional
master’s students are connected to each other if they share any class in that term, and all
ties are treated equally. Israel and colleagues (2020) project a one-mode course network and a
one-mode student network from the full two-mode co-enrollment network, which is based on a single
cohort of students’ course-taking data over six years. A student forms a tie with another student if they
ever enrolled in the same class within six years after they first enrolled, and the edge is weighted by the total
number of co-enrolled courses. Gardner et al. (2018) use ten years of undergraduate course-taking
records to build the network and further specify different edges through link attributes, which change
according to the characteristics of co-enrolled peers.
      </p>
    </sec>
    <sec id="sec-5">
      <title>2.3. Linking Network Statistics to Students’ Educational Outcomes</title>
      <p>
        Researchers have applied SNA to explore how network-level features and node-level indices could
help understand the connection between students’ social relations and their educational outcomes as
well as how such relations form and evolve in contexts. Network-level indices such as density,
betweenness centralization, clustering coefficient, and two-mode bi-component structure are used to
examine overall how students are connected to each other, and how certain classes or students play
critical linking roles
        <xref ref-type="bibr" rid="ref5 ref7">(Israel et al., 2020; Weeden &amp; Cornwell, 2020)</xref>
        . Node-level indices such as
degree and the demographic and academic features of peers in the network are examined in relation to
students’ educational outcomes such as retention rate
        <xref ref-type="bibr" rid="ref2">(Eckles &amp; Stradley, 2012)</xref>
        , STEM preference
        <xref ref-type="bibr" rid="ref8">(Raabe et al., 2019)</xref>
        , and GPA-based performance
        <xref ref-type="bibr" rid="ref3 ref4">(Gardner et al., 2018)</xref>
        . As discussed in Section 2.2,
the specific network construction approach would affect the estimated relationship between network
statistics and individual outcomes of interest
        <xref ref-type="bibr" rid="ref3 ref4">(Fincham et al., 2018)</xref>
        . However, previous studies on
course co-enrollment networks did not further investigate this perspective.
      </p>
    </sec>
    <sec id="sec-6">
      <title>3. Research questions</title>
      <p>This study investigates different ways to construct course co-enrollment networks with course-level
information from university administrative data. We specifically focus on weighting network ties by
different pieces of course information such as course type, class size, and meeting schedule. The
assumption is that co-enrolling in a course means different levels and effects of peer exposure in
different course contexts. For example, students may have more in-depth connections in small seminars
than in large lectures, and in classes that meet more frequently than in courses with fewer
opportunities to meet. Such course-level information would affect the strength of students’
connection through the course co-enrollment network.</p>
      <p>RQ 1:</p>
      <p>What are the different ways of constructing co-enrollment networks weighted by course information
from campus administrative data?</p>
      <p>To further validate which construction approach more effectively captures students’ connection in
different course contexts, we employ two modeling perspectives. We first examine the predictive power
of local network statistics on individual outcomes in each network construction. The assumption is that
a stronger predictive relationship would indicate a more valid network construction.</p>
      <p>RQ 2:</p>
      <p>Is the relationship between network indices and students’ academic performance in an unweighted
baseline co-enrollment network different from that in a weighted network?</p>
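      <p>As a minimal illustration of what "network indices" could mean here, the sketch below computes two common node-level indices, degree (number of distinct co-enrolled peers) and strength (sum of tie weights), from a tie matrix. The choice of these two indices, the function name degree_and_strength, and the toy matrices are assumptions made for illustration, not the study's specification.</p>
      <preformat>
```python
import numpy as np

def degree_and_strength(adjacency):
    """Return (degree, strength) for each node of a symmetric tie matrix."""
    adjacency = np.asarray(adjacency, dtype=float)
    degree = (adjacency > 0).sum(axis=1)   # number of distinct classmates
    strength = adjacency.sum(axis=1)       # sum of tie weights
    return degree, strength

# Hypothetical three-student example: the same ties, unweighted vs. weighted.
unweighted = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
weighted = np.array([[0, 2.5, 0.5], [2.5, 0, 0], [0.5, 0, 0]])
print(degree_and_strength(unweighted))
print(degree_and_strength(weighted))
```
      </preformat>
      <p>In the unweighted toy network the two indices coincide, whereas weighting separates students who have the same number of ties; either index could then be entered as a predictor of academic performance.</p>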
      <p>We also examine how individual students' academic performance correlates with each other in each
network construction through network autocorrelation models. We assume that in a valid co-enrollment
network, peers with heavier weights on their connections have stronger correlations in their
performance.</p>
      <p>RQ 3:</p>
      <p>How does the autocorrelation model fit on a weighted co-enrollment network compared to an
unweighted baseline network?</p>
    </sec>
    <sec id="sec-7">
      <title>4. Methods and proposed analyses</title>
    </sec>
    <sec>
      <title>4.1. Data</title>
      <p>The data used to construct the course co-enrollment network come from the administrative records
of a large four-year public university in the United States. The administrative data include
student-level course-taking records and grades, and course-level information, for full-time undergraduate
students across multiple years. This context is reasonably representative for research on
co-enrollment networks for a few reasons. First, the large public university includes a variety of majors
and schools that are commonly in place at other institutions. Second, students come from very different
family backgrounds, including those that are traditionally underrepresented. Third, courses at the
university vary in class size and delivery format, providing sufficient variation in course
contexts and the corresponding network constructions. In this study we restrict our analysis to the data
from 2015 to 2020 in order to follow the complete college experience of students from the 2015 and
2016 cohorts. We only include course enrollment records for students who completed a course and received
a valid grade.</p>
    </sec>
    <sec id="sec-8">
      <title>4.2. Course Co-Enrollment Network Construction</title>
    </sec>
    <sec id="sec-9">
      <title>4.2.1. Baseline Network</title>
      <p>The course co-enrollment network is constructed as a one-mode network in which each node represents
one student (Zhou et al., 2007). Students have ties with other students if they enrolled in and completed
the same class. The network is an m × m matrix, where m equals the total number of students in that term,
excluding students who were only in single-student courses and students who failed all classes.</p>
      <p>Each cell in the matrix holds the weight of the tie between the student in row i and the student in column j. If they
enrolled in and completed the same class, the cell is filled with 1 instead of 0. If the two students
enrolled in and completed more than one class together, the cell is filled with the total number of
overlapping courses they had.</p>
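      <p>As a minimal sketch of this construction (not the authors' implementation), the one-mode student network can be obtained by projecting a student-by-course incidence matrix onto students. The function name baseline_network and the toy enrollment records are hypothetical and serve only as an illustration; NumPy is assumed.</p>
      <preformat>
```python
import numpy as np

def baseline_network(enrollments):
    """Project (student, course) records onto a student-by-student matrix.

    Cell (i, j) holds the number of courses that students i and j both
    completed; the diagonal is set to 0 so that nobody is tied to themselves.
    """
    students = sorted({s for s, _ in enrollments})
    courses = sorted({c for _, c in enrollments})
    s_index = {s: i for i, s in enumerate(students)}
    c_index = {c: k for k, c in enumerate(courses)}

    # Two-mode incidence matrix: rows are students, columns are courses.
    incidence = np.zeros((len(students), len(courses)), dtype=int)
    for s, c in enrollments:
        incidence[s_index[s], c_index[c]] = 1

    # One-mode projection: shared-course counts for every student pair.
    adjacency = incidence @ incidence.T
    np.fill_diagonal(adjacency, 0)

    # Drop isolates, e.g. students whose only courses had a single student.
    keep = adjacency.sum(axis=1) > 0
    kept_students = [s for s, k in zip(students, keep) if k]
    return kept_students, adjacency[np.ix_(keep, keep)]

# Hypothetical completed-course records (student id, course id).
records = [("s1", "MATH2A"), ("s2", "MATH2A"), ("s3", "MATH2A"),
           ("s1", "WR39B"), ("s2", "WR39B"), ("s4", "ART1")]
students, adjacency = baseline_network(records)
print(students)
print(adjacency)
```
      </preformat>
      <p>In this toy example, student s4 is dropped because the only course taken had a single student, and the cell for s1 and s2 equals 2 because they share two courses.</p>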
    </sec>
    <sec id="sec-10">
      <title>4.2.2. Weighted Ties</title>
      <p>In the baseline network, the existence of a tie between two students depends solely on whether they
completed the same courses together, but in reality not all ties are equal. Considering the differences in
course contexts, we further weight each edge based on a combination of different aspects of
course-level information. The specific course features we use include:
        <list list-type="bullet">
          <list-item><p>Course type, including lecture, seminar, lab, and discussion. Different types correspond to
different edge weights in the co-enrollment network based on the chance of interaction they
generally offer to students. The order from the most to the least heavily weighted course type is seminar,
discussion, lab, and lecture;</p></list-item>
          <list-item><p>Course schedule (meeting times). Courses that meet more often correspond to a larger edge
weight than courses with fewer meetings (Srinivasan et al., 2006);</p></list-item>
          <list-item><p>Class size. Smaller courses lead to a larger edge weight because the chance of interaction between
students there is higher than in larger classes;</p></list-item>
          <list-item><p>Course level (upper-division vs. lower-division). Upper-division courses are weighted more heavily
than lower-division courses since they generally expect more engagement from students.</p></list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-11">
      <title>4.3. Network Autocorrelation Model</title>
      <p>
        The network autocorrelation model enables us to analyze the social influence process among people
in an interdependent network
        <xref ref-type="bibr" rid="ref6">(Leenders, 2002)</xref>
        . In the autocorrelation model, the ego’s endogenous
outcome variable is affected not only by the ego’s own covariates but also by the alters in
the same network. The strength of the alters’ influence is determined by the weight matrix in
the autocorrelation model.
      </p>
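      <p>Assuming the network effects specification discussed by Leenders (2002), the model can be written as y = ρWy + Xβ + ε, where y is the outcome vector, W the weight matrix, X the covariate matrix, and ρ the autocorrelation parameter. The sketch below estimates ρ by maximizing the profile (concentrated) log-likelihood over a grid. The function names, the grid, and the simulated ring network are illustrative assumptions rather than the study's estimation procedure.</p>
      <preformat>
```python
import numpy as np

def profile_loglik(y, X, W, rho):
    """Log-likelihood of y = rho*W*y + X*beta + eps, maximized over beta, sigma."""
    n = len(y)
    A = np.eye(n) - rho * W
    z = A @ y
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)   # OLS on the filtered outcome
    resid = z - X @ beta
    sigma2 = resid @ resid / n
    _, logdet = np.linalg.slogdet(A)               # log |det(I - rho*W)|
    return logdet - 0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def fit_rho(y, X, W, grid=None):
    """Grid search for rho; returns (rho_hat, maximized log-likelihood)."""
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 191)
    values = np.array([profile_loglik(y, X, W, r) for r in grid])
    best = int(values.argmax())
    return float(grid[best]), float(values[best])

# Simulated check on a ring network where each node has two neighbours.
rng = np.random.default_rng(0)
n, true_rho = 200, 0.4
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5    # rows sum to 1
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(scale=0.1, size=n)
y = np.linalg.solve(np.eye(n) - true_rho * W, X @ np.array([1.0, 2.0]) + eps)
rho_hat, loglik = fit_rho(y, X, W)
print(rho_hat, loglik)
```
      </preformat>
      <p>Fitting the same outcome with two candidate weight matrices and comparing the returned log-likelihoods is one way to operationalize a model-fit comparison; for real data, an established spatial or network regression package would be preferable to this grid search.</p>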
      <p>In this study, students’ term GPA is the endogenous outcome variable, and the covariates
include students’ cumulative GPA before the term and demographic characteristics (gender, race,
first-generation college student status, low-income status). In the baseline network, the weight matrix is
defined as described in Section 4.2.1; in the weighted network, the weight matrix is instead computed
from the weighted ties following Section 4.2.2. By comparing the model fit across these network
constructions, we can evaluate whether incorporating more course information captures the
strength of students’ influence on each other through course co-enrollment more accurately.</p>
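      <p>To make the weighted weight matrix concrete, the following sketch combines course features into a per-course weight and sums those weights over the courses two students share. The numeric multipliers, the product form of course_weight, and the row normalization step are all assumptions introduced for illustration; the study fixes only the ordering of course types (seminar, discussion, lab, lecture) and the direction of each effect.</p>
      <preformat>
```python
import numpy as np

# Hypothetical multipliers; only the ordering
# seminar > discussion > lab > lecture comes from the text.
TYPE_WEIGHT = {"seminar": 4.0, "discussion": 3.0, "lab": 2.0, "lecture": 1.0}
LEVEL_WEIGHT = {"upper": 1.5, "lower": 1.0}

def course_weight(course_type, meetings_per_week, class_size, level):
    """Combine course features into one tie weight (assumed product form)."""
    return (TYPE_WEIGHT[course_type] * LEVEL_WEIGHT[level]
            * meetings_per_week / class_size)

def weighted_network(incidence, course_weights):
    """Sum the course weights over all courses that two students share."""
    incidence = np.asarray(incidence, dtype=float)
    adjacency = (incidence * np.asarray(course_weights)) @ incidence.T
    np.fill_diagonal(adjacency, 0.0)
    return adjacency

def row_normalize(adjacency):
    """Scale each row to sum to 1 (a common, optional choice for W)."""
    totals = adjacency.sum(axis=1, keepdims=True)
    return adjacency / np.where(totals > 0, totals, 1.0)

# Toy data: 3 students (rows) by 2 courses (columns).
incidence = [[1, 1], [1, 1], [1, 0]]
weights = [course_weight("lecture", 3, 3, "lower"),   # 1.0
           course_weight("seminar", 1, 2, "upper")]   # 3.0
W = row_normalize(weighted_network(incidence, weights))
print(W)
```
      </preformat>
      <p>With all course weights set to 1 and no normalization, weighted_network reduces to the shared-course counts of the baseline network, so the two constructions differ only in how much each shared course contributes.</p>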
    </sec>
    <sec id="sec-12">
      <title>5. Discussion</title>
      <p>This proposed study is contextualized in a specific usage of campus administrative data:
understanding students’ connection and peer influence through course co-enrollment. We focus on
finding the optimal approach to constructing co-enrollment networks from both student transcripts and
course-level metadata, largely because these administrative records only reflect co-presence, and the
actual peer exposure and influence need to be inferred. While the two analytical perspectives we take
(network statistics in relation to individual outcomes; the network autocorrelation model) aim at evaluating
the different network constructions, the results in turn could provide insights into how college students’
academic connections with each other vary with course characteristics. This can help policymakers
better tailor academic and curricular policies to the goal of promoting student
success.</p>
    </sec>
    <sec id="sec-13">
      <title>6. Acknowledgements</title>
      <p>This material is based upon work supported by the National Science Foundation under Grant
No.153500 and the Andrew W. Mellon Foundation under Grant No.1806-05902.</p>
    </sec>
    <sec id="sec-14">
      <title>7. References</title>
      <p>[9] Srinivasan, Vikram, et al. “Analysis and Implications of Student Contact Patterns Derived from Campus Schedules.” Proceedings of the 12th Annual International Conference on Mobile
Computing and Networking (MobiCom ’06), ACM Press, 2006, pp. 86–97,
doi:10.1145/1161089.1161100.</p>
      <p>[10] Weeden, Kim A., and Benjamin Cornwell. “The Small-World Network of College Classes:
Implications for Epidemic Spread on a University Campus.” Sociological Science, vol. 7, 2020,
pp. 222–41, doi:10.15195/V7.A9.</p>
      <p>[11] Zhou, Tao, et al. “Bipartite Network Projection and Personal Recommendation.” Physical Review
E - Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 4, 2007,
doi:10.1103/PhysRevE.76.046115.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Biancani</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McFarland</surname>
            <given-names>DA</given-names>
          </string-name>
          .
          <article-title>Social networks research in higher education</article-title>
          .
          In
          <source>Higher education: Handbook of theory and research</source>
          <year>2013</year>
          (pp.
          <fpage>151</fpage>
          -
          <lpage>215</lpage>
          ). Springer, Dordrecht. doi:10.1007/978-94-007-5836-0_4
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Eckles</surname>
            ,
            <given-names>James E.</given-names>
          </string-name>
          , and Eric G. Stradley.
          <article-title>“A Social Network Analysis of Student Retention Using Archival Data</article-title>
          .”
          <source>Social Psychology of Education</source>
          , vol.
          <volume>15</volume>
          , no.
          <issue>2</issue>
          ,
          <year>2012</year>
          , doi:10.1007/s11218-011-9173-z.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Fincham</surname>
            ,
            <given-names>Ed</given-names>
          </string-name>
          , et al. “
          <article-title>From Social Ties to Network Processes: Do Tie Definitions Matter?</article-title>
          ”
          <source>Journal of Learning Analytics</source>
          , vol.
          <volume>5</volume>
          , no.
          <issue>2</issue>
          ,
          <year>2018</year>
          , doi:10.18608/jla.2018.52.2.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>Josh</given-names>
          </string-name>
          , et al. “
          <article-title>Learn From Your (Markov) Neighbour: Co-Enrollment, Assortativity, and Grade Prediction in Undergraduate Courses</article-title>
          .”
          <source>Journal of Learning Analytics</source>
          , vol.
          <volume>5</volume>
          , no.
          <issue>3</issue>
          , Society for Learning Analytics Research, Dec.
          <year>2018</year>
          , pp.
          <fpage>42</fpage>
          -
          <lpage>59</lpage>
          , doi:10.18608/jla.2018.53.4.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Israel</surname>
            ,
            <given-names>Uriah</given-names>
          </string-name>
          , et al. “
          <article-title>Campus Connections: Student and Course Networks in Higher Education</article-title>
          .”
          <source>Innovative Higher Education</source>
          , vol.
          <volume>45</volume>
          , no.
          <issue>2</issue>
          ,
          <year>2020</year>
          , pp.
          <fpage>135</fpage>
          -
          <lpage>51</lpage>
          , doi:10.1007/s10755-019-09497-3.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Leenders</surname>
            ,
            <given-names>Roger Th. A. J.</given-names>
          </string-name>
          “
          <article-title>Modeling Social Influence through Network Autocorrelation: Constructing the Weight Matrix</article-title>
          .”
          <source>Social Networks</source>
          ,
          <year>2002</year>
          , doi:10.1016/S0378-8733(01)00049-1.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>Quan</given-names>
          </string-name>
          , et al. “
          <article-title>Exploring Homophily in Demographics and Academic Performance Using Spatial-Temporal Student Networks</article-title>
          .”
          <source>Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>194</fpage>
          -
          <lpage>201</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Raabe</surname>
            ,
            <given-names>Isabel J.</given-names>
          </string-name>
          , et al. “
          <article-title>The Social Pipeline: How Friend Influence and Peer Exposure Widen the STEM Gender Gap</article-title>
          .”
          <source>Sociology of Education</source>
          ,
          <year>2019</year>
          , doi:10.1177/0038040718824095.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Srinivasan</surname>
            ,
            <given-names>Vikram</given-names>
          </string-name>
          , et al. “
          <article-title>Analysis and Implications of Student Contact Patterns Derived from Campus Schedules</article-title>
          .”
          <source>Proceedings of the 12th Annual International Conference on Mobile Computing and Networking (MobiCom ’06)</source>
          , ACM Press,
          <year>2006</year>
          , pp.
          <fpage>86</fpage>
          -
          <lpage>97</lpage>
          , doi:10.1145/1161089.1161100.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>