<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analysis of Educational Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Diellor Hoxhaj</string-name>
          <email>diellorhoxhaj@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Hric</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iveta Mrázová</string-name>
          <email>iveta.mrazova@mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Mathematics and Physics, Charles University</institution>
          ,
          <addr-line>Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Higher education shapes society both at the individual and country-wide levels. This study examines the character of US four-year colleges in the context of the achieved graduation rates. Based on the data provided by the British Open University, we further investigate the precursors indicating student success/failure. Finally, we explore the impact of a higher education background on the structure of the parliament in the UK. Data mining techniques like clustering and decision trees adopted to analyze relevant educational data confirm a substantial impact of demographic factors and study behavior on academic success. Social network-based methods assist in revealing alumni connections in the UK parliament.</p>
      </abstract>
      <kwd-group>
        <kwd>educational data mining</kwd>
        <kwd>higher education</kwd>
        <kwd>graduation rates</kwd>
        <kwd>clustering</kwd>
        <kwd>decision trees</kwd>
        <kwd>social network analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Higher education (HE) is tied to improved career opportunities, quality of life, and even health. Student success is fundamental to the sector because of the enormous costs associated with HE. Early detection of at-risk students could call for timely actions improving students’ success rates [3], [23]. In HE, success exists at different levels [26]. At the personal level, it may be acceptance into a university or securing stellar grades, and the perceived success can substantially impact an individual’s wellbeing, retention, and future career prospects [19]. For institutions, success may be measured by graduation or retention rates, or international rankings. For a country, success in HE may translate into economic development, and globally, advanced skills and knowledge contribute to a civilized society, wealth, and stability.</p>
      <p>CEUR Workshop Proceedings, ISSN 1613-0073</p>
      <p>In general, student success is defined by academic achievement demonstrated through strong grades and ultimately graduation, followed by advanced career prospects. Further, HE students are expected to develop critical thinking skills, the ability to solve complex problems, and refined effective communication [3], [23], [26]. In addition, a study involving prestigious Swedish HE programs noticed a new phenomenon of effortless achievement, perceived by students as an indicator of their ability to juggle studies and extracurricular activities or future professional life [19]. However, frequent short-term employment may compromise longer-term academic outcomes. Beyond HE, employment rates track graduates’ professional success, while alumni engagement reflects connection to the institution.</p>
      <p>The top two aspects predicting academic success include prior academic achievement and student demographics like gender, age, race, and socioeconomic status (SES), as students who are not burdened by financial worries can better focus on their studies [3]. Further factors comprise the students’ environment, psychological attributes like motivation, study behavior, and integrated student e-learning activities. In general, feeling safe and secure, together with a strong sense of belonging to the campus community, boosts students’ wellbeing and increases retention [25], [26]. On the other hand, a lack of diversity might negatively affect students’ sense of belonging and inhibit their academic achievement. Surprisingly, a longitudinal study [17] found that Hungarian students do not benefit from higher-level twenty-first-century skills like critical thinking or inductive reasoning. Moreover, good problem-solvers had higher chances of dropping out than graduating. Emerging tools such as AI and ChatGPT were found to enhance learning performance [23].</p>
    </sec>
    <sec id="sec-4">
      <title>3. Background</title>
      <p>Educational databases enabled the emergence of the Educational Data Mining (EDM) research field.
The iterative EDM process comprises six stages: data collection, initial data preparation, statistical
analysis, data preprocessing, data mining implementation, and result evaluation [18]. Lately, machine
learning techniques have been extensively used for predictive purposes. For a preliminary data
analysis, clustering techniques provide interpretability without employing costly labeled data. Clustering
methods partition the data into subsets of mutually similar data, dissimilar to data grouped in other
clusters. Decision trees induce a hierarchical sequence of decisions organized in a tree-like model to
facilitate explainable data classification.</p>
      <p>Social Network Analysis (SNA) studies interconnection patterns between individuals within a larger
system, e.g., an education-based one. At ETH Zurich, SNA was used to analyze the factors explaining
academic failure and success of engineering undergraduates [22]. In critical examination periods,
functional studying relationships strongly impact students’ success. Socially isolated students, on the
other hand, tend to score remarkably worse and are more likely to drop out of university regardless of
their SES and cognitive abilities.</p>
      <sec id="sec-4-1">
        <title>3.1. Clustering</title>
        <p>Let K &lt; N for the number of clusters K and the number of data patterns N. The patterns assigned to the same cluster are considered to be mutually similar, whereas the patterns from different clusters are regarded as mutually dissimilar [9]. In the case of numerical data, the goal of clustering is to find the best partition of a finite set of patterns X ⊂ ℝⁿ into subsets C_1, …, C_K called clusters (and represented by the centroids c⃗_1, …, c⃗_K) such that the value of the applied objective function, e.g.,</p>
        <p>E = ∑_{j=1}^{K} ∑_{x⃗ ∈ C_j} ||x⃗ − c⃗_j||² (1)</p>
        <p>is optimized. The K-means clustering algorithm [15] and Kohonen Self-Organizing Feature Maps [13] belong to the popular techniques used for this purpose.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.1.1. K-means Clustering</title>
        <p>Initially, we choose the desired number of clusters K, and the algorithm randomly assigns the data patterns to the clusters. Initial cluster centroids correspond to the mean of all data patterns from the same cluster. Afterwards, the K-means clustering algorithm [15] iteratively reassigns the patterns to clusters to minimize the objective function (1). As the centroids might not actually be present within the analyzed data, median values help to interpret the found cluster characteristics.</p>
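        <p>The iterative reassignment scheme above can be sketched in a few lines; the following is a minimal NumPy illustration (our own sketch, not the implementation used in this study), with a guard that re-seeds emptied clusters at a random pattern:</p>

```python
import numpy as np

def k_means(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # balanced random initial assignment of the N patterns to the K clusters
    labels = rng.permutation(np.arange(len(X)) % K)
    centroids = np.zeros((K, X.shape[1]))
    for _ in range(n_iter):
        for j in range(K):
            members = X[labels == j]
            # centroid = mean of the patterns in the cluster; re-seed if empty
            centroids[j] = members.mean(axis=0) if len(members) else X[rng.integers(len(X))]
        # squared Euclidean distance of every pattern to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignment stable, objective (1) no longer decreases
        labels = new_labels
    return labels, centroids
```

        <p>In practice, a library implementation such as scikit-learn's KMeans would typically be preferred over such a sketch.</p>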
        <sec id="sec-4-2-1">
          <title>3.1.2. Kohonen Self-Organizing Feature Maps</title>
          <p>Also, Kohonen Self-Organizing Feature Maps (SOMs) [13] can be used for preliminary analyses and visualizations of high-dimensional data. SOMs map high-dimensional data onto the (output) neurons arranged on a 2D topological grid that usually preserves the topography of the data in the original space. Given an input pattern x⃗, the SOM finds the neuron c with the closest weight vector w⃗_c. This neuron is called the winner. During training, the weights w⃗_c of the winner and its neighbors on the grid are updated at time t according to:</p>
          <p>w⃗_c(t + 1) = w⃗_c(t) + α(t) ⋅ h_c(t) ⋅ (x⃗ − w⃗_c(t)) (2)</p>
          <p>α(t) ∈ (0, 1) denotes the learning rate decreasing in time and h_c(t) is the lateral interaction function value, e.g., of the Mexican hat form, at time t.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>3.1.3. Silhouette Score</title>
          <p>The silhouette score helps estimate the adequate number of clusters. The quality of the underlying clustering is assessed by comparing the similarity d(i, j) of patterns i and j from the same cluster to the (dis)similarity of patterns from different clusters [21]. For each pattern i from cluster C_p; |C_p| &gt; 1, we evaluate its average similarity to all other patterns j ∈ C_p:</p>
          <p>a(i) = (1 / (|C_p| − 1)) ∑_{j ∈ C_p, j ≠ i} d(i, j). (3)</p>
          <p>Then, the minimum average (dis)similarity of i to all patterns j ∈ C_q; q ≠ p will be determined as:</p>
          <p>b(i) = min_{q ≠ p} (1 / |C_q|) ∑_{j ∈ C_q} d(i, j). (4)</p>
          <p>For |C_p| &gt; 1, the silhouette score s(i) corresponds to:</p>
          <p>s(i) = (b(i) − a(i)) / max(a(i), b(i)). (5)</p>
          <p>For |C_p| = 1, s(i) is defined to be 0. The mean over all s(i) determines the overall quality of the considered clustering. Its values range from -1 to 1, with higher values indicating a better clustering.</p>
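          <p>The quantities a(i), b(i), and s(i) can be transcribed directly from their definitions; below is a small NumPy sketch assuming Euclidean distances d(i, j) (illustrative only, not the implementation used for the experiments):</p>

```python
import numpy as np

def silhouette(X, labels):
    scores = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)        # d(i, j) for all j
        own = labels == labels[i]
        if own.sum() == 1:                          # singleton cluster: s(i) = 0
            scores.append(0.0)
            continue
        a = d[own].sum() / (own.sum() - 1)          # mean distance within own cluster
        b = min(d[labels == q].mean()               # smallest mean distance
                for q in set(labels) if q != labels[i])  # to any other cluster
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))                   # overall clustering quality
```

        <p>scikit-learn's silhouette_score offers an equivalent, optimized computation.</p>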
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>3.2. Decision Trees</title>
        <p>Decision trees are built in a greedy manner, starting at the root and choosing the most informative attribute A at each node to split the data S in a more class-uniform fashion. The C4.5 [20] and CART [7] algorithms belong to the most common techniques. C4.5 uses the entropy-based information gain ratio gainRT(S, A) for data splitting:</p>
        <p>gainRT(S, A) = (entropy(S) − entropy_A(S)) / (− ∑_{v=1}^{V_A} (|S_v| / |S|) ⋅ log2(|S_v| / |S|)) (6)</p>
        <p>with entropy_A(S) for attribute A having V_A different values and |S_v| standing for the number of patterns from S with value v of attribute A:</p>
        <p>entropy_A(S) = ∑_{v=1}^{V_A} (|S_v| / |S|) ⋅ entropy(S_v). (7)</p>
        <p>For the number of patterns from class c in S, |S_c|:</p>
        <p>entropy(S) = − ∑_{c=1}^{C} (|S_c| / |S|) ⋅ log2(|S_c| / |S|). (8)</p>
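        <p>For instance, the class entropy, the conditional entropy after a split on attribute A, and the resulting gain ratio can be sketched in stdlib Python as follows (a hedged illustration; the split-information denominator is assumed non-zero, i.e., the attribute takes at least two values):</p>

```python
from collections import Counter
from math import log2

def entropy(classes):
    # entropy of the class distribution in a list of class labels
    n = len(classes)
    return -sum((c / n) * log2(c / n) for c in Counter(classes).values())

def gain_ratio(attr_values, classes):
    # C4.5-style gain ratio for a categorical attribute (sketch only);
    # assumes the attribute has at least two distinct values
    n = len(classes)
    groups = {}
    for v, c in zip(attr_values, classes):
        groups.setdefault(v, []).append(c)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())     # entropy_A(S)
    split_info = -sum((len(g) / n) * log2(len(g) / n) for g in groups.values())
    return (entropy(classes) - cond) / split_info
```

        <p>A perfectly informative binary attribute on a balanced binary class yields a gain ratio of 1.</p>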
        <sec id="sec-4-3-1">
          <title>3.2.1. CART algorithm</title>
          <p>The CART algorithm [7] can handle both categorical and numerical attributes. During training, it raises a binary tree by iteratively splitting the data received at each node into two subsets based on the selected attribute and its value. CART stops when all the present data points belong to the same class.</p>
          <p>Let us consider a dataset S with N training patterns {(x⃗_i, y_i); 1 ≤ i ≤ N}. x⃗_i is the vector of attribute values and y_i is its class label indicating one of C possible classes. The dataset S is split into S_L and S_R by selecting the attribute A* and its splitting value v_{A*} to minimize</p>
          <p>Gini(S, A, v) = (|S_L| / |S|) ⋅ Gini(S_L) + (|S_R| / |S|) ⋅ Gini(S_R). (9)</p>
          <p>The Gini index for dataset S is then calculated as:</p>
          <p>Gini(S) = 1 − ∑_{c=1}^{C} (|S_c| / |S|)², (10)</p>
          <p>where S_c comprises the patterns from S belonging to class c. To avoid deep and overfitted trees, the maximum depth, the number of nodes, or the minimum number of data in a node can be constrained [27].</p>
          <p>The so-called cost-complexity post-pruning might follow. This procedure systematically removes entire subtrees and replaces them with leaf nodes carrying the previously found class labels. We will denote the error rate of tree T over the dataset S by ε(T, S). Further, let prune(T, t) define the tree obtained by pruning subtree t from T. The subtree t* that minimizes the cost-complexity α,</p>
          <p>α = (ε(prune(T, t), S) − ε(T, S)) / (|leaves(T)| − |leaves(prune(T, t))|), (11)</p>
          <p>is then chosen for removal. The validation set is used to evaluate the error rates of the pruned trees.</p>
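          <p>The Gini quantities driving CART's split selection can be illustrated directly; the following stdlib sketch (our own illustration, not the CART implementation itself) evaluates a numerical threshold split, simply ignoring an empty side:</p>

```python
from collections import Counter

def gini(classes):
    # Gini index of the class distribution in a list of class labels
    n = len(classes)
    return 1.0 - sum((c / n) ** 2 for c in Counter(classes).values())

def gini_split(values, classes, threshold):
    # weighted Gini of the binary split "value at most threshold" vs. "above"
    n = len(classes)
    left = [c for v, c in zip(values, classes) if threshold >= v]
    right = [c for v, c in zip(values, classes) if v > threshold]
    parts = [p for p in (left, right) if p]        # ignore an empty side
    return sum(len(p) / n * gini(p) for p in parts)
```

          <p>A split separating the classes perfectly drives the weighted Gini down to 0, which is why CART picks the attribute-value pair minimizing it.</p>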
        </sec>
        <sec id="sec-4-3-2">
          <title>3.2.2. Ensemble learning</title>
          <p>Decision tree classifiers can be combined through ensemble learning to improve the overall performance for imbalanced data. Popular generalizations include adaptive boosting (AdaBoost) to lower the overall bias and Random Forests to reduce the variance of averaged predictions [1]. In addition, metrics like precision, recall, and the F1-score can be used to better reflect the actual model performance.</p>
          <p>AdaBoost [5], [10] assigns a weight to each pattern based on its difficulty for classification. At each iteration, an additional classifier is built with the weights updated according to the result of the previous classification. The final classification is determined as a weighted output of all previous classifiers, giving a higher weight to the more accurate ones. Random Forests [8] inject more variety into the trees that may be used in parallel by randomly limiting the attribute choices the trees can make at each node. Trained trees do not have to be pruned; the majority vote determines the final classification output.</p>
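          <p>As an illustration of the boosting scheme, the following sketch trains simple threshold stumps on a one-dimensional feature with labels in {−1, +1}; the stump learner and all names are our own simplifications, not the formulation of [5], [10]:</p>

```python
import numpy as np

def stump(x, w, y):
    """Best weighted threshold stump on a 1-D feature; returns ((thr, sign), err)."""
    best, best_err = None, np.inf
    for thr in x:
        for sign in (1, -1):
            pred = np.where(x > thr, sign, -sign)
            err = w[pred != y].sum()               # weighted training error
            if best_err > err:
                best, best_err = (thr, sign), err
    return best, best_err

def adaboost(x, y, rounds=10):
    w = np.full(len(x), 1 / len(x))                # uniform initial pattern weights
    ensemble = []
    for _ in range(rounds):
        (thr, sign), err = stump(x, w, y)
        err = max(err, 1e-10)                      # guard against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)      # more accurate stump, bigger vote
        pred = np.where(x > thr, sign, -sign)
        w *= np.exp(-alpha * y * pred)             # up-weight misclassified patterns
        w /= w.sum()
        ensemble.append((alpha, thr, sign))
    return ensemble

def predict(ensemble, x):
    votes = sum(a * np.where(x > t, s, -s) for a, t, s in ensemble)
    return np.sign(votes)                          # weighted vote of all stumps
```

          <p>In practice, library implementations such as scikit-learn's AdaBoostClassifier and RandomForestClassifier would be used instead of such a sketch.</p>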
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>3.3. Social Network Analysis</title>
        <p>A social network is given by the graph G = (V, E). Its vertices represent actors and its edges correspond to the relationships between the actors. The importance of an actor v can be assessed by means of centrality measures [4].</p>
        <p>• The degree centrality of actor v is defined as C_D(v) = dg(v)/(|V| − 1). High C_D(v) values characterize influential actors with a direct relationship to others.</p>
        <p>• The closeness centrality measures the efficiency of accessing other actors using the inverse of the average shortest path distance d(v, u) from v to all other actors u:</p>
        <p>C_C(v) = ( ∑_{u ∈ V} d(v, u) / (|V| − 1) )^{−1}. (12)</p>
        <p>• Let σ_{s,t} be the total number of shortest paths between the actors s and t. Some of these paths go through actor v; let their number be σ_{s,t}(v). The betweenness centrality of v is the sum of the ratios σ_{s,t}(v)/σ_{s,t} for all available pairs of s and t, normalized over (|V| − 1)(|V| − 2)/2:</p>
        <p>C_B(v) = (2 / ((|V| − 1)(|V| − 2))) ∑_{s &lt; t} σ_{s,t}(v) / σ_{s,t}. (13)</p>
        <p>Actors with a high C_B(v) exhibit more control over the network by interconnecting its parts.</p>
        <p>• The eigenvector centrality of actor v reflects the importance of all its neighbors. With the biggest eigenvalue of the adjacency matrix denoted as λ:</p>
        <p>C_E(v) = λ^{−1} ∑_{u ∈ ne(v)} C_E(u). (14)</p>
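        <p>The degree and closeness centralities, for instance, follow directly from their definitions; below is a stdlib-only sketch on a toy undirected graph stored as an adjacency dict (betweenness and eigenvector centrality are omitted for brevity; libraries such as NetworkX provide all four):</p>

```python
from collections import deque

def bfs_distances(adj, src):
    # shortest-path distances (in hops) from src to every reachable vertex
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def degree_centrality(adj, v):
    return len(adj[v]) / (len(adj) - 1)           # C_D(v) = dg(v) / (|V| - 1)

def closeness_centrality(adj, v):
    dist = bfs_distances(adj, v)
    total = sum(d for u, d in dist.items() if u != v)
    return (len(adj) - 1) / total                 # inverse of the average distance
```

        <p>On the path a–b–c, for example, the middle actor b attains the maximum value 1.0 for both measures.</p>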
        <sec id="sec-4-4-1">
          <title>3.3.1. Community Detection</title>
          <p>Community detection identifies the subsets of actors that are more densely tied together than to the
rest of the network. The so-called network modularity measures the concentration of edges within
communities compared to the expected concentration of randomly distributed edges. High modularity
values indicate optimum partitioning.</p>
          <p>Definition 1 (Network Modularity). Let G = (V, E) be a graph and c : V → ℕ be a community-assigning function. The network modularity Q of G with the communities given by c is defined as</p>
          <p>Q = (1 / (2|E|)) ∑_{u,v ∈ V} ( A_{u,v} − dg(u) ⋅ dg(v) / (2|E|) ) ⋅ δ_{c(u),c(v)}. (15)</p>
          <p>δ denotes the Kronecker delta (δ_{i,j} = 1 if i = j, otherwise δ_{i,j} = 0), and A_{u,v} is the (u, v) entry in the adjacency matrix, i.e., the number of edges between u and v.</p>
          <p>Various methods exist for community detection, e.g., the Girvan-Newman algorithm [11], the Kernighan-Lin algorithm [12], or the Leiden algorithm [24]. Further, we will use the Louvain method [6], capable of quickly detecting communities of varying sizes.</p>
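          <p>Definition 1 can be evaluated directly for a fixed community assignment; the following small sketch works on an undirected edge list (illustrative only; real analyses would use a library implementation such as NetworkX's modularity and louvain_communities):</p>

```python
def modularity(edges, community):
    # Q from Definition 1 for an undirected graph given as a list of edges
    # and a dict mapping each vertex to its community label
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    nodes = list(deg)
    A = {(u, v): 0 for u in nodes for v in nodes}
    for u, v in edges:                  # undirected: count the edge both ways
        A[(u, v)] += 1
        A[(v, u)] += 1
    q = 0.0
    for u in nodes:
        for v in nodes:
            if community[u] == community[v]:      # Kronecker delta term
                q += A[(u, v)] - deg[u] * deg[v] / (2 * m)
    return q / (2 * m)
```

          <p>Two disconnected triangles partitioned into their natural communities, for instance, attain Q = 0.5.</p>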
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Data Acquisition</title>
      <sec id="sec-5-1">
        <title>4.1. Data on US Four-Year Colleges</title>
        <p>The dataset on graduation data of US four-year colleges was acquired from the National Center for Education Statistics https://nces.ed.gov/. We gained the required university IDs from https://secondnature.org/wp-content/uploads/Workbookv7-Sheet1.pdf. Along with the numbers of enrolled students and the graduation rates for 7 different student ethnic groups (‘Asian’, ‘Black/African American’, ‘Hispanic/Latino’, ‘Race and Ethnicity Unknown’, ‘Two or More Races’, ‘US-Nonresidents’, and ‘White’), the total numbers of men and women and the tuition fees were obtained there.</p>
        <p>We enhanced the dataset with the median household income (MHI) of the county hosting the respective university. We scraped this information from https://www.countyhealthrankings.org. To avoid single-year fluctuations, the data was collected over three years, 2020-2022, resulting in 3,975 data patterns, each comprising 11 numeric attributes.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. UK Data on Student Success/Failure</title>
        <p>The statistics we will use to analyze the study behavior linked to students’ success/failure stem from the UK Open University (OU). OU courses are taught mostly online (off-campus) [14]. The Open University Learning Analytics Dataset (OULAD) contains evidence about students’ behavior while studying and is available at https://analyse.kmi.open.ac.uk/open_dataset. The file includes information on seven courses involving 32,593 students from 2013 and 2014. The collected data combines students’ personal details with their exam scores and records of how they interacted with the Virtual Learning Environment (VLE), given by the summaries of their clicks. After preprocessing, the available data comprised 35 attributes over 21,663 patterns.</p>
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Data on the UK Parliament Structure</title>
        <p>The UK Parliament consists of the House of Lords (787 appointed members as of April 2024) and the
House of Commons with 650 Members of Parliament (MPs). MPs are elected in general elections, and
the party or coalition with the majority in the House of Commons forms the government. Usually, the
leader of that party or coalition becomes the Prime Minister. In this study, we will analyze the House of
Commons structure based on its members’ previous education.</p>
        <p>For the data analysis, the information on the MPs, such as Name, Alma Mater, and Party Membership, was scraped from the site https://www.parallelparliament.co.uk/MPs in April 2024. We used the MPs’ Wikipedia pages to find information on the universities they attended. We enhanced the data by adding additional attributes (‘Party Membership’, ‘Age’, and ‘Gender’). Afterwards, we cleaned the data to avoid possible inconsistencies. In the formed social network, each MP is represented by a vertex. An edge connects any two MPs who graduated from the same university.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Supporting Experiments</title>
      <sec id="sec-6-1">
        <title>5.1. US Graduation Rates Analysis</title>
        <p>This experiment aimed to understand the character of four-year HE institutions from the perspective of the achieved graduation rates. For the initial data analysis, a SOM with a hexagonal mesh of 35 x 35 neurons was trained for 500,000 iterations using the scraped data on US four-year colleges. Afterwards, the SOM’s weight vectors were clustered by the K-means algorithm. The silhouette score indicated two viable choices for K, namely K = 2 and K = 6, with the scores for both slightly below 0.36. Opting for higher variability, we grouped the universities into six distinctive clusters.</p>
        <p>Cluster ID 0 comprises high-performing institutions with reasonably many students diversified across various ethnicities (median student population size of ca. 720). Their tuition fees are the highest (median of ca. $55,000 a year). The median household income of the counties hosting these universities is also high (ca. $74,000). Institutions from this cluster include Harvard, Princeton, and Yale universities, which consistently exhibit high median graduation rates across all ethnic groups (over 80%). The institutions from cluster ID 4 are notably more affordable (with median tuition fees of ca. $24,000), administer significantly more students (median student population size of ca. 3,400), yet still attain above-average graduation rates (between 55% and 70%) across the respective ethnicity groups. However, the hosting counties’ median household income is lower (ca. $65,000) when compared to cluster ID 0.</p>
        <p>Colleges from cluster ID 1 do not demand high tuition fees (median at roughly $17,000) but achieve only low graduation rates (between 15% and 40% across the ethnicities). While they serve a relatively small student body (median of around 150), ‘Asian’, ‘US-Nonresidents’, and ‘Two or More Races’ ethnicity representatives do not seem to prefer enrolling in these universities. The median household income of the hosting counties is comparable to cluster ID 4 (ca. $64,000). Clusters ID 2, 3, and 5 comprise HE institutions with a smaller student body and varying graduation rates across ethnicity groups (moderate to low performance for cluster ID 2; on par with cluster ID 4 for cluster ID 3; and above-average graduation rates for ‘Asian’ and ‘White’ for cluster ID 5, with ‘US-Nonresidents’ rarely attending). Median county household incomes and tuition fees also differ.</p>
      </sec>
      <sec id="sec-6-2">
        <title>5.2. Prediction of student success/failure in a virtual learning environment</title>
        <p>While many papers have analyzed the OULAD dataset, the produced models still lack actionable interpretability. Therefore, this experiment focused on finding an accurate decision tree-based model for students’ success/failure assessment. Random Forests performed the best with an accuracy of 0.86 and an F1-score of 0.86. A recent paper [2] reports comparable classification results on the same dataset (accuracy of 0.83 for the CART algorithm vs. 0.86 for Random Forests) - see Table 1.</p>
        <p>To facilitate explainability, we also provide a pruned CART decision tree (Figure 2) elucidating the attributes that contribute most to the student’s success. Based on our findings, the essential attributes for predicting student success comprise the clicks recorded on the homepage of the VLE, the number of times students interacted with the quizzes in the VLE, the domain of the course (STEM or social sciences, where students tend to succeed more), and their scores on assessments throughout the semester.</p>
        <p>We also tested the option to predict students’ success or failure using only demographic features comprising the deprivation index (IMD), age, disability, studied credits, highest education, and the final result (PASS or FAIL). However, the formed trees performed much worse - see Table 1. The accuracy of the best C4.5 tree reached just 63%, indicating that demographic features alone are insufficient for accurate performance predictions. Out of the demographic features, the highest level of students’ education best indicated their success or failure.</p>
      </sec>
      <sec id="sec-6-3">
        <title>5.3. UK Parliament Analysis</title>
        <p>In this experiment, our objective was to identify the most influential members of the parliament and to detect communities of MPs based on their educational background. Each MP could have attended multiple universities. Each vertex in the graph thus represents an MP, and two vertices are interconnected by an edge if both MPs attended the same university – see Figure 3.</p>
        <sec id="sec-6-3-1">
          <title>5.3.1. UK Parliament - the Graph Structure</title>
          <p>Table 2 lists the centralities of 5 MPs selected for their consistently high scores. MPs with high centrality
values typically graduated from two or more universities shared by other MPs, are well-connected to
other influential MPs, and play a central role in the inspected network.</p>
          <p>John Glen interlinks the graduates of Oxford and Cambridge. He also studied at King’s College London and was named Paymaster General and Minister for the Cabinet Office in 2023. Matt Hancock and Tanmanjeet Singh Dhesi also interconnect the graduates from Oxford with those from Cambridge, see Figure 3. Matt Hancock served as the Secretary of State for Health and Social Care from 2018 to 2021. Tanmanjeet Singh Dhesi further studied at University College London. Ed Davey and Keir Starmer exhibit high betweenness centrality. Keir Starmer graduated from the University of Leeds and from Oxford, thus serving as a bridge in the MP social network. Keir Starmer became the leader of the Labour Party in 2020, won the 2024 general election, and became Prime Minister of the UK.</p>
        </sec>
        <sec id="sec-6-3-2">
          <title>5.3.2. UK Parliament - Community Detection</title>
          <p>We used the Louvain algorithm for community detection in the investigated graph of UK MPs. Twenty-nine communities were found, yet for further analysis, we considered only those communities with more than 15 MPs. Figure 4 lists those 7 communities comprising a total of 548 MPs (based on the data from April 2024). The MPs most frequently attended Oxford and Cambridge universities. Explicitly, communities 1 and 2 emerged around these universities. They are predominantly male (around 70%), and approximately 70% of their MPs belong to the Conservative Party.</p>
          <p>MPs from communities 4 and 7 are younger than those from other communities. Communities 6 and 7 appear more gender-equal, and communities 3, 4, 5, and 6 include more members from the Labour Party than the first two communities (significantly more than 30%). Scottish National Party members dominate community 7; they mostly attended the University of Glasgow and the University of Stirling.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusions</title>
      <p>This work aimed to address a pressing issue in contemporary HE, namely, the worldwide low graduation rates. We approached this task with clustering, decision trees, and social network analysis methods. The analysis of four-year US colleges indicates that universities consistently achieving high graduation rates are characterized by a student body that is of a reasonable size and ethnically diverse. These schools are located in counties with a higher median household income, but also demand high tuition fees.</p>
      <p>Examining the OULAD student data highlights the importance of a sustained learning effort and frequent use of study materials for succeeding in HE. The education-based UK Parliament structure analysis was elaborated using the 2019 election results. In July 2024, the structure of the House of Commons changed significantly. The Labour Party won 404 seats (202 in 2019). The Conservative Party secured 121 seats (365 in 2019). The Liberal Democrats increased their number of seats to 72 (11 in 2019). The Scottish National Party obtained nine seats (48 in 2019) [16]. Still, the centrality measures computed for the past parliament clearly identified influential MPs of the newly elected parliament. Keir Starmer became Prime Minister, and the Liberal Democrats of Ed Davey won 61 more seats than in 2019.</p>
      <p>Overall, the obtained results suggest the significance of a supportive environment (both financially and socially) that encourages the building of functional relationships during students’ formative years and promotes mutual collaboration and a frequent exchange of ideas. Talent and diligence are equally essential to succeed in HE. Further research should also contemplate other factors that might account for students’ academic success, like the criteria applied during the admission process, the candidates’ personality traits, talents, and motivations, or the character of the offered classes and educational experience. When analyzing professional alliances, we plan to utilize large language models (LLMs) and the recently introduced graph neural network models.</p>
      <p>Figure 4 (recovered excerpt): Communities detected in the UK MPs’ graph.</p>
      <p>Community 1 (84 members, 59 male, 25 female): Approximately 70% of the members belong to the Conservative Party. 83 out of 84 members attended Oxford, while some of them also attended the London School of Economics. Main universities: Oxford (83), London School of Economics (5). The average age in Community 1 is 53.</p>
      <p>Community 2 (59 members, 40 male, 19 female): Approximately 70% of the members belong to the Conservative Party, and around 30% are from the Labour Party. Main universities: Cambridge (56), Oxford (4). The average age in Community 2 is 55.</p>
      <p>Community 3 (82 members, 53 male, 29 female): Most of the members belong to either the Conservative or the Labour Party. Main universities: University of London (29), Durham University (9), University of Sussex (10), University of Birmingham (10). The average age in Community 3 is 53.</p>
      <p>Community 4 (52 members, 33 male, 19 female): Approximately 36% of the members are from the Labour Party and around 50% from the Conservative Party; more members are from the Labour Party compared to the first and second communities. Main universities: King's College London (13), University of Exeter (11), University College London (9), Aberystwyth University (9). The average age in Community 4 is 51.</p>
      <p>Community 5 (18 members, 11 male, 7 female): This community has only members of the Conservative Party and the Labour Party; it is more balanced in terms of gender. Main universities: London School of Economics (16), Brunel University (3).</p>
      <p>Community 6 (71 members, 39 male, 32 female).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>adaptive boosting for augmenting performance of machine learning models”, PeerJ Comput Sci., vol. 8, 2022, 29 p.
[3] E. Alyahyan and D. Düştegör, “Predicting academic success in higher education: literature review and best practices”, Int. J. of Educational Technology in Higher Education, vol. 17, no. 3, 2020, 21 p.
[4] A.-L. Barabási and M. Pósfai, “Network Science”, Cambridge University Press, 2016.
[5] P. Beja-Battis, “Overview of AdaBoost: Reconciling its views to better understand its dynamics”, arXiv:2310.18323v1, 2023, 39 p.
[6] V. D. Blondel, J.-L. Guillaume, R. Lambiotte and E. Lefebvre, “Fast unfolding of communities in large networks”, J. of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, 2008, 12 p.
[7] L. Breiman, J.H. Friedman, R.A. Olshen and Ch.J. Stone, “Classification and Regression Trees”,</p>
      <p>Taylor &amp; Francis, 1984.
[8] L. Breiman, “Random Forests”, Statistics Department, University of California, Berkeley, 2001, 33 p.
[9] B.S. Duran and P.L. Odell, “Cluster Analysis: A Survey”, Springer, 2013.
[10] Y. Freund and R. Schapire, “Experiments with a new boosting algorithm”, Machine Learning: Proc.</p>
      <p>of the Thirteenth International Conf., 1996, pp. 148-156.
[11] M. Girvan and M.E.J. Newman, “Community structure in social and biological networks”, PNAS,
vol. 99, no. 12, 2002, pp. 7821-7826.
[12] B.W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs”, The Bell
System Technical Journal, vol. 49, no. 2, 1970, pp. 291-307.
[13] T. Kohonen, “Self-Organized Formation of Topologically Correct Feature Maps”, Biological
Cybernetics, vol. 43, no.1, 1982, pp. 59-69.
[14] J. Kuzilek, M. Hlosta and Z. Zdrahal, “Open university learning analytics dataset”, Sci Data, vol. 4,
2017.
[15] J. MacQueen, “Some methods for classification and analysis of multivariate observations”, Proc. of
the fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281-297.
[16] “Membership of the UK Parliament”,
https://commonslibrary.parliament.uk/researchbriefings/sn01250/, accessed 2024-07-15.
[17] G. Molnár and Á. Kocsis, “Cognitive and non-cognitive predictors of academic success in higher
education: a large-scale longitudinal study”, Stud. in Higher Educ., vol. 49, no. 9, 2024, pp.
1610-1624.
[18] O. Moscoso-Zea, A. Sampedro and S. Lujan-Mora, “Datawarehouse design for educational data
mining”, Proc. of ITHET, 2016, pp. 1-6.
[19] A.-S. Nyström, C. Jackson and M.S. Karlsson, “What counts as success? Constructions of
achievement in prestigious higher education programmes”, Research Papers in Education, vol. 34, no. 4,
2019, pp. 465-482.
[20] J.R. Quinlan, “C4.5: Programs for Machine Learning”, Morgan Kaufmann, 1993.
[21] P.J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis”, Journal of Computational and Applied Mathematics, vol. 20, 1987, pp. 53-65.</p>
      <p>
[22] Ch. Stadtfeld, A. Vörös, T. Elmer, Z. Boda and I.J. Raabe, “Integration in emerging social networks
explains academic failure and success”, PNAS, vol. 116, no. 3, 2019, pp. 792-797.
[23] T.G. Tareke, T.Z. Oo and K. Jozsa, “Bridging theoretical gaps to improve students' academic
success in higher education in the digital era: A systematic literature review”, Int. J. of Educational
Research Open, vol. 9, article no. 100510, 2025, 12 p.
[24] V.A. Traag, L. Waltman and N.J. van Eck, “From Louvain to Leiden: guaranteeing well-connected
communities”, Scientific Reports, vol. 9, article no. 5233, 2019, 12 p.
[25] M. Weatherton and E.E. Schussler, “Success for All? A Call to Re-examine How Student Success Is Defined in Higher Education”, CBE – Life Sciences Education, vol. 20:es3, 2021, pp. 1-13.</p>
      <p>
[26] L.N. Wood and Y.A. Breyer, “Success in Higher Education”, Springer Nature, 2017.
[27] X. Wu, V. Kumar, J.R. Quinlan et al., “Top 10 algorithms in data mining”, Knowl Inf Syst, vol. 14,
2008, pp. 1–37.
[28] Czechia in the data, “Šance na dostudování českých VŠ” [Chances of graduating from Czech universities],
https://www.ceskovdatech.cz/graphs/vs2.php, accessed 2025-06-29.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          , “
          <article-title>Data Mining: The Textbook</article-title>
          ”, Springer,
          <year>2015</year>
          . [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Adnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.A.S.</given-names>
            <surname>Alarood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.I.</given-names>
            <surname>Uddin</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.U.</given-names>
            <surname>Rehman</surname>
          </string-name>
          , “
          <article-title>Utilizing grid search cross-validation with</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>