<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SPEET: Visual Data Analysis of Engineering Students Performance From Academic Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>M. Domínguez</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R. Vilanova</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M.A. Prada</string-name>
          <email>ma.prada@unileon.es</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>J. Vicario</string-name>
          <email>Jose.Vicario@uab.cat</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. Barbu</string-name>
          <email>Marian.Barbu@ugal.ro</email>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. J. Varanda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. Podpora</string-name>
          <email>michal.podpora@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>U. Spagnolini</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>P. Alves</string-name>
          <email>palves@ipb.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Paganoni</string-name>
          <email>anna.paganoni@polimi.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto Politécnico de Bragança</institution>
          ,
          <country country="PT">PORTUGAL</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Opole University of Technology</institution>
          ,
          <country country="PL">POLAND</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Politecnico di Milano</institution>
          ,
          <country country="IT">ITALY</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Universidad de León</institution>
          ,
          <country country="ES">SPAIN</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Universitat Autònoma de Barcelona</institution>
          ,
          <country country="ES">SPAIN</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>University Dunărea de Jos</institution>
          ,
          <addr-line>Galați</addr-line>
          ,
          <country country="RO">ROMANIA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>This paper presents the steps taken to design and develop an IT Tool for Visual Data Analysis within the SPEET (Student Profile for Enhancing Engineering Tutoring) ERASMUS+ project. The global goals of the project are to provide insight into student behaviours, to identify patterns and relevant factors of academic success, to facilitate the discovery and understanding of profiles of engineering students, and to analyse the differences across European institutions. Those goals are partly covered by the visualisations that the proposed tool comprises. Specifically, the aim is to provide support to the staff involved in tutoring, facilitating the exploratory analysis that might lead them to discover and understand student profiles. For that purpose, visual interaction and two main approaches are used, one based on the joint display of interconnected visualisations and the other focused on dimensionality reduction. The tool is validated on a data set that includes variables present in a typical student record.</p>
      </abstract>
      <kwd-group>
        <kwd>Visual Analytics</kwd>
        <kwd>Academic Data</kwd>
        <kwd>Dimensionality Reduction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The vast amount of data collected by higher education institutions and the
growing availability of analytic tools make it increasingly interesting to apply data
analysis in order to support educational or managerial goals. The SPEET
(Student Profile for Enhancing Engineering Tutoring) project aims to determine and
categorise the different profiles of engineering students across Europe, in order
to improve tutoring actions so that students can achieve better
results and complete their degree successfully [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For that purpose, it is proposed
to perform an analysis of student record data, obtained from the academic offices
of the Engineering Schools/Faculties.
      </p>
      <p>Copyright © 2018 for this paper by its authors. Copying permitted for private and academic purposes.</p>
      <p>
        The application of machine learning techniques to provide a somewhat
automatic analysis of academic data is a common approach in the fields of
Educational Data Mining (EDM) and Learning Analytics (LA). Nevertheless, it is also
often interesting to involve the human analyst in the task of knowledge
discovery [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Indeed, visual analysis approaches have been used to analyze
multidimensional data from on-line educational environments, such as performance
in exams or assignments, behaviour patterns, access to resources, tutor-student
interaction, etc. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Visual analytics, understood as a blend of information visualisation and
advanced computational methods, is useful for the analysis and understanding of
complex processes, especially when data are nonhomogeneous or noisy [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The
reason is that exploiting the human ability to detect structure
in complex visual presentations, together with human flexibility and prior
knowledge, makes it easier to understand the data, to
identify their nature, and to create hypotheses [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For that purpose, visual analytics
uses several strategies, such as pre-attentive processing and visual recall, that
reduce cognitive load [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. But a key feature is the interactive manipulation of
resources, which is used to drive a semi-automated analytical process that enables
a dialogue between the human and the tool. An example of the application of
interactive visualization in the field of learning analytics can be found in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        During this human-in-the-loop process, analysts iteratively update their
understanding of data, to meet the evidence discovered through exploration [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
The joint display of several interconnected visualisations is known to be
interesting for visual analytics [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. On the other hand, dimensionality reduction [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
is an unsupervised learning approach that is commonly used for multivariate
data visualisation. Since it aims at representing high-dimensional data in
low-dimensional spaces, while preserving most of its structure, the resulting
projection can be visualised as a scatterplot. By means of the spatialisation principle,
which assumes that closeness in the representation can be assimilated to high
similarity in the original space, an intuitive recognition of salient patterns in
that scatterplot is possible [
        <xref ref-type="bibr" rid="ref11 ref9">9, 11</xref>
        ].
      </p>
      <p>
        This paper presents the conceptualisation of a practical tool for visual data
analysis within the SPEET (Student Profile for Enhancing Engineering Tutoring,
www.speet-project.com) ERASMUS+ project. The goals are to provide
support to the staff involved in tutoring, facilitating the exploratory analysis of
performance-related student data to discover and understand student profiles.
For that purpose, the tool is based on the combination of visualisation,
interaction and machine learning techniques. For the implementation details and
validation of the tool, a data set has been proposed. It only includes variables
present in a typical student record, such as details of the student (for
example, age, geographical information, previous studies and family
background), school, degree, courses undertaken, scores, etc. Although the scope of
this data set is limited, similar data structures have recently been used in
developments oriented to the prediction of performance and detection of drop-outs or
students at risk [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>The paper discusses the suitability of visual analytics for the exploration of
academic data. For that purpose, it presents, in section 2, the background of this
endeavour. Section 3 describes the approaches proposed for the analysis of the
available data, whereas section 4 outlines the key elements of the implementation.
Finally, the last section discusses the main conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <sec id="sec-2-1">
        <title>The SPEET Project</title>
        <p>SPEET is a European project funded under the ERASMUS+ programme as a
Strategic Partnership for higher education. The partnership includes universities
from Spain, Portugal, Italy, Poland and Romania:
          <list list-type="bullet">
            <list-item><p>Spain: Universitat Autònoma de Barcelona (UAB) and Universidad de León (ULEON)</p></list-item>
            <list-item><p>Romania: University Dunărea de Jos, Galați (GALATI)</p></list-item>
            <list-item><p>Portugal: Instituto Politécnico de Bragança (IPB)</p></list-item>
            <list-item><p>Poland: Opole University of Technology (OPOLE)</p></list-item>
            <list-item><p>Italy: Politecnico di Milano (POLIMI)</p></list-item>
          </list>
        </p>
        <p>The final aim of this project is to determine and categorise the different
profiles of engineering students across Europe. The main rationale behind this
proposal is the observation that students' performance seems to follow some
classification according to their behaviour while conducting their studies. A further
observation is that this knowledge would be a valuable help for tutors to better
know their students and improve counselling actions. On the basis of this
scenario, an opportunity emerges from the synergy between (a) the great amount of
academic data currently available at the academic offices of faculties and schools,
and (b) the growing availability of data science approaches to analyse data and
to extract knowledge.</p>
        <p>Therefore, the main objective of this project is to apply data analysis
algorithms to process these data in order to identify and to extract information about
student profiles. In this scope, the considered profiles are, e.g., students that will
finish their degree on time, students that are blocked on a certain set of subjects,
students that will leave the degree early, etc. Another characteristic of the SPEET
project is its transnational nature, aimed at identifying common characteristics of
engineering students coming from different EU institutions. For that purpose, it
is proposed to conduct an analysis both at country and at transnational level.
The comparison of results across EU countries improves the understanding of
similarities and differences among countries. If discrepancies arise, a more
detailed country-wise analysis can be carried out to expose the details and the
potential causes behind those differences.</p>
        <p>The proposed goals of the tool described in this paper are aligned with those
of the project, i.e., to provide insight into student behaviours, to identify
patterns and relevant factors of academic success, to facilitate the discovery and
understanding of profiles of engineering students, and to analyse the differences
across European institutions.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Data Set</title>
        <p>Due to the transnational nature of the SPEET project, it is necessary to choose
appropriate variables and representation to cover the differences in course
organisation at a country level. Additionally, the dataset must include students'
information while complying with privacy regulations of the European Union
(e.g., the General Data Protection Regulation (GDPR) (EU) 2016/679).</p>
        <p>
          For that reason, the proposed dataset uses variables obtained from the
administrative records of the students, such as anonymised indicators about the
socio-economic and educational environment, courses undertaken, and previous
or current academic performance. It is well known that this information only
covers in part the external factors of academic success [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. But the hypothesis
is that these indicators are enough to identify, at least in a first instance, the
students at risk. Furthermore, it is possible to augment the data set with other
potentially useful additional data sources.
        </p>
        <p>Figure 1 shows the initial, minimum core data set proposed to perform
the analysis. From an interpretation perspective, variables can be defined as
explanatory or performance-related. Among the variables that the core data set
comprises, there are numerical (discrete and continuous) and categorical data
(and, in particular, spatial data).</p>
        <p>[Figure 1: core data set. The extracted text retains only the table name "Students" and its primary key StudentID.]</p>
        <p>This section describes two methods oriented to achieve the aforementioned goals,
which can materialise in a set of questions that are interesting to address:
          <list list-type="order">
            <list-item><p>Is it possible to establish hypotheses about the relation between explanatory variables and academic performance?</p></list-item>
            <list-item><p>Can we detect clear trends in the score distribution grouped with respect to another variable?</p></list-item>
            <list-item><p>Are there clear differences among the different institutions/degrees?</p></list-item>
            <list-item><p>Can we visually verify the findings obtained by the automatic analysis?</p></list-item>
            <list-item><p>Can we distinguish a clear data structure and is this structure explainable in relation to a certain variable?</p></list-item>
          </list>
        </p>
        <p>One method relies strongly on interaction, whereas the other one is an
example of the natural integration of machine learning in the visual analytics process.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Data hypercube for coordinated views of data</title>
        <p>
          This approach is based on connecting and coordinating visualisations,
in order to provide a global view of the data set that facilitates the exploration of
correlations between variables. For that purpose, the data set should be viewed
as a multi-dimensional array where each variable is a dimension, so that it can
be interpreted as a data (hyper-)cube [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] (see Fig. 2). This abstraction resembles
that of On-Line Analytical Processing (OLAP) in the field of business
intelligence, which enables data analysis by means of four basic operations: roll-up
(aggregation), drill-down (disaggregation), slicing (selection in one dimension)
and dicing (selection in more than one dimension).
        </p>
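        <p>The four operations can be illustrated on a toy data cube. The following is a minimal sketch in plain Python, not the tool's implementation: the records, the field names (institution, degree, year, score) and the helper functions dice and roll_up are hypothetical and were chosen only for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

# Hypothetical student-subject records; the field names are illustrative only.
records = [
    {"institution": "A", "degree": "CS", "year": 1, "score": 6.0},
    {"institution": "A", "degree": "CS", "year": 2, "score": 8.0},
    {"institution": "A", "degree": "EE", "year": 1, "score": 5.0},
    {"institution": "B", "degree": "CS", "year": 1, "score": 7.0},
    {"institution": "B", "degree": "EE", "year": 2, "score": 9.0},
]

def dice(rows, **selected):
    """Keep rows whose value lies in the allowed set of every named dimension.
    With one named dimension this is a slice; with several it is a dice."""
    return [r for r in rows
            if all(r[d] in allowed for d, allowed in selected.items())]

def roll_up(rows, dims, measure="score"):
    """Average the measure over every dimension that is not listed in dims."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[d] for d in dims)].append(r[measure])
    return {key: sum(v) / len(v) for key, v in groups.items()}

# Roll-up: a coarse view with one mean score per institution.
by_institution = roll_up(records, ["institution"])
# Drill-down: the same measure, disaggregated by degree as well.
by_institution_degree = roll_up(records, ["institution", "degree"])
# Dice: selection on two dimensions at once.
first_year_cs = dice(records, year={1}, degree={"CS"})
```
]]></preformat>
        <p>In this sketch, drill-down is simply a roll-up over a longer list of dimensions, which reflects the fact that the two operations move along the same aggregation hierarchy in opposite directions.</p>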
        <p>
          With this structure, it is possible to build a visualisation based on the joint
and simultaneous view of coordinated histograms or bar charts in the same
dashboard. The usefulness of this view is increased if users are allowed to filter
one or several factors and those filters trigger a fluid update of the distributions
of the other charts. This way, users can explore the distributions of the variables
and establish links between them [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Furthermore, since the visualisation works
with the original data instead of a model based on certain assumptions, a higher
reliability of the insight acquired with this approach is expected.
        </p>
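        <p>The coordination between charts can be sketched as follows. This is an illustrative example under stated assumptions, not the tool's code: the records, the variable names and the function coordinated_counts are hypothetical. The sketch also assumes a convention that is common in coordinated-view libraries, namely that a chart ignores the filter placed on its own variable, so the user still sees the full range from which the selection was made.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# Hypothetical records; variable names are illustrative, not the project's schema.
records = [
    {"sex": "F", "course_type": "mandatory", "score": 7.5},
    {"sex": "M", "course_type": "mandatory", "score": 4.0},
    {"sex": "F", "course_type": "elective", "score": 9.0},
    {"sex": "M", "course_type": "elective", "score": 6.5},
    {"sex": "F", "course_type": "mandatory", "score": 5.5},
]

def score_bin(score, width=2.5):
    """Map a score in [0, 10] to the lower edge of its bin."""
    return min(int(score // width) * width, 10 - width)

# Each chart is defined by a function that maps a record to its group or bin.
charts = {
    "sex": lambda r: r["sex"],
    "course_type": lambda r: r["course_type"],
    "score": lambda r: score_bin(r["score"]),
}

def coordinated_counts(rows, filters):
    """Recompute the counts of every chart under the filters of the other charts."""
    result = {}
    for name, key in charts.items():
        others = {n: allowed for n, allowed in filters.items() if n != name}
        visible = [r for r in rows
                   if all(charts[n](r) in allowed for n, allowed in others.items())]
        result[name] = Counter(key(r) for r in visible)
    return result

unfiltered = coordinated_counts(records, {})
only_mandatory = coordinated_counts(records, {"course_type": {"mandatory"}})
```
]]></preformat>
        <p>Selecting the mandatory courses changes the sex and score charts, while the course_type chart keeps showing both categories.</p>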
      </sec>
      <sec id="sec-2-4">
        <title>Dimensionality reduction</title>
        <p>
          Dimensionality reduction is a common approach in multivariate data
visualisation [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. It takes advantage of the fact that it is generally possible to
approximate data using a smaller number of features while preserving most of the
variability of data, because high-dimensional data tend to lie on an embedded
low-dimensional manifold. This reduction might be useful as a preliminary step for
other machine learning techniques in order to alleviate generalisation
problems. However, for visualisation purposes, the aim is just to project data onto a
2- or 3-dimensional space that can be visualised by means of, e.g., a scatter plot.
        </p>
        <p>
          Many alternative techniques can be used for this purpose [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Some of them
rely on strong assumptions, such as PCA (Principal Component Analysis), which
assumes linear structure. Others, such as the manifold learning algorithms, are
powerful non-linear techniques with strong performance on many data sets, although
they sometimes fail to retain both the local and the global structure of real data.
        </p>
        <p>
          Among the manifold learning techniques, comparisons available in the
previous literature [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] show that t-SNE (t-distributed Stochastic Neighbour
Embedding) generally produces better visualisations. The technique is
a variation of Stochastic Neighbour Embedding (SNE) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], an algorithm that
computes conditional probabilities (representing similarities) from the pairwise
high-dimensional and low-dimensional Euclidean distances and aims to find the
data projection that minimises the mismatch between these probabilities. The
t-SNE technique alleviates some problems of SNE by using a symmetric version
of the SNE cost function with simpler gradients and a Student-t distribution to
compute similarities in the low-dimensional space [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. As a result, t-SNE is
easier to optimise, does not accumulate data points in the centre of the visualisation
and is able to reveal structure at different scales. For that reason, t-SNE is
selected as the dimensionality reduction algorithm for the visualisations.
        </p>
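        <p>For reference, the quantities mentioned above can be written in the standard formulation of t-SNE. The notation is introduced here and is not taken from this paper: x_i are the n high-dimensional points, y_i their low-dimensional images, and sigma_i a per-point kernel bandwidth. The conditional similarities, their symmetrised version, the Student-t similarities of the projection and the cost to be minimised are:</p>
        <disp-formula><tex-math><![CDATA[
\begin{aligned}
p_{j|i} &= \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}, \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}, \\
C &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
\end{aligned}
]]></tex-math></disp-formula>
        <p>The heavy tails of the Student-t kernel in the low-dimensional similarities allow moderately distant points to be placed further apart, which is what prevents the accumulation of points in the centre of the map.</p>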
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>In this section, the application of the proposed methods is discussed. That
involves the algorithmic or visualisation details, and software implementation.</p>
      <sec id="sec-3-1">
        <title>Coordinated view</title>
        <p>The visualisation of coordinated histograms that can be interactively filtered
by one or more variables is very useful for the proposed application, because
it allows an expert to validate or refine, in real time, the hypotheses they might
develop about a set of students. Thus, with the appropriate filtering and
aggregation operations, it would be possible to visualise the average distribution of a
performance-oriented variable grouped by an explanatory one, or to analyse the
distribution of all variables when we only consider a restricted group of values
for one or several allegedly interesting dimensions.</p>
        <p>The histograms are used to display the distribution of items from a
continuous variable, which is previously partitioned into groups/bins. From a visual
point of view, they use an encoding with aligned bars ordered by bins, where the
size of the rectangles along the other axis is determined by a count aggregation. A
similar bar chart representation can be used for categorical variables, but in this
case each group is defined by a category. Although their usefulness to discover
the distribution of a certain variable is obvious, the value of histograms and bar
charts for the analysis of a whole multi-dimensional data set is improved when
different variables are juxtaposed and coordinated or when interactive filtering
is performed through a fluid selection of ranges.</p>
        <p>
          Although roll-up and drill-down operations might potentially be used to work
with a certain variable at different levels of aggregation, there seems to be
no intuitive application for the student data set. In contrast, other
user-defined aggregations of a performance-related variable with respect to (i.e.,
grouped by) an explanatory variable would be more informative. On the other
hand, the selection of subsets of groups in variables is, in any case, very
interesting for exploration. These selections are often called dicing (when the groups
cover more than one variable) and slicing (when the groups are selected from a
single variable) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Visualisation of the count/frequency of each interval/category
in the histograms or bar charts is generally interesting. Grouping between
two variables, in turn, seems more useful when the aggregated variable
is the 'score' and the variable by which it is grouped is explanatory.
        </p>
        <p>
          Since the application of this approach does not require further processing
than the sorting, grouping and reducing needed to recompute the histograms,
the main factor to consider is that its implementation should be efficient enough
to allow fluid filtering. Efficiency can be achieved through the use of sorted
indexes and incremental updates [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
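        <p>The idea behind sorted indexes and incremental updates can be sketched in a few lines. This is a generic illustration, not the implementation used by the tool or by any particular library: the records, the variable names and the class LinkedBarChart are hypothetical. A sorted index turns a range filter into two binary searches, and the linked chart is updated by touching only the records that enter or leave the selection.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from bisect import bisect_left, bisect_right
from collections import Counter

# Hypothetical records: (admission_score, course_type). Names are illustrative.
records = [(5.2, "mandatory"), (8.9, "elective"), (6.7, "mandatory"),
           (7.4, "elective"), (9.5, "mandatory")]

# Sorted index on the numeric variable: record ids ordered by value.
order = sorted(range(len(records)), key=lambda i: records[i][0])
values = [records[i][0] for i in order]

def ids_in_range(lo, hi):
    """Record ids with lo <= value <= hi, located by binary search."""
    return set(order[bisect_left(values, lo):bisect_right(values, hi)])

class LinkedBarChart:
    """Counts of course_type for the currently selected records."""

    def __init__(self):
        self.selected = set(range(len(records)))
        self.counts = Counter(records[i][1] for i in self.selected)

    def set_range(self, lo, hi):
        """Update the counts using only the records that enter or leave."""
        new = ids_in_range(lo, hi)
        for i in self.selected - new:
            self.counts[records[i][1]] -= 1
        for i in new - self.selected:
            self.counts[records[i][1]] += 1
        self.selected = new

chart = LinkedBarChart()
chart.set_range(6.0, 9.0)   # keeps the records with scores 6.7, 7.4 and 8.9
narrow = dict(chart.counts)
chart.set_range(5.0, 9.0)   # widening the range only adds the 5.2 record
wide = dict(chart.counts)
```
]]></preformat>
        <p>When a user drags a range slider, consecutive selections differ by a few records, so the cost of each update is proportional to that difference rather than to the size of the data set.</p>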
      </sec>
      <sec id="sec-3-2">
        <title>Data projection through dimensionality reduction techniques</title>
        <p>The two-dimensional projections obtained through the application of the t-SNE
technique can be visualised as scatterplots, in the framework of a complete
dashboard that adds both the information necessary to support the exploratory
analysis and the visual controls needed to provide interaction. In these visualisations,
the position of the points is not interpretable, but their distances to each other
try to preserve the original distances in the high-dimensional space. The aim is
to provide an easy way to find and interpret groups of data, as well as the
influence of certain variables on the performance, through the visual proximity of
the points and the changes due to user interaction.</p>
        <p>Apart from the spatial position channels used to convey information about
the data structure, additional visual channels can be used to show values of
other variables from the original high-dimensional data. In fact, radius, shape
and colour of the points are useful for this purpose because their changes are
easily perceived. For that reason, they need to be included in the proposed tool
to ease the detection of salient patterns. On the other hand, it is appropriate to
enable chart customisation and interaction with data, in terms of the selection of
a data sample to obtain further details and the modification of weights. The
customisation of charts can be driven by usual visual controls such as sliders, whereas
interaction is more easily understood when embedded in the visualisation.</p>
        <p>There are at least two interesting visualisations that might be obtained by
means of the dimensionality reduction approach:
          <list list-type="bullet">
            <list-item><p>The projection of a common data set of students, represented by their descriptive variables and the average score for each academic year, in order to analyse data from a global perspective that aims at understanding common characteristics of the institutions.</p></list-item>
            <list-item><p>The projection of several data sets (for each degree/institution) of students, represented by their descriptive variables and the scores of all the subjects, with potentially missing data. The usefulness of this visualisation resides in the analysis of the groups found for each degree. Specifically, it would be interesting to determine if clearly separated groups of students can be found, if they gather students with different performance (high/low scores or graduated/dropout), and whether the explanatory variables that are not considered in the projection can provide some interpretation of the groups. In this case, for the training of t-SNE, a custom metric is used, which is essentially a pairwise Euclidean distance where missing components (i.e., scores of subjects that have not been taken by both students) are ignored.</p></list-item>
          </list>
        </p>
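        <p>A distance of the kind described for the second visualisation could look like the sketch below. It is only an illustration under stated assumptions: missing scores are encoded as None, the function name masked_euclidean is hypothetical, and the treatment of two students with no course in common (the distance is left undefined) is a choice of this sketch that the text does not specify.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math

def masked_euclidean(a, b):
    """Euclidean distance over the components that are present in both vectors.
    Missing scores are encoded as None (an assumption of this sketch)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        # Assumption: students with no course in common get an undefined distance.
        return math.nan
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))

# Hypothetical score vectors, one position per course.
student_a = [7.0, None, 5.0, 9.0]
student_b = [4.0, 6.0, None, 5.0]
student_c = [None, 8.0, None, None]

d_ab = masked_euclidean(student_a, student_b)  # courses 0 and 3 are shared
d_ac = masked_euclidean(student_a, student_c)  # no shared course
```
]]></preformat>
        <p>Because the sum runs over fewer terms when two students share few courses, such pairs tend to appear closer than pairs with many shared courses. Whether and how to compensate for this (for instance, by rescaling with the number of shared courses) is a design decision that is not stated in the text.</p>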
        <p>In both cases, for the training of the t-SNE algorithm, a PCA initialisation is
performed. The perplexity hyper-parameter, which drives the balance between
local and global focus, is chosen heuristically.</p>
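        <p>To make the role of the perplexity more concrete, the sketch below shows how, in the standard t-SNE formulation, a target perplexity fixes the bandwidth of the Gaussian kernel around one point. It is not the project's code: the function names, the search bounds and the example distances are hypothetical. The perplexity is computed as 2 raised to the Shannon entropy of the conditional similarities, and it can be read as an effective number of neighbours.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math

def conditional_probs(sq_dists, sigma):
    """Gaussian similarities p(j|i) from the squared distances of point i to the others."""
    nearest = min(sq_dists)  # shifting by the minimum avoids underflow to all zeros
    weights = [math.exp(-(d - nearest) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    """Perplexity 2**H, where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy

def calibrate_sigma(sq_dists, target, lo=1e-3, hi=1e3, steps=60):
    """Bisection on sigma (geometric midpoint) until the perplexity matches the target."""
    for _ in range(steps):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probs(sq_dists, mid)) > target:
            hi = mid   # too many effective neighbours: narrow the kernel
        else:
            lo = mid   # too few effective neighbours: widen the kernel
    return math.sqrt(lo * hi)

# Squared distances from one hypothetical student to five others.
sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]
sigma_local = calibrate_sigma(sq_dists, target=2.0)
sigma_global = calibrate_sigma(sq_dists, target=4.0)
```
]]></preformat>
        <p>A low perplexity yields a narrow kernel that only takes the closest students into account (local focus), whereas a high perplexity yields a wider kernel in which more distant students also contribute (global focus).</p>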
      </sec>
      <sec id="sec-3-3">
        <title>Implementations</title>
        <p>The proposed implementations were developed and organised as a toolbox. The
first visualisation tool is a set of coordinated histograms where a user can
filter by one or more variables, which causes the rest of the charts to be updated
accordingly. The filters are applied through a range selection for the numeric
variables and through a one-click selection for the categorical ones. A subset
of variables has been selected according to their assumed relevance. The fixed
charts associated with these variables generally show the count of student-subject
records binned by intervals. In the charts of the categorical variables, the groups
are distributed along the vertical axis, whereas in the numerical variables the
bins are represented along the horizontal axis. Nevertheless, it is also possible to
visualise other variables in a customisable chart associated with a dropdown menu.
Additionally, a histogram of the score grouped by another explanatory variable is
included. Finally, for the 'ResidenceCity' variable, which is geographic, a
choropleth map of the European Union has been used, aggregated at the NUTS2
region (i.e., state) level. Figure 3 shows an example of the results provided by
this tool.</p>
        <p>The second visualisation tool is an interactive dimensionality reduction of
the students' data, where data are projected onto a 2D scatterplot and some
parameters of the projection can be interactively adjusted. Two prototypes have
been developed following this idea:
          <list list-type="bullet">
            <list-item><p>In the first case, data has been organised by year, so that each point represents a student and its graphical properties (colour, shape, size) are linked to the value of a certain variable, which can be customised. An example of this kind of visualisation can be seen in Figure 4. In this case, size has been linked to the admission score, shape shows the mother's education level and the colour represents the score. The students of five institutions have been projected together and a cluster structure can be seen.</p></list-item>
            <list-item><p>In the second case, a different visualisation is provided for each degree-institution combination, as seen in Figure 5. The projected data essentially consists of the scores of every course for each student. The pairwise distance measure used to perform the dimensionality reduction is only computed with respect to the coinciding courses. In this case, students corresponding to the degree in Computer Science at U. of León have been projected, linking the place of birth to radius and the sex to shape. Although it does not create a clearly separated cluster structure, probably due to the small data set, some students are projected far from the central group. Further analysis of these points with regard to the additional information, shown on the right side of the visualization, might lead to interesting conclusions.</p></list-item>
          </list>
        </p>
        <p>The prototypes have been developed with Python and JavaScript
technologies to enable an easy deployment as a web tool. The software development
process has been iterative because feedback has been gathered from partners in
order to delimit the specific needs for the tools to be developed.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>This paper has presented an approach proposed within the SPEET project for
the visual analysis of student data. The work has resulted in the implementation
of prototypes that leverage the proposed methods: coordinated histograms and
interactive dimensionality reduction.</p>
      <p>The main qualities of the coordinated view of data are the joint display of
interconnected visualisations, the fluid reaction to user actions and the absence
of further assumptions or imposed models on the data. These features make
it valuable for the validation or refinement of hypotheses. For instance, it has
been seen that the application of filters makes it possible to confirm educators'
preconceptions about the influence of the nature (mandatory/elective) and methodology
(theoretical/practical) of courses, or of mobility, on the score distributions.</p>
      <p>On the other hand, the interactive projections obtained by means of
dimensionality reduction can be useful for the recognition of salient patterns in data,
thus leading to the suggestion of new hypotheses about the influence of
explanatory variables on the performance. By means of the presented prototypes, a user
can analyse the whole data set grouped by academic years, or focus the attention
on the variables that drive the structure of a certain degree.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>The work presented in this paper has been co-funded by the Erasmus+
Programme of the European Union. The European Commission support for the
production of this publication does not constitute an endorsement of the
contents, which reflects the views only of the authors, and the Commission cannot
be held responsible for any use which may be made of the information therein.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Barbu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vilanova</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vicario</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varanda</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alves</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Podpora</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prada</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moran</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torrebruno</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tocu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Data mining tool for academic data exploitation. Literature review and first architecture proposal</article-title>
          .
          <source>Technical report, ERASMUS + KA2 / KA203 SPEET Project</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ventura</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Educational data mining: A survey from 1995 to 2005</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>33</volume>
          (
          <issue>1</issue>
          ) (
          <year>2007</year>
          )
          <fpage>135</fpage>
          –
          <lpage>146</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Tervakari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silius</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koro</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paukkeri</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pirttilä</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Usefulness of information visualizations based on educational data</article-title>
          .
          <source>In: 2014 IEEE Global Engineering Education Conference (EDUCON)</source>
          , IEEE (Apr
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ventura</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Educational data mining: a review of the state of the art</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)</source>
          <volume>40</volume>
          (
          <issue>6</issue>
          ) (
          <year>2010</year>
          )
          <fpage>601</fpage>
          –
          <lpage>618</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gómez-Aguilar</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hernández-García</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García-Peñalvo</surname>
            ,
            <given-names>F.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Therón</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Tap into visual analysis of customization of grouping of activities in eLearning</article-title>
          .
          <source>Computers in Human Behavior</source>
          <volume>47</volume>
          (
          <year>2015</year>
          )
          <fpage>60</fpage>
          –
          <lpage>67</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Keim</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andrienko</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fekete</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          , Gorg,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Kohlhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Melancon</surname>
          </string-name>
          , G.:
          <article-title>Visual analytics: De nition, process, and challenges</article-title>
          . In: Information visualization. Springer (
          <year>2008</year>
          )
          <volume>154</volume>
          {
          <fpage>175</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Keim</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          :
          <article-title>Information visualization and visual data mining</article-title>
          .
          <source>IEEE Transactions on Visualization and Computer Graphics</source>
          <volume>8</volume>
          (
          <issue>1</issue>
          ) (
          <year>2002</year>
          )
          <fpage>1</fpage>
          –
          <lpage>8</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ware</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <source>Information Visualization: Perception for Design</source>
          , Third Edition. Morgan Kaufmann (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Endert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ribarsky</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turkay</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>B.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nabney</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blanco</surname>
            ,
            <given-names>I.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rossi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>The state of the art in integrating machine learning into visual analytics</article-title>
          .
          <source>In: Computer Graphics Forum</source>
          . Volume
          <volume>36</volume>
          , Wiley Online Library (
          <year>2017</year>
          )
          <fpage>458</fpage>
          –
          <lpage>486</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verleysen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <source>Nonlinear Dimensionality Reduction</source>
          . Springer (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Sacha</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sedlmair</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peltonen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiskopf</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>North</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keim</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          :
          <article-title>Visual interaction with dimensionality reduction: A structured literature analysis</article-title>
          .
          <source>IEEE Transactions on Visualization and Computer Graphics</source>
          <volume>23</volume>
          (
          <issue>1</issue>
          ) (
          <year>2017</year>
          )
          <fpage>241</fpage>
          –
          <lpage>250</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rovira</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puertas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Igual</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Data-driven system to predict academic grades and dropout</article-title>
          .
          <source>PLoS ONE</source>
          <volume>12</volume>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          –
          <lpage>21</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Díaz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuadrado</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pérez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domínguez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alonso</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prada</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Energy analytics in public buildings using interactive histograms</article-title>
          .
          <source>Energy and Buildings</source>
          <volume>134</volume>
          (
          <year>2017</year>
          )
          <fpage>94</fpage>
          –
          <lpage>104</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Maaten</surname>
            ,
            <given-names>L.v.d.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Visualizing data using t-SNE</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          (Nov) (
          <year>2008</year>
          )
          <fpage>2579</fpage>
          –
          <lpage>2605</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roweis</surname>
            ,
            <given-names>S.T.</given-names>
          </string-name>
          :
          <article-title>Stochastic neighbor embedding</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          (
          <year>2003</year>
          )
          <fpage>857</fpage>
          –
          <lpage>864</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <article-title>Crossfilter: Fast multidimensional filtering for coordinated views</article-title>
          . http://square.github.io/crossfilter/ (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>