=Paper= {{Paper |id=Vol-2294/DCECTEL2018_paper_26 |storemode=property |title=Scalable Higher Education Learning Analytics Architecture through Data Integration |pdfUrl=https://ceur-ws.org/Vol-2294/DCECTEL2018_paper_26.pdf |volume=Vol-2294 |authors=Jeanette Samuelsen |dblpUrl=https://dblp.org/rec/conf/ectel/Samuelsen18 }} ==Scalable Higher Education Learning Analytics Architecture through Data Integration== https://ceur-ws.org/Vol-2294/DCECTEL2018_paper_26.pdf
        Scalable Higher Education Learning Analytics
            Architecture through Data Integration

                            Jeanette Samuelsen1[0000-0002-8084-0258]
                           1 University of Bergen, Bergen, Norway

                             jeanette.samuelsen@uib.no



       Abstract. Effectively implementing LA at an institutional level is far from trivial,
       as such a solution needs to be scalable. In this project I aim to build a scalable
       learning analytics architecture. The main research focus will be on semantic in-
       teroperability, for integration of multiple educational data sources, but the archi-
       tecture will also have components for data analysis and reporting insights to
       stakeholders. As part of this project we have conducted a systematic review, to
       understand state-of-the art research and practice regarding multiple data source
       usage and combination in learning analytics. The initial work with the architec-
       ture has also resulted in a first conceptual model. For developing the architecture,
       semantic web technologies will be employed, to handle aspects such as shared
       data meaning. Once developed, we will conduct studies that use the architecture
       to address real-world challenges in higher education.

       Keywords: Learning analytics, Interoperability, Semantic web, Scalability


1      Introduction

Learning Analytics (LA) includes collecting, computationally analyzing and reporting
data to stakeholders; to gain insights, and enable decision making and interventions
related to questions about learning and learning environments [1]. Effectively imple-
menting LA at an institutional level is far from trivial, as such a solution needs to be
scalable. Factors affecting scalability include the technical solution, but also factors
such as organizational hierarchy [25], management structures [4], policy and regula-
tions [6]. Making LA scalable includes collecting and integrating data from multiple
data sources, which can be stored in different formats, and have varying levels of struc-
ture. The merge of data from many different sources can lead to more useful analysis,
since many LA techniques require large scale and possibly diverse data [3].
   Data integration is related to interoperability. Interoperability involves technical, se-
mantic, legal and organizational levels. The semantic level is about ensuring that data
format and meaning is preserved and understood. The technical level includes services
for data exchange and data integration. The legal level is about aspects such as enabling
collaboration despite of different organizational policies. Organizational interoperabil-
ity includes aligning processes for common organizational goals and addressing user
expectations and requirements [19].
2


   The number of data sources used in combination for data analysis tend to be limited.
Existing LA projects addressing data integration needs tend to put the emphasis on the
technical level of interoperability [14]. The new EU general data protection regulation
[6] will to a large degree address legal and organizational concerns. Semantic interop-
erability, enabling shared data meaning, is typically less emphasized, even though it
will enable more effective merge of data.


2      Goals and Research Question

The main goal of this project is to build a scalable LA architecture for higher education,
with research emphasis on semantic interoperability. However, the architecture we will
develop also has components for data analysis and reporting of insights to stakeholders.
A secondary goal, providing insights into the real-world application of the architecture,
will be to address one or more challenges in a higher education institution, through
architecture usage. Initially, the challenges will focus on student success.
   To achieve the goals, the following overarching research question has been formu-
lated:
      How can a scalable learning analytics architecture be built and applied to ad-
         dress challenges in higher education?


3       (Abbreviated) State of the Field

Different studies have used or combined multiple data sources for LA. Data from a
Learning Management System and Student Information System have been combined
to detect students who struggle academically [13]. Researchers joined four data sets
containing student data, originating from two separate tools, to build a model to predict
low academic performance [9]. Behavior data from an LMS and university course da-
tabase have been joined, to cluster blended learning courses [22]. A commonality
among these studies is that they all combine a limited number of data sources, in similar
formats.
    Merging data available in different formats is more challenging than combining sim-
ilar data. In addition to common operations such as data access, data cleaning, and data
filtering, the data to be combined also needs to be transformed into a common format.
For this purpose the organization JISC has developed an architecture that includes
plugins to transform educational data to xAPI statements, meaning they can be com-
bined in a common store (learning record warehouse) [14]. xAPI, together with IMS
Caliper, are educational data specifications, both enabling standardization [7].
    While the approach taken by JISC will enable technical interoperability, there is less
of an emphasis on enabling shared meaning between data. Using semantic technologies,
such as the RDF data model and ontologies, it is possible to add descriptions and mean-
ing to data coming from various sources, and to combine, support and reuse different
specifications/data models. Ontologies also enable inference (given some stated fact,
we can state new and related facts) [10]. In addition, the use of ontologies is different
from the more traditional approach of data warehousing [15], since alignment (mapping
                                                                                        3


between concepts in different systems) can be optimized through ontology reuse, rather
than ad-hoc for each specific use case.


4      Research Design

As an overarching research methodology for this research project, the design science
research framework will be used [13]. This methodology acknowledges a technical ar-
tifact, including development and evaluation, as a research contribution. Novel parts of
such an artifact can also be viewed as contributions.


4.1    Systematic Literature Review

To identify important foundations for the work on the architecture, and to make it clear
where the proposed research fits into the body of knowledge, we have recently con-
ducted a systematic review. The systematic review follows specific guidelines [16].
Our review research questions include:
     How and to what extent are different types of data being used/combined for
         learning analytics research in higher education?
     What types of data are being used for learning analytics in higher education?


4.2    System Architecture
Informed by the results of the systematic literature review, we are developing a LA
architecture. Of special importance is that the solution should scale, emphasizing se-
mantic interoperability, to remove barriers for data exchange.


4.3    Studies Using the Architecture

Once the system architecture has been developed, and we have combined educational
data sets, we will use the architecture to address one or more challenges in higher edu-
cation, through computational data analysis and reporting. Challenge selection will be
informed by stakeholder needs for specific insights about learners and their environ-
ments. Initially, the focus will be on LA related to student success (e.g. through provid-
ing relevant dashboards to the students).


5      Current Status and Results

5.1    Systematic Literature Review

The following search string was formulated to search relevant databases (ACM, IEEE
Xplore, SpringerLink, Science Direct, Wiley and AISEL), conference proceedings
(Learning Analytics and Knowledge, Learning at Scale and International conference on
4


Educational Data Mining) and journals (Journal of Learning Analytics and Journal of
Educational Data Mining) for documents:
   ("multiple data sources" OR multimodal OR "multi-modal" OR
   "multiple data sets" OR "multiple datasets") AND ("learning analytics" OR
   "educational data mining") AND "higher education"
   A number of inclusion and exclusion criteria were formulated, to ensure that re-
viewed papers would fit the research questions and would provide foundations for our
development of a LA architecture. One such criteria was that the reviewed papers had
to use or combine more than one data source.
   The search originally returned 71 papers, but following specified inclusion and ex-
clusion criteria, we were left with 14 papers for inclusion in the review. This process
was documented using a PRISMA flow diagram [18].
   In our results, we have observed that five of the reviewed papers analyze data in
different formats that originate from different sources without a common format, thus
these data are not integrated, but analyzed separately. Nine of the studies merge and
analyze data from different data sources that are already in the same format. A more
extensive list is given in Table 1. With regards to number of data sources used/com-
bined, we found that nine of the fourteen papers use or combine only two data sources.

                    Table 1. Multiple dataset use - preliminary observations
    Observation                                               Freq.     Papers
    Merge data that are already in the same format from       9         [5, 8, 9, 12, 13, 22,
    different data sources                                              23, 24, 29]
    Analyze data in different formats from different          5         [20, 21, 27, 28, 29]
    sources without a common format
    Support educational data models/controlled vocab-         2         [5, 21]
    ularies

   Having conducted the review, we are now finishing up the resulting paper and plan
to send it for review before the end of the 2018.


5.2      System Architecture and Usage
The initial work with the architecture has resulted in a first conceptual model.




              Fig. 1. High-level LA architecture (functionality) and its environment
                                                                                                 5


   As seen in Fig. 1, the architecture shall be able to combine a number of data sets
with a multitude of formats, supported by semantic technologies. For this purpose we
need to develop tools to obtain, clean and transform the relevant data. An ontology will
be constructed to enable alignment between concepts in different systems. Such an on-
tology can build on pre-existing educational data models, but will likely need to be
further extended with new concepts to address gaps in the existing specifications. To
persistently store and combine educational data from different sources, a RDF database
will be used. With this approach we can support data expressed in compliance with both
xAPI and Caliper specifications, but also numerous other data models. In this sense the
architecture goes beyond the approach taken by organizations such as JISC, and it is
more streamlined than data warehousing.
   We have just recently obtained educational datasets from a Norwegian higher edu-
cation institution, thus development of the architecture will soon commence. Studies
that use the architecture will follow.


References
 1. 1st International Conference on Learning Analytics and Knowledge 2011 (February 2011),
    https://tekri.athabascau.ca/analytics/
 2. Chang, C.J., Chang, M.H., Liu, C C., Chiu, B.C., Fan Chiang, S.H., Wen, C.T., ..., Chai,
    C.S.: An analysis of collaborative problem‐solving activities mediated by individual‐based
    and collaborative computer simulations. Journal of Computer Assisted Learning 33(6), 649-
    662 (2017).
 3. Cooper, A., Hoel, T.: Data Sharing Requirements and Roadmap (2015).
 4. Dawson, S., Poquet, O., Colvin, C., Rogers, T., Pardo, A., Gasevic, D.: Rethinking learning
    analytics adoption through complexity leadership theory. In Proceedings of the 8th Interna-
    tional Conference on Learning Analytics and Knowledge (pp. 236-244). ACM (2018).
 5. Di Mitri, D., Scheffel, M., Drachsler, H., Börner, D., Ternier, S., Specht, M.: Learning pulse:
    a machine learning approach for predicting performance in self-regulated learning using
    multimodal data (pp. 188–197). ACM Press (2017).
 6. EU GDPR (2016). http://eur-lex.europa.eu/legal-
    content/EN/TXT/PDF/?uri=CELEX:32016R0679&from=EN.
 7. Griffiths, D., Hoel, T.: Comparing xAPI and Caliper. LACE Review 7 (2016).
 8. Gray, G., McGuinness, C., Owende, P., Hofmann, M.: Learning Factor Models of Students
    at Risk of Failing in the Early Stage of Tertiary Education. Journal of Learning Analytics,
    3(2), 330–372 (2016). https://doi.org/10.18608/jla.2016.32.20
 9. Guarín, C.E.L., Guzmán, E.L., González, F.A.: A model to predict low academic
    performance at a specific enrollment using data mining. IEEE Revista Iberoamericana de
    Tecnologias del Aprendizaje 10(3), 119-125 (2015).
10. Hebeler, J., Fisher, M., Blace, R., Perez-Lopez, A.: Semantic web programming. Wiley
    (2009).
11. Hevner, A., March, S., Park, J., Ram, S.: Design science in information systems research.
    MIS Quarterly 28(1), 75–105 (2004).
12. Hutt, S., Hardey, J., Bixler, R., Stewart, A., Risko, E., D’Mello, S.: Gaze-based Detection
    of Mind Wandering during Lecture Viewing (2017).
6


13. Jayaprakash, S., Moody, E., Lauría, E., Regan, J., Baron, J.; Early Alert of Academically
    At-Risk Students: An Open Source Analytics Initiative. Journal of Learning Analytics, 1(1),
    6–47 (2014). https://doi.org/10.18608/jla.2014.11.3
14. JISC           (2018).           https://docs.analytics.alpha.jisc.ac.uk/docs/learning-records-
    warehouse/Technical-Overview:--Integration-Overview
15. Kimball, R., Ross, M.: The data warehouse toolkit: the complete guide to dimensional mod-
    eling. John Wiley & Sons (2011).
16. Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in
    software engineering. Technical Report EBSE-2007-01. School of Computer Science and
    Mathematics, Keele University (2007).
17. Liu, M., Kang, J., Zou, W., Lee, H., Pan, Z., Corliss, S.: Using Data to Understand How to
    Better Design Adaptive Learning. Technology, Knowledge and Learning 22(3), 271-298
    (2017).
18. Moher, D., Liberati, A., Tetzlaff, J., Altman, D.G.: The PRISMA group. Preferred reporting
    items for systematic review and meta-analysis: the PRISMA statement. PLoS Medicine
    6(7), e1000097 (2009).
19. New European Interoperability Framework (2017).
    https://ec.europa.eu/isa2/sites/isa/files/eif_brochure_final.pdf
20. Ochoa, X., Domínguez, F., Guamán, B., Maya, R., Falcones, G., Castells, J.: The RAP
    system: automatic feedback of oral presentation skills using multimodal analysis and low-
    cost sensors (pp. 360–364). ACM Press (2018). https://doi.org/10.1145/3170358.3170406
21. Pardos, Z.A., Whyte, A., Kao, K.: moocRP: Enabling Open Learning Analytics with an
    Open Source Platform for Data Distribution, Analysis, and Visualization. Technology,
    Knowledge and Learning 21(1), 75-98 (2016).
22. Park, Y., Yu, J.H., Jo, I.H.: Clustering blended learning courses by online behavior data: A
    case study in a Korean higher education institute. Internet and Higher Education 29, 1-11
    (2016).
23. Raca, M., Tormey, R., & Dillenbourg, P.: Sleepers’ Lag: Study on Motion and Attention.
    Journal of Learning Analytics, 3(2), 239–260 (2016).
    https://doi.org/10.18608/jla.2016.32.12
24. Rodríguez-Triana, M., Prieto, L., Martínez-Monés, A., Asensio-Pérez, J., Dimitriadis, Y.
    The teacher in the loop: customizing multimodal learning analytics for blended learning (pp.
    417–426). ACM Press (2018). https://doi.org/10.1145/3170358.3170364
25. Shum, S., McKay, T.: Architecting for Learning Analytics: Innovating for Sustainable Im-
    pact (2018).
26. Sclater, N., Peasgood, A., Mullan, J.: Learning analytics in higher education: A review of
    UK and international practice (2016).
27. Thompson, K., Kennedy‐Clark, S., Wheeler, P., Kelly, N.: Discovering indicators of
    successful collaboration using tense: Automated extraction of patterns in discourse. British
    Journal of Educational Technology 45(3), 461-470 (2014).
28. Wang, Y., Paquette, L., Baker, R.: A Longitudinal Study on Learner Career Advancement
    in MOOCs. Journal of Learning Analytics, 1(3), 203–206 (2014).
    https://doi.org/10.18608/jla.2014.13.23
29. Zheng, M., Bender, D., Nadershahi, N.: Faculty professional development in emergent
    pedagogies for instructional innovation in dental education. European Journal of Dental
    Education 21(2), 67-78 (2017).