Facilitating Learning Analytics in Histology Courses
                                with Knowledge Graphs
                                Jimmy Walraff1,† , Andreas Coco1,† , Guillaume Delporte1,† , Merlin Michel1,† ,
                                Allyson Fries2 , Valérie Defaweux2 and Christophe Debruyne1,∗
                                1
                                    Montefiore Institute of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium
                                2
                                    Department of Biomedical and Preclinical Sciences, Faculty of Medicine, University of Liège, Liège, Belgium


                                               Abstract
                                               We report on an ongoing learning analytics project at the University of Liège, in which we want to
                                               analyze student interactions on Cytomine for a histology course. Cytomine provides tools for medical
                                               image annotation and an API that has been used for learning analytics. The problem, however, is that
                                               the data obtained from Cytomine has implicit semantics and requires many data preprocessing and
                                               integration steps. This poster presents the prototype KG we have built to address these problems. The
                                               KG adopts PROV-O to distinguish activities from their outcomes, addressing some of the issues faced in
                                               the past. We also demonstrate that the KG can be used in Jupyter notebooks, though learning analytics
                                               is left for future work. It did demonstrate that the data analysis process has become more declarative and
                                               transparent, as data is analyzed starting from SPARQL queries. We focused on one project in Cytomine,
                                               and future work consists of integrating additional projects. We also plan to investigate the development
                                               of more self-contained KG generation techniques as we have no direct access to the Cytomine application.

                                               Keywords
                                               KG Construction, Learning Analytics, Ontology Engineering


                                1. Introduction
                                Cytomine [1] is a Web-based image analysis software platform that facilitates collaborative
                                exploration and analysis of large biological and medical image datasets. Cytomine provides
                                tools for image annotation (see Figure 1). Its application facilitates collaboration and educational
                                applications, as demonstrated by its use in histology courses at the University of Liège. Cytomine
                                employs a MongoDB database for data storage and provides a fairly restricted API to engage
                                with the various objects, such as the image annotations and tags created by its users.
                                   While advantageous for object persistence, MongoDB’s document-oriented storage model
                                presents challenges for the interconnected analysis required in learning analytics research.
                                Additionally, the various document types contain implicit relationships, so one must manually
                                determine a user’s subsequent annotations, for example. As such, prior learning analytics
                                studies [2] relied on preprocessing pipelines to create CSV files for machine learning models,
                                which led to various provenance issues (e.g., why were certain points omitted, amended, etc.).

                                SEMANTiCS 2024: 20th International Conference on Semantic Systems, September 17–19, 2024, Amsterdam, The
                                Netherlands
                                ∗
                                  Corresponding author.
                                †
                                  These authors contributed equally.
                                Orcid 0000-0002-2780-7264 (A. Fries); 0000-0002-8928-1309 (V. Defaweux); 0000-0003-4734-3847 (C. Debruyne)
                                             © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
Figure 1: An example of an annotation in Cytomine created for this poster. In this example, one has
selected an area on an image, entered a description, defined some tags, and provided some properties, a
series of key-value pairs. Each annotation has a URL that can be shared with others.


This study aims to investigate the suitability of knowledge graphs (KGs) as a foundation for
learning analytics research. It is hoped that KGs can render those implicit relationships explicit
and that graph query languages are better suited to retrieve data for learning analytics. Another
motivation for using KGs is that the tools used in learning activities are just that—tools. The
data they store pertains to the tool. With KG technologies, we can integrate these data with
(different) learning models, e.g., to analyze whether the triple consistency[3] between learning
objectives, activities, and evaluations is met. In other words, KGs allow us to integrate these
tools in a flexible manner to support learning analytics.
   This paper briefly discusses our approach to integrating Cytomine’s data into a KG, demon-
strates our KG in a Jupyter Notebook, and elaborates on future work. The potential of this
study is substantial, as the feedback provided to students will guide their studies and enhance
their performance. Moreover, the data will assist educators in effectively integrating digital
microscopy into their pedagogical plan, thereby optimizing educational outcomes.

1.1. Related Work
There is little related work on the use of KGs for learning analytics. The learning analytics
community seems to focus on using Linked Data to facilitate research, as can be observed in
the LAK Data Challenge [4] and a Web-portal reported in [5]. [6] report on the potentials
and challenges of KGs in learning analytics, but only mention anecdotal uses such as [7], who
analyzed student enrollments in a university using a dataset enriched with Linked Datasets.


2. Approach: Building CytoGRAPH
The current iteration of the KG, dubbed CytoGRAPH, was built as follows:
Ontology Development The KG’s ontology was engineered with a middle-out approach
    where entities in the data (described below) were identified and aligned with the UoD of
     domain experts and existing ontologies. We adopted OWL 2 QL as we anticipate the KG to
     contain many assertions. The ontology we developed builds upon PROV-O [8] to model
     the interactions between users and images and a sequence of annotations on an image in
     one use session, GeoSPARQL [9] for representing the annotation’s geometries, and Web
    Annotation Vocabulary [10].1 PROV-O was adopted as many of the core concepts aligned
    well with this ontology; entities are the resources used (e.g., the images) and produced
    (e.g., annotations) in the learning activities. The interactions of students are represented
     as activities. Both students and instructors are represented as agents.

Data Transformation We had no access to Cytomine’s MongoDB instance, though we could
     download the data via its API.2 The data of one project consisting of 11 images, 588 users
     (pseudonymized), and 27185 annotations, 1571 properties, and 31507 descriptions. We
     used RML [11] with BURP [12] to generate RDF from the data. The University of Liège’s
     Cytomine instance has over 175 projects, which indicates the KG’s potential size.

Data Annotation While we have yet to create links to other datasets and even other institu-
     tional repositories (e.g., the e-learning platform), we have decided to represent geometries
     using geo:wktLiteral s so that we can retrieve activities from certain areas on the images.
     As such, we enriched the data with a geometric dimension.

   We recognize that our approach’s major limitation is its inability to transform the data stored
in MongoDB. Moreover, Cytomine’s API is fairly restricted, allowing us to retrieve data when
sufficient restrictions are placed (e.g., retrieving the annotations on a project-per-project basis).
This limitation is beyond our control.


3. Results
The result of this study yielded a proof-of-concept KG for learning analytics. The KG can be
explored with tools such as Ontodia [13], as shown in Figure 2. The KG currently contains
information on over 27K annotations made by 587 users over one decade, which is for the sole
project to which we have access.
   To demonstrate that one could engage with the KG for learning analytics, we created a Jupyter
Notebook that retrieved the number of annotations per contributor and used this to determine
the optimal number of clusters using the Elbow Method, as shown in Figure 3.


4. Conclusions
We reported on the feasibility of creating a KG out of Cytomine, which required integrating
CSV into RDF. The data we obtained from Cytomine was rather flat. Information about a user’s
1
  The ontology, available at https://chrdebru.github.io/papers/2024-09-semantics/ontology.owl, is not yet made
  available using a persistent identifier. The ontology will be published in a future iteration of the KG construction.
2
  https://doc.uliege.cytomine.org/dev-guide/api/reference
Figure 2: Ontodia is used to visualize concepts and their relationships in CytoGRAPH. This image
illustrates relationships between users and their annotations of an image.


Figure 3: As a proof of concept, we showed domain experts how to interact with the KG using a Jupyter
Notebook. Using the number of annotations per contributor (a type of user), we applied the elbow
method to determine the optimal number of clusters (k). One can see that the optimal number of
clusters seems to be three, as the elbow is the most pronounced at this specific number of clusters.


activity was implicitly stored but rendered explicit using PROV-O in the KG generation process.
As users annotated slides and stored them with geometric coordinates, we adopted GeoSPARQL
to use geospatial predicates. This allows us to analyze interactions on specific regions on
slides, for example. The number of annotations within one project indicates our project’s scale,
knowing there are over 150 projects in Cytomine. Challenges that we will investigate include
the evolution of this KG over time. As we currently have no access to the MongoDB instance,
which is normal, we should investigate more elegant ways to generate the KG. One venue is to
retrieve the data via rest calls in the mapping, which requires the development of bespoke RML
iterators.


Acknowledgments
The authors wish to thank Ulysse Rubens from Cytomine Corporation.
References
 [1] U. Rubens, R. Hoyoux, L. Vanosmael, M. Ouras, M. Tasset, C. Hamilton, R. Longuespée,
     R. Marée, Cytomine: Toward an open and collaborative software platform for digital
     pathology bridged to molecular investigations, PROTEOMICS – Clinical Applications 13
     (2019) 1800057.
 [2] A. Fries, M. Pirotte, L. Vanhee, P. Bonnet, P. Quatresooz, C. Debruyne, R. Marée, V. De-
     faweux, Validating instructional design and predicting student performance in histology
     education: Using machine learning via virtual microscopy, Anatomical Sciences Education
     17 (2024) 984–997.
 [3] V. R. Kovertaite, D. Leclercq, The triple consistency illustrated by e-tivities to help under-
     stand national and international policies in e-learning, International Journal of Technolo-
     gies in Higher Education 3 (2006) 1–7.
 [4] M. d’Aquin, S. Dietze, E. Herder, H. Drachsler, D. Taibi, Using linked data in learning
     analytics, eLearning Papers 36 (2014) 1–9.
 [5] Y. Hu, G. McKenzie, J. Yang, S. Gao, A. Abdalla, K. Janowicz, A linked-data-driven web
     portal for learning analytics: Data enrichment, interactive visualization, and knowledge
     discovery, in: Workshops at the 4th International Conference on Learning Analytics and
     Knowledge (LAK 2014), Indianapolis, Indiana, USA, March 24-28, 2014, volume 1137 of
     CEUR Workshop Proceedings, CEUR-WS.org, 2014.
 [6] A. Zouaq, J. Jovanovic, S. Joksimovíc, D. Gašević, Linked data for learning analytics:
     Potentials and challenges, Handbook of Learning Analytics (2017) 347–355.
 [7] M. d’Aquin, N. Jay, Interpreting data mining results with linked data for learning analytics:
     motivation, case study and directions, in: Third Conference on Learning Analytics and
     Knowledge, LAK ’13, Leuven, Belgium, April 8-12, 2013, ACM, 2013, pp. 155–164.
 [8] S. Sahoo, T. Lebo, D. McGuinness, PROV-O: The PROV Ontology, W3C Recommendation,
     W3C, 2013. Https://www.w3.org/TR/2013/REC-prov-o-20130430/.
 [9] R. Battle, D. Kolas, Geosparql: enabling a geospatial semantic web, Semantic Web Journal
     3 (2011) 355–370.
[10] R. Sanderson, P. Ciccarese, B. Young, Web Annotation Vocabulary, W3C Recommendation,
     W3C, 2017. Https://www.w3.org/TR/2017/REC-annotation-vocab-20170223/.
[11] A. Iglesias-Molina, D. Van Assche, J. Arenas-Guerrero, B. De Meester, C. Debruyne, S. Joza-
     shoori, P. Maria, F. Michel, D. Chaves-Fraga, A. Dimou, The RML ontology: A community-
     driven modular redesign after a decade of experience in mapping heterogeneous data
     to RDF, in: 22nd International Semantic Web Conference - ISWC 2023, Athens, Greece,
     November 6-10, 2023, Proceedings, Part II, volume 14266 of LNCS, Springer, 2023, pp.
     152–175.
[12] D. Van Assche, C. Debruyne, Burping through RML test cases, in: 5th International
     Workshop on Knowledge Graph Construction co-located with ESWC 2024, Hersonissos,
     Greece, May 27, 2024, volume 3718 of CEUR Workshop Proceedings, CEUR-WS.org, 2024.
[13] D. Mouromtsev, D. S. Pavlov, Y. Emelyanov, A. V. Morozov, D. S. Razdyakonov, M. Galkin,
     The simple web-based tool for visualization and sharing of semantic data and ontologies,
     in: ISWC 2015 Posters & Demonstrations co-located with ISWC-2015, Bethlehem, PA, USA,
     October 11, 2015, volume 1486 of CEUR Workshop Proceedings, CEUR-WS.org, 2015.