<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEMANTiCS</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Analytics in Histology Courses with Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jimmy Walraf</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Coco</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guillaume Delporte</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Merlin Michel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Allyson Fries</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valérie Defaweux</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christophe Debruyne</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>KG Construction, Learning Analytics, Ontology Engineering</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Biomedical and Preclinical Sciences, Faculty of Medicine, University of Liège</institution>
          ,
          <addr-line>Liège</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Montefiore Institute of Electrical Engineering and Computer Science, University of Liège</institution>
          ,
          <addr-line>Liège</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>20</volume>
      <fpage>17</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>We report on an ongoing learning analytics project at the University of Liège, in which we want to analyze student interactions on Cytomine for a histology course. Cytomine provides tools for medical image annotation and an API that has been used for learning analytics. The problem, however, is that the data obtained from Cytomine has implicit semantics and requires many data preprocessing and integration steps. This poster presents the prototype KG we have built to address these problems. The KG adopts PROV-O to distinguish activities from their outcomes, addressing some of the issues faced in the past. We also demonstrate that the KG can be used in Jupyter notebooks, though learning analytics is left for future work. It did demonstrate that the data analysis process has become more declarative and transparent, as data is analyzed starting from SPARQL queries. We focused on one project in Cytomine, and future work consists of integrating additional projects. We also plan to investigate the development of more self-contained KG generation techniques as we have no direct access to the Cytomine application.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Cytomine [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is a Web-based image analysis software platform that facilitates collaborative
exploration and analysis of large biological and medical image datasets. Cytomine provides
tools for image annotation (see Figure 1). The platform supports both collaboration and
education, as demonstrated by its use in histology courses at the University of Liège. Cytomine
employs a MongoDB database for data storage and provides a fairly restricted API to engage
with the various objects, such as the image annotations and tags created by its users.
      </p>
      <p>
        While advantageous for object persistence, MongoDB’s document-oriented storage model
presents challenges for the interconnected analysis required in learning analytics research.
Additionally, the various document types contain implicit relationships, so one must manually
determine a user’s subsequent annotations, for example. As such, prior learning analytics
studies [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] relied on preprocessing pipelines to create CSV files for machine learning models,
which led to various provenance issues (e.g., why were certain points omitted, amended, etc.).
      </p>
      <p>
This study aims to investigate the suitability of knowledge graphs (KGs) as a foundation for
learning analytics research. It is hoped that KGs can render those implicit relationships explicit
and that graph query languages are better suited to retrieve data for learning analytics. Another
motivation for using KGs is that the tools used in learning activities are just that—tools. The
data they store pertains to the tool. With KG technologies, we can integrate these data with
(different) learning models, e.g., to analyze whether the triple consistency [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] between learning
objectives, activities, and evaluations is met. In other words, KGs allow us to integrate these
tools in a flexible manner to support learning analytics.
      </p>
      <p>
        This paper briefly discusses our approach to integrating Cytomine’s data into a KG,
demonstrates our KG in a Jupyter Notebook, and elaborates on future work. The potential of this
study is substantial, as the feedback provided to students will guide their studies and enhance
their performance. Moreover, the data will assist educators in effectively integrating digital
microscopy into their pedagogical plan, thereby optimizing educational outcomes.
      </p>
      <sec id="sec-2-1">
        <title>1.1. Related Work</title>
        <p>
          There is little related work on the use of KGs for learning analytics. The learning analytics
community seems to focus on using Linked Data to facilitate research, as can be observed in
the LAK Data Challenge [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and a Web-portal reported in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] report on the potentials
and challenges of KGs in learning analytics, but only mention anecdotal uses such as [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], who
analyzed student enrollments in a university using a dataset enriched with Linked Datasets.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Approach: Building CytoGRAPH</title>
      <sec id="sec-3-1">
        <p>The current iteration of the KG, dubbed CytoGRAPH, was built as follows:</p>
        <p>
          Ontology Development. The KG’s ontology was engineered with a middle-out approach
in which entities in the data (described below) were identified and aligned with the universe of
discourse (UoD) of domain experts and with existing ontologies. We adopted OWL 2 QL as we
anticipate the KG to contain many assertions. The ontology we developed builds upon PROV-O [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] to model
the interactions between users and images and the sequence of annotations on an image in
one user session, GeoSPARQL [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] for representing the annotations’ geometries, and the Web
Annotation Vocabulary [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].<sup>1</sup> PROV-O was adopted because many of our core concepts align
well with it: entities are the resources used (e.g., the images) and produced
(e.g., annotations) in the learning activities, the interactions of students are represented
as activities, and both students and instructors are represented as agents.
        </p>
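        <p>As a rough illustration of this modelling choice, the sketch below turns flat annotation records into PROV-O-style triples and makes the order of a user's annotations on an image explicit. Only the PROV-O terms are standard; the record fields, the example namespace, and the follows property are hypothetical and are not taken from the CytoGRAPH ontology, and grouping by user and image is a simplification of a user session.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/cytograph/"  # hypothetical namespace, not the real ontology


def to_triples(records):
    """Turn flat annotation records into (subject, predicate, object) triples."""
    triples = []
    sessions = defaultdict(list)  # (user, image) -> [(timestamp, activity IRI)]
    for rec in records:
        activity = f"{EX}activity/{rec['id']}"
        annotation = f"{EX}annotation/{rec['id']}"
        user = f"{EX}user/{rec['user']}"
        image = f"{EX}image/{rec['image']}"
        triples += [
            (activity, RDF_TYPE, PROV + "Activity"),
            (annotation, RDF_TYPE, PROV + "Entity"),
            (user, RDF_TYPE, PROV + "Agent"),
            (activity, PROV + "wasAssociatedWith", user),  # who performed it
            (activity, PROV + "used", image),              # resource consumed
            (annotation, PROV + "wasGeneratedBy", activity),  # outcome produced
        ]
        sessions[(rec["user"], rec["image"])].append((rec["created"], activity))
    # Make the implicit ordering explicit: link each activity to its predecessor.
    for events in sessions.values():
        events.sort()
        for (_, earlier), (_, later) in zip(events, events[1:]):
            triples.append((later, EX + "follows", earlier))  # hypothetical property
    return triples


# Hypothetical flat records, as they might come out of an API export.
records = [
    {"id": 2, "user": "u1", "image": "img7", "created": 20},
    {"id": 1, "user": "u1", "image": "img7", "created": 10},
    {"id": 3, "user": "u2", "image": "img7", "created": 15},
]
triples = to_triples(records)
```
]]></preformat>
        <p>Separating the activity from the annotation it generates is what allows a query to ask about what a student did and in which order, independently of the annotation that was eventually stored.</p>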
        <p>
          Data Transformation. We had no access to Cytomine’s MongoDB instance, though we could
download the data via its API.<sup>2</sup> We retrieved the data of one project, consisting of 11 images,
588 users (pseudonymized), 27,185 annotations, 1,571 properties, and 31,507 descriptions. We
used RML [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] with BURP [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] to generate RDF from the data. The University of Liège’s
Cytomine instance has over 175 projects, which indicates the KG’s potential size.
        </p>
        <p>
          Data Annotation. While we have yet to create links to other datasets and other
institutional repositories (e.g., the e-learning platform), we have decided to represent geometries
as geo:wktLiteral values so that we can retrieve activities from certain areas on the images.
As such, we enriched the data with a geometric dimension.
        </p>
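        <p>To make the geometric dimension more concrete, the following sketch serialises an annotation outline as a WKT polygon string of the kind a geo:wktLiteral would carry, and adds a coarse bounding-box test for selecting annotations in a region of an image. The coordinates and function names are hypothetical; in a GeoSPARQL-enabled triple store, a filter function such as geof:sfIntersects would normally take the place of the bounding-box check.</p>
        <preformat><![CDATA[
```python
def to_wkt_polygon(points):
    """Serialise (x, y) vertices as a WKT polygon, closing the ring if needed."""
    ring = list(points)
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three vertices")
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings must end where they start
    coords = ", ".join(f"{x:g} {y:g}" for x, y in ring)
    return f"POLYGON (({coords}))"


def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of vertices."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """Coarse spatial filter: do two bounding boxes share any area or edge?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical pixel coordinates of one annotation and of a region of interest.
annotation = [(10, 10), (40, 10), (40, 30), (10, 30)]
region = [(30, 20), (100, 20), (100, 80), (30, 80)]

wkt = to_wkt_polygon(annotation)
in_region = boxes_overlap(bounding_box(annotation), bounding_box(region))
```
]]></preformat>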
        <p>We recognize that our approach’s major limitation is its inability to transform the data stored
in MongoDB directly. Moreover, Cytomine’s API is fairly restricted: it only allows us to retrieve data when
sufficient restrictions are placed on a request (e.g., retrieving the annotations on a project-per-project basis).
This limitation is beyond our control.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Results</title>
      <p>
        This study yielded a proof-of-concept KG for learning analytics. The KG can be
explored with tools such as Ontodia [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], as shown in Figure 2. The KG currently contains
information on over 27K annotations made by 587 users over one decade, all from the sole
project to which we have access.
      </p>
      <p>To demonstrate that one could engage with the KG for learning analytics, we created a Jupyter
Notebook that retrieved the number of annotations per contributor and used this to determine
the optimal number of clusters using the Elbow Method, as shown in Figure 3.</p>
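      <p>A minimal sketch of such an analysis is given below. It is not the notebook's code: the SPARQL query assumes annotations are linked to users through PROV-O activities, the counts are made up, and a small one-dimensional k-means with a chord-based elbow heuristic stands in for whatever library the notebook relies on.</p>
      <preformat><![CDATA[
```python
# Illustrative query only; it assumes the graph links annotations to users
# through PROV-O activities. It is not executed in this sketch.
QUERY = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?user (COUNT(?annotation) AS ?n)
WHERE {
  ?annotation prov:wasGeneratedBy ?activity .
  ?activity prov:wasAssociatedWith ?user .
}
GROUP BY ?user
"""


def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on one-dimensional data; returns (centroids, inertia)."""
    data = sorted(values)
    # Deterministic initialisation: spread the centroids over the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Empty clusters keep their previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min((v - c) ** 2 for c in centroids) for v in data)
    return centroids, inertia


def elbow(inertias):
    """Return the k whose inertia lies furthest below the chord that joins
    the first and last points of the inertia curve."""
    first, last = inertias[0], inertias[-1]
    steps = len(inertias) - 1
    gaps = [first + (last - first) * i / steps - inertia
            for i, inertia in enumerate(inertias)]
    return gaps.index(max(gaps)) + 1  # inertias[0] corresponds to k = 1


# Hypothetical annotation counts per contributor (the ?n column of the query).
counts = [2, 3, 4, 5, 98, 100, 102, 105, 198, 200, 205, 210]
inertias = [kmeans_1d(counts, k)[1] for k in range(1, 7)]
best_k = elbow(inertias)
```
]]></preformat>
      <p>With these made-up counts, the heuristic selects three clusters, matching the three groups of contributors visible in the data. On real data the inertia curve is usually inspected visually as well, since automatic elbow heuristics can disagree.</p>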
    </sec>
    <sec id="sec-5">
      <title>4. Conclusions</title>
      <p>We reported on the feasibility of creating a KG out of Cytomine, which required transforming
CSV data into RDF. The data we obtained from Cytomine was rather flat. Information about a user’s
activity was implicitly stored but rendered explicit using PROV-O in the KG generation process.
As users’ annotations on slides are stored with geometric coordinates, we adopted GeoSPARQL
to use geospatial predicates. This allows us to analyze interactions on specific regions of
slides, for example. The number of annotations within one project indicates our project’s scale,
knowing there are over 150 projects in Cytomine. Challenges that we will investigate include
the evolution of this KG over time. As we currently have no access to the MongoDB instance,
which is to be expected, we should investigate more elegant ways to generate the KG. One avenue is to
retrieve the data via REST calls in the mapping, which requires the development of bespoke RML
iterators.
        <fn id="fn1">
          <label>1</label>
          <p>The ontology, available at https://chrdebru.github.io/papers/2024-09-semantics/ontology.owl, is not yet
available under a persistent identifier. The ontology will be published in a future iteration of the KG construction.</p>
        </fn>
        <fn id="fn2">
          <label>2</label>
          <p>https://doc.uliege.cytomine.org/dev-guide/api/reference</p>
        </fn>
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <sec id="sec-6-1">
        <p>The authors wish to thank Ulysse Rubens from Cytomine Corporation.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>U.</given-names>
            <surname>Rubens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hoyoux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vanosmael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ouras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tasset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Longuespée</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Marée</surname>
          </string-name>
          ,
          <article-title>Cytomine: Toward an open and collaborative software platform for digital pathology bridged to molecular investigations</article-title>
          ,
          <source>PROTEOMICS - Clinical Applications</source>
          <volume>13</volume>
          (
          <year>2019</year>
          )
          <fpage>1800057</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pirotte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vanhee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Quatresooz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Debruyne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Marée</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Defaweux</surname>
          </string-name>
          ,
          <article-title>Validating instructional design and predicting student performance in histology education: Using machine learning via virtual microscopy</article-title>
          ,
          <source>Anatomical Sciences Education</source>
          <volume>17</volume>
          (
          <year>2024</year>
          )
          <fpage>984</fpage>
          -
          <lpage>997</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V. R.</given-names>
            <surname>Kovertaite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Leclercq</surname>
          </string-name>
          ,
          <article-title>The triple consistency illustrated by e-tivities to help understand national and international policies in e-learning</article-title>
          ,
          <source>International Journal of Technologies in Higher Education</source>
          <volume>3</volume>
          (
          <year>2006</year>
          )
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>d'Aquin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Herder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Drachsler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Taibi</surname>
          </string-name>
          ,
          <article-title>Using linked data in learning analytics</article-title>
          ,
          <source>eLearning Papers</source>
          <volume>36</volume>
          (
          <year>2014</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>McKenzie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdalla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Janowicz</surname>
          </string-name>
          ,
          <article-title>A linked-data-driven web portal for learning analytics: Data enrichment, interactive visualization, and knowledge discovery</article-title>
          ,
          in:
          <source>Workshops at the 4th International Conference on Learning Analytics and Knowledge (LAK 2014)</source>
          , Indianapolis, Indiana, USA, March 24-28, 2014, volume
          <volume>1137</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zouaq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jovanovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Joksimović</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gašević</surname>
          </string-name>
          ,
          <article-title>Linked data for learning analytics: Potentials and challenges</article-title>
          ,
          <source>Handbook of Learning Analytics</source>
          (
          <year>2017</year>
          )
          <fpage>347</fpage>
          -
          <lpage>355</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>d'Aquin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jay</surname>
          </string-name>
          ,
          <article-title>Interpreting data mining results with linked data for learning analytics: motivation, case study and directions</article-title>
          , in:
          <source>Third Conference on Learning Analytics and Knowledge, LAK '13</source>
          , Leuven, Belgium, April 8-12, 2013, ACM,
          <year>2013</year>
          , pp.
          <fpage>155</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lebo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <article-title>PROV-O: The PROV Ontology</article-title>
          ,
          <source>W3C Recommendation, W3C</source>
          ,
          <year>2013</year>
          . https://www.w3.org/TR/2013/REC-prov-o-20130430/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Battle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kolas</surname>
          </string-name>
          ,
          <article-title>GeoSPARQL: enabling a geospatial semantic web</article-title>
          ,
          <source>Semantic Web Journal</source>
          <volume>3</volume>
          (
          <year>2011</year>
          )
          <fpage>355</fpage>
          -
          <lpage>370</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sanderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ciccarese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>Web Annotation Vocabulary</article-title>
          ,
          <source>W3C Recommendation, W3C</source>
          ,
          <year>2017</year>
          . https://www.w3.org/TR/2017/REC-annotation-vocab-20170223/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Iglesias-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Van Assche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Arenas-Guerrero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>De Meester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Debruyne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jozashoori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Maria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <article-title>The RML ontology: A community-driven modular redesign after a decade of experience in mapping heterogeneous data to RDF</article-title>
          , in:
          <source>22nd International Semantic Web Conference - ISWC 2023</source>
          , Athens, Greece, November 6-10, 2023, Proceedings, Part II, volume
          <volume>14266</volume>
          of LNCS, Springer,
          <year>2023</year>
          , pp.
          <fpage>152</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Van Assche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Debruyne</surname>
          </string-name>
          ,
          <article-title>Burping through RML test cases</article-title>
          , in:
          <source>5th International Workshop on Knowledge Graph Construction co-located with ESWC 2024</source>
          , Hersonissos, Greece, May 27, 2024, volume
          <volume>3718</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mouromtsev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Pavlov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Emelyanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Morozov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Razdyakonov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Galkin</surname>
          </string-name>
          ,
          <article-title>The simple web-based tool for visualization and sharing of semantic data and ontologies</article-title>
          , in:
          <source>ISWC 2015 Posters &amp; Demonstrations co-located with ISWC-2015</source>
          , Bethlehem, PA, USA, October 11, 2015, volume
          <volume>1486</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>