<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Modeling for Object Segmentation and Eye Gaze Data in VR Art Exhibitions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Delaram Javdani Rikhtehgar</string-name>
          <email>d.javdanirikhtehgar@utwente.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Batuhan Usta</string-name>
          <email>b.usta@student.utwente.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shenghui Wang</string-name>
          <email>shenghui.wang@utwente.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ontology-based modeling</institution>
          ,
          <addr-line>Eye Tracking, Object Detection, Segmentation, Virtual Reality, Cultural Heritage</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Twente</institution>
          ,
          <addr-line>Enschede</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Virtual reality (VR) art exhibitions provide an immersive platform for experiencing art, yet understanding user interaction and perception within these environments remains a challenge. Traditional VR approaches often fail to offer personalized insights into how users engage with specific elements of an artwork. Object detection and segmentation techniques hold potential by enabling the identification and localization of objects within paintings, thereby adding semantic meaning to user eye-gaze data. In this work, we propose an ontology-based modeling framework that links segmented objects in artworks with user gaze data. This framework allows for the integration of numerical data, such as image segments, and measurement data, including user engagement metrics (e.g., eye-gaze patterns), time series (tracking eye movements across exhibits), and qualitative assessments of user experience. By semantically enriching gaze data through associations with specific objects and concepts within the artwork, we are able to generate insights into gaze-based behaviors, such as individual users' visual attention and navigation. These insights enable the exploration of higher-order interpretations, such as visitor behavior and engagement trends, which are critical for improving user experience and optimizing exhibition design. Our study demonstrates how this ontology was populated with real-world data from a user study and presents further analysis based on recorded numerical data from the virtual environment. Finally, we discuss our findings, limitations, and potential directions for future work.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Virtual Reality (VR) is increasingly used in museums thanks to its potential to enhance visitor
engagement and educational experiences [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1, 2, 3, 4, 5</xref>
        ]. These environments often feature virtual agents that
guide visitors and provide personalized information about artworks, improving both engagement and
attention [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ]. VR also addresses geographical limitations by allowing remote access to exhibitions,
making art more accessible to a global audience [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, while VR museums can showcase multiple
artworks simultaneously, they face significant challenges in efficiently managing and storing large
volumes of data related to artworks and visitor interactions.
      </p>
      <p>
        Knowledge graphs [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] offer a feasible solution for managing this data by storing extensive information
about artworks, including artist details, descriptions, and object locations. This structured information
is crucial for identifying objects of interest and enhancing user engagement [
        <xref ref-type="bibr" rid="ref10 ref6 ref7">6, 7, 10</xref>
        ]. Despite these
advantages, a major challenge remains: determining which specific objects within an artwork capture
visitors’ attention [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Eye-gaze data provides a potential solution, as it reveals where visitors focus
their attention, offering insights into their engagement and behavior [
        <xref ref-type="bibr" rid="ref12">12, 13, 14</xref>
        ]. Integrating
eye-gaze data into knowledge graphs, alongside object location data, allows for precise tracking of user
interactions with specific elements within artworks.
      </p>
      <p>However, manual annotation of objects within artworks is time-consuming and inefficient, especially
in dynamic virtual environments with frequent content updates. Deep learning (DL) models can
automate this process by identifying objects and generating bounding box coordinates [15, 16, 17, 18, 19,
20]. While bounding boxes provide a general localization of objects, they often lack the precision required
to handle overlapping or intricate elements within a painting. Segmentation models, which offer
pixel-level accuracy, address these limitations by providing more precise object boundaries [21, 22, 23, 24, 25].
However, segmentation models frequently struggle to provide meaningful class labels for the
segmented objects [26]. To address this gap, an auto-annotation system combining both object detection
and segmentation is required, delivering both class labels and accurate object boundaries.</p>
      <p>CEUR Workshop Proceedings (ceur-ws.org)</p>
      <p>In this paper, we propose an ontology-based modeling framework to link segmented objects within
paintings to class labels and user gaze data, enriching the semantic context of the visitor’s gaze. This
approach enables the standardization of key components, such as objects, actions, and user preferences,
facilitating interoperability across systems. Additionally, ontology-based models allow for semantic
reasoning and automated inference, generating insights into user behavior and optimizing content
delivery in personalized experiences [27, 28, 29]. Our work addresses the following research questions:
1. How can segmented objects within artworks and eye-gaze data as a time series be formally
represented in a knowledge graph to ensure semantic accuracy and interoperability?
2. What new behavioral insights can be derived when segmented object data and eye-gaze patterns
are semantically enriched and integrated within the ontology?
3. How can gaze patterns and object interaction data be efficiently stored, managed, and queried to
facilitate the extraction of high-level behavioral trends and visitor engagement metrics?</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>Our research integrates multiple key areas, including object detection, segmentation, eye-gaze analysis,
and ontology-based systems, to improve user interaction in Virtual Reality (VR) museum environments.</p>
      <p>Object Detection and Segmentation Models. Object detection and segmentation are key techniques
in computer vision, often used together to enhance the accuracy of object localization and boundary
detection within images. Faster R-CNN and YOLO are widely adopted object detection models because of
their efficiency in generating bounding boxes around objects within images [20, 19]. Faster R-CNN
offers high accuracy through a two-stage process, while YOLO is faster and better suited for real-time
applications, though with slightly reduced accuracy [30, 19].</p>
      <p>However, while object detection offers quick localization, it lacks the pixel-level precision needed
for complex scenes like artworks. Segmentation models such as Mask Region-Based Convolutional
Neural Network (Mask R-CNN) [21], Panoptic Segmentation Model [22], You Only Look Once Version 8
(YOLOv8) Instance Segmentation [23], Fast Segment Anything Model (SAM) [24], and Unified Panoptic
Segmentation Network (UPSNet) [25] address this issue by generating pixel-level segmentation masks.
While SAM offers significant capabilities with minimal manual annotation, its inability to provide
semantic labels limits its use in tasks requiring both localization and classification. To overcome this
limitation, we combine the speed of YOLO for fast object detection with the precision of SAM for
segmentation, ensuring that user gaze interactions are accurately linked to the objects they observe.
Additionally, we use an ontology-based system to assign semantic meaning to the segmented objects,
enabling richer and more interactive user experiences.</p>
      <p>Eye-Gaze Data Interpretation. Eyes can provide more information than just a single focal point.
Research shows that eye movement patterns can be further analyzed to reveal the intentions behind
tasks in various applications [31, 32]. However, eye movement patterns are difficult to extract, and eye-gaze data
alone does not provide sufficient information, as the gaze only gives coordinates. To make meaning
out of these gazes, semantic context is necessary. Deep learning models can supply this semantic context
by identifying and locating the relevant regions of an image. Koochaki and
Najafizadeh proposed a method to interpret these gazes by incorporating semantic context to make
meaningful conclusions [32]. However, few approaches have combined these insights with structured
ontologies. Our work builds on this by using ontologies as a semantic framework to interpret real-time
eye-gaze data, enabling dynamic, personalized content adaptation in virtual museums.</p>
      <p>Ontology-Based Systems in Human-Computer Interaction. In recent years, ontologies have
become vital tools in enhancing human-computer interaction (HCI) by providing structured, semantic
representations of knowledge. Chang et al. [33] developed an ontology-based knowledge model for an
integrated robot framework, enabling interactive human-robot services. Their model incorporates both
general and domain-specific knowledge, structured into five key ontologies: user, robot, perception,
environment, and action. This framework was tested in a social robot service, demonstrating its
usability and effectiveness. Similarly, Costa et al. [34] introduced HCIO (Human-Computer Interaction
Ontology), rooted in the Unified Foundational Ontology (UFO), to address semantic interoperability
in HCI. HCIO is composed of three sub-ontologies (Interactive Computer System, User, and
Human-Computer Interaction) and serves as a reference model for communication and learning in the HCI
domain. Castro et al. [35] extended this work by introducing HCIDO (Human-Computer Interaction
Design Ontology), part of the broader HCI-ON (Human-Computer Interaction Ontology Network).
HCIDO supports the development of KTID (Knowledge Supporting Tool for HCI Design), facilitating
knowledge sharing, especially among novice designers. Freitas et al. [36] explored ontologies to support
Adaptive User Interface (AUI) systems, enabling real-time interface adjustments tailored to users with
low vision or color blindness. These examples highlight the growing role of ontologies in structuring
knowledge for more adaptive and inclusive HCI systems. However, previous work in HCI ontologies
does not fully address the needs of dynamic user interaction with visual elements in VR environments.</p>
      <p>Multimodal Technologies in Virtual Museums. Within the domain of Virtual Heritage, multimodal
technologies have proven instrumental in creating immersive and engaging experiences, particularly in
virtual museums [37, 38]. These technologies often incorporate conversational agents that act as guides,
enhancing information accessibility and user engagement [39, 40, 41]. Head-mounted displays (HMDs)
further enrich these experiences by integrating various input sensors, such as those tracking hand
movements, speech, and eye and head motions, which collectively enhance the sense of immersion [42].
Among these, eye-tracking technology has emerged as a powerful tool in virtual reality (VR), providing
rich insights into user behavior and interaction. Eye-tracking enables the customization of content
based on users’ gaze patterns, offering a more personalized and adaptive experience [43, 44]. Beyond
interaction, eye-tracking also facilitates emotional and learning assessments, adding further dimensions
to the analysis of user experience in virtual environments [45, 46].</p>
      <p>Although multimodal technologies have significantly enhanced user immersion in virtual museums,
there remains a lack of systems that integrate gaze data with semantic representation to provide
context-aware, personalized content. Our framework bridges this gap by combining eye-tracking data with an
ontology-based system that enriches interactions with semantic meaning.</p>
      <p>In summary, while significant progress has been made in object detection, segmentation, and the use
of ontologies in HCI, there remains a gap in integrating these techniques to enhance user interactions
in VR environments. Our research addresses this gap by combining deep learning for object detection
and segmentation with ontology-based semantic representations, all informed by real-time eye-gaze
data, to create personalized, adaptive experiences in virtual museum settings.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Ontology-based Data Modeling for User Interaction in VR</title>
      <p>Motivational context. As virtual reality (VR) technology evolves, it creates new opportunities for
users to interact with digital environments in personalized and engaging ways. Our VR art exhibition
(Figure 3), which features 19 paintings, aims to model visitor interactions by integrating pre-processed
numerical data from object detection and segmentation with real-time eye-gaze tracking data. The key
to achieving this dynamic interaction lies in an ontology-based framework that systematically organizes
and models both types of numerical data, allowing for personalized and adaptive user experiences.</p>
      <p>Before a visitor enters the VR environment, each painting undergoes object detection and
segmentation. This process identifies key elements, such as figures and objects, and generates numerical
data in the form of segmentation masks and spatial coordinates. This pre-processed data is essential
for understanding the structure of the artwork and is stored in an ontology to represent the spatial
relationships and semantic meaning of each object.</p>
      <p>When a visitor enters the VR environment, their eye-gaze data—captured in real time as spatial
coordinates—is linked to this pre-processed object data via the ontology. The ontological model plays
a critical role in mapping real-time gaze points to specific objects within the paintings by using the
pre-stored segmentation data. This allows the system to interpret where the visitor is focusing and
deliver tailored information about the object, such as its historical significance or artistic relevance.</p>
      <p>By organizing both the pre-processed numerical data (segmentation masks, object coordinates) and
the real-time gaze data within the ontology, the system gains the ability to semantically interpret and
respond to user behavior. The ontology provides a framework to represent not only static information
about the artworks but also dynamic user interactions. This enables the conversational agent to
adapt to each visitor’s unique engagement patterns, responding to prolonged gazes with more detailed
information or suggesting related artworks.</p>
      <p>The ontological modeling of these two types of numerical data—pre-processed spatial data from object
detection and real-time eye-gaze coordinates—creates a seamless bridge between passive observation
and active engagement. The system not only tracks user behavior but also uses it to generate new
insights, such as identifying which parts of the exhibition are most engaging or how users interact
with different types of artwork. By structuring this information within an ontology, we ensure that the
system can manage and query both pre-processed and real-time data efficiently, providing a flexible
and scalable solution for modeling user interactions in VR environments.</p>
      <sec id="sec-4-1">
        <title>3.1. Obtaining Segmentation Data (Pre-Processed Data)</title>
        <p>To analyze participants’ gaze patterns and understand which objects they are focusing on during the
VR exhibition, we first needed to obtain detailed numerical segmentation data for the objects within
the paintings. This was achieved through a combination of object detection and segmentation models.</p>
        <sec id="sec-4-1-1">
          <p>We employed YOLOv8 (https://docs.ultralytics.com/models/yolov8/#key-features), a state-of-the-art object detection model, which can recognize up to 601 class
labels. YOLOv8 was used to detect prominent objects in the paintings, such as people, buildings, and
other significant elements. Each detected object was assigned a class label (e.g., “Person”, “Building”),
providing semantic meaning to the identified elements within the artwork.</p>
          <p>Once the objects were detected, we applied the Segment Anything Model (SAM) from the Ultralytics
library (https://docs.ultralytics.com/models/sam/#key-features-of-the-segment-anything-model-sam) to achieve pixel-level segmentation. SAM generated segmentation masks, which define the
precise location of each detected object. These segmentation masks were normalized to a coordinate
system where all points lie within the range from [0.00, 0.00] (top-left corner) to [1.00, 1.00]
(bottom-right corner). This normalization ensures consistency in how objects are represented across
different paintings of varying sizes.</p>
          <p>Once the objects were detected and segmented, we incorporated the data into our ontology following
these specific requirements:
1. Detected Objects as Areas of Interest (AOIs):
• Each object detected in the painting (e.g., a person or building) must be represented as an
Area of Interest (AOI) within the ontology.
• The AOI is semantically linked to a class label (e.g., “Person”, “Building”), which defines the
type of object being observed.
2. Segmentation Masks as lists of coordinates:
• Each AOI must be associated with a segmentation mask that defines the pixel-level location
of the object, along with the painting in which the object is located.
• These segmentation masks are stored in the ontology as a list of normalized coordinates
[[x<sub>1</sub>, y<sub>1</sub>], [x<sub>2</sub>, y<sub>2</sub>], …, [x<sub>n</sub>, y<sub>n</sub>]], ensuring the precise location of the AOI can be used to match
real-time gaze data during the exhibition.</p>
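          <p>The normalization step described above can be illustrated with a minimal sketch. It assumes the mask outline is available as a polygon in pixel coordinates together with the image width and height; the function name normalize_mask, the rounding precision, and the example image size are illustrative assumptions, not part of the authors' pipeline.</p>
          <preformat>
```python
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_mask(polygon_px: List[Point], width: int, height: int,
                   digits: int = 4) -> List[List[float]]:
    """Scale pixel coordinates to [0, 1] with (0, 0) at the top-left corner."""
    if not (width > 0 and height > 0):
        raise ValueError("image width and height must be positive")
    normalized = []
    for x_px, y_px in polygon_px:
        # Clamp so off-by-one pixels never leave the unit square.
        x = min(max(x_px / width, 0.0), 1.0)
        y = min(max(y_px / height, 0.0), 1.0)
        normalized.append([round(x, digits), round(y, digits)])
    return normalized


# Hypothetical 800 by 600 pixel painting with a rectangular mask outline.
mask = normalize_mask([(200, 150), (600, 150), (600, 450), (200, 450)], 800, 600)
print(mask)  # [[0.25, 0.25], [0.75, 0.25], [0.75, 0.75], [0.25, 0.75]]
```
          </preformat>
          <p>Because every painting is mapped onto the same unit square, a gaze point normalized in the same way can be compared with any mask without knowing the original image resolution.</p>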
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Collecting real-time eye gaze data stream</title>
        <p>Once the objects in the paintings were represented through normalized segmentation data, the next step
involved integrating real-time eye-gaze data captured during the VR exhibition. This process required
storing and representing the gaze data in the ontology in a manner that enables dynamic interaction
with the previously identified Areas of Interest (AOIs).</p>
        <p>In the VR environment, gaze points are initially captured as three-dimensional coordinates [x, y, z],
which represent the user’s view in the VR space. Each painting is also represented by a 3D array that
defines its position within this virtual environment. When the user focuses on a specific painting, the
gaze data must be converted from 3D coordinates to 2D coordinates [x, y], which correspond to the
surface of the painting. This transformation allows the gaze data to align with the 2D segmentation
masks previously generated for the painting’s AOIs. Like the segmentation data, these gaze points are
normalized to the range from [0.00, 0.00] (top-left corner) to [1.00, 1.00] (bottom-right corner). This normalization
ensures that the gaze data can be directly compared with the normalized segmentation masks of the
AOIs, regardless of the painting’s original dimensions.</p>
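        <p>One possible way to perform the 3D-to-2D conversion is sketched below. It assumes that each painting is a flat rectangle whose top-left, top-right, and bottom-left corners are known in world coordinates, and that the gaze ray has already been intersected with the painting surface. The function names and the corner-based representation are assumptions made for illustration; the paper does not specify how the transformation was implemented.</p>
        <preformat>
```python
from typing import Sequence, Tuple

Vec3 = Sequence[float]


def _sub(a: Vec3, b: Vec3) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def gaze_to_painting_xy(hit: Vec3, top_left: Vec3, top_right: Vec3,
                        bottom_left: Vec3) -> Tuple[float, float]:
    """Map a 3D gaze hit point on a rectangular painting to normalized [x, y].

    x runs from the left edge (0) to the right edge (1) and y from the top
    edge (0) to the bottom edge (1), matching the segmentation masks.
    """
    right = _sub(top_right, top_left)   # vector along the top edge
    down = _sub(bottom_left, top_left)  # vector along the left edge
    offset = _sub(hit, top_left)
    # Projecting onto an edge and dividing by its squared length gives a fraction.
    x = _dot(offset, right) / _dot(right, right)
    y = _dot(offset, down) / _dot(down, down)
    return (x, y)


def on_painting(x: float, y: float) -> bool:
    """True when the normalized point lies on the painting surface."""
    return 1.0 >= x >= 0.0 and 1.0 >= y >= 0.0


# Hypothetical painting, 2 m wide and 1 m high, hanging on the wall z = 3.
xy = gaze_to_painting_xy((0.5, 1.75, 3.0), (0.0, 2.0, 3.0), (2.0, 2.0, 3.0), (0.0, 1.0, 3.0))
print(xy, on_painting(*xy))  # (0.25, 0.25) True
```
        </preformat>
        <p>The projection relies on the two edge vectors being perpendicular, which holds for rectangular canvases; gaze samples for which on_painting returns False would simply not be linked to that painting.</p>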
        <p>As users explore the virtual environment, their gaze points are continuously recorded. To effectively
represent the real-time eye gaze data stream, the following requirements are implemented:
1. Tracking Gaze Interactions:
• Gaze points are stored in the ontology as part of an Observing action. This action includes
the normalized 2D gaze coordinates, the painting being observed, the actor performing the
action, and the event within which the action occurs.
• If a gaze point falls within the coordinates of an Area of Interest (AOI) associated with the
painting, the system logs that the user is observing that specific object (e.g., a person or
building). This relationship is recorded in the ontology, linking the user’s gaze data to the
corresponding AOI. This allows for precise tracking of what the user is focusing on at any
given moment.
2. Capturing Temporal Data for Gaze Analysis:
• The ontology also captures temporal data related to the user’s gaze interactions, including
the start time of the gaze on each AOI and the duration of the observation.
• This temporal information is crucial for analyzing gaze patterns, such as the sequence in
which objects are viewed or identifying areas where the user’s attention is sustained for
longer periods. Such insights provide a deeper understanding of user engagement with
specific elements in the artwork.</p>
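        <p>The test of whether a gaze point falls within an AOI can be sketched as a point-in-polygon check over the normalized mask coordinates. The even-odd ray-casting rule used here is a standard technique chosen for illustration; the paper does not state which containment test was used, and the AOI names below are hypothetical.</p>
        <preformat>
```python
from typing import Dict, Optional, Sequence

Polygon = Sequence[Sequence[float]]


def point_in_polygon(x: float, y: float, polygon: Polygon) -> bool:
    """Even-odd ray casting: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the gaze point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


def match_aoi(x: float, y: float, aois: Dict[str, Polygon]) -> Optional[str]:
    """Return the name of the first AOI containing the gaze point, if any."""
    for name, polygon in aois.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None


# Hypothetical AOIs of one painting, stored as normalized coordinate lists.
aois = {
    "aoi_person": [[0.10, 0.20], [0.40, 0.20], [0.40, 0.80], [0.10, 0.80]],
    "aoi_building": [[0.55, 0.10], [0.95, 0.10], [0.95, 0.60], [0.55, 0.60]],
}
print(match_aoi(0.25, 0.50, aois))  # aoi_person
print(match_aoi(0.50, 0.90, aois))  # None
```
        </preformat>
        <p>When masks overlap, this sketch returns the first match in dictionary order; a real system would need an explicit rule, for example preferring the smallest enclosing AOI.</p>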
        <p>By meeting these ontological requirements, the system ensures that real-time gaze data can be
seamlessly integrated with pre-processed segmentation data, enabling dynamic, personalized interactions
throughout the VR experience.</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Ontology Components</title>
        <p>The ontology models both static elements of the virtual environment (such as paintings and their
attributes) and dynamic data related to user interactions (such as gaze patterns and actions). Figure 2
illustrates a sub-ontology designed to represent these relationships. Below, we outline the core elements
of the ontology and their roles in eye-gaze-based user behavior modeling.</p>
        <p>3.3.1. Static Data Components</p>
        <p>The Environment segment defines the virtual space, encompassing different VR configurations such
as multiple rooms (e.g., three distinct exhibition rooms). It also stores domain context, such as cultural
heritage, and high-level contextual information about the exhibition (e.g., an exhibition titled “HERE:
Black in Rembrandt’s Time”). These environmental attributes are essential for providing context to user
interactions and spatial navigation.</p>
        <p>The Object segment provides detailed information about virtual elements within the environment,
primarily focused on the Paintings displayed in the exhibition. Each painting is associated with
metadata, including the name, description, and narrative related to the artwork. Additionally, specific
Areas of Interest (AOI) within the paintings are defined, accompanied by sets of coordinates and
relevant semantics. These AOIs can represent particular elements, such as persons or buildings. Each
AOI is linked to:
• Segmentation masks, represented as MultiCoordinates, which store the normalized
coordinates of the AOI.
• Class labels that provide semantic meaning to the AOIs (e.g., “Person,” “Building”). These labels
facilitate more meaningful user interaction and content retrieval.
• Cultural knowledge graphs that link certain elements (such as historically significant figures or
locations) to broader datasets, enhancing the richness of the interaction.</p>
        <p>Efficient Representation of Segmentation Masks. In the ontology, segmentation masks are stored
as a list of normalized coordinates, which is represented as a single array rather than individual nodes
for each coordinate. This compact representation improves:
• Efficiency: Reduces the overhead of managing numerous individual nodes, simplifying data
manipulation.
• Query Performance: Makes it easier to perform comparisons and queries involving segmentation
masks, especially when matching real-time gaze points with AOIs.
• Semantic Clarity: Storing segmentation masks as arrays aligns intuitively with the way they
define spatial areas, making the model easier to understand and navigate.</p>
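        <p>As a rough illustration of the single-array representation, the sketch below serializes a mask into one compact string that could be attached to an AOI as a single literal value and parsed back when needed. JSON is used here only as an assumed encoding; the paper does not state how MultiCoordinates values are serialized, and the helper names are hypothetical.</p>
        <preformat>
```python
import json
from typing import List


def mask_to_literal(coords: List[List[float]]) -> str:
    """Serialize a mask as one compact JSON array string."""
    return json.dumps(coords, separators=(",", ":"))


def literal_to_mask(literal: str) -> List[List[float]]:
    """Recover the coordinate list from the stored string."""
    return [[float(x), float(y)] for x, y in json.loads(literal)]


mask = [[0.25, 0.25], [0.75, 0.25], [0.75, 0.75], [0.25, 0.75]]
literal = mask_to_literal(mask)
print(literal)  # [[0.25,0.25],[0.75,0.25],[0.75,0.75],[0.25,0.75]]
```
        </preformat>
        <p>With this encoding, a mask of any length occupies one value per AOI, whereas a node-per-coordinate model would add at least one node and several triples for every point of the outline.</p>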
        <p>This design also facilitates easier future adjustments or dynamic modifications without the need to
restructure the entire ontology, ensuring the model is flexible and scalable.</p>
        <p>3.3.2. Dynamic Data Components</p>
        <p>While the Environment and Object segments are static, representing predefined data about the VR
space and paintings, the ontology must also accommodate dynamic data that evolves based on user
interactions during the exhibition.</p>
        <p>The Actor component encapsulates information about users navigating the virtual environment,
participating in events, and performing actions. It stores user information, such as their familiarity
with the VR environment, knowledge about the exhibition, and demographic details (e.g., age, culture,
education, gender, and language). This information is critical for analyzing user engagement and
personalizing interactions.</p>
        <p>The Action concept captures the different actions users perform during the exhibition, including
gaze-related Observing actions and conversation-related actions such as asking a question, answering a question,
commenting, or providing feedback.</p>
        <p>To fully analyze gaze patterns and user engagement, the ontology tracks temporal data for each
interaction, including StartTime and Duration. For a gaze-related Observing action, these capture
when the user begins interacting with an AOI and the length of time the user’s gaze remains on that
AOI, allowing for in-depth analysis of attention patterns, such as prolonged focus on particular areas or
sequences of viewed objects.</p>
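        <p>The StartTime and Duration of an Observing action can be derived from the raw gaze stream by merging consecutive samples that fall on the same AOI. The sketch below shows one way to do this, assuming samples arrive as (timestamp in milliseconds, AOI name or None) pairs in chronological order; the Observation record and the choice to measure duration from the first to the last sample of a run are illustrative assumptions.</p>
        <preformat>
```python
from typing import List, NamedTuple, Optional, Tuple


class Observation(NamedTuple):
    aoi: str
    start_time: int  # ms, timestamp of the first sample on the AOI
    duration: int    # ms between the first and last sample on the AOI


def group_observations(samples: List[Tuple[int, Optional[str]]]) -> List[Observation]:
    """Collapse chronological (timestamp, aoi_or_None) samples into observations."""
    observations: List[Observation] = []
    current: Optional[str] = None
    start = last = 0
    for timestamp, aoi in samples:
        if aoi != current:
            # The gaze moved to another AOI (or off all AOIs): close the open run.
            if current is not None:
                observations.append(Observation(current, start, last - start))
            current, start = aoi, timestamp
        last = timestamp
    if current is not None:
        observations.append(Observation(current, start, last - start))
    return observations


# Hypothetical stream sampled every 100 ms.
samples = [(0, "aoi_man"), (100, "aoi_man"), (200, "aoi_man"),
           (300, None), (400, "aoi_table"), (500, "aoi_table")]
for observation in group_observations(samples):
    print(observation)
```
        </preformat>
        <p>Each resulting record corresponds to one Observing action with its StartTime and Duration; a run consisting of a single sample receives a duration of zero under this definition.</p>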
      </sec>
      <sec id="sec-4-4">
        <title>3.4. Populating and Querying the Ontology for Eye-Gaze Analysis</title>
        <p>The following sections illustrate how to populate and query the ontology to connect gaze data with
AOIs and extract meaningful insights.</p>
        <p>3.4.1. Populating the Ontology with Gaze Data and AOIs</p>
        <p>Listing 1 provides a SPARQL query for inserting a new observation (gaze point) into the ontology.
The query links the user’s gaze point to the corresponding AOI by comparing the normalized gaze
coordinates [x, y] with the MultiCoordinates of the AOI.</p>
        <sec id="sec-4-4-1">
          <title>Listing 1: Adding a new gaze point (X,Y) and connecting it to the corresponding AOI</title>
          <preformat># Assumes the default prefix ":" is bound to the ontology namespace.
INSERT {
  # Create a new action instance labeled 'action_1'
  # (Observing is the class queried in Listings 2-4)
  :action_1 a :Observing ;
            :interacting_object ?object ;   # Specify interacting object
            :interacting_aoi ?aoi ;         # Specify interacting AOI
            :has_coordinates ?coordXY .     # Specify gaze point
  # Connect the actor performing the new action
  ?actor :performs :action_1 .
}
WHERE {
  # Find the painting and its AOI related to the coordinates [X,Y]
  ?object a :Painting ;
          :has_AOI ?aoi .
  ?aoi :has_segmentation_coordinates ?MultiCoordinates .
  # (x, y) is a placeholder for the normalized gaze point; contains() stands for a
  # point-in-polygon test (e.g., a custom extension function), not the SPARQL string function.
  BIND((x, y) AS ?coordXY)
  FILTER(contains(?MultiCoordinates, ?coordXY))
  # Find the actor with name 'actor_1'
  ?actor a :Actor ;
         :actor_name "actor_1" .
}</preformat>
          <p>3.4.2. Analyzing Gaze Patterns through Temporal Data</p>
          <p>Inferences about user behavior, such as decision-making processes and attentional patterns, can be
drawn from gaze-based metrics like Dwell Time, Fixation Count, and Transitions between AOIs [47].
Listing 2 shows how to calculate Dwell Time, which represents the sum of gaze fixation durations on a
specific AOI.</p>
        </sec>
        <sec id="sec-4-4-2">
          <title>Listing 2: Calculating Dwell Time</title>
          <preformat>SELECT (SUM(?durationValue) AS ?DwellTime)
WHERE {
  ?actor :performs ?action .          # An actor performs an action.
  ?action a :Observing .
  ?action :interacting_aoi ?aoi .     # The action interacts with an AOI.
  ?aoi a :AOI ;                       # The AOI is of type AOI and its name is "aoi_1".
       :aoi_name "aoi_1" .
  ?action :has_time ?duration .       # The action has a duration.
  ?duration a :Duration ;             # The duration has a value.
            :value ?durationValue .
}</preformat>
          <p>3.4.3. Comparing Actor Behavior Across AOIs</p>
          <p>The ontology allows for the comparison of multiple actors’ behavior by analyzing how often they
interact with different AOIs. Listing 3 presents a SPARQL query that calculates how frequently each
actor visits an AOI and compares these observation frequencies across actors.</p>
        </sec>
        <sec id="sec-4-4-3">
          <title>Listing 3: Comparative analysis of AOI observation frequencies</title>
          <preformat>SELECT ?actor ?aoi (COUNT(?action) AS ?observationCount)
WHERE {
  ?actor :performs ?action .
  ?action a :Observing .
  ?action :interacting_aoi ?aoi .
  ?aoi a :AOI .
}
GROUP BY ?actor ?aoi
ORDER BY DESC(?observationCount)</preformat>
          <p>3.4.4. Extracting Gaze Sequences</p>
          <p>To analyze the sequential patterns in which users observe different AOIs, Listing 4 retrieves gaze
observations in chronological order. This query provides information on the start time and duration of
each observation, allowing the system to track the order in which AOIs are viewed and analyze users’
attention shifts.</p>
        </sec>
        <sec id="sec-4-4-4">
          <title>Listing 4: Extracting transitional sequences of visiting AOI</title>
          <preformat>SELECT ?actor ?aoi ?startTimeValue ?durationValue
WHERE {
  ?actor :performs ?action .
  ?action a :Observing .
  ?action :interacting_aoi ?aoi .
  ?aoi a :AOI .
  # Fetching the Start Time
  ?action :has_time ?startTime .
  ?startTime a :StartTime .
  ?startTime :value ?startTimeValue .
  # Fetching the Duration
  ?action :has_time ?duration .
  ?duration a :Duration .
  ?duration :value ?durationValue .
}
ORDER BY ?startTimeValue</preformat>
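          <p>Once the rows of Listing 4 have been retrieved for a single actor, metrics such as Dwell Time and Transitions between AOIs can be computed client-side. The following sketch assumes each row has been reduced to an (AOI, start time in ms, duration in ms) tuple; the row layout, function names, and sample values are hypothetical and serve only to illustrate the calculation.</p>
          <preformat>
```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[str, int, int]  # (aoi, start_time_ms, duration_ms)


def dwell_time(rows: List[Row]) -> Dict[str, int]:
    """Total observation time per AOI, i.e. the sum of its durations."""
    totals: Counter = Counter()
    for aoi, _start, duration in rows:
        totals[aoi] += duration
    return dict(totals)


def transitions(rows: List[Row]) -> Dict[Tuple[str, str], int]:
    """Count how often the gaze moves from one AOI to a different AOI."""
    ordered = sorted(rows, key=lambda row: row[1])  # chronological order
    counts: Counter = Counter()
    for (src, _, _), (dst, _, _) in zip(ordered, ordered[1:]):
        if src != dst:
            counts[(src, dst)] += 1
    return dict(counts)


# Hypothetical query result for one actor.
rows = [("aoi_man", 0, 200), ("aoi_table", 400, 100),
        ("aoi_man", 600, 300), ("aoi_dress", 1000, 150)]
print(dwell_time(rows))   # {'aoi_man': 500, 'aoi_table': 100, 'aoi_dress': 150}
print(transitions(rows))  # three transitions, each observed once
```
          </preformat>
          <p>The same dwell-time totals could be obtained directly in SPARQL by grouping Listing 2 by AOI; computing transitions, by contrast, depends on the ordering of consecutive rows and is often simpler to express outside the query.</p>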
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Case study: Analyzing gaze patterns and user engagement in virtual exhibitions</title>
      <sec id="sec-5-1">
        <title>4.1. VR exhibition setup</title>
        <p>In 2020, the Museum Rembrandthuis3 organized a special exhibition titled “HERE: Black in Rembrandt’s
Time.” To preserve this physical exhibition, we developed a virtual exhibition using Unity, featuring
a selection of 19 paintings arranged across three thematic rooms (see Figure 3). Each painting, along
with its accompanying textual descriptions, was modeled as a separate virtual object, enabling us to
precisely track and analyze visitor interactions.</p>
        <p>The development of this VR experience allowed us to address a key challenge in cultural heritage:
understanding how visitors engage with individual elements of an exhibition. By utilizing advanced
eye-tracking technology, we were able to monitor and interpret the specific elements that captured
users’ attention throughout their virtual visit.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. User study and eye-tracking</title>
        <p>To assess user interaction with the virtual artworks, two user studies were conducted with a group of
24 high school students (aged 16–18). Participants were equipped with HTC VIVE Pro Eye headsets,
allowing for real-time tracking of their eye movements during the exhibition. The eye-gaze data
collected included timestamps, painting identifiers, and x-y gaze coordinates for each painting. This
comprehensive dataset formed the foundation for our gaze behavior analysis.</p>
        <p>Our study contributes to VR exhibition research by providing an ontology-driven framework to link
eye-tracking data to specific objects within the paintings. This allows for a detailed analysis of how
users visually explore different areas of the artwork, enhancing our understanding of their attention
patterns.</p>
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Analysis of Gaze Patterns and Engagement</title>
        <p>Using object detection models like YOLOv8 and segmentation techniques via the Segment Anything
Model (SAM), we detected key objects within each painting and linked them to semantic class labels
(e.g., “Man,” “Dress,” “Table”). An example of a segmented painting is depicted in the top left of Figure 4.
This image showcases five objects, each uniquely colored for enhanced visualization: a man, a woman,
a piece of clothing, a dress, and a table. These objects were stored as Areas of Interest (AOIs) in our
ontology, providing a structured framework for integrating gaze data with the content of the paintings.
The segmentation results and class annotations were overlaid onto the paintings, allowing us to study
user gaze interactions with specific objects.</p>
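        <p>How a single gaze sample is resolved to an AOI can be sketched as a lookup in a per-painting label mask. The Python sketch below assumes gaze coordinates normalized to [0, 1] with the origin at the top-left corner of the painting, and a mask that stores one class label per cell; both are assumptions made for illustration, as are the tiny mask and the function name aoi_at. It covers only the final lookup step, not the YOLOv8 and SAM processing that produces the segments.</p>
        <p>
```python
# Hypothetical 4x4 label mask for one painting. Each cell holds the class
# label of the segment covering it, or None where no object was detected.
MASK = [
    ["Man", "Man", None, None],
    ["Man", "Man", None, "Table"],
    [None, None, None, "Table"],
    ["Dress", "Dress", None, None],
]

BACKGROUND = "Background"


def aoi_at(mask, x, y):
    """Return the AOI label under a normalized gaze point (x, y).

    Points outside the painting, or on cells without a segment, are
    reported as background.
    """
    height, width = len(mask), len(mask[0])
    inside = x >= 0.0 and y >= 0.0 and 1.0 >= x and 1.0 >= y
    if not inside:
        return BACKGROUND
    # Clamp so that x == 1.0 or y == 1.0 maps to the last column or row.
    col = min(int(x * width), width - 1)
    row = min(int(y * height), height - 1)
    label = mask[row][col]
    return label if label is not None else BACKGROUND


if __name__ == "__main__":
    for point in [(0.1, 0.1), (0.9, 0.3), (0.6, 0.6), (1.5, 0.2)]:
        print(point, aoi_at(MASK, *point))
```
        </p>
        <p>Treating every unsegmented cell as background mirrors the convention described in the text, which also means that any object missed by the detector is counted as background.</p>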
        <p>Table 1 displays the total time each participant spent on each type of object. We observe diverse
behaviors among users: some participants did not look at every identified object (e.g., participants P2,
P8, and P19). Others spent very little time on the identified objects, like participants P8 and P15, while
some, such as P22, spent considerable time on the majority of identified objects. Additionally, some
participants exhibited mixed patterns in their interaction with the objects; for instance, participant P17
spent significant time on certain objects and very little time on others.</p>
        <p>Table 2 provides information about the number of paintings each participant viewed, as well as the
number of paintings in which they spent more time looking at objects rather than the background. The
data shows that in the majority of paintings, most participants primarily focused on the background.
Percentages of time spent on objects confirm this, with exceptions noted for P2, P14, P17, and P19,
who focused more on objects than the background. However, the total and average number of times
participants looked at objects exceeded those for the background across all participants. This indicates
that while participants spent the majority of their time on the background, they visited the objects
more frequently than the background.</p>
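        <p>The distinction between time spent and visit frequency can be made concrete with a small Python sketch. It assumes a stream of per-sample AOI labels recorded at a fixed sampling interval and treats each maximal run of identical labels as one visit; the sampling interval, the label stream, and the function name dwell_and_visits are hypothetical and may differ from how the measures in Table 2 were computed.</p>
        <p>
```python
from itertools import groupby

# Hypothetical per-sample labels for one participant and one painting.
LABELS = (
    ["Background"] * 6
    + ["Man", "Dress"]
    + ["Background"] * 4
    + ["Table"]
)


def dwell_and_visits(labels, sample_dt):
    """Summarize a label stream into an object time share and visit counts.

    A visit is a maximal run of consecutive samples with the same label,
    so moving from one object directly to another counts as a new visit.
    """
    time_on = {"object": 0.0, "background": 0.0}
    visits = {"object": 0, "background": 0}
    for label, run in groupby(labels):
        kind = "background" if label == "Background" else "object"
        time_on[kind] += sample_dt * len(list(run))
        visits[kind] += 1
    total = time_on["object"] + time_on["background"]
    share = time_on["object"] / total if total else 0.0
    return share, visits


if __name__ == "__main__":
    share, visits = dwell_and_visits(LABELS, sample_dt=0.1)
    print(f"share of time on objects: {share:.2f}", visits)
```
        </p>
        <p>In this made-up stream, objects account for 3 of 13 samples yet receive three visits against two for the background, the same qualitative pattern as the one described for Table 2.</p>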
        <p>Such insights can inform the design of personalized virtual guides or interactive agents that adapt
to users’ interests, offering tailored recommendations or additional information based on their gaze
behavior. For example, if a participant shows prolonged interest in a particular object, the virtual agent
could provide additional historical or artistic context, enriching the user’s experience. This ability to
dynamically adapt to user behavior represents a significant advancement in virtual cultural heritage
experiences.</p>
        <p>[Table 1 caption, partially recovered: “... considered background) in the paintings. Bold numbers indicate the classes that received the highest gaze time.”]</p>
        <p>[Table 2 caption, partially recovered: “Number of visited paintings (#p), instances where participants focused more on objects than the background (#f), mean percentage of time spent on objects (%o), and the frequencies of visits to objects (#o) and background ...”]</p>
      </sec>
      <sec id="sec-5-4">
        <title>4.4. Gaze transition sequences</title>
        <p>Using the SPARQL query provided in Listing 4, we extracted the transitional sequences for each
participant who spent time observing the example painting in Figure 4. From the sequences, we observed that participant
P3 primarily focused on the dress of the lady on the right, briefly shifted attention to her face, then moved
to the table on the left. Their gaze alternated between the table and the dress. Similarly, participant P15
scanned briefly from the man and woman to the woman’s dress, then back to the table before moving
on to other paintings. In contrast, participants P21 and P22 dedicated substantial time exploring each
object in the exhibition. P21 focused more on the lady and her dress on the right, whereas P22 showed
interest in the man and the table on the left.</p>
        <p>These insights offer a deeper understanding of how users’ attention moves across different elements
of a painting, providing valuable data for curators and designers to optimize exhibit layouts and content
presentation. By linking gaze data with AOIs and analyzing gaze durations, our system is capable
of generating high-level behavioral trends, which can be used to enhance user engagement. These
findings contribute to the broader goal of creating more interactive and personalized cultural heritage
experiences in VR environments.</p>
        <p>[Figure residue: panels labeled P15 and P22; AOI labels Woman, Man, Dress, Clothing, Table.]</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Discussion and conclusion</title>
      <p>A critical limitation in this study is the performance of the object detection model, particularly for
lower-resolution paintings. Despite predefined class labels, some objects were misclassified as background,
skewing the analysis of user gaze patterns. Improving object detection accuracy – by incorporating
additional class labels and domain-specific datasets, particularly for classical artworks – will be essential
for enhancing the precision of gaze-based analysis in future work.</p>
      <p>Despite these challenges, our work contributes to the field of knowledge management for numerical
modeling by integrating segmented object data and real-time eye-gaze data into a structured,
ontology-based framework. This allows for a comprehensive analysis of user interactions in virtual art exhibitions.
By linking eye-gaze data to specific Areas of Interest (AOIs), identified through object segmentation,
we provide a semantic layer that enhances the understanding of user behavior. This enriched data
supports personalized virtual guides that can adapt in real time to visitor engagement, offering tailored
experiences based on gaze patterns.</p>
      <p>Our approach also demonstrates the value of gaze data in informing knowledge management practices,
particularly by capturing visitor behavior at a granular level. The identification of different visitor
types based on gaze patterns enables the development of personalized, automated tours and the management of
complex datasets to support adaptive, data-driven experiences. Additionally, insights into demographic
engagement patterns, such as shorter attention spans in younger visitors, can inform the design of
more effective cultural heritage applications.</p>
      <p>Managing the large volume of gaze data within the ontology required balancing data granularity
with memory efficiency. To address this, we propose that future versions selectively store key metrics such as fixation
counts, transitions between AOIs, and time spent on objects. This would ensure that
meaningful engagement data is retained while minimizing the system’s data footprint, enhancing the
scalability of the framework for larger exhibitions and more complex datasets.</p>
      <p>In conclusion, our ontology-based framework successfully integrates segmented object data with gaze
patterns, offering a novel approach to understanding user behaviors in VR exhibitions. While further
improvements in object detection are necessary, our methodology contributes to the broader field of
knowledge management by semantically enriching gaze data and enabling adaptive, personalized user
experiences. This work demonstrates the potential for integrating complex measurement datasets and
behavioral data into knowledge management systems.</p>
    </sec>
  </body>
  <back>
  </back>
</article>