<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Bologna, Italy. anne.rother@ovgu.de (A. Rother)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Comparing visual tools for pairwise comparisons of tabular data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anne Rother</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Polsinelli</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Till Ittermann</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Placidi</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Myra Spiliopoulou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Otto-von-Guericke University Magdeburg</institution>
          ,
          <addr-line>Magdeburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University Medicine Greifswald</institution>
          ,
          <addr-line>Greifswald</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of L'Aquila</institution>
          ,
          <addr-line>L'Aquila</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Salerno</institution>
          ,
          <addr-line>Salerno</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>AI-based diagnostics demand reliable medical record labeling. Despite the advances of few-shot and zero-shot learning, each specialized medical data collection demands at least some labels that agree with the feature space and the class distribution of the collection. However, the a posteriori classification of existing records by humans, for diagnoses that were not considered during the original data acquisition, demands effort and expert knowledge. To facilitate human labor and decrease the required level of expertise, we propose a workflow that encompasses pairwise comparisons of medical records and dedicated visualizations for the juxtaposition of record pairs in the original feature space. We evaluate the potential of new visualization schemes in controlled experiments with human volunteers and we compare the results to those achieved with earlier, much simpler visualizations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Pairwise comparisons are used in machine learning to derive similarity functions that take local
proximity between objects into account [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Pairwise comparisons are also used in crowdworking to
capitalize on the fact that humans can discern similarities between objects with their eyes, in a way
that AI still cannot imitate [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For example, when called to perform a pairwise comparison
among the three faces in the upper part of Figure 1, humans are likely to ignore the whiskers, a feature
of some importance when comparing the three faces in the lower part of the same figure. When it
comes to high-dimensional medical records though, human annotators need more assistance when
deciding which features to concentrate on.
      </p>
      <p>In this paper, we investigate the potential of different structured record visualizations in assisting
humans in pairwise comparisons. We propose a workflow that encompasses a mechanism for triplet
construction from a set of labeled medical records for a binary classification problem (person has the
disease: Y/N), two visualizations for pairwise comparisons, an experiment design for the evaluation of
these visualizations with volunteers, and a set of evaluation criteria to assess the potential of each method
and its merit in comparison to simpler visualization mechanisms.</p>
      <p>Our first contribution is the complete workflow, intended to assist human annotators who perform pairwise
comparisons of structured medical records for the purpose of labeling. Our second contribution consists
of the two presented visualizations, which are intended to highlight similarities and differences among
records in the original feature space. Our last contribution is the evaluation approach, covering an
experiment that involves human volunteers and a retrospective comparison to the results of an earlier
experiment that used simpler visualizations.</p>
      <p>The paper is organized as follows. We first discuss related work on pairwise comparisons and on
visualization of structured medical records, focusing on visualization methods for the original feature
space. In Section 3, we present the elements of our approach, while in Section 4 we present the
medical data we used, the experiment we performed with human volunteers and our evaluation criteria.
Section 5 contains our results and a discussion of them. The last section summarizes the findings and
provides an outlook.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Pairwise comparisons</title>
        <p>
          Pairwise comparisons have been studied intensively from the machine learning perspective, see e.g. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], where the objective is to induce
a distance function over the data space. The human-driven process of finding the two most similar
objects inside a triplet is investigated in psychology, but there the objective is to acquire insights into
human perception [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Insights into whether triplet comparisons performed by human annotators are
indeed exploitable by machine learning algorithms are mostly limited to the comparison of images [
          <xref ref-type="bibr" rid="ref7">7, 8</xref>
          ].
Arguably, pairwise comparison in triplets of tabular data records, such as medical instances, is different
from the comparison of image instances. Yao et al. used pairwise comparisons for the estimation of
treatment effects in observational data [9]: they chose three pairs of instances, one consisting of the
most proximal target instance and control instance, one built around the most remote target
instance, and one built around the most remote control instance, each measured with respect to an instance of the first pair. They then
introduced two counteracting metrics on the basis of loss functions, intended to bring similar instances
close to each other but not too close in the representation space.
        </p>
      </sec>
      <sec>
        <title>2.2. Measuring the difficulty of annotation and labeling tasks</title>
        <p>
          The difficulty of pairwise comparisons of images has been investigated in [
          <xref ref-type="bibr" rid="ref7">7, 10</xref>
          ]. Similarly to our earlier
works [
          <xref ref-type="bibr" rid="ref2">11, 2</xref>
          ] on pairwise comparisons of non-image data, Ahonen et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] used sensors that measure
electrodermal activity. Their results were not conclusive, in the sense that it did not become evident
what makes a comparison difficult independently of the person who performs the comparison. The
difficulty of pairwise comparisons of non-image objects is less investigated in general, despite the fact
that non-image objects are of relevance in several application domains, including the annotation of
clinical data. However, there are several investigations on the difficulty of crowdworker tasks, including
labeling tasks and more elaborate annotations. Traditionally, ‘difficulty’ (which is not observable) is
modeled on the basis of observable quantities. One of them is ‘duration’, defined in [12] as the time
needed to complete a specific task and used as an indicator of task difficulty for a specific crowdworker.
Another important indicator is (dis)agreement among crowdworkers, pointing to task ambiguity [13] or
to diverging interpretations of a task [14], i.e. to inherent task properties that are independent of a specific
crowdworker’s skills and expertise. In [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] we focused on (dis)agreement as a potential indicator of
difficulty: annotator (dis)agreement was predictive neither of difficulty nor of correctness.
Furthermore, annotators performed pairwise comparisons on triplets that consisted of 10-dimensional
medical instances from the cohort SHIP-2 of [15]. We found that for some instances proximity across
certain dimensions was misleading, in the sense that annotators consistently judged a pair of
instances inside a triplet to be more similar than they truly were.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Annotation of medical data</title>
        <p>The annotation of medical data, be it images, diagnostic texts or structured instances, is a very important task, for which crowdworking has
been applied increasingly and successfully in recent years [16, 17, 18]. In [18], Wazny lists 8 areas
of medical applications where crowdsourcing is being used; among them is diagnosis, such as assigning
scores to tumors. This corresponds to the creation of ground truth in existing datasets through labeling.
However, medical annotations go beyond the assignment of labels or scores. For example, Joshi et
al. recruited volunteers who identified the ‘location’ of emotional episodes in timestamped data, as
well as the duration of these episodes [19]. Studies on the annotation of medical data follow different
directions. They include the study of the potential of Virtual Reality (VR) technologies as in [20, 21],
the generation of open access datasets [22], the role of annotated data collections in education [23], and
ways of semi-automating the labeling/annotation process. Among the latter, the earlier work of Nissim
et al. [24] highlighted the potential of active learning to reduce label acquisition cost. More recently,
combinations of semi-supervision and crowdsourcing have become a popular subject of investigation,
see e.g. [25, 26].</p>
      </sec>
    </sec>
    <sec>
      <title>3. Workflow for record annotation through pairwise visual comparisons</title>
      <sec id="sec-2-3">
        <title>3.1. A pie-based visualization</title>
        <p>
          The proposed method was inspired by the solution proposed in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], in which the experiment participant
was shown two representations, a tile-based and a line-based one. In the first, each triplet is composed of
tiles, one for each of the ten risk factors, with the numerical values encoded as shade (Figure 2, left box). In the second,
the position of the middle record's value for some variables indicates its distance from the values
of the other two records (Figure 2, right box). This solution has been shown to be effective but can be
improved with a new visualization method that does not separate the features from one another.
        </p>
        <p>The main idea is to use the pie-based visualization shown in Figure 3. Compared to the old
visualization, it is more compact, since each of the ten variables is represented as a slice of a pie. Accordingly,
three pies are needed to represent the three subjects A, B, and C of the experiment. The comparison
between subjects is immediate, and the slices of the pie are position invariant: since there is no ordering
among the variables, the crowdworker is not biased by the particular arrangement of the variables.</p>
        <p>In both methods, colors are assigned by distributing a discrete palette linearly over the min-max
interval of the values of each feature. A 5-value color
scale is used for continuous features, a 2-value color scale for binary features, and a
3-value color scale for the ternary feature.</p>
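        <p>The color assignment can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the function name <monospace>color_triplet</monospace>, the equal-width binning, the clamping of the maximum value into the last bin, and the feature ranges and record values are all hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def color_triplet(triplet, feature_specs):
    """Assign one color index per feature and record.

    triplet: dict mapping a record label ("A", "B", "C") to {feature: value}
    feature_specs: dict mapping a feature to (bins, minimum, maximum)
    """
    colored = {label: {} for label in triplet}
    for feature, (bins, m, M) in feature_specs.items():
        for label, record in triplet.items():
            # Relative position of the value inside the min-max interval
            share = 0.0 if M == m else (record[feature] - m) / (M - m)
            # Equal-width bin number; the maximum is clamped into the last bin
            colored[label][feature] = min(int(share * bins), bins - 1)
    return colored


# Hypothetical ranges: 2 colors (binary), 3 colors (ternary), 5 colors (continuous).
# Only three of the ten features are shown, and the values are invented.
specs = {"sex": (2, 0, 1), "smoking": (3, 0, 2), "age": (5, 20, 80)}
triplet = {
    "A": {"sex": 0, "smoking": 2, "age": 35},
    "B": {"sex": 0, "smoking": 1, "age": 41},
    "C": {"sex": 1, "smoking": 0, "age": 80},
}
colors = color_triplet(triplet, specs)
```
]]></preformat>
        <p>In this sketch, the ages 35 and 41 fall into the same of the five bins, so records A and B would receive the same color for age, while record C receives the last color of the scale.</p>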
        <p>The resulting color-based triplet assignments are described in Algorithm 1 for the old method and in
Algorithm 2 for the new method.</p>
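        <p>Algorithm 1 (tile-based visualization). Input: tripletA, tripletB, tripletC. For each feature f: m ← min(f) and M ← max(f); bins ← 2 if f is binary, bins ← 3 if f is ternary, and bins ← 5 otherwise; Palette ← createPalette(bins, m, M); the tile of f in tripletA, tripletB and tripletC receives the color that Palette assigns to the respective value of f. Finally, the colored triplet is shown.</p>
        <p>Algorithm 2 (pie-based visualization). Input: tripletA, tripletB, tripletC. Set n ← 10, create one pie with n slices for each of tripletA, tripletB and tripletC, and set i ← 1. For each feature f: m ← min(f) and M ← max(f); bins ← 2 if f is binary, bins ← 3 if f is ternary, and bins ← 5 otherwise; Palette ← createPalette(bins, m, M); slice i of each of the three pies receives the color that Palette assigns to the respective value of f; i ← i + 1. A last palette is created with the arguments (5, 0, 1), and the three pies are shown.</p>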
        <p>Figure 3 shows how easily one can conclude that instance B is similar to
instance A: the right half-pie is identical in both, as are the slices representing LDL, CRP and
Alcohol. In Figure 2, which represents the same instances A, B and C, the comparison is less
immediate because the crowdworker is led to analyze one variable at a time.</p>
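        <p>The comparison of pies can be approximated by counting slices that carry the same color. The sketch below is only an illustration of this idea: the order of the variables, the function names and the color indices are assumptions, and counting identical slices is a simplification of the visual judgment made by a human annotator.</p>
        <preformat preformat-type="code"><![CDATA[
```python
FEATURES = ["age", "sex", "ALAT", "LDL", "alcohol", "hypertension",
            "beta-blocker", "diabetes", "smoking", "CRP"]


def pie_slices(features):
    """One equal slice per feature: (name, start angle, end angle) in degrees."""
    step = 360.0 / len(features)
    return [(name, k * step, (k + 1) * step) for k, name in enumerate(features)]


def shared_slices(colors_x, colors_y):
    """Features whose slices carry the same color index in both pies."""
    return [name for name in FEATURES if colors_x[name] == colors_y[name]]


def decide(matches_a, matches_c):
    """Pick the instance that shares more slices with B, or report a tie."""
    if len(matches_a) > len(matches_c):
        return "A"
    if len(matches_c) > len(matches_a):
        return "C"
    return "undecided"


# Hypothetical color indices per slice, not read off Figure 3
pie_a = dict(zip(FEATURES, [1, 0, 2, 3, 0, 1, 0, 0, 2, 1]))
pie_b = dict(zip(FEATURES, [1, 0, 2, 3, 0, 0, 1, 0, 1, 1]))
pie_c = dict(zip(FEATURES, [4, 1, 0, 1, 2, 0, 1, 1, 1, 3]))

slices = pie_slices(FEATURES)
matches_with_a = shared_slices(pie_b, pie_a)
matches_with_c = shared_slices(pie_b, pie_c)
more_similar_to = decide(matches_with_a, matches_with_c)
```
]]></preformat>
        <p>Because every variable keeps the same slice in all three pies, equal slices can be compared position by position.</p>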
        <p>This is even more evident in Figures 4 and 5, which present a less obvious case. Instance
B is still more similar to instance A, but the similarities are few, so the decision cannot be
reached by comparing the variables one by one; an overall view is necessary, and for this reason
the pie-based visualization is still superior.</p>
        <p>The last example, presented in Figures 6 and 7, is very difficult to assess. Both instances A
and C are good candidates; a careful look at the pie-based visualization leads to the conclusion
that B is more similar to A, even if only slightly.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Our Evaluation Workflow</title>
      <sec id="sec-3-1">
        <title>4.1. The triplets of the experiment</title>
        <p>In this study, we investigate the potential of different visualization schemes for pairwise comparison of
medical records.</p>
        <p>
          As a follow-up of the experiment in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], we asked two experts to annotate triplets shown in the new visualization, i.e. to assess
whether an individual is more similar to a healthy or to a diseased individual, using hepatic steatosis as the
outcome. Both experts conduct research on active learning, prediction and classification. They do not
know the SHIP dataset. Each expert was given 30 annotation tasks plus 3 tasks of different
levels of difficulty. Furthermore, the experts had to express the perceived difficulty of annotating each
triplet by choosing one of the following four answers: “very certain”, “rather certain”, “rather uncertain”,
and “very uncertain”.
        </p>
        <p>
          For choosing the triplets we used the dataset as presented and described in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. There, we randomly
selected 90 records out of 852 individuals of SHIP-2. The 852 individuals are categorized into the following three
categories: “no hepatic steatosis” (liver fat fraction ≤ 5.0%, n = 501), “mild hepatic steatosis” (5.0% &lt; liver
fat fraction &lt; 14%, n = 238), and “moderate to severe hepatic steatosis” (liver fat fraction ≥ 14%, n = 113)
[27]. More specifically, we selected 45 individuals from the class “no hepatic steatosis” and 45 from the class
“moderate to severe hepatic steatosis” and split each of these two subsamples into three groups of 15 individuals.
For each subject, ten risk factors of hepatic steatosis are reported: age, sex, alanine aminotransferase
(ALAT), low-density lipoprotein (LDL) cholesterol, alcohol consumption, hypertension, beta-blocker
intake, type 2 diabetes mellitus, smoking status, and C-reactive protein (CRP).
        </p>
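        <p>The class thresholds and the sampling step can be sketched as follows. The sketch is illustrative only: the function names, the random seeds and the record identifiers are hypothetical, and a liver fat fraction of exactly 5.0% is assumed to belong to the class “no hepatic steatosis”. How the groups are combined into triplets is not covered here.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random


def steatosis_class(fat_fraction):
    """Class of an individual, given the liver fat fraction in percent."""
    if fat_fraction <= 5.0:
        return "no hepatic steatosis"
    if fat_fraction < 14.0:
        return "mild hepatic steatosis"
    return "moderate to severe hepatic steatosis"


def sample_groups(ids, n_total=45, n_groups=3, seed=0):
    """Draw n_total ids at random and split them into n_groups equal groups."""
    rng = random.Random(seed)
    chosen = rng.sample(list(ids), n_total)
    size = n_total // n_groups
    return [chosen[k * size:(k + 1) * size] for k in range(n_groups)]


# Hypothetical identifiers standing in for the 501 and 113 individuals
healthy_ids = range(501)
severe_ids = range(1000, 1113)
healthy_groups = sample_groups(healthy_ids)
severe_groups = sample_groups(severe_ids, seed=1)
```
]]></preformat>
        <p>Each call returns three disjoint groups of 15 identifiers, which corresponds to the 45 plus 45 selected individuals described above.</p>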
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Evaluation Criteria</title>
        <p>
          Our scenario is a controlled pairwise comparison experiment, in which we want to find out which
features catch the participants’ eye under each configuration and which configuration helps them most
in finding the ‘good features’. The configurations are (a) our new color-based one and (b) the baseline
used in the article of [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>
          To compare the new graphic model with the article of [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], we run
the experiment with the same triplets and compute correctness. We use the average correctness as performance indicator
to evaluate the new graphic model for different degrees of task difficulty.
        </p>
        <p>To evaluate the performance of both methods, we define the following evaluation criteria:</p>
        <list list-type="bullet">
          <list-item><p>Correct classifications</p></list-item>
          <list-item><p>Score, to compare the two visualizations: how often one was correct under each visualization</p></list-item>
        </list>
        <p>In addition, we present the uncertainty of experts for the new visualization.</p>
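        <p>A possible reading of these criteria is sketched below. The function names and the outcome lists are hypothetical and are not data from the study; the score is interpreted here as the number of triplets annotated correctly under each visualization, complemented by the number of triplets on which only one of the two was correct.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def average_correctness(outcomes):
    """Fraction of triplets annotated correctly (outcomes is a list of booleans)."""
    return sum(outcomes) / len(outcomes)


def score(outcomes_new, outcomes_old):
    """Count, over the same triplets, how often each visualization was correct."""
    if len(outcomes_new) != len(outcomes_old):
        raise ValueError("both visualizations must cover the same triplets")
    pairs = list(zip(outcomes_new, outcomes_old))
    return {
        "new_correct": sum(outcomes_new),
        "old_correct": sum(outcomes_old),
        "only_new": sum(1 for new, old in pairs if new and not old),
        "only_old": sum(1 for new, old in pairs if old and not new),
    }


# Hypothetical outcomes for five triplets, not results of the experiment
new_vis = [True, True, False, True, False]
old_vis = [True, False, False, True, True]
result = score(new_vis, old_vis)
mean_new = average_correctness(new_vis)
```
]]></preformat>
        <p>Computing the average per difficulty level only requires restricting the outcome lists to the triplets of that level before calling <monospace>average_correctness</monospace>.</p>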
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Findings</title>
      <sec id="sec-4-1">
        <title>5.1. Findings with the proposed visualization</title>
        <p>In Table 1 we show the annotations of the two experts. They differ in 6 tasks (marked in
bold). Furthermore, the column “Uncertainty” shows the perceived difficulty per triplet.</p>
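        <p>The number of differing tasks can be recomputed from the correctness entries of Table 1. The sketch below encodes the 30 entries per expert in table order and assumes that “yes” marks a correct annotation; the variable names and the one-letter encoding are choices made for this illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Correctness entries of Table 1 in table order: y = yes, n = no.
# The blanks only group the entries for readability.
EXPERT_1 = "yyyn yyyn nyyn ynny yyyn yyny nynn nn".replace(" ", "")
EXPERT_2 = "nyyn yynn nyyn ynny yynn yyyy yynn yn".replace(" ", "")

# 1-based table positions on which the two experts differ
disagreements = [
    position
    for position, (first, second) in enumerate(zip(EXPERT_1, EXPERT_2), start=1)
    if first != second
]

# Fraction of "yes" entries per expert and their mean
correctness = [entries.count("y") / len(entries) for entries in (EXPERT_1, EXPERT_2)]
mean_correctness = sum(correctness) / len(correctness)
```
]]></preformat>
        <p>Under these assumptions, the two experts differ on six table positions, and both reach the same fraction of “yes” entries.</p>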
        <p>As depicted, both experts are most often “rather certain” in their annotation: 14 and 12 times out of 30, respectively. They are “rather
uncertain” in 9 and 10 triplets out of 30, and “very uncertain” in 4 and 6 triplets. The
experts chose “very certain” least often: only 3 and 2 times out of 30 triplets.</p>
        <p>It is remarkable that the annotations for the easy, medium and difficult triplets differ. T11 represents
the easy triplet; here the perceived difficulty changes slightly. For the triplet of medium difficulty, the annotation changes
completely: under T18, both experts annotated incorrectly, whereas they annotated correctly when
they annotated this task again later. For the difficult triplet, the perceived difficulty changes from very
uncertain to rather uncertain. The annotation remains the same, but incorrect.</p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Comparison to the baseline visualization</title>
        <p>In Table 2 we juxtapose how the experts annotated the triplets under both visualizations. For better
comparability, we removed the annotations of one expert for the old version. This expert is an epidemiologist
and created the dataset.</p>
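        <p>The annotations differ in 14 out of 30 tasks; these tasks are marked in bold. The new visualization was
annotated slightly better than the old visualization. We have better correctness for the easy triplets,
similar correctness for the medium ones, and also similar correctness for the difficult ones. On average, the fraction of
correct annotations was 0.50 for the old visualization and 0.57 for the new visualization. This could also be
related to the choice of experts. For the old visualization, a physician annotated the triplets and another
expert knew the SHIP-2 dataset. In contrast, the two experts for the new visualization have no
medical background and do not know the dataset. We are not trying to find the most globally influential
variable, since the important variables vary per triplet. Therefore, each variable has the same position
in each triplet.</p>
        <table-wrap>
          <label>Table 1</label>
          <caption>
            <p>Annotations of the two experts with the proposed visualization: correctness and perceived uncertainty per triplet. Differing annotations are marked in bold.</p>
          </caption>
          <table>
            <thead>
              <tr><th rowspan="2">Triplet</th><th colspan="2">Correctness</th><th colspan="2">Uncertainty</th></tr>
              <tr><th>Expert 1</th><th>Expert 2</th><th>Expert 1</th><th>Expert 2</th></tr>
            </thead>
            <tbody>
              <tr><td>T1</td><td><bold>yes</bold></td><td><bold>no</bold></td><td>rather uncertain</td><td>rather certain</td></tr>
              <tr><td>T2</td><td>yes</td><td>yes</td><td>rather uncertain</td><td>rather uncertain</td></tr>
              <tr><td>T3</td><td>yes</td><td>yes</td><td>rather certain</td><td>rather uncertain</td></tr>
              <tr><td>T4</td><td>no</td><td>no</td><td>very certain</td><td>very certain</td></tr>
              <tr><td>T5</td><td>yes</td><td>yes</td><td>rather certain</td><td>rather certain</td></tr>
              <tr><td>T6</td><td>yes</td><td>yes</td><td>rather uncertain</td><td>rather uncertain</td></tr>
              <tr><td>T7</td><td><bold>yes</bold></td><td><bold>no</bold></td><td>rather uncertain</td><td>rather uncertain</td></tr>
              <tr><td>T8</td><td>no</td><td>no</td><td>rather certain</td><td>rather certain</td></tr>
              <tr><td>T9</td><td>no</td><td>no</td><td>very certain</td><td>very certain</td></tr>
              <tr><td>T10</td><td>yes</td><td>yes</td><td>very certain</td><td>rather uncertain</td></tr>
              <tr><td>T11</td><td>yes</td><td>yes</td><td>rather certain</td><td>rather certain</td></tr>
              <tr><td>T12</td><td>no</td><td>no</td><td>rather uncertain</td><td>very uncertain</td></tr>
              <tr><td>T13</td><td>yes</td><td>yes</td><td>rather certain</td><td>rather certain</td></tr>
              <tr><td>T14</td><td>no</td><td>no</td><td>rather certain</td><td>rather certain</td></tr>
              <tr><td>T15</td><td>no</td><td>no</td><td>rather certain</td><td>rather certain</td></tr>
              <tr><td>T16</td><td>yes</td><td>yes</td><td>very uncertain</td><td>rather uncertain</td></tr>
              <tr><td>T17</td><td>yes</td><td>yes</td><td>rather certain</td><td>very uncertain</td></tr>
              <tr><td>T18</td><td>yes</td><td>yes</td><td>very uncertain</td><td>very uncertain</td></tr>
              <tr><td>T19</td><td><bold>yes</bold></td><td><bold>no</bold></td><td>rather certain</td><td>rather certain</td></tr>
              <tr><td>T20</td><td>no</td><td>no</td><td>rather uncertain</td><td>rather certain</td></tr>
              <tr><td>T21</td><td>yes</td><td>yes</td><td>rather uncertain</td><td>rather certain</td></tr>
              <tr><td>T22</td><td>yes</td><td>yes</td><td>rather certain</td><td>rather uncertain</td></tr>
              <tr><td>T23</td><td><bold>no</bold></td><td><bold>yes</bold></td><td>rather uncertain</td><td>rather uncertain</td></tr>
              <tr><td>T24</td><td>yes</td><td>yes</td><td>rather uncertain</td><td>rather certain</td></tr>
              <tr><td>T25</td><td><bold>no</bold></td><td><bold>yes</bold></td><td>rather certain</td><td>rather certain</td></tr>
              <tr><td>T26</td><td>yes</td><td>yes</td><td>very uncertain</td><td>very uncertain</td></tr>
              <tr><td>T27</td><td>no</td><td>no</td><td>rather certain</td><td>rather uncertain</td></tr>
              <tr><td>T28</td><td>no</td><td>no</td><td>rather uncertain</td><td>very uncertain</td></tr>
              <tr><td>T29</td><td><bold>no</bold></td><td><bold>yes</bold></td><td>rather certain</td><td>rather certain</td></tr>
              <tr><td>T30</td><td>no</td><td>no</td><td>very uncertain</td><td>very uncertain</td></tr>
            </tbody>
          </table>
          <table-wrap-foot>
            <p>Entries for the tasks T31, T32 and T33, in the order in which they appear in the source: correctness yes, no, no, yes, no, no; uncertainty very uncertain, rather uncertain, rather certain, rather certain, very uncertain, rather certain.</p>
          </table-wrap-foot>
        </table-wrap>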
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion and Future Work</title>
      <p>In this work, we investigated the potential of different visualization schemes for medical records. We
examined in an experiment whether a new visualization leads to better annotation in terms of
correctness, and compared the outcome with expert annotations obtained under a previous visualization. As a next step, we
will start investigating the role of stress as a confounder. We will also extend the experiment to
non-experts and focus on uncertainty, in order to further improve the visualization and thus obtain better annotation
results. Moreover, we will investigate which features affect correctness and how to
combine our approach with semi-supervised pairwise comparisons.</p>
      <sec id="sec-5-1">
        <sec id="sec-5-1-1">
          <title>6.1. Further possibilities for data annotation</title>
          <p>In addition to the various visualization methods, annotation can also take place on the basis of raw data,
for example tabular data (see Table 3). Table 3 shows a simple triplet. For the middle instance B, the annotator has to
decide whether it is more similar to instance A or to instance C. Similar variables are marked in blue
(B more similar to A) or orange (B more similar to C). The IRIS dataset used in this example consists of only
a few variables, so that the assessment remains manageable. Annotators would
look at how many matches there are per variable (the class is not visible) and then decide whether
instance B is more similar to instance A or to instance C. A more difficult example is shown
in Table 4. It is also based on the IRIS dataset, but the assignment is made more difficult by the
similarity of instances A and C. The variable “sepal length” does not discriminate between the instances in this example, so annotators
could possibly ignore this variable in the decision-making process. In the triplet as a whole,
instance B is slightly more similar to instance C than to instance A. As soon as one variable
is given more weight, this could either strengthen the decision or lead to a different
decision. With datasets that contain more variables, such as the mushroom dataset, it is very difficult
to assess individual variables separately. Our suggestion would be to hide the variables whose
values are identical so that a better assignment can take place. This and the optimal number of variables
per triplet will be investigated in future experiments.</p>
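          <p>The per-variable comparison and the proposed hiding of identical variables can be sketched as follows. The measurements are hypothetical iris-like values in cm and not the entries of Table 3 or Table 4; the function names are illustrative, and “identical” is interpreted here as identical in instances A and C, which is an assumption.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def closer_instance(a, b, c):
    """For each variable, report whether B's value is closer to A, to C, or tied."""
    verdict = {}
    for name in b:
        dist_a = abs(b[name] - a[name])
        dist_c = abs(b[name] - c[name])
        if dist_a < dist_c:
            verdict[name] = "A"
        elif dist_c < dist_a:
            verdict[name] = "C"
        else:
            verdict[name] = "tie"
    return verdict


def informative_variables(a, c):
    """Variables whose values differ between A and C; the others could be hidden."""
    return [name for name in a if a[name] != c[name]]


# Hypothetical measurements, invented for this illustration
A = {"sepal length": 5.1, "sepal width": 3.5, "petal length": 1.4, "petal width": 0.2}
B = {"sepal length": 5.1, "sepal width": 3.0, "petal length": 4.5, "petal width": 1.5}
C = {"sepal length": 5.1, "sepal width": 2.8, "petal length": 4.7, "petal width": 1.4}

verdict = closer_instance(A, B, C)
votes_for_a = sum(1 for v in verdict.values() if v == "A")
votes_for_c = sum(1 for v in verdict.values() if v == "C")
shown = informative_variables(A, C)
```
]]></preformat>
          <p>In this invented triplet, “sepal length” is identical in all instances and would be hidden, while the three remaining variables all place B closer to C.</p>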
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Funding</title>
      <p>SHIP is part of the Community Medicine Research net of the University of Greifswald, Germany,
supported by the Federal Ministry of Education and Research (grants no. 01ZZ9603, 01ZZ0103, and
01ZZ0403), the Ministry of Cultural Affairs as well as the Social Ministry of the Federal State of
Mecklenburg-West Pomerania.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
    <sec>
      <title>References</title>
      <p>[8] S. Sharifi Noorian, S. Qiu, U. Gadiraju, J. Yang, A. Bozzon, What should you know? a human-in-the-loop approach to unknown unknowns characterization in image recognition, in: Proceedings of the ACM Web Conference 2022, 2022, pp. 882–892.</p>
      <p>[9] L. Yao, S. Li, Y. Li, M. Huai, J. Gao, A. Zhang, Representation learning for treatment effect estimation from observational data, Advances in Neural Information Processing Systems 31 (2018) 2633–2643.</p>
      <p>[10] E. Amid, A. Ukkonen, Multiview triplet embedding: Learning attributes in multiple maps, in: International Conference on Machine Learning, 2015, pp. 1472–1480.</p>
      <p>[11] N. Jambigi, T. Chanda, V. Unnikrishnan, M. Spiliopoulou, Assessing the difficulty of labelling an instance in crowdworking, in: 2nd Workshop on Evaluation and Experimental Design in Data Mining and Machine Learning@ ECML PKDD 2020, 2020.</p>
      <p>[12] U. Gadiraju, G. Demartini, R. Kawase, S. Dietze, Crowd anatomy beyond the good and bad: Behavioral traces for crowd worker modeling and pre-selection, Computer Supported Cooperative Work (CSCW) 28 (2019) 815–841.</p>
      <p>[13] M. Schaekermann, E. Law, K. Larson, A. Lim, Expert disagreement in sequential labeling: A case study on adjudication in medical time series analysis, in: SAD/CrowdBias@ HCOMP, 2018, pp. 55–66.</p>
      <p>[14] S. Kairam, J. Heer, Parting crowds: Characterizing divergent interpretations in crowdsourced annotation tasks, in: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work &amp; Social Computing, 2016, pp. 1637–1648.</p>
      <p>[15] H. Völzke, J. Schössow, C. O. Schmidt, C. Jürgens, A. Richter, A. Werner, N. Werner, D. Radke, A. Teumer, T. Ittermann, et al., Cohort profile update: The study of health in pomerania (ship), International journal of epidemiology (2022).</p>
      <p>[16] J. D. Tucker, S. Day, W. Tang, B. Bayus, Crowdsourcing in medical research: concepts and applications, PeerJ 7 (2019) e6762.</p>
      <p>[17] C. Wang, L. Han, G. Stein, S. Day, C. Bien-Gund, A. Mathews, J. J. Ong, P.-Z. Zhao, S.-F. Wei, J. Walker, et al., Crowdsourcing in health and medical research: a systematic review, Infectious diseases of poverty 9 (2020) 1–9.</p>
      <p>[18] K. Wazny, Applications of crowdsourcing in health: an overview, Journal of global health 8 (2018).</p>
      <p>[19] A. A. Joshi, M. Chong, J. Li, S. Choi, R. M. Leahy, Are you thinking what i’m thinking? synchronization of resting fmri time-series across subjects, NeuroImage 172 (2018) 740–752.</p>
      <p>[20] A. Huaulmé, F. Despinoy, S. A. H. Perez, K. Harada, M. Mitsuishi, P. Jannin, Automatic annotation of surgical activities using virtual reality environments, International journal of computer assisted radiology and surgery 14 (2019) 1663–1671.</p>
      <p>[21] O. Legetth, J. Rodhe, S. Lang, P. Dhapola, M. Wallergård, S. Soneji, Cellexalvr: A virtual reality platform to visualize and analyze single-cell omics data, Iscience (2021) 103251.</p>
      <p>[22] E. E. Kpokiri, R. John, D. Wu, N. Fongwen, J. Z. Budak, C. C. Chang, J. J. Ong, J. D. Tucker, Crowdsourcing to develop open-access learning resources on antimicrobial resistance, BMC infectious diseases 21 (2021) 1–7.</p>
      <p>[23] M. van Deursen, L. Reuvers, J. D. Duits, G. de Jong, M. van den Hurk, D. Henssen, Virtual reality and annotated radiological data as effective and motivating tools to help social sciences students learn neuroanatomy, Scientific Reports 11 (2021) 1–10.</p>
      <p>[24] N. Nissim, M. R. Boland, N. P. Tatonetti, Y. Elovici, G. Hripcsak, Y. Shahar, R. Moskovitch, Improving condition severity classification with an efficient active learning based framework, Journal of biomedical informatics 61 (2016) 44–54.</p>
      <p>[25] W. Shi, V. S. Sheng, X. Li, B. Gu, Semi-supervised multi-label learning from crowds via deep sequential generative model, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, 2020, pp. 1141–1149.</p>
      <p>[26] P. A. Traganitis, G. B. Giannakis, Bayesian semi-supervised crowdsourcing, arXiv preprint arXiv:2012.11048 (2020).</p>
      <p>[27] J.-P. Kühn, D. Hernando, A. Muñoz del Rio, M. Evert, S. Kannengiesser, H. Völzke, B. Mensel, R. Puls, N. Hosten, S. B. Reeder, Effect of multipeak spectral modeling of fat for liver iron and fat quantification: correlation of biopsy with mr imaging results, Radiology 265 (2012) 133–142.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Simard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rönnqvist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lebel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lehoux</surname>
          </string-name>
          ,
          <article-title>A method to classify data quality for decision making under uncertainty</article-title>
          ,
          <source>ACM Journal of Data and Information Quality</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rother</surname>
          </string-name>
          , U. Niemann,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hielscher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Völzke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ittermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spiliopoulou</surname>
          </string-name>
          ,
          <article-title>Assessing the difficulty of annotating medical data in crowdworking with help of experiments</article-title>
          ,
          <source>PloS one</source>
          <volume>16</volume>
          (
          <year>2021</year>
          )
          e0254764
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rother</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ittermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spiliopoulou</surname>
          </string-name>
          ,
          <article-title>Semi-supervised learning with pairwise instance comparisons for medical instance classification</article-title>
          ,
          <source>in: International Symposium on Intelligent Data Analysis</source>
          , Springer,
          <year>2025</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Holzinger</surname>
          </string-name>
          ,
          <article-title>Interactive machine learning for health informatics: when do we need the human-in-the-loop?</article-title>
          ,
          <source>Brain Informatics</source>
          <volume>3</volume>
          (
          <year>2016</year>
          )
          <fpage>119</fpage>
          -
          <lpage>131</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kleindessner</surname>
          </string-name>
          , U. von Luxburg,
          <article-title>Kernel functions based on triplet comparisons</article-title>
          ,
          <source>in: Advances in neural information processing systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>6807</fpage>
          -
          <lpage>6817</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Diersch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Valdes-Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tempelmann</surname>
          </string-name>
          , T. Wolbers,
          <article-title>Increased hippocampal excitability and altered learning dynamics mediate cognitive mapping deficits in human aging</article-title>
          ,
          <source>Journal of Neuroscience</source>
          <volume>41</volume>
          (
          <year>2021</year>
          )
          <fpage>3204</fpage>
          -
          <lpage>3221</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ahonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cowley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Torniainen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ukkonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vihavainen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Puolamaki</surname>
          </string-name>
          ,
          <article-title>S1: Analysis of electrodermal activity recordings in pair programming from 2 dyads, PLoS One</article-title>
          . Retrieved from http://journals. plos. org/plosone/article/asset (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>