<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>A Perturbation-based Dataset Evaluation Approach for Fair Classifications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Di Pierro</string-name>
          <email>davide.dipierro@uniba.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Ferilli</string-name>
          <email>stefano.ferilli@uniba.it</email>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>We are witnesses to a society in which the growing need for Artificial Intelligence in every aspect of life has pushed research in the field. However, this enduring effort often leads to a lack of care in the evaluation of results from several perspectives. One of the most underrepresented aspects is still the detection of possible biases in the datasets used for model training, which can lead to unforeseeable consequences for society or for specific groups of people. Techniques generally used in traditional Machine Learning settings, like perturbation or randomization, can also be part of the evaluation of the dataset itself, in order to distinguish whether perturbations on sensitive features lead to significant changes in the output. What we propose here is a solution that allows creating fictitious instances by varying feature values, thanks to ontology definitions that specify all the possible combinations for the different instances, and a metric to measure the distance between them.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology</kwd>
        <kwd>Similarity</kwd>
        <kwd>Reasoning</kwd>
        <kwd>Fairness</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>As Machine Learning (ML) becomes more and more pervasive and crucial in our everyday life activities, the capability of understanding the process governing the decision-making phases of algorithms is essential to relate to Artificial Intelligence (AI) in a responsible and effective way. Basically, all ML models learn through a first training step, in which the model learns the correlation between the input data and the (in theory correct) target label. This correlation between input and target will be the rationale for the algorithm to predict unseen data, the data we are actually interested in and constantly make use of.</p>
      <p>
        In more recent years, the community of ML engineers, and beyond, has raised the problem of how models are trained. Specifically, this concerns the process of collecting and managing the training data that the model is based on for the rest of its life. The problem in question is dataset imbalance, in which the dataset presents biases, i.e. patterns that should not appear in the data because they do not reflect the reality of the domain being described. Many patterns emerging from datasets can be misleading, false, or appear only due to the specificity of the data. Here, we refer to those that can harm some categories of people, or that lead to unfair decision-making processes. Recognizing unfair datasets or biases is also a recent line of research in which researchers endeavour to find techniques for detecting (and possibly solving) biases in data. One of the most frequent tasks for which a great extent of bias has been detected is the classification of criminals which, unfortunately, in many situations resulted in using ethnicity as one of (if not THE) main features [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Among the possible solutions to deal with this problem, much room has been dedicated to explainable or reasoning techniques. They are particularly suitable given their interpretable nature, and the capability of experts to evaluate the quality of the model. There are, on the other hand, techniques for improving ML models based on data perturbation or randomization. Perturbing is the process of (pseudo-)randomly varying values of instances to evaluate how the model behaves after perturbation. In the context of dataset evaluation, we exploit data perturbation to understand whether slight changes in an instance (most likely in the suspected biased features) lead to significant changes in the prediction. For instance, one of the basic questions we want to be able to answer is 'What happens if I change the ethnicity of the instance?'. Perturbing values can be done in many ways but, in order to create instances that make sense, we should know all the possible domains for each feature, so that we know that 'caucasian' is a possible value to test while '42' is not. For this purpose, we work under the assumption that data is based on an ontology (or schema). The benefit is twofold: (i) we know exactly the domains of values for each property, and (ii) we have a strategy to measure instances of the same class. Without this assumption, an ontology alignment phase would be necessary.
      </p>
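      <p>As a minimal illustration of this assumption, consider the following Prolog sketch. The fact schema domain/2 and the predicate valid_perturbation/2 are hypothetical names introduced here for illustration, not part of the system described in this paper:</p>
      <p>% Hypothetical ontology facts: the admissible values of each property.
domain(ethnicity, [caucasian, black, asian, hispanic, other]).
domain(gender, [male, female]).

% A perturbation of Property to Value is valid only if Value belongs
% to the ontological domain of that property.
valid_perturbation(Property, Value) :-
    domain(Property, Values),
    member(Value, Values).

% ?- valid_perturbation(ethnicity, caucasian).  % succeeds
% ?- valid_perturbation(ethnicity, 42).         % fails</p>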
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        In traditional ML settings, bias is detected statistically [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Yet, these kinds of analyses are time-consuming in large scenarios and do not always provide evidence of the consequences of the biases on the decision process. The most common causes of biases are dataset imbalance and label errors [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Statistical analysis can mitigate this problem but, to the best of our knowledge, there is still no established measure for quantifying bias. Given the subtle nature of biases, it is quite common to prefer interpretable models over others. For the sake of interpretability, decision trees are still one of the most frequent solutions to inspect and explain decision processes. Going further in this direction, Logic in AI is increasingly intertwining with concerns like machine ethics and fairness.
      </p>
      <p>
        Here we are interested in measurement bias [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], i.e. when improper features become too selective for the classification. In [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ] a regularization-based technique has been proposed to mitigate unfairness. One of the most common solutions is to train multiple classifiers (ensembles) [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8, 9, 10</xref>
        ]. Zafar et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
        provided a mathematical interpretation of bias and developed a constraint-based solution to find the optimal classifier. Kamiran et al. [] exploited the information available about the possible values of every feature. In this way, they developed a preferential sampling that projects instances into a space in which those more susceptible to bias become evident. These statistical approaches do not rely on any ontological knowledge, meaning that quite often the discriminative features need to be recognised. Statistical techniques, although highly specialized in tackling the problem, can hardly provide a measure of bias, which is generally equated with the imbalance in the dataset.
      </p>
      <p>
        Closer to symbolic solutions, Adams et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] designed a Fuzzy Logic-based model capable of outperforming black-box technologies for financial support, with the introduction of regulations and fair outcomes. Fuzzy Logic was introduced by Zadeh [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] as an extension of predicate logic for the efficient management and computation of fuzzy concepts (like "fairness") and fuzzy rules (bureaucracy in general).
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>In this section, we briefly describe our methodology for data perturbation in the context of ontology-based data. In this context, we define an ontology 𝒪 = (𝒞, 𝒫, ℛ), where 𝒞 is a set of classes, each of them uniquely identified by its name, and 𝒫 is a set of named properties, each a function p : 𝒞 → 𝒟 mapping entities to a generic domain 𝒟. Without loss of generality, we suppose 𝒟 can only be one of the domains {Int, Date, Categorical, Ordered, String}. Int represents a set of numbers within a range. Ordered is a set of defined values like Categorical, but ordered, like Likert scales. 𝒫 expresses all the possible properties that can be defined for instances. If a property exists for a class at the ontological level, it means that it might be present also in instances. On the other hand, if a property is present for instances, it must also exist at the ontological level. ℛ : 𝒞 → 𝒞 is the set of relationships between entities. In this work, we do not take them into account. Among the relationships, isA is the relationship expressing the ontological subclass concept. This will be useful since instances can be compared only if they belong to the same entity or one is a subclass of the other. We assume every instance belongs to exactly one class and that, for each class, there exists at most one isA relationship, i.e. single inheritance.</p>
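      <p>For concreteness, a minimal sketch of how such an ontology can be represented as Prolog facts follows; the fact schemas class/1, isa/2, type/2 and property/3 are our assumptions, chosen to be consistent with the code in Listing 1:</p>
      <p>% Classes and single-inheritance isA links.
class(person).
class(customer).
isa(customer, person).

% Each property has exactly one type among the five domains.
type(name, string).
type(birth_date, date).
type(ethnicity, categorical).
type(education, ordered).
type(age, int).

% property(Instance, Property, Value) assigns values to instances.
property(n1, name, "Alice").
property(n1, ethnicity, caucasian).
property(n1, age, 34).</p>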
      <sec id="sec-3-1">
        <title>3.1. Feature Similarity</title>
        <p>Given the above setting, we can now define how to measure differences between features of instances. Features are the peculiar characteristics of instances, described in the form of the above-mentioned properties. Recall that the domain 𝒟 can be any of {Int, Date, Categorical, Ordered, String}, so a distance metric must be defined for each of them. Specifically, we propose the following ones:</p>
        <p>• Int: given a, b ∈ ℕ, a ≤ b, a range ℐ = [a, b] and x, y ∈ ℐ, 0 ≤ d(x, y) = 2 · ⌈log₂ |ℐ|⌉ · |x − y| / |ℐ| ≤ 2 · ⌈log₂ |ℐ|⌉</p>
        <p>• Date: given x, y two dates expressed as yyyy/mm/dd, d(x, y) is equal to the ordinary number of days separating the two.</p>
        <p>• Categorical: given a set of labels ℒ and x, y ∈ ℒ, d(x, y) = 1 if x ≠ y, 0 otherwise. Note that with |ℒ| = 2 (boolean case), we can map this case onto Int with ℐ = {0, 1}, and we have d(0, 1) = 2 · ⌈log₂ 2⌉ · |0 − 1| / 2 = 1, as expected.</p>
        <p>• Ordered: given a set of labels ℒ and a bijective function f : ℒ → {0, 1, ..., |ℒ| − 1} specifying the order, and x, y ∈ ℒ, 0 ≤ d(x, y) = |f(x) − f(y)| ≤ |ℒ|</p>
        <p>• String: given an alphabet Σ and x, y ∈ Σ*, 0 ≤ d(x, y) = DL(x, y) ≤ max{|x|, |y|}, where DL(x, y) is the Damerau-Levenshtein distance [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], particularly useful and suited for spell-checking.</p>
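        <p>A minimal Prolog sketch of three of the metrics above follows. The predicate names are ours, for illustration; Date and String are omitted, since the former reduces to ordinary day arithmetic and the latter relies on an off-the-shelf Damerau-Levenshtein implementation:</p>
        <p>% Int: normalized difference, scaled by 2 * ceil(log2(|I|)), over [Lo, Hi].
int_distance(X, Y, Lo, Hi, D) :-
    Size is Hi - Lo + 1,
    Scale is 2 * ceiling(log(Size) / log(2)),
    D is Scale * abs(X - Y) / Size.

% Categorical: 0 if the labels are equal, 1 otherwise.
categorical_distance(X, X, 0) :- !.
categorical_distance(_, _, 1).

% Ordered: absolute difference of the positions in the ordered label list.
ordered_distance(X, Y, Order, D) :-
    nth0(IX, Order, X),
    nth0(IY, Order, Y),
    D is abs(IX - IY).

% ?- int_distance(0, 1, 0, 1, D).  % D = 1, the boolean case above</p>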
        <p>In principle, an ontology designer may recognize which are the most prominent features to define similarity between instances of the same class. For instance, the features name and surname for Person will be much more useful than age or title. Suppose for each property p ∈ 𝒫 we have a weight function w : 𝒫 → ℝ; the overall distance between two instances i₁, i₂ is</p>
        <p>d(i₁, i₂) = [ ∑_{p ∈ P₁ ∩ P₂} d(p(i₁), p(i₂)) · w(p) ] / [ ∑_{p ∈ P₁ ∩ P₂} w(p) ]</p>
        <p>where Pᵢ represents the set of properties available for instance i, and p(i) the value of property p for instance i.</p>
        <p>Unfortunately, weighting individual properties is not always feasible in practice, given the time required to specify all the weights, but also the specificity of the domain knowledge required. For these reasons, it is much more convenient to identify a generic strategy based on types. As in the example about name and surname, it is intuitive to assume that String types are much more relevant in similarity computation. This is because free text provides (in general) more specific information, and its equality/inequality quite often provides a good guess of how similar two instances are. Reasoning with generic real-world instances, one may think about people (often identified by name and surname), objects (which have names), places (which have names and addresses) and so on. For this reason, we prioritize similarities between strings over the others. Next, we claim that dates are stable, reflecting the date on which events happened. Apart from errors, a date can be regarded as a meaningful feature for understanding similarity. Finally, integer values are less stable than categories, in the sense that the number of times something happened is subject to changes over time. Following the above-mentioned idea of weighting, we now assign the same weight to all the features of the same type. In this case, it is much easier for an ontology designer to weigh only types, which should be few in principle. Naming α_Int, α_Date, α_Categorical, α_Ordered, α_String the weights of types (respectively) Int, Date, Categorical, Ordered, String, and T = {Int, Date, Categorical, Ordered, String}, the formula will be:</p>
        <p>d(i₁, i₂) = [ ∑_{t ∈ T} α_t · ∑_{p ∈ P₁ ∩ P₂, type(p) = t} d(p(i₁), p(i₂)) ] / [ ∑_{t ∈ T} α_t · |{p ∈ P₁ ∩ P₂ : type(p) = t}| ]</p>
        <p>Given the interpretable and ontology-based nature of the problem, we implemented the procedure for distance computation in Prolog. Listing 1 shows the main computation.</p>
        <p>Listing 1: Prolog code for an example of distance computation</p>
        <p>% Distance between two nodes N1 and N2, given the five type weights.
% Properties is bound to the set of properties shared by both nodes.
node_distance(N1, N2, AlfaInt, AlfaDate, AlfaCategorical, AlfaOrdered, AlfaString, Properties, Distance) :-
    % Collect the properties of each node and intersect them.
    findall(P1, property(N1, P1, _), Properties1),
    findall(P2, property(N2, P2, _), Properties2),
    findall(P, (member(P, Properties1), member(P, Properties2)), Properties),
    % Partition the shared properties by type.
    findall(P, (member(P, Properties), type(P, int)), IntProperties),
    findall(P, (member(P, Properties), type(P, date)), DateProperties),
    findall(P, (member(P, Properties), type(P, categorical)), CategoricalProperties),
    findall(P, (member(P, Properties), type(P, ordered)), OrderedProperties),
    findall(P, (member(P, Properties), type(P, string)), StringProperties),
    % Accumulate the per-type distances.
    node_int_distance(N1, N2, IntProperties, 0, DInt),
    node_date_distance(N1, N2, DateProperties, 0, DDate),
    node_categorical_distance(N1, N2, CategoricalProperties, 0, DCategorical),
    node_ordered_distance(N1, N2, OrderedProperties, 0, DOrdered),
    node_string_distance(N1, N2, StringProperties, 0, DString),
    % Weighted sum of the per-type distances...
    Numerator is AlfaInt * DInt + AlfaDate * DDate + AlfaCategorical * DCategorical + AlfaOrdered * DOrdered + AlfaString * DString,
    % ...normalized by the weighted count of shared properties per type.
    length(IntProperties, LInt),
    length(DateProperties, LDate),
    length(CategoricalProperties, LCategorical),
    length(OrderedProperties, LOrdered),
    length(StringProperties, LString),
    Divisor is LInt * AlfaInt + LDate * AlfaDate + LCategorical * AlfaCategorical + LOrdered * AlfaOrdered + LString * AlfaString,
    Distance is Numerator / Divisor.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Dataset Evaluation</title>
      <p>Given these instruments, we can perform some analyses of datasets, in order to understand the distances among instances that are differently classified by some ML algorithm. Specifically, we mention two possible analyses: distance-based and generative. While the first gives a glance at the distance required for an instance to change its label, the latter allows a deeper understanding of every feature in the dataset, providing a measure to verify how variations affect the result.</p>
      <p>Distance-based: in this analysis, we group instances of the test set according to how they have been classified, and find the pair of instances belonging to distinct classifications that minimizes the distance. Generative: in this analysis, we take a subset of instances classified equally and, by varying one of their features up to certain distance thresholds, we verify the distance at which the classification changes based on that one feature. After the analysis of single features, pairs can be considered, followed by triples and so on.</p>
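      <p>A minimal Prolog sketch of the generative analysis for a single categorical feature follows; classify/2, domain/2 and perturb/4 are hypothetical interfaces standing for the classifier, the ontology lookup and the instance rewriting, respectively:</p>
      <p>% Find a value of Property whose substitution changes the predicted label.
perturbation_changes_label(Instance, Property, NewValue) :-
    classify(Instance, Label),
    domain(Property, Values),
    member(NewValue, Values),
    property(Instance, Property, OldValue),
    NewValue \== OldValue,
    perturb(Instance, Property, NewValue, Perturbed),
    classify(Perturbed, NewLabel),
    NewLabel \== Label.</p>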
      <sec id="sec-4-1">
        <title>4.1. First Results with Credit Cards Approval</title>
        <p>
          A first analysis has been conducted on the Credit Approval dataset (https://archive.ics.uci.edu/dataset/27/credit+approval). The dataset contains 690 instances and is composed of 15 features, of types Int, Real and Categorical. Some features regard gender and ethnicity, features that in principle should not be taken into account when deciding credit card approval. The goal is to classify whether an application is approved. At first, we performed a classification based on an interpretable model. Interpretability provided us with an easier way to determine which features to test first. We performed classification tasks with a Decision Tree [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and a traditional split between train (80%) and test (20%). Following the decision rules, we detected that some relevant rules are governed by the "gender" or "ethnicity" of the person, features that should not be included in the classification process. For this reason, we generated instances at distance 1 for the two features and reclassified all instances in the test set. Experiments showed that more than 5% of the people in the test set were classified differently just by changing gender or ethnicity. Specifically, 3% of the population changed classification based on gender and 2% on ethnicity, and the intersection is empty. Experiments have been conducted on an Intel(R) Core(TM) i7-1065G7 CPU (single-core) with 16GB of RAM. Each node has been compared with all the others, so the number of performed comparisons was ∑_{k=1}^{n−1} k ≃ 4000 given n = 690. The overall execution time was about 4 minutes, which shows that about 16 distances per second are computed.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this work, we proposed a novel method to evaluate whether there are potential biases in a dataset, exploiting a generative, distance-based strategy that can be applied to ontology-guided data. Part of the generative implementation is still ongoing and new experiments need to be conducted. Future work includes applying ML to automatically select suspected biased features in the training process.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R. d. V.</given-names>
            <surname>dos Santos Júnior</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. V. V.</given-names>
            <surname>Coelho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A. A.</given-names>
            <surname>Cacho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S. A.</given-names>
            <surname>de Araújo</surname>
          </string-name>
          ,
          <article-title>A criminal macrocause classification model: An enhancement for violent crime analysis considering an unbalanced dataset</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>238</volume>
          (
          <year>2024</year>
          )
          <fpage>121702</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Dietterich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. B.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <article-title>Machine learning bias, statistical bias, and statistical variance of decision tree algorithms</article-title>
          (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Menzies</surname>
          </string-name>
          ,
          <article-title>Bias in machine learning software: Why? how? what to do?</article-title>
          ,
          <source>in: Proceedings of the 29th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>429</fpage>
          -
          <lpage>440</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mehrabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Morstatter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Galstyan</surname>
          </string-name>
          ,
          <article-title>A survey on bias and fairness in machine learning</article-title>
          ,
          <source>ACM computing surveys (CSUR) 54</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kamishima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Akaho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Asoh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sakuma</surname>
          </string-name>
          ,
          <article-title>Fairness-aware classifier with prejudice remover regularizer</article-title>
          ,
          <source>in: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD</source>
          <year>2012</year>
          ,
          <article-title>Bristol</article-title>
          , UK,
          <source>September 24-28</source>
          ,
          <year>2012</year>
          . Proceedings,
          <source>Part II 23</source>
          , Springer,
          <year>2012</year>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yaghini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Faltings</surname>
          </string-name>
          ,
          <article-title>Non-discriminatory machine learning through convex fairness criteria</article-title>
          ,
          <source>in: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>116</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vishnoi</surname>
          </string-name>
          ,
          <article-title>Stable and fair classification</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2879</fpage>
          -
          <lpage>2890</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Menon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Williamson</surname>
          </string-name>
          ,
          <article-title>The cost of fairness in binary classification</article-title>
          ,
          <source>in: Conference on Fairness, accountability and transparency, PMLR</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Price</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Srebro</surname>
          </string-name>
          ,
          <article-title>Equality of opportunity in supervised learning</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>29</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ustun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parkes</surname>
          </string-name>
          ,
          <article-title>Fairness without harm: Decoupled classifiers with preference guarantees</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>6373</fpage>
          -
          <lpage>6382</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Zafar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Valera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Gummadi</surname>
          </string-name>
          ,
          <article-title>Fairness constraints: Mechanisms for fair classification</article-title>
          ,
          <source>in: Artificial intelligence and statistics</source>
          , PMLR,
          <year>2017</year>
          , pp.
          <fpage>962</fpage>
          -
          <lpage>970</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Adams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hagras</surname>
          </string-name>
          ,
          <article-title>A type-2 fuzzy logic approach to explainable ai for regulatory compliance, fair customer outcomes and market stability in the global financial sector</article-title>
          ,
          <source>in: 2020 IEEE international conference on fuzzy systems (FUZZ-IEEE)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Zadeh</surname>
          </string-name>
          ,
          <article-title>Knowledge representation in fuzzy logic</article-title>
          ,
          <source>in: An introduction to fuzzy logic applications in intelligent systems</source>
          , Springer,
          <year>1992</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sahni</surname>
          </string-name>
          ,
          <article-title>String correction using the damerau-levenshtein distance</article-title>
          ,
          <source>BMC bioinformatics 20</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>De Ville</surname>
          </string-name>
          ,
          <article-title>Decision trees</article-title>
          ,
          <source>Wiley Interdisciplinary Reviews: Computational Statistics</source>
          <volume>5</volume>
          (
          <year>2013</year>
          )
          <fpage>448</fpage>
          -
          <lpage>455</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>