<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning Preferences for Collaboration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eva Armengol</string-name>
          <email>eva@iiia.csic.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Artificial Intelligence Research Institute (IIIA-CSIC)</institution>
          ,
          <addr-line>Campus de la UAB, 08193 Bellaterra, Catalonia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper we propose the acquisition of a set of collaboration preferences between classifiers based on decision trees. A classifier uses a well-known algorithm (k-NN with leave-one-out) on its own knowledge base to generate a set of tuples with information about the object to be classified, the number of similar precedents, the maximum similarity, and whether or not it is a situation of collaboration. We consider that a classifier does not collaborate when it is able to reach the correct classification for an object by itself; otherwise it has to collaborate. This set of tuples is given as input to generate a decision tree from which a set of collaboration preferences is obtained.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine learning</kwd>
<kwd>Classification</kwd>
        <kwd>Learning preferences</kwd>
        <kwd>Collaboration</kwd>
        <kwd>Decision trees</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In Machine Learning the idea of cooperation between entities appears with the
formation of ensembles. An ensemble is composed of several classifiers (using
inductive learning methods), each of them capable of completely solving
a problem. Since classifiers can provide different solutions for the same problem,
the key issue of ensembles is how to aggregate the solutions proposed by the
different classifiers. Perrone and Cooper [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] proved that aggregating the solutions
obtained by independent classifiers improves the accuracy of each classifier on its
own. In that approach the cooperation among entities consists of both sharing
the results for the same problem and reaching an aggregated solution.
      </p>
      <p>
        Plaza and Ontañón [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] take the idea of ensemble learning and apply it to
multi-agent systems. These authors define a committee as an ensemble of agents
where each agent has its own experience and is capable of completely solving
new problems. Each agent in a committee can solve problems on its own, but it can also
collaborate with other agents in order to improve its accuracy. The difference
between this approach and the most common approaches to multi-agent learning
systems (MALS) is that in a committee each agent is able to completely solve a
problem, whereas in MALS approaches each agent solves a part of a problem.
      </p>
      <p>
        Related to the idea of ensembles there is also the idea of meta-learning, whose
aim is to construct a classifier from distributed knowledge bases. The idea is
to combine the predictions of an ensemble of classifiers in order to obtain a
global classifier. This global classifier establishes what could be seen as a set
of preferences (since it is not a simple aggregation procedure) to give the final
classification. Prodromidis et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] analyzed meta-learning and gave a simplified
meta-learning scenario composed of the following phases:
      </p>
      <sec id="sec-1-1">
        <title>1. the base classi ers are trained from the data,</title>
        <p>2. each classi er generates independently a prediction for the data on a separate
test set,
3. a meta-level training set is constructed from the test set and the predictions
generated by each classi er on the test set,
4. the meta-classi er is trained from the meta-level training set.</p>
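      <p>As an illustration only, the following sketch instantiates these four phases; the estimator choices and the scikit-learn usage are assumptions of ours, not part of the cited work, and the class labels are assumed to be numerically encoded:</p>
      <preformat>
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

def meta_learn(X, y):
    # Phase 1: train the base classifiers from the data.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3)
    base = [DecisionTreeClassifier().fit(X_tr, y_tr),
            KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)]
    # Phase 2: each classifier independently predicts on the separate test set.
    preds = [clf.predict(X_te) for clf in base]
    # Phase 3: the predictions on the test set form the meta-level training set.
    meta_X = np.column_stack(preds)
    # Phase 4: the meta-classifier is trained from the meta-level training set.
    meta = LogisticRegression().fit(meta_X, y_te)
    return base, meta
      </preformat>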
        <p>
          What we propose in this paper is similar to both ensembles and
meta-learning. As in ensembles, our goal is to solve a new problem, and we take the
approach proposed by Plaza and Ontañón [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], that is to say, each classifier does
not solve a part of a problem (as in the most common ensemble approaches)
but can completely solve the problem. The metaphor of our approach is the
following: let us suppose that a physician has to diagnose a patient but does
not have enough experience to do so. The most usual behavior would be for this
physician to ask other colleagues for advice in diagnosing that patient. As the
physician interacts with others to solve problems that initially were outside
his experience, he in turn acquires experience with this kind of problem and,
consequently, the interaction with other experts is reduced.
        </p>
        <p>
          In a previous work [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] we implemented this scenario and proposed that
agents can benefit from the collaboration with other agents by learning
domain knowledge. Our point was that if agents are able to justify the solutions
they provide, then agents receiving these justifications can use them as new
domain knowledge (like domain rules). The idea of benefiting from
cooperation between learners was pointed out by Provost and Hennessy [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. These
authors showed how individual learners can share domain rules in order to build
a common domain theory from distributed data and how this improves the
performance of the whole system.
        </p>
        <p>
          In the current paper we are interested in showing how an individual agent can
improve its own domain knowledge through collaboration with other agents. To
do that we analyze a question that has to be answered
before starting the machinery described in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]: when does an agent prefer to ask
for collaboration instead of giving the classification it has reached using its own
experience? In the previous work we assumed that this collaboration takes place
when the classification does not have enough support. However, now we take a closer
look at the agent's own competence and learn the situations in which the agent prefers
the classification it has obtained and those in which it prefers to ask other agents.
        </p>
        <p>The paper is organized as follows. In Section 2 we present the scenario and
introduce the elements that will be used as input for learning. In Section 3 we
describe the procedure to construct the preference rules for collaboration.</p>
    </sec>
    <sec id="sec-2">
      <title>Scenario</title>
      <p>
        Let us suppose a classifier capable of solving problems of a given domain. Domain
objects are described by sets of attribute-value pairs and each object has
an associated class label belonging to a set C = {C1, ..., Cn}. We assume that all the
domain objects are described using the same set of attributes. The experience
of the classifier is a knowledge base containing domain objects together with
their class labels, i.e., pairs ⟨Oi, Cj⟩. Given a problem p to classify, the classifier uses
the k-Nearest Neighbor (k-NN) algorithm [
        <xref ref-type="bibr" rid="ref4 ref5">5, 4</xref>
        ] on the knowledge base to obtain
the class label for p. The k-NN algorithm uses a similarity measure to assess
the similarity between the object p and each one of the domain objects in the
knowledge base. The outcome of k-NN is the set of the k objects most similar
to p.
      </p>
      <p>
        All that the classifier knows about its own knowledge are the problems in its
knowledge base. Therefore the knowledge base is the only source from which it
can learn about its own competence. The procedure we propose now is similar to
the one described in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], but here the classifier tries to solve its own problems.
In other words, for each ⟨Oi, Ci⟩ in its knowledge base, the classifier takes Oi and
uses k-NN to classify it. Let us suppose that k-NN proposes the class Cj as the
classification for Oi. In such a situation there are three possible scenarios:
1. Cj = Ci, i.e., Oi has been classified correctly.
2. Cj ≠ Ci, i.e., Oi has been classified incorrectly.
3. k-NN proposes more than one class for the object.
      </p>
      <p>This procedure can be applied either to all the objects of the knowledge base
(using leave-one-out) or to a selected subset of objects. Each object Oi is a
domain object described by a set of attributes A = {a1, ..., an}, each one with
a value that may be either numeric or symbolic. For each object Oi in the
knowledge base the classifier generates a tuple as follows:</p>
      <p>⟨Oi.a1, ..., Oi.an, Cj, α, sim, action⟩
where the notation Oi.al stands for the value that the object Oi takes in the
attribute al; Cj is the classification of Oi using the majority rule; α is the number
of examples used by k-NN to reach the solution; sim is the maximum
similarity between Oi and those examples; and action is either collaboration or
no-collaboration. When Oi is solved correctly, the action is no-collaboration
(meaning that the classifier is able to solve the problem Oi correctly with its own
knowledge); otherwise the action is collaboration (meaning that the classifier is
not able to solve the problem Oi correctly with its own knowledge).</p>
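      <p>A minimal sketch of how such a tuple can be assembled (the function and argument names are ours, for illustration):</p>
      <preformat>
def make_tuple(obj_values, correct_class, predicted_class, alpha, max_sim):
    """Assemble one training tuple: (a1, ..., an, Cj, alpha, sim, action).
    The classifier collaborates exactly when it could not reach the
    correct class on its own."""
    action = ("no-collaboration" if predicted_class == correct_class
              else "collaboration")
    return (*obj_values, predicted_class, alpha, max_sim, action)

# e.g. make_tuple(("spots", "round", "big"), "eatable", "poisonous", 5, 0.9)
# returns ("spots", "round", "big", "poisonous", 5, 0.9, "collaboration")
      </preformat>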
      <p>
        From the set of all these tuples, the classifier constructs a decision tree that
allows it to learn preferences about two situations: a) when to collaborate with
other classifiers, and b) when the classifier prefers its own solution. This approach
is similar to the one proposed in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], where the authors use a decision tree to
compute the confidence degree of the classification given for an object. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
the examples used to construct the decision tree are tuples of three elements,
therefore the tree has at most three levels, meaning that the leaves may contain
elements belonging to both classes. In our approach, the tuples we use to
construct the decision tree have n + 3 components (the n attributes plus α, sim
and action). Also, the preferences we obtain are (or may be) in terms of some
of the attributes describing the objects.
      </p>
      <sec id="sec-2-1">
        <title>Similarity between objects</title>
        <p>
          The k-NN algorithm uses a similarity measure to retrieve the k objects that are
most similar to a given object Oi. The most common similarity measure
used in k-NN when objects are represented as sets of attribute-value pairs and
the values are numerical is the Euclidean distance (although other measures are
also used, see for instance [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]). When the values of the attributes are symbolic,
the most common similarity measure is the following one:
$$sim(O_i.a_l, O_j.a_l) = \begin{cases} 1 &amp; \text{if } O_i.a_l = O_j.a_l \\ 0 &amp; \text{otherwise} \end{cases}$$
Therefore the similarity between objects Oi and Oj is computed as follows:
$$sim(O_i, O_j) = \frac{1}{n} \sum_{l=1}^{n} sim(O_i.a_l, O_j.a_l)$$
In other words, the similarity of both objects is the number of attributes taking
the same value in both objects, normalized by the number of attributes
describing the objects.
        </p>
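        <p>A direct rendition of this measure (a sketch; objects are assumed to be represented as equal-length tuples of symbolic values):</p>
        <preformat>
def attr_sim(v1, v2):
    # Overlap measure for symbolic values: 1 if equal, 0 otherwise.
    return 1.0 if v1 == v2 else 0.0

def sim(obj_i, obj_j):
    """Fraction of attributes on which the two objects agree."""
    assert len(obj_i) == len(obj_j)
    return sum(attr_sim(a, b) for a, b in zip(obj_i, obj_j)) / len(obj_i)

# Two mushroom descriptions agreeing on 2 of 3 attributes:
# sim(("spots", "planar", "big"), ("spots", "round", "big")) == 2/3
        </preformat>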
      </sec>
      <sec id="sec-2-2">
        <title>Retrieving a subset of similar objects</title>
        <p>Commonly, the k-NN algorithm retrieves the k objects most similar to a given one.
However, what we propose is to retrieve all the objects with similarity greater than or equal
to a given similarity threshold h. When a problem pi is given to the
classifier, h is the threshold under which objects are not considered similar enough
to pi and are rejected. This means that the number of retrieved objects may
be different for each input problem. The closer h is to 1, the more confident the
classification. For instance, let us suppose a knowledge base containing the
objects O1, O2, O3 and O4, and the problems p1 and p2 to be classified. Table 1
shows the similarity between p1 and p2 and the objects of the knowledge base.
If we take h = 0.80, when solving p1 the classifier retrieves α = 2 similar objects
(only O2 and O4 have similarity greater than or equal to h); however, when solving
p2 the classifier retrieves α = 3 similar objects (only O4 has similarity lower than
h).</p>
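        <p>Reusing the sim function sketched in Section 2.1, threshold-based retrieval can be written as follows (a sketch; the knowledge base is assumed to be a list of (object, class) pairs):</p>
        <preformat>
def retrieve(p, knowledge_base, h):
    """All knowledge-base entries whose similarity to problem p reaches
    the threshold h; the number retrieved varies per problem."""
    scored = [(sim(p, o), o, c) for o, c in knowledge_base]
    return [(s, o, c) for s, o, c in scored if s >= h]

# alpha is then len(retrieve(p, kb, h)), and the maximum similarity is
# max(s for s, _, _ in retrieve(p, kb, h)) when the result is non-empty.
        </preformat>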
        <p>Let P be the set of objects in the knowledge base whose similarity to pi
is greater than or equal to h. There are four possible situations concerning the
elements of P:
1. For every object oi ∈ P the solution class is the same. If the class is the correct
one, then there is a situation of no collaboration; otherwise the algorithm has
retrieved similar objects but the classification proposed is not the correct
one. These cases are especially useful since they belong to a region where
the classes are similar. Therefore, when the problem to be solved belongs to
one of these regions the classifier has to prefer to collaborate with other classifiers,
since its own classification may be incorrect.
2. The majority of the objects in P belong to the same class. In this situation
the classifier has to collaborate when the majority class is not the correct
one. Both sim and α give an idea of how strong this classification is.
3. There is a tie between two solution classes, i.e., there are two classes with the
same number of elements in P. This is a situation of collaboration because
the classifier does not have enough information to classify pi.
4. The algorithm does not retrieve any object, meaning that there are no
objects in the knowledge base similar enough to the new problem to be
solved. In this situation the classifier also prefers to collaborate with others
rather than give its own classification based on objects that are not similar enough.</p>
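        <p>The four situations collapse into a single labelling rule, sketched below (the function name is ours; retrieved holds (similarity, class) pairs for the elements of P):</p>
        <preformat>
from collections import Counter

def choose_action(retrieved, correct_class):
    """Label a situation as collaboration / no-collaboration following
    the four cases above."""
    if not retrieved:                  # case 4: nothing similar enough
        return "collaboration"
    counts = Counter(c for _, c in retrieved).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "collaboration"         # case 3: tie between two classes
    majority = counts[0][0]
    # cases 1 and 2: keep the majority class only when it is correct
    return "no-collaboration" if majority == correct_class else "collaboration"
        </preformat>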
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Learning Preferences for Collaboration</title>
      <p>From the procedure described in the previous section, the classifier acquires a set
of tuples describing situations of collaboration/no collaboration. We assume that
only when a problem has been solved correctly does it give a situation of no
collaboration; otherwise the classifier should collaborate. The tuples can be used to
construct a decision tree with the goal of inducing a general model of collaboration.</p>
      <p>As we have already explained in Section 2, the tuples have the form
⟨Oi.a1, ..., Oi.an, Cj, α, sim, action⟩.
Notice that, because we assumed that all the objects are described using the
same set of attributes A, all the tuples have the same length. For convenience,
we suppose that there are no attributes with unknown values. However, when
an object has an unknown value in an attribute (say Oi.aj) we can take the option
that the corresponding position of the tuple (i.e., position j) holds the
value unknown. In other words, unknown plays the same role as any other
value. Let us analyze these elements in more detail. The first n elements of the
tuple, Oi.a1, ..., Oi.an, are the values that the object Oi takes in each one of its
attributes.</p>
      <p>The element Cj of the tuple is the class to which the majority of the
elements of P belong. Therefore the tuple contains the class to which Oi is assigned
by the k-NN algorithm with the majority rule. When there is a tie between
two classes we consider that Cj = ∅.</p>
      <p>The element α of the tuple is the cardinality of P, i.e., the number of elements
of the knowledge base that have a similarity greater than or equal to the given
threshold h. The number α is related to the threshold h and gives information
about the knowledge base. For instance, if h has to be low in order to obtain
α ≠ 0, this means that the object Oi is not very similar to any of the elements
in the knowledge base.</p>
      <p>The element sim is the maximum similarity between Oi and the objects of the
knowledge base. Although we give a threshold, it is possible that the object Oi
has a very high similarity with some of the objects in the knowledge base. We
want to take this fact into account, especially when the classification has been
incorrect. Notice that these cases (high similarity and incorrect classification)
mean that the knowledge base does not have enough objects to clearly distinguish the
classes involved. In the example shown in Table 1, taking h = 0.85, the maximum
similarity of the objects retrieved when solving p1 is 0.90 and when solving p2 it is
0.91.</p>
      <p>The element action plays the role of the class label. It can take two values:
collaboration or no-collaboration. As we have already mentioned, the classifier will
prefer to collaborate when the classification reached for an object is either
incorrect or a tie. It also prefers to collaborate when there are no objects in the
knowledge base similar enough to the problem Oi.</p>
      <p>From the set T of tuples obtained with the procedure described above, we
propose to construct a decision tree to induce rules describing preferences of
collaboration/no collaboration. In the next section we describe the
process of constructing a decision tree in some detail.</p>
      <sec id="sec-3-1">
        <title>Construction of Decision Trees</title>
        <p>A Decision Tree (DT) is a kind of directed acyclic graph in the form of a tree.
The root of the tree has no incoming edges and the remaining nodes have exactly
one incoming edge. Nodes without outgoing edges are called leaf nodes and the
remaining ones are internal nodes. A DT is commonly used to create a domain
model predictive enough to classify future unseen domain objects.</p>
        <p>
          The construction of a decision tree is performed by splitting the source set of
examples. This process is repeated on each derived subset in a recursive manner
called recursive partitioning. Figure 1 shows the algorithm (for more details see
[
          <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
          ]) commonly used to construct decision trees. It is assumed that domain
objects are represented by means of a set of attribute-value pairs. For instance,
a mushroom can be described using the attributes texture, form, and size, and each
one of these attributes can take a value. Therefore, the following is a description
of a particular mushroom using a set of attribute-value pairs:
((texture = spots)(form = planar)(size = big))
        </p>
        <p>ID3(examples, attributes)
  create a node
  if all examples belong to the same class,
    return the class as the label for the node
  otherwise
    A ← the best attribute in attributes
    for each possible value vi of A
      add a new tree branch below the node
      examples_vi ← the subset of examples such that A = vi
      ID3(examples_vi, attributes − {A})
    return the node</p>
        <p>The values of the attributes may be continuous or categorical. The
description of the mushroom above is categorical (i.e., the values of the attributes are
labels). Examples of continuous-valued attributes are the height and the weight of
a person. Each tree node represents an attribute ai selected by some criterion and
each arc is followed according to the value of ai. For instance, Fig. 2 shows an
example classifying mushrooms as eatable or poisonous. The attributes describing a
mushroom are texture, form, and size. The most relevant attribute for classifying
a mushroom is texture, since if it is smooth the mushroom can be classified as
eatable. Otherwise the node has to be expanded. The next relevant attribute is form,
with two possible values: planar, corresponding only to poisonous mushrooms,
and round, which is a characteristic shared by both classes of mushrooms. Finally,
the attribute size allows a perfect classification of all the known mushrooms.</p>
        <p>Each node of the tree has an associated set of examples, namely those satisfying
the path from the root to that node. For instance, the node size of the tree shown
in Fig. 2 has associated all the examples having texture = spots and form = round.</p>
        <p>From a decision tree we can extract rules giving descriptions of classes. For
instance, some eatable mushrooms are described by means of the rule:
if texture=spots and form=round and size=small then eatable.</p>
        <p>A key issue in the construction of decision trees is the selection of the most
relevant attribute for splitting a node. This selection is made by means of a distance
measure. Each measure uses a different criterion, therefore the selected attribute
may differ depending on it, and thus the whole tree may also differ.
The most common measures are based on the degree of impurity of a node. That
is to say, they compute the proportion of examples of each class contained in
a node. The goal is to obtain nodes (the leaves of the tree) having examples of
only one class, that is to say, with impurity zero. Intermediate nodes become
purer the closer they are to the leaves, meaning that they are better able to
differentiate the classes.</p>
        <p>Impurity measures compare the impurity of a node, say t, with the impurity
of the children nodes t1, ..., tk generated by an attribute ai. This comparison is
done for each one of the attributes used to represent the domain objects.</p>
        <p>[Fig. 2: decision tree for mushrooms, equivalent to the rules: texture = smooth : eatable; texture = spots, form = planar : poisonous; texture = spots, form = round, size = small : eatable; texture = spots, form = round, size = big : poisonous.]</p>
        <p>The general expression to calculate the gain associated to an attribute ai is the
following:
$$\Delta(a_i) = I(t) - \sum_{j=1}^{k} \frac{N(t_j)}{N} I(t_j)$$
where I(·) is an impurity measure, N is the total number of examples associated
to the parent node t, k is the number of different values taken by ai, and N(tj)
is the number of examples associated with the child node tj.</p>
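        <p>Using entropy as the impurity measure I(t) (one common choice; the expression accepts any impurity measure), the gain can be computed as in the following sketch:</p>
        <preformat>
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a non-empty list of class labels: I(t)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(parent_labels, children_labels):
    """Delta(a_i) = I(t) - sum_j (N(t_j)/N) * I(t_j), where the children
    are the example subsets induced by the values of attribute a_i."""
    n = len(parent_labels)
    return entropy(parent_labels) - sum(
        (len(child) / n) * entropy(child) for child in children_labels)
        </preformat>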
      </sec>
      <sec id="sec-3-2">
        <title>Example</title>
        <p>
          We have performed some preliminary experiments using the procedure described
in the previous sections on the Soybean data set from the UCI Machine Learning
Repository [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The Soybean dataset contains around 300 domain objects
distributed over 18 solution classes and described by means of 35 categorical
attributes without unknown values.
        </p>
        <p>The first step of our approach is to generate a model of the classifier's
capabilities using the leave-one-out method. Leave-one-out is an evaluation technique,
commonly used in machine learning, that applies the following procedure to each
object ⟨Oi, Ci⟩ in the knowledge base:</p>
        <sec id="sec-3-2-1">
          <title>1. Take only the description Oi.</title>
          <p>2. Use the classi er to achieve a classi cation for Oi using the remaining objects
of the knowledge base.
3. Let Cj be the classi cation proposed for Oi. If Cj = Ci then the classi cation
for the object Oi is correct; otherwise the classi cation is incorrect.</p>
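        <p>Combined with the sketches of the previous sections (sim, retrieve, choose_action and make_tuple), the whole model-building step can be written as follows:</p>
        <preformat>
from collections import Counter

def leave_one_out_tuples(knowledge_base, h):
    """Classify every object against the rest of the knowledge base and
    record one training tuple per object; a sketch reusing the helper
    functions defined earlier."""
    tuples = []
    for obj, label in knowledge_base:
        rest = [(o, c) for o, c in knowledge_base if o is not obj]
        retrieved = retrieve(obj, rest, h)
        alpha = len(retrieved)
        max_sim = max((s for s, _, _ in retrieved), default=0.0)
        counts = Counter(c for _, _, c in retrieved).most_common()
        tie = len(counts) > 1 and counts[0][1] == counts[1][1]
        predicted = None if (not counts or tie) else counts[0][0]
        tuples.append(make_tuple(obj, label, predicted, alpha, max_sim))
    return tuples
        </preformat>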
          <p>In our approach, when an object Oi is classified correctly it is labelled as
belonging to the class no-collaboration; otherwise it is labelled as collaboration.
For instance, a tuple generated in this process is the following:
⟨April, LT-normal, GT-normal, no-hail, . . . , normal, 11, 0.88, collaboration⟩ (1)
where the first part is composed of 35 values corresponding to the 35 attributes
of the description of the domain objects, and the three last values indicate that:
1) the classifier has based its classification on 11 objects of the knowledge base, 2)
the maximum similarity between the new object and the most similar retrieved
object is 0.88, and 3) the classification has been incorrect, therefore the classifier
has labelled the object as collaboration.</p>
          <p>[Fig. 3: decision tree of collaboration preferences. Max-sim &lt;= 0.828 leads to collaboration; for Max-sim &gt; 0.828 the tree branches on Stem (normal/abnormal), Canker_lesion (tan/DNA-lesion) and Temp (LT-normal/GT-normal/normal) before deciding between collaboration and no-collaboration.]</p>
          <p>
            Since the knowledge base has 300 objects, at the end of the leave-one-out process
we have obtained 300 tuples like the one shown in (1). These 300 tuples
have been given as input to construct a decision tree. We used the J48 algorithm [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]
implemented in Weka [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] to generate the decision tree. The J48 algorithm is, in
fact, the ID3 algorithm proposed by Quinlan [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ] evolved to be able to deal with
both categorical and continuous attributes without previous discretization.
          </p>
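          <p>A rough equivalent of this step outside Weka, sketched here with scikit-learn (which offers CART rather than J48/C4.5, so the exact tree may differ): the symbolic fields are one-hot encoded by hand while α and sim stay numeric, so the tree can still learn thresholds such as Max-sim &lt;= 0.828.</p>
          <preformat>
from sklearn.tree import DecisionTreeClassifier, export_text

def preference_tree(tuples, n_symbolic):
    """Fit a decision tree over the collaboration tuples; the first
    n_symbolic fields (the attribute values and Cj) are categorical,
    the remaining fields before the action label are numeric."""
    vocab = [sorted({t[i] for t in tuples}, key=str)
             for i in range(n_symbolic)]
    def encode(t):
        row = [1.0 if t[i] == v else 0.0
               for i in range(n_symbolic) for v in vocab[i]]
        return row + [float(x) for x in t[n_symbolic:-1]]  # alpha, sim
    X = [encode(t) for t in tuples]
    y = [t[-1] for t in tuples]
    tree = DecisionTreeClassifier().fit(X, y)
    print(export_text(tree))  # textual form of the induced preference rules
    return tree
          </preformat>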
          <p>We have experimented with different similarity thresholds. The decision tree
shown in Fig. 3 is the one corresponding to the similarity threshold h =
0.80. This tree shows that the classifier prefers to collaborate when: 1) the similarity of the most
similar object is under 0.828 (in fact between 0.80 and 0.828, since h is the lower
threshold given as input) or, 2) when the similarity is higher than 0.828 and the
object to be classified has Stem = normal, Canker_lesion = DNA-lesion and either
Temp = LT-normal or Temp = normal. That is, in addition to the similarity, the
description of the object is also taken into account to decide when to collaborate.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Future Work</title>
      <p>
        The work introduced in the current paper opens several interesting lines of future
research. First of all, we plan to integrate a classifier like the one described in this
paper into a system formed by n other classifiers. Each classifier forming the
system is capable of completely solving a problem on a given domain and uses
objects described by means of a common representation (i.e., with the same
set of attributes). Moreover, each classifier has a model of its own capability in
solving a problem. The general idea is that when one of the classifiers, say Clk,
has to solve a problem p, the first step is to use the tree of preferences in order to
detect whether Clk is capable of solving p using its own knowledge. If the model labels p
as collaboration, Clk will ask all the other classifiers for collaboration. The easy case
is to assume that all the classifiers propose a class and that the classification
for p is the class proposed by the majority of the classifiers. This approach
would be similar to the one used in ensemble learning [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. More complicated
cases could occur when only a few (or none) of the remaining classifiers are able to
propose a classification for p. These cases should be analyzed in detail. We also
plan to evaluate a system like the one described above in depth in order to check its
accuracy. In particular, we expect that solving problems in collaboration between
classifiers produces higher accuracy than relying on a single one.
      </p>
      <p>Another interesting issue is that each classifier could have a model of the capabilities
of each of the other classifiers of the system. This model could be constructed in the
same way described in this paper and its utility would be twofold: 1) a classifier
would know in advance which classifier will most probably give a correct
classification and, consequently, 2) it would not be necessary to ask all the classifiers
in the system. The second issue is especially interesting when the system is
formed by a high number of classifiers, since such filtering reduces the
communication load between the classifiers.</p>
      <p>A third line of research is that each classifier gives, in addition to the
classification for p, its confidence in that classification. Such confidence could be
obtained from the parameters α and sim of the tuple. High values of both α
and sim mean that p has been classified taking into account many known
examples, all of them with high similarity to p, therefore the classification is highly
confident. In such a situation, the final classification for p could be obtained by
means of a weighted aggregation of the classifications proposed by the system's
classifiers. This same confidence could also be used by a classifier to assess its
own capability in classifying p.</p>
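      <p>A minimal sketch of such a weighted aggregation (the confidence values and their derivation from α and sim are assumptions of ours, not something fixed by the paper):</p>
      <preformat>
from collections import defaultdict

def weighted_vote(proposals):
    """Aggregate (class, confidence) proposals from several classifiers
    and return the class with the largest total confidence."""
    totals = defaultdict(float)
    for cls, confidence in proposals:
        totals[cls] += confidence
    return max(totals, key=totals.get)

# weighted_vote([("C1", 0.9), ("C2", 0.4), ("C1", 0.3)])  ->  "C1"
      </preformat>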
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>The author thanks Àngel García-Cerdaña, Pilar Dellunde and the anonymous
reviewers for their helpful comments and suggestions. The author also acknowledges
support from the Spanish MICINN projects EdeTRI (TIN2012-39348-C02-01) and
COGNITIO (TIN2012-38450-C03-03) and the grant 2014SGR-118 from the
Generalitat de Catalunya.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>E.</given-names>
            <surname>Armengol</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Puertas</surname>
          </string-name>
          .
          <article-title>Learning from cooperation using justi cations</article-title>
          . In M. Polit,
          <string-name>
            <given-names>T.</given-names>
            <surname>Talbert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lopez</surname>
          </string-name>
          , and J. Melendez, editors,
          <source>Arti cial Intelligence Research and Development</source>
          , pages
          <volume>47</volume>
          {
          <fpage>54</fpage>
          . IOS Press,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Asuncion</surname>
          </string-name>
          and
          <string-name>
            <surname>D. Newman.</surname>
          </string-name>
          <article-title>UCI machine learning repository</article-title>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>N.</given-names>
            <surname>Bhargava</surname>
          </string-name>
          , G. Sharma,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bhargava</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Mathuria</surname>
          </string-name>
          .
          <article-title>Decision tree analysis on J48 algorithm for data mining</article-title>
          .
          <source>International Journal of Advanced Research in Computer Science and Software Engineering</source>
          ,
          <volume>3</volume>
          (
          <issue>6</issue>
          ):
          <volume>1114</volume>
          {
          <fpage>1119</fpage>
          ,
          <year>June 2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>B. V.</given-names>
            <surname>Dasarathy</surname>
          </string-name>
          .
          <article-title>Handbook of Data Mining and Knowledge Discovery, chapter Data Mining Tasks and Methods: Classi cation: Nearest-neighbor Approaches</article-title>
          , pages
          <volume>288</volume>
          {
          <fpage>298</fpage>
          . Oxford University Press, Inc., New York, NY, USA,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Dasarathy</surname>
          </string-name>
          .
          <article-title>Nearest Neighbor Norms: NN Pattern Classi cation Techniques</article-title>
          . IEEE Press,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Hall</surname>
          </string-name>
          , E. Frank,
          <string-name>
            <given-names>G.</given-names>
            <surname>Holmes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reutemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <article-title>The Weka data mining software: An update</article-title>
          .
          <source>SIGKDD Explorations Newsletter</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <volume>10</volume>
          {
          <fpage>18</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>T. W.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Mount</surname>
          </string-name>
          .
          <article-title>Similarity measures for retrieval in case-based reasoning systems</article-title>
          .
          <source>Applied Arti cial Intelligence</source>
          ,
          <volume>12</volume>
          (
          <issue>4</issue>
          ):
          <volume>267</volume>
          {
          <fpage>288</fpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. S. Ontan~on and
          <string-name>
            <given-names>E.</given-names>
            <surname>Plaza</surname>
          </string-name>
          .
          <article-title>Learning when to collaborate among learning agents</article-title>
          . In L. D. Raedt and P. A. Flach, editors,
          <source>ECML</source>
          , volume
          <volume>2167</volume>
          of Lecture Notes in Computer Science, pages
          <volume>394</volume>
          {
          <fpage>405</fpage>
          . Springer,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Perrone</surname>
          </string-name>
          and
          <string-name>
            <given-names>L. N.</given-names>
            <surname>Cooper</surname>
          </string-name>
          .
          <article-title>When networks disagree: Ensemble methods for hybrid neural networks</article-title>
          . In R. J. Mammone, editor,
          <source>Neural Networks for Speech and Image Processing</source>
          , pages
          <volume>126</volume>
          {
          <fpage>142</fpage>
          .
          <string-name>
            <surname>Chapman-Hall</surname>
          </string-name>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. E. Plaza and
          <string-name>
            <surname>S.</surname>
          </string-name>
          <article-title>Ontan~on. Ensemble case-based reasoning: Colaboration policies for multiagent cooperative CBR. In I. Watson and Q</article-title>
          . Yang, editors,
          <source>CBR Research and Development: ICCBR-2001</source>
          , volume
          <year>2080</year>
          , pages
          <fpage>437</fpage>
          {
          <fpage>451</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.</given-names>
            <surname>Prodromidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Stolfo</surname>
          </string-name>
          .
          <article-title>Meta-learning in distributed data mining systems: Issues and approaches</article-title>
          . In H. Kargupta and P. Chan, editors,
          <source>Book on Advances of Distributed Data Mining</source>
          . AAAI press,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Provost</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. N.</given-names>
            <surname>Hennessy</surname>
          </string-name>
          .
          <article-title>Scaling up: Distributed machine learning with cooperation</article-title>
          .
          <source>In Proceedings of the 13th AAAI/IAAI</source>
          , Volume
          <volume>1</volume>
          , pages
          <fpage>74</fpage>
          {
          <fpage>79</fpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>E.</given-names>
            <surname>Puertas</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Armengol</surname>
          </string-name>
          .
          <article-title>Inducing domain theory from problem solving in a multi-agent system</article-title>
          . In J. Vitria,
          <string-name>
            <given-names>P.</given-names>
            <surname>Radeva</surname>
          </string-name>
          , and I. Aguilo, editors,
          <source>Recent Advances in Arti cial Intelligence Research and Development</source>
          , pages
          <volume>325</volume>
          {
          <fpage>332</fpage>
          . IOS Press,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>J. Quinlan</surname>
          </string-name>
          .
          <article-title>Discovering rules by induction from large collection of examples</article-title>
          .
          <source>In Expert Systems in the Microelectronic Age. D. Michie (Ed.)</source>
          , pages
          <fpage>168</fpage>
          {
          <fpage>201</fpage>
          . Edimburg University Press,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Quinlan</surname>
          </string-name>
          .
          <article-title>Induction of decision trees</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <volume>81</volume>
          {
          <fpage>106</fpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>