<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Big Data Classification and Mining for the Decision-making 2.0 Process</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rhizlane Seltani</string-name>
          <email>sel.rhizlane@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Souad Amjad</string-name>
          <email>amjad_souad@uae.ma</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Noura Aknin</string-name>
          <email>aknin@ieee.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kamal Eddine El Kadiri</string-name>
          <email>elkadiri@uae.ma</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science, Operational Research and Applied Statistics Laboratory, Faculty of Science, Abdelmalek Essaadi University, Tetuan</institution>
          ,
          <country country="MA">Morocco</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>29</fpage>
      <lpage>33</lpage>
      <abstract>
        <p>Web 2.0 is a revolution that has affected all areas, especially new technology. Several new concepts have emerged, and a large number of innovative applications continue to come out every day. However, social networking remains the workhorse of web 2.0, giving the user both a space for communication and a space for information sharing, which generates very large amounts of variable data characterized by a great creation speed. We can therefore call them big data, and consider them a very rich and interesting basis for decision-making. Big data are characterized by veracity, large volumes, and increasing variety and velocity, which makes their treatment and processing with traditional database management tools a very difficult task. To overcome this problem, we opt for a big data classification process. In this paper, we study some of the big data classification methods that are most significant for classifying big data dedicated to decision-making, and we identify their strengths and weaknesses. We then propose a framework summarizing the process of formulating a decision from web 2.0 content, based on big data classification, and we specify the criteria to take into account when choosing big data classification methods intended for decision-making.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Keywords</title>
      <p>Web 2.0; Big Data; Decision-making; Data Classification</p>
    </sec>
    <sec id="sec-2">
      <title>I. INTRODUCTION</title>
      <p>The large variety of applications that appeared after the
emergence of the web 2.0 produces a huge mass of various
and diverse data. This wealth of information is a very
important resource that we want to exploit to enrich our decision-making.</p>
    </sec>
    <sec id="sec-2b">
      <title>II. WEB 2.0</title>
      <p>The web 2.0 is a combination of technologies, business
plans and social skills, which allow users to create web
content, and to be more involved in the process of the
management of this content. It has brought many creative
concepts and techniques that did not exist before and which
made the electronic life simpler and more enjoyable [1][2].
With the web 2.0, a new era of web use is born. Several
applications have been developed and which have enriched
our lives by allowing more of interactivity and collaboration,
such as blogs and social networks [3].</p>
      <sec id="sec-2-1">
        <title>Architecture and Principles</title>
        <p>Web 2.0 is based on a varied and robust architecture,
founded on the introduction of new principles such as
collaboration and interactivity, and the use of new
applications like web interface design techniques, those of
content syndication, XHTML, URL, etc [4].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Emerging Principles</title>
      <p>Several principles emerged with the appearance of web 2.0; the most notable are:</p>
      <p>Collaboration: an important aspect describing the
user's opportunity to contribute to the creation of the
web content by creating his or her own content.</p>
      <p>Interactivity: one of the principles introduced by the
web 2.0, interactivity is reflected in the interaction
of the user with the web content and with other
users.</p>
      <p>These two principles constitute new trends that have
changed our lives and our way of working; they are the basis
of social networks, blogs, wikis, etc.</p>
    </sec>
    <sec id="sec-4">
      <title>III. BIG DATA</title>
      <sec id="sec-4-1">
        <title>A. Definition</title>
        <p>The term big data refers to data sets exchanged by
connected objects on the web, whose volumes are
large and whose variety and velocity keep increasing [5].
It is a compilation of data sets characterized by
complexity and large volume, so their management and
processing constitute a difficult task with traditional
database management tools [6].</p>
      </sec>
      <sec id="sec-4-2">
        <title>B. Characteristics</title>
        <p>Compared to other types of data, big data are different
and have specific characteristics. These differences concern
several facets, such as the data format, the volume, the time
required for creation, and the nature of the data.</p>
        <p>The principal features are data volume, data velocity,
data variety, and data veracity. We can consider these
elements the characterizing pillars of big data (Fig. 1.),
and they make big data processing and analysis a special
challenge.</p>
        <p>Data Volume: refers to a very large quantity of
generated information. Data are considered big
data when their size is so large that we cannot
easily control and analyze them.</p>
        <p>Data Variety: we have many different data
presentation formats: text, audio, image, etc.,
which makes analyzing this type of data a very
difficult mission.</p>
        <p>Data Velocity: refers to the speed of creation and
generation of data, which has increased with
the different new web applications.</p>
        <p>Data Veracity: refers to the anomalies in data.
Veracity constitutes the biggest challenge to
overcome in data analysis, because the veracity
of data sources can largely affect the precision
of analyses.</p>
      </sec>
      <sec id="sec-4-2b">
        <title>IV. BIG DATA CLASSIFICATION FOR DECISION-MAKING</title>
      </sec>
      <sec id="sec-4-3">
        <title>A. Clustering</title>
        <p>Clustering (also called cluster analysis) is a data
mining task: the mission of assembling a set of
objects such that objects belonging to the same
group have more similarities with each other than with those
belonging to other groups. A group is called a cluster. Clustering was
used for the first time in classification tasks by Cattell in
1943, for personality psychology classification [7]. Many
clustering algorithms exist, and the choice of which
algorithm to use depends on the cluster model used
[8]. Among the most distinctive cluster models, we find:
centroid models, distribution models, group models, and
connectivity models.</p>
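        <p>As a concrete illustration of the centroid model mentioned above, the following is a minimal k-means sketch in pure Python; the toy 2-D points, the deterministic seeding with two far-apart points, and the iteration count are illustrative assumptions, not taken from the cited works.</p>
        <preformat>
```python
# Minimal k-means sketch (a centroid-model clustering method).
# The toy 2-D points and the deterministic seeding are illustrative assumptions.
def kmeans(points, init_centroids, iters=20):
    centroids = list(init_centroids)
    k = len(centroids)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
# Seed with two far-apart points for a deterministic demo; real implementations
# would use random restarts or k-means++ initialization.
centroids, clusters = kmeans(points, [points[0], points[-1]])
```
        </preformat>
        <p>Note the weakness cited above: nothing in this loop determines a suitable number of clusters, which must be supplied up front.</p>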
        <p>In addition to its important role in the classification task,
clustering has several advantages, such as revealing
information about the data that was not apparent
before, like associations, so we can look for new patterns.
Clustering also provides a logical structure which makes
results easy to read and interpret. This is no longer the case
with a large number of clusters, however, because there is no
definitive method to determine precisely the suitable
number of clusters.</p>
      </sec>
      <sec id="sec-4-3b">
        <title>B. Decision Trees</title>
        <p>
          The decision tree is a technique we can use for
classification tasks by creating a model to predict the output
value based on a number of input values [9] [
          <xref ref-type="bibr" rid="ref11">10</xref>
          ]. To use
decision trees for classification, we construct trees starting
by the root of the tree, and subsequently, proceeding down
to its leaves.
        </p>
        <p>
          A classification rule is developed based on example
objects, which are known by their values for a collection of
attributes. Then, the decision tree is expressed in terms of
the same attributes [
          <xref ref-type="bibr" rid="ref12">11</xref>
          ]. Decision trees constitute a good
way to represent decisions well. An example of a decision
tree form is shown in Fig. 2.
        </p>
        <p>Fig. 2. A General Form of a Decision Tree</p>
        <p>Decision trees are characterized by robustness and by
simplicity of understanding and interpretation. What
is important about decision trees is that we can treat both
categorical and numerical data. On the other hand, decision
trees are unstable, since a small change in the input data
can cause large changes in the entire tree.</p>
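        <p>The root-to-leaf classification described above can be sketched with a hand-built tree; the attributes ("sentiment", "followers") and the class labels are invented toy values, not the paper's data.</p>
        <preformat>
```python
# A hand-built decision tree sketch: inner nodes test one attribute,
# leaves are class labels. Attribute names and labels are invented toys.
tree = {
    "attr": "sentiment",
    "branches": {
        "positive": {"attr": "followers",
                     "branches": {"many": "amplify", "few": "note"}},
        "negative": "investigate",
    },
}

def classify(node, example):
    # Start at the root and proceed down to a leaf, following the branch
    # that matches the example's value for the tested attribute.
    while isinstance(node, dict):
        node = node["branches"][example[node["attr"]]]
    return node

decision = classify(tree, {"sentiment": "positive", "followers": "many"})
```
        </preformat>
        <p>The instability mentioned above is visible here: changing the test at the root would redirect every example below it.</p>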
      </sec>
      <sec id="sec-4-4">
        <title>C. Support Vector Machines</title>
        <p>
          Support vector machines, more commonly SVMs, were
first introduced for binary classification. They refer
to a collection of methods used for regression
and classification, to analyze data in order to verify to which
category an element belongs [
          <xref ref-type="bibr" rid="ref13">12</xref>
          ]. They can be used in
several ways depending on the nature of their application,
such as, text categorization, recognition of images,
handwriting code, bioinformatics, etc.
        </p>
        <p>
          Some of the advantages of using SVM algorithms are:
the robustness, the ability to learn well using a few
parameters, and the computational efficiency. On the other
hand, applying SVMs can at times require taking into
consideration many aspects of learning methods [
          <xref ref-type="bibr" rid="ref14">13</xref>
          ]. SVMs are directly applicable only in the case of two-class
tasks. For that reason, when we deal with a multi-class task,
we must use algorithms that reduce it to a set of binary
problems, or take account of all the classes at once by giving
one formulation of optimization for all the data. Different
methods of treating multi-class support vector machines
continue to emerge [
          <xref ref-type="bibr" rid="ref15">14</xref>
          ].
        </p>
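        <p>The binary formulation can be made concrete with a minimal linear SVM sketch trained by stochastic subgradient descent on the hinge loss (in the spirit of the Pegasos algorithm); the toy data, learning rate, and regularization constant are illustrative assumptions.</p>
        <preformat>
```python
# Linear SVM sketch: stochastic subgradient descent on the hinge loss.
# Toy separable 2-D data; hyperparameters are illustrative assumptions.
def train_linear_svm(data, epochs=200, lam=0.01, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:                          # y is +1 or -1
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin >= 1.0:
                # Outside the margin: only the regularizer pulls w toward 0.
                grad = [lam * w[0], lam * w[1]]
            else:
                # Inside the margin: the hinge term pushes w toward y * x.
                grad = [lam * w[0] - y * x[0], lam * w[1] - y * x[1]]
                b = b + lr * y
            w = [w[0] - lr * grad[0], w[1] - lr * grad[1]]
    return w, b

data = [((2.0, 2.0), 1), ((3.0, 2.5), 1), ((-2.0, -1.5), -1), ((-3.0, -2.0), -1)]
w, b = train_linear_svm(data)

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1
```
        </preformat>
        <p>For a multi-class task, one would wrap this in a reduction such as one-vs-rest: train one such binary classifier per class and pick the class with the largest score.</p>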
      </sec>
      <sec id="sec-4-5">
        <title>D. Associative Classification</title>
        <p>
          Associative classification refers to a classification which
is based on the use of association rules, by combining both
classification and mining of associations [
          <xref ref-type="bibr" rid="ref16">15</xref>
          ] [
          <xref ref-type="bibr" rid="ref17">16</xref>
          ].
Compared to other approaches, it is considered a highly
accurate and competitive method, and can be applied in
different ways [
          <xref ref-type="bibr" rid="ref18">17</xref>
          ] [
          <xref ref-type="bibr" rid="ref19">18</xref>
          ] [
          <xref ref-type="bibr" rid="ref20">19</xref>
          ] [
          <xref ref-type="bibr" rid="ref21">20</xref>
          ]. We can define three types
of associative classification systems:
        </p>
        <p>
          Classification by Emerging Patterns: a system based on
emerging patterns from a sample, meaning
event associations whose supports vary depending
on the dataset [
          <xref ref-type="bibr" rid="ref22">21</xref>
          ].
        </p>
        <p>
          Classification based on High-Order Pattern: a
classification system which uses the high-order
pattern discovery algorithm to detect
significant connection or association patterns
using residual analysis in statistics [
          <xref ref-type="bibr" rid="ref23">22</xref>
          ].
        </p>
        <p>
          Associative Classifiers based on the Apriori
Algorithm: the Apriori algorithm proceeds
by determining the prevalent items
in the database, so we can define association
rules that capture trends in the database. Many
applications in various domains use
this technique, such as market basket analysis [
          <xref ref-type="bibr" rid="ref24">23</xref>
          ].
        </p>
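        <p>The Apriori idea above can be sketched on toy "transactions" (here, sets of topics mentioned together in comments); the items and the support threshold are invented for illustration.</p>
        <preformat>
```python
# Apriori sketch: find itemsets whose support (the count of transactions
# containing them) meets a threshold, growing candidates level by level.
# The toy transactions and min_support are invented for illustration.
from itertools import combinations

transactions = [{"price", "quality"}, {"price", "delivery"},
                {"price", "quality", "delivery"}, {"quality", "support"}]

def apriori(transactions, min_support=2, max_size=3):
    frequent = {}
    current = [frozenset([i]) for i in set().union(*transactions)]
    size = 1
    while current and max_size >= size:
        counted = {}
        for cand in current:
            support = sum(1 for t in transactions if cand.issubset(t))
            if support >= min_support:
                counted[cand] = support
        frequent.update(counted)
        # Join step: candidates are built only from frequent itemsets,
        # which is the Apriori pruning idea.
        keys = list(counted)
        current = list({a.union(b) for a, b in combinations(keys, 2)
                        if len(a.union(b)) == size + 1})
        size += 1
    return frequent

freq = apriori(transactions)
```
        </preformat>
        <p>Association rules are then read off the frequent itemsets, e.g. a rule from {price} to {quality} with confidence 2/3 from the counts above.</p>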
        <p>Associative classification provides high accuracy and
is easy to understand. However, it presents some
challenges, like the lack of obvious criteria for classifying
objects. Since it is based on a large number of rules,
elaborating the classifier is a time-consuming task, and
selecting the suitable rules to develop the classifier
becomes difficult.</p>
      </sec>
      <sec id="sec-4-5b">
        <title>V. BIG DATA CLASSIFICATION AS A BASIS OF DECISION-MAKING 2.0</title>
      </sec>
      <sec id="sec-4-6">
        <title>The Data Generation Process</title>
        <p>Web 2.0 is a very important source of information. The
user interacts continuously with the web content through
collaborative applications, such as blogs, social networks,
etc. As the number of actors on the web increases,
the rate of information circulating on its channels increases.
This large data flow generates the phenomenon of big data.
Hence, web 2.0 is a rich platform of information, which can
be treated to generate significant data. The user, primarily
a passive actor, becomes in an instant an active actor by
transmitting opinions, which we propose to treat to ensure
the mission of decision-making. These opinions can take,
for example, the form of:</p>
        <p>A solution to a particular problem: a problem can be
solved quickly and efficiently if the process of
generating the solution is collaborative. Reviews
about an issue, including those of experts,
may be of great use in making decisions to solve a
given problem.</p>
        <p>A feedback on a given subject: any feedback
contains in itself a notice from which we can
extract useful information that enriches the
decision-making process.</p>
        <p>A proposal for improvement: in any field,
application, or system, we always look for ways to
improve, especially in the case of business.
Opinions of clients, and in particular of those
most affected by the service, constitute a very
important source of inspiration for making the right
improvement decision.</p>
        <p>A complaint about a process, a product, a service: as
with proposals for improvement, complaints also
lead to the generation of significant decisions about
a product, a process, a service, etc.</p>
      </sec>
      <sec id="sec-4-8">
        <title>A Decision-Making 2.0 Model Based on Big Data Classification</title>
        <p>To exploit the data generated on the web 2.0, it is
necessary to isolate the significant information. Data
circulating through web 2.0 applications such as social
networks have the characteristics that make them part of
what is called big data. To process them, we propose to
adopt a classification process.</p>
        <p>We want to treat data based on the web 2.0 content
in order to make decisions. A simple comment or
tweet can generate a large data stream through the feedback
of users. Taking these data into account in decision-making is
very important to harness the collective intelligence.</p>
        <p>After a preliminary processing of the data streams, to
centralize those that meet our study needs, comes the
classification phase, which derives data classified according
to specific parameters that depend on the issue in question.
Finally, we get the basis of decision-making. The framework
presenting the general process, starting with the creation of
the data on the web and ending with the decision-making, is
represented in Fig. 3.</p>
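        <p>The flow just described (preliminary filtering, then classification, then the decision basis) can be sketched as a small pipeline; every step below is a hypothetical stub standing in for the real components of Fig. 3.</p>
        <preformat>
```python
# A sketch of the process of Fig. 3 as a pipeline of stubs:
# collect raw web-2.0 items, keep those in scope, classify them, and group
# the results into a basis for decision-making. All steps are invented stand-ins.
def relevant(item):
    # Stand-in for the preliminary processing that centralizes
    # the data streams meeting the study's needs.
    return "product" in item

def classify(item):
    # Stand-in for the big data classification phase.
    return "complaint" if "refund" in item else "feedback"

def decision_basis(stream):
    basis = {}
    for item in filter(relevant, stream):
        basis.setdefault(classify(item), []).append(item)
    return basis

stream = ["love this product", "product broke, refund please", "nice weather today"]
basis = decision_basis(stream)
```
        </preformat>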
        <p>In the decision-making 2.0 process, the classification
serves as a passage from the raw data to the classified data,
which will be used later to generate decisions. Data which
circulate across the web, especially in social networks,
blogs, etc., are difficult to track and manage. To overcome
this problem, our classification process should follow some
specifications to properly carry out this mission.</p>
        <p>Taking into consideration our aim, which is
decision-making based on the content reflected by the comments and
the feedback of users, and in order to provide relevant
decisions generated from meaningful data, our
classification process must be efficient and suit our
purpose.</p>
        <p>As already mentioned, the classification methods have
drawbacks as well as advantages. That is why we opt for a
combination: we elaborate a multiple classification model to
exploit the strengths of the cited methods, taking into
account different parameters, as shown in Fig. 4.</p>
        <p>Accuracy: the classification process must guarantee
high accuracy to ensure the relevance of our
decisions; accuracy is a very important factor in
evaluating the quality of the decision.</p>
        <p>Facility of understanding: it is essential that
classification be a process that provides results
which are easy to understand, meaning that
results can be interpreted without difficulty.</p>
        <p>Flexibility: flexibility is represented by the fact that
the classification can take into consideration
categorical data, and not just numerical data, for
more significant and common decisions.</p>
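        <p>A minimal sketch of the combination idea: several classifiers vote on each item and the majority label wins. The three rule-based "classifiers" below are hypothetical stand-ins, not the methods surveyed above.</p>
        <preformat>
```python
# Majority-vote combination sketch: each classifier labels the item,
# and the most common label is kept. The three rule-based "classifiers"
# below are hypothetical stand-ins for trained models.
from collections import Counter

def majority_vote(classifiers, item):
    votes = Counter(clf(item) for clf in classifiers)
    return votes.most_common(1)[0][0]

by_keyword = lambda text: "complaint" if "broken" in text else "feedback"
by_length = lambda text: "complaint" if len(text) > 40 else "feedback"
by_tone = lambda text: "complaint" if "!" in text else "feedback"

label = majority_vote([by_keyword, by_length, by_tone],
                      "The product arrived broken, I want a refund!")
```
        </preformat>
        <p>Disagreement among the voters can also be logged as a signal when tuning the combined model against the accuracy and flexibility criteria above.</p>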
      </sec>
    </sec>
    <sec id="sec-5">
      <title>VI. CONCLUSION</title>
      <p>In this paper, we presented the results of a study of big
data classification tools, summarizing the techniques that we
can use to treat data coming from web 2.0 to ensure the
decision-making mission. We then presented a general
framework of the entire process and mentioned the criteria
to take into consideration when choosing the classification
method.</p>
      <p>To exploit the strengths of the cited methods, we opt for
a combination, developing a multiple classification model
that ensures three pillars of big data classification for a
decision-making 2.0 process: accuracy, facility of
understanding, and flexibility.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENT</title>
      <p>The authors would like to thank their research team, the
Information Technology and Modeling Systems Research
Unit, and more generally the Computer Science, Operational
Research and Applied Statistics Laboratory of the Faculty of
Science, Abdelmalek Essaadi University of Tetuan, Morocco,
for their great support.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>O'Reilly</surname>
          </string-name>
          , “
          <article-title>What is Web 2.0: Design patterns and business models for the next generation of software</article-title>
          .”
          <source>Communications &amp; strategies, (1)</source>
          ,
          <fpage>17</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>O'Reilly</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Musser</surname>
          </string-name>
          ,
          <article-title>Web 2.0 principles and best practices</article-title>
          .
          <source>O'Reilly Radar</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Novak</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Tomkins</surname>
          </string-name>
          , “
          <article-title>Structure and evolution of online social networks.” In Link mining: models, algorithms</article-title>
          , and applications, Springer New York 2010, pp.
          <fpage>337</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>O'Reilly</surname>
          </string-name>
          ,
          <source>What is web 2.0</source>
          . O'Reilly Media, Inc,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>P.</given-names>
            <surname>Zikopoulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Eaton</surname>
          </string-name>
          ,
          <article-title>Understanding big data. Analytics for enterprise class hadoop and streaming data</article-title>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Letouzé</surname>
          </string-name>
          , “
          <article-title>Big data for development: challenges &amp; opportunities”</article-title>
          .
          <source>UN Global Pulse</source>
          ,
          <volume>47</volume>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>R. B. Cattell</surname>
          </string-name>
          ,
          <article-title>"The description of personality: basic traits resolved into clusters</article-title>
          .
          <source>" Journal of Abnormal and Social Psychology</source>
          <volume>38</volume>
          :
          <fpage>476</fpage>
          -
          <lpage>506</lpage>
          ,
          <year>1943</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>V.</given-names>
            <surname>Estivill-Castro</surname>
          </string-name>
          ,
          <article-title>"Why so many clustering algorithms - a position paper."</article-title>
          <source>ACM SIGKDD Explorations Newsletter</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <fpage>65</fpage>
          -
          <lpage>75</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Rokach</surname>
          </string-name>
          ,
          <article-title>Data mining with decision trees: theory and applications</article-title>
          .
          <source>World Scientific Pub Co Inc. ISBN 978-9812771711</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Kotsiantis</surname>
          </string-name>
          , “
          <article-title>Decision trees: a recent overview</article-title>
          .”
          <source>Artificial Intelligence Review</source>
          ,
          <volume>39</volume>
          (
          <issue>4</issue>
          ),
          <fpage>261</fpage>
          -
          <lpage>283</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Quinlan</surname>
          </string-name>
          , “
          <article-title>Induction of decision trees</article-title>
          .
          <source>” Machine learning</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <fpage>81</fpage>
          -
          <lpage>106</lpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          ,
          <article-title>The nature of statistical learning</article-title>
          . Springer-Verlag New York,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>I.</given-names>
            <surname>Steinwart</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Christmann</surname>
          </string-name>
          , Support vector machines.
          <source>Springer Science &amp; Business Media</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Hsu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Lin</surname>
          </string-name>
          , “
          <article-title>A comparison of methods for multiclass support vector machines.” Neural Networks</article-title>
          , IEEE Transactions on,
          <volume>13</volume>
          (
          <issue>2</issue>
          ),
          <fpage>415</fpage>
          -
          <lpage>425</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <surname>A. K. C. Wong</surname>
          </string-name>
          , “
          <article-title>From association to classification: Inference using weight of evidence</article-title>
          .”
          <source>IEEE Trans. On Knowledge and Data Engineering</source>
          ,
          <volume>15</volume>
          (
          <issue>3</issue>
          ):
          <fpage>764</fpage>
          -
          <lpage>767</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yin</surname>
          </string-name>
          , and J. Han, “
          <article-title>CPAR: Classification based on predictive association rules</article-title>
          .”
          <source>In Proceedings 2003 SIAM International Conference on Data Mining(SDM'03)</source>
          , San Francisco, CA, May
          <year>2003</year>
          , pp.
          <fpage>331</fpage>
          -
          <lpage>335</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , L. Wong, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          , “
          <article-title>CAEP:classification by aggregating emerging patterns</article-title>
          .”
          <source>In Proceedings of The Second International Conference on Discovery Science (DS'99)</source>
          , pp.
          <fpage>43</fpage>
          -
          <lpage>55</lpage>
          , Japan,
          <year>December 1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramamohanarao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Wong</surname>
          </string-name>
          , “
          <article-title>DeEPS: a new instance-based lazy discovery and classification system</article-title>
          .
          <source>” Machine Learning</source>
          ,
          <volume>54</volume>
          (
          <issue>2</issue>
          ):
          <fpage>99</fpage>
          -
          <lpage>124</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          , J. Han, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          , “
          <article-title>CMAR: accurate and efficient classification based on multiple class-association rules</article-title>
          .”
          <source>In Proceedings of The 2001 IEEE International Conference on Data Mining (ICDM'01)</source>
          , pp.
          <fpage>369</fpage>
          -
          <lpage>376</lpage>
          , San Jose, CA,
          <year>November 2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>B.L.W.H.Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          , “
          <article-title>Integrating classification and association rule mining</article-title>
          .”
          <source>In Proceedings of the Fourth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , pp.
          <fpage>80</fpage>
          -
          <lpage>86</lpage>
          , New York, NY,
          <year>August 1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>G.</given-names>
            <surname>Dong</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          . “
          <article-title>Efficient mining of emerging patterns: discovering trends and differences</article-title>
          .” In S. Chaudhui and D. Madigan, editors,
          <source>Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , pp.
          <fpage>43</fpage>
          -
          <lpage>52</lpage>
          . ACM Press, San Diego, CA,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>High-order pattern discovery and analysis of discretevalued data sets</article-title>
          .
          <source>PhD thesis</source>
          , University of Waterloo, Waterloo, Ontario, Canada,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          , “
          <article-title>Fast algorithms for mining association rules</article-title>
          .”
          <source>In Proc. 20th int. conf. very large data bases</source>
          ,
          <source>VLDB</source>
          (Vol.
          <volume>1215</volume>
          , pp.
          <fpage>487</fpage>
          -
          <lpage>499</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>