<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Privacy Preserving Association Rule Mining Using Perturbation Technique</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tobi Deborah Omoyele</string-name>
          <email>Omoyeletobi@yahoo.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Solomon Olalekan Akinola</string-name>
          <email>solom202@yahoo.co.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Ibadan</institution>
          ,
          <addr-line>Ibadan</addr-line>
          ,
          <country country="NG">Nigeria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>7</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>The information age has enabled organizations to gather large volumes of data. However, the usefulness of this data is negligible if “meaningful information” cannot be extracted from it. Data mining answers this need. The problem of privacy-preserving data mining has become important recently because of the increasing ability to store personal data and the sophistication of data mining algorithms to leverage information. Many researches have been done in this field but few with quantitative data had drawbacks of high number of rules generated and few number of item hidden. This study, proposes a perturbation association rules hiding algorithm for privacy of quantitative data to provide a better algorithm for preserving quantitative data. In hiding of rules, the noise associated with each item was calculated. The noise was used to calculate the support and confidence of rules which were then compared with minimum support and confidence. Item whose support/confidence is less than or equal to minimum support or confidence would be hidden. Experimental result shows that the algorithm hides more rules than the existing works.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Perturbation</kwd>
        <kwd>Data Mining</kwd>
        <kwd>Data Privacy</kwd>
        <kwd>Randomization</kwd>
        <kwd>Association rule mining</kwd>
        <kwd>Summarization</kwd>
        <kwd>Suppression</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CCS Concepts
• Information systems ➝ Information systems applications ➝
Data mining ➝ Association rules • Security and privacy ➝
Human and societal aspects of security and privacy ➝Privacy
protection</p>
      <p>The problem of mining quantitative association rule was first
introduced by Evfimievski et al [7]. Two works have been done in
the field of hiding fuzzy association rule in quantitative data
according to Manoj and Joshi [8] and Berberoglu [9]. However,
little or no work has been done in the aspect of perturbation of
association rules for quantitative data using the concept of noise.
Manoj and Joshi [8] proposed a hiding algorithm that integrates
the fuzzy set concepts and Apriori mining algorithm to find
useful fuzzy association rules from a quantitative database and
then hide them using privacy preserving technique. Unlike
previous approaches which mainly deals with association rules
in binary database, the approach deals with hiding the
association rules in quantitative database. Numerical
experiments were performed to measure the performance of
the algorithm according to three criteria: the number of rules
hidden, side effects and database effects of the algorithm.
However, the algorithm only deals with association rule in small
dataset in which the algorithm generated much rules but the
algorithm is not efficient since a large part of the rules generated
resulted into lost rules.</p>
      <p>In this paper, we attempt to present a perturbation association rules
hiding algorithm for privacy preservation of quantitative data with
few rules and hides higher percentage of the rules by decreasing
the support of the rule. The support of rule is decreased by
decreasing the support of A. This is achieved by decreasing the
support value of either A or B of the rule until either the support or
confidence is below the minimum support or minimum confidence
value respectively.</p>
      <p>The rest of this paper is organized as follows. Privacy preserving
quantitative association rule hiding was defined in Section II, our
approach to hide useful association rules in quantitative data was
presented in Section III, the perturbation association rule hiding
process was presented in Section IV, and the proposed algorithm
related example was presented in Section V. Section VI includes
the conclusion and future works.
2. PROBLEM STATEMENT
The objective of privacy preserving data mining is to hide certain
information so that they cannot be discovered through data mining
techniques. There have been two broad approaches for privacy
preserving data mining. The first approach, called output privacy,
is to alter the data before delivery to data miner so that real data is
obscured and mining result will not disclose certain privacy. For
example, perturbation, blocking, merging, swapping and sampling
are some methods that have been proposed for this type of output
privacy. The second approach, called input privacy, is to
manipulate the data using data distribution methods in which
the privacy of the data is protected before releasing to the user. In
this approach, mining result is not affected or minimally affected.
Almost all of studies that have been proposed in this research area
concentrated on hiding Boolean association rules which are
concerned only with whether an item is present in a
transaction or not, without considering its quantity.</p>
      <p>However, transactions with quantitative values are commonly
found in real world application. For example, in a patient’s blood
test, many attributes could be found. In addition, attribute’s
quantity instead of just presence/absence of the attribute in blood
is more important for determination of the illness. Furthermore,
many people have the problem of sugar but this does not
mean that one is sick or not, the only criterion to used to
determine the illness is the surplus/deficiency in sugar ’s
quantity.</p>
      <p>Some works have been done to discover association rules from
quantitative data which generate set of rules but hides less than
30% of the rules generated. This paper proposes a privacy
preserving data mining algorithm (that uses the output privacy
approach in which the data is preserved before it will be released
to the miner) that improves on the association rule hiding
algorithms for quantitative data by combining the perturbation
method for privacy and association rule mining technique for
privacy. Furthermore, the paper addresses the challenges of
applying privacy algorithm to only Boolean data by applying the
algorithm to quantitative data and not just the presence or absence
of data.
3. PROPOSED ALGORITHM
In order to hide an association rule, A→B we can either decrease
its support to be smaller than minimum support value or its
confidence to be smaller than its minimum confidence value. To
decrease the confidence of a rule, two strategies can be used. The
first one is to increase the support count of A i.e. LHS of the rule,
but not support count of A→B. The second one is to decrease the
support count of A→B. For the second case, if we only decrease
the support of B, the right hand side of the rule, it would reduce
the confidence faster than simply reducing the support of A∪B.
Based on these two strategies, we propose a privacy preserving
data mining algorithm for hiding sensitive quantitative data items
called perturbation of association rule for quantitative data using
the concept of noise. This algorithm first calculates the value of
noise for each data items and the column with the highest noise
value form the rule with the sensitive column. Secondly, the
algorithm find useful association rule that consist of only one
item on both sides of the rule and then hide them using
privacy preserving technique. For hiding purpose, the
algorithm tries to decrease the support of rule A→B by
decreasing the support count of itemset AB until either support or
confidence value of the rule goes below minimum support or
minimum confidence value respectively.</p>
    </sec>
    <sec id="sec-2">
      <title>Input:</title>
      <p>
        (
        <xref ref-type="bibr" rid="ref1">1</xref>
        )A source database D,
(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) Minimum support threshold
(
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) Minimum confidence threshold
Output: A transformed database D where rules containing A on
LHS (Left Hand Side) or B at the RHS (Right Hand Side) will be
hidden.
      </p>
      <p>Algorithm
1.</p>
      <p>Compute the support of rule</p>
      <p>(
((
))
)
=
(
)</p>
    </sec>
    <sec id="sec-3">
      <title>Transform the rough dataset(Rd) into original database(D) as Rd→D</title>
    </sec>
    <sec id="sec-4">
      <title>For each itemset XɛD, for .</title>
    </sec>
    <sec id="sec-5">
      <title>Calculate the value of the noisy attribute V such that</title>
      <p>√</p>
    </sec>
    <sec id="sec-6">
      <title>Calculate the sumCount of each noisy attribute</title>
      <p>for
) = Maximum Value such that k =</p>
    </sec>
    <sec id="sec-7">
      <title>If sumCount(</title>
    </sec>
    <sec id="sec-8">
      <title>Maximum Value</title>
    </sec>
    <sec id="sec-9">
      <title>THEN</title>
      <p>1-itemset = { }</p>
    </sec>
    <sec id="sec-10">
      <title>EXIT sumCount</title>
      <p>Sensitive column = {</p>
    </sec>
    <sec id="sec-11">
      <title>1-itemset { }</title>
    </sec>
    <sec id="sec-12">
      <title>Find 2-itemset from the union of Find ={ rules from itemset X} //for X= { } the possible rules are } such that mɛN</title>
      <p>and
STEP 4: join the 1-itemset to the sensitive column in a way
similar to that of apriori algorithm to form a 2- itemset. The
2itemset is use to find the useful association rule by placing at
the LHS and at the RHS similar to that of apriori algorithm
STEP 5 (a): in order to hide sensitive rule, calculate the support
and confidence of each rule
))
(
(
(
)
and
)
)
(
(
)
)
5. ILLUSTRATION OF THE ALGORITHM
In this section, we give an example to illustrate how the proposed
algorithm can be used.</p>
      <p>Let the original data be depicted below:</p>
      <p>)
(
)
(
(
)
)=
( )</p>
      <p>( )
11. if Sup(LHS,RHS) ≤ MST or conf(LHS,RHS) ≤ MCT
= X+</p>
    </sec>
    <sec id="sec-13">
      <title>Else</title>
      <p>=
12. go to line 8
13. select the next rule
14. repeat // line 9,10 and 11 on each new rule
15. if is empty</p>
    </sec>
    <sec id="sec-14">
      <title>THEN 16. transform the updated database D and output the updated D</title>
      <p>3.1 Explanation to Abbreviations in the
Algorithm</p>
    </sec>
    <sec id="sec-15">
      <title>LHS (left hand side): this is the left hand side of the rule</title>
      <p>RHS (right hand side): this is the right hand side of the rule
generated
MST (minimum support threshold): this is the support
specified by the user of the algorithm
MCT (minimum confidence threshold): this is the confidence
threshold specified by the user
min = minimum</p>
      <p>Sup/Supp = support
4. STEPS TO THE PROPOSED
ALGORITHM
n = total number of transaction data
m= total number of attributes (items)
D= attribute 1 ≤ i ≤ n</p>
      <p>= attribute 1 ≤ j ≤ m
V= noise
X= original data</p>
      <p>= inverse of original data
MCT= minimum confidence threshold
MST= minimum support threshold</p>
      <p>= each noisy attribute 1≤ k ≤ i
STEP 1: for each transaction data D, i =1 to n, and for each
attribute (item) j= 1 to m, transform the quantitative value into a
noisy quantitative attribute value using the randomly generated
formula</p>
      <p>√
For X =</p>
      <p>=</p>
    </sec>
    <sec id="sec-16">
      <title>STEP 2: calculate the sumcount of each noisy attribute transaction data as</title>
      <p>= ∑
on the
STEP 3: for each noisy attribute 1≤j≤m and 1≤k≤m check for
the that has the maximum value. If satisfies the
above condition then the column whose has the maximum
value is put in the set of 1- itemset which form the left hand side
(LHS) of the rule
i.e ={ }</p>
      <p>T1
T2
T3
T4
T5
T1
T2
T3
T4
T5
((
))
Or
STEP 3: for each noisy attribute 1≤j≤m and 1≤k≤m check for
the that has the maximum value. if satisfies the
above condition then the column whose has the maximum
count is put in the set of 1- itemset which form the left hand side
(LHS) of the rule
i.e = * }</p>
    </sec>
    <sec id="sec-17">
      <title>In this case</title>
      <p>* }
STEP 4: join the 1-itemset to the sensitive column in a way
similar to that of apriori algorithm to form a 2- itemset. The
2itemmset is use to find the useful association rule by at the LHS
and at the RHS similar to that of apriori algorithm
STEP 5 (a): in order to hide sensitive rule, calculate the support
and confidence of each rule
((
(
(
(</p>
      <p>)
(
)
)
(
)
(
)
(
)
)
) &gt; MST but conf(
) &lt; MCT, hence
i.e

(</p>
      <p>(</p>
    </sec>
    <sec id="sec-18">
      <title>Since Sup( is hidden i.e</title>
    </sec>
    <sec id="sec-19">
      <title>Since Sup(</title>
      <p>is hidden
i.e</p>
      <p>+
=

Since Sup(
is hidden i.e
= +
=
)
)
(</p>
      <p>(
=
(
(
6. DISCUSSION OF RESULT
From the result shown in the transformed database, the algorithm
was able to hide 4 out of the 5 items on the sensitive column
resulting into 80% privacy protection of the sensitive column B.
Comparing with the previous works of Manoj and Joshi [8] and
Berberoglu and Kaya [9] on the same dataset, which hide only 1
out of the 5 items, resulting into 20% privacy protection of the
sensitive column B of the data while the proposed algorithm hides
80% of the sensitive column B.
7. CONCLUSION AND FUTURE WORK
In this work, we proposed an improved algorithm for hiding
sensitive quantitative data that combines the concept of noise with
association rule mining to generate the useful association rules in
quantitative data and then hide them using privacy concept. The
algorithm works on association rules in quantitative data unlike
previous work done in this field which works on binary data. In
the future we plan to apply the algorithm to real life dataset; also
((
))
Or
)
)
)</p>
      <p>)

Since Sup(
is hidden
i.e</p>
      <p>+
=

(
(
(
Since Sup(
is not hidden
(
8/5=0.49
))
and
(
(
(
)</p>
      <p>)
(
( )
(
(
(
(
(
(
)
)
)
)
)
(
)
(
)
(
)</p>
      <p>)
)
(
(
)</p>
      <p>)
( )
(
)
(
)
)
) &lt; MST but conf(</p>
      <p>) &gt; MCT ,hence
) &gt; MST and conf(
) &gt; MCT, hence
we propose that in the proposed algorithm the association rule
should contain more than one item on each side of the rule and
that the algorithm should be used for text data after they might
have been normalized into quantitative data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Doug</given-names>
            <surname>Alexander</surname>
          </string-name>
          (
          <year>2016</year>
          )
          <article-title>- data mining http://www</article-title>
          .laits.utexas.edu/anorman/BUS.FOR/course.mat/ Alex/. Accessed on January 2016
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Alexandre</given-names>
            <surname>Evfimirvski</surname>
          </string-name>
          , Ramakrishnan Srikant,
          <article-title>Rakesh Agrawal and Johannes Gerhke Privacy preserving mining of association rules</article-title>
          .
          <source>In proceedings of the 8th ACM SIGKDD international Conference on Knowledge discovery in databases and data mining</source>
          , Edmonton, Akberta,
          <source>Canada July 23-26</source>
          ,
          <year>2002</year>
          , pages
          <fpage>217</fpage>
          -
          <lpage>228</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Rakesh</given-names>
            <surname>Agrawal and Ramakrishna Srikant</surname>
          </string-name>
          (
          <year>2000</year>
          )
          <article-title>SIGMOD 'oo proceedings of the 2000 ACM SIGMOD international conference</article-title>
          on Management of data pages
          <fpage>439</fpage>
          -
          <lpage>450</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Lindell</surname>
            <given-names>Y</given-names>
          </string-name>
          and
          <string-name>
            <surname>Pinkas B</surname>
          </string-name>
          (
          <year>2000</year>
          )
          <article-title>Advances in cryptology 'crypt '000 proceedings</article-title>
          , LNCS 1880, SpringerVertag,pp.
          <fpage>20</fpage>
          -
          <lpage>24</lpage>
          ,
          <year>August 2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Alexandre</given-names>
            <surname>Evfimievski</surname>
          </string-name>
          (
          <year>2002</year>
          )
          <article-title>- randomization in a privacy preserving data mining SIGKDD explorations</article-title>
          . Volume 4
          <article-title>- issue-2, 2002- international journal of computer science and information technologies Ila Chandrakar (</article-title>
          <year>2016</year>
          ).
          <article-title>Hybrid algorithm for privacy preserving association rule mining</article-title>
          .
          <source>Department of Information Technology VNR Vignana Jyothi Institute of Engineering and Technology Hyderabad</source>
          , India http://www.ieee.org/documents/hybrid-algorithm.
          <source>Accessed May</source>
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Evfimievski A.</given-names>
            ,
            <surname>Srikant</surname>
          </string-name>
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Agrawal</surname>
          </string-name>
          <string-name>
            <given-names>R.</given-names>
            , and
            <surname>Gehrk</surname>
          </string-name>
          <string-name>
            <surname>J.</surname>
          </string-name>
          , (
          <year>2002</year>
          ).
          <article-title>Privacy Preserving Mining of Association Rules</article-title>
          ,
          <source>in Proceedings of the 8th Conference on Knowledge Discovery and Data Mining</source>
          , New York, USA, pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Manoj</given-names>
            <surname>Gupta and Joshi R. C.</surname>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Privacy Preserving Fuzzy Association Rules Hiding in Quantitative Data</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>International Journal of Computer Theory and Engineering</source>
          , Vol.
          <volume>1</volume>
          , No. 4,
          <string-name>
            <surname>October</surname>
          </string-name>
          ,
          <year>2009</year>
          ,
          <fpage>1793</fpage>
          -
          <lpage>8201</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Berberoglu</surname>
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kaya</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Hiding Fuzzy Association Rules in Quantitative Data</article-title>
          ,
          <source>The 3rd International Conference on Grid and Pervasive Computing Workshops, May</source>
          <year>2008</year>
          , pp.
          <fpage>387</fpage>
          -
          <lpage>392</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>