<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>PRIVATEER: A Private Record Linkage Toolkit</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexandros Karakasidis</string-name>
          <email>a.karakasidis@eap.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgia Koloniari</string-name>
          <email>gkoloniari@uom.edu.gr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vassilios S. Verykios</string-name>
          <email>verykios@eap.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hellenic Open University, School of Science &amp; Technology</institution>
          ,
          <addr-line>Patras</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Macedonia, Department of Applied Informatics</institution>
          ,
          <addr-line>Thessaloniki</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Privacy preserving record linkage (PPRL) is the process of integrating data across multiple heterogeneous data sources without compromising their privacy. While many PPRL techniques have been developed, no universally accepted method yet delivers both good performance and high-quality results in all cases. To this end, we present PRIVATEER, a toolkit that enables practitioners to compare the various techniques involved in the PPRL process and determine which best suits their needs. The toolkit is based on a highly configurable, modular and extensible simulator, which allows the user to test different configurations by combining a number of privacy preserving blocking and matching methods with corresponding distance and similarity measures, on her own data or on sample data. We showcase the usability of our toolkit by presenting experimental results measuring both the quality and the performance of state-of-the-art PPRL methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Privacy</kwd>
        <kwd>Record Linkage</kwd>
        <kwd>Toolkit</kwd>
        <kwd>Simulator</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Public and private organizations collect large volumes of data describing
individuals. Interconnecting and analyzing these independently stored data
benefits businesses, governments and research. For instance, interconnecting
the data of distinct health organizations, hospitals and physicians could help
prevent epidemic outbreaks or better understand the mechanisms behind certain
diseases.</p>
      <p>
        The task of linking such heterogeneous data, known as the record linkage
problem, is not trivial [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Since data are stored by independent organizations,
there is no common unique identifier to determine which records refer to the same
entity. Moreover, such data often suffer from low quality, exhibiting typos,
abbreviations and misspellings, which calls for approximate matching methods to
link them. Additional challenges arise when the privacy of the subjects described
by the data must be protected. Therefore, Privacy Preserving Record Linkage (PPRL)
has emerged as the problem of integrating data from heterogeneous sources without
revealing to the participating sources any knowledge beyond their common records.
      </p>
      <p>To link data privately and accurately, Privacy Preserving Matching (PPM)
methods focus on matching data elaborately while maintaining privacy. Though
accurate, matching methods often incur high computational costs. To increase
their scalability, PPM methods are combined with Privacy Preserving Blocking
(PPB) methods, which prune candidate record pairs that are unlikely to match,
again while maintaining privacy. Even though they improve efficiency,
PPB methods often compromise matching quality.</p>
      <p>
        To the best of our knowledge, there is no single solution that successfully
addresses the privacy preserving record linkage problem to its full extent. In
particular, state-of-the-art PPB and PPM methods usually focus on a specific
type of data. For instance, there are PPM approaches appropriate for string
data [
        <xref ref-type="bibr" rid="ref11 ref16">11, 16</xref>
        ] and others for numerical data [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]. Moreover, some methods involve only the two data
sources, while others require the deployment of a third party. As a result, every
time a PPRL solution is deployed, the algorithms most appropriate for the given
application scenario have to be selected.
      </p>
      <p>To this end, we introduce PRIVATEER, a PRIVATE REcord linkage toolkit
designed to assist practitioners in this selection process and to serve as a
comparison tool for different methods. PRIVATEER is a modular simulator,
under continuous development, that implements a set of state-of-the-art PPRL
methods. Considering the various tasks involved in PPRL, PRIVATEER aims to
accommodate all of this functionality in a single toolkit. It consists of clearly
separated modules, each performing a distinct task. These modules use components
implementing various privacy preserving matching and blocking methods, as well as
auxiliary components deployed by the matching and blocking methods, such as the
distance measures used in approximate matching and reference set generators (a
reference set serves as an intermediate point of comparison). Being highly
composable, PRIVATEER allows compatible PPB and PPM methods to be combined,
together with the corresponding auxiliary methods, to form a complete PPRL solution.</p>
      <p>PRIVATEER is also configurable, enabling the user to fine-tune each of the
deployed methods, and it provides a variety of performance measures. While the
toolkit is equipped with sample data, it also accepts the user’s own data.
Towards extensibility, PRIVATEER’s clear and comprehensive design provides
an API for extending its functionality: following Java’s object-oriented paradigm,
users can add their own modules and algorithms.</p>
      <p>
        Similar toolkits, such as TAILOR [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], have been developed for the classical
record linkage problem. Related tools for PPRL include data generators [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
corrupters [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and tools that combine both functionalities [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. While these tools
aim at generating test data for PPRL, PRIVATEER aims at testing different
PPRL methods. A first version of our toolkit was presented in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], but it offered neither composability nor a graphical user interface.
      </p>
      <p>The rest of this paper is organized as follows. In Section 2, we present the
architectural design of PRIVATEER and briefly describe the PPM and PPB
algorithms, distance measures, and reference set generation methods currently
implemented in the toolkit. Section 3 provides experimental results derived from
PRIVATEER, and we conclude in Section 4.</p>
      <p>The primary goals of PRIVATEER’s design are composability and extensibility.
We wanted PRIVATEER to easily enable both the combination of different methods
into a complete PPRL solution and the incorporation of new ones, as
state-of-the-art PPM and PPB methods are constantly being developed. An
object-oriented design suits these goals, so we used Java to build the simulator
at the core of PRIVATEER. Java also offers machine independence through its
Write Once Run Anywhere property, as well as many out-of-the-box libraries and
data structures.</p>
      <p>We made a particular effort to limit the assumptions underlying the
architectural components of the simulator. First, we assume that all operations
are executed sequentially: many PPB and PPM algorithms contain functions that
require the matching parties to operate in parallel, and PRIVATEER simulates
this by executing separate objects one after the other. Second, we only consider
processing cost and do not examine communication costs or data propagation
delays.</p>
      <p>We followed a modular design, separating data, execution functionality and
implemented methods. This approach improves PRIVATEER’s extensibility, as new
features and modules are easy to add through its API. Moreover, since data are
retrieved in a transparent way, new data sources may be added as well.</p>
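      <p>As a rough illustration of this kind of extensibility, the following Python sketch shows a plug-in registry in which each method registers itself under the name a user would select in the GUI, and a configuration mapping resolves that name at run time. PRIVATEER itself is written in Java and its actual API is not reproduced here; the names REGISTRY, register and load, as well as the two toy measures, are hypothetical.</p>
      <preformat>
```python
from typing import Callable, Dict

# Hypothetical plug-in registry (illustrative names, not PRIVATEER's Java API):
# each method registers itself under the name a user would pick in the GUI.
Similarity = Callable[[str, str], float]
REGISTRY: Dict[str, Similarity] = {}


def register(name: str) -> Callable[[Similarity], Similarity]:
    """Decorator that adds a similarity function to REGISTRY under `name`."""
    def decorator(func: Similarity) -> Similarity:
        if name in REGISTRY:
            raise ValueError(f"method {name!r} is already registered")
        REGISTRY[name] = func
        return func
    return decorator


@register("exact")
def exact(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


@register("prefix")
def prefix_ratio(a: str, b: str) -> float:
    """Common-prefix length divided by the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / longest


def load(config: Dict[str, str]) -> Similarity:
    """Resolve the method named in a configuration mapping (e.g. GUI input)."""
    return REGISTRY[config["measure"]]
```
      </preformat>
      <p>With this pattern, adding a method amounts to defining one function and decorating it; the code that loads and runs methods does not change. For example, load({"measure": "prefix"}) applied to "smith" and "smyth" returns 0.4, since the two strings share a two-character prefix out of five characters.</p>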
      <p>Next, we present the architectural elements that comprise PRIVATEER.</p>
      <p>The Base Module. The base module holds PRIVATEER’s main functionality,
organizing the execution of each evaluation experiment. It coordinates the
different modules and enables each experiment to be executed multiple times. As
illustrated in Fig. 1(1), it hosts the configuration module, a PPB and a PPM
module, and the results processing and visualization modules.</p>
      <p>The Configuration Module. The configuration module is invoked by the base
module and is responsible for setting an experiment’s execution parameters. It
consists of a GUI implemented in JavaFX, which receives input from the user, and
a mechanism that maps user input to specific parameters. This mapping
mechanism is important, since each of PRIVATEER’s modules has to be configured
separately with its own parameters, all provided through the same user interface.</p>
      <p>The Nodes. The nodes implement the functionality of the data sources and of a
third party, when one is needed. While a data source node has direct access to
its data, the third party neither holds nor has direct access to any data; instead,
it is fed with data by the source nodes. Nodes are deployed by the PPB or PPM
module and implement their part of the respective algorithm (Fig. 1(2)).</p>
      <p>The Data Module. The data module provides direct access to external data
and is invoked by each data source node. The data may be stored in different
databases, and, thanks to a transparent connectivity mechanism, the configuration
module enables the user to select her own input database. Currently, MySQL and
SQLite databases are supported.</p>
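      <p>The transparent connectivity of the data module can be sketched as a small interface that hides the storage back end from the data source nodes. The Python fragment below is an illustration under stated assumptions rather than PRIVATEER’s code: DataSource, SQLiteSource and the records method are hypothetical names, the table is assumed to have an integer id column, and only an SQLite back end is shown, using Python’s standard sqlite3 module.</p>
      <preformat>
```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Iterator, List, Tuple


class DataSource(ABC):
    """Hypothetical interface: a data source node only ever sees records."""

    @abstractmethod
    def records(self) -> Iterator[Tuple]:
        """Yield (id, attr1, attr2, ...) tuples ordered by id."""


class SQLiteSource(DataSource):
    """SQLite-backed source. A MySQL variant could implement the same
    interface on top of any DB-API 2.0 driver."""

    def __init__(self, path: str, table: str, columns: List[str]) -> None:
        self.conn = sqlite3.connect(path)
        self.table = table
        self.columns = columns

    def records(self) -> Iterator[Tuple]:
        # Table and column names are assumed to come from trusted
        # configuration, so they are interpolated rather than parameterized.
        cols = ", ".join(["id"] + self.columns)
        query = f"SELECT {cols} FROM {self.table} ORDER BY id"
        yield from self.conn.execute(query)
```
      </preformat>
      <p>A node would iterate over source.records() without knowing whether the rows come from SQLite or, through another implementation of the same interface, from MySQL.</p>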
      <p>The Measure Module. This module transparently implements the distance or
similarity measure used by the PPB and, later, by the PPM algorithm. Thus, any
compatible measure can be deployed with any PPB and PPM method through the GUI,
giving the user the ability to test the available algorithms and assess their
behavior under different configurations.</p>
      <p>The Reference Set Generator Module. Similarly to the measure module,
the reference set generator provides different methods that can be used
interchangeably with our PPB and PPM methods.</p>
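      <p>To illustrate what interchangeable measures look like, the following Python sketch implements three common string measures (Levenshtein edit distance, and Jaccard and Dice similarity over character bigrams) behind a single name-to-function mapping, so a caller can switch measures by changing one configuration value. These are textbook definitions chosen for illustration; they are not claimed to match the exact variants in PRIVATEER’s measure module.</p>
      <preformat>
```python
from typing import Callable, Dict, Set


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def bigrams(s: str) -> Set[str]:
    """Adjacent character pairs, e.g. 'smith' -> {'sm', 'mi', 'it', 'th'}."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over bigram sets (1.0 if both are empty)."""
    x, y = bigrams(a), bigrams(b)
    union = x.union(y)
    return len(x.intersection(y)) / len(union) if union else 1.0


def dice(a: str, b: str) -> float:
    """2|A ∩ B| / (|A| + |B|) over bigram sets (1.0 if both are empty)."""
    x, y = bigrams(a), bigrams(b)
    total = len(x) + len(y)
    return 2 * len(x.intersection(y)) / total if total else 1.0


# Interchangeable measures, selectable by name.
MEASURES: Dict[str, Callable[[str, str], float]] = {
    "levenshtein": levenshtein,
    "jaccard": jaccard,
    "dice": dice,
}
```
      </preformat>
      <p>For "smith" and "smyth", which share the bigrams "sm" and "th" out of four bigrams each, the Dice similarity is 2 · 2 / 8 = 0.5 and the Jaccard similarity is 2 / 6 ≈ 0.33, while the edit distance is 1. Edit distance decreases as strings become more alike whereas the other two measures increase, so a real measure module also has to record which kind of measure it provides.</p>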
      <p>The PPB and PPM Modules. The privacy preserving blocking (PPB) and
the privacy preserving matching (PPM) modules implement the corresponding
type of protocols. Depending on the type of the protocol, as shown in Fig. 1(1),
each module initiates two nodes and, when necessary, a third party node. All
these nodes implement the same algorithm, which is chosen through the GUI and
provided to the corresponding module by the base module. For a PPB or PPM
algorithm to operate, the PPB or PPM module loads a measure module, which
implements the measure used by the corresponding algorithm. If the implemented
algorithm requires it, a reference set generator module is also loaded.</p>
      <p>PPRL protocols may vary in their implementation and in the number of
participating nodes. Generally, however, they are either based on some type of
data transformation or follow a multiparty computation approach. In both cases a
third party may be required, and all of these options can be implemented in
PRIVATEER. Moreover, to enable experimenting with a setup without blocking, a
mock-up No Blocking method has also been implemented.</p>
      <p>Regarding its operation, the PPB module passes the parameters received from the
configuration module on to the modules it hosts and executes the selected
PPB algorithm. When the execution concludes, it produces lists of blocks of
record ids, which are then used by the PPM module. The PPM module receives
the blocks of record ids from the PPB module and applies the selected PPM
algorithm to each of them. When the PPM module finishes, it produces a list of
matching record ids, which is then passed to the results processing module.</p>
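      <p>The data flow between the two modules can be summarized in a few lines of Python. In this sketch, which is not PRIVATEER’s implementation, a blocking function returns blocks that pair records of source A with records of source B, and the matching step compares only pairs within the same block. no_blocking mirrors the mock-up No Blocking method, while first_letter_blocking is a hypothetical toy key with no privacy protection, used only to show how blocking reduces the number of comparisons. For brevity, blocks carry (id, value) pairs rather than bare record ids.</p>
      <preformat>
```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[int, str]                   # (record id, attribute value)
Block = Tuple[List[Record], List[Record]]  # records of source A, source B


def no_blocking(a: List[Record], b: List[Record]) -> List[Block]:
    """Mock-up PPB step: one block holding all records of each source."""
    return [(a, b)]


def first_letter_blocking(a: List[Record], b: List[Record]) -> List[Block]:
    """Toy blocking on the first character (NOT privacy preserving)."""
    groups: Dict[str, Block] = {}
    for rec in a:
        groups.setdefault(rec[1][:1], ([], []))[0].append(rec)
    for rec in b:
        groups.setdefault(rec[1][:1], ([], []))[1].append(rec)
    # Keep only blocks that contain records from both sources.
    return [blk for blk in groups.values() if blk[0] and blk[1]]


def run_pprl(a: List[Record], b: List[Record],
             block: Callable[[List[Record], List[Record]], List[Block]],
             similarity: Callable[[str, str], float],
             threshold: float) -> Tuple[List[Tuple[int, int]], int]:
    """PPB produces blocks; the matcher is applied to every pair within each
    block. Returns the matching id pairs and the number of comparisons."""
    matches: List[Tuple[int, int]] = []
    comparisons = 0
    for left, right in block(a, b):
        for id_a, val_a in left:
            for id_b, val_b in right:
                comparisons += 1
                if similarity(val_a, val_b) >= threshold:
                    matches.append((id_a, id_b))
    return sorted(matches), comparisons
```
      </preformat>
      <p>With three records per source, no_blocking leads to 3 × 3 = 9 comparisons. Blocking "smith", "jones", "brown" against "smith", "black", "jones" on the first letter leaves three single-pair blocks and only 3 comparisons, while an exact-equality matcher still finds the same two matches.</p>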
      <p>The Results Processing Module. The results processing module processes
the list of matches received from the PPM module and calculates the total number
of matches, the numbers of true and false matches, precision, recall, and the
execution times of the PPB and PPM modules. If the input datasets share the same
unique identifiers (i.e., they are corrupted versions of the same dataset), the
module automatically calculates these measurements based on the common primary
key. Otherwise, a CSV file listing the matches between the different primary keys
is used.</p>
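      <p>The measurements of the results processing module follow the usual definitions: precision is the fraction of reported matches that are true matches, and recall is the fraction of true matches that are reported. The following Python sketch computes them for both ways of obtaining the ground truth described above. The two-column, header-less CSV layout is an assumption for illustration, not PRIVATEER’s documented file format.</p>
      <preformat>
```python
import csv
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (primary key in source A, primary key in source B)


def truth_from_common_keys(keys_a: Iterable[str], keys_b: Iterable[str]) -> Set[Pair]:
    """Ground truth when both sources share primary keys (corrupted copies)."""
    return {(k, k) for k in set(keys_a).intersection(keys_b)}


def truth_from_csv(path: str) -> Set[Pair]:
    """Ground truth from a CSV file with one 'key_a,key_b' pair per row.
    The layout (no header, two columns) is an assumption."""
    with open(path, newline="") as fh:
        return {(row[0], row[1]) for row in csv.reader(fh) if row}


def evaluate(predicted: Iterable[Pair], truth: Set[Pair]) -> Dict[str, float]:
    """Counts of true and false matches plus precision and recall."""
    pred = set(predicted)
    true_matches = len(pred.intersection(truth))
    return {
        "matches": len(pred),
        "true": true_matches,
        "false": len(pred) - true_matches,
        "precision": true_matches / len(pred) if pred else 0.0,
        "recall": true_matches / len(truth) if truth else 0.0,
    }
```
      </preformat>
      <p>For example, if the sources share keys 2 and 3 and the matcher reports the pairs (2, 2) and (4, 1), one of the two reported matches is true, so precision and recall are both 0.5.</p>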
      <p>The Visualization Module. The visualization module collects the statistics
produced by the results processing module for each experiment and presents them
as plots and tables.</p>
      <p>2.2 Implemented Algorithms and Measures</p>
      <p>The modules of the architecture enable the selection of a method for each of
the different tasks in the PPRL process. Accordingly, PRIVATEER is equipped
with a pool of implemented state-of-the-art approaches for each such task, i.e.,
PPB and PPM algorithms, distance and similarity measures, and reference set
generators (Tables 1-3).</p>
      <p>The PPB algorithms currently implemented in PRIVATEER are listed
in the upper part of Table 1. The first column holds the name of each algorithm,
the second a short description, the third the type of data the algorithm is
designed for, and the fourth whether a third party node is required.
Similarly, PPM algorithms are listed in the lower part of Table 1.</p>
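      <p>Two entries of Table 1, Phonetic Blocking and Soundex Matching, build on Soundex encoding. As background, the following Python sketch implements standard American Soundex and groups record ids by code, which is the basic idea behind phonetic blocking. It is a plain illustration only: the privacy properties and exact procedures of the listed methods are not reproduced, and phonetic_blocks is a hypothetical helper.</p>
      <preformat>
```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# American Soundex digit classes; vowels, y, h and w receive no digit.
_CODES: Dict[str, str] = {}
for _group, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _group:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Four-character Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue                  # h and w do not separate equal digits
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit                  # a vowel resets prev, so repeats count
    return code.ljust(4, "0")


def phonetic_blocks(records: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Group record ids by the Soundex code of their attribute value."""
    blocks: Dict[str, List[int]] = defaultdict(list)
    for rec_id, value in records:
        blocks[soundex(value)].append(rec_id)
    return dict(blocks)
```
      </preformat>
      <p>Because "Smith" and "Smyth" both encode to S530, they fall into the same block despite the spelling variation, which is what makes phonetic codes attractive for low-quality name data.</p>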
      <p>
        Many of the proposed PPRL algorithms use a distance or
similarity measure [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We designed PRIVATEER so that the PPRL algorithms can use
alternative measures besides their default ones. Possible combinations are shown
in the third column of Table 2, where algorithms that use a measure by default
are indicated in bold.
      </p>
      <p>
        Reference sets (RS) are used in PPRL as an intermediate point of
comparison to provide privacy. A reference set may be derived from a publicly available
database or be arbitrarily generated. Table 3 summarizes PRIVATEER’s
currently available methods for RS generation. All methods apart from Arbitrary
String Generation [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] use a user-defined, publicly available dataset.
      </p>
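      <p>One way a reference set can act as an intermediate point of comparison is sketched below in Python: each party maps its strings to vectors of edit distances to the same reference strings, and only these vectors are compared. This is a simplified, Lipschitz-style embedding written for illustration; it is not claimed to be the exact construction of the Embedding Spaces method in Table 1, and the function names are hypothetical.</p>
      <preformat>
```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (dynamic programming over two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def embed(value: str, reference: Sequence[str]) -> List[int]:
    """Coordinate i is the edit distance from `value` to reference string i."""
    return [levenshtein(value, r) for r in reference]


def embedded_distance(u: Sequence[int], v: Sequence[int]) -> int:
    """Chebyshev distance between two embeddings. By the triangle inequality,
    |d(x, r) - d(y, r)| never exceeds d(x, y), so this value never
    overestimates the edit distance between the original strings."""
    return max((abs(p - q) for p, q in zip(u, v)), default=0)
```
      </preformat>
      <p>Since the embedded distance is a lower bound on the true edit distance, pairs whose embeddings are farther apart than the matching threshold can be discarded safely, while close pairs still require a final decision.</p>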
    </sec>
    <sec id="sec-2">
      <title>Execution Results</title>
      <p>To illustrate PRIVATEER’s usability, we present sample experiments exploring
alternative combinations of PPB and PPM methods, beyond their default setup,
so as to assess their behavior and performance in different configurations.</p>
      <p>All experiments are conducted on an Intel i3 PC with 8GB of RAM. We
use a sample dataset originating from the North Carolina voters database
(available at ftp://alt.ncsbe.gov/data/). We assume that the attributes last name,
first name, middle name, city of residence and precinct description comprise a
candidate key, and we use them for both matching and blocking. After deduplicating
the dataset based on the selected candidate key, we generate two databases of
50 000 records each, so that 25% of their records are common. We use the
“master” table of Lahman’s baseball database (available at http://www.seanlahman.com)
as a source for reference set creation. Finally, since we are interested in linking
low quality data, we corrupt one of the two databases using the corrupter
described in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The corrupted records contain one error per attribute, so that a join
with the original ones returns no matches. With equal probability, an error is
either a character insertion, deletion or substitution.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Supported PPB and PPM algorithms.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Name</th>
              <th>Description</th>
              <th>Data type</th>
              <th>Third party (3P)</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td colspan="4">PPB Algorithms</td>
            </tr>
            <tr>
              <td>B1 No Blocking</td>
              <td>Option when the use of blocking is not desired. Records to be linked are organized in a single block for each of the two data sources.</td>
              <td>N/A</td>
              <td />
            </tr>
            <tr>
              <td>B2 SNEF [<xref ref-type="bibr" rid="ref13">13</xref>]</td>
              <td>SNEF is a meta-blocking algorithm specifically designed to shift the Sorted Neighborhood approach [<xref ref-type="bibr" rid="ref8">8</xref>] to the PPRL paradigm. This algorithm uses reference set clustering for creating blocks.</td>
              <td>Strings, Numbers</td>
              <td>✔</td>
            </tr>
            <tr>
              <td>B3 TPPB [<xref ref-type="bibr" rid="ref19">19</xref>]</td>
              <td>A single attribute is utilized for blocking. Each data source creates a separate reference set from a publicly available database. Then, it merges its data with the reference set and lexicographically sorts the result.</td>
              <td>Strings</td>
              <td />
            </tr>
            <tr>
              <td>B4 Simple Blocking [<xref ref-type="bibr" rid="ref1">1</xref>]</td>
              <td>Introduces hash signature transformations to securely represent TF/IDF weight vectors. Blocks consist of records which share common strings called tokens, with a token being an intact word. The Jaccard metric is used for comparing hash signatures.</td>
              <td>Strings</td>
              <td>✔</td>
            </tr>
            <tr>
              <td>B5 Phonetic Blocking [<xref ref-type="bibr" rid="ref12">12</xref>]</td>
              <td>Uses Soundex’s inherent privacy characteristics, which are based on the common information suppression property of all phonetic encoding algorithms. Records are organized in blocks based on their phonetic encoding similarity.</td>
              <td>Strings</td>
              <td>✔</td>
            </tr>
            <tr>
              <td colspan="4">PPM Algorithms</td>
            </tr>
            <tr>
              <td>M1 Bloom Bigrams [<xref ref-type="bibr" rid="ref17">17</xref>]</td>
              <td>Bigrams of strings are inserted into a Bloom filter. Then, a bitwise comparison is performed on the Bloom filters using the Dice coefficient.</td>
              <td>Strings</td>
              <td>✔</td>
            </tr>
            <tr>
              <td>M2 Embedding Spaces [<xref ref-type="bibr" rid="ref16">16</xref>]</td>
              <td>Based on the edit distances between actual and reference set data, a transformation of the data, called embedding, is derived and used for matching.</td>
              <td>Strings</td>
              <td>✔</td>
            </tr>
            <tr>
              <td>M3 Secure Edit Distance [<xref ref-type="bibr" rid="ref12">12</xref>]</td>
              <td>Secures the use of edit distance through the use of Bloom filters. Each character of the string is extracted and, after its position in the matching field is appended, inserted in a Bloom filter, which is encrypted using a secure hash function.</td>
              <td>Strings</td>
              <td />
            </tr>
            <tr>
              <td>M4 Homomorphic Encryption [<xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>]</td>
              <td>Homomorphic encryption allows differences to be calculated securely, based on the properties of specific cryptographic algorithms, such as RSA.</td>
              <td>Numbers</td>
              <td />
            </tr>
            <tr>
              <td>M5 Soundex Matching [<xref ref-type="bibr" rid="ref11">11</xref>]</td>
              <td>Transforms attributes into Soundex codes and exploits their inherent information suppression properties to provide privacy.</td>
              <td>Strings</td>
              <td>✔</td>
            </tr>
            <tr>
              <td>M6 Hash Signatures [<xref ref-type="bibr" rid="ref1">1</xref>]</td>
              <td>Calculates the TF/IDF weights of all words, called tokens, within all fields. Then, a hash function is used to create arrays of these weights, one for each record. This method is not suitable for approximate matching, since the entire token (word) is encoded.</td>
              <td>Strings</td>
              <td>✔</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
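      <p>The corruption step described above can be approximated by the following Python sketch, which applies exactly one character insertion, deletion or substitution to every attribute, choosing the operation uniformly at random. It is a simplified stand-in written for illustration, not the corrupter cited above; corrupt_value and corrupt_record are hypothetical names, and insertions and substitutions draw from lowercase ASCII letters by assumption.</p>
      <preformat>
```python
import random
import string
from typing import Dict


def corrupt_value(value: str, rng: random.Random) -> str:
    """Apply exactly one edit: insertion, deletion or substitution, each
    chosen with equal probability (empty values can only get an insertion)."""
    op = rng.choice(("insert", "delete", "substitute")) if value else "insert"
    if op == "insert":
        pos = rng.randrange(len(value) + 1)
        return value[:pos] + rng.choice(string.ascii_lowercase) + value[pos:]
    pos = rng.randrange(len(value))
    if op == "delete":
        return value[:pos] + value[pos + 1:]
    # Substitute with a different letter so the value is guaranteed to change.
    options = [c for c in string.ascii_lowercase if c != value[pos]]
    return value[:pos] + rng.choice(options) + value[pos + 1:]


def corrupt_record(record: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One error per attribute, so an exact join with the original fails."""
    return {attr: corrupt_value(val, rng) for attr, val in record.items()}
```
      </preformat>
      <p>Every operation changes the value (a substitution always picks a different letter), so no corrupted attribute equals its original, which is what guarantees that an exact join returns no matches.</p>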
      <p>
        In the second experiment, we combine a matching method with different
blocking methods. In particular, Bloom Bigrams is combined with Simple
Blocking [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], SNEF [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and TPPB [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Since efficiency is the main concern of blocking, we
measure time to determine the most efficient blocking method for Bloom Bigrams
matching. Figure 2(center) illustrates the time required for PPB, for PPM and
for the overall PPRL process. We observe that the matching time varies
depending on the blocking method used. While Simple Blocking is the fastest blocking
approach, it does not manage to significantly reduce the matching time. On the
other hand, although SNEF is slower at blocking, it achieves the best overall
time, as it significantly reduces matching time. Finally, TPPB has the highest
blocking time, but achieves the best matching time. It evidently creates the blocks
best suited to matching, but unfortunately its blocking time alone exceeds the
overall time of SNEF.
      </p>
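      <p>For reference, the Bloom Bigrams matcher used in this experiment encodes the bigrams of each string into a Bloom filter and compares filters with the Dice coefficient. The following Python sketch follows that description; the use of HMAC-SHA256 keyed with a secret shared by the data sources, and the filter length m and number of hash functions k, are illustrative assumptions rather than PRIVATEER’s settings. Each filter is represented by the set of positions of its 1-bits.</p>
      <preformat>
```python
import hashlib
import hmac
from typing import Set


def bigrams(s: str) -> Set[str]:
    """Adjacent character pairs of s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bloom_encode(value: str, m: int = 1000, k: int = 20,
                 secret: bytes = b"shared-secret") -> Set[int]:
    """Positions of the 1-bits in an m-bit Bloom filter holding the bigrams
    of `value`. Each bigram is hashed k times with HMAC-SHA256 (assumed
    hash family; m, k and the secret are illustrative)."""
    bits: Set[int] = set()
    for gram in bigrams(value.lower()):
        for i in range(k):
            msg = f"{i}|{gram}".encode()
            digest = hmac.new(secret, msg, hashlib.sha256).hexdigest()
            bits.add(int(digest, 16) % m)
    return bits


def dice(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Dice coefficient of two Bloom filters given as 1-bit positions:
    2 * |common 1-bits| / (|1-bits of A| + |1-bits of B|)."""
    total = len(bits_a) + len(bits_b)
    return 2 * len(bits_a.intersection(bits_b)) / total if total else 1.0
```
      </preformat>
      <p>Similar names such as "smith" and "smyth" share the bit positions of their common bigrams and therefore typically score much higher than unrelated names, while identical names score exactly 1.0.</p>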
      <p>
        Next, we evaluate three reference set generation methods with SNEF [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
and use Bloom Bigrams for matching [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The first one is the arbitrary string
generation method [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], the second is a modification of the Durstenfeld shuffle
algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], while the third uses the K-medoids algorithm [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], which was
originally used in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Figure 2(right) shows that, with regard to precision and
recall, the Durstenfeld shuffle, rather than the K-medoids method used in the
default setup of SNEF, is the best choice.
      </p>
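      <p>The Durstenfeld shuffle is the linear-time form of the Fisher-Yates shuffle: it walks the array from the end and swaps each position with a uniformly chosen position at or before it. The following Python sketch uses it to draw a reference set of a given size from a public list of values. This is the plain algorithm, not the modification evaluated above, and sample_reference_set is a hypothetical helper.</p>
      <preformat>
```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def durstenfeld_shuffle(items: Sequence[T], rng: random.Random) -> List[T]:
    """Fisher-Yates shuffle in Durstenfeld's O(n) form, applied to a copy."""
    a = list(items)
    for i in range(len(a) - 1, 0, -1):
        j = rng.randint(0, i)        # uniform over 0..i inclusive
        a[i], a[j] = a[j], a[i]
    return a


def sample_reference_set(public: Sequence[T], size: int,
                         rng: random.Random) -> List[T]:
    """Draw `size` distinct entries from a public dataset (e.g. surnames)."""
    if size > len(public):
        raise ValueError("reference set larger than the public dataset")
    return durstenfeld_shuffle(public, rng)[:size]
```
      </preformat>
      <p>Because every permutation is equally likely, taking the first entries of the shuffled list yields a uniformly random subset of the public values.</p>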
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>We have presented PRIVATEER, a modular, highly configurable privacy
preserving record linkage toolkit that enables combining and comparing different
PPRL solutions. We demonstrated the usability of our toolkit through evaluation
results derived from experiments run with PRIVATEER on sample data.</p>
      <p>Acknowledgment. Attendance at this conference was partially supported by
the Hellenic Artificial Intelligence Society (EETN).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Al-Lawati</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDaniel</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Blocking-aware private record linkage</article-title>
          .
          <source>In: IQIS</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bachteler</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reiher</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A test data generator for evaluating record linkage methods</article-title>
          .
          <source>Tech. rep., German RLC Work. Paper No. wp-grlc-2012-01</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Christen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Febrl: an open source data cleaning, deduplication and record linkage system with a graphical user interface</article-title>
          .
          <source>In: ACM SIGKDD</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Christen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <source>Data Matching. Data-Centric Systems and Applications</source>
          , Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Durstenfeld</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Algorithm 235: Random permutation</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>7</volume>
          (
          <issue>7</issue>
          ),
          <fpage>420</fpage>
          (Jul
          <year>1964</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Elfeky</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verykios</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elmagarmid</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          :
          <article-title>Tailor: a record linkage toolbox</article-title>
          .
          <source>In: IEEE ICDE</source>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gravano</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ipeirotis</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jagadish</surname>
            ,
            <given-names>H.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koudas</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muthukrishnan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pietarinen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Using q-grams in a dbms for approximate string processing</article-title>
          .
          <source>IEEE Data Engineering Bulletin</source>
          <volume>24</volume>
          (
          <issue>4</issue>
          ),
          <fpage>28</fpage>
          -
          <lpage>34</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hernández</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stolfo</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          :
          <article-title>Real-world data is dirty: Data cleansing and the merge/purge problem</article-title>
          .
          <source>Data Mining &amp; Knowledge Discovery</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>9</fpage>
          -
          <lpage>37</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Inan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kantarcioglu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertino</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scannapieco</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A hybrid approach to private record linkage</article-title>
          .
          <source>In: IEEE ICDE</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Inan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kantarcioglu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghinita</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertino</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Private record matching using differential privacy</article-title>
          .
          <source>In: ACM EDBT</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Karakasidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verykios</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          :
          <article-title>Privacy preserving record linkage using phonetic codes</article-title>
          .
          <source>In: BCI</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Karakasidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verykios</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          :
          <article-title>Secure blocking + secure matching = secure record linkage</article-title>
          .
          <source>JCSE</source>
          <volume>5</volume>
          (
          <issue>3</issue>
          ),
          <fpage>223</fpage>
          -
          <lpage>235</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Karakasidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verykios</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          :
          <article-title>A sorted neighborhood approach to multidimensional privacy preserving blocking</article-title>
          .
          <source>In: IEEE ICDMW</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Karakasidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verykios</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          :
          <article-title>A simulator for privacy preserving record linkage</article-title>
          .
          <source>In: EANN, Part II</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kaufman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rousseeuw</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Clustering by Means of Medoids</article-title>
          .
          <source>Reports of the Faculty of Mathematics and Informatics, Delft Univ. of Technology</source>
          (
          <year>1987</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Scannapieco</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Figotin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertino</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elmagarmid</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          :
          <article-title>Privacy preserving schema and data matching</article-title>
          .
          <source>In: ACM SIGMOD</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Schnell</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bachteler</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reiher</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Privacy preserving record linkage using bloom filters</article-title>
          .
          <source>BMC Medical Informatics &amp; Decision Making</source>
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <elocation-id>41</elocation-id>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vatsalan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>GeCo: an online personal data generator and corruptor</article-title>
          .
          <source>In: ACM CIKM</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Vatsalan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verykios</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          :
          <article-title>Efficient two-party private blocking based on sorted nearest neighborhood clustering</article-title>
          .
          <source>In: ACM CIKM</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>