<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Data Processing: Reflections on Ethics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Donatella Firmani</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Letizia Tanca</string-name>
          <email>letizia.tanca@polimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Torlone</string-name>
          <email>riccardo.torlone@uniroma3.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Milano</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Roma Tre University</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>4</volume>
      <issue>2019</issue>
      <abstract>
        <p>Ethics-related aspects are becoming prominent in data management; the current processes for searching, querying, or analyzing data should therefore be designed in such a way as to take into account the social problems their outcomes could bring about. In this paper we provide reflections on the unavoidable ethical facets entailed by all the steps of the information life-cycle, including source selection, knowledge extraction, data integration and data analysis. Such reflections motivated us to organize the First International Workshop on Processing Information Ethically (PIE).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Information management naturally involves ethical concerns about how data can
be used or misused, posing new challenges to researchers and practitioners across
the whole spectrum of information systems. Data is the bridge between stark
hardware and people; the data produced by any information system cannot
convey the appropriate knowledge if humans do not give it semantics. Today, it is
widely accepted that a decent system must manage data in a truthful, accurate
and secure way; hence, the satisfaction of ethical requirements is fundamental in
modern applications.</p>
      <p>
        In this paper we discuss the role of the most common ethical requirements
of any information system (namely fairness, transparency, diversity and data
protection) and provide reflections and open problems related to how each of
them can be considered in the following phases of the information life-cycle [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]:
        <list list-type="order">
          <list-item>
            <p>source selection, i.e., identifying the datasets, or data sources, containing
the data of interest,</p>
          </list-item>
          <list-item>
            <p>data integration, namely, extracting and integrating those data in order to
produce a unique dataset, and</p>
          </list-item>
          <list-item>
            <p>information extraction, that is, applying information extraction tools,
from a basic query up to a sophisticated machine learning method, to
produce knowledge.</p>
          </list-item>
        </list>
      </p>
      <p>Specifically, we discuss how each of these steps may imply ethically relevant
choices, advocate ethics by design in the information life-cycle, and discuss
related goals and challenges. We also reason on how one goal might be
interdependent with another.</p>
    </sec>
    <sec id="sec-2">
      <title>Ethical Requirements</title>
      <p>We first briefly review the above-mentioned ethical principles.</p>
      <p>
        <list list-type="bullet">
          <list-item>
            <p>Fairness of data is defined as the lack of bias, and different notions of bias
can yield different fairness measures [<xref ref-type="bibr" rid="ref16">16</xref>]. Fairness has often been studied
for processes [<xref ref-type="bibr" rid="ref12">12</xref>], while its importance has recently been acknowledged also
for the data involved in the process itself, due to the (possibly dramatic)
consequences of training Artificial Intelligence (AI) systems with biased data,
both in the general setting [<xref ref-type="bibr" rid="ref11">11</xref>] and in specific data management tasks [<xref ref-type="bibr" rid="ref5">5</xref>].</p>
          </list-item>
          <list-item>
            <p>Transparency is the ability to interpret the information extraction process
in order to verify which aspects of the data determine its results.
Transparency metrics can use the notions of (i) data provenance, by measuring
the degree to which the metadata describe where the original data come
from, and (ii) explanation, by describing how a result has been obtained.
Transparency enables the detection of possible biases, and is thus somehow "of service"
to fairness.</p>
          </list-item>
          <list-item>
            <p>Diversity is the degree to which different kinds of objects are represented
in a dataset. Several metrics are proposed in [<xref ref-type="bibr" rid="ref7">7</xref>]. Ensuring diversity at the
beginning of the information extraction process may be useful for enforcing
fairness at the end. However, note that diversity may sometimes conflict
with fairness [<xref ref-type="bibr" rid="ref15">15</xref>], for instance when an effort to guarantee diversity leads to
losing sight of the objective merit of the people involved.</p>
          </list-item>
          <list-item>
            <p>Data Protection concerns the ways to secure data, algorithms and models
against unauthorized access. Defining measures for privacy can be an elusive
goal since, on the one hand, anonymized datasets that are secure in
isolation can reveal sensitive information when combined [<xref ref-type="bibr" rid="ref18">18</xref>], and on the other
hand, robust techniques such as ε-differential privacy [<xref ref-type="bibr" rid="ref8">8</xref>] can only describe
the privacy impact of specific queries. Needless to say, data protection may
conflict with transparency.</p>
          </list-item>
        </list>
      </p>
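      <p>As a rough illustration of how diversity and a simple form of representation bias could be quantified, the following Python sketch computes per-group shares, a normalized Shannon entropy used as a diversity score, and the largest gap with respect to a reference distribution. The entropy-based score, the reference shares, the toy restaurant records and all function names are assumptions made for illustration only; they are not the metrics surveyed in [<xref ref-type="bibr" rid="ref7">7</xref>].</p>
      <preformat preformat-type="code">
```python
import math
from collections import Counter


def group_shares(records, attribute):
    """Return the fraction of records falling in each group of `attribute`."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def normalized_entropy(shares):
    """Shannon entropy of the group shares, scaled to the range [0, 1]."""
    k = len(shares)
    if k in (0, 1):
        return 0.0  # no groups, or a single group: no diversity to measure
    h = -sum(p * math.log(p) for p in shares.values() if p > 0)
    return h / math.log(k)


def max_representation_gap(shares, reference):
    """Largest absolute gap between observed and reference group shares."""
    groups = set(shares) | set(reference)
    return max(abs(shares.get(g, 0.0) - reference.get(g, 0.0)) for g in groups)


# Hypothetical source: restaurants in one city, dominated by one cuisine.
restaurants = (
    [{"cuisine": "italian"}] * 8
    + [{"cuisine": "japanese"}] * 1
    + [{"cuisine": "indian"}] * 1
)
# Hypothetical reference distribution chosen by the analyst.
reference = {"italian": 0.5, "japanese": 0.25, "indian": 0.25}

shares = group_shares(restaurants, "cuisine")
print(shares)
print(round(normalized_entropy(shares), 3))
print(round(max_representation_gap(shares, reference), 3))
```
      </preformat>
      <p>A diversity score close to 1 indicates that groups are evenly represented; which reference distribution is appropriate depends on the notion of fairness that is adopted.</p>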
    </sec>
    <sec id="sec-3">
      <title>Discussion</title>
      <p>We now discuss our viewpoint on the main ethically relevant aspects of each of
the information life-cycle steps.</p>
      <p>
        Phase 1: Source selection. Data typically come from multiple sources, and it
is most desirable that each of these complies with the fairness requirement
individually. Unfortunately, sources are often biased with respect to some categories.
For instance, a source with restaurants in Rome may over-represent restaurants
with Italian cuisine. It is thus appropriate to consider ethics across multiple
sources, so that the bias towards a certain category in a single source can be
counterbalanced by adding others with opposite biases. Another fundamental
challenge in the context of source selection is data protection. While fairness can
benefit from multiple sources, data must be protected at the level of the single
data source, as adding more information can only lower the protection level or,
at best, leave it as it is. For instance, the case study in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] about a dataset
released by the New York City Taxi and Limousine Commission showed that, with only a small amount of
auxiliary knowledge, an attacker could violate the privacy of passengers by identifying
where an individual went, how much they paid, and their weekly habits.
      </p>
      <p>
        Phase 2: Data Integration. Data integration usually involves three main
steps: (i) schema matching, i.e., the alignment of the schemata of the data sources,
(ii) entity resolution, i.e., identification of the items stored in different data
sources that refer to the same entity, and (iii) data fusion, i.e., construction of
an integrated database over the data sources, obtained by merging their contents.
      </p>
      <p>
        For reasons similar to those discussed for the source selection phase, entity
resolution across several data sources owned by different parties can reveal
sensitive information about these entities. Application examples range from public health
surveillance to crime and fraud detection, and national security. We refer the reader
to [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] for a survey of existing techniques and challenges of privacy-preserving
entity resolution in the context of Big Data. As for the steps of schema
matching and data fusion, we observe that groups treated fairly in the sources can
become over- or under-represented as a consequence of the integration process,
since combining data coming from different sources might lead to the exclusion
of some groups. Similar issues arise in connection with diversity.
      </p>
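      <p>The privacy risk described above can be sketched with a toy linkage between two sources that are harmless in isolation. The records, the field names and the <monospace>link_records</monospace> helper below are hypothetical; the sketch joins on exact quasi-identifier values, whereas real entity resolution is usually approximate.</p>
      <preformat preformat-type="code">
```python
def link_records(anonymized_trips, auxiliary_sightings, keys):
    """Join two sources on shared quasi-identifier fields.

    Only sightings that match exactly one trip are returned, since a
    unique match is what re-identifies an otherwise anonymous record.
    """
    index = {}
    for trip in anonymized_trips:
        index.setdefault(tuple(trip[k] for k in keys), []).append(trip)

    linked = []
    for sighting in auxiliary_sightings:
        matches = index.get(tuple(sighting[k] for k in keys), [])
        if len(matches) == 1:
            linked.append({**sighting, **matches[0]})
    return linked


# Hypothetical "anonymized" source: no names, but detailed trip data.
anonymized_trips = [
    {"pickup_time": "2013-07-08 23:34", "pickup_place": "club_x",
     "dropoff_place": "5th_ave_12", "fare": 10.5},
    {"pickup_time": "2013-07-09 08:10", "pickup_place": "station_y",
     "dropoff_place": "main_st_3", "fare": 7.0},
]
# Hypothetical auxiliary knowledge: who was seen boarding, where and when.
auxiliary_sightings = [
    {"person": "person_a", "pickup_time": "2013-07-08 23:34",
     "pickup_place": "club_x"},
]

linked = link_records(
    anonymized_trips, auxiliary_sightings, ["pickup_time", "pickup_place"]
)
print(linked)
```
      </preformat>
      <p>Neither source alone associates a person with a destination or a fare; the association only appears once the two are combined.</p>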
      <p>
        In all the above steps transparency is critical since, while data provenance
and explanation of the intermediate results play key roles in the enforcement
of ethical requirements, both can conflict with data protection requirements.
The idea of data transparency without privacy violation is put forward in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
envisioning the application of blockchain technology.
      </p>
      <p>
        We finally observe that the rise of machine learning and deep learning
techniques for data integration tasks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] poses new challenges in grasping how
integration outputs are produced [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. We refer the reader to [
        <xref ref-type="bibr" rid="ref5 ref9">5,9</xref>
        ] for two of the
first attempts at explaining deep learning systems for the entity resolution task.
      </p>
      <p>
        Phase 3: Information Extraction. This step aims at presenting the user
with data organized so as to satisfy their needs. As an example, among all the
possible means for extracting information we focus on two, namely search and
aggregation.
      </p>
      <p>
        Search is a widely studied task, and the literature is rich with methods for
maximizing user satisfaction [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, in job candidate selection or university
admissions, as part of the task, we would like to find rankings that also satisfy
certain notions of fairness, for instance that different demographic groups be
equally represented in the top search results. We believe that the
system should allow the desired type of fairness to be specified, so that
data scientists can comply with the requirements coming from the customers
who, in turn, will take responsibility for their choices. We refer the reader
to [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] for one of the first attempts to define fairness of exposure in ranking.
      </p>
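      <p>To make the idea of equal representation in the top results concrete, the sketch below re-ranks candidates by rotating over demographic groups, taking the best remaining candidate of each group in turn. This round-robin rule is a deliberately simple assumption chosen for illustration; it is not the exposure-based formulation of [<xref ref-type="bibr" rid="ref15">15</xref>], and the candidate tuples and the <monospace>balanced_top_k</monospace> name are hypothetical.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict
from itertools import zip_longest


def balanced_top_k(candidates, k, group_of, score_of):
    """Pick k candidates by score while rotating over groups."""
    by_group = defaultdict(list)
    for c in sorted(candidates, key=score_of, reverse=True):
        by_group[group_of(c)].append(c)

    # Order the groups by their best candidate, then take one candidate
    # per group per turn until every group is exhausted.
    queues = sorted(by_group.values(), key=lambda q: score_of(q[0]), reverse=True)
    interleaved = [
        c for turn in zip_longest(*queues) for c in turn if c is not None
    ]
    return interleaved[:k]


# Hypothetical candidates: (name, group, relevance score).
candidates = [
    ("a1", "A", 0.9), ("a2", "A", 0.8), ("a3", "A", 0.7),
    ("b1", "B", 0.6), ("b2", "B", 0.5),
]

by_score = sorted(candidates, key=lambda c: c[2], reverse=True)[:4]
balanced = balanced_top_k(
    candidates, 4, group_of=lambda c: c[1], score_of=lambda c: c[2]
)
print([name for name, _, _ in by_score])
print([name for name, _, _ in balanced])
```
      </preformat>
      <p>Passing the grouping and scoring functions as parameters reflects the point made above: the notion of fairness to be enforced is a choice of whoever commissions the ranking, not something the system should fix once and for all.</p>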
      <p>
        A notable phenomenon that instead occurs during aggregation is
Simpson's paradox, where trends appearing in different groups of data can disappear
or even be reversed when these groups are combined.<sup>3</sup> The work of [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] provides
a framework for incorporating fairness into specific aggregations based on
independence tests, whereas detecting bias in combined data with full-fledged query
systems is still an open problem.
      </p>
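      <p>The reversal can be reproduced with a few lines of Python. The admission counts below are hypothetical numbers chosen to exhibit the effect, not data from any real study, and the <monospace>admission_rates</monospace> helper is introduced only for this sketch.</p>
      <preformat preformat-type="code">
```python
# (department, gender, applicants, admitted) -- hypothetical counts
admissions = [
    ("easy", "men", 80, 48),
    ("easy", "women", 20, 14),
    ("hard", "men", 20, 4),
    ("hard", "women", 80, 24),
]


def admission_rates(rows, group_by):
    """Aggregate admitted/applicants for each key produced by `group_by`."""
    totals = {}
    for row in rows:
        key = group_by(row)
        applicants, admitted = totals.get(key, (0, 0))
        totals[key] = (applicants + row[2], admitted + row[3])
    return {key: adm / app for key, (app, adm) in totals.items()}


# Fine-grained view: one rate per (department, gender) pair.
per_department = admission_rates(admissions, lambda r: (r[0], r[1]))
# Aggregated view: departments are merged, one rate per gender.
overall = admission_rates(admissions, lambda r: r[1])

print(per_department)
print(overall)
```
      </preformat>
      <p>With these counts women have the higher admission rate in each department, while men have the higher rate once the departments are merged, because most women apply to the department with the lower admission rate.</p>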
    </sec>
    <sec id="sec-4">
      <title>Further Readings</title>
      <p>
        An early attempt to consider ethics in the broad information systems and data
management area can be found in the data quality book [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] by Batini and
Scannapieco. In their work, the authors describe a framework for data quality clusters
and dimensions, where those in the trust cluster are related to some ethics
principles, including believability, reliability, and reputation. Recent developments
of these dimensions triggered by the Big Data challenge are discussed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        One of the first attempts to consider the ethical principles as first-class
citizens in data management is in the tutorial [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] by Stoyanovich et al., with the
primary goal of drawing the attention of the data management community to
the emerging subject of responsible data management. The tutorial gives an
overview of existing technical work, primarily from the data mining and
algorithms communities, and discusses related research directions.
      </p>
      <p>
        More recently, the work [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] advocates the injection of ethical principles
into the whole information extraction process, by properly amalgamating and
resolving contrasts between various ethical requirements. The paper provides the
vision of a large group of data management researchers towards the description
of a comprehensive checklist of ethical desiderata for information processing, to
ensure and verify that ethically motivated requirements and related legal norms
are fulfilled throughout the data selection and exploration processes.
      </p>
      <p>
        In a standardization spirit analogous to that of [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], but in the more specific
scenario of building AI systems and supporting robust data analysis, the work
in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] describes the Dataset Nutrition Label, a diagnostic framework that
provides a distilled yet comprehensive overview of dataset "ingredients" for AI
model development. Future directions for the project include research and public
policy agendas to further advance consideration of the concept.
      </p>
      <p>
        Finally, Abiteboul and Stoyanovich [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] bring regulatory frameworks, such as the
European Union's General Data Protection Regulation (GDPR), the New York City
Automated Decision Systems (ADS) Law, and the Net Neutrality principle, to
the attention of the data management community. Governments are starting to
acknowledge the importance of building norms and codes for data-driven
algorithmic technologies, and such regulatory frameworks are prominent examples.
The main take-away of the paper is that, in order to comply with regulatory
frameworks, we should think in terms of ethics by design, viewing ethics as a
systems requirement rather than incorporating it into systems retrospectively.
        <fn>
          <label>3</label>
          <p>One of the best-known examples of this paradox is a study of gender bias in
graduate school admissions to the University of California, Berkeley. Overall, men were
more likely than women to be admitted. However, when the individual
departments were examined, women were significantly more successful than men. A deeper
analysis showed that women tended to apply to competitive departments with low
rates of admission, thus yielding the inverse overall trend.</p>
        </fn>
      </p>
    </sec>
    <sec id="sec-5">
      <title>Concluding Remarks</title>
      <p>Ethics in Information Management is a multifaceted concept, including some
requirements that are implicitly related or in conflict. As ethics by design is starting to
be recognized as a system requirement, we argue that a key ingredient is to
achieve an explicit and holistic vision of ethics as a first-class citizen for data
management, as data is at the core of modern information systems. To this end,
we discussed the challenges of introducing ethics during the three phases of the
information life-cycle, as a necessary step to allow different stakeholders to enact
legal regulations and equally benefit from modern data processing techniques.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Serge</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          and
          <string-name>
            <given-names>Julia</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          .
          <article-title>Transparency, fairness, data protection, neutrality: Data management challenges in the face of new regulation</article-title>
          . To appear
          <source>in the ACM Journal of Data and Information Quality (JDIQ)</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Rakesh</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , Sreenivas Gollapudi, Alan Halverson, and
          <string-name>
            <given-names>Samuel</given-names>
            <surname>Ieong</surname>
          </string-name>
          .
          <article-title>Diversifying search results</article-title>
          .
          <source>In 2nd ACM International Conference on Web Search and Data Mining (WSDM)</source>
          , pages
          <fpage>5</fpage>
          –
          <lpage>14</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Carlo</given-names>
            <surname>Batini</surname>
          </string-name>
          and
          <string-name>
            <given-names>Monica</given-names>
            <surname>Scannapieco</surname>
          </string-name>
          .
          <source>Data and Information Quality - Dimensions, Principles and Techniques. Data-Centric Systems and Applications</source>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Elisa</given-names>
            <surname>Bertino</surname>
          </string-name>
          , Ashish Kundu, and
          <string-name>
            <given-names>Zehra</given-names>
            <surname>Sura</surname>
          </string-name>
          .
          <article-title>Data transparency with blockchain and AI ethics</article-title>
          . To appear
          <source>in the ACM Journal of Data and Information Quality (JDIQ)</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Vincenzo</given-names>
            <surname>Di Cicco</surname>
          </string-name>
          , Donatella Firmani, Nick Koudas, Paolo Merialdo, and
          <string-name>
            <given-names>Divesh</given-names>
            <surname>Srivastava</surname>
          </string-name>
          .
          <article-title>Interpreting deep learning models for entity resolution: an experience report using LIME</article-title>
          .
          <source>In 2nd International Workshop on Exploiting Artificial Intelligence Techniques for Data Management @ SIGMOD, page 8</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Xin Luna</given-names>
            <surname>Dong</surname>
          </string-name>
          and
          <string-name>
            <given-names>Theodoros</given-names>
            <surname>Rekatsinas</surname>
          </string-name>
          .
          <article-title>Data integration and machine learning: A natural synergy</article-title>
          .
          <source>In 2018 International ACM Conference on Management of Data (SIGMOD)</source>
          , pages
          <fpage>1645</fpage>
          –
          <lpage>1650</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Marina</given-names>
            <surname>Drosou</surname>
          </string-name>
          , HV Jagadish, Evaggelia Pitoura, and
          <string-name>
            <given-names>Julia</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          .
          <article-title>Diversity in big data: A review</article-title>
          .
          <source>Big data</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>73</fpage>
          –
          <lpage>84</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Cynthia</given-names>
            <surname>Dwork</surname>
          </string-name>
          .
          <article-title>Differential privacy</article-title>
          .
          <source>Encyclopedia of Cryptography and Security</source>
          , pages
          <fpage>338</fpage>
          –
          <lpage>340</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Amr</given-names>
            <surname>Ebaid</surname>
          </string-name>
          , Saravanan Thirumuruganathan, Walid G. Aref, Ahmed K. Elmagarmid, and
          <string-name>
            <given-names>Mourad</given-names>
            <surname>Ouzzani</surname>
          </string-name>
          .
          <article-title>EXPLAINER: entity resolution explanations</article-title>
          .
          <source>In 35th IEEE International Conference on Data Engineering, ICDE</source>
          , pages
          <fpage>2000</fpage>
          –
          <lpage>2003</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Donatella</given-names>
            <surname>Firmani</surname>
          </string-name>
          , Massimo Mecella, Monica Scannapieco, and
          <string-name>
            <given-names>Carlo</given-names>
            <surname>Batini</surname>
          </string-name>
          .
          <article-title>On the meaningfulness of "big data quality"</article-title>
          .
          <source>Data Science and Engineering</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <fpage>6</fpage>
          –
          <lpage>20</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Luciano</given-names>
            <surname>Floridi</surname>
          </string-name>
          , Josh Cowls, Monica Beltrametti, Raja Chatila, Patrice Chazerand, Virginia Dignum, Christoph Luetge, Robert Madelin, Ugo Pagallo,
          <string-name>
            <given-names>Francesca</given-names>
            <surname>Rossi</surname>
          </string-name>
          , et al.
          <article-title>AI4People - an ethical framework for a good AI society: Opportunities, risks, principles, and recommendations</article-title>
          .
          <source>Minds and Machines</source>
          ,
          <volume>28</volume>
          (
          <issue>4</issue>
          ):
          <fpage>689</fpage>
          –
          <lpage>707</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Sainyam</given-names>
            <surname>Galhotra</surname>
          </string-name>
          , Yuriy Brun, and
          <string-name>
            <given-names>Alexandra</given-names>
            <surname>Meliou</surname>
          </string-name>
          .
          <article-title>Fairness testing: testing software for discrimination</article-title>
          .
          <source>In 11th ACM Joint Meeting on Foundations of Software Engineering</source>
          , pages
          <fpage>498</fpage>
          –
          <lpage>510</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, and
          <string-name>
            <given-names>Kasia</given-names>
            <surname>Chmielinski</surname>
          </string-name>
          .
          <article-title>The dataset nutrition label: A framework to drive higher data quality standards</article-title>
          .
          <source>arXiv preprint arXiv:1805.03677</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Babak</given-names>
            <surname>Salimi</surname>
          </string-name>
          , Johannes Gehrke, and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Suciu</surname>
          </string-name>
          .
          <article-title>Bias in OLAP queries: Detection, explanation, and removal</article-title>
          .
          <source>In 2018 International ACM Conference on Management of Data (SIGMOD)</source>
          , pages
          <fpage>1021</fpage>
          –
          <lpage>1035</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Ashudeep</given-names>
            <surname>Singh</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <article-title>Fairness of exposure in rankings</article-title>
          .
          <source>In 24th ACM International Conference on Knowledge Discovery &amp; Data Mining (KDD)</source>
          , pages
          <fpage>2219</fpage>
          –
          <lpage>2228</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Julia</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          , Serge Abiteboul, and
          <string-name>
            <given-names>Gerome</given-names>
            <surname>Miklau</surname>
          </string-name>
          .
          <article-title>Data, responsibly: Fairness, neutrality and transparency in data analysis</article-title>
          .
          <source>In 19th International Conference on Extending Database Technology (EDBT)</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Letizia</given-names>
            <surname>Tanca</surname>
          </string-name>
          , Paolo Atzeni, Davide Azzalini, Ilaria Bartolini, Luca Cabibbo, Luca Calderoni, Paolo Ciaccia, Valter Crescenzi, Juan Carlos De Martin,
          <string-name>
            <given-names>Selina</given-names>
            <surname>Fenoglietto</surname>
          </string-name>
          , Donatella Firmani, Sergio Greco, Francesco Isgro, Dario Maio, Davide Martinenghi, Maristella Matera, Paolo Merialdo, Cristian Molinaro, Marco Patella, Roberto Prevete, Elisa Quintarelli, Antonio Santangelo, Andrea Tagarelli, Guglielmo Tamburrini, and
          <string-name>
            <given-names>Riccardo</given-names>
            <surname>Torlone</surname>
          </string-name>
          .
          <article-title>Ethics-aware data governance (vision paper)</article-title>
          .
          <source>In 26th Italian Symposium on Advanced Database Systems (SEBD)</source>
          ,
          page
          <fpage>49</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Anthony</given-names>
            <surname>Tockar</surname>
          </string-name>
          .
          <article-title>Riding with the stars: Passenger privacy in the NYC taxicab dataset</article-title>
          .
          <source>Neustar Research</source>
          , September 15,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Dinusha</given-names>
            <surname>Vatsalan</surname>
          </string-name>
          , Ziad Sehili,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Christen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Erhard</given-names>
            <surname>Rahm</surname>
          </string-name>
          .
          <article-title>Privacy-preserving record linkage for big data: Current approaches and research challenges</article-title>
          .
          <source>In Handbook of Big Data Technologies</source>
          , pages
          <fpage>851</fpage>
          –
          <lpage>895</lpage>
          . Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Xiaolan</given-names>
            <surname>Wang</surname>
          </string-name>
          , Laura Haas, and
          <string-name>
            <given-names>Alexandra</given-names>
            <surname>Meliou</surname>
          </string-name>
          .
          <article-title>Explaining data integration</article-title>
          .
          <source>Data Engineering Bulletin</source>
          ,
          <volume>41</volume>
          (
          <issue>2</issue>
          ),
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>