<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Privacy-Preserving Important Passage Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luís Marujo</string-name>
          <email>lmarujo@cs.cmu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Portêlo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Martins de Matos</string-name>
          <email>david.matos@inesc-id.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>João P. Neto</string-name>
          <email>joao.neto@inesc-id.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anatole Gershman</string-name>
          <email>anatoleg@cs.cmu.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaime Carbonell</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isabel Trancoso</string-name>
          <email>isabel.trancoso@inesc-id.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bhiksha Raj</string-name>
          <email>bhiksha@cs.cmu.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>INESC-ID</institution>
          ,
          <addr-line>Lisboa</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Instituto Superior Técnico</institution>
          ,
          <addr-line>Lisboa</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Language Technologies Institute, Carnegie Mellon University</institution>
          ,
          <addr-line>Pittsburgh, PA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>3</fpage>
      <lpage>8</lpage>
      <abstract>
        <p>State-of-the-art important passage retrieval methods obtain very good results, but do not take privacy issues into account. In this paper, we present a privacy-preserving method that relies on creating secure representations of documents. Our approach allows third parties to retrieve important passages from documents without learning anything about their content. We use a hashing scheme known as Secure Binary Embeddings to convert a key phrase and bag-of-words representation into bit strings in a way that allows the computation of approximate distances, instead of exact ones. Experiments show that our secure system yields results similar to those of its non-private counterpart on both clean text and noisy speech-recognized text.</p>
      </abstract>
      <kwd-group>
        <kwd>Secure Passage Retrieval</kwd>
        <kwd>Important Passage Retrieval</kwd>
        <kwd>KPCentrality</kwd>
        <kwd>Secure Binary Embeddings</kwd>
        <kwd>Data Privacy</kwd>
        <kwd>Automatic Key Phrase Extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Important Passage Retrieval (IPR) is the problem of
extracting the most important passages in a body of text. By
"important", we mean those passages that capture most of
the key information the text is attempting to convey. Of
the various solutions proposed, state-of-the-art solutions for
IPR based on centrality achieve excellent results [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
A potential obstacle to the deployment of such methods
is that they usually assume the input data are in the public
domain. However, these data may come from social network
profiles, medical records or other private documents, and
their owners may not want to, or may not even be allowed to, share them
with third parties. Consider the scenario where a company
has millions of classified documents. The company needs to
retrieve the most important passages from those documents,
but lacks the computational power or know-how to do so. At
the same time, it cannot give a third party with such capabilities
access to the documents, because they may
contain sensitive information. As a result, the company must
obfuscate its own data before sending it to the third party,
a requirement that is seemingly at odds with the objective
of extracting important passages from it.
      </p>
      <p>
        In this paper, we propose a new privacy-preserving
technique for IPR based on Secure Binary Embeddings (SBE) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
that enables exactly this: it provides a mechanism for
obfuscating the data while still achieving near state-of-the-art
performance in IPR.
      </p>
      <p>
        SBEs are a form of locality-sensitive hashing which
converts data arrays such as bag-of-words vectors into obfuscated
bit strings through a combination of random projections
followed by banded quantization. The method has information-theoretic
guarantees of security, ensuring that the original
data cannot be recovered from the bit strings. At the same
time, it also provides a mechanism for locally computing
distances between vectors that are close to one another
without revealing the global geometry of the data, consequently
enabling tasks such as IPR. This is possible because,
unlike other hashing methods, which require exact matches for
performing classification tasks, SBEs allow for near-exact
matching: the hashes can be used to estimate the distances
between vectors that are very close, but provably provide no
information whatsoever about the distance between vectors
that are farther apart. The usefulness of SBE has already
been shown in a privacy-preserving speaker
verification system [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], yielding promising results.
      </p>
      <p>The remainder of the paper is structured as follows. In
Section 2 we briefly present some related work regarding
Important Passage Retrieval and privacy-preserving methods
in IR. In Section 3 we detail the two stages of the important
passage retrieval technique. Section 4 presents the method
for obtaining secure document representations. We describe
our approach to privacy-preserving important passage
retrieval in Section 5. Section 6 describes the datasets used and
illustrates our approach with some experiments. Finally, we
present some conclusions and plans for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Important Passage Retrieval</title>
      <p>
        Text and speech information sources influence the
complexity of important passage retrieval approaches
differently. For textual passage retrieval, it is common to use
complex information, such as syntactic [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], semantic [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ],
and discourse information [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], either to assess relevance
or to reduce the length of the output. However, speech
important passage retrieval approaches have an extra layer of
complexity, caused by speech-related issues such as recognition
errors and disfluencies. As a result, it is useful to use
speech-specific information (e.g., acoustic/prosodic features [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ],
recognition confidence scores [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]), or to improve both the
assessment of relevance and the intelligibility of automatic
speech recognizer transcriptions (by using related
information [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]). These problems not only increase the difficulty
of determining the salient information, but also constrain
the applicability of text passage retrieval techniques to speech
passage retrieval. Nevertheless, shallow text summarization
approaches such as Latent Semantic Analysis (LSA) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and
Maximal Marginal Relevance (MMR) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] seem to achieve
performances comparable to the ones using specific
speech-related features [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. In addition, discourse features are starting to
gain some importance in speech retrieval [
        <xref ref-type="bibr" rid="ref15 ref35">15, 35</xref>
        ].
      </p>
      <p>
        Closely related to the important passage retrieval used in
this work are approaches using unsupervised key phrase
extraction methods. These methods are used to reinforce
passage retrieval [
        <xref ref-type="bibr" rid="ref11 ref25 ref27 ref31 ref34">11, 25, 27, 31, 34</xref>
        ]. Namely, they propose
the use of key phrases to summarize news articles [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and
meetings [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], the authors explored both supervised
and unsupervised methods with a limited set of features to
extract key phrases as a first step towards important passage
retrieval. Furthermore, the important passage retrieval used
in this work adapts the centrality retrieval model, which
plays an important role in the whole process. This kind of
model adaptation is explored in [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], where the first stage of
the method consists of a simple key phrase extraction step
based on part-of-speech patterns; these key phrases are then
used to define the relevance and redundancy components of
an MMR summarization model.
      </p>
      <p>Most IPR methods could easily be adapted to be
secure using the method described in Section 4. We opted to
use the KP-Centrality method described in the next section
because it is the current state-of-the-art IPR method.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2 Privacy Preserving Methods</title>
      <p>
        In this work, we focus on creating a method to perform
important passage retrieval while keeping the information in the
original documents private. There is a large body of
literature on important passage retrieval and on privacy-preserving
or secure methods. To the best of our knowledge, the
combination of both research lines has not been explored yet.
However, there are some recent works combining
information retrieval and privacy. Most of these works use data
encryption [
        <xref ref-type="bibr" rid="ref12 ref17 ref7 ref8">7, 8, 12, 17</xref>
        ] to transfer the data in a secure way.
This does not solve our problem, because the content of the
documents would be decrypted by the retrieval method and
therefore would not remain confidential with respect to the
retrieval method. An alternative secure information retrieval
methodology is to obfuscate queries, which hides the user's
topical intention [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], but does not secure the documents' content.
      </p>
      <p>
        The interest in privacy-preserving methods,
where two or more parties are involved and wish to
jointly perform a given operation without disclosing their
private information, is not new in many areas, and several
techniques such as Garbled Circuits [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], Homomorphic Encryption [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and
Locality-Sensitive Hashing [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have been introduced. However,
they all have limitations regarding the Important Passage
Retrieval task we wish to address. Until recently, Garbled
Circuits were extremely inefficient to use due to several
intrinsic issues, and even now it is difficult to adapt them when
the computation of non-linear operations is required.
Solutions to many of these problems have been developed, such
as performing offline computation of the oblivious
transfers, using shorter ciphers, evaluating XOR gates for 'free',
etc. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Systems based on Homomorphic Encryption
techniques introduce substantial computational
overhead and usually require extremely long amounts of time
to evaluate any function of interest. The Locality-Sensitive
Hashing technique allows for near-exact match detection
between data points, but does not provide any actual notion
of distance, leading to degraded performance in some
applications. As a result, we decided to use Secure
Binary Embeddings [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] as the data privacy method for our
approach, as it shows none of the disadvantages
mentioned above for the task at hand. We describe this
technique in depth in Section 4.
      </p>
    </sec>
    <sec id="sec-5">
      <title>3. IMPORTANT PASSAGE RETRIEVAL</title>
      <p>
        To determine the most important sentences of an
information source, we used the KP-Centrality model [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. We
chose this model for its adaptability to different types of
information sources (e.g., text, audio and video) and its
state-of-the-art performance. It is based on the notion of combining
key phrases with support sets. A support set is a group
of the most semantically related passages. These semantic
passages are selected using heuristics based on the passage
order method [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. This type of heuristic explores the
structure of the input source to partition the candidate passages
to be included in the support set into two subsets: the ones
closer to the passage associated with the support set under
construction and the ones farther apart. These heuristics
use a permutation, d_i1, d_i2, ..., d_iN-1, of the distances of the
passages s_k to the passage p_i related to the support set
under construction, with d_ik = dist(s_k, p_i), 1 &lt;= k &lt;= N-1,
where N is the number of passages, corresponding to the
order of occurrence of the passages s_k in the input source. The
metric that is normally used is the cosine distance.
      </p>
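      <p>As an illustration, the distance permutation for one passage can be computed directly from bag-of-words vectors (a minimal sketch; the toy vectors and the helper names are ours, not part of the original method):</p>

```python
import numpy as np

def cosine_dist(u, v):
    # cosine distance = 1 - cosine similarity
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def distance_permutation(i, passages):
    """d_ik = dist(s_k, p_i) for every other passage s_k, taken in the
    order the passages occur in the source (k = 1 .. N-1)."""
    return [cosine_dist(p, passages[i])
            for k, p in enumerate(passages) if k != i]

# toy bag-of-words vectors for N = 3 passages
P = [np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 1.0])]
d0 = distance_permutation(0, P)  # distances from p_0 to s_1 and s_2
```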
      <p>
        The KP-Centrality method consists of two steps, as
illustrated in Figure 1. First, it extracts key phrases using
a supervised approach [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and combines them with a
bag-of-words model in a compact matrix representation, given
by:
      </p>
      <p>[ w(t_1, p_1) ... w(t_1, p_N)  w(t_1, k_1) ... w(t_1, k_M) ]
[     ...            ...             ...            ...      ]
[ w(t_T, p_1) ... w(t_T, p_N)  w(t_T, k_1) ... w(t_T, k_M) ]
(1)
where w is a function of the number of occurrences of term t_i
in passage p_j or key phrase k_l, T is the number of terms and
M is the number of key phrases. Then, given a segmented
information source I = p_1, p_2, ..., p_N, a support set S_i is
computed for each passage p_i using:</p>
      <p>S_i = { s in I U K : sim(s, p_i) &gt; epsilon_i and s != p_i },
(2)</p>
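      <p>Under these definitions, building a support set amounts to a similarity threshold test against every other passage (a minimal sketch of Equation 2, assuming cosine similarity over term-frequency vectors; the threshold value is an illustrative choice):</p>

```python
import numpy as np

def cosine_sim(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def support_set(i, vectors, eps):
    """S_i = { s in I U K : sim(s, p_i) > eps and s != p_i } (Eq. 2)."""
    return {j for j, v in enumerate(vectors)
            if j != i and cosine_sim(v, vectors[i]) > eps}

# toy term-frequency vectors for three passages
P = [np.array([1.0, 2.0, 0.0]),
     np.array([1.0, 1.0, 0.0]),
     np.array([0.0, 0.0, 3.0])]
S0 = support_set(0, P, eps=0.5)  # passages semantically close to p_0
```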
    </sec>
    <sec id="sec-6">
      <title>4. SECURE BINARY EMBEDDINGS</title>
      <p>
        A Secure Binary Embedding (SBE) is a scheme that
converts real-valued vectors to bit sequences using
band-quantized random projections. These bit sequences, which we
will refer to as hashes, possess an interesting property: if
the Euclidean distance between two vectors is lower than a
threshold, then the Hamming distance between their hashes
is proportional to the Euclidean distance between the
vectors; if it is higher, then the hashes provide no
information about the true distance between the two vectors. This
scheme relies on the concept of Universal Quantization [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
which redefines scalar quantization by forcing the
quantization function to have non-contiguous quantization regions.
      </p>
      <p>Given an L-dimensional vector x in R^L, the universal
quantization process converts it to an M-bit binary sequence,
where the m-th bit is given by
q_m(x) = Q((&lt;x, a_m&gt; + w_m) / Delta).
(4)
Here &lt;., .&gt; represents a dot product, a_m in R^L is a projection
vector comprising L i.i.d. samples drawn from N(mu = 0, sigma^2),
Delta is a precision parameter, and w_m is a random dither drawn
from a uniform distribution over [0, Delta]. Q(.) is a
quantization function given by Q(x) = floor(x mod 2). We can represent
the complete quantization into M bits compactly in vector
form:
q(x) = Q(Delta^-1 (Ax + w)),
(5)
where q(x) is an M-bit binary vector, which we will refer to
as the hash of x. A in R^(M x L) is a matrix composed of the
row vectors a_m, Delta is a diagonal matrix with entries Delta, and
w in R^M is a vector composed of the dither values w_m.</p>
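      <p>The quantizer above can be sketched in a few lines of NumPy (a minimal illustration; the dimensions and the values of sigma and Delta are our own arbitrary choices, not those used in the paper):</p>

```python
import numpy as np

def sbe_hash(x, A, w, delta):
    """Universal quantization: q(x) = floor((A @ x + w) / delta) mod 2.

    A (M x L) holds the Gaussian projection vectors a_m, w (M,) the
    uniform dithers in [0, delta], and delta is the precision parameter."""
    return (np.floor((A @ x + w) / delta) % 2).astype(np.uint8)

rng = np.random.default_rng(0)
L, M, sigma, delta = 1024, 4096, 1.0, 1.0    # illustrative values
A = rng.normal(0.0, sigma, size=(M, L))      # projection matrix
w = rng.uniform(0.0, delta, size=M)          # random dither

x = rng.normal(size=L)
h = sbe_hash(x, A, w, delta)                 # M-bit hash of x
```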
      <p>The universal 1-bit quantizer of Equation 4 maps the real
line onto 1/0 in a banded manner, where each band is Delta
wide. Figure 2 compares conventional scalar 1-bit
quantization (left panel) with the equivalent universal 1-bit
quantization (right panel).</p>
      <p>
        The binary hash generated by the Universal Quantizer of
Equation 5 has the following properties [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: the
probability that the i-th bits, q_i(x) and q_i(x'), of the hashes
of two vectors x and x' are identical depends only on the
Euclidean distance d_E = ||x - x'|| between the vectors and
not on their actual values. As a consequence, the
following relationship can be shown [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: given any two vectors
x and x' with a Euclidean distance d_E, with probability
at most e^(-2 t^2 M) the normalized (per-bit) Hamming distance
d_H(q(x), q(x')) between the hashes of x and x' deviates by more
than t from its expected value:
|d_H(q(x), q(x')) - E[d_H(q(x), q(x'))]| &gt; t,
(6)
where t is the control factor. The above bound means
that the Hamming distance d_H(q(x), q(x')) is correlated with
the Euclidean distance d_E between the two vectors if d_E is
lower than a threshold (which depends on Delta). Specifically,
for small d_E, the expected Hamming distance E[d_H(q(x), q(x'))]
can be shown to be bounded by sqrt(2/pi) sigma Delta^-1 d_E,
which is linear in d_E. However, if the distance between x
and x' is higher than this threshold, then d_H(q(x), q(x')) is
bounded by 0.5 - (4/pi^2) exp(-0.5 pi^2 sigma^2 Delta^-2 d_E^2), which rapidly
converges to 0.5 and effectively gives us no information
whatsoever about the true distance between x and x'.
      </p>
      <p>In order to illustrate how this scheme works, we
randomly generated pairs of vectors in a high-dimensional space
(L = 1024) and plotted the normalized Hamming distance
between their hashes against the Euclidean distance between
them (Figure 3). The number of bits in the hash is also
shown in the figures.</p>
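      <p>The behaviour shown in Figure 3 can be reproduced numerically (an illustrative sketch with hypothetical parameter values): for small Euclidean distances the normalized Hamming distance tracks the distance, while for large ones it saturates near 0.5 and carries no information:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
L, M, delta = 1024, 4096, 1.0

A = rng.normal(size=(M, L))                  # Gaussian projections (sigma = 1)
w = rng.uniform(0.0, delta, size=M)          # uniform dither
embed = lambda v: np.floor((A @ v + w) / delta) % 2

def hamming_at(dist):
    # Build a pair of vectors exactly `dist` apart in Euclidean distance,
    # then measure the normalized Hamming distance between their hashes.
    x = rng.normal(size=L)
    d = rng.normal(size=L)
    x2 = x + dist * d / np.linalg.norm(d)
    return float(np.mean(embed(x) != embed(x2)))

near, far = hamming_at(0.1), hamming_at(50.0)
# `near` stays small (informative); `far` saturates around 0.5
```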
      <p>We note that in all cases, once the normalized distance
exceeds Delta, the Hamming distance between the hashes of two
vectors ceases to provide any information about the true
distance between the vectors. We will find this property useful
in developing our privacy-preserving system. Changing the
value of the precision parameter Delta allows us to adjust the
distance threshold up to which the Hamming distance is
informative. Increasing the number of bits M leads to a
reduction of the variance of the Hamming distance. A converse
property of the embeddings is that for all x' except those
that lie within a small radius of any x, d_H(q(x), q(x'))
provides little information about how close x' is to x. It can
be shown that the embedding provides information-theoretic
security beyond this radius if the embedding parameters A
and w are unknown to a potential eavesdropper. Any
algorithm attempting to recover a signal x from its embedding
q(x), or to infer anything about the relationship between two
signals sufficiently far apart using only their embeddings, will
fail to do so. Furthermore, even in the case where A and w
are known, it seems computationally intractable to derive x
from q(x) unless one can guess a starting point very close to
x. In effect, it is infeasible to invert the SBE without strong
a priori assumptions about x.</p>
    </sec>
    <sec id="sec-7">
      <title>5. SECURE IMPORTANT PASSAGE</title>
    </sec>
    <sec id="sec-8">
      <title>RETRIEVAL</title>
      <p>Our approach to a privacy-preserving important passage
retrieval system closely follows the formulation presented in
Section 3, and it is illustrated in Figure 4. However, there is
a very important difference in terms of who performs each
of the steps. Typically there is only one party involved,
Alice, who owns the original documents, performs key phrase
extraction, combines the key phrases with the bag-of-words model in a
compact matrix representation, computes the support sets
for each document and finally uses them to retrieve the
important passages. In our scenario, Alice does not know how to
extract the important passages from the documents or does not
possess the computational power to do so. Therefore, she must
outsource the retrieval process to a third party, Bob, who
has these capabilities. However, Alice must first obfuscate
the information contained in the compact matrix
representation. If Bob received this information as is, he could use the
term frequencies to infer the contents of the original
documents and gain access to private or classified information
Alice does not wish to disclose to anyone. Alice therefore computes
binary hashes of her compact matrix representation using
the method described in Section 4, keeping the
randomization parameters A and w to herself. She sends these hashes
to Bob, who computes the support sets and extracts the
important passages. Because Bob receives binary hashes
instead of the original matrix representation, he must use the
normalized Hamming distance instead of the cosine distance
in this step, since it is the metric the SBE hashes best
relate to. Finally, Bob returns the hashes corresponding to the
important passages to Alice, who then uses them to obtain the
information she desires.</p>
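      <p>The exchange can be sketched end to end as follows (a hypothetical sketch: the helper names and the simple average-distance ranking are ours and stand in for the full KP-Centrality computation). Alice alone holds A, w and Delta; Bob sees only bit strings and works with Hamming distances:</p>

```python
import numpy as np

rng = np.random.default_rng(2)
L, M, delta = 64, 512, 1.0

# --- Alice's side: only she knows A, w and delta ---
A = rng.normal(size=(M, L))
w = rng.uniform(0.0, delta, size=M)
hash_vec = lambda v: np.floor((A @ v + w) / delta) % 2

passages = rng.normal(size=(5, L))           # rows of her compact matrix
hashes = [hash_vec(p) for p in passages]     # only these are sent to Bob

# --- Bob's side: receives bit strings, uses normalized Hamming distance ---
def centrality_rank(hs):
    """Rank hashes by average normalized Hamming distance to the rest
    (a stand-in for the support-set-based KP-Centrality ranking)."""
    n = len(hs)
    avg = [np.mean([np.mean(hs[i] != hs[j]) for j in range(n) if j != i])
           for i in range(n)]
    return int(np.argmin(avg))               # most central passage index

best = centrality_rank(hashes)               # Bob returns this to Alice
```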
    </sec>
    <sec id="sec-9">
      <title>6. EXPERIMENTS</title>
      <p>In this section we illustrate the performance of our
privacy-preserving approach to Important Passage Retrieval and
how it compares to its non-private counterpart. We start
by presenting the datasets we used in our experiments, then
we describe the experimental setup, and finally we present
some results.</p>
    </sec>
    <sec id="sec-10">
      <title>6.1 Datasets</title>
      <p>
        In order to evaluate our approach, we performed
experiments on the English version of the Concisus dataset [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] and
the Portuguese Broadcast News (PT BN) dataset [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. The
Concisus dataset is composed of seventy-eight event reports
and respective summaries, distributed across three different
types of events: aviation accidents, earthquakes, and train
accidents. This corpus also contains comparable data in
Spanish. However, since our Automatic Key Phrase
Extraction (AKE) system uses some language-dependent features,
we opted not to use this part of the dataset, either in previous
work [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] or in this one.
      </p>
      <p>The PT BN dataset consists of automatic transcriptions
of eighteen broadcast news stories in European Portuguese,
which are part of a news program. The news stories cover several
generic topics, such as society, politics and sports. For each
news story, there is a human-produced abstract,
used as reference.</p>
    </sec>
    <sec id="sec-11">
      <title>6.2 Setup</title>
      <p>
        We extracted key phrases from both datasets using the
MAUI toolkit [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] expanded with shallow semantic features,
such as the number of named entities, part-of-speech tags and
four n-gram domain model probabilities. This expanded
feature set leads to improvements in the quality of the
key phrases. For the Concisus dataset, we extracted
additional features, such as the detection of rhetorical
devices, which further improved the key phrase extraction
process [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. For the PT BN dataset, we only used the
shallow semantic features, as the remaining features were not
available [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-12">
      <title>6.3 Results</title>
      <p>
        We present some baseline experiments in order to obtain
reference values for our approach. We generated three-passage
summaries for the Concisus dataset, of the kind commonly
found on online news web sites such as Google News. In the
experiments using the PT BN dataset, the summary size was
determined by the size of the reference human summaries,
which consisted of about 10% of the input news story. For
both experiments, we used the Cosine and the Euclidean
distance as retrieval metrics, since the first is the usual metric
for computing textual similarity, while the second is the one
that relates to the Secure Binary Embeddings technique.
All results are presented in terms of ROUGE [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], in
particular ROUGE-1, which is the most widely used evaluation
measure for this scenario. The results we obtained for the
Concisus and the PT BN datasets are presented in Tables 1
and 2, respectively.
      </p>
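      <p>For reference, ROUGE-1 recall is the clipped unigram overlap between a candidate and a reference summary, divided by the reference length (a minimal sketch; the official ROUGE toolkit adds stemming and other options omitted here, and the example sentences are ours):</p>

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Clipped unigram overlap divided by the reference unigram count."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(n, cand[w]) for w, n in ref.items())
    return overlap / sum(ref.values())

score = rouge1_recall("the plane crashed near the airport",
                      "a plane crashed at the airport")
# 4 of the 6 reference unigrams are covered
```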
      <p>
        We considered forty key phrases in our experiments, since
this is the usual choice when news articles are considered [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
As expected, we notice some slight degradation when the
Euclidean distance is considered, but we still achieve better
results than other state-of-the-art methods such as default
centrality [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and LexRank [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Reported results in the
literature include ROUGE-1 = 0.443 and 0.531 using default
Centrality and ROUGE-1 = 0.428 and 0.471 using LexRank
for the Concisus and PT BN datasets, respectively [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. This
means that the forced change of metric due to the intrinsic
properties of SBE does not affect the validity of our approach
in any way.
      </p>
      <p>[Tables 3 and 4, surviving fragment: ROUGE-1 at 5% leakage. Concisus: bpc=4: 0.365, bpc=8: 0.424, bpc=16: 0.384. PT BN: bpc=4: 0.314, bpc=8: 0.327, bpc=16: 0.338.]</p>
      <p>For our privacy-preserving approach, we performed
experiments using different values for the SBE parameters. The
results we obtained in terms of ROUGE for the Concisus
and the PT BN datasets are presented in Tables 3 and 4,
respectively. Leakage refers to the fraction of SBE hashes for
which the normalized Hamming distance d_H is proportional
to the Euclidean distance d_E between the original data
vectors. The amount of leakage is exclusively controlled by Delta.
Bits per coefficient (bpc) is the ratio between the number
of measurements M and the dimensionality of the original
data vectors L, i.e., bpc = M/L. As expected, increasing
the amount of leakage (i.e., increasing Delta) improves
the retrieval results. Surprisingly, changing the value
of bpc does not lead to improved performance. The reason
for these results might be that the KP-Centrality method
uses support sets that consider multiple partial
representations of the documents. Nevertheless, the most significant
result is that for 95% leakage there is an almost negligible
loss of performance. This scenario does not
violate our privacy requirements in any way, since although most
of the distances between hashes are known, there is no way
to use this information to learn anything about the original
underlying information.</p>
    </sec>
    <sec id="sec-13">
      <title>7. CONCLUSIONS AND FUTURE WORK</title>
      <p>In this work, we introduced a privacy-preserving technique
for performing Important Passage Retrieval that performs
similarly to its non-private counterpart. Our Secure
Binary Embeddings based approach provides secure document
representations that allow sensitive information to be
processed by third parties without any risk of
disclosure. Although there is some slight degradation
of results relative to the baseline, our approach still
outperforms other state-of-the-art methods such as default
Centrality and LexRank, with the important advantage that no
private or classified information is disclosed to third parties.</p>
      <p>For future work, we intend to use the secure representation
based on Secure Binary Embeddings in multi-document
important passage retrieval. An additional research line
that we would like to pursue is to apply this privacy-preserving
technique to other Information Retrieval tasks, such as
classified military and medical records retrieval.</p>
    </sec>
    <sec id="sec-14">
      <title>ACKNOWLEDGMENTS</title>
      <p>We would like to thank FCT for supporting this research
through PEst-OE/EEI/LA0021/2013, the Carnegie
Mellon Portugal Program, PTDC/EIA-CCO/122542/2010, and
grants SFRH/BD/33769/2009 and SFRH/BD/71349/2010.
We would also like to thank NSF for supporting this research
through grant 1017256.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bellare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. T.</given-names>
            <surname>Hoang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Keelveedhi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Rogaway</surname>
          </string-name>
          .
          <article-title>Efficient garbling from a fixed-key blockcipher</article-title>
          .
          <source>In IEEE Symposium on SP</source>
          , pages
          <volume>478</volume>
          -
          <fpage>492</fpage>
          . IEEE,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Boufounos</surname>
          </string-name>
          .
          <article-title>Universal rate-efficient scalar quantization</article-title>
          .
          <source>IEEE Transactions on Information Theory</source>
          ,
          <volume>58</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1861</fpage>
          –
          <lpage>1872</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Boufounos</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Rane</surname>
          </string-name>
          .
          <article-title>Secure binary embeddings for privacy preserving nearest neighbors</article-title>
          .
          <source>In WIFS 2011</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>6</lpage>
          . IEEE,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          .
          <article-title>The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries</article-title>
          .
          <source>In SIGIR 1998: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>335</fpage>
          –
          <lpage>336</lpage>
          . ACM,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Erkan</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Radev</surname>
          </string-name>
          .
          <article-title>LexRank: Graph-based Centrality as Salience in Text Summarization</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>22</volume>
          :
          <fpage>457</fpage>
          –
          <lpage>479</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Indyk</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Motwani</surname>
          </string-name>
          .
          <article-title>Approximate nearest neighbors: towards removing the curse of dimensionality</article-title>
          .
          <source>In Proceedings of the thirtieth annual ACM symposium on Theory of computing</source>
          , pages
          <fpage>604</fpage>
          –
          <lpage>613</lpage>
          . ACM,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.</given-names>
            <surname>Jiang</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Samanthula</surname>
          </string-name>
          .
          <article-title>N-gram based secure similar document detection</article-title>
          . In Y. Li, editor,
          <source>Data and Applications Security and Privacy XXV</source>
          , volume
          <volume>6818</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>239</fpage>
          –
          <lpage>246</lpage>
          . Springer Berlin Heidelberg,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Si</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Protecting source privacy in federated search</article-title>
          .
          <source>In SIGIR 2007: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '07</source>
          , pages
          <fpage>761</fpage>
          –
          <lpage>762</lpage>
          , New York, NY, USA,
          <year>2007</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Landauer</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          .
          <article-title>A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction and Representation of Knowledge</article-title>
          .
          <source>Psych. Review</source>
          ,
          <volume>104</volume>
          (
          <issue>2</issue>
          ):
          <fpage>211</fpage>
          –
          <lpage>240</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>ROUGE: A Package for Automatic Evaluation of Summaries</article-title>
          . In
          <string-name>
            <given-names>M.-F.</given-names>
            <surname>Moens</surname>
          </string-name>
          and S. Szpakowicz, editors,
          <source>Text Summ. Branches Out: Proc. of the ACL-04 Workshop</source>
          , pages
          <fpage>74</fpage>
          –
          <lpage>81</lpage>
          . Association for Computational Linguistics,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Litvak</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Last</surname>
          </string-name>
          .
          <article-title>Graph-Based Keyword Extraction for Single-Document Summarization</article-title>
          .
          <source>In Coling 2008: MMIES</source>
          , pages
          <fpage>17</fpage>
          –
          <lpage>24</lpage>
          . Coling 2008 Organizing Committee,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Varna</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Confidentiality-preserving image search: A comparative study between homomorphic encryption and distance-preserving randomization</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>2</volume>
          :
          <fpage>125</fpage>
          –
          <lpage>141</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Marujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gershman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Frederking</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Neto</surname>
          </string-name>
          .
          <article-title>Supervised topical key phrase extraction of news stories using crowdsourcing, light filtering and co-reference normalization</article-title>
          .
          <source>In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Marujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Viveiros</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Neto</surname>
          </string-name>
          .
          <article-title>Keyphrase Cloud Generation of Broadcast News</article-title>
          .
          <source>In Interspeech 2011</source>
          . ISCA,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Maskey</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Hirschberg</surname>
          </string-name>
          .
          <article-title>Comparing Lexical, Acoustic/Prosodic, Structural and Discourse Features for Speech Summarization</article-title>
          .
          <source>In Proceedings of the 9th EUROSPEECH - INTERSPEECH 2005</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>O.</given-names>
            <surname>Medelyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Perrone</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <article-title>Subject metadata support powered by Maui</article-title>
          .
          <source>In Proceedings of the JCDL '10</source>
          , page
          <fpage>407</fpage>
          , New York, USA,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Murugesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Clifton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Si</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Vaidya</surname>
          </string-name>
          .
          <article-title>Efficient privacy-preserving similar document detection</article-title>
          .
          <source>The VLDB Journal</source>
          ,
          <volume>19</volume>
          (
          <issue>4</issue>
          ):
          <fpage>457</fpage>
          –
          <lpage>475</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Paillier</surname>
          </string-name>
          .
          <article-title>Public-key cryptosystems based on composite degree residuosity classes</article-title>
          .
          <source>In EUROCRYPT'99</source>
          , pages
          <fpage>223</fpage>
          –
          <lpage>238</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xiao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Shen</surname>
          </string-name>
          .
          <article-title>Obfuscating the topical intention in enterprise text search</article-title>
          .
          <source>In 2012 IEEE 28th International Conference on Data Engineering (ICDE)</source>
          , pages
          <fpage>1168</fpage>
          –
          <lpage>1179</lpage>
          , April
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>G.</given-names>
            <surname>Penn</surname>
          </string-name>
          and
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <article-title>A Critical Reassessment of Evaluation Baselines for Speech Summarization</article-title>
          .
          <source>In Proc. of ACL-08: HLT</source>
          , pages
          <fpage>470</fpage>
          –
          <lpage>478</lpage>
          . ACL,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Portelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Raj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Boufounos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Trancoso</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Abad</surname>
          </string-name>
          .
          <article-title>Speaker verification using secure binary embeddings</article-title>
          .
          <source>In EUSIPCO</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. M.</given-names>
            <surname>de Matos</surname>
          </string-name>
          .
          <article-title>Revisiting Centrality-as-Relevance: Support Sets and Similarity as Geometric Proximity</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>42</volume>
          :
          <fpage>275</fpage>
          –
          <lpage>308</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. M.</given-names>
            <surname>de Matos</surname>
          </string-name>
          .
          <source>Multi-source, Multilingual Information Extraction and Summarization</source>
          , chapter
          <article-title>Improving Speech-to-Text Summarization by Using Additional Information Sources</article-title>
          .
          <source>Theory and Applications of NLP</source>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Marujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Martins de Matos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Neto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gershman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          .
          <article-title>Self reinforcement for important passage retrieval</article-title>
          .
          <source>In SIGIR 2013: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '13</source>
          , pages
          <fpage>845</fpage>
          –
          <lpage>848</lpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>K.</given-names>
            <surname>Riedhammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Favre</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Hakkani-Tür</surname>
          </string-name>
          .
          <article-title>Long story short – Global unsupervised models for keyphrase based meeting summarization</article-title>
          .
          <source>Speech Communication</source>
          ,
          <volume>52</volume>
          :
          <fpage>801</fpage>
          –
          <lpage>815</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>H.</given-names>
            <surname>Saggion</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Szasz</surname>
          </string-name>
          .
          <article-title>The concisus corpus of event summaries</article-title>
          .
          <source>In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sipos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Swaminathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shivaswamy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <article-title>Temporal corpus summarization using submodular word coverage</article-title>
          .
          <source>In Proc. of CIKM</source>
          , pages
          <fpage>754</fpage>
          –
          <lpage>763</lpage>
          , New York, NY, USA,
          <year>2012</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>R. I.</given-names>
            <surname>Tucker</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Sparck Jones</surname>
          </string-name>
          .
          <article-title>Between shallow and deep: an experiment in automatic summarising</article-title>
          .
          <source>Technical Report 632</source>
          , University of Cambridge,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>V. R.</given-names>
            <surname>Uzêda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A. S.</given-names>
            <surname>Pardo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>das Graças Volpe Nunes</surname>
          </string-name>
          .
          <article-title>A comprehensive comparative evaluation of RST-based summarization methods</article-title>
          .
          <source>ACM Trans. on Speech and Language Processing</source>
          ,
          <volume>6</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>20</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>L.</given-names>
            <surname>Vanderwende</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Suzuki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Brockett</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Nenkova</surname>
          </string-name>
          .
          <article-title>Beyond SumBasic: Task-focused summarization and lexical expansion</article-title>
          .
          <source>Information Processing and Management</source>
          ,
          <volume>43</volume>
          :
          <fpage>1606</fpage>
          –
          <lpage>1618</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Xiao</surname>
          </string-name>
          .
          <article-title>Towards an iterative reinforcement approach for simultaneous document summarization and keyword extraction</article-title>
          .
          <source>In Proceedings of the 45th ACL</source>
          , pages
          <fpage>552</fpage>
          –
          <lpage>559</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>A. C.-C.</given-names>
            <surname>Yao</surname>
          </string-name>
          .
          <article-title>Protocols for secure computations</article-title>
          .
          <source>In Foundations of Computer Science (FOCS)</source>
          , volume
          <volume>82</volume>
          , pages
          <fpage>160</fpage>
          –
          <lpage>164</lpage>
          ,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zechner</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Waibel</surname>
          </string-name>
          .
          <article-title>Minimizing Word Error Rate in Textual Summaries of Spoken Language</article-title>
          .
          <source>In Proceedings of the North American Chapter of the ACL (NAACL)</source>
          , pages
          <fpage>186</fpage>
          –
          <lpage>193</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zha</surname>
          </string-name>
          .
          <article-title>Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering</article-title>
          .
          <source>In SIGIR 2002: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>113</fpage>
          –
          <lpage>120</lpage>
          . ACM,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H. Y.</given-names>
            <surname>Chan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          .
          <article-title>Extractive Speech Summarization Using Shallow Rhetorical Structure Modeling</article-title>
          .
          <source>IEEE Transactions on Audio, Speech, and Language Processing (TASLP)</source>
          ,
          <volume>18</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1147</fpage>
          –
          <lpage>1157</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>