<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ESSTER at the EYRE 2020 Entity Summarization Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qingxia Liu</string-name>
          <email>qxliu2013@smail.nju.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gong Cheng</string-name>
          <email>gcheng@nju.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuzhong Qu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Galway, Ireland</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Commons License Attribution 4.0 International</institution>
          ,
          <addr-line>CC BY 4.0</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>State Key Laboratory for Novel Software Technology, Nanjing University</institution>
          ,
          <addr-line>Nanjing 210023</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Entity summaries provide human users with the key information about an entity. In this system paper, we present the implementation of our entity summarizer ESSTER. It aims at generating entity summaries that contain structurally important triples and exhibit high readability and low redundancy. For structural importance, we exploit the global and local characteristics of properties and values in RDF data. For readability, we learn the familiarity of properties from a text corpus. To reduce redundancy, we perform logical reasoning and compute textual and numerical similarity between triples. ESSTER solves a combinatorial optimization problem to integrate these features. It achieves state-of-the-art results on the ESBM v1.2 dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Entity summarization</kwd>
        <kwd>readability</kwd>
        <kwd>redundancy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
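      <p>In RDF data, an entity is described by a possibly large set (e.g., hundreds) of RDF triples. The entity summarization task is to automatically generate a compact summary that provides human users with the key information about an entity. Specifically, an entity summary is a size-constrained subset of triples selected from an entity description. Current methods [<xref ref-type="bibr" rid="ref1">1</xref>, 2, 3, 4, 5, 6] mainly focus on selecting important triples but ignore the reading experience of human users. In this system paper, we present the implementation of our entity summarizer named ESSTER [7].<xref ref-type="fn" rid="fn1">1</xref> It aims at generating entity summaries of structural importance, high readability, and low redundancy. Improving textual readability and reducing information redundancy help to enhance the reading experience of users. Experiments on the ESBM v1.2 dataset [8] show that ESSTER achieves state-of-the-art results.</p>
      <fn-group>
        <fn id="fn1">
          <label>1</label>
          <p>https://github.com/nju-websoft/ESSTER</p>
        </fn>
        <fn id="fn2">
          <p>ORCID: 0000-0001-6706-3776 (Q. Liu); 0000-0003-3539-7776 (G. Cheng); 0000-0003-2777-8149 (Y. Qu)</p>
        </fn>
        <fn id="fn3">
          <p>© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).</p>
        </fn>
      </fn-group>
    </sec>
    <sec id="sec-2">
      <title>2. Definition</title>
      <p>RDF data is a set of subject-predicate-object triples <inline-formula><tex-math>T</tex-math></inline-formula>. For an entity <inline-formula><tex-math>e</tex-math></inline-formula>, its description <inline-formula><tex-math>\mathrm{desc}(e)</tex-math></inline-formula> is the subset of triples in <inline-formula><tex-math>T</tex-math></inline-formula> such that <inline-formula><tex-math>e</tex-math></inline-formula> is the subject or the object. Each triple <inline-formula><tex-math>t \in \mathrm{desc}(e)</tex-math></inline-formula> provides a property-value pair <inline-formula><tex-math>\langle p, v \rangle</tex-math></inline-formula> for <inline-formula><tex-math>e</tex-math></inline-formula>. When <inline-formula><tex-math>e</tex-math></inline-formula> is the subject of <inline-formula><tex-math>t</tex-math></inline-formula>, the property <inline-formula><tex-math>p</tex-math></inline-formula> is <inline-formula><tex-math>t</tex-math></inline-formula>’s predicate and the value <inline-formula><tex-math>v</tex-math></inline-formula> is <inline-formula><tex-math>t</tex-math></inline-formula>’s object. When <inline-formula><tex-math>e</tex-math></inline-formula> is the object of <inline-formula><tex-math>t</tex-math></inline-formula>, the property <inline-formula><tex-math>p</tex-math></inline-formula> is the inverse of <inline-formula><tex-math>t</tex-math></inline-formula>’s predicate and the value <inline-formula><tex-math>v</tex-math></inline-formula> is <inline-formula><tex-math>t</tex-math></inline-formula>’s subject. For convenience, we define <inline-formula><tex-math>\mathrm{prop}(t) = p</tex-math></inline-formula> and <inline-formula><tex-math>\mathrm{val}(t) = v</tex-math></inline-formula>.</p>
      <p>Given an integer size constraint <inline-formula><tex-math>k</tex-math></inline-formula>, an entity summary <inline-formula><tex-math>S</tex-math></inline-formula> for <inline-formula><tex-math>e</tex-math></inline-formula> is a subset of <inline-formula><tex-math>\mathrm{desc}(e)</tex-math></inline-formula> satisfying <inline-formula><tex-math>|S| \le k</tex-math></inline-formula>.</p>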
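      <p>These definitions can be illustrated with a minimal Python sketch. It is not taken from the ESSTER implementation: the <monospace>Triple</monospace> type, the <monospace>ex:</monospace> identifiers, the helper names, and the <monospace>^</monospace> prefix used to mark an inverse property are all assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def desc(entity, triples):
    """Triples in which the entity is the subject or the object."""
    return [t for t in triples if entity in (t.subject, t.obj)]


def prop_val(entity, t):
    """Property-value pair that triple t provides for the entity.

    The "^" prefix for an inverse property is a notational assumption
    of this sketch only.
    """
    if t.subject == entity:
        return t.predicate, t.obj
    return "^" + t.predicate, t.subject


def is_valid_summary(summary, description, k):
    """A summary is a subset of the description with at most k triples."""
    return len(summary) <= k and set(summary) <= set(description)


# Hypothetical toy data, not from ESBM.
T = [
    Triple("ex:Nanjing", "ex:country", "ex:China"),
    Triple("ex:NJU", "ex:locatedIn", "ex:Nanjing"),
    Triple("ex:Galway", "ex:country", "ex:Ireland"),
]
d = desc("ex:Nanjing", T)
pairs = [prop_val("ex:Nanjing", t) for t in d]
```
]]></preformat>
      <p>In this toy example the description of <monospace>ex:Nanjing</monospace> contains two triples, and the second one contributes an inverse property because the entity appears as its object.</p>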
    </sec>
    <sec id="sec-3">
      <title>3. Implementation of ESSTER</title>
      <p>ESSTER considers structural importance, readability,
and redundancy. Below we present their computation
and finally integrate them by solving a combinatorial
optimization problem.</p>
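      <p>As a rough illustration of the final integration step, the Python sketch below assumes that every triple already has a structural importance score and a readability score, and that pairwise similarities between triples are available. It builds the pairwise profits described in Section 3.4 and finds the best summary by exhaustive enumeration. This is feasible only for tiny inputs and is not the heuristic algorithm [10] used by ESSTER. The function names, the parameter name <monospace>beta</monospace>, and all numbers are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
from itertools import combinations


def profit(i, j, struct, text, sim, beta):
    """Profit of jointly choosing triples i and j, with j >= i."""
    if i == j:
        # Reward for the triple itself: importance plus readability.
        return (1 - beta) * (struct[i] + text[i])
    # Penalty for choosing two similar (redundant) triples.
    return beta * (-sim[i][j])


def objective(chosen, struct, text, sim, beta):
    """Sum of profits over all index pairs i <= j of the chosen triples."""
    idx = sorted(chosen)
    return sum(
        profit(i, j, struct, text, sim, beta)
        for a, i in enumerate(idx)
        for j in idx[a:]
    )


def best_summary(struct, text, sim, k, beta):
    """Exhaustively search all subsets containing at most k triples."""
    n = len(struct)
    candidates = (
        subset
        for size in range(k + 1)
        for subset in combinations(range(n), size)
    )
    return max(candidates, key=lambda s: objective(s, struct, text, sim, beta))


# Hypothetical scores for three triples; triples 0 and 1 are near-duplicates.
struct_scores = [0.9, 0.8, 0.5]
text_scores = [0.5, 0.5, 0.4]
similarity = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

balanced = best_summary(struct_scores, text_scores, similarity, k=2, beta=0.5)
no_penalty = best_summary(struct_scores, text_scores, similarity, k=2, beta=0.0)
```
]]></preformat>
      <p>With <monospace>beta = 0.5</monospace> the sketch selects triples 0 and 2, avoiding the redundant pair. With <monospace>beta = 0</monospace> redundancy is ignored and the two highest-scoring triples, 0 and 1, are selected.</p>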
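      <sec id="sec-3-1">
        <title>3.1. Structural Importance</title>
        <p>We measure the structural importance of a triple <inline-formula><tex-math>t</tex-math></inline-formula> from two perspectives.</p>
        <p>First, globally popular properties often reflect important aspects of entities, while globally unpopular values are informative. Therefore, we compute the global importance of a triple as follows:
          <disp-formula id="eq-1">
            <label>(1)</label>
            <tex-math><![CDATA[\begin{aligned}
\mathrm{glb}(t) &= \mathrm{ppop}_{\mathrm{global}}(t) \cdot (1 - \mathrm{vpop}(t)) ,\\
\mathrm{ppop}_{\mathrm{global}}(t) &= \frac{\log(\mathrm{pfreq}_{\mathrm{global}}(t) + 1)}{\log(|E| + 1)} ,\\
\mathrm{vpop}(t) &= \frac{\log(\mathrm{vfreq}(t) + 1)}{\log(|T| + 1)} ,
\end{aligned}]]></tex-math>
          </disp-formula>
          where <inline-formula><tex-math>E</tex-math></inline-formula> is the set of all entities described in RDF data <inline-formula><tex-math>T</tex-math></inline-formula>, <inline-formula><tex-math>\mathrm{pfreq}_{\mathrm{global}}(t)</tex-math></inline-formula> is the number of entity descriptions in <inline-formula><tex-math>T</tex-math></inline-formula> where <inline-formula><tex-math>\mathrm{prop}(t)</tex-math></inline-formula> appears, and <inline-formula><tex-math>\mathrm{vfreq}(t)</tex-math></inline-formula> is the number of triples in <inline-formula><tex-math>T</tex-math></inline-formula> where <inline-formula><tex-math>\mathrm{val}(t)</tex-math></inline-formula> is the value.</p>
        <p>Second, multi-valued properties are intrinsically popular compared with single-valued properties. To compensate for this, we penalize multi-valued properties by using local popularity. We compute the local importance of a triple as follows:
          <disp-formula id="eq-2">
            <label>(2)</label>
            <tex-math><![CDATA[\begin{aligned}
\mathrm{loc}(t) &= (1 - \mathrm{ppop}_{\mathrm{local}}(t)) \cdot \mathrm{vpop}(t) ,\\
\mathrm{ppop}_{\mathrm{local}}(t) &= \frac{\log(\mathrm{pfreq}_{\mathrm{local}}(t) + 1)}{\log(|\mathrm{desc}(e)| + 1)} ,
\end{aligned}]]></tex-math>
          </disp-formula>
          where <inline-formula><tex-math>\mathrm{pfreq}_{\mathrm{local}}(t)</tex-math></inline-formula> is the number of triples in <inline-formula><tex-math>\mathrm{desc}(e)</tex-math></inline-formula> where <inline-formula><tex-math>\mathrm{prop}(t)</tex-math></inline-formula> is the property.</p>
        <p>Finally, we compute structural importance:
          <disp-formula id="eq-3">
            <label>(3)</label>
            <tex-math>\mathrm{struct}(t) = \alpha \cdot \mathrm{glb}(t) + (1 - \alpha) \cdot \mathrm{loc}(t) ,</tex-math>
          </disp-formula>
          where <inline-formula><tex-math>\alpha \in [0, 1]</tex-math></inline-formula> is a parameter to tune.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Textual Readability</title>
        <p>To generate readable summaries, we measure the familiarity of a triple <inline-formula><tex-math>t</tex-math></inline-formula> based on its property <inline-formula><tex-math>\mathrm{prop}(t)</tex-math></inline-formula>. A property is familiar to users if it is often used in an open-domain corpus. Specifically, given a text corpus of <inline-formula><tex-math>B</tex-math></inline-formula> documents where <inline-formula><tex-math>b</tex-math></inline-formula> documents have been read by the user, let <inline-formula><tex-math>M(t)</tex-math></inline-formula> be the number of documents where the name of <inline-formula><tex-math>\mathrm{prop}(t)</tex-math></inline-formula> appears. We compute
          <disp-formula id="eq-4">
            <label>(4)</label>
            <tex-math><![CDATA[\begin{aligned}
Q(t) &= \sum_{m=0}^{\min(M(t),\, b)} \frac{\binom{M(t)}{m} \cdot \binom{B - M(t)}{b - m}}{\binom{B}{b}} \cdot \mathrm{familiarity}(m) ,\\
\mathrm{familiarity}(m) &= \frac{\log(m + 1)}{\log(b + 1)} .
\end{aligned}]]></tex-math>
          </disp-formula>
          Here, <inline-formula><tex-math>m</tex-math></inline-formula> represents the number of documents the user has read where the name of <inline-formula><tex-math>\mathrm{prop}(t)</tex-math></inline-formula> appears, based on which <inline-formula><tex-math>\mathrm{familiarity}(m)</tex-math></inline-formula> gives the degree of familiarity of <inline-formula><tex-math>\mathrm{prop}(t)</tex-math></inline-formula> to the user. However, it is difficult to know <inline-formula><tex-math>m</tex-math></inline-formula> in practice, so <inline-formula><tex-math>Q(t)</tex-math></inline-formula> computes the expected value of <inline-formula><tex-math>\mathrm{familiarity}(m)</tex-math></inline-formula>. For simplicity, we assume <inline-formula><tex-math>b</tex-math></inline-formula> is a constant. In the experiments we set <inline-formula><tex-math>b = 40</tex-math></inline-formula> and we use the Google Books Ngram as our corpus.</p>
        <p>Finally, we compute textual readability:
          <disp-formula id="eq-5">
            <label>(5)</label>
            <tex-math>\mathrm{text}(t) = \log(Q(t) + 1) .</tex-math>
          </disp-formula>
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Information Redundancy</title>
        <p>To reduce redundancy in summaries, we measure the similarity between two triples <inline-formula><tex-math>t_i, t_j</tex-math></inline-formula> in various ways.</p>
        <p>First, we perform logical reasoning to measure ontological similarity. We define <inline-formula><tex-math>\mathrm{sim}(t_i, t_j) = 1</tex-math></inline-formula> if <inline-formula><tex-math>\mathrm{prop}(t_i)</tex-math></inline-formula> and <inline-formula><tex-math>\mathrm{prop}(t_j)</tex-math></inline-formula> are rdf:type, and rdfs:subClassOf is a relation between <inline-formula><tex-math>\mathrm{val}(t_i)</tex-math></inline-formula> and <inline-formula><tex-math>\mathrm{val}(t_j)</tex-math></inline-formula>; or if <inline-formula><tex-math>\mathrm{val}(t_i)</tex-math></inline-formula> and <inline-formula><tex-math>\mathrm{val}(t_j)</tex-math></inline-formula> are equal, and rdfs:subPropertyOf is a relation between <inline-formula><tex-math>\mathrm{prop}(t_i)</tex-math></inline-formula> and <inline-formula><tex-math>\mathrm{prop}(t_j)</tex-math></inline-formula>.</p>
        <p>Otherwise, we rely on the similarity between properties and the similarity between values:
          <disp-formula id="eq-6">
            <label>(6)</label>
            <tex-math>\mathrm{sim}(t_i, t_j) = \max\{\mathrm{sim}_p(t_i, t_j),\ \mathrm{sim}_v(t_i, t_j),\ 0\} ,</tex-math>
          </disp-formula>
          where for <inline-formula><tex-math>\mathrm{sim}_p</tex-math></inline-formula> we use the ISub string similarity [9]. For <inline-formula><tex-math>\mathrm{sim}_v</tex-math></inline-formula>, we differentiate between two cases.</p>
        <p>In the first case, <inline-formula><tex-math>\mathrm{val}(t_i)</tex-math></inline-formula> and <inline-formula><tex-math>\mathrm{val}(t_j)</tex-math></inline-formula> are both numerical values. We compute
          <disp-formula id="eq-7">
            <label>(7)</label>
            <tex-math><![CDATA[\mathrm{sim}_v(t_i, t_j) =
\begin{cases}
-1 & \mathrm{val}(t_i) \cdot \mathrm{val}(t_j) \le 0 ,\\
\dfrac{\min\{\mathrm{val}(t_i), \mathrm{val}(t_j)\}}{\max\{\mathrm{val}(t_i), \mathrm{val}(t_j)\}} & \text{otherwise} .
\end{cases}]]></tex-math>
          </disp-formula>
        </p>
        <p>In all other cases, we simply use ISub for <inline-formula><tex-math>\mathrm{sim}_v</tex-math></inline-formula>.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Combinatorial Optimization</title>
        <p>We formulate entity summarization as a 0-1 quadratic knapsack problem (QKP), and we solve it using a heuristic algorithm [10].</p>
        <p>Specifically, we define the profit of choosing two triples <inline-formula><tex-math>t_i, t_j</tex-math></inline-formula> for a summary:
          <disp-formula id="eq-8">
            <label>(8)</label>
            <tex-math><![CDATA[\mathrm{profit}_{i,j} =
\begin{cases}
(1 - \beta) \cdot (\mathrm{struct}(t_i) + \mathrm{text}(t_i)) & i = j ,\\
\beta \cdot (-\mathrm{sim}(t_i, t_j)) & i \neq j ,
\end{cases}]]></tex-math>
          </disp-formula>
          where <inline-formula><tex-math>\beta \in [0, 1]</tex-math></inline-formula> is a parameter to tune.</p>
        <p>Finally, our goal is to
          <disp-formula id="eq-9">
            <label>(9)</label>
            <tex-math><![CDATA[\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{|\mathrm{desc}(e)|} \sum_{j=i}^{|\mathrm{desc}(e)|} \mathrm{profit}_{i,j} \cdot x_i \cdot x_j ,\\
\text{subject to} \quad & \sum_{i=1}^{|\mathrm{desc}(e)|} x_i \le k ,\\
& x_i \in \{0, 1\} \text{ for all } i = 1 \ldots |\mathrm{desc}(e)| ,
\end{aligned}]]></tex-math>
          </disp-formula>
          where <inline-formula><tex-math>x_i</tex-math></inline-formula> indicates whether triple <inline-formula><tex-math>t_i</tex-math></inline-formula> is selected for the summary.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Settings</title>
        <p>We use the ESBM v1.2 dataset [8]. It provides ground-truth summaries under <inline-formula><tex-math>k = 5</tex-math></inline-formula> and <inline-formula><tex-math>k = 10</tex-math></inline-formula> for entities in DBpedia and LinkedMDB. We follow the provided training-development-test splits for 5-fold cross validation, and we use the training and development sets for tuning our parameters <inline-formula><tex-math>\alpha</tex-math></inline-formula> and <inline-formula><tex-math>\beta</tex-math></inline-formula> by grid search in the range of 0–1 with 0.01 increments. We use F1 score as the evaluation metric.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p><xref ref-type="table" rid="tab-1">Table 1</xref> presents the evaluation results. We compare with known results of existing unsupervised entity summarizers [8]. On DBpedia under <inline-formula><tex-math>k = 5</tex-math></inline-formula>, BAFREC [6] achieves the highest F1 score, and is closely followed by ESSTER. In all the other three settings, ESSTER outperforms all the baselines. Overall, ESSTER achieves state-of-the-art results on ESBM v1.2.</p>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption>
            <p>F1 Scores</p>
          </caption>
          <table>
            <thead>
              <tr><th rowspan="2">Method</th><th colspan="2">DBpedia</th><th colspan="2">LinkedMDB</th></tr>
              <tr><th><inline-formula><tex-math>k = 5</tex-math></inline-formula></th><th><inline-formula><tex-math>k = 10</tex-math></inline-formula></th><th><inline-formula><tex-math>k = 5</tex-math></inline-formula></th><th><inline-formula><tex-math>k = 10</tex-math></inline-formula></th></tr>
            </thead>
            <tbody>
              <tr><td>RELIN</td><td>0.242</td><td>0.455</td><td>0.203</td><td>0.258</td></tr>
              <tr><td>DIVERSUM</td><td>0.249</td><td>0.507</td><td>0.207</td><td>0.358</td></tr>
              <tr><td>FACES</td><td>0.270</td><td>0.428</td><td>0.169</td><td>0.263</td></tr>
              <tr><td>FACES-E</td><td>0.280</td><td>0.488</td><td>0.313</td><td>0.393</td></tr>
              <tr><td>CD</td><td>0.283</td><td>0.513</td><td>0.217</td><td>0.331</td></tr>
              <tr><td>LinkSUM</td><td>0.287</td><td>0.486</td><td>0.140</td><td>0.279</td></tr>
              <tr><td>BAFREC</td><td>0.335</td><td>0.503</td><td>0.360</td><td>0.402</td></tr>
              <tr><td>KAFCA</td><td>0.314</td><td>0.509</td><td>0.244</td><td>0.397</td></tr>
              <tr><td>MPSUM</td><td>0.314</td><td>0.512</td><td>0.272</td><td>0.423</td></tr>
              <tr><td>ESSTER</td><td>0.324</td><td>0.521</td><td>0.365</td><td>0.452</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this system paper, we presented the implementation of our entity summarizer ESSTER. By integrating structural importance, textual readability, and information redundancy via combinatorial optimization, ESSTER achieves state-of-the-art results among unsupervised entity summarizers on the ESBM v1.2 dataset. However, the results are not comparable with supervised neural entity summarizers [11, 12].</p>
      <p>In future work, we will consider more powerful measures of readability and redundancy, and will incorporate these features into a neural network model.</p>
    </sec>
  </body>
  <back>
    <ack>
      <title>Acknowledgments</title>
      <p>This work was supported by the National Key R&amp;D Program of China (2018YFB1004300) and by the NSFC (61772264).</p>
    </ack>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Q. Liu, G. Cheng, K. Gunaratna, Y. Qu, Entity summarization: State of the art and future challenges, CoRR abs/1910.08252 (2019).</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] G. Cheng, T. Tran, Y. Qu, RELIN: relatedness and informativeness-based centrality for entity summarization, in: ISWC’11, Part I, 2011, pp. 114–129. doi:10.1007/978-3-642-25073-6_8.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] K. Gunaratna, K. Thirunarayan, A. P. Sheth, FACES: diversity-aware entity summarization using incremental hierarchical conceptual clustering, in: AAAI’15, 2015, pp. 116–122.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] K. Gunaratna, K. Thirunarayan, A. P. Sheth, G. Cheng, Gleaning types for literals in RDF triples with application to entity summarization, in: ESWC’16, 2016, pp. 85–100. doi:10.1007/978-3-319-34129-3_6.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] A. Thalhammer, N. Lasierra, A. Rettinger, LinkSUM: Using link analysis to summarize entity data, in: ICWE’16, 2016, pp. 244–261. doi:10.1007/978-3-319-38791-8_14.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] H. Kroll, D. Nagel, W.-T. Balke, BAFREC: Balancing frequency and rarity for entity characterization in linked open data, in: EYRE’18, 2018.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Q. Liu, G. Cheng, Y. Qu, Entity summarization with high readability and low redundancy, Sci. Sin. Inform. 50 (2020) 845–861. doi:10.1360/SSI-2019-0291.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Q. Liu, G. Cheng, K. Gunaratna, Y. Qu, ESBM: an entity summarization benchmark, in: ESWC’20, 2020, pp. 548–564. doi:10.1007/978-3-030-49461-2_32.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] G. Stoilos, G. B. Stamou, S. D. Kollias, A string metric for ontology alignment, in: ISWC’05, 2005, pp. 624–637. doi:10.1007/11574620_45.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Z. Yang, G. Wang, F. Chu, An effective GRASP and tabu search for the 0-1 quadratic knapsack problem, Comput. Oper. Res. 40 (2013) 1176–1185. doi:10.1016/j.cor.2012.11.023.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Q. Liu, G. Cheng, Y. Qu, DeepLENS: Deep learning for entity summarization, in: DL4KG’20, 2020.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] J. Li, G. Cheng, Q. Liu, W. Zhang, E. Kharlamov, K. Gunaratna, H. Chen, Neural entity summarization with joint encoding and weak supervision, in: IJCAI’20, 2020, pp. 1644–1650. doi:10.24963/ijcai.2020/228.</mixed-citation>
      </ref>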
    </ref-list>
  </back>
</article>