=Paper= {{Paper |id=Vol-2699/paper14 |storemode=property |title=ESSTER at the EYRE 2020 Entity Summarization Task |pdfUrl=https://ceur-ws.org/Vol-2699/paper14.pdf |volume=Vol-2699 |authors=Qingxia Liu,Gong Cheng,Yuzhong Qu |dblpUrl=https://dblp.org/rec/conf/cikm/LiuCQ20 }} ==ESSTER at the EYRE 2020 Entity Summarization Task== https://ceur-ws.org/Vol-2699/paper14.pdf
ESSTER at the EYRE 2020 Entity Summarization Task
Qingxia Liu, Gong Cheng and Yuzhong Qu
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China



Abstract
Entity summaries provide human users with the key information about an entity. In this system paper, we present the implementation of our entity summarizer ESSTER. It aims at generating entity summaries that contain structurally important triples and exhibit high readability and low redundancy. For structural importance, we exploit the global and local characteristics of properties and values in RDF data. For readability, we learn the familiarity of properties from a text corpus. To reduce redundancy, we perform logical reasoning and compute textual and numerical similarity between triples. ESSTER solves a combinatorial optimization problem to integrate these features. It achieves state-of-the-art results on the ESBM v1.2 dataset.

Keywords
Entity summarization, readability, redundancy


1. Introduction

In RDF data, an entity is described by a possibly large set (e.g., hundreds) of RDF triples. The entity summarization task is to automatically generate a compact summary to provide human users with the key information about an entity. Specifically, an entity summary is a size-constrained subset of triples selected from an entity description. Current methods [1, 2, 3, 4, 5, 6] are mainly focused on selecting important triples, but ignore the reading experience of human users. In this system paper, we present the implementation of our entity summarizer named ESSTER [7].¹ It aims at generating entity summaries of structural importance, high readability, and low redundancy. Improving textual readability and reducing information redundancy help to enhance the reading experience of users. Experiments on the ESBM v1.2 dataset [8] show that ESSTER achieves state-of-the-art results.

2. Task Definition

RDF data is a set of subject-predicate-object triples T. For an entity e, its description desc(e) is the subset of triples in T such that e is the subject or the object. Each triple t ∈ desc(e) provides a property-value pair ⟨p, v⟩ for e. When e is the subject of t, the property p is t's predicate and the value v is t's object. When e is the object of t, the property p is the inverse of t's predicate and the value v is t's subject. For convenience, we define prop(t) = p and val(t) = v. Given an integer size constraint k, an entity summary S for e is a subset of desc(e) satisfying |S| ≤ k.

3. Implementation of ESSTER

ESSTER considers structural importance, readability, and redundancy. Below we present their computation and finally integrate them by solving a combinatorial optimization problem.

3.1. Structural Importance

We measure the structural importance of a triple t from two perspectives.

First, globally popular properties often reflect important aspects of entities, while globally unpopular values are informative. Therefore, we compute the global importance of a triple as follows:

    glb(t) = ppop_global(t) · (1 − vpop(t)) ,
    ppop_global(t) = log(pfreq_global(t) + 1) / log(|E| + 1) ,        (1)
    vpop(t) = log(vfreq(t) + 1) / log(|T| + 1) ,

where E is the set of all entities described in RDF data T, pfreq_global(t) is the number of entity descriptions in T where prop(t) appears, and vfreq(t) is the number of triples in T where val(t) is the value.

Second, multi-valued properties are intrinsically popular compared with single-valued properties. To compensate for this, we penalize multi-valued properties by using local popularity.

¹ https://github.com/nju-websoft/ESSTER

Proceedings of the CIKM 2020 Workshops, October 19-20, 2020, Galway, Ireland
email: qxliu2013@smail.nju.edu.cn (Q. Liu); gcheng@nju.edu.cn (G. Cheng); yzqu@nju.edu.cn (Y. Qu)
url: http://ws.nju.edu.cn/~gcheng (G. Cheng); http://ws.nju.edu.cn/~yzqu (Y. Qu)
orcid: 0000-0001-6706-3776 (Q. Liu); 0000-0003-3539-7776 (G. Cheng); 0000-0003-2777-8149 (Y. Qu)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
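To make Eq. (1) concrete, here is a minimal Python sketch of the global importance score. The data structures (a dict of entity descriptions as (property, value) pairs and a list of (s, p, o) triples) are illustrative assumptions, not ESSTER's actual implementation:

```python
import math

def global_importance(t, descriptions, triples):
    """Sketch of Eq. (1): glb(t) = ppop_global(t) * (1 - vpop(t)).

    Illustrative encodings (not ESSTER's actual data structures):
      t            -- a (property, value) pair, i.e. (prop(t), val(t))
      descriptions -- dict mapping each entity e to desc(e), a set of
                      (property, value) pairs
      triples      -- the full RDF data T, a list of (s, p, o) tuples
    """
    prop, val = t
    num_entities = len(descriptions)  # |E|: entities described in T
    # pfreq_global(t): entity descriptions in which prop(t) appears
    pfreq = sum(1 for desc in descriptions.values()
                if any(p == prop for p, _ in desc))
    # vfreq(t): triples in T where val(t) is the value; val(t) may be
    # the subject or the object of the underlying triple
    vfreq = sum(1 for s, _, o in triples if val in (s, o))
    ppop = math.log(pfreq + 1) / math.log(num_entities + 1)
    vpop = math.log(vfreq + 1) / math.log(len(triples) + 1)
    return ppop * (1 - vpop)
```

On a toy graph, a triple with a rare value (e.g., a person's name) scores higher than one with a ubiquitous value shared by many entities, matching the intuition that globally unpopular values are informative.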
We compute the local importance of a triple as follows:

    loc(t) = (1 − ppop_local(t)) · vpop(t) ,
    ppop_local(t) = log(pfreq_local(t) + 1) / log(|desc(e)| + 1) ,        (2)

where pfreq_local(t) is the number of triples in desc(e) where prop(t) is the property.

Finally, we compute structural importance:

    W_struct(t) = α · glb(t) + (1 − α) · loc(t) ,        (3)

where α ∈ [0, 1] is a parameter to tune.

3.2. Textual Readability

To generate readable summaries, we measure the familiarity of a triple t based on its property prop(t). A property is familiar to users if it is often used in an open-domain corpus. Specifically, given a text corpus of B documents where M documents have been read by the user, let b(t) be the number of documents where the name of prop(t) appears. We compute

    Q(t) = Σ_{m=0}^{min(b(t),M)} [ C(b(t), m) · C(B − b(t), M − m) / C(B, M) ] · familiarity(m) ,
    familiarity(m) = log(m + 1) / log(B + 1) ,        (4)

where C(n, r) denotes the binomial coefficient. Here, m represents the number of documents the user has read where the name of prop(t) appears, based on which familiarity(m) gives the degree of familiarity of prop(t) to the user. However, it is difficult to know m in practice, so Q(t) computes the expected value of familiarity(m). For simplicity, we assume M is a constant. In the experiments we set M = 40 and we use the Google Books Ngram² as our corpus.

Finally, we compute textual readability:

    W_text(t) = log(Q(t) + 1) .        (5)

3.3. Information Redundancy

To reduce redundancy in summaries, we measure the similarity between two triples t_i, t_j in various ways.

First, we perform logical reasoning to measure ontological similarity. We define sim(t_i, t_j) = 1 if prop(t_i) and prop(t_j) are rdf:type, and rdfs:subClassOf is a relation between val(t_i) and val(t_j); or if val(t_i) and val(t_j) are equal, and rdfs:subPropertyOf is a relation between prop(t_i) and prop(t_j).

Otherwise, we rely on the similarity between properties and the similarity between values:

    sim(t_i, t_j) = max{sim_p(t_i, t_j), sim_v(t_i, t_j), 0} ,        (6)

where for sim_p we use the ISub string similarity [9]. For sim_v, we differentiate between two cases.

In the first case, val(t_i) and val(t_j) are both numerical values. We compute

    sim_v(t_i, t_j) = −1                                                      if val(t_i) · val(t_j) ≤ 0 ,
                      min{val(t_i), val(t_j)} / max{val(t_i), val(t_j)}       otherwise .        (7)

In all other cases, we simply use ISub for sim_v.

3.4. Combinatorial Optimization

We formulate entity summarization as a 0-1 quadratic knapsack problem (QKP), and we solve it using a heuristic algorithm [10].

Specifically, we define the profit of choosing two triples t_i, t_j for a summary:

    profit_{i,j} = (1 − δ) · (W_struct(t_i) + W_text(t_i))   if i = j ,
                   δ · (−sim(t_i, t_j))                      if i ≠ j ,        (8)

where δ ∈ [0, 1] is a parameter to tune.

Finally, our goal is to

    maximize    Σ_{i=1}^{|desc(e)|} Σ_{j=i}^{|desc(e)|} profit_{i,j} · x_i · x_j ,
    subject to  Σ_{i=1}^{|desc(e)|} x_i ≤ k ,        (9)
                x_i ∈ {0, 1} for all i = 1 … |desc(e)| .

4. Experiments

4.1. Settings

We use the ESBM v1.2 dataset [8]. It provides ground-truth summaries under k = 5 and k = 10 for entities in DBpedia and LinkedMDB. We follow the provided training-development-test splits for 5-fold cross validation, and we use the training and development sets for tuning our parameters α and δ by grid search in the range of 0–1 with 0.01 increments. We use F1 score as the evaluation metric.

² http://books.google.com/ngrams
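The optimization of Section 3.4 can be sketched in Python. ESSTER solves the QKP with the GRASP/tabu-search heuristic of [10]; the greedy loop below is a simplified stand-in for that heuristic, and the scoring functions w_struct, w_text, and sim (Sections 3.1-3.3) are passed in as parameters:

```python
import itertools

def summarize(desc, w_struct, w_text, sim, k, delta=0.5):
    """Greedy sketch of the 0-1 QKP in Eqs. (8)-(9).

    A simplified stand-in for the GRASP/tabu heuristic the paper uses.
    desc is a list of candidate triples; w_struct, w_text, sim are the
    scoring functions of Sections 3.1-3.3; k is the size constraint.
    """
    n = len(desc)
    # Profit matrix of Eq. (8): the diagonal rewards structural importance
    # and readability; off-diagonal entries penalize pairwise similarity
    # (redundancy). Only the upper triangle is used, as in Eq. (9).
    profit = [[0.0] * n for _ in range(n)]
    for i in range(n):
        profit[i][i] = (1 - delta) * (w_struct(desc[i]) + w_text(desc[i]))
    for i, j in itertools.combinations(range(n), 2):
        profit[i][j] = delta * (-sim(desc[i], desc[j]))
    chosen = []
    while len(chosen) < min(k, n):
        # Pick the triple whose marginal gain in the objective is largest:
        # its own profit plus its (negative) interactions with the chosen set.
        best, best_gain = None, float("-inf")
        for i in range(n):
            if i in chosen:
                continue
            gain = profit[i][i] + sum(profit[min(i, j)][max(i, j)] for j in chosen)
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
    return [desc[i] for i in sorted(chosen)]
```

With δ = 0.5, two near-duplicate triples rarely appear together: once one is selected, the −sim term suppresses the other even if its individual score is high.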
Table 1
F1 Scores

              DBpedia           LinkedMDB
            k = 5   k = 10    k = 5   k = 10
  RELIN     0.242   0.455     0.203   0.258
  DIVERSUM  0.249   0.507     0.207   0.358
  FACES     0.270   0.428     0.169   0.263
  FACES-E   0.280   0.488     0.313   0.393
  CD        0.283   0.513     0.217   0.331
  LinkSUM   0.287   0.486     0.140   0.279
  BAFREC    0.335   0.503     0.360   0.402
  KAFCA     0.314   0.509     0.244   0.397
  MPSUM     0.314   0.512     0.272   0.423
  ESSTER    0.324   0.521     0.365   0.452

4.2. Results

Table 1 presents the evaluation results. We compare with known results of existing unsupervised entity summarizers [8]. On DBpedia under k = 5, BAFREC [6] achieves the highest F1 score and is closely followed by ESSTER. In the other three settings, ESSTER outperforms all the baselines. Overall, ESSTER achieves state-of-the-art results on ESBM v1.2.

5. Conclusion

In this system paper, we presented the implementation of our entity summarizer ESSTER. By integrating structural importance, textual readability, and information redundancy via combinatorial optimization, ESSTER achieves state-of-the-art results among unsupervised entity summarizers on the ESBM v1.2 dataset. However, the results are not comparable with supervised neural entity summarizers [11, 12].

In future work, we will consider more powerful measures of readability and redundancy, and will incorporate these features into a neural network model.

Acknowledgments

This work was supported by the National Key R&D Program of China (2018YFB1004300) and by the NSFC (61772264).

References

[1] Q. Liu, G. Cheng, K. Gunaratna, Y. Qu, Entity summarization: State of the art and future challenges, CoRR abs/1910.08252 (2019).
[2] G. Cheng, T. Tran, Y. Qu, RELIN: relatedness and informativeness-based centrality for entity summarization, in: ISWC'11, Part I, 2011, pp. 114-129. doi:10.1007/978-3-642-25073-6_8.
[3] K. Gunaratna, K. Thirunarayan, A. P. Sheth, FACES: diversity-aware entity summarization using incremental hierarchical conceptual clustering, in: AAAI'15, 2015, pp. 116-122.
[4] K. Gunaratna, K. Thirunarayan, A. P. Sheth, G. Cheng, Gleaning types for literals in RDF triples with application to entity summarization, in: ESWC'16, 2016, pp. 85-100. doi:10.1007/978-3-319-34129-3_6.
[5] A. Thalhammer, N. Lasierra, A. Rettinger, LinkSUM: Using link analysis to summarize entity data, in: ICWE'16, 2016, pp. 244-261. doi:10.1007/978-3-319-38791-8_14.
[6] H. Kroll, D. Nagel, W.-T. Balke, BAFREC: Balancing frequency and rarity for entity characterization in linked open data, in: EYRE'18, 2018.
[7] Q. Liu, G. Cheng, Y. Qu, Entity summarization with high readability and low redundancy, Sci. Sin. Inform. 50 (2020) 845-861. doi:10.1360/SSI-2019-0291.
[8] Q. Liu, G. Cheng, K. Gunaratna, Y. Qu, ESBM: an entity summarization benchmark, in: ESWC'20, 2020, pp. 548-564. doi:10.1007/978-3-030-49461-2_32.
[9] G. Stoilos, G. B. Stamou, S. D. Kollias, A string metric for ontology alignment, in: ISWC'05, 2005, pp. 624-637. doi:10.1007/11574620_45.
[10] Z. Yang, G. Wang, F. Chu, An effective GRASP and tabu search for the 0-1 quadratic knapsack problem, Comput. Oper. Res. 40 (2013) 1176-1185. doi:10.1016/j.cor.2012.11.023.
[11] Q. Liu, G. Cheng, Y. Qu, DeepLENS: Deep learning for entity summarization, in: DL4KG'20, 2020.
[12] J. Li, G. Cheng, Q. Liu, W. Zhang, E. Kharlamov, K. Gunaratna, H. Chen, Neural entity summarization with joint encoding and weak supervision, in: IJCAI'20, 2020, pp. 1644-1650. doi:10.24963/ijcai.2020/228.