=Paper=
{{Paper
|id=Vol-2621/CIRCLE20_38
|storemode=property
|title=Capturing Entity Hierarchy in Data-to-Text Generative Models
|pdfUrl=https://ceur-ws.org/Vol-2621/CIRCLE20_38.pdf
|volume=Vol-2621
|authors=Clément Rebuffel,Laure Soulier,Geoffrey Scoutheeten,Patrick Gallinari
|dblpUrl=https://dblp.org/rec/conf/circle/RebuffelSSG20
}}
==Capturing Entity Hierarchy in Data-to-Text Generative Models==
Clément Rebuffel (Sorbonne Université, CNRS, LIP6, F-75005 Paris, France) clement.rebuffel@lip6.fr
Laure Soulier (Sorbonne Université, CNRS, LIP6, F-75005 Paris, France) laure.soulier@lip6.fr
Geoffrey Scoutheeten (BNP Paribas, France) geoffrey.scoutheeten@bnpparibas.com
Patrick Gallinari (Sorbonne Université, CNRS, LIP6, F-75005 Paris, France; Criteo AI Lab, Paris) patrick.gallinari@lip6.fr

"Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)."

ABSTRACT
We aim at generating summaries from structured data (i.e. tables, entity-relation triplets, ...). Most previous approaches rely on an encoder-decoder architecture in which the data are linearized into a sequence of elements. In contrast, we propose to take into account the entities forming the data structure in a hierarchical model. Moreover, we introduce the Transformer encoder in data-to-text models to ensure a robust encoding of each element/entity in comparison to all others, no matter their initial positioning. Our model is evaluated on the RotoWire benchmark (statistical tables of NBA basketball games). This paper has been accepted at ECIR 2020.

KEYWORDS
Data-to-Text, Hierarchical Encoding, Deep Learning

1 CONTEXT AND MOTIVATION
Understanding data structure is an emerging challenge to enhance textual tasks, such as question answering [11, 18] or table retrieval [4, 17]. One emerging research field, referred to as "data-to-text" [5], consists in transcribing data structures into natural language in order to ease their understandability and usability. Numerous examples of applications can be cited: journalism [10], medical diagnosis [12], weather reports [16], or sport broadcasting [2, 21]. Figure 1 depicts a data structure containing statistics on NBA basketball games, paired with its corresponding journalistic description.

Figure 1: Example of structured data from the RotoWire dataset. Rows are entities (a team or a player) and each cell is a record, its key being the column label and its value the cell content. Factual mentions from the table are boldfaced in the description.

Until recently, efforts to bring out semantics from structured data relied heavily on expert knowledge (e.g. rules) [3, 16]. Modern data-to-text models [1, 6, 21] leverage deep learning advances and are generally designed using two connected components: 1) an encoder aiming at understanding the structured data and 2) a decoder generating the associated descriptions. This standard architecture is often augmented with 1) an attention mechanism, which computes a context focused on important elements of the input at each decoding step, and 2) a copy mechanism to deal with unknown or rare words. However, most works [1, 6, 21] represent the data records as a single sequence of facts to be fed to the encoder. These models reach their limitations on large structured data composed of several entities (e.g. rows in tables) and multiple attributes (e.g. columns in tables), and fail to accurately extract salient elements.

To improve these models, a number of works [7, 13] have proposed innovative decoding modules based on planning and templates, to ensure factual and coherent mentions of records in the generated descriptions. Closer to our work, very recent works [8, 9, 14] have proposed to take the data structure into account. For instance, Puduppully et al. [13] design a more complex two-step decoder: they first generate a plan of the elements to be mentioned, and then condition text generation on this plan.

2 CONTRIBUTION AND MAIN RESULTS
In this paper, we focus on the encoding step of data-to-text models, since we assume that a large amount of work has already been done on language generation and summarization. We believe that the most important challenge lies in the encoding of the data structure. Therefore, we identify two limitations of previous work:
(1) Linearization of the data structure. In practice, most works focus on introducing innovative decoding modules, and still represent the data as a single sequence of elements to be encoded, effectively losing the distinction between rows, and therefore between entities (see the toy example below). To the best of our knowledge, only Liu et al. [8, 9] propose encoders constrained by the structure, but these approaches are designed for single-entity structures.
(2) Arbitrary ordering of unordered collections in recurrent networks (RNNs). Most data-to-text systems use RNNs as encoders (such as LSTMs), which require in practice that their input be fed sequentially. This way of encoding unordered sequences (i.e. collections of entities) implicitly assumes an arbitrary order within the collection which, as shown in Vinyals et al. [20], significantly impacts the learning performance.
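To make limitation (1) concrete, here is a minimal toy example in Python. The record keys and values are invented for illustration and do not follow the exact RotoWire schema; it only contrasts the flat linearization used by most prior work with the entity-grouped view our encoder assumes.

```python
# Toy table: each row is an entity, each cell a (key, value) record.
# Keys and values below are illustrative, not the actual RotoWire schema.
table = {
    "Raptors":      [("TEAM_PTS", "122"), ("TEAM_WINS", "11")],
    "LeBron James": [("PTS", "25"), ("AST", "14"), ("REB", "8")],
}

# Most prior work linearizes everything into one flat sequence of records,
# losing the row/entity boundaries:
flat = [(ent, k, v) for ent, records in table.items() for k, v in records]
# -> [('Raptors', 'TEAM_PTS', '122'), ..., ('LeBron James', 'REB', '8')]

# The hierarchical view keeps one variable-length record list per entity,
# which is what the two-level encoder described below operates on:
hierarchical = list(table.values())
# -> [[('TEAM_PTS', '122'), ...], [('PTS', '25'), ...]]
```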
To address these shortcomings, we propose a new structured-data encoder assuming that structures should be hierarchically captured. Our contribution focuses on the encoding of the data structure; the decoder is thus chosen to be a classical module, as used in [13, 21]. Our contribution, illustrated in Figure 2, is threefold:
• We model the general structure of the data using a two-level architecture, first encoding all entities on the basis of their elements, then encoding the data structure on the basis of its entities;
• We introduce the Transformer encoder [19] in data-to-text models to ensure a robust encoding of each element/entity in comparison to all others, no matter their initial positioning (illustrated in the sketch below);
• We integrate a hierarchical attention mechanism to compute the hierarchical context fed into the decoder.
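The motivation for the second point can be checked directly: a Transformer encoder used without positional encodings is equivariant to the ordering of its input, so the representation of each record no longer depends on the arbitrary position at which it was fed in. A minimal sketch, assuming the standard PyTorch nn.TransformerEncoder (the hyper-parameters here are arbitrary; those of our actual model are given in the ECIR paper [15]):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One Transformer encoder layer, used WITHOUT positional encodings, so the
# representation of a record depends only on the set of the other records,
# not on the position at which it appears in the input sequence.
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, dropout=0.0, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=1).eval()

records = torch.randn(1, 5, 16)   # 5 record embeddings for one entity
perm = torch.randperm(5)          # an arbitrary re-ordering of the records

with torch.no_grad():
    out = encoder(records)
    out_perm = encoder(records[:, perm])

# Permuting the input only permutes the outputs: each record keeps the same
# representation, unlike with an RNN read in an arbitrary order.
print(torch.allclose(out[:, perm], out_perm, atol=1e-5))  # True
```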
Figure 2: Our proposed hierarchical encoder. Once the records are embedded, the low-level encoder works on each entity independently (A); then the high-level encoder encodes the collection of entities (B). In circles, we represent the hierarchical attention scores: the 𝛼 scores at the entity level and the 𝛽 scores at the record level.

As shown in Figure 2, our model relies on two encoders (a code sketch of both, together with the hierarchical attention, follows this list):
• the Low-level encoder encodes each entity 𝑒𝑖 on the basis of its record embeddings r𝑖,𝑗. Each record embedding r𝑖,𝑗 is compared to the other record embeddings to learn its final hidden representation h𝑖,𝑗. We also add a special record [ENT] for each entity, illustrated in Figure 2 as the last record. Since entities might have a variable number of records, this token allows aggregating the final hidden record representations {h𝑖,1, ..., h𝑖,𝐽𝑖} into a fixed-size representation vector h𝑖;
• the High-level encoder encodes the data structure on the basis of its entity representations h𝑖. Similarly to the Low-level encoder, the final hidden state e𝑖 of an entity is computed by comparing the entity representation h𝑖 with all the others. The data-structure representation z is computed as the mean of these entity representations, and is used to initialize the decoder.
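Below is a minimal PyTorch sketch of this two-level scheme together with the hierarchical attention (the 𝛼 and 𝛽 scores of Figure 2). It is an illustration under simplifying assumptions (no padding masks, no key/value embedding of the records, a plain dot-product form for the attention, arbitrary hyper-parameters), not the exact implementation, which is described in the ECIR paper [15].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalEncoder(nn.Module):
    """Sketch of the two-level encoder: a low-level Transformer over the
    records of each entity (with an appended [ENT] record whose final state
    is the entity summary h_i), and a high-level Transformer over entities."""

    def __init__(self, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        low = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        high = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.low_level = nn.TransformerEncoder(low, num_layers)
        self.high_level = nn.TransformerEncoder(high, num_layers)
        self.ent_token = nn.Parameter(torch.randn(1, 1, d_model))  # the [ENT] record

    def forward(self, records):
        # records: (num_entities, num_records, d_model) record embeddings r_{i,j}
        # (padding masks for variable-length entities are omitted for brevity).
        n, _, d = records.shape
        ent = self.ent_token.expand(n, 1, d)
        low_in = torch.cat([records, ent], dim=1)        # append [ENT] to each entity
        h = self.low_level(low_in)                       # compare records within an entity
        h_records, h_entities = h[:, :-1], h[:, -1]      # h_{i,j} and the summaries h_i
        e = self.high_level(h_entities.unsqueeze(0))[0]  # e_i: entities compared to each other
        z = e.mean(dim=0)                                # data-structure representation (decoder init)
        return h_records, e, z


def hierarchical_attention(query, h_records, e):
    # query: (d,) decoder hidden state at the current decoding step.
    # alpha: attention over entities; beta: attention over records within each entity.
    alpha = F.softmax(e @ query, dim=0)                       # (num_entities,)
    beta = F.softmax(h_records @ query, dim=-1)               # (num_entities, num_records)
    entity_ctx = (beta.unsqueeze(-1) * h_records).sum(dim=1)  # per-entity record context
    return (alpha.unsqueeze(-1) * entity_ctx).sum(dim=0)      # (d,) context fed to the decoder


# Minimal usage on random record embeddings (2 entities, 5 records each):
enc = HierarchicalEncoder(d_model=256)
h_records, e, z = enc(torch.randn(2, 5, 256))
ctx = hierarchical_attention(torch.randn(256), h_records, e)
print(h_records.shape, e.shape, z.shape, ctx.shape)
```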
We report experiments on the RotoWire benchmark [21], which contains around 5K statistical tables of NBA basketball games paired with human-written descriptions. Comparisons against baselines show that introducing the Transformer architecture is a promising way to implicitly account for the data structure, and leads to better content selection even before introducing hierarchical encoding. Furthermore, our hierarchical model outperforms all baselines on content selection, showing that capturing structure in the encoding process is more effective than predicting a structure in the decoder (e.g., planning or templating). We show via ablation studies that further constraining the encoder on structure (through hierarchical attention) leads to even better performance.

For a more in-depth understanding of our contribution, please read our ECIR paper [15].

3 ACKNOWLEDGEMENTS
We would like to thank the H2020 project AI4EU (825619), which partially supports Laure Soulier and Patrick Gallinari.

REFERENCES
[1] Shubham Agarwal and Marc Dymetman. 2017. A surprisingly effective out-of-the-box char2char model on the E2E NLG Challenge dataset. In SIGdial. 158–163.
[2] David L. Chen and Raymond J. Mooney. 2008. Learning to Sportscast: A Test of Grounded Language Acquisition. In ICML 2008.
[3] Dong Deng, Yu Jiang, Guoliang Li, Jian Li, and Cong Yu. 2013. Scalable column concept determination for web tables using large knowledge bases. Proceedings of the VLDB Endowment (2013).
[4] Li Deng, Shuo Zhang, and Krisztian Balog. 2019. Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval. In SIGIR 2019.
[5] Albert Gatt and Emiel Krahmer. 2018. Survey of the State of the Art in Natural Language Generation: Core Tasks, Applications and Evaluation. J. Artif. Int. Res. (2018).
[6] Rémi Lebret, David Grangier, and Michael Auli. 2016. Neural Text Generation from Structured Data with Application to the Biography Domain. In EMNLP.
[7] Liunian Li and Xiaojun Wan. 2018. Point Precisely: Towards Ensuring the Precision of Data in Generated Texts Using Delayed Copy Mechanism. In ICCL.
[8] Tianyu Liu, Fuli Luo, Qiaolin Xia, Shuming Ma, Baobao Chang, and Zhifang Sui. 2019. Hierarchical Encoder with Auxiliary Supervision for Neural Table-to-Text Generation: Learning Better Representation for Tables. AAAI (2019).
[9] Tianyu Liu, Kexiang Wang, Lei Sha, Baobao Chang, and Zhifang Sui. 2018. Table-to-text Generation by Structure-aware Seq2seq Learning. In AAAI.
[10] Will Oremus. 2014. The First News Report on the L.A. Earthquake Was Written by a Robot.
[11] Panupong Pasupat and Percy Liang. 2015. Compositional Semantic Parsing on Semi-Structured Tables. In IJCNLP.
[12] Steffen Pauws, Albert Gatt, Emiel Krahmer, and Ehud Reiter. 2019. Making Effective Use of Healthcare Data Using Data-to-Text Technology: Methodologies and Applications. 119–145.
[13] Ratish Puduppully, Li Dong, and Mirella Lapata. 2018. Data-to-Text Generation with Content Selection and Planning. In AAAI.
[14] Ratish Puduppully, Li Dong, and Mirella Lapata. 2019. Data-to-text Generation with Entity Modeling. In ACL 2019.
[15] Clément Rebuffel, Laure Soulier, Geoffrey Scoutheeten, and Patrick Gallinari. 2020. A Hierarchical Model for Data-to-Text Generation. In ECIR 2020. 65–80.
[16] Ehud Reiter, Somayajulu Sripada, Jim Hunter, Jin Yu, and Ian Davy. 2005. Choosing Words in Computer-generated Weather Forecasts. Artif. Intell. (2005).
[17] Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Y. Halevy, Hongrae Lee, Fei Wu, Reynold Xin, and Cong Yu. 2012. Finding Related Tables. In SIGMOD.
[18] Huan Sun, Hao Ma, Xiaodong He, Wen-tau Yih, Yu Su, and Xifeng Yan. 2016. Table Cell Search for Question Answering. In Proceedings of the 25th International Conference on World Wide Web (WWW '16). ACM Press, 771–782.
[19] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. In NIPS 2017.
[20] Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. 2016. Order matters: Sequence to sequence for sets. In ICLR.
[21] Sam Wiseman, Stuart Shieber, and Alexander Rush. 2017. Challenges in Data-to-Document Generation. In EMNLP.