=Paper=
{{Paper
|id=Vol-2621/CIRCLE20_38
|storemode=property
|title=Capturing Entity Hierarchy in Data-to-Text Generative Models
|pdfUrl=https://ceur-ws.org/Vol-2621/CIRCLE20_38.pdf
|volume=Vol-2621
|authors=Clément Rebuffel,Laure Soulier,Geoffrey Scoutheeten,Patrick Gallinari
|dblpUrl=https://dblp.org/rec/conf/circle/RebuffelSSG20
}}
==Capturing Entity Hierarchy in Data-to-Text Generative Models==
Clément Rebuffel (Sorbonne Université, CNRS, LIP6, F-75005 Paris, France) clement.rebuffel@lip6.fr
Laure Soulier (Sorbonne Université, CNRS, LIP6, F-75005 Paris, France) laure.soulier@lip6.fr
Geoffrey Scoutheeten (BNP Paribas, France) geoffrey.scoutheeten@bnpparibas.com
Patrick Gallinari (Sorbonne Université, CNRS, LIP6, F-75005 Paris, France; Criteo AI Lab, Paris) patrick.gallinari@lip6.fr

"Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)."

ABSTRACT
We aim at generating summaries from structured data (i.e. tables, entity-relation triplets, ...). Most previous approaches rely on an encoder-decoder architecture in which the data are linearized into a sequence of elements. In contrast, we propose to take into account the entities forming the data structure in a hierarchical model. Moreover, we introduce the Transformer encoder in data-to-text models to ensure a robust encoding of each element/entity in comparison to all others, no matter their initial positioning. Our model is evaluated on the RotoWire benchmark (statistical tables of NBA basketball games). This paper has been accepted at ECIR 2020.

KEYWORDS
Data-to-Text, Hierarchical Encoding, Deep Learning

1 CONTEXT AND MOTIVATION
Understanding data structure is an emerging challenge to enhance textual tasks, such as question answering [11, 18] or table retrieval [4, 17]. One emerging research field, referred to as "data-to-text" [5], consists in transcribing data structures into natural language in order to ease their understandability and usability. Numerous examples of applications can be cited: journalism [10], medical diagnosis [12], weather reports [16], or sport broadcasting [2, 21]. Figure 1 depicts a data structure containing statistics on NBA basketball games, paired with its corresponding journalistic description.

Figure 1: Example of structured data from the RotoWire dataset. Rows are entities (a team or a player) and each cell is a record, its key being the column label and its value the cell content. Factual mentions from the table are boldfaced in the description.

Until recently, efforts to bring out semantics from structured data relied heavily on expert knowledge (e.g. rules) [3, 16]. Modern data-to-text models [1, 6, 21] leverage deep learning advances and are generally designed using two connected components: 1) an encoder aiming at understanding the structured data and 2) a decoder generating the associated descriptions. This standard architecture is often augmented with 1) an attention mechanism, which computes a context focused on important elements of the input at each decoding step, and 2) a copy mechanism to deal with unknown or rare words. However, most works [1, 6, 21] represent the data records as a single sequence of facts to be fed to the encoder. These models reach their limitations on large structured data composed of several entities (e.g. rows in tables) and multiple attributes (e.g. columns in tables), and fail to accurately extract salient elements.

To improve these models, a number of works [7, 13] have proposed innovative decoding modules based on planning and templates, to ensure factual and coherent mentions of records in the generated descriptions. Closer to our work, very recent works [8, 9, 14] have proposed to take the data structure into account. For instance, Puduppully et al. [13] design a more complex two-step decoder: they first generate a plan of the elements to be mentioned, and then condition text generation on this plan.

2 CONTRIBUTION AND MAIN RESULTS
In this paper, we focus on the encoding step of data-to-text models, since we assume that a large amount of work has already been done on language generation and summarization. We believe that the most important challenge lies in the encoding of the data structure. Therefore, we identify two limitations of previous work:
(1) Linearization of the data structure. In practice, most works focus on introducing innovative decoding modules, and still represent the data as a single sequence of elements to be encoded, effectively losing the distinction between rows, and therefore between entities (see the toy example below). To the best of our knowledge, only Liu et al. [8, 9] propose encoders constrained by the structure, but these approaches are designed for single-entity structures.
(2) Arbitrary ordering of unordered collections in recurrent networks (RNNs). Most data-to-text systems use RNNs as encoders (such as LSTMs), which require in practice that their input be fed sequentially. This way of encoding unordered sequences (i.e. collections of entities) implicitly assumes an arbitrary order within the collection which, as shown in Vinyals et al. [20], significantly impacts the learning performance.
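To make limitation (1) concrete, here is a minimal toy example in Python. The record keys and values are invented for illustration and do not follow the exact RotoWire schema; it only contrasts the flat linearization used by most prior work with the entity-grouped view our encoder assumes.

```python
# Toy table: each row is an entity, each cell a (key, value) record.
# Keys and values below are illustrative, not the actual RotoWire schema.
table = {
    "Raptors":      [("TEAM_PTS", "122"), ("TEAM_WINS", "11")],
    "LeBron James": [("PTS", "25"), ("AST", "14"), ("REB", "8")],
}

# Most prior work linearizes everything into one flat sequence of records,
# losing the row/entity boundaries:
flat = [(ent, k, v) for ent, records in table.items() for k, v in records]
# -> [('Raptors', 'TEAM_PTS', '122'), ..., ('LeBron James', 'REB', '8')]

# The hierarchical view keeps one variable-length record list per entity,
# which is what the two-level encoder described below operates on:
hierarchical = list(table.values())
# -> [[('TEAM_PTS', '122'), ...], [('PTS', '25'), ...]]
```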
To address these shortcomings, we propose a new structured-data encoder assuming that structures should be hierarchically captured. Our contribution focuses on the encoding of the data structure; the decoder is thus chosen to be a classical module, as used in [13, 21]. Our contribution, illustrated in Figure 2, is threefold:
• We model the general structure of the data using a two-level architecture, first encoding all entities on the basis of their elements, then encoding the data structure on the basis of its entities;
• We introduce the Transformer encoder [19] in data-to-text models to ensure a robust encoding of each element/entity in comparison to all others, no matter their initial positioning (illustrated in the sketch below);
• We integrate a hierarchical attention mechanism to compute the hierarchical context fed into the decoder.
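The motivation for the second point can be checked directly: a Transformer encoder used without positional encodings is equivariant to the ordering of its input, so the representation of each record no longer depends on the arbitrary position at which it was fed in. A minimal sketch, assuming the standard PyTorch nn.TransformerEncoder (the hyper-parameters here are arbitrary; those of our actual model are given in the ECIR paper [15]):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One Transformer encoder layer, used WITHOUT positional encodings, so the
# representation of a record depends only on the set of the other records,
# not on the position at which it appears in the input sequence.
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, dropout=0.0, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=1).eval()

records = torch.randn(1, 5, 16)   # 5 record embeddings for one entity
perm = torch.randperm(5)          # an arbitrary re-ordering of the records

with torch.no_grad():
    out = encoder(records)
    out_perm = encoder(records[:, perm])

# Permuting the input only permutes the outputs: each record keeps the same
# representation, unlike with an RNN read in an arbitrary order.
print(torch.allclose(out[:, perm], out_perm, atol=1e-5))  # True
```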
Figure 2: Our proposed hierarchical encoder. Once the records are embedded, the low-level encoder works on each entity independently (A); then the high-level encoder encodes the collection of entities (B). In circles, we represent the hierarchical attention scores: the 𝛼 scores at the entity level and the 𝛽 scores at the record level.

As shown in Figure 2, our model relies on two encoders (a code sketch of both, together with the hierarchical attention, follows this list):
• the Low-level encoder encodes each entity 𝑒𝑖 on the basis of its record embeddings r𝑖,𝑗. Each record embedding r𝑖,𝑗 is compared to the other record embeddings to learn its final hidden representation h𝑖,𝑗. We also add a special record [ENT] for each entity, illustrated in Figure 2 as the last record. Since entities might have a variable number of records, this token allows aggregating the final hidden record representations {h𝑖,1, ..., h𝑖,𝐽𝑖} into a fixed-size representation vector h𝑖;
• the High-level encoder encodes the data structure on the basis of its entity representations h𝑖. Similarly to the Low-level encoder, the final hidden state e𝑖 of an entity is computed by comparing the entity representation h𝑖 with all the others. The data-structure representation z is computed as the mean of these entity representations, and is used to initialize the decoder.
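Below is a minimal PyTorch sketch of this two-level scheme together with the hierarchical attention (the 𝛼 and 𝛽 scores of Figure 2). It is an illustration under simplifying assumptions (no padding masks, no key/value embedding of the records, a plain dot-product form for the attention, arbitrary hyper-parameters), not the exact implementation, which is described in the ECIR paper [15].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalEncoder(nn.Module):
    """Sketch of the two-level encoder: a low-level Transformer over the
    records of each entity (with an appended [ENT] record whose final state
    is the entity summary h_i), and a high-level Transformer over entities."""

    def __init__(self, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        low = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        high = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.low_level = nn.TransformerEncoder(low, num_layers)
        self.high_level = nn.TransformerEncoder(high, num_layers)
        self.ent_token = nn.Parameter(torch.randn(1, 1, d_model))  # the [ENT] record

    def forward(self, records):
        # records: (num_entities, num_records, d_model) record embeddings r_{i,j}
        # (padding masks for variable-length entities are omitted for brevity).
        n, _, d = records.shape
        ent = self.ent_token.expand(n, 1, d)
        low_in = torch.cat([records, ent], dim=1)        # append [ENT] to each entity
        h = self.low_level(low_in)                       # compare records within an entity
        h_records, h_entities = h[:, :-1], h[:, -1]      # h_{i,j} and the summaries h_i
        e = self.high_level(h_entities.unsqueeze(0))[0]  # e_i: entities compared to each other
        z = e.mean(dim=0)                                # data-structure representation (decoder init)
        return h_records, e, z


def hierarchical_attention(query, h_records, e):
    # query: (d,) decoder hidden state at the current decoding step.
    # alpha: attention over entities; beta: attention over records within each entity.
    alpha = F.softmax(e @ query, dim=0)                       # (num_entities,)
    beta = F.softmax(h_records @ query, dim=-1)               # (num_entities, num_records)
    entity_ctx = (beta.unsqueeze(-1) * h_records).sum(dim=1)  # per-entity record context
    return (alpha.unsqueeze(-1) * entity_ctx).sum(dim=0)      # (d,) context fed to the decoder


# Minimal usage on random record embeddings (2 entities, 5 records each):
enc = HierarchicalEncoder(d_model=256)
h_records, e, z = enc(torch.randn(2, 5, 256))
ctx = hierarchical_attention(torch.randn(256), h_records, e)
print(h_records.shape, e.shape, z.shape, ctx.shape)
```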
We report experiments on the RotoWire benchmark [21], which contains around 5K statistical tables of NBA basketball games paired with human-written descriptions. Comparisons against baselines show that introducing the Transformer architecture is a promising way to implicitly account for the data structure, and leads to better content selection even before introducing hierarchical encoding. Furthermore, our hierarchical model outperforms all baselines on content selection, showing that capturing structure in the encoding process is more effective than predicting a structure in the decoder (e.g., planning or templating). We show via ablation studies that further constraining the encoder on structure (through hierarchical attention) leads to even better performance.

For a more in-depth understanding of our contribution, please read our ECIR paper [15].

3 ACKNOWLEDGEMENTS
We would like to thank the H2020 project AI4EU (825619), which partially supports Laure Soulier and Patrick Gallinari.

REFERENCES
[1] Shubham Agarwal and Marc Dymetman. 2017. A surprisingly effective out-of-the-box char2char model on the E2E NLG Challenge dataset. In SIGdial. 158–163.
[2] David L. Chen and Raymond J. Mooney. 2008. Learning to Sportscast: A Test of Grounded Language Acquisition. In ICML 2008.
[3] Dong Deng, Yu Jiang, Guoliang Li, Jian Li, and Cong Yu. 2013. Scalable column concept determination for web tables using large knowledge bases. Proceedings of the VLDB Endowment (2013).
[4] Li Deng, Shuo Zhang, and Krisztian Balog. 2019. Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval. In SIGIR 2019.
[5] Albert Gatt and Emiel Krahmer. 2018. Survey of the State of the Art in Natural Language Generation: Core Tasks, Applications and Evaluation. J. Artif. Int. Res. (2018).
[6] Rémi Lebret, David Grangier, and Michael Auli. 2016. Neural Text Generation from Structured Data with Application to the Biography Domain. In EMNLP.
[7] Liunian Li and Xiaojun Wan. 2018. Point Precisely: Towards Ensuring the Precision of Data in Generated Texts Using Delayed Copy Mechanism. In ICCL.
[8] Tianyu Liu, Fuli Luo, Qiaolin Xia, Shuming Ma, Baobao Chang, and Zhifang Sui. 2019. Hierarchical Encoder with Auxiliary Supervision for Neural Table-to-Text Generation: Learning Better Representation for Tables. AAAI (2019).
[9] Tianyu Liu, Kexiang Wang, Lei Sha, Baobao Chang, and Zhifang Sui. 2018. Table-to-text Generation by Structure-aware Seq2seq Learning. In AAAI.
[10] Will Oremus. 2014. The First News Report on the L.A. Earthquake Was Written by a Robot.
[11] Panupong Pasupat and Percy Liang. 2015. Compositional Semantic Parsing on Semi-Structured Tables. In IJCNLP.
[12] Steffen Pauws, Albert Gatt, Emiel Krahmer, and Ehud Reiter. 2019. Making Effective Use of Healthcare Data Using Data-to-Text Technology: Methodologies and Applications. 119–145.
[13] Ratish Puduppully, Li Dong, and Mirella Lapata. 2018. Data-to-Text Generation with Content Selection and Planning. In AAAI.
[14] Ratish Puduppully, Li Dong, and Mirella Lapata. 2019. Data-to-text Generation with Entity Modeling. In ACL 2019.
[15] Clément Rebuffel, Laure Soulier, Geoffrey Scoutheeten, and Patrick Gallinari. 2020. A Hierarchical Model for Data-to-Text Generation. In ECIR 2020. 65–80.
[16] Ehud Reiter, Somayajulu Sripada, Jim Hunter, Jin Yu, and Ian Davy. 2005. Choosing Words in Computer-generated Weather Forecasts. Artif. Intell. (2005).
[17] Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Y. Halevy, Hongrae Lee, Fei Wu, Reynold Xin, and Cong Yu. 2012. Finding Related Tables. In SIGMOD.
[18] Huan Sun, Hao Ma, Xiaodong He, Wen-tau Yih, Yu Su, and Xifeng Yan. 2016. Table Cell Search for Question Answering. In Proceedings of the 25th International Conference on World Wide Web (WWW '16). ACM Press, 771–782.
[19] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. In NIPS 2017.
[20] Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. 2016. Order matters: Sequence to sequence for sets. In ICLR.
[21] Sam Wiseman, Stuart Shieber, and Alexander Rush. 2017. Challenges in Data-to-Document Generation. In EMNLP.