=Paper=
{{Paper
|id=Vol-2962/paper48
|storemode=property
|title=ZREC architecture for textual sentiment analysis
|pdfUrl=https://ceur-ws.org/Vol-2962/paper48.pdf
|volume=Vol-2962
|authors=Martin Pavlíček,Tomáš Filip,Petr Sosik
|dblpUrl=https://dblp.org/rec/conf/itat/PavlicekFS21
}}
==ZREC architecture for textual sentiment analysis==
Martin Pavlíček, Tomáš Filip, and Petr Sosík
Institute of Computer Science, Faculty of Philosophy and Science, Silesian University in Opava
martin.pavlicek@fpf.slu.cz

Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract: We present recent results of the research project ZREC, aimed at the analysis of psycho-social phenomena (group polarization, belief echo chambers and confirmatory bias) based on bio-inspired computing methods. We present two updated pipeline solutions for working with bio-inspired AI methods and data gathering tools, integrated in a complex (but simple to implement) vertical information system. The scope of the investigated phenomena is reduced to aspect-based sentiment analysis with an integration of methods covering named entity recognition and relation extraction. We present a simple ontology addition reflecting group polarization in the last year due to the COVID pandemic, and we stress the importance of the project in the social and IT spheres and for multi-tier cooperation. We also provide introductory results based on test data using several deep learning architectures, demonstrating that the presented approach is robust and functional.

1 Introduction

In recent years we have seen a dramatic increase in interaction between individuals and groups in cyberspace [7], together with news dissemination [2] and real-time reporting, as well as increasingly polarized groups presenting their narratives and beliefs [13] in cyberspace.

We can also see processes of regulation [12, 3] and enforcement of specific narratives, which are not only due to the novel worldwide COVID situation. Together with cybersecurity, national interests are aligned with the acceptance of information as a weapon and of cyberspace as an information warfare battlefield [5, 6, 4, 10].

These premises motivate us to investigate and build tools to understand the flow of information in cyberspace in a more open and rigorous manner. To keep the project manageable, we restrict our investigation to information about event exposures and the specific sentiment reactions (positive, negative, neutral) which arise in an individual and which can be traced to group behavior. We focus on three phenomena: group polarization [1], belief echo chambers [18, 22] and confirmatory bias [21]. Besides the interactions, we monitor world events through the GDELT dataset, which is viewed as a trigger of sentiment responses.

The goal is to investigate these phenomena and to maintain an open system, ZREC (www.zrec.org), and its cornerstones: algorithms, a research community and methods which can be used for further work both in the scope of IT and in applied research. We focus on understanding these phenomena within a specific ecosystem: a nation, a language, a selected group of sources and other parameters. Put simply, we can analyze approval or disapproval of world events which occur as information in cyberspace or within the interactions of individuals who act on the surface Internet.

The paper is organized as follows: in the next section, we describe a novel project architecture based on pipelined tasks. The data pre-processing phase is described in Section 3. Section 4 presents details of the key project component, aspect-based sentiment analysis, and the experimental results we have obtained with our architecture using three different deep learning models. The two last sections contain discussion and conclusions.

2 Project description

2.1 Pipeline

In a pipeline view of the system, we introduce two pipeline solutions which cover both the data and the AI methodology integration model. This division is needed to track changes, to track learning data and their ability to create a narrative bias, and to share these metadata within the developer community.

The first pipeline covers the implementation and training of ML methods for NLP analysis. In this pipeline we store and train specific models on our live data, and we also store pre-trained models and analyze the results. At any time we can access a specific version of a model together with the specified data, which provides feedback and a possible rollback in the system's development.

The second pipeline focuses on data gathering, cleanup and storage. To exploit different sources and different social networks like Facebook, GAB, Twitter, Parler and others, we maintain a set of tools which are used to gather data from predefined sources within a defined algorithm. The data are cleaned, meta-annotated and stored in the system.

Further work with the data is possible within the common batch analysis framework (described below) which is available to the users (Figure 1).

[Figure 1: A pipeline view of the system architecture with an optional batch analysis]

2.2 Architecture

We can describe the state of the system as a scalable vertical architecture which has emerged from the initial phase.

In the scope of technology, we work with scripting languages for creating the application part of the system; relational (SQL) and graph databases are used to store the data and to provide the basic architecture. For presentation we use the concept of a web information system and a visual front-end framework library to simply present the front-end of the system to end users.

Our goal is to create a complex yet relatively simply implementable system (Figure 2). The architecture can be divided into two parts. The first part is an administrative and methodical system. The second part is the data part combined with AI methods. The key components of the system represent data collection and tagging, NLP methods and a dataset warehouse, group and individual ontology graphs, and a scheduler of common system analytical tasks.

Bio-inspired methods training ground is used to store specific (mostly deep learning) AI methods [33, 30], with pre-selected training data and specific iterations of pre-trained methods, as an essential part of our system. This part of the system gives us the ability to strongly support the integration of new bio-inspired learning models and to update both the models and the specific data which were used to train them. From our experiments we see a strong trend to gain a specific bias when training our models on live data from certain sources; this is, e.g., the effect of echo chambers present in the sources we gather data from. The ability to snapshot model training data and the model definition is therefore essential.

Data gathering and tagging is the part of the system focusing on the definition of selected sources and individuals, as well as selected methods and algorithms to gather the predefined text data. We focus on a simple definition of selectors and the ability to self-heal within error spaces. For instance, we use the Twint tool (Twitter Intelligence Tool, https://github.com/twintproject/twint) to collect data from Twitter using the Python language. With the help of this tool we can select queries for specific users and specify the time period for which we want to collect all available data. Our gathered data includes posts, comments, and user interactions, including related metadata. The advantage of this tool is the ability to process data without using Twitter's API.

Ontology is used as the main data structure to define groups and individuals. A comprehensive definition of an ontology of captions is a strong tool for solving complex situations of similarity and anomaly detection. We use a relational database to store a predefined static ontology of captions, transformed into a graph network [16] which is then used for computational purposes.

Batch analysis defines the framework of analysis methods in the system. The system is built to handle multiple tasks from multiple users on multiple data sources. Batch analysis provides a robust system of common analytical queries which can be used as a simple batch scheduler. This definition of tasks gives us the ability to store specific combinations of data, users and methods which altogether control the analysis. In the user scenario this gives us the ability to cache and speed up processes and to keep a pool of results which can be used for further comparison and cross-checks.

Information system core is the meta-programming language we use to build the system. The base of the information system has the ability to render data pages, to check global and parametric permissions, and to define users, their roles and their history. The core gives us the ability to tweak the system, to view it with the permissions of other roles and users, and to provide a transparent model of accessing all the data and all subsystems.

Specific data module interrelates data sources and events gathered from the surface Internet. Information about events is obtained through the GDELT dataset (https://www.gdeltproject.org/) in the CAMEO format. Further specific datasets (textual and numerical) are being integrated into the system, currently the storage of COVID cases from authoritative sources (Johns Hopkins University; https://github.com/owid/covid-19-data/tree/master/public/data).

Translation module defines the roles of translators who can access the system and proceed with translations from/to different languages, increasing the system's accessibility.

3 Data pre-processing

In this section we describe a series of recent known methods for text feature extraction which are (or will be) used in our architecture to pre-process input data for the experiments described in the next section. For a survey of possible methods we refer the reader to, e.g., [11].

3.1 Creating a dataset

It is necessary to label the collected data for further processing, and manual data labeling is time consuming. Mikolov et al. (2013) [17] introduced the Word2Vec method.
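To illustrate the idea exploited throughout this subsection, a minimal sketch of embedding similarity follows. The vectors below are made-up 3-dimensional toys, not real Word2Vec output (trained embeddings typically have hundreds of dimensions); only the cosine-similarity formula itself is standard.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors:
    dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings; a trained Word2Vec model would supply these.
embedding = {
    "vaccine":     [0.90, 0.10, 0.30],
    "vaccination": [0.85, 0.15, 0.35],
    "football":    [0.10, 0.90, 0.20],
}

sim_related = cosine_similarity(embedding["vaccine"], embedding["vaccination"])
sim_unrelated = cosine_similarity(embedding["vaccine"], embedding["football"])
# Semantically close words end up closer in the vector space.
assert sim_related > sim_unrelated
```

Document-level metrics such as the Word Mover's Distance discussed below build on exactly this word-level distance, aggregating it over all words of two documents.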
The Word2Vec model projects each word into a multidimensional feature vector. This projection allows us to use vector algebra tools to measure the distance between words. If we are able to determine how semantically similar individual words are, we can use this technique to measure the relevance of texts. One such tool for a document similarity metric is the Word Mover's Distance (WMD, Kusner et al. 2015) [14]. WMD finds the minimum distance needed to transport all words from a source document to a destination document. Because this method uses pre-trained embeddings, WMD allows us to find a relationship between texts that do not share the same words but have a similar meaning. The Relaxed Word Mover's Distance [14] further reduces the time complexity of WMD from O(p^3 log p) to O(p^2), where p denotes the number of unique words in the texts. This technique allows us to find the most relevant texts for a given query and thus to streamline the process of creating a training dataset.

3.2 Named Entity Recognition (NER)

One of the essential functions of natural text processing models is to correctly predict named entities and the relationships between them. This capability is important for tasks that use named entities, such as Question Answering (QA) or entity Relation Extraction (RE). Models handling contextual information have brought significant improvements for NER. Yamada et al. (2020) [31] added an entity-aware self-attention mechanism and entity type embeddings to their model. They also added a pre-training task in which a certain number of entities are replaced with a special [MASK] token in order to predict these entities. This model has achieved the most accurate results on tasks working with entities: NER, relation classification and entity typing. Wang et al. (2021) [27] used a search engine to find texts semantically similar to the input text. To evaluate similar texts they used BERTScore (Zhang et al. 2020) [35], which measures cosine similarity between the tokens of the given texts. A concatenated input document and the documents returned from the search engine are used together to train the model; the assumption is that both output distributions should be similar, which is enforced by updating the loss function. This model achieved the highest score on 8 different NER datasets from different domains.

3.3 Relation Extraction

Apart from NER, another important task for text comprehension is to classify the relationships between entities. Xu et al. (2021) [29] added the Structured Self-Attention Network (SSAN) to the Transformer deep learning architecture. The SSAN model incorporates the Biaffine Transformation or the Decomposed Linear Transformation, which creates a structure S_ij. This structure represents the connection between words w_i and w_j and makes it possible to classify the type of link between entities and to discover coreference structures. Wadden et al. (2019) [26] introduced the multi-task framework DYGIE++ for three tasks of information extraction: RE, NER and event extraction. Its basis is a pre-trained NLP model whose outputs are sent to a graph propagation module, which modifies the representation by integrating the current representation with previous representations using a gating function. The resulting predictions are obtained from the re-contextualized representation using a scoring function containing two feed-forward neural nets (FFNN). The final outputs are equal to FFNN(g_i) for NER and FFNN([g_i; g_j]) for RE, where g_i and g_j are the representations for spans i and j. A different approach was used by Zhang et al. (2021) [34], who applied the U-Net model (Ronneberger et al. 2015) [20], known from computer vision, to find global relationships between entities. First, they created an entity-level relation matrix. Entity similarity was calculated using a similarity-based method (concatenating cosine similarity, element-wise similarity and bilinear similarity) or a context-based method (entity-aware attention). The feature vectors form a matrix M of size i × j × d, where i and j index a relation between the i-th and j-th entity and d is the size of the feature vector. This matrix is fed to the U-Net model, where d serves as the feature channel. The resulting relation type probabilities are obtained using a feed-forward network, the entity pair embedding and the output from the U-Net model.

[Figure 2: Architecture of the system]

4 Experimental results

Model     Parameters   Precision   Recall   AUC
BERT      5 701 889    0.9996      0.9504   0.9918
XLNet     6 368 001    0.9996      0.9460   0.9939
RoBERTa   5 701 889    0.9987      0.9383   0.9917

Table 1: Training results of the ML models BERT, XLNet and RoBERTa on the Sentihood dataset with targeted auxiliary sentences.

4.1 Aspect Based Sentiment Analysis (ABSA)

ABSA is a method for classifying text polarity. In contrast to aspect analysis, it makes it possible to determine sentiment in fine-grained detail. The analyzed document may be related to several independent aspects, and each of these aspects may have a different sentiment. Thus ABSA can be divided into two separate tasks: first, finding all aspects which occur in a sentence; second, predicting the sentiment of each aspect.

Various methods have been proposed to solve this task. One of the classic solutions is the formation of a dependency tree. Devlin et al. (2019) [9] introduced the BERT model built on the Transformer architecture (Vaswani et al. 2017) [25]. BERT was created to capture the right and left context of a word, and it has been used as a backbone in many ABSA models. BERT was pre-trained to predict tokens in sentences that were artificially corrupted: some randomly selected words in a sentence were replaced by a special [MASK] token.
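This corruption step can be sketched in a few lines. The sketch below is a simplified illustration with whitespace tokenization and a fixed mask ratio; real BERT pre-training works on subword tokens and sometimes keeps or randomly replaces the selected positions instead of masking them.

```python
import random

def mask_tokens(tokens, mask_ratio=0.15, seed=0):
    """Replace a random subset of tokens with the special [MASK] token,
    mimicking the corruption step of BERT-style pre-training.
    Returns the corrupted sequence and the masked positions."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = "[MASK]"
    return corrupted, positions

sentence = "the analyzed document may be related to several independent aspects".split()
corrupted, masked = mask_tokens(sentence, mask_ratio=0.15, seed=42)
```

The model is then trained to recover the original tokens at the masked positions from the surrounding context.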
A disadvantage of this pre-training task is the loss of the context between the masked words.

This problem is solved by the XLNet model (Yang et al. 2020) [32], which learns contextual information from all permutations of the factorization order. This method ensures that contextual information from all possible positions of the right and left context is used.

Liu et al. (2019) [15] introduced the Robustly optimized BERT approach (RoBERTa), which has been pre-trained on a more robust data corpus than BERT, using larger batch sizes. However, the pre-training tasks do not directly incorporate text sentiment determination.

Tian et al. (2020) [24] introduced the self-supervised SKEP method for pre-training the BERT model. Instead of randomly selected words as in BERT, words related to sentiment or aspects are selected for replacement with the [MASK] token. The model predicts the word polarity and the masked sentiment words. Models pre-trained using this method achieved better performance than baseline models.

Dai et al. (2021) [8] used the Perturbed Masking method, which searches for syntactic connections in a pre-trained BERT model to create an induced tree.

Finally, Sun et al. (2019) [23] used two different inputs for the pre-trained BERT model. The first input is a sentence from the dataset and the second input is an auxiliary sentence containing the target and the aspect. Using these two inputs, the model predicts the resulting polarity. This method transforms the ABSA task into a QA task.

4.2 Experiments with BERT, XLNet and RoBERTa

To evaluate the capabilities of our pipeline architecture, we performed a series of experiments based on the test dataset Sentihood, which is publicly available at https://github.com/uclnlp/jack/tree/master/data/sentihood as a part of the project Jack the Reader (JACK) [28]. The Sentihood dataset contains opinions about living in various locations in London, UK. In particular, there are 2480 training samples (opinions) with positive sentiment and 921 with negative sentiment, i.e., 3401 in total. Instead of processing the whole ABSA pipeline, we used a predefined subset of aspects which we wanted to predict in the collected data, and we created an appropriate set of auxiliary sentences. The disadvantage of using an auxiliary sentence for predicting polarities is the need for repeated predictions for each aspect.

In our experiments we tested our architecture with the pre-trained BERT, XLNet and RoBERTa deep learning models. The hyperparameters of the models were set as follows: number of training epochs 150–200, batch size 48, learning rate 1e-5, optimizer Adam Weight Decay. The numerical training scores of the three models are summarized in Table 1, where AUC stands for the Area Under Curve score.

A graphical comparison of the results of the three models is presented in Figures 3 and 4. We can conclude that all three models provided rather impressive results and that the textual analysis in our ZREC architecture proves applicable to the real-world data which we are now collecting.

[Figure 3: Training progress of the ML models BERT, XLNet and RoBERTa on the Sentihood dataset during 200 training epochs.]

[Figure 4: Prediction success of the ML models BERT, XLNet and RoBERTa on the Sentihood dataset.]

5 Discussion

5.1 Retrospective look

The project ZREC defines two areas of importance: for society and for IT. Both can be served by creating an open distributed ecosystem which can be used to understand emerging phenomena. This is now even more important, as in the last year we saw the world transformed by COVID restrictions; the need to understand cyberspace phenomena and their influence on society becomes still more urgent as communication moves into cyberspace. We see this as a clear trend and a motivation for the project.

The IT research side is more profound, since we want to develop, integrate and implement state-of-the-art AI methods aimed at natural language understanding in specific areas. Hence the function of the project as a strongly defined sandbox, serving as an integration tool for a plethora of specific methods from the NLP field, is both effective and promising.

5.2 Trends and main ontology themes

In our previous publication [19] we defined a main ontology based on basic polarization, which defined entities in our information ecosystem, such as sentiment towards the Czech Republic, the United States, Russia, Israel and Ukraine, political figures from the United States, Russia and the Czech Republic, and intelligence agencies such as the CIA, FSB, GRU, BIS and others. This ontology, together with the used sources, is a key factor in creating an individual or group profile.

A new communication topic with an enormous socio-economic impact and an adequate amount of hoaxes and fake news has emerged: vaccination, COVID restrictions and COVID pandemic acknowledgment. These topics are (together with topics covering national security and politics) at the center of interactions covering basic events emerging in cyberspace. With our modular architecture we can continue to follow individual and group responses and polarizations based on the interactions in the field of the vaccination narrative, with just an addition of new terms to our existing ontology. Accordingly, we added to our ontology sentiment towards specific vaccines (Pfizer, Moderna, Astra Zeneca, Sputnik, NovaVax), specific medical terms like SARS, Spike-protein and RNA, and sentiment towards the efficiency and need of vaccination.

5.3 Industry and research feedback

Our system is not scaled for harvesting all available data on social networks and the surface Internet. We stress that we focus on specific datasets and specific ecosystems that are used as the main observation point for the phenomena we model and try to understand. To be more specific, we find value in a transparent definition of datasets and source descriptions, both in the system and internally within the research community. We thus see the system ZREC also as a tool presenting some basic methodologies to select and describe the sources which are used to obtain data.

We still consider the creation of a universal AI crawler which can process data collections from various sources very important, but in the core development we focus more on the creation of the NLP AI pipeline which can be used to understand the phenomena.

We expect that our project would benefit from multi-tier cooperation with research centers, universities and industry partners. This is confirmed by the response of potential beneficiaries, and we use the academic space also as a call for a joint initiative incorporating people, IT resources, and internal information and ecosystem knowledge.

5.4 Progress and upcoming tasks

The ZREC system is being developed under the SCRUM methodology. The complexity of the development was reduced due to the clustering of the system into the mentioned subsystems. An efficient way of dealing with data and models was the introduction of the two pipeline solutions providing an open tool set.

Incorporation of AI models suitable for NLP tasks is human-intensive within the scope of acquiring state-of-the-art ideas, and the NLP training is also demanding in IT resources. Due to this fact we focus on the integration of the ontology-based solution with prepared data, which can be used as the most cost-effective way to achieve results. As a next step we will focus on the development and incorporation of new self-pretraining methods specifically designed for sentiment classification. Promising solutions for ABSA can be based on auxiliary sentences and the usage of attention models.

6 Conclusion

We have presented an updated ZREC project (www.zrec.org) whose aim is the analysis of psycho-social phenomena (group polarization, belief echo chambers and confirmatory bias) on the surface Internet. These phenomena are analyzed in the context of reactions (positive, negative) to information about local and world events. Our primary sources are social networks, and discussions and comment boards within web pages. A part of the project focuses on the analysis, visualization and dissemination of information about events on the surface Internet.

We have also presented a novel architecture in the scheme of two pipeline solutions. The first pipeline covers the AI methods used for NLP tasks, training and data management. The second pipeline covers data gathering, storage, cleaning and simple meta-annotation. Main tasks run in a batch mode via an open analytic toolbox. The first experimental results based on the test dataset Sentihood proved the efficiency of our architecture, which is now prepared to process larger-scale datasets acquired from the Internet.

Our recent research focuses on the task of aspect-based sentiment analysis (ABSA). We see a clear promise in building a strong ontology of entities and relations which can detect both standard narratives related to key topics (national security, politics, COVID, ...) and anomalies. Further research work is seen mainly in the development and implementation of new ABSA methods, and in the definition of new data transformations into multi-dimensional spaces allowing for their better understanding. Finally, the crucial step is data acquisition focusing on the currently active narratives in cyberspace which are at the center of our studies.

Acknowledgements

The research was supported by the Silesian University in Opava under the Student Funding Scheme, project SGS/9/2019, by the Student Grant Foundation, project SGF/5/2020, and by the European Union under the European Structural and Investment Funds Operational Programme Research, Development and Education, project "Zvýšení kvality vzdělávání na Slezské univerzitě v Opavě ve vazbě na potřeby Moravskoslezského kraje", CZ.02.2.69/0.0/0.0/18_058/0010238.
References

[1] C. A. Bail et al., Exposure to opposing views on social media can increase political polarization, Proceedings of the National Academy of Sciences, 115 (2018), pp. 9216–9221.
[2] D. Bar-Tal, Group beliefs: A conception for analyzing group structure, processes, and behavior, Springer Science & Business Media, 2012.
[3] BBC, Twitter hides Trump tweet for 'glorifying violence', 2020. https://www.bbc.com/news/technology-52846679.
[4] Annual report of the Security Information Service for 2016, 2017. https://www.bis.cz/public/site/bis.cz/content/vyrocni-zpravy/en/ar2016en.pdf.
[5] Annual report of the Security Information Service for 2017, 2018. https://www.bis.cz/public/site/bis.cz/content/vyrocni-zpravy/en/ar2017en.pdf.
[6] Annual report of the Security Information Service for 2018, 2019. https://www.bis.cz/public/site/bis.cz/content/vyrocni-zpravy/en/ar2018en.pdf.
[7] V. Blazevic, C. Wiertz, J. Cotte, K. de Ruyter, and D. I. Keeling, GOSIP in cyberspace: Conceptualization and scale development for general online social interaction propensity, Journal of Interactive Marketing, 28 (2014), pp. 87–100.
[8] J. Dai, H. Yan, T. Sun, P. Liu, and X. Qiu, Does syntax matter? A strong baseline for aspect-based sentiment analysis with RoBERTa, 2021.
[9] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, 2019.
[10] A Europe that protects: The EU steps up action against disinformation, 2018. http://europa.eu/rapid/press-release_IP-18-6647_en.htm.
[11] A. Giachanou and F. Crestani, Like it or not: A survey of Twitter sentiment analysis methods, ACM Computing Surveys (CSUR), 49 (2016), pp. 1–41.
[12] T. Hatmaker, YouTube bans David Duke, Richard Spencer and other white nationalist accounts, 2020. https://techcrunch.com/2020/06/29/youtube-ban-stefan-molyneux-david-duke-white-nationalism/.
[13] J. Kapusta, P. Hájek, M. Munk, and Ľ. Benko, Comparison of fake and real news based on morphological analysis, Procedia Computer Science, 171 (2020), pp. 2285–2293. Third International Conference on Computing and Network Communications (CoCoNet'19).
[14] M. Kusner, Y. Sun, N. Kolkin, and K. Weinberger, From word embeddings to document distances, in Proceedings of the 32nd International Conference on Machine Learning, F. Bach and D. Blei, eds., vol. 37 of Proceedings of Machine Learning Research, Lille, France, 07–09 Jul 2015, PMLR, pp. 957–966.
[15] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019.
[16] P. Mika, Ontologies are us: A unified model of social networks and semantics, Journal of Web Semantics, 5 (2007), pp. 5–15.
[17] T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, 2013.
[18] C. T. Nguyen, Echo chambers and epistemic bubbles, Episteme, 17 (2020), pp. 141–161.
[19] M. Pavlíček, T. Filip, and P. Sosík, Zrec.org - psychosocial phenomena studies in cyberspace, in ITAT 2020: Information Technologies – Applications and Theory, 2020, pp. 209–216.
[20] O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional networks for biomedical image segmentation, 2015.
[21] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, Fake news detection on social media: A data mining perspective, SIGKDD Explor. Newsl., 19 (2017), pp. 22–36.
[22] C. Sindermann, J. D. Elhai, M. Moshagen, and C. Montag, Age, gender, personality, ideological attitudes and individual differences in a person's news spectrum: how many and who might be prone to "filter bubbles" and "echo chambers" online?, Heliyon, 6 (2020), p. e03214.
[23] C. Sun, L. Huang, and X. Qiu, Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence, arXiv preprint 1903.09588, (2019).
[24] H. Tian, C. Gao, X. Xiao, H. Liu, B. He, H. Wu, H. Wang, and F. Wu, SKEP: Sentiment knowledge enhanced pre-training for sentiment analysis, 2020.
[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, 2017.
[26] D. Wadden, U. Wennberg, Y. Luan, and H. Hajishirzi, Entity, relation, and event extraction with contextualized span representations, 2019.
[27] X. Wang, Y. Jiang, N. Bach, T. Wang, Z. Huang, F. Huang, and K. Tu, Improving named entity recognition by external context retrieving and cooperative learning, 2021.
[28] D. Weissenborn et al., Jack the Reader - a machine reading framework, in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL) System Demonstrations, July 2018.
[29] B. Xu, Q. Wang, Y. Lyu, Y. Zhu, and Z. Mao, Entity structure within and throughout: Modeling mention dependencies for document-level relation extraction, 2021.
[30] A. Yadav and D. K. Vishwakarma, A comparative study on bio-inspired algorithms for sentiment analysis, Cluster Computing, 23 (2020), pp. 2969–2989.
[31] I. Yamada, A. Asai, H. Shindo, H. Takeda, and Y. Matsumoto, LUKE: Deep contextualized entity representations with entity-aware self-attention, 2020.
[32] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le, XLNet: Generalized autoregressive pretraining for language understanding, 2020.
[33] L. Zhang, S. Wang, and B. Liu, Deep learning for sentiment analysis: A survey, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8 (2018), p. e1253.
[34] N. Zhang, X. Chen, X. Xie, S. Deng, C. Tan, M. Chen, F. Huang, L. Si, and H. Chen, Document-level relation extraction as semantic segmentation, 2021.
[35] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, BERTScore: Evaluating text generation with BERT, 2020.