CEUR Workshop Proceedings Vol-2962, paper 48. https://ceur-ws.org/Vol-2962/paper48.pdf
                           ZREC architecture for textual sentiment analysis

                                              Martin Pavlíček, Tomáš Filip, and Petr Sosík

                    Institute of Computer Science - Faculty of Philosophy and Science - Silesian University in Opava
                                                   martin.pavlicek@fpf.slu.cz

Abstract: We present recent results of the research project ZREC, aimed at the analysis of psycho-social phenomena (group polarization, belief echo chambers and confirmation bias) based on bio-inspired computing methods. We present two updated pipeline solutions for working with bio-inspired AI methods and data gathering tools, integrated in a complex (but simple to implement) vertical information system. The scope of the investigated phenomena is reduced to aspect-based sentiment analysis with an integration of methods covering named entity recognition and relation extraction. We present a simple ontology addition reflecting group polarization in the last year due to the COVID pandemic, and we stress the importance of the project in the social and IT spheres and of multi-tier cooperation. We also provide introductory results based on test data using several deep learning architectures, demonstrating that the presented approach is robust and functional.

Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

In recent years we have seen a dramatic increase in interaction between individuals and groups in cyberspace [7], together with news dissemination [2] and real-time reporting, as well as increasingly polarized groups presenting their narratives and beliefs [13] in cyberspace.

We can also see processes of regulation [12] [3] and enforcement of specific narratives, which are not only due to the novel worldwide COVID situation. Together with cybersecurity, national interests are aligned with the acceptance of information as a weapon and of an information warfare battlefield [5] [6] [4] [10].

These premises motivate us to investigate and build tools to understand the flow of information in cyberspace in a more open and rigorous manner. To keep the project manageable, we restrict our investigation to information about event exposures and specific sentiment reactions (positive, negative, neutral) which arise in an individual and which can be traced to group behavior. We focus on three phenomena – group polarization [1], belief echo chambers [18] [22] and confirmation bias [21]. Besides the interaction, we monitor world events through the GDELT dataset, which is viewed as a trigger of sentiment responses.

The goal is to investigate these phenomena and maintain an open system ZREC (www.zrec.org) and its cornerstones – algorithms, a research community and methods which can be used for further work both in the scope of IT and in applied research. We focus on understanding these phenomena within a specific ecosystem - a nation, a language, a selected group of sources and other parameters. Put simply, we can analyze approval or disapproval of world events which occur as information in cyberspace or within interactions of individuals who act on the surface Internet.

The paper is organized as follows: in the next section, we describe a novel project architecture based on pipelined tasks. The data pre-processing phase is described in Section 3. Section 4 presents details of the key project component - aspect-based sentiment analysis - and the experimental results we have obtained with our architecture using three different deep learning models. The last two sections contain the discussion and conclusions.

2 Project description

2.1 Pipeline

In a pipeline view of the system we introduce two pipeline solutions which cover both the data and the AI methodology integration model. This division is needed to track changes, to track learning data and their ability to create a narrative bias, and to share these metadata within the developer community.

The first pipeline covers the implementation and training of ML methods for the NLP analysis. In this pipeline we store and train specific models on our live data, and we also store pre-trained models and analyze the results. At any time we can access a specific version of a model together with the specified data, which provides feedback and a possible rollback in the system's development.

The second pipeline focuses on data gathering, cleanup and storage. To exploit different sources and different social networks like Facebook, GAB, Twitter, Parler and others, we maintain a set of tools which are used to gather data from predefined sources within a defined algorithm. The data are cleaned, meta-annotated and stored in the system.

Further work with the data is possible within the common batch analysis framework (described below) which is available to the users (Figure 1).

2.2 Architecture

We can describe the state of the system as a scalable vertical architecture which has emerged from the initial phase.
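The versioning idea behind the first pipeline (every trained model is stored together with a fingerprint of the exact training-data snapshot, so results can be traced and rolled back) can be sketched roughly as follows. The registry layout and all names here are our illustrative assumptions, not the project's actual implementation:

```python
import hashlib
import json

# Minimal sketch of a model registry pairing every trained model with a
# fingerprint of the data snapshot it was trained on. Hypothetical code,
# not ZREC's real implementation.
class ModelRegistry:
    def __init__(self):
        self._versions = []  # append-only history of (model, data) pairs

    @staticmethod
    def _fingerprint(training_data):
        # Stable hash of the training data, so a model version can always
        # be traced back to the same snapshot.
        blob = json.dumps(training_data, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def register(self, model_params, training_data):
        version = len(self._versions)
        self._versions.append({
            "version": version,
            "model": model_params,
            "data_hash": self._fingerprint(training_data),
        })
        return version

    def get(self, version):
        return self._versions[version]

    def rollback(self, version):
        # Discard newer versions; the surviving entry still carries the
        # hash of the snapshot it was trained on.
        self._versions = self._versions[: version + 1]
        return self.get(version)
```

`register()` returns a version number that can later be passed to `get()` or `rollback()`; comparing the `data_hash` of two versions shows at a glance whether they were trained on the same snapshot.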
Figure 1: A pipeline view of the system architecture with an optional batch analysis

In the scope of technology, we work with scripting languages for creating the application part of the system; relational (SQL) and graph databases are used to store the data and to provide the basic architecture. For presentation we use the concept of a web information system and a visual front-end framework library to present the front end of the system to end users.

Our goal is to create a complex yet relatively simply implementable system (Figure 2). The architecture can be divided into two parts. The first part is an administrative and methodical system. The second part is the data part combined with AI methods. The key components of the system are data collection and tagging, NLP methods and a dataset warehouse, group and individual ontology graphs, and a common scheduler of system analytical tasks.

Bio-inspired methods training ground is used to store specific (mostly deep learning) AI methods [33, 30], with pre-selected training data and specific iterations of pre-trained methods, as an essential part of our system. This part of the system gives us the ability to strongly support the integration of new bio-inspired learning models and the emergence of updates of both the models and the specific data which were used to train them. From our experiments we see a strong tendency to acquire a specific bias when training our models on live data from certain sources. This is, e.g., the effect of echo chambers present in the sources we gather data from. The ability to snapshot model training data and model definitions is essential.

Data gathering and tagging is a part of the system focusing on the definition of selected sources and individuals, as well as selected methods and algorithms to gather the predefined text data. We focus on a simple definition of selectors and the ability to self-heal within error spaces.

For a survey of possible methods we refer the reader to, e.g., [11]. For instance, we use the Twint tool¹ (Twitter Intelligence Tool) to collect data from Twitter using the Python language while bypassing the need to use the Twitter API. With the help of this tool we can select queries for specific users and specify the time period for which we want to collect all available data. Our gathered data include posts, comments and user interactions, including related metadata. The advantage of this tool is its ability to process data without using Twitter's API.

Ontology is used as the main data structure to define groups and individuals. A comprehensive definition of an ontology of captions is a strong tool to solve complex situations of similarity and anomaly detection. We use a relational database to store a predefined static ontology of captions transformed into a graph network [16], which is then used for computational purposes.

Batch analysis defines the framework of analysis methods in the system. The system is built to handle multiple tasks from multiple users on multiple data sources. Batch analysis provides a robust system of common analytical queries which can be used as a simple batch scheduler. This definition of tasks gives us the ability to store specific combinations of data, users and methods which altogether control the analysis. In the user scenario this gives us the ability to cache and speed up processes and to keep a pool of results which can be used for further comparisons and cross-checks.

Information system core is the meta-programming language we use to build the system. The base of the information system has the ability to render data pages, to check global and parametric permissions, and to define users, their roles and their history. The core gives us the ability to tweak the system, to view it with the permissions of other roles and users, and to provide a transparent model of access to all the data and all subsystems.

Specific data module interrelates data sources and events gathered from the surface Internet. Information about events is obtained through the GDELT² dataset in the CAMEO format. Further specific datasets (textual and numerical) are being integrated into the system - currently the storage of COVID cases from authoritative sources (Johns Hopkins University³).

Translation module defines the roles of translators who can access the system and proceed with translation from/to different languages, increasing the system's accessibility.

3 Data pre-processing

In this section we describe a series of recent known methods for text feature extraction which are (or will be) used in our architecture to pre-process input data for the experiments described in the next section.

3.1 Creating a dataset

It is necessary to label the collected data for further processing. Manual data labeling is time consuming. Tomáš Mikolov et al. (2013) [17] introduced the method Word2Vec,

¹ https://github.com/twintproject/twint
² https://www.gdeltproject.org/
³ https://github.com/owid/covid-19-data/tree/master/public/datag
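The cleanup and meta-annotation step of the second pipeline might look like the following sketch. The field names and the choice of stored metadata are illustrative assumptions, not the system's actual schema:

```python
import re
from datetime import datetime, timezone

def clean_post(raw_text, source, author):
    """Toy cleanup + meta-annotation step: normalize the text and wrap it
    with provenance metadata before storage. Illustrative only."""
    text = re.sub(r"https?://\S+", "", raw_text)   # drop URLs
    text = re.sub(r"\s+", " ", text).strip()       # collapse whitespace
    return {
        "text": text,
        "source": source,                          # e.g. "twitter", "gab"
        "author": author,
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "mentions": re.findall(r"@(\w+)", text),   # simple meta-annotation
        "hashtags": re.findall(r"#(\w+)", text),
    }
```

A record produced this way carries enough provenance (source, author, collection time) for the batch analysis framework to filter by ecosystem later.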
which projects each word into a multidimensional feature vector. This projection allows us to use vector algebra tools to measure the distance between words. If we are able to determine how semantically similar individual words are, we can use this technique to measure the relevance of texts. One such tool for a document similarity metric is the Word Mover's Distance (WMD, Kusner et al. 2015) [14]. WMD finds the minimum distance needed to transport all words from a source document to a destination document. Because this method uses pre-trained embeddings, WMD allows us to find a relationship between texts that do not share the same words but have a similar meaning. The Relaxed Word Mover's Distance [14] further reduces the time complexity of WMD from O(p^3 log p) to O(p^2), where p denotes the number of unique words in the texts. This technique allows us to find the most relevant texts for a given query and thus to streamline the process of creating a training dataset.

Figure 2: Architecture of the system

3.2 Named Entity Recognition (NER)

One of the essential functions of natural text processing models is to correctly predict named entities and the relationships between them. This capability is important for tasks that use named entities, such as Question Answering (QA) or entity Relation Extraction (RE). Models handling contextual information have brought significant improvements for NER. Yamada et al. (2020) [31] added an entity-aware self-attention mechanism and entity type embeddings to their model. They also added a pre-training task in which a certain number of entities are replaced with a special [MASK] token in order to predict these entities. This model has achieved the most accurate results on tasks working with entities: NER, relation classification and entity typing. Wang et al. (2021) [27] used a search engine to find texts semantically similar to the input text. To evaluate similar texts they used BertScore (Zhang et al. 2020) [35], which measures the cosine similarity between tokens of the given texts. The concatenation of the input document and the documents returned from the search engine is used to train the model. The assumption is that both output distributions should be similar. This is done by updating the loss function. This model achieved the highest score on 8 different NER datasets from different domains.

3.3 Relation Extraction

Apart from NER, another important task for text comprehension is to classify the relationships between entities. Xu et al. (2021) [29] added the Structured Self-Attention Network (SSAN) to the Transformer deep learning architecture. The SSAN model incorporates the Biaffine Transformation or the Decomposed Linear Transformation, which creates the structure S_ij. This structure represents the connection between words w_i and w_j and makes it possible to classify the type of link between entities and to discover co-reference structures. Wadden et al. (2019) [26] introduced the multi-task framework DYGIE++ for three information extraction tasks: RE, NER and event extraction. The basis is a pre-trained NLP model. Its outputs are sent to a graph propagation module, which modifies the representation by integrating the current representation with previous representations using a gating function. The resulting predictions are obtained from the re-contextualized representation using a scoring function. It contains two feed-forward neural nets (FFNN): the final outputs are equal to FFNN(g_i) for NER and FFNN([g_i; g_j]) for RE, where g_i and g_j are the representations of spans i and j. A different approach was used by Zhang et al. (2021) [34], who applied the U-Net model (Ronneberger et al. 2015) [20], known from computer vision, to find global relationships between entities. First, they created an entity-level relation matrix. Entity similarity was calculated using a similarity-based method (concatenating cosine similarity, element-wise similarity and bilinear similarity) or a context-based method (entity-aware attention). The feature vectors form a matrix M of shape i x j x d, where i and j indicate a relation between the i-th and j-th entity and d is the size of the feature vector. This matrix is fed to the U-Net model, where d serves as the feature channel. The resulting relation type probabilities are obtained using a feedforward network, the entity pair embedding and the output of the U-Net model.

4 Experimental results

4.1 Aspect Based Sentiment Analysis (ABSA)

ABSA is a method for classifying text polarity. In contrast to coarse sentiment analysis, it makes it possible to determine sentiment at a fine-grained level. The analyzed document may be related to several independent aspects, and each of these aspects may have a different sentiment. Thus ABSA can be divided into two separate tasks: first, finding all aspects which occur in the sentence; second, predicting the sentiment of each aspect.
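To make the two-stage decomposition of ABSA concrete, here is a deliberately naive lexicon-based sketch; both lexicons and the scoring are toy assumptions for illustration only, not the neural models evaluated in this paper:

```python
# Naive two-stage ABSA sketch: stage 1 detects which aspects a sentence
# mentions, stage 2 assigns a polarity from the matched opinion words.
# ASPECTS and POLARITY are toy lexicons, assumed for illustration.
ASPECTS = {"price": ["price", "cheap", "expensive"],
           "safety": ["safe", "dangerous", "crime"]}
POLARITY = {"cheap": "positive", "expensive": "negative",
            "safe": "positive", "dangerous": "negative", "crime": "negative"}

def absa(sentence):
    tokens = sentence.lower().replace(",", " ").split()
    result = {}
    for aspect, cues in ASPECTS.items():
        hits = [t for t in tokens if t in cues]        # stage 1: aspect present?
        if hits:                                       # stage 2: per-aspect polarity
            polarities = [POLARITY[t] for t in hits if t in POLARITY]
            result[aspect] = polarities[0] if polarities else "neutral"
    return result
```

A sentence such as "The area is cheap but dangerous" then yields one polarity per detected aspect, which is exactly the output shape the neural models in Section 4.2 are trained to produce.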
Various methods have been proposed to solve this task. One of the classic solutions is the formation of a dependency tree. Devlin et al. (2019) [9] introduced the BERT model, built on the Transformer architecture (Vaswani et al. 2017) [25]. BERT was created to capture the right and left context of a word, and it has been used as a backbone in many ABSA models. BERT was pre-trained to predict tokens in sentences that were artificially corrupted: some randomly selected words in a sentence were replaced by a special [MASK] token. A disadvantage of this pre-training task is the loss of the context between the masked words.

This problem is solved by the XLNet model (Yang et al. 2020) [32], which learns contextual information from all permutations of the factorization order. This method ensures that contextual information from all possible positions of the right and left context is used.

Liu et al. (2019) [15] introduced the Robustly optimized BERT approach (RoBERTa), which has been pre-trained on a more robust data corpus than BERT using larger batch sizes. However, its pre-training tasks do not directly incorporate text sentiment determination.

Tian et al. (2020) [24] introduced the self-supervised SKEP method for pre-training the BERT model. Instead of randomly selected words as in BERT, words related to sentiment or aspects are selected for replacement with the [MASK] token. The model predicts the word polarity and the masked sentiment words. Models pre-trained using this method achieved better performance than baseline models.

Dai et al. (2021) [8] used the Perturbed Masking method, which searches for syntactic connections in a pre-trained BERT model to create an induced tree.

Finally, Sun et al. (2019) [23] used two different inputs for the pre-trained BERT model. The first input is a sentence from the dataset and the second input is an auxiliary sentence. The auxiliary sentence contains the target and the aspect. Using these two inputs, the model predicts the resulting polarity. This method transforms the ABSA task into a QA task.

4.2 Experiments with BERT, XLNet and RoBERTa

To evaluate the capabilities of our pipeline architecture, we performed a series of experiments based on the test dataset Sentihood, which is publicly available at https://github.com/uclnlp/jack/tree/master/data/sentihood as a part of the project Jack the Reader (JACK) [28]. The Sentihood dataset contains opinions about living in various locations in London, UK. In particular, there are 2480 training samples (opinions) with positive sentiment and 921 with negative sentiment, i.e., 3401 in total. Instead of processing the whole ABSA pipeline, we used a predefined subset of aspects which we wanted to predict in the collected data, and we created an appropriate set of auxiliary sentences. The disadvantage of using an auxiliary sentence for predicting polarities is the need for repeated predictions for each aspect.

In our experiments we tested our architecture with the pre-trained BERT, XLNet and RoBERTa deep learning models. The hyperparameters of the models were set as follows: number of training epochs 150–200, batch size 48, learning rate 1e-5, optimizer: Adam with weight decay. The numerical scores of the training of the three models are summarized in Table 1, where AUC stands for the Area Under Curve score.

    Model      Parameters    Precision    Recall       AUC
    BERT        5 701 889       0.9996    0.9504    0.9918
    XLNet       6 368 001       0.9996    0.9460    0.9939
    RoBERTa     5 701 889       0.9987    0.9383    0.9917

Table 1: Training results of the ML models BERT, XLNet and RoBERTa on the Sentihood dataset with targeted auxiliary sentences.

A graphical comparison of the results of the three models is presented in Figures 3 and 4. We can conclude that all three models provided rather impressive results and that the textual analysis in our ZREC architecture proves applicable to the real-world data which we are now collecting.

5 Discussion

5.1 Retrospective look

The project ZREC defines two areas of importance – for society and for IT. Both can be served by creating an open distributed ecosystem which can be used to understand emerging phenomena. This is now even more important as in the last year we saw a world transformed by COVID restrictions, and so the need to understand cyberspace phenomena and their influence on society grows ever more urgent as communication moves into cyberspace. We see this as a clear trend and motivation for the project.

The IT research side is more profound, since we want to develop, integrate and implement state-of-the-art AI methods aimed at natural language understanding in specific areas. Hence the function of the project as a strongly defined sandbox serving as an integration tool for a plethora of specific methods from the NLP field is both effective and promising.

5.2 Trends and main ontology themes

In our previous publication [19] we defined a main ontology based on basic polarization which defined entities in our information ecosystem, like sentiment towards: the Czech Republic, the United States, Russia, Israel, Ukraine, political figures from the United States, Russia and also the Czech Republic, intelligence agencies like the CIA, FSB, GRU, BIS
Figure 3: Training progress of the ML models BERT, XLNet and RoBERTa on the Sentihood dataset during 200 training
epochs.




        Figure 4: Prediction success of the ML models BERT, XLNet and RoBERTa on the Sentihood dataset.
and others. This ontology together with the used sources       of new self-pretraining methods specifically designed for
is a key factor in creating an individual or group profile.    sentiment classification. Promising solutions for ABSA
   A new communication topic with an enormous socio-           can be based on auxiliary sentences and attention model
economic impact and an adequate amount of hoax and fake        usage.
news has emerged: vaccination, COVID restrictions and
COVID pandemic acknowledgment. These topics are (to-           6   Conclusion
gether with topics covering national security and politics)
in the center of interaction covering basic events emerg-      We have presented an updated ZREC project (www.zrec.
ing in the cyberspace. With our modular architecture we        org) whose aim is the analysis of psycho-social phenom-
can continue to follow individual and group responses and      ena (group polarization, belief echo chamber and confir-
polarizations based on the interaction in the field of vac-    matory bias) in the surface Internet. These phenomena are
cination narrative with just an addition of new terms to       analyzed in the context of reactions (positive, negative) to
our existing ontology. In accordance, we added to our          information about local and world events. Our primary
ontology sentiment to specific vaccines (Pfizer, Moderna,      sources are social networks, and discussions and comment
Astra Zeneca, Sputnik, NovaVax), specific medical terms        boards within web pages. A part of the project focuses
like SARS, Spike-protein, RNA, sentiment towards the ef-       on analysis, visualization and dissemination of informa-
ficiency and need of vaccination.                              tion about events at the surface Internet.
                                                                  We have also presented a novel architecture in the
5.3   Industry and research feedback                           scheme of two pipeline solutions. The first pipeline covers
                                                               AI methods used for NLP tasks, training and data manage-
Our system is not scaled for harvesting all available data     ment. The second pipeline covers data gathering, storage,
on social networks and surface Internet. We stress that we     cleaning and simple meta-annotation. Main tasks run in
focus on specific datasets and specific ecosystems that are    a batch mode via an open analytic toolbox. First experi-
used as the main observation point for the phenomena we model and try to understand. More specifically, we see value in a transparent definition of datasets and in the description of sources, both in the system and internally within the research community. We thus see the ZREC system also as a tool presenting some basic methodologies for selecting and describing the sources from which data are obtained.
   We still regard the creation of a universal AI crawler which can process data collections from various sources as very important, but in the core development we focus more on the creation of the NLP AI pipeline which can be used to understand the phenomena.
   We expect that our project would benefit from multi-tier cooperation with research centers, universities and industry partners. This is confirmed by the response of potential beneficiaries, and we use the academic space also as a call for a joint initiative incorporating people, IT resources, and internal information and ecosystem knowledge.

5.4   Progress and upcoming tasks

The ZREC system is being developed under the SCRUM methodology. The complexity of the development was reduced by clustering the system into the subsystems mentioned above. An efficient way of dealing with data and models was the introduction of the two pipeline solutions providing an open tool set.
   Incorporating AI models suitable for NLP tasks is human-intensive with respect to acquiring state-of-the-art ideas, and NLP training is also demanding in terms of IT resources. For this reason we focus on integrating the ontology-based solution with prepared data, which is the most cost-effective way to achieve results. As a next step we will focus on the development and incorporation [...]

Experimental results based on the test dataset Sentihood proved the efficiency of our architecture, which is now prepared to process larger-scale datasets acquired from the Internet.
   Our recent research focuses on the task of aspect-based sentiment analysis (ABSA). We see a clear promise in building a strong ontology of entities and relations which can detect both standard narratives related to key topics (national security, politics, COVID, etc.) and anomalies.
   Further research work lies mainly in the development and implementation of new ABSA methods, and in the definition of new transformations of data into multi-dimensional spaces allowing for their better understanding. Finally, the crucial step is data acquisition focused on the currently active narratives in cyberspace which are at the center of our studies.

Acknowledgements

The research was supported by the Silesian University in Opava under the Student Funding Scheme, project SGS/9/2019, by the Student Grant Foundation, project SGF/5/2020, and by the European Union under the European Structural and Investment Funds Operational Programme Research, Development and Education, project "Zvýšení kvality vzdělávání na Slezské univerzitě v Opavě ve vazbě na potřeby Moravskoslezského kraje" ("Improving the quality of education at the Silesian University in Opava in relation to the needs of the Moravian-Silesian Region"), CZ.02.2.69/0.0/0.0/18_058/0010238.
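To make the ABSA task concrete: on Sentihood-style data, the system must decide, for each (target entity, aspect) pair in a sentence, whether the expressed sentiment is Positive, Negative, or None. The following is a deliberately simplified, lexicon-based sketch of that decision, not the transformer models actually evaluated in the project; the cue and polarity word lists are illustrative assumptions.

```python
# Toy aspect-based sentiment analysis (ABSA) in the Sentihood style:
# for each (target entity, aspect) pair, output Positive / Negative / None.
# Lexicons below are illustrative placeholders, not the project's resources.

POSITIVE = {"great", "good", "safe", "cheap", "excellent", "lovely"}
NEGATIVE = {"bad", "dangerous", "expensive", "noisy", "awful"}

# Hypothetical map from Sentihood aspect names to surface cue words.
ASPECT_CUES = {
    "safety": {"safe", "dangerous", "crime"},
    "price": {"cheap", "expensive", "price"},
}

def classify(sentence: str, target: str, aspect: str) -> str:
    """Return 'Positive', 'Negative' or 'None' for a (target, aspect) pair."""
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    if target.lower() not in tokens:
        # The target entity is not mentioned, so no sentiment applies.
        return "None"
    # Keep only opinion words that are cues for the requested aspect.
    hits = [t for t in tokens if t in ASPECT_CUES.get(aspect, set())]
    if any(t in POSITIVE for t in hits):
        return "Positive"
    if any(t in NEGATIVE for t in hits):
        return "Negative"
    return "None"
```

Note how the same sentence yields different labels per aspect, e.g. `classify("LOCATION1 is very safe but expensive", "LOCATION1", "safety")` is Positive while the `price` aspect is Negative; this per-aspect decomposition is what distinguishes ABSA from plain sentence-level sentiment.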