<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a Cross-article Narrative Comparison of News</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harith Alani</string-name>
          <email>h.alani@open.ac.uk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alistair Willis</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martino Mensio</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff2">
          <label>2</label>
          <institution>The Open University</institution>,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>In the world of public misinformation, there are many cases where the information is not false or fabricated, but rather has been manipulated using more subtle techniques such as word replacements, selection of details, omissions and argument distortion. These techniques can have the effect of influencing the reader's frame of mind towards the events reported. We currently lack the necessary tools to uncover such manipulations automatically. In this position paper, we propose an integrated analysis framework and pipeline to identify various narrative signals in news articles, such as structural roles, framing, and subjectivity. By comparing these at the document level and sentence level, it will be possible to highlight differences in the narrative techniques used to report the same news events.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Narrative analysis refers to the processing of a piece of text to understand and characterise its structure [Rie93].
Such an analysis could help to distinguish between event reports based on their narrative structure. This structure is
usually reflected through linguistic signals that can be more or less explicit, such as emphasising certain aspects,
changing the order in which certain information is presented, or using specific terminology to impose or stress a
certain opinion.</p>
      <p>In the specific case of news articles, their narrative structure usually follows a complex non-chronological
sequence, which tends to differ from other kinds of narrative that proceed more linearly [ZZBBN19]. It is a
choice that is made to “get a good story” [Bel05], and can be exploited to emphasise or introduce non-objective
statements or causality relationships between events [Dah10].</p>
      <p>To avoid being manipulated, one solution suggested in the literature is to gather information from multiple
sources [ABS14, GAR97], and to cross-compare them in order to get a broader view of the event. The same
information, for example, may be presented by some sources and omitted by others, or the sequence of events may
be presented differently to emphasise different aspects. Therefore, we believe that readers should be made more
aware of the narrative and framing embedded in the piece of news they are consuming, and how they compare
with those in other articles reporting the same event. Currently, there are hardly any automated tools that offer
such functionality: the best readers can do is to use news aggregators that show articles grouped by events, but
they have to do such comparison on their own.</p>
      <p>In this position paper, we suggest a framework to automatically highlight the differences in how the same
story is presented by different articles, by cross-comparing their narratives. To this end, the contributions of this
paper are: i) integrating several signals characterising the narrative of news; ii) presenting a processing pipeline
to link together similar articles at the document and sentence level, integrating the signals identified; and iii)
introducing a set of cross-article signals that aim to highlight the differences in the narrative techniques applied.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>In this section, we provide an overview of previous studies in two areas of research. The first is the investigation of
relationships between news articles, which aims to find documents that cover the same information. The second is the
detection of narrative linguistic signals, which investigates and characterises several aspects of structure, framing,
and subjectivity. For both areas, we gather a set of techniques that enable our approach, described in
Section 3.</p>
      <sec id="sec-2-1">
        <title>Relationships between news articles</title>
        <p>There are different possible types of relationships between news articles, such as similarity (covering the same
information), referencing (one article citing another), and temporal proximity. They can be established at the
document level (e.g., the whole article is similar to another one), at the sentence level (e.g., the same sentence is
corroborated by a sentence in another article [BMTH18]) or even at the paragraph level. Since we are interested
in finding articles discussing the same information, we focus on similarity relationships. Other relationships
could add interesting features, such as the order of publication, which would help to identify which of the articles
might have taken inspiration from the other. For the time being, we focus on studying and understanding the
role of similarity.</p>
        <p>At the article level, there is a wide variety of work that investigates article clustering, and the methods most
commonly used are Latent Dirichlet Allocation (LDA) and document embeddings. LDA [BNJ03] is the most widely used technique for
topic modelling, as it allows topics to be discovered and articles to be grouped accordingly using word distributions.
Another technique for grouping articles together is to compute a similarity measure (e.g., cosine similarity)
between numeric representations of the documents (TF-IDF [Jon72] or Language Models [DCLT18, CYyK+18,
YDY+19]). We plan to study these models in order to select the one that can efficiently discriminate articles
that talk about the same events, even if they use different wording, from articles that may use the same subset
of words but talk about different events.</p>
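        <p>As a minimal sketch of the second technique, the following standard-library Python snippet builds smoothed TF-IDF vectors and compares documents with cosine similarity. The function names, the smoothing formula and the three toy documents are assumptions made for illustration only; they are not the models or data this paper plans to evaluate.</p>
        <preformat preformat-type="code">
```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF (an assumed variant): terms that occur in every
        # document keep a non-zero weight instead of vanishing.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy documents invented for this example.
articles = [
    "The virus has killed 56 people and infected thousands in Wuhan.",
    "The virus, which has killed 56 people, started in Wuhan.",
    "The football club signed a new striker before the weekend match.",
]
vectors = tfidf_vectors(articles)
same_event = cosine_similarity(vectors[0], vectors[1])
other_event = cosine_similarity(vectors[0], vectors[2])
print(round(same_event, 2), round(other_event, 2))
```
        </preformat>
        <p>A purely lexical measure like this one rewards shared vocabulary, which is exactly the weakness noted above: it cannot relate articles that describe the same event with different words.</p>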
        <p>Furthermore, there are works that not only link the articles at the document level, but also investigate in more
detail the connections between sentences. In one recent work [BMTH18], groups of similar articles are found,
then broken down into pieces of information and analysed to find whether these details are corroborated (occurring in
multiple documents) or omitted (occurring in other documents of the same group, but not the current one).
We aim to use this idea of applying similarity at both the article level and the sentence level, extending it even to the
word level. By doing so, we might not only be able to recognise which sentences appear in multiple documents
(with different degrees of similarity), but also be able to identify the specific words that have been
changed.</p>
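        <p>The word-level step can be illustrated with Python's standard difflib module, which aligns two word sequences and reports the spans that differ. This is only a sketch under the assumption that the two sentences have already been linked; the function name changed_words and the example sentences are hypothetical.</p>
        <preformat preformat-type="code">
```python
from difflib import SequenceMatcher


def changed_words(sentence_a, sentence_b):
    """List (words_in_a, words_in_b) pairs where two linked sentences differ."""
    words_a = sentence_a.split()
    words_b = sentence_b.split()
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    changes = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        # Opcodes tagged "equal" are shared spans; everything else is a change.
        if tag != "equal":
            changes.append((words_a[a_start:a_end], words_b[b_start:b_end]))
    return changes


# Example sentences invented for this illustration.
print(changed_words(
    "officials are hunting down recent travellers",
    "officials are contacting recent travellers",
))
# prints [(['hunting', 'down'], ['contacting'])]
```
        </preformat>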
        <p>However, this set of approaches is limited to bringing the linked pieces of information to the attention of the reader
with a measure of similarity, without characterising the differences. The reader would then need to evaluate
the differences in the role of the sentence, the framing that it implies and how it compares with other sentences
in terms of subjectivity. Different documents may express the same set of details, but give them a different role
(reporting an action, commenting, contextualising, doing a digression, identifying causes and consequences) and
use different words that are semantically similar but may imply a different framing perspective. For this reason,
the next subsection presents a set of narrative linguistic signals that could provide us with the missing features.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Narrative linguistic signals</title>
        <p>There is much research on exposing the narrative using linguistic signals [ZZBBN19], with specific words that
indicate the structural role, framing and subjectivity of the part of text they belong to. One limitation is that
most of such works are applied to single articles, with little comparison between them.</p>
        <p>On the one hand, some research considers the structural role of a sentence in the document (e.g., is it providing
some background, the main event, an evaluation). Different structural roles have been defined in the literature,
such as news schema [Bel91], which identifies hierarchical categories (e.g., action, reaction, consequence, context,
history), narrative structure [Bel05] (e.g., abstract, orientation, evaluation, complication, resolution), or linguistic
signals [ZZBBN19, Mar00]. Such signals could be used to identify the differences between similar sentences with
regard to their structural roles in the articles.</p>
        <p>On the other hand, there is much literature on framing, defined as how a certain story is presented to shape
mass opinion [Gof74], the addition to the underlying facts that reflects the sociocultural context and acts as
an underlying force to persuade the reader. The work by [GM89] describes a set of framing packages, made
of framing devices (e.g., word choice, metaphors, catchphrases, use of contrast, quantification) and reasoning
devices (e.g., problem definition, cause, consequence, solution, action). Additionally, the Frame Semantics
Theory [Fil06] can be used to recognise lexical units of known frames. By extracting these linguistic signals, we
could represent the framing behind a certain piece of text, and there exist different approaches to extract the
listed features [MGB+17, GCCZ18, Asg16, STDS17].</p>
        <p>In addition to these two characterisations, we can add other signals derived from studies on subjectivity. As
found by recent research, in contemporary journalism the line between opinion and facts is blurring more and
more [JWJ+19]. For this reason, having signals of subjectivity on the document and paragraph-level would be
very useful [Liu10]. In this way, each article and each paragraph can be characterised with an indication of
subjectivity.</p>
        <p>All these features have been used in previous research, but as mentioned above, they are mainly applied to
single-article analysis. Extending this kind of analysis by taking into consideration the relationships both at the
article level and the sentence level would make a substantial contribution by providing contrastive signals that would
not come up otherwise.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Cross-article comparison framework</title>
      <p>In this section, we describe our comparison framework. We plan to use methods coming from
both the research areas identified (document linking and linguistic signals) as a starting point. In order to do
so, we propose the following processing pipeline:</p>
      <list list-type="bullet">
        <list-item><p>preprocessing: documents are retrieved, cleaned up and fragmented into paragraphs and sentences;</p></list-item>
        <list-item><p>narrative features: features of three main types are attached to each document, paragraph and sentence.
The structural role is identified using and highlighting the linguistic devices provided by [ZZBBN19]; framing features
(framing and reasoning devices) are extracted by finding some linguistic representatives from [GM89, Fil06];
subjectivity is computed, and strong word choices are highlighted [Liu10];</p></list-item>
        <list-item><p>linking: similar articles are found by using document-level similarity measures, which makes it
possible to find groups of documents that describe the same events; similar sentences and paragraphs are then
found by sentence-level similarity measures inside each group of documents, so that corroborated and omitted
sentences are identified [BMTH18].</p></list-item>
      </list>
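      <p>The preprocessing and linking steps above can be sketched as follows. This is an illustrative outline only: the names Sentence, preprocess, jaccard and link_sentences are hypothetical, word-overlap (Jaccard) similarity stands in for whichever sentence-level similarity model is eventually selected, the threshold of 0.5 is arbitrary, and the narrative-feature step is left as an empty placeholder.</p>
      <preformat preformat-type="code">
```python
import re
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    # Placeholder for narrative features (structural role, framing,
    # subjectivity); this sketch does not compute them.
    features: dict = field(default_factory=dict)


def preprocess(article_text):
    """Split a cleaned article into sentences on end-of-sentence punctuation."""
    parts = re.split(r"[.!?]+\s*", article_text)
    return [Sentence(part.strip()) for part in parts if part.strip()]


def jaccard(text_a, text_b):
    """Word-overlap similarity, a simple stand-in for a sentence encoder."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    union = words_a | words_b
    return len(words_a.intersection(words_b)) / len(union) if union else 0.0


def link_sentences(article, other, threshold=0.5):
    """Mark each sentence of `article` as corroborated by `other` or unique."""
    links = []
    for sentence in article:
        best = max(other, key=lambda s: jaccard(sentence.text, s.text), default=None)
        score = jaccard(sentence.text, best.text) if best is not None else 0.0
        status = "corroborated" if score >= threshold else "unique"
        links.append((sentence.text, status, round(score, 2)))
    return links


# Toy articles invented for this example.
first = preprocess("The virus has killed 56 people. Travellers were shocked at the airport.")
second = preprocess("The virus has killed 56 people in China. The NHS is ready to respond.")
for link in link_sentences(first, second):
    print(link)
```
      </preformat>
      <p>Running the same function with the arguments swapped lists the sentences of the second article that the first one omits.</p>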
      <p>Figure 1 shows the result of such processing over two articles, where we have several features attached to the
sentences, with similar paragraphs across the two articles linked together using a similarity measure [CYyK+18].</p>
      <p>[Figure 1: two articles on the coronavirus outbreak, shown side by side. The first is titled “BRITAIN ON EDGE”;
the second is titled “China coronavirus: 'Increased likelihood' of cases in the UK”. The title and paragraphs of each
article are annotated with framing signals (e.g., risk("on edge"), hunt("hunting down"), surprise("shocked"),
danger("killed"), confidence("is ready")), subjectivity and polarity labels (e.g., objective, subjective, neutral,
negative) and a structural role (main focus, Action, Background, Commentary, Conclusion). Linked paragraphs are
labelled with their similarity score (e.g., similarity = 0.57).]</p>
      <p>This is the starting point to identify the differences, with a contrastive analysis. We propose here a set of
cross-article comparative signals that can bring the narrative analysis a step further:</p>
      <list list-type="bullet">
        <list-item><p>The main focus of the compared articles is on a different part or detail of the story: this means that
while they are both describing the same broad event, they are trying to emphasise or prioritise two different
aspects. This signal can be computed by looking at the most similar sentence to the article title (a proxy for
the emphasis), and seeing how it is represented in other documents.</p></list-item>
        <list-item><p>Ordering: the compared articles present the same details, but in a different order. Re-ordering events tends
to be an efficient way of creating implicit cause-effect relationships. To do this comparison, it is sufficient
to find the crossovers in the sentence-level connections.</p></list-item>
        <list-item><p>Selection of details: one article is omitting certain details that have been reported by other articles, or
is describing events that are corroborated by other sources, or has unique parts that do not occur in other
articles [BMTH18]. In addition to seeing which parts are selected or omitted, the narrative analysis can help
us to find some insights about them (e.g., the article is omitting subjective statements reported by others,
or is describing a background event that others did not include).</p></list-item>
        <list-item><p>The articles are framing the narrative in different ways from each other. This can be observed by comparing
linked sentences in terms of their framing features: the considered articles are describing
the same events but with different framing and reasoning. One concrete example is the usage of causality:
one article may contain causality signposting between a pair of sentences that is absent elsewhere. As
another example, the usage of specific words can reveal a specific framing: when talking about the same detail
or entity, the usage of verbs or adjectives may change. For detecting such peculiarities, features such as Named
Entities and subjectivity may be combined.</p></list-item>
        <list-item><p>The comparison can also be done on the subjectivity of the article: either at the document level (e.g., noting
that one article is an opinion piece, while a similar one is more factual) or at the sentence level, by interweaving
this signal with the ones proposed before.</p></list-item>
      </list>
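      <p>The ordering signal, for instance, reduces to finding pairs of sentence-level links that cross. A minimal sketch, assuming the links are already available as pairs of sentence positions (the function name find_crossovers and the example links are hypothetical):</p>
      <preformat preformat-type="code">
```python
from itertools import combinations


def find_crossovers(links):
    """Return pairs of links whose sentences appear in opposite order.

    `links` holds (index_in_article_a, index_in_article_b) pairs.
    """
    crossovers = []
    for first, second in combinations(sorted(links), 2):
        # After sorting, `first` comes no later than `second` in article A;
        # the links cross when article B presents them the other way round.
        if second[0] > first[0] and first[1] > second[1]:
            crossovers.append((first, second))
    return crossovers


# Sentence 1 of article A is linked to sentence 3 of article B, and so on.
links = [(0, 0), (1, 3), (2, 1), (3, 2)]
print(find_crossovers(links))
# prints [((1, 3), (2, 1)), ((1, 3), (3, 2))]
```
      </preformat>
      <p>An empty result means that both articles present the linked details in the same order.</p>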
      <p>From the signals in Figure 1, we can see that the first article pushes the narrative towards risk and other
negative frames, to support the idea presented in the title “Britain on Edge”. The second article, even though
it has a lot of information in common with the first one, is more confident about the preparedness of the National
Health Service to face the virus (e.g., confidence, expertise). The extraction of these cross-article signals is
the first step towards finding possible cases of manipulation.</p>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>The evaluation of this framework needs to be performed at different levels. Firstly, we need to find a similarity
model that performs well both at the article and sentence levels, going beyond the linguistic surface and being
able to relate pieces of text that may use different terms for describing the same events. The evaluation of
the similarity measure will be done at the article level using data coming from tools that aggregate articles
talking about the same events, such as Google News Headlines<sup>1</sup> and AllSides,<sup>2</sup> as well as research datasets such
as NewsAggregator.<sup>3</sup> For the sentence-level similarity, instead, user feedback will be needed to understand when
and why a sentence is considered to describe the same detail, even when the manipulations involved are
significant.</p>
      <p>Following that, we would also need to evaluate the whole framework with user studies to understand the
relevance, quality and usefulness of the indicators proposed. Currently, and to the best of our knowledge, there
are no similar approaches to the task we are addressing in this paper, and hence we are unable to establish
comparisons with other baseline approaches from the literature.</p>
      <p><sup>1</sup> https://news.google.com/</p>
      <p><sup>2</sup> https://www.allsides.com/story/admin</p>
      <p><sup>3</sup> http://archive.ics.uci.edu/ml/datasets/News+Aggregator</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>Much research exists that addresses the problem of misinformation. However, the vast majority of such research
focuses on distinguishing what is true from what is false, and hence mainly applies to a small subsection of the
misinformation ecosystem.<sup>4</sup> There is a lack of research on identifying misleading content, false connection and
false context. To this end, there is an immediate need for technological solutions to address such cases, where the
information is manipulated in a subtle fashion, and thus cannot be easily dismissed as false. We want to reveal
the differences in reporting, without declaring that one article contains true or false information, but rather to
provide a tool that exposes such differences.</p>
      <p>In this paper we proposed a comparative approach that aims to bring to light the differences in the
narratives of news articles, using a set of cross-article narrative signals. These signals only exist when multiple
documents are compared, in contrast to single-article ones that already exist. With this method, we aim to
reveal the framing intentions of the writers, and make them more evident and comparable.</p>
      <p>This analysis may be useful for empowering users to form a critical view of pieces of news they are consuming,
to find missing pieces that have been omitted and to see the same information presented with a different framing
by different articles and sources.</p>
      <p><sup>4</sup> https://firstdraftnews.org/latest/fake-news-complicated/</p>
      <sec id="sec-5-1">
        <title>Acknowledgements</title>
        <p>This work is partially supported by EU H2020 Project Co-Inform (grant no. 770302).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref6">
        <mixed-citation>[ABS14]</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[Asg16] Nabiha Asghar. <article-title>Automatic extraction of causal relations from natural language texts: a comprehensive survey</article-title>. <source>arXiv preprint arXiv:1605.07895</source>, <year>2016</year>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[Bel91] Allan Bell. <source>The language of news media</source>. Blackwell Oxford, <year>1991</year>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[Bel05] Allan Bell. <article-title>News stories as narratives</article-title>. <source>The language of time: a reader</source>, page <fpage>397</fpage>, <year>2005</year>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[BMTH18] Dimitrios Bountouridis, Mónica Marrero, Nava Tintarev, and Claudia Hauff. <article-title>Explaining credibility in news articles using cross-referencing</article-title>. In <source>SIGIR workshop on ExplainAble Recommendation and Search (EARS)</source>, <year>2018</year>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[BNJ03] David M Blei, Andrew Y Ng, and Michael I Jordan. <article-title>Latent dirichlet allocation</article-title>. <source>Journal of Machine Learning Research</source>, <volume>3</volume>(Jan):<fpage>993</fpage>-<lpage>1022</lpage>, <year>2003</year>.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[CYyK+18] Daniel Cer, Yinfei Yang, Sheng yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil. <article-title>Universal sentence encoder</article-title>. <source>arXiv preprint arXiv:1803.11175</source>, <year>2018</year>.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[GCCZ18] Ge Gao, Eunsol Choi, Yejin Choi, and Luke Zettlemoyer. <article-title>Neural metaphor detection in context</article-title>. <source>arXiv preprint arXiv:1808.09653</source>, <year>2018</year>.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[GM89] William A Gamson and Andre Modigliani. <article-title>Media discourse and public opinion on nuclear power: A constructionist approach</article-title>. <source>American Journal of Sociology</source>, <volume>95</volume>(1):<fpage>1</fpage>-<lpage>37</lpage>, <year>1989</year>.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[Gof74] Erving Goffman. <source>Frame analysis: An essay on the organization of experience</source>. Harvard University Press, <year>1974</year>.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[Jon72] Karen Sparck Jones. <article-title>A statistical interpretation of term specificity and its application in retrieval</article-title>. <source>Journal of Documentation</source>, <year>1972</year>.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[JWJ+19] Jennifer Kavanagh, William Marcellino, Jonathan Blake, Shawn Smith, Steven Davenport, and Mahlet Gizaw Tebeka. <source>News in a Digital Age: Comparing the Presentation of News Information over Time and Across Media Platforms</source>. Rand Corporation, <year>2019</year>.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[Liu10] Bing Liu. <article-title>Sentiment analysis and subjectivity</article-title>. In <source>Handbook of Natural Language Processing, Second Edition</source>, pages <fpage>627</fpage>-<lpage>666</lpage>. CRC Press, <year>2010</year>.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[Mar00] Daniel Marcu. <source>The theory and practice of discourse parsing and summarization</source>. MIT Press, <year>2000</year>.</mixed-citation>
      </ref>
      <ref id="ref1">
        <mixed-citation>[MGB+17] Arpan Mandal, Kripabandhu Ghosh, Arnab Bhattacharya, Arindam Pal, and Saptarshi Ghosh. <article-title>Overview of the FIRE 2017 IRLeD track: Information retrieval from legal documents</article-title>. In <source>FIRE (Working Notes)</source>, pages <fpage>63</fpage>-<lpage>68</lpage>, <year>2017</year>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[Rie93] Catherine Kohler Riessman. <source>Narrative analysis</source>, volume <volume>30</volume>. Sage, <year>1993</year>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[STDS17] Swabha Swayamdipta, Sam Thomson, Chris Dyer, and Noah A. Smith. <article-title>Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold</article-title>. <source>arXiv preprint arXiv:1706.09528</source>, <year>2017</year>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[YDY+19] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. <article-title>Xlnet: Generalized autoregressive pretraining for language understanding</article-title>. In <source>Advances in neural information processing systems</source>, pages <fpage>5754</fpage>-<lpage>5764</lpage>, <year>2019</year>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[ZZBBN19] Iqra Zahid, Hao Zhang, Frank Boons, and Riza Batista-Navarro. <article-title>Towards the automatic analysis of the structure of news stories</article-title>. In <source>Text2Story@ ECIR</source>, pages <fpage>71</fpage>-<lpage>79</lpage>, <year>2019</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>