Estimating Credibility of News Authors from their WIKI Validated Predictions

Navya Yarrabelly, DSAC, IIIT Hyderabad (yarrabelly.navya@research.iiit.ac.in)
Kamalakar Karlapalem, DSAC, IIIT Hyderabad (kamal@iiit.ac.in)

Copyright © 2018 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. In: D. Albakour, D. Corney, J. Gonzalo, M. Martinez, B. Poblete, A. Vlachos (eds.): Proceedings of the NewsIR'18 Workshop at ECIR, Grenoble, France, 26-March-2018, published at http://ceur-ws.org

Abstract

In this paper, we consider a set of articles or reports by journalists and others in which they predict or promise something about the future. The problem we approach is determining the credibility of the authors based on whether their predictions come true. The two specific problems we address are extracting the predictions from the articles and annotating them with various prediction attributes. We then determine the truth of these predictions, using Wikipedia as a credible source from which to extract relevant facts that can ascertain their validity. We propose and build an end-to-end system for automated prediction validation (APV) by extracting future speculations and predictions from news articles and social media. We considered 28 news articles, extracted 97 predictions from them, and the credibility scores (F-scores) for these articles range from 0.57 to 0.71.

1 Introduction

In newspaper articles, many journalists evaluate the current state of affairs and predict possible future scenarios. [6] estimates from their investigations that nearly one-third of news articles contain predictive statements. Therefore, it is imperative to determine the passages, sentences and phrases of news articles that predict future scenarios. A person well versed in reading articles can easily determine the predictability aspects of a news article, and over time gains some assurance about which articles or news agencies correctly predict some of the future scenarios. It is important and necessary, therefore, to enhance our ability to computationally determine the credibility of journalists based on their ability to predict future scenarios correctly. As a step in this direction, we take up the automatic verification of predictive statements against facts collected from credible information sources. This task of machine reading at scale combines the difficulty of relevant article retrieval (finding the significant facts) with that of machine perception of content (entailment of predictions from facts).

Consider the following prediction, published on date 'd'.

Example: The Reserve Bank of India may lower the economic growth projection for 2017-18 to 6.7 per cent later this month, from its August forecast of 7.3 per cent, in view of issues with GST implementation and lower kharif output estimates.

In this predictive sentence, we have to precisely extract and validate only the predictive part, "The Reserve Bank of India may lower the economic growth projection for 2017-18 to 6.7 per cent"; "in view of issues with GST implementation and lower kharif output estimates" is the premise on which the prediction is made, and "from its August forecast of 7.3 per cent" is a supporting clause. The reference future date for this prediction, "later this month", is translated to the actual date 'd+30'. The facts relevant to the predictive part that are published after the target date 'd+30' are extracted to determine the entailment relation from fact to prediction.
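The paper does not spell out the rules used for this date translation; the following is a minimal sketch of how such relative temporal references could be resolved against a publication date. The `OFFSETS` table is hypothetical, standing in for whatever temporal-expression identifier the system actually uses.

```python
from datetime import date, timedelta

# Hypothetical day offsets for a few relative temporal expressions;
# the paper's actual temporal-expression rules are not specified.
OFFSETS = {
    "later this month": 30,
    "next week": 7,
    "by the end of the year": 365,
}

def resolve_target_date(expression: str, published: date) -> date:
    """Translate a relative future reference to an absolute target date,
    e.g. 'later this month' published on date d becomes d + 30 days."""
    days = OFFSETS.get(expression.lower())
    if days is None:
        raise ValueError(f"unknown temporal expression: {expression}")
    return published + timedelta(days=days)

# Usage: the RBI prediction published on 2017-10-01 gets target date 2017-10-31.
print(resolve_target_date("later this month", date(2017, 10, 1)))
```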
Contributions: The main contributions of the proposed approach are:
(1) To translate predictions to structured queries, we annotate the predictions with a wide range of attributes (in Table 1). These attributes can further be used by an IR system to retrieve predictions made in reference to a future time period, targeting an event, etc.
(2) We also report a timeline story of the facts relevant to a prediction, with analysis, and the fact sources confirming the truth of the prediction. News IR systems can also produce recommendations or follow-up links for an article being read, based on the predictive attributes and the timeline of facts extracted.
(3) We propose an approach to tackle open-domain prediction validation using Wikipedia as the sole knowledge source.

2 Related Work

Research has in the past focused on how to answer questions, but has not devoted attention to discerning the accuracy of predictions and promises. To the best of our knowledge, [5] is the only work focused on estimating the validity of predictions, which it does by calculating the cosine similarity between predicted news and the relevant events that actually occurred. We instead offer semantic and syntactic analysis based on the structure of relation triplets in a predictive sentence, and incorporate domain-specific knowledge into the system. Moreover, their retrieval model is limited to a manually collected set of topics contained in predictions. Though applications of future-oriented information retrieval have been studied by a number of researchers, the problem of validating predictions from a Natural Language Understanding perspective has received limited study. [7] presents a search engine for future and past events relevant to a user's query. [3] automatically generates summaries of future events related to queries; their methods rely on extracting and processing statements containing future temporal references. [6] retrieves and ranks predictions that are relevant to a news article using four features: term similarity, entity-based similarity, topic similarity, and temporal similarity.

Relevance to Fact Checking and QA Systems: To some extent our problem can be compared with fact checking and question answering systems. Though research has been done on truth assessment of factual statements relying on iterative peer voting, leveraging language to infer the accuracy of fact candidates has only just begun. [14] calculates the credibility of an uncertain fact by comparing it with other related facts; fact validity is estimated from the co-occurrence degree of the doubted object and predicate, relying on page counts for web queries. [4] and [8] propose converting a fact-checking question into a set of factoid-style questions and validating the answers against those retrieved by factoid question answering systems. Our problem differs from existing fact checking and question answering systems in its retrieval problem, as we have to validate only the predictive part of a sentence and retrieve the relevant facts that occurred within the implicit temporal constraints imposed.

3 Predictions

3.1 Predictions Extraction

From each article, we annotate the sentences as predictive or factual using the implementation from [15]. It also identifies the predictive phrase in the prediction and resolves the scope of the prediction in a complex sentence.
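The classifier of [15] is not reproduced here. Purely as an illustration of the task, a naive baseline flags sentences carrying future-oriented modal or speculative cues; the cue list below is hypothetical and is not the feature set of [15].

```python
import re

# Hypothetical future/speculation cues; [15] uses a learned model, not this list.
FUTURE_CUES = re.compile(
    r"\b(will|would|may|might|is likely to|is expected to|predicted to|promise[sd]?)\b",
    re.IGNORECASE,
)

def looks_predictive(sentence: str) -> bool:
    """Naive baseline: a sentence counts as 'predictive' if it contains a future cue."""
    return bool(FUTURE_CUES.search(sentence))

print(looks_predictive("The Reserve Bank of India may lower the growth projection."))  # True
print(looks_predictive("The RBI lowered the growth projection in August."))           # False
```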
3.2 Semantic Graph Model for Predictive Sentence Simplification

News articles often contain long and syntactically complex sentences, with relevant dependent relations spanning various clauses. We need to identify constituents that commonly supply no more than contextual background information. Inspired by work on sentence simplification using relation graphs (https://github.com/Lambda-3/Graphene) and syntactic sub-structures [11, 1], we follow a syntax-based sentence simplification approach to determine such constituents and to annotate predictions with various attributes. We construct a Triplet-Level Semantic Graph Model (TLSGM), in which relation triplets are the vertices and the semantic relationships between the triplets govern the edges. From the TLSGM, we identify the core triplets of the predictive part of the sentence and dis-embed the other, peripheral triplets w.r.t. the head predictive phrase extracted in Section 3.1. Only these core triplets are then validated to determine the accuracy of the prediction.

Vertices: Vertices in the TLSGM represent (subject, predicate phrase, object) relation triplets extracted from the prediction.

Edges: An edge between two nodes N1 -> N2 represents the semantic relation of node N2 w.r.t. node N1. Edges can be formed either from the subject or object of a node to another node describing or modifying the noun phrase of that subject/object, following the rules for noun descriptors, while edges formed from a predicate to another node follow the verb descriptor rules given below. We illustrate the descriptor rules using the following example sentences.

1. Example 1: Mary Kom, who won Bronze at London Olympics, still has a fifty-fifty chance of gaining a wildcard entry to the 2016 Rio Olympics. (Mary Kom, has, fifty-fifty chance) is the head predictive triplet (H).

2. Example 2: The Reserve Bank of India is likely to leave interest rates unchanged in order to keep the inflation rate controlled.

Rules for Noun Descriptors

Modifiers and dependents of the head of the noun phrase of either the subject or object of a triplet are discussed below, categorized by dependency relation.

acl:relcl, appos: A relative clause modifier from the head noun of an NP to the head of a relative clause. The clause introduced by this dependency only gives additional information on the noun phrase and does not remark on the future predictive action, which is our focus of interest. Example 1 has the relation acl:relcl(Kom, won) from the subject of node H, and the edge between H and N2: (Mary Kom, won, Bronze at London Olympics) is only an additional descriptor of H. Node N2 and its edges are pruned from the graph.

acl: An adjectival clause introduced by a noun.
• If the dependent is a verb and it has no subject, it takes the object of the governor. Example 1 has the relation acl(chance, gaining) from the object of H, and the edge between H and N2: (fifty-fifty chance, gaining, wildcard entry to the 2016 Rio Olympics) further specifies the predictive action of H; hence node N2 is retained in the graph.
• If the dependent is an adjective, it only describes the subject/object. This relation is also used for optional depictives, which modify the nominal of which they provide a secondary predication. Example 2 has the relation acl(rates, unchanged) from the object of H, and the edge between H and N2: (interest rates, unchanged) acts as a qualifier reference for the entities contained in the prediction.
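As a minimal sketch of the underlying data structure (our own illustration, not the authors' implementation), the TLSGM for Example 1 can be held in a labeled directed graph: triplets are nodes, and each edge records the dependency rule that created it and whether the descriptor is kept or pruned.

```python
import networkx as nx

# Triplets are stored as (subject, predicate, object) tuples.
H  = ("Mary Kom", "has", "fifty-fifty chance")                 # head predictive triplet
N2 = ("Mary Kom", "won", "Bronze at London Olympics")          # relative clause descriptor
N3 = ("fifty-fifty chance", "gaining",
      "wildcard entry to the 2016 Rio Olympics")               # specifies predictive action

tlsgm = nx.DiGraph()
tlsgm.add_edge(H, N2, rule="acl:relcl", keep=False)  # background info: pruned
tlsgm.add_edge(H, N3, rule="acl", keep=True)         # kept as a core descriptor

# Core triplets = head plus neighbors reachable through 'keep' edges.
core = {H} | {v for _, v, d in tlsgm.out_edges(H, data=True) if d["keep"]}
print(core)
```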
Rules for Verb Descriptors

xcomp: An open clausal complement (xcomp) of a VP, without its own subject, whose reference is determined by an external subject.
• If the governor of the relation contains an object of its own, the clause introduced by xcomp provides attributes to the relation contained in the governor predicate and acts as a purpose or consequence clause. Ex: Microsoft share values may go down by 10 dollars to give space to the new iPhone launch. We create an edge (Microsoft share values, may go down, by 10 dollars) -> (, give, space to the new iPhone launch), governed by the relation xcomp(go, give).
• If the governor of the relation does not contain an object, the dependent predicate modifies the head predicate. We modify the predicate of the current node to include the dependent predicate connected by the xcomp relation. Example 2 has the relation xcomp(likely, leave); we modify H to (The Reserve Bank of India, is likely to leave, interest rates unchanged).

ccomp: A clausal complement of a verb is a dependent clause with an internal subject which functions like an object of the verb or adjective. The clause introduced further describes the future course of action referred to by the governor predicate. Ex: Modi promised that Indian GDP growth rate would cross 8% this year. This has the relation ccomp(promised, cross), which adds an edge from (Modi, promised, ) to (GDP growth rate, would cross, 8%).

advcl: An adverbial clause modifier of a VP or S is a clause modifying the verb to introduce a temporal, consequence, conditional or purpose clause, adding specificity to the head clause. Example 2 has the relation advcl(leave, keep), which adds an edge from H to triplet N2: (RBI, to keep, inflation rate controlled). The validity of the predictive sentence should be determined regardless of the truth of the purpose/conditional clause; hence node N2 and its edges are discarded from the graph.

3.3 Prediction Attributes

Each node in the TLSGM is further classified and labeled with reference to the root node, i.e., the head prediction node of the graph. We determine the characteristics of the following constituents using a number of syntactic features (dependency relation types, constituency-based parse trees, as well as POS and NER labels). Attributes: Action; Event; Event Location; Event Time; Purpose/Consequence of the predictive action; Premise; Conditional clause; Qualifier Reference, which adds specificity attributes of the entities involved in the prediction; Numeric Quantifier Reference; and Certainty Perspective, which isolates predictive stances taken by an author from third-party voices that the author presents.
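A minimal sketch of how these annotations could be carried alongside a prediction is given below. The field names are ours, mirroring the attribute list above; the paper's Table 1 defines the authoritative set.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PredictionAnnotation:
    # Core predictive content
    action: str                                 # head predictive phrase, e.g. "may lower"
    event: Optional[str] = None                 # e.g. "economic growth projection for 2017-18"
    event_location: Optional[str] = None
    event_time: Optional[str] = None            # e.g. "later this month"
    # Supporting structure
    purpose_or_consequence: Optional[str] = None
    premise: Optional[str] = None               # e.g. "in view of issues with GST implementation"
    conditional_clause: Optional[str] = None
    qualifier_refs: List[str] = field(default_factory=list)
    numeric_quantifier_refs: List[str] = field(default_factory=list)
    certainty_perspective: Optional[str] = None  # author's own stance vs third-party voice
```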
4 Extracting Relevant Facts

In this section, we describe our system for Automatic Prediction Validation (APV), which consists of three components: (1) a keyword selection module that selects keywords specific to the predictive part, dis-embedding the linguistic peripheral clauses identified in Section 3.2; (2) a Document Retriever module for finding facts relevant to the prediction; and (3) a machine comprehension model, the Document Reader, for ascertaining the accuracy of predictions from a small collection of relevant facts.

4.1 Keyword Selection

Obtaining the facts pertinent to a prediction is in itself a complicated problem. Predictions have event- and temporal-based constraints, clausal complements, appositives, relative clauses, etc. that add specificity to or modify the action of an event. To overcome the query drift introduced by these clauses, we further dis-embed the keywords expressing the time constraints, premise clauses, certainty perspective (annotated in Section 3.2) and the speculative words used. We identify the headword of the predictive phrase and use a rule-based approach to detect the predictive sentence fragments and to select keywords pertaining to the predictive action and its attributes in the sentence. Let K be the set of relation triplets; we add the head vertex of the graph (TLSGM) to K and recursively add selected nodes from its edges to K. We select nodes with edge labels corresponding to the Action, Event, Qualifier and Quantifier References described in Table 1. We then issue proximity queries in which subject, predicate and object must occur within a window of 7 words. We further expand the query set iteratively by adding purpose clauses, and expand the keywords in a query with their synonyms.

Example: For the predictive sentence "Lizzie Armitstead is predicted to win gold medal in cycling road race at the Rio Olympics."
Query: (Lizzie Armitstead ~ win ~ gold medal) OR (Lizzie Armitstead ~ win ~ cycling road race) OR (Lizzie Armitstead ~ win ~ Rio Olympics)
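A sketch of how such queries could be rendered in Lucene's phrase-proximity syntax, assuming a standard Lucene/Elasticsearch query parser (the paper does not give its exact query form):

```python
def proximity_query(subject: str, predicate: str, objects: list, window: int = 7) -> str:
    """Build a Lucene-style query: each (subject, predicate, object) triplet becomes a
    phrase whose terms must co-occur within `window` words (Lucene phrase slop)."""
    clauses = [f'"{subject} {predicate} {obj}"~{window}' for obj in objects]
    return " OR ".join(clauses)

q = proximity_query(
    "Lizzie Armitstead", "win",
    ["gold medal", "cycling road race", "Rio Olympics"],
)
print(q)
# "Lizzie Armitstead win gold medal"~7 OR "Lizzie Armitstead win cycling road race"~7 OR ...
```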
4.2 Candidate Relevant Facts Extraction from Wikipedia

To extract pertinent facts that can ascertain the accuracy of the predictions, we use Wikipedia as the knowledge source. Wikipedia's publicly available APIs (https://www.mediawiki.org/wiki/API:Search) give access to the revision history of each article, and its up-to-date knowledge marked with timestamps makes it a reliable source for event-based prediction validation. We use TagMe (https://tagme.d4science.org/tagme/) as a semantic interpreter that maps fragments of natural language text into a weighted sequence of Wikipedia concepts relevant to the input. Using the query set from Section 4.1, we extract the top 50 documents from a local Lucene index of the English Wikipedia dump. To further extract the relevant snippet from an article, we include only the article content whose revision dates fall within the time window referenced by the temporal constraints extracted for the validity of the prediction. We use the word2vec implementation in the Gensim Python library [13], with Wikipedia as the corpus, to generate embeddings representing contextual term vectors. Inspired by [10], we adapt the Zero Filter, Terms Filter, Exact Sequence Filter, Normalization Filter, N-grams Filter and Density Filter to extract and rank the relevant candidate facts from the retrieved articles. Additionally, we implement the following filters:

• Distance filter: Assigns a score to a fact based on the distance between the subject and object of each triplet in the prediction.
• Category filter: For all the annotated Wikipedia concepts in the prediction and facts, we build category vectors and assign a score based on the cosine similarity between the prediction category vector and the fact category vector.
• Wikipedia concept relevance: Cumulative pairwise similarity score of the Wikipedia concepts extracted from the prediction and from the fact's context in the Wiki article.
• Context similarity: Distributional semantic similarity score between words and phrases from the prediction and the fact.

From these candidate facts, we keep the top 100 facts sorted by their current score.

4.3 Validation of Predictions

Our approach translates the prediction and the fact into a semantic representation, incorporating knowledge from external sources, and then tries to determine whether the representation of the prediction is subsumed by that of the fact. We pass all (prediction, fact) pairs to two components: 1. the Relation Alignment for Textual Similarity Recognition (RATSR) framework, described below, and 2. an RTE system which performs rich syntactic analysis of the linguistic phenomena between the entailment pair.

The RATSR framework has three major components: 1. Preprocessor: prediction and fact pairs are annotated with a range of analytical tools. 2. Graph Generator: applies metrics to compare triplets in specified annotation views to generate a match graph over the prediction and fact constituents of the entailment pair. 3. Alignment Scorer: filters the edges in the match graph to focus on a scoring function based on the alignment output.

(1) Preprocessor: Sentence and word segmentation; POS tagging; dependency parsing; named entity recognition; co-reference resolution; temporal expression identifiers; Wikipedia concept annotator; multi-word expression identifiers (https://radimrehurek.com/gensim/models/phrases.html#id2); phrasal verb identifiers; quantifier and qualifier references. These resources are used for annotating both predictions and facts at the sentence level and the triplet level.

(2) Graph Generator: Similarity metrics are applied to the relevant constituent pairs drawn from the prediction and the fact. [2] uses relation triplet similarity, calculated across subject, verb and object pairs using PPDB [12], as a feature for stance classification. We construct a relation match graph (RMG) by iterating over each triplet in the prediction and the fact, calculating similarity over various views to give a similarity score between the two triplets being compared, and creating an edge with the similarity score as its weight. We propose methods for similarity between triplets for the various annotations mentioned in the preprocessing step.

• Triplet Similarity Score using Latent Semantic Analysis models (Score = S1): Adapting the implementation from [9] and using multiplication as the vector composition operator for phrases with more than one word, we define the similarity of SPO triplets using distributional models as given below. The probability that fact triplet t_f: (s_f, v_f, o_f) implies prediction triplet t_p: (s_p, v_p, o_p) is

  P(t_p -> t_f) = P(s_p | t_f)(1 - P(s_p)) + P(v_p | t_f)(1 - P(v_p)) + P(o_p | t_f)(1 - P(o_p))   (1)

  P(s_p | t_f) = P(s_p | s_f) + P(s_p | v_f) + P(s_p | o_f)   (2)

• Triplet Similarity Score using Lexical Semantic models (Score = S2): We calculate similarity scores between the subject, predicate and object pairs of the prediction and fact using synonym and antonym similarity from WordNet, PPDB and Wikipedia concept similarity; hyponym and hypernym similarity from the Wikipedia and WordNet taxonomy structures; the length of the path between the two entities in DBpedia; and numeric reference similarity. We then combine these scores into a cumulative lexical similarity score between the two triplets.

(3) Alignment: The goal of the alignment component is to decompose the text and hypothesis into semantic constituents and determine which prediction triplet should be aligned to which fact triplet. In contrast to aligning words [2] from prediction to fact, we align triplets: this exploits the semantic roles of the constituents, facilitates the analysis of which specific prediction attributes (in Table 1) are matched in the fact, and allows validation against a cluster of relevant facts. We use a maximum-weight perfect bipartite graph matching algorithm to align triplets from the prediction to relevant triplets from the facts; a sketch of this scoring and matching is given at the end of this subsection.

From the similarity scores obtained from the RATSR framework and the RTE system, we set threshold limits to label each entailment pair as true, false or unrelated.
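The sketch below illustrates equations (1)-(2) together with the alignment step, under our own simplifications rather than the authors' code: the distributional probabilities P(x | y) are stood in for by clipped cosine similarities of word2vec-style vectors, the marginals P(s_p), P(v_p), P(o_p) by a fixed prior, phrase composition is omitted, and the maximum-weight matching is realized with the Hungarian algorithm from SciPy (which minimizes cost, hence the negated weights).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def sim(u: np.ndarray, v: np.ndarray) -> float:
    """Stand-in for P(x | y): cosine similarity clipped to [0, 1]."""
    c = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return max(0.0, c)

def triplet_score(tp, tf, prior=0.5):
    """Eqs. (1)-(2): score that fact triplet tf implies prediction triplet tp.
    tp and tf are (subject, verb, object) embedding triples; `prior` stands in
    for the marginals P(s_p), P(v_p), P(o_p), which we do not estimate here."""
    score = 0.0
    for x in tp:                                   # x ranges over s_p, v_p, o_p
        p_x_given_tf = sum(sim(x, y) for y in tf)  # eq. (2)
        score += p_x_given_tf * (1.0 - prior)      # eq. (1)
    return score

def align(pred_triplets, fact_triplets):
    """Maximum-weight bipartite matching of prediction triplets to fact triplets."""
    W = np.array([[triplet_score(tp, tf) for tf in fact_triplets]
                  for tp in pred_triplets])
    rows, cols = linear_sum_assignment(-W)         # negate: SciPy minimizes cost
    return list(zip(rows, cols)), W[rows, cols].sum()

# Toy usage with random 50-d embeddings, two triplets on each side.
rng = np.random.default_rng(0)
pred = [tuple(rng.normal(size=50) for _ in range(3)) for _ in range(2)]
fact = [tuple(rng.normal(size=50) for _ in range(3)) for _ in range(2)]
print(align(pred, fact))
```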
5 Results & Discussion

Dataset Preparation: We collected two datasets, one of predictions in the sports domain and the other of campaign promises made by Barack Obama. We automatically extracted predictions from articles on the Rio Olympics from 6 sites (A: https://www.eurosport.co.uk, B: http://edition.cnn.com/, C: https://www.foxsports.com.au, D: http://www.couriermail.com.au/, E: https://www.theguardian.com/, F: https://www.thehindu.com/news) and manually filtered the predictions that can be objectively evaluated and reduced to factoid questions. The resulting 'Olympics Predictions' dataset consists of 97 predictions made for various events in the trials for the Rio Olympics and the Rio Olympics 2016. We further manually annotated each prediction as true if it came true, and false otherwise. We collected the second dataset, 'Obama Promises', from PolitiFact (http://www.politifact.com/truth-o-meter/promises/obameter/browse/), where each promise is labeled 'broken', 'promise kept' or 'compromised'. We collected 257 such promises that can be objectively evaluated.

We ran our prediction validation system on the two datasets to label each prediction. Table 2 presents the credibility scores (normalized by the number of predictions) of the 6 sources we considered, and Table 3 compares the labels produced by our system against the actual labels.

Table 2: Credibility scores for the news sites

  Source | True Predictions | False Predictions | Factual Statements | Credibility Score
  A      | 14               | 7                 | 67%                | 29.3
  B      | 4                | 9                 | 88%                | 17.4
  C      | 7                | 6                 | 72%                | 31.1
  D      | 7                | 4                 | 69%                | 37.2
  E      | 7                | 10                | 71%                | 21.3
  F      | 9                | 13                | 66%                | 21.7

Table 3: Results for predictions validation

  Dataset        | TP  | TN | FP | FN | F-score
  Rio Olympics   | 37  | 29 | 20 | 9  | 0.718
  Obama Promises | 111 | 30 | 26 | 93 | 0.651
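The F-scores in Table 3 are consistent with the standard formula F1 = 2TP / (2TP + FP + FN), as a quick check confirms:

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from confusion counts: harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)

print(round(f1(37, 20, 9), 3))    # Rio Olympics   -> 0.718
print(round(f1(111, 26, 93), 3))  # Obama Promises -> 0.651
```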
Discussion: 'Obama Promises' contains multi-sentence predictions and requires more robust NLP modules to identify the main predictive clause that has to be validated, as distinct from other supporting predictive clauses (example: "Create a $10 billion fund to help homeowners refinance or sell their homes. The Fund will not help speculators, people who bought vacation homes or people who falsely represented their incomes"). The high false negative rate can be attributed to drift in both the fact retrieval module and the validation module caused by these insignificant predictive clauses. 'Rio Predictions' contains mostly event-based predictions, and the high false positive rate for this dataset is partly due to omitting explicit negative entity similarity in the context of a given prediction. For example, the entities 'Usain Bolt' and 'Wayde van Niekerk' are negatively related in the context of 'winning a medal at Rio Olympics'. This negative similarity should be translated into negative triplet similarity, and further into labeling the prediction-fact entailment pair as a contradiction. We plan to address this in future work by generating alternative statements for a prediction, automatically identifying the doubt unit in a sentence and filling it with relevant comparable entities/phrases.
References

[1] Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. Leveraging linguistic structure for open domain information extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 344–354, 2015.

[2] William Ferreira and Andreas Vlachos. Emergent: a novel data-set for stance classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1163–1168, 2016.

[3] Adam Jatowt, Kensuke Kanazawa, Satoshi Oyama, and Katsumi Tanaka. Supporting analysis of future-related information in news archives and the web. In Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 115–124. ACM, 2009.

[4] Hiroshi Kanayama, Yusuke Miyao, and John Prager. Answering yes/no questions via question inversion. In Proceedings of COLING 2012, pages 1377–1392, 2012.

[5] Kensuke Kanazawa, Adam Jatowt, and Katsumi Tanaka. Improving retrieval of future-related information in text collections. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01, pages 278–283. IEEE Computer Society, 2011.

[6] Nattiya Kanhabua, Roi Blanco, and Michael Matthews. Ranking related news predictions. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 755–764. ACM, 2011.

[7] Hideki Kawai, Adam Jatowt, Katsumi Tanaka, Kazuo Kunieda, and Keiji Yamada. ChronoSeeker: Search engine for future and past events. In Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication, page 25. ACM, 2010.

[8] Mio Kobayashi, Ai Ishii, Chikara Hoshino, Hiroshi Miyashita, and Takuya Matsuzaki. Automated historical fact-checking by passage retrieval, word statistics, and virtual question-answering. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 967–975, 2017.

[9] Dmitrijs Milajevs, Mehrnoosh Sadrzadeh, and Thomas Roelleke. IR meets NLP: On the semantic similarity between subject-verb-object phrases. In Proceedings of the 2015 International Conference on the Theory of Information Retrieval, pages 231–240. ACM, 2015.

[10] Piero Molino, Pierpaolo Basile, Annalina Caputo, Pasquale Lops, and Giovanni Semeraro. Exploiting distributional semantic models in question answering. In 2012 IEEE Sixth International Conference on Semantic Computing (ICSC), pages 146–153. IEEE, 2012.

[11] Christina Niklaus, Bernhard Bermeitinger, Siegfried Handschuh, and André Freitas. A sentence simplification system for improving relation extraction. arXiv preprint arXiv:1703.09013, 2017.

[12] Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 425–430, 2015.

[13] Radim Řehůřek and Petr Sojka. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, May 2010. ELRA. http://is.muni.cz/publication/884893/en.

[14] Yusuke Yamamoto and Katsumi Tanaka. Finding comparative facts and aspects for judging the credibility of uncertain facts. In Web Information Systems Engineering - WISE 2009, pages 291–305, 2009.

[15] Navya Yarrabelly and Kamalakar Karlapalem. Extracting predictive statements with their scope from news articles. In The 12th International AAAI Conference on Web and Social Media (ICWSM-18), submitted for publication.