=Paper=
{{Paper
|id=Vol-1926/paper6
|storemode=property
|title=A Prototype Method for Future Event Prediction Based on Future Reference Sentence Extraction
|pdfUrl=https://ceur-ws.org/Vol-1926/paper6.pdf
|volume=Vol-1926
|authors=Yoko Nakajima,Michal Ptaszynski,Hirotoshi Honma,Fumito Masui
}}
==A Prototype Method for Future Event Prediction Based on Future Reference Sentence Extraction==
<pdf width="1500px">https://ceur-ws.org/Vol-1926/paper6.pdf</pdf>
<pre>
                            A Prototype Method for Future Event Prediction
                            Based on Future Reference Sentence Extraction
           Yoko Nakajima†         Michal Ptaszynski‡     Hirotoshi Honma†         Fumito Masui‡
           †Department of Creative Engineering                ‡Department of Computer Science
            National Institute of Technology, Kushiro College    Kitami Institute of Technology
        2-32-1 Otanoshike, Kushiro, 084-0916, Japan         165 Koen-cho, Kitami, 090-8507, Japan
                {yoko,honma}@kushiro-ct.ac.jp           {ptaszynski,f-masui}@cs.kitami-it.ac.jp
                           Abstract

     This paper presents our study on a future prediction support method based on
     state-of-the-art natural language processing and pattern extraction techniques. We
     propose two practical applications of the method. The first is supporting future
     prediction in human users. The second is automatic future prediction based on
     provided data. We conduct experiments on the developed prototype method and evaluate
     its effectiveness. In the development of the method we assumed that sentences from
     official sources such as newspapers that refer to future events could be useful for
     future prediction. By using sophisticated patterns combining morphemes and semantic
     roles, we successfully extracted future reference sentences and effectively used
     them in future prediction performed both by human users and by the fully automatic
     prototype method. In the experiments we tested the method on a number of future
     prediction questions from the official Future Prediction Competence Test, performed
     yearly in Japan. The results of both the support application and the automatic
     method were higher than the results of the original test participants. Moreover, the
     prototype fully automatic method greatly outperformed all human users. This suggests
     that the method can be expected not only to reduce the response time and the amount
     of information needed for prediction in humans, but also to perform the prediction
     automatically on a level comparable to, and exceeding, that of an average human.

1   Introduction

One of the main tasks in the field of Artificial Intelligence (AI) is providing means for
understanding the reality around us and predicting possible outcomes of certain decisions.
The understanding is often restricted to analysis of the general meaning of a particular
instance of language behavior (e.g., sentences or documents). For example, in subfields of
AI such as Natural Language Processing (NLP) or Sentiment Analysis, speaker attitudes and
emotions expressed in a sentence are the focus of analysis. A different kind of task from
the field of NLP, which we focus on in this research, is predicting trends of future events
on the basis of limited provided information.
   In everyday life people often apply their knowledge and experience about past events as
well as their own general knowledge to predict future events. In such everyday predictions
people often use widely available resources (newspapers, Internet). In particular, people
who bear significant social responsibility, such as politicians, managers, strategists,
planning specialists, or policy developers in large companies, are in need of tools that
would support them in their predictions and decision making, since the company’s results
and profits depend on how accurate their competence in future trend prediction is.
   The goal of this study is to provide a tool that would help in such predictions. In
particular, we aim at creating a system that would perform such predictions automatically.
To achieve this we focus on sentences referring to the future as potentially useful in
predicting the future unfolding of events.
   When humans consider future events, they usually process information from various
domains and join it in one line of reasoning. For example, when a company finds out that a
decision they need to take depends on two events/factors, ‘X’ and ‘Y’ (although in real
life the number of factors is much larger), they can prepare four strategic decisions A, B,
C or D for their company management, depending on the predictions on what would happen in
the future when they select:
   1. Strategic decision ‘A’ when both events ‘X’ and ‘Y’ take place.
   2. Strategic decision ‘B’ when the event ‘X’ takes place but the event ‘Y’ does not.
   3. Strategic decision ‘C’ when the event ‘Y’ takes place but the event ‘X’ does not.
   4. Strategic decision ‘D’ when both events ‘X’ and ‘Y’ do not take place.
   When people select their actions from a range of possible options, they usually consider
and combine a wide spectrum of information, including their own and other people’s
experiences and expertise regarding the events. Obtaining such information for future
predictions is a challenging task requiring much time and labor, with a lot of information
to process and deep foresight ability for making the decision.
   When predicting future events such as X and Y, for example, “Will consumers do shopping
with augmented reality (AR) in two years’ time?” we can think of the following
possibilities: (1) More than half of consumers will shop with AR; (2) Several percent of
consumers will shop with AR; (3) The way of shopping will not change compared to the
current situation. This way we can reduce the problem of future prediction to predicting
which of the limited number of potential answers (two or more) has the highest probability
of occurring.


Prediction can thus be formulated as selecting the correct answer, even if the answer is
not yet specified at the time of prediction.
   Previous studies have shown that data mining using simple statistics can support such
predictions regarding future outcomes of events. However, to achieve that, one needs to
process large amounts of numeric data, which requires professional skills and the expertise
to explain the numbers in a comprehensible manner.
   There have also been studies on predicting future outcomes of events with the use of NLP
techniques. Some of them have proposed applying causality information and past events
[Radinsky et al. 2012], which assume that when the event A happens, the event B will
usually follow. However, such methods are usually limited to general events from the range
of widely perceived common sense (e.g., “what will happen when an apple falls on one’s
head”). Others applied methods based on keyword extraction with their occurrence
frequencies in a timeline using past events, temporal expressions and event-related
keywords [Kanazawa et al. 2011].
   In a different line of research, [Nakajima et al. 2016] have proposed a method for
automatic extraction of future reference sentences using combined morphological and
semantic (morphosemantic) information and suggested that future reference sentences could
be applied in supporting predictions about future events, since they usually contain
various related background information, which is also used as the source of knowledge for
prediction.
   In this research we propose a prototype method which applies such future reference
sentences in the process of future prediction. In the experiments we compare the
performance of laypeople and the proposed automated approach and discuss its effectiveness.
   The outline of this paper is as follows. In Section 2 we describe previous research
related to the prediction of future events. Section 3 describes the proposed method
applying automatic extraction of references to future events and the experiments evaluating
the method. Section 4 describes the experiment to verify the effectiveness of future
reference sentences applied to real-world future prediction events. Section 5 describes the
prototype method for automated prediction of event unfolding. Finally, Section 6 contains
conclusions and our plans for improvement of the prototype method and possible applications
of the proposed method in future tasks.

2   Previous Research

There have been a number of studies on the detection of linguistically expressed future
references.
   For example, [Baeza-Yates 2005] investigated half a million sentences containing future
events extracted from one day of Google News (http://news.google.com/), and found out that
scheduled events occur with high probability and with correlation between the occurrence of
an event and its time proximity. Therefore the information about upcoming events is of high
importance for predicting future outcomes.
   [Jatowt et al. 2009] also focused on news articles, and used a rate of incidence of
reconstructed news articles over time to forecast recurring events, which they used for
supporting human user analysis of occurring future phenomena.
   [Kanhabua et al. 2011], in their investigation of newspaper articles, found out that
one-third of all sentences contain some reference to the future.
   [Kanazawa et al. 2010] extracted implications for future information from the Web using
explicit information, such as time expressions.
   When it comes to predicting the probability of an event occurring in the future, [Jatowt
and Au Yeung 2011] have proposed a clustering algorithm for detecting future phenomena
based on the information extracted from a text corpus, and proposed a method of calculating
the probability of an event happening in the future.
   Also, [Kanazawa et al. 2011] extracted unreferenced future time expressions from a large
collection of text, and proposed a method for estimating the validity of the prediction by
searching for a real-world event corresponding to the one predicted automatically.
   [Aramaki et al. 2011] used an SVM-based classifier on Twitter to perform classification
of information related to influenza and tried to predict the spread of the disease by using
a truth validation method.
   [Kanazawa et al. 2011] proposed a method for estimation of validity of the prediction by
automatically calculating cosine similarity between predicted relevant news and searching
for the events that actually occurred.
   [Radinsky et al. 2012] proposed the Pundit system for prediction of future events in
news based on causal reasoning derived from a similarity measure calculated using different
ontologies.
   [Jatowt et al. 2013] studied relations between future news in English, Polish and
Japanese by using keywords queried on the Web.
   Recently, [Zhang et al. 2016] performed a variation analysis of the evolution of
technology for techniques to learn causality relations from past events for the extraction
of features that cause future changes.
   The above findings have led us to the idea that by using expressions referring to the
future included in newspaper articles it could be possible to support human users in the
process of future prediction as one of the activities people perform every day. Moreover,
by applying an appropriate reasoning algorithm, it could be possible to automate the
process and create a system capable of automatic future prediction. A method like that
could have a number of applications in various fields, such as corporate management, trend
foresight, and preventive measures. Also, as indicated by previous research, when applied
in real-time analysis of Social Networking Services (SNS), such as Twitter or Facebook, it
could also be helpful in disaster prevention or handling of disease outbreaks.
   As for practical applications used in future trend prediction, Stanford Temporal Tagger
(https://nlp.stanford.edu/software/sutime.html) converts natural language input such as
“next Wednesday at 3pm” into a particular calendar-based schedule such as
“2016-02-17 T 15:00” depending on the assumed current reference time. Similar is possible


with HeidelTime (http://dbs.ifi.uni-heidelberg.de/index.php?id=129) [Strötgen and Gertz].
   Methods like the above, using time-referring information such as “year”, “hour”, or the
more general “tomorrow”, have been applied before in extracting future information and
retrieving relevant documents. It has also been indicated that it is useful to predict
future outcomes by using information occurring in present documents. The main difference
from our research is that we focused not only on explicit, simple and obvious patterns,
such as time expressions, but on more sophisticated expressions, combining both
morphological and semantic information, and automatically extracted such morphosemantic
sentence patterns.

3   Automatic Extraction of Future Reference Sentences

In this section, we describe the method for extraction of future reference sentences from
news corpora.
   Future reference sentences include both explicit and implicit expressions referring to
the future. Explicit expressions include, e.g., future temporal expressions, or words and
phrases referring to the future (e.g. will∼, is expected to∼, plan to∼, etc.).
   However, many important sentences do not contain such explicit expressions, and the
information regarding future outcomes is implicit. See the example below regarding the
future of the dispatch of America’s army troops to Afghanistan.

“He rejoiced to hear that President Obama had reemphasized the need to focus on the War on
Terror in Afghanistan, increasing the likelihood of an early withdrawal of U.S. troops from
Iraq.”

The sentence does not contain any future referring expressions. Moreover, the sentence is
in past tense (“rejoiced”, “had reemphasized”), and therefore it is not possible to specify
that the sentence refers to the future by using standard methods. Yet, the sentence clearly
presents potential future outcomes (“withdrawal of U.S. troops from Iraq”) with the use of
implicit information.
   The method proposed here deals with both explicit and implicit information, such as the
above. It consists of two stages. Firstly, the sentences are represented in a
morphosemantic structure [Levin and Rappaport Hovav 1998] (a combination of semantic role
labeling and morphological information). Secondly, frequent combinations of such patterns
are automatically extracted from training data and used in classification.

3.1   Morphosemantic Patterns

In the first stage of the method, all sentences are represented in morphosemantic structure
(MS) for further extraction of morphosemantic patterns (MoPs).
   The idea of MS has been described widely in linguistics and structural linguistics.
[Levin and Rappaport Hovav 1998] distinguish morphosemantics as one of the basic types of
morphological operations on words, modifying the Lexical Conceptual Structure (LCS) of a
word.
   MoPs have been applied in analysis of an Indonesian suffix –kan [Kroeger 2007],
improving links between WordNet synsets [Fellbaum et al. 2009], or analysis of a Croatian
lexicon [Raffaelli 2013].
   Below we describe the process of morphosemantic representation of sentences we applied
in this research.
   At first, the sentences from the datasets (Japanese newspaper corpora) are analyzed
using semantic role labeling (SRL), which provides labels for words and phrases according
to their role in the sentence context.
   For SRL in Japanese we used ASA, a system which provides semantic roles for words and
generalizes their semantic representation using an originally developed thesaurus [Takeuchi
et al. 2010]. An example of SRL provided by ASA is shown in Table 1.

Table 1: An example of a sentence analyzed by ASA.

 Example I: Romanized Japanese (RJ): Ashita kare wa kanojo ni tegami o okuru darō.
 Glosses: Tomorrow he TOP her DIR letter OBJ send will
          (TOP: topic particle, DIR: directional particle, OBJ: object particle.)
 English translation (E): He will [most probably] send her a letter tomorrow.

 No.  Surface       Label
 1    ashita        [Time-Point]
 2    kare ha       [Agent]
 3    kanojo ni     [Patient]
 4    tegami o      [Object]
 5    okuru darou   [State change]-[Place change]-[Change of place(physical)]

   However, not all words are semantically labeled by ASA. The omitted words include, e.g.,
grammatical particles, or function words not having a direct influence on the semantic
structure of the sentence, but in practice contributing to the overall meaning. For those
remaining words we used the morphological analyzer MeCab
(http://taku910.github.io/mecab/) in combination with ASA to provide morphological
information, such as “Proper Noun” or “Verb”. Moreover, as a post-processing procedure we
added a set of linguistic rules for specifying compound words in cases where only
morphological information was provided.
   Finally, for cases where the labels provided by ASA were too specific (see Table 1), we
normalized and simplified the labels according to the following label priorities.
   1. Semantic role (Agent, Patient, Object, etc.)
   2. Semantic meaning (State change, etc.)
   3. Category (Dog → Living animal → Animated object)
   4. In case ASA does not provide any of the above labels, perform compound word
      clustering for parts of speech (e.g., “International Joint Conference on Artificial
      Intelligence” → Adjective Adjective Noun Preposition Adjective Noun → Proper Noun)
      4.1 If a compound word can be specified, output the part-of-speech cluster.
      4.2 If it is not a compound word, output the part of speech for each word.
   Below is an example of a sentence represented in the above morphosemantic structure.
Romanized Japanese: Nihon unagi ga zetsumetsu kigushu ni shitei sare, kanzen yōshoku ni
yoru unagi no ryōsan ni


kitai ga takamatte iru.
English: As Japanese eel has been specified as an endangered species, the expectations grow
towards mass production of eel in full aquaculture.
SRL:       [Object][Agent][State change][Action]-
[Noun][State change][Object][State change]

3.2   Future Reference Pattern Extraction

From sentences represented in morphosemantic structure we extract frequent MoPs by first
generating ordered non-repeated combinations from all sentence elements. In every n-element
sentence there are k-element combination groups, such that 1 ≤ k ≤ n. Next, all
non-subsequent elements are separated with a wildcard (“*”, asterisk). Pattern lists
extracted this way from training sets are then used in classification of test and
validation sets.
   For all patterns generated this way their occurrences O are calculated, and frequent
(O ≥ 2) patterns are retained. Next, the occurrences are used to calculate pattern weight.
Two features are important in weight calculation: pattern length k (the number of elements
it contains) and its occurrence O (how many times it occurs in the dataset). Thus in the
experiments we modified the weight by
   • awarding length (LA),
   • awarding length and occurrence (LOA),
   • awarding none (normalized weight, NW).
   The generated list of frequent patterns can also be further modified. When two
collections of sentences of opposite features (such as “future-related vs.
non-future-related”) are compared, the list will contain patterns that appear uniquely on
only one of the sides (e.g., uniquely positive patterns and uniquely negative patterns) or
on both (ambiguous patterns). Thus we also modified pattern lists by
   • using all patterns (ALL),
   • erasing all ambiguous patterns (AMB),
   • erasing only those ambiguous patterns which appear in the same number on both sides
     (zero patterns, 0P, since their normalized weight is equal to zero).
Moreover, a list of patterns will contain both the sophisticated patterns (with disjointed
elements) and more common n-grams. Therefore the system can be trained on a model using
   • patterns (PAT), or
   • only n-grams (NGR).
   All combinations of the above modifications are tested in the experiments.
   Examples of extracted MoPs of FRS and non-FRS with their occurrences are shown in
Table 2.

Table 2: Examples of extracted morphosemantic patterns (MoPs).

 Occ.   Future Reference Patterns        Occ.   Non-future Reference Patterns
  43    [Action]*[Object]                 4     [Numeric]*[Agent]
  42    [Action]*[Action]                 4     [Verb]*[Artifact]
  26    [Action]*[State change]           5     [Place]*[Agent]
  20    [State change]*[Object]           4     [Person]*[Place]
  16    [State change]*[State change]     3     [Numeric]*[Agent]*[Action]
  ...   ...                               ...   ...

3.3   Future Reference Sentence Extraction with Morphosemantic Patterns

From three newspaper corpora (Nihon Keizai Newspaper, Asahi Newspaper, Hokkaido Newspaper)
we collected and annotated a dataset containing an equal number of (1) sentences referring
to future events and (2) other sentences (describing past or present events). We conducted
an evaluation experiment with a training dataset containing 130 sentences of each type; as
the test data we used an additional 170 sentences randomly extracted from the news corpora.
   The test datasets were applied in a text classification task with 10-fold cross
validation. Each classified test sentence was given a score calculated as a sum of weights
of patterns extracted from training data and found in the input sentence. The results were
calculated with Precision, Recall and balanced F-score. We compared fourteen classifier
versions. The results indicated that the highest overall performance was obtained by the
version using the pattern list containing all patterns (including ambiguous patterns and
n-grams). We looked at top scores within the threshold, checked which version got the
highest break-even point (BEP) of Precision and Recall, and calculated statistical
significance of the results.
   Finally, we compared the proposed method to [Jatowt et al. 2013], who extracted future
reference sentences with 10 words explicitly referring to the future, such as “will” or “is
likely to”. In comparison, the proposed method obtained better results even when only the
10 most frequent MoPs were used (see Table 3 for details).
   Moreover, we verified the performance of the fully optimized model. We retrained the
best model using all sentences from the initial training dataset and verified the
performance by classifying the new validation set. The final overall performance is shown
in Figure 1. Finally, the obtained BEP was 0.76.

Figure 1: Final overall results of fully optimized model.

Table 3: Comparison of results for validation set between different pattern groups and the
state-of-the-art.

 Pattern set                             Precision   Recall   F-score
 top 10 patterns                           0.39       0.49     0.43
 top 10 patterns with over 3 elements      0.42       0.37     0.40
 top 5 patterns                            0.35       0.35     0.35
 Optimized (see Fig. 1)                    0.76       0.76     0.76
 [Jatowt et al. 2013] (10 phrases)         0.50       0.05     0.10
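
The pattern extraction of Section 3.2 and the scoring rule of Section 3.3 can be sketched
roughly as follows. This is only an illustration under stated assumptions, not the authors'
implementation: all function names are hypothetical, each pattern is counted once per
sentence, the O ≥ 2 filter is applied to the combined count from both sides, max_len is an
added cap that the text does not mention, and the normalized weight is assumed to take the
form (O_future / (O_future + O_non_future) - 0.5) * 2. The text does not give the weight
formula; this form is chosen only because it yields zero for patterns occurring equally
often on both sides, as described for zero patterns.

```python
from collections import Counter
from itertools import combinations


def generate_patterns(elements, max_len=None):
    """Return all ordered, non-repeated combinations of sentence elements.

    Elements that were not adjacent in the sentence are separated by "*".
    max_len is a hypothetical cap (a sentence of n elements yields 2**n - 1
    combinations); it is not part of the described method.
    """
    n = len(elements)
    max_len = n if max_len is None else min(max_len, n)
    patterns = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(n), k):
            parts = [elements[idx[0]]]
            for prev, cur in zip(idx, idx[1:]):
                if cur != prev + 1:  # a gap between elements: insert wildcard
                    parts.append("*")
                parts.append(elements[cur])
            patterns.add("".join(parts))
    return patterns


def count_patterns(sentences, max_len=None):
    """Count in how many sentences each pattern occurs (assumption: once per sentence)."""
    counts = Counter()
    for elements in sentences:
        counts.update(generate_patterns(elements, max_len))
    return counts


def normalized_weights(future, non_future, min_occ=2, max_len=None):
    """Assumed normalized weight in [-1, 1]; patterns with fewer than min_occ
    occurrences overall are dropped."""
    pos = count_patterns(future, max_len)
    neg = count_patterns(non_future, max_len)
    weights = {}
    for pattern in set(pos) | set(neg):
        total = pos[pattern] + neg[pattern]
        if total >= min_occ:
            weights[pattern] = (pos[pattern] / total - 0.5) * 2
    return weights


def score(elements, weights, max_len=None):
    """Score a sentence as the sum of weights of the known patterns it contains."""
    return sum(weights.get(p, 0.0) for p in generate_patterns(elements, max_len))
```

Under these assumptions, a sentence with a positive score would be classified as future
referring and one with a negative score as not future referring. The length-awarded (LA)
and length-and-occurrence-awarded (LOA) variants are not sketched, since their exact form
is not given in this section.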


4       Future Prediction Support Experiment

The validity of the method described in the previous section needs to be tested
in two ways: first, by verifying the capability of the method to support human
users in the task of predicting how a future event will unfold; second, by
testing a fully automated process of prediction.
   In this section we present a validation experiment for the effectiveness of
using Future Reference Sentences in the task of supporting human users in
predictions regarding future events.

4.1      Experiment Setup

In the experiment on supporting future trend prediction we used the fully
optimized FRS model trained on the MoPs described in Section 3.3. The model was
applied to extract new FRS concerning a specific topic from the available
newspaper data. We refer to such sentences as future prediction support
sentences (FPSS). The future prediction task was performed by a group of thirty
laypeople^5, who were asked to read the FPSS and to answer questions asking them
to predict the future 1–2 years from now, or from the starting point of
prediction.
   The questions were taken from the Future Prediction Competence Test (FPCT,
Japanese: Senken-ryoku Kentei), released by the Language Responsibility
Assurance Association (LRAA, Japanese: Genron Sekinin Hoshō Kyōkai)^6, a
nonprofit organization focused on supporting people with increased public
responsibility (managers, politicians) and people responsible for making
decisions influencing civic life. In particular, the organization helps in
preparing public speeches and responsibility-bound presentations by training
individuals in predicting possible outcomes of future events. Part of this
training consists of taking part in the FPCT.
   The FPCT is an examination that measures the ability of humans to predict
specific events that are to happen 1–2 years in the future. It was initiated in
2006 and has since been held six times. The test consists of various questions,
including multiple-choice questions (e.g., “Will US Army contingent in
Afghanistan increase or decrease during next year?”), essay questions (e.g.,
“Describe economic situation of a country after next two years”), and questions
that must be answered with numbers (e.g., “What will be the exchange rate of
Japanese Yen to US Dollar after two years”). The answers are scored after those
particular events have come to light.
   The questions for the experiment were selected from the 4th of the past six
FPCTs, as it had the largest total number of questions and respondents, which
should assure the highest possible objectivity of the evaluation. Held in 2009,
the 4th FPCT contained questions regarding predictions for 2010 and 2011, and
the scoring was performed in 2011. Respondents were to answer at least 15
questions from a total of 25 questions in six areas, namely politics,
economics, international events, science and technology, society, and leisure.
The test contained a large number of multiple-choice questions and several
questions requiring the prediction of specific numbers. There was also a small
number of questions requiring a written explanation of the reasoning behind the
prediction. When taking the test, respondents could browse any and all
available materials and were free to seek the opinions of others in answering
the questions, but the submission deadline was fixed at December 31st, 2009
(end of the year). The scoring was set at 90 total points for prediction
questions and 30 total points for descriptive questions, for a total of 120
points.
   In the future prediction support experiment the developed method extracts
FPSS related to a given question and assists human users in deciding which
answer to choose during the test. Therefore, for its evaluation we limited the
questions to multiple-choice questions. Questions with two or more (multiple)
possible answers were selected from the 4th FPCT and used as questions for the
experiment. One of the questions is shown in Figure 2.

    Question 3: Predict the stationing status of US troops in Afghanistan
    at the end of June 2011.

    (A) The U.S. troops will be still present and further reinforced
        comparing to October 2009.
    (B) The U.S. troops will be still present on similar level comparing
        to October 2009.
    (C) The U.S. troops will be still present but in decreased number
        comparing to October 2009.
    (D) The U.S. troops will be completely withdrawn.

    Answer:
    [ 1st candidate:       / 2nd candidate:       / 3rd candidate:       ]

    Specify which sentence (number ID) from the prepared Future Prediction
    Support Sentences was most useful in making the above decision:
    1st candidate: [     ]   2nd candidate: [     ]   3rd candidate: [     ]

Figure 2: An example of one multiple choice question from the 4th Future
Prediction Competence Test with an additional question about which of the
automatically extracted sentences presented to the user was the most useful.

    5
     25 males and 5 females, age groups from university students studying
computer science (28 user samples) to their fifties (2 user samples).
    6
     http://genseki.a.la9.jp/kentei.html

4.2   Data Preparation

In this section we describe the process of data preparation for the experiment.
First, a total of 7 multiple-choice questions were selected from the 4th FPCT.
Next, for each question we extracted a number of FPSSs from the news corpus to
be read by the participants. Unlike in the original setting of the Future
Prediction Competence Test, where the participants could refer to any
information and had the whole year to prepare their answers, participants of
our experiment were to use only the provided FPSSs to answer the questions at
the time of the experiment.
   The FPSSs for each question are collected in the following


steps. First, we extracted from the entire 2009 volume of the Mainichi Newspaper
all sentences related to the questions on the basis of topic keywords (Table 4),
selected as nouns that appeared in the original questions or answer options. We
also manually expanded the search queries by adding semantically related
keywords.

Table 4: Keywords for collecting future reference sentences (FRS) for each of
the questions.

       No.    Keywords
       Q1-1   Participation in regional government by foreigners with
              permanent resident status | Participation in regional
              government permanent resident alien
       Q1-2   Husband and wife retaining separate family names
       Q2     Midterm elections | (Republican Party | Democratic Party)
              & (United States | America)
       Q3     Afghanistan
       Q4     Analog broadcasting | Digital broadcasting
       Q5     Child allowance
       Q6     (Democratic Party | Liberal Party | Ruling Party) & election

   The sentences were represented in morphosemantic structure, analyzed using
our fully optimized model, and sorted in descending order of resemblance to
FRS. Next, we retained only those FPSS whose scores were over 0.0 and presented
the 30 highest-scoring of them to the subjects in chronological order (by the
date the sentence appeared in the newspaper), so that the subjects had a better
image of how the events unfolded, which should make the prediction more
natural.
   The limit of thirty sentences was set so that the subjects did not become
tired of the task. However, we kept the rest of the sentences in case the
subjects insisted on further reading. In situations where the initial list of
sentences extracted with topic keywords contained fewer than thirty sentences,
we presented to the subjects all sentences which had a probability of being
FRS.
   The questions were answered directly after reading only the FPSS.
Additionally, the laypeople were asked to report the ID number of the FPSS they
referred to in their answer (or the FPSS they considered the most informative
or useful).
   We evaluated their answers based on the original scoring schema. In
particular, each of the questions 1, 2, and 7 was allocated up to 3 points.
Moreover, in questions 2–5 the laypeople were allowed to give up to three
candidate answers: a primary, a secondary and a tertiary candidate, assigned 3,
2 and 1 point, respectively.

4.3   Experiment Results and Discussion

The results of the experiment are summarized in Table 5.
   In the performed experiment, the average score of the experiment
participants was 38.10% (see Table 5). In comparison, in the original FPCT, the
average score of the test participants was 33.44%. Therefore the results of our
experiment participants were slightly higher.

Table 5: Performance of subjects in the future prediction task supported with
the proposed method, compared with the original results of the 4th Future
Prediction Competence Test.

                            correct answer accuracy
                            average   highest   lowest
       Original FPCT        0.3344    0.6111    0.0666
       Our experiment       0.3810    0.6190    0.1428

   One issue with the performed experiment was that the questions for the
prediction task in fact concerned past events for the laypeople. Therefore,
although the questions were very specific, meaning it was not likely that the
participants knew or remembered the events, there was always a chance that some
of the experiment participants might have already known how the events in
question unfolded and used this knowledge to their advantage. Therefore, just
in case, we also warned the experiment participants that in answering they
should use only the knowledge provided in the FPSS.
   Although participants of our experiment obtained higher scores than the
participants of the original test, the improvement was not large. Nevertheless,
the following can be considered the major contribution of our method for future
prediction support. Even if we acknowledge that the improvement was small and
that our subjects performed similarly to the original participants, our
subjects made their decisions based on only about thirty specifically extracted
FPSSs and were given only a short time to decide, whereas the original
participants had over a year to prepare their answers, had unlimited access to
all available data, and could receive help from any other people, including
experts.
   This indicates that accurately extracted future reference sentences are
useful in predicting the unfolding of future events for people who have no
knowledge regarding those events. Hence, supporting future prediction with FPSS
for specific topics can be considered at least as efficient as collecting
available information by oneself for a year.

5     Prototype Method for Automatic Future Prediction

In this section, we describe and validate a prototype method for predicting the
unfolding of future events. In the validation experiment we aim to perform the
task described in Section 4 fully automatically.

5.1    Method Description

We developed the prototype method for automatic future prediction to analyze
the questions from the Future Prediction Competence Test used in the previous
experiment. Although the method can be adapted to analyze any content, in this
research we limited its functionality to the existing data to make the
evaluation possible and as objective as possible.
   The method consists of the following steps.

  1. Building an optimized model for Future Reference Sentence (FRS)
     extraction (Section 3.3),
  2. Extracting topic keywords from questions about the future unfolding of
     events (Section 4.2),
  3. Applying the FRS extraction model and the topic keywords to extract FRS
     related to each question from limited corpus data,
  4. Training a new event-topic-specific model on the extracted topic-related
     FRS, using the method for Automatic Extraction of Future Reference
     Sentences (Section 3),
  5. Analyzing the answers to each question and choosing the one with the
     highest score as the correct answer.


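The five steps listed in Section 5.1 can be sketched in code. The sketch below
is only an illustration under stated assumptions, not the authors'
implementation: select_frs, train_topic_model, rank_options and toy_frs_score
are hypothetical names, the bag-of-words counter is a placeholder for the
MoP-based model of Section 3, and the toy corpus is invented. Only the
thresholding (score over 0.0), the limit of thirty sentences, the chronological
ordering and the "highest score wins" rule come from the text.

```python
from collections import Counter


def select_frs(sentences, keywords, frs_score, threshold=0.0, limit=30):
    """Step 3: keep topic-related sentences that resemble FRS.

    sentences is a list of (iso_date, text) pairs; frs_score is a
    hypothetical callable standing in for the optimized FRS model.
    """
    on_topic = [s for s in sentences if any(k in s[1] for k in keywords)]
    scored = [(frs_score(text), (date, text)) for date, text in on_topic]
    kept = sorted((p for p in scored if p[0] > threshold),
                  key=lambda p: p[0], reverse=True)[:limit]
    # Present in chronological order (ISO dates sort lexicographically).
    return sorted((s for _, s in kept), key=lambda s: s[0])


def train_topic_model(frs):
    """Step 4 placeholder: word counts, NOT the MoP-based model."""
    counts = Counter(w for _, text in frs for w in text.split())
    return lambda option: sum(counts[w] for w in option.split())


def rank_options(options, model):
    """Step 5: order answer labels by descending model score."""
    return sorted(options, key=lambda label: model(options[label]),
                  reverse=True)


def toy_frs_score(text):
    """Invented stand-in scorer: 1.0 if a future-looking word occurs."""
    return 1.0 if {"will", "plans"} & set(text.split()) else 0.0


# Toy data, invented for illustration only.
corpus = [
    ("2009-03-01", "troops will be reinforced in afghanistan next year"),
    ("2009-01-15", "the government plans to send more troops to afghanistan"),
    ("2009-02-10", "analog broadcasting will end in 2011"),
]
options = {
    "A": "troops will be reinforced",
    "D": "troops will be completely withdrawn",
}

frs = select_frs(corpus, ["afghanistan"], toy_frs_score)
model = train_topic_model(frs)
ranking = rank_options(options, model)
print([date for date, _ in frs], ranking)
```

Passing the scorer and the model as callables keeps the selection and ranking
logic independent of how the FRS model itself is built.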
5.2   Experiment Setup and Data Preparation

We evaluated the performance of the prototype method for automatic future
prediction. Originally, the evaluation task was for laypeople to read the
automatically extracted Future Prediction Support Sentences (FPSS) related to
questions from the Future Prediction Competence Test (FPCT) and to select the
answers they considered correct using only the provided FPSS.
   The method for automatic prediction takes the human out of the loop in the
prediction task. In practice, the method therefore automatically reads through
the limited corpus and provides automatic inference regarding answers to the
FPCT questions based only on the automatically learned information.
   In the evaluation, as the reference corpus from which the method learns we
used the same newspaper corpus as in Section 3.3, but limited to one year,
namely 2009, which presumably contained news articles related to the questions.
   For each of the questions we used the extracted topic keywords (Table 4)
with the fully optimized model (Section 3.3) to extract FRS related to that
question. Next, the newly obtained FRS were used as training data to train a
new model for each question. Finally, the newly created topic-oriented
FRS-based model was used to analyze the answers for each question (see the
example in Figure 2), and the answer with the highest score was selected as the
correct one.
   Moreover, in order to analyze the influence of FRS on the accuracy rate of
correct answers, we developed two versions of the prototype method.

Ver. 1: Using for training thirty or fewer FRS (a condition similar to the one
        under which experiment participants performed the future prediction
        task in the future prediction support experiment, explained in
        Section 4),
Ver. 2: Using for training all FRSs which scored over 0.98 (a condition
        experimentally selected as optimal for FRS, explained in Section 3.3).

5.3   Experiment Results

To put the developed prototype method on the same footing as the human
participants, in the evaluation of the prototype method we adopted the same
weighted scoring schema as in the future prediction support experiment
(Section 4.2). Namely, for questions 1, 2, and 7, if the prototype method
answered correctly, it obtained 3 points for each question. Furthermore, for
questions 2–5, if the correct answer was selected by the prototype method as
either the first, second or third candidate, it was assigned 3, 2 or 1 point,
respectively.
   The results of the prototype method for each question are shown in Table 7.
An example of the scoring of answers for two questions (Q1-1 and Q3), for both
versions of the prototype method, is presented in Table 6. For each question
the answer with the highest score is selected as the correct one.
   The version of the method using thirty FRS obtained an accuracy rate of
correct answers of 57.14%. This is an improvement of about 19 percentage points
over the results obtained by the human participants of our experiment (38.10%).
   Additionally, although the scores assigned by the prototype method to each
answer were different for both versions of the method, there was no difference
in the final ratio of correct answers between the version using thirty FRS or
fewer and the version using all FRS with an FRS-resemblance score over 0.98
(see Table 6). Considering that the number of FRS used in training did not
influence the results, it could be more efficient to use the version of the
method using the smaller number of sentences.

5.4    Discussion

In this experiment, we automated the task of reading future reference sentences
and responding to future prediction questions. The experiment results showed an
improvement of about 19 percentage points of the developed prototype method
over the human participants who took part in the prediction support experiment.
Moreover, the result was 23.7 percentage points higher than the average result
of participants of the original 4th FPCT. In fact, with Accuracy at the level
of 57.14% it was very close to the highest results obtained by participants of
the original test (61.11%) and of our future prediction support experiment
(61.90%). Therefore we can say that the prototype method was nearly as good in
predicting the unfolding of future events as the best humans, and considerably
better than an average human (roughly 1.5 to 1.7 times the average accuracy),
whether the humans used all available resources and prepared their answers for
a year, or used our support method and made the prediction at the time of the
experiment. The final results are compared in Table 8. In addition, when a
correct answer was accepted among the top three candidates, 5 out of 7
questions could be considered correct, which gives an Accuracy of 71.43%, over
twice as high as the average and over 10 percentage points higher than the best
score of the original FPCT participants. Furthermore, when the tendencies of
correct answer rates for each question are compared between the prototype
method and the future prediction support experiment, the tendencies of correct
and incorrect answers are very similar, meaning that the inference resembles,
and exceeds, human performance.
   The result was more than satisfactory, although we acknowledge that there
were many limitations imposed by the controlled character of the experiment.
Therefore we need to conduct additional experiments on other real-world events
to obtain a clearer image of the capabilities of our method, most desirably on
events that will in reality unfold in the future relative to the time of the
prediction.

6     Conclusions and Future Works

In this paper we conducted two experiments to determine whether Future
Reference Sentences (FRS) are effective in supporting future trend prediction
by (1) laypeople and (2) a prototype fully automatic method. We applied
questions from the official Future Prediction Competence Test (FPCT) and, using
topic keywords from those questions, gathered newspaper articles from the
entire applicable year. Then we extracted topic-related FRS from those
articles, to be used in predicting the unfolding of the events.
   In the results we obtained a small improvement over the original FPCT. The
original test allowed preparing answers for over a year and using any available
information. On the other hand, participants of our experiment answered the
questions immediately after reading the provided support material


Table 6: Results of classification of answer options by both versions of the
prototype method for Question 1-1 and Question 3.

        correct                                                            score    score
        (*)      answer option                                            ≥ 0.00   ≥ 0.98
  Q1-1                                                                      (16)     (10)
                 (a) The right to participate in regional politics
                     will be established for foreign permanent
                     residents by the end of June, 2010.                    1.82     1.82
           *     (b) The right to participate in regional politics
                     will not be established for foreign permanent
                     residents by the end of June, 2010.                    2.00     2.00
  Q3                                                                        (30)    (172)
           *     (a) The U.S. troops will be still present and further
                     reinforced comparing to October 2009.                  1.71     2.17
                 (b) The U.S. troops will be still present on similar
                     level comparing to October 2009.                       1.82     2.22
                 (c) The U.S. troops will be still present but in
                     decreased number comparing to October 2009.            2.04     2.41
                 (d) The U.S. troops will be completely withdrawn.          1.50     1.91

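The scores in Table 6 can be turned into points with a few lines of code. The
sketch below assumes that the 3/2/1 weighting described in the scoring schema
is applied according to the rank of the correct option among the scored
options; rank_points is a hypothetical helper name, and the numbers are the
"score ≥ 0.00" column of Table 6.

```python
def rank_points(scores, correct, points=(3, 2, 1)):
    """Points earned when options are ranked by descending score.

    scores maps option labels to model scores; correct is the label of
    the right answer. Ranks beyond the third candidate earn 0 points
    (an assumption consistent with the 3/2/1 schema in the text).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    position = ranked.index(correct)
    return points[position] if position < len(points) else 0


# Scores copied from Table 6, column "score >= 0.00".
q1_1 = {"a": 1.82, "b": 2.00}
q3 = {"a": 1.71, "b": 1.82, "c": 2.04, "d": 1.50}

print(rank_points(q1_1, correct="b"))  # 3: correct option ranked first
print(rank_points(q3, correct="a"))    # 1: correct option ranked third
```

With the "score ≥ 0.98" column the ranking of the options is the same for both
questions, so the points do not change.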

Table 7: Results of the prototype method in comparison to the future prediction
support experiment for each of the applied questions.

                                              score average                          total          Accuracy
  type of experiment           Q1-1   Q1-2   Q2     Q3     Q4     Q5     Q6     (max. 21 pt)   (% of correct)
  Prototype method             3.00   3.00   3.00   1.00   0.00   2.00   0.00      12.00           57.14%
  Experiment with laypeople    1.91   0.55   1.36   2.36   0.27   1.27   0.27       8.00           38.10%

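The totals and accuracies in Table 7 follow directly from the per-question
scores. A minimal check, assuming that Accuracy is the total number of points
divided by the 21-point maximum (accuracy and MAX_TOTAL are names introduced
here for illustration):

```python
# Per-question scores copied from Table 7 (Q1-1, Q1-2, Q2, ..., Q6).
prototype = [3.00, 3.00, 3.00, 1.00, 0.00, 2.00, 0.00]
laypeople = [1.91, 0.55, 1.36, 2.36, 0.27, 1.27, 0.27]
MAX_TOTAL = 21.0  # maximum total from the Table 7 header


def accuracy(points, max_total=MAX_TOTAL):
    """Share of the maximum score that was obtained."""
    return sum(points) / max_total


print(f"prototype: {sum(prototype):.2f} pt, {accuracy(prototype):.2%}")
print(f"laypeople: {sum(laypeople):.2f} pt, {accuracy(laypeople):.2%}")
```

The laypeople averages in the table are rounded to two decimals, so their sum
comes out as 7.99 rather than the reported 8.00, and the recomputed accuracy
differs from 38.10% only in the second decimal place.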

Table 8: Comparison of the correct answer accuracies between the prototype
method, the future prediction support experiment and the original Future
Prediction Competence Test.

                    correct answer accuracy rate
   Prototype method     Our experiment     Original FPCT
        0.5714              0.3810             0.3344

which consisted of only thirty (or fewer) automatically selected sentences. The
time spent and the amount of information to be processed for answering the
future-related questions were greatly reduced with our support method. This
indicates that FRS are useful in supporting the prediction of future events.
   However, the most interesting discovery was made in the experiments with the
proposed prototype fully automatic method for predicting the future unfolding
of events. The results showed that, when correct answers are accepted up to the
third candidate, our prototype method exceeded even the best humans in the
prediction task and the average human over two times. This provides a strong
suggestion that full automation of future prediction is possible.
   In the future, we plan to use this method with other corpora to conduct
experiments on real-world problems, including various lengths of the term of
prediction, to specify to what extent the method is applicable in future
prediction. Also, carrying out a chronological analysis of FRS and adding
sentiment analysis could lead to the discovery of additional new knowledge. We
also plan to take part in the next FPCT.

References
[Aramaki et al. 2011] Eiji Aramaki, Sachiko Maskawa, Mizuki Morita. 2011.
   Twitter catches the flu: Detecting influenza epidemics using twitter.
   Proceedings of the Conference on Empirical Methods in Natural Language
   Processing, pp. 1568–1576.
[Baeza-Yates 2005] R. Baeza-Yates. 2005. Searching the Future. SIGIR Workshop
   on MF/IR.
[Fellbaum et al. 2009] Christiane Fellbaum, Anne Osherson, and Peter E. Clark.
   2009. Putting semantics into WordNet’s “morphosemantic” links. Human
   Language Technology. Challenges of the Information Society, Springer,
   pp. 350–358.
[Jatowt et al. 2009] Adam Jatowt, Kensuke Kanazawa, Satoshi Oyama, Katsumi
   Tanaka. 2009. Supporting analysis of future-related information in news
   archives and the Web. In 9th ACM/IEEE-CS Joint Conference on Digital
   Libraries, pp. 115–124.
[Jatowt and Au Yeung 2011] Adam Jatowt, Ching-man Au Yeung. 2011. Extracting
   collective expectations about the future from large text collections. The
   20th ACM international conference on Information and knowledge management,
   pp. 1259–1264.
[Jatowt et al. 2013] Adam Jatowt, Hideki Kawai, Kensuke Kanazawa, Katsumi
   Tanaka, Kazuo Kunieda, Keiji Yamada. 2013. Multilingual, Longitudinal
   Analysis of Future-related Information on the Web. Proceedings of the 4th
   International conference on Culture and Computing 2013, IEEE Press.
[Kanazawa et al. 2010] Kensuke Kanazawa, Adam Jatowt, Satoshi Oyama, Katsumi
   Tanaka. 2010. Extracting Explicit and Implicit future-related information
   from the Web(O) (in Japanese). DEIM Forum 2010, paper ID: A9-1.
[Kanazawa et al. 2011] Kensuke Kanazawa, Adam Jatowt, Katsumi Tanaka. 2011.
   Improving Retrieval of Future-Related Information in Text Collections. 2011
   IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent
   Agent Technology (WI-IAT), pp. 278–283.
[Kanhabua et al. 2011] Nattiya Kanhabua, Roi Blanco, Michael Matthews. 2011.
   Ranking related news predictions. Proceedings of the 34th international ACM
   SIGIR conference on Research and development in Information Retrieval,
   pp. 755–764.
[Kroeger 2007] Paul Kroeger. 2007. Morphosyntactic vs. morphosemantic functions
   of Indonesian -kan. In Architectures, Rules, and Preferences: Variations on
   Themes of Joan Bresnan, edited by A. Zaenen, J. Simpson, T. H. King, J.
   Grimshaw, J. Maling and C. Manning, pp. 229–251. Stanford, CA: CSLI Publ.
[Levin and Rappaport Hovav 1998] Beth Levin and Malka Rappaport Hovav. 1998.
   Morphology and Lexical Semantics. In Spencer and Zwicky, eds., pp. 248–271.
[Nakajima et al. 2016] Yoko Nakajima, Michal Ptaszynski, Fumito Masui,
   Hirotoshi Honma. 2016. A Method for Extraction of Future Reference Sentences
   Based on Semantic Role Labeling. IEICE Trans. on Inf. and Sys., Vol. E99-D,
   No. 2, pp. 514–524.
[Radinsky et al. 2012] Kira Radinsky, Sagie Davidovich and Shaul Markovitch.
   2012. Learning causality for news events prediction. The 21st International
   Conference on World Wide Web.
[Raffaelli 2013] Ida Raffaelli. 2013. The model of morphosemantic patterns in
   the description of lexical architecture. Lingue e linguaggio, Vol. 12,
   No. 1 (2013), pp. 47–72.
[Strötgen and Gertz 2010] Jannik Strötgen and Michael Gertz. 2010. HeidelTime:
   High Quality Rule-Based Extraction and Normalization of Temporal
   Expressions. Proceedings of the 5th International Workshop on Semantic
   Evaluation, pp. 321–324.
[Takeuchi et al. 2010] Koichi Takeuchi, Suguru Tsuchiyama, Masato Moriya, Yuuki
   Moriyasu. 2010. Construction of Argument Structure Analyzer Toward Searching
   Same Situations and Actions. IEICE Technical Report, Vol. 109, No. 390,
   pp. 1–6.
[Zhang et al. 2016] Yating Zhang, Adam Jatowt, Katsumi Tanaka. 2016. Detecting
   Evolution of Concepts based on Cause-Effect Relationships in Online Reviews.
   Proceedings of the 25th International World Wide Web Conference (WWW 2016),
   ACM Press, pp. 649–660.



</pre>