<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Matrix Completion for Storm Damages Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Quang-Khai Tran</string-name>
          <email>khai.tran@kisti.re.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jung-Ho Um</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sa-kwang Song</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Korea Institute of Science and Technology Information (KISTI)</institution>
          <addr-line>Daejeon</addr-line>
          ,
          <country>Republic of Korea</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Science and Technology (UST)</institution>
          <addr-line>Daejeon</addr-line>
          ,
          <country>Republic of Korea</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Forecasting weather disasters is very important but remains a major scientific challenge. To tackle this issue, our study attempts to predict storm damages using Semantic Web data (SRBench) and techniques (matrix completion methods and the Statistical Unit Node Set framework). Preliminary experiments try to predict which regions are likely to be hit by the deadliest storms, such as Hurricane Katrina (USA, 2005). The results show that, even with incomplete data, the approach can identify highly threatened locations at different time-steps. They also hint at the ability to forecast different storm damage scenarios.</p>
      </abstract>
      <kwd-group>
        <kwd>matrix completion</kwd>
        <kwd>statistical unit node set</kwd>
        <kwd>storm damages prediction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        In 2005, Hurricane Katrina hit the US, killing more than 1,800 people and causing about
$108 billion in property loss [<xref ref-type="bibr" rid="ref4">4</xref>]. Unfortunately, predicting a storm's intensity and
track is still a major challenge for modern meteorological models. Taking a different
approach from such models, which often require good and sufficient data, and inspired
by the work in [<xref ref-type="bibr" rid="ref6">6</xref>] and [<xref ref-type="bibr" rid="ref2">2</xref>], we propose an alternative that estimates the likelihood of storm
damage occurring at given locations, based on Semantic Web (SW) technologies.
      </p>
      <p>
        We consider 5 hurricanes in the US provided in the SRBench data set [<xref ref-type="bibr" rid="ref7">7</xref>]: Charley
(2004), Katrina (2005), Wilma (2005), Gustav (2008) and Ike (2008). For learning
and prediction, Matrix Completion (MC) methods, as used in the Statistical Unit Node
Set (SUNS) framework [<xref ref-type="bibr" rid="ref6">6</xref>], are employed as the multivariate regression approach.
SW data is usually incomplete, but multivariate learning has shown its strength in
dealing with data that has missing information [<xref ref-type="bibr" rid="ref2">2</xref>]. In [<xref ref-type="bibr" rid="ref2 ref3 ref6">2, 3, 6</xref>], the same research group
combined SPARQL with MC under the SUNS framework and applied it successfully to
several SW data sets (such as friend-of-a-friend and gene-disease relationships).
In our approach, known triples of storm data are encoded into matrices using the SUNS approach,
and an MC process based on Singular Value Decomposition (SVD) is used to
fill in the missing values of unknown triples.
      </p>
      <p>(*) Corresponding author</p>
      <sec id="sec-1-1">
        <title>2.1 Constructing data matrix from SRBench</title>
        <p>We take “Storm” and “Location” as the central concepts, and the relationship between them as
the target of the learning and prediction processes. Under the SUNS framework,
they become the statistical units of interest, and their relationship is represented as a
(subject, predicate, object) triple of the Resource Description Framework. Hence, a
triple (<italic>s</italic>, hit, <italic>l</italic>) indicates that storm <italic>s</italic> may “hit” location <italic>l</italic>. To account for the
time series of the streaming data, the triple is extended to (<italic>s</italic>, hit, <italic>l</italic>, <italic>t</italic>). The value of this
extended triple is 1 if <italic>s</italic> is occurring over <italic>l</italic> at time <italic>t</italic>, and 0 otherwise.</p>
        <p>Fig. 1 shows an example of a grid of 6 locations hit by Hurricane Katrina. At time
<italic>t</italic><sub><italic>i</italic></sub>, the truth values of the 6 hitting triples corresponding to the 6 locations form the <italic>i</italic>-th row of
the hitting matrix, so an <italic>n</italic>-by-6 matrix is generated for the <italic>n</italic> time-steps of the storm.
To perform multivariate regression, the weather attributes of each location
(wind direction, wind speed, pressure) and the attributes of the storm (latitude, longitude,
pressure, wind speed) form the columns of co-variate matrices in the same
way, except that these matrices are filled with observed values rather than truth values.</p>
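        <p>The encoding described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation used in our experiments: the function name <monospace>build_hitting_matrix</monospace>, the tuple layout (storm, predicate, location, time-step index) and the location labels are hypothetical.</p>
        <preformat>
```python
def build_hitting_matrix(triples, storm, locations, n_steps):
    """Return an n_steps-by-len(locations) matrix of 0/1 truth values.

    Row i holds the truth values of the hitting triples at time-step i;
    column j corresponds to locations[j].
    """
    column = {loc: j for j, loc in enumerate(locations)}
    matrix = [[0] * len(locations) for _ in range(n_steps)]
    for s, predicate, loc, t in triples:
        # Keep only "hit" statements about the requested storm, on known
        # locations, and inside the observed time window.
        if s == storm and predicate == "hit" and loc in column and t in range(n_steps):
            matrix[t][column[loc]] = 1
    return matrix


# Hypothetical extended triples; labels and time-steps are illustrative only.
triples = [
    ("Katrina", "hit", "L2", 0),
    ("Katrina", "hit", "L2", 1),
    ("Katrina", "hit", "L3", 1),
    ("Wilma", "hit", "L1", 0),
]
hitting = build_hitting_matrix(triples, "Katrina", ["L1", "L2", "L3"], n_steps=2)
print(hitting)  # [[0, 1, 0], [0, 1, 1]]
```
        </preformat>
        <p>Every extended triple that is not asserted keeps the value 0, which matches the "0 otherwise" convention of the text.</p>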
        <p>Fig. 2 shows the data matrix aggregated from the above co-variate matrices
and the hitting-triple matrix, with new observations (the input data for prediction) appended
as new rows. To predict the occurrence of the storm in the future, for example in the
next 6 hours, the hitting data of the next 6 hours either replaces the matrix of current hitting triples
or is appended to the aggregated matrix as additional columns (column expansion).</p>
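        <p>A possible way to assemble such an aggregated matrix with column expansion is sketched below. The sketch assumes that consecutive rows are consecutive 6-hour time-steps, so that the next-6-hour hitting block of row <italic>i</italic> is the current hitting row of step <italic>i</italic> + 1; the function name <monospace>aggregate</monospace>, the use of <monospace>None</monospace> for unknown entries and the toy values are assumptions for illustration.</p>
        <preformat>
```python
def aggregate(covariates, hitting, unknown=None):
    """Build rows of [covariates | current hitting | next-step hitting]."""
    n_steps = len(hitting)
    width = len(hitting[0])
    rows = []
    for i in range(n_steps):
        # Assumption: the next-step block of row i is the current hitting
        # row of step i + 1; the last step has no successor yet.
        if i + 1 in range(n_steps):
            next_hit = list(hitting[i + 1])
        else:
            next_hit = [unknown] * width
        rows.append(list(covariates[i]) + list(hitting[i]) + next_hit)
    return rows


# Illustrative values only: 2 time-steps, 2 covariates, 2 locations.
covariates = [[1.0, None], [2.0, 3.0]]  # None marks an unobserved value
hitting = [[0, 1], [1, 0]]
aggregated = aggregate(covariates, hitting)
for row in aggregated:
    print(row)
# [1.0, None, 0, 1, 1, 0]
# [2.0, 3.0, 1, 0, None, None]
```
        </preformat>
        <p>The unknown cells in the last block are exactly the entries that a completion method would be asked to fill.</p>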
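      </sec>
      <sec id="sec-1-2">
        <title>2.2 Matrix Completion</title>
        <p>
          The SUNS framework uses MC methods based on SVD factorization to fill the unknown
entries of the aggregated matrix. The low-rank SVD decomposition is defined in formula
(1), where <inline-formula><tex-math>M</tex-math></inline-formula> is a data matrix, <inline-formula><tex-math>U_r</tex-math></inline-formula> and <inline-formula><tex-math>V_r</tex-math></inline-formula> are orthonormal matrices, and <inline-formula><tex-math>D_r</tex-math></inline-formula> is a diagonal
matrix formed from the <inline-formula><tex-math>r</tex-math></inline-formula> largest singular values:
          <disp-formula id="eq-1"><label>(1)</label><tex-math>M \approx U_r D_r V_r^{T}</tex-math></disp-formula>
          With a training data matrix <inline-formula><tex-math>M</tex-math></inline-formula>, MC finds a matrix model <inline-formula><tex-math>\hat{M}</tex-math></inline-formula>, which can be
considered a generalization of <inline-formula><tex-math>M</tex-math></inline-formula>, via low-rank SVD decomposition. In [<xref ref-type="bibr" rid="ref2">2</xref>], the authors
introduced the Reduced-rank Penalized Regression (RRPR<sup>1</sup>) algorithm:
          <disp-formula id="eq-2"><label>(2)</label><tex-math>\hat{M} = U_r \, \mathrm{diag}_r\!\left( \frac{d_k^{3}}{d_k^{2} + \lambda} \right) V_r^{T}</tex-math></disp-formula>
          where <inline-formula><tex-math>\hat{M}</tex-math></inline-formula> is an approximation of <inline-formula><tex-math>M</tex-math></inline-formula>, <inline-formula><tex-math>U_r</tex-math></inline-formula> and <inline-formula><tex-math>V_r</tex-math></inline-formula> are derived from the SVD factorization of <inline-formula><tex-math>M</tex-math></inline-formula>,
and <inline-formula><tex-math>\mathrm{diag}_r(\cdot)</tex-math></inline-formula> is the diagonal matrix whose <inline-formula><tex-math>k</tex-math></inline-formula>-th entry is <inline-formula><tex-math>d_k^{3} / (d_k^{2} + \lambda)</tex-math></inline-formula>
(<inline-formula><tex-math>d_k</tex-math></inline-formula> is the <inline-formula><tex-math>k</tex-math></inline-formula>-th singular value and <inline-formula><tex-math>\lambda</tex-math></inline-formula> is the balance parameter).
        </p>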
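        <p>A rank-1 sketch of SVD-based imputation may help to make the procedure concrete. It assumes that the RRPR diagonal weights take the form <monospace>d**3 / (d**2 + lam)</monospace> as attributed to [<xref ref-type="bibr" rid="ref2">2</xref>]; all function names are hypothetical, the power iteration merely stands in for a full SVD routine, and the code is not the algorithm listed in Table 1.</p>
        <preformat>
```python
import math


def rrpr_shrink(d, lam):
    """Assumed RRPR weight d^3 / (d^2 + lam); lam = 0 leaves d unchanged."""
    return d ** 3 / (d ** 2 + lam) if d > 0.0 else 0.0


def top_singular_triplet(z, iters=50):
    """Approximate the largest singular triplet (d, u, v) by power iteration."""
    n_rows, n_cols = len(z), len(z[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        u = [sum(z[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(z[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(z[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    d = math.sqrt(sum(x * x for x in u))
    if d > 0.0:
        u = [x / d for x in u]
    return d, u, v


def rank1_impute(m, lam=0.0, sweeps=50):
    """Fill None entries of m with a rank-1 estimate, refitted every sweep."""
    z = [[0.0 if x is None else float(x) for x in row] for row in m]
    for _ in range(sweeps):
        d, u, v = top_singular_triplet(z)
        weight = rrpr_shrink(d, lam)
        for i, row in enumerate(m):
            for j, x in enumerate(row):
                if x is None:  # observed entries are never overwritten
                    z[i][j] = weight * u[i] * v[j]
    return z


# Toy rank-1 matrix (outer product of [1, 2, 3] and [1, 2, 4]) with the
# centre entry, whose true value is 4, hidden.
m = [[1, 2, 4], [2, None, 8], [3, 6, 12]]
filled = rank1_impute(m)
print(round(filled[1][1], 3))
```
        </preformat>
        <p>With <monospace>lam=0</monospace> the sketch behaves like a plain truncated-SVD fill, and on this toy matrix the hidden entry should approach its true value of 4; a positive <monospace>lam</monospace> pulls the imputed value towards 0.</p>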
        <p>
          Table 1 presents the two algorithms used in our study. They are adapted from the
SVD-Impute algorithm [<xref ref-type="bibr" rid="ref1">1</xref>] and the SOFT-Impute algorithm [<xref ref-type="bibr" rid="ref5">5</xref>].
        </p>
        <p>
          In preliminary experiments, data of Hurricane Katrina [<xref ref-type="bibr" rid="ref4">4</xref>] is tested, with the 6 locations in
Fig. 1, 3 attributes per location and 4 attributes of the storm (as in Fig. 2) over
31 6-hour time-steps (31 rows and 22 columns). However, there are only 80 observed
entries for the 6 locations (out of 558 in total), which, together with the 124 observed entries of the
hurricane, form a sparse co-variate matrix (density ~29.9%). It is aggregated with a
31x6 matrix of current hitting triples and a 31x6 matrix of next-6-hour hitting triples to
form a 31x34 training matrix (density ~25.8%). For testing, two observations of new
locations' attributes and current hitting statements are appended as new rows, with the entries
of the next-6-hour hitting triples in each row set to 0 (unknown).
        </p>
        <p><sup>1</sup> The authors of [<xref ref-type="bibr" rid="ref2">2</xref>] used the abbreviation “RRPP”.</p>
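        <p>The matrix sizes and the co-variate density quoted above can be re-derived with a few lines of arithmetic. The helper name <monospace>density</monospace> is hypothetical, and the last step is only a back-of-the-envelope inference: the text does not state which hitting entries are counted as filled in the 25.8% figure.</p>
        <preformat>
```python
def density(filled_cells, n_rows, n_cols):
    """Fraction of cells of an n_rows-by-n_cols matrix that are filled."""
    return filled_cells / (n_rows * n_cols)


n_steps = 31
location_cells = n_steps * 6 * 3         # 6 locations x 3 attributes = 558
storm_cells = n_steps * 4                # 4 storm attributes = 124
covariate_cols = 6 * 3 + 4               # 22 columns
observed = 80 + storm_cells              # 204 observed co-variate entries

covariate_density = density(observed, n_steps, covariate_cols)
print(f"{covariate_density:.1%}")        # 29.9%

training_cols = covariate_cols + 6 + 6   # 34 columns after aggregation
training_cells = n_steps * training_cols # 1054 cells

# Inference only: number of filled cells implied by a density of ~25.8%.
implied_filled = round(0.258 * training_cells)
print(implied_filled)                    # 272
```
        </preformat>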
        <p>This data is limited but still meaningful for testing our idea. The results show that the two
algorithms fill the missing entries with patterns similar to those of similar training
observations. Both predict (0.1, 0.7, 0.1, 0.0, 0.0, 0.0) for the expected pattern (0, 1, 0, 0, 0, 0)
and (0.0, -0.1, -0.2, 0.0, -0.1, 0.1) for (0, 0, 0, 0, 0, 0). This means that the pattern of
Hurricane Katrina is reflected well despite the missing information. Comparing the root
mean square error and the relative error of the training process, RRPR performs slightly
better than “Naive” SVD (6.687×10<sup>-2</sup> and 2.245×10<sup>-3</sup> compared to 6.697×10<sup>-2</sup> and
2.248×10<sup>-3</sup>, respectively). However, as the data is simple, the difference is very small,
and the two algorithms produce the same predictions.</p>
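        <p>One way to read the two predicted rows is to threshold them and to measure their distance from the expected patterns. The sketch below uses the prediction vectors reported above; the 0.5 threshold and the function names are assumptions, and the per-row errors it computes are not the training errors quoted in the text.</p>
        <preformat>
```python
import math


def rmse(predicted, expected):
    """Root mean square error between two equally long vectors."""
    squared = [(p - e) ** 2 for p, e in zip(predicted, expected)]
    return math.sqrt(sum(squared) / len(squared))


def to_pattern(predicted, threshold=0.5):
    """Turn real-valued scores into a 0/1 hitting pattern (assumed cut-off)."""
    return [1 if p >= threshold else 0 for p in predicted]


hit_scores = [0.1, 0.7, 0.1, 0.0, 0.0, 0.0]
no_hit_scores = [0.0, -0.1, -0.2, 0.0, -0.1, 0.1]

print(to_pattern(hit_scores))     # [0, 1, 0, 0, 0, 0]
print(to_pattern(no_hit_scores))  # [0, 0, 0, 0, 0, 0]
print(round(rmse(hit_scores, [0, 1, 0, 0, 0, 0]), 3))     # 0.135
print(round(rmse(no_hit_scores, [0, 0, 0, 0, 0, 0]), 3))  # 0.108
```
        </preformat>
        <p>Under this assumed cut-off both predicted rows map back to their expected patterns.</p>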
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4 Conclusion and Discussion</title>
      <p>Even though the tested data is limited and incomplete, the recognition of Hurricane
Katrina's occurrence pattern strongly indicates that MC algorithms can be used for
forecasting storm damages. Moreover, with the SUNS approach, other types of SW data
can be used to investigate other types of disaster damage.</p>
      <p>So far, bridging SW resources and weather disaster prediction appears
to be unexplored, and we could not find similar work. In the next stage, data of the 5
storms will be combined to predict a storm damage index related to the population of
some counties in the US, and we aim to add new algorithms and heuristics to
improve the SUNS approach. The approach can be applied to forecasting damages under different disaster
scenarios, which is very meaningful in the context of global climate change.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hastie</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tibshirani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherlock</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eisen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Botstein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Imputing missing data for gene expression arrays</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tresp</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bundschus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rettinger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H. P.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Multivariate prediction for learning on the semantic web</article-title>
          .
          <source>In: Inductive Logic Programming</source>
          (pp.
          <fpage>92</fpage>
          -
          <lpage>104</lpage>
          ). Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nickel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Tresp</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Combining information extraction, deductive reasoning and machine learning for relation prediction</article-title>
          .
          <source>In: The Semantic Web: Research and Applications</source>
          (pp.
          <fpage>164</fpage>
          -
          <lpage>178</lpage>
          ). Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Knabb</surname>
            ,
            <given-names>R. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rhome</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>R. D.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <source>Tropical Cyclone Report: Hurricane Katrina. Technical Report</source>
          , NOAA National Hurricane Center, 23-30 August.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mazumder</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hastie</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Tibshirani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Spectral regularization algorithms for learning large incomplete matrices</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>11</volume>
          ,
          <fpage>2287</fpage>
          -
          <lpage>2322</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Tresp</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bundschus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Rettinger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Materializing and querying learned knowledge</article-title>
          .
          <source>Proc. of IRMLeS</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duc</surname>
            ,
            <given-names>P. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Calbimonte</surname>
            ,
            <given-names>J. P.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>SRBench: A Streaming RDF/SPARQL Benchmark</article-title>
          .
          <source>In: The Semantic Web - ISWC 2012</source>
          (pp.
          <fpage>641</fpage>
          -
          <lpage>657</lpage>
          ). Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>