<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Semantic Web and Relational Learning in the Context of Risk Management</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Systems, University of Jena, Carl-Zeiss-Straße 3</institution>
          ,
          <addr-line>07743 Jena</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The semantic web increasingly offers information that can be useful for decisions in the context of risk management. This thesis investigates relational learning techniques that can improve risk management by incorporating background information expressed in the description logics of the semantic web.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web</kwd>
        <kwd>Relational Learning</kwd>
        <kwd>Risk Management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Financial risk management is one prominent domain of risk management (RM),
which ensures the functioning and stability of banking and insurance systems.
Naturally, a financial RM analysis should take into account as much relevant
information as possible when evaluating an investment or a portfolio of investments.
The complex net of relations between macroeconomic factors, sectors, companies,
products, services, people, geographical locations, financial statements, news, etc.
makes this a very difficult task. Two main problems typically arise when one wants
to make appropriate decisions in such a situation. First, one needs appropriate data;
second, one needs a quantitative methodology that utilizes a suitable underlying
representational framework for the data.</p>
      <p>
        Since the advent of the semantic web (SW), an increasing amount of machine-readable
and "understandable" meta information can be retrieved automatically
from a wide variety of sources. Standards such as RDF, RDF-S and OWL
provide a formal, logical way to specify shared vocabularies that can be used in
statements about resources. It is therefore interesting to analyse which kinds of
quantitative strategies are suitable for utilizing the increasing amount of semantic
information in RM. How can risks be quantified based on the relational
information available from the SW? Risk measures such as value at risk (VaR) or expected
shortfall (ES) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] are important instruments for quantifying risk. They are generally
based on a so-called loss distribution, which is estimated, for instance, for
an investment or a portfolio of financial positions. The value of such a portfolio at
time <italic>t</italic> is denoted by <italic>V<sub>t</sub></italic>. It is a function
<italic>V<sub>t</sub></italic> = <italic>f</italic>(<italic>t</italic>, <italic>Z<sub>t</sub></italic>)
of time and a vector of observable risk factors <italic>Z<sub>t</sub></italic> (e.g. stock prices). Equation
1 defines the loss at time <italic>t</italic> + 1 of an investment or portfolio.
      </p>
      <disp-formula id="eq-1">
        <label>(1)</label>
        <tex-math>L_{t+1} := -(V_{t+1} - V_t)</tex-math>
      </disp-formula>
      <p>One can define the conditional loss distribution based on the
information <italic>F<sub>t</sub></italic> that is observable at time <italic>t</italic>.</p>
      <disp-formula id="eq-2">
        <label>(2)</label>
        <tex-math>F_{L_{t+1} \mid \mathcal{F}_t}(l) := P(L_{t+1} \leq l \mid \mathcal{F}_t) = P(-(V_{t+1} - V_t) \leq l \mid \mathcal{F}_t)</tex-math>
      </disp-formula>
      <p>
        While in a typical setting the sigma field <italic>F<sub>t</sub></italic> is based only on the historical
risk factor changes [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] (e.g. changes of stock prices), this thesis proposes
to incorporate information from the semantic web into the quantification of the
loss distribution via relational learning algorithms. The SW provides an
increasing amount of freely available relational information, and relational learning
algorithms are able to utilize the rich relational structures of the underlying
description logics of the SW. The measured risks should be more objective
if more information (relations between entities) in the respective domain is
considered.
      </p>
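      <p>As an illustration of how the loss defined above feeds into the risk measures mentioned earlier, the following minimal sketch computes losses from a series of portfolio values and then takes an empirical VaR and ES from them. The portfolio values, the function names and the choice of a simple empirical quantile are assumptions made for this example only; they are not the estimation method proposed in this thesis, which conditions on far richer information.</p>
      <preformat>
```python
import math


def losses_from_values(values):
    """Loss per period as in Equation 1: L_{t+1} = -(V_{t+1} - V_t)."""
    return [-(v_next - v_now) for v_now, v_next in zip(values, values[1:])]


def value_at_risk(losses, alpha):
    """Empirical alpha-quantile: smallest observed loss l with F(l) >= alpha."""
    ordered = sorted(losses)
    k = math.ceil(alpha * len(ordered))  # rank of the quantile observation
    return ordered[max(k, 1) - 1]


def expected_shortfall(losses, alpha):
    """Mean of all observed losses that are at least as large as VaR_alpha."""
    var = value_at_risk(losses, alpha)
    tail = [loss for loss in losses if loss >= var]
    return sum(tail) / len(tail)


# Hypothetical portfolio values V_t, invented for illustration.
values = [100.0, 98.0, 101.0, 97.0, 99.0, 94.0]
losses = losses_from_values(values)      # [2.0, -3.0, 4.0, -2.0, 5.0]
var_80 = value_at_risk(losses, 0.8)      # 4.0
es_80 = expected_shortfall(losses, 0.8)  # 4.5
```
</preformat>
      <p>Here the loss distribution is estimated from historical value changes alone, which corresponds to the typical setting described above.</p>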
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Relational learning builds upon the solid theoretical foundations of machine
learning and knowledge representation [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. "Relational" indicates that such
algorithms are able to take into account different entities and the relationships among them.
There are two main research directions: Inductive Logic Programming (ILP) and
Statistical Relational Learning (SRL).
      </p>
      <p>
        ILP is concerned with the development of relational data mining algorithms
that perform (deterministic) inductive inference based on observations given in a
first-order representation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Propositional data mining
algorithms have been upgraded to their first-order variants [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], with several
application scenarios [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Prominent examples are relational association rules and
relational decision and regression trees [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In general these algorithms are
deterministic, but some of them, such as relational association rules, can be given
a probabilistic interpretation.
      </p>
      <p>
        SRL performs research to "represent, reason and learn in domains with
complex relational and rich probabilistic structure" [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. There are logic-based and frame-based
algorithms, with the logical ones fitting naturally to semantic web
description logics. Neville and Jensen [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] point out that autocorrelation in relational data sets
provides an opportunity to improve the performance
of statistical relational models, because inferences about one object can inform
inferences about related objects; this is called collective inference.
Relational dependency networks (RDNs) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] are a relational extension of dependency
networks that can represent and reason with cyclic dependencies and exploit
autocorrelation. RDNs have been successfully applied to fraud detection [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
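      <p>To make the notion of autocorrelation in relational data more concrete, the sketch below measures how strongly one attribute agrees across entities that are linked by a given relation. It is only a rough stand-in: it computes a plain Pearson correlation over linked pairs and does not implement RDN learning or collective inference as described by Neville and Jensen. The triples, the <monospace>ex:</monospace> identifiers, the relation name and the return values are all hypothetical.</p>
      <preformat>
```python
from statistics import mean


def linked_pairs(triples, predicate):
    """Return (subject, object) pairs connected by the given predicate."""
    return [(s, o) for s, p, o in triples if p == predicate]


def relational_autocorrelation(pairs, value):
    """Pearson correlation of one attribute across linked entity pairs.

    Assumes the attribute is not constant on either side of the links,
    otherwise the denominator would be zero.
    """
    xs = [value[s] for s, _ in pairs]
    ys = [value[o] for _, o in pairs]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical RDF-like statements (subject, predicate, object).
triples = [
    ("ex:A", "ex:sameSector", "ex:B"),
    ("ex:B", "ex:sameSector", "ex:C"),
    ("ex:C", "ex:sameSector", "ex:D"),
    ("ex:A", "ex:locatedIn", "ex:Jena"),
]
# Hypothetical attribute: one-day stock return per company.
daily_return = {"ex:A": 0.02, "ex:B": 0.03, "ex:C": -0.01, "ex:D": -0.02}

pairs = linked_pairs(triples, "ex:sameSector")
r = relational_autocorrelation(pairs, daily_return)
```
</preformat>
      <p>A clearly positive value of <monospace>r</monospace> would suggest that knowing the attribute of one entity is informative about its neighbours, which is the situation that collective inference is meant to exploit.</p>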
      <p>
        SRL approaches such as probabilistic relational models (PRMs) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and Bayesian
logic programs (BLPs) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] attempt to model a probability distribution over a
set of relational interpretations. PRMs and BLPs extend Bayesian networks with
expressive relational representations. However, as discussed by Braz et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
these solutions still perform inference mostly at the propositional level, because
they instantiate propositional graphical models based on a given query. Braz et
al. point out that this propositional grounding can be computationally expensive and
therefore motivate first-order probabilistic inference, which is currently an
important research topic. Since risk management is a complex domain with
large amounts of data, a simple reuse of existing tools is not suitable. Structure
and parameter learning have to be adapted to the present context. Furthermore,
most tools operate on first-order logic and not on description logics, which are
the focus of this thesis.
      </p>
      <p>
        In the context of the semantic web, relational learning has also gained
interest. Initially the approaches focused on learning to create the semantic
web, but there is also increasing research on learning from the semantic web
[
        <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
        ]. Research on ILP techniques in the SW is outlined by Lisi and Esposito
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Rettinger et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] give insight into statistical relational learning in the
semantic web. Several approaches also analyse the clustering of data with background
information [
        <xref ref-type="bibr" rid="ref12 ref14 ref8">14, 12, 8</xref>
        ].
      </p>
      <p>
        In the context of risk management, there is a wide variety of (probabilistic)
quantitative methodologies. However, we concentrate here on Bayesian
approaches in RM because of their ability to incorporate quantitative and
qualitative data as well as their ability to provide useful estimates of model parameters
even when data is sparse. Bayesian networks, which are closely related to PLPs
and PRMs, are often used in risk management [
        <xref ref-type="bibr" rid="ref10 ref2">10, 2</xref>
        ]; in particular, they have
also been used to estimate the loss distribution of operational risks [
        <xref ref-type="bibr" rid="ref1 ref3 ref4">1, 4, 3</xref>
        ].
Despite their success, Bayesian networks are not able to model complex relational
domains appropriately [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. It therefore seems desirable to utilize
statistical relational learning to improve financial risk management, which to the best
of the author's knowledge has not been reported so far.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Approach and Methodology</title>
      <p>The idea is to use a kind of statistical relational model to estimate the loss
distribution of an investment. This means that the different information (entities and
relations), represented e.g. in RDF-S or OWL, is used to learn the
structure and the parameters of this model, which is itself used to estimate the
distribution of risk factor changes such as stock price movements. Several
characteristics make a relational dependency network particularly desirable
as the model. On the one hand, autocorrelation between stocks etc. is present in
financial markets. RDNs have the ability to efficiently represent cycles, which
facilitates reasoning with autocorrelation. On the other hand, the large amount
of data in this domain also requires an efficient approach to learning. RDNs are
an approximate representation of the joint distribution, which leads to significant
efficiency gains.</p>
      <p>There are different criteria of success for the proposed approach. First, it is
important to utilize a suitable framework that combines probability and logic
in the complex domain of financial risk management. Much research has been
done in other application scenarios, which has resulted in different approaches such
as PRMs, BLPs, RDNs etc. The suitability of such a framework depends on the type
of patterns that can be covered and on the computational effectiveness of structure and
parameter learning as well as inference. The author will therefore compare
different statistical relational models in this domain according to these criteria.</p>
      <p>Second, it will be important to gather appropriate data from different sources
to evaluate the approach. The author will evaluate the approach through quantitative
experiments based on different real-world as well as artificial data sets. Artificial
data sets are particularly important because the distribution of the
random variables is known in advance, so the prototype algorithms
can be evaluated according to the patterns they find. Furthermore, the complexity of the
artificial data set can be increased in a stepwise process. Real-world data will be
drawn from the open semantic web<sup>1</sup>, such as geographical data and company
and product information, as well as from other sources such as stock market data,
news and financial statements<sup>2</sup>, which can be transformed through schemas into
the required logical formalisms. The thesis should demonstrate the feasibility as well
as the performance of the approach in comparison to existing RM approaches
on real-world data.</p>
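      <p>The role of artificial data in the evaluation can be illustrated with a small sketch: losses are sampled from a distribution that is fixed in advance, so the true value at risk is known and the error of an estimator can be measured directly. The standard normal loss distribution, the sample sizes, the random seed and the simple empirical quantile estimator are assumptions chosen for this example; they stand in for the relational models and data sets that the thesis would actually evaluate.</p>
      <preformat>
```python
import math
import random
from statistics import NormalDist


def empirical_var(losses, alpha):
    """Empirical alpha-quantile of the sampled losses."""
    ordered = sorted(losses)
    k = math.ceil(alpha * len(ordered))
    return ordered[max(k, 1) - 1]


rng = random.Random(42)                     # fixed seed for repeatability
true_dist = NormalDist(mu=0.0, sigma=1.0)   # assumed artificial loss distribution
alpha = 0.95
true_var = true_dist.inv_cdf(alpha)         # known in advance by construction

# Absolute estimation error for growing artificial data sets.
errors = {}
for n in (100, 1000, 20000):
    sample = [rng.gauss(true_dist.mean, true_dist.stdev) for _ in range(n)]
    errors[n] = abs(empirical_var(sample, alpha) - true_var)
```
</preformat>
      <p>The same pattern extends to the stepwise increase in complexity described above: the generating process is made richer while the ground truth stays available for comparison.</p>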
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>This thesis proposes an original approach to quantitative risk management that
should overcome the methodical limitations of widely used propositional learning
approaches through the application and enhancement of statistical relational
learning. The approach utilizes a representational framework for information
based on semantic web standards, because an increasing amount of relevant
information in this domain is exposed to the semantic web. The semantic web
standards utilize description logic, a subset of first-order logic, and are therefore
a suitable framework for relational learning techniques.</p>
      <p><sup>1</sup> <ext-link ext-link-type="uri" xlink:href="http://linkeddata.org/">http://linkeddata.org/</ext-link></p>
      <p><sup>2</sup> <ext-link ext-link-type="uri" xlink:href="http://www2.reuters.com/productinfo/">http://www2.reuters.com/productinfo/</ext-link></p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><given-names>V.</given-names> <surname>Aquaro</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Bardoscia</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Bellotti</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Consiglio</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>De Carlo</surname></string-name>, and
          <string-name><given-names>G.</given-names> <surname>Ferri</surname></string-name>.
          <article-title>A Bayesian Networks approach to Operational Risk</article-title>.
          <source>Physica A: Statistical Mechanics and its Applications</source>,
          <volume>389</volume>(<issue>15</issue>):<fpage>1721</fpage>–<lpage>1728</lpage>,
          <year>2010</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><given-names>C.E.</given-names> <surname>Bonafede</surname></string-name> and
          <string-name><given-names>P.</given-names> <surname>Giudici</surname></string-name>.
          <article-title>Bayesian Networks for enterprise risk assessment</article-title>.
          <source>Physica A: Statistical Mechanics and its Applications</source>,
          <volume>382</volume>(<issue>1</issue>):<fpage>22</fpage>–<lpage>28</lpage>,
          <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><given-names>Chiara</given-names> <surname>Cornalba</surname></string-name> and
          <string-name><given-names>Paolo</given-names> <surname>Giudici</surname></string-name>.
          <article-title>Statistical models for operational risk management</article-title>.
          <source>Physica A: Statistical Mechanics and its Applications</source>,
          <volume>338</volume>(<issue>1-2</issue>):<fpage>166</fpage>–<lpage>172</lpage>,
          <year>2004</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><given-names>L.</given-names> <surname>Dalla Valle</surname></string-name> and
          <string-name><given-names>P.</given-names> <surname>Giudici</surname></string-name>.
          <article-title>A Bayesian approach to estimate the marginal loss distributions in operational risk management</article-title>.
          <source>Computational Statistics &amp; Data Analysis</source>,
          <volume>52</volume>(<issue>6</issue>):<fpage>3107</fpage>–<lpage>3127</lpage>,
          <year>2008</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><given-names>Rodrigo</given-names> <surname>de Salvo Braz</surname></string-name>,
          <string-name><given-names>Eyal</given-names> <surname>Amir</surname></string-name>, and
          <string-name><given-names>Dan</given-names> <surname>Roth</surname></string-name>.
          <article-title>Lifted First-Order Probabilistic Inference</article-title>.
          In Lise Getoor and Ben Taskar, editors,
          <source>Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning)</source>.
          <publisher-name>The MIT Press</publisher-name>, November
          <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><given-names>Sašo</given-names> <surname>Džeroski</surname></string-name>.
          <article-title>Inductive Logic Programming in a Nutshell</article-title>.
          In Lise Getoor and Ben Taskar, editors,
          <source>Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning)</source>.
          <publisher-name>The MIT Press</publisher-name>, November
          <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
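        <mixed-citation>
          7.
          <string-name><given-names>Sašo</given-names> <surname>Džeroski</surname></string-name> and
          <string-name><given-names>Nada</given-names> <surname>Lavrač</surname></string-name>, editors.
          <source>Relational Data Mining</source>.
          <publisher-name>Springer</publisher-name>,
          <edition>1st</edition> edition, October
          <year>2001</year>.
        </mixed-citation>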
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><given-names>Nicola</given-names> <surname>Fanizzi</surname></string-name>,
          <string-name><given-names>Claudia</given-names> <surname>d'Amato</surname></string-name>, and
          <string-name><given-names>Floriana</given-names> <surname>Esposito</surname></string-name>.
          <article-title>Conceptual Clustering and Its Application to Concept Drift and Novelty Detection</article-title>.
          In Sean Bechhofer, Manfred Hauswirth, Jörg Hoffmann, and Manolis Koubarakis, editors,
          <source>ESWC 2008</source>,
          volume <volume>5021</volume> of Lecture Notes in Computer Science,
          pages <fpage>318</fpage>–<lpage>332</lpage>.
          <publisher-name>Springer</publisher-name>, June 1–5,
          <year>2008</year>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
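        <mixed-citation>
          9.
          <string-name><given-names>Nir</given-names> <surname>Friedman</surname></string-name>,
          <string-name><given-names>Lise</given-names> <surname>Getoor</surname></string-name>,
          <string-name><given-names>Daphne</given-names> <surname>Koller</surname></string-name>, and
          <string-name><given-names>Avi</given-names> <surname>Pfeffer</surname></string-name>.
          <article-title>Learning probabilistic relational models</article-title>.
          In <source>IJCAI</source>,
          pages <fpage>1300</fpage>–<lpage>1309</lpage>.
          <publisher-name>Morgan Kaufmann</publisher-name>,
          <year>1999</year>.
        </mixed-citation>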
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><given-names>Jozef</given-names> <surname>Gemela</surname></string-name>.
          <article-title>Learning Bayesian networks using various datasources and applications to financial analysis</article-title>.
          <source>Soft Computing</source>,
          <volume>7</volume>(<issue>5</issue>):<fpage>297</fpage>–<lpage>303</lpage>,
          <year>2003</year>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><given-names>Lise</given-names> <surname>Getoor</surname></string-name> and
          <string-name><given-names>Ben</given-names> <surname>Taskar</surname></string-name>, editors.
          <source>Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning)</source>.
          <publisher-name>The MIT Press</publisher-name>, November
          <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><given-names>Gunnar Aastrand</given-names> <surname>Grimnes</surname></string-name>,
          <string-name><given-names>Peter</given-names> <surname>Edwards</surname></string-name>, and
          <string-name><given-names>Alun D.</given-names> <surname>Preece</surname></string-name>.
          <article-title>Instance Based Clustering of Semantic Web Resources</article-title>.
          In Sean Bechhofer, Manfred Hauswirth, Jörg Hoffmann, and Manolis Koubarakis, editors,
          <source>ESWC 2008</source>,
          volume <volume>5021</volume> of Lecture Notes in Computer Science,
          pages <fpage>303</fpage>–<lpage>317</lpage>.
          <publisher-name>Springer</publisher-name>, June 1–5,
          <year>2008</year>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name><given-names>Francesca A.</given-names> <surname>Lisi</surname></string-name> and
          <string-name><given-names>Floriana</given-names> <surname>Esposito</surname></string-name>.
          <article-title>An ILP Perspective on the Semantic Web</article-title>.
          In <source>SWAP</source>,
          volume <volume>166</volume> of CEUR Workshop Proceedings.
          <publisher-name>CEUR-WS.org</publisher-name>,
          <year>2005</year>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name><given-names>Alexander</given-names> <surname>Maedche</surname></string-name> and
          <string-name><given-names>Valentin</given-names> <surname>Zacharias</surname></string-name>.
          <article-title>Clustering Ontology-Based Metadata in the Semantic Web</article-title>.
          In Tapio Elomaa, Heikki Mannila, and Hannu Toivonen, editors,
          <source>PKDD</source>,
          volume <volume>2431</volume> of Lecture Notes in Computer Science,
          pages <fpage>348</fpage>–<lpage>360</lpage>.
          <publisher-name>Springer</publisher-name>, August 19–23,
          <year>2002</year>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name><given-names>Alexander J.</given-names> <surname>McNeil</surname></string-name>,
          <string-name><given-names>Rüdiger</given-names> <surname>Frey</surname></string-name>, and
          <string-name><given-names>Paul</given-names> <surname>Embrechts</surname></string-name>.
          <source>Quantitative Risk Management: Concepts, Techniques, and Tools</source>.
          <publisher-name>Princeton University Press</publisher-name>,
          <year>2005</year>.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name><given-names>Jennifer</given-names> <surname>Neville</surname></string-name> and
          <string-name><given-names>David</given-names> <surname>Jensen</surname></string-name>.
          <article-title>Relational Dependency Networks</article-title>.
          In Lise Getoor and Ben Taskar, editors,
          <source>Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning)</source>.
          <publisher-name>The MIT Press</publisher-name>, November
          <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name><given-names>Luc</given-names> <surname>De Raedt</surname></string-name>.
          <source>Logical and Relational Learning (Cognitive Technologies)</source>.
          <publisher-name>Springer</publisher-name>,
          <edition>1st</edition> edition, October
          <year>2008</year>.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name><given-names>Achim</given-names> <surname>Rettinger</surname></string-name>,
          <string-name><given-names>Matthias</given-names> <surname>Nickles</surname></string-name>, and
          <string-name><given-names>Volker</given-names> <surname>Tresp</surname></string-name>.
          <article-title>Statistical Relational Learning with Formal Ontologies</article-title>.
          In <source>ECML/PKDD</source>,
          volume <volume>5782</volume> of Lecture Notes in Computer Science,
          pages <fpage>286</fpage>–<lpage>301</lpage>.
          <publisher-name>Springer</publisher-name>,
          <year>2009</year>.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name><given-names>Gerd</given-names> <surname>Stumme</surname></string-name>,
          <string-name><given-names>Andreas</given-names> <surname>Hotho</surname></string-name>, and
          <string-name><given-names>Bettina</given-names> <surname>Berendt</surname></string-name>.
          <article-title>Semantic Web Mining - State of the Art and Future Directions</article-title>.
          <source>Journal of Web Semantics</source>,
          <volume>4</volume>(<issue>2</issue>):<fpage>124</fpage>–<lpage>143</lpage>,
          <year>2006</year>.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name><given-names>Volker</given-names> <surname>Tresp</surname></string-name>,
          <string-name><given-names>Markus</given-names> <surname>Bundschus</surname></string-name>,
          <string-name><given-names>Achim</given-names> <surname>Rettinger</surname></string-name>, and
          <string-name><given-names>Yi</given-names> <surname>Huang</surname></string-name>.
          <article-title>Towards Machine Learning on the Semantic Web</article-title>.
          In <source>Uncertainty Reasoning for the Semantic Web I</source>,
          volume <volume>5327</volume> of Lecture Notes in Computer Science,
          pages <fpage>282</fpage>–<lpage>314</lpage>.
          <publisher-name>Springer</publisher-name>,
          <year>2008</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>