<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generating Multidimensional Schemas from the Semantic Web</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oscar Romero</string-name>
          <email>oromero@lsi.upc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Abelló</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitat Politècnica de Catalunya</institution>
          ,
          <addr-line>Jordi Girona 1-3, E-08034 Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <fpage>69</fpage>
      <lpage>72</lpage>
      <abstract>
        <p>In this paper, we introduce a semi-automatable method aimed at finding the business multidimensional concepts from an ontology representing the organization domain. With these premises, our approach falls into the Semantic Web research area, where ontologies play a key role in providing a common vocabulary that describes the meaning of relevant terms and the relationships among them.</p>
      </abstract>
      <kwd-group>
        <kwd>OLAP</kwd>
        <kwd>Multidimensional Design</kwd>
        <kwd>Ontologies</kwd>
        <kwd>Semantic Web</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>OLAP (On-Line Analytical Processing) tools are intended to ease information
analysis and navigation, from a multidimensional perspective, through the
business data previously integrated in a huge repository of data, the Data
Warehouse (DW). Although multidimensional DWs are typically designed manually
by DW experts, a few works automating the design of multidimensional databases
have been presented in recent years. However, all these approaches start from
a detailed analysis of the data sources to determine the multidimensional
concepts in a reengineering process, and all of them assume a relational OLTP
(On-Line Transaction Processing) system as their starting point.</p>
      <p>We introduce a semi-automatable method aimed at finding the business
multidimensional concepts from an ontology representing the organization or business
domain. With these premises, our approach falls into the Semantic Web research
area. This approach raises new challenges with regard to traditional modeling,
so the multidimensional design process needs to be reconsidered. Mainly,
we cannot provide the method with end-user requirements to guide the process,
since we are working over external (maybe unknown) data and, a priori, the user
does not know what kind of information will be available. Moreover, we cannot
perform massive data mining over all existing instances, because of its
complexity, nor assume that data sources are implemented over relational
databases, as many traditional methods do. Instead, we need to focus on the
ontologies representing the knowledge contained in those sites, narrow and guide
the DW design process with the knowledge captured in the ontologies and, at most,
extract missing knowledge from samples of data taken from some known sites.</p>
      <p>Section 2 discusses the related work presented in the literature,
underlining the automatable approaches. Section 3 sets the foundations of our
method, which is presented in section 4. Finally, section 5 concludes this article.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        In the literature, partially automated approaches to the design of multidimensional
DWs ([
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) and those fully automating the process ([
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) always start
from a thorough analysis of the relational sources to determine the
multidimensional concepts in a reengineering process. Two general restrictions
shared by all these methods make them unsuitable for multidimensional design
over the Web: (1) they all work exclusively over relational sources, and
(2) they work at table granularity. That is, each table in the relational
sources is assigned a fact or a dimension role as a whole, overlooking its
attributes; yet, as discussed in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a table, a relationship or even an attribute may
play a fact or dimension role. Hence, in these methods, the attributes within
each table are considered as a whole, since the methods cannot work at finer
granularities. Consequently, they need a certain degree of normalization in the
relational schema to work properly. Working with ontologies allows us to
overcome these two inherent restrictions.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Problem Context</title>
      <p>In this section we define the context of the problem introduced and
point out the criteria our method is based on, that is, the criteria allowing
us to identify multidimensional concepts. Multidimensionality pays attention to
two main aspects: the placement of data in a multidimensional space and the correct
summarizability of data. Therefore, our method looks for meaningful conceptual
schemas with orthogonal Dimensions fully functionally determining Facts, and
free of summarizability problems.</p>
      <p>Bearing in mind that the input of our method is an ontology, we also assume
the following premises: (1) The ontology is expressed in an ontology language
providing basic reasoning services such as subsumption, allowing us to work with
taxonomies of concepts. For instance, OWL (Web Ontology Language), a W3C
recommendation, fits our purposes. (2) We have a mapping between
the ontology concepts and the data sources. In the Semantic Web area this
mapping is supposed to exist; for instance, preserving the concept names in
the implementation would be enough.</p>
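      <p>As a rough illustration of premise (1), the following Python sketch answers subsumption queries over a toy taxonomy. It is only a sketch under simplifying assumptions: the concept names and the <monospace>SUBCLASS_OF</monospace> table are hypothetical, and it merely closes asserted subclass axioms transitively, whereas an OWL reasoner would also infer subsumptions from class definitions.</p>
      <preformat>
```python
# Hypothetical toy taxonomy: concept -> directly asserted superconcepts.
SUBCLASS_OF = {
    "OnlineSale": ["Sale"],
    "Sale": ["Transaction"],
    "Refund": ["Transaction"],
}


def ancestors(subclass_of, concept):
    """Return every concept reachable from `concept` via asserted subclass links."""
    result = set()
    stack = [concept]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def is_subsumed_by(subclass_of, sub, sup):
    """True if `sup` subsumes `sub` according to the asserted taxonomy only."""
    return sub == sup or sup in ancestors(subclass_of, sub)


if __name__ == "__main__":
    print(is_subsumed_by(SUBCLASS_OF, "OnlineSale", "Transaction"))
    print(is_subsumed_by(SUBCLASS_OF, "Refund", "Sale"))
```
      </preformat>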
    </sec>
    <sec id="sec-4">
      <title>Our Method</title>
      <p>In this section we present a schematic view of our method, which consists
of three well-differentiated steps (see figure 1):</p>
      <p>First step: It looks for potential subjects of analysis (i.e. Facts). We consider
a concept to be a potential subject of analysis if it is related to as many
potential Dimensions (analysis perspectives) and Measures (factual data)
as possible. Our aim is therefore twofold:</p>
      <list list-type="bullet">
        <list-item>
          <p>Discover potential analysis Dimensions: According to
multidimensionality, each instance of data must be identified (i.e. placed in the
multidimensional space) by a point in each of its analysis Dimensions. In our
approach, to identify potential analysis Dimensions we look for
concepts that are functionally determined by a given concept (the potential
Fact). To carry out this step we suggest taking advantage of the
reasoning services provided by ontology languages to point out Dimensions
automatically.</p>
        </list-item>
        <list-item>
          <p>Point out Measures: Typically, Measures are numeric facts
allowing data aggregation. We consider any numeric datatype to be a
Measure of a given Fact if it preserves a correct aggregation of data.</p>
        </list-item>
      </list>
      <p>At the end of this step, we ask the user to choose his/her subjects of
interest among those concepts proposed by the method as potential Facts (for
instance, in figure 1, the user would have disregarded the proposed Fact2).
The remaining steps are carried out once for each subject of analysis
identified. Consequently, each Fact gives rise to a multidimensional
conceptual schema.</p>
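      <p>The first step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the method: the <monospace>Concept</monospace> class, the toy <monospace>ONTOLOGY</monospace> and all property names are hypothetical; to-one (functional) properties are assumed to be already flagged instead of being derived by a reasoner; and every numeric attribute is accepted as a Measure without checking that it aggregates correctly.</p>
      <preformat>
```python
from dataclasses import dataclass, field

NUMERIC_TYPES = {"integer", "decimal", "float", "double"}


@dataclass
class Concept:
    name: str
    # Object properties: property name -> (target concept, is_functional).
    relations: dict = field(default_factory=dict)
    # Datatype properties: property name -> datatype name.
    attributes: dict = field(default_factory=dict)


# Hypothetical toy ontology, used for illustration only.
ONTOLOGY = {
    "Sale": Concept(
        "Sale",
        relations={
            "ofProduct": ("Product", True),
            "atStore": ("Store", True),
            "onDate": ("Date", True),
        },
        attributes={"amount": "decimal", "quantity": "integer", "note": "string"},
    ),
    "Product": Concept(
        "Product",
        relations={"inCategory": ("Category", True)},
        attributes={"label": "string"},
    ),
    "Store": Concept("Store", relations={"inCity": ("City", True)}),
    "Category": Concept("Category", relations={"groups": ("Product", False)}),
    "City": Concept("City"),
    "Date": Concept("Date"),
}


def potential_dimensions(ontology, start):
    """Concepts functionally determined by `start` (chains of to-one properties)."""
    found = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for target, is_functional in ontology[current].relations.values():
            if is_functional and target != start and target not in found:
                found.add(target)
                stack.append(target)
    return found


def potential_measures(ontology, name):
    """Numeric datatype properties of a concept (aggregation is not verified)."""
    attrs = ontology[name].attributes
    return {prop for prop, dtype in attrs.items() if dtype in NUMERIC_TYPES}


def rank_facts(ontology):
    """Rank concepts by how many potential Dimensions and Measures they reach."""
    scored = []
    for name in ontology:
        score = len(potential_dimensions(ontology, name))
        score += len(potential_measures(ontology, name))
        scored.append((name, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for name, score in rank_facts(ONTOLOGY):
        print(name, score)
```
      </preformat>
      <p>In this toy ontology the Sale concept ranks first, so it would be the first potential Fact proposed to the user.</p>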
      <p>Second step: It points out sets of concepts likely to be used as a Base for each
Fact identified in the previous step. We call Base any minimal set of
Levels that fully functionally determines a Fact. Bases must contain
orthogonal (i.e. functionally independent) Dimensions, and a set of potential
Dimensions is considered a feasible Base if it is able to identify
all the instances of a Fact. In short, we look for concepts able
to univocally identify objects of analysis (i.e. to univocally place data in the
multidimensional space).</p>
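      <p>One possible reading of the second step, assuming a sample of Fact instances is available, is a search for the minimal sets of Dimensions whose values are never repeated in the sample. The Python sketch below illustrates that idea only; the <monospace>SALES</monospace> rows and the function names are hypothetical. A sample can refute a candidate Base but cannot prove it, so the result would still need to be confirmed against the ontology or by the user.</p>
      <preformat>
```python
from itertools import combinations

# Hypothetical sample of Fact instances; each row gives one value per Dimension.
SALES = [
    {"product": "p1", "store": "s1", "date": "d1", "city": "Barcelona"},
    {"product": "p1", "store": "s2", "date": "d1", "city": "Madrid"},
    {"product": "p2", "store": "s1", "date": "d1", "city": "Barcelona"},
    {"product": "p1", "store": "s1", "date": "d2", "city": "Barcelona"},
    {"product": "p1", "store": "s3", "date": "d1", "city": "Barcelona"},
]


def identifies_all(rows, dims):
    """True if no two rows share the same combination of values on `dims`."""
    seen = set()
    for row in rows:
        key = tuple(row[d] for d in dims)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_bases(rows, dimensions):
    """Minimal sets of dimensions that identify every row of the sample."""
    bases = []
    for size in range(1, len(dimensions) + 1):
        for combo in combinations(sorted(dimensions), size):
            # Skip supersets of an already found Base: they are not minimal.
            if any(set(base).issubset(combo) for base in bases):
                continue
            if identifies_all(rows, combo):
                bases.append(combo)
    return bases


if __name__ == "__main__":
    print(candidate_bases(SALES, ["product", "store", "date", "city"]))
```
      </preformat>
      <p>Minimality also discards, on the sample, any Dimension that is determined by the other members of the set: here city is left out because store, product and date already identify every row.</p>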
      <p>Third step: In this step we build the Dimension hierarchies in order to
allow summarizability of data, one of the multidimensionality principles. In
our approach, for every concept identified as a Dimension, we shape its
hierarchy of Levels from those concepts related to it by typical
whole-part relationships (i.e. one-to-many relationships) or, as known in OLAP,
“Roll-up” relationships.</p>
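      <p>The third step can be pictured as a traversal of to-one relationships starting at each Dimension. The following Python sketch, given as an illustration under simplifying assumptions, enumerates the roll-up paths of a concept; the <monospace>TO_ONE</monospace> table and its concept names are hypothetical, and the relationships are assumed to have been already recognized as whole-part ones.</p>
      <preformat>
```python
# Hypothetical whole-part links: part concept -> concepts it rolls up to.
TO_ONE = {
    "Store": ["City"],
    "City": ["Country"],
    "Day": ["Month", "Week"],
    "Month": ["Year"],
}


def rollup_paths(to_one, dimension):
    """Enumerate every roll-up path of Levels that starts at `dimension`."""
    paths = []

    def walk(path):
        # Ignore concepts already on the path so that cycles cannot loop forever.
        parents = [p for p in to_one.get(path[-1], []) if p not in path]
        if not parents:
            paths.append(path)
            return
        for parent in sorted(parents):
            walk(path + [parent])

    walk([dimension])
    return paths


if __name__ == "__main__":
    for start in ("Store", "Day"):
        for path in rollup_paths(TO_ONE, start):
            print(" -> ".join(path))
```
      </preformat>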
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper we have introduced a semi-automatable method to point out
multidimensional concepts from an ontology representing the business domain.
Unlike traditional approaches, which work exclusively over relational sources, our
approach is able to integrate information from heterogeneous data sources that
describe their domain through ontologies. One of the most promising areas in
which to apply our method is the Semantic Web.</p>
      <p>Acknowledgments. This work has been partly supported by the Spanish
Ministerio de Educación y Ciencia under project TIN 2005-05406.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Golfarelli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maio</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rizzi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>The Dimensional Fact Model: A Conceptual Model for Data Warehouses</article-title>
          .
          <source>Int. Journal of Cooperative Information Systems (IJCIS)</source>
          <volume>7</volume>
          (
          <issue>2-3</issue>
          ) (
          <year>1998</year>
          )
          <fpage>215</fpage>
          -
          <lpage>247</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Phipps</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>K.C.</given-names>
          </string-name>
          :
          <article-title>Automating Data Warehouse Conceptual Schema Design and Evaluation</article-title>
          .
          <source>In: Proc. of 4th Int. Workshop on Design and Management of Data Warehouses (DMDW'02)</source>
          . Volume 58 of CEUR Workshop Proceedings., CEUR-WS.org (
          <year>2002</year>
          )
          <fpage>23</fpage>
          -
          <lpage>32</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holmgren</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>T.B.</given-names>
          </string-name>
          :
          <article-title>Discovering Multidimensional Structure in Relational Data</article-title>
          .
          <source>In: 6th Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK'04)</source>
          . Volume 3181 of LNCS., Springer (
          <year>2004</year>
          )
          <fpage>138</fpage>
          -
          <lpage>148</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abelló</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Multidimensional Design by Examples</article-title>
          .
          <source>In: Proc. of 8th Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK 2006)</source>
          . Volume 4081 of LNCS., Springer (
          <year>2006</year>
          )
          <fpage>85</fpage>
          -
          <lpage>94</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cabibbo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torlone</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A Logical Approach to Multidimensional Databases</article-title>
          .
          <source>In: Proc. of 6th Int. Conf. on Extending Database Technology (EDBT 1998)</source>
          . Volume 1377 of LNCS., Springer (
          <year>1998</year>
          )
          <fpage>183</fpage>
          -
          <lpage>197</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>