<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MMBR: a report-driven approach for the design of multidimensional models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonia Azzini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefania Marrara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Maurino</string-name>
          <email>maurino@disco.unimib.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amir Topalovic</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Consorzio per il Trasferimento Tecnologico</institution>
          ,
          <addr-line>C2T, Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universita degli studi di Milano Bicocca Dipartiment of Informatics</institution>
          ,
          <addr-line>Systemistics and Communication Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>83</fpage>
      <lpage>97</lpage>
      <abstract>
        <p>Nowadays, large organizations and regulated markets are subject to the control activity of external audit associations that require huge amounts of information to be submitted in the form of predefined and rigidly structured reports. Compiling these reports requires one to extract, transform and integrate data from several heterogeneous operational databases. This task is usually performed by developing a different ad-hoc and complex piece of software for each report. Another solution involves the adoption of a data warehouse and related tools, which are today well-established technologies. Unfortunately, the data warehousing process is notoriously long and error-prone, and it is therefore particularly inefficient when the output of the data warehouse is a limited number of reports. This article presents MMBR, an approach able to generate a multidimensional model starting from the structure of the reports expected as output of the data warehouse. The approach is able to generate the multidimensional model and to populate the data warehouse by defining a domain-specific knowledge base. Even if using semantic information in data warehousing is not new, the novel contribution of our approach is the idea of simplifying the design phase of the data warehouse, and making it more efficient, by using a domain-specific knowledge base and a report-driven approach.</p>
      </abstract>
      <kwd-group>
        <kwd>multidimensional design</kwd>
        <kwd>knowledge base</kwd>
        <kwd>report driven methodology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Reporting is a fundamental part of business intelligence and knowledge management activity, and it is strongly required by audit organizations. Reporting can be realized in an ad hoc way by means of specific and complex software, or by involving the typical extract, transform and load (ETL) procedures in coordination with a data warehouse. A data warehouse essentially combines information from several heterogeneous sources into one comprehensive database. By combining all of this information in one place, a company can analyze its data in a more holistic way, ensuring that it has considered all the information available. At the basis of a data warehouse lies the concept of a multidimensional (MD) conceptual view of data. The main characteristic of the multidimensional conceptual view of data is the fact/dimension dichotomy, which represents the data in an n-dimensional space. This representation facilitates data interpretation and analysis in terms of facts (the subjects of analysis and related measures) and dimensions, which represent the different perspectives from which a certain object can be analyzed.</p>
      <p>
        Even if the benefits of data warehousing are well recognized by enterprises, it is well known that the warehousing process is time-consuming, complex and error-prone. Today, the increasing reduction of the time-to-market of products forces enterprises to dramatically cut down the time devoted to the design and development of MD models that support the evaluation of the key performance indicators of services and products. Securitization is known in the literature as the financial practice of pooling various types of contractual debt, such as residential mortgages, commercial mortgages, auto loans or credit card debt obligations (or other non-debt assets which generate receivables), and selling the related cash flows to third-party investors as securities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Mortgage-backed securities, which are the case study presented in this paper, are a perfect example of securitization.
      </p>
      <p>Reports for auditing are often very specific, and their structure is usually imposed by the supervising organizations (e.g. the European Central Bank, or the rating agency Moody's). The data included in the report are, in most cases, not useful for decision-making activities due to the "control" nature of these reports. As a consequence, companies are forced to develop complex systems to compute data that are not useful for their business activities. In this situation, there is the need for a new approach able to support the generation of reports in a fast and efficient way. In this scenario we propose to adopt a data warehouse as the storage system for data, but we introduce a new approach aimed at designing the multidimensional models on the basis of the structure of the report itself in a (semi-)automatic way, in order to significantly reduce the time needed to produce the report.</p>
      <p>The MMBR (multidimensional model by report) approach is able to automatically create the structure of a multidimensional model (MD in the following) and fill it on the basis of a knowledge base enriched with mapping information that depends on the specific application context. The preprocessing phase of the report (often a raw Excel file) is based on a table identification algorithm, which is able to extract the information needed to define the MD structure of the data warehouse. The approach has been tested in the context of financial data, with the aim of automatically creating the reports required by the Italian National Bank and by the European Central Bank. The methodology supports the creation of a multidimensional model able to produce a given (set of) report(s). The term "by report" refers to the capability of our solution to create a multidimensional model starting from a given report that must be filled with real data. MMBR is also able to generate the relational data structure related to the created MD, and it is also in charge of filling both fact and dimensional tables thanks to the use of domain ontologies enriched with mapping information to the operational sources. In the literature there are many methodologies for creating MDs starting from requirements, but this is the first attempt to define an approach for creating an MD model starting directly from the structure of the final reports only.</p>
      <p>The remainder of the paper is organized as follows: Section 2 introduces the state of the art. Section 3 presents the proposed approach, while Section 4 describes the knowledge base, which is a key element of the MMBR methodology. In Section 5, the table identification algorithm is presented, while Section 6 describes the creation of the MD models. A real example taken from the financial domain is then reported in Section 7. Conclusions and final remarks are reported in Section 8.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        In the literature, several approaches for creating a conceptual MD schema from heterogeneous data sources have been presented. According to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], these approaches can be classified into three broad groups:
- Supply-driven: starting from a detailed analysis of the data sources, these techniques try to determine the MD concepts. In this way there is the risk of wasting resources by specifying unnecessary information structures, and of not being able to really involve data warehouse users. See for instance [3-5].
- Demand-driven: these approaches focus on determining the MD requirements based on an end-user point of view (as typically performed by other information systems), and mapping them to data sources in a subsequent step (see for example [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]).
- Hybrid approaches: some authors (see for example [8-10]) propose to combine the two previously presented approaches in order to harmonize, in the design of the data warehouse, the data source information with the end-user requirements.
      </p>
      <p>
        All the methodologies available in the literature, however, have the goal of creating an MD model as general as possible, in order to allow the generation of any report. This assumption requires a lot of effort both in the warehouse conceptualization phase and in the ETL procedure design and development. In several industrial contexts, there is the need to produce only a limited number of reports and, sometimes, with a very strict and well-defined structure due to auditing rules or to specific business requirements. In the finance domain, for example, banks are required by central authorities and rating agencies to produce very specific reports related to the securitization activities they perform. In the field of the Semantic Web, Bontcheva and colleagues [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] present an approach for the automatic generation of reports from domain ontologies encoded in Semantic Web standards like OWL. The novel aspects of their so-called "MIAKT generator" lie in the use of the ontology, mainly the property hierarchy, in order to make it easier to connect a generator to a new domain ontology.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], Nebot and colleagues propose an approach in which a Semantic Data Warehouse is considered as a repository of ontologies and other semantically annotated data resources. They then propose an ontology-driven framework to design multidimensional analysis models for Semantic Data Warehouses. This framework provides means for building an integrated ontology, called the Multidimensional Integrated Ontology (MIO), including the classes, relationships and instances representing the analysis developed over dimensions and measures. Romero and colleagues [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] introduce a user-centered approach to support the end-user requirements elicitation and the data warehouse multidimensional design tasks. The authors explain how the feedback of a user is needed to filter and shape the results obtained from analyzing the sources, and eventually produce the desired conceptual schema. In this scenario, they define the AMDO (Automating Multidimensional Design from Ontologies) method, aimed at discovering the multidimensional knowledge contained in the data sources regardless of the users' requirements. Another work aimed at supporting multidimensional schema design is given by [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], in which the authors propose an extension of their previous work [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. They follow a hybrid methodology where the data source and the end-user requirements are reconciled at an early stage of the design process, by deriving only the entities that are of interest for the analysis. The requirements are converted from natural language text into a logical format. The concepts in each requirement are matched to the source ontology and tagged. Then, the multidimensional elements such as facts and dimensions are automatically derived using reasoning.
      </p>
      <p>
        On the other hand, Benslimane and colleagues [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] define a contextual ontology as an explicit specification of a conceptualization, while Barkat [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] proposes a complete and comprehensive methodology to design multi-contextual semantic data warehouses. This contribution aims to provide a context meta-model (language) that unifies the definitions provided in the database literature. This language is considered as an extension of OWL, the standard proposed by the W3C Consortium [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to define ontologies. It is defined by the authors in order to provide a contextual definition of the concepts used, by offering an externalization of the context from the ontology side.
      </p>
      <p>
        Pardillo and colleagues [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] present an interesting approach aimed at describing several shortcomings of current data warehouse design approaches, showing the benefits of using ontologies to overcome them. This work is a starting point for discussing the convenience of using ontologies in data warehouse design. In particular, the authors present a set of situations in which ontologies may help data warehouse designers with respect to some critical aspects.
      </p>
      <p>As also considered in this approach, it is important to underline that domain-specific ontological knowledge allows one to enrich a multidimensional model with aspects that have not been taken into account during the requirement analysis or data-source alignment phases, as well as other aspects, such as the application of statistical functions to aggregate data.</p>
    </sec>
    <sec id="sec-3">
      <title>Description of the approach and outline of the architecture</title>
      <p>The main phases of the MMBR approach are shown in Figure 1: 1) Table Processing (TP), 2) Row and Column Header Identification and Extraction (RCHIE), 3) Ontology Annotation (OA), 4) Management of Non-Identified Labels (MNL), 5) creation of the MD model, and 6) ETL Schema Generation (ETL). The input of the TP phase is the template file that has to be filled with the data extracted from an Operational Data Base (ODB). In the TP phase the preprocessing of the template is performed by removing icons and other figures; moreover, all terms in the schema are lowercased, and comment and description fields are removed.</p>
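<p>As a minimal sketch of this preprocessing step (the grid-of-strings encoding and the comment markers below are illustrative assumptions, not taken from the MMBR implementation), the TP phase can be approximated as follows:</p>

```python
# Sketch of the Table Processing (TP) phase: the template (e.g. a parsed
# Excel sheet) is cleaned by dropping comment/description fields and
# lowercasing every remaining term. The markers are hypothetical.

COMMENT_MARKERS = ("comment:", "description:", "note:")

def preprocess_template(grid):
    """Return a cleaned copy of a grid (list of rows of cell strings)."""
    cleaned = []
    for row in grid:
        new_row = []
        for cell in row:
            text = (cell or "").strip()
            # drop comment and description fields
            if text.lower().startswith(COMMENT_MARKERS):
                text = ""
            new_row.append(text.lower())  # all terms are lowercased
        cleaned.append(new_row)
    return cleaned

print(preprocess_template([["Loan", "Comment: internal"], ["TOTAL", ""]]))
# → [['loan', ''], ['total', '']]
```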
      <p>The RCHIE phase is based on the table identification algorithm, aimed at identifying and extracting the row and column headers in the template. The details of the table identification algorithm are presented in Section 5.</p>
      <p>The list of terms recognized in the reports by the table identification algorithm is then annotated on the basis of a knowledge base (see Section 4). This phase produces two lists: the first one is the list of identified terms annotated w.r.t. the knowledge base, while the second one is the list of terms that are not annotated. There are several possible reasons for the failure of the annotation activity. The most frequent reason is that a given term may not be included in the knowledge base because it is not relevant to the domain (e.g. "Total"). It is also possible that a term is not annotated because it is a composition of different terms (such as "MortgageLoan" or "DelinquentLoan", described in the case study section). Moreover, some terms are written in a language different from English (e.g. "garantito", which means guaranteed in Italian). In all these cases, non-annotated terms are manually checked and, if relevant, added to the ontology by defining the corresponding rdf:label property. The annotated list of terms is the input for the creation of the dimensional fact model (see Section 6). This logical model is finally translated into a relational star schema. In this phase the relational database is filled with data coming from the ODB. This activity is performed on the basis of the mapping rules included in the knowledge base, and is fully described in Section 4.</p>
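<p>The annotation step can be sketched as a label lookup against the KB (the label table below is a toy stand-in for the real knowledge base, using the GoodRelations identifiers mentioned in Section 4; the Python encoding is an assumption for illustration):</p>

```python
# Sketch of the Ontology Annotation (OA) phase: each extracted term is
# matched against the rdf:label values attached to KB component
# properties. KB_LABELS is a hypothetical stand-in for the real KB.

KB_LABELS = {
    "brandname": "gr:Brand",                    # dimension
    "nameofproduct": "gr:Brand",
    "unitprice": "gr:UnitPriceSpecification",   # measure
}

def annotate(terms):
    """Split terms into (annotated, not_annotated) lists, as in MMBR."""
    annotated, unknown = [], []
    for t in terms:
        concept = KB_LABELS.get(t.lower())
        if concept:
            annotated.append((t, concept))
        else:
            unknown.append(t)  # e.g. "Total", compound or non-English terms
    return annotated, unknown
```

A call such as `annotate(["BrandName", "Total"])` yields one annotated pair and leaves "Total" in the second list for manual checking.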
      <p>The architecture supporting the MMBR approach is represented in Figure 2.</p>
      <p>The Annotation Editor is in charge of the first three phases of the MMBR approach, removing non-relevant strings and images from the input file (e.g. logos, comments), and identifying the terms that are annotated w.r.t. the KB. The Schema Builder is the software component aimed at creating the logical relational description of the MD model. The ETL Generator is in charge of extracting, on the basis of the knowledge base, the information necessary to create the extraction-transformation-load procedures that move data from the ODB to the data warehouse. Finally, the Knowledge Base Manager is in charge of managing and evolving the knowledge base. Any popular tool, for instance Protégé (https://protege.stanford.edu/), may be used for the KB creation.</p>
    </sec>
    <sec id="sec-4">
      <title>MMBR Knowledge base</title>
      <p>At the core of the proposed approach lies the creation of the knowledge base KB, which includes:
- the set of MD concepts and relations (fact, dimensions, measures, attributes);
- the list of terms adopted in the specific application domain (e.g. e-commerce, bank securitization, ...);
- the ODB schema.</p>
      <p>In order to create a sharable knowledge base, we started from existing ontological descriptions and created new concepts only when no ontology was available. A simplified version of the Data Cube vocabulary (https://www.w3.org/TR/vocab-data-cube/), a W3C recommendation for modeling multidimensional data, is used to define the MD concepts. The top-level representation of the defined KB is shown in Figure 3.</p>
      <sec id="sec-4-2">
        <title>MD concepts and mapping rules</title>
        <p>The MD concepts are organized as follows. A fact (the event that is the target of a report, e.g., a sale in an e-commerce domain, a loan in the bank domain) is described by a set of measures and can be analyzed by considering its dimensions and descriptive attributes. In the Data Cube vocabulary, dimensions, measures and descriptive attributes are described by instances of the component property concept. Dimensions, measures and descriptive attributes are terms of the application domain, and they are defined by the human (domain) expert through the knowledge base. In fact, the KB annotation specifies whether a KB component refers to a fact, a measure or a dimension. Such elements are then compared with each label extracted from the Excel file in order to define the facts, measures and dimensions of the corresponding model. In order to build a KB related to the e-commerce domain, it is possible, for example, to use concepts described in the GoodRelations vocabulary (http://www.heppnetz.de/projects/goodrelations/). In this scenario, instances of qb:DimensionProperty are gr:ProductOrService and gr:Brand, while instances of qb:MeasureProperty are gr:UnitPriceSpecification and gr:amountOfThisGood. If no vocabulary is available, a new, ad-hoc vocabulary has to be defined first (as also reported in Section 7).</p>
        <p>The concept qb:ComponentProperty can have one or more rdf:label properties associated with it, which represent the references to the instances of the target concept. For example, the dimension gr:Brand may be labeled as "NameOfProduct" or "BrandName". During the annotation phase, labels are used to associate terms of the report with the application domain concepts.</p>
        <p>In order to populate the MD model it is necessary to know how the qb:ComponentProperties are described in the operational DB. This mapping is described in the KB itself, by means of the c2t:mappingRule concept, which associates a c2t:mappingFormula with a given instance of qb:ComponentProperty. The c2t:mappingFormula contains a reference to some tables of the ODB and a query predicate over their tuples.</p>
        <p>For example, in a bank scenario we can assume that the TLoan table of the ODB contains all the information related to loans. A loan with a fixed rate (i.e., a loan where the interest rate on the note remains the same through the term of the loan) can be represented in the ODB by the predicate InterestRate=1, while a floating rate can be described by the predicate InterestRate&gt;1.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Table identification</title>
        <p>The c2t:mappingFormula thus includes the reference to the TLoan table and the predicate regarding InterestRate.</p>
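<p>The way such a rule could be held in memory can be sketched as follows (the Python structure and the function name are illustrative assumptions; the fixed/floating-rate predicates follow the example above):</p>

```python
# Sketch: a c2t:mappingRule ties a component property to an ODB table
# and a query predicate over its tuples. The dataclass encoding is a
# hypothetical stand-in for the RDF representation in the KB.
from dataclasses import dataclass

@dataclass
class MappingRule:
    component: str   # instance of qb:ComponentProperty
    table: str       # ODB table referenced by the formula
    predicate: str   # SQL predicate over the table's tuples

RULES = [
    MappingRule("loan:FixedRate", "TLoan", "InterestRate = 1"),
    MappingRule("loan:FloatingRate", "TLoan", "InterestRate > 1"),
]

def formula_for(component):
    """Render the mapping formula of a component as a SQL query."""
    for r in RULES:
        if r.component == component:
            return f"SELECT * FROM {r.table} WHERE {r.predicate}"
    return None
```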
        <p>The concept c2t:context in Figure 3 comes into play when reports provided by different audit authorities have different mapping formulas for the same qb:ComponentProperty. For example, a given audit authority may classify a company as "small" if the number of employees does not reach 10, while for another authority a company is small if it has fewer than 15 employees. In this case we will have two different c2t:MappingFormula instances.</p>
        <p>Reports are usually represented by tables that can be divided into different areas, according to their structure. Thus, being able to identify the inner structure of the table is important to find the concepts relevant to the MD model generation. As discussed in the introduction, the multidimensional model represents the data in an n-dimensional space; under this perspective, each report can be considered as one of the possible hyperplanes slicing the n-dimensional cube of data. To represent this hyperplane in a two-dimensional table it is necessary to reduce the dimensions. In Figure 4 the MD is composed of three dimensions (time, nations and type of sold goods) that are "flattened" into a two-dimensional space by associating the values of the type of sold goods (Food and Non-Food) with the nation dimension. Under this assumption, row and column headers may contain dimensions, values of dimensions and measures of the MD.</p>
        <p>In the RCHIE phase a table is assumed to be composed of three types of cells: textual, data and schema cells. Figure 5 shows the general schema. The cell identifiers are represented by the pair &lt;X, Y&gt;, as reported in the table shown in the figure.</p>
        <p>The table may contain several types of cells, as defined in the following:
- textual-cell: this cell is not used for table annotation; these cells are shown in grey in Figure 5, and they may contain simple text.
- data-cell: it contains data that are computed on the basis of the MD model. These cells are shown in white in the figure.
- schema-cell: it specifies properties over a set of data-cells, and is shown in dark grey in the figure. This cell defines the header h = &lt;x, y&gt; of a set of data cells, by specifying some semantic aspects (i.e., the measure or a value of a dimension).</p>
        <p>Rows and columns are identified in order to extract the labels corresponding, respectively, to measures, dimensions, instances of the dimensions, etc., and to discard non-relevant information such as the TOTAL value shown in Figure 6. These labels represent the input of the annotation phase, which produces the annotated list of terms as output.</p>
        <p>
          In the literature, different table identification algorithms aimed at handling table structures have been proposed [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]; in our work the focus is on identifying and removing multi-spanning cells. An example is reported in Figure 6, where one of the reports related to securitization is shown. The Stub Header details information w.r.t. the measures Loan and Outstanding Principal for different types of companies, such as Corporate, SME and "Impresa" (which refers to retail companies in the Italian jargon). Measures, names and instances of dimensions are placed in the Box Header and/or the Stub areas as headers, and they are used to index the elements located in the Body area of the table. The Stub Header may also contain a header naming or describing the dimensions located in the Stub. The result of the table identification algorithm is shown in Figure 7, where all data-cells are semantically associated with their row and column headers.
        </p>
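<p>The spanning-cell removal can be sketched on a toy grid encoding (a merged header is represented here as its value followed by None placeholders; this encoding and the function names are illustrative assumptions, not the actual algorithm):</p>

```python
# Sketch of multi-spanning-cell removal: a spanning header value is
# propagated to the cells it covers, so every data cell ends up with a
# fully qualified column header.

def unspan_headers(header_row):
    """Fill a merged (spanning) header value into the cells it covers."""
    filled, last = [], ""
    for cell in header_row:
        if cell is not None:
            last = cell
        filled.append(last)
    return filled

def headers_for(box_header_rows):
    """Combine stacked Box Header rows into one label per column."""
    rows = [unspan_headers(r) for r in box_header_rows]
    return [" / ".join(p for p in col if p) for col in zip(*rows)]

# e.g. a Box Header where "Performing" spans two measure columns:
print(headers_for([["Performing", None],
                   ["Loan", "Outstanding Principal"]]))
# → ['Performing / Loan', 'Performing / Outstanding Principal']
```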
        <p>Finally, the RCHIE phase extracts a list of unique terms appearing in the column and row headers. These terms are then annotated by means of the knowledge base, by matching the labels related to the application domain concepts against the terms extracted from the report table.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>MD creation and population</title>
      <p>
        The list of annotated terms and the KB are the only two elements needed to design and populate the MD. The Dimensional Fact Model (DFM) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] approach is used to describe the MD model. Each annotated term of the list is enriched with its type or subclass in order to understand whether it is a measure, a dimension or an instance of a dimension. This can be realized by means of a set of SPARQL (https://www.w3.org/TR/rdf-sparql-query/) queries over the KB (an example query is shown in Section 7). With this information it is possible to create the DFM and the corresponding logical relational schema by means of the original methodology proposed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The relational schema is then populated according to the mapping information defined in the knowledge base.
      </p>
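<p>The classification performed by these queries can be sketched with a toy in-memory triple list (the triples mirror the loan:Guarantee example of Section 7; the Python pattern-matching stand-in is an assumption, not the actual SPARQL engine):</p>

```python
# Sketch: classify an annotated term as a measure or dimension by
# following rdf:type and rdfs:subClassOf, as the SPARQL query does.
# The triples are illustrative.

TRIPLES = [
    ("loan:Guarantee", "rdf:type", "loan:GuaranteeCategory"),
    ("loan:GuaranteeCategory", "rdfs:subClassOf", "qb:DimensionProperty"),
    ("loan:OutstandingPrincipal", "rdf:type", "qb:MeasureProperty"),
]

def objects(s, p):
    """All objects o such that (s, p, o) is in the KB."""
    return [o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p]

def classify(term):
    """Return 'dimension', 'measure' or None for an annotated term."""
    for t in objects(term, "rdf:type"):
        supers = objects(t, "rdfs:subClassOf") + [t]
        if "qb:DimensionProperty" in supers:
            return "dimension"
        if "qb:MeasureProperty" in supers:
            return "measure"
    return None
```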
      <p>All dimensional tables are populated with the instances defined in the KB, while the fact table is populated in a two-step procedure. In the first step, all instances of the facts (e.g. sales or loans) are selected from the ODB, taking into account only the measures available in the annotated list. The second step is in charge of connecting the fact table with the dimensional tables: an UPDATE query is executed to associate each instance of the fact table with the instances of the dimension tables. Even in this case the KB plays a strategic role, since it allows the extraction of the mapping formulas at the basis of the SPARQL queries (see Section 7).</p>
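<p>The second step can be sketched as a small SQL generator (the table, column and predicate names echo the Section 7 example; the generator itself is an illustrative assumption, not the actual ETL component):</p>

```python
# Sketch: from a mapping rule (ODB table + predicate) build the UPDATE
# that links matching fact rows to a dimension instance.

def dimension_update_sql(dim_column, dim_value, table, predicate):
    """Build the UPDATE that tags matching fact rows with a dimension id."""
    return (
        f"UPDATE Fact SET {dim_column} = {dim_value} "
        f"WHERE Fact.id IN (SELECT id FROM odb.{table} WHERE {predicate})"
    )

sql = dimension_update_sql(
    "id_Guarantee_category", 1, "TLoan",
    "VAL_IPOTECA = 0 AND (flag_garanzia_confidi = 'Y' "
    "OR (importo_pegno + importo_garan_pers) > 0)")
print(sql)
```

One such statement is generated per (dimension value, mapping formula) pair extracted from the KB.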
    </sec>
    <sec id="sec-6">
      <title>Case Study</title>
      <p>The scenario motivating the definition of a report-driven approach for the design of multidimensional models is related to the financial domain. In particular, the reporting activity of securitization was analyzed.</p>
      <p>When applying the MMBR approach in this context, the first activity performed is the generation of the domain KB and vocabulary. The literature offers two different vocabularies that partially describe the loan domain: FIBO (https://www.edmcouncil.org/financialbusiness) and Schema.org (https://schema.org). FIBO, the Financial Industry Business Ontology, contains the loan term definitions without any further specification. Schema.org does not contain a fully exhaustive specification of the securitization domain, as it includes only the LoanOrCredit concept (https://schema.org/LoanOrCredit). The KB defined in this work to describe the securitization domain is an ontology called OntoLoan. During the KB definition, domain experts were in charge of defining the main terms and concepts. The OntoLoan ontology is not freely available, since it is covered by the company's intellectual property. However, the top level of OntoLoan is shown in Figure 8.</p>
      <p>Figure 9 shows an example of a securitization report. Note that all personal data related to the bank owning the report have been removed for privacy reasons, while the values for different kinds of loans are reported.</p>
      <p>The term Performing Loan refers to all those loans with no overdue interest payments, or with unpaid installments due but still under the limit on the number of days of delay (which changes according to the securitization contract terms). Delinquent Loan refers to loans close to default, i.e., with unpaid installments due to a delay in payments close to the limit on the number of days overdue. Defaulted Loan then refers to loans with significant delays in payments.</p>
      <p>Each kind of loan is further divided according to other features, generating the definitions of Mortgage Loan; Guaranteed Loan, i.e. loans insured not by mortgages but by other guarantees (e.g., pledges); and, finally, Unguaranteed Loan, i.e. loans that are not insured.</p>
      <p>The first phase of the MMBR approach removes text fields that do not carry relevant information from the report. An example of removed text is the string "A. PORTFOLIO OUTSTANDING BALANCE". The annotation tool removes the cell spanning, starting from the table of Figure 9 and arriving at the table structure shown in Figure 10. The data-cell in position &lt;3, 3&gt; represents the aggregation of the values of Outstanding Principal of loans that are both performing and able to pay off the loan even in case of default of the borrower.</p>
      <sec id="sec-6-2">
        <title>Term annotation and MD generation</title>
        <p>The value in the cell at position &lt;3, 4&gt; represents the aggregation of the Outstanding Principal of loans that are both performing and guaranteed.</p>
        <p>With this first activity, the following list of terms related to the domain is extracted: loan:Performing, loan:Mortgage, loan:Guaranteed, loan:Unguaranteed, loan:Delinquent, loan:Defaulted, loan:DelinquentInstalments, loan:OutstandingPrincipal, loan:AccruedInterest, loan:PrincipalInstalment, loan:InterestInstalment. For each element of this list, MMBR retrieves from the KB the name of the dimensions or measures related to it, by means of SPARQL queries. An example query is the following:
SELECT DISTINCT ?x ?p
WHERE {
  loan:Guarantee rdf:type ?x .
  ?x rdfs:subClassOf ?p
}</p>
        <p>
          The example query is able to recognize, as shown in Figure 8, that loan:Guarantee is a member of an entity named GuaranteeCategory, which is a subclass of qb:DimensionProperty. Figure 8 also shows the query properties. After the identification of measures and dimensions, the DFM is designed as shown in Figure 11, according to the approaches already published in the literature [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] for the schema definition.
        </p>
        <p>The DFM is then translated into a relational schema, whose instance is created in a relational DBMS as described in Section 6.</p>
        <p>In order to update the fact table, it is possible to retrieve the mapping formula from the KB by means of a SPARQL query. For example, to update the guaranteed loans, we first recover from the KB the corresponding mapping formula by using the following query:
SELECT ?table ?rule
WHERE {
  ?s rdf:type loan:MappingRule .
  ?s loan:hasContext loan:context1 .
  ?s loan:hasTargetDimension loan:Guarantee .
  ?s loan:refersToTable ?table .
  ?s loan:hasMappingFormula ?rule .
}</p>
        <sec id="sec-6-2-1">
          <title>The result is the following predicate:</title>
          <p>TLoan
VAL_IPOTECA = 0 and (flag_garanzia_confidi='Y' or
(importo_pegno + importo_garan_pers) &gt; 0)</p>
        </sec>
        <sec id="sec-6-2-2">
          <title>The corresponding update query using IBM DB2 SQL is:</title>
          <p>UPDATE Fact
SET id_Guarantee_category=
(SELECT Guarantee
FROM Fact JOIN odb.TLoan ON Fact.id = odb.TLoan.id
WHERE VAL_IPOTECA = 0 AND
(flag_garanzia_confidi='Y' OR (importo_pegno + importo_garan_pers) &gt; 0)
)</p>
        </sec>
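        <p>The end-to-end step, applying the recovered mapping formula to populate the guarantee dimension in the fact table, can be sketched as follows. SQLite stands in for DB2 here, so the odb schema qualifier is dropped; the sample rows, the category id, and the IN-based rewrite of the update are our own assumptions, while the column names and the predicate follow the mapping formula in the text.</p>
```python
import sqlite3

# Hypothetical end-to-end sketch: source loan table, fact table, and the
# update driven by the mapping formula retrieved from the KB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TLoan (
    id INTEGER PRIMARY KEY,
    VAL_IPOTECA REAL,
    flag_garanzia_confidi TEXT,
    importo_pegno REAL,
    importo_garan_pers REAL
);
CREATE TABLE Fact (
    id INTEGER PRIMARY KEY,
    id_Guarantee_category INTEGER
);
-- Sample data (our assumption): loan 1 satisfies the formula, loan 2 does not.
INSERT INTO TLoan VALUES (1, 0, 'Y', 0, 0);
INSERT INTO TLoan VALUES (2, 100, 'N', 0, 0);
INSERT INTO Fact (id) VALUES (1), (2);
""")

GUARANTEED = 1  # assumed id of the 'Guaranteed' category row

# The WHERE clause is exactly the mapping formula recovered from the KB.
conn.execute("""
UPDATE Fact
SET id_Guarantee_category = ?
WHERE id IN (
    SELECT id FROM TLoan
    WHERE VAL_IPOTECA = 0
      AND (flag_garanzia_confidi = 'Y'
           OR (importo_pegno + importo_garan_pers) > 0)
)
""", (GUARANTEED,))

rows = dict(conn.execute("SELECT id, id_Guarantee_category FROM Fact"))
# Loan 1 is tagged as guaranteed; loan 2 is left untouched.
```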
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion and Future Work</title>
      <p>This work presents a "multidimensional model by report" (MMBR) approach
supporting the creation of multidimensional models able to produce a given (set
of) report(s). The term "by report" refers to the ability to create a
multidimensional (MD) model starting from a given report (typically expressed as a Microsoft
Excel file) that has to be filled with data extracted from a set of heterogeneous
sources. Important contributions are the automatic generation of the relational
data structure correlated to the MD models generated by the approach, and the
ability to fill both fact and dimensional tables on the basis of domain ontologies
enriched with mapping information related to the data sources. There are
several possible directions for future research. The first is the definition of
an approach for the automatic computation of aggregates of data according to
the topological position of the cells that contain them, by taking into account
row and column headers. Another interesting research activity will study how to
enrich the table identification algorithm. The aim is to allow the management
of a larger (w.r.t. the current algorithm) number of report types, improving
the efficiency of the presented approach.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Simkovic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Competition and crisis in mortgage securitization</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Winter</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strauch</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A method for demand-driven information requirements analysis in data warehousing projects</article-title>
          .
          <source>In: 36th Hawaii International Conference on System Sciences (HICSS-36 2003), CD-ROM / Abstracts Proceedings, January 6-9, 2003, Big Island, HI, USA</source>
          , IEEE Computer Society (
          <year>2003</year>
          )
          <fpage>231</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Golfarelli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maio</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rizzi</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>The dimensional fact model: A conceptual model for data warehouses</article-title>
          .
          <source>Int. J. Cooperative Inf. Syst</source>
          .
          <volume>7</volume>
          (
          <issue>2</issue>
          -3) (
          <year>1998</year>
          )
          <fpage>215</fpage>
          -
          <lpage>247</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Golfarelli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graziani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rizzi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Starry vault: Automating multidimensional modeling from data vaults</article-title>
          . In Pokorny, J.,
          <string-name>
            <surname>Ivanovic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thalheim</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saloun</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , eds.
          <source>: Advances in Databases and Information Systems - 20th East European Conference, ADBIS 2016</source>
          , Prague, Czech Republic,
          <source>August 28-31</source>
          ,
          <year>2016</year>
          , Proceedings. Volume
          <volume>9809</volume>
          of Lecture Notes in Computer Science., Springer (
          <year>2016</year>
          )
          <fpage>137</fpage>
          -
          <lpage>151</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Blanco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Guzman</surname>
            ,
            <given-names>I.G.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez-Medina</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trujillo</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>An architecture for automatically developing secure OLAP applications from models</article-title>
          .
          <source>Information &amp; Software Technology</source>
          <volume>59</volume>
          (
          <year>2015</year>
          )
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Jovanovic</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simitsis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mayorova</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>A requirement-driven approach to the design and evolution of data warehouses</article-title>
          .
          <source>Inf. Syst</source>
          .
          <volume>44</volume>
          (
          <year>2014</year>
          )
          <fpage>94</fpage>
          -
          <lpage>119</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Prat</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akoka</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Comyn-Wattiau</surname>
            ,
            <given-names>I.:</given-names>
          </string-name>
          <article-title>A uml-based data warehouse design method</article-title>
          .
          <source>Decision Support Systems</source>
          <volume>42</volume>
          (
          <issue>3</issue>
          ) (
          <year>2006</year>
          )
          <fpage>1449</fpage>
          -
          <lpage>1473</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nabli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feki</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gargouri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Automatic construction of multidimensional schema from OLAP requirements</article-title>
          .
          <source>In: 2005 ACS / IEEE International Conference on Computer Systems and Applications (AICCSA 2005), January 3-6, 2005, Cairo, Egypt</source>
          , IEEE Computer Society (
          <year>2005</year>
          )
          <fpage>28</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Giorgini</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rizzi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garzetti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>GRAnD: A goal-oriented approach to requirement analysis in data warehouses</article-title>
          .
          <source>Decision Support Systems</source>
          <volume>45</volume>
          (
          <issue>1</issue>
          ) (
          <year>2008</year>
          )
          <fpage>4</fpage>
          -
          <lpage>21</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Blanco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Guzman</surname>
            ,
            <given-names>I.G.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez-Medina</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trujillo</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>An MDA approach for developing secure OLAP applications: Metamodels and transformations</article-title>
          .
          <source>Comput. Sci. Inf. Syst.</source>
          <volume>12</volume>
          (
          <issue>2</issue>
          ) (
          <year>2015</year>
          )
          <fpage>541</fpage>
          -
          <lpage>565</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Bontcheva</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilks</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <source>In: Automatic Report Generation from Ontologies: The MIAKT Approach</source>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2004</year>
          )
          <fpage>324</fpage>
          -
          <lpage>335</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Nebot</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berlanga</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aramburu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Multidimensional integrated ontologies: A framework for designing semantic data warehouses</article-title>
          .
          <source>Journal on Data Semantics XIII</source>
          (
          <year>2009</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A framework for multidimensional design of data warehouses from ontologies</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          <volume>69</volume>
          (
          <issue>11</issue>
          ) (
          <year>2010</year>
          )
          <fpage>1138</fpage>
          -
          <lpage>1157</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Thenmozhi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vivekanandan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>An ontology based hybrid approach to derive multidimensional schema for data warehouse</article-title>
          .
          <source>International Journal of Computer Applications</source>
          <volume>54</volume>
          (
          <issue>8</issue>
          ) (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Thenmozhi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vivekanandan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A framework to derive multidimensional schema for data warehouse using ontology</article-title>
          .
          <source>In: Proceedings of National Conference on Internet and WebSevice Computing</source>
          , NCIWSC. (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Benslimane</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Falquet</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maamar</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thiran</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gargouri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          . In: Contextual Ontologies. Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2006</year>
          )
          <fpage>168</fpage>
          -
          <lpage>176</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Barkat</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khouri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bellatreche</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boustia</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Bridging context and data warehouses through ontologies</article-title>
          .
          <source>In: Proceedings of the Symposium on Applied Computing</source>
          , ACM (
          <year>2017</year>
          )
          <fpage>336</fpage>
          -
          <lpage>341</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>18. W3C Standard Consortium, http://www.w3.org</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Pardillo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazon</surname>
            ,
            <given-names>J.N.</given-names>
          </string-name>
          :
          <article-title>Using ontologies for the design of data warehouses</article-title>
          .
          <source>International Journal of Database Management Systems (IJDMS)</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          ) (
          <year>2011</year>
          )
          <fpage>73</fpage>
          -
          <lpage>87</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Zanibbi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blostein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cordy</surname>
            ,
            <given-names>J.R.:</given-names>
          </string-name>
          <article-title>A survey of table recognition: Models, observations, transformations, and inferences</article-title>
          .
          <source>International Journal on Document Analysis and Recognition</source>
          <volume>7</volume>
          (
          <issue>1</issue>
          ) (Mar
          <year>2004</year>
          )
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Sugumaran</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Storey</surname>
            ,
            <given-names>V.C.</given-names>
          </string-name>
          :
          <article-title>Ontologies for conceptual modeling: their creation, use, and management</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          <volume>42</volume>
          (
          <issue>3</issue>
          ) (
          <year>2002</year>
          )
          <fpage>251</fpage>
          -
          <lpage>271</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>