<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Model for Assisting Business Users along Analytical Processes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Corentin Follenfant</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Trastour</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olivier Corby</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>INRIA Sophia Antipolis Mediterranee</institution>
          ,
          <addr-line>2004 route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SAP Research, SAP Labs France SAS</institution>
          ,
          <addr-line>805 avenue du Dr. Maurice Donat, BP 1216, 06254 Mougins Cedex</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>38</fpage>
      <lpage>41</lpage>
      <abstract>
        <p>User-centric business intelligence aims at empowering analysts who interact with complex tools by allowing them to perform accurate data manipulations and analysis without necessarily requiring IT expertise or knowledge of the underlying data specifications. Recommender systems help ease their tasks, but most of them operate inside walled gardens and cannot properly assist the user throughout the BI workflow. In this paper we introduce a lightweight vocabulary intended to capture fragments of analytical workflows as multidimensional data transformations, within a Semantic Web framework. We use this model to compute content-based recommendations.</p>
      </abstract>
      <kwd-group>
        <kwd>business intelligence</kwd>
        <kwd>content-based recommendation</kwd>
        <kwd>analytical layers</kwd>
        <kwd>usage semantics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Traditional Business Intelligence (BI) platforms provide tools that are designed
to cover a wide range of operations in a data-driven decision-making
workflow. The prerequisite steps concern data extraction, cleansing and integration.
On top of them come what we call analytical processes, which include querying,
analysis and visual data consumption. These operations often require various
technical competencies, for instance SQL expertise and a good understanding
of the underlying relational models. Since the current business landscape rarely
allows users to maintain both technical and analytical profiles, this hinders
decision makers' capacity to leverage the tools to their full potential without
requiring extensive assistance from IT departments.</p>
      <p>
        A common approach to tackling this problem is to provide contextual
assistance through recommender systems that suggest items such as datasets,
business entities, queries or visualizations, depending on the user's current
step in the analytical process. Although those systems are starting to work beyond
the legacy User × Item space and integrate broader contextual information [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
they can hardly be applied to the whole analytical process, as items become
heterogeneous and implicit rating functions become complex.
      </p>
      <p>In this paper we propose an information model based on Semantic Web
technologies, designed to capture the semantics of sequential transformations applied to
multidimensional data structures. We describe a content-based recommendation
use case of this model, where the granularity of items varies from business entities to
aspects of analytical processes, and their utility is computed by arbitrary functions
over their usage statistics.</p>
    </sec>
    <sec id="sec-2">
      <title>Context</title>
      <p>The architecture of BI systems can be split into three layers. First, raw data mainly comes
from operational systems, where it is stored in heterogeneous databases.
Second, Extract, Transform, Load (ETL) and integration processes federate those
sources into data warehouses. Combined with metadata management
components, they expose data through a (hyper)cube model of business entities named
after the users' familiar terminology: measures (factual data, e.g. Revenue) that
can be driven by dimensions (dimensional data, e.g. Country, Year, Store).
Third, analytical processes of end-user applications such as reporting tools
begin by querying the data warehouse layer to retrieve multidimensional data,
before applying transformations and visualizations as the user authors a report.</p>
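      <p>As a minimal illustration of the cube model, the sketch below sums a measure for each combination of values of the chosen dimensions. Only the entity names Revenue, Country, Year and Store come from the text; the fact rows and the helper name aggregate are hypothetical and do not reflect any particular BI platform.</p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical fact rows: each row carries dimension values and one measure.
FACTS = [
    {"Country": "France", "Year": 2011, "Store": "Nice", "Revenue": 120.0},
    {"Country": "France", "Year": 2011, "Store": "Paris", "Revenue": 200.0},
    {"Country": "France", "Year": 2012, "Store": "Paris", "Revenue": 210.0},
    {"Country": "Germany", "Year": 2011, "Store": "Berlin", "Revenue": 150.0},
]


def aggregate(facts, measure, dimensions):
    """Sum `measure` for every combination of values of `dimensions`."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# The same measure "driven by" one dimension, then by two.
by_country = aggregate(FACTS, "Revenue", ["Country"])
by_country_year = aggregate(FACTS, "Revenue", ["Country", "Year"])
```
</preformat>
      <p>Adding a dimension to the list refines the grouping, which is the kind of operation a reporting tool performs when a user drags a new business entity onto a report.</p>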
      <p>
        A number of efforts are devoted to making these tools more usable and
accessible by involving recommender systems for specific steps of analytical processes.
These range from querying the data warehouse [
        <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
        ], to higher-level workflows such
as exploration [
        <xref ref-type="bibr" rid="ref3 ref6">6, 3</xref>
        ]. Assisting users throughout the analytical process requires
a common metamodel to capture multidimensional operations.
      </p>
    </sec>
    <sec>
      <title>An Information Model to Capture Usage Semantics</title>
      <p>The RDF Data Cube vocabulary introduced by Cyganiak et al.<fn><label>3</label><p>http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html</p></fn> is mainly
intended to enable the publication of statistics, and thus provides a metamodel
for multidimensional datasets. To enable a high-level description of the
analytical processes that can be performed within reporting tools, we extend the
vocabulary<fn><label>4</label><p>The RDFS classes and properties of our extension use the mda: prefix, for MultiDimensional Analysis. The qb: prefix refers to the Data Cube vocabulary.</p></fn> as presented in Figure 1. Processes are split into sequences of
multidimensional data transformations that are derived from users' interactions
with the report design tool. Each transformation corresponds to an instance of an
mda:AnalysisLayer subclass. Like a web
service in the OWL-S<fn><label>5</label><p>http://www.ai.sri.com/daml/services/owl-s/1.2/overview/</p></fn> fashion, it consumes and exposes interfaces of
multidimensional data structures, described by qb:DataStructureDefinition individuals,
through the mda:inputStructure and mda:outputStructure properties.
Transformations can thus be interchanged and reused for describing different snippets
(report elements) that share layers of the analytical process. These layers are
connected together through sets of bindings, called assemblies, that plug together the
atoms of the multidimensional structures, namely the business entities represented by
qb:ComponentSpecification individuals.</p>
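      <p>To make the description above more concrete, the following sketch encodes two layers of one snippet as plain subject, predicate, object tuples and checks whether one layer can be stacked on another. It is only an illustration under stated assumptions: the ex: resources and the mda:QueryLayer class name are hypothetical, qb:component is the Data Cube property linking a structure to its components, and matching layers on a shared structure is a simplification that ignores the assemblies and their bindings.</p>
      <preformat>
```python
# Triples are plain (subject, predicate, object) tuples; the ex: resources
# and the mda:QueryLayer class name are hypothetical, for illustration only.
RDF_TYPE = "rdf:type"

GRAPH = {
    # Structure exposed by the query and consumed by the projection.
    ("ex:dsd1", RDF_TYPE, "qb:DataStructureDefinition"),
    ("ex:dsd1", "qb:component", "ex:revenue"),
    ("ex:dsd1", "qb:component", "ex:country"),
    ("ex:revenue", RDF_TYPE, "qb:ComponentSpecification"),
    ("ex:country", RDF_TYPE, "qb:ComponentSpecification"),
    # Structure exposed by the projection.
    ("ex:dsd2", RDF_TYPE, "qb:DataStructureDefinition"),
    ("ex:dsd2", "qb:component", "ex:revenue"),
    # Two layers of one snippet.
    ("ex:query1", RDF_TYPE, "mda:QueryLayer"),
    ("ex:query1", "mda:outputStructure", "ex:dsd1"),
    ("ex:proj1", RDF_TYPE, "mda:ProjectionLayer"),
    ("ex:proj1", "mda:inputStructure", "ex:dsd1"),
    ("ex:proj1", "mda:outputStructure", "ex:dsd2"),
}


def objects(graph, subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?)."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def can_follow(graph, upstream, downstream):
    """True if `downstream` consumes a structure that `upstream` exposes."""
    exposed = objects(graph, upstream, "mda:outputStructure")
    consumed = objects(graph, downstream, "mda:inputStructure")
    return bool(exposed.intersection(consumed))
```
</preformat>
      <p>Here can_follow(GRAPH, "ex:query1", "ex:proj1") holds because the projection consumes the structure that the query exposes, whereas the reverse order does not.</p>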
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>Aiming to provide assistance and reuse capabilities to business users who
consume reports and have authoring expectations, we leverage our model to compute
content-based recommendations. We ran a snippet crawler against a repository
of BI reports in order to harvest the snippets' underlying analysis sequences and
populate an RDF graph with the generated triples. The source is an internal
repository storing 645 reports used to perform analysis on 101 data warehouse models.
A total of 8121 snippets were extracted, each of them split into up to five
layers of transformations over business entities.</p>
      <p>
        Usage statistics are extracted from this graph and then used to
feed the utility functions of a recommender system, for which we identified two
granularities of recommendations. First, basic top-k SPARQL queries can
suggest business entities that are likely to complete a ProjectionLayer, that is,
to add dimensions or measures to a snippet's axis. As opposed to this horizontal
recommendation, the vertical one aims at recommending entire layers of the
analytical process, in order to assist a user in reusing relevant transformations or
visualizations that can be applied on top of a query. To do so, we compute item
similarity measures for AnalyticalLayer individuals that are not already
connected together through assemblies. The similarity measure strategy is adapted
from the Levenshtein distance implemented in the iSPARQL extension [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>We introduced an approach to reuse-oriented modelling of analytical processes with
Semantic Web technologies, which captures the different steps of analysis as
transformations of multidimensional structures. The first use case concerns BI
reporting applications, for which we exemplified our model by triplifying a repository
of report snippets. The graph data resulting from this initial experiment can
be queried for basic usage statistics or content similarity measures with simple
SPARQL aggregates or iSPARQL statements.
      </p>
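      <p>The horizontal recommendation described in the experiments can be approximated by a simple co-occurrence count. The sketch below is a Python stand-in for a top-k aggregate query, not the SPARQL actually run against the graph; the usage data and the function name top_k_entities are hypothetical.</p>
      <preformat>
```python
from collections import Counter

# Hypothetical usage data: business entities found in the ProjectionLayer
# of each harvested snippet.
PROJECTIONS = [
    {"Revenue", "Country", "Year"},
    {"Revenue", "Country"},
    {"Revenue", "Store"},
    {"Margin", "Country", "Year"},
]


def top_k_entities(projections, current, k):
    """Rank entities by how often they co-occur with all of `current`."""
    counts = Counter()
    for entities in projections:
        if current.issubset(entities):
            counts.update(entities - current)
    # Sort by decreasing count, then by name so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Entities most often projected together with Revenue.
suggestions = top_k_entities(PROJECTIONS, {"Revenue"}, 2)
```
</preformat>
      <p>The raw count plays the role of the utility function here; any other function over the usage statistics could be substituted in the sort key.</p>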
      <p>
        Areas of research that we expect will require further investigation include the
formal definition of matching criteria between layers of analytical processes, and
its implementation as a specific similarity strategy for analytical layers' RDF
resources in iSPARQL; and mechanisms to capture or infer the provenance of
the data surfacing in end-user visualizations [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Finally, we will check the
model's genericity by using crawlers for BI applications other than reporting tools,
such as dashboarding or exploration ones. This will enable the representation of
analytical processes composed of transformations and data derived from different
environments, for instance statistical data published according to the RDF Data
Cube vocabulary and external to the data warehouses.
      </p>
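      <p>As a baseline for such a similarity strategy, the Levenshtein distance used for the vertical recommendations can be applied to sequences of layers rather than to strings. The sketch below is a generic implementation written for illustration, not the iSPARQL API; the layer names other than ProjectionLayer and the normalisation to the range 0 to 1 are assumptions.</p>
      <preformat>
```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalise the distance to [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical layer sequences; only ProjectionLayer is named in the text.
SNIPPET_A = ["QueryLayer", "ProjectionLayer", "ChartLayer"]
SNIPPET_B = ["QueryLayer", "FilterLayer", "ProjectionLayer", "ChartLayer"]
```
</preformat>
      <p>The two example snippets differ by a single inserted layer, so a layer-specific strategy could start from this distance and then weight substitutions by how compatible the input and output structures of the layers are.</p>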
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Adomavicius</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuzhilin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>17</volume>
          ,
          <fpage>734</fpage>
          –
          <lpage>749</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chatzopoulou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eirinaki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polyzotis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Query Recommendations for Interactive Database Exploration</article-title>
          .
          <source>In: Proceedings of the 21st International Conference on Scientific and Statistical Database Management</source>
          . pp.
          <fpage>3</fpage>
          –
          <lpage>18</lpage>
          . SSDBM 2009, Springer-Verlag, Berlin, Heidelberg (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jerbi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ravat</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teste</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zurfluh</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Applying Recommendation Technology in OLAP Systems</article-title>
          . In:
          <string-name>
            <surname>Aalst</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mylopoulos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosemann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shaw</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szyperski</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Filipe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cordeiro</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (eds.)
          <source>Enterprise Information Systems. Lecture Notes in Business Information Processing</source>
          , Springer Berlin Heidelberg (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Khoussainova</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kwon</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balazinska</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suciu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>SnipSuggest: Context-aware Autocompletion for SQL</article-title>
          .
          <source>Proc. VLDB Endow.</source>
          <volume>4</volume>
          ,
          <fpage>22</fpage>
          –
          <lpage>33</lpage>
          (October
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kiefer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stocker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>The Fundamentals of iSPARQL: A Virtual Triple Approach for Similarity-Based Semantic Web Tasks</article-title>
          .
          <source>In: The Semantic Web. Lecture Notes in Computer Science</source>
          , Springer Berlin / Heidelberg (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Marcel</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Negre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>A Survey of Query Recommendation Techniques for Datawarehouse Exploration</article-title>
          .
          <source>In: Proceedings of the 7th Conference on Data Warehousing and On-Line Analysis. EDA '11</source>
          (June
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Reisser</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Priebe</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Utilizing Semantic Web Technologies for Efficient Data Lineage and Impact Analyses in Data Warehouse Environments</article-title>
          .
          <source>In: Proceedings of the 2009 20th International Workshop on Database and Expert Systems Application. DEXA '09</source>
          , IEEE Computer Society, Washington, DC, USA (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>