A Model for Assisting Business Users
             along Analytical Processes

         Corentin Follenfant¹,², David Trastour¹, and Olivier Corby²

                      ¹ SAP Research, SAP Labs France SAS
     805 avenue du Dr. Maurice Donat, BP 1216, 06254 Mougins Cedex, France
                           firstname.lastname@sap.com
                     ² INRIA Sophia Antipolis Méditerranée,
       2004 route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex, France
                          firstname.lastname@inria.fr


      Abstract. User-centric business intelligence aims to empower analysts
      who interact with complex tools by allowing them to perform accurate
      data manipulation and analysis without necessarily requiring IT
      expertise or knowledge of the underlying data specifications.
      Recommender systems help ease their tasks, but most of them operate
      inside walled gardens and cannot properly assist users throughout
      their BI workflow. In this paper we introduce a lightweight vocabulary
      intended to capture fragments of analytical workflows as
      multidimensional data transformations within a Semantic Web framework.
      We use this model to compute content-based recommendations.

      Keywords: business intelligence, content-based recommendation, ana-
      lytical layers, usage semantics


1   Introduction

Traditional Business Intelligence (BI) platforms provide tools designed to
cover a wide range of operations in a data-driven decision-making workflow.
The prerequisite steps concern data extraction, cleansing and integration.
On top of them come what we call analytical processes, which include querying,
analysis and visual data consumption. These operations often require various
technical competencies, for instance SQL expertise and a good understanding
of the underlying relational models. Since current business landscapes rarely
allow users to maintain both technical and analytical profiles, decision
makers are hindered from leveraging the tools to their full potential without
requiring extensive assistance from IT departments.
    A common approach to this problem is to provide contextual assistance
through recommender systems that suggest items such as datasets, business
entities, queries or visualizations, depending on the user’s current step in
the analytical process. Although those systems are starting to work beyond
the legacy User × Item space and integrate broader contextual information [1],
they can hardly be applied to the whole analytical process, as items become
heterogeneous and implicit rating functions become complex.
   In this paper we propose an information model based on Semantic Web
technologies, designed to capture the semantics of sequential transformations
applied to multidimensional data structures. We describe a content-based
recommendation use case of this model, in which the granularity of items
varies from business entities to aspects of analytical processes, and their
utility is computed by arbitrary functions over their usage statistics.


2     Context

The architecture of BI systems can be split into three layers. First, raw data
mainly comes from operational systems, where it is stored in heterogeneous
databases. Secondly, Extract, Transform, Load (ETL) and integration processes
federate those sources into data warehouses. Combined with metadata management
components, they expose data through a (hyper)cube model of business entities
named after the users’ familiar terminology: measures (factual data, e.g.
Revenue) that can be driven by dimensions (dimensional data, e.g. Country,
Year, Store). Thirdly, analytical processes of end user applications such as
reporting tools begin by querying the data warehouse layer to retrieve
multidimensional data, before applying transformations and visualizations as
the user authors a report.
    A number of efforts are devoted to making these tools more usable and
accessible by involving recommender systems in specific steps of analytical
processes. These range from querying the data warehouse [2, 4] to higher-level
workflows such as exploration [3, 6]. Assisting users throughout the
analytical process requires a common metamodel to capture multidimensional
operations.
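
The cube model described above, in which measures are driven by dimensions,
can be illustrated with a minimal sketch. It is written in plain Python rather
than against a data warehouse; the fact rows, their values and the aggregate
helper are hypothetical and only reuse the entity names given as examples
(Revenue, Country, Year, Store).

```python
from collections import defaultdict

# Hypothetical fact rows: dimension values plus one measure (Revenue).
FACTS = [
    {"Country": "France", "Year": 2011, "Store": "Nice", "Revenue": 120.0},
    {"Country": "France", "Year": 2011, "Store": "Paris", "Revenue": 300.0},
    {"Country": "Germany", "Year": 2011, "Store": "Berlin", "Revenue": 250.0},
    {"Country": "France", "Year": 2012, "Store": "Nice", "Revenue": 150.0},
]


def aggregate(facts, measure, dimensions):
    """Sum `measure` for each combination of values of `dimensions`."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# The same measure driven by one dimension, then by two.
revenue_by_country = aggregate(FACTS, "Revenue", ["Country"])
revenue_by_country_year = aggregate(FACTS, "Revenue", ["Country", "Year"])
```

Changing the list of dimensions changes the granularity of the result, which
is the kind of multidimensional operation a common metamodel has to describe.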


3     An Information Model to Capture Usage Semantics

The RDF Data Cube vocabulary introduced by Cyganiak et al.³ is mainly intended
to enable the publication of statistics, and thus provides a metamodel for
multidimensional datasets. In order to enable high-level descriptions of
analytical processes that can be performed within reporting tools, we extend
the vocabulary⁴ as presented in Figure 1. Processes are split into sequences
of multidimensional data transformations that are derived from users’
interactions with the report design tool. Each transformation corresponds to
an instance of an mda:AnalysisLayer subclass. Like a web service in the
OWL-S⁵ fashion, it consumes and exposes interfaces of multidimensional data
structures described by qb:DataStructureDefinition individuals through the
mda:inputStructure and mda:outputStructure properties. Transformations can
thus be interchanged and reused for describing different snippets (report
elements) that share layers of the analytical process. These layers are
connected together through sets of bindings, called assemblies, that plug
together the atoms of the multidimensional structures: business entities
represented by qb:ComponentSpecification individuals.

              Fig. 1: Multidimensional Analysis extension outline

³ http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html
⁴ The RDFS classes and properties of our extension use the mda: prefix, for
  MultiDimensional Analysis. The qb: prefix refers to the Data Cube vocabulary.
⁵ http://www.ai.sri.com/daml/services/owl-s/1.2/overview/
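
The notions of layer, structure and assembly described in this section can be
sketched as plain Python data classes. This is an illustration under
assumptions, not the vocabulary itself: the class and function names are
hypothetical stand-ins for the mda: and qb: terms, and binding components by
equal name is a simplification chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """Stands in for a qb:ComponentSpecification (a business entity)."""
    name: str
    kind: str  # "dimension" or "measure"


@dataclass
class Structure:
    """Stands in for a qb:DataStructureDefinition."""
    components: list = field(default_factory=list)


@dataclass
class AnalysisLayer:
    """Stands in for an instance of an mda:AnalysisLayer subclass."""
    label: str
    input_structure: Structure   # mda:inputStructure
    output_structure: Structure  # mda:outputStructure


def assemble(upstream, downstream):
    """Return bindings from upstream outputs to downstream inputs.

    Matching components by name is an assumption of this sketch.
    """
    outputs = {c.name: c for c in upstream.output_structure.components}
    bindings = []
    for needed in downstream.input_structure.components:
        if needed.name not in outputs:
            raise ValueError(f"{downstream.label}: unbound input {needed.name}")
        bindings.append((outputs[needed.name], needed))
    return bindings


country = Component("Country", "dimension")
year = Component("Year", "dimension")
revenue = Component("Revenue", "measure")

# A query layer exposing three entities, consumed by a projection layer.
query = AnalysisLayer("query", Structure(), Structure([country, year, revenue]))
projection = AnalysisLayer("projection", Structure([country, revenue]),
                           Structure([country, revenue]))
bindings = assemble(query, projection)
```

Because a layer only depends on its input and output structures, the same
projection layer could be assembled on top of any other layer that exposes
Country and Revenue.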


4   Experiments

Aiming to provide assistance and reuse capabilities to business users who
consume reports and have authoring expectations, we leverage our model to
compute content-based recommendations. We ran a snippet crawler against a
repository of BI reports in order to harvest the analysis sequences underlying
the snippets and populate an RDF graph with the generated triples. The source
is an internal repository storing 645 reports used to perform analysis on 101
data warehouse models. A total of 8121 snippets were extracted, each of them
split into up to five layers of transformations over business entities.
    Usage statistics are extracted from this graph and then used to feed the
utility functions of a recommender system, for which we identified two
granularities of recommendations. First, basic top-k SPARQL queries can
suggest business entities that are likely to complete a ProjectionLayer, that
is, dimensions or measures to add to a snippet’s axis. As opposed to this
horizontal recommendation, the vertical one aims at recommending entire layers
of the analytical process, in order to help a user reuse relevant
transformations or visualizations that can be applied on top of a query. To do
so, we compute item similarity measures for AnalyticalLayer individuals that
are not already connected together through assemblies. The similarity measure
strategy is adapted from the Levenshtein distance implemented in the iSPARQL
extension [5].
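
Both granularities can be sketched in plain Python, as a stand-in for the
SPARQL and iSPARQL statements mentioned above. Several points are assumptions
of this sketch rather than details taken from the experiment: the sample
snippets, the use of co-occurrence counts as the utility function, the choice
to compare layers as sequences of entity names, and the normalisation of the
distance into a similarity score.

```python
from collections import Counter

# Hypothetical projection layers harvested from past snippets.
SNIPPETS = [
    ["Country", "Revenue"],
    ["Country", "Year", "Revenue"],
    ["Country", "Store", "Revenue"],
    ["Year", "Revenue"],
    ["Country", "Year", "Revenue"],
]


def top_k_entities(snippets, current, k=3):
    """Horizontal: rank entities that co-occur with the current axis."""
    current = set(current)
    counts = Counter()
    for entities in snippets:
        if current <= set(entities):  # snippet contains the current axis
            counts.update(set(entities) - current)
    # Sort by decreasing count, then by name to keep ties deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Vertical: map the distance to [0, 1]; 1.0 means identical layers."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


suggested = top_k_entities(SNIPPETS, ["Country", "Revenue"], k=2)
score = similarity(["Country", "Year", "Revenue"],
                   ["Country", "Store", "Revenue"])
```

Here suggested lists the entities most often added next to Country and
Revenue, while score compares two layers that differ by a single entity.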


5    Conclusion & Future Work

We introduced an approach to reuse-oriented modelling of analytical processes
with Semantic Web technologies, which captures the different steps of analysis
as transformations of multidimensional structures. The first use case concerns
BI reporting applications, for which we exemplified our model by triplifying a
repository of report snippets. The graph data resulting from this initial
experiment can be queried for basic usage statistics or content similarity
measures with simple SPARQL aggregates or iSPARQL statements.
    Areas of research that we expect will require further investigation
include the formal definition of matching criteria between layers of
analytical processes, and its implementation as a specific similarity strategy
for the RDF resources of analytical layers in iSPARQL; and mechanisms to
capture or infer the provenance of the data surfacing in end user
visualizations [7]. Finally, we will check the model’s genericity by using
crawlers for BI applications other than reporting tools, such as dashboarding
or exploration ones. This will enable representing analytical processes
composed of transformations and data derived from different environments, for
instance statistical data published according to the RDF Data Cube vocabulary
and external to the data warehouses.


References
1. Adomavicius, G., Tuzhilin, A.: Toward the Next Generation of Recommender Sys-
   tems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions
   on Knowledge and Data Engineering 17, 734–749 (2005)
2. Chatzopoulou, G., Eirinaki, M., Polyzotis, N.: Query Recommendations for Interac-
   tive Database Exploration. In: Proceedings of the 21st International Conference on
   Scientific and Statistical Database Management. pp. 3–18. SSDBM 2009, Springer-
   Verlag, Berlin, Heidelberg (2009)
3. Jerbi, H., Ravat, F., Teste, O., Zurfluh, G.: Applying Recommendation Technol-
   ogy in OLAP Systems. In: Aalst, W., Mylopoulos, J., Rosemann, M., Shaw, M.J.,
   Szyperski, C., Filipe, J., Cordeiro, J. (eds.) Enterprise Information Systems. Lecture
   Notes in Business Information Processing, Springer Berlin Heidelberg (2009)
4. Khoussainova, N., Kwon, Y., Balazinska, M., Suciu, D.: SnipSuggest: Context-aware
   Autocompletion for SQL. Proc. VLDB Endow. 4, 22–33 (October 2010)
5. Kiefer, C., Bernstein, A., Stocker, M.: The Fundamentals of iSPARQL: A Virtual
   Triple Approach for Similarity-Based Semantic Web Tasks. In: The Semantic Web.
   Lecture Notes in Computer Science, Springer Berlin / Heidelberg (2007)
6. Marcel, P., Negre, E.: A Survey of Query Recommendation Techniques for Dataware-
   house Exploration. In: Proceedings of the 7th Conference on Data Warehousing and
   On-Line Analysis. EDA ’11 (June 2011)
7. Reisser, A., Priebe, T.: Utilizing Semantic Web Technologies for Efficient Data Lin-
   eage and Impact Analyses in Data Warehouse Environments. In: Proceedings of the
   2009 20th International Workshop on Database and Expert Systems Application.
   DEXA ’09, IEEE Computer Society, Washington, DC, USA (2009)

