=Paper= {{Paper |id=Vol-1603/10000044 |storemode=property |title=Eliciting Information Requirements for DW Systems |pdfUrl=https://ceur-ws.org/Vol-1603/10000044.pdf |volume=Vol-1603 |authors=Deepika Prakash |dblpUrl=https://dblp.org/rec/conf/caise/Prakash16 }} ==Eliciting Information Requirements for DW Systems== https://ceur-ws.org/Vol-1603/10000044.pdf
    Eliciting Information Requirements for DW systems
                                     Deepika Prakash
                        Delhi Technological University, Delhi, India
                                dpka.prakash@gmail.com
                               Supervisor: Dr. Daya Gupta

      Abstract: Data Warehouse technology addresses business process analysis.
      However, it ignores upstream decision-making like formulating policies and
      policy enforcement rules. We provide a requirements engineering approach for
      building an integrated data for all types of decisions in an organization. To do
      this, we develop a two level generic platform with the bottom level having
      generic models of decisions, information, and decision-information association
      as well as information elicitation techniques for eliciting information for
      decisions. The higher level is the source of decision for the lower layer and is
      exemplified by policy enforcement rule decisions as well as operational
      decisions for managing the business process. Each source produces its own data
      warehouse requirements specification and these are integrated together using
      our integration technique.

      Keywords: Data Warehouse, Requirements Engineering, Early Information,
      Decision, Policy enforcement rule, Data Warehouse Integration

1        Introduction
Data Warehouse (DW) failure statistics highlight the crucial role of RE in mitigating
system failure [1]. Hayen [2] refers to studies that indicate the typical cost of a DW
project to be one million dollars in the very first year. However, one-half to two-third
of these projects fail. One of the causes of this failure [1] is inadequate determination
of the relationship of the DW with strategic business requirements. These statistics
reinforce the need for RE for DW systems.

DWRE techniques identify DW structures like facts, dimensions and finally arrive at
star schema. DW structures are identified from existing systems, information gathered
from users’ of the DW or a combination of the two. DWRE techniques are classified
into three broad categories based on the nature and phases of the process. By nature
they can be top-down, bottom-up or mixed and by phases they can have a single-
derivation phase or multiple phases in which the process is performed.

A more detailed comparison of top-down, bottom-up and mixed approach is as
follows:
I.         Bottom-Up or Supply driven approaches assume that existing, already
available, information, needs to only be converted into the multi-dimensional form.
Thus, the starting point is existing data bases and data sources. Desired facts and
dimensions are then imposed on these available sources. There are two basic
approaches, (a) Database Driven approach that starts from existing databases [3], and
(b) ER schema driven approach that starts from ER schema [4].




                                             1
ER driven techniques have been criticised on several grounds. According to [5]
“Entity relation models cannot be used for enterprise data warehouses”. Information
is limited to what has been captured by the ER diagram. These techniques do not give
primary importance to the users’ perspective [6].

II.         Top-Down or Demand driven approaches determine information
contents of a DW To-Be from scratch. These approaches directly adopted model
driven techniques developed in software/information systems RE like goal-orientation
and scenario-orientation. User driven and Goal driven approaches of DWRE belong
to this category.

Some User driven approaches include techniques developed by [7, 8]. However, it has
been observed that some users may not be able to describe their requirements [9].
Users do not see their organization from a “broad angle” and so the requirements are
“narrow” [10].

Goal driven approaches [9], [13] suffer from the inherent limitation of goal orientation.
Firstly, goals are fuzzy concepts. [11] points out that goal are “informal and
incomplete” and “difficult to precisely define”. GORE is subjective, dependent on the
requirements engineer view of the real world from where goals are identified [12].
Further, the process of goal reduction is unguided.


III.      Mixed Driven Approaches In purely demand driven techniques, the
information needed for decision making may not necessarily be available in existing
data sources, whereas in purely supply driven techniques decision making may
require information outside of that available in existing data sources. This led to the
development of mixed driven techniques where the needed information was
identified and the available data was determined.

According to the approach of [6], there is a change of perspective required that views
nodes of a goal hierarchy as goals in the first perspective and as decisional
alternatives later. This treats all alternatives uniformly and deals with ‘what is to be
achieved’. In the approach of [14], there is little guidance on what questions to ask
even though the metrics determined are critically dependent on these questions.

We notice the following drawbacks currently facing DWRE:

1.   Lack of DW support for upstream decision-making in an organization:
          According to [15], the primary concern of data warehouse technology is to
     provide support to decision makers for managing business processes better. Thus,
     the focus is on “what to do next” type of decisions that are operational in nature.
     Information support for operational decision-making is provided to all levels in
     an organization.

         In [16], there is compartmentalization of operational, business analytics and
     content analytics in separate modules. The authors recognize multiple levels of




                                           2
     decision makers for long term goals. In terms of decision making, the nature
     continues to be operational.

          OMG, in its Business Motivation Model [17] conceptualizes a business in
     terms of policies and directives that govern their enforcement. This suggests yet
     another classification of decisions that is based on the nature of the task to be
     carried out, namely, policy formulation, determination of policy enforcement
     rules, operational decisions. Notice that the first two of these are upstream to the
     third and not supported by DW technology. Thus, there is a need to develop
     specific techniques for these as well.

2.  Limited understanding of the Decision-Information Link :
         Decisional and Information perspectives have been introduced by [6] [14]
    respectively. However, we find that
(a) The relationship between the notions of decision and information is not fully
    explored. Thus, the decision-information association is left un-articulated and
    remains implicit. This inhibits a full investigation into what information is needed
    for which decision and vice-versa.
(b) DWRE does not take into account the structure of a decision and the semantic
    notions underlying decisions. The former means that it is not possible to adopt
    model driven requirements engineering leading to relatively poor guidance in the
    elicitation task. The latter implies that the conceptual basis for adopting the
    notion of a decision itself remains weak.
(c) Information models are assumed to be multi-dimensional in nature. This leads to
    an emphasis on determining facts and dimensions at the expense of determining
    information properties like required aggregations and historical information
    needs. As for decisions, this implies that only partial guidance can be provided in
    the information elicitation task.

          There is a need to explicitly model the decision-information relationship and
     treat both decision and information as first class concepts of DWRE.

3.   Limited techniques specific to Information Elicitation:
          DWRE techniques are highly oriented towards arriving at information in the
     form of Facts, Dimensions and Measures. This is either done directly without
     analyzing information and or without sufficiently exploring information before
     structuring it. Techniques like [6] belong to the former class and techniques like
     [3], [13, 14], [18] to the latter. Even though some investigation of information
     was done with Information scenarios of [13] there is no guidance provided for
     developing these scenarios.

          While arriving at MD structures is essential, it is equally important to elicit,
     examine and analyze information that is unstructured and also to elicit
     information in a guided manner.

To sum up, there is need to treat decision and information as first class concepts of
RE models, develop decision and information models for conceptual clarity and




                                            3
effective guidance, and to lay emphasis on eliciting early, unstructured information
before arriving at multi-dimensional structures.

With three inter-related DW systems, policy formulation, policy enforcement, and
operational, there may be common data across them. Therefore, need for integrating
these arises. Existing approaches do star schema/data mart integration by identifying
conformed dimensions. However, this implies long lead times due to first arriving at
the star schemas and then integrating them. Requirements may change during this
period and the integrated system may be out of step with desired one.

We can now define the following research questions:
   1. What are the different kinds of decisions, the applications from which they
        originate and their inter-relationship?
   2. Can we define information elicitation techniques that are neutral to the types
        of decisions?
   3. Having determined information relevant to decisions of each application, do
        we keep separate Data Warehouses application-wise or do we maintain one
        integrated form?

2.      Solution Approach
In order to answer the research questions, we propose a solution divided into the
following steps:

2.1.    Defining upstream and operational decisions and the decision continuum

We start by establishing the ‘Decision Environment’ and derive a typology of
decisions. We define two broad categories of decisions: Imperative decisions and
Managerial decisions. Managerial decision-making is upstream and is of two kinds.
One kind of Managerial decisions’ is those that deal with formulation of norms and
standards that are to be followed in organizations. These decisions are Policy
Decisions. The other kind are concerned with the enforcement of given policies. The
decision problem here is that of defining an appropriate set of rules that the
organization will follow during its operations. These decisions are Policy
Enforcement Rule (PER) Decisions. Imperative decisions are derived from policy
enforcement rules and consist of operational actions. The imperative decision
making problem is that of selecting the most appropriate action in a given
situation such that it also does not violate policy enforcement rules.

Imperative decisions can be taken once Policy enforcement rule decisions are taken.
Policy enforcement rule decisions are taken once policy decisions have been taken.
Thus, there is a continuum between policy, policy enforcement rule decisions and
operational decisions.

Policy formulation is done in a number of contexts with varied stakeholders [19, 20],
shows dependence on related policies [21] and consensus building [20]. Since policy
formulation is a many-facetted and complex task, we have left it for a separate




                                         4
investigation. Therefore, in this thesis, we assume a policy representation system and
consider DWRE for PER decisions and operational decisions only.


2.2.      Developing a Generic Platform
Our RE process is rooted in Decision Requirement which we model as a tuple
. Decision Requirement implies that RE process has two
steps, first to determine the choice set of decisions and then to elicit information to
choose one from the choice set.

This and our treatment of decision and information as first class concepts leads us a
two level generic platform. The bottom of Fig 1 shows the generic platform having
generic models of Decision Requirement, decision and information. Information
elicitation techniques for decisions are defined in this layer. The higher layer is the
source of decisions for this lower layer. Decisions are obtained from PER formulation
and operational decisions. This means that the lower layer is neutral to the source of
decisions and can be used for any kind of decisions. It is generic.



            PER Formulation technique                          Operational

                             Decision Source: Higher Layer




                             Generic Decision Requirement Model



               Generic Decision Model               Generic Information Model


                         Generic Platform for Decision Information

                                Fig 1: The Generic Platform

For the bottom layer, we propose three information elicitation techniques namely,
CSFI, ENDSI and MEANSI as a set of generic techniques for eliciting information for
the DW to-be. The elicited information has to be converted to multi-dimensional form
and conforms to a generic information model developed in the Generic Platform.

For the higher layer, we propose to develop RE approaches for our two sources of
decisions. Once this is done, an integration process has to be developed to arrive at
the integrated DW.

2.3.     Building an integrated DW




                                              5
Our process begins with first developing Policy Enforcement Rule DW and
operational DW. For the former, we propose a PER life cycle thus address the issue of
providing support for strategic decision making. For the latter we propose the
operational life cycle. Now the remaining question is to integrate the two DWs into
one comprehensive DW for the entire organization. .This makes the integrated DW a
single enterprise resource that supports ALL forms of decision making. The
integration is done using our Vertical integration technique.

Let us consider the RE process in each individual life cycle.

PER Life Cycle

PER life cycle creates a DW for PER decision making, DWper, with a rule base and
information in its own operational early information base. The input to the PER life
cycle is operational policies. The PER life cycle has two parts; one is to formulate
rules and the other to elicit relevant information. We assume that organizational
policies are represented in our extended first order logic. Thereafter, we propose
guidelines to arrive at PERs. PERs are of the form “WHEN triggering action IF
condition THEN correcting action”. In general there can be more than one correcting
action for a given condition and the decision maker must formulate the appropriate set
of correcting actions. In order words, the decision maker works with the choice set
{select action, modify action, reject action}. Choosing one of these constitutes the
decision problem.

We now need to elicit information that the decision maker will refer to in the decision
making task of choosing the appropriate correcting action from the choice set. The
generic early information elicitation approaches namely ENDSI, MEANSI and CSFI,
defined in the platform of Fig 1 are used. This constitutes early information, EIper,

EIper can be:
     1.       a source of information for the vertical integration life cycle
     2.       converted to late, structured information
For the latter we have proposed guidelines to convert early information into ER
diagram. Subsequently, ER can be converted to star schema by applying existing
algorithms of [4]. DWper thus obtained can be used by PER level decision maker.

We applied our process to AYUSH policies consisting of 151 policies. The RE
methodology was implemented in a tool called ELISPE and it was used to elicit the
required information.


Operational Life Cycle

Operational life cycle creates a DW for operational decision making, DWop. Actions
are first extracted from PERs and each action, a, of the PER layer is treated as a
decision, d, to be taken at the operational level. This yields the initial set of decisions
and is the input to the operational life cycle.




                                            6
Applying the generic decision model mentioned in Fig 1, each decision is subjected to
AND/OR decomposition and generalization-specialization process and leaf nodes are
determined. The next task is to elicit information for every decision using CSFI,
ENDSI, and MEANSI. The elicited information is EIop.

Again, EIop, can be either be converted to ER diagram, for which we propose
guidelines, or can be used as a source of early information for vertical integration. In
the case of the former, once the ER diagram is converted to star schema we obtain a
stand-alone DWop.

Vertical Integration Life Cycle

We found that there are two problems with keeping separate Data Warehouses. These
arise if there is common data in them.
      Difference in refresh cycles between DWper and DWop cause common data
           to have differen values in the two DWs. Thus, rule formulators and
           operational decision makers end up taking decisions on different data in this
           temporal window. The larger this window, the longer this inconsistency
           exists.
      Loss of business control occurs when data of an operational DW calls for
           decision makers of the policy enforcement DW to take decisions, but the
           decisions are not taken because daa in the latter do not suggest this need.
Thus, integration is required to maintain compatibility between PER and operational
level.

We show that there are in fact two forms of integration that can exist, horizontal and
vertical. While the former integrates data marts at the same level of decision-making,
the latter integrates data marts across PER and operational levels. For vertically
integrating DWper and DWop, we propose a ‘build by integrating’ approach that
works pair-wise pair-wise. When a new data mart is to be built, its requirements
specification is integrated with an existing one. The integrated requirements
specification then goes through the development cycle. Thus, the point of integration
is moved upstream into the requirements stage. In other words, early information is
integrated. The advantages of integrating upstream and in a pair-wise fashion are:
      downstream development effort is minimized.
      it never allows un-integrated data marts to be built. Thus, pre-empting our
          two problems.
      A complete logical DW is available for decision making.

Integration is done as a four-step process, Metadata reading, Correspondence
Drafting, Information Mapping and Conflict Resolving. The integrated early
information obtained is then converted into ER schema and finally into star schema.

Through vertical integration, an integrated DW is obtained that can be used for both
forms of decision making.




                                           7
3        Contribution of research work
A summary of the contributions made is as follows:
1. Addressing full decisional making continuum: DW support has been extended
    from providing just operational support to providing policy enforcement and
    operational support.

2.   Elicited Information can be traced back to members of the choice set thereby
     facilitating decision making. In the case of PERs the choice set is {select A,
     modify A, delete A} where A is an action and {select A1, select A2, select A3}
     for operationa decisions. For each alternative, information is elicited, thus
     relating information to a particular member of the choice set.

3.   Discovery of early information: Information is obtained in a two-step process,
     early information elicitation step, using ENDSI, MEANSI and CSFI and late
     information elicitation step where early information is converted into ER diagram
     which is subsequently converted to a star schema.

4.   Development of a requirements integration approach that pre-empts the problems
     of inconsistency in decision-making and loss of business control.

4        Conclusion and future work
This work addresses the issue of providing support to different kinds of decisional
needs in a unified, enterprise wide DW system. A two level generic platform is
proposed with generic models at the bottom level and decision sources at the higher
level.
We develop RE process to arrive at separate DWs for the PER and operational
decisions respectively. The two DWs are integrated upstream in the requirements
engineering phase. The integrated requirements specification is then converted into
multi-dimensional form.
Future work includes developing a policy life cycle for eliciting information for
policy decisions and integrating it with ‘lower’ levels of the decision continuum.


5        References
1.   Alshboul, R. (2012). Data Warehouse Explorative Study. Applied Mathematical
     Sciences, 6(61), 3015-3024
2.   Hayen R., Rutashobya C., Vetter D., (2007), An Investigation Of The Factors
     Affecting Data Warehousing Success, Issues In Information Systems, Volume
     VIII, No. 2, 547-553, 2007
3.   Golfarelli M., Maio D., Rizzi S. (1998, January). Conceptual Design of Data
     Warehouses from E/R schemes. In System Sciences, 1998., Proceedings of the
     Thirty-First Hawaii International Conference on (Vol. 7, pp. 334-343). IEEE.




                                          8
4.  Moody L.D., and Kortink M.A.R. (2000), From Enterprise Models to
    Dimensional Models: A Methodology for Data Warehouses and Data Mart
    Design, Proc. of the Intl Workshop on Design and Management of Data
    Warehouses, Stockholm, Sweden, (pp. 5.1-5.12)
5. Kimball, R. (1996): The Data Warehouse Toolkit, New York: J. Wiley & Sons.
6. Giorgini, P., Rizzi, S., Garzetti, M. (2005). Goal-oriented requirement analysis for
    data warehouse design. In Proceedings of the 8th ACM international workshop on
    Data warehousing and OLAP (pp. 47-56). ACM.
7. Winter, R., and Strauch, B. (2003, January). A method for demand-driven
    information requirements analysis in data warehousing projects. In System
    Sciences, 2003. Proceedings of the 36th Annual Hawaii International Conference
    on (pp. 9-pp). IEEE.
8. Bruckner, R., List, B., &Scheifer, J. (2001). Developing requirements for data
    warehouse systems with use cases. AMCIS 2001 Proceedings, 66.
9. Boehnlein, M., and Ulbrich vom Ende, A. (2000). Business Process Oriented
    Development of Data Warehouse Structures. In Proceedings of Data Warehousing
    2000 (pp. 3- 21). PhysicaVerlag HD
10. List, B., Bruckner, R. M., Machaczek, K., & Schiefer, J. (2002, January). A
    comparison of data warehouse development methodologies case study of the
    process warehouse. In Database and Expert Systems Applications (pp. 203-215).
    Springer Berlin Heidelberg.
11. Horkoff, J., and Yu, E. (2010). Interactive analysis of agent-goal models in
    enterprise modeling. International Journal of Information System Modeling and
    Design (IJISMD), 1(4), 1-23.
12. Haumer, P., Pohl, K., and Weidenhaupt, K. (1998). Requirements elicitation and
    validation with real world scenes. Software Engineering, IEEE Transactions
    on,24(12), 1036-1054.
13. Prakash N., and Gosain A. (2008). An Approach to Engineering the
    Requirements of Data Warehouses. Requirements Engineering Journal, Springer,
    13 (1), 49-72
14. Bonifati A.. Cattaneo F., CeriS., A. Fuggetta, and S. Paraboschi (2001).
    Designing Data Marts for Data Warehouses. ACM Trans. Software. Eng.
    Methodology, 10(4). (pp. 452–483).
15. Adamson (2010) The complete reference: Star Schema. Tata McGraw-Hill
16. Imhoff, C., & White, C. (2008). Full Circle: Decision Intelligence (DSS 2.0). B-
    Eye-Network, Published: August, 27.
17. BRG, 2010 BRG, Business Rules Group (2010), The Business Motivation Model:
    Business governance in a volatile world, Release 1.4, July 2010
18. Prakash, N., and Bhardwaj, H. (2014). Functionality for Business Indicators in
    Data Warehouse Requirements Engineering. In Advances in Conceptual
    Modeling (pp. 39-48). Springer International Publishing.
19. Lindbloom C.E. (1993), Woodhouse E.J., 3 rd edition, Prentice Hall, 1993
20. Ritchie J.R.B. (1988), Consensus policy formulation in tourism: Measuring
    resident views via survey research, Tourism Management, 9,3, 199-212, 1988
21. Park Y. T. (2000), National systems of Advanced Manufacturing Technology
    (AMT): hierarchical classification scheme and policy formulation process,
    Technovation, 20,3, 151-159, 2000




                                          9