=Paper=
{{Paper
|id=Vol-1755/217-222
|storemode=property
|title=A Design Comparison: Data Warehouse Schema versus Conventional Relational Database Schema
|pdfUrl=https://ceur-ws.org/Vol-1755/217-222.pdf
|volume=Vol-1755
|authors=Abdulrahman Yusuf
|dblpUrl=https://dblp.org/rec/conf/cori/Yusuf16
}}
==A Design Comparison: Data Warehouse Schema versus Conventional Relational Database Schema==
A Design Comparison: Data Warehouse Schema versus
Conventional Relational Database Schema
Abdulrahman Yusuf
Yobe State University Damaturu,
Yobe State, Nigeria.
audura33@yahoo.com
ABSTRACT database has erected – one for handling operational needs and the
Initially, relational database is for both operational and decision other for analytical needs. In 1990s, Inmon and Kimbell
support system, as the information society experiences transformed and extended the issue of data warehouse to become
exponential growth in the amount of data/information to be stored more sophisticated as a result of advent of PCs, 4GL technology
in a database, a line has been drown between transactional and end users encouragement [6]; [8].
database and decision support database. Unlike traditional Data warehouse development is seemed to be iterative
database, data warehouse aims to come from a number of pre- development process as it involves business requirement
existing databases (developed from relational schemas). This identification, solution development of identified requirements, as
conceptual paper discusses traditional database schema design and well as implementation of architectural design of the data
that of data warehouse schema architectural designs strategies that warehouse. Both relational schema and DW schema can be design
could be a guiding principles for both learners and beginners in using relational database design. Most DW involves the use of
database management system. It has explored the stages in pre-compiled tables of summary of materialized views. [11] stated
development processes of the two. Subject orientation, data that in designing data warehouse, existing databases are to be
integration, non-volatility of data, and time variations are the key taken into consideration; however, this is not necessary in
issues under consideration that could differentiate between operational database design (unless bottom up design is to be used
traditional databases and data warehouse schema designs. It has to incorporate previously created views). We should understand
also presented Design Modelling Techniques as well as that DW can be designed using either of the entity-relational
addressing logical data models for data warehouse schema and model or dimensional model. But star (dimensional) schema is
traditional relational database. more structurally easier to understand. And in term of query, it is
less cost, correctness and friendly. The data warehouse schema
CCS Concepts development could be in accordance with the relational model,
• Information systems ➝ Database management system based on data normalization, or the multidimensional based on de-
engines normalization.
Keywords This research paper is generally intended to clearly demonstrate
Database; Data Warehouse (DW) ; Schema Design; Relational; the road map for successful design as well as implementation of
Dimensional; Modelling; Normalization. schema of data warehouse and relational database. The paper has
been approached theoretically, using the existing literature,
1. INTRODUCTION explored and presented the technicalities in database design for
In the words of [1] formerly, data had been kept in files better comprehension. The study has compared and exposed the
(databases) which could not be accessed without programming Design Issues of relational database and that of data warehouse in
language knowledge. When the idea of relational database has which the designers could be able to get quick knowledge for the
eluded, the uneasy organizing and designing of database in the simplification of the schema design of both transactional database
underlying database structure transformed to very less (relational database) and decision support system (data
troublesome. The father of relational model, E. F. Godd brought warehouse). In this connection therefore, existing literature on
the basis for solving a lot of database problems, this is because relevant rr techniques were found from authentic sources for this
relational database have offered data independencies, data work. Furthermore, the study produced both theoretical and
consistency and data integration, thus capable of stockpiling data pictorial presentations of the design comparison of database
in rows and column tables, data normalization and tables’ schema and data warehouse schema that can stand as a guide to
relationship. especially both learners and beginners in field of database
management system.
According to [8] for a long, databases and data theory have been
in existence; originally, database was for information processing Consequently, the study has concretely revealed that either of
community (transaction and analysis); but fundamentally centered traditional database or data warehouse schema design requires
around transaction processing. The concept of elegant notion of requirement analysis and specification. Unlike traditional
database, data warehouse aims to combine a number of pre-
existing databases (developed from relational schema). It has been
explained that Subject orientation, data integration, non-volatility
of data, and time variations are the key issues under consideration
CoRI’16, Sept 7–9, 2016, Ibadan, Nigeria. that can give base to differentiate between traditional databases
and data warehouse schema design. Star and relational schemas
are logical data models for data warehouse. The research has
found two great disparities of schema design of the two models. In
data warehouse which uses star schema emphasis is on one table
217
(fact table) with the structural rule describing the organization 3.0.2 Conceptual Data Model
unlike traditional relational schema no particular table has total Individual entities are formed logically from requirement analysis,
consideration. In the other hand, the relationships established an entity appear to be very significant and is stored data [13].
between the tables in traditional relational database are based Rows uniqueness in the entities are identified (unique identifier or
logically while relationship is based on 1-Many in star schema. candidate key). According to [10] in data warehouse design semi-
Before indulging into design issues (schemas’ comparison) of the formal requirement specification from requirement analysis is
relational database and data warehouse, let have vivid understand converted into conceptual schema using a formalized
of the meaning of operational database and data warehouse. multidimensional model. The result is in multidimensional
diagram (fact tables and dimensions). For data warehouse this
2. MEANING OF DATABASE AND DATA phase is just for functional dependencies analysis among measures
WAREHOUSE as well as analysis of chosen dimension level. Due to the dynamic
Database is the data storage that consists of set of tables for nature of data warehouse requirements object oriented
operational data storage and is usually related to a single multidimensional is best to deal with the criteria [7].
application. In the word of [15] end users can retrieve and store
data in tables from relational database and is built using relational 3.0.3 Logical Data Model
model for business processing applications. Traditional database Technical issues are not taken into consideration; the model tries
sometime called OLTP - Online Transaction Processes. DW is a to depict all identified entities with their associated attributes and
central location (repository) of subjectively collected data from the relationship between the entities that stands for business
existing operational data storage relevant to qualitative and requirement. More attention is on business problems and come up
quantitative data spanning from several sources and time periods. with a dynamic design [13]. In [10], data warehouse schema
This can allow accomplishment of answering ad hoc, statistical, design, conceptual design is converted to logical data warehouse
and complex analytical queries [15]; [9]. DW is the root of schema. For each terminal dimension level, its dimension level is
Decision Support System (DSS) which is subject oriented, developed and come up with update-independent of a set of view
integrated, non-volatile and time variant [6]. It allows for creation definitions.
of ease use environment for management to conduct decision
analysis, identify new opportunities, or make tactical and strategic 3.0.4 Physical Data Model or Database Schema
choices based on both internal and external information [9]. DW design
also known with OLAP – Online Analytical Processes. Table 1 This is a graphical model that can be used for implementation of
gives clear dissimilarities of traditional databases and data database. And it is product of fully normalized logical design or
warehouse. model [13]. Demoralization is usually carried out at this phase.
This is adding redundancy tables for query performance
3. TRADITONAL DATABASES AND improvement. And it is done by expert data architect especially in
DATAWAREHOUSE DESIGN data warehouse where huge amount of data are sorted and
The stages used in traditional databases design are also used to summarized. The data warehouse schema development could be
handle the design goals of data warehouse [10]. [17] mentioned in accordance with the relational model, based on data
the following stages as complete development processes of a data normalization, or the multidimensional based on de-
warehouse: development of a feasibility study, business line normalization. Semantic meaning has been conveyed by a
analysis, data warehouse architecture design, selection of the drawing in graphical models as past research hypothesized and it
technological solution, planning the project iterations, detail is more advantageous in term of easy comprehension.
designing, data ware-house testing and implementation, 3.1 Schemas Design Issues of Relational
deployment and roll-out.
Databases and Data Warehouses
In another view [10] assumed data warehouse begins with The environment within which the relational database or data
conceptual design which is divided into requirement analysis and warehouse to be used needs to be explored for designing. This is
design, and then the individual stages of logical and physical done to find necessary details requirements for actual design. The
phases are performed. following are issues to consider:
3.0.1 Requirement Analysis 3.1.1 Subject Oriented Data
Any design begins with obtaining necessary information/data In data warehouse the subject area are considered in the design
from a business environment for analysis and specification. [10] which could be students, employees, learning materials; in
said in data warehouse analysis and specification of data contrast, transaction operational systems are planned around
requirements are performed to agree with approaching data organizational activities like health, payroll processing, students
warehouse. He further explained requirement analysis for enrollment [ 6]; [16].
traditional database design is to analyze and specify the activities
of the overall business operations. Unlike in traditional database,
requirement analysis for data warehouse aims to bring together a
3.1.2 Integrated Data
number of already exist operational sources of data (pre-existing Integrated data means “de-duplication information and merging it
databases), so that relevant data warehouse attributes are chosen from many sources into consistent locations” [16]. In the words of
and outlined initial OLAP queries on the information discovered [6] this happened to be of primary important out of all other
in operational database schemata. It has been seen that reporting aspects in data warehouse design because the data from different
and analyzing of users’ needs have to be considered in creating applications with various design decisions are to be considered for
data warehouse else it will be “data jail”. inconstancies coordination. He further explains given specific
example, an application may encode M for male and F for female
others may use 1 and 0 respectively; and. However, the issue is
218
different from relational database perspective where only one From software engineering view point, a typical process model
particular design is taken out of thousands available. encompasses all or some of the following:
3.1.3 Non-Volatile Data A functional decomposition
The nature of data in the warehouse are only loaded and accessed A context level diagram
but not updated [6]. This non-volatility characteristic makes it A data flow diagram
different from operational database in which data are constantly A structural chart
accessed, loaded as well as updated. [16] indicated that users can A state transition diagram
get the actual history of organization and reach at particular An HIPO chart and
business decision due to this unique feature. Pseudocode [6].
3.1.4 Time Reference Data 3.2.2 Data model
One crucial issue of consideration in schema design is time The most prominent data modeling techniques are two for data
variant. [6] have identified the following issue of time variance in warehouse, Entity-Relational and Dimensional modeling [15].
comparison of operational database system and data warehouse: OLTP database schema design strategies give base for the
philosophy of data warehouse architectural design. [3] found that
The time horizon in data warehouse is of interval of at data models have three (3) characteristics that make them easily
least 5 -10 years. However, for operational is shorter understandable: (i) uncomplicatedness to structure types (tables
than data warehouse of 60 – 90 days. and tupples) (ii) very negligible structure types in the model are
Data warehouse contains just sequence of refined described by styles (iii) pictorial representation used in the model.
snapshots of data at certain interval while operational They also mentioned that there are two competitive prominent
databases do carry “current value” and its correctness is models for data warehouse design: dimensional model or star
at the time of access hence updatable. schema and relational model. Snowflake schema appears to be
The essential structure of data warehouse is the present extension of star schema. Multidimensional schema is a generic
of some feature of time (example days, month, and term used to denote both star schema and snowflake schema and
year). However, the fundamental structure of they are shown in Figure 2. Hence, both traditional relational
operational data is may not have those time element. schema and the star schema are logical data models.
3.1.5 Granularity 3.2.2.1 Dimensional modeling (star schema)
This refers to the “level of details or summarization held in the This is dealt with the fact table having a kind of relation with
unit of data in the data warehouse” [6]. This reflects the facts that multi-attribute keys. An attribute primary key (mostly surrogate)
details increase with low level of granularity and vise vasa. in dimensional table corresponds exactly to one attribute in the
Therefore, granularity is a design issue but less considered in fact table with dimensional key. This feature appears to be star-
traditional relational database design. For efficiency the amount of like structure; therefore, called dimensional model or star join
data to be kept need to be considered at low level of details in schema or in short star schema [15]. In the word of [3] entity
DW. relationship constrained version are used for developing
dimensional data warehouses. The Figure 1 has central fact table
After exploring the design issues here are some tasks for DW. As depicting the business event (flight reservation) on which data
DW data are from a number of OLTP sources, Extract, Transfer, warehouse is constructed. This pronounces a reservation as a
and Load (ETL) tools are used for extraction and loading. The specific flight, with a specific passenger, on a specific airline, at a
subsets of data warehouses (Data marts) are adopted through a specific time. This diagrammatic representation is looked like star
data warehouse bus which is a standard set of attribute and is called star schema. A business event is always being
declarations. The integrate all of data warehouse components are captured within the central entities from which DW is built. In
from a metadata repository. The metadata is where definitions of another exemplification of purchase transaction where Purchase is
the source data, data models for target databases, and the fact table and Task, Part plus Supplier are dimensions tables
transformation rules that transfigure source data into target data with their associated attributes as PURCHASE (S#, P#, T#,
are stored [15]. Further stated a data warehouse need to be well- PRICE); TASK (T#, TNAME); PART (P#, PNAME, PWT) and
equipped with metadata management tools due to the size of SUPPLIER (S#, SNAME, SZIP). This is represented pictorially in
metadata. Figure 2A.
3.2 Design Modelling Techniques According to [12] for two reasons the star schema is considered to
There are number of modeling techniques that pave the way for be more efficient design than relational design: firstly, it uses
actual schema design of either relational database or data small join operations for the fact that has de-normalized table
warehouse. Prior chosen any technique, the designer should have design. Secondly, it has efficient access plan generated for the
clear understanding of the applicability and incapability of the operation because majority of the optimizers perfectly work with
techniques. As [6] pointed out two prominent models that can be the star schema. According to [18] the most commonly relational
used for schema design of either relational database or data representation uses for multi-dimensional database is star schema,
warehouse are process model and data model. which connect fact table directly to the dimensional table
(attributes descriptive table).
3.2.1 Process model
Process model in data warehouse design appears to be [5] opined that star and relational schema are logical data models.
problematic due to the fact that it is dealing with the requirement These schemas have two things that make them different the
assumptions and such cannot be worked in data warehouse Figures 1 and 3 are used to highlight the disparity
environment rather could be hold in operational environment. diagrammatically. In star schema emphasis is on one table (fact
219
table) with the structural rule describing the organization unlike The architecture is logical and physical formation. The existing
traditional relational schema no particular table has total components as well as missing ones of the organisation in
consideration. The relationships established between the tables in question are identified for data warehouse architecture
traditional relational database are based logically. Conversely, the completion. Data warehouse architecture need to be designed with
relationship is based on 1-Many in star schema. a minimal impact on the existing model, so that it can allow
However, having a more complex data warehouse with a lot of further development.
dimensions could weaken the advantages of a star schema's The two have different optimization techniques because of
presentation by the total number of elements. A complexity in different data access patterns. Different accesses perform require
data warehouse means having many several smaller star schemas various optimization techniques greatly. In the OLAP bundle, the
that share a common dimension [16]. queries are usually involve significant aggregation and joining for
decision support. Therefore, for performance improvement, de-
Snowflake schema – A perfectly star schema may encounter normalization is usually encouraged in a data warehouse
problem due to the increased number of de-normalized tables; environment.
therefore, performance problem may result, snowflake could be
the solution [12]. Snowflake is an extension of star schema [5]; The use of combination top-down and bottom-up of data design
[15]. Low level cardinality attributes are moved from dimension strategy facilitate the creation of data mart individually in a
table and placed in other tables that can be linked against bottom- up manner, which is in conformity with a skeleton
dimensional table and shown in Figure 2B which is the extension schema called data warehouse bus. The amalgamation of
of instance given previously. Dept , Proj and Region are the adapted data marts of whole organization is DW. A hub-and-
outer-level extensions that make snow-flake wth their associated spoke architecture, enterprise warehouse with operational data
attributes as PURCHASE (S#, P#, T#, PRICE); DEPT (D#, store (real-time access support), and distributed enterprise data
DNAME, DBUDGET); PROJ (PJ#, PJNAME) TASK (T#, warehouse architecture are various types of basic architectural
TNAME, D#, PJ#); PART (P#, PNAME, PWT); REGION (R#, design presented by [15] as shown Figure 4.
RNAME); and SUPPLIER (S#, SNAME, SZIP, R#). Figure 1
and 2A show the star is placed at the center. The presence of 3.4 Schema Evolution
outer-level dimension table is the significant extension. Hierarchy According to [12] over time the way things being carried out
is placed in which 1-to-many relationship from every outmost changes, the idea behind data model design of associated
table to the fundamental fact table. The de-normalized table might application may experience drastic change. Dimension changes,
be reduced due the introduction of outer-level dimension table(s). instant changes, fact changes, level changes, attribute changes,
Although, introduction of outer-level dimension table mostly constraint changes and quality changes are possible arising
reduces if not essentially eliminating the number of de-normalized changes in schema design especially for data warehouse [2]. The
tables. schema evolution challenges is everlasting and applicable to every
database practically possible. Data warehouse is developed
3.2.2.2 Entity-Relational model iteratively so that strategic decision in the future could be possible
This model follows same procedure with OLTP database design [12].
process, conceptual entity relationship is translated into relational
schema before normalization as well as de-normalization possible 4. CONCLUSION
[15]. [3] explained that entity relationship schema is not like star Initially data had been kept in files (databases) with a lot of
schema, has no focus on central table as shown in the Figure 3 maintenance difficulties. The idea of relational database have
(flight reservation). One important aspect of relational model is come up and simplified the underlying database structures to be
that normalization is used to preserve data integrity. However, this less troublesome where the developed database is for both
is different in data warehouse because data update is not an issue transaction and analytic (information processing community)
in DW. Clear differences can be seen when compared Figure 1 & principally on transaction processing. OLTP database schema
3 as feature the same information. From an information-content design strategies give base for the philosophy of data warehouse
perspective, the two schemas of different models could spotlight architectural design.
the same objective. In [18] Path Relational Schema which is The exponential growth of data storage plus the advancement of
transformation of relational database schema exhibited promising information and communication technology (ICT) pave the way, a
properties in term of query answering time compare with star and line has been drown between operational system (transactional
snowflake schemas. This is because; it is suitable for handling database) and analytical system (decision support database).
inconsistencies in the multi-dimensional databases. Inmon and Kimbel transformed and extended the issue of data
warehouse and become more sophisticated following the advent
Design difficulties, inefficient use of I/O and disk space, of PCs and 4GL technology [6], [8]. Before traditional database or
challenges in maintenance as well as designers’ effort to find data warehouse schema design requirement analysis and
trade-off between optimization of query performance and specification should be carried out. Unlike traditional database,
maximization of query flexibility [1] data warehouse aims to combine a number of pre-existing
database.
3.3 Schema Architectural Design
This paper was approached theoretically, the technicalities in
Architectural design happens to be blueprint that makes
database and data warehouse designs were presented from
communication, planning, maintenance, learning and reuse
explored existing literature for easy and better comprehension.
possible. Data design, technical design and hardware and software
The study compared and exposed the Design Issues of relational
infrastructure are some several aspects that involved in
database and that of data warehouse in which the designers can be
architecture. [15] mentioned top-down, bottom-up, inside out and
able to get quick knowledge for the simplification of the schema
mixed are schema design strategies available.
220
design of both transactional database (relational database) and
decision support system (data warehouse).
The study produced both theoretical and diagrammatical
presentations of the design comparison of database schema and
data warehouse schema that can stand as a guide, especially to
both learners and beginners in field of database management
system. Consequently, the paper was concretely revealed that
either of traditional database or data warehouse schema design
requires requirement analysis and specification. Unlike traditional
database, data warehouse aims to combine a number of pre-
existing databases (developed from relational schema). It
explained Subject orientation, data integration, non-volatility of
data, and time variations as the key issues under consideration that
can give base for the differentiation between traditional databases
and data warehouse schema design. Star and relational schemas
are logical data models for data warehouse. The research found
two great disparities of schema design of the two models. In data
warehouse which uses star schema emphasis is on one table (fact
table) with the structural rule describing the organization unlike
traditional relational schema no particular table has total
consideration. In the other hand, the relationships established
between the tables in traditional relational database are based
logically while relationship is based on 1-Many in star schema.
Figure 1 & 3 highlight the disparity.
5. TABLE AND FIGURES Figure 1. Simple star schema
Table 1: Distinction between OLTP and OLAP systems
OLTP System OLAP System
(Operational Data) (Data Warehouse)
Data From Traditional or From diverse databases
source original source
Aim of Organizational basic For planning and problem
data activities running solving known Business
support system
Data images of ongoing Business activities from
type business transaction various sections
Insertio Insert and update are Periodic refreshing inform
n and by end users which is of Batch jobs
updatin fast
g
Queries Simple queries that Complex aggregate queries
return small records
Efficien Very high speed to Takes some time depending
cy answer queries on the requirement but can
be improved using index
Space holds very small Comprehensive history due
hold history to aggregation
Databa Greatly normalized Highly denormalised with
se due to many tables few tables using Star and
design involvement snowflake schema.
Figure 2. Star and snowflake schema
221
[4] Datawarehouse4u 2009. OLTP VS OLAP. DOI=
http://www.datawarehouse4u.co.uk/OLTP-vs-OLAP.html.
[5] Dedley, C. 2009. Database Technologies. Lecture: Data
Warehouse [online]. [Accessed 28th November, 2012].
DOI= http://wolf.wlv.ac.uk
stech/72043/datawarehouse2009.ppt?menu=833910 /
[6] Inmon, W. H. 1996. Building the Data Warehouse. 2nd Ed.
New York: John Wiely & Sons.
[7] Jindal, R. and Taneja, S. 2012. Comparative Study of Data
Warehouse Design Approaches : A Survey. International
Journal of Database Management Systems [online], 4(1), pp.
33-45. DOI= http://wlv.summon.serialssolutions.com/link/.
[8] Johnston T. and Weis R. 2010. Managing Time in
Figure 3. Simple traditional relational schema Captions
Relational Databases: How to Design, Update and Query
Temporal Data. 1st Ed. Published: Morgan Kaufmann.
DOI=
http://proquestcombo.safaribooksonline.com/book/databases/
database-design/9780123750419 /
[9] Kazparaz, R. 2011. What is the differences between database
and data warehouse? DOI=
http://techforum4u.com/member.php/
[10] Lechtenborger, J. (2003) Data Warehouse Schema Design,
Dept. of Information Systems, University of Munster
Germany. DOI= www.pcwi202.uni-
muenster.de/dbms/media/.../2003/diss-jl-btw2003.pdf>
[11] Lechtenbörger, J. and Vossen, G. 2003. Multidimensional
normal forms for data warehouse design. Information
Systems [online], 28(5), pp. 415-434 DOI=
http://wlv.summon.serialssolutions.com/link/
[12] Martyn, T. 2004. Reconsidering Multi-Dimensional schemas.
ACM SIGMOD. 33(1), pp. 83-88 DOI=
http://wlv.summon.serialssolutions.com/link/
[13] Richert A. 2009. Oracle SQL by Example. 4th Edition.
[online] Publisher: Prentice hall pp 960-965. DOI=
http://proquestcombo.safaribooksonline.com/
[14] Schuff, D., Corral, K. and Turetken, O. 2011. Comparing the
Figure 4: Various architectural design of DW understandability of alternative data warehouse schemas: An
6. ACKNOWLEDGMENTS empirical study. Decision Support Systems. 52(1), pp. 9
DOI= http://wlv.summon.serialssolutions.com/link/>.
My special thanks to Prof. Carl Dedley and Dr. Mary G. of
Wolverhampton University, UK who have accessed part of this [15] Sen, A. and Sinha, A. 2005. A comparison of data
paper as their module assessment.2013. warehousing methodologies. Communications of the ACM
7. REFERENCES. . [online], 48(3), pp. 79-84. DOI=
[1] Anonymous 2013. Comparison of Data Warehousing DBMS http://wlv.summon.serialssolutions.com/link/
Platforms - ODBMS.org DOI= www.odbms.org/wp- [16] Suri, P. and Sharma, M. (2011) A Comparative Study
content/uploads/2013/11/illuminate-Comparison.pdf. Between the Performance of Relational & Object Oriented
[2] Arora, M. and Gosain, A. 2011. Schema Evolution for Data Database in Data Warehousing. International Journal of
Warehouse: A Survey. International Journal of Computer Database Management Systems [online], 3(2), pp. 116-127
Applications. [online] 22, 6, p6-14. DOI= DOI= http://wlv.summon.serialssolutions.com/link/ .