                                        Creating a Data Warehouse using SQL Server
                        Jens Otto Sørensen                                                                   Karl Alnor
                Department of Information Sciences                                               Department of Information Sciences
              The Aarhus School of Business, Denmark                                           The Aarhus School of Business, Denmark
                           jos@hha.dk                                                                       kaa@hha.dk


Abstract:

In this paper we construct a Star Join Schema and show how this schema can be created using the basic tools delivered with SQL Server 7.0. A major objective is to keep the operational database unchanged, so that data loading can be done without disturbing the business logic of the operational database. The operational database is an expanded version of the Pubs database [Sol96].

The copyright of this paper belongs to the paper's authors. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage.

Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW'99), Heidelberg, Germany, 14.-15.6.1999 (S. Gatziu, M. Jeusfeld, M. Staudt, Y. Vassiliou, eds.)
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-19/

J. O. Sørensen, K. Alnor                                                10-1

1 Introduction

SQL Server 7.0 is, at the time of writing, the latest release of Microsoft's "large" database. This version is a significant upgrade of the previous one (version 6.5), with respect both to user friendliness and to capabilities. In our minds there is therefore no question about the importance of this release to the database world. The pricing and the aggressive marketing push behind this release will soon make it one of the most used, if not the most used, databases for small to medium sized companies. Microsoft will most likely also try to make companies use this database as the foundation for their data warehouse or data mart efforts [Mic98c]. We therefore think it is of great importance to evaluate whether MS SQL Server is a suitable platform for Star Join Schema data warehouses.

In this paper we focus on how to create Star Join Schema data warehouses using the basic tools delivered with SQL Server 7.0. One major design objective has been to keep the operational database unchanged.

The Pubs database is a sample database distributed by Microsoft together with SQL Server. The database contains information about a fictitious book distribution company; an ER diagram of the Pubs database can be found in appendix A. There is information about publishers, authors, titles, stores, sales, and more. If we are to believe that this is the database of a book distribution company, then some of the tables really do not belong in it, e.g. the table "roysched", which shows how much royalty individual authors receive on their books. Likewise, the many-to-many relationship between authors and titles has an attribute "royaltyper" indicating the royalty percentage. A book distribution company would normally not have access to that kind of information; these tables do not, however, cause any trouble in our use of the database.

The Pubs database does not have a theoretically clean design; some tables are not even in second normal form. It therefore does not lend itself easily to a Star Join Schema, and as such it presents some interesting problems, both in the design of the Star Join Schema and in the loading of data. The book [Sol96] contains an expanded version (expanded in terms of tuples, not tables) of the Pubs database, with some 168,725 rows in the largest table.

2 Setting up a Star Join Schema on the Basis of the Pubs Database

We need to decide what the attributes of the fact table should be and what the dimensions should be. Since this is a distributor database we do not have access to information about actual sales of single books in the stores, but we do have information on "shipments to bookstores". In [Kim96] there is an ideal shipments fact table. We do not have access to all the attributes listed there, but only to some of them.

2.1 Fact Table

2.1.1 The Grain of the Fact Table

The grain of the fact table should be the individual order line. This gives us some problems, since the Sales table of the Pubs database is not a proper invoice with a heading and a number of invoice lines.

The Sales table is not even in second normal form (2NF), because ord_date is not fully dependent on the whole primary key. Please note that the primary key
of the Sales table is (stor_id, ord_num, title_id). Therefore ord_num is not a proper invoice line number. On the other hand, there is also no proper invoice (or order) number in the usual sense of the word; usually we would have a pair of an invoice number and an invoice line number, but here we just have a rather messed up primary key. This primary key will actually only allow a store to reorder a particular title if the ord_num is different from the order number of the previous order. We must either create a proper identifier for each order line, or live with the fact that each order line has to be identified by the set of the three attributes (stor_id, ord_num, title_id).

We chose here to create an invoice number and a line number, the primary reason being that the stor_id will also be a foreign key to one of the dimensional tables, and we do not want our design to be troubled by overloading the semantics of that key.

2.1.2 The Fact Table Attributes Excluding Foreign Keys to Dimensional Tables

Quantity shipped would be a good candidate for a fact table attribute, since it is a perfectly additive attribute that will roll up on several dimensions. One example could be individual stores, stores grouped by zip number, and stores grouped by state. Other attributes to be included in the fact table are order date, list price, and actual price on invoice. If we had access to a more complete set of information we would have liked to include even more facts, such as on-time count, damage-free count etc. But we will content ourselves with these five attributes.

Our initial design for the Shipments fact table is as follows:

Shipments_Fact(various foreign keys, invoice_number, invoice_line, order_date, qty_shipped, list_price, actual_price)

Later we describe how to load this fact table based on the tables in the Pubs database.

2.2 Dimensional Tables

We do not have enough information to build some of the dimensional tables that would normally be included in a shipment star join schema. Most notably, we do not have a ship date (!), thereby apparently eliminating the traditional time dimension, but it pops up again because we do have an order date attribute. However, we have chosen not to construct a time dimension. We do not have ship-from and ship-mode dimensions either. What we do have is described below.

2.2.1 Store Ship-To Dimension

We can use the Stores table "as it is".

Attributes: stor_id, stor_name, stor_address, city, state, zip.

Primary key: stor_id.

2.2.2 Title Dimension

Likewise, the title dimension is pretty straightforward, but we need to de-normalize it to include relevant publisher information. Titles and publishers have a 1-to-many relationship, making it easy and meaningful to de-normalize. We will then easily be able to roll up on the geographic hierarchy: city, state, and country. Another de-normalization along a slightly different line is to extract from the publication date the month, quarter, and year. In other applications it might be interesting to also extract the weekday and e.g. an indicator of whether the day is a weekend or not, but we will restrict ourselves to month, quarter, and year.

Titles and authors have a many-to-many relationship, apparently posing a real problem if we were to de-normalize author information into the title dimension. However, if we look at our fact table, we see that there is really nothing to be gained by including author information, i.e. there is no meaningful way to restrict a query on the fact table with author information that will give us a meaningful answer.

Attributes: title_id, title, type, au_advance, pub_date, pub_month, pub_quarter, pub_year, pub_name, pub_city, pub_state, and pub_country.

Primary key: title_id.

2.2.3 Deal Dimension

The deal dimension is very small, and it could be argued that it should be included as attributes in the store ship-to dimension, since its attributes are really attributes of the store and not of the individual shipment. We have decided to include it as a dimension in its own right and thereby view it as an attribute of the individual shipment. Even though this slightly violates the intended semantics of the Pubs database, it does give us one more dimension to use in our tests. We need the payterms attribute from the Sales table and the discounttype and discount attributes from the Discounts table.

Attributes: deal_id, payterms, discounttype, and discount.

Primary key: deal_id.
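The three dimension definitions above can be collected into DDL. The following is a sketch only: the table names follow Figure 1, and the datatypes are our assumptions, inferred from the corresponding Pubs column definitions rather than taken from the paper.

```sql
-- Sketch: dimension tables for the shipment star join schema.
-- Datatypes are assumptions based on the Pubs columns they are loaded from.
CREATE TABLE Stores_ship_to_dimension (
   stor_id      VARCHAR(4)   NOT NULL PRIMARY KEY,
   stor_name    VARCHAR(40),
   stor_address VARCHAR(40),
   city         VARCHAR(20),
   state        VARCHAR(2),
   zip          VARCHAR(5))

CREATE TABLE Titles_dimension (
   title_id     VARCHAR(6)   NOT NULL PRIMARY KEY,
   title        VARCHAR(80)  NOT NULL,
   type         VARCHAR(12),
   au_advance   MONEY,
   pub_date     DATETIME,
   pub_month    SMALLINT,
   pub_quarter  SMALLINT,
   pub_year     SMALLINT,
   pub_name     VARCHAR(40),
   pub_city     VARCHAR(20),
   pub_state    VARCHAR(2),
   pub_country  VARCHAR(30))

-- deal_id is generated by the IDENTITY property, as used in section 3.4.
CREATE TABLE Deal_Dimension (
   deal_id      INT IDENTITY PRIMARY KEY,
   payterms     VARCHAR(12)  NOT NULL,
   discounttype VARCHAR(40)  NOT NULL,
   discount     DECIMAL(4,2) NOT NULL)
```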
2.2.4 Fact Table Primary and Foreign Keys

The Shipments fact table will have as its primary key a composite key consisting of all the primary keys from the three dimension tables plus the degenerate dimension: the invoice number and the invoice line number. Remember that in this case, as in most cases, the invoice dimension is a degenerate dimension, and all the header information of an invoice really belongs in other dimensions. One example here is the payterms attribute, which is part of the deal dimension.

Key: stor_id, title_id, deal_id, invoice_no, invoice_lineno

2.3 The Shipment Star Join Schema

Figure 1 is a diagram of the Pubs shipment star join schema.

Figure 1: Diagram of the Pubs shipment star join schema, comprising:
   Stores_ship_to_dimension (stor_id, stor_name, stor_address, city, state, zip)
   Titles_dimension (title_id, title, type, au_advance, pub_date, pub_month, pub_quarter, pub_year, pub_name, pub_city, pub_state, pub_country)
   Deal_Dimension (deal_id, payterms, discounttype, discount)
   Shipments_fact (stor_id, title_id, deal_id, invoice_no, invoice_linenumber, order_date, qty_shipped, list_price, actual_price)

3 Loading the Shipment Star Join Schema Dimensions

There are several options when you want to load tables. The most appealing way to load the schema is to build a Data Transformation Service Package (DTS Package) using the DTS Designer [Mic98b]. The DTS Designer is a graphically oriented tool for building a number of connections and a number of data transformation flows to run between the connections. Before creating the DTS Package you need to create the shipment star join schema database and all the tables in it, including keys and other constraints. You then create connections and data transformation flows for each of the dimensions, and finally a connection and a data transformation flow for the fact table. The data transformation flows can be expressed, among other ways, in SQL including transformations, but transformations can also be carried out using an ActiveX script, JScript, PerlScript, or VBScript.

To load the dimensions we create a DTS package with four connections (three identical connections to the Pubs database and one to the shipment star join schema). These four connections must then be connected by three data transformation flows (also known as data pumps). All in all we get the graphical representation of the DTS Package shown in Figure 2.

Figure 2: Graphical representation of DTS Package for loading dimensions

Later we will describe each of the three data transformation flows one at a time, but first we need the complete definition of the shipment star join schema.

3.1 The Shipment Star Join Schema

The SQL statements needed to create the star schema are easily derived from the preceding paragraphs and the definition of the Pubs database datatypes. An ER diagram for the shipment star join schema can be seen in appendix B.

3.2 The Store Ship-To Dimension Data Transformation Flow

This is the easiest table to load, since it can be loaded without any modification from the Pubs database. The SQL statement to be included in the DTS data pump between the connection Pubs_Stores and the connection PubsShpmtStar is:

SELECT
  s.stor_id,
  s.stor_name,
  s.stor_address,
  s.city,
  s.state,
  s.zip
FROM Stores s

The properties sheet for the data pump is shown in Figure 3. Please note that there are four tabs on the properties sheet: Source, Destination, Transformations, and Advanced.
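As section 3 notes, the star schema tables must be created, including keys and other constraints, before the DTS Package can be built. A possible DDL rendering of the fact table from sections 2.2.4 and 2.3 is sketched below; the datatypes and constraint names are our assumptions, not taken from the paper.

```sql
-- Sketch: fact table with composite primary key and foreign keys to the
-- three dimension tables. Datatypes follow the Pubs columns and the
-- intermediary tables of section 4.
CREATE TABLE Shipments_fact (
   stor_id            VARCHAR(4)  NOT NULL,
   title_id           VARCHAR(6)  NOT NULL,
   deal_id            INT         NOT NULL,
   invoice_no         INT         NOT NULL,
   invoice_linenumber INT         NOT NULL,
   order_date         DATETIME    NOT NULL,
   qty_shipped        SMALLINT    NOT NULL,
   list_price         MONEY,
   actual_price       MONEY,
   CONSTRAINT PK_Shipments_fact PRIMARY KEY
      (stor_id, title_id, deal_id, invoice_no, invoice_linenumber),
   CONSTRAINT FK_fact_store FOREIGN KEY (stor_id)
      REFERENCES Stores_ship_to_dimension (stor_id),
   CONSTRAINT FK_fact_title FOREIGN KEY (title_id)
      REFERENCES Titles_dimension (title_id),
   CONSTRAINT FK_fact_deal FOREIGN KEY (deal_id)
      REFERENCES Deal_Dimension (deal_id))
```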
Figure 3: Properties sheet for the Store ship-to data pump

On the second tab, Destination, the destination table must be selected from a drop-down list. Because this is basically just a copy from one table to another, we do not need to adjust anything on the third tab, Transformations; the correct default mapping between attributes, based on the SELECT statement, is already suggested.

3.3 The Title Dimension Data Transformation Flow

The SQL statement to be included in the DTS data pump between the connection Pubs_Titles and the connection PubsShpmtStar is:

SELECT
  t.title_id,
  t.title,
  t.type,
  t.advance,
  t.pubdate,
  DatePart(mm, t.pubdate) AS pub_month,
  DatePart(qq, t.pubdate) AS pub_quarter,
  DatePart(yy, t.pubdate) AS pub_year,
  p.pub_name,
  p.city,
  p.state,
  p.country
FROM Titles t, Publishers p
WHERE t.pub_id = p.pub_id

Please note the use of a non-standard SQL function to extract the month, quarter, and year information from the pubdate attribute. DatePart(yy, t.pubdate) would in correct SQL92 standard syntax be EXTRACT(YEAR FROM t.pubdate) [Can93, p. 166], but this standard SQL92 syntax is not supported by Microsoft SQL Server 7.0.

3.4 The Deal Dimension Data Transformation Flow

The deal dimension is a bit more difficult, as we need to generate values for deal_id. This will be done automatically, since we have defined deal_id to be of type "int IDENTITY"; each subsequent row is then assigned the next identity value, which is equal to the last identity value plus 1.

According to our analysis in the Deal Dimension section on page 2, the deal dimension should have one row for each unique combination of payterms and discounttype. If we let

p = SELECT COUNT(DISTINCT s.payterms) FROM Sales s
n = SELECT COUNT(d.discounttype) FROM Discounts d

then the cardinality of the deal dimension will be p*n. The SQL statement to be included in the DTS data pump between the connection Pubs_Titles and the connection PubsShpmtStar is:

SELECT
  payterms     = p_sales.payterms,
  discounttype = d.discounttype,
  discount     = d.discount
FROM (SELECT DISTINCT payterms FROM Sales) AS p_sales, Discounts d

Please note that we must not have a join condition in the SQL statement, because we want the Cartesian product of the two tables. Also please note that the suggested default mapping between attributes is wrong; the Transformations section of the properties sheet for the deal dimension must therefore be changed to what is shown in Figure 4. The DTS Package can now be executed, and all three dimension tables will be loaded.
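Since the deal dimension is loaded as a Cartesian product, its cardinality can be sanity-checked against the p*n formula. The queries below are a sketch, assuming the star schema tables and the Pubs tables are reachable from the same connection (e.g. via database-qualified names):

```sql
-- Rows actually loaded into the deal dimension.
SELECT COUNT(*) AS loaded_rows
FROM Deal_Dimension

-- Expected cardinality: distinct payterms in Sales (p) times the
-- number of rows in Discounts (n). The two results should be equal.
SELECT COUNT(DISTINCT s.payterms)
     * (SELECT COUNT(*) FROM Discounts) AS expected_rows
FROM Sales s
```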
Figure 4: Transformations for the Deal Dimension data pump

4 Loading the Shipment Star Join Schema Fact Table

There are a number of problems here. The most severe is the assignment of invoice numbers and invoice line numbers to each sale. A preliminary analysis of the Sales table indicates that most invoices have only one invoice line, but there are also invoices with between two and four invoice lines. That is, the Sales table in the operational database is already denormalised in a way that supports the dimensional design.

Please remember that we have changed the semantics of how a discount is given. In the original Pubs database a discount is something a given store receives, and not all stores are given discounts. However, payterms are assigned, apparently at random. In our interpretation a certain combination of payterms and a discounttype (with associated discount percentage) is given on every sale (i.e. every invoice number), posing us with the problem of assigning deal_ids to the sales fact table. The best solution would appear to be assigning a random deal_id to every sale.

4.1 Assigning Invoice and Invoice Line Numbers to the Fact Table

This has to be done in a number of steps and will involve some intermediary tables as well as a step using an updateable cursor.

4.1.1 Step 1: Creating Intermediary Tables

The first step is to create two intermediary tables, Invoice_Head and Extended_Sales. Invoice_Head has four attributes: invoice_no (no line numbers yet), stor_id, ord_num, and deal_id. The invoice_no attribute is declared as type INT IDENTITY.

CREATE TABLE Invoice_Head (
   stor_id    VARCHAR(4) NOT NULL,
   ord_num    VARCHAR(20) NOT NULL,
   invoice_no INT IDENTITY,
   deal_id    INT,
   CONSTRAINT PK_Invoice_Head
   PRIMARY KEY (stor_id, ord_num))

Extended_Sales has the same attributes as the Sales table plus four additional attributes (deal_id, price, invoice_no, and invoice_lineno) and must have the same primary key (stor_id, ord_num, title_id) as Sales; otherwise the cursor-based updates below will fail.

CREATE TABLE Extended_Sales (
   stor_id    VARCHAR(4) NOT NULL,
   ord_num    VARCHAR(20) NOT NULL,
   ord_date   DATETIME NOT NULL,
   qty        SMALLINT NOT NULL,
   payterms   VARCHAR(12) NOT NULL,
   title_id   VARCHAR(6) NOT NULL,
   deal_id    INT,
   price      MONEY NOT NULL,
   invoice_no INT NULL,
   invoice_lineno INT NULL,
   CONSTRAINT PK_Extended_Sales
   PRIMARY KEY (stor_id, ord_num, title_id))

4.1.2 Step 2: Populating the Invoice_Head Table

Because Invoice_Head.invoice_no has been declared as INT IDENTITY, SQL Server will generate a new value for each inserted row, equal to the last identity value plus 1 and starting with the value one. The SELECT statement below therefore has no reference whatsoever to the invoice_no attribute. This SELECT statement is part of the data pump between the connections Pubs and PubsShpmtStar in Figure 5, and with the proper selections on the Destination and Transformations tabs (not shown here) Invoice_Head will be populated.

SELECT
   s.stor_id,
   s.ord_num,
   ROUND(RAND()*9+1, 0, 1) AS deal_id
FROM Sales s
GROUP BY s.stor_id, s.ord_num

Now we have a table with unique invoice numbers that, through joining over (stor_id, ord_num), can be
connected to each sale. The table's invoice_no attribute will be populated with a sequence starting with one, and the deal_id will be populated with random numbers in the range 1 to n*p (= 9 in our particular case), as discussed in the introduction to this section. Care must be taken when using the RAND function without a seed. The SQL Server Books Online [Mic98a] indicate that repeated calls to RAND will generate a sequence of random numbers, but only if the calls are a millisecond or more apart. Even on moderately fast machines many, many rows will be populated within a millisecond.

4.1.3 Step 3: Populating the Extended_Sales Table

Step 3 is most conveniently done by dividing it into three substeps.

Step 3.1: Populating the Extended_Sales table with sales and prices.

A simple DTS data pump will populate Extended_Sales with values from the operational database by joining Sales and Titles. The involved SQL statement is:

SELECT
  s.stor_id,
  s.ord_num,
  s.ord_date,
  s.qty,
  s.payterms,
  s.title_id,
  t.price
FROM Sales s JOIN Titles t ON s.title_id = t.title_id

Furthermore, the default Transformations will not do; they have to be adjusted. This is very simple and is not shown here. Please note that step 2 and step 3.1 can be done in parallel.

Step 3.2: Update Extended_Sales with invoice numbers.

The Invoice_Head table will be used to insert the correct new invoice numbers into the expanded copy of the Sales table.

Step 3.3: Insert invoice line numbers into Extended_Sales.

Inserting the correct line numbers will involve a cursor, because we need to go down the rows of the table one by one, compare the previous invoice_no with the current one, and thereby decide whether the line number attribute should be incremented by one or reset to one. Please note the ORDER BY clause in the cursor; it is essential, ensuring that when a break is encountered it is guaranteed that we will not meet that invoice number again further down the table.

/* auxiliary variables */
DECLARE
  @prev_inv_no INT,
  @cur_inv_no INT,
  @inv_lineno INT,
/* Extended_Sales attribute variables */
  @stor_id VARCHAR(4),
  @ord_num VARCHAR(20),
  @title_id VARCHAR(6),
  @invoice_no INT,
  @invoice_lineno INT

DECLARE Extended_Sales_cursor CURSOR FOR
  SELECT stor_id, ord_num, title_id, invoice_no, invoice_lineno
  FROM Extended_Sales
  ORDER BY invoice_no
  FOR UPDATE OF invoice_lineno

OPEN Extended_Sales_cursor

/* initialize and fetch the first row before entering the loop */
FETCH NEXT FROM Extended_Sales_cursor
  INTO @stor_id, @ord_num, @title_id, @invoice_no, @invoice_lineno

SET @prev_inv_no = @invoice_no
SET @inv_lineno = 1

WHILE (@@fetch_status = 0)
/* equivalent to NOT EOF in 3rd-generation languages */
BEGIN
  UPDATE Extended_Sales
    SET Extended_Sales.invoice_lineno = @inv_lineno
    WHERE CURRENT OF
UPDATE extended_Sales                                       Extended_Sales_cursor
SET invoice_no = i.invoice_no
FROM Invoice_Head i                                           FETCH NEXT
WHERE Extended_Sales.ord_num = i.ord_num                        FROM Extended_Sales_cursor
AND   Extended_sales.stor_id =i.stor_id                         INTO @stor_id, @ord_num, @title_id,
                                                            @invoice_no, @invoice_lineno
Please note that this time we do not need a data pump
between the two connections, because we are operating         SET @cur_inv_no = @invoice_no
entirely within the PubsShipmStar database. Therefore           IF @cur_inv_no = @prev_inv_no
we use an "Execute SQL Task" [Mic98b]. See also                   SET @inv_lineno = @inv_lineno +1
figure 5 on page 10-7.                                          ELSE
                                                                  SET @inv_lineno = 1




J. O. Sørensen, K. Alnor                                                                                         10-6
  SET @prev_inv_no = @cur_inv_no
END /* WHILE */

CLOSE Extended_Sales_cursor
DEALLOCATE Extended_Sales_cursor

This code is contained in the Execute SQL Task named
Step 3.3 in figure 5.

4.1.4 Step 4: Populate Shipments_Fact Table

With both the Deal_Dimension and the Extended_Sales
tables in place, it is relatively easy to insert the relevant
tuples into the Shipments_Fact table. Once again we
will use an Execute SQL Task, see figure 5 below. The
SQL statement is:

INSERT INTO Shipments_Fact
   (stor_id, title_id, deal_id,
    invoice_no, invoice_lineno, order_date,
    qty_shipped, list_price, actual_price)
(SELECT e.stor_id, e.title_id, e.deal_id,
    e.invoice_no, e.invoice_lineno,
    e.ord_date, e.qty, e.price,
    (1 - d.discount/100)*e.price
 FROM
    Extended_Sales e JOIN Deal_Dimension d
    ON e.deal_id = d.deal_id)

These four steps conclude the loading of the fact table.
Figure 5 below is a picture of the whole process. Please
note that the dotted arrows indicate that a
transformation step is dependent on the success of the
previous step.

   Figure 5: DTS Package for loading the fact table

5   Conclusion

In this paper we have concentrated on the logical design
of a Star Join Schema and shown how easily this
schema can be created using the basic tools delivered
with SQL Server 7.0. In particular we have
demonstrated that all transformations can be done
without changing the operational database at all.

We have not compared with other databases, and we
have not done extensive timings, since performance is
very dependent on the disk subsystem, the processor
power available and the amount of physical RAM.
Equally important are the choices of indexing and the
size of key attributes. In a short paper like this we
cannot possibly cover all aspects of traditional Star Join
Schema data warehousing, and we have not looked into
other very interesting issues such as programming
support in traditional third-generation languages, end-
user tools, etc.

In particular we are impressed with the relative ease of
use compared to the complexity of the tasks that SQL
Server 7.0 handles. Microsoft SQL Server 7.0 has many
of the tools required to build a traditional Star Join
Schema data warehouse from the ground up, based on
an operational database with transaction-oriented
tables. Notably, the DTS Packages allow for an
integrated approach to loading and updating the data
warehouse. Although more ways to get data into a data
warehouse exist, we believe that the ease of use offered
by the DTS Packages will be a very strong selling
point.

In our opinion the best procedure is to build a prototype
of the data warehouse along the lines given in this
paper and then test it either on the actual hardware or,
if that is not possible, on hardware that has relatively
well understood scaling capabilities. In this way,
building a traditional data warehouse could become a
process very much like developing systems with
prototypes and substantial user involvement.

Future work along these lines could include a closer
look at the hardware required to support a large data
warehouse and various performance issues. It would
also be interesting to look closer at the OLAP Server
and the PivotTable Service, both middle-tier tools for
analyzing and presenting the data, included with SQL
Server.
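The cursor pass in Step 3.3 above is, in effect, a grouped running counter over rows sorted by invoice_no. As a cross-check of that logic outside SQL Server (not part of the paper's setup; the sample rows below are hypothetical), the same numbering can be sketched in Python:

```python
# Sketch of the Step 3.3 logic: assign invoice_lineno values that
# restart at 1 whenever invoice_no changes. Mirrors the T-SQL cursor's
# @prev_inv_no / @inv_lineno bookkeeping; rows must already be sorted
# by invoice_no, as the cursor's ORDER BY guarantees.
def number_invoice_lines(rows):
    prev_inv_no = None
    lineno = 0
    numbered = []
    for invoice_no, title_id in rows:
        # Increment within an invoice, reset to 1 on a break.
        lineno = lineno + 1 if invoice_no == prev_inv_no else 1
        numbered.append((invoice_no, lineno, title_id))
        prev_inv_no = invoice_no
    return numbered

# Hypothetical sorted sample: invoice 7066 has three lines, 7067 one.
sample = [(7066, "BU1032"), (7066, "PS2091"),
          (7066, "PC8888"), (7067, "TC7777")]
print(number_invoice_lines(sample))
# -> [(7066, 1, 'BU1032'), (7066, 2, 'PS2091'),
#     (7066, 3, 'PC8888'), (7067, 1, 'TC7777')]
```

Without the sort, an invoice could reappear after a break and its line numbers would restart incorrectly, which is exactly why the ORDER BY clause is essential in the T-SQL version.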
6   Appendix A: ER Diagram for the Pubs Database




7   Appendix B: ER Diagram for the Pubs Shipment Star Join Schema




8     Appendix C: Commands Needed to
      Create BigPubs

8.1    Creation of the schema of the operational
       database BigPubs
Please note that all testing has been done on the
released version of SQL Server 7.0.

Please also note that the script used to create the
sample database schema comes from [Sol96] and is
originally intended for SQL Server 6.5, but version 7.0
is capable of running it unmodified.

8.2    Bulk Loading of Data Into the Operational
       Database
These commands have been executed from the SQL
Query Analyzer.

BULK INSERT authors     FROM "D:\Source\authors.bcp"
BULK INSERT discounts   FROM "D:\Source\discounts.bcp"
BULK INSERT employee    FROM "D:\Source\employee.bcp"
BULK INSERT jobs        FROM "D:\Source\jobs.bcp"
BULK INSERT pub_info    FROM "D:\Source\pub_info.bcp"
BULK INSERT publishers  FROM "D:\Source\publishers.bcp"
BULK INSERT roysched    FROM "D:\Source\roysched.bcp"
BULK INSERT sales       FROM "D:\Source\sales.bcp"
BULK INSERT stores      FROM "D:\Source\stores.bcp"
BULK INSERT titleauthor FROM "D:\Source\titleauthor.bcp"
BULK INSERT titles      FROM "D:\Source\titles.bcp"

It turns out that one book does not have a price, which
is unacceptable in our use of the database. Therefore
the following SQL update has been performed on
BigPubs.

UPDATE Titles SET price = 12.0
WHERE title_id = 'PC9999'

References

[Sol96]  D. Solomon, R. Rankins, K. Delaney, and T.
         Dhawan, Microsoft SQL Server 6.5 Unleashed,
         2nd ed. Indianapolis: SAMS Publishing, 1996.

[Mic98a] Microsoft Corporation, "Using Mathematical
         Functions," in Accessing and Changing Data.
         Included in the SQL Server Books Online. SQL
         Server v. 7.0: Microsoft Corporation, 1998.

[Mic98b] Microsoft Corporation, "DTS Designer," in
         Data Transformation Services. Included in the
         SQL Server Books Online. SQL Server v. 7.0:
         Microsoft Corporation, 1998.

[Mic98c] Microsoft Corporation, "Microsoft BackOffice
         - SQL Server 7.0 Data Warehousing," 1998.
         http://www.microsoft.com/backoffice/sql/70/gen/dw.htm.
         Read on 01-Dec-1998.

[Kim96]  R. Kimball, The Data Warehouse Toolkit. New
         York: Wiley Computer Publishing, 1996.

[Can93]  S. J. Cannan and G. A. M. Otten, SQL - The
         Standard Handbook. Maidenhead, UK:
         McGraw-Hill Book Company Europe, 1993.

Acknowledgements

This work is supported in part by the Danish Social
Science Research Council, Project no. 9701783.
Part of the work was done while the first author was
visiting the Nykredit Center for Database Research,
Aalborg University in the winter of 1998/99.