<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Data Warehouse Architecture for MeteoSwiss: An Experience Report</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christian Häberli</string-name>
          <email>chi@sma.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimitrios Tombros</string-name>
          <email>dimitrios.tombros@stcg.ch</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>MeteoSwiss</institution>
          ,
          <addr-line>Krähbühlstr. 58, 8044 Zurich</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Swiss Technology Consulting Group AG</institution>
          ,
          <addr-line>Technoparkstr. 1, 8005 Zurich</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2001</year>
      </pub-date>
      <abstract>
        <p>In this experience report we consider some issues in the development of a data warehouse for meteorological and climatology data at MeteoSwiss. The paper describes the meteorological and climatology data process, the data warehouse architecture developed to support this process while integrating legacy systems, and some of the modelling issues encountered.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Efficient data storage and manipulation is a prerequisite in
the meteorological and climatology domain. Large
quantities of numerical and multimedia data are collected
on a frequent basis from automatic measurement stations.
This data includes measurements of several weather
parameters on a 10-minute, hourly or longer basis. The
collected data undergoes various quality control and
consistency check procedures during its lifetime that
effectively update it almost permanently. At the same time
the data is continuously used for analysis by
meteorological and climatology applications. It often
undergoes aggregations according to special formulas.
Thus different kinds of use patterns must be supported,
which poses severe performance problems.</p>
      <p>In this paper we describe the data warehouse architecture
developed for the Swiss national weather service, MeteoSwiss.</p>
      <p>The copyright of this paper belongs to the paper’s authors. Permission to
copy without fee all or part of this material is granted provided that the
copies are not made or distributed for direct commercial advantage.</p>
      <sec id="sec-1-2">
        <p>Proceedings of the International Workshop on Design and
Management of Data Warehouses (DMDW'2001),
Interlaken, Switzerland, June 4, 2001
(D. Theodoratos, J. Hammer, M. Jeusfeld, M. Staudt, eds.)</p>
        <p>During the past five years relational database management
systems have been gradually introduced in many
meteorological organizations to replace proprietary
file-based application storage concepts [Moe 1999]. Recently,
a few projects have been implemented using
object-relational database technology. The main drivers for the
use of this technology in meteorological organizations
have been the need to replace ageing, insufficiently
documented systems and the development of new
web-based applications, which is well supported by many modern
DBMS. However, the solutions implemented are for the
most part based on ad hoc architectures, resulting in complex
and heterogeneous database landscapes. It is typical for
our specific application domain that only parts of the data
are stored under the control of a DBMS while files are
used for other parts. Furthermore, multidimensional
modelling as applied in this project has, to the best of our
knowledge, not been used in meteorological and
climatology databases until now.
The use of meteorological data is mainly twofold:</p>
        <p>Weather forecasting, where quick access to current
data is important, and climatology, where flexible access to high-quality
information about past weather is important.</p>
        <p>As a national weather service, MeteoSwiss has a legal
mandate to collect meteorological and climatology
measurements and observations from the territory of
Switzerland and to transmit them to the ‘Global
Telecommunication System’ (GTS) of the ‘World
Meteorological Organization’ (WMO). In order to provide
meteorological services for the public (weather forecasts,
alerts etc.), MeteoSwiss makes extensive use of data
(observations and products) transmitted via the GTS from
other countries.</p>
        <p>The life cycle of meteorological/climatology data is shown
schematically in Figure 1. The figure traces the data from
measurements and observations (level 1 data) through
collection, transformation and quality control (level 2 data)
and quality control and data cleansing (level 3 and level 4
data) to homogenization (level 5 data). A raw data
repository, external data archives and a durable data
archive store the data; control data and metadata steer the
transformations; analytical databases and data marts serve
applications and customers. The individual transformation
steps are described below.</p>
        <sec id="sec-1-2-8">
          <title>Data Levels and Transformation Steps</title>
          <p>Level 1 data are produced at the measuring site by
transforming raw sensor data (e.g. voltage, counts) to
physical units (e.g. temperature, precipitation rate)
that are meaningful for further meteorological
applications.</p>
          <p>Level 1 to level 2: level 1 data from various sites are
collected by a polling system. Some data are
subsequently transformed to level 2 data by applying
calibration polynomials. Simple quality control
procedures are used to flag gross errors in the data.
Some derived quantities are calculated depending on
the quality control results. Level 1 data
are stored in a ‘raw data repository’. This repository
serves as the data source if the transformation from level
1 to level 2 data has to be repeated using improved
calibration constants. This process is called
‘re-evaluation’ (e.g. re-evaluation of radio probe data
using an improved radiation correction scheme).</p>
          <p>Level 2 to level 3: level 2 data from different polling
systems are integrated and loaded into the data
warehouse system. Similar data from different polling
systems are subjected to uniform quality control
procedures using more rigorous tests than in the
previous transformation step. Plausible values are
used to calculate derived quantities (e.g. vapor
pressure, mean sea level pressure). After the
transformation, these data are tagged as ‘level 3’ data.
They are further used for aggregation, propagated to
analytical databases and used for customer products
and services (e.g. weather forecasting).</p>
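          <p>As an illustration, the level 1 to level 2 step described above can be sketched as follows. This is a hypothetical example, not MeteoSwiss code: the calibration coefficients and plausibility limits are invented, and a real system would take them from the control data.</p>

```python
# Hypothetical sketch of the level 1 -> level 2 transformation: a
# calibration polynomial maps a level 1 value to a calibrated value,
# and a simple range test flags gross errors. Coefficients and limits
# are invented for illustration only.

def to_level2(value, coeffs, lower, upper):
    """Apply a calibration polynomial and flag gross errors.

    coeffs are polynomial coefficients (a0, a1, a2, ...), so the
    calibrated value is a0 + a1*v + a2*v**2 + ...
    """
    calibrated = sum(a * value ** i for i, a in enumerate(coeffs))
    flag = "ok" if lower <= calibrated <= upper else "gross_error"
    return calibrated, flag

# Example: a linear calibration for a temperature sensor with assumed
# plausibility limits of -60..+50 degrees Celsius.
print(to_level2(21.7, (0.15, 1.002), -60.0, 50.0))
```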
          <p>Level 3 to level 4: The core of this transformation step
is the ‘data cleansing’ process [WMO 1990], [WMO
1993]. Temporal and spatial consistency checks are
added to the quality control procedures of the
previous transformation steps, and all checks are based on
an integrated dataset. Obviously erroneous data are
corrected automatically and by means of interactive
tools. This transformation step can be performed
several times, but the results of each transformation
have to be stored. Cleansing of meteorological data
may also affect historical data for two main reasons:
new and more sensitive methods allow better
error detection; and historical data (mainly digitized from
publications) are added to the dataset and
provide additional information to check the
spatial and temporal consistency of the dataset.</p>
          <p>Level 4 to level 5: A selection of level 4 data is
homogenized using context data (especially station
histories). Homogenization in this context means to
remove all non-natural fluctuations (e.g. changed
calibration constants, changed instrument exposition)
from the data series in order to get “a numerical series
representing the variations of a climatology element
with variations which are caused only by variations of
weather and climate” [CP 1950]. The results of this
transformation step are so-called homogenization
values (amounts or factors). These values produce
the homogeneous data series when they are added
to or multiplied with the corresponding level 4 data. This
transformation step can be performed in various
projects using different methods in different contexts.
Each project produces its own set of homogenization
values, which are used depending on the customers’
needs.</p>
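          <p>A minimal sketch, assuming invented data, of how homogenization values might be applied: each value carries either an additive amount or a multiplicative factor that turns the level 4 series into a homogeneous level 5 series.</p>

```python
# Sketch of applying homogenization values (amounts or factors) to a
# level 4 series to obtain a homogeneous level 5 series. The series
# and adjustments below are invented for illustration.

def homogenize(series, adjustments):
    """Apply per-period homogenization values (amount or factor)."""
    result = []
    for value, (kind, adj) in zip(series, adjustments):
        if kind == "amount":       # additive correction
            result.append(value + adj)
        elif kind == "factor":     # multiplicative correction
            result.append(value * adj)
        else:
            raise ValueError(f"unknown adjustment kind: {kind}")
    return result

# E.g. a station relocation requires a +0.3 shift for one period and a
# changed instrument a factor of 1.05 for another (assumed values).
print(homogenize([12.0, 14.0], [("amount", 0.3), ("factor", 1.05)]))
```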
          <p>The quality control and data cleansing procedures of all
transformation steps are based on parameters stored in the
‘control data’. Aggregation is performed on level 3, level 4
or level 5 data and is also based on parameters that are
part of the control data. The term ‘metadata’ here means
‘climatological context data’ (e.g. information about
station location, sensor types, station histories etc.).</p>
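          <p>The control-data-driven aggregation described above can be sketched as follows. The completeness threshold is an assumption for illustration; in the real system such parameters are part of the control data.</p>

```python
# Sketch of parameter-driven aggregation: 10-minute values are
# aggregated to an hourly mean only if enough plausible inputs are
# available. The threshold below stands in for a control-data
# parameter and is an assumption.

def hourly_mean(values, flags, min_fraction=0.8):
    """Aggregate six 10-minute values to an hourly mean.

    values: the measurements for one hour; flags: True if plausible.
    Returns None if too few plausible values are available.
    """
    plausible = [v for v, ok in zip(values, flags) if ok]
    if len(plausible) < min_fraction * len(values):
        return None
    return sum(plausible) / len(plausible)

# Five of six values plausible: the hour still gets an aggregate.
print(hourly_mean([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                  [True, True, True, True, True, False]))  # -> 3.0
```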
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>The MeteoSwiss Data Warehouse System Architecture</title>
      <p>The architecture strategy of our project is mainly a
wholesale replacement of the operational systems and the
establishment of a new analytical database. Operational
systems in the context of our project are used to transform
the data from level 2 to level 4, whereas the analytical
database is the classical ‘data warehouse’. Throughout,
much emphasis is placed on the design of the
operational systems.</p>
      <p>Figure 2 depicts the system architecture: data sources
(observation systems, a decoder for GTS/WMO bulletins,
radar, satellite and NWP data, legacy systems and data
sheets) feed a staging area with work databases and data
scrubbing and transformation tools; the storage area
comprises the analytical database, data marts and the
durable data archive; the analysis layer serves climate
information services, model validation, bulletins
(periodicals), yearbooks, research, climatology and
weather forecasting; and a short-range analytical database
supplies data for the public, the Internet and on-/offline
data delivery and products.</p>
      <p>The conceptual architecture proposed in our project
consists of four layers (see figure 2). This architecture is
not a fully conventional data warehouse. It is rather a
special case where DW technology is combined with
classical relational database technology. This approach is
more flexible in coping with special cases. The
characteristics of the four layers are described in the next
sections.</p>
      <sec id="sec-3-4">
        <title>4.1 Operational Data Sources</title>
        <p>There is a wide variety of sources for meteorological and
climatology data.</p>
        <p>Observation systems operated by MeteoSwiss and
partners usually provide data in high temporal (10 minutes
or less) and spatial resolution (e.g. 115 automatic
weather stations for Switzerland) for a very limited
spatial area. The numerical weather prediction model of
MeteoSwiss produces results in hourly resolution for
51 parameters on 385x325 grid points at 45 vertical
layers. The GTS of the WMO and similar systems provide
data from all over the world in a usually low temporal
resolution (hourly or coarser). Various data archives
(databases, file systems etc.) provide historical data that
are very important for climatology research (e.g. climate
change research).</p>
        <p>The data from the sources are of different types: point
measurements from automatic weather stations and
weather stations providing hand-written protocols; profile
measurements from upper-air stations and measurements
along trajectories; picture data and field data from
satellites and weather radars; and volume data from
weather radars and numerical weather forecast models.</p>
        <sec id="sec-3-4-1">
          <title>4.2 Staging Area</title>
          <p>The staging area comprises all databases and tools that
are necessary for digitizing, integrating, loading, quality
controlling and cleansing the meteorological data. Since
cleansing meteorological data may also affect historical
data, it is necessary to store comprehensive datasets in the
so-called work DBs. Special emphasis is given to the
development of data quality control systems and tools for
error correction. The work DBs contain data at the lowest
level of granularity and some lightly summarized data.</p>
        </sec>
        <sec id="sec-3-4-2">
          <title>4.3 Storage Area</title>
          <p>This is the classical ‘data warehouse’. In the analytical
database, the meteorologists and climatologists of
MeteoSwiss will find an integrated set of data to support
their decisions in weather forecasting and climate
research.</p>
          <p>Since data quality is less important than quick
access for certain applications, a ‘short range analytical
database’ will be implemented which will be fed directly
from the data source layer.</p>
        </sec>
        <sec id="sec-3-4-3">
          <title>4.4 Analysis Area</title>
          <p>This layer contains various analysis and data visualization
tools, which are not in the scope of the present project.
Various interfaces are provided to access the storage area.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Modeling Climatology Data</title>
      <p>The modeling process used in our project is based on the
classical database modeling process distinguishing
between conceptual, logical and physical models extended
by the implementation model approach proposed by
[Martyn 2000]. According to this approach, the logical
model is an ideal relational model that is the input to an
implementation model where performance enhancement
aspects such as vertical table partitioning are considered.
An important consideration for using this approach is to
support transparency in logical design e.g. by explicitly
designating a partitioning step in the design process. The
relevant characteristics of data warehouses include
storage of materialized views and de-normalized,
redundant data tables. These performance enhancement
schema elements are best defined as part of the
implementation design. An added benefit of this approach
in our case is that expert end-user involvement (which is
necessary due to the scientific know-how needed for the
design of the data model) can be extended into logical data
modeling, and the logical data model can be used as a
vendor-independent development specification allowing
DBMS-specific optimization.</p>
      <p>The conceptual model covers all aspects of the data
without considering the technical database realization.
Specifically the conceptual data model has been used as a
map for defining boundaries for subsequent
implementation projects. It has proven a powerful
communication instrument within the project team and for
external project communication.</p>
      <p>The conceptual model of the system has been developed
through a series of coached workshops with extensive
user involvement and through reverse engineering of
existing databases followed by reviews of the results.
Important challenges during the conceptual modeling
were:</p>
      <p>Establishing a common vocabulary of the application
domain: existing database systems were inadequately
documented and no consistent terminology and naming
was enforced. An important result of the conceptual
modeling has been the consolidation and precise
definition of the terms used.</p>
      <p>Abstracting from current implementations and legacy
systems: often, due to lack of documentation, it proved
difficult to extract conceptually relevant information. The
resulting conceptual model has provided a complete and
relatively simple overview of a very complex database
landscape. As an example, the following figure depicts the
conceptual representation of more than 1000 different
kinds of meteorological measurements currently
performed.</p>
      <p>The conceptual model distinguishes between two main
groups of entity types: measurement data for the various
meteorological phenomena and climatology context data
(metadata) required for their correct interpretation.
Measurement data are represented in the subtypes of
entity type “Measurement” in Figure 3.
It is important to note that measurement data consist in
general of a value, a representation and data regarding the
type of measurement made. Values may in fact be
aggregated over space and/or time, resulting in different
types of measurements. Measurements can be interpreted
only by considering their context data.</p>
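      <p>An illustrative sketch (not the project’s actual schema) of this conceptual distinction: a measurement couples a value with its representation and measurement type, and is linked into the context data. All field names are assumptions.</p>

```python
# Hypothetical rendering of the "Measurement" entity type: a value,
# its representation and its measurement type, linked to context data
# (here just a station identifier). Field names are invented.

from dataclasses import dataclass

@dataclass
class MeasurementType:
    parameter: str      # e.g. "air temperature"
    aggregation: str    # e.g. "instantaneous", "hourly mean"

@dataclass
class Measurement:
    value: float
    representation: str  # e.g. physical unit such as "degC"
    mtype: MeasurementType
    station: str         # link into the climatology context data

m = Measurement(2.5, "degC",
                MeasurementType("air temperature", "hourly mean"), "SMA")
print(m.mtype.aggregation)  # -> hourly mean
```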
      <p>Context data can be divided into description of the
physical measurement environment and information
required for operations on measurement data (e.g.
homogenization, quality control). We note at this point
that the boundaries between the different kinds of context
information are not hard.</p>
      <p>The important role of the context information lies in the
fact that it defines the classification dimensions for
measurement data. However, certain complexities had to
be considered.</p>
      <p>Figure 4 shows the classification dimensions of the point
measurement fact (with attributes value, plausibility
information, row status and mutation information):
parameter group and parameter; year, day of year and
reference time; and station, measure site and installation.</p>
      <p>
The conceptual model is not the basis for the
implementation of a specific data warehouse system.
Rather, it describes data that are managed and used by
various applications and in various databases. As such, a
multidimensional modeling technique optimized for OLAP
(e.g. [SBHD 1998]) was not explicitly used; candidate
classification dimensions were informally documented in
the model.</p>
      <p>Large parts of the conceptual model for context
information do not fit a multidimensional model. Control
information for aggregation, homogenization and test
functions are among these parts.</p>
      <p>Due to this fact, multidimensional modeling has been
introduced in the subsequent modeling process for
selected implementation projects. For these, combined E/R
[Chen 1976] and ME/R model [SBHD 1998] concepts are
used. An example of ME/R modeling is presented in Figure
4. It depicts the modeling of point measurement data and
uses an adapted notation without modifying the
semantics of the ME/R metamodel. More specifically,
facts are depicted by an n-ary relationship (rounded
rectangle) and classification dimensions are modeled as
entity types. Possible classification relationships are
depicted by arrows; note, however, that specialized
aggregation operators should be applied along each
classification dimension and between each classification
step. Snowflake schemas can implement the above
conceptual model. The large number of facts in the
database (ca. 1000 parameters, 10-minute time intervals for
ca. 100 years and several hundreds of installations) requires
performance enhancement techniques including table
partitioning.</p>
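      <p>A back-of-the-envelope check of these volumes, assuming 300 installations for “several hundreds”: since only a subset of parameters is measured at any installation, the product is an upper bound, not the actual row count.</p>

```python
# Upper-bound estimate of the fact volume cited in the text:
# ca. 1000 parameters, 10-minute intervals over ca. 100 years and
# several hundred installations (300 is an assumed value).

intervals_per_year = 6 * 24 * 365   # 10-minute values per year
years = 100
parameters = 1000
installations = 300                 # assumption for "several hundred"

upper_bound = intervals_per_year * years * parameters * installations
print(f"{upper_bound:.1e} potential fact rows")  # on the order of 10**12
```

Even with sparse coverage, row counts of this magnitude motivate the table partitioning mentioned above.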
    </sec>
    <sec id="sec-5">
      <title>6 Conclusions</title>
      <p>In the described MeteoSwiss project we have developed
an architecture that, while not a traditional data
warehouse, uses several elements from data warehouse
technology. We support the entire data life cycle and
enforce a unidirectional data flow, which allows us to
optimize each data store for the use patterns it should
support (OLAP vs. OLTP). This approach promises to
solve the long-term performance issues which often
plague meteorological and climatology databases.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Chen 1976]
          <string-name>
            <given-names>P.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>The Entity Relationship Model - Towards a Unified View of Data</article-title>
          .
          <source>ACM TODS</source>
          ,
          <volume>1</volume>
          :
          <fpage>1</fpage>
          ,
          <year>1976</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [CP 1950]
          <string-name>
            <given-names>V.</given-names>
            <surname>Conrad</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.W.</given-names>
            <surname>Pollack</surname>
          </string-name>
          ,
          <year>1950</year>
          : 'Methods in Climatology', Harvard University Press.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Martyn 2000]
          <string-name>
            <given-names>T.</given-names>
            <surname>Martyn</surname>
          </string-name>
          , Implementation Design for Databases: The “Forgotten” Step, IEEE IT Professional, March/April 2000, pp.
          <fpage>42</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Moe 1999]
          <string-name>
            <given-names>M.</given-names>
            <surname>Moe</surname>
          </string-name>
          (Ed.),
          <source>Proceedings from the Oslo EUMETNET - ECSN Workshop</source>
          , 11.-12.
          <year>October 1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [SBHD 1998]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sapia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blaschka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Höfling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dinter</surname>
          </string-name>
          ,
          <article-title>Extending the E/R Model for the Multidimensional Paradigm</article-title>
          ,
          <source>Proc. International Workshop on Data Warehouse and Data Mining (DWDM, in connection with ER)</source>
          ,
          <year>November 1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [WMO 1990] WMO
          ,
          <year>1990</year>
          :
          <article-title>Guide to climatological practices</article-title>
          .
          <source>2nd edition. WMO 100.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>