<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic-Enabled Transformation Framework for Time Series</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Robert Barta</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Bleier</string-name>
          <email>thomas.bleier@arcs.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>rho information systems rho@devc.at</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ARC Austrian Research Centers GmbH</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Conventional processing of time series runs along a split horizon: on the one hand it has to handle quantitative data organized along the time axis, on the other hand meta data capturing circumstantial facts about the values, or about the time sequence as a whole. We propose an integrative approach using a domain specific language for the transformation of time sequences, covering arithmetic, temporal, but also semantic aspects of such computations. In doing so we leverage Topic Maps as one existing semantic technology.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>It is these patterns that sit at the core of modern environmental monitoring and forecasting systems.
For auditing, but increasingly also for legal reasons, more and more focus shifts from the data to the meta data,
so that whole processing chains have to reliably keep track of how and why particular time series
data is used for a particular decision. Recent EU regulations (INSPIRE) also mandate that environmental
information is properly passed on between countries, public agencies, and citizens. This must include
meta information.</p>
      <p>
        The challenge is that quantitative, temporal, spatial and semantic information has to be brought
into one consolidated computational model. In this work we propose such a domain specific language,
one which operates on time sequences. It should make it possible to specify transformations based not only on
the numerical data, but also on any semantic data available, be that inside the time sequence or within
an underlying semantic network. For practical reasons the language should be compatible with both
predominant semantic technology stacks, RDF [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and Topic Maps [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and it should also degrade
gracefully in the absence of any semantic network.
      </p>
      <p>
        Our work is to be understood in the context of SWE [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and is specifically targeted at enhancing sensor
observation services (SOS [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]). The latter are not only instrumental in exposing sensor measurements via
a web service; time series derived from original sensor values can also be offered by specialized SOSes
(virtual sensors). The necessary meta information to describe virtual sensors in SensorML [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is directly
linked to the processing model we propose.
      </p>
      <p>Our contribution is
- to choose and customize a semantic framework to seamlessly host temporal information,
- to define a time sequence transformation language, Formula 3 (F3), and
- to demonstrate how underlying ontological information can be leveraged to perform informed semantic
transformations.</p>
      <p>We will proceed as follows. First we focus on Topic Maps as semantic technology and
recapitulate the most important concepts together with a textual notation of our own making. Then we turn to the
query language (subsection 2.2). Using this as a baseline, we defend our choice over the more main-stream
RDF/S framework later in section 6.1 (Related Work). Why the TM data model is still suboptimal for
our purposes and how it can be extended is covered in subsection 2.3.</p>
      <p>The following larger section covers the language Formula 3 (F3). In order to keep this presentation
compact, we traded formal grammar rules for canonical examples from the sensor web domain. We only
hint at the fact that behind F3 a (process) algebra defines the formal semantics. The extension framework
(new data types, operators, kernel functions, etc.) will be covered elsewhere. Section 5 finally demonstrates
how F3 meta data management can be made semantic with the use of an underlying semantic network
and a path expression language adopted from TMQL (the query language for TMs).</p>
    </sec>
    <sec id="sec-2">
      <title>Temporal Topic Maps</title>
      <p>
        Topic Maps (TM; the ISO standard defines the name in the plural) is a knowledge representation framework quite
comparable to the more main-stream RDF/S technology stack. While in the latter all information is
couched in the form of triples (subject, predicate, object), basic concepts in TM are designed in a more
high-level, anthropocentric way. In the following we present these concepts in lockstep with a succinct
text notation (AsTMa= [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]).
      </p>
      <sec id="sec-2-1">
        <title>Factual Information</title>
        <p>Topics represent subjects, which can be anything, physical or not. To further knowledge aggregation,
topic identity can be supported by specially interpreted IRIs. In the case of objects which reside at
certain network locations such identifiers will naturally be URLs. For example, a given SOS deployment
can have its endpoint used for identification:</p>
        <p>demo-sos isa SOS-deployment = http://env05.arcs.ac.at/SOSsrv/
In the notation above such a subject locator IRI is symbolized by prefixing it with =. The topic identifier
demo-sos is only local within the map and can be used there to refer to that topic. If a subject does not
have a network address, then one (or several) subject identifiers can be used for identification:
arcs isa organisation ~ http://www.arcs.ac.at/
These identifiers are meant to indirectly identify the subject, such as web sites for organisations, images
for persons, and so forth.</p>
        <p>As also shown in the example above, topics can have types, i.e. are instances of a class. That itself is
just another topic, to be elaborated on in this map or in some peripheral ontology. Topics can also have
any number of names attached, signalled by !:
arcs isa organisation ~ http://www.arcs.ac.at/
! Austrian Research Centers
! acronym : ARCS
! branding: Austrian Institute of Technology
...</p>
        <p>Names can be typed to allow the use of different names for different purposes. While the first above is just
a name, the next is an acronym, the other a branding. These types are again topics.</p>
        <p>To attach values to topics, occurrences can be used. To add, say, a homepage or the number of employees,
one would add to the above
...
homepage : http://www.arcs.ac.at/
nrEmployees : 1000
The data types here are implicit (IRI and xsd:integer, respectively), but they can be made explicit as well.
The types of the occurrences themselves (homepage and nrEmployees) are further topics.</p>
        <p>Relationships between topics are expressed via associations, whereby every involved topic is a player
of a certain role. The fragment</p>
        <p>provisioning (provider : arcs, service : demo-sos)
means that arcs (in the role provider) provisions the demo-sos (in the role of a service). Obviously
the whole association itself is also of a certain type (provisioning). Notably, an association has no
intrinsic direction. It captures a certain fact, together with all involved parties. Other examples would
be marriages, or, to stay within the theme, observations and measurements. The roles themselves
(provider, service) are also topics to be detailed somewhere to the extent necessary.</p>
      </sec>
      <sec id="sec-2-2">
        <title>TM Query Language</title>
        <p>
          Instead of using an API into a consolidated topic map, we leverage TMQL [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] as access language. Like
any other query language, TMQL has two concerns: (a) to locate and detect certain information in the
queried topic map, and (b) to generate output based on the detected information. One familiar type of
output is the tabular form; it can be requested using a SELECT syntax:
select $p / acronym, $s =
where
        </p>
        <p>provisioning (provider: $p, service : $s)
A query processor will first try to find all associations which follow the pattern above, i.e. which have the
required association type and the given roles. Once such an association is found, the variables $p and
$s will be bound to the respective players in the matched association. On the outgoing side, $p and $s
will be used in the SELECT clause to evaluate path expressions. The expression $p / acronym would
evaluate to all acronyms of what $p is currently bound to. The expression $s = would return all subject
locators of the topic bound to $s. The overall result would be:</p>
        <p>"ARCS", "http://env05.arcs.ac.at/SOSsrv/"</p>
        <p>The query language is flexible enough to also generate XML output, not as a string via text templates,
but in an internal representation (DOM). For this, one has to switch into FLWR (For, Let, Where,
Return) style:
return
&lt;services&gt;{
for $p in // organisation,</p>
        <p>$s in // web-service
where</p>
        <p>provisioning (provider : $p, service : $s)
return</p>
        <p>&lt;service href="{$s =}"&gt;{$p / acronym}&lt;/service&gt;
}&lt;/services&gt;
While the WHERE clause remains the same, the variables and the values over which they range are
made explicit. In the case of $p it should be all instances of organisation and for $s all instances of
web-service. The returned content is now organized as an XML structure. The expected output would
then be:
&lt;services&gt;</p>
        <p>&lt;service href="http://env05...at/SOSsrv/"&gt;ARCS&lt;/service&gt;
&lt;/services&gt;</p>
        <p>Additionally we assumed here that a SOS-deployment is a subclass of a web-service. Only then
will taxonometric reasoning deliver the above result.</p>
        <p>Inside such an XML query string it is also trivial to embed further topic map information, such as
the name of the service provider:
&lt;ows:ProviderName&gt;</p>
        <p>{$p / acronym || $p / name}
&lt;/ows:ProviderName&gt;</p>
        <p>The example also shows how TMQL expressions can be used to deal with incomplete or highly variable
data. Above, for instance, we looked first for provider acronyms. If there were none, the query would fall
back to the full name of the provider (|| is the shortcut 'or').</p>
        <p>Naturally TMQL supports loops over repetitive items, so it is straightforward to include, say, a list
of SOS offerings:
return
&lt;ows:Parameter name="offering"&gt;
&lt;ows:AllowedValues&gt;{
for $o in // offering [ . &lt;-&gt; location == vienna ]
return</p>
        <p>&lt;ows:Value&gt;{$o !}&lt;/ows:Value&gt;
}&lt;/ows:AllowedValues&gt;
&lt;/ows:Parameter&gt;
The path expression // offering will compute all instances of offering in the map; notably not only
direct ones, but also instances along any subclass hierarchy existing in the map. Then the path expression
continues with a filter (indicated by [] brackets). It only passes those things (each thing referenced with
.) which have an association of type location with the topic vienna.</p>
        <p>One by one, each Viennese offering is bound to the variable $o. With such a binding the RETURN
clause is evaluated. It will extract the topic identifier (via $o !) and embed that into the XML fragment.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Extending the TM Model</title>
        <p>
          While the generic Topic Maps model (TMDM [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]) is sufficiently equipped to host all information we
need for the experiment, it does not do so elegantly or efficiently. Rather than shoehorn measurement
data, temporal and spatial information into the model, we decided to experiment with rather minimal
extensions to the official TM data structure. Naturally, these extensions propagate to the notation
and further to the query language.
        </p>
        <p>The first step is to allow numerical values to have physical units, such as 5 kg or 20 mg / m^3.
Rather than host quantity and value inside a topic or a dedicated association, we prefer to define a
new basic data type. Accordingly, the impact on the model is minimal. Only the notation to create map
content has to allow units:
temperature-vienna isa temperature
value: 18 celsius
The very same notation is extended into TMQL, also to compare values and perform computations with
units.</p>
        <p>Another modification concerns how values and topics can be related. According to the standard model,
literal values can only be hosted inside occurrences. To make them take part in an association one would
have to create a stub topic to hold the occurrence with the value. We lift this rather arbitrary restriction
and allow values to directly take part in associations as well, albeit only as players. As a secondary benefit
we can now interpret occurrences as specialized associations, and names as specialized occurrences.</p>
        <p>A more dramatic model extension is needed to naturally host time sequences. These are the most
prevalent data structure in our targeted application domain, so an effective coverage greatly affects the
scalability of any semantic system, both in terms of speed and complexity.</p>
        <p>One particular value, say, a measurement inside such a sequence, could be captured with an association:
measurement (value : 30 mg / m^3,</p>
        <p>phenomenon : SO2)
Theoretically, the time aspect can be added using a predefined role time:
measurement (value : 30 mg / m^3,
phenomenon : SO2,
time : 2009-03-07T17:01)</p>
        <p>The downside of this approach is that the lack of any temporal role inside an association leaves the
time aspect open to interpretation. So does the case when more than one such role exists. This makes
interpretation by query processors difficult.</p>
        <p>Another alternative would be to use the Topic Maps scope, an already existing mechanism to restrict
the validity of an association. But scope is not a very well defined concept and is used in other contexts
as well.</p>
        <p>Instead, we redefined associations to have a time stamp, one which always exists. Such a strict
interpretation enables query processors to perform interval algebra operations (inside, outside, overlap, ...). If the
timestamp is left undefined, then the association will range over all times. As physical events are never
instantaneous but interval-based, an interval length can be added to the time stamp. We allow the
interval to be positive or negative to express subtle, but ultimately important information about when
the value is created and whether its validity extends into the future or the past. Of course, the interval
can be zero, covering the theoretical instantaneous case.</p>
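        <p>As an illustrative sketch (not part of the paper's formal model), such a timestamped association with a signed interval can be rendered in Python; all class and field names below are our own assumptions:</p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

@dataclass
class Association:
    """An association extended with a time stamp and a signed interval.

    A None stamp means the association ranges over all times; a negative
    duration means validity extends into the past of the stamp.
    (Class and field names are illustrative, not from the paper.)"""
    type: str
    roles: Dict[str, object]            # role name -> player
    stamp: Optional[datetime] = None
    duration: timedelta = timedelta(0)  # signed; zero = instantaneous

    def valid_at(self, t: datetime) -> bool:
        if self.stamp is None:
            return True                 # undefined stamp: all times
        lo, hi = sorted([self.stamp, self.stamp + self.duration])
        return lo <= t <= hi

# the paper's measurement example: stamped, valid over the past 3 hours
m = Association("measurement",
                {"value": "30 mg/m^3", "phenomenon": "SO2"},
                stamp=datetime(2009, 3, 7, 17, 9, 37),
                duration=-timedelta(hours=3))
assert m.valid_at(datetime(2009, 3, 7, 16, 0))
assert not m.valid_at(datetime(2009, 3, 7, 18, 0))
```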
        <p>A typical example using time stamps with negative intervals is that of the gliding mean: the time
at which a mean value is computed becomes its time stamp. The length of the time window over which
the mean was built will point into the past. Alternatively, mean values can also be computed over
future time windows, as is the case in non-causal systems.</p>
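        <p>The causal gliding mean can be sketched as follows; the slot representation as (time, value) pairs and the function name are our own, not F3's:</p>

```python
from datetime import datetime, timedelta

def gliding_mean(slots, window):
    """Causal gliding mean: each result slot is stamped with the time of
    computation (the window end) and carries a negative duration pointing
    back over the averaging window. `slots` is a list of (time, value)
    pairs in chronological order. (Sketch; names are illustrative.)"""
    out = []
    for t, _ in slots:
        vals = [v for (u, v) in slots if t - window <= u <= t]
        out.append((t, -window, sum(vals) / len(vals)))
    return out

base = datetime(2009, 3, 7, 17, 0)
slots = [(base + timedelta(hours=i), float(i)) for i in range(4)]
result = gliding_mean(slots, timedelta(hours=2))
# each mean covers the value at t and the two preceding hours
assert result[3][2] == 2.0   # mean of 1, 2, 3
assert result[0][2] == 0.0   # only the first value is in the window
```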
        <p>On the notational side, this extension is trivial; we simply allow time stamps and time intervals to be
added to an association:
measurement (value : 30 mg / m^3,</p>
        <p>phenomenon : SO2 )
at 2009-03-07T17:09:37 - 3 hours</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Formula 3</title>
      <p>F3 is a functional language that transforms time sequences. Time sequence processors (TSP, Fig. 1) can
consume any number of sequences on the incoming side.</p>
      <p>TSPs are called sources if no sequence is expected. Typically these are constants or data fetched
from a database backend. TSPs can produce any (finite) number of sequences on the outgoing side; sinks
produce nothing and are used for debugging, visualisation or, again, database storage.</p>
      <p>When a TSP is triggered into evaluation, it will consume a certain number of sequences on the
incoming side. With these (and an additional variable binding to fine-control its behavior) the TSP will
perform its computation. If there are still sequences left on the incoming side, then the computation
will be repeated with those, continuing until the incoming side is exhausted. All partial results will be
combined into one outgoing sequence of time sequences. A greedy TSP is one which always consumes all
incoming sequences.</p>
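      <p>The evaluation loop just described can be sketched as a small Python driver; the function and parameter names are our own assumptions, not the F3 API:</p>

```python
def run_tsp(kernel, arity, incoming, greedy=False):
    """Drive a time sequence processor: repeatedly consume `arity`
    sequences from `incoming`, apply the kernel, and collect the partial
    results into one outgoing sequence of sequences. A greedy TSP
    consumes everything at once. (Illustrative sketch, not the F3 API.)"""
    incoming = list(incoming)
    if greedy:
        return [kernel(*incoming)]
    results = []
    while len(incoming) >= arity:
        batch, incoming = incoming[:arity], incoming[arity:]
        results.append(kernel(*batch))
    return results

# a binary kernel that zips two sequences slot-wise and sums values
pairwise_sum = lambda a, b: [x + y for x, y in zip(a, b)]
out = run_tsp(pairwise_sum, 2, [[1, 2], [10, 20], [3, 4], [30, 40]])
assert out == [[11, 22], [33, 44]]
```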
      <sec id="sec-3-1">
        <title>Time Series Abstract Data Model</title>
        <p>While F3 makes no assumptions about the provenance of a time sequence, it has the abstract expectation
that it is a linear array of chronologically ordered slots.</p>
        <p>The time information within a slot is not just a time stamp (with a system specific precision).
A time duration marks the temporal extension of the slot. That duration can be positive or negative,
depending on whether the validity of the slot reaches into the future or the past. Slots also have a logical
time, which is their index in the sequence (starting with 0).</p>
        <p>The payload in the slot has the form of key/value pairs. Keys are either simple identifiers or take
the form of QNames or IRIs. Values are either anything of the former or literals such as strings, integers,
floats or application-specific objects such as images and matrices. They can also be time durations or
time patterns.</p>
        <p>When slots are combined into a sequence, their time stamps and their signed durations obviously have
to be honored. Any temporal overlaps have to be resolved to arrive at a functional time sequence, i.e. one
which can deliver one slot for one particular time stamp.</p>
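        <p>A minimal sketch of this abstract slot model in Python; the field and function names are our own assumptions, not the F3 specification:</p>

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict, List

@dataclass
class Slot:
    """One slot of a time sequence: a time stamp, a signed duration
    marking its temporal extension, and a key/value payload.
    (Field names are illustrative, not from the F3 specification.)"""
    stamp: datetime
    duration: timedelta = timedelta(0)   # negative = reaches into the past
    payload: Dict[str, Any] = field(default_factory=dict)

def sequence(slots: List[Slot]) -> List[Slot]:
    """Order slots chronologically; logical time is then just the index.
    A real implementation would also resolve temporal overlaps so that
    each time stamp maps to at most one slot."""
    return sorted(slots, key=lambda s: s.stamp)

s = sequence([
    Slot(datetime(2009, 3, 7, 18), payload={"value": 32}),
    Slot(datetime(2009, 3, 7, 17), payload={"value": 30}),
])
assert [x.payload["value"] for x in s] == [30, 32]   # logical times 0, 1
```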
      </sec>
      <sec id="sec-3-2">
        <title>Virtual Machine Operators</title>
        <p>F3 defines a minimal set of primitive TSPs. As a whole they cover all possible computation patterns, as
all high-level language elements can be compiled into this set. Ignoring optimization, any implementation
of F3 only has to implement these operators.</p>
        <p>- Null: This operator takes one sequence and creates none.
- Nmap: This operator takes one sequence and iterates over all slots. On each of them a lambda expression
is evaluated, returning a new slot. A new sequence is constructed from these slots.
- Nreduce: This operator takes one sequence and iterates over all slots. On each of them it will evaluate
a lambda expression which aggregates the slot into an aggregate slot. That will be the only slot in
the outgoing sequence.
- Nfork: This operator takes one sequence and evaluates a lambda expression on each slot. The result
values are used for classification in that one outgoing sequence is generated for each different value,
holding only those slots which produced exactly that value.
- Ngrep: This operator takes one sequence and evaluates a lambda expression on each slot. Slots for
which that result is empty are discarded. With the others an outgoing sequence is constructed.
- Tfork: This operator takes one time sequence and slices it according to a time pattern and a window
size. The time pattern (for instance every 3 hours) defines a number of time stamps, all computed
relative to the start of the incoming time sequence. The time stamps are shifted along the window size,
resulting in a number of individual time windows. These are used for slicing the incoming sequence
into individual time sequences.
- Tjoin: This operator is the inverse of Tfork. It joins all incoming time sequences into one. Any
temporal overlaps will be resolved.
- Tee: This is the identity operator. It echoes all incoming sequences. It is used for debugging and
visualisation.</p>
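        <p>Four of the N-operators can be sketched in Python, treating slots as plain values for brevity; the function names mirror the operators, but the signatures are our own assumptions:</p>

```python
def nmap(seq, f):
    """Nmap: apply f to every slot, build a new sequence."""
    return [f(s) for s in seq]

def nreduce(seq, f, init):
    """Nreduce: aggregate all slots into a single slot."""
    acc = init
    for s in seq:
        acc = f(acc, s)
    return [acc]

def ngrep(seq, f):
    """Ngrep: keep only slots for which f yields something non-empty."""
    return [s for s in seq if f(s)]

def nfork(seq, f):
    """Nfork: one outgoing sequence per distinct value of f."""
    out = {}
    for s in seq:
        out.setdefault(f(s), []).append(s)
    return list(out.values())

seq = [1, 4, 2, 8]
assert nmap(seq, lambda v: v * 2) == [2, 8, 4, 16]
assert nreduce(seq, lambda a, v: a + v, 0) == [15]
assert ngrep(seq, lambda v: v > 2) == [4, 8]
assert nfork(seq, lambda v: v % 2) == [[1], [4, 2, 8]]
```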
      </sec>
      <sec id="sec-3-3">
        <title>Surface Syntax Operator</title>
        <p>While applications can use an API to compose primitive TSPs into complex processing patterns,
formulating transformation algorithms is more convenient using a compact syntax. Such a transformation
can consist of (a) parameters for fine-control, (b) formal parameters for the time sequences expected by the
operator, and (c) blocks to generate values along a particular timeline.</p>
        <p>At the beginning of a TSP definition a parameter block can define one or more modalities of that
operator:
#-- modal parameters
{ progress =&gt; 30 mins }
Each parameter can be associated with a default value which the invoking application can override.</p>
        <p>This is followed by a list of formal time sequence parameters. The example TSP expects two sequences,
one which holds wind directions measured in degrees and another holding wind speeds:
#-- time pattern
every $progress
#-- value generation
&lt; @Speed(t) if ( @Direction(t) &lt; 30</p>
        <p>or @Direction(t) &gt; 330 ) &gt;
#-- meta data
{ phenomenon =&gt; channel-0-speeds }</p>
        <p>If a TSP is to return time sequences, then at least one block to generate values has to be declared. The
first section of that handles the temporal aspect via the specification of a time pattern to be used, the
second controls the quantitative aspect containing the numerical computations, and the third handles
the meta data generation for that outgoing sequence.</p>
        <p>First the time pattern every 30 mins is used to compute time stamps, starting from the earliest of the
two incoming sequences. The time duration 30 mins is taken from the modal parameter progress, which
is available as a variable. For each of these times the @Direction sequence is sampled (under whatever
interpolation regime that sequence is in) and that value is tested against the range -30 .. 30. If the
direction falls within that range, the @Speed sequence is sampled at the same time and a value with the
corresponding timestamp is put into an outgoing slot. These slots are collected into an outgoing sequence.
That is finally enriched with the meta information manifesting that these are speeds for a certain direction
channel.</p>
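        <p>The walk-through above can be sketched in Python. We stand in for sampled time sequences with plain dicts from timestamps to values; the function name and signature are our own assumptions:</p>

```python
from datetime import datetime, timedelta

def channel_speeds(direction, speed, start, end, progress):
    """Sample the direction sequence on a regular time grid and keep the
    speed samples for the northerly channel (< 30 or > 330 degrees).
    Sequences are dicts mapping datetimes to values, standing in for
    sampled (interpolated) time sequences. (Illustrative sketch.)"""
    out = []
    t = start
    while t <= end:
        d = direction.get(t)
        if d is not None and (d < 30 or d > 330):
            out.append((t, speed[t]))
        t += progress
    return out

base = datetime(2009, 3, 7, 12, 0)
grid = [base + timedelta(minutes=30 * i) for i in range(4)]
direction = dict(zip(grid, [10, 180, 350, 90]))
speed = dict(zip(grid, [5.0, 7.0, 3.0, 4.0]))
out = channel_speeds(direction, speed, grid[0], grid[-1],
                     timedelta(minutes=30))
assert [v for (_, v) in out] == [5.0, 3.0]   # directions 10 and 350 pass
```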
      </sec>
      <sec id="sec-3-4">
        <title>Modal Parameters</title>
        <p>To fine-control the behavior of TSPs, modal parameters can be declared using a simple key/value scheme.
Key names are reinterpreted as variables to be used throughout the rest of a TSP definition. Values can
be undef, any constant, but also value expressions or time patterns.</p>
        <p>The invoking application can optionally redefine the value of a modal parameter. If it does not, then
the declared value will be used by default.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Time Patterns</title>
        <p>The times at which new values have to be generated can be controlled via a time pattern language. That
allows, in the simplest case, the enumeration of individual times. More general is the use of a declarative
time pattern specification. That uses repeating temporal patterns, such as every N hours or hourly
at 12:00. To increase the variability, time patterns can be hierarchical in that first a longer pattern is
specified and within that a more fine-grained subpattern:
yearly:
in May .. June : every 2nd week
otherwise : every 30 minutes</p>
        <p>Starting with the start time of all involved time sequences, that pattern would create a yearly pattern
whereby in the months May and June a time stamp will be computed every 14 days. In all other months
a 30 minute rhythm will be used.</p>
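      <p>A toy Python generator for the hierarchical pattern above; a real pattern engine would of course be driven by the declarative specification rather than hard-coded:</p>

```python
from datetime import datetime, timedelta

def pattern_times(start, end):
    """Generate time stamps for the example pattern: within May and June
    every 14 days, otherwise every 30 minutes. (Illustrative, hard-coded
    sketch of the pattern semantics.)"""
    t = start
    while t <= end:
        yield t
        step = timedelta(days=14) if t.month in (5, 6) \
            else timedelta(minutes=30)
        t += step

ts = list(pattern_times(datetime(2009, 5, 1), datetime(2009, 5, 31)))
assert ts == [datetime(2009, 5, 1), datetime(2009, 5, 15),
              datetime(2009, 5, 29)]
```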
        <p>When using these patterns, a special variable t is bound to one timestamp at a time.
Apart from generating physical times, there is also the option to use the index as logical time, such as in
every 2nd tick to address every second time slot in the incoming sequence. If no pattern is provided,
the default is every tick. Using logical time, the variable n will always contain the current index, and
t will be bound to the time in the current slot.</p>
      </sec>
      <sec id="sec-3-6">
        <title>Sequence Parameters</title>
        <p>Operators can use explicit sequence parameters not only to give an incoming time sequence a local name,
but also to impose certain constraints on it. Only if all constraints are satisfied will evaluation continue.</p>
        <p>In the example
@Direction { phenomenon =&gt; wind,</p>
        <p>unit =&gt; degrees }
@Speed { phenomenon =&gt; wind }
for the first incoming sequence it is checked whether the phenomenon measured is actually wind and then
whether it is in degrees. If this test passes, the sequence will be bound to the variable @Direction. The second incoming
sequence will be bound to @Speed if it passes its own test.</p>
        <p>The constraints themselves are given by key/value pairs. Only if the incoming sequence has that very
key and that key links to the same value is the constraint satisfied. Values can be left undef to
simply check for the existence of a certain key. In any case, the keys are again reinterpreted as variables.
These can then be used throughout the rest of the TSP definition and are bound to the sequence value for
that key.</p>
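        <p>The matching rule can be sketched as follows; the sequence meta data is modeled as a plain dict, and the function name is our own assumption:</p>

```python
def bind_sequence(seq_meta, constraints):
    """Check an incoming sequence's meta data against the declared
    key/value constraints. A constraint value of None only requires the
    key to exist. On success, return the key/variable bindings;
    otherwise None. (Illustrative sketch of the matching rule.)"""
    bindings = {}
    for key, expected in constraints.items():
        if key not in seq_meta:
            return None
        if expected is not None and seq_meta[key] != expected:
            return None
        bindings[key] = seq_meta[key]
    return bindings

meta = {"phenomenon": "wind", "unit": "degrees"}
assert bind_sequence(meta, {"phenomenon": "wind", "unit": "degrees"}) \
    == {"phenomenon": "wind", "unit": "degrees"}
assert bind_sequence(meta, {"phenomenon": "temperature"}) is None
assert bind_sequence(meta, {"unit": None}) == {"unit": "degrees"}
```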
        <p>The binding of incoming time sequences to sequence parameters is purely positional. Any unbound
incoming sequence is left for another evaluation round of the same operator.</p>
        <p>If no formal sequence parameter is declared, then a default one named @ will be used. It will always
consume a single incoming sequence and it can be used implicitly within value expressions, i.e. (t) instead
of @ (t) and [n] instead of @ [n]. If an operator is greedy and needs to consume all sequences, then the
special @... must be used to indicate this. It cannot specify constraints.</p>
      </sec>
      <sec id="sec-3-7">
        <title>Simple Value Generation</title>
        <p>Value generation mostly follows the syntax and semantics of conventional programming languages such as
FORTRAN, C or Java. This includes the notation for constant values, the usual prefix and infix operators,
general function invocation and precedence grouping with parentheses.</p>
        <p>There are, however, some language specifics. Conditionals, i.e. expressions whose evaluation
depends on a condition, are not written with if cascades or ternary operators, but instead use individual
postfix if clauses:</p>
        <p>&lt; [n] if n.depth &lt; -100 m
or [n] * 0.01 if n.depth &lt; 100 m
or 0 otherwise &gt;
Depending on the depth component, different formulas will be used to generate a value. The otherwise
is syntactic sugar, as the lexical order is used for testing the individual conditions.</p>
        <p>When expressions inside a condition are evaluated, they do not actually return a TRUE or FALSE value,
as this data type per se does not exist in the language. Instead undef is used for FALSE; anything which
is not undef is regarded as TRUE.</p>
        <p>Life sciences being the main application domain for the language, all expressions are also aware of physical
units, specifically those from the SI system. This starts with constants having units, such as 3 kg or 27.7
m/s. But it also implies that all computations must respect units as well. In expressions such as @Speed(t)
- 100 km/h the physical dimensions of all operands must match, i.e. the @Speed time sequence must have
only values with dimension length per duration.</p>
        <p>Every expression can also be unit-converted. One way of conversion is to impose an additional unit
onto the value of the expression; every value would then get mg as its unit. If it already had a unit, that
would be combined as if the computation
@A[n] * 1 mg had been used. In the other direction any existing unit can be relinquished:
the processor will convert any value with a length dimension into the number of meters, dropping the
unit altogether and leaving a simple scalar. That mechanism can also be used to scale values.
In @A[n] &gt;-&gt; 1 km the values are converted to kilometers, or even leave the SI system with @A[n] &gt;-&gt;
inch, which converts into inches.</p>
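        <p>The conversion mechanism can be sketched with a toy unit table; the table, function name and factors here are our own simplification of what F3's richer unit system would do:</p>

```python
# toy unit table: factors to the SI base unit of the length dimension
FACTORS = {"m": 1.0, "km": 1000.0, "inch": 0.0254}

def convert(value, unit, target):
    """Convert a (value, unit) pair into the target unit, mirroring the
    `>-> 1 km` style conversions described above. Returns a bare scalar,
    so it also covers relinquishing the unit (target 'm' yields the
    number of meters). (Toy sketch; F3's real unit system is richer.)"""
    return value * FACTORS[unit] / FACTORS[target]

assert convert(2.0, "km", "m") == 2000.0     # relinquish into meters
assert convert(5000.0, "m", "km") == 5.0     # scale to kilometers
assert abs(convert(1.0, "m", "inch") - 39.3700787) < 1e-6
```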
      </sec>
      <sec id="sec-3-8">
        <title>Slot Selection</title>
        <p>
          When using time patterns which iterate over the incoming time sequence(s), a natural way to access a
particular slot is to use an index. Intuitively, the first (and oldest) value is addressed via the subscript [0],
the next with [1] and so forth. To count from the last (and youngest) value one has to use negative
values: [-1] retrieves the last value, [-2] the second-to-last, etc.
        </p>
        <p>Referring to one particular value in the sequence is only moderately useful. To operate
on each individual value one addresses the current value [n]. Logical indices can also be used
to timeshift sequences. In the same way as [n] always points to the current value, [n-1] refers to the
previous, [n-2] to the one before that, and [n+1] to the next. To shift a sequence into its own future, one
would write &lt; [n-1] &gt;, and to shift it into its past &lt; [n+2] &gt;.</p>
        <p>In the case that the iteration over time sequences is based not on logical but on physical time, the
current slot can be addressed via (t), t symbolizing the current time. Similarly to above, a particular past
or future can be referred to by providing a negative or positive duration, as in (t - 30 secs) or
(t + 3 days).</p>
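        <p>The logical timeshift semantics can be sketched as follows, with sequences as plain lists; the function name is our own assumption:</p>

```python
def shifted(seq, offset):
    """Time-shift a sequence by a logical offset: slot n of the result is
    slot n+offset of the input, mirroring expressions like < [n-1] >
    (shift into the future) and < [n+2] > (shift into the past). Slots
    falling off either end become undefined (None). (Sketch.)"""
    return [seq[n + offset] if 0 <= n + offset < len(seq) else None
            for n in range(len(seq))]

seq = [10, 20, 30, 40]
assert shifted(seq, -1) == [None, 10, 20, 30]    # < [n-1] >
assert shifted(seq, +2) == [30, 40, None, None]  # < [n+2] >
assert seq[-1] == 40                             # [-1]: youngest value
```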
      </sec>
      <sec id="sec-3-9">
        <title>Aggregations</title>
        <p>Slots can also be addressed in groups using a logical range. This is only used together with aggregations,
so that
&lt; [n-2 .. n].sum &gt;
produces sums of the last 3 values. Such ranges can be open to the left or to the right. This reflects the
traditional interval notation where parentheses are used:
&lt; (n-2 .. n].sum &gt;
Now only the last two values are added.</p>
        <p>Aggregation intervals can also be specified via physical time, such as in ( t-3 hours .. t
].mean to compute a 3 hour mean value. Again the interval can be open to either side.</p>
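        <p>The closed versus half-open logical ranges above can be sketched over a plain list; the function name and `closed_left` flag are our own assumptions:</p>

```python
def window_sum(seq, n, closed_left=True):
    """Sum the logical range [n-2 .. n] or (n-2 .. n] over a sequence,
    mirroring the open/closed interval notation above. An empty
    selection still yields 0 for sum. (Illustrative sketch.)"""
    lo = n - 2 if closed_left else n - 1
    return sum(seq[max(lo, 0): n + 1])

seq = [1, 2, 3, 4, 5]
assert window_sum(seq, 4) == 12                    # [n-2 .. n]: 3+4+5
assert window_sum(seq, 4, closed_left=False) == 9  # (n-2 .. n]: 4+5
```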
        <p>The aggregation functions themselves are predefined (mean, sum, prod, max, min, count). All work
type-sensitively, in the sense that for the first two the plus operator for the respective data type is used, for prod the
multiplication, and for max and min the comparison. That list can only be extended outside the language.
The same is true if the aggregation functions need a special treatment of undefined values.</p>
        <p>If the selected interval does not contain any value, only sum and count render something defined (0);
all other aggregates become undefined.</p>
      </sec>
      <sec id="sec-3-10">
        <title>Property Management</title>
        <p>According to the abstract data model of the language, properties are always key/value pairs. This is
to ensure that properties can be easily mapped to RDF triples and Topic Map information items, such
as occurrences, names and associations. Properties can also be virtually supplied by the programming
environment in that pre-registered functions dynamically compute properties.</p>
        <p>The syntax ensures that properties work equally on individual slots, whole time sequences and even
sequences thereof. Some properties inherit downwards in that they apply to a single slot if the whole
sequence has them. One example is unit, where individual slots have to redefine this property to override
any sequence-wide value. The same applies to location.</p>
        <p>Slot Properties. Only in simple cases will a time sequence have one single value in each slot. In general,
slots will contain any number of properties, each with a key and a corresponding value, be that interpreted
as data or meta data. To access a value, it has to be dereferenced via its key:
[n] . phenomenon
If such a property does not exist in that slot, undef is returned. That makes it simple to use as a test
for property existence, as in [n] if [n] . phenomenon.</p>
        <p>On the outgoing side, slots with their properties can be created using the following canonical syntax:
&lt; { value =&gt; A[n],
    phenomenon =&gt; iso:SO2 } &gt;
Obviously one particular key can appear only once. The key must be an identifier (or QName or IRI);
the value can be specified via an arbitrary value expression. That expression is always evaluated in the
current variable binding.</p>
        <p>The key value is predefined and simplifies the syntax when a slot has one distinguished value, the
rest being meta data. Then
&lt; A[n] * 2 { phenomenon =&gt; iso:SO2 } &gt;
can be used instead of the canonical
&lt; { value =&gt; A[n] * 2,
    phenomenon =&gt; iso:SO2 } &gt;
The key value is also the one used by default when accessing slots within expressions. The syntax A[n]
itself is a shortcut for the canonical A[n].value. Other properties of the slot have to be explicitly accessed
via their corresponding keys.</p>
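<p>The slot model above, with the distinguished value key acting as default, can be sketched in Python; the class name Slot and the method deref are assumptions of this illustration:</p>

```python
class Slot(dict):
    """A slot as a bag of key/value properties; the predefined key 'value'
    holds the distinguished value, so that A[n] abbreviates A[n].value."""
    def deref(self, key='value'):
        # a missing property yields None, standing in for undef
        return self.get(key)

s = Slot(value=42.0, phenomenon='iso:SO2')
s.deref()                # A[n]             -> the distinguished value
s.deref('phenomenon')    # A[n].phenomenon  -> the meta data property
s.deref('unit')          # undefined property
```
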
        <p>Some properties are predefined as they have a special meaning in the language. For time aggregation,
for instance, the delta property will cover the time interval over which the aggregation was performed. If
several slot values are involved in a computation, then the time interval they cover is used. Otherwise
delta defaults to undef.</p>
        <p>The unit property is also handled by the language according to the computation. If an incoming value
had m/s and is divided by a value with a time dimension, then the outgoing unit property will be
m/s^2. It is only defined if there is exactly one value component.</p>
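<p>The unit derivation for a division can be sketched by tracking dimension exponents; the function name and the dict-based unit representation are assumptions of this illustration, not part of the language:</p>

```python
def divide_with_unit(a, unit_a, b, unit_b):
    """Divide two slot values and derive the outgoing unit, e.g. (m/s) / s
    yields m/s^2. Units are dicts mapping a dimension to its exponent."""
    unit = dict(unit_a)
    for dim, exp in unit_b.items():
        unit[dim] = unit.get(dim, 0) - exp       # division subtracts exponents
    unit = {d: e for d, e in unit.items() if e != 0}
    return a / b, unit

# an incoming value of 10 m/s divided by 2 s
value, unit = divide_with_unit(10.0, {'m': 1, 's': -1}, 2.0, {'s': 1})
# value is 5.0, unit is {'m': 1, 's': -2}, i.e. m/s^2
```
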
        <p>Time Sequence Properties. Individual time sequences can also have properties attached. Again, some
have a predefined meaning, such as start and end, which represent the start and the end time of the
sequence. As every sequence is working under a particular interpolation regime, the property
interpolation returns an identifier for that interpolation method.</p>
        <p>On the outgoing side, time sequence properties can be added directly after the value generator:
@A @B &lt; .... &gt; { location =&gt; vienna-stephansdom,
                     phenomenon =&gt; @A . phenomenon }
Some default handling here reduces language noise: if the TSP is operating only on the
default sequence, then all its properties are automatically propagated. Only if the sequences are explicitly
declared do the properties have to be as well.</p>
        <p>In many cases the values within the properties will be numbers or strings, so that the language can
directly operate on them. In general, though, values might be matrices, images or arbitrarily complex.
These objects would be completely opaque to the language.</p>
        <p>One way to handle this would be to make the application developer write special accessor functions
for these complex objects and overload the relevant arithmetic operators. But in practical cases it is much
more convenient to give the language access to value components and let it handle the arithmetic.</p>
        <p>Even though F3 cannot know anything about the internal structure of such objects, it can postulate
a generic accessor syntax which allows one to drill down into any object. In the example
[n] . value . columns [3] [4]
the dot notation is used to traverse an assumed tree structure within the value property, and indices are
used to select by number. Alternatively to the dot we also allow a slash / to suggest a path language:
[n] . value / columns [3] [4]
</p>
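<p>How an object might resolve such a path can be sketched as follows; the function resolve_path and the nested-dict object layout are assumptions of this illustration:</p>

```python
import re

def resolve_path(obj, path):
    """Drill into an opaque value along a dot/slash path with numeric
    indices, e.g. 'columns[3][4]' against a nested structure."""
    for step in re.split(r'[./]', path):
        m = re.fullmatch(r'(\w+)((?:\[\d+\])*)', step)
        name, indices = m.group(1), re.findall(r'\[(\d+)\]', m.group(2))
        # descend by name, then by each numeric index
        obj = obj[name] if isinstance(obj, dict) else getattr(obj, name)
        for i in indices:
            obj = obj[int(i)]
    return obj

# a value property holding a 10x10 matrix
matrix = {'columns': [[r * 10 + c for c in range(10)] for r in range(10)]}
resolve_path(matrix, 'columns[3][4]')          # the dot-notation example
resolve_path({'value': matrix}, 'value/columns[3][4]')  # the slash variant
```
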
        <p>At evaluation time, the language processor hands such path expressions over to the object, which has
to resolve the path and return a simple value.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Operator Algebra</title>
      <p>To reuse operators and reduce the overall complexity, individual operators can be combined to form larger
ones. One way is to pipeline them, so that the result of one operator becomes the input of the operator
next in the pipeline. In the following example the incoming sequence is first incremented by one, then
the results are doubled.</p>
      <p>&lt; [n] + 1 &gt; | &lt; [n] * 2 &gt;</p>
      <p>Pipelines can be extended to any number of stages, a single operator being just a trivial pipeline.
If one stage produces more sequences than the next stage can consume, again the repetitive evaluation
semantics from section 3 is used. Consequently, the expression
&lt; [n] + 1 &gt; &lt; [n] - 1 &gt; | &lt; [n] * 2 &gt;
is equivalent to
&lt; ( [n] + 1 ) * 2 &gt; &lt; ( [n] - 1 ) * 2 &gt;</p>
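<p>A simplified model of the pipelining semantics in Python, with slot-wise operators as plain functions and each incoming sequence evaluated once through the whole pipeline; the helper name pipe is an assumption of this sketch:</p>

```python
def pipe(*stages):
    """Compose slot-wise operators into a pipeline. A pipeline applied to
    several sequences is evaluated once per sequence, a simplified model
    of the repetitive evaluation semantics."""
    def run(seqs):
        out = []
        for seq in seqs:
            for stage in stages:
                seq = [stage(v) for v in seq]
            out.append(seq)
        return out
    return run

# stage 1: [n] + 1, stage 2: [n] * 2
p = pipe(lambda v: v + 1, lambda v: v * 2)
p([[1, 2, 3]])        # one sequence in, one out
p([[1, 2], [10, 20]]) # two sequences in, two out
```
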
      <p>Generators (section 3.3) can be used to stack time sequences on top of each other. But it is also
possible to stack already existing operators. That is achieved by connecting them with &amp; (or
alternatively with commas):</p>
      <p>&lt; [n] + 1 &gt; &amp; &lt; [n] - 1 &gt;
When evaluating a stacked operator S, all incoming time sequences will be duplicated and subjected to
each of the inside operators. The time sequences produced by those are then stacked on top of each other,
honoring the lexical order in which the stacking was defined inside S. Consequently, the expression
&lt; 2 * [n] &gt; | &lt; [n] + 1 &gt; &amp; &lt; [n] - 1 &gt;
is equivalent to the single operator
&lt; 2 * [n] + 1 &gt; &lt; 2 * [n] - 1 &gt;</p>
      <p>As one would expect, &amp; binds more tightly than the pipelining operator |. That precedence can be
overridden by grouping with () parentheses.</p>
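<p>The stacking semantics can be sketched the same way; the helper name stack and the list-of-sequences representation are assumptions of this illustration:</p>

```python
def stack(*ops):
    """Combine operators: duplicate the incoming sequences, feed them to
    every inner operator, and stack the results in lexical order."""
    def run(seqs):
        out = []
        for op in ops:
            out.extend(op(list(seqs)))
        return out
    return run

double = lambda seqs: [[2 * v for v in s] for s in seqs]  # 2 * [n]
inc    = lambda seqs: [[v + 1 for v in s] for s in seqs]  # [n] + 1
dec    = lambda seqs: [[v - 1 for v in s] for s in seqs]  # [n] - 1

# 2 * [n] piped into the stack of [n] + 1 and [n] - 1
combined = lambda seqs: stack(inc, dec)(double(seqs))
combined([[1, 2]])   # one sequence in, two stacked sequences out
```

<p>The result reproduces the equivalence stated above: the two outgoing sequences carry 2 * [n] + 1 and 2 * [n] - 1.</p>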
    </sec>
    <sec id="sec-5">
      <title>5 Semantic Properties</title>
      <p>
        From here on it is assumed that the Formula 3 processor has access to an underlying topic map. That
semantic network contains information pertinent to the application domain, in the geosemantic case
generic concepts from O&amp;M [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and SensorML, but also necessary geographical information and background
ontologies covering observable phenomena. Such a network may be materialized, or it may be virtual,
with external resources mapped dynamically into the map [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Depending on the needs, there are several levels of engagement with the semantic network.</p>
      <sec id="sec-5-1">
        <title>5.1 Identification Regime</title>
        <p>When testing for certain properties, and when creating keys and identifiers, one important aspect is a
consistent identity management for all addressed subjects. Only with this quality assurance measure is a
robust and long-term management of data feasible.</p>
        <p>In this scenario a Formula 3 processor accesses the underlying topic map whenever an identifier, a
QName or an IRI is used in the property handling. A simple identifier is interpreted as the topic
identifier of an existing topic inside the map; when an IRI is used, that must be a subject
identifier for a topic there.</p>
        <p>The resolution of QNames, such as xsd:integer, is more complex: first the QName prefix (xsd) is
interpreted as a topic identifier in the underlying map. That topic must be an instance of an ontology as,
according to Topic Maps concepts, it has to reify a map with all the vocabulary in the corresponding
namespace. That ontology is then consulted, and in there a topic with the topic identifier integer must
exist.</p>
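<p>The resolution steps can be sketched as follows; the nested-dict map layout and the function name resolve_qname are assumptions of this illustration, not the data model of any Topic Maps engine:</p>

```python
def resolve_qname(topic_map, qname):
    """Resolve a QName like 'xsd:integer': the prefix must be a topic in
    the map reifying an ontology; the local part must be a topic
    identifier inside that ontology."""
    prefix, local = qname.split(':', 1)
    prefix_topic = topic_map['topics'].get(prefix)
    if prefix_topic is None or 'reifies' not in prefix_topic:
        raise KeyError(f"prefix '{prefix}' is not an ontology topic")
    ontology = topic_map['maps'][prefix_topic['reifies']]
    if local not in ontology['topics']:
        raise KeyError(f"no topic '{local}' in ontology '{prefix}'")
    return ontology['topics'][local]

tm = {'topics': {'xsd': {'reifies': 'xsd-map'}},
      'maps': {'xsd-map': {'topics': {'integer': {'id': 'integer'}}}}}
resolve_qname(tm, 'xsd:integer')   # succeeds
```

<p>An unknown prefix, or a local part missing from the reified ontology, is an error rather than undef.</p>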
      </sec>
      <sec id="sec-5-2">
        <title>5.2 Path Expressions for Values</title>
        <p>A further step is to allow TMQL path expressions everywhere where simple expressions in Formula 3 are
allowed. At evaluation time these path expressions are evaluated against the underlying topic map. Any
existing variable binding can be passed into the path expression. With that the following is possible:
&lt; value =&gt; [n] . amplitude,
  intensity =&gt; { // wave-forms [ ./low &lt;= $value ] [ $value &lt;= ./high ] } &gt;
For every value in the one (anonymous) input sequence the amplitude property is extracted and
propagated as value property to the outgoing sequence. Additionally, the amplitude value is bound to
the variable $value. The TMQL path expression is wrapped in {} brackets. It will first find all instances
of wave-forms in the underlying map. It then will filter for those which have a high-low range inside
which the $value lies. The remaining wave form topic will be returned in form of its topic identifier.
With small modifications of the path expression the subject identifier or topic name(s) can also be
requested.</p>
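<p>The effect of this particular filter can be simulated directly; the dict-of-topics representation and the function name classify_intensity are assumptions of this sketch, not TMQL itself:</p>

```python
def classify_intensity(topic_map, value):
    """Simulate the path expression: find instances of wave-forms whose
    low/high range contains the value; return the topic identifier."""
    hits = [tid for tid, t in topic_map.items()
            if 'wave-forms' in t.get('types', ())
            and value >= t['low'] and t['high'] >= value]
    return hits[0] if hits else None   # undef when nothing matches

tm = {'calm':  {'types': ['wave-forms'], 'low': 0.0, 'high': 0.5},
      'rough': {'types': ['wave-forms'], 'low': 0.5, 'high': 2.0}}
classify_intensity(tm, 1.2)   # the amplitude 1.2 falls into the rough range
```
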
        <p>Path expressions can also be used for properties of outgoing sequences, as the following example
demonstrates:
@A{ pheno =&gt; undef } # formal parameters
&lt; ..... &gt; # generating values
{ risk-level =&gt; { $pheno / risk } }
The pheno property of the incoming sequence @A is first bound to the variable $pheno. Then, when it
comes to creating the result properties, the phenomenon is used as starting topic to find a risk occurrence
in the underlying map. Whatever is returned here (string, URL, ...) is embedded as value of the property.
TMQL path expressions not only can provide a property value, but can also implicitly define the key.
All this depends on what a path expression actually returns.</p>
        <p>In the simplest case, a path expression can return an occurrence. That, according to the Topic
Maps paradigm, consists of a type, a value (and the scope). Ignoring the scope, the type can be naturally
interpreted as key and the occurrence value as the corresponding value. This is demonstrated with the
following:
{ # properties
  { tsunami / wikipedia }, # path expr
  { tsunami / homepage }   # path expr
}
When these properties are generated for a slot or for a whole sequence, then first the tsunami topic is
located in the underlying map. Then an occurrence of type wikipedia is looked for. If it exists, then
the type wikipedia will be used as key and the Wikipedia URL as value. Similarly for the homepage
occurrence.</p>
        <p>What works for occurrences also works for names. Here, too, the name type will be used as key. The
name itself is always a string and will serve as property value. Topic Maps associations also have such
an embedding rule: here, per association role, one property is generated. The role itself is used as key,
the player of that role is the value.</p>
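<p>The embedding rules for occurrences, names and associations can be summarized in one small sketch; the statement encoding (dicts with a kind field) is an assumption of this illustration:</p>

```python
def embed_properties(statements):
    """Embedding rules as described in the text: an occurrence or a name
    yields one (type, value) property; an association yields one property
    per role, keyed by the role with the player as value."""
    props = {}
    for s in statements:
        if s['kind'] in ('occurrence', 'name'):
            props[s['type']] = s['value']
        elif s['kind'] == 'association':
            for role, player in s['roles'].items():
                props[role] = player
    return props

stmts = [{'kind': 'occurrence', 'type': 'wikipedia',
          'value': 'http://en.wikipedia.org/wiki/Tsunami'},
         {'kind': 'association',
          'roles': {'observed-at': 'vienna-stephansdom'}}]
embed_properties(stmts)   # one property per occurrence, one per role
```
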
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Related Work</title>
      <p>As the work here is rather architectural in nature, we touch several areas. First and foremost it is the choice
of the semantic technology which impacts the available infrastructure, feature sets and limitations. As
we have chosen Topic Maps for our experiment, we first give some rationale for this decision. The
remaining sections deal with similar processing models from which we have drawn ideas for the language
F3, and with ways to extend a semantic network model with temporal information.</p>
      <sec id="sec-6-1">
        <title>6.1 Topic Maps vs. RDF Rationale</title>
        <p>
          While there has been some work on conceptually mediating between the RDF and the Topic Maps model
( [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]), the two semantic technology stacks differ in various aspects.
        </p>
        <p>In contrast to RDF, Topic Maps have been designed to be subject-centric, rather than resource-centric.
Accordingly, Topic Maps include a dedicated identification regime with subject locators and identifiers to
control how to address resources, physical objects and abstract concepts. In that they avoid any
discussion of what a URI actually means, or any need to resolve them on the network (the httpRange-14 issue).
One practical consequence thereof is that merging of maps is based not only on the equivalence of two
node IRIs within graphs, but also on whether these IRIs are used as locator or identifier. Map merging is
then more robust, as several such identifiers may exist for one and the same subject. The identification
regime also makes it unnecessary to resort to heavy-weight ontology-based mechanisms, such as
owl:equivalentClass or owl:sameAs.</p>
        <p>In terms of statements, Topic Maps offer not only single-valued properties (equivalent to RDF triples
with literals), but also multilateral associations involving more than two topics. Multilateral statements
not only avoid the use of blank nodes, something which adds to inferencing complexity; they also make
it possible to model N-ary relationships directly and therefore to put a topic into a relationship in that
particular context (relativistic modelling). All associations are symmetric in nature, avoiding the need to
keep multiple versions of a property only to later constrain explicitly in an ontology that one is the
inverse of the other.</p>
        <p>Any statement context can be further refined with the use of a scope (not mentioned earlier). While
somewhat underdefined, it limits in a standardized way the validity of a statement, a feature so useful
that many RDF programming frameworks offer it and that SPARQL mimics with the GRAPH concept.</p>
        <p>Topic Maps have no limitations on the use of class/instance relationships. One and the same topic can
be a class and an instance in the same map. While this may have theoretical implications in some
reasoning scenarios, it drastically simplifies the modelling of many (if not most) real-world scenarios
where sets of sets are needed.</p>
        <p>Like RDF, Topic Maps also allow statements to be reified. The difference is that in Topic Maps only
already asserted statements can be reified, staying consistent with a subject-centric approach.</p>
        <p>In terms of the standards stack, Topic Maps use a fundamentally different layout. Instead of
independently defining an ontology language (OWL) and a query language (SPARQL) directly on the model
(RDF/S), Topic Maps first position the query and access language (TMQL) on top of the model (TMDM),
committing hereby to a closed world assumption and a particular inferencing regime. The constraint
language (TMCL) is fully defined in terms of the query language; otherwise there is no ontology language for
Topic Maps. With this setup the Topic Maps standards architecture limits the range of possible ontology
languages, but it leads to a leaner overall model and a single point of definition for the semantics. That
has a direct impact on the formal semantics and on optimization techniques when querying maps with
known constraints.</p>
        <p>There are also significant differences between the query languages:
- SPARQL only uses a pattern matching approach to detect certain node constellations in the
underlying graph. TMQL offers that too, but additionally a path language to navigate to nearby corners of a
map. The path expression language is powerful enough that (almost) all queries can be expressed
with it. It can also be used in SELECT clauses to further postprocess information bound to variables.
- TMQL can return customized XML content directly to be used by the invoking application, not
just content according to one particular standardized schema. This enables optimizations within TMQL
and avoids situations where a SPARQL query returns many NULL results which are eventually ignored
by the application.
- As TMQL subscribes to the closed world assumption (CWA), it can offer a straightforward NOT
operator within the WHERE clause. Many applications using SPARQL resort to postprocessing the
results.</p>
        <p>The differences reflect that Topic Maps address rather controlled (and controllable) application
scenarios, whereas RDF is more targeted at the (open) Semantic Web.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2 Temporal Extensions</title>
        <p>
          There seem to be two schools of how to embed temporal information into an existing semantic network
model. The first, unobtrusive approach, taken by OWL Time [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], is to define a dedicated vocabulary
covering things like time events, durations and intervals, together with their relations (contained-in,
overlaps, and so forth). Given an appropriate data type for the representation of dates, process
information can be modeled as nodes labeled in this vocabulary.
        </p>
        <p>
          The intrusive method is to modify the model itself. In the RDF space this has been proposed by [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
While their background is to capture historical events and incremental changes, their line of argumentation
is valid in the environmental monitoring domain and holds equally well for Topic Maps as it does for RDF.
Specifically, the ability to reason over temporal aspects much more effectively than with a vocabulary
approach is relevant for applications on a larger scale.
        </p>
        <p>Extending the Topic Maps model by an intrinsic temporal component on associations (or in fact on
any statement), we follow this approach. The variation we introduce is to also store the time interval
(even together with a direction) to better reflect the nature of our data corpus.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7 Summary</title>
      <p>One of the driving factors for this work is to offer a consistent framework, conceptually and then in
terms of programming languages, to manipulate time series of observation data. This is relevant both
for ad-hoc virtual sensors and for cradle-to-the-grave long-term management.</p>
      <p>Preliminary work on integrating semantic technologies into time series processing had shown that
first not only concepts from the life sciences domain, but also those of the sensor web domain have to be
aligned. Only then does an integration of the involved programming languages seem feasible, something
which also suggests global optimization opportunities.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. R. Barta and T. Bleier. Semantically enabled SOS with Topic Maps. FOSS4G 2008, Free and Open Source Software for GeoSpatial Conference, 2008.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. M. Botts and A. Robin. Sensor Model Language (SensorML). Open Geospatial Consortium Inc., OGC 07-000, 2007.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. S. Cox. Observations and Measurements, Part 1 - Observation Schema. Open Geospatial Consortium Inc., OGC 07-022r1, 2007.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. L. M. Garshol. Living with Topic Maps and RDF. Technical report, 2003.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. L. M. Garshol. Q: A model for topic maps: Unifying RDF and Topic Maps. Extreme Markup Languages 2005.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. L. M. Garshol and R. Barta. TMQL, Topic Maps Query Language, working draft. ISO/IEC JTC1/SC34.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. L. M. Garshol and G. Moore. ISO 13250-2: Topic Maps - Data Model, 2008-06-03. http://www.isotopicmaps.org/sam/sam-model/.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. L. Heuer and R. Barta. AsTMa= 2.0 language definition, 2005. http://astma.it.bond.edu.au/astma=-spec2.0r1.0.dbk.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. J. R. Hobbs and F. Pan. Time Ontology in OWL. W3C working draft, 27 September 2006.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. T. Kauppinen, J. Väätäinen, and E. Hyvönen. Creating and using geospatial ontology time series in a semantic cultural heritage portal. ESWC 2008, 2008.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. A. Na and M. Priest. Sensor Observation Service. Open Geospatial Consortium Inc., OGC 06-009r6, 2007.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. O. Lassila and R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. Technical report, W3C, 1999.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. G. Percivall and C. Reed. OGC Sensor Web Enablement Standard. Sensors &amp; Transducers Journal, Vol. 71, issue 9, pp. 698-706, 2006.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>