Introduction

Towards a Type System for Semantic Streams?

0 Informatics Theory Group University of Bamberg

This paper proposes an approach to use constructive description logics as a typing system for streaming data in the domain of auditing. We introduce the constructive description logic cALC and show how it can serve as a semantic type system and knowledge representation formalism for data streams and give a direct interpretation of proofs as computations following the Curry-Howard-Isomorphism. Description Logics(DLs) [1] are a family of knowledge representation languages that can be used to represent the terminological knowledge of a speci c application domain in a structured and formally well-understood way. DLs have become important in the Semantic Web as a formal base for the W3C-endorsed Web Ontology Language (OWL). In some domains like auditing, complications with the standard arise and one has to deal with partial knowledge that is evolving and incomplete. Auditors need to understand the ow of information (streaming data) and their related control activities to be able to ensure the validity and reliability of business transactions. Classical DLs, however, are insu ciently expressive to deal with streaming information (auditing knowledge) which is highly evolving and incomplete. The entities building up the domain may not be determined and concrete but are abstractions of real individuals whose properties are evolving and de ned only up to construction. Classical DLs assume that a concept is static and at the outset either includes a given entity or not. However, in auditing either option may be inconsistent, if the entity or the concept is not fully de ned until a later stage of the auditing process where lower levels of detail become available. Because the semantical meaning is context-dependent and possibly involves many levels of explication, DLs for auditing must support a constructive notion of abstraction which permits that concepts and entities evolve. Audit statements about the validity of the streaming information, e.g., accounting data, absence of fraud or conformance to nancial process standards must constructively take account of many dimensions of abstraction and re nement [4]. Towards this goal we have developed a new constructive description logic [4], called cALC, which is meant to form the core of a family of semantic type systems for the static analysis of audit processes but also to support the automated generation of executable audit processes themselves. cALC is based on DLs but its semantics are re ned to a constructive notion of truth which is capturing the uncertainty aspects for dealing with data streams. The idea is that cALC would be extended to represent semantic ontologies for streaming data in the domain of auditing and operationalised in terms of a type-theoretic speci cation language for a data- ow execution model of audit streams. In the following we recall the de nition of cALC [4] and we suggest its application as a type system showing a direct interpretation of proofs as computations following the Curry-Howard-Isomorphism. This is inspired by the work of Bozatto et al. [2]. Concept descriptions in cALC are based on sets of role names NR and atomic concept names NC and formed as follows, where A 2 NC and R 2 NR: ? Research project funded by the German Research Foundation (DFG) under ME 1427/4-1.

Introduction

Syntax and Semantics of Constructive Description Logic cALC

C; D ! A j ? j :C j C u D j C t D j C v D j 9R:C j 8R:C: In contrast to standard ALC [ 1 ] this syntax includes subsumption v as a concept{forming operator. Being a rst class operator, subsumption can be nested arbitrarily as in ((D v C) v B) v A. Negation : can be represented as :C = C v ?. Concepts describe a \class" of objects or individuals, a role represents a binary relation on these classes.

Example 1. To explain what we have in mind let be a class of database tables over attributes NR. These tables are collections of records r = [R1 = v1; R2 = v2; : : : ; Rn = vn] where Ri 2 NR and each vi is either a basic value vi 2 D or a (reference, key) to another table. The domain D of basic data might comprise, e.g., Booleans B = fT; Fg, naturals N and strings S, i.e., D = N]B]S. The record r has type type(r) = fR1:D; R2:D; : : : ; Rn:Dg.

E.g. assume a record rw = [hasN ame = P etit Chablis; hasOrigin = Alsace; isV intage = 1999; awarded = F] describing a single wine which is of the type type(rw) = f hasN ame:S; hasOrigin:S; isV intage:N; awarded:B g. This record is an instance of the concept description W IN Er = 9hasN ame:S u 9hasOrigin:S u 9isV intage:N u 8awarded:B. Therefore, this concept description can be thought of as a weak type speci cation of rw. Note, that there is the possibility to represent null-values by use of 8 instead of 9. tu

Constructive interpretations I of concept descriptions extend the classical models for ALC by a pre{ordering I for expressing re nement between individuals and by a notion of fallible entities ?I for interpreting contradiction. As explained in [ 4 ], entities of the domain I are partial descriptions representing incomplete information about individuals. The relationship a b says that a \is an abstraction of " b, i.e., that it has \contains no more information" than b. Fallible elements b 2 ?I may be thought of as over{constrained tokens of information, self-contradictory objects of evidence or unde ned computations. E.g., they may be used to model the situation in which computing a role{ ller for an abstract individual a fails, i.e., 8b: R(a; b) ) b 2 ?I , yet when a is re ned to a0 then a non-fallible role- ller b0 2 I exists with R(a0; b0). In [ 4 ] several examples are given illustrating this.

The re nement relation accommodates a range of di erent applications [ 3 ], one important interpretation is the stream interpretation. In this way a database table t 2 I can be presented as a stream of records as follows. Let t be a stream of records. In this case t = t1 t2 : : : is a nite or in nite stream where is the su x ordering, which is the least relation closed under the rule

v 2 D v s

s where v s is the stream s 2 (D [ D1) pre xed by value v 2 D. For instance, 1 (2; T) T F

I (2; T) T F

I T F

I F

I ; is a stream of naturals, booleans and their pairings, where denotes the empty stream. Under this interpretation, concepts CI , which must be closed under express future projected behaviour of streams. The empty stream is a fallible entity and has no future behaviour, it represents a computational deadlock, i.e., ?I = f g. Fallible elements ?I correspond to divergent or deadlocked reactions which do not produce any value. Viewed as stream computations these produce which is universally polymorphic, i.e., naturally contained in any type. This corresponds to the fact that fallible entities in cALC are information-wise maximal elements and therefore included in every concept, i.e., ?I CI for all C.

De nition 1. A constructive interpretation of cALC is a structure I = ( I ; I ; ?I ; I ) consisting of a non-empty set I of entities, the universe of discourse; a re nement pre-ordering

I on I , i.e., a re exive and transitive relation; a subset ?I I of fallible entities, and an interpretation function I mapping each role name R 2 NR to a binary relation RI I I and each atomic concept A 2 NC to a set ?I AI I , which is closed under re nement. tu

The crucial point about the constructive semantics is that each entity implicitly subsumes all its re nements and truth is inherited. Speci cally, if entity x satis es a concept description C and x I y then this implies that y satis es C as well, for all concepts C. Robustness under re nement is a natural prerequisite of type systems. Suppose p is a program of type , where expresses some static properties of the computations performed by p. The statement that \p guarantees " is usually written p: . Now, if p v! p0 is a computation step in which p produces a response value v and then transforms to the continuation program p0, then soundness of typing requires that p0, too, satis es the type speci cation . This is known as the subject reduction property. Thus, if we wish to view cALC concepts C as type speci cations then it is natural to consider computation steps p v! p0 as a form of re nement I . The typing statement p:C then corresponds to the condition p 2 CI and subject reduction is for free. Deadlocked programs are represented by fallible entities and do not produce any value, i.e., p:? i p 6 !.

In other words, if streams are considered as abstract entities then cALC concepts can act as a typing system to specify the static semantics of streams of business data such as as linearised database tables or time{series of nancial market transactions. Note that in this model database tables are rst class values. In standard relational databases tables do not usually contain references to tables but merely references (keys) to entries in other tables. This is a rst-order model of data. Here we will permit that attribute values inside an abstract entity or concept may be described by its own database table. In this way we obtain a second-order semantics in which the data to be manipulated are tables (i.e., collections of records) rather than records of atomic data. This is in line with the idea of an hierarchical data model in which objects are composite and arising by abstraction of lower level parameters. 3

A Constructive Term Assignment System

With each concept C we associate a set of realisers or information terms IT(C). Information terms [ 2 ] can be thought of as constructive explanation of logical connectives, i.e. they are mathematical objects explaining the truth of a formula by providing a witness in the form of a construction or program. Latter can be obtained by use of a calculus as has been demonstrated in [ 3 ].

Example 2. To illustrate information terms, let us reconsider the Food-Wine knowledge-base by Brachman et.al. (1991) as reported by [ 2 ]. First a knowledge-base consists of the terminological knowledge and is called in short TBox. In our case this is given by the atomic concepts F OOD; W IN E; COLOR, roles isColorOf; goesW ith and the following axioms: = fAx 1; Ax 2g where

Ax 1 =df food v 9goesWith:color

Ax 2 =df color v 9isColorOf:wine: Intuitionally AX 1 speci es that for all individuals of the concept food exists a role{ ller over relation goesWith that belongs to the concept color, AX 2 can be explained similarly. Instances of TBox concept descriptions are speci ed as assertional knowledge and form the so-called ABox: For the Food-Wine ontology we give the following ABox, which satis es the axioms AX 1 and AX 2: f ish : F OOD meat : F OOD red : COLOR white : COLOR chardonnay : W IN E barolo : W IN E (red; barolo) : isColorOf (white; chardonnay) : isColorOf (f ish; white) : goesW ith (meat; red) : goesW ith:

The Curry{Howard{Isomorphism can be customised to realise any Hilbert-proof of AX 1 and AX 2 as a program construction. For instance, AX 1 can be read as a function transforming food-entities u into color-entities x such that goesWith(u; x), similarly AX 2 is a function from colors c to wines w so that isColorOf(c; w). tu

These realisers are then taken as extra ABox parameters so that instead of I j= x:C we declare what it means that I j= x:h iC, relative to a given interpretation I, for a particular realiser 2 IT(C). This so-called realisability predicate gives additional constructive semantics to our concepts in the sense that I j= x:h iC implies I j= x:C.

The sets IT(C) and re ned concepts h iC are de ned by induction on C. For our example we need the following information terms: { IT(A) =df 1 = f0g for atomic concepts; { IT(?) =df 1 = f g

0 ; { IT(C u D) =df IT(C) IT(D); { IT(C t D) =df IT(C) + IT(D); { IT(C v D) =df IT(C) ! IT(D); { IT(9R:C) =df ( I ! I ) ( I ! IT(C)); { IT(8R:C) =df I ! IT(C).

De nition 2 (Realisability). Realisability is such that { I j= x:h0iA i x 2 AI for A = ? or atomic; { I j= x:h ; i(C u D) i I j= x:h iC and I j= x:h iC; { I j= x:h i(C t D) i = 0 and I j= x:h iC or = 1 and I j= x:h iC; { I j= x:hf i(C v D) i 8y x: 8 2 IT(C): I j= y:h iC ) I j= y:hf iD; { I j= x:ha; i(9R:C) i 8y x: (y; a y) 2 RI & I j= a y:h (a y)iC; { I j= x:h i(8R:C) i 8y x: 8a 2 I : (y; a) 2 RI ) I j= a:h aiC. tu

Under the Curry-Howard Isomorphism (propositions-as-types) [ 5, 6 ] the Cartesian product C D is the constructive interpretation of conjunction C u D, function spaces C ! D are the constructive reading of subsumptions C v D and Disjunction CtD can be interpreted by disjoint union. The information term for 9R:C can be written equivalently as I ! ( I IT(C)) that can be interpreted as a function from I to a pair, which consists of a witness for the role- ller and the information term for C. 8R:C is interpreted as function assigning every x 2 I an information term for C.

Example 3. Information terms for the axioms AX 1, AX 2 of Ex. 2 arise either from proof terms [ 3 ] or they are created from a particular ABox. Based on the ABox of Ex. 2 the information terms ax 1, ax 2 can be chosen such that ax 1 =df ax 2 =df u: x:case u of [meat ! (red; 0) j fish ! (white; 0)] u: x:case u of [red ! (barolo; 0) j white ! (chardonnay; 0)] where ax 1 2 I ! IT(food v 9goesWith:color) and ax 2 2 I ! IT(color v 9isColorOf:wine). These express the constructive content of Ax 1, Ax 2 in I. ax 1 can be read as a function which returns a pair of color and 0 dependent on case analysis on input u to decide between meat and f ish, 0 is taken here as second component since it is the information term for the atomic concept color. tu Proposition 1 (Subject Reduction). Let I j= x:h iC and x y. Then, I j= y:h iC.

Proof. By induction on concepts C. For atomic concepts h0iA the statement follows by invariance under re nement. The cases h ; i(C u D), h i(C t D) are by induction. For hf i(C v D), ha; i(9R:C) and h i(8R:C) we exploit transitivity of re nement. tu Example 4. An information term for A t B is an element 2 IT(A t B) = 1 + 1 = 2 which is represented canonically as the set 2 = f0; 1g. If = 0 then I j= x:h i(A t B) i x 2 AI and if = 1 then I j= x:h i(AtB) i x 2 BI . Thus, the realiser gives explicit information about which side of the disjunction holds. If we abstract from we get standard validity: 9 : I j= x:h i(AtB) is the same as x 2 (A t B). tu Example 5. Assume that I contains the particular stable stream data = data0data1data2 : : : where each datai is a non-deterministic record with every possible data value as a val - ller, i.e., (datai; v) 2 val I for all i 0 and v 2 D = N ] B. Such a stream data may be viewed as the universal abstraction of all entities that have a val - ller in D. Think of data as a concept name for the time-invariant set D and val as a role name for the set-theoretic element relationship 2.

Under interpretation I let nat and bool be the atomic concepts of natural numbers and booleans, i.e., such that natI = N and boolI = B. Then data 2 9val :nat and data 2 9val :bool. A realiser of 9val :nat on data is just a stream of naturals (as val - llers). Similarly, a realiser of 9val :bool on data is an arbitrary stream of Booleans. Every stream of records with val - llers in D can be seen as a realiser of 9val :(bool t nat) on data while a realiser of 9val :bool t 9val :nat must select either a stream of Booleans or a stream of naturals.

Now, what is an information term realising the concept description 9val :nat v 9val :bool for entity data, i.e., f such that According to the de nition f is a function from realisers with I j= y:h i(9val :nat) to realisers f such that I j= y:hf i(9val :bool) for all y data. Since data is a constant stream every realiser for a re nement of data is a realiser for data and vice versa, i.e., I j= data:h iC i I j= y:h iC for y data. Thus, f maps realisers of 9val :nat (on data) to realisers f of 9val :bool. Given what has been said above, this means that f is a function from streams of naturals to streams of Booleans. tu

One can then show that every Hilbert proof `H C generates, for any interpretation I, a function f : I ! IT(C) such that 8u 2 I : I j= u:hf uiC. Speci cally, each of the following Hilbert axioms is realised by a -term, for instance:

IPL1 : C v (D v C) ;

u: x: y:x IPL3 : C v (D v (C u D)) ; IPL2 : ((C v (D v E)) v (C v D) v (C v E)) ; u: x: y: (x; y)

u: x: y: z: (xz)(yz) u: x: y:( 1y; x( 1y)( 2y)). The rule of MP and Nec are Axiom 9K is the function 9K =df re ned to

If h iC and h i(C v D) then h u:( u)( u)iD

If h iC then h u: x: xi(8R:C).

Example 6. In this way, using the above mentioned axioms Ax 1 and Ax 2, the derivation of the concept description food v 9goesWith:(COLOR u 9isColorOf:wine), up to reductions in the -calculus, corresponds to prf =

u: x:( 1(ax 1 x); ( 2(ax 1 x); ( 1(ax 2( 2(ax 1 x))); 2(ax 2( 2(ax 1 x)))))) which is an information term such that 8u: I j= u:hprf ui(food v 9goesWith:(color u 9isColorOf:wine)) assuming that 8u: I j= u:hax 1 uiAx 1 and 8u: I j= u:hax 2 uiAx 2. Such realisers ax 1, ax 2 can be obtained from a concrete ABox as has been shown in Ex. 3. Auditors need reliable tools to draw the right consequences, thus they have strong requirements on correctness of software (certi ed code) so that formal, mathematical methods are needed. Extensions of cALC can serve as ontological speci cation languages to characterise the semantics of nancial information ows as strongly typed data streams. Logic-based reasoning algorithms (type-checking, model-checking) will then be available to raise the trustworthiness of next-generation auditing tools. We aim to develop a component{based data ow programming environment in which cALC types can be used as type speci cation of stream processing functions and for the extraction of auditing processes from proof terms by use of a calculus.

Baader ,

Calvanese ,

D. L.

McGuinness ,

Nardi , and

P. F.

Patel-Schneider . The description logic handbook: theory, implementation, and applications . Cambridge University Press, 2003 .

Botazzo ,

Ferrari ,

Fiorentini , and

Fiorino . A constructive semantics for ALC . In Int'l Workshop on Description Logics (DL 2007 ), pages 219 { 226 , 2007 .

Mendler and

Scheele . Towards constructive description logics for abstraction and re nement . Technical Report 77 ( 2008 ), University of Bamberg, September 2008 .

Mendler and

Scheele . Towards constructive dl for abstraction and re nement . In Franz Baader, Carsten Lutz, and Boris Motik, editors, 21st International Workshop on Description Logics (DL2008) , Dresden, Germany, May 13-16, 2008 , volume 353 of CEUR Workshop Proceedings. CEURWS.org , 2008 .

A. S.

Troelstra and D. van Dalen. Constructivism in Mathematics , volume II. North-Holland, 1988 .

6. D. van Dalen. Intuitionistic logic . In D. Gabbay and F. Guenthner, editors, Handbook of Philosophical Logic , volume III, chapter 4 , pages 225 { 339 . Reidel , 1986 .