9th Workshop on
Knowledge Engineering
and Software Engineering (KESE)
at the
36th German Conference on Artificial Intelligence (KI2013)
September 17, 2013, Koblenz, Germany
Grzegorz J. Nalepa and Joachim Baumeister (Editors)
Technical Report No. 487, Würzburg University, Würzburg, Germany, 2013
The KESE Workshop Series is available online: http://kese.ia.agh.edu.pl
Technical Reports of the Würzburg University: http://www.informatik.uni-wuerzburg.de/forschung/technical reports
Preface
Grzegorz J. Nalepa and Joachim Baumeister
AGH University of Science and Technology
Kraków, Poland
gjn@agh.edu.pl
denkbares GmbH
Friedrich-Bergius-Ring 15, 97076 Würzburg, Germany
joachim.baumeister@denkbares.com
Research questions and practical exchange between Knowledge Engineering
for intelligent systems and Software Engineering of advanced software programs
have been fruitfully discussed in recent years. Many successful examples
demonstrate the clear symbiosis between these two research areas.
In 2005 the KESE workshop took place for the first time in Koblenz at the
28th German Conference on Artificial Intelligence (KI-2005). Now, in its ninth edition,
the KESE9 workshop returns to Koblenz, where it is co-located with the 36th
Annual German Conference on Artificial Intelligence (September 16-20, 2013).
This year we solicited contributions on the following topics:
– Knowledge and software engineering for the Semantic Web
– Ontologies in practical knowledge and software engineering
– Business Rules design, management and analysis
– Business Processes modeling in KE and SE
– Practical knowledge representation and discovery techniques in software engineering
– Agent-oriented software engineering
– Context and explanation in intelligent systems
– Knowledge base management in KE systems
– Evaluation and verification of KBS
– Practical tools for KBS engineering
– Process models in KE applications
– Software requirements and design for KBS applications
– Declarative, logic-based, including constraint programming approaches in SE
From its beginning, the workshop series has shown a healthy mixture of advanced
research papers pointing the direction for the coming years and practical papers
demonstrating the actual applicability of approaches in (industrial) projects
and concrete systems. This year five regular and two short papers were accepted
for the workshop. Moreover, one tool presentation was also included.
In their paper "Integrating Semantic Knowledge in Data Stream Processing"
the authors Beckstein et al. describe different approaches to integrating stream
data and semantic domain knowledge. In particular, as access methods, the
continuous query language CQL is compared with the SPARQL extension C-SPARQL.
Kramer et al. describe new possibilities for explanation generation. Their paper
"Towards Explanation Generation using Feature Models in Software Product
Lines" investigates how the approach can be applied in dynamic software product
lines (DSPL).
In the paper "A Prolog Framework for Integrating Business Rules into Java
Applications" the authors Ostermayer and Seipel show an approach to connecting
the data structures of the logic-based language Prolog with the widespread
programming language Java.
Baumeister et al. report in their paper "Continuous Knowledge Representa-
tions in Episodic and Collaborative Decision Making" on a new type of decision
support systems and demonstrate its application in an industrial case study for
managing the knowledge about chemical substances.
Pascalau introduces guidelines for designing and engineering advanced soft-
ware systems to be used by end-users. The paper "Identifying Guidelines for
Designing and Engineering Human-Centered Context-Aware Systems" proposes
a declarative level to hide the technical level of systems engineering from the
end-users.
Kluza et al. tackle business process modeling and give an overview of recom-
mendation possibilities. Their paper "Overview of Recommendation Techniques
in Business Process Modeling" describes a categorization of recommendation
approaches.
Newo and Althoff report in their paper "Knowledge Acquisition for Life
Counseling" on a concrete project that uses case-based techniques and infor-
mation extraction methods in the life counseling domain.
Kaczor et al. give a tool presentation and show in "HaDEsclipse - Integrated
Environment for Rules" an environment for engineering rule-based systems. The
tool is based on the well-established software tool Eclipse.
The organizers would like to thank all who contributed to the success of the
workshop. We thank all authors for submitting papers to the workshop, and we
thank the members of the program committee as well as the external reviewers
for reviewing and collaboratively discussing the submissions. For the submission
and reviewing process we used the EasyChair system, for which the organizers
would like to thank Andrei Voronkov, the developer of the system. Last but not
least, we would like to thank the organizers of the KI 2013 conference for hosting
the KESE9 workshop.
Grzegorz J. Nalepa
Joachim Baumeister
Workshop Organization
The 9th Workshop on Knowledge Engineering and Software Engineering
(KESE9)
was held as a one-day event at the
36th German Conference on Artificial Intelligence
(KI2013)
on September 17, 2013, in Koblenz, Germany
Workshop Chairs and Organizers
Joachim Baumeister, denkbares GmbH, Germany
Grzegorz J. Nalepa, AGH UST, Kraków, Poland
Programme Committee
Isabel María del Águila, University of Almeria, Spain
Klaus-Dieter Althoff, University Hildesheim, Germany
Kerstin Bach, Verdande Technology AS, Norway
Joachim Baumeister, denkbares GmbH/University Wuerzburg, Germany
Joaquín Cañadas, University of Almeria, Spain
Adrian Giurca, BTU Cottbus, Germany
Jason Jung, Yeungnam University, Korea
Rainer Knauf, TU Ilmenau, Germany
Mirjam Minor, Johann Wolfgang Goethe-Universität Frankfurt, Germany
Pascal Molli, University of Nantes - LINA, France
Grzegorz J. Nalepa, AGH UST, Kraków, Poland
José Palma, University of Murcia, Spain
Alvaro E. Prieto, University of Extremadura, Spain
Thomas Roth-Berghofer, University of West London, UK
José del Sagrado, University of Almeria, Spain
Dietmar Seipel, University Würzburg, Germany
Table of Contents
Integrating Semantic Knowledge in Data Stream Processing . . . . . . . . . . . . 1
Simon Beckstein, Ralf Bruns, Juergen Dunkel and Leonard Renners
Towards Explanation Generation using Feature Models in Software
Product Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Dean Kramer, Christian Sauer and Thomas Roth-Berghofer
Towards Continuous Knowledge Representations in Episodic and
Collaborative Decision Making . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Joachim Baumeister, Albrecht Striffler, Marc Brandt and Michael Neu-
mann
Identifying Guidelines for Designing and Engineering Human-Centered
Context-Aware Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Emilian Pascalau
Overview of Recommendation Techniques in Business Process Modeling . . 46
Krzysztof Kluza, Mateusz Baran, Szymon Bobek and Grzegorz J. Nalepa
A Prolog Framework for Integrating Business Rules into Java Applications 58
Ludwig Ostermayer and Dietmar Seipel
Knowledge Acquisition for Life Counseling . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Régis Newo and Klaus-Dieter Althoff
HaDEsclipse - Integrated Environment for Rules (Tool Presentation) . . . . 77
Krzysztof Kaczor, Grzegorz J. Nalepa and Krzysztof Kutt
Integrating Semantic Knowledge in
Data Stream Processing
Simon Beckstein, Ralf Bruns, Jürgen Dunkel, Leonard Renners
University of Applied Sciences and Arts Hannover, Germany
Email: forename.surname@hs-hannover.de
Abstract. Complex Event Processing (CEP) has been established as a
well-suited software technology for processing high-frequency data streams.
However, intelligent stream-based systems must integrate stream data
with semantic background knowledge. In this work, we investigate
different approaches to integrating stream data and semantic domain
knowledge. In particular, we discuss two different architectures from a
software engineering perspective: an approach adding an ontology access
mechanism to a common Continuous Query Language (CQL) is compared
with C-SPARQL, a streaming extension of the RDF query language SPARQL.
1 Introduction
Nowadays, much information is provided in the form of data streams: sensors,
software components and other sources continuously produce fine-grained data
that can be considered as streams of data. Examples of application fields exploiting
data streams are traffic management, smart buildings, health monitoring, and
financial trading. Intelligent decision support systems analyze stream data in
real-time to diagnose the actual state of a system, allowing adequate reactions
to critical situations.
In recent years, Complex Event Processing (CEP) [10] has been established as
a well-suited software technology for dealing with high-frequency data streams. In
CEP, each data item in a stream is considered as an event. CEP uses Continuous
Query Languages (CQL) to describe patterns in event streams, which define
meaningful situations in the application domain.
However, for understanding the business meaning of stream data, the data
items must be enriched with semantic background knowledge. For instance,
in traffic management, velocity measures must be related to specific knowledge
about the road network (e.g. road topology and speed limits). In contrast to
data streams, this background or domain knowledge is usually rather static and
stable, i.e. without frequent changes.
Ontologies defined by Description Logic (DL) [8] provide a well-known formalism
for knowledge representation that can also be used for describing background
knowledge. DL distinguishes two different aspects: (1) the TBox contains
terminological or domain concepts, and (2) the ABox defines assertional
knowledge, i.e. individuals of the concepts that are defined in the TBox. Common
languages for describing semantic knowledge are the Ontology Language OWL1
for the TBox and the Resource Description Framework (RDF) for the ABox.
SPARQL [11] provides a standard query language for retrieving knowledge
represented in the form of RDF data.
Note that SPARQL was originally developed to process static data and is
therefore not suitable for the processing of data streams. Conversely, conventional
CEP languages provide no inherent concepts for accessing ontological knowledge.
In this work, we will investigate different approaches to integrating data
stream processing with background knowledge bases. In particular, we will
discuss two different aspects from a software engineering perspective:
– How can CQL languages provided by standard CEP systems make use of
ontology models?
– How useful are recently proposed streaming extensions of SPARQL such as
C-SPARQL?
The remainder of the paper is organized as follows. The next section discusses
related work and other research approaches. Subsequently, section 3 briefly
introduces CEP. Then, section 4 discusses the different types of information that can
be exploited in stream-based systems. Sections 5 and 6 then describe
and compare two different approaches to integrating background knowledge into
stream processing: the first approach adds an ontology access mechanism to a
common CQL-based architecture; the second one uses C-SPARQL, a streaming
extension of SPARQL. The final section 7 provides some concluding remarks and
an outlook on further lines of research.
2 Related Work
In practice, nearly all stream processing systems use a proprietary Continuous
Query Language (CQL). At present, many mature implementations of
event processing engines already exist. Some well-known representatives are
ESPER2, JBoss Drools Fusion3 or Oracle CEP4. As already discussed, none of
these engines targets or supports a built-in way to integrate semantic
background knowledge.
Another class of approaches targets the integration of RDF ontologies with
stream processing. Different SPARQL enhancements have been developed in
order to query continuous RDF streams. Basically, they all extend SPARQL by
sliding windows for RDF stream processing:
– C-SPARQL provides an execution framework using existing data manage-
ment systems and triple stores. Rules distinguish a dynamic and a static part,
which are evaluated by a CQL and a SPARQL engine, respectively [5, 4].
1 http://www.w3.org/TR/2012/REC-owl2-primer-20121211/
2 http://esper.codehaus.org/
3 http://jboss.org/drools/drools-fusion.html
4 http://oracle.com/technetwork/middleware/complex-event-processing
– Streaming-SPARQL simply extends a SPARQL engine to support window
operators [6].
– EP-SPARQL is used with ETALIS, a Prolog based rule engine. The knowl-
edge (in form of RDF) is transformed into logic facts and the rules are
translated into Prolog rules [1, 2].
– CQELS introduces a so called white-box approach, providing native process-
ing of static data and streams by using window operators and a triple-based
data model [9].
Besides SPARQL extensions, various proprietary CEP languages have been
proposed for integrating stream processing and ontological knowledge: for
instance, Teymourian et al. present ideas on integrating background knowledge
into their existing rule language Prova5 (with a corresponding event processing
engine) [13, 14].
In summary, many proposals for SPARQL dialects or even new languages
have been published, but so far not many results of practical experiments have
been reported.
This paper examines two different approaches for integrating RDF and stream
data from a software engineering perspective. First, we extend the well-known
CQL of ESPER with mechanisms for accessing RDF ontologies. Then, this ap-
proach is compared with C-SPARQL, one of the SPARQL extensions that inte-
grates SPARQL queries and stream processing.
3 Complex Event Processing - Introduction
Complex Event Processing (CEP) is a software architectural approach for pro-
cessing continuous streams of high volumes of events in real-time [10]. Everything
that happens can be considered as an event. A corresponding event object car-
ries general metadata (event ID, timestamp) and event-specific information, e.g.
a sensor ID and some measured data. Note that single events have no special
meaning, but must be correlated with other events to derive some understanding
of what is happening in a system. CEP analyses continuous streams of incoming
events in order to identify the presence of complex sequences of events, so-called
event patterns.
A pattern match signifies a meaningful state of the environment and either
creates a new complex event or triggers an appropriate action.
Fundamental concepts of CEP are an event processing language (EPL), to
express event processing rules consisting of event patterns and actions, as well as
an event processing engine that continuously analyses event streams and executes
the matching rules. Complex event processing and event-driven systems generally
have the following basic characteristics:
5 https://prova.ws/
– Continuous in-memory processing: CEP is designed to handle a consecutive
input stream of events and in-memory processing enables real-time opera-
tions.
– Correlating Data: It enables the combination of different event types from
heterogeneous sources. Event processing rules transform fine-grained simple
events into complex (business) events that represent a significant meaning
for the application domain.
– Temporal Operators: Within event stream processing, timer functionalities as
well as sliding time windows can be used to define event patterns representing
temporal relationships.
4 Knowledge Base
In most application domains, different kinds of knowledge and information can be
distinguished. In the following, the different types of knowledge are introduced
by means of a smart building scenario:6 an energy management system that
uses simple sensors and exploits background knowledge about the building,
its environment and the sensor placement.
The main concepts used in the knowledge base are rooms and equipment,
such as doors and windows of the rooms. Rooms and equipment can be equipped
with sensors measuring the temperature, motion in a room, or the state of
a door or a window, respectively. By making use of this background information,
the raw sensor data can be enriched and interpreted in a meaningful manner. For
instance, room occupancies due to rescheduled lectures or ad-hoc meetings can
be identified for achieving a situation-aware energy management. In this sample
scenario, we can identify three types of knowledge classified according to their
different change frequencies:
1. Static knowledge: We define static knowledge as the knowledge about the
static characteristics of a domain that changes almost never or only very
infrequently. A typical example in our scenario is the structure of a building and
the sensor installation.
Static knowledge can be modeled by common knowledge representation
formalisms such as ontologies. Because this information usually does not change,
appropriate reasoners can derive implicit knowledge before the start of the
stream processing. OWL can serve as a suitable knowledge representation
language that is supported by various reasoners, for example KAON27 or
FaCT++8 .
2. Semi-dynamic knowledge: We consider semi-dynamic knowledge as the
knowledge about the expected dynamic behavior of a system. It can be rep-
resented by static knowledge models, e.g. ontologies, as well. In our scenario,
a class schedule predicts the dynamic behavior of the building: though the
6 More details about the smart building scenario can be found in [12].
7 http://kaon2.semanticweb.org/
8 http://owl.man.ac.uk/factplusplus/
class schedule can be defined by static data (e.g. facts in an ontology), it
causes dynamic events, e.g. a 'lecture start' event each Monday at 8:00. Of
course, real-time data produced by sensors can override the predicted
behavior, e.g. if a reserved classroom is not used.
3. High-dynamic knowledge: The third type of knowledge is caused by
unforeseeable incidents in the real world. It expresses the current state of the
real world and cannot be represented by a static ontology. Instead, the current
state has to be derived from the continuous stream of incoming data. This
type of knowledge can be described by an event model specifying the types
of valid events.9 Examples in our scenario are sensor events representing ob-
servations in the physical world, e.g. motion, temperature, or the state of a
window or door, respectively.
The three knowledge types introduced above provide only a basic classifica-
tion scheme. As already discussed in the introduction (section 1), various types
of information must be integrated and correlated in order to derive complex
events that provide insight into the current state of a system.
5 Using Semantic Knowledge in Event Processing
In this section, we will investigate how the different types of knowledge intro-
duced above can be integrated in stream processing – in particular, how onto-
logical knowledge can be exploited in stream processing.
We start our discussion with a small part of an ontology for our energy
management scenario (see Figure 1). This sample ontology is used in the follow-
ing paragraphs for discussing the different knowledge integration approaches.
The model defines the three concepts ’room’, ’sensor’ and ’equipment’ and their
relationships. It shows that each room can contain sensors and equipment. Fur-
thermore, it specifies that a certain sensor is either directly located in a certain
room or attached to a piece of equipment located in a room.
Note that the location of a sensor can be inferred from the location of the
equipment it is attached to. The dashed line describes this implicit property,
which can be expressed as a role composition in Description Logic:
isAttachedTo ∘ isEquippedIn ⊑ hasLocation.
A DL role composition can be considered as a rule: if a sensor is attached to a
piece of equipment and that equipment is equipped in a certain room, then the
sensor is assumed to be located in the same room.
Listing 1.1 defines two individuals (Window362 and an attached contact
sensor C362W) using the RDF Turtle notation10. Using the DL rule presented
above, it can be inferred that the contact sensor is located in room 362 and
the triple (:C362W :hasLocation :Room362) can be added to the knowledge
base.
In the same way, further role and concept characteristics of the ontology can
be used for reasoning purposes.
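To make the inference step above concrete, the following is a hedged Java sketch of how such a role composition could be materialised with the Apache Jena rule engine. This is not the mechanism used in the paper (which refers to DL reasoners such as KAON2 or FaCT++); the namespace, file name and rule name are assumptions for illustration only.

import java.util.List;

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.util.PrintUtil;

public class LocationInference {
    public static void main(String[] args) {
        String ns = "http://eda.inform.fh-hannover.de/sesame.owl#"; // assumed namespace
        PrintUtil.registerPrefix("eda", ns); // makes the prefix usable in the rule text

        // Forward rule corresponding to: isAttachedTo o isEquippedIn subsumed by hasLocation
        String rule = "[attachedLocation: (?s eda:isAttachedTo ?e) (?e eda:isEquippedIn ?r) "
                    + "-> (?s eda:hasLocation ?r)]";

        Model base = RDFDataMgr.loadModel("sesame.owl"); // the static background knowledge
        List<Rule> rules = Rule.parseRules(rule);
        InfModel inferred = ModelFactory.createInfModel(new GenericRuleReasoner(rules), base);

        // After inference the model contains e.g. (:C362W :hasLocation :Room362).
        System.out.println(inferred.contains(
                inferred.getResource(ns + "C362W"),
                inferred.getProperty(ns + "hasLocation"),
                inferred.getResource(ns + "Room362")));
    }
}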
9 Note that such an event model can also be formally defined by an OWL ontology.
10 http://www.w3.org/TR/turtle/
Fig. 1. OWL ontology relationship
:Window362
    rdf:type :Window ;
    :isEquippedIn :Room362 .

:C362W
    rdf:type :ContactSensor ;
    :isAttachedTo :Window362 .
Listing 1.1. Some sample entries of the domain knowledge
5.1 ESPER
As a first approach to integrating stream data and background knowledge, we
have chosen the established event processing engine ESPER. Since it is a regular
CQL-based engine, it does not natively support access to additional knowledge
bases. Figure 2 depicts the conceptual architecture of the approach. Different
event sources send streams of events via message channels to the ESPER CEP
engine. The event sources provide all events in a format that is processable
by ESPER, for instance simple Java objects (POJOs). The cycle within the
engine denotes that the events are processed in several stages. Each stage
transforms relatively simple incoming events into more complex and meaningful
events.
Knowledge Access: As already mentioned, ESPER does not inherently support
access to a knowledge base such as an OWL ontology, but it
provides a very general extension mechanism that allows invoking static Java
methods within an ESPER rule. Such methods can be used for querying a Java
domain model, a database or any other data source. To make our OWL domain
model accessible from ESPER rules, we implemented an adapter class that uses
the Jena Framework11 to query the ontology via SPARQL.
Fig. 2. Architecture using ESPER as CEP component
Events: Because ESPER can only process Java objects, the adapter has to map
RDF triples to Java objects. For instance, the mapping transforms an RDF-URI
identifying a sensor to an ID in the Java object. Each Java class corresponds
to a certain concept of the ontology TBox.
Queries: ESPER provides its own event processing language that is called ES-
PER Event Query Language (EQL). EQL extends SQL with temporal operators
and sliding windows. A simple example is given in Listing 1.2 that shows how
motion in a certain room is detected by an ESPER query.
SELECT room
FROM pattern [every mse = MotionSensorEvent],
     method:Adapter.getObject(mse.sensorID) AS room
Listing 1.2. A sample ESPER query
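For orientation, the following hedged Java sketch (not code from the paper) shows how a rule like the one in Listing 1.2 could be registered with the Esper client API and supplied with a listener. The MotionSensorEvent POJO (assumed to expose getSensorID()), the Adapter class and the room handling are assumptions for this example.

import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;

public class EngineSetup {
    public static void main(String[] args) {
        Configuration config = new Configuration();
        // Register the POJO event type so it can be referenced in rule patterns.
        config.addEventType("MotionSensorEvent", MotionSensorEvent.class);
        // Allow static methods of the adapter class to be called from rules.
        config.addImport(Adapter.class.getName());

        EPServiceProvider engine = EPServiceProviderManager.getDefaultProvider(config);

        // A rule similar to Listing 1.2.
        String eql = "select room from pattern [every mse=MotionSensorEvent], "
                   + "method:Adapter.getObject(mse.sensorID) as room";
        EPStatement stmt = engine.getEPAdministrator().createEPL(eql);

        // Listener invoked on every pattern match; here it simply prints the result.
        stmt.addListener((newEvents, oldEvents) ->
                System.out.println("Motion detected, room = " + newEvents[0].get("room")));

        // Event sources push their POJOs into the engine like this:
        engine.getEPRuntime().sendEvent(new MotionSensorEvent("C362M"));
    }
}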
Actions triggered by a pattern match are implemented in a listener class that
must be registered for an ESPER rule. A listener can call any event handling Java
method or create a new complex event. The example rule looks rather simple,
because the access to the knowledge base is hidden behind the method call
(here: Adapter.getObject(mse.sensorID)). In our case, the adapter executes
a SPARQL query using the Jena framework as shown in Listing 1.3.
11 http://jena.apache.org
PREFIX : <http://eda.inform.fh-hannover.de/sesame.owl>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?room ?object
WHERE { :" + sensorID + " :isAttachedTo ?object ;
                          :hasLocation ?room .
}
Listing 1.3. SPARQL query in the Jena-Adapter method Adapter.getObject(sensorID)
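A possible shape of such an adapter class, sketched here with the Apache Jena API, is shown below. The class and method names follow the rule in Listing 1.2, but the ontology file location, the namespace ending in '#', and the Room value object are assumptions of this sketch, not details given in the paper.

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

public class Adapter {

    private static final String NS = "http://eda.inform.fh-hannover.de/sesame.owl#";
    // The static background ontology is loaded once and kept in memory.
    private static final Model ONTOLOGY = RDFDataMgr.loadModel("sesame.owl");

    /** Trivial value object handed back to the ESPER rule. */
    public static class Room {
        private final String name;
        public Room(String name) { this.name = name; }
        public String getName() { return name; }
    }

    /** Resolves the room a sensor is located in, as used in Listing 1.2. */
    public static Room getObject(String sensorID) {
        String query =
            "PREFIX : <" + NS + "> " +
            "SELECT ?room ?object " +
            "WHERE { :" + sensorID + " :isAttachedTo ?object ; " +
            "                          :hasLocation ?room . }";
        try (QueryExecution exec = QueryExecutionFactory.create(query, ONTOLOGY)) {
            ResultSet results = exec.execSelect();
            if (results.hasNext()) {
                QuerySolution row = results.next();
                // Map the RDF resource URI back to a plain Java object for ESPER.
                return new Room(row.getResource("room").getLocalName());
            }
            return null;
        }
    }
}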
5.2 C-SPARQL
As an alternative approach, we investigate a software architecture using
C-SPARQL12, a streaming extension of SPARQL. Figure 3 illustrates the main
building blocks of the architecture. The main difference from the previous approach
is that all event sources produce a continuous stream of RDF data. This means
that the entire knowledge base of the system uses RDF as a uniform description
formalism.
Fig. 3. Architecture using C-SPARQL as CEP component
Knowledge access: In this approach, C-SPARQL queries are used for accessing
the homogeneous RDF knowledge base. A single C-SPARQL query can combine
incoming RDF streams with static background knowledge (also represented in
RDF).
12 We used the 'ReadyToGoPack', an experimental implementation of the concept in [4, 5], available on http://streamreasoning.org
Events: The events themselves arrive as continuous streams of RDF triples.
To allow stream processing with RDF triples, they must be extended with a
timestamp. Thus, each event can be described by a quadruple of the following
form:

(⟨subj_i, pred_i, obj_i⟩, t_i)

The subject is a unique event identifier; the predicate and object describe event
properties. The timestamp is added by the engine and describes the point in time
at which the event arrived. Listing 1.4 shows a set of RDF triples describing a
simplified temperature sensor event.
:event123 rdf:type :TemperatureSensorEvent
:event123 :hasSensorId :3432
:event123 :hasValue 24.7^^xsd:double
Listing 1.4. A sample temperature event
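Programmatically, such an event could be assembled as a small RDF model before being handed to the streaming engine. The following Jena sketch mirrors Listing 1.4; the namespace and property names are taken from the listing, while the C-SPARQL engine's own stream registration API is deliberately omitted here because it varies between versions.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class TemperatureEventFactory {

    private static final String NS = "http://eda.inform.fh-hannover.de/sesame.owl#"; // assumed

    /** Builds the three triples of Listing 1.4 as a small in-memory RDF model. */
    public static Model createEvent(String eventId, String sensorId, double value) {
        Model model = ModelFactory.createDefaultModel();
        Resource event = model.createResource(NS + eventId);
        event.addProperty(RDF.type, model.createResource(NS + "TemperatureSensorEvent"));
        event.addProperty(model.createProperty(NS, "hasSensorId"),
                          model.createResource(NS + sensorId));
        // Typed literal, serialised as "24.7"^^xsd:double.
        event.addLiteral(model.createProperty(NS, "hasValue"), value);
        return model;
    }

    public static void main(String[] args) {
        // The C-SPARQL engine attaches the timestamp when the triples enter the stream.
        createEvent("event123", "3432", 24.7).write(System.out, "TURTLE");
    }
}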
Queries: C-SPARQL queries are syntactically similar to SPARQL. Listing 1.5
shows a C-SPARQL query expressing the same pattern as the ESPER query in
Listing 1.2. In contrast to SPARQL, it provides language extensions for temporal
constructs such as (sliding) time and batch windows, as shown at the end of the
FROM STREAM clause.
SELECT ?room
FROM STREAM <http://eda.inform.fh-hannover.de/MotionSensorEvent.trdf> [RANGE 10s STEP 1s]
FROM <http://eda.inform.fh-hannover.de/sesame.owl>
WHERE {
  ?mEvent rdf:type :MotionSensorEvent ;
          :hasSensorID ?sid .
  ?sid :hasLocation ?room .
}
Listing 1.5. A sample C-SPARQL query
The FROM clause selects the various data streams that are processed in the query.
Each C-SPARQL query can either generate new triples that can be processed by
another query or call a (Java) listener class to trigger an action.
An interesting point to mention is that the C-SPARQL engine internally
transforms the query into a dynamic part dealing with the event stream
processing and a static part accessing the background knowledge. These parts are
each executed individually by a suitable engine or query processor. This behavior
is transparent to the user, as the entire rule is written in C-SPARQL and the
rule result contains the combined execution outcome.
6 Comparison
In this section, we will investigate the capabilities of the two approaches to
integrating stream processing and background knowledge introduced above. Based
on our practical experiences, we discuss the two architectures from a software
engineering perspective. Table 1 summarizes the results of the comparison. The
criteria are discussed in more detail in the following paragraphs.
Table 1. Comparison of CQL (ESPER) and C-SPARQL

Criterion                         ESPER  C-SPARQL
Maturity                            +       –
Event Pattern Expressiveness        +       o
Conceptual Coherence                –       +
Dynamic Rules                       o       +
Heterogeneous knowledge sources     o       –
Stream Reasoning Support            –       o
Maturity: ESPER is a widely used event processing engine, which has been under
development by an active open source community for many years and, consequently,
has reached a stable and market-ready state. It provides comprehensive
documentation and several guides, as well as tutorials. In contrast, C-SPARQL,
and the 'ReadyToGoPack' in particular, is a conceptual prototype. This means
that the implementation is not as mature and, furthermore, it is not as well
documented as ESPER. So far, there are no published experiences of real-world
projects using C-SPARQL.
Event Pattern Expressiveness: In line with its maturity, ESPER provides a
rich set of operators for specifying event patterns, e.g. for defining different types
of sliding windows or various event aggregation operators. The event algebra of
C-SPARQL is less expressive than ESPER's, but it nevertheless supports all
important features for general event processing tasks.
Conceptual Coherence: C-SPARQL allows the processing of stream data and
the integration of static background knowledge by using only one paradigm (or
language). Listing 1.5 shows a C-SPARQL query that combines event stream pro-
cessing and SPARQL queries. In this sense, a C-SPARQL query is self-contained
and coherent: only C-SPARQL skills are necessary for understanding it.
In contrast, ESPER does not support inherent access to knowledge bases.
Consequently, specialized Java/Jena code must be written to integrate back-
ground data. The ESPER-based architecture combines the ESPER query lan-
guage (EQL) for stream processing and Jena/SPARQL code implemented in a
Java adapter class to query knowledge bases. The ESPER rules are not self-
contained and delegate program logic to the adapter classes. Note that this can
also be viewed as an advantage: hiding a (perhaps) large part of the logic in method
calls results in simpler and more understandable rules.
Dynamic Rules: Changing a rule at runtime is difficult in ESPER, because
modifying an ESPER rule can cause a change of the EQL pattern and of the
SPARQL query in the listener class of the corresponding rule. In this case, the
code must be recompiled. C-SPARQL makes changes much easier, because only
the C-SPARQL query must be adjusted. Such queries are usually stored as
strings in a separate file, which can be reloaded at runtime – even for rules
including completely new queries against the knowledge base.
Heterogeneous knowledge sources: C-SPARQL is limited to ontological background
knowledge stored in RDF format. In contrast, ESPER can be extended
by arbitrary adapters, allowing the use of different knowledge sources. For
instance, besides RDF triple stores, relational databases or NoSQL data sources
can also be used. However, the access methods have to be implemented and
maintained by hand, as mentioned in the previous paragraph.
Stream Reasoning Support: Neither approach supports stream reasoning,
i.e. implicit knowledge is not automatically deduced when new events arrive.
Conventional reasoners can only deal with static data, not with high-frequency
RDF streams. But because (static) background knowledge changes infrequently,
a conventional reasoning step can be performed whenever a new fact appears in
the static knowledge base.
Considering the two approaches from a conceptual point of view, C-SPARQL
is better suited for inherent reasoning. For instance, SPARQL with RDFS
entailment can be achieved by using materialization or query rewriting [7]. These
approaches must be extended to stream processing. First discussions about this
issue can be found in [15] and [3].
7 Conclusion
In this paper, we have discussed two different architectural approaches to
integrating event stream processing and background knowledge.
The first architecture uses a CQL processing engine such as ESPER with
an adapter class that performs SPARQL queries on a knowledge base. In this
approach, stream processing and knowledge engineering are conceptually and
physically separated.
The second architecture is based on an extension of SPARQL to process
RDF data streams. C-SPARQL allows integrated rules that process stream data
and query RDF triple stores containing static background knowledge. Thus,
C-SPARQL provides a more homogeneous approach, where query logic, event
patterns and knowledge base access are combined in one rule, and is therefore
superior from a conceptual point of view.
On the other hand, CQL engines are well established in real-world projects and,
at this time, they offer higher maturity and better performance. Therefore,
CQL-based systems are (still) superior from a practical point of view.
Generally, the integration of semantic reasoning into stream processing is still
an open issue that is not fully supported by any approach yet. Stream reasoning
is therefore an important and promising research field, with several works in
progress, for example the approaches in [3].
Acknowledgment
This work was supported in part by the European Community (Europäischer
Fonds für regionale Entwicklung) under Research Grant EFRE Nr.W2-80115112.
References
[1] Anicic, D., Fodor, P., Rudolph, S., Stojanovic, N.: Ep-sparql: A unified language
for event processing and stream reasoning. In: Proceedings of the 20th Interna-
tional Conference on World Wide Web. pp. 635–644. ACM (2011)
[2] Anicic, D., Rudolph, S., Fodor, P., Stojanovic, N.: Stream reasoning and complex
event processing in etalis. Semantic Web pp. 397–407 (2012)
[3] Barbieri, D.F., Braga, D., Ceri, S., Della Valle, E., Grossniklaus, M.: Incremental
reasoning on streams and rich background knowledge. ESWC pp. 1–15 (2010)
[4] Barbieri, D.F., Braga, D., Ceri, S., Grossniklaus, M.: An execution environment
for c-sparql queries. In: Proceedings of the 13th International Conference on Ex-
tending Database Technology. pp. 441–452. EDBT (2010)
[5] Barbieri, D.F., Braga, D., Ceri, S., Valle, E.D., Grossniklaus, M.: Querying rdf
streams with c-sparql. SIGMOD Rec. pp. 20–26 (2010)
[6] Bolles, A., Grawunder, M., Jacobi, J.: Streaming sparql - extending sparql to
process data streams. In: The Semantic Web: Research and Applications, pp.
448–462 (2008)
[7] Glimm, B.: Using sparql with rdfs and owl entailment. In: Reasoning Web, pp.
137–201. Lecture Notes in Computer Science, Springer Berlin Heidelberg (2011)
[8] Krötzsch, M., Simancik, F., Horrocks, I.: A description logic primer. CoRR (2012)
[9] Le-Phuoc, D., Dao-Tran, M., Xavier Parreira, J., Hauswirth, M.: A native and
adaptive approach for unified processing of linked streams and linked data. In:
The Semantic Web – ISWC 2011, pp. 370–388 (2011)
[10] Luckham, D.C.: The Power of Events: An Introduction to Complex Event Pro-
cessing in Distributed Enterprise Systems. Addison-Wesley (2002)
[11] Prud’hommeaux, E., Seaborne, A.: Sparql query language for rdf,
http://www.w3.org/TR/rdf-sparql-query/
[12] Renners, L., Bruns, R., Dunkel, J.: Situation-aware energy control by combining
simple sensors and complex event processing. In: Workshop on AI Problems and
Approaches for Intelligent Environments. pp. 29–34 (2012)
[13] Teymourian, K., Paschke, A.: Enabling knowledge-based complex event process-
ing. In: Proceedings of the 2010 EDBT/ICDT Workshops. pp. 37:1–37:7. ACM
(2010)
[14] Teymourian, K., Rohde, M., Paschke, A.: Fusion of background knowledge and
streams of events. In: Proceedings of the 6th ACM International Conference on
Distributed Event-Based Systems. pp. 302–313. ACM (2012)
[15] Volz, R., Staab, S., Motik, B.: Incrementally maintaining materializations of on-
tologies stored in logic databases. In: Journal on Data Semantics II, pp. 1–34.
Lecture Notes in Computer Science (2005)
Towards Explanation Generation using Feature
Models in Software Product Lines
Dean Kramer, Christian Sauer, and Thomas Roth-Berghofer
School of Computing and Technology, University of West London,
St Mary’s Road, London W5 5RF, United Kingdom
{first.lastname}@uwl.ac.uk
Abstract. Dynamic Software Product Line (DSPL) Engineering has
gained interest through its promise of unifying software adaptation,
whereby software can be configured at compile time and at runtime.
Just like conventional adaptive software, such dynamism can confuse
the user and lower user trust. Variability knowledge expressed in a
feature model, however, may not be understandable to the end user.
Explanations have been shown to improve the intelligibility of software
and to improve user trust. In this work, we consider how explanations
can be used in DSPLs by adding explanatory knowledge to feature models
that can be used to generate explanations at runtime.
Keywords: Explanation Generation, Dynamic Software Product Lines,
Feature Models
1 Introduction
Smartphones have proliferated in recent years, allowing more users
to stay productive while away from the desktop. It has become common for
these devices to have an array of sensors including GPS, accelerometers, digital
compass, proximity sensors, sound etc. Using these sensors with other equipment
already found in phones, a wide set of contextual information can be acquired.
This contextual information can be used in Context-Aware Self Adaptive
(CASA) software. This software can monitor different contextual parameters
and dynamically adapt at runtime to satisfy the user’s current needs [8]. These
behavioural variations share similarities with features in Software
Product Lines (SPL), where product commonality and variability are handled,
providing higher asset reuse. Within SPLs, Feature Oriented Software Devel-
opment (FOSD) has emerged as a method for modularising the features of a
system [3]. The one fundamental difference between these two concepts is that
while SPLs conventionally manage static variability which is handled at compile
time, adaptive software requires dynamic variability to be handled at runtime.
Dynamic Software Product Lines (DSPL) enable the SPL to be reconfigured
at runtime [9]. By using DSPLs, variability can be static, adapted at
compile time, or dynamic, adapted at runtime. This allows for greater reuse,
as variability can be implemented for both static and dynamic adaptation, as dif-
ferent products may require the adaptation to be applied at different times [14].
Feature modelling has become the de facto method of variability representation
in software product lines. In feature models, the adaptation of the
product, be it static or dynamic, is modelled, enabling a wide variety of products
and product behaviours. While feature modelling is of great use in development,
the dynamics within feature modelling can be confusing to end-users. To amend
the seemingly unpredictable and thus confusing nature of the behaviour of a
dynamic system and the results it produces, it is desirable to enable the system
to explain its behaviour as well as its results to the end-user. As we will detail
further on in this paper, explanations are very useful to justify the results a
system produces and thus help to rebuild the trust an end-user has in the
system's behaviour and results. Explanations are therefore useful to the end-user
as they can counter the mentioned non-transparency of DSPL end-products and
their dynamic behaviours.
In our previous work [18] on enabling a system we developed to generate
explanations, we investigated the integration of explanations into a context
acquisition engine used for developing context-aware applications. We did
this with regard to mobile applications, where one has to adhere to many
constraints. We developed a ContextEngine to deal more easily with such
limitations and situation-specific information across applications [12], thus easing
the creation of context-aware mobile systems. We noticed that with the increased
adaptability and dynamics of context-aware applications came an increase in the
complexity of the application, which in turn made it harder to understand the
behaviour of such applications. In our initial research on this topic we then
described how we enhanced the ContextEngine platform with explanation
capabilities. As we describe in this paper, and as has been shown in a variety of
other work on explanations, explaining can be seen as a complex reasoning task
of its own. In our initial work we focused on the use of canned explanations.
Canned explanations are information artefacts, pre-formulated by the software
engineer, that serve as explanatory artefacts stored in the system and delivered
to the user on demand. We integrated storage facilities for such information
artefacts, or explanatory knowledge artefacts, within the code structure of the
ContextEngine and were thus able to provide these stored canned explanations
on demand to a software engineer working with the ContextEngine. After these
early steps and this relatively simple approach, and based also on a further study
of explanation provision in the feature model domain and especially in the
automated analysis of feature models (AAFM), we decided to elaborate on our
initial work.
The rest of the paper is structured as follows: We introduce the feature
modelling background of our work in the following section and, based on the
technological possibilities described there, motivate our approach of using an
extended feature model for explanation generation in Section 3. We then interlink
our approach with related work on feature modelling, explanation generation
and the use of explanations itself in the following section. We then introduce our
approach to explanation generation from explanatory knowledge stored in an
extended feature model and demonstrate our concept of explanation generation
in Section 5. After discussing the advantages and possible disadvantages of our
approach in Section 6, a summary and outlook on future aspects of our work
concludes the paper.
2 Feature Models
The purpose of a feature model is to represent all possible products from a SPL
in terms of features, and the relationships between them. An example feature
model for a content store application is shown in Figure 1.
Fig. 1. Feature Model of a content store
A feature of a system has been defined in a number of ways [2]. For this paper, we use the
definition by Kang et al. [10] in that a feature is “a prominent or distinctive
user-visible aspect, quality, or characteristic of a software system or systems”.
Feature models are modelled using hierarchical trees of features, with each node
representing commonality and variability of its parent node. Relationships be-
tween each feature can be modelled using:
– Tree feature relationships between parent (compound) features and their
child features (subfeatures).
– Cross-tree constraints, which typically apply feature inclusion or exclusion
statements, normally using propositional formulas. An example of this is:
“if feature ABC is included, then feature XYZ must also be included.”
Within feature models, different feature relationships can be applied includ-
ing:
– Mandatory. A child feature is defined as mandatory in all products where
its parent is also contained.
– Optional. A child feature is defined as optional when it can optionally be
included or excluded whenever its parent is contained in a product.
– Or. A set of child features exhibit an or-relationship when one or more
children are selected along with the parent of that set.
– Alternative (XOR). A set of child features exhibit an xor-relationship
when only a single child can be selected when the parent is included.
Feature models have been applied not only to modelling system features, but
also to modelling contexts [1]. As DSPLs can be driven by context, modelling both contexts
and the features that they affect in feature models allows for a single modelling
language. The feature models introduced above represent what is known as ba-
sic feature models. There have been different additions to feature modelling,
including cardinality feature models [7], and extended feature models [6].
Extended feature models extend basic feature models by the ability to attach
additional information about features to the model. This additional informa-
tion is included by the use of feature attributes, which are attached to features
within the feature model. Feature attributes normally consist of a name, do-
main, and value. Feature attributes have been used in previous work for specify-
ing extra-functional information [4]. We also intend to use feature attributes in
our approach: we will employ additional feature attributes to store explanatory
knowledge artefacts; see Section 5.1 for details.
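To make the idea concrete, the following Java sketch (our illustration, not an implementation from the paper; all class and method names are assumptions) shows how a feature node of an extended feature model could carry such attributes alongside its tree structure:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** A node of an extended feature model carrying feature attributes. */
public class Feature {

    /** Relationship of this feature to its parent, as introduced above. */
    public enum Relationship { MANDATORY, OPTIONAL, OR, ALTERNATIVE }

    /** A feature attribute consisting of a name, a domain and a value. */
    public record Attribute(String name, String domain, String value) { }

    public final String name;
    public final Relationship relationship;
    public final Feature parent;
    public final List<Feature> children = new ArrayList<>();
    public final Map<String, Attribute> attributes = new LinkedHashMap<>();

    public Feature(String name, Relationship relationship, Feature parent) {
        this.name = name;
        this.relationship = relationship;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    /** Attaches an explanatory knowledge artefact as a feature attribute. */
    public Feature explain(String domain, String value) {
        attributes.put(domain, new Attribute("explanation", domain, value));
        return this;
    }
}

The explain(...) helper is where the explanatory knowledge artefacts discussed in Section 5.1 would be attached.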
3 Motivation of our work
The GUI of an application or, even more intriguing, the behaviour of an
application generated by the use of a SPL can be rather dynamic. This dynamic
behaviour can be confusing, if not daunting, to the end-user of the application.
The end-user might not be aware of why the GUI has adapted and of the factors
influencing how it changes. Furthermore, the dynamic behaviour of the
application, producing different results while receiving identical inputs just under
different circumstances (for example a network being available or not), is a
challenge to the trust the user develops towards the application's results. As the
feature model, being the component responsible for the dynamic behaviour of
the application, is a black box system to the end-user, the need for explanations
of this black box system's behaviour arises.
The benefits of being able to explain the behaviour of an application, and
subsequently its GUI, are plentiful. According to [17] there are a number of
benefits explanations can provide to the end-user. The main benefits of interest
with regard to the problem at hand are the generation of trust in the results the
application generates and the justification of and guidance on the changes in the
GUI of the application.
As [6] has shown, applying abductive reasoning to generate minimal explanations
from the logical representation of a feature model within an AAFM can be a
complicated process. To circumvent this effort, our approach aims at integrating
canned 'micro' or 'atomic' explanations within the logical representation of the
feature model. By doing so we aim to re-use the feature model itself, in the same
way it is used in product configuration, to 'configure' or synthesise more complex
explanations of the actual product. These complex explanations are built from
the 'atomic' building blocks given by the canned 'micro' explanations embedded
in the feature descriptions themselves, as well as in the representation of the
relationships between these features in the feature model's logical representation.
3.1 Scenario Application
To illustrate our motivation, consider a DSPL example of a content store ap-
plication for a mobile device. This application may provide different content for
the user including applications, movies, music etc. Different content is organised
into different categories. A simplified feature model of the DSPL can be seen in
Figure 1. This application provides content for different age groups, and also the
application can be tailored to suit these different groups.
In the feature model, we can see that the features Payment, ContentTypes,
History, and Retrieval are required in every configuration of this DSPL. The
Payment feature handles all payment transactions when content is bought or
rented. The ContentTypes feature contains the different components for brows-
ing, and buying different types of content. Because different regions in the world
may require different content distribution licenses, it may not be possible to
sell content in every region, so depending on the location of the user, different
content type features including Video, Music, and Applications will be bound
or unbound. The History feature contains all bought content, which can be
retrieved via the Retrieval feature. There are two primary methods by which
content can be retrieved: downloading or streaming. Certain content, including
video, may be downloaded or streamed. Depending on how much storage is
available on the device, it may not be possible to download a movie, so only the
Streaming feature is bound. In Figure 2, we can see the variability of the screen
according to the content type features. If we consider the Video feature, there is
a button that takes the user to a set of screens for video content, and also a
container of widgets advertising popular movies.
4 Related Work
Explanations and feature models have been combined before, but mostly to aid
the analysis process and error analysis of feature models [20], as well as in
automated feature model analysis in general, as Benavides et al. describe in [5].
As we already mentioned, there are a number of goals that can be reached by
providing explanations to the user of a, then, explanation-aware system. An
explanation-aware system is a system that is able to provide explanations of the
results it produces as well as of the means it employs to produce these results [11,13].
Fig. 2. Variability of the main screen
The goals pursued by enabling a system to provide explanations [19] are the
following: increasing the transparency of a system's reasoning process in order to
increase the user's trust in the system; justifying the results the system produces,
i.e. explaining the quality and applicability of the results to the end-user;
providing relevance explanations of either questions asked by the system or
information provided by the system; and conceptualisation, i.e. explanations of
the concepts the system works on, which directly aids the last goal of providing
explanations: learning. By explaining the concepts the system works on to the
end-user, the end-user is enabled to learn about the domain in which the system works.
To provide explanations, our approach needs, in addition to the knowledge used
by the feature model system, additional explanatory knowledge to create the
explanations we want the system to be able to provide to the end-user. It is always
necessary to provide explanatory knowledge in any system that is intended to
provide explanations of its reasoning [16]. This additional explanatory knowledge
is provided to and used by the explainer component of an explanation-aware
system, enabling it to provide explanations of its reasoning and results. The need
for additional explanatory knowledge is also shown in [20], as the abduction
process described there also relies on additional explanatory knowledge.
In our approach, the explanatory knowledge needed to generate explanations
will be broken down into 'atomic' canned explanatory knowledge artefacts that
are paired with each feature of the feature model, as well as additional 'atomic'
canned explanatory knowledge artefacts that are attached to the relationship
descriptors within our feature model. The aim of this 'enrichment' of the feature
model with 'micro' or 'atomic' explanatory knowledge artefacts is to reuse the
artefacts 'bound' to the features and their relationship descriptors in the final
product to synthesise complex explanations based on the available 'bound'
atomic explanatory knowledge artefacts. We focus our approach especially on
the issue of explaining a dynamic GUI to the end-user. As, for example, [15]
describes the problems that can result from a dynamic and automatically
assembled GUI, we aim to amend these problems by providing the end-user of
an application with insight into the GUI's changes by explaining them to her.
5 Our Approach
In our approach, we attempt to enable explanations in DSPLs. By adding
explanations to DSPLs, we see two benefits. Firstly, explanations have been shown
in other work to improve user understanding of a system, which can be applied
to DSPL systems [13]. Secondly, because a SPL enables many products to
be produced using reusable common assets, we can easily produce many
explanation-aware applications, because we can leverage the reuse properties of
the SPL for the explanations. The first part of our approach concerns the
modelling of the system.
5.1 Modelling
Just as the rest of the system is modelled using feature models, so too are the
explanations. To add explanations to the feature model, we use extended feature
models. As introduced earlier in the paper, with extended feature models, extra
explanatory information can be attached to features using feature attributes.
These feature attributes can be used for storing the specific explanation snippets
for each feature. Each feature attribute has a name, a domain, and a value.
The domain of the attribute holds the type of explanation, while the value holds
the explanation fragment.
Mapping explanatory knowledge fragments to features alone is not enough; we
also need to map explanatory knowledge fragments to feature model relationships
(a small lookup sketch follows the list below). Examples of explanatory knowledge
fragments mapped to relationships include:
– Mandatory - “in all configurations”.
– Optional - “which is optional”.
– Or - “which can be”.
– Alternative - “which is either”.
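Reusing the Relationship enum from the Feature sketch in Section 2, such relationship fragments can be kept in a simple lookup table. This is a sketch under our own naming assumptions; the fragment strings are the ones listed above.

import java.util.Map;

/** Canned explanation fragments attached to feature model relationships. */
public final class RelationshipExplanations {

    private static final Map<Feature.Relationship, String> FRAGMENTS = Map.of(
            Feature.Relationship.MANDATORY,   "in all configurations",
            Feature.Relationship.OPTIONAL,    "which is optional",
            Feature.Relationship.OR,          "which can be",
            Feature.Relationship.ALTERNATIVE, "which is either");

    public static String fragmentFor(Feature.Relationship relationship) {
        return FRAGMENTS.get(relationship);
    }

    private RelationshipExplanations() { }
}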
5.2 Composing Explanations
Once we have the explanations added to the feature model, we can compose com-
plex explanations. These complex explanations are made up of the concatenation
of explanations added to the features and the relationship explanations.
Fig. 3. Composing Explanation of why streaming is active
Let us take the example of a configuration where content is streamed
to the user instead of downloaded, as shown in Figure 3. If the user wants to
know why streaming is active, to generate an explanation we first follow the
tree from the streaming feature to the root. We then get the conceptual expla-
nation of the root, in this case “The content store”. Next we add the conceptual
explanation for the “History” feature, in this case “Stores historical content pur-
chases”, and because it is a mandatory feature, we add “in all configurations”.
Following this, we add the conceptual explanation for the “Retrieval” feature, in
this case “of which can be retrieved”, and add “in all configurations” because the
feature is mandatory. Then because the sub features of “Retrieval” are alterna-
tives, we add “either”, and the two conceptual explanations joined with an “or”.
Lastly, because in the configuration, the “Stream” feature is active, we add “in
this case, streamed”. We therefore can ‘reuse’ the structural information encoded
in the feature model representation directly for the composition of complex ex-
planations by simply ‘re-tracing’ the explanatory knowledge artefacts, stored in
feature and relationship nodes, along a path in the feature model.
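A hedged sketch of this composition step, building on the Feature and RelationshipExplanations sketches above, is shown below. The traversal order and the 'conceptual' attribute name reflect our reading of the worked example and are assumptions, not code from the paper.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.StringJoiner;
import java.util.stream.Collectors;

/** Composes a "why is this feature active?" explanation along a feature model path. */
public class ExplanationComposer {

    /** Attribute domain under which the conceptual explanation fragment is stored. */
    private static final String CONCEPTUAL = "conceptual";

    public String whyActive(Feature target) {
        // 1. Follow the tree from the target feature up to the root ...
        Deque<Feature> path = new ArrayDeque<>();
        for (Feature f = target; f != null; f = f.parent) {
            path.push(f);
        }
        // 2. ... then walk back down, concatenating the explanatory fragments.
        StringJoiner explanation = new StringJoiner(" ");
        for (Feature feature : path) {
            if (feature.relationship == Feature.Relationship.ALTERNATIVE) {
                // Alternative group: list the siblings' conceptual explanations joined by "or"
                // and state which alternative is bound in the current configuration.
                explanation.add(RelationshipExplanations.fragmentFor(feature.relationship));
                explanation.add(feature.parent.children.stream()
                        .map(this::conceptual)
                        .collect(Collectors.joining(" or ")));
                explanation.add("in this case, " + conceptual(feature));
            } else {
                explanation.add(conceptual(feature));
                if (feature.parent != null) {
                    explanation.add(RelationshipExplanations.fragmentFor(feature.relationship));
                }
            }
        }
        return explanation.toString();
    }

    private String conceptual(Feature feature) {
        Feature.Attribute attribute = feature.attributes.get(CONCEPTUAL);
        return attribute != null ? attribute.value() : feature.name;
    }
}

Applied to the configuration of Figure 3, the walk from the root down to the Stream feature would yield a sentence close to the one composed step by step in the text above.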
6 Discussion
The implementation effort is expected to be minor, given the fact that our ap-
proach just adds three additional feature attributes to the feature and relation-
ship descriptions. The intention of 'piggybacking' on the inherent logical structure
encoded in the feature model graph to also derive complex explanations from it
still has to be tested, both for the actual quality of the generated explanations
and for its scalability. With regard to scalability, we expect our approach to be
limited. Once a feature model exceeds a certain complexity, the coherent
concatenation of explanatory knowledge artefacts stored in the feature and
relation nodes along a path in such a model will fail or become too much of a
computational effort. However, we assume that for small to medium scale feature
models our relatively 'practical' approach of concatenating explanatory knowledge
artefacts stored in the model's nodes is relatively efficient compared to existing,
more complex approaches to explanation generation, as explained, for example,
in [6].
7 Summary and Outlook
In this paper we presented how the variability of SPL based products and their
behaviours could be explained to their end-users using explanations composed
from explanatory knowledge added to enhanced feature model nodes. Based on
the feature modelling background of our work we motivated our approach to use
an extended feature model for swift explanation generation. We reviewed and
compared our approach with related work on feature modelling, explanation
generation and the use of explanations itself, especially inspecting approaches
employing relatively complex abductive reasoning. We then introduced our ap-
proach to explanation generation from explanatory knowledge stored in extended
feature model nodes explaining the features themselves and their relationships. By
tracing an example graph in a feature model we showed the working of our
approach. Given that we are at an early stage of examining our approach, we
then discussed its possible limitations as well as its possible advantages, namely
that our approach promises to be easily implemented based on existing work on
enhanced feature models and is very easy to use for explanation composition in
small to medium sized feature models, compared to more complex approaches
like abductive reasoning.
As we cannot yet predict the effectiveness of our seemingly ‘pragmatic’ ap-
proach, we have to implement the enhanced node generation in our existing
enhanced feature model and then perform a series of experiments on this en-
hanced feature model. The aim of these experiments will be to establish our
approach's boundaries with regard to parameters such as the quality and usability
of generated explanations as well as the scalability of our approach. We also
have to measure the computational effort necessary for explanation composition
and then measure it against the gain in usability to establish the worthiness of
further researching our approach of reusing extended feature model structures
for explanation generation from explanatory knowledge artefacts stored in the
nodes of the feature model.
An additional aspect of reusing an enhanced feature model’s structure for explanation
generation that has not yet been investigated is the reuse of propositional logic formulae
derived from the feature model. We plan to investigate the possibilities of this reuse in
our immediate follow-up research.
References
1. Acher, M., Collet, P., Fleurey, F., Lahire, P., Moisan, S., Rigault, J.P.: Modeling
Context and Dynamic Adaptations with Feature Models. In: 4th International
Workshop Models@run.time at Models 2009 (MRT’09). p. 10 (Oct 2009)
2. Apel, S., Kästner, C.: An overview of feature-oriented software development. Jour-
nal of Object Technology 8(5), 49–84 (2009)
3. Batory, D., Sarvela, J., Rauschmayer, A.: Scaling step-wise refinement. IEEE
Trans. Softw. Eng. 30, 355–371 (June 2004)
4. Benavides, D., Segura, S., Ruiz-Cortés, A.: Automated analysis of feature models
20 years later: A literature review. Inf. Syst. 35(6), 615–636 (Sep 2010)
5. Benavides, D., Segura, S., Ruiz-Cortés, A.: Automated analysis of feature models
20 years later: A literature review. Information Systems 35(6), 615–636 (2010)
6. Benavides, D., Trinidad, P., Ruiz-Cortés, A.: Automated reasoning on feature mod-
els. In: Proceedings of the 17th international conference on Advanced Information
Systems Engineering. pp. 491–503. CAiSE’05, Springer-Verlag, Berlin, Heidelberg
(2005)
7. Czarnecki, K., Helsen, S., Ulrich, E.: Formalizing cardinality-based feature models
and their specialization. Software Process: Improvement and Practice 10, 7 – 29
(01/2005 2005)
8. Daniele, L.M., Silva, E., Pires, L.F., Sinderen, M.: A soa-based platform-specific
framework for context-aware mobile applications. In: Aalst, W., Mylopoulos, J.,
Rosemann, M., Shaw, M.J., Szyperski, C., Poler, R., Sinderen, M., Sanchis, R.
(eds.) Enterprise Interoperability, Lecture Notes in Business Information Process-
ing, vol. 38, pp. 25–37. Springer Berlin Heidelberg (2009)
9. Hallsteinsen, S., Hinchey, M., Park, S., Schmid, K.: Dynamic software product
lines. Computer 41, 93–95 (April 2008)
10. Kang, K., Cohen, S., Hess, J., Nowak, W., Peterson, S.: Feature-Oriented Domain
Analysis (FODA) Feasibility Study. Technical Report CMU/SEI-90-TR-21 (1990)
11. Kofod-Petersen, A., Cassens, J.: Explanations and context in ambient intelligent
systems. In: Modeling and Using Context, pp. 303–316. Springer (2007)
12. Kramer, D., Kocurova, A., Oussena, S., Clark, T., Komisarczuk, P.: An extensi-
ble, self contained, layered approach to context acquisition. In: Proceedings of the
Third International Workshop on Middleware for Pervasive Mobile and Embed-
ded Computing. pp. 6:1–6:7. M-MPAC ’11, ACM, New York, NY, USA (2011),
http://doi.acm.org/10.1145/2090316.2090322
13. Lim, B.Y., Dey, A.K., Avrahami, D.: Why and why not explanations improve the
intelligibility of context-aware intelligent systems. In: Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems. pp. 2119–2128. ACM (2009)
14. Parra, C.: Towards Dynamic Software Product Lines: Unifying Design and Runtime
Adaptations. Ph.D. thesis, INRIA Lille Nord Europe Laboratory (March 2011)
15. Pleuss, A., Hauptmann, B., Dhungana, D., Botterweck, G.: User interface engi-
neering for software product lines: the dilemma between automation and usability.
In: Proceedings of the 4th ACM SIGCHI symposium on Engineering interactive
computing systems. pp. 25–34. ACM (2012)
16. Roth-Berghofer, T.R.: Explanations and case-based reasoning: Foundational issues.
In: Funk, P., Calero, P.A.G. (eds.) Advances in Case-Based Reasoning. pp. 389–
403. Springer-Verlag, Berlin, Heidelberg, Paris (September 2004)
17. Roth-Berghofer, T.R., Cassens, J.: Mapping goals and kinds of explanations to the
knowledge containers of case-based reasoning systems. In: Case-Based Reasoning
Research and Development, pp. 451–464. Springer (2005)
18. Sauer, C., Kocurova, A., Kramer, D., Roth-Berghofer, T.: Using canned explana-
tions within a mobile context engine. Explanation-aware Computing ExaCt 2012
p. 26 (2012)
19. Sørmo, F., Cassens, J., Aamodt, A.: Explanation in case-based reasoning–
perspectives and goals. Artificial Intelligence Review 24(2), 109–143 (2005)
20. Trinidad, P., Benavides, D., Durán, A., Ruiz-Cortés, A., Toro, M.: Automated error
analysis for the agilization of feature modeling. J. Syst. Softw. 81(6), 883–896 (Jun
2008)
Towards Continuous Knowledge Representations
in Episodic and Collaborative Decision Making
Joachim Baumeister1, Albrecht Striffler1, Marc Brandt2 and Michael Neumann2
1 denkbares GmbH, Friedrich-Bergius-Ring 15, 97076 Würzburg, Germany
{firstname.lastname}@denkbares.com
2 The Federal Environment Agency (Umweltbundesamt), Section IV 2.3 Chemicals
Wörlitzer Platz 1, 06844 Dessau-Roßlau, Germany
Abstract. With the success of knowledge-based approaches in decision support
systems, new requirements arise in practice. Users demand not only the collaborative
development of such systems, but also their collaborative and episodic use in decision
processes. Moreover, in complex decision domains multiple knowledge representations
are available that need to be jointly processed. In this paper we introduce a novel
approach and a system implementation that aim to meet these requirements.
1 Introduction
In the past, decision support systems based on knowledge bases emphasized the explicit
representation of decision knowledge for its automated application in the target sce-
nario. Typically, those systems are used monolithically by one user or automated by a
machine. Examples are for instance the medical consultation system SonoConsult [12],
the medical therapeutic system SmartCare [6], and TIGER [8] for the monitoring of gas
turbines. With the success of those systems, new requirements arise when adapting them
to new environments. The advanced requirements are as follows:
– Collaborative use: More than one person is working on the same decision process
at the same time.
– Episodic use: The actual decision process is not a one-step question-answer inter-
view, but needs (sometimes sporadically) input over time, i.e., a decision episode.
– Mixed representation: Systems are built from knowledge bases that do not use a
single knowledge representation (e.g., rules) but a combination, for instance rules
with models and ontologies.
The requirements stated above call for extensions of today’s systems in the following
manner:
– A systematic extension of systems that support the collaborative and the episodic
decision making. Here, especially an approach for representing the provenance of
decisions is required.
– A continuous knowledge representation to support heterogeneous representations for
decision making and its episodic application. Here, the already introduced knowl-
edge formalization continuum [2] needs to be reconsidered in the light of its use in
decision making.
In this paper, we try to shed more light on how to fulfill the requirements mentioned
above. The formalization and use of the knowledge formalization continuum is intro-
duced in Section 2. In Section 3 we discuss a systematic approach for episodic decision
making in collaborative use. A case study in Section 4 exemplifies the successful appli-
cation of the described approach. The overall ideas are summarized and concluded in
Section 5.
2 Continuous Knowledge Representation and Application
One main challenge in complex decision making is finding the appropriate scope of the
knowledge base: Complex domains require a large number of aspects to be considered.
Thus, a ‘complete’ knowledge base needs to include many aspects in order to be useful in
practice later. Most of the time, however, not all aspects can be included in the knowledge
base:
– Uncertain domain knowledge: Parts of the domain are not well-understood in a
technical sense. Here, decisions in practice are often based more on past experience,
evidence, and intuition than on strict domain laws and rules.
– Bloated domain knowledge: For some parts of the domain, the explicit represen-
tation of the knowledge would be too time-consuming and complex. For instance,
much background knowledge that is required for proper decision making needs to be
included. Here, the expected cost-benefit ratio is low, e.g., because many parts
will be rarely used in real-world decisions1 .
– Restless domain knowledge: Especially in technical domains, some parts of the do-
main knowledge are frequently changing due to technological changes. The explicit
representation of these parts would require frequent maintenance. Here, also the
cost-benefit of the maintenance vs. the utility of the knowledge needs to be evaluated.
In this section we introduce an approach that allows for the combined representation
and use of knowledge at a varying formalization granularity, i.e., the knowledge formal-
ization continuum. The main idea of the knowledge formalization continuum is to use
varying knowledge representations for one knowledge base and to select the best-fitting
representation for each partition. Besides supporting different knowledge representations,
the approach also considers the mixed application of and reasoning with
knowledge at different formalization levels.
1 Costs for developing/maintaining the knowledge vs. the benefit/frequency of using the single
parts in practice.
2.1 The Knowledge Formalization Continuum
In general, the knowledge formalization continuum is a conceptual metaphor extending
the knowledge engineering model for a domain specialist. The metaphor emphasizes
that entities of a knowledge base can have different facets ranging from very informal
representations (such as text and images) to very explicit representations (such as logic
formulae), see Figure 1.
Fig. 1. An idealistic view of the knowledge formalization continuum.
Here, it is not necessary to commit to a specific knowledge representation at the
beginning of a development project. Rather, it supports concentrating
on the actual knowledge by providing a flexible understanding of the knowledge formal-
ization process. Specific domain knowledge can be represented in different ways, where
adjacent representations are similar to each other, e.g., tabular data and cases. More ex-
treme representations are much more distinct, e.g., text vs. logic rules. It is important
to note that the knowledge formalization continuum is neither a physical model nor a
methodology for developing knowledge bases. Rather, the concept should help domain
specialists to see even plain data, such as text and multimedia, as first-class knowledge
that can be transformed by gradual transitions to more formal representations when
required. On the one hand, data given by textual documents denote one of the lowest
instances of formalization. On the other hand, functional models store knowledge at a
very formal level.
When working with different representations of knowledge, one has to keep in mind
that every granularity of formalization has its advantages and disadvantages. On the
informal side, textual knowledge can be easily acquired and it is often already avail-
able. No prior knowledge with respect to tools or knowledge representation is nec-
essary. However, (automated) reasoning using textual knowledge is hardly possible.
The knowledge can only be used/retrieved through string-based searching methods.
The formal side proposes rules or models as knowledge representation; here automated
reasoning is effective but the acquisition of such knowledge is typically complex and
time-consuming. Further, the knowledge engineer needs to "model" the knowledge in a
much more precise manner.
The knowledge formalization continuum embraces the fact that knowledge is usu-
ally represented at varying levels of formality. A system supporting the knowledge for-
malization continuum should be able to store and work with different representations,
and it should support transitions between the representations where the cost-benefit ratio
is (in the best case) optimal.
In typical projects, prior knowledge of the domain is already at hand, often in the
form of text documents, spreadsheets, flow charts, and databases. These documents
build the foundational reference of the classic knowledge engineering process, where a
knowledge engineer models domain knowledge based on these documents. The actual
utility and applicability of knowledge usually depends on a particular instance. The
knowledge formalization continuum does not postulate the transformation of the entire
collection into a knowledge base at a specific degree but the performance of transitions
on parts of the collection when it is possible and appropriate. This takes into account the
fact that sometimes not all parts of a domain can be formalized at a specific level or that
the formalization of the whole domain knowledge would be too complex, considering
costs and risks.
2.2 Reasoning in the Knowledge Formalization Continuum
When using different types of knowledge representations the most important question
is how to connect these elements when used during the reasoning process.
Pragmatic Reasoning As a pragmatic approach to be used in decision support sys-
tems, we propose to define a taxonomy of decisions and connect entities of knowledge
(knowledge elements) with decisions of the decision taxonomy; see Figure 2 for an
example depiction.
Fig. 2. An example for connecting different knowledge elements with a decision taxonomy.
Here, the knowledge base contains rules, workflow models, and textual decision memos.
All elements reference the same collection of decisions and
thus can jointly infer decisions. When a knowledge element is activated during decision
making, i.e., the knowledge element “fires”, then the corresponding decision element is
established and presented as a derived decision.
Please note that more formal approaches like RIF [16] use a comparable connec-
tion, i.e., decisions are formalized as concepts/instances and rules are defined to derive
the existence of the concept/instance.
Decision Making using Scoring Weights With the simple approach sketched above,
decisions can only be taken categorically. For a more leveled approach, we propose
to introduce scores as weights for decisions. Scores are a well-understood weighting
scheme in knowledge engineering [11, 7] and have a simple reasoning semantics:
Each decision has an account which stores the scoring weights given to the decision by
knowledge elements during the reasoning process. When a knowledge element “fires”,
then the corresponding score is added to the account of the particular decision. Scoring
weights included in the account are aggregated in a predefined manner. A decision ele-
ment is established and shown as derived decision, when the aggregated scoring weight
exceeds a given threshold.
Example: Often a reduced set of score weights S = {N3, N2, N1, 0, P1, P2, P3} is suffi-
cient for developing large knowledge bases. Given these weight categories, a developer can
select from three weights N1 (weakly negative) to N3 (excluded) for negative scoring
and three weights P1 (weakly positive) to P3 (clearly established) for positive scoring.
The weight 0 represents an unclear state. The score weights of a decision account are
aggregated as follows: The sum of two equal weights results in the next higher category,
e.g., P2 + P2 = P3. Positive and negative weights are aggregated, so that two equal score
weights nullify each other, e.g., P2 + N2 = 0. A decision is established (confirmed), if
the aggregation of the collected scoring weights exceeds the category P3.
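A minimal Python sketch of one possible reading of this aggregation semantics is given below (our own illustration, not the system's actual implementation; mapping each category to a signed power of two is an assumption that reproduces the stated examples P2 + P2 = P3 and P2 + N2 = 0):

from collections import defaultdict

# Hypothetical weight-to-number mapping: each category doubles the previous one,
# so two equal weights add up to the next higher category and a weight cancels
# its negation.
WEIGHTS = {"N3": -4, "N2": -2, "N1": -1, "0": 0, "P1": 1, "P2": 2, "P3": 4}

class DecisionAccounts:
    """Score accounts for decisions, fed by firing knowledge elements."""
    def __init__(self, threshold="P3"):
        self.accounts = defaultdict(int)     # decision name -> aggregated score
        self.threshold = WEIGHTS[threshold]  # aggregation must exceed this category

    def fire(self, decision, weight):
        # A knowledge element fires and adds its scoring weight to the account.
        self.accounts[decision] += WEIGHTS[weight]

    def established(self):
        # Established decisions: aggregated score exceeds the P3 category.
        return [d for d, score in self.accounts.items() if score > self.threshold]

# Rules in the style of Figure 3, assuming their facts are true:
accounts = DecisionAccounts()
accounts.fire("decision1", "P1")
accounts.fire("decision2", "P2")
accounts.fire("decision2", "P2")
accounts.fire("decision2", "P2")   # P2 + P2 + P2 exceeds P3 -> established
accounts.fire("decision4", "P3")
accounts.fire("decision4", "N3")   # P3 + N3 = 0 -> not established
print(accounts.established())      # ['decision2']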
Fig. 3. Exemplary score accounts for five decisions.
In Figure 3 the accounts of five decisions and an excerpt of a rule base are shown.
One rule fires and adds the weight P1 to the account of decision1. We see that
decision2 and decision5 are established, since the aggregation of their collected
scoring weights exceeds the category P3. In contrast, decision4 is not established
because of the negative weight N3.
3 Episodic and Collaborative Decision Making
Complex decisions are often not made in a single step, but are typically divided into
a number of sub-decisions. Each of them may need further research and collaborative
interaction for clarifying details. Collaboration is necessary when a complex decision
can only be made by joining experts from different domains into the decision process.
These requirements can be fulfilled by specific extensions of a decision support system:
1. Contemporary access to the data and decisions.
2. Episodic collaboration during decision making.
3. Provenance of data and decisions.
3.1 Contemporary Access
Authorized persons need to be able to access the system at the same time. They should
be able to work with the system in order to make decisions or to retrieve already taken
decisions. Contemporary access can be provided by a web-based implementation of the
system, as for example implemented by semantic wiki systems [13]. Further examples
are collaborative ontology development environments such as WebProtégé [9, 14].
In such a distributed setting we need to consider concepts like rights management
for access control, revision management of old versions of the knowledge, and conflict
management of simultaneous edits.
3.2 Episodic Collaboration
Authorized persons should be able to enter data for making a particular decision. The
data entry need not be made at one time but can be partitioned over multiple sessions,
i.e., decision episodes. Also, different users can enter data used for the same decision.
3.3 Provenance of Data and Decisions
When more than one person contributes to a complex decision making process and
when the process is partitioned into episodes, then the process and reasoning should
be traceable and understandable by the users. This implies the documentation of the
decisions including their history but also the provenance of the data used for making the
decision (see below). Therefore, the system needs to provide versioning of the decisions
made including a documentation by the respective users. When representing the history
and documentation of decisions by an ontology, then known approaches can be applied,
for instance [10, 4].
Provenance of data and decisions is needed in collaborative and episodic environ-
ments. Here, the following questions need to be clearly answered:
– At which time was a particular data element entered?
– Who entered the data?
– Which knowledge elements are responsible for a particular decision?
– What is the history of a particular data element and decision?
– Which persons contributed to the process of a particular decision?
Fig. 4. Simple version of the PROV ontology.
We propose the application of the PROV ontology [15] to knowledge elements and
the entities of the decision process. That way, an extensible and standardized ontology
is used to represent the use and origin of decisions. In Figure 4 the three Starting Point
classes and properties of the PROV ontology are depicted. Here, a prov:Agent
is responsible for carrying out a prov:Activity. The prov:Activity generates a
prov:Entity, but instances of prov:Entity can also be used in (other) instances
of prov:Activity. A prov:Entity is a general representation for a thing, be it
physical, digital, conceptual, or of any other kind. We can see that answers
to the questions stated above can easily be represented using the simple version of the
PROV ontology, when people involved in the decision making process are represented
as prov:Agent instances, entered data and the decisions themselves are represented
as prov:Entity instances, and the data entry and decision episodes are represented
as prov:Activity instances.
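The following Python/rdflib sketch illustrates how one decision episode could be recorded with these three classes; the resource names, the example namespace, and the use of rdflib are our own assumptions for illustration, not the system's actual code:

from datetime import datetime
from rdflib import Graph, Namespace, Literal, RDF, XSD

PROV = Namespace("http://www.w3.org/ns/prov#")    # W3C PROV-O namespace
EX = Namespace("http://example.org/decisions#")   # hypothetical project namespace

g = Graph()
g.bind("prov", PROV)

# Who: the person contributing to the decision process.
g.add((EX.userMB, RDF.type, PROV.Agent))

# What happened: a data entry episode with start and end times.
g.add((EX.formEntry1, RDF.type, PROV.Activity))
g.add((EX.formEntry1, PROV.wasAssociatedWith, EX.userMB))
g.add((EX.formEntry1, PROV.startedAtTime,
       Literal(datetime(2013, 5, 2, 9, 0), datatype=XSD.dateTime)))
g.add((EX.formEntry1, PROV.endedAtTime,
       Literal(datetime(2013, 5, 2, 9, 30), datatype=XSD.dateTime)))

# Which things were used and produced: entered data and the derived decision.
g.add((EX.enteredValue1, RDF.type, PROV.Entity))
g.add((EX.decision1, RDF.type, PROV.Entity))
g.add((EX.formEntry1, PROV.used, EX.enteredValue1))
g.add((EX.decision1, PROV.wasGeneratedBy, EX.formEntry1))
g.add((EX.decision1, PROV.wasAttributedTo, EX.userMB))

# The provenance questions listed above then map to simple graph lookups,
# e.g., "who entered the data?" becomes a lookup of prov:wasAttributedTo.
print(g.serialize(format="turtle"))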
4 Case Study: KnowSEC – A System for Managing Chemical
Substances of Ecological Concern
In this section we describe the KnowSEC project and its corresponding tool. KnowSEC
stands for "Managing Knowledge of Substances of Ecological Concern" and it is used
to support substance-related work and workflows within a unit of the Federal Envi-
ronment Agency (Umweltbundesamt). More precisely, the tool supports decisions by
a number of knowledge-based modules. In the context of the KnowSEC project only
substances under REACH2 are considered. For the implementation of the KnowSEC
project the semantic wiki KnowWE [3] was extended by KnowSEC-specific plugins.
KnowWE is a full-featured tool environment for the development of diagnostic knowl-
edge bases and RDF(S) ontologies. It provides plugins for automatically testing and
debugging knowledge bases including continuous integration. For a recent overview
we refer to [1].
Many of the requirements stated in the introduction of this paper apply to the
KnowSEC project: The group is divided into sub-groups; each sub-group collabora-
tively works on a number of substances. For each substance under consideration a num-
ber of complex decisions need to be made concerning the safety and regulation of the
substance. Decision making on a substance can sometimes take a couple of months or
even years; therefore, support for episodic decision making is required.
4.1 Substances as Wiki Instances
Since the single substances are the primary target of decision making, every substance
under consideration is represented by a distinct (semantic) wiki article. The article
stores relevant information about the substance, such as chemical end-points, relevant liter-
ature, and comments of group members. The information is entered by group members
using (user-friendly) editors. In the background the information is silently translated
into an ontology representation for automated reuse and processing. That way, any
information (e.g., alternative identifiers, end-points, paragraphs, comments) is repre-
sented as an RDF triple. Consequently, the visualization of the latest changes and spe-
cific overviews are easily defined by SPARQL queries [17]. The article of the imaginary
substance "Kryptonite" is depicted in Figure 5. The article is maintained for demonstra-
tion purposes and reflects by no means any real work of the agency.
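As an illustration of the SPARQL-defined overviews mentioned above, a query of the following kind could list substances together with their latest decisions; the dss: namespace and the property names are hypothetical placeholders, since the KnowSEC ontology is not given in this paper:

from rdflib import Graph

g = Graph()
# g.parse("knowsec-export.ttl")  # hypothetical export of the wiki's RDF triples

# Hypothetical overview query: substances with their current decision and the
# date of the last change, ordered by recency, as used on wiki overview pages.
OVERVIEW_QUERY = """
PREFIX dss: <http://example.org/knowsec#>
SELECT ?substance ?decision ?changed
WHERE {
  ?substance a dss:Substance ;
             dss:hasDecision ?decision ;
             dss:lastChanged ?changed .
}
ORDER BY DESC(?changed)
LIMIT 20
"""

for row in g.query(OVERVIEW_QUERY):
    print(row.substance, row.decision, row.changed)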
At the right of the article, all decision work on the substance "Kryptonite" is dis-
played, giving a summary of the currently taken (sub-)decisions, the comments by group
members, and a fact sheet showing the identifiers of the substance.
At the time of writing, KnowSEC stores more than 11,000 substances as separate
wiki articles including a number of critical chemical characteristics. A small part of
these substances are currently under decision making.
2 REACH stands for the European Community Regulation on chemicals and their safe use (EC
1907/2006). The regulation handles the registration, evaluation, authorization, and restriction
of chemical substances.
Fig. 5. The article describing the imaginary substance "Kryptonite".
4.2 Continuous and Collaborative Decision Making
When displaying a substance article in KnowSEC, the left menu of the wiki is extended
by a decision making bar; see Figure 5. Here, all decision aspects are listed that are
relevant for the working group. When clicking on a decision aspect, the sub-aspects
of the selected aspect are shown. By selecting one of these (sub-)aspects the user can
initiate a decision form, where specific questions of the aspect are asked and decisions
are proposed automatically. These interactive decision forms are generated by explicit
knowledge represented by scoring rules or DiaFlux models [5]. Any data entry and any
taken decision are recorded by KnowSEC, including the time and user. An explanation
component shows the justifications of taken decisions by visualizing the supporting data
and the acting users of the data. For the explanation the PROV ontology—as described
in Section 3.3—is applied. Users, i.e., team members, are instances of prov:Agent
and entered data and (decision) memos are instances of prov:Entity. The cre-
ation or edit of a (decision) memo and an interactive decision form are represented
as prov:Activity instances including the corresponding edit times. A simplified
depiction of this application is shown in Figure 6; the prefix dss (decision support
system) stands for the KnowSEC ontology namespace.
Fig. 6. Simplified version of the applied PROV interpretation for tracing the provenance of decisions.
Explicit Knowledge and the Taxonomy of Decisions From the technical point of
view, the explicit part of the knowledge base is partitioned into modules that are connected
by a taxonomy of decision instances. Since the taxonomy is represented as an
RDF ontology, it is strongly connected with the ontology of the article information (see
paragraph above). The formal versions of the aspects are implemented by knowledge
base modules and connected by the taxonomy of decision instances. Some modules use
decision trees, other modules use scoring rules, and RDF ontologies are used as well.
Decision Memos For some aspects and decisions, respectively, no explicit knowledge
base is available. When the user wants to document a decision without using an explicit
knowledge base, he/she is able to create a decision memo. A decision memo is entered
by an authorized user and consists of free text, some meta-data (e.g., tags, time, etc.),
and an explicit decision with a scoring weight. The decision memos are attached to the
article of the corresponding substance. The included decision is used in the overall rea-
soning process. A decision memo is an implementation of an implicit reasoning element
of the knowledge formalization continuum. An example of a decision memo can be the
note of a group member that a particular aspect was proven by a specific experiment
giving the reason for deriving a specific (sub-)decision. For instance, see the decision
memos about the persistence of the substance "Kryptonite" being created in Figure 7.
Decision memos are automatically attached to the articles of the corresponding sub-
stances.
Fig. 7. A decision memo created for the exemplary substance Kryptonite.
Size and Current Status Currently, KnowSEC provides explicit decision modules
for supporting the assessment of the relevance, the persistence in the environment, the
bioaccumulation potential, and the toxicity of a given substance. The taxonomy of de-
cisions, however, contains 15 different main decisions on substances with a larger
number of sub-decisions.
The static part of the knowledge base currently consists of 282 questions (user in-
puts to characterize the investigated substance) grouped by 92 questionnaires, 558 de-
cisions (assessments of the investigated substance), and about 1,000 rules to derive the
decisions. The rules are automatically generated from entered decision tables that al-
low for an intuitive and maintainable knowledge development process. Two knowledge
engineers are supporting a team of domain specialists, who partly define the knowledge
base themselves and partly provide domain knowledge to the knowledge engineers.
At the beginning of the project, a couple of internal databases were integrated into
KnowSEC as (decision) memos. Currently, the system contains more than 27,000 (deci-
sion) memos for the 11,000 substances. In the form dialog more than 51,000 questions
were answered, partially automatically by imports of internal databases. Both decision
memos and the explicit rule base derived more than 42,000 module decisions.
5 Conclusions
Advanced decision support systems allow for the distributed and episodic handling of
complex decision problems. They implement large knowledge spaces by mixing differ-
ent knowledge representations with informal decision justifications. In this paper, we
introduced a novel approach for building decision making systems that support collabo-
rative and episodic decision making. Furthermore, we motivated how the application of
the knowledge formalization continuum helps to create knowledge in complex domains.
The practical applicability and relevance of the presented approach was demonstrated
by the discussion of an installed decision support system for the assessment of chemi-
cal substances. When decisions are derived in a collaborative and episodic setting, the
transparency of found decisions is of prime importance. Thus, we are currently working
on an elaborated explanation approach based on the provenance ontology PROV that is
capable of providing intuitive and effective ad-hoc explanations even for end users.
References
1. Baumeister, J., Reutelshoefer, J., Belli, V., Striffler, A., Hatko, R., Friedrich, M.: KnowWE -
a wiki for knowledge base development. In: The 8th Workshop on Knowledge Engineering
and Software Engineering (KESE2012). http://ceur-ws.org/Vol-949/kese8-05_04.pdf (2012)
2. Baumeister, J., Reutelshoefer, J., Puppe, F.: Engineering intelligent systems on
the knowledge formalization continuum. International Journal of Applied Math-
ematics and Computer Science (AMCS) 21(1) (2011), http://ki.informatik.uni-
wuerzburg.de/papers/baumeister/2011/2011-Baumeister-KFC-AMCS.pdf
3. Baumeister, J., Reutelshoefer, J., Puppe, F.: KnowWE: A semantic wiki for knowledge engi-
neering. Applied Intelligence 35(3), 323–344 (2011), http://dx.doi.org/10.1007/s10489-010-
0224-5
4. Franconi, E., Meyer, T., Varzinczak, I.: Semantic diff as the basis for knowledge base ver-
sioning. In: 13th International Workshop on Non-Monotonic Reasoning (NMR). pp. 7–14
(2010)
5. Hatko, R., Baumeister, J., Belli, V., Puppe, F.: Diaflux: A graphical language for computer-
interpretable guidelines. In: Riaño, D., ten Teije, A., Miksch, S. (eds.) Knowledge Represen-
tation for Health-Care, Lecture Notes in Computer Science, vol. 6924, pp. 94–107. Springer,
Berlin / Heidelberg (2012)
6. Mersmann, S., Dojat, M.: SmartCaretm - automated clinical guidelines in critical care. In:
ECAI’04/PAIS’04: Proceedings of the 16th European Conference on Artificial Intelligence,
including Prestigious Applications of Intelligent Systems. pp. 745–749. IOS Press, Valencia,
Spain (2004)
7. Miller, R.A., Pople, H.E., Myers, J.: INTERNIST-1, an Experimental Computer-Based Di-
agnostic Consultant for General Internal Medicine. New England Journal of Medicine 307,
468–476 (1982)
8. Milne, R., Nicol, C.: TIGER: Continuous diagnosis of gas turbines. In: ECAI’00: Proceed-
ings of the 14th European Conference on Artificial Intelligence. Berlin, Germany (2000)
9. Noy, N.F., Chugh, A., Liu, W., Musen, M.A.: A framework for ontology evolution in col-
laborative environments. In: ISWC’06: Proceedings of the 5th International Semantic Web
Conference, LNAI 4273. pp. 544–558 (2006), http://dx.doi.org/10.1007/11926078_39
10. Noy, N.F., Musen, M.A.: PromptDiff: a fixed-point algorithm for comparing ontology ver-
sions. In: 18th National Conference on Artificial Intelligence (AAAI-2002). pp. 744–750
(2002)
11. Puppe, F.: Knowledge Reuse among Diagnostic Problem-Solving Methods in the Shell-Kit
D3. International Journal of Human-Computer Studies 49, 627–649 (1998)
12. Puppe, F., Atzmueller, M., Buscher, G., Hüttig, M., Luehrs, H., Buscher, H.P.: Applica-
tion and evaluation of a medical knowledge system in sonography (SONOCONSULT). In:
ECAI’08/PAIS’08: Proceedings of the 18th European Conference on Artificial Intelligence,
including Prestigious Applications of Intelligent Systems. pp. 683–687. IOS Press, Amster-
dam, The Netherlands, The Netherlands (2008)
13. Schaffert, S., Bry, F., Baumeister, J., Kiesel, M.: Semantic wikis. IEEE Software 25(4), 8–11
(2008)
14. Tudorache, T., Nyulas, C., Noy, N.F., Musen, M.A.: WebProtégé: A collaborative ontology
editor and knowledge acquisition tool for the web. Semantic Web (2012)
15. W3C: PROV-O: The PROV Ontology: http://www.w3.org/tr/prov-o/ (April 2013)
16. W3C: RIF-Core Recommendation: http://www.w3.org/tr/rif-core/ (February 2013)
17. W3C: SPARQL 1.1 recommendation: http://www.w3.org/tr/sparql11-query/ (March 2013)
Identifying Guidelines for Designing and
Engineering Human-Centered Context-Aware
Systems
(Position paper)
Emilian Pascalau
Conservatoire National des Arts et Métiers,
2 rue Conté,75003 Paris, France
emilian.pascalau@cnam.fr
Abstract. In the “future internet” environment that is generative and
fosters innovation, applications are simplifying, are becoming mobile, are
getting more social and user oriented. Software design capable of coping
with such a generative environment that drives and supports innovation
is still an issue. Some of the challenges of such systems include: empower-
ing end-users with the necessary tools to model and develop applications
by themselves, while at the same time hiding the technical layer from
them. This paper introduces a set of guidelines for designing and engi-
neering human-centered context-aware systems from a human computer
interaction and meta-design perspective.
1 Introduction
Recent years have brought rapid technological advances in both hardware and soft-
ware: an increasingly pervasive computing paradigm, embedded sensor technologies,
and a wide range of wireless and wired protocols. Applications are simplifying, are
becoming mobile, are moving to the cloud, are getting more social and user fo-
cused [14]. “Future Internet” is emerging as a new environment, an environment
that is generative, that fosters innovation, through the advances of technologies
and a shift in people’s perception about it and a paradigm shift in how people
act and react in this environment [22].
In this context, two directions are highlighted: context together with context-
awareness and human-centered computing. Studied for more than 30 years in
the field of artificial intelligence, computer and cognitive science, context has
still been identified by Gartner, alongside cloud computing, business impact of
social computing, and pattern based strategy, as being one of the broad trends
that will change IT and the economy in the next 10 years [21].
We observe a paradigm shift in terms of users of context-aware systems. For
example, a user is no longer a specific individual or organization; it is often a com-
munity of collaborating users, or groups of users performing similar tasks. Therefore,
there is a growing need for systems meeting the expectations of a massive, distributed
user base of pervasive ubiquitous devices as well as distributed cloud-based web
services.
Design and deployment of such software capable of coping with a generative
environment that drives and supports innovation through direct interaction and
empowerment of the end-user is still an issue. Not only should the development of such
systems be agile; the boundary is pushed even further: we need systems
that are designed to be agile at run-time. It has already been argued (see for
instance [15]) that designers and programmers cannot foresee and anticipate
what end-users will need. Users know best what they need, and the future internet
environment clearly underlines this fact.
In consequence some of the challenges of such systems that arise in the future
internet environment include: empowering end-users with the necessary tools to
model and develop applications by themselves, while at the same time hiding the
technical layer from them. This paper is part of a work in progress, and identifies
and introduces a set of guidelines for designing and engineering human-centered
context-aware systems.
The rest of the paper is organized as follows: Section 2 discusses a use case
that is based on a concrete end-user problem (managing and tracking online pur-
chases) arising from the future internet environment; the use case will support us
in identifying a set of guidelines for designing and engineering human-centered
context-aware systems in Section 3; Section 4 is reserved for related work. We
conclude in Section 5.
2 Application Scenario - Slice.com
In the future internet environment email communication is part of daily activi-
ties. Many of the business processes that take place in a company, and not only there,
are started by and/or integrate the actions of receiving/sending emails. In some
situations, entire processes are comprised of email communications. This type of
email-based use case is important both for academia and for industry:
for academia from a perspective oriented towards methodology, for industry
from a practical perspective, addressing very specific problems.
knowledge that is bundled in an unstructured way but which has meaning to
end-users, and which could be used to address end-user specific problems.
A human-centered context-aware system dealing with such a use case would
be required to provide at least the following capabilities:
– provide the means and allow the end-user to model and represent context;
– allow the modeling of relationships between context elements;
– allow the modeling of interactions between different contexts, both in the
form of conditions and of sequences of events and actions (more precisely,
business rules and business processes; see the sketch after this list);
– based on the provided models, have the capability to discover the modeled
context(s) in the environment;
– sense events and actions that are performed in the system;
– perform actions according to the defined models.
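A minimal sketch of how such end-user context models could be represented is shown below; the classes, fields, and the simple discovery check are our own illustrative assumptions, not a design prescribed by this paper:

from dataclasses import dataclass, field

@dataclass
class Concept:
    """An entity relevant to the end-user's problem, e.g., shop, invoice, shipment."""
    name: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Context:
    """A context groups a service with its concepts and the rules that react to it."""
    name: str        # e.g., "FedEx shipment context"
    service: str     # e.g., "FedEx"
    concepts: list   # Concept instances modeled by the end-user
    rules: list      # (condition, action) pairs modeled by the end-user

def discover(context, environment):
    """Check whether the modeled context is present in a given environment
    (here simply: all concept names occur in the observed data)."""
    return all(c.name in environment for c in context.concepts)

# End-user-modeled context for tracking a shipment:
shipment_ctx = Context(
    name="FedEx shipment",
    service="FedEx",
    concepts=[Concept("tracking number"), Concept("address"), Concept("location")],
    rules=[("shipment is out for delivery", "notify the user")],
)
print(discover(shipment_ctx, {"tracking number", "address", "location", "invoice"}))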
We support our previous assertions by discussing the Slice application scenario
in the next paragraphs.
Commius1 is a European research project that tackles systems for small and
medium enterprises (SMEs). The final goal of Commius, as argued in [5],
is to turn existing email-systems into a management framework for structured
processes. Each incoming email is autonomously matched to a process that is
enhanced with proactive annotations.
Slice.com is an industry project. Similar to Commius, it uses emails to tackle a
very specific end-user-related problem, keeping track of online purchases, which
emerged from the dynamic and generative future internet environment. This
project is even more specific from the perspective of an end-user.
Slice is an online purchase management tool that gets hooked into your email
account. Whenever a new email is received Slice automatically analyzes the
content of the email. If the email contains order information from one of your
online shops, then Slice, via pattern-based recognition techniques, extracts order-
related contextual information and organizes this information for you. Hence, all
your purchases will be gathered in one place: you will be able to keep track of
your shopping history, the amount of money you spent, the type of products, time-
related information, i.e., when a shipment is about to arrive, and so forth.
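To hint at what such pattern-based extraction could look like, the following Python sketch pulls a few order-related context elements out of an email body; the regular expressions, field names, and shop list are purely illustrative assumptions and are not taken from the actual Slice product:

import re

# Illustrative patterns for a few order-related context elements.
PATTERNS = {
    "order_number": re.compile(r"Order\s*#?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "total": re.compile(r"Total:?\s*\$?([0-9]+\.[0-9]{2})"),
    "tracking_number": re.compile(r"Tracking\s*(?:number|no\.?):?\s*([A-Z0-9]+)", re.IGNORECASE),
}
KNOWN_SHOPS = ("amazon.com", "ebay.com")  # hypothetical list of supported shops

def extract_order_context(sender, body):
    """Return order-related contextual information if the email looks like an order."""
    if not any(shop in sender for shop in KNOWN_SHOPS):
        return None                      # not from a known online shop
    context = {"shop": sender}
    for name, pattern in PATTERNS.items():
        match = pattern.search(body)
        if match:
            context[name] = match.group(1)
    return context if len(context) > 1 else None

email_body = "Thank you! Order #112-99 confirmed. Total: $23.50. Tracking number: 1Z999AA"
print(extract_order_context("auto-confirm@amazon.com", email_body))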
We analyze from an end-user perspective what this use case is about.
– Problem faced by users: keeping track of the purchases made online.
– Applications and services involved: Email as a legacy system; services: on-
line shops (Amazon, EBay), shipment services (FedEx, UPS); geolocation
services (Google Maps); other types of services, e.g., topic extraction.
– Concepts: shop, service, user, invoice, email, time and date, amount of
money, product, type of product, location, address, tracking number. The
list of concepts is not exhaustive and is up to each user; however, these
are the most obvious ones. Concepts are all those entities that are used in
taking decisions and/or involved in any way in the process of resolving the
end-user’s problem.
– Context: For example one context, from the perspective of an end-user in
the Slice use case, could comprise: a specific service such as FedEx; concepts
associated with it, i.e., shipment, location, address. Furthermore, interaction
related to this specific context could be provided, such as what to do with this
information and so forth.
Figure 1 depicts a general interaction process with respect to this use case.
Fig. 1. General interaction process to solve the problem of keeping track of online purchases
1 http://www.commius.eu/
3 Identifying Guidelines
Fischer and Giaccardi argue in [11] that in a world that is not predictable, improvisation,
evolution and innovation are a necessity. There is a shift from processes towards users.
Users want their problems and their requirements to be taken into account; they want to
be part of the conversation. Continuously changing
business models do not fit the old and stiff approaches anymore. Processes must be
in accordance with reality. For instance, process mining techniques [20] that
look at event logs emphasize the fact that the processes which actually get executed
differ from the original blueprints. Companies need to adapt to
what customers/users actually do.
Context greatly influences the way humans or machines act and the way they
relate to situations and things; furthermore, any change in context
causes a transformation in the experience that is going to be lived and sensed [4].
Many psychological studies have shown that when humans act, and especially
when humans interact, they consciously and unconsciously attend to context of
many types as stated in [12].
Traditionally, context has been perceived in the computer science community as
a matter of location and identity, see for instance [9]. However, interaction and
problems concerning interaction require more than just the environmental con-
text (location, identity) used traditionally in context-aware systems [12]. Lately
the notion of context has been considered not simply as state but as part of a
process in which users are to be involved [7].
Nardi underlines this aspect clearly in [15] stating that “we have only scratched
the surface of what would be possible if end users could freely program their own
applications... As has been shown time and again, no matter how much design-
ers and programmers try to anticipate and provide for what users will need, the
effort always falls short because it is impossible to know in advance what may be
needed... End users should have the ability to create customizations, extensions
and applications...”.
From a human-centered computing perspective this type of system is what
Donald Norman calls in [16] the type of system where the system itself disap-
pears from sight, and humans (end-users) can once again concentrate upon their
activities and their goals.
Grudin [12] continues and argues that aggregation or interpretation done by
software systems is different from aggregation and interpretation done by
biological, psychological and social processes.
Meta-design is a conceptual framework defining and creating social and tech-
nical infrastructures in which new forms of collaborative design can take place
[11]. Meta-design originates in the human-computer interaction field and tackles
end-user development.
Table 1, introduced in [11], compares side by side traditional design vs. meta-
design. However, from our perspective a human-centered context-aware system, in
order to provide a high degree of generality and to avoid re-implementation of
common things related to infrastructure, should be a layered system as discussed
in [18]. The low level is very technical and should be hidden from the end-user.
This low level would, to a great extent, follow traditional design. The high level,
on the other hand, should mainly follow meta-design. A translation mechanism has
to be put into place to ensure translation between these two layers.
Table 1. Traditional Design vs. Meta-Design [11]
Traditional Design                                 | Meta-Design
guidelines and rules                               | exceptions and negations
representation                                     | construction
content                                            | context
object                                             | process
perspective                                        | immersion
certainty                                          | contingency
planning                                           | emergence
top-down                                           | bottom-up
complete system                                    | seeding
autonomous creation                                | co-creation
autonomous mind                                    | distributed mind
specific solutions                                 | solutions spaces
design-as-instrumental                             | design-as-adaptive
accountability, know-what (rational decisioning)   | affective model, know-how (embodied interactionism)
Fig. 2. Context requirements
Several definitions of the concept of context have been enunciated; Fischer,
however, gives a definition that takes into account human-centered computa-
tional environments. He defines context in [10] as being the ’right’ information,
at the ’right’ time, in the ’right’ place, in the ’right’ way, to the ’right’ person.
Figure 2 depicts aspects that have been identified in [1] as requirements for
dealing with context.
Table 2, introduced in [10], depicts adaptive and adaptable systems. Context-
aware systems traditionally position themselves, according to Table 2, in
the category of adaptive systems. These systems employ users’ profile informa-
tion and other types of contextual information, like location, to improve users’
experience. These approaches, although they provide a degree of flexibility, are
however still stiff approaches because they are still based on predefined designs
and very specific.
Table 2. Adaptive vs. Adaptable Systems [10]
definition | Adaptive: dynamic adaptation by the system itself to current task and current user | Adaptable: the users change the functionality of the system
knowledge | Adaptive: contained in the system; projected in different ways | Adaptable: knowledge is extended by users
strengths | Adaptive: little (or no) effort by users; no special knowledge of users is required | Adaptable: users are in control; users know their tasks best
weaknesses | Adaptive: users often have difficulties developing a coherent model of the system; loss of control | Adaptable: users must do substantial work; complexity is increased (users need to learn adaptation components); systems may become incompatible
mechanisms required | Adaptive: models of users, tasks, dialogs; incremental update of models | Adaptable: support for end-user modifiability and development
application domains | Adaptive: active help, critiquing systems, recommender systems | Adaptable: end-user modifiability, tailorability, design in use, meta-design
In our vision, a human-centered context-aware system is a system where adap-
tivity and adaptability are blended together. With such a method, users will be able
to directly model what the context should look like for a particular problem, and
afterwards the system would only be required to verify that the specified con-
text really exists in a given environment. Moreover, while for adaptive systems,
as stated in [3], the architecture comprises a user model (user
perspective on the problem) and a domain model (system perspective as it has
been designed), for a human-centered context-aware system there should be only
one conceptual model of the problem, which should be shared and understood in
the same way both by the end-user and the system.
Fig. 3. Interrelated topics
Figure 3 depicts the main perspectives that need to be blended together in
order to design a human-centered context-aware system. This particular view
has its roots in the meta-design conceptual framework [11].
Aspect Oriented Programming [19] is a relatively new methodology for soft-
ware development that aims at providing software modularity by means of sepa-
ration of cross-cutting concerns. This is an approach for requirements engineering
that focuses on customers’ concerns to be made consistent with aspect-oriented
software development. In terms of software engineering, throughout the code
there are specifically designed points that support adaptation based on defined
aspects.
The authors in [8] discuss how aspect orientation can be used in context-
aware systems design. Furthermore, because in aspect-oriented programming di-
rect user input is taken into account, this is an example of a human-centered
context-aware system. However, although this approach goes in the right direction,
it is restricted by the predefined points that support aspect orientation in the
implemented code.
Based on the analysis made, the design and engineering process of human-
centered context-aware systems should follow these guidelines:
1. such a system should be as general as possible and should be able to tackle
as many problems as possible [11];
2. such a system should provide the right level of representation such
that a problem representation could be automatically translated into the
core constructs of the underlying programming language in which the overall
system is implemented;
3. such systems should not be a domain-specific and therefore closed system, but
an open system that could be used to address a wide range of problems and
applications;
4. in such a system the focus should be on the needs of the end-user and not
on the system itself; the system should be hidden from the end-user;
5. such a system should be designed for evolution and should provide the means
to evolve through the input of users, as well as by itself;
6. such a system should support both skilled domain workers as well as novice
users;
7. such a system should be a co-adaptive environment where users change be-
cause they learn and systems change because users become co-developers;
8. such a system should allow active participation and empowerment of end-
users;
9. in such a system the end-user should be the coordinator of how the system
works.
Some of the guidelines refer to the relationship between the system and the end-
user, and some concern just the system. The system that we envision is similar to
an operating system in terms of being general and not domain-specific. The sys-
tem will be an intelligent one and will accept problem descriptions given by the
end-users. These descriptions will act as application models. Such a description
together with the intelligent system will make the application. We have already
made initial steps towards implementing such a system in [18], [17].
4 Related Work
Context awareness has been studied for several decades from many perspectives.
Development of context-aware applications, however, is still very difficult and
time-consuming, and aside from location-based services, not many applications
have been put into real use.
A series of surveys addressing different aspects (i.e. context modeling and
context-based reasoning, context-aware frameworks and applications, context-
aware web services) of development of context-aware systems and their applica-
tions have been published. We observe that the present approaches to context
reasoning and context-aware system design are only partial, in most cases
being either too specific or too generic.
Types of context used and models of context information, systems that sup-
port collecting and disseminating context, and applications that adapt to the
changing context have been addressed by Chen and Kotz in [6].
Bolchini et al. discuss in [4] a general analysis framework for context mod-
els and a comparison of some of the data-oriented approaches available in the
literature. The framework addresses: modeled aspects of context (space, time,
absolute/relative, history, subject, user profile), representation features (type
of formalism, level of formality, flexibility, granularity, constraints) and context
management and usage (construction, reasoning, information quality monitor-
ing, ambiguity, automatic learning features, multi-context modeling).
A unified architectural model and a new taxonomy for context data distri-
bution have been discussed in [2]. The authors identify three major aspects: (1) context
data distribution should take into account node requests and quality of context
requirements to reduce management overhead; (2) context data distribution re-
quires adaptive and crosscutting solutions able to orchestrate the principal in-
ternal facilities according to specific management goals; (3) informed context
data distribution can benefit from their increased context-awareness to further
enhance system scalability and reliability.
The impact of context-awareness on service engineering has also been noticed.
A classic and relatively recent survey [13] by Kapitsaki et al. considers context as
constituting an essential part of service behavior, especially in the interaction
with users. They observe that “at the heart of every context-aware service, relevant
business logic has to be executed and (...) adapted to context changes”.
Related work concerning this human-centered context-aware perspective, as
analyzed in this paper, is to the best of our knowledge only in an early
stage.
5 Conclusions
In this paper we have started an initial discussion about the design and en-
gineering of human-centered context-aware systems. Aspects discussed in this
paper are part of a work in progress. Our previous experience [18], [17] with
developing human-centered context-aware systems showed that this is not trivial. This
discussion comprised aspects from human-computer interaction, meta-design,
context and context-awareness. We have emphasized the fact that pre-designed
systems cannot foresee all aspects and facets of a problem. Therefore, end-users
should be given the necessary tools to design and develop their own solutions
based on existing services. We provide a set of guidelines and properties that
should characterize such human-centered context-aware systems.
Next steps include formalizing a conceptual framework, methodology and im-
plementation guidelines for developing such a system that is capable of tackling
in a unified way the problem of development of human-centered context-aware
applications.
References
1. Christos B. Anagnostopoulos, Athanasios Tsounis, and Stathes Hadjiefthymiades.
Context awareness in mobile computing environments. Wireless Personal Com-
munications, 42(3):445–464, 2007.
2. Paolo Bellavista, Antonio Corradi, Mario Fanelli, and Luca Foschini. A survey of
context data distribution for mobile ubiquitous systems. ACM Computing Surveys
(CSUR), 44(4):50 pages, 2012.
3. David Benyon and Dianne Murray. Applying user modeling to human-computer
interaction design. Artificial Intelligence Review, 7(3-4):199–225, 1993.
4. Cristiana Bolchini, Carlo A. Curino, Elisa Quintarelli, Fabio A. Schreiber, and
Letizia Tanca. A data-oriented survey of context models. ACM SIGMOD Record,
36(4):19–26, 2007.
5. Thomas Burkhart, Dirk Werth, and Peter Loos. Context-sensitive business process
support based on emails. In WWW 2012 – EMAIL’12 Workshop, April 2012.
6. Guanling Chen and David Kotz. A survey of context-aware mobile computing
research. Technical report, Dartmouth College Hanover, NH, USA, 2000.
7. Joëlle Coutaz, James L. Crowley, Simon Dobson, and David Garlan. Context is
key. Communications of the ACM - The disappearing computer, 48(3):49–53, 2005.
8. Abhay Daftari, Nehal Mehta, Shubhanan Bakre, and Xian-He Sun. On design
framework of context aware embedded systems. In Monterey Workshop on Software
Engineering for Embedded Systems: From Requirements to Implementation, 2003.
9. Anind K. Dey and Gregory D. Abowd. Towards a better understanding of context
and context-awareness. In HUC ’99: Proceedings of the 1st international sym-
posium on Handheld and Ubiquitous Computing, pages 304–307. Springer-Verlag,
1999.
10. Gerhard Fischer. Context-aware systems - the ’right’ information, at the ’right’
time, in the ’right’ place, in the ’right’ way, to the ’right’ person. In AVI’12. ACM,
2012.
11. Gerhard Fischer and Elisa Giaccardi. End User Development - Empowering People
to Flexibly Employ Advanced Information and Communication Technology, chapter
Meta-Design: A Framework for the Future of End-User Development. Kluwer
Academic Publishers, 2004.
12. Jonathan Grudin. Desituating action: digital representation of context. Human-
Computer Interaction, 16(2):269–286, 2001.
13. Georgia M. Kapitsaki, George N. Prezerakos, Nikolaos D. Tselikas, and Iakovos S.
Venieris. Context-aware service engineering: A survey. The Journal of Systems
and Software, 82(8):1285–1297, 2009.
14. Charles McLellan, Teena Hammond, Larry Dignan, Jason Hiner, Jody Gilbert,
Steve Ranger, Patrick Gray, Kevin Kwang, and Spandas Lui. The Evolution of
Enterprise Software. ZDNet and TechRepublic, 2013.
15. Bonnie A. Nardi. A Small Matter of Programming: Perspectives on End User
Computing. MIT Press, 1993.
16. Donald A. Norman. The Invisible Computer. MIT Press, 1999.
17. Emilian Pascalau. Mashups: Behavior in context(s). In Proceedings of 7th Work-
shop on Knowledge Engineering and Software Engineering (KESE7) at the 14th
Conference of the Spanish Association for Artificial Intelligence (CAEPIA 2011),
volume 805, pages 29–39. CEUR-WS, 2011.
18. Emilian Pascalau. Towards TomTom like systems for the web: a novel architecture
for browser-based mashups. In Proceedings of the 2nd International Workshop on
Business intelligencE and the WEB (BEWEB11), pages 44–47. ACM New York,
NY, USA, 2011.
19. Ian Sommerville. Software Engineering 8. Addison Wesley, 2007.
20. W. M. P. van der Aalst, B. F. van Dongen, J. Herbst, L. Maruster, G. Schimm,
and A. J. M. M. Weijters. Workflow mining: a survey of issues and approaches.
Data & Knowledge Engineering, 47(2):237–267, 2003.
21. Min Wang. Context-aware analytics: from applications to a system
framework. http://e-research.csm.vu.edu.au/files/apweb2012/download/APWeb-
Keynote-Min.pdf, 2012.
22. Jonathan Zittrain. The Future of the Internet And How to Stop It. Yale University
Press New Haven and London, 2008.
Overview of Recommendation Techniques
in Business Process Modeling*
Krzysztof Kluza, Mateusz Baran, Szymon Bobek, Grzegorz J. Nalepa
AGH University of Science and Technology,
al. A. Mickiewicza 30, 30-059 Krakow, Poland
{kluza,matb,s.bobek,gjn}@agh.edu.pl
Abstract Modeling business processes is an important issue in Business Process
Management. As model repositories often contain similar or related models, these
models can be reused when modeling new processes. The goal of this paper is to pro-
vide an overview of recommendation possibilities for business process models.
We introduce a categorization and give examples of recommendation approaches.
For these approaches, we present several machine learning methods which can be
used for recommending features of business process models.
1 Introduction
Business Process (BP) models are visual representations of processes in an organiza-
tion. Such models can help to manage process complexity and are also easy to un-
derstand for non-business users. Although there are many new tools and methodologies
which support process modeling, especially using Business Process Model and Nota-
tion (BPMN) [1], they do not support recommendation mechanisms for BP modelers.
As BPMN specifies only a notation, there can be several ways of using it. There are
style directions on how to model BPs [2], as well as guidelines for analysts based on BP under-
standability (e.g. [3]). However, proper business process modeling is still a challeng-
ing task, especially for inexperienced users.
Recommendation methods in BP modeling can address this problem. Based on the cur-
rent progress or additional pieces of information, various features can be recommended
to a modeler, who is thus assisted while designing models. Such assistance can
provide autocompletion mechanisms with the capability of choosing the next process frag-
ment from suggested ones. Names of model elements or attachments can be recom-
mended as well. Such approaches can reduce the number of errors during process design as
well as speed up the modeling process. They also support the reuse of existing process models,
especially when a process repository is provided.
The rest of this paper is organized as follows: In Section 2, we provide a cate-
gorization of recommendation methods used in business process modeling. Section 3
describes the current state of the art in this research area. Selected machine learning
methods that can be used for recommending features of process models are presented
in Section 4. Section 5 presents an example which can be considered as a suitable case
study for recommendation purposes. The paper is summarized in Section 6.
* The paper is supported by the Prosecco project.
2 Types of recommendations
Basically, recommendation methods in BP modeling can be classified according to two
schemes: a subject-based and a position-based classification. The first one concentrates on
what is actually suggested, while the second one focuses on the place where the sug-
gestion is to be put. The two schemes serve different purposes and are therefore
complementary. A hierarchy of the identified types of recommendation methods is
presented in Figure 1.
2.1 Subject-based classification
In the subject-based classification we focus on what is actually suggested. The suggestion
itself is not directly dependent on the context it is placed in. The recommendation algo-
rithms may inspect the context in order to deliver more accurate results, but the context
is not an inherent feature of the recommended item.
1. Attachment recommendations – as the name suggests, these recommendations
indicate how to link a business process (or, more precisely, a selected element of it)
with an external entity like a decision table or another process. Attachment recom-
mendations appear naturally where the user should link two already existing items.
(a) Decision tables – recommendations for a decision table describing conditions
in a gate. See an example in Figure 2.
Figure 2. Decision table suggestion
Figure 1. Identified types of recommendations. The subject-based classification comprises attachment recommendations (decision table, links, service task, subprocesses and call subprocesses), structural recommendations (single element, structure of elements) and textual recommendations (name of an element with name completion or full name suggestion, guard conditions); the position-based classification comprises forward recommendations, backward recommendations and autocomplete.
(b) Links – recommendations for a catching event that should be connected with
the selected throwing Intermediate Link Event. See an example in Figure 3.
Figure 3. Throwing Intermediate Link Event suggestion
(c) Service task – recommendation for a service task performed in the given task
item. See an example in Figure 4.
Figure 4. Service task selection with recommendation
(d) Subprocess and call subprocess – recommendation for a subprocess or call
subprocess that should be linked with the given activity (see Figure 5).
2. Structural recommendations – a new part of the diagram is suggested. One or
more elements with, for example, missing incoming or outgoing flows are selected.
The suggested structure is connected to the previously chosen elements.
(a) Single element – a single item (activity, gate, swimlane, artifact, data object
or event) is suggested. This is a more straightforward extension of editors like
Oryx/Signavio that can already insert single elements quite easily.
(b) Structure of elements – two or more items are suggested. This is a more sophis-
ticated solution where an entire part of the process is inserted into the existing,
unfinished structure.
3. Textual recommendations are suggestions of names of elements or guard condi-
tions. Either the full text can be suggested, or suggestions may be shown while the text
is being typed.
(a) Name of an element – a name of an activity, swimlane or event may be suggested.
i. Name completion happens when the user is typing the name. Several possible
completions of the partially entered name are suggested to the user.
Figure 5. Subprocess selection with recommendation
ii. Full name suggestion happens when the user wants the name to be sug-
gested by the system based on the context in which the element is placed.
(b) Guard condition suggestions are different from name suggestions because
more than one text (condition) may be suggested at once and these conditions
must satisfy the requirements of the gateway. The latter requirement implies
that semantic analysis of conditions is necessary to give meaningful sugges-
tions. See example in Figure 6.
Figure 6. Guard condition suggestion
2.2 Position-based classification
1. Forward completion – a part of the process is known and the rest of the process,
starting with one selected activity, is to be suggested. See Figure 7.
Figure 7. Forward completion
2. Backward completion – a part of the process is known and the rest of the process,
ending with one selected activity, is to be suggested. See Figure 8.
Figure 8. Backward completion
3. Autocomplete – a part of the process is known and the rest of the process is to
be suggested. A number of items with missing outgoing or incoming flows is selected;
the missing flows will lead to or from the suggested structure. See Figure 9.
Figure 9. Autocompletion
3 Recommendation Techniques for Business Process Models
Empirical studies have shown that modelers preferred to receive and use recommen-
dation suggestions during design [4]. Recommendations can be based on many fac-
tors, including the labels of elements, the current progress of the modeling process, or some ad-
ditional pieces of information, such as a process description. Several existing
approaches can be assigned to the following subject-based categories:
1. Attachment recommendations: Born et al. [5] presented an approach that sup-
ports modelers during modeling tasks by finding appropriate services that are meaningful
to the modeler. A more complex approach, which helps process designers by providing
a list of services related to the currently designed model, was proposed by Nguyen et al. [6].
They capture the composition context of the requested service, specified by the process
fragment, and recommend the services that best match the given context. The authors
also described an architecture of a recommender system based on historical usage data
for web service discovery [7].
2. Structural recommendations: Mazanek et al. [8] proposed syntax-based assis-
tance in a diagram editor which takes advantage of graph grammars for process mod-
els. Based on this research, they also proposed a sketch-based diagram editor with
user assistance based on graph transformation and graph drawing techniques [9].
Hornung et al. [10] presented the idea of interpreting process descriptions as tags
and, based on them, providing a search interface to process models stored in a repos-
itory. Koschmider and Oberweis extended this idea in [11] and presented their
recommendation-based editor for business process modeling in [4]. The editor as-
sists users by providing search functionality via a query interface for business pro-
cess models or process model parts, and it uses an automatic tagging mechanism in
order to unveil the modeling intention of a user at process modeling time. An ap-
proach proposed by Wieloch et al. [12] delivers a list of suggestions for possi-
ble successor tasks or process fragments based on an analysis of the context and annota-
tions of process tasks. Case-based reasoning for workflow adaptation was discussed
in [13]. It allows for structural adaptations of workflow instances at build time or
at run time. The approach supports the designer in performing such adaptations by
an automated method based on adaptation episodes from the past. The recorded
changes can be automatically transferred to a new workflow that is in a similar
situation of change.
3. Textual recommendations: Naming strategies for individual model fragments and
whole process models were investigated in [14]. The authors proposed an automatic naming
approach that builds on the linguistic analysis of process models from industry. This
allows for the refactoring of activity labels in business process models [15].
According to Kopp et al. [16], it is not possible to automatically deduce concrete conditions
on the sequence flows going out from the new root activity, as we cannot guess the
intention of the fragment designer. However, they presented how a single BPMN
fragment can be completed to a BPMN process using autocompletion of model
fragments, where the types of the joins are AND, OR, and XOR.
4 Machine Learning Approach for Recommendation
The idea of recommender systems evolved along with the rapid growth of the
Internet in the mid-nineties. Methods such as collaborative filtering, content-based and
knowledge-based recommendation [17] gained huge popularity in the area of web ser-
vices [18] and, more recently, in context-aware systems [19]. The principal rule
that most recommendation methods are based on exploits the idea of similarity
measures. Such measures can easily be applied to items whose features can be extracted
(e.g. book genre, price, author) and ranked according to some metric (e.g. whether the customer liked
the book or not). However, when applied to BPMN diagrams, common recommender
systems face the problem that no standard metrics exist that would allow for a
comparison of models. What is more, feature extraction from BPMN diagrams that
would allow for a precise and unambiguous description of models is very challenging and,
to our knowledge, still an unsolved issue.
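For contrast, the following minimal sketch shows the kind of feature-based similarity measure (here cosine similarity over weighted feature vectors) that conventional recommenders rely on; the items, features and weights are purely illustrative assumptions, and the point of the discussion above is precisely that no comparable feature vector is readily available for BPMN models.

import java.util.Map;

// Minimal sketch: cosine similarity between two items described by weighted
// features (e.g. genre, price bucket). Conventional recommenders rely on such
// measures; the feature vectors used here are illustrative only.
public final class ItemSimilarity {

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Double> book1 = Map.of("genre:crime", 1.0, "priceBucket:low", 1.0);
        Map<String, Double> book2 = Map.of("genre:crime", 1.0, "priceBucket:high", 1.0);
        System.out.println("similarity = " + cosine(book1, book2));
    }
}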
Therefore, other machine learning methods should be investigated with the objective
of providing recommendation mechanisms for the designer. The remainder of this section
contains an analysis of possible applications of machine learning methods to the
recommendations described in Section 2. A comprehensive summary is also provided
in Table 1. A black circle denotes full support of a particular machine learning method
for the given recommendation type, a half-filled circle denotes partial support, and an
empty circle means no or very limited support.
                                 Clustering      Decision   Bayesian      Markov
                                 algorithms(a)   trees(b)   networks(c)   chains
  Attachment recommendations     ○               ●          ○             ○
  Structural recommendations     ○               ●          ●             ●
  Textual recommendations        ○               ○          ◐             ●
  Position based classification  ○               ●          ●             ●

Table 1. Comparison of different machine learning methods for recommending features denoted in Section 2
(a) Useless as an individual recommendation mechanism, but can boost recommendation when combined with other methods
(b) No cycles in diagram
(c) No cycles in diagram
4.1 Classification
Clustering methods Clustering methods [20] are based on an optimization task that can
be described as the minimization of the cost function given by Equation 1, where K
denotes the number of clusters into which the data set should be divided.
\[
\sum_{n=1}^{N} \sum_{k=1}^{K} \lVert X_n - \mu_k \rVert^{2} \qquad (1)
\]
This cost function assumes the existence of a function f that maps an element's
features into an M-dimensional space, X ∈ R^M. This, however, requires develop-
ing methods for feature extraction from BPMN diagrams, which is a non-trivial and still
unsolved task. Hence, clustering methods cannot be used directly for recommen-
dation, but they can be very useful in combination with other methods.
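As a small illustration of the optimization task behind clustering, the sketch below evaluates a k-means style cost in the spirit of Equation 1, assuming that diagrams have already been mapped to feature vectors and that each vector contributes the squared distance to its nearest cluster centre; the feature mapping itself, as argued above, remains the open problem.

// Minimal sketch: a k-means style cost in the spirit of Equation 1, assuming
// BPMN diagrams have already been mapped to feature vectors (the feature
// extraction step is the part that remains unsolved). Each vector contributes
// the squared distance to its nearest of the K cluster centres.
public final class ClusteringCost {

    static double squaredDistance(double[] x, double[] mu) {
        double sum = 0.0;
        for (int d = 0; d < x.length; d++) {
            double diff = x[d] - mu[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Sum over all vectors of the squared distance to the closest cluster centre. */
    static double cost(double[][] x, double[][] mu) {
        double j = 0.0;
        for (double[] xn : x) {
            double best = Double.MAX_VALUE;
            for (double[] muk : mu) {
                best = Math.min(best, squaredDistance(xn, muk));
            }
            j += best;
        }
        return j;
    }

    public static void main(String[] args) {
        double[][] diagrams = { { 1.0, 0.0, 2.0 }, { 0.0, 2.0, 1.0 } };  // toy feature vectors
        double[][] centres  = { { 1.0, 0.0, 2.0 }, { 0.0, 1.5, 1.0 } };  // K = 2 cluster centres
        System.out.println("Clustering cost = " + cost(diagrams, centres));
    }
}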
Decision trees Decision trees [21] provide a powerful classification tool that exploits
the tree data structure to represent data. The most common approach for building a tree
assumes the possibility of calculating the entropy (or, based on it, the so-called information gain)
given by Equation 2.
\[
E(X) = - \sum_{i=1}^{n} p(x_i) \log p(x_i) \qquad (2)
\]
To calculate the entropy, and thus to build a decision tree, only the probability p of the
presence of some features in a given element is required. For a BPMN diagram, those
features could be diagram nodes (gateways, tasks, etc.) represented by distinct real
numbers. Having a great number of learning examples (diagrams previously built by
the user), it is possible to build a tree that can be used for predicting the next possible
element in the BPMN diagram. However, the nature of the tree structure requires the
BPMN diagram not to have cycles, which cannot always be guaranteed.
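The sketch below computes the entropy from Equation 2 for a hypothetical history of "next element" labels observed in previously built diagrams; the label names and counts are illustrative assumptions only.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: entropy (Equation 2) of the distribution of "next element"
// types observed in previously built diagrams; such a value can drive attribute
// selection when a decision tree for element prediction is induced.
public final class NextElementEntropy {

    /** Shannon entropy E(X) = - sum_i p(x_i) * log2 p(x_i). */
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double e = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / labels.size();
            e -= p * (Math.log(p) / Math.log(2));   // log base 2
        }
        return e;
    }

    public static void main(String[] args) {
        // Hypothetical observations of what followed a user task in past models.
        List<String> next = List.of("gateway", "task", "task", "endEvent", "task");
        System.out.println("E(next element) = " + entropy(next));
    }
}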
4.2 Probabilistic Graphical Models
Probabilistic Graphical Models use a graph-based representation as the basis for com-
pactly encoding a complex probability distribution over a high dimensional space [22].
The most important advantage of probabilistic graphical models over the methods described
in Section 4.1 is that it is possible to directly exploit the graphical representation of BP
diagrams, which can be almost immediately translated into such a model.
Bayesian networks A Bayesian network (BN) [23] is an acyclic graph that represents
dependencies between random variables and provides a graphical representation of the
probabilistic model. An example of a Bayesian network is presented in Figure 10.
Figure 10. Bayesian network (nodes G1, G2, B1 and B2)
The advantage of a BN is that the output of a recommendation is a set of probabilities,
allowing the suggestions to be ranked from the most probable to the least probable. For
example, to calculate the probability of the value of the random variable B1 from
Figure 10, Equation 3 can be used. G1 and G2 can denote BPMN gateways,
and B1 and B2 other blocks, e.g. tasks or events. Thus, having any of these blocks given,
we can calculate the probability of a particular block being the missing part.
\[
P(B1) = \sum_{G1} \sum_{G2} \sum_{B2} P(G1)\, P(B1 \mid G1)\, P(B2 \mid G1)\, P(G2 \mid B1, B2) \qquad (3)
\]
This method, however, will not be efficient for large diagrams, since exact inference
in Bayesian networks is an NP-hard problem. To cope with this, either small chunks
of the BPMN diagram can be selected for the inference, or approximate inference can be applied.
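To make Equation 3 concrete, the sketch below enumerates the hidden variables for the small network of Figure 10; all variables are treated as binary and the conditional probability tables are invented purely for illustration.

// Minimal sketch: computing P(B1) for the network of Figure 10 by enumerating
// G1, G2 and B2 as in Equation 3. All variables are binary and the probability
// tables below are illustrative assumptions only.
public final class TinyBayesNet {

    // P(G1 = true)
    static final double P_G1 = 0.3;
    // P(B1 = true | G1), indexed by G1 (0 = false, 1 = true)
    static final double[] P_B1_GIVEN_G1 = { 0.2, 0.7 };
    // P(B2 = true | G1), indexed by G1
    static final double[] P_B2_GIVEN_G1 = { 0.4, 0.6 };
    // P(G2 = true | B1, B2), indexed by [B1][B2]
    static final double[][] P_G2_GIVEN_B1_B2 = { { 0.1, 0.5 }, { 0.3, 0.9 } };

    static double p(boolean value, double pTrue) {
        return value ? pTrue : 1.0 - pTrue;
    }

    /** P(B1 = b1), obtained by summing the joint distribution over G1, G2 and B2. */
    static double marginalB1(boolean b1) {
        double sum = 0.0;
        for (boolean g1 : new boolean[] { false, true }) {
            for (boolean b2 : new boolean[] { false, true }) {
                for (boolean g2 : new boolean[] { false, true }) {
                    sum += p(g1, P_G1)
                         * p(b1, P_B1_GIVEN_G1[g1 ? 1 : 0])
                         * p(b2, P_B2_GIVEN_G1[g1 ? 1 : 0])
                         * p(g2, P_G2_GIVEN_B1_B2[b1 ? 1 : 0][b2 ? 1 : 0]);
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("P(B1 = true)  = " + marginalB1(true));
        System.out.println("P(B1 = false) = " + marginalB1(false));
    }
}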
Markov Chains A Markov chain [24] is defined in terms of a graph over a state space Val(X)
and a transition model τ that defines, for every state x ∈ Val(X), a next-state distri-
bution over Val(X). These models are widely used for text auto-completion and text
correction, but they can easily be extended to cope with other problems such as structural
recommendations or position-based classification.
We can treat the different BPMN block types as states x, and the connections between
them as the transition model τ.
Figure 11. Markov Chain (states Gateway, Task and Event, with transition probabilities annotated on the arrows)
An example Markov chain model is presented in Figure 11. The numbers above the
arrows denote the transition probabilities from one state to another. Unlike decision
trees and Bayesian networks, Markov chains allow for cycles.
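The following sketch illustrates how such a transition model could be estimated from block-type sequences taken from previously modeled diagrams and then used to rank candidates for the next block; the sequences and type names are illustrative assumptions.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: a first-order Markov chain over BPMN block types.
// Transition probabilities are estimated from (hypothetical) sequences of
// block types taken from previously modeled diagrams, then used to rank
// candidates for the next block after a given one.
public final class BlockTypeMarkovChain {

    // counts.get(from).get(to) = number of observed transitions from -> to
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    void observe(List<String> sequence) {
        for (int i = 0; i + 1 < sequence.size(); i++) {
            counts.computeIfAbsent(sequence.get(i), k -> new HashMap<>())
                  .merge(sequence.get(i + 1), 1, Integer::sum);
        }
    }

    /** Successor types of 'current', ordered from most to least probable. */
    List<Map.Entry<String, Double>> rankNext(String current) {
        Map<String, Integer> successors = counts.getOrDefault(current, Map.of());
        double total = successors.values().stream().mapToInt(Integer::intValue).sum();
        List<Map.Entry<String, Double>> ranked = new ArrayList<>();
        for (Map.Entry<String, Integer> e : successors.entrySet()) {
            ranked.add(Map.entry(e.getKey(), e.getValue() / total));
        }
        ranked.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return ranked;
    }

    public static void main(String[] args) {
        BlockTypeMarkovChain chain = new BlockTypeMarkovChain();
        // Hypothetical block-type sequences from a model repository.
        chain.observe(List.of("startEvent", "task", "gateway", "task", "endEvent"));
        chain.observe(List.of("startEvent", "task", "task", "endEvent"));
        System.out.println("After 'task': " + chain.rankNext("task"));
    }
}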
5 Case Study
The presented recommendation approaches can be applied to process models, especially
those modeled on the basis of existing processes in a model repository. For the purpose
of evaluation, we prepared three different BPMN models: the bug tracking processes of
Django and JIRA, and the issue tracking approach of VersionOne. A bug tracking
system is a software application that helps in tracking and documenting the reported
software bugs (or other software issues in a more general case). Such a system is often
integrated with other software project management applications.
Figure 12. Django
Figure 13. Jira
Figure 14. VersionOne
We selected these models as a case study because of their similarity. As the processes
of different bug trackers exhibit variability, such an example can easily be used
for recommendation purposes when modeling a new flow for a bug tracking system.
6 Conclusion and future work
This paper focuses on the problem of recommendation methods in BP modeling. Such
methods help in speeding up the modeling process and in producing less error-prone mod-
els than modeling from scratch. The original contribution of the paper is the introduction of
a categorization of recommendation approaches in BP modeling and a short overview of
machine learning methods corresponding to the presented recommendations.
Our future work will focus on specifying a recommendation approach for company
management systems in order to enhance the modeling process, as well as on the evaluation
of the selected recommendation methods. We plan to carry out a set of experiments aimed at
testing the recommendation approaches on various model sets.
References
1. OMG: Business Process Model and Notation (BPMN): Version 2.0 specification. Technical
Report formal/2011-01-03, Object Management Group (2011)
2. Silver, B.: BPMN Method and Style. Cody-Cassidy Press (2009)
3. Mendling, J., Reijers, H.A., van der Aalst, W.M.P.: Seven process modeling guidelines
(7pmg). Information & Software Technology 52 (2010) 127–136
4. Koschmider, A., Hornung, T., Oberweis, A.: Recommendation-based editor for business
process modeling. Data & Knowledge Engineering 70 (2011) 483 – 503
5. Born, M., Brelage, C., Markovic, I., Pfeiffer, D., Weber, I.: Auto-completion for executable
business process models. In Ardagna, D., Mecella, M., Yang, J., eds.: Business Process
Management Workshops. Volume 17 of Lecture Notes in Business Information Processing.
Springer Berlin Heidelberg (2009) 510–515
6. Chan, N., Gaaloul, W., Tata, S.: Context-based service recommendation for assisting busi-
ness process design. In Huemer, C., Setzer, T., eds.: E-Commerce and Web Technologies.
Volume 85 of Lecture Notes in Business Information Processing. Springer Berlin Heidelberg
(2011) 39–51
7. Chan, N., Gaaloul, W., Tata, S.: A recommender system based on historical usage data for
web service discovery. Service Oriented Computing and Applications 6 (2012) 51–63
8. Mazanek, S., Minas, M.: Business process models as a showcase for syntax-based assistance
in diagram editors. In: Proceedings of the 12th International Conference on Model Driven
Engineering Languages and Systems. MODELS ’09, Berlin, Heidelberg, Springer-Verlag
(2009) 322–336
9. Mazanek, S., Rutetzki, C., Minas, M.: Sketch-based diagram editors with user assistance
based on graph transformation and graph drawing techniques. In de Lara, J., Varro, D., eds.:
Proceedings of the Fourth International Workshop on Graph-Based Tools (GraBaTs 2010),
University of Twente, Enschede, The Netherlands, September 28, 2010. Satellite event of
ICGT’10. Volume 32 of Electronic Communications of the EASST. (2010)
10. Hornung, T., Koschmider, A., Lausen, G.: Recommendation based process modeling sup-
port: Method and user experience. In: Proceedings of the 27th International Conference on
Conceptual Modeling. ER ’08, Berlin, Heidelberg, Springer-Verlag (2008) 265–278
11. Koschmider, A., Oberweis, A.: Designing business processes with a recommendation-based
editor. In Brocke, J., Rosemann, M., eds.: Handbook on Business Process Management 1.
International Handbooks on Information Systems. Springer Berlin Heidelberg (2010) 299–
312
12. Wieloch, K., Filipowska, A., Kaczmarek, M.: Autocompletion for business process mod-
elling. In Abramowicz, W., Maciaszek, L., Węcel, K., eds.: Business Information Systems
Workshops. Volume 97 of Lecture Notes in Business Information Processing. Springer
Berlin Heidelberg (2011) 30–40
13. Minor, M., Bergmann, R., Görg, S., Walter, K.: Towards case-based adaptation of workflows.
In Bichindaritz, I., Montani, S., eds.: ICCBR. Volume 6176 of Lecture Notes in Computer
Science., Springer (2010) 421–435
14. Leopold, H., Mendling, J., Reijers, H.A.: On the automatic labeling of process models. In
Mouratidis, H., Rolland, C., eds.: Advanced Information Systems Engineering. Volume 6741
of Lecture Notes in Computer Science. Springer Berlin Heidelberg (2011) 512–520
15. Leopold, H., Smirnov, S., Mendling, J.: On the refactoring of activity labels in business
process models. Information Systems 37 (2012) 443–459
16. Kopp, O., Leymann, F., Schumm, D., Unger, T.: On bpmn process fragment auto-completion.
In Eichhorn, D., Koschmider, A., Zhang, H., eds.: Services und ihre Komposition. Proceed-
ings of the 3rd Central-European Workshop on Services and their Composition, ZEUS 2011,
Karlsruhe, Germany, February 21/22. Volume 705 of CEUR Workshop Proceedings., CEUR
(2011) 58–64
17. Jannach, D., Zanker, M., Felfernig, A., Friedrich, G.: Recommender Systems: An Introduc-
tion. Cambridge University Press (2011)
18. Wei, K., Huang, J., Fu, S.: A survey of e-commerce recommender systems. In: Service
Systems and Service Management, 2007 International Conference on. (2007) 1–5
19. Bobek, S., Nalepa, G.J.: Overview of context-aware reasoning solutions for mobile devices.
Proposal of a rule-based approach. Computer Science and Information Systems (2014) ac-
cepted.
20. Bishop, C.M.: Pattern Recognition and Machine Learning (Information Science and Statis-
tics). Springer-Verlag New York, Inc., Secaucus, NJ, USA (2006)
21. Mitchell, T.M.: Machine Learning. MIT Press and The McGraw-Hill companies, Inc. (1997)
22. Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. MIT
Press (2009)
23. Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Machine Learning
29 (1997) 131–163
24. Rabiner, L., Juang, B.H.: An introduction to hidden markov models. ASSP Magazine, IEEE
3 (1986) 4–16
A P ROLOG Framework
for Integrating Business Rules into JAVA Applications
Ludwig Ostermayer, Dietmar Seipel
University of Würzburg, Department of Computer Science
Am Hubland, D – 97074 Würzburg, Germany
{ludwig.ostermayer,dietmar.seipel}@uni-wuerzburg.de
Abstract. Business specifications – that formerly only supported IT develop-
ment – increasingly become business configurations in the form of rules that can
be loaded directly into IT solutions. P ROLOG is well–known for its qualities in
the development of sophisticated rule systems. It is desirable to combine the ad-
vantages of P ROLOG with JAVA, since JAVA has become one of the most used
programming languages in industry. However, experts of both programming lan-
guages are rare.
To overcome the resulting interoperability problems, we have developed a frame-
work which generates a JAVA archive that provides methods to query a given set
of P ROLOG rules; it ensures that valid knowledge bases are transmitted between
JAVA and P ROLOG. We use X ML Schema for describing the format for exchang-
ing a knowledge base between P ROLOG and JAVA. From the X ML Schema de-
scription, we scaffold JAVA classes; the JAVA programmer can use them and fill
in the open slots by statements accessing other JAVA data structures. The data
structure on the JAVA side reflects the complex structured knowledge base, with
which the P ROLOG rules work, in an object–oriented way.
We can to some extent verify the correctness of the data set / knowledge base sent
from JAVA to P ROLOG using standard methods for X ML Schema. Moreover, we
can add constraints that go beyond X ML. For instance, we can specify standard
integrity constraints known from relational databases, such as primary key, for-
eign key, and not–null constraints. Since we are dealing with complex structured
X ML data, however, there can be far more general integrity constraints. These can
be expressed by standard P ROLOG rules, which can be evaluated on the P ROLOG
side; they could also be compiled to JAVA by available P ROLOG to JAVA convert-
ers such as Prolog Cafe – since they will usually be written in a supported subset
of P ROLOG.
We have used our framework for integrating P ROLOG business rules into a com-
mercial E–Commerce system written in JAVA.
Keywords. Business Rules, Logic Programming, P ROLOG, JAVA.
1 Introduction
P ROLOG is well–known for its qualities in rapid prototyping and agile software devel-
opment, and for building expert systems. In this paper we present an approach that
allows P ROLOG rules to be integrated seamlessly into JAVA applications. We have largely
automated the integration process with our framework P BR 4J (P ROLOG Business Rules
for JAVA). P BR 4J uses X ML Schema documents, from which it generates (scaffolds)
JAVA classes containing the information necessary for utilizing the business rules. The
business rules are accessed from JAVA simply by invoking the generated JAVA meth-
ods. From the JAVA point of view, the fact that a set of P ROLOG rules is requested is
hidden. The derived facts can be accessed as a result set by JAVA getter methods. In
terms of Domain Specific Languages (D SL) [8], we use P ROLOG as an external D SL
for expressing rules. Thus, our approach enables a clean separation between a JAVA ap-
plication and the business logic, and applications can benefit from the strengths of both
programming languages.
Our work relates to the following approaches. We have already discussed the usage of
D ROOLS [6], a popular JAVA tool for business rules development, and the advantages
of knowledge engineering for business rules in P ROLOG [11, 12]. There are several so-
lutions for a communication between JAVA and P ROLOG, for instance the bidirectional
P ROLOG/JAVA interface J PL [17] that combines certain C functions and JAVA classes.
On the JAVA side, J PL uses the JAVA Native Interface (J NI), on the P ROLOG side it
uses the P ROLOG Foreign Language Interface (F LI). When working with J PL, one has
to create rather complex query strings and explicitly construct term structures prior to
querying. Slower in performance than J PL, I NTER P ROLOG [5] provides a direct map-
ping from JAVA objects to P ROLOG terms, and vice versa. P ROLOG C AFE [2] translates
a P ROLOG program into a JAVA program via the Warren Abstract Machine (WAM), and
then compiles it using a standard JAVA compiler. P ROLOG C AFE offers a core P ROLOG
functionality, but it lacks support for many P ROLOG built–in predicates from the ISO
standard.
However, the challenge of our work was not to develop another interface between
JAVA and P ROLOG, but to simplify the access to the P ROLOG rules and data structures
in JAVA. We did not mix P ROLOG and JAVA syntax for querying the P ROLOG rules in
JAVA. Rules can be developed independently from JAVA, and our framework ensures only
valid calls from JAVA to the P ROLOG rules. We just write the rules in P ROLOG and use
P BR 4J to generate JAVA classes; request and result handling are encapsulated in stan-
dard JAVA objects. Therefore in JAVA, the flavour of programming is unchanged. On
the other hand, the easy–to–handle term structures and the powerful meta–predicates of
P ROLOG can be used to develop sophisticated rule systems. Furthermore, using P RO -
LOG ’s parsing techniques (DCGs) and infix operators, the rule syntax can be largely
adapted to a natural language level, which simplifies the rule creation process and im-
proves the readability of the rules. In particular, this is important to bridge the gap
between software engineers and business analysts without programming background.
The structure of this paper is as follows. Section 2 presents a set of business rules
written in P ROLOG, which will serve as a running example. In Section 3, we describe
our framework: first, we represent a knowledge base in X ML and generate a correspond-
ing X ML Schema. Then, we generate JAVA classes from the X ML schema. In Section 4,
we give an example of a JAVA call to the business rules in P ROLOG. Finally, we sum-
marize our work in Section 5.
2 Business Rules in P ROLOG
In the following, we present a set of business rules in P ROLOG, which is part of a real com-
mercial Enterprise Resource Planning (ERP) system for online merchants. The purpose
of the business rules is to infer financial key data and costs in a given E–Commerce sce-
nario that is dealing with articles, online shopping platforms, shipping parameters, and
various other parameters. The derived data support the business intelligence module of
the application, which is implemented in JAVA.
Due to space restrictions, we present only a simplified version of the original set of
business rules used in the application. We focus on a constellation consisting of order,
platform and shipment charges. For every shipment, taxes have to be paid according to
the country of dispatch. In our example, the inferred financial key data are gross margin,
contribution margin and profit ratio. First, we describe the input data format necessary
for a valid request, then we take a closer look at the business rules. Finally, we explain
how to describe relationships between facts in a knowledge base and how to check them
in P ROLOG.
2.1 The Knowledge Base
The input knowledge base consists of global data and orders. A P ROLOG fact of the
form tax(Country, Rate) describes the purchase tax rate of a country. The P RO -
LOG facts of the form platform_charges(Category, Commission, Discount)
describe the different commissions that online shopping platforms charge according to
article categories and merchant discounts [7]. A P ROLOG fact of the form
shipping_charges(Country, Logistician, Charges) shows the price
that a logistician charges for a shipment to a given country.
Listing 1.1: Global Data
tax(’Germany’, 0.190).
platform_charges(’Books’, 0.11, 0.05).
shipping_charges(’Germany’, ’Fast Logistics’, 4.10).
An order is a complex data structure – represented by a P ROLOG term – consisting of
an article, the country of dispatch, and the used logistician. An article is a data structure
relating a category and the prices (in Euro), i.e., base price and market price; every
article has a unique identifier EAN (European Article Number; usually 13 digits, but
we use only 5 digits in this paper).
Listing 1.2: An Order
order( article(’98765’, ’Books’, prices(29.00, 59.99)),
’Germany’, ’Fast Logistics’ ).
2.2 The Rule Base
The following business rule demonstrates the readability and compactness offered by
P ROLOG. Join conditions can be formulated easily by common variable symbols, and
the term notation offers a convenient access to objects and subobjects in P ROLOG; more
than one component can be accessed in a single line. Usually, If–Then–Else statements
with many alternatives are hard to review in JAVA, but much easier to write and read
in P ROLOG. Due to the rules approach, multiple results are inferred implicitly; in a
DATALOG style evaluation, there is no need to explicitly encode a loop. In a P ROLOG
style evaluation, all results can be derived using the meta–predicate findall/3.
Using the P ROLOG package DATALOG∗ [14] from the D IS L OG Developers’ Kit
(D DK), we can, e.g., support the development phase in P ROLOG by visualizing the rule
execution with proof trees [11]. DATALOG∗ allows for a larger set of connectives (in-
cluding conjunction and disjunction), for function symbols, and for stratified P ROLOG
meta–predicates (including aggregation and default negation) in rule bodies.
The main predicate in the business rule base computes the financial key data for a
single order. The facts of the input knowledge base will be provided by the JAVA appli-
cation, as we will see later. Derived financial_key_data/2 facts are collected in
a P ROLOG list, which will be presented as a result set to JAVA.
Listing 1.3: Business Rules for Financial Key Data
financial_key_data(Order, Profits) :-
order_to_charges(Order, Charges),
Order = order(article(_, _, prices(Base, Market)), _, _),
Charges = charges(Shipping, Netto, Fees),
Gross_Profit is Netto - Base,
C_Margin is Gross_Profit - Fees - Shipping,
Profit_Ratio is C_Margin / Market,
Profits = profits(Gross_Profit, C_Margin, Profit_Ratio).
order_to_charges(Order, Charges) :-
Order = order(Article, Country, Logistician),
Article = article(_, Category, prices(_, Market)),
call(Order),
tax(Country, Tax_Rate),
shipping_charges(Country, Logistician, Gross_Shipping),
Shipping is Gross_Shipping / (1 + Tax_Rate),
Netto is Market / (1 + Tax_Rate),
platform_charges(Category, Commission, Discount),
Fees is Market * Commission * (1 - Discount),
Charges = charges(Shipping, Netto, Fees).
The predicate order_to_charges/2 first computes the charges for the ship-
ment, then an article’s netto price using the tax rate of the country of dispatch, and
finally the fees for selling an article on the online platform in a given category. We use
the P ROLOG terms Order, Profits, and Charges to keep the argument lists of the
rule heads short. E.g., order_to_charges/2 extracts the components of Order in
line 2 and calls the term Order in line 4. Thus, we can avoid writing the term Order
repeatedly – in the head and in the call. In the code, we can see nicely, which com-
ponents of Order are used in which rule, since the other components are labeled by
underscore variables.
2.3 Constraints in P ROLOG
In knowledge bases, facts often reference each other. E.g., in our business rules applica-
tion, we have the following foreign key constraints: for every order/3 fact, there must
exist corresponding facts for tax/2 and shipping_charges/3, whose attribute
values for Country match the attribute value for Country in order/3. The same
holds for the category in platform_charges/3 and the category in order/3. An-
other frequently occurring type of constraint is a restriction on argument values; e.g.,
the values for Country could be limited to countries of the European Union.
This meta information between facts in a knowledge base usually remains hidden;
the developer of the rule set knows these constraints, and only sometimes they are easy
to identify within the set of business rules. For validation purposes of knowledge bases,
however, this information is crucial, in particular when a knowledge base for a request
is arranged by a programmer other than the creator of the set of rules.
Constraints, such as the foreign key constraints from above, can simply be speci-
fied and tested in P ROLOG. The execution of the P ROLOG predicate constraint/1
is controlled using meta–predicates for exception handling from S WI P ROLOG. With
print_message/2, a meaningful error message can be generated, and exceptions
can be caught with catch/3. In Listing 1.4, the foreign key constraints on Country
and Category are checked.
We can also represent standard relational constraints in X ML. X ML representations
for create table statements have been developed and used in [3, 16]. Thus the
knowledge base – including the constraints – can be represented in X ML.
Listing 1.4: Foreign Key Constraints
constraint(fk(shipping_charges)) :-
forall( shipping_charges(Country, _, _),
tax(Country, _) ).
constraint(fk(article_charges)) :-
forall( article(_, Category, _),
platform_charges(Category, _, _) ).
3 Integration of P ROLOG Business Rules into JAVA
The workflow of P BR 4J follows three steps, cf. Figure 1. First, P BR 4J extracts an X ML
Schema description for the knowledge base and the result set of a given set of P ROLOG
rules. Then, the user must extend the extracted X ML Schema with names for atomic
arguments, numbers, and strings from P ROLOG, and review the type description. Finally,
P BR 4J uses the X ML Schema to generate JAVA classes and packs the generated classes
into a JAVA Archive (JAR). After embedding the JAR into the JAVA application, the set
of P ROLOG rules can be called from JAVA. The facts derived in P ROLOG are sent back
to JAVA, where they are parsed; then, they can be accessed by the generated classes.
Fig. 1: Workflow of P BR 4J
In the following, we describe the transformation of the knowledge base to an X ML
representation, from which we subsequently extract the X ML Schema. Then we show
that the JAVA classes generated from the X ML Schema reflect the complex structured
knowledge base in an object–oriented way. The result set is handled in a similar man-
ner; thus, we describe only the transformation of the knowledge base and omit further
processing details for the result set.
3.1 An X ML Schema for the Knowledge Base
X ML is a well–known standard for representing and exchanging complex structured
data. It allows for representing P ROLOG terms and improves the interoperability be-
tween P ROLOG and JAVA programs, since X ML is easy to read. We extract an X ML
Schema from the X ML representation of the knowledge base, and we generate JAVA
classes from the extracted X ML Schema.
We use the predicate prolog_term_to_xml(+Term, -Xml) for the trans-
formation of a P ROLOG term to X ML. Listing 1.5 shows the X ML representation for the
P ROLOG term with the predicate symbol order/3. Notice the X ML attribute type
and the names of elements representing arguments of complex terms on the P ROLOG
side.
Listing 1.5: An Order in X ML Format
<order type="class">
  <country type="string">Germany</country>
  <logistician type="string">Fast Logistics</logistician>
  <article type="class">
    <ean type="string">98765</ean>
    <category type="string">Books</category>
    <prices type="class">
      <base type="double">29.00</base>
      <market type="double">59.99</market>
    </prices>
  </article>
</order>
These are necessary, because JAVA is a typed language, whereas P ROLOG builds
data structures from a few basic data types. The representation for class attributes in
JAVA is a typed Name="Value" pair. In order to map the knowledge base from P RO -
LOG to JAVA , we must give names to arguments of P ROLOG facts, if they are atomic,
numbers, or strings, and we must add type information. The functor of a complex
P ROLOG term is mapped to the tag of an element with type="class". The structure
of the X ML representation can easily be generated from the P ROLOG term structure,
and some of the type information can be inferred automatically from the basic P ROLOG
data types. But, type preferences and meaningful names for atoms, numbers, and strings
must be inserted manually.
From the X ML representation of the knowledge base and the result set, we can ex-
tract a describing X ML Schema using P ROLOG. The X ML Schema is a natural way to
describe and to define the complex data structure. Known techniques are available for
validating the X ML representation of the knowledge base w.r.t. the X ML Schema. List-
ing 1.6 shows the description of an order/3 term in X ML Schema. The X ML Schema
of the knowledge base can contain further information in attributes like minOccurs
and maxOccurs.
Listing 1.6: Fragment of the X ML Schema describing order/3
3.2 Scaffolding of JAVA Code
From the X ML Schema, we generate JAVA classes using the P ROLOG–based X ML trans-
formation language F N T RANSFORM [13]. F N T RANSFORM offers recursive transfor-
mations of X ML elements using a rule formalism similar to – but more powerful than –
X SLT. Every xsd:element in the schema with a complex type will be mapped to
a JAVA class. Child elements with simple content are mapped to attributes. Figure 2
shows a fragment of the U ML diagram for the generated classes.
Fig. 2: Generated Classes
All classes associated with the class KnowledgeBase implement the methods
check and toPrologString. An example of the method toPrologString of
the generated class Order is shown in Listing 1.7. A recursive call of check ensures
that all necessary input data are set before the method toPrologString is called
to build a knowledge base in a string format, which can be parsed easily by P ROLOG
using the predicate string_to_atom/2. The transformation to a P ROLOG term can
be achieved by atom_to_term/3.
Parts of the generated class RuleSet are shown in Listing 1.8. The method query
sends a P ROLOG goal together with a knowledge base in string format from JAVA to
P ROLOG. As a default interface between JAVA and P ROLOG, we have implemented
a simple connection with a communication layer based on standard TCP/IP sockets.
Other interfaces can be implemented and set as the value for the attribute prolog-
Interface of the class RuleSet. The default interface is represented by the class
PrologInterface, which is fixed and is not regenerated every time a set of P ROLOG rules
is integrated into a given JAVA application via P BR 4J. The class PrologInterface
must be integrated into the JAVA application once, and it must be accessible for all
generated classes of the type RuleSet.
Listing 1.7: toPrologString in Order
public String toPrologString() {
this.check();
StringBuilder sb = new StringBuilder();
sb.append( "order" + "(" +
this.article.toPrologString() + ", " +
"'" + this.getCountry() + "'" + ", " +
"'" + this.getLogistician() + "'" + ")" );
return sb.toString();
}
The result set that is sent back from P ROLOG to JAVA is parsed by the method
parseResponse of the class RuleSet. As for the class PrologInterface, the
class PrologParser is not generated and must be integrated into the JAVA appli-
cation once and be accessible for all generated classes of the type RuleSet. The
method parseProlog of PrologParser saves the content of the string returned
from P ROLOG in a structured way to a hashmap. The hashmap can then be further pro-
cessed efficiently by the method readData that all classes associated with the class
ResultSet must implement. The method readData analyses the hashmap and fills
the data list of the class ResultSet.
Listing 1.8: The Class RuleSet
package pbr4j.financial_key_data;
public class RuleSet {
private PrologInterface prologInterface = null;
private String name = "financial_key_data";
private KnowledgeBase knowledgeBase = null;
private ResultSet resultSet = null;
// ... code that we omit...
private void parseResponse(String prologString) {
DataList data = PrologParser.parseProlog(prologString);
this.resultSet = new ResultSet();
this.resultSet.readData(data); }
// ... code that we omit...
public void query(KnowledgeBase kb) {
if (prologInterface == null) {
this.setDefaultInterface(); }
String response = prologInterface.callProlog(
this.name, kb.toPrologString());
this.parseResponse(response); }
// ... code that we omit...
}
All generated classes are organised in a namespace via a JAVA package. The package
access protection ensures that the class RuleSet can only contain a Knowledge-
Base from the same package. The package can be stored in a JAVA Archive (JAR) – a
compressed file that can not be changed manually. This creates an intentional generation
gap, cf. Fowler [8]. The JAR file can be embedded into any JAVA application easily, and
all classes in the JAR become fully available to the JAVA developers.
4 A JAVA Call to the Business Rules
In the following, we will give a short example of a request to a set of P ROLOG rules
using the classes generated with P BR 4J. Listing 1.9 shows a test call from JAVA to the
set of business rules described in Section 2. We omit object initialisation details, but we
assume that the necessary objects for a successful call are provided. For improving the
readability of the result of the call, we assume that all classes associated with the class
ResultSet implement the method toPrologString.
Listing 1.9: A JAVA Call to the Business Rules
import pbr4j.financial_key_data.*;
public class TestCall {
public static void main(String[] args) {
RuleSet rules = new RuleSet();
KnowledgeBase kb = new KnowledgeBase();
// ... filling the knowledge base with data ...
rules.query(kb);
ListIterator