=Paper= {{Paper |id=Vol-1128/intro9 |storemode=property |title= Configuring Domain Knowledge for Natural Language Understanding |pdfUrl=https://ceur-ws.org/Vol-1128/paper9.pdf |volume=Vol-1128 |dblpUrl=https://dblp.org/rec/conf/confws/SelwayMS13 }} == Configuring Domain Knowledge for Natural Language Understanding== https://ceur-ws.org/Vol-1128/paper9.pdf
Matt Selway, Wolfgang Mayer, Markus Stumptner                                                                                    63




         Configuring Domain Knowledge for Natural Language Understanding

                         Matt Selway and Wolfgang Mayer and Markus Stumptner
                                       University of South Australia
                                                 Adelaide
                          {.}@unisa.edu.au


Abstract
Knowledge-based configuration has been used for numerous applications including natural language processing (NLP). By formalising property grammars as a configuration problem, it has been shown that configuration can provide a flexible, non-deterministic method of parsing natural language. However, this work focuses only on syntactic parsing. In contrast, configuration is usually performed using knowledge about a domain and is semantic in nature. Therefore, we argue that configuration has the potential to be used not only for syntactic processing but also for the semantic processing of natural language, effectively supporting Natural Language Understanding (NLU).
In this paper, we propose an approach to NLP that applies configuration to the (partial) domain model evoked by the processing of a sentence. This has the benefit of ensuring that the meaning of the sentence is consistent with the existing domain knowledge. Moreover, it allows the dynamic incorporation of domain knowledge in the configuration model as the text is processed. We demonstrate the approach on a business specification based on the Semantics of Business Vocabulary and Business Rules.

1 Introduction
Knowledge-based configuration has been used in numerous applications. While historically used for configuring physical products, configuration has been applied to other domains such as software services, software product lines, and constraint-based language parsing [Hotz and Wolter, 2013].
In particular, [Estratat and Henocque, 2004; Kleiner et al., 2009] have applied configuration to a translation of property grammars (a constraint-based linguistic formalism). By formalising property grammars as a configuration problem, they show that configuration can provide a flexible, non-deterministic method for parsing natural language. However, these approaches focus on syntactic parsing by using the configuration process to generate a parse tree. In contrast, configuration is usually applied to domain knowledge, that is, semantic processing. Furthermore, in [Kleiner et al., 2009] additional processes are required in order to transform the parse tree into a semantic model to make use of the domain knowledge. This causes some issues in ensuring the consistency and correctness of the domain knowledge.
In this paper we present an approach to parsing natural language that performs semantic processing directly using configuration. Instead of a model of language categories (e.g. noun, verb, noun phrase) and the properties (or constraints) on those categories, such as property grammars, we use a model of domain concepts and the relations between them. As a result, we perform natural language understanding, at least with respect to the semantic model being used.
Our approach maintains the advantages of using configuration for natural language processing, while gaining the following: (1) a simplified lexicon containing minimal lexical information, (2) improved consistency of the domain knowledge, as the configuration process ensures its consistency during parsing, and (3) the semantic disambiguation of terms.
As our aim is to support the translation of informal natural language business specifications into formal models, we demonstrate our approach on an example from the business domain. The example business specification is defined using the Semantics of Business Vocabulary and Business Rules (SBVR) [OMG, 2008], which we use as our semantic model. SBVR and the example are discussed in more detail later.
The remainder of this paper is organised as follows: Section 1.1 provides a brief introduction to SBVR and its concepts, Section 2 presents an example that will be used throughout the paper, Section 3 describes our approach to parsing natural language, Section 4 presents experimental results, Section 5 discusses related work, and Section 6 provides insight into future work and concludes the paper.

1.1 Brief overview of SBVR
The Semantics of Business Vocabulary and Business Rules (SBVR) is a standard developed by the Object Management Group (OMG) to facilitate the transfer of business knowledge between business people and the technical experts responsible for developing information systems for them [OMG, 2008]. It encompasses two aspects: (1) a meta-model for representing vocabularies, facts, and rules in a machine-processable format, and (2) a controlled English notation for representing the same vocabularies, facts, and rules in a form more suited to people. Therefore, SBVR supports the exchange of business specifications between both organisations and software tools.




                                                                                     Michel Aldanondo and Andreas Falkner, Editors
                                                                        Proceedings of the 15th International Configuration Workshop
                                                                                                  August 29-30, 2013, Vienna, Austria

The SBVR meta-model standardises concepts for the definition of business vocabularies (i.e. sets of concepts relevant to a particular organisation or business domain) and rules relating to those vocabularies. It is based on formal logic, primarily first-order predicate logic with an extension in modal logic (necessity, possibility, permissibility, and obligation).
Within the meta-model, vocabularies are defined on the basis of Meanings and Representations. They consist of sets of interrelated object types, individual concepts, and fact types. A distinction is made between a meaning and its representation, allowing a single concept to be represented with multiple words (possibly in different languages), images, or sounds.
The semantic structure of business rules is formed by the Logical Formulations aspect of the meta-model. This includes concepts for first-order logical operators (e.g. conjunction, disjunction, implication), quantification (e.g. universal, existential, exactly n), and modal operators (e.g. necessity, obligation). These concepts allow business people to define structural and operative rules. Structural rules, such as cardinality constraints on the relations between concepts, cannot be violated, while operative rules may be violated by a person involved in conducting the business.

2 Motivating Example
This section introduces an example that identifies the limitations of existing approaches and that will be used in the remainder of this paper to demonstrate our approach. It is an extract of the EU-Rent business specification included in the SBVR specification [OMG, 2008, Annex E].
EU-Rent is a fictional car rental company with a global presence. The example business specification defines domain-specific vocabulary and rules for EU-Rent, its structure, and how it conducts its business. Figure 1 shows a portion of the vocabulary related to the organisational structure of EU-Rent. The following is a structural rule, based on this vocabulary, that defines a cardinality constraint on the part-of relationship between a ‘branch’ and a ‘local area’.
(1) Each branch is included in exactly one local area.

rental organisation unit
   Definition: organisational unit that operates part of EU-Rent’s car rental business
rental organisation unit having rental responsibility
   Definition: the rental organisation unit is responsible for the operation of customer-facing rental business
rental organisation unit having area responsibility
   Definition: the rental organisation unit includes organisation units for which it has the responsibility to coordinate operations and ensure resources
local area
   Definition: rental organisation unit that has area responsibility
branch
   Definition: rental organisation unit that has rental responsibility
branch is included in local area
   Synonymous Form: local area includes branch
Figure 1: Business vocabulary used by the example rule with SBVR markup: object types, fact types, and keywords.

Although quite simple, this example demonstrates a number of important concepts such as nouns, verbs, and quantifiers. The approach of [Kleiner et al., 2009] processes the sentence by executing a series of model transformations that result in a UML model of the text, using SBVR as an intermediate model between the natural language text and UML. The steps up to the creation of the SBVR model are as follows:
1. a text-to-model transformation creates an Ordered Words model that annotates the words with their position in the sentence,
2. a model-to-model transformation creates a Labelled Words model by labelling each word with its possible syntactic categories using a lexicon (model),
3. configuration is used to transform the Labelled Words model into a Syntax model, performing syntactic and grammatical analyses, and
4. a model-to-model transformation creates an SBVR model from the Syntax model.
Performing this process on the example sentence would result in the Syntax and SBVR models displayed in Figure 2.
This approach provides a flexible, non-deterministic, and extensible method of parsing natural language [Kleiner et al., 2009]; however, it has several issues, chiefly: (1) it is primarily syntactic, (2) it requires a detailed lexicon, and (3) the mapping to SBVR can be problematic, e.g. two correct interpretations of ‘local area’ can be conceived.
The first is the main issue: although [Estratat and Henocque, 2004] suggest that configuration can combine syntactic and semantic analysis, it is primarily used for generating syntactic parse trees. As a result, a sentence could be syntactically correct but not meaningful and, therefore, must be linked to the semantics somehow (e.g. through a model transformation to SBVR like that used in [Kleiner et al., 2009]).
Although [Kleiner et al., 2009] introduce some semantic elements into their model (i.e. each category can be linked to a basic element of the SBVR model¹), they do so only to ease the transformation to the SBVR model. The existence of the SBVR elements does not provide any semantic guarantees. Therefore, semantic inconsistencies need to be resolved either during the transformation to SBVR, making it much more complex, or by post-processing of the SBVR model. However, this seems unnecessary if it can be achieved during the configuration process itself.
The requirement of a detailed lexicon is more an issue for our target application area than for the parsing method itself. In the context of businesses creating and maintaining their own sets of domain-specific business vocabularies and business rules, we do not see business people defining detailed lexicons with linguistic information such as voice, genre, transitivity, etc. In this application area, the business vocabulary is more like a glossary containing domain-specific words and their definitions, like that of Figure 1, than a dictionary containing detailed lexical information. Therefore, our purposes call for an approach that requires less linguistic information to be defined.

¹ This is not shown in Figure 2a for readability.
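Issue (3) and the glossary-versus-dictionary point can be made concrete with a toy example. The following Python sketch is purely illustrative and is neither the implementation of [Kleiner et al., 2009] nor part of our approach: it assumes a hypothetical lexicon in which the glossary term ‘local area’ from Figure 1 coexists with the general-purpose words ‘local’ and ‘area’, and it enumerates every way of covering rule (1) with consecutive lexicon entries. The category labels, and the treatment of ‘is included in’ as a single entry, are assumptions made for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: expression (tuple of words) -> category label.
# 'local area' is a glossary term (Figure 1); 'local' and 'area' are
# general-purpose words that a broader lexicon would also contain.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("each",): "quantifier",
    ("branch",): "noun",
    ("is", "included", "in"): "verb",
    ("exactly", "one"): "quantifier",
    ("local", "area"): "noun",
    ("local",): "adjective",
    ("area",): "noun",
}

# Longest expression in the lexicon, used to bound the look-ahead.
MAX_LEN = max(len(key) for key in LEXICON)


def segmentations(words: List[str]) -> List[List[Tuple[str, str]]]:
    """Return every way to cover `words` with consecutive lexicon entries."""
    if not words:
        return [[]]  # exactly one way to cover an empty sequence
    results = []
    for n in range(1, min(MAX_LEN, len(words)) + 1):
        expr = tuple(words[:n])
        if expr in LEXICON:
            head = (" ".join(expr), LEXICON[expr])
            # Recursively cover the remaining words and prepend this entry.
            for rest in segmentations(words[n:]):
                results.append([head] + rest)
    return results


if __name__ == "__main__":
    sentence = "each branch is included in exactly one local area".split()
    for seg in segmentations(sentence):
        print(seg)
```

Both coverings are licensed by the lexicon: one ends with ‘local’ (adjective) followed by ‘area’ (noun), the other with ‘local area’ as a single noun. A purely syntactic analysis therefore yields two parses, and only the domain vocabulary indicates that ‘local area’ is a single concept.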








Figure 2: Example Syntax model (a) and SBVR model (b) generated by the process of [Kleiner et al., 2009]
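Rule (1) is a structural rule. It combines a universal quantification (‘each’) over ‘branch’ with an exactly-one quantification over ‘local area’ on the fact type ‘branch is included in local area’ from Figure 1. To make concrete what such a rule asserts, the following Python sketch checks it against a hypothetical population of facts. It illustrates only the meaning of the rule, not how SBVR or Figure 2b represents it, and the identifiers B1, B2, LA1, and so on are invented.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def violations_each_exactly_one(
    branches: Set[str], facts: Iterable[Tuple[str, str]]
) -> Set[str]:
    """Return the branches that violate
    'Each branch is included in exactly one local area'.

    `facts` is a (hypothetical) population of (branch, local_area) pairs
    instantiating the fact type 'branch is included in local area'.
    """
    # A fact population is a set, so duplicate pairs are counted once.
    counts = Counter(branch for branch, _area in set(facts))
    # 'each' ranges over every branch; 'exactly one' requires a count of 1.
    return {b for b in branches if counts[b] != 1}


if __name__ == "__main__":
    branches = {"B1", "B2", "B3"}
    facts = [
        ("B1", "LA1"),
        ("B2", "LA1"),
        ("B2", "LA2"),  # B2 is included in two local areas
    ]                   # B3 is included in none
    print(sorted(violations_each_exactly_one(branches, facts)))
```

With this population, B2 (two local areas) and B3 (none) violate the rule, while B1 satisfies it.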


Finally, there are some problems with mapping the syntactic tree to the SBVR semantics. Consider how the term ‘local area’ is handled in the above example. For simplicity, ‘local area’ is treated as a single noun, which maps directly to the object type ‘local area’ in SBVR. In reality, however, it would be considered a noun phrase, in which ‘local’ is used in an adjective sense and ‘area’ is the noun. This would map to the object type ‘area’ with the characteristic ‘being local’. In the vocabulary of EU-Rent, however, ‘local area’ is a single concept and should not be decomposed in this way.
This seems a simple problem to fix: the term ‘local area’ could be included as a noun in the lexicon, which is what would happen if a business were defining its vocabulary. However, unless the individual terms ‘local’ and ‘area’ were removed, which could affect the processing of sentences in other contexts, this would result in two correct parses of the sentence: one in which ‘local area’ is treated as a simple noun and one treating it as a noun phrase. The transformation to the SBVR model would not resolve this issue either, as both forms have a valid mapping. Therefore, either the user would have to select the preferred mapping, or the SBVR model would have to be processed to see whether one or the other has already been created. Either way, this makes the process more cumbersome, whereas we propose that by configuring the SBVR model directly, this problem would be avoided (as long as only one or the other has been specified in the vocabulary).
It could be argued that this problem is a quirk of the SBVR model, as other semantic representations with a structure more similar to the parse tree would have more direct mappings. However, it is an important issue, as SBVR has gained traction in our application domain in recent years [Nemuraite et al., 2010; Sukys et al., 2012] and is an important part of the OMG’s Model-Driven Architecture [OMG, 2008]. Moreover, we will show that our proposed approach does not reduce the flexibility with respect to the semantic representation used.

3 Parsing Process
In order to overcome these limitations, we propose the use of knowledge-based configuration on the semantic representation directly, rather than configuring a syntactic parse tree. In this way we maintain the benefits of parsing using configuration, while ensuring semantic consistency and removing a step from the process. Furthermore, this approach remains agnostic with respect to the semantic representation used, although we utilise the SBVR meta-model in this paper.
In order to avoid complex syntactic analysis, which is a difficult problem in itself, we use an approach inspired by Cognitive Grammar, a theory of grammar from the field of Cognitive Linguistics [Langacker, 2008]. Cognitive Grammar takes a holistic view of language, combining syntax, semantics, and pragmatics into a unified whole. In particular, our approach is based on that of [Holmqvist, 1993], which provides a computational model of Cognitive Grammar.
In Cognitive Grammar, the meaning of an expression is understood by combining the semantic structures evoked by its constituent expressions into a unified structure, a process called semantic accommodation in [Holmqvist, 1993]. Evoking the semantic structure of an expression is a relatively simple endeavour, as the two are linked; a semantic structure is evoked by looking up the expression in a lexicon. As a result, our method is able to replace traditional syntactic analysis with a much simpler model, leaving most of the effort of understanding the expression to the accommodation process. Since it aims to combine semantic structures into a composite structure based on the allowable relationships between them, the accommodation process is analogous to a configuration task.
The syntactic analysis and the accommodation process using configuration are detailed in the following sections. The two processes are performed iteratively: the first uses the expectations (placeholders, in our case) to propose possible parses of the sentence, and the second combines the semantic aspects of the suggested parses into a complete structure. The result is a progressively more detailed and complete set of domain knowledge containing the concepts, their definitions, and their associated rules.

3.1 Syntactic Analysis
The syntactic analysis of our approach is primarily the evocation of semantic structures from the lexicon, taking into account grammatical dependencies. For example, the example expression contains two quantifiers, ‘each’ and ‘exactly one’. If only the evoked semantic structures were known, the configurator would not know which quantifier applies to which concept, yet we know that ‘each’ is supposed to apply to the term ‘branch’ and ‘exactly one’ to ‘local area’.
Traditionally, these dependencies are handled by the relationships between categories (as shown in Figure 2a). However, as this leads to complex lexicons that are unsuitable for our target application and require complex processing to produce, we account for these dependencies using so-called grammatical expectations [Holmqvist, 1993]. Grammatical expectations are a relatively simple model of grammatical dependency that identifies locations in an expression that other expressions are “expected” to fill. Moreover, grammatical expectations are a good match for SBVR-based semantics, as they are similar to the placeholders of fact types. In [Holmqvist, 1993], only left and right expectations were introduced, which search to the left or right for another expression. We additionally introduce internal expectations, which search within the span of the expression, in order to more easily deal with SBVR fact types with more than two placeholders.
This model defines the lexicon and lexical entries as follows.
Definition 1 (Lexicon). A lexicon is a tuple
   $l = (E, LE, \mathit{lookup})$
where E is a set of expressions, LE is a set of lexical entries, and $\mathit{lookup} : E \to 2^{LE} \setminus \{\emptyset\}$.
Definition 2 (Lexical Entry). A lexical entry is a tuple
   $le = (e, ss, GE, \mathit{type}_{GE})$
where e is an expression of one or more words, ss is its associated semantic structure, GE is a set of grammatical expectations, and $\mathit{type}_{GE} : GE \to \{\mathit{left}, \mathit{internal}, \mathit{right}\}$ assigns a type to each grammatical expectation $ge \in GE$.
This definition is purposefully generic with respect to the form the semantic structure takes. While we use the SBVR meta-model, other semantic representations could be used. As a result, our approach remains flexible in terms of varying the model being configured, as in [Kleiner et al., 2009].
The lexicon is partly predefined. For example, words with explicit semantics in SBVR, such as those for quantifications, logical operators, etc., are explicitly defined as certain expressions, semantic structures, and grammatical expectations. Domain-specific terms are provided by the vocabulary of a business specification, e.g. a glossary of terms, and their semantic structures and grammatical expectations are determined by the representation of object types and fact types in SBVR. In our application domain, the vocabulary is provided by business people, which drives our need for a simple lexicon with minimal information.
Using grammatical expectations, the syntactic analysis is performed incrementally: when an expression is found to fill an expectation (i.e. the expectation is said to catch the expression [Holmqvist, 1993]), a possible parse is proposed and kept in a suggestion list. Since the number of suggestions increases rapidly, the suggestion list is kept short by ordering the suggestions using heuristics to identify the best parse and pruning off any suggestions that exceed a limit and/or are not considered good candidates [Holmqvist, 1993]. The metrics used by the heuristics include: (1) catching distance, the linear distance to the word (or combination) filling a placeholder; (2) binding energy, the sum of all catching distances in a suggestion; (3) local coverage, the ratio of words in the suggestion to words spanned by the suggestion; and (4) global coverage, the ratio of words in the suggestion to all words of the expression encountered up to the current point.
The suggestion list and suggestions are defined as follows.
Definition 3 (Suggestion List). A suggestion list SL is an ordered set of suggestions, where each suggestion $s \in SL$ is a tuple $s = (SLE, C, be, lc, gc)$ such that:
   • SLE is a set of lexical entries or previous suggestions included in s,
   • C is a set of tuples $\mathit{catch} = (sle_1 \in SLE, ge, sle_2 \in SLE, cd)$ that associate a grammatical expectation, ge, of the catching lexical entry or suggestion, $sle_1$, to the lexical entry or suggestion that it catches, $sle_2$, with the catching distance, cd,
   • $be = \sum_{c \in C} c.cd$ is the binding energy of the suggestion,
   • lc is the local coverage of the suggestion,
   • gc is the global coverage of the suggestion.
The order for each $s_1, s_2 \in SL$ is determined by:
   $s_1 \preceq s_2 \iff (s_1.gc, s_1.lc, s_1.be) \le_l (s_2.gc, s_2.lc, s_2.be)$
This order is based on the preferences that the best parse should cover the entire sentence, suggestions should have no holes, and words should be captured at the shortest distance.
Due to the recursive compositional nature of the suggestions, i.e. each $x \in SLE$ is a lexical entry or another suggestion, the catching information constitutes a non-traditional parse tree. Figure 3 shows an example of a parse tree produced by our syntactic analysis compared to the traditional parse tree equivalent to Figure 2a. The parse tree and the partial SBVR model (i.e. the semantic structures) of the suggested parse are provided to the semantic accommodation process to be configured into a complete model.
The general algorithm for the syntactic analysis is as follows; for each word in the expression:
1. Retrieve the entry for the current word from the lexicon
2. If the current word has any left placeholders, for each suggested parse in the suggestion list:





                                                                  3.2     Semantic Accommodation/Configuration
                                                                  Parses suggested by the syntactic analysis are sent to the
                                                                  semantic accommodation process to determine whether or
                                                                  not they are admissible in the (SBVR) semantics. Rather
                                                                  than the numerous processes for accommodation discussed
                                                                  in [Holmqvist, 1993], we utilise knowledge-based configura-
                                                                  tion to perform the accommodation. Specifically, component-
                                                                  oriented configuration is used, which combines advantages
                                                                  from connection-, resource-, and structure-based approaches
                  (a)                            (b)              [Soininen et al., 1998; Stumptner, 1997]. Moreover, the
                                                                  object-oriented nature of component-oriented configuration
Figure 3: A traditional parse tree (a) and one created by our     lends itself more easily to that of the SBVR meta-model.
analysis (b)                                                         Using the terminology of [Soininen et al., 1998], the SBVR
                                                                  meta-model constitutes the configuration model knowledge of
      (a) catch the closest word (or word combination) to the left of the current word
      (b) add the new suggested parse to the suggestion list
  3. Else, for each suggested parse in the suggestion list, if the previous word or combination has any internal placeholders and the current word is within its span:
      (a) catch the current word with the internal placeholder
      (b) add the newly suggested parse to the suggestion list
  4. Else, for each suggested parse in the suggestion list, if the previous word or combination has any right placeholders:
      (a) catch the current word with the right placeholder
      (b) add the newly suggested parse to the suggestion list
  5. Update the heuristics and the distances between words, order the suggestion list, and cull excess entries
  6. Provide newly suggested parses to the semantic accommodation/configuration process
  7. Remove suggestions that failed accommodation
   An example of the syntactic analysis after a complete parse of the example sentence is displayed in Figure 4.
our approach, i.e. it defines the types of entities, properties, and rules that specify the set of correct configurations. It follows that an SBVR (terminal) model constitutes the configuration solution knowledge or (possibly partial) configuration. Lastly, the parse tree created by the lexical analysis constitutes the requirements knowledge, i.e. additional constraints on the configuration that are not strictly part of the configuration model. For example, the SBVR meta-model may allow either quantification to be applied to either object type in the example rule; however, the grammatical dependencies require that ‘each’ be applied to ‘branch’ and ‘exactly one’ to ‘local area’ for the correct interpretation of the sentence.
   Since the SBVR (meta-)model is defined using ECore, the Eclipse Modelling Framework² implementation of EMOF from the MOF specification of the OMG [OMG, 2006], we first discuss its mapping to the configuration ontology of [Soininen et al., 1998]. The configuration ontology defines standard concepts for representing the different aspects of configuration knowledge, including taxonomy, attributes, structure, topology, and constraints. An example of the mapping for SBVR is displayed in Figure 5. It focuses on the configuration model knowledge, with some example component individuals; for simplicity, port individuals are not included in the figure. The mapping is by no means complete, but it provides a link between the meta-model representation and the representation used for the configuration task.
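The suggestion-list procedure of steps 2–7 of the syntactic analysis can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: the trigger for step 2 (the current word has a left placeholder) is assumed because only its sub-steps appear here; "closest word to the left" is read as the closest word not yet caught in that parse; placeholders are reduced to booleans that refer to the adjacent word rather than to word combinations; the ranking heuristic (more catches first, then smaller total catching distance) is invented for illustration; and `accommodate` is a hypothetical stand-in for the semantic accommodation/configuration process. The names `Word`, `Parse`, `rank`, and `analyse` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (catcher index, caught index, placeholder kind, catching distance)
Catch = Tuple[int, int, str, int]


@dataclass(frozen=True)
class Word:
    """Hypothetical lexicon entry listing the placeholders a word offers."""
    text: str
    left: bool = False       # left placeholder: catches a word to its left
    internal: bool = False   # internal placeholder spanning the next word (simplified)
    right: bool = False      # right placeholder: catches the next word


@dataclass(frozen=True)
class Parse:
    """One suggested (partial) parse: the catches made so far."""
    catches: Tuple[Catch, ...] = ()

    def caught(self) -> set:
        return {c[1] for c in self.catches}

    def add(self, catch: Catch) -> "Parse":
        return Parse(self.catches + (catch,))


def rank(p: Parse) -> Tuple[int, int]:
    # Assumed heuristic: more catches first, then smaller total catching distance.
    return (-len(p.catches), sum(c[3] for c in p.catches))


def analyse(words: List[Word], accommodate: Callable[[Parse], bool],
            max_suggestions: int = 5) -> List[Parse]:
    suggestions = [Parse()]
    for i, word in enumerate(words):
        prev = words[i - 1] if i > 0 else None
        new: List[Parse] = []
        for p in suggestions:
            if word.left and i > 0:
                # Step 2 (assumed trigger): catch the closest uncaught word to the left.
                free = [j for j in range(i - 1, -1, -1) if j not in p.caught()]
                if free:
                    new.append(p.add((i, free[0], "left", i - free[0])))
            elif prev is not None and prev.internal:
                # Step 3: the current word lies within the previous word's internal span.
                new.append(p.add((i - 1, i, "internal", 1)))
            elif prev is not None and prev.right:
                # Step 4: the previous word's right placeholder catches the current word.
                new.append(p.add((i - 1, i, "right", 1)))
        # Step 5: order the suggestion list by the heuristic and cull excess entries.
        ranked = sorted(suggestions + new, key=rank)[:max_suggestions]
        # Steps 6-7: pass new suggestions to accommodation and drop those that fail.
        new_ids = {id(p) for p in new}
        suggestions = [p for p in ranked if id(p) not in new_ids or accommodate(p)]
    return suggestions
```

With the hypothetical entries `each` (right placeholder), `branch`, and `exists` (left placeholder) and an `accommodate` that accepts everything, the top-ranked suggestion has `each` catching `branch` at distance 1 and `exists` catching `each`, the closest word not yet caught, at distance 2. An `accommodate` that rejects a suggestion only removes it if it was added in the current step, mirroring step 7.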
Figure 4: Lexical analysis of the rule ‘Each branch is included in exactly one local area.’ Asterisks represent the grammatical expectations, hexagons represent the catching word, rectangles represent the caught word, and the catching distance is shown above each line.

Taxonomy
ECore allows classification hierarchies to be defined through the concepts EClass and EObject and the relations eSuperTypes and eClass. The concept EClass generically represents a type and could therefore be mapped to Configuration Type; however, ECore does not distinguish the sub-types Component Type, Port Type, Resource Type, and Function Type in the same manner as [Soininen et al., 1998]. It is therefore more appropriate to map instances of EClass to Component Types, while other ECore concepts map to the other configuration types. Consequently, only component types have a classification hierarchy; the others are made direct subtypes of their appropriate configuration type (e.g. Attribute Type).
   Sub-types and super-types in ECore are represented by eSuperTypes. This relation defines the direct super-types of an
² http://www.eclipse.org/modeling/emf/




                                                                                       Michel Aldanondo and Andreas Falkner, Editors
                                                                          Proceedings of the 15th International Configuration Workshop
                                                                                                    August 29-30, 2013, Vienna, Austria




          Figure 5: Fragment of the configuration model derived from the ECore mapping of the SBVR meta-model.


EClass and, therefore, maps to the isa relation. Moreover, multiple inheritance is allowed in both representations.
   In ECore, an EClass may be abstract, i.e. it cannot have any instances. However, in the context of configuration, it makes sense to relax this definition to allow partial information about a configuration, as in [Soininen et al., 1998]. Therefore, abstract and non-abstract (i.e. concrete) types in ECore are mapped to abstract and concrete classes in the ontology using the appropriate Abstraction Definition.
   Instances of EObject represent Individuals from the ontology. In ECore, these are associated with their type by eClass, which maps to the relation is directly of. Moreover, since eClass specifies the EClass of an individual, EObject necessarily maps to Component Individual.

Attributes
Attributes in ECore are represented by the concept EAttribute, which has a name, a type specified by eAttributeType, and relations for the lower and upper bounds of its cardinality (lowerBound and upperBound, respectively).
   An eAttributeType links an EAttribute to its EDataType, which maps to the concept Attribute Type from [Soininen et al., 1998]. It follows that EAttributes map to Attribute Definitions with the appropriate Attribute Name, Attribute Value Type (from eAttributeType), and Necessity Definition. The value of an attribute for a specific EObject is mapped to an Attribute with the respective Attribute Value and Component Individual.
   In ECore, attributes can have a zero-to-many cardinality, while Necessity Definitions are restricted to exactly one (necessary) and at most one (optional) attributes. As a result, there exists only a partial mapping to the configuration ontology; however, this is not a problem for the SBVR meta-model as it only includes necessary and optional attributes.

Structure and Topology
The ontology of [Soininen et al., 1998] differentiates between Part Definitions, which specify the compositional structure of components, and Port Definitions, which specify the topological connections (either physical or logical) between components. Part Definitions constitute a direct has-part relation between components. This relation must be anti-symmetric and anti-reflexive. Moreover, the transitive closure of has-part defines a transitive has-part relation, which must also be anti-symmetric and anti-reflexive. Port Definitions have no such restriction.
   In ECore, both Part Definitions and Port Definitions are represented by the concept EReference. An EReference may be a containment reference (for compositional relationships) and/or it may have an eOpposite for bi-directional relationships.
   While it seems intuitive to map containment and unidirectional references to part definitions, and bi-directional relationships to port definitions, this is not possible as ECore does not uphold the anti-symmetric and anti-reflexive requirements of has-part relations. For example, the situation shown in Figure 6, in which the transitive closure of the has-part relations between Meaning, Representation, and Expression is reflexive, is allowed in ECore but not in the configuration ontology. Determining which EReferences could be mapped to part definitions would require an analysis of the meta-model; instead, we map all EReferences to ports. As a result, we effectively use ports as a generalised structural relationship similar to that described in [Hotz and Wolter, 2013].
   A Port Definition requires a Port Name, a Possible port type set, and a Cardinality. Similar to EAttribute, EReferences have a name,





Figure 6: Transitive relationships in the SBVR meta-model

a lowerBound, and an upperBound. Therefore, an EReference maps to a Port Definition, with the name mapping to the Port Name, and the lowerBound and upperBound mapping to the Cardinality. Port Types and their Compatibility Definitions are created for each EReference to ensure that the associated ports can only connect to each other correctly.
   When two EObjects are associated with one another through an EReference, the appropriate Port Individuals of the corresponding Component Individuals are connected-to each other.

Constraints
Arbitrary constraints in ECore are specified using annotations on model elements, written in some constraint language. These annotations are mapped to Constraint Instances and their corresponding Constraint Expressions. Special cases of constraints, particularly Property Definition and its sub-types (Attribute Definition, etc.), are already used in the preceding mappings.
   The constraints defined in our configuration model come from the SBVR specification. An example is shown in Figure 6: the ‘has meaning’ relation between an ‘Expression’ and a ‘Meaning’ is allowed if and only if the ‘Meaning’ is connected to the ‘Expression’ through a ‘Representation’ by way of the ‘has representation’ and ‘has expression’ relations.
   In the configurator used by our implementation, the different aspects of the configuration knowledge are mapped to a generative CSP (or GCSP) as described in [Stumptner et al., 1998]. This particular approach differs from [Soininen et al., 1998] in its definition of some of the previously discussed concepts, in the following respects:
   • Non-leaf nodes in the taxonomy are assumed to be abstract; therefore, concrete component types that have sub-types are split into two types: an abstract super-type and a concrete sub-type.
   • Ports are specified as necessary or optional; therefore, Port Definitions with higher cardinalities result in multiple ports that are then grouped into port sets, which allow quantitative reasoning over their members.
   By configuring the SBVR models directly, we perform configuration of the concepts in a business specification rather than of a syntactic parse tree as in [Kleiner et al., 2009]. This results in an iteratively more detailed domain model, where new domain knowledge is taken into account each time a new sentence is processed. In addition, inconsistencies can be detected more easily than by reprocessing the model after new knowledge is added through a model transformation. Finally, the mapping to SBVR is simplified as the lexicon maps directly to the semantics, rather than to an intermediate syntactic model. This solves the issue of multiple parse trees with correct mappings to SBVR as, for example, the parse of ‘Each branch is included in exactly one local area’ in which ‘local area’ is considered a noun phrase would be inconsistent with the domain knowledge, while the parse in which ‘local area’ is considered a noun would be consistent.

4 Experimental Results
We present the results of early experiments on the configuration of domain knowledge for NLP. We performed multiple tests of the example structural rule (1) and gathered statistics on the performance of the configurator in configuring these kinds of models. The results are summarised in Table 1.

                                 Sentence (1)
   # Constraints                          197
   # Variable Assignments                5162
   Min. # Backtracks                        6
   Max. # Backtracks                       25
   Ave. # Backtracks                       17
   # Components Generated                   6
Table 1: Performance statistics of the configuration process

   The configurations produced were evaluated by hand for correctness. In each case the configuration was correct (corresponding to that shown in Figure 2b). The high number of variable assignments is due to relationships in the SBVR meta-model with cardinalities higher than one producing multiple ports in the configuration model; therefore, most port assignments are to the unconnected state.
   It is interesting to note the correlation between the (minimum) number of backtracks and the number of components generated. This is due to the nature of the SBVR meta-model in two respects: (1) it contains a large number of relations with a cardinality of zero-to-many, and (2) it uses reified relations, which means that, in terms of configuration, each relation is represented as a component.
   To prevent spurious connections between components, ports representing a zero-to-many relation are first set to the unconnected state; therefore, they are only connected to a component if being unconnected violates some constraint, causing a backtrack. Furthermore, the use of reified relations means that new relation components need to be created, even if this results in connecting two existing (non-relation) components. Therefore, in the optimal search, which only connects existing (non-relation) components, there will always be as many backtracks as generated components.
   The higher number of backtracks in other configurations of the example is the result of the non-deterministic solver attempting variable assignments in a suboptimal order, which has a negative impact on performance. However, this could be avoided by providing SBVR-specific procedural strategies for guiding the search [Stumptner et al., 1998; Hotz and Wolter, 2013]. For example, ordering heuristics can be provided to change the order in which different component types or port types are assigned. Moreover, the search space could be reduced by preventing the creation of certain component types. For example, we assume a sentence is to be interpreted in the context of a provided vocabulary; hence, we could prevent the





creation of new object types, fact types, and other concepts related to the vocabulary aspects of the SBVR meta-model.
   We are in the process of implementing a larger example, a portion of which assigned 4812 variables, generated 5 new components, and took 49 backtracks to do so. This emphasises the need for heuristics to help guide the search.

5 Related Work
Previous work in using configuration for natural language parsing has translated property grammars into a configuration model [Estratat and Henocque, 2004]. Using this approach, a simple context-free grammar (aⁿbⁿ) and a subset of French were processed. This approach focuses primarily on the syntactic aspect of generating parse trees. Although the possibility of incorporating semantics is suggested in [Estratat and Henocque, 2004], no semantics were incorporated in the processing of the natural language subset.
   The approach of [Estratat and Henocque, 2004] was adapted to English and a Model-Driven Engineering environment in [Kleiner et al., 2009]. Although their focus remains on the syntactic aspect of generating parse trees, SBVR semantics are partially taken into account by associating elements of the parse tree with SBVR types. This information is used to simplify the model transformation; however, the domain knowledge included in the SBVR model is not taken into account and, therefore, the process is not truly semantic.
   Other approaches have used standard CSP translations of Dependency Grammars [Duchier, 1999] and, more recently, Property Grammars [Duchier et al., 2012] in order to process natural language. However, both of these approaches focus on syntactic parsing, while we aim to incorporate semantics and domain knowledge directly into the parsing process.

6 Conclusions and Future Work
In this paper we have presented an approach to natural language processing that utilises configuration of domain knowledge to determine the validity of an expression. In effect, this performs natural language understanding, at least in terms of the semantic representation used. Moreover, we have demonstrated how techniques from Cognitive Linguistics can be combined with a translation of the SBVR meta-model (the semantic representation in our case) into a configuration problem in order to achieve this natural language understanding.
   Our approach is novel in its combination of techniques from Cognitive Linguistics and configuration, and in that it performs configuration directly on the semantics of the domain knowledge. This is in contrast to previous approaches that use configuration or CSPs for natural language processing, which tend to focus only on the syntactic aspect of generating a parse tree. As a result, our approach benefits from a simplified lexicon (important to our application in the business domain), improved mapping to the target semantics, and the semantic disambiguation of terms during processing.
   The presented experimental results demonstrate the feasibility of our approach. In its current form, however, the configuration of the SBVR model can be inefficient and, therefore, future work will look at providing heuristics in order to ensure better performance in the configuration process. In addition, a more thorough evaluation of the process will be performed over larger examples in order to determine the effect of growing domain knowledge on the process.

References
[Duchier et al., 2012] Denys Duchier, Thi-Bich-Hanh Dao, Yannick Parmentier, and Willy Lesaint. Property grammar parsing seen as a constraint optimization problem. In Proc. Formal Grammar 2010/2011, LNCS 7395, pages 82–96, 2012.
[Duchier, 1999] Denys Duchier. Axiomatizing dependency parsing using set constraints. In Proc. Sixth Meeting on Mathematics of Language, pages 115–126, 1999.
[Estratat and Henocque, 2004] Mathieu Estratat and Laurent Henocque. Parsing languages with a configurator. In Proc. ECAI’2004, volume 16, pages 591–595, 2004.
[Holmqvist, 1993] K. B. I. Holmqvist. Implementing cognitive semantics: image schemata, valence accommodation and valence suggestion for AI and computational linguistics. PhD thesis, Dept. of Cognitive Science, Lund University, Lund, Sweden, 1993.
[Hotz and Wolter, 2013] Lothar Hotz and Katharina Wolter. Beyond physical product configuration: configuration in unusual domains. AI Communications, 26:39–66, 2013.
[Kleiner et al., 2009] M. Kleiner, P. Albert, and J. Bézivin. Configuring models for (controlled) languages. In Proc. ConfWS’09, pages 61–68, 2009.
[Langacker, 2008] R. W. Langacker. Cognitive grammar: a basic introduction. Oxford University Press, Oxford, New York, 2008.
[Nemuraite et al., 2010] Lina Nemuraite, Tomas Skersys, Algirdas Sukys, Edvinas Sinkevicius, and Linas Ablonskis. VETIS tool for editing and transforming SBVR business vocabularies and business rules into UML&OCL models. In Proc. ICIST 2010, pages 377–384, 2010.
[OMG, 2006] OMG. Meta Object Facility (MOF) Core Specification. Object Management Group, 2006.
[OMG, 2008] OMG. Semantics of Business Vocabulary and Business Rules (SBVR), v1.0. Object Management Group, 2008.
[Soininen et al., 1998] Timo Soininen, Juha Tiihonen, Tomi Männistö, and Reijo Sulonen. Towards a general ontology of configuration. AI EDAM, 12(04):357–372, 1998.
[Stumptner et al., 1998] Markus Stumptner, Gerhard E. Friedrich, and Alois Haselböck. Generative constraint-based configuration of large technical systems. AI EDAM, 12(04):307–320, 1998.
[Stumptner, 1997] Markus Stumptner. An overview of knowledge-based configuration. AI Communications, 10(2):111–125, 1997.
[Sukys et al., 2012] Algirdas Sukys, Lina Nemuraite, Bronius Paradauskas, and Edvinas Sinkevicius. Transformation framework for SBVR based semantic queries in business information systems. In Proc. BUSTECH 2012, pages 19–24, July 22–27, 2012.



