<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Measuring Semantic Label Quality Using WordNet</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabian Friedrich</string-name>
          <email>Fabian.Friedrich@informatik.hu-berlin.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>School of Business and Economics, Institute of Information Systems, Spandauer Strasse 1</institution>
          ,
          <addr-line>Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1998</year>
      </pub-date>
      <fpage>296</fpage>
      <lpage>304</lpage>
      <abstract>
<p>The automatic detection of defects in business process models and the assurance of a high quality standard are crucial to achieving easy-to-read and understandable models. Recent research has focused its efforts on the analysis of structural properties of business process models. This paper instead focuses on the labels and their impact on the understandability and integrability of process models. Metrics which can help in identifying process model labels that could lead to misunderstandings are discussed, and a way to automatically detect labels with a high chance of ambiguity is presented. For this purpose the lexical database WordNet is used to obtain information about the specificity and possible synonyms of a word. The derived measures were then applied to the SAP Reference Model, and the most interesting findings are presented.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>Business Process Modeling has received more and more attention in recent years.
Naturally, the interest in the quality of those models grows with their application. Different
frameworks have been developed to understand the factors that influence the quality of
business process models. A long-established framework is that of Lindland et al. [LSS94],
developed in 1994, which served as a basis for many other quality definition approaches
[KJ03, KSJ06]. But these frameworks provide only qualitative statements about process
model quality and are rather abstract. So far, very little research has been conducted to find
appropriate quantitative measurements. Examples are the Cross-Connectivity or the
Density metrics [VCR+], which try to analyze the structure of a given process model.</p>
<p>Figure 1 shows an EPC from the SAP Reference Model, which was chosen as a basis for
the tests in the following analysis. What makes the labels of this model interesting is the fact
that the terms ”wage” and ”remuneration” were used within the same model although they
are interchangeable. This problem of inconsistent usage of terms arises due to different
levels of detail and abstraction used by different modelers [HS06]. Hence, comparing
and merging models or sub-models becomes more complicated because of these
conflicts [Pfe08]. To detect and avoid those conflicts, this paper proposes solutions based on
analyzing the labels of process models. The particular approach is to analyze the meaning
of these labels using the well-known WordNet semantic database [Mil95], and to define a
quantitative measure which is able to provide clear evidence as to whether a label is good or
bad.</p>
<p>On the following pages a short introduction to the elements of WordNet is given.
Afterwards these elements will be used to derive two quantitative measures for the semantic
quality of process model labels. The last section will then present the results of
applying those measures to the EPCs of the SAP Reference Model to verify their value.
The paper concludes by critically assessing the results of this application and provides an
outlook on possible extensions and further research.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
<p>This section introduces the preliminaries for the metrics that were developed.
As the focus of this work is on the semantics of labels, an overview of the lexical database
WordNet will be given. Furthermore, section 4 will make use of semantic relatedness
measures to determine the meaning of a word. Therefore, the main principles of semantic
relatedness will be shortly introduced, too.</p>
      <sec id="sec-2-1">
        <title>A Brief Introduction to WordNet</title>
        <p>WordNet is a semantic database which was developed in 1985 at Princeton University,
mainly for natural language processing [Mil95]. Since then it has steadily grown and
today it contains more than 155,000 words. These words are organized into so-called SynSets
(Synonym Sets). A SynSet contains several words which share the same meaning.
Furthermore, the SynSets in WordNet are linked to each other through pointers with different
meanings. Thus a program is able to extract different semantic relations for a given word.
For example:</p>
<p>Synonyms - words which have the same meaning (to work on - to process)
Homonyms - words which are written identically but have a different meaning
Hypernyms/Hyponyms - a hypernym is a noun which is superordinate to the given
noun; the opposite of a hypernym is a hyponym. The same principle can also be applied to
verbs, where the subordinate term is called a troponym. (sue - challenge, tree - plant)
Meronyms - structure nouns in a ”part-of” relationship (car - wheel)
Antonyms - mainly used for adjectives and adverbs; describe the opposite
(wet - dry, hot - cold)
The quality metrics that will be explained in detail in section 3 will make use of the
possibility to extract synonyms and hypernyms/troponyms from WordNet.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Semantic Relatedness</title>
<p>Semantic relatedness is a measure that states how closely two terms are related to each other.
Some of the most popular semantic relatedness measures are those of Hirst and St-Onge,
Leacock and Chodorow, Resnik, Jiang &amp; Conrath [JC97] and Dekang Lin [Lin98], which
were compared in [BH01]. The latter two were used in the conducted experiments. A
recent master's thesis also investigated this topic [Scr06].
Both measures leverage the hierarchical structure that is built up within WordNet, but
they also rely on statistical information on the probability of the occurrence of a word. A
possible approach to determine these probability values is to count the number of occurrences
within a large corpus of text such as the complete works of Shakespeare, the Brown corpus
from the ICAME Collection of English or the British National Corpus, which was used
for the experiments in this paper. Once the probability P(Ci) of a concept Ci can be
determined, and the first concept C0 that subsumes both concepts has been
extracted from the hierarchical structure, the similarity can be computed as follows:
Lin:
simLin(C1, C2) = 2 log P(C0) / (log P(C1) + log P(C2)) (1)
Jiang &amp; Conrath (a distance measure):
distJC(C1, C2) = 2 log P(C0) - (log P(C1) + log P(C2)) (2)
Taking the example from figure 2 this means that the similarity between the words ”coast”
and ”hill”, given that their first common parent is ”geological formation”, is 0.59 (Lin)
or 9.15 (Jiang &amp; Conrath), respectively. Obviously, the metric defined by Dekang Lin has
the advantage that it is always within the bounds of 0.0 and 1.0, but as the word sense
disambiguation conducted in section 4 only tries to determine a maximum value, this has no
influence on the metrics yet.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Quality Metrics</title>
      <p>The information that can be gathered through the methods previously explained will now
be used to determine the semantic quality for a given label. As this quality largely depends
on the environment the label is found in, a model cannot be considered independently, but
a collection of many models - a model repository - has to be analyzed. To test the metrics
presented in this paper, the 604 EPCs of the SAP Reference Model were used. The focus
will be on the analysis of nouns and verbs, as especially the specificity metric described
below relies on the hypernym/troponym structure within WordNet.</p>
      <sec id="sec-3-1">
        <title>Consistency</title>
<p>The first problem that is addressed here is the use of several words with the same meaning,
as this contradicts the principle of a shared vocabulary, increases the ambiguity of
labels, and is responsible for misunderstandings. This occurrence of two synonyms referring
to the same concept or sense, respectively, is a conflict that can lead to problems in the
process of consolidating a model repository. Resolving those conflicts usually demands
the consultation of several technical and domain experts and can thus become very costly
[DHLS08]. An example of such practice would be the usage of both ”invoice” and ”bill”
within the same model repository.</p>
        <p>Table 1: Usage counts of two synonym pairs in the SAP Reference Model:
word - usage count
order - 1025
purchase - 199
bill - 202
invoice - 132</p>
<p>The first step to detect such rarely used synonyms is to count the occurrences of every word
within the model repository. To unify different word forms, the stemming algorithm of
Porter [Por80] was used. Afterwards, synonyms of a given word can be acquired through
WordNet. The number of occurrences of the given word divided by the total number of
occurrences of all its synonyms then determines the relative frequency of the
given word within the model repository. Another example are the words ”order” and
”purchase”, which can both be found in the SAP Reference Model (see Table 1).
Now, the consistency quality can be measured. Either an occurrence of 100% can be
demanded for each word compared to its synonyms, or a continuous quality value based
on a minimum frequency m, which has to be defined externally, can be used. The quality
measure for a word x can then be computed using a quality function like this one:
qcons(x) = min(1.0, (frequency(x) / m)^2) (3)
This quality function has the advantage that it is scaled between 0.0 and 1.0, and as soon
as the frequency value of a word x falls below m the quality will rapidly decrease, but
a distinction is still possible. Additionally, the quality measure depends on the minimum
frequency m. Taking the example above, this means that at a minimum frequency level
of 30% the word ”purchase” will be assigned a quality value of 0.284, whereas at
m = 50% it is 0.098. Hence, it can be defined how strictly the quality metric is supposed
to evaluate the labels of a given model.</p>
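<p>As a sketch, the consistency quality of equation (3) can be computed directly from the usage counts of Table 1; the Porter stemming and the WordNet synonym lookup are omitted here, so the synonym groups are given by hand:

```python
# Consistency quality, equation (3): the relative frequency of a word among
# all occurrences of its synonym set, squared and capped at 1.0.

counts = {"order": 1025, "purchase": 199, "bill": 202, "invoice": 132}
synonym_sets = [{"order", "purchase"}, {"bill", "invoice"}]

def frequency(word):
    # occurrences of the word divided by occurrences of all its synonyms
    for group in synonym_sets:
        if word in group:
            total = sum(counts[w] for w in group)
            return counts[word] / total
    return 1.0  # word has no synonyms in the repository

def q_cons(word, m):
    # quality drops quadratically once the frequency falls below m
    return min(1.0, (frequency(word) / m) ** 2)

print(round(q_cons("order", 0.3), 3))     # 1.0, the dominant synonym is capped
print(round(q_cons("purchase", 0.3), 3))  # 0.294; compare the 0.284 in the text
                                          # (the table counts here are rounded)
```

</p>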
      </sec>
      <sec id="sec-3-2">
        <title>Specificity</title>
<p>Another problem that arises when different people are modeling the same domain is
that each can use a different level of abstraction to describe reality. A good indicator
for the used level of abstraction is the depth within the WordNet hypernym tree. The
deeper the level of a word, the more specific it is. Thus, to align different sub-models
within the repository all the words used should be in the same depth range within the
WordNet hypernym tree. As this tree is only available for verbs and nouns the analysis
only uses those and leaves the evaluation of adjectives and adverbs for further research. To
compute the depth for a given word a first approach was to determine the average depth
of all possible senses which are present in WordNet. The resulting distribution is shown
in figure 3. It is visible that most of the words are at a depth level of about 5. Similar to
the consistency metric presented before, an arbitrary lower and upper bound can now be
defined and used in a quality function qspec. As an example this could be:
qspec(x) = 0.0, if depth(x) > bound_lower + bound_upper (4)
qspec(x) = min(1.0, (depth(x) / bound_lower)^2, ((bound_lower + bound_upper - depth(x)) / bound_lower)^2), else (5)
as depicted in figure 4. This function provides the same advantages as mentioned for qcons
before, but now punishes deviations in both directions. Alternatively, a function which
is cut off only on one side could be used if it is assumed that only a superficial labeling
style causes problems.
The following section will discuss problems of the methods described so far and how they
were further optimized.</p>
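<p>A two-sided quality function of this shape can be sketched as follows. The exact functional form is an assumption made for illustration; what matters is the qualitative behavior: full quality inside the depth bounds, a quadratic penalty for words that are too generic or too specific, and a cut-off to 0.0 beyond bound_lower + bound_upper:

```python
# Sketch of a two-sided specificity quality in the spirit of equations (4)/(5).

def q_spec(depth, bound_lower, bound_upper):
    if depth > bound_lower + bound_upper:
        return 0.0
    # reflect the upper side onto the lower side so one quadratic
    # term handles too-generic words and its mirror too-specific ones
    mirrored = bound_lower + bound_upper - depth
    return min(1.0,
               (depth / bound_lower) ** 2,
               (mirrored / bound_lower) ** 2)

# depths around the bounds 4.5 and 5.5 used in the application section
for d in (2, 5, 9, 11):
    print(d, round(q_spec(d, 4.5, 5.5), 3))
```

Within the bounds the function evaluates to 1.0; at depth 2 (too generic) and depth 9 (too specific) it is clearly penalized, and beyond the sum of the bounds it is 0.0.</p>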
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Consistency and Specificity Quality using Semantic Relatedness</title>
<p>The problem with the approach explained above is that words in WordNet can have many
different meanings, and the usage of averages can lead to unwanted results. On the one
hand this can distort the consistency quality, as a synonym might simply not be appropriate for
the word under investigation. On the other hand it could lead to a depth value for the specificity
metric that deviates strongly from the depth of the specific meaning intended for this word.
Thus, it is necessary to determine the meaning of a word within a label prior to calculating
its quality.</p>
      <p>Determining the semantics of a word is a well known task in natural language processing.
Some of the problems and ways to solve them can be found e.g. in [Sus93, BGBZ07,
Yar92]. The main idea is to use the context of a word and to evaluate which meaning is
the most probable. In our case, the context of a word in a label is the rest of the label,
of course, but also the labels of its predecessors and successors. If one of them should
be a join or split node (XOR, OR, AND) the predecessors/successors of that node will be
taken into account, too. Afterwards, the semantic relatedness as described in section 2.2 is
computed and averaged for each of the possible meanings a word has. The meaning with
the highest relatedness to its context is selected. This procedure is executed independently
for each word. Thus the meaning which was selected for one word does not influence the
selection of a meaning of another word, contrary to the approach taken in [Sus93].
Applying this technique makes the results much more accurate as only relevant synonyms
are taken into account for the consistency metric. The distribution of depth values also
becomes clearer when only concrete meanings are evaluated (see figure 5).
The only problem is that the disambiguation itself cannot be guaranteed to always be
correct, and a false meaning could get selected. Therefore, the quality aspects of both
ways (using averages and using the selected meaning) will be regarded. Furthermore, the
influence of both metrics on the quality of a word will be scaled by a factor that can
be set externally. Writing the selected meaning of x as x* and the two weighting factors
as α and β, the quality of a single word then becomes:
qword(x) = α qaverage(x) + (1 - α) qspecific(x*) (6)
where
qaverage(x) = β qcons(x) + (1 - β) qspec(x) (7)
and qspecific is computed in the same way, but uses the selected meaning x*.</p>
    </sec>
    <sec id="sec-5">
      <title>Aggregation on Label and Model Level</title>
      <p>To quickly identify and easily visualize these semantic quality metrics it is necessary to
aggregate them for a whole label and/or model. This enables the user to quickly identify
models with the least quality and to adjust the labeling in a quality assurance process. Our
approach was to calculate the arithmetic mean and variance to determine the quality of a
given label/model.</p>
<p>qlabel(Label) = (1/n) Σ(i=1..n) qword(Label(i)) (8)
where n is the number of nouns and verbs and Label(i) denotes the i-th noun or verb
within the label. The variance is computed accordingly:
σ²label(Label) = (1/n) Σ(i=1..n) (qword(Label(i)) - qlabel(Label))² (9)
The same applies on the model level:
qmodel(Model) = (1/m) Σ(i=1..m) qlabel(Model(i)) (10)
σ²model(Model) = (1/m) Σ(i=1..m) (qlabel(Model(i)) - qmodel(Model))² (11)
where m is the number of labeled elements (in the case of EPCs, Functions and Events)
and Model(i) refers to the i-th label within the model under investigation.</p>
      <p>On the one hand, models with a low quality should be subject to further quality checking,
but models with a high variance are also interesting: they typically have a single label that is
evaluated as bad by our metric, surrounded by many good labels. Examples for both cases
will be presented in the next section.</p>
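<p>The aggregation itself is a plain arithmetic mean and variance; a minimal sketch with hypothetical word quality values:

```python
# Mean and variance aggregation of word qualities per label, and of label
# qualities per model. Input values are hypothetical quality scores.

def mean(values):
    return sum(values) / len(values)

def variance(values):
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

# one label with three scored words, e.g. one word unknown to WordNet
label_word_qualities = [1.0, 0.8, 0.0]
q_label = mean(label_word_qualities)
print(round(q_label, 3), round(variance(label_word_qualities), 3))  # 0.6 0.187

# a model aggregates its label qualities the same way
model_label_qualities = [q_label, 0.9, 0.95]
print(round(mean(model_label_qualities), 3))
# a high variance flags models where one bad label sits among good ones
```

</p>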
    </sec>
    <sec id="sec-6">
      <title>Application to the SAP Reference Model</title>
      <p>The quality model described before was prototypically implemented and applied to a part
of the SAP Reference Model. In particular, the 604 EPCs were examined. The weighting
factor between the consistency and the specificity metric was set to 0.5, so both metrics
were regarded as equally important. The degree to which the quality based on the specific
meaning is used was set to 0.8. A minimum occurrence of 51% was demanded for the
consistency quality qcons, and the lower and upper bound for the specificity metric were
set to the values of the lowest and highest quantile.</p>
<p>One of the models with the worst total semantic quality was the one shown in figure 1,
with a low total score of 0.66. The graphical representation was enhanced with two bars
which depict the quality using averages (left) and using the specific meaning (right). When
the value for a label drops below 0.8 the bar becomes yellow, and below 0.6 red. The main
issue with this model is the use of the very specific word ”garnishment”, which lies at the
9th level of the WordNet hypernym tree. On the other hand, the very generic verb ”exists”
is used. Interestingly, even in this small model the two synonyms ”wage” and ”remuneration”
are used side by side. But while ”remuneration” was used 13 times throughout the model
repository, ”wage” only appeared twice.</p>
<p>From the information that was acquired, recommendations can now even be given to a user
on how to increase the label quality. Synonym usage can be decreased if ”remuneration” is
also used in the upper left event and in the function. Furthermore, the word ”exist” could be
replaced with one of its hyponyms, although it is hard to make an automated recommendation
as ”exist” is very unspecific and lies on the 0-th level within the hypernym tree. Lastly, the
word ”garnishment” could be replaced by one of its hypernyms, e.g. ”court order”, so the
model becomes more abstract and aligns better with the other models in the repository.
Another interesting finding was that a high variance usually arises when the model
contains events which were not labeled at all and were therefore evaluated with a semantic
quality of 0.0. But there are also models, such as the one shown in figure 6, where a single
function is responsible for the high variance, although the total quality of the model is not
strongly affected. This model is also an example of the problems that arise with our metrics.
The quality of the function ”RFQ/Quotation” is low because the special abbreviation
Request for Quotation (RFQ) is not part of WordNet. A way to solve these kinds of problems
would be to create a lexical database similar to WordNet, but specialized for a business
administration and IT context.</p>
      <p>For the following analysis the parameters of the metrics were altered to be a lot stricter
with a minimum frequency of 80% and the specificity bounds set to 4.5 and 5.5. The five
best and worst labels that resulted from this analysis are depicted in table 2.
By looking into the detailed data for each word within a label, it becomes evident why labels
were evaluated as good or bad, although some of them seem very similar. Table
3 shows to what extent the individual words of some of the labels shown before do not fulfill the
requirements of a consistent labeling style. When interpreting the results it is important to
remember that no syntactic features were examined by our quality metric. Thus the label
”Distribution” is of high semantic quality regarding its specificity and consistency with
the modeling repository, although no clear action can be derived from its syntax. New
mechanisms which can be used to tackle those issues are discussed e.g. in [LMS].
The low score of the word ”was” could be problematic in the last case, as it is used as an
auxiliary verb to form the past tense. Although the label could be improved by changing
it to ”Created Revaluation”, the exclusion of some basic vocabulary from the analysis would
prevent such issues.
</p>
      <p>Table 2: The five best labels were Material BOM Distribution, Distribution, Electronic
Bank Statement, Transaction data and Document Distribution ALE; the five worst labels
were RFQ/Quotation, Write-up, Post-capitalization, Usage Decision and Revaluation was
made.</p>
      <p>One of the first models used for describing the quality of a process was the one by Lindland
et al. [LSS94]. As our approach tries to increase the understandability of the model for the
audience, it is part of the pragmatic quality mentioned in that work. Although Lindland
also defines the term ”semantic quality”, meaning the congruence of the model and the
domain which has to be modeled, it is not related to what is addressed as semantic label
quality in this paper. Rather, our metrics entirely depend on the WordNet semantic database
and try to automatically determine the meaning of a word from its given context. Hence,
the term semantic label quality seemed appropriate.</p>
<p>So far, quantitative evaluation approaches have concentrated on the evaluation of the structure
of the process model, e.g. [VRM+08]. A good overview is provided by [VCR+], and an
empirical correlation analysis between understandability and structural features can be found
in [MS08]. Some research especially on the impact of labeling and the structure of labels
was recently conducted by Jan Mendling [MR08a, MR08b, MRR09]. So far, no research
that tries to leverage the information of WordNet for the improvement of business process
model labels is known to the author.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>In this paper a quantitative approach for determining the quality of process model labels
was presented. The metrics developed utilized the information that is available in the
WordNet lexical database, with the aim of minimizing misunderstandings between several
stakeholders. To achieve this, ambiguity by the usage of synonyms or words from
different levels of specificity is penalized. To test the approach defined here, it was applied
to the EPCs of the SAP Reference Model with promising results. Although it has been
applied to EPCs, the model can be seamlessly transferred to other modeling notations
like BPMN, YAWL or UML Activity Diagrams. A different procedure was developed at
the University of Muenster [DHLS08], where an agreement upon a restricted set of words
and syntactical constructs has to be found before starting to model. While modeling,
these constructs are then enforced by a modeling tool. Although this approach guarantees
consistent results, it is quite restrictive for the modeler. In contrast to that, our
approach lets a stringent and unified labeling style emerge through agreement
between the different parties involved in the modeling process. Thus, the costly process
of defining a set of allowed words prior to modeling can be omitted. Furthermore,
recommendations on how to improve the alignment of a model's labels with the rest of the
repository can be automatically derived from the metrics presented. Therefore, it also
addresses the main characteristics which are demanded in [Reb09] (dynamic, evolutionary
and community-based) for an approach to handle semantic ambiguity.</p>
      <sec id="sec-7-1">
        <title>Limitations</title>
<p>Some general problems arise with the use of our methodology. As briefly discussed, some
domain-specific terms that are completely understandable for a domain expert are not
contained in WordNet. Another example is the word ”R/3” which is specific to the company
SAP and was mapped to the SynSet containing the physical gas constant ”r”. This
also shows that the simple disambiguation heuristic introduced in section 4 is not perfect
either and could be exchanged for a more sophisticated one. But due to the fact that
both WordNet and the BNC corpus, used for the experiments, were designed for general
English such problems will always arise. A considerable reduction can probably only be
achieved by extending WordNet with domain specific terms and by changing the corpus
that is used to determine the probabilities for the semantic relatedness measure. A good
starting point for such a corpus would be e.g. a reference manual or domain specific
literature.</p>
<p>Another problem is that semantic relatedness strongly depends on subjective evaluations.
Thus, even if the computed similarity is accepted in general, individual persons could have
a different perspective and could object to the identified relations. One alternative is
to give the modeler the possibility to adjust the meaning of a word if he does not agree
with the automatic determination. Finally, as the semantic quality and understandability
of a model strongly depend on subjective evaluations too, these metrics can never give a
perfect answer, but they are able to point a user or modeler to peculiar labels and provide
hints to increase the general understandability and alignability of the model.</p>
      </sec>
      <sec id="sec-7-2">
        <title>Further Research</title>
<p>Further research could include exploring the information within WordNet which was
not used in this paper. Additionally, concepts that are already present,
like the determined similarity relation between labels, could be used to identify labels
which do not seem to have a high correlation to their environment. This could lead to
the identification of parts that should probably be extracted to a sub-model or integrated with
other parts. Another focus will be to conduct an empirical study to verify the usefulness
of the metrics.</p>
<p>To conclude, it will also be necessary to combine the measures presented above with
techniques that also evaluate the syntactical structure of the label under investigation, to provide
an overall quantitative quality metric. An attempt at that is currently being evaluated at our
institute [LMS].</p>
        <p>[BH01] A. Budanitsky and G. Hirst. Semantic Distance in WordNet: An Experimental,
Application-oriented Evaluation of Five Measures. 2001.</p>
        <p>[DHLS08] P. Delfmann, S. Herwig, L. Lis, and A. Stein. Eine Methode zur formalen
Spezifikation und Umsetzung von Bezeichnungskonventionen für fachkonzeptionelle
Informationsmodelle. Pages 23–38, 2008.</p>
        <p>[HS06] I. Hadar and P. Soffer. Variations in conceptual modeling: classification and
ontological analysis. JAIS, pages 568–592, 2006.</p>
        <p>[JC97] J. J. Jiang and D. W. Conrath. Semantic Similarity Based on Corpus Statistics and
Lexical Taxonomy. In International Conference Research on Computational Linguistics
(ROCLING X), September 1997.</p>
        <p>[KJ03] J. Krogstie and H. D. Jorgensen. Quality of Interactive Models. In Conceptual
Modeling - ER 2002, Workshops of the 21st International Conference on Conceptual
Modeling, Tampere, Finland, LNCS 2784, pages 351–363. Springer, 2003.</p>
        <p>[KSJ06] J. Krogstie, G. Sindre, and H. Jørgensen. Process models representing knowledge
for action: a revised quality framework. European Journal of Information Systems,
15(1):91–102, 2006.</p>
        <p>[Lin98] D. Lin. An Information-Theoretic Definition of Similarity. In Proceedings of the
15th International Conference on Machine Learning (ICML), pages 296–304, 1998.</p>
        <p>[LMS] H. Leopold, J. Mendling, and S. Smirnov. Measuring Label Quality using Part of
Speech Tagging. To be published autumn/winter 2009.</p>
        <p>[LSS94] O. I. Lindland, G. Sindre, and A. Sølvberg. Understanding Quality in Conceptual
Modeling. IEEE Software, 11(2):42–49, 1994.</p>
        <p>[Mil95] G. A. Miller. WordNet: A Lexical Database for English. Communications of the
ACM, 38(11):39–41, 1995.</p>
        <p>[MR08a] J. Mendling and J. C. Recker. Towards Systematic Usage of Labels and Icons in
Business Process Models. In CAiSE 2008 Workshop Proceedings - Twelfth International
Workshop on Exploring Modeling Methods in Systems Analysis and Design (EMMSAD
2008), volume 337, pages 1–13. CEUR-WS.org, June 2008.</p>
        <p>[MR08b] J. Mendling and H. A. Reijers. The Impact of Activity Labeling Styles on
Process Model Quality. In SIGSAND-EUROPE, pages 117–128, 2008.</p>
        <p>[MRR09] J. Mendling, H. A. Reijers, and J. Recker. Activity labeling in process modeling:
Empirical insights and recommendations. Information Systems, April 2009.</p>
        <p>[VCR+] I. Vanderfeesten, J. Cardoso, J. Mendling, H. A. Reijers, and W. van der Aalst.
Quality Metrics for Business Process Models. In BPM and Workflow Handbook, 2007.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [BGBZ07]
          <string-name>
            <given-names>Jordan</given-names>
            <surname>Boyd-Graber</surname>
          </string-name>
          , David Blei, and
          <string-name>
            <given-names>Xiaojin</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <article-title>A Topic Model for Word Sense Disambiguation</article-title>
          .
          <source>In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)</source>
          , pages
          <fpage>1024</fpage>
          -
          <lpage>1033</lpage>
          , Prague, Czech Republic,
          <year>June 2007</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [MS08]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Mendling</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Strembeck</surname>
          </string-name>
          .
          <article-title>Influence Factors of Understanding Business Process Models</article-title>
          . In Witold Abramowicz and Dieter Fensel, editors,
          <source>Business Information Systems</source>
          , 11th International Conference, BIS 2008, Innsbruck, Austria, May
          <year>2008</year>
          , pages
          <fpage>142</fpage>
          -
          <lpage>153</lpage>
          . Springer-Verlag,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Pfe08]
          <string-name>
            <given-names>D.</given-names>
            <surname>Pfeiffer</surname>
          </string-name>
          .
          <article-title>Semantic Business Process Analysis - Building Block-based Construction of Automatically Analyzable Business Process Models</article-title>
          . Münster,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Por80]
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Porter</surname>
          </string-name>
          .
          <article-title>An algorithm for suffix stripping</article-title>
          .
          <source>Program</source>
          ,
          <volume>14</volume>
          (
          <issue>3</issue>
          ):
          <fpage>130</fpage>
          -
          <lpage>137</lpage>
          ,
          <year>1980</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Reb09]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Rebstock</surname>
          </string-name>
          .
          <article-title>Technical opinion: Semantic ambiguity: Babylon, Rosetta or beyond?</article-title>
          . Communications of the ACM,
          <volume>52</volume>
          (
          <issue>5</issue>
          ):
          <fpage>145</fpage>
          -
          <lpage>146</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Scr06]
          <string-name>
            <given-names>Aaron D.</given-names>
            <surname>Scriver</surname>
          </string-name>
          .
          <article-title>Semantic Distance in WordNet: A Simplified and Improved Measure of Semantic</article-title>
          .
          <source>Master's thesis</source>
          , University of Waterloo, Waterloo, Ontario, Canada,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Sus93]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Sussna</surname>
          </string-name>
          .
          <article-title>Word sense disambiguation for free-text indexing using a massive semantic network</article-title>
          .
          <source>In proceedings of the second international conference on Information and knowledge management (CIKM '93)</source>
          , pages
          <fpage>67</fpage>
          -
          <lpage>74</lpage>
          , New York, NY, USA,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [VRM+08]
          <string-name>
            <given-names>Irene</given-names>
            <surname>Vanderfeesten</surname>
          </string-name>
          , Hajo Reijers, Jan Mendling, Wil van der Aalst, and
          <string-name>
            <given-names>Jorge</given-names>
            <surname>Cardoso</surname>
          </string-name>
          .
          <article-title>On a Quest for Good Process Models: The Cross-Connectivity Metric</article-title>
          .
          <source>Advanced Information Systems Engineering</source>
          , pages
          <fpage>480</fpage>
          -
          <lpage>494</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Yar92]
          <string-name>
            <given-names>David</given-names>
            <surname>Yarowsky</surname>
          </string-name>
          .
          <article-title>Word-sense disambiguation using statistical models of Roget's categories trained on large corpora</article-title>
          .
          <source>In Proceedings of the 14th conference on Computational linguistics</source>
          , pages
          <fpage>454</fpage>
          -
          <lpage>460</lpage>
          , Morristown, NJ, USA,
          <year>1992</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>