<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a Model-driven Approach for Reverse Engineering Design Patterns</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Awny Alnusair</string-name>
          <email>alnusair@uwm.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tian Zhao</string-name>
          <email>tzhao@uwm.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Wisconsin-Milwaukee</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The size and complexity of software systems is rapidly increasing. Meanwhile, the ability to understand and maintain such systems is decreasing almost as fast. Model Driven Engineering (MDE) promotes the notion of modeling to cope with software complexity; in this paper we report on our research that utilizes ontological modeling for understanding complex software systems. We focus the discussion on recovering design pattern information from source code. We thus argue that an e ective recovery approach needs to utilize semantic reasoning to properly match an ontological representation of both: conceptual source code knowledge and design pattern descriptions. Since design patterns can take di erent forms when implemented in code, we argue that hardcoding their descriptions limits the exibility and usability of a detection mechanism.</p>
      </abstract>
      <kwd-group>
        <kwd>OWL</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>Ontology</kwd>
        <kwd>Semantic Reasoning</kwd>
        <kwd>Design Patterns</kwd>
        <kwd>Program Understanding</kwd>
        <kwd>Reverse Engineering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>In Software Engineering practices, software understanding deals with the process</title>
      <p>of studying software to mentally conceptualizing its behavior and inner
working structure. Unfortunately, understanding voluminous and complex software
is widely regarded as the most time consuming and resource intensive activity
during both reverse and forward engineering practices. In order to cope with
software complexity and consequently enable better understanding of software,</p>
    </sec>
    <sec id="sec-2">
      <title>Model Driven Engineering (MDE) has emerged as a software engineering disci</title>
      <p>pline. MDE emphasizes the systematic use of models and modeling principles
throughout the software development lifecycle. In fact, software design and
design best practices (i.e., design patterns) are of particular relevance to MDE due
to the heavy reliance on modeling techniques and principles.</p>
    </sec>
    <sec id="sec-3">
      <title>Design patterns describe reusable solutions to common recurrent object</title>
      <p>
        oriented design problems. The classic book [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] of design patterns introduced
many of these patterns. Ever since then, these patterns have been liberally
used in building and documenting new object-oriented systems as well as
reengineering legacy software. It is thus evident why one would be interested in
reverse engineering approaches for design pattern instance recovery.
      </p>
      <p>
        Central to all reverse engineering activities is the accurate identi cation of
interrelationships among the components of a subject system as well as a true
representation of the system at a higher level of abstraction [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. As a typical
reverse engineering activity, design pattern recovery is no exception. However,
building an e ective model that promotes understanding and truly represents the
internal working structure and organization of a software system is not a straight
forward task; it requires a formal, explicit, and semantic-based representation
of the conceptual knowledge of source code artifacts found in the subject
system. The research presented in this paper explores this hypothesis by providing
ontological representations of software assets.
      </p>
      <p>In our further exploration of this hypothesis, we take into account modeling
principles and techniques from Model Driven Engineering. MDE promises full
support for reverse engineering activities through modeling and model
transformations at di erent levels of abstractions. Due to the formal and built-in
reasoning foundations of ontologies, our methodology is solely reliant on
ontologybased modeling. Therefore, we provide an OWL1 ontology model that includes
a software representation ontology; this ontology is automatically populated
through text-to-model transformation with ontological instances representing
various program elements of a subject system. Furthermore, each design
pattern's structure including its participants' behaviors and collaborations are
represented by a separate OWL ontology; this ontology also encodes the rules needed
to detect this particular pattern.</p>
      <p>The rest of the paper is organized as follows: In Section 2, we describe our
approach for structuring and populating the knowledge base. We discuss the
details of design pattern detection in Section 3. Implementation and evaluation
case studies are discussed in Section 4. Finally, a discussion of the current
stateof-the-art followed by future work directions are discussed in Section 5 and 6,
respectively.
2</p>
      <sec id="sec-3-1">
        <title>Ontology Model</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Due to their ability to enable automatic knowledge sharing and understanding,</title>
      <p>ontologies are being used as the knowledge representation component of the</p>
    </sec>
    <sec id="sec-5">
      <title>Semantic Web vision [3]. To this end, many ontology modeling languages and</title>
    </sec>
    <sec id="sec-6">
      <title>Semantic Web technologies have emerged and been used in various domain ar</title>
      <p>eas. In our work, we utilize the Web Ontology Language (OWL), the Resource</p>
    </sec>
    <sec id="sec-7">
      <title>Description Framework (RDF)2, the Semantic Web Rule Language (SWRL)3, and SPARQL4 query language.</title>
    </sec>
    <sec id="sec-8">
      <title>OWL is used for capturing relationship semantics among domain concepts, OWL-DL is a subset of OWL based on Description Logic and has desirable computational properties for reasoning systems. OWL-DL's reasoning support</title>
      <p>1 http://www.w3.org/TR/owl-guide/
2 http://www.w3.org/TR/rdf-primer
3 http://www.w3.org/Submission/SWRL
4 http://www.w3.org/TR/rdf-sparql-query
allows for inferring additional knowledge and computing the classi cation
hierarchy (subsumption reasoning). RDF is used as a exible data representation
model. RDF is suitable for describing resources and provides a data model for
representing machine-processable semantics of data. SWRL is used for writing
rules that can be combined with OWL knowledge bases for reasoning and
computing entailments. Finally, SPARQL is an RDF query language and protocol
for ontological querying of RDF graphs.</p>
    </sec>
    <sec id="sec-9">
      <title>In what follows, we describe how Semantic Web languages contribute to our</title>
      <p>methodology; in particular, we show how OWL-DL and RDF are used to obtain
a precise formal representation of source code knowledge and design patterns.</p>
    </sec>
    <sec id="sec-10">
      <title>We also show how SWRL rules can be used to further extend the OWL represen</title>
      <p>tation of design patterns to handle features that cannot be expressed using OWL
alone. Collectively, these representations form the base for semantic reasoning
and inference of additional facts that can be retrieved using SPARQL semantic
queries.
2.1</p>
      <sec id="sec-10-1">
        <title>Structuring the Knowledge Base</title>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>At the core of the ontology model is a Source Code Representation Ontology</title>
      <p>(referred to afterwards as SCRO). This ontology is created to provide an
explicit representation of the conceptual knowledge structure found in source code.</p>
    </sec>
    <sec id="sec-12">
      <title>SCRO captures major concepts of object-oriented programs and helps under</title>
      <p>stand the relationships and dependencies among source code artifacts.</p>
      <p>
        SCRO's knowledge is represented using the OWL-DL ontology language and
veri ed using the Pellet OWL-DL reasoner [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. It was designed with ontology
reuse in mind, such that its vocabulary can be used not only for design
pattern detection but also in any application that requires semantic source code
information. This ontology is a work in progress, currently focused on the Java
programming language; however, extensions for other object-oriented languages
can be easily obtained. A fragment of the ontology's taxonomy using the Protege
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] ontology editor is shown in Fig. 1 and the complete ontology can be found
online [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
    </sec>
    <sec id="sec-13">
      <title>Each of the concepts (classes) shown in Fig. 1 is interpreted as a set that</title>
      <p>de nes a group of individuals (instances) sharing some properties. Disjoint
subclasses are also modeled to de ne di erent sets of individuals. For example, all
individuals that are members of class InstanceMethod are necessarily members
of class Method but none of them can simultaneously be a member of another
subclass of Method. Currently SCRO de nes 56 OWL classes and subclasses.</p>
    </sec>
    <sec id="sec-14">
      <title>Those classes map directly to source code elements and collectively represent the most important concepts found in object-oriented programs such as method, class, nested class, and eld.</title>
    </sec>
    <sec id="sec-15">
      <title>Various object properties, sub-properties, and ontological axioms are de</title>
      <p>ned within SCRO to represent relationships among concepts by linking
individuals from di erent OWL classes. For example, hasOutputType is a
functional property de ned for the return type of a method and hasSuperType is a
transitive object property de ned with two transitive sub-properties for
inheritance and interface implementation. Inverse properties are also used to de ne
the inverse relation. For example, isLocalVariableOf is the inverse property
of hasLocalVariable. This is in particular useful in many situations such as
traversing the resulted RDF graph in both directions and making the
ontology useful for applications that do not depend on a reasoner system. Currently,</p>
    </sec>
    <sec id="sec-16">
      <title>SCRO de nes 73 object properties, sub-properties and data properties. Some of these properties are further discussed in Section 3.</title>
      <p>2.2</p>
      <sec id="sec-16-1">
        <title>Design Pattern Ontology Sub-model</title>
      </sec>
    </sec>
    <sec id="sec-17">
      <title>In order to properly represent a design pattern, a proper identi cation of its</title>
      <p>participating classes, their instances, roles, and collaborations is indeed essential.</p>
    </sec>
    <sec id="sec-18">
      <title>We thus reuse the vocabulary de ned in SCRO and build de nitions for each design pattern. The result is a modular extensible structure of OWL ontologies linked together via regular OWL reuse mechanisms. This structure is depicted in Fig. 2.</title>
    </sec>
    <sec id="sec-19">
      <title>The design-pattern ontology represents knowledge common to all design pat</title>
      <p>terns. An ontology is created for each design pattern describing its essential
participants and their properties, collaborations, restrictions, and the corresponding</p>
    </sec>
    <sec id="sec-20">
      <title>SWRL rules needed to detect this pattern. This modular structure promotes on</title>
      <p>Design-Pattern
Singleton
tology reuse, thus allowing SCRO to be used with various maintenance related
activities and certainly with reasoners that do not support SWRL rules. Having
the ontology structure created, the next natural step is to populate the
knowledge base with ontological instances of various concepts in the ontology. This
process is described next.</p>
    </sec>
    <sec id="sec-21">
      <title>Populating knowledge repositories requires binding concrete resource informa</title>
      <p>tion with the ontology to create instances of ontological concepts. These instances
are the building blocks of the knowledge base that can be used for browsing and
querying. When dealing with large source code repositories, a large number of
instances is usually created, this is why automatic text-to-model transformation
and knowledge population is essential.</p>
      <p>Inspired by the Java RDFizer idea from the Simile5 project, we have built a
subsystem that automatically extracts knowledge from Java binaries. The
original RDFizer was able to distinguish only class types, super types, and basic use
relations represented in a class le. Currently, our subsystem performs a
comprehensive parsing of the class le format speci ed by the Java Virtual Machine. It
captures every ontology concept that represents a source code element and e
ectively generates instances of all ontological properties de ned in our ontologies
for those program elements. Our knowledge generator subsystem distinguishes
itself as not being relied on the existence of source code to extract knowledge.</p>
    </sec>
    <sec id="sec-22">
      <title>This is in particular helpful for binary reverse engineers who must understand a</title>
      <p>software system in which the source code is unavailable.</p>
    </sec>
    <sec id="sec-23">
      <title>The semantic instances generated by our subsystem are serialized using RDF</title>
      <p>triples; a separate RDF ontology in Notation36 syntax is generated for each
application framework parsed, this amounts to instantiating an OWL knowledge
base for that framework. This arrangement provides a clean separation of explicit</p>
    </sec>
    <sec id="sec-24">
      <title>OWL vocabulary de nitions from the metadata represented by RDF. Since OWL is a vocabulary extension of RDF, the encoded metadata represented in RDF</title>
      <p>5 http://simile.mit.edu
6 http://www.w3.org/DesignIssues/Notation3
can be naturally linked to the OWL design pattern ontologies via OWL reuse
mechanisms. A sample output of our knowledge extractor subsystem for the</p>
    </sec>
    <sec id="sec-25">
      <title>JHotDraw7 application framework can be examined online [6].</title>
      <p>3</p>
      <sec id="sec-25-1">
        <title>Design Pattern Recovery Approach</title>
        <p>Due to their abstract nature and the varying responsibilities and interactions
of their participants, design patterns can generally be implemented in di erent
ways and using di erent techniques or language-speci c constructs. Examples of
such implementation variations are plenty. Consider object composition for
example, some systems do not use a built-in collections framework component for
maintaining an aggregate collection of objects. Instead, they use a user-de ned
data structure. Other systems favor some delegation techniques over the others
when implementing design patterns such as Visitor, State, and Strategy. We thus
believe that pattern detection tools should be exible and easily extensible to
improve usability and e ectiveness, that is, the roles and interactions of
participants should not be hard-coded. In fact, users of the detection tool often need to
change those hard-coded role restrictions but nd it extremely di cult because
it requires code modi cation and rebuilding the entire tool.</p>
        <p>
          In our approach, we aim at providing exibility and transparency such that
design patterns are speci ed externally using ontology formalisms and
participant's responsibilities are depicted using ontology rules that can be easily
understood and modi ed. We thus use the expressivity of OWL-DL and SWRL
rules to formally represent each design pattern using a set of intent-preserving
conditions. When these conditions are collectively met, an instance of the design
pattern is detected. This approach relies solely on the de nitions found in our
ontologies and on an OWL-DL reasoner that is capable of computing inferences
from the set of facts and SWRL rules de ned in the ontologies. To test the
validity of our approach, we authored SWRL rules for ve well-known design
patterns, namely, Visitor, Observer, Composite, Template-Method, and
Singleton. In the next subsections, we illustrate our detection mechanism for Visitor
and Observer. Since SWRL rules are part of the OWL ontology representing a
particular design pattern, the reader can examine the rules for the other three
patterns by downloading the corresponding ontology [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and view it in an
ontology editor such as Protege. Note that our approach is not limited to these
ve design patterns, others can be easily formalized and thus detected using the
same procedure.
3.1
        </p>
        <sec id="sec-25-1-1">
          <title>Detecting the Visitor Design Pattern</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-26">
      <title>Visitor is a behavioral design pattern that facilitates exibility by allowing external operations to be performed on the elements of an object structure and hence not modifying the classes of the elements [1]. The idea is to keep the object</title>
      <p>7 http://www.jhotdraw.org
structure intact by de ning new structures of visitors representing the new
behaviors of interest. This pattern de nes two separate class hierarchies: Host and
Visitor. The following are three sample conditions for detecting this pattern:
1. At the root of the Visitor hierarchy is a Visitor interface common to all
concrete visitors; this interface declares the invariant abstract behaviors (visit
methods) that should be implemented by each ConcreteVisitor, since each
visit method is designed for a ConcreteHost, the concrete host must be
present as an argument (input type) in the corresponding visit method.
2. A ConcreteVisitor is a concrete class type that implements the Visitor
interface and overrides each visit method to implement visitor-speci c
behavior for the corresponding ConcreteHost.</p>
    </sec>
    <sec id="sec-27">
      <title>3. A ConcreteHost must de ne the accept instance method that takes Visitor</title>
      <p>as an argument. This method implements the double dispatching technique
by calling the matching visit method and passing the host object into the
visitor.</p>
      <p>
        Intuitively, we should take advantage of OWL-DL expressivity to formalize
the above restrictions. However, these restrictions require more expressive power
than what Description Logic provides. OWL can only handle descriptions of
in nite number of unstructured objects connected in a tree-like manner [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We
thus use SWRL to handle all non-tree-like situations and property chaining for
design pattern restrictions. SWRL extends OWL-DL with First Order Horn-like
rules; a rule in SWRL has two parts: the antecedent and the consequent, each of
these parts contain only positive conjunctions of either unary or binary atoms.
      </p>
    </sec>
    <sec id="sec-28">
      <title>In its simple form, a unary atom represents an OWL class predicate of the form</title>
    </sec>
    <sec id="sec-29">
      <title>C(var1) and a binary atom represents an OWL property predicate of the form</title>
    </sec>
    <sec id="sec-30">
      <title>P(var1, var2), both var1 and var2 are variables over OWL individuals. The</title>
      <p>reasoner will carry out the actions speci ed in the consequent only if all the
atoms in the antecedent are known to be true.</p>
      <p>
        Listing 1 shows a sample SWRL rule that depicts the above conditions for
the visitor design pattern. Please refer to SCRO and the visitor ontology found
online [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for the de nitions of OWL properties used in this rule. Every restriction
was formalized using three di erent atoms in the rule; the hasInputType OWL
object property is de ned in SCRO to represent a method's formal parameter
and methodOverrides works for both method overriding and interface method
implementation. Upon classifying the ontology, a reasoner with a rule engine
such as Pellet would infer and thus create instances for the di erent participants
of this design pattern as described in the consequent part of the rule.
scro : InterfaceType (? visitor ) ^
scro : hasAbstractMethod (? visitor , ? visit ) ^
scro : hasInputType (? visit , ? concrete - host ) ^
scro : hasSuperType (? concrete - visitor , ? visitor ) ^
scro : hasInstanceMethod (? concrete - visitor , ?c - visit ) ^
scro : methodOverrides (?c - visit , ? visit ) ^
scro : hasInstanceMethod (? concrete - host , ? accept ) ^
scro : hasInputType (? accept , ? visitor ) ^
scro : invokesMethod (? accept , ? visit )
      </p>
      <p>=)
visitor : Visitor (? visitor ) ^
visitor : hasConcreteVisitor (? visitor , ? concrete - visitor ) ^
visitor : hasConcreteHost (? visitor , ? concrete - host ) ^
visitor : hasVisitMethod (? concrete - visitor , ?c - visit ) ^
visitor : hasAcceptMethod (? concrete - host , ? accept )</p>
    </sec>
    <sec id="sec-31">
      <title>Listing 1. A sample SWRL rule for Visitor</title>
    </sec>
    <sec id="sec-32">
      <title>Relaxing or adding more restrictions to the requirements is relatively simple.</title>
      <p>For the sake of argument, one might want to retrieve only pattern instances
that declare a super type for all concrete hosts in the Host hierarchy. This
can be accomplished by modifying the third condition and introducing a fourth
condition as shown below. The result is a new rule depicted in Listing 2.
3. A ConcreteHost is a subtype of Host. It de nes the accept instance method
that overrides the hook method found in Host. This method implements
double dispatching by calling the matching visit method and passes the
host in to the visitor.
4. At the root of the Host hierarchy is an interface or abstract class type. It
represents the super type of all concrete hosts. It declares the abstract hook
method; this method takes Visitor as an argument.
scro : InterfaceType (? visitor ) ^
scro : hasAbstractMethod (? visitor , ? visit ) ^
scro : hasInputType (? visit , ? concrete - host ) ^
scro : hasSuperType (? concrete - visitor , ? visitor ) ^
scro : hasInstanceMethod (? concrete - visitor , ?c - visit ) ^
scro : methodOverrides (?c - visit , ? visit ) ^
scro : hasSuperType (? concrete - host , ? host ) ^
scro : hasAbstractMethod (? host , ? hook ) ^
scro : hasInputType (? hook , ? visitor ) ^
scro : hasInstanceMethod (? concrete - host , ? accept ) ^
scro : methodOverrides (? accept , ? hook ) ^
scro : invokesMethod (? accept , ? visit )</p>
      <p>=)
visitor : Visitor (? visitor ) ^
visitor : hasHost (? visitor , ? host )
....................</p>
    </sec>
    <sec id="sec-33">
      <title>Listing 2. A modi ed SWRL rule for Visitor</title>
      <sec id="sec-33-1">
        <title>3.2 Detecting the Observer Design Pattern</title>
      </sec>
    </sec>
    <sec id="sec-34">
      <title>Observer represents a one-to-many dependency between communicating objects</title>
      <p>such that when the subject object changes its state, it sends a noti cation
message to all its listeners to be updated accordingly. Unlike the Composite pattern,
Observer is implemented in two separate hierarchies of participants. An interface
for all listeners sits at the root of the listener's hierarchy; it identi es a common
behavior for all listeners interested in observing a particular subject such that
each concrete listener maintains a reference to that particular subject. On the
other hand, the subject knows its listeners, it provides means to establishing the
relationship with them, and it is responsible for communicating the behavior
change to all those registered listeners.</p>
      <p>Listing 3 shows sample rule representations of this pattern. The rst rule
identi es candidates for potential listeners, their concrete listeners and the
corresponding update methods.
scro : hasSuperType (?c- listener , ? listener ) ^
scro : hasMethod (? listener , ? update ) ^
scro : methodOverriddenBy (? update , ?c- update ) ^
scro : isMethodOf (?c- update , ?c- listener )</p>
      <p>=)
observer : hasCListenerCandidate (? listener , ?c- listener ) ^
observer : hasCUpdate (?c- listener , ?c- update ) ^
observer : hasUpdate (? listener , ? update )
------------------------------------------------------------scro : hasField (?c- subject , ? container ) ^
scro : hasStructuredDataType (? container , ? containerDT ) ^
scro : hasMethod (? containerDT , ? insert ) ^
scro : methodInvokedBy (? insert , ?add - listener ) ^
scro : isMethodOf (? add - listener , ?c- subject )</p>
      <p>=)
observer : hasAddListener (?c- subject , ?add - listener )
------------------------------------------------------------observer : hasCListenerCandidate (? listener , ?c- listener ) ^
observer : hasUpdate (? listener , ? update ) ^
observer : hasAddListener (?c- subject , ?add - listener ) ^
scro : hasInputType (? add - listener , ? listener ) ^
scro : hasPart (?c- listener , ?c- subject ) ^
scro : hasMethod (?c- subject , ? notify ) ^
scro : invokesMethod (? notify , ? update )</p>
      <p>=)
observer : hasConcreteListener (? listener , ?c- listener ) ^
observer : listensTo (?c- listener , ?c- subject ) ^
observer : Observer (? notify )</p>
    </sec>
    <sec id="sec-35">
      <title>Listing 3. Sample SWRL rules for the Observer pattern</title>
    </sec>
    <sec id="sec-36">
      <title>The OWL object property hasSuperType is made transitive so that the rea</title>
      <p>soner can infer all direct or indirect super types of a given class. The update
method speci ed in the listener interface should be implemented by all
concrete listeners. Recall that the object property methodOverrides works for both
classes and interfaces.</p>
    </sec>
    <sec id="sec-37">
      <title>The second rule e ectively identi es potential candidates for concrete sub</title>
      <p>jects and the method used for establishing the relationship between this subject
and its listeners. The subject class maintains a collection of its listeners, typically
stored in a eld that has a structured data type. In SCRO, StructuredDataType
represent arrays or collections of elements; it is a sub-class of ComplexDataType
which also includes UnstructuredDataType such as user-de ned data structures.</p>
    </sec>
    <sec id="sec-38">
      <title>Therefore, the rule in Listing 3 can be modi ed to detect containers of any kind.</title>
      <p>The third rule builds on the other two rules and e ectively ensures the
conditions needed for detecting this pattern. The hasPart property ensures that a
concrete listener must maintain a reference to the observable; the hasInputType
property ensures that the candidate add-listener method accepts only listener
objects, and nally the noti cation behavior is speci ed using method
invocation.
4</p>
      <sec id="sec-38-1">
        <title>Implementation and Evaluation</title>
      </sec>
    </sec>
    <sec id="sec-39">
      <title>We are currently developing a program understanding tool that utilizes reasoning</title>
      <p>services over the ontological representation of source code elements. This tool
currently accepts the Java byte code and the ontologies (Sections 2.1 and 2.2) as
input; the knowledge generator module parses the byte code and generates the</p>
    </sec>
    <sec id="sec-40">
      <title>RDF repository (Section 2.3). This repository is stored and managed by Jena</title>
      <p>
        [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], an open source Java framework for building Semantic Web applications by
providing programmatic support and handling for RDF, OWL, and SPARQL.
      </p>
    </sec>
    <sec id="sec-41">
      <title>Our tool currently supports SPARQL ontological queries against the knowledge base. Several SPARQL queries are embedded within the tool for retrieving the detected design pattern details. This process is simple and fully automated, no manual interaction is required by the reverse engineer.</title>
      <p>4.1</p>
      <sec id="sec-41-1">
        <title>Case Study: Design Pattern Detection</title>
      </sec>
    </sec>
    <sec id="sec-42">
      <title>We have conducted a preliminary study on multiple open source frameworks</title>
      <p>including JUnit8, JHotDraw, and Java AWT. The chosen frameworks vary in
size, they represent di erent domains, and most importantly, they were built
with design patterns in mind; this makes them a good t for evaluating our
approach. Instance detection is shown in Table 1.</p>
    </sec>
    <sec id="sec-43">
      <title>The interpretation of a pattern instance can be readily obtained from the corresponding rule for that pattern. For example, a Composite instance represents the child element's container, a Singleton instance represents the singleton</title>
      <p>8 http://www.junit.org
class, and a Template-method instance represents the template method itself.
In our manual evaluation of the obtained results, we adopted precision
measures { The number of correctly inferred pattern instances (True Positives) over
the total number of inferred instances. In order to accept an instance as truly
positive, this instance needs to be explicitly stated in the framework's
documentation or there have to be strong indications found by inspecting source code
and available comments. In all the tests performed, none of the inferred pattern
instances violates any of the structural or behavioral restrictions put forward for
that pattern in the axioms and rules; this means that the knowledge base
accurately represents the source code and the reasoner acted accordingly. However,
as noted in Table 1, there are a few cases in which precision su ers since the
identi ed instances were considered by our standards as false positives { falsely
inferred instances that were not designed as such. In all these cases, neither
documentation nor source code comments strongly certi ed those instances as true
positives.</p>
      <p>Recall statistics, however, are usually harder to come up with since they
require the identi cation of all false negatives { occurrences that were intended
as actual pattern instances but were not inferred by the reasoner. Most
framework documentations do not explicitly report on all actual number of pattern
instances, thus manual identi cation through source code inspection is subject
to one's interpretations of what constitutes an actual instance. Nevertheless, we
found some evidence of false negatives. Some of the instances were missed due
to our parser's inability to capture the complete picture of the running code;
others are due to the restrictions speci ed in OWL axioms and SWRL rules. For
example, the current rule for Template Method requires the primitive operation
that is called by the template method to be declared protected, however, this
is not a requirement; we found evidence of that on both JHotDraw and JUnit.</p>
    </sec>
    <sec id="sec-44">
      <title>It is obvious that such relaxation of the Template Method rule allows the rea</title>
      <p>soner to detect the missed instances. Generally speaking, relaxation reduces false
negatives; it does, however, increase the risk of false positives.</p>
      <p>It was evident in some cases that better capturing of behavior speci ed in
method bodies through more e ective parsing is indeed helpful, an appropriate
parsing provides more exibility when relaxing the requirements and certainly
improves both precision and recall. Nevertheless, our approach to pattern
detection distinguishes itself as being exible and usable such that users can relax or
restrict pattern descriptions as they wish. In Singleton, for example, accounting
for another form of lazy instantiation or perhaps disputing the way the static
reference is being declared, can be readily obtained by slightly modifying the
rule.</p>
      <p>A more comprehensive case study is currently being conducted. In this study
we are investigating the e ect of rule relaxation, applying our approach to other
frameworks, comparing our results to other non-model driven approaches, and
nally attempting to come up with recall statistics for selective well documented
frameworks. Preliminary results are extremely encouraging and show an
improvement However, a systematic study can uncover the overlap between
different approaches and most fundamentally show the value of ontology-based
modeling in software engineering.</p>
      <p>Table 2 shows statistics related to the software frameworks used in our study
as well as running time analysis.</p>
      <p>JUnit 3.7 JHotDraw 6.0 Java AWT 1.6
Statistics</p>
      <p>No. of Classes</p>
      <p>No. of OWL Individuals
Processing Time Parsing + Processing
Reasoner Time</p>
      <p>Visitor
Observer
Composite
Template Method
Singleton
99
1411
3
1.6
2.9
2.5
1.6
1.7
377
6254
6.5
18
33.3
29.1
14.7
14
549
11853
13
49.7
95.2
76
46
48</p>
    </sec>
    <sec id="sec-45">
      <title>It is evident that the time required for parsing the code, loading the ontolo</title>
      <p>gies, executing the queries, etc. is in most cases minimal when compared to the
time required by the reasoner to classify the ontologies and execute SWRL rules.
Furthermore, since rules allow for reasoning with all available individuals, rule
execution time is susceptible to increase as the number of OWL individuals in
the ABox (assertions that represent ground facts for individual descriptions) as
well as the complexity of the rules increase. In fact, Pellet, our experimentation
reasoner, does not claim optimal performance. However, it does have an
integrated rule engine that provides a complete and sound support for SWRL; that
de nitely adds to the cost of memory consumption and computational speed.</p>
      <sec id="sec-45-1">
        <title>Related Work</title>
        <p>
          On surveying the state of the art in design pattern recovery, we found numerous
non-semantic based approaches. Most existing approaches di er in the
detection technique utilized as well as the mechanism used for obtaining source code
knowledge and formalizing pattern descriptions. PINOT [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] for example,
represents design patterns using control- ow graphs which have some scalability
issues. This tool relies on processing the Abstract Syntax Tree (AST) of method
bodies to build a CFG for program elements. This CFG is then examined to
verify the existence of restrictions related to a particular design pattern. Tsantalis
et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] proposed an approach that relies on graph and matrix representation
of both, the system under study and design patterns. The ASM framework is
used to parse the Java bytecode and populate the matrices for a particular
system; using a well known similarity score algorithm, system's representation is
matched with pattern descriptions that are hard-coded within their tool. Other
approaches utilize information obtained through dynamic analysis of the running
code [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. In general, runtime analysis can be e ective in detecting behavioral
patterns only when the software is backed by a suitable and complete test data.
        </p>
      </sec>
    </sec>
    <sec id="sec-46">
      <title>Whether the representation mechanism used is CFG, ASG, matrices, or DFG,</title>
      <p>it is our belief, however, that using a semantic-based formalism ensures consistent
and accurate functional representation of design patterns, their participants,
and the system under study. Furthermore, semantic representation allow for
computing inferences of knowledge that was not explicitly stated, yielding an
inference-based detection of design pattern instances.</p>
    </sec>
    <sec id="sec-47">
      <title>Closer to our approach is the work of Kramer and Prechelt [12] and Wuyts</title>
      <p>[13] in which declarative meta-programming techniques were utilized. In
particular, design patterns are described using variations of Prolog predicates and
limited information about program elements and their roles are represented as</p>
    </sec>
    <sec id="sec-48">
      <title>Prolog facts. Most recently, the work presented in [14] and [15] explored the</title>
      <p>notion of utilizing OWL ontologies to structure the source code knowledge.
However, these approaches provide template ontology descriptions or at best
rudimentary ontologies that must be extended to become more expressive. The
full potential of semantic-based modeling was yet to be explored. The primary
focus of the work presented in [14] is basically to facilitate exchanging and
sharing knowledge about patterns, anti-patterns, and refactoring. Our aim, however,
is to provide a true semantic-based approach for design pattern detection and
provide an aid for understanding, reasoning, and conceptualizing source code.
6</p>
      <sec id="sec-48-1">
        <title>Conclusion and Future Work</title>
        <p>In this paper, we proposed a semantic-based approach for software
understanding. In particular, we illustrated our approach for recovering design patterns from
source code. The proposed approach is fully automatic and distinguishes itself
as being extensible and extremely usable. Moreover, the proposed approach is
purely semantic-based that describes knowledge using a set of ontologies. SCRO,
our main ontology, provides a consistent and complete functional description of
program elements and allows for semantic inference of knowledge that was not
explicitly stated.</p>
        <p>The tool described in Section 4 is currently a work in progress. Our ultimate
goal is to provide a more comprehensive program comprehension environment for
conceptualizing, understanding, and recovering software knowledge to aid both
reverse and forward engineering activities. Notably the generated knowledge by
the bytecode parser may not provide the full picture of the running code; in
particular, method bodies need to be e ectively parsed to capture more detailed
behavioral aspects of programs. We are currently investigating the use of other
means to augment the current knowledge, the more knowledge captured, the
more exibility is given to the user in authoring the rules for detecting design
patterns. We also investigating our approach on other design patterns found in
literature and conducting more case studies as described in Section 4.1.
12. Kramer, C., Prechelt, L.: Design Recovery by Automated Search for Structural
Design Patterns in Object Oriented Software. In: 21st IEEE/ACM International
Conference on Automated Software Engineering, pp. 123{132. (2006)
13. Wuyts, R.: Declarative reasoning about the structure of object-oriented systems.</p>
        <p>In: Proceedings of TOOLS USA '98, pp. 112{124. (1998)
14. Dietrich, J., Elgar, C.: A Formal Description of Design Patterns Using OWL. In:
Proceedings of the 2005 Australian Software Engineering Conference, pp. 243{250.
(2005)
15. Kirasic, D., Basch, D.: Ontology-Based Design Pattern Recognition. In: 12th
International Conference on Knowledge-Based and Intelligent Information and
Engineering Systems (KES'08), pp. 384-393. (2008)</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Gamma</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Helm</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Johnson, R.,
          <string-name>
            <surname>Vlissides</surname>
          </string-name>
          , J.: Design Patterns:
          <article-title>Elements of Reusable Object-Oriented Software</article-title>
          .
          <string-name>
            <surname>Addison-Wesley</surname>
          </string-name>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chikofsky</surname>
            ,
            <given-names>E.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cross</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          :
          <article-title>Reverse engineering and design recovery: A taxonomy</article-title>
          .
          <source>IEEE Software</source>
          .
          <volume>7</volume>
          (
          <issue>1</issue>
          ),
          <volume>13</volume>
          {
          <fpage>17</fpage>
          (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lassila</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>The Semantic Web</article-title>
          . Scienti c American.
          <volume>284</volume>
          (
          <issue>5</issue>
          ), pp.
          <volume>34</volume>
          {
          <fpage>43</fpage>
          , (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Sirin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyanpur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kartz</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <string-name>
            <surname>Pellet</surname>
            :
            <given-names>A Practical</given-names>
          </string-name>
          <string-name>
            <surname>OWL-DL Reasoner</surname>
          </string-name>
          .
          <source>Web Semantics: Science, Services and agents on the World Wide Web</source>
          .
          <volume>5</volume>
          (
          <issue>2</issue>
          ) (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Knublauch</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fergerson</surname>
            ,
            <given-names>R.W</given-names>
          </string-name>
          , Noy,
          <string-name>
            <given-names>N.F.</given-names>
            ,
            <surname>Musen</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. A.</surname>
          </string-name>
          :
          <article-title>The Protege OWL Plugin: An Open Development Environment for Semantic Web Applications</article-title>
          . In: Third International Semantic Web Conference (ISWC
          <year>2004</year>
          ), pp.
          <volume>229</volume>
          {
          <fpage>243</fpage>
          . (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Source</given-names>
            <surname>Code Representation</surname>
          </string-name>
          <article-title>Ontology (SCRO); Design Pattern Ontolgies; and an automatically generated ontology from parsing the JHotDraw application framework</article-title>
          , http://www.cs.uwm.edu/~alnusair/ontologies
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Structured Objects in OWL: Representation and Reasoning</article-title>
          .
          <source>In: 17th International Conference on World Wide Web</source>
          , pp.
          <volume>555</volume>
          {
          <fpage>564</fpage>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>McBride</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Jena: A Semantic Web Toolkit</article-title>
          .
          <source>In: IEEE Internet Computing</source>
          ,
          <volume>6</volume>
          (
          <issue>6</issue>
          ),
          <fpage>55</fpage>
          -
          <lpage>59</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olsson</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          :
          <article-title>Reverse engineering of design patterns from Java Source Code</article-title>
          .
          <source>In: 21st IEEE/ACM International Conference on Automated Software Engineering</source>
          , pp.
          <volume>123</volume>
          {
          <fpage>132</fpage>
          . (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Tsantalis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chatzigeorgiou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stephanides</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halkidis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Design Pattern Detection Using Similarity Scoring</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          .
          <volume>32</volume>
          (
          <issue>11</issue>
          ),
          <volume>896</volume>
          {
          <fpage>909</fpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>De Lucia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deufemia</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gravino</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Risi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Behavioral Pattern Identi cation Through Visual Language Parsing and Code Instrumentation</article-title>
          .
          <source>In: European Conference on Software Maintenance and Reengineerig (CSMR'09)</source>
          , pp.
          <volume>99</volume>
          {
          <fpage>108</fpage>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>