<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhancing Xtext for General Purpose Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Adolfo Sanchez-Barbudo Herrera</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of York</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>Xtext is a popular language workbench conceived to support development of tooling (e.g. parsers and editors) for textual languages. Although Xtext offers strong support for source code generation when building tooling for Domain Specific Languages (DSLs), the amount of hand-written source code required to support complex General Purpose Languages (GPLs) is still significant. This research investigates techniques for reducing the amount of hand-written source code for supporting GPLs, via the development of new DSLs from which source code can be automatically generated. In particular, these techniques will be researched in the context of the OCL and QVT Operational languages.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Context</title>
      <p>
        Since 2010, Eclipse OCL has evolved to better align with the OMG OCL
standard, while also providing enhancements in the form of high-quality
textual editors. A notable feature of these editors is that they are mostly
generated automatically using the Xtext [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] language workbench. Eclipse QVTo
(which implements the Operational QVT standard) is a mature project which
relies on Eclipse OCL, but it has not evolved in step
with Eclipse OCL over the last 3 years. As a result, it is not aligned with the new
Eclipse OCL implementation. Therefore, there is a need to make the Eclipse
QVTo implementation evolve in the same direction as Eclipse OCL, which in
turn evolves according to the OMG standard.
      </p>
      <p>When undertaking that alignment, parsers and editors generated by Xtext
are desired (e.g. for reuse). However, the Eclipse OCL implementation contains
a significant amount of hand-written source code, which should be avoided in
favour of languages with a higher level of abstraction from which the source code can
be generated. Those languages will ease the provision of the parsers and editors
for Eclipse QVTo, which provides the motivation for this EngD project.</p>
    </sec>
    <sec id="sec-2">
      <title>Research project scope</title>
      <p>The activities to be performed in this research project are conceived to ultimately provide support, in the form of Eclipse-based parsers and editors, for the OMG OCL and QVTo languages; in particular, this will be via the official Eclipse OCL and QVTo projects. As a consequence of technology decisions made earlier in the Eclipse projects, the research project is constrained to the use of the Xtext language workbench.</p>
    </sec>
    <sec id="sec-3">
      <title>Problem</title>
      <p>Xtext is a language workbench which can be used to automatically generate tooling (e.g. parsers and editors) for textual languages. However, whilst Xtext is suitable for fully generating tooling for textual DSLs, it cannot currently support a completely automated generation process for more complex GPLs [<xref ref-type="bibr" rid="ref5">5</xref>]. Instead, a semi-automated approach is used for producing GPL tools, in which Xtext provides automation in the first step, and the generated code is complemented by hand-coding to support particular language requirements. A more complete automated generative process from language descriptions, with a higher level of abstraction that reduces the amount of hand-coding, is desirable.</p>
      <p>The following subsections explain some of the issues that arise when using Xtext for auto-generation of tools for GPLs such as OCL.</p>
      <sec id="sec-3-4-1">
        <title>Fixed concrete and abstract syntax</title>
        <p>The OMG OCL and QVT specifications separately describe the abstract syntax (AS) and the concrete syntax (CS) of the languages, exposing a gap between them. For example, Figure 1 depicts the VariableExp concept from the OCL AS. Figure 2 shows the corresponding CS grammar excerpt.</p>
        <p>The typical approach [<xref ref-type="bibr" rid="ref6">6</xref>] to bridge this gap is as follows. First, a syntax tree is obtained from a parser; further analysis algorithms then refine that syntax tree. In OCL, this refinement is declared by the synthesised attributes of the grammar shown in Figure 2. However, if this information were moved into an Xtext grammar, e.g. Listing 1.1, two problems would arise:</p>
        <preformat>1 VariableExp :
2   referredVariable=[Variable|simpleNameCS]
3   | referredVariable=[Variable|'self'];</preformat>
        <p>Listing 1.1. Potential Xtext grammar rule to obtain a VariableExp</p>
        <p>Firstly, concepts representing both abstract syntax (e.g. VariableExp, line 1) and concrete syntax (e.g. simpleNameCS, line 2) need to be mixed. The essence of separating the abstract syntax from the concrete syntax, as is done in the OCL specification, would be lost with an Xtext grammar. Secondly, syntax tree refinements (described by the right-hand side of the synthesised attributes in the OCL grammar) cannot be represented in Xtext. Thus the OCL CS2AS bridge needs to be hand-coded by customizing the generated parser.</p>
      </sec>
      <sec id="sec-3-11-1">
        <title>Visibility rules for name analysis</title>
        <p>Another challenge when refining syntax trees is name resolution, based on qualified accesses, nested scopes, and inheritance; these constructs are common in OO-derived GPLs. For instance, Listing 1.2 depicts a typical name resolution scenario, concerning the method that a method call expression can refer to.</p>
        <preformat>class A {
    public void methodA() { /* do something */ }
}

class B extends A {
    public void methodB() { /* do something */ }

    public static void main(String[] args) {
        B b = new B();
        b.methodA(); // refers to the method inherited from A
    }
}</preformat>
        <p>Listing 1.2. Simple name resolution Java example</p>
        <p>When calling a method on a given object in Java, the candidate methods are determined by visibility rules, e.g. all public methods defined by the class of the object plus all public methods of every ancestor. Xtext grammars provide no means to define this kind of rule, which has to be manually encoded.</p>
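        <p>As an illustration of the kind of rule that currently has to be hand-coded, the following Java sketch computes the candidate methods for a call such as b.methodA() in Listing 1.2. It is a minimal sketch under simplifying assumptions: ClassDecl and MethodScope are hypothetical names (not part of Xtext or Eclipse OCL), methods are identified by name only, and visibility is reduced to a single public flag.</p>
        <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, minimal model of a class declaration (illustration only). */
final class ClassDecl {
    final String name;
    final ClassDecl superClass; // null when the class has no ancestor
    final Map<String, Boolean> methods = new LinkedHashMap<>(); // method name -> isPublic

    ClassDecl(String name, ClassDecl superClass) {
        this.name = name;
        this.superClass = superClass;
    }

    ClassDecl method(String methodName, boolean isPublic) {
        methods.put(methodName, isPublic);
        return this;
    }
}

final class MethodScope {
    /** Public methods of the class, followed by public methods of every ancestor. */
    static List<String> visibleMethods(ClassDecl type) {
        List<String> visible = new ArrayList<>();
        for (ClassDecl c = type; c != null; c = c.superClass) {
            for (Map.Entry<String, Boolean> e : c.methods.entrySet()) {
                if (e.getValue() && !visible.contains(e.getKey())) {
                    visible.add(e.getKey()); // the nearest declaration is kept
                }
            }
        }
        return visible;
    }

    /** Resolves a called name against the type of the receiver. */
    static Optional<String> resolve(ClassDecl receiverType, String calledName) {
        return visibleMethods(receiverType).stream()
                .filter(calledName::equals)
                .findFirst();
    }

    public static void main(String[] args) {
        ClassDecl a = new ClassDecl("A", null).method("methodA", true);
        ClassDecl b = new ClassDecl("B", a).method("methodB", true).method("helper", false);
        System.out.println(visibleMethods(b));
        System.out.println(resolve(b, "methodA").isPresent());
    }
}
```
]]></preformat>
        <p>The loop over ancestors is exactly the part that an Xtext grammar cannot state; a declarative description of such rules would let this code be generated rather than written by hand.</p>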
      </sec>
      <sec id="sec-3-14-1">
        <title>Syntax tree rewrites based on semantic analysis</title>
        <p>In languages like OCL, there are situations in which the final abstract syntax tree cannot be obtained with simple context-free grammar rules. Consider the OCL expression self.aProperty-&gt;size(). The size() library operation is normally applied to a collection in order to obtain its number of elements. In this case, it is applied to the value of aProperty of a model element. The issue is that it is not known whether the value of aProperty is a collection until some semantic analysis has been done. In OCL, if aProperty turns out to be defined as single-valued, a syntactic rewrite takes place: an implicit collection conversion. This kind of syntax rewrite is described in the OMG OCL specification by Listing 1.3; it cannot be expressed in Xtext.</p>
        <preformat>OperationCallExpCS.ast.source =
  if OclExpressionCS.ast.type.oclIsKindOf(CollectionType)
  then OclExpressionCS.ast
  else OclExpressionCS.ast.withAsSet()
  endif</preformat>
        <p>Listing 1.3. Syntax rewrite example</p>
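        <p>The rewrite in Listing 1.3 could be hand-coded along the following lines. This is only a sketch: Exp, PropertyCallExp, AsSetExp and OperationCallExp are hypothetical stand-ins for AS classes, and the collection check is reduced to a boolean that semantic analysis is assumed to have computed already.</p>
        <preformat><![CDATA[
```java
/** Hypothetical AS node types for illustration; these are not the Eclipse OCL classes. */
interface Exp {
    boolean isCollectionTyped();
}

final class PropertyCallExp implements Exp {
    final String property;
    final boolean collectionTyped; // assumed to be known after semantic analysis

    PropertyCallExp(String property, boolean collectionTyped) {
        this.property = property;
        this.collectionTyped = collectionTyped;
    }

    @Override
    public boolean isCollectionTyped() {
        return collectionTyped;
    }
}

/** Wraps a single-valued source, mirroring withAsSet() in Listing 1.3. */
final class AsSetExp implements Exp {
    final Exp source;

    AsSetExp(Exp source) {
        this.source = source;
    }

    @Override
    public boolean isCollectionTyped() {
        return true;
    }
}

final class OperationCallExp {
    final Exp source;
    final String operation;

    OperationCallExp(Exp source, String operation) {
        this.source = source;
        this.operation = operation;
    }
}

final class ImplicitCollectionRewrite {
    /** Builds the AS node for "source->operation()", applying the rule of Listing 1.3. */
    static OperationCallExp collectionCall(Exp source, String operation) {
        Exp effectiveSource = source.isCollectionTyped() ? source : new AsSetExp(source);
        return new OperationCallExp(effectiveSource, operation);
    }

    public static void main(String[] args) {
        Exp single = new PropertyCallExp("aProperty", false);
        Exp many = new PropertyCallExp("aProperty", true);
        System.out.println(collectionCall(single, "size").source instanceof AsSetExp);
        System.out.println(collectionCall(many, "size").source == many);
    }
}
```
]]></preformat>
        <p>The decision depends on type information rather than on the token stream, which is why it falls outside what a context-free grammar rule can describe.</p>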
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Proposed solutions and expected contributions</title>
      <p>Given the issues described above, we propose improvements to Xtext to
support generating tooling for more complex GPLs. Currently, crucial
activities such as name resolution and syntax rewrites have to be hand-written
by customizing code generated by Xtext (Xtext does generate a name resolution
strategy, but a naive one). Our idea is to use DSLs which capture
the variability of these activities, so that the
corresponding source code can be generated automatically. The overall approach is depicted in Figure 3. The
following subsections describe the expected contributions.</p>
      <sec id="sec-4-1">
        <title>DSLs to reduce the amount of hand-written artefacts</title>
        <p>We will propose a set of DSLs, complementing Xtext grammars, which will allow language engineers to reduce the amount of hand-written artefacts needed to support GPLs. These DSLs will be accompanied by the required tooling (e.g. editors, code generators) to automate the generation of source code.</p>
        <sec id="sec-4-1-5">
          <title>Efficient scheduling of activities</title>
          <p>Our proposed enhancements to Xtext concern bridging CS and AS, name resolution, and syntax rewrites. These different activities are closely related: the AS graphs cannot be completed until name resolution is performed, whereas some name resolutions cannot be undertaken until some AS elements are available. Some syntax rewrites are not possible without performing name resolution. So, there is a complex chain of dependencies between activities.</p>
          <p>One of the goals of this research is to analyse the dependencies among these activities with the aim of generating an efficient implementation capable of exploiting them for faster AS model retrieval. This area of research will produce a contribution to the field, because related work either addresses these activities in isolation, or does not consider the overall dependency issue at all.</p>
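          <p>One simple way to picture the dependency chain is as a graph of activities that is ordered topologically before execution. The sketch below uses Kahn's algorithm; the ActivityScheduler class and the activity names are illustrative assumptions, not the implementation proposed in this research, which still has to be designed.</p>
          <preformat><![CDATA[
```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical scheduler: orders activities so that each runs after its prerequisites. */
final class ActivityScheduler {
    // activity -> activities that must complete before it can run
    private final Map<String, List<String>> prerequisites = new LinkedHashMap<>();

    void add(String activity, String... needs) {
        List<String> own = prerequisites.computeIfAbsent(activity, k -> new ArrayList<>());
        for (String n : needs) {
            prerequisites.computeIfAbsent(n, k -> new ArrayList<>());
            own.add(n);
        }
    }

    /** Kahn's algorithm; throws if the activities depend on each other cyclically. */
    List<String> schedule() {
        Map<String, Integer> pending = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        for (String a : prerequisites.keySet()) {
            pending.put(a, prerequisites.get(a).size());
            dependents.put(a, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> e : prerequisites.entrySet()) {
            for (String n : e.getValue()) {
                dependents.get(n).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String next = ready.remove();
            order.add(next);
            for (String d : dependents.get(next)) {
                if (pending.merge(d, -1, Integer::sum) == 0) {
                    ready.add(d);
                }
            }
        }
        if (order.size() != prerequisites.size()) {
            throw new IllegalStateException("cyclic dependency between activities");
        }
        return order;
    }

    public static void main(String[] args) {
        ActivityScheduler s = new ActivityScheduler();
        s.add("rewriteImplicitAsSet", "createVariableExpAS");
        s.add("createVariableExpAS", "resolveVariableName");
        s.add("resolveVariableName", "populateEnvironment");
        s.add("populateEnvironment", "createPackageAS");
        System.out.println(s.schedule());
    }
}
```
]]></preformat>
          <p>A static order like this only works when the dependencies are acyclic; the mutual dependency between AS construction and name resolution described above suggests that a finer-grained, per-element analysis would be needed in practice.</p>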
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Related Work</title>
      <p>To the best of our knowledge, there is no related work that aims to provide abstract descriptions to complement Xtext grammars so as to generate additional source code for GPLs. There is, however, work related to the individual research activities that may provide useful ideas for particular problems (though none of it targets Xtext tool generation or grammars).</p>
      <sec id="sec-5-1">
        <title>NaBL and Stratego</title>
        <p>The most relevant related work on name resolution is by Konat et al. [<xref ref-type="bibr" rid="ref7">7</xref>], which introduces NaBL to specify name bindings for a language in terms of namespaces, scopes, site definitions and use definitions. This declarative language comprises a DSL to support name resolution when building AS trees; however, it is only conceived to be used by the Spoofax [<xref ref-type="bibr" rid="ref8">8</xref>] language workbench. Syntax rewrites can be specified in Spoofax by using a transformation language called Stratego [<xref ref-type="bibr" rid="ref8">8</xref>].</p>
      </sec>
      <sec id="sec-5-2">
        <title>JastAdd and JastEMF</title>
        <p>JastAdd [<xref ref-type="bibr" rid="ref9">9</xref>] is another approach related to our research topics; it provides a language (a flavour of attribute grammar) which offers not only syntactic rules to initially build AS trees but also language constructs to define inherited and synthesised attributes; these can be used to address name resolution concerns, as well as context-dependent syntax rewrites.</p>
        <p>From the name resolution point of view, NaBL provides a more convenient language with a higher level of abstraction. JastAdd, however, introduces constructs for syntax rewrites, though we know of no work that analyses dependencies between syntax rewrites and name resolution. JastEMF [<xref ref-type="bibr" rid="ref10">10</xref>] demonstrates the suitability of using JastAdd to create parsers which produce EMF-based models from textual inputs. Neither JastAdd nor JastEMF yet produces the high-quality textual editors required for this project.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Gra2Mol</title>
        <p>Cánovas et al. [<xref ref-type="bibr" rid="ref11">11</xref>] propose Gra2Mol to obtain models from source code. Their tool facilitates the creation of AS models from CS models produced by a parser, in this case by means of a domain-specific transformation language (DSTL). Despite the accepted convenience of DSTLs for this task, Gra2Mol uses a bespoke query language which is fragile in scenarios where the structure to be queried changes. For instance, the CS metamodel (the source of a transformation) frequently changes when incrementally building support for a language. Gra2Mol also does not generate a textual editor for the target language.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Plan for evaluation and validation</title>
      <p>The following metrics and evaluation methods will be used to validate the contributions of the research.</p>
      <p>To validate the contribution of the DSLs (and source generators), lines of code will be measured and compared: the Java sources which currently need to be hand-written versus the new descriptions based on those DSLs.</p>
      <p>To validate the contribution that a dependency analysis might provide, the same test suites will be executed using both implementations (with and without the dependency analysis), and relevant hypothesis testing will be performed to verify whether there is a significant improvement in the measured execution times.</p>
    </sec>
    <sec id="sec-7">
      <title>Preliminary work and current status</title>
      <sec id="sec-7-1">
        <title>Complete OCL documents</title>
        <p>
          Rather than pursuing the aforementioned DSLs immediately, a more general language will be
used as part of the first prototype. In this case, the descriptions are given in the
form of Complete OCL documents [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The reasons behind this decision are as
follows.
        </p>
        <list list-type="bullet">
          <list-item>
            <p>It is not yet clear how expressive a language must be to represent complex scenarios for name resolution or syntax rewrites. We hypothesise that a GPL like OCL may provide enough flexibility.</p>
          </list-item>
          <list-item>
            <p>The source code generated by Xtext is Java. The Eclipse OCL project includes an OCL2Java code generator, so complex scenarios described in OCL can be translated to Java.</p>
          </list-item>
          <list-item>
            <p>One of the goals proposed by the sponsor is producing some parts of the OMG OCL specification [<xref ref-type="bibr" rid="ref1">1</xref>], which currently uses OCL to specify CS2AS descriptions. By using OCL to drive part of the Eclipse OCL implementation, it could be reused to drive part of the OMG OCL specification.</p>
          </list-item>
        </list>
        <p>The aforementioned DSLs will still need to be investigated. However, instead of directly producing Java code, their source code generators will produce Complete OCL documents. Figure 4 depicts the whole generative process.</p>
        <p>One of the activities mentioned consists of specifying CS2AS descriptions and generating source code from them. Assuming that Complete OCL documents will be used to describe how CS and AS could be bridged, Listing 1.4 shows an example related to an OCL language concept:</p>
        <preformat>context VariableExpCS
def : ast(env : env::Environment) : ocl::VariableExp =
  let refVariable : ocl::Variable = env.lookupVariable(self.name)
  in ocl::VariableExp {
       referredVariable = refVariable,
       type = refVariable.type
     }</preformat>
        <sec id="sec-7-1-4">
          <p>Listing 1.4. Complete OCL based CS2AS description</p>
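          <p>For comparison, a hand-written Java counterpart of Listing 1.4 might look as follows. All class names (Variable, VariableExp, LookupEnvironment, VariableExpCS, Cs2AsDemo) are hypothetical stand-ins chosen for this sketch; it is not the output of the OCL2Java generator, and the error raised for an unresolved name is an assumption, since Listing 1.4 does not say what happens in that case.</p>
          <preformat><![CDATA[
```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-ins for the AS classes named in Listing 1.4. */
final class Variable {
    final String name;
    final String type;

    Variable(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

final class VariableExp {
    final Variable referredVariable;
    final String type;

    VariableExp(Variable referredVariable, String type) {
        this.referredVariable = referredVariable;
        this.type = type;
    }
}

/** Hypothetical environment offering the lookupVariable operation used in Listing 1.4. */
final class LookupEnvironment {
    private final Map<String, Variable> variables = new HashMap<>();

    LookupEnvironment declare(Variable v) {
        variables.put(v.name, v);
        return this;
    }

    Variable lookupVariable(String name) {
        Variable v = variables.get(name);
        if (v == null) {
            // Assumption: an unresolved name is reported as an error.
            throw new IllegalArgumentException("unresolved variable: " + name);
        }
        return v;
    }
}

/** CS node: holds only the name that appeared in the source text. */
final class VariableExpCS {
    final String name;

    VariableExpCS(String name) {
        this.name = name;
    }

    /** Java counterpart of the ast(env) definition in Listing 1.4. */
    VariableExp ast(LookupEnvironment env) {
        Variable refVariable = env.lookupVariable(name);
        return new VariableExp(refVariable, refVariable.type);
    }
}

final class Cs2AsDemo {
    public static void main(String[] args) {
        LookupEnvironment env = new LookupEnvironment().declare(new Variable("self", "Person"));
        VariableExp exp = new VariableExpCS("self").ast(env);
        System.out.println(exp.referredVariable.name + " : " + exp.type);
    }
}
```
]]></preformat>
          <p>Even for this small concept, the Java version needs several supporting classes, whereas the Complete OCL description states the mapping in seven lines.</p>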
        </sec>
      </sec>
      <sec id="sec-7-2">
        <title>Name resolution description</title>
        <p>Some progress has been made on how name resolution descriptions will be expressed; Complete OCL documents will also be used for this task (details of the source code generator are omitted due to space limitations). These documents will describe, for instance, how named AS elements are contributed to an environment so that those elements can be found by name later on. Listing 1.5 shows an example of how a Package contributes named elements (nested packages and types) to the environment.</p>
        <preformat>context Package
def : _env(env : env::Environment) : env::Environment =
  env.nestedEnv()
     .addElements(self.nestedPackage)
     .addElements(self.ownedType)</preformat>
        <sec id="sec-7-2-1">
          <p>Listing 1.5. Complete OCL based Name Resolution description</p>
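          <p>The environment operations used in Listing 1.5 can be pictured with a small Java sketch. The classes below (NamedElement, Environment, PackageDecl) and the lookup operation are assumptions made for illustration: nestedEnv() is taken to open an inner scope that falls back to its parent, and addElements() to register elements by name.</p>
          <preformat><![CDATA[
```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical named AS element (a package or a type). */
final class NamedElement {
    final String name;

    NamedElement(String name) {
        this.name = name;
    }
}

/** Hypothetical nested environment; not the Eclipse OCL implementation. */
final class Environment {
    private final Environment parent; // null for the outermost scope
    private final Map<String, NamedElement> elements = new LinkedHashMap<>();

    Environment() {
        this(null);
    }

    private Environment(Environment parent) {
        this.parent = parent;
    }

    /** Opens an inner scope whose lookups fall back to this environment. */
    Environment nestedEnv() {
        return new Environment(this);
    }

    /** Contributes elements to this scope; returns this so calls can be chained. */
    Environment addElements(List<NamedElement> added) {
        for (NamedElement e : added) {
            elements.putIfAbsent(e.name, e);
        }
        return this;
    }

    /** Searches the innermost scope first, then the enclosing scopes. */
    Optional<NamedElement> lookup(String name) {
        for (Environment env = this; env != null; env = env.parent) {
            NamedElement found = env.elements.get(name);
            if (found != null) {
                return Optional.of(found);
            }
        }
        return Optional.empty();
    }
}

final class PackageDecl {
    final List<NamedElement> nestedPackage;
    final List<NamedElement> ownedType;

    PackageDecl(List<NamedElement> nestedPackage, List<NamedElement> ownedType) {
        this.nestedPackage = nestedPackage;
        this.ownedType = ownedType;
    }

    /** Java counterpart of the _env definition in Listing 1.5. */
    Environment env(Environment outer) {
        return outer.nestedEnv().addElements(nestedPackage).addElements(ownedType);
    }

    public static void main(String[] args) {
        Environment root = new Environment().addElements(Arrays.asList(new NamedElement("String")));
        PackageDecl pkg = new PackageDecl(
                Arrays.asList(new NamedElement("sub")),
                Arrays.asList(new NamedElement("Person")));
        Environment inner = pkg.env(root);
        System.out.println(inner.lookup("Person").isPresent());
        System.out.println(root.lookup("Person").isPresent());
    }
}
```
]]></preformat>
          <p>Elements contributed by the package are visible only through the nested environment, while names from enclosing scopes remain reachable from inside it.</p>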
          <sec id="sec-7-2-1-1">
            <title>Future work</title>
            <list list-type="bullet">
              <list-item>
                <p>Finish the source code generators for CS2AS and name resolution descriptions.</p>
              </list-item>
              <list-item>
                <p>Design a solution for syntax rewrites, including a source code generator.</p>
              </list-item>
              <list-item>
                <p>Improve the generated source code based on dependency analysis of the CS2AS, name resolution and syntax rewrite activities.</p>
              </list-item>
              <list-item>
                <p>Design DSLs with a high level of abstraction and implement the corresponding generators to produce Complete OCL documents.</p>
              </list-item>
              <list-item>
                <p>Evaluate the contribution of the final solutions as described in this paper.</p>
              </list-item>
            </list>
          </sec>
        </sec>
        <sec id="sec-7-2-2">
          <title>Acknowledgement</title>
          <p>We gratefully acknowledge the support of the EPSRC via the LSCITS initiative, and the sponsor Willink Transformations Ltd.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. OMG. OCL, V2.4. http://www.omg.org/spec/OCL/2.4, January
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. OMG. QVT, V1.2. http://www.omg.org/spec/QVT/1.2, May
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Eclipse Platform. On-Line: http://www.eclipse.org/,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Eysholdt</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Behrens</surname>
          </string-name>
          . Xtext:
          <article-title>Implement your language faster than the quick and dirty way</article-title>
          .
          <source>In OOPSLA</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Edward D.</given-names>
            <surname>Willink</surname>
          </string-name>
          .
          <article-title>Re-engineering Eclipse MDT/OCL for Xtext</article-title>
          .
          <source>Electronic Communications of the EASST</source>
          ,
          <volume>36</volume>
          :
          <fpage>1</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Alfred V.</given-names>
            <surname>Aho</surname>
          </string-name>
          , Monica S. Lam, Ravi Sethi, and
          <string-name>
            <given-names>Jeffrey D.</given-names>
            <surname>Ullman</surname>
          </string-name>
          . Compilers: principles, techniques, &amp; tools. Pearson Education Inc.,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Gabriel</given-names>
            <surname>Konat</surname>
          </string-name>
          , Lennart Kats, Guido Wachsmuth, and
          <string-name>
            <given-names>Eelco</given-names>
            <surname>Visser</surname>
          </string-name>
          .
          <article-title>Declarative name binding and scope rules</article-title>
          .
          <source>In Software Language Engineering</source>
          , volume
          <volume>7745</volume>
          . Springer Berlin Heidelberg,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Lennart</given-names>
            <surname>Kats</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eelco</given-names>
            <surname>Visser</surname>
          </string-name>
          .
          <article-title>The Spoofax language workbench: rules for declarative specification of languages and IDEs</article-title>
          .
          <source>In ACM Sigplan Notices</source>
          , volume
          <volume>45</volume>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
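          <string-name>
            <given-names>Torbjörn</given-names>
            <surname>Ekman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Görel</given-names>
            <surname>Hedin</surname>
          </string-name>
          .
          <article-title>Modular name analysis for Java using JastAdd</article-title>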
          .
          <source>In Generative and Transformational Techniques in SE</source>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. Christo Burger, Sven Karol, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Wende</surname>
          </string-name>
          .
          <article-title>Applying attribute grammars for metamodel semantics</article-title>
          .
          <source>In Proceedings of the International Workshop on Formalization of Modeling Languages, FML '10</source>
          , pages
          <fpage>1:1</fpage>-<lpage>1:5</lpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Javier</given-names>
            <surname>Cánovas Izquierdo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jesús</given-names>
            <surname>García Molina</surname>
          </string-name>
          .
          <article-title>Extracting models from source code in software modernization</article-title>
          .
          <source>Software &amp; Systems Modeling</source>
          ,
          <volume>13</volume>
          :
          <fpage>1</fpage>-<lpage>22</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>