<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>JAVACHECK: A Domain-Specific Language for the static analysis of Java code</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sara Pérez-Soler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan de Lara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Modelling &amp; Software Engineering Research Group</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The increasing complexity of software systems has raised the need for code analysis tools to assess their quality. However, these tools offer predefined metrics or evaluation criteria, which are frequently hard to extend or modify. To address this, we have developed JAVACHECK, a Domain-Specific Language for defining expected properties of Java code bases. JAVACHECK can be used in a variety of scenarios related to quality assurance: to define expected code styles (e.g., naming conventions), specify programming conventions (e.g., private attributes), detect code smells possibly indicating errors (e.g., equals method with no hashCode), and detect patterns (e.g., uses of Singleton) or requirements demanded in a project (e.g., a class whose name is a synonym of “Professor”). Index Terms: Domain-Specific Languages, Source code analysis, Quality</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Software projects are growing in complexity and size
to address the requirements of today’s systems. Software is
typically developed by (sometimes large) teams of programmers
with dissimilar skills. Hence, it is common practice to
use tools to check code quality or to help enforce company
or project code standards [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. However, sometimes these
tools are rigid, or difficult to adapt and extend.
      </p>
      <p>
        To improve this situation, we have created a Domain-Specific
Language (DSL) called JAVACHECK. The language permits
expressing predicates to be evaluated over the source code
bases of Java projects. The DSL is flexible: it allows the
expression of style and programming conventions, and can be
used to search for occurrences of programming idioms and
patterns and to express code smells [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that may indicate
potential problems. JAVACHECK is connected with services
to detect synonyms in several languages, which permits its
use to specify expected domain requirements (e.g., to partially
automate the correction of programming exercises). The DSL
has been created using model-based technology (EMF and
Xtext) and is integrated within Eclipse. Hence, it shows the
potential of Model-Driven Engineering in the programming
domain.
      </p>
      <p>Paper organization. Sec. II overviews our approach,
explaining its different parts. Sec. III describes tool support and some
initial experiments. Sec. IV compares our approach with related work and
Sec. V finishes with the conclusions and future work.</p>
    </sec>
    <sec>
      <title>II. APPROACH</title>
      <p>[Fig. 1: general architecture of the approach. Recoverable labels: Quality engineer, JavaCheck, MM, «conforms to».]</p>
      <p>Fig 1 shows the general architecture of our approach. We
have created a DSL called JAVACHECK, which can be used to
define predicates that Java projects should fulfil.</p>
      <p>Predicates can be used to express general quality properties
(e.g., ensure all attributes of a class are private), accepted
Java style guidelines (e.g., class names in upper camel case,
constant names in uppercase), project-specific guidelines (e.g.,
maximum number of classes in a package),
application-specific checks (e.g., there should be a class whose name
is a synonym of “Machine”), or smells of possible errors (e.g., a class
redefining method equals, but not hashCode). JAVACHECK has a
textual syntax and has been defined through a meta-model.</p>
      <p>The predicates expressed with JAVACHECK are compiled into
Java. This Java code uses a library we have built, which
offers services to parse Java code into an Abstract Syntax Tree
(AST) and to issue queries to WordReference<sup>1</sup> to obtain lists
of synonyms in both English and Spanish.</p>
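      <p>As an illustration of what such compiled code could look like, the following minimal sketch checks the equals-without-hashCode smell mentioned above. It is not the code that JAVACHECK generates: the records MethodNode and ClassNode are hypothetical stand-ins for AST nodes, and the class EqualsHashCodeRule and its methods are names assumed for this example only. It requires Java 16 or later.</p>
      <preformat><![CDATA[
```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of code a rule compiler could emit for one predicate. */
final class EqualsHashCodeRule {

    /** Minimal stand-ins for AST nodes (assumed; not the types of the real library). */
    record MethodNode(String name, String returnType, List<String> parameterTypes) {}
    record ClassNode(String name, List<MethodNode> methods) {}

    // Methods named equals, returning boolean, with a single Object parameter.
    static final Predicate<MethodNode> EQUALS = m ->
            m.name().equals("equals")
                    && m.returnType().equals("boolean")
                    && m.parameterTypes().equals(List.of("Object"));

    // Methods named hashCode, returning int, with no parameters.
    static final Predicate<MethodNode> HASH_CODE = m ->
            m.name().equals("hashCode")
                    && m.returnType().equals("int")
                    && m.parameterTypes().isEmpty();

    /** True when exactly one method of the class matches the predicate. */
    static boolean hasOne(ClassNode c, Predicate<MethodNode> p) {
        return c.methods().stream().filter(p).count() == 1;
    }

    /** Names of classes that declare equals but fail to declare hashCode. */
    static List<String> violations(List<ClassNode> classes) {
        return classes.stream()
                .filter(c -> hasOne(c, EQUALS))      // filter: classes the rule applies to
                .filter(c -> !hasOne(c, HASH_CODE))  // keep those failing the required property
                .map(ClassNode::name)
                .toList();
    }

    public static void main(String[] args) {
        MethodNode eq = new MethodNode("equals", "boolean", List.of("Object"));
        MethodNode hc = new MethodNode("hashCode", "int", List.of());
        List<ClassNode> project = List.of(
                new ClassNode("Point", List.of(eq, hc)),
                new ClassNode("Plane", List.of(eq)),
                new ClassNode("Element", List.of()));
        System.out.println(violations(project)); // prints [Plane]
    }
}
```
]]></preformat>
      <p>In this sketch only Plane is reported: Point declares both methods, and Element declares neither, so the rule does not apply to it.</p>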
      <p>Fig. 2 shows a small part of our meta-model. RuleSet is the
root class, and contains a list of project names to be checked
and a set of sentences to check on them. There are two types
of sentences: the rules that will be evaluated, and intermediate
variables that store collections of elements with certain
properties. All sentences have a type element, which can be
File, Package, Interface, Class, Enum, Method or Attribute, and a
clause that needs to be satisfied. The satisfy clause contains all
properties that the element must comply with. The rules have
a quantifier (all, exist or one) and a filter, with the same structure
as the satisfy clause. The rules can also reference variables.</p>
      <p><sup>1</sup> http://www.wordreference.com/</p>
      <preformat>1 Equals: Method satisfy name="equals" and return type=Primitive.boolean and
2 parameter size=1 types=["Object"];
3 HashCode: Method satisfy name="hashCode" and return type=Primitive.int and parameter size=0;
4
5 all Class which have { one Method in Equals }
6 satisfy have { one Method in HashCode };</preformat>
      <p>Listing 1: JAVACHECK program to detect classes with equals
but no hashCode</p>
      <p>Listing 1 shows some simple JAVACHECK sentences. The first
sentence collects all methods named equals with a parameter
of type Object and return type boolean, in a collection variable
named Equals. The second collection, named HashCode, contains
all methods named hashCode, without parameters and with integer
return type. Lines 5-6 show a rule that checks that all classes
with one method in the Equals collection also have one method
in the HashCode collection. Overall, violations of this rule may
signal potential problems in the Java code.</p>
      <p>JAVACHECK is evaluated over the AST of Java code. The
AST is a tree representation of the syntactic structure of source
code. We have programmed a library to create and explore the
AST, and evaluate all the sentences. The library defines all
the functionality of the static analyser, and the code generator
needs only to synthesize code for the specific sentences by
calling the library. Once the code for all sentences is generated, they can
be evaluated by scanning all nodes of the AST and checking the
properties.</p>
      <preformat>all Class satisfy name type= upper camel case;</preformat>
      <p>Listing 2: JAVACHECK naming convention rule</p>
    </sec>
    <sec id="sec-2">
      <p>Listing 2 shows another example to check a naming
convention for classes. In particular, it checks that all class names
are written in upper camel case. In this example the analyser
first obtains all class declaration nodes in the ASTs of the
project, and then checks that all names of these nodes are in
upper camel case.</p>
    </sec>
    <sec id="sec-3">
      <title>III. TOOL SUPPORT AND EXPERIMENTATION</title>
      <p>We have created an Eclipse plugin for JAVACHECK using
Xtext and Xtend. The plugin allows us to express characteristics
of Java programs, and then reports the result of the analysis.</p>
      <p>Listing 3 shows an example of a JAVACHECK file with 4 rules.
First, the project or projects to be analysed should be indicated
(line 1). All projects to be analysed are required to be in the
Eclipse workspace. A “*” in the name makes JAVACHECK take
all projects in the workspace.</p>
      <p>Following the project name, a JAVACHECK program contains
the sentences to be checked. In the listing we show some
examples. The first rule (lines 3-5) checks that all attributes
that are not static and final (i.e., all attributes that are not
constant) must be private or protected. The second rule (line
7) states that every method must have a JavaDoc comment with
@parameter and @return tags. The third rule (lines 9-11) checks
that the project has one class named User or a synonym (in
English), and this class must have an attribute named address.
For this rule WordReference is used to obtain the synonyms.
Finally, the last rule checks that all abstract classes have some
child class.</p>
      <preformat>1 Projects Name: ;
2
3 all Attribute
4 which is not modified with [static and final]
5 satisfy is modified with [private or protected];
6
7 all Method satisfy JavaDoc @parameter @return;
8
9 one Class satisfy name like "User", English and have {
10 one Attribute satisfy name="address"
11 };
12
13 all Class which is modified with [abstract] satisfy is superclass;</preformat>
      <p>Listing 3: Example JAVACHECK program</p>
      <preformat>all class which modifiers: [ (abstract) ] satisfy is superclass [1.. ]
Checked.....ERROR
PASS:
These elements do not satisfy is superclass [1.. ]:
In file D:\Workspace\Evaluate\src\abstractClass\Plane.java the class Plane (line: 3)

FAIL:
These elements satisfy is superclass [1.. ]:
In file D:\Workspace\Evaluate\src\abstractClass\Element.java the class Element (line: 3)
is super of:
In file D:\Workspace\Evaluate\src\restOfClass\Point.java the class Point (line: 5)</preformat>
      <p>Listing 4: Example of JAVACHECK report</p>
      <p>Listing 4 shows the JAVACHECK report produced by the last rule
of Listing 3. Currently, the report is a text file showing whether the
rule is met or not (in this case it is not), and then listing all
the elements that pass and that do not pass the rule.</p>
    </sec>
    <sec id="sec-4">
      <title>A. Experimentation</title>
      <p>We present two preliminary experiments with JAVACHECK.
The first one assesses its expressivity and its usefulness
for detecting problems in the code. The second one
checks its scalability.</p>
      <p>In the first experiment we use JAVACHECK as a way to
semi-automatically assess student projects related to the creation
of an information system for an antiquarian. We used three
types of rules for validation: style, programming and
domain-specific. Some style rules included: every file contains only one class,
interface or enumeration; every file has fewer than 2000
lines of code (LOC); method bodies should be shorter than 30
LOC; every attribute that is not constant must be written in
lower camel case, while constant attributes must be in upper
case; the names of enumerations, classes and interfaces must
be in upper camel case; every class, method, interface and
enumeration must have a JavaDoc comment; every package
must have a Java file (i.e., must not be empty), among others
(a total of 15 rules).</p>
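      <p>The naming rules above reduce to pattern checks on identifiers. The following is a minimal sketch, assuming regular-expression definitions of upper camel case, lower camel case and upper case that the paper does not specify. The class NamingConventions and its patterns are hypothetical, and the camel-case patterns are deliberately loose (for instance, an all-capitals name also matches the upper camel case pattern).</p>
      <preformat><![CDATA[
```java
import java.util.regex.Pattern;

/** Hypothetical identifier checks for the naming rules; patterns are assumptions. */
final class NamingConventions {

    private static final Pattern UPPER_CAMEL = Pattern.compile("[A-Z][a-zA-Z0-9]*");
    private static final Pattern LOWER_CAMEL = Pattern.compile("[a-z][a-zA-Z0-9]*");
    private static final Pattern UPPER_CASE = Pattern.compile("[A-Z][A-Z0-9]*(_[A-Z0-9]+)*");

    static boolean isUpperCamelCase(String name) {
        return UPPER_CAMEL.matcher(name).matches();
    }

    static boolean isLowerCamelCase(String name) {
        return LOWER_CAMEL.matcher(name).matches();
    }

    static boolean isUpperCase(String name) {
        return UPPER_CASE.matcher(name).matches();
    }

    /** Constants must be in upper case; all other attributes in lower camel case. */
    static boolean attributeNameOk(String name, boolean isConstant) {
        return isConstant ? isUpperCase(name) : isLowerCamelCase(name);
    }

    public static void main(String[] args) {
        System.out.println(isUpperCamelCase("WorkOfArts"));      // true
        System.out.println(isUpperCamelCase("workOfArts"));      // false
        System.out.println(attributeNameOk("MAX_PRICE", true));  // true
        System.out.println(attributeNameOk("MaxPrice", false));  // false
    }
}
```
]]></preformat>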
      <p>Programming rules included: every abstract class must have
some children; all interfaces must be extended by another
interface or implemented by some class; every class that
implements Comparable must override the equals and hashCode
methods; and there is no method that returns a value of
type Object. Regarding domain-specific rules, after reading the
requirements, we included these rules: there is one class called
Item or a synonym, which is abstract and public, is extended 3
or 4 times and has an identifier attribute of type integer or long,
a name or description, a date and a price. The project must
also have three classes, with names synonymous with Small, Bulky
and WorkOfArts, all extending Item.</p>
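      <p>To make one of these programming rules concrete, the sketch below checks that a class implementing Comparable declares both equals and hashCode. Note the swapped-in technique: it uses Java reflection on loaded classes, whereas JAVACHECK analyses source-code ASTs. The class ComparableContractCheck and the sample classes Good and Bad are hypothetical names introduced for this example.</p>
      <preformat><![CDATA[
```java
/** Hypothetical reflection-based check; JavaCheck itself works on ASTs, not on loaded classes. */
final class ComparableContractCheck {

    /** True if the type itself (not a supertype) declares the given method. */
    static boolean declares(Class<?> type, String name, Class<?>... parameterTypes) {
        try {
            type.getDeclaredMethod(name, parameterTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Rule: a type implementing Comparable must declare equals(Object) and hashCode(). */
    static boolean satisfiesRule(Class<?> type) {
        if (!Comparable.class.isAssignableFrom(type)) {
            return true; // the filter is not met, so the rule does not apply
        }
        return declares(type, "equals", Object.class) && declares(type, "hashCode");
    }

    /** Sample class that follows the rule. */
    static final class Good implements Comparable<Good> {
        final int id;
        Good(int id) { this.id = id; }
        @Override public int compareTo(Good o) { return Integer.compare(id, o.id); }
        @Override public boolean equals(Object o) { return o instanceof Good && ((Good) o).id == id; }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    /** Sample class that violates the rule. */
    static final class Bad implements Comparable<Bad> {
        final int id;
        Bad(int id) { this.id = id; }
        @Override public int compareTo(Bad o) { return Integer.compare(id, o.id); }
    }

    public static void main(String[] args) {
        Class<?>[] types = {Good.class, Bad.class, Object.class};
        for (Class<?> type : types) {
            System.out.println(type.getSimpleName() + " -> " + satisfiesRule(type));
        }
        // Good -> true, Bad -> false, Object -> true (Object is not Comparable)
    }
}
```
]]></preformat>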
      <p>We were able to successfully encode these rules in
JAVACHECK, hence showing good expressivity. Regarding
usefulness, we found several problems in the analysed code. The
most prominent ones included a lack of JavaDoc comments,
many methods over 30 LOC, abstract classes with no children,
seven methods returning Object and a class Item with no date.
As we were able to find these defects, we concluded that
JAVACHECK was useful for our purpose.</p>
    </sec>
    <sec id="sec-5">
      <p>To measure performance, we ran the previous 15 style
rules on a larger project, org.eclipse.jdt.core of Eclipse.
This project has 1,443 files
with 1,442 classes, 238 interfaces, 17 enumerations, 24,290
methods and 12,709 attributes. The check took approximately
6 minutes, which we consider reasonable for large projects.</p>
    </sec>
    <sec id="sec-6">
      <title>IV. RELATED WORK</title>
      <p>
        Model-Driven Engineering (MDE) has been used to solve
different problems in the programming domain, like reverse
engineering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], repository mining [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] or comparing open
source software using quality models [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Often, the
code needs to be represented as a model conforming to a
meta-model, so that it can be queried and processed using
model management tools. However, this has the drawback of
requiring long pre-processing times [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. To solve this
issue, in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] the Epsilon model connectivity layer was extended
so that the Epsilon model management languages can be run
on Java programs. Our approach goes in this direction, as
JAVACHECK executes on Java ASTs. However, our approach is
more direct, as the generated code can directly access the AST
with no need for an intermediate layer. Moreover, JAVACHECK
is a DSL specifically designed for querying Java ASTs.
      </p>
      <p>Regarding code analysis tools, there are two main types:
static and dynamic. The former analyse the code without
running the program, and can be used in the earliest phases
of programming. The latter, typically testing tools, can
be used at the end of the coding.</p>
      <p>
        There are many tools able to assess code quality in a static
way [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. PMD [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] works over Java and other languages, and
can detect potential problems like empty try/catch/finally/switch
statements, dead code, overcomplicated expressions or duplicate
code. New rules can be added by coding them in Java or
XPath [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. FindBugs [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] focuses on finding coding errors
and supports only Java. It cannot be extended with new rules
and works over Java bytecode. CheckStyle [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] focuses on
analysing style conventions. The tool is highly configurable
through an XML file and allows creating new rules by coding them
in Java and using ASTs. SonarQube [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] uses various static
analysis tools like PMD, CheckStyle or FindBugs to obtain
metrics that help improve the quality of the source code. It
can show information about the architecture, design, duplicate
code, programming rules, possible errors and their possible
solutions. Finally, Semmle QL [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is a query language
over code. This language is based on Datalog, so it needs to
process the code first, to obtain a relational representation.
      </p>
      <p>In conclusion, with respect to MDE approaches to solve
problems in the programming domain, JAVACHECK can be
executed on ASTs more directly, due to its compilation approach.
With respect to existing source code analysis tools, JAVACHECK
is a DSL that permits a high customization of queries, with
no need for low-level coding based on XPath or ASTs.</p>
    </sec>
    <sec id="sec-7">
      <title>V. CONCLUSIONS AND FUTURE WORK</title>
      <p>We have presented JAVACHECK, a DSL for expressing rules to
be checked on Java projects. We have built an Eclipse plugin
which permits the integration of JAVACHECK with the Java IDE,
and performed some initial experiments showing promising
results.</p>
      <p>In the future, we want to improve tooling, e.g., enhancing
the reporting facility. We would also like to extend the
expressiveness of the language to consider the analysis of method
bodies, which currently cannot be analysed. Finally, we are
considering adding the possibility to define quick-fixes, to be
fired when some rule fails.</p>
    </sec>
    <sec id="sec-8">
      <title>ACKNOWLEDGEMENTS.</title>
      <p>Work funded by the Spanish MINECO (TIN2014-52129-R)
and the R&amp;D programme of Madrid (S2013/ICE-3006).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Brunelière</surname>
          </string-name>
          , J. Cabot, G. Dupé, and
          <string-name>
            <given-names>F.</given-names>
            <surname>Madiot</surname>
          </string-name>
          .
          <article-title>Modisco: A model driven reverse engineering framework</article-title>
          .
          <source>Information &amp; Software Technology</source>
          ,
          <volume>56</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1012</fpage>
          -
          <lpage>1032</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] CheckStyle. http://checkstyle.sourceforge.net/.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] FindBugs. http://findbugs.sourceforge.net/.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Fontana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Braione</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zanoni</surname>
          </string-name>
          .
          <article-title>Automatic detection of bad smells in code: An experimental assessment</article-title>
          .
          <source>Journal of Object Technology</source>
          ,
          <volume>11</volume>
          (
          <issue>2</issue>
          ):5:
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fowler</surname>
          </string-name>
          .
          <source>Refactoring - Improving the Design of Existing Code</source>
          . Addison-Wesley
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>García-Domínguez</surname>
          </string-name>
          and D. S. Kolovos.
          <article-title>Models from code, or code as models?</article-title>
          <source>In Proc. OCL@MODELS</source>
          , volume
          <volume>1756</volume>
          <source>of CEUR Workshop Proceedings</source>
          , pages
          <fpage>137</fpage>
          -
          <lpage>148</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hajiyev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Verbaere</surname>
          </string-name>
          , O. de Moor, and
          <string-name>
            <given-names>K. D.</given-names>
            <surname>Volder</surname>
          </string-name>
          .
          <article-title>Codequest: querying source code with datalog</article-title>
          .
          <source>In Proc. OOPSLA</source>
          <year>2005</year>
          , pages
          <fpage>102</fpage>
          -
          <lpage>103</lpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] PMD. https://pmd.github.io/.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Ruscio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Kolovos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Korkontzelos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Matragkas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Vinju</surname>
          </string-name>
          .
          <article-title>Supporting custom quality models to analyse and compare open-source software</article-title>
          .
          <source>In Proc. QUATIC</source>
          <year>2016</year>
          , pages
          <fpage>94</fpage>
          -
          <lpage>99</lpage>
          . IEEE Computer Society,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Scheidgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Fischer</surname>
          </string-name>
          .
          <article-title>Creating and analyzing source code repository models - A model-based approach to mining software repositories</article-title>
          .
          <source>In Proc. MODELSWARD</source>
          , pages
          <fpage>329</fpage>
          -
          <lpage>336</lpage>
          . SciTePress,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Semmle</surname>
          </string-name>
          . https://semmle.com/products/semmle-ql/.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] SonarQube. https://www.sonarqube.org/.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] XPath. https://www.w3schools.com/xml/xpath_intro.asp.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>