=Paper= {{Paper |id=Vol-2019/poster_5 |storemode=property |title=JAVACHECK: A Domain Specific Language for the Static Analysis of Java Code |pdfUrl=https://ceur-ws.org/Vol-2019/posters_5.pdf |volume=Vol-2019 |authors=Sara Pérez-Soler,Juan de Lara |dblpUrl=https://dblp.org/rec/conf/models/Perez-SolerL17 }} ==JAVACHECK: A Domain Specific Language for the Static Analysis of Java Code== https://ceur-ws.org/Vol-2019/posters_5.pdf
    JAVACHECK: A Domain Specific Language for
           the static analysis of Java code
                                                  Sara Pérez-Soler, Juan de Lara
                                        Modelling & Software Engineering Research Group
                                                          http://miso.es
                                                 Computer Science Department
                                           Universidad Autónoma de Madrid (Spain)
                                           e-mail: {sara.perezs, juan.delara}@uam.es


   Abstract—The increasing complexity of software systems has
raised the need for code analysis tools to assess their quality.
However, these tools offer predefined metrics or evaluation
criteria, which are frequently hard to extend or modify.
   To address this limitation, we have developed JAVACHECK, a
Domain-Specific Language targeted to define expected properties of
Java code bases. JAVACHECK can be used in a variety of scenarios
related to quality assurance: to define expected code styles (e.g.,
naming conventions), specify programming conventions (e.g.,
private attributes), detect code smells possibly indicating errors
(e.g., equals method with no hashCode), and detect patterns (e.g.,
uses of Singleton) or requirements demanded in a project (e.g.,
a class whose name is a synonym of “Professor”).
   Index Terms—Domain-Specific Languages, Source code analysis,
Quality

[Fig. 1: Overview of our approach. Elements shown: JavaCheck
meta-model (MM), JavaCheck rules («conforms to» the MM), quality
engineer, code generator, rules' logic, JavaCheck runtime, Java
runnable, AST, Java project, WordReference, report.]

                       I. INTRODUCTION
   Software projects are increasing their complexity and size
to address the requirements of today’s systems. Software is
typically developed by (sometimes large) teams of programmers
with dissimilar skills. Hence, it is common practice to
use tools to check code quality or help in enforcing company
or project code standards [2], [12]. However, sometimes these
tools are rigid, or difficult to adapt and extend.
   To improve this situation, we have created a Domain Specific
Language (DSL) called JAVACHECK. The language permits
expressing predicates to be evaluated over the source code
bases of Java projects. The DSL is flexible and allows the
expression of style and programming conventions, can be
used to search for occurrences of programming idioms and
patterns, and to express code smells [5] possibly indicating
some potential problem. JAVACHECK is connected with services
to detect synonyms in several languages, which permits its
use to specify expected domain requirements (e.g., to partially
automate the correction of programming exercises). The DSL
has been created using Model-based technology (EMF and
Xtext) and is integrated within Eclipse. Hence, it shows the
potential of Model-driven engineering in the programming
domain.
Paper organization. Sec. II overviews our approach, explaining
its different parts. Sec. III describes tool support and some
initial experiments. Sec. IV compares with related work and
Sec. V finishes with the conclusions and future work.

                       II. APPROACH
   Fig 1 shows the general architecture of our approach. We
have created a DSL called JAVACHECK, which can be used to
define predicates that Java projects should fulfil.
   Predicates can be used to express general quality properties
(e.g., ensure all attributes of a class are private), accepted
Java style guidelines (e.g., class names in upper camel case,
constant names in uppercase), project-specific guidelines (e.g.,
maximum number of classes in a package), application-specific
checks (e.g., there should be a class whose name is a
synonym of “Machine”), or smells of possible errors (e.g., a class
redefining method equals, but not hashCode). JAVACHECK has a
textual syntax and has been defined through a meta-model.
   The predicates expressed with JAVACHECK are compiled into
Java. This Java code uses a library we have built, which
offers services to parse Java code into an Abstract Syntax Tree
(AST), or to issue queries on Wordreference
(http://www.wordreference.com/), to obtain lists
of synonyms in both English and Spanish.
   Fig 2 shows a small part of our meta-model. RuleSet is the
root class, and contains a list of project names to be checked
and a set of sentences to check on them. There are two types
of sentences: the rules that will be evaluated, and intermediate
variables to store collections of elements that have some
properties. All sentences have a type element, which can be
File, Package, Interface, Class, Enum, Method or Attribute, and a
clause that needs to be satisfied. The satisfy clause contains all
properties that the element must comply with. The rules have
a quantifier (all, exist or one) and a filter, with the same structure
as the satisfy clause. The rules can also reference variables.
   Listing 1 shows some simple JAVACHECK sentences. The first
sentence collects all methods named equals with a parameter
of type Object and return type boolean, in a collection variable
named Equals. The second collection, named HashCode, contains
all methods named hashCode, without parameters and with integer
return type. Lines 5-6 show a rule that checks that all classes
with one method in the Equals collection also have one method
in the HashCode collection. Overall, violations of this rule may
signal potential problems in the Java code.

    1    Equals: Method satisfy name="equals" and return type=Primitive.boolean and
    2        parameter size=1 types=["Object"];
    3    HashCode: Method satisfy name="hashCode" and return type=Primitive.int and parameter size=0;
    4
    5    all Class which have { one Method in Equals }
    6    satisfy have { one Method in HashCode };

    Listing 1: JAVACHECK program to detect classes with equals
    but no hashCode

   JAVACHECK is evaluated over the AST of Java code. The
AST is a tree representation of the syntactic structure of source
code. We have programmed a library to create and explore the
AST, and evaluate all the sentences. The library defines all
the functionality of the static analyser, and the code generator
needs only to synthesize code for the specific sentences by
calling the library. When all sentences are generated, they can
be evaluated by scanning all nodes of the AST and checking the
properties.

    all Class satisfy name type= upper camel case;

    Listing 2: JAVACHECK naming convention rule

   Listing 2 shows another example to check a naming convention
for classes. In particular, it checks that all class names
are written in upper camel case. In this example the analyser
first obtains all class declaration nodes in the ASTs of the
project, and then checks that all names of these nodes are in
upper camel case.

             III. TOOL SUPPORT AND EXPERIMENTATION
   We have created an Eclipse plugin for JAVACHECK using
Xtext and Xtend. The DSL allows us to express characteristics
of Java programs, and then reports the result of the analysis.
   Listing 3 shows an example of a JAVACHECK file with 4 rules.
First, the project or projects to be analysed should be indicated
(line 1). All projects to be analysed are required to be in the
Eclipse workspace. A “*” in the name makes JAVACHECK take
all projects in the workspace.
   Following the project name, a JAVACHECK program contains
the sentences to be checked. In the listing we show some
examples. The first rule (lines 3-5) checks that all attributes
that are not static and final (i.e., all attributes that are not
constant) must be private or protected. The second rule (line
7) states that every method must have a JavaDoc comment with
@parameter and @return tags. The third rule (lines 9-11) checks
that the project has one class named User or a synonym (in
English), and this class must have an attribute named address.
For this rule WordReference is used to obtain the synonyms.
Finally, the last rule checks that all abstract classes have some
child class.

     1 Projects Name: *;
     2
     3 all Attribute
     4 which is not modified with [static and final]
     5 satisfy is modified with [private or protected];
     6
     7 all Method satisfy JavaDoc @parameter @return;
     8
     9 one Class satisfy name like "User", English and have {
    10    one Attribute satisfy name="address"
    11    };
    12
    13 all Class which is modified with [abstract] satisfy is superclass;

    Listing 3: Example JAVACHECK program

    all class which modifiers: [ (abstract) ] satisfy is superclass [1..*]
    Checked.....ERROR
    PASS:
     These elements do not satisfy is superclass [1..*]:
     - In file D:\Workspace\Evaluate\src\abstractClass\Plane.java the class Plane (line: 3)

    FAIL:
     These elements satisfy is superclass [1..*]:
     - In file D:\Workspace\Evaluate\src\abstractClass\Element.java the class Element (line: 3)
     is super of:
     In file D:\Workspace\Evaluate\src\restOfClass\Point.java the class Point (line: 5)

    Listing 4: Example of JAVACHECK report

   Listing 4 shows the JAVACHECK report produced by the last rule
of Listing 3. Currently, the report is a text file showing whether the
rule is met or not (in this case it is not), and then listing all
the elements that pass and that do not pass the rule.

A. Experimentation
   We present two preliminary experiments with JAVACHECK.
The first one is directed to assess expressivity, and usefulness
to detect problems in the code. The second one is directed to
check its scalability.
   In the first experiment we use JAVACHECK as a way to
semi-automatically assess student projects related to the creation
of an information system for an antiquarian. We used three
types of rules for validation: style, programming and
domain-specific. Some style rules included: every file contains only
one class, interface or enumeration; every file has fewer than 2000
lines of code (LOC); method bodies should be less than 30
LOC; every attribute that is not constant must be written in
lower camel case, while constant attributes must be in upper
case; the names of enumerations, classes and interfaces must
be in upper camel case; every class, method, interface and
enumeration must have a JavaDoc comment; every package
must have a Java file (i.e., must not be empty), among others
(a total of 15 rules).
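
   The naming-convention rules above (upper camel case for classes,
interfaces and enumerations, lower camel case for non-constant
attributes, upper case for constants) come down to checks on identifier
strings. A minimal Java sketch of such checks follows. It assumes that
regular expressions over ASCII identifiers are sufficient; the class
NamingConventions and its methods are hypothetical names used for
illustration and are not taken from the JAVACHECK library.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of JavaCheck): regex-based naming checks. */
final class NamingConventions {
    // Assumption: ASCII identifiers only; digits are allowed after the first letter.
    private static final Pattern UPPER_CAMEL = Pattern.compile("[A-Z][a-zA-Z0-9]*");
    private static final Pattern LOWER_CAMEL = Pattern.compile("[a-z][a-zA-Z0-9]*");
    // Upper case words separated by single underscores, e.g. MAX_SIZE.
    private static final Pattern UPPER_CASE = Pattern.compile("[A-Z][A-Z0-9]*(_[A-Z0-9]+)*");

    private NamingConventions() { }

    /** True if the name starts with an upper case letter and has no underscores. */
    static boolean isUpperCamelCase(String name) {
        return UPPER_CAMEL.matcher(name).matches();
    }

    /** True if the name starts with a lower case letter and has no underscores. */
    static boolean isLowerCamelCase(String name) {
        return LOWER_CAMEL.matcher(name).matches();
    }

    /** True if the name uses only upper case letters, digits and underscores. */
    static boolean isUpperCase(String name) {
        return UPPER_CASE.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUpperCamelCase("RuleSet"));  // true
        System.out.println(isUpperCamelCase("ruleSet"));  // false
        System.out.println(isLowerCamelCase("address"));  // true
        System.out.println(isUpperCase("MAX_SIZE"));      // true
        System.out.println(isUpperCase("maxSize"));       // false
    }
}
```

   In an analyser, such predicates would be applied to the names of the
relevant declaration nodes of the AST. A purely lexical check cannot
tell an acronym such as URL from a constant, so a short all-capitals
name passes both the upper camel case and the upper case test.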
[Fig. 2: Meta-model of JAVACHECK (excerpt)]
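
   The rule structure of the meta-model (a quantifier, a filter and a
satisfy clause) suggests a simple evaluation scheme, sketched here in
Java. This is an illustration under stated assumptions, not the code
that JAVACHECK generates: RuleSketch, ClassInfo and holds are
hypothetical names, ClassInfo is a stand-in for a class declaration
node of the AST, and the quantifier “one” is read as “exactly one”.
The example data mirror the classes named in Listing 4.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of rule evaluation; not JavaCheck's actual API. */
final class RuleSketch {
    /** The three quantifiers mentioned in the text. */
    enum Quantifier { ALL, EXIST, ONE }

    /** Minimal stand-in for a class declaration node of the AST. */
    static final class ClassInfo {
        final String name;
        final boolean isAbstract;
        final int subclassCount;

        ClassInfo(String name, boolean isAbstract, int subclassCount) {
            this.name = name;
            this.isAbstract = isAbstract;
            this.subclassCount = subclassCount;
        }
    }

    private RuleSketch() { }

    /** Evaluates "quantifier E which filter satisfy clause" over the elements. */
    static <E> boolean holds(Quantifier quantifier, Predicate<E> filter,
                             Predicate<E> satisfy, List<E> elements) {
        long candidates = 0;  // elements selected by the filter
        long satisfying = 0;  // selected elements that also meet the satisfy clause
        for (E element : elements) {
            if (!filter.test(element)) {
                continue;
            }
            candidates++;
            if (satisfy.test(element)) {
                satisfying++;
            }
        }
        switch (quantifier) {
            case ALL:   return satisfying == candidates;
            case EXIST: return satisfying >= 1;
            case ONE:   return satisfying == 1;  // assumption: exactly one
            default:    throw new IllegalArgumentException("unknown quantifier: " + quantifier);
        }
    }

    public static void main(String[] args) {
        List<ClassInfo> classes = Arrays.asList(
                new ClassInfo("Element", true, 1),   // abstract, extended by Point
                new ClassInfo("Plane", true, 0),     // abstract, never extended
                new ClassInfo("Point", false, 0));

        // all Class which is modified with [abstract] satisfy is superclass
        Predicate<ClassInfo> isAbstract = c -> c.isAbstract;
        Predicate<ClassInfo> isSuperclass = c -> c.subclassCount > 0;
        System.out.println(holds(Quantifier.ALL, isAbstract, isSuperclass, classes)); // false
    }
}
```

   With ALL, a rule whose filter selects no element holds vacuously.
Whether JAVACHECK treats that case the same way is not stated in the
text.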


   Programming rules included: every abstract class must have
some children; all interfaces must be extended by another
interface or implemented by some class; every class that
implements Comparable must override the equals and hashCode
methods; and no method may return a value of
type Object. Regarding domain-specific rules, after reading the
requirements, we included these rules: there is one class called
Item or a synonym, which is abstract and public, is extended 3
or 4 times and has an identifier that is an integer or long attribute,
a name or description, a date and a price. The project must
also have three classes, with names synonymous with Small, Bulky
and WorkOfArts, all extending Item.

   We were able to successfully encode these rules in
JAVACHECK, hence showing good expressivity. Regarding
usefulness, we found several problems in the analysed code. The
most prominent ones included lack of JavaDoc comments,
many methods over 30 LOC, abstract classes with no children,
seven methods returning Object and a class Item with no date.
As we were able to find these defects, we concluded that
JAVACHECK was useful for our purpose.
   To measure performance we ran the previous 15 style
rules on a larger project, the org.eclipse.jdt.core project of the
org.eclipse.jdt.core library of Eclipse. This project has 1,443 files
with 1,442 classes, 238 interfaces, 17 enumerations, 24,290
methods and 12,709 attributes. The check took 6 minutes
approximately, which we see as reasonable for large projects.

                       IV. RELATED WORK
   Model-Driven Engineering (MDE) has been used to solve
different problems in the programming domain, like reverse
engineering [1], repository mining [10] or comparing open
source software using quality models [9]. Many times, the
code needs to be represented as a model, conforming to a
meta-model, so that it can be queried and processed using
model management tools. However, this has the drawback of
requiring too long pre-processing times [10]. To solve this
issue, in [6] the Epsilon model connectivity layer was extended
so that the Epsilon model management languages can be run
on Java programs. Our approach goes in this direction, as
JAVACHECK executes on Java ASTs. However, our approach is
more direct, as the generated code can directly access the AST
with no need for an intermediate layer. Moreover, JAVACHECK
is a DSL specifically designed for querying Java ASTs.
   Regarding code analysis tools, there are two main types:
static and dynamic. The first ones analyse the code without
running the program, and can be used in the earliest phases
of programming. The second ones, typically testing tools, can
be used at the end of the coding.
   There are many tools able to assess code quality in a static
way [4]. PMD [8] works over Java and other languages, and
can detect potential problems like empty try/catch/finally/switch
statements, dead code, overcomplicated expressions or duplicate
code. New rules can be added by coding them in Java or
XPath [13]. FindBugs [3] focuses on finding coding errors
and supports only Java. It cannot be extended with new rules
and works over Java bytecode. CheckStyle [2] focuses on
analysing style conventions. The tool is highly configurable
with an XML file and allows creating new rules, coding them
in Java and using ASTs. SonarQube [12] uses various static
analysis tools like PMD, CheckStyle or FindBugs to obtain
metrics to help improve the quality of the source code. It
can show information about the architecture, design, duplicate
code, programming rules, possible errors and their possible
solutions. Finally, Semmle QL [11], [7] is a query language
over code. This language is based on DataLog, so it needs to
process the code first, to obtain a relational representation.
   In conclusion, with respect to MDE approaches to solve
problems in the programming domain, JAVACHECK can be
executed on ASTs more directly, due to its compilation approach.
With respect to existing source code analysis tools, JAVACHECK
is a DSL that permits a high customization of queries, with
no need for low-level coding based on XPath or ASTs.

            V. CONCLUSIONS AND FUTURE WORK
   We have presented JAVACHECK, a DSL for expressing rules to
be checked on Java projects. We have built an Eclipse plugin
which permits the integration of JAVACHECK with the Java IDE,
and performed some initial experiments showing promising
results.
   In the future, we want to improve tooling, e.g., enhancing
the reporting facility. We would also like to extend the
expressiveness of the language to consider the analysis of method
bodies, which currently cannot be analysed. Finally, we are
considering adding the possibility to define quick-fixes, to be
fired when some rule fails.

                     ACKNOWLEDGEMENTS
   Work funded by the Spanish MINECO (TIN2014-52129-R)
and the R&D programme of Madrid (S2013/ICE-3006).

                           REFERENCES
 [1] H. Brunelière, J. Cabot, G. Dupé, and F. Madiot. Modisco: A model
     driven reverse engineering framework. Information & Software
     Technology, 56(8):1012–1032, 2014.
 [2] CheckStyle. http://checkstyle.sourceforge.net/.
 [3] FindBugs. http://findbugs.sourceforge.net//.
 [4] F. A. Fontana, P. Braione, and M. Zanoni. Automatic detection of
     bad smells in code: An experimental assessment. Journal of Object
     Technology, 11(2):5: 1–38, 2012.
 [5] M. Fowler. Refactoring - Improving the Design of Existing Code.
     Addison-Wesley, 1999.
 [6] A. García-Domínguez and D. S. Kolovos. Models from code, or code
     as models? In Proc. OCL@MODELS, volume 1756 of CEUR Workshop
     Proceedings, pages 137–148, 2016.
 [7] E. Hajiyev, M. Verbaere, O. de Moor, and K. D. Volder. Codequest:
     querying source code with datalog. In Proc. OOPSLA 2005, pages
     102–103. ACM, 2005.
 [8] PMD. https://pmd.github.io/.
 [9] D. D. Ruscio, D. S. Kolovos, Y. Korkontzelos, N. Matragkas, and J. J.
     Vinju. Supporting custom quality models to analyse and compare
     open-source software. In Proc. QUATIC 2016, pages 94–99. IEEE
     Computer Society, 2016.
[10] M. Scheidgen, M. Schmidt, and J. Fischer. Creating and analyzing
     source code repository models - A model-based approach to mining
     software repositories. In Proc. MODELSWARD, pages 329–336.
     SciTePress, 2017.
[11] Semmle. https://semmle.com/products/semmle-ql/.
[12] SonarQube. https://www.sonarqube.org/.
[13] XPath. https://www.w3schools.com/xml/xpath_intro.asp.