=Paper=
{{Paper
|id=Vol-2019/poster_5
|storemode=property
|title=JAVACHECK: A Domain Specific Language for the Static Analysis of Java Code
|pdfUrl=https://ceur-ws.org/Vol-2019/posters_5.pdf
|volume=Vol-2019
|authors=Sara Pérez-Soler,Juan de Lara
|dblpUrl=https://dblp.org/rec/conf/models/Perez-SolerL17
}}
==JAVACHECK: A Domain Specific Language for the Static Analysis of Java Code==
Sara Pérez-Soler, Juan de Lara. Modelling & Software Engineering Research Group (http://miso.es), Computer Science Department, Universidad Autónoma de Madrid (Spain). E-mail: {sara.perezs, juan.delara}@uam.es

Abstract. The increasing complexity of software systems has raised the need for code analysis tools to assess their quality. However, these tools offer predefined metrics or evaluation criteria, which are frequently hard to extend or modify. For this purpose, we have developed JAVACHECK, a Domain-Specific Language targeted to define expected properties of Java code bases. JAVACHECK can be used in a variety of scenarios related to quality assurance: to define expected code styles (e.g., naming conventions), specify programming conventions (e.g., private attributes), detect code smells possibly indicating errors (e.g., equals method with no hashCode), and detect patterns (e.g., uses of Singleton) or requirements demanded in a project (e.g., a class with name synonym to “Professor”).

Index Terms: Domain-Specific Languages, Source code analysis, Quality

===I. Introduction===
Software projects are increasing their complexity and size to address the requirements of today’s systems. Software is typically developed by (sometimes large) teams of programmers with dissimilar skills. Hence, it is common practice to use tools to check code quality or help in enforcing company or project code standards [2], [12]. However, sometimes these tools are rigid, or difficult to adapt and extend.

To improve this situation, we have created a Domain Specific Language (DSL) called JAVACHECK. The language permits expressing predicates to be evaluated over the source code bases of Java projects. The DSL is flexible and allows the expression of style and programming conventions, can be used to search for occurrences of programming idioms and patterns, and to express code smells [5] possibly indicating some potential problem. JAVACHECK is connected with services to detect synonyms in several languages, which permits its use to specify expected domain requirements (e.g., to partially automate the correction of programming exercises). The DSL has been created using Model-based technology (EMF and Xtext) and is integrated within Eclipse. Hence, it shows the potential of Model-driven engineering in the programming domain.

Paper organization. Sec. II overviews our approach, explaining its different parts. Sec. III describes tool support and some initial experiments. Sec. IV compares with related work and Sec. V finishes with the conclusions and future work.

===II. Approach===
Fig. 1 shows the general architecture of our approach. We have created a DSL called JAVACHECK, which can be used to define predicates that Java projects should fulfil.

Fig. 1: Overview of our approach. Diagram labels: quality engineer, JavaCheck rules («conforms to» the JavaCheck MM), code generator, rules’ logic, JavaCheck runtime, AST, Java project, WordReference, Java runnable, report.

Predicates can be used to express general quality properties (e.g., ensure all attributes of a class are private), accepted Java style guidelines (e.g., class names in upper camel case, constant names in uppercase), project-specific guidelines (e.g., maximum number of classes in a package), application-specific checks (there should be a class with a name synonym to “Machine”), or smells of possible errors (a class redefining method equals, but not hashCode). JAVACHECK has a textual syntax and has been defined through a meta-model. The predicates expressed with JAVACHECK are compiled into Java. This Java code uses a library we have built, which offers services to parse Java code into an Abstract Syntax Tree (AST), or to issue queries on WordReference (http://www.wordreference.com/), to obtain lists of synonyms in both English and Spanish.

Fig. 2 shows a small part of our meta-model. RuleSet is the root class, and contains a list of project names to be checked and a set of sentences to check on them. There are two types of sentences: the rules that will be evaluated, and intermediate variables to store collections of elements that have some properties. All sentences have a type element, which can be File, Package, Interface, Class, Enum, Method or Attribute, and a clause that needs to be satisfied. The satisfy clause contains all properties that the element must comply with. The rules have a quantifier (all, exist or one) and a filter, with the same structure as the satisfy clause. The rules can also reference variables.

Fig. 2: Meta-model of JAVACHECK (excerpt)

Listing 1 shows some simple JAVACHECK sentences. The first sentence collects all methods named equals with a parameter of type Object and return type boolean, in a collection variable named Equals. The second collection, named HashCode, contains all methods named hashCode, without parameters and with integer return type. Lines 5-6 show a rule that checks that all classes with one method in the Equals collection also have one method in the HashCode collection. Overall, violations of this rule may signal potential problems in the Java code.

<pre>
1 Equals: Method satisfy name="equals" and return type=Primitive.boolean and
2     parameter size=1 types=["Object"];
3 HashCode: Method satisfy name="hashCode" and return type=Primitive.int
      and parameter size=0;
4
5 all Class which have { one Method in Equals }
6     satisfy have { one Method in HashCode };
</pre>
Listing 1: JAVACHECK program to detect classes with equals but no hashCode

JAVACHECK is evaluated over the AST of Java code. The AST is a tree representation of the syntactic structure of source code. We have programmed a library to create and explore the AST, and evaluate all the sentences. The library defines all the functionality of the static analyser, and the code generator needs only to synthesize code for the specific sentences by calling the library. When all sentences are generated, they can be evaluated by scanning all nodes of the AST and checking the properties.

<pre>
all Class satisfy name type=upper camel case;
</pre>
Listing 2: JAVACHECK naming convention rule

Listing 2 shows another example, which checks a naming convention for classes. In particular, it checks that all class names are written in upper camel case. In this example the analyser first obtains all class declaration nodes in the ASTs of the project, and then checks that all names of these nodes are in upper camel case.

===III. Tool support and experimentation===
We have created an Eclipse plugin for JAVACHECK using Xtext and Xtend. The DSL allows us to express characteristics of Java programs, and the tool then reports the result of the analysis. Listing 3 shows an example of a JAVACHECK file with 4 rules. First, the project or projects to be analysed should be indicated (line 1). All projects to be analysed are required to be in the Eclipse workspace. A “*” in the name makes JAVACHECK take all projects in the workspace.

Following the project name, a JAVACHECK program contains the sentences to be checked. In the listing we show some examples. The first rule (lines 3-5) checks that all attributes that are not static and final (i.e., all attributes that are not constant) are private or protected. The second rule (line 7) states that every method must have a JavaDoc comment with @parameter and @return tags. The third rule (lines 9-11) checks that the project has one class named User or a synonym (in English), and that this class has an attribute named address. For this rule WordReference is used to obtain the synonyms. Finally, the last rule checks that all abstract classes have at least one child class.

<pre>
1 Projects Name: *;
2
3 all Attribute
4     which is not modified with [static and final]
5     satisfy is modified with [private or protected];
6
7 all Method satisfy JavaDoc @parameter @return;
8
9 one Class satisfy name like "User", English and have {
10     one Attribute satisfy name="address"
11 };
12
13 all Class which is modified with [abstract] satisfy is superclass;
</pre>
Listing 3: Example JAVACHECK program

<pre>
all class which modifiers: [ (abstract) ] satisfy is superclass [1..*]
Checked.....ERROR
PASS:
These elements do not satisfy is superclass [1..*]:
- In file D:\Workspace\Evaluate\src\abstractClass\Plane.java the class Plane (line: 3)

FAIL:
These elements satisfy is superclass [1..*]:
- In file D:\Workspace\Evaluate\src\abstractClass\Element.java the class Element (line: 3)
is super of:
In file D:\Workspace\Evaluate\src\restOfClass\Point.java the class Point (line: 5)
</pre>
Listing 4: Example of JAVACHECK report

Listing 4 shows the JAVACHECK report produced by the last rule of Listing 3. Currently, the report is a text file showing whether the rule is met or not (in this case it is not), and then listing all the elements that pass and that do not pass the rule.

====A. Experimentation====
We present two preliminary experiments with JAVACHECK. The first one is directed to assess expressivity, and usefulness to detect problems in the code. The second one is directed to check its scalability.

In the first experiment we use JAVACHECK as a way to semi-automatically assess student projects related to the creation of an information system for an antiquarian. We used three types of rules for validation: style, programming and domain-specific. Some style rules included: all files have only a class, an interface or an enumeration; every file has fewer than 2000 lines of code (LOC); methods’ bodies should be less than 30 LOC; every attribute that is not constant must be written in lower camel case, while constant attributes must be in upper case; the names of enumerations, classes and interfaces must be in upper camel case; every class, method, interface and enumeration must have a JavaDoc comment; every package must have a Java file (i.e., must not be empty), among others (a total of 15 rules).

Programming rules included: every abstract class must have some children; all interfaces must be extended by another interface or implemented by some class; every class that implements Comparable must override the equals and hashCode methods; and there is no method that returns a value of type Object. Regarding domain-specific rules, after reading the requirements, we included these rules: there is one class called Item or a synonym, which is abstract and public, is extended 3 or 4 times and has an identifier attribute that is an integer or a long, a name or description, a date and a price. The project must also have three classes, with names synonym to Small, Bulky and WorkOfArts, all extending Item.

We were able to successfully encode these rules in JAVACHECK, hence showing good expressivity. Regarding usefulness, we found several problems in the analysed code. The most prominent ones included lack of JavaDoc comments, many methods over 30 LOC, abstract classes with no children, seven methods returning Object and a class Item with no date. As we were able to find these defects, we concluded that JAVACHECK was useful for our purpose.

To measure performance we ran the previous 15 style rules on a larger project, the org.eclipse.jdt.core project, from the org.eclipse.jdt.core library of Eclipse. This project has 1,443 files with 1,442 classes, 238 interfaces, 17 enumerations, 24,290 methods and 12,709 attributes. The check took approximately 6 minutes, which we see as reasonable for large projects.

===IV. Related work===
Model-Driven Engineering (MDE) has been used to solve different problems in the programming domain, like reverse engineering [1], repository mining [10] or comparing open source software using quality models [9]. Many times, the code needs to be represented as a model, conforming to a meta-model, so that it can be queried and processed using model management tools. However, this has the drawback of requiring too long pre-processing times [10]. To solve this issue, in [6] the Epsilon model connectivity layer was extended so that the Epsilon model management languages can be run on Java programs. Our approach goes in this direction, as JAVACHECK executes on Java ASTs. However, our approach is more direct, as the generated code can directly access the AST with no need for an intermediate layer. Moreover, JAVACHECK is a DSL specifically designed for querying Java ASTs.

Regarding code analysis tools, there are two main types: static and dynamic. The first ones analyse the code without running the program, and can be used in the earliest phases of programming. The second ones, typically testing tools, can be used at the end of the coding.

There are many tools able to assess code quality in a static way [4]. PMD [8] works over Java and other languages, and can detect potential problems like empty try/catch/finally/switch statements, dead code, overcomplicated expressions or duplicate code. New rules can be added by coding them in Java or XPath [13]. FindBugs [3] focuses on finding coding errors and supports only Java. It cannot be extended with new rules and works over Java bytecode. CheckStyle [2] focuses on analysing style conventions. The tool is highly configurable with an XML file and allows creating new rules, coding them in Java and using ASTs. SonarQube [12] uses various static analysis tools like PMD, CheckStyle or FindBugs to obtain metrics that help improve the quality of the source code. It can show information about the architecture, design, duplicate code, programming rules, possible errors and their possible solutions. Finally, Semmle QL [11], [7] is a query language over code. This language is based on DataLog, so it needs to process the code first, to obtain a relational representation.

In conclusion, with respect to MDE approaches to solve problems in the programming domain, JAVACHECK can be executed on ASTs more directly, due to its compilation approach. With respect to existing source code analysis tools, JAVACHECK is a DSL that permits a high customization of queries, with no need for low-level coding based on XPath or ASTs.

===V. Conclusions and future work===
We have presented JAVACHECK, a DSL for expressing rules to be checked on Java projects. We have built an Eclipse plugin which permits the integration of JAVACHECK with the Java IDE, and performed some initial experiments showing promising results.

In the future, we want to improve tooling, e.g., enhancing the reporting facility. We would also like to extend the expressiveness of the language to consider the analysis of method bodies, which currently cannot be analysed. Finally, we are considering adding the possibility to define quick-fixes, to be fired when some rule fails.

Acknowledgements. Work funded by the Spanish MINECO (TIN2014-52129-R) and the R&D programme of Madrid (S2013/ICE-3006).

===References===
[1] H. Brunelière, J. Cabot, G. Dupé, and F. Madiot. Modisco: A model driven reverse engineering framework. Information & Software Technology, 56(8):1012–1032, 2014.
[2] CheckStyle. http://checkstyle.sourceforge.net/.
[3] FindBugs. http://findbugs.sourceforge.net//.
[4] F. A. Fontana, P. Braione, and M. Zanoni. Automatic detection of bad smells in code: An experimental assessment. Journal of Object Technology, 11(2):5:1–38, 2012.
[5] M. Fowler. Refactoring - Improving the Design of Existing Code. Addison-Wesley, 1999.
[6] A. García-Domínguez and D. S. Kolovos. Models from code, or code as models? In Proc. OCL@MODELS, volume 1756 of CEUR Workshop Proceedings, pages 137–148, 2016.
[7] E. Hajiyev, M. Verbaere, O. de Moor, and K. D. Volder. Codequest: querying source code with datalog. In Proc. OOPSLA 2005, pages 102–103. ACM, 2005.
[8] PMD. https://pmd.github.io/.
[9] D. D. Ruscio, D. S. Kolovos, Y. Korkontzelos, N. Matragkas, and J. J. Vinju. Supporting custom quality models to analyse and compare open-source software. In Proc. QUATIC 2016, pages 94–99. IEEE Computer Society, 2016.
[10] M. Scheidgen, M. Schmidt, and J. Fischer. Creating and analyzing source code repository models - A model-based approach to mining software repositories. In Proc. MODELSWARD, pages 329–336. SciTePress, 2017.
[11] Semmle. https://semmle.com/products/semmle-ql/.
[12] SonarQube. https://www.sonarqube.org/.
[13] XPath. https://www.w3schools.com/xml/xpath_intro.asp.
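
Illustrative note on the equals/hashCode rule of Listing 1. The following is a minimal Java sketch of the kind of check that rule describes, not the code JAVACHECK actually generates and not its library API. The classes MethodDecl and ClassDecl are hypothetical stand-ins for AST nodes, and the quantifier "one" is assumed here to mean "at least one matching method".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the Listing 1 rule; all names here are assumptions. */
public class EqualsHashCodeRule {

    /** Simplified stand-in for a method declaration node. */
    static final class MethodDecl {
        final String name;
        final String returnType;
        final List<String> parameterTypes;

        MethodDecl(String name, String returnType, List<String> parameterTypes) {
            this.name = name;
            this.returnType = returnType;
            this.parameterTypes = parameterTypes;
        }
    }

    /** Simplified stand-in for a class declaration node. */
    static final class ClassDecl {
        final String name;
        final List<MethodDecl> methods;

        ClassDecl(String name, List<MethodDecl> methods) {
            this.name = name;
            this.methods = methods;
        }
    }

    /** Mirrors the "Equals" variable: boolean equals(Object). */
    static boolean inEquals(MethodDecl m) {
        return m.name.equals("equals")
                && m.returnType.equals("boolean")
                && m.parameterTypes.equals(Collections.singletonList("Object"));
    }

    /** Mirrors the "HashCode" variable: int hashCode() without parameters. */
    static boolean inHashCode(MethodDecl m) {
        return m.name.equals("hashCode")
                && m.returnType.equals("int")
                && m.parameterTypes.isEmpty();
    }

    /** Returns the names of classes that declare equals(Object) but no hashCode(). */
    static List<String> violations(List<ClassDecl> classes) {
        List<String> failing = new ArrayList<>();
        for (ClassDecl c : classes) {
            boolean hasEquals = false;
            boolean hasHashCode = false;
            for (MethodDecl m : c.methods) {
                hasEquals |= inEquals(m);
                hasHashCode |= inHashCode(m);
            }
            if (hasEquals && !hasHashCode) {
                failing.add(c.name);
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        MethodDecl eq = new MethodDecl("equals", "boolean", Collections.singletonList("Object"));
        MethodDecl hc = new MethodDecl("hashCode", "int", Collections.<String>emptyList());
        List<ClassDecl> classes = Arrays.asList(
                new ClassDecl("Point", Collections.singletonList(eq)),
                new ClassDecl("Element", Arrays.asList(eq, hc)),
                new ClassDecl("Plane", Collections.<MethodDecl>emptyList()));
        System.out.println(violations(classes)); // prints [Point]
    }
}
```

The filter ("which have { one Method in Equals }") and the satisfy clause ("have { one Method in HashCode }") map to the two boolean flags. A real analyser would obtain the method declarations from the parsed AST instead of building them by hand.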