18 Comparison of Software Structures in Java and Erlang Programming Languages ANA VRANKOVIĆ, TIHANA GALINAC GRBAC, University of Rijeka, Faculty of Engineering MELINDA TÓTH, ELTE Eötvös Loránd University, Budapest, Hungary Empirical studies on fault behaviour in evolving complex software systems have shown that communication structures among the software entities such as classes, modules, software units and communications among them, is signicantly aecting the system fault behaviour. Therefore, we were motivated to further investigate software structures. One interesting question is to investigate software structures from software products written in dierent programming languages. In this work we present our preliminary study for which we developed tools to examine software structures of software products written in Java and Erlang programming language. We provide details on how we extract software structure from software product and provide preliminary results analyzing four Erlang software products and four Java software products. Categories and Subject Descriptors: H.2.11 [Software Engineering]: Software Architectures—Languages General Terms: Software structure Additional Key Words and Phrases: Network graph, Subgraphs 1. INTRODUCTION Today, network analysis is used in many scientific fields. It has proven to be useful in numerous prob- lems. In medicine, physics, sociology, electrical engineering, it helped solve diverse issues. In computer science it is used as a tool to understand software behaviour by structuring software dependencies as a network graph. Milo in [Milo et al. 2002] proposed network motifs as patterns in complex networks with higher appearance than in random networks and elaborate its purpose as hidden structural prop- erty that can be used in characterizing various complex networks. In our previous study, we analyzed the software structure using network analysis on software written in the Java programming language. For the purpose of network graph extraction we developed the tool rFind [Petric et al. 2014a; Petric and Grbac 2014]. By extracting software graph structures we identified structural changes during the releases of evolving Eclipse software product. In an aim to better understand software behaviour in terms of structural changes, and influence of programming language on obtained conclusions we want to expand our study to other programming languages. In this study, we present preliminary study on the results obtained by analysing software structures in products implemented in Erlang. The subject of this work is to explore whether the programming language has any influence on software structure This work has been supported in part by Croatian Science Foundations funding of the project UIP-2014-09-7945 and by the University of Rijeka Research Grant 13.09.2.2.16 and by the Hungarian Government through the New National Excellence Program of the Ministry of Human Capacities. Author’s address: A. Vranković, Faculty of engineering, Vukovarska 58, 51000 Rijeka, Croatia; email: avrankovic@riteh.hr; T. Galinac Grbac, Faculty of engineering, Vukovarska 58, 51000 Rijeka, Croatia; email: tgalinac@riteh.hr; M. Tóth, Faculty if Informatics, Pázmány Péter sétány 1/C, 1117, Budapest, Hungary; email: tothmelinda@elte.hu Copyright c by the paper’s authors. Copying permitted only for private and academic purposes. In: Z. Budimac (ed.): Proceedings of the SQAMIA 2017: 6th Workshop of Software Quality, Analysis, Monitoring, Improvement, and Applications, Belgrade, Serbia, 11-13.9.2017, Also published online by CEUR Workshop Proceedings (http://ceur-ws.org, ISSN 1613-0073) 18:2 • Ana Vranković et al. in terms of network graphs. Like we did in our previous work [Milo et al. 2002; Petric and Grbac 2014; Petric et al. 2014a], we used thirteen subgraph types to study software structure, as presented in Fig- ure 1. These subgraphs cover all three-node connections. All subgraph types present directed graphs where each node is a class/module and every edge is a connection between them. This preliminary study is based on simple comparison of subgraph counts present in software structures obtained from different Erlang and Java software products. Fig. 1. Subgraph types The rest of the paper is organized as follows. At first, in Section 2 is description of background, in Section 3 we present the used tools to extract the software structures in different programming languages. Then in Section 4 we present the preliminary results obtained by simple comparison of results obtained for Erlang and Java code. Finally, in Section 6 we conclude the paper. 2. BACKGROUND To analyze software structure we can define different types of structures, modules, classes and different types of software unites. Graph theory is a field of study that looks into the formal description and analysis of graphs [Bullmore and Sporns 2009]. Part of graph theory study are also complex networks, graphs that are based on real world networks, they are discussed in [Simon 1991]. Analyzing system using network graphs and complex networks has been used in many scientific fields for a long time: in medicine, for protein analysis [Aristóteles Góes-Netoa and et al. 2010], in logistics [Carlos PaisMontes and Laxe 2013], crime analyses [Colladon and Remondi 2017], electrical system analyses [Alexandre P. Alves da Silva and Souza 2012] and many more. In computer science there has been a few ideas of using complex networks as a tool to better understand the software behavior. In [S.Jenkins and S.R.Kirk 2007] software architecture graphs were presented as a complex networks using Java written applications. Interesting finding was that as the software ages, more out-going calls than incoming calls are present. In paper [Chong and Lee 2015] complex networks are used as a tool for analyzing the complexity of software system based on object oriented approach. They used a weighted complex network on a system to help them understand its maintainability and reliability. They also managed to identify violations of common software design principals. In [Luis G. Moyanoa and Vargas 2011] the community structure of a real complex software network is explored. The results of this paper shows a significant dependence between community structure and internal dynamical processes. Relationships between Erlang processes have been discussed in [Bozó and Tóth 2016]. No work has been found on comparison between the Java and Erlang software structure. Since Java and Erlang are not similar in paradigm and are not usually used in the similar products, the comparison between them is not often explored. Therefore we wanted to compare them because of their differences to see if there is any variation in the way they communicate in terms of the subgraph type. Comparison of Software Structures in Java and Erlang Programming Languages • 18:3 3. TOOLS In this work we used four different tools. For analyzing Java written applications we used the tool rFind [Petric and Grbac 2014; Petric et al. 2014a]. The input of rFind is an application code written in Java. As an output we receive two files that allow us to see all calls between classes. One of those files is .classlist where a list of all classes is displayed using class ids for easier reading. Each class represents a node in the network. The other file is a .graph where all connections between class ids are presented. Every connection is viewed as an edge between nodes. For getting the same information from Erlang applications we used the tool RefactorErl [Bozó et al. 2011]. RefactorErl is an open source static source code analyser and transformer tool for Erlang. Refac- torErl supports dependency examination both on module and function level, and is able to present it as a graph to the user. The input of the tool were applications written in Erlang, and it was able to produce a textual representation of dependencies as an output. Although the presentation of the result was quite different from the output of rFind, but the main idea was the same: present communication between modules. The SuBuCo tool [Petric et al. 2014b] is an application that expects an Rfind output as input: the .classlist and .graph. Then it searches for three-node subgraph structures inside .graph file. Its output is a list of all subgraphs that appear in the given code. The file created contains a list of all subgraphs separated by subgraph type and ids of every class/module contained in specific subgraph. Fig. 2. Subgraph analysis process graph Since the output of RefactorErl was not in the form for SuBuCo analysis, we wrote a parser to adjust the result so that it also contains .classlist and .graph. The parser was implemented in Java where the input files were files gathered from RefactorErl and the output files were the two needed files. The whole analysis process can be seen in Figure 2. 4. RESULTS Our tests were conducted on four different software implemented in the Erlang programming language and four written in Java. The analysed Erlang software are: Mnesia for distributed telecommunica- tions database; Dialyzer that allows static analysis for identifying software discrepancies; Cowboy which is a http server for Erlang/OTP; and RabbitMQ server that runs a multi-protocol messaging 18:4 • Ana Vranković et al. broker. The former two are part of the standard Erlang/OPT distribution, the latter two applications were taken from open git repositories. For analyzing software written in Java we used Java Devel- opment Kit (JDT) and Plug-in Development Environment (PDE) projects from Eclipse project, Open Microscopy Environment that is an open-source software and data format standards for the storage and manipulation of biological microscopy data, and Ultimate Android, development framework, from git repository. Table I. Tested data ERLANG PROJECT NUMBER OF NODES NUMBER OF EDGES LOC Mnesia 1914 6092 21417 Cowboy 510 948 4966 Dialyzer 1380 4089 14757 RabbitMq 3416 6648 23472 JAVA PROJECT NUMBER OF NODES NUMBER OF EDGES LOC OpenMicroscopy 3127 10775 438107 Ultimate Android 1893 9286 224442 JDT 3202 16923 606767 PDE 2542 9834 333390 Table II. Mnesia Results Table III. Dialyzer Results ID Appearance Percentage ID Appearance Percentage 36 40999 67.24% 6 24398 51.0664% 6 13053 21.407% 36 16466 34.4643% 12 5994 9.83% 12 5207 10.8986% 38 705 1.15623% 14 926 1.9382% 14 177 0.2903% 38 588 1.2307% 74 36 0.059% 74 87 0.1821% 46 6 0.00984% 46 52 0.10884% 98 2 0.00328% 78 33 0.0691% 78 2 0.00328% 98 14 0.0293% 102 0 0% 102 4 0.00837% 238 0 0% 108 2 0.004186% 110 0 0% 238 0 0% 108 0 0% 110 0 0% The number of edges and nodes for each tested software can be seen in Table I. Number of edges seems to be much larger in Java software, even where number of nodes is lesser then in Erlang soft- ware. In examples where the number of nodes are similar, Mnesia and Ultimate Android applications, number of edges is still much greater in Java application than in Erlang. Communication is far more common in Java written software. We can see that in all tested applications number of edges grows with the number of nodes. Subgraph ids discussed in this section are referring to the network subgraphs in Figure 1. In three out of four applications in Erlang subgraph with id 36 was the most common. The same subgraph id also was the most present in both Java projects and in projects gathered from git repositories. It seems that the communication in which multiple classes/modules heavily use one library is the most frequent one. Only one had different results, Dialyzer. We can see from the Tables II-V that in all Erlang applications subgraph ids 36,6 and 12 are the most common ones, most often in that exact order. Subgraph with id 6 presents communications where one node needs multiple resources from other nodes and the subgraph with id 12 could be the situation where communication flows from one node to the other and when the second node is triggered he calls for the third node. In Tables VI-IX. Comparison of Software Structures in Java and Erlang Programming Languages • 18:5 Table IV. RabbitMQ Results Table V. Cowboy Results ID Appearance Percentage ID Appearance Percentage 36 31870 58.0955% 36 1826 38.9755% 6 14671 26.7436% 6 1388 29.6265% 12 7759 14.144% 12 1228 26.2113% 38 474 0.86405% 38 156 3.32968% 14 66 0.12031% 14 58 1.237994% 74 12 0.021875% 74 15 0.32017% 46 3 0.0054687% 98 8 0.170758% 108 2 0.003646% 46 4 0.085379% 102 1 0.001823% 102 1 0.02135% 98 1 0.001823% 78 1 0.02135% 238 0 0% 238 0 0% 110 0 0% 110 0 0% 78 0 0% 108 0 0% we can see that Java projects behave similarly. In all projects it is the same order of frequency while in PDE id 38 is present more often than 12. Fig. 3. Pareto graphs for open-source Erlang projects In terms of subgraph id appearance, we can see that subgraphs with id 238 and 110 do not appear in any of Erlang application and neither in Java applications. Subgraph with ids 102 and 98 were found in Erlang application, but not in any of Java applications. We can see that applications written in Erlang and Java have similar behavior in terms of subgraph id appearance, even though Java software products are greater in class/module size. 18:6 • Ana Vranković et al. Table VI. Ultimate Android Table VII. Java Development Results Tool Results ID Appearance Percentage ID Appearance Percentage 36 463879 96.14989% 36 1349284 89.5352% 6 17338 3.5937105% 6 132736 8.808% 12 924 0.19152% 12 18535 1.2299% 38 273 0.0565857% 38 4469 0.29655% 14 30 0.00622% 74 952 0.0632% 74 6 0.001244% 14 951 0.0632% 46 4 0.00083% 78 45 0.00299% 102 0 0% 46 44 0.00292% 238 0 0% 102 0 0% 98 0 0% 238 0 0% 110 0 0% 98 0 0% 108 0 0% 110 0 0% 78 0 0% 108 0 0% Table VIII. OpenMicroscopy Table IX. Plug-in Development Results Environment Results ID Appearance Percentage ID Appearance Percentage 36 802077 92.1518% 36 1064393 96.2593884% 6 54724 6.2873% 6 36556 3.309% 12 10547 1.21176% 12 3671 0.33199% 38 2792 0.32078% 38 1042 0.09433% 14 167 0.01919% 14 81 0.007325% 74 45 0.00517% 74 11 0.0009948% 46 17 0.00195% 46 1 0.00009044% 108 14 0.00161% 102 0 0% 78 4 0.00046% 238 0 0% 102 0 0% 98 0 0% 238 0 0% 110 0 0% 98 0 0% 108 0 0% 110 0 0% 78 0 0% Looking at Pareto diagrams on Figures 3 and 4 we can see that there is significant growth only for subgraph types 36,6,12 and 38 in both Java projects and Erlang projects. 5. THREATS TO VALIDITY Data collection and analysis is possible on any code written in Erlang or Java. Erlang applications that were tested are server implementations, database and static analytic tool. Java software applications were frameworks for developing Java software and software for working with specific types of data. Software function is not the same in Erlang and Java applications. Since Erlang is a language used for scalable soft real-time systems and Java is general purpose programming language, comparing software applications written in each of them could not give us generalized conclusions. Comparing similar types of languages could be a better approach. 6. CONCLUSION In this study our main focus was to analyze code structure on software written in Erlang and compare it to the software written in Java. To do that we used several tools and combined them together to get the appropriate output that we can analyze. We represented class/module communication using thirteen subgraph types. Comparison of Software Structures in Java and Erlang Programming Languages • 18:7 Fig. 4. Pareto graphs for open-source Java projects In code written in Java, there was much larger number of communicating classes in comparison to communicating functions in Erlang code. Subgraph types 38,36,6,46,12,74 and 14 were present in all tested code. Types 108 and 78 were present in two software applications. Besides id 46, they are the only ones present that have the number of edges higher then 3. The one with the highest occurrence was subgraph type 36 in every tested application, followed by types 6 and 12. Subgraph ids 102,238,98 and 110 did not appear at all. There is no communication where more then four interactions between three classes are existent. Unlike in Java applications, in Erlang applications subgraph id 98 occurred in all tested applications and id 102 appeared in two tested applications. Subgraph with id 98 is the only one where communi- cation is circular, it starts and ends in the same node with just one interaction between each node. Just as in Java application, in Erlang applications ids 36,6 and 12 had the highest occurrence but in different percentage. While in Java applications subgraph id 36 occupied over 89% of all subgraphs, in Erlang the same id occupied between 58% and 68% while in Dialyzer it had appearance of only 34%. On tested java software, had a low appearance rate of under 10%. In Erlang applications result was different. Id 6 had a presence of around 20% in Mnesia and RabbitMQ. In Dialyzer, it had the largest number of appearance, 51%. We can see that in Java written code, subgraph id 36 occupies more then 90% of all the communica- tion while in Erlang code, ids 36 and 6 together occupy 80-90%. Based on the code analysis, we can conclude that although there is similar behavior between lan- guages, there are some differences. There are structures that appear in Erlang, but not in Java. Specifi- cally structures where there is more communication edges between modules and id 98 where communi- cation is circular. It is possible that those types are specific to that language. There is also a difference 18:8 • Ana Vranković et al. in percentage of the subgraph. While id 36 is in an extensive number of subgraph in Java, types of communication where one library is being heavily used by other classes, in Erlang that number is much lesser. We can see that the usage of libraries is greater in Java programs. There is also a big difference in number of communicating classes/modules. It seems that classes in Java programming language tend to communicate more often than Erlang modules. It is possible that those results are because of the fact that Java is an object oriented language and is based on object communication. In our future work we aim to do the analysis on code written in other programming and scripting languages, both functional and object oriented. Doing that we can come to the determinant conclusion in aspect of which subgraph types are specific for individual programming languages or applications. REFERENCES Antonio C.S. Lima Alexandre P. Alves da Silva and Suzana M. Souza. 2012. Fault location on transmission lines using complex- domain neural networks. Electrical Power and Energy Systems 43 (Dec. 2012), 720–727. https://doi.org/10.1016/j.ijepes.2012. 05.046 Marcelo V.C. Diniza Aristóteles Góes-Netoa and et al. 2010. Comparative protein analysis of the chitin metabolic pathway in extant organisms: A complex network approach. BioSystems 101, 1 (July 2010), 59–66. I. Bozó, D. Horpácsi, R. Kitlei, Z.n Horváth, J. Kőszegi, M. Tejfel, and M. Tóth. 2011. RefactorErl-source code analysis and refactoring in Erlang. In Proceeding of the12th Symposium on Programming Languages and Software Tools. Tallin, Estonia. István Bozó and Melinda Tóth. 2016. Analysing and Visualising Erlang Behaviours. AIP Conference Proceedings 1738 (June 2016). http://dx.doi.org/10.1063/1.4952023 E. Bullmore and O. Sporns. 2009. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat Rev Nurosci 10 (April 2009), 186–198. DOI:http://dx.doi.org/10.1038/nrn2575 Maria Jesus Freire Seoane Carlos PaisMontes and Fernando Gonzalez Laxe. 2013. General cargo and containership emergent routes: Acomplex networks description. Transport Policy 24 (Nov. 2013), 126–140. https://doi.org/10.1016/j.tranpol.2012.06.022 Chun Yong Chong and Sai Peck Lee. 2015. Analyzing maintainability and reliability of object-oriented software using weighted complex network. Journal of Systems and Software 110 (Dec. 2015), 28–53. https://doi.org/10.1016/j.jss.2015.08.014 Andrea Fronzetti Colladon and Elisa Remondi. 2017. Using Social Network Analysis to Prevent Money Laundering. Expert Systems With Applications 67 (Jan. 2017), 49–58. https://doi.org/10.1016/j.eswa.2016.09.029 Mary Luz Mourontea Luis G. Moyanoa and Maria Luisa Vargas. 2011. Communities and dynamical processes in a complex software network. Physica A 390, 4 (Feb. 2011), 741–748. https://doi.org/10.1016/j.physa.2010.10.026 R. Milo, S. Shen-Orr, S. Itzkovitz, and et al. 2002. Network motifs: simple building blocks of complex networks. Science (Oct. 2002), 298:824–27. Jean Petric and Tihana Galinac Grbac. 2014. Software structure evolution and relation to system defectiveness. EASE (May 2014). DOI:http://dx.doi.org/10.1145/2601248.2601287 Jean Petric, Tihana Galinac Grbac, and Mario Dubravac. 2014a. Processing and Data Collection of Program Structures in Open Source Repositories. In Proceedings of the 3rd Workshop on Software Quality Analysis, Monitoring, Improvement and Applications (SQAMIA 2014), Lovran, Croatia, September 19-22, 2014. 57–66. J. Petric, T. Galinac Grbac, and M. Dubravac. 2014b. Software structure evolution and relation to system defectiveness. In Proceedings of SQAMIA 2014. Lovran,Croatia, 57–66. H. Simon. 1991. The Architecture of Complexity, in: Facets of Systems Science (1st ed.). Springer, Boston, MA, USA. S.Jenkins and S.R.Kirk. 2007. Software architecture graphs as complex networks: A novel partitioning scheme to measure stability and evolution. Information Sciences 177, 12 (June 2007), 2587–2601. https://doi.org/10.1016/j.ins.2007.01.021