<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Graph Clustering Evaluation Metrics as Software Metrics</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Miloš</forename><surname>Savić</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">MILO Š SAVI Ć</orgName>
								<orgName type="department" key="dep2">MIRJANA IVANOVI Ć</orgName>
								<orgName type="institution">University of Novi Sad</orgName>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department" key="dep1">Department of Mathematics and Informatics</orgName>
								<orgName type="department" key="dep2">Faculty of Sciences</orgName>
								<orgName type="institution">University of Novi Sad</orgName>
								<address>
									<addrLine>Trg Dositeja Obradovića 4</addrLine>
									<postCode>21000</postCode>
									<settlement>Novi Sad</settlement>
									<country key="RS">Serbia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mirjana</forename><surname>Ivanović</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">MILO Š SAVI Ć</orgName>
								<orgName type="department" key="dep2">MIRJANA IVANOVI Ć</orgName>
								<orgName type="institution">University of Novi Sad</orgName>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department" key="dep1">Department of Mathematics and Informatics</orgName>
								<orgName type="department" key="dep2">Faculty of Sciences</orgName>
								<orgName type="institution">University of Novi Sad</orgName>
								<address>
									<addrLine>Trg Dositeja Obradovića 4</addrLine>
									<postCode>21000</postCode>
									<settlement>Novi Sad</settlement>
									<country key="RS">Serbia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Graph Clustering Evaluation Metrics as Software Metrics</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">A62E615A08B6146760EE150C5CCD9DB5</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T07:49+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>D.2.8 [Software engineering]: Metrics-Product metrics Measurement</term>
					<term>Theory cohesion</term>
					<term>clustering</term>
					<term>metrics</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Graph clustering evaluation (GCE) metrics quantify the quality of clusters obtained by graph clustering (community detection) algorithms. In this paper we argue that GCE metrics can be applied on graph representations of software systems in order to evaluate the degree of cohesiveness of software entities. In contrast to widely known cohesion measures used in software engineering, GCE metrics do not ignore external dependencies among software entities, but contrast them to internal dependencies to quantify cohesion. Using the theoretical framework of cohesion measurement in software engineering introduced by Briand et al. we investigate the properties of GCE metrics. Our analysis shows that GCE metrics are theoretically sound with respect to the monotonicity and merge property, but also reveals that they possess certain limitations whose importance is discussed in the paper. Finally, we propose a set of research questions for further empirical studies on this topic.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>Graph clustering is one of the most important method in complex network analysis <ref type="bibr" target="#b1">[Boccaletti et al. 2006</ref>]. Identification of clusters (also known as communities or modules) in a network helps us to reduce the complexity of the network in order to be able to understand its underlying structure. Other applications of graph clustering techniques include parallel processing of graph based data, route planning, image segmentation and VLSI physical design <ref type="bibr" target="#b4">[Buluc ¸et al. 2013]</ref>, to mention a few. Intuitively speaking, a cluster in a network is a part of the network where connections among members of the cluster are much denser than with the rest of the network <ref type="bibr" target="#b6">[Fortunato 2010</ref>]. There is a variety of graph clustering (community detection) algorithms <ref type="bibr" target="#b13">[Schaeffer 2007</ref>]. Naturally, they are accompanied by graph clustering evaluation (GCE) metrics that quantify the quality of partitions produced by them <ref type="bibr" target="#b9">[Leskovec et al. 2010]</ref>.</p><p>"Low coupling, high cohesion" is one of the basic design principles in software engineering <ref type="bibr" target="#b14">[Yourdon and Constantine 1979]</ref>. This principle states that the coupling between modules of a software system has to be minimal as possible keeping at the same time strong relations between elements of each module. The main idea of this work is that highly cohesive modules that are loosely coupled to other modules can be viewed as clusters in a graph that encompasses software entities at lower levels of abstraction. Therefore, the aim of this paper is to investigate existing GCE metrics as metrics reflecting cohesion of software modules.</p><p>The rest of the paper is structured as follows. Related work is presented in Section 2. Graph clustering evaluation metrics explored in this work as software metrics are defined in the next section of the paper. Section 4 explains how GCE metrics can be applied to graph representation of software systems. Theoretical analysis of GCE metrics as software metrics is given in Section 5. The last section concludes the paper and presents research questions for our future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">RELATED WORK</head><p>Cohesion of a software module reflects how strongly related are the elements of the module. Perhaps the widest known cohesion metric in software engineering is LCOM (Lack of cohesion in methods) introduced by <ref type="bibr" target="#b5">Chidamber and Kemerer [1994]</ref> in their object-oriented metrics suite. As the name of the metric suggests, LCOM is an inverse cohesion metric: a low value of LCOM indicates high cohesion of a class and vice versa. LCOM is based on a specific coupling between methods: two methods in a class are considered as data coupled if they use at least one common class attribute. Then LCOM is the number of non-coupled methods (P) reduced by the number of coupled methods (Q) if P &gt; Q, or zero otherwise. From the definition of the LCOM it can be seen that this metric does not take external dependencies into account nor method invocations (another form of method coupling).</p><p>The approach of <ref type="bibr" target="#b7">Hitz and Montazeri [1995]</ref> to measure cohesion of software entities followed the research of Chidamber and Kemerer. For a class we can construct graph G whose nodes are methods defined in the class, and two methods are connected by an undirected link if they are data coupled. LCOM of Hitz and Montazeri is the number of connected components in G. The same authors also proposed another variant of the same metric where G includes method calls relations. Finally, they introduced a metric called connectivity which quantifies how much G is far from being completely connected. <ref type="bibr" target="#b0">Bieman and Kang [1995]</ref> introduced two cohesion metrics called tight class cohesion (TCC) and loose class cohesion (LCC). The basic element in their metrics is again a graph representing a class that encompasses methods of the class. TCC/LCC is the density (the actual number of links divided by the maximal number of links) of a TCC/LCC graph. Two methods are connected in TCC graph if they both access the same variable or there is a direct call between them. LCC graph is an extension of TCC graph that includes indirect method calls. <ref type="bibr" target="#b8">Lee et al. [1995]</ref> introduced a class cohesion metric based on information flow. The basic idea is that the strength of call coupling between invoking and invoked method is determined by the number of parameters of invoked method: the more information passed through formal parameters, the stronger call coupling between methods. Then, the cohesion of a method is defined as the number of calls to other methods multiplied by the number of formal parameters. Finally, the cohesion of a class is the sum of cohesion of its methods.</p><p>From the review of widely used software engineering cohesion metrics it can be concluded that the cohesiveness of a software entity is estimated in isolation. In other words, those metrics rely only on internal dependencies, while external dependencies, dependencies reflecting coupling between software modules, are not taken into account. However, external dependencies can be also important when estimating cohesiveness of software modules. Firstly, a module that has much more external than internal dependencies hardly can be considered as strongly cohesive regardless of the density or the connectedness of its internal parts. Secondly, having two modules that have the same degree of internal density the one with the smaller number of external dependencies can be considered as more cohesive compared to the other. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">GRAPH CLUSTERING EVALUATION METRICS</head><p>Let G = (V, E) be a directed graph where V is the set of nodes and E set of links. Let C denote a cluster in V (C ⊆ V ), and let c be a node from C. An intra-cluster link emanating from c connects c to another node from C, while an inter-cluster link emanating from c connects c to a node that does not belong to C. Intra-cluster out-degree of node c is the number of intra-cluster links emanating from c, while inter-cluster out-degree of node c is the number of inter-cluster links emanating from c.</p><p>The most common formulation of the graph partitioning problem asks for a division of the set of nodes into balanced, disjoint subsets of nodes such that the edge cut (links connecting nodes from different clusters) is minimized. Therefore, the basic graph clustering evaluation (GCE) metrics are based on the size of the edge cut. Let E C denote the size of the cut (the number of inter-cluster links) for cluster C,</p><formula xml:id="formula_0">E C = | {(x, y)} : x ∈ C, y ∈ C | = x∈C inter-cluster out-degree(x),</formula><p>I C the number of intra-cluster links for C,</p><formula xml:id="formula_1">I C = | {(x, y)} : x ∈ C, y ∈ C | = x∈C intra-cluster out-degree(x),</formula><p>N C the number of nodes in C, and N the number of nodes in the graph. Then cut based GCE metrics, conductance, expansion and cut-ratio, are defined as follows <ref type="bibr" target="#b9">[Leskovec et al. 2010</ref>]:</p><p>(1) Conductance of cluster C is the size of the cut normalized by the total number of links incident to nodes contained in C,</p><formula xml:id="formula_2">Conductance(C) = E C E C + I C .</formula><p>(2) Expansion of cluster C is the size of the cut divided by the total number of nodes in C,</p><formula xml:id="formula_3">Expansion(C) = E C N C .</formula><p>(3) Cut-ratio of cluster C is the size of the cut divided by the size of maximal cut,</p><formula xml:id="formula_4">Cut-ratio(C) = E C N C (N − N C ) .</formula><p>Probably the oldest definition of graph cluster originate from circuit theory which is furtherly adopted in social network analysis. Namely, <ref type="bibr" target="#b10">Luccio and Sami [1969]</ref> introduced the notion of LS-set that is also known as Raddichi strong community in social network analysis <ref type="bibr" target="#b11">[Radicchi et al. 2004</ref>]. For directed graphs, an LS-set is a subgraph such that the intra-cluster out-degree of each node in the set is higher than its inter-cluster out-degree. The nodes having zero out-degree are not taken into account when determining whether the cluster is Radicchi strong. If the number of intra-cluster links is higher than the number of inter-cluster links then the subgraph is considered as Radicchi weak cluster. Each Radicchi strong cluster is at the same time Radicchi weak cluster, while the converse is not generally true. If a cluster is Radicchi weak or strong then its conductance is smaller than 0.5. The difference between the number of intra-and inter-cluster links inspired ODF (out-degree fraction) family of cluster quality measures <ref type="bibr" target="#b9">[Leskovec et al. 2010</ref>]:</p><p>(1) Maximum-ODF of cluster C is the maximum fraction of inter-cluster links of a node observed in the cluster,</p><formula xml:id="formula_5">Maximum-ODF(C) = max c∈C | {(c, d)} : d ∈ C | D out (c) ,</formula><p>where D out (c) stands for out-degree of node c.</p><p>(2) Average-ODF of cluster C is the average fraction of inter-cluster links of nodes from C,</p><formula xml:id="formula_6">Average-ODF(C) = 1 N C c∈C | {(c, d)} : d ∈ C | D out (c)</formula><p>(3) Flake-ODF of cluster C is the fraction of nodes in C that have higher intra-cluster out-degree than inter-cluster out-degree,</p><formula xml:id="formula_7">Flake-ODF(C) = | {x : x ∈ C, | {(x, y)} : y ∈ C | &lt; D out (c)/2 | N C .</formula><p>In other words Flake-ODF measures how C is close to being Radicchi strong cluster: if Flake-ODF(C) is equal to 1 then C is Radicchi strong.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">SOFTWARE NETWORKS AND CLUSTERING METRICS</head><p>Software networks are graph-based representations of a software system. The architecture of the whole system can be represented by one directed graph that we refer to as a General Dependency Network (GDN) <ref type="bibr" target="#b12">[Savić et al. 2014</ref>]. The nodes of GDN represent software entities such as packages/units, classes/modules, methods/functions and class attributes/global variables, while links represent relations between them. We can distinguish between two types of links in GDN: "vertical" (CONTAINS) links that maintain the hierarchy of software entities and "horizontal" links that show dependencies between entities from the same level of abstraction. Two software entities A and B are connected by a CONTAINS link A → B if entity A defines or declares entity B. A group of entities that are contained in the same highly cohesive and loosely coupled entity naturally form a cluster of contained entities.</p><p>Examples of such clusters for object-oriented software systems are: (1) classes and interfaces contained in the same package or workspace, and (2) methods and class attributes contained in the same class. When a software is written in a procedural programming language then procedures (functions) and global variables defined in a module form a cluster. We can separate horizontal links of GDN into two categories:</p><p>-Intra-cluster link connects two entities from the same level of abstraction that are contained in the same software entity, i.e.</p><formula xml:id="formula_8">A → B is an intra-cluster link ⇔ (∃O) CONTAINS(O → A) ∧ CONTAINS(O → B).</formula><p>-Inter-cluster link connects two entities from the same level of abstraction that are contained in two different software entities, i.e.</p><formula xml:id="formula_9">A → B is an inter-cluster link ⇔ (∃O 1 , O 2 ) O 1 = O 2 ∧ CONTAINS(O 1 → A) ∧ CONTAINS(O 2 → B).</formula><p>The separation of links into intra-and inter-cluster links enables us to apply graph clustering evaluation (GCE) metrics to:</p><p>(1) Class collaboration networks in order to evaluate cohesiveness of packages. Class collaboration network is a subgraph of GDN that shows dependencies between classes and interfaces. (2) Extended static call graphs in order to evaluate cohesiveness of classes in OO systems or modules in procedural software systems. Functions (methods) and global variables (class attributes) constitute the set of nodes in an extended static call graph, while links denote call relationships between functions and uses (access) relationships between functions and global variables.</p><p>It can be easily seen from the definition of GCE metrics that only the Flake-ODF metric measures cohesion, while other metrics introduced in the previous section are inverse cohesion measures. In</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Graph clustering evaluation metrics as software metrics</head><p>• 11:85 contrast to cohesion metrics widely used in software engineering (see Section 2), GCE metrics do not ignore references to external entities. On the contrary, they use the number of dependencies to external references to determine to what extent the entity is isolated from the rest of the system. In other words, GCE metric are based on the following principle: an entity can be considered as highly cohesive if its elements are better connected themselves than with the entities defined outside the entity.</p><p>Figure <ref type="figure">1</ref> shows a class collaboration network that represent a simple software system that consists of two packages P and Q where both packages contain three classes. It can be observed that class F has higher inter-cluster out-degree than intra-cluster out-degree: this class references one class from its package and two classes from package P . Therefore, package Q is not Radicchi strong cluster. This package is neither Radicchi weak cluster since the number of intra-cluster links is not higher than the the number of inter-cluster links. It can be also observed that the system presented in Figure <ref type="figure">1</ref> can be refactored in order improve the overall degree of cohesion: if we move class F from package Q to package F then both packages will be Radicchi strong.  <ref type="bibr" target="#b3">Briand et al. [1996;</ref><ref type="bibr" target="#b2">1998]</ref> defined several properties that a software metric should satisfy in order to be theoretically sound (lack of) cohesion metric. Those properties are:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Package P Package</head><formula xml:id="formula_10">Q A B C D E F Package P Package Q Intra-</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">THEORETICAL ANALYSIS</head><p>(1) Nonnegativity. A cohesion (lack of cohesion) metric cannot take a negative value.</p><p>(2) Normalization. The metric belongs to an interval [0, M ], where M is the fixed maximal value.</p><p>(3) Null value. The cohesion of a software entity is null if R c is empty, where R c denotes the set of relationships within the software entity. This means that if there are no intra-cluster links the cohesion of the entity should be zero. On the other side, a metric measuring the lack of cohesion should be zero if R c is maximal. R c is maximal if all possible relationships within the entity are present. (4) Maximum value. If R c is maximal then a metric of cohesion takes the maximal value. If R c = ∅ then a metric measuring the lack of cohesion takes the maximal value. (5) Monotonicity. Let e be a software entity. Let e be the software entity such that R e ⊆ R e , i.e.</p><p>we added some relationships (intra-cluster links) in e to obtain e . Then the following inequalities must hold</p><formula xml:id="formula_11">C(e) ≤ C(e ),<label>(1)</label></formula><formula xml:id="formula_12">L(e) ≥ L(e ),<label>(2)</label></formula><p>11:86</p><formula xml:id="formula_13">• Miloš Savić, Mirjana Ivanović</formula><p>where C and L denote a cohesion and lack of cohesion metric, respectively. In other words, the property states that addition new intra-cluster links must not decrease/increase the value of the cohesion/lack of cohesion metric. (6) Merge property. Let e 1 and e 2 be two unrelated (unconnected) software entities. This means that e 1 does not reference e 2 and vice versa, i.e. there are no relationships (inter-cluster links) between e 1 and e 2 . Let e be the software entity which is the union of e 1 and e 2 . Then the following inequalities must hold</p><formula xml:id="formula_14">C(e) ≤ max{C(e 1 ), C(e 2 )},<label>(3)</label></formula><p>L(e) ≥ min{L(e 1 ), L(e 2 )}.</p><p>(4)</p><p>Namely, this property says that merging two unrelated entities must not increase/decrease the value of the cohesion/lack of cohesion metric.</p><p>As a first step in our theoretical analysis of graph clustering evaluation metrics, we state and prove the following lemma that will be frequently used in this section.</p><p>LEMMA 1. Let P and Q be two nonnegative numerical properties of a module. If P and Q are additive under the merge operation then a (lack of) cohesion metric defined as C = P/Q satisfies the merge property.</p><p>PROOF. Let m 1 and m 2 be two modules. Without loss of generality we can assume that C(m 1 ) ≤ C(m 2 ). Due to the nonnegativity of P and Q the following inequality holds</p><formula xml:id="formula_15">P (m 1 )Q(m 2 ) ≤ P (m 2 )Q(m 1 ).</formula><p>(5)</p><p>Let m denote the module obtained by merging m 1 and m 2 . Due to the additivity of P and Q we have that</p><formula xml:id="formula_16">C(m) = P (m 1 ) + P (m 2 ) Q(m 1 ) + Q(m 2 ) .</formula><p>C is a cohesion metric. Let us suppose that the merge property is not satisfied, i.e.</p><formula xml:id="formula_17">C(m) &gt; max{C(m 1 ), C(m 2 )} = C(m 2 ).</formula><p>By elementary algebraic transformation we obtain that</p><formula xml:id="formula_18">P (m 1 )Q(m 2 ) &gt; P (m 2 )Q(m 1 )<label>(6)</label></formula><p>which is in contradiction with inequality 5.</p><p>C is a lack of cohesion metric. Again we give a proof by contradiction. If</p><formula xml:id="formula_19">C(m) &lt; min{C(m 1 ), C(m 2 )} = C(m 1 )</formula><p>then by elementary algebraic transformation we again obtain inequality 6.</p><p>From the definition of GCE metrics (see Section 3) it can be easily seen that all of them are nonnegative. The maximal value of conductance is equal to 1 when R c = ∅ and consequently this measure satisfies both the normalization property and the maximum value property. When R c is maximal conductance is not necessarily equal to zero. Namely, conductance is equal to zero if and only if a module does not depend on other modules. Adding intra-cluster relationship increases only the denominator of conductance and consequently conductance satisfies the monotonicity property. The merge property of conductance is the consequence of Lemma 1 when P is the number of inter-cluster links and Q the sum of the number of inter-and intra-cluster links. Namely, the number of intra-cluster links is an additive property under the merge operation. Secondly, if two modules are unrelated then they have disjoint sets of inter-cluster links. This means that the number of inter-cluster links is also an additive property for unrelated modules.</p><p>In contrast to conductance, expansion does not satisfy the normalization property. If we modify expansion to be a value in the interval [0,1] then we actually obtain the cut-ratio metric. Expansion also does not satisfy the null value property and the maximum value property: both the numerator and denominator in the definition of expansion are independent on the number of intra-cluster links. The expansion of a module remains the same under the addition of intra-cluster links. Therefore, this metric also satisfies the monotonicity property. As already mentioned, the number of inter-cluster links is an additive property of disjoint modules. The number of nodes in a module is also an additive property under the merge operation. Therefore, by Lemma 1 expansion satisfies the merge property.</p><p>Cut-ratio satisfies the normalization property: the maximal value of cut-ratio is equal to 1 which is obtained when each entity from the module references all entities defined outside the module. Both the numerator and the denominator of cut-ratio are independent of the number of intra-cluster links and similarly as expansion this measure does not satisfy the null and the maximum value property. If we add a new intra-cluster link the cut-ratio does not change and consequently this metric satisfies monotonicity property. The cut-ratio metric satisfies the merge property which shows the following lemma.</p><p>LEMMA 2. Cut-ratio satisfies the merge property.</p><p>PROOF. Let C x denote the number of inter-cluster links emanating from nodes contained in module x, N x the number of nodes in module x, and N the number of nodes in the whole network. Let p and q be two disconnected modules such that the cut-ratio of p is smaller than the cut-ratio of q, i.e.</p><formula xml:id="formula_20">C p N p (N − N p ) ≤ C q N q (N − N q ) ⇔ C p N q (N − N q ) ≤ C q N p (N − N p ).<label>(7)</label></formula><p>Let r denote the union of p and q. Let us suppose that the merge property is not satisfied, i.e.</p><formula xml:id="formula_21">C p + C q (N p + N q )(N − N p − N q ) &lt; C p N p (N − N p )<label>(8)</label></formula><p>If we multiply both sides of inequality 8 by (N p + N q )(N − N p − N q )N p (N − N p ) &gt; 0, then we obtain</p><formula xml:id="formula_22">(C p + C q )N p (N − N p ) &lt; C p (N p + N q )(N − N p − N q ) (9) C q N p (N − N p ) &lt; C p N q (N − N q ) − 2C p N p N q (10) ≤ C p N q (N − N q ) (11)</formula><p>which is in contradiction with inequality 7.</p><p>From the definition of ODF measures it can be easily seen that they take values in the range [0, 1], which means that they satisfy bot nonnegativity and normalization property. When R c = ∅ then Maximum-ODF and Average-ODF are equal to 1, while Flake-ODF is equal to 0, which means that Maximum-and Average-ODF satisfy the maximum value property, while Flake-ODF satisfies the null value property (recall that Flake-ODF measures cohesion, while Maximum-and Average-ODF are lack of cohesion metrics). The numerator of Maximum-and Average-ODF is independent of the number of intra-cluster links. Consequently, those metrics does not satisfy the null value property and satisfy the monotonicity property (addition of intra-cluster links does not change Maximum-and Average-ODF). The merge property is trivially satisfied for Maximum-ODF. LEMMA 3. Average-ODF satisfies the merge property. PROOF. Let D a denote the number of inter-cluster links emanating from node a, D a out-degree of node a (D a ≤ Da), and N x the number of nodes in module x. Let p and q be two disconnected modules such that the Average-ODF of p is smaller than the Average-ODF of q, i.e.</p><formula xml:id="formula_23">1 N p u∈p D u D u ≤ 1 N q u∈q D u D u ⇔ N q α ≤ N p β, where α = u∈p D u D u , β = u∈q D u D u<label>(12)</label></formula><p>Let r denote the union of p and q. The Average-ODF of r is equal to</p><formula xml:id="formula_24">Average-ODF(r) = 1 N p + N q u∈r D u D u = 1 N p + N q u∈p D u D u + u∈q D u D u = α + β N p + N q (13)</formula><p>Let us suppose that Average-ODF does not satisfy the merge property, i.e. (α + β)/(N p + N q ) &lt; α/N p .</p><p>Then we obtain that βN p &lt; αN q which is in contradiction with inequality 12.</p><p>The addition of new intra-cluster links can only increase the number of entities defined in a module whose intra-cluster out-degree is greater than inter-cluster out-degree. Therefore, Flake-ODF satisfies the monotonicity property. The number of entities in the module whose intra-cluster out-degree is greater than inter-cluster out-degree is additive property under the merge operation. Therefore, Flake-ODF also satisfies merge property by Lemma 1. The properties of graph clustering metrics as (lack of) cohesion software metrics are summarized in Table <ref type="table" target="#tab_1">I</ref>. This table indicates the limitations of GCE metrics as software metrics. As observed by <ref type="bibr" target="#b2">Briand et al. [1998]</ref>, only a few widely known software cohesion metric fulfill all of the cohesion properties. In other words, a measure which does not satisfy all of the properties can be considered as poorly defined. Secondly, we can see that GCE metrics reflecting cohesion does not satisfy the null value property, while GCE metrics reflecting lack of cohesion does not satisfy the maximum value property. However, we believe that this is not the disadvantage of GCE metrics. Firstly, it is very unlikely to observe full connected software modules in practice (each class from a package reference each other; each method from a class calls each other and access to each class attribute). Secondly, in such cases GCE metrics favour loosely coupled software modules emphasizing another quality principle of good software design, i.e. the principle of low coupling.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">CONCLUSION AND FUTURE WORK</head><p>In this paper we introduced the idea of applying graph clustering evaluation (GCE) metrics to graphs representing software systems in order to evaluate cohesiveness of software entities. In contrast to standard cohesion metrics, GCE metrics do not ignore external references. They are based on the idea that reducing coupling between an entity and the rest of the system increases cohesion of the elements contained in the entity. Using the theoretical framework introduced by Briand et al. we investigated the properties of graph clustering evaluation metrics. This analysis showed that GCE metrics are theoretically sound with respect to the most important properties of software cohesion metrics (monotonicity and merge property), but also showed that they possess certain limitations we should be aware of when using GCE metrics as software metrics. Our future work will extend the present work with an empirical investigation of the following research questions:</p><p>(1) Do GCE metrics correlate to standard software cohesion metrics <ref type="bibr">(LCOMs, TCC, LCC, etc.)</ref> and to what extent? (2) Each software entity can be described by a numerical vector containing metrics of internal complexity (such as LOC, Halstead measures, cyclomatic complexity, etc.) and metric of design complexity (metrics quantifying importance of the entity such as betweenness centrality and page rank, its coupling to other entities such as degree centrality/CBO, inheritance for classes such as NOC and DIT, and invocation for methods/functions). Each of these vectors can be, according to the degree of cohesion, classified as Radicchi strong (strongly cohesive), Radicchi weak (weakly cohesive) or poorly cohesive (entity that is neither Radicchi strong nor Radicchi weak). Therefore, our second research question will be: are there any differences in internal and design complexity between strongly, weakly and poorly cohesive software entities? (3) Is it possible to automatically remodularize software system using simple refactorings such as move class/method in order to improve the degree of cohesion of the overall system (to increase the number of Raddichi strong clusters, to minimize the average conductance, etc.).</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table I .</head><label>I</label><figDesc>Properties of graph clustering metrics as (lack of) cohesion software metrics.</figDesc><table><row><cell>Metric</cell><cell cols="6">Nonnegativity Normalization Null value Maximum value Monotonicity Merge</cell></row><row><cell>Conductance</cell><cell>yes</cell><cell>yes</cell><cell>no</cell><cell>yes</cell><cell>yes</cell><cell>yes</cell></row><row><cell>Expansion</cell><cell>yes</cell><cell>no</cell><cell>no</cell><cell>no</cell><cell>yes</cell><cell>yes</cell></row><row><cell>Cut-ratio</cell><cell>yes</cell><cell>yes</cell><cell>no</cell><cell>no</cell><cell>yes</cell><cell>yes</cell></row><row><cell>Maximum-ODF</cell><cell>yes</cell><cell>yes</cell><cell>no</cell><cell>yes</cell><cell>yes</cell><cell>yes</cell></row><row><cell>Average-ODF</cell><cell>yes</cell><cell>yes</cell><cell>no</cell><cell>yes</cell><cell>yes</cell><cell>yes</cell></row><row><cell>Flake-ODF</cell><cell>yes</cell><cell>yes</cell><cell>yes</cell><cell>no</cell><cell>yes</cell><cell>yes</cell></row></table></figure>
		</body>
		<back>

			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This work was supported by the Serbian Ministry of Education, Science and Technological Development through project Intelligent Techniques and Their Integration into Wide-Spectrum Decision Support, no. OI174023.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Cohesion and Reuse in an Object-oriented System</title>
		<author>
			<persName><forename type="first">James</forename><forename type="middle">M</forename><surname>Bieman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Byung-Kyoo</forename><surname>Kang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1995 Symposium on Software Reusability (SSR &apos;95)</title>
				<meeting>the 1995 Symposium on Software Reusability (SSR &apos;95)<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1995">1995</date>
			<biblScope unit="page" from="259" to="262" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Complex Networks : Structure and Dynamics</title>
		<author>
			<persName><forename type="first">S</forename><surname>Boccaletti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Latora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Moreno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chavez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D-U.</forename><surname>Hwang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Physics Reports</title>
		<imprint>
			<biblScope unit="volume">424</biblScope>
			<biblScope unit="page" from="175" to="308" />
			<date type="published" when="2006">2006. 2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A Unified Framework for Cohesion Measurement in Object-OrientedSystems</title>
		<author>
			<persName><forename type="first">Lionel</forename><forename type="middle">C</forename><surname>Briand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><forename type="middle">W</forename><surname>Daly</surname></persName>
		</author>
		<author>
			<persName><surname>Üst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Empirical Software Engineering</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="65" to="117" />
			<date type="published" when="1998">1998. 1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Property-Based Software Engineering Measurement</title>
		<author>
			<persName><forename type="first">Lionel</forename><forename type="middle">C</forename><surname>Briand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sandro</forename><surname>Morasca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Victor</forename><forename type="middle">R</forename><surname>Basili</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Software Engineering</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="68" to="86" />
			<date type="published" when="1996">1996. 1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Recent advances in graph partitioning</title>
		<author>
			<persName><forename type="first">Aydin</forename><surname>Buluc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">¸</forename></persName>
		</author>
		<author>
			<persName><forename type="first">Henning</forename><surname>Meyerhenke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Safro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Peter</forename><surname>Sanders</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christian</forename><surname>Schulz</surname></persName>
		</author>
		<idno>CoRR abs/1311.3144</idno>
		<imprint>
			<date type="published" when="2013">2013. 2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A metrics suite for object oriented design</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Chidamber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">F</forename><surname>Kemerer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions in Software Engineering</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="476" to="493" />
			<date type="published" when="1994">1994. 1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Community detection in graphs</title>
		<author>
			<persName><forename type="first">Santo</forename><surname>Fortunato</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Physics Reports</title>
		<imprint>
			<biblScope unit="volume">486</biblScope>
			<biblScope unit="page" from="75" to="174" />
			<date type="published" when="2010">2010. 2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Measuring Coupling and Cohesion in Object-Oriented Systems</title>
		<author>
			<persName><forename type="first">Martin</forename><surname>Hitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Behzad</forename><surname>Montazeri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. International Symposium on Applied Corporate Computing</title>
				<meeting>International Symposium on Applied Corporate Computing</meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
			<biblScope unit="page" from="25" to="27" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Measuring the coupling and cohesion of an object-oriented program based on information flow</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">S</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">S</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">F</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of International Conference on Software Quality</title>
				<meeting>International Conference on Software Quality</meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Empirical Comparison of Algorithms for Network Community Detection</title>
		<author>
			<persName><forename type="first">Jure</forename><surname>Leskovec</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kevin</forename><forename type="middle">J</forename><surname>Lang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Mahoney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 19th International Conference on World Wide Web (WWW &apos;10)</title>
				<meeting>the 19th International Conference on World Wide Web (WWW &apos;10)<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="631" to="640" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">On the decomposition of networks in minimally interconnected subnetworks</title>
		<author>
			<persName><forename type="first">F</forename><surname>Luccio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sami</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Circuit Theory</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="184" to="188" />
			<date type="published" when="1969">1969. 1969</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Defining and identifying communities in networks</title>
		<author>
			<persName><forename type="first">F</forename><surname>Radicchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Castellano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Cecconi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Loreto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Parisi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the National Academy of Sciences</title>
		<imprint>
			<biblScope unit="volume">101</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page" from="2658" to="2663" />
			<date type="published" when="2004">2004. 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A language-independent approach to the extraction of dependencies between source code entities</title>
		<author>
			<persName><forename type="first">Milos</forename><surname>Savić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gordana</forename><surname>Rakić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zoran</forename><surname>Budimac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mirjana</forename><surname>Ivanović</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information and Software Technology</title>
		<imprint>
			<biblScope unit="volume">56</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="1268" to="1288" />
			<date type="published" when="2014">2014. 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Graph Clustering</title>
		<author>
			<persName><forename type="first">Elisa</forename><surname>Satu</surname></persName>
		</author>
		<author>
			<persName><surname>Schaeffer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Compututer Science Review</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="27" to="64" />
			<date type="published" when="2007">2007. 2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design</title>
		<author>
			<persName><forename type="first">Edward</forename><surname>Yourdon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Larry</forename><forename type="middle">L</forename><surname>Constantine</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1979">1979</date>
			<publisher>Prentice-Hall, Inc</publisher>
			<pubPlace>Upper Saddle River, NJ, USA</pubPlace>
		</imprint>
	</monogr>
	<note>1st ed</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
