<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>New metrics for object-oriented software based on regular expression</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fatma Zohra Mekahlia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rabah Chabane-Chaouche</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer systems department, Faculty of Computer Science, University of Science and Technology Houari-Boumédiène Bab Ezzouar</institution>
          ,
          <addr-line>Algiers</addr-line>
          ,
          <country country="DZ">Algeria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Modeling, Verification and Performance Evaluation of Complex Systems Laboratory</institution>
          ,
          <addr-line>MOVEP</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In software engineering, testing is a crucial phase of the software development life cycle: it is a specific sequence of activities that ensures software quality objectives are achieved. Currently, development companies face the heavy task of performing different types of testing, which takes a lot of time and effort. Software defect prediction has therefore become a hot topic, because it estimates which parts of the code are prone to failure so that testing can focus on those parts only. Some researchers have proposed using software metrics to describe the characteristics of software evolution. For this reason, we propose six new object-oriented software metrics that monitor import conflicts, exception conflicts, the encapsulation rate, the ratio of overloaded and overridden methods, and the number of each Swing component. Furthermore, we formalize these metrics using regular expressions with lookaround assertions. This article essentially aims to: 1) study the software fault prediction procedure; 2) propose new metrics to strengthen the prediction; 3) formally validate the new metrics using regular expressions.</p>
      </abstract>
      <kwd-group>
        <kwd>Software metrics</kwd>
        <kwd>object-oriented</kwd>
        <kwd>regular expression</kwd>
        <kwd>Java</kwd>
        <kwd>lookaround assertions</kwd>
        <kwd>software engineering</kwd>
        <kwd>formal model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Motivation</title>
        <p>
          In the field of software defect prediction, several works have shown a strong relationship
between the number and quality of object-oriented metrics and the detection of fault-prone
classes, for example: [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Detecting and locating software defects
is a delicate, difficult and costly task: developers spend a lot of time finding
and correcting them in the different phases of the software life cycle, such as unit tests, functional tests
and integration tests. The transition to software defect prediction therefore becomes essential, because it
reduces the effort, time and resources needed to identify software defects before the product is delivered
to the end customer. Generally, software defect prediction is based on machine learning
models [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] [10] and software metrics [11] [12]. The emergence and development of new
software metrics therefore plays a very important role in the quality and accuracy of software defect prediction
[13]. This prompted us to study the object-oriented programming (OOP) paradigm in order to propose new
metrics that represent fundamental OOP concepts and that positively influence the prediction of
software defects. Existing solutions lack metrics for several fundamental
concepts of object-oriented software:
• There are no measures of the degree of encapsulation in classes, although
encapsulation is a fundamental principle of object-oriented programming. Evaluating
the degree of encapsulation helps to avoid undesirable side effects such as data
manipulation errors or unintentional external interference.
• The degree of method overriding and overloading is not measured, although these mechanisms can cause common
problems for developers: overusing them can lead to excessive complexity
and compromise the modularity of the code.
• The degree of exception use in a program is not measured, although it can help detect potential
security risks.
• The complexity of the graphical user interface is not measured in terms of the use of the Swing graphics
library, although excessive use of certain components can negatively affect the performance of the
software.
        </p>
        <p>Our goal is to extend the set of object-oriented software metrics by addressing the gaps identified
above, in order to increase the performance and accuracy of software defect prediction.
Regular expressions have their roots in automata theory and were
formalized by the mathematician Stephen Kleene. They were first implemented in software by
Ken Thompson, who also patented a pattern-matching technique, and the
formalism has been reimplemented in many ways up to the present day. Today regular expressions are
used to extract information from text at a very high level, and modern implementations
go beyond the traditional mathematical definition. In practice, we can use them to search for tokens
or particular elements in a large text, for example to detect email addresses in a text
in order to identify spam. They can also be used to replace target tokens in a text with other
elements, for example during data cleaning. In arithmetic, we use operations such as
* and / to construct expressions, for example 5*4/2. In the same way, we can use the regular operations
(union, concatenation and star) to construct expressions that describe a regular language, called regular expressions. For
example: (a ∪ b)*.</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Contributions</title>
        <p>This work consists of the proposal and formal validation of new object-oriented metrics, which will
help in the future to predict defects in object-oriented software. The main contributions of this paper are
summarized below:</p>
        <p>1. We propose six new object-oriented metrics that measure the following aspects of a
software system: import conflicts, degree of encapsulation, overridden and overloaded methods, potential risk
generated by the use of exceptions, complexity of the graphical user interface, and other size and performance
measures such as file size, non-empty lines, empty lines, comment lines and total number of lines.</p>
        <p>2. We formalize the proposed metrics using regular expressions with lookarounds in Java, and we
present their mathematical semantics.</p>
        <p>To our knowledge, this is the first work that proposes object-oriented metrics measuring
fundamental concepts of object-oriented programming, such as method overriding and overloading and the
complexity of the graphical user interface, together with a formalization of the metrics. Existing
works have generally proposed metrics that cover general aspects of a software system, such as the number of lines
of code, and have presented these metrics by a simple formula.</p>
        <p>The paper is organized as follows. Section 2 presents related work. Section 3
discusses it. Section 4 presents the proposed set of object-oriented
metrics. Section 5 presents the formal semantics, and Section 6 concludes and outlines
future work.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>Since the emergence of the object-oriented paradigm in the 1990s, several software metrics have been
proposed in the literature to evaluate the characteristics of object-oriented (OO) software. In [14], the
authors introduced a suite of metrics that measure the complexity of object-oriented code.
Commonly called the CK metrics, they are still a reference today. The metrics were validated using commercial
systems written in C++ and Smalltalk. Subsequently, in [15] the authors proposed a data model and
terminology and illustrated the importance of the CK metrics: 1) Weighted Methods per Class, 2) Depth of
Inheritance Tree, 3) Number Of Children, 4) Coupling Between Objects, 5) Response For a Class, 6) Lack
of Cohesion in Methods.</p>
      <p>In [16] we find the MOOD metrics (Metrics for Object Oriented Design), which measure certain
characteristics of object-oriented design such as encapsulation, coupling, inheritance and polymorphism. In
[17] the authors present their tool, which is based on the proposed MOOD metrics: 1) Method Hiding
Factor, 2) Attribute Hiding Factor, 3) Method Inheritance Factor, 4) Attribute Inheritance Factor, 5)
Polymorphism Factor, 6) Coupling Factor.</p>
      <p>In [18] the authors presented a survey comparing proprietary, free
and open-source tools that evaluate static and dynamic object-oriented metrics. Moreover, the metrics
supported by dynamic metric calculation tools are very limited, covering aspects such as inheritance, dynamic binding
and runtime polymorphism.</p>
      <p>The authors in [19] investigated the relationship between centrality measures and OO metrics in
order to predict fault-proneness in three respects: fault-prone classes, fault severity and number
of faults. They conclude that combining OO metrics and centrality measures improves the
prediction of fault-prone classes and of the number of faults in a program. To carry out this study, the
authors focus on 9 metrics that belong to three families: complexity, coupling, and size.</p>
      <p>Regular expressions, also called regex [20], are formal patterns that allow the extraction of tokens
or target character strings from a text or a computer program. A regular expression is built from
a set of metacharacters, each of which has a given meaning. For example, square brackets ( [ ] )
match any one of the enclosed characters: [abc] matches ’a’, ’b’, or ’c’. In the literature, we
find several works that use regular expressions as a formal means
of targeting a certain text or program code in order to carry out some processing on the targeted
area. In [21] the authors proposed the FungiRegEx software, which is based on regular expressions and
supports proteomic research. The software retrieves real-time data on several species from the JGI
Mycocosm database.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Discussion</title>
      <p>From the related work reviewed above, we observe that the quality of software metrics plays a
very important role in the prediction of software defects and makes it possible to increase prediction
performance. For this reason, in this work we aim to propose new object-oriented metrics in order
to improve the performance of software defect prediction. Our second objective is
to formally validate our new metrics using regular expressions, so that they can be used in the
future to strengthen the prediction of software defects.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed software metrics</title>
      <p>Developing computer software requires a methodical and rigorous approach, especially when it comes
to measuring the quality and performance of the code. In this context, the evaluation of metrics plays a
crucial role. A metric, in the field of software engineering, is a quantitative measure used to evaluate
various aspects of a software system, including its complexity, quality, maintainability, and performance.
Metrics thus allow informed decisions to be made throughout the development cycle, from design
to implementation, including validation and maintenance. We therefore followed several key steps
to evaluate and propose six new object-oriented metrics, specifically for the Java language, in order to
improve the quality and efficiency of object-oriented software. They are described below.</p>
      <sec id="sec-4-1">
        <title>4.1. Import Conflict (IC)</title>
        <p>Import conflicts often occur during software development. However, available Java development tools
do not provide a comprehensive analysis of this topic. We therefore propose the IC metric to fill
this gap. This metric analyzes the imports of a Java file in all possible scenarios and provides
insight into complexity and class dependencies. IC retrieves all the imports of a given file and
then classifies them into four distinct categories:
• Used Imports: the import is used at least once. Example: import java.util.*. For such a wildcard import, we propose to replace
it with imports of only the classes concerned.
• Not Used Imports: the import is never used. We propose to delete the import.
• Duplicate Imports: the import is repeated at least once. We propose to remove the redundant
copies.
• Conflict Imports: two or more imports have one class name in common. Example: import java.sql.Date
and import java.util.Date. We propose to delete one of the two.</p>
        <p>Impact in software engineering:
• A large number of imports can increase code complexity and extend compilation time, thus
affecting software maintainability and performance.
• Better visibility of imports in a class improves its understandability and maintainability, reducing
unnecessary code complexity.
• This metric gives insight into a project’s imports and helps in detecting conflict errors even in
the general import case (*) that the IDE does not detect.</p>
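        <p>As an illustration, the four categories above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: the class name ImportConflictSketch, the method classify and the status labels are hypothetical, the import pattern is a simplified one, and wildcard imports are only flagged (resolving them would require inspecting the packages themselves).</p>
        <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImportConflictSketch {

    // Simplified import-line pattern (an assumption, not the paper's full ImportPattern).
    private static final Pattern IMPORT = Pattern.compile(
            "^\\s*import\\s+(?:static\\s+)?(\\w+(?:\\.\\w+)*(?:\\.\\*)?)\\s*;",
            Pattern.MULTILINE);

    /** Maps each distinct import to USED, NOT_USED, DUPLICATE, CONFLICT or WILDCARD. */
    static Map<String, String> classify(String source) {
        List<String> imports = new ArrayList<>();
        Matcher m = IMPORT.matcher(source);
        while (m.find()) {
            imports.add(m.group(1));
        }
        // Source without its import lines, used to look for references to a class name.
        String body = IMPORT.matcher(source).replaceAll("");

        Map<String, String> status = new LinkedHashMap<>();
        Map<String, String> firstBySimpleName = new LinkedHashMap<>();
        for (String imp : imports) {
            String simple = imp.substring(imp.lastIndexOf('.') + 1);
            if (status.containsKey(imp)) {
                status.put(imp, "DUPLICATE");            // same import seen before
            } else if (simple.equals("*")) {
                status.put(imp, "WILDCARD");             // not resolved in this sketch
            } else if (firstBySimpleName.containsKey(simple)) {
                status.put(imp, "CONFLICT");             // same class name, other package
                status.put(firstBySimpleName.get(simple), "CONFLICT");
            } else {
                boolean used = Pattern.compile("\\b" + Pattern.quote(simple) + "\\b")
                        .matcher(body).find();
                status.put(imp, used ? "USED" : "NOT_USED");
                firstBySimpleName.put(simple, imp);
            }
        }
        return status;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "import java.util.List;",
                "import java.util.Date;",
                "import java.sql.Date;",
                "class A { List<String> xs; }");
        classify(source).forEach((imp, s) -> System.out.println(s + "  " + imp));
    }
}
```
]]></preformat>
        <p>The sketch decides usage by searching for the simple class name in the code that remains after the import lines are removed, so a name that appears only inside a comment or string literal would still count as used.</p>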
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Java Exception Analyser (JEA)</title>
        <p>This metric helps to detect potential security risks and to strengthen overall software security by evaluating the
exception-related aspect of the code. It analyzes the Java project to extract exceptions and classifies them into
two groups: default Java exceptions and non-default exceptions (exceptions defined by the user). Each group is itself
divided into two subgroups: 1. Runtime exceptions: exceptions that extend RuntimeException. 2.
Compile-time exceptions: exceptions that do not extend RuntimeException.</p>
        <p>Impact in software engineering:
• A high number of exceptions handled indicates attention to error handling and exceptional cases.
• A large number of potential exceptions can indicate excessive complexity in the code, which can
make the code more difficult to understand, maintain, and debug.
• The metric is designed to improve code quality assessment by providing information about
exception handling practices.</p>
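        <p>The two-level classification described above can be illustrated with a short Java sketch. It is only an approximation under stated assumptions: it inspects already loaded classes through reflection rather than source text, it treats any class whose name starts with java. or javax. as a default exception, and the names ExceptionKindSketch, classify and InvalidGridException are hypothetical.</p>
        <preformat><![CDATA[
```java
import java.io.FileNotFoundException;

final class ExceptionKindSketch {

    /** Hypothetical user-defined exception, used only for the demonstration. */
    static class InvalidGridException extends RuntimeException { }

    /** Returns "<default|user-defined> / <runtime|compile-time>" for an exception type. */
    static String classify(Class<? extends Throwable> type) {
        String name = type.getName();
        // Assumption: exceptions shipped with the JDK live under java.* or javax.*.
        boolean isDefault = name.startsWith("java.") || name.startsWith("javax.");
        // Runtime exception: the type is RuntimeException or one of its subclasses.
        boolean isRuntime = RuntimeException.class.isAssignableFrom(type);
        return (isDefault ? "default" : "user-defined")
                + " / " + (isRuntime ? "runtime" : "compile-time");
    }

    public static void main(String[] args) {
        System.out.println(classify(FileNotFoundException.class));
        System.out.println(classify(IllegalStateException.class));
        System.out.println(classify(InvalidGridException.class));
    }
}
```
]]></preformat>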
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Encapsulation Rate (ER)</title>
        <p>Encapsulation is one of the fundamental principles of object-oriented programming and a crucial aspect
of code quality. We therefore propose the ER metric to better evaluate and monitor this concept.
It measures the degree to which the members of a class are encapsulated, that is, protected from direct access from other parts of
the program. It fetches the members of a Java file (methods, classes) and categorizes them by
their access modifiers: public, package-private (default), protected and private.</p>
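        <p>In addition, we calculate the ratio of encapsulation using the formula below, where |Private| and
|Protected| represent the number of private and protected members respectively, and Total represents
the total number of members:</p>
        <p>
          <disp-formula id="eq1">
            <label>(1)</label>
            <tex-math><![CDATA[\text{Ratio of encapsulation} = \frac{|\mathit{Private}| + |\mathit{Protected}|}{\mathit{Total}}]]></tex-math>
          </disp-formula>
        </p>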
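        <p>The ratio can be computed as in the following minimal Java sketch. It assumes that each member is given as one declaration string whose access modifier, if any, comes first; the names EncapsulationRateSketch, accessOf and encapsulationRatio are hypothetical and only serve the illustration.</p>
        <preformat><![CDATA[
```java
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EncapsulationRateSketch {

    enum Access { PUBLIC, PROTECTED, PRIVATE, PACKAGE }

    // Assumption: the access modifier, when present, is the first token of the declaration.
    private static final Pattern MODIFIER =
            Pattern.compile("^\\s*(public|protected|private)\\b");

    static Access accessOf(String declaration) {
        Matcher m = MODIFIER.matcher(declaration);
        if (!m.find()) {
            return Access.PACKAGE; // no explicit modifier: package-private
        }
        return Access.valueOf(m.group(1).toUpperCase(Locale.ROOT));
    }

    /** (private + protected) / total, or 0 when there are no members. */
    static double encapsulationRatio(List<String> declarations) {
        if (declarations.isEmpty()) {
            return 0.0;
        }
        Map<Access, Integer> counts = new EnumMap<>(Access.class);
        for (String d : declarations) {
            counts.merge(accessOf(d), 1, Integer::sum);
        }
        int hidden = counts.getOrDefault(Access.PRIVATE, 0)
                + counts.getOrDefault(Access.PROTECTED, 0);
        return (double) hidden / declarations.size();
    }

    public static void main(String[] args) {
        List<String> members = List.of(
                "private int x;",
                "protected void update() {}",
                "public int getX() { return x; }",
                "int packageCounter;");
        System.out.println(encapsulationRatio(members));
    }
}
```
]]></preformat>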
        <sec id="sec-4-3-1">
          <title>Impact in software engineering:</title>
          <p>Measuring the degree of data encapsulation of a class can be used to evaluate the design quality of a
class in a system. A class with a high encapsulation rate is generally considered to be better designed
because it promotes modularity, reusability, and code maintainability. On the other hand, a class with a
low encapsulation rate may be prone to undesirable side effects, such as data manipulation errors or
unintended external interference. Thus, by monitoring and optimizing the encapsulation rate of classes,
developers can improve the robustness and reliability of their software.</p>
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Overload and Override Method Ratio (OOMR)</title>
        <p>Most developers want to identify areas in the code where inheritance and method overloading are
overused, as overuse of these mechanisms can lead to excessive complexity and compromise the
modularity of the code. We therefore propose a specific metric to evaluate and monitor this aspect.
Our system retrieves the methods of a class and sorts them into two categories: overridden methods and overloaded
methods.</p>
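        <p>It then calculates the ratios of overridden and overloaded methods and the combined OOMR ratio. In the
formula below, |Override| and |Overload| represent the number of overridden and overloaded
methods respectively, and Total represents the total number of methods:</p>
        <p>
          <disp-formula>
            <tex-math><![CDATA[\text{OOMR Ratio} = \frac{|\mathit{Override}| + |\mathit{Overload}|}{\mathit{Total}}]]></tex-math>
          </disp-formula>
        </p>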
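        <p>A minimal Java sketch of the combined ratio is given below. It rests on two assumptions that are not fixed by the text: a method counts as overloaded when at least one other method of the class has the same name, and it counts as overridden when a flag says so (for instance because it carries an @Override annotation). A method that is both is counted twice. The names OomrSketch, Method and oomr are hypothetical.</p>
        <preformat><![CDATA[
```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class OomrSketch {

    /** Minimal description of a method: its name and whether it overrides a parent method. */
    static final class Method {
        final String name;
        final boolean overrides;

        Method(String name, boolean overrides) {
            this.name = name;
            this.overrides = overrides;
        }
    }

    /** (|Override| + |Overload|) / Total, or 0 when the class has no methods. */
    static double oomr(List<Method> methods) {
        if (methods.isEmpty()) {
            return 0.0;
        }
        // Number of methods declared under each name.
        Map<String, Long> perName = methods.stream()
                .collect(Collectors.groupingBy(m -> m.name, Collectors.counting()));
        // Assumption: every method sharing its name with another one is overloaded.
        long overload = methods.stream().filter(m -> perName.get(m.name) > 1).count();
        long override = methods.stream().filter(m -> m.overrides).count();
        return (double) (override + overload) / methods.size();
    }

    public static void main(String[] args) {
        List<Method> methods = List.of(
                new Method("toString", true),
                new Method("add", false),
                new Method("add", false),
                new Method("size", false));
        System.out.println(oomr(methods));
    }
}
```
]]></preformat>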
        <sec id="sec-4-4-1">
          <title>Impact in software engineering:</title>
          <p>This metric can be used to assess the complexity and maintainability of a software system.
By calculating the OOMR, we obtain indications of design quality and identify
potential areas of the code that require revision or optimization. For example, a high OOMR indicates
an overuse of inheritance mechanisms, which can make the code more difficult to understand and
maintain. On the other hand, a low OOMR indicates good object-oriented design with appropriate
use of inheritance and method overloading. Finally, this metric offers insight into code reuse and the
complexity of the object-oriented design, and provides valuable information on how methods
are manipulated in the code, thus facilitating the analysis of modularity.</p>
        </sec>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. JavalyzerX (JAX)</title>
        <p>Understanding a Java project is an essential step towards improving and developing it, yet no single
tool provides all the necessary information in one go. We therefore propose a metric that gives a complete analysis of
the project and serves as a summary report, encompassing various aspects and providing valuable
insight into the structure, size and performance of the code. The JAX metric is a composite static
metric that includes several sub-metrics. The analysis includes metrics such as:
1. Number of lines: fetches the total number of lines and categorizes them into two sections:
• Code statistics: a. Number of empty lines and number of code lines. b. Number of lines containing only a curly
brace (only ’{’ or ’}’). c. Ratio of code lines, where |Code| is the number of
code lines and |Total| is the total number of lines: Ratio of code lines = |Code| / |Total|.
• Comment statistics: a. Number of comment-only lines. b. Ratio of comment-only lines, where
|Comment| is the number of comment-only lines and |Total| is the total number
of lines: Ratio of comment lines = |Comment| / |Total|.
2. Software: a. Method prototypes and implemented interfaces. b. Classes: subclasses and abstract
classes. c. Parent of the Java file, initialized to ’Object’ if the class does not extend another class.
3. Performance: a. Size of the file in bytes. b. Run time of the file in seconds.</p>
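        <p>The line statistics of item 1 can be sketched in Java as follows. The sketch makes several simplifying assumptions: only // comments and /* ... */ comments that fit on one line are recognized as comment-only lines, brace-only lines are reported separately but still count as code lines, and code lines are taken to be all lines that are neither empty nor comment-only. The class name LineStatsSketch and its members are hypothetical.</p>
        <preformat><![CDATA[
```java
import java.util.List;
import java.util.stream.Collectors;

final class LineStatsSketch {

    final int total;
    final int empty;
    final int braceOnly;
    final int commentOnly;
    final int code;

    private LineStatsSketch(int total, int empty, int braceOnly, int commentOnly) {
        this.total = total;
        this.empty = empty;
        this.braceOnly = braceOnly;
        this.commentOnly = commentOnly;
        // Assumption: every line that is neither empty nor comment-only is a code line.
        this.code = total - empty - commentOnly;
    }

    static LineStatsSketch of(String source) {
        List<String> lines = source.lines().collect(Collectors.toList());
        int empty = 0;
        int brace = 0;
        int comment = 0;
        for (String line : lines) {
            if (line.isBlank()) {
                empty++;
            } else if (line.matches("\\s*[{}]\\s*")) {
                brace++;                       // line holding only '{' or '}'
            } else if (line.matches("\\s*(//.*|/\\*.*\\*/\\s*)")) {
                comment++;                     // single-line comment only
            }
        }
        return new LineStatsSketch(lines.size(), empty, brace, comment);
    }

    double codeRatio() {
        return total == 0 ? 0.0 : (double) code / total;
    }

    double commentRatio() {
        return total == 0 ? 0.0 : (double) commentOnly / total;
    }

    public static void main(String[] args) {
        LineStatsSketch s = of("// header\nclass A {\n\n    int x; // field\n}\n");
        System.out.println(s.total + " lines, code ratio " + s.codeRatio()
                + ", comment ratio " + s.commentRatio());
    }
}
```
]]></preformat>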
        <sec id="sec-4-5-1">
          <title>Impact in software engineering:</title>
          <p>This metric provides a robust solution to analyze Java files in a specified folder. By providing valuable
insights into code structure, size, and performance, it improves the understanding and optimization of
Java projects.</p>
        </sec>
      </sec>
      <sec id="sec-4-6">
        <title>4.6. Swing Component (SC)</title>
        <p>Swing is a widely used graphical library for creating user interfaces in Java applications. Excessive use
of some components can negatively affect application performance, and Java applications can become
complex due to the large number of Swing components used. In order to enable developers to make
informed decisions about the organization and structure of the code, which in turn can help improve
the maintainability of the code in the long run, we propose a metric that calculates the exact
number of each Swing component in the code. SC is a static metric of user interface complexity. It takes
the Swing elements in the Java file and classifies them according to their type.</p>
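        <p>As an illustration, counting Swing components by type can be sketched in Java as follows. The sketch assumes that a component is counted each time it is instantiated with new, and it checks names against a small, deliberately incomplete list of javax.swing classes so that unrelated classes whose names start with J are ignored. The names SwingComponentSketch and count are hypothetical.</p>
        <preformat><![CDATA[
```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SwingComponentSketch {

    // Partial list of javax.swing component classes (an assumption; extend as needed).
    private static final Set<String> SWING = Set.of(
            "JFrame", "JPanel", "JButton", "JLabel", "JTextField",
            "JTextArea", "JCheckBox", "JComboBox", "JTable", "JScrollPane");

    // Matches "new JXxx(" and "new JXxx<...>(" and captures the class name.
    private static final Pattern NEW_J =
            Pattern.compile("\\bnew\\s+(J\\w+)\\s*(?:<[^>]*>)?\\s*\\(");

    /** Number of instantiations per Swing component type, sorted by name. */
    static Map<String, Integer> count(String source) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = NEW_J.matcher(source);
        while (m.find()) {
            String type = m.group(1);
            if (SWING.contains(type)) {
                counts.merge(type, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String source = "JPanel p = new JPanel(); p.add(new JButton(\"a\")); "
                + "p.add(new JButton(\"b\")); Object o = new JSONObject();";
        System.out.println(count(source));
    }
}
```
]]></preformat>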
        <p>Impact in software engineering: this metric provides an accurate assessment of user interface
complexity, allowing developers to better understand the workload associated with maintaining and
extending it. It also facilitates the optimization of overall application performance by monitoring
component usage trends, and it contributes to the effective management of code complexity by allowing
developers to make informed decisions about the organization and structure of the code to improve its
maintainability.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Formal semantics</title>
      <p>In this section we present our mathematical semantics, which is based on regular expressions with
lookarounds, to formalize the six metrics proposed for Java. Let Σ be an alphabet and r be a regular expression over Σ.
The language represented by r, denoted L(r), is a regular language. r is a sequence of symbols
built from the alphabet with operations such as union and concatenation. To present our proposed engine, we start by defining the
syntax of our Java regex:</p>
      <sec id="sec-5-1">
        <title>5.1. Java’s regex syntax</title>
        <p>5.1.1. Quantifiers</p>
        <p>Quantifiers are symbols that specify how many times a pattern should appear in a regular expression.
The quantifiers we use are:
• *: Matches 0 or more occurrences of the preceding pattern.
• +: Matches 1 or more occurrences.
• ?: Matches 0 or 1 occurrence.
• {n,m}: Matches between n and m occurrences.
• {n,}: Matches n or more occurrences.
• {n}: Matches exactly n occurrences.</p>
        <p>5.1.2. Alternation</p>
        <p>Alternation is a powerful concept in regular expressions that allows several alternatives to be specified
for the same pattern:
• Vertical bar (|): Acts as a logical OR between patterns.
Example: a|b matches ’a’ or ’b’.</p>
        <p>5.1.3. Character Classes</p>
        <p>Character classes represent sets of characters that describe specific search patterns:
• Square brackets ( [ ] ): Match any one of the enclosed characters, i.e. any character that belongs to the
specified set.
Example: [abc] matches ’a’, ’b’, or ’c’.
• Ranges: Specify a range of characters.
Example: [a-z] matches any lowercase letter.
• Negation: Use ^ to negate the character class, i.e. match any character that does not belong to the
specified set.
Example: [^0-9] matches any character that is not a digit.</p>
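        <p>The syntax elements listed in this subsection can be tried out directly with java.util.regex. The following short sketch is given for illustration only; the class name RegexSyntaxSketch and the helper matches are hypothetical, and the helper tests whether the whole input matches the pattern.</p>
        <preformat><![CDATA[
```java
import java.util.regex.Pattern;

final class RegexSyntaxSketch {

    /** True when the whole input matches the regular expression. */
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Quantifiers
        System.out.println(matches("ab*", "a"));        // true: * allows zero 'b'
        System.out.println(matches("ab+", "a"));        // false: + needs at least one 'b'
        System.out.println(matches("colou?r", "color")); // true: ? makes 'u' optional
        System.out.println(matches("a{2,3}", "aaaa"));  // false: at most three 'a'
        System.out.println(matches("a{2,}", "aaaa"));   // true: two or more 'a'

        // Alternation
        System.out.println(matches("a|b", "b"));        // true

        // Character classes
        System.out.println(matches("[abc]", "d"));      // false: 'd' is not in the set
        System.out.println(matches("[a-z]", "q"));      // true: lowercase letter
        System.out.println(matches("[^0-9]", "7"));     // false: negated class rejects digits
    }
}
```
]]></preformat>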
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Escaped characters</title>
        <p>
          In regular expressions, some characters cannot be used directly because they have a special meaning.
To use them, or to denote predefined character classes, we use the escape character (\):
• \d : Matches any digit, equivalent to [0-9].
• \D : Matches any non-digit, equivalent to [^0-9].
• \w : Matches any word character (alphanumeric plus underscore), equivalent to [a-zA-Z0-9_].
• \W : Matches any non-word character.
• \s : Matches any whitespace character (spaces, tabs, line breaks).
• \S : Matches any non-whitespace character.
• \n : Matches a newline character.
        </p>
        <p>5.2.1. Special Characters</p>
        <p>• Dot ( . ) : Matches any single character except a newline (\n).</p>
        <p>5.2.2. Lookahead and Lookbehind</p>
        <p>Regular expression lookarounds are very practical for checking the presence or absence of a
subexpression to the left or right of the current position, which places constraints on the context of the
searched pattern [22]. When the regular expression engine tests a lookaround, it does not
advance in the text but stays in place; matching continues if and only if the condition defined in
the lookaround holds. There are two types of lookaround:
lookaheads and lookbehinds.</p>
        <p>Lookaheads test the presence or absence of a pattern ahead of the searched pattern by specifying
conditions to check after the current position. Lookbehinds, on the other hand, test the presence or
absence of a pattern behind the searched pattern by specifying conditions to check before the current
position.</p>
        <p>Lookaheads can be positive or negative. A positive lookahead ((?=expression)) checks that the expression
is found after the current position, without including this expression in the overall match.
A negative lookahead ((?!expression)), conversely, checks that the expression is not
found after the current position.</p>
        <p>Definition: Let Σ be an alphabet. A lookaround tests the presence or absence of a pattern just before
or just after the searched pattern as follows:
• ( (?=pattern) ): positive lookahead. Asserts that what follows the current position in the
string matches the pattern inside the parentheses.
Example: a(?=b) matches ’a’ only if it is followed by ’b’.
• ( (?!pattern) ): negative lookahead. Asserts that what follows the current position in the
string does not match the pattern inside the parentheses.
Example: a(?!b) matches ’a’ only if it is not followed by ’b’.
• ( (?&lt;=pattern) ): positive lookbehind. Asserts that what precedes the current position in
the string matches the pattern inside the parentheses.
Example: (?&lt;=a)b matches ’b’ only if it is preceded by ’a’.
• ( (?&lt;!pattern) ): negative lookbehind. Asserts that what precedes the current position in
the string does not match the pattern inside the parentheses.
Example: (?&lt;!a)b matches ’b’ only if it is not preceded by ’a’.</p>
        <sec id="sec-5-2-1">
          <title>Example:</title>
          <p>The regular expression \w+(?= email) matches any alphanumeric word that is followed by
the word “email”. The lookahead checks whether the word “email” is present after the
current position without including it in the overall match.</p>
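          <p>The four lookaround forms and the example above can be checked with java.util.regex, as in the following minimal sketch. The class name LookaroundSketch and the helper findAll are hypothetical; the helper simply collects every match of a pattern in a text.</p>
          <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundSketch {

    /** Returns every substring of text matched by regex, in order of appearance. */
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Positive lookahead: words followed by " email"; " email" is not part of the match.
        System.out.println(findAll("\\w+(?= email)", "send email and spam email now"));
        // Negative lookahead: 'a' not followed by 'b'.
        System.out.println(findAll("a(?!b)", "ab ac a"));
        // Positive lookbehind: 'b' preceded by 'a'.
        System.out.println(findAll("(?<=a)b", "ab cb"));
        // Negative lookbehind: 'b' not preceded by 'a'.
        System.out.println(findAll("(?<!a)b", "ab cb"));
    }
}
```
]]></preformat>
          <p>In the first call the matches are the words themselves (“send” and “spam”), which shows that the text tested by the lookahead is checked but not consumed.</p>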
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Language proposed</title>
        <p>5.3.1. Type Parameter</p>
        <p>The type parameter pattern is used to build other regexes, such as the generic part of the class and method detection regexes.
Example: public class NumberBox &lt;T extends Number&gt; {
public static &lt;T extends Number&gt; double sum(T[ ] array) {}}</p>
        <p>L1 = { Type Parameter Of Genericity }
• MultipleBoundPattern = (\s+extends\s+\w+(\s*&lt;\w+&gt;\s*)?(\s+&amp;\s+\w+(\s*&lt;\s*\w+\s*&gt;)?)*\s*)?,
matches one or multiple bounds in a type parameter: extends NumberClass &amp; &lt;T&gt;.
• TypeParameterGen = (\s*&lt;\s*\w+(MultipleBoundPattern)\s*(,\s*\w+(MultipleBoundPattern)\s*)*\s*&gt;\s*)?,
matches a type parameter: &lt;T extends NumberClass, k extends &lt;V&gt;&gt;.</p>
        <p>5.3.2. Method Detection</p>
        <p>• IC (Import Conflict): the metric needs to fetch the classes from the method prototype (return type,
parameter types, exceptions thrown) because some of them need imports to be used.
Example: public static ArrayList&lt;ImportStatus&gt; ImportFetch(File file){...}
• ER (Encapsulation Rate): the metric needs to obtain the method prototype from the Java file to
fetch its access modifier.</p>
        <p>Example: private void switchButtonToPoly() {...}
• SC (Swing Component): the metric needs to fetch the Swing elements from the method prototype (return
type, parameter types, exceptions thrown).</p>
        <p>Example: public JPanel createPanelWithButton(JButton button)
• JAX (JavalyzerX): the metric needs to fetch the method prototype from the Java file.
• JEA (Java Exception Analyser): the metric needs to fetch the exceptions thrown in a method prototype.</p>
        <p>Example: public static int countOverrideMethods (Class&lt;?&gt;) throws FileNotFoundException{ ...}
L2={Method Or Constructor Prototype}
• Bracket=(\[ \s* \]){1,2} ,to match [ ] or [ ][ ] of arrays.
• ArrayDeclarationPattern=\w+\s+\w+\s*(Bracket)| \w+
\s*(Bracket)\s*\w+ \s*, to match 2d or 1d array declaration like : int[][] IntegrGrid, Student
ArrayStudent [].
• ArrayTypePattern = \w+\s*(Bracket)\s*, to match 1d or 2d array type without name of the array
example:int [].
• NormalPattern = \w+\s+\w+ ,to match simple type: int nb, float pi.
• WrapperClass = \s*\w+\s*, to match WrapperClass Inside the &lt;&gt;of a collection: Integer.
• WildCardGen = \s* \? (extends\s+ \w+|super \s+ \w+)?\s*, matches wildcard genericity.
• SimpleInside = \s*(WrapperClass)\s* | \s*(ArrayTypePattern)\s*|(WildCardGen), to match either
wrapper class or arrays inside &lt;&gt;of a collection.
• InsideCollection = (SimpleInside) | \s*\w+&lt;\s*(SimpleInside)\s*&gt;|\s*\w+\s*&lt;\s*</p>
        <p>(SimpleInside)\s*,\s*(SimpleInside)\s*&gt;\s* ,to match double nested or normal inside of a collection.
• SetListPattern = \w+\s*&lt;\s*(InsideCollection)\s*&gt;\s* ,to match set and list collection.
• MapPattern = \w+\s*&lt;\s*(InsideCollection)\s*,\s*</p>
        <p>(InsideCollection)\s*&gt;\s* ,to match map collection.
• CollectionPatten = (MapPattern)|(SetListPattern).
• Paramter = \s*(NormalPattern)\s* |
\s*(Array</p>
        <p>DeclarationPattern)\s*| \s*(CollectionPattern) \w+\s* ,to match collection, array and simple type.
• Arg = \(\s*((Paramter)(,\s*(Paramter))*)?\s*\) ,to match the Arguments including the parenthesis
of a method it also matches no arguments.
• AcessModifier= (private\s+ |public \s+| protected</p>
        <p>\s+)?.
• NonAcessModifierSimple = (static\s+ |final</p>
        <p>\s+|abstract\s+)?.
• ModifierSimple = (AcessModifier) (NonAcessModifierSimple) | (NonAcessModifierSimple)
(AcessModifier).
• ModifierComplex = (AcessModifier)final \s+static
\s+|(AcessModifier)static\s+final \s+ | final \s+
(AcessModifier)static\s+ |static\s+(AcessModifier) final \s+ |static\s+final\s+(AcessModifier)
static\s+(AcessModifier).
• ModifierPattern = ModifierSimple |ModifierComplex.
• ThrowsPattern = (\s*throws\s+\w+\s*(\s*,\s*\w+\s*)*)?, to match a throws clause with one or more exceptions.</p>
        <p>• CurlyBraces = (\{\s* | \{\s*\}\s*)?, to match { or {}.
• ReturnType = \s*\w+\s+ | (CollectionPattern) | (MapPattern) | (ArrayTypePattern), to match a return type, which can be a simple type such as Integer, an array such as int[], or a collection such as List&lt;ArrayList&lt;String[]&gt;&gt;.
• ConstructorRegex = (AcessModifier)(TypeParameterGen)\w+\s*(Arg)\s*(ThrowsPattern)(CurlyBraces), to match a constructor prototype such as ImportStatus(String ImportName,int ImportStatus,int LineNumber) { ...}.
• MethodPattern = (ModifierPattern)(TypeParameterGen)(ReturnType)\w+\s*(Arg)(ThrowsPattern)((CurlyBraces) | \s*;\s*), to match an ordinary method prototype such as static Set&lt;String&gt; FetchSrcPackageFile(String AsterixImport){ ...}.</p>
        <p>• MethodPrototypePattern = (MethodPattern) | (ConstructorRegex), to match any method prototype.
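The definitions above are built by composition: each named pattern is substituted into the next one. The following Python sketch illustrates that composition for a reduced constructor prototype. It is not the authors' implementation: generics and collection parameters are left out, Bracket is assumed to stand for one or two [] pairs, and the function name is_constructor_prototype is hypothetical.

```python
import re

# Simplified building blocks named after the definitions above.
# Assumption: Bracket stands for one or two "[]" pairs.
BRACKET = r"(?:\[\s*\]){1,2}"
NORMAL = r"\w+\s+\w+"
ARRAY_DECL = r"\w+\s+\w+\s*" + BRACKET + r"|\w+\s*" + BRACKET + r"\s*\w+"
PARAM = r"\s*(?:" + ARRAY_DECL + "|" + NORMAL + r")\s*"
ARG = r"\(\s*(?:" + PARAM + "(?:," + PARAM + r")*)?\s*\)"
ACCESS = r"(?:private\s+|public\s+|protected\s+)?"
THROWS = r"(?:\s*throws\s+\w+\s*(?:,\s*\w+\s*)*)?"
CURLY = r"(?:\{\s*\}\s*|\{\s*)?"

# Constructor prototype without generics or collection parameters.
CONSTRUCTOR = re.compile(
    r"\s*" + ACCESS + r"\w+\s*" + ARG + r"\s*" + THROWS + CURLY
)


def is_constructor_prototype(line):
    """Return True when the whole line matches the simplified pattern."""
    return CONSTRUCTOR.fullmatch(line) is not None
```

Because every sub-pattern is an ordinary string, a change to one definition (for example Paramter) propagates to all patterns that embed it.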
5.3.3. Import Detection
IC (Import Conflict): this metric needs to fetch the imports of a Java file.</p>
        <p>L3 = {Import Line}
• StaticAccesModifier = (\s*static\s+)?, to match the static modifier of a static import.
• ImportPattern = \s*import\s+(StaticAccesModifier)\w+(\s*\.\s*(\*|\w+))+\s*;\s*, to match a static or normal import line, for example import static java.util.math.*; or import application.BackEnd.RegularExpression;
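As a minimal sketch of how the import pattern can be applied to one line, the Python code below inlines the static modifier and adds two capture groups so that the imported name can be read back. The capture groups and the helper name parse_import are assumptions made for illustration only.

```python
import re

# ImportPattern with the static modifier inlined; group 1 is the optional
# "static" keyword, group 2 is the dotted name (possibly ending in "*").
IMPORT_PATTERN = re.compile(
    r"\s*import\s+(\s*static\s+)?(\w+(?:\s*\.\s*(?:\*|\w+))+)\s*;\s*"
)


def parse_import(line):
    """Return (is_static, imported_name), or None if the line is not an import."""
    m = IMPORT_PATTERN.fullmatch(line)
    if m is None:
        return None
    is_static = m.group(1) is not None
    # Remove any whitespace the pattern tolerates around the dots.
    name = re.sub(r"\s+", "", m.group(2))
    return is_static, name
```

Calling parse_import on each line of a file yields the list of imports against which later usages can be compared.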
5.3.4. Package Detection
This language is used to skip the package statement while the Java file is read line by line, since that statement does not hold any data relevant to the metrics.</p>
        <p>L4 = {Package Line}
• PackagePattern = \s*package\s+\w+(\s*\.\s*\w+)*\s*;\s*, to match a package line, for example package application; or package application.Backend;
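A short Python sketch of the line-by-line filtering described above; the helper name drop_package_lines is hypothetical, and the input is assumed to be a list of source lines already read from the file.

```python
import re

PACKAGE_PATTERN = re.compile(r"\s*package\s+\w+(\s*\.\s*\w+)*\s*;\s*")


def drop_package_lines(lines):
    """Keep every line except those that are a package statement."""
    return [ln for ln in lines if PACKAGE_PATTERN.fullmatch(ln) is None]
```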
5.3.5. Catch Detection
L5 = {Catch Statement}
• IC (Import Conflict): some exceptions inside a catch statement need to be imported before they can be used.</p>
        <p>Example: catch (FileNotFoundException | MalformedURLException | ClassNotFoundException e).
• JEA (Java Exception Analysis): this metric needs to fetch the exceptions inside a catch statement.</p>
        <p>Example: catch(Exception e).
• OptionalClosingCurlyBraces = (\s*\}\s*)?, to match a } before the catch statement.
• SingleCatch = \s*\w+\s+\w+\s*, to match a single-exception catch: Exception e.</p>
        <p>• MultipleCatch = \s*\w+\s*(\s*\|\s*\w+\s*)*\s*\|\s*\w+\s+\w+\s*, to match a multi-exception catch: FileNotFound | IOException e.
• InsideCatch = (SingleCatch) | (MultipleCatch), to match what lies between the parentheses of a catch statement.
• CurlyBraces = (\{\s* | \{\s*\}\s*)?, to match the optional curly braces at the end of a catch statement: { or { }.</p>
        <p>• CatchPattern = \s*(OptionalClosingCurlyBraces)catch\s*\((InsideCatch)\)\s*(CurlyBraces)\s*, to match a catch statement.
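The catch patterns above can be assembled and used to list the exception types of one catch line. The Python sketch below does so under two assumptions that are not part of the definitions: a capture group is placed around InsideCatch, and the type list is split on the | separator after the variable name is dropped. The helper name caught_exceptions is hypothetical.

```python
import re

SINGLE_CATCH = r"\s*\w+\s+\w+\s*"
MULTIPLE_CATCH = r"\s*\w+\s*(?:\|\s*\w+\s*)*\|\s*\w+\s+\w+\s*"

# Optional "}" before catch, the captured InsideCatch, optional "{" or "{}".
CATCH_PATTERN = re.compile(
    r"\s*(?:\}\s*)?catch\s*\((" + SINGLE_CATCH + "|" + MULTIPLE_CATCH + r")\)\s*"
    r"(?:\{\s*\}\s*|\{\s*)?"
)


def caught_exceptions(line):
    """Return the exception type names of a catch line, or [] if none."""
    m = CATCH_PATTERN.fullmatch(line)
    if m is None:
        return []
    # Drop the trailing variable name, then split the type list on "|".
    type_list = m.group(1).strip().rsplit(None, 1)[0]
    return [name.strip() for name in type_list.split("|")]
```

The returned names are the ones a metric such as IC or JEA would then inspect.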
5.3.6. String Literal
This language is used to build the method call regex, since a string literal can be passed as a parameter.
L6 = {String Literal with all possible concatenations}
• Char = [^"\n]+, to match any characters other than " and newline.</p>
        <p>• StringConcatElement = (ClassVariable) | (MethodCall) | \w+ | "(Char)" | (NumbersPattern), to match a method call such as token.getToken(), a class variable such as Student.Name, a variable name such as Age, a string such as "Hello World!", or a number such as 21 or -32.21f.
• StringConcat = (StringConcatElement)(\+(StringConcatElement))*, to match one or more concatenated StringConcatElement items.
• LiteralStringPattern = ((StringConcat)\+)?"((Char)| " \+(StringConcat)\+ ")"(\+(StringConcat))?, to match a string literal with optional concatenation at the beginning, in the middle, and at the end: Age + "Is My Age and My Name is "+ Student.Name + "My Grade Is"+ 13.21f.
5.3.7. Numbers
The numbers regex is used to build the string literal regex, since a number can be concatenated with a string literal; it is also used for numeric parameters in a method call.</p>
        <p>L7 = {Int, Float, and Double}
• SignPattern = (\+\s*|\-\s*)?, to match the sign of a number: none, +, or -.
• FloatPattern = \s*(SignPattern)\d+\.\d+(f)?\s*, to match doubles and floats.
• IntPattern = \s*(SignPattern)\d+\s*, to match integers.</p>
        <p>• NumbersPattern = (FloatPattern) | (IntPattern), to match int, float, and double values.
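The three number patterns can be checked with a few lines of Python. This is only a sketch: the function classify_number and its return labels are hypothetical, and the float pattern is tried first because an integer prefix would otherwise hide the fractional part.

```python
import re

SIGN = r"(?:\+\s*|-\s*)?"
FLOAT_PATTERN = re.compile(r"\s*" + SIGN + r"\d+\.\d+f?\s*")
INT_PATTERN = re.compile(r"\s*" + SIGN + r"\d+\s*")


def classify_number(text):
    """Return "float", "int", or None for text that is not a number."""
    if FLOAT_PATTERN.fullmatch(text):
        return "float"
    if INT_PATTERN.fullmatch(text):
        return "int"
    return None
```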
5.3.8. Method Call
This language is used to build the static call regex.
L8 = {Method Call}
• ClassCall = (\w+\.)+\w+, to match a static method call or a method call on an object: list.size.
• ClassVariable = \w+\.\w+, to match a class variable such as Student.age.
• SimpleArgMethodCall = (ClassVariable) | (NumbersPattern) | \w+ | (LiteralStringPattern), to match a simple argument inside the () of a method call: a string literal such as "Hello world!", a number such as 32, or a class variable such as City.inhabitant.
• SimpleMethodCall = \s*(ClassCall)\(((SimpleArgMethodCall)(\s*,\s*(SimpleArgMethodCall))*)?\s*\)\s*, to match a simple method call that has no method call among its parameters: list.size(), Student.grade(grade1, grade2, 12.90).
• Inside = (SimpleArgMethodCall) | (ClassCall)\(((SimpleMethodCall)(\s*,\s*(SimpleMethodCall))*)?\s*\)\s* | (SimpleMethodCall), to match the inside of a method call, which is a number, a variable, a class variable, a string literal, a nested method call, or a simple method call.
• MethodCall = \s*(ClassCall)\(((Inside)(\s*,\s*(Inside))*)?\s*\)\s*, to match a doubly nested method call.</p>
        <p>5.3.9. Throw Detection
L9 = {Throw Statement}
• IC (Import Conflict): some exceptions inside a throw statement need to be imported before they can be used.</p>
        <p>Example: throw new IllegalArgumentException("Number must be positive").
• JEA (Java Exception Analysis): this metric needs to fetch the exception of a throw statement.</p>
        <p>Example: throw new Exception("ERROR EXCEPTION HAPPENING").
• ThrowPattern = \s*throw\s+new\s+\w+\s*\(\s*(Inside)\s*\)\s*;\s*, to match a throw statement: throw new IllegalArgumentException("Age must be 18 or older.");
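To illustrate how the exception name can be read from a throw statement, the Python sketch below uses a relaxed variant of the throw pattern. Two points are assumptions: the constructor arguments are accepted as arbitrary text instead of the (Inside) sub-pattern, and a capture group is added around the class name. The helper name thrown_exception is hypothetical.

```python
import re

# Assumption: the argument list is accepted as any text, instead of the
# (Inside) sub-pattern used in the definition above.
THROW_PATTERN = re.compile(r"\s*throw\s+new\s+(\w+)\s*\((.*)\)\s*;\s*")


def thrown_exception(line):
    """Return the exception class of a 'throw new ...' line, or None."""
    m = THROW_PATTERN.fullmatch(line)
    return m.group(1) if m else None
```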
5.3.10. Class
JAX (Java Analyzer): this metric needs to fetch all the class definitions of a Java file.</p>
        <p>L10 = {Class Definition}
• AcessModifier = (public\s+ | private\s+ | protected\s+)?, to match the access modifiers: private, public, protected, or none.
• NonAcessModifierClass = (abstract\s+ | final\s+)?, to match the abstract or final modifier.
• ModifierClass = (AcessModifier)(NonAcessModifierClass) | (NonAcessModifierClass)(AcessModifier), to match all possible combinations of modifiers: final private, public abstract, abstract, etc.</p>
        <p>• ExtendsPattern = (?:\s+extends\s+\w+(\s*&lt;\s*\w+\s*&gt;\s*)?)?, to match an extends clause: extends TreeCell&lt;TreeItemData&gt;.
• ImplementsPattern = (?:\s+implements\s+\w+\s*(\s*,\s*\w+\s*)*)?, to match one or more implemented interfaces: implements Comparable, MathInterface.
• ClassPattern = \s*(ModifierClass)class\s+\w+(TypeParameterGen)(ExtendsPattern)(ImplementsPattern)\s*(?:\{\s*), to match a class definition line:
public class CustomTreeCell extends TreeCell&lt;TreeItemData&gt;{...}
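A reduced Python sketch of the class pattern follows. It is deliberately looser and smaller than the definition above: generic type parameters are left out, up to two modifiers are accepted in any order, and capture groups are added for the class name, the superclass, and the interfaces. The helper name parse_class_line and the shape of its result are hypothetical.

```python
import re

CLASS_PATTERN = re.compile(
    r"\s*(?:(?:public|private|protected|abstract|final)\s+){0,2}"
    r"class\s+(\w+)"
    r"(?:\s+extends\s+(\w+))?"
    r"(?:\s+implements\s+(\w+(?:\s*,\s*\w+)*))?"
    r"\s*\{\s*"
)


def parse_class_line(line):
    """Return name, superclass and interfaces of a class line, or None."""
    m = CLASS_PATTERN.fullmatch(line)
    if m is None:
        return None
    interfaces = re.split(r"\s*,\s*", m.group(3)) if m.group(3) else []
    return {"name": m.group(1), "extends": m.group(2), "implements": interfaces}
```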
5.3.11. Instantiation
• IC (Import Conflict): to fetch the constructors, since some of them need to be imported before they can be used.</p>
        <p>Example: try (BufferedReader reader = new BufferedReader(new FileReader(file))).
• SC (Swing Component): to fetch the constructors of Swing elements.</p>
        <p>Example: mainFrame.setLayout(new BorderLayout());
L11 = {Line That Contains Instantiation}
• NewPattern = .+(\(|=)\s*new\s+.+, to match a line of code that contains an instantiation:
try(BufferedReader reader = new BufferedReader(new FileReader(file))).
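NewPattern only flags a line as containing an instantiation. One possible way to then collect the constructor names is sketched below in Python; the second regex (CONSTRUCTOR_NAME) and the helper instantiated_classes are assumptions added for illustration and do not come from the definitions above.

```python
import re

# NewPattern: some text, then "(" or "=", then "new" and the rest of the line.
NEW_LINE = re.compile(r".+(?:\(|=)\s*new\s+.+")
# Assumption: the class name is the identifier that follows each "new".
CONSTRUCTOR_NAME = re.compile(r"\bnew\s+(\w+)")


def instantiated_classes(line):
    """Return the class names instantiated on a line flagged by NewPattern."""
    if NEW_LINE.fullmatch(line) is None:
        return []
    return CONSTRUCTOR_NAME.findall(line)
```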
5.3.12. Variable
L12 = {Variables}
• IC (Import Conflict): this metric needs to fetch the classes used by the variables (reference types, type parameters of a collection), because some of them need imports to be used.</p>
        <p>Example: public static ArrayList&lt;ImportStatus&gt; ListImport.
• ER (Encapsulation Rate): this metric needs to obtain each class attribute of the Java file in order to fetch its access modifier.</p>
        <p>Example: private static boolean IsMailUsed = false.
• SC (Swing Component): this metric needs to fetch the Swing elements among the variables. Example: JButton button.
• PatternAcessModifiers = (private\s+ | protected\s+ | public\s+)?, to match the access modifiers: private, public, protected, or none.
• StaticModifier = (static\s+)?, to match the static modifier.
• FinalModifier = (final\s+)?, to match the final modifier.
• VarModifer = (PatternAcessModifiers)(FinalModifier)(StaticModifier) | (PatternAcessModifiers)(StaticModifier)(FinalModifier) | (StaticModifier)(PatternAcessModifiers)(FinalModifier) | (StaticModifier)(FinalModifier)(PatternAcessModifiers) | (FinalModifier)(PatternAcessModifiers)(StaticModifier) | (FinalModifier)(StaticModifier)(PatternAcessModifiers), to match all possible combinations of the modifiers: static final, final private static, public final, etc.
• VariablePattern = (VarModifer)((?!return\s+)\w+\s+\w+ | (ArrayDeclarationPattern) | (CollectionPattern)\w+)\s*(=\s*.+)?;?, to match variables: List&lt;Integer&gt; NbList; int a = 0; etc. The negative lookahead (?!return\s+) keeps a statement such as return x; from being read as a declaration.
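The role of the lookahead in VariablePattern can be seen in a small Python sketch restricted to simple types. It is looser than the definition above: modifiers are accepted in any order and number, arrays and collections are left out, and the helper name parse_field is hypothetical.

```python
import re

ACCESS_MODIFIERS = ("private", "protected", "public")

# Modifiers in any order, then "type name", with "return x" excluded
# by the negative lookahead, then an optional initializer.
FIELD_PATTERN = re.compile(
    r"\s*((?:(?:private|protected|public|static|final)\s+)*)"
    r"(?!return\s+)(\w+)\s+(\w+)\s*(?:=\s*.+)?;?\s*"
)


def parse_field(line):
    """Return (access_modifier_or_None, type, name), or None if no match."""
    m = FIELD_PATTERN.fullmatch(line)
    if m is None:
        return None
    access = next((w for w in m.group(1).split() if w in ACCESS_MODIFIERS), None)
    return access, m.group(2), m.group(3)
```

Without the lookahead, the line return result; would be parsed as a variable named result of type return.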
5.3.13. Annotation
IC (Import Conflict): this metric needs to fetch the annotations, because some of them need an import to be used. Example: @FXML.</p>
        <p>L13 = {Annotation Besides Overload and Override}</p>
        <p>• AnnotationPattern = \s*@\s*(?!(Overload|Override))\w+\s*, to match every annotation other than Override and Overload: @FXML
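The annotation pattern relies on a negative lookahead to reject two names. A minimal Python sketch, with a capture group added around the annotation name and a hypothetical helper annotation_name:

```python
import re

# Same shape as the annotation pattern, with the name captured in group 1.
ANNOTATION_PATTERN = re.compile(r"\s*@\s*(?!(?:Overload|Override))(\w+)\s*")


def annotation_name(line):
    """Return the annotation name, or None for Override, Overload, or no match."""
    m = ANNOTATION_PATTERN.fullmatch(line)
    return m.group(1) if m else None
```

Since the lookahead only tests a prefix, a name that merely starts with Override (for example a hypothetical @Overrides) is rejected as well.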
5.3.14. Static Call
IC (Import Conflict): some static method calls and class variables need to be imported before they can be used.
Examples: for (ImportStatus Import : ImportController.ListImport) and Encapsulation encapsulation = Encapsulation.EncapsulationFetch(file).</p>
        <p>L14 = {Line That Contains Static Method Call}
• StaticCallPattern = .+(MethodCall).+, to match a line of code that contains a static call:
ListImport= ImportStatus.update(file,(ImportStatus.ImportFetch(file)));</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and future work</title>
      <p>The objective of this paper is to strengthen software defect prediction techniques by enlarging the set
of object-oriented metrics, which in turn reduces the load, time and effort of software development.
In this context, the evaluation of metrics has become crucial for measuring the quality, complexity and
performance of software. This paper proposes six new object-oriented metrics that address
contemporary challenges in software development: they help improve
development processes by identifying risk areas and guiding code optimization toward robust and
maintainable software. We have also proposed
a formal model based on regular expressions, which expresses our metrics in a mathematical
form so that their accuracy can be validated. As future work, we first plan to develop a tool
based on the proposed model. Second, we plan to work on a prediction model that combines our proposed
metrics with some existing ones using artificial intelligence algorithms.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
  </back>
</article>