=Paper=
{{Paper
|id=Vol-2506/Paper5-seed2019
|storemode=property
|title=Literature Reviews on Applying Artificial Intelligence/Machine Learning to Software Engineering Research Problems: Preliminary
|pdfUrl=https://ceur-ws.org/Vol-2506/Paper5-seed2019.pdf
|volume=Vol-2506
|authors=Pornsiri Muenchaisri
|dblpUrl=https://dblp.org/rec/conf/apsec/Muenchaisri19
}}
==Literature Reviews on Applying Artificial Intelligence/Machine Learning to Software Engineering Research Problems: Preliminary==
<pdf width="1500px">https://ceur-ws.org/Vol-2506/Paper5-seed2019.pdf</pdf>
<pre>


           Literature Reviews on Applying Artificial Intelligence/Machine
         Learning to Software Engineering Research Problems: Preliminary

                                               Pornsiri Muenchaisri
                            Department of Computer Engineering, Faculty of Engineering,
                                   Chulalongkorn University, Bangkok, Thailand
                                            Pornsiri.mu@chula.ac.th


                                                                Abstract

                           This paper explores the application of Artificial
                           Intelligence/Machine Learning (AI/ML) to software
                           engineering research problems. Which software engineering
                           activities use AI/ML the most for solving research problems?
                           The scope of the paper is a preliminary review of research papers
                           published in the Asia-Pacific Software Engineering Conference
                           2018 (APSEC 2018) proceedings and of research conducted at
                           the Department of Computer Engineering, Chulalongkorn
                           University (CPCU). The author manually reviews papers using
                           keywords such as machine learning, neural network, and
                           natural language processing. The results show that machine
                           learning is used in coding and software quality improvement
                           activities more than in other activities.


1           Introduction
Software Engineering (SE) is a step-by-step approach to gathering and analyzing requirements and to designing, developing,
and testing software effectively and efficiently. Each activity can be accomplished with suitable techniques and methods.
For example, software requirements can be obtained through interviews and joint application development, both of which are
performed manually. After requirements are gathered, they may need to be categorized, and an automatic tool may be needed
for this. The research problem may then involve finding an approach that classifies each requirement into an appropriate
group. Methods for automatically creating functional models from requirements may also be needed. Design defects may be
predicted before the software is implemented. Automatically generating test cases from requirements is also possible.
   Artificial Intelligence (AI) emphasizes the development of software that can perform tasks as humans do, such
as visual perception, speech recognition, decision-making, and translation between languages. Machine Learning (ML), one
type of AI, allows software to learn from data and make decisions. AI/ML is mainly applied to problems of optimization,
classification, clustering, and prediction. AI/ML research problems are of two types: those concerning software
applications and those concerning theory discovery.


    Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    In: S. Chawla, B. Wadhwa, P. Muenchaisri (eds.): Proceedings of the 2nd Software Engineering Education Workshop (SEED 2019),
    Putrajaya, Malaysia, 02-Dec-2019, published at http://ceur-ws.org


   Several research problems in software engineering, particularly in requirements engineering, defect prediction, and
coding, are solved using AI/ML. Tahira Iqbal et al. present a literature review of AI/ML for requirements engineering
research problems [Iqb18]. Robert Feldt et al. present a review of AI in SE [Fel18]. Previous research at CPCU [San19],
[Poo18], [Mek12], [Man11], [Sre16], [Kae19], [Phe19] has applied AI/ML methods in RE, coding, software quality
improvement, and maintenance. In this paper, the author investigates which SE activities most often use AI/ML methods to
solve research problems and summarizes the activities that use AI/ML the most. The research questions of this paper are:
   RQ1: What is the current state of the art in software engineering activities whose problems are solved with AI/ML
methods in APSEC 2018?
   RQ2: What is the current state of the art in software engineering activities whose problems are solved with AI/ML
methods at CPCU?
   The scope of this research is to extract information from the APSEC 2018 proceedings and from the Software Engineering
group at Chulalongkorn University, Thailand. The results of the study may inform updates to the AI/ML content of
some courses in the Software Engineering curriculum.
   Section 2 briefly describes related research. The methodology is explained in Section 3. Results and conclusions are
described in Sections 4 and 5, respectively.

2           Related Research
Robert Feldt et al. present “the AI in SE Application Levels (AI-SEAL) taxonomy” [Fel18]. Applications are categorized
according to their point of AI application (process or product), the type of AI technology used, and the automation level
(1 to 10) allowed. Types of AI include Symbolists (e.g., inverse deduction), Connectionists (e.g., backpropagation),
Evolutionaries (e.g., genetic programming), Bayesians (e.g., probabilistic inference), and Analogizers (e.g., kernel
machines). Seventeen of the 44 papers from previous RAISE workshops apply AI to software engineering. These papers are
classified along the three aspects. The results show that there are 12 process, 3 product, and 2 runtime-related papers.
Eight of them are Analogizer, five Symbolist, and one each for Evolutionary and Connectionist. Most of them have a low
level of automation (level 2-3).
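
The three classification aspects described above can be pictured as a small record type. The following is only a
sketch under assumptions: the class and field names (PointOfApplication, AIType, AISealEntry) are hypothetical and
are not taken from [Fel18], and the example entry is invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class PointOfApplication(Enum):
    PROCESS = "process"
    PRODUCT = "product"
    RUNTIME = "runtime"  # the reported results also count runtime-related papers


class AIType(Enum):
    SYMBOLIST = "symbolist"          # e.g., inverse deduction
    CONNECTIONIST = "connectionist"  # e.g., backpropagation
    EVOLUTIONARY = "evolutionary"    # e.g., genetic programming
    BAYESIAN = "bayesian"            # e.g., probabilistic inference
    ANALOGIZER = "analogizer"        # e.g., kernel machines


@dataclass(frozen=True)
class AISealEntry:
    """One classified application (hypothetical record layout)."""
    title: str
    point: PointOfApplication
    ai_type: AIType
    automation_level: int  # the taxonomy uses levels 1 to 10

    def __post_init__(self):
        # Reject levels outside the 1..10 range named in the text.
        if not 1 <= self.automation_level <= 10:
            raise ValueError("automation level must be between 1 and 10")


# Invented example entry, not a paper classified in [Fel18].
example = AISealEntry(
    title="Illustrative defect predictor",
    point=PointOfApplication.PROCESS,
    ai_type=AIType.ANALOGIZER,
    automation_level=2,
)
```

Keeping the automation level as a validated integer rather than a free-form label makes it easy to filter entries
by level, for example to select the low-automation (level 2-3) papers.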
   Tahira Iqbal et al. conduct a literature review to obtain an overview of how ML is used in requirements engineering
(RE), which includes requirements elicitation, requirements analysis, requirements documentation, and requirements
verification [Iqb18]. The paper is summarized as follows. 1. In the requirements elicitation and discovery phase, several
studies use mining, ML, and recommendation systems to classify requirements as improvement requests or not, or as bugs,
features, or junk, and to discover evolutionary requirements and related requirements. 2. In the requirements
specification and analysis phase, ML can be used to identify whether requirements are non-functional requirements (NFR)
or not, or functional requirements (FR) or not, to distinguish FRs from NFRs, to prioritize requirements, and to
identify whether an NFR is a security requirement. 3. In the requirements validation phase, some studies validate
consistency and traceability. 4. In the requirements management activity, some studies focus on visualizing a large
group of requirements to support better decisions, and on using ML to classify and cluster the content of a requirements
specification into requirements or information and to group similar and related requirements and place them
contiguously.
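
To make the FR versus NFR classification task concrete, the sketch below trains a tiny multinomial Naive Bayes
classifier with add-one smoothing in plain Python. It is an illustration only and is not the method of any paper
surveyed in [Iqb18]; the training sentences and the function names (tokenize, train, predict) are invented for this
example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


def train(samples):
    """samples: list of (text, label) pairs. Returns the counts needed for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the given text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, invented for illustration only.
TRAINING = [
    ("the system shall export reports as pdf", "FR"),
    ("the user shall reset the password by email", "FR"),
    ("the system shall respond within two seconds", "NFR"),
    ("the system shall encrypt stored passwords", "NFR"),
]
model = train(TRAINING)
print(predict(model, "pages shall load within three seconds"))
print(predict(model, "the user shall export the report by email"))
```

With only four training sentences the classifier is far too small to be useful; the point is the shape of the task,
in which each requirement sentence is mapped to one of a fixed set of categories.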

3           Research Method

3.1         Two Aspects of Interests
Two aspects are considered: the software engineering aspect and the AI/ML aspect. Since Tahira Iqbal et al. review papers
that apply ML to RE [Iqb18] and Robert Feldt et al. present papers that use AI in either the process or the product aspect
[Fel18], this paper further identifies the specific software engineering process activities in which AI/ML methods are
used to solve problems. Processes in the software engineering aspect include Requirements Gathering, Analysis, Design,
Coding, Testing, Maintenance, and Software Quality Improvement (Product). AI/ML methods, as the second aspect, include
1. natural language processing, 2. supervised learning (support vector machine (SVM)), 3. unsupervised learning (genetic
algorithms, clustering/classification, similarity, K-Nearest Neighbour (KNN)), and 4. reinforcement learning (neural
network, Bayesian network, Naïve Bayes).

3.2         Scope of the study
To answer RQ1, the 82 regular papers of the Asia-Pacific Software Engineering Conference 2018 (APSEC 2018) are reviewed.
Seventeen papers containing keywords on AI/ML methods are found and studied. To answer RQ2, seven interviews and
some paper reviews are performed at the Department of Computer Engineering, Faculty of Engineering, Chulalongkorn
University, Bangkok, Thailand. Seven papers [San19], [Poo18], [Mek12], [Man11], [Sre16], [Kae19], [Phe19] use AI/ML for
solving software engineering problems.
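
The review in this paper was done manually. As a rough sketch of how the keyword-based first pass could be
automated, the following Python snippet flags abstracts that mention any of the keywords named in the abstract of
this paper. The function names and the sample abstracts are hypothetical, and the keyword list is illustrative rather
than the complete list used by the author.

```python
import re

# Keywords named in the abstract of this paper; illustrative, not exhaustive.
KEYWORDS = ("machine learning", "neural network", "natural language processing")


def matched_keywords(text, keywords=KEYWORDS):
    """Return the keywords that occur in text (case-insensitive, optional plural 's')."""
    hits = []
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"s?\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(kw)
    return hits


def screen(papers, keywords=KEYWORDS):
    """papers: dict mapping title to abstract. Returns only the titles with at least one hit."""
    result = {}
    for title, abstract in papers.items():
        hits = matched_keywords(abstract, keywords)
        if hits:
            result[title] = hits
    return result


# Hypothetical abstracts, invented for illustration only.
papers = {
    "Paper A": "We train a recurrent neural network to recommend code snippets.",
    "Paper B": "We present a model checker for concurrent programs.",
    "Paper C": "Natural Language Processing and machine learning classify app reviews.",
}
selected = screen(papers)
print(selected)
```

A screen of this kind only narrows the candidate set; each flagged paper would still need to be read to decide which
SE activity and which AI/ML method it belongs to.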

4           Results

4.1         Software Engineering Activities and AI/ML at APSEC 2018
Table I classifies the papers according to software engineering activity and the AI/ML methods used. Coding and software
quality improvement have more papers using AI/ML methods to solve research problems than other software engineering
activities, which answers research question 1 (RQ1). The details of each paper are described below.

4.1.1       Requirements Engineering (RE) Papers
Requirements and analysis models are classified as functional or dysfunctional with the Open Innovation in
Requirements Engineering (OIRE) method [Yin18], using different ML methods. MatGap [Mit18] is a tool that provides gap
analysis of two sets of business rules: a golden reference set and a target set (in which one verb concept is removed).
Similarity scores are used to find correct matches.
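
As a minimal sketch of matching by similarity score, the snippet below pairs each reference rule with its most
similar target rule and reports a gap when no target rule scores above a threshold. Token-count cosine similarity and
the 0.5 threshold are assumptions chosen for illustration; they are not the scoring used by MatGap, and the rule
texts are invented.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two texts using raw token counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def match_rules(reference, target, threshold=0.5):
    """Map each reference rule to its best-scoring target rule, or to None when there is a gap."""
    matches = {}
    for ref in reference:
        best = max(target, key=lambda t: cosine_similarity(ref, t), default=None)
        if best is not None and cosine_similarity(ref, best) >= threshold:
            matches[ref] = best
        else:
            matches[ref] = None  # no sufficiently similar rule in the target set
    return matches


# Hypothetical business rules, invented for illustration only.
reference = ["a customer places an order", "a manager approves a refund"]
target = ["each customer places one order"]
matches = match_rules(reference, target)
print(matches)
```

Here the first reference rule shares three of five tokens with the target rule and is matched, while the second
shares none and is reported as a gap.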

4.1.2       Coding Papers
API usage patterns are automatically generated from natural language queries [Tin18]. A rule-based regularization
method is used to obtain concise usage patterns. The encoder of the proposed method uses a recurrent neural network with
long short-term memory (LSTM) units. Comparisons with other methods are presented. Doc2Vec is an NLP tool that uses
neural networks [Ama18]. The comments of the original Java code and of the comment-erased version are assessed with
Doc2Vec. A similarity score is computed for each version, and the comment-erased version is checked for a high value.
Shinyama et al. analyze code comments to boost program comprehension using a decision-tree-based classifier [Shi18].
Three different classifiers are built, one for each element: Extent, Target, and Category. SOQDE is a
supervised-learning-based (random forest) question difficulty estimation model [Has18]. STAR is a specialized tagging
approach for Docker repositories [Yin18]. A logistic-regression-based classifier is used to determine whether a tag
should be assigned to a repository. Four tagging methods are compared.
   An automatic approach using KNN and random forest [Kim18] is proposed to validate log levels in a class or a
method: Trace, Debug, Info, Warn, Error, or Fatal. A tool for tuning the level of parallelism of Spark applications,
with optimizations involving KMeansClustering and StreamingWordCount, is proposed [Ros18]. An approach using neural
network, naive Bayes, logistic regression, SVM, and DTPre (based on a decision tree) [Moh18] is proposed to predict which
pull requests will be reopened on GitHub. The SLAMPA tool recommends code snippets with a statistical language model
[Zho18], using a deep neural network called a Recurrent Neural Network (RNN).

4.1.3       Software Quality Papers
A hybrid analysis method is designed to detect malicious JavaScript code [He18]. Several classifiers are used in
constructing classification models, such as Random Forests (RF), Logistic Regression (LR), Naive Bayes (NB) and
Support Vector Machine (SVM). Detecting Duplicate Bug Reports with Convolutional Neural Networks (CNN) is
presented in [Xie18].


   An ML-based approach is proposed to categorize and predict invalid vulnerabilities in Common Vulnerabilities and
Exposures (CVE) [Che18]. The machine learning model adopts several classic classification algorithms, including naïve
Bayes, multinomial naïve Bayes, SVM, and random forest, to learn from the whole dataset of invalid CVEs. A comparison
of nano-patterns and software metrics in vulnerability prediction is presented in [Sul18]. A vulnerability prediction
model is built using nano-patterns extracted from vulnerable and neutral code of different software systems (“neutral”
refers to methods where no known vulnerability exists). Three machine learning techniques are used to classify
vulnerable code: Naive Bayes (NB), Support Vector Machine (SVM), and Logistic Regression (LR).
   A top-k learning-to-rank (LTR) approach using random forest is designed to predict cross-project software defects
[Wan18]. A bug localization model is constructed from two main parts: a character-level convolutional neural
network (CNN) and a recurrent neural network (RNN) language model [Xia18].

 Table I. Research papers by software engineering activity, with the AI/ML methods used for problem solving.

   Paper#   SE Activity*    AI/ML Type**   Methods***                              Reference
   1        RE              ML             NB, Max, DT, RF, SVM                    [Yin18]
   2        RE              Natural        Similarity                              [Mit18]
   3        Coding          ML             Clustering, neural network              [Tin18]
   4        Coding          Natural, ML    NN, similarity                          [Ama18]
   5        Coding          ML, Natural    DT, CoreNLP                             [Shi18]
   6        Coding          Natural, ML    CoreNLP, RF, KNN, BN                    [Has18]
   7        Coding          ML             LR, Similarity                          [Yin18]
   8        Coding          ML, Natural    KNN, RF                                 [Kim18]
   9        Coding          ML             KMeansClustering                        [Ros18]
   10       Coding          ML             NN, NB, LR, SVM, DTPre (based on DT)    [Moh18]
   11       Coding          ML             Deep NN (Recurrent NN)                  [Zho18]
   12       SWQ-defect      ML             RF, LR, NB, SVM                         [He18]
   13       SWQ-defect      ML, Natural    NN                                      [Xie18]
   14       SWQ-security    ML             NB, SVM, RF                             [Che18]
   15       SWQ-security    ML             NB, SVM, LR                             [Sul18]
   16       SWQ-defect      ML             RF                                      [Wan18]
   17       SWQ-defect      ML             Convolutional NN, RNN                   [Xia18]

   *SE activities: RE: Requirements Engineering, Design, Coding, Testing, SWQ: Software Quality Improvement
(SWQ-defect, SWQ-security).
   **Natural: Natural language processing, ML: Machine Learning.
   ***NB: Naïve Bayes, Max: MaxEnt, DT: Decision Trees, RF: Random Forest, SVM: Support Vector Machine, NN:
Neural Networks, BN: Bayes Network, LR: Logistic Regression, ARL: Association Rule Learning, EBL: Explanation-based
learning.
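
The answer to RQ1 can be checked by counting the rows of Table I per activity. The short Python tally below uses the
activity labels transcribed from Table I; grouping SWQ-defect and SWQ-security under a single SWQ heading is a
choice made for this sketch, and the helper name top_level is hypothetical.

```python
from collections import Counter

# (paper number, activity) pairs transcribed from Table I.
TABLE_I = [
    (1, "RE"), (2, "RE"),
    (3, "Coding"), (4, "Coding"), (5, "Coding"), (6, "Coding"), (7, "Coding"),
    (8, "Coding"), (9, "Coding"), (10, "Coding"), (11, "Coding"),
    (12, "SWQ-defect"), (13, "SWQ-defect"), (14, "SWQ-security"),
    (15, "SWQ-security"), (16, "SWQ-defect"), (17, "SWQ-defect"),
]


def top_level(activity):
    """Collapse sub-activities such as 'SWQ-defect' into their parent activity 'SWQ'."""
    return activity.split("-")[0]


counts = Counter(top_level(activity) for _, activity in TABLE_I)
print(counts.most_common())  # [('Coding', 9), ('SWQ', 6), ('RE', 2)]
```

Nine of the 17 papers fall under coding and six under software quality improvement, which is the basis for the
statement that these two activities use AI/ML the most.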

4.2      Software Engineering Activities and AI/ML at CPCU
This section explores research conducted at the Department of Computer Engineering, Chulalongkorn University,
Thailand. There are 33 faculty members. Twelve faculty members mainly conduct research in AI/ML and five in SE. Only
two faculty members solve software engineering research problems using AI/ML.


   Table II shows the research papers that use AI/ML methods to solve software engineering research problems.
A Naive Bayes method is used to classify short requirements text [San19]. Association rule learning (ARL) is used to
find impact factors for the rejection of pull requests on GitHub [Poo18]. Explanation-based learning in a
meta-programming approach is used to detect object-oriented design defects [Mek12]. Machine learning methods (Naive
Bayes, Logistic, IB1, IBk, VFI, J48, and Random Forest) are used to predict bad-smell designs from software design
models [Man11]. Defect-related keywords are extracted from user feedback using natural language processing (NLP)
[Sre16]. The software maintenance plan is then prioritized using the Analytical Hierarchy Process (AHP). Software
problem report types are classified using machine learning [Kae19]. Mobile application user reviews are classified to
generate tickets on an issue tracking system [Phe19].
   Software quality improvement (SWQ-defect) has more papers using AI/ML methods to solve research
problems than other software engineering activities, which answers research question 2 (RQ2). However, the result
from only 7 papers is not sufficient to support any general conclusion.

 Table II. Research papers by software engineering activity, with the AI/ML methods used for problem solving,
           at the Department of Computer Engineering, Chulalongkorn University.

   Paper#   SE Activity     AI/ML Type     Methods         Reference
   1        RE              ML             NB              [San19]
   2        Coding          ML             ARL             [Poo18]
   3        SWQ-defect      ML             EBL             [Mek12]
   4        SWQ-defect      ML             NB, LR, RF      [Man11]
   5        SWQ-defect      NLP            AHP             [Sre16]
   6        Maintenance     NLP/ML         NB              [Kae19]
   7        Maintenance     NLP/ML         NB, DT          [Phe19]

5           Conclusions and Future Work
This paper presents a preliminary investigation of the use of AI/ML in software engineering activities, based on papers
published in APSEC 2018 and research at the Department of Computer Engineering, Faculty of Engineering, Chulalongkorn
University, Thailand. In APSEC 2018, more papers use AI/ML methods in coding and in software quality improvement than
in other software engineering activities. At CPCU, most AI/ML research focuses on theoretical aspects and on
applications in several domains. Only two faculty members use AI/ML in software engineering research
problems. Future work includes 1. reviewing more papers with an automatic tool, 2. extending the scope to cover research
conducted in industry, and 3. exploring the possibility of including AI/ML in SE courses/curriculum.

References
[Iqb18] T. Iqbal et al. A Bird’s Eye View on Requirements Engineering and Machine learning, the 25th Asia-Pacific
        Software Engineering Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[Fel18] R. Feldt et al. Ways of Applying Artificial Intelligence in Software Engineering, the Workshop on Realizing
        Artificial Intelligence Synergies in Software Engineering (RAISE 2018), May 2018, Gothenburg, Sweden.
[San19] K. Sangounpao and P. Muenchaisri. Ontology-Based Naive Bayes Method Short Text Classification for a Small
        Dataset, the 20th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence,
        Networking and Parallel/Distributed Computing (SNPD 2019), Japan.
[Poo18] P. Pooput and P. Muenchaisri. Finding Impact Factors for Rejection of Pull Requests on GitHub, the VII
        International Conference on Network, Communication and Computing (ICNCC 2018), Taipei, Taiwan,
        Dec. 14-16, 2018.
[Mek12] S. Mekruksavanichi, P. P. Yupapin and P. Muenchaisri. Analytical Learning Based on a Meta-programming
        Approach for the Detection of Object-Oriented Design Defects, Information Technology Journal, 11:
        1677-1686, 2012.


[Man11] N. Maneera and P. Muenchaisri. Bad-smell Prediction from Software Design Model Using Machine Learning
        Techniques, the 8th International Joint Conference on Computer Science and Software Engineering
        (JCSSE2011), Nakhon Pathom, Thailand, May 11-13, 2011.
[Sre16] K. Srewuttanapitikul and P. Muenchaisri. Prioritizing Software Maintenance Plan by Analyzing User Feedback,
        The International Conference on Information Science and Security 2016 (ICISS 2016), December 19th-22nd,
        2016, Pattaya, Thailand.
[Kae19] P. Kaewnoo and T. Senivongse. Identification of Software Problem Report Types Using Multiclass
        Classification, the 3rd International Conference on Software and e-Business (ICSEB 2019),
        December 9-11, 2019, Tokyo, Japan.
[Phe19] K. Phetrungnapha and T. Senivongse. Classification of Mobile Application User Reviews for Generating
        Tickets on Issue Tracking System, the 12th International Conference on Information & Communication
        Technology and System (ICTS 2019), July 18, 2019, Surabaya, Indonesia.
[Yin18] H. Yin et al. The OIRE Method - Overview and Initial Validation, the 25th Asia-Pacific Software Engineering
        Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[Mit18] S. Mitra et al. MatGap: A Systematic Approach to Perform Match and Gap Analysis among SBVR-Based
        Domain Specific Business Rules, the 25th Asia-Pacific Software Engineering Conference 2018 (APSEC 2018),
        December 4-7, Nara, Japan.
[Tin18] Y. Tian et al. Automatically Generating API Usage Patterns from Natural Language Queries, the 25th Asia-
        Pacific Software Engineering Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[Ama18] H. Aman et al. A Doc2Vec-Based Assessment of Comments and Its Application to Change-Prone Method
        Analysis, the 25th Asia-Pacific Software Engineering Conference 2018 (APSEC 2018), December 4-7, Nara,
        Japan.
[Shi18] Y. Shinyama et al. Analyzing Code Comments to Boost Program Comprehension, the 25th Asia-Pacific
        Software Engineering Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[Has18] Sk. A. Hassan et al. SOQDE: A Supervised Learning based Question Difficulty Estimation Model for Stack
        Overflow, the 25th Asia-Pacific Software Engineering Conference 2018 (APSEC 2018), December 4-7, Nara,
        Japan.
[Yin18] K. Yin et al. STAR: A Specialized Tagging Approach for Docker Repositories, the 25th Asia-Pacific Software
        Engineering Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[Kim18] T. Kim et al. An Automatic Approach to Validating Log Levels in Java, the 25th Asia-Pacific Software
        Engineering Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[Ros18] E. Rosales et al. lpt: a Tool for Tuning the Level of Parallelism of Spark Applications, the 25th Asia-Pacific
        Software Engineering Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[Moh18] A. Mohamed et al. Predicting which pull requests will get reopened in GitHub, the 25th Asia-Pacific Software
        Engineering Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[Zho18] S. Zhou et al. SLAMPA: Recommending Code Snippets with Statistical Language Model, the 25th Asia-Pacific
        Software Engineering Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[He18] X. He et al. Malicious JavaScript Code Detection Based on Hybrid Analysis, the 25th Asia-Pacific Software
        Engineering Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[Xie18] Q. Xie et al. Detecting Duplicate Bug Reports with Convolutional Neural Networks, the 25th Asia-Pacific
        Software Engineering Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[Che18] Q. Chen et al. Categorizing and Predicting Invalid Vulnerabilities on Common Vulnerabilities and Exposures,
        the 25th Asia-Pacific Software Engineering Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[Sul18] K. Z. Sultana et al. A Comparison of Nano-patterns Vs. Software Metrics in Vulnerability Prediction, the 25th
        Asia-Pacific Software Engineering Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[Wan18] F. Wang et al. A Top-k Learning to Rank Approach to Cross-Project Software Defect Prediction, the 25th
        Asia-Pacific Software Engineering Conference 2018 (APSEC 2018), December 4-7, Nara, Japan.
[Xia18] Y. Xiao et al. Improving Bug Localization with Character-level Convolutional Neural Network and Recurrent
        Neural Network, the 25th Asia-Pacific Software Engineering Conference 2018 (APSEC 2018), December 4-7,
        Nara, Japan.



</pre>