Detecting SQL Injection Attacks using Machine Learning

Béatrice Moissinac∗, Elie Saad, Miranda Clay and Maialen Berrondo
Okta

Abstract
Injection attacks such as SQL injection attacks (SQLia) are commonly used against systems. The consequences of those attacks range from financial, data, and reputational loss, or worse. SQLia can be detected by analyzing the HyperText Transfer Protocol (HTTP) request data through which the SQLia is transmitted to the target resource. Various statistical and analytical tools exist today to detect SQLia; however, they are prone to false positives, which limits their usage in production environments. In this paper, we propose (1) a feature engineering method that generates SQL and HTTP language mixtures, (2) a use of these mixtures to significantly reduce the time and effort needed by Subject Matter Experts (SMEs) to label data, and (3) an evaluation of supervised Machine Learning models using this feature engineering method. Furthermore, a major contribution of this paper is that our proposed solution is developed and evaluated using real-world HTTP request data sampled from authentication transactions served by a major Identity & Access Management (IAM) company. Thus, we believe that our results are a strong representation of the real-world effect of this detection method. Finally, we also show that this technique can be trivially extended to other types of injection attacks.

Keywords
SQL injection, Machine Learning, Language mixture

1. Introduction
Injection attacks such as SQL injection attacks (SQLia) are commonly used against platforms to extract, delete, or otherwise corrupt valuable resources [1]. The consequences of those attacks range from financial, data, and reputational loss, or worse [2]. Technically, SQLia are carried out by "injecting" (or inserting) SQL queries into the HyperText Transfer Protocol (HTTP) request data sent between the client and the server¹.
Once the attacker has sent the request containing the nefarious SQL query, she expects the server to read the SQL query; a vulnerable system would then execute the query and either return, delete, or alter sensitive data. Furthermore, the risk of SQLia has recently increased with the introduction of Large Language Models (e.g., ChatGPT) to the general public, lowering the barrier of entry for potential new threat actors [3].

CAMLIS'23: Conference on Applied Machine Learning for Information Security, October 19–20, 2023, Arlington, VA
∗ Corresponding author.
beatrice.moissinac@okta.com (B. Moissinac); elie.saad@okta.com (E. Saad); miranda.clay@okta.com (M. Clay); maialen.berrondo@okta.com (M. Berrondo)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

¹ This paper is situated at the intersection of applied Machine Learning research and Threat Intelligence research. Thus, throughout this paper, we aim to bring both areas of expertise together by including definitions and explanations of concepts to improve the general understanding of the reader, no matter their background.

A diverse landscape of analytical and statistical tools exists to detect SQLia. In Section 2.1, we review Threat Intelligence techniques and libraries. These techniques focus on creating rule sets, which are prone to false positives, restricting their usage in real-world settings. In Section 2.2, we review Machine Learning (ML) techniques to detect SQLia. These data-based approaches are usually developed using "synthetic data", that is, data generated by a Subject Matter Expert (SME) rather than collected from the real world. On one hand, generating rule sets is time-consuming, prone to false positives, and potentially not exhaustive enough.
On the other hand, Machine Learning techniques have been limited to synthetic data and weak statistical modeling. In this paper, we propose to address those issues with the following contributions:
1. We propose a novel method of feature engineering to generate SQL and HTTP language mixtures, inspired by topic modeling [4];
2. We use these mixtures to significantly reduce the time and effort needed by Subject Matter Experts (SMEs) to label data;
3. We evaluate supervised Machine Learning models using this feature engineering method.
Furthermore, a major contribution of this paper is that our proposed solution is developed and evaluated using real-world HTTP request data sampled from authentication transactions served by a major IAM company. Thus, we believe that our results are representative of how the method would perform in the real world. Finally, the novel feature engineering approach presented in this paper can be trivially extended to the parent attack class of injection [5]. Thus, this model is more useful than currently existing techniques and covers a wider range of attack classes than what is available today.

2. Related Work on Detection of SQLia
2.1. Detection using Threat Intelligence Techniques
Released in 2012, the Libinjection project [6] proposed a novel way to detect SQLia. Most SQLia detectors were based on rule sets and regular expressions, whereas Libinjection identified attack vectors by digesting previously observed patterns and generating an algorithm from them. Libinjection was published as a library to be integrated into application-layer defenses. It is commonly used by open-source Web Application Firewalls (WAF), Intrusion Detection Systems (IDS), and open-source security software, such as ModSecurity, an Apache module [7, 8], which in turn is used by other tools [9]. Libinjection has been extended to support a number of languages (i.e., C, Python, PHP, JavaScript, Go, Ruby, and Java).

2.2.
Detection using Machine Learning
Many industry vendors have also focused on developing and improving signature-based models using ML, for instance, Fortinet [10] or CloudFlare [11]. Some other companies (such as F5 [12] and Imperva [13]) implemented ML models to enable a wider set of signatures, and to suppress or trigger alerts based on the confidence of the ML model. However, their algorithms are not publicly available for comparison. On the other hand, academic research has published multiple ML-based approaches to SQLia detection [14], whose details are described below.

Feature Sets
ML models require a feature set, that is, a set of signals (e.g., presence/absence flags, counts, etc.) to be correlated with the desired output (i.e., whether the request is an SQLia). Thus, ML-based SQLia detection heavily relies on SQL language markers for detection. For instance, in [15], the authors used the presence of any comment character, the number of semicolons, the presence of a tautology (i.e., a statement that is always true, such as 1 = 1), the number of commands per statement, and the presence of abnormal commands or special keywords. Similarly, in [16], the features included single-line and multi-line comments, SQL operators, punctuation, logical operators, keywords, etc. Virtually all prior work relied on some variation of SQL language markers, and on SQL language markers only.

Algorithms
Many ML-based SQLia detection models have been developed in recent years, using Naive Bayes [16, 17, 18, 19], SVM [18, 20, 21], or an ensemble method [15, 18, 19, 21]. However, we do note that Naive Bayes approaches may not be statistically robust: Naive Bayes assumes the independence of features, yet programming language markers are not independent of each other. For example, `SELECT` is highly correlated with `FROM` in SQL.
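As an illustration of the marker-based feature sets described above, the sketch below builds presence/absence and word-count features over a small, hypothetical set of SQL markers. The token list and function names are our own choices for illustration, not the exact feature sets used in [15] or [16].

```python
import re

# Hypothetical SQL markers for illustration; prior work used larger,
# hand-curated sets of comment characters, keywords, and operators.
SQL_TOKENS = ["select", "from", "union", "and", "or", "--", ";"]

def marker_features(request: str) -> dict:
    """Presence/absence flags and word counts over SQL language markers."""
    text = request.lower()
    feats = {}
    for tok in SQL_TOKENS:
        n = text.count(tok)
        feats[f"count_{tok}"] = n         # word-count featurization
        feats[f"has_{tok}"] = int(n > 0)  # presence/absence flag
    # Tautology marker (e.g., "1=1"), as used in [15]
    feats["has_tautology"] = int(bool(re.search(r"(\d+)\s*=\s*\1\b", request)))
    return feats

feats = marker_features("/index.php?id=1%20OR%201=1--")
# feats["has_tautology"] == 1, feats["has_--"] == 1
```

Note that both featurizations fire only on literal SQL markers; this is the limitation the language-mixture approach of Section 4 is designed to address.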
Data
Within Threat Intelligence research on SQLia detection, models are developed from data collected from "red teams", teams of security SMEs who generate injection attacks for the purpose of testing a platform's vulnerabilities. This type of data collection is omnipresent in ML research on SQLia as well [15, 16, 18, 19]. From an ML point of view, this type of data is called "synthetic" and presents a major risk of not being representative of real-world data, as well as being too small. In [15], the authors trained their models on 105 SQL statements, and [16] used 178 examples. In [18], the authors collected 4,000 rows of plain-text sentences from HTML forms collected "from user input" via a "web application". Overall, data collection and labeling is the most expensive problem to solve in ML-based SQLia detection.

3. Data
HTTP URL Request
The data is a set of 1 million HTTP URL requests from identity-centric traffic. It was sampled from a large Customer Identity and Access Management (CIAM) platform between 2021 and 2023, at the network edge². The volume of traffic from which this is sampled is substantial enough to be representative of US customer Internet traffic, and the platform may be considered a giant Honey Pot³. In this paper, the proposed solution focuses only on the URL request data. We do not consider the IP or any other signals within the transaction, because we want to specifically evaluate the statistical robustness of language mixtures as signals for SQLia detection. Furthermore, methods such as this one are not meant to be a silver bullet, but could be integrated into a layered security architecture.

² While the dataset may have been filtered before reaching us, our methodology and results still represent a real-world use case.
³ In Threat Intelligence research, a Honey Pot is a system mimicking real-world vulnerabilities to attract attackers and collect useful data about them and their attack patterns.

4.
Building a Language Mixture

Intuition behind Language Mixtures
The idea described below is similar in spirit to the Latent Dirichlet Allocation (LDA) [4] approach for topic modeling. The traditional LDA model estimates a mixture of topics per document (i.e., what the document is about), and a mixture of words per topic. LDA aims at answering the question: "What topics are present in this document, and how much are those topics discussed in this document?". Similarly, we want to calculate "how much" HTTP and "how much" SQL is in a URL request. In this sense, HTTP and SQL are the "topics" of the URL request (document). However, LDA is based on word counts⁴ and relies on repetition of the same words within a document to estimate the mixture of topics. Thus, LDA did not work well for this problem, because obfuscated SQLia have only very few (and unique) SQL markers in the URL request. Instead, we propose a "language mixture", which is not affected by word count. For each URL request, we score "how much" SQL-like and "how much" HTTP-like the request is. We want to automatically estimate a dictionary of language markers for each language (SQL and HTTP). Each marker is associated with a weight based on how important (or common) the marker is to this language. For instance, 'SELECT' is very representative of SQL. Using real-world data is crucial to guarantee that the markers and their weights are representative of real-world usage.

Building a Language Mixture
To build a language mixture for SQL, we used 1 million SQL queries from open-source SQL repositories on GitHub. For the HTTP language mixture, we used 1 million URL requests⁵ from the same provider described in Section 3. We did not start with a known dictionary of SQL or HTTP operators, but rather extracted everything present in the data and sorted it into three categories of tokens, in order to be representative of the real world.
For each data set separately, we extracted three types of markers:
• Keywords (any character chain of length 2 or more);
• Delimiters (parentheses, brackets, commas, etc.);
• Operators (+, −, ∗, etc.).
The weight of each token is the percentage of "documents" (URL requests or SQL queries) which contained that token at least once. For instance, the token `FROM` has a weight of 0.47 because it was present in 47% of the SQL queries. We kept tokens with a weight greater than 0.10.

⁴ "Word count" is a featurization in ML which counts the occurrences of a word in an instance.
⁵ While it is possible that those URL requests contain injections and other "impurities", we assume that the low volume of those attacks on this type of traffic sufficiently guarantees that the HTTP tokens extracted are correct and representative.

Table 1
Calculation of SQL and HTTP mixture scores

Token     SQL Mixture    HTTP Mixture
and       0.286          0.000
select    0.385          0.000
--        0.508          0.000
=         0.484          0.732
.         0.573          0.727
/         0.208          0.999
(         0.818          0.000
,         0.747          0.000
)         0.818          0.000
?         0.000          0.738
Total     4.827          3.196

Estimating the Language Mixture
Consider the following SQL injection attack found in the data set:

/yyoa/ext/trafaxserver/downloadAtt.jsp?attach_ids=(1)%20and%201=2%20union%20select%201,2,3,4,5,md5(203735726),7--

From this string, we extracted the tokens listed in Table 1. Each mixture is the sum of the weights of the tokens present in the URL request. A weight is summed only once, even if the token appears multiple times; that is, the mixture score is not the sum of the weights multiplied by the number of occurrences of the token in the string. Counting occurrences would make this method insensitive to short SQL queries obfuscated within a long HTTP string. We also do not normalize the score, because it is not usual for SQL queries or HTTP strings to contain all of their markers. Thus, the score is an absolute representation of the SQL-likeness or HTTP-likeness rather than a relative percentage of completeness.

5.
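The weight estimation and mixture scoring described above can be sketched in a few lines. The tokenizer below is a simplified stand-in for the three marker categories (keywords of length 2 or more, delimiters, operators), and the toy corpus is ours; the paper's corpora were 1M SQL queries and 1M URL requests.

```python
import re
from collections import Counter

# Simplified tokenizer standing in for the paper's three marker categories:
# keywords (length >= 2), delimiters, and operators.
TOKEN_RE = re.compile(r"[a-z_]{2,}|--|[(),\[\]{}?]|[=+\-*/.;]")

def token_weights(corpus, min_weight=0.10):
    """Weight of a token = fraction of documents containing it at least once."""
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(TOKEN_RE.findall(doc.lower())))
    n = len(corpus)
    return {tok: df / n for tok, df in doc_freq.items() if df / n > min_weight}

def mixture_score(request, weights):
    """Sum each present token's weight once, regardless of repetitions."""
    tokens = set(TOKEN_RE.findall(request.lower()))
    return sum(weights.get(tok, 0.0) for tok in tokens)

# Toy SQL corpus for illustration only.
sql_weights = token_weights([
    "select a from t",
    "select b from t where a = 1",
    "insert into t (a) values (1)",
])
score = mixture_score("?q=select%20x%20from%20y", sql_weights)
```

Because tokens are collected into a set before summing, repeating `select` a hundred times in a request contributes its weight exactly once, which is the property that keeps the score sensitive to short, obfuscated SQL fragments.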
How to use language mixtures to detect SQLia
Conveniently, the example presented in the previous section has an SQL mixture greater than its HTTP mixture. Unfortunately, comparing the language mixtures is generally not sufficient to decide whether a URL request contains an SQLia. For instance, we found that highly obfuscated SQLia will have a low SQL mixture and a high HTTP mixture. Nevertheless, we can use the language mixtures to (1) label the data set more efficiently, and (2) build an ML model using the mixture tokens and weights as features to learn to classify within the non-linear space of SQL/HTTP mixtures.

Language Mixture as Labeling Heuristic
We computed the SQL/HTTP mixtures for 1 million URL requests from identity-centric authentication transactions. The distribution of mixtures over the data set is not linear, as shown in Figure 1. Labeling such a large dataset is not realistic, but the language mixtures can be used as a heuristic to efficiently select batches of URL requests "of interest".

Figure 1: Number of URL requests per SQL/HTTP mixture in the 1M row sample

From an ML point of view, we want to label examples near the boundaries (or thresholds) between SQL/HTTP. Those are the most ambiguous examples from the ML model's perspective. To find those, we sampled batches of URL requests to be labeled, with the following language mixture characteristics:

(A) Highest SQL mixture: This yielded obvious SQLia, such as:
/upload/mobile/index.php?c=category&a=asynclist&price_max=1.0%20AND%20(SELECT%201%20FROM(SELECT%20COUNT(*),CONCAT(0x7e,md5(1),0x7e,FLOOR(RAND(0)*2))x%20FROM%20INFORMATION_SCHEMA.
CHARACTER_SETS%20GROUP%20BY%20x)a)''

(B) Lowest HTTP mixture: Requests that were not "HTTP-enough" (and also not "SQL-enough") revealed the ability of this technique to discover other types of command injections, such as this XSS injection⁶:
/?q=%27%3E%22%3Csvg%2Fonload=confirm%28%27testing-xss1%27%29%3E&s=%27%3E%22%3Csvg%2Fonload=confirm%28%27testing-xss2%27%29%3E&search=%27%3E%22%3Csvg%2Fonload=confirm%28%27testing-xss3%27%29%3E&id=%27%3E%22%3Csvg%2Fonload=confirm%28%27testing-[...]

⁶ This example was truncated due to space limitations.

(C) Random sample across the entire set: A sample of 1,000 instances across the entire set was labeled to explore other areas of the search space, and to increase the chances of labeling diverse types of SQLia (in terms of SQL/HTTP mixtures).

Then, using the SQLia found in this batch, we selected more URL requests to be labeled by sampling URL requests whose mixture scores were within ±x of those SQLia examples, with x varying from 0.05 to 1.

Inter-Rater Reliability
The data was labeled by threat intelligence and security engineering SMEs. We reached an inter-rater reliability rate of 94.9%, with 1,705 innocuous URL requests (labeled 'HTTP') and 114 SQLia (labeled 'SQL').

6. ML-based SQLia Detection
6.1. Experimental Setup

Features
In this paper, we evaluated a feature engineering method based on creating language mixture scores for each URL request. Thus, for each URL request, the feature vector has two parameters: the mixtures for SQL and HTTP, respectively.

Benchmark
To benchmark our proposed feature vector, we compare it with previously proposed feature vectors using word counts⁷ and presence/absence flags⁸ of SQL tokens (see Section 2.2). We used the 61 SQL tokens generated by the method presented in Section 4.

Algorithm
In order to fairly compare the efficacy of the feature vectors described above, we needed to use the same algorithm. Furthermore, the benchmark features have some statistical particularities that restrict which algorithm can be used.
The features are correlated with each other due to the nature of programming languages (e.g., 'SELECT' and 'FROM' in SQL are likely to appear together in a query). Thus, we used Decision Trees⁹, an algorithm family which is not sensitive to correlation between features, and which can optimize a solution within a non-linear search space¹⁰.

Training & Testing Sets
We used all 1,819 labeled examples for training, without hold-out, because the selection of training set examples was biased by our goal of finding more SQLia examples to train the model. Thus, we did not test and validate on the training set. Instead, we randomly selected 638 examples from the remaining 1 million URL requests, and applied the fitted model to predict an 'http' or 'sql' label. In parallel, our security SMEs also labeled the testing set for ground truth. This way, the model is evaluated fairly, without the biases that may have been introduced in Section 5.

⁷ A "word count" feature vector is a vector where each word is a feature, and the value of each feature is the number of times the word appeared in the instance (i.e., the URL request).
⁸ 'Presence/absence flags' is a feature vector of booleans, where each token is a feature, and its value is a flag set to 1 if the token is present in the instance (i.e., the URL request), and 0 otherwise.
⁹ We used the sklearn Python package, which implements the CART algorithm.
¹⁰ In ML, the search space is the set of all possible solutions to an optimization problem.

7. Results & Discussion
In this paper, we proposed a feature engineering method to calculate how "HTTP-like" and "SQL-like" URL requests are, in order to detect SQLia. These features are used as a heuristic to reduce the time and effort needed to create a data set to train an ML model for detecting SQLia. We implemented a supervised ML model using Decision Trees to compare this feature set with traditional feature sets (i.e., boolean flags and word counts).
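Training such a model on feature vector (A) is a few lines of sklearn; the mixture values and labels below are illustrative stand-ins, not taken from our data set.

```python
from sklearn.tree import DecisionTreeClassifier

# Feature vector (A): [SQL mixture, HTTP mixture] per URL request.
# These training pairs are illustrative only.
X_train = [
    [4.83, 3.20],  # obvious SQLia: high SQL mixture
    [2.10, 4.00],  # obfuscated SQLia: lower SQL, high HTTP mixture
    [0.45, 4.10],  # innocuous request
    [0.30, 3.90],  # innocuous request
]
y_train = ["sql", "sql", "http", "http"]

# CART (sklearn's implementation), grown to full depth so the learned
# rules remain fully explainable and exportable as a rule set.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

label = clf.predict([[4.5, 3.1]])[0]
```

At full depth, `sklearn.tree.export_text(clf)` prints the learned thresholds directly, which is what makes the rule set auditable and, as discussed in Section 7, cheap to fold into an existing rule-based system.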
The notation for those feature vectors is listed below, and used in the rest of this section:
(A) HTTP & SQL language mixture scores;
(B) Word count of SQL tokens;
(C) Presence/absence of SQL tokens.

Evaluation Metrics
From the ML point of view, much of the difficulty in evaluating ML methods for SQLia detection lies in the strong class imbalance in the data set. The testing set contains 17 SQLia for 614 innocuous HTTP URL requests¹¹; thus accuracy is not a good measure, because even if we mislabeled all the SQL injections, we would still have 97% accuracy. Instead, we focused on the False Positive Rate (FPR) and the False Negative Rate (FNR). In the rest of this paper, we consider a 'Positive' to be an SQL injection, and a 'Negative' to be an innocuous HTTP URL request; the rest of this section refers to the results presented in Tables 2, 3, and 4.

We observed a trade-off among models (A), (B), and (C): model (A) is less likely to falsely identify a legitimate HTTP URL request as an SQLia than models (B) and (C) (i.e., FPR 0.16%), while model (C) is the best at identifying SQLia (FNR 0.0%). Those results should be nuanced, however. Inspection of the HTTP URL request marked as SQLia by model (A) revealed that the request did contain an SQL command; the SQL command is expected by that customer's CIAM implementation. The SQLia sample is not large enough to extrapolate an updated FPR. Overall, model (A) had fewer misclassifications.

The Lesser of Two Evils
In a real-world production environment, the minimization of FPR vs. FNR will depend on the use-case. On one hand, letting through SQL command injections may be dangerous, although we may also assume that this model would be part of a layered approach to security. On the other hand, the friction caused to legitimate users by a large quantity¹² of false positives might become very undesirable. The preference for a particular feature vector will depend on the use-case.
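The two rates reduce to simple ratios over the confusion matrix. As a sanity check, the counts from Table 2 (with the XSS column excluded, as in our metrics) reproduce the reported percentages:

```python
def fpr_fnr(tp, fp, tn, fn):
    """FPR and FNR, with 'Positive' = SQLia and 'Negative' = innocuous HTTP."""
    fpr = fp / (fp + tn)  # legitimate requests wrongly flagged as SQLia
    fnr = fn / (fn + tp)  # SQLia missed by the model
    return fpr, fnr

# Feature vector (A), from Table 2: 613 true negatives, 1 false positive,
# 14 true positives, 3 false negatives.
fpr, fnr = fpr_fnr(tp=14, fp=1, tn=613, fn=3)
# fpr * 100 ≈ 0.16, fnr * 100 ≈ 17.65
```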
Real-World Cost
A great advantage of Decision Trees is the full explainability and coverage of the rule set generated (if the tree is allowed to grow to its full depth). Hence, the model could be trained off-line (computation takes less than 1 second), for nearly free, and the rules generated by the Decision Tree could be added to the current rules of any system. Thus, we argue that the cost of this method is comparable to the cost of current rule-based methods.

¹¹ And 7 XSS command injections; see our discussion below. Those XSS command injections are removed from the metrics calculation in order to strictly evaluate the model on SQLia detection.
¹² Extrapolating the FPR of 1.63% to our original 1M URL requests would cause 16,300 requests to fail or be delayed.

Table 2
Feature Vector (A) Confusion Matrix (Language Mixture Scores)

Prediction \ True label    HTTP    SQL    XSS
HTTP                        613      3      4
SQL                           1     14      3
FPR 0.16%    FNR 17.65%

Table 3
Feature Vector (B) Confusion Matrix (Word count of SQL tokens)

Prediction \ True label    HTTP    SQL    XSS
HTTP                        604      1      4
SQL                          10     16      3
FPR 1.63%    FNR 5.88%

Table 4
Feature Vector (C) Confusion Matrix (Presence/Absence of SQL tokens)

Prediction \ True label    HTTP    SQL    XSS
HTTP                        603      0      4
SQL                          11     17      3
FPR 1.79%    FNR 0.00%

Table 5
Overlapping SQL & HTTP tokens and their language weights

Token    SQL      HTTP
=        0.480    0.730
_        0.780    0.660
-        0.350    0.540
.        0.570    0.730
/        0.210    1.000

XSS and other types of injections
While labeling the training and testing sets, our SMEs found that a URL request whose mixtures are not 'HTTP-enough' and not 'SQL-enough' is likely to be some other sort of command injection (e.g., template, code, OS, XXE, etc.). While those other types of command injections were removed from the training set, we decided to include XSS examples in the testing set to highlight an avenue for future work: the expansion of language mixtures to other types of injections.
From a production perspective, it is desirable to develop one ML model capable of detecting and classifying various types of injections. However, the more injection types are added, the more confusion is introduced. For example, in Table 5, SQL and HTTP have overlapping tokens, that is, they "share a feature". When using a presence/absence or word-count type of featurization, overlapping features may create ambiguity that makes the problem more difficult for an ML model. Intuitively, a language mixture approach may help alleviate the overlap of markers between languages, by biasing them with their weights (i.e., their importance within each language).

Future work will focus on 'unknown unknowns' and previously unidentified vulnerabilities. This will include research combining language mixtures with behavioral analysis of the URL request's response. This will deepen the model's ability to identify potential zero-days¹³ before they are known, by correlating requests, responses, and their effects. For instance, this may help organizations identify attacks such as data exfiltration, and reduce the false positive rate on benign requests.

Acknowledgement
We thank Kim Berry for initial discussions on this approach. We thank Mathew Woodyard and George Vauter for their initial work on labels.

References
[1] CVE, 2014. URL: http://cve.mitre.org/.
[2] A. Hern, TalkTalk hit with record £400k fine over cyber-attack, The Guardian (2016). URL: https://www.theguardian.com/business/2016/oct/05/talktalk-hit-with-record-400k-fine-over-cyber-attack.
[3] OWASP Top 10 for Large Language Model Applications | OWASP Foundation, 2023. URL: https://owasp.org/www-project-top-10-for-large-language-model-applications.
[4] D. M. Blei, A. Y. Ng, M. I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research 3 (2003) 993–1022.
[5] Shadowd Zecure, 2023. URL: https://capec.mitre.org/data/definitions/248.html.
[6] LibInjection, 2012.
URL: https://github.com/client9/libinjection/blob/master/README.md.
[7] ModSecurity, 2002. URL: https://coreruleset.org/faq.
[8] LibModSecurity, 2002. URL: https://github.com/SpiderLabs/ModSecurity/blob/ec86b242e15f9df1d143c1b7f86a27889658b4cb/README.md.
[9] Naxsi, 2014. URL: https://github.com/nbs-system/naxsi/blob/master/README.md.
[10] Fortinet, 2023. URL: https://docs.fortinet.com/document/fortiweb/6.3.7/administration-guide/193258/machine-learning.
[11] CloudFlare, 2023. URL: https://blog.cloudflare.com/waf-ml/.
[12] F5, 2022. URL: https://community.f5.com/t5/technical-articles/f5-distributed-cloud-waf-ai-ml-model-to-suppress-false-positives/ta-p/299946.
[13] Imperva, 2004. URL: https://www.imperva.com/products/attack-analytics/.
[14] M. Al Rubaiei, T. Al Yarubi, M. Al Saadi, B. Kumar, SQLIA Detection and Prevention Techniques, in: 2020 9th International Conference System Modeling and Advancement in Research Trends (SMART), 2020, pp. 115–121. doi:10.1109/SMART50582.2020.9336795.
[15] M. Hasan, Z. Balbahaith, M. Tarique, Detection of SQL injection attacks: A machine learning approach, in: 2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA), IEEE, 2019, pp. 1–6.
[16] I. Jemal, O. Cheikhrouhou, H. Hamam, A. Mahfoudhi, SQL injection attack detection and prevention techniques using machine learning, International Journal of Applied Engineering Research 15 (2020) 569–580.

¹³ A zero-day is a vulnerability that is not yet known, and that can be exploited.

[17] A. Makiou, Y. Begriche, A. Serhrouchni, Improving Web Application Firewalls to detect advanced SQL injection attacks, in: 2014 10th International Conference on Information Assurance and Security, IEEE, 2014, pp. 35–40.
[18] S. Mishra, SQL injection detection using machine learning (2019).
[19] K. Ross, M. Moh, T.-S. Moh, J.
Yao, Multi-source data analysis and evaluation of machine learning techniques for SQL injection detection, in: Proceedings of the ACMSE 2018 Conference, 2018, pp. 1–8.
[20] S. O. Uwagbole, W. J. Buchanan, L. Fan, Applied machine learning predictive analytics to SQL injection attack detection and prevention, in: 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), IEEE, 2017, pp. 1087–1090.
[21] K. Ross, SQL Injection Detection Using Machine Learning Techniques and Multiple Data Sources, Master of Science thesis, San Jose State University, San Jose, CA, USA, 2018. URL: https://scholarworks.sjsu.edu/etd_projects/650. doi:10.31979/etd.zknb-4z36.