<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Attacks using Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Béatrice Moissinac</string-name>
          <email>beatrice.moissinac@okta.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elie Saad</string-name>
          <email>elie.saad@okta.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miranda Clay</string-name>
          <email>miranda.clay@okta.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maialen Berrondo</string-name>
          <email>maialen.berrondo@okta.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CAMLIS'23: Conference on Applied Machine Learning for Information Security</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Injection attacks such as SQL injection attacks (SQLia) are commonly used against systems. The consequences of these attacks range from financial, data, and reputational loss, or worse. SQLia can be detected by analyzing the HyperText Transfer Protocol (HTTP) request data through which the SQLia is transmitted to the target resource. Various statistical and analytical tools exist today to detect SQLia; however, they are prone to false positives, which limits their usage in production environments.</p>
      </abstract>
      <kwd-group>
        <kwd>SQL injection</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Language mixture</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Injection attacks such as SQL injection attacks (SQLia) are commonly used against platforms to
extract, delete, or otherwise corrupt valuable resources [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The consequences of those attacks
range from financial, data, and reputational loss, or worse [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Technically, SQLia are carried out by
"injecting" (or inserting) SQL queries into the HyperText Transfer Protocol (HTTP) request data
sent between the client and the server1. Once the attacker has sent the request containing the
nefarious SQL query, she expects the server to read the SQL query; a vulnerable
system would then execute the query and either return, delete, or alter sensitive data. Furthermore,
the risk of SQLia has recently increased with the introduction of Large Language Models (e.g.,
ChatGPT), to the general public, lowering the barrier of entry for potential new threat actors
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
1This paper is situated at the intersection of applied Machine Learning research and Threat Intelligence research.
Thus, throughout this paper, we aim at joining both areas of expertise by including definitions and explanations of
concepts to improve the general understanding of the reader, no matter their background.
      </p>
      <p>A diverse landscape of analytical and statistical tools exists to detect SQLia. In section 2.1,
we review Threat Intelligence techniques and libraries. These techniques focus on creating
rule sets, which are prone to false positives and restrict their usage in real-world settings. In
section 2.2, we review Machine Learning (ML) techniques to detect SQLia. These data-based
approaches are usually developed using "synthetic data", that is, data generated by a Subject
Matter Expert (SME) rather than collected from the real world.</p>
      <p>
        On one hand, generating rule sets is time consuming, prone to false positives, and potentially
not exhaustive enough. On the other hand, Machine Learning techniques have been limited to
synthetic data and weak statistical modeling. In this paper, we propose to address those issues
with the following contributions:
1. Propose a novel method of feature engineering to generate SQL and HTTP language
mixtures inspired by topic modeling [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ];
2. Use these language mixtures to significantly reduce the time and effort needed by Subject
Matter Experts (SMEs) to label data;
3. Evaluate supervised Machine Learning models using this feature engineering method.</p>
      <p>Furthermore, a major contribution of this paper is that our proposed solution is developed
and evaluated using real-world HTTP request data sampled from authentication transactions
served by a major IAM company. Thus, we believe that our results are representative of how
the method would perform in the real world.</p>
      <p>
        Finally, the novel feature engineering approach presented in this paper can be trivially
extended to the parent attack class of injection [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Thus, this model is more useful than
existing techniques and covers a wider range of attack classes than what is available today.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work on Detection of SQLia</title>
      <sec id="sec-3-1">
        <title>2.1. Detection using Threat Intelligence Techniques</title>
        <p>
          Released in 2012, the Libinjection project [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] proposed a novel way to detect SQLia. Most SQLia
detectors were based on rule sets and regular expressions, whereas Libinjection identified attack
vectors by digesting previously seen patterns and deriving a detection algorithm
from them. Libinjection was published as a library to be integrated into application-layer defenses.
It is commonly used by Open Source Web Application Firewalls (WAF), Intrusion Detection
Systems (IDS), and Open Source Security software, such as ModSecurity, an Apache module
[
          <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ], which in turn, is used by other tools [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Libinjection has been extended to support a
number of languages (i.e., C, Python, PHP, JavaScript, Go, Ruby, and Java).
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Detection using Machine Learning</title>
        <p>
          Many industry vendors have also focused on developing and improving signature-based models
using ML, for instance, Fortinet [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] or CloudFlare [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Some other companies (such as F5 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]
and Imperva [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]) implemented ML models to enable a wider set of signatures, and suppress
or trigger alerts based on the confidence of the ML model. However, their algorithms are not
publicly available for comparison. On the other hand, academic researchers have published multiple
ML-based approaches to SQLia detection [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], whose details are described below.
Feature Sets ML models require a feature set, that is, a set of signals (e.g., presence/absence,
counts, etc.) to be correlated with the desired output (i.e., is/isn't SQLia). Thus, ML-based SQLia
detection relies heavily on SQL language markers. For instance, in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], the
authors used the presence of any comment character, the number of semicolons, the presence
of a tautology (i.e., a statement that is always true, such as 1 = 1), the number of commands
per statement, and the presence of abnormal command or special keywords. Similarly, in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ],
the features included single-line and multi-line comments, SQL operators, punctuation, logical
operators, keywords, etc. Virtually all prior work relied on some variation of SQL language
markers, and on SQL language markers only.
        </p>
        <p>
          Algorithms Many ML-based SQLia detection models have been developed in recent years,
using Naive Bayes [
          <xref ref-type="bibr" rid="ref16">16, 17, 18, 19</xref>
          ], SVM [18, 20, 21], or an Ensemble method [
          <xref ref-type="bibr" rid="ref15">15, 18, 19, 21</xref>
          ].
However, we do note that Naive Bayes approaches may not be statistically robust: Naive
Bayes assumes the independence of features, yet programming language markers are not
independent from each other. For example, ‘SELECT‘ is highly correlated with ‘FROM‘ in SQL.
Data Within Threat Intelligence research on SQLia detection, models are developed from
data collected from "red teams", teams of security SMEs which generate injection attacks for
the purpose of testing a platform's vulnerabilities. This type of data collection is omnipresent in
ML research on SQLia as well [
          <xref ref-type="bibr" rid="ref15 ref16">15, 16, 18, 19</xref>
          ]. From an ML point of view, this type of data is
called ’synthetic’ and presents a major risk of not being representative of real-world data, as
well as being too small. In [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], the authors trained their models on 105 SQL statements and
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] used 178 examples. In [18], the authors collected 4,000 rows of plain text sentences from
HTML forms collected “from user input” via a “web application”. Overall, data collection and
labeling is the most expensive problem to solve in ML-based SQLia detection.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Data</title>
      <p>HTTP URL Request The data is a set of 1 million HTTP URL requests from identity-centric
traffic. It was sampled from a large Customer Identity and Access Management (CIAM) platform
between 2021 and 2023, at the network edge2. The volume of traffic from which this is sampled
is substantial enough to be representative of US customer Internet traffic, and the platform may
be considered a giant Honey Pot3.</p>
      <p>In this paper, the proposed solution focuses only on the URL request data. We do not consider
the IP or any other signals within the transaction, because we want to specifically evaluate
2While the dataset may have been filtered before entering our line of vision, our methodology and results still
represent a real-world use-case.
3In Threat Intelligence research, a Honey Pot is a system mimicking real-world vulnerabilities to attract attackers
and collect useful data about the attacker and their attack patterns.
the statistical robustness of language mixtures as signals for SQLia detection. Furthermore,
methods such as this one are not meant to be a silver bullet, but could be integrated into a
layered security architecture.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Building a Language Mixture</title>
      <p>
        Intuition behind Language Mixtures The idea described below is similar in spirit to the
Latent Dirichlet Allocation (LDA) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] approach for topic modeling. The traditional LDA model
estimates a mixture of topics per document (i.e., what the document is about), and a mixture
of words per topic. LDA aims at answering the question: ”What topics are present in this
document, and how much are those topics discussed in this document?”. Similarly, we want to
calculate ”how much” HTTP and ”how much” SQL is in an URL request. In this sense, HTTP
and SQL are the ”topics” of the URL request (document). However, LDA is based on word count4
and relies on repetition of the same words within a document to estimate the mixture of topics.
Thus, LDA did not work well for this problem, because obfuscated SQLia have only very few
(and unique) SQL markers in the URL request.
      </p>
      <p>Instead, we propose a "language mixture", which is not affected by word count. For each URL
request, we score "how much" SQL-like and "how much" HTTP-like the request is. We want to
automatically estimate a dictionary of language markers for each language (SQL and HTTP).
Each marker is associated with a weight based on how important (or common) the marker is to
this language. For instance ’SELECT’ is very representative of SQL. Using real-world data is
crucial to guarantee that the markers and their weights are representative of real-world usage.
Building a Language Mixture To build a language mixture for SQL, we used 1 million SQL
queries from open source SQL repositories from GitHub. For the HTTP language mixture, we
used 1 million URL requests5 from the same provider described in Section 3. We did not start
with a known dictionary of SQL or HTTP operators, but rather extracted everything present
in the data and sorted it into three categories of tokens, in order to be representative of the
real world. For each data set separately, we extracted three types of markers:
• Keywords (any character chain of length 2 or more);
• Delimiters (parenthesis, brackets, comma, etc.);
• Operators (+, − , ∗, etc.).</p>
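      <p>As a rough sketch of this extraction step (the paper does not specify its exact tokenizer, so the regular expressions below are assumptions), the three marker categories could be pulled from a document as follows:</p>

```python
import re

# Assumed tokenization rules; the paper only names the three categories.
KEYWORD_RE = re.compile(r"[A-Za-z_]{2,}")    # character chains of length 2 or more
DELIMITER_RE = re.compile(r"[()\[\]{},;.]")  # parentheses, brackets, commas, ...
OPERATOR_RE = re.compile(r"[-+*/=%&|!]")     # +, -, *, ...

def extract_markers(document):
    """Return the set of unique markers found in one 'document'
    (a SQL query or a URL request)."""
    text = document.upper()  # assumption: markers are matched case-insensitively
    markers = set(KEYWORD_RE.findall(text))
    markers |= set(DELIMITER_RE.findall(text))
    markers |= set(OPERATOR_RE.findall(text))
    return markers

print(sorted(extract_markers("SELECT id FROM users;")))
```

      <p>Running each corpus (1 million SQL queries, 1 million URL requests) through such an extractor yields the candidate marker dictionary for each language.</p>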
      <p>The weight of each token is the percentage of ”documents” (URL requests or SQL queries)
which contained that token at least once. For instance, the token ‘FROM‘ has a weight of 0.47
because it was present in 47% of the SQL queries. We kept tokens with a weight greater than
0.10.
4"Word count" is a featurization in ML which counts the occurrences of a word in an instance.
5While it is possible that those URL requests contain injections and other "impurities", we assume that the low
volume of those attacks on this type of traffic sufficiently guarantees that the HTTP tokens extracted are correct
and representative.
Consider, for example, the following URL request:
/yyoa/ext/trafaxserver/downloadAtt.jsp?attach_ids=
(1)%20and%201=2%20union%20select%201,2,3,4,5,md5(203735726),7-</p>
      <p>From this string, we extracted the tokens listed in Table 1. Each mixture is the sum of the
weights of the tokens present in the URL request. A weight is summed only once, even if the
token appears multiple times; that is, the mixture score is not the sum of the weights multiplied
by the number of occurrences of the token in the string. Counting occurrences would make this
method too insensitive to the obfuscation of short SQL queries within a long HTTP string. We
also do not normalize the score, because it is not usual for SQL queries or HTTP strings to
contain all of their language's markers. Thus, the score is an absolute representation of SQL-likeness or
HTTP-likeness rather than a relative percentage of completeness.</p>
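      <p>The scoring rule just described can be sketched in a few lines (the token weights other than FROM's 0.47 are invented for the example):</p>

```python
# Illustrative weight dictionary: only FROM's 0.47 comes from the text;
# the other values are made up for this example.
SQL_WEIGHTS = {"FROM": 0.47, "SELECT": 0.45, "UNION": 0.12, ";": 0.30}

def mixture_score(tokens, weights):
    """Sum each present token's weight exactly once (no multiplication by
    occurrence count, no normalization), as described above."""
    return sum(w for t, w in weights.items() if t in tokens)

# Tokens extracted from a request; a set, so repeated tokens are irrelevant.
tokens = {"SELECT", "UNION", "AND", "MD5"}
print(round(mixture_score(tokens, SQL_WEIGHTS), 2))
```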
    </sec>
    <sec id="sec-6">
      <title>5. How to use language mixtures to detect SQLia</title>
      <p>Conveniently, the example presented in the previous section has an SQL mixture greater than
its HTTP mixture. Unfortunately, comparing the language mixtures is generally not sufficient
to make a decision as to whether a URL request contains an SQLia. For instance, we found that
highly obfuscated SQLia will have a low SQL mixture and a high HTTP mixture. Nevertheless,
we can use the language mixtures to (1) label the data set more efficiently and (2) build an ML model
using the mixture tokens and weights as features to learn to classify within the non-linear space
of SQL/HTTP mixtures.</p>
      <sec id="sec-6-1">
        <title>Language Mixture as a Labeling Heuristic</title>
        <p>We computed the SQL/HTTP mixtures for 1 million URL requests from identity-centric authentication transactions. The distribution of
mixtures over the data set is not linear, as shown in Figure (1). Labeling such a large dataset is
not realistic, but the language mixtures can be used as a heuristic to efficiently select batches
of URL requests "of interest". From an ML point of view, we want to label examples near the
boundaries (or thresholds) between SQL/HTTP. Those are the most ambiguous examples from
the ML model's perspective. To find those, we sampled batches of URL requests to be labeled,
with the following language mixture characteristics:
(A) Highest SQL mixture : This yielded obvious SQLia such as:
/upload/mobile/index.php?c=category&amp;a=asynclist&amp;price_max=1.0
%20AND%20(SELECT%201%20FROM(SELECT%20COUNT(*),CONCAT(0x7e,md5
(1),0x7e,FLOOR(RAND(0)*2))x%20FROM%20INFORMATION_SCHEMA.
CHARACTER_SETS%20GROUP%20BY%20x)a)''
(B) Lowest HTTP mixture : URL requests that were not "HTTP-enough" (and also not "SQL-enough") revealed the
ability of this technique to discover other types of command injections, such as this XSS
injection6:
6This example was truncated due to space limitations.
(C) Random sample across the entire set : A sample of 1,000 instances across the entire
set was labeled to explore other areas of the search space, and increase the chances of labeling
diverse types of SQLia (in terms of SQL/HTTP mixtures). Then, using the SQLia found
in this batch, we selected more URL requests to be labeled by sampling URL requests
whose mixture scores were within ±ε of those SQLia examples, with ε varying from 0.05 to 1.</p>
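        <p>Batch (C)'s follow-up sampling can be sketched as follows (function and data names are illustrative assumptions, not the authors' code):</p>

```python
# Select unlabeled requests whose (SQL, HTTP) mixture scores land within
# +/- eps of a known SQLia example, as in batch (C) above.
def near_known_sqlia(candidates, known_sqlia, eps):
    """candidates / known_sqlia: lists of (sql_mixture, http_mixture) pairs."""
    selected = []
    for sql_m, http_m in candidates:
        for k_sql, k_http in known_sqlia:
            if abs(sql_m - k_sql) <= eps and abs(http_m - k_http) <= eps:
                selected.append((sql_m, http_m))
                break
    return selected

known = [(0.90, 0.30)]                              # one labeled SQLia example
pool = [(0.88, 0.32), (0.20, 1.10), (0.95, 0.25)]   # unlabeled requests
print(near_known_sqlia(pool, known, eps=0.05))
```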
        <p>Inter-Rater Reliability The data was labeled by threat intelligence and security engineering
SMEs. We reached an inter-rater reliability rate of 94.9%, with 1,705 innocuous URL requests
(labeled 'HTTP') and 114 SQLia (labeled 'SQL').</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. ML-based SQLia Detection</title>
      <sec id="sec-7-1">
        <title>6.1. Experimental Setup</title>
        <p>Features In this paper, we evaluated a feature engineering method based on creating language
mixture scores for each URL request. Thus, for each URL request, the feature vector has two
parameters: the SQL and HTTP mixtures, respectively.</p>
        <p>Benchmark To benchmark our proposed feature vector, we compared it with previously
proposed feature vectors using word counts7 and presence/absence flags8 of SQL tokens (see
Section 2.2). We used the 61 SQL tokens generated by the method presented in Section 4.</p>
        <p>Algorithm In order to fairly compare the efficacy of the feature vectors described above, we
needed to use the same algorithm. Furthermore, the benchmark features have some statistical
particularities that restrict which algorithms can be used. The features are correlated with each other
due to the nature of programming languages (e.g., 'SELECT' and 'FROM' in SQL are likely to go
together in a query). Thus, we used a Decision Tree9, an algorithm family which is not sensitive
to correlation between features, and can optimize a solution within a non-linear search
space10.</p>
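        <p>To make the split mechanics concrete, here is a toy, dependency-free stand-in for one CART-style split on the two-feature mixture vector (the actual experiments fit a full tree with a Python CART implementation; this single-split "stump" only illustrates why feature correlation is unproblematic for threshold splits):</p>

```python
# Toy single-split "decision stump" over (sql_mixture, http_mixture) vectors.
def best_threshold_split(X, y, feature):
    """Try every observed value of one feature as a threshold; return the
    (threshold, misclassification_count) pair with the fewest errors."""
    best_t, best_err = None, len(y) + 1
    for t in sorted({row[feature] for row in X}):
        # predict 'sql' when the feature value is at or above the threshold
        preds = ["sql" if row[feature] >= t else "http" for row in X]
        err = sum(p != lab for p, lab in zip(preds, y))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Four made-up labeled examples: (sql_mixture, http_mixture)
X = [(0.10, 0.90), (0.05, 1.20), (0.80, 0.40), (0.95, 0.20)]
y = ["http", "http", "sql", "sql"]
print(best_threshold_split(X, y, feature=0))
```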
        <p>Training &amp; Testing Sets We used the entire 1,819 labeled examples for training without
hold-out, because the selection of training set examples was biased by our goal of finding more
SQLia examples to train the model. Thus, we did not test and validate on the training set.
Instead, we randomly selected 638 examples from the remaining 1 million URL requests, and
applied the fitted model to predict an 'http' or 'sql' label. In parallel, our security SMEs also
labeled the testing set for ground truth. This way, the model is evaluated fairly, without biases
that may have been introduced in Section 5.
7A "word count" feature vector is a vector where each word is a feature, and the value of each feature is the number
of times the word appeared in the instance (i.e., the URL request).
8'Presence/absence flags' is a feature vector of booleans, where each token is a feature, and the parametrization is a boolean
flag set to 1 if the token is present in the instance (i.e., URL request), and 0 otherwise.
9We used the  Python package, which implements the CART algorithm.
10In ML, the search space is the set of all possible solutions to an optimization problem.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>7. Results &amp; Discussion</title>
      <p>In this paper, we proposed a feature engineering method to calculate how "HTTP-like"
and "SQL-like" URL requests are, in order to detect SQLia. These features are used as a heuristic
to reduce the time and effort needed to create a data set to train an ML model for detecting
SQLia. We implemented a supervised ML model using a Decision Tree to compare this feature set
with traditional feature sets (i.e., boolean flags and word counts). The notation for those feature
vectors is listed below, and used in the rest of this section.</p>
      <p>(A) HTTP &amp; SQL language mixture scores;
(B) Word Count of SQL tokens;
(C) Presence/Absence of SQL tokens.</p>
      <p>Evaluation Metrics From the ML point of view, much of the difficulty in evaluating ML
methods for SQLia detection lies in the strong imbalance of the data set. The testing set contains
17 SQLia for 614 innocuous HTTP URL requests11, thus accuracy is not a good measure, because
even if we mislabeled all the SQL injections, we would still have 97% accuracy. Instead, we
focused on False Positive Rate (FPR) and False Negative Rate (FNR). In the rest of this paper, we
consider a 'Positive' to be an SQL injection, and a 'Negative' to be an innocuous HTTP URL
request, and the rest of this section refers to the results presented in Tables 2, 3, and
4. We observed a trade-off between models (A), (B), and (C), where model (A) is less likely to
falsely identify a legitimate HTTP URL request as an SQLia compared to models (B) and (C) (i.e.,
FPR 0.16%), while model (C) is the best at identifying SQLia (FNR 0.0%). These
results should, however, be nuanced. Inspection of the HTTP URL request marked as SQLia by model
(A) revealed that the request did contain an SQL command, but that the SQL command is expected by
that customer's CIAM implementation. The SQLia sample is not large enough to extrapolate an
updated FPR. Overall, model (A) had fewer misclassifications.</p>
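      <p>Concretely, with 'Positive' = SQLia, the metrics reduce to simple confusion-matrix ratios; the counts below reproduce the accuracy pitfall mentioned above for a degenerate model that labels every request 'http':</p>

```python
# FPR/FNR/accuracy from confusion-matrix counts; 'positive' = SQLia.
def rates(tp, fp, tn, fn):
    fpr = fp / (fp + tn)              # legitimate requests flagged as SQLia
    fnr = fn / (fn + tp)              # SQLia passed through as legitimate
    acc = (tp + tn) / (tp + fp + tn + fn)
    return fpr, fnr, acc

# Degenerate model predicting 'http' for all 17 SQLia + 614 innocuous requests.
fpr, fnr, acc = rates(tp=0, fp=0, tn=614, fn=17)
print(f"FPR={fpr:.2%} FNR={fnr:.2%} accuracy={acc:.2%}")
```

      <p>Despite missing every single injection, the degenerate model still reports roughly 97% accuracy, which is why FPR and FNR are the metrics of interest here.</p>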
      <p>The Lesser of Two Evils In a real-world production environment, the balance between minimizing FPR
and FNR will depend on the use-case. On one hand, letting through SQL command injections may
be dangerous, although we may also assume that this model would be part of a layered approach
to security. On the other hand, the friction caused to legitimate users by a large quantity12 of
false positives might become very undesirable. Thus, the preferred feature vector will depend
on the use-case.</p>
      <p>Real-World Cost A great advantage of Decision Trees is the full explainability and coverage
of the generated rule set (if the tree is allowed to grow to its full depth). Hence, the model could be
trained off-line (computation takes less than 1 second), for nearly free, and the rules generated
by the Decision Tree could be added to the current rules of any system. Thus, we argue that
the cost of this method is comparable to the cost of current rule-based methods.
11and 7 XSS command injections, see our discussion below. Those XSS command injections are removed from the
metrics calculation in order to strictly evaluate the model on SQLia detection.
12Extrapolating the FPR of 1.63% to our original 1M URL requests would cause 16,300 requests to fail or be delayed.
XSS and other types of injections. While labeling the training and testing sets, our SMEs
found that a URL request whose mixtures are not 'HTTP-enough' and not 'SQL-enough' is
likely to be some other sort of command injection (e.g., template, code, OS, XXE, etc.). While those
other types of command injections were removed from the training set, we decided to include XSS
examples in the testing set to highlight an avenue for future work: the expansion of language
mixtures to other types of injections. From a production perspective, it is desirable to develop
one ML model capable of detecting/classifying various types of injections. However, the more
injection types are added, the more confusion is introduced. For example, in Table 5, SQL and
HTTP have overlapping tokens, that is, they "share a feature". When using a presence/absence
or word count type of featurization, overlapping features may create ambiguity that makes the
problem more difficult for an ML model. Intuitively, a language mixture approach may help
alleviate the overlap of markers between languages, by biasing them with their weights
(i.e., their importance within that language).</p>
      <p>Future work will focus on ’unknown unknowns’ and previously unidentified vulnerabilities.
This will include research on the language mixtures with behavioral analysis of the URL request’s
response. This will deepen the model's understanding to identify potential zero-days13 before
they are known, by correlating requests, responses, and their effects. For instance, this may
help organizations identify attacks such as data exfiltration, and reduce the false positive rate
on benign requests.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgement</title>
      <p>We thank Kim Berry for initial discussion on this approach. We thank Mathew Woodyard and
George Vauter for their initial work on labels.
13A Zero-day is a vulnerability that is not yet known, and that can be exploited.
[17] A. Makiou, Y. Begriche, A. Serhrouchni, Improving Web Application Firewalls to detect
advanced SQL injection attacks, in: 2014 10th International Conference on Information
Assurance and Security, IEEE, 2014, pp. 35–40.
[18] S. Mishra, SQL injection detection using machine learning (2019).
[19] K. Ross, M. Moh, T.-S. Moh, J. Yao, Multi-source data analysis and evaluation of machine
learning techniques for SQL injection detection, in: Proceedings of the ACMSE 2018
Conference, 2018, pp. 1–8.
[20] S. O. Uwagbole, W. J. Buchanan, L. Fan, Applied machine learning predictive analytics
to SQL injection attack detection and prevention, in: 2017 IFIP/IEEE Symposium on
Integrated Network and Service Management (IM), IEEE, 2017, pp. 1087–1090.
[21] K. Ross, SQL Injection Detection Using Machine Learning Techniques and Multiple Data
Sources, Master of Science, San Jose State University, San Jose, CA, USA, 2018. URL:
https://scholarworks.sjsu.edu/etd_projects/650. doi:10.31979/etd.zknb- 4z36.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Cve</surname>
          </string-name>
          ,
          <year>2014</year>
          . URL: http://cve.mitre.org/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hern</surname>
          </string-name>
          ,
          <article-title>TalkTalk hit with record £400k fine over cyber-attack</article-title>
          , The Guardian (
          <year>2016</year>
          ). URL: https://www.theguardian.com/business/2016/oct/05/talktalk-hit-with-record-400k-fine-over-cyber-attack.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <article-title>OWASP Top 10 for Large Language Model Applications</article-title>
          , OWASP Foundation,
          <year>2023</year>
          . URL: https://owasp.org/www-project-top-10-for-large-language-model-applications.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <article-title>Latent dirichlet allocation</article-title>
          ,
          <source>Journal of Machine Learning Research 3</source>
          (
          <year>2003</year>
          )
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Shadowd</given-names>
            <surname>Zecure</surname>
          </string-name>
          ,
          <year>2023</year>
          . URL: https://capec.mitre.org/data/definitions/248.html.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>LibInjection</surname>
          </string-name>
          ,
          <year>2012</year>
          . URL: https://github.com/client9/libinjection/blob/master/README.md.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>ModSecurity</surname>
          </string-name>
          ,
          <year>2002</year>
          . URL: https://coreruleset.org/faq.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>LibModSecurity</surname>
          </string-name>
          ,
          <year>2002</year>
          . URL: https://github.com/SpiderLabs/ModSecurity/blob/ ec86b242e15f9df1d143c1b7f86a27889658b4cb/README.md.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Naxsi</surname>
          </string-name>
          ,
          <year>2014</year>
          . URL: https://github.com/nbs-system/naxsi/blob/master/README.md.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Fortinet</surname>
          </string-name>
          ,
          <year>2023</year>
          . URL: https://docs.fortinet.com/document/fortiweb/6.3.7/ administration-guide/193258/machine-learning.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>CloudFlare</surname>
          </string-name>
          ,
          <year>2023</year>
          . URL: https://blog.cloudflare.com/waf-ml/.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <issue>F5</issue>
          ,
          <year>2022</year>
          . URL: https://community.f5.com/t5/technical-articles/ f5-distributed
          <article-title>-cloud-waf-ai-ml-model-to-suppress-false-</article-title>
          <string-name>
            <surname>positives/</surname>
          </string-name>
          ta-p/
          <fpage>299946</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Imperva</surname>
          </string-name>
          ,
          <year>2004</year>
          . URL: https://www.imperva.com/products/attack-analytics/.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Al Rubaiei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Al</given-names>
            <surname>Yarubi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Al</given-names>
            <surname>Saadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kumar</surname>
          </string-name>
          , SQLIA Detection and
          <article-title>Prevention Techniques</article-title>
          ,
          <source>in: 2020 9th International Conference System Modeling and Advancement in Research Trends (SMART)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>115</fpage>
          -
          <lpage>121</lpage>
          . doi:10.1109/SMART50582.2020.9336795.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Balbahaith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tarique</surname>
          </string-name>
          ,
          <article-title>Detection of SQL injection attacks: A machine learning approach</article-title>
          , in: 2019 International Conference on Electrical and
          <article-title>Computing Technologies and Applications (ICECTA)</article-title>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>I.</given-names>
            <surname>Jemal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Cheikhrouhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hamam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mahfoudhi</surname>
          </string-name>
          ,
          <article-title>Sql injection attack detection and prevention techniques using machine learning</article-title>
          ,
          <source>International Journal of Applied Engineering Research</source>
          <volume>15</volume>
          (
          <year>2020</year>
          )
          <fpage>569</fpage>
          -
          <lpage>580</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>