<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1504/ijesdf.2020.106318</article-id>
      <title-group>
        <article-title>Attention Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simone Re</string-name>
          <email>simone.re@smricercaselezione.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Olivieri</string-name>
          <email>matteo.olivieri@smricercaselezione.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo Anibal Matamoros Aragon</string-name>
          <email>ricardo.matamoros@socialthingum.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Solinas</string-name>
          <email>alessandro.solinas@socialthingum.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Epifania</string-name>
          <email>francesco.epifania@socialthingum.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Milano Bicocca</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Informattiva S.r.l.</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Politecnico di Milano</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Social Things S.r.l.</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>12</volume>
      <issue>2</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>In today's interconnected digital landscape, the Internet plays a pivotal role in daily human activities. However, the intricacy of the online communication network exposes vulnerabilities that can be exploited by malicious actors, who adopt increasingly sophisticated strategies to compromise cybersecurity. This issue extends to the domain of e-learning, where the protection of users' personal data and the interaction with external educational resources become critical aspects. In this context, we introduce an e-learning platform developed by Informattiva, integrated with an advanced cybersecurity mechanism. This mechanism is designed to analyze educational resources from external repositories, such as Merlot.org, aiming to identify potentially insecure resources based on their URLs. To achieve this, we implemented a model based on the Bidirectional Gated Recurrent Unit (BiGRU) with attention mechanisms, focusing on the identification of potentially malicious web addresses. Preliminary results indicate that, through bidirectional processing and attention mechanisms, our methodology has the potential to effectively differentiate suspicious URLs from secure ones.</p>
      </abstract>
      <kwd-group>
        <kwd>Anomaly Detection</kwd>
        <kwd>Artificial Intelligence</kwd>
        <kwd>E-learning</kwd>
        <kwd>Attention Mechanism</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR Workshop Proceedings (ceur-ws.org)</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>In today’s digital era, the internet stands as the cornerstone of modern communication and
information dissemination. Its pervasive presence in our daily lives has changed the way we learn,
work, and interact with the world. Yet, as the internet becomes more deeply woven into the fabric
of society, it also exposes us to a growing spectrum of digital threats and vulnerabilities.
Cybercriminals constantly devise new tactics to breach digital security, endangering individuals
and organizations alike.</p>
      <p>
        In response to this persistent threat, the research team at Informattiva S.r.l. set out
to safeguard one of the most important sectors of the digital realm: e-learning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As the
demand for remote learning and virtual classrooms has surged in recent years, the availability of
open-source educational resources has grown rapidly. These resources offer educators an
invaluable toolbox for enhancing the quality and effectiveness of their courses. However, this
extensive collection of educational resources suffers from a concerning lack of security controls,
which leaves the resources vulnerable to exploitation. To address this critical gap in online
education, we have developed a platform that gives educators the means to strengthen the security of
their educational materials. At its core, the platform relies on a security firewall
that detects malicious intent by scrutinizing the URLs associated with online
resources [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Researchers are currently exploring the use of machine learning for detecting malicious URLs.
One notable study by Vanhoenshoven et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] utilized a multilayer perceptron (MLP) for this
purpose. The study found that, on the same dataset, the choice of feature set affects the accuracy
of the results.
      </p>
      <p>
        In their study, Azeez et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] employed a naive Bayes algorithm to identify malicious URLs
by analyzing the syntax, vocabulary, hosts, and other content of the URLs present in emails.
Laughter et al. [5] added HTTP request features to the detection feature set by
considering the process of visiting the website. In recent years, the growth of deep learning has
brought new developments to the detection of malicious web pages [6]. In
particular, Recurrent Neural Networks (RNNs) are widely considered well suited to anomaly
detection because they capture sequential dependencies and temporal patterns in data, which
makes them adept at identifying deviations from expected patterns.
      </p>
      <p>In this paper, we present an approach built on a
Dropout Attention Bidirectional Gated Recurrent Unit (DA-BiGRU) model [7]. Our primary
focus is on identifying potentially malicious web addresses among online
educational resources. By combining bidirectional processing with
attention mechanisms, our approach shows the potential to differentiate between suspicious
URLs and harmless ones, thereby strengthening the security of online educational content.
We describe the theoretical foundations of
the DA-BiGRU model and its application to URL analysis. Through an
examination of this model and its experimental results, we aim to contribute to the growing
body of literature on cybersecurity in online education. Our work
underscores the importance of securing educational resources and illustrates the
potential of machine learning techniques in the fight against digital
threats [8].</p>
      <p>In the following sections, we present the methodology, results, and implications of
our research, offering insights and recommendations for a safer and more
secure online learning environment.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Datasets Utilized for Anomaly Detection in URL Analysis</title>
      <p>When detecting anomalies in URLs, the choice of data is crucial for keeping
online systems and networks secure. A good understanding of the typical
patterns and behaviors of URLs is needed so that potentially harmful or unusual web traffic can be
identified and dealt with. Accurate and thorough data enables anomaly detection
algorithms to distinguish between genuine website interactions and suspicious activity, which
helps to improve cybersecurity efforts and guard against potential threats. To this end, we
selected two open-source datasets of malicious URLs from Kaggle.com.</p>
      <p>First, we used the Malicious URLs dataset [9], which contains 651,191 URLs, 34% of which are anomalies.
This dataset was divided into training, validation, and test sets. Then, as an additional test and as
evidence of the model’s scalability, we used the Malicious_n_Non-Malicious URL dataset [10],
which comprises 411,247 URLs, 18% of which are anomalies. The model
was validated on these two datasets and subsequently applied to the
dataset from Merlot.org. The latter is a fundamental resource for the e-learning
platform developed by Informattiva, allowing users to enrich and customize their educational
paths by integrating external educational resources.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Model in-depth</title>
      <p>In this section, we summarize the architecture of the DA-BiGRU attention model and examine
some of its key aspects in detail. The meaning of the symbols used in this section is summarized in
Table 1.</p>
      <sec id="sec-4-1">
        <title>3.1. BiGRU architecture</title>
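        <p>Introduced by Cho et al. [11] in 2014, the Gated Recurrent Unit (GRU) aims to solve the vanishing gradient problem that
comes with a standard recurrent neural network. It was introduced as an improvement
on the Long Short-Term Memory (LSTM) architecture. The key components of the GRU, summarized
in Figure 2, are:</p>
        <list list-type="bullet">
          <list-item>
            <p>Hidden state: to capture information from previous steps, the GRU maintains a hidden state
<inline-formula><tex-math>h_t</tex-math></inline-formula>, as in traditional RNNs.</p>
          </list-item>
          <list-item>
            <p>Update gate: this is a crucial component of the GRU that controls how much of the previous
hidden state should be retained. It is computed through a sigmoid <inline-formula><tex-math>\sigma</tex-math></inline-formula>, as:
<disp-formula><label>(1)</label><tex-math>z_t = \sigma(W_{xz} \cdot x_t + W_{hz} \cdot h_{t-1} + b_z)</tex-math></disp-formula></p>
          </list-item>
          <list-item>
            <p>Reset gate: the reset gate determines how much of the previous hidden state should be
reset or forgotten when computing the new candidate state. It is computed similarly to
the update gate:
<disp-formula><label>(2)</label><tex-math>r_t = \sigma(W_{xr} \cdot x_t + W_{hr} \cdot h_{t-1} + b_r)</tex-math></disp-formula></p>
          </list-item>
        </list>
        <p>Here <inline-formula><tex-math>x_t</tex-math></inline-formula> is the input at time t, <inline-formula><tex-math>h_{t-1}</tex-math></inline-formula> is the previous hidden state,
<inline-formula><tex-math>W_{xz}, W_{hz}, W_{xr}, W_{hr}</tex-math></inline-formula> are weight matrices, and <inline-formula><tex-math>b_z, b_r</tex-math></inline-formula> are bias terms.
The reset gate enters the computation of the candidate hidden state:
<disp-formula><label>(3)</label><tex-math>\tilde{h}_t = \tanh(W_{xh} \cdot x_t + W_{hh} \cdot (h_{t-1} \odot r_t) + b_h)</tex-math></disp-formula>
and then it is combined with the update gate in the computation of the new hidden state <inline-formula><tex-math>h_t</tex-math></inline-formula> as
follows:
<disp-formula><label>(4)</label><tex-math>h_t = h_{t-1} \odot z_t + (1 - z_t) \odot \tilde{h}_t</tex-math></disp-formula>
where <inline-formula><tex-math>\odot</tex-math></inline-formula> denotes element-wise multiplication.</p>
        <p>From the last equation, it is evident how the update gate (<inline-formula><tex-math>z_t</tex-math></inline-formula>) affects the new hidden state. When
it is close to 1, the model retains most of the information of the hidden state at the previous
timestamp (<inline-formula><tex-math>h_{t-1}</tex-math></inline-formula>), while if it approaches 0, most of the information is taken from the
candidate hidden state (<inline-formula><tex-math>\tilde{h}_t</tex-math></inline-formula>).</p>
        <p>A bidirectional GRU (BiGRU) reads the URL in both the forward and the backward direction, which allows it to
identify patterns and relationships between different parts of the URL, such as domain names,
subdomains, and query parameters. This bidirectional processing enables it to understand how
different components of the URL relate to each other and to extract valuable features for tasks like
URL classification, parsing, or anomaly detection. Additionally, the BiGRU’s ability to model both
past and future context ensures a comprehensive understanding of the URL.</p>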
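        <p>The recurrence in Equations 1 to 4, and the way a bidirectional layer reuses it, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the parameter keys (W_xz, W_hz, b_z, and so on), the helper names (zero_params, gru_step, bigru), and the choice to concatenate forward and backward states are all introduced here for illustration.</p>
        <preformat><![CDATA[
```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def zero_params(n_in, n_hid):
    """Hypothetical helper: all-zero parameters, handy for a sanity check."""
    def mat(cols):
        return [[0.0] * cols for _ in range(n_hid)]
    params = {}
    for gate in ("z", "r", "h"):
        params["W_x" + gate] = mat(n_in)
        params["W_h" + gate] = mat(n_hid)
        params["b_" + gate] = [0.0] * n_hid
    return params


def gru_step(x_t, h_prev, p):
    """One GRU update: gates (1)-(2), candidate state (3), new state (4)."""
    z = sigmoid(add(matvec(p["W_xz"], x_t), matvec(p["W_hz"], h_prev), p["b_z"]))
    r = sigmoid(add(matvec(p["W_xr"], x_t), matvec(p["W_hr"], h_prev), p["b_r"]))
    gated = [h * ri for h, ri in zip(h_prev, r)]
    pre = add(matvec(p["W_xh"], x_t), matvec(p["W_hh"], gated), p["b_h"])
    h_cand = [math.tanh(a) for a in pre]
    # z close to 1 keeps the previous state, z close to 0 takes the candidate.
    return [zi * h + (1.0 - zi) * hc for zi, h, hc in zip(z, h_prev, h_cand)]


def bigru(xs, p_fwd, p_bwd, n_hid):
    """Run one GRU left-to-right and one right-to-left, then concatenate."""
    h, forward = [0.0] * n_hid, []
    for x in xs:
        h = gru_step(x, h, p_fwd)
        forward.append(h)
    h, backward = [0.0] * n_hid, []
    for x in reversed(xs):
        h = gru_step(x, h, p_bwd)
        backward.append(h)
    backward.reverse()  # align backward states with the original positions
    return [f + b for f, b in zip(forward, backward)]
```
]]></preformat>
        <p>With all parameters set to zero, both gates evaluate to 0.5 and the candidate state to 0, so a single step halves the previous hidden state. This gives a quick way to check the update rule by hand.</p>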
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Attention Mechanism for Enhanced URL Segment Analysis: Mathematical Formulation</title>
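        <p>The structure of a URL varies from one position to the next, and each segment follows its own conventions. An
attention mechanism is therefore introduced to capture the interdependence of words or symbols
across different URL segments. This mechanism filters out irrelevant content and
prioritizes crucial URL information, improving data utilization and ultimately model
accuracy [13, 14]. The mathematical formulation of this process is given below.
<disp-formula><label>(5)</label><tex-math>e_t = V_a \tanh(W_a \cdot x_t)</tex-math></disp-formula>
<disp-formula><label>(6)</label><tex-math>\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{n} \exp(e_k)}</tex-math></disp-formula>
<disp-formula><label>(7)</label><tex-math>s^{*} = \sum_{t=1}^{n} \alpha_t \cdot x_t</tex-math></disp-formula></p>
        <p>In Equation 5, the attention vector is computed from the input information at time t, <inline-formula><tex-math>x_t</tex-math></inline-formula> (here, the output of the BiGRU at step t), the
learned weight matrices <inline-formula><tex-math>W_a</tex-math></inline-formula> and <inline-formula><tex-math>V_a</tex-math></inline-formula>, and a hyperbolic tangent (tanh) activation. The vector
is then normalized through a softmax function over the n time steps, as shown in
Equation 6. Finally, in Equation 7, the output <inline-formula><tex-math>s^{*}</tex-math></inline-formula> is
obtained from the element-wise multiplication of the input and attention vectors, summed over all time steps.</p>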
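        <p>The three steps of Equations 5 to 7 (score, softmax normalization, weighted sum) can be sketched as follows. This is a minimal illustration, assuming that the score of each time step is a scalar obtained by projecting tanh(W · x_t) onto a learned vector v. The function name attention_pool and the parameter names W and v are hypothetical and are not taken from the original model.</p>
        <preformat><![CDATA[
```python
import math


def attention_pool(xs, W, v):
    """Return (weights, context) for a sequence of input vectors xs.

    W is a matrix (list of rows) and v a vector; both stand in for the
    learned attention parameters and are assumptions of this sketch.
    """
    scores = []
    for x in xs:
        # Score: project tanh(W . x_t) onto v to get one scalar per step.
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Softmax over time steps; subtracting the max keeps exp() stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the inputs.
    dim = len(xs[0])
    context = [sum(a * x[j] for a, x in zip(weights, xs)) for j in range(dim)]
    return weights, context
```
]]></preformat>
        <p>When all scores are equal, for instance with W set to zero, the weights are uniform and the context vector reduces to the plain average of the inputs. Learned parameters shift the weights toward the most informative positions of the URL.</p>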
      </sec>
      <sec id="sec-4-4">
        <title>3.3. Dropout mechanism</title>
        <p>Dropout is a regularization technique commonly employed in deep learning models to prevent
overfitting. During training, it randomly deactivates a fraction of the neurons or units in a
neural network, effectively dropping them out. This stochastic process discourages
co-dependencies between neurons, which makes the network more robust and helps it
generalize better to unseen data.</p>
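        <p>The mechanism can be sketched in a few lines. The example below uses the common "inverted dropout" convention, in which surviving activations are rescaled by 1 / (1 - rate) during training so that no rescaling is needed at inference time. That convention, the function name, and its parameters are assumptions of this sketch, since the text above does not specify them.</p>
        <preformat><![CDATA[
```python
import random


def dropout(values, rate, training=True, rng=None):
    """Inverted dropout on a list of activations.

    rate is the probability of dropping each unit. At inference time
    (training=False) the input is returned unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Keep each unit with probability `keep`; rescale the survivors so the
    # expected value of every activation stays the same.
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```
]]></preformat>
        <p>With a rate of 0.5, each activation is either zeroed or doubled, so the expected value of every unit is unchanged between training and inference.</p>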
      </sec>
      <sec id="sec-4-5">
        <title>3.4. Model structure</title>
        <p>The DA-BiGRU model combines several components in sequence. The initial phase
of the processing involves the preprocessing of input URLs. This phase employs the
Word2Vec technique [15], a model that transforms text sequences into
dense vector representations, commonly known as “embeddings”. These embeddings allow
the URL text to be represented in a format that can subsequently be processed by the model,
while preserving a coherent and informative semantic representation.</p>
        <p>Following this transformation phase, the input is introduced into a dropout layer. This layer,
positioned before the BiGRU architecture, serves to prevent overfitting and enhance the model’s
robustness. Within the BiGRU architecture, forward and backward propagation of the hidden
state occurs, enabling the model to capture and process the temporal dependencies present in
the data.</p>
        <p>The output from the BiGRU structure is then fed into an attention layer. This layer
identifies and emphasizes the most relevant features of the
input, so that the model focuses on the most informative aspects of the URLs.
Finally, the process concludes with a fully connected layer followed by a softmax function. This
combination is responsible for the final classification, allowing the model to categorize the URLs
based on the features learned during the training phase.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Results</title>
      <p>In this section, we present the results obtained during the model evaluation
phase. The model was trained for 30 epochs with binary cross-entropy
as the loss function and the Adam optimizer with a learning rate of
10<sup>-3</sup>.</p>
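      <p>For reference, binary cross-entropy takes the standard form below. The symbols are introduced here for illustration and do not appear in the original text: N is the number of samples, y_i is the label of sample i (1 for an anomaly, 0 otherwise), and p̂_i is the anomaly probability predicted by the model.</p>
      <preformat><![CDATA[
```latex
\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N} \sum_{i=1}^{N}
  \bigl[\, y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \,\bigr]
```
]]></preformat>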
      <p>During each epoch, we monitored the main performance
indicators, namely loss, accuracy, precision, and recall, on both the training and
validation datasets. These metrics are plotted in Figure 3. Note that
the saved model is the one with the lowest validation
loss, so any overfitting in the final epochs does not affect the reported results.
The output of the model lies in the range between 0 and 1, where
a higher value indicates a greater likelihood that a given sample is an anomaly.
To decide which samples to classify as anomalies, a threshold was
defined. In this scenario, precision, the ratio of true positive
instances to all instances predicted as positive, is of paramount importance. The
primary objective was to favour precision over recall, in order to limit the number of
false positives and prevent suboptimal resource allocation. After a weighted analysis, we
chose a threshold of 0.99 as the best balance: samples are classified as
anomalies if their probability exceeds this value. This threshold is consistent with the
aim of ensuring a high level of reliability in anomaly detection while reducing
the risk of excluding valuable resources because of erroneous identifications.</p>
      <p>Table 2 reports the metrics computed on the two
test sets. The composition of the second dataset was intentionally skewed to
contain only 5% anomalies. This imbalanced dataset was curated to
test the model’s anomaly detection ability under conditions that emulate real-world
scenarios. On the first dataset, our
model performs well, correctly classifying 94% of websites. It
identifies 87% of malicious websites (recall), and the probability that
a website flagged as malicious by the model is indeed malicious (precision) is 94.5%.
On the second, highly imbalanced dataset, the model maintains
high accuracy and precision, with both metrics above 90%.
However, recall drops markedly. This drop is attributable
to the choice of threshold, a parameter that can be tuned
to specific objectives. For example, setting the threshold
to 0.5 yields a recall of 80%, at the cost of a lower
precision of 88%. Choosing an optimal threshold therefore requires a
balance between recall and precision that depends on the requirements and
limitations of the application in question.</p>
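      <p>The effect of the threshold on precision and recall can be reproduced on a small scale with the sketch below. The scores and labels are toy values invented for illustration and are not taken from the experiments. The function name precision_recall is likewise hypothetical.</p>
      <preformat><![CDATA[
```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when scores above `threshold` count as anomalies.

    labels use 1 for an anomaly (malicious URL) and 0 for a benign one.
    """
    predicted = [s > threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy model outputs and ground-truth labels (illustrative only).
toy_scores = [0.999, 0.995, 0.97, 0.80, 0.60, 0.30, 0.10]
toy_labels = [1, 1, 1, 0, 1, 0, 0]

for threshold in (0.99, 0.5):
    p, r = precision_recall(toy_scores, toy_labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```
]]></preformat>
      <p>On this toy data, the strict threshold flags only the two highest-scoring samples, so precision is high and recall is low. Lowering the threshold recovers the missed anomalies at the price of one false positive, which mirrors the trade-off discussed above.</p>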
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>In conclusion, our work offers insights into
improving the security of digital educational resources in today’s interconnected
online environment. While the Internet is a pivotal medium for education and
communication, it exposes us to ever-evolving cyber threats, which call for proactive
strategies to counter malicious activities. In response to this challenge, our research
group has devised a firewall system aimed at strengthening the security of
educational content. Despite the widespread availability of open-source educational resources,
the absence of adequate security controls leaves such resources susceptible to exploitation.
Our analysis centered on a Bidirectional Gated Recurrent Unit (BiGRU)
attention model designed to identify potentially harmful web addresses.
By combining bidirectional processing and attention mechanisms, the proposed
methodology has shown considerable potential in distinguishing between innocuous and
potentially dangerous URLs. The results underscore the value of
advanced machine learning methods for the cybersecurity of educational resources,
and integrating such methods strengthens the digital learning
environment. Looking forward, the model will need continuous optimization,
and detection thresholds will need to be adjusted to the specific security
requirements of educational platforms and other digital contexts. As we continue refining our
approach, we remain committed to enhancing security measures so that
educators and learners can use online resources with confidence.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Meghna</given-names>
            <surname>Bhatia</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Maitra</surname>
          </string-name>
          .
          <article-title>E-learning platforms security issues and vulnerability analysis</article-title>
          .
          <source>2018 International Conference on Computational and Characterization Techniques in Engineering &amp; Sciences (CCTES)</source>
          . IEEE,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Alireza</given-names>
            <surname>Tamjidyamcholo</surname>
          </string-name>
          , et al.
          <article-title>Evaluation model for knowledge sharing in information security professional virtual community</article-title>
          .
          <source>Computers &amp; Security</source>
          <volume>43</volume>
          (
          <year>2014</year>
          ):
          <fpage>19</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Malak</given-names>
            <surname>Aljabri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hanan S.</given-names>
            <surname>Altamimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shahd A.</given-names>
            <surname>Albelali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Maimunah</given-names>
            <surname>Al-Harbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Haya T.</given-names>
            <surname>Alhuraib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Najd K.</given-names>
            <surname>Alotaibi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Amal A.</given-names>
            <surname>Alahmadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Fahd</given-names>
            <surname>Alhaidari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rami Mustafa A.</given-names>
            <surname>Mohammad</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Khaled</given-names>
            <surname>Salah</surname>
          </string-name>
          .
          <article-title>Detecting Malicious URLs Using Machine Learning Techniques: Review and Research Directions</article-title>
          .
          <source>IEEE Access</source>
          <volume>10</volume>
          (
          <year>2022</year>
          ):
          <fpage>121395</fpage>
          -
          <lpage>121417</lpage>
          . DOI:
          <pub-id pub-id-type="doi">10.1109/ACCESS.2022.3222307</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
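          [4]
          <string-name>
            <given-names>Nureni Ayofe</given-names>
            <surname>Azeez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Balikis Bolanle</given-names>
            <surname>Salaudeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sanjay</given-names>
            <surname>Misra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Robertas</given-names>
            <surname>Damaševičius</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Rytis</given-names>
            <surname>Maskeliūnas</surname>
          </string-name>
          .
          <article-title>Identifying Phishing Attacks in Communication Networks Using URL Consistency Features</article-title>
          .
          <source>Int. J. Electron. Secur. Digit. Forensic.</source>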
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>