Enhancing Online Educational Resource Security with BiGRU Attention Models

Simone Re1,∗,†, Matteo Olivieri1,∗,†, Ricardo Anibal Matamoros Aragon2,3,∗,†, Alessandro Solinas2,4,∗,† and Francesco Epifania2

1 Informattiva S.r.l., Milan, Italy
2 Social Things S.r.l., Milan, Italy
3 Department of Computer Science, University of Milano-Bicocca, Milan, Italy
4 Politecnico di Milano, Milan, Italy

Abstract
In today’s interconnected digital landscape, the Internet plays a pivotal role in our daily activities. However, the intricacy of the online communication network exposes vulnerabilities that can be exploited by malicious actors, who adopt increasingly sophisticated strategies to compromise cybersecurity. This issue extends to the domain of e-learning, where the protection of users’ personal data and the interaction with external educational resources become critical. In this context, we introduce an e-learning platform developed by Informattiva, integrated with an advanced cybersecurity mechanism. This mechanism is designed to analyze educational resources from external repositories, such as Merlot.org, aiming to identify potential insecurities based on URLs. To achieve this, we implemented a model based on the Bidirectional Gated Recurrent Unit (BiGRU) with attention mechanisms, focusing on the identification of potentially malicious web addresses. Preliminary results indicate that, through bidirectional processing and attention mechanisms, our methodology has the potential to effectively differentiate suspicious URLs from secure ones.

Keywords
Anomaly Detection, Artificial Intelligence, E-learning, Attention Mechanism

1. Introduction

In the interconnected world of today’s digital era, the internet stands as the cornerstone of modern communication and information dissemination. Its pervasive presence in our daily lives has revolutionized the way we learn, work, and interact with the world.
Yet, as the internet continues to weave itself into the fabric of society, it concurrently exposes us to a growing spectrum of digital threats and vulnerabilities. Cybercriminals, in their relentless pursuit of exploiting these opportunities, constantly devise new tactics to breach our digital security, endangering individuals and organizations alike.

AIABI 2023: 3rd Italian Workshop on Artificial Intelligence and Applications for Business and Industries, November 9, 2023, Milano, Italy
∗ Corresponding author.
† These authors contributed equally.
Email: simone.re@smricercaselezione.com (S. Re); matteo.olivieri@smricercaselezione.com (M. Olivieri); ricardo.matamoros@socialthingum.com (R. A. M. Aragon); alessandro.solinas@socialthingum.com (A. Solinas); francesco.epifania@socialthingum.com (F. Epifania)
ORCID: 0000-0002-1957-2530 (R. A. M. Aragon); 0000-0002-5428-3187 (A. Solinas)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

In response to this ever-present cyber threat, the research team at Informattiva Srl has embarked on a mission to safeguard one of the most vital sectors of our digital realm: e-learning [1]. As the demand for remote learning and virtual classrooms has surged in recent years, the availability of open-source educational resources has grown exponentially. These resources offer educators an invaluable toolbox for enhancing the quality and effectiveness of their courses. Within this extensive collection of educational resources, however, there is a concerning lack of security controls, which leaves resources vulnerable to exploitation. Recognizing this critical gap in online education, our research team has dedicated considerable effort to addressing this issue head-on. We have developed a platform designed to empower educators with the means to fortify the security of their educational materials.
At its core, our platform leverages a sophisticated security firewall capable of discerning malicious intent by scrutinizing the URLs associated with online resources [2]. Researchers are actively exploring the use of machine learning for detecting malicious URLs. One notable study by Vanhoenshoven et al. [3] utilized a multilayer perceptron (MLP) for this purpose and found that varying the feature set can affect accuracy even when working with the same dataset. Azeez et al. [4] employed a naive Bayesian algorithm to identify malicious URLs by analyzing the syntax, vocabulary, hosts, and other content of URLs present in emails. Laughter et al. [5] incorporated HTTP request features into the detection feature set by considering the process of visiting the website. In recent years, the growth of deep learning has brought new developments to the detection of malicious web pages [6]. In particular, Recurrent Neural Networks (RNNs) are well suited to anomaly detection thanks to their ability to capture sequential dependencies and temporal patterns in data, which makes them adept at identifying deviations from expected patterns across a wide range of applications. In this paper, we shed light on an innovative approach centered on a Dropout Attention Bidirectional Gated Recurrent Unit (DA-BiGRU) model [7]. Our primary focus is on identifying potentially malicious web addresses within the vast sea of online educational resources. By harnessing the power of bidirectional processing and the precision of attention mechanisms, our approach showcases the potential to differentiate between suspicious URLs and harmless ones, thereby strengthening the security of online educational content.
As we delve into the intricacies of our research, we will explore the theoretical foundations of the DA-BiGRU model and its application to URL analysis. Through a comprehensive examination of this model and its experimental results, we aim to contribute to the growing body of literature addressing cybersecurity in the context of online education. Our work not only underscores the importance of securing educational resources but also demonstrates the transformative potential of cutting-edge machine learning techniques in the fight against digital threats [8]. In the following sections, we delve deeper into the methodology, results, and implications of our research, offering insights and recommendations that can pave the way for a safer and more secure online learning environment.

2. Datasets Utilized for Anomaly Detection in URL Analysis

When it comes to detecting anomalies in URLs, the selected data are crucial for keeping online systems and networks secure. A good understanding of the typical patterns and behaviors of URLs is essential so that potentially harmful or unusual web traffic can be identified and dealt with. Accurate and thorough data enable anomaly detection algorithms to distinguish between genuine website interactions and suspicious activity, which strengthens cybersecurity efforts and guards against potential threats. To this end, we selected two open-source datasets about malicious URLs from Kaggle. First, we used the Malicious URLs dataset [9], which contains 651,191 URLs, 34% of which are anomalies. This dataset was divided into training, validation, and test sets. Then, as an additional test and as evidence of the model's ability to generalize, we utilized the Malicious_n_Non-Malicious URL dataset [10], which comprises 411,247 URLs with 18% anomalies. The algorithm under consideration was validated using the previously described datasets. Subsequently, it was applied to the dataset from Merlot.org.
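As a minimal sketch of the partitioning step (the paper does not state its exact split ratios, so the 70/15/15 proportions, seed, and data format below are assumptions for illustration):

```python
import random

def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle and split a list of (url, label) pairs into
    train / validation / test partitions. A generic sketch,
    not the authors' actual preprocessing pipeline."""
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

# Toy stand-in for the Kaggle URL data: (url, is_anomaly) pairs
urls = [("http://site%d.example/page" % i, i % 3 == 0) for i in range(1000)]
train, val, test = train_val_test_split(urls)
print(len(train), len(val), len(test))  # 700 150 150
```

Stratified splitting (preserving the anomaly ratio in each partition) would be a natural refinement for an imbalanced dataset like this one.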
This latter dataset represents a fundamental resource for the e-learning platform developed by Informattiva, allowing users to enrich and customize their educational paths by integrating external educational resources.

3. Model in-depth

In this chapter, we summarize the DA-BiGRU attention model architecture, diving into the details of some key aspects. The meaning of the symbols used in this section is summarized in Table 1.

3.1. BiGRU architecture

Introduced by Cho et al. [11] in 2014, the GRU aims to solve the vanishing gradient problem that affects standard recurrent neural networks. It was introduced as a simplification of the Long Short-Term Memory (LSTM) architecture. The key components of the GRU, summarized in Figure 1, are:

• Hidden state: to capture information from previous steps, the GRU maintains a hidden state h_t, as in traditional RNNs.

• Update gate: this crucial component controls how much of the previous hidden state should be retained. It is computed through a sigmoid as:

z_t = σ(W_xz · x_t + W_hz · h_{t-1} + b_z)   (1)

• Reset gate: the reset gate determines how much of the previous hidden state should be reset, or forgotten, when computing the new candidate state. It is computed similarly to the update gate:

r_t = σ(W_xr · x_t + W_hr · h_{t-1} + b_r)   (2)

Table 1
Symbols and their meaning

Symbol                                Description
σ                                     Sigmoid activation function
W_xz, W_hz, W_xr, W_hr, W_xh, W_hh    Weight parameters
b_z, b_r, b_h                         Bias parameters
h_{t-1}                               Hidden state at the previous timestep
x_t                                   Current input
⊙                                     Element-wise product

The candidate hidden state is computed as:

h̃_t = tanh(W_xh · x_t + W_hh · (h_{t-1} ⊙ r_t) + b_h)   (3)

and it is then combined with the update gate in the computation of the new hidden state h_t as follows:

h_t = h_{t-1} ⊙ z_t + (1 - z_t) ⊙ h̃_t   (4)

From the last equation, it is evident how the update gate (z_t) impacts the new hidden state.
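Equations 1–4 can be sketched as a single GRU forward step. The sketch below uses scalar states and placeholder weights to keep the arithmetic visible; real layers use weight matrices and vector states:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x_t, h_prev, W, b):
    """One GRU timestep mirroring Eqs. (1)-(4): update gate z_t,
    reset gate r_t, candidate state, and the final interpolation
    between the previous and candidate hidden states."""
    z_t = sigmoid(W["xz"] * x_t + W["hz"] * h_prev + b["z"])                 # Eq. (1)
    r_t = sigmoid(W["xr"] * x_t + W["hr"] * h_prev + b["r"])                 # Eq. (2)
    h_cand = math.tanh(W["xh"] * x_t + W["hh"] * (h_prev * r_t) + b["h"])   # Eq. (3)
    return h_prev * z_t + (1 - z_t) * h_cand                                # Eq. (4)

# Arbitrary placeholder weights, not trained values
W = {"xz": 0.5, "hz": 0.3, "xr": 0.4, "hr": 0.2, "xh": 0.7, "hh": 0.6}
b = {"z": 0.0, "r": 0.0, "h": 0.0}

h = 0.0
for x in [0.1, -0.2, 0.3]:  # toy input sequence
    h = gru_step(x, h, W, b)
print(round(h, 4))
```

With a zero input and zero state, the candidate is tanh(0) = 0 and the new state stays at 0, which is a quick sanity check on the gate wiring.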
When it is closer to 1, the model retains most of the information from the hidden state at the previous timestep (h_{t-1}), while if it approaches 0, most of the information comes from the candidate hidden state (h̃_t).

Figure 1: GRU architecture [12]

The BiGRU combines two separate GRUs that process the input sequence in the forward and backward directions simultaneously, enabling the model to capture contextual information from both past and future data points. This makes it particularly useful in tasks where understanding bidirectional context is crucial. By processing the URL sequence bidirectionally, the BiGRU can identify patterns and relationships between different parts of the URL, such as domain names, subdomains, and query parameters. This bidirectional processing enables it to understand how the components of a URL relate to each other and to extract valuable features for tasks like URL classification, parsing, or anomaly detection. Additionally, the BiGRU's ability to model both past and future context ensures a comprehensive understanding of the URL.

3.2. Attention Mechanism for Enhanced URL Segment Analysis: Mathematical Formulation

URLs vary in structure across different locations, necessitating distinct specifications. An attention mechanism is introduced to capture the interdependence of words and symbols across diverse URL segments. This mechanism filters out irrelevant content and prioritizes crucial URL information, enhancing data utilization and ultimately improving model accuracy [13, 14]. The mathematical formulation of this process is detailed below.

e_t = W^T · tanh(W_l · x_t)   (5)

q_t = exp(e_t) / Σ_{j=1}^{T} exp(e_j)   (6)

x* = Σ_{t=1}^{T} q_t · x_t   (7)

In Equation 5, the attention score e_t is computed from the input information at time t, x_t, the learned weight matrices W_l and W, and a hyperbolic tangent (tanh) activation. The scores are then normalized through a softmax function, as can be seen in Equation 6.
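Equations 5–7 can be sketched end-to-end in plain Python. Scalar toy inputs and arbitrary placeholder weights stand in for the learned parameters:

```python
import math

def attention_pool(xs, w_l, w):
    """Score each input (Eq. 5), softmax-normalize the scores (Eq. 6),
    and return the attention-weighted sum of the inputs (Eq. 7)."""
    scores = [w * math.tanh(w_l * x) for x in xs]          # Eq. (5)
    m = max(scores)                                        # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    q = [e / total for e in exps]                          # Eq. (6), softmax weights
    return sum(q_t * x_t for q_t, x_t in zip(q, xs))       # Eq. (7)

out = attention_pool([0.2, -0.5, 1.0], w_l=0.8, w=1.5)
print(round(out, 4))
```

Because the softmax weights sum to 1, the pooled output is always a convex combination of the inputs, i.e. it lies between the smallest and largest input values.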
Finally, the output x* is obtained as the attention-weighted sum of the input vectors.

3.3. Dropout mechanism

Dropout is a regularization technique commonly employed in deep learning models to prevent overfitting. During training, it randomly deactivates a fraction of the neurons in a neural network, effectively dropping them out, which encourages the network to become more robust and to generalize better to unseen data. This stochastic process also helps prevent co-dependencies between neurons.

3.4. Model structure

Within the context of deep learning architectures, the DA-BiGRU model emerges as a particularly advanced solution, characterized by a complex yet highly effective structure. The initial phase of processing involves the preprocessing of input URLs. This critical phase employs the Word2Vec technique [15], a model renowned for its ability to transform text sequences into dense vector representations, commonly known as "embeddings". These embeddings allow the URL text to be represented in a format that can subsequently be processed by the model, ensuring a coherent and informative semantic representation. Following this transformation, the input is passed through a dropout layer. This layer, positioned before the BiGRU, serves to prevent overfitting and to enhance the model's robustness. Within the BiGRU, forward and backward propagation of the hidden state occurs, enabling the model to capture and process the temporal dependencies present in the data. The output of the BiGRU is then fed into an attention layer. This layer plays a pivotal role in identifying and emphasizing the most relevant features of the input, ensuring that the model focuses on the most informative aspects of the URLs. Finally, the process concludes with a fully connected layer followed by a softmax function.
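As a toy sketch of this final stage, a fully connected (affine) map followed by a softmax turns the attention output into probabilities over the two classes; the dimensions and weights below are arbitrary placeholders, not trained values:

```python
import math

def dense_softmax(features, weights, biases):
    """Final classification stage: an affine (fully connected) layer
    producing one logit per class, followed by a softmax that maps
    the logits to class probabilities (benign vs. malicious)."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)                       # shift for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = dense_softmax([0.4, -0.1, 0.7],
                      weights=[[0.2, 0.5, -0.3], [-0.1, 0.4, 0.6]],
                      biases=[0.0, 0.1])
print(probs)
```

The softmax guarantees the two outputs are non-negative and sum to 1, so they can be read directly as class probabilities.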
This combination is responsible for the final classification, allowing the model to categorize URLs based on the features learned during the training phase.

Figure 2: DA-BiGRU architecture [7]

4. Results

In this section, we present the outcomes obtained during the model evaluation phase. The model was trained for 30 epochs, employing binary cross-entropy as the loss function, complemented by the Adam optimizer with a learning rate of 10^-3. Throughout each epoch, we monitored key performance indicators, namely loss, accuracy, precision, and recall, for both the training and validation datasets. A graphical representation of these metrics can be found in Figure 3. It is worth highlighting that the model checkpoint is selected on the basis of the best validation loss, rendering any overfitting tendencies in the concluding epochs inconsequential.

Figure 3: Model's loss and precision over epochs

The output of the model lies in the range between 0 and 1, where a higher value indicates a greater likelihood that a given sample is an anomaly. To determine which samples to categorize as anomalies, a specific threshold was defined. In this scenario, the precision metric, i.e., the ratio between true positive instances and the set of instances predicted as positive, assumes paramount importance. The primary objective was to favour precision over recall, with the intent of limiting the number of false positives and preventing suboptimal resource allocation. After a weighted analysis, it was concluded that a threshold of 0.99 represents the ideal balance, classifying samples as anomalies when their predicted probability exceeds this value.
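The thresholding step and the resulting precision can be sketched as follows; the scores and ground-truth labels are toy values, while 0.99 is the threshold selected above:

```python
def precision_at_threshold(scores, labels, threshold=0.99):
    """Flag a sample as an anomaly when its score exceeds the threshold,
    then compute precision = TP / (TP + FP) over the flagged set.
    Returns 1.0 by convention when nothing is flagged."""
    preds = [s > threshold for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    return tp / (tp + fp) if (tp + fp) else 1.0

scores = [0.995, 0.999, 0.40, 0.992, 0.10]   # model outputs (toy values)
labels = [True,  True,  False, False, False]  # ground-truth anomalies
print(precision_at_threshold(scores, labels))
```

Lowering the threshold flags more samples, which tends to raise recall at the cost of precision; this is exactly the trade-off discussed for the 0.5 setting below.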
The choice of this threshold aligns with the aim of ensuring a high level of reliability in anomaly detection while reducing the risk of excluding valuable resources due to erroneous identifications. In Table 2, we provide a detailed exposition of the metrics computed on the two distinct test samples. The composition of the second test sample was intentionally skewed, encompassing a mere 5% anomalies. This imbalanced sample was curated to test the model's proficiency in anomaly detection under conditions that emulate real-world scenarios. Upon scrutinizing the outcomes, it is evident that on the first dataset our model manifests commendable efficacy, accurately categorizing 94% of websites and identifying 87% of malicious websites. Moreover, the probability that websites flagged as malicious by the model are indeed malicious stands at an impressive 94.5%. On the second, highly imbalanced dataset, our model sustains elevated levels of accuracy and precision, with both metrics surpassing the 90% mark. However, a marked drop in recall is discernible. This attenuation in recall is attributable to our selection of the threshold, a parameter that offers room for optimization depending on specific objectives. To illustrate, by calibrating the threshold to 0.5, we attain a recall of 80%; this recalibration, however, lowers precision to 88%. The determination of an optimal threshold necessitates a strategic balance between recall and precision, contingent upon the requirements and limitations of the application in question.

Table 2
Summary of the model's metrics

Dataset    Accuracy    Precision    Recall
Test 1     0.9404      0.9456       0.8728
Test 2     0.9220      0.9275       0.6889

5.
Conclusions

In conclusion, our investigations have illuminated significant insights regarding the enhancement of security measures applied to digital educational resources in today's interconnected online environment. While the Internet stands as a pivotal medium for education and communication, it subjects us to ever-evolving cyber threats, necessitating the adoption of proactive strategies to counter potential malicious activities. In response to this challenge, our research group has devised an advanced firewall system specifically aimed at bolstering the security of educational content. Despite the widespread availability of open-source educational resources, the absence of adequate security controls has rendered such resources susceptible to exploitation. Our analysis has centered on the adoption of a Bidirectional Gated Recurrent Unit (BiGRU) attention model, expressly designed for the identification of potentially harmful web addresses. Leveraging the capabilities of bidirectional processing and attention mechanisms, the proposed methodology has showcased considerable potential in distinguishing between innocuous and potentially dangerous URLs. The results obtained underscore the value of advanced machine learning methodologies in the realm of cybersecurity for educational resources. Such integration has facilitated significant advancements in strengthening the digital learning environment. Looking forward, the importance of continuous optimization of our model is evident, along with the need to modulate detection thresholds based on the specific security requirements of educational platforms and various digital contexts. As we continue refining our approach, we remain committed to enhancing security measures in the digital age, with the aim of ensuring that educators and learners can optimally utilize online resources in a context of trust and safety.

References

[1] Bhatia, Meghna, and J. K. Maitra.
"E-learning platforms security issues and vulnerability analysis." 2018 International Conference on Computational and Characterization Techniques in Engineering & Sciences (CCTES). IEEE, 2018.
[2] Tamjidyamcholo, Alireza, et al. "Evaluation model for knowledge sharing in information security professional virtual community." Computers & Security 43 (2014): 19–34.
[3] Malak Aljabri, Hanan S. Altamimi, Shahd A. Albelali, Maimunah Al-Harbi, Haya T. Alhuraib, Najd K. Alotaibi, Amal A. Alahmadi, Fahd Alhaidari, Rami Mustafa A. Mohammad, and Khaled Salah. "Detecting Malicious URLs Using Machine Learning Techniques: Review and Research Directions." IEEE Access 10 (2022): 121395–121417. DOI: 10.1109/ACCESS.2022.3222307.
[4] Nureni Ayofe Azeez, Balikis Bolanle Salaudeen, Sanjay Misra, Robertas Damaševičius, and Rytis Maskeliūnas. "Identifying Phishing Attacks in Communication Networks Using URL Consistency Features." International Journal of Electronic Security and Digital Forensics 12.2 (2020): 200–213. DOI: 10.1504/ijesdf.2020.106318.
[5] Ashley Laughter, Safwan Omari, Piotr Szczurek, and Jason Perry. "Detection of Malicious HTTP Requests Using Header and URL Features." In Advances in Digital Forensics XVI (2021): 449–468. DOI: 10.1007/978-3-030-63089-8_29.
[6] Hou, Yung-Tsung, et al. "Malicious web content detection by machine learning." Expert Systems with Applications 37.1 (2010): 55–60.
[7] Tiefeng Wu, Miao Wang, Yunfang Xi, and Zhichao Zhao. "Malicious URL Detection Model Based on Bidirectional Gated Recurrent Unit and Attention Mechanism." Applied Sciences 12.23 (2022): 12367. DOI: 10.3390/app122312367.
[8] Musser, Micah, and Ashton Garriott.
"Machine learning and cybersecurity." Center for Security and Emerging Technology, Washington, DC, USA (2021).
[9] Manu Siddhartha. Malicious URLs dataset, version 1 (2016). Retrieved in 2023 from https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset.
[10] antonyj. Malicious_n_Non-Malicious URL, version 1 (2017). Retrieved in 2023 from https://www.kaggle.com/datasets/antonyj453/urldataset.
[11] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation." Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar (2014): 1724–1734. DOI: 10.3115/v1/D14-1179.
[12] Pengpeng Li, An Luo, Jiping Liu, Yong Wang, Jun Zhu, Yue Deng, and Junjie Zhang. "Bidirectional Gated Recurrent Unit Neural Network for Chinese Address Element Segmentation." ISPRS International Journal of Geo-Information 9.11 (2020): 635. DOI: 10.3390/ijgi9110635.
[13] Chorowski, Jan K., et al. "Attention-based models for speech recognition." Advances in Neural Information Processing Systems 28 (2015).
[14] Bahdanau, Dzmitry, et al. "End-to-end attention-based large vocabulary speech recognition." 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016.
[15] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. "Efficient Estimation of Word Representations in Vector Space." Proceedings of the ICLR Workshop, 2013.