=Paper=
{{Paper
|id=Vol-3682/Paper13
|storemode=property
|title=Leveraging Cloud Computing for Drug Review Analysis
|pdfUrl=https://ceur-ws.org/Vol-3682/Paper13.pdf
|volume=Vol-3682
|authors=Shiva Teja Pecheti,Nithin Kodurupaka,Basavadeepthi H M,Talari Tanvi,Beena B M
|dblpUrl=https://dblp.org/rec/conf/sci2/PechetiKMTM24
}}
==Leveraging Cloud Computing for Drug Review Analysis==
<pdf width="1500px">https://ceur-ws.org/Vol-3682/Paper13.pdf</pdf>
<pre>
                                Leveraging Cloud Computing for Drug Review Analysis
                                Shiva Teja Pecheti(1,*), Nithin Kodurupaka(1), Basavadeepthi H M(1), Talari Tanvi(1), and
                                Dr. Beena B.M.(1)

                                (1) Department of Computer Science & Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham,
                                Bangalore, Karnataka, India, 560035

                                               Abstract
                                               This research focuses on creating a Drug Information and Recommendation
                                               System using Amazon Web Services (AWS) for sentiment analysis of
                                               pharmaceutical reviews. The methodology encompasses data collection,
                                               preprocessing, sentiment analysis, and drug prediction. AWS services such as
                                               EC2, S3, and IAM are utilized to ensure compatibility, scalability, and security.
                                               The resulting platform, Sentiment AI, is deployed on AWS, with efficient
                                               resource utilization and real-time monitoring through CloudWatch. This
                                               work is a step forward in pharmaceutical sentiment analysis, using cloud
                                               computing to provide reliable and effective insights into user perceptions
                                               of medications.

                                               Keywords
                                               Sentiment analysis, Amazon Web Services (AWS), Natural Language Processing (NLP), User-
                                               generated reviews, Cloud services, Pharmaceuticals.


                                1. Introduction
                                   Combining state-of-the-art technology with careful data analysis could transform
                                how people learn about pharmaceutical products in the fast-changing healthcare industry.
                                The goal of this research is to build a drug information and recommendation system that
                                combines cloud computing with sentiment analysis and drug prediction. The primary aim is
                                to give consumers and healthcare professionals a tool that extracts useful information
                                from a large collection of pharmaceutical reviews and so supports better-informed
                                decision-making. Our approach rests on the premise that cloud computing, in particular
                                Amazon Web Services (AWS), can improve the security, scalability, and effectiveness of
                                the sentiment analysis models used to analyze medication reviews. We argue that
                                integrating cloud services provides a complete solution for retrieving pharmacological
                                information, while also strengthening technical capabilities and streamlining deployment
                                and management.


                                Symposium on Computing & Intelligent Systems (SCI), May 10, 2024, New Delhi, INDIA
                                ∗ Corresponding author.
                                † These authors contributed equally.

                                   shivatejapecheti@gmail.com (S. T. Pecheti); nithinkodurupaka@gmail.com (N. Kodurupaka);
                                basavadeepthihm@gmail.com (B. H. M); tanvitalari2002@gmail.com (T. Tanvi); bm_beena@blr.amrita.edu (Dr.
                                B. B.M.)
                                             © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

                                CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

    The methodology starts with careful collection of data from several web platforms,
which lays the groundwork for a precise preprocessing stage. This phase removes
unnecessary words such as stop words, tokenizes the text, converts it to lowercase, and
strips HTML tags. Its primary objective is to improve the consistency and quality of the
textual data, which provides an appropriate foundation for further analysis. The system
has two primary tasks: sentiment analysis and drug prediction. Sentiment analysis uses
natural language processing (NLP) techniques to identify the sentiment of each drug review
and categorize it as positive, neutral, or negative. The drug prediction module adds a
layer of predictive insight by using machine learning models to classify drugs according
to features extracted from the preprocessed text. The outputs of drug prediction and
sentiment analysis are combined and stored in CSV files, which form essential parts of the
Drug Information and Recommendation System. Most importantly, we also investigate cloud
computing, focusing specifically on AWS services. The sentiment analysis pipeline is
restructured so that it integrates easily with the AWS architecture, and because its parts
are arranged into modular components, deployment can be accelerated. System efficiency is
improved by optimizing data storage configurations in Amazon Simple Storage Service (S3)
together with customized settings for Amazon Elastic Compute Cloud (EC2) instances. Beyond
these technical developments, the study highlights the importance of the user interface.
The application, Sentiment AI, serves as the interface through which users engage with the
large body of data obtained from pharmaceutical reviews. Its capabilities include
retrieving comprehensive information about a drug and looking up medication suggestions
for a given medical condition. Cloud-backed analytics and a user-friendly interface
together offer an opportunity to bridge the knowledge gap between end users and complex
data, making important insights available to a wider audience.
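The preprocessing steps named above (HTML removal, lowercasing, special-character removal, tokenization, stop-word removal) can be sketched in plain Python. This is a minimal sketch, not the authors' code. The stop-word list is a small hypothetical sample (NLTK's English list is much larger), and tokenization is a simple whitespace split after regex cleaning.

```python
import html
import re

# Hypothetical, deliberately small stop-word list; a real pipeline would likely use NLTK's.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "was", "it", "this", "for", "of", "to", "i", "my"}

TAG_RE = re.compile(r"<[^>]+>")            # crude matcher for HTML tags
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")  # anything that is not a lowercase letter, digit, or space


def preprocess_review(text: str) -> list[str]:
    """Clean a raw review and return its content tokens."""
    text = TAG_RE.sub(" ", text)        # strip HTML tags
    text = html.unescape(text)          # decode entities such as &amp;
    text = text.lower()                 # normalise case
    text = NON_ALNUM_RE.sub(" ", text)  # drop punctuation and special characters
    tokens = text.split()               # whitespace tokenisation
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess_review("<p>This drug WORKED great for my migraine!</p>")` yields `["drug", "worked", "great", "migraine"]`.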


2. Related Work
   The reviewed literature covers a broad range of advances in computational biology,
bioinformatics, and healthcare applications. Several studies aim to improve drug discovery
and the prediction of drug-related interactions. Computational pipelines such as DCMGCN,
NGDTP, and DDI-IS-SL integrate multiple data sources and methods to predict novel drug
combinations, drug-target interactions, and drug-drug interactions, and they report better
performance than existing methods. Other work focuses on personalized recommendation. One
example is GCFM, which uses graph-convolved factorization machines for interpretable and
effective recommendations in financial product scenarios [1, 2, 3, 4, 5]. For medical
emergencies, machine-learning-driven drug recommendation systems, such as the one proposed
by Silpa et al., offer accurate suggestions based on patient symptoms and conditions [6].
There is also a growing emphasis on integrating artificial intelligence and cloud computing
in healthcare, as in Gupta and Sharma's study of cloud-based solutions for healthcare data
analytics [7]. The literature also covers disease prediction and drug recommendation
prototypes, such as the multi-approach model of Nayak et al., which shows the potential of
machine learning to provide efficient, personalized healthcare solutions [8]. For diabetes
prediction, the Adaptive Weighted Decision Forest algorithm has proven effective in
analyzing vital-condition data [9]. DeepDrug is computer-aided drug design software that
uses artificial intelligence to speed up the identification of new compounds [10].
Deep attention neural networks, as in Liu et al.'s DANN-DDI, improve drug-drug interaction
prediction [11]. DeepSide, a deep learning approach to drug side-effect prediction, stands
out for its multi-modal architecture, which gives accurate and interpretable predictions
[12]. Galeano et al.'s framework for predicting the frequencies of drug side effects and
Zhang et al.'s FS-MLKNN method contribute to drug risk-benefit assessment and multi-label
learning [13, 14]. The role of AI is also evident in Leora, an AI-powered chatbot for
mental health support, which shows how digital mental health services can address global
public health concerns [15]. Swathi et al. proposed a method for predicting drug side
effects from data collected on open health forums, using UiPath for data extraction and
machine learning for classification [16]. Similarly, Palanivinayagam and Sasikumar
introduced a model that identifies diseases from symptoms and recommends drugs with
minimal side effects, optimizing the recommendation model for efficiency [17]. Roy et al.
studied drug-resistant ovarian cancer, using transcriptional profiling to examine gene
co-expression and uncover resistance mechanisms, with a particular focus on PARP inhibitor
resistance [18]. Thomas and Zachariah carried out a computational analysis of
4-phenyl-4H-chromene derivatives as anticancer and anti-inflammatory agents, identifying
compounds with promising drug-like properties [19]. Radhika et al. explored in silico
analysis of nano polyamidoamine (PAMAM) dendrimers for cancer drug delivery, emphasizing
computational methods in drug development and predictive modeling for drug discovery [20].
Together with work on predictive cheminformatics [21], these studies advance predictive
modeling, drug recommendation systems, and the understanding of drug resistance mechanisms.


3. Methodology
  The system methodology comprises several steps that combine data collection, analysis,
modeling, and prediction within a structured framework, as shown in Fig. 1.


3.1.    Data Collection and Preparation
   In the initial phase of our study, we compiled diverse datasets containing essential
information about drugs. This included links between drug names and specific medical
conditions, drawn from medical databases and pharmaceutical records. We also gathered
sentiment scores or reviews associated with various drugs, along with average user ratings
that reflect general feedback on the effectiveness and side effects of different
medications. Once this information was collected, the next step was data cleansing. We ran
a consistency check to ensure the accuracy of drug-condition associations, sentiment
scores, and user ratings. Missing values were handled either by imputing them or by
removing incomplete entries, according to quality standards. Data formats and structures
were standardized across datasets to allow integration. Normalization was applied to
selected fields so that they share a uniform scale for comparative analysis.
                  Figure 1: Block diagram of proposed research methodology


The process also removed duplicate entries and redundant information that could skew
analytical results. Finally, the datasets were prepared in a structured format ready for
analysis, using tools such as Pandas in Python. This compilation and cleansing of the
datasets establishes a robust foundation for the subsequent phases of analysis and
modeling.
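The cleansing steps above can be sketched with the standard library. The same operations map onto pandas `DataFrame.dropna`, `fillna`, and `drop_duplicates`. This is an illustrative sketch under stated assumptions, not the authors' pipeline:

- the record keys `drug`, `condition`, and `rating` are hypothetical;
- the choice of mean imputation and min-max normalization is an assumption, since the paper does not say which imputation or normalization was used.

```python
from statistics import mean


def clean_records(records):
    """Drop incomplete rows, impute missing ratings, de-duplicate, and min-max normalise.

    Each record is a dict with hypothetical keys 'drug', 'condition', and 'rating'.
    """
    # 1. Remove rows that lack the fields needed to link a drug to a condition.
    rows = [dict(r) for r in records if r.get("drug") and r.get("condition")]

    # 2. Impute missing ratings with the mean of the observed ratings (an assumed strategy).
    observed = [r["rating"] for r in rows if r.get("rating") is not None]
    fill = mean(observed) if observed else 0.0
    for r in rows:
        if r.get("rating") is None:
            r["rating"] = fill

    # 3. Drop duplicates, comparing drug and condition case- and whitespace-insensitively.
    seen, unique = set(), []
    for r in rows:
        key = (r["drug"].strip().lower(), r["condition"].strip().lower(), r["rating"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 4. Min-max normalise ratings to [0, 1] so different sources share one scale.
    if unique:
        lo = min(r["rating"] for r in unique)
        hi = max(r["rating"] for r in unique)
        span = hi - lo
        for r in unique:
            r["rating_norm"] = (r["rating"] - lo) / span if span else 0.0
    return unique
```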


3.2.    Sentiment Analysis
   In the Sentiment Analysis phase, we apply Natural Language Processing (NLP) techniques
using the VADER (Valence Aware Dictionary and sEntiment Reasoner) module available
through NLTK (Natural Language Toolkit). These techniques determine the sentiment
expressed in user-generated drug reviews. VADER is lexicon- and rule-based: it assigns
polarity scores to individual words and phrases in a review and combines them into an
overall score. That score is used to label the review as positive, negative, or neutral.
Beyond labeling individual reviews, the aim is to understand how user perceptions and
experiences differ across drugs. This means examining how users express satisfaction or
concern about the efficacy, side effects, or overall impact of specific medications.
Analyzing these sentiments across the dataset reveals sentiment trends, commonly reported
issues, and notably positive experiences. These insights help characterize the user
landscape and can guide future decision-making in healthcare, pharmaceuticals, and
patient care.
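The labeling step can be sketched as follows. NLTK's `SentimentIntensityAnalyzer.polarity_scores(text)` returns a dict with `neg`, `neu`, `pos`, and `compound` keys. The cut-offs of ±0.05 on `compound` are the thresholds suggested by VADER's authors. The paper does not state which thresholds it used, so they are an assumption here. The scoring function accepts any object with a `polarity_scores` method, so it can be exercised without downloading the lexicon.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a three-way sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def label_reviews(reviews, analyzer):
    """Score each review with an object exposing VADER's polarity_scores(text) method."""
    results = []
    for text in reviews:
        scores = analyzer.polarity_scores(text)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
        results.append((text, scores["compound"], label_from_compound(scores["compound"])))
    return results


def make_vader_analyzer():
    """Build NLTK's VADER analyzer (requires nltk and a one-time lexicon download)."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer()
```

With NLTK installed, `label_reviews(reviews, make_vader_analyzer())` returns `(text, compound, label)` triples. These triples can be written to CSV alongside the review identifiers.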


3.3.    Web Scraping for Additional Information
    In the Web Scraping for Additional Information stage, we use web scraping tools,
notably Selenium, to extract data from online platforms such as 1mg.com. The scraper
navigates the website's pages and targets the sections that contain information about
individual drugs. We collect drug names, their generic equivalents, and the alternative
brands listed on the platform. Generic counterparts are important for understanding the
full range of equivalent medications available. Alternative brands, considered together
with their user ratings, serve as viable substitutes for the same medication. Web scraping
streamlines data collection and yields an extensive, relevant dataset from reputable
online sources. This consolidated dataset is a foundational asset for our research,
providing richer context for analyzing drug-related information and user preferences.
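A Selenium extraction step might look like the sketch below. The CSS selectors are hypothetical placeholders. The paper does not describe 1mg.com's page structure, and the real markup would have to be inspected before using anything like this. The function uses Selenium 4's `driver.get(url)` and `driver.find_elements(by, value)`, with `"css selector"` being the value of `By.CSS_SELECTOR`.

```python
# Value of selenium.webdriver.common.by.By.CSS_SELECTOR; using the string keeps this
# sketch importable without Selenium installed.
CSS = "css selector"

# Hypothetical selectors for illustration only; the live site's markup will differ.
SELECTORS = {
    "name": "h1",
    "generic": ".generic-name",
    "alternatives": ".alternative-brand",
}


def scrape_drug_page(driver, url, selectors=SELECTORS):
    """Load one product page and return the drug name, generic name, and alternative brands."""
    driver.get(url)
    name = driver.find_elements(CSS, selectors["name"])
    generic = driver.find_elements(CSS, selectors["generic"])
    alternatives = driver.find_elements(CSS, selectors["alternatives"])
    return {
        "drug": name[0].text.strip() if name else None,
        "generic": generic[0].text.strip() if generic else None,
        # Skip empty elements so blank placeholders do not become fake brands.
        "alternatives": [el.text.strip() for el in alternatives if el.text.strip()],
    }
```

With a real browser, the driver would come from, for example, `selenium.webdriver.Chrome()`. Returning plain dicts keeps the scraped records easy to merge with the review data in the fusion step.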


3.4.    Data Fusion and Integration
   The Data Fusion and Integration phase merges the diverse datasets, in particular the
sentiment analysis outputs and the user ratings, into one consolidated dataset that covers
both drug performance and user sentiment. First, the sentiment outputs produced with VADER
from NLTK, which capture the emotional tone of each user review, are attached to the
existing dataset. This sentiment-enriched dataset is then combined with the user ratings
dataset, aligning the sentiment expressed in reviews with the numerical ratings provided
by users. The integration brings together qualitative sentiment and quantitative ratings,
so each drug can be assessed from both perspectives. The resulting enriched dataset
supports a holistic assessment of drug performance that places user sentiment alongside
numerical ratings. This contributes to a more comprehensive evaluation of different
medications.
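The fusion step can be sketched as follows. The sketch takes two simplifying assumptions:

- each sentiment row and each rating share a hypothetical `review_id`;
- reviews without a rating are dropped, as in an inner join.

It then summarizes sentiment and rating per drug. With pandas, the equivalent would be a `DataFrame.merge` on the review key followed by `groupby`. The paper does not specify the join key or the aggregation.

```python
from collections import defaultdict
from statistics import mean


def fuse(sentiment_rows, rating_rows):
    """Join per-review sentiment with per-review ratings and summarise by drug.

    sentiment_rows: iterable of (review_id, drug, compound_score)
    rating_rows:    iterable of (review_id, rating)
    Returns {drug: {"n": count, "mean_compound": ..., "mean_rating": ...}}.
    """
    ratings = dict(rating_rows)  # review_id -> rating
    by_drug = defaultdict(lambda: {"compound": [], "rating": []})
    for review_id, drug, compound in sentiment_rows:
        if review_id not in ratings:  # inner join: skip reviews that have no rating
            continue
        by_drug[drug]["compound"].append(compound)
        by_drug[drug]["rating"].append(ratings[review_id])
    return {
        drug: {
            "n": len(v["rating"]),
            "mean_compound": mean(v["compound"]),
            "mean_rating": mean(v["rating"]),
        }
        for drug, v in by_drug.items()
    }
```

Keeping both averages side by side makes it easy to spot drugs whose review text and numerical ratings disagree.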


3.5.  Model Development
   The Drug Information and Recommendation System has two primary models: sentiment
analysis and drug prediction. The sentiment analysis model evaluates the sentiment
expressed in drug reviews. Data is first collected from multiple online sources. The text
is then cleaned through tokenization, lowercasing, and removal of HTML tags and special
characters. The model uses NLP techniques to assign sentiment scores and classify each
review as positive, negative, or neutral. In parallel, the drug prediction model
classifies medicines according to features extracted from the preprocessed text. The two
models share the same dataset and preprocessing steps, so together they capture both the
emotional tone of reviews and the categorization of drugs. Their outputs are combined,
stored as CSV files, and used as the basis for the Drug Information and Recommendation
System. Cloud computing services are integrated to improve analytical power and
usability. Model evaluation uses accuracy, mean squared error (MSE), and the confusion
matrix, which together cover both classification and regression performance. These
metrics show how well the models predict drug-related categories and how closely user
sentiment aligns with numerical ratings. This supports the use of the models for drug
performance assessment.
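The three evaluation metrics named above can be written out directly. This plain-Python sketch is equivalent in intent to scikit-learn's `accuracy_score`, `mean_squared_error`, and `confusion_matrix`. It follows the same convention: rows are true labels and columns are predicted labels. It is illustrative, not the authors' evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference, e.g. between predicted and actual ratings."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """Count matrix whose rows are true labels and columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix
```

For true labels `["pos", "neg", "neu", "pos"]` and predictions `["pos", "neg", "pos", "neg"]`, accuracy is 0.5. With label order `["neg", "neu", "pos"]`, the confusion matrix is `[[1, 0, 0], [0, 0, 1], [1, 0, 1]]`, which shows where the misclassified reviews went.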


4. Experimental Results
    The drug prediction task predicts the drug from the text of a review. Training the
neural network for 30 epochs shows significant improvement, as shown in Fig. 2. However,
validation accuracy reaches only 50.55% against a training accuracy of 61.99%, and this gap
suggests overfitting. Overfitting is a common challenge and leaves room for optimization
through regularization strategies or model adjustments. Training is computationally
efficient, taking 16 seconds per batch.
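The reported gap is 61.99% − 50.55% = 11.44 percentage points. Early stopping on validation accuracy is one common remedy. It is offered here as an example of the "model adjustments" mentioned, not as something the paper applied. In Keras this would be `tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=..., restore_best_weights=True)`. The framework-free sketch below shows the logic. The `patience` and `min_delta` values are assumptions.

```python
def early_stopping_epoch(val_acc, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a history of per-epoch validation accuracies.

    Training stops once validation accuracy has failed to beat the best value so far
    by more than min_delta for `patience` consecutive epochs. Epochs are 1-indexed.
    """
    best, best_epoch, waited = float("-inf"), 0, 0
    for epoch, acc in enumerate(val_acc, start=1):
        if acc > best + min_delta:
            best, best_epoch, waited = acc, epoch, 0  # new best: reset the counter
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_acc)  # never triggered: ran to the last epoch
```

Take a hypothetical history that peaks at epoch 3 and then declines, such as `[0.40, 0.45, 0.50, 0.49, 0.48, 0.47, 0.46]`. The function returns `(3, 6)`, so the weights from epoch 3 would be kept.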


              Figure 2: Drug Prediction task training and validation accuracy


   The sentiment analysis task predicts whether a review has a positive, neutral, or
negative sentiment. The neural network performed well, reaching about 88.20% accuracy on
the validation set, as shown in Fig. 3. Neural networks suit this task because they can
capture complex, non-linear patterns in the data. The selected architecture, with several
deep layers, gives flexibility in learning from the input features.
            Figure 3: Sentiment analysis task training and validation accuracy


4.1.    Cloud Deployment:
   Model and Application Structuring: The model architecture and application design
have been refactored to ensure compatibility with AWS infrastructure. The components
have been organized into modular and scalable units, allowing for easy deployment and
management on AWS. This modular approach enables efficient utilization of cloud
resources and simplifies maintenance and updates.


                             Figure 4: Creating EC2 Instance


   EC2: Customized settings and configurations specific to Amazon Elastic Compute Cloud
(EC2) have been implemented to ensure optimal performance and scalability. The EC2
instances have been provisioned with appropriate specifications and resources to support
the sentiment analysis model and application requirements. This includes defining instance
types, storage options, and network configurations.
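The paper does not show how the EC2 settings were applied, and Figure 4 suggests the console was used. A scripted alternative with boto3 is sketched below. The following values are hypothetical placeholders, not the authors' configuration:

- the AMI ID;
- the instance type `t3.medium`;
- the key pair, security groups, and instance profile name;
- the `Project` tag.

The launch function needs boto3 and AWS credentials. The parameter builder does not.

```python
def ec2_launch_params(ami_id, instance_type="t3.medium", key_name=None,
                      security_group_ids=(), instance_profile=None):
    """Build keyword arguments for boto3's EC2 client run_instances call."""
    params = {
        "ImageId": ami_id,
        "InstanceType": instance_type,  # assumed size; choose to fit the model's memory needs
        "MinCount": 1,
        "MaxCount": 1,
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "Project", "Value": "SentimentAI"}],
        }],
    }
    if key_name:
        params["KeyName"] = key_name
    if security_group_ids:
        params["SecurityGroupIds"] = list(security_group_ids)
    if instance_profile:
        # An instance profile lets the application read S3 without storing access keys.
        params["IamInstanceProfile"] = {"Name": instance_profile}
    return params


def launch_instance(params, region):
    """Launch one instance and return its ID (requires boto3 and AWS credentials)."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]
```

Separating parameter construction from the API call keeps the configuration reviewable and testable without touching a live account.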


                            Figure 5: AWS S3 Bucket creation
   Data Storage Setup: Amazon Simple Storage Service (S3) has been utilized to establish
storage buckets for securely storing model artifacts and datasets. The S3 buckets have been
configured with appropriate access policies and encryption settings to protect the data at
rest and in transit. Versioning has been enabled to maintain data consistency and integrity,
allowing for easy retrieval and management of different versions of datasets and model
artifacts.
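The three S3 settings described above map onto real boto3 S3 client calls:

- versioning: `put_bucket_versioning`;
- default encryption at rest: `put_bucket_encryption`;
- a bucket policy that denies non-HTTPS requests to protect data in transit: `put_bucket_policy`.

The sketch below is one way to apply them, assuming SSE-S3 (`AES256`) encryption. The paper does not say which encryption mode was chosen, and any bucket name is a placeholder.

```python
import json


def secure_transport_policy(bucket):
    """Bucket policy that rejects any request not made over HTTPS (data in transit)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


def harden_bucket(bucket):
    """Enable versioning, default encryption at rest, and HTTPS-only access (needs boto3)."""
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(Bucket=bucket, VersioningConfiguration={"Status": "Enabled"})
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={
            "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
        },
    )
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(secure_transport_policy(bucket)))
```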


                                Figure 6: Creating IAM Role


   IAM Policies and Access Control: IAM policies have been configured to manage user
permissions and restrict access, ensuring the security of sensitive data and resources.
Access controls have been defined to limit privileges and enforce data confidentiality and
system integrity. This includes assigning roles and permissions to different user groups and
implementing multi-factor authentication for enhanced security.
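"Limiting privileges" usually means a least-privilege policy document. The example below lets the application list and read objects under one prefix of the artifacts bucket and grants nothing else. The following are hypothetical:

- the bucket name;
- the `models/` prefix;
- the role name;
- the policy name.

The paper does not list its actual policies. Attaching the policy uses boto3's IAM `put_role_policy` call.

```python
import json


def read_only_artifacts_policy(bucket, prefix="models/"):
    """Least-privilege policy: list one prefix of a bucket and read objects under it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListModelPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                "Sid": "ReadModelObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


def attach_to_role(role_name, bucket):
    """Attach the policy inline to an existing IAM role (requires boto3 and credentials)."""
    import boto3

    boto3.client("iam").put_role_policy(
        RoleName=role_name,
        PolicyName="SentimentAIReadArtifacts",  # hypothetical policy name
        PolicyDocument=json.dumps(read_only_artifacts_policy(bucket)),
    )
```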


                             Figure 7: Cloud Watch monitoring


   Monitoring and Management Configuration: AWS CloudWatch has been set up to
monitor system metrics, log data, and performance indicators. This enables proactive
management by tracking resource utilization, identifying bottlenecks, and detecting
anomalies. Alerts and notifications have been configured to promptly notify administrators
of critical events or performance deviations, facilitating timely responses and issue
resolution.
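An alert of the kind described might be a CloudWatch alarm on EC2 CPU utilization that notifies an SNS topic. The sketch builds arguments for boto3's `put_metric_alarm`. The following are assumptions, not the authors' settings:

- the 80% threshold;
- the two 5-minute evaluation periods;
- the instance ID;
- the SNS topic ARN.

```python
def cpu_alarm_params(instance_id, sns_topic_arn, threshold=80.0):
    """Arguments for CloudWatch put_metric_alarm: average EC2 CPU above `threshold`
    percent for two consecutive 5-minute periods triggers the SNS notification."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per evaluation period
        "EvaluationPeriods": 2,   # consecutive breaching periods before alarming
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params):
    """Create or update the alarm (requires boto3 and AWS credentials)."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)
```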
                                 Figure 8: User Insights


   By structuring the model and application to align with AWS infrastructure, configuring
IAM policies for access control, setting up secure data storage with S3, and implementing
monitoring and management through CloudWatch, the sentiment analysis model can be
seamlessly deployed and managed on AWS as shown in Fig. 8. This ensures compatibility,
scalability, security, and efficient utilization of cloud resources, enabling reliable and
effective sentiment analysis for user reviews of pharmaceutical products.


5. Challenges
   One significant challenge was handling a wide range of user-generated content with
different languages, expressions, and emotions. This was addressed with robust Natural
Language Processing (NLP) techniques. Handling large volumes of data while maintaining
quality and integrity required effective data processing strategies and data cleaning
procedures. Ensuring model accuracy and relevance required continuous optimization,
including algorithm refinement, parameter tuning, and iterative improvement.


6. Conclusion
   In summary, a significant advancement in pharmaceutical sentiment analysis has been
made with the development of Sentiment AI, a Drug Information and Recommendation
System that is hosted on Amazon Web Services (AWS). Sentiment AI provides compatibility,
scalability, and security by utilizing AWS services including EC2, S3, and IAM. This allows
for effective resource use and real-time monitoring via AWS CloudWatch. Furthermore,
sentiment analysis is improved by the integration of complex machine learning (ML) and
deep learning (DL) models, offering insightful information on how users view
pharmaceuticals. This platform is a solid pharmaceutical sentiment analysis tool that has
the potential to revolutionize healthcare decision-making.


7. Future Scope
   Future scope involves addressing challenges related to data variability, volume handling,
and model accuracy improvement. Continuous optimization strategies, including refining
algorithms and incorporating feedback loops, are essential for enhancing predictive
accuracy. By harnessing sentiment insights, businesses can refine customer experiences,
healthcare providers can benefit from patient feedback, and policymakers can understand
public sentiment for responsive governance.


References
  [1] H. Chen, Y. Lu, Y. Yang, Y. Rao, A drug combination prediction framework based on
      graph convolutional network and heterogeneous information, IEEE/ACM
      Transactions on Computational Biology and Bioinformatics (2022).
  [2] P. Xuan, B. Chen, T. Zhang, Y. Yang, Prediction of drug–target interactions based on
      network representation learning and ensemble learning, IEEE/ACM Transactions on
      Computational Biology and Bioinformatics 18 (2020) 2671–2681.
  [3] C. Yan, G. Duan, Y. Zhang, F.-X. Wu, Y. Pan, J. Wang, Predicting drug-drug interactions
      based on integrated similarity and semi-supervised learning, IEEE/ACM
      Transactions on Computational Biology and Bioinformatics 19 (2020) 168–179.
  [4] Y. Zheng, P. Wei, Z. Chen, Y. Cao, L. Lin, Graph-convolved factorization machines for
      personalized recommendation, IEEE Transactions on Knowledge and Data
      Engineering (2021).
  [5] D. Parasar, A. Ali, N. M. Pillai, A. Shahi, B. S. Alfurhood, K. Pant, Detailed review on
      integrated healthcare prediction system using artificial intelligence and machine
      learning, in: 2023 3rd International Conference on Advance Computing and
      Innovative Technologies in Engineering (ICACITE), IEEE, 2023, pp. 682–685.
  [6] C. Silpa, B. Sravani, D. Vinay, C. Mounika, K. Poorvitha, Drug recommendation system
      in medical emergencies using machine learning, in: 2023 International Conference
      on Innovative Data Communication Technologies and Application (ICIDCA), IEEE,
      2023, pp. 107–112.
  [7] U. Gupta, R. Sharma, A study of cloud based solution for data analytics in healthcare,
     in: 2023 6th International Conference on Information Systems and Computer
     Networks (ISCON), IEEE, 2023, pp. 1–6.
 [8] S. K. Nayak, M. Garanayak, S. K. Swain, S. K. Panda, D. Godavarthi, An intelligent
     disease prediction and drug recommendation prototype by using multiple
     approaches of machine learning algorithms, IEEE Access (2023).
 [9] R. Han, A study of diabetes prediction based on adaptive weighted decision forest,
     in: 2023 8th International Conference on Cloud Computing and Big Data Analytics
     (ICCCBDA), IEEE, 2023, pp. 203–207.
[10] S. Mukhopadhyay, M. Brylinski, A. Bess, F. Berglind, C. Galliano, P. F. McGrew,
     DeepDrug: Applying AI for the advancement of drug discovery, in: 2022 14th
     International Conference on COMmunication Systems & NETworkS (COMSNETS),
     IEEE, 2022, pp. 667–674.
[11] S. Liu, Y. Zhang, Y. Cui, Y. Qiu, Y. Deng, Z. Zhang, W. Zhang, Enhancing drug-drug
     interaction prediction using deep attention neural networks, IEEE/ACM
     Transactions on Computational Biology and Bioinformatics 20 (2022) 976–985.
[12] O. C. Uner, H. I. Kuru, R. G. Cinbis, O. Tastan, A. E. Cicek, DeepSide: a deep learning
     approach for drug side effect prediction, IEEE/ACM Transactions on Computational
     Biology and Bioinformatics 20 (2022) 330–339.
[13] D. Galeano, S. Li, M. Gerstein, A. Paccanaro, Predicting the frequencies of drug side
     effects, Nature Communications 11 (2020) 4575.
[14] W. Zhang, F. Liu, L. Luo, J. Zhang, Predicting drug side effects by multi-label learning
     and ensemble learning, BMC Bioinformatics 16 (2015) 1–11.
[15] E. L. van der Schyff, B. Ridout, K. L. Amon, R. Forsyth, A. J. Campbell, Providing self-
     led mental health support through an artificial intelligence–powered chat bot
     (Leora) to meet the demand of mental health care, Journal of Medical Internet
     Research 25 (2023) e46448.
[16] D. N. Swathi, et al., Predicting drug side-effects from open source health forums
     using supervised classifier approach, in: 2020 5th International Conference on
     Communication and Electronics Systems (ICCES), IEEE, 2020, pp. 796–800.
[17] A. Palanivinayagam, D. Sasikumar, Drug recommendation with minimal side effects
     based on direct and temporal symptoms, Neural Computing and Applications 32
     (2020) 10971–10978.
[18] S. Roy, J. Jeyalakshmi, S. Gochhait, S. Poonkuzhali, M. M. Gromiha, Metadata analysis
     to get insight into drug resistant ovarian cancer, Ingénierie des Systèmes
     d’Information 28 (2023).
[19] N. Thomas, S. M. Zachariah, In silico drug design and analysis of
     4-phenyl-4H-chromene derivatives as anticancer and antiinflammatory agents,
     International Journal of Pharmaceutical Sciences Review and Research 22 (2013) 50–54.
[20] R. Radhika, V. Rohith, N. Anil Kumar, K. Varun Gopal, P. Krishnan Namboori, O.
     Deepak, Insilico analysis of nano polyamidoamine (PAMAM) dendrimers for cancer
     drug delivery, Int. J. Recent Trends Eng. Technol 4 (2010) 142–144.
[21] N. Sukumar, M. P. Krein, M. J. Embrechts, Predictive cheminformatics in drug
     discovery: statistical modeling for analysis of micro-array and gene expression data,
     Bioinformatics and Drug Discovery (2012) 165–194.

</pre>