Analysis of Vulnerabilities in Hadoop Map Reduce Framework:
A Review
Shubham Jambhulkar a, Deepak Singh Tomar a and R K Pateriya a
a
    Maulana Azad National Institute of Technology, Bhopal, India


                 Abstract
                 Big Data is an assemblage of diverse hardware and software technologies forming a
                 heterogeneous infrastructure, and the Hadoop framework plays the main part in managing
                 and storing it. It provides intelligent, economical, and fast data applications in various areas
                 such as healthcare, social networks, and defence. The Hadoop framework is based on a
                 distributed streaming model and is used to manage and store data across a wide range of
                 commodity computers. Because of the flexibility of the system, several vulnerabilities arise.
                 These vulnerabilities are threats to the data and lead to attacks. In this paper, various kinds of
                 vulnerabilities are discussed and possible solutions are given to reduce or eliminate them.
                 The experimental setup used to perform common attacks, in order to understand their
                 behaviour and evaluate solutions for avoiding them, is presented. The results show the effect
                 of the attacks on performance. According to the results, there is a need to protect data using
                 defence-in-depth security.

                 Keywords
                 Big Data, Map-Reduce, Hadoop, Vulnerability, Kerberos

1. Introduction
   Big Data refers to collections of extremely large data sets [1] that are too complex or too big to be
handled by traditional data processing applications. For any data to be regarded as big data it must
satisfy the 4 V's, namely Velocity, Veracity, Volume, and Variety [3], [2]. With the advancement of
technology in today's world, a large amount of information is produced in various fields such as
social networking sites, transaction records, data sensors, log files, etc. Due to these various sources,
terabytes of structured, semi-structured, and unstructured data are produced at every point in time.
Therefore, if this data is not stored or pre-processed, there is a chance of losing important data. To
avoid this loss, the Hadoop framework is used with different analytics tools, which are often much
quicker than the conventional analytical methods of the past.

   Big data is a term closely associated with Hadoop. As previously discussed, for any data to be
considered big data it must satisfy the 4 V's, namely Velocity, Veracity, Volume, and Variety.
Advances in data have affected not only these four V's but also the privacy and security aspects of
data. Accordingly, the inclusion of another V, Vulnerability, has been proposed [4]. Figure 1
represents the different V's diagrammatically.




WCNC-2022: Workshop on Computer Networks and Communications, April 22 – 24, 2022, Chennai, India.
EMAIL: shubham.pj0806@gmail.com (Shubham Jambhulkar)
ORCID: 0000-0002-2934-3934 (Shubham Jambhulkar)
            © 2022 Copyright for this paper by its authors.
            Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
            CEUR Workshop Proceedings (CEUR-WS.org)


Figure 1: Big data representation of different V’s [2], [3]

   Big data extracts valuable insights from the data pool, and many countries are operating dominant
schemes on the basis of big data. As a result of such global-level schemes, many new models and
frameworks have been developed. Some frameworks were developed to provide a considerable
amount of storage capacity, real-time data analysis, and parallel processing of data [5]. One such
popular framework is Hadoop. The advantages offered by big data are vast: the technology offers
better scalability and flexibility at affordable rates. Following recent growth in sustainable
technology, the cost associated with processing and storage continues to decrease [6].

    Recently developed technology is designed to guarantee privacy and security better than
traditional technologies. But even with these advantages, it is increasingly exploited for negative
purposes. With the recent growth in the number of fields and organizations using this technology to
store and process private organizational data, it has become prone to malicious data attacks.

1.1.    Hadoop Framework
    Apache Hadoop provides a way to process very large or complex data sets in parallel over a
distributed cluster. The Hadoop framework provides advantages such as distributed computing and
parallel processing of data sets. Hadoop comprises components such as HDFS, Map Reduce, and
YARN: HDFS supervises storage, Map Reduce supervises parallel processing, and YARN is
responsible for resource management in the Hadoop cluster.

1.1.1. Hadoop
   Hadoop first appeared in 2005 and reached its 1.0 release in 2011, having been developed to
support Yahoo's web search engine [7]. Early releases had very little security support, being made for
users trusted within a closed environment. Hadoop has since emerged as one of the modern
state-of-the-art technologies to store, process, and analyse large volumes of information using clusters
of commodity machines [8]. Its users have subsequently spread across the planet, mostly large
organizations [9].

1.1.2. Hadoop Distributed File System
   The Hadoop distributed file system is responsible for the storage mechanism of the Hadoop
framework and can operate without interruption. In HDFS, a large file is distributed over a cluster
network comprising multiple Data Nodes with attached storage. During this process the file is first
split by the Name Node into blocks, 64 MB in size by default, which are replicated on various Data
Nodes according to the placement rules. The Name Node additionally maintains the metadata for this
replication and distribution. Every data block is replicated multiple times for high availability,
typically two copies on one Data Node rack and one on a different rack, so each Data Node in the
cluster stores only a small portion of any file. The Name Node always records which data blocks
belong to which file, where the blocks are placed, and what the access rights are. Using periodic
signals (heartbeats), the Name Node invariably knows which Data Nodes are still available. When the
heartbeat is missing, the Name Node detects a Data Node failure, eliminates the failed Data Node
from the Hadoop cluster, and attempts to distribute the data load evenly across the remaining Data
Nodes. In this way the Name Node ensures that the specified number of copies of the data is kept
constant for high availability. The diagram below (Figure 2) shows the Hadoop distributed file system
architecture.




Figure 2: HDFS Architecture
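The heartbeat and re-replication behaviour described above can be sketched in a few lines of Python. This is an illustrative toy model under our own assumptions (class names, a 10-second timeout, replication factor 3), not Hadoop's actual implementation:

```python
import time

REPLICATION_FACTOR = 3     # default replication factor, as described above
HEARTBEAT_TIMEOUT = 10.0   # seconds of silence before a Data Node is presumed dead

class NameNode:
    """Toy model of the Name Node's heartbeat tracking and re-replication check."""
    def __init__(self):
        self.last_heartbeat = {}    # data node id -> time of last heartbeat
        self.block_locations = {}   # block id -> set of data node ids holding a replica

    def heartbeat(self, node_id, now=None):
        # Record a periodic signal from a Data Node.
        self.last_heartbeat[node_id] = time.time() if now is None else now

    def live_nodes(self, now):
        # A node is live if its last heartbeat is within the timeout window.
        return {n for n, t in self.last_heartbeat.items()
                if now - t <= HEARTBEAT_TIMEOUT}

    def under_replicated(self, now):
        # Blocks whose live replica count fell below the replication factor
        # must be copied to other Data Nodes.
        live = self.live_nodes(now)
        return {blk for blk, nodes in self.block_locations.items()
                if len(nodes & live) < REPLICATION_FACTOR}
```

A block held by three Data Nodes becomes under-replicated as soon as one of them misses its heartbeat window, which is when the real Name Node would schedule a new copy.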

1.1.3. Map Reduce
   MapReduce is a parallel processing framework that works on the master-slave principle, like the
Hadoop Distributed File System: a cluster consists of one master agent coordinating multiple slave
agents. Map Reduce likewise relies on distributed algorithms for coordination and failure handling. It
works in two stages, the map function and the reduce function. The JobTracker divides the input data
into separate chunks, called map tasks, and directs them to the Data Nodes, naturally spreading them
across all the commodity computers of the cluster for parallel processing.

   Let us have a look at the block diagram of the Map Reduce phases in Figure 3.




Figure 3: Phases in Map Reduce, namely Shuffling and Reducing

    Ordinarily, the map tasks run on the same set of Data Nodes where the data resides (data locality).
If a node is already heavily loaded, another node that is near the data, ideally one in the same rack, is
chosen. Intermediate results are not visible to the client; they are exchanged among the nodes
(shuffling) and thereafter merged by the reduce tasks to obtain the final result. Figure 4, shown below,
depicts the internal algorithm of Map Reduce.




Figure 4: Algorithm for the Map Reduce Function

Table 1 shows the internal key-value pair structure of the Map Reduce phases.

Table 1: Map Reduce phases Structure
Phase                          Input                                  Output
Mapper                         (Key, Value)                           (Key, Value)
Shuffle & Sort                 (Key, Value)                           (Key, list (Value))
Reducer                        (Key, list (Value))                    (Key, Value)
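The key-value flow in Table 1 can be illustrated with a minimal word-count sketch in Python. This is illustrative only; production Hadoop jobs are typically written in Java against the MapReduce API, and the function names here are our own:

```python
from collections import defaultdict

def map_phase(records):
    """Mapper: (key, value) in -> (key, value) out.
    Here: (byte offset, line of text) -> one (word, 1) pair per word."""
    for _, line in records:
        for word in line.split():
            yield word.lower(), 1

def shuffle_sort(pairs):
    """Shuffle & Sort: (key, value) -> (key, list(value)), grouped and sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(grouped):
    """Reducer: (key, list(value)) -> (key, value), aggregating each group."""
    return [(key, sum(values)) for key, values in grouped]

records = [(0, "Hadoop stores data"), (1, "Hadoop processes data")]
result = reduce_phase(shuffle_sort(map_phase(records)))
# result: [('data', 2), ('hadoop', 2), ('processes', 1), ('stores', 1)]
```

Each function's signature corresponds row by row to Table 1: the mapper emits (Key, Value) pairs, the shuffle groups them into (Key, list(Value)), and the reducer collapses each group back to a single (Key, Value).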

   Intermediate results of the map phase are buffered, keeping the volume of data transferred to the
reduce tasks as small as possible; they are stored in the file system of the nearest Data Node. The
JobTracker recovers from disruptions by carefully restarting the affected tasks. If a task does not
report any progress within a set time, or if a Data Node fails entirely, the whole task is restarted on
another server, including tasks that had not yet finished. If a task runs very slowly, the JobTracker
likewise restarts it on another server so that the overall job completes within a suitable period.

1.1.4. Yet Another Resource Negotiator
   MapReduce was split into two components: Yet Another Resource Negotiator (YARN) and Map
Reduce [7]. YARN's primary principle is to separate the resource management and job scheduling
functionalities into independent daemons. A Resource Manager arbitrates resources among the
framework's applications, assisted by a Node Manager on each node. The Resource Manager has two
fundamental parts: the Application Manager and the Scheduler. The Scheduler assigns resources to
the various running applications depending on their resource requirements. The Resource Manager
accepts job submissions, and each job is handed over to the Application Manager. The diagram below
(Figure 5) represents the YARN log file architecture.




Figure 5: YARN log file Architecture
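The Scheduler's role described above can be illustrated with a toy capacity check in Python. This is a sketch of the idea only (our own class and field names, memory as the single tracked resource); YARN's real Capacity and Fair Schedulers are far more elaborate:

```python
class Scheduler:
    """Toy model of the Resource Manager's scheduler: grants containers
    only while the cluster has spare memory."""
    def __init__(self, total_memory_mb):
        self.free_mb = total_memory_mb

    def allocate(self, app_id, requested_mb):
        # Grant a container if the request fits in the remaining capacity.
        if requested_mb <= self.free_mb:
            self.free_mb -= requested_mb
            return {"app": app_id, "granted_mb": requested_mb}
        return None  # request is held back when capacity is exhausted

    def release(self, granted_mb):
        # A finished container returns its memory to the pool.
        self.free_mb += granted_mb
```

The point of the sketch is the separation of concerns YARN introduced: the scheduler only matches requests against capacity, while tracking each application's progress is left to the Application Manager.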

2. Related Work
   This section reviews and provide an analysis of different vulnerabilities on the Hadoop framework.
Generally, our concerned area lies in the Map-Reduce functionalities provided by the Apache Hadoop
framework. There has been numerous tools and method defined to tackle the problem of
vulnerabilities but every tools is able to provide one functionality while leaving several other holes in
the system. The use of Kerberos system with proper authorization is suggested to tackle multiple
problems responsible for vulnerabilities.

3. Literature Review
   In one report [10], vulnerabilities are categorized into data privacy, infrastructure security, and
data management. These classifications are further divided into three categories: dimensional
modeling, the architecture dimension, and information flow. The data life cycle comprises data in
transit, and data privacy comprises data at rest.


    As per another report [11], big data security and privacy issues are categorized into five types,
namely Hadoop security, key management, anonymization, monitoring, and auditing. The author also
proposed some algorithms concerning the security and monitoring of sensitive information, such as
the Bull Eye algorithm.

    Another report [12] on cloud security designed a layered security model for a proposed cloud
infrastructure. The model was further classified into four categories: logical, basic, governance, and
value-added security. This report specifies the infrastructure policy framework of Hadoop.

   As per another report [13], the author specifies different types of attacks that have taken place on
the Hadoop framework, namely Denial of Service, Man in the Middle, impersonation, repudiation,
and replay attacks. According to the author, the distributed nature of the Map-Reduce component of
Hadoop makes a wide range of attacks possible, leaving it in a vulnerable state. An ideal Map Reduce
component would comprise proper authentication control, access control, authorization,
confidentiality of data, and lastly data availability for the Map and Reduce classes of Map-Reduce.
For better authentication control the author recommends the use of the Kerberos protocol.

   In one more report [14], the challenges faced by the security and privacy aspects were categorized
under different model names, namely access control, access control policy, data confidentiality, and
lastly smart objects. This report puts forward the research challenges faced in providing
comprehensive solutions for securing the privacy and security aspects.

    Another report [15] lists the challenges faced when privacy and security need to be ensured. The
challenges were broadly classified into privacy risks, credibility of data, lack of recent technologies,
and threats. To cope with these challenges the author introduces data supervision, protection
mechanisms, a protection agency, and data quality.

    As per another report [16], different categories of security and privacy aspects and the
connections between them were discussed briefly. The aspects were classified as confidentiality,
analytics, integrity, privacy, stream processing, data format, and lastly visualization.

   Report [17] showcased an investigation of corporate perspectives relying on big data in a simple
and effective manner. Based on these corporate perspectives, it covers the economic perspective,
investment decisions, fighting cybercrime, and cyber insurance. Table 2 shows the vulnerabilities
reported in online databases.

Table 2: Tabular representation of attacks described in online database [18].
Year                     Total Vulnerabilities     Denial of Service        Cross-Site Scripting
2011                     44                        15                       7
2012                     63                        19                       6
2013                     74                        25                       9
2014                     92                        23                       6
2015                     57                        19                       5
2016                     103                       15                       17
2017                     217                       29                       22
2018                     148                       15                       9
2019                     158                       13                       14
2020                     161                       6                        16
2021                     193                       16                       10




3.1.     Vulnerability Databases
   There are numerous online databases currently available across the internet that publicly disclose
possible security vulnerabilities in numerous products and hardware, namely Common Vulnerabilities
and Exposures (CVE), the Computer Emergency Readiness Team (CERT), the National Vulnerability
Database, and the Open Source Vulnerability Database.

   A CVE uniquely identifies a vulnerability by an identification number. Based on CVE, the list of
vulnerabilities encountered in Hadoop is shown in tabular format below [19]; Table 3 shows the
different vulnerabilities reported in CVE.

Table 3: Detailed CVE IDs of various attacks published in online databases [4], [19].
CVE ID              Description
CVE-2021-45911      An issue was discovered in gif2apng 1.9. There is a heap-based buffer overflow
                    in the main function. It allows an attacker to write 2 bytes outside the
                    boundaries of the buffer.
CVE-2021-45906      OpenWrt 21.02.1 allows XSS via the NAT Rules Name screen.


CVE-2017-7669       In Apache Hadoop 2.8.0 the LinuxContainerExecutor runs docker commands as
                    root with insufficient input validation. When the docker feature is enabled,
                    authenticated users can run commands as root.
CVE-2017-3162       HDFS clients interact with a servlet on the Data Node to browse the HDFS
                    namespace. The Name Node is provided as a query parameter that is not
                    validated in Apache Hadoop before 2.7.0.

CVE-2017-3161       The HDFS web UI in Apache Hadoop before 2.7.0 is vulnerable to a cross-site
                    scripting (XSS) attack through an unescaped query parameter.

CVE-2017-15713      Vulnerability in Apache Hadoop 3.0.0 allows a cluster user to expose private
                    files owned by the user running the MapReduce job history server process. The
                    malicious user can construct a configuration file containing XML directives that
                    reference sensitive files on the MapReduce job history server host.



3.2.     Patch Management
    Patch management is a mechanism for detecting and eliminating vulnerabilities before attackers
try to exploit them. Its effectiveness depends on how quickly vulnerabilities are detected and rectified,
supported by methods such as scanning, testing, and code review. As per a 2017 report, the adoption
of scanning methods has been gradually rising internationally [20].

3.3.     Security Issues in Hadoop
   It is known that Hadoop was designed primarily for performance, not security. The developers
decided that security functionality would be added over time, prioritizing framework efficiency. Due
to this, the security mechanisms of Hadoop were very weak and prone to many attacks. Because of
recent attacks, however, researchers are now focusing on the security aspects of Hadoop. Even so,
there presently exists no evaluation method for the security policies of Hadoop.

   Due to the recent growth of Big Data, the available security policies are not even up to the
benchmark required for evaluation. The Hadoop ecosystem comprises a collection of different
applications, each of which requires its own security mechanism to function properly for Big Data.

   Of all the models previously proposed for working with big data, Hadoop stood out because of its
distributed, parallel-processing design, but it was lacking in security policies. Whereas the distributed
nature of Hadoop was previously favored, that same distributed computing now poses a set of new
vulnerabilities for professionals and security managers [21].

3.4.    Security threats and possible attacks
   Any possible danger to an information system can be referred to as a threat: a threat is basically
what an attacker tries to identify and use as an attack against a company or organization [22]. For any
system to be regarded as secure it must satisfy Confidentiality, Integrity, and Availability, together
known as the CIA triad. To comply with confidentiality, an authentication server can be implemented
through which all access to the system is mediated.

3.4.1. Impersonation Attacks
   This type of attack occurs when an attacker impersonates a registered or legitimate authority to
access resources. Attackers can use different sets of tools and methods to steal sensitive information,
attacking the Hadoop clusters directly and leaving the system vulnerable. To perform an
impersonation attack, an attacker can try to replay the acknowledgement received from the Kerberos
protocol. Finally, once attackers gain access to the Hadoop framework, they can perform actions such
as leaking data and throttling the processing time of Map Reduce.
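Replaying an acknowledgement of the kind described above is conventionally defeated with timestamped authenticators and a replay cache. The Python sketch below is an illustrative toy, not the actual Kerberos implementation; the 5-minute skew window mirrors Kerberos' conventional default, and the class name is our own:

```python
import time

SKEW = 300  # seconds of allowed clock skew (Kerberos' conventional 5 minutes)

class ReplayCache:
    """Rejects authenticators that are stale or have been seen before:
    a basic defence against the replay attack described above."""
    def __init__(self):
        self.seen = set()

    def accept(self, client, timestamp, now=None):
        now = time.time() if now is None else now
        if abs(now - timestamp) > SKEW:
            return False                    # stale authenticator: reject
        if (client, timestamp) in self.seen:
            return False                    # already seen: this is a replay
        self.seen.add((client, timestamp))
        return True
```

A captured authenticator is thus useless twice over: replayed within the window it hits the cache, and replayed later it fails the freshness check.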

3.4.2. Denial-of-Service Attacks
   A Denial-of-Service attack [23] is one in which an attacker floods the system with an enormous
number of requests, making the system unable to allocate resources to legitimate users. As per one
report, more than 11,247 such attacks have taken place, among which 5 were able to breach security.
In a Denial of Service attack a system is flooded with requests or traffic, causing its servers to crash
or halt all operations. Denial of Service can be initiated in two ways: by crashing the services or by
flooding them. Hadoop components like the Name Node and the authentication server are prone to
Denial-of-Service attacks: a simple Denial of Service attack on the Name Node is enough to halt all
Map Reduce operations and stop the read-write operations of the Hadoop distributed file system.
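Flooding of the kind described above is commonly throttled with per-client rate limiting. Below is a minimal token-bucket sketch in Python; it is illustrative only, not a mechanism Hadoop provides out of the box, and the rates chosen are arbitrary:

```python
class TokenBucket:
    """Per-client token bucket: requests spend tokens, tokens refill at a
    fixed rate, and a flood is cut off as soon as the bucket runs dry."""
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # burst size
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True             # legitimate pace: serve the request
        return False                # bucket empty: drop the flood traffic
```

A legitimate client sending one request per second is never blocked, while an attacker bursting many requests in the same instant exhausts the bucket immediately.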

3.4.3. Cross-Site Scripting
    Cross-site scripting [24] is a type of attack in which malicious code is injected into a vulnerable
web application. Cross-site scripting differs from other attacks in that the target is not the application
itself; rather, it is the users of the web application who are at risk. Cross-site scripting attacks can be
categorized into two types: stored and reflected. Stored attacks, also known as persistent attacks, are
more damaging than reflected ones because the malicious script is injected directly into the
vulnerable web application, whereas in a reflected attack the malicious script is reflected off the
server onto the user's web browser.
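The basic defence against both stored and reflected XSS is escaping user-supplied text before embedding it in a page. A minimal Python sketch (the function name and markup are our own, illustrative choices):

```python
import html

def render_comment(user_input):
    """Escape user-supplied text before embedding it in HTML, so injected
    script tags are rendered as inert text rather than executed."""
    return "<p>" + html.escape(user_input) + "</p>"

# An injected payload is neutralized into harmless character entities:
# render_comment("<script>alert(1)</script>")
#   -> '<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>'
```

The same principle applies to the unescaped query parameter behind CVE-2017-3161 in Table 3: the HDFS web UI echoed a parameter into the page without this kind of escaping.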



3.4.4. Present Attacks
   The Hadoop framework, due to its open ports and IP addresses, has always been a target of
attackers; around 5307 Hadoop clusters have been exposed with vulnerable security settings that
attackers use to exploit the framework [25]. Shodan is an online search engine designed to show the
details of servers and all peripheral devices connected to them over the internet. Its advantage is that
it makes it possible to recommend security policies, but its disadvantage is that it is also used by
attackers to exploit systems. To counter these attacks and prevent data theft, strategies with strong
security policies must be implemented.

   Table 4, below, gives a comparative analysis of attacks that have targeted Hadoop.

Table 4: Comparative analysis of various attacks and challenges faced.
    Author            Year            Attacks            Features           Challenges           Description
                      2019         Impersonation      Authentication          How to             This type of
                                      Attacks                             authenticate if       attack occurs
  Bhathal                                                                  the person is           when an
Gurjeet Singh                                                            actually legit and     attacker tries
     [4]                                                                        not           to impersonate
                                                                         impersonated by       the registered
                                                                            an attacker.        or legitimate
                                                                                                authority for
                                                                                                accessing the
                                                                                                  resources.
  Bhathal             2019            Denial of       Authentication,    The collection of       A Denial of
Gurjeet Singh                          Service         Authorization      attacks can be         Service is a
     [4]                                                                    diverse or         type of attack
                                                                             complex               where an
                                                                                              attacker floods
                                                                                                 the system
                                                                                                    with an
                                                                                                  enormous
                                                                                               request which
                                                                                                  makes the
                                                                                               system unable
                                                                                                 to allocate
                                                                                                resources to
                                                                                                  legitimate
                                                                                                    users.
  Bhathal             2019           Cross-Site       Authentication,    Set up some anti         Cross-site
Gurjeet Singh                        Scripting         Authorization         triggered          Scripting is a
     [4]                                                                   methods to          type of attack
                                                                         avoid hijacking of         where
                                                                              the user        malicious code
                                                                             accounts         is injected into
                                                                                                   any web
                                                                                                 application
                                                                                                    that is
                                                                                                 vulnerable.


                                                  9
Fu Xiao [28]    2017   Data Leakage        Confidentiality,    Avoid leaking,     Data Leakage is
                                           authentication,    destruction, and             the
                                            authorization      corruption of        unapproved
                                                                 confidential     transmission of
                                                                information.         information
                                                                                   from inside an
                                                                                   association to
                                                                                       an outer
                                                                                     objective or
                                                                                     beneficiary.
  Jose Ancy     2014   DNS reflection      Confidentiality    Misconfiguration     DNS reflection
 Sherin [29]           amplification                          of DNS leads to           attack is
                                                                   DDoS           basically a type
                                                                                   of Distributed
                                                                                       Denial of
                                                                                   Service attack.
M Mizukoshi     2019    Distributed        Authentication,         Manual          A DDoS attack
   [26]                  Denial of          Authorization       intervention            includes
                          Service                              requirement is          different
                                                                  too much            associated
                                                                                      web-based
                                                                                       gadgets,
                                                                                      altogether
                                                                                      known as a
                                                                                   botnet, which
                                                                                   are utilized to
               (continued)                                                    overpower an objective
                                                                              site with counterfeit
                                                                              traffic.
Xianqing Yu    2015   Cloud Attacks      Confidentiality   How to avoid       Using the public-cloud
    [27]                                                   misconfiguration,  connection
                                                           unauthorized      characteristics, an
                                                           access, hijacking  attacker can try to
                                                                              hide his breaches.
  Bhathal      2019   Port Block         Access Control    How to avoid       Sending of packets to
Gurjit Singh          Attacks                              overcomplication   a specific port on the
     [4]                                                   of the             host.
                                                           application
  Bhathal      2019   SYN Flood          Access Control    How to configure   Repetitive initiation
Gurjit Singh          Attack                               the firewall,      of connections without
     [4]                                                   setting up an IPS  establishing them, to
                                                                              keep the server busy.


4. Acknowledgements
   This paper and the research behind it were only possible because of the guidance of my supervisor
Dr. Deepak Singh Tomar, Associate Professor at MANIT Bhopal. His attention to detail and his help in
keeping my work on track from our first meeting have been invaluable.

   I would also like to thank my co-supervisor Dr. R K Pateriya, Professor at MANIT Bhopal, for his
encouragement and guidance in carrying out the project work. I also thank MANIT Bhopal for giving me
the opportunity to embark on this project.

5. Conclusion
   In this study, big data vulnerabilities, security threats, and possible attacks were reviewed for
Hadoop, a popular big data framework. Although Hadoop was designed with maximum efficiency in mind,
the exponential growth of big data has left it vulnerable to attacks owing to a lack of security
policies, mechanisms, and proper access control. To make Hadoop a more reliable and secure framework,
a proper authentication server with authorization and auditing is required, together with mechanisms
that ensure data protection in depth.
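   In practice, the authentication and authorization called for here are typically enabled in Hadoop
through Kerberos (as noted in the keywords). A minimal core-site.xml fragment is sketched below; the
property names are the standard Hadoop security settings, but a full secure deployment additionally
requires keytabs, principal-to-user mappings, and per-service configuration not shown here.

```xml
<!-- core-site.xml: switch from the default "simple" authentication to Kerberos -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<!-- Enforce service-level authorization checks for RPC calls -->
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```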

6. References
[1] Lai TL, Yuan H. Stochastic approximation: from statistical origin to big-data, multidisciplinary
    applications. Statistical Science. 2021 Apr;36(2):291-302.
[2] Li, Yun, Manzhu Yu, Mengchao Xu, Jingchao Yang, Dexuan Sha, Qian Liu, and Chaowei Yang.
     "Big data and cloud computing." In Manual of Digital Earth, pp. 325-355. Springer, Singapore,
     2020.
[3] Oussous, Ahmed, Fatima-Zahra Benjelloun, Ayoub Ait Lahcen, and Samir Belfkih. "Big Data
     technologies: A survey." Journal of King Saud University-Computer and Information
     Sciences 30, no. 4 (2018): 431-448.
[4] Bhathal, Gurjit Singh, and Amardeep Singh. "Big data: Hadoop framework vulnerabilities,
     security issues and attacks." Array 1 (2019): 100002.
[5] Braun, Tracy D., Howard Jay Siegel, Noah Beck, Ladislau L. Bölöni, Muthucumaru Maheswaran,
     Albert I. Reuther, James P. Robertson, Mitchell D. Theys, Bin Yao, Debra Hensgen, and Richard F.
     Freund. "A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto
     Heterogeneous Distributed Computing Systems." Journal of Parallel and Distributed Computing 61,
     no. 6 (2001): 810-837.
[6] Gautam, Akansha, and Indranath Chatterjee. "Big data and cloud computing: A critical
     review." International Journal of Operations Research and Information Systems (IJORIS) 11, no.
     3 (2020): 19-38.
[7] Cai, Xiaojun, Feng Li, Ping Li, Lei Ju, and Zhiping Jia. "SLA-aware energy-efficient scheduling
     scheme for Hadoop YARN." The Journal of Supercomputing 73, no. 8 (2017): 3526-3546.
[8] Dunn-Rankin, Peter, Gerald A. Knezek, Susan R. Wallace, and Shuqiang Zhang. Scaling
     methods. Psychology Press, 2014.
[9] Mavridis, Ilias, and Helen Karatza. "Performance evaluation of cloud-based log file analysis with
     Apache Hadoop and Apache Spark." Journal of Systems and Software 125 (2017): 133-151.
[10] Ye, Haina, Xinzhou Cheng, Mingqiang Yuan, Lexi Xu, Jie Gao, and Chen Cheng. "A survey of
     security and privacy in big data." In 2016 16th International Symposium on Communications and
     Information Technologies (ISCIT), pp. 268-272. IEEE, 2016.
[11] Terzi, Duygu Sinanc, Ramazan Terzi, and Seref Sagiroglu. "A survey on security and privacy
     issues in big data." In 2015 10th International Conference for Internet Technology and Secured
     Transactions (ICITST), pp. 202-207. IEEE, 2015.


[12] Sharif, Ather, Sarah Cooney, Shengqi Gong, and Drew Vitek. "Current security threats and
     prevention measures relating to cloud services, Hadoop concurrent processing, and big data."
     In 2015 IEEE International Conference on Big Data (Big Data), pp. 1865-1870. IEEE, 2015.
[13] Derbeko, Philip, Shlomi Dolev, Ehud Gudes, and Shantanu Sharma. "Security and privacy
     aspects in MapReduce on clouds: A survey." Computer science review 20 (2016): 1-28.
[14] Bertino, Elisa, and Elena Ferrari. "Big data security and privacy." In A comprehensive guide
     through the Italian database research over the last 25 years, pp. 425-439. Springer, Cham, 2018.
[15] Zhang, Dongpo. "Big data security and privacy protection." In 8th International Conference on
     Management and Computer Science (ICMCS 2018), vol. 77, pp. 275-278. Atlantis Press, 2018.
[16] Nelson, Boel, and Tomas Olovsson. "Security and privacy for big data: A systematic literature
     review." In 2016 IEEE international conference on big data (big data), pp. 3693-3702. IEEE,
     2016.
[17] Tao, Hai, Md Zakirul Alam Bhuiyan, Md Arafatur Rahman, Guojun Wang, Tian Wang, Md
     Manjur Ahmed, and Jing Li. "Economic perspective analysis of protecting big data security and
     privacy." Future Generation Computer Systems 98 (2019): 660-671.
[18] Erraissi, Allae, and Mouad Banane. "Managing Big Data using Model Driven Engineering: From
     Big Data Meta-model to Cloudera PSM meta-model." In 2020 International Conference on
     Decision Aid Sciences and Application (DASA), pp. 1235-1239. IEEE, 2020.
[19] Mitre Corp, "CVE Details", 12 October 2021. [Online]. Available:
     https://www.cvedetails.com/vendor/45/Apache.html
[20] Salleh, Khairulliza Ahmad, and Lech Janczewski. "Security considerations in big data solutions
     adoption: Lessons from a case study on a banking institution." Procedia Computer Science 164
     (2019): 168-176.
[21] Parmar, Raj R., Sudipta Roy, Debnath Bhattacharyya, Samir Kumar Bandyopadhyay, and Tai-
     Hoon Kim. "Large-scale encryption in the Hadoop environment: Challenges and
     solutions." IEEE Access 5 (2017): 7156-7163.
[22] Dahbur, Kamal, Bassil Mohammad, and Ahmad Bisher Tarakji. "A survey of risks, threats and
     vulnerabilities in cloud computing." In Proceedings of the 2011 International conference on
     intelligent semantic Web-services and applications, pp. 1-6. 2011.
[23] Gavric, Zeljko, and Dejan Simic. "Overview of DOS attacks on wireless sensor networks and
     experimental results for simulation of interference attacks." Ingeniería e Investigación 38, no. 1
     (2018): 130-138.
[24] Gupta, Shashank, and Brij Bhooshan Gupta. "Cross-Site Scripting (XSS) attacks and defense
     mechanisms: classification and state-of-the-art." International Journal of System Assurance
     Engineering and Management 8, no. 1 (2017): 512-530.
[25] Millman, Rene. "Thousands of hadoop clusters still not being secured against attacks." SC
     Media 10 (2017).
[26] M. Mizukoshi and M. Munetomo, "Distributed denial of services attack protection system with
     genetic algorithms on Hadoop cluster computing framework," 2015 IEEE Congress on
     Evolutionary Computation (CEC), 2015, pp. 1575-1580.
[27] Xianqing Yu, P. Ning and M. A. Vouk, "Enhancing security of Hadoop in a public cloud," 2015
     6th International Conference on Information and Communication Systems (ICICS), 2015, pp. 38-
     43.
[28] Fu, Xiao, Yun Gao, Bin Luo, Xiaojiang Du, and Mohsen Guizani. "Security threats to Hadoop:
     data leakage attacks and investigation." IEEE Network 31, no. 2 (2017): 67-71.
[29] Jose, Ancy Sherin, and A. Binu. "Automatic detection and rectification of DNS reflection
     amplification attacks with Hadoop MapReduce and Chukwa." In 2014 Fourth International
     Conference on Advances in Computing and Communications, pp. 195-198. IEEE, 2014.
