=Paper= {{Paper |id=Vol-2062/paper4 |storemode=property |title=Towards a Security Reference Architecture for Big Data |pdfUrl=https://ceur-ws.org/Vol-2062/paper04.pdf |volume=Vol-2062 |authors=Julio Moreno,Manuel A. Serrano,Eduardo Fernandez-Medina,Eduardo B. Fernandez |dblpUrl=https://dblp.org/rec/conf/dolap/MorenoSFF18 }} ==Towards a Security Reference Architecture for Big Data== https://ceur-ws.org/Vol-2062/paper04.pdf
         Towards a Security Reference Architecture for Big Data
                                Julio Moreno                                                           Manuel A. Serrano
    GSyA Research Group, University of Castilla-La Mancha                             Alarcos Research Group, University of Castilla-La
                    Ciudad Real, Spain                                                                    Mancha
                  Julio.Moreno@uclm.es                                                               Ciudad Real, Spain
                                                                                                 Manuel.Serrano@uclm.es

                    Eduardo Fernandez-Medina                                                        Eduardo B. Fernandez
    GSyA Research Group, University of Castilla-La Mancha                           Department of Computer and Electrical Engineering and
                    Ciudad Real, Spain                                                  Computer Science, Florida Atlantic University
               Eduardo.FdezMedina@uclm.es                                                           Boca Raton, Florida
                                                                                                    Fernande@fau.edu
ABSTRACT
Companies are aware of the importance of Big Data, as data are essential to conduct their daily
activities, but new problems arise with new technologies, as is the case with Big Data; these
problems are related not only to the 3 Vs of Big Data, but also to privacy and security. Security
is crucial in Big Data systems, but unfortunately, security problems occur because Big Data was
not initially conceived as a secure environment. Furthermore, securing such systems is difficult
due to the heterogeneous configurations that a Big Data system can have. One way to solve this
problem is to adopt a global perspective; in this sense, a Reference Architecture (RA) is a
high-level abstraction of a system that can be useful in the implementation of complex systems.
Several initiatives have proposed an RA for Big Data, such as those from IBM, Oracle, NIST, or
ISO, but none of them has security as its main focus. It is widely accepted that adding elements
to an RA that address threats and facilitate the definition of security requirements is a good
starting point for dealing with these kinds of threats, thereby converting RAs into Security
Reference Architectures (SRAs). In this paper, an SRA for Big Data is defined using UML models in
order to ease secure Big Data implementations, allowing security patterns to be applied to secure
the final Big Data systems.

© 2018 Copyright held by the owner/author(s). Published in the Workshop
Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna,
Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted
under the terms of the Creative Commons license CC-by-nc-nd 4.0.

1    INTRODUCTION
Companies are increasingly aware of the importance of Big Data [1]. For all of them, data are
essential to conduct their daily activities and to help senior management achieve business
objectives and, as a result, make better decisions based on the information extracted from such
data [22]. Big Data implies a change compared to traditional techniques in three different ways:
the amount of data (volume), the rate of generation and transmission of data (velocity), and the
heterogeneity of the types of structured and unstructured data that it can handle (variety) [6].
These properties are known as the three Vs of Big Data [30].
   New problems usually arise with new technologies, as is the case with Big Data. These problems
are related not only to the 3 Vs of Big Data, but also to privacy and security. Big Data not only
increases the scale of the problems related to privacy and security, as faced in the traditional
management of security, but also adds new ones that should be addressed with different techniques
and measures [36]. These security problems occur because Big Data was not initially conceived as
a secure environment [33]; therefore, the main security problems are related to the specific
architecture of Big Data itself, which makes it harder to protect the privacy of the data being
used [7].
   Obtaining an adequate level of security in Big Data can influence its implementation in an
institution because of the loss of reputation the institution could suffer or because it could
receive financial penalties, due to regulations, in the case of data breaches; in fact, without
a security guarantee, Big Data will not reach an appropriate level of acceptance [35]. Hence, it
is important to have guidance, methodologies, and mechanisms to properly implement not only the
Big Data system, but also its security. Big Data environments are very complex, so in order to
address their security, we need to start from a global perspective. Security should be approached
from high-level policies that can be mapped to the lower levels [13]. Different authors [2, 23]
highlight that Reference Architectures (RA) have been shown to be valuable to guide security in
different environments; for example, Cloud Computing [13] or Internet of Things [19].
   An RA is an abstract software architecture that is based on one or more domains and has no
implementation features [2]. Moreover, an RA should be expressed at a high level of abstraction
in order to be reusable, extendable, and configurable. This kind of architecture can be composed
of different patterns to facilitate the implementation of the system and improve the addition of
non-functional requirements [15]. By adding security patterns to control its identified threats,
an RA becomes a Security Reference Architecture (SRA). In this way, an SRA is a high-level
architecture that incorporates a set of elements facilitating the definition of security
requirements and allowing a better understanding of security policies, threats, vulnerabilities,
etc., and which can be used to describe a conceptual model of security for Big Data systems [21].
   Among our main concerns in computer security, our current goal is to improve the security and
trust of Big Data environments. In order to achieve that objective, our first step is the
creation of an SRA for Big Data. To do that, we consider that security patterns have a
fundamental role in facilitating the implementation of security mechanisms in a Big Data
ecosystem. Hence, we modified the RA proposed by the National Institute of Standards and
Technology (NIST) for Big Data [26] to create a richer architecture, in which the relations
between the different parts of Big Data are clearly exposed in more granular detail. This
enhanced RA will allow a better understanding of the Big Data ecosystem. In order to achieve that
purpose, our reference architecture is specified by means of UML diagrams
[29]. Finally, along with the SRA, we created a partial example
of how to apply our architecture; we have considered some of
the different threats that can affect a Big Data system, and how
the different components that take part in addressing them can
be instantiated; for example, security patterns that can help in
the solution of those problems.
   The paper is organized as follows: first, we explain the main properties of the NIST proposal
of an RA for Big Data. After that, we present the components and structure of our SRA, together
with an example of how to use security patterns to address threats in a particular Big Data
project. Subsequently, we compare our proposal with the main Big Data RA proposals. Finally, we
discuss conclusions and future work.

2   REFERENCE MODEL: NIST REFERENCE
    ARCHITECTURE FOR BIG DATA
Over the last several years, NIST has defined an RA for Big Data which has received the general
consensus of industry and the scientific community [26]. With the release of its latest version
in August 2017, this architecture collects many different ideas and features for creating a Big
Data ecosystem. This set of features was extracted from the Big Data architecture proposals made
by the main companies of the sector, such as Oracle and IBM. As a result, NIST produced the RA
that can be seen in Figure 1. The architecture is divided into five different components that
interact with each other and have different objectives. These components are:

    • System Orchestrator (SO): This is one of the most important components of a Big Data
      ecosystem because it is the one in charge of defining and integrating the required data
      application activities into the ecosystem. The main purpose of this component is the
      configuration and management of the other components of the Big Data architecture. In an
      enterprise, this function is typically centralized and can be mapped to the traditional
      role of system governor, which provides the supervision of the requirements and constraints
      that the Big Data system must fulfill; for example, policies, architecture, or business
      requirements.
    • Data Provider (DP): This component oversees feeding the Big Data ecosystem with new data.
      In order to accomplish that goal, the Data Provider has a collection of interfaces, or
      services, between the Big Data system and the data sources. This set of interfaces acts
      like a gate between the outside world and the Big Data system.
    • Big Data Application Provider (BDAP): The BDAP component provides a specific set of
      services along the data life cycle to meet the requirements established by the SO. It is
      important to highlight that its main purpose is to encapsulate the business logic and
      functionality to be executed by the architecture. In a regular Big Data scenario, there are
      several applications executing over the same data. As data propagates through the
      ecosystem, it is processed and transformed in different ways to obtain valuable information
      from it. In order to achieve that goal, the BDAP is composed of different services or
      activities that can be considered as the SaaS layer of the Big Data system. These
      activities are: collection, preparation, analytics, visualization, and access. Activities
      can be implemented as independent functions and deployed as stand-alone services.
      Furthermore, the activities can interact with the underlying Big Data Framework Provider,
      as well as with the Data Consumer, DP or even with each other.
    • Big Data Framework Provider (BDFP): The BDFP component can be considered as the platform
      implementation of the Big Data logic. It supports the activities defined in the BDAP. In
      general, Big Data implementations are hybrids that combine multiple technologies. It has
      three main activities: infrastructure (virtual or physical), platform (how the data is
      distributed and organized), and processing (how data will be processed to support Big Data
      applications). In addition, the BDFP component also provides the support services for the
      system, like communications or resource management.
    • Data Consumer (DC): It is similar to the DP component. Usually the actor that interacts
      with this component is an end-user or another system. Similarly to the DP, it is composed
      of a set of interfaces between the end-user and the information.

Figure 1: NIST proposal for a Big Data architecture [26]

   The NIST proposal cannot be considered an SRA, but it recognizes the importance of security
and privacy in a Big Data environment. In order to face the security problems, this architecture
has a Security and Privacy Fabric that addresses the needs and solutions for this specific topic.
In fact, there exists a specific volume about privacy and security in Big Data [27].
   From our point of view, this representation based on blocks is not expressive enough. This
kind of specification is too high-level in terms of abstraction: it places little emphasis on the
details of the subcomponents and how they are connected. This approach can make the design and
implementation of a Big Data ecosystem difficult. Following the same approach, the ISO/IEC
organization is also working on the creation of an RA for Big Data under the standard ISO/IEC
20547-3 [16]. Although it is a work in progress, it is expected to follow a similar approach to
the NIST proposal.

3   A SECURITY REFERENCE
    ARCHITECTURE (SRA) FOR BIG DATA
In this section, we describe our SRA proposal, which is structured using the same schema and
components as the guidelines proposed by NIST. We consider that if our SRA is aligned with the
RA proposed by NIST, it will be easier to implement. Furthermore,
this architecture highlights the importance of implementing security solutions based on concepts
of the SRA.

3.1    System Orchestrator (SO)
The main purpose of this component is the enforcement of the different requirements that the Big
Data ecosystem must address. Also, it organizes how the requirements are connected to all the
components of the architecture; in this section, we will focus on the security requirements and
the relation between them and the different components. Figure 2 shows the structure of our SO
proposal. Due to the characteristics of this component, the security activities related to it
are in general focused on the requirements and how to implement and monitor them. Those
requirements must fulfill Big Data goals and should be aligned with the different business goals
and company policies. In this respect, the role of the Security Administrator is crucial to
ensure the observance of the security requirements. These security requirements must comply with
the regulations affecting each Big Data ecosystem context. In fact, there are many other kinds of
requirements that can address the needs of a Big Data ecosystem; for example, architecture,
quality, or governance requirements.
   There are many examples of security requirements that should be addressed in a Big Data
context. Topics like data privacy and how to secure the Big Data architecture itself are the
most addressed by researchers [25]. These problems can be tackled by using general mechanisms
like user authorization and authentication, fraud detection, risk control, auditing, encryption,
network access control, intrusion detection, or guaranteeing the quality and security of the
data when they come from different data sources [3, 17, 20, 25, 32]. These are general security
mechanisms, but they must be modified to be applied to specific types of systems, based on
possible threats.
   As shown in Figure 2, these security requirements can be satisfied by means of different
security solutions that follow the security policies of the company and have as their main
objective addressing threats to control vulnerabilities. An example of a security policy in a
company can be the obligation of using secure communications; this policy can give rise to a
security requirement in the Big Data environment specifying that the data transfer between
components must be secure. One way to approach this requirement is by using authentication
methods; the implementation of this security solution can be helped by means of the “Role-based
access control” security pattern. These security solutions should be specifically implemented in
the BDAP and BDFP components. However, these solutions are not easy to implement; thus, our model
uses security patterns as guidance. A security pattern is a solution to a recurrent problem that
indicates how to defend against a threat, or a set of threats, in a concise and reusable way
[12]. Patterns are abstract solutions that must be tailored to where they are applied.
Furthermore, we can use misuse patterns [14] as a way to understand each attack and guide the
application of the different security patterns that can be used to stop a threat. Moreover, the
security metadata can be defined as a way to facilitate the coordination and realization of
security requirements. Another topic covered by our architecture is the context of the asset; for
example, the security considerations of a medical record are totally different from those of a
log file. It is important to evaluate the required security level for each asset.

3.2    Data Provider (DP)
The DP component creates an abstraction of the data sources considering their security metadata,
if they exist. These metadata allow the DP to identify the types of access and analysis allowed
by the data source and its security requirements. As we explained in Section 2, the DP has a set
of interfaces. Those interfaces must consider the constraints of each data source and also the
different security policies and requirements specified by the SO. In this element, there may
exist conflicts between the security requirements of the data source and those of the Big Data
system itself. These clashes must be addressed in a way that satisfies both sides. The security
and privacy issues of this component are mostly related to how to properly identify and validate
the end point inputs. The DP interfaces must evaluate the provenance of the data source. A
critical challenge in the data collection process is knowing how to validate that a data source
is not malicious and to filter out those which are [7].
   In our SRA, the interfaces are connected with the Collector service of the BDAP that will be
described in the next subsection. Figure 3 represents the DP component with its interfaces. In
general, the elements that compose a data source include: the data itself, which can be
structured, semi-structured, or unstructured; the security requirements of the data source; and
the security metadata of the data source. Those elements are not represented in the diagram
because we consider the data source an external agent of the Big Data system. Still, it is
important to know them in order to apply their constraints.

3.3    Big Data Application Provider (BDAP)
The BDAP component has the objective of meeting the requirements established by the SO, including
its security and privacy requirements. To achieve that goal, the BDAP is composed of different
services or activities that can be considered as the SaaS (Software as a Service) layer of the
Big Data ecosystem; in our case, we assume that, in general, Big Data is implemented on a Cloud
platform, which will affect how the SRA is defined in the BDFP component. Figure 4 shows the
different services that constitute this component, and also the BDAP Security Solution that must
map the SO security solutions to these stages; for example, authorization may control here who
can apply which operations to perform data analysis.
   As represented in the diagram, not all the activities can communicate with each other; there
is a sequential order of execution. Moreover, some of these activities are not mandatory in a
Big Data ecosystem. The preparation step has the purpose of validating, cleaning and storing the
data, but in a real-time scenario where the data should be analysed as soon as it gets into the
system, this activity might be skipped. Something similar happens with the visualization step: if
the data consumer is not a human end-user but another system, like a data warehouse or even
another Big Data ecosystem, this activity may not be necessary.
   Nevertheless, the other three activities are basic in a Big Data ecosystem: the collection
activity acts like an ETL (Extract, Transform, and Load) process and combines sets of data of
similar structure with the objective of unifying them; the analysis step includes a set of
techniques to obtain valuable knowledge from data, for example, MapReduce algorithms; and
finally, the access activity has the purpose of communicating with the DC, acting like an
interface between DC and visualization and analytics
activities. The relation between those different activities is represented in Figure 4 by dotted
lines, because it is a temporary usage relation.

Figure 2: System Orchestrator (SO) diagram

Figure 3: Data Provider (DP) diagram

3.4    Big Data Framework Provider (BDFP)
In general, the BDFP component is composed of a set of clusters which, in turn, are composed of
nodes. Those nodes can be deployed by means of Virtual Machines or Containers, which interact
with the hardware itself and the OS.
   The BDFP component in NIST is very abstract, with a lack of details in the subcomponents
needed to perform its processes. Therefore, our proposal places more emphasis on the different
elements and how they are connected. Figure 5 depicts the different subcomponents of the BDFP.
Our SRA highlights the idea of a Big Data ecosystem with the possibility of implementing the
system with a Cloud environment and virtualization techniques.
   In regard to security and privacy issues, in this component the activities should be focused
on the encryption and key management of the data, the isolation and containerization of process
execution, authorization, authentication, audit logging, and how to secure the storage and the
network. Those security issues should be addressed by means of the security solutions defined in
the SO, which can be implemented at this level as BDFP security solutions. The SO security
solutions are now mapped to data protection, including application of cryptography and
specialized authorization mechanisms [8, 37].

3.5    Data Consumer (DC)
The DC component is, similarly to the DP, composed of a set of interfaces. The interaction could
include interactive visualization, report creation, or data drilling using business intelligence
techniques. It is important to highlight that these interfaces must address the authorization
and authentication functions, so that the DC can match the metadata related to the end-user with
the security requirements and policies of the information.
   Finally, Figure 6 summarizes our complete SRA for Big Data. In this figure, the relationships
between the different components of the architecture can be seen in perspective. This figure is
important to better understand the example presented in the following subsection.

3.6    Examples of Application of Security Patterns
As a way to show the usefulness of our SRA, we explain an example of how to employ security
patterns using our architecture. We created the example by identifying some of the threats that
can be found in the different activities of the BDAP component. A systematic method for the
enumeration of threats is shown in [12]. Those threats can be addressed by means of security
patterns, which, in some cases, should be modified from general security patterns to meet the
inherent features of Big Data. The modification of these patterns, and the creation of new ones
if needed, is beyond the scope of this paper and is considered future work. Table 1 summarizes
some of the threats of each activity and the general patterns that can be applied to address
them. Those patterns are defined in [12].
   As a way to better understand how to integrate the different components of our SRA and the
security patterns, we will define how the threat TC1 can be addressed by using security
patterns. We will use an object diagram to explain it; this diagram is shown in Figure 7. In
this scenario, we have the stored data as the main asset to protect. This asset has a
vulnerability: it has no protection. This vulnerability could be exploited by a threat like TC1.
In order to prevent that situation, it is necessary to implement a security solution. To
facilitate the implementation of the solution, two security patterns can be used: Role-based
access control and Authentication. However, this security solution will still have a high
abstraction level because it is defined in the SO component. Hence, a low-level implementation
of the security solution should be approached at the BDAP level; in this case, TC1 can affect
the different services provided by the BDAP, that

Figure 4: Big Data Application Provider (BDAP) diagram

Figure 5: Big Data Framework Provider (BDFP) diagram
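
The TC1 scenario (stored data modified without protection) can be illustrated with a minimal Python sketch that chains the two patterns named in the text, Authentication and Role-based access control, in front of a write operation. This is only an illustration under stated assumptions, not the authors' implementation: the user names, tokens, the permission sets assigned to each role, and the functions `authenticate` and `modify_stored_data` are all hypothetical, and the unsalted SHA-256 digest stands in for a real credential scheme.

```python
import hashlib
import hmac
from typing import Dict, Optional, Set, Tuple

# Hypothetical role-to-permission assignment (an assumption for illustration;
# the paper does not specify which permissions each role holds).
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "administrator": {"read", "modify", "delete"},
    "data_scientist": {"read"},
}


def _digest(secret: str) -> str:
    """Demo-only digest; a real system would use a salted password hash."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


# Hypothetical user store: user name -> (role, digest of the user's token).
USERS: Dict[str, Tuple[str, str]] = {
    "alice": ("administrator", _digest("alice-token")),
    "bob": ("data_scientist", _digest("bob-token")),
}


def authenticate(user: str, secret: str) -> Optional[str]:
    """Authentication pattern: return the user's role, or None on failure."""
    entry = USERS.get(user)
    if entry is None:
        return None
    role, expected = entry
    # Constant-time comparison of the two hex digests.
    return role if hmac.compare_digest(_digest(secret), expected) else None


def modify_stored_data(store: dict, user: str, secret: str,
                       key: str, value: object) -> bool:
    """Apply a write only if the caller is authenticated and authorized."""
    role = authenticate(user, secret)
    if role is None:
        return False  # unauthenticated callers are rejected
    if "modify" not in ROLE_PERMISSIONS.get(role, set()):
        return False  # Role-based access control: role lacks the right
    store[key] = value
    return True
```

In this sketch, a call by `alice` with her token changes the store, while a call by `bob` (read-only role) or by any caller with a wrong token leaves it untouched, which is the behavior the security solution is meant to guarantee against TC1.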
Figure 6: Big Data SRA complete diagram
Table 1: Identified threats and security patterns for the different activities

ID    Activity                      Threat                                        Security Pattern
TC1   Common to all the activities  Data modified                                 Authentication, Role-based access control
TC2   Common to all the activities  Data destroyed                                Authentication, Role-based access control
TC3   Common to all the activities  Data illegally read                           Encryption, Role-based access control, Authentication
TC4   Common to all the activities  Unapproved change in activity function        Logger and Auditor, Controlled access session, Role-based access control, Authentication
TCo1  Collection                    Malicious data source                         Authentication
TP1   Preparation                   Malicious filter                              Logger and Auditor, Controlled access session, Role-based access control, Authentication
TA1   Analysis                      Infer PII* from anonymized data               Encryption, Logger and Auditor, Multilevel security, Role-based access control, Authentication
TA2   Analysis                      Malicious analysis algorithms                 Logger and Auditor, Controlled access session, Role-based access control, Authentication
TV1   Visualization                 PII* exposed due to high graphic granularity  Multilevel security, Authentication, Role-based access control
TAc1  Access                        Several malicious access                      Authentication, Role-based access control

*PII: Personally Identifiable Information
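
As a rough illustration of how the threat catalogue in Table 1 might be used, the rows can be held in a
lookup structure so that the patterns relevant to one activity are derived mechanically. The Python
sketch below is not part of the SRA: the names THREATS, COMMON and patterns_for_activity are
hypothetical, and only the table contents come from the paper.

```python
# Illustrative encoding of Table 1. Identifier names are assumptions made for
# this sketch; the rows (ID, activity, threat, patterns) mirror the table.
COMMON = "Common to all the activities"

AUTH = "Authentication"
RBAC = "Role-based access control"
ENC = "Encryption"
LOG = "Logger and Auditor"
CAS = "Controlled access session"
MLS = "Multilevel security"

THREATS = {
    "TC1": (COMMON, "Data modified", [AUTH, RBAC]),
    "TC2": (COMMON, "Data destroyed", [AUTH, RBAC]),
    "TC3": (COMMON, "Data illegally read", [ENC, RBAC, AUTH]),
    "TC4": (COMMON, "Unapproved change in activity function", [LOG, CAS, RBAC, AUTH]),
    "TCo1": ("Collection", "Malicious data source", [AUTH]),
    "TP1": ("Preparation", "Malicious filter", [LOG, CAS, RBAC, AUTH]),
    "TA1": ("Analysis", "Infer PII from anonymized data", [ENC, LOG, MLS, RBAC, AUTH]),
    "TA2": ("Analysis", "Malicious analysis algorithms", [LOG, CAS, RBAC, AUTH]),
    "TV1": ("Visualization", "PII exposed due to high graphic granularity", [MLS, AUTH, RBAC]),
    "TAc1": ("Access", "Several malicious access", [AUTH, RBAC]),
}


def patterns_for_activity(activity):
    """Return the sorted patterns covering an activity's own threats plus
    the threats that are common to all activities."""
    needed = set()
    for threat_activity, _description, patterns in THREATS.values():
        if threat_activity in (COMMON, activity):
            needed.update(patterns)
    return sorted(needed)


if __name__ == "__main__":
    for name in ("Collection", "Visualization"):
        print(name, "->", patterns_for_activity(name))
```

Because the common threats TC1 to TC4 apply everywhere, every activity inherits their patterns; an
activity such as Visualization only adds what its specific threats require.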


Figure 7: Using security patterns to address a specific threat

is the reason why the security solution should be implemented there and not in another component.
    Furthermore, we will describe how to create an instance of the two security patterns that
secure the Collector subcomponent (Authentication and Role-based access control) by means of a
partial example. In this example, we focus on a Big Data system whose objective is to process
tweets from the Twitter platform to analyse the general sentiment about a product. Figure 8 shows
the object diagram for this example. The main component is what we want to protect, in this case
the tweets that have been obtained to be analysed.
    The Authentication pattern allows us to verify the identity of the user by using a proof of
identity and an authenticator. On the other hand, as its name indicates, the key step in
implementing Role-based access control is to define the different roles. In this case, we have
defined four roles: the administrator of the Big Data system, the data scientist, the end user,
and the data owner. As we explained before, this example is focused on the Collector phase, so the
rights defined for the roles must take this into account; for example, in this phase the end user
should not have any rights over the data. Hence, Figure 8 shows the different functions that each
user can perform over the data according to their rights.

4    COMPARISON WITH OTHER PROPOSALS
There are not many reference architectures for Big Data systems; if we focus on security as the
goal of the architecture, there are even fewer. However, different authors and organizations have
proposed reference architectures for Big Data. In this section, we describe some of the most
relevant proposals.
   Demchenko et al. [11] propose a Big Data Framework Architecture that establishes the data
lifecycle in a Big Data ecosystem. As in the NIST approach, they use a block representation, but
with more detail in the relationships between the different components of the architecture.
However, they address security only superficially and as an isolated feature, not really connected
to the other components. In [28] the authors propose a complete architecture in terms of the
relations between the different components; however, we found a lack of consideration given to
security and privacy aspects. Klein et al. propose in [18] a specific reference architecture for
Big Data in the national security domain. Their architecture is very similar to the one proposed
by NIST. Our goal is to obtain a better abstraction of the architecture, but it is still
interesting how they address some concerns by using solution patterns. They highlight the
importance of having a specific domain for the requirements. In our
                        Figure 8: Application of Authentication and Role-based access control patterns
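
To make the pattern instance of Figure 8 more concrete, the following Python sketch shows one way the
Authentication and Role-based access control patterns could be combined to protect the collected
tweets. It is only a sketch under stated assumptions: the class names (Authenticator, Subject,
TweetStore), the use of a password as the proof of identity, and every right in RIGHTS except the
empty set for the end user are hypothetical and are not taken from the paper.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """The four roles named in the example."""
    ADMINISTRATOR = "administrator"
    DATA_SCIENTIST = "data scientist"
    END_USER = "end user"
    DATA_OWNER = "data owner"


# Rights per role over the collected tweets. Only the empty END_USER entry
# follows the text; all other rights are assumed placeholders.
RIGHTS = {
    Role.ADMINISTRATOR: {"configure_source"},
    Role.DATA_SCIENTIST: {"read"},
    Role.DATA_OWNER: {"read", "delete"},
    Role.END_USER: set(),
}


@dataclass
class Subject:
    """An authenticated user together with the role assigned to it."""
    name: str
    role: Role


class Authenticator:
    """Verifies a proof of identity (assumed here to be a password)."""

    def __init__(self):
        self._records = {}  # name -> (salt, password hash, role)

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def register(self, name, password, role):
        salt = os.urandom(16)
        self._records[name] = (salt, self._hash(password, salt), role)

    def authenticate(self, name, password):
        """Return a Subject if the proof is accepted, otherwise None."""
        record = self._records.get(name)
        if record is None:
            return None
        salt, stored, role = record
        if hmac.compare_digest(stored, self._hash(password, salt)):
            return Subject(name, role)
        return None


class TweetStore:
    """Protected object: the tweets gathered by the Collector."""

    def __init__(self):
        self._tweets = ["great product!", "not impressed"]  # sample data

    def read(self, subject):
        # Access is granted only to authenticated subjects whose role holds
        # the "read" right.
        if subject is None or "read" not in RIGHTS[subject.role]:
            raise PermissionError("read denied")
        return list(self._tweets)


if __name__ == "__main__":
    auth = Authenticator()
    auth.register("alice", "s3cret", Role.DATA_SCIENTIST)
    store = TweetStore()
    print(store.read(auth.authenticate("alice", "s3cret")))
```

Keeping the role-to-rights table separate from the protected object means the rights can be redefined
for another phase without touching the authentication logic.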

case, requirements, and specifically the ones related to security, are the main part of the SO
component.
   Sqrrl [34] and BlueTalon [4] propose a Big Data model focused on data-centric security. Their
purpose is to embed security information within the data itself. Sqrrl places the emphasis on
access control for each field of data, and to do that they use a layered architecture built around
the value or sensitivity of the data. BlueTalon, on the other hand, includes in their proposal the
concept of data lakes, storage repositories that hold a huge amount of raw data until it is
needed. There are other proposals made by the main IT companies like Oracle [5], NTT DATA [10],
IBM [9], Microsoft [24] or SAP [31]. Table 2 summarizes these RAs and compares them with our SRA
proposal. The criteria were selected based on a previous systematic mapping study that we carried
out about Big Data security concerns [25]. As a side effect of this work, we detected some
characteristics that usually are not considered in the different proposals and could be important
to define a SRA.

Table 2: Comparison between RAs

RA proposal             Requirements concern   Security concern   Connection between components   Abstraction level
NIST                    Medium                 High               Low                             High
Demchenko               Medium                 Low                Medium                          Medium
Klein                   Low                    Medium             Medium                          Low
Pääkkönen and Pakkala   Medium                 Low                High                            Medium
SRA Proposal            High                   High               High                            Medium

   Unlike the other proposals, our SRA has the requirements as the main factor to consider in
order to properly implement a Big Data ecosystem, more specifically the security requirements that
should be approached in this phase. Moreover, we have found in some proposals a lack of connection
between the different components of the architecture; our SRA clearly specifies those
relationships. Finally, our proposal has a medium abstraction level, because we do not consider
specific technology solutions or applications.
   Although there are some SRAs for Cloud environments and some of their contributions could be
useful to a Big Data environment, there are still differences that are remarkable enough to
justify creating a SRA for Big Data. For example, there are cases where the Big Data environment
is supported by a Cloud infrastructure; in that case, the Big Data RAs must consider that
possibility. In general, Cloud RAs are focused on the infrastructure, while a Big Data RA must
also contemplate the services associated with the data analysis.

5   CONCLUSION AND FUTURE WORK
A more precise Reference Architecture (RA) is a better framework to guide the use of security
mechanisms to provide a high level of security. Our Security Reference Architecture (SRA) subsumes
the published RAs, including the proposals made by NIST, Oracle, NTT, and different researchers.
   We have created a SRA described by means of UML diagrams that try to facilitate the
implementation of secure Big Data. We decided to use UML diagrams because we found a lack of
proposals where the relationship between the different components and subcomponents is precisely
defined. Also, thanks to this kind of diagram it is possible to apply different security patterns,
which are usually described as UML models. Security patterns address recurrent security problems;
we have defined some of the security patterns that can be implemented to protect the system
against threats. Our SRA emphasizes the idea of a Big Data ecosystem by implementing the system
using a Cloud Computing environment.
   We have also listed some of the threats that can be found in a Big Data ecosystem; however, a
deeper understanding of the different threats that can affect these systems is needed. We will
address this problem by creating different use cases and scenarios to identify those threats as in
the method of [14].
Once we have the threats identified, we will find, adapt or create security patterns that can
solve those problems. We consider these topics as the next steps to complete our SRA. Furthermore,
it is important to perform an analysis of the different stakeholders that interact with the Big
Data use cases.

ACKNOWLEDGMENTS
This work was funded by the SEQUOIA project (Ministerio de Economía y Competitividad and the Fondo
Europeo de Desarrollo Regional FEDER, TIN2015-63502-C3-1-R).

REFERENCES
 [1] Jacky Akoka, Isabelle Comyn-Wattiau, and Nabil Laoufi. 2017. Research on Big Data – A systematic mapping study. SI: New modeling in Big Data 54, Part 2 (Nov. 2017), 105–115. https://doi.org/10.1016/j.csi.2017.01.004
 [2] Paris Avgeriou. 2003. Describing, Instantiating and Evaluating a Reference Architecture: A Case Study. Default journal (2003).
 [3] E. Bertino. 2015. Big Data - Security and Privacy. In 2015 IEEE International Congress on Big Data. 757–761. https://doi.org/10.1109/BigDataCongress.2015.126
 [4] BlueTalon. 2016. BlueTalon Data-Centric Security Platform: Bringing Order to Data Security Chaos. (2016). http://bluetalon.com/data-centric_security/
 [5] Doug Cackett. 2013. Information Management And Big Data A Reference Architecture. Oracle, February (2013).
 [6] Min Chen, Shiwen Mao, and Yunhao Liu. 2014. Big data: A survey. Mobile Networks and Applications 19, 2 (2014), 171–209.
 [7] Big Data Working Group Cloud Security Alliance (CSA). 2013. Expanded Top Ten Big Data Security and Privacy. (April 2013). https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Expanded_Top_Ten_Big_Data_Security_and_Privacy_Challenges.pdf
 [8] Jason C. Cohen and Subrata Acharya. 2014. Towards a trusted HDFS storage platform: Mitigating threats to Hadoop infrastructures using hardware-accelerated encryption with TPM-rooted key protection. Journal of Information Security and Applications 19, 3 (2014), 224–244. https://doi.org/10.1016/j.jisa.2014.03.003
 [9] IBM Corporation. 2014. IBM Big Data & Analytics RA. (2014).
[10] NTT DATA. 2015. NTT DATA BigData Reference Architecture. (2015). http://www.nttdata.com/global/en/shared/pdf/bigdata_reference_architecture.pdf
[11] Yuri Demchenko, Cees De Laat, and Peter Membrey. 2014. Defining architecture components of the Big Data Ecosystem. In Collaboration Technologies and Systems (CTS), 2014 International Conference on. IEEE, 104–112.
[12] Eduardo B. Fernandez. 2013. Security patterns in practice: designing secure architectures using software patterns. John Wiley & Sons.
[13] Eduardo B. Fernandez, Raul Monge, and Keiko Hashizume. 2016. Building a security reference architecture for cloud systems. Requirements Engineering 21, 2 (June 2016), 225–249. https://doi.org/10.1007/s00766-014-0218-7
[14] Eduardo B. Fernandez, Nobukazu Yoshioka, and Hironori Washizaki. 2009. Modeling misuse patterns. In Availability, Reliability and Security, 2009. ARES'09. International Conference on. IEEE, 566–571.
[15] Eduardo B. Fernandez, Nobukazu Yoshioka, Hironori Washizaki, and Madiha H. Syed. 2016. Modeling and Security in Cloud Ecosystems. Future Internet 8, 2 (April 2016), 13. https://doi.org/10.3390/fi8020013
[16] ISO/IEC. 2018. ISO/IEC CD 20547-3 - Information technology – Big data reference architecture – Part 3: Reference architecture. (2018). https://www.iso.org/standard/71277.html?browse=tc
[17] M. Kaushik and A. Jain. 2014. Challenges to big data security and privacy. International Journal of Computer Science and Information Technologies (IJCSIT) 5, 3 (2014), 3042–3043.
[18] John Klein, Ross Buglak, David Blockow, Troy Wuttke, and Brenton Cooper. 2016. A reference architecture for big data systems in the national security domain. In Proceedings of the 2nd International Workshop on BIG Data Software Engineering. ACM, Austin, Texas, 51–57.
[19] Srdjan Krco, Boris Pokric, and Francois Carrez. 2014. Designing IoT architecture(s): A European perspective. In Internet of Things (WF-IoT), 2014 IEEE World Forum on. IEEE, 79–84.
[20] Guillermo Lafuente. 2015. The big data security challenge. Network Security 2015, 1 (Jan. 2015), 12–14. https://doi.org/10.1016/S1353-4858(15)70009-7
[21] Fang Liu, Jin Tong, Jian Mao, Robert Bohn, John Messina, Lee Badger, and Dawn Leaf. 2011. NIST cloud computing reference architecture. NIST special publication 500, 2011 (2011), 292.
[22] V. Mayer-Schönberger and K. Cukier. 2013. Big Data: A Revolution that Will Transform how We Live, Work, and Think. Houghton Mifflin Harcourt. https://books.google.es/books?id=uy4lh-WEhhIC
[23] Nenad Medvidovic and Richard N. Taylor. 2010. Software architecture: foundations, theory, and practice. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 2. ACM, 471–472.
[24] Microsoft. 2014. Microsoft Big Data Solution Brief. (2014). http://download.microsoft.com/download/f/a/1/fa126d6d-841b-4565-bb26-d2add4a28f24/microsoft_big_data_solution_brief.pdf
[25] Julio Moreno, Manuel A. Serrano, and Eduardo Fernández-Medina. 2016. Main Issues in Big Data Security. Future Internet 8, 3 (2016), 44.
[26] NIST NBD-WG. 2017. NIST Big Data Reference Architecture. (2017). https://bigdatawg.nist.gov/_uploadfiles/M0639_v1_9796711131.docx
[27] NIST NBD-WG. 2017. NIST Big Data Security and Privacy. (2017). https://bigdatawg.nist.gov/_uploadfiles/M0638_v1_4829021654.docx
[28] Pekka Pääkkönen and Daniel Pakkala. 2015. Reference architecture and classification of technologies, products and services for big data systems. Big Data Research 2, 4 (2015), 166–186.
[29] James Rumbaugh, Ivar Jacobson, and Grady Booch. 2004. The Unified Modeling Language Reference Manual. Pearson Higher Education.
[30] S. Sagiroglu and D. Sinanc. 2013. Big data: A review. Collaboration Technologies and Systems (CTS), 2013 International Conference on (May 2013), 42–47. https://doi.org/10.1109/CTS.2013.6567202
[31] SAP. 2016. CIO Guide to Using the SAP HANA® Platform for Big Data. (Feb. 2016).
[32] B. Saraladevi, N. Pazhaniraja, P. Victer Paul, MS Saleem Basha, and P. Dhavachelvan. 2015. Big Data and Hadoop-A study in security perspective. Procedia computer science 50 (2015), 596–601.
[33] Priya P. Sharma and Chandrakant P. Navdeti. 2014. Securing big data hadoop: a review of security issues, threats and solution. Int. J. Comput. Sci. Inf. Technol 5 (2014).
[34] SQRRL. 2014. Big Data and Data Centric Security. (2014). http://sqrrl.com/media/Data-Centric-Security-WP-final-.pdf
[35] Bhavani Thuraisingham. 2015. Big data security and privacy. In Proceedings of the 5th ACM Conference on Data and Application Security and Privacy. ACM, 279–280.
[36] Hua Wang, Xiaohong Jiang, and Georgios Kambourakis. 2015. Special issue on Security, Privacy and Trust in network-based Big Data. Information Sciences: an International Journal 318, C (2015), 48–50.
[37] Jiaqi Zhao, Lizhe Wang, Jie Tao, Jinjun Chen, Weiye Sun, Rajiv Ranjan, Joanna Kołodziej, Achim Streit, and Dimitrios Georgakopoulos. 2014. A security framework in G-Hadoop for big data computing across distributed Cloud data centres. J. Comput. System Sci. 80, 5 (2014), 994–1007. https://doi.org/10.1016/j.jcss.2014.02.006 Special Issue on Dependable and Secure Computing.