=Paper=
{{Paper
|id=Vol-2062/paper4
|storemode=property
|title=Towards a Security Reference Architecture for Big Data
|pdfUrl=https://ceur-ws.org/Vol-2062/paper04.pdf
|volume=Vol-2062
|authors=Julio Moreno,Manuel A. Serrano,Eduardo Fernandez-Medina,Eduardo B. Fernandez
|dblpUrl=https://dblp.org/rec/conf/dolap/MorenoSFF18
}}
==Towards a Security Reference Architecture for Big Data==
Julio Moreno
GSyA Research Group, University of Castilla-La Mancha
Ciudad Real, Spain
Julio.Moreno@uclm.es

Manuel A. Serrano
Alarcos Research Group, University of Castilla-La Mancha
Ciudad Real, Spain
Manuel.Serrano@uclm.es

Eduardo Fernandez-Medina
GSyA Research Group, University of Castilla-La Mancha
Ciudad Real, Spain
Eduardo.FdezMedina@uclm.es

Eduardo B. Fernandez
Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University
Boca Raton, Florida
Fernande@fau.edu
ABSTRACT
Companies are aware of the importance of Big Data, as data are essential to conduct their daily activities, but new problems arise with new technologies, as is the case with Big Data; these problems are related not only to the 3Vs of Big Data, but also to privacy and security. Security is crucial in Big Data systems, but unfortunately, security problems occur because Big Data was not initially conceived as a secure environment. Furthermore, securing such a system is difficult due to the heterogeneous configurations that a Big Data system can have. One way to solve this problem is by taking a global perspective: a Reference Architecture (RA) is a high-level abstraction of a system that can be useful in the implementation of complex systems. Several initiatives have produced an RA for Big Data, such as those from IBM, Oracle, NIST or ISO, but none of them has security as its main focus. It is widely accepted that adding elements to an RA that address threats and facilitate the definition of security requirements is a good starting point for dealing with this kind of threat, thereby converting RAs into Security Reference Architectures (SRAs). In this paper, an SRA for Big Data is defined using UML models in order to ease secure Big Data implementations and to allow security patterns to be applied to secure final Big Data systems.

© 2018 Copyright held by the owner/author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna, Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

1 INTRODUCTION
Companies are increasingly aware of the importance of Big Data [1]. For all of them, data are essential to conduct their daily activities and to help senior management achieve business objectives and, as a result, make better decisions based on the information extracted from such data [22]. Big Data implies a change compared to traditional techniques in three different ways: the amount of data (volume), the rate of generation and transmission of data (velocity), and the heterogeneity of the types of structured and unstructured data that it can handle (variety) [6]. These properties are known as the three Vs of Big Data [30].

New problems usually arise with new technologies, as is the case with Big Data. These problems are related not only to the 3 Vs of Big Data, but also to privacy and security. Big Data not only increases the scale of the problems related to privacy and security, as faced in the traditional management of security, but also adds new ones that should be addressed with different techniques and measures [36]. These security problems occur because Big Data was not initially conceived as a secure environment [33]; therefore, the main security problems are related to the specific architecture of Big Data itself, which makes it harder to protect the privacy of the data being used [7].

Obtaining an adequate level of security in Big Data can influence its implementation in an institution because of the loss of reputation the institution could suffer, or because of the financial penalties it could receive under regulations in the case of data breaches; in fact, without a security guarantee, Big Data will not reach an appropriate level of acceptance [35]. Hence, it is important to have guidance, methodologies, and mechanisms to properly implement not only the Big Data system, but also its security.

Big Data environments are very complex, so in order to address their security, we need to start from a global perspective. Security should be approached from high-level policies that can be mapped to the lower levels [13]. Different authors [2, 23] highlight that Reference Architectures (RA) have been shown to be valuable to guide security in different environments; for example, Cloud Computing [13] or Internet of Things [19].

An RA is an abstract software architecture that is based on one or more domains and has no implementation features [2]. Moreover, an RA should be expressed at a high level of abstraction, in order to be reusable, extendable, and configurable. This kind of architecture can be composed of different patterns to facilitate the implementation of the system and improve the addition of non-functional requirements [15]. By adding security patterns to control their identified threats, RAs become Security Reference Architectures (SRAs). In this way, an SRA is a high-level architecture that incorporates a set of elements facilitating the definition of security requirements and allowing a better understanding of security policies, threats, vulnerabilities, etc., and which can be used to describe a conceptual model of security for Big Data systems [21].

Among our main concerns in computer security, our current goal is to improve the security and trust of Big Data environments. In order to achieve that objective, our first step is the creation of an SRA for Big Data. To do that, we consider that security patterns have a primary role in facilitating the implementation of security mechanisms in a Big Data ecosystem. Hence, we modified the RA proposed by the National Institute of Standards and Technology (NIST) for Big Data [26] to create a richer architecture, in which the relations between the different parts of Big Data are clearly exposed in more granular detail. This enhanced RA will allow a better understanding of the Big Data ecosystem. In order to achieve that purpose, our reference architecture is specified by means of UML diagrams [29]. Finally, along with the SRA, we created a partial example of how to apply our architecture; we have considered some of the different threats that can affect a Big Data system, and how the different components that take part in addressing them can be instantiated; for example, security patterns that can help in the solution of those problems.
The rest of the paper is organized as follows. Section 2 explains the main properties of the NIST proposal of an RA for Big Data. Section 3 presents the components and structure of our SRA, together with an example of how to use security patterns to address threats in a particular Big Data project. Section 4 compares our proposal with the main Big Data RA proposals. Finally, Section 5 discusses conclusions and future work.
2 REFERENCE MODEL: NIST REFERENCE
ARCHITECTURE FOR BIG DATA
For the last several years, NIST has been defining an RA for Big Data which has received the general consensus of the industry and scientific community [26]. With the release of the latest version in August 2017, this architecture collects many different ideas and features for creating a Big Data ecosystem. This set of features was extracted from the Big Data architecture proposals made by the main companies of the sector, such as Oracle and IBM. As a result, NIST produced the RA that can be seen in Figure 1. The architecture is divided into five different components that interact with each other and have different objectives. These components are:

• System Orchestrator (SO): This is one of the most important components of a Big Data ecosystem because it is in charge of defining and integrating the required data application activities into the ecosystem. The main purpose of this component is the configuration and management of the other components of the Big Data architecture. In an enterprise, this function is typically centralized and can be mapped to the traditional role of system governor, which supervises the requirements and constraints that the Big Data system must fulfill; for example, policies, architecture, or business requirements.
• Data Provider (DP): This component is in charge of feeding the Big Data ecosystem with new data. To accomplish that goal, the Data Provider has a collection of interfaces, or services, between the Big Data system and the data sources. This set of interfaces acts as a gate between the outside world and the Big Data system.
• Big Data Application Provider (BDAP): The BDAP component provides a specific set of services along the data life cycle to meet the requirements established by the SO. Its main purpose is to encapsulate the business logic and functionality to be executed by the architecture. In a regular Big Data scenario, there are several applications executing over the same data. As data propagate through the ecosystem, they are processed and transformed in different ways to obtain valuable information. To achieve that goal, the BDAP is composed of different services or activities that can be considered the SaaS layer of the Big Data system. These activities are: collection, preparation, analytics, visualization, and access. Activities can be implemented as independent functions and deployed as stand-alone services. Furthermore, the activities can interact with the underlying Big Data Framework Provider, as well as with the Data Consumer, the DP, or even with each other.
• Big Data Framework Provider (BDFP): The BDFP component can be considered the platform implementation of the Big Data logic. It supports the activities defined in the BDAP. In general, Big Data implementations are hybrids that combine multiple technologies. It has three main activities: infrastructure (virtual or physical), platform (how the data is distributed and organized), and processing (how data will be processed to support Big Data applications). In addition, the BDFP component also provides support services for the system, such as communications or resource management.
• Data Consumer (DC): It is similar to the DP component. Usually the actor that interacts with this component is an end-user or another system. Like the DP, it is composed of a set of interfaces between the end-user and the information.

Figure 1: NIST proposal for a Big Data architecture [26]

The NIST proposal cannot be considered an SRA, but it recognizes the importance of security and privacy in a Big Data environment. To face the security problems, this architecture has a Security and Privacy Fabric that addresses the needs and solutions for this specific topic. In fact, there is a specific volume about privacy and security in Big Data [27].

From our point of view, this representation based on blocks is not expressive enough. This kind of specification is too high level in terms of abstraction: it places little emphasis on the details of the subcomponents and how they are connected. This approach can make the design and implementation of a Big Data ecosystem difficult. Following the same approach, ISO/IEC is also working on an RA for Big Data under the standard ISO/IEC 20547-3 [16]. Although it is a work in progress, it is expected to follow a similar approach to the NIST proposal.

3 A SECURITY REFERENCE ARCHITECTURE (SRA) FOR BIG DATA
In this section, we describe our SRA proposal, which is structured using the same schema and components as the guidelines proposed by NIST. We consider that if our SRA is aligned with the RA proposed by NIST, it will be easier to implement. Furthermore,
this architecture highlights the importance of implementing security solutions based on concepts of the SRA.

3.1 System Orchestrator (SO)
The main purpose of this component is the enforcement of the different requirements that the Big Data ecosystem must address. It also organizes how the requirements are connected to all the components of the architecture; in this section, we focus on the security requirements and the relation between them and the different components. Figure 2 shows the structure of our SO proposal. Due to the characteristics of this component, the security activities related to it are in general focused on the requirements and how to implement and monitor them. Those requirements must fulfill Big Data goals and should be aligned with the different business goals and company policies. In this regard, the role of the Security Administrator is crucial to ensure the observance of the security requirements. These security requirements must comply with the regulations affecting each Big Data ecosystem context. In fact, there are many other kinds of requirements that can address the needs of a Big Data ecosystem; for example, architecture, quality, or governance requirements.

There are many examples of security requirements that should be addressed in a Big Data context. Topics like data privacy and how to secure the Big Data architecture itself are the ones most addressed by researchers [25]. These problems can be tackled by using general mechanisms like user authorization and authentication, fraud detection, risk control, auditing, encryption, network access control, intrusion detection, or guaranteeing the quality and security of the data when they come from different data sources [3, 17, 20, 25, 32]. These are general security mechanisms, but they must be modified to be applied to specific types of systems, based on possible threats.

As shown in Figure 2, these security requirements can be satisfied by means of different security solutions that follow the security policies of the company and whose main objective is addressing threats to control vulnerabilities. An example of a security policy in a company is the obligation to use secure communications; this policy can give rise to a security requirement in the Big Data environment specifying that the data transfer between components must be secure. One way to approach this requirement is by using authentication methods; the implementation of this security solution can be supported by the “Role-based access control” security pattern. These security solutions should be specifically implemented in the BDAP and BDFP components. However, these solutions are not easy to implement; thus, our model uses security patterns as guidance. A security pattern is a solution to a recurrent problem that indicates how to defend against a threat, or a set of threats, in a concise and reusable way [12]. Patterns are abstract solutions that must be tailored to where they are applied. Furthermore, we can use misuse patterns [14] as a way to understand each attack and guide the application of the different security patterns that can be used to stop a threat. Moreover, security metadata can be defined as a way to facilitate the coordination and realization of security requirements. Another topic covered by our architecture is the context of the asset; for example, the security considerations of a medical record are totally different from those of a log file. It is important to evaluate the required security level for each asset.

3.2 Data Provider (DP)
The DP component creates an abstraction of the data sources considering their security metadata, if they exist. These metadata allow the DP to identify the types of access and analysis allowed by the data source and its security requirements. As we explained in Section 2, the DP has a set of interfaces. Those interfaces must consider the constraints of each data source and also the different security policies and requirements specified by the SO. In this element, there may be conflicts between the security requirements of the data source and those of the Big Data system itself. These clashes must be addressed in a way that satisfies both sides. The security and privacy issues of this component are mostly related to how to properly identify and validate the end-point inputs. The DP interfaces must evaluate the provenance of the data source. A critical challenge in the data collection process is knowing how to validate that a data source is not malicious and how to filter out those that are [7].

In our SRA, the interfaces are connected with the Collector service of the BDAP, which is described in the next subsection. Figure 3 represents the DP component with its interfaces. In general, the elements that compose a data source include: the data itself, which can be structured, semi-structured, or unstructured; the security requirements of the data source; and the security metadata of the data source. Those elements are not represented in the diagram because we consider the data source an external agent of the Big Data system. Still, it is important to know them in order to apply their constraints.

3.3 Big Data Application Provider (BDAP)
The BDAP component has the objective of meeting the requirements established by the SO, including its security and privacy requirements. To achieve that goal, the BDAP is composed of different services or activities that can be considered the SaaS (Software as a Service) layer of the Big Data ecosystem; in our case, we assume that, in general, Big Data is implemented on a Cloud platform, which will affect how the SRA is defined in the BDFP component. Figure 4 shows the different services that constitute this component, and also the BDAP Security Solution that must map the SO security solutions to these stages; for example, authorization may control here who can apply which operations to perform data analysis.

As represented in the diagram, not all the activities can communicate with each other; there is a sequential order of execution. Moreover, some of these activities are not mandatory in a Big Data ecosystem. The preparation step has the purpose of validating, cleaning and storing the data, but in a real-time scenario where the data should be analysed as soon as they get into the system, this activity might be skipped. Something similar happens with the visualization step: if the data consumer is not a human end-user but another system, like a data warehouse or even another Big Data ecosystem, this activity may not be necessary.

Nevertheless, the other three activities are basic in a Big Data ecosystem: the collection activity acts like an ETL (Extract, Transform, and Load) process and combines sets of data of similar structure with the objective of unifying them; the analysis step includes a set of techniques to obtain valuable knowledge from data, for example, MapReduce algorithms; and finally, the access activity has the purpose of communicating with the DC, acting like an interface between DC and visualization and analytics
Figure 2: System Orchestrator (SO) diagram
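As a rough illustration of the chain depicted in Figure 2 (security policy, security requirement, security solution, security pattern), the following Python sketch encodes the secure-communications example given for the SO. The class names, field names and the helper `patterns_for` are assumptions made for this sketch; they are not taken from the UML model of the SRA.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SecurityPattern:
    name: str


@dataclass
class SecuritySolution:
    description: str
    patterns: List[SecurityPattern]
    implemented_in: List[str]  # component abbreviations such as "BDAP"


@dataclass
class SecurityRequirement:
    statement: str
    solutions: List[SecuritySolution] = field(default_factory=list)


@dataclass
class SecurityPolicy:
    statement: str
    requirements: List[SecurityRequirement] = field(default_factory=list)


def patterns_for(policy: SecurityPolicy) -> List[str]:
    """Names of all patterns reachable from a policy, in order, without duplicates."""
    names: List[str] = []
    for requirement in policy.requirements:
        for solution in requirement.solutions:
            for pattern in solution.patterns:
                if pattern.name not in names:
                    names.append(pattern.name)
    return names


# The example from the text: a company policy leads to a requirement,
# which is satisfied by a solution guided by a security pattern.
rbac = SecurityPattern("Role-based access control")
solution = SecuritySolution(
    description="Use authentication methods for data transfers",
    patterns=[rbac],
    implemented_in=["BDAP", "BDFP"],
)
requirement = SecurityRequirement(
    statement="Data transfer between components must be secure",
    solutions=[solution],
)
policy = SecurityPolicy(
    statement="Communications must be secure",
    requirements=[requirement],
)

print(patterns_for(policy))
```

Keeping the four concepts as separate types mirrors the idea that a high-level policy is refined step by step until it reaches the components that implement it.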
activities. The relation between those different activities is represented in Figure 4 by dotted lines, because it is a temporary usage relation.

Figure 3: Data Provider (DP) diagram

3.4 Big Data Framework Provider (BDFP)
In general, the BDFP component is composed of a set of clusters which, in turn, are composed of nodes. Those nodes can be deployed by means of Virtual Machines or Containers, which interact with the hardware itself and the OS.

The BDFP component in NIST is very abstract, with a lack of detail in the subcomponents needed to perform its processes. Therefore, our proposal places more emphasis on the different elements and how they are connected. Figure 5 depicts the different subcomponents of the BDFP. Our SRA highlights the idea of a Big Data ecosystem with the possibility of implementing the system with a Cloud environment and visualization techniques.

In regard to security and privacy issues, in this component the activities should be focused on the encryption and key management of the data, the isolation and containerization of process execution, authorization, authentication, audit logging, and how to secure the storage and the network. Those security issues should be addressed by means of the security solutions defined in the SO, which can be implemented at this level as BDFP security solutions. The SO security solutions are now mapped to data protection, including the application of cryptography and specialized authorization mechanisms [8, 37].

3.5 Data Consumer (DC)
The DC component is, similarly to the DP, composed of a set of interfaces. The interaction could include interactive visualization, report creation, or data drilling using business intelligence techniques. It is important to highlight that these interfaces must address the authorization and authentication functions, so that the DC matches the metadata related to the end-user with the security requirements and policies of the information.

Finally, Figure 6 summarizes our complete SRA for Big Data. In this figure, the relationships between the different components of the architecture can be seen in perspective. This figure is important to better understand the example presented in the following subsection.

3.6 Examples of Application of Security Patterns
As a way to show the usefulness of our SRA, we explain an example of how to employ security patterns using our architecture. We created the example by identifying some of the threats that
Figure 4: Big Data Application Provider (BDAP) diagram
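The sequential order of the BDAP activities shown in Figure 4, together with the two cases in which an activity may be skipped, can be sketched as follows. This is a minimal Python illustration: the function name `plan_activities` and its two flags are hypothetical, and only the activity names and the skipping rules are taken from the description of the BDAP.

```python
from typing import List

# Activities in their order of execution, as listed for the BDAP.
ACTIVITY_ORDER = ["collection", "preparation", "analytics", "visualization", "access"]

# Only these two activities are described as non-mandatory.
OPTIONAL = {"preparation", "visualization"}


def plan_activities(real_time: bool, consumer_is_human: bool) -> List[str]:
    """Return the activities to run, in order, for one scenario."""
    skipped = set()
    if real_time:
        # Data must be analysed as soon as it enters the system.
        skipped.add("preparation")
    if not consumer_is_human:
        # Another system (e.g. a data warehouse) needs no visualization.
        skipped.add("visualization")
    assert skipped <= OPTIONAL
    return [activity for activity in ACTIVITY_ORDER if activity not in skipped]


print(plan_activities(real_time=False, consumer_is_human=True))
print(plan_activities(real_time=True, consumer_is_human=False))
```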
Figure 5: Big Data Framework Provider (BDFP) diagram
can be found in the different activities of the BDAP component. A systematic method for the enumeration of threats is shown in [12]. Those threats can be addressed by means of security patterns, which, in some cases, should be modified from general security patterns to meet the inherent features of Big Data. The modification of these patterns, and the creation of new ones if needed, is beyond the scope of this paper and is considered future work. Table 1 summarizes some of the threats of each activity and the general patterns that can be applied to address them. Those patterns are defined in [12].

As a way to better understand how to integrate the different components of our SRA and the security patterns, we will define how the threat TC1 can be addressed by using security patterns. We use an object diagram to explain it; this diagram is shown in Figure 7. In this scenario, the stored data are the main asset to protect. This asset has a vulnerability: it has no protection. This vulnerability could be exploited by a threat like TC1. In order to prevent that situation, it is necessary to implement a security solution. To facilitate the implementation of the solution, two security patterns can be used: Role-based access control and Authentication. However, this security solution will still have a high abstraction level because it is defined in the SO component. Hence, a low-level implementation of the security solution should be approached at the BDAP level. In this case, TC1 can affect the different services provided by the BDAP, that
Figure 6: Big Data SRA complete diagram
{| class="wikitable"
|+ Table 1: Identified threats and security patterns for the different activities
! ID !! Activity !! Threat !! Security Pattern
|-
| TC1 || Common to all the activities || Data modified || Authentication, Role-based access control
|-
| TC2 || Common to all the activities || Data destroyed || Authentication, Role-based access control
|-
| TC3 || Common to all the activities || Data illegally read || Encryption, Role-based access control, Authentication
|-
| TC4 || Common to all the activities || Unapproved change in activity function || Logger and Auditor, Controlled access session, Role-based access control, Authentication
|-
| TCo1 || Collection || Malicious data source || Authentication
|-
| TP1 || Preparation || Malicious filter || Logger and Auditor, Controlled access session, Role-based access control, Authentication
|-
| TA1 || Analysis || Infer PII* from anonymized data || Encryption, Logger and Auditor, Multilevel security, Role-based access control, Authentication
|-
| TA2 || Analysis || Malicious analysis algorithms || Logger and Auditor, Controlled access session, Role-based access control, Authentication
|-
| TV1 || Visualization || PII* exposed due to high graphic granularity || Multilevel security, Authentication, Role-based access control
|-
| TAc1 || Access || Several malicious access || Authentication, Role-based access control
|}
*PII – Personal Identifiable Information
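For tooling purposes, Table 1 can be transcribed into a simple lookup structure. The following Python sketch is one possible encoding: the threat IDs, activities and pattern names are copied from the table, while the dictionary layout, the `"Common"` marker and the helper names `threats_for` and `patterns_for` are assumptions of this sketch.

```python
from typing import Dict, List, Tuple

# threat ID -> (activity, threat, security patterns), transcribed from Table 1.
# "Common" stands for "Common to all the activities".
THREATS: Dict[str, Tuple[str, str, List[str]]] = {
    "TC1": ("Common", "Data modified",
            ["Authentication", "Role-based access control"]),
    "TC2": ("Common", "Data destroyed",
            ["Authentication", "Role-based access control"]),
    "TC3": ("Common", "Data illegally read",
            ["Encryption", "Role-based access control", "Authentication"]),
    "TC4": ("Common", "Unapproved change in activity function",
            ["Logger and Auditor", "Controlled access session",
             "Role-based access control", "Authentication"]),
    "TCo1": ("Collection", "Malicious data source", ["Authentication"]),
    "TP1": ("Preparation", "Malicious filter",
            ["Logger and Auditor", "Controlled access session",
             "Role-based access control", "Authentication"]),
    "TA1": ("Analysis", "Infer PII from anonymized data",
            ["Encryption", "Logger and Auditor", "Multilevel security",
             "Role-based access control", "Authentication"]),
    "TA2": ("Analysis", "Malicious analysis algorithms",
            ["Logger and Auditor", "Controlled access session",
             "Role-based access control", "Authentication"]),
    "TV1": ("Visualization", "PII exposed due to high graphic granularity",
            ["Multilevel security", "Authentication", "Role-based access control"]),
    "TAc1": ("Access", "Several malicious access",
             ["Authentication", "Role-based access control"]),
}


def threats_for(activity: str) -> List[str]:
    """IDs of the threats that apply to an activity, including the common ones."""
    return [tid for tid, (act, _, _) in THREATS.items() if act in ("Common", activity)]


def patterns_for(activity: str) -> List[str]:
    """Sorted, de-duplicated pattern names covering every threat of an activity."""
    return sorted({p for tid in threats_for(activity) for p in THREATS[tid][2]})


print(threats_for("Collection"))
print(patterns_for("Collection"))
```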
is the reason why the security solution should be implemented there and not in another component.

Figure 7: Using security patterns to address a specific threat

Furthermore, we will describe how to create an instance of the two security patterns used to secure the Collector subcomponent (the Authentication and Role-based access control security patterns) by means of a partial example. In this example, we focus on a Big Data system whose objective is to process tweets from the Twitter platform to analyse the general sentiment about a product. Figure 8 shows the object diagram for this example. The main component is what we want to protect, in this case the tweets that have been obtained to be analysed.

The Authentication pattern allows us to verify the identity of the user by using a proof of identity and an authenticator. On the other hand, as its name indicates, one of the most important steps in implementing Role-based access control is to define the different roles. In this case, we have defined four roles: the administrator of the Big Data system, the data scientist, the end user, and the data owner. As we explained before, this example is focused on the Collector phase, so the defined rights of the roles must consider this situation; for example, in this phase the end user should not have any rights over the data. Figure 8 shows the different functions that each user can perform over the data according to their rights.

4 COMPARISON WITH OTHER PROPOSALS
There are not many reference architectures for Big Data systems; if we focus on architectures whose goal is security, there are even fewer. However, different authors and organizations have proposed different reference architectures for Big Data. In this section, we describe some of the most relevant proposals.

Demchenko et al. [11] propose a Big Data Framework Architecture that establishes the data lifecycle in a Big Data ecosystem. As in the NIST approach, they use a block representation, but with more detail in the relationships between the different components of the architecture. However, they address security in a very sketchy way and as an isolated feature, not really connected to the other components. In [28] the authors propose a complete architecture in terms of the relations between the different components; however, we found a lack of consideration given to security and privacy aspects. Klein et al. propose in [18] a specific reference architecture for Big Data in the national security domain. Their architecture is very similar to the one proposed by NIST. Our goal is to obtain a better abstraction of the architecture, but it is still interesting how they address some concerns by using solution patterns. They highlight the importance of having a specific domain for the requirements. In our
Figure 8: Application of Authentication and Role-based access control patterns
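To make the example of Figure 8 more concrete, the following Python sketch combines an authenticator with a role-to-rights table for the Collector phase. It is a hedged illustration and not the authors' implementation: the four roles and the fact that the end user has no rights in this phase come from the text, whereas the operation names, the rights of the other three roles, the password-based proof of identity and all class and function names are assumptions.

```python
import hashlib
import hmac
import os
from typing import Dict, Set, Tuple

# Rights per role in the Collector phase. Only the empty set for "end_user"
# is stated in the text; every other entry is an illustrative assumption.
COLLECTOR_RIGHTS: Dict[str, Set[str]] = {
    "administrator": {"configure_collector"},
    "data_scientist": {"read_tweets"},
    "data_owner": {"read_tweets", "delete_tweets"},
    "end_user": set(),
}


class Authenticator:
    """Verifies a proof of identity (here a password) and returns the user's role."""

    def __init__(self) -> None:
        # user -> (salt, derived key, role)
        self._records: Dict[str, Tuple[bytes, bytes, str]] = {}

    @staticmethod
    def _derive(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def register(self, user: str, password: str, role: str) -> None:
        if role not in COLLECTOR_RIGHTS:
            raise ValueError(f"unknown role: {role}")
        salt = os.urandom(16)
        self._records[user] = (salt, self._derive(password, salt), role)

    def authenticate(self, user: str, password: str) -> str:
        record = self._records.get(user)
        if record is None:
            raise PermissionError("authentication failed")
        salt, expected, role = record
        if not hmac.compare_digest(expected, self._derive(password, salt)):
            raise PermissionError("authentication failed")
        return role


def is_allowed(role: str, operation: str) -> bool:
    """Role-based check: an operation is allowed only if the role holds that right."""
    return operation in COLLECTOR_RIGHTS.get(role, set())


auth = Authenticator()
auth.register("alice", "correct horse", "data_scientist")
auth.register("bob", "battery staple", "end_user")

print(is_allowed(auth.authenticate("alice", "correct horse"), "read_tweets"))
print(is_allowed(auth.authenticate("bob", "battery staple"), "read_tweets"))
```

Authentication and authorization are kept separate on purpose: the authenticator only establishes who the user is and which role they hold, and the rights table alone decides what that role may do with the tweets.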
case, requirements, and specifically the ones related to security, are the main part of the SO component.

Sqrrl [34] and BlueTalon [4] propose a Big Data model focused on data-centric security. Their purpose is to embed security information within the data itself. Sqrrl emphasizes access control on each field of data, and to do that it uses a layered architecture built around the value or sensitivity of the data. BlueTalon, on the other hand, includes in its proposal the concept of data lakes, storage repositories that hold a huge amount of raw data until it is needed. There are other proposals made by the main IT companies like Oracle [5], NTT Data [10], IBM [9], Microsoft [24] or SAP [31]. Table 2 summarizes these RAs and compares them with our SRA proposal. The criteria were selected based on a previous systematic mapping study that we carried out on Big Data security concerns [25]. As a side effect of this work, we detected some characteristics that are usually not considered in the different proposals and could be important to define an SRA.

{| class="wikitable"
|+ Table 2: Comparison between RAs
! RA proposal !! Requirements concern !! Security concern !! Connection between components !! Abstraction level
|-
| NIST || Medium || High || Low || High
|-
| Demchenko || Medium || Low || Medium || Medium
|-
| Klein || Low || Medium || Medium || Low
|-
| Pääkkönen and Pakkala || Medium || Low || High || Medium
|-
| SRA proposal || High || High || High || Medium
|}

Unlike the other proposals, our SRA has the requirements as the main factor to consider in order to properly implement a Big Data ecosystem, more specifically the security requirements that should be approached in this phase. Moreover, we have found in some proposals a lack of connection between the different components of the architecture; our SRA clearly specifies those relationships. Finally, our proposal has a medium abstraction level, because we do not consider specific technology solutions or applications.

Although there are some SRAs for Cloud environments and some of their contributions could be useful in a Big Data environment, there are still differences that are remarkable enough to justify creating an SRA for Big Data. For example, there are some cases where the Big Data environment is supported by a Cloud infrastructure; in that case, the Big Data RA must consider that possibility. In general, Cloud RAs are focused on the infrastructure, while a Big Data RA must also contemplate the services associated with the data analysis.

5 CONCLUSION AND FUTURE WORK
A more precise Reference Architecture (RA) is a better framework to guide the use of security mechanisms to provide a high level of security. Our Security Reference Architecture (SRA) subsumes the published RAs, including the proposals made by NIST, Oracle, NTT, and different researchers.

We have created an SRA described by means of UML diagrams that try to facilitate the implementation of secure Big Data. We decided to use UML diagrams because we found a lack of proposals where the relationship between the different components and subcomponents is precisely defined. Also, thanks to this kind of diagram it is possible to apply different security patterns, which are usually described as UML models. Security patterns address recurrent security problems; we have defined some of the security patterns that can be implemented to protect the system against threats. Our SRA emphasizes the idea of a Big Data ecosystem by implementing the system using a Cloud Computing environment.

We have also listed some of the threats that can be found in a Big Data ecosystem; however, a deeper understanding of the different threats that can affect these systems is needed. We will address this problem by creating different use cases and scenarios to identify those threats as in the method of [14].
Once we have the threats identified, we will find, adapt or create security patterns that can solve those problems. We consider these topics as the next steps to complete our SRA. Furthermore, it is important to perform an analysis of the different stakeholders that interact with the Big Data use cases.

ACKNOWLEDGMENTS
This work was funded by the SEQUOIA project (Ministerio de Economía y Competitividad and the Fondo Europeo de Desarrollo Regional FEDER, TIN2015-63502-C3-1-R).

REFERENCES
[1] Jacky Akoka, Isabelle Comyn-Wattiau, and Nabil Laoufi. 2017. Research on Big Data – A systematic mapping study. SI: New modeling in Big Data 54, Part 2 (Nov. 2017), 105–115. https://doi.org/10.1016/j.csi.2017.01.004
[2] Paris Avgeriou. 2003. Describing, Instantiating and Evaluating a Reference Architecture: A Case Study. Default journal (2003).
[25] Julio Moreno, Manuel A. Serrano, and Eduardo Fernández-Medina. 2016. Main Issues in Big Data Security. Future Internet 8, 3 (2016), 44.
[26] NIST NBD-WG. 2017. NIST Big Data Reference Architecture. (2017). https://bigdatawg.nist.gov/_uploadfiles/M0639_v1_9796711131.docx
[27] NIST NBD-WG. 2017. NIST Big Data Security and Privacy. (2017). https://bigdatawg.nist.gov/_uploadfiles/M0638_v1_4829021654.docx
[28] Pekka Pääkkönen and Daniel Pakkala. 2015. Reference architecture and classification of technologies, products and services for big data systems. Big Data Research 2, 4 (2015), 166–186.
[29] James Rumbaugh, Ivar Jacobson, and Grady Booch. 2004. The Unified Modeling Language Reference Manual. Pearson Higher Education.
[30] S. Sagiroglu and D. Sinanc. 2013. Big data: A review. Collaboration Technologies and Systems (CTS), 2013 International Conference on (May 2013), 42–47. https://doi.org/10.1109/CTS.2013.6567202
[31] SAP. 2016. CIO Guide to Using the SAP HANA® Platform for Big Data. (Feb. 2016).
[32] B. Saraladevi, N. Pazhaniraja, P. Victer Paul, MS Saleem Basha, and P. Dhavachelvan. 2015. Big Data and Hadoop-A study in security perspective. Procedia computer science 50 (2015), 596–601.
[33] Priya P. Sharma and Chandrakant P. Navdeti. 2014. Securing big data hadoop: a review of security issues, threats and solution. Int. J. Comput. Sci. Inf. Technol 5 (2014).
[34] SQRRL. 2014. Big Data and Data Centric Security. (2014). http://sqrrl.com/media/Data-Centric-Security-WP-final-.pdf
[3] E. Bertino. 2015. Big Data - Security and Privacy. In 2015 IEEE International
Congress on Big Data. 757–761. https://doi.org/10.1109/BigDataCongress.2015. [35] Bhavani Thuraisingham. 2015. Big data security and privacy. In Proceedings of
126 the 5th ACM Conference on Data and Application Security and Privacy. ACM,
[4] BlueTalon. 2016. BlueTalon Data-Centric Security Platform: Bringing Order 279–280.
to Data Security Chaos. (2016). http://bluetalon.com/data-centric_security/ [36] Hua Wang, Xiaohong Jiang, and Georgios Kambourakis. 2015. Special issue on
[5] Doug Cackett. 2013. Information Management And Big Data A Reference Security, Privacy and Trust in network-based Big Data. Information Sciences:
Architecture. Oracle, February (2013). an International Journal 318, C (2015), 48–50.
[6] Min Chen, Shiwen Mao, and Yunhao Liu. 2014. Big data: A survey. Mobile [37] Jiaqi Zhao, Lizhe Wang, Jie Tao, Jinjun Chen, Weiye Sun, Rajiv Ranjan, Joanna
Networks and Applications 19, 2 (2014), 171–209. Kołodziej, Achim Streit, and Dimitrios Georgakopoulos. 2014. A security
[7] Big Data Working Group Cloud Security Alliance (CSA). 2013. Ex- framework in G-Hadoop for big data computing across distributed Cloud data
panded Top Ten Big Data Security and Privacy. (April 2013). centres. J. Comput. System Sci. 80, 5 (2014), 994 – 1007. https://doi.org/10.
https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Expanded_ 1016/j.jcss.2014.02.006 Special Issue on Dependable and Secure Computing.
Top_Ten_Big_Data_Security_and_Privacy_Challenges.pdf
[8] Jason C. Cohen and Subrata Acharya. 2014. Towards a trusted HDFS stor-
age platform: Mitigating threats to Hadoop infrastructures using hardware-
accelerated encryption with TPM-rooted key protection. Journal of Informa-
tion Security and Applications 19, 3 (2014), 224 – 244. https://doi.org/10.1016/
j.jisa.2014.03.003
[9] IBM Corporation. 2014. IBM Big Data & Analytics RA. (2014).
[10] NTT DATA. 2015. NTT DATA BigData Reference Architecture. (2015). http://
www.nttdata.com/global/en/shared/pdf/bigdata_reference_architecture.pdf
[11] Yuri Demchenko, Cees De Laat, and Peter Membrey. 2014. Defining architec-
ture components of the Big Data Ecosystem. In Collaboration Technologies and
Systems (CTS), 2014 International Conference on. IEEE, 104–112.
[12] Eduardo B. Fernandez. 2013. Security patterns in practice: designing secure
architectures using software patterns. John Wiley & Sons.
[13] Eduardo B. Fernandez, Raul Monge, and Keiko Hashizume. 2016. Building a
security reference architecture for cloud systems. Requirements Engineering
21, 2 (June 2016), 225–249. https://doi.org/10.1007/s00766-014-0218-7
[14] Eduardo B. Fernandez, Nobukazu Yoshioka, and Hironori Washizaki. 2009.
Modeling misuse patterns. In Availability, Reliability and Security, 2009.
ARES’09. International Conference on. IEEE, 566–571.
[15] Eduardo B. Fernandez, Nobukazu Yoshioka, Hironori Washizaki, and Madiha H.
Syed. 2016. Modeling and Security in Cloud Ecosystems. Future Internet 8, 2
(April 2016), 13. https://doi.org/10.3390/fi8020013
[16] ISO/IEC. 2018. ISO/IEC CD 20547-3 - Information technology – Big data
reference architecture – Part 3: Reference architecture. (2018). https://www.
iso.org/standard/71277.html?browse=tc
[17] M. Kaushik and A. Jain. 2014. Challenges to big data security and privacy.
International Journal of Computer Science and Information Technologies (IJCSIT)
5, 3 (2014), 3042–3043.
[18] John Klein, Ross Buglak, David Blockow, Troy Wuttke, and Brenton Cooper.
2016. A reference architecture for big data systems in the national security
domain. In Proceedings of the 2nd International Workshop on BIG Data Software
Engineering. ACM, Austin, Texas, 51–57.
[19] Srdjan Krco, Boris Pokric, and Francois Carrez. 2014. Designing IoT archi-
tecture (s): A European perspective. In Internet of Things (WF-IoT), 2014 IEEE
World Forum on. IEEE, 79–84.
[20] Guillermo Lafuente. 2015. The big data security challenge. Network Security
2015, 1 (Jan. 2015), 12–14. https://doi.org/10.1016/S1353-4858(15)70009-7
[21] Fang Liu, Jin Tong, Jian Mao, Robert Bohn, John Messina, Lee Badger, and
Dawn Leaf. 2011. NIST cloud computing reference architecture. NIST special
publication 500, 2011 (2011), 292.
[22] V. Mayer-Schönberger and K. Cukier. 2013. Big Data: A Revolution that Will
Transform how We Live, Work, and Think. Houghton Mifflin Harcourt. https:
//books.google.es/books?id=uy4lh-WEhhIC
[23] Nenad Medvidovic and Richard N. Taylor. 2010. Software architecture: founda-
tions, theory, and practice. In Proceedings of the 32nd ACM/IEEE International
Conference on Software Engineering-Volume 2. ACM, 471–472.
[24] Microsoft. 2014. Microsoft Big Data Solution Brief. (2014). http://download.
microsoft.com/download/f/a/1/fa126d6d-841b-4565-bb26-d2add4a28f24/
microsoft_big_data_solution_brief.pdf