Towards a Security Reference Architecture for Big Data

Julio Moreno
GSyA Research Group, University of Castilla-La Mancha
Ciudad Real, Spain
Julio.Moreno@uclm.es

Manuel A. Serrano
Alarcos Research Group, University of Castilla-La Mancha
Ciudad Real, Spain
Manuel.Serrano@uclm.es

Eduardo Fernandez-Medina
GSyA Research Group, University of Castilla-La Mancha
Ciudad Real, Spain
Eduardo.FdezMedina@uclm.es

Eduardo B. Fernandez
Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University
Boca Raton, Florida
Fernande@fau.edu

ABSTRACT
Companies are aware of the importance of Big Data, as data are essential to conduct their daily activities, but new problems arise with new technologies, as is the case with Big Data; these problems are related not only to the 3Vs of Big Data, but also to privacy and security. Security is crucial in Big Data systems, but unfortunately, security problems occur because Big Data was not initially conceived as a secure environment. Furthermore, securing these systems is difficult due to the heterogeneous configurations that a Big Data system can have. One way to address this problem is to adopt a global perspective; in this sense, a Reference Architecture (RA), which is a high-level abstraction of a system, can be useful in the implementation of complex systems. Several initiatives have been made to obtain an RA for Big Data, like those from IBM, Oracle, NIST or ISO, but none of them has security as its main focus. It is widely accepted that adding elements to an RA to address threats and to facilitate the definition of security requirements is a good starting point for solving these kinds of threats and, in this way, for converting RAs into Security Reference Architectures (SRAs). In the current paper, an SRA for Big Data is defined using UML models, with the aim of easing secure Big Data implementations and allowing security patterns to be applied in order to secure the final Big Data systems.

© 2018 Copyright held by the owner/author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna, Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

1 INTRODUCTION
Companies are increasingly aware of the importance of Big Data [1]. For all of them, data are essential to conduct their daily activities and to help senior management achieve business objectives and, as a result, make better decisions based on the information extracted from such data [22]. Big Data implies a change compared to traditional techniques in three different ways: the amount of data (volume), the rate of generation and transmission of data (velocity), and the heterogeneity of the types of structured and unstructured data that it can handle (variety) [6]. These properties are known as the three Vs of Big Data [30].

New problems usually arise with new technologies, as is the case with Big Data. These problems are related not only to the three Vs of Big Data, but also to privacy and security. Big Data not only increases the scale of the privacy and security problems faced in traditional security management, but also adds new ones that should be addressed with different techniques and measures [36]. These security problems occur because Big Data was not initially conceived as a secure environment [33]; therefore, the main security problems are related to the specific architecture of Big Data itself, which makes it harder to protect the privacy of the data being used [7].

Obtaining an adequate level of security in Big Data can influence its adoption in an institution, because of the loss of reputation the institution could suffer or the financial penalties it could receive under regulations in the case of data breaches; in fact, without a security guarantee, Big Data will not reach an appropriate level of acceptance [35]. Hence, it is important to have guidance, methodologies, and mechanisms to properly implement not only the Big Data system, but also its security.

Big Data environments are very complex, so in order to address their security, we need to start from a global perspective. Security should be approached from high-level policies that can be mapped to the lower levels [13]. Different authors [2, 23] highlight that Reference Architectures (RAs) have been shown to be valuable for guiding security in different environments; for example, Cloud Computing [13] or the Internet of Things [19].

An RA is an abstract software architecture that is based on one or more domains and has no implementation features [2]. Moreover, an RA should be expressed at a high level of abstraction in order to be reusable, extendable, and configurable. This kind of architecture can be composed of different patterns to facilitate the implementation of the system and to ease the addition of non-functional requirements [15]. By adding security patterns to control their identified threats, RAs become Security Reference Architectures (SRAs). In this way, an SRA is a high-level architecture that incorporates a set of elements that facilitate the definition of security requirements and allow a better understanding of security policies, threats, vulnerabilities, etc., and which can be used to describe a conceptual model of security for Big Data systems [21].

Among our main concerns in computer security, our current goal is to improve the security and trust of Big Data environments. To achieve that objective, our first step is the creation of an SRA for Big Data. To do that, we consider that security patterns have a fundamental role in facilitating the implementation of security mechanisms in a Big Data ecosystem. Hence, we modified the RA proposed by the National Institute of Standards and Technology (NIST) for Big Data [26] to create a richer architecture, in which the relations between the different parts of Big Data are exposed in more granular detail. This enhanced RA will allow a better understanding of the Big Data ecosystem. To achieve that purpose, our reference architecture is specified by means of UML diagrams [29]. Finally, along with the SRA, we created a partial example of how to apply our architecture. In it, we considered some of the threats that can affect a Big Data system and how the components that take part in addressing them can be instantiated, for example, security patterns that can help solve those problems.

We organize the content of the paper as follows. Section 2 explains the main properties of the NIST proposal of an RA for Big Data. Section 3 presents the components and structure of our SRA, together with an example of how to use security patterns to address threats in a particular Big Data project. Section 4 compares our proposal with the main Big Data RA proposals. Finally, Section 5 discusses conclusions and future work.

2 REFERENCE MODEL: NIST REFERENCE ARCHITECTURE FOR BIG DATA
Over the last several years, NIST has defined an RA for Big Data which has received the general consensus of the industry and the scientific community [26]. With the release of its latest version in August 2017, this architecture collects many different ideas and features for creating a Big Data ecosystem. This set of features was extracted from the Big Data architecture proposals made by the main companies of the sector, such as Oracle and IBM. As a result, NIST produced the RA shown in Figure 1. The architecture is divided into five different components that interact with each other and have different objectives. These components are:

• System Orchestrator (SO): This is one of the most important components of a Big Data ecosystem because it is in charge of defining and integrating the required data application activities into the ecosystem. The main purpose of this component is the configuration and management of the other components of the Big Data architecture. In an enterprise, this function is typically centralized and can be mapped to the traditional role of system governor, which supervises the requirements and constraints that the Big Data system must fulfill; for example, policies, architecture, or business requirements.
• Data Provider (DP): This component oversees feeding the Big Data ecosystem with new data. To accomplish that goal, the Data Provider has a collection of interfaces, or services, between the Big Data system and the data sources. This set of interfaces acts as a gate between the outside world and the Big Data system.
• Big Data Application Provider (BDAP): The BDAP component provides a specific set of services along the data life cycle to meet the requirements established by the SO. Its main purpose is to encapsulate the business logic and functionality to be executed by the architecture. In a regular Big Data scenario, several applications execute over the same data. As data propagate through the ecosystem, they are processed and transformed in different ways to obtain valuable information. To achieve that goal, the BDAP is composed of different services or activities that can be considered the SaaS layer of the Big Data system. These activities are: collection, preparation, analytics, visualization, and access. Activities can be implemented as independent functions and deployed as stand-alone services. Furthermore, the activities can interact with the underlying Big Data Framework Provider, with the Data Consumer, with the DP, or even with each other.
• Big Data Framework Provider (BDFP): The BDFP component can be considered the platform implementation of the Big Data logic. It supports the activities defined in the BDAP. In general, Big Data implementations are hybrids that combine multiple technologies. The BDFP has three main activities: infrastructure (virtual or physical), platform (how the data is distributed and organized), and processing (how data will be processed to support Big Data applications). In addition, the BDFP component provides support services for the system, such as communications or resource management.
• Data Consumer (DC): This component is similar to the DP. Usually the actor that interacts with it is an end-user or another system. Like the DP, it is composed of a set of interfaces between the end-user and the information.

Figure 1: NIST proposal for a Big Data architecture [26]

The NIST proposal cannot be considered an SRA, but it recognizes the importance of security and privacy in a Big Data environment. To face the security problems, this architecture has a Security and Privacy Fabric that addresses the needs and solutions related to this topic. In fact, there is a specific volume about privacy and security in Big Data [27].

From our point of view, this block-based representation is not expressive enough. This kind of specification is too high level in terms of abstraction: it places little emphasis on the details of the subcomponents and how they are connected. This approach can make the design and implementation of a Big Data ecosystem difficult. Following the same approach, the ISO/IEC organization is also working on the creation of an RA for Big Data under the standard ISO/IEC 20547-3 [16]. Although it is a work in progress, it is expected to follow an approach similar to the NIST proposal.

3 A SECURITY REFERENCE ARCHITECTURE (SRA) FOR BIG DATA
In this section, we describe our SRA proposal, which is structured using the same schema and components as the guidelines proposed by NIST. We consider that if our SRA is aligned with the RA proposed by NIST, it will be easier to implement. Furthermore, this architecture highlights the importance of implementing security solutions based on the concepts of the SRA.

3.1 System Orchestrator (SO)
The main purpose of this component is the enforcement of the different requirements that the Big Data ecosystem must address. It also organizes how the requirements are connected to all the components of the architecture; in this section, we focus on the security requirements and on the relation between them and the different components. Figure 2 shows the structure of our SO proposal.

Due to the characteristics of this component, the security activities related to it are in general focused on the requirements and on how to implement and monitor them. Those requirements must fulfill Big Data goals and should be aligned with the different business goals and company policies. In this regard, the role of the Security Administrator is crucial to ensure the observance of the security requirements. These security requirements must comply with the regulations affecting each Big Data ecosystem context. In fact, there are many other kinds of requirements that can address the needs of a Big Data ecosystem; for example, architecture, quality, or governance requirements.

There are many examples of security requirements that should be addressed in a Big Data context. Topics like data privacy and how to secure the Big Data architecture itself are the most addressed by researchers [25]. These problems can be tackled by using general mechanisms like user authorization and authentication, fraud detection, risk control, auditing, encryption, network access control, intrusion detection, or guaranteeing the quality and security of the data when they come from different data sources [3, 17, 20, 25, 32]. These are general security mechanisms, but they must be modified to be applied to specific types of systems, based on possible threats.

As shown in Figure 2, these security requirements can be satisfied by means of different security solutions that follow the security policies of the company and whose main objective is addressing threats to control vulnerabilities. An example of a security policy in a company could be the obligation to use secure communications; this policy can give rise to a security requirement in the Big Data environment specifying that data transfer between components must be secure. One way to approach this requirement is by using authentication methods; the implementation of this security solution can be guided by the "Role-based access control" security pattern. These security solutions should be specifically implemented in the BDAP and BDFP components. However, these solutions are not easy to implement; thus, our model uses security patterns as guidance. A security pattern is a solution to a recurrent problem that indicates how to defend against a threat, or a set of threats, in a concise and reusable way [12]. Patterns are abstract solutions that must be tailored to where they are applied. Furthermore, we can use misuse patterns [14] as a way to understand each attack and to guide the application of the different security patterns that can be used to stop a threat. Moreover, security metadata can be defined to facilitate the coordination and realization of security requirements. Another topic covered by our architecture is the context of the asset; for example, the security considerations of a medical record are totally different from those of a log file. It is important to evaluate the required security level for each asset.

Figure 2: System Orchestrator (SO) diagram

3.2 Data Provider (DP)
The DP component creates an abstraction of the data sources considering their security metadata, if they exist. These metadata allow the DP to identify the types of access and analysis allowed by the data source and its security requirements. As explained in Section 2, the DP has a set of interfaces. Those interfaces must consider the constraints of each data source and also the different security policies and requirements specified by the SO. In this element, there may be conflicts between the security requirements of the data source and those of the Big Data system itself. These clashes must be addressed in a way that satisfies both sides. The security and privacy issues of this component are mostly related to how to properly identify and validate the endpoint inputs. The DP interfaces must evaluate the provenance of the data source. A critical challenge in the data collection process is knowing how to validate that a data source is not malicious and how to filter out those that are [7].

In our SRA, the interfaces are connected with the Collector service of the BDAP, which is described in the next subsection. Figure 3 represents the DP component with its interfaces. In general, the elements that compose a data source include: the data itself, which can be structured, semi-structured, or unstructured; the security requirements of the data source; and the security metadata of the data source. Those elements are not represented in the diagram because we consider the data source an external agent of the Big Data system. Still, it is important to know them in order to apply their constraints.

Figure 3: Data Provider (DP) diagram

3.3 Big Data Application Provider (BDAP)
The BDAP component has the objective of meeting the requirements established by the SO, including its security and privacy requirements. To achieve that goal, the BDAP is composed of different services or activities that can be considered the SaaS (Software as a Service) layer of the Big Data ecosystem. In our case, we assume that, in general, Big Data is implemented on a Cloud platform, which affects how the SRA is defined in the BDFP component. Figure 4 shows the different services that constitute this component. It also shows the BDAP Security Solution, which must map the SO security solutions to these stages; for example, authorization may control here who can apply which operations to perform data analysis.

As represented in the diagram, not all the activities can communicate with each other; there is a sequential order of execution. Moreover, some of these activities are not mandatory in a Big Data ecosystem. The preparation step has the purpose of validating, cleaning and storing the data, but in a real-time scenario where the data should be analysed as soon as they enter the system, this activity might be skipped. Something similar happens with the visualization step: if the data consumer is not a human end-user but another system, like a data warehouse or even another Big Data ecosystem, this activity may not be necessary.

Nevertheless, the other three activities are basic in a Big Data ecosystem:
• The collection activity acts like an ETL (Extract, Transform, and Load) process and combines sets of data of similar structure with the objective of unifying them.
• The analysis step includes a set of techniques to obtain valuable knowledge from data, for example MapReduce algorithms.
• The access activity has the purpose of communicating with the DC, acting as an interface between the DC and the visualization and analytics activities.

The relation between those different activities is represented in Figure 4 by dotted lines, because it is a temporary usage relation.

Figure 4: Big Data Application Provider (BDAP) diagram

3.4 Big Data Framework Provider (BDFP)
In general, the BDFP component is composed of a set of clusters which, in turn, are composed of nodes. Those nodes can be deployed by means of Virtual Machines or Containers, which interact with the hardware itself and the OS.

The BDFP component in NIST is very abstract and lacks details about the subcomponents needed to perform its processes. Therefore, our proposal places more emphasis on the different elements and how they are connected. Figure 5 depicts the different subcomponents of the BDFP. Our SRA highlights the idea of a Big Data ecosystem with the possibility of implementing the system with a Cloud environment and visualization techniques.

In regard to security and privacy issues, the activities in this component should focus on the following:
• encryption and key management of the data;
• isolation and containerization of process execution;
• authorization and authentication;
• audit logging;
• securing the storage and the network.
Those security issues should be addressed by means of the security solutions defined in the SO, which can be implemented at this level as BDFP security solutions. The SO security solutions are now mapped to data protection, including the application of cryptography and specialized authorization mechanisms [8, 37].

Figure 5: Big Data Framework Provider (BDFP) diagram

3.5 Data Consumer (DC)
Like the DP, the DC component is composed of a set of interfaces. The interaction could include interactive visualization, report creation, or data drilling using business intelligence techniques. It is important to highlight that these interfaces must address the authorization and authentication function, so that the DC can match the metadata related to the end-user with the security requirements and policies of the information.

Finally, Figure 6 summarizes our complete SRA for Big Data. In this figure, the relationships between the different components of the architecture can be seen in perspective. This figure is important to better understand the example presented in the following subsection.

Figure 6: Big Data SRA complete diagram

3.6 Examples of Application of Security Patterns
To show the usefulness of our SRA, we explain an example of how to employ security patterns using our architecture. We created the example by identifying some of the threats that can be found in the different activities of the BDAP component.

A systematic method for the enumeration of threats is shown in [12]. Those threats can be addressed by means of security patterns. In some cases, these should be modified from general security patterns to meet the inherent features of Big Data. The modification of these patterns, and the creation of new ones if needed, is beyond the scope of this paper and is considered future work. Table 1 summarizes some of the threats of each activity and the general patterns that can be applied to solve them. Those patterns are defined in [12].

Table 1: Identified threats and security patterns for the different activities
ID | Activity | Threat | Security Pattern
TC1 | Common to all the activities | Data modified | Authentication, Role-based access control
TC2 | Common to all the activities | Data destroyed | Authentication, Role-based access control
TC3 | Common to all the activities | Data illegally read | Encryption, Role-based access control, Authentication
TC4 | Common to all the activities | Unapproved change in activity function | Logger and Auditor, Controlled access session, Role-based access control, Authentication
TCo1 | Collection | Malicious data source | Authentication
TP1 | Preparation | Malicious filter | Logger and Auditor, Controlled access session, Role-based access control, Authentication
TA1 | Analysis | Infer PII* from anonymized data | Encryption, Logger and Auditor, Multilevel security, Role-based access control, Authentication
TA2 | Analysis | Malicious analysis algorithms | Logger and Auditor, Controlled access session, Role-based access control, Authentication
TV1 | Visualization | PII* exposed due to high graphic granularity | Multilevel security, Authentication, Role-based access control
TAc1 | Access | Several malicious accesses | Authentication, Role-based access control
*PII: Personally Identifiable Information

To better understand how to integrate the different components of our SRA and the security patterns, we define how the threat TC1 can be addressed by using security patterns. We use an object diagram to explain it, shown in Figure 7. In this scenario, the stored data is the main asset to protect. This asset has a vulnerability (it has no protection) that could be exploited by a threat such as TC1. To prevent that situation, it is necessary to implement a security solution. To facilitate the implementation of the solution, two security patterns can be used: Role-based access control and Authentication. However, this security solution still has a high abstraction level, because it is defined in the SO component. Hence, a low-level implementation of the security solution should be approached at the BDAP level. In this case, TC1 can affect the different services provided by the BDAP, which is why the security solution should be implemented there and not in another component.

Figure 7: Using security patterns to address a specific threat

Furthermore, we describe how to create an instance of the two security patterns (Authentication and Role-based Access Control) to secure the Collector subcomponent by means of a partial example. In this example, we focus on a Big Data system whose objective is to process tweets from the Twitter platform to analyse the general sentiment about a product. Figure 8 shows the object diagram for this example. The main component is what we want to protect, in this case, the tweets that have been obtained to be analysed.

The Authentication pattern allows us to verify the identity of the user by using a proof of identity and an authenticator. For Role-based access control, as its name indicates, one of the most important steps in implementing the pattern is to define the different roles. In this case, we have defined four roles: the administrator of the Big Data system, the data scientist, the end user, and the data owner. As explained before, this example is focused on the Collector phase, so the rights defined for the roles must consider this situation; for example, in this phase the end user should not have any rights over the data. Figure 8 shows the different functions that each user can perform over the data according to their rights.

Figure 8: Application of Authentication and Role-based access control patterns

4 COMPARISON WITH OTHER PROPOSALS
There are not many reference architectures for Big Data systems; if we focus on those whose goal is security, there are even fewer. However, different authors and organizations have proposed different reference architectures for Big Data. In this section, we describe some of the most relevant proposals.

Demchenko et al. [11] propose a Big Data Framework Architecture that establishes the data lifecycle in a Big Data ecosystem. As in the NIST approach, they use a block representation, but with more detail in the relationships between the different components of the architecture. However, they address security in a very sketchy way and as an isolated feature, not really connected to the other components. In [28] the authors propose a complete architecture in terms of the relations between the different components; however, we found a lack of consideration given to security and privacy aspects. Klein et al. [18] propose a specific reference architecture for Big Data in the national security domain. Their architecture is very similar to the one proposed by NIST. Our goal is to obtain a better abstraction of the architecture, but it is still interesting how they address some concerns by using solution patterns. They highlight the importance of having a specific domain for the requirements. In our case, requirements, and specifically the ones related to security, are the main part of the SO component.

Sqrrl [34] and BlueTalon [4] propose a Big Data model focused on data-centric security. Their purpose is to embed security information within the data itself. Sqrrl emphasizes access control on each field of data, and to do that it uses a layered architecture built around the value or sensitivity of the data. On the other hand, BlueTalon includes in its proposal the concept of data lakes, storage repositories that hold a huge amount of raw data until it is needed. There are other proposals made by the main IT companies, like Oracle [5], NTT Data [10], IBM [9], Microsoft [24] or SAP [31].

Table 2 summarizes these RAs and compares them with our SRA proposal. The criteria were selected based on a previous systematic mapping study that we carried out about security concerns in Big Data [25]. As a side effect of this work, we detected some characteristics that are usually not considered in the different proposals and could be important to define an SRA. Unlike the other proposals, our SRA treats the requirements as the main factor to consider in order to properly implement a Big Data ecosystem, more specifically the security requirements that should be approached in this phase. Moreover, we found a lack of connection between the different components of the architecture in the other proposals, whereas our SRA clearly specifies those relationships. Finally, our proposal has a medium abstraction level, because we do not consider specific technology solutions or applications.

Table 2: Comparison between RAs in some proposals
RA Proposal | Requirements concern | Security concern | Connection between components | Abstraction level
NIST | Medium | High | Low | High
Demchenko | Medium | Low | Medium | Medium
Klein | Low | Medium | Medium | Low
Pääkkönen and Pakkala | Medium | Low | High | Medium
SRA Proposal | High | High | High | Medium

Although there are some SRAs for Cloud environments and some of their contributions could be useful to a Big Data environment, there are still some differences that are significant enough to justify creating an SRA for Big Data. For example, there are some cases where the Big Data environment is supported by a Cloud infrastructure; in that case, the Big Data RAs must consider that possibility. In general, Cloud RAs are focused on the infrastructure, while a Big Data RA must also contemplate the services associated with data analysis.

5 CONCLUSION AND FUTURE WORK
A more precise Reference Architecture (RA) is a better framework to guide the use of security mechanisms to provide a high level of security. Our Security Reference Architecture (SRA) subsumes the published RAs, including the proposals made by NIST, Oracle, NTT, and different researchers.

We have created an SRA described by means of UML diagrams that aims to facilitate the implementation of secure Big Data systems. We decided to use UML diagrams because we found a lack of proposals where the relationship between the different components and subcomponents is precisely defined. Also, thanks to this kind of diagram it is possible to apply different security patterns, which are usually described as UML models. Security patterns address recurrent security problems; we have defined some of the security patterns that can be implemented to protect the system against threats. Our SRA emphasizes the idea of a Big Data ecosystem by implementing the system using a Cloud Computing environment.

We have also listed some of the threats that can be found in a Big Data ecosystem; however, a deeper understanding of the different threats that can affect these systems is needed. We will address this problem by creating different use cases and scenarios to identify those threats, as in the method of [14]. Once we have the threats identified, we will find, adapt or create security patterns that can solve those problems. We consider these topics the next steps to complete our SRA. Furthermore, it is important to perform an analysis of the different stakeholders that interact with the Big Data use cases.

ACKNOWLEDGMENTS
This work was funded by the SEQUOIA project (Ministerio de Economía y Competitividad and the Fondo Europeo de Desarrollo Regional FEDER, TIN2015-63502-C3-1-R).

REFERENCES
[1] Jacky Akoka, Isabelle Comyn-Wattiau, and Nabil Laoufi. 2017. Research on Big Data – A systematic mapping study. SI: New modeling in Big Data 54, Part 2 (Nov. 2017), 105–115. https://doi.org/10.1016/j.csi.2017.01.004
[2] Paris Avgeriou. 2003. Describing, Instantiating and Evaluating a Reference Architecture: A Case Study. Default journal (2003).
[3] E. Bertino. 2015. Big Data - Security and Privacy. In 2015 IEEE International Congress on Big Data. 757–761. https://doi.org/10.1109/BigDataCongress.2015.126
[4] BlueTalon. 2016. BlueTalon Data-Centric Security Platform: Bringing Order to Data Security Chaos. (2016). http://bluetalon.com/data-centric_security/
[5] Doug Cackett. 2013. Information Management And Big Data A Reference Architecture. Oracle, February (2013).
[6] Min Chen, Shiwen Mao, and Yunhao Liu. 2014. Big data: A survey. Mobile Networks and Applications 19, 2 (2014), 171–209.
[7] Big Data Working Group Cloud Security Alliance (CSA). 2013. Expanded Top Ten Big Data Security and Privacy. (April 2013). https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Expanded_Top_Ten_Big_Data_Security_and_Privacy_Challenges.pdf
[8] Jason C. Cohen and Subrata Acharya. 2014. Towards a trusted HDFS storage platform: Mitigating threats to Hadoop infrastructures using hardware-accelerated encryption with TPM-rooted key protection. Journal of Information Security and Applications 19, 3 (2014), 224–244. https://doi.org/10.1016/j.jisa.2014.03.003
[9] IBM Corporation. 2014. IBM Big Data & Analytics RA. (2014).
[10] NTT DATA. 2015. NTT DATA BigData Reference Architecture. (2015). http://www.nttdata.com/global/en/shared/pdf/bigdata_reference_architecture.pdf
[11] Yuri Demchenko, Cees De Laat, and Peter Membrey. 2014. Defining architecture components of the Big Data Ecosystem. In Collaboration Technologies and Systems (CTS), 2014 International Conference on. IEEE, 104–112.
[12] Eduardo B. Fernandez. 2013. Security patterns in practice: designing secure architectures using software patterns. John Wiley & Sons.
[13] Eduardo B. Fernandez, Raul Monge, and Keiko Hashizume. 2016. Building a security reference architecture for cloud systems. Requirements Engineering 21, 2 (June 2016), 225–249. https://doi.org/10.1007/s00766-014-0218-7
[14] Eduardo B. Fernandez, Nobukazu Yoshioka, and Hironori Washizaki. 2009. Modeling misuse patterns. In Availability, Reliability and Security, 2009. ARES'09. International Conference on. IEEE, 566–571.
[15] Eduardo B. Fernandez, Nobukazu Yoshioka, Hironori Washizaki, and Madiha H. Syed. 2016. Modeling and Security in Cloud Ecosystems. Future Internet 8, 2 (April 2016), 13.
[25] Julio Moreno, Manuel A. Serrano, and Eduardo Fernández-Medina. 2016. Main Issues in Big Data Security. Future Internet 8, 3 (2016), 44.
[26] NIST NBD-WG. 2017. NIST Big Data Reference Architecture. (2017). https://bigdatawg.nist.gov/_uploadfiles/M0639_v1_9796711131.docx
[27] NIST NBD-WG. 2017. NIST Big Data Security and Privacy. (2017). https://bigdatawg.nist.gov/_uploadfiles/M0638_v1_4829021654.docx
[28] Pekka Pääkkönen and Daniel Pakkala. 2015. Reference architecture and classification of technologies, products and services for big data systems. Big Data Research 2, 4 (2015), 166–186.
[29] James Rumbaugh, Ivar Jacobson, and Grady Booch. 2004. The Unified Modeling Language Reference Manual. Pearson Higher Education.
[30] S. Sagiroglu and D. Sinanc. 2013. Big data: A review. Collaboration Technologies and Systems (CTS), 2013 International Conference on (May 2013), 42–47. https://doi.org/10.1109/CTS.2013.6567202
[31] SAP. 2016. CIO Guide to Using the SAP HANA® Platform for Big Data. (Feb. 2016).
[32] B. Saraladevi, N. Pazhaniraja, P. Victer Paul, MS Saleem Basha, and P. Dhavachelvan. 2015. Big Data and Hadoop-A study in security perspective. Procedia computer science 50 (2015), 596–601.
[33] Priya P. Sharma and Chandrakant P. Navdeti. 2014. Securing big data hadoop: a review of security issues, threats and solution. Int. J. Comput. Sci. Inf. Technol 5 (2014).
[34] SQRRL. 2014. Big Data and Data Centric Security. (2014). http://sqrrl.com/media/Data-Centric-Security-WP-final-.pdf
[35] Bhavani Thuraisingham. 2015. Big data security and privacy. In Proceedings of the 5th ACM Conference on Data and Application Security and Privacy. ACM, 279–280.
[36] Hua Wang, Xiaohong Jiang, and Georgios Kambourakis. 2015. Special issue on Security, Privacy and Trust in network-based Big Data. Information Sciences: an International Journal 318, C (2015), 48–50.
[37] Jiaqi Zhao, Lizhe Wang, Jie Tao, Jinjun Chen, Weiye Sun, Rajiv Ranjan, Joanna Kołodziej, Achim Streit, and Dimitrios Georgakopoulos. 2014. A security framework in G-Hadoop for big data computing across distributed Cloud data centres. J. Comput. System Sci. 80, 5 (2014), 994–1007. https://doi.org/10.1016/j.jcss.2014.02.006 Special Issue on Dependable and Secure Computing.
https://doi.org/10.3390/fi8020013 [16] ISO/IEC. 2018. ISO/IEC CD 20547-3 - Information technology – Big data reference architecture – Part 3: Reference architecture. (2018). https://www. iso.org/standard/71277.html?browse=tc [17] M. Kaushik and A. Jain. 2014. Challenges to big data security and privacy. International Journal of Computer Science and Information Technologies (IJCSIT) 5, 3 (2014), 3042–3043. [18] John Klein, Ross Buglak, David Blockow, Troy Wuttke, and Brenton Cooper. 2016. A reference architecture for big data systems in the national security domain. In Proceedings of the 2nd International Workshop on BIG Data Software Engineering. ACM, Austin, Texas, 51–57. [19] Srdjan Krco, Boris Pokric, and Francois Carrez. 2014. Designing IoT archi- tecture (s): A European perspective. In Internet of Things (WF-IoT), 2014 IEEE World Forum on. IEEE, 79–84. [20] Guillermo Lafuente. 2015. The big data security challenge. Network Security 2015, 1 (Jan. 2015), 12–14. https://doi.org/10.1016/S1353-4858(15)70009-7 [21] Fang Liu, Jin Tong, Jian Mao, Robert Bohn, John Messina, Lee Badger, and Dawn Leaf. 2011. NIST cloud computing reference architecture. NIST special publication 500, 2011 (2011), 292. [22] V. Mayer-Schönberger and K. Cukier. 2013. Big Data: A Revolution that Will Transform how We Live, Work, and Think. Houghton Mifflin Harcourt. https: //books.google.es/books?id=uy4lh-WEhhIC [23] Nenad Medvidovic and Richard N. Taylor. 2010. Software architecture: founda- tions, theory, and practice. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 2. ACM, 471–472. [24] Microsoft. 2014. Microsoft Big Data Solution Brief. (2014). http://download. microsoft.com/download/f/a/1/fa126d6d-841b-4565-bb26-d2add4a28f24/ microsoft_big_data_solution_brief.pdf