=Paper= {{Paper |id=Vol-2326/short2 |storemode=property |title=An Agent-based Approach for Dynamic Big Data Processing in a Smart City Environment |pdfUrl=https://ceur-ws.org/Vol-2326/short2.pdf |volume=Vol-2326 |authors=Zakarya Elaggoune,Ramdane Maamri,Imane Boussebough |dblpUrl=https://dblp.org/rec/conf/icaase/ElaggouneMB18 }} ==An Agent-based Approach for Dynamic Big Data Processing in a Smart City Environment== https://ceur-ws.org/Vol-2326/short2.pdf
An Agent-based Approach for Dynamic Big Data Processing in a
                  Smart City Environment

                  Zakarya Elaggoune                            Ramdane Maamri
                   LIRE Laboratory                              LIRE Laboratory
               Constantine 2 University                     Constantine 2 University
              25000 Constantine, Algeria                  25000 Constantine, Algeria
       zakarya.elaggoune@univ-constantine2.dz       ramdane.maamri@univ-constantine2.dz
                                        Imane Boussebough
                                          LIRE Laboratory
                                     Constantine 2 University
                                   25000 Constantine, Algeria
                                    iboussebough@gmail.com



                            Abstract

     The big data era has brought new processing and information
     management challenges. Existing tools have managed the challenges so
     far, and current architectures are close to meeting users' needs. But
     the rate at which new data is generated raises new challenges. This is
     especially true in the context of smart cities, where two key
     challenges arise: gathering information in an energy-efficient manner
     to prolong the lifetime of Wireless Sensor Networks (WSNs), and
     adapting the analytical mechanism to the speed at which new data is
     generated so that real-time results are delivered dynamically. This
     article explores and describes how Multi-Agent Systems (MAS) can
     handle a large amount of data with dynamic analytics capabilities and
     in an energy-efficient manner.

Copyright © by the paper’s authors. Copying permitted for private and
academic purposes.
In: Proceedings of the 3rd Edition of the International Conference on
Advanced Aspects of Software Engineering (ICAASE’18), Constantine,
Algeria, 1,2-December-2018, published at http://ceur-ws.org

1    INTRODUCTION
The prospects for smart cities are very promising, and various smart device
manufacturing groups, for example IBM and Intel, are launching initiatives
to strengthen their focus in this sector. They recognized ten important
areas that will play a key role in creating a smart city: smart lifestyle,
smart security system, smart home, smart building, smart environment, smart
government, smart grid, smart tourism, smart transportation and smart
health [CDBN09]. Each component of a smart city is based on large-scale
analysis of data on public safety, economic development, pollution, traffic
conditions, and so on.
    Smart cities are an imminent need, and are the true form of smart earth
applied to specific areas to achieve intelligent and integrated city
management. In smart cities, different sets of data are continually
analyzed to present intelligent planning ideas, intelligent building models
and intelligent management, where big data is treated as the fuel of any
smart system [Coc14].
    At the beginning of the Big Data era, three main challenges inherent to
the characteristics of big data appeared (the initial "3V" of Big Data):

     Volume: data sets of enormous size and complexity (many features),

     Velocity: fast generation of data arriving in continuous flows,

     Variety: different types of data arriving in different forms.

These challenges, also known as the "data flood", pushed storage systems
and processing techniques to their limits at that time. After becoming
familiar with the first three challenges, the new techniques began to
perform well, but soon the flood of data overwhelmed these




An Agent-based Approach for Dynamic Big Data Processing in a Smart City Environment                           ICAASE'2018




                                          Figure 1: An overview of the system
techniques. Indeed, as the volume of data increased and sources multiplied,
raw data became increasingly poor and useful information became scarcer.
Increasingly, the usefulness and reliability of data and their sources have
been questioned. Hence the emergence of two new challenges, taking the "3V"
challenges of big data to "5V". [JGL+ 14] define the new "Vs" as follows:

     Value: the usefulness of the data or, more precisely, the amount of
     useful information in the flood of data,

     Veracity: the reliability and confidence attributed to the data and
     its sources.

With the recent increase in the number of smart and portable devices and
other measuring instruments in ambient applications and smart cities, we
are just beginning to address every aspect of this new big data. In the
smart cities context, we can extract two main rising challenges from this
new big data:
    Gathering data from WSN in an energy-efficient manner. A WSN consists
of a large number of sensor nodes with limited batteries, which are
randomly deployed over an area to collect data. The lifetime of the network
decreases because of these limited batteries. Therefore, it is important to
minimize the energy consumption of each node, which extends the lifetime of
the WSN. Since much of the detected data could be redundant or unimportant,
collecting only relevant data could be a good technique for saving energy
in sensor nodes and extending network lifetime.
    Managing the dynamicity of the data in an adaptive way. One of the
advantages of big data is that the large volume of data can be exploited
for several purposes, such as business strategies and healthcare. For
efficient data exploitation, the data processing process stops and restarts
each time new data arrives, in order to integrate the newly sensed data
into the processing cycle. Restarting the analytical process periodically
consumes energy and time; therefore, processing data continuously, without
stopping and in an adaptive way, is necessary.
    Thence, our goal is to propose a new approach for smart cities that can
gather relevant information in a smart manner and can adapt to changes that
occur in the data without having to restart the entire process. Therefore,
we use a multi-agent approach to design a two-tier system: the first tier
for data gathering and preprocessing (a smart wireless sensor network), and
the second tier, a real-time multi-agent system for dynamic big data
analytics.
    The rest of this article is organized as follows. In Sect. 2, we
describe the two-tier multi-agent approach. In Sect. 3, the smart WSN is
presented, describing in detail the different steps of relevant data
extraction. Then we discuss the dynamic big data mechanism in Sect. 4.
Lastly, we conclude our study in Sect. 5.

2 An Overview of the System
In this system we propose the use of fuzzy agents for data relevance
estimation. To communicate data between sensor nodes with low energy
consumption, we use clustering, where an instance of a fuzzy agent is
embedded in each Cluster-Head (CH). After gathering the data, each CH sends
the extracted relevant data to the sink node, which dispatches it to the
second tier (processing agent) for real-time analysis.
    Concerning the second tier, which is the big data processing, we use a
multi-agent system to build a three-layer big data processing system: a
real-time processing layer; an adaptive batch processing layer; and a service




International Conference on Advanced Aspects of Software Engineering                                             Page 135
ICAASE, December, 01-02, 2018



layer that combines the results of the two previous layers.
   The proposed system is composed of the following components (see Figure 1):

    • First tier, a smart wireless sensor network: sensor node; fuzzy agent;
      cluster-head; sink node.
    • Second tier, dynamic big data processing: data node; processing agent;
      knowledge; service agent.

3     First-Tier: Smart Wireless Sensor Network
The basic role of sensor nodes is to collect information from the
environment and send it to the base station, where calculations are
performed. This collection must respect the battery life of each node to
maintain the lifetime of the network.
    The traditional model of data collection is the Client/Server (C/S)
approach. In the C/S approach, when the sensors capture data, they send it
directly to the base station as unprocessed raw data. In addition, data
reaches the base station through multi-hop communication. This multi-hop
communication causes additional power consumption, because intermediate
nodes relay information from more distant nodes. Several studies have tried
to optimize the architecture of this model; some are listed below:

    • Incremental data fusion of a maximum number of sensors [PDN04]: when a
      node sends its data to the sink, the intermediate nodes merge their
      data with the data coming from the first node, so that the data is
      fused into a single message. This solution is not scalable and is
      suitable only for networks that do not contain a large number of
      nodes. Furthermore, the intermediate nodes do not always have relevant
      information to send, and they do not filter out redundant and
      irrelevant information.

    • Data aggregation for clustered WSN [CMM08]: the authors propose a
      clustering algorithm in which sensors elect themselves as cluster
      heads with a certain probability and disseminate their decisions.
      Their work focuses on incorporating adaptive behavior into protocols
      in such a dynamic network. Once the data from each node is received,
      the cluster head transmits it directly to the sink. This solution is
      based on the cluster-head paradigm, which consumes a large amount of
      energy. Furthermore, the authors did not address the problem of
      complexity and neglected the importance of scalability in this kind of
      network.

    • The ant agent [LKF08]: the authors present a data aggregation scheme
      based on ant colonies for wireless sensor networks. They tackle the
      problem of building an aggregation tree for a group of source nodes in
      the WSN to send sensory data to the base station. However, the
      construction of this tree largely depends on the deployment of the
      nodes, which is generally random, and consumes a large amount of
      energy. Since the communication range of a node is limited, the nodes
      can only communicate with their one-hop neighbors, so the Euclidean
      distance between the source node and the receiving node is unreliable.

    • Mobile agent based directed diffusion (MADD) [CKY+ 06]: the authors
      considered mobile agents (MA) in multi-hop environments and adopted
      directed broadcast to dispatch the MA. In directed broadcasting, a
      detection task is broadcast through the sensor network as requests of
      interest for named data, i.e. the interests of the users are diffused
      through the sensor network. The sink node floods a request to the
      sensors of interest, and the intermediate nodes set gradients to send
      data along the routes to the sink node [IGE+ 03]. However, the current
      MADD framework is only suitable when the data is retrieved directly
      from the network whenever there are requests from the users. Some
      enhancement of the framework is needed to retrieve requests only from
      the active area.

    • Several works have proposed a structured strategy such as a multicast
      tree [AKUMK09, UG07]. However, because of excessive communication
      costs and centralized management of the sensor network structure,
      structured approaches are not well suited to dynamic scenarios.

After having analyzed the solutions presented above, we can deduce that
there is still a lot of work to do in terms of energy efficiency in the
wireless sensor networks field. Since preprocessing data and eliminating
irrelevant information contributes to lower energy consumption, our goal is
to propose a wireless sensor network based on the relevance of data. We use
the agent technique for intelligent and adaptive management.
   For more efficiency, we propose the use of clustering to send data easily
to the sink and for better organization. We can use the Low Energy Adaptive
Clustering Hierarchy (LEACH) algorithm or any other efficient algorithm to
decompose the network into clusters, each with a Cluster-Head (CH). To
achieve our objective, we propose to integrate into each CH a fuzzy agent to
process data, eliminate non-useful data, and reduce redundancy. Each CH in
the network is seen as an autonomous fuzzy agent with its own attitudes and
characteristics towards the different events it receives.
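
As a rough illustration of what a CH-embedded fuzzy agent could do, the
sketch below grades each new reading by how far it departs from the
previous reading of the same node (the relevance criterion detailed in
Sect. 3.1) and drops readings whose degree falls below a cut-off. The
linear membership function, its breakpoints, the cut-off value and all
names are assumptions made for illustration only; the paper does not
specify them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def ramp(x: float, low: float, high: float) -> float:
    """Rising fuzzy membership: 0 at or below `low`, 1 at or above `high`,
    linear in between (an assumed shape, not taken from the paper)."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


@dataclass
class FuzzyAgent:
    """Hypothetical fuzzy agent attached to one cluster head."""
    low: float = 2.0       # assumed: differences up to this are irrelevant
    high: float = 10.0     # assumed: differences from this on are fully relevant
    cutoff: float = 0.25   # assumed: minimum degree worth forwarding to the sink
    last_value: Optional[float] = None

    def relevance(self, reading: float) -> float:
        """Degree of relevance in [0, 1] of a new reading."""
        if self.last_value is None:
            return 1.0  # nothing to compare with yet, so keep the reading
        return ramp(abs(reading - self.last_value), self.low, self.high)

    def filter(self, reading: float) -> Optional[Tuple[float, float]]:
        """Return (reading, degree) if it should be forwarded, else None."""
        degree = self.relevance(reading)
        self.last_value = reading  # record the latest collected value
        if degree < self.cutoff:
            return None
        return reading, degree


if __name__ == "__main__":
    agent = FuzzyAgent()
    for value in (20.0, 21.0, 26.0, 40.0):
        print(value, agent.filter(value))
```

A larger jump between consecutive readings yields a higher degree, which a
sink could use as a priority when relaying data to the second tier.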







3.1    Fuzzy Agent Role Behaviors

  The aim of the WSN is to collect as much data as possible and eliminate
the irrelevant or redundant items.
   Each Cluster-Head in the network is associated with a fuzzy agent (FA).
The principal role of the FA is to use fuzzy logic to estimate the relevance
of the data and to eliminate the unimportant data. Hence, we have defined
two main points for the fuzzy agent to extract the relevant information,
which reduces the power consumption of each node and extends the life of
the WSN:

  1. Degree of relevance of data: the degree of relevance of the data
     strongly depends on the desired application. This parameter is
     calculated locally in the sensor node. The fuzzy agent can estimate the
     degree of relevance of the data collected. This information is taken
     into account if it’s the primary information containing the required
     information. For example, for air pollution monitoring, the node
     records the latest collected data to compare with the new data
     collected. The fuzzy agent considers data as relevant if the difference
     between the two values is greater than a predetermined threshold.
     Moreover, as the difference increases, the fuzzy agent considers that
     the data has a higher priority, so the degree of relevance increases.

  2. Inter-sensor-nodes redundancy elimination: typically, the sensor nodes
     are randomly deployed, so many sensor nodes will cover the same
     geographical points, which means that they will give the same
     information (redundancy). In this case, the fuzzy agent compares the
     values collected by each sensor node with those of its neighbors to
     eliminate the inter-sensor-nodes redundancy.

Figure 2 illustrates the fuzzy logic used by the agent to estimate the
relevance of the data.

            Figure 2: Degree of relevance of data

4     Second-Tier: Dynamic Big Data Processing
4.1    Big Data Architectures
The most used process for big data analysis is the distributed pipeline
(Figure 3-a). This model was proposed to circumvent the rigidity problem by
reducing the processing time by means of parallelism. This pipeline is based
on the MapReduce pattern and its well-known Hadoop framework.
   However, applying this model does not solve the problem of data
dynamicity. Moreover, this model relies on batch processing and does not
really focus on real-time processing, which always leaves a portion of the
data unprocessed (Figure 3-b).
   Other architectures have extended this model, trying to support real-time
processing. In the following paragraphs we discuss the two most used
architectures: Lambda Architecture (LA) and Kappa Architecture (KA).

    • Lambda Architecture (LA): "The LA aims to satisfy the needs for a
      robust system that is fault-tolerant, both against hardware failures
      and human mistakes, being able to serve a wide range of workloads and
      use cases, and in which low-latency reads and updates are required.
      The resulting system should be linearly scalable, and it should scale
      out rather than up." [HB]
      This is what it looks like, from a high-level point of view [HB]:
         – All streamed data is sent to both the batch layer and the speed
           layer,
         – The batch layer pre-calculates the batch views,
         – The serving layer indexes the batch views so that they can be
           queried in a low-latency way,
         – The speed layer compensates for the high latency of updates to
           the serving layer and processes only recent data,
         – Any incoming query can be resolved by merging results from
           real-time views and batch views.
      The idea behind these layers is that the speed layer provides
      real-time results to the serving layer, and if any data is missed or
      any data errors occur during stream processing, the batch job
      compensates for that and updates the serving layer, thus providing
      accurate results. But it is very hard to build the pipeline and
      maintain the analysis logic in both the batch and speed layers.
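
The Lambda Architecture's query path, in which a query is resolved by
merging results from batch views and real-time views, can be illustrated
with a minimal sketch. It assumes an additive aggregate (event counts per
key), so that merging reduces to a sum; the event shape, the function names
and the sample keys are hypothetical and are not taken from [HB].

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (key, count), e.g. ("traffic_alert", 1).
Event = Tuple[str, int]


def build_view(events: Iterable[Event]) -> Dict[str, int]:
    """Aggregate events into a view mapping each key to its total count.
    The same function stands in for both the batch and the speed layer."""
    view: Counter = Counter()
    for key, count in events:
        view[key] += count
    return dict(view)


def query(key: str, batch_view: Dict[str, int],
          realtime_view: Dict[str, int]) -> int:
    """Serving-layer lookup: merge the precomputed batch view with the
    real-time view that covers data the batch layer has not absorbed yet."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


if __name__ == "__main__":
    # Events already covered by the last batch run.
    batch_view = build_view([("traffic_alert", 3), ("pollution_alert", 1)])
    # Recent events seen only by the speed layer so far.
    realtime_view = build_view([("traffic_alert", 2)])
    print(query("traffic_alert", batch_view, realtime_view))
    print(query("pollution_alert", batch_view, realtime_view))
```

Summation only works because counts are additive; other aggregates (for
example averages or distinct counts) would need a different merge step,
which is part of what makes maintaining two code paths costly.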








               Figure 3: (a) Distributed Big Data Analytics pipeline [BGG16]; (b) Big data processing-Batch

           Figure 4: Lambda Architecture [SV16]

   • Kappa Architecture (KA): "Kappa Architecture is a simplification of
     Lambda Architecture. A Kappa Architecture system is like a Lambda
     Architecture system with the batch processing system removed. To
     replace batch processing, data is simply fed through the streaming
     system quickly." [Ues]

     One of the disadvantages of the Lambda Architecture, as detailed above,
     is having to code and execute the same logic twice; this is avoided in
     the Kappa Architecture. However, the Kappa Architecture should only be
     considered an alternative to the Lambda Architecture in applications
     that do not require unbounded retention.

            Figure 5: Kappa Architecture [SV16]

4.2 The need of new dynamic approaches
After having analyzed the solutions presented above, we can deduce that
available big data architectures do not really adapt to the dynamism of
data. Furthermore, they must restart periodically to take into account the
streamed real-time data, and they do not integrate the new data in an
adaptive way.
The MAS technology, with the cooperative interaction process of its
autonomous agents, gives us the means to break the rigidity problem of the
other big data architectures, and can offer adaptive management of big data
streaming without the need to restart the process periodically.
    When an agent receives new data, it starts processing the data directly
to deliver real-time results. After this agent has consumed all the data
stored in its node, it creates a link with the last agent in the batch
layer to contribute to the batch processing (distributed data mining), and
another agent with an empty data node takes its place for real-time data
processing. This translates into data analysis tasks that interact, mainly
through communication, so that each task can help and work with other tasks
for the sake of continuous real-time adaptation of the analytic process to
changes in data.
    The cooperation between the agents is described in the following steps
(Figure 6):

  1. Each node in the system is associated with a processing agent. The node
     that receives the captured data from the WSN is responsible for
     real-time processing and returns real-time views as a result; the other
     nodes in the system work on the batch processing and return the batch
     views.
  2. Agents in the batch layer are partitioned into neighborhood groups. The
     neighborhood is defined by








     time, so that two neighboring agents represent two successive periods.
     Each group represents a full batch period, in which agents of the same
     group apply distributed data mining and display batch views.

  3. Whenever the data stored in the real-time node has been processed, the
     real-time agent updates the real-time views and creates a link with the
     last agent in the batch layer to contribute to the batch processing.
     Another agent with an empty data node takes its place for real-time
     data processing.

                                          Figure 6: Multi-Agent Cooperation

Another way to achieve this goal is to use the System-of-Systems (SoS)
property by combining one or several MASs for each step of Big Data
analytics and representing them with an agent in one super MAS (see
Figure 7). This property is used to widen the batch period.

      Figure 7: MAS of MAS based Big Data Analytics

4.3    Service Agent
The service agent is responsible for serving the views computed by the
real-time and batch layers. This process can be facilitated by additional
indexing of data to speed up reads [TR14].
The real-time views and the batch views are created for a specific use
case. This use case problem is resolved in the serving layer (Figure 8).
Queries from users are managed by a dedicated service agent. For each new
query, a service agent is created.
To prepare the response and solve the given problem, the service agent
collects the needed data. Fresh online data is provided by the real-time
views. Similar processing is done to collect historical data (batch views).
Both views are combined to display the whole picture of the data.
After all required data from the real-time and batch views has been
combined, the response is presented. At this point the life cycle of the
service agent ends.

                    Figure 8: Second-tier: dynamic big data processing

5    Conclusion
This two-tier approach allows building the smart city as an agent community
that can work in distributed and complex systems. The first tier describes
the construction and effective use of fuzzy agents in the wireless sensor
network, with consideration of the relevance of collected data, which can
help enormously in prolonging the lifetime of the network by decreasing the
energy consumption of each sensor node. In the processing layer, we
described and discussed how a multi-agent system can be applied to process
big data dynamically without the need to restart the process periodically.




International Conference on Advanced Aspects of Software Engineering                                                     Page 139
ICAASE, December, 01-02, 2018



   Now that the system architecture and agent behaviors have been
designed, our future research will move on to the implementation and
validation phases.

References

[AKUMK09]  Jamal N. Al-Karaki, Raza Ul-Mustafa, and Ahmed E. Kamal. Data
           aggregation and routing in wireless sensor networks: Optimal
           and heuristic algorithms. Comput. Netw., 53(7):945–960, May
           2009.

[BGG16]    E. Belghache, J. P. Georgé, and M. P. Gleizes. Towards an
           adaptive multi-agent system for dynamic big data analytics. In
           2016 Intl IEEE Conferences on Ubiquitous Intelligence
           Computing, Advanced and Trusted Computing, Scalable Computing
           and Communications, Cloud and Big Data Computing, Internet of
           People, and Smart World Congress
           (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), pages 753–758, July
           2016.

[CDBN09]   A. Caragliu, C. Del Bo, and P. Nijkamp. Smart cities in
           Europe. Serie Research Memoranda 0048, VU University
           Amsterdam, Faculty of Economics, Business Administration and
           Econometrics, 2009.

[CKY+06]   Min Chen, Taekyoung Kwon, Yong Yuan, Yanghee Choi, and Victor
           C.M. Leung. Mobile agent-based directed diffusion in wireless
           sensor networks. EURASIP Journal on Advances in Signal
           Processing, 2007(1):036871, Oct 2006.

[CMM08]    Huifang Chen, Hiroshi Mineno, and Tadanori Mizuno. Adaptive
           data aggregation scheme in clustered wireless sensor networks.
           Comput. Commun., 31(15):3579–3585, September 2008.

[Coc14]    Annalisa Cocchia. Smart and Digital City: A Systematic
           Literature Review, pages 13–43. Springer International
           Publishing, Cham, 2014.

[HB]       M. Hausenblas and N. Bijnens. Lambda architecture.

[IGE+03]   C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and
           F. Silva. Directed diffusion for wireless sensor networking.
           IEEE/ACM Transactions on Networking, 11(1):2–16, Feb 2003.

[JGL+14]   H. V. Jagadish, Johannes Gehrke, Alexandros Labrinidis, Yannis
           Papakonstantinou, Jignesh M. Patel, Raghu Ramakrishnan, and
           Cyrus Shahabi. Big data and its technical challenges. Commun.
           ACM, 57(7):86–94, July 2014.

[LKF08]    Wen-Hwa Liao, Yucheng Kao, and Chien-Ming Fan. Data
           aggregation in wireless sensor networks using ant colony
           algorithm. Journal of Network and Computer Applications,
           31(4):387–401, 2008.

[PDN04]    S. Patil, S. R. Das, and A. Nasipuri. Serial data fusion using
           space-filling curves in wireless sensor networks. In 2004
           First Annual IEEE Communications Society Conference on Sensor
           and Ad Hoc Communications and Networks (IEEE SECON 2004),
           pages 182–190, Oct 2004.

[SV16]     N. Seyvet and I. Mulas Viela. Applying the kappa architecture
           in the telco industry, 2016.

[TR14]     B. Twardowski and D. Ryzko. Multi-agent architecture for
           real-time big data processing. In 2014 IEEE/WIC/ACM
           International Joint Conferences on Web Intelligence (WI) and
           Intelligent Agent Technologies (IAT), volume 3, pages 333–337,
           Aug 2014.

[Ues]      Shu Uesugi. Kappa architecture.

[UG07]     S. Upadhyayula and S. K. S. Gupta. Spanning tree based
           algorithms for low latency and energy efficient data
           aggregation enhanced convergecast (DAC) in wireless sensor
           networks. Ad Hoc Netw., 5(5):626–648, July 2007.



