=Paper=
{{Paper
|id=Vol-2326/short2
|storemode=property
|title=An Agent-based Approach for Dynamic Big Data Processing in a Smart City
Environment
|pdfUrl=https://ceur-ws.org/Vol-2326/short2.pdf
|volume=Vol-2326
|authors=Zakarya Elaggoune,Ramdane Maamri,Imane Boussebough
|dblpUrl=https://dblp.org/rec/conf/icaase/ElaggouneMB18
}}
==An Agent-based Approach for Dynamic Big Data Processing in a Smart City Environment==
Zakarya Elaggoune
LIRE Laboratory
Constantine 2 University
25000 Constantine, Algeria
zakarya.elaggoune@univ-constantine2.dz

Ramdane Maamri
LIRE Laboratory
Constantine 2 University
25000 Constantine, Algeria
ramdane.maamri@univ-constantine2.dz

Imane Boussebough
LIRE Laboratory
Constantine 2 University
25000 Constantine, Algeria
iboussebough@gmail.com
Copyright © by the paper’s authors. Copying permitted for private and academic purposes.
In: Proceedings of the 3rd Edition of the International Conference on Advanced Aspects of Software Engineering (ICAASE’18), Constantine, Algeria, 1,2-December-2018, published at http://ceur-ws.org

Abstract
The big data era has brought new processing and information management challenges. Existing tools have managed to keep up with these challenges so far, and current architectures come close to meeting users' needs. However, the rate at which new data is generated raises new challenges. This is especially true in the context of smart cities, where two key challenges arise: gathering information in an energy-efficient manner to prolong the lifetime of Wireless Sensor Networks (WSNs), and adapting the analytical mechanism to the speed at which new data is generated so that real-time results are delivered dynamically. This article explores and describes how Multi-Agent Systems (MAS) can handle a large amount of data with dynamic analytics capabilities and in an energy-efficient manner.

1 INTRODUCTION
The prospects for smart cities are very promising, and various smart device manufacturing groups, for example IBM and Intel, are launching initiatives to strengthen their focus in this sector. They recognized ten important areas that will play a key role in creating a smart city: smart lifestyle, smart security system, smart home, smart building, smart environment, smart government, smart grid, smart tourism, smart transportation and smart health [CDBN09]. Each component of a smart city is based on large-scale data analysis covering public safety, economic development, pollution, traffic conditions, and so on.
Smart cities are an imminent need, and are the true form of smart earth applied to custom areas to achieve intelligent and integrated city management. In smart cities, different sets of data are continually analyzed to present intelligent planning ideas, intelligent building models and intelligent management, where big data is treated as the fuel of any smart system [Coc14].
At the beginning of the Big Data era, three main challenges inherent to the characteristics of big data appeared (the initial "3V" of Big Data):
Volume: data sets of enormous size and complexity (many features),
Velocity: fast generation of data arriving in continuous flows,
Variety: different types of data arriving in different forms.
These challenges, also known as the "data flood", pushed the storage systems and processing techniques of the time to their limits. Once the first three challenges had become familiar, the new techniques began to perform well, but soon the flood of data overwhelmed these
techniques. Indeed, as the volume of data increased and sources multiplied, raw data became increasingly poor and useful information became scarcer. Increasingly, the usefulness and reliability of data and their sources have been questioned. Hence the emergence of two new challenges, taking the "3V" challenges of big data to "5V". [JGL+ 14] define the new ’Vs’ as follows:
Value: the usefulness of the data, or more precisely the amount of useful information among the flood of data,
Veracity: the reliability and confidence attributed to the data and its sources.
With the recent increase in the number of smart and portable devices and other measuring instruments in ambient applications and smart cities, we are just beginning to address every aspect of this new big data. In the smart cities context, we can extract two main rising challenges from this new big data:
Gathering data from WSN in an energy-efficient manner. A WSN consists of a large number of sensor nodes with limited batteries, which are randomly deployed over an area to collect data. The lifetime of the network decreases because of these limited batteries. Therefore, it is important to minimize the energy consumption of each node, which extends the lifetime of the WSN. Since much of the sensed data could be redundant or unimportant, collecting only relevant data could be a good technique for saving energy in sensor nodes and extending network lifetime.
Managing the dynamicity of the data in an adaptive way. One of the advantages of big data is the exploitation of the large volume of data for several purposes, such as business strategies and healthcare. For efficient data exploitation, the processing stops and restarts each time new data arrives, in order to integrate the newly sensed data into the processing cycle. Restarting the analytical process periodically consumes energy and time; therefore, processing data continuously and adaptively, without stopping, is necessary.
Thence, our goal is to propose a new approach for smart cities that can gather relevant information in a smart manner and can adapt to changes that occur in the data without having to restart the entire process. Therefore, we use a multi-agent approach to design a two-tier system: the first tier for data gathering and preprocessing (a smart wireless sensor network), and the second tier a real-time multi-agent system for dynamic big data analytics.
The rest of this article is organized as follows. In Sect. 2, we describe the two-tier multi-agent approach. In Sect. 3, the smart WSN is presented, describing in detail the different steps of relevant data extraction. Then we discuss the dynamic big data mechanism in Sect. 4. Lastly, we conclude our study in Sect. 5.

2 An Overview of the System
In this system we propose the use of fuzzy agents for data relevance estimation. To communicate data between sensor nodes with low energy consumption, we use clustering, where an instance of a fuzzy agent is embedded in each Cluster-Head (CH). After gathering the data, each CH sends the extracted relevant data to the sink node, which dispatches it to the second tier (processing agent) for real-time analysis.
Figure 1: An overview of the system
Concerning the second tier, which is the big data processing, we use a multi-agent system to build a three-layer big data processing system: a real-time processing layer; an adaptive batch processing layer; and a service
International Conference on Advanced Aspects of Software Engineering Page 135
ICAASE, December, 01-02, 2018
layer that combines the results of the two previous layers.
The proposed system is composed of the following set of components (see Figure 1):
• First tier, a smart wireless sensor network: sensor node; fuzzy agent; cluster-head; sink node.
• Second tier, dynamic big data processing: data node; processing agent; knowledge; service agent.

3 First-Tier: Smart Wireless Sensor Network
The basic role of sensor nodes is to collect information from the environment and send it to the base station, where calculations are performed. This collection must respect the battery life of each node to maintain the lifetime of the network.
The traditional model of data collection is the Client/Server (C/S) approach. In the C/S approach, when the sensors capture data, they send it directly to the base station as unprocessed raw data. In addition, data reaches the base station through multi-hop communication. This multi-hop communication causes additional power consumption, because intermediate nodes relay information for more distant nodes. Several studies have sought to optimize the architecture of this model; some of them are listed below:
• Incremental data fusion of a maximum number of sensors [PDN04]: when a node sends its data to the sink, the intermediate nodes merge their data with the data coming from the first node, so that the data is fused into a single message. This solution is not scalable and is suitable only for networks that do not contain a large number of nodes. Furthermore, the intermediate nodes do not always have relevant information to send, and they do not filter out redundant and irrelevant information.
• Data aggregation for clustered WSN [CMM08]: the authors propose a clustering algorithm in which sensors elect themselves as cluster heads with a certain probability and disseminate their decisions. Their work focuses on incorporating adaptive behavior into protocols in such a dynamic network. Once the data from each node is received, the cluster head transmits it directly to the sink. This solution is based on the cluster-head paradigm, which consumes a large amount of energy. Furthermore, the authors did not address the problem of complexity and neglected the importance of scalability in this kind of network.
• The ant agent [LKF08]: the authors present data aggregation based on ant colonies for wireless sensor networks. They tackle the problem of building an aggregation tree for a group of source nodes in the WSN to send sensory data to the base station. However, the construction of this tree largely depends on the deployment of the nodes, which is generally random, and consumes a large amount of energy. Since the communication range of a node is limited, the nodes can only communicate with their one-hop neighbors, so the Euclidean distance between the source node and the receiving node is unreliable.
• Mobile agent based directed diffusion (MADD) [CKY+ 06]: the authors considered mobile agents (MA) in multi-hop environments and adopted directed diffusion to dispatch the MA. In directed diffusion, a sensing task is disseminated through the sensor network as requests of interest for named data, i.e. the interests of the users are diffused through the sensor network. The sink node floods a request to the sensors of interest, and the intermediate nodes set up gradients so that data is sent along the routes back to the sink node [IGE+ 03]. However, the current MADD framework is only suitable when the data is retrieved directly from the network whenever there are requests from the users. Some enhancement of the framework is needed to retrieve requests only from the active area.
• Several works have proposed a structured strategy such as a multicast tree [AKUMK09, UG07]. However, because of excessive communication costs and centralized management of the sensor network structure, structured approaches are not well suited to dynamic scenarios.
After analyzing the solutions presented above, we can deduce that much work remains to be done on energy efficiency in the field of wireless sensor networks. Since preprocessing data and eliminating irrelevant information contributes to lower energy consumption, our goal is to propose a wireless sensor network based on the relevance of data. We use the agent technique for intelligent and adaptive management.
For more efficiency, we propose the use of clustering to send data easily to the sink and for better organization. We can use the Low Energy Adaptive Clustering Hierarchy (LEACH) algorithm or any other efficient algorithm to decompose the network into clusters, each with a Cluster-Head (CH). To achieve our objective, we propose to integrate into each CH a fuzzy agent that processes data, eliminates non-useful data, and reduces redundancy. Each CH in the network is seen as an autonomous fuzzy agent with its own attitudes and characteristics towards the different events it receives.
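As an illustration of the clustering step that decomposes the network into clusters, the sketch below shows how LEACH-style cluster-head election could look. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the commonly cited LEACH threshold T(n) = p / (1 - p (r mod 1/p)) for nodes that have not yet served as CH in the current epoch, and the names `leach_threshold` and `elect_cluster_heads`, the node identifiers and the value p = 0.1 are hypothetical.

```python
import random


def leach_threshold(p: float, round_no: int) -> float:
    """LEACH threshold T(n) for a node still eligible in the current epoch.

    p is the desired fraction of cluster heads; an epoch lasts 1/p rounds.
    """
    epoch_len = round(1 / p)
    return p / (1 - p * (round_no % epoch_len))


def elect_cluster_heads(node_ids, served, p, round_no, rng):
    """Return the nodes that elect themselves cluster head in this round.

    `served` is the set of nodes that were already CH in the current epoch;
    it is cleared when a new epoch starts and is updated in place.
    """
    epoch_len = round(1 / p)
    if round_no % epoch_len == 0:
        served.clear()  # new epoch: every node becomes eligible again
    threshold = leach_threshold(p, round_no)
    heads = []
    for node in node_ids:
        if node in served:
            continue  # T(n) = 0 for nodes that already served this epoch
        if rng.random() < threshold:
            heads.append(node)
            served.add(node)
    return heads


if __name__ == "__main__":
    rng = random.Random(42)   # fixed seed, for a repeatable demo only
    nodes = list(range(100))  # hypothetical node identifiers
    served = set()
    for r in range(10):       # one full epoch for p = 0.1
        heads = elect_cluster_heads(nodes, served, 0.1, r, rng)
        print(f"round {r}: {len(heads)} cluster heads")
```

Because the threshold grows towards 1 as the epoch advances, every node serves as CH once per epoch, which spreads the energy cost of the CH role (and, in this proposal, of hosting the fuzzy agent) across the network.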
3.1 Fuzzy Agent Role Behaviors
The aim of the WSN is to collect as much data as possible while eliminating irrelevant or redundant data.
Each Cluster-Head in the network is associated with a fuzzy agent (FA). The principal role of the FA is to use fuzzy logic to estimate the relevance of the data and to eliminate unimportant data. Hence, we have defined two main points that allow the fuzzy agent to extract the relevant information, which reduces the power consumption of each node and extends the life of the WSN:
1. Degree of relevance of data: the degree of relevance of the data strongly depends on the desired application. This parameter is calculated locally in the sensor node. The fuzzy agent can estimate the degree of relevance of the data collected. This information is taken into account if it is the primary information containing the required information. For example, for air pollution monitoring, the node records the latest collected data to compare with newly collected data. The fuzzy agent considers data relevant if the difference between the two values is greater than a predetermined threshold. Moreover, as the difference increases, the fuzzy agent considers that the data has a higher priority, so the degree of relevance increases.
2. Inter-sensor-node redundancy elimination: typically, the sensor nodes are randomly deployed, so many sensor nodes cover the same geographical points, which means that they give the same information (redundancy). In this case, the fuzzy agent compares the values collected by each sensor node with those of its neighbors to eliminate the inter-sensor-node redundancy.
Figure 2 illustrates the fuzzy logic used by the agent to estimate the relevance of the data.
Figure 2: Degree of relevance of data

4 Second-Tier: Dynamic Big Data Processing
4.1 Big Data Architectures
The most used process for big data analysis is the distributed pipeline (Figure 3-a). This model has been proposed to circumvent the rigidity problem by reducing the processing time by means of parallelism. This pipeline is based on the MapReduce pattern and its well-known Hadoop framework.
However, applying this model does not solve the problem of data dynamicity. Moreover, this model relies on batch processing and does not really focus on real-time processing, which always leaves a portion of the data unprocessed (Figure 3-b).
Other architectures have extended this model, trying to support real-time processing. In the following paragraphs we discuss the two most used architectures: Lambda Architecture (LA) and Kappa Architecture (KA).
• Lambda Architecture (LA): "The LA aims to satisfy the needs for a robust system that is fault-tolerant, both against hardware failures and human mistakes, being able to serve a wide range of workloads and use cases, and in which low-latency reads and updates are required. The resulting system should be linearly scalable, and it should scale out rather than up." [HB]
This is what it looks like, from a high-level point of view [HB]:
– All streamed data is sent to both the batch layer and the speed layer,
– The batch layer pre-calculates the batch views,
– The serving layer indexes the batch views so that they can be queried in a low-latency way,
– The speed layer compensates for the high latency of updates to the serving layer and processes only recent data,
– Any incoming query can be resolved by merging results from real-time views and batch views.
The idea behind these layers is that the speed layer provides real-time results to the serving layer, and if any data is missed or any data errors occur during stream processing, the batch job compensates for that and updates the serving layer, thus providing accurate results. However, it is very hard to build the pipeline and maintain the analysis logic in both the batch and speed layers.
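The roles of the Lambda Architecture layers described in Section 4.1 can be illustrated with a minimal sketch of how a query might be answered by merging a precomputed batch view with a real-time view. This is an illustration under stated assumptions, not code from [HB]: the per-key event counting, the class `LambdaViews` and its methods `ingest`, `run_batch` and `query` are all hypothetical.

```python
from collections import Counter


class LambdaViews:
    """Toy Lambda-style store that counts events per key (hypothetical)."""

    def __init__(self):
        self.master = []                # append-only master dataset (batch layer input)
        self.batch_view = Counter()     # precomputed by the batch layer
        self.realtime_view = Counter()  # maintained by the speed layer

    def ingest(self, key):
        """Send each incoming event to both the batch and the speed layer."""
        self.master.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        """Recompute the batch view from all data, then reset the speed layer."""
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, key):
        """Resolve a query by merging the batch view and the real-time view."""
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    views = LambdaViews()
    for sensor in ["s1", "s2", "s1"]:
        views.ingest(sensor)
    views.run_batch()          # batch view now covers the first three events
    views.ingest("s1")         # recent event, visible only in the real-time view
    print(views.query("s1"))   # 3
    print(views.query("s2"))   # 1
```

Even in this toy version the counting logic appears twice, once in `ingest` for the speed layer and once in `run_batch` for the batch layer, which is the maintenance burden noted for the Lambda Architecture.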
Figure 3: (a) Distributed Big Data Analytics pipeline [BGG16]; (b) Big data processing: batch
Figure 4: Lambda Architecture [SV16]
• Kappa Architecture (KA): "Kappa Architecture is a simplification of Lambda Architecture. A Kappa Architecture system is like a Lambda Architecture system with the batch processing system removed. To replace batch processing, data is simply fed through the streaming system quickly." [Ues]
One of the disadvantages of the Lambda Architecture, as detailed above, is having to code and execute the same logic twice; this is avoided in the Kappa Architecture. However, the Kappa Architecture should only be considered an alternative to the Lambda Architecture in applications that do not require unbounded retention.
Figure 5: Kappa Architecture [SV16]

4.2 The need of new dynamic approaches
After analyzing the solutions presented above, we can deduce that the available big data architectures do not really adapt to the dynamism of data. Furthermore, they must restart periodically to take into account the streamed real-time data, and they do not integrate new data in an adaptive way.
MAS technology, with the cooperative interaction of its autonomous agents, gives us the means to break the rigidity of the other big data architectures, and can offer adaptive management of big data streaming without the need to restart the process periodically.
When an agent receives new data, it starts processing the data directly to deliver real-time results. After this agent has consumed all the data stored in its node, it creates a link with the last agent in the batch layer to contribute to the batch processing (distributed data mining), and another agent with an empty data node takes its place for real-time data processing. This translates into data analysis tasks in interaction, mainly through communication, so that each task can help and work with other tasks for the sake of continuous real-time adaptation of the analytic process to changes in data.
The cooperation between the agents is described in the following steps (Figure 6):
1. Each node in the system is associated with a processing agent. The node that receives the captured data from the WSN is responsible for real-time processing and returns real-time views as a result; the other nodes in the system work on the batch processing and return the batch views.
2. Agents in the batch layer are partitioned into neighborhood groups. The neighborhood is defined by
time, such that two neighboring agents represent two successive periods. Each group represents a full batch period, in which agents of the same group apply distributed data mining and produce batch views.
3. Whenever the data stored in the real-time node has been processed, the real-time agent updates the real-time views and creates a link with the last agent in the batch layer to contribute to the batch processing. Another agent with an empty data node takes its place for real-time data processing.
Figure 6: Multi-Agent Cooperation
Another way to achieve this goal is to use the System-of-Systems (SoS) property, by combining one or several MASs for each step of Big Data analytics and representing them with an agent in one super MAS (see Figure 7). This property is used to widen the batch period.
Figure 7: MAS of MAS based Big Data Analytics

4.3 Service Agent
The service agent is responsible for serving the views computed by the real-time and batch layers. This process can be facilitated by additional indexing of data to speed up reads [TR14].
The real-time views and the batch views are created for a specific use case. This use case problem is resolved in the serving layer (Figure 8). Queries from users are managed by a dedicated service agent. For each new query, a service agent is created.
To prepare the response and solve the given problem, the service agent collects the needed data. Fresh online data is provided by the real-time views. Similar processing is done to collect historical data (batch views). Both views are combined to display the whole picture of the data.
After all required data from the real-time and batch views has been combined, the response is presented. At this point the life cycle of the service agent ends.
Figure 8: Second-tier: dynamic big data processing

5 Conclusion
This two-tier approach allows building the smart city as an agent community that can work in distributed and complex systems. The first tier describes the construction and effective use of fuzzy agents in the wireless sensor network, taking into account the relevance of collected data, which can help enormously in prolonging the lifetime of the network by decreasing the energy consumption of each sensor node. In the processing layer, we described and discussed how a multi-agent system can be applied to process big data dynamically without the need to restart the process periodically.
As the system architecture and agent behaviors have been designed, our future research will move into the implementation and validation phases.

References
[AKUMK09] Jamal N. Al-Karaki, Raza Ul-Mustafa, and Ahmed E. Kamal. Data aggregation and routing in wireless sensor networks: Optimal and heuristic algorithms. Comput. Netw., 53(7):945–960, May 2009.
[BGG16] E. Belghache, J. P. Georgé, and M. P. Gleizes. Towards an adaptive multi-agent system for dynamic big data analytics. In 2016 Intl IEEE Conferences on Ubiquitous Intelligence Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), pages 753–758, July 2016.
[CDBN09] A. Caragliu, C. Del Bo, and P. Nijkamp. Smart cities in Europe. Serie Research Memoranda 0048, VU University Amsterdam, Faculty of Economics, Business Administration and Econometrics, 2009.
[CKY+ 06] Min Chen, Taekyoung Kwon, Yong Yuan, Yanghee Choi, and Victor C.M. Leung. Mobile agent-based directed diffusion in wireless sensor networks. EURASIP Journal on Advances in Signal Processing, 2007(1):036871, Oct 2006.
[CMM08] Huifang Chen, Hiroshi Mineno, and Tadanori Mizuno. Adaptive data aggregation scheme in clustered wireless sensor networks. Comput. Commun., 31(15):3579–3585, September 2008.
[Coc14] Annalisa Cocchia. Smart and Digital City: A Systematic Literature Review, pages 13–43. Springer International Publishing, Cham, 2014.
[HB] M. Hausenblas and N. Bijnens. Lambda architecture.
[IGE+ 03] C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva. Directed diffusion for wireless sensor networking. IEEE/ACM Transactions on Networking, 11(1):2–16, Feb 2003.
[JGL+ 14] H. V. Jagadish, Johannes Gehrke, Alexandros Labrinidis, Yannis Papakonstantinou, Jignesh M. Patel, Raghu Ramakrishnan, and Cyrus Shahabi. Big data and its technical challenges. Commun. ACM, 57(7):86–94, July 2014.
[LKF08] Wen-Hwa Liao, Yucheng Kao, and Chien-Ming Fan. Data aggregation in wireless sensor networks using ant colony algorithm. Journal of Network and Computer Applications, 31(4):387–401, 2008.
[PDN04] S. Patil, S. R. Das, and A. Nasipuri. Serial data fusion using space-filling curves in wireless sensor networks. In 2004 First Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks, 2004. IEEE SECON 2004., pages 182–190, Oct 2004.
[SV16] N. SeyvetIgnacio and M. Viela. Applying the kappa architecture in the telco industry, 2016.
[TR14] B. Twardowski and D. Ryzko. Multi-agent architecture for real-time big data processing. In 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), volume 3, pages 333–337, Aug 2014.
[Ues] Shu Uesugi. Kappa architecture.
[UG07] S. Upadhyayula and S. K. S. Gupta. Spanning tree based algorithms for low latency and energy efficient data aggregation enhanced convergecast (DAC) in wireless sensor networks. Ad Hoc Netw., 5(5):626–648, July 2007.