Clustering-based data in ad-hoc networks
                             Bakhta Meroufel1, Ghalem Belalem2
                   1
                       bakhtasba@gmail.com , 2ghalem1dz@univ-oran.dz
                                  Department of computer science
                                        Faculty of Sciences
                                    Oran University (Es Senia)
                                            -Algeria-


          Abstract: Clustering is an important mechanism in large wireless sensor
       networks for obtaining scalability, reducing energy consumption and
       achieving better network performance. Most of the research in this area has
       focused on clustering based on physical parameters such as: energy, mobility,
       connectivity, density.., without taking in the count the data stored in each
       nodes. The main objective of this paper is to provide a useful fully-distributed
       algorithm for clustering that maximize the intra-cluster access, so we used a
       new heuristic parameter that combine between energy, mobility and number
       of data to select the Cluster-Heads. Our clustering strategy was compared with
       Lowest-ID Cluster Algorithm (LID) and the results show that our algorithm
       improves system performance and increases its life.

          Keywords: Clustering, data, ad-hoc networks, availability, mobility.

  1.    Introduction
   A mobile ad-hoc network (MANET) is a collection of mobile nodes that form a wireless
network without the existence of a fixed infrastructure or centralized administration. This
type of network can survive without any infrastructure and can work independently. Hosts
forming an ad hoc network can take equal responsibility in maintaining the network. Each
host provides routing services to other hosts to deliver messages to remote destinations. As
such a network requires no fixed infrastructure; it makes them better for deployment in a
volatile environment such as battlefield and disaster relief situations [6]. Some of the
constraints in MANETs are: limited bandwidth, low battery nodes and link frequent breaks
due to node mobility. These constraints must be taken into account, while maintaining the
connectivity between nodes. Clustering plays an important role in solving such problems
[11]. It hides the dynamic structure of the system by forming a hierarchical topology. There
are many researches that offer different strategies for clustering based on different metrics.
   Unfortunately, the majority of thus works do not take into account the management of
requests in the system [1]. In a network where energy is critical, the management of user
requests and the difference between intra-and inter-cluster access can consume a lot of
energy, which degrades system performances and reduces its life. In this paper, we propose
a new strategy of clustering that takes into account the data of the system in addition to
mobility and energy of the nodes. Our goals are minimizing the energy used by managing
the user’s requests and at the same time improving the response time and the stability of the
system. The rest of the paper is structured as follows: in the second section, we present some
related works on clustering. The third section presents the environment of our work. In


Proceedings ICWIT 2012                                                             261
section four we introduce our clustering algorithm based on the data, energy and mobility of
each node in the system. We validate the proposed approach by a set of experimental results
shown in the fifth section, and we finish this work with a conclusion and perspectives.
  2.     Related works
The algorithms differ on the criterion of selection of cluster-head. Among these algorithms
we have: the Lowest-ID Cluster Algorithm (LID) [6], in this algorithm, each mobile host in
the network must have a unique identifier id. The node that has the smallest id among all its
neighbors is elected as the cluster-head. The cluster is formed by the cluster-head and all its
neighbors. LID has the advantage of being simple and rapid but also generates a large
number of clusters and can be adjustable to changes in the topology. The Last cluster-head
Change algorithm (LLC)[4], is designed to minimize the change of cluster-head and
provides better stability in the composition of system. In High-Connectivity Clustering
(HCC) [9], [11], the cluster-head election is based on the degree of connectivity (number of
neighbors of the node) instead of the identities of the nodes. A node is elected as a cluster-
head if it has the highest connectivity among all its neighbors. This algorithm suffers from
frequent changes of cluster-head. In [3], two clustering algorithms are proposed, providing a
new approach. The first, Distributed Clustering Algorithm (DCA) which is targeted to
"quasi-static" in which the movement of nodes must be "slow". A weight is defined by the
speed of each node. The criterion for the election of the cluster-head is the maximum weight
in its neighborhood. The node whose weight is the greatest among all its neighbors is
elected cluster-head. The second algorithm is designed for networks and mobility called
Mobility-Adaptive Clustering algorithm (DMAC). Each node reacts locally to all changes
depending on its status: member node or cluster-head. In the two algorithms, it is assigned
different weights to the nodes and it is assumed that each node has knowledge of its weight.
A node is selected as cluster-head if its weight is greater than among all its neighbors. There
are other works that take into account data management in ad hoc networks, such as work
[13] which proposes a strategy with two steps: first create the cluster and then replicate the
data requested in each cluster. The works [12] and [7] also proposed methods of replication
to improve availability of data or to facilitate the update and allocation of data. All works
cited precisely separate between the clustering and data management and use the replication
to improve their approaches. In this paper we propose a strategy of clustering that includes
data management and the creation of clusters in one step by taking advantage of existing
data without creating other replicas in the system.

  3.     Contributions
   Proper management of queries in an ah-hoc network improves the reliability and
usefulness of the system but increases energy and reduces the lifetime of the nodes. The
majority of the works cited above focus on the physical characteristics of networks such as:
energy, connectivity, mobility, density, without taking into account the usefulness of the
network itself, that is to say the goal of building the networks, the type of services provided
and the types of treatments that can be achieved. In this paper, we propose a new algorithm
for non-overlapping clustering (see definition 2) to minimize power consumption and


Proceedings ICWIT 2012                                                              262
response time in the system. Minimizing the access time of queries depend on the location
and distribution of data in the system [8]. The main idea in this work is to maximize intra-
cluster access by maximizing the number of different data in the same cluster and remove
the replica of same data. In the example of Figure 1, the system contains two clusters, each
cluster has two different data (two colors) and in this case, the cluster can satisfy only
queries that research these two data. As against the second system in Figure 2, each cluster
contains three data which increases the number of requests satisfied at the intra-cluster.


 Fig1: First system: two colors in each cluster   Fig2: Second system: three colors in each cluster


 Model
       The system is modeled by an undirected graph G = (V, E) where V is the set of network
nodes and E models all the connections between these nodes. An edge (u, v) E if and only
if nodes u and v can mutually receive transmissions of each other. This means that all links
between nodes are bidirectional. In this case, we say that u and v are neighbors. The set of
neighbors of a node v V is denoted Neighv. Each node u of the network is associated to a
unique id and can communicate with its neighbors Neigh V. Each node is characterized by
its mobility Mob and energy Energ remaining in the battery, it also can store zero or more
data Di. Users can initiate read requests to access data in the system.
For the sake of clarity, we will use later in this paper the following notations:
    -     Neighu: The neighbors of node u.
    -     Energu: The remaining energy of the battery of node u.
    -     Mobu: The mobility of the node u.
    -     ListLu: List local data stored in the node u.
    -     ListTu: Total list of data stored in the node u and its neighbors (see formula 2).
Before describing our clustering algorithm in detail, we make the following assumptions,
which are common in the design of clustering algorithms for MANETs [1], [3], [6], [11],
[13]:
- The network topology is static during the execution of the clustering algorithm.
- A packet transmitted by a node can be received correctly by all its neighbors in a finite
      time.
- Each node has a unique id and knows its neighbors and vice versa.
- The inter-cluster access is expensive compared to intra-cluster access in terms of:
      bandwidth and energy.
One more of these assumptions, we also assume that all requests made by users are of the
reading type [10].


Proceedings ICWIT 2012                                                                   263
   4.    Clustering
   The clustering algorithm consists of two steps: the selection of cluster-head and then the
construction of clusters.
   4.1 Metric of clustering
   Given the interest in the concept of clustering and its undeniable contributions to improve
the performance of an ad hoc network, the choice of the clustering mechanism is important.
Thus, a clustering algorithm must first be able to select the appropriate nodes to ensure the
functionality of the cluster-head. In our algorithm, the cluster-head selection is based on a
new metric γ:
                              γ =          ||       ||                                     (1)
   -     ã : Metric of clustering.
   -     ||       ||: The cardinality of the data list of the node u and its neighbors.
                         ListTu =             m             u                           (2)
   -     m: The number of neighbors of node u.
   The node with the maximum γ in the neighborhood can be selected as a cluster-head. The
rapport between energy and mobility helps to select a cluster-head which has a relatively
high energy capacity and low mobility witch increase the stability of the system. Held that
the parameter ||        || can elect a cluster-head which has a great opportunity to satisfy
read requests in a single degree at the neighborhood.

Maximize the number of different data per cluster can be achieved by choosing as cluster-
head node, which in its first neighborhood has many different data including its own list of
data (Maximum ||           || . For example in Figure 3, assuming that the report
/i [0,3] is the same for all nodes i (just to explain) then:
   -      For node 0, the list ListT0 = {A} {A, Z} {A, H} = {A, H}.
   -      For node 1, the list ListT1 = {A, H} {C, B, F} {A} {A, Z} = {A,B,H,F,C,Z}.
   -      For node 2, the list ListT2 = {A, Z} {A, H} {A} = {A, E, H}.
   -      For node 3, the list ListT3 = {C, B, F} {A, H} = {A, H, C, B, F}}
   We note that the node1 in its neighborhood contains a lot of data compared to its
neighbors: | |ListT1 ||>||ListT3 ||>||ListT2 ||>||ListT0 | |, so node 1 is capable of meet a lot of
requests (read requests) by at most one jump (zero jump if data is stored in the node itself).
In this case, the node 1 will be selected as a cluster-head and broadcasts its status to its
neighbors.


                                  Fig3. A system with 4 nodes

   4.2 Clustering algorithm
   The construction of the clusters is through periodic exchange of messages which we will
call hello messages. Each network node exchanges with its neighbors the hello messages.


Proceedings ICWIT 2012                                                                  264
Each hello message sent by a node u contains four values are: idu, Statutu , (Energu / Mobu)
and ListLu.
- idu: the identifier of the node u.
- Statutu: The role of the node u in the system (See definition 3).
   This message is also used for each node to verify the presence of its neighbors. Thus, if a
node no longer receives hello message from a neighbor at the end of a period, it considers
that this neighbor has disappeared. So each node waits a specified period in advance and it is
assumed that during this period, all nodes have sent their hello message. In our algorithm we
use the following definitions:
    Definition 1 (Cluster): We define a cluster Ci by a connected sub-graph of the network,
with a diameter less than or equal to 2. The cluster has an identifier corresponding to the
identity of the node with the higher γ in the cluster, that is to say that if the cluster is the
cluster Ci then the cluster-head of this cluster has the identifier id = i.
   Definition 2 (Non-overlapping): a non-overlapping clustering ensures that there is no
node that belongs to two different clusters at the same time witch minimizes the conflict.
   Definition 3 (node status): Each node u has a status Statutu, the Statutu can be one of the
following: CH: cluster-head node, MN: member node, or GN: gateway node. These three
roles are defined in definitions 3.1, 3.2 and 3.3:
   Definition 3.1 (Cluster-head): A node u is called cluster-head of cluster Ci iff: idu =i ˄
v Neighu , γu>γv.
   Definition 3.2 (member node): A node u is said member node if it is not a cluster-head
and it does not have in its neighborhood a node associated to a different cluster.
u Ci Then idu ≠ i ˄ [ v (Cj/Cj ≠ Ci) ˄ v Neighu]
   Definition 3.3 (Gateway node) A node u is a Gateway node iff: u Ci Then v Neighu
/ v Cj ˄ Cj ≠ Ci. Gateway node has a special role. It provides access to one or more
neighboring clusters.
   Definition 4 (Degree of no-similarity): the degree of non-similarity             between two
nodes u and v is the size of the list of data that exists in a node and does not exist in the other
(formula 2). Two nodes with a high need each other more then two nodes with a lower
(disjointness of contents).
                      (u,v)= ||(ListLu ListLv ) - (ListLu ListLv )||                            (3)
    Where:
   -      u,v: Nodes.
   -      ListLu: List of local data in the node u.
For example, if local lists of data in nodes u, v, h are ListLu = {A, B, C}, ListLv = {A, E, F,
G, H}, ListLh = {A, B, C, D} respectively then:
   -        (u, v)= || {B, C, E, F, G, H} || = 6.
   -        (u,h)= || {D} || =1.

     Clustering steps are:
1.    Each node calculates the parameter γ and disseminates information on its neighbors by
      the Hello packet.


Proceedings ICWIT 2012                                                                  265
2.   If the node has the maximum γ among its neighbors then it becomes the cluster-head
     (Status = CH) and sends requests to join the cluster to the other. If two nodes u and v
     have γu = γv , then the cluster-head is the node that has the lowest id.
3.   The node that receives a request to join a cluster, then it became with a Status = MN.
4.   If the node receives several requests from different cluster-head to join their clusters,
     then it becomes a gateway node: Status = GN. But as a cluster selects the one with the
     maximum γ. In case of a tie, the node selects the cluster-head that has the maximum no-
     similarity degree â with it.
5.   If two neighboring nodes have both a Status = CH, then the first node that detects the
     conflict through hello messages received compared γ to that of its cluster-head
     neighbor. If the γ is smaller, it abandoned his status as cluster-head and becomes a
     member node and sends a hello message to its neighbors announcing its new status.
     Otherwise, it retains the role of cluster-head.
6.   The algorithm terminates when each node in the system specifies its status.

   For example in Figure 4: γ0 = 20, means the node will be a cluster-head. The node 2 will
be a node member in the cluster of node 0. Node 7 is a getaway node. The result of
clustering is shown in Figure 4.


                         Fig 4. Example of clustering of 8 nodes

  4.3 Maintenance of the topology
  In ad hoc networks, topology changes frequently due to node mobility and energy. So we
have to manage:1) nodes that move, 2) nodes that disappear and nodes that appear.
Our algorithm can also manage these cases.

 -    The appearance of a new node. When a new node enters the network, it broadcasts at
      a regular interval of time a message of type hello. At the end of the timeout, if it has
      not received any type hello message from a cluster-head, the node becomes cluster-
      head. If the node received a hello message from a cluster-head, the node becomes a
      member node. In the event that it receives from more than one cluster-head, the node
      becomes a gateway node (Execute step 4 of the clustering algorithm)
 -    Disappearance of a node. In the same way as for the appearance by exchanging hello
      messages the neighbors are aware of the disappearance.
 -    Moving a node. Case equivalent to the emergence of a new node. In this case the hello
      messages will be received by the other new neighbors.


Proceedings ICWIT 2012                                                             266
   5       I
           Implement
                   tation and validation
                              v
   To vallidate our appproach, we ussed our simulaator that allowws us to run aand measure the
performaance of our strrategy of clustering CD (DData based Cluustering) and compare it with
                                                                                          w
the LID strategy
        s        [6]. As
                      A it is alreadyy mentioned inn the section of
                                                                 o related worrks, The LID is
                                                                                           i a
one hop clustering thaat selects as cluster-head thet node withh the minimuum id among its
neighborss without takking into acccount other characteristics
                                                   c               of system ssuch as energy,
mobility. The parameteers of simulatiions are preseented in Table 1.

                                  Table 1: Parameters of simulations

                          Variable                             Value
                          Total number of nodes               1-10000
                          Node mobilityy (%)                  1-100
                          Node energy (%)
                                       (                      10-1000
                          bandwidth (MMb/s)                   20-1000
                          range (m)                           10-1000
                          Total number of data                10-3000
                                                     de
                          Total number of data per nod        0-10
                          Data size (Mo)                      1-10


In the firsst experiment, we studied the
                                     t impact of the
                                                   t range on thet number off clusters in bo
                                                                                           oth
clusteringg approaches LID and CD.. The range specifies
                                                 s          the number
                                                                n       of possible neighbors
for each node. We stuudied the imppact of this paarameter on the t number of clusters in the
system. The
          T results arre shown in Figure
                                    F      5. Increeasing the rannge minimizes the numberr of
clusters in the both approaches becaause it increasses the numbeer of neighbors per node. But
                                                                                           B
our approoach optimizees the number of clusters with a gain of  o 25% comppared to the LIDL
approachh.

The numbmber of nodes also affects thhe number of clusters. Acccording to the results obtain
                                                                                           ned
in the seecond experim  ment (see Figgure 6). Increeasing the nuumber of nodees increases the
number of o clusters, buut not on the saame frequency
                                                    y for both appproaches. Ourr approach CD
                                                                                           D is
better thaan LID approaach with a gainn estimated byy 22.8%.


       Fig5: Range vs Num
                        mber of clusterss.          Fiig6: Number off nodes vs Num
                                                                                 mber of clusters.

The nummber of transitiions is a veryy important paarameter; it reeflects the staabilization of the
                     b the clusteriing. The transsition is the paassage from configuration i to
topology constructed by


Proceedinngs ICWIT 20012                                                                  267
configuraation i +1. Thee configuratioon is the resultt of the execuution of at leasst one step of the
clusteringg algorithm [22]. The resultss of our third experiment
                                                      e           (seee Figure 7) sshow that desp pite
the increased number of nodes in random graph         hs we used, thet number oof transitions for
stabilizatiion in our alggorithm CD varies very slig                s better witth 30 % than the
                                                      ghtly, while still
clusteringg approach LIID, because CD C is purely local
                                                      l      and doess not require that informatiion
obtained through the neighborhood hello h     messagees. This ensurees the scaling..

Minimiziing response time
                        t    increasess system reliab
                                                    bility. The ressults of the foourth experim
                                                                                              ment
shown inn Figure 8 prove
                       p      that in the LID app   proach, the response
                                                                 r           timee increases with
                                                                                              w
increasingg number of nodes and thee time is greaater comparedd to our approoach because the
data in clluster formed by LID are loocated random mly. The CD approach
                                                                 a           workks better becau
                                                                                               use
the CH maximizes
          m           the number of data in its neighborhood which increaases intra clusster
access froom access inteer-cluster. The CD minimizzes 43% of reesponse time ccompared to LID   L
approachh.


Fig7 : Num
         mber of nodes vs Number of traansitions Fig8
                                                    8: Number of noodes vs responsse time

In the last experimentt, we studiedd the impact of   o number off read requestts to the enerrgy
consumedd in the system. We notee that the eneergy increases with increaasing number of
applicatioons for both approaches
                        a             beecause each query
                                                    q      must bee redirected too other nodess in
different clusters to reach
                        r      the ansswer. But ourr approach CD C decreases remarkably the
energy coonsumption (aa gain of 25%   %) because it maximizes thhe number of different dataa in
each clusster taking in the
                        t account thhe mobility an  nd the energy of
                                                                  o each node, which improv    ves
the intra-cluster accesss and stabilize the topology (see Figure 9)).


                   Fig9: Number of reqquests vs Consu
                                                    umption of enerrgy.


Proceedinngs ICWIT 20012                                                                    268
  6      Conclusion
   In this report, we proposed a clustering strategy that maximizes intra-cluster access and
minimizes energy consumption. This strategy uses energy, mobility and the types of data
stored in the neighborhood to elect the cluster-head. The experimental results demonstrate
the effectiveness of our proposal. Our strategy can be used to create clusters in unstructured
P2P system as FastTrack and Gnutella [5], where the number of message and consumption
of bandwidth are large. To avoid thus problems and increase the quality of services in
requests management, we use our clustering algorithm where for each node u:             =1, so
γ=||ListTu||. Perspective, we offer extension work by taking into account the availability of
links between the different neighboring data to improve system reliability.

  References
 1.    A. Abbasi, M. Younis, A survey on clustering algorithms for wireless sensor networks, Journal
       Computer Communications. 30(14): 2826–2841, 2007.
 2.    D.J. Baker, A. Ephremides and J. A. Flynn. The Design and Simulation of a Mobile Radio
       Network with Distributed Control. IEEE Journal on Selected Areas in Communications, 2(1):
       226–237, 1984.
 3.    S. Basagni, Distributed Clustering Ad Hoc Networks, In Proceedings of the IEEE International
       Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN), pages 310-315,
       august 1999.
 4.    C-C. Chiang, H-K. Wu, W. Liu, M. Gerla, Routing in Clustered Multi hop, Mobile Wireless
       Networks With Fading ChannelL. IEEE Singapore international conference on networks, pages
       197-211, 1997.
 5.    K. Eger, T. Ho_feld, A. Binzenhofer, and G. Kunzmann, Efficient simulation of Large-Scale
       P2P Networks: Packet-level vs. Flow-level Simulations, in Proc. UPGRADE-CN'07, HPDC
       Workshops, pages 1-16, USA, 2007.
 6.    J. W. A. Ephremides and D. J. Baker, A design concept for reliable mobile radio networks with
       frequency hopping signaling, Proceeding of IEEE, pages 53-76, January 1987.
 7.    T. Hara, Replica Allocation Methods in Ad Hoc Networks with Data Update, Journal Mobile
       Networks and Applications, 8(4): 342-354, 2003.
 8.    B-J. Ko and D. Rubenstein. Distributed self-stabilizing placement of replicated resources in
       emerging networks. IEEE/ACM Trans. Netw., 13(3): 476–487, 2005.
 9.    S-J. Lee, W. Su, J. Hsu, M. Gerla and R. Bagrodia, A Performance Comparison Study of Ad
       hoc Wireless Multicast Protocols. Proceedings of the IEEE International Conference on
       Computers and Communications, pages 565 – 574, March 2000.
 10.   B. Meroufel, G. Belalem: Availability Management in Data Grid, Lecture Notes in Electrical
       Engineering, 1, Volume 107, IT Convergence and Services, Part 1, Pages 43-53.
 11.   H. Taniguchi, M. Inoue, T. Masuzawa, H. fujiwara, Clustering Algorithms in Ad Hoc Networks,
       Electronics and Communications in Japan, Part2, 88(1): 51-59, 2005.
 12.   L. Yin and G. Cao, Balancing the tradeoffs between data accessibility and query delay in ad
       hoc networks, Proceedings of the 23rd IEEE International Symposium on Reliable Distributed
       Systems, pages 289 – 298, 2004.
 13.   J. Zheng, J. Su, and X. Lu. A Clustering based Data Replication Algorithm in Mobile Ad hoc
       Networks for Improving Data Availability. In Proc 2nd International Symposium on Parallel and
       Distributed Processing and Applications (ISPA 2004), Pages 399–409, 2004.


Proceedings ICWIT 2012                                                                   269