<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Graph Analytics on Proximity Data to Fight Contagion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Snehasis Banerjee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vivek Chandel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Avik Ghose</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TCS Research &amp; Innovation, Tata Consultancy Services</institution>
          ,
          <addr-line>Kolkata</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>[4] K. Das</institution>
          ,
          <addr-line>S. Samanta, M. Pal, Study on centrality</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>As social distancing has become the new normal, it is eminent for enterprises and administrations to make decisions based on analytics on human proximity data. Graph Analytics can serve as an essential technology to understand human network behaviors and contact tracing inferences in pandemic and post-pandemic scenarios. A large amount of work has happened on epidemiology of pandemics. However, a formal application of graph analytics backed by visualization and mapping of graph algorithms has remained neglected in pandemic scenarios. This paper has addressed this important need backed by graph modeling, methods and analytics.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;graph analytics</kwd>
        <kwd>contact tracing</kwd>
        <kwd>COVID-19 pandemic</kwd>
        <kwd>human proximity data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Title of the Proceedings: Proceedings of the CIKM 2020 Workshops,
October 19-20, Galway, Ireland.</p>
      <p>Editors of the Proceedings: Stefan Conrad, Ilaria Tiddi.
snehasis.banerjee@tcs.com (S. Banerjee)
0000-0001-6497-2085 (S. Banerjee)</p>
      <p>© 2020 Copyright for this paper by its authors. Use permitted under Creative
CPWrEooUrckReshdoinpgs IhStpN:/c1e6u1r3-w-0s.o7r3g CCoEmUmoRns WLiceonrsekAsthtriobuptioPnr4o.0cIneteerdnaitniognasl ((CCC EBYU4R.0)-.WS.org)</p>
    </sec>
    <sec id="sec-2">
      <title>1https://www.postgresql.org ; https://postgis.net 2https://www.orientdb.org</title>
      <p>2. Graph Data Population
While the individual events of proximity instances are
stored in a RDBMS, the actual inference and analytics
happens over the graph database. As any active
sensor consume power, hence keeping a sensor device
always online which sends events to server is
impractical. Hence, there is a sleep and wake cycle, when
proximity events are detected and sent to server database
at intervals. However, to perform graph analytics, the
data instances need to aggregated to populate the graph
with meaningful information. Therefore, if there is no
gap greater than a preset time threshold T in terms of
two persons being in contact, the proximity event
instances becomes a continuous event. When the gap is
more than T, a new entry is entered in the graph in a Figure 2: Modeling of nodes and edges in a proximity graph
similar fashion. Sometimes, due to sensor errors and
signal reflection, false proximity events can be recorded.</p>
      <p>
        However, as this is two way detection (person 1 de- is how a person proximity event is generated. A
pertecting person 2 and vice versa when they are in close son can have multiple tagged sensors (like two smart
contact of say 2 metres) [3], if there is only one event phones). Same sensor device (with a unique device ID)
detected, that is dropped from graph insertion – this is can be used by various persons in disjoint time frames.
a configurable setting as per sensitivity needs of spe- (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) Static zone tagged sensor: fixed sensors like
beacific deployments. In order for wake and sleep cycles cons in elevators, ofice areas, cabins, canteens, gym,
across sensors to be consistent, a standard time gap is shops, etc. helps in proximity event detection in two
maintained across devices, which is typically a sleep ways: (a) Any detected proximity event confirms
posiof 30 seconds and a wake of 5 seconds synchronized to tively if two or more persons were already reported by
00:00:00 hours of UTC (Coordinated Universal Time). person tagged sensors. (b) Usually infectious diseases
The data in relational database and graph is deleted stay active at a space for some period of time.
Therebased on some pre-set expiry timestamp to maintain fore, a proximity event with a person at time T1 and
a sliding window based on business logic or infection another person with time T2 can trigger an indirect
risk period (usually 30 days for COVID-19 scenario). proximity link between the two persons, if | 1 −  2|
Geo-spatial queries may be directly run on spatially in- are within the infectious period relevant to that space.
dexed graph database or be optionally run on the spa- (4) Dynamic zone tagged sensor: sensors tagged with
tial layer of the relational database (to use spatial geo- transport can keep track of proximity events that
ocmetric algorithms) by entangling with graph queries, cur when a person stays in that zone. It is important to
however, in enterprise scenarios (small geo-location keep track of the time span a person is in this dynamic
coverage), such needs are found to be rare. zone, as the entry and exit times will reveal the spatial
zones of entry and disembarkment. This needs to be
matched with other persons having overlapping
prox2.1. Entities as Graph Nodes imity events but diferent entry or exit points.
ContaA necessary part of graph modeling is identification gion can be tracked in the related entry and exit spaces
of entities and its representation for fast graph queries (like bus stops, airports, parking lots, stations) even if
and traversal. Here, the nodes are of 4 types: those spaces have bad sensor coverage.
      </p>
      <p>
        (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) Person: humans having features (attributes) like Locations can be sensed from GPS or cellular tower
age, risk factors that remain static over a period of estimates in outdoor settings. In indoors, various
lotime. Dynamic attributes are derived or calculated over calization techniques leveraging radar, RFID, Wifi, BLE
an interval – risk estimation, proximity activity and can be used. There are other possible node types
specifhealth estimate. A necessary attribute of a person in a ically mobile objects (like parcels, posts, shopping items,
contagion scenario is the person’s current health sta- currency notes) that can play a part in the infection
tus - infected, healthy or susceptible. spread. However, as proper sanitization practices can
(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) Person tagged sensor: sensors like wearable smart mitigate these occasional object based contagion risks,
devices and smartphones are tagged to a person – that it is kept out of scope of the paper.
      </p>
      <sec id="sec-2-1">
        <title>2.2. Proximity Events as Graph Edges</title>
        <p>As presented in Fig. 2, a new proximity event is
entered into the graph as an edge with start time
(preferably UTC timestamp) and time duration (say seconds)
as necessary attributes. In the graph, relations are
maintained between persons and not devices (bearing
sensors) as same device may be used by diferent persons
over a period of time. So person and zone resolution
at the time of graph population is of high importance
– proximity event edge keeps an attribute source (like Figure 3: Visualization of proximity relations (with number
device sensor) to handle any future backtrack queries. of interactions and duration) given a person and depth level
Indirect proximity events are inferred using transitive
relation of person and zones with timestamp constraints.</p>
        <p>
          By nature humans are mobile, hence it makes sense to duration of contact, the number of instances of
conkeep a private record of all other nodes (places and tact, number of infected visiting a specific zone and the
other persons) visited within a sliding interval of time time spent there. Other queries can detect crowds at
period as edges. Because, two nodes in this graph can locations based on large number of temporally
collidhave multiple edges (proximity instances) and prox- ing proximity events – those zones need to be marked
imity events are by definition both way, the resulting as ‘at risk’ zones. Crowd detection will be based on
graph is an undirected Multigraph. Due to social dis- spatial queries on spatio-relational database (say
Posttancing measures, this graph will be sparse and have GIS) if dealing with latitude and longitude values (GPS).
local community clusters. Hence, adjacency list is the
best way to represent the relations between |V|
vertices (nodes) and |E| edges, resulting in 2*|E| space com- 3.2. Centrality Measures
plexity. A hash index on the nodes will aid in O(
          <xref ref-type="bibr" rid="ref1">1</xref>
          )
lookup and fast adjacent node traversals. In a sparse In managing pandemics and enforcing social
distancgraph with community clusters, graph traversal us- ing, it becomes crucial to find the specific zones that
ing breadth first search is the optimal approach, which are hotspots and which persons have highest
capabiltakes time around (  +   ) ∗   /  , for a depth up to ity to spread infection. Usage of graph centrality
mealevel   .   and   are respective vertices and edges at sures helps in identification of critical nodes.
a community cluster in the graph of cluster diameter Degree centrality, measured by the number of unique
  (largest edge distance or number of nodes in path edge links of a node to other nodes can be thought of
between two nodes in that cluster or sub-graph). the relative risk of getting infected if the local
community sees an infection outbreak.
        </p>
        <p>Closeness centrality of a node is: C(i) = ∑   , where
3. Graph Analytics   is the geodesic distance from node i to node j
(number of links in the shortest path from node i to node
Graph analytics deals with pairwise relationship be- j). Closeness reveals how long it takes for infection to
tween two entities at a time; as well as structural char- lfow from one node to others in the graph.
acteristics of the graph as a whole. Graph Analyt- Betweenness centrality signifies that a node is
cruics also includes visualization aspects, insights gained cial or forms a bridge for spreading of infection.
Examfrom complex query execution and graph algorithms. ples are person nodes who visit many other nodes like
doctor, courier and security staf. Examples of bridge
3.1. Graph Queries and Visualization space nodes are market and transport zones.
Betweenness centrality is measured as, b(i) = ∑     /   , where
   is number of shortest paths from node j to node k
(j and k ≠ i) and g   is number of shortest paths from
node j to node k passing through node i.</p>
        <p>There are other graph centrality measures [4] like
Eigenvector Centrality. However, it requires
calculations over adjacency matrix where as the graph model
shown is based on adjacency list – hence not optimal.</p>
        <p>In contact tracing scenario, one of the primary
requirement is to find the chain of persons an infected person
has come across. This can be done by graph
traversal and the results can be showed in front-end
visualization as shown in Fig. 3. For privacy reasons, the
exact identity of a person is not disclosed and an
encrypted alias is used. Other related graph queries are
time overlap between visits to specific zones, the total</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.3. Graph Structure Measures</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Proximity events between 2 persons represent a Dyad in graph, where as proximity event between 2 persons and a mutually visited zone forms a Triad. This are the simplest forms of graph structures.</title>
      <p>Eccentricity is the maximum distance from a given
a graph is the maximum eccentricity value.
Eccentricity calculation aids in understanding contagion’s
spread from one node to the others. If infection
propagates from a node A towards some specific nodes in
the graph in the smallest number of steps, it means
that the node A has better propagation eficiency.</p>
    </sec>
    <sec id="sec-4">
      <title>Density serves as an important graph structure mea</title>
      <p>sure, as more dense a graph is (more connected nodes),
the more the risk of infection spreading. Density is
calculated as: D(G) = 2 ∗ | |/(| | ∗ (| | − 1)).</p>
      <p>A clique is a graph (or subgraph) in which every
node is connected to every other node. Finding maxi- 
mum cliques in a graph provides the largest set of
common links – hence, strong communities can be discov-  
node to all other nodes in a graph. The diameter of Figure 4: Importance of Community Clusters in Contagion
ered with the following risk – if any one is infected,
it will result in a high probability of others also
getting infected. Another structure, K-cores, which are
not necessarily cohesive groups, but indicate areas of
a graph which contain clique-like structures.</p>
      <p>Clustering coeficient measures the drift of nodes to
form dense subgraphs. The clustering coeficient C of
a node i shows how its neighbors are connected with
each other. The local (node) clustering coeficient is</p>
      <p>calculated as follows: C = 2*T / ( K * (k - 1) ), where</p>
      <p />
      <sec id="sec-4-1">
        <title>T is the number of distance triangles with node i and</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>K * (k - 1) is the maximum number of possible con</title>
      <p>nections in neighbors of i. A large C implies that the
graph is well connected locally to form a cluster.
Average graph clustering coeficient is: C = 1/n *
∑   .</p>
      <sec id="sec-5-1">
        <title>3.4. Community Detection</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>In contact tracing and related analytics, it is important</title>
      <p>to discover communities from the proximity events so
that at risk communities can be identified and infected
communities can be kept in isolation. As seen from
tection methods can be divided into three categories:
divisive, agglomerative, and optimization algorithms.
Some popular algorithms [5] are K-cliques,
hierarchical clustering, spectral bisection, graph partioning,
modularity optimization and link communities. One of the
methods suitable for large graphs is Louvain Method
[6] that uses graph modularity and has time
complexity as O(|V|log|V|). Modularity is a measure of the
struc
ture of the graph and is defined as, Q = ( |E 
| ) / E , where |E
| is the number of links
connect| - |E
ing nodes that belong to the same community and |E
| is the estimated |E</p>
      <p>
        |, if links were random. The
Louvain Method works as follows: (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) By optimizing
modularity locally on all the nodes, small communities
of nodes are found. (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) Small communities are then
grouped into one node, and the first step is repeated.
 −
      </p>
      <sec id="sec-6-1">
        <title>3.5. Contagion and Risk Propagation</title>
        <p>Some work has been done on contagion and risk
estimation [7] [8]. However, they are dependent mostly
on external factors to estimate the risk. Here, a method
is presented that uses a combination of graph
modeling, graph structure measures and infection properties
to estimate risk for a person or community.
Contagion is dependent on physical proximity, hence
network difusion techniques can be helpful here. A
related problem is Link Prediction: the task here is to
predict whether there will be a link between two nodes,
provided that link does not pre-exist. Node
similarities can be found using measures [9] for analyzing the
proximity of nodes in a graph such as a degree
distribution, common neighbors, preferential attachment,</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Jaccard coeficient and Leicht-Holme-Newman Index.</title>
      <p>Highly influential nodes (hubs) in the graph (as
highlighted in Section 3.2) enable a swift spread of an
infection through the graph. On the other hand, poorly
connected or periphery nodes would relax the
spreading of infection and permit only a tiny portion of the
graph to get exposed to the difusion. Graphs with lots
of localized clusters (in case of highly fragmented
lowdensity graph) may limit the spread of an infection
and will face hurdles in establishing any momentum,
whereas dense graphs with few gaps (or few boundary
of Graph Analytics for proximity graph based contact
as root node;
tracing in pandemic scenarios.
spanners) would assist the dissemination of contagion.</p>
      <p>Earlier discussions in the paper were around
finding risks in the graph structure, nodes and
communities. Certain techniques like belief propagation and
label propagation are popular in general graph analytics,
however in this scenario, what is needed is a
knowledge wrapper on breadth first algorithm which can
match the scale of incoming proximity event streams.
Finally the paper presents an approach to propagate
risk across proximity graph (Algorithm 1). Due to
storage of last propagation update attribute at each
connected node, if there is a sudden stop due to server load
or time constraints, the risk propagation can start from
last checkpoint. This avoids updating the same node
risk attribute more than once. The risk estimate
propagation deals with three damping factors: x – takes care
of contagion speed and is determined by disease stage
of infected person (incubation, prodromal, illness,
decline and convalescence), y – to handle influence of
selected centrality measure, z – to handle relative
community density’s impact on risk.</p>
    </sec>
    <sec id="sec-8">
      <title>In conclusion, the paper described various aspects</title>
      <p>ing &amp; contact tracing in enterprise scenario, CPD
measures in social networks: a survey, Social
network analysis and mining 8 (2018) 13.</p>
      <p>A survey on theoretical advances of
community detection in networks, Computational
[6] S. Ghosh et. al., Distributed louvain algorithm for
graph community detection, in: IPDPS 2018, IEEE,
2018, pp. 885–895.
[7] N. Fenton et. al., A privacy-preserving bayesian
network model for personalised covid19 risk
assessment and contact tracing, medRxiv (2020).
[8] T. Bhattasali, Pandemic analytics to assess risk of
covid-19 outbreak (2020).</p>
      <p>A survey of link prediction
in complex networks, ACM computing surveys
node label;</p>
    </sec>
    <sec id="sec-9">
      <title>Result: Risk Propagation to a specific depth</title>
      <p>under specified time constraint.
Parameters:</p>
      <sec id="sec-9-1">
        <title>E ← set of edges or links in G;</title>
      </sec>
      <sec id="sec-9-2">
        <title>V ← set of vertices or nodes in the graph G;</title>
      </sec>
      <sec id="sec-9-3">
        <title>A ← external inputs like node health (Aℎ) and</title>
        <p>attributes like degree; specified with subscript;
A</p>
        <p>← attribute to store last risk propagation
software ← (.) link to software modules and
Begin:</p>
      </sec>
      <sec id="sec-9-4">
        <title>D ← 0 // current graph traversal depth level;</title>
        <p>v ← V(i) ← node i of set V is infected; hence
v(A ) ← 1 * x * w // infected person with</p>
      </sec>
      <sec id="sec-9-5">
        <title>Q ← queue is created and intitialized with V(i)</title>
        <p>satisfied do
R ← currrent root node at depth level D;
while Q ≠ empty And time constraint =
sudden halts ;
v( A ) ← V(i) label //helps recover from

v ← visited //if node is not visited earlier;
v ← Q.dequeue( ) /* Removing vertex v
from queue whose neighbour will be
visited next */;
v( A ) ← R( A ) * (1 - D / M) //risk
if v has relative Low Centrality Measure
v( A ) ← y * v( A ) //y is a damping

then
end
then
end
v( A ) ← z * v( A ) //z is a damping

if w ≠ visited And w( A ) ≠ v( A )
previous halts;
end</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>Q.enqueue( w ) //Stores w in Q;</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>M. J. M.</surname>
          </string-name>
          <article-title>Chowdhury et</article-title>
          . al.,
          <source>Covid-19 contact tracing: Challenges and future directions</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N. e. a.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <article-title>A survey of covid-19 contact tracing apps</article-title>
          , arXiv preprint arXiv:
          <year>2006</year>
          .
          <volume>10306</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Chandel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghose</surname>
          </string-name>
          ,
          <article-title>Proxitrak: A robust solution to enforce real-time social distanc</article-title>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Martínez</surname>
          </string-name>
          et. al., foreach neighbors W of v ∈ G do /
          <article-title>/checks infected node label matches for end</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>