<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ORADIEX: A Big Data driven smart framework for real-time surveillance and analysis of individual exposure to radioactive pollution</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hadi Fadlallah</string-name>
          <email>Hadi.Fadlullah@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yehia Taher</string-name>
          <email>yehia.taher@uvsq.fr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafiqul Haque</string-name>
          <email>Rafiqul.Haque@intelligencia.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ali Jaber</string-name>
          <email>ali.jaber@ul.edu.lb</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Intelligencia R&amp;D</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lebanese University</institution>
          ,
          <addr-line>Beirut</addr-line>
          ,
          <country country="LB">Lebanon</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Université de Versailles - Paris-Saclay</institution>
          ,
          <addr-line>Versailles</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>52</fpage>
      <lpage>56</lpage>
      <abstract>
        <p>Radiation pollution has always been a critical concern, since it can cause severe damage to humans and to nature. To minimize this damage, governments collect and monitor radiation levels using advanced systems. In recent years, Big Data technologies such as distributed file systems, NoSQL databases and stream processing technologies have been implemented in radiation monitoring systems to improve their ability to handle huge volumes of data arriving from different sources at high speed. As Big Data technologies are frequently improved to handle the fast growth of data, these systems need to be updated periodically to adopt new technologies and to guarantee tighter control over radiation exposure. In this paper, we propose a system called ORADIEX, which is an improvement of our previously published work RaDEn [2]. It is able to (1) read data from sensors and other sources, (2) process data in real time, (3) store raw radiation data as it arrives from the sources, (4) clean data and store it in a time-series database, (5) visualize and monitor data in real time, (6) send an alert when a high radiation level is detected and (7) support advanced data retrieval operations over raw and processed data. In addition, the system was implemented and tested using a real dataset provided by the Lebanese Atomic Energy Commission (LAEC-CNRS).</p>
      </abstract>
      <kwd-group>
        <kwd>Radiation</kwd>
        <kwd>data engineering</kwd>
        <kwd>radiation monitoring</kwd>
        <kwd>real-time processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>I. INTRODUCTION</title>
      <p>
        Preventing and controlling radioactive exposure is still one
of the most critical duties of governments and researchers,
since such exposure has a catastrophic effect on all living beings [1].
Prevention activities can be classified into three main
categories: (1) physical protection, (2) radiation monitoring
and (3) handling exposures.
      </p>
      <p>Radiation monitoring is considered the most
challenging part, since it requires building intelligent systems
that can collect, analyze and visualize data and raise an alert
when an exposure is detected.</p>
      <p>
        Thanks to fast technological growth, radiation data can be
collected from a wider variety of sources, such as small wireless
sensors, mobile phones and smart watches. In addition, data
management technologies are frequently improved to keep up with
the growth of data sources. To handle data coming from multiple
sources in real time, radiation monitoring systems must adopt these
new data technologies and need to be improved periodically.
Several data engineering systems that rely on new data
technologies such as NoSQL databases and distributed file
systems have been proposed in the literature, for example
[3][4][5][6][7]. These solutions have two main limitations:
(1) they cannot handle a huge volume of data in real time, and
(2) fault tolerance and scalability are not always guaranteed.
Earlier, we proposed a radiation data engineering system called
RaDEn [2], which is built using Big Data technologies such as the
Hadoop<sup>1</sup> distributed file system and real-time data
ingestion tools. This system solved the problem of reading and
storing huge volumes of radiation data, but it still has many
limitations: (1) it cannot visualize data from different sources,
(2) no notification system was implemented, (3) historical data
cannot be visualized since it is saved in raw format, (4) historical
alert information is not stored, (5) the real-time graph is very
basic and shows only the last 30 measurements, (6) cleaned and
processed data is visualized without being saved and (7) it does
not have a user-friendly interface.
      </p>
      <p>
        In this paper, we propose a radiation monitoring
system called ORADIEX, in which we improve the old system
RaDEn [2] by (1) adding a distributed time-series NoSQL
database (InfluxDB<sup>2</sup>) to store data after it has been
cleaned and processed, and (2) adopting a powerful real-time
monitoring framework (Grafana<sup>3</sup>) that has a user-friendly
interface and allows drawing real-time graphs from different sources,
designing dashboards, visualizing data already stored,
saving historical information about exposures and sending
email alerts when a radiation exposure is detected.
      </p>
    </sec>
    <sec id="sec-3">
      <p>1 http://hadoop.apache.org; 2 http://influxdata.com; 3 http://grafana.com</p>
      <p>The rest of this paper is organized as follows. In Section 2,
we briefly introduce our solution, ORADIEX. The
data processing is described in Section 3. The development
of ORADIEX is detailed in Section 4. Section 5
demonstrates ORADIEX. We conclude our work in the final
section.</p>
    </sec>
    <sec id="sec-4">
      <title>II. AN OVERVIEW OF ORADIEX</title>
      <p>ORADIEX is a data engineering platform that can
read huge amounts of data from multiple sources
with different formats using a scalable and distributed
message broker, clean and process the collected data, store
it in scalable storage, visualize it in real-time
graphs and raise an alert when a high radiation level is detected.</p>
      <p>ORADIEX stores data in its raw format within a scalable
and fault-tolerant data lake to ensure data governance, and
stores processed and cleansed data in JSON format within a
scalable NoSQL time-series database that allows users to
perform data retrieval operations.</p>
      <p>ORADIEX can handle data captured by sensors in
real time, as well as data from other sources such as
databases or flat files.</p>
      <p>The ORADIEX architecture is composed of 6 layers, as
shown in Figure-1:</p>
      <p>Data Sources: The data sources consist of data generated by
sensors, or data stored within flat files or relational
databases.</p>
      <p>Data Ingestion: This layer consists of a scalable message
broker and other ingestion tools that read the data
generated by the different sources and send it simultaneously
to the data processing and raw data storage layers.</p>
      <p>Raw Data Storage: This layer consists of a scalable data
lake built on top of a distributed file system, where data
is stored as it comes from the sources without editing.
Alongside the data lake, metadata is stored in a metadata
repository that allows users to perform data retrieval
operations.</p>
      <p>Data Processing: This layer relies mainly on a distributed
data processing framework that can process huge
volumes of data. In this layer, data is cleansed and
transformed into JSON format to be stored within the
processed data storage layer.</p>
      <p>Processed Data Storage: This layer consists of a
distributed and scalable data warehouse that can
store time-series data with different attributes.</p>
      <p>Data Visualization: This layer allows users to create
dashboards that visualize data newly inserted into the
processed data storage layer in real time. It also makes it
possible to perform data retrieval operations and to send
email notifications when a radiation exposure is detected.</p>
    </sec>
    <sec id="sec-5">
      <title>III. DATA PROCESSING</title>
      <p>Since data comes from different data sources, the
quality of the incoming data must be assessed and improved. In the
data processing layer, we implemented a simple data
cleansing and quality assurance process using the following
steps:</p>
      <p>1. The measurement date is validated; if it is not a valid
date-time value the data is rejected, otherwise it is converted to
a universal date-time format (yyyy-MM-ddTHH:mm:ss).</p>
      <p>2. The other measurements are validated; any
measurement that cannot be parsed to a numeric value is
removed.</p>
      <p>3. All empty strings are replaced with NULLs.</p>
      <p>4. Data is converted to a standard format (JSON<sup>4</sup>), to be stored
in the processed data storage layer.</p>
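      <p>The four steps above can be sketched as follows. This is a minimal illustration rather than the script used in ORADIEX: the column names are borrowed from Table-1, the accepted input timestamp layouts are assumptions, and step 2 is read as dropping only the measurement that fails to parse, not the whole record.</p>
      <preformat><![CDATA[
```python
import json
from datetime import datetime

# Hypothetical column names, borrowed from the data set structure in Table-1.
TIME_COLUMN = "Measurement_time"
NUMERIC_COLUMNS = ("dose_rate", "Temperature", "Rain_Level")
# Assumed input timestamp layouts; the real feeds may use others.
INPUT_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M:%S", "%Y-%m-%dT%H:%M:%S")


def parse_time(text):
    """Return the timestamp as yyyy-MM-ddTHH:mm:ss, or None if invalid."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%dT%H:%M:%S")
        except ValueError:
            continue
    return None


def clean_record(record):
    """Apply steps 1-4 to one record (a dict of strings); None means rejected."""
    timestamp = parse_time(record.get(TIME_COLUMN) or "")
    if timestamp is None:
        return None  # step 1: invalid date-time, reject the whole record
    cleaned = {TIME_COLUMN: timestamp}
    for key, value in record.items():
        if key == TIME_COLUMN:
            continue
        if isinstance(value, str) and value.strip() == "":
            cleaned[key] = None  # step 3: empty string becomes NULL
        elif key in NUMERIC_COLUMNS:
            try:
                cleaned[key] = float(value)
            except (TypeError, ValueError):
                continue  # step 2: drop the measurement that is not numeric
        else:
            cleaned[key] = value
    return json.dumps(cleaned)  # step 4: serialize to JSON
```
]]></preformat>
      <p>Calling clean_record on a row whose timestamp cannot be parsed returns None, so a caller can filter rejected rows before storing anything.</p>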
    </sec>
    <sec id="sec-6">
      <title>IV. DEVELOPMENT OF ORADIEX</title>
      <p>Each layer of ORADIEX was deployed on a separate
virtual machine, with Ubuntu<sup>5</sup> 16.04 LTS as the
operating system.</p>
      <p>For data ingestion, we used one virtual machine
on which we installed and configured an Apache Kafka<sup>6</sup> broker
to read data from different sources. Alongside the message
broker, we installed an Apache Flume<sup>7</sup> agent to send data to
the data lake directly when it is received by the message
broker. In addition, we installed Apache Sqoop<sup>8</sup> on the
ingestion layer, to let the user import archival
data from relational databases directly into the data lake.</p>
      <p>We chose these technologies since they all
provide high scalability and fault tolerance.</p>
      <p>For the raw data storage, we built a 4-node Hadoop
3.1.0 cluster in which one virtual machine
(node) acts as the master node and the others as worker
(data) nodes. We set the replication factor to 3, so each block of
data written to the cluster is replicated on all 3
data nodes. The Hadoop Distributed File System (HDFS)
provides a high level of scalability and fault tolerance.</p>
      <p>In addition, we installed Apache Hive<sup>9</sup> to store the
metadata, so that data retrieval operations can be performed
over the raw data using an SQL-like language.</p>
      <p>For data processing, we deployed a single-node
Apache Spark cluster on a separate virtual machine. Apache
Spark is a distributed, scalable and fault-tolerant processing
framework that can process data at rest and in
real time.</p>
      <p>4 http://json.org; 5 http://ubuntu.com; 6 http://kafka.apache.org; 7 http://flume.apache.org; 8 http://sqoop.apache.org; 9 http://hive.apache.org</p>
      <p>To implement the processing logic, we wrote a Python<sup>10</sup>
script that uses the PySpark library, the Python API for Spark.
The script listens to the message broker and reads newly
added data, then filters out the bad rows as described in the
data processing section. After the data quality is ensured, the
data is transformed to JSON format to be stored in the
processed data storage layer.</p>
      <p>To store the processed data, we used a time-series
NoSQL database called InfluxDB, which provides high
scalability. The reason for using a time-series database is that
the main key in the data we are working on is the date and
time of measurement.</p>
      <p>The data is stored in JSON format within InfluxDB. Each
JSON value is composed of 4 parts, as shown in Figure-2:</p>
      <p>• Measurement: The name of the table where data is stored</p>
      <p>• Time: The date and time of the measurement</p>
      <p>• Fields: A list of values that can be visualized (rain level,
radiation level, …)</p>
      <p>• Tags: A list of values that can be used to filter data (station
name)</p>
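      <p>As an illustration of these four parts, the sketch below arranges one cleaned record into a measurement, a time, fields and tags. The measurement name "radiation" and the split of columns into fields and tags are assumptions made for the example; the sample values are made up.</p>
      <preformat><![CDATA[
```python
import json

# Assumed split of the Table-1 columns into InfluxDB fields and tags.
FIELD_COLUMNS = ("dose_rate", "Temperature", "Rain_Level")
TAG_COLUMNS = ("Station_Name",)


def to_point(record, measurement="radiation"):
    """Arrange one cleaned record into the four parts described above."""
    return {
        "measurement": measurement,          # table-like container name
        "time": record["Measurement_time"],  # date and time of the reading
        # Fields hold the values to plot; missing (NULL) values are skipped.
        "fields": {k: record[k] for k in FIELD_COLUMNS if record.get(k) is not None},
        # Tags hold the values used for filtering, such as the station name.
        "tags": {k: record[k] for k in TAG_COLUMNS if record.get(k)},
    }


point = to_point({
    "Measurement_time": "2015-08-01T00:10:00",  # made-up sample reading
    "dose_rate": 41.5,
    "Temperature": 29.0,
    "Rain_Level": 0.0,
    "Station_Name": "Beirut",
})
print(json.dumps(point, indent=2))
```
]]></preformat>
      <p>Keeping the station name as a tag rather than a field matches its role as a filter in the description above.</p>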
      <p>For visualization, we installed Grafana, a tool
used for real-time monitoring. It allows designing
dashboards to visualize data and querying the data stored
within InfluxDB. In addition, it allows defining a
radiation level limit for each graph (we can define one for
each station, since the radiation level is affected by
weather and temperature factors, which differ between
locations) and sending an email alert when this limit is reached.
Grafana was installed on the same machine as InfluxDB to
support real-time visualization.</p>
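      <p>The per-station limit can be pictured with the short sketch below. Grafana evaluates such rules itself, so this code is not part of ORADIEX; it only illustrates the rule, assuming that "reached" means greater than or equal to the limit. The limits table and the readings are hypothetical, with 45 taken from the Beirut limit used in the demonstration.</p>
      <preformat><![CDATA[
```python
# Hypothetical per-station limits; 45 is the Beirut value from the demonstration.
STATION_LIMITS = {"Beirut": 45.0}


def limit_reached(station, dose_rate, limits=STATION_LIMITS):
    """Return True when a station has a limit and the reading reaches it."""
    limit = limits.get(station)
    return limit is not None and dose_rate >= limit


def find_alerts(readings, limits=STATION_LIMITS):
    """Keep the (time, station, dose_rate) readings that reach their limit."""
    return [r for r in readings if limit_reached(r[1], r[2], limits)]


readings = [
    ("2015-08-01T00:00:00", "Beirut", 41.5),  # made-up values
    ("2015-08-01T00:10:00", "Beirut", 45.0),
    ("2015-08-01T00:20:00", "Beirut", 52.3),
]
for time, station, dose_rate in find_alerts(readings):
    print(f"ALERT {station} {time}: {dose_rate}")
```
]]></preformat>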
    </sec>
    <sec id="sec-7">
      <title>V. DEMONSTRATION OF ORADIEX</title>
      <p>In this section, we demonstrate ORADIEX using a
radiation dataset supplied by the
Department of Environmental Radiation Control at the
Lebanese Atomic Energy Commission (LAEC-CNRS).</p>
      <p>10 http://python.org</p>
      <sec id="sec-7-1">
        <title>A. Dataset</title>
        <p>We could not access the sensors or the web server (relational
database) directly, for confidentiality reasons. The
dataset was provided as flat files with data collected from
2015-08-01 to 2016-08-01 by a test sensor that was
installed in Beirut. The data set structure is described in
Table-1.</p>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption>
            <p>Data set structure</p>
          </caption>
          <table>
            <thead>
              <tr><th>Column name</th><th>Data type</th><th>Unit</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>Measurement_time</td><td>Datetime</td><td></td><td>Measurement date and time</td></tr>
              <tr><td>dose_rate</td><td>Numeric</td><td>nSv/h</td><td>The radiation dose rate</td></tr>
              <tr><td>Temperature</td><td>Numeric</td><td>C</td><td>Temperature</td></tr>
              <tr><td>Rain_Level</td><td>Numeric</td><td>mm/h</td><td>The rain level</td></tr>
              <tr><td>Sensor_battery_power</td><td>Numeric</td><td>mV</td><td>The sensor internal battery power</td></tr>
              <tr><td>External_battery_power</td><td>Numeric</td><td>mV</td><td>The sensor external battery power</td></tr>
              <tr><td>Station_Name</td><td>Text</td><td></td><td>The sensor station name</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-10">
      <sec id="sec-10-1">
        <sec id="sec-10-1-1">
          <title>B. Starting and Configuring ORADIEX</title>
          <p>First, we started all virtual machines (Ingestion,
Processing, Raw Storage, Processed data storage). We then
started the following services:</p>
          <p>• Apache Kafka and Apache Flume services on the ingestion
machine.</p>
          <p>• The Hadoop cluster (name node and data nodes) on the raw
data storage machines.</p>
          <p>• The Apache Spark cluster on the data processing machine.</p>
          <p>• The Python script on the data processing machine.</p>
          <p>• InfluxDB and Grafana services on the monitoring
machine.</p>
          <p>To simulate data ingestion from sensors, we created a
directory into which the provided data set is copied, and a
terminal script that listens on this directory.
When a file is added, the script loops over its lines and
sends them one by one to the Apache Kafka producer.</p>
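          <p>The replay logic can be sketched as below. The original is a terminal script; this is a Python rendering of the same idea, with polling in place of a file-system listener. The send argument stands in for the Kafka producer and is deliberately left abstract, and the file name and sample lines are made up.</p>
          <preformat><![CDATA[
```python
import tempfile
from pathlib import Path


def replay_file(path, send):
    """Send every non-empty line of the file at `path` through `send`.

    `send` is any callable that accepts one string; in a real deployment it
    would wrap a Kafka producer, which is left out here on purpose.
    """
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                send(line)
                count += 1
    return count


def replay_new_files(directory, seen, send):
    """Replay files in `directory` whose names are not yet in the `seen` set."""
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.name not in seen:
            replay_file(path, send)
            seen.add(path.name)


# Demonstration with a temporary directory and a list acting as the producer.
with tempfile.TemporaryDirectory() as folder:
    sample = "2015-08-01 00:00:00,41.5\n2015-08-01 00:10:00,42.0\n"
    Path(folder, "beirut.csv").write_text(sample, encoding="utf-8")
    sent, seen = [], set()
    replay_new_files(folder, seen, sent.append)
    replay_new_files(folder, seen, sent.append)  # second pass finds nothing new
print(len(sent))
```
]]></preformat>
          <p>Calling replay_new_files in a loop with a short pause gives a simple directory listener; tracking seen file names keeps a file from being replayed twice.</p>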
          <p>Using Grafana, we created a dashboard containing one
graph that visualizes the radiation dose rate, the rain level
and the temperature received from the Beirut station only,
and we set the radiation limit to 45, as shown in Figure-3.
Moreover, we configured the email notification
settings, which allow adding several recipients and writing a
custom message, as shown in Figure-4.</p>
        </sec>
        <sec id="sec-10-1-2">
          <title>C. Radiation Monitoring</title>
          <p>After starting and configuring ORADIEX, we copied the
data set to the ingestion directory. The data was visualized
in real time on the dashboard we created (Figure-5).</p>
          <p>In addition, a notification email was received when
the radiation limit was exceeded; at the same time, every alert was
recorded in the dashboard alert list, as shown in Figure-6.</p>
        </sec>
        <sec id="sec-10-1-3">
          <title>D. Retrieving Data</title>
          <p>As shown in Figure-7, we can perform data retrieval
operations on the InfluxDB database using the Grafana
interface, and the result is visualized as a graph.</p>
        </sec>
        <sec id="sec-10-1-4">
          <title>VI. CONCLUSION</title>
          <p>In this paper, we designed a solution called ORADIEX,
which is an improved version of our previous work RaDEn
[2]. In this version, we added a NoSQL database that stores
processed data as a time series, and we replaced the old
visualization tool (the Matplotlib Python library) with a powerful
real-time monitoring tool called Grafana, which has a
user-friendly interface and supports real-time monitoring, data
retrieval and sending notifications when a radiation exposure
occurs.</p>
          <p>
            We tried to address all the limitations we identified in
RaDEn at the beginning of our work; unfortunately, our
implementation still has some limitations, for the following
reasons: (1) we did not get permission to
access the sensors or the databases, and (2) the research time
was limited.
          </p>
          <p>Several tasks are planned as future work. We can
enrich the data by integrating the free weather data offered
by online APIs. We can also benefit from search engines
such as Solr<sup>11</sup> and Elastic Search<sup>12</sup> to perform data retrieval
operations on raw data.</p>
          <p>11 http://lucene.apache.org/solr; 12 http://elastic.io</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <article-title>"Ionising Radiation and Human Health,"</article-title>
          <source>Australian Government - Department of Health</source>
          , 07 December
          <year>2012</year>
          . [Online]. Available: http://www.health.gov.au/internet/publications/publishing.nsf/Content/ohp-radiological-toc~ohp-radiological-05-ionising. [Accessed 17 September 2018].
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Fadlallah</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taher</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaber</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>"RaDEn: A Scalable and Efficient Radiation Data Engineering"</article-title>
          ,
          <source>in International conference of Big Data and Cyber Security Intelligence</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Avram</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Folea</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>Dan Radu</string-name>
          &amp;
          <string-name>
            <surname>Astilean</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>"WIRELESS RADIATION MONITORING SYSTEM,"</article-title>
          <source>in European Conference on Modelling and Simulation</source>
          , Romania.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davidson</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>T. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamilton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jarrell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Joubert</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <article-title>"High performance radiation transport simulations: preparing for Titan,"</article-title>
          <source>in International Conference on High Performance Computing, Networking, Storage and Analysis</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Jeong</surname>
            ,
            <given-names>M. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sullivan</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <article-title>"Complex radiation sensor network analysis with big data analytics,"</article-title>
          <source>in Nuclear Science Symposium and Medical Imaging Conference</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>T. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>C. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chou</surname>
            ,
            <given-names>C. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hwang</surname>
            ,
            <given-names>C. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>Y. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsai</surname>
            ,
            <given-names>D. P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>T. Y.</given-names>
          </string-name>
          ,
          <article-title>"Simplified algorithm of ionizing radiation detecting based on image sensor,"</article-title>
          <source>in Instrumentation and Measurement Technology Conference Proceedings</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>K. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kojima</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suzuki</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naito</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ogawa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <article-title>"RALFIE: a life-logging system for reducing potential radiation exposures,"</article-title>
          <source>in the 1st ACM SIGSPATIAL International Workshop on the Use of GIS in Emergency Management</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>