<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Verification of the Formal Approach to Data Fusion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ildar R. Baimuratov</string-name>
          <email>baimuratov.i@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephan D. Morozov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Natalia A. Zhukova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ITMO University</institution>
          ,
          <addr-line>Saint Petersburg</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present a comprehensive overview of numerous existing data fusion models to show that in substance they are informal. Informal models are hardly applicable in designing real data fusion systems. Therefore, we represent and continue to develop the formal approach to data fusion proposed in our previous works. It includes, first, a formal language based on first order predicate logic and, second, category theoretic model derived from the formal language. We verify our formal approach by formalizing specification of a data fusion system on the logical language and modeling it by means of the category theoretic model.</p>
      </abstract>
      <kwd-group>
        <kwd>Data fusion Logic Category theory</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Today the fields of data analysis and machine learning are flourishing. But this
field is rather a collection of different unlinked methods and techniques than
a comprehensive and consistent system like more mature scientific fields. The
situation is analogous to one before industrial revolution: there were different
handicrafts but inventing mass production integrated separated craftsmen into
extensive manufacturing organization and allowed to dramatically increase
productive efficiency. Therefore, it is crucial to develop an entire theory for data
analysis and machine learning field. In this article we continue an attempt to
develop such formal approach.</p>
      <p>We considered data fusion, being the process of integrating data, as a source
for the required theory and data fusion models as prototypes. We surveyed
different data fusion models but it turned out that, unfortunately, none of them is
formal enough and grounded on any fundamental mathematical theory.
Therefore, we had to develop a formal theory for data fusion, that would be rooted in
fundamental mathematics. We consider logic and category theory as such
mathematical foundation. The overview of existing data fusion models is presented in
the Section 2.</p>
      <p>The first step at developing any formal theory is specifying the language that
will be used for formalizing subject field. Though contemporary data analysis
techniques use different mathematical notions, they hardly could be considered
as an integrated formal context cause of its usage inconsistency. Thus, our first
goal was to specify a formal language that would be able to comprise all existing
techniques. For this purpose, we use the language of first order predicate logic.
Specification of the first order predicate logic is presented in the Section 3.1.
After that, we analyze the structure of logical constructions be means of category
theory in order to derive a formal mathematical model, which intended to be
the formal model of data fusion. The resulting model is presented in the Section
3.2.</p>
      <p>The purpose of the current work is to examine the proposed formal approach
by applying it to description of a real data fusion system. Its application means
formalizing the description on predicate logic language. If the description is
successfully formalized, it implies that the category theoretic model is verified
as the model is derived from the logical language. As result, we achieve a formal
model of the considered data fusion system.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Existing data fusion models overview</title>
      <p>Developing an integrated formal theory of data analysis and machine learning,
we considered data fusion models as prototypes. We present an overview of most
notable data fusion models.
2.1</p>
      <sec id="sec-2-1">
        <title>JDL fusion model</title>
        <p>
          The JDL [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] was one of the first and most widespread data fusion systems,
was created in 1986 by the Joint directors of Laboratories data fusion working
group, as a result of a working at the US Department of Defense, to aid the
developments in military applications. The model consists of 4 levels [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
1 Object refinement: data integration for getting improved representation
of individual objects.
2 Situation refinement: description of relations between objects and
environment.
3 Threat refinement: projecting of current situation in future and predicting
of consequences.
4 Process refinement: meta-process that monitors data- fusion processes to
maintain real-time performance.
2.2
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Dasarathy model</title>
        <p>
          Dasarathy model [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is a Data-Based Model. It is functionality oriented model,
rather than the task it is based on fusion function. There are three levels of
abstraction:
        </p>
        <sec id="sec-2-2-1">
          <title>a) Data</title>
          <p>b) Feature
c) Decision
And following function:
– DAI-DAO (data in data out) is the most basic data fusion method.</p>
          <p>The primary input and output is a raw data. The output is more accurate
and reliable. Data fusion is done immediately after the Data ’ s gathered
from the sensors.
– DAI-FEO (data in feature out). Extract the features and
characteristics of raw data. Feature extraction.
– FEI-FEO (feature in feature out). Set of features will be worked to
improve a feature or extract a new feature.
– FEI-DEO (feature in decision out). Decision can be taken based on
the set of feature as an input. The decision can be Pattern recognition and
pattern processing.
– DEI-DEO (decision in decision out). Obtain a new decisions from
the fused decisions. This is a high level fusion
2.3</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Boyd control model</title>
        <p>
          Boyd control loop is the activity based model. This is the four stage cyclic loop.
This is also called OODA loop, which refers Observe, Orient, Decide, and Act.
Developed by the Military Strategist and U.S States Air Force Colonel John
Boyd in the year 1987 [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. There is four stages:
– Observe. Collect and pre-process all the data from sensor.
– Orient. Collect data is fused to clarify the current scenario.
– Decide. Based on the previous stage, (i.e. Orient) action plans are decided
for the future.
        </p>
        <p>– Act. Where is the plan which is decided in the previous stage is executed.
2.4</p>
      </sec>
      <sec id="sec-2-4">
        <title>Waterfall model</title>
        <p>The waterfall model is an hierarchical architecture which divided in 3 levels,
each one with two modules. The information outputted by one module will be
inputted to the next module:
– Data is gathered from the environment and properly transformed, delivering
not only the data processed but also information about the sensors to the
next level.
– The main features are extracted from the data from the previous module
and fused, thus reducing the quantity of data transmitted and increasing
their information richness.
– According to the processing from the previous levels, a scenario of events is
created and possible routes of action are assembled. The last module
(Decision Making) delivers enough information to the control module to calibrate
and configure the sensors.
2.5</p>
      </sec>
      <sec id="sec-2-5">
        <title>Luo and Kay Model</title>
        <p>Luo and Kay presented a hierarchical model [7], yet different from the waterfall
model. While in the waterfall model all data gathered is processed in a sequential
way for all modules, in the Luo and Kay model data from the sensors are
incrementally added on different fusion centers (multi sensor fusion), thus increasing
the level of representation, from the raw data or signal level to more abstract
symbolic representations of the data at the symbol level.
2.6</p>
      </sec>
      <sec id="sec-2-6">
        <title>Intelligence cycle model</title>
        <p>The intelligence cycle is also called Intelligence process by U.S Department of
Defense and the uniformed services. The intelligence cycle consist from four
stage:
– Collection. Collect the raw data from the environment.
– Collation. All the collected data is analyzed and correlated. Redundant and
unreliable data’s are discarded.
– Evaluation. The collated data’ s are fused and analyzed.
– Dissemination. Produce the decision based on the result of fused data from
the previous stage Evaluation
2.7</p>
      </sec>
      <sec id="sec-2-7">
        <title>Omnibus model</title>
        <p>
          Omnibus model [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is a hybrid model which presents some features from other
model. It comprises a flow chart, a dual-perspective definition and a structured
repository of accumulated expertise, and consist from following levels:
– Observe. This level measures the environment, gathering and processing
data from the sensors, delivering the information to the second level
– Orientate. Here, the data is fused and the main features extracted in order
to reduce the amount of data.
– Decide. The third level concerns with the presentation of the processed data
to the human operator or/and act on the environment.
– Act. The model is in a closed loop with a control module to calibrate the
sensors.
2.8
        </p>
      </sec>
      <sec id="sec-2-8">
        <title>Thomopoulos Model</title>
        <p>
          Thomopoulos [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] also proposed a three level model, formed by the signal,
evidence and dynamics levels. In each level, data gathered is confronted with data
previously processed and stored, preserving a given order.
        </p>
        <p>1 The signal level process that information, performing correlations due to
the in existence of a mathematical model. Therefore, the data gathered is
correlated with information previously stored in the database, in a learning
process.
2 On the evidence level, data is combined at different levels of inference
based on a statistical model and the assessment required by the user (e.g.
decision making or hypothesis testing).
3 On the dynamics level a mathematical model is used to perform the data
fusion.</p>
        <p>As we see, none of these data fusion models has explicit formal definitions of
its elements and none of them has link to any fundamental mathematical theory.
Therefore, first of all we had to develop a systematic formal approach to data
fusion.
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Formal approach to data fusion</title>
      <p>In [8] we proposed the formal framework for data fusion. It consists of the logic
language and the category theoretic model.
3.1</p>
      <sec id="sec-3-1">
        <title>Logic language</title>
        <p>The logic language for formalizing data fusion is in fact classic first order
predicate logic. It consists of
– individual variables: x1; x2; :::;
– individual functions: f1; f2; :::;
– predicate functions: P1; P2; :::;
– propositional variables: A1; A2; :::;
– propositional functions: :; ^; _; !; :::;
– quantifiers: 8; 9.</p>
        <p>Each individual function is n-placed, if n = 0, f is an individual constant.
Similarly, if propositional function is 0-placed, it is the propositional constant ?
or &gt;. Quantifier 8 is considered as generalized conjunction and 9 is considered
as generalized disjunction.</p>
        <p>The notion of object in predicate logic corresponds to the notion of term.
Terms are strings of symbols that are constructed according to the following
inductive definition:
– a variable x is a term,
– if f is an n-placed function and t1; :::; tn are terms, then f (t1; :::; tm) is a
term.</p>
        <p>Facts are denoted by formulas. There are atomic and complex formulas:
– if is an n-placed predicate P and t1; :::; tn are terms then P (t1; :::; tn) is an
atomic formula;
– if x is a variable and ; are formulas then
8x and 9x ;
: ;</p>
        <p>^ ; _ ; ! ;
are complex formulas.
– I(x) = X, where X
– I(f n) = An ! A;
– I(P n) = An ! f0; 1g.</p>
        <p>A;</p>
        <sec id="sec-3-1-1">
          <title>The fact that a formula</title>
          <p>M j= :</p>
          <p>The language of predicate logic is interpreted by models. A model M consists
of a universe A, a nonempty set of objects, and an interpretation function I,
which assigns structures of the universe A to structures of the language:
corresponds to truth is denoted by the notation</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Therefore, if</title>
          <p>is an atomic sentence of the form P (t1; :::; tn),</p>
          <p>I( ) = 1 , M j= .</p>
          <p>M j= P (t1; :::; tn) , I(P (t1; :::; tn)) = 1.
If</p>
          <p>is a complex sentence:
– M j= 8x
– M j= 9x
– M j= :
– M j=
– M j=
– M j=
^
_
!
, for every a 2 X it is true that I( ) = 1;
, exists a 2 X, such that I( ) = 1;
, M 6j= ;
, M j= and M j= ;
, M j= or M j= ;
, M 6j= or M j= .
otherwise I( ) = 0.
3.2</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Category theoretic model</title>
        <p>Finally, we obtain the formal model for data fusion generalizing logical structures
by means of category theory. Analyzing definitions listed above we extract two
classes of objects:
– the universe A;
– truth-values T V = f0; 1g;
and three classes of mappings:
– terms T erm: A ! A;
– predicates P red: A ! T V ;
– propositions P rop: T V ! T V ;
where variables are considered as identity mappings</p>
        <p>idX : X ! X
and given mappings f : X ! Y and g : Y ! Z there is a composition
(f</p>
        <p>g) : X ! Z
for any objects X; Y; Z and mappings f; g. Resulting model is represented by the
following diagram:</p>
        <p>Term</p>
        <p>A</p>
        <p>Pred</p>
        <p>TV</p>
        <p>Prop</p>
        <p>We are going to verify our formal model expressing every logical structure in
category theoretical terms:
– individual variables: idA : A ! A;
– individual functions: A ! A;
– predicate functions: A ! T V ;
– propositional variables: idT V : T V ! T V ;
– propositional functions: T V ! T V ;
– quantifiers: T V ! T V .</p>
        <p>As complex logical structures are defined inductively, any further construction
is expressible in category theoretical terms as well.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Data fusion system formalization</title>
      <p>In order to verify our formal approach, we are going to consider an example of
data fusion system and apply the approach to its specification.
4.1</p>
      <sec id="sec-4-1">
        <title>Specification of the data fusion system</title>
        <p>Consider the following data fusion system [9]: Two data structures are given that
include three attributes (id (number), value(text/number), value(text/number)).
Data comes from a variety of sources, and for their comparison, a simple rule for
allowing entities is used up. Also a simple mapping of entities by coincidence of
values is used. In general, the experimental model consists of:
1. Deduplicated collection is an input collection, like A(id; a; b; c) or B(id; a; b; d)
2. Minimum Union operator is defined as the combination of outer union
with the subsequent removal of subsumed tuple.
3. Outer Union operation results in the union of two relations. If the schemes
do not match, the resulting scheme is the union of the two original. Outer
Union, in fact, is implemented using the FusionIndex index.
4. FusionIndex. For each id, all relevant records are obtained and those that
are absorbed by them are deleted
5. subsumption. The tuple t1 absorbs another tuple t2, if they:
– have the same schemes;
– in t2 there are more unknown (null) values than in t1;
– in t2 all known values coincide with the values in t1.
6. is_subsumed function. Checks whether one tuple is absorbed by a pairwise
comparison of attributes or a null test.
7. removeSubsumed function. Deletes all the captured records from the
tuple.
8. minUnion. It is necessary for construction of resulting tuples at realization
of Minimum Union.
9. Data Fusion operator. The Data Fusion operator is a special kind of
function that uses grouping to overcome conflicts. The basic idea is to group
different representations of the same entity by a common attribute, and
then to apply the conflict resolution functions to all other attributes. There
are two types of strategy for conflict resolution functions:
– The deciding strategy is to select some one value in some way (minimum,
maximum, random value);
– The mediating strategy is the aggregation of all values (mean, sum).
10. Fused collection is the resulting collection.</p>
        <p>For example, let us take two collections: A(id; a; b; c) and B(id; a; b; d).
Attributes a; b; c; d can contain null values, Attribute id is the same. Below is an
example of similar data for the collection</p>
        <p>A(id; name; age)</p>
        <p>fid : 7600; name : null; age : nullg; fid : 1500; name : zvk; age : 938g
B(id; name; inf o)
fid : 1500; name : null; inf o : nullg; fid : 7600; name : pjg; inf o : nullg
Also there are functions of calculating the mean, choosing a random nonzero
value, and concatenating. This merging of input collections according to specified
rules forms the Fused collection, with an average value for the age attribute, any
non-zero value for the name and for the attribute info, all the available values
will be concatenated. For the concerned collections the resulting collection is</p>
        <p>C(id; name; age; inf o)
{id:1500, name:zvk, age:938, info:null}, {id:7600, name:pjg, age:null, info:null}.
4.2</p>
      </sec>
      <sec id="sec-4-2">
        <title>Logical formalization</title>
        <p>Analyzing structure of the data fusion system, we extract following elements:
– Symbols: id; a; b; c;
– Input collections: A(id; a; b), B(id; a; c);
– Operations: M inimumunion and Dataf usion;
– Resulting collection: C(id; a; b; c).</p>
        <p>To formalize these elements we will define to what kind of logical structures they
belong.</p>
        <p>– Symbols, being the simplest elements of the system, are individual constants,
or 0-placed individual functions. Therefore, symbol variables are individual
variables, or identity mappings for individuals.
– Collections are structures that have individuals as arguments. As collections
do not map individuals to other individuals, but only integrate them into
complex structures, they are predicates. For all values of argument id, being
a primary key, predicate corresponding to a collection should be defined,
therefore, the argument id is binded with the quantifier 8. All other
parameters are binded with the quantifier 9.
– Operation M inimumunion(M U ) joins two collections into one, i.e. it maps
collections to collections, therefore, M inimumunion is a propositional
function.
– Operation Dataf usion is a variable for some conflict resolution function for
individuals. These functions are min; max; random; mean; sum. All these
functions map individuals to individuals, therefore, Dataf usion is a variable
for individual functions. And as variables are identity mappings Dataf usion
is individual function as well.</p>
        <p>In result, the described data fusion system can be represented as the following
formula:</p>
        <p>8id((9a9bA(id; a; b) ^ 9a9cB(id; a; c)) !
9a9b9cC(id; mean(a); random(b); conc(c))).
4.3</p>
      </sec>
      <sec id="sec-4-3">
        <title>Category theoretic modeling</title>
        <p>We are going to compare the result of formalization of the described data analysis
system with the proposed category theoretical data fusion model. In result of
formalization we have the following logical structures:
– individual variables id; a; b; c;
– individual functions mean; random; conc;
– predicates A(id; a; b); B(id; a; c); C(id; a; b; c);
– quantifiers 8id; 9a; 9b; 9c;
– propositional function !MU .</p>
        <p>According to the category theoretic classification of logical structures, there are
following mappings:
– terms:
individual variables: id; a; b; c;
individual functions: mean; random; conc;
– predicates: A(id; a; b); B(id; a; c); C(id; a; b; c);
– propositions:
quantifiers: 8id; 9a; 9b; 9c;
operation !MU .</p>
        <p>Summing up, the result of formalization is represented in the Table 1 and
the following diagram:
String variables, being identical mapping, is omitted.</p>
        <p>It is notable, that we have quantifiers, which formalize the data structure,
not mentioned explicitly in the description of the data fusion system. The fact
that our formal system exposed them demonstrates the advantage of our formal
approach.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Developing the formal approach to data fusion, we surveyed most notable data
fusion models and found out that none of them is formal enough. We
represented the formal approach proposed in the previous works, which includes
formal language based on the first order predicate logic and category theoretic
model derived from the formal language. After that, in order to verify the
proposed approach we considered an example of data fusion system, formalized it
with logical language and derived the category theoretic model of the system
from the result of formalization. This fact demonstrates applicability of our
formal approach. Moreover, it turned our that comprehensiveness of our formal
approach allows to expose elements of the system not mentioned explicitly in its
specification.
7. Luo, R. C., Kay, M. G.L.: Multisensor Integration and Fusion for Intelligent
Machines and Systems. Ablex Publishing Corp 1995
8. Baimuratov, I.R., Zhukova, N.A.: A formal framework for Data Fusion. International</p>
      <p>Journal of Applied Mathematics and Informatics 11, 56–64 (2017)
9. А. Е. Вовченко 1 , Л. А. Калиниченко 2 , Д. Ю. Ковалев: МЕТОДЫ
РАЗРЕШЕНИЯ СУЩНОСТЕЙ И СЛИЯНИЯ ДАННЫХ В ETL-ПРОЦЕССЕ
И ИХ РЕАЛИЗАЦИЯ В СРЕДЕ HADOOP. Информатика и ее применение T
8. вып 94 109 (2014)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Steinberg</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bowman</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>White</surname>
          </string-name>
          ,
          <string-name>
            <surname>Jr.</surname>
            ,
            <given-names>F.E.</given-names>
          </string-name>
          :
          <article-title>Revisions to the JDL Data Fusion Model</article-title>
          .
          <source>Proc. 3rd NATO/IRIS Conf</source>
          .,
          <string-name>
            <surname>Quebec</surname>
            <given-names>City</given-names>
          </string-name>
          , (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Steinberg</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bowman</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          :
          <article-title>Revisions to the JDL data fusion model</article-title>
          .
          <source>In: Handbook of Multisensor Data Fusion</source>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>67</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dasarathy</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Decision Fusion. Proc</article-title>
          . IEEE Computer Society Press, (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fadok</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boyd</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Warden</surname>
          </string-name>
          , J.:
          <article-title>Air Power's Quest for Strategic Paralysis, Maxwell Air Force Base AI</article-title>
          . Air University Press, (AD-A291621)
          <article-title>(</article-title>
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bedworth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>O</given-names>
            <surname>Brien</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.:</surname>
          </string-name>
          <article-title>The Omnibus model: a new model of data fusion</article-title>
          .
          <source>DERA Malvern preprint</source>
          ,
          <year>1999</year>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Thomopoulos</surname>
            ,
            <given-names>S. C.</given-names>
          </string-name>
          :
          <article-title>Sensor integration and data fusion</article-title>
          .
          <source>Proc. SPIE 1198</source>
          ,
          <string-name>
            <surname>Sensor Fusion</surname>
            <given-names>II</given-names>
          </string-name>
          :
          <source>Human and Machine Strategies</source>
          ,
          <fpage>178</fpage>
          -
          <lpage>191</lpage>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>