<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Scene-Adaptive Optimization Scheme for Depth Sensor Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Johannes Wetzel</string-name>
          <email>johannes.wetzel@hs-karlsruhe.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samuel Zeitvogel</string-name>
          <email>samuel.zeitvogel@hs-karlsruhe.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Astrid Laubenheimer</string-name>
          <email>astrid.laubenheimer@hs-karlsruhe.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Heizmann</string-name>
          <email>michael.heizmann@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Industrial Information Technology (IIIT), Karlsruhe Institute of Technology (KIT)</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Intelligent Systems Research Group (ISRG), Karlsruhe University of Applied Sciences</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>111</fpage>
      <lpage>116</lpage>
      <abstract>
<p>In this work a scheme for scene-adaptive depth sensor network optimization is presented. We propose to fuse the knowledge inferred by the sensor network into a common world model while at the same time exploiting this knowledge to improve the perception and post processing algorithms themselves. Moreover, we show how our optimization scheme can be applied to the use cases of disparity estimation as well as people detection with multiple depth sensors.</p>
      </abstract>
      <kwd-group>
        <kwd>depth sensor networks</kwd>
        <kwd>context aware</kwd>
        <kwd>knowledge based optimization</kwd>
        <kwd>scene-adaptive</kwd>
        <kwd>optimization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Low-cost commodity depth sensors are an emerging technology and are applied
in a broad range of applications such as people detection and tracking, 3D
reconstruction, or emergency detection in an ambient assisted living context. However,
depth sensor networks as well as modern vision algorithms have many parameters
and require fine-tuned, scene-specific configurations to achieve optimal
performance. Due to strongly varying scenes and changing conditions at run time it
is very challenging to fine-tune these parameters manually in real-world
applications. To overcome the problem of scene-specific manual (re)configuration of
depth sensor networks, we propose a scene-adaptive scheme which exploits the
scene knowledge to improve perception and post processing vision algorithms.
Our objective is not only to tune the given parameters but also to improve the
vision algorithms, such as stereo block matching, detection or tracking by
explicit exploitation of the scene knowledge, e.g. by building scene-specific object
models. Therefore, we fuse the knowledge inferred from the sensor network into a
common world model, representing our current context knowledge. This
knowledge is then fed back to optimize sensor parameters and algorithms to improve
the performance of a sensor network at run time.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        The configuration of video sensor networks in the context of video surveillance
has been widely studied in the literature. In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] a general overview of the
different aspects of sensor network reconfiguration is given. Rinner et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
focus on the aspect of configuration of smart camera networks in the context of
video surveillance. They review the configuration for a specific analysis task and
evaluate different configuration methods. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] a flexible uncertainty model is
presented to reconfigure the sensor network with the objective of optimizing the
detection performance. Fischer et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] give an overview of intelligent
surveillance systems, analyzing the information flow between sensors, world model and
inference algorithms. In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] an overview of visual sensor networks is given.
However, prior work focuses on monocular camera networks and employs
parameter reconfiguration. In contrast, our work deals with depth sensor networks
and proposes a scheme for explicit exploitation of the given scene knowledge.
This includes conventional parameter reconfiguration methods as well as
methods that construct and use sophisticated world models to improve the integrated
algorithms of sensor networks at run time.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Scene-adaptive sensor network optimization</title>
      <p>In this section we present a scheme for scene-adaptive sensor network
optimization. The general information flow in a depth sensor network is depicted in
Fig. 1 and separated into five different abstraction layers.</p>
      <p>[Fig. 1: Information flow in a depth sensor network: per-sensor sensing, sensor data post processing and local 2D and 3D analysis feed into data and knowledge fusion with a common world model, followed by global data analysis and scene and situation analysis.]</p>
      <p>The sensing layer
contains low-level methods related to the raw sensor measurement such as
synchronization, calibration and image acquisition. The sensor data post
processing layer includes depth estimation algorithms (e.g. stereo block matching),
filtering and low-level feature extractors. Local data analysis covers high-level
vision algorithms that take RGB-D data as input, such as
segmentation, recognition, object detection, local 3D object and scene reconstruction, or
tracking of objects. Based on the results of the data and knowledge fusion,
the global data analysis layer includes methods which make use of the fused
information of multiple sensors of the network. Examples are 3D scene
reconstruction, 3D object localization and global tracking. The sensor network infers
information about the scene across abstraction levels. Over time, information is
fused into a common world model which represents the current scene knowledge.
While a world model can be used for, e.g., scene and situation analysis, we use
it to optimize the parameters of each individual sensor online and to support the
data analysis methods, e.g. by gradually building scene-specific object models.</p>
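      <p>The following minimal sketch illustrates this feedback loop; all interfaces (acquire, postprocess, analyze, fuse, reconfigure) are hypothetical placeholders for the layers of Fig. 1, not the API of an existing system.</p>
      <preformat>
# Hedged sketch of one optimization cycle in the proposed scheme.
# Each sensor runs its local pipeline, results are fused into the
# common world model, and the world model is fed back to adapt
# sensor parameters and algorithms at run time.

def optimization_cycle(sensors, world_model):
    for sensor in sensors:
        raw = sensor.acquire()              # sensing layer
        rgbd = sensor.postprocess(raw)      # sensor data post processing
        local = sensor.analyze(rgbd)        # local 2D and 3D analysis
        world_model.fuse(sensor.id, local)  # data and knowledge fusion
    world_model.global_analysis()           # global data analysis
    for sensor in sensors:
        # feedback step: scene knowledge drives reconfiguration
        sensor.reconfigure(world_model.parameters_for(sensor.id))
      </preformat>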
      <sec id="sec-3-1">
        <title>Knowledge representation</title>
        <p>
          The employed knowledge representation within the world model has to be
expressive enough to support both the high-level task of the sensor network and the
optimization of the sensor network itself. The fusion layer might provide sensor data
as well as locally derived high-level knowledge, and the world model therefore might
need to cover everything from low-level data up to high-level information. Taking these
aspects into account, several existing approaches to knowledge representation are qualified
to serve as world model. For most tasks and networks, a world model consisting
of geometric and semantic scene descriptions will be suitable. Geometric scene
knowledge thereby encompasses information about the objects contained in the
scene and their properties. This includes the object class (e.g. humans, furniture,
floor plan), the object location and orientation in a global world coordinate
system, and dynamic properties, e.g. a motion model, shape and material. Examples of such
world models are object-oriented world models [
          <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
          ]. In order to enhance the
quality of the world model, a knowledge base consisting of preprocessed
information or prior knowledge can be used. This includes morphable shape models [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]
for different object classes as well as common recognition, detection and
segmentation models [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] which are applied on image and 3D data, e.g. RGB-D data,
point clouds, voxels or triangulated surfaces [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In terms of semantic knowledge,
Fuzzy Metric Temporal Logic and Situation Graph Trees [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] or ontologies [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]
can be incorporated. The semantic description might be data driven, e.g. Hartz
and Neumann [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] use a scene interpretation system [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and learn ontological
concept descriptions from data.
        </p>
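        <p>As an illustration, a geometric-semantic world model entry could be organized as sketched below; the field names are assumptions chosen for this example, and the structure loosely follows the object-oriented world models of [2, 5].</p>
        <preformat>
from dataclasses import dataclass, field

@dataclass
class WorldObject:
    object_class: str         # e.g. "human", "furniture", "floor"
    pose: tuple               # location and orientation in world coordinates
    shape: list               # e.g. morphable shape model coefficients [3]
    motion: dict = field(default_factory=dict)  # dynamic properties
    material: str = "unknown"

@dataclass
class WorldModel:
    objects: list = field(default_factory=list)    # geometric knowledge
    semantics: dict = field(default_factory=dict)  # e.g. ontology links [10]
        </preformat>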
      </sec>
      <sec id="sec-3-2">
        <title>Optimization possibilities</title>
        <p>
          Depth sensor networks involve multiple algorithms, which leads to a large number
of parameters. In this section we give an overview of parameters and methods
which are suitable for automatic scene-adaptive sensor optimization. We assume
that a suitable knowledge base (see section 3.1) exists and focus on algorithm
and parameter optimization. Following our layered scheme, we categorize the
optimization targets into three major categories, see Fig. 2. Sensing
parameters have a direct impact on the measurement quality. Parts of this category
have already been addressed. Auto exposure has been state of the art in consumer
cameras for decades, but sophisticated scene models [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] can improve the result,
e.g. by taking only the pixel intensities near regions of interest into account.
Sensor data post processing methods vary highly between different depth
sensing technologies. The depth estimation of a stereo sensor can be improved
by setting the minimum and maximum observable disparity based on
geometric scene knowledge (see the sketch at the end of this section). In section 4.1
an approach for the task of scene-adaptive disparity estimation is presented with an
exemplary knowledge representation. Many scene-adaptive local data analysis methods have already been
published. Yang et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] learn global appearance and motion models to improve
multiple target tracking. Maksai et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] propose a context-aware
optimization strategy for multi object tracking. They learn the most likely trajectory
patterns with respect to a given scene layout to reduce incorrect assignments
between detections and tracks. In section 4.2 we show how the task of people detection
can be optimized in a scene-specific fashion.
        </p>
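        <p>To make the disparity-range example above concrete, the following sketch derives the admissible search range from geometric scene knowledge. It relies only on the standard rectified-stereo relation d = f·b/Z; the depth bounds are assumed to come from the world model, and the numeric values are purely illustrative.</p>
        <preformat>
def disparity_range(f_px, baseline_m, z_near_m, z_far_m):
    """Disparity search range for a rectified stereo pair.

    f_px: focal length in pixels, baseline_m: stereo baseline,
    z_near_m / z_far_m: nearest and farthest expected scene depth,
    e.g. taken from the ground plane and object knowledge.
    """
    d_max = f_px * baseline_m / z_near_m  # closest expected surface
    d_min = f_px * baseline_m / z_far_m   # farthest expected surface
    return d_min, d_max

# Illustrative values: f = 700 px, b = 0.12 m, scene depth 1.5 m to 6 m
d_min, d_max = disparity_range(700.0, 0.12, 1.5, 6.0)  # (14.0, 56.0) px
        </preformat>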
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Application</title>
      <p>In this section we demonstrate the applicability of our scheme in two exemplary use cases.</p>
      <sec id="sec-4-1">
        <title>3D model based disparity estimation</title>
        <p>Our knowledge representation contains sensor knowledge in the form of a camera
model and existing camera calibration parameters π, scene geometry using a
ground plane assumption P(h) ⊂ R^3, and a 3D morphable human surface model
parameterized by β. Scene semantics are represented as segmentations of a single
human s_h and the ground plane s_g in the image. Let D_π(u) be a depth image
computed using the estimated disparity values u from the image pair (I_1, I_2).
Classical stereo algorithms estimate the disparity values u by minimizing a cost
function</p>
        <p>E(u) = E_photometric(u; I_1, I_2) + E_reg(u),   (1)</p>
        <p>where E_photometric is the photometric error penalizing intensity deviations in the
local neighborhood given u, and E_reg regularizes the problem by penalizing unlikely
disparity values based on simple scene assumptions. We propose to employ a
scene-adaptive optimization scheme, reformulating (1) as</p>
        <p>E_adaptive(u) = E_photometric(u; I_1, I_2) + E_model(u[s_h], u[s_g]; β, h),   (2)</p>
        <p>where E_model uses our provided scene representation to measure the deviation
of the estimated depth at the segmented pixel locations u[s_h] and u[s_g] from
the explicit geometric scene representation, consisting of the ground plane at
height h and the human shape model parameterized by β. Scene-adaptive
disparity estimation is then performed by estimating û = arg min_u E_adaptive(u).
Eq. (2) can be extended in various ways, which demonstrates the generality of the
proposed approach, e.g. by introducing a human motion model to enforce temporal
consistency constraints. A toy sketch of the adaptive energy follows below.</p>
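        <p>The sketch below evaluates a toy version of Eq. (2) on a single scanline; it is not the full method but illustrates how the model term couples the disparities at the segmented pixels to the geometric scene representation. All inputs (u_model rendered from the ground plane and human shape model, the segmentation mask, the weight lam) are assumptions of this example.</p>
        <preformat>
import numpy as np

def e_adaptive(u, I1, I2, u_model, seg_mask, lam=0.1):
    """Scene-adaptive energy, in the spirit of Eq. (2), on one scanline.

    u: disparity per pixel, I1/I2: intensity rows of the rectified pair,
    u_model: disparity rendered from the geometric scene model
    (ground plane at height h, human shape with parameters beta),
    seg_mask: boolean mask of the segmented pixels s_h and s_g.
    """
    x = np.arange(len(I1), dtype=float)
    xs = np.clip(x - u, 0.0, len(I2) - 1.0)   # matching positions in I2
    I2_warp = np.interp(xs, x, I2)            # sample I2 at x - u
    e_photo = np.sum((I1 - I2_warp) ** 2)     # photometric term
    e_model = lam * np.sum((u[seg_mask] - u_model[seg_mask]) ** 2)
    return e_photo + e_model
        </preformat>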
      </sec>
      <sec id="sec-4-2">
        <title>People detection with multiple depth sensors</title>
        <p>The sensors have a top view of the scene with significantly overlapping fields of
view. Additionally, we assume that the sensors are intrinsically and extrinsically
calibrated in advance and that the common ground plane is known. We model
the presence of a person on the ground floor as a discrete grid of Bernoulli
random variables X = (x_1, ..., x_n), x_i ∈ {0, 1}, where each x_i maps to one specific
ground plane grid location g_i ∈ R^2. Our goal is to infer the likelihood of a scene
configuration X given current depth observations O = (O_1, ..., O_C) from C
depth sensors. Applying Bayes’ theorem and assuming that the prior factorizes
as p(X) = ∏_{i=1}^{n} p(x_i), we get the posterior distribution
p(X|O) ∝ p(O|X) ∏_{i=1}^{n} p(x_i).   (3)</p>
        <p>
For this application we assume that the likelihood p(O|X) is given (see [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]
for details on the construction of the likelihood) and only focus on the
scene-adaptive choice of the prior p(X). We start with an uninformative prior which makes
the detection of people equally likely at every location. In many real-world scenes
this is a crude assumption due to obstacles or preferred walking tracks that
may be present in the scene. Thus, we propose to accumulate the detections
over time to obtain the relative frequencies H = (h_1, ..., h_n) of the presence of
people for every ground plane grid location g_i and to fuse this information into
the world model. This scene-specific knowledge can be used in the feedback step
to continuously update the prior beliefs p(x_i) according to H at regular time
intervals, as sketched below.
        </p>
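        <p>A small sketch of this feedback step could look as follows; the blending factor alpha and the floor eps (which keeps every grid location reachable) are illustrative choices, not values from the paper.</p>
        <preformat>
import numpy as np

def update_priors(prior, H, alpha=0.1, eps=1e-3):
    """Pull the Bernoulli priors p(x_i = 1) towards the relative
    detection frequencies H = (h_1, ..., h_n) accumulated in the
    world model; alpha and eps are illustrative choices."""
    blended = (1.0 - alpha) * prior + alpha * H
    return np.clip(blended, eps, 1.0 - eps)  # avoid degenerate 0/1 priors

n = 100                          # number of ground plane grid cells g_i
prior = np.full(n, 0.5)          # uninformative starting prior
counts = np.zeros(n)             # per-cell detection counts over time
frames = 1
# per frame: counts += current detections; frames += 1
H = counts / frames              # relative frequencies
prior = update_priors(prior, H)  # periodic feedback update
        </preformat>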
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In the present work we have proposed a scheme for scene-adaptive optimization
of depth sensor networks. We have given an analysis of relevant knowledge
representations and categorized the identified optimization targets. Moreover, we have
exemplarily applied our scheme to the use cases of disparity estimation as well
as people detection with multiple depth sensors. Future work will include the
investigation of further use cases as well as proof-of-concept implementations.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ahmed</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saint</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shabayek</surname>
            ,
            <given-names>A.E.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cherenkova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gusev</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aouada</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ottersten</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Deep learning advances on different 3d data representations: A survey</article-title>
          . arXiv preprint arXiv:1808.01462 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bauer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Emter</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vagts</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beyerer</surname>
          </string-name>
          , J.:
          <article-title>Object-oriented world model for surveillance systems</article-title>
          .
          <source>In: Future Security: 4th Security Research Conference</source>
          . pp.
          <fpage>339</fpage>
          -
          <lpage>345</lpage>
          . Fraunhofer Verlag (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Blanz</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vetter</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A morphable model for the synthesis of 3d faces</article-title>
          .
          <source>In: Proceedings of the 26th annual conference on computer graphics and interactive techniques</source>
          . pp.
          <fpage>187</fpage>
          -
          <lpage>194</lpage>
          . ACM Press/Addison-Wesley Publishing Co.
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beyerer</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A top-down-view on intelligent surveillance systems</article-title>
          .
          <source>Proc. of the 7th International Conference on Systems</source>
          ,
          <fpage>43</fpage>
          -
          <lpage>48</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Gheţa, I.,
          <string-name>
            <surname>Heizmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belkin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beyerer</surname>
          </string-name>
          , J.:
          <article-title>World modeling for autonomous systems</article-title>
          . In: Dillmann,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Beyerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Hanebeck</surname>
          </string-name>
          , U.D.,
          <string-name>
            <surname>Schultz</surname>
          </string-name>
          , T. (eds.)
          <source>KI 2010: Advances in Artificial Intelligence</source>
          . pp.
          <fpage>176</fpage>
          -
          <lpage>183</lpage>
          . Springer Berlin Heidelberg (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hartz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Learning a knowledge base of ontological concepts for high-level scene interpretation</article-title>
          .
          <source>In: ICMLA</source>
          . pp.
          <fpage>436</fpage>
          -
          <lpage>443</lpage>
          . IEEE (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hotz</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Scene interpretation as a configuration task</article-title>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kyrkou</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christoforou</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Timotheou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Theocharides</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panayiotou</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polycarpou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Optimizing the detection performance of smart camera networks through a probabilistic image-based model</article-title>
          .
          <source>IEEE Transactions on Circuits and Systems for Video Technology</source>
          <volume>8215</volume>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Maksai</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fleuret</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fua</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Non-markovian globally consistent multi-object tracking</article-title>
          .
          <source>In: IEEE ICCV</source>
          . pp.
          <fpage>2544</fpage>
          -
          <lpage>2554</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Marino</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The more you know: Using knowledge graphs for image classification</article-title>
          .
          <source>arXiv preprint arXiv:1612.04844</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Münch, D., IJsselmuiden, J.,
          <string-name>
            <surname>Arens</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stiefelhagen</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>High-level situation recognition using fuzzy metric temporal logic, case studies in surveillance and smart environments</article-title>
          .
          <source>In: ICCV Workshops</source>
          . pp.
          <fpage>882</fpage>
          -
          <lpage>889</lpage>
          . IEEE (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rinner</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dieber</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Esterle</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>P.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Resource-aware configuration in smart camera networks</article-title>
          .
          <source>IEEE CVPR (1)</source>
          ,
          <fpage>58</fpage>
          -
          <lpage>65</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Sanmiguel</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Micheloni</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shoop</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Foresti</surname>
            ,
            <given-names>G.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cavallaro</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Self-reconfigurable smart camera networks</article-title>
          .
          <source>IEEE Computer 47(5)</source>
          ,
          <fpage>67</fpage>
          -
          <lpage>73</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Soro</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heinzelman</surname>
            ,
            <given-names>W.:</given-names>
          </string-name>
          <article-title>A survey of visual sensor networks</article-title>
          .
          <source>Advances in Multimedia</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Wetzel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeitvogel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laubenheimer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heizmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Towards global people detection and tracking using multiple depth sensors</article-title>
          .
          <source>IEEE ISETC</source>
          , Timisoara
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nevatia</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>An online learned crf model for multi-target tracking</article-title>
          .
          <source>IEEE CVPR</source>
          pp.
          <fpage>2034</fpage>
          -
          <lpage>2041</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vesdapunt</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>S.B.</given-names>
          </string-name>
          :
          <article-title>Personalized attention-aware exposure control using reinforcement learning</article-title>
          . 14(8),
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Z.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>S.t.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Object detection with deep learning: A review</article-title>
          . arXiv preprint arXiv:1807.05511
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>