<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Hallway Tracker: Hospital Contact Tracing During the COVID-19 Pandemic</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christian Marinoni</string-name>
          <email>christian.marinoni@uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valerio Ponzi</string-name>
          <email>ponzi@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danilo Comminiello</string-name>
          <email>danilo.comminiello@uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer, Control and Management Engineering, Sapienza University of Rome</institution>
          ,
          <addr-line>Via Ariosto 25, Roma, 00185</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dpt. of Information Engineering, Electronics and Telecommunications, Sapienza University of Rome</institution>
          ,
          <addr-line>Via Eudossiana 18, Roma, 00184</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for Systems Analysis and Computer Science, Italian National Research Council</institution>
          ,
          <addr-line>Via dei Taurini 19, Roma, 00185</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>51</fpage>
      <lpage>61</lpage>
      <abstract>
        <p>During the COVID-19 pandemic, the use of a people tracking system could have been crucial, particularly in sensitive environments such as hospitals. DPPL Hallway Tracker is a framework that uses security camera footage to determine which rooms in a corridor a person has entered. It generates a database containing all the people identified and allows quick identification of potential cases of infection based on the time spent in a room and its maximum capacity. DPPL Hallway Tracker is structured in two phases: detection and re-identification. In the first phase, it exploits Mask R-CNN to identify people and room doors. In the second one, it uses the deep association metric model from DeepSORT to re-identify a person as they leave a room.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Managing a pandemic has proved to be a difficult challenge despite the technological developments of the past decades. Containment measures based on restrictions on personal mobility (such as lockdowns) have proved to be very effective for infection containment [<xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>]. However, these turn out to be short-term solutions that are not extendable throughout the whole virus’s life cycle.</p>
      <sec id="sec-2-1">
        <p>As with Covid-19, the presence of a potentially infected individual in a closed environment is a central problem, and the risk of contagion increases with exposure time. Face masks, in combination with good room ventilation, help to reduce the risk of transmission. However, this is not sufficient to eliminate all the risks. Tracking operations are required to ensure the identification of the chain of contagion. Tracking turns out to be even more essential in public settings, such as public offices and hospitals [<xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>].</p>
        <p>Some countries, such as Italy and Germany, used specific tracking apps (respectively, Immuni and CoronaWarn-App) for a Bluetooth-based contact estimation [<xref ref-type="bibr" rid="ref6">6</xref>]. These solutions, although potentially effective, have shown evident limitations, such as low diffusion in the population, constraints on the version of the smartphone OS, poor estimation of distances and related false positives. While they may be effective in the short term, since they are employable on a big scale, other solutions prove [...] contacts and the estimation of the relative risk of contagion, [...] the level of saturation of the room given its maximum capacity.</p>
        <p>DPPL Hallway Tracker [...] people’s movements, exploiting the characterization of [...] features to determine a distribution of the positions [...]. People are detected and segmented using Mask R-CNN; then, their mask is passed to a Re-ID network to obtain an identifier (an array) that “describes” the way they appear in the scene. The descriptors are finally compared with those of the people already known to verify the person’s identity. Another contribution, in addition to the general approach, [...] networks, built from scratch or starting from existing ones.</p>
        <p>DPPL Hallway Tracker appears to be very effective in tracking people entering and leaving rooms facing a corridor. The use of appearance features turns out to be sufficiently robust to allow correct identification, even if it is less effective in recognizing people who reappear in the corridor without leaving a room.</p>
        <p>This report describes the project’s workflow, from the description of the datasets to the analysis of the results.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2. Related works</title>
        <p>[...] the states of multiple people entering the same room collapse at the same value, thus providing no valuable information for the ID attribution when a person leaves the room. On the contrary, the use of a re-identification network based on appearance features in DeepSORT is functional for the current application and is therefore also implemented in this project.</p>
        <p>In today’s literature, to the best of our knowledge, there are no studies aimed at analyzing the specific context of tracking and re-identifying people who enter and leave rooms. Pedestrians on streets or people moving around indoors are usually the focus of most approaches. Other works specialize in counting people in some particular environments. For example, Rabaud and Belongie [<xref ref-type="bibr" rid="ref13">13</xref>] investigate the possibility of counting people passing through crowded environments; [<xref ref-type="bibr" rid="ref14">14</xref>], [<xref ref-type="bibr" rid="ref15">15</xref>], [<xref ref-type="bibr" rid="ref16">16</xref>] focus on counting passengers getting in/out of a bus and [<xref ref-type="bibr" rid="ref17">17</xref>] of a metropolitan train; [<xref ref-type="bibr" rid="ref18">18</xref>] counts people walking through a corridor or a door, without taking their identities into account.</p>
        <p>The absence of a similar application makes the comparison of the implementation proposed in this project with a baseline more complex. Therefore, in the following Sections, the individual modules that constitute it are compared with corresponding existing solutions, in an attempt to offer an objective yardstick on the choices made.</p>
      </sec>
      <sec id="sec-2-4">
        <title>3.1.1. Door detection</title>
        <p>[...] More specifically, the framework employs Mask R-CNN [<xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>], which derives from Faster R-CNN [<xref ref-type="bibr" rid="ref11">11, 21</xref>] (in turn, one of the evolutions of the original R-CNN [22]) but adds a third parallel head used to generate the masks. It also introduces further improvements, like the support for pixel-to-pixel alignment between network inputs and outputs (ROI-Align). Figure 1 shows the different stages that characterize the network.</p>
        <p>Initially, the image is passed as input to a convolution-based Feature Pyramid Network [23], which has the task of extracting meaningful information from differently-sized feature maps. An object can appear in the foreground (and therefore very large in the image) or further away from the camera; hence, this pyramidal structure facilitates its detection. The features thus extracted are passed to the Region Proposal Network (RPN), which produces several Regions Of Interest (ROI), each with its bounding box. At this point, the first-mentioned ROI-Align is applied and its result is passed to the second stage of the network, in which a series of fully connected layers allows refining the position of the bounding box, the class of the object it contains and its mask.</p>
        <p>Figure 1: General scheme of the Mask R-CNN framework. The layers indicated with the letters C and P are the convolutional layers that represent the backbone network. The classic pyramid architecture improves the detection of objects of various sizes.</p>
        <p>To provide door detection, Mask R-CNN [19] was fine-tuned with a dedicated dataset, assembled for the purpose. It includes a selection of 2773 out of 3000 RGB images of the DeepDoors2 dataset [24], which is freely available online. These images represent one or multiple doors in different outdoor and indoor scenarios, which do not necessarily correspond to a corridor: in fact, the large majority of them represent doors from the front. They also include obstacles that partially occlude part of the doors.</p>
        <p>The annotations in the DeepDoors2 dataset are provided as additional images, each with a black background and differently coloured masks for the doors. Being interested in this project more in the portion of space occupied by the door than in the profile of the door itself, all the images are re-masked to segment exclusively the door casing. Hence, almost all images have quadrilateral-shaped masks (thus with four vertices only). Moreover, the generated annotation files are no longer encoded as images like in the original DeepDoors2 dataset; instead, they are fully compatible with the COCO dataset specifications [25]. In fact, the annotation files are JSON files containing: (1) references to all images, each having a unique ID, as shown in the first row of Table 1; (2) a mask and bounding box (bbox) associated to each image (second row of Table 1).</p>
        <preformat>
{"images": [
    {"id": 514, "width": 1080,
     "height": 1920, "file_name": "frame.jpg"},
    ...
]}

{"annotations": [
    {"id": 519, "iscrowd": 0,
     "image_id": 514, "category_id": 1,
     "segmentation": [[587.52,...,1097.77]],
     "bbox": [...], "area": ...},
    ...
]}
        </preformat>
        <p>Table 1: An example of the formatting of the JSON files containing image annotations according to the COCO specifications. The first row shows the data structure used to list all the images in the dataset; the second row shows the one used to specify the annotations associated with each image, thus including the mask (“segmentation”) and the bounding box (“bbox”). The “category_id” field is always set to 1, as there is only one category (door or person, depending on the dataset).</p>
        <p>Moreover, assuming the camera to be static and, therefore, the position of the doors to be fixed over time, this project exploits two distinct models: one for door detection only and the other for people detection. Door detection is applied just in the starting phase of the framework while, from then on, people detection is performed. The process of generating the two models and the related results is analyzed below.</p>
      </sec>
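      <p>As a sketch of how such COCO-style annotation files can be consumed, the snippet below links an annotation record to its image through the “image_id” field; the IDs and coordinates are illustrative placeholders, not values from the actual Dppl dataset.</p>

```python
import json

# Illustrative COCO-style documents, mirroring the structure of Table 1.
# The concrete IDs and coordinates are placeholders.
images_doc = json.loads("""
{"images": [
    {"id": 514, "width": 1080, "height": 1920, "file_name": "frame.jpg"}
]}
""")
annotations_doc = json.loads("""
{"annotations": [
    {"id": 519, "iscrowd": 0, "image_id": 514, "category_id": 1,
     "segmentation": [[587.52, 1097.77]],
     "bbox": [407.29, 5.90, 295.90, 809.02]}
]}
""")

def annotations_for_image(images, annotations, file_name):
    """Return the annotation records attached to a given image file."""
    # Map file names to image IDs, then filter annotations by "image_id".
    image_id = {img["file_name"]: img["id"] for img in images}[file_name]
    return [a for a in annotations if a["image_id"] == image_id]

anns = annotations_for_image(images_doc["images"],
                             annotations_doc["annotations"], "frame.jpg")
```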
      <sec id="sec-2-5">
        <p>The dataset is split into training, validation and test sets. These subsets are disjoint; the training set contains 70% (1941) of the images, while the remaining 30% is equally divided between the validation and test sets (416 each).</p>
        <p>With the new dataset available, called Dppl, we fine-tuned the model pre-trained on the COCO dataset, which is available in the framework’s GitHub repository. Consequently, ResNet101 was used as the backbone, and training was done in the same manner as by the framework’s authors. In particular, we trained the head only for the first ten epochs; for the following thirty epochs, we fine-tuned stages four and above of the backbone too; finally, in the last ten epochs, we extended the training to the entire network. Unlike [<xref ref-type="bibr" rid="ref19">19</xref>], the learning rate is initially set to 0.001 (rather than 0.02) to keep the weights from exploding; moreover, it is divided by a factor of 10 during phases two and three of the training. The other parameters are left unchanged, such as the weight decay of 0.0001 and the momentum of 0.9. Finally, mini-masks were used (i.e. the masks were resized to 56x56 px) to lessen the risk of memory problems. Data augmentation (horizontal flipping) was also applied. Figure 2 shows the training and validation losses obtained during training.</p>
        <p>On the test set, the AP metric was used to assess the quality of the results produced by the training. AP, the acronym for Average Precision, computes the average precision value for recall values over 0 to 1. In practice, AP is computed as the mean of the precision values at a set of n equally spaced recall levels, as defined by the following formula: AP = (1/n) ∑_{i=1..n} p_interp(r_i), where, given p(·) the precision, p_interp(r) = max_{r̃ ≥ r} p(r̃), and n = 101 in COCO. AP@k stands for the average precision for an IoU (Intersection over Union, i.e. how much the predicted mask overlaps with the ground truth) of k. More specifically, in the computation of AP@k, an estimated mask is considered to be True if its IoU is greater than or equal to k, and False otherwise.</p>
        <p>The primary challenge metric for the COCO dataset is AP@[.50:.05:.95] (usually referred to simply as AP), which is the average AP for IoU thresholds from 0.5 to 0.95 with a step size of 0.05. This metric is also used to evaluate the results on our test set. In particular, with the Dppl dataset and the training procedure described above, we got an AP of 85.7 and an AP@.75 of 95.8. We also report the Average Accuracy, which is calculated by counting how many pixels out of those belonging to a specific area are correctly classified. In this case, rather than the whole image, the considered area is the smallest rectangular portion of the image that contains both the ground-truth mask and the one produced by the model. In numerical terms, we obtained an Average Accuracy of 95.34% in the case of Door Detection.</p>
        <p>Figure 3 displays the situation in a corridor not included in the dataset: the door on the right, which is particularly “thinned” by the perspective, is indeed not detected. Precisely for this reason, the framework provides a specific graphical interface that allows adding new door positions, as shown in Section 4.3.</p>
      </sec>
      <sec id="sec-2-6">
        <title>3.1.2. People detection</title>
        <p>Similarly to what was done with the doors, a model for people detection is also generated. Mask R-CNN with the weights of COCO is already able, on its own, to detect and segment people with acceptable accuracy. However, fine-tuning was done using a dedicated dataset built specifically for the occasion from videos captured along a hallway. More in detail, the dataset contains 793 frames captured in a corridor by a 1080x1920 px resolution camera that was positioned a few centimeters from the ceiling (approximately 2.9 meters from the floor) with a vertical image layout. In the scene, six people appear walking down the hallway and entering/exiting the adjoining rooms. They wear various types of clothing (including a white coat to simulate the presence of a doctor); they are of different ages and all wear face masks. One of the people has a foot cast and crutches. All frames are hand-annotated to generate high-quality masks, accurately respecting the person’s shape. The related annotation files follow the COCO specifications, as described before.</p>
        <p>The split of the files between the training (555 images), validation (119) and test (119) sets follows the same proportions as the Dppl dataset.</p>
        <p>With this second dataset available, called dPPL, we once again fine-tuned the model pre-trained on the COCO dataset. All the Mask R-CNN parameters are kept the same, but Gamma Contrast is used as a data augmentation technique in conjunction with horizontal flipping in this case.</p>
        <p>Figure 4 shows the graph of the training and validation losses. As for the performance on the test set, Table 2 shows the comparative Average Precision values between the use of a model trained only on COCO and that obtained by fine-tuning on the dPPL dataset. This second option provides better results for both AP and AP@.75. The same applies to the Average Accuracy.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th>Method</th><th>AP</th><th>AP@.75</th><th>Acc.</th></tr>
            </thead>
            <tbody>
              <tr><td>COCO only</td><td>70.5</td><td>92.9</td><td>99.08%</td></tr>
              <tr><td>COCO+fine-tuning on dPPL</td><td>76.3</td><td>95.5</td><td>99.74%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>These good results should be evaluated considering the not very high number of images that compose the dataset. Indeed, environments with completely different illumination and compositions will certainly attenuate the good performances provided by this model.</p>
        <sec id="sec-2-6-1">
          <title>3.2. People Re-identification</title>
        </sec>
      </sec>
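      <p>The AP computation described above can be sketched as follows; this is a minimal illustration of interpolated precision averaged over n = 101 equally spaced recall levels, as in COCO, not the actual COCO evaluation code. The precision/recall points used in the example are illustrative.</p>

```python
def interpolated_precision(recalls, precisions, r):
    """p_interp(r): best precision achieved at any recall at or above r."""
    candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
    return max(candidates) if candidates else 0.0

def average_precision(recalls, precisions, n=101):
    """AP = (1/n) * sum over the n equally spaced recall levels of p_interp(r_i)."""
    levels = [i / (n - 1) for i in range(n)]
    return sum(interpolated_precision(recalls, precisions, r)
               for r in levels) / n

# A detector holding precision 1.0 up to recall 0.5 and then dropping to 0
# scores 51/101 here, since 51 of the 101 recall levels lie at or below 0.5.
ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.0])
```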
      <sec id="sec-2-7">
        <p>The detection of doors and people in the scene does not suffice to ensure accurate tracking. As mentioned above, one can use additional information extracted from the images within more or less complex systems, which may exploit appearance, movement and shape features. An example is DeepSORT [<xref ref-type="bibr" rid="ref7">7</xref>], which uses the Kalman filter to predict the position of a person in the next frame and integrates appearance information based on a deep appearance descriptor. Despite DeepSORT being a powerful tool, the use of the Kalman Filter turns out to be less effective when the subject disappears from the camera view for long periods. Indeed, the Kalman Filter models the state estimate of the system (in this case, the position of a subject in the frame) as a Gaussian distribution whose variance strictly depends on the observations over time. When a person disappears from the scene, the degree of uncertainty increases and so does the distribution variance. Furthermore, the Kalman Filter would be practically useless if several people entered the same room: the states of those subjects would collapse into the same value, making this information useless for distinguishing a person from the others when they leave the room. Nevertheless, the solution adopted in DeepSORT regarding the use of appearance features turns out to be quite effective whenever the Kalman Filter is not, since it relies on visual cues. For this reason, DPPL Tracker is primarily based on appearance features, though it also takes advantage of some assumptions related to the work environment (a corridor).</p>
        <p>In this project, Deep Cosine Metric Learning [26], the same used in DeepSORT for appearance re-identification, is used. It applies a variation of the Softmax classifier called the Cosine Softmax Classifier, which allows obtaining a different representation space in which compact clusters are formed based on the appearance features. This is achieved by first applying ℓ2 normalization, which uses the ℓ2-norm to normalize the input values so that, if squared and summed, they would result in the value 1, and, secondly, by normalizing the weights. Finally, the cosine softmax classifier is applied, which is defined as follows: p(y = k | r) = exp(κ · w̃_kᵀ r) / ∑_{n=1..C} exp(κ · w̃_nᵀ r), where κ is a free scaling parameter. [...] 1 × 10⁻⁸; moreover, the input images are scaled to 128x64 px.</p>
      </sec>
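      <p>A minimal sketch of the cosine softmax classifier defined above: both the input descriptor r and the class weights are ℓ2-normalized before the scaled softmax is applied. The weight vectors and the value of κ used here are illustrative.</p>

```python
import math

def l2_normalize(v):
    """Scale v so that the sum of its squared entries equals 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_softmax(r, weights, kappa=10.0):
    """p(y = k | r) = exp(kappa * w_k . r) / sum_n exp(kappa * w_n . r),
    with both r and every w_k L2-normalized."""
    r = l2_normalize(r)
    logits = [kappa * sum(wi * ri for wi, ri in zip(l2_normalize(w), r))
              for w in weights]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A descriptor aligned with the first class weight gets almost all the mass.
probs = cosine_softmax([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```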
      <sec id="sec-2-8">
        <p>The use of the masked MARS dataset proves to be beneficial for the network training, since it provides improved results according to the CMC Rank@K and mAP metrics<sup>1</sup>, as shown in Table 4. The table also shows the results of two state-of-the-art solutions on the original MARS dataset. Both largely outperform the solution proposed in this project; however, they also use much more sophisticated methods or networks with many more parameters.</p>
        <p>[...] distractors to make it more realistic. The goal of the Re-[...] dataset after applying object instance segmentation.</p>
      </sec>
      <sec id="sec-2-9">
        <table-wrap id="tab4">
          <label>Table 4</label>
          <table>
            <thead>
              <tr><th>Method</th><th>Rank1</th><th>Rank5</th></tr>
            </thead>
            <tbody>
              <tr><td>DCML on MARS</td><td>...</td><td>...</td></tr>
              <tr><td>DCML on masked MARS</td><td>...</td><td>...</td></tr>
              <tr><td>B-BOT + Attention &amp; CL loss</td><td>...</td><td>...</td></tr>
              <tr><td>MGH</td><td>...</td><td>...</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p><sup>1</sup> Computed through the MARS evaluation tool, available at [...].</p>
      </sec>
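      <p>For reference, the CMC Rank@K metric mentioned above can be sketched as follows: for each query, the gallery identities are ranked by similarity, and Rank@K is the fraction of queries whose true identity appears among the top K. The query/gallery data below are illustrative and do not come from MARS.</p>

```python
def cmc_rank_at_k(ranked_gallery_ids, true_ids, k):
    """ranked_gallery_ids[i] lists the gallery identities proposed for
    query i, best match first; true_ids[i] is the ground-truth identity."""
    hits = sum(1 for ranking, true_id in zip(ranked_gallery_ids, true_ids)
               if true_id in ranking[:k])
    return hits / len(true_ids)

# Two queries: the first is matched at rank 1, the second only at rank 3.
rankings = [["a", "b", "c"], ["b", "c", "a"]]
truth = ["a", "a"]
rank1 = cmc_rank_at_k(rankings, truth, 1)
rank5 = cmc_rank_at_k(rankings, truth, 5)
```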
    </sec>
    <sec id="sec-3">
      <title>4. DPPL Tracker framework</title>
      <sec id="sec-3-1">
        <p>People tracking is offered through a specific framework that employs Mask R-CNN and the above-mentioned re-identification network. It also provides additional features to improve the user experience and optimize the search for people. More precisely, the workflow is the following: the first frame is first passed as input to Mask R-CNN for door detection. Once the doors are located, that frame and the following ones are passed to the same network (with different weights) for people detection. The portion of the image containing each person is then multiplied by the corresponding mask (to obtain a black background) and, after being resized to 128x64 px, is passed to the re-identification network. The latter has its head cut off so that it outputs an array of size 128 (generated by the last Dense layer). This array is a descriptor of the person’s appearance and is used by the framework’s main algorithm to associate a unique identity ID with each person.</p>
        <sec id="sec-3-1-1">
          <title>4.1. Main algorithm</title>
        </sec>
      </sec>
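      <p>The masking step described above can be sketched as follows; plain nested lists stand in for image arrays here, whereas the actual framework operates on full RGB frames.</p>

```python
def apply_mask(crop, mask):
    """Zero out every pixel of the person crop where the binary mask is 0,
    so the background becomes black before re-identification."""
    return [[pixel * m for pixel, m in zip(row, mask_row)]
            for row, mask_row in zip(crop, mask)]

# A 2x2 toy "crop" and its binary mask; only masked-in pixels survive.
crop = [[10, 20],
        [30, 40]]
mask = [[1, 0],
        [0, 1]]
masked = apply_mask(crop, mask)
```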
      <sec id="sec-3-2">
        <title>Algorithm 1: Main algorithm</title>
        <preformat>
Data: video frames, door positions
Result: People identified
 1  known_people ← [] ;
 2  for frame in video_frames do
 3      boxes, masks ← detect_people(frame) ;
 4      for each detected person:
 5          person_img ← frame[box[0]:box[2], box[1]:box[3]] * mask ;
 6          identifier ← get_person_identifier(person_img) ;
 7          pID, sim ← find_nearest(identifier) ;
 8          if pID == -1 then
 9              // New person appeared
10          else
11              // Person in the corridor or exited from a room
12          end
13      known_people ← people detected in the current frame ;
14  end
15  for person in people_known_from_past_frames() do
16      if person not in known_people then
17          if person close to a room then
18              // Person entered a room
19          else
20              // Person disappeared from the scene (maybe due to an occlusion)
21          end
22      end
23  end
        </preformat>
        <p>After selecting the video, the first frame is analyzed through Mask R-CNN to locate the doors in the scene. If one or more doors are not detected, the user can manually add additional ones, as shown in Section 4.3. Only at that point does the analysis of the following frames begin. Pseudocode 1 shows the main steps. As previously described, Mask R-CNN is again used to identify people, while the re-ID network provides the people’s appearance descriptors. At that point, for each person, the find_nearest function identifies the already-known identifier closest to the detected descriptor, if any. In this way, it is possible to determine whether that person already appeared in the past and, depending on their position and on the knowledge derived from past frames, a log is added to the database if they are leaving a room. If there is no similar person, the algorithm adds a new one to the scene. The final for loop finds all the people who were in the environment up to the previous frame but are now missing. In this case, there are two alternatives: the person may either have entered a room (if in the preceding frame they were sufficiently close to the relative door) or may have disappeared, for example, because they left the hallway or are temporarily occluded. To improve the efficacy of the algorithm, the framework starts tracking a person only when they appear entirely in the scene and their bounding box is at a minimum distance from the image edges. Furthermore, it uses the area of the bbox to interrupt (temporarily or not) the tracking when an object/person occludes the subject or when the tracked person has nearly entirely entered a room.</p>
        <p>A fundamental step is the one implemented by the find_nearest function, shown in Pseudocode 2. It uses differentiated searches to find the already-known person with the identity most similar to the one passed as input. First, it searches among the people visible in the scene in the previous frame. In case of failure, if the detection is close enough to a door - according to a given threshold - it searches among the people who are known to be in that room. As a last chance, it starts searching among the people who last left the corridor, then moving on to all the known people. The similarity between two identifiers ID_i and ID_j is computed with the cosine similarity, as follows: sim(ID_i, ID_j) = (ID_i · ID_j) / (‖ID_i‖ ‖ID_j‖). Two identifiers are more similar as the cosine similarity goes to one. Hence the need to define, for each of the listed searches, a threshold that establishes when two descriptors must be considered sufficiently similar (and therefore belonging to the same person) or not. The choice of the threshold heavily influences the tracking effectiveness. In the various phases a different threshold is used; more specifically: (1) if a person is walking along the corridor without other people in the close vicinity and, compared to the previous frame, that person has not moved too far from their previous position in the scene, then a greater dissimilarity between the descriptors is tolerated; (2) in the other cases, the threshold is set to a value between 0.85 and 0.9. Section 5 discusses some critical issues regarding the choice of the threshold.</p>
      </sec>
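      <p>A simplified sketch of the similarity test at the core of find_nearest: the best-matching known descriptor is accepted only if its cosine similarity clears the threshold, and -1 signals a new person. The descriptors and the single fixed threshold are illustrative; the actual framework differentiates the searches and thresholds as described above.</p>

```python
import math

def cosine_similarity(a, b):
    """sim(a, b) = (a . b) / (||a|| ||b||); closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def find_nearest(query, known, threshold=0.85):
    """Return the known person ID whose descriptor is most similar to
    `query`, or -1 if no similarity clears the threshold (new person)."""
    best_id, best_sim = -1, threshold
    for person_id, descriptor in known.items():
        sim = cosine_similarity(query, descriptor)
        if sim >= best_sim:
            best_id, best_sim = person_id, sim
    return best_id

# Two known people with toy 3-dimensional descriptors.
known = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0]}
match = find_nearest([0.9, 0.1, 0.0], known)      # very similar to person 0
no_match = find_nearest([0.0, 0.0, 1.0], known)   # nobody similar: new person
```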
      <sec id="sec-3-3">
        <title>Algorithm 2: Find nearest identity</title>
        <p>Each log entry has the form: frameID personID roomID “in/out/new”, where frameID is an incremental value representing the currently processed frame; personID is a unique integer associated to a person (different from the identifier representing the way that person looks in the scene); roomID is the ID of the room the person is entering/leaving, if any, and is equal to −1 otherwise. The last label has the value “in” or “out” when roomID is different from −1, while it assumes the value “new” when a new person appears in the scene.</p>
        <p>For simplicity, the database is implemented via a simple CSV file containing all the logs, but more complex and scalable solutions (such as NoSQL) are also possible. Knowing the video framerate, the framework derives an estimate of the time spent in each room, to highlight possible dangerous situations. The same is done by counting the number of people in the same room and alerting when the maximum capacity is exceeded.</p>
      </sec>
      <sec id="sec-3-4">
        <title>4.3. GUI</title>
        <p>A simple user interface, implemented with the PySimpleGUI library, is also available to provide the user with more flexible interaction with the framework. The user can select a file or directory containing the needed frame images, as well as add new doors that Mask R-CNN did not detect. In this second case (shown in Figure 6), by using a simple library such as Matplotlib, it is possible to offer a response in real-time on the location of the new doors and their heights (used by the algorithm). Finally, at the end of the processing of all frames, the user can search all the times a particular ID has entered and left a room (Figure 7). In the latter case, the interface highlights the riskiest situations (for example, if the room capacity has been exceeded), in addition to providing all the records linked to the entered ID.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Analysis and results</title>
      <sec id="sec-4-1">
        <p>The behaviour of the framework is evaluated in two different setups of incremental difficulty. In the first setup, people walk down a corridor one after the other, in a perfect flow that limits the occasions in which two or more people are simultaneously in the same room. This modality allows focusing mainly on inter-frame re-identification and on the correct detection of people entering and leaving the rooms. In the second setup, multiple people can enter the same room. The challenge, in this case, is to be able to identify the identity of a person when they leave the room. The results show that the algorithm can handle a wide range of situations with ease, producing results that are similar - if not identical - to the ground truth.</p>
        <p>First of all, it is beneficial to analyze how accurately the framework can detect the presence of one or more people in the scene. To calculate the overall accuracy of the detections, we used two methods. The first consists of considering only those frames in which a person is shown entirely (i.e., they are not hidden - even partially - by objects or other people). The second is to consider all frames, including all the borderline cases in which only a portion of a person’s arm or leg appears in the frame. Figure 8 shows an example of the frames considered with both methods. The results - obviously better in numerical terms in the first case - are shown in Table 5.</p>
        <table-wrap id="tab5">
          <label>Table 5</label>
          <caption><p>The accuracy of people detection computed with two methods. With the first one, we only considered those frames in which people’s bodies are shown wholly in the image. The second method also includes those frames in which a person is only partially visible.</p></caption>
          <table>
            <thead>
              <tr><th>Method</th><th>Overall (Detection) Accuracy</th></tr>
            </thead>
            <tbody>
              <tr><td>Method 1</td><td>100%</td></tr>
              <tr><td>Method 2</td><td>91.76%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Moving on to the accuracy of people tracking: the inter-frame re-identification of a person in the scene scores 100% accuracy, even in the case of several people in the corridor; the same happens when a person leaves a room, even when more than one is inside it. The criticalities are mainly two: (1) the difficulty in defining an efficient threshold for the cosine similarity, since the method adopted is susceptible to sudden changes in the person’s position (such as front and rear views of the person); (2) the influence of the quality of the masks produced by Mask R-CNN on the re-identification network. A sudden change in the portion of the image taken into consideration (even without sudden movements of the subject) can reduce the cosine similarity.</p>
        <p>Cosine similarity can be a powerful tool for guiding the re-identification task: limiting the search to the people inside the room and using the cosine similarity always leads to correct identifications. Nevertheless, the weaknesses listed above heavily reduce its effectiveness when it is necessary to recognize a person who had previously left the corridor (without entering any room) and who reappears later on. Indeed, the choice of a high threshold (i.e., ≥ 0.9) makes it difficult to assign the same ID in the situation under analysis because, usually, the person will reappear in a completely different pose (for example, from behind and not from the front), which will reduce the value of the cosine similarity. In this case, there will be no ID switches between different people, but each time one reappears in the scene, they will be assigned a new ID.</p>
        <p>On the contrary, lowering the threshold facilitates ID switches, creating some cascading problems in the framework (an ID already assigned - even if incorrectly - to a person will not be re-assigned as long as that person is in the scene, not even if the one it was originally assigned to reappears). However, these problems do not affect the recognition of people leaving the rooms: the identifier produced by the Re-ID network and the similarity computed with the cosine similarity are sufficient for the correct attribution of the ID. Compared to the baseline (Re-ID network trained on the original MARS dataset), it can be observed that the cosine similarity of the same person in two different situations (frames) is greater (by 1-2%) when assessed with our method.</p>
        <p>Figure 8: The frame on the left is an example of those considered with Method 1 for calculating the Overall Detection Accuracy. The person’s body is entirely included in the scene. The frame on the right is instead an example of those considered with Method 2, which also takes into account all the borderline cases in which only a portion of a person’s arm or leg appears in the frame. In this case, the two people in the scene are only</p>
        <p>As a final benchmark, the accuracy of the logs (seen as the ratio of the logs equal to the ones of the ground truth over the total number of them) produced in the tests is equal to 50%. The accuracy goes up to 84% if we also include those logs with labels “in” and “out” that differ only in the person ID from the ground truth (but only if that ID is a new one, and therefore if there is no ID
dpeatreticatlelyd vbiysitbhlee amnoddtehl.e arm of the uppermost person is not switch with a previously known identity). When a person
enters a room, the relative log at the exit is always correct,
as already mentioned above. As for performance, an</p>
        <p>Having ascertained that the framework can detect the Nvidia Tesla K80 is capable of processing 1.4-1.5 frames
presence of people with good reliability, we then move per second.</p>
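The threshold rule discussed above (a query embedding is matched to the most similar gallery identity, and the match is accepted only when the cosine similarity clears a fixed threshold such as ≥ 0.9, otherwise a new ID is created) can be sketched as follows. This is a minimal illustration under our own naming, with toy two-dimensional embeddings, not the framework's actual code:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def assign_id(query, gallery, threshold=0.9, next_id=0):
    """Match a query embedding against known identities.

    gallery maps person ID -> stored embedding. Returns (id, is_new).
    A high threshold avoids ID switches between different people, but
    tends to create a fresh ID when a person reappears in a very
    different pose (e.g., seen from behind instead of the front).
    """
    best_id, best_sim = None, -1.0
    for pid, emb in gallery.items():
        sim = cosine_similarity(query, emb)
        if sim > best_sim:
            best_id, best_sim = pid, sim
    if best_id is not None and best_sim >= threshold:
        return best_id, False      # re-identified as a known person
    return next_id, True           # similarity too low: new identity
```

With a 0.9 threshold, a query close to a stored embedding keeps its ID, while a query at roughly 45° (cosine ≈ 0.71) is assigned a new one, mirroring the behavior described above.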
        <p>We also ran a test in a setup with slightly different
specifications: the recording device was placed at eye level, tilted
almost parallel to the floor, and with an image ratio of 16:9. The
results obtained are comparable to those indicated above, although
tracking people in areas very distant from the camera (and therefore
at lower resolution) turns out to be more critical. Under these
conditions, it is quite easy for two different subjects to appear
very similar even to the human eye. An example is shown in Figure 9.
Ultimately, the framework is most effective when the distance to the
doors is not excessively large.</p>
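The log-accuracy figures reported above follow a simple rule: a log counts as correct when it matches the ground truth exactly, and, under the relaxed criterion, also when an “in”/“out” log differs from the ground truth only in a freshly assigned person ID. A minimal sketch of such a scorer; the tuple-based log format and the `known_ids` argument are our own illustrative assumptions:

```python
def log_accuracy(logs, ground_truth, relaxed=False, known_ids=frozenset()):
    """Fraction of produced logs that match the ground truth.

    Each log is modeled here as a (person_id, action, room) tuple,
    e.g. (3, "in", "room2"). With relaxed=True, a log also counts as
    correct when it differs from the ground truth only in the person
    ID, provided that ID is a new one (i.e. not an ID switch with a
    previously known identity).
    """
    correct = 0
    for log, gt in zip(logs, ground_truth):
        if log == gt:
            correct += 1
        elif relaxed and log[1:] == gt[1:] and log[0] not in known_ids:
            correct += 1
    return correct / len(logs)
```

Under this rule, a run whose only errors are fresh IDs on otherwise correct “in”/“out” events scores low strictly but high in the relaxed count, which is how a 50% strict accuracy can rise to 84%.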
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <sec id="sec-5-1">
        <p>DPPL Hallway Tracker turns out to be a good starting
point for developing a framework capable of tracking
people entering and leaving multiple rooms. The use of
a re-ID network that exploits the masks produced in the
detection and segmentation phase leads, even in the tests
performed, to improvements in identification.</p>
        <p>A project extension might address some of
the remaining issues: (1) the enrichment of the datasets of
people and doors could lead to better detection in several
more challenging contexts: for example, as discussed
above, the detection and segmentation of doors “thinned”
by perspective remains difficult; (2) using a dynamic
threshold and investigating solutions complementary to
the re-identification network could alleviate the difficulty
of assigning the same ID to a person who reappears in the
corridor without leaving a room. The study of solutions
for tracing people entering and leaving rooms is of
great importance for the application developments that it
can have. It not only allows contact tracing in the event
of pandemics but can also be used in other contexts, such
as the analysis of the movements of patients and medical
operators and the optimization of hospital wards.</p>
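The dynamic-threshold idea in point (2) could take many forms; one simple possibility, sketched here under our own assumptions and not evaluated in this work, is to relax the cosine-similarity threshold the longer a person has been out of view, so a reappearance in a different pose can still be matched while recently seen people still require high similarity:

```python
def dynamic_threshold(base=0.9, frames_absent=0, floor=0.75, decay=0.005):
    """Illustrative dynamic rule for the cosine-similarity threshold.

    Starts at a strict base value (limiting ID switches) and decays
    linearly with the number of frames a person has been absent,
    never dropping below a safety floor.
    """
    return max(floor, base - decay * frames_absent)
```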
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Alfano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ercolano</surname>
          </string-name>
          ,
          <article-title>The efficacy of lockdown against covid-19: a cross-country panel analysis</article-title>
          ,
          <source>Applied health economics and health policy 18</source>
          (
          <year>2020</year>
          )
          <fpage>509</fpage>
          -
          <lpage>517</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pepe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tedeschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Brandizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Iocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Human attention assessment using a machine learning approach with gan-based data augmentation technique trained using a custom dataset</article-title>
          ,
          <source>OBM Neurobiology 6</source>
          (
          <year>2022</year>
          ). doi:10.21926/obm.neurobiol.2204139.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wajda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brociek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Analysis pre and post covid-19 pandemic rorschach test data of using em algorithms and gmm models</article-title>
          , volume
          <volume>3360</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Marcotrigiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Stingi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fregnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Magarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pasquale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Orsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Montagna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>An integrated control plan in primary schools: Results of a field investigation on nutritional and hygienic features in the apulia region (southern italy)</article-title>
          ,
          <source>Nutrients</source>
          <volume>13</volume>
          (
          <year>2021</year>
          ). doi:10.3390/nu13093006.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>De Magistris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Romano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Starczewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>A novel dwt-based encoder for human pose estimation</article-title>
          , volume
          <volume>3360</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zowghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferrari</surname>
          </string-name>
          ,
          <article-title>The rise and fall of covid-19 contact-tracing apps: when nfrs collide with pandemic</article-title>
          ,
          <source>in: 2021 IEEE 29th International Requirements Engineering Conference (RE)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>106</fpage>
          -
          <lpage>116</lpage>
          . doi:10.1109/RE51729.2021.00017.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Wojke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bewley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Paulus</surname>
          </string-name>
          ,
          <article-title>Simple online and realtime tracking with a deep association metric</article-title>
          ,
          <source>in: 2017 IEEE international conference on image processing (ICIP)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>3645</fpage>
          -
          <lpage>3649</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alfarano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Magistris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mongelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Starczewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>A novel convmixer transformer based architecture for violent behavior detection</article-title>
          ,
          <source>LNAI 14126</source>
          (
          <year>2023</year>
          )
          <fpage>3</fpage>
          -
          <lpage>16</lpage>
          . doi:10.1007/978-3-031-42508-0_1.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nevatia</surname>
          </string-name>
          ,
          <article-title>Multi-target tracking by online learning of non-linear motion patterns and robust appearance models</article-title>
          ,
          <source>in: 2012 IEEE Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>1918</fpage>
          -
          <lpage>1925</lpage>
          . doi:10.1109/CVPR.2012.6247892.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bewley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ramos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Upcroft</surname>
          </string-name>
          ,
          <article-title>Simple online and realtime tracking</article-title>
          ,
          <source>in: 2016 IEEE International Conference on Image Processing (ICIP)</source>
          ,
          <year>2016</year>
          . URL: http://dx.doi.org/10.1109/ICIP.2016.7533003. doi:10.1109/icip.2016.7533003.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonanno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Capizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. L.</given-names>
            <surname>Sciuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Wavelet recurrent neural network with semi-parametric input data preprocessing for micro-wind power forecasting in integrated generation systems</article-title>
          ,
          <year>2015</year>
          , pp.
          <fpage>602</fpage>
          -
          <lpage>609</lpage>
          . doi:10.1109/ICCEP.2015.7177554.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Faster r-cnn: Towards real-time object detection with region proposal networks</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>28</volume>
          (
          <year>2015</year>
          )
          <fpage>91</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Donahue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <article-title>Rich feature hierarchies for accurate object detection and semantic segmentation</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>580</fpage>
          -
          <lpage>587</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Kalman</surname>
          </string-name>
          ,
          <article-title>A new approach to linear filtering and prediction problems</article-title>
          (
          <year>1960</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V.</given-names>
            <surname>Rabaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          ,
          <article-title>Counting crowded moving objects</article-title>
          ,
          <source>in: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)</source>
          , volume
          <volume>1</volume>
          ,
          <year>2006</year>
          , pp.
          <fpage>705</fpage>
          -
          <lpage>711</lpage>
          . doi:10.1109/CVPR.2006.92.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hariharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          ,
          <article-title>Feature pyramid networks for object detection</article-title>
          ,
          <year>2017</year>
          . arXiv:1612.03144.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ramôa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lopes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Alexandre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mogo</surname>
          </string-name>
          ,
          <article-title>Real-time 2d-3d door detection and state classification on a low-power device</article-title>
          ,
          <source>SN Applied Sciences</source>
          <volume>3</volume>
          (
          <year>2021</year>
          ). doi:10.1007/s42452-021-04588-3.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Labit-Bonis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lerasle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Madrigal</surname>
          </string-name>
          ,
          <article-title>Fast tracking-by-detection of bus passengers with siamese cnns</article-title>
          ,
          <source>in: 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . doi:10.1109/AVSS.2019.8909843.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hays</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Perona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ramanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Zitnick</surname>
          </string-name>
          ,
          <article-title>Microsoft coco: Common objects in context</article-title>
          ,
          <source>in: European conference on computer vision</source>
          , Springer,
          <year>2014</year>
          , pp.
          <fpage>740</fpage>
          -
          <lpage>755</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>People counting system for getting in/out of a bus based on video processing</article-title>
          ,
          <source>in: 2008 Eighth International Conference on Intelligent Systems Design and Applications</source>
          , volume
          <volume>3</volume>
          ,
          <year>2008</year>
          , pp.
          <fpage>565</fpage>
          -
          <lpage>569</lpage>
          . doi:10.1109/ISDA.2008.335.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>N.</given-names>
            <surname>Wojke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bewley</surname>
          </string-name>
          ,
          <article-title>Deep cosine metric learning for person re-identification</article-title>
          ,
          <source>in: IEEE Winter Conference on Applications of Computer Vision (WACV)</source>
          , IEEE,
          <year>2018</year>
          . URL: https://elib.dlr.de/116408/.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Perng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-W.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>The design and implementation of a vision-based people counting system in buses</article-title>
          ,
          <source>in: 2016 International Conference on System Science and Engineering (ICSSE)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>3</lpage>
          . doi:10.1109/ICSSE.2016.7551620.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <article-title>MARS: A Video Benchmark for Large-Scale Person Re-identification</article-title>
          , Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <article-title>Scalable person re-identification: A benchmark</article-title>
          ,
          <source>in: Proceedings of the IEEE International Conference on Computer Vision (ICCV)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Velastin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Espinosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bay</surname>
          </string-name>
          ,
          <article-title>Detecting, tracking and counting people getting on/off a metropolitan train using a standard video camera</article-title>
          ,
          <source>Sensors</source>
          <volume>20</volume>
          (
          <year>2020</year>
          ). URL: https://www.mdpi.com/1424-8220/20/21/6251. doi:10.3390/s20216251.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Eshratifar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gormish</surname>
          </string-name>
          ,
          <article-title>Video person re-id: Fantastic techniques and where to find them</article-title>
          ,
          <year>2019</year>
          . arXiv:1912.05295.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <article-title>Learning multi-granular hypergraphs for video-based person re-identification</article-title>
          ,
          <year>2021</year>
          . arXiv:2104.14913.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Pore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. F.</given-names>
            <surname>Momin</surname>
          </string-name>
          , Bidirectional people count- arXiv:
          <fpage>2104</fpage>
          .14913.
          <article-title>ing system in video surveillance</article-title>
          ,
          <source>in: 2016 IEEE International Conference on Recent Trends in Electronics, Information Communication Technology (RTEICT)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>724</fpage>
          -
          <lpage>727</lpage>
          . doi:10.1109/RTEICT.2016.7807919.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gkioxari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <article-title>Mask R-CNN</article-title>
          ,
          <source>in: Proceedings of the IEEE International Conference on Computer Vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2961</fpage>
          -
          <lpage>2969</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonanno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Capizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Laudani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. L.</given-names>
            <surname>Sciuto</surname>
          </string-name>
          ,
          <article-title>Optimal thicknesses determination in a multilayer structure to improve the SPP efficiency for photovoltaic devices by an hybrid FEM</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>