<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Managing crowded museums: Visitors flow mea- Deep learning techniques for automatic detection,
surement, analysis, modeling, and optimization, IEEE Access</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Approach to Model Museum Visitors</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessio Ferrato</string-name>
          <email>ale.ferrato@stud.uniroma3.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carla Limongelli</string-name>
          <email>limongel@dia.uniroma3.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Mezzini</string-name>
          <email>mauro.mezzini@uniroma3.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Sansonetti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Helsinki, Finland</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Education, Roma Tre University</institution>
          ,
          <addr-line>Viale del Castro Pretorio 20, 00185 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Engineering, Roma Tre University</institution>
          ,
          <addr-line>Via della Vasca Navale 79, 00146 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>8</volume>
      <issue>2020</issue>
      <fpage>352</fpage>
      <lpage>355</lpage>
      <abstract>
        <p>Although ubiquitous and fast access to the Internet allows us to admire objects and artworks exhibited worldwide from the comfort of our home, visiting a museum or an exhibition remains an essential experience today. Current technologies can help make that experience even more satisfying. For instance, they can assist the user during the visit, personalizing her experience by suggesting the artworks of highest interest to her and providing her with related textual and multimedia content. To this aim, it is necessary to automatically acquire information relating to the active user. In this paper, we show how a deep neural network-based approach can allow us to obtain accurate information for understanding the behavior of the visitor alone or in a group. This information can also be used to identify users similar to the active one to suggest not only personalized itineraries but also possible visiting companions for promoting the museum as a vehicle for social and cultural inclusion.</p>
      </abstract>
      <kwd-group>
        <kwd>User interfaces</kwd>
        <kwd>Computer vision</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Museum visitors</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Background</title>
      <p>
        Recent technological advances have made it possible to significantly improve the experience of citizens when they use public services [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] or when they enjoy points of interest. Among the different possible points of interest, there are also museums and exhibits. The first studies concerning the observation and analysis of museum visitor behavior date back to the first half of the twentieth century [7, 8, 9].
      </p>
      <p>
        Since then, the works that publish studies based on the analysis of visitor tracking have multiplied, namely, on the detailed recording of “not only where visitors go but also what visitors do while inside an exhibition” [10]. In early studies on the subject, the most common method for recording visitor behavior was the paper-and-pencil one. Although this method is simple and low-cost, several aspects limit its validity. Among these are the lack of temporal information, which is more complicated for the observer to collect; the need to transfer the data collected on paper to the database; and the inability to accurately determine the visitor’s real engagement. Fortunately, recent technological advances in Machine Learning [11] have made available to researchers several approaches for automatic visitor tracking [12]. In the research literature, there are several contributions on the technologies adopted for this purpose. Our system relies on the Faster Region-based Convolutional Neural Network (Faster R-CNN), which is divided into three major parts:
1. A CNN backbone (composed of a ResNet [27] and a Feature Pyramid Network (FPN) [28]) that receives in input the image and gives in output a conv feature map;
2. A Region Proposal Network (RPN) that takes as input the conv feature map in output from the backbone and returns a set of rectangular boxes, each of which is associated with a score giving the likelihood that the region contains an object or simple background;
3. A detection system that, given the set of regions from the RPN and using the conv feature map of the backbone, determines for each distinct class a score ∈ [0, 1]. This value represents the likelihood that the object belongs to the corresponding class. The detection system also estimates the regression bounding box coordinates x, y, w, h of the object proposed by the RPN.
      </p>
      <p>
        In this paper, we show how the gathered information can be used to model museum visitors and identify their nearest neighbors. The aim is to promote the museum as a vehicle to foster the social and cultural inclusion of its visitors.
      </p>
    </sec>
    <sec>
      <title>2. Database with Collected Data</title>
      <p>In order to make the analysis of the data collection easy and at the same time effective, we propose the following database implementation and give some sample queries that could cover the most basic and useful needs when a museum staff member wants to extract useful information about visitor behavior from the database. The data collected through the proposed system can be stored in a data structure that supports spatial and temporal analyses of visitor behavior [29]. Let us suppose, for example, that we have n cameras and m badges. Each camera detects, at a generic timestamp, a badge at certain coordinates from the camera. We can store all those detections in a database composed of two tables. The first table, called positions, has attributes (TIMESTMP, CAMERA_ID, BADGE_ID, X, Y, Z), and the second table, called camera, has attributes (CAMERA_ID, CT, X, Y, Z). A single tuple (t, c_id, b_id, x, y, z) of positions represents a detection at timestamp t from the camera c_id of the badge b_id at coordinates x, y, z with respect to camera c_id. A single tuple (c_id, ct, x, y, z) of camera represents the coordinates x, y, z of the camera c_id in relation to the museum. The value ct is the time period of a frame. If f is the frame rate of the camera, then we have ct = 1/f. For the sake of simplicity, hereafter, we suppose that ct assumes the same value for all cameras (i.e., 1/24 s), but all the discussion can be extended with simple and minimal modifications to the general case, in which cameras can have different frame rates. We note that, whilst the table positions is fed by the detections of the model, the table camera is determined and created in advance by the system supervisor. First of all, it can be convenient to create the view dist_positions using the SQL Query 1.</p>
      <p>Query 1: View creation
CREATE VIEW dist_positions AS
SELECT DISTINCT P.TIMESTMP AS TIMESTMP, P.BADGE_ID AS BADGE_ID, C.CT AS CT,
  /* Changing the reference system */
  P.X + C.X AS X,
  P.Y + C.Y AS Y,
  P.Z + C.Z AS Z
FROM positions P, camera C
WHERE P.CAMERA_ID = C.CAMERA_ID</p>
      <p>Furthermore, we add another table artwork with attributes (ID, AX, AY, AZ, AW, AH). A tuple (a, x, y, z, w, h) of table artwork records the ID a of the artwork, the upper right corner coordinates x, y, z (with respect to the museum), and the height h and width w of a rectangular box in front of the artwork a.</p>
      <p>Using this simple database we may easily and effectively retrieve the data described in the previous section. In particular, we may obtain the distances between all pairs of distinct visitors (identified by a badge ID) and at the same time choose only those pairs whose distance is less than a prefixed threshold. In fact, we may first create the SQL view visit_times (Query 2), which computes, for each artwork, the total time a badge has spent nearby the artwork, for all badge IDs and for the time interval between t_0 and t_1. In the following, for the sake of simplicity, we assume that every badge has been detected in front of every possible artwork. By this assumption, the query visit_times returns at least a value for every badge and every artwork.</p>
      <p>Query 2: Badge-Artwork visit time
CREATE VIEW visit_times AS
SELECT p.BADGE_ID AS BADGE_ID, a.ID AS A_ID, SUM(p.CT) AS V_TIMES
FROM dist_positions p, artwork a
WHERE p.TIMESTMP BETWEEN t_0 AND t_1 AND
  p.X BETWEEN a.AX AND a.AX + a.AW AND
  p.Y BETWEEN a.AY AND a.AY + a.AH AND
  p.Z BETWEEN a.AZ AND a.AZ + 2.70
GROUP BY p.BADGE_ID, a.ID</p>
      <p>Using this view, we can compute the total time a single badge spent in front of any artwork (i.e., total_visit_times in Query 3).</p>
      <p>Query 3: Badge total visit time
CREATE VIEW total_visit_times AS
SELECT BADGE_ID, SUM(V_TIMES) AS TOTAL_TIMES
FROM visit_times
GROUP BY BADGE_ID</p>
      <p>Then, using the two previous views, namely, visit_times and total_visit_times, we can compute the percentage of time spent by any badge in front of any artwork (i.e., visit_times_perc in Query 4).</p>
      <p>Query 4: Badge-Artwork percentage time
CREATE VIEW visit_times_perc AS
SELECT BADGE_ID, A_ID, V_TIMES*100/TOTAL_TIMES AS V_TIMES_PERC
FROM visit_times NATURAL JOIN total_visit_times</p>
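      <p>
        As a sanity check on the schema and views above, the following self-contained sketch runs them in Python’s built-in sqlite3 (our choice of engine; the paper does not prescribe one). The placeholders t_0 and t_1 are instantiated with example values, and all inserted data are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# The two base tables described in the text, plus the artwork table,
# and Query 1: detections translated into museum coordinates.
cur.executescript("""
CREATE TABLE positions (TIMESTMP INT, CAMERA_ID INT, BADGE_ID INT, X REAL, Y REAL, Z REAL);
CREATE TABLE camera    (CAMERA_ID INT, CT REAL, X REAL, Y REAL, Z REAL);
CREATE TABLE artwork   (ID INT, AX REAL, AY REAL, AZ REAL, AW REAL, AH REAL);

CREATE VIEW dist_positions AS
SELECT DISTINCT P.TIMESTMP AS TIMESTMP, P.BADGE_ID AS BADGE_ID, C.CT AS CT,
       P.X + C.X AS X, P.Y + C.Y AS Y, P.Z + C.Z AS Z
FROM positions P, camera C
WHERE P.CAMERA_ID = C.CAMERA_ID;
""")

# One camera at the museum origin running at 24 fps (CT = 1/24 s), and one
# artwork whose frontal box is 2 m wide and 1 m deep (illustrative values).
cur.execute("INSERT INTO camera VALUES (1, 1.0/24, 0, 0, 0)")
cur.execute("INSERT INTO artwork VALUES (1, 0, 0, 0, 2, 1)")
# Badge 7 detected on 48 consecutive frames inside the artwork's box.
cur.executemany("INSERT INTO positions VALUES (?, 1, 7, 1.0, 0.5, 0.2)",
                [(ts,) for ts in range(48)])

# Query 2 with t_0 = 0 and t_1 = 100, and Query 3 on top of it.
cur.executescript("""
CREATE VIEW visit_times AS
SELECT p.BADGE_ID AS BADGE_ID, a.ID AS A_ID, SUM(p.CT) AS V_TIMES
FROM dist_positions p, artwork a
WHERE p.TIMESTMP BETWEEN 0 AND 100
  AND p.X BETWEEN a.AX AND a.AX + a.AW
  AND p.Y BETWEEN a.AY AND a.AY + a.AH
  AND p.Z BETWEEN a.AZ AND a.AZ + 2.70
GROUP BY p.BADGE_ID, a.ID;

CREATE VIEW total_visit_times AS
SELECT BADGE_ID, SUM(V_TIMES) AS TOTAL_TIMES
FROM visit_times GROUP BY BADGE_ID;
""")

# 48 frames at 1/24 s each: about 2 seconds in front of artwork 1.
print(cur.execute("SELECT * FROM visit_times").fetchall())
```
      </p>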
    </sec>
    <sec id="sec-2">
      <title>3. Distance</title>
      <p>We want to define a measure that allows us to compare two visitors according to the time they spend in front of each artwork. Instead of considering absolute values, we reason on a percentage basis. Let us assume that each visitor spends a unit of time at the exhibition and let us calculate the percentage of the time spent observing each artwork. Let us define:</p>
      <p>• n: the number of artworks present in the exhibition;
• v_i: the i-th visitor;
• m: the number of visitors we are going to track;
• T_i: the overall time that v_i spends to visit the entire exhibition. In this way, the time spent on each artwork is calculated as a percentage and ∑_{h=1}^{n} t_ih = 100.</p>
      <p>We can define a model for the i-th visitor as:
M_i = {t_i1, …, t_in}
where t_ih, h = 1, …, n is the percent time that the visitor v_i spends in front of the artwork h. The comparison of the previous visitor’s model with the following j-th model
M_j = {t_j1, …, t_jn}
is defined by the following measure:
d(M_i, M_j) = (1/200) ∑_{h=1}^{n} |t_ih − t_jh| = (|t_i1 − t_j1| + ⋯ + |t_in − t_jn|)/200    (1)
In this way, the measure is a real number in [0, 1]. The definition given in (1) is a measure, as it enjoys the properties of Positiveness, Minimality, and Symmetry.</p>
      <p>Positiveness: Formula (1) is a sum of positive values; consequently, this property is fulfilled.</p>
      <p>Minimality: When two models coincide, their distance has to be 0, which means that the times spent for each artwork are the same (t_ih = t_jh, ∀h = 1, …, n) and the overall sum is 0. On the contrary, if two models are completely different, it means that the two visitors have seen different artworks: whenever t_ih is greater than 0, t_jh = 0, and vice versa. In this way, the measure is equal to 1.</p>
      <p>Symmetry: It is given by the absolute value of the observation time difference.</p>
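      <p>
        The measure in Formula (1) is straightforward to compute. A small illustrative sketch (function and variable names are ours):

```python
def visitor_distance(m_i, m_j):
    """Distance between two visitor models, as in Formula (1).

    Each model is a sequence of percentages, one per artwork, summing
    to 100. The result lies in [0, 1]: 0 for identical models, 1 for
    visitors who never looked at the same artworks.
    """
    assert len(m_i) == len(m_j)
    return sum(abs(a - b) for a, b in zip(m_i, m_j)) / 200

# Three artworks: identical visitors, disjoint visitors, and a mixed pair.
print(visitor_distance([50, 30, 20], [50, 30, 20]))   # 0.0
print(visitor_distance([100, 0, 0], [0, 100, 0]))     # 1.0
print(visitor_distance([60, 40, 0], [40, 40, 20]))    # 0.2
```

        The division by 200 normalizes the result: since each model sums to 100, the sum of absolute differences is at most 200, reached exactly when the two visitors saw disjoint sets of artworks.
      </p>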
    </sec>
    <sec id="sec-3">
      <title>4. Data analysis</title>
      <p>Data collection and analysis is useful for museum
curators and staf members because it allows them to work
on the fruition of the exhibition itself, as well as on the
visitor flow. However, it also makes more refined
analyzes possible. Our system analyzes the video recorded
by the cameras positioned near each point of interest
(POI) (i.e., artwork) and from each frame. It can
identify the position in pixels of the badge and the distance
from the camera for each visitor positioned in front of
the POI up to a distance of 6 meters. By collecting and
triangulating the data from the diferent recordings, we
can track the path of each visitor in each room of the
museum. Some of the analyses that can be performed on
the data acquired by this system are:
• Temporal analysis regarding the time spent in
front of the artworks;
the artworks;
• Analysis of the trajectory of visitors in front of
• Spatial-temporal analyses between viewing
distance and time spent in front of an artwork;
• Heatmap of temporal data on visitor positions in
the room.</p>
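      <p>
        As one example of the analyses listed above, a positional heatmap can be built by binning museum-coordinate detections into a floor grid and accumulating the frame period CT per cell. A sketch in Python; the grid size and the sample detections are illustrative choices of ours:

```python
from collections import defaultdict

def heatmap(detections, cell=0.5):
    """Accumulate dwell time per floor cell.

    detections: iterable of (x, y, ct) tuples in museum coordinates,
    where ct is the time period of one frame. cell is the bin size in
    meters. Returns a dict mapping (col, row) grid indices to seconds.
    """
    grid = defaultdict(float)
    for x, y, ct in detections:
        grid[(int(x // cell), int(y // cell))] += ct
    return dict(grid)

# Two detections fall in one cell, one in another (CT = 1/24 s per frame).
ct = 1.0 / 24
demo = [(0.1, 0.2, ct), (0.3, 0.4, ct), (1.2, 0.2, ct)]
print(heatmap(demo))
```

        Feeding this function with the rows of the dist_positions view yields, per cell, the total time visitors spent there, which can then be rendered as a heatmap of the room.
      </p>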
      <p>The system also allows us to perform a more refined analysis because it allows facial identification. Therefore, it enables us to collect specific data about the visitor’s attention to the artwork. If the face is not identified by the system, we can assume that the visitor is distracted and, therefore, increase the granularity of the analysis.</p>
      <p>The possible future developments of the research work presented herein are manifold. First of all, there is the integration of the data collection system within a social recommender system [31, 32, 33]. This would allow us to assess the effective benefits of our system in terms of the inclusion of individuals from different backgrounds [34].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>G.</given-names> <surname>D'Aniello</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Gaeta</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>La Rocca</surname></string-name>,
          <article-title>KnowMIS-ABSA: an overview and a reference model for applications of sentiment analysis and aspect-based sentiment analysis</article-title>,
          <source>Artificial Intelligence Review</source>
          (<year>2022</year>).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>G.</given-names> <surname>D'Aniello</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Gaeta</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Orciuoli</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Sansonetti</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Sorgente</surname></string-name>,
          <article-title>Knowledge-based smart city service system</article-title>,
          <source>Electronics (Switzerland)</source>
          <volume>9</volume>
          (<year>2020</year>)
          <fpage>1</fpage>-<lpage>22</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sansonetti</surname>
          </string-name>
          ,
          <article-title>Point of interest recommendation based on social and linked open data</article-title>
          ,
          <source>Personal and Ubiquitous Computing</source>
          <volume>23</volume>
          (
          <year>2019</year>
          )
          <fpage>199</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sansonetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gasparetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Micarelli</surname>
          </string-name>
          ,
          <article-title>Cross-domain recommendation for enhancing cultural heritage experience</article-title>
          ,
          <source>in: Adjunct Publication of the 27th ACM UMAP Conference</source>
          , ACM, New York, NY, USA,
          <year>2019</year>
          , pp.
          <fpage>413</fpage>
          -
          <lpage>415</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fogli</surname>
          </string-name>
          , <string-name><given-names>G.</given-names> <surname>Sansonetti</surname></string-name>,
          <article-title>Exploiting semantics for context-aware itinerary recommendation</article-title>
          ,
          <source>Personal and Ubiquitous Computing</source>
          <volume>23</volume>
          (
          <year>2019</year>
          )
          <fpage>215</fpage>
          -
          <lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>D.</given-names> <surname>D'Agostino</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Gasparetti</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Micarelli</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Sansonetti</surname></string-name>,
          <article-title>A social context-aware recommender of itineraries between relevant points of interest</article-title>,
          <source>in: HCI International 2016</source>, volume
          <volume>618</volume>, Springer International Publishing, Cham,
          <year>2016</year>, pp.
          <fpage>354</fpage>-<lpage>359</lpage>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>