<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Managing crowded museums: Visitors flow mea- Deep learning techniques for automatic detection,
surement, analysis, modeling, and optimization, IEEE Access</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Approach to Model Museum Visitors</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessio Ferrato</string-name>
          <email>ale.ferrato@stud.uniroma3.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carla Limongelli</string-name>
          <email>limongel@dia.uniroma3.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Mezzini</string-name>
          <email>mauro.mezzini@uniroma3.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Sansonetti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Helsinki, Finland</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Education, Roma Tre University</institution>
          ,
          <addr-line>Viale del Castro Pretorio 20, 00185 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Engineering, Roma Tre University</institution>
          ,
          <addr-line>Via della Vasca Navale 79, 00146 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>8</volume>
      <issue>2020</issue>
      <fpage>352</fpage>
      <lpage>355</lpage>
      <abstract>
        <p>Although ubiquitous and fast access to the Internet allows us to admire objects and artworks exhibited worldwide from the comfort of our home, visiting a museum or an exhibition remains an essential experience today. Current technologies can help make that experience even more satisfying. For instance, they can assist the user during the visit, personalizing her experience by suggesting the artworks of highest interest to her and providing her with related textual and multimedia content. To this aim, it is necessary to automatically acquire information relating to the active user. In this paper, we show how a deep neural network-based approach can allow us to obtain accurate information for understanding the behavior of the visitor alone or in a group. This information can also be used to identify users similar to the active one to suggest not only personalized itineraries but also possible visiting companions for promoting the museum as a vehicle for social and cultural inclusion.</p>
      </abstract>
      <kwd-group>
        <kwd>User interfaces</kwd>
        <kwd>Computer vision</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Museum visitors</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Background</title>
      <p>
        Recent technological advances have made it possible to significantly improve the experience of citizens when they use public services [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] or when they enjoy points of interest. Among the different possible points of interest, there are also museums and exhibits. The first studies concerning the observation and analysis of museum visitor behavior date back to the first half of the twentieth century [7, 8, 9].
      </p>
      <p>
        Since then, the works that publish studies based on the analysis of visitor tracking have multiplied, namely, on the detailed recording of “not only where visitors go but also what visitors do while inside an exhibition” [10]. In early studies on the subject, the most common method for recording visitor behavior was the paper-and-pencil one. Although this method is simple and low-cost, several aspects limit its validity. Among these are the lack of temporal information, which is more complicated for the observer to collect; the need to transfer the data collected on paper to the database; and the inability to accurately determine the visitor’s real engagement. Fortunately, recent technological advances in Machine Learning [11] have made available to researchers several approaches for automatic visitor tracking [12]. In the research literature, there are several contributions on the technologies adopted for this purpose. Our system relies on the Faster Region-based Convolutional Neural Network (Faster R-CNN), which is divided into three major parts:
1. A CNN backbone (composed of a ResNet [27] and a Feature Pyramid Network (FPN) [28]) that receives in input the image and gives in output a conv feature map;
2. A Region Proposal Network (RPN) that takes as input the conv feature map in output from the backbone and returns a set of rectangular boxes, each of which is associated with a score giving the likelihood that the region contains an object or simple background;
3. A detection system that, given the set of regions from the RPN and using the conv feature map of the backbone, determines for each distinct class a score ∈ [0, 1]. This value represents the likelihood that the object belongs to the corresponding class. The detection system also estimates the regression bounding box coordinates x, y, w, h of the object proposed by the RPN.
      </p>
      <p>
        In this paper, we show how the gathered information can be used to model museum visitors and identify their nearest neighbors. The aim is to promote the museum as a vehicle to foster the social and cultural inclusion of its visitors.
      </p>
    </sec>
    <sec>
      <title>2. Database with Collected Data</title>
      <p>In order to make the analysis of the data collection easy and at the same time effective, we propose the following database implementation and give some sample queries that could cover the most basic and useful needs when a museum staff member wants to extract useful information about visitor behavior from the database. The data collected through the proposed system can be stored in a data structure that supports spatial and temporal analyses of visitor behavior [29]. Let us suppose, for example, that we have n cameras and m badges. Each camera detects, at a generic timestamp, a badge at certain coordinates from the camera. We can store all those detections in a database composed of two tables. The first table, called positions, has attributes (TIMESTMP, CAMERA_ID, BADGE_ID, X, Y, Z), and the second table, called camera, has attributes (CAMERA_ID, CT, X, Y, Z). A single tuple (t, c_id, b_id, x, y, z) of positions represents a detection at timestamp t from the camera c_id of the badge b_id at coordinates x, y, z with respect to camera c_id. A single tuple (c_id, ct, x, y, z) of camera represents the coordinates x, y, z of the camera c_id in relation to the museum. The value ct is the time period of a frame. If f is the frame rate of the camera, then we have ct = 1/f. For the sake of simplicity, hereafter, we suppose that ct assumes the same value for all cameras (i.e., 1/24 s), but all the discussion can be extended with simple and minimal modifications to the general case, in which cameras can have different frame rates. We note that, whilst the table positions is fed by the detections of the model, the table camera is determined and created in advance by the system supervisor. First of all, it can be convenient to create the view dist_positions using the SQL Query 1.</p>
      <p>Query 1: View creation
CREATE VIEW dist_positions AS
SELECT DISTINCT P.TIMESTMP AS TIMESTMP, P.BADGE_ID AS BADGE_ID, C.CT AS CT,
  /* Changing the reference system */
  P.X + C.X AS X,
  P.Y + C.Y AS Y,
  P.Z + C.Z AS Z
FROM positions P, camera C
WHERE P.CAMERA_ID = C.CAMERA_ID</p>
      <p>Furthermore, we add another table artwork with attributes (ID, AX, AY, AZ, AW, AH). A tuple (a, x, y, z, w, h) of table artwork records the ID a of the artwork, the upper right corner coordinates x, y, z (with respect to the museum), and the height h and width w of a rectangular box in front of the artwork a.</p>
      <p>Using this simple database we may easily and effectively retrieve the data described in the previous section. In particular, we may obtain the distances between all pairs of distinct visitors (identified by a badge ID) and at the same time choose only those pairs whose distance is less than a prefixed threshold. In fact, we may first create the SQL view visit_times (Query 2), which computes, for each artwork, the total time a badge has spent nearby the artwork, for all badge IDs and for the time interval between t_0 and t_1. In the following, for the sake of simplicity, we assume that every badge has been detected in front of every possible artwork. By this assumption, the query visit_times returns at least a value for every badge and every artwork.</p>
      <p>Query 2: Badge-Artwork visit time
CREATE VIEW visit_times AS
SELECT p.BADGE_ID AS BADGE_ID, a.ID AS A_ID, SUM(p.CT) AS V_TIMES
FROM dist_positions p, artwork a
WHERE p.TIMESTMP BETWEEN t_0 AND t_1 AND
  p.X BETWEEN a.AX AND a.AX + a.AW AND
  p.Y BETWEEN a.AY AND a.AY + a.AH AND
  p.Z BETWEEN a.AZ AND a.AZ + 2.70
GROUP BY p.BADGE_ID, a.ID</p>
      <p>Using this view, we can compute the total time a single badge spent in front of any artwork (i.e., total_visit_times in Query 3).</p>
      <p>Query 3: Badge total visit time
CREATE VIEW total_visit_times AS
SELECT BADGE_ID, SUM(V_TIMES) AS TOTAL_TIMES
FROM visit_times
GROUP BY BADGE_ID</p>
      <p>Then, using the two previous views, namely, visit_times and total_visit_times, we can compute the percentage of time spent by any badge in front of any artwork (i.e., visit_times_perc in Query 4).</p>
      <p>Query 4: Badge-Artwork percentage time
CREATE VIEW visit_times_perc AS
SELECT BADGE_ID, A_ID, V_TIMES*100/TOTAL_TIMES AS V_TIMES_PERC
FROM visit_times NATURAL JOIN total_visit_times</p>
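      <p>
        As a sanity check on the schema and views above, the following self-contained sketch runs them in Python’s built-in sqlite3 (our choice of engine; the paper does not prescribe one). The placeholders t_0 and t_1 are instantiated with example values, and all inserted data are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# The two base tables described in the text, plus the artwork table,
# and Query 1: detections translated into museum coordinates.
cur.executescript("""
CREATE TABLE positions (TIMESTMP INT, CAMERA_ID INT, BADGE_ID INT, X REAL, Y REAL, Z REAL);
CREATE TABLE camera    (CAMERA_ID INT, CT REAL, X REAL, Y REAL, Z REAL);
CREATE TABLE artwork   (ID INT, AX REAL, AY REAL, AZ REAL, AW REAL, AH REAL);

CREATE VIEW dist_positions AS
SELECT DISTINCT P.TIMESTMP AS TIMESTMP, P.BADGE_ID AS BADGE_ID, C.CT AS CT,
       P.X + C.X AS X, P.Y + C.Y AS Y, P.Z + C.Z AS Z
FROM positions P, camera C
WHERE P.CAMERA_ID = C.CAMERA_ID;
""")

# One camera at the museum origin running at 24 fps (CT = 1/24 s), and one
# artwork whose frontal box is 2 m wide and 1 m deep (illustrative values).
cur.execute("INSERT INTO camera VALUES (1, 1.0/24, 0, 0, 0)")
cur.execute("INSERT INTO artwork VALUES (1, 0, 0, 0, 2, 1)")
# Badge 7 detected on 48 consecutive frames inside the artwork's box.
cur.executemany("INSERT INTO positions VALUES (?, 1, 7, 1.0, 0.5, 0.2)",
                [(ts,) for ts in range(48)])

# Query 2 with t_0 = 0 and t_1 = 100, and Query 3 on top of it.
cur.executescript("""
CREATE VIEW visit_times AS
SELECT p.BADGE_ID AS BADGE_ID, a.ID AS A_ID, SUM(p.CT) AS V_TIMES
FROM dist_positions p, artwork a
WHERE p.TIMESTMP BETWEEN 0 AND 100
  AND p.X BETWEEN a.AX AND a.AX + a.AW
  AND p.Y BETWEEN a.AY AND a.AY + a.AH
  AND p.Z BETWEEN a.AZ AND a.AZ + 2.70
GROUP BY p.BADGE_ID, a.ID;

CREATE VIEW total_visit_times AS
SELECT BADGE_ID, SUM(V_TIMES) AS TOTAL_TIMES
FROM visit_times GROUP BY BADGE_ID;
""")

# 48 frames at 1/24 s each: about 2 seconds in front of artwork 1.
print(cur.execute("SELECT * FROM visit_times").fetchall())
```
      </p>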
    </sec>
    <sec id="sec-2">
      <title>3. Distance</title>
      <p>We want to define a measure that allows us to compare two visitors according to the time they spend in front of each artwork. Instead of considering absolute values, we reason on a percentage basis. Let us assume that each visitor spends a unit of time at the exhibition and let us calculate the percentage of the time spent observing each artwork. Let us define:</p>
      <p>• n: the number of artworks present in the exhibition;
• v_i: the i-th visitor;
• m: the number of visitors we are going to track;
• T_i: the overall time that v_i spends to visit the entire exhibition. In this way, the time spent on each artwork is calculated as a percentage and ∑_{h=1}^{n} t_ih = 100.</p>
      <p>We can define a model for the i-th visitor as:
M_i = {t_i1, …, t_in}
where t_ih, h = 1, …, n is the percent time that the visitor v_i spends in front of the artwork h. The comparison of the previous visitor’s model with the following j-th model
M_j = {t_j1, …, t_jn}
is defined by the following measure:
d(M_i, M_j) = (1/200) ∑_{h=1}^{n} |t_ih − t_jh| = (|t_i1 − t_j1| + ⋯ + |t_in − t_jn|)/200    (1)
In this way, the measure is a real number in [0, 1]. The definition given in (1) is a measure, as it enjoys the properties of Positiveness, Minimality, and Symmetry.</p>
      <p>Positiveness: Formula (1) is a sum of positive values; consequently, this property is fulfilled.</p>
      <p>Minimality: When two models coincide, their distance has to be 0, which means that the times spent for each artwork are the same (t_ih = t_jh, ∀h = 1, …, n) and the overall sum is 0. On the contrary, if two models are completely different, it means that the two visitors have seen different artworks: whenever t_ih is greater than 0, t_jh = 0, and vice versa. In this way, the measure is equal to 1.</p>
      <p>Symmetry: It is given by the absolute value of the observation time difference.</p>
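      <p>
        The measure in Formula (1) is straightforward to compute. A small illustrative sketch (function and variable names are ours):

```python
def visitor_distance(m_i, m_j):
    """Distance between two visitor models, as in Formula (1).

    Each model is a sequence of percentages, one per artwork, summing
    to 100. The result lies in [0, 1]: 0 for identical models, 1 for
    visitors who never looked at the same artworks.
    """
    assert len(m_i) == len(m_j)
    return sum(abs(a - b) for a, b in zip(m_i, m_j)) / 200

# Three artworks: identical visitors, disjoint visitors, and a mixed pair.
print(visitor_distance([50, 30, 20], [50, 30, 20]))   # 0.0
print(visitor_distance([100, 0, 0], [0, 100, 0]))     # 1.0
print(visitor_distance([60, 40, 0], [40, 40, 20]))    # 0.2
```

        The division by 200 normalizes the result: since each model sums to 100, the sum of absolute differences is at most 200, reached exactly when the two visitors saw disjoint sets of artworks.
      </p>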
    </sec>
    <sec id="sec-3">
      <title>4. Data analysis</title>
      <p>Data collection and analysis is useful for museum
curators and staf members because it allows them to work
on the fruition of the exhibition itself, as well as on the
visitor flow. However, it also makes more refined
analyzes possible. Our system analyzes the video recorded
by the cameras positioned near each point of interest
(POI) (i.e., artwork) and from each frame. It can
identify the position in pixels of the badge and the distance
from the camera for each visitor positioned in front of
the POI up to a distance of 6 meters. By collecting and
triangulating the data from the diferent recordings, we
can track the path of each visitor in each room of the
museum. Some of the analyses that can be performed on
the data acquired by this system are:
• Temporal analysis regarding the time spent in
front of the artworks;
the artworks;
• Analysis of the trajectory of visitors in front of
• Spatial-temporal analyses between viewing
distance and time spent in front of an artwork;
• Heatmap of temporal data on visitor positions in
the room.</p>
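      <p>
        As one example of the analyses listed above, a positional heatmap can be built by binning museum-coordinate detections into a floor grid and accumulating the frame period CT per cell. A sketch in Python; the grid size and the sample detections are illustrative choices of ours:

```python
from collections import defaultdict

def heatmap(detections, cell=0.5):
    """Accumulate dwell time per floor cell.

    detections: iterable of (x, y, ct) tuples in museum coordinates,
    where ct is the time period of one frame. cell is the bin size in
    meters. Returns a dict mapping (col, row) grid indices to seconds.
    """
    grid = defaultdict(float)
    for x, y, ct in detections:
        grid[(int(x // cell), int(y // cell))] += ct
    return dict(grid)

# Two detections fall in one cell, one in another (CT = 1/24 s per frame).
ct = 1.0 / 24
demo = [(0.1, 0.2, ct), (0.3, 0.4, ct), (1.2, 0.2, ct)]
print(heatmap(demo))
```

        Feeding this function with the rows of the dist_positions view yields, per cell, the total time visitors spent there, which can then be rendered as a heatmap of the room.
      </p>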
      <p>The system also allows us to perform a more refined analysis because it allows facial identification. Therefore, it enables us to collect specific data about the visitor’s attention to the artwork. If the face is not identified by the system, we can assume that the visitor is distracted and, therefore, increase the granularity of the analysis.</p>
      <p>The possible future developments of the research work presented herein are manifold. First of all, there is the integration of the data collection system within a social recommender system [31, 32, 33]. This would allow us to assess the effective benefits of our system in terms of the inclusion of individuals from different backgrounds [34].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>G.</given-names> <surname>D'Aniello</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Gaeta</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>La Rocca</surname></string-name>,
          <article-title>KnowMIS-ABSA: an overview and a reference model for applications of sentiment analysis and aspect-based sentiment analysis</article-title>,
          <source>Artificial Intelligence Review</source>
          (<year>2022</year>).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>G.</given-names> <surname>D'Aniello</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Gaeta</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Orciuoli</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Sansonetti</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Sorgente</surname></string-name>,
          <article-title>Knowledge-based smart city service system</article-title>,
          <source>Electronics (Switzerland)</source>
          <volume>9</volume>
          (<year>2020</year>)
          <fpage>1</fpage>-<lpage>22</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sansonetti</surname>
          </string-name>
          ,
          <article-title>Point of interest recommendation based on social and linked open data</article-title>
          ,
          <source>Personal and Ubiquitous Computing</source>
          <volume>23</volume>
          (
          <year>2019</year>
          )
          <fpage>199</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sansonetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gasparetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Micarelli</surname>
          </string-name>
          ,
          <article-title>Cross-domain recommendation for enhancing cultural heritage experience</article-title>
          ,
          <source>in: Adjunct Publication of the 27th ACM UMAP Conference</source>
          , ACM, New York, NY, USA,
          <year>2019</year>
          , pp.
          <fpage>413</fpage>
          -
          <lpage>415</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fogli</surname>
          </string-name>
          , <string-name><given-names>G.</given-names> <surname>Sansonetti</surname></string-name>,
          <article-title>Exploiting semantics for context-aware itinerary recommendation</article-title>
          ,
          <source>Personal and Ubiquitous Computing</source>
          <volume>23</volume>
          (
          <year>2019</year>
          )
          <fpage>215</fpage>
          -
          <lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>D.</given-names> <surname>D'Agostino</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Gasparetti</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Micarelli</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Sansonetti</surname></string-name>,
          <article-title>A social context-aware recommender of itineraries between relevant points of interest</article-title>,
          <source>in: HCI International 2016</source>, volume
          <volume>618</volume>, Springer International Publishing, Cham,
          <year>2016</year>, pp.
          <fpage>354</fpage>-<lpage>359</lpage>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>