<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Smart Assistant for Visual Recognition of Painted Scenes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Federico Concone</string-name>
          <email>federico.concone@unipa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Giaconia</string-name>
          <email>roberto.giaconia@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Lo Re</string-name>
          <email>giuseppe.lore@unipa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Morana</string-name>
          <email>marco.morana@unipa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Joint Proceedings of the ACM IUI 2021 Workshops</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Smart Cities and Communities National Lab CINI - Consorzio Interuniversitario Nazionale per l'Informatica</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Palermo, Department of Engineering</institution>
          ,
          <addr-line>Viale delle Scienze, ed. 6, 90128, Palermo</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Nowadays, smart devices allow people to easily interact with the surrounding environment thanks to existing communication infrastructures, i.e., 3G/4G/5G or WiFi. In the context of a smart museum, the data shared by visitors can be used to provide innovative services aimed at improving their cultural experience. In this paper, we consider as a case study the painted wooden ceiling of the Sala Magna of Palazzo Chiaramonte in Palermo, Italy, and we present an intelligent system that visitors can use to automatically get a description of the scenes they are interested in by simply pointing their smartphones at them. As compared to traditional applications, this system completely eliminates the need for indoor positioning technologies, which are unfeasible in many scenarios as they can only be employed when museum items are physically distinguishable. An experimental analysis evaluated the performance of the system in terms of the accuracy of the recognition process, and the obtained results show its effectiveness in a real-world application scenario.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Human-Computer Interaction (HCI)</kwd>
        <kwd>Cultural Heritage</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Smart personal devices, such as smartphones
and tablets, have totally changed the way
people live. In addition to the traditional
calling and messaging capabilities, these devices
come with heterogeneous sensors that allow
users to collect and share information with
the surrounding environment, thus paving
the way to a new generation of applications
that would not otherwise be possible [1, 2].</p>
      <p>For instance, people with visual and hearing
impairments may rely on specific services
provided by their smartphones to move in
public spaces, thus helping them to live in a
more independent way [3].</p>
      <p>In this context, social and cultural
inclusion represents a primary goal for many
innovative IT systems. A smart museum, for
example, is a suitable scenario for this type
of solution, because a wide range of services
can be provided to users in order to create
a more inclusive and informative cultural
experience, both physically and virtually. In
an ideal smart museum, visitors should be
able to get suggestions about the items to
visit, as well as personalized descriptions
of the works according to their individual
knowledge (like a guided tour would do).</p>
      <p>Artificial Intelligence (AI) methods provide
invaluable help in realizing these services, while
also being non-invasive and preserving the visitor's
freedom to follow or ignore the suggestions provided.
This last aspect is fundamental to any such system,
since it makes the exposition more appealing and
captivating to visitors.</p>
      <p>For these reasons, modern museums have recently
started a process of deep transformation, developing
new interactive interfaces and public spaces to meet
the challenges raised by the technological revolution
of recent years. Moreover, it should not be ignored
that cultural sites attract a broad range of visitors,
from the youngest to the oldest. The younger
generations are accustomed to new technologies, while
others might feel alienated in a smart environment
that encourages them to use their personal devices.
Hence, an inclusive smart museum should allow all
visitors to access its services, working towards
making them easily accessible to everyone.</p>
      <p>In this paper, we address a scenario in which
users of a smart museum can exploit ad-hoc intelligent
services aimed at improving their visit experience by
providing personalized descriptions of the artworks of
interest.</p>
      <p>The way in which the specific contents for
different users are selected [4, 5] is out of the
scope of this work, which instead focuses on the
intelligent system responsible for supporting the
visit through the use of smart mobile devices. Our
case study is the Sala Magna of Palazzo Chiaramonte
(also known as Steri) in Palermo, Italy, which is
characterized by a unique wooden ceiling from the 14th
century containing a variety of paintings. Visitors
can use our system to take a picture of any painting
they are interested in and get the corresponding
description, completely eliminating the need for
indoor positioning technologies, which are unfeasible
in our case study as they can only be used when museum
items are physically separated.</p>
      <p>Our solution exploits the visual information of
the various illustrated scenes together with AI
techniques. In particular, a Convolutional Neural
Network (CNN) is employed to synthesize a set of
features from a given input picture, and a
distance-based classification algorithm is used for
the final inference. Such a method normally has low
accuracy on single images, so the proposed solution
relies on the contextual classification of multiple
images. This expedient exploits the overall
characteristics of the paintings and has not been
investigated by other works in the literature, since
most image recognition algorithms try to recognize
generic objects inside artworks.</p>
      <p>The remainder of the paper is organized as
follows: related work is outlined in Section 2.
Section 3 introduces the case study, while Section 4
describes the proposed system as well as the
algorithms behind the AI recognition modules. The
experimental results are shown and discussed in
Section 5. Finally, conclusions and future works
follow in Section 6.</p>
    </sec>
    <sec id="sec-1a">
      <title>2. Related Work</title>
      <p>Technologies for smart environments [6, 7], and
smart museums in particular, have been deeply
researched in recent years. As a result, a variety of
solutions have been proposed to address specific
challenges. In this paper, we focus on an intelligent
IT system capable of supporting computer-assisted
guides. Early technologies employed in museums usually
consisted of recorded descriptions, played by a small
sound player through headphones, which required the
visitors to follow a predetermined tour or to manually
input a code for every work. More recently,
researchers suggested replacing these systems with
applications directly usable through the users' smart
devices.</p>
      <p>In this context, in order to make the apps
easier to use, several works focused on specific
issues such as activity recognition [8] or indoor
positioning, which allow the system to detect the
user's movements and position within the museum so as
to provide him/her with ad-hoc information (e.g., a
description of the nearby items).</p>
      <p>Indoor positioning systems typically exploit
visitors' smart devices in order to interact with
Bluetooth Low Energy (BLE) beacons, sensor
networks [9], or the WiFi infrastructure. Although
these technologies are becoming increasingly accurate,
energy efficient and affordable [10], there are some
types of exhibitions in which different items are
necessarily placed close to each other, making the
positioning system unable to identify the real
interests of the user.</p>
      <p>In this paper, we address this scenario and
present a different approach that can be deployed in a
variety of exhibitions, allowing smart touring where
indoor positioning is not feasible.</p>
      <p>The issue of uniquely referring to items placed
close to each other is frequently addressed by means
of Quick Response (QR) codes that identify each single
work of art. The approach described in [11], for
instance, exploits mobile phone apps and QR codes to
enable smart visits of a museum. Once an item has been
identified by means of its QR code, the visitors are
provided with an augmented description, including
text, images, sounds, and videos. This system, as well
as many others in the literature [12, 13], is
extremely easy to use, although users generally prefer
to recognize the artwork itself rather than QR codes
or numbered codes, as discussed in [14, 15]. To this
aim, various image processing algorithms and methods
have been proposed for this kind of task, including
the Scale-Invariant Feature Transform (SIFT) and its
faster but less accurate counterpart, Speeded Up
Robust Features (SURF) [16]. For example, [17]
describes a SIFT-based artwork recognition subsystem
that operates in two steps: at first, the images
captured from the camera are pre-processed to remove
blurred frames; then, the SIFT features are extracted
and classified. Moreover, the authors exploit the
visitor's location in order to select the nearby
artworks, greatly reducing the computational effort of
the matching process while at the same time increasing
the recognition accuracy.</p>
      <p>Despite having similar performance to SIFT,
Convolutional Neural Networks are more suited for
large-scale Content-Based Image Retrieval (CBIR), and
they are generally faster at extracting features from
an input [18]. An enhanced museum guidance system
based on mobile phones and on-device object
recognition was proposed in [19]; here, given the
limited performance of the devices [20], the
recognition was performed by a single-layer perceptron
neural network. Such a solution can be used together
with indoor positioning, but it is outdated, as smart
devices are now capable of running larger neural
networks, and CNNs should be preferred for large-scale
CBIR. In [21], the authors rely on a CNN to extract
the relevant features of a specific artwork, while the
classification is treated as a regression problem.
CNNs are particularly suitable for synthesizing a
small set of values (features) from a query image,
which can then be compared to a database of examples
in order to find images with similar contents. This
enables fast image classification within a restricted
dataset, even using a pre-trained network for
inference.</p>
      <p>More recent works are exploring the possibility
of using CNNs in combination with other well-known
techniques. For example, [22] discusses two novel
hybrid recognition systems that combine Bags of Visual
Words and CNNs in a cooperative and synergistic way in
order to achieve better content recognition.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Application Scenario</title>
      <sec id="sec-2-1">
        <title>The system presented here has been designed with the aim of providing a non-invasive solution to enjoy Palazzo Chiaramonte in Palermo, Italy.</title>
      <p>One of the places most worth visiting in
Palazzo Chiaramonte is the Sala Magna, with
its 14th-century wooden ceiling (see Fig. 1),
which measures 23 × 8 meters and is
composed of 24 beams, parallel to the short sides
of the hall, perpendicularly divided by a long
fake beam. On all sides of the beams and
ceiling coffers, various scenes are illustrated,
some telling mythological and hunting
stories, others with plants and patterns.</p>
        <p>The large number of visitors the Sala
Magna receives every year, each of whom
surely owns a smart device, and the many
frescoes present in the hall make this place
a perfect scenario to test intelligent
applications aiming to improve the visitors’
experience [23] during the tour. For example,
information gathered from personal devices
may be used to alert the visitors about the
influx of crowds for a specific artwork, so
allowing them to plan the tours according to
their personal needs.</p>
        <p>In the scenario addressed in this paper,
visitors are interested in knowing the stories
painted on the various parts of the ceiling,
stories that are very close to each other and
hardly distinguishable. Even with the
assistance of the tour guides or other tools (such
as QR codes), this characteristic makes it
very difficult for the visitors to find the scene
of interest, unless they manually count the
beams. Our idea is to let visitors exploit their
smart devices to locate a specific scene, select
it, and obtain an augmented description, e.g.,
by means of 3D images, stories or videos
telling of the scene. Moreover, once an item
has been recognized, descriptions can be
easily transferred to a “totem” located in
the Sala Magna, that is a smart touchscreen
enabling users to enjoy the scenes of interest
in a more comfortable way. This is very
important, for instance, to people that are
not accustomed to prolonged interactions
with small smart devices [14].</p>
      <p>Such an alternative is also very useful for
visiting groups, as it allows both individual
touring on smartphones and tour guides'
presentations at the multimedia totems.</p>
      <p>Figure 2: Overview of the recognition
pipeline: take photo, select pixel, crops
generation, triplets extraction, features
extraction, classification.</p>
    </sec>
    <sec id="sec-3">
      <title>4. System Overview</title>
      <p>The recognition system is based on a
client-server architecture in which two different
kinds of clients are available. The first is the
totem (i.e., the touch-screen monitor), where
visitors can browse the artworks and their
descriptions in a very clear way. The second client
is a mobile app that runs on the visitors' personal
smart devices, as well as on other smartphones and
tablets provided by the museum. In addition to the
functionalities provided by the monitor, the mobile
software application includes a scene recognition
service that can also be used in combination with the
other kind of client, i.e., by sharing the identifier
of the recognized scene with the touch-screen totem.
This enables visitors to select an object of interest
through their mobile devices and then show the
results on the totem, thus enhancing their experience
within the museum. This architecture is much lighter
than three-level solutions based, for instance, on
the fog paradigm [24, 25], and it is adequate for the
purposes of the proposed application.</p>
      <p>In the remainder of this section we present the
main phases of the recognition procedure, summarized
in Fig. 2.</p>
      <sec id="sec-3-1">
        <title>4.1. Crops Generation</title>
        <p>Visitors select a scene of interest by
pointing at a specific region (e.g., an object or
item) using the touchscreen of the smart device. The
picture is then decomposed into two sets of crops of
different resolutions, namely 512 x 512 and
336 x 336 pixels. Each set consists of five regions,
S = {C, L, R, T, B}: the first is centered (C) on the
pixel chosen by the user, while the others are
selected at the left (L), right (R), top (T), and
bottom (B) of the central one.</p>
        <p>While the center crops might seem sufficient
to classify the location selected by the visitor,
using adjacent crops and different resolutions
increases the recognition performance, as will be
discussed in Section 5.2. The obtained crops are sent
to the server for the last three steps, i.e., triplet
extraction, feature extraction, and classification.</p>
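        <p>A minimal sketch of this decomposition is
shown below; the half-size offset used for the four
side crops is an assumption, since the paper does not
state the exact displacement.</p>
        <preformat>
# Sketch of the crop-generation step: five crops per resolution, centered
# on the selected pixel (C) and shifted left/right/top/bottom (assumed
# displacement: half the crop size).
from PIL import Image

def make_crops(photo_path, x, y, sizes=(512, 336)):
    img = Image.open(photo_path)
    crops = {}
    for s in sizes:
        h = s // 2
        offsets = {"C": (0, 0), "L": (-h, 0), "R": (h, 0),
                   "T": (0, -h), "B": (0, h)}
        for name, (dx, dy) in offsets.items():
            cx, cy = x + dx, y + dy
            # box = (left, upper, right, lower) around the shifted center
            crops["c%d_%s" % (s, name)] = img.crop((cx - h, cy - h,
                                                    cx + h, cy + h))
    return crops  # ten crops overall: five per resolution
        </preformat>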
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Triplets and Feature Extraction</title>
        <p>In order to obtain a compact representation
of the input data, the server groups the crops into
two horizontal/vertical triplets, T_1 = (L, C, R) and
T_2 = (T, C, B), and builds feature vectors by using
a convolutional neural network. The adoption of a CNN
in our system is justified by the intrinsic nature of
this category of neural networks, which are
specialized in processing data with a grid-like
topology, such as an image. A CNN is typically made
up of three different kinds of layers, named
convolutional, pooling, and fully-connected
layers [26].</p>
        <p>The convolutional layer aims to synthesize
the spatial relationships between pixels of the input
image, without losing features which are critical for
a good prediction. This layer uses a combination of
linear and non-linear operations, i.e., the
convolution operation and an activation function. The
convolution is an element-wise product between the
input image and a kernel, and its output is a feature
map containing different characteristics of the input
image. The more kernels are used during the analysis,
the more feature maps are generated. The feature maps
are then evaluated by means of a nonlinear activation
function, such as the sigmoid, hyperbolic tangent, or
rectified linear unit (ReLU) functions.</p>
        <p>The pooling layer performs a downsampling
operation with the aim of reducing the spatial
dimensionality of the feature maps generated at the
previous layer and, at the same time, extracting
dominant features invariant to rotation and position.
One of the most adopted operations at this stage is
max pooling, which applies a filter of size n × n to
the feature maps and extracts the maximum value for
each of them.</p>
        <p>Finally, the pooling layer output is
transformed into a one-dimensional array and mapped
by a subset of fully-connected layers to the final
outputs of the network. Hence, these layers return
class scores for classification and regression
purposes, using the same principles of the
traditional Multi-Layer Perceptron (MLP) neural
network.</p>
        <p>If such layers are not included in the neural
network, then the CNN can be used to extract a set of
features from the input image [27]. As our goal is
only to extract features, the system leverages a
convolutional neural network in which the last two
dense layers and the soft-max function were
discarded. The underlying idea is to describe the
content and shapes of the graphical content of the
crops with a high level of abstraction, so that it is
possible to make comparisons by only using the
feature vectors.</p>
        <p>To be more specific, the network model we
adopted (see Fig. 3) is made of 13 convolution
layers, with 3-by-3 kernel filters, each one using a
ReLU activation function. It also employs 5 pooling
layers, specifically 2-by-2 max-pooling, to achieve
downsampling. The original network [28] uses 3 dense
layers and a final soft-max function, specifically
targeted towards the 1000 classes of the ImageNet
dataset; our network only keeps the first dense
layer, so its output is a set of 4096 features.</p>
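        <p>The sketch below reproduces this setup with
the pre-trained VGG-16 available in torchvision,
truncating the classifier to its first dense layer;
the 224 x 224 input size and normalization constants
are the standard ImageNet ones, assumed here.</p>
        <preformat>
# Sketch of the feature extractor: VGG-16 [28] with the last two dense
# layers and the soft-max removed, yielding 4096 features per crop.
import torch
import torchvision.models as models
import torchvision.transforms as T

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
# keep only the first fully-connected layer of the classifier
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:1])
vgg.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(pil_crop):
    with torch.no_grad():
        batch = preprocess(pil_crop.convert("RGB")).unsqueeze(0)
        return vgg(batch).squeeze(0).numpy()  # 4096-element vector
        </preformat>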
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Classification</title>
        <p>The complete classification process consists
of a two-step procedure. At first, the 4096-element
feature vectors obtained from each crop in the
triplets are classified according to a
minimum-distance approach [29]. Generally, given a
training set X = {x_1, x_2, ..., x_n} and the
corresponding label set Y = {y_1, y_2, ..., y_n}, a
new point x* of unknown class y* is assigned to the
class y_i in Y if the distance d(x*, x_i), with x_i
in X, is smaller than the distances to all other
points in the training set:</p>
        <p>y* = y_i if d(x*, x_i) &lt; d(x*, x_j), for
all j ≠ i, j = 1, ..., n. (1)</p>
        <p>Here, each feature vector associated with a
crop in T_k is classified as belonging to a class y_i
in Y by using the Frobenius distance [30]; thus, the
output of the first classification step is
represented by two new triplets, C_1 and C_2,
containing the predicted class for each crop.</p>
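        <p>For the feature vectors used here, the
Frobenius distance reduces to the Euclidean norm of
the difference, so this first step can be sketched as
a nearest-neighbor lookup over the training features
(a simplified sketch, not the exact server
implementation):</p>
        <preformat>
# Sketch of the minimum-distance classifier of Eq. (1): the query vector
# is assigned the label of the closest training vector.
import numpy as np

def classify(query, train_X, train_y):
    # train_X: (n, 4096) training features; train_y: the n labels
    d = np.linalg.norm(train_X - query, axis=1)  # distances d(x*, x_i)
    return train_y[int(np.argmin(d))]            # label of the minimum
        </preformat>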
        <p>The second phase aims to evaluate whether the
crops in a triplet T_k were classified as depicting
the same object. To accomplish this task, we
introduced the concepts of strong confidence and weak
confidence for the classification. The first is
achieved when every element of the triplet is
associated with the same object; the latter occurs
when only two elements are associated with the same
class.</p>
        <p>Firstly, the system checks for strong
confidence in any of the triplets and, if none is
found, it tries for weak confidence. If none of the
triplets achieves strong or weak confidence, the
visitor is asked to take a new picture of the
artwork. This process is performed in near real-time,
therefore it does not slow down the visiting
experience, but rather improves the system
precision.</p>
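        <p>The confidence check can be summarized as the
following voting scheme (a sketch of the logic
described above, not the authors' code):</p>
        <preformat>
# Strong confidence: all three crops of a triplet agree on the class;
# weak confidence: exactly two agree; otherwise the triplet is discarded.
from collections import Counter

def triplet_confidence(labels):
    label, votes = Counter(labels).most_common(1)[0]
    if votes == 3:
        return label, "strong"
    if votes == 2:
        return label, "weak"
    return None, None

def decide(triplets):
    # first look for a strong match in any triplet, then fall back to weak
    for wanted in ("strong", "weak"):
        for t in triplets:
            label, level = triplet_confidence(t)
            if level == wanted:
                return label
    return None  # nothing reached: ask the visitor for a new picture
        </preformat>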
      </sec>
    </sec>
    <sec id="sec-3a">
      <title>5. Experimental Evaluation</title>
      <p>The effectiveness of the proposed solution was
evaluated through several experiments focused on the
case study described in Section 3.</p>
      <sec id="sec-3a-1">
        <title>5.1. Experimental Setup</title>
        <p>The experiments were carried out using three
different models of smartphones and one tablet (see
Table 1), provided with the client software
application. The mobile app (supporting both Android
and iOS) assists visitors during all the recognition
phases described so far, i.e., it allows them to
observe and select a scene in the wooden ceiling,
automatically extracts the crops from the picture,
and sends them to the classification server.
Moreover, the software application is also able to
manage the information received from the server, thus
enabling visitors to read the description of the
scene on the device itself or on the touch-screen
monitor.</p>
        <p>Fig. 4 shows three examples of the
smartphone-side application. The leftmost image
represents the interface visitors can use to log in
to the service. The image in the center is the main
interface of the application, which allows visitors
to zoom in or tap on a detail of the scene of
interest and starts the remote recognition process.
Finally, the rightmost image shows the information
provided to users for the recognized element; in
particular, in addition to the title and the
description of the scene, visitors are provided with
high-resolution pictures of the details of interest
that are difficult to distinguish when standing at a
great distance from the ceiling.</p>
        <p>While the CNN is pre-trained on ImageNet, the
dataset used to train the classification algorithm
and perform the experiments was captured using the
devices listed in Table 1. Each class in the dataset
corresponds to a part of the ceiling, captured in
three pictures taken from different positions
(Fig. 5), for each of which five regions of interest
have been manually selected (Fig. 6). We considered
100 different relevant locations within the ceiling,
thus the number of images obtained from each device
is 1500.</p>
        <p>Figure 6: Example of manual cropping of the
photo to create the dataset.</p>
        <p>The number of locations is calculated by
taking into account the specific structure of our
case study. The ceiling is made of 24 beams, each of
which is divided into two parts by a central beam;
for each part, we defined two locations: one for the
side of the beam facing East, and one for that facing
West (i.e., 4 locations for each beam). In addition
to these 96 classes, the East and West walls of the
hall also have paintings similar to the sides of the
beams, so 4 further locations were considered.</p>
        <p>Early experiments were conducted by randomly
splitting such a dataset into training and testing
sets; we will refer to this case as the mixed
dataset. Then, other tests were performed by dividing
the dataset so that the training and testing sets
contained images acquired from different cameras.
This separate dataset is closer to the application
scenario, because every visitor will use devices with
camera settings and characteristics that might differ
from the ones used to train the system.</p>
      </sec>
      <sec id="sec-3a-2">
        <title>5.2. Recognition Results</title>
        <p>The first set of experiments aimed to assess
the system performance when considering the
recognition of a single (central) crop from the mixed
dataset. The mean accuracy achieved in this case is
shown in Fig. 7-a, where each bar represents a
different ratio of training and testing samples.
Since our classifier has to be able to distinguish
between 100 classes (scenes), the results indicate
that a larger number of training samples is required
in order to obtain satisfactory accuracy values. In
this case, the images for both the training and the
testing sets were captured using the same devices,
which is not representative of a real scenario
involving hundreds of testing devices equipped with
different cameras.</p>
        <p>For this reason, we also evaluated the single
crop classification by using the separate dataset.
Results in Fig. 7-b show a similar trend as the
train-to-test ratio increases, but also highlight a
significantly lower mean accuracy, thus demonstrating
the inadequacy of a single crop to drive the
classification process.</p>
        <p>The next set of experiments concerns the
evaluation of the proposed three-crops classification
system, in which two different classification
confidence settings are introduced, namely weak and
strong. Performances were evaluated both in terms of
accuracy and in terms of the percentage of crops
discarded because they did not reach a weak or strong
classification confidence. It is worth noting that
discarded images imply that visitors would be asked
to take the photos again; thus, the lower this value,
the higher the usability of the system.</p>
        <p>Fig. 8 shows the results obtained on the
separate dataset. By observing the mean accuracy
values (Fig. 8-a), we can notice a significant
improvement as compared to the single crop
classification (Fig. 7-b). Unfortunately, Fig. 8-b
indicates that, as the number of samples in the
training set varies, the number of discarded crops
remains stably high. This is mainly due to not having
enough images in the dataset. For this reason, the
next set of experiments aimed to evaluate the impact
of data augmentation. This technique is used to
artificially increase the number of samples in the
training set in order to extract additional
information [31]. In our system, data augmentation is
performed by creating crops of different resolutions
of the original locations of interest so as to obtain
new samples.</p>
        <p>Results in Fig. 9 show that data augmentation
causes an increase in accuracy and a decrease in the
number of discarded crops. The best weak
classification results improved from 56% accuracy and
47% discarded queries to 69% accuracy and 41%
discarded queries. The strong classification improved
less, but still noticeably: from 78% accuracy and 80%
discarded queries to 88% accuracy and 68% discarded
queries. This confirms that data augmentation
enhances the performance of the classifier without
requiring the involvement of new capturing
devices.</p>
        <p>The last set of experiments aimed to assess
the classification procedure that will actually be
performed by the smart museum application: instead of
using only one triplet of crops, users' devices will
send to the classification server all ten crops
introduced in Section 4.1, divided into 4
horizontal/vertical triplets.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusions</title>
      <p>In this paper, we presented an intelligent
system that allows the visitors of a smart museum to
automatically obtain a description of the painted
scenes they are interested in. The adoption of CNNs
allows the system to extract features from 10
different regions of the photo taken by a visitor,
taking advantage of the shape of the items in our
specific scenario. Experimental results showed the
performance of the recognition system in terms of
accuracy and percentage of discarded crops, thus
proving its effectiveness in a real-world application
scenario.</p>
      <p>The system will soon be deployed to support the
visitors of Palazzo Chiaramonte. This will enable us
to collect a greater number of query examples
(captured from a wide range of devices), which could
also be exploited to further refine the model.</p>
    </sec>
    <sec id="sec-5">
      <title>7. Acknowledgments</title>
      <sec id="sec-5-1">
        <title>This research is partially funded by the</title>
        <p>Project VASARI of Italian MIUR (PNR
2015-2020, DD MIUR n. 2511).
adoption of CNNs allows to extract features
from 10 diferent regions of the photo taken
by a visitor, taking advantage of the shape
of the items in our specific scenario.
Experimental results showed the performance of
the recognition system in terms of accuracy
and percentage of discarded crops, thus
proving its efectiveness in a real-world
application scenario.</p>
        <p>The system will be soon deployed to
support the visitors of Palazzo Chiaramonte.</p>
        <p>This will enable us to collect a greater
number of query examples (captured from a
wide range of devices), which could also be
exploited to further refine the model.
user experience in museums, in: 2017 age and Music, Springer Berlin
HeidelArtificial Intelligence and Signal Pro- berg, Berlin, Heidelberg, 2010, pp. 170–
cessing Conference (AISP), IEEE, Pis- 183.
cataway, NJ, USA, 2017, pp. 195–200. [17] S. Alletto, R. Cucchiara, G. Del Fiore,
doi:10.1109/AISP.2017.8324080. L. Mainetti, V. Mighali, L. Patrono,
[11] T. Octavia, A. Handojo, W. T. KUSUMA, G. Serra, An indoor location-aware
T. C. YUNANTO, R. L. THIOSDOR, system for an iot-based smart
muet al., Museum interactive edutainment seum, IEEE Internet of Things Journal
using mobile phone and qr code, vol- 3 (2016) 244–253. doi:10.1109/JIOT.
ume 15-17 June, 2019, pp. 815–819. 2015.2506258.
[12] M. S. Patil, M. S. Limbekar, M. A. Mane, [18] V. D. Sachdeva, J. Baber, M.
BakhtM. N. Potnis, Smart guide–an approach yar, I. Ullah, W. Noor, A. Basit,
Perto the smart museum using android, In- formance evaluation of sift and
conternational Research Journal of Engi- volutional neural network for image
neering and Technology 5 (2018). retrieval, Performance Evaluation 8
[13] S. Ali, B. Koleva, B. Bedwell, S. Ben- (2017).</p>
        <p>ford, Deepening visitor engagement [19] P. Föckler, T. Zeidler, B. Brombach,
with museum exhibits through hand- E. Bruns, O. Bimber, Phoneguide:
Mucrafted visual markers, in: Proceedings seum guidance supported by on-device
of the 2018 Designing Interactive Sys- object recognition on mobile phones,
tems Conference, DIS ’18, Association in: Proceedings of the 4th
Internafor Computing Machinery, New York, tional Conference on Mobile and
UbiqNY, USA, 2018, p. 523–534. doi:10. uitous Multimedia, MUM ’05,
Associ1145/3196709.3196786. ation for Computing Machinery, New
[14] L. Wein, Visual recognition in mu- York, NY, USA, 2005, p. 3–10. doi:10.
seum guide apps: Do visitors want it?, 1145/1149488.1149490.
in: Proceedings of the SIGCHI Con- [20] S. Gaglio, G. Lo Re, G. Martorella,
ference on Human Factors in Comput- D. Peri, Dc4cd: A platform for
dising Systems, CHI ’14, Association for tributed computing on constrained
deComputing Machinery, New York, NY, vices, ACM Transactions on Embedded
USA, 2014, p. 635–638. doi:10.1145/ Computing Systems 17 (2017). doi:10.
2556288.2557270. 1145/3105923.
[15] M. K. Schultz, A case study on the [21] G. Taverriti, S. Lombini, L. Seidenari,
appropriateness of using quick re- M. Bertini, A. Del Bimbo, Real-time
sponse (qr) codes in libraries and wearable computer vision system for
museums, Library &amp; Information improved museum experience, in:
ProScience Research 35 (2013) 207 – 215. ceedings of the 24th ACM International
doi:https://doi.org/10.1016/j. Conference on Multimedia, MM ’16,
lisr.2013.03.002. Association for Computing Machinery,
[16] B. Ruf, E. Kokiopoulou, M. Detyniecki, New York, NY, USA, 2016, p. 703–704.</p>
        <p>Mobile museum guide based on fast doi:10.1145/2964284.2973813.
sift recognition, in: M. Detyniecki, [22] G. Ioannakis, L. Bampis, A.
KoutU. Leiner, A. Nürnberger (Eds.), Adap- soudis, Exploiting artificial intelligence
tive Multimedia Retrieval. Identifying, for digitally enriched museum visits,
Summarizing, and Recommending Im- Journal of Cultural Heritage 42 (2020)
171 – 180. doi:https://doi.org/10.</p>
        <p>1016/j.culher.2019.07.019.
[23] G. Lo Re, M. Morana, M. Ortolani,</p>
        <p>Improving user experience via motion
sensors in an ambient intelligence
scenario, 2013, pp. 29–34.
[24] F. Concone, G. Lo Re, M. Morana, A
fog-based application for human
activity recognition using personal smart
devices, ACM Transactions on Internet
Technology 19 (2019). doi:10.1145/
3266142.
[25] F. Concone, G. Lo Re, M. Morana,</p>
        <p>Smcp: a secure mobile
crowdsensing protocol for fog-based
applications, Human-centric Computing and
Information Sciences 10 (2020). doi:10.</p>
        <p>1186/s13673-020-00232-y.
[26] R. Yamashita, M. Nishio, R. K. G. Do,</p>
        <p>K. Togashi, Convolutional neural
networks: an overview and application
in radiology, Insights into imaging 9
(2018) 611–629.
[27] T. Bluche, H. Ney, C. Kermorvant,</p>
        <p>Feature extraction with convolutional
neural networks for handwritten word
recognition, in: 2013 12th
International Conference on Document
Analysis and Recognition, IEEE, IEEE,
Piscataway, NJ, USA, 2013, pp. 285–289.
[28] K. Simonyan, A. Zisserman, Very
deep convolutional networks for
largescale image recognition, arXiv preprint
arXiv:1409.1556 (2014).
[29] P. Kamavisdar, S. Saluja, S. Agrawal,</p>
        <p>A survey on image classification
approaches and techniques, International
Journal of Advanced Research in
Computer and Communication Engineering
2 (2013) 1005–1009.
[30] G. H. Golub, et al., Cf vanloan,
matrix computations, The Johns Hopkins
(1996).
[31] A. Mikołajczyk, M. Grochowski,</p>
        <p>Data augmentation for improving</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>deep learning in image classification problem</article-title>
          , in: 2018
          <source>International Interdisciplinary PhD Workshop</source>
          (IIPhDW), IEEE, Piscataway, NJ, USA,
          <year>2018</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>122</lpage>
          . doi:
          <volume>10</volume>
          .1109/IIPHDW.
          <year>2018</year>
          .
          <volume>8388338</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>