<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Smart Assistant for Visual Recognition of Painted Scenes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Federico Concone</string-name>
          <email>federico.concone@unipa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Giaconia</string-name>
          <email>roberto.giaconia@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Lo Re</string-name>
          <email>giuseppe.lore@unipa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Morana</string-name>
          <email>marco.morana@unipa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Joint Proceedings of the ACM IUI 2021 Workshops</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Smart Cities and Communities National Lab CINI - Consorzio Interuniversitario Nazionale per l'Informatica</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Palermo, Department of Engineering</institution>
          ,
          <addr-line>Viale delle Scienze, ed. 6, 90128, Palermo</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Nowadays, smart devices allow people to easily interact with the surrounding environment thanks to existing communication infrastructures, i.e., 3G/4G/5G or WiFi. In the context of a smart museum, the data shared by visitors can be used to provide innovative services aimed at improving their cultural experience. In this paper, we consider as a case study the painted wooden ceiling of the Sala Magna of Palazzo Chiaramonte in Palermo, Italy, and we present an intelligent system that visitors can use to automatically get a description of the scenes they are interested in by simply pointing their smartphones at them. As compared to traditional applications, this system completely eliminates the need for indoor positioning technologies, which are unfeasible in many scenarios as they can only be employed when museum items are physically distinguishable. An experimental analysis evaluated the performance of the system in terms of the accuracy of the recognition process, and the obtained results show its effectiveness in a real-world application scenario.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Human-Computer Interaction (HCI)</kwd>
        <kwd>Cultural Heritage</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Smart personal devices, such as smartphones
and tablets, have totally changed the way
people live. In addition to the traditional
calling and messaging capabilities, these devices
come with heterogeneous sensors that allow
users to collect and share information with
the surrounding environment, thus paving
the way to a new generation of applications
that would not otherwise be possible [1, 2].</p>
      <p>For instance, people with visual and hearing
impairments may rely on specific services
provided by their smartphones to move in
public spaces, thus helping them to live in a
more independent way [3].</p>
      <p>In this context, social and cultural
inclusion represents a primary goal for many
innovative IT systems. A smart museum, for
example, is a suitable scenario for this type
of solution, because a wide range of services
can be provided to users in order to create
a more inclusive and informative cultural
experience, both physically and virtually. In
an ideal smart museum, visitors should be
able to get suggestions about the items to
visit, as well as personalized descriptions
of the works according to their individual
knowledge (like a guided tour would do).</p>
      <p>Artificial Intelligence (AI) methods provide
invaluable help in realizing these services, while
also being non-invasive and preserving the visitor's
freedom to follow or ignore the suggestions provided.
This last aspect is fundamental to any such system,
since it makes the exposition more appealing and
captivating to visitors.</p>
      <p>For these reasons, modern museums have recently
started a process of deep transformation, developing
new interactive interfaces and public spaces to meet
the challenges raised by the technological revolution
of recent years. Moreover, it should not be ignored
that cultural sites attract a broad range of visitors,
from the youngest to the oldest. The younger
generations are accustomed to new technologies, while
others might feel alienated in a smart environment
that encourages them to use their personal devices.
Hence, an inclusive smart museum should allow all
visitors to access its services, working towards
making them easily accessible to everyone.</p>
      <p>In this paper, we address a scenario in which
users of a smart museum can exploit ad-hoc intelligent
services aimed at improving their visit experience by
providing personalized descriptions of the artworks of
interest.</p>
      <p>The way in which the specific contents for
different users are selected [4, 5] is out of the
scope of this work, which instead focuses on the
intelligent system responsible for supporting the
visit through the use of smart mobile devices. Our
case study is the Sala Magna of Palazzo Chiaramonte
(also known as Steri) in Palermo, Italy, which is
characterized by a unique wooden ceiling from the 14th
century containing a variety of paintings. Visitors
can use our system to take a picture of any painting
they are interested in and get the corresponding
description, completely eliminating the need for
indoor positioning technologies, which are unfeasible
in our case study as they can only be used when museum
items are physically separated.</p>
      <p>Our solution exploits the visual information of
the various illustrated scenes together with AI
techniques. In particular, a Convolutional Neural
Network (CNN) is employed to synthesize a set of
features from a given input picture, and a
distance-based classification algorithm is used for
the final inference. Such a method normally has low
accuracy on single images, so the proposed solution
relies on the contextual classification of multiple
images. This expedient exploits the overall
characteristics of the paintings and has not been
investigated by other works in the literature, since
most image recognition algorithms try to recognize
generic objects inside artworks.</p>
      <p>The remainder of the paper is organized as
follows: related work is outlined in Section 2.
Section 3 introduces the case study, while Section 4
describes the proposed system as well as the
algorithms behind the AI recognition modules. The
experimental results are shown and discussed in
Section 5. Finally, conclusions and future works
follow in Section 6.</p>
    </sec>
    <sec id="sec-1a">
      <title>2. Related Work</title>
      <p>Technologies for smart environments [6, 7], and
smart museums in particular, have been deeply
researched in recent years. As a result, a variety of
solutions have been proposed to address specific
challenges. In this paper, we focus on an intelligent
IT system capable of supporting computer-assisted
guides. Early technologies employed in museums usually
consisted of recorded descriptions, played by a small
sound player through headphones, which required the
visitors to follow a predetermined tour or to manually
input a code for every work. More recently,
researchers suggested replacing these systems with
applications directly usable through the users' smart
devices.</p>
      <p>In this context, in order to make the apps
easier to use, several works focused on specific
issues such as activity recognition [8] or indoor
positioning, which allow the system to detect the
user's movements and position within the museum so as
to provide him/her with ad-hoc information (e.g., a
description of the nearby items).</p>
      <p>Indoor positioning systems typically exploit
visitors' smart devices in order to interact with
Bluetooth Low Energy (BLE) beacons, sensor
networks [9], or the WiFi infrastructure. Although
these technologies are becoming increasingly accurate,
energy efficient and affordable [10], there are some
types of exhibitions in which different items are
necessarily placed close to each other, making the
positioning system unable to identify the real
interests of the user.</p>
      <p>In this paper, we address this scenario and
present a different approach that can be deployed in a
variety of exhibitions, allowing smart touring where
indoor positioning is not feasible.</p>
      <p>The issue of uniquely referring to items placed
close to each other is frequently addressed by means
of Quick Response (QR) codes that identify each single
work of art. The approach described in [11], for
instance, exploits mobile phone apps and QR codes to
enable smart visits of a museum. Once an item has been
identified by means of its QR code, the visitors are
provided with an augmented description, including
text, images, sounds, and videos. This system, as well
as many others in the literature [12, 13], is
extremely easy to use, although users generally prefer
to recognize the artwork itself rather than QR codes
or numbered codes, as discussed in [14, 15]. To this
aim, various image processing algorithms and methods
have been proposed for this kind of task, including
the Scale-Invariant Feature Transform (SIFT) and its
faster but less accurate counterpart, Speeded Up
Robust Features (SURF) [16]. For example, [17]
describes a SIFT-based artwork recognition subsystem
that operates in two steps: at first, the images
captured from the camera are pre-processed to remove
blurred frames; then, the SIFT features are extracted
and classified. Moreover, the authors exploit the
visitor's location in order to select the nearby
artworks, greatly reducing the computational effort of
the matching process while at the same time increasing
the recognition accuracy.</p>
      <p>Despite having similar performance to SIFT,
Convolutional Neural Networks are more suited for
large-scale Content-Based Image Retrieval (CBIR), and
they are generally faster at extracting features from
an input [18]. An enhanced museum guidance system
based on mobile phones and on-device object
recognition was proposed in [19]; here, given the
limited performance of the devices [20], the
recognition was performed by a single-layer perceptron
neural network. Such a solution can be used together
with indoor positioning, but it is outdated, as smart
devices are now capable of running larger neural
networks, and CNNs should be preferred for large-scale
CBIR. In [21], the authors rely on a CNN to extract
the relevant features of a specific artwork, while the
classification is treated as a regression problem.
CNNs are particularly suitable for synthesizing a
small set of values (features) from a query image,
which can then be compared to a database of examples
in order to find images with similar contents. This
enables fast image classification within a restricted
dataset, even using a pre-trained network for
inference.</p>
      <p>More recent works are exploring the possibility
of using CNNs in combination with other well-known
techniques. For example, [22] discusses two novel
hybrid recognition systems that combine Bags of Visual
Words and CNNs in a cooperative and synergistic way in
order to achieve better content recognition.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Application Scenario</title>
      <sec id="sec-2-1">
        <title>The system presented here has been designed with the aim of providing a non-invasive solution to enjoy Palazzo Chiaramonte in Palermo, Italy.</title>
      <p>One of the places most worth visiting in
Palazzo Chiaramonte is the Sala Magna, with
its 14th-century wooden ceiling (see Fig. 1),
which measures 23 × 8 meters and is
composed of 24 beams, parallel to the short sides
of the hall, perpendicularly divided by a long
fake beam. On all sides of the beams and
ceiling coffers, various scenes are illustrated,
some telling mythological and hunting
stories, others with plants and patterns.</p>
        <p>The large number of visitors the Sala
Magna receives every year, each of whom
surely owns a smart device, and the many
frescoes present in the hall make this place
a perfect scenario to test intelligent
applications aiming to improve the visitors’
experience [23] during the tour. For example,
information gathered from personal devices
may be used to alert the visitors about the
influx of crowds for a specific artwork, so
allowing them to plan the tours according to
their personal needs.</p>
        <p>In the scenario addressed in this paper,
visitors are interested in knowing the stories
painted on the various parts of the ceiling,
stories that are very close to each other and
hardly distinguishable. Even with the
assistance of the tour guides or other tools (such
as QR codes), this characteristic makes it
very difficult for the visitors to find the scene
of interest, unless they manually count the
beams. Our idea is to let visitors exploit their
smart devices to locate a specific scene, select
it, and obtain an augmented description, e.g.,
by means of 3D images, stories or videos
telling of the scene. Moreover, once an item
has been recognized, descriptions can be
easily transferred to a “totem” located in
the Sala Magna, that is a smart touchscreen
enabling users to enjoy the scenes of interest
in a more comfortable way. This is very
important, for instance, to people that are
not accustomed to prolonged interactions
with small smart devices [14].</p>
      <p>Such an alternative is also very useful for
visiting groups, as it allows both individual
touring on smartphones and tour guides'
presentations at the multimedia totems.</p>
      <p>Figure 2: Overview of the recognition
pipeline: take photo, select pixel, crops
generation, triplets extraction, features
extraction, classification.</p>
    </sec>
    <sec id="sec-3">
      <title>4. System Overview</title>
      <p>The recognition system is based on a
client-server architecture in which two different
kinds of clients are available. The first is the
totem (i.e., the touch-screen monitor), where
visitors can browse the artworks and their
descriptions in a very clear way. The second client
is a mobile app that runs on the visitors' personal
smart devices, as well as on other smartphones and
tablets provided by the museum. In addition to the
functionalities provided by the monitor, the mobile
software application includes a scene recognition
service that can also be used in combination with the
other kind of client, i.e., by sharing the identifier
of the recognized scene with the touch-screen totem.
This enables visitors to select an object of interest
through their mobile devices and then show the
results on the totem, thus enhancing their experience
within the museum. This architecture is much lighter
than three-level solutions based, for instance, on
the fog paradigm [24, 25], and it is adequate for the
purposes of the proposed application.</p>
      <p>In the remainder of this section we present the
main phases of the recognition procedure, summarized
in Fig. 2.</p>
      <sec id="sec-3-1">
        <title>4.1. Crops Generation</title>
        <p>Visitors select a scene of interest by
pointing at a specific region (e.g., an object or
item) using the touchscreen of the smart device. The
picture is then decomposed into two sets of crops of
different resolutions, namely 512 x 512 and
336 x 336 pixels. Each set consists of five regions,
S = {C, L, R, T, B}: the first is centered (C) on the
pixel chosen by the user, while the others are
selected at the left (L), right (R), top (T), and
bottom (B) of the central one.</p>
        <p>While the center crops might seem sufficient
to classify the location selected by the visitor,
using adjacent crops and different resolutions
increases the recognition performance, as will be
discussed in Section 5.2. The obtained crops are sent
to the server for the last three steps, i.e., triplet
extraction, feature extraction, and classification.</p>
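        <p>A minimal sketch of this decomposition is
shown below; the half-size offset used for the four
side crops is an assumption, since the paper does not
state the exact displacement.</p>
        <preformat>
# Sketch of the crop-generation step: five crops per resolution, centered
# on the selected pixel (C) and shifted left/right/top/bottom (assumed
# displacement: half the crop size).
from PIL import Image

def make_crops(photo_path, x, y, sizes=(512, 336)):
    img = Image.open(photo_path)
    crops = {}
    for s in sizes:
        h = s // 2
        offsets = {"C": (0, 0), "L": (-h, 0), "R": (h, 0),
                   "T": (0, -h), "B": (0, h)}
        for name, (dx, dy) in offsets.items():
            cx, cy = x + dx, y + dy
            # box = (left, upper, right, lower) around the shifted center
            crops["c%d_%s" % (s, name)] = img.crop((cx - h, cy - h,
                                                    cx + h, cy + h))
    return crops  # ten crops overall: five per resolution
        </preformat>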
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Triplets and Feature Extraction</title>
        <p>In order to obtain a compact representation
of the input data, the server groups the crops into
two horizontal/vertical triplets, T_1 = (L, C, R) and
T_2 = (T, C, B), and builds feature vectors by using
a convolutional neural network. The adoption of a CNN
in our system is justified by the intrinsic nature of
this category of neural networks, which are
specialized in processing data with a grid-like
topology, such as an image. A CNN is typically made
up of three different kinds of layers, named
convolutional, pooling, and fully-connected
layers [26].</p>
        <p>The convolutional layer aims to synthesize
the spatial relationships between pixels of the input
image, without losing features which are critical for
a good prediction. This layer uses a combination of
linear and non-linear operations, i.e., the
convolution operation and an activation function. The
convolution is an element-wise product between the
input image and a kernel, and its output is a feature
map containing different characteristics of the input
image. The more kernels are used during the analysis,
the more feature maps are generated. The feature maps
are then evaluated by means of a nonlinear activation
function, such as the sigmoid, hyperbolic tangent, or
rectified linear unit (ReLU) functions.</p>
        <p>The pooling layer performs a downsampling
operation with the aim of reducing the spatial
dimensionality of the feature maps generated at the
previous layer and, at the same time, extracting
dominant features invariant to rotation and position.
One of the most adopted operations at this stage is
max pooling, which applies a filter of size n × n to
the feature maps and extracts the maximum value for
each of them.</p>
        <p>Finally, the pooling layer output is
transformed into a one-dimensional array and mapped
by a subset of fully-connected layers to the final
outputs of the network. Hence, these layers return
class scores for classification and regression
purposes, using the same principles of the
traditional Multi-Layer Perceptron (MLP) neural
network.</p>
        <p>If such layers are not included in the neural
network, then the CNN can be used to extract a set of
features from the input image [27]. As our goal is
only to extract features, the system leverages a
convolutional neural network in which the last two
dense layers and the soft-max function were
discarded. The underlying idea is to describe the
content and shapes of the graphical content of the
crops with a high level of abstraction, so that it is
possible to make comparisons by only using the
feature vectors.</p>
        <p>To be more specific, the network model we
adopted (see Fig. 3) is made of 13 convolution
layers, with 3-by-3 kernel filters, each one using a
ReLU activation function. It also employs 5 pooling
layers, specifically 2-by-2 max-pooling, to achieve
downsampling. The original network [28] uses 3 dense
layers and a final soft-max function, specifically
targeted towards the 1000 classes of the ImageNet
dataset; our network only keeps the first dense
layer, so its output is a set of 4096 features.</p>
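        <p>The sketch below reproduces this setup with
the pre-trained VGG-16 available in torchvision,
truncating the classifier to its first dense layer;
the 224 x 224 input size and normalization constants
are the standard ImageNet ones, assumed here.</p>
        <preformat>
# Sketch of the feature extractor: VGG-16 [28] with the last two dense
# layers and the soft-max removed, yielding 4096 features per crop.
import torch
import torchvision.models as models
import torchvision.transforms as T

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
# keep only the first fully-connected layer of the classifier
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:1])
vgg.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(pil_crop):
    with torch.no_grad():
        batch = preprocess(pil_crop.convert("RGB")).unsqueeze(0)
        return vgg(batch).squeeze(0).numpy()  # 4096-element vector
        </preformat>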
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Classification</title>
        <p>The complete classification process consists
of a two-step procedure. At first, the 4096-element
feature vectors obtained from each crop in the
triplets are classified according to a
minimum-distance approach [29]. Generally, given a
training set X = {x_1, x_2, ..., x_n} and the
corresponding label set Y = {y_1, y_2, ..., y_n}, a
new point x* of unknown class y* is assigned to the
class y_i in Y if the distance d(x*, x_i), with x_i
in X, is smaller than the distances to all other
points in the training set:</p>
        <p>y* = y_i if d(x*, x_i) &lt; d(x*, x_j), for
all j ≠ i, j = 1, ..., n. (1)</p>
        <p>Here, each feature vector associated with a
crop in T_k is classified as belonging to a class y_i
in Y by using the Frobenius distance [30]; thus, the
output of the first classification step is
represented by two new triplets, C_1 and C_2,
containing the predicted class for each crop.</p>
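        <p>For the feature vectors used here, the
Frobenius distance reduces to the Euclidean norm of
the difference, so this first step can be sketched as
a nearest-neighbor lookup over the training features
(a simplified sketch, not the exact server
implementation):</p>
        <preformat>
# Sketch of the minimum-distance classifier of Eq. (1): the query vector
# is assigned the label of the closest training vector.
import numpy as np

def classify(query, train_X, train_y):
    # train_X: (n, 4096) training features; train_y: the n labels
    d = np.linalg.norm(train_X - query, axis=1)  # distances d(x*, x_i)
    return train_y[int(np.argmin(d))]            # label of the minimum
        </preformat>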
        <p>The second phase aims to evaluate whether the
crops in a triplet T_k were classified as depicting
the same object. To accomplish this task, we
introduced the concepts of strong confidence and weak
confidence for the classification. The first is
achieved when every element of the triplet is
associated with the same object; the latter occurs
when only two elements are associated with the same
class.</p>
        <p>Firstly, the system checks for strong
confidence in any of the triplets and, if none is
found, it tries for weak confidence. If none of the
triplets achieves strong or weak confidence, the
visitor is asked to take a new picture of the
artwork. This process is performed in near real-time,
therefore it does not slow down the visiting
experience, but rather improves the system
precision.</p>
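        <p>The confidence check can be summarized as the
following voting scheme (a sketch of the logic
described above, not the authors' code):</p>
        <preformat>
# Strong confidence: all three crops of a triplet agree on the class;
# weak confidence: exactly two agree; otherwise the triplet is discarded.
from collections import Counter

def triplet_confidence(labels):
    label, votes = Counter(labels).most_common(1)[0]
    if votes == 3:
        return label, "strong"
    if votes == 2:
        return label, "weak"
    return None, None

def decide(triplets):
    # first look for a strong match in any triplet, then fall back to weak
    for wanted in ("strong", "weak"):
        for t in triplets:
            label, level = triplet_confidence(t)
            if level == wanted:
                return label
    return None  # nothing reached: ask the visitor for a new picture
        </preformat>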
      </sec>
    </sec>
    <sec id="sec-3a">
      <title>5. Experimental Evaluation</title>
      <p>The effectiveness of the proposed solution was
evaluated through several experiments focused on the
case study described in Section 3.</p>
      <sec id="sec-3a-1">
        <title>5.1. Experimental Setup</title>
        <p>The experiments were carried out using three
different models of smartphones and one tablet (see
Table 1), provided with the client software
application. The mobile app (supporting both Android
and iOS) assists visitors during all the recognition
phases described so far, i.e., it allows them to
observe and select a scene in the wooden ceiling,
automatically extracts the crops from the picture,
and sends them to the classification server.
Moreover, the software application is also able to
manage the information received from the server, thus
enabling visitors to read the description of the
scene on the device itself or on the touch-screen
monitor.</p>
        <p>Fig. 4 shows three examples of the
smartphone-side application. The leftmost image
represents the interface visitors can use to log in
to the service. The image in the center is the main
interface of the application, which allows visitors
to zoom in or tap on a detail of the scene of
interest and starts the remote recognition process.
Finally, the rightmost image shows the information
provided to users for the recognized element; in
particular, in addition to the title and the
description of the scene, visitors are provided with
high-resolution pictures of the details of interest
that are difficult to distinguish when standing at a
great distance from the ceiling.</p>
        <p>While the CNN is pre-trained on ImageNet, the
dataset used to train the classification algorithm
and perform the experiments was captured using the
devices listed in Table 1. Each class in the dataset
corresponds to a part of the ceiling, captured in
three pictures taken from different positions
(Fig. 5), for each of which five regions of interest
have been manually selected (Fig. 6). We considered
100 different relevant locations within the ceiling,
thus the number of images obtained from each device
is 1500.</p>
        <p>Figure 6: Example of manual cropping of the
photo to create the dataset.</p>
        <p>The number of locations is calculated by
taking into account the specific structure of our
case study. The ceiling is made of 24 beams, each of
which is divided into two parts by a central beam;
for each part, we defined two locations: one for the
side of the beam facing East, and one for that facing
West (i.e., 4 locations for each beam). In addition
to these 96 classes, the East and West walls of the
hall also have paintings similar to the sides of the
beams, so 4 further locations were considered.</p>
        <p>Early experiments were conducted by randomly
splitting such a dataset into training and testing
sets; we will refer to this case as the mixed
dataset. Then, other tests were performed by dividing
the dataset so that the training and testing sets
contained images acquired from different cameras.
This separate dataset is closer to the application
scenario, because every visitor will use devices with
camera settings and characteristics that might differ
from the ones used to train the system.</p>
      </sec>
      <sec id="sec-3a-2">
        <title>5.2. Recognition Results</title>
        <p>The first set of experiments aimed to assess
the system performance when considering the
recognition of a single (central) crop from the mixed
dataset. The mean accuracy achieved in this case is
shown in Fig. 7-a, where each bar represents a
different ratio of training and testing samples.
Since our classifier has to be able to distinguish
between 100 classes (scenes), the results indicate
that a larger number of training samples is required
in order to obtain satisfactory accuracy values. In
this case, the images for both the training and the
testing sets were captured using the same devices,
which is not representative of a real scenario
involving hundreds of testing devices equipped with
different cameras.</p>
        <p>For this reason, we also evaluated the single
crop classification by using the separate dataset.
Results in Fig. 7-b show a similar trend as the
train-to-test ratio increases, but also highlight a
significantly lower mean accuracy, thus demonstrating
the inadequacy of a single crop to drive the
classification process.</p>
        <p>The next set of experiments concerns the
evaluation of the proposed three-crops classification
system, in which two different classification
confidence settings are introduced, namely weak and
strong. Performances were evaluated both in terms of
accuracy and in terms of the percentage of crops
discarded because they did not reach a weak or strong
classification confidence. It is worth noting that
discarded images imply that visitors would be asked
to take the photos again; thus, the lower this value,
the higher the usability of the system.</p>
        <p>Fig. 8 shows the results obtained on the
separate dataset. By observing the mean accuracy
values (Fig. 8-a), we can notice a significant
improvement as compared to the single crop
classification (Fig. 7-b). Unfortunately, Fig. 8-b
indicates that, as the number of samples in the
training set varies, the number of discarded crops
remains stably high. This is mainly due to not having
enough images in the dataset. For this reason, the
next set of experiments aimed to evaluate the impact
of data augmentation. This technique is used to
artificially increase the number of samples in the
training set in order to extract additional
information [31]. In our system, data augmentation is
performed by creating crops of different resolutions
of the original locations of interest so as to obtain
new samples.</p>
        <p>Results in Fig. 9 show that data augmentation
causes an increase in accuracy and a decrease in the
number of discarded crops. The best weak
classification results improved from 56% accuracy and
47% discarded queries to 69% accuracy and 41%
discarded queries. The strong classification improved
less, but still noticeably: from 78% accuracy and 80%
discarded queries to 88% accuracy and 68% discarded
queries. This confirms that data augmentation
enhances the performance of the classifier without
requiring the involvement of new capturing
devices.</p>
        <p>The last set of experiments aimed to assess
the classification procedure that will actually be
performed by the smart museum application: instead of
using only one triplet of crops, users' devices will
send to the classification server all ten crops
introduced in Section 4.1, divided into 4
horizontal/vertical triplets.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusions</title>
      <p>In this paper, we presented an intelligent
system that allows the visitors of a smart museum to
automatically obtain a description of the painted
scenes they are interested in. The adoption of CNNs
allows the system to extract features from 10
different regions of the photo taken by a visitor,
taking advantage of the shape of the items in our
specific scenario. Experimental results showed the
performance of the recognition system in terms of
accuracy and percentage of discarded crops, thus
proving its effectiveness in a real-world application
scenario.</p>
      <p>The system will soon be deployed to support the
visitors of Palazzo Chiaramonte. This will enable us
to collect a greater number of query examples
(captured from a wide range of devices), which could
also be exploited to further refine the model.</p>
    </sec>
    <sec id="sec-5">
      <title>7. Acknowledgments</title>
      <sec id="sec-5-1">
        <title>This research is partially funded by the</title>
        <p>Project VASARI of Italian MIUR (PNR
2015-2020, DD MIUR n. 2511).
adoption of CNNs allows to extract features
from 10 diferent regions of the photo taken
by a visitor, taking advantage of the shape
of the items in our specific scenario.
Experimental results showed the performance of
the recognition system in terms of accuracy
and percentage of discarded crops, thus
proving its efectiveness in a real-world
application scenario.</p>
        <p>The system will be soon deployed to
support the visitors of Palazzo Chiaramonte.</p>
        <p>This will enable us to collect a greater
number of query examples (captured from a
wide range of devices), which could also be
exploited to further refine the model.
user experience in museums, in: 2017 age and Music, Springer Berlin
HeidelArtificial Intelligence and Signal Pro- berg, Berlin, Heidelberg, 2010, pp. 170–
cessing Conference (AISP), IEEE, Pis- 183.
cataway, NJ, USA, 2017, pp. 195–200. [17] S. Alletto, R. Cucchiara, G. Del Fiore,
doi:10.1109/AISP.2017.8324080. L. Mainetti, V. Mighali, L. Patrono,
[11] T. Octavia, A. Handojo, W. T. KUSUMA, G. Serra, An indoor location-aware
T. C. YUNANTO, R. L. THIOSDOR, system for an iot-based smart
muet al., Museum interactive edutainment seum, IEEE Internet of Things Journal
using mobile phone and qr code, vol- 3 (2016) 244–253. doi:10.1109/JIOT.
ume 15-17 June, 2019, pp. 815–819. 2015.2506258.
[12] M. S. Patil, M. S. Limbekar, M. A. Mane, [18] V. D. Sachdeva, J. Baber, M.
BakhtM. N. Potnis, Smart guide–an approach yar, I. Ullah, W. Noor, A. Basit,
Perto the smart museum using android, In- formance evaluation of sift and
conternational Research Journal of Engi- volutional neural network for image
neering and Technology 5 (2018). retrieval, Performance Evaluation 8
[13] S. Ali, B. Koleva, B. Bedwell, S. Ben- (2017).</p>
        <p>ford, Deepening visitor engagement [19] P. Föckler, T. Zeidler, B. Brombach,
with museum exhibits through hand- E. Bruns, O. Bimber, Phoneguide:
Mucrafted visual markers, in: Proceedings seum guidance supported by on-device
of the 2018 Designing Interactive Sys- object recognition on mobile phones,
tems Conference, DIS ’18, Association in: Proceedings of the 4th
Internafor Computing Machinery, New York, tional Conference on Mobile and
UbiqNY, USA, 2018, p. 523–534. doi:10. uitous Multimedia, MUM ’05,
Associ1145/3196709.3196786. ation for Computing Machinery, New
[14] L. Wein, Visual recognition in mu- York, NY, USA, 2005, p. 3–10. doi:10.
seum guide apps: Do visitors want it?, 1145/1149488.1149490.
in: Proceedings of the SIGCHI Con- [20] S. Gaglio, G. Lo Re, G. Martorella,
ference on Human Factors in Comput- D. Peri, Dc4cd: A platform for
dising Systems, CHI ’14, Association for tributed computing on constrained
deComputing Machinery, New York, NY, vices, ACM Transactions on Embedded
USA, 2014, p. 635–638. doi:10.1145/ Computing Systems 17 (2017). doi:10.
2556288.2557270. 1145/3105923.
[15] M. K. Schultz, A case study on the [21] G. Taverriti, S. Lombini, L. Seidenari,
appropriateness of using quick re- M. Bertini, A. Del Bimbo, Real-time
sponse (qr) codes in libraries and wearable computer vision system for
museums, Library &amp; Information improved museum experience, in:
ProScience Research 35 (2013) 207 – 215. ceedings of the 24th ACM International
doi:https://doi.org/10.1016/j. Conference on Multimedia, MM ’16,
lisr.2013.03.002. Association for Computing Machinery,
[16] B. Ruf, E. Kokiopoulou, M. Detyniecki, New York, NY, USA, 2016, p. 703–704.</p>
        <p>Mobile museum guide based on fast doi:10.1145/2964284.2973813.
sift recognition, in: M. Detyniecki, [22] G. Ioannakis, L. Bampis, A.
KoutU. Leiner, A. Nürnberger (Eds.), Adap- soudis, Exploiting artificial intelligence
tive Multimedia Retrieval. Identifying, for digitally enriched museum visits,
Summarizing, and Recommending Im- Journal of Cultural Heritage 42 (2020)
171 – 180. doi:https://doi.org/10.</p>
        <p>1016/j.culher.2019.07.019.
[23] G. Lo Re, M. Morana, M. Ortolani,</p>
        <p>Improving user experience via motion
sensors in an ambient intelligence
scenario, 2013, pp. 29–34.
[24] F. Concone, G. Lo Re, M. Morana, A
fog-based application for human
activity recognition using personal smart
devices, ACM Transactions on Internet
Technology 19 (2019). doi:10.1145/
3266142.
[25] F. Concone, G. Lo Re, M. Morana,</p>
        <p>Smcp: a secure mobile
crowdsensing protocol for fog-based
applications, Human-centric Computing and
Information Sciences 10 (2020). doi:10.</p>
        <p>1186/s13673-020-00232-y.
[26] R. Yamashita, M. Nishio, R. K. G. Do,</p>
        <p>K. Togashi, Convolutional neural
networks: an overview and application
in radiology, Insights into imaging 9
(2018) 611–629.
[27] T. Bluche, H. Ney, C. Kermorvant,</p>
        <p>Feature extraction with convolutional
neural networks for handwritten word
recognition, in: 2013 12th
International Conference on Document
Analysis and Recognition, IEEE, IEEE,
Piscataway, NJ, USA, 2013, pp. 285–289.
[28] K. Simonyan, A. Zisserman, Very
deep convolutional networks for
largescale image recognition, arXiv preprint
arXiv:1409.1556 (2014).
[29] P. Kamavisdar, S. Saluja, S. Agrawal,</p>
        <p>A survey on image classification
approaches and techniques, International
Journal of Advanced Research in
Computer and Communication Engineering
2 (2013) 1005–1009.
[30] G. H. Golub, et al., Cf vanloan,
matrix computations, The Johns Hopkins
(1996).
[31] A. Mikołajczyk, M. Grochowski,</p>
        <p>Data augmentation for improving</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>deep learning in image classification problem</article-title>
          , in: 2018
          <source>International Interdisciplinary PhD Workshop</source>
          (IIPhDW), IEEE, Piscataway, NJ, USA,
          <year>2018</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>122</lpage>
          . doi:
          <volume>10</volume>
          .1109/IIPHDW.
          <year>2018</year>
          .
          <volume>8388338</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>