<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CNN-based Classification of Car Images for Android Devices</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Iveta Mrázová</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgi Georgiev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Mathematics and Physics, Charles University</institution>
          ,
          <addr-line>Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The design of efficient yet robust methods for real-time image classification belongs to the hottest topics in contemporary AI, particularly in the case of mobile and edge devices. Various types of convolutional neural networks can contribute to solving this task. Especially the architectures proposed explicitly for mobile devices, e.g., MobileNet and EfficientNet, rank among the least time-consuming ones. This paper thoroughly reviews the structure, performance, and main characteristics of the considered network types. Based on the obtained results, we introduce a mobile-phone application to classify cars we might see on the street and to search for nearby car dealerships, e.g., to buy a car similar to the one of interest. The developed application involves the TensorFlow EfficientNet Lite model. Finally, we provide an outlook for a possible enhancement of the application with federated learning.</p>
      </abstract>
      <kwd-group>
        <kwd>image classification</kwd>
        <kwd>convolutional neural networks</kwd>
        <kwd>EfficientNet</kwd>
        <kwd>TensorFlow Lite models for mobile applications</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>Modern convolutional neural networks (CNNs) are known to beat human performance in many tasks. However, their state-of-the-art architectures require substantial computational resources. A natural question thus arises whether we can also benefit from the image-processing capabilities of CNNs when they are implemented on mobile devices.</p>
        <p>Recent Android products range from the Google Pixel 6 mobile phone equipped with the newest Google Tensor processor to mobile devices that use the Edge TPU (Tensor Processing Unit) chip. Two examples of Edge TPU devices are the Coral Dev Board and the Coral USB Accelerator.</p>
        <p>Although CNNs comprise a considerable number of neurons at different layers, the model benefits from weight sharing that keeps down the number of trainable parameters. With local receptive fields (i.e., rectangular filters), the CNNs scan the presented images to look for significant visual pattern features. This information is combined in subsequent layers to detect more complex higher-order features. The neurons' activities form the so-called feature maps representing the extracted knowledge in each layer. Alternating pooling layers blur the exact position of the features and allow for down-sampling of the feature maps.</p>
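        <p>To make the described building blocks concrete, the following minimal Keras sketch (an illustration of ours, not the paper's actual architecture) stacks shared-weight convolutions with alternating pooling layers:</p>
        <preformat><![CDATA[
from tensorflow.keras import layers, models

# A toy stack of the building blocks described above: shared-weight
# convolutions (local receptive fields) followed by pooling layers
# that down-sample the resulting feature maps.
toy_cnn = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu",
                  input_shape=(224, 224, 3)),     # 16 learned 3x3 filters
    layers.MaxPooling2D((2, 2)),                  # blurs exact feature positions
    layers.Conv2D(32, (3, 3), activation="relu"), # higher-order features
    layers.MaxPooling2D((2, 2)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(8, activation="softmax"),        # e.g., 8 body-style classes
])
]]></preformat>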
        <p>Our ultimate objective is to develop a mobile-phone application to classify cars we might see and to search for dealerships to rent or buy a similar car. Fig. 1 presents a snapshot illustrating the function of the application.</p>
        <fig id="fig1"><caption><p>Figure 1: A snapshot of the application running on the Pixel 3 virtual mobile device: the classification of a presented sports car is followed by searching for the closest dealerships offering similar cars in Poprad.</p></caption></fig>
        <p>Considering the limited hardware means of mobile devices, a crucial steppingstone in the application design represents the choice of an accurate, robust, and memory/time efficient network model for the CNN-based car classifier.</p>
        <p>As a part of our research, we tested the classification and robustness performance of 10 selected CNN models. The results indicate that the EfficientNet models are superior in all cases. More precisely, EfficientNetB5 turned out to offer the best compromise between accuracy, robustness, and size (see Section 4).</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <p>To find the best model satisfying the above-specified requirements, we have selected 10 candidate network models. The data comes from the Keras Applications page of the official Keras documentation [11].</p>
        <p>The InceptionV3 model [15] is characterized by a depth of 189 and 23.9 million parameters. InceptionV3 achieved the best results for its time, but today it gets easily outperformed even by smaller models.</p>
        <p>The InceptionResNet model replaces the concatenations of InceptionV3 with residual connections skipping the layers [16]. Inserting such shortcuts improves the network's ability to back-propagate errors across multiple layers. InceptionResNetV2 has a depth of 449 layers and 55.9 million parameters, which makes it one of the biggest models we dealt with. In this case, the size of the model and its residual Inception-like structure result in very good accuracy and robustness results.</p>
        <p>The Xception network splits full convolutional operators into depthwise and pointwise convolutions. The depthwise separable convolutions reduce the necessary computational costs almost ten times while only slightly reducing the accuracy compared to standard convolutions [17]. This led to a considerable drop in depth to 81. The number of parameters was, however, reduced just by 1 million (to 22.9 million). Still, despite the reduced number of layers and parameters, Xception usually performs slightly better than InceptionV3.</p>
        <p>To support feature reuse, the DenseNet model embraces an architecture connecting each convolutional layer to all its successors [18]. DenseNet121 belongs to the rather smaller models. It has just 8.1 million parameters and a depth of 242. We can clearly see the effect of the added connections from each layer to all its successors: the number of trainable parameters remains low even though the depth of the model is above average. In addition to reducing the number of network parameters, this approach further improves the efficiency of the network.</p>
        <p>The austere model of MobileNetV2 [19] exploits the so-called linear bottleneck layers to capture the function of the entire layer. The model also takes advantage of the so-called inverted residuals. In this case, several bottlenecks follow the input within a residual block and are enhanced by an expansion afterward. Utilizing the much smaller input and output dimensions for the shortcuts improves the efficiency of the inverted design considerably. MobileNetV2 has one of the smallest depths (105) and also the smallest number of parameters (3.5 million) of all the models in our selection. Considering its small size, MobileNetV2 is able to outperform bigger models in some of the tests.</p>
        <p>The NASNet approach automatically searches for the best network architecture considering the data at hand [20]. However, the learned image features can be transferred to other computer vision problems. NASNetMobile and NASNetLarge are two variants of the same model. They share the same structure and differ only in their size. They preceded the EfficientNet model family and also achieve worse results. Due to the depth of 533 and 88.9 million parameters, NASNetLarge may be too large for the small dataset we have and could easily get overfitted.</p>
        <p>The state-of-the-art family of the so-called EfficientNets [21] exploits the NASNet strategy. The baseline model of EfficientNetB0 and its variants B1 to B7, upscaled uniformly in all the network parameters (i.e., width, depth, or resolution), belong to the most accurate and memory-efficient CNNs now. EfficientNetB0 is bigger than MobileNetV2 but still has a very small size compared to the remaining models. The automated construction of EfficientNetB0 aims at finding the best possible network given the predefined operations. The bigger size of EfficientNetB5 and B7 leads to improved accuracy and noise robustness. B5 is characterized by a depth of 312 and 30.6 million parameters. B7 is, with its depth of 438 and 66.7 million parameters, the second largest model in our collection (behind NASNetLarge). EfficientNetB7 also performs the best.</p>
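        <p>All ten candidate checkpoints are available directly in Keras. Purely as an illustration (not part of the original study's code), the following sketch instantiates them and prints their parameter counts; weights=None skips downloading the ImageNet weights:</p>
        <preformat><![CDATA[
from tensorflow.keras import applications

# The ten considered architectures, as shipped with keras.applications.
candidates = {
    "MobileNetV2": applications.MobileNetV2,
    "DenseNet121": applications.DenseNet121,
    "InceptionV3": applications.InceptionV3,
    "Xception": applications.Xception,
    "InceptionResNetV2": applications.InceptionResNetV2,
    "NASNetMobile": applications.NASNetMobile,
    "NASNetLarge": applications.NASNetLarge,
    "EfficientNetB0": applications.EfficientNetB0,
    "EfficientNetB5": applications.EfficientNetB5,
    "EfficientNetB7": applications.EfficientNetB7,
}

for name, ctor in candidates.items():
    model = ctor(weights=None)  # architecture only, no ImageNet download
    print(f"{name}: {model.count_params() / 1e6:.1f}M parameters")
]]></preformat>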
      </sec>
      <sec id="sec-2-2">
        <title>Coral Dev Board is a single-board computer equipped</title>
        <p>with the Edge TPU coprocessor. Edge TPU is a chip
crafted specifically to accelerate machine learning
inference (MLI) for mobile CNN models [5]. Another example
of a device that uses the Edge TPU coprocessor is Coral
USB Accelerator. Its purpose is to enable or accelerate
MLI on other external devices. Both Coral Dev Board and
Coral USB Accelerator support TensorFlow Lite.
Further, the accelerator can cooperate with devices that run
Debian Linux, macOS, or Windows 10, even with another
single-board computer such as Raspberry Pi.
Unfortunately, most reviewed mobile devices are not powerful
enough to train deep neural networks.</p>
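        <p>Both devices share the TensorFlow Lite workflow. Purely as an illustration, a converted Lite model can be evaluated from Python as follows; on Coral hardware, the lightweight tflite_runtime package with an Edge TPU delegate would be used instead of tf.lite, and the file name is a hypothetical example:</p>
        <preformat><![CDATA[
import numpy as np
import tensorflow as tf

# Load a converted Lite model (hypothetical file name) and run inference.
interpreter = tf.lite.Interpreter(model_path="car_classifier.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

image = np.zeros((1, 224, 224, 3), dtype=np.float32)  # placeholder input
interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
probabilities = interpreter.get_tensor(out["index"])  # one score per class
]]></preformat>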
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Application Design</title>
      <sec id="sec-3-1">
        <p>This section highlights the main design principles for the planned Android classifier of cars' body styles. A viable implementation option would be to gather the input images and send them to a distant server via the Internet. The server would then evaluate the acquired images with a CNN and return the classification results to the Android application, which presents them to the user. This approach requires a stable Internet connection; without it, the application is out of order.</p>
        <p>To overcome this limit, we decided to classify the car images directly within the Android application by a built-in TensorFlow Lite CNN model. A working Internet connection is thus needed only to search for the best-scoring cars of the resulting type or the closest car dealerships offering these cars. On the other hand, the chosen neural network model has to be small enough to fit into the Android application. At the same time, the selected CNN must be as accurate and robust as possible (EfficientNetB5, in our case). Figure 3 outlines the application flow diagram.</p>
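        <p>For illustration, converting the trained network into the bundled Lite format could look as follows; this is a sketch assuming the trained Keras classifier in the variable model, and the optimization flag is our optional choice:</p>
        <preformat><![CDATA[
import tensorflow as tf

# Convert the trained Keras classifier to a TensorFlow Lite flatbuffer
# that can be bundled with the Android application as an asset.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional weight quantization
tflite_bytes = converter.convert()

with open("car_classifier.tflite", "wb") as f:
    f.write(tflite_bytes)
]]></preformat>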
        <sec id="sec-3-1-1">
          <title>3.1. The Form of the Employed Data</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>We used a variant of the Stanford Cars dataset [6] to</title>
        <p>train and test the respective CNN models. The modified
dataset consists of 2560 images of the size 224x224 that
belong to 8 diferent classes (Fig. 2) - BigCoupeOrSedan,
Hatchback, MuscleCar, PickUp, Van, SportsCar, SUV, and
Unknown. ’Unknown’ contains images with no
identifiable cars. Table 2 summarizes the class distribution for
the involved class labels. For training, we split the dataset
into batches of size 64 (i.e., 40 batches in total). 80% of
the images form the training set, 20% make up the test
set.</p>
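        <p>Such a split and batching can be reproduced, e.g., with the Keras dataset utility. The following sketch assumes a hypothetical directory cars/ with one sub-folder per class:</p>
        <preformat><![CDATA[
import tensorflow as tf

# 80/20 split with batches of 64, matching the setup described above.
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "cars/", image_size=(224, 224), batch_size=64,
    validation_split=0.2, subset="training", seed=42)
test_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "cars/", image_size=(224, 224), batch_size=64,
    validation_split=0.2, subset="validation", seed=42)
]]></preformat>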
        <p>For most of the classes, their pattern distributions are
comparable. The only exception is the class ’Unknown’.</p>
        <p>The results we have obtained for the performed tests, however, do not indicate significant overtraining of the CNN models. A reason for this effect could be the extensive data augmentation applied during training. In addition, we can consider so-called stratified sampling when generating the training, testing, or validation datasets in the future.</p>
        <p>A critical issue in machine learning consists in adequate data preprocessing. Preliminary experiments indicate, for example, that even the color of the vehicles might strongly affect the classification result. If the data contains specific cars only in one color, the trained model can pick that color as the distinguishing feature. Sometimes, this choice might correspond to particular brand colors, e.g., a red Ferrari or a blue Subaru.</p>
        <p>Other factors can also significantly affect the classification results, e.g., the car's angle in the photo. To limit the considerable probability of misclassification in such cases, we decided to augment the training data with car images enhanced by various transformations (e.g., corrupted by noise or taken from different perspectives).</p>
        <p>During training, CNNs extract features characteristic of the given class and then attempt to detect these features in the images provided for recall. Poorly trained networks can, however, fail to identify representative features in the data. Manufacturers often use, e.g., appealing body parts like headlights or the grille's shape for different types of vehicles they produce. Misguided networks sometimes prefer to choose such familiar design elements as vital for classification. We shall thus prepare the training data carefully to encourage an improved classifier performance. In the forthcoming section on supporting experiments, we will describe the employed data set in more detail.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Supporting Experiments</title>
      <p>We will use the above-specified dataset to test the performance of the considered CNN models: MobileNetV2, EfficientNetB0, EfficientNetB5, EfficientNetB7, NASNetMobile, NASNetLarge, InceptionV3, Xception, InceptionResNetV2, and DenseNet121. While we used Python to write the project for evaluating the experiments, we have implemented the example Android application in Java using the Android API and Android Studio version 4.1.3.</p>
      <p>To train and test the models, we resorted to the libraries TensorFlow 2.5.0-rc1 [9], TensorFlow Lite [10], and Keras 2.5.0 [11]. Keras can work directly with the ImageNet [8] checkpoints of the selected models. Further, we applied NumPy 1.19.5 [12] and Pandas [13, 14] to process the gathered data (to compute means, standard deviations, confidence intervals, etc.).</p>
      <sec id="sec-4-1">
        <title>4.1. The Accuracy Test</title>
        <p>To test the architectures for the achievable top-1 accuracy, we used 5-fold cross-validation (CV) over the modified Stanford Cars dataset (see Section 3.1): 20% of the original dataset patterns were randomly chosen for testing during each CV step, and the rest was used for training. To enhance the recall capabilities of the trained networks, we added an image augmentation layer to the considered models. This layer automatically adds random noise to the images and is active only during training. Further, the considered augmentations comprise a horizontal flip, rotation of up to 54 degrees, contrast with a factor set to 0.5, zoom with the height factor set to 0.15 (upper and lower zooming limits), and translation (height and width factors set to 0.15).</p>
        <p>During training, the image modifications were performed in place by means of a set of sequential image augmentation layers from the Keras library. Every training image has thus been randomly modified by all augmentation layers; there also exists a very small probability that an image is left without any modification. We shall, however, highlight that the augmentation layers can modify the same image differently in different training epochs. This effectively enlarges the involved training dataset several times: each epoch employs the same number of training patterns (of the same nature), yet the patterns themselves differ.</p>
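        <p>A minimal sketch of such a sequential augmentation stack with the factors quoted above follows; in TensorFlow 2.5 the layers live under tf.keras.layers.experimental.preprocessing, while newer releases expose them directly under tf.keras.layers:</p>
        <preformat><![CDATA[
from tensorflow import keras
from tensorflow.keras.layers.experimental import preprocessing

# Augmentation stack mirroring the factors quoted above; the layers are
# active only during training and act as the identity at inference time.
augmentation = keras.Sequential([
    preprocessing.RandomFlip("horizontal"),
    preprocessing.RandomRotation(0.15),          # 0.15 * 360 = 54 degrees
    preprocessing.RandomContrast(0.5),
    preprocessing.RandomZoom(0.15),              # height factor 0.15
    preprocessing.RandomTranslation(0.15, 0.15),
], name="augmentation")

inputs = keras.Input(shape=(224, 224, 3))
x = augmentation(inputs)  # prepended to the classifier model
]]></preformat>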
        <p>We used a Gaussian filter implemented within the SciPy library (scipy.ndimage.gaussian_filter) with the standard deviation (sigma factor) set to 1.5 to create blurry images. The 2D Gaussian kernel we used is defined as:</p>
        <disp-formula><tex-math><![CDATA[ G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}, ]]></tex-math></disp-formula>
        <p>where x and y denote the distance from the origin (at the center (0, 0) of the filter) in the horizontal and vertical axes, and σ denotes the standard deviation.</p>
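        <p>For illustration, blurring one RGB image with the quoted sigma could look like this (a sketch of ours around the named SciPy routine):</p>
        <preformat><![CDATA[
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_image(image: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """Blur an HxWx3 RGB image with the given sigma."""
    # A per-axis sigma of (1.5, 1.5, 0) smooths along height and width
    # only, leaving the three color channels independent.
    return gaussian_filter(image, sigma=(sigma, sigma, 0))
]]></preformat>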
        <p>Image rotation was performed with the Keras RandomRotation layer, which involves the respective rotation matrices. In order to make the model noise-robust, we considered random rotations of up to 54 degrees. The newly appeared empty regions near the image borders are filled using reflection (by mirroring the closest image pixels).</p>
        <p>To apply horizontal flips, we used the Keras layer called RandomFlip, which performs flipping with a 50% chance. Similarly, using the methods from SciPy, we implemented also the remaining layers such as RandomContrast, RandomZoom, and RandomTranslation.</p>
        <p>We attached the augmentation layers to the beginning of the models, which allows modifying the images using GPU acceleration. The augmentation layers also become part of the SavedModel during serialization; they thus do not have to be created separately after loading the model [9].</p>
        <p>An alternative option would be to use a TensorFlow image augmentation pipeline. The augmentation methods are, namely, part of the tf.image library and can be applied to the dataset using the tf.data.Dataset.map function. The advantages of this approach are that the data for the following epoch can be prepared by the CPU in advance during the current epoch, and the model itself does not have to be further modified. The CNN would therefore be a little bit smaller, but the image augmentation pipeline has to be manually constructed every time before the training of the model starts.</p>
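        <p>A sketch of this alternative pipeline follows; the concrete tf.image operations and the placeholder arrays are illustrative assumptions of ours:</p>
        <preformat><![CDATA[
import numpy as np
import tensorflow as tf

# Placeholder arrays standing in for the real image dataset.
images = np.zeros((256, 224, 224, 3), dtype=np.float32)
labels = np.zeros((256,), dtype=np.int32)

def augment(image, label):
    # Per-example tf.image ops; the CPU prepares the next batch while
    # the accelerator trains on the current one.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
    return image, label

train_ds = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .shuffle(2048)
    .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(64)
    .prefetch(tf.data.AUTOTUNE)
)
]]></preformat>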
        <p>For the first five epochs, we trained just the last classifier layers of the networks. Afterward, we kept adjusting the top 10% of the networks' layers for an additional ten epochs (with the weights of the last classification layer fixed).</p>
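        <p>A schematic version of this two-stage schedule, reusing train_ds from the sketch above, could read as follows; the optimizer, loss, and the exact fraction of unfrozen layers are our illustrative choices:</p>
        <preformat><![CDATA[
import tensorflow as tf

# Stage 1: freeze the pretrained base and train only the classifier head.
base = tf.keras.applications.EfficientNetB5(
    include_top=False, weights="imagenet", pooling="avg")
base.trainable = False
model = tf.keras.Sequential([base, tf.keras.layers.Dense(8, activation="softmax")])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)

# Stage 2: unfreeze roughly the top 10% of the base layers, keep the
# classifier weights fixed, and continue for ten more epochs.
for layer in base.layers[-len(base.layers) // 10:]:
    layer.trainable = True
model.layers[-1].trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=10)
]]></preformat>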
        <p>The results are summarized in Table 3 together with the necessary memory requirements; the reported accuracy is averaged over all 5-fold cross-validation steps, and CI specifies the 95% confidence intervals. The table shows that EfficientNetB7, EfficientNetB5, and InceptionResNetV2 belong to the most accurate models.</p>
        <p>Some models, however, did not achieve the accuracy rates measured on the ImageNet dataset due to the limited number of training epochs, e.g., Xception (with a top-1 accuracy of 79.0% reported for ImageNet). In such a case, (re)training an additional top 20% to 30% of the network layers with early stopping and patience set to 10 indicated a significant improvement in the final top-1 accuracy (up to 85.7% for the EfficientNetB7 model and roughly 80% for the other networks).</p>
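        <p>Purely for illustration, the early-stopping setup could look as follows, reusing model and the datasets from the previous sketches; the monitored metric and the epoch cap are our assumptions:</p>
        <preformat><![CDATA[
import tensorflow as tf

# Stop when the validation metric has not improved for 10 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy", patience=10, restore_best_weights=True)
model.fit(train_ds, validation_data=test_ds,
          epochs=100,  # upper bound only; early stopping ends training sooner
          callbacks=[early_stop])
]]></preformat>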
        <sec id="sec-3-2-1">
          <title>4.2. The Robustness Tests</title>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>With these tests, we wanted to assess the resilience of</title>
        <p>the considered networks to noise corruption and various
image modifications. We prepared a unique validation
set of 287 images not used previously in training for
the experiments. The dataset contains images selected
both from the Stanford Cars dataset [6], and from the
1 The accuracy is averaged over all 5-fold cross validation
steps, CI specifies 95% confidence intervals.
1 The top-1 accuracy of 5 diferent checkpoints trained on the original data set with early stopping was considered, its
mean was calculated together with the corresponding 95 % confidence intervals (in %).
2 RGB stands for RGB noise. Random pixels were set to a random color value with probability p (%).</p>
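        <p>The RGB-noise corruption can be sketched in NumPy as follows; the function name and the RNG handling are ours:</p>
        <preformat><![CDATA[
import numpy as np

def add_rgb_noise(image: np.ndarray, p: float,
                  rng: np.random.Generator = np.random.default_rng()) -> np.ndarray:
    """Set each pixel of an HxWx3 uint8 image to a random color with probability p."""
    noisy = image.copy()
    mask = rng.random(image.shape[:2]) < p  # pick pixels independently
    noisy[mask] = rng.integers(0, 256, size=(int(mask.sum()), 3), dtype=np.uint8)
    return noisy
]]></preformat>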
        <p>Regarding the robustness to random pixel color changes (see, e.g., Fig. 4), EfficientNetB7 and EfficientNetB5 are the most robust but, at the same time, memory-intensive models. The smallest network, MobileNetV2, demonstrates, on the other hand, the worst results in this test.</p>
        <p>As an acceptable compromise, we can thus pick the EfficientNetB0 model with just 31.4MB memory requirements and results outperforming many more extensive networks, e.g., Xception, NASNetLarge, and even InceptionV3 (except for highly noised images).</p>
        <p>On the blurred and grayscale image sets, the models performed better than in the RGB noise test, and the MobileNetV2 model achieved even higher accuracy than EfficientNetB0. The other two EfficientNet models are significantly more accurate than both MobileNetV2 and EfficientNetB0. Yet, as the top-1 accuracy fell significantly for grayscale images compared to the original validation dataset, color proves to play an essential role in the classification process. Also, classification is more accurate for blurred images than for grayscale ones. Although blurring does not improve the overall accuracy of the networks, it can sometimes emphasize significant image characteristics and improve the classification. For example, let us consider the case illustrated in Fig. 5: the EfficientNetB5 model misclassified the shown vehicle as a sports car but correctly classified the blurred one as a hatchback. Blurring emphasized the edge separating the back of the vehicle from the background for the model, thus better indicating a hatchback.</p>
        <p>For many CNN architectures, cropping of images results in a higher top-1 accuracy compared to the original validation set. During training, the augmentation layer prepares the network for this test scenario, and cropping removes the image's noisy edges, thus focusing better on the main object, see, e.g., Fig. 6. On the other hand, cropping can also cause unwanted results, see, e.g., Fig. 7, where a hatchback was misclassified as an SUV after cropping caused the car to fill the whole image and appear more spacious.</p>
        <fig id="fig7"><caption><p>Figure 7: A negative cropping test done with EfficientNetB5. The upper image is the original one, correctly classified as a hatchback. The bottom one was cropped from all sides with a factor of 12 (cropping amount: width and height divided by 12), yet misclassified as an SUV.</p></caption></fig>
      </sec>
    </sec>
      <sec id="sec-3-5">
        <title>Contemporary Android devices are powerful enough for</title>
        <p>real-time image processing based on neural networks.</p>
        <p>In this paper, we studied the accuracy, evaluation speed,
robustness, and size of 10 considered CNN models and
selected the best-performing ones to upload to the
developed Android smartphone application.</p>
        <p>Table 5 summarizes the results obtained for the car dataset. According to top-1 accuracy, the most accurate models are EfficientNetB7 (84.5%), InceptionResNetV2 (81.8%), and EfficientNetB5 (80.4%). These models are the biggest ones in terms of their TensorFlow Lite size (243, 207, and 108MB, resp.), yet remain easy to train.</p>
        <p>After only 15 training epochs, the networks achieved adequate accuracy in the 5-fold CV test. The aforementioned networks are robust against random RGB noise and various image distortions. All of them can be converted to the TensorFlow Lite format, although EfficientNetB7 and InceptionResNetV2 do not fit into an Android application. Due to its size, the most accurate and noise-robust model suitable for an Android Studio application seems to be EfficientNetB5.</p>
        <p>Should the model be as small and as fast as possible, EfficientNetB0 might pose a better choice. It achieves satisfiable accuracy and robustness results, and it is the second smallest model among the considered ones. Further, it can achieve better results than many bigger models like Xception, NASNetLarge, and even InceptionV3. The main contribution of this study thus consists in:</p>
        <list list-type="bullet">
          <list-item><p>the development of a mobile Android application that facilitates the classification of car images according to the car's body style;</p></list-item>
          <list-item><p>the choice of the EfficientNetB5 model for the developed smartphone application. Extensive testing of the CNN models in question justifies this decision, which constitutes an acceptable compromise for all the criteria, particularly concerning the model's accuracy, robustness, and the required time and memory costs. Only for EfficientNetB5, we obtained good results (although not the very good ones) conforming to all three considered criteria. None of the other models meets all of them. The other candidate models, EfficientNetB7 and InceptionResNetV2, while achieving acceptable accuracy results, do not fit into the mobile application.</p></list-item>
        </list>
        <p>While working with the standard TensorFlow library, we did not encounter any significant problems. But to assess the viability of the networks for future on-device fine-tuning, we also measured the memory requirements of EfficientNetB5, EfficientNetB7, and InceptionResNetV2 during training (see Table 6; we averaged the obtained results over five training sessions). EfficientNetB5 consumed 5.3GB of GPU memory and 2.5GB of RAM during each training session.</p>
        <table-wrap id="tbl6">
          <label>Table 6</label>
          <caption><p>Memory usage during training</p></caption>
          <table>
            <thead>
              <tr><th>Model</th><th>GPU memory used (GB)</th><th>Max. used RAM (GB)<sup>1</sup></th><th>Model size (MB)<sup>2</sup></th></tr>
            </thead>
            <tbody>
              <tr><td>EfficientNetB5</td><td>5.3</td><td>2.5</td><td>251.5</td></tr>
              <tr><td>EfficientNetB7</td><td>5.4</td><td>2.7</td><td>556.7</td></tr>
              <tr><td>InceptionResNetV2</td><td>3.0</td><td>2.6</td><td>425.3</td></tr>
            </tbody>
          </table>
          <table-wrap-foot>
            <p><sup>1</sup> A possible bias can be caused by programs running in the background.</p>
            <p><sup>2</sup> The size of the model located on the GPU during training (including the training metadata).</p>
          </table-wrap-foot>
        </table-wrap>
        <p>Our computer needed 251.5MB of GPU memory to store EfficientNetB5 and its training metadata. The other two models were more demanding. Yet, even if we focused only on EfficientNetB5, we would need a cutting-edge category smartphone like the Galaxy S21 Ultra 5G equipped with 12GB of RAM to launch on-device training. The conversion to Lite reduces the models' size up to two times without reducing their accuracy.</p>
        <p>The TensorFlow Lite library was, on the other hand, built to operate on portable devices with low computational power. Originally, the Lite library did not allow on-device training of Lite models. Meanwhile, this limitation has been removed, and on-device training is already supported. Despite the well-written TensorFlow documentation, training of Lite models still remains quite cumbersome, at least from the programming point of view.</p>
        <p>Another limitation for our research comes from Android Studio, which we have used to implement the trained CNNs in mobile applications. It has an inbuilt size limit of 200MB for external files to be uploaded to a project. As a result of this restriction, we were not able to upload Lite models bigger than 200MB to mobile applications. The last limitation is that Android Studio does not officially support uploading of TensorFlow models saved in formats different from TensorFlow Lite. On the other hand, TensorFlow Lite supports also other operating systems such as iOS, so developers are not limited to writing their applications just for Android.</p>
        <p>Further research could enhance the developed application both with on-device training and with federated learning. Federated learning enables robust training across several decentralized edge devices or servers holding local data samples without sharing them. This way, the built-in CNN classifier could be retrained more easily on new data to keep the implementation up-to-date. Other intriguing options for future research comprise the area of architecture optimization for the trained networks and the involvement of nature-inspired heuristics in the process of CNN design.</p>
    </sec>
    <sec id="sec-ack">
      <title>Acknowledgments</title>
      <p>This research was supported by SVV project No. 260 575.</p>
      </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref1"><mixed-citation>[1] Xiaomi Czech, “Xiaomi Redmi 9A”, link: https://www.xiaomi.cz/xiaomi-redmi-9a-2gb-32gb-sky-blue/, accessed: 27 Jan. 2022.</mixed-citation></ref>
      <ref id="ref2"><mixed-citation>[2] Alza.cz, “Samsung Galaxy A52s 5G”, link: https://www.alza.cz/samsungu-galaxy-a52s-5g?dq=6667487, accessed: 27 Jan. 2022.</mixed-citation></ref>
      <ref id="ref3"><mixed-citation>[3] Samsung, “Samsung Galaxy S21 Ultra 5G 256GB”, link: https://www.samsung.com/cz/smartphones/galaxy-s21-5g/buy/, accessed: 27 Jan. 2022.</mixed-citation></ref>
      <ref id="ref4"><mixed-citation>[4] M. Gupta, “Google Tensor is a milestone for machine learning”, 19 Oct. 2021, accessed: 14 Nov. 2021, link: https://blog.google/products/pixel/introducing-google-tensor/.</mixed-citation></ref>
      <ref id="ref5"><mixed-citation>[5] Google LLC, “Edge TPU performance benchmarks”, 2020, accessed: 16 Nov. 2021, link: https://coral.ai/docs/edgetpu/benchmarks/.</mixed-citation></ref>
      <ref id="ref6"><mixed-citation>[6] J. Krause, M. Stark, J. Deng, and L. Fei-Fei, “3D object representations for fine-grained categorization”, 4th IEEE Workshop on 3D Representation and Recognition, ICCV 2013.</mixed-citation></ref>
      <ref id="ref7"><mixed-citation>[7] J. Huang, B. Chen, L. Luo, S. Yue, and I. Ounis, “DVM-CAR: A large-scale automotive dataset for visual marketing research and applications”, 2021, arXiv:2109.00881.</mixed-citation></ref>
      <ref id="ref8"><mixed-citation>[8] J. Deng et al., “ImageNet: A large-scale hierarchical image database”, CVPR, 2009, pp. 248–255.</mixed-citation></ref>
      <ref id="ref9"><mixed-citation>[9] M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous systems”, 2015, software available from tensorflow.org.</mixed-citation></ref>
      <ref id="ref10"><mixed-citation>[10] TensorFlow, “Deploy machine learning models on mobile and IoT devices”, accessed: 6 Feb. 2022, link: https://www.tensorflow.org/lite.</mixed-citation></ref>
      <ref id="ref11"><mixed-citation>[11] F. Chollet et al., “Keras”, 2015, link: https://keras.io.</mixed-citation></ref>
      <ref id="ref12"><mixed-citation>[12] C. R. Harris et al., “Array programming with NumPy”, Nature 585, 2020, pp. 357–362.</mixed-citation></ref>
      <ref id="ref13"><mixed-citation>[13] The pandas development team, “pandas-dev/pandas: Pandas 1.3.4”, 2021, doi: 10.5281/zenodo.5574486.</mixed-citation></ref>
      <ref id="ref14"><mixed-citation>[14] W. McKinney et al., “Data structures for statistical computing in Python”, Proc. of the 9th Python in Science Conference, Vol. 445, 2010, pp. 51–56.</mixed-citation></ref>
      <ref id="ref15"><mixed-citation>[15] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the Inception architecture for computer vision”, CVPR, 2016, pp. 2818–2826.</mixed-citation></ref>
      <ref id="ref16"><mixed-citation>[16] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, Inception-ResNet and the impact of residual connections on learning”, AAAI, 2017, pp. 4278–4284.</mixed-citation></ref>
      <ref id="ref17"><mixed-citation>[17] F. Chollet, “Xception: Deep learning with depthwise separable convolutions”, CVPR, 2017, pp. 1800–1807.</mixed-citation></ref>
      <ref id="ref18"><mixed-citation>[18] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks”, CVPR, 2017, pp. 2261–2269.</mixed-citation></ref>
      <ref id="ref19"><mixed-citation>[19] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks”, CVPR, 2018, pp. 4510–4520.</mixed-citation></ref>
      <ref id="ref20"><mixed-citation>[20] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition”, CVPR, 2018, pp. 8697–8710.</mixed-citation></ref>
      <ref id="ref21"><mixed-citation>[21] M. Tan and Q. V. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks”, Proc. of the 36th International Conference on Machine Learning, PMLR 97:6105–6114, 2019.</mixed-citation></ref>
    </ref-list>
  </back>
</article>