Forecasting of Fruits Stock Life using CNN-based Deep Learning Techniques: A Comprehensive Study

Neha Gautam 1, Nisha Chaurasia 2

1 Department of Computer Science and Engineering, Dr. B.R. Ambedkar National Institute of Technology, Jalandhar, India
2 Department of Computer Science and Engineering, Dr. B.R. Ambedkar National Institute of Technology, Jalandhar, India

ACI'22: Workshop on Advances in Computation Intelligence, its Concepts and Applications at ISIC 2022, May 17-19, Savannah, United States.
nehagautam796@gmail.com (N. Gautam); chaurasian@nitj.ac.in (N. Chaurasia)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Abstract
Fruits are rich in fibre and in nutrients such as protein, vitamins A, C, and E, folic acid, magnesium, zinc, and phosphorus, and must be handled carefully to preserve their value. Fruit freshness is short-lived, especially during supply. Because suppliers lack accurate freshness forecasting during sorting and packaging, they often supply fruit that is unfit for consumption, since freshness decays gradually over days. Detecting spoilage at the earliest stage of the production-to-consumption chain is therefore necessary to reduce the amount of fruit that rots. Automatic fruit grading based on quality and characteristics is a commercially important process for achieving high fruit output in the food industry. Grading can be done manually, but this is time-consuming, costly, and labour-intensive, and human inspectors tire of repetitive work in a way that machines do not. Fruit businesses rely on quality judged from colour, texture, physical appearance, shape, and size, and therefore need fast and effective methods to grade and value fruit. Accurate evaluation of fruit products plays a significant role in the agricultural and food industry in increasing profit and competitiveness. Fruit quality is thus vital, since fruit is used in a variety of products, such as juices and jams, that are healthy for consumers. Unfit fruit production also indirectly affects a country's economy and its carbon dioxide (CO2) emissions. In this paper, the methods implemented by the reviewed authors to predict fruit quality are surveyed. The study also emphasises the need to sell stock in time, which reduces losses for sellers.

Keywords
CNN, SVM, K-means, Image enhancement, Fruits Classification, Recognition, Segmentation

1. Introduction
The fruit industry is a backbone of the Indian economy, in which the quality of fruit production plays a significant part. Today, every food industry worldwide wants automatic technology that saves time and money in analysing and recognising fruit quality grades. India ranks second in the world in fruit production, which offers the country enormous opportunities for fruit export [6]. Traditional methods of fruit quality analysis require considerable expert experience and knowledge.

Figure 1: Comparison of deep learning model with human experts [7].
Fruit quality supervision can be inconsistent from person to person, depending on skill and physical factors [19]. In [7], the authors compared a deep model with human experts and obtained the results shown in Figure 1, where the deep model is more accurate than the experts. Hence, to reduce human labour and effort while obtaining accurate and fast results, an automatic identification system is required, through which fruit production can be increased [10]. This paper reviews the efforts made by various authors to develop such systems using CNN-based deep learning techniques such as ResNet50, MobileNetV2, DenseNet-121, NASNet-A, EfficientNet B0-B2, SVM, and transfer learning, and describes how CNN-based techniques classify fruit on the basis of quality.

The paper is divided into six sections. The first section is a brief introduction. The second section reviews published work on fruit quality prediction using CNN-based deep learning techniques. The third section explains the methodology used in the reviewed experiments. The fourth section briefly explains the CNN architecture, and the fifth summarises CNN-based deep learning techniques. The last section concludes the paper.

2. Related Work
Several researchers have carried out experiments on fruit quality analysis using deep learning techniques. A concise review of related papers is given here.

N. Ismail et al. [1] proposed a machine vision system using stacked ensemble deep learning techniques that automatically inspects fruit quality and provides a real-time visual inspection facility. In image pre-processing, a 3×3 Gaussian filter with a value of 0.01 is used to remove noise, a cross-correlation function is applied for smoothing, and histogram equalisation and contrast-limited thresholding are applied using OpenCV. Mean-shift clustering, Otsu thresholding, and watershed segmentation are used for image segmentation. The ResNet50, DenseNet121, NASNet, and EfficientNet B0-B2 architectures are used. A learning-rate range test and Bayesian optimisation were used to find the optimal learning rate and hyperparameters respectively. Specificity, area under the ROC curve, and sensitivity were selected to evaluate the proposed model. EfficientNet-B2 gave the best performance, with a 99.2% recognition rate. To improve the model further, features from EfficientNet-B0, B1, and B2 were stacked.

Sovon et al. [2] designed a deep CNN model to prevent fresh fruit from being contaminated by rotten fruit. A 19-layer MobileNetV2 architecture was used to classify and recognise rotten fruit. Max pooling and average pooling were compared on the basis of accuracy, and max pooling gave higher accuracy: 99.46% for training and 99.61% for validation.

Lili Zhu et al. [3] presented a mobile visual system with two-layer image processing that grades bananas as unripened, ripened, or over-ripened, with ripened bananas further divided into well-ripened and mid-ripened classes. In the first layer, an SVM (support vector machine) classifies the banana with 98.5% accuracy, and in the second layer YOLOv3 locates defective areas of the peel with 87.5% accuracy.
CycleGAN was used for augmentation, K-means for segmentation, SVM for classification, YOLOv3 for grading, and edge computing to reduce network communication and computational resource usage. Recall, precision, and F1-score were used to evaluate the first classification stage, while mean average precision, recall, and IoU (intersection over union) were applied to YOLOv3.

V. Bhole et al. [4] worked with 4560 thermal images of resolution 720×1280 and RGB images of 2322×4128 pixels of the same mangoes across 19 classes, and proposed a predictor system that forecasts the remaining shelf life of mangoes using transfer learning on lightweight CNN architectures such as MobileNetV2, ShuffleNet, and SqueezeNet. Experiments showed that ShuffleNet was faster than MobileNetV2 and SqueezeNet. Precision, recall, F1-score, false discovery rate (FDR), and false positive rate (FPR) were used to measure performance, and thermal imaging outperformed RGB with 98.15% accuracy.

O. M. Lawal et al. [5] proposed a robust model named YOLOMuskmelon, usable in harvesting robots, for fast and accurate muskmelon detection. It uses a ReLU-activated ResNet43 backbone with SPP (spatial pyramid pooling) for optimisation, CIoU (complete intersection over union) loss for better performance and faster convergence, residual blocks to prevent vanishing gradients, an FPN (feature pyramid network) to generalise the model, and DIoU-NMS (distance intersection over union non-maximum suppression) to handle overlapping regions. YOLOMuskmelon detects muskmelons at 96.3 frames per second with 89.6% precision and is 56.1% faster than YOLOv4.

K. Zhang et al. [6] proposed an automatic strawberry-harvesting robot using R-YOLO (rotational You Only Look Once), designed from YOLOv3 with MobileNet-V1 as the feature-extraction backbone. In R-YOLO, the rotation angle of the fruit axis is calculated for precise localisation in real-time video of strawberries. The bounding box is parameterised as (x, y, w, h, alpha), where alpha is the angle between the long side of the bounding box and the y-axis, (x, y) are the centre coordinates, and w and h are the width and height of the box. Feature extraction, recall, and recognition accuracy are adversely affected by the use of MobileNet, but R-YOLO runs at 18 frames per second, 3.6 times faster than YOLOv3.

N. Stasenko et al. [7] trained models for detecting and predicting decaying surfaces of apples using the U-Net and DeepLab CNN architectures, with the aim of improving the apple storage process. Performance was investigated using mIoU (mean intersection over union); U-Net and DeepLab yielded 99.71% and 99.99% mIoU respectively. A testbed was used to capture 12000 RGB images of apples from four classes (Malus domestica Borkh, Fuji, MiroLeto, and Golden), annotated into corresponding JSON files. For U-Net, ResNet was used as the backbone, ImageNet weights as the encoder initialisation, loss and mIoU as performance metrics, a learning rate of 0.001, and a batch size of 4. DeepLab uses atrous convolution to control the feature map size, together with ASPP (atrous spatial pyramid pooling).

Chai C. Foong et al. [8] proposed a ResNet50-based model to classify rotten fruit across six classes (fresh and rotten apples, oranges, and bananas) with 350 images per class. Segmentation using a colour-threshold function, feature extraction, and the HSV colour technique are used to separate the background of the image.
The authors ran the model with and without segmentation using the same parameters (batch size 10, learning rate 0.0001, 6 epochs) and found that, without segmentation, it reached the same 98.9% accuracy as with segmentation, in less time.

Jiangong N. et al. [9] proposed a model that forecasts banana storage time and freshness using transfer learning and GoogLeNet, without any destructive testing. Banana images captured over 11 days form the dataset; augmentation was applied to improve generalisation and avoid overfitting, and gradient-weighted class activation mapping (Grad-CAM) was used for feature extraction. To determine the model's generalisation ability, strawberry images were also used for training and testing, yielding a 92.47% accuracy rate.

S. Bulla et al. [10] proposed a transfer-learning model that helps prevent the spread of rot by classifying rotten and fresh fruit. Max-pooling layers are used to reduce overfitting and the memory and computation time required; batch normalisation normalises the feature maps; a dropout of 0.5 is used for faster computation at each convolution stage; and regularisation adds penalties to the loss function during optimisation. Random-uniform initialisation is used for the kernels, biases, and weights, which are then updated according to the output. With categorical cross-entropy as the loss function and the Adam optimiser with a learning rate of 0.0001, a batch size of 16, and 225 epochs, the model achieved 97.8% accuracy. The proposed model uses fewer filters and parameters, which decreases computation time and memory usage and makes it a feasible model for predicting fresh and rotten fruit.

K. Roy et al. [11] proposed a deep learning model using real-time semantic segmentation of the rotten parts of apples to detect and categorise fresh or rotten apples from the peel visible in RGB images, using a dataset downloaded from Kaggle. The dataset contained 3102 apple images, divided into 97 batches of 32 images each. En-UNet, with U-Net as its backbone, was used to segment the images and yielded 97.46% training and 97.54% validation accuracy, while plain U-Net reached 95.36% accuracy with 0.066 training and 0.062 validation loss. Model performance was evaluated on accuracy, loss, and mean IoU with a threshold of 0.95. During pre-processing, RGB images were converted to greyscale, and thresholding and binarisation were applied to obtain the output.

L. Wu et al. [12] proposed a CNN model that automatically recognises and classifies fruit using two datasets: a public dataset of 758 images with simple backgrounds and 5 classes, and a self-made dataset of 1152 images with complex backgrounds and 7 classes. In pre-processing, multi-channel images are converted into single-channel (blue) images, a global threshold is used for segmentation, and Halcon software is used for fruit identification and classification. The authors experimented with various batch sizes in the CNN model and found that 56 and 64 gave the best balance between memory consumption and efficiency.
To improve classification accuracy, three image enhancement methods were combined in different ways: random flip, random crop, and enhanced brightness. After the experiments, the combination of random flip and enhanced brightness yielded 98.1% accuracy.

L. Wu et al. [14] designed a model using data augmentation techniques and YOLOv4 to enable a robot to pick apples quickly and accurately in orchards with complex backgrounds, and used crawler technology for image labelling. EfficientNet replaced CSPDarknet53 (Cross Stage Partial Darknet53) as the backbone, and an additional convolution layer was added to adjust and extract features, reducing the computational complexity and making the model lighter. The results show that YOLOv4 with EfficientNet-B0 performed better than YOLOv4, YOLOv3, and Faster R-CNN with ResNet for apple detection in terms of precision, recall, and F1-score.

M. O. Lawal et al. [15] used a modified YOLOv3 to propose the YOLO-tomato model, which detects ripe and unripe tomatoes in complex environments. The author trained YOLO-tomato, YOLOv3, and YOLOv4 and compared their effectiveness on the basis of precision, recall, F1-score, and average precision (AP). YOLO-tomato, which uses Mish activation and SPP to reduce missed and inaccurate detections, showed the best performance with 99.4% AP, better generalisation, and real-time detection. This model can be used in harvesting robots in the agriculture industry.

V. Bhole et al. [16] created a texture-based dataset of RGB and thermal images of 11 varieties of fruit, placing them on a revolving tray and capturing them with digital and thermal cameras. Experiments were carried out with the KNN (k-nearest neighbour) and RF (random forest) algorithms to classify the fruit images, and classifier performance was evaluated on accuracy and Kappa value. KNN with RGB images showed higher accuracy than RF.

M. Oltean et al. [17] created a dataset of 90380 images and built a deep learning model that identifies fruit from images containing single or multiple fruits. The consumption of computational resources is improved by adjusting the depth and width of the network while keeping computational power constant. The model was trained on RGB, HSV, and greyscale images; RGB images outperformed the others, yielding 99.86% testing accuracy and 100% training accuracy.

F. Valentino et al. [18] proposed a computer-vision-based CNN model for fruit freshness detection, with a dataset downloaded from Kaggle containing six classes: fresh apple, banana, and orange, and rotten apple, banana, and orange. A dropout value of 0.25 was used to avoid overfitting and reduce the data size. A web application built with Python Flask was used for testing and can be accessed from mobile devices and PCs through a browser. Table 1 briefly summarises the related work.

3. Methodology
An artificial brain is trained and modelled through artificial neurons that mimic the neurons of the human brain. The reviewed papers address fruit quality and grading prediction based on classification, computer vision technology, and CNN models such as AlexNet, VGG16, VGG19, ResNet, GoogLeNet, MobileNet, etc., using the colour, texture, and shape of the fruit. First the fruit is detected, then its texture is analysed, and then a grade is assigned. On the basis of the grade, the time within which this particular fruit will rot completely can be determined. Figure 2 shows the block diagram of the CNN-based fruit quality grading system. The fruit images are separated into multiple classes by labelling them with the fruit type [2]. Image pre-processing, segmentation, feature extraction, and classification are used in the training phase.

Figure 2: Block diagram of fruit grading system
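To make the flow of Figure 2 concrete, the sketch below chains the same stages (pre-processing, segmentation, feature extraction, classification) using OpenCV and a generic scikit-learn-style classifier. It is only a minimal illustration: the function names, the HSV-histogram features, and the use of Otsu thresholding are assumptions made for the example, not the exact choices of any reviewed paper.

```python
# Minimal sketch of the grading pipeline in Figure 2 (illustrative only).
import cv2

def preprocess(img_bgr):
    """Resize and denoise the raw image (Section 3.2)."""
    img = cv2.resize(img_bgr, (224, 224))
    return cv2.GaussianBlur(img, (3, 3), 0)

def segment(img):
    """Separate fruit from background with Otsu thresholding (Section 3.4)."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.bitwise_and(img, img, mask=mask)

def extract_features(img):
    """Colour-histogram features in HSV space (Section 3.5)."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def grade(image_path, classifier):
    """Predict a quality grade (e.g. fresh / mid / rotten) for one image,
    given any fitted scikit-learn-style classifier (Section 3.6)."""
    img = cv2.imread(image_path)
    features = extract_features(segment(preprocess(img)))
    return classifier.predict([features])[0]
```

In the CNN-based systems reviewed here, the hand-crafted feature extraction and classifier are replaced by the convolutional network itself, while the surrounding acquisition, pre-processing, and grading steps remain the same.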
3.1. Image Acquisition
In computer vision and image processing, image acquisition is the first step, in which images are retrieved from sources using various hardware systems such as ultrasound, tomographic imaging, stereo systems, magnetic resonance imaging (MRI), and X-ray. Thermal imaging [3] gives significantly better results than RGB imaging [4] because it captures internal features of the object [4]. A machine vision system (MVS) can be used, in which images are acquired from real-time video and photographs [3]. A webcam can also be used to capture real-time images, which are then saved to Google Drive.

3.2. Image pre-processing
Source images are often corrupted by poor illumination and undesirable high-frequency signals. They are processed to increase the image information and improve the quality of the raw data using various filters: a Gaussian filter [1] to remove noise, a median filter [6], and rank filtering or log transformation [3] to reduce noise and improve the contrast of the image. Images can be resized while maintaining the aspect ratio of the original image [2]; the ratio of width to height is known as the aspect ratio [4]. Image annotation [5][6][8][15] and labelling (the LWYS approach) [15] are used to annotate and label the images. Let I(x, y) be the original illuminated source image and r(x, y) the reflectance image or filter at pixel (x, y). The processed image, denoted f(x, y), is then obtained using equation (1):

f(x, y) = I(x, y) ⋆ r(x, y)   (1)

3.3. Data Augmentation and Enhancement
Data augmentation is a technique for enlarging a dataset by amplifying images without changing their labels, applying cropping, brightness changes, dropout [1][4][14], rotation, zooming, contrast changes, mirroring, translation [9][14], horizontal and vertical flipping, shearing [2], and shifting. The dataset size can also be increased using CycleGAN [3]. In [12], image enhancement methods were randomly combined in four ways: random crop with random flip; enhanced brightness with random flip; random crop with enhanced brightness; and random crop, random flip, and enhanced brightness together. Image enhancement significantly improves classification and recognition on self-made datasets [12].
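As a small illustration of how such augmentation is typically wired into a training pipeline, the sketch below uses Keras preprocessing layers; the specific layers and parameter values are assumptions for the example, not the settings used in any reviewed paper.

```python
# Augmentation sketch with Keras preprocessing layers (illustrative values).
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),  # mirroring / flipping
    tf.keras.layers.RandomRotation(0.1),                    # rotation
    tf.keras.layers.RandomZoom(0.2),                        # zooming
    tf.keras.layers.RandomBrightness(0.2),                  # enhanced brightness
    tf.keras.layers.RandomContrast(0.2),                    # contrast change
])

# Applied on the fly during training, so the labels never change while every
# epoch sees slightly different versions of the same images.
# train_ds is assumed to be a tf.data.Dataset of (image, label) pairs:
# train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
```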
3.4. Image segmentation
Image segmentation is a mechanism for partitioning the target from the background in order to extract the region of interest and to analyse the important segments of an image on the basis of the colour, texture, shape, brightness, contrast, and grey-level characteristics of the fruit [3], assigning a label to every pixel in the image. There are various techniques for segmentation, such as clustering, threshold methods, edge-based segmentation, partial-differential-equation-based techniques, ANN-based segmentation, K-means segmentation [3], and Grad-CAM (gradient-weighted class activation mapping) [9][1].

3.5. Feature Extraction
Feature extraction is performed to decrease the number of resources and reduce the dimensionality needed to describe and analyse large data. Colour features (colour coherence vectors, colour moments, colour sets, and colour histograms), RGB, and HSV (HSV can be used for colour segmentation) are used for statistical analysis; HSV is more suitable than RGB [3]. Texture features (LBP, local binary patterns), shape features, spatial features, PCA [3], and the GLCM (grey-level co-occurrence matrix) can be used to extract texture-based features of an image [9][16].

3.6. Classification
Image classification is a process that takes an image as input and gives as output the probability of each class; the class to which the input image belongs should have the highest probability, and images are labelled on the basis of this probability. Classifiers include KNN, random forest (consisting of many decision trees with no correlation to each other [9]), Naïve Bayes, and SVM (capable of classifying both linear and non-linear data in high-dimensional spaces with high accuracy [21]). For simple classification, the performance of KNN is better than SVM [3]. Deep learning is more accurate than machine learning models for fruit classification [1]. The selection of the kernel plays an important role in improving classification accuracy, and the same dataset gives different accuracies with different kernels [21].

3.7. Evaluation and Prediction
After training and testing, the next phase is to measure model performance using sensitivity (recall), specificity, precision, F1-score, and accuracy, which can be calculated using equations 2, 3, 4, 5, and 6 respectively [2][6]:

Sensitivity or Recall = TP / (TP + FN)   (2)
Specificity = TN / (TN + FP)   (3)
Precision = TP / (TP + FP)   (4)
F1-score = 2 × (Recall × Precision) / (Recall + Precision)   (5)
Accuracy = (TP + TN) / (TP + TN + FP + FN)   (6)
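For reference, the short sketch below computes the metrics of equations (2)-(6) from the four entries of a binary confusion matrix; the counts in the example are made up purely for illustration.

```python
# Metrics of equations (2)-(6) from a binary confusion matrix (illustrative counts).
def evaluate(tp, tn, fp, fn):
    recall = tp / (tp + fn)                              # sensitivity, eq. (2)
    specificity = tn / (tn + fp)                         # eq. (3)
    precision = tp / (tp + fp)                           # eq. (4)
    f1 = 2 * recall * precision / (recall + precision)   # eq. (5)
    accuracy = (tp + tn) / (tp + tn + fp + fn)           # eq. (6)
    return recall, specificity, precision, f1, accuracy

# Example: 95 true positives, 90 true negatives, 5 false positives, 10 false negatives
print(evaluate(tp=95, tn=90, fp=5, fn=10))
# recall 0.905, specificity 0.947, precision 0.950, f1 0.927, accuracy 0.925
```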
4. CNN Architecture
In image pre-processing and computer vision, the CNN is the most powerful deep learning method for image classification, digital character recognition, and object recognition [21]. A CNN has a stack of convolution layers that transform the input image into output class probabilities, with the highest probability assigned to the class to which the input image belongs. That is why the papers studied here are mainly CNN-based. A CNN consists of an input layer, convolution layers, normalisation, pooling, and fully connected layers. In Figure 3, an input image of height and width 224 with 3 channels is passed through the convolutional neural network layer by layer. Three convolution layers are used, with 3×3 kernels and 64, 128, and 256 filters. The ReLU activation function is used to limit the exponential growth of computation in the CNN, and batch normalisation is used to increase the computation rate. To reduce the dimensionality of the data, max pooling with a 2×2 filter is applied, where s indicates the stride with which the max-pool filter slides over the input image.

Figure 3: CNN architecture

4.1. Convolution layer
In image processing, convolution is a basic building block applied for smoothing, sharpening, and edge detection. Element-wise multiplication is performed between the input image matrix and the convolution kernel, and all the multiplied elements are added to produce one grid cell (pixel) of the output feature map. The convolution kernel, a square matrix of integers, is applied to a subset of the input pixel values starting from the top-left corner of the input image, and the kernel is strided from left to right and top to bottom so that convolution is applied at every pixel of the image [20] to obtain the feature map. If multiple convolution kernels are applied within a convolution layer, multiple feature maps are created as output [10].

4.2. RELU
ReLU (rectified linear unit) is used as a non-linear activation function between the convolution layer and the pooling layer [10]; it turns negative pixel values into zero. Much real-world data fed to a CNN is non-linear in nature, so ReLU is used to introduce non-linearity into the CNN model [8]. A separate study of various activation functions such as ReLU, Swish, Mish, and Leaky ReLU was done by O. M. Lawal et al. [5] to determine the most effective one for their model; on the basis of the P-R curve and AP (average precision), ReLU performed remarkably well.

4.3. Batch normalization
Batch normalisation helps to stabilise model predictions, reduces overfitting through its regularising effect, and can speed up training by an order of magnitude. It normalises the activations of the current batch by subtracting the batch mean of the activations and dividing by the batch standard deviation. SGD can undo the normalisation while minimising the loss function [23].

4.4. Pooling
The pooling layer is used to reduce the dimensionality of the feature map while retaining the important features, which helps to avoid overfitting [8]. There are various pooling methods, such as min pooling, max pooling, and average pooling. In max pooling, each window of the feature map or channel is replaced by the maximum value of its elements, eliminating everything but the largest element [9][10].

4.5. Flatten and Fully connected layers
The image matrix obtained from the final convolution layer is flattened into a 1-D vector, which is then used as input to the fully connected layer. In a fully connected layer, which is the same as an ANN, every input neuron is connected to every neuron of the next layer, and the operation shown in equation (7) is performed, where x is the input of dimension [n, 1], w is the weight matrix of dimension [n, m] (n and m being the numbers of neurons in the previous and current layers respectively), and b is the bias of size [m, 1]. In the last fully connected layer, a Softmax activation is used to predict the probabilities of the input belonging to each class [4][7].

Y = ActivationFunction(wᵀx + b)   (7)
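Putting Sections 4.1-4.5 together, the sketch below assembles the stack described for Figure 3: a 224×224×3 input, three 3×3 convolution blocks with 64, 128, and 256 filters, each followed by batch normalisation, ReLU, and 2×2 max pooling, then flatten and fully connected layers with a Softmax output. The width of the hidden dense layer and the number of output classes are assumptions for the example.

```python
# Keras sketch of the CNN stack described in Figure 3 (illustrative head sizes).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(num_classes=6):
    model = models.Sequential([tf.keras.Input(shape=(224, 224, 3))])
    for filters in (64, 128, 256):
        model.add(layers.Conv2D(filters, (3, 3), padding="same"))    # Section 4.1
        model.add(layers.BatchNormalization())                       # Section 4.3
        model.add(layers.Activation("relu"))                         # Section 4.2
        model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2))  # Section 4.4
    model.add(layers.Flatten())                                      # Section 4.5
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))       # class probabilities
    return model

model = build_cnn()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```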
4.6. Hyperparameter Tuning
Optimisers used to improve model accuracy depend on hyperparameters that require fine tuning, including the number of epochs, the batch size, the optimizer, the iterations per epoch, and the validation steps [1][18]. These hyperparameter variables need to be set before the learning algorithm is applied [24]. The effects of various hyperparameters are as follows:

• Effect of batch size: The batch size is the number of input samples applied to the layers of the network at once, chosen to reduce memory usage [10]. It has a large impact on the result of an experiment: if the batch size is too small there is a risk of underfitting, while taking a very large value risks overfitting [12].

• Effect of number of epochs: One pass or iteration over the entire dataset is known as an epoch. Overfitting and underfitting are two problems that can occur while optimising the number of epochs; an overfitted model learns even the noise, which has a negative effect on model accuracy. If the model is trained for too few epochs it may underfit, and if it is trained for too many epochs it may overfit. If the validation error increases, the trained model is said to be overfitted [24]. The correct number of epochs depends on the training and validation loss.

• Effect of optimizer: The optimizer affects model performance by updating the weight parameters to reduce the loss function, where the loss function measures the difference between the actual and predicted output. Feature optimisation is a process that reduces the risk of overfitting using mathematical functions. To make the cost function converge to the global minimum and keep misclassification to a minimum, the hyperparameters should be selected carefully [18].

• Effect of learning rate: The learning rate, also known as the step size, controls how much the weights and biases (which are initialised randomly before training) are updated; a very low learning rate is reported to overfit the data, while a very high learning rate underfits it and causes divergent behaviour [24]. During training, the current values of the weights are updated for the next epoch, and the amount of this update is governed by the learning rate, which ranges from 0 to 1 [10]. It was observed that, on reducing the learning rate from 0.1 to 0.0001, the accuracy of the model improved from 17.36% to 97.82%. A cyclical learning rate (CLR) is used to find an appropriate learning rate in [1].

5. Deep Learning Techniques under CNN for Fruits Life Prediction
A deep neural network comprises interconnected neurons that perform complex tasks which are challenging for people. Inputs and weights are assigned to the neurons, which transform the input into an output; the output of one layer behaves as the input to the neurons of the next layer [19]. Deep learning models are trained on large amounts of labelled data using strong computing power such as GPUs [20].

5.1. ResNet
ResNet was a major breakthrough in image processing and is used to solve the vanishing and exploding gradient problems. It uses skip connections to skip training in a few layers and connect directly to later layers. ResNet is a deep neural network; ResNet50, which consists of 50 deep layers, is used to reduce computational time [8].

5.2. DenseNet
DenseNet uses fewer parameters, promotes the reuse of features, alleviates the problem of gradient disappearance, and strengthens feature flow. Connectivity, dense blocks, the growth rate, and bottleneck layers are the components of DenseNet. DenseNet121 comprises 121 layers with pretrained weights [1].

5.3. MobileNetv2
MobileNet is a lightweight deep learning network because it uses depth-wise separable convolution [2]. It is mainly used to reduce memory consumption by reducing the number of parameters and by using inverted residuals between the bottleneck layers. For this reason, it can be used in systems that have little computing power [1][2].
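As an illustration of how such a lightweight backbone is reused through transfer learning, the sketch below freezes an ImageNet-pretrained MobileNetV2 and trains only a small classification head; the head layers, dropout value, and six fresh/rotten classes are assumptions for the example, not the configuration of any particular reviewed paper.

```python
# Transfer-learning sketch: frozen MobileNetV2 backbone + small head (illustrative).
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # reuse the pretrained features; train only the new head

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),                    # regularisation
    layers.Dense(6, activation="softmax"),  # e.g. fresh/rotten x apple/banana/orange
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```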
5.4. EfficientNet
EfficientNet is a CNN and scaling method that scales the image resolution, depth, and width of the network using a set of fixed scaling coefficients. It is able to optimise accuracy as well as efficiency by performing neural architecture search. The base EfficientNet-B0 is built on the inverted bottleneck residual block of MobileNetV2 [1].

5.5. ShuffleNet
ShuffleNet uses pointwise group convolution to reduce computational complexity and channel shuffle to let information flow across feature channels. It is designed for mobile devices with limited computation power, maintaining accuracy at low cost [4].

5.6. SqueezNet
SqueezeNet is used as a smaller-network replacement for AlexNet. It has 50× fewer parameters than AlexNet and is three times faster. It consists of squeeze layers with 1×1 filters and expand layers with 3×3 filters [4].

5.7. GoogLeNet
GoogLeNet, with 22 deep layers, is an excellent CNN model, achieved through ImageNet pretraining. The purpose of GoogLeNet is to reduce the network parameters, prevent overfitting, and make the network faster [9]. On the basis of the parameters discussed above, the results of various authors are compared in Table 2.

6. CONCLUSION AND FUTURE SCOPE
Much research has been done using state-of-the-art deep learning techniques for fruit quality forecasting. In this paper, several existing papers are reviewed and compared. Among all the reviewed papers, the existing methods yield remarkable accuracy, but with challenges such as high loss rates and long detection and training times. In these methods, only a single view of the image is used. Various image enhancement techniques were combined and filters applied to improve classification accuracy and yield higher validation accuracy. However, there is still a lack of a robust, general system that can automatically sort, count, detect rotten fruit, and grade multiple fruits, because all the researchers used datasets in which each image contains only one fruit of a particular type. This review should aid further research on new CNN designs for automatic fruit grading and quality prediction in smart-agriculture applications. In the future, an enhanced deep learning system could be developed to predict the time span within which a fruit will rot, so that the fruit can be sold before it spoils. This would benefit sellers in the fruit industry, customers, and the economy of the country.

References
[1] N. Ismail and O. A. Malik, "Real-time visual inspection system for grading fruits using computer vision and deep learning techniques," Information Processing in Agriculture, 2021.
[2] S. Chakraborty, F. J. M. Shamrat, M. M. Billah, M. Al Jubair, M. Alauddin, and R. Ranjan, "Implementation of deep learning methods to identify rotten fruits," in 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI), pp. 1207–1212, IEEE, 2021.
[3] L. Zhu and P. Spachos, "Support vector machine and YOLO for a mobile food grading system," Internet of Things, vol. 13, p. 100359, 2021.
[4] V. Bhole and A. Kumar, "A transfer learning-based approach to predict the shelf life of fruit," Inteligencia Artificial, vol. 24, no. 67, pp. 102–120, 2021.
[5] O. M. Lawal, "YOLOMuskmelon: quest for fruit detection speed and accuracy using deep learning," IEEE Access, vol. 9, pp. 15221–15227, 2021.
[6] Y. Yu, K. Zhang, H. Liu, L. Yang, and D. Zhang, "Real-time visual localization of the picking points for a ridge-planting strawberry harvesting robot," IEEE Access, vol. 8, pp. 116556–116568, 2020.
[7] N. Stasenko, E. Chernova, D. Shadrin, G. Ovchinnikov, I. Krivolapov, and M. Pukalchik, "Deep learning for improving the storage process: Accurate and automatic segmentation of spoiled areas on apples," in 2021 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), pp. 1–6, IEEE, 2021.
[8] C. C. Foong, G. K. Meng, and L. L. Tze, "Convolutional neural network based rotten fruit detection using ResNet50," in 2021 IEEE 12th Control and System Graduate Research Colloquium (ICSGRC), pp. 75–80, IEEE, 2021.
[9] J. Ni, J. Gao, L. Deng, and Z. Han, "Monitoring the change process of banana freshness by GoogLeNet," IEEE Access, vol. 8, pp. 228369–228376, 2020.
[10] S. S. S. Palakodati, V. R. R. Chirra, D. Yakobu, and S. Bulla, "Fresh and rotten fruits classification using CNN and transfer learning," Rev. d'Intelligence Artif., vol. 34, no. 5, pp. 617–622, 2020.
[11] K. Roy, S. S. Chaudhuri, and S. Pramanik, "Deep learning based real-time industrial framework for rotten and fresh fruit detection using semantic segmentation," Microsystem Technologies, vol. 27, no. 9, pp. 3365–3375, 2021.
[12] L. Wu, H. Zhang, R. Chen, and J. Yi, "Fruit classification using convolutional neural network via adjust parameter and data enhancement," in 2020 12th International Conference on Advanced Computational Intelligence (ICACI), pp. 294–301, IEEE, 2020.
[13] H. B. Ünal, E. Vural, B. K. Savaş, and Y. Becerikli, "Fruit recognition and classification with deep learning support on embedded system (FruitNet)," in 2020 Innovations in Intelligent Systems and Applications Conference (ASYU), pp. 1–5, IEEE, 2020.
[14] L. Wu, J. Ma, Y. Zhao, and H. Liu, "Apple detection in complex scene using the improved YOLOv4 model," Agronomy, vol. 11, no. 3, p. 476, 2021.
[15] M. O. Lawal, "Tomato detection based on modified YOLOv3 framework," Scientific Reports, vol. 11, no. 1, pp. 1–11, 2021.
[16] V. Bhole, A. Kumar, and D. Bhatnagar, "A texture-based analysis and classification of fruits using digital and thermal images," in ICT Analysis and Applications, pp. 333–343, Springer, 2020.
[17] H. Mureşan and M. Oltean, "Fruit recognition from images using deep learning," Acta Universitatis Sapientiae, Informatica, vol. 10, no. 1, pp. 26–42, 2018.
[18] F. Valentino, T. W. Cenggoro, and B. Pardamean, "A design of deep learning experimentation for fruit freshness detection," in IOP Conference Series: Earth and Environmental Science, vol. 794, p. 012110, IOP Publishing, 2021.
[19] Y. Kumar, A. K. Dubey, R. R. Arora, and A. Rocha, "Multiclass classification of nutrients deficiency of apple using deep neural network," Neural Computing and Applications, pp. 1–12, 2020.
[20] D. T. P. Chung and D. Van Tai, "A fruits recognition system based on a modern deep learning technique," in Journal of Physics: Conference Series, vol. 1327, p. 012050, IOP Publishing, 2019.
[21] D. Karakaya, O. Ulucan, and M. Turkan, "A comparative analysis on fruit freshness classification," in 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), pp. 1–4, IEEE, 2019.
[22] J. Feng, L. Zeng, and L. He, "Apple fruit recognition algorithm based on multispectral dynamic image analysis," Sensors, vol. 19, no. 4, p. 949, 2019.
[23] A. F. Agarap, "Deep learning using rectified linear units (ReLU)," arXiv preprint arXiv:1803.08375, 2018.
[24] S. Afaq and S. Rao, "Significance of epochs on training a neural network," International Journal of Scientific and Technology Research, vol. 19, no. 6, pp. 485–488, 2020.

Table 1: Some Related Work
Author | Dataset | Dataset source | Methods used | Merits | Demerits
N. Ismail et al. [1] | Apple and banana images | Internal feeding worm dataset of CASC | ResNet50, MobileNetV2, DenseNet-121, NASNet-A and EfficientNet B0-B2 | Deep-learning-based model has low cost for grading fruits | Classifier confuses yellowish green with green
Sovon et al. [2] | Apple, banana and orange images | Kaggle | MobileNetV2 | Performance is very high | Loss is very high
Lili Zhu et al. [3] | 150 banana images | Author-created, from online sources | SVM, YOLOv3 and CycleGAN | Less network communication and saves computational resources | Too small a dataset is taken
V. Bhole et al. [4] | 4560 thermal and RGB images of mangoes each | Author-created, real images of three types of mangoes | MobileNetV2, ShuffleNet and SqueezeNet | Versatile system that can work with RGB as well as thermal images for prediction | Thermal imaging is expensive
O. M. Lawal et al. [5] | 410 muskmelon images | Collected from greenhouses in various provinces of China and labelled via GitHub | ResNet43, SPP, FPN, DIoU-NMS and CIoU | Model is robust and fast | A very deep network is used to reach the reported accuracy
K. Zhang et al. [6] | 2000 strawberry images | Downloaded from the Internet and real captured images of strawberry | R-YOLO (improved YOLOv3) with MobileNet-V1, K-means | Excellent real-time performance, 3.6 times faster than YOLOv3 | Performance is poor for multiple fruits, occlusion and overlap
N. Stasenko et al. [7] | 12000 apple images | Captured using a testbed | U-Net and DeepLab | mIoU for DeepLab was remarkable | Does not focus on accuracy and precision
Chai C. Foong et al. [8] | 2100 apple, banana and orange images | Kaggle | ResNet50 | Takes less time to train | Not always accurate for green apples
Jiangong N. et al. [9] | 618 banana images | Author-photographed images over 11 days | Transfer learning and GoogLeNet | Model is fast and scalable, can be deployed on mobile | Recognition was not always correct
S. Bulla et al. [10] | 5989 apple, banana and orange images | Kaggle | Transfer learning and CNN | Model is feasible because of low computational time and memory usage | A large number of epochs is needed to reach remarkable accuracy

Table 2: Comparison of Various Authors on the basis of Sensitivity, Specificity, Precision, F1-Score and Accuracy
Author | Sensitivity or recall | Specificity | Precision | F1-score | Accuracy
N. Ismail et al. [1] | 99.6% (apple), 98.9% (banana) | 99.2% (apple), 99.1% (banana) | NA | NA | 99.5% (apple), 99.2% (banana)
Sovon et al. [2] | 98.6% | NA | 98.6% | 98% (average) | 99.6%
Lili Zhu et al. [3] | 96.8% | NA | 98.5% | 97.6% | 98.5%
V. Bhole et al. [4] | 97% (RGB), 98.1% (thermal) | NA | 97.5% and 98.3% | 97.3% (RGB), 98.2% (thermal) | 97.1% (RGB), 98.5% (thermal)
O. M. Lawal et al. [5] | 82% | NA | 96.3% | 84% | NA
K. Zhang et al. [6] | 93.5% | NA | 94.4% | 93.9% | 84.3%
Jiangong N. et al. [9] | 99.6% | NA | 100% | 99.8% | 98.9%
S. Bulla et al. [10] | NA | NA | NA | NA | 97.8%
L. Wu et al. [14] | 97.4% | NA | 95.5% | 96.5% | NA