Estimating Tomato Fruit Masses through Image Processing and
                         Artificial Intelligence
                         Elognissè Erasme Guérin AGOSSADOU¹,†, Mahugnon Géraud AZEHOUN PAZOU¹,∗,†,
                         Régis Donald HONTINFINDE¹,† and Ahmed Dooguy Kora²,†
                         ¹ Université nationale des sciences, Technologie, Ingénierie et Mathématiques (UNSTIM), POBox 486, SOGBO ALIHO, Abomey, Benin
                         ² EDMI, Cheikh Anta Diop University, Dakar, Senegal.


                                           Abstract
                                           The integration of intelligent and connected production systems has made artificial intelligence (AI) an indispensable
                                           component of society’s digital transformation. By leveraging the large volumes of data these systems generate, AI can now
                                           support critical decisions that help mitigate potential disasters. This study focuses on developing a method that combines computer vision and machine
                                           learning algorithms to estimate tomato weights. A dataset of tomato images was compiled, and a modified Mask R-CNN algorithm
                                           was employed to detect, segment, and extract individual fruit masks. Various regression models were evaluated to predict tomato
                                           weight based on visual features. The results on the test dataset indicate that this approach can estimate the number and total weight of
                                           tomatoes with approximately 93% accuracy. This research highlights the potential for automated monitoring of market garden crop
                                           yields through AI.

                                           Keywords
                                           tomato fruit mass estimation, image processing, prediction models, neural network, deep learning, pix2pix, R-CNN


Cotonou’24: Conférence Internationale des Technologies de l’Information et de la Communication de l’ANSALB, June 27–28, 2024, Cotonou, BENIN
∗ Corresponding author.
† These authors contributed equally.
Email: agossadourin@gmail.com (E. E. G. AGOSSADOU); geraud.pazou@gmail.com (M. G. A. PAZOU); hontinfinde7@gmail.com (R. D. HONTINFINDE); ahmed.kora@esmt.sn (A. D. Kora)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Agriculture faces major challenges in sustainably feeding a growing global population, making accurate crop yield estimation essential for informed decision-making by farmers. Traditional methods such as field surveys can be helpful, but they are often limited in accuracy, cost, and time efficiency.

Tomato (Solanum lycopersicum) is a crucial vegetable crop globally, with a production of 183 million tonnes in 2018 [1]. Native to Central and South America, the tomato was introduced to Europe in the 16th century and quickly gained popularity for its nutrient-rich fruits, which contain vitamins, minerals, and antioxidants [2]. Major producers include China, India, the United States, and Turkey, with significant cultivation also occurring in African nations such as Nigeria, Egypt, Morocco, and Algeria, primarily for local consumption [3]. Tomatoes are generally classified into two main growth types: determinate, which have limited growth, and indeterminate, which continue growing throughout their lifecycle. Whether cultivated in open fields or under protective covers like greenhouses, tomato farming requires careful irrigation due to the plant’s deep taproot system. Furthermore, diseases such as downy mildew and Botrytis necessitate appropriate cultivation practices and phytosanitary measures to ensure optimal yields.

Several approaches have been investigated in the literature to address the challenge of fruit weight estimation. For instance, Yamamoto et al. [4] developed a method to count individual tomato fruits from images of plants grown in a laboratory setting. This method employed decision trees to analyze pixel color characteristics, achieving pixel-level segmentation. Post-processing was then applied to group pixels corresponding to fruits, enabling the extraction and counting of fruit centroids. The study reported a detection precision of 0.88 and recall of 0.80, demonstrating the method’s efficacy in controlled environments for tomato detection and counting.

In Indonesia, the increasing demand for tomatoes necessitates efficient post-harvest handling. A study by Sari et al. [5] proposed a sorting system that categorizes tomatoes based on color, size, and weight using image processing with the OpenCV [6] library. The system sorts tomatoes into red, yellow, and green categories and measures dimensions by identifying the outermost points of the detected fruits. It utilizes a weight sensor for mass measurement. The prototype, which incorporates a webcam, Arduino, and conveyor system, achieved 100% accuracy in color detection and 95% in weight measurement, although dimensional measurement accuracy was only 5%.

Van Daalen et al. [7] examined the application of augmented reality (AR) in agriculture, focusing on determining fresh tomato weight using the 3D scanning capabilities of the HoloLens [8]. Their experimental setup, which included various tomato varieties, highlighted both the opportunities and challenges of using AR for hands-free tasks like training and harvesting in greenhouse environments.

Similarly, Lee et al. [9] proposed an artificial intelligence-based system for tomato detection and mass estimation, utilizing multi-class detection and instance-wise segmentation. By analyzing a tomato image dataset with a calibrated vision system, the study demonstrated a high correlation between fruit dimensions and mass. Their method achieved a mean absolute percentage error of 7.09%, showcasing the effectiveness of computer vision and machine learning for automating tasks such as yield monitoring and fruit sizing.

In another study, Nyalala et al. [10] developed seven regression models, including Support Vector Regression (SVR) [11] and artificial neural networks (ANNs) [12] with different training algorithms. These models effectively estimated fruit weight and volume, offering significant potential for improvements in fruit sorting and grading processes.

Basak et al. [13] introduced a non-destructive method for estimating strawberry fruit weight using machine learning models. By analyzing 900 samples from three different strawberry cultivars, they used image processing to calculate pixel numbers. Linear regression (LR) and non-linear SVR models were applied, resulting in training and testing accuracies of 96.3% and 89.6%, respectively.

This study focuses on applying recent advancements in computer vision, particularly object detection, and machine learning algorithms to estimate tomato weight from real-world images. The subsequent sections describe the equipment used, the structure and composition of the dataset, and the methodology employed to generate accurate quantitative measures such as projected surface area and total weight for detected fruits. Our findings demonstrate the effectiveness of this approach. Additionally, we discuss the challenges faced and propose recommendations for future research.

2. Materials and Methods

2.1. Dataset

The data used in this study consist of tomato fruit images collected both online and in the field under real-world conditions. The dataset includes 180 images obtained online and 100 images taken in the field, containing a total of 1143 tomato fruit instances. Table 1 summarizes the composition of our dataset.

Table 1
Dataset Overview

Source            Number of images   Number of fruit instances
Online            180                1043
Field-collected   100                100
Total             280                1143

Images captured in the field provided additional information, namely the actual projected area and the actual weight of each fruit, which enriches the dataset with accurate and relevant measurements for tomato fruit weight estimation. Table 2 presents summary statistics for the field-captured images. According to the table, the average fruit weight is 33.34 g, with a standard deviation of 13.88 g. The average true area is 2565.48 mm², with a standard deviation of 912.44 mm². Quartile values provide insight into the distribution of the data: 25% of the fruits weigh less than 19.93 g, 50% weigh less than 35.96 g, and 75% weigh less than 42.88 g. For the actual surface area, 25% of fruits have an area less than 1723.11 mm², 50% have an area less than 2609.54 mm², and 75% have an area less than 3186.81 mm².

Table 2
Additional information on images taken in the field

         weight (g)     real_surface (mm²)
count    100.000000     100.000000
mean     33.341900      2565.479377
std      13.884898      912.439551
min      9.930000       856.037079
25%      19.932500      1723.114236
50%      35.955000      2609.542487
75%      42.877500      3186.808853
max      63.760000      4931.281258

2.2. Methods

To estimate tomato fruit weights, we developed a four-step approach (see Figure 1).

2.2.1. Detection, segmentation and extraction of tomato fruit masks

To train our segmentation model, we prepared a dataset of tomato images labeled in the COCO format. The dataset consisted of 180 images containing 1043 instances of tomatoes, sourced from both the internet and field photography, and annotated using the Roboflow platform. We employed the Mask R-CNN instance segmentation model through the Detectron2 framework, selecting the mask_rcnn_R_50_FPN_3x configuration developed by Facebook AI Research. This model, pre-trained on the COCO dataset, combines the Mask R-CNN architecture with a ResNet-50 backbone and Feature Pyramid Network (FPN) for high-performance, multi-scale object detection.

2.2.2. Projected Surface Area Estimation of Each Tomato

To evaluate the projected area of each tomato from images, a dataset was constructed, including individual images of tomatoes, their actual weight in grams, the total number of pixels in the image, the number of pixels corresponding to the tomato (obtained from the segmentation mask), and the total area of the image in square meters, obtained by camera calibration.

The estimation of the projected area took place in two steps. First, the segmentation mask was used to calculate the area in pixels occupied by the tomato in the image. Then, a camera calibration converted this pixel area into an actual metric area, using a coin as a reference object. By photographing the tomatoes under the same conditions as the reference coin, the resulting conversion factor was used to convert the pixel area of each fruit into its actual projected area in metric units. This method uses a rule of three: the actual surface area of the tomato ($A_{tomato}$) is estimated from the number of pixels corresponding to the tomato in the image ($P_{tomato}$) and the conversion factor established during calibration, $A_{ref}/P_{ref}$:

$$A_{tomato} = P_{tomato} \times \frac{A_{ref}}{P_{ref}} \qquad (1)$$

where $A_{ref}$ is the real area of the reference coin and $P_{ref}$ is the number of pixels it occupies in the image.

With this method, we were able to estimate the real surface area of each tomato in physical space from its segmentation in image space, thanks to calibration using a reference object.

2.2.3. Tomato Mass Estimation

To estimate the weight of the tomatoes based on their projected surface area, we tested several regression models, including Simple Linear Regression (SLR), Multiple Linear Regression (MLR), and Partial Least Squares Regression (PLSR). These models aimed to establish a mathematical relationship between the surface area (independent variable) and the weight (dependent variable) of the tomatoes.
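Returning to the area estimation of Section 2.2.2, the rule of three in Equation (1) can be illustrated with a minimal sketch. The coin diameter, the pixel counts, and the toy mask below are hypothetical values chosen for illustration and are not taken from the study; the helper names `count_mask_pixels` and `pixel_to_area` are likewise assumptions.

```python
import math


def count_mask_pixels(mask):
    """Count foreground pixels in a binary mask given as rows of 0/1 values."""
    return sum(1 for row in mask for value in row if value)


def pixel_to_area(p_tomato, a_ref, p_ref):
    """Equation (1): A_tomato = P_tomato * A_ref / P_ref."""
    if p_ref <= 0:
        raise ValueError("the reference object must cover at least one pixel")
    return p_tomato * a_ref / p_ref


# Hypothetical calibration values (not from the paper): a coin of 23 mm
# diameter that covers 5000 pixels in the calibration photograph.
coin_area_mm2 = math.pi * (23.0 / 2.0) ** 2
coin_pixels = 5000

# Toy 3x4 mask standing in for one instance mask produced by the detector.
tomato_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]

tomato_pixels = count_mask_pixels(tomato_mask)
tomato_area_mm2 = pixel_to_area(tomato_pixels, coin_area_mm2, coin_pixels)
print(f"{tomato_pixels} px -> {tomato_area_mm2:.3f} mm^2")
```

The conversion is only valid when the fruit and the reference object are photographed at the same distance and with the same camera settings, as stated in Section 2.2.2.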
         Figure 1: Summary illustration of the methodology


  Figure 2: Model accuracy

  Figure 3: Evolution of the cost function


The performance of each model was evaluated on a validation set consisting of 20% of the total dataset, collected under real-world conditions. Standard metrics, such as Root Mean Square Error (RMSE) and the Coefficient of Determination ($R^2$), were employed to assess model accuracy. We also applied 10-fold cross-validation to each model to reduce the likelihood of overfitting.

Figure 1 depicts the summary of the methodology adopted in this study.

3. Results and Discussion

3.1. Results

Figure 2 illustrates the model’s accuracy, while Figure 3 depicts the evolution of the cost function.

The performance of the model was evaluated on the test set consisting of 19 images containing a total of 149 tomato annotations. The Average Precision (AP) metric was used to quantify the model’s ability to correctly detect and segment tomatoes under various conditions.

Table 3 presents the results obtained for the detection and instance segmentation tasks. We observe an AP of 55.9% for detection and 54.6% for segmentation, averaged over IoU thresholds between 0.5 and 0.95. The model achieves better performance on large fruits (APl of 66.1% in detection) than on medium-sized fruits (APm of 30.3%).

Table 3
Model results in terms of Average Precision

Metric         AP       AP50     AP75     APm      APl
Detection      55.901   74.083   62.361   30.294   66.144
Segmentation   54.591   73.763   61.112   24.978   64.943

These results indicate that the model can detect and segment tomatoes in real-world conditions. Further data annotation and model optimization are expected to enhance performance.

The projected surface area of each fruit was derived from the segmented mask by calculating the pixel area, then converting it to real-world units using camera calibration information as defined in Equation 1. This method achieved a precision of approximately 95%.

For tomato weight estimation, a subset of the dataset containing real-world images was used, which included precise data on both the actual weight of each tomato and its projected surface area. A mathematical relationship between the weight and projected area was established through the evaluation of several regression methods. The algorithms tested included Least Squares Regression (LSR), Multiple Linear Regression (MLR), and Support Vector Machines (SVM), and their performance was compared using cross-validation and Mean Square Error (MSE) as the evaluation metric.

Table 4 highlights the performance metrics of the tested models.

Among the evaluated models, Lasso Regression achieved the lowest errors, with an MAE of 5.776 g and an MSE of 62.99 g². The corresponding model equation is given in Equation (2).
     Table 4
     Performance metrics of different models
                                                          MSE          MAE         RSE          R2
                              Linear Regression           67.465310    5.959565    8.110772     0.614756
                              Lasso Regression            62.990660    5.775707    7.900871     0.659433
                              Ridge Regression            64.222324    5.820851    7.789839     0.662985
                              ElasticNet Regression       65.214001    5.919661    8.063410     0.534604
                              SVR                         81.623252    6.884133    8.980888     0.564414
                              Random Forest               67.078331    6.002012    8.102465     0.622985
                              AdaBoost Regression         76.441269    6.757964    8.621712     0.578526
                              KNeighbors Regression       68.750815    6.179380    8.225651     0.634068
                              Decision Tree               126.243306   8.132200    11.068062    0.322372
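For reference, the error metrics of the kind reported in Table 4 can be computed as in the sketch below. The RSE column is not defined in the text, so this sketch reports the root mean square error (RMSE) instead, which is an assumption. The three sample pairs are the first three rows of Table 5; the function name `regression_metrics` is introduced here for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R^2 for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "MSE": mse,
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": math.sqrt(mse),  # assumption: the paper's RSE is not defined
        "R2": 1.0 - ss_res / ss_tot,
    }


# Actual and estimated weights (g) from the first three rows of Table 5.
actual = [48.370, 30.760, 38.690]
estimated = [42.042340, 33.377463, 42.840604]

for name, value in regression_metrics(actual, estimated).items():
    print(f"{name}: {value:.4f}")
```

MSE is expressed in g², while MAE and RMSE are in grams, which makes the latter two easier to compare with typical fruit weights.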


     Table 5
     Prediction results on the test set

                      Projected area      Actual weight   Estimated weight    Absolute error    Relative error (%)
                      3219.122984         48.370          42.042340           6.327660          13.081785
                      2566.503710         30.760          33.377463           2.617463          8.509306
                      3279.246427         38.690          42.840604           4.150604          10.727847
                      2635.552676         30.600          34.294231           3.694231          12.072651
                      1273.816970         105.590         16.214358           89.375642         84.644040
                      2733.490428         30.360          35.594558           5.234558          17.241629
                      2521.044293         28.530          32.773894           4.243894          14.875199
                      3122.755376         37.570          40.762860           3.192860          8.498430
                      3501.459234         50.850          45.790941           5.059059          9.948985
                      2535.848511         35.070          32.970451           2.099549          5.986738
                      3098.277947         41.740          40.437871           1.302129          3.119618
                      …                   …               …                   …                 …
                      2782.959320         26.520          36.251361           9.731361          36.694423
                      2436.892034         33.990          31.656598           2.333402          6.864966
                      2810.053656         37.080          36.611095           0.468905          1.264578
                      3040.780757         44.260          39.674477           4.585523          10.360424
                      3229.282192         46.620          42.177225           4.442775          9.529762
                      Total               1361.29         1257.726200         96.563799         7.09
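The per-fruit columns of Table 5 can be reproduced from the projected area with the linear model of Equation (2). The sketch below applies the published coefficients to the first three rows of the table; the constant and function names are assumptions introduced for illustration.

```python
# Coefficients of Equation (2): M = SLOPE * PA + INTERCEPT
SLOPE = 0.01327708       # grams per mm^2
INTERCEPT = -0.69821033  # grams


def estimate_mass(projected_area_mm2):
    """Estimated mass in grams from the projected area in mm^2."""
    return SLOPE * projected_area_mm2 + INTERCEPT


def error_row(projected_area_mm2, actual_weight_g):
    """Return (estimate, absolute error, relative error in %) for one fruit."""
    estimate = estimate_mass(projected_area_mm2)
    absolute = abs(actual_weight_g - estimate)
    relative = 100.0 * absolute / actual_weight_g
    return estimate, absolute, relative


# (projected area in mm^2, actual weight in g): first three rows of Table 5.
rows = [
    (3219.122984, 48.370),
    (2566.503710, 30.760),
    (3279.246427, 38.690),
]

for area, actual in rows:
    estimate, absolute, relative = error_row(area, actual)
    print(f"{area:12.3f} {actual:8.3f} {estimate:8.3f} "
          f"{absolute:7.3f} {relative:7.2f}%")
```

Because the model is linear in the projected area, any error in the pixel-to-area conversion propagates proportionally to the estimated mass.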


$$M = 0.01327708 \times PA - 0.69821033 \qquad (2)$$

where $M$ is the estimated mass in grams and $PA$ is the projected area in mm².

Table 5 presents the prediction results on the test dataset, where our model achieved a relative error of 7.09% in estimating the total weight. When applied in an autonomous field system, this method shows great potential to enhance yield estimation efficiency, helping farmers save time and reduce labor costs.

3.2. Discussion

The study employed a multi-step methodology to estimate tomato fruit weights from images. First, a Mask R-CNN model, using the mask_rcnn_R_50_FPN_3x configuration, was trained on a dataset of 180 images containing 1043 tomato instances. After detection and segmentation, the projected surface area of each tomato was estimated using a calibrated conversion from pixel area to metric units, achieving approximately 95% accuracy. For weight estimation, several regression models were evaluated on a subset of real-world images with known weights and projected areas. Among them, the Lasso Regression algorithm performed best, achieving a Mean Absolute Error (MAE) of 5.776 g and a Mean Squared Error (MSE) of 62.99 g². For comparison, Lee et al. [9] reported a mean absolute percentage error of 7.09% for a similar tomato weight estimation task.

When applied to the test dataset, this model achieved a relative error of 7.09% in estimating the total weight of tomatoes. These results demonstrate the potential of this combined approach for automated tomato yield estimation, although the ideal conditions of the study (fully visible fruits) suggest that further research is needed to address real-world challenges such as occlusion.

While this study yielded promising results, it is important to acknowledge its primary limitation: the experiments were conducted under idealized conditions that do not fully represent real-world agricultural environments. All tomatoes in the study were fully visible and unobstructed, which rarely occurs in actual fields where fruits are often partially hidden by leaves, branches, or other fruits. This idealization may lead to overly optimistic performance estimates.

To bridge this gap and enhance the model’s practical applicability, future research will focus on developing robust occlusion handling techniques, such as implementing advanced image processing algorithms for reconstructing partially obscured fruits or using ellipse fitting methods to estimate the full shape of partially visible tomatoes.

Additionally, creating more representative datasets that reflect the challenging conditions found in real agricultural settings, including various levels of occlusion and diverse growth stages, will be crucial. By addressing these limitations and training on more diverse and challenging datasets, future iterations of this system could significantly improve in accuracy and robustness, making it a more reliable tool for automated agricultural yield estimation in real-world scenarios.

4. Conclusion

This study introduced an approach for assessing tomato crop yields through the use of image processing, computer vision, and artificial intelligence techniques. The results align closely with the objectives of estimating both the quantity and total weight of fruits, highlighting the practical benefits of this methodology for farmers.

Looking ahead, future enhancements will focus on refining the approach by integrating multispectral imaging to improve data acquisition. Additionally, algorithmic advancements, including image generation and ellipse fitting techniques, will be employed to tackle challenges related to occlusion. These developments will enhance the model’s scalability and robustness, facilitating large-scale deployment in real-world agricultural settings. The anticipated implementation of this approach in automated systems that utilize drones and ground-based robots presents exciting opportunities for digital agriculture, paving the way for precise, efficient, and automated yield estimation.

References
 [1] Food and Agriculture Organization, Agricultural production statistics, n.d. Retrieved from https://www.fao.org/3/cc3751en/cc3751en.pdf.
 [2] M. Dorais, D. Ehret, A. Papadopoulos, Tomato (Solanum lycopersicum) health components: From the seed to the consumer, Phytochemistry Reviews 7 (2008) 231–250. doi:10.1007/s11101-007-9085-x.
 [3] WorldAtlas, The world’s leading tomato producing countries, n.d. Retrieved from https://www.worldatlas.com/articles/which-are-the-world-s-leading-tomato-producing-countries.html.
 [4] K. Yamamoto, W. Guo, Y. Yoshioka, S. Ninomiya, On plant detection of intact tomato fruits using image analysis and machine learning methods, Sensors 14 (2014) 12191–12206. URL: https://www.mdpi.com/1424-8220/14/7/12191. doi:10.3390/s140712191.
 [5] M. I. Sari, R. Fajar, T. Gunawan, R. Handayani, The use of image processing and sensor in tomato sorting machine by color, size, and weight, JOIV : International Journal on Informatics Visualization (2022). URL: https://api.semanticscholar.org/CorpusID:250542375.
 [6] OpenCV: Open source computer vision library, https://opencv.org/, n.d. Accessed: 2024-10-02.
 [7] T. van Daalen, J. Peller, J. Balendonck, Determining fresh tomato weight using depth images from an AR headset, IFAC-PapersOnLine 55 (2022) 119–123. URL: https://www.sciencedirect.com/science/article/pii/S2405896322027586. doi:10.1016/j.ifacol.2022.11.125.
 [8] Microsoft HoloLens, https://www.microsoft.com/fr-fr/hololens?msockid=1255574f41cb6082275f4248408c611d, n.d. Accessed: 2024-10-02.
 [9] J.-S. Lee, H. Nazki, J. Baek, Y. Hong, M. hun Lee, Artificial intelligence approach for tomato detection and mass estimation in precision agriculture, Sustainability (2020). URL: https://api.semanticscholar.org/CorpusID:228852288.
[10] I. Nyalala, C. Okinda, Q. Chao, P. Mecha, T. Korohou, Z. Yi, S. Nyalala, Z. Jiayu, L. Chao, C. Kunjie, Weight and volume estimation of single and occluded tomatoes using machine vision, International Journal of Food Properties 24 (2021) 818–832. URL: https://doi.org/10.1080/10942912.2021.1933024. doi:10.1080/10942912.2021.1933024.
[11] F. Zhang, L. J. O’Donnell, Chapter 7 - Support vector regression, in: A. Mechelli, S. Vieira (Eds.), Machine Learning, Academic Press, 2020, pp. 123–140. URL: https://www.sciencedirect.com/science/article/pii/B9780128157398000079. doi:10.1016/B978-0-12-815739-8.00007-9.
[12] K. O’Shea, R. Nash, An introduction to convolutional neural networks, 2015. URL: https://arxiv.org/abs/1511.08458. arXiv:1511.08458.
[13] J. K. Basak, B. Paudel, N. E. Kim, N. C. Deb, B. G. Kaushalya Madhavi, H. T. Kim, Non-destructive estimation of fruit weight of strawberry using machine learning models, Agronomy 12 (2022). URL: https://www.mdpi.com/2073-4395/12/10/2487. doi:10.3390/agronomy12102487.