=Paper=
{{Paper
|id=Vol-3789/Paper7
|storemode=property
|title=Estimating tomato fruit masses using image processing and artificial intelligence
|pdfUrl=https://ceur-ws.org/Vol-3789/Paper7.pdf
|volume=Vol-3789
|authors=Elognissè Erasme Guérin AGOSSADOU,Mahugnon Géraud AZEHOUN PAZOU,Régis Donald HONTINFINDE,Ahmed Dooguy KORA
|dblpUrl=https://dblp.org/rec/conf/cita2/AgossadouPHK24
}}
==Estimating tomato fruit masses using image processing and artificial intelligence==
Estimating Tomato Fruit Masses through Image Processing and
Artificial Intelligence
Elognissè Erasme Guérin AGOSSADOU¹,†, Mahugnon Géraud AZEHOUN PAZOU¹,∗,†, Régis Donald HONTINFINDE¹,† and Ahmed Dooguy Kora²,†
¹ Université nationale des sciences, Technologie, Ingénierie et Mathématiques (UNSTIM), POBox 486, SOGBO ALIHO, Abomey, Benin
² EDMI, Cheikh Anta Diop University, Dakar, Senegal
Abstract
The integration of intelligent and connected production systems has positioned artificial intelligence (AI) as a pivotal component in
society’s digital transformation, becoming indispensable. Leveraging the vast amounts of data generated, AI can now make critical
decisions to mitigate potential disasters. This study focuses on developing a method that combines computer vision and machine
learning algorithms to estimate tomato weights. A dataset of tomato images was compiled, and a modified Mask R-CNN algorithm
was employed to detect, segment, and extract individual fruit masks. Various regression models were evaluated to predict tomato
weight based on visual features. The results on the test dataset indicate that this approach can estimate the number and total weight of
tomatoes with approximately 93% accuracy. This research highlights the potential for automated monitoring of market garden crop
yields through AI.
Keywords
tomato fruit mass estimation, image processing, prediction models, Neural network, deep learning, pix2pix, rcnn
Cotonou’24: Conférence Internationale des Technologies de l’Information et de la Communication de l’ANSALB, June 27–28, 2024, Cotonou, BENIN
∗ Corresponding author.
† These authors contributed equally.
agossadourin@gmail.com (E. E. G. AGOSSADOU); geraud.pazou@gmail.com (M. G. A. PAZOU); hontinfinde7@gmail.com (R. D. HONTINFINDE); ahmed.kora@esmt.sn (A. D. Kora)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
1. Introduction
Agriculture faces major challenges in sustainably feeding a growing global population, making accurate crop yield estimation essential for informed decision-making by farmers. While traditional methods such as field surveys can be helpful, they are often limited by issues of accuracy, cost, and time efficiency.
Tomato (Solanum lycopersicum) is a crucial vegetable crop globally, with a production of 183 million tonnes in 2018 [1]. Native to Central and South America, the tomato was introduced to Europe in the 16th century, quickly gaining popularity for its delicious, nutrient-rich fruits loaded with vitamins, minerals, and antioxidants [2]. Major producers include China, India, the United States, and Turkey, with significant cultivation also occurring in African nations such as Nigeria, Egypt, Morocco, and Algeria, primarily for local consumption [3].
Tomatoes are generally classified into two main varieties: determinate, which have limited growth, and indeterminate, which continue growing throughout their lifecycle. Whether cultivated in open fields or under protective covers like greenhouses, tomato farming requires careful irrigation due to the plant’s deep taproot system. Furthermore, challenges such as pest infestations and diseases like downy mildew and Botrytis necessitate the use of appropriate cultivation practices and phytosanitary measures to ensure optimal yields.
Several approaches have been investigated in the literature to address the challenge of fruit weight estimation. For instance, Yamamoto et al. [4] developed a method to accurately count individual tomato fruits from images of plants grown in a laboratory setting. This method employed decision trees to analyze pixel color characteristics, achieving precise pixel-level segmentation. Post-processing was then applied to group pixels corresponding to fruits, enabling the extraction and counting of fruit centroids. The study reported a detection precision of 0.88 and recall of 0.80, demonstrating the method’s efficacy in controlled environments for tomato detection and counting.
In Indonesia, the increasing demand for tomatoes necessitates efficient post-harvest handling. A study by Sari et al. [5] proposed a sorting system that categorizes tomatoes based on color, size, and weight using image processing with the OpenCV [6] library. The system sorts tomatoes into red, yellow, and green categories and measures dimensions by identifying the outermost points of the detected fruits. It utilizes a weight sensor for mass measurement. The prototype, which incorporates a webcam, Arduino, and conveyor system, achieved 100% accuracy in color detection and 95% in weight measurement, although dimensional measurement accuracy was only 5%.
Van Daalen et al. [7] examined the application of augmented reality (AR) in agriculture, focusing on detecting tomato ripeness using the 3D scanning capabilities of the HoloLens [8]. Their experimental setup, which included various tomato varieties, highlighted both the opportunities and challenges of using AR for hands-free tasks like training and harvesting in greenhouse environments.
Similarly, Lee et al. [9] proposed an artificial intelligence-based system for tomato detection and mass estimation, utilizing multi-class detection and instance-wise segmentation. By analyzing a tomato image dataset with a calibrated vision system, the study demonstrated a high correlation between fruit dimensions and mass. Their method achieved a mean absolute percentage error of 7.09%, showcasing the effectiveness of computer vision and machine learning for automating tasks such as yield monitoring and fruit sizing.
In another study, Nyalala et al. [10] developed seven regression models, including Support Vector Regression (SVR) [11] and artificial neural networks (ANNs) [12] with different training algorithms. These models effectively estimated fruit weight and volume, offering significant potential for improvements in fruit sorting and grading processes.
Basak et al. [13] introduced a non-destructive method for estimating strawberry fruit weight using machine learning models. By analyzing 900 samples from three different strawberry cultivars, they used image processing to calculate pixel numbers. Linear regression (LR) and non-linear SVR models were applied, resulting in training and testing accuracies of 96.3% and 89.6%, respectively.
This study focuses on applying recent advancements in computer vision, particularly object detection, and machine learning algorithms to estimate tomato weight from real-world images. The subsequent sections describe the equipment used, the structure and composition of the dataset, and the methodology employed to generate accurate quantitative measures such as projected surface area and total weight for detected fruits. Our findings demonstrate the effectiveness of this approach. Additionally, we discuss the challenges faced and propose recommendations for future research.
2. Material and Methods
2.1. Dataset
The data used in this study consists of tomato fruit images collected both online and in the field under real-world conditions. The dataset includes a total of 180 images obtained online and 100 images taken in the field, containing a total of 1143 tomato fruit instances. Table 1 illustrates the composition of our dataset.

Table 1
Dataset Overview
Source             Number of images    Number of fruit instances
Online             180                 1043
Field-collected    100                 100
Total              280                 1143

Images captured in the field also provided additional information such as actual fruit area and actual fruit weight, which enriches the dataset with accurate and relevant measurements for tomato fruit weight estimation. Table 2 presents additional insights concerning field-captured images. Upon analysis of the table, the average fruit weight is 35.30 g, with a standard deviation of 14.56 g. The average true area is 2673.48 mm², with a standard deviation of 873.68 mm². Quartile values provide insights into the distribution of the data. Thus, 25% of the fruits have a weight of less than 25.21 g, 50% have a weight of less than 37.00 g, and 75% have a weight of less than 43.49 g. For the actual surface area, 25% of fruits have an area less than 2,024.93 mm², 50% have an area less than 2,779.53 mm², and 75% have an area less than 3,219.12 mm².

Table 2
Additional information on images taken in the field
         weight (g)    real_surface (mm²)
count    100.000000    100.000000
mean     33.341900     2565.479377
std      13.884898     912.439551
min      9.930000      856.037079
25%      19.932500     1723.114236
50%      35.955000     2609.542487
75%      42.877500     3186.808853
max      63.760000     4931.281258

2.2. Methods
To estimate tomato fruit weights, we developed a four-step approach (see Figure 1).
2.2.1. Detection, segmentation and extraction of tomato fruit masks
To train our segmentation model, we prepared a dataset of tomato images, labeled in the COCO format. The dataset consisted of 180 images containing 1043 instances of tomatoes, sourced from both the internet and field photography, and annotated using the Roboflow platform.
We employed the Mask R-CNN instance segmentation model through the Detectron2 framework, selecting the mask_rcnn_R_50_FPN_3x configuration developed by Facebook AI Research. This model, pre-trained on the COCO dataset, combines the Mask R-CNN architecture with a ResNet-50 backbone and Feature Pyramid Network (FPN) for high-performance, multi-scale object detection.
2.2.2. Projected Surface Area Estimation of Each Tomato
To evaluate the projected area of each tomato from images, a dataset was constructed, including individual images of tomatoes, their actual weight in grams, the total number of pixels in the image, the number of pixels corresponding to the tomato (obtained by semantic segmentation), and the total area of the image in square meters, obtained by camera calibration.
The estimation of the projected area took place in two steps. First, the segmentation mask was used to calculate the area in pixels occupied by the tomato in the image. Then, a camera calibration converted this pixel area into an actual metric area, using a coin as a reference object. By photographing the tomatoes under the same conditions as the reference coin, the resulting conversion factor was used to convert the pixel area of each fruit into a measure of its actual projected area in metric units. This method uses a rule of three, where the actual surface area of the tomato ($A_{tomato}$) is estimated based on the number of pixels corresponding to the tomato in the image ($P_{tomato}$), using the conversion factor established during calibration, $A_{ref} / P_{ref}$, where $A_{ref}$ is the known area of the reference object and $P_{ref}$ is the number of pixels it occupies:
$$A_{tomato} = P_{tomato} \times \frac{A_{ref}}{P_{ref}} \qquad (1)$$
With this method, we were able to estimate the real surface area of each tomato in physical space from segmentation in image space, thanks to precise calibration using a reference object.
2.2.3. Tomato Mass Estimation
To estimate the weight of the tomatoes based on their projected surface area, we tested several regression models, including Simple Linear Regression (SLR), Multiple Linear Regression (MLR), and Partial Least Squares Regression (PLSR). These models aimed to establish a mathematical relationship between the surface area (independent variable) and the weight (dependent variable) of the tomatoes.
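
As an illustration of the calibration in Equation 1 and of the simplest of the regression models named above, the following minimal sketch converts mask pixel counts to metric areas and fits a least-squares line. It is not the authors' implementation: the function names, the coin size, the pixel counts, and the weights are all hypothetical values chosen for the example.

```python
import math


def pixel_area_to_metric(p_tomato, a_ref, p_ref):
    """Equation 1 (rule of three): A_tomato = P_tomato * A_ref / P_ref."""
    if p_ref <= 0:
        raise ValueError("the reference object must cover at least one pixel")
    return p_tomato * a_ref / p_ref


def fit_simple_linear_regression(areas, weights):
    """Ordinary least-squares fit of weight = slope * area + intercept."""
    n = len(areas)
    if n < 2 or n != len(weights):
        raise ValueError("need at least two (area, weight) pairs")
    mean_a = sum(areas) / n
    mean_w = sum(weights) / n
    s_aa = sum((a - mean_a) ** 2 for a in areas)
    if s_aa == 0:
        raise ValueError("all areas are identical; the slope is undefined")
    s_aw = sum((a - mean_a) * (w - mean_w) for a, w in zip(areas, weights))
    slope = s_aw / s_aa
    intercept = mean_w - slope * mean_a
    return slope, intercept


# Hypothetical calibration: a coin of 23 mm diameter covering 5000 pixels.
COIN_AREA_MM2 = math.pi * (23.0 / 2) ** 2
COIN_PIXELS = 5000

# Hypothetical mask sizes (pixels) and scale readings (grams).
mask_pixels = [21000, 30500, 38000, 44500]
weights_g = [22.4, 33.0, 41.8, 49.1]

areas_mm2 = [pixel_area_to_metric(p, COIN_AREA_MM2, COIN_PIXELS) for p in mask_pixels]
slope, intercept = fit_simple_linear_regression(areas_mm2, weights_g)
print(f"weight = {slope:.5f} * area + {intercept:.3f}")
```

The conversion only holds when the fruit and the reference object are photographed at the same distance and orientation, which is the condition stated in Section 2.2.2.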
The performance of each model was evaluated on a validation set consisting of 20% of the total dataset, collected under real-world conditions. Standard metrics, such as Root Mean Square Error (RMSE) and the Coefficient of Determination ($R^2$), were employed to assess model accuracy. We also applied 10-fold cross-validation to each model to reduce the likelihood of overfitting.
Figure 1 depicts the summary of the methodology adopted in this study.
Figure 1: Summary illustration of the methodology
3. Results and Discussion
3.1. Results
Figure 2 illustrates the model’s accuracy, while Figure 3 depicts the evolution of the cost function.
Figure 2: Model accuracy
Figure 3: Evolution of the cost function
The performance of the model was evaluated on the test set consisting of 19 images containing a total of 149 tomato annotations. The Average Precision (AP) metric was used to quantify the model’s ability to correctly detect and segment tomatoes under various conditions.
Table 3 presents the results obtained for the detection and segmentation tasks. We observe an average AP of 55.9% for detection and 54.6% for segmentation across IoU thresholds between 0.5 and 0.95. The model achieves better performance on large fruits (APl of 66.1% in detection) than on medium-sized ones (APm of 30.3%).

Table 3
Model results in terms of Average Precision
Metric          AP        AP50      AP75      APm       APl
Detection       55.901    74.083    62.361    30.294    66.144
Segmentation    54.591    73.763    61.112    24.978    64.943

These results confirm the model’s effectiveness in detecting and segmenting tomatoes in real-world conditions. Further data annotation and model optimization are expected to enhance performance.
The projected surface area of each fruit was derived from the segmented mask by calculating the pixel area, then converting it to real-world units using camera calibration information as defined in Equation 1. This method achieved a precision of approximately 95%.
For tomato weight estimation, a subset of the dataset containing real-world images was used, which included precise data on both the actual weight of each tomato and its projected surface area. A mathematical relationship between the weight and projected area was established through the evaluation of several regression methods. The algorithms tested included Least Squares Regression (LSR), Multiple Linear Regression (MLR), and Support Vector Machines (SVM), and their performance was compared using cross-validation and Mean Square Error (MSE) as the evaluation metric.
Table 4 highlights the performance metrics of the tested models.
Among the evaluated models, Lasso Regression achieved the best performance, with an MAE of 5.776 and an MSE of 62.99. The corresponding model equation is given in Equation 2.

Table 4
Performance metrics of different models
Model                    MSE           MAE         RSE          R²
Linear Regression        67.465310     5.959565    8.110772     0.614756
Lasso Regression         62.990660     5.775707    7.900871     0.659433
Ridge Regression         64.222324     5.820851    7.789839     0.662985
ElasticNet Regression    65.214001     5.919661    8.063410     0.534604
SVR                      81.623252     6.884133    8.980888     0.564414
Random Forest            67.078331     6.002012    8.102465     0.622985
AdaBoost Regression      76.441269     6.757964    8.621712     0.578526
KNeighbors Regression    68.750815     6.179380    8.225651     0.634068
Decision Tree            126.243306    8.132200    11.068062    0.322372

Table 5
Prediction results on the test set
Projected area (mm²)    Actual weight (g)    Estimated weight (g)    Absolute error (g)    Relative error (%)
3219.122984             48.370               42.042340               6.327660              13.081785
2566.503710             30.760               33.377463               2.617463              8.509306
3279.246427             38.690               42.840604               4.150604              10.727847
2635.552676             30.600               34.294231               3.694231              12.072651
1273.816970             105.590              16.214358               89.375642             84.644040
2733.490428             30.360               35.594558               5.234558              17.241629
2521.044293             28.530               32.773894               4.243894              14.875199
3122.755376             37.570               40.762860               3.192860              8.498430
3501.459234             50.850               45.790941               5.059059              9.948985
2535.848511             35.070               32.970451               2.099549              5.986738
3098.277947             41.740               40.437871               1.302129              3.119618
…                       …                    …                       …                     …
2782.959320             26.520               36.251361               9.731361              36.694423
2436.892034             33.990               31.656598               2.333402              6.864966
2810.053656             37.080               36.611095               0.468905              1.264578
3040.780757             44.260               39.674477               4.585523              10.360424
3229.282192             46.620               42.177225               4.442775              9.529762
Total                   1361.29              1257.726200             96.563799             7.09
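
The per-row columns of Table 5 can be reproduced from the coefficients reported in Equation 2. The sketch below is only an illustration under that assumption: the function names are hypothetical, the three sample rows are copied from Table 5, and the units (mm² for area, grams for weight) are taken from the surrounding tables.

```python
# Coefficients as reported in Equation 2 (M in grams, PA in mm^2).
SLOPE = 0.01327708
INTERCEPT = -0.69821033


def predict_mass(projected_area_mm2):
    """Estimated weight from projected area, following Equation 2."""
    return SLOPE * projected_area_mm2 + INTERCEPT


def absolute_error(actual_g, estimated_g):
    """Absolute difference between measured and estimated weight."""
    return abs(actual_g - estimated_g)


def relative_error_percent(actual_g, estimated_g):
    """Absolute error expressed as a percentage of the measured weight."""
    return 100.0 * absolute_error(actual_g, estimated_g) / actual_g


# First three rows of Table 5: (projected area, actual weight).
rows = [
    (3219.122984, 48.370),
    (2566.503710, 30.760),
    (3279.246427, 38.690),
]

for area, actual in rows:
    estimated = predict_mass(area)
    print(
        f"{area:12.3f} {actual:7.3f} {estimated:7.3f} "
        f"{absolute_error(actual, estimated):6.3f} "
        f"{relative_error_percent(actual, estimated):6.2f}"
    )
```

Running the same functions over every row and summing the actual and estimated weights gives the kind of aggregate comparison shown in the Total row.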
$$M = 0.01327708 \times PA - 0.69821033 \qquad (2)$$
where $M$ is the estimated weight in grams and $PA$ is the projected area in mm².
Table 5 presents the prediction results on the test dataset, where our model achieved a relative error of 7.09% in estimating the total weight. When applied in an autonomous field system, this method shows great potential to enhance yield estimation efficiency, helping farmers save time and reduce labor costs.
3.2. Discussion
The study employed a multi-step methodology to estimate tomato fruit weights from images. First, a Mask R-CNN model, using the mask_rcnn_R_50_FPN_3x configuration, was trained on a dataset of 180 images containing 1043 tomato instances. After detection and segmentation, the projected surface area of each tomato was estimated using a calibrated conversion from pixel area to metric units, achieving approximately 95% accuracy. For weight estimation, several regression models were evaluated on a subset of real-world images with known weights and projected areas. Among the regression models evaluated, the Lasso Regression algorithm demonstrated superior performance in estimating tomato weights. This model achieved a Mean Absolute Error (MAE) of 5.776 g and a Mean Squared Error (MSE) of 62.99 g². Our model outperformed the approach described by Lee et al. [9], which reported an MAE of 7.09 grams for a similar tomato weight estimation task.
When applied to the test dataset, this model achieved a relative error of 7.09% in estimating the total weight of tomatoes. These results demonstrate the potential of this combined approach for automated tomato yield estimation, although the ideal conditions of the study (fully visible fruits) suggest that further research is needed to address real-world challenges such as occlusion.
While this study yielded promising results, it’s important to acknowledge its primary limitation: the experiments were conducted under idealized conditions that do not fully represent real-world agricultural environments. All tomatoes in the study were fully visible and unobstructed, which rarely occurs in actual fields where fruits are often partially hidden by leaves, branches, or other fruits. This idealization may lead to overly optimistic performance estimates.
To bridge this gap and enhance the model’s practical applicability, future research will focus on developing robust occlusion handling techniques, such as implementing advanced image processing algorithms for reconstructing partially obscured fruits or using ellipse fitting methods to estimate the full shape of partially visible tomatoes.
Additionally, creating more representative datasets that reflect the challenging conditions found in real agricultural settings, including various levels of occlusion and diverse growth stages, will be crucial. By addressing these limitations and training on more diverse and challenging datasets, future iterations of this system could significantly improve in accuracy and robustness, making it a more reliable tool for automated agricultural yield estimation in real-world scenarios.
4. Conclusion
This study introduced an approach for assessing tomato crop yields through the use of image processing, computer vision, and artificial intelligence techniques. The results align closely with the objectives of estimating both the quantity and total weight of fruits, highlighting the practical benefits of this methodology for farmers.
Looking ahead, future enhancements will focus on refining the approach by integrating multispectral imaging to improve data acquisition. Additionally, algorithmic advancements, including image generation and ellipse fitting techniques, will be employed to tackle challenges related to occlusion. These developments will enhance the model’s scalability and robustness, facilitating large-scale deployment in real-world agricultural settings. The anticipated implementation of this approach in automated systems that utilize drones and ground-based robots presents exciting opportunities for digital agriculture, paving the way for precise, efficient, and automated yield estimation.
References
[1] Food and Agriculture Organization, Agricultural production statistics, n.d. Retrieved from https://www.fao.org/3/cc3751en/cc3751en.pdf.
[2] M. Dorais, D. Ehret, A. Papadopoulos, Tomato (Solanum lycopersicum) health components: From the seed to the consumer, Phytochemistry Reviews 7 (2008) 231–250. doi:10.1007/s11101-007-9085-x.
[3] WorldAtlas, The world’s leading tomato producing countries, n.d. Retrieved from https://www.worldatlas.com/articles/which-are-the-world-s-leading-tomato-producing-countries.html.
[4] K. Yamamoto, W. Guo, Y. Yoshioka, S. Ninomiya, On plant detection of intact tomato fruits using image analysis and machine learning methods, Sensors 14 (2014) 12191–12206. URL: https://www.mdpi.com/1424-8220/14/7/12191. doi:10.3390/s140712191.
[5] M. I. Sari, R. Fajar, T. Gunawan, R. Handayani, The use of image processing and sensor in tomato sorting machine by color, size, and weight, JOIV: International Journal on Informatics Visualization (2022). URL: https://api.semanticscholar.org/CorpusID:250542375.
[6] OpenCV: Open source computer vision library, https://opencv.org/, n.d. Accessed: 2024-10-02.
[7] T. van Daalen, J. Peller, J. Balendonck, Determining fresh tomato weight using depth images from an AR headset, IFAC-PapersOnLine 55 (2022) 119–123. URL: https://www.sciencedirect.com/science/article/pii/S2405896322027586. doi:10.1016/j.ifacol.2022.11.125.
[8] Microsoft HoloLens, https://www.microsoft.com/fr-fr/hololens?msockid=1255574f41cb6082275f4248408c611d, n.d. Accessed: 2024-10-02.
[9] J.-S. Lee, H. Nazki, J. Baek, Y. Hong, M. hun Lee, Artificial intelligence approach for tomato detection and mass estimation in precision agriculture, Sustainability (2020). URL: https://api.semanticscholar.org/CorpusID:228852288.
[10] I. Nyalala, C. Okinda, Q. Chao, P. Mecha, T. Korohou, Z. Yi, S. Nyalala, Z. Jiayu, L. Chao, C. Kunjie, Weight and volume estimation of single and occluded tomatoes using machine vision, International Journal of Food Properties 24 (2021) 818–832. URL: https://doi.org/10.1080/10942912.2021.1933024. doi:10.1080/10942912.2021.1933024.
[11] F. Zhang, L. J. O’Donnell, Chapter 7 - Support vector regression, in: A. Mechelli, S. Vieira (Eds.), Machine Learning, Academic Press, 2020, pp. 123–140. URL: https://www.sciencedirect.com/science/article/pii/B9780128157398000079. doi:10.1016/B978-0-12-815739-8.00007-9.
[12] K. O’Shea, R. Nash, An introduction to convolutional neural networks, 2015. URL: https://arxiv.org/abs/1511.08458. arXiv:1511.08458.
[13] J. K. Basak, B. Paudel, N. E. Kim, N. C. Deb, B. G. Kaushalya Madhavi, H. T. Kim, Non-destructive estimation of fruit weight of strawberry using machine learning models, Agronomy 12 (2022). URL: https://www.mdpi.com/2073-4395/12/10/2487. doi:10.3390/agronomy12102487.