Estimating Tomato Fruit Masses through Image Processing and Artificial Intelligence

Elognissè Erasme Guérin AGOSSADOU1,†, Mahugnon Géraud AZEHOUN PAZOU1,∗,†, Régis Donald HONTINFINDE1,† and Ahmed Dooguy Kora2,†

1 Université nationale des sciences, Technologie, Ingénierie et Mathématiques (UNSTIM), POBox 486, SOGBO ALIHO, Abomey, Benin
2 EDMI, Cheikh Anta Diop University, Dakar, Senegal

Cotonou'24: Conférence Internationale des Technologies de l'Information et de la Communication de l'ANSALB, June 27–28, 2024, Cotonou, BENIN
∗ Corresponding author.
† These authors contributed equally.
agossadourin@gmail.com (E. E. G. AGOSSADOU); geraud.pazou@gmail.com (M. G. A. PAZOU); hontinfinde7@gmail.com (R. D. HONTINFINDE); ahmed.kora@esmt.sn (A. D. Kora)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Abstract

The integration of intelligent and connected production systems has made artificial intelligence (AI) a pivotal and indispensable component of society's digital transformation. Leveraging the vast amounts of data generated, AI can now make critical decisions to mitigate potential disasters. This study focuses on developing a method that combines computer vision and machine learning algorithms to estimate tomato weights. A dataset of tomato images was compiled, and a modified Mask R-CNN algorithm was employed to detect, segment, and extract individual fruit masks. Various regression models were evaluated to predict tomato weight based on visual features. The results on the test dataset indicate that this approach can estimate the number and total weight of tomatoes with approximately 93% accuracy. This research highlights the potential for automated monitoring of market garden crop yields through AI.

Keywords

tomato fruit mass estimation, image processing, prediction models, neural network, deep learning, pix2pix, rcnn

1. Introduction

Agriculture faces major challenges in sustainably feeding a growing global population, making accurate crop yield estimation essential for informed decision-making by farmers. While traditional methods such as field surveys can be helpful, they are often limited by issues of accuracy, cost, and time efficiency.

Tomato (Solanum lycopersicum) is a crucial vegetable crop globally, with a production of 183 million tonnes in 2018 [1]. Native to Central and South America, the tomato was introduced to Europe in the 16th century, quickly gaining popularity for its delicious, nutrient-rich fruits loaded with vitamins, minerals, and antioxidants [2]. Major producers include China, India, the United States, and Turkey, with significant cultivation also occurring in African nations such as Nigeria, Egypt, Morocco, and Algeria, primarily for local consumption [3].

Tomatoes are generally classified into two main varieties: determinate, which have limited growth, and indeterminate, which continue growing throughout their lifecycle. Whether cultivated in open fields or under protective covers like greenhouses, tomato farming requires careful irrigation due to the plant's deep taproot system. Furthermore, diseases such as downy mildew and Botrytis necessitate the use of appropriate cultivation practices and phytosanitary measures to ensure optimal yields.

Several approaches have been investigated in the literature to address the challenge of fruit weight estimation. For instance, Yamamoto et al. [4] developed a method to accurately count individual tomato fruits from images of plants grown in a laboratory setting. This method employed decision trees to analyze pixel color characteristics, achieving precise pixel-level segmentation. Post-processing was then applied to group pixels corresponding to fruits, enabling the extraction and counting of fruit centroids. The study reported a detection precision of 0.88 and recall of 0.80, demonstrating the method's efficacy in controlled environments for tomato detection and counting.

In Indonesia, the increasing demand for tomatoes necessitates efficient post-harvest handling. A study by Sari et al. [5] proposed a sorting system that categorizes tomatoes based on color, size, and weight using image processing with the OpenCV [6] library. The system sorts tomatoes into red, yellow, and green categories and measures dimensions by identifying the outermost points of the detected fruits. It utilizes a weight sensor for mass measurement. The prototype, which incorporates a webcam, Arduino, and conveyor system, achieved 100% accuracy in color detection and 95% in weight measurement, although dimensional measurement accuracy was only 5%.

Van Daalen et al. [7] examined the application of augmented reality (AR) in agriculture, focusing on detecting tomato ripeness using the 3D scanning capabilities of the HoloLens [8]. Their experimental setup, which included various tomato varieties, highlighted both the opportunities and challenges of using AR for hands-free tasks like training and harvesting in greenhouse environments.

Similarly, Lee et al. [9] proposed an artificial intelligence-based system for tomato detection and mass estimation, utilizing multi-class detection and instance-wise segmentation. By analyzing a tomato image dataset with a calibrated vision system, the study demonstrated a high correlation between fruit dimensions and mass. Their method achieved a mean absolute percentage error of 7.09%, showcasing the effectiveness of computer vision and machine learning for automating tasks such as yield monitoring and fruit sizing.

In another study, Nyalala et al. [10] developed seven regression models, including Support Vector Regression (SVR) [11] and artificial neural networks (ANNs) [12] with different training algorithms. These models effectively estimated fruit weight and volume, offering significant potential for improvements in fruit sorting and grading processes.

Basak et al. [13] introduced a non-destructive method for estimating strawberry fruit weight using machine learning models. By analyzing 900 samples from three different strawberry cultivars, they used image processing to calculate pixel numbers. Linear regression (LR) and non-linear SVR models were applied, resulting in training and testing accuracies of 96.3% and 89.6%, respectively.

This study focuses on applying recent advancements in computer vision, particularly object detection, and machine learning algorithms to estimate tomato weight from real-world images. The subsequent sections describe the equipment used, the structure and composition of the dataset, and the methodology employed to generate accurate quantitative measures such as projected surface area and total weight for detected fruits. Our findings demonstrate the effectiveness of this approach. Additionally, we discuss the challenges faced and propose recommendations for future research.

2. Material and Methods

2.1. Dataset

The data used in this study consists of tomato fruit images collected both online and in the field under real-world conditions. The dataset includes a total of 180 images obtained online and 100 images taken in the field, containing a total of 1143 tomato fruit instances. Table 1 illustrates the composition of our dataset.

Table 1
Dataset Overview

Source             Number of images    Number of fruit instances
Online             180                 1043
Field-collected    100                 100
Total              280                 1143

Images captured in the field helped to collect additional information such as actual fruit area and actual fruit weight, which enriches the dataset by providing accurate and relevant measurements for tomato fruit weight estimation. Table 2 presents additional insights concerning field-captured images. Upon analysis of the table, the average fruit weight is 35.30 g, with a standard deviation of 14.56 g. The average true area is 2673.48 mm², with a standard deviation of 873.68 mm². Quartile values provide insights into the distribution of the data. Thus, 25% of the fruits have a weight of less than 25.21 g, 50% have a weight of less than 37.00 g, and 75% have a weight of less than 43.49 g. For the actual surface area, 25% of fruits have an area less than 2,024.93 mm², 50% have an area less than 2,779.53 mm², and 75% have an area less than 3,219.12 mm².

Table 2
Additional information on images taken in the field

         weight (g)    real_surface (mm²)
count    100.000000    100.000000
mean     33.341900     2565.479377
std      13.884898     912.439551
min      9.930000      856.037079
25%      19.932500     1723.114236
50%      35.955000     2609.542487
75%      42.877500     3186.808853
max      63.760000     4931.281258

2.2. Methods

To estimate tomato fruit weights, we developed a four-step approach (see Figure 1).

2.2.1. Detection, segmentation and extraction of tomato fruit masks

To train our segmentation model, we prepared a dataset of tomato images, labeled in the COCO format. The dataset consisted of 180 images containing 1043 instances of tomatoes, sourced from both the internet and field photography, and annotated using the Roboflow platform. We employed the Mask R-CNN instance segmentation model through the Detectron2 framework, selecting the mask_rcnn_R_50_FPN_3x configuration developed by Facebook AI Research. This model, pre-trained on the COCO dataset, combines the Mask R-CNN architecture with a ResNet-50 backbone and Feature Pyramid Network (FPN) for high-performance, multi-scale object detection.

2.2.2. Projected Surface Area Estimation of Each Tomato

To evaluate the projected area of each tomato from images, a dataset was constructed, including individual images of tomatoes, their actual weight in grams, the total number of pixels in the image, the number of pixels corresponding to the tomato (obtained by instance segmentation), and the total area of the image in square meters, obtained by camera calibration.

The estimation of the projected area took place in two steps. First, the segmentation mask was used to calculate the area in pixels occupied by the tomato in the image. Then, a camera calibration converted this pixel area into an actual metric area, using a coin as a reference object. By photographing the tomatoes under the same conditions as the reference coin, the resulting conversion factor was used to convert the pixel area of each fruit into a measure of its actual projected area in metric units. This method uses a rule of three, where the actual surface area of the tomato ($A_{tomato}$) is estimated from the number of pixels corresponding to the tomato in the image ($P_{tomato}$), using the conversion factor $A_{ref}/P_{ref}$ established during calibration, where $A_{ref}$ and $P_{ref}$ are the known area and the pixel count of the reference object:

$$A_{tomato} = P_{tomato} \times \frac{A_{ref}}{P_{ref}} \qquad (1)$$

With this method, we were able to estimate the real surface area of each tomato in physical space from segmentation in image space, thanks to precise calibration using a reference object.

2.2.3. Tomato Mass Estimation

To estimate the weight of the tomatoes based on their projected surface area, we tested several regression models, including Simple Linear Regression (SLR), Multiple Linear Regression (MLR), and Partial Least Squares Regression (PLSR). These models aimed to establish a mathematical relationship between the surface area (independent variable) and the weight (dependent variable) of the tomatoes.
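The pixel-to-area conversion of Equation 1 and the area-to-weight regression described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names are hypothetical, the fit shown is an ordinary least-squares simple linear regression only, and every numeric value in the demo (coin area, pixel counts, weights) is made up for illustration.

```python
"""Sketch: Equation 1 (pixel area -> metric area) and a simple linear fit."""
from math import sqrt


def pixels_to_area(p_tomato, a_ref, p_ref):
    """Equation 1: A_tomato = P_tomato * (A_ref / P_ref)."""
    if p_ref <= 0:
        raise ValueError("reference pixel count must be positive")
    return p_tomato * (a_ref / p_ref)


def fit_simple_linear(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(y_true, y_pred):
    """Root Mean Square Error between observed and predicted values."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical calibration: a coin of 400 mm^2 covering 10,000 pixels.
    coin_area_mm2, coin_pixels = 400.0, 10_000

    # Hypothetical mask pixel counts and measured weights (g) for four fruits.
    mask_pixels = [45_000, 60_000, 75_000, 90_000]
    weights_g = [22.0, 31.5, 40.0, 49.5]

    # Step 1: convert each mask's pixel count to a projected area in mm^2.
    areas = [pixels_to_area(p, coin_area_mm2, coin_pixels) for p in mask_pixels]

    # Step 2: fit weight as a linear function of projected area.
    slope, intercept = fit_simple_linear(areas, weights_g)
    predicted = [slope * a + intercept for a in areas]

    print("areas (mm^2):", areas)
    print("slope, intercept:", slope, intercept)
    print("RMSE:", rmse(weights_g, predicted))
    print("R^2:", r_squared(weights_g, predicted))
```

In practice the fit would be computed on a training split and the metrics on held-out fruits; the demo evaluates on the same four points only to keep the sketch short.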
The performance of each model was evaluated on a validation set consisting of 20% of the total dataset, collected under real-world conditions. Standard metrics, such as Root Mean Square Error (RMSE) and the Coefficient of Determination ($R^2$), were employed to assess model accuracy. We also applied 10-fold cross-validation to each model to reduce the likelihood of overfitting. Figure 1 depicts the summary of the methodology adopted in this study.

Figure 1: Summary illustration of the methodology

3. Results and Discussion

3.1. Results

Figure 2 illustrates the model's accuracy, while Figure 3 depicts the evolution of the cost function.

Figure 2: Model accuracy

Figure 3: Evolution of the cost function

The performance of the model was evaluated on the test set consisting of 19 images containing a total of 149 tomato annotations. The Average Precision (AP) metric was used to quantify the model's ability to correctly detect and segment tomatoes under various conditions.

Table 3 presents the results obtained for the detection and instance segmentation tasks. We observe an average AP of 55.9% for detection and 54.6% for segmentation, averaged over IoU thresholds between 0.5 and 0.95. The model achieves better performance on large fruits (APl of 66.1% in detection) than on medium-sized ones (APm of 30.3%).

Table 3
Model results in terms of Average Precision

Metric          AP        AP50      AP75      APm       APl
Detection       55.901    74.083    62.361    30.294    66.144
Segmentation    54.591    73.763    61.112    24.978    64.943

These results confirm the model's effectiveness in detecting and segmenting tomatoes in real-world conditions. Further data annotation and model optimization are expected to enhance performance.

The projected surface area of each fruit was derived from the segmented mask by calculating the pixel area, then converting it to real-world units using camera calibration information as defined in Equation 1. This method achieved a precision of approximately 95%.

For tomato weight estimation, a subset of the dataset containing real-world images was used, which included precise data on both the actual weight of each tomato and its projected surface area. A mathematical relationship between the weight and projected area was established through the evaluation of several regression methods. The algorithms tested included Least Squares Regression (LSR), Multiple Linear Regression (MLR), and Support Vector Machines (SVM), and their performance was compared using cross-validation and Mean Square Error (MSE) as the evaluation metric. Table 4 highlights the performance metrics of the tested models.

Table 4
Performance metrics of different models

Model                    MSE           MAE         RSE          R2
Linear Regression        67.465310     5.959565    8.110772     0.614756
Lasso Regression         62.990660     5.775707    7.900871     0.659433
Ridge Regression         64.222324     5.820851    7.789839     0.662985
ElasticNet Regression    65.214001     5.919661    8.063410     0.534604
SVR                      81.623252     6.884133    8.980888     0.564414
Random Forest            67.078331     6.002012    8.102465     0.622985
AdaBoost Regression      76.441269     6.757964    8.621712     0.578526
KNeighbors Regression    68.750815     6.179380    8.225651     0.634068
Decision Tree            126.243306    8.132200    11.068062    0.322372

Among the evaluated models, Lasso Regression achieved the best performance, with an MAE of 5.776 g and an MSE of 62.99 g². The corresponding model equation, with $M$ the estimated weight in grams and $PA$ the projected area in mm², is:

$$M = 0.01327708 \times PA - 0.69821033 \qquad (2)$$

Table 5 presents the prediction results on the test dataset, where our model achieved a relative error of 7.09% in estimating the total weight. When applied in an autonomous field system, this method shows great potential to enhance yield estimation efficiency, helping farmers save time and reduce labor costs.

Table 5
Prediction results on the test set

Projected area (mm²)    Actual weight (g)    Estimated weight (g)    Absolute error (g)    Relative error (%)
3219.122984             48.370               42.042340               6.327660              13.081785
2566.503710             30.760               33.377463               2.617463              8.509306
3279.246427             38.690               42.840604               4.150604              10.727847
2635.552676             30.600               34.294231               3.694231              12.072651
1273.816970             105.590              16.214358               89.375642             84.644040
2733.490428             30.360               35.594558               5.234558              17.241629
2521.044293             28.530               32.773894               4.243894              14.875199
3122.755376             37.570               40.762860               3.192860              8.498430
3501.459234             50.850               45.790941               5.059059              9.948985
2535.848511             35.070               32.970451               2.099549              5.986738
3098.277947             41.740               40.437871               1.302129              3.119618
…                       …                    …                       …                     …
2782.959320             26.520               36.251361               9.731361              36.694423
2436.892034             33.990               31.656598               2.333402              6.864966
2810.053656             37.080               36.611095               0.468905              1.264578
3040.780757             44.260               39.674477               4.585523              10.360424
3229.282192             46.620               42.177225               4.442775              9.529762
Total                   1361.29              1257.726200             96.563799             7.09

3.2. Discussion

The study employed a multi-step methodology to estimate tomato fruit weights from images. First, a Mask R-CNN model, using the mask_rcnn_R_50_FPN_3x configuration, was trained on a dataset of 180 images containing 1043 tomato instances. After detection and segmentation, the projected surface area of each tomato was estimated using a calibrated conversion from pixel area to metric units, achieving approximately 95% accuracy. For weight estimation, several regression models were evaluated on a subset of real-world images with known weights and projected areas. Among the regression models evaluated, the Lasso Regression algorithm demonstrated superior performance in estimating tomato weights. This model achieved a Mean Absolute Error (MAE) of 5.776 g and a Mean Squared Error (MSE) of 62.99 g². For comparison, Lee et al. [9] reported a mean absolute percentage error of 7.09% for a similar tomato weight estimation task. When applied to the test dataset, our model achieved a relative error of 7.09% in estimating the total weight of tomatoes. These results demonstrate the potential of this combined approach for automated tomato yield estimation, although the ideal conditions of the study (fully visible fruits) suggest that further research is needed to address real-world challenges such as occlusion.

While this study yielded promising results, it is important to acknowledge its primary limitation: the experiments were conducted under idealized conditions that do not fully represent real-world agricultural environments. All tomatoes in the study were fully visible and unobstructed, which rarely occurs in actual fields, where fruits are often partially hidden by leaves, branches, or other fruits. This idealization may lead to overly optimistic performance estimates.

To bridge this gap and enhance the model's practical applicability, future research will focus on developing robust occlusion handling techniques, such as implementing advanced image processing algorithms for reconstructing partially obscured fruits or using ellipse fitting methods to estimate the full shape of partially visible tomatoes.

Additionally, creating more representative datasets that reflect the challenging conditions found in real agricultural settings, including various levels of occlusion and diverse growth stages, will be crucial. By addressing these limitations and training on more diverse and challenging datasets, future iterations of this system could significantly improve in accuracy and robustness, making it a more reliable tool for automated agricultural yield estimation in real-world scenarios.

4. Conclusion

This study introduced an approach for assessing tomato crop yields through the use of image processing, computer vision, and artificial intelligence techniques. The results align closely with the objectives of estimating both the quantity and total weight of fruits, highlighting the practical benefits of this methodology for farmers.

Looking ahead, future enhancements will focus on refining the approach by integrating multispectral imaging to improve data acquisition. Additionally, algorithmic advancements, including image generation and ellipse fitting techniques, will be employed to tackle challenges related to occlusion. These developments will enhance the model's scalability and robustness, facilitating large-scale deployment in real-world agricultural settings. The anticipated implementation of this approach in automated systems that utilize drones and ground-based robots presents exciting opportunities for digital agriculture, paving the way for precise, efficient, and automated yield estimation.

References

[1] Food and Agriculture Organization, Agricultural production statistics, n.d. Retrieved from https://www.fao.org/3/cc3751en/cc3751en.pdf.
[2] M. Dorais, D. Ehret, A. Papadopoulos, Tomato (Solanum lycopersicum) health components: From the seed to the consumer, Phytochemistry Reviews 7 (2008) 231–250. doi:10.1007/s11101-007-9085-x.
[3] WorldAtlas, The world's leading tomato producing countries, n.d. Retrieved from https://www.worldatlas.com/articles/which-are-the-world-s-leading-tomato-producing-countries.html.
[4] K. Yamamoto, W. Guo, Y. Yoshioka, S. Ninomiya, On plant detection of intact tomato fruits using image analysis and machine learning methods, Sensors 14 (2014) 12191–12206. URL: https://www.mdpi.com/1424-8220/14/7/12191. doi:10.3390/s140712191.
[5] M. I. Sari, R. Fajar, T. Gunawan, R. Handayani, The use of image processing and sensor in tomato sorting machine by color, size, and weight, JOIV: International Journal on Informatics Visualization (2022). URL: https://api.semanticscholar.org/CorpusID:250542375.
[6] OpenCV: Open source computer vision library, https://opencv.org/, n.d. Accessed: 2024-10-02.
[7] T. van Daalen, J. Peller, J. Balendonck, Determining fresh tomato weight using depth images from an AR headset, IFAC-PapersOnLine 55 (2022) 119–123. URL: https://www.sciencedirect.com/science/article/pii/S2405896322027586. doi:10.1016/j.ifacol.2022.11.125.
[8] Microsoft HoloLens, https://www.microsoft.com/fr-fr/hololens?msockid=1255574f41cb6082275f4248408c611d, n.d. Accessed: 2024-10-02.
[9] J.-S. Lee, H. Nazki, J. Baek, Y. Hong, M. hun Lee, Artificial intelligence approach for tomato detection and mass estimation in precision agriculture, Sustainability (2020). URL: https://api.semanticscholar.org/CorpusID:228852288.
[10] I. Nyalala, C. Okinda, Q. Chao, P. Mecha, T. Korohou, Z. Yi, S. Nyalala, Z. Jiayu, L. Chao, C. Kunjie, Weight and volume estimation of single and occluded tomatoes using machine vision, International Journal of Food Properties 24 (2021) 818–832. URL: https://doi.org/10.1080/10942912.2021.1933024. doi:10.1080/10942912.2021.1933024.
[11] F. Zhang, L. J. O'Donnell, Chapter 7 - Support vector regression, in: A. Mechelli, S. Vieira (Eds.), Machine Learning, Academic Press, 2020, pp. 123–140. URL: https://www.sciencedirect.com/science/article/pii/B9780128157398000079. doi:10.1016/B978-0-12-815739-8.00007-9.
[12] K. O'Shea, R. Nash, An introduction to convolutional neural networks, 2015. URL: https://arxiv.org/abs/1511.08458. arXiv:1511.08458.
[13] J. K. Basak, B. Paudel, N. E. Kim, N. C. Deb, B. G. Kaushalya Madhavi, H. T. Kim, Non-destructive estimation of fruit weight of strawberry using machine learning models, Agronomy 12 (2022). URL: https://www.mdpi.com/2073-4395/12/10/2487. doi:10.3390/agronomy12102487.