=Paper=
{{Paper
|id=Vol-3058/paper32
|storemode=property
|title=Design And Development Of Machine Learning Model For Crop Yield Prediction
|pdfUrl=https://ceur-ws.org/Vol-3058/Paper-056.pdf
|volume=Vol-3058
|authors=Taman Kumar,Dr. Kiran Jyoti,Dr. Sandeep Kumar Singla
}}
==Design And Development Of Machine Learning Model For Crop Yield Prediction==
Taman Kumar, Kiran Jyoti, and Sandeep K. Singla
Guru Nanak Dev Engineering College (GNDEC), Ludhiana, Punjab, India
Abstract
Agriculture is one of the major sources of employment in India as well as a major contributor to
its GDP. Machine learning is a recent technology that can help the agriculture sector. This paper
focuses on using machine learning techniques to predict wheat crop yield. The regression algorithms
used are simple regression, gradient booster, polynomial regression, and random forest. Finally, the
predictions of each algorithm are compared with the actual yields.
Keywords
Crop yield prediction, machine learning, regression.
1. Introduction
A majority of India's population depends on agriculture for its livelihood, and almost all
industries depend on raw materials produced by agriculture. Agriculture and allied sectors contribute
15.4% of India's GDP. India is the second-largest producer and the seventh-largest exporter of
agricultural goods, and the boom in this sector dates from the Green Revolution of 1967. Crop
production depends on parameters such as rainfall, irrigation, temperature, climate conditions, seed
quality, consumption of NPK (nitrogen, phosphorus, potassium) fertilizers, and many more. Many changes
are required in the agricultural domain to strengthen the Indian economy (Ramesh et al. 2019).
Agricultural information can be extracted in two ways: manually, or by using computers and IT tools.
Manual methods, however, have several limitations:
1. Bias: Manually gathered information reflects one person's perspective. Every person has their
own perspective, so the information provided may not fit every situation.
2. Time delay: Delayed information is not useful.
3. Correctness: To err is human, so there is always a probability of mistakes.
4. Reliability: All of the above factors affect the reliability of manual methods.
On the other hand, technology-based methods are well known for their precision. The technologies
most commonly used in agriculture today are:
1. Machine learning.
2. Deep learning.
Machine learning: Machine learning is used in many domains, for example to predict customers'
shopping behavior in malls and to forecast stock-market trends, and it is also used in agriculture.
Agriculture involves many processes, such as irrigation scheduling, crop-disease management,
by-products, and transportation, all of which ultimately affect crop yield. Rather than addressing
these individual sub-processes, we focus on the main task, i.e., crop yield prediction. Crop yield
prediction is one of the challenging problems in precision agriculture, and many models have been
proposed and validated so far. The problem requires several datasets because crop yield depends on
many different factors, such as climate, weather, soil, use of fertilizers, and seed variety. Crop
yield prediction is therefore not a trivial task; it consists of several complicated steps. Current
crop yield prediction models can estimate yield, but better prediction performance is still
desirable (Klompenburg et al. 2020).
International Conference on Emerging Technologies: AI, IoT, and CPS for Science & Technology Applications, September 06–07, 2021, NITTTR Chandigarh, India
EMAIL: tamankumar0808@gmail.com (A. 1); kiranjyoti@gmail.com (A. 2); sandeepkumar.singla@gmail.com (A. 3)
©2021 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
2. Literature Review
A. Agricultural Information Extraction
1) Raorane and Kulkarni (2015) used data-mining tools in a crop management system, applying
regression algorithms. A drawback is that the model is not specified.
2) Kushwaha and Bhattachrya (2015) proposed a method for finding the crop best suited to a given
piece of land, using the Agro algorithm.
3) Santra et al. (2016) used artificial neural networks, decision trees, and regression analysis to
provide information about crops and help increase the yield rate. A drawback is that the method is
not clearly specified.
B. Crop Yield Estimation
1) Kumar et al. (2015) suggested a method for improving crop yield in which classification is used
and the parameters are compared. A drawback is that the accuracy and performance are not
satisfactory.
2) Babu and Babu (2016) gave a method that provides solutions to some farming problems, such as
water and fertilizer use. They also used the Agro algorithm, and accuracy is again a problem.
3) Jain et al. (2017) found a better sequence in which crops should be sown to obtain the maximum
yield. Beyond crop sequencing, they also used machine learning for irrigation and crop diseases.
4) Djodiltachoumy (2017) applied the k-means clustering algorithm to previous years' data and
predicted yield from that database. A drawback is that only a small amount of data was used and the
approach is suitable only for association rules.
5) Nigam et al. (2019) concluded that random forest regression gives the highest yield prediction
accuracy, that a simple recurrent neural network performs better for rainfall prediction, and that
LSTM is good for temperature prediction.
C. Machine Learning Algorithms
1) Khairunniza-Bejo et al. (2014) defined a method using artificial neural networks to help farmers
solve some of their problems. A drawback is that the proposed method is very time-consuming.
2) Ramesh and Vardhan (2015) used multiple linear regression to analyze and verify the database. A
drawback is the method's low accuracy.
3) Savla et al. (2015) suggested a framework using normalization, clustering, and classification to
identify crop yield rate zones based on attributes.
4) Sindhura et al. (2016) also used multiple linear regression to make predictions and support
decision-making in many sectors.
This review of the literature shows that crop yield estimation and agricultural information
extraction from ancillary as well as historical data remain open problems. Various machine learning
models and other algorithms have been used in the past for yield estimation.
3. Methodology
Figure 1: Flowchart of Proposed Methodology
The steps of the methodology are as follows:
Step 1: The datasets are collected and processed.
Step 2: Any impurities in the dataset are removed.
Step 3: The data is normalized if needed and may be reduced to a smaller volume.
Step 4: The data is converted into a supported format.
Step 5: The processed data is stored in a database.
Step 6: The chosen method is applied.
Step 7: The final results are collected.
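Step 3 does not name the normalization technique. As an illustration only, the sketch below assumes min-max scaling of one column to the range [0, 1]; the function name `min_max_normalize` is hypothetical, and the KNIME workflow behind the paper may have used a different method.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Linearly rescale one column to the range [0, 1].

    Assumption: min-max scaling; the paper does not name its method.
    A constant column is mapped to all zeros to avoid division by zero.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    print(min_max_normalize([2.0, 4.0, 6.0]))
```

In practice the minimum and maximum would be taken from the training partition only and reused on the test partition, so that no information about the test years leaks into training.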
A. Working of the Model:
i. Real-world datasets of parameters such as precipitation, wheat crop yield, NPK consumption, mean
temperature, relative humidity, surface pressure, and annual rainfall are collected and downloaded
from authoritative sources such as data.gov.in and power.larc.nasa.gov/data-access-viewer. The study
area is Punjab, India. The variables and their units of measure are given below in Table 1:
Table 1
Units of Measure
{| class="wikitable"
! Variable !! Unit of Measure
|-
| Precipitation || mm/day
|-
| Relative Humidity || %
|-
| Surface Pressure || kPa
|-
| Mean Temperature || °C
|-
| Mean Wind Speed || m/s
|-
| Earth Skin Temperature || °C
|-
| NPK Consumption || TNT
|}
ii. The collected data is preprocessed. Missing ('NA') values are filled with the average of the
values immediately above and below them in the same column.
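Item ii fills each missing value with the average of its neighbors above and below. A minimal plain-Python sketch follows, assuming a column is a list in year order with `None` marking 'NA'. The function name is hypothetical, and the handling of consecutive gaps and of gaps at the start or end of a column (nearest observed value on each side, or the single available one) is an assumption, since the paper only describes the simple case.

```python
from typing import List, Optional


def fill_na_with_neighbor_mean(column: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the mean of the nearest observed values above and below it.

    Assumption: for runs of missing values the nearest *observed* neighbors are used;
    at the edges of the column the single available neighbor is used.
    """
    n = len(column)
    filled = list(column)
    for i, value in enumerate(column):
        if value is not None:
            continue
        # Nearest observed value above (earlier row) and below (later row).
        above = next((column[j] for j in range(i - 1, -1, -1) if column[j] is not None), None)
        below = next((column[j] for j in range(i + 1, n) if column[j] is not None), None)
        neighbors = [v for v in (above, below) if v is not None]
        # A column with no observed values at all is left unchanged.
        filled[i] = sum(neighbors) / len(neighbors) if neighbors else None
    return filled
```

For example, `fill_na_with_neighbor_mean([10.0, None, 14.0])` returns `[10.0, 12.0, 14.0]`.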
iii. Feature selection is applied to extract the parameters that matter for the modeling framework.
The correlation between all pairs of parameters is computed, and the parameters that do not affect
crop yield are eliminated. The correlations are shown below:
Figure 2: Correlation Between Variables
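Item iii does not state the correlation measure or the cut-off used to drop parameters. The sketch below assumes Pearson's correlation coefficient with the crop yield and a hypothetical absolute-correlation threshold of 0.3; both `select_features` and the threshold are illustrative, not taken from the paper.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sd_x * sd_y)


def select_features(features: Dict[str, List[float]], target: List[float],
                    threshold: float = 0.3) -> List[str]:
    """Keep the features whose |r| with the target is at least `threshold` (assumed cut-off)."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]
```

For example, `select_features({"a": [1.0, 2.0, 3.0], "b": [1.0, 0.0, 1.0]}, [2.0, 4.0, 6.0])` returns `["a"]`. Note that a near-zero Pearson coefficient only rules out a linear relationship, so a parameter with a nonlinear effect on yield could be dropped by this filter.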
iv. The dataset is partitioned into training and testing sets: 80% of the data is used to train the
model and 20% to test it.
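The paper states an 80/20 split but not whether the rows were shuffled. Because the test predictions are reported for consecutive years (2011 to 2018), the sketch below assumes a chronological hold-out in which the most recent 20% of rows form the test set; `chronological_split` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(rows: Sequence[T], test_fraction: float = 0.2) -> Tuple[List[T], List[T]]:
    """Split time-ordered rows so that the last `test_fraction` of them form the test set.

    Assumes `rows` are already sorted by year, oldest first.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    n_test = max(1, round(len(rows) * test_fraction))
    cut = len(rows) - n_test
    return list(rows[:cut]), list(rows[cut:])
```

With a hypothetical 40 yearly records, this gives 32 training rows and 8 test rows. A chronological split keeps future years out of training; a shuffled split is simpler but lets the model see years that come after the ones it is tested on.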
v. Several machine learning algorithms, namely random forest trees, polynomial regression, gradient
boosting (GBM), multiple linear regression, and linear regression, are applied to the dataset to
predict the output.
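The paper runs its models in KNIME and does not report hyperparameters. To show in principle how the gradient booster, the best performer in Table 4, works, the sketch below implements least-squares gradient boosting with depth-1 regression trees (stumps) in plain Python. The tree depth, `n_estimators=100`, `learning_rate=0.1`, and all function names are assumptions; practical implementations typically use deeper trees and more options.

```python
from typing import List, Tuple

# (feature index, threshold, value if x <= threshold, value if x > threshold)
Stump = Tuple[int, float, float, float]
Model = Tuple[float, float, List[Stump]]  # (initial prediction, learning rate, stumps)


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Find the single split that minimizes the squared error of the residuals."""
    best_sse, best = float("inf"), None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[f] > threshold]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if sse < best_sse:
                best_sse, best = sse, (f, threshold, lv, rv)
    if best is None:  # no valid split (all rows identical): use a constant stump
        mean = sum(residuals) / len(residuals)
        best = (0, float("inf"), mean, mean)
    return best


def predict_stump(stump: Stump, row: List[float]) -> float:
    f, threshold, lv, rv = stump
    return lv if row[f] <= threshold else rv


def fit_gbm(X: List[List[float]], y: List[float],
            n_estimators: int = 100, learning_rate: float = 0.1) -> Model:
    """Least-squares gradient boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)  # start from the mean yield
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_estimators):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict_gbm(model: Model, row: List[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)
```

The model would be fitted with `fit_gbm` on the training rows and evaluated with `predict_gbm` on the test rows. The small learning rate makes each stump correct only part of the remaining error, which is the usual trade-off between more boosting rounds and less overfitting.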
B. Output
1. The results of the applied machine learning algorithms are compared to evaluate the models. The
results are given below in Table 2:
Table 2
Comparison of Predictions with Actual Results
{| class="wikitable"
! Actual Results (kg/ha) !! Random Forest Trees (kg/ha) !! Gradient Booster (kg/ha) !! Simple Regression (kg/ha) !! Polynomial Regression (kg/ha)
|-
| 4693 || 4152.03 || 4474.749 || 4507 || 3994.286
|-
| 5097 || 3943.56 || 4352.341 || 4507 || 3772.596
|-
| 4724 || 4149.2 || 4184.22 || 4179 || 3449.539
|-
| 5017 || 3945.25 || 3933.049 || 3853 || 3843.369
|-
| 4304 || 4001.91 || 4208.408 || 4179 || 4038.906
|-
| 4583 || 4277.22 || 4369.696 || 4221 || 3825.896
|-
| 5046 || 4160.33 || 4224.234 || 4221 || 4004.914
|-
| 5077 || 4233.55 || 4226.763 || 4207 || 3716.017
|}
2. The predicted and actual values for the years 2011 to 2018 are also shown below as bar and line
graphs:
Figure 3: Bar Graph Representation of Predictions with Actual Results
Figure 4: Line Graph Representation of Predictions with Actual Results
3. The performance evaluation measures of the applied algorithms, namely Mean Absolute Error (MAE),
Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE),
are given below in Table 3:
Table 3. Performance Evaluation Measures
{| class="wikitable"
! Type of Error !! Random Forest Trees !! Gradient Booster !! Simple Regression !! Polynomial Regression
|-
| Mean Absolute Error || 709.744 || 570.942 || 583.375 || 986.935
|-
| Mean Squared Error || 597,836.813 || 440,162.565 || 452,351.375 || 1,102,940.261
|-
| Root Mean Squared Error || 773.199 || 663.447 || 672.571 || 1050.21
|-
| Mean Absolute Percentage Error || 0.144 || 0.115 || 0.118 || 0.202
|}
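The four measures in Table 3 can be computed as follows. This is a plain-Python sketch with a hypothetical function name, `regression_errors`; MAPE is returned as a fraction, which is how Table 3 reports it. Applied to the Actual Results and Simple Regression columns of Table 2, it reproduces the Simple Regression column of Table 3 (MAE 583.375, MSE 452,351.375, RMSE ≈ 672.571, MAPE ≈ 0.118).

```python
import math
from typing import Dict, List


def regression_errors(actual: List[float], predicted: List[float]) -> Dict[str, float]:
    """MAE, MSE, RMSE and MAPE (as a fraction) for paired actual/predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE divides by the actual value, so actual yields must be non-zero.
    mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape}


if __name__ == "__main__":
    # Actual Results and Simple Regression columns of Table 2 (kg/ha)
    actual = [4693, 5097, 4724, 5017, 4304, 4583, 5046, 5077]
    simple = [4507, 4507, 4179, 3853, 4179, 4221, 4221, 4207]
    for name, value in regression_errors(actual, simple).items():
        print(f"{name}: {value:.3f}")
```

Because MSE squares each error, it is dominated by the worst years (for simple regression, the 1164 kg/ha miss in the fourth row), whereas MAE weights all eight test years equally.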
4. The accuracy of the applied models is given below in Table 4:
Table 4. Accuracy of Applied Models
{| class="wikitable"
! Random Forest Trees !! Gradient Booster !! Simple Regression !! Polynomial Regression
|-
| 85.6% || 88.5% || 88.2% || 79.8%
|}
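Table 4 does not state how accuracy is defined. The reported values are, however, consistent with subtracting the MAPE values of Table 3 from one, so the relation below is an inferred reading rather than the authors' stated formula. Here $y_i$ is the actual yield, $\hat{y}_i$ the predicted yield, and $n$ the number of test years (eight in Table 2):

```latex
\begin{aligned}
\text{MAPE} &= \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \\
\text{Accuracy} &= (1 - \text{MAPE}) \times 100\% \\
\text{Accuracy}_{\text{GB}} &= (1 - 0.115) \times 100\% = 88.5\%
\end{aligned}
```

The same relation gives 85.6% for random forest trees, 88.2% for simple regression, and 79.8% for polynomial regression, matching every entry of Table 4.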
4. Conclusion and Future Work
The results show that the gradient booster gives the most accurate predictions (88.5% accuracy and
the lowest MAE, MSE, RMSE, and MAPE), closely followed by simple regression (88.2%). The current
results were obtained using the KNIME software; our future work is to develop an application that
farmers can operate easily.
5. References
[1] Babu, T. Giri, and G. Anjan Babu. "Big Data Analytics to Produce Big Results in the
Agricultural Sector." (2016).
[2] Djodiltachoumy, S. "A Model for Prediction of Crop Yield." International Journal of
Computational Intelligence and Informatics 6, no. 4 (2017).
[3] Ghadge, Rushika, Juilee Kulkarni, Pooja More, Sachee Nene, and R. L. Priya. "Prediction of
crop yield using machine learning." Int. Res. J. Eng. Technol.(IRJET) 5 (2018).
[4] Huang, Jui-Chan, Kuo-Min Ko, Ming-Hung Shu, and Bi-Min Hsu. "Application and comparison
of several machine learning algorithms and their integration models in regression problems."
Neural Computing and Applications 32, no. 10 (2020): 5461-5469.
[5] Jain, Nishit, Amit Kumar, Sahil Garud, Vishal Pradhan, and Prajakta Kulkarni. "Crop selection
method based on various environmental factors using machine learning." International Research
Journal of Engineering and Technology (IRJET) 4, no. 2 (2017): 1530-1533.
[6] Kale, Shivani S., and Preeti S. Patil. "A Machine Learning Approach to Predict Crop Yield and
Success Rate." In 2019 IEEE Pune Section International Conference (PuneCon), pp. 1-5. IEEE,
2019.
[7] Khairunniza-Bejo, Siti, Samihah Mustaffha, and Wan Ishak Wan Ismail. "Application of
artificial neural network in predicting crop yield: A review." Journal of Food Science and
Engineering 4, no. 1 (2014): 1.
[8] Kumar, Rakesh, M. P. Singh, Prabhat Kumar, and J. P. Singh. "Crop Selection Method to
maximize crop yield rate using machine learning technique." In 2015 international conference on
smart technologies and management for computing, communication, controls, energy and
materials (ICSTM), pp. 138-145. IEEE, 2015.
[9] Kushwaha, Ashwani Kumar, and Sweta Bhattachrya. "Crop yield prediction using Agro
Algorithm in Hadoop." International Journal of Computer Science and Information Technology
& Security (IJCSITS) 5, no. 2 (2015): 271-274.
[10] Medar, Ramesh, Vijay S. Rajpurohit, and Shweta Shweta. "Crop yield prediction using machine
learning techniques." In 2019 IEEE 5th International Conference for Convergence in Technology
(I2CT), pp. 1-5. IEEE, 2019.
[11] Mishra, Subhadra, Debahuti Mishra, and Gour Hari Santra. "Applications of machine learning
techniques in agricultural crop production: a review paper." Indian Journal of Science and
Technology 9, no. 38 (2016): 1-14.
[12] Nigam, Aruvansh, Saksham Garg, Archit Agrawal, and Parul Agrawal. "Crop yield prediction
using machine learning algorithms." In 2019 Fifth International Conference on Image
Information Processing (ICIIP), pp. 125-130. IEEE, 2019.
[13] Ramesh, D., and B. Vishnu Vardhan. "Analysis of crop yield prediction using data mining
techniques." International Journal of research in engineering and technology 4, no. 1 (2015): 47-
473.
[14] Raorane, A. A., and R. V. Kulkarni. "Application of DataMining tool to crop management
system." Russian Journal of Agricultural and Socio-Economic Sciences 37, no. 1 (2015).
[15] Rajak, Rohit Kumar, Ankit Pawar, Mitalee Pendke, Pooja Shinde, Suresh Rathod, and
Avinash Devare. "Crop recommendation system to maximize crop yield using machine learning
technique." International Research Journal of Engineering and Technology 4, no. 12 (2017):
950-953.
[16] Savla, Anshal, Himtanaya Bhadada, Parul Dhawan, and Vatsa Joshi. "Application of machine
learning techniques for yield prediction on delineated zones in precision agriculture." IJNCAA
(2015): 48.
[17] Son, Nguyen-Thanh, Chi-Farn Chen, Cheng-Ru Chen, Horng-Yuh Guo, Youg-Sing Cheng, Shu-
Ling Chen, Huan-Sheng Lin, and Shih-Hsiang Chen. "Machine learning approaches for rice crop
yield predictions using time-series satellite data in Taiwan." International Journal of Remote
Sensing 41, no. 20 (2020): 7868-7888.
[18] Sindhura, D., B. Navya Krishna, K. Sai Prasanna Lakshmi, B. Mallikarjun Rao, and J. Rajendra
Prasad. "Effects of Climate Changes on Agriculture." International Journal of Advanced Research
in Computer Science and Software Engineering (2016).
[19] Van Klompenburg, Thomas, Ayalew Kassahun, and Cagatay Catal. "Crop yield prediction using
machine learning: A systematic literature review." Computers and Electronics in Agriculture 177
(2020): 105709.