Abstract:
Agriculture is the fundamental engine of Ethiopia's economy and employs the majority of
the Ethiopian people, most of whom are smallholder farmers practicing subsistence farming.
These farmers produce pulses such as faba bean, haricot bean and pea, which contribute to
smallholder income as higher-value crops than cereals in Ethiopia. Pulse crop yield depends
primarily on climatic conditions, diseases and pests, and geographical and biological factors;
adverse conditions in any of these can reduce crop production.
Currently, farmers predict pulse crop yield from long-term experience, by inspecting the
condition of a particular field, and by traditional statistical analysis. However, these methods
are subjective and rely on insufficient ground observation, so substantial inaccuracies can
occur, resulting in inaccurate predictions of pulse crop yield. Thus, this
research develops several predictive models using supervised machine learning techniques
to predict future pulse crop yield and thereby address these problems. Predicting pulse crop
yield early is critical for planning and for policy decisions such as import and export, storage,
pricing and marketing. The aim of this study is to develop
a predictive model for pulse crop yield using supervised machine learning techniques, with
Hadiya Zone as the case study. The researchers used Random Forest, Extreme Gradient Boosting,
Decision Tree, K-Nearest Neighbor and Polynomial Regression algorithms. Data analysis and
implementation were carried out with the Anaconda software distribution, which includes
Python IDLE, Jupyter Notebook and Spyder. The performance of these models was evaluated by
using performance metrics such as R-squared (R²) and mean squared error (MSE), together with
cross-validation. When the five models were compared, Random Forest performed best, with an
R² of 0.9711 and an MSE of 0.4126. Therefore, Random Forest Regression is the most effective of
the evaluated models for pulse crop yield prediction. Promisingly, this model can support the
agricultural sector in effective decision and policy making.
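The evaluation procedure described above (k-fold cross-validation scored with R² and MSE) can be sketched in a few lines. This is a minimal, standard-library illustration, not the authors' code: it uses a from-scratch K-Nearest Neighbor regressor (one of the five algorithms named) as the model, and the two features (rainfall in hundreds of mm, mean temperature in °C), the yield formula and all data are hypothetical synthetic values chosen only to make the example runnable.

```python
import math
import random


def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_X, train_y, x, k=3):
    """Predict yield as the mean target of the k nearest training rows (Euclidean)."""
    nearest = sorted((math.dist(row, x), target) for row, target in zip(train_X, train_y))
    return sum(target for _, target in nearest[:k]) / k


def k_fold_scores(X, y, n_splits=5, k=3, seed=0):
    """Shuffle once, split into n_splits folds, and return (R2, MSE) per held-out fold."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in indices if i not in held_out]
        train_y = [y[i] for i in indices if i not in held_out]
        test_X = [X[i] for i in fold]
        test_y = [y[i] for i in fold]
        preds = [knn_predict(train_X, train_y, x, k) for x in test_X]
        scores.append((r2_score(test_y, preds), mean_squared_error(test_y, preds)))
    return scores


# Hypothetical synthetic data: (rainfall in hundreds of mm, temperature in deg C) -> yield.
rng = random.Random(42)
X = [(rng.uniform(6, 14), rng.uniform(14, 24)) for _ in range(100)]
y = [1.5 * rain - 0.2 * temp + rng.gauss(0, 0.5) for rain, temp in X]

scores = k_fold_scores(X, y, n_splits=5, k=3)
for fold_no, (r2, mse) in enumerate(scores, start=1):
    print(f"fold {fold_no}: R2={r2:.3f} MSE={mse:.3f}")
mean_r2 = sum(r2 for r2, _ in scores) / len(scores)
print(f"mean R2={mean_r2:.3f}")
```

Averaging the per-fold scores gives a less optimistic estimate than a single train/test split. In practice, the same comparison across all five algorithms would more likely use scikit-learn, for example `sklearn.model_selection.cross_val_score(model, X, y, cv=5, scoring="r2")` (or `scoring="neg_mean_squared_error"`), with the from-scratch KNN here replaced by the library estimators.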