Prediction Of Oil Palm Yield For Smallholders Estates In Tropical Region Using Extra Trees Method

Bibliographic Details
Main Author: Khan, Nuzhat
Format: Thesis
Language: English
Published: 2023
Online Access: http://eprints.usm.my/60016/1/NUZHAT%20KHAN%20-%20TESIS%20cut.pdf
Description
Summary: Global food security and the sustainable use of natural resources rely heavily on the timely prediction of crop yields. Oil palm, the most profitable crop for oil production worldwide, requires accurate yield predictions to keep global demand and supply in balance. This thesis proposes a machine learning regression approach to predict the yield of oil palm fresh fruit bunches. The main objective is to develop a machine learning model, trained on actual data, that predicts long-term oil palm yield with high precision. The study uses data from multiple sources, including the Malaysia Palm Oil Board (MPOB), the Meteorological Department Malaysia (MET), and NASA. The proposed methodology is applied to site-specific data recorded across the entire state of Pahang, Malaysia. The dataset comprises 18 variables, including historical yield, soil, and weather variables. Statistical analysis was used to assess data quality and to extract agricultural information, and the correlation analysis reveals complex interdependencies among the factors that influence yield. Data exploration is followed by a preprocessing pipeline that converts the raw data into a form suitable for modeling: outlier treatment, normalization, feature selection, and splitting of the data into training, validation, and test sets. Automated model selection is then applied to the prepared data to identify the most appropriate prediction model.
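
The preprocessing steps named in the summary (outlier treatment, normalization, feature selection, data splitting) can be illustrated with a minimal sketch. It is not the thesis's implementation: the summary does not say which outlier rule, scaler, selection criterion, or split ratios were used, so the IQR clipping, min-max scaling, correlation threshold of 0.5, and 70/15/15 split below are all assumptions, as are the column names and the synthetic data.

```python
import random
import statistics


def clip_outliers_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]


def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_features(columns, target, threshold):
    """Keep columns whose absolute correlation with the target meets the threshold."""
    return [name for name, col in columns.items()
            if abs(pearson(col, target)) >= threshold]


def split_indices(n, train=0.70, val=0.15, seed=0):
    """Shuffle row indices and split them into train / validation / test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_val = round(n * train), round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Synthetic stand-in data (hypothetical; not the MPOB/MET/NASA records).
rng = random.Random(42)
n_rows = 200
rainfall = [rng.gauss(200.0, 40.0) for _ in range(n_rows)]
unrelated = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
ffb_yield = [0.01 * r + rng.gauss(0.0, 0.2) for r in rainfall]
rainfall[0] = 5000.0  # inject one extreme reading after the yield is computed

raw = {"rainfall_mm": rainfall, "unrelated": unrelated}

# Steps in the order the summary lists them.
cleaned = {name: min_max_normalize(clip_outliers_iqr(col))
           for name, col in raw.items()}
selected = select_features(cleaned, ffb_yield, threshold=0.5)
train, val, test = split_indices(n_rows)

print("selected features:", selected)
print("split sizes:", len(train), len(val), len(test))
```

The sketch applies feature selection before splitting because that is the order in which the summary lists the steps. A common alternative is to split first and compute the clipping bounds, scaling ranges, and correlations on the training rows only, so that no information from the validation or test rows influences the prepared features.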