The enhancement of normal ratio method through multiple imputation approach in estimating missing data with outliers for Peninsular Malaysian rainfall dataset / Siti Nur Zahrah Amin Burhanuddin

A complete rainfall dataset is essential for representing climatological characteristics precisely, especially in hydrological and meteorological studies. It also contributes to effective and efficient environmental management. However, rainfall data is highly vulnerable to the mis...

Bibliographic Details
Main Author: Amin Burhanuddin, Siti Nur Zahrah
Format: Thesis
Language: English
Published: 2020
Subjects: Data processing; Rain and rainfall
Online Access: https://ir.uitm.edu.my/id/eprint/60924/1/60924.pdf
id my-uitm-ir.60924
record_format uketd_dc
spelling my-uitm-ir.60924 2022-06-03T02:09:43Z 2020-10 Thesis https://ir.uitm.edu.my/id/eprint/60924/ text en public phd doctoral
institution Universiti Teknologi MARA
collection UiTM Institutional Repository
language English
advisor Mohd Deni, Sayang (Assoc. Prof. Dr.)
topic Data processing
Rain and rainfall
description A complete rainfall dataset is essential for representing climatological characteristics precisely, especially in hydrological and meteorological studies, and it contributes to effective and efficient environmental management. However, rainfall data are highly prone to missing values because of the dynamic nature of the climatic variable. The data are also subject to seasonal activity that introduces uncertainty and irregular variation in rainfall amounts, which gives rise to outliers in the dataset. Both problems degrade the quality of the rainfall dataset and, in turn, give users inaccurate information. This study therefore attempts to develop a practical and reliable approach for treating missing values in order to provide a good-quality dataset for the public domain. A spatial estimation method, the normal ratio method, was used to estimate the missing rainfall data. Various improvements to the method have been proposed, but little work has been done on making it robust enough to perform well on datasets that contain outliers. This study therefore proposes enhanced normal ratio methods for imputing missing values in daily rainfall datasets with outliers. Robust statistics (trimmed mean, median, and geometric median) were adopted in the proposed methods to make them less sensitive to outliers. The normal ratio method has commonly been implemented through a single imputation approach, which has the limitation of not accounting for the uncertainty in the missing values. This study therefore proposes a multiple imputation approach based on the block bootstrap, both to overcome that limitation and to improve on the existing multiple imputation approach implemented in the Amelia package.
The block bootstrap was first introduced in the proposed multiple imputation approach (named NRMI-Bboot) to improve performance on rainfall time series. Each estimation method was evaluated against five performance criteria at six levels of missing data (5%, 10%, 15%, 20%, 25%, and 30%) and three levels of outlying data (5%, 10%, and 15%) created in the dataset. Complete daily rainfall records spanning 40 years from 22 meteorological stations were analysed. Four target stations were selected to represent the main regions of Peninsular Malaysia (northwest, east, west, and southwest). The capability of the estimation methods was further verified using distribution fitting. Adopting robust statistics in the proposed estimation methods, together with the NRMI-Bboot approach, improved the estimation results, especially for datasets containing extreme outliers. The block bootstrap ensured that the original structure of the rainfall time series was preserved within each monsoon block and consequently produced more accurate estimates. These results indicate the advantages of the proposed estimation methods and multiple imputation approach in providing accurate imputed values for missing data in the Peninsular Malaysian daily rainfall dataset.
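The abstract does not give the formulas used in the thesis, so the following is only a minimal sketch of the general idea: the textbook normal ratio estimate scales each neighbouring station's observation by the ratio of long-term normals and averages the results, and a robust variant might swap that average for a median or trimmed mean. The function names, the trimming proportion, and all station values are hypothetical and chosen for illustration; the thesis's actual estimators may differ.

```python
from statistics import mean, median


def trimmed_mean(values, proportion=0.25):
    """Mean after dropping `proportion` of the values from each end (assumed < 0.5)."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return mean(kept)


def normal_ratio_estimate(target_normal, neighbor_normals, neighbor_values,
                          combine=mean):
    """Estimate a missing value at the target station from its neighbours.

    Each neighbour's observation is scaled by target_normal / neighbor_normal,
    then the scaled values are combined (mean in the classical method).
    """
    scaled = [
        (target_normal / normal) * value
        for normal, value in zip(neighbor_normals, neighbor_values)
    ]
    return combine(scaled)


# Hypothetical long-term normals and same-day observations at four neighbours;
# the last observation plays the role of an outlier.
normals = [10.0, 8.0, 12.0, 10.0]
values = [5.0, 4.0, 6.0, 50.0]

classical = normal_ratio_estimate(10.0, normals, values)
robust_median = normal_ratio_estimate(10.0, normals, values, combine=median)
robust_trimmed = normal_ratio_estimate(10.0, normals, values, combine=trimmed_mean)

print(classical, robust_median, robust_trimmed)
```

In this toy example the single outlying neighbour pulls the classical estimate well above the other scaled values, while the median and trimmed-mean variants stay close to them, which is the behaviour the robust statistics are meant to provide.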
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Amin Burhanuddin, Siti Nur Zahrah
title The enhancement of normal ratio method through multiple imputation approach in estimating missing data with outliers for Peninsular Malaysian rainfall dataset / Siti Nur Zahrah Amin Burhanuddin
granting_institution Universiti Teknologi MARA
granting_department Faculty of Computer and Mathematical Sciences
publishDate 2020
url https://ir.uitm.edu.my/id/eprint/60924/1/60924.pdf
_version_ 1783735186312134656