Robust variable selection methods for large-scale data in the presence of multicollinearity, autocorrelated errors and outliers

The robust correlation coefficient based on a robust multivariate location and scatter matrix, such as the Fast Minimum Covariance Determinant (Fast MCD), is not a feasible option for high-dimensional data because it is time-consuming to compute. To overcome this problem, robust adjusted Winsorization correlat...

Bibliographic Details
Main Author: Uraibi, Hassan S.
Format: Thesis
Language: English
Published: 2016
Subjects: Robust statistics; Outliers (Statistics); Multicollinearity
Online Access: http://psasir.upm.edu.my/id/eprint/69762/1/IPM%202016%205%20-%20IR.pdf
id my-upm-ir.69762
record_format uketd_dc
spelling my-upm-ir.69762 2019-10-29T06:54:25Z 2016-06 Thesis http://psasir.upm.edu.my/id/eprint/69762/ text en public doctoral
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
topic Robust statistics
Outliers (Statistics)
Multicollinearity
description The robust correlation coefficient based on a robust multivariate location and scatter matrix, such as the Fast Minimum Covariance Determinant (Fast MCD), is not a feasible option for high-dimensional data because it is time-consuming to compute. To overcome this problem, the robust adjusted Winsorization correlation (Adj.Winso.cor) has been put forward. Unfortunately, Adj.Winso.cor yields very poor results in the presence of multivariate outliers. Hence, we propose a robust multivariate correlation matrix based on the Reweighted Fast Consistent and High breakdown (RFCH) estimator. The findings show that RFCH.cor is more robust than Adj.Winso.cor in the presence of multivariate outliers. Forward selection (FS) is a very effective variable selection procedure for choosing a parsimonious subset of covariates from a large number of candidate covariates. However, FS is not robust to outliers. A robust forward selection method (FS.Winso), based on partial correlations derived from Maronna’s bivariate M-estimator of the scatter matrix and on the adjusted Winsorization pairwise correlation, has been introduced in the literature to overcome the problem of outliers. Because FS.Winso is not robust to multivariate outliers, we develop a Robust Forward Selection algorithm based on the RFCH correlation coefficient (RFS.RFCH). The results of our study indicate that RFS.RFCH is more efficient than FS and FS.Winso. The existing Robust-LARS based on the Winsorization correlation (RLARS-Winsor) has the drawback that it is not robust in the presence of multivariate outliers. Hence, a Robust-LARS (RLARS-RFCH) based on the √n-consistent multivariate RFCH correlation matrix is developed. The proposed method is computationally efficient and outperforms RLARS-Winsor. The all-possible-subsets algorithm is greedy, and it is inefficient and unstable in the presence of autocorrelated errors and outliers. To overcome this selection instability, a stability selection approach has been put forward to enhance the performance of the single-split variable selection method. Unfortunately, the classical stability selection procedure is very sensitive to outliers and serially correlated errors. A stability procedure based on the RFCH estimator is therefore developed. The results of the study show that our proposed Robust Multi Split based on RFCH successfully and consistently selects the correct variables in the final model. Thus far, no variable selection procedure in the literature deals with the problem of a high magnitude of multicollinearity in the presence of outliers. Hence, Robust Non-Grouped variable selection (RNGVS.RFCH) for data with high multicollinearity and outliers is developed. The results indicate that the proposed RNGVS.RFCH method is able to correctly select the important variables in the final model. Not much research has focused on the problem of large data in the presence of outliers and autocorrelated errors. In this situation, the existing Elastic-Net and RE-Net methods are not capable of selecting the important variables in the final model. Thus, a new method that we call before and after elastic-net (BAE-Net) regression is proposed. The Reweighted Multivariate Normal (RMVN) algorithm is incorporated in the BAE-Net algorithm. BAE-Net is found to do a credible job of selecting the correct important variables in the final model.
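The abstract describes forward selection driven by partial correlations, so that a robust correlation matrix can be substituted for the classical one. The sketch below illustrates that general idea only and is not the thesis's RFS.RFCH algorithm: as a stand-in for the RFCH estimator it uses plain univariate Winsorization (clipping each variable at median ± c·MADN), which is also simpler than the adjusted Winsorization the abstract mentions. All function names and the tuning constant `c=2.0` are assumptions made for illustration.

```python
import math
import random
import statistics


def winsorize(values, c=2.0):
    """Clip values to median +/- c * MADN, where MADN = 1.4826 * MAD."""
    med = statistics.median(values)
    madn = 1.4826 * statistics.median([abs(v - med) for v in values])
    lo, hi = med - c * madn, med + c * madn
    return [min(max(v, lo), hi) for v in values]


def pearson(x, y):
    """Classical Pearson correlation (assumes neither input is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_matrix(columns, robust=True):
    """Pairwise correlation matrix; Winsorize each column first if robust."""
    cols = [winsorize(col) if robust else list(col) for col in columns]
    p = len(cols)
    return [[1.0 if i == j else pearson(cols[i], cols[j]) for j in range(p)]
            for i in range(p)]


def forward_select(R, response, k):
    """Select k predictors using only the correlation matrix R.

    Each step picks the candidate with the largest absolute partial
    correlation with the response given the variables chosen so far.
    Assumes no pair of variables is perfectly correlated.
    """
    R = [row[:] for row in R]  # work on a copy so the input is not mutated
    candidates = [j for j in range(len(R)) if j != response]
    selected = []
    for _ in range(k):
        best = max(candidates, key=lambda j: abs(R[j][response]))
        selected.append(best)
        candidates.remove(best)
        # Partial the chosen variable out of every remaining pair using
        # r_ab.j = (r_ab - r_aj * r_bj) / sqrt((1 - r_aj^2) * (1 - r_bj^2)).
        rest = candidates + [response]
        updated = {}
        for a in rest:
            for b in rest:
                if a != b:
                    denom = math.sqrt((1 - R[a][best] ** 2) * (1 - R[b][best] ** 2))
                    updated[(a, b)] = (R[a][b] - R[a][best] * R[b][best]) / denom
        for (a, b), value in updated.items():
            R[a][b] = value
    return selected


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    x = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(4)]
    y = [3 * x[0][i] + 2 * x[2][i] + rng.gauss(0, 1) for i in range(n)]
    y[:5] = [500.0] * 5  # a few gross outliers in the response
    R = corr_matrix(x + [y], robust=True)
    print(forward_select(R, response=4, k=2))
```

Because `forward_select` takes only a correlation matrix, any other robust estimate of that matrix could be passed in its place without changing the selection loop.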
format Thesis
qualification_level Doctorate
author Uraibi, Hassan S.
title Robust variable selection methods for large-scale data in the presence of multicollinearity, autocorrelated errors and outliers
granting_institution Universiti Putra Malaysia
publishDate 2016
url http://psasir.upm.edu.my/id/eprint/69762/1/IPM%202016%205%20-%20IR.pdf
_version_ 1747812724586315776