Robust diagnostics and variable selection procedure based on modified reweighted fast consistent and high breakdown estimator for high dimensional data
The reweighted fast, consistent and high breakdown (RFCH) estimator is a multivariate procedure used to estimate the robust location and scatter matrix. It is incorporated in the robust Mahalanobis distance to detect the presence of high leverage points in a dataset. The method showed excellent p...
Main Author: | Baba, Ishaq Abdullahi |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | Algorithms; Robust control |
Online Access: | http://psasir.upm.edu.my/id/eprint/104718/1/ISHAQ%20ABDULLAHI%20BABA%20-%20IR.pdf |
id |
my-upm-ir.104718 |
---|---|
record_format |
uketd_dc |
institution |
Universiti Putra Malaysia |
collection |
PSAS Institutional Repository |
language |
English |
advisor |
Midi, Habshah |
topic |
Algorithms Robust control |
description |
The reweighted fast, consistent and high breakdown (RFCH) estimator is a multivariate procedure for robustly estimating the location vector and scatter matrix. It is incorporated in the robust Mahalanobis distance to detect high leverage points in a dataset. The method has shown excellent performance compared to its competitors. However, it cannot be applied when the sample size is smaller than the number of predictor variables. To address this problem, several robust procedures for high dimensional data based on the RFCH algorithm are developed.
A modified reweighted fast, consistent and high breakdown (MRFCH) estimator for high dimensional data is developed. Within the RFCH algorithm, it computes the robust Mahalanobis distance from the diagonal elements of the scatter matrix only, rather than from the full matrix. The proposed method inherits the robustness properties of the original RFCH estimator. Simulation results and artificial data examples showed that the proposed MRFCH is more efficient and faster than the MRCD and OGK estimators.
Outlier detection and classification are critical issues that affect prediction accuracy if not handled correctly. The Mahalanobis distance (MD) is one of the most popular multivariate tools for detecting outlying observations. However, the traditional MD based on the classical mean and covariance rarely identifies all the multivariate outliers in a given dataset, which gives rise to the masking and swamping problems. Therefore, the robust location and covariance matrix based on the MRFCH are used in place of the classical estimators to tackle these problems. The proposed algorithm is applied to detect outliers in high dimensional data. Results from the simulation study and real data sets indicate that the proposed method has high detection power and a smaller misclassification error than the MRCD and MDP methods.
Classical correlation estimators, which rely on the sample means of the dependent and independent variables, are known to be affected by outliers. Therefore, a robust weighted correlation coefficient that reduces the effect of outliers is proposed. Weights based on the RD (MRFCH) are incorporated in the proposed robust correlation. The performance of the proposed method is illustrated using a simulation study and three real data sets: glass vessel data with 1920 variables, cardiomyopathy microarray data with 6319 variables, and octane data with 226 dimensions. The results show that the robust weighted correlation based on RD (MRFCH) is more powerful and efficient than the existing methods, irrespective of dimension, sample size, and contamination level.
Sure screening methods based on correlation are popular tools for selecting the most significant variables of the true model in sparse, high dimensional analysis. In practice, however, high leverage points may lead to misleading variable selection results. Therefore, a robust sure independence screening procedure based on the MRFCH weighted correlation algorithm is developed for high dimensional data. Results from the simulation study and real data sets indicate that the proposed MRFCHCS+LAD-SCAD estimator is the best of the methods compared in this study. |
format |
Thesis |
qualification_level |
Doctorate |
author |
Baba, Ishaq Abdullahi |
title |
Robust diagnostics and variable selection procedure based on modified reweighted fast consistent and high breakdown estimator for high dimensional data |
granting_institution |
Universiti Putra Malaysia |
publishDate |
2022 |
url |
http://psasir.upm.edu.my/id/eprint/104718/1/ISHAQ%20ABDULLAHI%20BABA%20-%20IR.pdf |
_version_ |
1783725836042502144 |
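
The description above mentions three ideas that a short example may help make concrete: a distance computed from only the diagonal of the scatter matrix (so it stays defined when the number of variables exceeds the sample size), weights derived from that distance, and a weighted correlation used to rank variables for screening. The Python sketch below illustrates those ideas only. It is not the MRFCH estimator from the thesis: the coordinatewise median and MAD stand in for the robust location and scatter, and the cutoff rule, the hard 0/1 weights, and all function names are assumptions introduced for illustration.

```python
import numpy as np


def diagonal_robust_distance(X):
    """Distance that uses only per-variable scales (a diagonal scatter).

    Assumption: coordinatewise median and MAD replace the robust
    location/scatter; this is NOT the thesis's MRFCH estimator.
    """
    X = np.asarray(X, dtype=float)
    center = np.median(X, axis=0)
    scale = 1.4826 * np.median(np.abs(X - center), axis=0)
    scale = np.where(scale > 0, scale, 1.0)  # guard against zero scale
    z = (X - center) / scale
    return np.sqrt(np.sum(z ** 2, axis=1))


def hard_weights(d, k=3.0):
    """Weight 0 for distances above median + k * MAD, else 1.

    The cutoff is a hypothetical choice made for this sketch.
    """
    med = np.median(d)
    mad = 1.4826 * np.median(np.abs(d - med))
    return (d <= med + k * mad).astype(float)


def weighted_correlation(x, y, w):
    """Pearson-type correlation with observation weights w."""
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(var_x * var_y)


def screen(X, y, w, keep):
    """Indices of the `keep` variables with largest |weighted correlation|."""
    scores = np.array(
        [abs(weighted_correlation(X[:, j], y, w)) for j in range(X.shape[1])]
    )
    return np.argsort(scores)[::-1][:keep]


# Synthetic data with more variables than observations (n < p).
rng = np.random.default_rng(0)
n, p = 30, 200
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] + 0.5 * rng.standard_normal(n)  # only variable 0 matters
X[:3] += 10.0   # three high leverage points
y[:3] = -20.0   # with responses that contradict the true model

w = hard_weights(diagonal_robust_distance(X))
top = screen(X, y, w, keep=5)
print("weights of planted outliers:", w[:3])
print("top-ranked variables:", top)
```

Because only per-variable scales are used, no matrix inversion is needed, which is why the distance can still be computed when n < p. The trade-off is that correlations between variables are ignored in the distance.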