Solution To The Multicollinearity Problem In Ridge Regression Model

In regression analysis, multicollinearity among the independent variables is a common problem with serious undesirable effects on the analysis. One of its main effects on ordinary least squares (OLS) is that the estimator generates large sampling variances that can lead to exclusi...

Bibliographic Details
Main Author: Hanan Moh. B. Duzan
Format: Thesis
Language: en_US
id my-usim-ddms-13423
record_format uketd_dc
spelling my-usim-ddms-13423 2024-05-29T06:53:06Z Solution To The Multicollinearity Problem In Ridge Regression Model Hanan Moh. B. Duzan Universiti Sains Islam Malaysia 2017-08 Thesis en_US https://oarep.usim.edu.my/handle/123456789/13423 https://oarep.usim.edu.my/bitstreams/d29c5dc1-1ec2-4c18-b379-128719f693f8/download 8a4605be74aa9ea9d79846c1fba20a33 Multicollinearity Ordinary-Least Squares (OLS), Ridge Regression. Regression analysis.
institution Universiti Sains Islam Malaysia
collection USIM Institutional Repository
language en_US
topic Multicollinearity
Regression analysis.
description In regression analysis, multicollinearity among the independent variables is a common problem with serious undesirable effects on the analysis. One of its main effects on ordinary least squares (OLS) is that the estimator generates large sampling variances, which can lead to the exclusion of important coefficients from the model. A number of methods have been developed to tackle this instability, the most common of which is ridge regression (RR). To address the problems of multicollinearity in regression analysis, this study is organized in several parts. Firstly, robust tests are applied to detect multicollinearity in the data under both the RR and OLS regression models, and the behavior of these models is examined. In addition, the finite-sample behavior of the models is studied by Monte Carlo simulation. The researcher found evidence that, in the presence of multicollinearity in the data, RR outperforms the commonly used OLS regression analysis. Secondly, this study proposes an alternative method for estimating the regression parameters in RR. Some properties of, and statistical inference for, the parameters are also considered. To better understand the sample behavior of the proposed method, the study ran Monte Carlo simulation studies. The proposed estimator yields unbiased estimates with small MSE in the presence of multicollinearity in the data. The results indicate that the proposed estimator outperforms OLS models. Besides the standard deviation, VIF, SSE, MSE, β, and R² of the estimates, the accuracy of the parameter estimates was also assessed to support these findings. In addition, the corresponding confidence intervals were estimated and a cross-validation analysis was performed for each model. Thirdly, the study attempted to identify the most relevant k value for RR in two- and four-variable regression models. Simulation studies were implemented to examine the behavior of RR in each case, using 1,000 random data sets in each simulation study. The p-variable linear regression models were fit by the least-squares method. The simulation outcomes show a statistically significant relationship between k and R². The various models established were tested for goodness of fit using the coefficient of determination and cross-validation as criteria. The most appropriate model to describe the relation between k and R² is a multiple regression model. A second round of simulation analysis was performed using a sample size of 100. The study found that the equation for k does not differ with sample size. Finally, the researcher tested the performance of the proposed method on two examples of real data obtained from the Uniform Crime Reporting Program survey of National Violent Crimes in the USA over the period 1987-2014, in order to assess the proposed k value in both the two- and the four-covariate cases. The researcher employed the methods discussed above and re-analyzed the data accordingly. The study concluded that RR outperforms OLS regression under all scenarios.
format Thesis
author Hanan Moh. B. Duzan
title Solution To The Multicollinearity Problem In Ridge Regression Model
granting_institution Universiti Sains Islam Malaysia
_version_ 1812444758204743680
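
The comparison summarized in the description, RR against OLS under multicollinearity in a Monte Carlo setting, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the two-predictor design, the correlation rho = 0.99, the true coefficients (1, 1), the sample size, and in particular the ridge constant k = 1.0, which is an arbitrary value and not the k proposed by the author. It uses the textbook ridge estimator (X'X + kI)^-1 X'y and the usual VIF definition, the diagonal of the inverse correlation matrix.

```python
import numpy as np

def ols(X, y):
    # Least-squares solution of y = X b (no intercept; X is centered).
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(X, y, k):
    # Textbook ridge estimator: (X'X + k I)^{-1} X'y.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def vif(X):
    # Variance inflation factors: diagonal of the inverse
    # correlation matrix of the predictors.
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def simulate(n=50, rho=0.99, k=1.0, reps=1000, seed=0):
    # Hypothetical Monte Carlo design (illustrative values only).
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 1.0])
    cov = np.array([[1.0, rho], [rho, 1.0]])
    err_ols = 0.0
    err_ridge = 0.0
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize predictors
        y = X @ beta + rng.normal(size=n)
        err_ols += np.sum((ols(X, y) - beta) ** 2)
        err_ridge += np.sum((ridge(X, y, k) - beta) ** 2)
    # Average squared estimation error of each estimator.
    return err_ols / reps, err_ridge / reps

if __name__ == "__main__":
    mse_ols, mse_ridge = simulate()
    print("estimation MSE, OLS:  ", mse_ols)
    print("estimation MSE, ridge:", mse_ridge)
```

In this particular design the true coefficient vector lies along the well-determined direction of the predictors, so the ridge penalty adds very little bias while sharply reducing variance. Other choices of coefficients or k can behave differently, which is why the choice of k is itself a subject of study.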