Solution To The Multicollinearity Problem In Ridge Regression Model

Bibliographic Details
Main Author: Hanan Moh. B. Duzan
Format: Thesis
Language: English
Description
Summary: In regression analysis, the presence of multicollinearity among independent variables is a common problem with serious undesirable effects on the analysis. One of its main effects on Ordinary Least Squares (OLS) is that the estimator produces large sampling variances, which can lead to the exclusion of important coefficients from the model. A number of methods have been developed to address this instability, the most common of which is ridge regression (RR). To address the problem of multicollinearity in regression analysis, this study proceeds in several parts. First, robust tests are applied to detect the presence of multicollinearity in the data under both the RR and OLS regression models, and the behavior of each model is examined. In addition, simulation studies of finite-sample behavior based on Monte Carlo simulation are considered. The researcher found evidence that, in the presence of multicollinearity, RR outperforms the commonly used OLS regression analysis. Second, the study proposes an alternative method for estimating the regression parameters in RR; some properties of, and statistical inference for, the parameters are also considered. To better understand the sampling behavior of the proposed method, Monte Carlo simulation studies were run. The proposed estimator yields unbiased estimates with small mean squared error (MSE) in the presence of multicollinearity, and the results indicate that it outperforms OLS models. Besides the standard deviation, VIF, SSE, MSE, β, and R² of the estimates, the accuracy of the parameter estimates was assessed to support these findings. The corresponding confidence intervals were also estimated, and a cross-validation analysis was performed for each model. Third, the study sought to identify the most suitable k value for RR in two- and four-variable regression models.
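The RR estimator discussed in the summary has the standard closed form β̂(k) = (X'X + kI)⁻¹X'y, which reduces to OLS at k = 0, and VIF is the usual screen for collinearity. As a minimal NumPy sketch (simulated data and function names are illustrative assumptions, not the thesis's own code or data), one might check VIFs and compare OLS with a ridge fit like this:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Two highly collinear predictors: x2 is x1 plus a little noise.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def vif(X):
    """Variance inflation factor for each column: 1 / (1 - R^2_j),
    where R^2_j comes from regressing column j on the others."""
    Xc = X - X.mean(axis=0)
    out = []
    for j in range(Xc.shape[1]):
        others = np.delete(Xc, j, axis=1)
        beta, *_ = np.linalg.lstsq(others, Xc[:, j], rcond=None)
        resid = Xc[:, j] - others @ beta
        r2 = 1.0 - resid.var() / Xc[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

def ridge(X, y, k):
    """Ridge estimator (X'X + kI)^-1 X'y; k = 0 recovers OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)
beta_rr = ridge(X, y, 1.0)

print("VIFs:", vif(X))      # large values (>10) flag collinearity
print("OLS :", beta_ols)
print("RR  :", beta_rr)     # shrunk relative to OLS
```

With strong collinearity the VIFs are far above the usual rule-of-thumb cutoff of 10, and the ridge coefficients have a smaller norm than the OLS ones, which is the variance-for-bias trade the abstract refers to.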
Simulation studies were implemented to examine the behavior of RR in each case, using 1,000 random data sets per simulation study. The p-variable linear regression models were fitted by the least-squares method. The simulation outcomes show a statistically significant relationship between k and R². The various models established were tested for goodness of fit using the coefficient of determination and cross-validation as criteria; the most appropriate model to describe the relation between k and R² is a multiple regression model. A second round of simulation analysis, performed with a sample size of 100, found that the equation for k does not differ with sample size. Finally, the researcher tested the performance of the proposed method on two examples of real data obtained from the Uniform Crime Reporting Program survey of National Violent Crimes in the USA over the period 1987-2014, assessing the proposed k value in both the two- and four-covariate cases. The methods discussed above were employed to re-analyze the data accordingly. The study concluded that RR outperforms OLS regression under all scenarios.
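The simulation design described above can be sketched in miniature: repeatedly generate collinear data sets, fit ridge over a grid of k, and record the resulting R². The parameters below (collinearity strength, k grid, true coefficients) are illustrative assumptions, not the thesis's actual protocol; the sketch only shows the mechanical k-R² relationship that such a simulation study would then model:

```python
import numpy as np

rng = np.random.default_rng(1)

def r2_ridge(X, y, k):
    """In-sample R^2 of a ridge fit with penalty k (centered data, no intercept)."""
    beta = np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / (y @ y)

ks = np.array([0.0, 0.1, 0.5, 1.0, 5.0])   # assumed grid of ridge constants
n_sims, n = 1000, 100                       # 1,000 data sets, as in the study
avg_r2 = np.zeros_like(ks)

for _ in range(n_sims):
    # Two-variable model with strong collinearity (assumed design).
    x1 = rng.normal(size=n)
    x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)
    X = np.column_stack([x1, x2])
    X -= X.mean(axis=0)
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
    y -= y.mean()
    avg_r2 += np.array([r2_ridge(X, y, k) for k in ks])

avg_r2 /= n_sims
print(dict(zip(ks, avg_r2)))   # average R^2 declines as k grows
```

In-sample R² is necessarily non-increasing in k, since the ridge penalty pulls the fit away from the least-squares optimum; a study of the k-R² relationship would regress one on the other across such simulated grids.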