Types of covariate and distribution effects on parameter estimates and goodness-of-fit test using clustering partitioning strategy for multinomial logistic regression / Hamzah Abdul Hamid

This thesis presents a simulation study on parameter estimation for binary and multinomial logistic regression, and the extension of the clustering partitioning strategy for goodness-of-fit test to multinomial logistic regression model. The motivation behind this study is influenced by two main fact...

Bibliographic Details
Main Author: Abdul Hamid, Hamzah
Format: Thesis
Language: English
Published: 2017
Subjects: Regression analysis; Correlation analysis; Spatial analysis (Statistics)
Online Access: https://ir.uitm.edu.my/id/eprint/66514/1/66514.pdf
id my-uitm-ir.66514
record_format uketd_dc
institution Universiti Teknologi MARA
collection UiTM Institutional Repository
language English
advisor Yap, Bee Wah
topic Regression analysis
Correlation analysis
Spatial analysis (Statistics)
description This thesis presents a simulation study on parameter estimation for binary and multinomial logistic regression, and extends the clustering partitioning strategy for goodness-of-fit testing to the multinomial logistic regression model. The study is motivated by two main factors. Firstly, parameter estimation is often sensitive to sample size and type of data, and simulation studies are useful for assessing how parameter estimation for binary and multinomial logistic regression behaves under various conditions. The first phase of this study covers the effect of covariate type, covariate distribution and sample size on parameter estimation for binary and multinomial logistic regression models. Data were simulated for different sample sizes, types of covariate (continuous, count, categorical) and distributions (normal or skewed for a continuous covariate). The simulation results show that the effect of skewed and categorical covariates diminishes as sample size increases. The parameter estimates for a normally distributed covariate are apparently less affected by sample size. For a multinomial logistic regression model with a single covariate, a sample size of at least 300 is required to obtain unbiased estimates when the covariate is positively skewed or categorical. A much larger sample size is required when covariates are negatively skewed.
In Phase 2, we investigate goodness-of-fit (GoF) tests for multinomial logistic regression, which are important for assessing whether the model fits the data. We investigated the Type I error and power of two GoF tests via a simulation study. The GoF test using a partitioning strategy (clustering) in the covariate space, XP*G, was compared with another test, Cg, which is based on grouping of predicted probabilities. The power of both tests was investigated when a quadratic term or an interaction term was omitted from the model. The proposed test XP*G shows good Type I error and ample power except for multinomial models with a highly skewed covariate distribution. Additionally, XP*G has good power in detecting omission of a continuous interaction term. Further simulation results showed that the partitioning strategy using hierarchical clustering with Canberra distance, %C,G, performs better than XP*G (hierarchical clustering with Euclidean distance) and XI*G (partitioning using k-medoids). The application to a real dataset confirmed the simulation results. The simulations and analyses were carried out using R, an open-source programming language for statistical computing and graphics.
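The Phase 1 design described in the abstract (simulate a covariate from a chosen distribution, generate outcomes from a known model, refit, and compare the estimate with the true value) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the thesis: the sketch is in Python rather than R, covers only the binary case with a single covariate, uses a shifted exponential as the skewed distribution, and picks the true coefficients (0.5 and 1.0), sample sizes and replication count arbitrarily. The function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, iters=25):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:            # information matrix is near-singular; stop
            break
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step for the 2x2 system
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-8:
            break
    return b0, b1


def draw_covariate(rng, dist):
    """Draw one covariate value with mean 0 and variance 1 (assumed distributions)."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    if dist == "pos_skewed":
        return rng.expovariate(1.0) - 1.0
    if dist == "neg_skewed":
        return 1.0 - rng.expovariate(1.0)
    raise ValueError(f"unknown distribution: {dist}")


def simulate_bias(dist, n, reps, seed=0, b0=0.5, b1=1.0):
    """Average (estimated slope - true slope) over `reps` simulated datasets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [draw_covariate(rng, dist) for _ in range(n)]
        y = [1 if rng.random() < sigmoid(b0 + b1 * xi) else 0 for xi in x]
        _, est = fit_logistic(x, y)
        total += est - b1
    return total / reps


if __name__ == "__main__":
    for dist in ("normal", "pos_skewed", "neg_skewed"):
        for n in (50, 500):
            bias = simulate_bias(dist, n, reps=100)
            print(f"{dist:>10}  n={n:<4}  mean slope bias = {bias:+.3f}")
```

Extending the sketch to the multinomial case, or to count and categorical covariates, would require a different fitting routine and is left out here.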
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Abdul Hamid, Hamzah
title Types of covariate and distribution effects on parameter estimates and goodness-of-fit test using clustering partitioning strategy for multinomial logistic regression / Hamzah Abdul Hamid
granting_institution Universiti Teknologi MARA (UiTM)
granting_department Faculty of Computer and Mathematical Sciences
publishDate 2017
url https://ir.uitm.edu.my/id/eprint/66514/1/66514.pdf