Development and modification of H-statistic with winsorized approach means

Student’s t-test and the ANOVA F-test are the classical statistical tests for comparing two or more independent groups. Both are powerful tests when the data are normally distributed and the variances are homogeneous. However, these conditions are often difficult to meet in real-life data, which will affect...

Bibliographic Details
Main Author: Teh, Kian Wooi
Format: Thesis
Language: eng
Published: 2017
Subjects:
Online Access: https://etd.uum.edu.my/6991/1/s811121_01.pdf
https://etd.uum.edu.my/6991/2/s811121_02.pdf
id my-uum-etd.6991
record_format uketd_dc
institution Universiti Utara Malaysia
collection UUM ETD
language eng
eng
advisor Abdullah, Suhaida
Md. Yusof, Zahayu
topic QA273-280 Probabilities
Mathematical statistics
description Student’s t-test and the ANOVA F-test are the classical statistical tests for comparing two or more independent groups. Both are powerful tests when the data are normally distributed and the variances are homogeneous. However, these conditions are often difficult to meet in real-life data, and violating them affects the control of Type I error rates and reduces the statistical power of the tests. The H-statistic is a robust statistic, but it performs well only under non-normality. This statistic was previously developed with the MOM estimator, denoted as MOM-H. Therefore, in this study, two modified H-statistics based on means obtained with a Winsorizing approach are proposed to handle violations of both assumptions. The proposed statistics are the H-statistic with the Winsorized mean (WM) and the H-statistic with the adaptive Winsorized mean (AWM), denoted as WM-H and AWM-H, respectively. With this modification, the tests perform better not only under non-normality but also under heterogeneity of variances. The approach uses predetermined Winsorization percentages of 15% and 25%. The WM Winsorizes symmetrically, while the AWM Winsorizes adaptively according to the shape of the distribution, based on the hinge estimators HQ and HQ₁. The WM-H statistic consists of 15WM-H and 25WM-H, whereas the AWM-H statistic comprises 15WHQ-H, 25WHQ-H, 15WHQ₁-H and 25WHQ₁-H. The performance of the proposed tests is evaluated using Type I error rates and power of test in a simulation study. All results from the proposed tests are compared with the original H-statistic, MOM-H and the classical statistical tests. The findings indicate that 15WHQ-H performs best in the two-group case, especially under heavy-tailed distributions. Under skewed distributions, WM-H performs better than the other tests and is comparable to MOM-H. Overall, the proposed tests give better results than MOM-H and the classical statistical tests under certain conditions. The proposed tests are also validated using a real dataset.
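The symmetric Winsorization described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function name `winsorized_mean`, the rule g = floor(proportion × n) for the number of values replaced in each tail, and the sample data are all assumptions made for illustration. The adaptive (HQ and HQ₁) variants and the H-statistic itself are not reproduced here.

```python
import math


def winsorized_mean(values, proportion):
    """Symmetric Winsorized mean (illustrative sketch, not the thesis code).

    The g smallest values are replaced by the (g+1)-th smallest value and the
    g largest values by the (g+1)-th largest value, where
    g = floor(proportion * n) is an assumed convention. The mean of the
    modified sample is returned.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("values must not be empty")
    g = math.floor(proportion * n)          # values replaced in each tail
    low, high = data[g], data[n - g - 1]    # nearest retained neighbours
    clipped = [min(max(x, low), high) for x in data]
    return sum(clipped) / n


# Hypothetical sample with one extreme value in the upper tail.
sample = [2, 4, 5, 6, 7, 8, 9, 10, 12, 95]
raw_mean = sum(sample) / len(sample)
mean_15 = winsorized_mean(sample, 0.15)   # 15% Winsorization
mean_25 = winsorized_mean(sample, 0.25)   # 25% Winsorization
print(raw_mean, mean_15, mean_25)
```

In this made-up sample the single extreme value pulls the ordinary mean well above the bulk of the data, while both Winsorized means stay close to the centre. That is the behaviour the 15% and 25% settings are meant to provide.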
format Thesis
qualification_name masters
qualification_level Master's degree
author Teh, Kian Wooi
title Development and modification of H-statistic with winsorized approach means
granting_institution Universiti Utara Malaysia
granting_department Awang Had Salleh Graduate School of Arts & Sciences
publishDate 2017
url https://etd.uum.edu.my/6991/1/s811121_01.pdf
https://etd.uum.edu.my/6991/2/s811121_02.pdf
_version_ 1747828141702774784