Empirical Comparison of Techniques for Handling Missing Values

The performance of all technologies is highly dependent on the quality of the data. For example, the Neural Network (NN) technique can be applied very well if the data have been well prepared and are free from noise and missing values. This study empirically compares several missing-value handling methods fo...


Bibliographic Details
Main Author:	Tikla, Salleh Mansour Mohamed
Format:	Thesis
Language:	eng
Published:	2006
Subjects:	QA76 Computer software
Online Access:	https://etd.uum.edu.my/1855/1/Salleh_Mansour_Mohamed_Tikla_-_Empirical_comparisons_of_techniques_for_handling_missing_values.pdf https://etd.uum.edu.my/1855/2/Salleh_Mansour_Mohamed_Tikla_-_Empirical_comparisons_of_techniques_for_handling_missing_values.pdf

id	my-uum-etd.1855
record_format	uketd_dc
institution	Universiti Utara Malaysia
collection	UUM ETD
language	eng eng
topic	QA76 Computer software
spellingShingle	QA76 Computer software Tikla, Salleh Mansour Mohamed Empirical Comparison of Techniques for Handling Missing Values
description	The performance of all technologies is highly dependent on the quality of the data. For example, the Neural Network (NN) technique can be applied very well if the data have been well prepared and are free from noise and missing values. This study empirically compares several methods, drawn from the literature, for handling missing values for NN. Six such methods were identified and compared using the Adult data set (retrieved from the UCI database). The methods are replacement with the mean, replacement with one, replacement with zero, replacement with the maximum, replacement with the minimum, and regression. The results show that replacement with the maximum value yields better accuracy than the other methods.
format	Thesis
qualification_name	masters
qualification_level	Master's degree
author	Tikla, Salleh Mansour Mohamed
author_facet	Tikla, Salleh Mansour Mohamed
author_sort	Tikla, Salleh Mansour Mohamed
title	Empirical Comparison of Techniques for Handling Missing Values
title_short	Empirical Comparison of Techniques for Handling Missing Values
title_full	Empirical Comparison of Techniques for Handling Missing Values
title_fullStr	Empirical Comparison of Techniques for Handling Missing Values
title_full_unstemmed	Empirical Comparison of Techniques for Handling Missing Values
title_sort	empirical comparison of techniques for handling missing values
granting_institution	Universiti Utara Malaysia
granting_department	Faculty of Information Technology
publishDate	2006
url	https://etd.uum.edu.my/1855/1/Salleh_Mansour_Mohamed_Tikla_-_Empirical_comparisons_of_techniques_for_handling_missing_values.pdf https://etd.uum.edu.my/1855/2/Salleh_Mansour_Mohamed_Tikla_-_Empirical_comparisons_of_techniques_for_handling_missing_values.pdf
_version_	1747827218910806016
spelling	my-uum-etd.1855 2013-07-24T12:13:26Z Empirical Comparison of Techniques for Handling Missing Values 2006 Tikla, Salleh Mansour Mohamed Faculty of Information Technology Faculty of Information Technology QA76 Computer software The performance of all technologies is highly dependent on the quality of the data. For example, the Neural Network (NN) technique can be applied very well if the data have been well prepared and are free from noise and missing values. This study empirically compares several methods, drawn from the literature, for handling missing values for NN. Six such methods were identified and compared using the Adult data set (retrieved from the UCI database). The methods are replacement with the mean, replacement with one, replacement with zero, replacement with the maximum, replacement with the minimum, and regression. The results show that replacement with the maximum value yields better accuracy than the other methods. 2006 Thesis https://etd.uum.edu.my/1855/ https://etd.uum.edu.my/1855/1/Salleh_Mansour_Mohamed_Tikla_-_Empirical_comparisons_of_techniques_for_handling_missing_values.pdf application/pdf eng validuser https://etd.uum.edu.my/1855/2/Salleh_Mansour_Mohamed_Tikla_-_Empirical_comparisons_of_techniques_for_handling_missing_values.pdf application/pdf eng public masters masters Universiti Utara Malaysia
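
Illustrative note: the six replacement strategies named in the description (mean, one, zero, maximum, minimum, regression) can be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the thesis's implementation, which this record does not describe. The function names, the toy values, and the choice of a single-predictor least-squares line for the regression method are all hypothetical, and the data shown are not taken from the Adult data set.

```python
from statistics import mean


def fill_constant(values, strategy):
    """Replace None entries using one of the five single-value rules."""
    observed = [v for v in values if v is not None]
    fill = {
        "mean": mean(observed),   # average of the observed values
        "one": 1.0,
        "zero": 0.0,
        "max": max(observed),     # largest observed value
        "min": min(observed),     # smallest observed value
    }[strategy]
    return [fill if v is None else v for v in values]


def fill_regression(target, predictor):
    """Replace None entries in target with predictions from a
    least-squares line fitted on rows where target is observed.
    Assumption: one fully observed predictor column."""
    pairs = [(x, y) for x, y in zip(predictor, target) if y is not None]
    x_bar = mean([x for x, _ in pairs])
    y_bar = mean([y for _, y in pairs])
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
             / sum((x - x_bar) ** 2 for x, _ in pairs))
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x if y is None else y
            for x, y in zip(predictor, target)]


# Hypothetical toy columns: hours is complete, age has two gaps.
hours = [20.0, 30.0, 40.0, 50.0, 60.0]
age = [25.0, None, 45.0, 55.0, None]

results = {s: fill_constant(age, s)
           for s in ("mean", "one", "zero", "max", "min")}
results["regression"] = fill_regression(age, hours)

for name, filled in results.items():
    print(name, filled)
```

The five single-value rules ignore the other attributes entirely, whereas the regression rule uses a second column to predict each gap. Which rule gives the best downstream accuracy is an empirical question of the kind the thesis investigates.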


Similar Items