Empirical Comparison of Techniques for Handling Missing Values

The performance of all technologies is highly dependent on the quality of the data. For example, the Neural Network (NN) technique can be applied very well if the data have been well prepared and are free from noise and missing values. This study empirically compares several methods for handling missing values fo...

Bibliographic Details
Main Author: Tikla, Salleh Mansour Mohamed
Format: Thesis
Language: eng
Published: 2006
Subjects:
Online Access: https://etd.uum.edu.my/1855/1/Salleh_Mansour_Mohamed_Tikla_-_Empirical_comparisons_of_techniques_for_handling_missing_values.pdf
https://etd.uum.edu.my/1855/2/Salleh_Mansour_Mohamed_Tikla_-_Empirical_comparisons_of_techniques_for_handling_missing_values.pdf
id my-uum-etd.1855
record_format uketd_dc
institution Universiti Utara Malaysia
collection UUM ETD
language eng
topic QA76 Computer software
description The performance of all technologies is highly dependent on the quality of the data. For example, the Neural Network (NN) technique can be applied very well if the data have been well prepared and are free from noise and missing values. This study empirically compares several methods for handling missing values for NN, selected from the literature. Six methods were identified and compared using the Adult data set (retrieved from the UCI database): mean substitution, replace with one, replace with zero, replace with maximum, replace with minimum, and regression. The results show that the replace-with-maximum method yields better accuracy than the other methods.
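The six methods named in the description can be illustrated with a minimal Python sketch using NumPy. This is not the thesis's implementation: the function names impute_constant and impute_regression, the toy columns age and hours, and the choice of a single-predictor least-squares line for the regression method are all assumptions made for illustration.

```python
import numpy as np


def impute_constant(column, method):
    """Fill NaNs in a 1-D numeric column with a single value.

    `method` is one of "mean", "one", "zero", "max", "min"
    (hypothetical labels for five of the six methods in the abstract).
    """
    column = np.asarray(column, dtype=float)
    observed = column[~np.isnan(column)]  # statistics use observed values only
    fill_values = {
        "mean": observed.mean(),
        "one": 1.0,
        "zero": 0.0,
        "max": observed.max(),
        "min": observed.min(),
    }
    return np.where(np.isnan(column), fill_values[method], column)


def impute_regression(target, predictor):
    """Fill NaNs in `target` using a least-squares line fitted on `predictor`.

    Assumes `predictor` has no missing entries; this one-predictor linear
    model is an illustrative choice, not the model used in the thesis.
    """
    target = np.asarray(target, dtype=float)
    predictor = np.asarray(predictor, dtype=float)
    known = ~np.isnan(target)
    slope, intercept = np.polyfit(predictor[known], target[known], deg=1)
    return np.where(known, target, slope * predictor + intercept)


# Toy data (made up): one missing entry in `age`, `hours` fully observed.
age = np.array([25.0, np.nan, 47.0, 38.0])
hours = np.array([20.0, 30.0, 42.0, 33.0])

for method in ("mean", "one", "zero", "max", "min"):
    print(method, impute_constant(age, method))
print("regression", impute_regression(age, hours))
```

In an experiment like the one described, each filled-in version of the data would then be used to train the same network so that the resulting accuracies can be compared.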
format Thesis
qualification_name masters
qualification_level Master's degree
author Tikla, Salleh Mansour Mohamed
title Empirical Comparison of Techniques for Handling Missing Values
granting_institution Universiti Utara Malaysia
granting_department Faculty of Information Technology
publishDate 2006
url https://etd.uum.edu.my/1855/1/Salleh_Mansour_Mohamed_Tikla_-_Empirical_comparisons_of_techniques_for_handling_missing_values.pdf
https://etd.uum.edu.my/1855/2/Salleh_Mansour_Mohamed_Tikla_-_Empirical_comparisons_of_techniques_for_handling_missing_values.pdf
_version_ 1747827218910806016