An Integrated Principal Component Analysis And Weighted Apriori-T Algorithm For Imbalanced Data Root Cause Analysis

Root Cause Analysis (RCA) is often used in manufacturing analysis to prevent the recurrence of undesired events. Association rule mining (ARM) was introduced in RCA to extract frequently occurring patterns, interesting correlations, associations or causal structures among items in the database. Howev...

Bibliographic Details
Main Author:	Ong, Phaik Ling
Format:	Thesis
Language:	English
Published:	2016
Subjects:	Q Science (General); QA Mathematics
Online Access:	http://eprints.utem.edu.my/id/eprint/18350/1/An%20Integrated%20Principal%20Component%20Analysis%20And%20Weighted%20Apriori-T%20Algorithm%20For%20Imbalanced%20Data%20Root%20Cause%20Analysis.pdf http://eprints.utem.edu.my/id/eprint/18350/2/An%20Integrated%20Principal%20Component%20Analysis%20And%20Weighted%20Apriori-T%20Algorithm%20For%20Imbalanced%20Data%20Root%20Cause%20Analysis.pdf

id	my-utem-ep.18350
record_format	uketd_dc
institution	Universiti Teknikal Malaysia Melaka
collection	UTeM Repository
language	English
topic	Q Science (General); QA Mathematics
spellingShingle	Q Science (General) QA Mathematics Ong, Phaik Ling An Integrated Principal Component Analysis And Weighted Apriori-T Algorithm For Imbalanced Data Root Cause Analysis
description	Root Cause Analysis (RCA) is often used in manufacturing analysis to prevent the recurrence of undesired events. Association rule mining (ARM) was introduced in RCA to extract frequently occurring patterns, interesting correlations, associations or causal structures among items in a database. However, frequent pattern mining (FPM) using Apriori-like algorithms and the support-confidence framework inherently suffers from the rare item problem. This greatly reduces the performance of RCA, especially in the manufacturing domain, where imbalanced data is the norm in a production plant. In addition, the exponential growth of data causes high computational costs in Apriori-like algorithms. Hence, this research proposes a two-stage FPM approach integrating Principal Component Analysis (PCA) and Weighted Apriori-T (PCA-WAT) to address these problems. PCA is used to generate item weights by considering the directions of maximal variance in the item covariance structure, normalising the effect of rare items: significant rare items receive higher weights, while less significant, frequently occurring items receive lower weights. Apriori-T, which uses an indexed enumeration tree, is used for low-cost FPM. A semiconductor manufacturing case study with Work In Progress data and true alarm data is used to validate the proposed algorithm. The proposed PCA-WAT algorithm is benchmarked against the Apriori and Apriori-T algorithms. A comparison of weighted support values was performed to evaluate the capability of PCA to normalise item support values. The experimental results show that PCA is able to normalise item support values and reduce the influence of imbalanced data in FPM. Both quality and performance measures are used for evaluation.
The quality measures compare the frequent itemsets and interesting rules generated across different support and confidence thresholds, ranging from 5% to 20% and from 10% to 90%, respectively. Rule validation involved a business analyst from the related field. This domain expert verified that the generated rules explain the contributing factors in failure analysis. However, significant rare rules are not easily discovered because the normalised weighted support values are generally lower than the original support values. The performance measures compare execution time in seconds (s) and memory usage (Random Access Memory, RAM) in megabytes (MB). The experimental results show that Apriori-T lowered the computational cost compared with Apriori by at least 90% in computation time and 35.33% in RAM usage. The primary contribution of this study is a two-stage FPM approach for performing RCA in the manufacturing domain on imbalanced datasets. In conclusion, the proposed algorithm addresses the rare item problem through covariance-based normalisation of support values, and the high computational cost problem through an indexed enumeration tree structure. Future work should focus on rule interpretation, so that the generated rules are more understandable to novices in data mining. In addition, suitable support and confidence thresholds need to be determined for use after normalisation, to better discover significant rare itemsets.
format	Thesis
qualification_name	Master of Philosophy (M.Phil.)
qualification_level	Master's degree
author	Ong, Phaik Ling
author_facet	Ong, Phaik Ling
author_sort	Ong, Phaik Ling
title	An Integrated Principal Component Analysis And Weighted Apriori-T Algorithm For Imbalanced Data Root Cause Analysis
title_short	An Integrated Principal Component Analysis And Weighted Apriori-T Algorithm For Imbalanced Data Root Cause Analysis
title_full	An Integrated Principal Component Analysis And Weighted Apriori-T Algorithm For Imbalanced Data Root Cause Analysis
title_fullStr	An Integrated Principal Component Analysis And Weighted Apriori-T Algorithm For Imbalanced Data Root Cause Analysis
title_full_unstemmed	An Integrated Principal Component Analysis And Weighted Apriori-T Algorithm For Imbalanced Data Root Cause Analysis
title_sort	integrated principal component analysis and weighted apriori-t algorithm for imbalanced data root cause analysis
granting_institution	Universiti Teknikal Malaysia Melaka
granting_department	Faculty of Information and Communication Technology
publishDate	2016
url	http://eprints.utem.edu.my/id/eprint/18350/1/An%20Integrated%20Principal%20Component%20Analysis%20And%20Weighted%20Apriori-T%20Algorithm%20For%20Imbalanced%20Data%20Root%20Cause%20Analysis.pdf http://eprints.utem.edu.my/id/eprint/18350/2/An%20Integrated%20Principal%20Component%20Analysis%20And%20Weighted%20Apriori-T%20Algorithm%20For%20Imbalanced%20Data%20Root%20Cause%20Analysis.pdf
_version_	1747833919484461056
spelling	my-utem-ep.18350 2021-10-10T15:28:01Z An Integrated Principal Component Analysis And Weighted Apriori-T Algorithm For Imbalanced Data Root Cause Analysis 2016 Ong, Phaik Ling Q Science (General) QA Mathematics 2016 Thesis http://eprints.utem.edu.my/id/eprint/18350/ http://eprints.utem.edu.my/id/eprint/18350/1/An%20Integrated%20Principal%20Component%20Analysis%20And%20Weighted%20Apriori-T%20Algorithm%20For%20Imbalanced%20Data%20Root%20Cause%20Analysis.pdf text en public http://eprints.utem.edu.my/id/eprint/18350/2/An%20Integrated%20Principal%20Component%20Analysis%20And%20Weighted%20Apriori-T%20Algorithm%20For%20Imbalanced%20Data%20Root%20Cause%20Analysis.pdf text en validuser https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=100237 mphil masters Universiti Teknikal Malaysia Melaka Faculty of Information and Communication Technology
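The PCA weighting step described in the abstract can be illustrated with a small Python sketch. It is not the thesis implementation, whose exact formulas are not given in this record. It assumes (1) that item weights are the absolute loadings of the first principal component of the item correlation matrix, rescaled so the largest weight is 1, and (2) that the weighted support of an itemset is its support multiplied by the mean weight of its items. The function names, the toy data and the 0.3 threshold are hypothetical, and the candidate search is a plain level-wise Apriori scan, not the indexed enumeration tree that Apriori-T uses.

```python
import math
from itertools import combinations


def first_principal_component(rows, iters=500):
    """Hypothetical helper: leading eigenvector of the item correlation matrix.

    Assumption for this sketch: PCA runs on the correlation matrix of the 0/1
    item columns, so an item's low variance alone does not shrink its loading.
    The eigenvector is found by power iteration to avoid external libraries.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(m)]
    # Standardise each column; a constant column (std 0) contributes zeros.
    z = [[(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(m)]
         for r in rows]
    corr = [[sum(z[k][i] * z[k][j] for k in range(n)) / n for j in range(m)]
            for i in range(m)]
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v


def pca_item_weights(rows):
    """Hypothetical weighting: |PC1 loading|, rescaled so the largest weight is 1."""
    pc1 = first_principal_component(rows)
    top = max(abs(x) for x in pc1) or 1.0
    return [abs(x) / top for x in pc1]


def weighted_frequent_itemsets(rows, weights, min_wsup):
    """Level-wise (Apriori-style) search returning {itemset: weighted support}.

    Weighted support is assumed to be support * mean item weight. Because every
    weight is <= 1, wsup(X) <= sup(X), so pruning on raw support at the same
    threshold never loses a weighted-frequent itemset, and raw support keeps
    the anti-monotone property that Apriori pruning relies on.
    """
    n, m = len(rows), len(rows[0])
    transactions = [frozenset(j for j in range(m) if r[j]) for r in rows]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    result = {}
    level = [frozenset([j]) for j in range(m)]
    while level:
        survivors = []
        for itemset in level:
            sup = support(itemset)
            if sup >= min_wsup:
                survivors.append(itemset)
                wsup = sup * sum(weights[j] for j in itemset) / len(itemset)
                if wsup >= min_wsup:
                    result[itemset] = wsup
        # Join surviving k-itemsets into (k+1)-candidates whose k-subsets all survived.
        kept = set(survivors)
        candidates = set()
        for a, b in combinations(survivors, 2):
            union = a | b
            if len(union) == len(a) + 1 and all(union - {x} in kept for x in union):
                candidates.add(union)
        level = list(candidates)
    return result


if __name__ == "__main__":
    items = ["step_A", "tool_B", "alarm_C"]  # toy columns, not thesis data
    rows = [[1, 1, 1], [1, 0, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 0, 0]]
    weights = pca_item_weights(rows)
    print({name: round(w, 3) for name, w in zip(items, weights)})
    for itemset, wsup in weighted_frequent_itemsets(rows, weights, 0.3).items():
        print(sorted(items[j] for j in itemset), round(wsup, 3))
```

With these toy rows, the rare tool_B and the alarm_C it always co-occurs with both receive weight 1.0, while the frequent step_A receives about 0.54. As a result, {tool_B, alarm_C} is reported but {step_A, tool_B} is not, although both pairs have the same raw support of 2/6. Because every weight is at most 1, weighted support never exceeds raw support in this sketch, which is consistent with the abstract's observation that normalised weighted support values are generally lower than the original ones.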
