Scene Recognition and Classification Model Based on Human Pre-attentive Visual Attention


Bibliographic Details
Main Author: Ahmad Ridzuan, Kudus
Format: Thesis
Language:English
Published: 2021
Subjects:
Online Access:http://ir.unimas.my/id/eprint/34925/3/Ahmad%20Ridzuan%20Kudus%20ft.pdf
id my-unimas-ir.34925
record_format uketd_dc
institution Universiti Malaysia Sarawak
collection UNIMAS Institutional Repository
language English
topic BF Psychology
QA76 Computer software
BF Psychology
spellingShingle BF Psychology
QA76 Computer software
BF Psychology
Ahmad Ridzuan, Kudus
Scene Recognition and Classification Model Based on Human Pre-attentive Visual Attention
description Scene recognition is considered one of the most important functions of human vision, and the scene recognition problem is correspondingly significant in computer vision. Scene recognition, or classification, is the process of organizing images and predicting the class category of a scene image. Humans can accurately and effortlessly classify scenes within a short period of time. Building on this concept, this study proposes a novel scene classification model based on human pre-attentive visual attention, which utilizes one of the earliest saliency models to generate a set of high-quality regions that potentially contain salient objects. An experimental study was performed to investigate the efficiency of the Saliency Toolbox on natural indoor scene images when its parameters are manipulated; at the end of this experiment, acceptable parameter scales were finalized for use of the Saliency Toolbox in the proposed scene classification model. The proposed model comprises three main operations: (i) salient region proposal generation, (ii) feature extraction and concatenation, and (iii) classification. The model was trained and tested on the MIT Indoor 67 dataset, and an experiment and a benchmarking test were conducted on it. The experimental results clearly show that providing more salient regions supplies more meaningful details of an input image, and the benchmarking results show that the saliency model used in this study is capable of generating high-quality, informative salient regions that lead to good classification accuracy. The proposed model achieves a higher average accuracy than a standard approach that classifies based on one whole image, indicating the advantage of using deep features of local salient objects over global deep features.
Two experiments were also conducted to test and evaluate human performance on scene classification under various visual input conditions. The accuracy of human classification of complete scene images viewed for a brief period of time in Experiment 1 is compared to the accuracy obtained by the proposed scene classification model. Furthermore, the accuracy of human classification in Experiment 1 is compared to that obtained by humans in Experiment 2, where classification performance is tested on cropped salient regions. Evaluation of the results from these experiments shows that the proposed model has not yet reached the human standard: object features alone are not enough to differentiate between two different scenes with human-level accuracy. Scene background and layout, relationships between objects, and human memory are further factors that affect human classification performance, and these attributes of a scene need to be taken into account when recognizing and classifying scene images in further study.
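The three-stage pipeline named in the description — (i) salient region proposals, (ii) feature extraction and concatenation, (iii) classification — can be sketched roughly as below. This is a simplified, hypothetical stand-in only: the thesis uses the Itti-Koch-based Saliency Toolbox for region proposals and deep CNN features, whereas here the proposer simply tiles the image and the "features" are basic statistics, so only the data flow (regions → per-region features → concatenated descriptor → classifier) mirrors the described model.

```python
import numpy as np

def propose_salient_regions(image, k=5):
    """Stage (i) stand-in: return k candidate regions.
    The real model uses the Saliency Toolbox; here we just tile the image."""
    step = image.shape[0] // k
    return [image[i * step:(i + 1) * step, :] for i in range(k)]

def extract_features(region):
    """Stage (ii) stand-in: a fixed-length descriptor per region.
    The real model extracts deep CNN features instead of these statistics."""
    return np.array([region.mean(), region.std(), region.max(), region.min()])

def classify(descriptor, weights, classes):
    """Stage (iii): a linear classifier over the concatenated descriptor."""
    scores = weights @ descriptor
    return classes[int(np.argmax(scores))]

# Data flow of the proposed model on a placeholder grayscale image.
image = np.random.rand(100, 100)
regions = propose_salient_regions(image, k=5)
descriptor = np.concatenate([extract_features(r) for r in regions])  # 5 regions x 4 values

rng = np.random.default_rng(0)
classes = ["kitchen", "library", "office"]          # hypothetical MIT Indoor 67 categories
weights = rng.normal(size=(len(classes), descriptor.size))  # untrained, for illustration
label = classify(descriptor, weights, classes)
```

Increasing `k` lengthens the concatenated descriptor, which corresponds to the finding that providing more salient regions supplies more detail about the input image.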
format Thesis
qualification_level Master's degree
author Ahmad Ridzuan, Kudus
author_facet Ahmad Ridzuan, Kudus
author_sort Ahmad Ridzuan, Kudus
title Scene Recognition and Classification Model Based on Human Pre-attentive Visual Attention
title_short Scene Recognition and Classification Model Based on Human Pre-attentive Visual Attention
title_full Scene Recognition and Classification Model Based on Human Pre-attentive Visual Attention
title_fullStr Scene Recognition and Classification Model Based on Human Pre-attentive Visual Attention
title_full_unstemmed Scene Recognition and Classification Model Based on Human Pre-attentive Visual Attention
title_sort scene recognition and classification model based on human pre-attentive visual attention
granting_institution Universiti Malaysia Sarawak
granting_department Faculty of Cognitive Sciences and Human Development
publishDate 2021
url http://ir.unimas.my/id/eprint/34925/3/Ahmad%20Ridzuan%20Kudus%20ft.pdf
_version_ 1811771550759649280
spelling my-unimas-ir.34925 2024-08-20T06:44:48Z Scene Recognition and Classification Model Based on Human Pre-attentive Visual Attention 2021-03-16 Ahmad Ridzuan, Kudus BF Psychology QA76 Computer software T201 Patents. Trademarks Scene recognition is considered one of the most important functions of human vision, and the scene recognition problem is correspondingly significant in computer vision. Scene recognition, or classification, is the process of organizing images and predicting the class category of a scene image. Humans can accurately and effortlessly classify scenes within a short period of time. Building on this concept, this study proposes a novel scene classification model based on human pre-attentive visual attention, which utilizes one of the earliest saliency models to generate a set of high-quality regions that potentially contain salient objects. An experimental study was performed to investigate the efficiency of the Saliency Toolbox on natural indoor scene images when its parameters are manipulated; at the end of this experiment, acceptable parameter scales were finalized for use of the Saliency Toolbox in the proposed scene classification model. The proposed model comprises three main operations: (i) salient region proposal generation, (ii) feature extraction and concatenation, and (iii) classification. The model was trained and tested on the MIT Indoor 67 dataset, and an experiment and a benchmarking test were conducted on it. The experimental results clearly show that providing more salient regions supplies more meaningful details of an input image, and the benchmarking results show that the saliency model used in this study is capable of generating high-quality, informative salient regions that lead to good classification accuracy.
The proposed model achieves a higher average accuracy than a standard approach that classifies based on one whole image, indicating the advantage of using deep features of local salient objects over global deep features. Two experiments were also conducted to test and evaluate human performance on scene classification under various visual input conditions. The accuracy of human classification of complete scene images viewed for a brief period of time in Experiment 1 is compared to the accuracy obtained by the proposed scene classification model. Furthermore, the accuracy of human classification in Experiment 1 is compared to that obtained by humans in Experiment 2, where classification performance is tested on cropped salient regions. Evaluation of the results from these experiments shows that the proposed model has not yet reached the human standard: object features alone are not enough to differentiate between two different scenes with human-level accuracy. Scene background and layout, relationships between objects, and human memory are further factors that affect human classification performance, and these attributes of a scene need to be taken into account when recognizing and classifying scene images in further study. Universiti Malaysia Sarawak (UNIMAS) 2021-03 Thesis http://ir.unimas.my/id/eprint/34925/ http://ir.unimas.my/id/eprint/34925/3/Ahmad%20Ridzuan%20Kudus%20ft.pdf text en validuser masters Universiti Malaysia Sarawak Faculty of Cognitive Sciences and Human Development