High Accuracy Estimation Of Head Yaw Using Bounding Box Algorithm On A Vision Based System

Bibliographic Details
Main Author: Tarmizi, Aine Ilina
Format: Thesis
Language: English
Published: 2020
Subjects:
Online Access:http://eprints.utem.edu.my/id/eprint/25402/1/High%20Accuracy%20Estimation%20Of%20Head%20Yaw%20Using%20Bounding%20Box%20Algorithm%20On%20A%20Vision%20Based%20System.pdf
http://eprints.utem.edu.my/id/eprint/25402/2/High%20Accuracy%20Estimation%20Of%20Head%20Yaw%20Using%20Bounding%20Box%20Algorithm%20On%20A%20Vision%20Based%20System.pdf
id my-utem-ep.25402
record_format uketd_dc
institution Universiti Teknikal Malaysia Melaka
collection UTeM Repository
language English
advisor Shukor, Ahmad Zaki

topic T Technology (General)
spellingShingle T Technology (General)
Tarmizi, Aine Ilina
High Accuracy Estimation Of Head Yaw Using Bounding Box Algorithm On A Vision Based System
description Interaction between humans and machines can be seen in various applications nowadays, such as robotic restaurant waiters, robots that assist elderly people, and many more. The key to the success of these applications is the machine's artificial intelligence, which must comprehend human gestures. In face-to-face interaction, most applications use a vision sensor such as a camera as the input to the system. Among human head movements, shaking the head sideways is one of the most common gestures, and it involves changes in the head yaw angle. In the literature, most researchers investigate face recognition rather than head yaw or pose estimation. This research focuses on estimating the yaw angle of a human head from a face-to-face camera view. The purpose of this research is to establish an approach that can estimate the head yaw angle using a single camera view. The objectives of the research are threefold: to detect a face in the real-time stream of a camera and obtain the coordinates of a bounding box around the detected face; to analyze how changes in the face-to-camera distance and in the head yaw angle affect the width and area of the bounding box; and lastly, to design and validate an algorithm for estimating the head yaw angle from the viewpoint of the camera. It was hypothesized that the width and area of the bounding box decrease as the head yaw angle increases. To achieve these objectives, face detection was performed first, using a deep learning method based on the Caffe framework, which uses a convolutional neural network. Face detection yields a bounding box drawn around the face, which was used in the subsequent analysis. In the video input of the camera, changes in the head yaw angle (orientation) incur changes in the bounding box location, width, and area. These changes were then analyzed to determine their relationship with the head yaw angle. The bounding box width and area were also analyzed at different distances set between the face and the camera. For the head yaw estimation, the earlier analysis was used to estimate the head yaw angle from the gradient and y-crossing (intercept) values obtained. The hypothesized relation of the bounding box area and width with the head yaw angle was confirmed. The accuracy of the yaw estimation using the bounding box algorithm is 92.4%. In addition, the yaw estimation was also able to identify the turning direction of the head (left or right): a negative yaw angle indicates a left turn, while a positive yaw angle indicates a right turn.
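To make the described pipeline concrete, below is a minimal sketch in Python with OpenCV of the two stages the abstract outlines: Caffe-based CNN face detection producing a bounding box, and a linear (gradient and y-crossing) mapping from bounding box width to yaw angle. The model files deploy.prototxt and res10_300x300_ssd_iter_140000.caffemodel refer to OpenCV's widely used Caffe SSD face detector, a plausible stand-in since the thesis does not name its exact model files here; the constants M_SLOPE and Y_CROSS are hypothetical placeholders for the calibration values the thesis derives at a fixed face-to-camera distance.

# Sketch only (assumptions noted above): detect a face with a Caffe-based CNN,
# take the bounding box width, and invert a fitted line width = m*yaw + c
# to estimate the head yaw angle.
import cv2

PROTOTXT = "deploy.prototxt"                        # assumed model files
MODEL = "res10_300x300_ssd_iter_140000.caffemodel"
M_SLOPE = -0.85   # hypothetical gradient (pixels per degree) from calibration
Y_CROSS = 210.0   # hypothetical y-crossing (box width in pixels at 0 deg yaw)

net = cv2.dnn.readNetFromCaffe(PROTOTXT, MODEL)
cap = cv2.VideoCapture(0)                           # real-time camera stream

while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    # The res10 detector expects a 300x300 BGR input with its training means subtracted.
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
                                 (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()                      # shape (1, 1, N, 7)
    for i in range(detections.shape[2]):
        if detections[0, 0, i, 2] < 0.5:            # confidence threshold (assumed)
            continue
        # Box corners are returned normalized to [0, 1]; scale to pixels.
        x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * [w, h, w, h]).astype(int)
        box_width = x2 - x1                         # decreases as the head turns
        yaw_deg = (box_width - Y_CROSS) / M_SLOPE   # invert the fitted line
        # The thesis also recovers the turning direction (negative = left,
        # positive = right) from the change in bounding box location; that
        # step is omitted in this sketch.
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"yaw ~ {yaw_deg:.1f} deg", (x1, y1 - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("head yaw estimate", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

The width-based line only gives the yaw magnitude, since the box narrows whichever way the head turns; per the abstract, the sign comes from the accompanying change in bounding box location.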
format Thesis
qualification_name Master of Philosophy (M.Phil.)
qualification_level Master's degree
author Tarmizi, Aine Ilina
author_facet Tarmizi, Aine Ilina
author_sort Tarmizi, Aine Ilina
title High Accuracy Estimation Of Head Yaw Using Bounding Box Algorithm On A Vision Based System
title_short High Accuracy Estimation Of Head Yaw Using Bounding Box Algorithm On A Vision Based System
title_full High Accuracy Estimation Of Head Yaw Using Bounding Box Algorithm On A Vision Based System
title_fullStr High Accuracy Estimation Of Head Yaw Using Bounding Box Algorithm On A Vision Based System
title_full_unstemmed High Accuracy Estimation Of Head Yaw Using Bounding Box Algorithm On A Vision Based System
title_sort high accuracy estimation of head yaw using bounding box algorithm on a vision based system
granting_institution Universiti Teknikal Malaysia Melaka
granting_department Faculty of Electrical Engineering
publishDate 2020
url http://eprints.utem.edu.my/id/eprint/25402/1/High%20Accuracy%20Estimation%20Of%20Head%20Yaw%20Using%20Bounding%20Box%20Algorithm%20On%20A%20Vision%20Based%20System.pdf
http://eprints.utem.edu.my/id/eprint/25402/2/High%20Accuracy%20Estimation%20Of%20Head%20Yaw%20Using%20Bounding%20Box%20Algorithm%20On%20A%20Vision%20Based%20System.pdf
_version_ 1747834119389184000
spelling my-utem-ep.25402 2021-11-18T14:08:13Z High Accuracy Estimation Of Head Yaw Using Bounding Box Algorithm On A Vision Based System 2020 Tarmizi, Aine Ilina T Technology (General) TA Engineering (General). Civil engineering (General) 2020 Thesis http://eprints.utem.edu.my/id/eprint/25402/ https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=119686 mphil masters Universiti Teknikal Malaysia Melaka Faculty of Electrical Engineering Shukor, Ahmad Zaki