Human Action Recognition using Depth Motion Map and ResNet
Main Article Content
Abstract
Human action recognition has been an active research topic in recent years due to its wide range of real-world applications. This paper presents a new method for human action recognition from depth maps, which are now widely available thanks to the popularity of depth sensors. The proposed method consists of three components: video representation, feature extraction, and action classification. For video representation, we adopt the depth motion map (DMM) technique, which is simple and efficient and, more importantly, captures the long-term movement of an action. We then deploy a deep learning technique, ResNet in particular, for feature extraction and action classification. We have conducted extensive experiments on a benchmark dataset of 20 activities (CMDFall) and compared the results with several state-of-the-art techniques. The experimental results show the competitive performance of the proposed method, which achieves about 98.8% accuracy for fall versus non-fall detection. This is a promising result for applications in monitoring elderly people.
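The depth motion map described above can be sketched in a few lines: absolute differences between consecutive depth frames are thresholded and accumulated over time, yielding a single image that summarizes the motion of the whole clip. The following is a minimal NumPy sketch, not the authors' implementation; the threshold value and the normalization to an 8-bit image are illustrative assumptions.

```python
import numpy as np

def depth_motion_map(depth_frames, threshold=10.0):
    """Accumulate thresholded inter-frame depth differences into one DMM.

    depth_frames: array of shape (T, H, W) holding the depth video.
    threshold: minimum absolute depth change counted as motion
               (hypothetical value for illustration).
    """
    frames = np.asarray(depth_frames, dtype=np.float32)
    diffs = np.abs(np.diff(frames, axis=0))          # |frame_{t+1} - frame_t|
    motion = np.where(diffs >= threshold, diffs, 0.0)  # suppress sensor noise
    dmm = motion.sum(axis=0)                         # accumulate over time
    # Scale to [0, 255] so the map can be fed to a CNN such as ResNet.
    if dmm.max() > 0:
        dmm = dmm / dmm.max() * 255.0
    return dmm.astype(np.uint8)

# Tiny synthetic example: a bright square sliding right over 4 frames.
video = np.zeros((4, 8, 8), dtype=np.float32)
for t in range(4):
    video[t, 2:4, t:t + 2] = 200.0
dmm = depth_motion_map(video)
print(dmm.shape)  # (8, 8)
```

The resulting single-channel map (often computed per projection view in the DMM literature) can then be replicated or stacked into three channels and passed to a ResNet for feature extraction and classification.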
Keywords
Human action recognition, depth motion map, deep neural network, support vector machine
Article Details
References
[1] R. Poppe; A survey on vision-based human action recognition; Image Vis. Comput., vol. 28, no. 6, pp. 976–990, Jun. 2010.
[2] J. Carreira and A. Zisserman; Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset; ArXiv170507750 Cs, May 2017.
[3] O. Russakovsky et al.; ImageNet Large Scale Visual Recognition Challenge; ArXiv14090575 Cs, Sep. 2014.
[4] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri; Learning Spatiotemporal Features with 3D Convolutional Networks; in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Washington, DC, USA, 2015, pp. 4489–4497.
[5] M. A. R. Ahad, J. K. Tan, H. Kim, and S. Ishikawa; Motion history image: its variants and applications; Mach. Vis. Appl., vol. 23, no. 2, pp. 255–281, Mar. 2012.
[6] C. Chen, K. Liu, and N. Kehtarnavaz; Real-time human action recognition based on depth motion map; J. Real-Time Image Process., vol. 12, no. 1, pp. 155–163, Jun. 2016.
[7] X. Li, Y. Makihara, C. Xu, D. Muramatsu, Y. Yagi, and M. Ren; Gait Energy Response Functions for Gait Recognition against Various Clothing and Carrying Status; Appl. Sci., vol. 8, no. 8, p. 1380, Aug. 2018.
[8] O. Oreifej and Z. Liu; HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences; in 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 716–723.
[9] X. Yang, C. Zhang, and Y. Tian; Recognizing Actions Using Depth Motion Maps-based Histograms of Oriented Gradients; in Proceedings of the 20th ACM International Conference on Multimedia, New York, NY, USA, 2012, pp. 1057–1060.
[10] C. Chen, R. Jafari, and N. Kehtarnavaz; Action Recognition from Depth Sequences Using Depth Motion Maps-based Local Binary Patterns; in 2015 IEEE Winter Conference on Applications of Computer Vision, 2015, pp. 1092–1099.
[11] T.-H. Tran and V.-T. Nguyen; How Good Is Kernel Descriptor on Depth Motion Map for Action Recognition; in Computer Vision Systems, 2015, pp. 137–146.
[12] T.-H. Tran, T.-L. Le, V.-N. Hoang, and H. Vu; Continuous detection of human fall using multimodal features from Kinect sensors in scalable environment; Comput. Methods Programs Biomed., vol. 146, pp. 151–165, Jul. 2017.
[13] K. Simonyan and A. Zisserman; Two-Stream Convolutional Networks for Action Recognition in Videos; ArXiv14062199 Cs, Jun. 2014.
[14] V. Khong and T. Tran; Improving Human Action Recognition with Two-Stream 3D Convolution Neural Network; in 2018 1st International Conference on Multimedia Analysis and Pattern Recognition (MAPR), New York, NY, USA, pp. 1–6.
[15] K. He, X. Zhang, S. Ren, and J. Sun; Deep Residual Learning for Image Recognition; ArXiv151203385 Cs, Dec. 2015.
[16] T.-H. Tran et al.; A multimodal multi-view dataset for human fall analysis and preliminary investigation on modality; in The 20th International Conference on Pattern Recognition (ICPR’2018), Beijing, China.
[17] Z. Zhang, S. Wei, Y. Song, and Y. Zhang; Gesture Recognition Using Enhanced Depth Motion Map and Static Pose Map; in 2017 12th IEEE International Conference on Automatic Face Gesture Recognition (FG2017), 2017, pp. 238–244.