Human Action Recognition using Prominent Camera

Document Type : Original Article


1 Indira Gandhi Delhi Technical University for Women, Delhi, India

2 Delhi Technological University, Delhi, India


Human action recognition has been studied for a long time because of its wide range of applications, such as visual surveillance, security, video retrieval, human interaction with machines and robots in the entertainment sector, and content-based video compression. Multiple cameras are often used to overcome challenges in human action recognition such as occlusion and viewpoint variation. However, multiple cameras burden the system with a large amount of data, so a good recognition rate comes at the cost of overhead in both computation and data. In this research, we propose a methodology that improves the action recognition rate by using a single camera from a multi-camera environment. We apply a modified bag-of-visual-words action recognition method with a Radial Basis Function Support Vector Machine (RBF-SVM) as the classifier. Our experiments on a standard, publicly available multi-camera dataset show an improved recognition rate compared with other state-of-the-art methods.
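To make the classification stage concrete, the following is a minimal sketch of a standard bag-of-visual-words encoding followed by the RBF kernel that an RBF-SVM evaluates between two encoded videos. It is not the modified method of this article, whose details the abstract does not give. The codebook, the toy descriptors, the `gamma` value, and all function names are hypothetical and chosen only for illustration.

```python
import math

def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest (squared Euclidean) to the descriptor."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )

def bovw_histogram(descriptors, codebook):
    """L1-normalised histogram of visual-word counts for one video."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical toy data: a 2-word codebook and two "videos",
# each represented by four 2-D local descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0]]
video_a = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]]
video_b = [[0.0, 0.1], [0.1, 0.2], [0.1, 0.0], [1.0, 1.0]]

hist_a = bovw_histogram(video_a, codebook)
hist_b = bovw_histogram(video_b, codebook)

# Kernel value between the two encoded videos; an SVM trainer would
# evaluate this for every pair of training histograms.
similarity = rbf_kernel(hist_a, hist_b)
```

In practice the codebook would be learned by clustering descriptors extracted from training videos, and the histograms would be passed to an SVM implementation with an RBF kernel, for example scikit-learn's `SVC(kernel="rbf")`.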

