Vietnamese Sign Language Alphabet Recognition Using Deep Learning and Mediapipe Methods
Abstract
Sign language is a vital means of communication for people with hearing impairments, relying on hand movements and gestures to convey meaning. For centuries, it has enabled people with hearing and speech disabilities to interact with others. Despite this long history, many people cannot interpret these signs, which creates a communication barrier with the deaf and mute community. This paper proposes a deep learning-based system designed to recognize Vietnamese Sign Language (VSL) gestures. The dataset developed for this work includes 23 alphabet signs and 2 accent marks unique to VSL; 22 of the alphabet signs resemble those in English. The proposed system achieves an accuracy exceeding 91% on the raw dataset and 95% on the processed dataset.
Keywords
Vietnamese Sign Language, Mediapipe, keypoint, image processing, deep learning.
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.