A Comprehensive Review of Vietnamese Sign Language Recognition Techniques
Abstract
This paper presents a systematic quantitative literature review of Vietnamese Sign Language (VSL) recognition techniques developed between 2015 and 2025. VSL recognition plays a vital role in bridging communication gaps and improving accessibility for the deaf and hard-of-hearing community in Vietnam. To identify and synthesize current trends and challenges, we conducted a structured search and screening process across major academic databases. The selected works were analyzed by recognition approach (e.g., computer vision, wearable sensors, data-driven methods, and multimodal data fusion), datasets used, feature extraction strategies, classification models, and performance metrics. Descriptive statistics were used to map the evolution of methods over time, while comparative analyses highlighted the strengths and limitations of different techniques across real-time and static recognition tasks. Our findings indicate a growing shift towards deep learning and sensor fusion methods, although limitations persist in dataset availability, model generalizability, and real-world deployment. This review identifies current research gaps and offers guidance for future work on scalable, culturally adaptive VSL recognition systems.
Keywords
Computer vision, machine learning, sign language recognition, Vietnamese sign language.