Multi-Task Learning for Age, Gender, and Emotion Recognition on Edge Processing
Abstract
In this work, we develop a multi-task learning model for age, gender, and emotion recognition on edge devices. The model uses MobileNetV2 as its backbone, with the last three layers customized to produce three outputs: age, gender, and emotion. It was trained and tested on a dataset that combines the well-known Internet Movie Database (IMDB) dataset with our self-collected dataset. The trained model was then optimized and quantized for deployment on the Neural Processing Unit (NPU) of the Rockchip RK3588 chip on the Orange Pi Plus hardware platform. Experimental evaluation was performed on several test cases. The results show that the multi-task model achieves prediction accuracy as high as that of the single-task models while significantly reducing computational requirements. On the Orange Pi platform, the best results are a mean absolute error (MAE) of 3.485 for age and accuracies of 98.281% and 93.917% for gender and emotion, respectively. Throughput peaks at 285.7 frames per second. These results show high potential for many practical applications on edge devices.
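The abstract reports three kinds of figures: an MAE for age, accuracy percentages for gender and emotion, and a frame rate. The following minimal sketch shows how such figures are conventionally computed. It is not the authors' evaluation code: the function names and the toy labels are hypothetical, and the sketch assumes that age error is measured in years. Only the 285.7 frames-per-second value is taken from the abstract.

```python
# Illustrative sketch only: function names and toy data are hypothetical,
# not taken from the paper's evaluation code.

def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute gap between true and predicted ages (assumed in years)."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


def accuracy_percent(true_labels, predicted_labels):
    """Share of exactly matching labels, expressed as a percentage."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


def ms_per_frame(frames_per_second):
    """Convert a throughput in frames per second to milliseconds per frame."""
    return 1000.0 / frames_per_second


# Hypothetical toy predictions for four faces.
true_ages = [23, 35, 41, 60]
pred_ages = [25.0, 31.5, 44.0, 58.5]
true_gender = ["F", "M", "M", "F"]
pred_gender = ["F", "M", "F", "F"]

age_mae = mean_absolute_error(true_ages, pred_ages)        # 2.5
gender_acc = accuracy_percent(true_gender, pred_gender)    # 75.0
frame_time = ms_per_frame(285.7)                           # about 3.5 ms

print(f"age MAE: {age_mae:.3f}")
print(f"gender accuracy: {gender_acc:.3f}%")
print(f"time per frame at 285.7 FPS: {frame_time:.2f} ms")
```

Read this way, a lower MAE is better while a higher accuracy is better, and the reported peak of 285.7 frames per second corresponds to roughly 3.5 ms of processing per frame.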
Keywords
Age, gender, and emotion recognition, multi-task learning, NPUs, edge processing.
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.