Comparative analysis of machine learning models for diabetes readmission classification

Hang Dang Thuy; Quyen Vo

Issue

Vol. 36 No. 192: Journal of Science and Technology - Smart Systems and Devices 2026

Section

Research article

How to Cite

Dang Thuy, H., & Vo, Q. (2026). Comparative analysis of machine learning models for diabetes readmission classification. Smart Systems and Devices, 36(192). https://jst.vn/index.php/ssad/article/view/1310

Abstract

This study compares machine learning and deep learning models for predicting 30-day readmission risk in diabetic patients from Electronic Health Record (EHR) data. Using the Diabetes 130-US Hospitals dataset, which comprises 101,766 records and 50 clinical features, the research aims to improve classification accuracy on a highly imbalanced medical dataset. The proposed pipeline applies SMOTEENN for class balancing, Z-score normalization, and Principal Component Analysis (PCA), which reduces dimensionality to 26 principal components while retaining 80% of the total variance. Seven classification models were benchmarked: Logistic Regression, Decision Tree, K-Nearest Neighbors, Random Forest (RF), Support Vector Machine, Multilayer Perceptron (MLP), and a proposed Deep Multilayer Perceptron (Deep MLP) with a hierarchical architecture. Experimental results show that the Random Forest model outperforms the others, achieving an accuracy of 86.90%, the highest F1-score (89.94%), and a recall of 92.67%. This represents a 78.37% improvement in recall over existing baseline studies. Furthermore, the hierarchical Deep MLP improved recall by 2.77% over the standard MLP, suggesting that it captures complex clinical correlations more effectively. These findings suggest that the optimized Random Forest model has strong potential for integration into early warning systems: by identifying high-risk patients promptly, healthcare providers can intervene in time, reducing readmission rates and making better use of medical resources.
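The abstract reports accuracy, recall, and F1-score on an imbalanced dataset, so it may help to recall how these metrics relate to a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the `ConfusionCounts` type, the `classification_metrics` function, and all counts are hypothetical and chosen only for illustration, with "readmitted within 30 days" assumed to be the positive class.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper type).

    The positive class is assumed to be 'readmitted within 30 days'.
    """
    tp: int  # readmitted patients correctly flagged
    fp: int  # non-readmitted patients wrongly flagged
    fn: int  # readmitted patients that were missed
    tn: int  # non-readmitted patients correctly cleared


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    Any ratio with a zero denominator is reported as 0.0.
    """
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only; these are NOT results from the study.
balanced_example = ConfusionCounts(tp=90, fp=20, fn=10, tn=80)
metrics = classification_metrics(balanced_example)

# A classifier that never predicts readmission on an imbalanced sample:
# accuracy looks acceptable while recall collapses to zero.
always_negative = ConfusionCounts(tp=0, fp=0, fn=11, tn=89)
naive_metrics = classification_metrics(always_negative)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
    print("always-negative accuracy:", naive_metrics["accuracy"])
    print("always-negative recall:", naive_metrics["recall"])
```

The second example illustrates why accuracy alone can mislead on imbalanced data: a model that never flags a readmission still scores 89% accuracy on these made-up counts while missing every positive case, which is why recall and F1 are the more informative figures for a screening task of this kind.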

Keywords

Diabetes readmission, healthcare prediction, random forest.

This work is licensed under a Creative Commons Attribution 4.0 International License.

References

[1] B. Strack, J. P. DeShazo, C. Gennings, J. L. Olmo, S. Ventura, K. J. Cios, and J. N. Clore, "Impact of HbA1c measurement on hospital readmission rates: Analysis of 70,000 clinical database patient records," BioMed Research International, vol. 2014, Art. no. 781670, 2014.
[2] V. B. Liu, L. Y. Sue, and Y. Wu, "Comparison of machine learning models for predicting 30-day readmission rates for patients with diabetes," Journal of Medical Artificial Intelligence, vol. 7, 2024.
[3] O. G. Emi-Johnson and K. J. Nkrumah, "Predicting 30-day hospital readmission in patients with diabetes using machine learning on electronic health record data," Cureus, vol. 17, no. 4, Art. no. 82437, 2025.
[4] UC Irvine, "Diabetes 130-US Hospitals for Years 1999-2008," UCI Machine Learning Repository, 2014. [Online]. Available: https://archive.ics.uci.edu/dataset/296/diabetes+130-us+hospitals+for+years+1999-2008. Accessed on: Feb. 28, 2026.
[5] C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006, ch. 7, sec. 7.2.1.
[6] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Belmont, CA, USA: Wadsworth, 1984.
[7] T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. IT-13, no. 1, 1967.
[8] L. Breiman, "Random forests," University of California, Berkeley, CA, USA, 2001.
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, "LIBLINEAR: A library for large linear classification," Journal of Machine Learning Research, 2008.
