Comparative analysis of machine learning models for diabetes readmission classification
Abstract
This study compares machine learning and deep learning models for predicting 30-day readmission risk in diabetic patients from Electronic Health Record (EHR) data. Using the Diabetes 130-US Hospitals dataset, which comprises 101,766 records and 50 clinical features, the research aims to improve classification accuracy on a highly imbalanced medical dataset. The proposed pipeline applies SMOTEENN for class balancing, Z-score normalization, and Principal Component Analysis (PCA) for dimensionality reduction, retaining 80% of the total variance in 26 principal components. Seven classification models were benchmarked: Logistic Regression, Decision Tree, K-Nearest Neighbors, Random Forest (RF), Support Vector Machine, Multilayer Perceptron (MLP), and a proposed Deep Multilayer Perceptron (Deep MLP) with a hierarchical architecture. In the experiments, the Random Forest model outperformed the other six, achieving an accuracy of 86.90%, a peak F1-score of 89.94%, and a recall of 92.67%. This represents an improvement of 78.37% in recall over existing baseline studies. The hierarchical Deep MLP architecture also improved recall by 2.77% over the standard MLP, which suggests that it is effective at capturing complex clinical correlations. These findings indicate that the optimized Random Forest model has high potential for integration into early warning systems. By identifying high-risk patients promptly, healthcare providers can intervene in time, reducing readmission rates and making better use of medical resources.
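The abstract reports recall and F1-score for Random Forest but not precision, and it does not say whether the 78.37% recall gain is relative or in percentage points. The following minimal sketch works through that arithmetic. It assumes the reported F1-score and recall refer to the same class and are not macro- or weighted-averaged, so that F1 = 2PR / (P + R) holds exactly. The function names are hypothetical, and the derived values are implications of the reported figures, not results stated by the authors.

```python
def precision_from_f1_and_recall(f1: float, recall: float) -> float:
    """Invert F1 = 2PR / (P + R) to solve for precision P."""
    denominator = 2.0 * recall - f1
    if denominator <= 0.0:
        raise ValueError("no valid precision: F1 must be below 2 * recall")
    return f1 * recall / denominator


def implied_baseline_recall(new_recall: float, gain: float, relative: bool) -> float:
    """Baseline recall implied by a reported gain under one of two readings.

    relative=True  treats the gain as a relative increase over the baseline.
    relative=False treats the gain as an absolute difference (percentage points).
    """
    if relative:
        return new_recall / (1.0 + gain)
    return new_recall - gain


# Figures reported in the abstract for the Random Forest model.
RF_RECALL = 0.9267
RF_F1 = 0.8994
RECALL_GAIN = 0.7837

# Assumption: single-class (non-averaged) metrics, see the lead-in.
rf_precision = precision_from_f1_and_recall(RF_F1, RF_RECALL)

# Both readings of the gain are shown because the abstract does not pick one.
baseline_if_relative = implied_baseline_recall(RF_RECALL, RECALL_GAIN, relative=True)
baseline_if_absolute = implied_baseline_recall(RF_RECALL, RECALL_GAIN, relative=False)

print(f"implied RF precision:          {rf_precision:.4f}")
print(f"baseline recall (relative):    {baseline_if_relative:.4f}")
print(f"baseline recall (pct. points): {baseline_if_absolute:.4f}")
```

Under these assumptions the implied Random Forest precision is roughly 87.4%. The two readings of the recall gain imply quite different baselines (about 52% versus about 14%), so the full text should be consulted before quoting the improvement.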
Keywords
Diabetes readmission, healthcare prediction, random forest.
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.