Principal Component Analysis for Dimensionality Reduction of the Breast Cancer Dataset

Anh Vu Tran1, Dieu Huyen Le1, Thi Diu Dong1, Quang Huy Hoang1, Thi Viet Huong Pham2
1 Ha Noi University of Science and Technology, Ha Noi, Vietnam
2 International School, Vietnam National University, Hanoi, Vietnam

Abstract

Breast cancer is a prevalent global health concern among women. This study systematically investigates Principal Component Analysis (PCA) for dimensionality reduction on the Wisconsin Breast Cancer Dataset. We evaluate how the number of retained principal components, k, affects the performance of several machine learning (ML) and deep learning (DL) models: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest, Multilayer Perceptron, Fully Connected Neural Network, and Dropout models. Our findings show that PCA can improve model accuracy when the dimensionality is reduced from high to moderate levels, whereas overly aggressive reduction (k < 5) significantly degrades performance. The ML models performed best at different values of k: SVM achieved its best results at k = 25, KNN was stable for k = 10-15, and Random Forest performed effectively at k = 5. For the DL models, accuracy remained stable for k ≥ 10 and saturated between k = 12 and k = 18, consistently exceeding 97%. These results underscore the importance of selecting an appropriate number of PCA dimensions to balance accuracy and computational efficiency, thereby improving the efficacy of breast cancer diagnostic support systems.
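
The kind of sweep over k described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' protocol: the abstract does not state the preprocessing, data split, or hyperparameters, so the standardization step, the 5-fold stratified cross-validation, the RBF-kernel SVM with default settings, and the helper name `accuracy_by_k` are all assumptions. It also assumes that the copy of the Wisconsin (Diagnostic) data bundled with scikit-learn (569 samples, 30 features) matches the dataset used in the study.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def accuracy_by_k(ks, n_splits=5, seed=0):
    """Return {k: mean cross-validated accuracy} for an SVM on k principal components.

    Hypothetical helper for illustration; the evaluation setup is assumed.
    """
    X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for k in ks:
        # Scaling and PCA live inside the pipeline, so they are re-fit on each
        # training fold and never see the held-out fold.
        model = make_pipeline(StandardScaler(), PCA(n_components=k), SVC(kernel="rbf"))
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[k] = scores.mean()
    return results


if __name__ == "__main__":
    for k, acc in accuracy_by_k([2, 5, 10, 15, 25]).items():
        print(f"k={k:2d}  mean accuracy={acc:.3f}")
```

Swapping `SVC` for `KNeighborsClassifier` or `RandomForestClassifier` in the pipeline gives the corresponding sweeps for the other classical models. The numbers this sketch produces depend on the assumed setup and need not reproduce the values reported above.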
