Android Malware Classification Using Deep Learning CNN with Co-occurrence Matrix Feature
Abstract
Recently, deep learning has been widely applied to speech and image recognition. The convolutional neural network (CNN) is one of the main model families for image classification and achieves very high accuracy. In the field of Android malware classification, many works have tried to take advantage of the CNN model by converting Android malware into “images” that match the CNN input. The performance, however, has not improved significantly, because simply converting malware into images may lose several important features of the malware. This paper proposes a method for improving the feature set for Android malware classification based on the co-occurrence matrix (co-matrix). The co-matrix is built from a list of raw features extracted from .APK files. The proposed feature can take advantage of the CNN while retaining important features of the Android malware. Experimental results of a CNN model on a very popular Android malware dataset, Drebin, demonstrate the feasibility of the proposed co-matrix feature.
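As a rough illustration of the co-matrix idea, the sketch below assumes that the matrix simply records which pairs of raw features appear together in one app. The abstract does not give the exact definition, so the function name `co_matrix`, the binary encoding, and the sample vocabulary are all hypothetical and not taken from the paper or from Drebin.

```python
from itertools import combinations


def co_matrix(features, vocabulary):
    """Build a symmetric binary co-occurrence matrix for one app.

    Cell (i, j) is 1 when vocabulary[i] and vocabulary[j] both appear in
    the app's feature list, else 0. This definition is an assumption made
    for illustration; the paper may define the matrix differently.
    """
    index = {name: i for i, name in enumerate(vocabulary)}
    n = len(vocabulary)
    matrix = [[0] * n for _ in range(n)]
    # Keep only features that belong to the fixed vocabulary.
    present = sorted({index[f] for f in features if f in index})
    for i in present:
        matrix[i][i] = 1  # a present feature co-occurs with itself
    for i, j in combinations(present, 2):
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


# Hypothetical vocabulary and raw features for a single app.
VOCAB = ["SEND_SMS", "INTERNET", "READ_CONTACTS", "getDeviceId"]
app_features = ["INTERNET", "SEND_SMS", "unknown_feature"]
m = co_matrix(app_features, VOCAB)
```

Under these assumptions every app yields a fixed-size 2-D array, which is the kind of grid-shaped input a CNN expects, while each cell still corresponds to a named pair of raw features.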
Keywords
Android Malware classification, Drebin, Co-Matrix, CNN.
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
References
[1] Mobile Operating System Market Share Worldwide. Available: https://gs.statcounter.com/os-marketshare/mobile/worldwide
[2] Statistics malware. Available: https://www.av-test.org/en/statistics/malware/
[3] Bernard Meyer, These camera apps with billions of downloads might be stealing your data and infecting you with malware. Available: https://cybernews.com/security/popular-camera-appssteal-data-infect-malware
[4] D. Arp, M. Spreitzenbarth, M. Hübner, H. Gascon, and K. Rieck, Drebin: Effective and Explainable Detection of Android Malware in Your Pocket, Proceedings 2014 Network and Distributed System Security Symposium, 2014, https://doi.org/10.14722/ndss.2014.23247
[5] F. Wei, Y. Li, S. Roy, X. Ou, and W. Zhou, Deep Ground Truth Analysis of Current Android Malware, Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 10327, pp. 252–276, 2017. https://doi.org/10.1007/978-3-319-60876-1_12
[6] Md. S. Rana, S. S. M. M. Rahman, and A. H. Sung, Evaluation of Tree Based Machine Learning Classifiers for Android Malware Detection, Computational Collective Intelligence, vol. 11056, pp. 377–385, 2018, https://doi.org/10.1007/978-3-319-98446-9_35
[7] S. Wang, G. Zhou, J. Lu, and F. Zhang, A Novel Malware Detection and Classification Method Based on Capsule Network, Lecture Notes in Computer Science, vol. 11632, pp. 573–584, 2019, https://doi.org/10.1007/978-3-030-24274-9_52
[8] T. H. Huang and H. Kao, R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections, 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 2018, pp. 2633-2642, https://doi.org/10.1109/BigData.2018.8622324
[9] Z. Xu, K. Ren, S. Qin, and F. Craciun, CDGDroid: Android Malware Detection Based on Deep Learning Using CFG and DFG, in Formal Methods and Software Engineering, 2018, vol. 11232, pp. 177–193, https://doi.org/10.1007/978-3-030-02450-5_11
[10] C. Li, K. Mills, D. Niu, R. Zhu, H. Zhang and H. Kinawi, Android Malware Detection Based on Factorization Machine, in IEEE Access, vol. 7, pp. 184008-184019, 2019, https://doi.org/10.1109/ACCESS.2019.2958927
[11] R. Nix and J. Zhang, Classification of Android apps and malware using deep neural networks, 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, 2017, pp. 1871-1878, https://doi.org/10.1109/IJCNN.2017.7966078
[12] Y. Ding, W. Zhao, Z. Wang and L. Wang, Automatically Learning Features Of Android Apps Using CNN, 2018 International Conference on Machine Learning and Cybernetics (ICMLC), Chengdu, 2018, pp. 331-336, https://doi.org/10.1109/ICMLC.2018.8526935
[13] Y. Jin, T. Liu, A. He, Y. Qu and J. Chi, Android Malware Detector Exploiting Convolutional Neural Network and Adaptive Classifier Selection, 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), Tokyo, 2018, pp. 833-834, https://doi.org/10.1109/COMPSAC.2018.00143
[14] A. Abderrahmane, G. Adnane, C. Yacine and G. Khireddine, Android Malware Detection Based on System Calls Analysis and CNN Classification, 2019 IEEE Wireless Communications and Networking Conference Workshop (WCNCW), Marrakech, Morocco, 2019, pp. 1-6, https://doi.org/10.1109/WCNCW.2019.8902627
[15] Wikipedia, John Rupert Firth. Available: https://en.wikipedia.org/wiki/John_Rupert_Firth
[16] T. Watanabe, S. Ito, and K. Yokoi, Co-occurrence Histograms of Oriented Gradients for Pedestrian Detection, in Advances in Image and Video Technology, 2009, vol. 5414, pp. 37–47, https://doi.org/10.1007/978-3-540-92957-4_4
[17] W. Gomez, W. C. A. Pereira and A. F. C. Infantosi, Analysis of Co-Occurrence Texture Statistics as a Function of Gray-Level Quantization for Classifying Breast Ultrasound, in IEEE Transactions on Medical Imaging, vol. 31, no. 10, pp. 1889-1899, Oct. 2012, https://doi.org/10.1109/TMI.2012.2206398
[18] B. Pathak and D. Barooah, Texture analysis based on the gray-level Co-occurrence matrix considering possible orientations, International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, vol. 2, no. 9.
[19] A. Eleyan and H. Demirel, Co-occurrence based statistical approach for face recognition, 2009 24th International Symposium on Computer and Information Sciences, Guzelyurt, 2009, pp. 611-615, https://doi.org/10.1109/ISCIS.2009.5291895
[20] L.Đ. Thuan, P.V. Huong, L.T.H. Van, HQ. Cuong, H.V. Hiep and N.K. Khanh, Improvement of feature set based on Apriori algorithm in Android malware classification using machine learning method, Nghiên cứu khoa học và công nghệ quân sự, no. August, pp. 32–41, 2018, ISSN 1859 – 1043.
[21] L. D. Thuan, P. Van Huong, H. Van Hiep and N. Kim Khanh, Improvement of feature set based on Apriori algorithm in Android malware classification using machine learning method, 2020 RIVF International Conference on Computing and Communication Technologies (RIVF), Ho Chi Minh City, Vietnam, 2020, pp. 1-7, https://doi.org/10.1109/RIVF48685.2020.9140779
[22] https://archive.org/details/2018-02-random-apkcollection.
[23] C.-W. Yeh, W.-T. Yeh, S.-H. Hung, and C.-T. Lin, Flattened data in convolutional neural networks: Using malware detection as case study, in Proc. Int. Conf. Res. Adapt. Convergent Syst., 2016, pp. 130–135, https://doi.org/10.1145/2987386.2987406
[24] Mohammed K. Alzaylaee, Suleiman Y. Yerima, Sakir Sezer, DL-Droid: Deep learning based android malware detection using real devices, Computers & Security, Volume 89, 2020, 101663, ISSN 0167-4048, https://doi.org/10.1016/j.cose.2019.101663
[25] P. Feng, J. Ma, C. Sun, X. Xu and Y. Ma, A Novel Dynamic Android Malware Detection System With Ensemble Learning, in IEEE Access, vol. 6, pp. 30996-31011, 2018.
[26] Z. Yuan, Y. Lu, Z. Wang, and Y. Xue, Droid-sec: Deep learning in Android malware detection, in Proc. ACM Conf. SIGCOMM, 2014, pp. 371–372, https://doi.org/10.1145/2740070.2631434
[27] Z. Yuan, Y. Lu and Y. Xue, Droiddetector: android malware characterization and detection using deep learning, in Tsinghua Science and Technology, vol. 21, no. 1, pp. 114-123, Feb. 2016, https://doi.org/10.1109/TST.2016.7399288
[28] L. Xu, D. Zhang, N. Jayasena, and J. Cavazos, HADM: Hybrid analysis for detection of malware, in Proc. SAI Intell. Syst. Conf. Springer, 2016, pp. 702–724, https://doi.org/10.1007/978-3-319-56991-8_51