Building and implementation of a big data mining model for businesses

Tran Truong Do1, Xuan Dung Nguyen1, Xuan Quang Phuong1, Quang Vinh Tran1
1 Hanoi University of Science and Technology, Ha Noi, Vietnam

Abstract

Businesses today face difficulties in compiling statistics on, analysing, and processing their data sources to make appropriate business decisions. The reason is that enterprise data is scattered across many file types with different structures and is not unified. In this paper, we design and build a centralised storage system model that uses big data mining to provide data analysis functions tailored to business requirements. To test and evaluate the effectiveness of the proposed model, we use as input the actual business data set of an accessory business. The results show that, when the model is applied, the source data sets are organised, stored, and analysed according to many different criteria and are displayed in clear, detailed charts. We also compare the proposed model with some existing models. The results show that the proposed model is easy for end-users to use, is highly scalable and fault-tolerant, and processes data faster than traditional models.
