A New Hybrid Framework for Filter-Based Feature Selection Using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)

Authors

Department of Computer Science, Nehru Memorial College, Puthanampatti, Tiruchirappalli-Dt, Tamil Nadu, India

Abstract

Feature selection is a pre-processing technique that eliminates irrelevant and redundant features and thereby improves the performance of classifiers. When a dataset contains many irrelevant and redundant features, classifier accuracy fails to improve and overall performance degrades. To address this, this paper presents a new hybrid feature selection method based on information gain and symmetric uncertainty. The proposed method uses median-based discretization to convert quantitative features into qualitative ones, information gain to identify relevant features, and symmetric uncertainty to remove redundant features. Because the method combines relevance and redundancy analyses, it improves the predictive accuracy of the Naive Bayesian classifier. The efficiency and effectiveness of the proposed methodology are further analyzed by comparison with existing methods on high-dimensional real-world datasets.
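The three-stage pipeline summarized above (discretize, rank by relevance, prune redundancy) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Median-based discretization is assumed to be a binary split at each feature's median; information gain is computed as IG(C; F) = H(C) - H(C | F); and symmetric uncertainty as SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)). The function names (`median_discretize`, `select_features`), the `ig_threshold` parameter, and the redundancy rule are hypothetical. The redundancy rule borrows the approximate Markov blanket test from FCBF: a feature is dropped when its SU with an already selected feature is at least its SU with the class.

```python
import math
from collections import Counter
from statistics import median


def median_discretize(values):
    """Assumed scheme: 0 if a value is <= the feature's median, else 1."""
    m = median(values)
    return [1 if v > m else 0 for v in values]


def entropy(xs):
    """Shannon entropy H(X) in bits of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X | Y) = sum over y of P(y) * H(X | Y = y)."""
    n = len(ys)
    total = 0.0
    for y, count in Counter(ys).items():
        subset = [x for x, yy in zip(xs, ys) if yy == y]
        total += (count / n) * entropy(subset)
    return total


def information_gain(xs, ys):
    """IG(X; Y) = H(X) - H(X | Y)."""
    return entropy(xs) - conditional_entropy(xs, ys)


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    denom = entropy(xs) + entropy(ys)
    return 0.0 if denom == 0 else 2.0 * information_gain(xs, ys) / denom


def select_features(columns, labels, ig_threshold=0.0):
    """Hypothetical helper returning the names of the selected features.

    columns: dict of feature name -> numeric values; labels: class values.
    """
    disc = {name: median_discretize(vals) for name, vals in columns.items()}

    # Relevance analysis: keep features whose IG with the class exceeds the
    # threshold, ranked from most to least relevant (ties keep input order).
    ig = {name: information_gain(labels, d) for name, d in disc.items()}
    ranked = sorted((name for name in disc if ig[name] > ig_threshold),
                    key=lambda name: ig[name], reverse=True)

    # Redundancy analysis (FCBF-style rule, an assumption): drop feature f if
    # an already selected feature g has SU(f, g) >= SU(f, class).
    selected = []
    for f in ranked:
        su_fc = symmetric_uncertainty(disc[f], labels)
        if all(symmetric_uncertainty(disc[f], disc[g]) < su_fc for g in selected):
            selected.append(f)
    return selected


labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],          # perfectly predicts the class
    "b": [10, 20, 30, 40, 50, 60, 70, 80],  # rescaled copy of "a"
    "c": [1, 9, 2, 8, 3, 7, 4, 6],          # independent of the class
}
print(select_features(columns, labels))  # -> ['a']
```

In the toy data, `b` discretizes to the same values as `a` and is removed as redundant. `c` has zero information gain with the class and is removed as irrelevant. The surviving subset would then be passed to a Naive Bayes classifier, which this sketch does not include.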

Keywords

