Feature Engineering Methods in Intrusion Detection System: A Performance Evaluation

Document Type : Original Article


Department of Computer Engineering, University of Mazandaran, Mazandaran, Iran


Cyber-attacks have grown in both number and complexity, while the high-dimensional data used to detect them has grown in size and contains noisy and irrelevant features. Removing such features with Feature Selection (FS) and Dimensionality Reduction (DR) methods can be very effective in improving the performance of intrusion detection systems (IDS). This paper compares several FS and DR methods for detecting cyber-attacks with the best accuracy, using an implementation on the KDDCUP99 dataset. A Deep Neural Network (DNN) is used to train and evaluate each method. The results show that filter methods are faster than wrapper methods but less accurate, whereas wrapper methods are more accurate but computationally costlier. Embedded methods give the best output, reaching 99% on all metrics. In comparison, the DR methods show good performance and speed, and among them Linear Discriminant Analysis (LDA) performs even better than the embedded methods.
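To illustrate why filter methods tend to be fast, the following is a minimal sketch of one possible filter criterion: each feature is scored independently by its absolute Pearson correlation with the label, and the top k features are kept, with no classifier trained during selection. The scoring criterion, the function names (`pearson`, `filter_select`, `make_toy_data`) and the synthetic data are illustrative assumptions; they are not the specific filter methods, the DNN, or the KDDCUP99 data evaluated in the paper.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the label
    return cov / math.sqrt(var_x * var_y)


def filter_select(rows, labels, k):
    """Hypothetical filter step: keep the k features with the largest |correlation|."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append(abs(pearson(column, labels)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in for traffic records (assumption, not KDDCUP99)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)                # 0 = normal, 1 = attack
        informative = y + rng.gauss(0, 0.3)  # feature 0: tracks the label
        noise_a = rng.gauss(0, 1)            # feature 1: pure noise
        constant = 1.0                       # feature 2: irrelevant constant
        noise_b = rng.gauss(0, 1)            # feature 3: pure noise
        rows.append([informative, noise_a, constant, noise_b])
        labels.append(y)
    return rows, labels


rows, labels = make_toy_data()
selected, scores = filter_select(rows, labels, k=1)
print("selected feature indices:", selected)
print("scores:", [round(s, 3) for s in scores])
```

Because every feature is scored in a single pass over the data, the cost grows linearly with the number of features. A wrapper method would instead retrain the classifier for each candidate feature subset, which is consistent with the speed and accuracy trade-off reported above.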

