A Random Forest Classifier based on Genetic Algorithm for Cardiovascular Diseases Diagnosis (RESEARCH NOTE)


Department of Computer Science & Engineering, B.I.T. Mesra, Ranchi, India


Machine learning-based classification techniques support decision making in healthcare, especially in disease diagnosis, prognosis and screening. Healthcare datasets are voluminous, and their high dimensionality leads to slower learning and higher computational cost. Feature selection addresses high dimensionality by reducing the dataset to a smaller feature set, and it improves classification performance while using fewer features in the decision-making process. In this paper, Random Forest (RF) is employed for the diagnosis of cardiovascular disease. The first phase of the proposed system applies various feature selection algorithms, namely Principal Component Analysis (PCA), Relief-F, Sequential Forward Floating Search (SFFS), Sequential Backward Floating Search (SBFS) and Genetic Algorithm (GA), to reduce the dimension of the cardiovascular disease dataset. The second phase constructs an RF model for cardiovascular disease classification. The results show that the combination of GA and RF delivered the highest classification accuracy, 93.2%, using six features.
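The GA-based selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name ga_select, the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are assumptions chosen for illustration, and the feature count of 13 is arbitrary. The fitness function here is a toy stand-in; in a setting like the one in this paper it would presumably be the cross-validated accuracy of an RF classifier trained on the features the mask selects.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximises fitness(mask).

    Hypothetical helper: operators and defaults are illustrative only.
    """
    rng = random.Random(seed)
    # Each chromosome is a tuple of 0/1 flags, one flag per feature.
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scored = [(fitness(c), c) for c in pop]

        def tournament():
            # Pick two distinct individuals, keep the fitter one.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        nxt = [best]  # elitism: the best mask so far always survives
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # Bit-flip mutation, applied independently to each gene.
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit
                          for bit in child)
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best

# Toy fitness (assumption): pretend features 0, 3 and 5 are informative
# and charge a small penalty for every selected feature.
INFORMATIVE = {0, 3, 5}

def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

mask = ga_select(13, toy_fitness)
selected = [i for i, bit in enumerate(mask) if bit]
print("selected feature indices:", selected)
```

Because the best mask is carried into every new generation, its fitness never decreases from one generation to the next. Replacing toy_fitness with a function that trains and scores a classifier on the selected columns turns the sketch into a wrapper-style feature selector.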

