Electrical and Computer Engineering, Babol Noshirvani University of Technology
Electrical & Computer Engineering, Shahid Beheshti University
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis, and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement on the k-means clustering method. It is an incremental clustering method that starts with one cluster and adds new clusters iteratively. At each step, every data point is evaluated as a candidate for the k-th cluster center, so a near-global solution is obtained. In gene expression clustering, genes whose expression levels differ significantly across the output class labels are important for the accurate classification of samples. A fuzzy entropy measure is therefore used to adapt fast GKM to gene expression data. To demonstrate the usefulness of the proposed method, three published microarray datasets are used: Leukemia, Prostate, and Colon. Classification results are found to be robust and accurate with three widely used classifiers: k-NN, SVM, and naïve Bayes.
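The incremental procedure described above can be illustrated with a minimal sketch. It assumes the commonly described form of fast global k-means: the first center is the data mean, and each new center is seeded at the data point with the largest guaranteed reduction in squared error, followed by ordinary k-means refinement. The function names `kmeans` and `fast_global_kmeans` and the parameter `k_max` are hypothetical. The fuzzy entropy adjustment is not specified here and is therefore omitted, so this should not be read as the exact method of the paper.

```python
import numpy as np


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(n_iter):
        # Squared distance of every point to every center, shape (N, k).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.argmin(axis=1)


def fast_global_kmeans(X, k_max):
    """Sketch of fast global k-means (assumed standard form, no fuzzy entropy)."""
    X = np.asarray(X, dtype=float)
    # Step 1: a single cluster whose center is the mean of all points.
    centers = X.mean(axis=0, keepdims=True)
    # Pairwise squared distances between data points, shape (N, N).
    pair_d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    for _ in range(1, k_max):
        # Squared distance of each point to its nearest current center.
        d_near = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # gain[n]: guaranteed error reduction if point n becomes a new center.
        gain = np.maximum(d_near[None, :] - pair_d, 0.0).sum(axis=1)
        best = gain.argmax()
        # Add the best candidate and refine all centers with k-means.
        centers, _ = kmeans(X, np.vstack([centers, X[best]]))
    return kmeans(X, centers)
```

Because every data point is scored as a candidate but k-means is run only once per added cluster, the sketch avoids running k-means from every possible starting point. The pairwise distance matrix makes memory grow quadratically with the number of points, which is acceptable for an illustration but may need chunking on larger datasets.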