Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis


1 Electrical and Computer Engineering, Babol Noshirvani University of Technology

2 Electrical & Computer Engineering, Shahid Beheshti University


Recognizing genes with distinctive expression levels can help in the prevention, diagnosis, and treatment of diseases at the genomic level. In this paper, fast global k-means (fast GKM) is adapted for clustering gene expression datasets. Fast GKM is a significant improvement on the k-means clustering method. It is an incremental method that starts with a single cluster and adds one new cluster per iteration. At each iteration, every data point is evaluated as a candidate for the k-th cluster center, so a near-global solution is obtained. In gene expression clustering, genes whose expression levels differ significantly across the output class labels are important for accurate classification of samples; a fuzzy entropy measure is therefore used to adjust fast GKM for this application. To demonstrate the usefulness of the proposed method, three published microarray datasets are used: Leukemia, Prostate, and Colon. Classification results are found to be robust and accurate with three widely used classifiers: k-NN, SVM, and naïve Bayes.
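
For illustration only, the incremental procedure described above can be sketched in Python with NumPy. The sketch assumes the commonly used fast GKM selection rule, which picks as the new center the data point with the largest guaranteed reduction in squared clustering error; the abstract does not state this formula, so it is an assumption here. The fuzzy entropy adjustment is not included because its form is not given in this section. The function names `kmeans` and `fast_global_kmeans` are hypothetical.

```python
import numpy as np


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        # Squared distances, shape (n_points, n_centers).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)


def fast_global_kmeans(X, n_clusters):
    """Sketch of fast GKM (assumed selection rule, no fuzzy entropy term)."""
    X = np.asarray(X, dtype=float)
    # Start with one cluster whose center is the mean of all points.
    centers = X.mean(axis=0, keepdims=True)
    labels = np.zeros(len(X), dtype=int)
    # Pairwise squared distances between data points, computed once.
    pair_d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    for _ in range(1, n_clusters):
        # Squared distance from each point to its nearest current center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.min(axis=1)
        # gain[n] = sum_j max(nearest[j] - ||x_n - x_j||^2, 0): a lower bound
        # on the error reduction if point n becomes the new center.
        gain = np.maximum(nearest[None, :] - pair_d2, 0.0).sum(axis=1)
        best = gain.argmax()
        # Add the best candidate as the new center, then refine all centers.
        centers, labels = kmeans(X, np.vstack([centers, X[best]]))
    return centers, labels
```

Calling `fast_global_kmeans(X, n_clusters)` on an array of shape (samples, features) returns the final centers and a cluster label for each row. Storing all pairwise distances costs memory quadratic in the number of points, which is acceptable for a sketch but may need a blocked computation on large gene sets.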