The Mathematical Institute, University of Oxford, Eprints Archive

DifFUZZY: A fuzzy clustering algorithm for complex data sets

Cominetti, Ornella and Matzavinos, Anastasios and Samarasinghe, Sandhya and Kulasiri, Don and Liu, Sijia and Maini, P. K. and Erban, R. (2010) DifFUZZY: A fuzzy clustering algorithm for complex data sets. International Journal of Computational Intelligence in Bioinformatics & Systems Biology, 1 (4), pp. 402-417.


Abstract

Soft (fuzzy) clustering techniques are often used in the study of high-dimensional datasets, such as microarray and other high-throughput bioinformatics data. The most widely used method is the fuzzy C-means (FCM) algorithm, but it can present difficulties when dealing with some datasets. This paper develops DifFUZZY, a fuzzy clustering algorithm that utilises concepts from diffusion processes in graphs and is applicable to a larger class of clustering problems than other fuzzy clustering algorithms. Examples of datasets (synthetic and real) for which this method outperforms other frequently used algorithms are presented, including two benchmark biological datasets: a genetic expression dataset and a dataset that contains taxonomic measurements. The method is better than traditional fuzzy clustering algorithms at handling datasets that are ‘curved’ or elongated, or that contain clusters of different dispersion. The algorithm has been implemented in Matlab and C++ and is available at http://www.maths.ox.ac.uk/cmb/difFUZZY.
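
For readers unfamiliar with the FCM baseline named in the abstract, the following is a minimal sketch of one iteration of the textbook fuzzy C-means update, written in plain Python. It is not the DifFUZZY algorithm and is not taken from the authors' Matlab or C++ code; the function names, the fuzzifier default m = 2 and the tie-handling rule are assumptions made for illustration. Memberships here depend only on Euclidean distances to cluster centres, which is one commonly given reason why FCM tends to favour compact, roughly spherical clusters.

```python
import math


def fcm_memberships(points, centres, m=2.0):
    """Textbook FCM membership update (illustrative, not DifFUZZY).

    Returns a list u where u[j][i] is the membership of point j in cluster i.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centres]
        zeros = [d == 0.0 for d in dists]
        if any(zeros):
            # Assumption: a point lying exactly on one or more centres
            # shares its membership equally among those centres.
            share = 1.0 / sum(zeros)
            memberships.append([share if z else 0.0 for z in zeros])
            continue
        # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
        row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
               for d_i in dists]
        memberships.append(row)
    return memberships


def fcm_centres(points, memberships, m=2.0):
    """Textbook FCM centre update: mean of the points weighted by u^m.

    Assumes every cluster has at least one point with non-zero membership.
    """
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centres = []
    for i in range(n_clusters):
        weights = [u[i] ** m for u in memberships]
        total = sum(weights)
        centres.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centres


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
    ctr = [(0.0, 0.0), (4.0, 0.0)]
    u = fcm_memberships(pts, ctr)
    print(u)
    print(fcm_centres(pts, u))
```

In a full FCM run these two functions would be alternated until the memberships stop changing. Each row of the returned membership list sums to 1, which is what makes the assignment "soft" rather than a hard partition.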

Item Type: Article
Uncontrolled Keywords: clustering algorithm; fuzzy clustering; diffusion distance; genetic expression data clustering.
Subjects: A - C > Biology and other natural sciences
Research Groups: Centre for Mathematical Biology
ID Code: 1044
Deposited By: Philip Maini
Deposited On: 18 Jan 2011 07:51
Last Modified: 13 Mar 2011 12:29
