Hotel Estoril Eden, Monte Estoril,
5-8 October 2005




Evaluating the Agreement Between Clusterings

Francisco R. Pinto1 and Jonas S. Almeida2
1Grupo de Biomatemática, Inst. de Tecnologia Química e Biológica, Portugal
2Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, USA

Defining clusters or classifying genes, sequences or other biological entities is a common task in genomics and proteomics. Consequently, one frequently needs to compare the performance of two clustering algorithms, or to determine how well a given classification agrees with another. Often these comparisons must be made in the absence of a gold-standard classification. Here, a new method is presented that evaluates the extent of agreement (or disagreement) between any two partitions. Either one or both partitions can result from classification or clustering algorithms, and each can be hierarchical or non-hierarchical. Overlapping partitions are also allowed. The new method is applied to identify the best agreement between genes clustered by expression profiles and by GO annotations. The results are compared with the Rand index, the van Dongen criterion, the cophenetic coefficient and the variation of information criterion. Compared with available approaches, the presented method is more broadly applicable.
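
For orientation, one of the baseline measures named above, the Rand index, can be sketched in a few lines. This is not the method presented in this abstract; it is only the standard pair-counting definition for two flat, non-overlapping partitions of the same items. The function name rand_index and the toy label lists are assumptions made for illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Rand index between two flat partitions given as label sequences.

    Illustrative sketch only: labels_a[i] and labels_b[i] are the cluster
    labels assigned to item i by each partition (hypothetical inputs).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two items are needed to form a pair")

    agreeing_pairs = 0
    total_pairs = 0
    for i, j in combinations(range(n), 2):
        together_in_a = labels_a[i] == labels_a[j]
        together_in_b = labels_b[i] == labels_b[j]
        # A pair agrees when both partitions group it together,
        # or both partitions keep it apart.
        if together_in_a == together_in_b:
            agreeing_pairs += 1
        total_pairs += 1
    return agreeing_pairs / total_pairs


if __name__ == "__main__":
    # Toy example: same grouping under different label names.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because it counts pairs of items that are either together or apart, this definition assumes each item belongs to exactly one cluster in each partition, so it does not directly cover the hierarchical or overlapping cases mentioned above.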