Hotel Estoril Eden, Monte Estoril,
5-8 October 2005




Selecting Relevant Genes in Microarray Data

Joaquim F. Pinto da Costa¹, Hugo Alonso¹, Luís A.C. Roque² and Manuela Oliveira³
¹Departamento de Matemática Aplicada, Universidade do Porto, Portugal
²Departamento de Matemática, ISEP, Universidade do Porto, Portugal
³Departamento de Matemática, Universidade de Évora, Portugal

In this work we consider the problem of selecting informative genes from the thousands of genes that are usually measured in microarray experiments. First, the selection takes into account the class membership (disease) of each individual: we use Decision Trees [1] to find which of the measured genes carry relevant information for discriminating between the different classes. Surprisingly, in the five datasets analysed, only a few of the thousands of measured genes were selected; it seems that most genes are of little use for discriminating between the diseases. Second, we approach the problem by computing the Principal Components of the most expressed genes. Two variants are used: the usual PCA based on the Pearson correlation matrix, and a “weighted” version introduced in this work. The weighted PCA uses an adaptation of a new rank correlation coefficient, introduced by Pinto da Costa & Soares [2], that gives more importance to the higher ranks.
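The abstract does not detail how genes are picked out of the trees. As a minimal sketch of the criterion CART [1] uses to choose a split, the Python code below scores every gene by the largest Gini impurity decrease that a single threshold on its expression values can achieve; the best-scoring gene is the one CART would place at the root node. A full tree repeats this search recursively in each child node, and genes that never appear in any split are, in effect, not selected. This depth-one search is a simplification, not the authors' procedure, and the function names (`gini`, `best_split`, `rank_genes`) and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Best CART-style threshold split of one gene's expression values.

    Returns (impurity_decrease, threshold); samples with value <= threshold
    go to the left node. Candidate thresholds are midpoints between
    consecutive distinct sorted values. threshold is None if no split helps.
    """
    n = len(values)
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best = (0.0, None)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Weighted impurity of the two child nodes.
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        decrease = parent - child
        if decrease > best[0]:
            best = (decrease, (lo + hi) / 2)
    return best


def rank_genes(expr, labels):
    """Gene indices (rows of expr), sorted by best-split impurity decrease."""
    scores = [best_split(row, labels)[0] for row in expr]
    return sorted(range(len(expr)), key=lambda g: scores[g], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: 2 genes (rows) x 6 individuals (columns).
    labels = ["A", "A", "A", "B", "B", "B"]
    expr = [
        [0.1, 0.2, 0.3, 0.9, 1.0, 1.1],  # gene 0 separates the classes
        [0.5, 0.9, 0.1, 0.6, 0.2, 0.8],  # gene 1 is mixed
    ]
    print(rank_genes(expr, labels))  # [0, 1]
```

In this toy example gene 0 yields a perfect split (decrease 0.5, threshold 0.6), while the best split of gene 1 only reduces the impurity by 0.1.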
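The weighted rank correlation of [2] is only named in the abstract. The sketch below assumes the coefficient r_W has the form we recall from [2] (to be checked against the paper): for two vectors of n values with ranks R_i and Q_i, r_W = 1 - 6 * sum_i (R_i - Q_i)^2 * ((n - R_i + 1) + (n - Q_i + 1)) / (n^4 + n^3 - n^2 - n). Rank 1 is given to the largest value, so the weight (n - R_i + 1) + (n - Q_i + 1) is largest for disagreements among the top ranks. How the coefficient is adapted to build the matrix for the weighted PCA is not stated here; the function names are hypothetical.

```python
def ranks(values):
    """Rank values in decreasing order: rank 1 = largest value.

    Assumes no ties; tied values would need averaged ranks, which are
    not handled here (ties are broken by position).
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def weighted_rank_correlation(x, y):
    """Weighted rank correlation r_W, assuming the form recalled from [2].

    A rank disagreement at position i is weighted by
    (n - R_i + 1) + (n - Q_i + 1): close to 2n near the top ranks,
    close to 2 near the bottom ranks. Returns 1 for identical rankings
    and -1 for reversed rankings.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    R, Q = ranks(x), ranks(y)
    s = sum((r - q) ** 2 * ((n - r + 1) + (n - q + 1)) for r, q in zip(R, Q))
    return 1 - 6 * s / (n**4 + n**3 - n**2 - n)


if __name__ == "__main__":
    x = [5, 4, 3, 2, 1]
    print(weighted_rank_correlation(x, [4, 5, 3, 2, 1]))  # top two swapped
    print(weighted_rank_correlation(x, [5, 4, 3, 1, 2]))  # bottom two swapped
```

With five items, Spearman's coefficient equals 0.9 for both swaps, whereas this r_W gives about 0.85 when the two top-ranked items are swapped and about 0.95 when the two bottom-ranked items are swapped, which is the extra importance given to the higher ranks.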

Key-Words: Microarrays, Decision Trees, PCA, Weighted Rank Correlation

[1] Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J., 1984. Classification and Regression Trees. Wadsworth, Belmont.
[2] Pinto da Costa, J.F. and Soares, C., 2005. A weighted rank measure of correlation. Australian & New Zealand Journal of Statistics (to appear).