Hotel Estoril Eden, Monte Estoril,
5-8 October 2005



Some Statistical Techniques for Selection of Differentially Expressed Genes from DNA Array Data

Aluísio Pinheiro
Departamento de Estatística, UNICAMP, Brazil

DNA array technology is the main strategy for large-scale evaluation of gene expression profiles. An important application of DNA arrays is the identification of genes whose expression changes significantly between RNA samples obtained from different individuals, tissues or physiological states. However, because array data are noisy and experiments usually have few replicates, it is not easy to establish a statistical framework for validating the selection of differentially expressed genes. We present the following two studies:

  1. Selection of differentially expressed genes from DNA arrays data by nonlinear data transformations and local fitting, by R.D. Drummond, A. Pinheiro, C.S. Rocha and M. Menossi, in which we present an algorithm designed to optimize the selection of the genes showing the most significant variation in expression between two RNA samples, with a single hybridization for each sample. The algorithm rests on three assumptions: most arrayed genes are equally expressed in both samples, the distribution of expression ratios is close to lognormal, and the variability of expression ratios depends on the mean signal intensity of the genes. It applies an optimal data transformation to bring expression ratios closer to a normal distribution, slides a window through the data to evaluate how ratio variability depends on the mean intensity of the genes, and uses spline interpolation to determine an intensity-dependent criterion for selecting the genes whose expression varies most significantly between samples. On simulated data the algorithm performed well, identifying more than 90% of the differentially expressed genes. It also achieved satisfactory results on real data: applied to a public data set of gene expression in cold-stressed sugarcane (Nogueira et al., 2003), it confirmed 83% of the genes originally identified as cold-responsive by the data set's authors and selected several other genes that are also responsive to cold treatment. Supported by CAPES, FAPESP and PADCT/CNPq.
  2. Some Statistical Properties of Gene Expression Clustering for Array Data, by G.C.G. Abreu, A. Pinheiro, R.D. Drummond, S.R. Camargo and M. Menossi, in which we propose a simple, easy-to-implement technique that uses bootstrap resampling to evaluate the statistical error of the nodes produced by clustering with self-organizing maps (SOM). SOM and parametric clustering are compared on simulated data and on two real data sets. We also implement a bootstrap-based pre-processing procedure for SOM that improves the false discovery rate of differentially expressed genes. Supported by CAPES, FAPESP and CNPq.
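The intensity-dependent selection described in study 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: a plain log2 ratio stands in for their optimal transformation, the median absolute deviation (MAD) within each window stands in for their local variability estimate, and piecewise-linear interpolation stands in for the spline. The function names, the window of 51 genes and the cutoff of k = 3 local standard deviations are hypothetical choices.

```python
import math
import statistics
from bisect import bisect_left


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over non-decreasing xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_left(xs, x)  # guarantees xs[j-1] < x <= xs[j]
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def select_differential(red, green, window=51, k=3.0):
    """Return indices of genes whose log-ratio is extreme for their intensity.

    Hypothetical sketch: log2 replaces the optimal transformation, a MAD-based
    scale replaces the local variability estimate, and linear interpolation
    replaces the spline.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]        # log expression ratio
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]  # mean log intensity
    if len(m) < window:
        raise ValueError("need at least `window` genes")
    order = sorted(range(len(m)), key=a.__getitem__)
    xs, centre, scale = [], [], []
    # Slide a window of `window` genes along the intensity axis, half a window per step.
    for start in range(0, len(order) - window + 1, max(1, window // 2)):
        idx = order[start:start + window]
        ms = [m[i] for i in idx]
        med = statistics.median(ms)
        xs.append(statistics.median(a[i] for i in idx))
        centre.append(med)
        # MAD scaled by 1.4826 estimates the SD under normality and is barely
        # affected by the few changed genes (most genes are assumed unchanged).
        scale.append(1.4826 * statistics.median(abs(v - med) for v in ms))
    flagged = []
    for i in range(len(m)):
        # Intensity-dependent cutoff: local centre +/- k local standard deviations.
        c = _interp(a[i], xs, centre)
        s = _interp(a[i], xs, scale)
        if abs(m[i] - c) > k * s:
            flagged.append(i)
    return flagged
```

Called as `select_differential(red, green)` on two lists of positive channel intensities (one hybridization per sample), it returns the indices of the flagged genes. The robust scale reflects the assumption that most arrayed genes are equally expressed, so each window's spread is driven by unchanged genes.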
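For study 2, one way to attach a bootstrap error to SOM nodes is sketched below. It is a hypothetical illustration, not the authors' procedure: it trains a small one-dimensional SOM in pure Python, resamples genes with replacement, refits the map warm-started from the reference prototypes (an assumption made here so that each node keeps its identity across replicates), and reports the standard deviation of each prototype coordinate over the replicates as that node's bootstrap standard error. All names, the linear decay schedules and the parameter values are illustrative.

```python
import math
import random
import statistics


def _dist2(u, v):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train_som(data, init, epochs=20, lr0=0.5, radius0=1.0, rng=None):
    """Online 1-D SOM: node j moves toward x with weight lr * exp(-(j - b)^2 / (2 r^2)),
    where b is the best-matching node. Learning rate and radius decay linearly
    over training (radius floored at 0.1)."""
    rng = random.Random(0) if rng is None else rng
    protos = [list(p) for p in init]  # copy so `init` is not modified
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = t / total
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.1)
            b = min(range(len(protos)), key=lambda j: _dist2(x, protos[j]))
            for j, p in enumerate(protos):
                h = math.exp(-((j - b) ** 2) / (2.0 * radius ** 2))
                for d in range(len(p)):
                    p[d] += lr * h * (x[d] - p[d])
            t += 1
    return protos


def bootstrap_node_se(data, n_nodes=3, n_boot=30, seed=1):
    """Return (reference prototypes, per-node per-coordinate bootstrap SE)."""
    rng = random.Random(seed)
    # Initialise nodes with genes spread evenly along the mean-expression ranking.
    ranked = sorted(data, key=lambda x: sum(x) / len(x))
    init = [ranked[int((k + 0.5) * len(ranked) / n_nodes)] for k in range(n_nodes)]
    ref = train_som(data, init, rng=random.Random(seed))
    boots = []
    for _ in range(n_boot):
        # Resample genes with replacement and refit, warm-starting from the
        # reference map so that node j keeps the same identity in every replicate.
        sample = [data[rng.randrange(len(data))] for _ in data]
        boots.append(train_som(sample, ref, epochs=5, lr0=0.2, radius0=0.5, rng=rng))
    dims = len(data[0])
    se = [[statistics.stdev(b[j][d] for b in boots) for d in range(dims)]
          for j in range(n_nodes)]
    return ref, se
```

Here `data` is a list of gene expression profiles (one list of log-ratios per gene, one entry per condition), and `n_boot` must be at least 2. A node whose standard errors are large relative to the distances between prototypes would be less well supported by the data.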