
A New Limiting Distribution for the Statistic Test for the Homogeneity of Two Multinomial Populations
Adelaide Valente Freitas^{1}, Miguel Pinheiro^{2}, José Luís Oliveira^{2},
Gabriela Moura^{3} and Manuel Santos^{3}
^{1}Departamento de Matemática, Universidade de Aveiro, Portugal
^{2}IEETA, Universidade de Aveiro, Portugal
^{3}Departamento de Biologia, Universidade de Aveiro, Portugal
Consider sample data cross-classified in an m × 2 contingency table from
two populations, each described by an unknown multinomial probability
distribution. To test the homogeneity of these two populations, Choulakian
and Mahdi (2000) proposed a test statistic (L_{D}) defined as the maximum
of a random number of independent and identically distributed random
variables. Applying Extreme Value Theory, we derive an asymptotic
probability distribution of the statistic L_{D}, under suitable
normalization, as m → ∞. Simulation studies show that for large m
(m ≥ 20) and a large number of observations, the empirical p-values are
approximated quite accurately by the target p-values obtained using our
limiting distribution.
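The comparison of empirical p-values described above can be illustrated with a short Monte Carlo sketch. Since the exact definition of L_{D} is not given here, the code uses a hypothetical stand-in, `max_cell_statistic` (the largest per-row contribution to the Pearson chi-square statistic), and estimates an empirical p-value by resampling both columns from the pooled row proportions under the homogeneity hypothesis. All function names and the choice of statistic are assumptions made for illustration only; this is not the procedure of Choulakian and Mahdi, nor the limiting distribution derived in this work.

```python
import random


def max_cell_statistic(col1, col2):
    """Hypothetical maximum-type statistic (a stand-in, NOT the L_D of the text):
    the largest per-row contribution to the Pearson chi-square statistic
    of an m x 2 table. Assumes both columns have a positive total."""
    n1, n2 = sum(col1), sum(col2)
    n = n1 + n2
    best = 0.0
    for a, b in zip(col1, col2):
        row = a + b
        if row == 0:
            continue  # an empty row contributes nothing
        e1 = row * n1 / n  # expected count in column 1 under homogeneity
        e2 = row * n2 / n  # expected count in column 2 under homogeneity
        best = max(best, (a - e1) ** 2 / e1 + (b - e2) ** 2 / e2)
    return best


def sample_counts(rng, probs, n):
    """Draw one multinomial sample of size n with cell probabilities probs."""
    counts = [0] * len(probs)
    for i in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[i] += 1
    return counts


def empirical_p_value(col1, col2, n_sim=500, seed=0):
    """Monte Carlo p-value: the proportion of tables simulated under
    homogeneity whose statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    n1, n2 = sum(col1), sum(col2)
    # Pooled row proportions estimate the common distribution under H0.
    pooled = [(a + b) / (n1 + n2) for a, b in zip(col1, col2)]
    observed = max_cell_statistic(col1, col2)
    hits = 0
    for _ in range(n_sim):
        s1 = sample_counts(rng, pooled, n1)
        s2 = sample_counts(rng, pooled, n2)
        if max_cell_statistic(s1, s2) >= observed:
            hits += 1
    # The +1 correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)
```

For a table with m = 20 rows, calling `empirical_p_value(col1, col2)` returns a value in (0, 1]; such empirical p-values are the quantities that an asymptotic approximation would be checked against.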
Applying this approach to the complete ORFeome sequences of three yeast
species, namely Saccharomyces cerevisiae, Saccharomyces mikatae and
Schizosaccharomyces pombe, to test the homogeneity of some codon contexts
of these species, we conclude that the codon context rules of S. pombe
differ statistically significantly from those of the other two species,
whereas those of S. cerevisiae and S. mikatae are similar. These results
confirm the divergence and convergence of these three species in the
phylogenetic tree.
Key Words and Phrases: extreme value distribution, random sample size,
contingency table, ORFeome, phylogenetic tree
References:
Choulakian, V. and Mahdi, S. (2000) A new statistic for the analysis of
association between trait and polymorphic marker loci. Math. Biosciences,
164, 139-145.