Hotel Estoril Eden, Monte Estoril, Portugal
5-8 October 2005

 

 
 



Local DNA Sequence Information Using Rényi Entropic Profiles

Susana Vinga¹ and Jonas S. Almeida¹,²
¹ Instituto de Biologia Experimental e Tecnológica (IBET), and Biomathematics Group, Instituto de Tecnologia Química e Biológica – Universidade Nova de Lisboa (ITQB-UNL), Portugal
² Dept. Biostatistics, Bioinformatics and Epidemiology, Medical Univ. South Carolina, USA

In a recent report [1] the authors presented a new measure of Rényi continuous entropy for DNA sequences, which allows their level of randomness to be estimated. The definition explored there was based on the Rényi entropy of a probability density function (pdf) estimate, obtained by applying Parzen’s window method to Chaos Game Representation/Universal Sequence Maps (CGR/USM). CGR/USM are related to Markov chain models, and the corresponding transition probability tables can be easily extracted from these vector maps.
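As a rough illustration of the quantities named above, the following sketch maps a sequence to CGR coordinates and computes a Rényi quadratic entropy from a Gaussian Parzen window estimate. Several choices here are assumptions and are not taken from the abstract: the corner assignment for A, C, G and T, the starting point at the centre of the unit square, the Gaussian kernel, the order-2 entropy, integration over the whole plane rather than the unit square, and the function names `cgr_points` and `renyi_quadratic_entropy`.

```python
import numpy as np

# Assumed corner assignment; any fixed mapping of the four nucleotides
# to the corners of the unit square yields an equivalent map.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """CGR coordinates: each point is the midpoint between the previous
    point and the corner of the current symbol."""
    pts = np.empty((len(seq), 2))
    cur = np.array(start, dtype=float)
    for i, symbol in enumerate(seq):
        cur = 0.5 * (cur + np.array(CORNERS[symbol]))
        pts[i] = cur
    return pts


def renyi_quadratic_entropy(points, sigma):
    """H2 = -ln(integral of f^2), where f is the Parzen estimate with an
    isotropic Gaussian kernel of variance sigma^2.

    The integral has a closed form: the mean over all pairs (i, j) of a
    Gaussian with variance 2*sigma^2 evaluated at x_i - x_j.
    """
    n, d = points.shape
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    var = 2.0 * sigma ** 2
    pair_kernel = np.exp(-sq_dist / (2.0 * var)) / (2.0 * np.pi * var) ** (d / 2)
    return -np.log(pair_kernel.mean())
```

Under these assumptions, a homopolymer such as `"A" * 200` concentrates its points near one corner and gives a lower entropy than a uniformly random sequence of the same length. Because the entropy is continuous, negative values are possible.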

This work extends the concept of continuous entropy by defining DNA sequence entropic profiles from the pdf estimates previously obtained. These profiles are applied to a dataset of artificial and real DNA sequences, employing several kernel functions. The results show that the entropic profiles are directly related to the statistical significance of motifs, allowing the study of under- and over-representation of substrings. Furthermore, by spanning the parameter space of the kernel function, it is possible to extract important information about the scale of each DNA region, which may have future applications in the recognition of biologically significant segments of the genome.
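The abstract does not give the formula for the entropic profiles, so the following is only a hypothetical sketch of the general idea: evaluate a Parzen density estimate at the CGR point of every sequence position, for several kernel widths. The Gaussian kernel, the corner assignment, and the names `cgr_points` and `local_density_profile` are assumptions made for illustration and should not be read as the authors' definition.

```python
import numpy as np

# Assumed corner assignment for the four nucleotides.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """CGR coordinates: iterated midpoints towards each symbol's corner."""
    pts = np.empty((len(seq), 2))
    cur = np.array(start, dtype=float)
    for i, symbol in enumerate(seq):
        cur = 0.5 * (cur + np.array(CORNERS[symbol]))
        pts[i] = cur
    return pts


def local_density_profile(seq, sigmas):
    """Hypothetical position-wise profile: row k holds the Gaussian Parzen
    density at each position's CGR point, using kernel width sigmas[k]."""
    pts = cgr_points(seq)
    diff = pts[:, None, :] - pts[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    profile = np.empty((len(sigmas), len(seq)))
    for k, sigma in enumerate(sigmas):
        kernel = np.exp(-sq_dist / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
        # Average kernel value over all points = density estimate at each point.
        profile[k] = kernel.mean(axis=1)
    return profile
```

In this sketch, nearby CGR points share a recent suffix, so a high value at a position suggests that its preceding context recurs often in the sequence. Smaller kernel widths correspond to longer shared contexts, which is one way to read the scale information mentioned above.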

Keywords: Rényi entropy, DNA, information theory, CGR/USM, Parzen’s method.

Reference:
[1] Vinga, S. and Almeida, J. S. (2004) J Theor Biol, 231(3):377-388.