Hotel Estoril Eden, Monte Estoril,
Natalie P. Thorne1,2, Jessica C. Pole3,
Paul A.W. Edwards3
and Simon Tavaré1,2
In particular, we describe how segmentation can be improved significantly by incorporating biological information such as clone quality, clone length, the distance between clones or (in a tiling path setting) the overlap between clones. The model we propose extends the Hidden Markov Model approach of Fridlyand et al. by allowing the underlying Markov chain to be heterogeneous rather than homogeneous, so that its transition probabilities can vary from clone to clone. This enables large amounts of additional information to be included in the model with only a small increase in the number of parameters. Additionally, our model allows an analysis to be carried out on a whole-genome basis rather than chromosome by chromosome, as is common at present.
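The idea of a heterogeneous chain can be illustrated with a minimal sketch in which the probability of changing copy-number state between adjacent clones depends on the distance between them. This is not the model described in this abstract: the exponential form of `transition_matrix`, the `rate` parameter, the fixed Gaussian emissions and all function names are assumptions chosen for illustration only.

```python
import numpy as np


def transition_matrix(distance, n_states, rate):
    """Hypothetical distance-dependent transition matrix (an assumption).

    With probability exp(-rate * distance) the state is kept; otherwise the
    next state is drawn uniformly from all states (including the current one).
    """
    stay = np.exp(-rate * distance)
    uniform = np.full((n_states, n_states), 1.0 / n_states)
    return stay * np.eye(n_states) + (1.0 - stay) * uniform


def viterbi(log_ratios, distances, means, sd, rate):
    """Most likely state path for a heterogeneous HMM with Gaussian emissions.

    distances[t - 1] is the gap between clone t - 1 and clone t, so the
    transition matrix is rebuilt at every step instead of being shared.
    """
    log_ratios = np.asarray(log_ratios, dtype=float)
    means = np.asarray(means, dtype=float)
    n, k = len(log_ratios), len(means)

    # Gaussian log-density up to an additive constant (same sd for all states).
    log_emit = -0.5 * ((log_ratios[:, None] - means[None, :]) / sd) ** 2

    score = np.log(1.0 / k) + log_emit[0]   # uniform initial distribution
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        with np.errstate(divide="ignore"):  # log(0) = -inf when distance is 0
            log_trans = np.log(transition_matrix(distances[t - 1], k, rate))
        cand = score[:, None] + log_trans   # cand[i, j]: move from i to j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(k)] + log_emit[t]

    # Trace the best path backwards from the best final state.
    path = np.zeros(n, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


if __name__ == "__main__":
    means = [-1.0, 0.0, 1.0]                # illustrative loss / normal / gain
    obs = [0.0, 0.0, 1.0, 0.0, 0.0]         # one outlying clone
    near = viterbi(obs, [0.001] * 4, means, sd=0.2, rate=0.1)
    far = viterbi(obs, [100.0] * 4, means, sd=0.2, rate=0.1)
    print("closely spaced clones:", near)
    print("widely spaced clones: ", far)
```

Under these assumptions, a single outlying clone is smoothed away when its neighbours are very close (a state change there is improbable), but is called as a separate segment when the clones are far apart. Only one extra parameter (`rate`) is needed to bring the inter-clone distances into the chain.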
To assess our model, we used a simulated dataset to compare its
performance with that of a number of commonly used algorithms.
Additionally, we examined its performance in
segmenting data obtained from an arrayCGH experiment involving about 50
tumour (mainly breast) cell lines (Pole et al, ). The arrays used (Huang
et al, ) had low-resolution coverage for the majority of the genome,
1.5 Mb coverage for chromosome 8, and an additional tiled region in 8p12.
Breast cancer cell lines frequently have highly rearranged and complex
genomes, with breaks commonly occurring on 8p12. For several of these cell
lines, detailed FISH analysis of copy number has been used to determine the
number of copies in the region of interest. These data provide an excellent
framework for illustrating the efficacy of our model.