Hotel Estoril Eden, Monte Estoril,
5-8 October 2005




Statistical Problems Arising in Physical Mapping

Sophie Schbath
Institut National de la Recherche Agronomique, France

A physical mapping project usually aims to produce a physical map of each chromosome of an organism. Such a physical map is a set of ordered, overlapping genomic fragments (clones) spanning the entire chromosome. A genomic library of clones, assumed to represent the chromosome, is first constructed. Clones are then chosen at random and overlaps between them are inferred. Overlapping clones are linked into islands, also called contigs. Several approaches can be used to infer clone overlaps; we will review the main ones. For instance, clones or clone ends can be characterized by a fingerprint (sequence or restriction fragment information); fingerprints are then compared, and two clones are declared to overlap if their fingerprints are sufficiently similar. Alternatively, one can use short unique genomic sequences (anchors) to anchor the clones: two clones containing a common anchor overlap.
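The linking of overlapping clones into islands can be illustrated with a minimal sketch. It assumes each clone is already placed as a (start, end) interval on the chromosome and that an overlap is detected only when two clones share at least `min_overlap` bases. The function name `build_islands`, the interval representation, and the example coordinates are hypothetical and chosen for illustration; in a real project the overlaps would be inferred from fingerprints or anchors rather than from known positions.

```python
def build_islands(clones, min_overlap=1):
    """Group clones, given as (start, end) intervals, into islands.

    Hypothetical helper: a clone joins the current island when it shares
    at least `min_overlap` bases with some clone already in that island.
    """
    islands = []
    for start, end in sorted(clones):
        if islands:
            last = islands[-1]
            # Clones are sorted by start, so the largest overlap with any
            # clone in the island is measured against the island's right end.
            overlap = min(end, last["end"]) - start
            if overlap >= min_overlap:
                last["end"] = max(last["end"], end)
                last["clones"].append((start, end))
                continue
        # No detectable overlap: this clone starts a new island.
        islands.append({"start": start, "end": end, "clones": [(start, end)]})
    return islands


# Illustrative coordinates only (not data from the talk).
example_clones = [(0, 100), (60, 160), (150, 250), (400, 500), (480, 580)]
strict = build_islands(example_clones, min_overlap=20)
lenient = build_islands(example_clones, min_overlap=1)
for island in strict:
    print(island["start"], island["end"], len(island["clones"]))
```

Raising `min_overlap` mimics a stricter similarity threshold: overlaps that are too short go undetected, so the same clones split into more, shorter islands.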

Whatever technique is used to infer the clone overlaps, the statistical questions are generally the same and concern predicting the progress of the physical mapping project: how many islands are obtained, what is their mean length, how many clones does each island contain, and what proportion of the chromosome is covered by the islands? One is interested in how the physical map evolves as a function of the number of clones studied, their length, and parameters related to the mapping strategy.
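These quantities can be explored numerically with a small Monte Carlo sketch. The abstract does not specify a model, so the one below is an assumption: fixed-length clones whose left ends fall uniformly at random on the chromosome, with an overlap detected only when it covers at least a fraction `theta` of the clone length. The simulated mean number of islands is compared with the Lander-Waterman approximation N exp(-c(1 - theta)), where c = NL/G is the redundancy of coverage. All names and parameter values are hypothetical.

```python
import math
import random


def simulate_map(n_clones, clone_len, genome_len, theta, rng):
    """Simulate one mapping project under the assumed uniform-placement model."""
    starts = sorted(rng.uniform(0, genome_len - clone_len) for _ in range(n_clones))
    min_overlap = theta * clone_len
    islands = []          # each island is [start, end, number_of_clones]
    covered = 0.0         # length of the union of all clones
    reach = 0.0           # rightmost position covered so far
    for s in starts:
        e = s + clone_len
        # Link to the current island only if the overlap is detectable.
        if islands and islands[-1][1] - s >= min_overlap:
            islands[-1][1] = e
            islands[-1][2] += 1
        else:
            islands.append([s, e, 1])
        covered += max(0.0, e - max(s, reach))
        reach = max(reach, e)
    n_islands = len(islands)
    return {
        "n_islands": n_islands,
        "mean_island_len": sum(i[1] - i[0] for i in islands) / n_islands,
        "mean_clones_per_island": n_clones / n_islands,
        "coverage": covered / genome_len,
    }


def average_stats(n_rep, n_clones, clone_len, genome_len, theta, seed=0):
    """Average each statistic over n_rep independent simulated projects."""
    rng = random.Random(seed)
    runs = [simulate_map(n_clones, clone_len, genome_len, theta, rng)
            for _ in range(n_rep)]
    return {key: sum(r[key] for r in runs) / n_rep for key in runs[0]}


# Illustrative parameters only: 200 clones of 10 kb on a 1 Mb chromosome.
N, L, G, THETA = 200, 10_000, 1_000_000, 0.2
stats = average_stats(n_rep=200, n_clones=N, clone_len=L, genome_len=G, theta=THETA)
c = N * L / G
expected_islands = N * math.exp(-c * (1 - THETA))
print("simulated:", stats)
print("Lander-Waterman expected islands:", round(expected_islands, 1))
```

Varying `N`, `L`, or `THETA` in this sketch shows how the map evolves with the number of clones, their length, and the detection threshold. Small discrepancies with the formula are expected because the simulation uses a finite linear chromosome.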

In this talk, we will present classical probabilistic models and the associated tools used to answer the above questions. Most of the results concern expected values, but we will also mention recent work on variances and asymptotic distributions.