Hotel Estoril Eden, Monte Estoril,
5-8 October 2005




Challenges in Data Analysis and Statistical Validation

Ruedi Aebersold
Institute for Systems Biology, Seattle, USA

The objective of proteomics is the systematic analysis of the proteins expressed by a cell, tissue or organism. Such analyses are expected to define comprehensive molecular signatures of tissues, cells and body fluids in health and disease. These signatures will impact a wide range of biological and clinical research questions, such as the systematic study of biological processes and the discovery of molecular clinical markers for detection, diagnosis and assessment of treatment outcome. Proteomics technology has proven particularly beneficial where differences between the proteomes (or fractions thereof) isolated from cells in different states are analyzed, and an array of methods for high-throughput quantitative proteomics has been developed. These methods have in common that they generate large amounts of raw data that must be analyzed, validated and interpreted in terms of biological or clinical knowledge.

In this presentation, we will discuss a suite of computational tools for the robust and consistent analysis of proteomic data generated by LC-MS/MS experiments. Specific tools in the suite include mzXML, an open source, transparent format for representing MS and MS/MS data (1); PeptideProphet, a tool that uses statistical principles to compute the probability that the assignment of a peptide sequence to an MS/MS spectrum is correct (2); ProteinProphet, a tool that uses statistical analyses to compute the set of proteins that have been identified from a set of peptides (3); ASAPRatio, a tool that calculates protein abundance ratios based on stable isotope signal ratios (4); and PeptideAtlas, a tool that projects the identified peptides onto the human genome sequence (5). Collectively, these tools identify and quantify proteins in proteomics experiments with known error rates and facilitate the display of data in a genomic format.
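To illustrate what "known error rates" can mean in practice, the following minimal sketch shows how per-assignment probabilities of the kind described above could be turned into an estimated error rate and a protein-level probability. It is not the implementation used by PeptideProphet or ProteinProphet: the function names are hypothetical, the input probabilities are made up, and the protein-level calculation assumes that peptide assignments are independent.

```python
def estimated_error_rate(probabilities, threshold):
    """Expected fraction of incorrect assignments among those accepted.

    `probabilities` holds, for each spectrum, the probability that its
    peptide assignment is correct; assignments at or above `threshold`
    are accepted. (Hypothetical helper, for illustration only.)
    """
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0
    # Each accepted assignment is wrong with probability (1 - p).
    expected_incorrect = sum(1.0 - p for p in accepted)
    return expected_incorrect / len(accepted)


def protein_probability(peptide_probabilities):
    """Probability that at least one peptide assignment to a protein is correct.

    Assumes the assignments are independent, which is a simplification.
    """
    prob_all_wrong = 1.0
    for p in peptide_probabilities:
        prob_all_wrong *= 1.0 - p
    return 1.0 - prob_all_wrong


if __name__ == "__main__":
    # Made-up probabilities, not experimental data.
    scores = [0.99, 0.95, 0.90, 0.40]
    print(estimated_error_rate(scores, threshold=0.9))
    print(protein_probability([0.5, 0.5]))
```

Raising the threshold lowers the estimated error rate at the cost of accepting fewer assignments, which is the trade-off that a probability-based filter makes explicit.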

1. Pedrioli PGA, Eng J, Hubley R, Pratt B, Nilsson E, Aebersold R. A standard open representation of mass spectrometry data and its application in a proteomics research environment. (submitted)
2. Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem., 2002;74:5383-92.
3. Nesvizhskii AI, Keller A, Kolker E, Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem., 2003;75:4646-58.
4. Li X-J, Zhang H, Ranish JR, Aebersold R. Automated statistical analysis of protein abundance ratios from data generated by stable isotope dilution and tandem mass spectrometry. Anal. Chem., 2003;75:6648-57.
5. Desiere F, Deutsch EW, Nesvizhskii AI, Mallick P, King N, Eng JK, Aderem A, Brunner E, Donohoe S, Fausto N, Hafen E, Hood L, Katze MG, Kennedy K, Kregenow F, Lee H, Lin B, Ranish J, Rawlings DJ, Watts J, Wollscheid B, Wright ME, Yan W, Yi E, Zhang H, Aebersold R. Annotation of the human genome with peptide sequences obtained by high-throughput mass spectrometry. (submitted 2004)