In-Silico Analysis of Proteins

Celebrating the 20th anniversary of Swiss-Prot

July 30 - August 04, 2006 : Fortaleza, Brazil

Poster #RP219

Evaluation of the available sequence databases for the mass-spectrometry based identification of Arabidopsis thaliana proteins and introduction of a new non-redundant compiled database.

Kris Laukens*, Filip Lemière*, Eric Bonnet**, Eddy Esmans*, Harry Van Onckelen*, Geert De Jaeger**, Erwin Witters*

*University of Antwerp, Antwerp, Belgium; **University of Ghent, Ghent, Belgium

Mass spectrometry-based protein identification largely relies on comparing spectrum peak lists with masses generated in silico by fragmenting the entries of protein sequence databases. As the first fully sequenced plant species, Arabidopsis thaliana is currently the system of choice for many plant proteome studies. Several protein sequence databases are available for this species, and the choice of database strongly affects MS and tandem MS-based identification rates.
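As an illustration of how in silico masses can be derived from a sequence database entry, the following minimal sketch digests a protein with a simplified trypsin rule and computes monoisotopic peptide masses. The function names, the cleavage rule (after K or R, not before P, no missed cleavages) and the example sequence are assumptions made for illustration; they are not taken from the poster and do not reproduce any particular search engine.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Split a sequence after K or R, except when the next residue is P.

    Simplified rule (assumption): no missed cleavages are generated.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide (sum of residues plus water)."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


if __name__ == "__main__":
    # Hypothetical sequence, used only to show the call pattern.
    for pep in tryptic_digest("AKRPGK"):
        print(pep, round(peptide_mass(pep), 4))
```

A search engine would compare such theoretical masses, within a mass tolerance, against the peaks observed in each spectrum; the larger and more redundant the database, the more candidate peptides compete for each peak.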

In this presentation, mass spectrometry-based protein identification success rates are compared between different databases. Using a real dataset resulting from a high-throughput tandem affinity purification project, consisting of MS and MSMS spectra from over 1600 gel-separated and digested proteins, the hit rates of the different databases are evaluated. False positive hit rates are determined using parallel queries against randomized 'decoy' sequence databases.
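The decoy approach described above can be sketched as follows. This is a minimal illustration under stated assumptions: decoy sequences are produced by shuffling the residues of each target sequence, and the false positive rate is estimated as the ratio of decoy hits to target hits above a score threshold. The abstract does not specify the randomization method or the estimator used, and the function names and scores below are hypothetical.

```python
import random


def make_decoy(sequence, seed=0):
    """Return a shuffled copy of a sequence (same length and composition).

    Assumption: residue shuffling; reversed sequences are another common choice.
    """
    residues = list(sequence)
    random.Random(seed).shuffle(residues)
    return "".join(residues)


def estimate_false_positive_rate(target_scores, decoy_scores, threshold):
    """Estimate the false positive rate as decoy hits / target hits.

    A 'hit' is any query whose best score reaches the threshold. Both score
    lists are assumed to come from the same spectra searched in parallel
    against the target and the decoy database.
    """
    target_hits = sum(score >= threshold for score in target_scores)
    decoy_hits = sum(score >= threshold for score in decoy_scores)
    if target_hits == 0:
        return 0.0
    return decoy_hits / target_hits


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    targets = [50, 60, 70, 20]
    decoys = [10, 55, 15, 5]
    print(estimate_false_positive_rate(targets, decoys, threshold=40))
```

Because the decoy database has the same size and amino acid composition as the target database, any match against it is a chance match, which gives an empirical estimate of how many target matches are likely to be spurious at the same threshold.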

Based on the results, a new non-redundant compiled database is introduced, which facilitates optimal spectrum-based protein identification and yields the best spectrum-protein matches possible. Though focussed on plant proteome research, the approaches described in this presentation can be extended towards other model organisms.
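One way to compile a non-redundant database from several sources is sketched below. The abstract does not state the redundancy criterion that was applied, so this sketch assumes the simplest one: entries with exactly identical sequences are merged, and all source accessions are retained. The function name, database names and accessions are hypothetical.

```python
def compile_non_redundant(databases):
    """Merge several sequence databases, collapsing identical sequences.

    databases: dict mapping a database name to a dict of accession -> sequence.
    Returns a dict mapping each unique sequence to the list of
    'database:accession' labels under which it was found.
    Assumption: redundancy means exact sequence identity after normalising
    case and removing a trailing stop symbol ('*').
    """
    merged = {}
    for db_name, entries in databases.items():
        for accession, sequence in entries.items():
            key = sequence.upper().rstrip("*")
            merged.setdefault(key, []).append(f"{db_name}:{accession}")
    return merged


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    sources = {
        "db_a": {"A1": "MKTAYIAK", "A2": "MGGSR"},
        "db_b": {"B1": "mktayiak*", "B2": "MPLLK"},
    }
    for sequence, labels in compile_non_redundant(sources).items():
        print(sequence, labels)
```

Keeping every source accession alongside each unique sequence means an identification made against the compiled database can still be traced back to the original databases.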