The Drosophila melanogaster PeptideAtlas facilitates the use of peptide data for improved fly proteomics and genome annotation.

Loevenich SN, Brunner E, King NL, Deutsch EW, Stein SE, FlyBase Consortium, Aebersold R, Hafen E - BMC Bioinformatics (2009)

Bottom Line: Crucial foundations of any quantitative systems biology experiment are correct genome and proteome annotations. PeptideAtlas is an open access database for the Drosophila community that has several features and applications that support (1) reduction of the complexity inherently associated with performing targeted proteomic studies, (2) designing and accelerating shotgun proteomics experiments, (3) confirming or questioning gene models, and (4) adjusting gene models such that they are in line with observed Drosophila peptides. While the database consists of proteomic data it is not required that the user is a proteomics expert.


Affiliation: Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland. loevenich@imsb.biol.ethz.ch

ABSTRACT

Background: Crucial foundations of any quantitative systems biology experiment are correct genome and proteome annotations. Protein databases compiled from high quality empirical protein identifications that are in turn based on correct gene models increase the correctness, sensitivity, and quantitative accuracy of systems biology genome-scale experiments.

Results: In this manuscript, we present the Drosophila melanogaster PeptideAtlas, a fly proteomics and genomics resource of unsurpassed depth. Based on peptide mass spectrometry data collected in our laboratory the portal http://www.drosophila-peptideatlas.org allows querying fly protein data observed with respect to gene model confirmation and splice site verification as well as for the identification of proteotypic peptides suited for targeted proteomics studies. Additionally, the database provides consensus mass spectra for observed peptides along with qualitative and quantitative information about the number of observations of a particular peptide and the sample(s) in which it was observed.

Conclusion: PeptideAtlas is an open access database for the Drosophila community that has several features and applications that support (1) reduction of the complexity inherently associated with performing targeted proteomic studies, (2) designing and accelerating shotgun proteomics experiments, (3) confirming or questioning gene models, and (4) adjusting gene models such that they are in line with observed Drosophila peptides. While the database consists of proteomic data it is not required that the user is a proteomics expert.


Figure 4: A peptide highlights a missing splice form. Part of the gene model of the Na pump alpha subunit (Atpα, CG5670) is depicted. Against the black background, different types of sequence data are displayed: several predictions (light pink, purple, and different shades of turquoise), conserved coding regions (bright yellow), cDNA alignments (greens), and peptides from the PeptideAtlas (bright pink). Against the light blue background, alternative splice forms annotated in release 5.12 are shown in dark blue. The PeptideAtlas peptide PAp00073066 was identified in a 6-frame search and maps within the Atpα gene region. Note that while prediction algorithms postulate an alternative exon in this region, there are no supporting cDNAs (or ESTs; not shown). The splice variant Atpα-PI, added in FlyBase annotation release 5.11, now accounts for the identified peptide sequence, NPEIDNLVNER. The codon for the last residue of the peptide spans the adjacent intron, thus supporting the annotated splice sites.
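The idea behind a 6-frame search can be illustrated with a minimal Python sketch. It is not the search engine used for the PeptideAtlas; it only translates a DNA string in all six reading frames with the standard genetic code and reports where a peptide string occurs. The function names (`six_frame_translations`, `find_peptide`) are hypothetical and introduced here for illustration. Because the scan works on continuous genomic sequence, it would miss any part of a peptide whose codons are interrupted by an intron.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map (strand, offset) to the protein string for that reading frame."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def find_peptide(dna, peptide):
    """List (strand, offset, residue_index) for frames containing the peptide."""
    return [
        (strand, offset, protein.find(peptide))
        for (strand, offset), protein in six_frame_translations(dna).items()
        if peptide in protein
    ]


if __name__ == "__main__":
    # Toy sequence, not real fly DNA: two padding bases, then ATG AAA CGT TAA.
    toy = "CC" + "ATGAAACGT" + "TAA"
    print(find_peptide(toy, "MKR"))
```

A real search would score spectra against the translated frames rather than match strings, so this sketch only shows how frames and strands are enumerated.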

Mentions: Aiming to find peptides not anticipated by the current genome annotation, we searched a subset of the PeptideAtlas data against a 6-frame translation of the genome. A set of 889 distinct peptides originally found in this genomic search did not agree with any gene model annotated in the reference database. These sequences may represent currently unannotated stretches of expressed sequence or novel splice variants. To understand the origin of these matches, we investigated their genomic context further. In a first step, we looked for peptides that could be explained by a newer release of FlyBase and found that 68 peptides were either explained by the newer annotation r5.2 or encoded by transposons. In a second step, to avoid ambiguous genomic placements, we excluded peptides that matched more than one genomic location. In a third step, we filtered out peptides that had been identified solely from mass spectra in which no fragment ions with an m/z value larger than the precursor mass were observed (such spectra likely represent singly charged peptides, which often yield poor spectra). Lastly, all spectra with a quality value < 1 as computed by the algorithm Qualscore [52] were removed from the final set. The remaining 250 peptides were subjected to detailed manual analysis in collaboration with the FlyBase curators. We found that 46 peptides point to conservation-based exon predictions. The peptides therefore make it easy to identify the gene models most likely to benefit from those predictions, because they direct the curators to predictions that can be confirmed with peptide data. An example case is shown in Figure 4. The peptide NPEIDNLVNER supports the addition of a novel isoform of the Na pump α subunit (Atpα, CG5670). In this case, several different prediction algorithms postulate a unique exon, but there are no cDNA or EST data to support such an alternative transcript. The peptide data confirm that the exon in question exists. Overall, new potential splice variants have been generated and will be included in a future FlyBase release. The remaining peptides partly contradict other evidence. They could represent small genes, novel exons, cases of alternative reading frames, or false positives, and they are subject to ongoing investigation. This shows that even in Drosophila melanogaster, a species likely to have one of the best annotated genomes amongst higher eukaryotes, the use of PeptideAtlas information improves the genome annotation.
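The four filtering steps described above can be sketched as a simple cascade. This is an illustrative Python sketch under stated assumptions, not the authors' pipeline: the `Spectrum` and `Peptide` records and all field names are hypothetical, the `quality` field merely stands in for a Qualscore-style value, fragment m/z is compared against the precursor m/z as a proxy for "precursor mass", and dropping a peptide once none of its spectra pass the quality cut-off is an assumption the text does not spell out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Spectrum:
    precursor_mz: float
    fragment_mzs: List[float]
    quality: float  # hypothetical stand-in for a Qualscore-style value


@dataclass
class Peptide:
    sequence: str
    genomic_locations: int     # number of places the peptide maps in the genome
    explained_elsewhere: bool  # newer annotation release or transposon-encoded
    spectra: List[Spectrum] = field(default_factory=list)


def has_fragment_above_precursor(spectrum):
    """True if at least one fragment ion lies above the precursor value."""
    return any(mz > spectrum.precursor_mz for mz in spectrum.fragment_mzs)


def filter_candidates(peptides, min_quality=1.0):
    """Apply the four filters in order and return the surviving peptides."""
    kept = []
    for pep in peptides:
        # Step 1: already explained by a newer annotation or by transposons.
        if pep.explained_elsewhere:
            continue
        # Step 2: ambiguous genomic placement.
        if pep.genomic_locations != 1:
            continue
        # Step 3: identified solely from spectra without high-m/z fragments.
        if not any(has_fragment_above_precursor(s) for s in pep.spectra):
            continue
        # Step 4: discard low-quality spectra (assumption: a peptide left
        # without any spectrum is dropped as well).
        good = [s for s in pep.spectra if s.quality >= min_quality]
        if not good:
            continue
        kept.append(
            Peptide(pep.sequence, pep.genomic_locations,
                    pep.explained_elsewhere, good)
        )
    return kept
```

Each step only removes candidates, so the order affects the intermediate counts but not the final set; the order used here follows the text.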

