OryzaPG-DB: rice proteome database based on shotgun proteogenomics.

Helmy M, Tomita M, Ishihama Y - BMC Plant Biol. (2011)

Bottom Line: Users can search, download or navigate the database per chromosome, gene, protein, cDNA or transcript and download the updated annotations in standard GFF3 format, with visualization in PNG format. In addition, the database scheme of OryzaPG was designed to be generic and can be reused to host similar proteogenomic information for other species. OryzaPG is the first proteogenomics-based database of the rice proteome, providing peptide-based expression profiles, together with the corresponding genomic origin, including the annotation of novelty for each peptide.


Affiliation: Institute for Advanced Biosciences, Keio University, 403-1 Daihoji, Tsuruoka, Yamagata 997-0017, Japan.

ABSTRACT

Background: Proteogenomics aims to utilize experimental proteome information for refinement of genome annotation. Since mass spectrometry-based shotgun proteomics approaches provide large-scale peptide sequencing data with high throughput, a data repository for shotgun proteogenomics would represent a valuable source of gene expression evidence at the translational level for genome re-annotation.

Description: Here, we present OryzaPG-DB, a rice proteome database based on shotgun proteogenomics, which incorporates the genomic features of experimental shotgun proteomics data. This version of the database was created from the results of 27 nanoLC-MS/MS runs on a hybrid ion trap-orbitrap mass spectrometer, which offers high accuracy for analyzing tryptic digests from undifferentiated cultured rice cells. Peptides were identified by searching the product ion spectra against the protein, cDNA, transcript and genome databases from Michigan State University, and were mapped to the rice genome. Approximately 3200 genes were covered by these peptides and 40 of them contained novel genomic features. Users can search, download or navigate the database per chromosome, gene, protein, cDNA or transcript and download the updated annotations in standard GFF3 format, with visualization in PNG format. In addition, the database scheme of OryzaPG was designed to be generic and can be reused to host similar proteogenomic information for other species. OryzaPG is the first proteogenomics-based database of the rice proteome, providing peptide-based expression profiles, together with the corresponding genomic origin, including the annotation of novelty for each peptide.

Conclusions: The OryzaPG database was constructed and is freely available at http://oryzapg.iab.keio.ac.jp/.

© Copyright Policy - open-access

Figure 3: Assessment and visualization of a peptide's genomic novelty. (A) The design and architecture of PGFeval. (B) The assessment algorithm used in evaluating the peptide's novelty in PGFeval. (C) Schematic illustration of peptide clusters. (D) Example of the graphical output of PGFeval.

Mentions: The genomic features can be visualized using tools such as the Generic Genome Browser (GBrowse) or the UCSC Genome Browser [34,35], but these tools cannot determine whether a peptide represents a novel genomic feature, or identify the type of novelty, e.g., intronic or exon-boundary spanning. We consider a peptide novel if it does not exist in the protein database. Therefore, all the peptides identified from the other three databases (cDNA, transcript and genome) are considered novel. However, this does not mean that such a peptide represents a novel genomic feature: the peptide may align to a known coding region yet be absent from the protein database because that database is incomplete [36,37]. Hence, an evaluation tool and algorithm are needed to assess the genomic novelty of each novel peptide. We therefore developed PGFeval (ProteoGenomic Features Evaluator), an evaluation and visualization tool written in Perl with the GD library (http://www.libgd.org), which evaluates the genomic novelty of each peptide and draws the whole gene model with graphical annotation that incorporates the genomic novelty of the peptides. PGFeval parses the updated annotation file in GFF3 format and uses the type, start and end of each feature to draw the gene and its structural elements, such as the UTRs and exons (Figure 3A). Next, it applies an assessment algorithm that assigns each peptide to one of four clusters: intronic, exon acceptor spanning, exon donor spanning and known (Figure 3B). The four clusters are illustrated in Figure 3C. In addition, PGFeval exports two CSV reports, a genes report and a peptides report, in a master-detail style. The genes report contains one entry per gene, summarizing the gene's features such as total peptides, number of novel peptides and novel genomic features, while the peptides report contains one entry per peptide, indicating its gene and assessment result, such as novelty, cluster and identification source. Thus, the two reports can be easily analyzed using any spreadsheet software or imported into any relational database as two tables with a one-to-many relationship. Figure 3D shows an example of the PGFeval graphical output.
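The clustering and reporting steps described above can be illustrated with a minimal Python sketch. This is not the PGFeval implementation (which is written in Perl and whose exact rules are given in Figure 3B, not reproduced here); it is a simplified illustration under stated assumptions. It assumes forward-strand genes, 1-based inclusive coordinates, and peptides mapped as a single contiguous genomic interval within the gene. A peptide that crosses an exon's 5' boundary is treated as exon acceptor spanning, and one that crosses an exon's 3' boundary as exon donor spanning. The function names, column names, gene ID and peptide IDs are all hypothetical.

```python
import csv
import io


def classify_peptide(pep_start, pep_end, exons):
    """Assign a peptide to one of the four clusters.

    Assumptions (not from the paper): forward strand, 1-based inclusive
    coordinates, one contiguous genomic interval per peptide.
    """
    for ex_start, ex_end in exons:
        if pep_end < ex_start or pep_start > ex_end:
            continue  # no overlap with this exon
        if ex_start <= pep_start and pep_end <= ex_end:
            return "known"  # fully inside an annotated exon
        if pep_start < ex_start:
            return "exon_acceptor_spanning"  # crosses the exon's 5' boundary
        return "exon_donor_spanning"  # crosses the exon's 3' boundary
    return "intronic"  # overlaps no annotated exon


def build_reports(genes, peptides):
    """Return (gene_rows, peptide_rows), linked one-to-many by gene ID."""
    peptide_rows = []
    for pep in peptides:
        cluster = classify_peptide(pep["start"], pep["end"], genes[pep["gene"]])
        peptide_rows.append({
            "gene": pep["gene"],
            "peptide": pep["id"],
            "source": pep["source"],
            # Novel peptide: not identified from the protein database.
            "novel_peptide": pep["source"] != "protein",
            "cluster": cluster,
        })
    gene_rows = []
    for gene in genes:
        rows = [r for r in peptide_rows if r["gene"] == gene]
        gene_rows.append({
            "gene": gene,
            "total_peptides": len(rows),
            "novel_peptides": sum(r["novel_peptide"] for r in rows),
            # Novel genomic feature: any cluster other than "known".
            "novel_features": sum(r["cluster"] != "known" for r in rows),
        })
    return gene_rows, peptide_rows


def to_csv(rows):
    """Serialize a list of dict rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical gene with two exons, and five hypothetical peptides.
GENES = {"gene_A": [(100, 200), (300, 400)]}
PEPTIDES = [
    {"id": "pep1", "gene": "gene_A", "start": 120, "end": 150, "source": "protein"},
    {"id": "pep2", "gene": "gene_A", "start": 190, "end": 220, "source": "genome"},
    {"id": "pep3", "gene": "gene_A", "start": 280, "end": 320, "source": "genome"},
    {"id": "pep4", "gene": "gene_A", "start": 230, "end": 260, "source": "genome"},
    {"id": "pep5", "gene": "gene_A", "start": 310, "end": 340, "source": "cDNA"},
]

gene_rows, peptide_rows = build_reports(GENES, PEPTIDES)
print(to_csv(gene_rows))
print(to_csv(peptide_rows))
```

In this toy data, `pep5` is a novel peptide (it comes from the cDNA search rather than the protein database) but falls inside an annotated exon, so it is clustered as known and does not count as a novel genomic feature. This mirrors the distinction the text draws between peptide novelty and genomic novelty. The two CSV outputs share the `gene` column, which would serve as the key for the one-to-many relationship when loaded into a relational database.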

