The bacterial proteogenomic pipeline.

Uszkoreit J, Plohnke N, Rexroth S, Marcus K, Eisenacher M - BMC Genomics (2014)

Bottom Line: After combination of the search results and optional flagging for different experimental conditions, the results can be browsed and further inspected. Intermediate and final results can be exported into GFF3 format for visualization in common genome browsers. The pipeline allows integrating peptide identifications from various algorithms and emphasizes the visualization of spectral counts from different experimental conditions.


ABSTRACT

Background: Proteogenomics combines cutting-edge methods from genomics and proteomics. While it has become cheap to sequence whole genomes, the correct annotation of protein-coding regions in the genome is still tedious and error-prone. Mass spectrometry, on the other hand, relies on good characterizations of proteins derived from the genome, but can also be used to help improve the annotation of genomes or to find species-specific peptides. Additionally, proteomics is widely used to find evidence for differential expression of proteins under different conditions, e.g. growth conditions for bacteria. The concept of proteogenomics is not altogether new: in-house scripts are used by different labs, and some special tools for eukaryotic and human analyses are available.

Results: The Bacterial Proteogenomic Pipeline, which is written entirely in Java, facilitates proteogenomic analyses of bacteria. From a given genome sequence, a naïve six-frame translation is performed and, if desired, a decoy database is generated. This database is used to identify MS/MS spectra with common peptide identification algorithms. After combination of the search results and optional flagging for different experimental conditions, the results can be browsed and further inspected. In particular, for each peptide the number of identifications for each condition and the positions in the corresponding protein sequences are shown. Intermediate and final results can be exported into GFF3 format for visualization in common genome browsers.
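
To make the first step concrete, the following is a minimal Java sketch of a naïve six-frame translation. It is an illustration under stated assumptions, not the pipeline's actual code: the class and method names are hypothetical, the standard codon assignments are used, and the decoy is produced by reversing each translated sequence, which is one common convention; the abstract does not say how the pipeline builds its decoys.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the Bacterial Proteogenomic Pipeline.
final class SixFrameSketch {
    // Codon table in TCAG order: index = 16*first + 4*second + third base.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Reverse complement of a DNA string; unknown symbols become 'N'.
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:  sb.append('N');
            }
        }
        return sb.toString();
    }

    // Translate one reading frame starting at the given 0-based offset.
    // Stop codons are written as '*', codons with unknown bases as 'X'.
    static String translate(String dna, int offset) {
        StringBuilder protein = new StringBuilder();
        for (int i = offset; i + 3 <= dna.length(); i += 3) {
            int b1 = BASES.indexOf(Character.toUpperCase(dna.charAt(i)));
            int b2 = BASES.indexOf(Character.toUpperCase(dna.charAt(i + 1)));
            int b3 = BASES.indexOf(Character.toUpperCase(dna.charAt(i + 2)));
            boolean unknown = b1 < 0 || b2 < 0 || b3 < 0;
            protein.append(unknown ? 'X' : AMINO_ACIDS.charAt(16 * b1 + 4 * b2 + b3));
        }
        return protein.toString();
    }

    // Three forward frames followed by three frames of the reverse complement.
    static List<String> sixFrames(String dna) {
        List<String> frames = new ArrayList<>();
        String rc = reverseComplement(dna);
        for (int offset = 0; offset < 3; offset++) {
            frames.add(translate(dna, offset));
        }
        for (int offset = 0; offset < 3; offset++) {
            frames.add(translate(rc, offset));
        }
        return frames;
    }

    // Assumed decoy strategy: reverse the amino acid sequence.
    static String decoy(String protein) {
        return new StringBuilder(protein).reverse().toString();
    }

    public static void main(String[] args) {
        List<String> frames = sixFrames("ATGGCCTAA");
        for (int i = 0; i < frames.size(); i++) {
            System.out.println("frame " + (i + 1) + ": " + frames.get(i)
                    + "  decoy: " + decoy(frames.get(i)));
        }
    }
}
```

In a real database each frame would typically be split at the stop codons into separate pseudo-protein entries before being written to FASTA; that step is omitted here for brevity.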

Conclusions: The Bacterial Proteogenomic Pipeline is a comprehensive set of tools that facilitates proteogenomic analyses. It runs on common desktop computers and is written in Java, and is thus platform independent. The pipeline allows integrating peptide identifications from various algorithms and emphasizes the visualization of spectral counts from different experimental conditions.
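
As a rough illustration of how an identified peptide and its per-condition spectral counts could be written as a GFF3 feature for a genome browser, consider the Java sketch below. Everything specific in it is an assumption made for illustration: the class and method names, the source column value "sketch", the feature type "polypeptide_region", and the "count_" attribute prefix are hypothetical and are not taken from the pipeline. Only the nine tab-separated columns, the 1-based inclusive coordinates, and the percent-escaping of reserved characters follow the GFF3 format itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of the Bacterial Proteogenomic Pipeline.
final class PeptideGffSketch {

    // Map a peptide inside a translated frame back to 1-based, inclusive
    // genome coordinates. frameOffset is 0, 1 or 2; aaIndex is the 0-based
    // position of the peptide's first residue within the translated frame.
    static int[] genomicRange(int frameOffset, boolean reverseStrand,
                              int aaIndex, int aaLength, int genomeLength) {
        int first = frameOffset + 3 * aaIndex;   // 0-based, on the translated strand
        int last = first + 3 * aaLength - 1;
        if (reverseStrand) {
            // Position p on the reverse complement is position genomeLength - p
            // (1-based) on the forward strand.
            return new int[] {genomeLength - last, genomeLength - first};
        }
        return new int[] {first + 1, last + 1};
    }

    // Percent-escape characters that have a reserved meaning in GFF3 attributes.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '%':  sb.append("%25"); break;
                case ';':  sb.append("%3B"); break;
                case '=':  sb.append("%3D"); break;
                case '&':  sb.append("%26"); break;
                case ',':  sb.append("%2C"); break;
                case '\t': sb.append("%09"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build one GFF3 feature line carrying the counts as custom attributes.
    static String toGff3Line(String seqId, int start, int end, char strand,
                             String peptide, Map<String, Integer> countsPerCondition) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("GFF3 needs 1 <= start <= end");
        }
        StringBuilder attributes = new StringBuilder("Name=").append(escape(peptide));
        for (Map.Entry<String, Integer> e : countsPerCondition.entrySet()) {
            attributes.append(";count_").append(escape(e.getKey()))
                      .append('=').append(e.getValue());
        }
        return String.join("\t", seqId, "sketch", "polypeptide_region",
                Integer.toString(start), Integer.toString(end),
                ".", String.valueOf(strand), ".", attributes.toString());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("light", 5);   // made-up example values
        counts.put("dark", 2);
        int[] range = genomicRange(0, false, 3, 5, 1000);
        System.out.println("##gff-version 3");
        System.out.println(toGff3Line("chromosome", range[0], range[1], '+', "MAGLK", counts));
    }
}
```

Storing the counts as attributes keeps one feature per peptide; a browser track that colors features by condition would need a different layout, for example one feature per peptide and condition.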


Figure 1: Screenshot of the Bacterial Proteogenomic Pipeline GUI. The GUI of the Bacterial Proteogenomic Pipeline leads the user through all steps required for a proteogenomic analysis. Shown is the final step, the analysis of the combined search results. After opening a file created in the "Combine Identifications" step, the identified peptide sequences are shown in a table with information about the sequence, the originating genomic sequence (usually the chromosome or a plasmid), corresponding protein accessions, whether or not the peptide occurs only in a pseudo protein, in an elongation of an annotated protein or is a standalone pseudo protein. Additionally the numbers of distinct identifications in all files and the (normalized) numbers of identifications per condition of the searched samples are given and represented in the bar charts in the lower half of the screen. For a selected peptide, the protein sequences containing the peptide are depicted, with the identified sequences highlighted in bold. The result table can be filtered and additional spectrum identification files can be added, for which the condition groups may be freely chosen.

Mentions: The Bacterial Proteogenomic Pipeline consists of several Java classes which allow a complete proteogenomics approach using MS/MS data, except for the peptide identification step, which is done by search engines. All parts of the pipeline can be run on any current desktop system compatible with Java. The source code is available under a three-clause BSD license and thus open source for everyone. Besides the command line execution, we provide a GUI which will guide the user in six steps through the analysis. The steps will be further explained in the following paragraphs. Figure 1 shows the GUI at the last analysis step (i.e. the listing and visualization of the identified peptides).

