Corra: Computational framework and tools for LC-MS discovery and targeted mass spectrometry-based proteomics.

Brusniak MY, Bodenmiller B, Campbell D, Cooke K, Eddes J, Garbutt A, Lau H, Letarte S, Mueller LN, Sharma V, Vitek O, Zhang N, Aebersold R, Watts JD - BMC Bioinformatics (2008)

Bottom Line: Existing computational tools for LC-MS-based quantitative proteomics are generally not comparable to each other in terms of functionality, user interfaces, or information input/output, and do not readily facilitate appropriate statistical data analysis. The Corra computational framework enables biologists and other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and not user-friendly suite of tools. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such addresses an unmet need in the LC-MS proteomics field.


Affiliation: Institute for Systems Biology, 1441 North 34th Street, Seattle, WA 98103, USA. mbrusnia@systemsbiology.org

ABSTRACT

Background: Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, these tools are generally not comparable to each other in terms of functionality, user interfaces, or information input/output, and they do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists and other researchers not trained in bioinformatics who wish to use LC-MS-based quantitative proteomics.

Results: We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, as well as statistical algorithms, originally developed for microarray data analyses, that are appropriate for LC-MS data analysis. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intensive data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.

Conclusion: The Corra computational framework enables biologists and other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and not user-friendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, which in turn inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such addresses an unmet need in the LC-MS proteomics field.


Figure 10: Corra-generated volcano plot of yeast phosphopeptide analyses. Phosphopeptides were isolated from two yeast strains, one wild type and the other an Ark1 protein kinase knockout, and analyzed in triplicate on a very high mass accuracy LC-MS platform, as described under Methods. The volcano plot displays 22,562 features that aligned across 3 or more LC-MS runs. The x-axis shows the observed log fold change in aligned feature mean intensities between the two yeast strains. The y-axis shows the B-statistic log Odds for non-random differential abundance obtained for each aligned feature. Red dots indicate features with a log Odds value of ≥ 2.2 (which corresponds to a 90% posterior probability of non-random differential abundance) that also used the 'n/a replace' capability in Corra (for missing values). Blue dots indicate features with a log Odds value of ≥ 2.2 that did not require use of the 'n/a replace' function. A log Odds value of 0 corresponds to a 50% probability of non-random differential abundance.
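
The correspondence between log Odds and posterior probability quoted in the legend can be checked with a short calculation. The sketch below assumes the log Odds are natural logarithms, an assumption that is consistent with the stated pairs (0 and 50%, 2.2 and 90%); the function names are illustrative and are not part of Corra.

```python
import math


def log_odds_to_probability(log_odds: float) -> float:
    """Convert natural-log odds to a probability: p = 1 / (1 + exp(-B))."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def probability_to_log_odds(p: float) -> float:
    """Inverse conversion: B = ln(p / (1 - p)). Requires 0 < p < 1."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Assumption: natural-log odds, as implied by the legend's 2.2 <-> 90% pairing.
    for b in (0.0, 2.2):
        print(f"log Odds {b:.1f} -> probability {log_odds_to_probability(b):.3f}")
    print(f"probability 0.90 -> log Odds {probability_to_log_odds(0.90):.2f}")
```

Under that assumption, a 90% posterior probability corresponds to odds of 9 to 1, and ln 9 is approximately 2.2, which matches the threshold used in the figure.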

Mentions: Figure 9 shows a clustering analysis for all 22,562 features that aligned across 3 or more runs, and demonstrates that, as expected, the aligned features distinguished very well between the two yeast strains. The excellent separation observed between the replicate analyses of each sample was clearly enhanced by the large, artificial ratios generated via use of the 'n/a replace' function. A volcano plot, shown in Figure 10, displays the log Odds distribution for differential abundance for the same 22,562 features aligned in 3 or more runs. Those with a log Odds value of ≥ 2.2 (i.e. a 90% or greater chance of non-random differential abundance), and for which the 'n/a replace' function was used, are colored red. The smaller number of blue-colored features represents those also with a log Odds value of ≥ 2.2, but for which 'n/a replace' was not required; these generally showed lower (i.e. not artificial) ratios of differential abundance than the red-colored features. In comparing Figures 8 and 10, we can also make a couple of general observations. In the yeast study, we observed much larger ratios, almost certainly due to the replacement of missing features. On the other hand, in the human diabetes study shown in Figure 8, we observed much larger log Odds values (i.e. increased confidence in differential abundance). This is almost certainly due to the much larger sample size (66 LC-MS analyses vs. 6 in the yeast study), which leads to better statistical confidence.
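
The coloring rule described above can be written out as a small sketch. This is an illustration only, not Corra's implementation: the `Feature` record, its field names, and the "unhighlighted" label for features below the threshold are all hypothetical. Only the 2.2 log Odds threshold and the red/blue distinction come from the text.

```python
from dataclasses import dataclass

# Threshold taken from the text: log Odds >= 2.2 (about 90% posterior probability).
LOG_ODDS_THRESHOLD = 2.2


@dataclass
class Feature:
    """Hypothetical record for one aligned LC-MS feature."""
    log_fold_change: float   # x-axis of the volcano plot
    log_odds: float          # y-axis of the volcano plot (B-statistic)
    used_na_replace: bool    # True if a missing value was replaced


def volcano_color(feature: Feature) -> str:
    """Return the highlight category for a feature, following the rule in the text."""
    if feature.log_odds < LOG_ODDS_THRESHOLD:
        return "unhighlighted"  # hypothetical label; the source gives no color here
    return "red" if feature.used_na_replace else "blue"


if __name__ == "__main__":
    # Made-up example values, chosen only to exercise each branch.
    examples = [
        Feature(log_fold_change=6.0, log_odds=3.1, used_na_replace=True),
        Feature(log_fold_change=1.2, log_odds=2.5, used_na_replace=False),
        Feature(log_fold_change=0.3, log_odds=0.4, used_na_replace=False),
    ]
    for f in examples:
        print(f, "->", volcano_color(f))
```

Keeping the 'n/a replace' flag as a separate field makes it easy to review the red features on their own, since their large ratios come from substituted values rather than measured intensities.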

