Corra: Computational framework and tools for LC-MS discovery and targeted mass spectrometry-based proteomics.

Brusniak MY, Bodenmiller B, Campbell D, Cooke K, Eddes J, Garbutt A, Lau H, Letarte S, Mueller LN, Sharma V, Vitek O, Zhang N, Aebersold R, Watts JD - BMC Bioinformatics (2008)

Bottom Line: Existing computational tools for LC-MS-based quantitative proteomics are generally not comparable to each other in terms of functionality, user interfaces, or information input/output, and do not readily facilitate appropriate statistical data analysis. The Corra computational framework leverages computational innovation to enable biologists or other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and not user-friendly suite of tools. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such, addresses an unmet need in the LC-MS proteomics field.

Affiliation: Institute for Systems Biology, 1441 North 34th Street, Seattle, WA 98103, USA. mbrusnia@systemsbiology.org

ABSTRACT

Background: Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, they are generally not comparable to each other in terms of functionality, user interfaces, information input/output, and do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists, and other researchers not trained in bioinformatics, who wish to use LC-MS-based quantitative proteomics.

Results: We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, together with statistical algorithms originally developed for microarray data analysis, so that they are appropriate for LC-MS data analysis. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.

Conclusion: The Corra computational framework leverages computational innovation to enable biologists or other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and not user-friendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such, addresses an unmet need in the LC-MS proteomics field.

Figure 1: Top elements of APML. In the presented XML schema graph notation, dotted rectangles represent optional elements and solid rectangles represent required elements. Complex types, which can be used as common element types, are defined by shaded boxes. Elements marked "+" have further subelements, and elements marked "-" have been expanded for display in the figure. Connector symbols indicate whether child elements form a sequence or a choice. A) The apml element has two child elements: the dataProcessing element stores software information, and the data element stores either a feature list as a peak_lists element or an aligned feature list as an alignment element. The cluster_profile element is an optional element for a list of clustered feature references in any time course or dilution series experiment. B) The peak_lists element can have one to many peak_list elements, each of which stores the detected features of a single LC-MS run. C) The alignment element stores all LC-MS file information in feature_source_list, and aligned features are stored in the aligned_features element.

Mentions: The apml element has two child elements, the dataProcessing and data elements (Figure 1A). The dataProcessing element only stores software information in the SoftwareType, while the data element, and all its sub-elements, stores all data information and potential annotations. The primary elements for data storage are the peak_lists and alignment elements. However, there is also an optional cluster_profile element, which, for example, can be used to capture a list of clustered feature references, such as would be found in a time course or dilution series experiment, when needed. Picked feature lists are stored in peak_lists, and can have one to many peak_list elements, where each stores the detected features of a single LC-MS run (Figure 1B). The alignment element stores all the LC-MS file information in the feature_source_list element, and stores aligned features in the aligned_features element (Figure 1C).
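
The element hierarchy described above can be illustrated with a small sketch. It is not the APML schema itself: the element names apml, dataProcessing, data, peak_lists, peak_list and alignment come from the text, while the software and feature child elements, all attribute names (name, source, mz, rt, intensity) and the example values are hypothetical placeholders chosen for illustration. The sketch builds a peak-list document with Python's standard library and then reads it back.

```python
import xml.etree.ElementTree as ET

# Hypothetical example data: run name -> list of (m/z, retention time, intensity).
EXAMPLE_RUNS = {
    "run_A": [(501.27, 1203.4, 8.1e5), (622.81, 1544.0, 2.3e5)],
    "run_B": [(501.28, 1201.9, 7.7e5)],
}


def build_apml(runs):
    """Build an APML-like tree: apml -> (dataProcessing, data -> peak_lists)."""
    apml = ET.Element("apml")

    # dataProcessing holds software information only.
    # The "software" child and its "name" attribute are assumptions.
    processing = ET.SubElement(apml, "dataProcessing")
    ET.SubElement(processing, "software", {"name": "example-feature-picker"})

    # data holds the actual content; here we use the peak_lists branch.
    data = ET.SubElement(apml, "data")
    peak_lists = ET.SubElement(data, "peak_lists")

    # One peak_list per LC-MS run, each holding that run's detected features.
    for run_name, features in runs.items():
        peak_list = ET.SubElement(peak_lists, "peak_list", {"source": run_name})
        for mz, rt, intensity in features:
            # "feature" and its attributes are hypothetical names.
            ET.SubElement(
                peak_list,
                "feature",
                {"mz": str(mz), "rt": str(rt), "intensity": str(intensity)},
            )
    return apml


def storage_kind(apml):
    """Return which primary storage element (peak_lists or alignment) is present."""
    data = apml.find("data")
    present = [tag for tag in ("peak_lists", "alignment") if data.find(tag) is not None]
    if len(present) != 1:
        raise ValueError("expected exactly one of peak_lists or alignment under data")
    return present[0]


def features_per_run(apml):
    """Count the features stored in each peak_list, keyed by its source run."""
    return {
        peak_list.get("source"): len(peak_list.findall("feature"))
        for peak_list in apml.iterfind("data/peak_lists/peak_list")
    }


if __name__ == "__main__":
    document = build_apml(EXAMPLE_RUNS)
    print(ET.tostring(document, encoding="unicode"))
    print(storage_kind(document), features_per_run(document))
```

An aligned data set would take the other branch of the choice, with an alignment element containing feature_source_list and aligned_features in place of peak_lists. The storage_kind check reflects the figure's description of that choice and is an interpretation of the caption, not a substitute for validating against the published schema.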

