Candidate prioritization for low-abundant differentially expressed proteins in 2D-DIGE datasets.

Nandal UK, Vlietstra WJ, Byrman C, Jeeninga RE, Ringrose JH, van Kampen AH, Speijer D, Moerland PD - BMC Bioinformatics (2015)

Bottom Line: However, identification is often not possible for low-abundant proteins. Our approach exploits noisy information on the isoelectric point and apparent molecular mass of a protein spot in combination with functional similarities of candidate proteins to already identified proteins to select and rank candidates. Using leave-one-out cross-validation, we show that the true-positive rate for the top-5 ranked proteins is 43.8%.


Affiliation: Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE Amsterdam, The Netherlands. u.k.nandal@amc.uva.nl.

ABSTRACT

Background: Two-dimensional differential gel electrophoresis (2D-DIGE) is a powerful technique for separating proteins by isoelectric point and apparent molecular mass and for quantifying changes in protein expression. Abundantly available proteins in spots can be identified using mass spectrometry-based approaches. However, identification is often not possible for low-abundant proteins.

Results: We present a novel computational approach to prioritize candidate proteins for unidentified spots. Our approach exploits noisy information on the isoelectric point and apparent molecular mass of a protein spot in combination with functional similarities of candidate proteins to already identified proteins to select and rank candidates. We evaluated our method on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1 infected T-cells. Using leave-one-out cross-validation, we show that the true-positive rate for the top-5 ranked proteins is 43.8%.

Conclusions: Our approach shows good performance on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1 infected T-cells. We expect our method to be highly useful in (re-)mining other 2D-DIGE experiments in which especially the low-abundant protein spots remain to be identified.
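
The top-5 figure reported in the Results can be illustrated with a minimal sketch. It assumes that the "true-positive rate" is the fraction of held-out identified spots whose known protein appears among the first k ranked candidates; that reading, the function name `top_k_rate`, and the toy spot and protein labels are all assumptions made for illustration, not the authors' code or data.

```python
from typing import Dict, Sequence


def top_k_rate(ranked: Dict[str, Sequence[str]],
               truth: Dict[str, str],
               k: int = 5) -> float:
    """Fraction of held-out spots whose true protein is in the top k.

    ranked: for each held-out spot, the candidate list produced when that
            spot was left out of the training data (best candidate first).
    truth:  the protein actually identified for each held-out spot.
    """
    if not truth:
        raise ValueError("no spots to evaluate")
    hits = sum(
        1
        for spot, protein in truth.items()
        if protein in list(ranked.get(spot, []))[:k]
    )
    return hits / len(truth)


# Hypothetical toy data: four identified spots, each left out in turn.
ranked = {
    "spot1": ["P1", "P2", "P3"],
    "spot2": ["P9", "P8", "P7", "P6", "P5", "P4"],
    "spot3": [],                      # no candidate fell inside the window
    "spot4": ["P2"],
}
truth = {"spot1": "P2", "spot2": "P4", "spot3": "P1", "spot4": "P2"}

print(f"top-5 rate: {top_k_rate(ranked, truth, k=5):.1%}")
```

In this toy case two of the four held-out spots have their true protein in the top 5. A spot with an empty candidate list counts as a miss here, which is one possible convention among several.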


Figure 1: Prioritization of candidate proteins based on pI and Mw. Step 1: pI and Mw (Da) of the mature forms of the proteins identified by PMF are determined using the ExPASy tool “Compute pI/Mw” [13]. Step 2: The (x,y) coordinates of the identified spots and their corresponding pI and Mw (on log10-scale) are used as training data for fitting two cubic smoothing splines. Step 3: For an unidentified test spot u, a candidate list of proteins is generated using the ExPASy tool TagIdent [14] by specifying ranges Δ and δ(%) around the pI and Mw predicted by the smoothing splines, respectively. Step 4: Proteins in the candidate list are ranked by calculating their similarities with the PMF-identified ‘seed’ proteins using STRING association scores. Step 5 (optional): The ranked candidate list can be further filtered using presence (black) and absence (white) calls from the Gene Expression Barcode 3.0 [15]. A protein is excluded from the ranked list if the corresponding gene is expressed on none of the selected microarrays.
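
Step 3 can be sketched as a simple window computation. The example assumes that Δ is an absolute, symmetric tolerance in pI units, that δ is a symmetric percentage of the predicted Mw, and that the spline predicts Mw on the log10 scale so it has to be converted back to Da. The local list filter is only a stand-in for the TagIdent web query, and every name and value below (`Protein`, `search_window`, `candidates_in_window`, the toy proteins) is hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Protein(NamedTuple):
    accession: str   # hypothetical identifier
    pi: float        # theoretical isoelectric point
    mw: float        # theoretical molecular mass in Da


Range = Tuple[float, float]


def search_window(pred_pi: float, pred_log10_mw: float,
                  delta_pi: float, delta_mw_pct: float) -> Tuple[Range, Range]:
    """Turn spline predictions into (pI range, Mw range in Da)."""
    mw = 10 ** pred_log10_mw          # back-transform from the log10 scale
    frac = delta_mw_pct / 100.0
    pi_range = (pred_pi - delta_pi, pred_pi + delta_pi)
    mw_range = (mw * (1 - frac), mw * (1 + frac))
    return pi_range, mw_range


def candidates_in_window(proteins: Iterable[Protein],
                         pi_range: Range, mw_range: Range) -> List[Protein]:
    """Keep proteins whose pI and Mw both fall inside the window."""
    return [
        p for p in proteins
        if pi_range[0] <= p.pi <= pi_range[1]
        and mw_range[0] <= p.mw <= mw_range[1]
    ]


# Hypothetical predictions for an unidentified spot u and a toy proteome.
pi_range, mw_range = search_window(pred_pi=6.0, pred_log10_mw=4.5,
                                   delta_pi=0.5, delta_mw_pct=20.0)
proteome = [
    Protein("A", 5.8, 30000.0),   # inside both ranges
    Protein("B", 6.8, 31000.0),   # pI too high
    Protein("C", 6.1, 45000.0),   # Mw too high
    Protein("D", 5.5, 37000.0),   # on the pI boundary, still inside
]
candidate_list = candidates_in_window(proteome, pi_range, mw_range)
print([p.accession for p in candidate_list])
```

Wider Δ and δ make it more likely that the true protein is in the candidate list but lengthen the list that Step 4 has to rank, so the two tolerances trade recall against ranking difficulty.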


Mentions: The objective of our method is to identify the most likely protein candidates for low-abundant differentially expressed spots. Our method uses proteins identified by PMF and their (x,y) coordinates on the gel to prioritize candidate proteins for unidentified spots. In this section we present the different steps of our prioritization approach (Figure 1).
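
The ranking and optional filtering steps of the approach (Steps 4 and 5 in Figure 1) can be sketched as follows. The text here does not say how association scores to several seed proteins are combined, so this example simply sums them; that aggregation, the integer score scale, the rule of keeping candidates with no barcode entry, and all names and values are assumptions for illustration only.

```python
from typing import Dict, FrozenSet, List, Mapping, Optional, Sequence, Tuple


def rank_candidates(
    candidates: Sequence[str],
    seeds: Sequence[str],
    scores: Mapping[FrozenSet[str], int],
    barcode: Optional[Mapping[str, Sequence[int]]] = None,
) -> List[Tuple[str, int]]:
    """Rank candidates by summed association score to the seed proteins.

    scores:  association score per unordered protein pair; missing pairs
             count as 0 (assumed convention).
    barcode: optional presence (1) / absence (0) calls per microarray.
             A candidate is dropped only if every call is 0; candidates
             without an entry are kept (assumed convention).
    """
    ranked = []
    for cand in candidates:
        if barcode is not None and cand in barcode and not any(barcode[cand]):
            continue  # expressed on none of the selected microarrays
        total = sum(scores.get(frozenset((cand, seed)), 0) for seed in seeds)
        ranked.append((cand, total))
    # Highest score first; ties broken by name for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical toy data.
seeds = ["S1", "S2"]
candidates = ["C1", "C2", "C3"]
scores: Dict[FrozenSet[str], int] = {
    frozenset(("C1", "S1")): 900,
    frozenset(("C1", "S2")): 400,
    frozenset(("C2", "S1")): 700,
    frozenset(("C3", "S1")): 950,
    frozenset(("C3", "S2")): 500,
}
barcode = {"C1": [1, 0, 1], "C3": [0, 0, 0]}   # C2 has no barcode entry

unfiltered = rank_candidates(candidates, seeds, scores)
filtered = rank_candidates(candidates, seeds, scores, barcode)
print(unfiltered)
print(filtered)
```

Taking the maximum or the mean over seeds instead of the sum would be equally easy to swap in; the sum favours candidates that are connected to many seed proteins.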

