Candidate prioritization for low-abundant differentially expressed proteins in 2D-DIGE datasets.

Nandal UK, Vlietstra WJ, Byrman C, Jeeninga RE, Ringrose JH, van Kampen AH, Speijer D, Moerland PD - BMC Bioinformatics (2015)

Bottom Line: However, identification is often not possible for low-abundant proteins. Our approach exploits noisy information on the isoelectric point and apparent molecular mass of a protein spot in combination with functional similarities of candidate proteins to already identified proteins to select and rank candidates. Using leave-one-out cross-validation, we show that the true-positive rate for the top-5 ranked proteins is 43.8%.


Affiliation: Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE Amsterdam, The Netherlands. u.k.nandal@amc.uva.nl.

ABSTRACT

Background: Two-dimensional differential gel electrophoresis (2D-DIGE) is a powerful technique for separating proteins by isoelectric point and apparent molecular mass and for quantifying changes in protein expression. Proteins present in sufficient amounts in a spot can be identified with mass spectrometry-based approaches. However, identification is often not possible for low-abundant proteins.

Results: We present a novel computational approach to prioritize candidate proteins for unidentified spots. Our approach exploits noisy information on the isoelectric point and apparent molecular mass of a protein spot, in combination with the functional similarity of candidate proteins to already identified proteins, to select and rank candidates. We evaluated our method on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1-infected T-cells. Using leave-one-out cross-validation, we show that the true-positive rate for the top-5 ranked proteins is 43.8%.

Conclusions: Our approach performs well on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1-infected T-cells. We expect our method to be highly useful in (re-)mining other 2D-DIGE experiments in which low-abundant protein spots in particular remain to be identified.
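The evaluation metric quoted in the abstract, the true-positive rate among the top-5 ranked candidates under leave-one-out cross-validation, can be computed with a loop like the one below. This is a minimal Python sketch, not the authors' code: the `Spot` and `Seed` types and the `rank` callback are hypothetical placeholders for whatever ranking pipeline is used. We assume each identified spot is held out in turn, ranked using only its coordinates and the remaining identified spots, and counted as a hit if its true protein appears among the first k candidates.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Spot:
    """An identified 2D-DIGE spot (hypothetical type for this sketch)."""
    spot_id: str
    pI: float  # isoelectric point estimated from the spot's x coordinate
    mw: float  # apparent molecular mass (Da) estimated from the y coordinate


@dataclass(frozen=True)
class Seed:
    """A spot together with the protein identified in it by mass spectrometry."""
    spot: Spot
    protein: str  # accession of the identified protein


# Hypothetical ranker: receives the spot to rank plus the remaining seeds and
# returns candidate accessions, best first. Stands in for the real pipeline.
Ranker = Callable[[Spot, List[Seed]], List[str]]


def loocv_top_k_tpr(seeds: Sequence[Seed], rank: Ranker, k: int = 5) -> float:
    """Fraction of held-out seeds whose true protein is ranked within the top k."""
    if not seeds:
        raise ValueError("need at least one seed")
    hits = 0
    for i, held_out in enumerate(seeds):
        # Leave this seed out: the ranker only sees the other identified spots
        # and the held-out spot's coordinates, never its identity.
        training = [s for j, s in enumerate(seeds) if j != i]
        ranking = rank(held_out.spot, training)
        if held_out.protein in ranking[:k]:
            hits += 1
    return hits / len(seeds)
```

Passing the held-out `Spot` rather than the `Seed` to the ranker is a deliberate choice in this sketch: it keeps the true identity out of the ranking step, which is the point of holding a seed out.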



Figure 2: Influence of the pI range (Δ) and Mw range (δ) specified for TagIdent. (A) Influence on the average number of proteins in the candidate list. (B) Influence on recall, that is, the fraction of seed proteins included in their own candidate list as returned by TagIdent. For each identified spot in the 2D-DIGE dataset and each combination of predefined pI and Mw ranges, a candidate list was generated following Steps 1–3 of our prioritization approach using leave-one-out cross-validation (LOOCV).

An important ingredient of our prioritization approach is the information that the (x,y) coordinates of an unidentified spot provide about the pI and Mw of the protein(s) that migrated there. However, this information is noisy, and observed and predicted pI and Mw values can differ considerably (Figure 1, Step 2). Such differences can, for example, be caused by undetected posttranslational modifications (PTMs), which change migration behaviour in both dimensions because PTMs can alter both the apparent molecular mass and the charge of a protein. In addition, in all SDS-PAGE separation techniques some hydrophobic proteins migrate anomalously because they bind extra SDS [22]. These and other factors can cause errors of more than 10% when SDS-PAGE is used to determine the Mw of a protein [23]. Our method takes the uncertainty of the predicted pI and Mw values into account: it generates a list of candidate proteins for an unidentified spot using TagIdent, specifying a pI range and an Mw range around the estimated pI and Mw values (Figure 1, Step 3). The size of the chosen pI and Mw ranges has a large influence on the performance of the prioritization method. If the ranges are too narrow, candidate lists become short and the probability that the correct protein is included is small (Figure 2). For the smallest pI range (Δ=0.04) and Mw range (δ=1%), the average number of proteins in a candidate list was 6, with a recall of 6%. If the ranges are too wide, candidate lists generally contain the correct protein but become very long. For the largest pI range (Δ=1) and Mw range (δ=30%), the average number of proteins in a candidate list was 2,626, and 96.2% of the seed proteins appeared in their own candidate list. However, long candidate lists are likely to push the correct protein to a low rank after prioritization.
With a more moderate choice of pI and Mw ranges, for example Δ=0.2 and δ=8%, 60.9% of the seed proteins were contained in their own candidate list, with an average candidate list length of 199 (Figure 2).
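The range filter and the two quantities plotted in Figure 2 (average candidate-list length and recall) can be illustrated with a small Python sketch. It does not call TagIdent, which is an ExPASy web tool; the `Protein` type and the `candidate_list` and `window_stats` functions are hypothetical. We assume the pI range Δ is an absolute ± window and the Mw range δ a relative ± window around the estimated values, so that, for example, δ = 8% around an estimated 50 kDa admits proteins from 46 to 54 kDa.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Protein:
    """A database entry with theoretical values (hypothetical type)."""
    accession: str
    pI: float  # theoretical isoelectric point
    mw: float  # theoretical molecular mass (Da)


def candidate_list(proteome: Sequence[Protein], est_pI: float, est_mw: float,
                   delta: float, mw_frac: float) -> List[str]:
    """Accessions with pI in est_pI ± delta and Mw in est_mw ± mw_frac * est_mw.

    Assumption: symmetric absolute pI window and symmetric relative Mw window,
    mimicking (not calling) the TagIdent range filter described in the text.
    """
    lo_pI, hi_pI = est_pI - delta, est_pI + delta
    lo_mw, hi_mw = est_mw * (1 - mw_frac), est_mw * (1 + mw_frac)
    return [p.accession for p in proteome
            if lo_pI <= p.pI <= hi_pI and lo_mw <= p.mw <= hi_mw]


def window_stats(proteome: Sequence[Protein],
                 seeds: Sequence[Tuple[str, float, float]],
                 delta: float, mw_frac: float) -> Tuple[float, float]:
    """Return (average candidate-list length, recall) over identified spots.

    Each seed is (true accession, estimated pI, estimated Mw in Da). Recall is
    the fraction of seeds whose own protein appears in their candidate list.
    """
    if not seeds:
        raise ValueError("need at least one seed")
    lengths: List[int] = []
    hits = 0
    for accession, est_pI, est_mw in seeds:
        cands = candidate_list(proteome, est_pI, est_mw, delta, mw_frac)
        lengths.append(len(cands))
        if accession in cands:
            hits += 1
    return sum(lengths) / len(lengths), hits / len(seeds)
```

Evaluating `window_stats` over a grid of `delta` and `mw_frac` values traces the trade-off described above: wider windows raise recall but lengthen the candidate lists. The specific figures reported in the text depend on the protein database and the spot estimates used and will not be reproduced by toy inputs.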


Candidate prioritization for low-abundant differentially expressed proteins in 2D-DIGE datasets.

Nandal UK, Vlietstra WJ, Byrman C, Jeeninga RE, Ringrose JH, van Kampen AH, Speijer D, Moerland PD - BMC Bioinformatics (2015)

Influence of pI range (Δ) and Mw range (δ) specified for TagIdent.(A) Influence on the average number of proteins in the candidate list. (B) Influence on recall, that is the fraction of seed proteins included in their own candidate list as returned by TagIdent. For each identified spot in the 2D-DIGE dataset and all combinations of predefined values for the pI and Mw range, a candidate list was generated following Steps 1–3 of our prioritization approach using LOOCV.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4384356&req=5

Fig2: Influence of pI range (Δ) and Mw range (δ) specified for TagIdent.(A) Influence on the average number of proteins in the candidate list. (B) Influence on recall, that is the fraction of seed proteins included in their own candidate list as returned by TagIdent. For each identified spot in the 2D-DIGE dataset and all combinations of predefined values for the pI and Mw range, a candidate list was generated following Steps 1–3 of our prioritization approach using LOOCV.
Mentions: An important ingredient of our prioritization approach is the information provided by the (x,y) coordinates of an unidentified spot on the pI and Mw of the protein(s) that migrated there. However, this information is noisy and can lead to considerable differences between observed and predicted pI and Mw values (Figure 1, Step 2). Such differences can, for example, be caused by undetected posttranslational modifications (PTMs) leading to changes in migration behaviour in both dimensions, as PTMs can alter both overall apparent molecular mass and charge. All SDS-PAGE separation techniques also have hydrophobic proteins showing anomalous migration due to extra SDS binding [22]. These factors and others can lead to errors of more than 10% when using SDS-PAGE to determine the Mw of a protein [23]. Our method takes the uncertainty of the predicted pI and Mw values into account and generates a list of candidate proteins for an unidentified spot using TagIdent by specifying the pI and Mw range around the estimated pI and Mw values (Figure 1, Step 3). The size of the chosen pI and Mw range is has a large influence on the performance of the prioritization method. When choosing ranges too narrow, candidate lists become short and the probability of the correct protein being included is small (Figure 2). For the smallest pI range (Δ=0.04) and Mw range (δ=1%), the average number of proteins in a candidate list was 6 with a recall of 6%. When choosing ranges too large, candidate lists in general contain the correct protein but become very long. For the largest pI range (Δ=1) and Mw range (δ=30%), the average number of proteins in a candidate list was 2,626 and 96.2% of the seed proteins appeared in their own candidate list. However, long candidate lists will likely lead to the correct protein being lowly ranked after prioritization. 
With a more moderate choice of pI and Mw range, for example Δ=0.2 and δ=8%, 60.9% of the seed proteins were contained in their own candidate list with an average candidate list length of 199.Figure 2

Bottom Line: However, identification is often not possible for low-abundant proteins.Our approach exploits noisy information on the isoelectric point and apparent molecular mass of a protein spot in combination with functional similarities of candidate proteins to already identified proteins to select and rank candidates.Using leave-one-out cross-validation, we show that the true-positive rate for the top-5 ranked proteins is 43.8%.

View Article: PubMed Central - PubMed

Affiliation: Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, DE Amsterdam, 1100, The Netherlands. u.k.nandal@amc.uva.nl.

ABSTRACT

Background: Two-dimensional differential gel electrophoresis (2D-DIGE) provides a powerful technique to separate proteins on their isoelectric point and apparent molecular mass and quantify changes in protein expression. Abundantly available proteins in spots can be identified using mass spectrometry-based approaches. However, identification is often not possible for low-abundant proteins.

Results: We present a novel computational approach to prioritize candidate proteins for unidentified spots. Our approach exploits noisy information on the isoelectric point and apparent molecular mass of a protein spot in combination with functional similarities of candidate proteins to already identified proteins to select and rank candidates. We evaluated our method on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1 infected T-cells. Using leave-one-out cross-validation, we show that the true-positive rate for the top-5 ranked proteins is 43.8%.

Conclusions: Our approach shows good performance on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1 infected T-cells. We expect our method to be highly useful in (re-)mining other 2D-DIGE experiments in which especially the low-abundant protein spots remain to be identified.

Show MeSH
Related in: MedlinePlus