Analysis of protein complexes through model-based biclustering of label-free quantitative AP-MS data.

Choi H, Kim S, Gingras AC, Nesvizhskii AI - Mol. Syst. Biol. (2010)

Bottom Line: In doing so, nested clustering effectively addresses the problem of overrepresentation of interactions involving bait proteins as compared with proteins only identified as preys. The method does not require specification of the number of bait clusters, which is an advantage over existing model-based clustering methods. We also discuss general challenges of analyzing and interpreting clustering results in the context of AP-MS data.

Affiliation: Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA.

ABSTRACT
Affinity purification followed by mass spectrometry (AP-MS) has become a common approach for identifying protein-protein interactions (PPIs) and complexes. However, data analysis and visualization often rely on generic approaches that do not take advantage of the quantitative nature of AP-MS. We present a novel computational method, nested clustering, for biclustering of label-free quantitative AP-MS data. Our approach forms bait clusters based on the similarity of quantitative interaction profiles and identifies submatrices of prey proteins showing consistent quantitative association within bait clusters. In doing so, nested clustering effectively addresses the problem of overrepresentation of interactions involving bait proteins as compared with proteins only identified as preys. The method does not require specification of the number of bait clusters, which is an advantage over existing model-based clustering methods. We illustrate the performance of the algorithm using two published intermediate-scale human PPI data sets, which are representative of the AP-MS data generated from mammalian cells. We also discuss general challenges of analyzing and interpreting clustering results in the context of AP-MS data.
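To make the notion of a bicluster concrete, here is a minimal Python sketch. It assumes the data are held as a bait-by-prey matrix of normalized spectral counts, that bait cluster labels are already known, and that each bait cluster carries its own per-prey abundance-level labels. The function name `extract_biclusters` and this input layout are hypothetical illustrations, not the authors' software; the sketch only slices out submatrices and performs no inference.

```python
def extract_biclusters(counts, bait_labels, prey_levels):
    """Map (bait_cluster, abundance_level) -> submatrix of counts.

    counts:      list of rows, one per bait purification; columns are preys
                 (hypothetical layout holding normalized spectral counts).
    bait_labels: bait cluster label for each row of counts.
    prey_levels: dict mapping a bait cluster label to a list giving, for each
                 prey, its abundance level within that bait cluster.
    """
    biclusters = {}
    for k in sorted(set(bait_labels)):
        # Rows (baits) belonging to bait cluster k.
        rows = [i for i, lab in enumerate(bait_labels) if lab == k]
        levels = prey_levels[k]
        for level in sorted(set(levels)):
            # Columns (preys) nested in this abundance level of cluster k.
            cols = [j for j, lev in enumerate(levels) if lev == level]
            biclusters[(k, level)] = [[counts[i][j] for j in cols] for i in rows]
    return biclusters
```

For example, with counts `[[5, 4, 0], [6, 5, 1], [0, 1, 7]]`, bait labels `[0, 0, 1]`, and prey levels `{0: [1, 1, 0], 1: [0, 0, 1]}`, the bicluster `(0, 1)` is the 2 x 2 block `[[5, 4], [6, 5]]` formed by the first two baits and first two preys.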

Overview of the computational method. (A) Nested clustering algorithm. Baits are probabilistically assigned to bait clusters, each with an associated mean and variance. Circle diameter is proportional to the normalized spectral count. Within each bait cluster, the mean abundance of each prey is drawn as a square. Mixture modeling groups these mean abundances into a small number of abundance levels, completing the nested clustering of prey proteins. (B) Biclusters resulting from the algorithm in (A). Each bicluster is a submatrix consisting of a bait cluster and an associated nested prey cluster. (C) Example of maximum a posteriori estimation. Bait clustering is illustrated on a hypothetical data set with two preys. Each dot is a single purification with a different bait. Four distinct bait cluster configurations were generated across 100 samples. N is the number of samples sharing a given bait cluster configuration, and maxP is the maximum posterior probability under that fixed configuration. Model 2 is the most frequently sampled configuration and has the highest maximum posterior probability; Model 3 is the second-best competing model, with similarly high posterior probability. The other two configurations have low posterior probability and were sampled infrequently.
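The prey-level step in panel (A) groups the per-prey mean abundances within a bait cluster into a few abundance levels. The paper performs this mixture modeling inside its Bayesian MCMC procedure; as a simpler stand-in, the sketch below fits a one-dimensional Gaussian mixture by expectation-maximization (EM). The function `gaussian_mixture_1d`, the fixed number of levels, the quantile initialization, and the variance floor are all assumptions for illustration only.

```python
import math

def gaussian_mixture_1d(x, n_levels=2, n_iter=100):
    """Fit a 1-D Gaussian mixture to values x by EM (illustrative stand-in).

    Returns (means, labels): component means sorted from lowest to highest,
    and a hard abundance-level label per value (0 = lowest level).
    """
    n = len(x)
    xs = sorted(x)
    mu = sum(x) / n
    # Initialize means at evenly spaced quantiles; start from the pooled variance.
    means = [xs[int((k + 0.5) * n / n_levels)] for k in range(n_levels)]
    pooled = sum((v - mu) ** 2 for v in x) / n
    variances = [max(pooled, 1e-6)] * n_levels
    weights = [1.0 / n_levels] * n_levels
    resp = []
    for _ in range(n_iter):
        # E-step: responsibilities computed in log space to avoid underflow.
        resp = []
        for v in x:
            logs = [math.log(w) - 0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
                    for w, m, s in zip(weights, means, variances)]
            top = max(logs)
            p = [math.exp(lg - top) for lg in logs]
            total = sum(p)
            resp.append([q / total for q in p])
        # M-step: update mixing weights, means, and floored variances.
        for k in range(n_levels):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var_k = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, x)) / nk
            variances[k] = max(var_k, 1e-6)
    # Relabel components so that level 0 has the lowest mean.
    order = sorted(range(n_levels), key=lambda k: means[k])
    rank = {k: i for i, k in enumerate(order)}
    labels = [rank[max(range(n_levels), key=lambda k: r[k])] for r in resp]
    return [means[k] for k in order], labels
```

Called on the hypothetical prey means `[0.1, 0.2, 0.15, 5.0, 5.2, 4.9]`, the sketch places the three low values in level 0 and the three high values in level 1. Unlike the paper's method, it needs the number of levels fixed in advance.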

Mentions: The nested clustering approach is illustrated in Figure 1. Using a Markov chain Monte Carlo (MCMC) algorithm, the method identifies biclusters by stochastically drawing bait and prey cluster configurations, together with their associated mean and variance parameters, from the appropriate posterior distribution (see Supplementary information for details). The biclustering configuration with the highest posterior probability is selected as the optimal solution. Figure 1A illustrates a single iteration of drawing bait and prey clusters.
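Selecting the configuration with the highest posterior probability, and tallying samples per configuration as in Figure 1C, can be sketched as follows. This assumes a sampler (not shown) that emits, for each iteration, a vector of bait cluster labels and its log posterior; the input format and the helper names `canonical_partition` and `summarize_samples` are hypothetical. Labels are canonicalized first so that label-switched copies of the same partition, such as (0, 0, 1) and (1, 1, 0), count as one configuration.

```python
import math
from collections import defaultdict

def canonical_partition(labels):
    """Relabel clusters in order of first appearance, so label-switched
    copies of one partition compare equal: (2, 2, 0) -> (0, 0, 1)."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)

def summarize_samples(samples):
    """Tally MCMC samples by bait cluster configuration.

    samples: iterable of (bait_labels, log_posterior) pairs.
    Returns ({partition: (N, max_log_posterior)}, MAP partition).
    """
    counts = defaultdict(int)
    best = {}
    for labels, log_post in samples:
        key = canonical_partition(labels)
        counts[key] += 1
        best[key] = max(best.get(key, -math.inf), log_post)
    summary = {key: (counts[key], best[key]) for key in counts}
    # Select the configuration with the highest posterior, as in the text.
    map_partition = max(summary, key=lambda key: summary[key][1])
    return summary, map_partition
```

The summary reports both the sampling frequency (N) and the best posterior value (maxP) for each configuration. In Figure 1C the two criteria agree on Model 2; the sketch follows the text in choosing by maximum posterior.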