Multiclassifier combinatorial proteomics of organelle shadows at the example of mitochondria in chromatin data.

Kustatscher G, Grabowski P, Rappsilber J - Proteomics (2016)

Bottom Line: We show covariation of mitochondrial proteins in chromatin proteomics data. We then exploit this covariation by multiclassifier combinatorial proteomics to define a list of mitochondrial proteins. This list agrees well with different databases on mitochondrial composition.


Affiliation: Wellcome Trust Centre for Cell Biology, University of Edinburgh, UK.


Figure 2: A Random Forest model can predict mitochondrial proteins based on their covariation in chromatin proteomics data. (A) High accuracy of mitochondrial prediction is shown by ROC curves derived from the 100‐fold cross‐validated mitochondrial and nonmitochondrial reference set. The ten curves correspond to ten Random Forests generated with different negative training data, highlighting the robustness of the Random Forest model. AUC: area under the curve. (B) Random Forest scores for the 4565 proteins (gray) in our analysis. High‐confidence mitochondrial reference proteins (magenta) are heavily enriched toward higher scores. The pie‐chart shows the manual annotation of proteins within the dashed rectangle, corresponding to a score cut‐off of 0.69. Most proteins are either part of our high‐confidence mitochondrial reference set or other known mitochondrial proteins. Six proteins were poorly annotated. 18 proteins were classified as nonmitochondrial, i.e. they are well‐annotated but no evidence for mitochondrial function exists. This group was used to estimate that we have about 10% false positives at this score cut‐off.
© Copyright Policy - creativeCommonsBy-nc

Mentions: We next performed 100‐fold cross‐validation to determine reliable prediction scores for our high‐confidence mitochondrial proteins. This means we constructed 100 Random Forests and in each left out a different 1% of the reference data, using the model generated with the remaining 99% to obtain unbiased prediction scores for these proteins. This allowed us to use a ROC curve to estimate the model's performance, in addition to the inbuilt out‐of‐bag error estimate of the Random Forest algorithm. The mean area under the ROC curve we obtained was 0.96 (Fig. 2A). This confirms the high accuracy of our prediction already indicated by the low out‐of‐bag error.
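
The cross-validation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the function names (`k_fold_indices`, `fit_centroids`, `predict_score`, `cross_validated_scores`, `roc_auc`) are hypothetical, and a simple nearest-centroid scorer stands in for the Random Forest so that the example needs only the standard library. What it does show is the structure of the evaluation: each of 100 models is trained on 99% of the reference set, scores the held-out 1%, and the pooled held-out scores are summarized as an area under the ROC curve.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Stand-in model (NOT a Random Forest): per-class feature means."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return mean(pos), mean(neg)

def predict_score(model, x):
    """Higher score = closer to the positive centroid than to the negative."""
    pos_c, neg_c = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
    return d_neg - d_pos

def cross_validated_scores(X, y, k, seed=0):
    """Score every item with a model that never saw it during training."""
    rng = random.Random(seed)
    scores = [None] * len(X)
    for fold in k_fold_indices(len(X), k, rng):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            scores[i] = predict_score(model, X[i])
    return scores

def roc_auc(y, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic reference set (assumption): 200 "positive" and 300 "negative"
# items with 5 features each, the positives shifted by one unit.
rng = random.Random(42)
features = [[rng.gauss(1.0, 1.0) for _ in range(5)] for _ in range(200)]
features += [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(300)]
labels = [1] * 200 + [0] * 300

# 100 folds over 500 items: each model is trained on 99% and scores 1%.
scores = cross_validated_scores(features, labels, k=100)
auc = roc_auc(labels, scores)
print(f"cross-validated AUC on synthetic data: {auc:.3f}")
```

Because every score comes from a model that did not see that item during training, the resulting AUC is not inflated by overfitting. To mirror the ten curves in Fig. 2A, one would repeat this with different negative training sets and average the resulting AUC values.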

