MitoFates: improved prediction of mitochondrial targeting sequences and their cleavage sites.
Bottom Line: Five hundred and eighty of these genes were not annotated as mitochondrial in either UniProt or Gene Ontology. Interestingly, these include candidate regulators of parkin translocation to damaged mitochondria, and also many genes with known disease mutations, suggesting that careful investigation of MitoFates predictions may be helpful in elucidating the role of mitochondria in health and disease. MitoFates is open source with a convenient web server publicly available.
Affiliation: From the ‡Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa, Chiba, 277-8561, Japan.
Mentions: We benchmarked the presequence prediction performance of our predictor (MitoFates) against four previously developed predictors, TPpred2, TargetP (ver. 1.1), Predotar (ver. 1.03), and MitoProtII (ver. 1.101), on the independent test data containing 78 presequences described in Methods. Fig. 2A shows the 11-point precision-recall curve (PR-curve) of each predictor averaged over 10 random selections of 500 negative test set proteins. MitoFates achieves an average precision of 84% on the PR-curve, outperforming TPpred2, Predotar, TargetP, and MitoProtII, which obtained average precisions of 81%, 79%, 78%, and 74%, respectively. In particular, MitoFates attains better precision for recall values of 50–80% (in this range the average precisions of MitoFates, TPpred2, Predotar, TargetP, and MitoProtII are 91%, 81%, 82%, 77%, and 77%, respectively). The ROC AUC of MitoFates is also superior to those of the other predictors (Table I). For MitoFates, we focused on two prediction cutoffs (0.5 and 0.385) based on a 5-fold cross-validation test within the training data set (supplemental Fig. S2): 0.5 is the default cutoff determined by LIBSVM (34), with a precision and recall of 0.83 and 0.73, respectively, and 0.385 corresponds to a precision and recall of 0.79 and 0.80. At both prediction cutoff values, MitoFates' Matthews correlation coefficient (MCC) is better than those of the other predictors at their default cutoffs. In addition, the PR-curve and ROC AUC of MitoFates are better than those of TargetP and Predotar even when MitoFates is trained on their training data sets (supplemental Fig. S3), suggesting that our novel features contribute to improved prediction accuracy (the training data set of TPpred2 overlapped to a large extent with our test data, so we did not do this experiment on the TPpred2 training data).
However, the PR-curve and ROC AUC of MitoFates trained on those data sets are inferior to those of MitoFates trained on its original data set, suggesting that the updated MitoFates training data also contributes to its superior performance.
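The evaluation measures named in this passage (precision and recall at a score cutoff, MCC, and an 11-point PR-curve average) can be illustrated with a minimal Python sketch. The scores and labels are made-up toy values, not MitoFates output, and all function names are hypothetical. The 11-point average uses the common interpolation convention (the maximum precision observed at any recall at or above each level 0.0, 0.1, ..., 1.0); this is an assumption, since the excerpt does not state which convention the authors used.

```python
from math import sqrt


def confusion(scores, labels, cutoff):
    """Count (tp, fp, tn, fn) when predicting positive for score >= cutoff."""
    tp = fp = tn = fn = 0
    for score, is_positive in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and is_positive:
            tp += 1
        elif predicted and not is_positive:
            fp += 1
        elif not predicted and is_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def precision_recall(tp, fp, fn):
    """Precision and recall; precision is taken as 1.0 when nothing is predicted positive."""
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def eleven_point_average_precision(scores, labels):
    """Mean interpolated precision at recall levels 0.0, 0.1, ..., 1.0 (assumed convention)."""
    points = []
    for cutoff in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion(scores, labels, cutoff)
        points.append(precision_recall(tp, fp, fn))
    interpolated = []
    for level in (i / 10 for i in range(11)):
        reachable = [p for p, r in points if r >= level]
        interpolated.append(max(reachable) if reachable else 0.0)
    return sum(interpolated) / len(interpolated)


# Toy data (hypothetical): predictor scores and true labels (1 = has presequence).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]

# Compare the two cutoffs discussed in the text on the toy data.
for cutoff in (0.5, 0.385):
    tp, fp, tn, fn = confusion(toy_scores, toy_labels, cutoff)
    p, r = precision_recall(tp, fp, fn)
    print(f"cutoff={cutoff}: precision={p:.2f} recall={r:.2f} MCC={mcc(tp, fp, tn, fn):.2f}")

print(f"11-point average precision: {eleven_point_average_precision(toy_scores, toy_labels):.3f}")
```

Lowering the cutoff trades precision for recall, which is the pattern the two MitoFates cutoffs show in the text (0.83/0.73 at 0.5 versus 0.79/0.80 at 0.385). Averaging the curve over repeated random negative samples, as done for Fig. 2A, would amount to calling the last function once per sample and taking the mean.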