MitoFates: improved prediction of mitochondrial targeting sequences and their cleavage sites.
Bottom Line: Five-hundred and eighty of these genes were not annotated as mitochondrial in either UniProt or Gene Ontology. Interestingly, these include candidate regulators of parkin translocation to damaged mitochondria, and also many genes with known disease mutations, suggesting that careful investigation of MitoFates predictions may be helpful in elucidating the role of mitochondria in health and disease. MitoFates is open source with a convenient web server publicly available.
Affiliation: From the ‡Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa, Chiba, 277-8561, Japan;
Mentions: A large majority of presequences are cleaved by MPP, and many of those by secondary proteases as well. MPP cleavage sites display local sequence tendencies (20), the most conspicuous one being the presence of arginine in the −2 position in nearly all cases, consistent with electrostatic interaction between this arginine and negatively charged residues in MPP (27). After cleavage by MPP, the secondary proteases Oct1 and Icp55 further cleave some presequences, removing eight residues or a single residue, respectively (7). It is reasonable to hope that explicit modeling of this two-step process might improve the prediction of those presequences. Thus, we generated a consensus Position Weight Matrix (PWM) based on the frequencies of amino acids between the −4 position and the +5 position of training set sequences aligned by cleavage site. As with the amino acid composition values described above, we smoothed the observed frequencies in each column of the PWM with a 20-component Dirichlet mixture model (26). The PWM score is calculated as the log-odds ratio between those smoothed frequencies and a background composition based on the mature region of cleaved mitochondrial proteins. To predict if putative MPP cleavage sites are further cleaved by Oct1 or Icp55, we employed PWMs based on the cleavage sites of those peptidases in the training data. By inspection of the training data, we chose the range of positions covered by the PWMs to be [+1, +4] (length 4) and [+1, +2] (length 2) for MPP+Oct1 and MPP+Icp55, respectively (Fig. 1A). Because plant data was rather limited and PWMs require a large number of parameters (19 per column), we chose to use PWMs trained on the more abundant yeast data, even when making predictions for plant proteins (however, we did retrain the length distribution as described below).
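The log-odds PWM scoring described above can be illustrated with a minimal Python sketch. It is not the MitoFates implementation: a flat pseudocount stands in for the 20-component Dirichlet mixture smoothing, a uniform background stands in for the mature-region composition, and the function names (`build_log_odds_pwm`, `score_window`, `best_site`) and the toy four-residue windows are hypothetical choices made only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_log_odds_pwm(windows, background, pseudocount=1.0):
    """Build a log-odds PWM from equal-length windows aligned by cleavage site.

    Assumption: a flat pseudocount is used for smoothing here, as a simple
    stand-in for the Dirichlet mixture smoothing described in the text.
    """
    if not windows:
        raise ValueError("at least one aligned window is required")
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all windows must have the same length")

    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for col in range(width):
        counts = Counter(w[col] for w in windows)
        # Smoothed frequency divided by background frequency, in log2 units.
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        }
        pwm.append(column)
    return pwm


def score_window(pwm, window):
    """Sum the per-column log-odds values for one candidate window.

    Only the 20 standard residues are supported in this sketch.
    """
    if len(window) != len(pwm):
        raise ValueError("window length must match PWM width")
    return sum(column[aa] for column, aa in zip(pwm, window))


def best_site(pwm, sequence):
    """Return (offset, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    offsets = range(len(sequence) - width + 1)
    best = max(offsets, key=lambda i: score_window(pwm, sequence[i:i + width]))
    return best, score_window(pwm, sequence[best:best + width])


# Toy example: four short windows sharing an arginine in the second column.
uniform_background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
toy_windows = ["ARLS", "GRLS", "ARFS", "TRYS"]
toy_pwm = build_log_odds_pwm(toy_windows, uniform_background)
offset, score = best_site(toy_pwm, "MLSARLSQQ")
```

In this toy setting the window containing the conserved arginine scores highest. A sketch closer to the text would use nine-column windows spanning positions −4 to +5, a background estimated from mature regions, and separate PWMs for the Oct1 and Icp55 sites.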