Transferring functional annotations of membrane transporters on the basis of sequence similarity and sequence motifs.

Barghash A, Helms V - BMC Bioinformatics (2013)

Bottom Line: At similar identity thresholds, the nature of the transported substrates was more divergent (F-measure 40-75% at the same thresholds) than the TC family membership. Researchers who wish to apply these thresholds in their studies should multiply these thresholds by the size of the database they search against. Our findings should be useful to those who wish to transfer transporter functional annotations across species.


Affiliation: Center for Bioinformatics, Saarland University, Postfach 15 11 50, 66041 Saarbrücken, Germany. volkhard.helms@bioinformatik.uni-saarland.de.

ABSTRACT

Background: Membrane transporters catalyze the transport of small solute molecules across biological barriers such as lipid bilayer membranes. Experimental identification of the transported substrates is very tedious. Once a particular transport mechanism has been identified in one organism, it is thus highly desirable to transfer this information to related transporter sequences in different organisms based on bioinformatics evidence.

Results: We present a thorough benchmark of the level of sequence identity at which membrane transporters from Escherichia coli, Saccharomyces cerevisiae, and Arabidopsis thaliana belong to the same families of the Transporter Classification (TC) system, and of the level at which these transporters mediate the transport of the same substrate. We found that two membrane transporter sequences from different organisms that align with a normalized BLAST expectation value (E-value) better than 1e-8 are highly likely to belong to the same TC family (F-measure around 90%). Enriched sequence motifs identified by MEME at thresholds below 1e-12 support accurate classification into TC families for about two thirds of the sequences (F-measure 80% and higher). For the comparison of transported substrates, we focused on the four largest substrate classes: amino acids, sugars, metal ions, and phosphate. At similar identity thresholds, the nature of the transported substrates was more divergent (F-measure 40-75% at the same thresholds) than the TC family membership.
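As a rough illustration of how an F-measure at a given E-value threshold can be obtained, the Python sketch below calls a sequence pair "same TC family" when its E-value is better (smaller) than the threshold and scores those calls against known labels. It assumes the balanced F-measure (harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name and the toy pairs are hypothetical.

```python
from typing import Iterable, Tuple


def f_measure(pairs: Iterable[Tuple[float, bool]], threshold: float = 1e-8) -> float:
    """Balanced F-measure for 'same family' calls made at an E-value threshold.

    Each pair is (e_value, truly_same_family). A pair is predicted to share
    a family when its E-value is strictly below the threshold.
    """
    tp = fp = fn = 0
    for e_value, same_family in pairs:
        predicted = e_value < threshold
        if predicted and same_family:
            tp += 1
        elif predicted and not same_family:
            fp += 1
        elif not predicted and same_family:
            fn += 1
    if tp == 0:
        return 0.0  # no correct positive calls: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the paper.
toy_pairs = [(1e-30, True), (1e-12, True), (1e-9, False), (1e-3, True), (0.5, False)]
print(f_measure(toy_pairs, threshold=1e-8))
```

Sweeping the threshold over a range of values and recording the F-measure at each step would give the kind of threshold-versus-accuracy curve that such a benchmark relies on.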

Conclusions: We suggest an acceptable threshold of 1e-8 for BLAST and HMMER where at least three quarters of the sequences are classified according to the TC system with a reasonably high accuracy. Researchers who wish to apply these thresholds in their studies should multiply these thresholds by the size of the database they search against. Our findings should be useful to those who wish to transfer transporter functional annotations across species.
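The database-size adjustment recommended above amounts to a single multiplication, sketched here in Python. The abstract does not say in which unit the database size is measured (for example, number of sequences or number of residues), so `database_size` is left as a caller-supplied assumption and the function name is hypothetical.

```python
def scaled_threshold(normalized_threshold: float, database_size: float) -> float:
    """Scale a normalized E-value threshold to a specific search database.

    database_size must be expressed in the same unit that was used when the
    threshold was normalized (an assumption the caller has to verify).
    """
    if database_size <= 0:
        raise ValueError("database_size must be positive")
    return normalized_threshold * database_size


# Example with a made-up database size of 5000 units.
print(scaled_threshold(1e-8, 5000))
```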



Figure 3: Heatmap of BLASTing Sc substrate_TC families against At families. BLAST homology search of 69 Sc transporters against 84 At transporters from 4 substrate families (amino acids, sugars, phosphates, metals) and 13 TC families (Sc) and 12 TC families (At). The grey scale is logarithmic: white means no match better than a normalized E-value of 1e-04, and black means a best match better than 1e-20. Families generally match their own substrate_TC families. However, they may also match TC families from different substrate_TC families.

Mentions: Comparison of the two preceding sections shows that substrate families have, on average, lower sequence similarity than TC families. We then tested the combination of both properties in a systematic way (Figure 3). For this, we named the extracted families in the form “substrate family_TC family”. The four substrate families (amino acids, sugars, phosphates, metals) belong to 19 TC families in Ec, 13 in At and 14 in Sc. Seven substrate_TC families are shared between Ec and At, seven between Ec and Sc, and eleven between Sc and At. Some TC families span several substrate families; for example, family 3.A.1 contains members of four Ec substrate families. We used BLAST to analyze the affiliation of test sequences with their TC or substrate families, considering only the best match for each substrate_TC family. The heatmap in Figure 3 shows the tendency of Sc sequences to match their counterparts from At TC or substrate families. Some Sc transporters strongly matched (black rectangles) their actual substrate_TC families from At, such as sugar_2.A.1, phosphate_2.A.1 and metal_2.A.55. However, most sequences from shared TC families had weaker matches to their TC families rather than their substrate families. Similar results were obtained in the Ec-At and Ec-Sc comparisons (Additional file 4: Figure S4 and Additional file 5: Figure S5). Thus, we suggest that it is beneficial to apply substrate information as a pre-filter for transporter TC family classification. On the other hand, transporters that transport the same substrate but belong to different TC families generally do not share noticeable sequence similarity. TC information can serve as the stand-alone feature for classifying transporters, but a little tuning with substrate information improves the prediction accuracy. Misclassification is expected to occur in the small substrate_TC families rather than in the large TC families.
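The best-match selection and the logarithmic grey scale described for Figure 3 could be sketched in Python as follows. The caption only fixes the two end points (white at a normalized E-value of 1e-04, black at 1e-20); the linear interpolation in log10(E) between them, the function names, and the hit tuples are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # (query family, target family, normalized E-value)


def best_match_per_family(hits: Iterable[Hit]) -> Dict[Tuple[str, str], float]:
    """Keep only the best (smallest) E-value for each pair of families."""
    best: Dict[Tuple[str, str], float] = {}
    for query_family, target_family, e_value in hits:
        key = (query_family, target_family)
        if key not in best or e_value < best[key]:
            best[key] = e_value
    return best


def grey_level(e_value: float, white_at: float = 1e-4, black_at: float = 1e-20) -> float:
    """Map an E-value to a grey level: 0.0 is white, 1.0 is black."""
    if e_value <= black_at:  # also covers E-values reported as 0.0
        return 1.0
    if e_value >= white_at:
        return 0.0
    low = -math.log10(white_at)
    high = -math.log10(black_at)
    return (-math.log10(e_value) - low) / (high - low)


# Hypothetical hits; the family names follow the "substrate_TC" convention.
hits = [
    ("sugar_2.A.1", "sugar_2.A.1", 1e-30),
    ("sugar_2.A.1", "sugar_2.A.1", 1e-10),
    ("sugar_2.A.1", "phosphate_2.A.1", 1e-6),
]
for families, e_value in best_match_per_family(hits).items():
    print(families, grey_level(e_value))
```

Each cell of the heatmap would then correspond to one key of the dictionary, shaded by the grey level of its best E-value.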

