Domain annotation of trimeric autotransporter adhesins--daTAA.

Szczesny P, Lupas A - Bioinformatics (2008)

Bottom Line: The high sequence diversity and distinct mosaic-like structure of trimeric autotransporter adhesins (TAAs) make their sequences difficult to annotate. The difficulties stem from the large number of short repeats, the presence of compositionally unusual coiled-coils, fuzzy domain boundaries and regions of seemingly low sequence complexity. Compared to general domain annotation servers such as PFAM, daTAA captures more domains and provides more sensitive domain detection, as well as integrated and detailed coiled-coil assignments.


Affiliation: Department of Protein Evolution, Max-Planck Institute for Developmental Biology, Spemannstr 35, 72076 Tuebingen, Germany.

ABSTRACT

Motivation: Trimeric autotransporter adhesins (TAAs), such as Yersinia YadA, Neisseria NadA, Moraxella UspAs, Haemophilus Hia and Bartonella BadA, are important pathogenicity factors of proteobacteria. Their high sequence diversity and distinct mosaic-like structure lead to difficulties in the annotation of their sequences. These stem from the large number of short repeats, the presence of compositionally unusual coiled-coils, fuzzy domain boundaries and regions of seemingly low sequence complexity.

Results: We have developed a workflow, named daTAA, for the accurate domain annotation of TAAs. Its core consists of manually curated alignments and of knowledge-based rules that enhance assignments made by sequence similarity. Compared to general domain annotation servers such as PFAM, daTAA captures more domains and provides more sensitive domain detection, as well as integrated and detailed coiled-coil assignments.

Availability: The daTAA server is freely accessible at http://toolkit.tuebingen.mpg.de/dataa


Figure 3: Details of daTAA and PFAM performance in comparison with manual annotation. The two sets of sequences are as described in the Methods. In each group, the first box denotes the PFAM annotation, the second daTAA and the third the manual annotation. Matches are colored according to their functional class: autotransporter signal peptide, blue; heads, red; connectors, green; stalks, yellow; anchor, grey.

Set A.
1. gi/153095004/ Mannheimia haemolytica PHL213
2. gi/149190224/ Vibrio shilonii AK1
3. gi/154149446/ Campylobacter hominis ATCC BAA-381
4. gi/153834639/ Vibrio harveyi HY01
5. gi/153093295/ Mannheimia haemolytica PHL213
6. gi/150380584/ Shewanella sediminis HAW-EB3
7. gi/148827620/ Haemophilus influenzae PittGG
8. gi/154149537/ Campylobacter hominis ATCC BAA-381
9. gi/149909020/ Moritella sp. PE36

Set B.
1. gi/78061293/ Burkholderia sp. 383
2. gi/161017094/ Bartonella tribocorum CIP 105476
3. gi/161505469/ Salmonella enterica subsp. arizonae
4. gi/156124985/ Acinetobacter venetianus
5. gi/157145682/ Citrobacter koseri ATCC BAA-895
6. gi/155199120/ Escherichia coli
7. gi/86750771/ Rhodopseudomonas palustris HaA2
8. gi/85709253/ Erythrobacter sp. NAP1
9. gi/162429157/ Methylobacterium nodulans ORS 2060

Mentions: To evaluate daTAA annotations, we performed a three-way comparison between daTAA, PFAM and manual annotation on a test set of recently deposited TAA sequences, as described in the Methods (Fig. 3). The coverage achieved by daTAA was 50%, against 28% obtained by PFAM and 56% obtained manually. The three domain types present in both daTAA and PFAM (Table 1; Ylhead = Hep_Hag repeat; neck = HIM motif; membrane anchor = YadA family) accounted for one-third of total residues (Table 2); DUF1079 was not considered, as it only occurs in Moraxella UspAs, none of which was present in our test set. daTAA predicted all three types accurately: it missed only three variant necks and a small number of divergent Ylhead repeats, and overpredicted one Ylhead repeat in a segment identified by manual annotation as a new motif (HIM2). Although PFAM also performed very well, considering that it is a general domain annotation system, it had issues with both domain recognition and domain boundary definitions. It failed to identify one-third of the anchor domains and assigned the others in a shortened form, omitting part of the coiled-coil that forms the N-terminal third of this domain. Its performance on neck sequences was mixed: it identified two of the three variant necks (only partly, however, and without recognizing that they were disrupted by longer insertions), but it overpredicted two additional necks. Finally, it predicted the regions of Ylhead repeats well, but not the repeats themselves, as its profile includes two repeats and does not coincide with the ends of the constituent β-strands (Fig. 4).
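
The coverage figures quoted above can be illustrated with a small calculation. The following is a minimal sketch, assuming that coverage means the fraction of residues falling inside at least one annotated domain, pooled over all test sequences; the source does not spell out the exact definition. The function names and the example coordinates are hypothetical and are not taken from daTAA or PFAM.

```python
from typing import Iterable, List, Tuple

# Assumption: a domain match is a 1-based, inclusive (start, end) residue range.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent ranges so no residue is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def coverage(annotations: Iterable[Interval], seq_length: int) -> float:
    """Fraction of one sequence covered by at least one annotated domain."""
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    covered = sum(end - start + 1 for start, end in merge_intervals(annotations))
    return covered / seq_length


def pooled_coverage(proteins: Iterable[Tuple[int, List[Interval]]]) -> float:
    """Coverage over a whole test set: covered residues / total residues."""
    total = 0
    covered = 0
    for seq_length, annotations in proteins:
        total += seq_length
        covered += sum(e - s + 1 for s, e in merge_intervals(annotations))
    if total == 0:
        raise ValueError("test set contains no residues")
    return covered / total


if __name__ == "__main__":
    # Hypothetical 100-residue protein annotated by two different methods.
    general_server = [(1, 20), (81, 100)]
    specialised_tool = [(1, 20), (15, 50), (81, 100)]
    print(coverage(general_server, 100))
    print(coverage(specialised_tool, 100))
```

Pooling residues across the test set, rather than averaging per-protein fractions, keeps long proteins from being under-weighted; whether the reported percentages were computed this way is an assumption.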

